Originally posted at https://tech.labs.oliverwyman.com/blog/2011/05/23/dont-use-ienumerable-for-table-references-in-c/
We have a C# project that recently started to run into a few scaling issues. That search can slow down a little when you get a lot more records isn’t so surprising, but when your basic individual item retrieval starts getting sluggish, it’s time to worry. We were using Entity Framework as the ORM layer, but I suspect this would apply for anything in C#.
LINQ, and the overuse of abstraction is a lot of the problem here. I’m a big fan of LINQ, but as with all abstractions, it fails, at least a little, and every so often it fails a lot. Instead of doing a select * from FooTable where id = blah
you’re doing context.FooTable.Single(ft => ft.id == blah)
and expecting that the results to be identical. Well, to some extent they are, but the intermediate steps can be very, very different, and it depends on your original data types.
LINQ is data-structure agnostic as far as it can, and being able to call .Where(func => some_expression)
on an array is pretty funky, but there’s a whole world of LINQ providers and deep magic behind the curtain, and they’re not all very efficient. One of the pairs of interfaces that people don’t pay enough attention to is IEnumerable and IQueryable (generally the generic versions thereof, but the distinction is irrelevant for the moment). If you’ve got only a handful of records, or even quite a lot of records and a reasonably powerful machine, the difference between the two is unimportant. We’d been using the IEnumerable interface (backed by an ObjectQuery on the application side, and a simple List in the tests), which turned out to be the wrong option. IEnumerable as it turns out does almost everything in memory i.e. context.FooTable.Single(ft => ft.id == blah)
results in the whole of FooTable getting loaded into memory, then the Single() function gets run across those items in memory. IQueryable on the other hand, treats any reference to a data table as merely that, and only executes a query once you call certain functions that actually want objects back (Single, First, ToList, etc), which generally results in a lot less data being pulled in.
Those of you running to your codebase and replacing every IEnumerable in there with IQueryable should slow down a second. This efficiency comes at a price, namely the need to be able to translate LINQ code into SQL. If you’re just using relatively simple built-in C# methods, everything might just work. If however you stray from the supported function list, it’s a bit more of a problem. If you’re way off the path and making references to your custom getters and setters, life gets a bit more interesting. There’s still a way, using the magic of Expression<Func<>>
and LinqKit. Expression
can be best described as a “uncompiled”/AST Func
(and indeed, there’s a Compile() method). Luckily they can for the most part be built just like a Func
(e.g. Expression<Func<Food, bool>> isMine = f => f.kind == "my dinner"
) and if they’re built inside an object and then returned they can even play around with private variables. Again, they need to only use the allowed functions, but now you can return an Expression
that works on any instance of that class. What happens is then the LINQ provider can “compile” those Expressions
into SQL code, whereas it can’t do the same for a Func
.
LinqKit lets you then do things like actually call those Expression
‘s inside of the LINQ query using Invoke
e.g. f => MyFunc.Invoke(f)
where MyFunc is an Expression
. LinqKit also incorporates PredicateBuilder so you can combine multiple Expression
‘s together easily, and you generally need its Expand()
method on your data tables before they get called, but it’s not too bad once you get the hang of it.
Of course, sometimes you want to do something a bit fancier and the allowed function list isn’t quite enough. This is still doable, but you’re going to have to tell Entity Framework how to translate your function into SQL. There’s a few suggestions out there already about how to do this (and for our example case, SqlMethods.Like gets close, but appears to be a bit picky about where it’ll work), but here’s an improved example.
<edmx:Runtime><edmx:ConceptualModels><Schema Namespace="Your.Namespace" ...>
. Note the “Your.Namespace” bit, and then insert the following just after the <Schema>
begin tag:Congratulations, you’ve now got a new StringLike methods on all strings that you can call in an IQueryable LINQ query (e.g. s => s.StringLike("foo")
), and it’ll do the right thing whether it’s dealing with a database or something in-memory. If it’s the former, the SQL fragment from the .edmx will be used; and the latter will use the C# code.