Optimising .NET code: Let's blame LINQ
TLDR: If you're aiming for zero allocation, you shouldn't use LINQ. However, for the vast majority of use cases the overhead of using it is negligible.
In this series we're looking at ways to improve the performance and memory efficiency of our .NET code, using my solutions to Advent of Code puzzles as a testing ground. In the first post, I introduced 2022's Day 1 puzzle - Calorie Counting - and my initial LINQ-based solution:
```csharp
public string Solve(string input)
{
    return input.Split(Environment.NewLine + Environment.NewLine)
        .Max(row => row.Split(Environment.NewLine).Select(int.Parse).Sum())
        .ToString();
}
```
The initial benchmark figures for this code looked like this:
Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
---|---|---|---|---|---|---|
RunAgainstRealInput | 78.23 us | 0.512 us | 0.212 us | 7.6904 | 0.7324 | 126.43 KB |
In part 2, we went through a basic pass of trying to understand where the memory is being allocated. This showed us that the majority of our allocations come from calls to `string.Split`. In order to improve that (for reasons which will become more obvious later), I'm going to need to get rid of LINQ. But will this have any impact in its own right?
When doing this kind of performance engineering work, it's vitally important that you take one step at a time and evaluate the impact of each step as you go. So if we're going to get rid of LINQ, we should look at the results of doing that before we move on.
The LINQ-based code above is doing the following:
- Split the input into individual "rucksacks"
- Split each rucksack into individual lines
- Parse the numbers in each line of the rucksack
- Produce the sum of the numbers for each rucksack
- Determine the largest rucksack based on those sums.
To make this clearer, let's rejig the code a bit:
```csharp
public string Solve(string input)
{
    return input.Split(Environment.NewLine + Environment.NewLine)
        .Max(CountRucksackCalories)
        .ToString();
}

private static int CountRucksackCalories(string rucksack)
{
    return rucksack.Split(Environment.NewLine).Select(int.Parse).Sum();
}
```
This results in no real difference in performance.
Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
---|---|---|---|---|---|---|
RunAgainstRealInput | 77.68 us | 0.304 us | 0.284 us | 7.6904 | 0.7324 | 126.43 KB |
It does make the code a bit clearer though! We can see that we're using three LINQ functions:
- `Max` - to calculate the rucksack with the largest number of calories
- `Select` - to map our set of input strings to integers
- `Sum` - to calculate a total for each rucksack.
Let's get rid of those one at a time to see what happens. We'll start with the `Sum`:
```csharp
public string Solve(string input)
{
    return input.Split(Environment.NewLine + Environment.NewLine)
        .Max(CountRucksackCalories)
        .ToString();
}

private static int CountRucksackCalories(string rucksack)
{
    int sum = 0;
    IEnumerable<int> rucksackCalories = rucksack.Split(Environment.NewLine).Select(int.Parse);
    foreach (int item in rucksackCalories)
    {
        sum += item;
    }
    return sum;
}
```
Again, no difference.
Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
---|---|---|---|---|---|---|
RunAgainstRealInput | 77.42 us | 0.288 us | 0.241 us | 7.6904 | 0.7324 | 126.43 KB |
Next, we'll get rid of the `Select` and see what that does for us:
```csharp
public string Solve(string input)
{
    return input.Split(Environment.NewLine + Environment.NewLine)
        .Max(CountRucksackCalories)
        .ToString();
}

private static int CountRucksackCalories(string rucksack)
{
    int sum = 0;
    string[] rucksackItems = rucksack.Split(Environment.NewLine);
    foreach (string item in rucksackItems)
    {
        sum += int.Parse(item);
    }
    return sum;
}
```
That's made a difference!
Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
---|---|---|---|---|---|---|
RunAgainstRealInput | 60.75 us | 0.917 us | 0.716 us | 6.9580 | 0.7324 | 114.76 KB |
So, we're now noticeably faster and we're also saving ~12 KB of allocation. Why? What's so expensive about the call to `Select`?
To help us understand, let's tweak the code slightly. The LINQ-to-objects `Select` method we're using operates on objects of type `IEnumerable<T>`; what happens if we change our code to work that way too?
```csharp
public string Solve(string input)
{
    return input.Split(Environment.NewLine + Environment.NewLine)
        .Max(CountRucksackCalories)
        .ToString();
}

private static int CountRucksackCalories(string rucksack)
{
    int sum = 0;
    IEnumerable<string> rucksackItems = rucksack.Split(Environment.NewLine);
    foreach (string item in rucksackItems)
    {
        sum += int.Parse(item);
    }
    return sum;
}
```
In this version of the code, I've declared `rucksackItems` as `IEnumerable<string>` rather than `string[]`.
Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
---|---|---|---|---|---|---|
RunAgainstRealInput | 67.73 us | 0.324 us | 0.303 us | 7.4463 | 0.7324 | 122.54 KB |
With this version we're still allocating about 4 KB less than the fully LINQ-based solution, but that seemingly tiny change of declared type has added roughly 8 KB of allocation compared to the array version, and slowed us down a bit. Why?
Well, it turns out there's a significant difference under the covers between using a `foreach` loop on an array and using one on an `IEnumerable<T>`.
When you use `foreach` to iterate an array, the compiler understands exactly what it's dealing with. More specifically, Common Intermediate Language (the form all .NET code is compiled into when you build your project) has dedicated instructions for indexing into arrays, and the compiler uses them here. So under the covers, the code looks very much like a `for` loop - an index counts from 0 to the array length, and at each step the element at that index is obtained, parsed and added to the result.
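As a rough illustration (this is a sketch of the lowering, not actual compiler output - the real thing uses hidden, compiler-generated locals), the array `foreach` behaves like this:

```csharp
using System;

// Illustrative sketch of how a foreach over an array is lowered.
// The variable names are invented; the compiler uses hidden locals.
string[] rucksackItems = { "100", "200", "300" };
int sum = 0;
for (int i = 0; i < rucksackItems.Length; i++)
{
    string item = rucksackItems[i]; // uses a dedicated array-indexing IL instruction
    sum += int.Parse(item);
}
Console.WriteLine(sum); // 600
```

Crucially, nothing in this shape needs an allocation: it's just an index, a bounds check and a load.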
When you use `foreach` to iterate an `IEnumerable`, things are very different. `IEnumerable` is at a higher level of abstraction, meaning that fewer assumptions can be made. To iterate an `IEnumerable<T>`, you call its `GetEnumerator` method, which returns an object of type `IEnumerator<T>`. The enumerator has a property to retrieve the current item in the sequence, and a method to move to the next item.
We can see this if we rewrite the code again to explicitly use the enumerator (with a bit of fiddling to ensure it's the generic version `IEnumerator<string>` rather than just `IEnumerator`):
```csharp
public string Solve(string input)
{
    return input.Split(Environment.NewLine + Environment.NewLine)
        .Max(CountRucksackCalories)
        .ToString();
}

private static int CountRucksackCalories(string rucksack)
{
    int sum = 0;
    string[] rucksackItems = rucksack.Split(Environment.NewLine);
    IEnumerator<string> enumerator = ((IEnumerable<string>)rucksackItems).GetEnumerator();
    while (enumerator.MoveNext())
    {
        sum += int.Parse(enumerator.Current);
    }
    return sum;
}
```
Which results in exactly the same amount of allocated memory:
Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
---|---|---|---|---|---|---|
RunAgainstRealInput | 69.36 us | 0.327 us | 0.273 us | 7.4463 | 0.7324 | 122.54 KB |
Whilst this gives us an idea of what's going on, the numbers don't match. Let's summarise quickly:
Approach | Allocated | Difference from 1 |
---|---|---|
1. Calling `string.Split` and then iterating the array | 114.76 KB | - |
2. Calling `string.Split` and then obtaining an enumerator using `GetEnumerator` | 122.54 KB | 7.78 KB |
3. Calling `Select(int.Parse)` and iterating the resulting enumerator | 126.43 KB | 11.67 KB |
The reason there's still a difference in allocations between this code and the code with the `Select` call is that the type of enumerator being allocated is different. We can see this if we breakpoint the code.
Firstly, the code above which calls `GetEnumerator` on the array. This results in the allocation of an `SZGenericArrayEnumerator<string>`:
We can see there that the enumerator has two fields, a reference to the array being enumerated and the current index. This means that the total size of an instance of the enumerator is 28 bytes. Heap space is allocated in blocks of 8 bytes, so each enumerator instance will result in 32 bytes allocated. We're allocating one of them for each of the 249 rucksacks, meaning that this will result in a total of 7,968 bytes, which is the difference between the 122.54 KB allocated in the code above and the 114.76 KB allocated when we weren't using enumerators at all.
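That arithmetic can be sketched as follows; note that the 16 bytes of per-object overhead (object header plus method table pointer) assumed here is a 64-bit runtime implementation detail, not a documented guarantee:

```csharp
using System;

// Rough allocation arithmetic for SZGenericArrayEnumerator<string>
// on a 64-bit runtime. The overhead figures are implementation details.
const int ObjectOverhead = 16; // object header + method table pointer
const int ArrayReference = 8;  // field: reference to the array being enumerated
const int IndexField = 4;      // field: the current index

int rawSize = ObjectOverhead + ArrayReference + IndexField; // 28 bytes
int allocated = (rawSize + 7) / 8 * 8;                      // rounded up to the 8-byte block: 32
int total = allocated * 249;                                // one enumerator per rucksack

Console.WriteLine($"{rawSize} -> {allocated} bytes each, {total} bytes total"); // 28 -> 32 bytes each, 7968 bytes total
```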
The code which used `Select` and allocated 126.43 KB used a different type of enumerator: a `System.Linq.Enumerable.SelectArrayIterator<string, int>`:
LINQ has been clever enough to realise that the thing it's enumerating is an array, and has given us a purpose-built enumerator. It's also used a common trick, which is to make the thing it returns both an `IEnumerable<int>` and an `IEnumerator<int>` - essentially the enumerable is its own enumerator, removing the need to allocate a separate enumerator. However, the enumerator returned by LINQ has some additional fields, such as `_selector` - a reference to the function that's going to be executed for each input value. This means that the size of an instance of this enumerator is 44 bytes, so 48 bytes are allocated per instance, for a total of 11,952 bytes across our 249 rucksacks - which matches the 11.67 KB difference in the table.
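As a heavily simplified sketch of that trick (this ignores the state tracking and instance reuse that the real `SelectArrayIterator` does; the class name here is invented), an object can serve as both the enumerable and its own enumerator:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Usage: foreach asks for an enumerator and gets the same object back.
var sketch = new SelectArraySketch<string, int>(new[] { "1", "2", "3" }, int.Parse);
int total = 0;
foreach (int value in sketch)
{
    total += value;
}
Console.WriteLine(total); // 6

// Simplified sketch of LINQ's "the enumerable is its own enumerator" trick.
// Unlike the real SelectArrayIterator, this doesn't guard against the same
// instance being enumerated twice at once.
class SelectArraySketch<TSource, TResult> : IEnumerable<TResult>, IEnumerator<TResult>
{
    private readonly TSource[] source;
    private readonly Func<TSource, TResult> selector;
    private int index = -1;

    public SelectArraySketch(TSource[] source, Func<TSource, TResult> selector)
    {
        this.source = source;
        this.selector = selector;
    }

    // Returning 'this' means no separate enumerator object is allocated.
    public IEnumerator<TResult> GetEnumerator() => this;
    IEnumerator IEnumerable.GetEnumerator() => this;

    public bool MoveNext() => ++this.index < this.source.Length;
    public TResult Current => this.selector(this.source[this.index]);
    object IEnumerator.Current => this.Current!;
    public void Reset() => this.index = -1;
    public void Dispose() { }
}
```

The price of this convenience is the extra fields (the source and the selector), which is why each instance is bigger than the plain array enumerator.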
Now that we understand why removing the call to `Select` made such a big difference, will the same happen if we ditch the call to `Max`? Let's return to our code as it was after we removed the call to `Select`:
```csharp
public string Solve(string input)
{
    return input.Split(Environment.NewLine + Environment.NewLine)
        .Max(CountRucksackCalories)
        .ToString();
}

private static int CountRucksackCalories(string rucksack)
{
    int sum = 0;
    string[] rucksackItems = rucksack.Split(Environment.NewLine);
    foreach (string item in rucksackItems)
    {
        sum += int.Parse(item);
    }
    return sum;
}
```
Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
---|---|---|---|---|---|---|
RunAgainstRealInput | 60.75 us | 0.917 us | 0.716 us | 6.9580 | 0.7324 | 114.76 KB |
Let's remove `Max`:
```csharp
public string Solve(string input)
{
    string[] rucksacks = input.Split(Environment.NewLine + Environment.NewLine);
    int maxCalories = 0;
    foreach (string rucksack in rucksacks)
    {
        int caloriesInRucksack = CountRucksackCalories(rucksack);
        if (caloriesInRucksack > maxCalories)
        {
            maxCalories = caloriesInRucksack;
        }
    }
    return maxCalories.ToString();
}

private static int CountRucksackCalories(string rucksack)
{
    int sum = 0;
    string[] rucksackItems = rucksack.Split(Environment.NewLine);
    foreach (string item in rucksackItems)
    {
        sum += int.Parse(item);
    }
    return sum;
}
```
Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
---|---|---|---|---|---|---|
RunAgainstRealInput | 59.84 us | 0.396 us | 0.371 us | 7.0190 | 0.7324 | 114.73 KB |
As before, we see a drop in allocations (the ~0.03 KB saved is roughly one 32-byte enumerator). It's nowhere near as significant because we were only making the call to `Max` once rather than once per rucksack, so we were only allocating one enumerator, not 249. But every little helps!
Summary
In this post, we've made a small but noticeable difference in both performance and memory allocation by removing LINQ from the solution. Does this mean you should avoid LINQ in your own code? Absolutely not. The difference in execution time was measured in microseconds, and the difference in memory allocation only stood out because we're not allocating much memory in general.
To demonstrate this, I've put together a benchmark which calculates SHA256 hashes on randomly generated byte arrays:
```csharp
[MemoryDiagnoser]
public class LinqHashGenerationBenchmarks
{
    private static readonly SHA256 Hasher = SHA256.Create();
    private readonly byte[][] data;

    public LinqHashGenerationBenchmarks()
    {
        Random random = new();
        this.data = Enumerable.Range(0, 1000).Select(_ =>
        {
            var result = new byte[500];
            random.NextBytes(result);
            return result;
        }).ToArray();
    }

    [Benchmark]
    public void HashGenerationUsingLinq()
    {
        var target = this.data.Select(x => CalculateHash(x)).ToArray();
    }

    [Benchmark(Baseline = true)]
    public void HashGenerationUsingForLoop()
    {
        var target = new byte[this.data.Length][];
        for (int i = 0; i < this.data.Length; ++i)
        {
            target[i] = CalculateHash(this.data[i]);
        }
    }

    private static byte[] CalculateHash(byte[] data)
    {
        byte[] result = data;
        for (int i = 0; i < 10; ++i)
        {
            result = Hasher.ComputeHash(result);
        }
        return result;
    }
}
```
And here's the output of running the benchmark:
Method | Mean | Error | StdDev | Ratio | Gen0 | Gen1 | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|
HashGenerationUsingLinq | 1.284 ms | 0.0054 ms | 0.0045 ms | 1.00 | 66.4063 | 11.7188 | 1.08 MB | 1.00 |
HashGenerationUsingForLoop | 1.280 ms | 0.0023 ms | 0.0021 ms | 1.00 | 66.4063 | 11.7188 | 1.08 MB | 1.00 |
In this case we can see that the work itself renders the LINQ overhead more or less invisible.
So the takeaway from this should not be "LINQ is slow", because it really isn't. But it's worth knowing some of what's going on under the covers, and that for super hot path code, or scenarios where you're chasing zero allocation, you will need to leave it behind.
In the next post, we'll start tackling the thing that's causing the most memory to be allocated: string handling.