By Jonathan George, Software Engineer IV

TL;DR: if you're aiming for zero allocations, you shouldn't use LINQ. However, for the vast majority of use cases the overhead of using it is negligible.

In this series we're looking at ways to improve the performance and memory efficiency of our .NET code, using my solutions to Advent of Code puzzles as a testing ground. In the first post, I introduced 2022's Day 1 puzzle, Calorie Counting, and my initial LINQ-based solution:

```csharp
public string Solve(string input)
{
    return input.Split(Environment.NewLine + Environment.NewLine)
                .Max(row => row.Split(Environment.NewLine).Select(int.Parse).Sum())
                .ToString();
}
```

The initial benchmark figures for this code looked like this:

| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---|---|---|---|---|---|---|
| RunAgainstRealInput | 78.23 us | 0.512 us | 0.212 us | 7.6904 | 0.7324 | 126.43 KB |

In part 2, we went through a basic pass of trying to understand where the memory is being allocated. This showed us that the majority of our allocations come from calls to `string.Split`. In order to improve that (for reasons which will become more obvious later), I'm going to need to get rid of LINQ. But will this have any impact in its own right?

When doing this kind of performance engineering work, it's vitally important that you take one step at a time and evaluate the impact of each step as you go. So if we're going to get rid of LINQ, we should look at the results of doing that before we move on.
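When taking these step-by-step measurements, it can be handy to sanity-check allocations between full benchmark runs. One rough option (a sketch, not a substitute for BenchmarkDotNet's `MemoryDiagnoser`) is `GC.GetAllocatedBytesForCurrentThread`:

```csharp
using System;

// A rough, indicative way to measure allocations between proper benchmark runs.
// Unlike BenchmarkDotNet's MemoryDiagnoser, this counts every allocation made on
// the current thread, so treat the number as a sanity check rather than a result.
string input = "1000" + Environment.NewLine + "2000"
    + Environment.NewLine + Environment.NewLine
    + "3000" + Environment.NewLine + "4000";

long before = GC.GetAllocatedBytesForCurrentThread();

string[] rucksacks = input.Split(Environment.NewLine + Environment.NewLine);

long allocated = GC.GetAllocatedBytesForCurrentThread() - before;

Console.WriteLine($"{rucksacks.Length} rucksacks, ~{allocated} bytes allocated by Split");
```

The number it reports includes incidental allocations (such as the concatenated separator string passed to `Split`), which is exactly why the series relies on BenchmarkDotNet for the real figures.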

The LINQ-based code above is doing the following:

• Split the input into individual "rucksacks"
• Split each rucksack into individual lines
• Parse the numbers in each line of the rucksack
• Produce the sum of the numbers for each rucksack
• Determine the largest rucksack based on those sums.

To make this clearer, let's rejig the code a bit:

```csharp
public string Solve(string input)
{
    return input.Split(Environment.NewLine + Environment.NewLine)
                .Max(CountRucksackCalories)
                .ToString();
}

private static int CountRucksackCalories(string rucksack)
{
    return rucksack.Split(Environment.NewLine).Select(int.Parse).Sum();
}
```

This results in no real difference in performance.

| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---|---|---|---|---|---|---|
| RunAgainstRealInput | 77.68 us | 0.304 us | 0.284 us | 7.6904 | 0.7324 | 126.43 KB |

It does make the code a bit clearer though! We can see that we're using three LINQ functions:

• `Max` - to calculate the rucksack with the largest number of calories
• `Select` - to map our set of input strings to integers
• `Sum` - to calculate a total for each rucksack.

Let's get rid of those one at a time to see what happens. We'll start with the `Sum`:

```csharp
public string Solve(string input)
{
    return input.Split(Environment.NewLine + Environment.NewLine)
                .Max(CountRucksackCalories)
                .ToString();
}

private static int CountRucksackCalories(string rucksack)
{
    int sum = 0;

    IEnumerable<int> rucksackCalories = rucksack.Split(Environment.NewLine).Select(int.Parse);

    foreach (int item in rucksackCalories)
    {
        sum += item;
    }

    return sum;
}
```

Again, no difference.

| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---|---|---|---|---|---|---|
| RunAgainstRealInput | 77.42 us | 0.288 us | 0.241 us | 7.6904 | 0.7324 | 126.43 KB |

Next, we'll get rid of the `Select` and see what that does for us:

```csharp
public string Solve(string input)
{
    return input.Split(Environment.NewLine + Environment.NewLine)
                .Max(CountRucksackCalories)
                .ToString();
}

private static int CountRucksackCalories(string rucksack)
{
    int sum = 0;

    string[] rucksackItems = rucksack.Split(Environment.NewLine);

    foreach (string item in rucksackItems)
    {
        sum += int.Parse(item);
    }

    return sum;
}
```

| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---|---|---|---|---|---|---|
| RunAgainstRealInput | 60.75 us | 0.917 us | 0.716 us | 6.9580 | 0.7324 | 114.76 KB |

So, we're now noticeably faster and we're also saving ~12 KB of allocations. Why? What's so expensive about the call to `Select`?

To help us understand, let's tweak the code slightly. The LINQ-to-objects `Select` method we're using operates on objects of type `IEnumerable<T>`; what happens if we change our code to work that way too?

```csharp
public string Solve(string input)
{
    return input.Split(Environment.NewLine + Environment.NewLine)
                .Max(CountRucksackCalories)
                .ToString();
}

private static int CountRucksackCalories(string rucksack)
{
    int sum = 0;

    IEnumerable<string> rucksackItems = rucksack.Split(Environment.NewLine);

    foreach (string item in rucksackItems)
    {
        sum += int.Parse(item);
    }

    return sum;
}
```

In this version of the code, I've declared `rucksackItems` as `IEnumerable<string>` rather than `string[]`.

| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---|---|---|---|---|---|---|
| RunAgainstRealInput | 67.73 us | 0.324 us | 0.303 us | 7.4463 | 0.7324 | 122.54 KB |

With this version we're still ~4 KB better off than the fully LINQ-based solution, but that seemingly tiny change has added roughly 8 KB of allocations compared to the previous version and slowed us down a bit. Why?

Well, it turns out there's a significant difference under the covers between using a `foreach` loop on an array vs on an `IEnumerable<T>`.

When you use `foreach` to iterate an array, the compiler understands exactly what it's dealing with. More specifically, Common Intermediate Language (the form all .NET code is compiled into when you build your project) has dedicated instructions for indexing into arrays, and the compiler uses them here. So under the covers the code looks very much like a `for` loop: an index counts from 0 to the array length, and at each step the element at that index is obtained, parsed and added to the result.
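To illustrate, the array version of the loop is effectively lowered to something like this (a hand-written sketch of the compiler's transformation, not the exact generated code):

```csharp
using System;

string[] rucksackItems = { "1000", "2000", "3000" };

int sum = 0;

// Roughly what the compiler generates for `foreach` over an array:
// a plain index-based loop using the dedicated array-access IL instructions.
for (int i = 0; i < rucksackItems.Length; i++)
{
    string item = rucksackItems[i];
    sum += int.Parse(item);
}

Console.WriteLine(sum); // 6000
```

Crucially, there's no enumerator object involved at all, so nothing extra is allocated.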

When you use `foreach` to iterate an `IEnumerable<T>`, things are very different. `IEnumerable<T>` is at a higher level of abstraction, meaning fewer assumptions can be made. To iterate an `IEnumerable<T>`, you call its `GetEnumerator` method, which returns an object of type `IEnumerator<T>`. The enumerator has a property that retrieves the current item in the collection, and a method that moves to the next item.

We can see this if we rewrite the code again to explicitly use the enumerator (with a bit of fiddling to ensure it's the generic version `IEnumerator<string>` rather than just `IEnumerator`):

```csharp
public string Solve(string input)
{
    return input.Split(Environment.NewLine + Environment.NewLine)
                .Max(CountRucksackCalories)
                .ToString();
}

private static int CountRucksackCalories(string rucksack)
{
    int sum = 0;

    string[] rucksackItems = rucksack.Split(Environment.NewLine);

    IEnumerator<string> enumerator = ((IEnumerable<string>)rucksackItems).GetEnumerator();

    while (enumerator.MoveNext())
    {
        sum += int.Parse(enumerator.Current);
    }

    return sum;
}
```

Which results in exactly the same amount of allocated memory:

| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---|---|---|---|---|---|---|
| RunAgainstRealInput | 69.36 us | 0.327 us | 0.273 us | 7.4463 | 0.7324 | 122.54 KB |

Whilst this gives us an idea of what's going on, the numbers don't match. Let's summarise quickly:

| | Allocated | Difference from 1 |
|---|---|---|
| 1. Calling `string.Split` and then iterating the array | 114.76 KB | - |
| 2. Calling `string.Split` and then obtaining an enumerator using `GetEnumerator` | 122.54 KB | 7.78 KB |
| 3. Calling `Select(int.Parse)` and iterating the resulting enumerator | 126.43 KB | 11.67 KB |

The reason there's still a difference in allocations between this code and the code with the `Select` call is because the type of enumerator being allocated is different. We can see this if we breakpoint the code.

Firstly, the code above which calls `GetEnumerator` on the array. Breaking into this code in the debugger shows that it allocates an `SZGenericArrayEnumerator<string>`.

The enumerator has two fields: a reference to the array being enumerated, and the current index. This means that the total size of an instance of the enumerator is 28 bytes. Heap space is allocated in blocks of 8 bytes, so each enumerator instance will result in 32 bytes allocated. We're allocating one of them for each of the 249 rucksacks, for a total of 7,968 bytes, which is the difference between the 122.54 KB allocated by the code above and the 114.76 KB allocated when we weren't using enumerators at all.
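As a back-of-the-envelope check (assuming a 64-bit runtime, where the object header accounts for the 16 bytes not taken by the two fields):

```csharp
using System;

// Size of an SZGenericArrayEnumerator<string> instance, assuming a 64-bit runtime:
// 16-byte object header + 8-byte array reference + 4-byte int index = 28 bytes,
// rounded up to the 8-byte heap allocation granularity.
int rawSize = 16 + 8 + 4;                       // 28 bytes
int allocatedSize = (rawSize + 7) / 8 * 8;      // 32 bytes per instance

int rucksackCount = 249;
int totalBytes = rucksackCount * allocatedSize; // 7,968 bytes

Console.WriteLine($"{totalBytes} bytes = {totalBytes / 1024.0:F2} KB");
```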

The code which used `Select` and allocated 126.43 KB used a different type of enumerator: a `System.Linq.Enumerable.SelectArrayIterator<string, int>`.

LINQ has been clever enough to realise that the thing it's enumerating is an array, and has given us a purpose-built enumerator. It's also used a common trick, which is to make the thing it returns be both an `IEnumerable<int>` and an `IEnumerator<int>` - essentially the enumerable is its own enumerator, removing the need to allocate a separate enumerator. However, the enumerator returned by LINQ has some additional fields, such as `_selector` - a reference to the function that's going to be executed for each input value. As a result, the size of an instance of this enumerator is 44 bytes, so 48 bytes are allocated per instance, for a total of 11,952 bytes across the 249 rucksacks.
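The same back-of-the-envelope arithmetic for the `SelectArrayIterator` lines up with the 11.67 KB difference shown in the table above:

```csharp
using System;

// 44 bytes of instance data rounds up to 48 bytes allocated per iterator;
// one iterator is allocated per rucksack.
int allocatedSize = (44 + 7) / 8 * 8;  // 48 bytes per instance
int totalBytes = 249 * allocatedSize;

Console.WriteLine($"{totalBytes} bytes = {totalBytes / 1024.0:F2} KB");
```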

Now that we understand why removing the call to `Select` made such a big difference, will the same happen if we ditch the call to `Max`? Returning to our code as it was after we removed the call to `Select`:

```csharp
public string Solve(string input)
{
    return input.Split(Environment.NewLine + Environment.NewLine)
                .Max(CountRucksackCalories)
                .ToString();
}

private static int CountRucksackCalories(string rucksack)
{
    int sum = 0;

    string[] rucksackItems = rucksack.Split(Environment.NewLine);

    foreach (string item in rucksackItems)
    {
        sum += int.Parse(item);
    }

    return sum;
}
```
| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---|---|---|---|---|---|---|
| RunAgainstRealInput | 60.75 us | 0.917 us | 0.716 us | 6.9580 | 0.7324 | 114.76 KB |

Let's remove `Max`:

```csharp
public string Solve(string input)
{
    string[] rucksacks = input.Split(Environment.NewLine + Environment.NewLine);

    int maxCalories = 0;

    foreach (string rucksack in rucksacks)
    {
        int caloriesInRucksack = CountRucksackCalories(rucksack);

        if (caloriesInRucksack > maxCalories)
        {
            maxCalories = caloriesInRucksack;
        }
    }

    return maxCalories.ToString();
}

private static int CountRucksackCalories(string rucksack)
{
    int sum = 0;

    string[] rucksackItems = rucksack.Split(Environment.NewLine);

    foreach (string item in rucksackItems)
    {
        sum += int.Parse(item);
    }

    return sum;
}
```
| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---|---|---|---|---|---|---|
| RunAgainstRealInput | 59.84 us | 0.396 us | 0.371 us | 7.0190 | 0.7324 | 114.73 KB |

As before, we see a drop in allocations. It's nowhere near as significant, because `Max` was only being called once per solve rather than once per rucksack, so removing it saves a single enumerator allocation rather than 249. But every little helps!

## Summary

In this post, we've made a small but noticeable difference in both performance and memory allocation by removing LINQ from the solution. Does this mean you should avoid LINQ in your own code? Absolutely not. The difference in execution time was measured in microseconds, and the difference in memory allocation only stood out because we're not allocating much memory in general.

To demonstrate this, I've put together a benchmark which calculates SHA256 hashes on randomly generated byte arrays:

```csharp
[MemoryDiagnoser]
public class LinqHashGenerationBenchmarks
{
    private static readonly SHA256 Hasher = SHA256.Create();

    private readonly byte[][] data;

    public LinqHashGenerationBenchmarks()
    {
        Random random = new();
        this.data = Enumerable.Range(0, 1000).Select(_ =>
        {
            var result = new byte[500];
            random.NextBytes(result);
            return result;
        }).ToArray();
    }

    [Benchmark]
    public void HashGenerationUsingLinq()
    {
        var target = this.data.Select(x => CalculateHash(x)).ToArray();
    }

    [Benchmark(Baseline = true)]
    public void HashGenerationUsingForLoop()
    {
        var target = new byte[this.data.Length][];

        for (int i = 0; i < this.data.Length; ++i)
        {
            target[i] = CalculateHash(this.data[i]);
        }
    }

    private static byte[] CalculateHash(byte[] data)
    {
        byte[] result = data;

        for (int i = 0; i < 10; ++i)
        {
            result = Hasher.ComputeHash(result);
        }

        return result;
    }
}
```

And here's the output of running the benchmark:

| Method | Mean | Error | StdDev | Ratio | Gen0 | Gen1 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|
| HashGenerationUsingLinq | 1.284 ms | 0.0054 ms | 0.0045 ms | 1.00 | 66.4063 | 11.7188 | 1.08 MB | 1.00 |
| HashGenerationUsingForLoop | 1.280 ms | 0.0023 ms | 0.0021 ms | 1.00 | 66.4063 | 11.7188 | 1.08 MB | 1.00 |

In this case we can see that the work itself renders the LINQ overhead more or less invisible.

So the takeaway from this should not be "LINQ is slow", because it really isn't. But it's worth knowing some of what's going on under the covers, and that for super hot path code, or scenarios where you're chasing zero allocations, you will need to leave it behind.

In the next post, we'll start tackling the thing that's causing the most memory to be allocated: string handling.

### Jonathan George

#### Software Engineer IV

Jon is an experienced project lead and architect who has spent nearly 20 years delivering industry-leading solutions for clients across multiple industries including oil and gas, retail, financial services and healthcare. At endjin, he helps clients take advantage of the huge opportunities presented by cloud technologies to better understand and grow their businesses.