Optimising .NET code: Let's blame LINQ
TLDR: If you're aiming for zero allocation, you shouldn't use LINQ. However, for the vast majority of use cases the overhead of using it is negligible.
In this series we're looking at ways to improve the performance and memory efficiency of our .NET code, using my solutions to Advent of Code puzzles as a testing ground. In the first post, I introduced 2022's Day 1 puzzle - Calorie Counting - and my initial LINQ-based solution:
```csharp
public string Solve(string input)
{
    return input.Split(Environment.NewLine + Environment.NewLine)
        .Max(row => row.Split(Environment.NewLine).Select(int.Parse).Sum())
        .ToString();
}
```
The initial benchmark figures for this code looked like this:
Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
---|---|---|---|---|---|---|
RunAgainstRealInput | 78.23 us | 0.512 us | 0.212 us | 7.6904 | 0.7324 | 126.43 KB |
In part 2, we went through a basic pass of trying to understand where the memory is being allocated. This showed us that the majority of our allocations come from calls to `string.Split`. In order to improve that (for reasons which will become more obvious later), I'm going to need to get rid of LINQ. But will this have any impact in its own right?
When doing this kind of performance engineering work, it's vitally important that you take one step at a time and evaluate the impact of each step as you go. So if we're going to get rid of LINQ, we should look at the results of doing that before we move on.
The LINQ-based code above is doing the following:
- Split the input into individual "rucksacks"
- Split each rucksack into individual lines
- Parse the numbers in each line of the rucksack
- Produce the sum of the numbers for each rucksack
- Determine the largest rucksack based on those sums.
To make this clearer, let's rejig the code a bit:
```csharp
public string Solve(string input)
{
    return input.Split(Environment.NewLine + Environment.NewLine)
        .Max(CountRucksackCalories)
        .ToString();
}

private static int CountRucksackCalories(string rucksack)
{
    return rucksack.Split(Environment.NewLine).Select(int.Parse).Sum();
}
```
This results in no real difference in performance.
Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
---|---|---|---|---|---|---|
RunAgainstRealInput | 77.68 us | 0.304 us | 0.284 us | 7.6904 | 0.7324 | 126.43 KB |
It does make the code a bit clearer though! We can see that we're using three LINQ functions:
- `Max` - to calculate the rucksack with the largest number of calories
- `Select` - to map our set of input strings to integers
- `Sum` - to calculate a total for each rucksack.
Let's get rid of those one at a time to see what happens. We'll start with the `Sum`:
```csharp
public string Solve(string input)
{
    return input.Split(Environment.NewLine + Environment.NewLine)
        .Max(CountRucksackCalories)
        .ToString();
}

private static int CountRucksackCalories(string rucksack)
{
    int sum = 0;
    IEnumerable<int> rucksackCalories = rucksack.Split(Environment.NewLine).Select(int.Parse);
    foreach (int item in rucksackCalories)
    {
        sum += item;
    }
    return sum;
}
```
Again, no difference.
Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
---|---|---|---|---|---|---|
RunAgainstRealInput | 77.42 us | 0.288 us | 0.241 us | 7.6904 | 0.7324 | 126.43 KB |
Next, we'll get rid of the `Select` and see what that does for us:
```csharp
public string Solve(string input)
{
    return input.Split(Environment.NewLine + Environment.NewLine)
        .Max(CountRucksackCalories)
        .ToString();
}

private static int CountRucksackCalories(string rucksack)
{
    int sum = 0;
    string[] rucksackItems = rucksack.Split(Environment.NewLine);
    foreach (string item in rucksackItems)
    {
        sum += int.Parse(item);
    }
    return sum;
}
```
That's made a difference!
Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
---|---|---|---|---|---|---|
RunAgainstRealInput | 60.75 us | 0.917 us | 0.716 us | 6.9580 | 0.7324 | 114.76 KB |
So, we're now noticeably faster and we're also saving ~12 KB of allocation. Why? What's so expensive about the call to `Select`?
To help us understand, let's tweak the code slightly. The LINQ-to-objects `Select` method we're using operates on objects of type `IEnumerable<T>`; what happens if we change our code to work that way too?
```csharp
public string Solve(string input)
{
    return input.Split(Environment.NewLine + Environment.NewLine)
        .Max(CountRucksackCalories)
        .ToString();
}

private static int CountRucksackCalories(string rucksack)
{
    int sum = 0;
    IEnumerable<string> rucksackItems = rucksack.Split(Environment.NewLine);
    foreach (string item in rucksackItems)
    {
        sum += int.Parse(item);
    }
    return sum;
}
```
In this version of the code, I've declared `rucksackItems` as `IEnumerable<string>` rather than `string[]`.
Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
---|---|---|---|---|---|---|
RunAgainstRealInput | 67.73 us | 0.324 us | 0.303 us | 7.4463 | 0.7324 | 122.54 KB |
With this version we're still allocating about 4 KB less than the fully LINQ-based solution, but that seemingly tiny change of declared type has added roughly 8 KB of allocation compared to the array version, and slowed us down a bit. Why?
Well, it turns out there's a significant difference under the covers between using a `foreach` loop on an array and using one on an `IEnumerable<T>`.
When you use `foreach` to iterate an array, the compiler understands exactly what it's dealing with. More specifically, Common Intermediate Language (the form all .NET code is compiled into when you build your project) has dedicated instructions for indexing into arrays, and the compiler uses them here. So under the covers, the code looks very much like a `for` loop - an index counts from 0 to the array length, and at each step the element at that index is obtained, parsed and added to the result.
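As a rough illustration (this is a sketch of the lowering, not actual compiler output - the real thing uses hidden, compiler-generated locals), the array `foreach` behaves like this:

```csharp
using System;

// Illustrative sketch of how a foreach over an array is lowered.
// The variable names are invented; the compiler uses hidden locals.
string[] rucksackItems = { "100", "200", "300" };
int sum = 0;
for (int i = 0; i < rucksackItems.Length; i++)
{
    string item = rucksackItems[i]; // uses a dedicated array-indexing IL instruction
    sum += int.Parse(item);
}
Console.WriteLine(sum); // 600
```

Crucially, nothing in this shape needs an allocation: it's just an index, a bounds check and a load.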
When you use `foreach` to iterate an `IEnumerable`, things are very different. `IEnumerable` is at a higher level of abstraction, meaning that fewer assumptions can be made. To iterate an `IEnumerable<T>`, you call its `GetEnumerator` method, which returns an object of type `IEnumerator<T>`. The enumerator has a property to retrieve the current item in the sequence, and a method to move to the next item.
We can see this if we rewrite the code again to explicitly use the enumerator (with a bit of fiddling to ensure it's the generic version `IEnumerator<string>` rather than just `IEnumerator`):
```csharp
public string Solve(string input)
{
    return input.Split(Environment.NewLine + Environment.NewLine)
        .Max(CountRucksackCalories)
        .ToString();
}

private static int CountRucksackCalories(string rucksack)
{
    int sum = 0;
    string[] rucksackItems = rucksack.Split(Environment.NewLine);
    IEnumerator<string> enumerator = ((IEnumerable<string>)rucksackItems).GetEnumerator();
    while (enumerator.MoveNext())
    {
        sum += int.Parse(enumerator.Current);
    }
    return sum;
}
```
Which results in exactly the same amount of allocated memory:
Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
---|---|---|---|---|---|---|
RunAgainstRealInput | 69.36 us | 0.327 us | 0.273 us | 7.4463 | 0.7324 | 122.54 KB |
Whilst this gives us an idea of what's going on, the numbers don't match. Let's summarise quickly:
Approach | Allocated | Difference from 1 |
---|---|---|
1. Calling `string.Split` and then iterating the array | 114.76 KB | - |
2. Calling `string.Split` and then obtaining an enumerator using `GetEnumerator` | 122.54 KB | 7.78 KB |
3. Calling `Select(int.Parse)` and iterating the resulting enumerator | 126.43 KB | 11.67 KB |
The reason there's still a difference in allocations between this code and the code with the `Select` call is that the type of enumerator being allocated is different. We can see this if we breakpoint the code.
Firstly, the code above which calls `GetEnumerator` on the array. This results in the allocation of an `SZGenericArrayEnumerator<string>`:
We can see there that the enumerator has two fields, a reference to the array being enumerated and the current index. This means that the total size of an instance of the enumerator is 28 bytes. Heap space is allocated in blocks of 8 bytes, so each enumerator instance will result in 32 bytes allocated. We're allocating one of them for each of the 249 rucksacks, meaning that this will result in a total of 7,968 bytes, which is the difference between the 122.54 KB allocated in the code above and the 114.76 KB allocated when we weren't using enumerators at all.
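That arithmetic can be sketched as follows; note that the 16 bytes of per-object overhead (object header plus method table pointer) assumed here is a 64-bit runtime implementation detail, not a documented guarantee:

```csharp
using System;

// Rough allocation arithmetic for SZGenericArrayEnumerator<string>
// on a 64-bit runtime. The overhead figures are implementation details.
const int ObjectOverhead = 16; // object header + method table pointer
const int ArrayReference = 8;  // field: reference to the array being enumerated
const int IndexField = 4;      // field: the current index

int rawSize = ObjectOverhead + ArrayReference + IndexField; // 28 bytes
int allocated = (rawSize + 7) / 8 * 8;                      // rounded up to the 8-byte block: 32
int total = allocated * 249;                                // one enumerator per rucksack

Console.WriteLine($"{rawSize} -> {allocated} bytes each, {total} bytes total"); // 28 -> 32 bytes each, 7968 bytes total
```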
The code which used `Select` and allocated 126.43 KB used a different type of enumerator: a `System.Linq.Enumerable.SelectArrayIterator<string, int>`:
LINQ has been clever enough to realise that the thing it's enumerating is an array, and has given us a purpose-built enumerator. It's also used a common trick, which is to make the thing it returns both an `IEnumerable<int>` and an `IEnumerator<int>` - essentially the enumerable is its own enumerator, removing the need to allocate a separate enumerator. However, the enumerator returned by LINQ has some additional fields, such as `_selector` - a reference to the function that's going to be executed for each input value. This means that the size of an instance of this enumerator is 44 bytes, so 48 bytes are allocated per instance, for a total of 11,952 bytes across our 249 rucksacks - which matches the 11.67 KB difference in the table.
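As a heavily simplified sketch of that trick (this ignores the state tracking and instance reuse that the real `SelectArrayIterator` does; the class name here is invented), an object can serve as both the enumerable and its own enumerator:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Usage: foreach asks for an enumerator and gets the same object back.
var sketch = new SelectArraySketch<string, int>(new[] { "1", "2", "3" }, int.Parse);
int total = 0;
foreach (int value in sketch)
{
    total += value;
}
Console.WriteLine(total); // 6

// Simplified sketch of LINQ's "the enumerable is its own enumerator" trick.
// Unlike the real SelectArrayIterator, this doesn't guard against the same
// instance being enumerated twice at once.
class SelectArraySketch<TSource, TResult> : IEnumerable<TResult>, IEnumerator<TResult>
{
    private readonly TSource[] source;
    private readonly Func<TSource, TResult> selector;
    private int index = -1;

    public SelectArraySketch(TSource[] source, Func<TSource, TResult> selector)
    {
        this.source = source;
        this.selector = selector;
    }

    // Returning 'this' means no separate enumerator object is allocated.
    public IEnumerator<TResult> GetEnumerator() => this;
    IEnumerator IEnumerable.GetEnumerator() => this;

    public bool MoveNext() => ++this.index < this.source.Length;
    public TResult Current => this.selector(this.source[this.index]);
    object IEnumerator.Current => this.Current!;
    public void Reset() => this.index = -1;
    public void Dispose() { }
}
```

The price of this convenience is the extra fields (the source and the selector), which is why each instance is bigger than the plain array enumerator.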
Now that we understand why removing the call to `Select` made such a big difference, will the same happen if we ditch the call to `Max`? Let's return to our code as it was after we removed the call to `Select`:
```csharp
public string Solve(string input)
{
    return input.Split(Environment.NewLine + Environment.NewLine)
        .Max(CountRucksackCalories)
        .ToString();
}

private static int CountRucksackCalories(string rucksack)
{
    int sum = 0;
    string[] rucksackItems = rucksack.Split(Environment.NewLine);
    foreach (string item in rucksackItems)
    {
        sum += int.Parse(item);
    }
    return sum;
}
```
Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
---|---|---|---|---|---|---|
RunAgainstRealInput | 60.75 us | 0.917 us | 0.716 us | 6.9580 | 0.7324 | 114.76 KB |
Let's remove `Max`:
```csharp
public string Solve(string input)
{
    string[] rucksacks = input.Split(Environment.NewLine + Environment.NewLine);
    int maxCalories = 0;
    foreach (string rucksack in rucksacks)
    {
        int caloriesInRucksack = CountRucksackCalories(rucksack);
        if (caloriesInRucksack > maxCalories)
        {
            maxCalories = caloriesInRucksack;
        }
    }
    return maxCalories.ToString();
}

private static int CountRucksackCalories(string rucksack)
{
    int sum = 0;
    string[] rucksackItems = rucksack.Split(Environment.NewLine);
    foreach (string item in rucksackItems)
    {
        sum += int.Parse(item);
    }
    return sum;
}
```
Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
---|---|---|---|---|---|---|
RunAgainstRealInput | 59.84 us | 0.396 us | 0.371 us | 7.0190 | 0.7324 | 114.73 KB |
As before, we see a drop in allocations (the ~0.03 KB saved is roughly one 32-byte enumerator). It's nowhere near as significant because we were only making the call to `Max` once rather than once per rucksack, so we were only allocating one enumerator, not 249. But every little helps!
Summary
In this post, we've made a small but noticeable difference in both performance and memory allocation by removing LINQ from the solution. Does this mean you should avoid LINQ in your own code? Absolutely not. The difference in execution time was measured in microseconds, and the difference in memory allocation only stood out because we're not allocating much memory in general.
To demonstrate this, I've put together a benchmark which calculates SHA256 hashes on randomly generated byte arrays:
```csharp
[MemoryDiagnoser]
public class LinqHashGenerationBenchmarks
{
    private static readonly SHA256 Hasher = SHA256.Create();
    private readonly byte[][] data;

    public LinqHashGenerationBenchmarks()
    {
        Random random = new();
        this.data = Enumerable.Range(0, 1000).Select(_ =>
        {
            var result = new byte[500];
            random.NextBytes(result);
            return result;
        }).ToArray();
    }

    [Benchmark]
    public void HashGenerationUsingLinq()
    {
        var target = this.data.Select(x => CalculateHash(x)).ToArray();
    }

    [Benchmark(Baseline = true)]
    public void HashGenerationUsingForLoop()
    {
        var target = new byte[this.data.Length][];
        for (int i = 0; i < this.data.Length; ++i)
        {
            target[i] = CalculateHash(this.data[i]);
        }
    }

    private static byte[] CalculateHash(byte[] data)
    {
        byte[] result = data;
        for (int i = 0; i < 10; ++i)
        {
            result = Hasher.ComputeHash(result);
        }
        return result;
    }
}
```
And here's the output of running the benchmark:
Method | Mean | Error | StdDev | Ratio | Gen0 | Gen1 | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|
HashGenerationUsingLinq | 1.284 ms | 0.0054 ms | 0.0045 ms | 1.00 | 66.4063 | 11.7188 | 1.08 MB | 1.00 |
HashGenerationUsingForLoop | 1.280 ms | 0.0023 ms | 0.0021 ms | 1.00 | 66.4063 | 11.7188 | 1.08 MB | 1.00 |
In this case we can see that the work itself renders the LINQ overhead more or less invisible.
So the takeaway from this should not be "LINQ is slow", because it really isn't. But it's worth knowing some of what's going on under the covers, and that for super hot path code, or scenarios where you're chasing zero allocation, you will need to leave it behind.
In the next post, we'll start tackling the thing that's causing the most memory to be allocated: string handling.