Optimising .NET code: Avoiding allocations using Span<T>
In this series we're looking at ways to improve the performance and memory efficiency of our .NET code, using my solutions to Advent of Code puzzles as a testing ground. In the first post, I introduced 2022's Day 1 puzzle - Calorie Counting - and my initial LINQ-based solution:
public string Solve(string input)
{
return input.Split(Environment.NewLine + Environment.NewLine)a
.Max(row => row.Split(Environment.NewLine).Select(int.Parse).Sum())
.ToString();
}
The initial benchmark figures for this code looked like this:
Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
---|---|---|---|---|---|---|
RunAgainstRealInput | 78.23 us | 0.512 us | 0.212 us | 7.6904 | 0.7324 | 126.43 KB |
In part 2, we went through a basic pass of trying to understand where the memory is being allocated, which showed us that the majority of our allocations come from calls to string.Split
. And in part 3, we had a look at the impact of LINQ and found that if we want to get allocations as close to zero as possible we need to move away from it.
As a result, the code currently looks like this:
public string Solve(string input)
{
string[] rucksacks = input.Split(Environment.NewLine + Environment.NewLine);
int currentMaxElfLoad = 0;
foreach (string rucksack in rucksacks)
{
int currentElfLoad = CountRucksackCalories(rucksack);
if (currentElfLoad > currentMaxElfLoad)
{
currentMaxElfLoad = currentElfLoad;
}
}
return currentMaxElfLoad.ToString();
}
private static int CountRucksackCalories(string rucksack)
{
int sum = 0;
string[] rucksackItems = rucksack.Split(Environment.NewLine);
foreach (string item in rucksackItems)
{
sum += int.Parse(item);
}
return sum;
}
Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
---|---|---|---|---|---|---|
RunAgainstRealInput | 59.84 us | 0.396 us | 0.371 us | 7.0190 | 0.7324 | 114.73 KB |
At the moment, we're making two calls to string.Split
; firstly to split the input into "rucksacks", and the second to split a rucksack into its individual lines. Each of these calls resulted in a significant chunk of memory being allocated.
The reason for this is that each of these calls results in strings being copied in memory. To avoid that, we could rewrite the code to only split once and detect when we're on a new rucksack by looking for empty lines.
public string Solve(string input)
{
string[] contents = input.Split(Environment.NewLine);
int currentMaxElfLoad = 0;
int currentElfLoad = 0;
foreach (string item in contents)
{
if (string.IsNullOrEmpty(item))
{
if (currentElfLoad > currentMaxElfLoad)
{
currentMaxElfLoad = currentElfLoad ;
}
currentElfLoad = 0;
}
else
{
currentElfLoad += int.Parse(item);
}
}
if (currentElfLoad > currentMaxElfLoad)
{
currentMaxElfLoad = currentElfLoad ;
}
return currentMaxElfLoad.ToString();
}
Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
---|---|---|---|---|---|---|
RunAgainstRealInput | 54.71 us | 0.808 us | 0.716 us | 4.8828 | 1.1597 | 80.39 KB |
This is better again - we've saved ourselves just over 30 KB. The only way we could reduce this further is to avoid the copying we incur as a result of the remaining call to string.Split
. This is where one of the new types introduced in .NET Core, Span<T>
comes into play.
Introducing Span<T>
Span<T>
is a new value type which represents a contiguous region of memory. The memory in question can be allocated on the stack or the heap - or even, in some cases, external memory that isn't owned by the CLR - and you can create one from an existing array. Essentially, it's like a pointer with benefits.
Since a string is essentially an array of char
, it will come as no surprise that it's possible to create a ReadOnlySpan<char>
(which is exactly what it sounds like) from an existing string, using the AsSpan()
method.
Why is this helpful? Firstly, many other parts of the framework have been modified to work with spans. For example, the int.Parse
method we're using in our code can now take a ReadOnlySpan<char>
as its argument, meaning that instead of needing a string containing the number, we can pass our safe pointer to the portion of the string which contains that number which results in no allocation at all.
Here's some code which demonstrates exactly that:
[MemoryDiagnoser]
public class ParsingIntsWithSpans
{
private const string InputData = "This string contains the number 1532 at position 32";
[Benchmark(Baseline = true)]
public void IntParseSubstring()
{
string raw = InputData[32..36];
int result = int.Parse(raw);
}
[Benchmark]
public void IntParseSpan()
{
ReadOnlySpan<char> input = InputData.AsSpan();
ReadOnlySpan<char> raw = input[32..36];
int result = int.Parse(raw);
}
}
The allocation in the first method comes from copying the relevent section of the InputData
string into a new string
. In the second method we bypass that by obtaining a ReadOnlySpan<char>
which points to the portion of InputData
we care about.
Method | Mean | Error | StdDev | Ratio | Gen0 | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|
IntParseSubstring | 12.160 ns | 0.0448 ns | 0.0397 ns | 1.00 | 0.0019 | 32 B | 1.00 |
IntParseSpan | 8.510 ns | 0.0056 ns | 0.0050 ns | 0.70 | - | - | 0.00 |
Unfortunately there isn't a native equivalent of the string.Split
method that returns spans. Fortunately, this gap has been filled by Gérald Barré, who's published some code on his blog that does exactly what we need.
As it can no longer return an array of strings (since this is exactly what we need to avoid), the new SplitLines
extension method returns a StringSplitEnumerator
allowing us to move through the segments of the input data. This doesn't implement IEnumerator
because that would result in an allocation, but it does have the necessary functionality to work with foreach
loops.
The enumerator gives access to a LineSplitEntry
which contains the current line as a ReadOnlySpan<char>
as well as its position in the input string. Both the StringSplitEnumerator
and LineSplitEntry
are ref struct
s. This is a relatively recent addition to .NET and results in a type whose instances are guaranteed to be allocated on the stack. We'll talk about these more in future posts.
With that code safely embedded in the solution, let's have a look at how the code changes.
public string Solve(string input)
{
int currentMaxElfLoad = 0;
int currentElfLoad = 0;
foreach (StringExtensions.LineSplitEntry entry in input.SplitLines())
{
if (entry.Line.Length == 0)
{
if (currentElfLoad > currentMaxElfLoad)
{
currentMaxElfLoad = currentElfLoad;
}
currentElfLoad = 0;
}
else
{
currentElfLoad += int.Parse(entry.Line);
}
}
if (currentElfLoad > currentMaxElfLoad)
{
currentMaxElfLoad = currentElfLoad;
}
return currentMaxElfLoad.ToString();
}
As you can see, the code looks very similar. Instead of splitting the input and iterating over it, we're iterating the result of calling input.SplitLines()
, with each entry
being LineSplitEntry
. This contains a Line
property which is a ReadOnlySpan<char>
.
The results of benchmarking that code speak for themselves:
Method | Mean | Error | StdDev | Allocated |
---|---|---|---|---|
RunAgainstRealInput | 45.77 us | 0.140 us | 0.131 us | 32 B |
We've gone from our original code, which took 80.57us to run and allocated 126.43 KB of memory, to our reworked code which takes just over half the time and only allocates 32 B of memory.
The performance gain is interesting. Around three fifths of it seems to have come from dropping LINQ, and the other two fifths from avoiding memory allocation - after all, we haven't come up with a new and innovative way to sum the contents of each rucksack and then find the max; we still have to add up every number and keep track of our current maximum.
That remaining 32 B is a result of calling currentMaxElfLoad.ToString()
in the return statement. If I hadn't imposed that limitation on my code, we'd be able to get all the way down to zero.
But if, as I said in my previous post, the overhead of using LINQ is actually negligible, why do my results show getting rid of it as having had the biggest impact on performance? In reality, that's a side effect of the way I approached the problem; getting rid of LINQ early made it easier to introduce the new code for zero-allocation string splitting, and I knew it would be necessary to reduce allocations as far as possible. If I'd been happy to accept the minimal overhead and allocations that LINQ brings, but still brought in the Span<T>
based code, we would still have seen significant performance improvements. But I'll leave the proof of that as an exercise for the reader.
Summary
We've learned a few things as part of working through this relatively simple problem.
- Strings are a big source of memory allocations and you're probably copying them more than you think.
Span<T>
is the first big thing that can help here. - Reducing memory allocations will almost certainly give you a performance boost.
- LINQ is great for writing simple code, but comes with a performance penalty (both speed and memory).
- Writing performant code will most likely result in writing more code that's potentially harder to understand.
Finally, a word of warning: this work is addictive. Once you've started analysing memory usage and begun to cut down allocations in your code, it can become an obsession. That's the reason for this blog series - I'd like to get better at optimising code like this and I hope that others will get something out of my experience too.
In the next post we'll look at part 2 of the first day's problem and see what can be done to improve things there.