Optimising .NET code: Introduction
For the past 8 years I've enjoyed working through Eric Wastl's Advent of Code. For those who haven't seen it, it's an Advent calendar-style series of 25 two-part puzzles, with a new puzzle being released daily throughout December each year.
When solving the problems, my focus has generally been on getting to an answer; whilst I have compared run times with colleagues, it was more to see how solutions in different languages performed - C# vs JavaScript vs Python and so on. However, as I was working on the puzzles from 2022, I was also doing some work on optimising a specific piece of hot-path code to make it as performant as possible.
Some of my colleagues have talked about this kind of thing before - see Ian's blog about the AIS parser we built, and Carmel's series on High Performance C# - but it's not an area I've spent much time looking at in recent years.
However, it did make me wonder: how could I improve my Advent of Code solutions to make them as performant and memory efficient as possible?
What do we mean by memory efficiency?
When we talk about memory efficient code in C#, we're generally talking about limiting allocation of managed memory on the heap, and avoiding unnecessary copying of data in memory.
The main reason to limit allocation on the heap is because the more you allocate, the more work is needed by the .NET garbage collector to keep things in good shape (the first post in Carmel's series explains how this all works). Garbage collections can be CPU intensive and can also result in code execution being paused, so if you want really high performing code then you need to look at ways to reduce GC activity.
Avoiding unnecessary copying of data sounds obvious but it isn't always - and we'll see a lot of examples of that as we work through some real world code. Ian goes into quite a lot of detail about this in his AIS parser post in the section "Performance and copying data", so it's worth reading through that.
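To make the distinction concrete, here's a trivial illustration (not taken from any of the solutions) of extracting the first number from a line of text, first by copying it and then by slicing it:

```csharp
string line = "1000 2000 3000";

// Substring allocates a new string on the heap - a copy of those four characters.
string copied = line.Substring(0, 4);
int viaCopy = int.Parse(copied);

// AsSpan gives a read-only window over the original string's memory instead -
// no new allocation, and int.Parse has an overload that accepts the span directly.
ReadOnlySpan<char> sliced = line.AsSpan(0, 4);
int viaSlice = int.Parse(sliced);
```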
Spoiler alert!
Since I'm going to be looking at and improving on my solutions to Advent of Code problems, it goes without saying that this series will contain massive spoilers for the 2022 puzzles. If you'd like to work through the problems yourself you shouldn't read these posts until you have your own solutions in place.
Caveat
Whilst I'm going to be looking at ways we can make code more efficient, you should bear in mind that these techniques are absolutely not necessary everywhere. Writing highly performant code might require you to write code that's harder to read or work with, and depending on what you're building, that trade-off may not be worth it.
With that said, let's have a look at the first puzzle.
Day 1: Calorie Counting
https://adventofcode.com/2022/day/1
In this puzzle, we're given a list of numbers as our input. Each number is on its own line, and each group of numbers is split by an empty line:
```
1000
2000
3000

4000

5000
6000

7000
8000
9000

10000
```
The first part of the puzzle is to find the group with the largest total.
I normally find that the early days lend themselves to quick and simple solutions using lots of LINQ. As we'll see, this doesn't necessarily lead to the most performant code. So let's have a look at my first pass:
```csharp
public string Solve(string input)
{
    return input.Split(Environment.NewLine + Environment.NewLine)
        .Max(row => row.Split(Environment.NewLine).Select(int.Parse).Sum())
        .ToString();
}
```
You can also see the code here.
Note: I've got a standard solution structure I use for AoC; each day is implemented as a class with a Solve method that takes the input as a string and returns the answer as a string - not all the solutions are numeric.
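The exact shape of this is in the repo linked above, but in spirit it looks something like the sketch below (the ISolution name is illustrative rather than the real one):

```csharp
public interface ISolution
{
    string Solve(string input);
}

// Each day's puzzle gets its own implementation of the interface.
public class Day1 : ISolution
{
    public string Solve(string input)
    {
        // ... parse the input and compute the answer ...
        return "the answer";
    }
}
```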
If we're going to improve the performance and memory efficiency of this code, we first need to know how performant it currently is, and how much memory it's using to execute. How do we do that?
Introducing benchmarking and BenchmarkDotNet
In order to assess the performance of the solutions, I'll be using BenchmarkDotNet, an excellent library that makes it quick and easy to benchmark .NET applications.
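If you haven't used it before, the basic pattern is: add the BenchmarkDotNet NuGet package, mark the methods you want to measure with the [Benchmark] attribute, and hand the containing class to BenchmarkRunner. A minimal, self-contained example (the method being measured here is just a placeholder):

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class DemoBenchmarks
{
    [Benchmark]
    public int SumToOneThousand()
    {
        int total = 0;
        for (int i = 1; i <= 1000; i++)
        {
            total += i;
        }

        return total;
    }
}

public static class Program
{
    // Note: BenchmarkDotNet needs a Release build to give meaningful results.
    public static void Main() => BenchmarkRunner.Run<DemoBenchmarks>();
}
```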
It's important to note a few things about benchmarking. The most obvious is that the results of your benchmark will largely depend on the hardware you're using, which means benchmarks executed on different hardware will likely produce very different results, at least as far as execution time goes.
In fact, repeated runs of benchmarks on the same hardware will likely give slightly different results for execution time. This is why BenchmarkDotNet runs your benchmarks multiple times and gives you averages of the results. It also "warms up" your code by running it several times prior to the "real" benchmarking runs. This ensures all your code has been "JITted" by the runtime, all necessary assemblies have been loaded into the process and so on.
If you want to try running any of my code yourself, bear in mind that my hardware is likely different to yours and this will show up as (potentially significant) differences in execution times. As a result, it's normal to try and establish a baseline to compare other benchmarks against. In our case, our baseline for measuring improvements will be the benchmarks for the initial solution shown above.
For memory usage, hardware differences are less important. However, if your hardware is wildly different from mine you may well see differences.
The last two things to bear in mind about the code and associated benchmarks are:
- It's all .NET 7.0.
- It's all running on Windows.
So with that said, let's get on with it...
Benchmarking the initial solution
Since I've got a generic solution structure for my Advent of Code solutions, which you can see here, it's pretty simple for me to add benchmarks. Due to the way in which they execute, it's not really possible to pass parameters in, so there's a bit of repetition in the way they are put together.
The benchmark tests execution of each solution. This doesn't include:
- Loading the input from file
- Creating an instance of the solution class for the puzzle
- Writing the result to the console
But it does include all the other aspects of the solution:
- Parsing the input
- Finding the solution
- Returning the solution as a string
It's worth noting that this is probably not what you'd want to do in a real world scenario. All the things that I'm excluding from the benchmarking are still part of the application and it would likely be necessary to include them if I were optimizing a production application. However, for the purpose of these posts I'm only interested in the performance of the code that "does the work".
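Put together, each benchmark ends up looking roughly like the sketch below - the class name and input path are illustrative, not the exact ones from my repo:

```csharp
using System.IO;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser] // produces the Gen0/Gen1/Allocated columns in the results
public class Day1Benchmarks
{
    private readonly Day1 solution = new();
    private string input = string.Empty;

    [GlobalSetup]
    public void LoadInput()
    {
        // Runs once before measurement starts, so loading the input file and
        // constructing the solution class are excluded from the results.
        this.input = File.ReadAllText("Inputs/Day1.txt");
    }

    [Benchmark]
    public string RunAgainstRealInput() => this.solution.Solve(this.input);
}
```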
So, for my initial LINQ-based solution to day 1 part 1, here are the benchmark results:
Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
---|---|---|---|---|---|---|
RunAgainstRealInput | 80.57 us | 1.596 us | 2.184 us | 7.6904 | 0.7324 | 126.43 KB |
Let's have a quick look at what each column means, courtesy of BenchmarkDotNet's legend:
Column | Definition |
---|---|
Mean | Arithmetic mean of all measurements |
Error | Half of 99.9% confidence interval |
StdDev | Standard deviation of all measurements |
Gen0 | GC Generation 0 collects per 1000 operations |
Gen1 | GC Generation 1 collects per 1000 operations |
Allocated | Allocated memory per single operation (managed only, inclusive, 1KB = 1024B) |
1 us | 1 Microsecond (0.000001 sec) |
(If you'd like to better understand the difference between GC gen 0 and gen 1, this blog post is an excellent starting point.)
So, we can see that the initial solution is executing pretty fast - around 0.08ms - which is an excellent start. We can also see that it's allocating 126.43 KB of memory on the heap, which is a bit surprising. Why so much? The input data is around 13KB on disk and the code is trivially simple, so what's all that memory being used for?
In the next post, we'll hunt down the reasons for this memory allocation and try and explain why 13KB on disk turns into nearly 10 times that in our working set.