There's a recurring theme in recent versions of .NET and C#: do a bit of extra work early on to speed things up later. This post explains one example of this: the
SearchValues<T> type that .NET 8.0 adds to the runtime libraries.
Pay now, gain later
When you need to do a lot of work of a particular kind, proper preparation can pay off. Chefs sometimes talk about mise en place (literally 'put in place'). This describes the steps they take to ensure that when the restaurant starts taking food orders, they have to hand everything they need to get on with the main task of cooking as efficiently as possible. It takes some time to lay everything out in your workspace, but this pays off many times over because it enables a work rate that wouldn't have been possible without the preparation.
Some of the work we get code to perform has similar characteristics. Compiler optimization increases how long we have to wait before code starts to run, but the code will run much faster once it starts. If the code is doing repetitive work, this will usually pay off.
As ever with performance, there isn't a one-size-fits-all "best practice" because there are many factors that can influence our choices. When we're considering doing preparation in advance we need to consider:
- Are there any parts of the prep that are fundamentally unavoidable? For example:
  - with regular expressions, something needs to detect when the expression is invalid
  - if we're going to deserialize JSON data into a .NET object's properties, it will be necessary to know what properties the object has defined
- Are there aspects of preparation which, with hindsight, might turn out to have been unnecessary? E.g.:
  - perhaps the app only does certain aspects of the work in specific scenarios
  - a large regular expression might have a complex structure, but be formed in such a way that for most inputs, most of it never comes into play, so work done to be able to process all of it quickly might never pay off
  - with JSON deserialization, we might be able to process rarely used structures, but not actually see them in the lifetime of most processes
- How much runtime flexibility do we need? E.g.:
  - might we want to generate regular expressions at runtime?
  - when searching for strings, might the specific text we're looking for be something that an end user typed into a search box?
  - might we want to use Entity Framework to define queries but be able to manipulate the query expressions at runtime?
How soon is now?
It's important to consider how early we can get started on the preparation. In some cases, it might be possible to do some or all of the prep at compile time. This can offer some major benefits:
- no runtime delay in getting started on the actual work
- might enable implementation strategies that are inherently faster than any purely "at runtime" approach could offer
- makes code generation available even in Native AOT (you can't generate code at runtime on Native AOT, but that doesn't need to be a problem if you're able to do all necessary code generation at compile time)
Sometimes, preparation work needs to take into account information that's only available at runtime. This blog is about searching for values in text, and there are many situations in which you won't know what particular text to search for until runtime (e.g., because you're searching for something the user typed in).
So in general, preparation work falls into two categories. In some cases, there's a runtime tradeoff: the extra preparation work means it takes longer to get your first bit of work done, and it's only a net win in the long run. Your average performance goes up, but your cold start performance can get worse. But if, like a chef, we can get our preparation done before service starts, we might be able to push the preparation work all the way back into the build process. For example, the .NET SDK includes a source generator for
System.Text.Json serialization. This changes when a program does the work of inspecting a type's structure in order to know how to serialize it. By default, this happens the first time you serialize or deserialize a particular type. But with the source generator, that work happens at compile time. This means that at runtime, there is no performance downside: even the first-time case will be faster because all the slow work has already been done. In fact there's a further benefit in that particular case: the source generator enables System.Text.Json to avoid ever using reflection, being able instead to use its own more specialized but very efficient representation of the type information it requires.
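To see what that looks like in practice, here's a minimal sketch of using the System.Text.Json source generator (the `Person` type and `AppJsonContext` name here are made up for illustration):

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// A hypothetical type we want to serialize.
public record Person(string Name, int Age);

// Deriving from JsonSerializerContext and applying [JsonSerializable]
// asks the source generator to emit serialization code at compile time.
[JsonSerializable(typeof(Person))]
internal partial class AppJsonContext : JsonSerializerContext
{
}

public static class Demo
{
    public static void Main()
    {
        Person p = new("Alice", 42);

        // Passing the generated type info means no runtime reflection is
        // needed, even on the very first call.
        string json = JsonSerializer.Serialize(p, AppJsonContext.Default.Person);
        Console.WriteLine(json);
    }
}
```

The key design point is that `AppJsonContext.Default.Person` hands the serializer a precomputed description of the type, so the first serialization call pays no inspection cost.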
So whenever you're looking at any kind of pay now, gain later feature, it is helpful to understand which of these categories it is in. This table summarizes the differences:
|When prep occurs|First-use performance|Average performance with repeated use|
|---|---|---|
|Runtime|May be worse than no prep|Better than no prep, given enough repetition|
|Compile time|Better than runtime prep; could be better or worse than no prep|Better than no prep|
If the extra prep work would need to be done at runtime, there will be some scenarios in which you won't want to do it, either because you're not doing enough repetitive work to recoup the up-front costs, or because first-run performance matters more to your scenario than average performance. But if a compile-time option is available, then that typically is a pure win for runtime performance. There's a price to pay of course: there might be a loss of flexibility. If your code needs to adapt at runtime to circumstances you could not foresee at compile time, you won't be able to do your prep at compile time.
The SearchValues<T> type is in the first category. That means that the extra prep work is done at runtime, so there will be some scenarios in which you won't want to use it, either because you're not doing enough repetitive work to recoup the up-front costs, or because first-run performance matters more to you than average performance.
Searching strings for particular values is a very common task. Perhaps we have input in the form of a sequence of elements delimited by dots (e.g.
Owner.Name.Title), in which case we might need to write code that searches for the
'.' character in a string. Or perhaps we're writing some sort of tokenization system, and we need to search large volumes of text for markers that indicate places where we should substitute values for tokens. This turns out to be a job where a little bit of preparation can boost performance.
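To make the delimiter case concrete, here's a small sketch of splitting such an input on each dot (the input string is from the example above; the loop structure is my own):

```csharp
using System;

public static class DelimiterDemo
{
    public static void Main()
    {
        string input = "Owner.Name.Title";

        // Walk the string, emitting each dot-delimited element as we find it.
        int start = 0;
        while (start <= input.Length)
        {
            int dot = input.IndexOf('.', start);
            string element = dot < 0 ? input[start..] : input[start..dot];
            Console.WriteLine(element); // Owner, then Name, then Title
            if (dot < 0) { break; }
            start = dot + 1;
        }
    }
}
```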
Modern CPUs typically offer vector processing features, meaning that it's possible for single instructions to process multiple values simultaneously. This feature can be exploited to perform string searches more quickly than is possible with more conventional coding. However, with some kinds of searches, it can take a bit of work to determine the best strategy. For example, .NET strings offer an
IndexOfAny method, which you supply with a list of characters of interest, and it will tell you whether a string contains any of those characters, and if so, where. Here are a couple of examples:
int indexOfFirstDot = s.IndexOfAny(new[] { '.' });
char[] alphaNumeric = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".ToCharArray();
int indexOfFirstAlphaNumeric = s.IndexOfAny(alphaNumeric);
While both calls to
IndexOfAny here are in some sense doing the same thing, there's quite a difference in the details. The optimal strategy for looking for a single character is probably not the best possible strategy for the second call, which is looking for any of a fairly wide range of characters. So the implementors of
IndexOfAny are faced with a choice:
- spend some time to determine the best implementation strategy for the inputs at hand
- use a one-size-fits-all implementation
In fact this is more of a spectrum than a binary choice.
IndexOfAny does not use a one-size-fits-all approach, but that still leaves the question of just how much time to spend on choosing the strategy. The more work it does to fit the strategy to the data, the faster the actual search will be, but if it takes longer to pick a strategy than it would have taken to execute with a slightly suboptimal strategy, then the suboptimal strategy wins overall.
If we're using
IndexOfAny in performance-critical paths, we are likely to be searching for the same sort of thing again and again. Wouldn't it be good if we could get .NET to do the work of picking the best implementation strategy just once for any particular search criteria? That way, each individual search could immediately get started without having to spend any time picking an approach: that decision will already have been made.
The SearchValues<T> type enables us to separate the work of choosing a search strategy from the actual search. If we write this:
SearchValues<char> alphaNumeric = SearchValues.Create(
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
.NET will look at the input to determine which of the available search techniques is likely to work best. It will come up with a very different solution for this example than if we had written
SearchValues.Create("."). Once we have a
SearchValues<T>, we can use it on any
ReadOnlySpan<T>. So to search a string we could write:
int indexOfFirstAlphaNumeric = s.AsSpan().IndexOfAny(alphaNumeric);
Since we use this via
ReadOnlySpan<T>, we can apply
SearchValues<char> not just to a
string, but to any suitable range of memory, such as an array, or even some block of memory allocated outside of .NET's control.
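Putting those pieces together, here's a minimal self-contained sketch (the input string and array contents are made up for illustration):

```csharp
using System;
using System.Buffers;

public static class SearchValuesDemo
{
    // Created once and reused: this is where the prep work happens.
    private static readonly SearchValues<char> AlphaNumeric = SearchValues.Create(
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");

    public static void Main()
    {
        // Works on a string via AsSpan()...
        string s = "--!> Hello";
        Console.WriteLine(s.AsSpan().IndexOfAny(AlphaNumeric)); // index of 'H'

        // ...and equally on a char array, because the search runs
        // over ReadOnlySpan<char>.
        char[] buffer = { '*', '*', '7', '*' };
        ReadOnlySpan<char> bufferSpan = buffer;
        Console.WriteLine(bufferSpan.IndexOfAny(AlphaNumeric)); // index of '7'
    }
}
```

Note that the SearchValues<char> is stored in a static field: the point of the type is to pay the strategy-selection cost once and reuse the result on every search.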
By creating a SearchValues<T>, we are essentially telling the runtime that we expect to be using the specified search criteria a lot, so it makes a slightly different call about how much prep would be too much prep than
string.IndexOfAny. For example,
SearchValues<T> inspects the input to see if it's actually a range of values. If we write
SearchValues.Create("0123456789") it will recognize that this is a contiguous range of values, and it will use an implementation optimized for that case. It also handles very short value lists as special cases. The exact set of strategies
SearchValues<T> will consider is not documented, and will change across hardware—it will make decisions based on the specific set of vectorized features available on the CPU it is running on. The exact behaviour is likely to change over time, as new optimizations are added to the runtime libraries.
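The digits case can be sketched like this (ContainsAny and IndexOfAnyExcept are companion span extension methods that also accept a SearchValues<T>; the input string is made up):

```csharp
using System;
using System.Buffers;

public static class RangeDemo
{
    // A contiguous range of values: a candidate for the range-optimized strategy
    // described above.
    private static readonly SearchValues<char> Digits = SearchValues.Create("0123456789");

    public static void Main()
    {
        ReadOnlySpan<char> s = "Order #12345";

        Console.WriteLine(s.IndexOfAny(Digits));       // position of the first digit
        Console.WriteLine(s.ContainsAny(Digits));      // whether any digit is present
        Console.WriteLine(s.IndexOfAnyExcept(Digits)); // first position that is not a digit
    }
}
```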
How much faster?
The effect of using
SearchValues<T> varies based on a few factors. One particularly significant factor is the volume of data being searched. If the search finds what it's looking for right at the very start, vectorized implementations might not be noticeably better than the default.
For example, in benchmarks running on my machine (a desktop with an Intel Core i9-9900K CPU) I've found that in cases where the set of values being searched for is small (4 or fewer), BenchmarkDotNet reports that both string.IndexOfAny and the SearchValues<char>-based IndexOfAny complete in about 5ns if there's a match right at the very start of the string.
However, as the size of the input being searched goes up, the
SearchValues<char> approach starts to show larger benefits. For example, if the string contains around 2,000 characters, and the match is right at the end,
string.IndexOfAny takes around 76ns, whereas
SearchValues<char>.IndexOfAny takes only 35ns.
So in that scenario,
SearchValues runs twice as fast!
The nature of the set of values being searched for also has an impact. As already mentioned, ranges get special handling, as do small sets of values. I created one test in which the set of characters being searched for contained 74
char values, but all in a single contiguous range. In the case where the first matching character in the input was at the end of a 2,000 character string,
string.IndexOfAny takes around 115ns, whereas
SearchValues<char>.IndexOfAny takes only 28ns.
That's over four times faster.
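A benchmark along these lines could be set up with BenchmarkDotNet roughly as follows. This is a sketch, not the code behind the numbers above: the haystack construction, the particular 74-character range, and all names here are my own, so your results will differ.

```csharp
using System;
using System.Buffers;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class IndexOfAnyBenchmarks
{
    // 74 values forming a single contiguous range, as in the test described above.
    private static readonly char[] TargetChars =
        Enumerable.Range('0', 74).Select(c => (char)c).ToArray();

    private static readonly SearchValues<char> Targets =
        SearchValues.Create(TargetChars);

    // ~2,000 non-matching characters with a single match at the very end.
    private readonly string _haystack = new string(' ', 2000) + "0";

    [Benchmark(Baseline = true)]
    public int StringIndexOfAny() => _haystack.IndexOfAny(TargetChars);

    [Benchmark]
    public int SearchValuesIndexOfAny() => _haystack.AsSpan().IndexOfAny(Targets);
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<IndexOfAnyBenchmarks>();
}
```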
Of course, real application code won't spend its whole time searching in strings. Even so, these are significant improvements in a task that many applications need to perform frequently. The .NET runtime is already using this new mechanism internally, so you may well benefit from it even when you don't use it directly. It's all part of .NET's ongoing drive to improve performance with each new release.