C# 11.0 new features: list pattern matching
In this fourth post in my series on new features in C# 11.0 we move onto the pattern matching enhancements, starting with the new support for matching lists.
Pattern matching recap
C# 7.0 added pattern matching, and each subsequent version has extended this functionality. In general, pattern matching performs some sort of runtime test. Here's a simple example:
if (item is not null)
{
item.Process(context);
}
Pattern matching informs the compiler's flow analysis. The compiler knows that if the body of that if
statement executes, item
will not be null
, so if we're using C#'s nullable reference types feature, the compiler will know not to issue a warning for that item.Process
method invocation, because it can infer that item
will be non-null if the pattern match succeeded.
Some patterns do more than enable compiler inference. Some explicitly produce outputs. For example this tests to see if an object implements a particular interface, and if so, initialises a variable allowing access to members of that interface:
if (item is IDisposable d)
{
d.Dispose();
}
So there's still a runtime test here. And this also feeds into flow control inferences: the compile knows that d
will be initialized, and non-null in cases where the body of the if
runs.
For some patterns this kind of extraction of data goes beyond simple type conversions, and can pick out pieces of data embedded in the pattern's input:
if (item is (double x, double y))
{
return Math.Sqrt(x * x + y * y);
}
This resembles a feature some languages call "destructuring" but it's not quite the same thing because this also executes a runtime test to see if the data matches the pattern. Actually this is slightly off the point, but C# does have one destructuring-like language feature, tuple deconstruction and it looks similar to the tuple pattern above:
(double x, double y) = item; // Not pattern matching.
That decomposes a tuple into its constituents in exactly the same way as the pattern, but the difference is that the compiler will only allow this if it can determine at compile time that item
is a suitable tuple. Patterns, by contrast, do that check at runtime—the destructuring cannot be separated from the runtime test. (The lines are slightly blurred though because if the compiler can determine that a pattern will definitely fail, it will report an error. And if it can determine that it will definitely always succeed, it omits the unnecessary runtime check.)
The new bit: list patterns
C# 11.0 adds a new pattern type: suppose that instead of a 2-tuple, we were looking for a two-element list:
if (items is [double x, double y])
{
return Math.Sqrt(x * x + y * y);
}
As with any pattern, this performs a runtime test. It asks "is items
a list containing two values of type double
?" And if it is, it then extracts those two values into the variables supplied. Not all uses of a list pattern will extract information. You could test for an empty list:
if (items is [])
{
throw new ArgumentException("Input must not be empty", nameof(items));
}
Since a list pattern is just one more kind of pattern, you can use it anywhere you'd use any other kind of pattern. For example we could combine it with the negation pattern to perform the opposite of the preceding test:
if (items is not [])
{
Console.WriteLine($"Average: {items.Average()}");
}
List patterns can be useful in scenarios where you need to provide general-purpose handling for lists, but might be able to offer specialized handling that works better for some common cases, e.g.:
double HyperPythagoras(ReadOnlySpan<double> vector)
{
switch (vector)
{
case []:
throw new ArgumentException("Vector must have at least 1 dimension");
case [double x]:
return x;
case [double x, double y]:
return Math.Sqrt(x * x + y * y);
default:
double total = 0;
for (int i = 0; i < vector.Length; ++i)
{
double v = vector[i];
total += v * v;
}
return Math.Sqrt(total);
};
}
In the one-dimensional case, this completely avoids the multiplication and square root operation that the general code path would perform, and it avoids the overhead of executing a loop in the very common two-dimensional case. (In reality you'd need to run performance analysis to work out whether this was having a useful effect. This example is just here to illustrate how you can use list patterns to implement this approach.)
What's a list, exactly?
I've been a bit vague about exactly which types work with list patterns. The syntax is deliberately suggestive of arrays, so it's unsurprising that these patterns will work with arrays. But they will also work with strings:
void DaftStringExample(string message)
{
if (message is [char first, char second])
{
Console.WriteLine($"String is '{first}' then '{second}");
}
}
That particular example isn't useful, but with slices and recursion (described shortly) there are situations where you might want to apply a list pattern to a string.
So what determines whether a list pattern will recognize something as a list? C# imposes two requirements: the input to the pattern must be countable, and indexable. To be countable, a type must have either a Length
or a Count
property of type int
(with Length
being preferred where both are defined). A type is indexable if it defines an indexer accepting either an Index
or an int
. (These notions of countability and indexability were not invented for list patterns—these definitions originate from the index and range features added in C# 8.0.)
This approach means that some quite diverse types can be used with list patterns. As well as obvious candidates such as arrays and IReadOnlyList<T>
, string
also works, as you've seen, but you can also use Span<T>
and ReadOnlySpan<T>
. And when it comes to your own types, you can make them work as inputs to list patterns by supplying the necessary members.
List patterns are somewhat unusual in that although they need their input types to meet certain requirements, they do not test for that at runtime. They will perform a runtime check to inspect the shape or contents of a list, but when it comes to asking the basic question "Is this thing even a list?" list patterns do that at compile time. This is different from most patterns that impose requirements on their input types. For example, you can write this:
double GetLength(object vector)
{
// Performs a runtime type test to see if this is implements ITuple.
if (vector is (double x, double y))
{
return Math.Sqrt(x * x + y * y);
}
return 0;
}
If you were to change that pattern from a tuple pattern to a list pattern (i.e., to [double x, double y]
) the compiler would complain with:
CS0021: Cannot apply indexing with [] to an expression of type 'object'
Since most patterns will perform a runtime type test in cases where the compiler can't deduce that outcome of that test is a foregone conclusion, it is initially surprising that this doesn't happen for lists. However, the problem is that there are a lot of different types it could be testing for. The obvious candidate is IReadOnlyList<T>
, and since not everything that could implement that does, perhaps a test for IList<T>
would also be appropriate. But to be consistent with the behaviour when the input type is completely determined at compile time, we'd also have to take into account the fact that user-defined types can be countable and indexable without implementing any particular interfaces. If I have some custom type called ListLike
that is countable and indexable, I'm allowed to use that as an input to a list pattern, so if I have a pattern whose input is object
at some point in the code where that ListLike
type is in scope, the compiler would need to emit a test for that too for the runtime testing to be consistent with compile-time testing. And if I have 100 such types in scope at that point, the compiler would need to emit tests for every single one of them to produce runtime test behaviour that is consistent with the behaviour you get when the types are completely known at compile time. That would obviously be undesirable.
The reason tuple tests are able to check for a suitable type at runtime is that they only need to check for one interface, ITuple
. But there is no single interface for being list-like—just like the criteria for whether something can be used with foreach
, it's based on the availability of members with certain names and characteristics. (Confusingly, language features that work this way also sometimes describe these rules as "patterns.") It's possible for there to be hundreds of acceptable types in scope. This is why C# insists that when you use a list pattern, it must be able to determine at compile time the Length
or Count
and indexer properties it will be using at runtime to test the input against the pattern.
Recursive patterns
List patterns are kind of recursive pattern, meaning that other patterns can be nested inside them. For example, the pattern [double x, double y]
is a list pattern containing two nested declaration patterns. This means that if we want to, we can use other patterns such as the discard pattern or a constant pattern:
if (items is [_, 42])
{
Console.WriteLine("Two-element list ending in 42");
}
If the input is a list of lists, you can even use a list pattern inside a list pattern:
void ListInList(IList<int[]> listOfArrays)
{
if (listOfArrays is [[], [_]])
{
Console.WriteLine("An empty array, then a 1-element array");
}
}
The fact that we can put a list pattern inside a list pattern illustrates why nestable patterns are described as 'recursive'.
Slice pattern
So far, all of my examples have tested for lists with specific lengths. But it's also possible to write a list pattern that indicates that any number of elements could appear at a particular point:
if (items is [.., 42])
{
Console.WriteLine("Ends with 42");
}
Technically speaking, that ..
is a slice pattern, which is a kind of pattern in its own right. However, although the C# language grammar defines a slice pattern to be just another kind of pattern, the compiler will reject any attempt to use a slice pattern anywhere other than inside a list pattern. (Putting them in any of the other places patterns are allowed conforms to C#'s grammar, but violates rules of use.)
The example above makes a slice pattern look a bit like a wildcard. If you are familiar with regular expressions, you might look at that and think it means something broadly similar to ".*42"
. It's a bit like that but not really, because there is a significant limitation: you can have at most one slice in a list pattern. So you're not allowed to write something like [.., 42, ..]
to test whether a particular value appears somewhere in a list.
C# restricts us to a single slice pattern per list pattern because this can be implemented efficiently: unlike a general-purpose wildcard-like construct, C# does not need to generate code that scans the input. The code shown above is effectively equivalent to:
if (items.Length > 0 && items[^1] == 42)
{
Console.WriteLine("Ends with 42");
}
Because there can be only a single ..
, the compiler can generate code that first verifies that the list is long enough to be able to match the pattern, and then it can check any nested patterns before the ..
by inspecting the start of the list, and it can check any that follow the ..
by inspecting elements at the end of the list. It can effectively skip over the part of the list represented by the ..
because it can infer how long that part would be based on the total length of the list. But if it allowed multiple ..
items it would need something much more like a regular expression mechanism to determine a match.
The example above is essentially using the ..
to say "I don't care what's in this part." But what if you do care? What if, having determined that the list is a match, you want to do something with the part represented by the ..
? For these scenarios, you can supply a pattern immediately after the ..
(and before the following ,
or ]
):
if (items is [.. var front, 42])
{
Console.WriteLine(front.Length);
}
Be careful though. This is yet another demonstration of why var
is almost invariably pure evil (as though the world needed any more evidence to establish that this is an undeniable, objective fact). It makes it really easy to miss the fact that we might have forced C# into causing an unnecessary allocation here. Actually it's impossible to tell from looking at this code alone whether that's even happening. As ever, var
forces us into inspecting however much of the surrounding code it takes for us to be able to perform type inference in our heads before we can understand what the code really means. If items
happens to be an int[]
, the code above is functionally identical to this:
if (items is [.. int[] front, 42])
{
Console.WriteLine(front.Length);
}
Now the problem stands out. When the code states the type explicitly, we can see that it's asking the compiler to put the section of of the array corresponding to the ..
pattern into a variable called front
, and since that variable's type is int[]
it's going to have to be an array. (Technically, this int[] front
is a nested type pattern, so it also means that the code is stating that the output of the ..
must be of type int[]
. As long time readers of this blog will know, changing a var
pattern to a type pattern normally changes the behaviour of the pattern: the type pattern performs an additional runtime test in certain circumstances. But that doesn't happen here because as discussed earlier, list patterns never perform runtime type tests. So in this particular context, the var
pattern will behave in exactly the same way as the type pattern.)
That array will necessarily be one element shorter than items
. (The section of the input represented by the ..
does not include any elements that the containing list pattern matched before or after it. So the 42
constant pattern here won't be part of the section that the ..
stands for.) This code therefore forces the compiler to initialize front
with a brand new array, containing a copy of all but the final element of items
. In critical code paths, this kind of avoidable allocation can have a significant impact on performance.
Perhaps surprisingly, placing any kind of pattern after the ..
will run into this problem if the source is an array, even if you don't actually introduce a variable to hold the resulting subarray. Even this allocates a new array:
void CauseUnnecessaryArrayAllocation(int[] items)
{
// This still allocates a new array corresponding to the ..
if (items is [.. not [], 42])
{
Console.WriteLine("That was reassuringly expensive.")
}
}
There's no real need to create a new array here to get the effect we want. (We just need to check that the array's final element is 42 and that that is not the only element.) But to avoid that, the compiler would need to detect special cases such as these and generate slightly different code.
While that might be possible, there's no real need, because if you want to use patterns to do this, you can avoid forcing allocations. For example, you could use ArraySegment<int>
instead of int[]
. Or you could write a more general-purpose form using ReadOnlySpan<int>
:
void AvoidingUnnecessaryAllocationsWithSpans(ReadOnlySpan<int> items)
{
// No allocation here.
if (items is [.. not [], 42])
{
// ...well, no allocation until we call Console.WriteLine
Console.WriteLine("List ending with 42, preceded by at least one element.");
}
}
This uses exactly the same pattern as the preceding example, but it doesn't cause an allocation in order to test the subsection of the list represented by the ..
against the not []
pattern. The reason for the difference is that when you slice an array, the CLR has to allocate a new array, but when you slice a span, the resulting span points to the same underlying data (it just points to slightly less of it). Obviously you get a new ReadOnlySpan<T>
but that's a value type, so no heap allocation occurs, so we're just using some extra space on the stack (and in fact the JIT compiler may be smart enough just to reuse space previously occupied by other variables).
But wait, how is the compiler even deciding what to do to create the sliced representation of the data? How did it know to do one thing for an array, but something entirely different with a ReadOnlySpan<int>
?
Sliceable types
We saw earlier that list patterns demand that their inputs are countable and indexable. It turns out that if you write a slice pattern with an optional pattern that receives the slice's output, then the input to the corresponding list pattern must also be sliceable. This is another way in which list patterns build on the older index and range language feature: a type is sliceable if it meets the requirements for range-based indexing. In essence types are required to offer either an indexer that accepts a Range
or an accessible Slice
method, but there are some special cases to enable certain built-in types that existed long before this language feature was introduced also to support range-based indexing (and therefore sliceability).
Note that this means that some types that can be used with list patterns will only work if the list pattern does not include a slice pattern that has an optional pattern applied to its output. Specifically, this will apply for types that are countable and indexable but not sliceable. IReadOnlyList<T>
is one such type. You can use that with a some list patterns, such as [1,2]
and [1,2,..]
. But [1,2, .. var theRest]
will not work. You get a slightly odd error:
CS1503: Argument 1: cannot convert from 'System.Range' to 'int'
That makes some sense once you understand that the compiler is asking "Is this thing sliceable?" You are sliceable if you have an indexer that accepts a System.Range
, but IReadOnlyList<T>
's indexer takes an int
. But the error message doesn't tell you that it's looking at the indexer, nor does it explain that what it's really trying to do is work out whether the type is sliceable. The cryptic nature of the error aside, the basic issue here is that we've asked the compiler to give us all but the first two elements as a slice, but there's no general way to slice an IReadOnlyList<T>
.
This intrinsic support for array types is what causes the allocation in the earlier example. Arrays don't have a Slice
method, but they are sliceable because they are a special case. The rules for range-based indexing say that array slicing should be done by calling the System.Runtime.CompilerServices.RuntimeHelpers
type's GetSubArray
method. And if you inspect the code generated for the CauseUnnecessaryArrayAllocation
example above, you'll see that's exactly what happens. This is where the allocation occurs.
The version that uses ReadOnlySpan<int>
instead of int[]
has exactly the same list pattern (with an identical slice pattern), but because ReadOnlySpan<T>
defines a suitable Slice
method, it doesn't rely on any special behaviour from the compiler. And ReadOnlySpan<T>.Slice
does not allocate anything new.
Summary
List patterns are a useful addition to C#'s ever-growing set of pattern types. They provide a natural syntax for recognizing empty lists, and for recognizing and destructuring lists with specific sizes. Splice patterns enable list patterns to match certain kinds of variable-length lists. And since list patterns are recursive, you can use any of C#'s patterns when defining the match criteria for individual elements.