By Ian Griffiths, Technical Fellow I
C# 11.0 new features: Span<char> pattern matching

This fifth post in my series on new features in C# 11.0 is the second of two posts on pattern matching.

String constant pattern recap

We've been able to use string constants as patterns for a long time in C#, e.g.:

if (name is "Lobby Lud")
{
    Console.WriteLine("I claim my five pounds");
}

Back in C# 1.0, switch statements provided special support for using string constants as case values. When C# 7.0 enhanced switch statements to allow the use of patterns, the special string handling was retconned into being treated as a pattern just like all the other forms of case. So when you write this sort of thing:

switch (name)
{
    case "Lobby Lud":
        Console.WriteLine("I claim my five pounds");
        break;
    case "What?":
        Console.WriteLine("Who?");
        break;
}

C# would once have processed this as relying on the intrinsic support for the string constant form of case. But nowadays, the case keyword can be followed by any pattern, so the compiler sees these as constant patterns.
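As a small sketch of what "any pattern" means here, the following mixes string constant patterns with a property pattern in one switch statement. The Respond function and the length threshold are hypothetical, made up purely for illustration.

```csharp
using System;

// Hypothetical helper showing constant patterns alongside another pattern kind.
static string Respond(string name)
{
    switch (name)
    {
        case "Lobby Lud":            // string constant pattern
            return "I claim my five pounds";
        case "What?":                // string constant pattern
            return "Who?";
        case { Length: > 20 }:       // property pattern containing a relational pattern
            return "That is a long name";
        default:
            return "No match";
    }
}

Console.WriteLine(Respond("Lobby Lud"));
Console.WriteLine(Respond("What?"));
```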

Span as input to string constant pattern

In the examples above, I've not shown the declaration of the name variable, making it hard to tell what its type is. You might be wondering if, once again, I'm taking the opportunity to explain what I dislike about var, but on this occasion I have a different motive.

Prior to C# 11.0, name would need to be a string for those examples to compile correctly. But now, it would also work if name were a ReadOnlySpan<char> or Span<char>.

If you're not familiar with these types, I wrote years ago about how we used them to implement high performance AIS.NET parsing. But to quickly recap, a Span<T> or ReadOnlySpan<T> can represent any sequence of values in memory. They are conceptually like arrays, but they are more flexible in that they can refer to different kinds of memory—they don't necessarily have to refer to an array, or even to memory that lives on the GC heap. A string is a sequence of char values, but it's not an array. So we couldn't use it via char[], but we can refer to it as a ReadOnlySpan<char>.
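To make the "different kinds of memory" point concrete, here is a minimal sketch in which one method accepts spans over an array and over stack memory, plus a read-only span over a string's characters. The Sum helper and the variable names are hypothetical.

```csharp
using System;

// An array on the GC heap, viewed through a span.
int[] array = { 1, 2, 3, 4 };
Span<int> overArray = array;

// Memory on the stack: nothing is allocated on the GC heap.
Span<int> overStack = stackalloc int[] { 1, 2, 3, 4 };

// The characters of a string. Strings are immutable, so this span is read-only.
ReadOnlySpan<char> overString = "Lobby Lud".AsSpan();

// Hypothetical helper: it neither knows nor cares where the memory lives.
static int Sum(ReadOnlySpan<int> values)
{
    int total = 0;
    foreach (int value in values)
    {
        total += value;
    }
    return total;
}

Console.WriteLine(Sum(overArray));    // 10
Console.WriteLine(Sum(overStack));    // 10
Console.WriteLine(overString.Length); // 9
```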

Why would we? An important feature of Span<T> and ReadOnlySpan<T> is that they provide a highly efficient way to represent subsections of the data. If you slice out a subsection of a string using someString[4..10], C# will generate code that creates a brand new string object containing a copy of just the characters you asked for. But if you do exactly the same thing with a ReadOnlySpan<char>, no new copies of the data will be made. You end up with a new ReadOnlySpan<char> (which, being a value type, doesn't require a new heap object) that points to the same underlying data; it just points to slightly less of it.

So imagine the string you wish to inspect is a substring of some larger string. Before C# 11.0, to use either of the examples above you'd have had to write something like:

string name = doc[nameStartIndex..nameEndIndex]; // Allocates a new string on the GC heap

But in C# 11.0, you can instead write this:

ReadOnlySpan<char> name = doc.AsSpan()[nameStartIndex..nameEndIndex];

The resulting span doesn't have its own copy of the data. It just refers to the part of the doc string that contains the data of interest. And thanks to the new C# 11.0 feature I'm discussing, you can use that ReadOnlySpan<char> name with patterns as shown in the earlier examples.
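Putting the pieces together, a self-contained sketch might look like the following. It assumes a C# 11.0 (or later) compiler; the contents of doc, the way the indices are computed, and the Describe helper are all made up for illustration.

```csharp
using System;

// Hypothetical input: the name of interest is embedded in a larger string.
string doc = "name=Lobby Lud;status=found";
int nameStartIndex = doc.IndexOf('=') + 1;
int nameEndIndex = doc.IndexOf(';');

// Slicing the span does not copy any characters.
ReadOnlySpan<char> name = doc.AsSpan()[nameStartIndex..nameEndIndex];

// The slice still refers to memory inside doc.
bool sharesMemory = doc.AsSpan().Overlaps(name);

// C# 11.0: a span can be the input to a string constant pattern.
bool isLud = name is "Lobby Lud";

// The same works in a switch expression.
static string Describe(ReadOnlySpan<char> who) => who switch
{
    "Lobby Lud" => "I claim my five pounds",
    "What?" => "Who?",
    _ => "No match",
};

Console.WriteLine(sharesMemory);   // True
Console.WriteLine(isLud);          // True
Console.WriteLine(Describe(name)); // I claim my five pounds
```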

What it compiles to

The compiler generates code that performs the comparison using the SequenceEqual extension method defined by the MemoryExtensions class. So the first example is equivalent to this:

if (name.SequenceEqual("Lobby Lud"))
{
    Console.WriteLine("I claim my five pounds");
}
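One consequence worth seeing in action: SequenceEqual compares the sequences element by element, so the match is exact and case-sensitive. The sketch below compares the pattern form with an explicit call; the variable names are hypothetical, and the explicit AsSpan() call is just to make the argument type obvious.

```csharp
using System;

ReadOnlySpan<char> name = "Lobby Lud".AsSpan();
ReadOnlySpan<char> lower = "lobby lud".AsSpan();

// Pattern form and explicit method call give the same answer.
bool viaPattern = name is "Lobby Lud";
bool viaMethod = name.SequenceEqual("Lobby Lud".AsSpan());

// Every char must be equal, so different casing does not match.
bool lowerMatches = lower is "Lobby Lud";

Console.WriteLine(viaPattern);   // True
Console.WriteLine(viaMethod);    // True
Console.WriteLine(lowerMatches); // False
```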

No UTF-8 support

If you've been reading this whole series, you may recall C# 11.0's new UTF-8 string literals feature, and you might be wondering whether this new support for span-based pattern matching also works with UTF-8 text. Could you write this, for example?

ReadOnlySpan<byte> textUtf8 = "Hello"u8;
if (textUtf8 is "Hello"u8) // Won't compile
{
    Console.WriteLine("Match");
}

Let's ignore the entirely unnecessary nature of the comparison: obviously by inspection we'd expect this test always to succeed. But in fact it won't compile. The compiler does not recognize UTF-8 string constants as a type of constant pattern. (As far as I know, there isn't some fundamental reason that it couldn't. But it would require additional language support, because the expression "Hello"u8 is of type ReadOnlySpan<byte>, and that's not in the list of things that can be used as a constant pattern. And if you're thinking that this sounds like an odd choice because ReadOnlySpan<char> isn't exactly a million miles from ReadOnlySpan<byte>, remember that the new language feature I'm discussing changes only what is allowed as the input to a pattern; it does not add any new ways of defining a pattern.)

In this specific example, we could get the behaviour we want by just calling the same extension the compiler uses with strings:

ReadOnlySpan<byte> textUtf8 = "Hello"u8;
if (textUtf8.SequenceEqual("Hello"u8))
{
    Console.WriteLine("Match");
}

but that's not a general solution: we've replaced a pattern with a method invocation, meaning we can't do this in all places where a pattern is expected. So we can't use a UTF-8 string constant as a pattern in a switch statement or expression, for example.
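Where you might have wanted a switch over UTF-8 constants, one fallback is a chain of if statements built on the same method call. This is only a sketch of that workaround: the Classify helper and its return values are hypothetical.

```csharp
using System;

// Hypothetical helper standing in for the switch we cannot write.
static string Classify(ReadOnlySpan<byte> textUtf8)
{
    if (textUtf8.SequenceEqual("Hello"u8))
    {
        return "greeting";
    }
    if (textUtf8.SequenceEqual("Goodbye"u8))
    {
        return "farewell";
    }
    return "unknown";
}

Console.WriteLine(Classify("Hello"u8));   // greeting
Console.WriteLine(Classify("Goodbye"u8)); // farewell
Console.WriteLine(Classify("What?"u8));   // unknown
```

This keeps the allocation-free comparison, but each additional case needs its own if statement, and none of the other pattern forms can be combined with it.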

Summary

ReadOnlySpan<char> can provide a memory efficient way to work with substrings. You can obtain a ReadOnlySpan<char> that refers to some subsection of a string (or of any sequence of char values) without needing to allocate a new object. C# 11.0 enables us to use the resulting ReadOnlySpan<char> as the input to a string constant pattern.

Ian Griffiths

Ian has worked in various aspects of computing, including computer networking, embedded real-time systems, broadcast television systems, medical imaging, and all forms of cloud computing. Ian is a Technical Fellow at endjin, and a Microsoft MVP in Developer Technologies. He is the author of O'Reilly's Programming C# 10.0, and has written Pluralsight courses on WPF and the TPL. He's a maintainer of Reactive Extensions for .NET, Reaqtor, and endjin's 50+ open source projects. Technology brings him joy.