C# 11.0 new features: ref fields and the scoped keyword
 
The most recent post in my series on new features in C# 11.0 described how you can use Span<char> and ReadOnlySpan<char> as the input to a string constant pattern. The span types are part of C#'s suite of high-performance, low-allocation features, which has been growing steadily since C# 7. C# 11.0 marked a significant milestone in the development of these features: the span types no longer rely on special runtime support, because C# 11.0 and .NET 7.0 introduced the tools required to build these kinds of types from scratch.
(I apologise for the rather long hiatus between the last article and this one. I stopped to write a book on C# 12.0, by which time I'd lost momentum on this series. Obviously C# 12.0 has been out for some time now, and C# 13.0 is just around the corner, but even though the features covered in this series are by no means new, that doesn't make it any less useful to understand them. And this feature in particular is one of the harder ones to understand properly.)
The C# 11.0 feature that enables this is support for ref fields. This has gone hand in hand with changes to the "ref safety" rules that ensure that types such as Span<T> can be used without violating .NET's type safety rules. The upshot is that there are scenarios where we would have needed to use unsafe code to maximise performance, but the new rules give us the flexibility to write code for these scenarios while remaining within the rules for type safety.
New compiler errors with ref
A surprising upshot of this new power is that some code that used to compile just fine now provokes compiler errors when targeting C# 11.0 or later. This is because the new power means that certain coding patterns might no longer be safe. To be more precise, there are now situations in which a method might be able to hang onto a reference after returning (just like a Span<T> retains a reference to whatever was passed to its constructor), meaning the compiler now has to make more conservative assumptions.
In this blog I'll show how a keyword introduced in C# 11.0, scoped, enables us to describe our intentions more precisely to the compiler, fixing these problems. But before we can understand scoped we need to understand what the new powers are.
Span types
For many years now, the Span<T> and ReadOnlySpan<T> types have given us the performance of raw, unmanaged pointers, but with full type safety. They have enabled C# to be used in scenarios that might once have been considered the domain of more systems-programming-oriented languages such as C and Rust.
These span types make it possible to avoid two expensive practices:
- making multiple copies of data
- allocating memory on the garbage collected heap
The string type tends to encourage both of these. If you want to extract data buried inside a string (e.g., given the string "https://example.com/some/path" you might want to know just the hostname part, "example.com") it's common practice for APIs to return new string objects. Ultimately, a method for extracting the hostname would do something equivalent to "https://example.com/some/path".Substring(8, 11) to return a new string containing just the hostname. You might use the more modern slice syntax, "https://example.com/some/path"[8..19], but that compiles into a call to Substring.
ReadOnlySpan<char> gives us a more efficient alternative. The type represents a sequence of char values stored sequentially in memory, and which we're not allowed to modify. This is exactly how string stores its data internally, and it's possible to obtain a ReadOnlySpan<char> for the data in a string. But a char[] also stores a sequence of char values, and we can get a ReadOnlySpan<char> for one of those too.
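To make that concrete, here is a minimal sketch of a method that works on characters without caring whether they came from a string or a char[]. The SpanSources class and CountDigits method are hypothetical names invented for this illustration.

```csharp
using System;

static class SpanSources
{
    // Counts the digits in any sequence of chars, regardless of where it is stored.
    public static int CountDigits(ReadOnlySpan<char> text)
    {
        int count = 0;
        foreach (char c in text)
        {
            if (char.IsDigit(c))
            {
                count++;
            }
        }
        return count;
    }

    public static void Demo()
    {
        string fromString = "abc123";
        char[] fromArray = { '4', '5', 'x' };

        // Both conversions are implicit, and neither copies the characters.
        Console.WriteLine(CountDigits(fromString)); // 3
        Console.WriteLine(CountDigits(fromArray));  // 2
    }
}
```

The same method body serves both callers because the span only describes where the characters are and how many there are.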
ReadOnlySpan<char> doesn't care what contains the data. It doesn't even have to be inside a .NET object. You can get a ReadOnlySpan<char> for data that lives on the call stack:
Random randoms = new();
int random = randoms.Next(0, 99999);
Span<char> chars = stackalloc char[5];
random.TryFormat(chars, out _);
This stack-based allocation mechanism is extremely efficient. It's how the memory for local variables is allocated, it often requires only a single machine language instruction to reserve the space, and the memory is automatically reclaimed when the method returns without requiring the garbage collector to do anything.
Finally, it's possible for a span to refer to memory that was allocated entirely outside of the .NET runtime's control. This can be useful when working with non-.NET APIs (e.g., operating system APIs).
No matter where the data lives, spans do not need to make a copy of it. They just refer to the data, wherever it lies. And this enables them to perform operations such as slicing extremely efficiently. This shows how, given a string, we can wrap it in a span, and then get hold of what is effectively a substring, but without needing additional allocations.
string url = "https://example.com/some/path"; // The only heap allocation in this example
ReadOnlySpan<char> urlSpan = url;
ReadOnlySpan<char> hostNameSpan = urlSpan[8..19];
The span types are value types, so the two ReadOnlySpan<char> variables here do not require any additional objects on the heap. I've used the slice syntax in exactly the same way I showed earlier to extract the hostname, but this doesn't make a new copy of the data. It just returns a span that points to that part of the string.
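The hard-coded offsets above only work for one particular URL. As a rough sketch of how the same idea might look in a reusable form, the hypothetical GetHost helper below (not part of any library) finds the hostname by searching, and returns it as a slice of its input. It assumes simple URLs with no port, user info, or query string directly after the host.

```csharp
using System;

static class UrlParsing
{
    // Hypothetical helper: returns the host portion of a URL as a slice of the input.
    // No new string is allocated; the result refers to the caller's data.
    public static ReadOnlySpan<char> GetHost(ReadOnlySpan<char> url)
    {
        const string schemeSeparator = "://";

        int schemeEnd = url.IndexOf(schemeSeparator.AsSpan());
        ReadOnlySpan<char> rest = schemeEnd < 0
            ? url
            : url[(schemeEnd + schemeSeparator.Length)..];

        int pathStart = rest.IndexOf('/');
        return pathStart < 0 ? rest : rest[..pathStart];
    }
}
```

Calling UrlParsing.GetHost("https://example.com/some/path") produces a span covering "example.com". Converting that span to a string with ToString() would allocate, so that is best left until a string is really needed.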
Inside spans
Spans hold two things: a pointer to the data, and an integer describing the length. This relies on a certain amount of help from the .NET runtime. A pointer is just a number that is interpreted as the address in memory of the data. But what if that data is on the GC heap? The garbage collector moves objects around when it reclaims memory that has fallen out of use. It typically relocates all the objects that are still in use to be adjacent to each other, to avoid leaving unused gaps on the heap.
What does this mean for spans that are pointing into those objects?
As you are probably aware, whenever the garbage collector relocates an object, it also updates any references to that object, so that they have the correct new address. It's also able to do this for pointers pointing into the middle of an object (e.g., a pointer to, say, the 4th element in an array, or to some field of an object). As long as the CLR knows a pointer exists (meaning it must be declared as a managed pointer), it is able to adjust any pointers that were referring to objects that got moved during garbage collection.
This is actually a very old feature of the .NET runtime. It has been able to do this since .NET 1.0. That's because you've always been able to write this sort of method:
public static void SetValue(ref int ri)
{
    ri = 42;
}
That ref int is a managed pointer: at runtime, this argument won't be an integer, it will be the address of the memory in which the integer's value is being stored, enabling the method to change its value to something else (42 in this case). We can call this in more than one way:
int variable = 0;
SetValue(ref variable);
int[] array = { 1, 2, 3 };
SetValue(ref array[1]);
The first call passes a reference to a local variable. Local variables often live on the stack, in which case this will be a pointer to a memory location on the calling method's stack frame. The second call passes a reference to the second element in an array. Arrays always live on the GC heap. (There's another variation, not shown here: we could have passed a ref to a field. Also, sometimes local variables can end up living in fields of objects, in which case ref variable will be a pointer to some field inside an object on the GC heap.)
This shows that there are cases where a managed pointer points to something inside some object. With the array example here, the managed pointer won't point to the start of the array object: it will point directly to the 2nd element in the array. Likewise, with a ref to a field, or a non-stack-based local, the pointers point directly to the field inside the object. These are sometimes referred to as interior pointers. They are different from normal references, which point to the start of the object.
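Here is a small sketch of an interior pointer surviving a garbage collection. The InteriorPointerDemo class is a name made up for this example, and there is no guarantee that this particular GC.Collect call actually moves the array; the point is that the code remains correct whether it moves or not.

```csharp
using System;

static class InteriorPointerDemo
{
    public static int Run()
    {
        int[] array = { 1, 2, 3 };

        // A ref local pointing at the second element: an interior pointer into the array.
        ref int second = ref array[1];

        // If this collection relocates the array, the runtime adjusts 'second' to match.
        GC.Collect();

        // The write goes to wherever the array lives now.
        second = 42;
        return array[1]; // 42
    }
}
```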
It is harder for the GC to determine which objects are referred to by interior pointers than by normal references. (For a normal reference, it can just do a direct comparison of a variable with the object's address. But to work out whether an interior pointer refers to a particular object, it has to check whether it's anywhere in the range of addresses occupied by the object.) Because of this, the .NET runtime has always imposed some constraints on the use of managed pointers. Up until fairly recently, managed pointers could only be used as either local variables or arguments.
In C#, this corresponds to the fact that you can declare a parameter with the ref keyword, and you can also (as of C# 7.0) use ref on local variables. But you couldn't declare fields using ref.
So how could the span types work? If you're not allowed to declare a ref field, how is a span going to hold onto a managed pointer? For years, the answer was essentially that spans got special treatment from the runtime: they were able to do things we couldn't in our own code. Specifically, they could, in effect, have a ref field. (That wasn't quite how it looked, but that was the effect.)
ref safety
In C# 11.0, it became possible to use ref with fields. However, before we look at that, there are some significant restrictions we need to understand around the use of ref types.
Some constraints on the use of ref are to do with performance (because, as discussed, they're harder for the GC to manage than ordinary references). But there are also constraints designed to ensure type safety. For example, if you attempt to get a ref int that refers to some memory that it might be unsafe to try and use, the compiler will stop you. What might be unsafe? Consider this example:
public class TestClass
{
    private static int StaticValue;
    private int InstanceValue;
    public static ref int GetRef1() => ref StaticValue; // OK
    public ref int GetRef2() => ref this.InstanceValue; // OK
    public static ref int GetRef3()
    {
        int local = 42;
        return ref local; // Compiler error here
    }
}
This defines three methods, each of which returns a ref int (essentially, the address of some memory location that holds some integer value). The first one works perfectly well. It returns a reference to a static field. Static fields live until their containing process exits (and as it happens they never move in current implementations of .NET) so we're free to hand out managed pointers to these. The second is also fine: it returns a reference to an instance field inside whichever TestClass instance you call the method on. The garbage collector will keep the object alive as long as there is at least one reference to it, and interior pointers count as a kind of reference. And since a ref int is a managed pointer, it doesn't matter if the object moves around on the GC heap, because the GC will update all managed interior pointers to the object.
The third method, GetRef3, fails to compile. That's because we're attempting to return the address of a local variable. Local variables cease to exist as soon as execution leaves their containing scope. In this case, that's the method scope, so local ceases to exist when GetRef3 returns. It would therefore be a bad idea for GetRef3 to return a reference to it. The caller would have a reference to a variable that no longer exists. (In this particular case, this would most likely be a pointer to a stack frame that has already been freed up. Future method calls will likely use that same memory for other stack frames, so the consequences of allowing a reference to that memory outlive that stack frame are potentially very bad.)
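For contrast, here is a sketch of a ref-returning method that the compiler accepts because the thing being referred to outlives the call. RefReturns and FirstNegative are hypothetical names used only for illustration.

```csharp
using System;

static class RefReturns
{
    // Safe: the array lives on the GC heap, so its elements outlive this method call.
    public static ref int FirstNegative(int[] values)
    {
        for (int i = 0; i < values.Length; i++)
        {
            if (values[i] < 0)
            {
                return ref values[i];
            }
        }

        throw new InvalidOperationException("No negative value found");
    }

    public static int[] Demo()
    {
        int[] data = { 3, -1, 4 };

        // 'slot' is an alias for data[1], so assigning to it changes the array.
        ref int slot = ref FirstNegative(data);
        slot = 0;
        return data; // { 3, 0, 4 }
    }
}
```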
C# defines ref safety rules which are designed to prevent this kind of unsafe use of managed pointers. Up until the span types were introduced, these rules only needed to cover the direct use of managed pointer variables and parameters. But spans introduced the possibility that some structs might contain managed pointers. This made the rules more complex, but the basic goal was the same: it should not be possible to get hold of a managed pointer (either directly, or as wrapped by a span) that refers to something which no longer exists.
How ref fields change ref safety
Starting with C# 11.0, spans are no longer special types. The runtime features that made them work are now available for us to use: we can write our own types with ref fields. However, because this means we are working with managed pointers, there are some significant constraints.
The first rule is that ref instance fields may appear only inside of a ref struct. If you've used the span types much you may already be familiar with this rule: instance fields of type Span<T> or ReadOnlySpan<T> may appear only inside a ref struct. In fact the rule in play here is slightly more general: any ref struct may only be used as an instance field inside of another ref struct; these rules apply to the span types because those are also ref struct types.
You could think of ref struct as meaning "a type to which the restrictions for ref types apply". It's the fact that a ref struct is subject to the restrictions that apply to any ref type that makes it safe for it to contain a reference (either directly or indirectly). One upshot of this is that you can't use a ref struct as a field of a reference type (because the .NET runtime doesn't support putting managed pointers inside of heap-based objects). And a more subtle upshot is that methods where local variables can't live on the stack (notably async methods and iterators) can't use ref struct types as local variables. (C# 13.0 relaxes this slightly: variables whose usage does not span an await or yield can in fact live on the stack, so use of ref-like types for such variables will be permitted.)
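The following sketch illustrates where a span-typed field is and isn't permitted. Tokenizer and Holder are hypothetical types invented for this example, and the disallowed field is commented out so that the rest compiles.

```csharp
using System;

// Allowed: a ref struct may hold another ref struct (here a span) as an instance field.
public ref struct Tokenizer
{
    private ReadOnlySpan<char> remaining;

    public Tokenizer(ReadOnlySpan<char> text) => remaining = text;

    // Returns the next space-delimited word as a slice of the original text.
    public bool TryReadWord(out ReadOnlySpan<char> word)
    {
        remaining = remaining.TrimStart(' ');
        if (remaining.IsEmpty)
        {
            word = default;
            return false;
        }

        int end = remaining.IndexOf(' ');
        if (end < 0)
        {
            end = remaining.Length;
        }

        word = remaining[..end];
        remaining = remaining[end..];
        return true;
    }
}

public class Holder
{
    // Not allowed: instances of a class live on the GC heap, so the compiler
    // rejects a span-typed instance field here.
    // private ReadOnlySpan<char> text;
}
```

A Tokenizer can be used as a local variable in an ordinary method, for example in a loop of the form while (tokenizer.TryReadWord(out ReadOnlySpan<char> word)) { ... }.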
In addition to this rule about what kinds of types can hold a ref (or ref struct), there are rules about whether the value of some ref or ref struct-typed variable is allowed to escape to some wider scope than that in which it was declared. We've seen this already: TestClass.GetRef1() contains the expression ref StaticValue, which creates a ref int, and it then returns this to its caller, allowing the ref int it created to escape to its caller. The GetRef3() method tries to do the same thing, but the compiler doesn't allow it. How does the compiler determine that one is OK and the other is not?
C# uses two concepts to determine ref safety. The first is that every variable (strictly speaking every lvalue) has a ref-safe-context. (If you read older specifications concerning ref safety, you'll see this called the ref-safe-to-escape scope.) This is the context in which the compiler knows that it is safe to use a reference to the variable. There are rules that determine this for any variable. The rules say that a static field's ref-safe-context is effectively anywhere at all. To be more precise, this kind of context is always one of the following:
- declaration-block: the variable must not be used outside of the block in which the variable is defined
- function-member: the variable must not be used outside of the function in which the variable is defined
- return-context: the variable can be made available to callers through a return statement and will be safe for the caller to use
- caller-context: the variable can be made available to callers through any means (return, ref, and out parameters)
So a static field has a ref-safe-context of caller-context, which is why GetRef1 is allowed. But a local variable's ref-safe-context is the scope in which it's defined. (More precisely, local variables have a ref-safe-context of declaration-block.) This makes it illegal to attempt to return it from the method in which it is declared. In fact it's not even allowed to escape from a nested scope inside the method to a wider scope in the same method.
The second concept for ref safety is the safe-context. Every expression (whether or not it's an lvalue) has one of these. This is the context in which it is safe to use the value of the expression. In many cases this will be the same as ref-safe-context, but not always: for example, if you have some variable int local, you can return local but you can't return ref local. The rules represent this by saying that the safe-context for this variable is caller-context but its ref-safe-context is declaration-block.
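A short sketch may help to separate the two concepts. EscapeDemo is a hypothetical class; the method the compiler would reject is left commented out.

```csharp
static class EscapeDemo
{
    // Fine: the value of a local int can go anywhere (its safe-context is caller-context).
    public static int ReturnValue()
    {
        int local = 42;
        return local;
    }

    // Fine: a ref parameter refers to storage supplied by the caller,
    // so handing the same reference back to the caller is safe.
    public static ref int ReturnParameter(ref int fromCaller) => ref fromCaller;

    // Rejected: a reference to the local itself must not leave the method
    // (its ref-safe-context is declaration-block).
    // public static ref int ReturnLocalRef()
    // {
    //     int local = 42;
    //     return ref local;
    // }
}
```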
These rules get a bit more complex when it comes to values whose type is a ref of some kind, and also when dealing with fields of structs held in local variables. I'm not going to go through them exhaustively. What really matters is the effect of these rules: they prevent unsafe use of ref values.
These ref-safe-context and safe-context concepts already had to exist to support spans, but as of C# 11.0, the rules were extended because we can now define ref fields.
C# 11.0 and ref fields
C# 11.0 enables us to write this sort of thing:
public ref struct SpanLike<T>
{
    public ref T Value;
    public nint Length;
    public SpanLike(ref T value)
    {
        Value = ref value;
        Length = 1;
    }
}
Since we declared this as ref struct it's allowed to contain fields with ref-like types. This has always included other ref struct types such as Span<char> (which has been possible since C# 7) and since C# 11.0, it has also included actual ref types like the ref T Value field here.
(Note: if you write a ref struct type, in practice you almost always want it to be readonly. I've avoided that here because I want to show some of what happens when ref-like types escape to wider scopes, and mutable ref struct types create various extra opportunities for that to happen. In fact, that's one of the reasons you normally want them to be readonly.)
That's the new feature. We weren't able to write public ref T Value; in C# 10, but now we can.
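As a quick sketch of what a ref field does in practice, the following repeats the SpanLike<T> type and adds a hypothetical SpanLikeUsage class that writes through the field. It assumes C# 11.0 or later on .NET 7.0 or later.

```csharp
public ref struct SpanLike<T>
{
    public ref T Value;
    public nint Length;

    public SpanLike(ref T value)
    {
        Value = ref value;
        Length = 1;
    }
}

static class SpanLikeUsage
{
    public static int Run()
    {
        int local = 1;
        SpanLike<int> wrapper = new SpanLike<int>(ref local);

        // A plain assignment (no 'ref') writes through the ref field,
        // so this changes the variable that 'Value' refers to.
        wrapper.Value = 99;

        return local; // 99
    }
}
```

Note the difference between Value = ref value, which changes what the field refers to, and Value = 99, which changes the thing it refers to.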
The ref safety rules are essentially the same for this kind of field as they are for ref struct types like Span<T>. However, to get the maximum value out of this new feature, C# needed to make some changes to the ref safety rules.
New capabilities for ref-like arguments
Now that ref fields are a thing (and because, unusually, this ref struct type is not readonly) we can add methods such as these to the SpanLike<T> type:
public void SetRef(SpanLike<T> value)
{
    this.Value = ref value.Value;
}
public void SetRefFromSpan(Span<T> value)
{
    this.Value = ref value[0];
}
Both accept a ref T passed in from outside, and store it in the Value field. (We couldn't do this in C# 10 because we couldn't have a field of type ref T.) These particular examples extract this ref T from another SpanLike<T> and a Span<T> respectively. You might be wondering why I didn't also show a method taking a ref T directly. That is possible but there's a slight catch that I'll show later.
It would be unsafe to call these methods in the way shown here:
void BadUseSetRef(ref SpanLike<int> nonLocalSl)
{
    int local = 123;
    SpanLike<int> localRef = new SpanLike<int>(ref local);
    nonLocalSl.SetRef(localRef);
}
Here, we're being passed a reference to a SpanLike<int> and the final line of the method attempts to modify it so that it refers to a local variable. That would be bad because when BadUseSetRef returns, that local variable ceases to exist, and yet if this code were allowed, the SpanLike<int> passed in by ref as an argument would refer to that local even after BadUseSetRef returned. (In effect, whoever called BadUseSetRef would now have a pointer into a stack frame that no longer exists.)
So the modifications to the ref safety rules in C# 11.0 ensure that the compiler does not allow this. However, this can sometimes create problems for existing libraries.
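By way of contrast, here is a sketch of a call to SetRef that should be acceptable, because everything involved lives for the same length of time. It repeats SpanLike<T> with the SetRef method so that it stands alone; SetRefUsage is a hypothetical class added for the illustration.

```csharp
public ref struct SpanLike<T>
{
    public ref T Value;
    public nint Length;

    public SpanLike(ref T value)
    {
        Value = ref value;
        Length = 1;
    }

    public void SetRef(SpanLike<T> value)
    {
        this.Value = ref value.Value;
    }
}

static class SetRefUsage
{
    public static int Run()
    {
        int first = 1;
        int second = 2;
        SpanLike<int> target = new SpanLike<int>(ref first);
        SpanLike<int> source = new SpanLike<int>(ref second);

        // OK: 'target' and the variable it will now refer to ('second')
        // are both locals of this method, so neither outlives the other.
        target.SetRef(source);

        target.Value = 20;
        return second; // 20
    }
}
```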
Previously valid code now rejected
When the compiler determines whether code such as the call to SetRef in BadUseSetRef is safe, it bases its decisions on the signatures of the methods being called. So even if SetRef had looked like this:
public void SetRef(SpanLike<T> value)
{
}
i.e., even if it had not in fact stashed away the reference being passed in, the C# compiler would still reject it. Part of the rationale for this is that when you're using external libraries, implementations can change, so just because SetRef doesn't stash the reference in V1.0, maybe it will in V1.1. So the compiler makes its decisions based on what a method with SetRef's signature could do with its argument.
However, this ability to define a ref field is new in C# 11. As far as the C# 10 compiler was concerned, there wasn't any problem with the way BadUseSetRef called SetRef, because it was simply impossible for SetRef to hold on to the reference.
This means that BadUseSetRef compiled without error in C# 10. This is intertwined with the fact that if we want to try this on C# 10.0, we'd also need to modify SpanLike<T>—we'd need to remove the ref T field. So the C# 10 version looks like this:
void BadUseSetRef(ref SpanLike<int> nonLocalSl)
{
    int local = 123;
    SpanLike<int> localRef = new SpanLike<int>(ref local);
    nonLocalSl.SetRef(localRef);
}
public ref struct SpanLike<T>
{
    public SpanLike(ref T value)
    {
    }
    public void SetRef(SpanLike<T> value)
    {
    }
}
This makes the SpanLike<T> type completely useless of course. That's because I wrote it specifically to illustrate something you can do only in C# 11 or later, so a C# 10 version necessarily does nothing.
But this lets us illustrate the important point.
If we take this pointless C# 10 version (which is useless, but does compile without error) and attempt to compile it on C# 11, the compiler rejects it! We get this error:
CS8352 Cannot use variable 'localRef' in this context because it may expose referenced variables outside of their declaration scope
We know that in fact it doesn't cause the problem described. But because C# makes its decisions based on method signatures (because we might later on end up running against a newer component version where the method body has changed) it rejects this, even though the exact same code was accepted on C# 10.
This can be rather frustrating: code that worked perfectly well is suddenly rejected as invalid when you update to a new language version.
The example above is contrived, and you wouldn't have written code like that in C# 10 in the first place. And it was the language designers' hope that in practice this change wouldn't be a problem. In the spec for these changes in C# 11, they wrote:
The impact of this compatibility break is expected to be very small. The impacted API shape made little sense in the absence of ref fields
However, in our AIS.NET library, we did in fact run into this problem. That library has readonly ref struct types that retain one or more ReadOnlySpan<T> fields. Since these are immutable, it's impossible to write methods like SetRef, so for the most part the issue I've described doesn't arise. However, constructors of readonly ref structs get to set fields, and some of our constructors made use of helper methods that were fine in C# 10, but which get rejected under the new rules in C# 11.0. (The exact way in which we fell foul of the rules was more complex and subtle than these examples, but the end result was the same: C# 11.0 grants new powers to your code meaning some safety rules had to be tightened up; our code didn't use these new powers because it was written before C# 11.0 existed, so it can't do anything unsafe, but because the new rules err on the side of conservatism to ensure safety, our code is now rejected.)
scoped to the rescue
What can you do if you find that previously OK code is now rejected because it has access to new powers that you neither need nor want? C# introduces a new keyword, scoped, that effectively lets you reject these new powers.
All I need to do is modify the SetRef method to indicate that I do not wish to be able to stash an incoming reference in a field:
public void SetRef(scoped SpanLike<T> value)
{
}
That scoped keyword opts out of the new power. If I tried to write this.Value = ref value.Value; inside this method, the compiler would reject it, because the scoped keyword is effectively a promise not to do that.
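A scoped parameter can still be read; it just can't be captured. The sketch below uses a hypothetical CopyValueFrom method (not part of the earlier examples) that copies the T that the argument refers to, rather than the reference. Because of scoped, a caller in the same position as BadUseSetRef should now be accepted.

```csharp
public ref struct SpanLike<T>
{
    public ref T Value;
    public nint Length;

    public SpanLike(ref T value)
    {
        Value = ref value;
        Length = 1;
    }

    // Hypothetical method: 'scoped' promises not to capture 'value',
    // so only the T it refers to is copied, not the reference itself.
    public void CopyValueFrom(scoped SpanLike<T> value)
    {
        this.Value = value.Value;
    }
}

static class ScopedUsage
{
    public static void CopyFromLocal(ref SpanLike<int> nonLocal)
    {
        int local = 123;
        SpanLike<int> localRef = new SpanLike<int>(ref local);

        // Accepted: the signature guarantees 'localRef' cannot end up stored in 'nonLocal'.
        nonLocal.CopyValueFrom(localRef);
    }

    public static int Run()
    {
        int target = 0;
        SpanLike<int> wrapper = new SpanLike<int>(ref target);
        CopyFromLocal(ref wrapper);
        return target; // 123
    }
}
```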
Unscoped
Earlier, in the version of SpanLike<T> that does stash references, I showed a couple of methods for setting the reference, one taking another SpanLike<T> (SetRef) and another taking a Span<T> (SetRefFromSpan). But I did not include the obvious way of doing this, which would be to define a method that just takes a ref T directly. You might expect that to look like this:
public ref struct SpanLike<T>
{
    public ref T Value;
    public nint Length;
    public SpanLike(ref T value)
    {
        Value = ref value;
        Length = 1;
    }
    public void SetRef(ref T value)
    {
        Value = ref value;
        Length = 1;
    }
}
This looks like it should work. After all, SetRef is basically identical to the constructor. And yet, I get this error:
CS9079 Cannot ref-assign 'value' to 'Value' because 'value' can only escape the current method through a return statement.
I've not been able to find a clear explanation of the rationale for this, but I suspect it's because ref parameters have been around since the dawn of C#, so there's more chance of existing code falling foul of a change to the rules. Whatever the reason, the fact is that although C# 11's new ability to capture references seems to be on by default for arguments where the reference is wrapped in a ref struct (such as Span<T>), it is off by default for plain ref arguments. So if you want the new powers with a method of this form, you need to opt in explicitly:
// [UnscopedRef] lives in the System.Diagnostics.CodeAnalysis namespace.
public void SetRef([UnscopedRef] ref T value)
{
    Value = ref value;
    Length = 1;
}
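Here is a sketch of how a caller might use this opted-in version. It repeats the type so that it stands alone, and adds a hypothetical UnscopedUsage class. I'm assuming that the caller's locals are declared in the same block, so that the captured reference does not outlive the variable it refers to.

```csharp
using System.Diagnostics.CodeAnalysis;

public ref struct SpanLike<T>
{
    public ref T Value;
    public nint Length;

    public SpanLike(ref T value)
    {
        Value = ref value;
        Length = 1;
    }

    public void SetRef([UnscopedRef] ref T value)
    {
        Value = ref value;
        Length = 1;
    }
}

static class UnscopedUsage
{
    public static int Run()
    {
        int first = 1;
        int second = 2;
        SpanLike<int> wrapper = new SpanLike<int>(ref first);

        // The method may capture the reference, so 'wrapper' now refers to 'second'.
        wrapper.SetRef(ref second);

        wrapper.Value = 20;
        return second; // 20
    }
}
```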
It seems a little peculiar that the opt out is a keyword (scoped) but the opt in is an attribute ([UnscopedRef]). There is a discussion of the keywords vs. attributes issue in the spec, but it's not at all clear to me how these arguments lead to two of the scenarios (scoped and scoped ref) being keyword-based and the third ([UnscopedRef]) being attribute-based. I can only guess that the unscoped case was considered to be less common, and therefore slightly less deserving of being a keyword.
Conclusion
Since C# 11.0, we've been able to define fields that hold references to values. This gives us pointer-like behaviour (which can unlock pointer-like performance in some useful scenarios), but with the type safety we expect from C#. Occasionally those type safety rules can create problems, but C# 11.0 also added some tools (in the form of the scoped keyword and [UnscopedRef] attribute) enabling us to express our intentions more clearly, in a way that can avoid falling foul of the new type safety rules.