C# 12.0: inline arrays
For most of .NET's existence, arrays have been distinct objects on the heap. C# 12.0 makes it possible for a fixed-size array to live entirely inside another data type, just like fields do. This is useful in some performance-sensitive scenarios, and can also be helpful when interoperating with some operating system APIs, or libraries written for other languages.
Classic .NET arrays
Arrays have always been an integral part of .NET's type system. At first, they were the only constructed type (a type that takes some other type as a parameter) until .NET 2.0 added generics. They are also one of the two types we get to use in C# where different instances can have different sizes (the other being string).
The variable size is why each array gets its very own block in the garbage-collected heap (and likewise for string). The .NET runtime has no general provision for enabling different instances of the same type to take up different amounts of space: it handles string and arrays as special cases. (It's not that this would be fundamentally impossible. String and array objects get an extra header field holding the length, and it's not hard to imagine some hypothetical variant of .NET that made it possible for other types to do something similar. But that's not something .NET actually does.)
So although we can embed values inside another object, we can't embed a variable-length array. If I declare a class called Numbers, the .NET runtime requires every instance of that type to be exactly the same size. So if this type has, say, four fields all of type int, .NET knows each instance of Numbers needs 16 bytes of space to hold those fields (4 bytes for each int). But if I want Numbers to have an int[] field, the actual array data can't live inside the Numbers instance, because that would mean the size of a Numbers object on the heap would depend on how many elements were in its array. To ensure that every Numbers instance can have a fixed size, the array field will hold a reference (8 bytes in a 64-bit process) and the actual data for the array lives in a separate array object. (And that array object gets to be any size it wants because arrays are special.)
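The difference between embedded fields and a reference to a separate array can be observed with a small sketch. It uses structs rather than the Numbers class described above, purely because Unsafe.SizeOf reports the space taken by a struct's fields directly. The type names FourInts, IntArrayHolder and FieldSizes are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical type: four int fields stored directly inside each value.
public struct FourInts
{
    public int A, B, C, D;
}

// Hypothetical type: a single field that refers to a separate array object.
public struct IntArrayHolder
{
    public int[] Values;
}

public static class FieldSizes
{
    // Space taken by the fields of FourInts: four ints of 4 bytes each.
    public static int InlineFieldsSize() => Unsafe.SizeOf<FourInts>();

    // Space taken by IntArrayHolder: one reference, regardless of how
    // many elements the array it refers to happens to contain.
    public static int ReferenceFieldSize() => Unsafe.SizeOf<IntArrayHolder>();
}
```

InlineFieldsSize should report 16, while ReferenceFieldSize should report the size of a reference (IntPtr.Size, which is 8 in a 64-bit process). The array's elements never count towards the size of the type that holds the reference.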
Fixed-size arrays before C# 12.0 and .NET 8.0
But what if I have a data type where I always want exactly the same array length? For example, that's essentially what .NET's Vector3 type is: conceptually it's just a 3-element array. But if you look at the source code you'll see that it does not use an array. It declares three fields:
public partial struct Vector3 : IEquatable<Vector3>, IFormattable
{
/// <summary>The X component of the vector.</summary>
public float X;
/// <summary>The Y component of the vector.</summary>
public float Y;
/// <summary>The Z component of the vector.</summary>
public float Z;
Why is that? This is pretty much exactly the job arrays are designed for—in fact some computer science texts use the word 'vector' to describe what most programmers would call an array.
Up until C# 12.0, the only kind of array available was the classic kind described in the preceding section: the kind that supports variable length whether or not you need that. So if Vector3 had looked like this:
// NOT how it's really implemented
public partial struct Vector3 : IEquatable<Vector3>, IFormattable
{
/// <summary>The components of the vector.</summary>
public float[] XYZ;
that one field would be a reference to a separate array object on the GC heap.
Notice that this type is a struct, so consider what such a change would mean if we do this:
Vector3[,,] allFields = new Vector3[100, 100, 100];
This declares a three-dimensional rectangular array of Vector3 elements. And the thing about rectangular arrays of struct types is that you get just a single object for the whole array. There are 1,000,000 elements in this array, but it's just one object on the heap. It makes efficient use of space: whereas each object on the GC heap has a header (16 bytes per object in a 64-bit process), the CLR doesn't need to create a header for each value in an array. With Vector3 as it is actually defined (with the three float fields) this requires 12,000,024 bytes of memory. Each float is 4 bytes, so a Vector3 containing three of those requires 12 bytes. That implies a minimum of 12 million bytes to hold that data, but there's a little bit of overhead here: array objects on the heap have the usual 16 byte object header, plus another 8 bytes to hold the array length, which is why it's 24 more than the bare minimum. But as a proportion of the whole, 24 bytes of overhead on a 12,000,000 byte array is pretty small.
Now let's consider what would happen if instead of the three float fields, we had that single float[] field.
Each Vector3 would now need to create a 3-element array on the GC heap. That would require 12 bytes of data, the 16 byte object header, and 8 bytes of length, a total of 36 bytes. However, in a 64-bit process the .NET runtime aligns heap objects to 64-bit (8-byte) boundaries, so this gets rounded up to 40 bytes. (In the million-element array example I just showed, there was no rounding because 12,000,024 is already a multiple of 8.) In addition to this array, a Vector3 instance itself would require 8 bytes (to hold its float[] XYZ field, which is a reference to an object on the GC heap).
This figure illustrates the effect of this change on how an array of Vector3 items would look in memory. The actual implementation is shown on the left. The middle shows the modified version that uses a float[] array, and the per-row array objects that this entails are shown on the right.
Let's work out what our new Vector3[100, 100, 100] would mean for memory consumption if implemented this way. We'd need 8,000,024 bytes for that array (a million 8-byte references, plus the heap block and array length overhead). Since Vector3 is a value type, all 1 million instances can live inside the array, so we don't need a separate heap block for each Vector3 instance. So far, this is smaller, because a single array field takes less space than three float fields. However, each Vector3 needs its own 3-element array on the heap for that float[] XYZ field to refer to. As we just worked out, that will require 40 bytes per Vector3, and since there are a million of these, that's going to need an extra 40,000,000 bytes. This adds up to a total of 48,000,024 bytes.
So we'd be using very nearly four times as much memory. And we'd also be giving the garbage collector a massive amount of extra work to do: it would have to work far harder to deal with a million and one arrays on the heap than it does for the single array we get when using fields.
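The arithmetic behind these totals can be reproduced in a few lines. This is only a sketch of the calculation as the text describes it for a 64-bit process (a 16 byte object header, 8 bytes of array length, 8-byte references and 8-byte heap alignment); it computes estimates from those figures and does not measure a running program. The class and member names are hypothetical.

```csharp
public static class Vector3MemoryEstimate
{
    // Figures taken from the text, for a 64-bit process.
    private const long ObjectHeaderBytes = 16;
    private const long ArrayLengthBytes = 8;
    private const long ReferenceBytes = 8;
    private const long FloatBytes = 4;
    private const long ElementCount = 100L * 100 * 100;

    // Rounds a size up to the next multiple of 8 (the heap alignment).
    private static long Align8(long bytes) => (bytes + 7) / 8 * 8;

    // One array whose elements each hold three floats directly.
    public static long WithThreeFloatFields() =>
        Align8(ObjectHeaderBytes + ArrayLengthBytes + ElementCount * 3 * FloatBytes);

    // The separate 3-element float[] that each element would need.
    public static long PerVectorFloatArray() =>
        Align8(ObjectHeaderBytes + ArrayLengthBytes + 3 * FloatBytes);

    // One array of references, plus one small float[] per element.
    public static long WithFloatArrayField() =>
        Align8(ObjectHeaderBytes + ArrayLengthBytes + ElementCount * ReferenceBytes)
        + ElementCount * PerVectorFloatArray();
}
```

With these inputs, WithThreeFloatFields gives 12,000,024, PerVectorFloatArray gives 40, and WithFloatArrayField gives 48,000,024, matching the totals worked out above.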
So no wonder Vector3 uses fields, not an array.
Inline arrays in C# 12.0 and .NET 8.0
Declaring a bunch of fields for something that logically represents a single list is a viable workaround, but it can get tedious. You can't write loops over a set of fields. (You could copy the fields into an array to enable iteration, but since the whole point here was to avoid creating lots of tiny arrays on the heap, that would be a performance fail.) So C# 12.0 and .NET 8.0 made it possible to define an inline array type.
[InlineArray(3)]
public struct ThreeFloats
{
private float element;
}
That might not look much like an array. But that InlineArray attribute triggers behaviour in the .NET runtime (new in .NET 8.0): the CLR knows that we want it to expand that float element field into a three-element array.
So thanks to this new behaviour in the .NET runtime, this will give us the three-element array we want, despite how it looks. And C# 12.0 also understands this new attribute, so when we declare a variable of type ThreeFloats, we can use normal array syntax. For example:
public struct Vector3
{
private ThreeFloats xyz;
public Vector3(float x, float y, float z)
{
xyz[0] = x;
xyz[1] = y;
xyz[2] = z;
}
public float X
{
get => xyz[0];
// In general, mutable structs are a bad idea, but the real
// Vector3 is mutable, so this example follows suit.
set => xyz[0] = value;
}
}
So we're now using what looks like ordinary array syntax, but when it comes to memory usage, this works in exactly the same way as the original code with three separate float fields. Just as those hold the data directly inside a Vector3 instance (and not in a separate array object), so does this. So even though we're able to use this xyz field just like a normal array, it does not come with the overheads of an array.
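Inline arrays also address the earlier complaint that you can't write loops over a set of fields. The following sketch redeclares the ThreeFloats type so that it is self-contained, then iterates over a value with foreach and views it through a Span<float>. The ThreeFloatsDemo class and its method names are hypothetical.

```csharp
using System;
using System.Runtime.CompilerServices;

[InlineArray(3)]
public struct ThreeFloats
{
    private float element;
}

public static class ThreeFloatsDemo
{
    // Fills an inline array using index syntax, then sums it with foreach.
    public static float SumWithForeach()
    {
        ThreeFloats values = default;
        values[0] = 1.0f;
        values[1] = 2.0f;
        values[2] = 3.0f;

        float total = 0;
        foreach (float value in values)
        {
            total += value;
        }
        return total;
    }

    // Converts an inline array to a span and modifies it through the span.
    public static float DoubleFirstThroughSpan()
    {
        ThreeFloats values = default;
        values[0] = 1.5f;

        // The span refers to the storage of the local variable itself,
        // so no separate array object is created on the heap.
        Span<float> span = values;
        span[0] *= 2;
        return values[0];
    }
}
```

Because the span points at the inline array's own storage, the change made through span[0] is visible when values[0] is read back afterwards.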
The attribute-based syntax for defining a fixed-size array type is a little peculiar: it's not immediately obvious that ThreeFloats is an array type. The merit of this approach is that it didn't require any new syntax, or any modifications to the metadata format. If a program running on a runtime that did not support this feature (e.g. .NET 6.0 or .NET Framework) loaded a component that defined a type like ThreeFloats using the metadata-only loading mechanisms (which make it possible to inspect a component built for some runtime other than the one you're running on), it would recognize this as a struct with an attribute applied: something that it has always been possible to write. This is something the existing metadata mechanisms in .NET have always been able to represent, so this required only a change to the runtime, and not to any file formats. (You won't be able to use this type on older runtimes, but at least old tooling will not be broken by its presence.) Likewise, this didn't require any new syntax to be added to C#. The only change is that C# recognizes this as an array type, and lets you use array syntax with it, something older versions of the language would not permit here.
The fact that this array has a fixed size means it no longer needs its own heap object: the values it contains can live as a field inside Vector3 just like any other value type. So our million-element array ends up using 12,000,024 bytes with this example, exactly the same as with three individual float fields.
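One way to check that the two layouts take the same space is to compare their sizes with Unsafe.SizeOf. In this sketch, InlineArrayVector and FieldVector are hypothetical stand-ins for the two versions of Vector3 discussed above, not the real System.Numerics type.

```csharp
using System.Runtime.CompilerServices;

[InlineArray(3)]
public struct ThreeFloats
{
    private float element;
}

// Hypothetical stand-in for the inline-array version of Vector3.
public struct InlineArrayVector
{
    public ThreeFloats Xyz;
}

// Hypothetical stand-in for the version with three separate fields.
public struct FieldVector
{
    public float X, Y, Z;
}

public static class VectorSizes
{
    public static int InlineArrayVectorSize() => Unsafe.SizeOf<InlineArrayVector>();

    public static int FieldVectorSize() => Unsafe.SizeOf<FieldVector>();
}
```

Both methods should report 12 bytes: three 4-byte floats, with no object header and no reference to a separate array.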
(A corollary of this is that inline arrays must always have a fixed size. The .NET runtime still only supports variable-length objects for string and classic array objects.)
Interop
Some operating system APIs use data structures that include fixed-size inline arrays. (This has always been a straightforward thing to do in C and C++, and all widely used modern operating systems have APIs designed to be used from C.)
Historically, interoperating with APIs of this kind in .NET typically required us to declare a lot of fields to stand in for the fixed size array. But with this new feature, that is no longer necessary.
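As a rough illustration, suppose a C API defined a structure containing a 32-bit identifier followed by a 16-byte buffer. That native structure is invented for this example, and so are the names Bytes16, Sample and SampleDemo. The sketch shows only how the C# declaration might look with an inline array; it does not include a P/Invoke signature or make any claims about marshalling behaviour.

```csharp
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical native declaration this is meant to mirror:
//   struct Sample { int32_t id; uint8_t data[16]; };

[InlineArray(16)]
public struct Bytes16
{
    private byte element;
}

[StructLayout(LayoutKind.Sequential)]
public struct Sample
{
    public int Id;

    // One inline array field in place of sixteen separate byte fields.
    public Bytes16 Data;
}

public static class SampleDemo
{
    // Builds a Sample whose buffer holds the values 0 to 15.
    public static Sample Create(int id)
    {
        Sample sample = default;
        sample.Id = id;
        for (int i = 0; i < 16; i++)
        {
            sample.Data[i] = (byte)i;
        }
        return sample;
    }
}
```

The Data field can be filled and read with ordinary index syntax, and the buffer's bytes are stored inside each Sample value rather than in a separate heap object.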
C-influenced APIs often use a trick where they appear to declare a fixed-sized array as the final field of some structure, but where this array may actually have a dynamically-determined size. C# 12.0's inline array feature does not make any attempt to model that style of API. Those continue to be somewhat awkward to work with in C#.
Summary
C# 12.0 enables us to define data types which support array syntax, but which work like normal value types, in that they don't need to have their own dedicated heap objects. We can use inline array types as local variables or fields, enabling more efficient memory usage than might be possible with ordinary arrays.