By Matthew Adams, Co-Founder · 5 min read
Introducing Corvus.Text.Json V5: Extended Types

At endjin, we maintain Corvus.JsonSchema, and in the previous post we looked at JSON Pointer resolution.

JSON has a deliberately simple type system - strings, numbers, booleans, null, objects, and arrays. But the data those types carry is often richer than the JSON grammar suggests. A string might be a URI. A number might have 50 significant digits. A date-time might need proper time zone handling. V5 extends the core type system with first-class support for all of these.

UTF-8 URIs and IRIs

JSON Schema defines four URI-related format keywords: uri, uri-reference, iri, and iri-reference. V5 validates and parses all four with zero-allocation ref struct types that operate directly on the UTF-8 bytes in the document buffer.

Utf8Uri and Utf8Iri

Utf8Uri is a readonly ref struct that parses a URI from a ReadOnlySpan<byte> without allocating. It gives you access to every component - scheme, authority, user, host, port, path, query, and fragment - as ReadOnlySpan<byte> slices into the original buffer:

Utf8Uri uri = Utf8Uri.CreateUri(
    "https://api.example.com:8080/v1/users?active=true#top"u8);

// Each component is a ReadOnlySpan<byte> slice - no allocation
ReadOnlySpan<byte> scheme = uri.Scheme;       // "https"
ReadOnlySpan<byte> host = uri.Host;           // "api.example.com"
ReadOnlySpan<byte> path = uri.Path;           // "/v1/users"
ReadOnlySpan<byte> query = uri.Query;         // "active=true"
ReadOnlySpan<byte> fragment = uri.Fragment;    // "top"
int port = uri.PortValue;                      // 8080

For schema-generated types with "format": "uri", the code generator emits a TryGetValue method and an explicit conversion operator:

// Schema: { "type": "string", "format": "uri" }
// Generated type: MyEndpoint

if (endpoint.TryGetValue(out Utf8UriValue uriValue))
{
    using (uriValue)
    {
        Utf8Uri uri = uriValue.Uri;
        // Access components via uri.Scheme, uri.Host, uri.Path, etc.
    }
}

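// Or via explicit cast (throws FormatException if invalid).
// A different variable name is used here because the out variable
// above is still in scope if both snippets share a block.
using Utf8UriValue castValue = (Utf8UriValue)endpoint;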

Utf8UriValue is a regular (non-ref) struct that owns its backing memory. It implements IDisposable, so always dispose it with a using statement or a using declaration to return the backing buffer to the pool.
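To illustrate why disposal matters for a struct that owns pooled memory, here is a minimal sketch built on ArrayPool<byte> from the .NET base class library. PooledUtf8Value is a hypothetical type invented for this example. It is not the Utf8UriValue implementation, and the real type may manage its buffer differently.

```csharp
#nullable enable
using System;
using System.Buffers;

// Hypothetical illustration only; not the Corvus Utf8UriValue type.
public struct PooledUtf8Value : IDisposable
{
    private byte[]? rented;      // null once the buffer has been returned
    private readonly int length; // Rent may hand back a larger array than requested

    public PooledUtf8Value(ReadOnlySpan<byte> utf8)
    {
        rented = ArrayPool<byte>.Shared.Rent(utf8.Length);
        utf8.CopyTo(rented);
        length = utf8.Length;
    }

    public bool IsDisposed => rented is null;

    public ReadOnlySpan<byte> Span
    {
        get
        {
            if (rented is null)
            {
                throw new ObjectDisposedException(nameof(PooledUtf8Value));
            }

            return rented.AsSpan(0, length);
        }
    }

    public void Dispose()
    {
        byte[]? buffer = rented;
        if (buffer is not null)
        {
            // Clear the field first so a second Dispose call is a no-op.
            rented = null;
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

If Dispose is never called, the rented array is never handed back and the pool has to allocate a replacement. Copies of a struct like this share the same array, which is another reason to dispose it exactly once, in one place.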

Canonical and display forms

URIs have two standard string representations. The canonical form percent-encodes reserved characters for safe transmission. The display form decodes those sequences for human readability:

Utf8Uri uri = Utf8Uri.CreateUri(
    "https://example.com/caf%C3%A9?q=hello%20world"u8);

// Display form: decodes percent-encoded sequences for readability
// "https://example.com/café?q=hello world"
string display = uri.ToString();

// Canonical form: percent-encodes reserved characters for safe transmission
Span<byte> buffer = stackalloc byte[256];
if (uri.TryFormatCanonical(buffer, out int written))
{
    // "https://example.com/caf%C3%A9?q=hello%20world"
    ReadOnlySpan<byte> canonical = buffer.Slice(0, written);
}

// Display form as UTF-8 bytes
if (uri.TryFormatDisplay(buffer, out written))
{
    ReadOnlySpan<byte> displayUtf8 = buffer.Slice(0, written);
}

Both TryFormatCanonical and TryFormatDisplay write directly to a Span<byte> with no allocation. ToString() is the convenience alternative that allocates a string for the display form.
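The relationship between the two forms can be seen on a single path segment or query value using only the base class library. This is a sketch with System.Uri's static helpers, not the V5 API. The class and method names are invented for the example, and unlike the span-based methods above, these helpers allocate strings.

```csharp
using System;

// Illustration using BCL helpers; names are hypothetical.
public static class PercentEncodingSketch
{
    // Display text -> canonical text: non-ASCII characters are encoded as
    // UTF-8 and each byte is written as %XX; spaces become %20.
    public static string ToCanonical(string displayComponent)
        => Uri.EscapeDataString(displayComponent);

    // Canonical text -> display text: %XX sequences are decoded as UTF-8.
    public static string ToDisplay(string canonicalComponent)
        => Uri.UnescapeDataString(canonicalComponent);
}
```

PercentEncodingSketch.ToCanonical("café") returns "caf%C3%A9", because é is the two UTF-8 bytes C3 A9, and PercentEncodingSketch.ToDisplay("hello%20world") returns "hello world". These helpers work on one component at a time. Applied to a whole URI they would also escape the delimiters.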

Why not System.Uri?

System.Uri merges several distinct RFC concepts into a single type. It handles absolute URIs, relative references, and IRIs all through one class, which can be confusing. A method that accepts System.Uri gives no indication of whether it expects an absolute URI, a relative reference, or an IRI. V5 separates these into distinct types (Utf8Uri, Utf8UriReference, Utf8Iri, Utf8IriReference) so the semantic intent is clear at the API boundary.

Beyond the type-safety question, System.Uri allocates a managed string and normalises the URI, which can change its representation. The Utf8 variants validate and decompose the URI in place, with no allocation and no normalisation surprises. For JSON Schema format validation, this means checking whether a string is a valid uri-reference costs nothing beyond the parse itself.

All four types are derived from the .NET runtime's own System.Uri parser, rewritten to operate on UTF-8 spans rather than managed strings.
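Both System.Uri behaviours described above are easy to reproduce. The sketch below uses only System.Uri itself. The class and method names are invented for the example.

```csharp
using System;

// Illustration of System.Uri behaviour; names are hypothetical.
public static class SystemUriBehaviour
{
    // Construction normalises: scheme and host are lower-cased, the default
    // port is dropped, and dot segments in the path are collapsed.
    public static string Normalised(string absoluteUri)
        => new Uri(absoluteUri, UriKind.Absolute).AbsoluteUri;

    // The signature cannot say whether an absolute URI is required.
    // The caller only finds out at run time.
    public static string HostOrExplanation(Uri uri)
        => uri.IsAbsoluteUri ? uri.Host : "(relative reference: no host)";
}
```

SystemUriBehaviour.Normalised("HTTP://EXAMPLE.com:80/a/../b") returns "http://example.com/b", which is no longer the text that was supplied. Passing new Uri("/v1/users", UriKind.Relative) to HostOrExplanation compiles without complaint, and reading Host on it directly would throw InvalidOperationException.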

Arbitrary-precision numerics

JSON has no precision limit on numbers. The literal 99999999999999999999999999999.123456789 is perfectly valid JSON. But double gives you about 15 to 17 significant digits, and decimal gives you 28 to 29. Digits beyond that are silently rounded away.
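The loss is easy to observe with the built-in types. This sketch round-trips number text through double, decimal, and System.Numerics.BigInteger (the BCL type, used here only as an exact baseline). The class and method names are invented for the example.

```csharp
using System.Globalization;
using System.Numerics;

// Illustration only; names are hypothetical.
public static class PrecisionSketch
{
    // Parse as double, then print the shortest text that round-trips.
    public static string ViaDouble(string number)
        => double.Parse(number, CultureInfo.InvariantCulture)
                 .ToString("R", CultureInfo.InvariantCulture);

    // Parse as decimal; fractional digits that do not fit are rounded off.
    public static string ViaDecimal(string number)
        => decimal.Parse(number, NumberStyles.Float, CultureInfo.InvariantCulture)
                  .ToString(CultureInfo.InvariantCulture);

    // Parse as an unbounded integer; every digit is kept.
    public static string ViaBigInteger(string integer)
        => BigInteger.Parse(integer, CultureInfo.InvariantCulture)
                     .ToString(CultureInfo.InvariantCulture);
}
```

ViaDouble("12345678901234567890.123456789") and ViaDecimal("1.2345678901234567890123456789012345") both return text that differs from the input, and neither call raises an error. ViaBigInteger returns a 40-digit integer unchanged.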

In practice, you will almost never need arbitrary-precision types. The vast majority of JSON numbers fit comfortably in int, long, double, or decimal. The right approach is to use the format keyword in your schema to bound your numeric types appropriately. Use "format": "int32", "format": "double", "format": "decimal", and so on. The code generator will then select the matching .NET type, and you get compile-time safety for free.

BigNumber and BigInteger exist for the vanishingly small number of scenarios where unbounded precision is genuinely required. That includes cryptographic values, scientific datasets with extreme precision, or financial interop where the source system sends numbers beyond 28 significant digits.

How V5 handles numbers internally

V5 never converts a JSON number to a floating-point type during validation or comparison. Instead, it parses the raw UTF-8 bytes into normalised components:

| Component | Type | Example for 1.200e3 |
| --- | --- | --- |
| isNegative | bool | false |
| integral | ReadOnlySpan<byte> | "1" |
| fractional | ReadOnlySpan<byte> | "2" |
| exponent | int | 2 |

All comparison and validation operate on these components. A 500-digit JSON number is compared exactly.
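To make the idea concrete, here is a minimal sketch of comparing two JSON number literals through normalised components, with no floating-point conversion. It is not the V5 implementation. It works on strings rather than UTF-8 spans for brevity, it merges the integral and fractional digits into a single digit string, and it assumes the input is already a syntactically valid JSON number. All names are hypothetical.

```csharp
using System;
using System.Globalization;

// Hypothetical sketch; not the Corvus parser.
public static class JsonNumberSketch
{
    // value = (IsNegative ? -1 : 1) * Digits * 10^Exponent.
    // Digits has no leading or trailing zeros; "" means zero.
    public readonly record struct Normalised(bool IsNegative, string Digits, int Exponent);

    public static Normalised Normalise(string json)
    {
        bool negative = json[0] == '-';
        int start = negative ? 1 : 0;

        int ePos = json.IndexOfAny(new[] { 'e', 'E' });
        int exponent = ePos >= 0
            ? int.Parse(json.Substring(ePos + 1), CultureInfo.InvariantCulture)
            : 0;
        string mantissa = ePos >= 0 ? json.Substring(start, ePos - start) : json.Substring(start);

        int dot = mantissa.IndexOf('.');
        string integral = dot >= 0 ? mantissa.Substring(0, dot) : mantissa;
        string fractional = dot >= 0 ? mantissa.Substring(dot + 1) : string.Empty;

        // Fold the fractional digits into the exponent: 1.200e3 -> 1200e0.
        string digits = integral + fractional;
        exponent -= fractional.Length;

        // Strip trailing zeros, raising the exponent: 1200e0 -> 12e2.
        int end = digits.Length;
        while (end > 0 && digits[end - 1] == '0')
        {
            end--;
            exponent++;
        }

        digits = digits.Substring(0, end).TrimStart('0');
        return digits.Length == 0
            ? new Normalised(false, string.Empty, 0)
            : new Normalised(negative, digits, exponent);
    }

    // Returns -1, 0, or 1.
    public static int Compare(string left, string right)
    {
        Normalised a = Normalise(left);
        Normalised b = Normalise(right);
        bool aZero = a.Digits.Length == 0;
        bool bZero = b.Digits.Length == 0;

        if (aZero && bZero) return 0;
        if (aZero) return b.IsNegative ? 1 : -1;
        if (bZero) return a.IsNegative ? -1 : 1;
        if (a.IsNegative != b.IsNegative) return a.IsNegative ? -1 : 1;

        int magnitude = CompareMagnitude(a, b);
        return a.IsNegative ? -magnitude : magnitude;
    }

    private static int CompareMagnitude(Normalised a, Normalised b)
    {
        // Position of the most significant digit decides first.
        long aTop = (long)a.Digits.Length + a.Exponent;
        long bTop = (long)b.Digits.Length + b.Exponent;
        if (aTop != bTop) return aTop < bTop ? -1 : 1;

        // Same magnitude: compare digit by digit, padding with zeros.
        int length = Math.Max(a.Digits.Length, b.Digits.Length);
        for (int i = 0; i < length; i++)
        {
            char da = i < a.Digits.Length ? a.Digits[i] : '0';
            char db = i < b.Digits.Length ? b.Digits[i] : '0';
            if (da != db) return da < db ? -1 : 1;
        }

        return 0;
    }
}
```

With this sketch, Normalise("1.200e3") yields the digits "12" with exponent 2, matching the table above, and Compare("1.200e3", "1200") returns 0. Two 40-digit integers that differ only in their last digit compare as different, which a double-based comparison cannot do.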

BigNumber and BigInteger

When you do need to materialise an arbitrary-precision value, there are two types. BigNumber handles decimal numbers (with a fractional part or exponent), while BigInteger handles integers of unlimited size:

using Corvus.Numerics;

// Arbitrary-precision decimal
BigNumber decimalValue = element.GetBigNumber();
BigNumber result = decimalValue * 2 + BigNumber.Parse("0.001");

// Arbitrary-precision integer
BigInteger intValue = element.GetBigInteger();

BigNumber stores a BigInteger significand and an int exponent (value = significand × 10^exponent). Both types implement INumber<T> on .NET 9+, so they work with generic math APIs.
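The significand × 10^exponent representation makes exact arithmetic straightforward. The following sketch shows the idea with System.Numerics.BigInteger from the base class library. ScaledDecimal is a hypothetical type written for this example. It is not Corvus's BigNumber and says nothing about how BigNumber is implemented.

```csharp
using System;
using System.Numerics;

// Hypothetical illustration; not the Corvus BigNumber type.
// value = Significand * 10^Exponent
public readonly record struct ScaledDecimal(BigInteger Significand, int Exponent)
{
    // Multiplication: multiply significands, add exponents.
    public static ScaledDecimal operator *(ScaledDecimal a, ScaledDecimal b)
        => new(a.Significand * b.Significand, a.Exponent + b.Exponent);

    // Addition: rescale both operands to the smaller exponent, then add.
    public static ScaledDecimal operator +(ScaledDecimal a, ScaledDecimal b)
    {
        int exponent = Math.Min(a.Exponent, b.Exponent);
        BigInteger left = a.Significand * BigInteger.Pow(10, a.Exponent - exponent);
        BigInteger right = b.Significand * BigInteger.Pow(10, b.Exponent - exponent);
        return new(left + right, exponent);
    }
}
```

Here 1.5 is (15, -1) and 0.001 is (1, -3), so their sum is (1501, -3), which is exactly 1.501. The sketch does not normalise results, so (30, -1) and (3, 0) hold the same value but are different instances. A production type would need to account for that in equality and comparison.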

Formatting

BigNumber implements IFormattable, ISpanFormattable, and IUtf8SpanFormattable on .NET 9+, and the static formatting methods are available on all targets including netstandard2.0. It works with string interpolation, String.Format, and direct span formatting. All the standard numeric format specifiers are supported:

BigNumber value = BigNumber.Parse("12345678901234567890.123456789");

value.ToString("G", CultureInfo.InvariantCulture);   // General: "12345678901234567890.123456789"
value.ToString("F2", CultureInfo.InvariantCulture);  // Fixed-point: "12345678901234567890.12"
value.ToString("N0", CultureInfo.InvariantCulture);  // Number with grouping: "12,345,678,901,234,567,890"
value.ToString("E3", CultureInfo.InvariantCulture);  // Scientific: "1.235E+019"
value.ToString("C", CultureInfo.GetCultureInfo("en-GB"));  // Currency: "£12,345,678,901,234,567,890.12"

For zero-allocation formatting, write directly to a UTF-8 byte span:

Span<byte> buffer = stackalloc byte[128];
if (value.TryFormat(buffer, out int bytesWritten, "F2", CultureInfo.InvariantCulture))
{
    ReadOnlySpan<byte> utf8Result = buffer.Slice(0, bytesWritten);
    // Use utf8Result directly - no string allocation
}

Extended numeric types in code generation

The code generator reads the JSON Schema format keyword to select the appropriate .NET type:

| Format | .NET type | Notes |
| --- | --- | --- |
| "int32" | int | |
| "int64" | long | |
| "int128" | Int128 | .NET 9+ only; falls back to long on netstandard2.0 |
| "uint128" | UInt128 | .NET 9+ only; falls back to ulong on netstandard2.0 |
| "half" | Half | .NET 9+ only; falls back to double on netstandard2.0 |
| "single" | float | |
| "double" | double | |
| "decimal" | decimal | |
| (none, type: integer) | long | Default for unformatted integers |
| (none, type: number) | double | Default for unformatted numbers |

For types that are only available on modern .NET, the code generator emits #if NET guards with appropriate fallbacks.
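As a rough illustration of that pattern, a conditional-compilation guard with a fallback might look like the sketch below. This is not generator output. The class name is invented, and the sketch uses the narrower NET7_0_OR_GREATER symbol rather than NET because Int128 first shipped in .NET 7.

```csharp
using System;
using System.Globalization;

// Hypothetical illustration of a guard with a fallback; not generated code.
public static class WideIntegerSketch
{
#if NET7_0_OR_GREATER
    // Modern targets: the full 128-bit range is available.
    public static Int128 Parse(string text)
        => Int128.Parse(text, CultureInfo.InvariantCulture);
#else
    // Older targets: fall back to the widest built-in integer.
    public static long Parse(string text)
        => long.Parse(text, CultureInfo.InvariantCulture);
#endif
}
```

The return type differs between targets, so callers that need the full 128-bit range have to compile against a modern target themselves.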

NodaTime integration

If you work with dates and times in .NET, NodaTime is the de facto library for rich date and time handling. It helps you think about your data more clearly and express operations on that data more precisely. V5 includes built-in UTF-8 parsers for ISO 8601 formats that produce NodaTime types directly, without going through DateTime or DateTimeOffset as an intermediate step.

| JSON Schema format | NodaTime type | Example value |
| --- | --- | --- |
| "date" | LocalDate | "2026-05-31" |
| "date-time" | OffsetDateTime | "2026-05-31T10:30:00+01:00" |
| "time" | OffsetTime | "10:30:00+01:00" |
| "duration" | Period | "P1Y2M3DT4H5M6S" |

When the code generator encounters these format keywords, the generated types automatically include NodaTime-typed accessors alongside the standard .NET ones:

// Generated from a schema with "format": "date-time"
OffsetDateTime when = calendarEvent.When.GetOffsetDateTime();

// The standard .NET accessor is also available
DateTimeOffset whenDto = calendarEvent.When.GetDateTimeOffset();

The parsers operate directly on the UTF-8 bytes in the document buffer. There is no intermediate string allocation. The NodaTimeExtensions namespace includes custom implementations of the Gregorian calendar calculations needed for validation, so there's no runtime dependency on the NodaTime NuGet package. The parsing is self-contained.

The NodaTime parsers handle the full complexity of ISO 8601 duration syntax, including fractional seconds, negative durations, and the distinction between date-based periods (P1Y2M) and time-based durations (PT1H30M). The Period type preserves the original components rather than normalising to a single unit, so P1M and P30D remain distinct.
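To show what preserving the original components means in practice, here is a deliberately simplified duration parser that works directly on UTF-8 bytes. It is a sketch under stated assumptions, not the V5 parser. It handles only whole-number components, and skips fractional seconds, negative durations, weeks, and checks for component order or duplicates. It returns a hypothetical Components record rather than a NodaTime Period.

```csharp
using System;

// Hypothetical sketch; far less complete than a real ISO 8601 parser.
public static class Iso8601DurationSketch
{
    public readonly record struct Components(
        int Years, int Months, int Days, int Hours, int Minutes, int Seconds);

    public static bool TryParse(ReadOnlySpan<byte> utf8, out Components result)
    {
        result = default;
        if (utf8.Length < 3 || utf8[0] != (byte)'P')
        {
            return false;
        }

        int years = 0, months = 0, days = 0, hours = 0, minutes = 0, seconds = 0;
        bool inTime = false;          // true once the 'T' separator has been seen
        bool expectComponent = true;  // a component must follow 'P' and 'T'
        int i = 1;

        while (i < utf8.Length)
        {
            if (utf8[i] == (byte)'T')
            {
                if (inTime) return false;
                inTime = true;
                expectComponent = true;
                i++;
                continue;
            }

            // Read an unsigned integer of at most nine digits.
            int start = i;
            int value = 0;
            while (i < utf8.Length && utf8[i] >= (byte)'0' && utf8[i] <= (byte)'9')
            {
                if (i - start == 9) return false; // keep the value within int range
                value = (value * 10) + (utf8[i] - '0');
                i++;
            }

            // Reject a missing number or a number with no designator after it.
            if (i == start || i == utf8.Length) return false;

            // 'M' means months before 'T' and minutes after it.
            switch ((char)utf8[i])
            {
                case 'Y' when !inTime: years = value; break;
                case 'M' when !inTime: months = value; break;
                case 'D' when !inTime: days = value; break;
                case 'H' when inTime: hours = value; break;
                case 'M' when inTime: minutes = value; break;
                case 'S' when inTime: seconds = value; break;
                default: return false;
            }

            expectComponent = false;
            i++;
        }

        if (expectComponent) return false; // e.g. a trailing 'T' with nothing after it

        result = new Components(years, months, days, hours, minutes, seconds);
        return true;
    }
}
```

Parsing "P1M"u8 and "P30D"u8 with this sketch gives two different results, one with Months set to 1 and one with Days set to 30. Nothing is converted to a common unit, because a month has no fixed length in days.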

Next up

In the next post, we'll look at TOON - a compact text format for JSON-shaped data that reduces token count when working with LLMs.

FAQs

Why not just use System.Uri?
System.Uri allocates a managed string, normalises the URI, and doesn't support IRIs natively. Utf8Uri operates directly on the UTF-8 bytes in the JSON document buffer with no allocation. It also validates uri, uri-reference, iri, and iri-reference formats as required by JSON Schema.

When should I use BigNumber instead of decimal?
When your JSON may contain numbers with more than 28 significant digits, or when you need exact representation of very large or very small values. Financial APIs, scientific data, and blockchain applications are common cases.

Do I need to install NodaTime separately?
No. The NodaTime parsing is built into the core library. V5 includes its own UTF-8 parsers for ISO 8601 date, date-time, time, and duration formats that produce NodaTime types directly.

Matthew Adams

Matthew was CTO of a venture-backed technology start-up in the UK & US for 10 years, and is now the co-founder of endjin, which provides technology strategy, experience and development services to its customers who are seeking to take advantage of Microsoft Azure and the Cloud.