By Matthew Adams, Co-Founder · 6 min read
Introducing Corvus.Text.Json V5: YAML 1.2 - Zero-Allocation Conversion

At endjin, we maintain Corvus.JsonSchema, and in the previous post we looked at JsonLogic for safe business rules.

Now let's talk about a format that sits alongside JSON in almost every modern development workflow: YAML.

YAML is everywhere

Kubernetes manifests, GitHub Actions workflows, Azure DevOps pipelines, Docker Compose files, Helm charts, OpenAPI specifications. Almost every infrastructure-as-code and CI/CD tool uses YAML as its primary configuration format.

If you validate, transform, or process configuration, you will usually need to convert YAML to JSON first, because schema validators, query languages, and most processing tools operate on JSON.

V5 includes a YAML 1.2 to JSON converter that does this with zero allocation on the hot path.

Quick start

Two packages are available:

# Full Corvus document model - when you want ParsedJsonDocument<T>, schema validation, etc.
dotnet add package Corvus.Text.Json.Yaml

# System.Text.Json only - when you want a lightweight JsonDocument, no Corvus dependencies
dotnet add package Corvus.Yaml.SystemTextJson

Parse YAML to a typed document

using Corvus.Text.Json;
using Corvus.Text.Json.Yaml;

string yaml = """
    name: Alice
    age: 30
    hobbies:
      - reading
      - cycling
    """;

using var doc = YamlDocument.Parse<JsonElement>(yaml);
JsonElement root = doc.RootElement;
Console.WriteLine(root.GetProperty("name").GetString()); // "Alice"
Console.WriteLine(root.GetProperty("age").GetInt32());    // 30

That gives you a ParsedJsonDocument<JsonElement>, the same pooled-memory document we discussed in post 4. From here you can validate against a schema, query with JMESPath or JSONata, mutate with a builder, or just read values.

Parse YAML directly to a strongly-typed element

This is where the real power of YAML-to-JSON conversion becomes clear. Because YamlDocument.Parse<T> is generic over any IJsonElement<T>, you can parse YAML directly into a schema-generated type. There is no intermediate untyped step: the result is strongly typed from the moment you access it, and you can validate it against its schema at any point.

Consider a Kubernetes Deployment manifest. You'd typically write it in YAML, but the Kubernetes API schema is published as JSON Schema. Generate your types from that schema, and then:

// Your YAML manifest - the format every Kubernetes user writes in
string manifest = """
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web-frontend
      labels:
        app: web
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
            - name: nginx
              image: nginx:1.27
              ports:
                - containerPort: 80
    """;

// Parse directly to the generated Deployment type
using var doc = YamlDocument.Parse<Deployment>(manifest);
Deployment deployment = doc.RootElement;

// Strongly-typed access - IntelliSense, compile-time safety, no casting
string name = (string)deployment.Metadata.Name;          // "web-frontend"
int replicas = (int)deployment.Spec.Replicas;            // 3
string image = (string)deployment.Spec.Template.Spec
    .Containers[0].Image;                                // "nginx:1.27"

// Schema validation is built in
bool isValid = deployment.EvaluateSchema();

The YAML bytes flow through the tokenizer into the document's pooled memory, and you get a typed view with full IntelliSense and schema validation. The same pattern works for any schema-defined format that people author in YAML: OpenAPI specifications, GitHub Actions workflows, Helm values files, Docker Compose files, and more.

Convert to a JSON string

string json = YamlDocument.ConvertToJsonString("key: value");
Console.WriteLine(json); // {"key":"value"}

Stream to a Utf8JsonWriter

For pipeline scenarios where you're writing directly to an output buffer:

using var stream = new MemoryStream();
using var writer = new Utf8JsonWriter(stream,
    new JsonWriterOptions { Indented = true });

YamlDocument.Convert("items:\n  - one\n  - two"u8, writer);
writer.Flush();

System.Text.Json only

If you don't need the Corvus document model:

using System.Text.Json; // JsonDocument
using Corvus.Yaml;

string yaml = "name: Bob\nage: 25";
using JsonDocument doc = YamlDocument.Parse(yaml);
Console.WriteLine(doc.RootElement.GetProperty("name").GetString()); // "Bob"

It uses the same tokenizer and achieves the same conformance, without any Corvus dependency.

How it works

The converter uses a custom ref struct tokenizer that operates directly on UTF-8 bytes. There's no intermediate object model. The tokenizer emits events (scalar, sequence start, mapping start, etc.) that the converter translates directly into Utf8JsonWriter calls.

This means the hot path allocates nothing. The YAML goes in as bytes and the JSON comes out through a writer. The only allocations are the ones Utf8JsonWriter makes for its own output buffer, and you can keep those to a minimum by giving it a pooled IBufferWriter<byte> rather than a Stream.
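To make the event-to-writer idea concrete, here is a minimal sketch that uses only System.Text.Json. It is not the Corvus converter: the event list is hypothetical and hand-written (a real tokenizer would produce it from the YAML bytes), the event kinds are plain strings rather than YamlEventType values, and every scalar is written as a JSON string, so schema-based typing is left out. The one piece of real logic is the stack that decides whether a scalar inside a mapping is a key or a value.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.Json;

// Hypothetical, hand-written event stream for the YAML:
//   name: Alice
//   hobbies: [reading, cycling]
// Real Corvus events (YamlEvent/YamlEventType) carry much more detail.
var events = new (string Kind, string? Value)[]
{
    ("MappingStart", null),
    ("Scalar", "name"), ("Scalar", "Alice"),
    ("Scalar", "hobbies"),
    ("SequenceStart", null),
    ("Scalar", "reading"), ("Scalar", "cycling"),
    ("SequenceEnd", null),
    ("MappingEnd", null),
};

Console.WriteLine(ConvertEvents(events));
// {"name":"Alice","hobbies":["reading","cycling"]}

// Translates each event into exactly one Utf8JsonWriter call.
string ConvertEvents(IEnumerable<(string Kind, string? Value)> evts)
{
    using var stream = new MemoryStream();
    using (var writer = new Utf8JsonWriter(stream))
    {
        // One frame per open container; a mapping frame records whether the next scalar is a key.
        var frames = new Stack<(bool IsMapping, bool ExpectKey)>();

        // After a complete value, the enclosing mapping (if any) expects a key again.
        void ValueCompleted()
        {
            if (frames.Count > 0 && frames.Peek().IsMapping)
            {
                frames.Pop();
                frames.Push((true, true));
            }
        }

        foreach (var (kind, value) in evts)
        {
            switch (kind)
            {
                case "MappingStart":
                    writer.WriteStartObject();
                    frames.Push((true, true));
                    break;
                case "SequenceStart":
                    writer.WriteStartArray();
                    frames.Push((false, false));
                    break;
                case "MappingEnd":
                    writer.WriteEndObject();
                    frames.Pop();
                    ValueCompleted();
                    break;
                case "SequenceEnd":
                    writer.WriteEndArray();
                    frames.Pop();
                    ValueCompleted();
                    break;
                case "Scalar" when frames.Count > 0 && frames.Peek() is (true, true):
                    writer.WritePropertyName(value!);   // a key inside a mapping
                    frames.Pop();
                    frames.Push((true, false));
                    break;
                case "Scalar":
                    writer.WriteStringValue(value);     // simplification: every value is a string
                    ValueCompleted();
                    break;
            }
        }
    } // disposing the writer flushes it into the stream

    return Encoding.UTF8.GetString(stream.ToArray());
}
```

Because each event maps to a single writer call, nothing is buffered beyond the small container stack, which is why this style of converter can stream without building an intermediate tree.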

Event streaming

The internal event model is also exposed as a public API. YamlDocument.EnumerateEvents calls your callback for each parse event, giving you zero-copy access to the raw UTF-8 data:

using System.Text; // Encoding

YamlDocument.EnumerateEvents(yamlBytes, static (in YamlEvent e) =>
{
    switch (e.Type)
    {
        case YamlEventType.Scalar:
            Console.WriteLine($"Scalar: {Encoding.UTF8.GetString(e.Value)}");
            break;
        case YamlEventType.MappingStart:
            Console.WriteLine("Object start");
            break;
        case YamlEventType.SequenceStart:
            Console.WriteLine("Array start");
            break;
    }

    return true; // continue parsing (return false to stop early)
});

Each YamlEvent is a ref struct whose spans point directly into the source buffer. The event types mirror the YAML specification: StreamStart/End, DocumentStart/End, MappingStart/End, SequenceStart/End, Scalar, and Alias. Events also carry line/column positions, anchor names, tags, and scalar styles. This is useful when you need to process YAML without converting it to JSON at all. For example, you can extract a few specific values from a large file and return false from the callback as soon as you have them, so the rest of the file is never parsed.

JSON to YAML

Conversion works in both directions. YamlDocument.ConvertToYamlString takes a JSON element or JSON text and produces YAML output:

string yaml = YamlDocument.ConvertToYamlString(
    """{"name": "Alice", "roles": ["admin", "user"]}""");

// name: Alice
// roles:
// - admin
// - user

There's also a streaming counterpart, ConvertToYaml, that writes to an IBufferWriter<byte> or a Stream:

YamlDocument.ConvertToYaml(jsonElement, outputStream);

YamlWriterOptions controls the output format. IndentSize sets the indentation width, and SkipValidation disables structural validation for a small performance gain:

var options = new YamlWriterOptions { IndentSize = 4 };
string yaml = YamlDocument.ConvertToYamlString(json, options);

This works with both System.Text.Json.JsonElement and Corvus IJsonElement<T> types, so you can round-trip YAML through a ParsedJsonDocument: parse it, change it through a builder, and write it back out as YAML. Comments are dropped when YAML is converted to JSON, so they won't survive the round trip.

Utf8YamlWriter

For fine-grained control over YAML output, Utf8YamlWriter is a ref struct that writes directly to an IBufferWriter<byte> or Stream. Its API mirrors System.Text.Json.Utf8JsonWriter, so the programming model will feel familiar:

using System.Buffers; // ArrayBufferWriter<T>

var bufferWriter = new ArrayBufferWriter<byte>();
using var writer = new Utf8YamlWriter(bufferWriter, new YamlWriterOptions { IndentSize = 2 });

writer.WriteStartMapping();
writer.WritePropertyName("name"u8);
writer.WriteStringValue("Alice"u8);
writer.WritePropertyName("roles"u8);
writer.WriteStartSequence();
writer.WriteStringValue("admin"u8);
writer.WriteStringValue("user"u8);
writer.WriteEndSequence();
writer.WriteEndMapping();

This produces:

name: Alice
roles:
  - admin
  - user

The writer supports both block and flow collection styles, and you can mix them in the same document. For example, you can use flow style for short inline sequences:

writer.WritePropertyName("tags"u8);
writer.WriteStartSequence(YamlCollectionStyle.Flow);
writer.WriteStringValue("v5"u8);
writer.WriteStringValue("release"u8);
writer.WriteEndSequence();
// tags: [v5, release]

When SkipValidation is false (the default), the writer validates structural correctness. Property names must precede values in mappings, containers must be properly closed, and you can't write a second root value. This catches mistakes at the point of the write call rather than producing silently broken output.
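The three rules above can be modelled as a small state machine over a stack of open containers. The sketch below is hypothetical and is not how Utf8YamlWriter is implemented: instead of real writer calls it checks a list of call names ("StartMapping", "PropertyName", "Value", and so on), and it returns the first error it finds, or null if the sequence is structurally valid.

```csharp
using System;
using System.Collections.Generic;

Console.WriteLine(Validate(new[] { "StartMapping", "PropertyName", "Value", "EndMapping" }) ?? "ok");
// ok
Console.WriteLine(Validate(new[] { "StartMapping", "Value", "EndMapping" }) ?? "ok");
// Value: a property name must precede a value in a mapping
Console.WriteLine(Validate(new[] { "Value", "Value" }) ?? "ok");
// Value: cannot write a second root value

// Hypothetical model of the writer's structural checks; returns null when valid.
string? Validate(IEnumerable<string> calls)
{
    var open = new Stack<(bool IsMapping, bool HasPendingName)>();
    bool rootWritten = false;

    // Shared check for anything that is a value: a scalar or the start of a container.
    string? BeginValue()
    {
        if (open.Count == 0)
        {
            if (rootWritten) return "cannot write a second root value";
            rootWritten = true;
            return null;
        }

        var top = open.Peek();
        if (top.IsMapping)
        {
            if (!top.HasPendingName) return "a property name must precede a value in a mapping";
            open.Pop();
            open.Push((true, false)); // the pending name is now used up
        }
        return null;
    }

    foreach (string call in calls)
    {
        string? error;
        switch (call)
        {
            case "StartMapping":
            case "StartSequence":
                error = BeginValue();
                if (error is null) open.Push((call == "StartMapping", false));
                break;
            case "Value":
                error = BeginValue();
                break;
            case "PropertyName":
                error = open.Count == 0 || !open.Peek().IsMapping || open.Peek().HasPendingName
                    ? "a property name is only valid where a mapping expects a key"
                    : null;
                if (error is null)
                {
                    open.Pop();
                    open.Push((true, true));
                }
                break;
            case "EndMapping":
            case "EndSequence":
                error = open.Count == 0 || open.Peek().IsMapping != (call == "EndMapping")
                    ? "end call does not match the innermost open container"
                    : open.Peek().HasPendingName ? "property name has no value" : null;
                if (error is null) open.Pop();
                break;
            default:
                error = $"unknown call '{call}'";
                break;
        }

        if (error is not null) return $"{call}: {error}";
    }

    // Every container must be closed by the end of the document.
    return open.Count > 0 ? "unclosed container" : null;
}
```

Reporting the error at the offending call, rather than after the fact, is what lets this kind of check point at the exact mistake.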

Schema modes

The converter supports four YAML schema modes:

  • Core (default): the YAML 1.2 Core Schema. Recognizes null, true/false, integers (decimal, octal 0o77, hexadecimal 0xFF), and floats (decimal, .inf, .nan)
  • JSON: strict JSON-only resolution. Recognizes only null, true/false, and JSON-style numbers
  • Failsafe: all scalars become JSON strings, with no implicit type coercion
  • YAML 1.1: backward compatibility. Adds yes/no/on/off/y/n booleans, sexagesimal (base 60) integers, and merge keys (<<)

You choose a mode through YamlReaderOptions, alongside settings such as the document mode and duplicate-key handling:

var options = new YamlReaderOptions
{
    Schema = YamlSchema.Core,
    DocumentMode = YamlDocumentMode.SingleRequired,
    DuplicateKeyBehavior = DuplicateKeyBehavior.Error,
};

using var doc = YamlDocument.Parse<JsonElement>(yaml, options);
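To see what the Core schema's implicit typing means in practice, the sketch below resolves plain (unquoted) scalars using the regular expressions given for the Core Schema in the YAML 1.2 specification. It illustrates the spec's rules rather than Corvus's implementation, and the ResolveCore helper is hypothetical. Quoted scalars are always strings, and under Failsafe every scalar would resolve to str.

```csharp
using System;
using System.Text.RegularExpressions;

foreach (string s in new[] { "~", "True", "0o17", "0xFF", "1e3", "-.inf", ".NaN", "yes", "1.2.3" })
{
    Console.WriteLine($"{s} -> {ResolveCore(s)}");
}

// Hypothetical helper: YAML 1.2 Core Schema tag resolution for a plain scalar.
string ResolveCore(string plain)
{
    // Whole-string match (\A ... \z), so "trueish" is not a bool.
    bool Is(string pattern) => Regex.IsMatch(plain, $@"\A(?:{pattern})\z");

    if (Is("null|Null|NULL|~|")) return "null";                      // empty plain scalar is null too
    if (Is("true|True|TRUE|false|False|FALSE")) return "bool";
    if (Is("[-+]?[0-9]+|0o[0-7]+|0x[0-9a-fA-F]+")) return "int";      // decimal, octal, hex
    if (Is(@"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?")) return "float";
    if (Is(@"[-+]?(\.inf|\.Inf|\.INF)|\.nan|\.NaN|\.NAN")) return "float";
    return "str";                                                    // everything else stays a string
}
```

The order matters: the integer check runs before the float check because the float expression also matches plain integers. Note that yes stays a string under Core; only the YAML 1.1 mode treats it as a boolean.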

Multi-document streams

YAML supports multiple documents in a single stream, separated by ---:

---
name: Alice
---
name: Bob

Set DocumentMode = YamlDocumentMode.MultiAsArray to wrap all documents in a JSON array:

var options = new YamlReaderOptions
{
    DocumentMode = YamlDocumentMode.MultiAsArray,
};

using var doc = YamlDocument.Parse<JsonElement>(multiDocYaml, options);
// Result: [{"name":"Alice"},{"name":"Bob"}]

All YAML features

The converter handles the full YAML 1.2 syntax:

  • Scalar styles: plain, single-quoted, double-quoted, literal block (|), folded block (>)
  • Collections: block and flow sequences, block and flow mappings
  • Anchors and aliases: &anchor and *alias with billion-laughs protection
  • Tags: !!str, !!int, !!float, !!null, !!bool, !!seq, !!map, and custom tags
  • Multi-document: --- and ... document markers
  • Comments: preserved in the event stream (ignored in JSON output)

Billion-laughs protection

The YAML "billion laughs" attack uses nested anchor/alias expansion to create exponentially large documents from tiny input. The converter enforces two configurable limits:

var options = new YamlReaderOptions
{
    MaxAliasExpansionDepth = 64,         // Default
    MaxAliasExpansionSize = 1_000_000,   // Default - max nodes from alias expansion
};

Expansion that exceeds either limit throws a YamlException.
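The arithmetic is what makes these limits necessary. In the model below, ten levels that each repeat the level beneath them ten times expand to 11,111,111,111 nodes. The sketch represents anchors as a hypothetical dictionary from anchor name to the children of its sequence ("*name" marks an alias), expands them recursively, and stops as soon as a depth or size limit is crossed. The parameters echo MaxAliasExpansionDepth and MaxAliasExpansionSize, but the counting rules are an assumption, and it throws InvalidOperationException where the real converter throws YamlException.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

Console.WriteLine(CountExpandedNodes(BuildLaughs(3), "l2", maxDepth: 64, maxSize: 1_000_000));
// 1111

try
{
    // Ten levels would expand to 11,111,111,111 nodes; the size limit stops it early.
    CountExpandedNodes(BuildLaughs(10), "l9", maxDepth: 64, maxSize: 1_000_000);
}
catch (InvalidOperationException ex)
{
    Console.WriteLine(ex.Message);
    // expansion produced more than 1000000 nodes
}

// Hypothetical anchor table for the classic attack: l0 holds ten "lol" scalars,
// and every later level holds ten aliases ("*l<n>") to the level below it.
Dictionary<string, string[]> BuildLaughs(int levels)
{
    var anchors = new Dictionary<string, string[]>
    {
        ["l0"] = Enumerable.Repeat("lol", 10).ToArray(),
    };
    for (int i = 1; i < levels; i++)
    {
        anchors[$"l{i}"] = Enumerable.Repeat($"*l{i - 1}", 10).ToArray();
    }
    return anchors;
}

// Expands the sequence anchored at `root`, counting every node it would produce
// and failing fast once either limit is crossed.
long CountExpandedNodes(Dictionary<string, string[]> anchors, string root, int maxDepth, long maxSize)
{
    long total = 0;

    void AddNode()
    {
        if (++total > maxSize)
            throw new InvalidOperationException($"expansion produced more than {maxSize} nodes");
    }

    void Visit(string anchor, int depth)
    {
        if (depth > maxDepth)
            throw new InvalidOperationException($"alias nesting deeper than {maxDepth}");

        AddNode(); // the sequence node itself
        foreach (string child in anchors[anchor])
        {
            if (child.StartsWith('*'))
                Visit(child[1..], depth + 1); // alias: expand the anchored node again
            else
                AddNode();                     // plain scalar
        }
    }

    Visit(root, 0);
    return total;
}
```

Counting during expansion, rather than after it, means the work done is bounded by the limit instead of by the size of the attack.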

Conformance

The converter passes 100% of the JSON-testable cases in the yaml-test-suite: 279 valid and 94 error cases, or 373 of the 402 in total. The remaining 29 cases exercise YAML features with no JSON equivalent (complex keys, empty keys, bare tags), so they don't provide JSON reference output.

Next up

In the next post, we'll look at JSON Patch. It provides RFC 6902 support with a fluent builder that operates directly on the mutable document model.

FAQs

Which YAML features are supported? All YAML 1.2 features: plain, single-quoted, double-quoted, literal block, and folded block scalars; flow and block collections; anchors and aliases; multi-document streams; and four schema modes (Core, JSON, Failsafe, YAML 1.1).
Can I use this without the Corvus document model? Yes. The Corvus.Yaml.SystemTextJson package depends only on System.Text.Json and produces a standard JsonDocument - no Corvus dependencies.
How does the billion-laughs protection work? The converter enforces configurable limits on alias expansion depth (default 64) and total expansion size (default 1,000,000 nodes). Expansion that exceeds either limit throws a YamlException.

Matthew Adams

Co-Founder

Matthew was CTO of a venture-backed technology start-up in the UK & US for 10 years, and is now the co-founder of endjin, which provides technology strategy, experience and development services to its customers who are seeking to take advantage of Microsoft Azure and the Cloud.