Introducing Corvus.Text.Json V5: TOON - Compact JSON for LLMs
At endjin, we maintain Corvus.JsonSchema, and in the previous post we looked at extended types - URIs, BigNumber, and NodaTime. This time we're crossing into AI territory: how do you feed structured data to an LLM without burning through your token budget?
LLMs process tokens, not bytes. Every {, }, ", and repeated property name costs tokens, and tokens cost money and latency. TOON (Token-Oriented Object Notation) is a compact text format that preserves the JSON data model while stripping redundant syntax, which also makes the content easier for LLMs to interpret.
The problem: repeated property names
Consider an array of objects, a common pattern when you feed query results or catalogue data into an LLM. Three rows are shown here, but the same shape often runs to a hundred rows or more:
```json
[
  {"id": 1, "name": "Alice", "score": 95},
  {"id": 2, "name": "Bob", "score": 87},
  {"id": 3, "name": "Carol", "score": 91}
]
```
The property names id, name, and score are repeated on every row. The braces, colons, and quotes add overhead that carries no new information after the first row.
In TOON, the same data is a table:
```
[3]{id,name,score}:
  1,Alice,95
  2,Bob,87
  3,Carol,91
```
The field list appears once, and each row is a comma-delimited list of values. Because the property names and punctuation are paid for once rather than on every row, the saving grows with the number of rows.
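To show where the saving comes from, here is a minimal sketch of the tabular encoding. It is not the Corvus encoder. The EncodeTable helper is hypothetical and uses only System.Text.Json. It assumes a non-empty array of flat objects that share the same keys in the same order and contain no values needing quotes. It indents each row by two spaces under the header. Character count is only a rough proxy for token count.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.Json;

// Illustrative sketch only, not the Corvus encoder. Assumes a non-empty array of flat
// objects sharing the same keys in the same order, with no values that need quoting.
static string EncodeTable(string jsonArray)
{
    using JsonDocument doc = JsonDocument.Parse(jsonArray);
    List<JsonElement> rows = doc.RootElement.EnumerateArray().ToList();

    // The field list is written once, taken from the first row.
    string[] fields = rows[0].EnumerateObject().Select(p => p.Name).ToArray();

    StringBuilder sb = new();
    sb.Append('[').Append(rows.Count).Append("]{").Append(string.Join(",", fields)).Append("}:");

    foreach (JsonElement row in rows)
    {
        // Each row carries only its values: strings lose their quotes,
        // numbers and literals keep their raw JSON text.
        IEnumerable<string> values = fields.Select(field =>
        {
            JsonElement value = row.GetProperty(field);
            return value.ValueKind == JsonValueKind.String ? value.GetString()! : value.GetRawText();
        });
        sb.Append("\n  ").Append(string.Join(",", values));
    }

    return sb.ToString();
}

string json = """[{"id":1,"name":"Alice","score":95},{"id":2,"name":"Bob","score":87},{"id":3,"name":"Carol","score":91}]""";
string toon = EncodeTable(json);
Console.WriteLine(toon);
Console.WriteLine($"JSON: {json.Length} characters, TOON: {toon.Length} characters");
```

Each additional row adds only its values and a separator. By contrast, every JSON row repeats all three property names with their quotes and colons.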
Packages
| Package | Dependency | Use when |
|---|---|---|
| Corvus.Text.Json.Toon | Corvus.Text.Json | You want ParsedJsonDocument<T> and the Corvus document model |
| Corvus.Toon.SystemTextJson | System.Text.Json only | You want TOON conversion without a dependency on Corvus.Text.Json |
Install with:
```shell
dotnet add package Corvus.Text.Json.Toon
```
or, for the lighter-weight package:
```shell
dotnet add package Corvus.Toon.SystemTextJson
```
Parsing TOON into a document
Parse TOON into the same pooled document model used by the rest of Corvus.Text.Json:
```csharp
using Corvus.Text.Json;
using Corvus.Text.Json.Toon;

string toon = """
    name: Alice
    age: 30
    active: true
    scores[3]: 95,87,92
    """;

using ParsedJsonDocument<JsonElement> document = ToonDocument.Parse<JsonElement>(toon);
JsonElement root = document.RootElement;

Console.WriteLine($"Name: {root.GetProperty("name").GetString()}");
Console.WriteLine($"Age: {root.GetProperty("age").GetInt32()}");
Console.WriteLine($"Scores: {root.GetProperty("scores")}");
```
Output:
```
Name: Alice
Age: 30
Scores: [95,87,92]
```
The returned ParsedJsonDocument<T> uses ArrayPool-backed memory - the same pooled lifetime model described in Part 4.
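To show how the flat TOON in this example maps onto JSON, here is a toy reader for just that subset: "key: value" lines and "key[N]: v1,v2,..." inline arrays. FlatToonToJson and WritePrimitive are hypothetical helpers built on System.Text.Json's Utf8JsonWriter. Corvus parses into a pooled document rather than building a JSON string. It also handles nesting, quoting, tables, and declared-length checks, all of which this sketch ignores.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;
using System.Text.Json;

// Hypothetical helper for illustration only: a toy reader for flat TOON
// ("key: value" and "key[N]: v1,v2,..."). Ignores nesting, quoting, tables and N.
static string FlatToonToJson(string toon)
{
    using MemoryStream stream = new();
    using (Utf8JsonWriter writer = new(stream))
    {
        writer.WriteStartObject();
        foreach (string line in toon.Split('\n', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries))
        {
            int colon = line.IndexOf(':');
            string key = line[..colon].Trim();
            string rest = line[(colon + 1)..].Trim();

            int bracket = key.IndexOf('[');
            if (bracket >= 0)
            {
                // "scores[3]: 95,87,92" becomes "scores": [95,87,92].
                writer.WriteStartArray(key[..bracket]);
                foreach (string item in rest.Split(','))
                {
                    WritePrimitive(writer, item.Trim());
                }
                writer.WriteEndArray();
            }
            else
            {
                writer.WritePropertyName(key);
                WritePrimitive(writer, rest);
            }
        }
        writer.WriteEndObject();
    }
    return Encoding.UTF8.GetString(stream.ToArray());
}

// Unquoted tokens become JSON literals or numbers when they look like one, otherwise strings.
static void WritePrimitive(Utf8JsonWriter writer, string token)
{
    if (token is "true" or "false")
        writer.WriteBooleanValue(token == "true");
    else if (token == "null")
        writer.WriteNullValue();
    else if (long.TryParse(token, NumberStyles.Integer, CultureInfo.InvariantCulture, out long integer))
        writer.WriteNumberValue(integer);
    else if (double.TryParse(token, NumberStyles.Float, CultureInfo.InvariantCulture, out double real) && double.IsFinite(real))
        writer.WriteNumberValue(real);
    else
        writer.WriteStringValue(token);
}

string json = FlatToonToJson("name: Alice\nage: 30\nactive: true\nscores[3]: 95,87,92");
Console.WriteLine(json); // {"name":"Alice","age":30,"active":true,"scores":[95,87,92]}
```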
Converting TOON to JSON
When you need a JSON string (e.g. to pass to another API):
```csharp
using Corvus.Text.Json.Toon;

string toon = """
    [2]{id,name,score}:
      1,Alice,95
      2,Bob,87
    """;

string json = ToonDocument.ConvertToJsonString(toon);
// [{"id":1,"name":"Alice","score":95},{"id":2,"name":"Bob","score":87}]
```
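Decoding a table is essentially the inverse operation. You read the declared length and field list from the header, then pair each row's values with the fields by position. The sketch below uses a hypothetical TableToJson helper and is not the Corvus reader. It assumes a single root-level, comma-delimited table with unquoted values. It treats integer-looking values as numbers and everything else as strings. It also rejects input whose row count disagrees with the header, which is the kind of check the Strict reader option describes.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;
using System.Text.Json;

// Hypothetical sketch, not the Corvus reader: decodes one root-level TOON table
// into a JSON array. Assumes comma-delimited rows, unquoted values and no nesting.
static string TableToJson(string toon)
{
    string[] lines = toon.Split('\n', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
    string header = lines[0]; // e.g. "[2]{id,name,score}:"

    int declared = int.Parse(header[1..header.IndexOf(']')], CultureInfo.InvariantCulture);
    string[] fields = header[(header.IndexOf('{') + 1)..header.IndexOf('}')].Split(',');
    string[] rows = lines[1..];

    // Mirrors the idea behind strict mode: the declared length must match the data.
    if (rows.Length != declared)
    {
        throw new FormatException($"Header declares {declared} rows but {rows.Length} were found.");
    }

    using MemoryStream stream = new();
    using (Utf8JsonWriter writer = new(stream))
    {
        writer.WriteStartArray();
        foreach (string row in rows)
        {
            // Each row's values are matched to the field list by position.
            string[] values = row.Split(',');
            writer.WriteStartObject();
            for (int i = 0; i < fields.Length; i++)
            {
                writer.WritePropertyName(fields[i]);
                if (long.TryParse(values[i], NumberStyles.Integer, CultureInfo.InvariantCulture, out long number))
                    writer.WriteNumberValue(number);
                else
                    writer.WriteStringValue(values[i]);
            }
            writer.WriteEndObject();
        }
        writer.WriteEndArray();
    }
    return Encoding.UTF8.GetString(stream.ToArray());
}

string json = TableToJson("[2]{id,name,score}:\n  1,Alice,95\n  2,Bob,87");
Console.WriteLine(json); // [{"id":1,"name":"Alice","score":95},{"id":2,"name":"Bob","score":87}]
```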
Converting JSON to TOON
The reverse direction detects uniform object arrays and emits them as tables automatically:
```csharp
using Corvus.Text.Json.Toon;

string json = """[{"id":1,"name":"Alice","score":95},{"id":2,"name":"Bob","score":87}]""";
string toon = ToonDocument.ConvertToToonString(json);
```
Result:
```
[2]{id,name,score}:
  1,Alice,95
  2,Bob,87
```
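The exact rule for "uniform" is not spelled out here, so the following sketch assumes one plausible rule. Every element must be an object with the same keys in the same order, and every value must be a primitive, since a nested object or array cannot sit in a single row. IsUniformObjectArray is a hypothetical helper. The Corvus writer's actual criteria may differ, for example on key order.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Hypothetical check, for illustration: could this JSON array be written as a TOON table?
// Assumed rule: all elements are objects with identical key sequences and primitive values.
static bool IsUniformObjectArray(JsonElement array)
{
    if (array.ValueKind != JsonValueKind.Array || array.GetArrayLength() == 0)
        return false;

    string[]? firstKeys = null;
    foreach (JsonElement item in array.EnumerateArray())
    {
        if (item.ValueKind != JsonValueKind.Object)
            return false;

        string[] keys = item.EnumerateObject().Select(p => p.Name).ToArray();
        firstKeys ??= keys;
        if (!keys.SequenceEqual(firstKeys))
            return false;

        // Nested objects or arrays cannot be flattened into a single row.
        if (item.EnumerateObject().Any(p => p.Value.ValueKind is JsonValueKind.Object or JsonValueKind.Array))
            return false;
    }
    return true;
}

using JsonDocument table = JsonDocument.Parse("""[{"id":1,"name":"Alice"},{"id":2,"name":"Bob"}]""");
using JsonDocument mixed = JsonDocument.Parse("""[{"id":1,"name":"Alice"},{"id":2,"tags":["x"]}]""");
Console.WriteLine(IsUniformObjectArray(table.RootElement)); // True
Console.WriteLine(IsUniformObjectArray(mixed.RootElement)); // False
```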
Zero-allocation UTF-8 path
For hot paths, you can write TOON directly to an IBufferWriter<byte>, with no intermediate string allocation:
```csharp
using System.Buffers;
using Corvus.Text.Json.Toon;

ArrayBufferWriter<byte> buffer = new(256);
ToonDocument.ConvertToToon(
    """[{"id":1,"name":"Alice","score":95},{"id":2,"name":"Bob","score":87}]"""u8,
    buffer);

ReadOnlySpan<byte> utf8Toon = buffer.WrittenSpan;
```
In our benchmarks, this path allocated 0 B/op. Prefer the UTF-8 overloads whenever your input is already UTF-8 or your output destination accepts bytes.
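This path can avoid allocations because of the IBufferWriter<byte> contract. The caller asks the writer for a span, encodes bytes straight into it, then calls Advance to commit them, so no output string is ever built. The sketch below shows that pattern in isolation, writing a TOON table header. WriteTableHeader and WriteUtf8 are hypothetical helpers, not Corvus code. The field-name array is still an allocation in this sketch.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;
using System.Text;

// Illustrative sketch of the IBufferWriter<byte> pattern, not Corvus code:
// request space from the writer, encode straight into it, then Advance.
static void WriteTableHeader(IBufferWriter<byte> writer, int count, string[] fields)
{
    WriteUtf8(writer, "[");

    // Format the integer directly as UTF-8 digits (an int needs at most 11 bytes).
    Span<byte> digits = writer.GetSpan(11);
    Utf8Formatter.TryFormat(count, digits, out int written);
    writer.Advance(written);

    WriteUtf8(writer, "]{");
    for (int i = 0; i < fields.Length; i++)
    {
        if (i > 0)
        {
            WriteUtf8(writer, ",");
        }
        WriteUtf8(writer, fields[i]);
    }
    WriteUtf8(writer, "}:");
}

// Transcodes UTF-16 text into space obtained from the writer, then commits it.
static void WriteUtf8(IBufferWriter<byte> writer, ReadOnlySpan<char> text)
{
    Span<byte> destination = writer.GetSpan(Encoding.UTF8.GetMaxByteCount(text.Length));
    int written = Encoding.UTF8.GetBytes(text, destination);
    writer.Advance(written);
}

ArrayBufferWriter<byte> buffer = new(64);
WriteTableHeader(buffer, 2, new[] { "id", "name", "score" });
Console.WriteLine(Encoding.UTF8.GetString(buffer.WrittenSpan)); // [2]{id,name,score}:
```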
Reader and writer options
Expanding dotted keys
By default, a dotted key such as user.name is treated as a single, literal property name. Enable path expansion to convert it into nested JSON:
```csharp
using Corvus.Text.Json.Toon;

ToonReaderOptions options = new()
{
    ExpandPaths = ToonPathExpansion.Safe,
};

string json = ToonDocument.ConvertToJsonString(
    "user.name: Alice\nuser.age: 30",
    options);
// {"user":{"name":"Alice","age":30}}
```
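Conceptually, path expansion splits each dotted key and walks or creates one nested object per segment. The sketch below uses System.Text.Json.Nodes and a hypothetical ExpandDottedKeys helper. It does not model whatever extra rules the Safe mode applies. In particular, it simply overwrites on conflicts such as "a: 1" followed by "a.b: 2", which a real implementation has to handle deliberately.

```csharp
using System;
using System.Text.Json.Nodes;

// Hypothetical sketch of dotted-key expansion, not the Corvus implementation.
// Each "a.b.c" key becomes nested objects; conflicting values are overwritten.
static JsonObject ExpandDottedKeys(params (string Key, JsonNode? Value)[] entries)
{
    JsonObject root = new();
    foreach ((string key, JsonNode? value) in entries)
    {
        string[] segments = key.Split('.');
        JsonObject current = root;

        // Walk (or create) one nested object per segment except the last.
        for (int i = 0; i < segments.Length - 1; i++)
        {
            if (current[segments[i]] is not JsonObject child)
            {
                child = new JsonObject();
                current[segments[i]] = child;
            }
            current = child;
        }

        current[segments[^1]] = value;
    }
    return root;
}

JsonObject expanded = ExpandDottedKeys(
    ("user.name", JsonValue.Create("Alice")),
    ("user.age", JsonValue.Create(30)));
Console.WriteLine(expanded.ToJsonString()); // {"user":{"name":"Alice","age":30}}
```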
Folding nested JSON keys
The reverse operation folds nested objects into dotted keys in TOON output:
```csharp
using Corvus.Text.Json;
using Corvus.Text.Json.Toon;

ToonWriterOptions options = new()
{
    KeyFolding = ToonKeyFolding.Safe,
};

using ParsedJsonDocument<JsonElement> document =
    ParsedJsonDocument<JsonElement>.Parse("""{"user":{"name":"Alice"},"active":true}""");
JsonElement root = document.RootElement;

string toon = ToonDocument.ConvertToToon(in root, options);
// user.name: Alice
// active: true
```
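The example folds user.name because user holds exactly one property. The sketch below assumes that folding applies only to such chains of single-property objects. It also treats FlattenDepth loosely as the maximum number of segments in a folded key. Both are assumptions for illustration, and Fold is a hypothetical helper, not the Corvus writer.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Hypothetical sketch of key folding, not the Corvus implementation. Assumptions:
// only chains of single-property objects fold, up to maxSegments segments per key.
static (string Key, JsonElement Leaf) Fold(string key, JsonElement value, int maxSegments)
{
    int segments = 1;
    while (segments < maxSegments
        && value.ValueKind == JsonValueKind.Object
        && value.EnumerateObject().Count() == 1)
    {
        JsonProperty only = value.EnumerateObject().First();
        key = $"{key}.{only.Name}";
        value = only.Value;
        segments++;
    }
    return (key, value);
}

using JsonDocument doc = JsonDocument.Parse("""{"user":{"name":"Alice"},"active":true}""");
foreach (JsonProperty property in doc.RootElement.EnumerateObject())
{
    (string key, JsonElement leaf) = Fold(property.Name, property.Value, int.MaxValue);
    // Strings print without quotes; other primitives print as raw JSON (true, 30, ...).
    string text = leaf.ValueKind == JsonValueKind.String ? leaf.GetString()! : leaf.GetRawText();
    Console.WriteLine($"{key}: {text}");
}
// user.name: Alice
// active: true
```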
All options
| Option | Default | Description |
|---|---|---|
| ToonReaderOptions.Strict | true | Checks declared array counts and duplicate object keys |
| ToonReaderOptions.IndentSize | 2 | Spaces per indentation level |
| ToonReaderOptions.ExpandPaths | Off | Expands dotted keys into nested objects when Safe |
| ToonWriterOptions.IndentSize | 2 | Spaces per indentation level |
| ToonWriterOptions.Delimiter | Comma | Delimiter for arrays and tables (Comma, Pipe, or Tab) |
| ToonWriterOptions.KeyFolding | Off | Folds nested objects into dotted keys when Safe |
| ToonWriterOptions.FlattenDepth | int.MaxValue | Maximum number of path segments to fold |
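The Delimiter option matters most when values contain commas. The sketch below uses hypothetical FormatRow, NeedsQuotes, and Quote helpers to show the effect. It relies on a simplified quoting rule that we are assuming for illustration: a value is quoted if it is empty or contains the active delimiter, a double quote, or a colon. The TOON specification defines the full rules, and the Corvus writer may quote in more cases.

```csharp
using System;
using System.Linq;

// Hypothetical helpers; the quoting rule below is a simplified assumption.
static string FormatRow(string[] values, char delimiter) =>
    string.Join(delimiter.ToString(), values.Select(v => NeedsQuotes(v, delimiter) ? Quote(v) : v));

// Assumed rule: quote empty values and values containing the delimiter, '"' or ':'.
static bool NeedsQuotes(string value, char delimiter) =>
    value.Length == 0 || value.IndexOfAny(new[] { delimiter, '"', ':' }) >= 0;

// Wraps a value in double quotes, escaping backslashes and quotes.
static string Quote(string value) =>
    "\"" + value.Replace("\\", "\\\\").Replace("\"", "\\\"") + "\"";

string[] row = { "1", "Smith, Alice", "95" };
Console.WriteLine(FormatRow(row, ',')); // 1,"Smith, Alice",95
Console.WriteLine(FormatRow(row, '|')); // 1|Smith, Alice|95
```

With a pipe or tab delimiter, values containing commas need no quotes, which can save tokens on text-heavy tables.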
Error handling
Invalid TOON input throws ToonException with a 1-based line and column location:
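```csharp
using Corvus.Text.Json.Toon;

try
{
    // The header declares 2 items but only 1 is supplied,
    // which the default (strict) reader rejects.
    ToonDocument.ConvertToJsonString("[2]: 1");
}
catch (ToonException ex)
{
    Console.WriteLine(ex.Message);
    // Reports the line and column where parsing failed
}
```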
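A 1-based line and column is simple bookkeeping over a character offset. The hypothetical ToLineColumn helper below sketches that idea, assuming '\n' line endings and counting UTF-16 characters as columns. It is not how Corvus computes the location in ToonException. The example locates an unexpected third value in a two-field row.

```csharp
using System;

// Hypothetical helper: converts a zero-based character offset into a 1-based
// (line, column) pair. Assumes '\n' line endings; columns count UTF-16 chars.
static (int Line, int Column) ToLineColumn(string text, int offset)
{
    int line = 1, column = 1;
    for (int i = 0; i < offset; i++)
    {
        if (text[i] == '\n')
        {
            line++;
            column = 1;
        }
        else
        {
            column++;
        }
    }
    return (line, column);
}

string input = "[2]{id,name}:\n  1,Alice\n  2,Bob,extra";
int offset = input.IndexOf("extra", StringComparison.Ordinal);
(int line, int column) = ToLineColumn(input, offset);
Console.WriteLine($"line {line}, column {column}"); // line 3, column 9
```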
Corvus vs Cysharp
Cysharp's ToonEncoder is an established .NET package for encoding System.Text.Json values to TOON. The key difference: Cysharp is an encoder (JSON → TOON only), while Corvus packages are bidirectional converters. If you need to consume TOON and produce JSON, use Corvus. If you only need to serialize POCOs to TOON, Cysharp may be the simpler fit.
Benchmarks on a 100-row person array show Corvus is 1.04–1.74× faster for encoding, with the UTF-8 buffer path allocating 0 B/op compared to Cysharp's 368–648 B/op.
Next up
In the final post, we'll cover migration from V4, the production analyzers, and how to get started.