A schema-aware data format with human-readable text and compact binary representation.
~51% fewer input tokens than JSON for LLM applications, with zero accuracy loss.
TeaLeaf is a data format that combines:
- Human-readable text (
.tl) for editing and version control - Compact binary (
.tlbx) for storage and transmission - Inline schemas for validation and compression
- JSON interoperability for easy integration
The existing data format landscape presents trade-offs that TeaLeaf attempts to bridge. TeaLeaf does not attempt to replace any of the formats listed below, but rather presents a different perspective that users can objectively compare to identify if it fits their specific use cases.
| Format | Observation |
|---|---|
| JSON | Verbose, no comments, no schema |
| YAML | Indentation-sensitive, error-prone at scale |
| Protobuf | Schema external, binary-only, requires codegen |
| Avro | Schema embedded but not human-readable |
| CSV/TSV/TOON | Too simple for nested or typed data |
| MessagePack/CBOR | Compact but schemaless |
Converting some formats to binary yielded marginal benefits. Schema information was almost always external, requiring coordination between files.
TeaLeaf was designed to unify these concerns: a single file that humans can read and edit, that compiles to an efficient binary, with schemas inline rather than external. Though the format is general-purpose, LLM-Context Engineering uses cases can take advantage of significant token efficiency compared to JSON.
The same data — TeaLeaf uses schemas so field names are defined once, not repeated per record:
| TeaLeaf (schemas with nested structures) | JSON (no schema, names repeated) |
|---|---|
# Schema: define structure once
@struct Location (city: string, country: string)
@struct Department (name: string, location: Location)
@struct Employee (
id: int,
name: string,
role: string,
department: Department,
skills: []string,
)
# Data: field names not repeated
employees: @table Employee [
(1, "Alice", "Engineer",
("Platform", ("Seattle", "USA")),
["rust", "python"])
(2, "Bob", "Designer",
("Product", ("Austin", "USA")),
["figma", "css"])
(3, "Carol", "Manager",
("Platform", ("Seattle", "USA")),
["leadership", "agile"])
] |
{
"employees": [
{
"id": 1,
"name": "Alice",
"role": "Engineer",
"department": {
"name": "Platform",
"location": {
"city": "Seattle",
"country": "USA"
}
},
"skills": ["rust", "python"]
},
{
"id": 2,
"name": "Bob",
"role": "Designer",
"department": {
"name": "Product",
"location": {
"city": "Austin",
"country": "USA"
}
},
"skills": ["figma", "css"]
},
{
"id": 3,
"name": "Carol",
"role": "Manager",
"department": {
"name": "Platform",
"location": {
"city": "Seattle",
"country": "USA"
}
},
"skills": ["leadership", "agile"]
}
]
} |
Why This Matters:
| Aspect | JSON | TeaLeaf |
|---|---|---|
| Field names | Repeated for every record | Defined once in schema |
| Types | Implicit, inferred at runtime | Explicit in schema, structural checks at parse |
| Binary size | Large (names + values) | Compact (positional data only) |
| LLM tokens | 9,829 tokens (retail example shown below) | 5,632 tokens (43% fewer) |
| Validation | External tools needed | Field count validation via schema |
The schema approach means:
- Text format is human-readable with explicit types
- Binary format stores only values (field names in schema table)
- String deduplication — "Seattle", "USA", "Platform" stored once, referenced by index
A complete retail orders dataset demonstrating the full TeaLeaf workflow:
┌─────────────────────────────────────────────────────────────────────────────┐
│ RETAIL ORDERS WORKFLOW │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ retail_orders.json ──────► retail_orders.tl ───────► retail_orders.tlbx │
│ 36.8 KB from-json 14.5 KB compile 6.9 KB │
│ 9,829 tokens 5,632 tokens (43% fewer) │
│ │
│ • 10 orders • 11 schemas defined • 81% size reduction │
│ • 4 products • Human-readable • 43% fewer LLM tokens │
│ • 3 customers • Comments & formatting • Fast transmission │
│ │
├─────────────────────────────────────────────────────────────────────────────┤
│ LLM ANALYSIS │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ test_retail_analysis.ps1 │
│ │ │
│ ▼ │
│ Anthropic API (retail_orders.tl) ──────► responses/retail_analysis.tl │
│ │
│ • Sends TeaLeaf-formatted order data • Business intelligence insights │
│ • Schema-first = fewer tokens • Revenue analysis │
│ • Structured prompts • Customer segmentation │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Try it yourself:
| File | Description |
|---|---|
examples/retail_orders.json |
Original JSON (36.8 KB, 9,829 tokens) |
examples/retail_orders.tl |
TeaLeaf text format (14.5 KB, 5,632 tokens) |
examples/retail_orders.tlbx |
TeaLeaf binary (6.9 KB) |
examples/test_retail_analysis.ps1 |
Send to Anthropic API |
examples/responses/retail_analysis.tl |
Anthropics's analysis |
Download the latest release from GitHub Releases.
| Platform | Architecture | Download |
|---|---|---|
| Windows | x64 | tealeaf-windows-x64.zip |
| Windows | ARM64 | tealeaf-windows-arm64.zip |
| Linux | x64 | tealeaf-linux-x64.tar.gz |
| Linux | ARM64 | tealeaf-linux-arm64.tar.gz |
| Linux (musl) | x64 | tealeaf-linux-musl-x64.tar.gz |
| macOS | x64 (Intel) | tealeaf-macos-x64.tar.gz |
| macOS | ARM64 (Apple Silicon) | tealeaf-macos-arm64.tar.gz |
Windows (PowerShell):
# Download and extract to current directory
Invoke-WebRequest -Uri "https://github.com/krishjag/tealeaf/releases/latest/download/tealeaf-windows-x64.zip" -OutFile tealeaf.zip
Expand-Archive tealeaf.zip -DestinationPath .
# Optional: add to PATH
$env:PATH += ";$PWD"Linux/macOS:
# Download and extract (replace with your platform)
curl -LO https://github.com/krishjag/tealeaf/releases/latest/download/tealeaf-linux-x64.tar.gz
tar -xzf tealeaf-linux-x64.tar.gz
# Optional: move to PATH
sudo mv tealeaf /usr/local/bin/Requires Rust toolchain:
git clone https://github.com/krishjag/tealeaf.git
cd tealeaf
cargo build --release --package tealeaf-core
# Binary at: target/release/tealeaf (or tealeaf.exe on Windows)tealeaf --version
tealeaf helptealeaf <command> [options]
Commands:
compile Compile text (.tl) to binary (.tlbx)
decompile Decompile binary (.tlbx) to text (.tl)
info Show file info (auto-detects format)
validate Validate text format syntax
to-json Convert TeaLeaf text to JSON
from-json Convert JSON to TeaLeaf text
tlbx-to-json Convert TeaLeaf binary to JSON
json-to-tlbx Convert JSON to TeaLeaf binary
completions Generate shell completionsRun tealeaf help <command> for detailed usage. Shell completions are available for Bash, Zsh, Fish, PowerShell, and Elvish via tealeaf completions <shell>.
| Language | Type | Package |
|---|---|---|
| Rust | Native | tealeaf-core crate |
| .NET | FFI | TeaLeaf NuGet package |
Both bindings provide:
- Parse text (
.tl) and read binary (.tlbx) - Dynamic key-based value access
- Schema introspection at runtime
- JSON conversion (bidirectional)
- Memory-mapped reading for large files
Community contributions welcome for Python, Java, and other languages.
TeaLeaf combines ideas from several formats: human-readable text like JSON/YAML, schema-embedded binaries like Avro, and positional encoding like Protobuf. The key difference is that TeaLeaf keeps schemas inline with data in the text format, making .tl files self-documenting and git-friendly.
The binary format (.tlbx) embeds schemas, enabling readers to decode files without external .proto or .avsc files. No code generation is required—schemas are discovered at runtime.
| Feature | JSON | Protobuf | Avro | MsgPack | TeaLeaf |
|---|---|---|---|---|---|
| Human-readable data format | ✅ | ❌ | ❌ | ✅ | |
| Compact binary | ❌ | ✅ | ✅ | ✅ | ✅ |
| Schema embedded in binary | ❌ | ❌ | ✅ | ❌ | ✅ |
| No code generation required | ✅ | ❌ | ✅ | ✅ | |
| Comments in source | ❌ | N/A | N/A | ❌ | ✅ |
| Built-in JSON conversion | — | ❌ | ❌ | ❌ | ✅ |
| Built-in compression | ❌ | ❌ | ✅ | ❌ | ✅ |
*Protobuf TextFormat exists but is rarely used. **Avro supports GenericRecord but codegen is typical.
Data from cargo run --example size_report on tealeaf-core.
| Format | Small Object | 10K Points | 1K Users |
|---|---|---|---|
| JSON | 1.00x | 1.00x | 1.00x |
| Protobuf | 0.38x | 0.65x | 0.41x |
| MessagePack | 0.35x | 0.63x | 0.38x |
| TeaLeaf Text | 1.38x | 0.87x | 0.63x |
| TeaLeaf Compressed | 3.56x | 0.15x | 0.47x |
TeaLeaf has 64-byte header overhead (bad for small objects). For large arrays with compression, TeaLeaf achieves 6-7x better compression than JSON.
Trade-off: TeaLeaf decode is ~2-5x slower than Protobuf due to dynamic key-based access. Choose TeaLeaf when size matters more than decode speed.
TeaLeaf is well-suited for assembling and managing context for large language models
Why TeaLeaf for LLM context:
- ~51% fewer input tokens on real-world data (14 tasks, 7 domains, SEC EDGAR + BLS + clinical trials + court filings + patents + census + NYC PLUTO) — verified across Claude Sonnet 4.5 and GPT-5.2
- Zero accuracy loss — three-format benchmark (TeaLeaf vs JSON vs TOON) scores within noise across all providers
- Binary format for fast cached context retrieval
- String deduplication (roles, tool names stored once)
- Human-readable text for prompt authoring
Three-format comparison (real-world data, 14 tasks, 7 domains, Claude Sonnet 4.5 + GPT-5.2):
| Metric | TeaLeaf | JSON | TOON |
|---|---|---|---|
| Anthropic accuracy | 0.942 | 0.945 | 0.939 |
| OpenAI accuracy | 0.925 | 0.924 | 0.928 |
| Input token savings | -51% | baseline | -20% |
- Configuration files — Human-editable text, compile to binary for deployment
- API data exchange — Bidirectional JSON conversion
- Scientific/tabular data — Null bitmap optimization for sparse data
- Embedded/IoT — Memory-mapped reads, no parsing allocations
For the complete technical specification including text format syntax, type system, binary format details, and grammar, see:
Planned improvements and areas where contributions are welcome.
See CONTRIBUTING.md for current development guidelines and how to get involved.
The FFI layer (tealeaf-ffi) exposes a C-compatible API, making bindings straightforward for any language with C interop. Currently supported:
| Language | Status |
|---|---|
| Rust | Stable (native) |
| .NET | Stable (NuGet: TeaLeaf) |
| Python | Planned |
| Java/Kotlin | Planned |
| Go | Planned |
| JavaScript/TypeScript | Planned |
See tealeaf-ffi/src/lib.rs for the exported API and bindings/dotnet/ as a reference implementation.
SLSA 3 Compliance — Planned for v2.1.0
Achieve SLSA Level 3 for enhanced supply chain security:
- Build provenance generation with slsa-github-generator
- Artifact signing with Sigstore/cosign
- Hermetic builds with pinned dependencies and isolated environments
- Provenance verification tooling and documentation
- Build security hardening (RELRO, stack canaries, PIE, code signing)
Benefits: Enhanced trust, tamper detection, compliance with security-conscious organizations
- Streaming mode for append-only and incremental writes
- Web playground for interactive TeaLeaf editing and conversion
- Package manager distribution (Homebrew, apt, Scoop, etc.)
- RFC process — Structured proposals for breaking changes and new features

