TOON

TOON (Token-Oriented Object Notation) is a relatively new human-readable data serialization format designed with LLMs in mind. Its syntax is intentionally close to natural language and programming conventions (indentation, colons, and minimal punctuation), making it one of the easiest formats for LLMs to read, generate, and correctly parse without hallucinations. Because it avoids ambiguous symbols (such as JSON’s heavy use of quotes and braces, or YAML’s complex indentation rules), LLMs tend to make fewer structural mistakes when outputting or interpreting TOON, resulting in higher reliability in chain-of-thought prompting and tool-calling scenarios. A small illustrative example follows the list below.
  • Extremely high reliability: LLMs almost never break structure (rarely missing colons, forgetting indentation, or hallucinating stray quotes or braces)
  • Reads closest to natural, code-like English, so LLMs understand the intent with almost no explanation
  • No ambiguous syntax variants, so there is little risk of confusion from multiple ways to write the same thing
  • Minimal escaping is needed, so LLMs do not hallucinate unnecessary backslashes
  • Indentation-based, but with more forgiving rules than YAML, making generation robust even in long outputs
  • Designed from the ground up for LLM tool calling and structured output, which makes it currently the most LLM-friendly format in existence
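
To make the syntax concrete, here is a rough sketch of what TOON-style data can look like, based on the conventions described above (indentation, colons, and compact tabular arrays). Treat it as an illustration rather than canonical TOON; the exact details may differ from the official specification.

  users[2]{id,name,role}:
    1,Alice,admin
    2,Bob,viewer
  settings:
    theme: dark
    notifications: true

Note how few structural characters there are for a model to get wrong: no braces, no quotes around simple strings, and commas only inside the tabular rows.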

JSON

JSON (JavaScript Object Notation) is the de facto standard for machine-to-machine communication and is moderately readable for LLMs. Its strict requirement for double quotes around keys and string values, plus heavy use of curly braces, square brackets, and commas, creates clear boundaries that LLMs can usually follow accurately when properly instructed. However, LLMs still occasionally hallucinate missing commas, extra commas, or unbalanced braces, especially in long or nested objects, and they sometimes forget to escape special characters in strings, which reduces its reliability compared to newer LLM-friendly formats. An example follows the list below.
  • Very clear delimiters (braces, brackets, commas, quotes) give LLMs strong visual cues
  • Only one canonical way to write most data, which reduces ambiguity
  • Supported natively by virtually all LLM structured output systems (OpenAI function calling, Gemini, etc.)
  • LLMs have been trained on massive amounts of JSON, so they know it well
  • Easy for LLMs to count opening and closing braces and validate locally
  • Its main downside (occasional hallucination of missing or extra commas, or unescaped quotes) is well understood and can be mitigated with good prompts
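
For comparison, here is the same example data expressed in standard JSON. Every key and string must be quoted, every object and array needs matching braces or brackets, and quotes inside strings must be escaped; these are exactly the places where LLMs occasionally slip.

  {
    "users": [
      { "id": 1, "name": "Alice", "role": "admin" },
      { "id": 2, "name": "Bob", "role": "viewer" }
    ],
    "settings": { "theme": "dark", "notifications": true },
    "note": "Strings containing \"quotes\" must be escaped"
  }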

YAML

YAML (YAML Ain’t Markup Language) prioritizes human readability above all, using indentation and minimal punctuation, which makes it very natural for humans to write and read. For LLMs, however, this is a double-edged sword: the heavy reliance on significant whitespace and the many ways to represent the same data (flow style vs. block style, folded and literal block scalars, different quoting rules, etc.) dramatically increase the chance of parsing errors or hallucinations when an LLM generates or interprets YAML. As a result, YAML is generally considered one of the least LLM-friendly formats despite being one of the most human-friendly. An example of its classic pitfalls follows the list below.
  • Highly human readable, which indirectly helps LLMs when they are asked to explain or reason about the content
  • No braces or heavy punctuation, which makes it look clean in prompts
  • Supports comments natively (# …), which allows LLMs to include reasoning directly in the data
  • Can represent complex data with very little boilerplate
  • However: indentation sensitivity, multiple syntax styles, and subtle rules (the Norway problem, sexagesimal numbers, etc.) make LLMs prone to producing invalid output, giving YAML the highest structural error rate of the three formats when generated by LLMs
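
As a concrete illustration of those subtle rules, the snippet below is syntactically valid YAML, yet several of the values are silently reinterpreted by YAML 1.1 parsers. This is a simplified sketch of the well-known gotchas; exact behavior depends on the parser and the YAML version in use.

  country: NO        # intended as the string "NO" (Norway), but YAML 1.1 parses it as boolean false
  version: 1.20      # intended as a version string, but parsed as the float 1.2
  duration: 1:30     # intended as a time, but YAML 1.1 parses it as the sexagesimal integer 90
  users:
    - name: Alice
      role: admin

An LLM that emits data like this will produce output that parses, but not into the values it intended, which is part of why YAML ranks lowest of the three for structured LLM output.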
