TOON

TOON (Token-Oriented Object Notation) is a relatively new human-readable data serialization format designed with LLMs in mind. Its syntax is intentionally close to natural language and programming conventions (indentation, colons, and minimal punctuation), making it one of the easiest formats for LLMs to read, generate, and correctly parse without hallucinations. Because it avoids ambiguous symbols (such as JSON’s heavy use of quotes and braces, or YAML’s complex indentation rules), LLMs tend to make fewer structural mistakes when outputting or interpreting TOON, resulting in higher reliability in chain-of-thought prompting and tool-calling scenarios. A small illustrative example follows the list below.
  • Extremely high reliability: LLMs almost never break structure (rarely missing colons, forgetting indentation, or hallucinating stray quotes or braces)
  • Reads closest to natural, code-like English, so LLMs understand the intent with almost no explanation
  • No ambiguous syntax variants, so there is little risk of confusion from multiple ways to write the same thing
  • Minimal escaping is needed, so LLMs do not hallucinate unnecessary backslashes
  • Indentation-based, but with more forgiving rules than YAML, making generation robust even in long outputs
  • Designed from the ground up for LLM tool calling and structured output, which makes it currently the most LLM-friendly format in existence
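
To make the syntax concrete, here is a rough sketch of what TOON-style data can look like, based on the conventions described above (indentation, colons, and compact tabular arrays). Treat it as an illustration rather than canonical TOON; the exact details may differ from the official specification.

  users[2]{id,name,role}:
    1,Alice,admin
    2,Bob,viewer
  settings:
    theme: dark
    notifications: true

Note how few structural characters there are for a model to get wrong: no braces, no quotes around simple strings, and commas only inside the tabular rows.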

JSON

JSON (JavaScript Object Notation) is the de facto standard for machine-to-machine communication and is moderately readable for LLMs. Its strict requirement for double quotes around keys and string values, plus heavy use of curly braces, square brackets, and commas, creates clear boundaries that LLMs can usually follow accurately when properly instructed. However, LLMs still occasionally hallucinate missing commas, extra commas, or unbalanced braces, especially in long or nested objects, and they sometimes forget to escape special characters in strings, which reduces its reliability compared to newer LLM-friendly formats. An example follows the list below.
  • Very clear delimiters (braces, brackets, commas, quotes) give LLMs strong visual cues
  • Only one canonical way to write most data, which reduces ambiguity
  • Supported natively by virtually all LLM structured output systems (OpenAI function calling, Gemini, etc.)
  • LLMs have been trained on massive amounts of JSON, so they know it well
  • Easy for LLMs to count opening and closing braces and validate locally
  • Its main downside (occasional hallucination of missing or extra commas, or unescaped quotes) is well understood and can be mitigated with good prompts
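
For comparison, here is the same example data expressed in standard JSON. Every key and string must be quoted, every object and array needs matching braces or brackets, and quotes inside strings must be escaped; these are exactly the places where LLMs occasionally slip.

  {
    "users": [
      { "id": 1, "name": "Alice", "role": "admin" },
      { "id": 2, "name": "Bob", "role": "viewer" }
    ],
    "settings": { "theme": "dark", "notifications": true },
    "note": "Strings containing \"quotes\" must be escaped"
  }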

YAML

YAML (YAML Ain’t Markup Language) prioritizes human readability above all, using indentation and minimal punctuation, which makes it very natural for humans to write and read. For LLMs, however, this is a double-edged sword: the heavy reliance on significant whitespace and the many ways to represent the same data (flow style vs. block style, folded and literal block scalars, different quoting rules, etc.) dramatically increase the chance of parsing errors or hallucinations when an LLM generates or interprets YAML. As a result, YAML is generally considered one of the least LLM-friendly formats despite being one of the most human-friendly. An example of its classic pitfalls follows the list below.
  • Highly human readable, which indirectly helps LLMs when they are asked to explain or reason about the content
  • No braces or heavy punctuation, which makes it look clean in prompts
  • Supports comments natively (# …), which allows LLMs to include reasoning directly in the data
  • Can represent complex data with very little boilerplate
  • However: indentation sensitivity, multiple syntax styles, and subtle rules (the Norway problem, sexagesimal numbers, etc.) make LLMs prone to producing invalid output, giving YAML the highest structural error rate of the three formats when generated by LLMs
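
As a concrete illustration of those subtle rules, the snippet below is syntactically valid YAML, yet several of the values are silently reinterpreted by YAML 1.1 parsers. This is a simplified sketch of the well-known gotchas; exact behavior depends on the parser and the YAML version in use.

  country: NO        # intended as the string "NO" (Norway), but YAML 1.1 parses it as boolean false
  version: 1.20      # intended as a version string, but parsed as the float 1.2
  duration: 1:30     # intended as a time, but YAML 1.1 parses it as the sexagesimal integer 90
  users:
    - name: Alice
      role: admin

An LLM that emits data like this will produce output that parses, but not into the values it intended, which is part of why YAML ranks lowest of the three for structured LLM output.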
