JSON and YAML solve the same general problem: representing structured data. The practical difference is that JSON is strict, while YAML is flexible and human-friendly, and that flexibility is where most of the surprises come from.
A broader guide is available in Best Prompt Data.
Quick comparison
| Topic | JSON | YAML |
|---|---|---|
| Best for | Machine parsing, APIs, validation | Human-edited config and small manifests |
| Schema validation | Strong | Possible, but less common in practice |
| Comments | Not supported | Supported |
| Typing surprises | Fewer | More (implicit types) |
| Common failure | Trailing commas, quoting | Indentation, implicit booleans/dates |
What JSON is good at
JSON is usually preferred when:
- A downstream parser must not guess
- A contract is required (keys, types, required fields)
- Data is being stored as objects or sent over APIs
JSON paired with Markdown is covered in Markdown vs JSON.
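As a sketch of what that "contract" can look like before a full schema validator is introduced, the snippet below checks one extracted record against a hand-rolled list of required keys and expected types. The field names (url, title, price) are placeholders, not a recommended schema.

```js
// Minimal hand-rolled contract for one extracted record.
// Field names (url, title, price) are placeholders, not a real schema.
const requiredFields = {
  url: "string",
  title: "string",
  price: "number",
};

function checkContract(record) {
  const errors = [];
  for (const [field, expectedType] of Object.entries(requiredFields)) {
    if (!(field in record)) {
      errors.push(`missing field: ${field}`);
    } else if (typeof record[field] !== expectedType) {
      errors.push(`${field} should be ${expectedType}, got ${typeof record[field]}`);
    }
  }
  return errors;
}

const record = JSON.parse('{"url": "https://example.com", "title": "Example", "price": "19.99"}');
console.log(checkContract(record)); // [ 'price should be number, got string' ]
```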
What YAML is good at
YAML is usually preferred when:
- Humans will edit the output
- Comments are useful
- Config-like nesting is needed, but strictness is not
If a readable document is needed instead of config, Markdown is often selected, as covered in Markdown vs YAML.
Use cases in web crawling, scraping, and RAG
When JSON should be used
JSON is usually the safer choice when:
- Page extractions will be inserted into a database
- A batch crawl produces many records that must be merged or deduped
- RAG metadata must be consistent across all chunks
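As a sketch of the consistency point, the snippet below checks that every chunk record carries the same metadata keys before ingestion, so a missing field is caught at merge time rather than at query time. The key names are placeholders chosen for illustration.

```js
// Check that every RAG chunk carries the same metadata keys before ingestion.
// The key names below are placeholders for illustration.
const requiredMeta = ["source_url", "crawl_date", "chunk_index"];

function findInconsistentChunks(chunks) {
  return chunks
    .map((chunk, index) => ({
      index,
      missing: requiredMeta.filter((key) => !(key in (chunk.metadata ?? {}))),
    }))
    .filter((result) => result.missing.length > 0);
}

const chunks = [
  { text: "first chunk", metadata: { source_url: "https://example.com", crawl_date: "2026-02-01", chunk_index: 0 } },
  { text: "second chunk", metadata: { source_url: "https://example.com", chunk_index: 1 } },
];

console.log(findInconsistentChunks(chunks)); // [ { index: 1, missing: [ 'crawl_date' ] } ]
```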
When YAML should be used
YAML is usually a fit when:
- Extraction rules are being generated and edited manually
- A "job spec" is being passed around by humans
- Small manifests are being produced where a strict validator is not needed
For tabular datasets, CSV can be compared in JSON vs CSV.
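A common pattern that fits this list is to let humans edit a YAML job spec and convert it to strict JSON before it enters the pipeline, so the flexibility stays at the human-facing edge. The sketch below assumes the js-yaml package is installed and uses hypothetical file names.

```js
// Assumes: npm install js-yaml
// Humans edit crawl-job.yaml; everything downstream only ever sees JSON.
import { readFile, writeFile } from "node:fs/promises";
import { load } from "js-yaml";

// File names here are hypothetical.
const spec = load(await readFile("crawl-job.yaml", "utf8"));

// Normalize to strict JSON at the edge of the pipeline.
await writeFile("crawl-job.json", JSON.stringify(spec, null, 2));
console.log("Wrote crawl-job.json");
```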
Practical failure modes
YAML implicit types
In YAML, unquoted values like `on`, `yes`, `2026-02-01`, and `123` can be interpreted as booleans, dates, and numbers rather than strings, depending on the parser and the YAML version it implements. In scraping output, that can silently change the meaning of a field.
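The snippet below makes that concrete with a hand-written, unquoted scrape config. It assumes js-yaml is installed; exactly which values get coerced depends on the parser and the YAML version it targets, which is the point.

```js
// Assumes: npm install js-yaml
import { load } from "js-yaml";

// A plausible hand-written scrape config; none of the values are quoted.
const doc = `
follow_redirects: on
respect_robots: yes
start_date: 2026-02-01
max_pages: 123
country_code: no
`;

const parsed = load(doc);
for (const [key, value] of Object.entries(parsed)) {
  // Whether "on"/"yes"/"no" come back as booleans and the date as a Date
  // object differs between parsers and YAML versions; inspect, don't assume.
  const kind = value instanceof Date ? "Date" : typeof value;
  console.log(`${key}: ${JSON.stringify(value)} (${kind})`);
}
```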
JSON strictness is a feature
That strictness is usually annoying for humans, but it is valuable for pipelines: if the model emits invalid JSON, the failure is immediate and detectable.
Node.js snippet: Enforce "JSON only" output in a pipeline
The simplest enforcement is to attempt a parse and fail the job when parsing fails. Failing fast in this way tends to tighten model behavior over time.
```js
// Node 18+
// Fail fast if JSON is invalid.
import { readFile } from "node:fs/promises";

const text = await readFile("output.json", "utf8");

let data;
try {
  data = JSON.parse(text);
} catch (e) {
  console.error("Invalid JSON output:", e.message);
  process.exit(1);
}

console.log("OK:", Array.isArray(data) ? "array" : "object");
```
Conclusion
- JSON is usually selected for reliability, validation, and downstream parsing.
- YAML is usually selected for human-edited config-like content and comments.
- For most scraping and RAG ingestion pipelines, JSON is usually the default unless human editing is a core requirement.
If minimal text is desired instead of structured data, JSON vs Plain Text should be read next.