YAML vs Plain Text: Choosing the Right Format for LLM Prompts

YAML and plain text are often used at different stages of a crawling or RAG pipeline. YAML usually holds structured manifests and small records. Plain text usually holds page content that will later be embedded.

A broader overview is available in Best Prompt Data.

Quick comparison

Topic	YAML	Plain Text
Best for	Config-like data and manifests	Raw content and simple outputs
Parsing reliability	Medium (indentation matters)	Low (no structure)
Human readability	High	High
RAG fit	Good for metadata	Good for content
Common failure	Indentation and implicit types	Missing boundaries and ambiguity

What YAML is good at

YAML is usually selected when:

A job manifest is being created (rules, filters, selectors)
Humans will tweak values
Nested config is needed and comments matter

If strict parsing is required, JSON is often the better choice, as covered in JSON vs YAML.

What plain text is good at

Plain text is usually selected when:

The focus is on content, not fields
Embeddings will be created for RAG
Formatting should be minimized

If structure would help with chunking, Markdown is worth considering; see Markdown vs Plain Text.

Use cases in web crawling, scraping, and RAG

When YAML should be used

YAML is usually preferred when:

Extraction rules are being passed between humans
A small record is being stored, and a schema is not enforced
Comments are needed to explain choices
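
As a rough illustration of such a hand-edited manifest, the sketch below keeps a small YAML document in a string and runs a simple pre-check for tab indentation, which YAML does not allow. The manifest keys (selectors, filters, min_words) and the findTabIndentedLines helper are hypothetical; a real pipeline would hand the text to a YAML parser rather than rely on a check like this.

```javascript
// Hypothetical extraction manifest; the keys are illustrative only.
const manifest = [
  "# Rules for article pages",
  "selectors:",
  "  title: h1        # first heading on the page",
  "  author: .byline",
  "filters:",
  "\tmin_words: 200   # indented with a tab by mistake",
].join("\n");

// Return the 1-based numbers of lines whose indentation contains a tab.
function findTabIndentedLines(yamlText) {
  const bad = [];
  yamlText.split("\n").forEach((line, index) => {
    const indent = line.match(/^[ \t]*/)[0];
    if (indent.includes("\t")) bad.push(index + 1);
  });
  return bad;
}

console.log(findTabIndentedLines(manifest)); // [ 6 ]
```

The comments in the manifest are the part that JSON cannot carry, which is the main reason YAML tends to suit files that people edit by hand.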

When plain text should be used

Plain text is usually preferred when:

The goal is search and retrieval over page content
Chunking will be done later
The output must be resilient to minor formatting issues
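
Deferring chunking is straightforward with plain text because any slice of it is still valid input. The following is a minimal sketch of a fixed-size chunker with overlap; the chunkText name and the size and overlap values are assumptions chosen for illustration, not recommended settings.

```javascript
// Split text into chunks of at most `size` characters, repeating `overlap`
// characters between neighbouring chunks. Both defaults are assumptions.
function chunkText(text, size = 500, overlap = 50) {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    // Stop once the current chunk reaches the end of the text.
    if (start + size >= text.length) break;
  }
  return chunks;
}

const pageText = "Plain text survives stray whitespace and odd line breaks. ".repeat(20);
const chunks = chunkText(pageText, 200, 20);
console.log(chunks.length, chunks[0].length);
```

Character-based slicing is the simplest option; splitting on sentences or tokens is a common refinement once the embedding model is known.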

If the output is coming from HTML, the "raw vs cleaned" decision is covered in HTML vs Cleaned Text.

Practical tradeoffs

YAML is not ideal for large generated datasets

If a model emits thousands of YAML records, indentation mistakes and implicit-typing surprises become frequent; for example, some parsers read an unquoted no as a boolean rather than a string. JSON or CSV is usually safer at that scale.
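
One way to get that safety with JSON is to emit one record per line and parse each line independently, so a malformed record is rejected without affecting the rest. The sketch below assumes a JSON Lines layout; the sample records and the parseJsonLines helper are hypothetical.

```javascript
// Hypothetical model output: one JSON record per line (JSON Lines).
const modelOutput = [
  '{"title": "Post A", "price": 19.99}',
  '{"title": "Post B", "price": "N/A"}',
  '{"title": "Post C", "price": 5',
].join("\n");

// Parse each line on its own so one bad record does not spoil the batch.
function parseJsonLines(text) {
  const records = [];
  const errors = [];
  text.split("\n").forEach((line, index) => {
    if (line.trim() === "") return; // skip blank lines
    try {
      records.push(JSON.parse(line));
    } catch (err) {
      errors.push({ line: index + 1, message: err.message });
    }
  });
  return { records, errors };
}

const { records, errors } = parseJsonLines(modelOutput);
console.log(records.length, "parsed;", errors.length, "rejected");
```

The truncated third line is reported with its line number, which makes it practical to retry or discard individual records.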

Plain text makes structured QA difficult

If a "price" field is required, plain text alone makes validation hard, because the value has to be located in the text before it can be checked. For the structured alternative, see JSON vs Plain Text.
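
The difference can be sketched as follows. With plain text, the price first has to be found by a pattern that only covers the layouts someone anticipated; with a structured record, the check is a direct test on a named field. Both function names and the regular expression are assumptions made for this example.

```javascript
// Plain text: the price has to be located before it can be checked.
// This pattern is a guess at one common layout and will miss others.
function findPriceInText(text) {
  const match = text.match(/\$\s?(\d+(?:\.\d{1,2})?)/);
  return match ? Number(match[1]) : null;
}

// Structured record: validation is a direct test on a named field.
function hasValidPrice(record) {
  return (
    typeof record.price === "number" &&
    Number.isFinite(record.price) &&
    record.price >= 0
  );
}

console.log(findPriceInText("Now only $19.99 while stocks last")); // 19.99
console.log(findPriceInText("Price: nineteen dollars")); // null
console.log(hasValidPrice({ title: "Widget", price: 19.99 })); // true
console.log(hasValidPrice({ title: "Widget" })); // false
```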

Node.js snippet: Combine YAML-like config with plain text content

A common pattern is to keep the config in YAML and the content as plain text, then wrap both into a single JSON record for ingestion.

// Node 18+
// Wrap plain text content with a config object.
// The config literal stands in for a manifest already parsed from YAML.

const config = {
  extract: ["title", "author", "date"],
  language: "en",
};

const content = "Long page text goes here...";

const record = { config, content };
console.log(JSON.stringify(record, null, 2));

Conclusion

YAML is usually selected for human-edited manifests and config-like data.
Plain text is usually selected for content-first outputs and embeddings.
In crawling and RAG pipelines, YAML often describes what should be extracted, while plain text carries the actual page content.

If a tabular export is needed, see YAML vs CSV as well.

