YAML vs CSV: Choosing the Right Format for LLM Prompts

YAML and CSV are both often picked for "human friendliness", but they model different data shapes: YAML holds nested key-value structures, while CSV holds flat rows and columns.

A full format overview is available in Best Prompt Data.

Quick comparison

Topic	YAML	CSV
Best for	Config-like manifests	Flat tabular datasets
Human editing	High	High (spreadsheets)
Nesting	Supported	Not supported
Parsing reliability	Medium to High	High (with correct quoting)
Common failure	Indentation and implicit types	Commas/quotes/newlines in fields

What YAML is good at

YAML is usually selected for:

Crawl/extraction manifests (rules, flags, selectors)
Small per-page records that humans will tweak
Files where inline comments are useful

YAML compared to JSON is covered in JSON vs YAML.

What CSV is good at

CSV is usually selected for:

Exports of extracted data
One row per page/product
Quick review in spreadsheets

If objects and metadata are needed, JSON vs CSV is often the better comparison.

Use cases in web crawling, scraping, and RAG

When YAML should be used

YAML is usually preferred when:

Humans will edit the output before it is used
A manifest is needed (what to extract, how to filter)
Nesting is useful (per-domain rules, per-section options)

When CSV should be used

CSV is usually preferred when:

A stable set of columns exists
Export and reporting are the primary goals
The data is already flat (directory listings, price tables)

For readable narrative outputs, Markdown is often selected instead, as covered in Markdown vs YAML and Markdown vs CSV.

Practical tradeoffs

YAML becomes fragile at scale

As nesting grows, small indentation errors can break parsing. That risk grows when thousands of records are generated.
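
One way to reduce that risk is to emit YAML from a data structure instead of concatenating strings by hand, so indentation is computed rather than typed. The sketch below is a minimal illustration under stated assumptions, not a complete YAML implementation: the helper names (`toYaml`, `emitScalar`, `isCollection`, `isEmptyCollection`) and the sample manifest are hypothetical, and only plain objects, arrays, strings, finite numbers, booleans, and null are handled. A maintained YAML library is the safer choice in a real pipeline.

```javascript
// Node 18+
// Minimal YAML emitter: a sketch, not a complete YAML implementation.
// Supports plain objects, arrays, strings, finite numbers, booleans, null.

function isCollection(v) {
  return v !== null && typeof v === "object";
}

function isEmptyCollection(v) {
  return Array.isArray(v) ? v.length === 0 : Object.keys(v).length === 0;
}

function emitScalar(v) {
  if (v === null) return "null";
  // Strings are always double-quoted, so values such as "no" or "1.0"
  // are not reinterpreted by implicit typing.
  if (typeof v === "string") return JSON.stringify(v);
  if (typeof v === "boolean") return String(v);
  if (typeof v === "number" && Number.isFinite(v)) return String(v);
  throw new TypeError(`Unsupported value: ${String(v)}`);
}

function toYaml(value, depth = 0) {
  const pad = "  ".repeat(depth); // two spaces per level, never tabs
  if (!isCollection(value)) return pad + emitScalar(value);
  if (isEmptyCollection(value)) {
    return pad + (Array.isArray(value) ? "[]" : "{}");
  }

  // Each entry is a [prefix, child] pair: "-" for sequence items,
  // a quoted key followed by ":" for mapping entries.
  const entries = Array.isArray(value)
    ? value.map((item) => ["-", item])
    : Object.entries(value).map(([k, item]) => [`${JSON.stringify(k)}:`, item]);

  return entries
    .map(([prefix, item]) =>
      !isCollection(item) || isEmptyCollection(item)
        ? `${pad}${prefix} ${toYaml(item, 0)}` // scalar stays on the same line
        : `${pad}${prefix}\n${toYaml(item, depth + 1)}` // nested block goes one level deeper
    )
    .join("\n");
}

// Hypothetical crawl manifest used only for illustration.
const manifest = {
  domain: "example.com",
  follow: "no",
  depth: 2,
  selectors: ["h1", "article p"],
};

console.log(toYaml(manifest));
```

Because every string is quoted, the value "no" is emitted as a quoted string, so a parser that follows YAML 1.1 implicit typing would not read it as a boolean.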

CSV forces flattening

When nested data exists, flattening decisions must be made. Those decisions are often the real problem, not the format.
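
To make those decisions concrete, the sketch below flattens a nested record into dot-path columns. It shows one possible convention, not a standard: the function name `flattenRecord`, the `.` path separator, the `|` list separator, and the sample record are all assumptions chosen for illustration.

```javascript
// Node 18+
// Flatten a nested record into dot-path columns (one possible convention).
// Limitations: arrays are assumed to hold scalars only, and empty nested
// objects produce no column at all.
function flattenRecord(record, prefix = "", out = {}) {
  for (const [key, value] of Object.entries(record)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (Array.isArray(value)) {
      // Assumption: lists are joined into a single cell with "|".
      out[path] = value.join("|");
    } else if (value !== null && typeof value === "object") {
      // Nested objects become additional columns such as "meta.title".
      flattenRecord(value, path, out);
    } else {
      // null and undefined become empty cells.
      out[path] = value ?? "";
    }
  }
  return out;
}

// Hypothetical extracted page record.
const page = {
  url: "https://example.com/a",
  meta: { title: "A", lang: "en" },
  tags: ["news", "world"],
};

console.log(flattenRecord(page));
```

Each choice here loses something: joining lists with a separator breaks if a value contains that separator, and dot paths collide if a key already contains a dot. Those are the flattening decisions that need to be settled before the format itself matters.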

Node.js snippet: Generate CSV from a simple YAML-like manifest

No YAML parser is used here; a plain array stands in for the parsed manifest. A common approach is to keep YAML for job manifests and to generate CSV only for extracted tabular outputs.

// Node 18+
// Create a tiny CSV from a JSON array (standing in for parsed YAML).

const items = [
  { url: "https://example.com/a", category: "news" },
  { url: "https://example.com/b", category: "docs" },
];

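const headers = ["url", "category"];
// The header row is written as-is; these names contain no commas or quotes.
const lines = [headers.join(",")];

for (const item of items) {
  lines.push(
    headers
      // Quote every field and double any embedded quotes, so commas,
      // quotes, and newlines inside a value cannot break the row.
      .map((h) => `"${String(item[h] ?? "").replaceAll('"', '""')}"`)
      .join(",")
  );
}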

console.log(lines.join("\n"));
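
To check that quoting of this kind survives a round trip, a small reader can be useful. The sketch below is a minimal parser for quoted CSV, assuming comma delimiters, double-quote escaping by doubling, and either LF or CRLF row endings. The function name `parseCsv` and the sample text are hypothetical, and a maintained CSV library is the safer choice for production data.

```javascript
// Node 18+
// Minimal CSV reader: handles quoted fields, doubled quotes, and
// newlines inside quoted fields. A sketch, not a full CSV implementation.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  let pending = false; // true once the current row has consumed any input

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        field += '"'; // a doubled quote is one literal quote
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch; // commas and newlines are plain data here
      }
    } else if (ch === '"') {
      inQuotes = true;
      pending = true;
    } else if (ch === ",") {
      row.push(field);
      field = "";
      pending = true;
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text[i + 1] === "\n") i++; // treat CRLF as one break
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
      pending = false;
    } else {
      field += ch;
      pending = true;
    }
  }

  // Flush the last row when the text does not end with a line break.
  if (pending) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

// Hypothetical sample with a comma, escaped quotes, and an embedded newline.
const sample =
  'url,note\r\n' +
  '"https://example.com/a","said ""hi"", then left"\n' +
  '"https://example.com/b","line one\nline two"\n';

console.log(parseCsv(sample));
```

Feeding the output of a CSV writer back through a reader like this is a cheap way to confirm that the column count stays stable for every row.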

Conclusion

YAML is usually selected for human-edited manifests and nested config-like data.
CSV is usually selected for flat datasets and exports.
In many crawling pipelines, YAML (or JSON) is used for configuration and CSV is used only as an export format.

If minimal text is desired, YAML vs Plain Text can be compared next.
