YAML vs CSV: Choosing the Right Format for LLM Prompts

YAML vs CSV for prompt data and scraping outputs: config manifests vs flat tables, with practical crawling and RAG examples.

Written by Andrii
Published on

YAML and CSV are both often picked for "human friendliness", but they represent different data shapes. YAML holds nested key-value structures. CSV holds flat rows and columns.

A full format overview is available in Best Prompt Data.

Quick comparison

Topic | YAML | CSV
Best for | Config-like manifests | Flat tabular datasets
Human editing | High | High (spreadsheets)
Nesting | Supported | Not supported
Parsing reliability | Medium to High | High (with correct quoting)
Common failure | Indentation and implicit types | Commas/quotes/newlines in fields

What YAML is good at

YAML is usually selected for:

  • Crawl/extraction manifests (rules, flags, selectors)
  • Small per-page records that humans will tweak
  • Files where inline comments are useful

YAML compared to JSON is covered in JSON vs YAML.

What CSV is good at

CSV is usually selected for:

  • Exports of extracted data
  • One row per page/product
  • Quick review in spreadsheets

If objects and metadata are needed, JSON vs CSV is often the better comparison.

Use cases in web crawling, scraping, and RAG

When YAML should be used

YAML is usually preferred when:

  • Humans will edit the output before it is used
  • A manifest is needed (what to extract, how to filter)
  • Nesting is useful (per-domain rules, per-section options)
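The nesting point is easiest to see on a manifest that has already been parsed. The sketch below assumes some YAML parser has loaded the manifest into a plain object. The field names (`defaults`, `domains`, `selectors`, `maxDepth`) and the `resolveRules` helper are hypothetical, chosen only for illustration.

```javascript
// Hypothetical manifest, shown as the plain object a YAML parser would produce.
// Field names are illustrative assumptions, not a real schema.
const manifest = {
  defaults: { maxDepth: 2, selectors: { title: "h1" } },
  domains: {
    "example.com": { maxDepth: 5, selectors: { body: "article" } },
  },
};

// Merge per-domain overrides on top of the defaults for a given URL.
function resolveRules(manifest, url) {
  const host = new URL(url).hostname;
  const override = manifest.domains?.[host] ?? {};
  return {
    ...manifest.defaults,
    ...override,
    // Selectors are merged key by key so a domain can add one without losing the rest.
    selectors: { ...manifest.defaults.selectors, ...override.selectors },
  };
}

console.log(resolveRules(manifest, "https://example.com/a"));
```

A structure like this has no natural single-table form, which is why it tends to live in YAML (or JSON) rather than CSV.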

When CSV should be used

CSV is usually preferred when:

  • A stable set of columns exists
  • Export and reporting are the primary goals
  • The data is already flat (directory listings, price tables)
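Whether a stable set of columns really exists can be checked before exporting. The following is a minimal sketch, assuming records are plain objects. The `checkColumns` helper and the sample fields are hypothetical.

```javascript
// Collect the union of keys across records, in first-seen order,
// and report which records are missing any of them.
function checkColumns(records) {
  const headers = [];
  for (const record of records) {
    for (const key of Object.keys(record)) {
      if (!headers.includes(key)) headers.push(key);
    }
  }
  const incomplete = records
    .map((record, index) => ({
      index,
      missing: headers.filter((h) => !(h in record)),
    }))
    .filter((entry) => entry.missing.length > 0);
  return { headers, incomplete };
}

const report = checkColumns([
  { url: "https://example.com/a", price: "10" },
  { url: "https://example.com/b" }, // no price column
]);
console.log(report.headers);
console.log(report.incomplete);
```

If `incomplete` is often non-empty, the data may not be as flat or as regular as CSV assumes.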

For readable narrative outputs, Markdown is often selected instead, as covered in Markdown vs YAML and Markdown vs CSV.

Practical tradeoffs

YAML becomes fragile at scale

As nesting grows, small indentation errors can break parsing. That risk grows when thousands of records are generated.
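Not every YAML mistake fails loudly. A mis-indented key can still parse but land at the wrong level, and parsers that follow YAML 1.1 read an unquoted `no` as boolean `false`. One hedge is to check the parsed result against the expected types. The sketch below assumes the record has already been parsed; the `expected` schema and the `findTypeErrors` helper are hypothetical.

```javascript
// Expected JavaScript type per field; field names are hypothetical.
const expected = { url: "string", country: "string", maxDepth: "number" };

// Return one message per field whose parsed type differs from the schema.
function findTypeErrors(record, schema) {
  const errors = [];
  for (const [key, type] of Object.entries(schema)) {
    const actual = typeof record[key];
    if (actual !== type) errors.push(`${key}: expected ${type}, got ${actual}`);
  }
  return errors;
}

// `country: no` (unquoted) is read as boolean false by YAML 1.1 parsers.
const parsed = { url: "https://example.com/a", country: false, maxDepth: 3 };
console.log(findTypeErrors(parsed, expected));
```

A missing key shows up as `undefined`, so the same check also catches keys that ended up under the wrong parent.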

CSV forces flattening

When nested data exists, flattening decisions must be made. Those decisions are often the real problem, not the format.
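One possible set of flattening decisions is sketched below: nested keys become dotted column names and arrays of scalars are joined with `|`. Both conventions are arbitrary assumptions made for this example, and the `flatten` helper is hypothetical. Arrays of objects would need a separate decision, such as one row per element.

```javascript
// Flatten nested objects into dotted column names; arrays are joined with "|".
function flatten(value, prefix = "", out = {}) {
  for (const [key, v] of Object.entries(value)) {
    const column = prefix ? `${prefix}.${key}` : key;
    if (Array.isArray(v)) {
      out[column] = v.join("|");
    } else if (v !== null && typeof v === "object") {
      flatten(v, column, out); // recurse into nested objects
    } else {
      out[column] = v;
    }
  }
  return out;
}

const row = flatten({
  url: "https://example.com/a",
  meta: { title: "A", tags: ["news", "eu"] },
});
console.log(row);
// Columns: url, meta.title, meta.tags (with "news|eu" as the tags value)
```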

Node.js snippet: Generate CSV from a simple YAML-like manifest

No YAML parser is used here: a plain JavaScript array stands in for records that were already parsed from YAML. A common approach is to keep YAML for job manifests and generate CSV only for extracted tabular outputs.

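// Node 18+
// Build a small CSV from an array of records (standing in for parsed YAML).

const items = [
  { url: "https://example.com/a", category: "news" },
  { url: "https://example.com/b", category: "docs" },
];

// Quote every field and double any embedded quotes, so that commas,
// quotes, and newlines inside a value cannot break the row.
const toCsvField = (value) =>
  `"${String(value ?? "").replaceAll('"', '""')}"`;

const headers = ["url", "category"];
const lines = [headers.join(",")]; // header row first

for (const item of items) {
  // Missing keys become empty fields, so every row has the same column count.
  lines.push(headers.map((h) => toCsvField(item[h])).join(","));
}

console.log(lines.join("\n"));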

Conclusion

  • YAML is usually selected for human-edited manifests and nested config-like data.
  • CSV is usually selected for flat datasets and exports.
  • In many crawling pipelines, YAML (or JSON) is used for configuration and CSV is used only as an export format.

If minimal text is desired, YAML vs Plain Text can be compared next.


About the Author

Andrii Mazurian
Andrew Mazurian (@andriixzvf)

Founder, WebCrawlerAPI · 🇳🇱 Netherlands

Engineer with 15 years of experience in APIs, big data, and infrastructure. Founded WebCrawlerAPI in 2024 with a single goal: to build the best data API, and have been shipping it every day since.