YAML and CSV are both often picked for "human friendliness", but they represent different shapes of data. YAML holds nested key-value structures. CSV holds flat rows and columns.
A full format overview is available in Best Prompt Data.
Quick comparison
| Topic | YAML | CSV |
|---|---|---|
| Best for | Config-like manifests | Flat tabular datasets |
| Human editing | High | High (spreadsheets) |
| Nesting | Supported | Not supported |
| Parsing reliability | Medium to High | High (with correct quoting) |
| Common failure | Indentation and implicit types | Commas/quotes/newlines in fields |
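The CSV failure mode in the last row of the table can be illustrated with a minimal sketch. The `escapeCsvField` helper and the sample row are hypothetical and only follow the usual convention of wrapping a field in double quotes and doubling any embedded quotes.

```javascript
// Hypothetical helper: quote a field only when it contains a comma,
// a double quote, or a line break; embedded quotes are doubled.
function escapeCsvField(value) {
  const text = String(value ?? "");
  if (/[",\r\n]/.test(text)) {
    return `"${text.replaceAll('"', '""')}"`;
  }
  return text;
}

// Sample row (assumed data): the title contains a comma and quotes.
const row = ["https://example.com/a", 'Widgets, "deluxe" edition'];

// Naive join: the comma inside the title produces a third column.
const naive = row.join(",");
console.log(naive.split(",").length); // 3

// Escaped join: the title stays inside one quoted field.
const escaped = row.map(escapeCsvField).join(",");
console.log(escaped);
```

Quoting only when needed keeps the file readable in a text editor, while quoting every field (as the Node.js snippet later in this article does) is simpler and equally valid.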
What YAML is good at
YAML is usually selected for:
- Crawl/extraction manifests (rules, flags, selectors)
- Small per-page records that humans will tweak
- Files where inline comments are useful
YAML compared to JSON is covered in JSON vs YAML.
What CSV is good at
CSV is usually selected for:
- Exports of extracted data
- One row per page/product
- Quick review in spreadsheets
If nested objects and metadata are needed, JSON vs CSV is often the more relevant comparison.
Use cases in web crawling, scraping, and RAG
When YAML should be used
YAML is usually preferred when:
- Humans will edit the output before it is used
- A manifest is needed (what to extract, how to filter)
- Nesting is useful (per-domain rules, per-section options)
When CSV should be used
CSV is usually preferred when:
- A stable set of columns exists
- Export and reporting are the primary goals
- The data is already flat (directory listings, price tables)
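Whether a stable set of columns really exists can be checked before exporting. The sketch below is one possible approach; `findColumnMismatches`, the column names, and the sample records are all hypothetical.

```javascript
// Hypothetical helper: report records whose keys differ from the
// expected column list (missing columns or unexpected extras).
function findColumnMismatches(records, columns) {
  const expected = new Set(columns);
  const problems = [];
  records.forEach((record, index) => {
    const missing = columns.filter((c) => !Object.hasOwn(record, c));
    const extra = Object.keys(record).filter((k) => !expected.has(k));
    if (missing.length > 0 || extra.length > 0) {
      problems.push({ index, missing, extra });
    }
  });
  return problems;
}

// Sample data (assumed): the second record has an extra key,
// the third is missing one.
const columns = ["url", "price"];
const records = [
  { url: "https://example.com/a", price: "9.99" },
  { url: "https://example.com/b", price: "4.50", currency: "EUR" },
  { url: "https://example.com/c" },
];

console.log(findColumnMismatches(records, columns));
```

An empty result suggests the data is already flat and uniform enough for CSV; frequent mismatches are a hint that the records may need a nested format instead.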
For readable narrative outputs, Markdown is often selected instead, as covered in Markdown vs YAML and Markdown vs CSV.
Practical tradeoffs
YAML becomes fragile at scale
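As nesting grows, a small indentation error can break parsing or silently attach a key to the wrong parent. That risk grows when thousands of records are generated, because a single malformed record can make the whole document unparseable.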
CSV forces flattening
When nested data exists, flattening decisions must be made. Those decisions are often the real problem, not the format.
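One flattening decision can be sketched as follows. The `flattenRecord` helper is hypothetical, and its two choices (dotted column names for nested objects, a `|` separator for arrays) are assumptions made for illustration, not a standard.

```javascript
// Hypothetical helper: flatten nested objects into dotted column names.
// Arrays are joined with "|"; this is one arbitrary choice among several.
function flattenRecord(record, prefix = "") {
  const flat = {};
  for (const [key, value] of Object.entries(record)) {
    const column = prefix ? `${prefix}.${key}` : key;
    if (Array.isArray(value)) {
      flat[column] = value.join("|");
    } else if (value !== null && typeof value === "object") {
      // Recurse into nested objects, carrying the column prefix along.
      Object.assign(flat, flattenRecord(value, column));
    } else {
      flat[column] = value;
    }
  }
  return flat;
}

// Sample nested record (assumed data).
const page = {
  url: "https://example.com/a",
  meta: { title: "Example", lang: "en" },
  tags: ["news", "tech"],
};

console.log(flattenRecord(page));
```

Each choice loses something: joined arrays can no longer contain the separator safely, and dotted names collide if a key already contains a dot. These are the decisions that tend to matter more than the format itself.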
Node.js snippet: Generate CSV from a simple YAML-like manifest
No YAML parser is used in this snippet; a plain JavaScript array stands in for an already-parsed manifest. A common approach is to keep YAML for job manifests and to generate CSV only for extracted tabular outputs.
// Node 18+
// Create a tiny CSV from a JavaScript array (standing in for parsed YAML).
const items = [
  { url: "https://example.com/a", category: "news" },
  { url: "https://example.com/b", category: "docs" },
];

// Column order is fixed by this list; the first output line is the header row.
const headers = ["url", "category"];
const lines = [headers.join(",")];

for (const item of items) {
  lines.push(
    headers
      // Quote every field and double any embedded quotes; missing values become "".
      .map((h) => `"${String(item[h] ?? "").replaceAll('"', '""')}"`)
      .join(",")
  );
}

console.log(lines.join("\n"));
Conclusion
- YAML is usually selected for human-edited manifests and nested config-like data.
- CSV is usually selected for flat datasets and exports.
- In many crawling pipelines, YAML (or JSON) is used for configuration and CSV is used only as an export format.
If minimal text is desired, YAML vs Plain Text can be compared next.