JSON and CSV are both used for structured outputs, but they assume different data shapes: JSON suits objects and nested structures, while CSV suits flat rows.
A broader overview is covered in Best Prompt Data.
## Quick comparison
| Topic | JSON | CSV |
|---|---|---|
| Best for | Nested objects, metadata, APIs | Flat tables and exports |
| Parsing reliability | High | High (with correct quoting) |
| Human editing | Medium | Medium to High (spreadsheets) |
| Nested data | Supported | Not supported |
| Common failure | Schema drift | Commas/quotes/newlines in cells |
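The last row deserves a concrete illustration. The sketch below shows RFC 4180-style quoting, the standard defense against commas, quotes, and newlines inside cells (the `cell` helper name is illustrative, not from any library):

```javascript
// Sketch: cells containing commas, quotes, or newlines must be wrapped in
// double quotes, with embedded double quotes doubled (RFC 4180 style).
const cell = (v) => `"${String(v).replaceAll('"', '""')}"`;

console.log(cell('say "hi", then leave')); // "say ""hi"", then leave"
```

Skipping this quoting is what turns a stray comma in one product title into a misaligned spreadsheet.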
## What JSON is good at
JSON is usually selected when:
- Each record contains nested fields (offers, variants, breadcrumbs)
- Metadata is required for RAG (url, section, chunk_id)
- Validation and type checking are needed
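For instance, a single scraped record with nested variants and RAG metadata might look like the sketch below. The field names and URL are illustrative, and the shape check is a minimal stand-in for a real schema validator:

```javascript
// Hypothetical scraped record: nested variants plus RAG metadata.
const record = {
  url: "https://example.com/p/1", // illustrative URL
  chunk_id: "p1-0",
  section: "description",
  variants: [
    { sku: "A-1", price: 19.99 },
    { sku: "A-2", price: 24.99 },
  ],
};

// Minimal shape check before ingestion (a stand-in for a schema validator).
function isValidRecord(r) {
  return (
    typeof r.url === "string" &&
    typeof r.chunk_id === "string" &&
    Array.isArray(r.variants) &&
    r.variants.every((v) => typeof v.sku === "string" && typeof v.price === "number")
  );
}

console.log(isValidRecord(record)); // true
```

Representing `variants` in CSV would already require a flattening decision; in JSON it is just an array.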
JSON paired with readable docs is covered in Markdown vs JSON.
## What CSV is good at
CSV is usually selected when:
- A table is desired (one row per page/product)
- Data must be used in spreadsheets
- Simple imports are planned
If the data is not tabular, CSV is often the wrong tool. Plain narrative output is covered in CSV vs Plain Text.
## Use cases in web crawling, scraping, and RAG
### When JSON should be used
JSON is usually preferred when:
- Crawled pages produce different optional fields
- Arrays are expected (multiple images, multiple prices, multiple authors)
- Downstream systems expect objects
### When CSV should be used
CSV is usually preferred when:
- A stable schema exists (same columns every time)
- Data will be filtered and reviewed in spreadsheets
- A quick export is more important than perfect expressiveness
For readability-first outputs, Markdown is often used, as covered in Markdown vs CSV.
## Practical tradeoffs
### CSV forces decisions early
If arrays or nested objects exist, flattening rules must be invented: join values with a delimiter such as `;`, create repeated columns, or explode one record into multiple rows. Those rules can be correct, but they must be documented and maintained.
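The three rules can be sketched on a hypothetical record (field names are illustrative):

```javascript
// A hypothetical record with an array field.
const item = { id: 1, tags: ["red", "sale"] };

// Rule 1: join with a delimiter (the delimiter must never appear inside values).
const joined = { id: item.id, tags: item.tags.join(";") };

// Rule 2: repeated columns (caps how many values a record may keep).
const repeated = { id: item.id, tag_1: item.tags[0] ?? "", tag_2: item.tags[1] ?? "" };

// Rule 3: explode into one row per array element (duplicates the scalar fields).
const exploded = item.tags.map((t) => ({ id: item.id, tag: t }));

console.log(joined.tags, repeated.tag_2, exploded.length); // red;sale sale 2
```

Each rule trades away something: rule 1 loses safety, rule 2 loses unbounded arrays, rule 3 loses one-row-per-record.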
### JSON makes "optional" fields easy
Fields can be omitted or set to null. That flexibility works well for scraped pages where data is inconsistent.
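A short sketch of what that looks like in practice (the URLs are placeholders):

```javascript
// Scraped pages with inconsistent data: price present, explicitly null, or absent.
const pages = [
  { url: "https://example.com/a", price: 9.99 },
  { url: "https://example.com/b", price: null }, // scraped, but the field was empty
  { url: "https://example.com/c" },              // field never found on the page
];

// Nullish coalescing normalizes both the null and the absent case.
const prices = pages.map((p) => p.price ?? "n/a");
console.log(prices); // [ 9.99, 'n/a', 'n/a' ]
```

In CSV, by contrast, every row must carry a value for every column, so "absent" and "empty" tend to collapse into the same blank cell.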
## Node.js snippet: Flatten JSON records for CSV export
The snippet below takes a simple flattening approach: nested values are serialized as JSON strings inside their cells. That is not pretty, but it is predictable.
```javascript
// Node 18+
// Convert an array of JSON objects into a CSV with stable columns.
import { readFile } from "node:fs/promises";

const items = JSON.parse(await readFile("items.json", "utf8"));

// Collect the union of keys across all records so every row gets the same columns.
const keys = new Set();
for (const item of items) for (const k of Object.keys(item)) keys.add(k);
const headers = [...keys];

// Quote every cell; non-string values (arrays, objects, numbers) are serialized as JSON.
function cell(v) {
  const s = typeof v === "string" ? v : JSON.stringify(v);
  return `"${String(s).replaceAll('"', '""')}"`;
}

const lines = [];
lines.push(headers.join(","));
for (const item of items) {
  lines.push(headers.map((h) => cell(item[h] ?? "")).join(","));
}

// Preview the header row plus the first few data rows.
console.log(lines.slice(0, 5).join("\n"));
```
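Because nested values are stored as JSON strings, the conversion is reversible: once a cell's CSV quoting has been undone, `JSON.parse` recovers the original structure. A minimal sketch, assuming the cell content has already been unquoted:

```javascript
// Sketch: recovering a nested value from a JSON-serialized cell.
// Assumes the CSV layer has already stripped the surrounding quotes.
const cellContent = '{"sku":"A-1","price":19.99}';
const value = JSON.parse(cellContent);
console.log(value.price); // 19.99
```

This round-trip property is the main argument for JSON-in-cells over ad hoc delimiters.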
## Conclusion
- JSON is usually preferred for nested data, metadata, and reliable ingestion.
- CSV is usually preferred for flat datasets and spreadsheet-friendly exports.
- In many scraping pipelines, JSON is used internally and CSV is generated only as an export.
If human-edited configs are needed, YAML can be compared in YAML vs CSV.