JSON vs CSV: Choosing the Right Format for LLM Prompts

JSON vs CSV for scraped datasets and LLM prompt outputs: structure, nesting, parsing, and what works best for pipelines and RAG.

Written by Andrii

JSON and CSV are both used for structured outputs, but they assume different data shapes: JSON handles objects and nested structures, while CSV represents flat rows.

A broader overview is covered in Best Prompt Data.

Quick comparison

| Topic | JSON | CSV |
| --- | --- | --- |
| Best for | Nested objects, metadata, APIs | Flat tables and exports |
| Parsing reliability | High | High (with correct quoting) |
| Human editing | Medium | Medium to High (spreadsheets) |
| Nested data | Supported | Not supported |
| Common failure | Schema drift | Commas/quotes/newlines in cells |

What JSON is good at

JSON is usually selected when:

  • Each record contains nested fields (offers, variants, breadcrumbs)
  • Metadata is required for RAG (url, section, chunk_id)
  • Validation and type checking are needed

JSON paired with readable docs is covered in Markdown vs JSON.

What CSV is good at

CSV is usually selected when:

  • A table is desired (one row per page/product)
  • Data must be used in spreadsheets
  • Simple imports are planned

If the data is not tabular, CSV is often the wrong tool. Plain narrative output is covered in CSV vs Plain Text.

Use cases in web crawling, scraping, and RAG

When JSON should be used

JSON is usually preferred when:

  • Crawled pages produce different optional fields
  • Arrays are expected (multiple images, multiple prices, multiple authors)
  • Downstream systems expect objects

When CSV should be used

CSV is usually preferred when:

  • A stable schema exists (same columns every time)
  • Data will be filtered and reviewed in spreadsheets
  • A quick export is more important than perfect expressiveness

For readability-first outputs, Markdown is often used, as covered in Markdown vs CSV.

Practical tradeoffs

CSV forces decisions early

If arrays or nested objects exist, flattening rules must be invented (join with ;, create repeated columns, or explode rows). Those rules can be correct, but they must be maintained.

JSON makes "optional" fields easy

Fields can be omitted or set to null. That flexibility works well for scraped pages where data is inconsistent.

Node.js snippet: Flatten JSON records for CSV export

The snippet below uses a simple flattening approach: nested values are serialized as JSON strings. That is not pretty, but it is predictable, and the column set stays stable across records.

// Node 18+
// Convert an array of JSON objects into a CSV with stable columns.

import { readFile } from "node:fs/promises";

const items = JSON.parse(await readFile("items.json", "utf8"));

const keys = new Set();
for (const item of items) for (const k of Object.keys(item)) keys.add(k);
const headers = [...keys];

function cell(v) {
  const s = typeof v === "string" ? v : JSON.stringify(v);
  return `"${String(s).replaceAll('"', '""')}"`;
}

const lines = [];
lines.push(headers.join(","));
for (const item of items) {
  lines.push(headers.map((h) => cell(item[h] ?? "")).join(","));
}

console.log(lines.slice(0, 5).join("\n")); // preview only the first 5 CSV lines

Conclusion

  • JSON is usually preferred for nested data, metadata, and reliable ingestion.
  • CSV is usually preferred for flat datasets and spreadsheet-friendly exports.
  • In many scraping pipelines, JSON is used internally and CSV is generated only as an export.

If human-edited configs are needed, YAML can be compared in YAML vs CSV.


About the Author

Andrii Mazurian

Founder, WebCrawlerAPI · 🇳🇱 Netherlands

Engineer with 15 years of experience in APIs, big data, and infrastructure. Founded WebCrawlerAPI in 2024 with a single goal: to build the best data API, and has been shipping it every day since.