JSON vs CSV: Choosing the Right Format for LLM Prompts

JSON vs CSV for scraped datasets and LLM prompt outputs: structure, nesting, parsing, and what works best for pipelines and RAG.

Written by Andrii

JSON and CSV are both used for structured outputs, but they assume different data shapes: JSON handles objects and nested structures, while CSV represents flat rows.

A broader overview is covered in Best Prompt Data.

Quick comparison

| Topic | JSON | CSV |
| --- | --- | --- |
| Best for | Nested objects, metadata, APIs | Flat tables and exports |
| Parsing reliability | High | High (with correct quoting) |
| Human editing | Medium | Medium to High (spreadsheets) |
| Nested data | Supported | Not supported |
| Common failure | Schema drift | Commas/quotes/newlines in cells |

What JSON is good at

JSON is usually selected when:

  • Each record contains nested fields (offers, variants, breadcrumbs)
  • Metadata is required for RAG (url, section, chunk_id)
  • Validation and type checking are needed

JSON paired with readable docs is covered in Markdown vs JSON.

What CSV is good at

CSV is usually selected when:

  • A table is desired (one row per page/product)
  • Data must be used in spreadsheets
  • Simple imports are planned

If the data is not tabular, CSV is often the wrong tool. Plain narrative output is covered in CSV vs Plain Text.

Use cases in web crawling, scraping, and RAG

When JSON should be used

JSON is usually preferred when:

  • Crawled pages produce different optional fields
  • Arrays are expected (multiple images, multiple prices, multiple authors)
  • Downstream systems expect objects

When CSV should be used

CSV is usually preferred when:

  • A stable schema exists (same columns every time)
  • Data will be filtered and reviewed in spreadsheets
  • A quick export is more important than perfect expressiveness

For readability-first outputs, Markdown is often used, as covered in Markdown vs CSV.

Practical tradeoffs

CSV forces decisions early

If arrays or nested objects exist, flattening rules must be invented (join with ;, create repeated columns, or explode rows). Those rules can be correct, but they must be maintained.

JSON makes "optional" fields easy

Fields can be omitted or set to null. That flexibility works well for scraped pages where data is inconsistent.

Node.js snippet: Flatten JSON records for CSV export

The snippet below uses a simple flattening approach: nested values are serialized as JSON strings. That is not pretty, but it is predictable, and the column set stays stable across records.

// Node 18+
// Convert an array of JSON objects into a CSV with stable columns.

import { readFile } from "node:fs/promises";

const items = JSON.parse(await readFile("items.json", "utf8"));

const keys = new Set();
for (const item of items) for (const k of Object.keys(item)) keys.add(k);
const headers = [...keys];

function cell(v) {
  const s = typeof v === "string" ? v : JSON.stringify(v);
  return `"${String(s).replaceAll('"', '""')}"`;
}

const lines = [];
lines.push(headers.join(","));
for (const item of items) {
  lines.push(headers.map((h) => cell(item[h] ?? "")).join(","));
}

console.log(lines.slice(0, 5).join("\n")); // preview only the first 5 CSV lines

Conclusion

  • JSON is usually preferred for nested data, metadata, and reliable ingestion.
  • CSV is usually preferred for flat datasets and spreadsheet-friendly exports.
  • In many scraping pipelines, JSON is used internally and CSV is generated only as an export.

If human-edited configs are needed, YAML can be compared in YAML vs CSV.


About the Author

Andrii Mazurian

Founder, WebCrawlerAPI · 🇳🇱 Netherlands

Engineer with 15 years of experience in APIs, big data, and infrastructure. Founded WebCrawlerAPI in 2024 with a single goal: to build the best data API, and has been shipping it every day since.