Table of Contents
- Quick comparison
- What Markdown is good at
- What CSV is good at
- Use cases in web crawling, scraping, and RAG
  - When Markdown should be used
  - When CSV should be used
- Practical tradeoffs
  - Markdown tables are not a contract
  - CSV breaks on "real world" text
- Node.js snippet: Convert a small CSV into JSON records
- Conclusion
Markdown is a format for readable documents. CSV is a format for rows and columns. Confusion usually arises when a Markdown table is expected to behave like a CSV file.
A full format overview is provided in Best Prompt Data.
Quick comparison
| Topic | Markdown | CSV |
|---|---|---|
| Best for | Narrative text with light structure | Flat tabular data |
| Parsing reliability | Medium | High (when quoting is correct) |
| Human readability | High | Medium |
| Nested data | Awkward | Not supported |
| Common failure | Tables drift in formatting | Commas, quotes, newlines in fields |
What Markdown is good at
Markdown is usually selected for:
- Summaries, notes, extraction explanations
- Long text that should remain readable
- Mixed content: headings, bullets, code blocks
Markdown as an output format is compared in Cleaned Text vs Markdown.
What CSV is good at
CSV is usually selected for:
- One row per page (or per product, per listing)
- Easy export to spreadsheets and BI tools
- Simple ingestion into databases
If structured objects are needed, CSV vs Plain Text and JSON vs CSV are worth reading.
Use cases in web crawling, scraping, and RAG
When Markdown should be used
Markdown is usually preferred when:
- The output is a report, not a dataset
- Evidence and quotes should be preserved in a readable way
- The model is expected to explain edge cases
When CSV should be used
CSV is usually preferred when:
- A flat dataset is being produced (price list, directory, catalog)
- A predictable schema is needed (columns)
- Rows will be deduped, filtered, or joined downstream
For RAG ingestion, CSV is usually not used as-is. The content is often converted into text chunks and metadata. If chunking is the main goal, Markdown vs Plain Text is usually more relevant.
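The conversion from rows to text chunks plus metadata can be sketched as follows. This is a minimal illustration, not a prescribed pipeline: the helper name `rowToChunk` and the column names `url`, `title`, `description`, and `price` are assumptions made up for the example.

```javascript
// Hypothetical helper: turn one parsed CSV row (a plain object) into a
// text chunk plus metadata. Which columns count as "text" is an assumption.
function rowToChunk(row, textColumns = ["title", "description"]) {
  // Join the non-empty text columns into the chunk body.
  const text = textColumns
    .map((col) => row[col])
    .filter((value) => typeof value === "string" && value.trim() !== "")
    .join("\n");

  // Every column that is not part of the text is kept as metadata.
  const metadata = {};
  for (const [key, value] of Object.entries(row)) {
    if (!textColumns.includes(key)) metadata[key] = value;
  }
  return { text, metadata };
}

const chunk = rowToChunk({
  url: "https://example.com/widget",
  title: "Widget",
  description: "A small, sturdy widget.",
  price: "9.99",
});
console.log(chunk);
```

Keeping identifiers such as the URL in metadata rather than in the chunk text usually makes later filtering easier, though the right split depends on the retrieval setup.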
Practical tradeoffs
Markdown tables are not a contract
Models often reformat Markdown tables: column alignment, pipe escaping, and line wrapping can all change from one output to the next. If the output must be parsed, CSV or JSON is usually safer.
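To show what "not a contract" means in practice, here is a rough sketch of the tolerance a Markdown table parser needs: optional outer pipes, variable padding, a separator row, and escaped pipes inside cells. The function names are hypothetical, and the sketch does not cover wrapped cells or inline code containing pipes.

```javascript
// Split one Markdown table row into cells, honoring escaped pipes (\|).
function splitMarkdownRow(line) {
  let trimmed = line.trim();
  // Outer pipes are optional, so strip them only when present.
  if (trimmed.startsWith("|")) trimmed = trimmed.slice(1);
  if (trimmed.endsWith("|") && !trimmed.endsWith("\\|")) trimmed = trimmed.slice(0, -1);

  const cells = [];
  let current = "";
  for (let i = 0; i < trimmed.length; i++) {
    const ch = trimmed[i];
    if (ch === "\\" && trimmed[i + 1] === "|") {
      current += "|"; // an escaped pipe is literal cell content
      i++;
    } else if (ch === "|") {
      cells.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  cells.push(current.trim());
  return cells;
}

// A separator row contains only dashes, colons, pipes, and spaces.
const isSeparatorRow = (line) => /^[\s|:-]+$/.test(line) && line.includes("-");

function parseMarkdownTable(markdown) {
  const lines = markdown
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "" && !isSeparatorRow(line));
  const [headerLine, ...bodyLines] = lines;
  const headers = splitMarkdownRow(headerLine);
  return bodyLines.map((line) => {
    const cells = splitMarkdownRow(line);
    return Object.fromEntries(headers.map((h, i) => [h, cells[i] ?? ""]));
  });
}

const table = [
  "| Name | Note |",
  "|:-----|------|",
  "| A    | x \\| y |",
  "|B|no padding|",
].join("\n");
console.log(parseMarkdownTable(table));
```

Even this amount of special-casing covers only the common variations, which is the practical argument for asking for CSV or JSON when the output feeds a parser.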
CSV breaks on "real world" text
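CSV stays simple until commas, quotes, and newlines appear inside fields. That is common in scraped content (descriptions, addresses). Quoting rules must be enforced: a field containing any of those characters is wrapped in double quotes, and quotes inside it are doubled.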
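A minimal sketch of enforcing quoting on the writing side, assuming the common RFC 4180 convention (wrap the field in double quotes and double any embedded quotes). The helper names `escapeCsvField` and `toCsvLine` are hypothetical.

```javascript
// Quote a single field when it contains a comma, a double quote, CR, or LF.
// Embedded double quotes are doubled, per the usual RFC 4180 convention.
function escapeCsvField(value) {
  const text = value == null ? "" : String(value);
  if (/[",\r\n]/.test(text)) {
    return `"${text.replace(/"/g, '""')}"`;
  }
  return text;
}

// Build one CSV line from an array of raw field values.
function toCsvLine(fields) {
  return fields.map(escapeCsvField).join(",");
}

console.log(toCsvLine(["Acme, Inc.", 'The "best" widget', "12 Main St\nSuite 4"]));
```

Escaping at write time is usually cheaper than repairing broken rows later, since a single unquoted comma shifts every following column.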
Node.js snippet: Convert a small CSV into JSON records
A minimal CSV parser is shown. It is safe only for simple CSV with no quoted fields, because a plain split on commas breaks on quoted commas, escaped quotes, and newlines inside fields. For production parsing, a dedicated CSV parser is usually used.
    // Node 18+, run as an ES module (.mjs or "type": "module").
    // Minimal CSV to JSON for simple data (no quoted-field support).
    import { readFile } from "node:fs/promises";

    const csv = (await readFile("data.csv", "utf8")).trimEnd();
    const lines = csv.split(/\r?\n/); // accept both LF and CRLF line endings
    const headers = lines[0].split(",").map((s) => s.trim());

    const rows = [];
    for (const line of lines.slice(1)) {
      if (line.trim() === "") continue; // skip blank lines
      const cols = line.split(",").map((s) => s.trim());
      const obj = {};
      // Missing trailing columns become empty strings.
      for (let i = 0; i < headers.length; i++) obj[headers[i]] = cols[i] ?? "";
      rows.push(obj);
    }

    // Print the first three records as a preview.
    console.log(JSON.stringify(rows.slice(0, 3), null, 2));
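For comparison, the following sketch shows roughly what quote-aware parsing involves: a small state machine that tracks whether the current character is inside a quoted field. It is an illustration under the usual RFC 4180 conventions, not a replacement for a dedicated parser; `parseCsv` and `toRecords` are hypothetical names, and blank lines are not given any special treatment.

```javascript
// Parse CSV text into an array of rows (each row is an array of strings).
// Handles quoted fields, doubled quotes (""), and newlines inside quotes.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        field += '"'; // a doubled quote is a literal quote
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch; // commas and newlines are plain content here
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text[i + 1] === "\n") i++; // treat CRLF as one break
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else {
      field += ch;
    }
  }
  // Flush the last row when the text does not end with a line break.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

// Map rows to objects keyed by the header row.
function toRecords(text) {
  const [headers = [], ...body] = parseCsv(text);
  return body.map((cols) =>
    Object.fromEntries(headers.map((h, i) => [h.trim(), cols[i] ?? ""]))
  );
}

const sample = 'name,note\n"Acme, Inc.","said ""hi""\nthen left"\nBob,plain\n';
console.log(toRecords(sample));
```

Unlike the minimal snippet, this version does not trim field values, because whitespace inside quoted fields may be significant.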
Conclusion
- Markdown is usually used for readable reports and explanations.
- CSV is usually used for flat datasets with predictable columns.
- If strict structure is required and nesting is needed, JSON is usually preferred over CSV.
If CSV is being considered mainly for readability, YAML can be evaluated too in YAML vs CSV.