Markdown vs CSV: Choosing the Right Format for LLM Prompts

Markdown vs CSV for scraped data and prompt inputs: when tables help, when they break, and what works best for RAG and pipelines.

Written by Andrii

Markdown is used for readable documents. CSV is used for rows and columns. Confusion usually arises when a Markdown table is expected to behave like a CSV file.

A full format overview is provided in Best Prompt Data.

Quick comparison

| Topic | Markdown | CSV |
| --- | --- | --- |
| Best for | Narrative text with light structure | Flat tabular data |
| Parsing reliability | Medium | High (when quoting is correct) |
| Human readability | High | Medium |
| Nested data | Awkward | Not supported |
| Common failure | Tables drift in formatting | Commas, quotes, newlines in fields |

What Markdown is good at

Markdown is usually selected for:

  • Summaries, notes, extraction explanations
  • Long text that should remain readable
  • Mixed content: headings, bullets, code blocks

Markdown as an output format is compared in Cleaned Text vs Markdown.

What CSV is good at

CSV is usually selected for:

  • One row per page (or per product, per listing)
  • Easy export to spreadsheets and BI tools
  • Simple ingestion into databases

If structured objects are needed, CSV vs Plain Text and JSON vs CSV are worth reading.

Use cases in web crawling, scraping, and RAG

When Markdown should be used

Markdown is usually preferred when:

  • The output is a report, not a dataset
  • Evidence and quotes should be preserved in a readable way
  • The model is expected to explain edge cases

When CSV should be used

CSV is usually preferred when:

  • A flat dataset is being produced (price list, directory, catalog)
  • A predictable schema is needed (columns)
  • Rows will be deduped, filtered, or joined downstream
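Those downstream steps stay cheap only while every row matches the expected columns. A minimal sketch, assuming rows were already parsed into objects keyed by header, with hypothetical column names (url, name, price) and url used as the dedupe key:

```javascript
// Hypothetical schema for a scraped directory: one row per page.
const EXPECTED_COLUMNS = ["url", "name", "price"];

// Splits rows into those matching the schema (deduped on keyColumn, first
// occurrence wins) and those with missing or extra columns.
function validateAndDedupe(rows, keyColumn) {
  const seen = new Set();
  const kept = [];
  const rejected = [];
  for (const row of rows) {
    const keys = Object.keys(row);
    const matchesSchema =
      keys.length === EXPECTED_COLUMNS.length &&
      EXPECTED_COLUMNS.every((col) => Object.hasOwn(row, col));
    if (!matchesSchema) {
      rejected.push(row);
      continue;
    }
    if (seen.has(row[keyColumn])) continue; // duplicate key: drop it
    seen.add(row[keyColumn]);
    kept.push(row);
  }
  return { kept, rejected };
}

const result = validateAndDedupe(
  [
    { url: "https://example.com/a", name: "A", price: "10" },
    { url: "https://example.com/a", name: "A (copy)", price: "10" },
    { url: "https://example.com/b", name: "B" }, // missing "price"
  ],
  "url",
);
console.log(result.kept.length, result.rejected.length); // 1 1
```

Rejected rows are returned rather than silently dropped, so schema drift in the scraper shows up before the data is joined or loaded.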

For RAG ingestion, CSV is usually not used as-is. The content is often converted into text chunks and metadata. If chunking is the main goal, Markdown vs Plain Text is usually more relevant.
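A minimal sketch of that conversion, assuming one chunk per row (long descriptions are not split further) and hypothetical column names. The columns listed in textColumns become the text to embed; every other column is copied into metadata for filtering:

```javascript
// One { text, metadata } record per CSV row, for a RAG index.
// textColumns are joined into the text to embed; every other column is
// copied into metadata. Column names in the example are hypothetical.
function rowsToChunks(rows, textColumns) {
  return rows.map((row, index) => {
    const text = textColumns
      .map((col) => row[col])
      .filter((value) => value !== undefined && value !== "")
      .join("\n\n");
    const metadata = { rowIndex: index };
    for (const [key, value] of Object.entries(row)) {
      if (!textColumns.includes(key)) metadata[key] = value;
    }
    return { text, metadata };
  });
}

const chunks = rowsToChunks(
  [
    {
      url: "https://example.com/p/1",
      title: "Desk lamp",
      description: "LED lamp, warm white",
      price: "29.00",
    },
  ],
  ["title", "description"],
);
console.log(chunks[0].text); // title, a blank line, then the description
console.log(chunks[0].metadata); // rowIndex, url, and price
```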

Practical tradeoffs

Markdown tables are not a contract

Markdown tables are often reformatted by models. Column alignment, escaped pipes, and wrapped text can be changed. If the output must be parsed, CSV or JSON is usually safer.
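The fragility shows up as soon as a Markdown table is parsed back. A minimal sketch, assuming a GitHub-style pipe table with one header row, one delimiter row, and \| for a literal pipe. It is deliberately stricter than the GFM spec, which pads or truncates rows with the wrong number of cells; here a drifted row throws instead of being silently misread:

```javascript
// Parses a GitHub-style pipe table into row objects and throws when the
// structure has drifted. Assumes one header row, one delimiter row, and
// "\|" for a literal pipe inside a cell.
function parseMarkdownTable(markdown) {
  const splitRow = (line) => {
    let body = line.trim();
    if (body.startsWith("|")) body = body.slice(1);
    if (body.endsWith("|") && !body.endsWith("\\|")) body = body.slice(0, -1);
    // Split only on unescaped pipes, then turn "\|" back into "|".
    return body
      .split(/(?<!\\)\|/)
      .map((cell) => cell.trim().replace(/\\\|/g, "|"));
  };

  const lines = markdown.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length < 2) throw new Error("Expected a header row and a delimiter row");

  const headers = splitRow(lines[0]);
  const delimiter = splitRow(lines[1]);
  if (
    delimiter.length !== headers.length ||
    !delimiter.every((cell) => /^:?-+:?$/.test(cell))
  ) {
    throw new Error("Second line is not a valid delimiter row");
  }

  return lines.slice(2).map((line, i) => {
    const cells = splitRow(line);
    if (cells.length !== headers.length) {
      throw new Error(
        `Row ${i + 1}: expected ${headers.length} cells, got ${cells.length}`,
      );
    }
    return Object.fromEntries(headers.map((header, j) => [header, cells[j]]));
  });
}

console.log(parseMarkdownTable("| name | note |\n| --- | --- |\n| a | x \\| y |"));
```

If a model drops the backslash before a pipe, the row gains a cell and the parser throws, which is exactly the failure a CSV or JSON contract avoids.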

CSV breaks on "real world" text

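CSV stays simple until commas, quotes, and newlines appear inside fields. That is common in scraped content (descriptions, addresses). Quoting rules must be enforced: under RFC 4180, such a field is wrapped in double quotes and any double quote inside it is doubled.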
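A minimal sketch of the writing side, following RFC 4180 quoting. The function names are illustrative, not from any library:

```javascript
// Quotes a single field per RFC 4180: if it contains a comma, double quote,
// CR, or LF, wrap it in double quotes and double every embedded quote.
function toCsvField(value) {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Serializes row objects in the given header order.
function toCsv(rows, headers) {
  const lines = [headers.map(toCsvField).join(",")];
  for (const row of rows) {
    lines.push(headers.map((h) => toCsvField(row[h])).join(","));
  }
  // RFC 4180 uses CRLF between records.
  return lines.join("\r\n");
}

const csvText = toCsv(
  [{ name: "Acme, Inc.", address: '12 "Old" Road\nSuite 4' }],
  ["name", "address"],
);
console.log(csvText);
```

The address keeps its line break inside the quoted field, so a quote-aware reader sees one record, while a naive split on newlines would see two.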

Node.js snippet: Convert a small CSV into JSON records

A minimal CSV parser is shown. It splits on every comma, so it is safe only for simple CSV with no quoted fields (no commas, quotes, or newlines inside values). For production parsing, a dedicated CSV parser is usually used.

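// Node 18+. Run as an ES module (.mjs, or "type": "module" in package.json)
// so that top-level await works.
// Minimal CSV to JSON for simple data only: no quoted fields, so no commas,
// quotes, or newlines inside values.

import { readFile } from "node:fs/promises";

const csv = (await readFile("data.csv", "utf8")).trimEnd();
// Accept LF or CRLF line endings and skip blank lines.
const lines = csv.split(/\r?\n/).filter((line) => line.trim() !== "");
if (lines.length === 0) throw new Error("data.csv is empty");
const headers = lines[0].split(",").map((s) => s.trim());

const rows = [];
for (const line of lines.slice(1)) {
  const cols = line.split(",").map((s) => s.trim());
  const obj = {};
  // Missing cells become "", extra cells are ignored.
  for (let i = 0; i < headers.length; i++) obj[headers[i]] = cols[i] ?? "";
  rows.push(obj);
}

// Print the first three records.
console.log(JSON.stringify(rows.slice(0, 3), null, 2));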
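The snippet above breaks as soon as a field is quoted. When adding a dependency is not an option, a small state machine can handle RFC 4180 quoting, including commas, doubled quotes, and line breaks inside quoted fields. A minimal sketch, not a replacement for a maintained library; as a lenient choice, a quote that appears in the middle of an unquoted field is kept as a literal character:

```javascript
// Quote-aware CSV parser following RFC 4180 quoting: a field that starts
// with a double quote may contain commas and line breaks, and "" inside it
// stands for one literal quote. Returns an array of records (string arrays).
function parseCsv(text) {
  const records = [];
  let record = [];
  let field = "";
  let inQuotes = false;
  let pending = false; // true while the current record has unflushed input

  const endRecord = () => {
    record.push(field);
    records.push(record);
    record = [];
    field = "";
    pending = false;
  };

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    pending = true;
    if (inQuotes) {
      if (ch !== '"') {
        field += ch;
      } else if (text[i + 1] === '"') {
        field += '"'; // doubled quote = literal quote
        i++;
      } else {
        inQuotes = false; // closing quote
      }
    } else if (ch === '"' && field === "") {
      inQuotes = true; // opening quote at the start of a field
    } else if (ch === ",") {
      record.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text[i + 1] === "\n") i++; // CRLF counts as one break
      endRecord();
    } else {
      field += ch;
    }
  }

  if (inQuotes) throw new Error("Unterminated quoted field");
  if (pending) endRecord(); // last record had no trailing line break
  return records;
}

const parsed = parseCsv('name,address\r\n"Acme, Inc.","12 ""Old"" Road\nSuite 4"\r\n');
console.log(parsed[1]); // the company name and the two-line address
```

Blank lines come back as records with a single empty field, so they can be filtered out before headers are mapped onto rows.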

Conclusion

  • Markdown is usually used for readable reports and explanations.
  • CSV is usually used for flat datasets with predictable columns.
  • If strict structure is required and nesting is needed, JSON is usually preferred over CSV.

If CSV is being considered mainly for readability, YAML is also worth evaluating (see YAML vs CSV).


About the Author

Andrii Mazurian
Andrew Mazurian · @andriixzvf

Founder, WebCrawlerAPI · 🇳🇱 Netherlands

Engineer with 15 years of experience in APIs, big data, and infrastructure. Founded WebCrawlerAPI in 2024 with a single goal: to build the best data API, and has been shipping it every day since.