Table of Contents
- Quick comparison
- What Markdown is good at
- What JSON is good at
- Use cases in web crawling, scraping, and RAG
- When Markdown should be used
- When JSON should be used
- Practical prompt patterns
- Pattern 1: Markdown instructions + JSON output
- Pattern 2: Markdown report with embedded JSON blocks
- Node.js snippet: Extract a JSON code block from Markdown
- Conclusion
Markdown and JSON are both used as "prompt data", but each fails in a different way. Markdown is usually the choice when humans need to read or edit the content. JSON is usually the choice when machines need to parse it reliably.
For a broader map of formats, see Best Prompt Data first.
Quick comparison
| Topic | Markdown | JSON |
|---|---|---|
| Best for | Mixed text + structure | Strict structure + validation |
| Parsing reliability | Medium | High (when schema is used) |
| Human readability | High | Medium |
| LLM output stability | Medium | High (when keys are constrained) |
| Common failure | Broken structure in long docs | Trailing commas, quoting, schema drift |
What Markdown is good at
Markdown is a lightweight way to mix narrative text with simple structure (headings, bullet lists, code blocks). It is usually the better fit when a human is expected to iterate on the prompt.
Typical uses:
- Instructions and constraints that should be seen at a glance
- A "report" style output that is expected to be read by a person
- Small embedded JSON snippets inside fenced code blocks
Markdown output comparisons are covered in HTML vs Markdown and Cleaned Text vs Markdown.
What JSON is good at
JSON is a strict data format. It is usually the choice when a downstream step will parse the result and then store it, validate it, or feed it into another system.
Typical uses:
- Extracted fields from crawled pages (title, price, author, date)
- RAG ingestion where chunk metadata is expected to be consistent
- Pipelines where schema validation is needed
A related format tradeoff is covered in JSON vs YAML.
Use cases in web crawling, scraping, and RAG
When Markdown should be used
Markdown is usually preferred when:
- The output is expected to be read by a human (audits, summaries, notes)
- The result includes long text where strict structure is not required
- The model is expected to quote passages and keep them readable
A common pattern is: JSON is used for extracted fields, while Markdown is used for a human-facing explanation.
When JSON should be used
JSON is usually preferred when:
- The output must be parsed without ambiguity
- A contract is needed (schema, required keys, value types)
- Records are expected to be stored in a database as objects
- RAG metadata (url, title, headings, chunk_id) must be consistent
If the content is tabular, JSON vs CSV can be a better comparison to read next.
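The contract idea from the list above (required keys, value types) can be sketched as a small hand-rolled check. This is only an illustration: the `validateChunk` helper, the `CHUNK_CONTRACT` table, and the chosen types are assumptions, with only the field names taken from the list. A real pipeline would more likely rely on a JSON Schema validator.

```javascript
// Hypothetical contract for RAG chunk metadata: required keys and their expected types.
const CHUNK_CONTRACT = {
  url: "string",
  title: "string",
  headings: "array",
  chunk_id: "string",
};

// Returns a list of human-readable problems; an empty list means the record passes.
function validateChunk(record) {
  if (record === null || typeof record !== "object" || Array.isArray(record)) {
    return ["record is not an object"];
  }
  const problems = [];
  for (const [key, expected] of Object.entries(CHUNK_CONTRACT)) {
    if (!(key in record)) {
      problems.push(`missing key: ${key}`);
      continue;
    }
    const value = record[key];
    const actual = Array.isArray(value) ? "array" : typeof value;
    if (actual !== expected) {
      problems.push(`wrong type for ${key}: expected ${expected}, got ${actual}`);
    }
  }
  return problems;
}

// Sample records (made up for this sketch).
const good = { url: "https://example.com/a", title: "A", headings: ["Intro"], chunk_id: "a-0" };
const bad = { url: "https://example.com/b", title: 42, headings: "Intro" };

console.log(validateChunk(good)); // []
console.log(validateChunk(bad)); // reports the wrong types and the missing chunk_id
```

Returning a list of problems instead of throwing on the first one makes it easier to log every issue in a crawled batch at once.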
Practical prompt patterns
Pattern 1: Markdown instructions + JSON output
This pattern is often used to keep instructions readable while forcing the model to emit parseable data.
- Instructions are written in Markdown
- Output is required as JSON only, with an example object
- A validator is used in the pipeline
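The three steps above could look roughly like the sketch below. The `buildPrompt` and `parseReply` helpers, the section headings, and the example fields are all hypothetical. The validator here only checks the key set, which is a deliberately minimal stand-in for a fuller schema check.

```javascript
// Hypothetical example object shown to the model so the expected keys are explicit.
const EXAMPLE = { title: "Example product", price: 19.99, author: null, date: "2024-01-31" };

// Instructions are written in Markdown; the output shape is shown as a JSON example.
function buildPrompt(pageText) {
  return [
    "## Task",
    "Extract the fields below from the page text.",
    "",
    "## Rules",
    "- Return JSON only, with no prose and no code fence.",
    "- Use exactly the keys shown in the example.",
    "- Use null when a field is missing.",
    "",
    "## Example output",
    JSON.stringify(EXAMPLE, null, 2),
    "",
    "## Page text",
    pageText,
  ].join("\n");
}

// Validator step: parse the reply and reject it if the key set drifts from the example.
function parseReply(reply) {
  const data = JSON.parse(reply); // throws SyntaxError on invalid JSON
  if (data === null || typeof data !== "object" || Array.isArray(data)) {
    throw new Error("Reply is not a JSON object");
  }
  const expected = Object.keys(EXAMPLE).sort().join(",");
  const actual = Object.keys(data).sort().join(",");
  if (expected !== actual) {
    throw new Error(`Unexpected keys: ${actual}`);
  }
  return data;
}

// Made-up page text and model reply, for illustration only.
console.log(buildPrompt("Blue Widget, $12.50, by Jane Doe"));
console.log(parseReply('{"title":"Blue Widget","price":12.5,"author":"Jane Doe","date":null}'));
```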
Pattern 2: Markdown report with embedded JSON blocks
This pattern is often used when both humans and machines are involved.
- A short JSON block is embedded in a fenced code block
- The rest is written as narrative Markdown
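As a rough illustration of this layout, the sketch below renders a short report with one embedded JSON block. The `renderReport` helper, the headings, and the sample values are hypothetical. The fence is built from a `FENCE` constant only so the example itself stays safe to embed in Markdown.

```javascript
// Three backticks, built indirectly so this example nests safely inside Markdown.
const FENCE = "`".repeat(3);

// Render a Markdown report: narrative text for people, one fenced JSON block for machines.
function renderReport(summary, fields) {
  return [
    "## Crawl summary",
    "",
    summary,
    "",
    "## Extracted fields",
    "",
    FENCE + "json",
    JSON.stringify(fields, null, 2),
    FENCE,
    "",
  ].join("\n");
}

// Made-up summary and fields, for illustration only.
const report = renderReport(
  "The page is a product listing; the price was found in the page header.",
  { title: "Blue Widget", price: 12.5 }
);
console.log(report);
```

Keeping the JSON block short and placing it under its own heading makes it easier for both readers and extraction code to find.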
Node.js snippet: Extract a JSON code block from Markdown
This snippet is intentionally simple: it parses only the first JSON block it finds. If multiple JSON blocks are expected, iterate over all matches instead.
// Node 18+ (run as an ES module, e.g. a .mjs file, so top-level await works)
// Extract the first ```json ... ``` block from Markdown and parse it.
import { readFile } from "node:fs/promises";

const md = await readFile("output.md", "utf8");

// Lazy match, so only the first fenced block is captured.
const match = md.match(/```json\s*([\s\S]*?)\s*```/i);
if (!match) {
  throw new Error("No ```json``` block found");
}

const jsonText = match[1];
const data = JSON.parse(jsonText); // throws SyntaxError if the block is not valid JSON
console.log("Parsed keys:", Object.keys(data));
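For outputs with more than one JSON block, the same pattern can be applied to every match. The sketch below is one possible way to do it: `extractJsonBlocks` and the sample input are hypothetical. The regular expression is equivalent to the one in the snippet above, but is built from a `FENCE` constant so the example contains no literal fence.

```javascript
// Three backticks, built indirectly so this example nests safely inside Markdown.
const FENCE = "`".repeat(3);

// Parse every fenced json block in a Markdown string, not just the first.
function extractJsonBlocks(md) {
  // The g flag is required by String.prototype.matchAll.
  const pattern = new RegExp(FENCE + "json\\s*([\\s\\S]*?)\\s*" + FENCE, "gi");
  const results = [];
  for (const match of md.matchAll(pattern)) {
    try {
      results.push({ ok: true, value: JSON.parse(match[1]) });
    } catch (err) {
      // Keep going: one malformed block should not hide the valid ones.
      results.push({ ok: false, error: err.message, raw: match[1] });
    }
  }
  return results;
}

// Made-up Markdown with one valid block and one malformed block.
const sample = [
  "First result:",
  FENCE + "json",
  '{"title": "A"}',
  FENCE,
  "Second result (malformed, trailing comma):",
  FENCE + "json",
  '{"title": "B",}',
  FENCE,
].join("\n");

const blocks = extractJsonBlocks(sample);
console.log(blocks.map((block) => block.ok)); // [ true, false ]
```

Recording failures per block, rather than throwing, lets the caller decide whether a single bad block should fail the whole document.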
Conclusion
- Markdown is usually chosen for human readability and mixed narrative content.
- JSON is usually chosen for strict extraction, validation, and reliable downstream parsing.
- For many crawling and RAG pipelines, a hybrid approach is used: Markdown for instructions and JSON for results.
If plain narrative output is also an option, see Markdown vs Plain Text as well.