Table of Contents
- Quick comparison
- What Markdown is good at
- What YAML is good at
- Use cases in web crawling, scraping, and RAG
- When Markdown should be used
- When YAML should be used
- Practical tradeoffs and failure modes
- YAML typing surprises
- Markdown "looks structured" but is not strict
- Node.js snippet: Guard YAML-like output by forcing strings
- Conclusion
Markdown and YAML are both chosen for readability, but each introduces a different kind of ambiguity. Markdown is usually used for documents. YAML is usually used for configuration-like data with keys and values.
For a broader overview of formats, see Best Prompt Data.
Quick comparison
| Topic | Markdown | YAML |
|---|---|---|
| Best for | Narrative docs and reports | Config-shaped data and small records |
| Parsing reliability | Medium | Medium to High (but indentation mistakes hurt) |
| Human editing | Easy | Easy (until nesting gets deep) |
| Common failure | Structure drifts in long outputs | Indentation and implicit types surprise |
| RAG fit | Good for readable chunks | Good for metadata and small manifests |
What Markdown is good at
Markdown is usually used when:
- A long answer is expected to be read by a human
- Sections, headings, and lists are useful
- Code blocks and examples must remain readable
Markdown as an output format is compared with HTML in HTML vs Markdown.
What YAML is good at
YAML is usually used when:
- Key-value structure is needed, but it should remain human-friendly
- Config files or small manifests are being produced
- Comments are helpful (YAML supports comments, JSON does not)
A close alternative is JSON, and the tradeoffs are covered in JSON vs YAML.
Use cases in web crawling, scraping, and RAG
When Markdown should be used
Markdown is usually preferred when:
- Page content is being summarized for a human review step
- A "what was found" report is being generated (headings, bullets, quotes)
- The primary value is the readable text, not strict fields
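A "what was found" report of this kind can be sketched as a small renderer. The page shape used here (url, title, findings, quotes) and the function name renderReport are assumptions made for illustration, not part of any particular crawler.

```javascript
// Minimal sketch: turn one crawled page summary into a Markdown report.
// The input shape { url, title, findings, quotes } is hypothetical.
function renderReport(page) {
  const lines = [
    `# ${page.title}`,
    "",
    `Source: ${page.url}`,
    "",
    "## What was found",
    "",
  ];
  for (const finding of page.findings) {
    lines.push(`- ${finding}`);
  }
  if (page.quotes.length) {
    lines.push("", "## Quotes", "");
    for (const quote of page.quotes) {
      // Prefix every line so multi-line quotes stay inside the blockquote.
      const quoted = quote.split("\n").map((l) => `> ${l}`).join("\n");
      lines.push(quoted, "");
    }
  }
  return lines.join("\n").trimEnd() + "\n";
}
```

The output is meant for a human reviewer; nothing here guarantees that a program could parse the report back into fields.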
When YAML should be used
YAML is usually preferred when:
- A small extraction manifest is being produced (selectors, flags, rules)
- A batch job definition is being generated and edited by hand
- A compact record per page is enough, and strict validation is not required
If the output must be parsed and stored reliably, JSON should usually be chosen over YAML (see Markdown vs JSON).
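As a rough illustration of what "parsed and stored reliably" can mean in practice, the sketch below parses one JSON record and rejects it unless the required fields are strings. The field names in REQUIRED_FIELDS and the function name parseRecord are hypothetical; a real pipeline would substitute its own schema.

```javascript
// Minimal sketch: strict parsing of one JSON record per page.
// REQUIRED_FIELDS is a hypothetical schema chosen for this example.
const REQUIRED_FIELDS = ["url", "title", "price"];

function parseRecord(jsonText) {
  // JSON.parse throws a SyntaxError on malformed input.
  const record = JSON.parse(jsonText);
  if (record === null || typeof record !== "object" || Array.isArray(record)) {
    throw new TypeError("Expected a JSON object");
  }
  for (const field of REQUIRED_FIELDS) {
    if (typeof record[field] !== "string") {
      throw new TypeError(`Field "${field}" must be a string`);
    }
  }
  return record;
}
```

Because JSON has no implicit typing for strings, a value such as "12" stays a string and 12 stays a number, so a type check like this one fails loudly instead of storing a silently converted value.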
Practical tradeoffs and failure modes
YAML typing surprises
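YAML parsers can treat unquoted values as booleans, numbers, or dates. Under YAML 1.1 rules, for example, an unquoted value such as no can load as false, 1.10 as the number 1.1, and 2024-01-05 as a date. That behavior can be helpful, but it is surprising in scraping, where strings are usually expected.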
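When the YAML is produced by your own code rather than by a model, one way to sidestep implicit typing is to quote every value at write time. The sketch below is a hypothetical helper (toQuotedYaml is not a library function) for flat records only, and it assumes keys are plain identifiers that are not themselves YAML keywords such as no or yes.

```javascript
// Minimal sketch: emit a flat record as YAML with every value double-quoted.
// Assumption: keys are plain identifiers (letters, digits, underscores).
function toQuotedYaml(record) {
  const lines = Object.entries(record).map(([key, value]) => {
    if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(key)) {
      throw new Error(`Unsupported key: ${key}`);
    }
    // JSON.stringify produces a double-quoted, escaped string; for ordinary
    // text that is also a valid YAML double-quoted scalar.
    return `${key}: ${JSON.stringify(String(value))}`;
  });
  return lines.join("\n") + "\n";
}
```

With this helper, a country code of "no" is written as country: "no", so a parser has no opportunity to read it as a boolean.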
Markdown "looks structured" but is not strict
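A table in Markdown looks like a table, but it is not guaranteed to be parseable as one: nothing enforces that every row has the same number of cells or that a pipe character inside a cell is escaped. If a database insert is planned, JSON or CSV is usually safer.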
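If a Markdown table has to be consumed anyway, a defensive parser can at least refuse rows that do not line up with the header. The following is a minimal sketch, not a full Markdown implementation: parseMarkdownTable is a hypothetical name, and the code assumes a pipe table with leading pipes and no escaped pipe characters inside cells.

```javascript
// Minimal sketch: parse a pipe table and reject misaligned rows.
// Assumptions: every table line starts with "|", cells contain no escaped "|".
function parseMarkdownTable(text) {
  const rows = text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("|"))
    .map((line) =>
      line
        .replace(/^\|/, "")
        .replace(/\|$/, "")
        .split("|")
        .map((cell) => cell.trim())
    );

  if (rows.length < 2) throw new Error("No table found");
  const [header, separator, ...body] = rows;

  // The second row must be the separator, e.g. |---|:---:|
  const separatorOk =
    separator.length === header.length &&
    separator.every((cell) => /^:?-+:?$/.test(cell));
  if (!separatorOk) throw new Error("Missing or malformed separator row");

  return body.map((cells, i) => {
    if (cells.length !== header.length) {
      throw new Error(`Row ${i + 1} has ${cells.length} cells, expected ${header.length}`);
    }
    return Object.fromEntries(header.map((name, j) => [name, cells[j]]));
  });
}
```

Failing on the first misaligned row is a deliberate choice here: in a long generated output, silently padding or truncating cells tends to shift values into the wrong columns.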
Node.js snippet: Guard YAML-like output by forcing strings
A common mitigation is to request YAML but require every value to be a quoted string, so that typing stays predictable. The check below deliberately uses no YAML parser.
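```javascript
// Node 18+, run as an ES module (for example, save as check-yaml.mjs).
// Simple check: ensure every "key: value" line has a double-quoted value.
// This is not a YAML parser. It is a guardrail.
import { readFile } from "node:fs/promises";

const text = await readFile("output.yml", "utf8");
const badLines = [];

for (const [i, line] of text.split("\n").entries()) {
  const trimmed = line.trim();
  // Skip blank lines, comments, and lines without a colon.
  if (!trimmed || trimmed.startsWith("#") || !trimmed.includes(":")) continue;

  // Treat everything after the first colon as the value.
  const idx = trimmed.indexOf(":");
  const value = trimmed.slice(idx + 1).trim();

  // An empty value (a nested block follows) is allowed; anything else
  // must start with a double quote.
  if (value && !value.startsWith('"')) {
    badLines.push({ line: i + 1, value });
  }
}

if (badLines.length) {
  // Report at most the first 10 offenders and fail the run.
  console.error("Unquoted YAML values found:", badLines.slice(0, 10));
  process.exit(1);
}
console.log("OK: values look quoted");
```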
Conclusion
- Markdown is usually selected for long, readable documents.
- YAML is usually selected for config-like key-value data that is edited by humans.
- For machine-parsed pipelines, JSON is usually more reliable than YAML.
If a flat dataset is being extracted, YAML vs CSV should be compared too.