GET /job/:id/markdown/content

Downloads all successfully crawled pages from a completed job as a single combined markdown file.

Method: GET

Request example

curl --request GET \
  --url https://api.webcrawlerapi.com/v1/job/46c7b8ff-eb5e-4ebb-96f1-2685334c07d7/markdown/content \
  --header 'Authorization: Bearer <YOUR TOKEN>' \
  --output combined.md
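
For callers who prefer a script to curl, the same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the function names `build_markdown_request` and `download_markdown` are hypothetical, and the base URL and header are copied from the curl example above.

```python
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://api.webcrawlerapi.com/v1"


def build_markdown_request(job_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for a job's combined markdown."""
    url = f"{API_BASE}/job/{job_id}/markdown/content"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def download_markdown(job_id: str, token: str, path: str = "combined.md") -> None:
    """Send the request and write the response body to `path`."""
    request = build_markdown_request(job_id, token)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=60) as response:
        body = response.read()
    with open(path, "wb") as handle:
        handle.write(body)
```

Calling `download_markdown("<job id>", "<YOUR TOKEN>")` would write the response to `combined.md`, mirroring the `--output` flag in the curl command.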

Response format

Content-Type: text/markdown; charset=utf-8
Each page's content is preceded by a header block that gives its source URL:

----
url: <page_url>
----

<markdown content>

Example response

----
url: https://docs.example.com/getting-started
----

# Getting Started

Welcome to the docs...


----
url: https://docs.example.com/faq
----

# FAQ

Common questions and answers...
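
To process pages individually, the combined file can be split back into per-page entries by looking for the header block described above. The following is a sketch under assumptions: `split_pages` is a hypothetical helper, and it treats any three-line sequence of `----`, `url: ...`, `----` as a page boundary, so a page whose own markdown happens to contain that exact sequence would be split incorrectly.

```python
import re

# Header block: a line of four dashes, a "url: ..." line, another line of four dashes.
HEADER = re.compile(
    r"^----[ \t]*\r?\nurl: (?P<url>\S+)[ \t]*\r?\n----[ \t]*\r?\n",
    re.MULTILINE,
)


def split_pages(combined: str) -> list[tuple[str, str]]:
    """Return (url, markdown) pairs in the order they appear in the file."""
    matches = list(HEADER.finditer(combined))
    pages = []
    for index, match in enumerate(matches):
        start = match.end()
        # A page's content runs until the next header block or the end of the file.
        if index + 1 < len(matches):
            end = matches[index + 1].start()
        else:
            end = len(combined)
        pages.append((match.group("url"), combined[start:end].strip()))
    return pages
```

Applied to the example response above, this would yield two entries, one for `/getting-started` and one for `/faq`, each with its markdown body stripped of surrounding blank lines.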

Requirements

Markdown Type: The job must have been created with output_formats: ["markdown"] (default)
Completed Status: The job status must be done
Successful Items: At least one job item must have completed successfully

Error responses

400 Bad Request

{
  "error": "Job is not a markdown type",
  "message": "This endpoint only supports jobs with markdown scrape type"
}

The job was created with a different scrape type (e.g., html or cleaned).

401 Unauthorized

{
  "error": "Access denied"
}

The job does not belong to your organization.

404 Not Found

{
  "error": "Job not found"
}

The job ID does not exist.

{
  "error": "No markdown content available",
  "message": "No successful items with markdown content found"
}

The job exists but has no successfully crawled pages with markdown content.

422 Unprocessable Entity

{
  "error": "Job not finished",
  "message": "Job must be in 'done' status to generate markdown file",
  "status": "in_progress"
}

The job is still processing. Wait for it to complete before requesting the combined markdown.

500 Internal Server Error

{
  "error": "Failed to download markdown content",
  "errors": [
    "https://example.com/page1: connection timeout",
    "https://example.com/page2: network error"
  ]
}

Markdown content could not be retrieved for any of the job's pages. The per-page error messages are listed in the errors array.
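
The error bodies above share a small set of fields (`error`, and optionally `message`, `status`, and `errors`), so a client can flatten them into a single log line. This is an illustrative sketch: `explain_error` and `should_wait` are hypothetical helpers, and treating only 422 as "wait and try again" is an assumption based on the descriptions above, not documented retry guidance.

```python
import json


def explain_error(status: int, body: bytes) -> str:
    """Flatten an error response into one human-readable line."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        payload = {}
    if not isinstance(payload, dict):
        payload = {}

    parts = [f"HTTP {status}", str(payload.get("error", "unknown error"))]
    if "message" in payload:
        parts.append(str(payload["message"]))
    if "status" in payload:
        # Present in the 422 body: the job's current status.
        parts.append(f"job status: {payload['status']}")
    # Present in the 500 body: one entry per page that failed.
    parts.extend(str(item) for item in payload.get("errors", []))
    return " | ".join(parts)


def should_wait(status: int) -> bool:
    """True when the job is still processing (422), so polling again may help."""
    return status == 422
```

With `urllib`, error responses surface as `urllib.error.HTTPError`; passing `exc.code` and `exc.read()` to `explain_error` would produce the message.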

Use Cases

Batch Processing: Get all crawled content in a single request
Data Analysis: Process entire website content at once for analysis or indexing
RAG Applications: Feed combined content into vector databases or AI models
Documentation Extraction: Extract and combine documentation from multiple pages

Example Workflow

# Step 1: Create a crawl job
curl --request POST \
  --url https://api.webcrawlerapi.com/v1/crawl \
  --header 'Authorization: Bearer <YOUR TOKEN>' \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://example.com",
    "output_formats": ["markdown"],
    "items_limit": 10
  }'

# Response: {"id": "job-id-here"}

# Step 2: Wait for job to complete (poll /v1/job/{id} until status is "done")

# Step 3: Download combined markdown
curl --request GET \
  --url https://api.webcrawlerapi.com/v1/job/job-id-here/markdown/content \
  --header 'Authorization: Bearer <YOUR TOKEN>' \
  --output website-content.md
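
Step 2 of the workflow can be sketched as a polling loop with a timeout. The code below is an assumption-laden sketch: `fetch_status` and `wait_until_done` are hypothetical names, the job response from `/v1/job/{id}` is assumed to be JSON with a top-level `status` field, and the interval and timeout values are arbitrary. Because other terminal statuses are not described here, the loop relies on the timeout rather than guessing at failure states.

```python
import json
import time
import urllib.request
from typing import Callable

API_BASE = "https://api.webcrawlerapi.com/v1"


def fetch_status(job_id: str, token: str) -> str:
    """Read the job's status; assumes the job JSON has a top-level "status" field."""
    request = urllib.request.Request(
        f"{API_BASE}/job/{job_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["status"]


def wait_until_done(
    get_status: Callable[[], str],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the status is "done"; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while True:
        if get_status() == "done":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A caller might run `wait_until_done(lambda: fetch_status(job_id, token))` and request the combined markdown only when it returns `True`. Passing the status function as a parameter keeps the loop testable without network access.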

Notes

Only successfully crawled pages are included; failed items are silently skipped
If some pages fail but at least one succeeds, the available content is returned
Repeated requests for the same job are served from cache rather than regenerated
To get a shareable URL to the file instead of downloading directly, use GET /job/:id/markdown
