WebCrawlerAPI

GET /job/:id/markdown

Get a URL to the combined markdown file for a completed markdown crawl

Returns a URL to a single combined markdown file containing all successfully crawled pages from a completed job.

Method: GET

Request example

curl --request GET \
  --url https://api.webcrawlerapi.com/v1/job/46c7b8ff-eb5e-4ebb-96f1-2685334c07d7/markdown \
  --header 'Authorization: Bearer <YOUR TOKEN>'

Response format

  • Content-Type: application/json
  • Returns a JSON object with content_url — a direct link to the combined markdown file

Example response

{
  "content_url": "https://data.webcrawlerapi.com/content/..."
}

The file at content_url is a single markdown document in which each page's content is preceded by a header block:

----
url: <page_url>
----

<markdown content>
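The header format above makes the combined file easy to split back into per-page documents. A minimal sketch, assuming the exact `----` / `url:` layout shown (the function name `split_combined_markdown` is illustrative, and the naive split assumes page content never contains the header pattern itself):

```python
import re

# Matches one per-page header block: ----\nurl: <page_url>\n----
HEADER = re.compile(r"^----\nurl: (?P<url>.+)\n----\n", re.MULTILINE)

def split_combined_markdown(text: str) -> list[tuple[str, str]]:
    """Return (url, markdown) pairs from a combined markdown file."""
    pages = []
    matches = list(HEADER.finditer(text))
    for i, m in enumerate(matches):
        # Content runs from the end of this header to the start of the next one.
        start = m.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        pages.append((m.group("url"), text[start:end].strip()))
    return pages
```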

Requirements

  1. Markdown Type: The job must have been created with output_formats: ["markdown"] (default)
  2. Completed Status: The job status must be done
  3. Successful Items: At least one job item must have completed successfully

Error responses

400 Bad Request

{
  "error": "Job is not a markdown type",
  "message": "This endpoint only supports jobs with markdown scrape type"
}

The job was created with a different scrape type (e.g., html or cleaned).

401 Unauthorized

{
  "error": "Access denied"
}

The job does not belong to your organization.

404 Not Found

{
  "error": "Job not found"
}

The job ID does not exist. The same 404 status is also used for a second case:

{
  "error": "No markdown content available",
  "message": "No successful items with markdown content found"
}

The job exists but has no successfully crawled pages with markdown content.

422 Unprocessable Entity

{
  "error": "Job not finished",
  "message": "Job must be in 'done' status to generate markdown file",
  "status": "in_progress"
}

The job is still processing. Wait for it to complete before requesting the combined markdown.

500 Internal Server Error

{
  "error": "Failed to upload markdown file"
}

An unexpected error occurred. Please try again.
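The error cases above fold naturally into a single client-side handler. A sketch assuming the status codes and `error` fields documented above (the exception classes are illustrative, not part of the API):

```python
class JobNotReady(Exception):
    """Raised for 422: the job is still processing; poll and retry later."""

def handle_markdown_response(status_code: int, body: dict) -> str:
    """Map a /job/:id/markdown response to an action; return content_url on success."""
    if status_code == 200:
        return body["content_url"]
    if status_code == 422:
        # Job still processing; caller should poll /v1/job/{id} until "done".
        raise JobNotReady(body.get("status", "unknown"))
    if status_code in (400, 401, 404):
        # Not retryable: wrong output format, wrong org, missing job, or no content.
        raise ValueError(f"{status_code}: {body.get('error')}")
    # 500 and anything else: treat as transient and retry.
    raise RuntimeError(f"{status_code}: {body.get('error', 'server error')}")
```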

Use Cases

  1. Batch Processing: Get all crawled content in a single request
  2. Data Analysis: Process entire website content at once for analysis or indexing
  3. RAG Applications: Feed combined content into vector databases or AI models
  4. Documentation Extraction: Extract and combine documentation from multiple pages

Example Workflow

# Step 1: Create a crawl job
curl --request POST \
  --url https://api.webcrawlerapi.com/v1/crawl \
  --header 'Authorization: Bearer <YOUR TOKEN>' \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://example.com",
    "output_formats": ["markdown"],
    "items_limit": 10
  }'

# Response: {"id": "job-id-here"}

# Step 2: Wait for job to complete (poll /v1/job/{id} until status is "done")

# Step 3: Get the combined markdown URL
curl --request GET \
  --url https://api.webcrawlerapi.com/v1/job/job-id-here/markdown \
  --header 'Authorization: Bearer <YOUR TOKEN>'

# Response: {"content_url": "https://data.webcrawlerapi.com/content/..."}

# Step 4: Download the file
curl --output website-content.md "<content_url from step 3>"
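The same workflow can be sketched in Python. The endpoints and headers follow the curl examples above; the status fetcher in `wait_for_done` is injected as a callable so the polling logic works without network access, and the `"error"` terminal status is an assumption, not documented above:

```python
import json
import time
import urllib.request

API = "https://api.webcrawlerapi.com/v1"
TOKEN = "<YOUR TOKEN>"  # replace with your API key

def api_get(path: str) -> dict:
    """GET an API endpoint and decode the JSON body."""
    req = urllib.request.Request(
        f"{API}{path}", headers={"Authorization": f"Bearer {TOKEN}"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def wait_for_done(fetch_status, interval: float = 5.0, max_polls: int = 120) -> str:
    """Poll until fetch_status() returns 'done' (step 2 above)."""
    for _ in range(max_polls):
        status = fetch_status()
        if status == "done":
            return status
        if status == "error":  # assumed terminal failure status
            raise RuntimeError("job failed")
        time.sleep(interval)
    raise TimeoutError("job did not finish in time")

def combined_markdown_url(job_id: str) -> str:
    """Wait for the job, then fetch the combined markdown URL (steps 2-3 above)."""
    wait_for_done(lambda: api_get(f"/job/{job_id}")["status"])
    return api_get(f"/job/{job_id}/markdown")["content_url"]
```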

Notes

  • Only successfully crawled pages are included; failed items are silently skipped
  • The URL is stable — subsequent requests for the same job return the same URL instantly
  • To download the file content directly in one step, use GET /job/:id/markdown/content