S3 Upload

The S3 Upload action automatically uploads crawled data to your Amazon S3 bucket or any S3-compatible storage service. This lets you feed crawl results directly into your data pipeline, with no separate step to download the data and upload it again.

Usage

⚠️ Security Warning: We temporarily store your S3 credentials while the job is processing. All credentials are automatically removed immediately after the job completes.

To use the S3 upload action, include an actions array in your request with an action of type upload_s3. This action requires several parameters to authenticate and specify the destination in your S3 bucket.

Required Parameters

| Parameter | Type | Description |
|---|---|---|
| type | string | Must be set to upload_s3 |
| path | string | The file path/key where the data will be stored in your bucket |
| access_key_id | string | Your S3 access key ID |
| secret_access_key | string | Your S3 secret access key |
| bucket | string | The name of your S3 bucket |
| endpoint | string | The S3 endpoint URL (especially needed for S3-compatible services) |

⚠️ If your bucket was not created in the us-east-1 AWS region, specify the bucket's region through the endpoint, in a format like https://s3.{your-region}.amazonaws.com.
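The parameters above can be assembled programmatically. The following is a minimal Python sketch, not part of the API: the helper names `aws_s3_endpoint` and `build_upload_s3_action` are hypothetical, and the only facts taken from this page are the parameter names and the regional endpoint format.

```python
# Parameter names listed in the Required Parameters table.
REQUIRED_FIELDS = (
    "type", "path", "access_key_id", "secret_access_key", "bucket", "endpoint",
)


def aws_s3_endpoint(region: str) -> str:
    """Build a regional AWS endpoint in the https://s3.{region}.amazonaws.com format."""
    return f"https://s3.{region}.amazonaws.com"


def build_upload_s3_action(path, access_key_id, secret_access_key, bucket, endpoint):
    """Return an upload_s3 action dict (hypothetical helper, not an official client)."""
    action = {
        "type": "upload_s3",
        "path": path,
        "access_key_id": access_key_id,
        "secret_access_key": secret_access_key,
        "bucket": bucket,
        "endpoint": endpoint,
    }
    # Reject empty values early instead of waiting for the API to report them.
    missing = [name for name in REQUIRED_FIELDS if not action[name]]
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")
    return action


action = build_upload_s3_action(
    path="/testupload",
    access_key_id="<ACCESS_KEY>",
    secret_access_key="<SECRET_KEY>",
    bucket="mybucket",
    endpoint=aws_s3_endpoint("eu-west-1"),
)
```

For S3-compatible services, pass the provider's own endpoint URL instead of calling `aws_s3_endpoint`.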

Example Request

curl -i --request POST \
  --url https://api.webcrawlerapi.com/v1/crawl \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
  "url": "https://books.toscrape.com/",
  "scrape_type": "markdown",
  "items_limit": 20,
  "actions": [
    {
      "type": "upload_s3",
      "path": "/testupload",
      "access_key_id": "<ACCESS_KEY>",
      "secret_access_key": "<SECRET_KEY>",
      "bucket": "mybucket",
      "endpoint": "https://s3.eu-west-1.amazonaws.com"
    }
  ]
}'
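For pipelines that are not shell-based, the same request could be built in Python with only the standard library. This is a sketch under assumptions: `build_crawl_request` is a hypothetical helper, and the `Content-Type: application/json` header is an assumption about what the API expects for a JSON body. The URL, field names, and values are copied from the curl example.

```python
import json
import urllib.request

API_URL = "https://api.webcrawlerapi.com/v1/crawl"


def build_crawl_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the API accepts a JSON content type for this body.
            "Content-Type": "application/json",
        },
    )


payload = {
    "url": "https://books.toscrape.com/",
    "scrape_type": "markdown",
    "items_limit": 20,
    "actions": [
        {
            "type": "upload_s3",
            "path": "/testupload",
            "access_key_id": "<ACCESS_KEY>",
            "secret_access_key": "<SECRET_KEY>",
            "bucket": "mybucket",
            "endpoint": "https://s3.eu-west-1.amazonaws.com",
        }
    ],
}

request = build_crawl_request("YOUR_API_KEY", payload)

# Sending requires network access and a valid API key:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it keeps the credentials handling in one place and makes the payload easy to inspect before any network call.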

Response

When the S3 upload action is successfully executed, the response will include information about the upload:

{
  "id": "5f7b1b7b-7b7b-4b7b-8b7b-7b7b7b7b7b7b",
  "actions": [
    {
      "type": "upload_s3",
      "status": "success",
      "path": "/testupload"
    }
  ]
}
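A caller might want to confirm the upload before moving on to the next pipeline stage. The sketch below assumes the response has exactly the shape of the example above (an `actions` list whose entries carry `type` and `status`); the function name `s3_upload_succeeded` is hypothetical.

```python
def s3_upload_succeeded(response: dict) -> bool:
    """Return True if the response has upload_s3 actions and all report success."""
    uploads = [
        action
        for action in response.get("actions", [])
        if action.get("type") == "upload_s3"
    ]
    # No upload_s3 entry at all is treated as "not succeeded".
    return bool(uploads) and all(a.get("status") == "success" for a in uploads)


example_response = {
    "id": "5f7b1b7b-7b7b-4b7b-8b7b-7b7b7b7b7b7b",
    "actions": [
        {"type": "upload_s3", "status": "success", "path": "/testupload"}
    ],
}

upload_ok = s3_upload_succeeded(example_response)
```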

Compatible Storage Services

This action works with:

  • Amazon S3
  • Cloudflare R2
  • DigitalOcean Spaces
  • Backblaze B2
  • Any other S3-compatible storage service

Error Handling

If there's an error with the S3 upload, the action's status will be set to error with a message explaining the issue:

{
  "error_code": "invalid_request",
  "error_message": "invalid S3 credentials: operation error S3: PutObject, https response error StatusCode: 403, api error InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records."
}
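One way to surface such failures in client code is to turn an error body into an exception. This is a minimal sketch that assumes error responses carry the `error_code` and `error_message` fields shown above; `S3UploadError` and `raise_for_error` are hypothetical names, not part of any official client.

```python
import json


class S3UploadError(Exception):
    """Raised when a response body contains an error_code field."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def raise_for_error(body: str) -> dict:
    """Parse a JSON response body; raise S3UploadError if it describes an error."""
    data = json.loads(body)
    if "error_code" in data:
        raise S3UploadError(data["error_code"], data.get("error_message", ""))
    return data


error_body = json.dumps({
    "error_code": "invalid_request",
    "error_message": "invalid S3 credentials: operation error S3: PutObject, "
                     "https response error StatusCode: 403, api error InvalidAccessKeyId: "
                     "The AWS Access Key Id you provided does not exist in our records.",
})

try:
    raise_for_error(error_body)
except S3UploadError as exc:
    caught = exc  # exc.code and exc.message hold the two fields from the body
```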

If you upload files to a private, non-accessible bucket, subsequent attempts to retrieve the content using the file URL might fail. Ensure that you have proper permissions set up for accessing the uploaded files if you need to retrieve them later.