S3 Upload
How to upload crawled data directly to Amazon S3 or compatible storage
The S3 Upload action allows you to automatically upload the crawled data to your Amazon S3 bucket or any S3-compatible storage service. This is particularly useful for integrating the crawl results directly into your data pipeline without requiring an additional step to download and then upload the data.
Usage
Security Warning: We temporarily store your S3 credentials while the job is processing. All credentials are automatically removed immediately after the job completes.
To use the S3 upload action, include an actions array in your request with an action of type upload_s3. This action requires several parameters to authenticate and specify the destination in your S3 bucket.
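For orientation, the request body then has roughly the following shape. This is a minimal sketch with placeholder values (URL, path, bucket, and credentials are illustrative); full, runnable examples follow below.

```python
# Sketch of a crawl request body with an upload_s3 action attached.
# Mirrors the JSON payload sent to the API; all values are placeholders.
request_body = {
    "url": "https://example.com/",
    "scrape_type": "markdown",
    "actions": [
        {
            "type": "upload_s3",
            "path": "/crawl-results",
            "access_key_id": "<ACCESS_KEY>",
            "secret_access_key": "<SECRET_KEY>",
            "bucket": "<BUCKET_NAME>",
            "endpoint": "https://s3.<your-region>.amazonaws.com",
        }
    ],
}
```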
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| `type` | string | Must be set to `upload_s3` |
| `path` | string | The file path/key where the data will be stored in your bucket |
| `access_key_id` | string | Your S3 access key ID |
| `secret_access_key` | string | Your S3 secret access key |
| `bucket` | string | The name of your S3 bucket |
| `endpoint` | string | The S3 endpoint URL (especially needed for S3-compatible services) |
If your bucket was not created in the us-east-1 AWS region, specify the bucket's region through the endpoint, in the format https://s3.{your-region}.amazonaws.com (for example, https://s3.eu-west-1.amazonaws.com for a bucket in eu-west-1).
Example Request

cURL:

```bash
curl -i --request POST \
  --url https://api.webcrawlerapi.com/v1/crawl \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --data '{
    "url": "https://books.toscrape.com/",
    "scrape_type": "markdown",
    "items_limit": 20,
    "actions": [
      {
        "type": "upload_s3",
        "path": "/testupload",
        "access_key_id": "<ACCESS_KEY>",
        "secret_access_key": "<SECRET_KEY>",
        "bucket": "mybucket",
        "endpoint": "https://s3.eu-west-1.amazonaws.com"
      }
    ]
  }'
```

Node.js:

```javascript
const s3Upload = {
  "type": "upload_s3",
  "path": "/testupload",
  "access_key_id": "<ACCESS_KEY>",
  "secret_access_key": "<SECRET_KEY>",
  "bucket": "mybucket",
  "endpoint": "https://s3.eu-west-1.amazonaws.com"
};

try {
  // async way - the promise will be resolved with all the data
  const syncJob = await client.crawl({
    "url": "https://books.toscrape.com/",
    "scrape_type": "markdown",
    "items_limit": 20,
    "allow_subdomains": false,
  }, s3Upload);
  console.log(`Job ID: ${syncJob.id}`);
} catch (error) {
  console.error("Error uploading to S3:", error);
}
```

Python:

```python
s3_action = UploadS3Action(
    path="/testupload",
    access_key_id="<ACCESS_KEY>",
    secret_access_key="<SECRET_KEY>",
    bucket="mybucket",
    endpoint="https://s3.eu-west-1.amazonaws.com"
)

# Start a synchronous crawling job (blocks until completion)
print("Starting crawling job...")
job = crawler.crawl(
    url="https://books.toscrape.com/",
    scrape_type="markdown",
    items_limit=20,
    allow_subdomains=True,
    actions=s3_action,  # Add the S3 upload action
    max_polls=100       # Maximum number of status checks
)
print(f"Job completed with ID: {job.id}")
```

Response
When the S3 upload action is successfully executed, the response will include information about the upload:
```json
{
  "id": "5f7b1b7b-7b7b-4b7b-8b7b-7b7b7b7b7b7b",
  "actions": [
    {
      "type": "upload_s3",
      "status": "success",
      "path": "/testupload"
    }
  ]
}
```

Compatible Storage Services
This action works with:
- Amazon S3
- Cloudflare R2
- DigitalOcean Spaces
- Backblaze B2
- Any other S3-compatible storage service
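For S3-compatible providers, the only parts that change are the `endpoint` and the credentials issued by that provider. As a sketch, an `upload_s3` action pointed at Cloudflare R2 might look like the following; the account ID, keys, and bucket name are placeholders, and R2's S3-compatible endpoint takes the form `https://<account-id>.r2.cloudflarestorage.com`.

```python
# Sketch: the same upload_s3 action, pointed at Cloudflare R2 instead of AWS S3.
# <ACCOUNT_ID>, the credentials, and the bucket name are placeholders.
r2_upload = {
    "type": "upload_s3",
    "path": "/testupload",
    "access_key_id": "<R2_ACCESS_KEY>",
    "secret_access_key": "<R2_SECRET_KEY>",
    "bucket": "mybucket",
    # Cloudflare R2 exposes its S3-compatible API at this endpoint form:
    "endpoint": "https://<ACCOUNT_ID>.r2.cloudflarestorage.com",
}
```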
Error Handling
If there's an error with the S3 upload, the action's status will be set to error with a message explaining the issue:
```json
{
  "error_code": "invalid_request",
  "error_message": "invalid S3 credentials: operation error S3: PutObject, https response error StatusCode: 403, api error InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records."
}
```

If you upload files to a private bucket, subsequent attempts to retrieve the content via the file URL may fail. Make sure the appropriate permissions are in place if you need to access the uploaded files later.
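As a rough illustration, the snippet below inspects a parsed response using the shapes shown above: it surfaces an API-level error (`error_code` / `error_message`) and otherwise reports the status of each `upload_s3` action. The `result` dict and the helper name are assumptions for the sketch, not part of any SDK.

```python
# Minimal sketch for checking the S3 upload outcome in a crawl response.
# Assumes `result` is the parsed JSON body returned by the API; field names
# follow the example responses documented above.

def report_s3_upload(result: dict) -> None:
    # API-level errors come back with error_code / error_message.
    if "error_code" in result:
        raise RuntimeError(
            f"{result['error_code']}: {result.get('error_message', 'unknown error')}"
        )

    # Otherwise, each action reports its own status.
    for action in result.get("actions", []):
        if action.get("type") != "upload_s3":
            continue
        if action.get("status") == "success":
            print(f"Uploaded crawl data to {action.get('path')}")
        else:
            print(f"S3 upload finished with status: {action.get('status')}")

# Example usage with the success response shown earlier:
report_s3_upload({
    "id": "5f7b1b7b-7b7b-4b7b-8b7b-7b7b7b7b7b7b",
    "actions": [{"type": "upload_s3", "status": "success", "path": "/testupload"}],
})
```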