S3 Upload
The S3 Upload action allows you to automatically upload the crawled data to your Amazon S3 bucket or any S3-compatible storage service. This is particularly useful for integrating the crawl results directly into your data pipeline without requiring an additional step to download and then upload the data.
Usage
Security Warning: We temporarily store your S3 credentials while the job is processing. All credentials are automatically removed immediately after the job completes.
To use the S3 upload action, include an actions array in your request with an action of type upload_s3. This action requires several parameters to authenticate and specify the destination in your S3 bucket.
Required Parameters
Parameter | Type | Description |
---|---|---|
type | string | Must be set to upload_s3 |
path | string | The file path/key where the data will be stored in your bucket |
access_key_id | string | Your S3 access key ID |
secret_access_key | string | Your S3 secret access key |
bucket | string | The name of your S3 bucket |
endpoint | string | The S3 endpoint URL (especially needed for S3-compatible services) |
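Because every parameter in the table is required, it can help to check an action locally before sending a request. The following is a minimal sketch, not part of the API: the helper name missing_upload_s3_fields is hypothetical, and it only checks that each key is present and non-empty.

```python
from typing import Any, Dict, List

# Required keys, taken from the parameter table above.
REQUIRED_KEYS = (
    "type",
    "path",
    "access_key_id",
    "secret_access_key",
    "bucket",
    "endpoint",
)


def missing_upload_s3_fields(action: Dict[str, Any]) -> List[str]:
    """Return the problems found in an upload_s3 action (hypothetical helper).

    An empty list means every required key is present and non-empty
    and the type is set to "upload_s3".
    """
    problems = [key for key in REQUIRED_KEYS if not action.get(key)]
    # A present but wrong type is reported separately from a missing one.
    if action.get("type") and action["type"] != "upload_s3":
        problems.append("type must be upload_s3")
    return problems
```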
If your bucket is not in the us-east-1 AWS region, specify the bucket's region through the endpoint, in the form https://s3.{your-region}.amazonaws.com.
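As an illustration of the endpoint format, the sketch below builds the URL from a region name. The function name aws_s3_endpoint is hypothetical, and the check is only a loose sanity test on the characters in the region string, not a list of valid AWS regions.

```python
import re


def aws_s3_endpoint(region: str) -> str:
    """Build an endpoint in the https://s3.{your-region}.amazonaws.com form."""
    region = region.strip().lower()
    # Loose sanity check only: lowercase letters, digits, and hyphens.
    if not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)+", region):
        raise ValueError(f"unexpected region format: {region!r}")
    return f"https://s3.{region}.amazonaws.com"
```

For example, aws_s3_endpoint("eu-west-1") produces the endpoint used in the example request.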
Example Request
curl -i --request POST \
  --url https://api.webcrawlerapi.com/v1/crawl \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --data '{
    "url": "https://books.toscrape.com/",
    "scrape_type": "markdown",
    "items_limit": 20,
    "actions": [
      {
        "type": "upload_s3",
        "path": "/testupload",
        "access_key_id": "<ACCESS_KEY>",
        "secret_access_key": "<SECRET_KEY>",
        "bucket": "mybucket",
        "endpoint": "https://s3.eu-west-1.amazonaws.com"
      }
    ]
  }'
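The same request can be prepared from Python with only the standard library. This is a sketch under assumptions: build_crawl_request is a hypothetical helper, and the Content-Type header is an addition that does not appear in the curl example above. The function only builds the request; sending it is shown in the trailing comments.

```python
import json
import urllib.request

API_URL = "https://api.webcrawlerapi.com/v1/crawl"


def build_crawl_request(api_key, url, s3_action, items_limit=20):
    """Build (but do not send) a crawl request carrying one upload_s3 action."""
    payload = {
        "url": url,
        "scrape_type": "markdown",
        "items_limit": items_limit,
        "actions": [s3_action],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: not present in the curl example above.
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending the request (requires network access and real credentials):
#
#   request = build_crawl_request(api_key, "https://books.toscrape.com/", action)
#   with urllib.request.urlopen(request) as response:
#       body = json.load(response)
```

Reading the API key and S3 credentials from environment variables or a secrets manager, rather than hard-coding them, keeps them out of source control.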
Response
When the S3 upload action is successfully executed, the response will include information about the upload:
{
  "id": "5f7b1b7b-7b7b-4b7b-8b7b-7b7b7b7b7b7b",
  "actions": [
    {
      "type": "upload_s3",
      "status": "success",
      "path": "/testupload"
    }
  ]
}
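A client can check the upload result by looking for an upload_s3 entry with a success status in the actions array. The sketch below assumes the response body has the shape shown above; the function name s3_upload_succeeded is hypothetical.

```python
import json


def s3_upload_succeeded(response_body: str) -> bool:
    """Return True if any upload_s3 action in the response reports success."""
    data = json.loads(response_body)
    return any(
        action.get("type") == "upload_s3" and action.get("status") == "success"
        for action in data.get("actions", [])
    )
```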
Compatible Storage Services
This action works with:
- Amazon S3
- Cloudflare R2
- DigitalOcean Spaces
- Backblaze B2
- Any other S3-compatible storage service
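For S3-compatible services, the endpoint parameter points at the provider's S3 API host rather than at AWS. The sketch below collects the endpoint patterns these providers commonly publish; treat them as assumptions to verify against each provider's documentation, since they are not taken from this page. The names ENDPOINT_TEMPLATES and endpoint_for are hypothetical.

```python
# Commonly published S3 API endpoint patterns (assumptions, verify per provider).
ENDPOINT_TEMPLATES = {
    "aws_s3": "https://s3.{region}.amazonaws.com",
    "cloudflare_r2": "https://{account_id}.r2.cloudflarestorage.com",
    "digitalocean_spaces": "https://{region}.digitaloceanspaces.com",
    "backblaze_b2": "https://s3.{region}.backblazeb2.com",
}


def endpoint_for(provider: str, **values: str) -> str:
    """Fill in the endpoint template for a provider (hypothetical helper)."""
    try:
        template = ENDPOINT_TEMPLATES[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return template.format(**values)
```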
Error Handling
If there's an error with the S3 upload, the action's status will be set to error with a message explaining the issue:
{
  "error_code": "invalid_request",
  "error_message": "invalid S3 credentials: operation error S3: PutObject, https response error StatusCode: 403, api error InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records."
}
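One way to surface such an error in client code is to turn the error body into an exception. This is a minimal sketch that assumes the error_code and error_message fields shown above; S3UploadError and raise_on_upload_error are hypothetical names.

```python
import json


class S3UploadError(Exception):
    """Raised when a response body carries an error_code (hypothetical type)."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def raise_on_upload_error(response_body: str) -> dict:
    """Return the parsed body, or raise S3UploadError if it describes an error."""
    data = json.loads(response_body)
    if "error_code" in data:
        raise S3UploadError(data["error_code"], data.get("error_message", ""))
    return data
```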
If you upload files to a private, non-accessible bucket, subsequent attempts to retrieve the content using the file URL might fail. Ensure that you have proper permissions set up for accessing the uploaded files if you need to retrieve them later.