Discovery Endpoints
Discover URLs on websites and execute batch scrapes via HTTP.

Start discovery
Start URL discovery using a sitemap, pagination, or link pattern.

Request body

The request body depends on the discovery method:
- Sitemap
- Pagination
- Link Pattern
Discovery parameters
Sitemap
| Parameter | Type | Required | Description |
|---|---|---|---|
| method | string | Yes | Must be "sitemap" |
| sitemap_url | string | Yes | URL to sitemap.xml file |
| url_pattern | string | No | Glob pattern to filter URLs |
| max_urls | integer | No | Maximum URLs to discover (default: 1000, max: 10000) |
Pagination
| Parameter | Type | Required | Description |
|---|---|---|---|
| method | string | Yes | Must be "pagination" |
| url_template | string | Yes | URL with {n} placeholder |
| url_pattern | string | No | Glob pattern to filter URLs |
| start_index | integer | No | First page number (default: 1) |
| step | integer | No | Increment between pages (default: 1) |
| max_pages | integer | No | Maximum pages to generate (default: 100, max: 1000) |
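As a rough sketch only, a pagination request body assembled from the parameters above might look like this (site URL and values are placeholders; the surrounding HTTP call is shown in the sitemap example below):

```python
# Illustrative pagination body; the API expands {n} in url_template into
# page numbers start_index, start_index + step, and so on.
pagination_body = {
    "method": "pagination",
    "url_template": "https://example.com/blog/page/{n}",  # placeholder site
    "start_index": 1,   # first page number
    "step": 1,          # increment between pages
    "max_pages": 50,    # stop after 50 generated URLs
}
```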
Link Pattern
| Parameter | Type | Required | Description |
|---|---|---|---|
| method | string | Yes | Must be "link_pattern" |
| seed_url | string | Yes | Starting URL for crawl |
| link_pattern | string | Yes | Glob pattern for URLs to collect |
| navigation_pattern | string | No | Pattern for pages to visit during crawl |
| max_depth | integer | No | How deep to crawl (default: 2, max: 10) |
| max_urls | integer | No | Maximum URLs to discover (default: 1000, max: 10000) |
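Similarly, a sketch of a link-pattern body built from the parameters above (all field values are placeholders):

```python
# Illustrative link-pattern body; pages matching navigation_pattern are
# crawled up to max_depth, and URLs matching link_pattern are collected.
link_pattern_body = {
    "method": "link_pattern",
    "seed_url": "https://example.com/products",                    # placeholder site
    "link_pattern": "https://example.com/products/item-*",         # URLs to collect
    "navigation_pattern": "https://example.com/products?page=*",   # pages to visit
    "max_depth": 3,
    "max_urls": 2000,
}
```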
Response
Example
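A minimal Python sketch of starting a sitemap discovery. The base URL, the /discoveries path, and the bearer-token header are assumptions used for illustration; substitute the endpoint and authentication details from your deployment.

```python
import requests

BASE_URL = "https://api.example.com"                 # placeholder base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # auth scheme assumed

# Start a sitemap discovery (endpoint path is illustrative).
resp = requests.post(
    f"{BASE_URL}/discoveries",
    json={
        "method": "sitemap",
        "sitemap_url": "https://example.com/sitemap.xml",
        "url_pattern": "https://example.com/blog/*",  # optional glob filter
        "max_urls": 500,
    },
    headers=HEADERS,
)
resp.raise_for_status()
discovery = resp.json()
print(discovery)  # the returned object should include the discovery ID used below
```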
Get discovery status
Get discovery status and results.

Response

The status field is one of pending, running, completed, or failed; while the discovery is still pending or running, results are not yet included.
Example
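A sketch of fetching the status from Python, again with an assumed path and auth header:

```python
import requests

BASE_URL = "https://api.example.com"                 # placeholder base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # auth scheme assumed
discovery_id = "YOUR_DISCOVERY_ID"                   # ID returned by Start discovery

# Endpoint path is illustrative.
resp = requests.get(f"{BASE_URL}/discoveries/{discovery_id}", headers=HEADERS)
resp.raise_for_status()
print(resp.json()["status"])  # pending, running, completed, or failed
```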
Get discovery URLs
Fetch all discovered URLs with pagination.

Query parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| limit | integer | No | Max URLs to return (default: 1000, max: 10000) |
| offset | integer | No | Results to skip (default: 0) |
Response
Example
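A sketch of paging through all discovered URLs with limit and offset. The path and the name of the URL list field in the response are assumptions:

```python
import requests

BASE_URL = "https://api.example.com"                 # placeholder base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # auth scheme assumed
discovery_id = "YOUR_DISCOVERY_ID"

# Page through all discovered URLs using limit/offset (path is illustrative).
urls, offset, limit = [], 0, 1000
while True:
    resp = requests.get(
        f"{BASE_URL}/discoveries/{discovery_id}/urls",
        params={"limit": limit, "offset": offset},
        headers=HEADERS,
    )
    resp.raise_for_status()
    batch = resp.json()["urls"]  # response field name assumed
    urls.extend(batch)
    if len(batch) < limit:       # a short page means we reached the end
        break
    offset += limit
print(f"fetched {len(urls)} URLs")
```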
Execute discovery
Execute a one-time batch scrape from discovered URLs.

Request body
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| strategy_id | string | Yes | Strategy UUID to use for scraping |
| max_urls | integer | No | Maximum URLs to process |
| url_filter | string | No | Regex pattern to filter URLs |
Response
Example
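A sketch of triggering a one-time batch scrape over the discovered URLs; the path and the filter value are illustrative assumptions:

```python
import requests

BASE_URL = "https://api.example.com"                 # placeholder base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # auth scheme assumed
discovery_id = "YOUR_DISCOVERY_ID"

# Endpoint path is illustrative.
resp = requests.post(
    f"{BASE_URL}/discoveries/{discovery_id}/execute",
    json={
        "strategy_id": "YOUR_STRATEGY_ID",   # strategy UUID
        "max_urls": 200,                     # optional cap on processed URLs
        "url_filter": r"/blog/\d{4}/",       # optional regex filter
    },
    headers=HEADERS,
)
resp.raise_for_status()
print(resp.json())
```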
Create discovery schedule
Create a recurring schedule from discovered URLs.

Request body

Provide either an interval or a cron expression:
- Interval
- Cron
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| strategy_id | string | Yes | Strategy UUID to use |
| interval_seconds | integer | No* | Seconds between runs |
| cron_expression | string | No* | Cron schedule expression |
| webhook_url | string | No | URL for completion notifications |
| max_urls | integer | No | Maximum URLs per run |
| url_filter | string | No | Regex pattern to filter URLs |
Either interval_seconds or cron_expression is required.
Response
Example
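A sketch of creating a daily schedule using a cron expression; the path is an assumption, and interval_seconds could be supplied instead of cron_expression:

```python
import requests

BASE_URL = "https://api.example.com"                 # placeholder base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # auth scheme assumed
discovery_id = "YOUR_DISCOVERY_ID"

# Endpoint path is illustrative; use either cron_expression or interval_seconds.
resp = requests.post(
    f"{BASE_URL}/discoveries/{discovery_id}/schedule",
    json={
        "strategy_id": "YOUR_STRATEGY_ID",
        "cron_expression": "0 6 * * *",                           # every day at 06:00
        "webhook_url": "https://example.com/hooks/scrape-done",   # optional notification
        "url_filter": r"^https://example\.com/blog/",             # optional regex filter
    },
    headers=HEADERS,
)
resp.raise_for_status()
print(resp.json())
```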
List discoveries
List all discoveries for the authenticated user.

Query parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| status | string | No | Filter by status (pending, running, completed, failed) |
| limit | integer | No | Max results (default: 20, max: 100) |
| offset | integer | No | Results to skip (default: 0) |
Response
Array of discovery objects (same format as Get discovery status).

Example
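A sketch of listing completed discoveries; the path is an assumption:

```python
import requests

BASE_URL = "https://api.example.com"                 # placeholder base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # auth scheme assumed

# List completed discoveries, 50 at a time (path is illustrative).
resp = requests.get(
    f"{BASE_URL}/discoveries",
    params={"status": "completed", "limit": 50, "offset": 0},
    headers=HEADERS,
)
resp.raise_for_status()
for discovery in resp.json():  # array of discovery objects
    print(discovery)
```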
Delete discovery
Delete a discovery and its associated URLs.

Response
Example
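A sketch of deleting a discovery; the path is an assumption:

```python
import requests

BASE_URL = "https://api.example.com"                 # placeholder base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # auth scheme assumed
discovery_id = "YOUR_DISCOVERY_ID"

# Delete the discovery and its stored URLs (path is illustrative).
resp = requests.delete(f"{BASE_URL}/discoveries/{discovery_id}", headers=HEADERS)
resp.raise_for_status()
```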
Polling for completion
Since discovery runs asynchronously, poll until the status is completed or failed:
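A sketch of such a polling loop, using the assumed status endpoint from above:

```python
import time

import requests

BASE_URL = "https://api.example.com"                 # placeholder base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # auth scheme assumed
discovery_id = "YOUR_DISCOVERY_ID"

# Poll the status endpoint until the discovery finishes (path is illustrative).
while True:
    resp = requests.get(f"{BASE_URL}/discoveries/{discovery_id}", headers=HEADERS)
    resp.raise_for_status()
    status = resp.json()["status"]
    if status in ("completed", "failed"):
        break
    time.sleep(5)  # wait between polls
print(f"discovery finished with status: {status}")
```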
Error responses
See REST API Errors for common error codes.

Discovery-specific errors
| Status | Error | Description |
|---|---|---|
| 400 | Invalid url_filter regex | The regex pattern is invalid |
| 400 | Discovery not ready | Tried to execute before completion |
| 400 | No URLs match the filter | Filter excluded all URLs |
| 404 | Discovery not found | Invalid discovery ID |
| 404 | Strategy not found | Invalid strategy ID |
Next steps
- Site Crawling Guide: step-by-step crawling guide
- Site Crawling Concepts: understand how site crawling works
- Schedule Endpoints: manage recurring scrapes
- Job Endpoints: track batch job results