Discovery Endpoints
Discover URLs on websites and execute batch scrapes via HTTP.

Start discovery
Start URL discovery using a sitemap, pagination, or link pattern.

Request body

The request body depends on the discovery method:
- Sitemap
- Pagination
- Link Pattern
Discovery parameters
Sitemap
| Parameter | Type | Required | Description |
|---|---|---|---|
| method | string | Yes | Must be "sitemap" |
| sitemap_url | string | Yes | URL to sitemap.xml file |
| url_pattern | string | No | Glob pattern to filter URLs |
| max_urls | integer | No | Maximum URLs to discover (default: 1000, max: 10000) |
Pagination
| Parameter | Type | Required | Description |
|---|---|---|---|
| method | string | Yes | Must be "pagination" |
| url_template | string | Yes | URL with {n} placeholder |
| url_pattern | string | No | Glob pattern to filter URLs |
| start_index | integer | No | First page number (default: 1) |
| step | integer | No | Increment between pages (default: 1) |
| max_pages | integer | No | Maximum pages to generate (default: 100, max: 1000) |
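As a rough sketch only, a pagination request body assembled from the parameters above might look like this (site URL and values are placeholders; the surrounding HTTP call is shown in the sitemap example below):

```python
# Illustrative pagination body; the API expands {n} in url_template into
# page numbers start_index, start_index + step, and so on.
pagination_body = {
    "method": "pagination",
    "url_template": "https://example.com/blog/page/{n}",  # placeholder site
    "start_index": 1,   # first page number
    "step": 1,          # increment between pages
    "max_pages": 50,    # stop after 50 generated URLs
}
```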
Link Pattern
| Parameter | Type | Required | Description |
|---|---|---|---|
| method | string | Yes | Must be "link_pattern" |
| seed_url | string | Yes | Starting URL for crawl |
| link_pattern | string | Yes | Glob pattern for URLs to collect |
| navigation_pattern | string | No | Pattern for pages to visit during crawl |
| max_depth | integer | No | How deep to crawl (default: 2, max: 10) |
| max_urls | integer | No | Maximum URLs to discover (default: 1000, max: 10000) |
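Similarly, a sketch of a link-pattern body built from the parameters above (all field values are placeholders):

```python
# Illustrative link-pattern body; pages matching navigation_pattern are
# crawled up to max_depth, and URLs matching link_pattern are collected.
link_pattern_body = {
    "method": "link_pattern",
    "seed_url": "https://example.com/products",                    # placeholder site
    "link_pattern": "https://example.com/products/item-*",         # URLs to collect
    "navigation_pattern": "https://example.com/products?page=*",   # pages to visit
    "max_depth": 3,
    "max_urls": 2000,
}
```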
Response
Example
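A minimal Python sketch of starting a sitemap discovery. The base URL, the /discoveries path, and the bearer-token header are assumptions used for illustration; substitute the endpoint and authentication details from your deployment.

```python
import requests

BASE_URL = "https://api.example.com"                 # placeholder base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # auth scheme assumed

# Start a sitemap discovery (endpoint path is illustrative).
resp = requests.post(
    f"{BASE_URL}/discoveries",
    json={
        "method": "sitemap",
        "sitemap_url": "https://example.com/sitemap.xml",
        "url_pattern": "https://example.com/blog/*",  # optional glob filter
        "max_urls": 500,
    },
    headers=HEADERS,
)
resp.raise_for_status()
discovery = resp.json()
print(discovery)  # the returned object should include the discovery ID used below
```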
Get discovery status
Get discovery status and results.

Response

The status field is one of pending, running, completed, or failed; while the discovery is still pending or running, results are not yet included.
Example
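A sketch of fetching the status from Python, again with an assumed path and auth header:

```python
import requests

BASE_URL = "https://api.example.com"                 # placeholder base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # auth scheme assumed
discovery_id = "YOUR_DISCOVERY_ID"                   # ID returned by Start discovery

# Endpoint path is illustrative.
resp = requests.get(f"{BASE_URL}/discoveries/{discovery_id}", headers=HEADERS)
resp.raise_for_status()
print(resp.json()["status"])  # pending, running, completed, or failed
```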
Get discovery URLs
Fetch all discovered URLs with pagination.

Query parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| limit | integer | No | Max URLs to return (default: 1000, max: 10000) |
| offset | integer | No | Results to skip (default: 0) |
Response
Example
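A sketch of paging through all discovered URLs with limit and offset. The path and the name of the URL list field in the response are assumptions:

```python
import requests

BASE_URL = "https://api.example.com"                 # placeholder base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # auth scheme assumed
discovery_id = "YOUR_DISCOVERY_ID"

# Page through all discovered URLs using limit/offset (path is illustrative).
urls, offset, limit = [], 0, 1000
while True:
    resp = requests.get(
        f"{BASE_URL}/discoveries/{discovery_id}/urls",
        params={"limit": limit, "offset": offset},
        headers=HEADERS,
    )
    resp.raise_for_status()
    batch = resp.json()["urls"]  # response field name assumed
    urls.extend(batch)
    if len(batch) < limit:       # a short page means we reached the end
        break
    offset += limit
print(f"fetched {len(urls)} URLs")
```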
Execute discovery
Execute a one-time batch scrape from discovered URLs.

Request body
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| strategy_id | string | Yes | Strategy UUID to use for scraping |
| max_urls | integer | No | Maximum URLs to process |
| url_filter | string | No | Regex pattern to filter URLs |
Response
Example
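A sketch of triggering a one-time batch scrape over the discovered URLs; the path and the filter value are illustrative assumptions:

```python
import requests

BASE_URL = "https://api.example.com"                 # placeholder base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # auth scheme assumed
discovery_id = "YOUR_DISCOVERY_ID"

# Endpoint path is illustrative.
resp = requests.post(
    f"{BASE_URL}/discoveries/{discovery_id}/execute",
    json={
        "strategy_id": "YOUR_STRATEGY_ID",   # strategy UUID
        "max_urls": 200,                     # optional cap on processed URLs
        "url_filter": r"/blog/\d{4}/",       # optional regex filter
    },
    headers=HEADERS,
)
resp.raise_for_status()
print(resp.json())
```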
Create discovery schedule
Create a recurring schedule from discovered URLs.

Request body

Provide either an interval or a cron expression:
- Interval
- Cron
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| strategy_id | string | Yes | Strategy UUID to use |
| interval_seconds | integer | No* | Seconds between runs |
| cron_expression | string | No* | Cron schedule expression |
| webhook_url | string | No | URL for completion notifications |
| max_urls | integer | No | Maximum URLs per run |
| url_filter | string | No | Regex pattern to filter URLs |
Either interval_seconds or cron_expression is required.
Response
Example
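A sketch of creating a daily schedule using a cron expression; the path is an assumption, and interval_seconds could be supplied instead of cron_expression:

```python
import requests

BASE_URL = "https://api.example.com"                 # placeholder base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # auth scheme assumed
discovery_id = "YOUR_DISCOVERY_ID"

# Endpoint path is illustrative; use either cron_expression or interval_seconds.
resp = requests.post(
    f"{BASE_URL}/discoveries/{discovery_id}/schedule",
    json={
        "strategy_id": "YOUR_STRATEGY_ID",
        "cron_expression": "0 6 * * *",                           # every day at 06:00
        "webhook_url": "https://example.com/hooks/scrape-done",   # optional notification
        "url_filter": r"^https://example\.com/blog/",             # optional regex filter
    },
    headers=HEADERS,
)
resp.raise_for_status()
print(resp.json())
```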
List discoveries
List all discoveries for the authenticated user.

Query parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| status | string | No | Filter by status (pending, running, completed, failed) |
| limit | integer | No | Max results (default: 20, max: 100) |
| offset | integer | No | Results to skip (default: 0) |
Response
Array of discovery objects (same format as Get discovery status).

Example
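A sketch of listing completed discoveries; the path is an assumption:

```python
import requests

BASE_URL = "https://api.example.com"                 # placeholder base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # auth scheme assumed

# List completed discoveries, 50 at a time (path is illustrative).
resp = requests.get(
    f"{BASE_URL}/discoveries",
    params={"status": "completed", "limit": 50, "offset": 0},
    headers=HEADERS,
)
resp.raise_for_status()
for discovery in resp.json():  # array of discovery objects
    print(discovery)
```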
Delete discovery
Delete a discovery and its associated URLs.

Response
Example
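A sketch of deleting a discovery; the path is an assumption:

```python
import requests

BASE_URL = "https://api.example.com"                 # placeholder base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # auth scheme assumed
discovery_id = "YOUR_DISCOVERY_ID"

# Delete the discovery and its stored URLs (path is illustrative).
resp = requests.delete(f"{BASE_URL}/discoveries/{discovery_id}", headers=HEADERS)
resp.raise_for_status()
```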
Polling for completion
Since discovery runs asynchronously, poll until the status is completed or failed:
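A sketch of such a polling loop, using the assumed status endpoint from above:

```python
import time

import requests

BASE_URL = "https://api.example.com"                 # placeholder base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # auth scheme assumed
discovery_id = "YOUR_DISCOVERY_ID"

# Poll the status endpoint until the discovery finishes (path is illustrative).
while True:
    resp = requests.get(f"{BASE_URL}/discoveries/{discovery_id}", headers=HEADERS)
    resp.raise_for_status()
    status = resp.json()["status"]
    if status in ("completed", "failed"):
        break
    time.sleep(5)  # wait between polls
print(f"discovery finished with status: {status}")
```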
Error responses
See REST API Errors for common error codes.

Discovery-specific errors
| Status | Error | Description |
|---|---|---|
| 400 | Invalid url_filter regex | The regex pattern is invalid |
| 400 | Discovery not ready | Tried to execute before completion |
| 400 | No URLs match the filter | Filter excluded all URLs |
| 404 | Discovery not found | Invalid discovery ID |
| 404 | Strategy not found | Invalid strategy ID |
Next steps
- Site Crawling Guide: step-by-step crawling guide
- Site Crawling Concepts: understand how site crawling works
- Schedule Endpoints: manage recurring scrapes
- Job Endpoints: track batch job results