Discovery Endpoints

Discover URLs on websites and execute batch scrapes via HTTP.

Start discovery

Start URL discovery using sitemap, pagination, or link pattern.
POST /discover

Request body

{
  "discovery": {
    "method": "sitemap",
    "sitemap_url": "https://shop.com/sitemap.xml",
    "url_pattern": "products/*/",
    "max_urls": 1000
  }
}

Discovery parameters

Sitemap

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| method | string | Yes | Must be "sitemap" |
| sitemap_url | string | Yes | URL of the sitemap.xml file |
| url_pattern | string | No | Glob pattern to filter URLs |
| max_urls | integer | No | Maximum URLs to discover (default: 1000, max: 10000) |

Pagination

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| method | string | Yes | Must be "pagination" |
| url_template | string | Yes | URL with {n} placeholder |
| url_pattern | string | No | Glob pattern to filter URLs |
| start_index | integer | No | First page number (default: 1) |
| step | integer | No | Increment between pages (default: 1) |
| max_pages | integer | No | Maximum pages to generate (default: 100, max: 1000) |

Link pattern

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| method | string | Yes | Must be "link_pattern" |
| seed_url | string | Yes | Starting URL for the crawl |
| link_pattern | string | Yes | Glob pattern for URLs to collect |
| navigation_pattern | string | No | Pattern for pages to visit during the crawl |
| max_depth | integer | No | How deep to crawl (default: 2, max: 10) |
| max_urls | integer | No | Maximum URLs to discover (default: 1000, max: 10000) |
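
The request sample above only shows the sitemap method. As a rough sketch, the other two methods could be assembled as below. `buildDiscoveryRequest` is a hypothetical client-side helper, not part of the API, and the example URLs and glob patterns are illustrative values; it assumes every method uses the same `{ "discovery": { ... } }` envelope as the sitemap sample.

```javascript
// Required fields per discovery method, taken from the parameter tables above.
const REQUIRED_FIELDS = {
  sitemap: ["sitemap_url"],
  pagination: ["url_template"],
  link_pattern: ["seed_url", "link_pattern"],
};

// Hypothetical helper (not part of the API): checks the required fields for a
// method and wraps the options in the { discovery: {...} } envelope.
function buildDiscoveryRequest(method, options = {}) {
  if (!Object.hasOwn(REQUIRED_FIELDS, method)) {
    throw new Error(`Unknown discovery method: ${method}`);
  }
  const missing = REQUIRED_FIELDS[method].filter((field) => !options[field]);
  if (missing.length > 0) {
    throw new Error(`Missing required field(s) for ${method}: ${missing.join(", ")}`);
  }
  if (method === "pagination" && !options.url_template.includes("{n}")) {
    throw new Error("url_template must contain the {n} placeholder");
  }
  return { discovery: { method, ...options } };
}

// Illustrative bodies for the two methods not shown in the request sample.
const paginationBody = buildDiscoveryRequest("pagination", {
  url_template: "https://shop.com/products?page={n}",
  start_index: 1,
  max_pages: 50,
});

const linkPatternBody = buildDiscoveryRequest("link_pattern", {
  seed_url: "https://shop.com/",
  link_pattern: "/products/*",
  max_depth: 2,
});

console.log(JSON.stringify(paginationBody, null, 2));
console.log(JSON.stringify(linkPatternBody, null, 2));
```

Either object can be serialized with `JSON.stringify` and sent as the body of `POST /discover`.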

Response

{
  "discovery_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "discovery_method": "sitemap",
  "root_url": "https://shop.com/sitemap.xml"
}

Example

curl -X POST https://api.meter.sh/discover \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "discovery": {
      "method": "sitemap",
      "sitemap_url": "https://shop.com/sitemap.xml",
      "max_urls": 500
    }
  }'

Get discovery status

Get discovery status and results.
GET /discover/{discovery_id}

Response

When pending/running:
{
  "discovery_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "discovery_method": "sitemap",
  "root_url": "https://shop.com/sitemap.xml"
}
When completed:
{
  "discovery_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "discovery_method": "sitemap",
  "root_url": "https://shop.com/sitemap.xml",
  "total_urls": 847,
  "filtered_count": 847,
  "inferred_pattern": "/products/[slug]/",
  "url_patterns": {
    "/products/": 847
  },
  "sample_urls": [
    "https://shop.com/products/widget-a",
    "https://shop.com/products/widget-b"
  ],
  "errors": []
}
When failed:
{
  "discovery_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "failed",
  "discovery_method": "sitemap",
  "root_url": "https://shop.com/sitemap.xml",
  "errors": ["Sitemap not accessible: 404 Not Found"]
}
Status values: pending, running, completed, failed

Example

curl https://api.meter.sh/discover/550e8400-e29b-41d4-a716-446655440000 \
  -H "Authorization: Bearer sk_live_..."

Get discovery URLs

Fetch all discovered URLs with pagination.
GET /discover/{discovery_id}/urls?limit=1000&offset=0

Query parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| limit | integer | No | Max URLs to return (default: 1000, max: 10000) |
| offset | integer | No | Results to skip (default: 0) |

Response

{
  "discovery_id": "550e8400-e29b-41d4-a716-446655440000",
  "urls": [
    "https://shop.com/products/widget-a",
    "https://shop.com/products/widget-b",
    "..."
  ],
  "total": 847,
  "limit": 1000,
  "offset": 0
}

Example

# Get first 100 URLs
curl "https://api.meter.sh/discover/550e8400.../urls?limit=100" \
  -H "Authorization: Bearer sk_live_..."

# Get next 100 URLs
curl "https://api.meter.sh/discover/550e8400.../urls?limit=100&offset=100" \
  -H "Authorization: Bearer sk_live_..."
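
The two curl calls above generalize to a loop that advances `offset` until `total` is reached. The sketch below is one way to do that; `iterateDiscoveryUrls` and its options are hypothetical names, and it assumes the response always carries the `urls` and `total` fields shown in the sample response. The `fetchImpl` parameter exists only so the paging logic can be exercised without network access.

```javascript
// Hypothetical helper: yields every discovered URL by walking limit/offset pages.
async function* iterateDiscoveryUrls(
  discoveryId,
  { pageSize = 1000, apiKey, fetchImpl = fetch } = {}
) {
  let offset = 0;
  while (true) {
    const url =
      `https://api.meter.sh/discover/${discoveryId}/urls` +
      `?limit=${pageSize}&offset=${offset}`;
    const response = await fetchImpl(url, {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    if (!response.ok) {
      throw new Error(`Failed to fetch URLs: HTTP ${response.status}`);
    }
    const page = await response.json();
    yield* page.urls;
    offset += page.urls.length;
    // Stop once everything has been read, or if the server returns an empty page.
    if (page.urls.length === 0 || offset >= page.total) {
      return;
    }
  }
}

// Collects all URLs into an array (convenient for small discoveries).
async function collectDiscoveryUrls(discoveryId, options) {
  const urls = [];
  for await (const url of iterateDiscoveryUrls(discoveryId, options)) {
    urls.push(url);
  }
  return urls;
}
```

A call might look like `await collectDiscoveryUrls(id, { pageSize: 100, apiKey: process.env.METER_API_KEY })`. Using the generator directly avoids holding the full list in memory for large discoveries.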

Execute discovery

Execute a one-time batch scrape from discovered URLs.
POST /discover/{discovery_id}/execute

Request body

{
  "strategy_id": "660e8400-e29b-41d4-a716-446655440000",
  "max_urls": 100,
  "url_filter": ".*widget.*"
}

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| strategy_id | string | Yes | Strategy UUID to use for scraping |
| max_urls | integer | No | Maximum URLs to process |
| url_filter | string | No | Regex pattern to filter URLs |

Response

{
  "batch_id": "770e8400-e29b-41d4-a716-446655440000",
  "jobs_queued": 100
}

Example

curl -X POST https://api.meter.sh/discover/550e8400.../execute \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "strategy_id": "660e8400-e29b-41d4-a716-446655440000",
    "max_urls": 50
  }'
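
Because an invalid or over-restrictive `url_filter` is rejected with a 400, it can help to try the pattern locally against the discovered URLs first. The sketch below does this with JavaScript's `RegExp`. It assumes the server performs an unanchored regex search and applies the filter before `max_urls`; this page confirms neither, and the server's regex dialect may differ from JavaScript's. `previewExecuteSelection` is a hypothetical helper and the `gadget-c` URL is made up for the example.

```javascript
// Hypothetical helper: applies url_filter and max_urls locally to preview
// which URLs an execute call might select.
function previewExecuteSelection(urls, { urlFilter, maxUrls } = {}) {
  let selected = urls;
  if (urlFilter !== undefined) {
    let regex;
    try {
      regex = new RegExp(urlFilter);
    } catch (err) {
      throw new Error(`Invalid url_filter regex: ${err.message}`);
    }
    selected = selected.filter((url) => regex.test(url));
  }
  if (selected.length === 0) {
    throw new Error("No URLs match the filter");
  }
  // Assumption: the cap is applied after filtering.
  return maxUrls === undefined ? selected : selected.slice(0, maxUrls);
}

const sample = [
  "https://shop.com/products/widget-a",
  "https://shop.com/products/widget-b",
  "https://shop.com/products/gadget-c", // made-up URL for illustration
];

// Keeps the two widget URLs and drops the gadget one.
console.log(previewExecuteSelection(sample, { urlFilter: ".*widget.*", maxUrls: 100 }));
```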

Create discovery schedule

Create a recurring schedule from discovered URLs.
POST /discover/{discovery_id}/schedule

Request body

{
  "strategy_id": "660e8400-e29b-41d4-a716-446655440000",
  "interval_seconds": 86400,
  "webhook_url": "https://your-app.com/webhooks/meter",
  "max_urls": 500
}

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| strategy_id | string | Yes | Strategy UUID to use |
| interval_seconds | integer | No* | Seconds between runs |
| cron_expression | string | No* | Cron schedule expression |
| webhook_url | string | No | URL for completion notifications |
| max_urls | integer | No | Maximum URLs per run |
| url_filter | string | No | Regex pattern to filter URLs |

*Either interval_seconds or cron_expression is required.
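
The "either/or" rule for `interval_seconds` and `cron_expression` can be checked on the client before sending the request. The sketch below is a hypothetical helper, not part of the API. It conservatively rejects bodies that set both fields, since this page does not say how the server treats that case, and the five-field cron string in the example is an assumed syntax.

```javascript
// Hypothetical helper: builds a schedule request body and enforces that
// exactly one of interval_seconds / cron_expression is present.
function buildScheduleRequest({
  strategyId,
  intervalSeconds,
  cronExpression,
  webhookUrl,
  maxUrls,
  urlFilter,
} = {}) {
  if (!strategyId) {
    throw new Error("strategy_id is required");
  }
  const hasInterval = intervalSeconds !== undefined;
  const hasCron = cronExpression !== undefined;
  // Assumption: sending both is treated as an error here to stay on the safe side.
  if (hasInterval === hasCron) {
    throw new Error("Provide exactly one of interval_seconds or cron_expression");
  }

  const body = { strategy_id: strategyId };
  if (hasInterval) body.interval_seconds = intervalSeconds;
  if (hasCron) body.cron_expression = cronExpression;
  if (webhookUrl !== undefined) body.webhook_url = webhookUrl;
  if (maxUrls !== undefined) body.max_urls = maxUrls;
  if (urlFilter !== undefined) body.url_filter = urlFilter;
  return body;
}

// Daily run expressed as an interval.
const intervalBody = buildScheduleRequest({
  strategyId: "660e8400-e29b-41d4-a716-446655440000",
  intervalSeconds: 86400,
});

// Same strategy on a cron schedule (assumed five-field syntax).
const cronBody = buildScheduleRequest({
  strategyId: "660e8400-e29b-41d4-a716-446655440000",
  cronExpression: "0 6 * * *",
  maxUrls: 500,
});

console.log(intervalBody, cronBody);
```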

Response

{
  "id": "880e8400-e29b-41d4-a716-446655440000",
  "strategy_id": "660e8400-e29b-41d4-a716-446655440000",
  "urls": ["https://shop.com/products/widget-a", "..."],
  "schedule_type": "interval",
  "interval_seconds": 86400,
  "enabled": true,
  "webhook_url": "https://your-app.com/webhooks/meter",
  "next_run_at": "2025-01-16T10:30:00Z",
  "created_at": "2025-01-15T10:30:00Z"
}

Example

curl -X POST https://api.meter.sh/discover/550e8400.../schedule \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "strategy_id": "660e8400-e29b-41d4-a716-446655440000",
    "interval_seconds": 86400,
    "webhook_url": "https://your-app.com/webhooks/meter"
  }'

List discoveries

List all discoveries for the authenticated user.
GET /discoveries?status={status}&limit=20&offset=0

Query parameters

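| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| status | string | No | Filter by status (pending, running, completed, failed) |
| limit | integer | No | Max results (default: 20, max: 100) |
| offset | integer | No | Results to skip (default: 0) |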

Response

Array of discovery objects (same format as the Get discovery status response).

Example

# All discoveries
curl https://api.meter.sh/discoveries \
  -H "Authorization: Bearer sk_live_..."

# Only completed
curl "https://api.meter.sh/discoveries?status=completed" \
  -H "Authorization: Bearer sk_live_..."
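
When the query string is built programmatically, the limits from the query parameters table can be enforced before the request is sent. A minimal sketch follows; `buildDiscoveriesUrl` is a hypothetical helper, and the lower bound of 1 for `limit` is an assumption.

```javascript
// Status values documented for discoveries.
const VALID_STATUSES = ["pending", "running", "completed", "failed"];

// Hypothetical helper: validates the query parameters and returns the full URL.
function buildDiscoveriesUrl({ status, limit = 20, offset = 0 } = {}) {
  if (status !== undefined && !VALID_STATUSES.includes(status)) {
    throw new Error(`Unknown status: ${status}`);
  }
  // Assumption: limit must be at least 1; the documented maximum is 100.
  if (!Number.isInteger(limit) || limit < 1 || limit > 100) {
    throw new Error("limit must be an integer between 1 and 100");
  }
  if (!Number.isInteger(offset) || offset < 0) {
    throw new Error("offset must be a non-negative integer");
  }
  const url = new URL("https://api.meter.sh/discoveries");
  if (status !== undefined) url.searchParams.set("status", status);
  url.searchParams.set("limit", String(limit));
  url.searchParams.set("offset", String(offset));
  return url.toString();
}

console.log(buildDiscoveriesUrl({ status: "completed", limit: 50 }));
// → https://api.meter.sh/discoveries?status=completed&limit=50&offset=0
```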

Delete discovery

Delete a discovery and its associated URLs.
DELETE /discover/{discovery_id}

Response

{
  "message": "Discovery deleted",
  "discovery_id": "550e8400-e29b-41d4-a716-446655440000"
}

Example

curl -X DELETE https://api.meter.sh/discover/550e8400... \
  -H "Authorization: Bearer sk_live_..."

Polling for completion

Since discovery runs asynchronously, poll until status is completed or failed:
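async function waitForDiscovery(discoveryId) {
  while (true) {
    const response = await fetch(
      `https://api.meter.sh/discover/${discoveryId}`,
      {
        headers: {
          Authorization: `Bearer ${process.env.METER_API_KEY}`,
        },
      }
    );

    // Stop on HTTP errors (e.g. 404 Discovery not found) instead of polling forever
    if (!response.ok) {
      throw new Error(`Status check failed: HTTP ${response.status}`);
    }

    const discovery = await response.json();

    if (discovery.status === "completed") {
      return discovery;
    } else if (discovery.status === "failed") {
      // Fall back to a generic message if the errors array is missing or empty
      throw new Error((discovery.errors ?? []).join(", ") || "Discovery failed");
    }

    // Wait 2 seconds before next check
    await new Promise((resolve) => setTimeout(resolve, 2000));
  }
}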

Error responses

See REST API Errors for common error codes.

Discovery-specific errors

| Status | Error | Description |
| --- | --- | --- |
| 400 | Invalid url_filter regex | The regex pattern is invalid |
| 400 | Discovery not ready | Tried to execute before completion |
| 400 | No URLs match the filter | Filter excluded all URLs |
| 404 | Discovery not found | Invalid discovery ID |
| 404 | Strategy not found | Invalid strategy ID |
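
A client may want to tell apart errors worth retrying from errors that need a changed request. The sketch below encodes the table above as data and marks only "Discovery not ready" as retryable, since that condition resolves once discovery completes. `classifyDiscoveryError` is a hypothetical helper. It takes the HTTP status and the error message as plain arguments because the JSON shape of error responses is not shown on this page, and matching on message substrings is an assumption about how the messages appear on the wire.

```javascript
// Discovery-specific errors from the table above, with a retry hint added.
const DISCOVERY_ERRORS = [
  { status: 400, error: "Invalid url_filter regex", retryable: false },
  { status: 400, error: "Discovery not ready", retryable: true },
  { status: 400, error: "No URLs match the filter", retryable: false },
  { status: 404, error: "Discovery not found", retryable: false },
  { status: 404, error: "Strategy not found", retryable: false },
];

// Hypothetical helper: maps an HTTP status and error message to a known entry.
// Unknown errors are returned as non-retryable so callers fail fast.
function classifyDiscoveryError(status, message) {
  const known = DISCOVERY_ERRORS.find(
    (entry) => entry.status === status && message.includes(entry.error)
  );
  return known ?? { status, error: message, retryable: false };
}

const notReady = classifyDiscoveryError(400, "Discovery not ready");
console.log(notReady.retryable); // true: poll for completion, then execute again

const badStrategy = classifyDiscoveryError(404, "Strategy not found");
console.log(badStrategy.retryable); // false: fix strategy_id before retrying
```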
