Schedules

A schedule automatically runs scrape jobs at specified intervals or cron times. Schedules are perfect for monitoring websites for changes without manual intervention.

What is a schedule?

A schedule combines:
  • A strategy: What to extract
  • A URL: Where to scrape
  • A timing rule: When to scrape (interval or cron)
  • Optional webhook: Where to send change notifications
Once created, schedules run automatically, creating jobs at the specified times.

Creating schedules

Interval-based schedules

Run jobs at regular intervals:
from meter_sdk import MeterClient

client = MeterClient(api_key="sk_live_...")

# Run every hour (3600 seconds)
schedule = client.create_schedule(
    strategy_id="your-strategy-uuid",
    url="https://example.com/page",
    interval_seconds=3600
)

print(f"Schedule ID: {schedule['id']}")
print(f"Next run: {schedule['next_run_at']}")
Common intervals:
  • 15 minutes: 900 seconds
  • 1 hour: 3600 seconds
  • 6 hours: 21600 seconds
  • Daily: 86400 seconds
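If you prefer not to hard-code second counts, a small helper can derive them. This is a minimal sketch; `to_interval_seconds` is a hypothetical helper and not part of meter_sdk.

```python
from datetime import timedelta


def to_interval_seconds(*, minutes: int = 0, hours: int = 0, days: int = 0) -> int:
    """Convert a human-readable duration into whole seconds."""
    total = int(timedelta(minutes=minutes, hours=hours, days=days).total_seconds())
    if total <= 0:
        raise ValueError("interval must be positive")
    return total


# Hypothetical helper (not part of meter_sdk); values match the list above.
print(to_interval_seconds(minutes=15))  # 900
print(to_interval_seconds(hours=6))     # 21600
print(to_interval_seconds(days=1))      # 86400
```

The result can be passed as `interval_seconds` to `create_schedule`.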

Cron-based schedules

Use cron expressions for precise scheduling:
# Run daily at 9 AM
schedule = client.create_schedule(
    strategy_id="your-strategy-uuid",
    url="https://example.com/page",
    cron_expression="0 9 * * *"
)

# Run every weekday at 8 AM
schedule = client.create_schedule(
    strategy_id="your-strategy-uuid",
    url="https://example.com/page",
    cron_expression="0 8 * * 1-5"
)

# Run every 6 hours
schedule = client.create_schedule(
    strategy_id="your-strategy-uuid",
    url="https://example.com/page",
    cron_expression="0 */6 * * *"
)
Use crontab.guru to build and test cron expressions.
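The expressions above use the standard five-field cron layout: minute, hour, day of month, month, day of week. As a rough sanity check before creating a schedule, you could split an expression into named fields. `describe_cron` is a hypothetical helper, not a meter_sdk function, and it only checks the field count, not the values.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression: str) -> dict:
    """Map each part of a five-field cron expression to its field name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expression!r}")
    return dict(zip(CRON_FIELDS, parts))


# The three expressions used in the examples above
for expr in ("0 9 * * *", "0 8 * * 1-5", "0 */6 * * *"):
    print(expr, "->", describe_cron(expr))
```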

Webhooks

Receive real-time notifications when jobs complete:
schedule = client.create_schedule(
    strategy_id="your-strategy-uuid",
    url="https://example.com/products",
    interval_seconds=3600,
    webhook_url="https://your-app.com/webhooks/meter"
)
Meter will POST job results to your webhook URL. See the webhooks guide for payload details and implementation.
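For orientation, here is a minimal sketch of a receiving endpoint built on Python's standard library. It assumes the POST body is JSON; the payload fields are not described on this page, so the sketch only decodes the body and acknowledges it. The names `parse_webhook_body`, `MeterWebhookHandler`, and `run_server` are illustrative.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_webhook_body(raw: bytes):
    """Return the decoded JSON payload, or None if the body is not valid JSON."""
    try:
        return json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None


class MeterWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = parse_webhook_body(self.rfile.read(length))
        if payload is None:
            # Reject bodies that are not JSON
            self.send_response(400)
            self.end_headers()
            return
        # Acknowledge right away; hand slow processing to a queue or worker
        self.send_response(200)
        self.end_headers()
        print("Received webhook payload:", payload)


def run_server(port: int = 8000) -> None:
    """Serve until interrupted. Call this from your own entry point."""
    HTTPServer(("", port), MeterWebhookHandler).serve_forever()
```

In production you would typically use your existing web framework instead; the point is to return a success status promptly and defer heavy work.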

Pull-based change detection

Instead of webhooks, poll for changes:
# Create schedule without webhook
schedule = client.create_schedule(
    strategy_id="your-strategy-uuid",
    url="https://example.com/products",
    interval_seconds=3600
)

# Later, check for changes
changes = client.get_schedule_changes(
    schedule_id=schedule['id'],
    mark_seen=True  # Mark changes as seen after reading
)

if changes['count'] > 0:
    print(f"Found {changes['count']} jobs with changes")
    for change in changes['changes']:
        print(f"Job {change['job_id']}: {change['item_count']} items")
        # Process change['results']
Set mark_seen=False to preview changes without marking them as read.
Pull-based change detection is useful for:
  • Batch processing: Check for changes once per hour, process in bulk
  • Webhook alternatives: When webhooks aren’t feasible
  • Manual review: Preview changes before processing
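For batch processing, one option is to flatten the results of all changed jobs into a single list. The sketch below assumes `change['results']` is a list of items, which this page does not confirm. `FakeClient` is a stand-in so the example runs offline; with a real `MeterClient` you would pass `client` instead.

```python
def collect_changed_items(client, schedule_id, mark_seen=True):
    """Flatten the results of every changed job into one list."""
    changes = client.get_schedule_changes(
        schedule_id=schedule_id,
        mark_seen=mark_seen,
    )
    items = []
    for change in changes["changes"]:
        # Assumption: each change carries a list under 'results'
        items.extend(change["results"])
    return items


class FakeClient:
    """Stand-in for MeterClient that returns canned data."""

    def get_schedule_changes(self, schedule_id, mark_seen=True):
        return {
            "count": 2,
            "changes": [
                {"job_id": "job-1", "item_count": 2,
                 "results": [{"name": "A"}, {"name": "B"}]},
                {"job_id": "job-2", "item_count": 1,
                 "results": [{"name": "C"}]},
            ],
        }


items = collect_changed_items(FakeClient(), "your-schedule-uuid")
print(f"Collected {len(items)} items")
```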

Managing schedules

Listing schedules

schedules = client.list_schedules()

for schedule in schedules:
    print(f"Schedule {schedule['id']}:")
    print(f"  Enabled: {schedule['enabled']}")
    print(f"  Type: {schedule['schedule_type']}")  # 'interval' or 'cron'
    print(f"  Next run: {schedule['next_run_at']}")

Updating schedules

# Disable a schedule temporarily
client.update_schedule(
    schedule_id,
    enabled=False
)

# Change the interval
client.update_schedule(
    schedule_id,
    interval_seconds=7200  # Every 2 hours instead
)

# Switch to cron
client.update_schedule(
    schedule_id,
    cron_expression="0 10 * * *"  # Daily at 10 AM
)

# Update webhook URL
client.update_schedule(
    schedule_id,
    webhook_url="https://your-new-domain.com/webhooks/meter"
)

# Remove webhook (use pull-based instead)
client.update_schedule(
    schedule_id,
    webhook_url=None
)

Deleting schedules

# Delete a schedule (stops future jobs)
client.delete_schedule(schedule_id)
Deleting a schedule doesn’t delete past jobs. Use list_jobs() to access historical data.

Monitoring schedules

Check recent runs

# Get recent jobs for this schedule's strategy (the filter is by strategy, not by schedule)
jobs = client.list_jobs(
    strategy_id=schedule['strategy_id'],
    limit=20
)

for job in jobs:
    print(f"Job {job['id']} ({job['created_at']}):")
    print(f"  Status: {job['status']}")
    print(f"  Items: {job['item_count']}")

Detect failures

# Check for recent failures
failed_jobs = client.list_jobs(
    strategy_id=schedule['strategy_id'],
    status='failed',
    limit=5
)

if len(failed_jobs) > 0:
    print(f"Warning: {len(failed_jobs)} recent failures")
    print(f"Error: {failed_jobs[0]['error']}")

Best practices

Balance freshness with cost and load:
Every 15-30 minutes:
  • Stock prices, sports scores
  • Time-sensitive monitoring
  • High-value data
Every 1-6 hours:
  • E-commerce products
  • News articles
  • Job listings
  • Most monitoring use cases
Daily:
  • Documentation, policies
  • Blog posts
  • Low-frequency content
Webhooks are ideal when:
  • Changes need immediate action
  • Building real-time systems
  • Triggering downstream workflows
Pull-based is better when:
  • Batch processing changes
  • Webhooks aren’t feasible (firewall, no public endpoint)
  • Manual review before processing
Set up alerts for schedule failures:
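def check_schedule_health(schedule_id, threshold=3):
    """Alert if at least `threshold` of the recent jobs failed"""
    # Look up the schedule to find the strategy its jobs belong to
    schedule = client.get_schedule(schedule_id)
    failed = client.list_jobs(
        strategy_id=schedule['strategy_id'],
        status='failed',
        limit=10
    )

    if len(failed) >= threshold:
        # send_alert is a placeholder for your own notification function
        send_alert(f"Schedule {schedule_id} has {len(failed)} failures")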
Temporarily disable schedules when doing maintenance:
# Disable before maintenance
client.update_schedule(schedule_id, enabled=False)

# Do maintenance work
update_strategy_or_database()

# Re-enable after
client.update_schedule(schedule_id, enabled=True)
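If the maintenance step raises an exception, the sequence above leaves the schedule disabled. One way to guard against that is a context manager that always re-enables it. This is a sketch: `schedule_paused` is a hypothetical helper, and `RecordingClient` is a stand-in that only records calls so the example runs without network access.

```python
from contextlib import contextmanager


@contextmanager
def schedule_paused(client, schedule_id):
    """Disable a schedule for the duration of the block, then re-enable it."""
    client.update_schedule(schedule_id, enabled=False)
    try:
        yield
    finally:
        # Runs even if the maintenance work raises
        client.update_schedule(schedule_id, enabled=True)


class RecordingClient:
    """Stand-in for MeterClient that records update_schedule calls."""

    def __init__(self):
        self.calls = []

    def update_schedule(self, schedule_id, **fields):
        self.calls.append((schedule_id, fields))


demo_client = RecordingClient()
try:
    with schedule_paused(demo_client, "your-schedule-uuid"):
        raise RuntimeError("maintenance failed")
except RuntimeError:
    pass

print(demo_client.calls)
```

Note that this sketch always re-enables the schedule on exit, so it is not suitable for a schedule that was already disabled before the block.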

Change detection workflow

Schedules automatically compare jobs to detect changes. When you call get_schedule_changes(), Meter returns only the jobs whose content actually changed.
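This page does not describe how Meter compares jobs internally. Purely as a conceptual illustration, and not Meter's actual algorithm, content comparison can be sketched by fingerprinting each job's results and comparing the fingerprints. The function names here are hypothetical.

```python
import hashlib
import json


def fingerprint(results) -> str:
    """Hash a canonical JSON form so dictionary key order does not matter."""
    canonical = json.dumps(results, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_changed(previous_results, current_results) -> bool:
    """Report whether two result sets differ in content."""
    return fingerprint(previous_results) != fingerprint(current_results)


# Same content with different key order: no change
print(has_changed([{"a": 1, "b": 2}], [{"b": 2, "a": 1}]))  # False
# A different value: change detected
print(has_changed([{"price": 10}], [{"price": 12}]))        # True
```

A scheme like this treats a reordered list of items as a change; whether that is desirable depends on the data.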

Troubleshooting

Problem: Scheduled jobs aren't running
Possible causes:
  • Schedule is disabled
  • Cron expression is incorrect
  • Server issues
Solutions:
  • Check enabled field: client.get_schedule(schedule_id)
  • Verify cron expression at crontab.guru
  • Check next_run_at to see when it’s scheduled
Problem: get_schedule_changes() returns 0 results but you expect changes
Possible causes:
  • Content genuinely hasn’t changed
  • Changes already marked as seen
  • Looking at wrong schedule
Solutions:
  • Use mark_seen=False to check without marking
  • Compare jobs manually: client.compare_jobs(job1_id, job2_id)
  • Verify schedule ID is correct
Problem: Webhooks aren’t being received
Solutions:
  • Verify webhook URL is publicly accessible
  • Check endpoint responds with 200 OK within 30 seconds
  • Test webhook with tools like webhook.site
  • Switch to pull-based if webhooks aren’t working
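To check the response-time requirement yourself, you could POST a small test body to your endpoint and time the response. This is a sketch using only the standard library. The `{"test": true}` body is a placeholder, not Meter's real payload, and `check_webhook_endpoint` is a hypothetical helper.

```python
import json
import time
import urllib.error
import urllib.request


def check_webhook_endpoint(url: str, timeout: float = 30.0):
    """POST a small JSON body and return (status_code, elapsed_seconds)."""
    body = json.dumps({"test": True}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    started = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # Error responses (4xx/5xx) still carry a status code
        status = exc.code
    return status, time.monotonic() - started
```

For example, `check_webhook_endpoint("https://your-app.com/webhooks/meter")` should return a status of 200 and an elapsed time well under 30 seconds. Connection failures and timeouts are raised as exceptions rather than returned.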

Need help?

Email me at [email protected]