Strategies

A strategy is a reusable extraction plan that tells Meter how to extract data from a webpage. Think of it like a recipe: you create it once by describing what you want, and Meter’s AI figures out the exact selectors and extraction logic.

What is a strategy?

A strategy contains:
  • Extraction method: Either CSS Path (for traditional HTML) or API Path (for JavaScript-heavy sites)
  • Field definitions mapping selectors or API responses to your data fields
  • Extraction metadata like item containers, scopes, or API endpoints
Meter automatically detects which extraction method works best for each site, so you don’t need to choose. Once created, a strategy can be reused any number of times across similar pages, with no LLM costs after the initial generation.
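To make the three parts concrete, here is a purely illustrative sketch of how such a plan could be modeled. The class name, attribute names, method labels, and selectors are all assumptions made up for this example; they are not Meter’s actual schema or output.

```python
from dataclasses import dataclass, field


@dataclass
class StrategySketch:
    """Hypothetical model of a strategy; not Meter's real format."""
    method: str                      # assumed labels: "css_path" or "api_path"
    fields: dict[str, str]           # data field name -> selector or response path
    metadata: dict[str, str] = field(default_factory=dict)


# Made-up selectors for an imaginary listing page.
example_strategy = StrategySketch(
    method="css_path",
    fields={"title": "h2.post-title", "score": "span.post-score"},
    metadata={"item_container": "div.post"},
)
print(sorted(example_strategy.fields))
```

The point is only that a strategy is plain data: once the selectors or response paths exist, applying them needs no further AI calls.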

How strategies are generated

Meter uses AI to analyze your target webpage and generate precise extraction strategies:
  1. You provide: A URL and plain-English description of what to extract
  2. Meter analyzes: The page structure, HTML patterns, and content layout
  3. AI generates: CSS selectors and extraction rules optimized for that page
  4. You get: A reusable strategy plus preview data showing what was extracted
This approach combines the intelligence of AI setup with the speed and reliability of traditional scraping.

Example

from meter_sdk import MeterClient

client = MeterClient(api_key="sk_live_...")

# Generate a strategy
result = client.generate_strategy(
    url="https://news.ycombinator.com",
    description="Extract post titles and scores",
    name="HN Front Page"
)

# Check the preview
print(f"Extracted {len(result['preview_data'])} items")
for item in result['preview_data'][:3]:
    print(item)

# Output:
# {'title': 'Launch HN: ...', 'score': 42}
# {'title': 'Ask HN: ...', 'score': 15}
# ...

Extraction methods

Meter automatically selects the best extraction method for each site. You describe what you want, and Meter figures out how to get it.

CSS Path extraction

For traditional HTML pages, Meter generates CSS selectors that target the content you need. Best for:
  • Static HTML sites
  • Server-rendered pages
  • Sites with stable DOM structure
  • Blogs, news sites, and content pages
CSS Path extraction is fast and reliable for sites where content is present in the initial HTML response.
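If you are curious whether a page falls into this category, one rough heuristic is to check whether the values you expect already appear in the raw HTML. The helper below is an illustrative sketch, assuming you have fetched the HTML yourself; it is not how Meter performs its detection, and the sample markup is invented.

```python
def present_in_initial_html(html: str, expected_values: list[str]) -> dict[str, bool]:
    """Report which expected strings appear in the raw (pre-JavaScript) HTML."""
    lowered = html.lower()
    return {value: value.lower() in lowered for value in expected_values}


# Invented sample page for illustration.
sample_html = (
    "<html><body><h1>Blue Widget</h1>"
    "<span class='price'>$19.99</span></body></html>"
)
report = present_in_initial_html(sample_html, ["Blue Widget", "$19.99", "In stock"])
print(report)
```

Values that come back `False` are likely loaded later by JavaScript, which is the situation API Path extraction targets.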

API Path extraction

For JavaScript-heavy sites, Meter automatically discovers the underlying APIs that power the page and extracts data directly from them.
  1. Automatic detection: Meter identifies when a page relies on JavaScript to load its content.
  2. API discovery: The data source APIs are automatically identified, with no reverse engineering required.
  3. Authentication handled: Any required tokens or session data are handled automatically.
  4. Direct extraction: Data is extracted directly from API responses, which is cleaner and more reliable than parsing the DOM.
Best for:
  • Single-page applications (React, Vue, Angular)
  • Financial data sites
  • Dynamic dashboards
  • Sites with client-side rendering
API Path extraction often returns cleaner, more structured data than DOM scraping—and it’s more resilient to UI changes.
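To see why API responses tend to be cleaner, consider a minimal sketch of mapping a JSON payload to data fields. The payload shape and the dotted-path notation here are assumptions for illustration only; Meter’s real API Path format is not shown in this guide.

```python
import json


def extract_by_path(payload: dict, path: str):
    """Walk a dotted path such as 'data.items' through nested dicts."""
    current = payload
    for key in path.split("."):
        current = current[key]
    return current


# Invented API response for illustration.
raw = '{"data": {"items": [{"name": "Widget", "price": 19.99, "internal_id": 7}]}}'
items = extract_by_path(json.loads(raw), "data.items")

# Keep only the fields we care about.
rows = [{"name": item["name"], "price": item["price"]} for item in items]
print(rows)
```

No markup parsing is involved, so a visual redesign that leaves the API untouched would not affect this kind of extraction.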

Automatic token handling

JavaScript-heavy sites often require authentication tokens to access their APIs. Meter handles this automatically—you don’t need to worry about the details.
  • Many sites protect their APIs with CSRF tokens. Meter detects and includes these tokens automatically, so your extractions work without manual configuration.
  • Session state is maintained across the extraction process. If a site requires cookies to access its APIs, Meter handles this for you.
  • API keys, authorization headers, and other required headers are automatically included in requests.
  • Some sites require multiple API calls in sequence. Meter handles these dependencies and chains requests in the correct order.
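For a sense of the manual work this replaces, here is a sketch of pulling a CSRF token out of a page and attaching it to request headers using only the standard library. The `csrf-token` meta tag and `X-CSRF-Token` header follow a common web-framework convention and are assumptions; real sites vary, and this is not Meter’s implementation.

```python
from html.parser import HTMLParser


class CsrfMetaParser(HTMLParser):
    """Collect the content of <meta name="csrf-token"> if present."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta" and attr_map.get("name") == "csrf-token":
            self.token = attr_map.get("content")


def build_api_headers(page_html: str) -> dict[str, str]:
    """Build request headers, adding the CSRF token when the page exposes one."""
    parser = CsrfMetaParser()
    parser.feed(page_html)
    headers = {"Accept": "application/json"}
    if parser.token:
        headers["X-CSRF-Token"] = parser.token
    return headers


# Invented page for illustration.
page = '<html><head><meta name="csrf-token" content="abc123"></head></html>'
headers = build_api_headers(page)
print(headers)
```

Cookies, authorization headers, and chained requests would each need similar hand-written handling.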

Real-world example: Financial data

Consider extracting stock quotes from a financial data site. When you visit the page, you see prices updating in real time, but the HTML source shows almost nothing. The data is loaded via JavaScript from a hidden API. With traditional scraping, you would need to:
  1. Reverse-engineer the API endpoints
  2. Figure out the authentication requirements
  3. Handle CSRF tokens and session management
  4. Parse the JSON response format
With Meter, you simply describe what you want: “Extract stock symbol, current price, and daily change.” Meter automatically:
  • Discovers the quote API endpoint
  • Captures any required authentication tokens
  • Extracts the structured data from API responses
  • Returns clean, normalized data
The result is faster extraction, cleaner data, and a strategy that’s resilient to UI redesigns—because you’re hitting the same API the site uses internally.
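In code, this scenario might look like the sketch below. It reuses the `generate_strategy` call from the earlier example; the strategy name is a placeholder, and `client` is assumed to be a `MeterClient` created as shown earlier.

```python
QUOTE_DESCRIPTION = "Extract stock symbol, current price, and daily change"


def generate_quote_strategy(client, url: str) -> dict:
    """Ask Meter for a quote-extraction strategy and print a short preview."""
    result = client.generate_strategy(
        url=url,
        description=QUOTE_DESCRIPTION,
        name="Stock Quotes",  # placeholder name
    )
    for item in result["preview_data"][:3]:
        print(item)
    return result
```

You would call it as `generate_quote_strategy(client, url)` with the address of the quotes page you want to extract from.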

Strategy lifecycle

  1. Generate: Create strategy with AI
  2. Preview: Check the preview_data to verify extraction
  3. Refine (optional): Provide feedback if something’s missing
  4. Use: Run jobs with the strategy
  5. Monitor: Check if results are still accurate over time
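The first four steps can be strung together in one helper. This is a minimal sketch under stated assumptions: `client` is a `MeterClient`, the function names `missing_fields` and `build_verified_strategy` are hypothetical, and only one refinement round is attempted.

```python
def missing_fields(preview_data, required_fields):
    """Return required field names absent from the first preview item."""
    if not preview_data:
        return set(required_fields)
    return set(required_fields) - set(preview_data[0].keys())


def build_verified_strategy(client, url, description, name, required_fields):
    """Generate, check the preview, refine once if needed, and return the ID."""
    # 1. Generate
    result = client.generate_strategy(url=url, description=description, name=name)
    strategy_id = result["strategy_id"]

    # 2. Preview
    missing = missing_fields(result["preview_data"], required_fields)

    # 3. Refine (optional)
    if missing:
        refined = client.refine_strategy(
            strategy_id=strategy_id,
            feedback=f"Also extract: {', '.join(sorted(missing))}",
        )
        missing = missing_fields(refined["preview_data"], required_fields)
    if missing:
        raise ValueError(f"Still missing fields: {sorted(missing)}")

    # 4. Use: the returned ID is ready to pass to client.create_job(...)
    return strategy_id
```

Raising an error when fields are still missing keeps an incomplete strategy from being used for jobs unnoticed.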

Refining strategies

If the initial extraction isn’t perfect, refine it with feedback:
# Initial generation
result = client.generate_strategy(
    url="https://shop.com/products",
    description="Extract product info",
    name="Product Scraper"
)

# Check preview - oops, missing images
print(result['preview_data'])  # No 'image' field

# Refine with feedback
refined = client.refine_strategy(
    strategy_id=result['strategy_id'],
    feedback="Also extract product images"
)

# Check again
print(refined['preview_data'])  # Now has 'image' field
Refinement uses cached HTML from the initial generation, so it’s fast and doesn’t re-fetch the page.

When to create new strategies

Create a new strategy when:

  • Different Site Structure: Each website layout needs its own strategy.
  • Different Data Fields: Different extraction requirements need different strategies.
  • Major Site Redesign: A site has changed its HTML structure significantly.
  • Different Page Types: Product pages and category pages need separate strategies.

Reusing strategies

You can reuse the same strategy across:
  • Multiple URLs on the same site (e.g., different products)
  • Pagination (if the structure is consistent)
  • Similar pages (if they share HTML structure)
# Generate once
strategy = client.generate_strategy(
    url="https://shop.com/product/123",
    description="Extract name, price, description"
)
strategy_id = strategy['strategy_id']

# Reuse for different products
job1 = client.create_job(strategy_id, "https://shop.com/product/123")
job2 = client.create_job(strategy_id, "https://shop.com/product/456")
job3 = client.create_job(strategy_id, "https://shop.com/product/789")
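For pagination or long URL lists, the same pattern works in a loop. The sketch below assumes `create_job` accepts the strategy ID and URL positionally, as in the example above; the helper name and the `?page=` URL pattern are hypothetical.

```python
def create_jobs_for_urls(client, strategy_id: str, urls: list[str]) -> list:
    """Create one job per URL, all sharing the same strategy."""
    return [client.create_job(strategy_id, url) for url in urls]


# Hypothetical paginated listing; the ?page= pattern is an assumption.
page_urls = [f"https://shop.com/products?page={n}" for n in range(1, 4)]
print(page_urls)
```

Calling `create_jobs_for_urls(client, strategy_id, page_urls)` would then create three jobs from a single strategy.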

Strategy management

Listing strategies

# Get all strategies
strategies = client.list_strategies(limit=20)

for strategy in strategies:
    print(f"{strategy['name']}: {strategy['strategy_id']}")
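When you only know a strategy’s name, a small lookup helper can recover its ID. This sketch relies only on the `list_strategies(limit=...)` call shown above; the helper name is hypothetical, and it searches just the first `limit` results because pagination behavior is not covered here.

```python
def find_strategy_by_name(client, name: str, limit: int = 20):
    """Return the first listed strategy with a matching name, or None."""
    for strategy in client.list_strategies(limit=limit):
        if strategy["name"] == name:
            return strategy
    return None
```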

Getting strategy details

strategy = client.get_strategy(strategy_id)

print(f"Name: {strategy['name']}")
print(f"Description: {strategy['description']}")
print(f"Created: {strategy['created_at']}")
print(f"Preview: {strategy['preview_data']}")

Deleting strategies

# Delete a strategy (also deletes associated jobs and schedules)
client.delete_strategy(strategy_id)
Deleting a strategy also deletes all associated jobs and schedules. This action cannot be undone.

Best practices

Give strategies clear names that describe their purpose:
  • Good: "HN Front Page - Titles and Scores"
  • Bad: "Strategy 1"
This helps when managing multiple strategies.
Always check preview_data before creating jobs:
result = client.generate_strategy(...)

# Verify all required fields are present
required_fields = {'title', 'price', 'image'}
preview = result['preview_data']
# Guard against an empty preview before inspecting the first item
actual_fields = set(preview[0].keys()) if preview else set()

if not required_fields.issubset(actual_fields):
    # Sort so the feedback message is stable between runs
    missing = sorted(required_fields - actual_fields)
    client.refine_strategy(
        strategy_id=result['strategy_id'],
        feedback=f"Also extract: {', '.join(missing)}"
    )
Provide clear, specific extraction instructions:
  • Good: “Extract product name, price with currency, main image URL, and stock availability from the product grid”
  • Bad: “Get products”
Specific descriptions lead to better strategies on the first try.
Strategies can break if sites change their HTML:
# Check recent jobs for failures
jobs = client.list_jobs(
    strategy_id=strategy_id,
    status='failed',
    limit=5
)

if jobs:
    print(f"Strategy {strategy_id} may need updating")
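The same check can be run across several strategies at once. This is a sketch assuming `list_jobs` returns a list, as the `len(jobs)` call above suggests; the helper name is hypothetical.

```python
def strategies_with_failures(client, strategy_ids, limit: int = 5) -> dict:
    """Map each strategy ID to its count of recent failed jobs (if any)."""
    flagged = {}
    for strategy_id in strategy_ids:
        failed = client.list_jobs(
            strategy_id=strategy_id,
            status="failed",
            limit=limit,
        )
        if failed:
            flagged[strategy_id] = len(failed)
    return flagged
```

Because at most `limit` jobs are fetched per strategy, the counts are capped at that value; they indicate that something is wrong rather than how often.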

Troubleshooting

Problem: Strategy generation fails
Possible causes:
  • URL is not accessible
  • Page requires authentication
  • Description is too vague
Solutions:
  • Verify the URL loads in a browser
  • For auth-required pages, contact support
  • Make your description more specific
Problem: Some expected fields aren’t in preview_data
Solution: Use refinement:
client.refine_strategy(
    strategy_id=strategy_id,
    feedback="Also extract the product SKU and brand name"
)
Problem: Jobs that worked before now fail or return incorrect data
Cause: Website HTML structure changed
Solutions:
  1. Generate a new strategy for the updated site
  2. Update your jobs to use the new strategy
  3. Delete the old strategy
Problem: Meter detected an API but returns no data
Possible causes:
  • The API requires authentication that expired
  • The site changed its API endpoints
  • Rate limiting is blocking requests
Solutions:
  • Generate a fresh strategy to capture new authentication tokens
  • If the site has changed significantly, the strategy may need regeneration
  • For rate-limited sites, reduce scrape frequency

Need help?

Email me at mckinnon@meter.sh