
Post-Extraction Filtering

Post-extraction filtering lets you define conditions that extraction results must match. Only items that pass the filter are kept — everything else is discarded. This runs after extraction, so you’re filtering structured data, not raw HTML.

When to use filters

  • Keep only products above a price threshold
  • Exclude items in certain categories
  • Match URLs against a pattern
  • Filter for items that have a specific field present
  • Combine conditions with AND/OR logic

Filter structure

A filter configuration has two parts:
  1. Mode: all (AND) or any (OR)
  2. Conditions: a list of field/operator/value checks
{
  "mode": "all",
  "conditions": [
    {"field": "price", "operator": "gt", "value": "50"},
    {"field": "category", "operator": "contains", "value": "electronics"}
  ]
}
With mode: "all", an item must match every condition. With mode: "any", an item matches if at least one condition is true.
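
As a minimal sketch of the rule above, the two modes behave like Python's built-in all() and any() applied to the per-condition results. The combine function is a hypothetical local helper for illustration, not part of the Meter SDK.

```python
from typing import Iterable


def combine(mode: str, condition_results: Iterable[bool]) -> bool:
    """Combine per-condition results according to the filter mode."""
    results = list(condition_results)
    if mode == "all":
        # AND: every condition must be true
        return all(results)
    if mode == "any":
        # OR: at least one condition must be true
        return any(results)
    raise ValueError(f"unknown mode: {mode!r}")


# An item that passes the price check but fails the category check
checks = [True, False]
print(combine("all", checks))  # False
print(combine("any", checks))  # True
```

How the service treats an empty conditions list is not stated here, so the sketch does not try to model that case.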

Operator reference

| Operator     | Description                       | Requires value |
|--------------|-----------------------------------|----------------|
| contains     | Field contains substring          | Yes            |
| not_contains | Field does not contain substring  | Yes            |
| equals       | Exact match                       | Yes            |
| not_equals   | Not an exact match                | Yes            |
| regex_match  | Regex pattern match               | Yes            |
| exists       | Field exists and is non-empty     | No             |
| not_exists   | Field is missing or empty         | No             |
| gt           | Greater than (numeric comparison) | Yes            |
| lt           | Less than (numeric comparison)    | Yes            |
All string operators support an optional case_sensitive flag (default: false).
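
To try a condition against sample data before deploying it, a rough local approximation of the operators can help. The sketch below is an assumption-laden re-implementation, not the service's actual logic: the matches function is hypothetical, values are compared as floats for gt/lt, a missing field fails every value-based check, regex_match uses Python's re.search, and case_sensitive is assumed to apply to regex_match as well.

```python
import re
from typing import Any, Dict


def _is_empty(value: Any) -> bool:
    """Treat None, empty strings, and empty containers as 'missing or empty'."""
    return value is None or value == "" or value == [] or value == {}


def matches(item: Dict[str, Any], cond: Dict[str, Any]) -> bool:
    """Evaluate one condition against one extracted item (local approximation)."""
    op = cond["operator"]
    actual = item.get(cond["field"])

    if op == "exists":
        return not _is_empty(actual)
    if op == "not_exists":
        return _is_empty(actual)
    if _is_empty(actual):
        return False  # assumption: missing fields fail all value-based checks

    expected = cond["value"]
    if op in ("gt", "lt"):
        try:
            a, b = float(actual), float(expected)
        except (TypeError, ValueError):
            return False  # assumption: non-numeric values never match
        return a > b if op == "gt" else a < b

    text, target = str(actual), str(expected)
    case_sensitive = cond.get("case_sensitive", False)  # documented default: false
    if op == "regex_match":
        flags = 0 if case_sensitive else re.IGNORECASE
        return re.search(target, text, flags) is not None
    if not case_sensitive:
        text, target = text.lower(), target.lower()
    if op == "contains":
        return target in text
    if op == "not_contains":
        return target not in text
    if op == "equals":
        return text == target
    if op == "not_equals":
        return text != target
    raise ValueError(f"unknown operator: {op!r}")


item = {"price": "129.99", "category": "Consumer Electronics"}
print(matches(item, {"field": "price", "operator": "gt", "value": "50"}))  # True
print(matches(item, {"field": "category", "operator": "contains",
                     "value": "electronics"}))  # True
print(matches(item, {"field": "category", "operator": "contains",
                     "value": "electronics", "case_sensitive": True}))  # False
```

The last two calls show the effect of the case_sensitive flag: with the default of false, "electronics" matches "Consumer Electronics"; with the flag set to true it does not.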

Where filters apply

Strategy generation

Pass a filter_config when generating a strategy so that results are filtered as soon as they are extracted:
# Assumes `client` is an already-initialized Meter SDK client
result = client.generate_strategy(
    url="https://shop.com/products",
    description="Extract product listings",
    name="Premium Products",
    filter_config={
        "mode": "all",
        "conditions": [
            {"field": "price", "operator": "gt", "value": "100"},
            {"field": "in_stock", "operator": "equals", "value": "true"}
        ]
    }
)

Strategy updates

Update an existing strategy’s filter:
curl -X PATCH https://api.meter.sh/api/strategies/{strategy_id} \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "filter_config": {
      "mode": "any",
      "conditions": [
        {"field": "category", "operator": "equals", "value": "electronics"},
        {"field": "category", "operator": "equals", "value": "computers"}
      ]
    }
  }'
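
The same PATCH request can be built from Python. This is a sketch using only the standard library; the endpoint, headers, and body shape are copied from the curl example above, while build_filter_update, the strategy ID, and the API key are placeholders introduced for illustration.

```python
import json
import urllib.request
from typing import Any, Dict

API_BASE = "https://api.meter.sh/api"  # base URL taken from the curl example


def build_filter_update(
    strategy_id: str, api_key: str, filter_config: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that replaces a strategy's filter."""
    body = json.dumps({"filter_config": filter_config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/strategies/{strategy_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_filter_update(
    "your-strategy-id",  # placeholder
    "sk_live_...",       # placeholder, as in the curl example
    {
        "mode": "any",
        "conditions": [
            {"field": "category", "operator": "equals", "value": "electronics"},
            {"field": "category", "operator": "equals", "value": "computers"},
        ],
    },
)
print(request.get_method(), request.full_url)
```

Calling urllib.request.urlopen(request) would send it; the response format is not covered on this page, so the sketch stops at building the request.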

Workflow edges

Filters on workflow edges control which results pass between nodes. See Workflows for details.
from meter_sdk.workflow import Workflow, Filter

# index_strategy_id and article_strategy_id are the IDs of strategies you have already created
workflow = Workflow("Filtered Pipeline")
index = workflow.start("index", index_strategy_id, urls=["https://news.com"])

# Only pass articles in the technology section
tech = index.then(
    "tech",
    article_strategy_id,
    url_field="link",
    filter=Filter.contains("category", "technology")
)

Watch creation

Pass filter_config when creating a watch (combined strategy + schedule):
curl -X POST https://api.meter.sh/api/watch \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://jobs.com/listings",
    "description": "Extract job listings",
    "name": "Remote Jobs",
    "interval_seconds": 3600,
    "filter_config": {
      "mode": "all",
      "conditions": [
        {"field": "location", "operator": "contains", "value": "remote"}
      ]
    }
  }'

Examples

Price threshold filtering

Keep only items above a minimum price:
{
  "mode": "all",
  "conditions": [
    {"field": "price", "operator": "gt", "value": "50"}
  ]
}

Category filtering with OR logic

Keep items in any of several categories:
{
  "mode": "any",
  "conditions": [
    {"field": "category", "operator": "equals", "value": "electronics"},
    {"field": "category", "operator": "equals", "value": "computers"},
    {"field": "category", "operator": "equals", "value": "phones"}
  ]
}
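
When the list of accepted values is long or comes from data, the config above can be generated rather than written by hand. A small sketch, where any_of is a hypothetical helper and not an SDK function:

```python
from typing import Dict, Iterable, List


def any_of(field: str, values: Iterable[str]) -> Dict[str, object]:
    """Build an OR filter: keep items whose field equals any of the values."""
    conditions: List[Dict[str, str]] = [
        {"field": field, "operator": "equals", "value": value} for value in values
    ]
    return {"mode": "any", "conditions": conditions}


config = any_of("category", ["electronics", "computers", "phones"])
print(len(config["conditions"]))  # 3
```

The resulting dictionary has the same shape as the JSON above and can be passed wherever a filter_config is accepted.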

Regex URL pattern matching

Keep items whose URL matches a pattern:
{
  "mode": "all",
  "conditions": [
    {"field": "url", "operator": "regex_match", "value": "/products/\\d+$"}
  ]
}
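
Note that the doubled backslash is JSON escaping: once parsed, the pattern is /products/\d+$. The sketch below checks that pattern against a few sample URLs with Python's re module. It assumes the service matches anywhere in the field (search semantics) and accepts Python-compatible regex syntax; neither is confirmed here, and the sample URLs are made up.

```python
import json
import re

# The condition exactly as it appears in the JSON config
raw = r'{"field": "url", "operator": "regex_match", "value": "/products/\\d+$"}'
condition = json.loads(raw)

pattern = re.compile(condition["value"])
print(pattern.pattern)  # /products/\d+$

samples = [
    "https://shop.com/products/123",
    "https://shop.com/products/123/reviews",
    "https://shop.com/products/abc",
    "https://shop.com/products/123?ref=home",
]
results = {url: pattern.search(url) is not None for url in samples}
for url, kept in results.items():
    print("keep" if kept else "drop", url)
```

Only the first URL is kept. The trailing $ anchors the numeric ID to the end of the string, so URLs with a further path segment or a query string are dropped.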

Combined AND conditions

Keep only in-stock premium products:
{
  "mode": "all",
  "conditions": [
    {"field": "price", "operator": "gt", "value": "100"},
    {"field": "in_stock", "operator": "equals", "value": "true"},
    {"field": "image_url", "operator": "exists"}
  ]
}

Next steps

  • Strategies: Apply filters during strategy generation
  • Workflows: Use filters on workflow edges between nodes
  • Output Schemas: Define extraction result structure
  • Strategy API: Update strategy filters via REST API

Need help?

Email me at mckinnon@meter.sh