Post-Extraction Filtering
Post-extraction filtering lets you define conditions that extraction results must match. Only items that pass the filter are kept; everything else is discarded. Filtering runs after extraction, so you are filtering structured data, not raw HTML.
When to use filters
- Keep only products above a price threshold
- Exclude items in certain categories
- Match URLs against a pattern
- Filter for items that have a specific field present
- Combine conditions with AND/OR logic
Filter structure
A filter configuration has two parts:
- Mode: all (AND) or any (OR)
- Conditions: a list of field/operator/value checks
With mode: "all", an item must match every condition. With mode: "any", an item matches if at least one condition is true.
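To make the two modes concrete, here is a minimal Python sketch. The key names ("mode", "conditions", "field", "operator", "value") are assumed from the description above rather than taken from a confirmed schema, and combine is a hypothetical helper that only illustrates the AND/OR semantics.

```python
# Hypothetical filter configuration. The key names are assumptions based on
# the description above, not a confirmed schema.
filter_config = {
    "mode": "all",
    "conditions": [
        {"field": "price", "operator": "gt", "value": 50},
        {"field": "category", "operator": "equals", "value": "electronics"},
    ],
}


def combine(mode: str, results: list[bool]) -> bool:
    """Combine per-condition results: "all" behaves as AND, "any" as OR."""
    if mode == "all":
        return all(results)
    if mode == "any":
        return any(results)
    raise ValueError(f"unknown mode: {mode!r}")


# An item that satisfies the first condition but not the second.
per_condition = [True, False]
print(combine("all", per_condition))  # False
print(combine("any", per_condition))  # True
```

The same item is dropped under "all" and kept under "any", which is the only difference between the two modes.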
Operator reference
| Operator | Description | Requires value |
|---|---|---|
| contains | Field contains substring | Yes |
| not_contains | Field does not contain substring | Yes |
| equals | Exact match | Yes |
| not_equals | Not an exact match | Yes |
| regex_match | Regex pattern match | Yes |
| exists | Field exists and is non-empty | No |
| not_exists | Field is missing or empty | No |
| gt | Greater than (numeric comparison) | Yes |
| lt | Less than (numeric comparison) | Yes |
Conditions also accept a case_sensitive flag (default: false).
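The operator table can be read as a small decision procedure. The sketch below is one plausible local interpretation in Python, not the service's implementation. Several details are assumptions: "empty" is taken to mean None or a blank string, regex_match uses an unanchored search, case_sensitive is applied only to the string operators, and values that cannot be parsed as numbers never satisfy gt or lt. The function name check_condition is hypothetical.

```python
import re
from typing import Any


def _is_empty(value: Any) -> bool:
    # Assumption: "empty" means None or a blank (whitespace-only) string.
    return value is None or (isinstance(value, str) and value.strip() == "")


def check_condition(item: dict[str, Any], condition: dict[str, Any]) -> bool:
    """Evaluate one field/operator/value condition against an extracted item."""
    op = condition["operator"]
    actual = item.get(condition["field"])

    # Operators that take no value.
    if op == "exists":
        return not _is_empty(actual)
    if op == "not_exists":
        return _is_empty(actual)

    expected = condition["value"]

    # Numeric comparisons.
    if op in ("gt", "lt"):
        try:
            left, right = float(actual), float(expected)
        except (TypeError, ValueError):
            return False  # assumption: non-numeric values never match
        return left > right if op == "gt" else left < right

    # String operators; a missing field is treated as an empty string.
    case_sensitive = condition.get("case_sensitive", False)
    text = "" if actual is None else str(actual)
    target = str(expected)
    if op == "regex_match":
        flags = 0 if case_sensitive else re.IGNORECASE
        return re.search(target, text, flags) is not None
    if not case_sensitive:
        text, target = text.lower(), target.lower()
    if op == "contains":
        return target in text
    if op == "not_contains":
        return target not in text
    if op == "equals":
        return text == target
    if op == "not_equals":
        return text != target
    raise ValueError(f"unknown operator: {op!r}")


item = {"title": "Wireless Headphones", "price": "129.00"}
print(check_condition(item, {"field": "title", "operator": "contains", "value": "headphones"}))  # True
print(check_condition(item, {"field": "price", "operator": "gt", "value": 100}))  # True
print(check_condition(item, {"field": "sku", "operator": "not_exists"}))  # True
```

Adding "case_sensitive": True to the first condition would make it fail in this sketch, because the title capitalizes "Headphones".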
Where filters apply
Strategy generation
Pass a filter_config when generating a strategy to filter results at extraction time.
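The exact request shape is not shown here, so the following Python sketch only illustrates where the filter would sit in a request body. Only the filter_config key comes from the text above. The "url" and "prompt" keys, the build_strategy_request helper, and the example values are hypothetical placeholders, and no endpoint is assumed.

```python
import json


def build_strategy_request(url: str, prompt: str, filter_config: dict) -> str:
    """Serialize a hypothetical strategy-generation request body to JSON."""
    # "url" and "prompt" are placeholder keys; only "filter_config" is
    # named in the documentation above.
    body = {"url": url, "prompt": prompt, "filter_config": filter_config}
    return json.dumps(body, indent=2)


request_body = build_strategy_request(
    url="https://example.com/products",
    prompt="Extract product name, price, and category",
    filter_config={
        "mode": "all",
        "conditions": [{"field": "price", "operator": "gt", "value": 100}],
    },
)
print(request_body)
```

Building the body separately from sending it keeps the filter easy to inspect or reuse before it is attached to a real request.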
Strategy updates
Update an existing strategy’s filter through the Strategy API.
Workflow edges
Filters on workflow edges control which results pass between nodes. See Workflows for details.
Watch creation
Pass filter_config when creating a watch (combined strategy + schedule).
Examples
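As a rough sketch, the four scenarios in this section could be written as the Python dictionaries below. The field names (price, category, url, availability), the thresholds, and the category values are illustrative assumptions, as are the "mode" and "conditions" key names. The validate helper is hypothetical and only checks each configuration against the mode and operator rules described earlier on this page.

```python
# Operators taken from the operator reference table.
VALUE_REQUIRED = {
    "contains", "not_contains", "equals", "not_equals",
    "regex_match", "gt", "lt",
}
NO_VALUE = {"exists", "not_exists"}

# Price threshold filtering: keep items priced above 100.
price_threshold = {
    "mode": "all",
    "conditions": [{"field": "price", "operator": "gt", "value": 100}],
}

# Category filtering with OR logic: keep items in either category.
category_any = {
    "mode": "any",
    "conditions": [
        {"field": "category", "operator": "equals", "value": "electronics"},
        {"field": "category", "operator": "equals", "value": "appliances"},
    ],
}

# Regex URL pattern matching: keep items whose URL looks like a product page.
url_pattern = {
    "mode": "all",
    "conditions": [
        {"field": "url", "operator": "regex_match", "value": r"/products/\d+"},
    ],
}

# Combined AND conditions: in stock and above a premium price point.
in_stock_premium = {
    "mode": "all",
    "conditions": [
        {"field": "availability", "operator": "contains", "value": "in stock"},
        {"field": "price", "operator": "gt", "value": 500},
    ],
}


def validate(config: dict) -> list[str]:
    """Return a list of problems found in a filter configuration."""
    errors = []
    if config.get("mode") not in ("all", "any"):
        errors.append("mode must be 'all' or 'any'")
    for index, cond in enumerate(config.get("conditions", [])):
        op = cond.get("operator")
        if op not in VALUE_REQUIRED | NO_VALUE:
            errors.append(f"condition {index}: unknown operator {op!r}")
        elif op in VALUE_REQUIRED and "value" not in cond:
            errors.append(f"condition {index}: operator {op!r} requires a value")
    return errors


for name, config in [
    ("price_threshold", price_threshold),
    ("category_any", category_any),
    ("url_pattern", url_pattern),
    ("in_stock_premium", in_stock_premium),
]:
    print(name, validate(config))
```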
Price threshold filtering
Keep only items above a minimum price.
Category filtering with OR logic
Keep items in any of several categories.
Regex URL pattern matching
Keep items whose URL matches a pattern.
Combined AND conditions
Keep only in-stock premium products.
Next steps
- Strategies: Apply filters during strategy generation
- Workflows: Use filters on workflow edges between nodes
- Output Schemas: Define extraction result structure
- Strategy API: Update strategy filters via REST API