Change Detection
Meter’s change detection system identifies when scraped content has actually changed, filtering out layout updates, ads, and timestamps that don’t represent meaningful updates.
Why change detection matters
Traditional scraping wastes resources by re-processing unchanged data. For RAG systems, this means:
- Wasted embeddings: Re-embedding identical content
- Stale timestamps: Triggers on irrelevant date changes
- Layout noise: Reacting to CSS class or ad changes
- Higher costs: Unnecessary API calls and storage
How it works
Meter generates multiple signatures for each scrape job:
Content Hash
A hash of the extracted data itself. Changes only if the actual content changes.
Structural Signature
A fingerprint of the content structure and patterns. Detects additions, removals, and reordering.
Content hash
The content hash is a cryptographic hash of the extracted data. It changes when:
- Text content is different
- Prices, numbers, or values change
- New items appear or old ones disappear
- Item order changes significantly
It does not change when only these change:
- CSS classes or styling
- Ad content (if not part of extraction)
- Timestamps (if not extracted)
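The idea of hashing only the extracted data can be sketched in a few lines. This is a minimal illustration, not Meter's implementation: the page does not say which hash function or serialization Meter uses, so SHA-256 over canonical JSON is an assumption, and `content_hash` is a hypothetical helper name.

```python
import hashlib
import json


def content_hash(items):
    """Hash only the extracted fields (assumed: SHA-256 over canonical JSON)."""
    # sort_keys makes the hash independent of dict key order.
    # List order is preserved, so ANY reordering changes this hash, which is
    # stricter than the "significant" reordering described above.
    canonical = json.dumps(items, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


before = [{"name": "Widget", "price": 19.99}]
same_data_new_layout = [{"price": 19.99, "name": "Widget"}]  # same extracted data
price_changed = [{"name": "Widget", "price": 17.99}]

print(content_hash(before) == content_hash(same_data_new_layout))  # True
print(content_hash(before) == content_hash(price_changed))  # False
```

Because the hash input is the extracted data rather than the raw HTML, anything outside the extraction (styling, ads, unextracted timestamps) cannot affect it.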
Structural signature
The structural signature captures patterns in the data. It detects:
- Changes in the number of items
- Field presence/absence
- Data type changes
- List length changes
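A structural fingerprint of this kind could look roughly like the sketch below. It is an assumption about the general technique, not Meter's actual signature format: `structural_signature` is a hypothetical name, and this version covers only item count, field presence, and value types (it does not detect reordering).

```python
def structural_signature(items):
    """Summarize shape, not values: item count plus (field, type) pairs."""
    fields = sorted(
        {(key, type(value).__name__) for item in items for key, value in item.items()}
    )
    return {"count": len(items), "fields": fields}


old = [{"name": "Widget", "price": 19.99}]
new_value = [{"name": "Widget", "price": 17.99}]    # value changed, shape unchanged
new_type = [{"name": "Widget", "price": "17.99"}]   # price became a string
new_item = old + [{"name": "Gadget", "price": 5.0}]  # one more item

print(structural_signature(old) == structural_signature(new_value))  # True
print(structural_signature(old) == structural_signature(new_type))   # False
print(structural_signature(old) == structural_signature(new_item))   # False
```

A value-only change leaves this signature untouched, which is why it complements the content hash rather than replacing it.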
Comparing jobs
Automatic comparison (schedules)
Schedules automatically compare new jobs with previous ones.
Manual comparison
Compare two specific jobs.
Change detection strategies
Pull-based monitoring
Poll for changes periodically.
Webhook-based monitoring
Receive immediate notifications.
Use cases
RAG system updates
Only re-embed when content changes.
Savings: Up to 95% reduction in embedding costs
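Gating the embedding step on a content hash might be sketched as follows. Everything here is illustrative: `sync_embeddings`, the `stored_hashes` dict, and the `embed` callback are hypothetical stand-ins for your own pipeline, and the hash is computed locally rather than taken from Meter.

```python
import hashlib
import json


def sync_embeddings(url, items, stored_hashes, embed):
    """Call embed(items) only if the extracted data differs from the last run."""
    digest = hashlib.sha256(json.dumps(items, sort_keys=True).encode("utf-8")).hexdigest()
    if stored_hashes.get(url) == digest:
        return False  # unchanged: skip the embedding call entirely
    embed(items)
    stored_hashes[url] = digest  # remember what was embedded
    return True


calls = []   # stands in for a real embedding client
hashes = {}  # stands in for persistent storage
sync_embeddings("https://example.com/docs", [{"text": "v1"}], hashes, calls.append)
sync_embeddings("https://example.com/docs", [{"text": "v1"}], hashes, calls.append)  # skipped
sync_embeddings("https://example.com/docs", [{"text": "v2"}], hashes, calls.append)
print(len(calls))  # 2
```

The actual savings depend on how often the source content really changes between scrapes.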
Price monitoring
Alert only on actual price changes.
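One way to alert only on real price movement is to compare the price field of matching items across two job results. The sketch below assumes each result is a list of dicts with `name` and `price` fields; those field names and the `price_changes` helper are hypothetical.

```python
def price_changes(previous, current, key="name", price_field="price"):
    """Return (key, old_price, new_price) for items whose price differs."""
    old_prices = {item[key]: item[price_field] for item in previous}
    changes = []
    for item in current:
        old = old_prices.get(item[key])
        # Items that are new in `current` have no old price and are skipped here.
        if old is not None and old != item[price_field]:
            changes.append((item[key], old, item[price_field]))
    return changes


previous = [{"name": "Widget", "price": 19.99}, {"name": "Gadget", "price": 5.00}]
current = [{"name": "Widget", "price": 17.99}, {"name": "Gadget", "price": 5.00}]
print(price_changes(previous, current))  # [('Widget', 19.99, 17.99)]
```

An alert would then fire only when the returned list is non-empty.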
Content freshness tracking
Track when content was last updated.
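Freshness tracking can be reduced to two timestamps per source: when it was last checked and when its content last changed. The sketch below is an assumption about how you might store this on your side; `record_check` and the `state` dict layout are hypothetical.

```python
from datetime import datetime, timezone


def record_check(state, content_hash, now=None):
    """Always update last_checked; update last_changed only when the hash differs."""
    now = now or datetime.now(timezone.utc)
    if state.get("content_hash") != content_hash:
        state["content_hash"] = content_hash
        state["last_changed"] = now
    state["last_checked"] = now
    return state


t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 1, 2, tzinfo=timezone.utc)
state = record_check({}, "abc", t1)
record_check(state, "abc", t2)  # same hash: only last_checked moves

print(state["last_changed"] == t1)  # True
print(state["last_checked"] == t2)  # True
```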
Filtering noise
Meter’s change detection automatically filters:
- Layout changes: CSS classes, div structure changes
- Ad rotations: If ads aren’t part of your extraction strategy
- Timestamps: If not included in extraction fields
- Order changes: Minor reordering that doesn’t affect content
Focus extractions
Be specific about what you extract.
Compare strategically
Only compare the fields that matter.
Roadmap: Semantic similarity
Coming soon: Semantic similarity detection using embeddings to detect meaning-level changes even when wording differs.
- Semantic comparison of text content
- Paraphrase detection
- Meaning-level change scoring
Best practices
Mark changes as seen promptly
Avoid duplicate processing by marking changes as seen.
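The same principle can be illustrated with local bookkeeping. This sketch does not use Meter's own seen-state; it assumes each change carries a unique `id` field (hypothetical) and records it only after the handler succeeds, so a failed handler leaves the change available for retry.

```python
def process_unseen(changes, seen_ids, handler):
    """Run handler once per change; record the ID only after the handler succeeds."""
    processed = 0
    for change in changes:
        if change["id"] in seen_ids:
            continue  # already handled in an earlier run
        handler(change)
        seen_ids.add(change["id"])  # not reached if handler raises
        processed += 1
    return processed


changes = [{"id": "c1"}, {"id": "c2"}]
seen = set()
handled = []

print(process_unseen(changes, seen, handled.append))  # 2
print(process_unseen(changes, seen, handled.append))  # 0
```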
Handle empty changes gracefully
Not every scrape detects changes, so treat an empty result as a normal outcome.
Log change detection for debugging
Track when changes are detected.
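A small logging wrapper, using Python's standard `logging` module, might look like this. The `log_check` helper and the `sched_123` identifier are hypothetical; the point is to log one line per check, including checks that find nothing.

```python
import logging

logger = logging.getLogger("change_detection")


def log_check(schedule_id, changes):
    """Log one line per check so quiet periods and bursts are both visible."""
    if not changes:
        logger.info("schedule=%s no changes", schedule_id)
        return 0
    logger.info("schedule=%s changes=%d", schedule_id, len(changes))
    return len(changes)


logging.basicConfig(level=logging.INFO, format="%(asctime)s %(name)s %(message)s")
log_check("sched_123", [])
log_check("sched_123", [{"id": "c1"}])
```

Logging the empty case as well makes it easier to tell "no changes" apart from "the check never ran".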
Troubleshooting
Too many false positives
Problem: Changes are detected for minor updates
Solutions:
- Make the extraction more specific (exclude dynamic elements)
- Regenerate the strategy with a clearer description
- Implement custom filtering logic on top of Meter’s detection
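Custom filtering on top of the detection could be as simple as projecting each item onto the fields you care about before comparing. The helper names `project` and `meaningful_change` and the sample fields are hypothetical.

```python
def project(items, fields):
    """Keep only the named fields so volatile ones cannot trigger a change."""
    return [{name: item.get(name) for name in fields} for item in items]


def meaningful_change(previous, current, fields):
    """True if the two results differ in any of the selected fields."""
    return project(previous, fields) != project(current, fields)


previous = [{"title": "Intro", "views": 100, "updated": "2024-01-01"}]
current = [{"title": "Intro", "views": 245, "updated": "2024-01-02"}]

print(meaningful_change(previous, current, ["title"]))           # False
print(meaningful_change(previous, current, ["title", "views"]))  # True
```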
Missing real changes
Problem: Actual changes aren’t detected
Possible causes:
- Changes already marked as seen
- Looking at the wrong schedule
- Strategy extraction failing
Solutions:
- Use mark_seen=False to check without affecting state
- Verify the schedule ID
- Check recent jobs for failures: client.list_jobs(status='failed')
Understanding change signatures
Problem: You want to understand why a change was detected
Solution: Compare the two jobs manually.
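If you have the extracted results of both jobs in hand, an item-level diff often explains the detection. The sketch below assumes each result is a list of dicts with a stable `id` field; that field and the `diff_items` helper are hypothetical and independent of Meter's own comparison methods.

```python
def diff_items(previous, current, key="id"):
    """Report which items were added, removed, or changed, and in which fields."""
    old = {item[key]: item for item in previous}
    new = {item[key]: item for item in current}
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = {
        k: sorted(f for f in old[k].keys() | new[k].keys() if old[k].get(f) != new[k].get(f))
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


previous = [{"id": "a", "price": 10}, {"id": "b", "price": 20}]
current = [{"id": "a", "price": 12}, {"id": "c", "price": 30}]
print(diff_items(previous, current))
# {'added': ['c'], 'removed': ['b'], 'changed': {'a': ['price']}}
```

An empty diff alongside a detected change suggests the difference lies in item order or in fields the diff key does not cover.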
Next steps
Pull-Based Monitoring
Implement change polling in your application
Webhooks
Set up real-time change notifications
RAG Integration
Connect change detection to your vector database
Jobs API Reference
Explore job comparison methods