Manifest Comparison
Manifest comparison lets you submit a list of known items (a “manifest”) and compare it against your scrape results using fuzzy matching. Meter identifies which items were added, removed, or still present, even when names don’t match exactly.
When to use manifest comparison
- Track portfolio companies on a firm’s website — detect when new companies are added or removed
- Monitor team pages for personnel changes
- Compare a known product catalog against a competitor’s current listings
- Verify that a list of partners or clients on a website matches your records
How it works
- You scrape a page using a strategy (e.g., extract company names from a portfolio page)
- You submit your manifest — a JSON list of items you already know about
- Meter fuzzy-matches each manifest item against the scraped results
- You get back three lists: matched, added, and removed
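The steps above can be sketched locally to show what the three lists mean. This is a minimal illustration, not Meter's implementation: the `score` function uses Python's `difflib` as a stand-in scorer, and `compare_manifest` is a hypothetical helper that works on plain name strings.

```python
from difflib import SequenceMatcher


def score(a: str, b: str) -> float:
    """Similarity on a 0-100 scale (stand-in scorer, not Meter's)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def compare_manifest(manifest, scraped, threshold=80):
    """Split names into matched / added / removed lists."""
    matched, removed = [], []
    unclaimed = list(scraped)
    for item in manifest:
        # Pick the closest scraped name that has not been claimed yet.
        best = max(unclaimed, key=lambda s: score(item, s), default=None)
        if best is not None and score(item, best) >= threshold:
            matched.append((item, best))
            unclaimed.remove(best)
        else:
            removed.append(item)  # known item no longer found on the page
    # Whatever was scraped but never claimed is new relative to the manifest.
    return {"matched": matched, "added": unclaimed, "removed": removed}


result = compare_manifest(
    manifest=["Gamma Solutions", "Old Co"],
    scraped=["Gamma Solutions Inc", "New Startup"],
)
print(result)
```

Here "added" means items found on the page that the manifest does not contain, and "removed" means manifest items with no sufficiently close counterpart on the page.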
Fuzzy matching
Meter uses fuzzy string matching to handle common variations:
| Manifest | Website | Score |
|---|---|---|
| Acme Corp | Acme Corporation | 90 |
| Beta Inc | Beta Industries | 86 |
| JP Morgan | JPMorgan Chase | 85 |
| Gamma Solutions | Gamma Solutions Inc | 95 |
Fuzzy matching handles abbreviations (“Corp” → “Corporation”), word order differences, and minor spelling variations. It does not handle semantic equivalence like “Facebook” → “Meta Platforms” or “IBM” → “International Business Machines”. For those cases, consider lowering the threshold or using exact field matches on other identifiers (like URLs).
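To see why string similarity cannot bridge a rebrand, here is a small sketch using Python's `difflib`. It is only a stand-in for whatever scorer Meter uses, so its numbers will not match the table above; the point is the gap between the two cases.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity on a 0-100 scale (illustrative only)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


# An abbreviation shares most of its characters with the long form.
abbreviation = similarity("Acme Corp", "Acme Corporation")

# A rebrand shares almost nothing at the character level.
rebrand = similarity("Facebook", "Meta Platforms")

for label, value in [("abbreviation", abbreviation), ("rebrand", rebrand)]:
    print(f"{label}: {value:.0f}")
```

The abbreviation scores far higher than the rebrand, which is why a shared identifier such as a URL is the more dependable way to link renamed entities.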
Match fields
You choose which field(s) to match on via match_fields. For example, if your scrape results have name and website fields, you can match on ["name"] or ["name", "website"].
When multiple match fields are provided, Meter takes the best score across fields. This means an exact URL match will count even if the name is slightly different.
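A minimal sketch of the "best score across fields" rule, assuming each field is scored independently and the maximum wins. The `similarity` scorer and the `best_field_score` helper are hypothetical, and the example URLs are placeholders.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in scorer on a 0-100 scale (not Meter's algorithm)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_field_score(manifest_item: dict, scraped_item: dict, match_fields: list[str]) -> float:
    """Return the highest per-field score; skip fields missing on either side."""
    scores = [
        similarity(str(manifest_item[f]), str(scraped_item[f]))
        for f in match_fields
        if f in manifest_item and f in scraped_item
    ]
    return max(scores, default=0.0)


known = {"name": "JP Morgan", "website": "https://example.com/jpm"}
found = {"name": "JPMorgan Chase & Co.", "website": "https://example.com/jpm"}

name_only = best_field_score(known, found, ["name"])
name_or_site = best_field_score(known, found, ["name", "website"])
print(round(name_only), round(name_or_site))
```

With only the name field the pair scores below a perfect match, while adding the website field lifts the result to 100 because the URLs are identical.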
Quick example
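A hedged sketch of what a comparison request might look like, built with Python's standard library. Only the endpoint path and the match_fields name come from this page. The host, the bearer-token header, the identifier, and the "manifest" and "threshold" body keys are assumptions to check against the REST API reference.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host, not Meter's real one
API_KEY = "YOUR_API_KEY"              # placeholder credential
STRATEGY_ID = "your-strategy-id"      # placeholder identifier

payload = {
    # Body key names other than match_fields are assumptions.
    "manifest": [
        {"name": "Acme Corp", "website": "https://acme.example"},
        {"name": "Beta Inc", "website": "https://beta.example"},
    ],
    "match_fields": ["name", "website"],
    "threshold": 80,
}

request = urllib.request.Request(
    url=f"{BASE_URL}/api/strategies/{STRATEGY_ID}/compare-manifest",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",  # auth scheme is an assumption
    },
    method="POST",
)

# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(request) as resp:
#     result = json.load(resp)
```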
Response
Typical workflow
1. Create a strategy with an output schema
Define the exact fields you want extracted:
2. Schedule regular scrapes
3. Compare your manifest whenever you need
Tuning the threshold
| Threshold | Use case |
|---|---|
| 90-100 | Strict matching — names must be nearly identical |
| 80 (default) | Balanced — handles “Corp” vs “Corporation”, “Inc” vs “Industries” |
| 60-70 | Loose — catches more variations but may produce false positives |
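To make the trade-off concrete, the sketch below applies two thresholds to the example scores from the fuzzy matching table. It assumes a pair is accepted when its score is greater than or equal to the threshold; whether Meter's comparison is inclusive is not stated here.

```python
# Scores taken from the fuzzy matching table earlier on this page.
pairs = {
    ("Acme Corp", "Acme Corporation"): 90,
    ("Beta Inc", "Beta Industries"): 86,
    ("JP Morgan", "JPMorgan Chase"): 85,
    ("Gamma Solutions", "Gamma Solutions Inc"): 95,
}


def accepted(threshold: int) -> list[str]:
    """Manifest names whose score meets the threshold (assumes >= comparison)."""
    return [manifest for (manifest, _), s in pairs.items() if s >= threshold]


for t in (80, 90):
    print(t, accepted(t))
```

At the default of 80 all four pairs are accepted. At 90, "Beta Inc" and "JP Morgan" would drop out of the matched list and be reported as removed instead.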
Endpoints
There are two ways to compare a manifest:
| Endpoint | Description |
|---|---|
| POST /api/strategies/{id}/compare-manifest | Compare against the latest completed job for a strategy |
| POST /api/jobs/{id}/compare-manifest | Compare against a specific job’s results |
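Choosing between the two endpoints comes down to which identifier you have. The helper below is a hypothetical convenience, not part of any Meter client library; it only assembles the paths listed in the table.

```python
from typing import Optional


def compare_manifest_path(strategy_id: Optional[str] = None,
                          job_id: Optional[str] = None) -> str:
    """Build the path for one of the two compare-manifest endpoints."""
    # Exactly one identifier must be supplied.
    if (strategy_id is None) == (job_id is None):
        raise ValueError("pass exactly one of strategy_id or job_id")
    if job_id is not None:
        # Pin the comparison to a specific job's results.
        return f"/api/jobs/{job_id}/compare-manifest"
    # Otherwise compare against the strategy's latest completed job.
    return f"/api/strategies/{strategy_id}/compare-manifest"


print(compare_manifest_path(strategy_id="s1"))
print(compare_manifest_path(job_id="j1"))
```

The job-level endpoint is the one to use when a comparison has to be reproducible, since the latest completed job for a strategy changes with every scheduled scrape.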
Best practices
Use output schemas for consistent field names
Define an output_schema when creating your strategy so that field names are predictable and consistent across scrapes. This makes match_fields reliable.
Match on the most distinctive field
Company names are usually the best match field. URLs can be a good secondary field. Avoid matching on generic fields like “description” where content varies significantly.
Include multiple match fields as a fallback
If you have both name and website fields, use match_fields: ["name", "website"]. Meter takes the best score across fields, so an exact URL match will work even if the name format differs.
Your manifest doesn't need to match the scrape schema
Your manifest items only need to contain the fields listed in match_fields. Extra fields are preserved in the response but ignored during matching.
Next steps
- Output Schemas: Define consistent extraction shapes
- Change Detection: Automatic change tracking between scrapes
- Schedules: Automate regular scrapes
- REST API Reference: Full endpoint documentation