
Manifest Comparison

Manifest comparison lets you submit a list of known items (a “manifest”) and compare it against your scrape results using fuzzy matching. Meter identifies which items were added, removed, or still present — even when names don’t match exactly.

When to use manifest comparison

  • Track portfolio companies on a firm’s website — detect when new companies are added or removed
  • Monitor team pages for personnel changes
  • Compare a known product catalog against a competitor’s current listings
  • Verify that a list of partners or clients on a website matches your records

How it works

  1. You scrape a page using a strategy (e.g., extract company names from a portfolio page)
  2. You submit your manifest — a JSON list of items you already know about
  3. Meter fuzzy-matches each manifest item against the scraped results
  4. You get back three lists: matched, added, and removed
Manifest (12 items)     Website (13 items)
├── Acme Corp       ←→  Acme Corporation     ✓ matched (90%)
├── Beta Industries  ✗  (not found)          ✗ removed
├── Gamma Solutions ←→  Gamma Solutions Inc   ✓ matched (95%)
│   ...                  ...
└──                      Delta Partners       + added
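
The four steps above can be sketched locally. This is a rough illustration under stated assumptions, not Meter's implementation: it uses Python's `difflib.SequenceMatcher` as a stand-in scorer, pairs items greedily one-to-one, and the names `score` and `compare` are hypothetical. difflib scores run lower than the figures in the diagram ("Acme Corp" vs "Acme Corporation" comes out at about 72 rather than 90), so the sketch uses a threshold of 70.

```python
from difflib import SequenceMatcher


def score(a: str, b: str) -> float:
    """Case-insensitive similarity on a 0-100 scale (difflib stand-in)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def compare(manifest, scraped, threshold=80.0):
    """Split names into matched pairs, added (site only), removed (manifest only)."""
    matched, removed = [], []
    unclaimed = list(scraped)  # scraped names not yet paired with a manifest item
    for item in manifest:
        best = max(unclaimed, key=lambda s: score(item, s), default=None)
        if best is not None and score(item, best) >= threshold:
            matched.append((item, best))
            unclaimed.remove(best)
        else:
            removed.append(item)
    return matched, unclaimed, removed  # whatever is left unclaimed is "added"


manifest = ["Acme Corp", "Beta Industries", "Gamma Solutions"]
scraped = ["Acme Corporation", "Gamma Solutions Inc", "Delta Partners"]

matched, added, removed = compare(manifest, scraped, threshold=70.0)
print("matched:", matched)
print("added:  ", added)
print("removed:", removed)
```

Only the shape of the result (three lists) mirrors what the API returns; the scores themselves depend on the scorer.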

Fuzzy matching

Meter uses fuzzy string matching to handle common variations:

Manifest          Website               Score
Acme Corp         Acme Corporation      90
Beta Inc          Beta Industries       86
JP Morgan         JPMorgan Chase        85
Gamma Solutions   Gamma Solutions Inc   95

The default threshold is 80 (out of 100). Items scoring below the threshold are treated as non-matches. You can adjust this per request.
Fuzzy matching handles abbreviations (“Corp” → “Corporation”), word order differences, and minor spelling variations. It does not handle semantic equivalence such as “Facebook” → “Meta Platforms” or “IBM” → “International Business Machines”. For those cases, prefer matching on another identifier such as a URL. Lowering the threshold can also help, but names this different usually score so low that a threshold loose enough to catch them will admit false matches too.

Match fields

You choose which field(s) to match on via match_fields. For example, if your scrape results have name and website fields, you can match on ["name"] or ["name", "website"]. When multiple match fields are provided, Meter takes the best score across fields. This means an exact URL match will count even if the name is slightly different.
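
To make "best score across fields" concrete, here is a small sketch. It assumes string-valued fields and uses `difflib.SequenceMatcher` as a stand-in for Meter's scorer; `best_field_score` is a hypothetical helper, not part of the API. It shows a renamed company that fails on name alone but matches once the website field is included.

```python
from difflib import SequenceMatcher


def score(a: str, b: str) -> float:
    """Case-insensitive similarity on a 0-100 scale (difflib stand-in)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_field_score(manifest_item, scraped_item, match_fields):
    """Return (score, field) for the highest-scoring field present in both items."""
    candidates = [
        (score(manifest_item[f], scraped_item[f]), f)
        for f in match_fields
        if f in manifest_item and f in scraped_item
    ]
    return max(candidates, default=(0.0, None))


known = {"name": "Facebook", "website": "meta.com"}
found = {"name": "Meta Platforms", "website": "meta.com"}

name_only = best_field_score(known, found, ["name"])
both = best_field_score(known, found, ["name", "website"])

print("name only:     ", name_only)   # low score, well under the default threshold
print("name + website:", both)        # the identical website wins
```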

Quick example

# Compare your manifest against the latest results for a strategy
# (replace {strategy_id} with your strategy's ID)
curl -X POST https://api.meter.sh/api/strategies/{strategy_id}/compare-manifest \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "manifest": [
      {"name": "Acme Corp"},
      {"name": "Beta Industries"},
      {"name": "Gamma Solutions"}
    ],
    "match_fields": ["name"],
    "threshold": 80
  }'

Response

{
  "matched": [
    {
      "manifest_item": {"name": "Acme Corp"},
      "scraped_item": {"name": "Acme Corporation", "website": "acme.com"},
      "score": 90.0,
      "matched_on": "name"
    },
    {
      "manifest_item": {"name": "Gamma Solutions"},
      "scraped_item": {"name": "Gamma Solutions Inc", "website": "gamma.com"},
      "score": 95.0,
      "matched_on": "name"
    }
  ],
  "added": [
    {"name": "Delta Partners", "website": "delta.com"}
  ],
  "removed": [
    {"name": "Beta Industries"}
  ],
  "summary": {
    "matched": 2,
    "added": 1,
    "removed": 1,
    "manifest_count": 3,
    "scraped_count": 3
  },
  "threshold_used": 80.0,
  "match_fields_used": ["name"]
}
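
A response like the one above can be turned into a short change report. The sketch below parses a copy of that response with Python's `json` module; `describe` is a hypothetical helper, and it assumes every added or removed item carries a `name` field, as in this example.

```python
import json

# Copy of the example response shown above.
response_text = """
{
  "matched": [
    {
      "manifest_item": {"name": "Acme Corp"},
      "scraped_item": {"name": "Acme Corporation", "website": "acme.com"},
      "score": 90.0,
      "matched_on": "name"
    },
    {
      "manifest_item": {"name": "Gamma Solutions"},
      "scraped_item": {"name": "Gamma Solutions Inc", "website": "gamma.com"},
      "score": 95.0,
      "matched_on": "name"
    }
  ],
  "added": [{"name": "Delta Partners", "website": "delta.com"}],
  "removed": [{"name": "Beta Industries"}],
  "summary": {"matched": 2, "added": 1, "removed": 1,
              "manifest_count": 3, "scraped_count": 3},
  "threshold_used": 80.0,
  "match_fields_used": ["name"]
}
"""


def describe(result: dict) -> list[str]:
    """One line per item: '=' matched, '+' added, '-' removed."""
    lines = []
    for m in result["matched"]:
        field = m["matched_on"]  # the field that produced the winning score
        lines.append(
            f"= {m['manifest_item'][field]} -> {m['scraped_item'][field]} ({m['score']:.0f})"
        )
    lines += [f"+ {item['name']}" for item in result["added"]]
    lines += [f"- {item['name']}" for item in result["removed"]]
    return lines


result = json.loads(response_text)
report = describe(result)
print("\n".join(report))
```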

Typical workflow

1. Create a strategy with an output schema

Define the exact fields you want extracted:
curl -X POST https://api.meter.sh/api/strategies/generate \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example-vc.com/portfolio",
    "description": "Extract portfolio company names and websites",
    "name": "Portfolio Tracker",
    "output_schema": {
      "name": "string",
      "website": "string"
    }
  }'

2. Schedule regular scrapes

curl -X POST https://api.meter.sh/api/schedules \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "strategy_id": "STRATEGY_ID",
    "url": "https://example-vc.com/portfolio",
    "interval_seconds": 86400
  }'

3. Compare your manifest whenever you need

curl -X POST https://api.meter.sh/api/strategies/STRATEGY_ID/compare-manifest \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "manifest": [
      {"name": "Company A", "website": "companya.com"},
      {"name": "Company B", "website": "companyb.com"}
    ],
    "match_fields": ["name"],
    "threshold": 80
  }'
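
If your known items live in a spreadsheet rather than hand-written JSON, the request body can be generated. The sketch below assumes a CSV export with a header row; `manifest_payload` is a hypothetical helper, and only the three body keys (`manifest`, `match_fields`, `threshold`) come from the request shown above.

```python
import csv
import io
import json


def manifest_payload(csv_text, match_fields, threshold=80):
    """Build a compare-manifest request body from CSV text with a header row."""
    rows = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]
    # Fail early if a field we intend to match on is not a CSV column.
    missing = [f for f in match_fields if rows and f not in rows[0]]
    if missing:
        raise ValueError(f"CSV is missing match fields: {missing}")
    return {"manifest": rows, "match_fields": match_fields, "threshold": threshold}


records = "name,website\nCompany A,companya.com\nCompany B,companyb.com\n"
payload = manifest_payload(records, ["name"])
body = json.dumps(payload, indent=2)  # pass this string to curl -d or an HTTP client
print(body)
```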

Tuning the threshold

Threshold      Use case
90-100         Strict: names must be nearly identical
80 (default)   Balanced: handles “Corp” vs “Corporation”, “Inc” vs “Industries”
60-70          Loose: catches more variations but may produce false positives

Start with the default threshold of 80. If you see items incorrectly showing as “removed” that are actually on the site with a slightly different name, lower the threshold. If you see false matches, raise it.
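
The trade-off can be explored offline before changing a live request. The sketch below sweeps a few thresholds over three candidate pairs, again with `difflib.SequenceMatcher` as a stand-in scorer, so the absolute numbers will not line up with Meter's scale; only the pattern matters. The last pair is two different companies, included to show a false positive appearing once the threshold is loose enough.

```python
from difflib import SequenceMatcher


def score(a: str, b: str) -> float:
    """Case-insensitive similarity on a 0-100 scale (difflib stand-in)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


pairs = [
    ("Gamma Solutions", "Gamma Solutions Inc"),  # same company
    ("Acme Corp", "Acme Corporation"),           # same company
    ("Beta Industries", "Delta Partners"),       # different companies
]


def matches_at(threshold):
    """Pairs whose score meets the threshold."""
    return [(a, b) for a, b in pairs if score(a, b) >= threshold]


for a, b in pairs:
    print(f"{score(a, b):5.1f}  {a} / {b}")

for t in (90, 80, 70, 30):
    print(f"threshold {t}: {len(matches_at(t))} of {len(pairs)} pairs match")
```

Running a sweep like this against a sample of your own manifest and scraped names is a cheap way to pick a starting threshold.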

Endpoints

There are two ways to compare a manifest:

Endpoint                                     Description
POST /api/strategies/{id}/compare-manifest   Compare against the latest completed job for a strategy
POST /api/jobs/{id}/compare-manifest         Compare against a specific job’s results

The strategy endpoint is the most common choice because it automatically uses the most recent results. Use the job endpoint when you need to compare against a specific point in time. See the full API reference: Jobs REST API and Strategies REST API.
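
For callers not using curl, the choice between the two endpoints can be wrapped in one function. This is a sketch with Python's standard `urllib.request`; `compare_manifest_request` is a hypothetical helper, while the paths and headers mirror the curl examples on this page. It only builds the request and does not send it, so it runs without network access.

```python
import json
import urllib.request

BASE_URL = "https://api.meter.sh"


def compare_manifest_request(api_key, payload, *, strategy_id=None, job_id=None):
    """Build a POST request for the strategy endpoint or the job endpoint."""
    if (strategy_id is None) == (job_id is None):
        raise ValueError("pass exactly one of strategy_id or job_id")
    if strategy_id is not None:
        path = f"/api/strategies/{strategy_id}/compare-manifest"  # latest completed job
    else:
        path = f"/api/jobs/{job_id}/compare-manifest"  # one specific job's results
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {"manifest": [{"name": "Acme Corp"}], "match_fields": ["name"], "threshold": 80}
req = compare_manifest_request("sk_live_...", payload, job_id="JOB_ID")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; omitted so the sketch stays offline.
```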

Best practices

  • Define an output_schema when creating your strategy so that field names are predictable and consistent across scrapes. This makes match_fields reliable.
  • Company names are usually the best match field. URLs can be a good secondary field. Avoid matching on generic fields like “description” where content varies significantly.
  • If you have both name and website fields, use match_fields: ["name", "website"]. Meter takes the best score across fields, so an exact URL match will work even if the name format differs.
  • Your manifest items only need to contain the fields listed in match_fields. Extra fields are preserved in the response but ignored during matching.

Next steps

  • Output Schemas: Define consistent extraction shapes
  • Change Detection: Automatic change tracking between scrapes
  • Schedules: Automate regular scrapes
  • REST API Reference: Full endpoint documentation

Need help?

Email me at mckinnon@meter.sh