Manifest Comparison

Manifest comparison lets you submit a list of known items (a “manifest”) and compare it against your scrape results using fuzzy matching. Meter identifies which items were added, removed, or still present — even when names don’t match exactly.

When to use manifest comparison

Track portfolio companies on a firm’s website — detect when new companies are added or removed
Monitor team pages for personnel changes
Compare a known product catalog against a competitor’s current listings
Verify that a list of partners or clients on a website matches your records

How it works

1. You scrape a page using a strategy (e.g., extract company names from a portfolio page).
2. You submit your manifest, a JSON list of items you already know about.
3. Meter fuzzy-matches each manifest item against the scraped results.
4. You get back three lists: matched (found in both), added (on the website but not in your manifest), and removed (in your manifest but no longer found on the website).

Manifest (12 items)     Website (13 items)
├── Acme Corp       ←→  Acme Corporation     ✓ matched (90%)
├── Beta Industries  ✗  (not found)          ✗ removed
├── Gamma Solutions ←→  Gamma Solutions Inc   ✓ matched (95%)
│   ...                  ...
└──                      Delta Partners       + added
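The flow above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Meter's implementation: the scoring algorithm Meter uses is not described here, so the standard library's `difflib.SequenceMatcher` stands in for it, and the helper names `similarity` and `compare` are hypothetical. `difflib` ratios run lower than the scores shown in this guide, so the example call uses a threshold of 70 rather than the default of 80.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity on a 0-100 scale (stand-in scorer, not Meter's)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def compare(manifest, scraped, field="name", threshold=80.0):
    """Split items into matched / added / removed using greedy best-score matching."""
    unclaimed = list(scraped)  # scraped items not yet paired with a manifest item
    matched, removed = [], []
    for item in manifest:
        scored = [(similarity(item[field], s[field]), s) for s in unclaimed]
        best_score, best = max(scored, key=lambda pair: pair[0], default=(0.0, None))
        if best is not None and best_score >= threshold:
            unclaimed.remove(best)
            matched.append({"manifest_item": item, "scraped_item": best,
                            "score": round(best_score, 1)})
        else:
            removed.append(item)  # in the manifest, not found in the scrape
    # Anything no manifest item claimed is new on the site.
    return {"matched": matched, "added": unclaimed, "removed": removed}


manifest = [{"name": "Acme Corp"}, {"name": "Beta Industries"}, {"name": "Gamma Solutions"}]
scraped = [{"name": "Acme Corporation"}, {"name": "Gamma Solutions Inc"},
           {"name": "Delta Partners"}]
result = compare(manifest, scraped, threshold=70)
```

Here `result["added"]` holds Delta Partners and `result["removed"]` holds Beta Industries, mirroring the diagram. The greedy pairing (each scraped item can be claimed once) is a simplification chosen for brevity.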

Fuzzy matching

Meter uses fuzzy string matching to handle common variations:

Manifest	Website	Score
Acme Corp	Acme Corporation	90
Beta Inc	Beta Industries	86
JP Morgan	JPMorgan Chase	85
Gamma Solutions	Gamma Solutions Inc	95

The default threshold is 80 (out of 100). A manifest item whose best score falls below the threshold is treated as a non-match. You can adjust the threshold per request with the threshold field in the request body.

Fuzzy matching handles abbreviations (“Corp” → “Corporation”), word order differences, and minor spelling variations. It does not handle semantic equivalence like “Facebook” → “Meta Platforms” or “IBM” → “International Business Machines”, because those names share almost no characters. Lowering the threshold far enough to catch such pairs would also admit many false matches, so for those cases prefer matching on another identifier (like a URL) by adding it to match_fields.
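To make this limit concrete, the snippet below scores a spelling variation and two semantic equivalents. It is a rough sketch that assumes `difflib.SequenceMatcher` from the Python standard library as the scorer; Meter's own scorer is not specified here and will produce different numbers, but any character-based scorer behaves the same way on names that share few characters.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity on a 0-100 scale (stand-in scorer, not Meter's)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


THRESHOLD = 80  # the default threshold described above

pairs = [
    ("Gamma Solutions", "Gamma Solutions Inc"),   # minor spelling variation
    ("Facebook", "Meta Platforms"),               # semantic equivalence
    ("IBM", "International Business Machines"),   # acronym
]

for manifest_name, site_name in pairs:
    score = similarity(manifest_name, site_name)
    verdict = "match" if score >= THRESHOLD else "no match"
    print(f"{manifest_name!r} vs {site_name!r}: {score:.0f} ({verdict})")
```

Only the first pair clears the threshold. The other two score so low that no sensible threshold would separate them from unrelated names.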

Match fields

You choose which field(s) to match on via match_fields. For example, if your scrape results have name and website fields, you can match on ["name"] or ["name", "website"]. When multiple match fields are provided, Meter takes the best score across fields. This means an exact URL match will count even if the name is slightly different.
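The "best score across fields" rule can be sketched as follows. The function name `best_field_score` is hypothetical and `difflib.SequenceMatcher` is an assumed stand-in for Meter's scorer, so the name score here differs from the figures in this guide. The returned field plays the role that matched_on appears to play in the response.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity on a 0-100 scale (stand-in scorer, not Meter's)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_field_score(manifest_item: dict, scraped_item: dict, match_fields: list):
    """Return (score, field) for the highest-scoring field present on both items."""
    best_score, best_field = 0.0, None
    for field in match_fields:
        a, b = manifest_item.get(field), scraped_item.get(field)
        if not a or not b:
            continue  # a field missing on either side cannot contribute
        score = similarity(str(a), str(b))
        if score > best_score:
            best_score, best_field = score, field
    return best_score, best_field


manifest_item = {"name": "Acme Corp", "website": "acme.com"}
scraped_item = {"name": "Acme Corporation", "website": "acme.com"}

name_only = best_field_score(manifest_item, scraped_item, ["name"])
name_or_site = best_field_score(manifest_item, scraped_item, ["name", "website"])
```

With only name, this stand-in scorer leaves the pair below the default threshold of 80. Adding website lets the exact URL match carry the pair, so `name_or_site` reports a perfect score on the website field.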

Quick example

# Compare your manifest against the latest results for a strategy
# (replace STRATEGY_ID with your strategy's ID)
curl -X POST https://api.meter.sh/api/strategies/STRATEGY_ID/compare-manifest \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "manifest": [
      {"name": "Acme Corp"},
      {"name": "Beta Industries"},
      {"name": "Gamma Solutions"}
    ],
    "match_fields": ["name"],
    "threshold": 80
  }'

Response

{
  "matched": [
    {
      "manifest_item": {"name": "Acme Corp"},
      "scraped_item": {"name": "Acme Corporation", "website": "acme.com"},
      "score": 90.0,
      "matched_on": "name"
    },
    {
      "manifest_item": {"name": "Gamma Solutions"},
      "scraped_item": {"name": "Gamma Solutions Inc", "website": "gamma.com"},
      "score": 95.0,
      "matched_on": "name"
    }
  ],
  "added": [
    {"name": "Delta Partners", "website": "delta.com"}
  ],
  "removed": [
    {"name": "Beta Industries"}
  ],
  "summary": {
    "matched": 2,
    "added": 1,
    "removed": 1,
    "manifest_count": 3,
    "scraped_count": 3
  },
  "threshold_used": 80.0,
  "match_fields_used": ["name"]
}
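A response like the one above is easy to turn into a short report. The sketch below parses a trimmed copy of the sample response, prints one line per item, and checks the summary counts against the lists. The helper names are hypothetical, and the relationships manifest_count = matched + removed and scraped_count = matched + added are inferred from this one sample rather than documented guarantees.

```python
import json

# Trimmed copy of the sample response shown above.
response = json.loads("""
{
  "matched": [
    {"manifest_item": {"name": "Acme Corp"},
     "scraped_item": {"name": "Acme Corporation", "website": "acme.com"},
     "score": 90.0, "matched_on": "name"},
    {"manifest_item": {"name": "Gamma Solutions"},
     "scraped_item": {"name": "Gamma Solutions Inc", "website": "gamma.com"},
     "score": 95.0, "matched_on": "name"}
  ],
  "added": [{"name": "Delta Partners", "website": "delta.com"}],
  "removed": [{"name": "Beta Industries"}],
  "summary": {"matched": 2, "added": 1, "removed": 1,
              "manifest_count": 3, "scraped_count": 3}
}
""")


def report_lines(resp: dict) -> list:
    """One readable line per item, prefixed with = (matched), + (added) or - (removed)."""
    lines = []
    for m in resp["matched"]:
        lines.append(f'= {m["manifest_item"]["name"]} -> {m["scraped_item"]["name"]} '
                     f'({m["score"]:.0f} on {m["matched_on"]})')
    lines += [f'+ {item["name"]}' for item in resp["added"]]
    lines += [f'- {item["name"]}' for item in resp["removed"]]
    return lines


def summary_is_consistent(resp: dict) -> bool:
    """Check the summary counts against the lists (relationships assumed from the sample)."""
    s = resp["summary"]
    return (s["matched"] == len(resp["matched"])
            and s["added"] == len(resp["added"])
            and s["removed"] == len(resp["removed"])
            and s["manifest_count"] == s["matched"] + s["removed"]
            and s["scraped_count"] == s["matched"] + s["added"])


print("\n".join(report_lines(response)))
```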

Typical workflow

1. Create a strategy with an output schema

Define the exact fields you want extracted:

curl -X POST https://api.meter.sh/api/strategies/generate \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example-vc.com/portfolio",
    "description": "Extract portfolio company names and websites",
    "name": "Portfolio Tracker",
    "output_schema": {
      "name": "string",
      "website": "string"
    }
  }'

2. Schedule regular scrapes

curl -X POST https://api.meter.sh/api/schedules \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "strategy_id": "STRATEGY_ID",
    "url": "https://example-vc.com/portfolio",
    "interval_seconds": 86400
  }'

3. Compare your manifest whenever you need

curl -X POST https://api.meter.sh/api/strategies/STRATEGY_ID/compare-manifest \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "manifest": [
      {"name": "Company A", "website": "companya.com"},
      {"name": "Company B", "website": "companyb.com"}
    ],
    "match_fields": ["name"],
    "threshold": 80
  }'

Tuning the threshold

Threshold	Use case
90-100	Strict matching — names must be nearly identical
80 (default)	Balanced — handles “Corp” vs “Corporation”, “Inc” vs “Industries”
60-70	Loose — catches more variations but may produce false positives

Start with the default threshold of 80. If you see items incorrectly showing as “removed” that are actually on the site with a slightly different name, lower the threshold. If you see false matches, raise it.
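The trade-off can be explored offline before changing a live request. The sketch below sweeps a few thresholds over the name pairs from the fuzzy matching table. It assumes `difflib.SequenceMatcher` as a stand-in scorer, which scores these pairs lower than Meter does, so the bands will not line up with the table above. The point is only how the set of matches grows as the threshold drops.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity on a 0-100 scale (stand-in scorer, not Meter's)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


# (manifest name, website name) pairs taken from the fuzzy matching table.
PAIRS = [
    ("Acme Corp", "Acme Corporation"),
    ("Beta Inc", "Beta Industries"),
    ("JP Morgan", "JPMorgan Chase"),
    ("Gamma Solutions", "Gamma Solutions Inc"),
]


def matches_at(threshold: float) -> list:
    """Manifest names whose pair scores at or above the threshold."""
    return [m for m, site in PAIRS if similarity(m, site) >= threshold]


for threshold in (90, 80, 70, 60):
    print(threshold, matches_at(threshold))
```

Running the same sweep against a sample of your own manifest and scrape results is a cheap way to pick a starting threshold, as long as the final choice is confirmed against real responses.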

Endpoints

There are two ways to compare a manifest:

Endpoint	Description
`POST /api/strategies/{id}/compare-manifest`	Compare against the latest completed job for a strategy
`POST /api/jobs/{id}/compare-manifest`	Compare against a specific job’s results

The strategy endpoint is the most common choice — it automatically uses the most recent results. Use the job endpoint when you need to compare against a specific point in time. See the full API reference: Jobs REST API and Strategies REST API.
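For callers who prefer Python to curl, the sketch below builds the request for either endpoint with the standard library. The function name `build_compare_request` is hypothetical. The body fields come from the strategy example earlier in this guide, and it is an assumption that the job endpoint accepts the same body. The request is built but not sent.

```python
import json
from urllib.request import Request

BASE_URL = "https://api.meter.sh"


def build_compare_request(api_key, manifest, match_fields, threshold=80,
                          *, strategy_id=None, job_id=None) -> Request:
    """Build (but do not send) a compare-manifest request for a strategy or a job."""
    if (strategy_id is None) == (job_id is None):
        raise ValueError("pass exactly one of strategy_id or job_id")
    if strategy_id is not None:
        path = f"/api/strategies/{strategy_id}/compare-manifest"  # latest completed job
    else:
        path = f"/api/jobs/{job_id}/compare-manifest"  # one specific job's results
    body = json.dumps({
        "manifest": manifest,
        "match_fields": match_fields,
        "threshold": threshold,
    }).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return Request(BASE_URL + path, data=body, headers=headers, method="POST")


request = build_compare_request(
    "sk_live_...",  # placeholder key, as in the curl examples
    [{"name": "Acme Corp"}],
    ["name"],
    job_id="JOB_ID",
)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```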

Best practices

Use output schemas for consistent field names

Define an output_schema when creating your strategy so that field names are predictable and consistent across scrapes. This makes match_fields reliable.

Match on the most distinctive field

Company names are usually the best match field. URLs can be a good secondary field. Avoid matching on generic fields like “description” where content varies significantly.

Include multiple match fields as a fallback

If you have both name and website fields, use match_fields: ["name", "website"]. Meter takes the best score across fields, so an exact URL match will work even if the name format differs.

Your manifest doesn’t need to match the scrape schema

Your manifest items only need to contain the fields listed in match_fields. Extra fields are preserved in the response but ignored during matching.
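A quick pre-flight check can catch manifest items that have nothing to match on before you submit them. The sketch below is a hypothetical client-side helper, not part of Meter's API. It flags items with no non-empty value for any of the match fields, on the assumption that one usable field per item is enough; the guide does not say whether every listed field must be present.

```python
def items_without_match_fields(manifest: list, match_fields: list) -> list:
    """Return the indexes of manifest items with no usable value for any match field."""
    missing = []
    for index, item in enumerate(manifest):
        usable = any(
            isinstance(item.get(field), str) and item[field].strip()
            for field in match_fields
        )
        if not usable:
            missing.append(index)
    return missing


manifest = [
    {"name": "Acme Corp", "notes": "extra fields are fine"},
    {"website": "beta.com"},  # no name
    {"name": "   "},          # blank name, no website
]

by_name = items_without_match_fields(manifest, ["name"])
by_name_or_site = items_without_match_fields(manifest, ["name", "website"])
```

With only name as a match field, the second and third items are flagged. Adding website rescues the second item but not the third.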

Next steps

Output Schemas: Define consistent extraction shapes
Change Detection: Automatic change tracking between scrapes
Schedules: Automate regular scrapes
REST API Reference: Full endpoint documentation

Need help?

Email me at mckinnon@meter.sh
