1. Overview
This agent compares two records (companies, contacts, or products) and determines whether they represent the same real-world entity. It normalizes fields, calculates a weighted confidence score, and returns a match decision with a field-by-field comparison showing exactly what matched, what conflicted, and what was missing.
When a field is ambiguous or incomplete, the agent researches publicly available information (company websites, business registries, etc.) to fill gaps before making its determination.
2. Business value
- Deduplication at scale: catches duplicates that simple string matching misses (e.g., "IBM" vs. "International Business Machines," "123 Main St" vs. "123 Main Street, Suite 200").
- Audit trail: the field-by-field comparison table makes it easy to verify why a match was or wasn't made, which matters for compliance and data governance.
- Reduced manual review: only records in the "Possible Match" range need human attention. Clear matches and clear non-matches are resolved automatically.
3. Inputs
Two records with the following fields. Not every field needs to be filled; the agent works with whatever is available.
| Field | Type | Details |
|---|---|---|
| Record A: Name | Text | Company name, person name, or product name |
| Record A: Address | Text | Full mailing address |
| Record A: Phone | Text | Any format |
| Record A: Email | Text | Email address |
| Record A: Website | Text | URL |
| Record A: ID Numbers | Text (optional) | Tax ID, DUNS number, registration number, etc. |
| Record B: Name | Text | Same fields as Record A |
| Record B: Address | Text | |
| Record B: Phone | Text | |
| Record B: Email | Text | |
| Record B: Website | Text | |
| Record B: ID Numbers | Text (optional) | |
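As a minimal sketch, the two input records can be modeled as a Python dataclass in which every field is optional. The class `Record` and the helpers `active_fields` and `comparable_fields` are hypothetical names, not part of the agent's interface. The sketch assumes that a blank string counts as missing and that "ID Numbers" is held as a single string.

```python
from dataclasses import dataclass, fields
from typing import Optional, Set


@dataclass
class Record:
    """One input record (Record A or Record B); every field is optional."""
    name: Optional[str] = None
    address: Optional[str] = None
    phone: Optional[str] = None
    email: Optional[str] = None
    website: Optional[str] = None
    id_numbers: Optional[str] = None  # tax ID, DUNS, registration number, etc.

    def present_fields(self) -> Set[str]:
        """Names of fields that hold a non-blank value."""
        present = set()
        for f in fields(self):
            value = getattr(self, f.name)
            if value is not None and value.strip():
                present.add(f.name)
        return present


def active_fields(a: Record, b: Record) -> Set[str]:
    """Fields with a value in at least one record; these enter the score."""
    return a.present_fields() | b.present_fields()


def comparable_fields(a: Record, b: Record) -> Set[str]:
    """Fields with a value in both records; these get a similarity score."""
    return a.present_fields() & b.present_fields()


a = Record(name="IBM", website="https://www.ibm.com")
b = Record(name="International Business Machines", phone="555-0100", email="  ")
print(sorted(active_fields(a, b)))      # ['name', 'phone', 'website']
print(sorted(comparable_fields(a, b)))  # ['name']
```

Separating "active" from "comparable" fields mirrors the two different rules in the execution steps: similarity is computed only where both sides have a value, while the score's denominator covers any field filled on at least one side.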
4. Outputs
| Field | Contents |
|---|---|
| Decision | Match, No Match, or Possible Match |
| Confidence Score | 0 to 100 |
| Field Comparison | Table showing each field, both values, the normalized forms, and whether they matched |
| Conflicts | List of fields where the two records directly contradict each other |
| Research Notes | Any publicly available information the agent found to fill gaps or resolve ambiguity |
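The output contract can be sketched in the same style. `FieldComparison`, `MatchResult`, and the range checks in `__post_init__` are illustrative assumptions rather than a prescribed schema; they simply encode the allowed decisions and the 0 to 100 score range from the table.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DECISIONS = ("Match", "No Match", "Possible Match")


@dataclass
class FieldComparison:
    """One row of the field-by-field comparison table."""
    field_name: str
    value_a: Optional[str]
    value_b: Optional[str]
    normalized_a: Optional[str]
    normalized_b: Optional[str]
    similarity: Optional[float]  # None when only one record has a value
    matched: bool


@dataclass
class MatchResult:
    decision: str
    confidence: float
    comparison: List[FieldComparison] = field(default_factory=list)
    conflicts: List[str] = field(default_factory=list)
    research_notes: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject values outside the documented output contract.
        if self.decision not in DECISIONS:
            raise ValueError(f"unknown decision: {self.decision!r}")
        if not 0 <= self.confidence <= 100:
            raise ValueError(f"confidence out of range: {self.confidence}")


result = MatchResult(decision="Possible Match", confidence=58.3)
```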
5. Execution steps
- Normalize both records. Apply the normalization rules in Appendix B to each field: strip whitespace, standardize phone numbers to E.164, expand address abbreviations (St → Street, Ave → Avenue), lowercase email addresses, and remove the protocol, "www.", and trailing slashes from URLs.
- Compare each field pair. For each field present in both records, calculate a similarity score from 0.0 to 1.0:
  - Exact match after normalization → 1.0
  - ID numbers match → 1.0 (these are definitive)
  - Fuzzy name match (e.g., "Acme Corp" vs. "Acme Corporation") → 0.7 to 0.9, depending on edit distance
  - Partial address match (same street, different suite) → 0.5 to 0.8
  - Same email domain but different local part → 0.3
  - No meaningful similarity → 0.0
- Research gaps. If a field is present in one record but missing from the other, and the preliminary confidence score (calculated as in the next step) falls in the 40 to 75 range, attempt to fill the gap using publicly available information. Record what was found, and its source, in Research Notes.
- Calculate the weighted confidence score. Multiply each field's similarity score by its weight from Appendix A, sum the products, and divide by the sum of the active field weights:

  Confidence = Σ (field_similarity × field_weight) / Σ (active_field_weights) × 100

  Only include fields where at least one record has a value; a field filled in only one record adds its weight to the denominator and contributes 0 to the numerator. If neither record has a value for a field, exclude it from the calculation entirely.
- Determine the decision:
  - 80 to 100: Match
  - 40 to under 80: Possible Match
  - Under 40: No Match

  Exception: if any ID number matches exactly, the decision is "Match" regardless of other fields. If ID numbers are present in both records and don't match, the decision is "No Match" regardless of other fields.
- Flag conflicts. Any field where both records have values and the similarity score is below 0.3 should be listed as a conflict.
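The scoring, decision, and conflict steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the field keys such as `id_numbers` are hypothetical, per-field similarity scores are taken as input (the fuzzy-matching heuristics from the comparison step are not implemented), a field filled in only one record is passed as `None` and scores 0, and each record is assumed to carry at most one normalized ID string.

```python
from typing import Dict, List, Optional

# Relative weights from Appendix A; the dictionary keys are hypothetical.
WEIGHTS = {"id_numbers": 50, "name": 25, "website": 15,
           "email": 10, "phone": 10, "address": 10}


def confidence_score(similarities: Dict[str, Optional[float]]) -> float:
    """Weighted confidence from 0 to 100, with no intermediate rounding.

    Keys are the active fields (a value in at least one record). A value is
    the similarity in [0.0, 1.0], or None when only one record has a value,
    which adds the field's weight to the denominator and 0 to the numerator.
    """
    total_weight = sum(WEIGHTS[f] for f in similarities)
    if total_weight == 0:
        raise ValueError("no active fields to score")
    earned = sum(WEIGHTS[f] * (s if s is not None else 0.0)
                 for f, s in similarities.items())
    return earned / total_weight * 100


def decide(confidence: float, id_a: Optional[str] = None,
           id_b: Optional[str] = None) -> str:
    """Apply the ID-number exception first, then the score thresholds."""
    if id_a and id_b:  # normalized IDs present in both records
        return "Match" if id_a == id_b else "No Match"
    if confidence >= 80:
        return "Match"
    if confidence >= 40:
        return "Possible Match"
    return "No Match"


def find_conflicts(similarities: Dict[str, Optional[float]]) -> List[str]:
    """Fields where both records have values but similarity is below 0.3."""
    return [f for f, s in similarities.items() if s is not None and s < 0.3]


# ID numbers and email are empty in both records; address is in only one.
sims = {"name": 0.8, "website": 1.0, "phone": 0.0, "address": None}
score = confidence_score(sims)  # (20 + 15 + 0 + 0) / 60 * 100, about 58.33
print(round(score, 2), decide(score), find_conflicts(sims))
# 58.33 Possible Match ['phone']
```

Rounding happens only in the `print` call, so the value passed to `decide` stays unrounded, in line with the validation rule below.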
6. Validation checks
- The confidence score must be mathematically consistent with the field-by-field similarity scores and weights. Don't round intermediate calculations.
- Every field present in either record must appear in the comparison table, even if it's empty on one side.
- Research notes must cite specific sources (e.g., "LinkedIn company page" or "State business registry") rather than vague references.
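The first two checks are mechanical and could be automated roughly as below; the third (source specificity) still needs human or model judgment. The function `validate`, its input shape, and the field keys are assumptions for illustration. It recomputes the score from the table's similarity values and compares with a tight tolerance, so a reported score that was rounded gets flagged.

```python
import math
from typing import Dict, Iterable, List, Optional

# Relative weights from Appendix A; the dictionary keys are hypothetical.
WEIGHTS = {"id_numbers": 50, "name": 25, "website": 15,
           "email": 10, "phone": 10, "address": 10}


def validate(similarities: Dict[str, Optional[float]],
             reported_confidence: float,
             present_in_either: Iterable[str]) -> List[str]:
    """Return validation failures; an empty list means both checks pass.

    `similarities` has one entry per comparison-table row: a score in
    [0.0, 1.0], or None when the field is empty on one side.
    """
    problems = []
    missing = set(present_in_either) - set(similarities)
    if missing:
        problems.append(f"missing from comparison table: {sorted(missing)}")
    total = sum(WEIGHTS[f] for f in similarities)
    if total:
        earned = sum(WEIGHTS[f] * (s or 0.0) for f, s in similarities.items())
        expected = earned / total * 100
        # A tight tolerance flags a score that was rounded or drifted.
        if not math.isclose(reported_confidence, expected, rel_tol=1e-9):
            problems.append(
                f"confidence {reported_confidence} != recomputed {expected}")
    return problems


rows = {"name": 1.0, "phone": 0.0}
print(validate(rows, 25 / 35 * 100, {"name", "phone"}))  # []
print(validate(rows, 71.0, {"name", "phone", "email"}))  # two problems
```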
7. Edge cases
- Both records nearly empty: if fewer than two comparable fields exist, return "Possible Match" with a confidence of 50 and a note that there isn't enough data for a reliable determination.
- One record is clearly a subsidiary: if research reveals that one entity is a subsidiary or DBA of the other, return "Possible Match" with an explanation rather than a flat "Match".
- Conflicting ID numbers: this overrides everything else. If both records contain an ID number and they don't match, the decision is "No Match" even if every other field is identical.
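One way to layer these edge cases on top of the threshold decision is sketched below. The function name, its parameters, and the precedence (ID conflict first, then sparse data, then subsidiary or DBA) are assumptions. The sketch also interprets "comparable fields" as fields that have values in both records.

```python
from typing import Optional, Tuple


def apply_edge_cases(decision: str, confidence: float, comparable_count: int,
                     ids_conflict: bool = False,
                     subsidiary_or_dba: bool = False
                     ) -> Tuple[str, float, Optional[str]]:
    """Adjust a threshold-based decision for the documented edge cases.

    Returns (decision, confidence, note). `comparable_count` is assumed to be
    the number of fields that have values in both records.
    """
    # Conflicting ID numbers override everything else.
    if ids_conflict:
        return "No Match", confidence, "ID numbers conflict."
    # Too little data: fixed confidence of 50 with an explanatory note.
    if comparable_count < 2:
        return ("Possible Match", 50.0,
                "Not enough data for a reliable determination.")
    # A subsidiary or DBA relationship yields Possible Match, not a flat Match.
    if subsidiary_or_dba:
        return ("Possible Match", confidence,
                "Research indicates a subsidiary or DBA relationship.")
    return decision, confidence, None


print(apply_edge_cases("Match", 90.0, comparable_count=1))
```

Putting the ID check first matters when IDs are the only shared field: the sparse-data rule would otherwise turn a definitive conflict into a 50-point "Possible Match".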
Appendix A: Field weights
| Field | Weight | Rationale |
|---|---|---|
| ID Numbers | 50 | Definitive identifier when present |
| Name | 25 | Primary identifier but subject to variation |
| Website | 15 | Strong signal, low ambiguity |
| Email Domain | 10 | Moderately strong if domains match |
| Phone | 10 | Useful but often changes or has multiple numbers |
| Address | 10 | Useful but companies relocate and have multiple offices |
Weights are relative, not absolute: the agent rescales them to sum to 100 across the active fields (those with a value in at least one record).
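To illustrate the rescaling, the sketch below converts the relative weights into each active field's share of 100. The `WEIGHTS` keys and `effective_weights` are hypothetical names. With only Name, Website, and Phone active, the relative weights total 50, so their shares become 50, 30, and 20.

```python
from typing import Dict, Iterable

# Relative weights from Appendix A; the dictionary keys are hypothetical.
WEIGHTS = {"id_numbers": 50, "name": 25, "website": 15,
           "email": 10, "phone": 10, "address": 10}


def effective_weights(active: Iterable[str]) -> Dict[str, float]:
    """Rescale the relative weights of the active fields so they sum to 100."""
    active = list(active)
    total = sum(WEIGHTS[f] for f in active)
    # Multiply before dividing so simple shares stay exact in floating point.
    return {f: 100 * WEIGHTS[f] / total for f in active}


print(effective_weights(["name", "website", "phone"]))
# {'name': 50.0, 'website': 30.0, 'phone': 20.0}
```

With every field active the total is 120, so ID Numbers carries about 41.7 of the 100 points and Name about 20.8.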
Appendix B: Normalization rules
| Field | Normalization |
|---|---|
| Name | Strip legal suffixes (Inc., LLC, Ltd., Corp.), lowercase, trim whitespace |
| Phone | Convert to E.164 format (+1XXXXXXXXXX). Strip extensions. |
| Email | Lowercase the entire address. Compare domains separately from local parts. |
| Address | Expand abbreviations (St→Street, Ave→Avenue, Blvd→Boulevard, Ste→Suite, Apt→Apartment). Standardize directionals (N→North, etc.). Remove punctuation. |
| Website | Remove protocol (http/https), "www.", and trailing slashes. Lowercase. |
| ID Numbers | Strip all non-alphanumeric characters. Uppercase any letters. |
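The rules in this table translate fairly directly into small string functions. The sketch below is one possible reading, with several assumptions: all function names are hypothetical; the name rule drops only one trailing suffix; the phone rule assumes a North American default country code (matching the +1 format above) and returns `None` when it cannot infer one; addresses are also lowercased, with punctuation replaced by spaces; and the abbreviation map covers only the examples listed.

```python
import re
from typing import Optional, Tuple

# Abbreviations and directionals from the table; a deliberately small map.
ADDRESS_TOKENS = {
    "st": "street", "ave": "avenue", "blvd": "boulevard",
    "ste": "suite", "apt": "apartment",
    "n": "north", "s": "south", "e": "east", "w": "west",
}


def normalize_name(name: str) -> str:
    """Lowercase, drop one trailing legal suffix, and collapse whitespace."""
    name = name.strip().lower()
    name = re.sub(r",?\s+(inc|llc|ltd|corp)\.?$", "", name)
    return " ".join(name.split())


def normalize_phone(phone: str, country_code: str = "1") -> Optional[str]:
    """E.164 with an assumed default country code; extensions are dropped."""
    has_plus = phone.strip().startswith("+")
    phone = re.sub(r"(?i)\s*(?:ext\.?|x)\s*\d+\s*$", "", phone)
    digits = re.sub(r"\D", "", phone)
    if has_plus:
        return "+" + digits
    if len(digits) == 10:
        return "+" + country_code + digits
    if len(digits) == 11 and digits.startswith(country_code):
        return "+" + digits
    return None  # country code cannot be inferred; leave for review


def normalize_email(email: str) -> Tuple[str, str]:
    """Lowercase and split into (local part, domain) for separate comparison."""
    local, at, domain = email.strip().lower().rpartition("@")
    return (local, domain) if at else (domain, "")


def normalize_address(address: str) -> str:
    """Lowercase, replace punctuation with spaces, expand known tokens."""
    words = re.sub(r"[^\w\s]", " ", address.lower()).split()
    return " ".join(ADDRESS_TOKENS.get(w, w) for w in words)


def normalize_website(url: str) -> str:
    """Drop protocol, a leading "www.", and trailing slashes; lowercase."""
    url = re.sub(r"^https?://", "", url.strip().lower())
    return re.sub(r"^www\.", "", url).rstrip("/")


def normalize_id(value: str) -> str:
    """Keep only letters and digits, uppercased."""
    return re.sub(r"[^0-9A-Za-z]", "", value).upper()
```

Token-level expansion is deliberately naive: it reads "St" in "St Paul Ave" as "Street" and a unit letter "E" as "East", so a production version would need context-aware rules.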

