**Entity Matcher**

1. Overview

This agent compares two records and determines whether they represent the same real-world entity. It accepts free-text input for each record, automatically extracts and normalizes relevant fields, calculates a weighted confidence score, and returns a match decision with a field-by-field comparison.

When extracted information is ambiguous or incomplete, the agent researches publicly available sources to fill gaps before making its determination.

2. Business value

  • Deduplication at scale: catches duplicates that simple string matching misses (e.g., "IBM" vs. "International Business Machines").

  • Audit trail: the field-by-field comparison table shows exactly what matched, conflicted, or was missing.

  • Reduced manual review: only "Possible Match" results need human attention.

  • Zero formatting burden: users paste whatever they have (a CRM export, an email signature, a LinkedIn snippet, text from a business card photo) and the agent extracts the fields itself.

3. Inputs

| Field | Type | Details |
| --- | --- | --- |
| Record A | Free text | Any unstructured text describing the entity: company name, contact info, address, ID numbers, or any combination. |
| Record B | Free text | Same as above for the second entity. |

The agent parses each text block and extracts whatever fields it can identify: name, address, phone, email, website, ID numbers (tax ID, DUNS, registration number, etc.). Unrecognized text is retained as context for research and fuzzy matching.
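
As a rough illustration of this extraction step, the sketch below pulls emails, phone numbers, and websites out of a text block with regular expressions and keeps the leftover text as context. The patterns and the `extract_fields` name are hypothetical and deliberately simple; they are not the agent's actual extraction logic, and names, addresses, and ID numbers would need more than pattern matching.

```python
import re

# Hypothetical, deliberately simple patterns (assumptions, not the agent's real rules).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"(?:\+|\()?\d[\d\s().-]{7,}\d")
WEBSITE_RE = re.compile(r"(?:https?://|www\.)[^\s,;]+", re.IGNORECASE)


def extract_fields(text: str) -> dict:
    """Find pattern-friendly fields and keep the remaining text as context."""
    fields = {
        "email": EMAIL_RE.findall(text),
        "phone": PHONE_RE.findall(text),
        "website": WEBSITE_RE.findall(text),
    }
    # Blank out recognized values so the leftover text can serve as context
    # for research and fuzzy matching.
    leftover = text
    for values in fields.values():
        for value in values:
            leftover = leftover.replace(value, " ")
    fields["context"] = " ".join(leftover.split())
    return fields


record_a = "Acme Corp, 12 Main St, (415) 555-0100, sales@acme.com, www.acme.com"
print(extract_fields(record_a))
```

Returning lists rather than single values matches the step that allows several phone numbers or email addresses per record.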

4. Outputs

| Field | Contents |
| --- | --- |
| Decision | Match, No Match, or Possible Match |
| Confidence Score | 0 to 100 |
| Extracted Fields | Table showing what the agent parsed from each input, so the user can verify extraction was correct |
| Field Comparison | Table showing each extracted field, both values, normalized forms, and whether they matched |
| Conflicts | List of fields where the two records directly contradict each other |
| Research Notes | Any publicly available information the agent found to fill gaps or resolve ambiguity |
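
One way to picture these outputs is as a small data structure. The sketch below is an assumption about shape only: the class and attribute names (`MatchResult`, `FieldComparison`, and so on) are hypothetical and do not come from the agent itself.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FieldComparison:
    """One row of the Field Comparison table (hypothetical layout)."""
    name: str
    value_a: Optional[str]
    value_b: Optional[str]
    normalized_a: Optional[str]
    normalized_b: Optional[str]
    similarity: Optional[float]  # None when the field is missing on one side


@dataclass
class MatchResult:
    decision: str            # "Match", "No Match", or "Possible Match"
    confidence: float        # 0 to 100
    extracted_fields: dict   # field name -> (value from A, value from B)
    field_comparison: list = field(default_factory=list)
    conflicts: list = field(default_factory=list)
    research_notes: list = field(default_factory=list)


result = MatchResult(
    decision="Possible Match",
    confidence=62.5,
    extracted_fields={"name": ("Acme Corp", "Acme Corporation")},
)
result.field_comparison.append(
    FieldComparison("name", "Acme Corp", "Acme Corporation",
                    "acme", "acme corporation", 0.8)
)
print(result.decision, result.confidence, len(result.field_comparison))
```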

5. Execution steps

  1. Extract fields from free text. Parse each input and identify: entity name, address, phone number(s), email address(es), website, and any ID numbers. If the text is ambiguous (e.g., two names appear), use surrounding context to determine which is the primary entity. Present the extracted fields table so the user can see what was parsed.

  2. Normalize extracted fields. Apply the normalization rules in Appendix B: trim whitespace, convert phone numbers to E.164, expand address abbreviations, lowercase email addresses, remove the protocol and "www." from URLs, strip legal suffixes from names, and strip non-alphanumeric characters from ID numbers.

  3. Compare each field pair. For each field present in both records, calculate a similarity score from 0.0 to 1.0:

    • Exact match after normalization: 1.0

    • ID numbers match: 1.0 (definitive)

    • Fuzzy name match (e.g., "Acme Corp" vs. "Acme Corporation"): 0.7 to 0.9 depending on edit distance

    • Partial address match (same street, different suite): 0.5 to 0.8

    • Same email domain but different local part: 0.3

    • No meaningful similarity: 0.0

  4. Research gaps. If a field is present in one record but missing in the other, and the running confidence is in the 40-75 range, attempt to fill the gap using publicly available information on the web. Note findings and sources in Research Notes.

  5. Calculate weighted confidence score.

    Confidence = Sum(field_similarity x field_weight) / Sum(active_field_weights) x 100

    Only include fields where at least one record has a value. Use weights from Appendix A.

  6. Determine decision:

    • 80 or above: Match

    • 40 to below 80: Possible Match

    • Below 40: No Match

    Exception: if any ID number matches exactly, the decision is Match regardless of other fields. If ID numbers are present in both records and don't match, the decision is No Match regardless of other fields.

  7. Flag conflicts. Any field where both records have values and the similarity score is below 0.3 is listed as a conflict.
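
The email comparison from step 3, the decision rule from step 6, and the conflict rule from step 7 can be sketched as follows. The function names are hypothetical. Two assumptions go beyond the text: IDs are compared as already-normalized sets, and when several IDs are present a single shared ID is enough to force Match (the steps do not say what happens if one ID matches and another conflicts).

```python
def email_similarity(a: str, b: str) -> float:
    """Step 3 for emails: 1.0 if identical, 0.3 if only the domain matches."""
    a, b = a.strip().lower(), b.strip().lower()
    if a == b:
        return 1.0
    # rpartition keeps everything after the last "@" as the domain.
    if a.rpartition("@")[2] == b.rpartition("@")[2]:
        return 0.3
    return 0.0


def decide(confidence: float, ids_a: set, ids_b: set) -> str:
    """Step 6: apply the ID override first, then the score thresholds."""
    if ids_a and ids_b:
        # Assumption: any shared ID wins; IDs on both sides with none shared lose.
        return "Match" if ids_a & ids_b else "No Match"
    if confidence >= 80:
        return "Match"
    if confidence >= 40:
        return "Possible Match"
    return "No Match"


def flag_conflicts(similarities: dict) -> list:
    """Step 7: fields present in both records with similarity below 0.3."""
    return [name for name, score in similarities.items()
            if score is not None and score < 0.3]


scores = {
    "name": 0.85,
    "email": email_similarity("Sales@acme.com", "info@acme.com"),  # 0.3
    "phone": 0.0,
    "website": None,  # present in only one record, so never a conflict
}
print(flag_conflicts(scores))                        # ['phone']
print(decide(62.0, set(), set()))                    # Possible Match
print(decide(15.0, {"123456789"}, {"123456789"}))    # Match
print(decide(95.0, {"A1"}, {"B2"}))                  # No Match
```

Note that a same-domain email pair scores exactly 0.3, which is not below the conflict threshold, so it is not flagged.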

6. Validation checks

  • The confidence score must be mathematically consistent with the field-by-field scores and weights. Don't round intermediate calculations.

  • Every field extracted from either record must appear in the comparison table, even if it's empty on one side.

  • Research notes must cite specific sources (e.g., "LinkedIn company page," "State business registry").

  • If extraction is uncertain (e.g., it's unclear whether a string is a company name or a person's name), note the ambiguity in the Extracted Fields table.

7. Edge cases

  • Minimal or unstructured input: If the agent can only extract one identifiable field from each input, return Possible Match with confidence of 50 and a note that there isn't enough data for a reliable determination.

  • Subsidiary or DBA: If research reveals one entity is a subsidiary or DBA of the other, return Possible Match with an explanation rather than a flat Match.

  • Conflicting ID numbers: Overrides everything else. If both records contain an ID number and they don't match, the decision is No Match even if every other field is identical.

  • Multiple entities in one input: If the text appears to describe more than one entity (e.g., a partnership listing), use the most prominent entity and note the ambiguity.

Appendix A: Field weights

| Field | Weight | Rationale |
| --- | --- | --- |
| ID Numbers | 50 | Definitive identifier when present |
| Name | 25 | Primary identifier but subject to variation |
| Website | 15 | Strong signal, low ambiguity |
| Email Domain | 10 | Moderately strong if domains match |
| Phone | 10 | Useful but often changes |
| Address | 10 | Useful but companies relocate |

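Weights are relative and do not need to sum to 100 (all six together total 120). The agent rescales the score to a 0 to 100 range using only the weights of the fields that are actually present.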
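
A minimal sketch of the confidence formula from step 5 with these weights is shown below. The `confidence` function and the dictionary keys are hypothetical names. One assumption is made explicit: a field present in only one record stays in the denominator and contributes a similarity of 0, since the steps say to include it but do not define its score. Fields missing from both records are simply left out of the input.

```python
# Weights copied from the table above.
WEIGHTS = {
    "id_numbers": 50,
    "name": 25,
    "website": 15,
    "email_domain": 10,
    "phone": 10,
    "address": 10,
}


def confidence(similarities: dict) -> float:
    """Weighted average of field similarities, scaled to 0-100.

    similarities maps field name -> score in [0, 1], or None when only one
    record has a value (assumed here to count as 0).
    """
    active = {f: s for f, s in similarities.items() if f in WEIGHTS}
    total_weight = sum(WEIGHTS[f] for f in active)
    if total_weight == 0:
        return 0.0
    weighted = sum((s or 0.0) * WEIGHTS[f] for f, s in active.items())
    return weighted / total_weight * 100


# Name is a fuzzy match, website matches exactly, phone is on one side only.
example = {"name": 0.8, "website": 1.0, "phone": None}
# (0.8 * 25 + 1.0 * 15 + 0 * 10) / (25 + 15 + 10) * 100
print(round(confidence(example), 2))  # 70.0
```

Rounding happens only when printing, in line with the validation check that intermediate calculations stay unrounded.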

Appendix B: Normalization rules

| Field | Normalization |
| --- | --- |
| Name | Strip legal suffixes (Inc., LLC, Ltd., Corp.), lowercase, trim whitespace |
| Phone | Convert to E.164 (+1XXXXXXXXXX), strip extensions |
| Email | Lowercase entire address, compare domains separately from local parts |
| Address | Expand abbreviations (St to Street, Ave to Avenue, Blvd to Boulevard, Ste to Suite), standardize directionals (N to North), remove punctuation |
| Website | Remove protocol, "www.", and trailing slashes, lowercase |
| ID Numbers | Strip all non-alphanumeric characters, uppercase any letters |
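
Four of these rules (name, phone, website, ID numbers) can be sketched in a few lines. All function names are hypothetical. The phone helper assumes a North American default country code when no leading "+" is given, and the extension pattern ("ext" or "x" followed by digits) is also an assumption; real-world E.164 conversion usually needs a dedicated phone-number library.

```python
import re

LEGAL_SUFFIXES = ("inc", "llc", "ltd", "corp")


def normalize_name(name: str) -> str:
    """Lowercase, trim, and drop trailing legal suffixes such as 'Inc.'."""
    words = name.lower().replace(",", " ").split()
    while words and words[-1].rstrip(".") in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


def normalize_phone(phone: str, default_country_code: str = "1") -> str:
    """Strip an extension and format as E.164 (assumes +1 when unspecified)."""
    phone = re.sub(r"\s*(?:ext\.?|x)\s*\d+\s*$", "", phone, flags=re.IGNORECASE)
    digits = re.sub(r"\D", "", phone)
    if not phone.strip().startswith("+") and len(digits) == 10:
        return "+" + default_country_code + digits
    return "+" + digits


def normalize_website(url: str) -> str:
    """Remove protocol, 'www.', and trailing slashes; lowercase."""
    url = url.strip().lower()
    url = re.sub(r"^[a-z][a-z0-9+.-]*://", "", url)
    url = re.sub(r"^www\.", "", url)
    return url.rstrip("/")


def normalize_id(value: str) -> str:
    """Keep only letters and digits, uppercased."""
    return re.sub(r"[^A-Za-z0-9]", "", value).upper()


print(normalize_name("ACME, Inc."))                    # acme
print(normalize_phone("(415) 555-0100 ext. 12"))       # +14155550100
print(normalize_website("https://www.Acme.com/"))      # acme.com
print(normalize_id("gb 123-abc"))                      # GB123ABC
```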