Entity Matcher

1. Overview

This agent compares two records and determines whether they represent the same real-world entity. It accepts free-text input for each record, automatically extracts and normalizes relevant fields, calculates a weighted confidence score, and returns a match decision with a field-by-field comparison.

When extracted information is ambiguous or incomplete, the agent researches publicly available sources to fill gaps before making its determination.

2. Business value

Deduplication at scale: catches duplicates that simple string matching misses (e.g., "IBM" vs. "International Business Machines").
Audit trail: the field-by-field comparison table shows exactly what matched, conflicted, or was missing.
Reduced manual review: only "Possible Match" results need human attention.
Zero formatting burden: users paste whatever they have (a CRM export, an email signature, a LinkedIn snippet, text from a business card photo) and the agent figures it out.

3. Inputs

Field	Type	Details
Record A	Free text	Any unstructured text describing the entity: company name, contact info, address, ID numbers, or any combination.
Record B	Free text	Same as above for the second entity.

The agent parses each text block and extracts whatever fields it can identify: name, address, phone, email, website, ID numbers (tax ID, DUNS, registration number, etc.). Unrecognized text is retained as context for research and fuzzy matching.

4. Outputs

Field	Contents
Decision	`Match`, `No Match`, or `Possible Match`
Confidence Score	0 to 100
Extracted Fields	Table showing what the agent parsed from each input, so the user can verify extraction was correct
Field Comparison	Table showing each extracted field, both values, normalized forms, and whether they matched
Conflicts	List of fields where the two records directly contradict each other
Research Notes	Any publicly available information the agent found to fill gaps or resolve ambiguity
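
As an illustration only, the output fields above could be carried in a structure like the following. The class and attribute names (`MatchResult`, `FieldComparison`, and so on) are hypothetical and chosen for this sketch; the template itself does not prescribe a schema.

```python
from dataclasses import dataclass, field
from typing import Optional

# The three decision labels listed in the Outputs table.
DECISIONS = ("Match", "No Match", "Possible Match")


@dataclass
class FieldComparison:
    """One row of the Field Comparison table (hypothetical layout)."""
    field_name: str
    value_a: Optional[str]        # raw value from Record A, None if missing
    value_b: Optional[str]        # raw value from Record B, None if missing
    normalized_a: Optional[str]
    normalized_b: Optional[str]
    matched: bool


@dataclass
class MatchResult:
    """Container for the six output fields (hypothetical layout)."""
    decision: str
    confidence: float                                   # 0 to 100
    extracted_fields: dict[str, dict[str, Optional[str]]]
    field_comparison: list[FieldComparison]
    conflicts: list[str] = field(default_factory=list)
    research_notes: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject values outside the ranges stated in the Outputs table.
        if self.decision not in DECISIONS:
            raise ValueError(f"unknown decision: {self.decision!r}")
        if not 0 <= self.confidence <= 100:
            raise ValueError("confidence must be between 0 and 100")
```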

5. Execution steps

Extract fields from free text. Parse each input and identify: entity name, address, phone number(s), email address(es), website, and any ID numbers. If the text is ambiguous (e.g., two names appear), use surrounding context to determine which is the primary entity. Present the extracted fields table so the user can see what was parsed.
Normalize extracted fields. Apply the normalization rules in Appendix B: trim whitespace, convert phone numbers to E.164, expand address abbreviations, lowercase email addresses, remove the protocol and "www." from URLs, and strip legal suffixes from names.
Compare each field pair. For each field present in both records, calculate a similarity score from 0.0 to 1.0:
- Exact match after normalization: 1.0
- ID numbers match: 1.0 (definitive)
- Fuzzy name match (e.g., "Acme Corp" vs. "Acme Corporation"): 0.7 to 0.9 depending on edit distance
- Partial address match (same street, different suite): 0.5 to 0.8
- Same email domain but different local part: 0.3
- No meaningful similarity: 0.0
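
A minimal sketch of how two of the scoring rules above might be implemented. It assumes inputs are already normalized, and it uses Python's `difflib.SequenceMatcher` ratio as a stand-in for an edit-distance measure. The cutoff `FUZZY_CUTOFF` and the linear mapping onto the 0.7 to 0.9 band are assumptions made for this example, not values from the spec.

```python
from difflib import SequenceMatcher

# Assumption: ratios below this cutoff count as "no meaningful similarity".
FUZZY_CUTOFF = 0.6


def name_similarity(a: str, b: str) -> float:
    """Score two normalized names on the 0.0 to 1.0 scale."""
    if a == b:
        return 1.0  # exact match after normalization
    ratio = SequenceMatcher(None, a, b).ratio()
    if ratio < FUZZY_CUTOFF:
        return 0.0
    # Map ratios in [FUZZY_CUTOFF, 1.0) linearly onto the 0.7 to 0.9 band.
    return 0.7 + 0.2 * (ratio - FUZZY_CUTOFF) / (1.0 - FUZZY_CUTOFF)


def email_similarity(a: str, b: str) -> float:
    """Score two normalized email addresses."""
    if a == b:
        return 1.0
    # Same domain with a different local part scores 0.3.
    if a.rpartition("@")[2] == b.rpartition("@")[2]:
        return 0.3
    return 0.0
```

With these assumptions, `name_similarity("acme corp", "acme corporation")` lands inside the 0.7 to 0.9 band, while unrelated names fall to 0.0.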
Research gaps. If a field is present in one record but missing in the other, and the running confidence is in the 40-75 range, attempt to fill the gap using publicly available information on the web. Note findings and sources in Research Notes.
Calculate weighted confidence score.

Confidence = Sum(field_similarity x field_weight) / Sum(active_field_weights) x 100

Only include fields where at least one record has a value. Use weights from Appendix A.
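
The formula above can be sketched as follows, using the weights from Appendix A. The function name and the dictionary keys are hypothetical. The sketch also makes one assumption the text leaves open: a field that has a value in only one record is passed as `None` and contributes a similarity of 0 while still counting toward the active weights.

```python
from typing import Dict, Optional

# Weights from Appendix A (the key names are chosen for this sketch).
WEIGHTS = {
    "id_numbers": 50,
    "name": 25,
    "website": 15,
    "email_domain": 10,
    "phone": 10,
    "address": 10,
}


def confidence_score(similarities: Dict[str, Optional[float]]) -> float:
    """Weighted confidence on a 0 to 100 scale.

    `similarities` holds one entry per active field, meaning a field where
    at least one record has a value. None marks a field present in only one
    record; treating it as 0.0 is an assumption of this sketch.
    """
    total_weight = sum(WEIGHTS[name] for name in similarities)
    if total_weight == 0:
        return 0.0
    weighted = sum((score or 0.0) * WEIGHTS[name]
                   for name, score in similarities.items())
    # No rounding here, in line with the validation checks.
    return 100 * weighted / total_weight
```

For example, with name at 0.8, website at 1.0, and phone at 0.0, the active weights sum to 50 and the weighted similarities sum to 35, so the confidence is 35 / 50 x 100 = 70.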
Determine decision:
- 80 to 100: Match
- 40 to 79: Possible Match
- 0 to 39: No Match
Exception: if any ID number matches exactly, the decision is Match regardless of other fields. If ID numbers are present in both records and don't match, the decision is No Match regardless of other fields.
Flag conflicts. Any field where both records have values and the similarity score is below 0.3 is listed as a conflict.
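
The decision and conflict rules above could be expressed roughly as below. The function names are hypothetical. Because the bands are given as whole numbers, the handling of fractional scores (for example, treating 79.5 as Possible Match) is an assumption of this sketch.

```python
from typing import Dict, List, Optional


def decide(confidence: float, id_similarity: Optional[float] = None) -> str:
    """Map a confidence score to a decision, applying the ID override first.

    `id_similarity` is None unless ID numbers are present in both records.
    """
    if id_similarity is not None:
        # An exact ID match forces Match; differing IDs force No Match.
        return "Match" if id_similarity == 1.0 else "No Match"
    if confidence >= 80:
        return "Match"
    if confidence >= 40:
        return "Possible Match"
    return "No Match"


def flag_conflicts(similarities: Dict[str, Optional[float]]) -> List[str]:
    """List fields where both records have values and similarity is below 0.3.

    None marks a field missing on one side, which is a gap, not a conflict.
    """
    return [name for name, score in similarities.items()
            if score is not None and score < 0.3]
```

Note that a shared email domain with different local parts scores exactly 0.3, so under this reading it is not flagged as a conflict.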

6. Validation checks

The confidence score must be mathematically consistent with the field-by-field scores and weights. Don't round intermediate calculations.
Every field extracted from either record must appear in the comparison table, even if it's empty on one side.
Research notes must cite specific sources (e.g., "LinkedIn company page," "State business registry").
If extraction is uncertain (e.g., it's unclear whether a string is a company name or a person's name), note the ambiguity in the Extracted Fields table.

7. Edge cases

Minimal or unstructured input: If the agent can extract only one identifiable field from each input, return Possible Match with a confidence of 50 and a note that there isn't enough data for a reliable determination.
Subsidiary or DBA: If research reveals one entity is a subsidiary or DBA of the other, return Possible Match with an explanation rather than a flat Match.
Conflicting ID numbers: Overrides everything else. If both records contain an ID number and they don't match, the decision is No Match even if every other field is identical.
Multiple entities in one input: If the text appears to describe more than one entity (e.g., a partnership listing), use the most prominent entity and note the ambiguity.

Appendix A: Field weights

Field	Weight	Rationale
ID Numbers	50	Definitive identifier when present
Name	25	Primary identifier but subject to variation
Website	15	Strong signal, low ambiguity
Email Domain	10	Moderately strong if domains match
Phone	10	Useful but often changes
Address	10	Useful but companies relocate

Weights are relative. The agent normalizes to 100 based on which fields are actually present.
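
To make the renormalization concrete, here is a small sketch that converts the relative weights into percentage shares for whichever fields are active. The function name and dictionary keys are hypothetical.

```python
from typing import Dict, Iterable

# Relative weights from the table above (the key names are chosen for this sketch).
WEIGHTS = {
    "id_numbers": 50,
    "name": 25,
    "website": 15,
    "email_domain": 10,
    "phone": 10,
    "address": 10,
}


def effective_weights(active_fields: Iterable[str]) -> Dict[str, float]:
    """Rescale the relative weights of the active fields so they sum to 100."""
    fields = list(active_fields)
    total = sum(WEIGHTS[name] for name in fields)
    return {name: 100 * WEIGHTS[name] / total for name in fields}
```

If only name, website, and phone are present, their shares become 50, 30, and 20. With all six fields present the raw weights sum to 120, so ID numbers carry about 41.7 percent of the score.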

Appendix B: Normalization rules

Field	Normalization
Name	Strip legal suffixes (Inc., LLC, Ltd., Corp.), lowercase, trim whitespace
Phone	Convert to E.164 (e.g., +1XXXXXXXXXX for US numbers), strip extensions
Email	Lowercase entire address, compare domains separately from local parts
Address	Expand abbreviations (St to Street, Ave to Avenue, Blvd to Boulevard, Ste to Suite), standardize directionals (N to North), remove punctuation
Website	Remove protocol, "www.", and trailing slashes, lowercase
ID Numbers	Strip all non-alphanumeric characters, uppercase any letters
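
The rules in the table above might look like this in code. This is a sketch under stated assumptions, not the agent's actual implementation: all function names are hypothetical, the phone helper handles US numbers only, the address helper lowercases its input and covers S, E, and W in addition to the N directional listed in the table, and the abbreviation map is deliberately small (for instance, it would wrongly expand "St" when it means "Saint").

```python
import re
from typing import Optional, Tuple

ADDRESS_EXPANSIONS = {
    "st": "street", "ave": "avenue", "blvd": "boulevard", "ste": "suite",
    # Only N to North is listed in the table; S, E, and W are assumed by analogy.
    "n": "north", "s": "south", "e": "east", "w": "west",
}


def normalize_name(name: str) -> str:
    """Lowercase, trim, and strip trailing legal suffixes (Inc., LLC, Ltd., Corp.)."""
    text = name.strip().lower()
    previous = None
    while previous != text:  # repeat in case several suffixes are stacked
        previous = text
        text = re.sub(r"[,\s]+(?:inc|llc|ltd|corp)\.?$", "", text).strip()
    return text


def normalize_phone_us(phone: str) -> Optional[str]:
    """Convert a US number to E.164, dropping any extension; None if not recognized."""
    text = re.sub(r"(?:ext\.?|x)\s*\d+\s*$", "", phone.strip(), flags=re.IGNORECASE)
    digits = re.sub(r"\D", "", text)
    if len(digits) == 10:
        return "+1" + digits
    if len(digits) == 11 and digits.startswith("1"):
        return "+" + digits
    return None


def normalize_email(email: str) -> Tuple[str, str]:
    """Lowercase the address and return (local_part, domain) for separate comparison."""
    local, _, domain = email.strip().lower().rpartition("@")
    return local, domain


def normalize_address(address: str) -> str:
    """Remove punctuation, then expand abbreviations and directionals token by token."""
    text = re.sub(r"[^\w\s]", "", address.lower())
    return " ".join(ADDRESS_EXPANSIONS.get(token, token) for token in text.split())


def normalize_website(url: str) -> str:
    """Lowercase and remove the protocol, a leading "www.", and trailing slashes."""
    text = re.sub(r"^[a-z][a-z0-9+.-]*://", "", url.strip().lower())
    if text.startswith("www."):
        text = text[4:]
    return text.rstrip("/")


def normalize_id(value: str) -> str:
    """Strip all non-alphanumeric characters and uppercase any letters."""
    return re.sub(r"[^A-Za-z0-9]", "", value).upper()
```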
