1. Overview

This agent compares two records (companies, contacts, or products) and determines whether they represent the same real-world entity. It normalizes fields, calculates a weighted confidence score, and returns a match decision with a field-by-field comparison showing exactly what matched, what conflicted, and what was missing.

When a field is ambiguous or incomplete, the agent researches publicly available information (company websites, business registries, etc.) to fill gaps before making its determination.

2. Business value

Deduplication at scale: catches duplicates that simple string matching misses (e.g., "IBM" vs. "International Business Machines," "123 Main St" vs. "123 Main Street, Suite 200").
Audit trail: the field-by-field comparison table makes it easy to verify why a match was or wasn't made, which matters for compliance and data governance.
Reduced manual review: only records in the "Possible Match" range need human attention. Clear matches and clear non-matches are resolved automatically.

3. Inputs

Two records with the following fields. Not every field needs to be filled; the agent works with whatever is available.

Field	Type	Details
Record A: Name	Text	Company name, person name, or product name
Record A: Address	Text	Full mailing address
Record A: Phone	Text	Any format
Record A: Email	Text	Email address
Record A: Website	Text	URL
Record A: ID Numbers	Text (optional)	Tax ID, DUNS number, registration number, etc.
Record B: Name	Text	Same as Record A
Record B: Address	Text	Same as Record A
Record B: Phone	Text	Same as Record A
Record B: Email	Text	Same as Record A
Record B: Website	Text	Same as Record A
Record B: ID Numbers	Text (optional)	Same as Record A

4. Outputs

Field	Contents
Decision	`Match`, `No Match`, or `Possible Match`
Confidence Score	0 to 100
Field Comparison	Table showing each field, both values, the normalized forms, and whether they matched
Conflicts	List of fields where the two records directly contradict each other
Research Notes	Any publicly available information the agent found to fill gaps or resolve ambiguity

5. Execution steps

Normalize both records. Apply the normalization rules in Appendix B to each field: trim whitespace, strip legal suffixes from names, standardize phone numbers to E.164, expand address abbreviations (St → Street, Ave → Avenue), lowercase email addresses, and remove the protocol, "www.", and trailing slashes from URLs.
Compare each field pair. For each field present in both records, calculate a similarity score from 0.0 to 1.0:
- Exact match after normalization → 1.0
- ID numbers match → 1.0 (these are definitive)
- Fuzzy name match (e.g., "Acme Corp" vs. "Acme Corporation") → 0.7 to 0.9 depending on edit distance
- Partial address match (same street, different suite) → 0.5 to 0.8
- Same email domain but different local part → 0.3
- No meaningful similarity → 0.0
Research gaps. If a field is present in one record but missing in the other, and the running confidence score from the fields compared so far is in the 40-75 range, attempt to fill the gap using publicly available information. Record what was found and its source in Research Notes.
Calculate weighted confidence score. Multiply each field's similarity score by its weight from Appendix A, then sum:

Confidence = Σ (field_similarity × field_weight) / Σ (active_field_weights) × 100

Only include fields where at least one record has a value. If neither record has a value for a field, exclude it from the calculation entirely.
Determine decision:
- 80 to 100: Match
- 40 to 79: Possible Match
- 0 to 39: No Match
Exception: if any ID number matches exactly, the decision is Match regardless of other fields. If ID numbers are present in both records and don't match, the decision is No Match regardless of other fields.
Flag conflicts. Any field where both records have values and the similarity score is below 0.3 should be listed as a conflict.
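
For the field comparison step above, the following is a minimal sketch of how per-field similarity might be scored. The function names, the `fuzzy_threshold` cutoff, and the linear mapping of edit distance into the 0.7 to 0.9 band are all assumptions made for illustration; the specification only states the target ranges, not how the agent arrives at them.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str, fuzzy_threshold: float = 0.5) -> float:
    """Score two already-normalized names on the 0.0 to 1.0 scale."""
    if a == b:
        return 1.0  # exact match after normalization
    longest = max(len(a), len(b))
    closeness = 1 - edit_distance(a, b) / longest
    if closeness < fuzzy_threshold:
        return 0.0  # no meaningful similarity
    # Hypothetical mapping: stretch [threshold, 1) onto the 0.7-0.9 fuzzy band.
    return 0.7 + 0.2 * (closeness - fuzzy_threshold) / (1 - fuzzy_threshold)


def email_similarity(a: str, b: str) -> float:
    """Score two already-normalized email addresses."""
    if a == b:
        return 1.0
    if a.rpartition("@")[2] == b.rpartition("@")[2]:
        return 0.3  # same domain, different local part
    return 0.0
```

With these assumptions, `name_similarity("acme corp", "acme corporation")` falls inside the fuzzy band, and two different mailboxes at the same domain score 0.3, which sits exactly on the conflict threshold rather than below it.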

6. Validation checks

The confidence score must be mathematically consistent with the field-by-field similarity scores and weights. Don't round intermediate calculations.
Every field present in either record must appear in the comparison table, even if it's empty on one side.
Research notes must cite specific sources (e.g., "LinkedIn company page" or "State business registry") rather than vague references.

7. Edge cases

Both records nearly empty: if fewer than two comparable fields exist, return Possible Match with confidence of 50 and a note that there isn't enough data for a reliable determination.
One record is clearly a subsidiary: if research reveals that one entity is a subsidiary or DBA of the other, return Possible Match with an explanation rather than a flat Match.
Conflicting ID numbers: this overrides everything else. If both records contain an ID number and they don't match, the decision is No Match even if every other field is identical.

Appendix A: Field weights

Field	Weight	Rationale
ID Numbers	50	Definitive identifier when present
Name	25	Primary identifier but subject to variation
Website	15	Strong signal, low ambiguity
Email Domain	10	Moderately strong if domains match
Phone	10	Useful but often changes or has multiple numbers
Address	10	Useful but companies relocate and have multiple offices

Weights are relative, not absolute. The agent normalizes to 100 based on which fields are actually present.
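
As a rough illustration of how the relative weights renormalize, the sketch below applies the confidence formula from the execution steps to the weights in this appendix. The dictionary keys and the function name `confidence_score` are illustrative assumptions, not part of the agent's specification. The specification does not say what similarity a field receives when only one record has a value, so the sketch leaves that choice to the caller.

```python
from typing import Dict, Optional

# Weights copied from the table above; the key names are hypothetical.
FIELD_WEIGHTS: Dict[str, float] = {
    "id_numbers": 50,
    "name": 25,
    "website": 15,
    "email_domain": 10,
    "phone": 10,
    "address": 10,
}


def confidence_score(similarities: Dict[str, Optional[float]]) -> float:
    """Weighted confidence on a 0-100 scale.

    `similarities` maps a field name to its 0.0-1.0 similarity score.
    Fields that are empty in both records are omitted or set to None,
    which removes their weight from the denominator.
    """
    active = {f: s for f, s in similarities.items() if s is not None}
    total_weight = sum(FIELD_WEIGHTS[f] for f in active)
    if total_weight == 0:
        raise ValueError("no comparable fields")
    weighted = sum(s * FIELD_WEIGHTS[f] for f, s in active.items())
    return weighted / total_weight * 100  # no intermediate rounding


# Name, website, and phone are active: (0.8*25 + 1.0*15 + 0.0*10) / 50 * 100
example = confidence_score({"name": 0.8, "website": 1.0, "phone": 0.0, "address": None})
```

Here only 50 of the 120 weight points are active, so the score is computed against 50 and comes out at about 70, which falls in the Possible Match band.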

Appendix B: Normalization rules

Field	Normalization
Name	Strip legal suffixes (Inc., LLC, Ltd., Corp.), lowercase, trim whitespace
Phone	Convert to E.164 format (e.g., +1XXXXXXXXXX for US numbers). Strip extensions.
Email	Lowercase the entire address. Compare domains separately from local parts.
Address	Expand abbreviations (St→Street, Ave→Avenue, Blvd→Boulevard, Ste→Suite, Apt→Apartment). Standardize directionals (N→North, etc.). Remove punctuation.
Website	Remove protocol (http/https), "www.", and trailing slashes. Lowercase.
ID Numbers	Strip all non-alphanumeric characters. Uppercase any letters.
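
The rules in this table could be implemented along the following lines. This is a simplified sketch under several assumptions: the function names are hypothetical, phone handling assumes 10-digit national numbers with a default country code of 1, addresses are lowercased for comparison (the table does not require this), and the abbreviation map is naive (for example, it cannot tell "St" for Street from "St" for Saint).

```python
import re

LEGAL_SUFFIXES = ("inc", "llc", "ltd", "corp")
ADDRESS_WORDS = {
    "st": "street", "ave": "avenue", "blvd": "boulevard",
    "ste": "suite", "apt": "apartment",
    "n": "north", "s": "south", "e": "east", "w": "west",
}


def normalize_name(name: str) -> str:
    words = name.lower().replace(",", " ").split()
    # Drop trailing legal suffixes such as "Inc." or "LLC".
    while words and words[-1].rstrip(".") in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


def normalize_phone(phone: str, default_country_code: str = "1") -> str:
    # Remove a trailing extension ("x204", "ext. 204") before collecting digits.
    main = re.sub(r"(?i)\s*(?:ext\.?|x)\s*\d+\s*$", "", phone.strip())
    digits = re.sub(r"\D", "", main)
    if len(digits) == 10:  # assumed national number without country code
        digits = default_country_code + digits
    return "+" + digits


def normalize_email(email: str) -> str:
    return email.strip().lower()


def normalize_address(address: str) -> str:
    cleaned = re.sub(r"[^\w\s]", " ", address.lower())  # remove punctuation
    return " ".join(ADDRESS_WORDS.get(word, word) for word in cleaned.split())


def normalize_website(url: str) -> str:
    url = re.sub(r"^https?://", "", url.strip().lower())
    url = re.sub(r"^www\.", "", url)
    return url.rstrip("/")


def normalize_id(value: str) -> str:
    return re.sub(r"[^A-Za-z0-9]", "", value).upper()
```

For instance, `normalize_address("123 N. Main St., Ste 200")` yields `"123 north main street suite 200"`, so that it can be compared word by word against the other record's address.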

Entity Matcher

1. Overview

This agent compares two records and determines whether they represent the same real-world entity. It accepts free-text input for each record, automatically extracts and normalizes relevant fields, calculates a weighted confidence score, and returns a match decision with a field-by-field comparison.

When extracted information is ambiguous or incomplete, the agent researches publicly available sources to fill gaps before making its determination.

2. Business value

Deduplication at scale: catches duplicates that simple string matching misses (e.g., "IBM" vs. "International Business Machines").
Audit trail: the field-by-field comparison table shows exactly what matched, conflicted, or was missing.
Reduced manual review: only "Possible Match" results need human attention.
Zero formatting burden: users paste whatever they have (a CRM export, an email signature, a LinkedIn snippet, text copied from a business card photo) and the agent extracts the fields itself.

3. Inputs

Field	Type	Details
Record A	Free text	Any unstructured text describing the entity: company name, contact info, address, ID numbers, or any combination.
Record B	Free text	Same as above for the second entity.

The agent parses each text block and extracts whatever fields it can identify: name, address, phone, email, website, ID numbers (tax ID, DUNS, registration number, etc.). Unrecognized text is retained as context for research and fuzzy matching.
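
To give a feel for the extraction described above, here is a deliberately limited sketch that pulls only the pattern-like fields (email, website, phone) out of a text block with regular expressions. The patterns and the `extract_fields` name are assumptions for illustration; names, addresses, and ID numbers generally need context-aware parsing that simple patterns cannot provide, and the specification does not say how the agent performs it.

```python
import re
from typing import Dict, List

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"(?:https?://|www\.)[^\s,;]+", re.IGNORECASE)
# A digit, at least seven digits or separators, then a closing digit.
PHONE_RE = re.compile(r"\+?\d[\d ().-]{7,}\d")


def extract_fields(text: str) -> Dict[str, List[str]]:
    """Return every email, website, and phone-like string found in `text`."""
    return {
        "email": EMAIL_RE.findall(text),
        "website": URL_RE.findall(text),
        "phone": PHONE_RE.findall(text),
    }


sample = (
    "Acme Corp, 123 Main St\n"
    "jane.doe@acme.example | www.acme.example\n"
    "Tel: +1 (555) 010-2030"
)
fields = extract_fields(sample)
```

Returning lists rather than single values keeps inputs with several phone numbers or addresses intact, so the choice of a primary value can be made later and shown in the Extracted Fields table.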

4. Outputs

Field	Contents
Decision	`Match`, `No Match`, or `Possible Match`
Confidence Score	0 to 100
Extracted Fields	Table showing what the agent parsed from each input, so the user can verify extraction was correct
Field Comparison	Table showing each extracted field, both values, normalized forms, and whether they matched
Conflicts	List of fields where the two records directly contradict each other
Research Notes	Any publicly available information the agent found to fill gaps or resolve ambiguity

5. Execution steps

Extract fields from free text. Parse each input and identify: entity name, address, phone number(s), email address(es), website, and any ID numbers. If the text is ambiguous (e.g., two names appear), use surrounding context to determine which is the primary entity. Present the extracted fields table so the user can see what was parsed.
Normalize extracted fields. Apply the normalization rules in Appendix B: trim whitespace, strip legal suffixes from names, standardize phone numbers to E.164, expand address abbreviations, lowercase email addresses, and remove the protocol, "www.", and trailing slashes from URLs.
Compare each field pair. For each field present in both records, calculate a similarity score from 0.0 to 1.0:
- Exact match after normalization: 1.0
- ID numbers match: 1.0 (definitive)
- Fuzzy name match (e.g., "Acme Corp" vs. "Acme Corporation"): 0.7 to 0.9 depending on edit distance
- Partial address match (same street, different suite): 0.5 to 0.8
- Same email domain but different local part: 0.3
- No meaningful similarity: 0.0
Research gaps. If a field is present in one record but missing in the other, and the running confidence is in the 40-75 range, attempt to fill the gap using publicly available information on the web. Note findings and sources in Research Notes.
Calculate weighted confidence score.

Confidence = Σ (field_similarity × field_weight) / Σ (active_field_weights) × 100

Only include fields where at least one record has a value. Use weights from Appendix A.
Determine decision:
- 80 to 100: Match
- 40 to 79: Possible Match
- 0 to 39: No Match
Exception: if any ID number matches exactly, the decision is Match regardless of other fields. If ID numbers are present in both records and don't match, the decision is No Match regardless of other fields.
Flag conflicts. Any field where both records have values and the similarity score is below 0.3 is listed as a conflict.
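
The decision and conflict rules above can be sketched as two small functions. Several details are assumptions: the function names are hypothetical, each record is taken to carry at most one normalized ID number, and a fractional score between 79 and 80 is treated as Possible Match because the stated bands only cover whole numbers.

```python
from typing import Dict, List, Optional


def decide(confidence: float, id_a: Optional[str], id_b: Optional[str]) -> str:
    """Map a 0-100 confidence score to a decision, applying the ID override first."""
    if id_a and id_b:
        # IDs present in both records are definitive in either direction.
        return "Match" if id_a == id_b else "No Match"
    if confidence >= 80:
        return "Match"
    if confidence >= 40:
        return "Possible Match"
    return "No Match"


def find_conflicts(similarities: Dict[str, float]) -> List[str]:
    """List fields scored below 0.3.

    `similarities` should contain only fields that have a value in both records.
    """
    return [field for field, score in similarities.items() if score < 0.3]
```

Checking the ID override before the score bands means a record pair with mismatched IDs returns No Match even at a confidence of 95, which mirrors the exception stated in the decision step.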

6. Validation checks

The confidence score must be mathematically consistent with the field-by-field scores and weights. Don't round intermediate calculations.
Every field extracted from either record must appear in the comparison table, even if it's empty on one side.
Research notes must cite specific sources (e.g., "LinkedIn company page," "State business registry").
If extraction is uncertain (e.g., it's unclear whether a string is a company name or a person's name), note the ambiguity in the Extracted Fields table.

7. Edge cases

Minimal or unstructured input: If the agent can only extract one identifiable field from each input, return Possible Match with confidence of 50 and a note that there isn't enough data for a reliable determination.
Subsidiary or DBA: If research reveals one entity is a subsidiary or DBA of the other, return Possible Match with an explanation rather than a flat Match.
Conflicting ID numbers: Overrides everything else. If both records contain an ID number and they don't match, the decision is No Match even if every other field is identical.
Multiple entities in one input: If the text appears to describe more than one entity (e.g., a partnership listing), use the most prominent entity and note the ambiguity.

Appendix A: Field weights

Field	Weight	Rationale
ID Numbers	50	Definitive identifier when present
Name	25	Primary identifier but subject to variation
Website	15	Strong signal, low ambiguity
Email Domain	10	Moderately strong if domains match
Phone	10	Useful but often changes
Address	10	Useful but companies relocate

Weights are relative. The agent normalizes to 100 based on which fields are actually present.

Appendix B: Normalization rules

Field	Normalization
Name	Strip legal suffixes (Inc., LLC, Ltd., Corp.), lowercase, trim whitespace
Phone	Convert to E.164 (e.g., +1XXXXXXXXXX for US numbers), strip extensions
Email	Lowercase entire address, compare domains separately from local parts
Address	Expand abbreviations (St to Street, Ave to Avenue, Blvd to Boulevard, Ste to Suite), standardize directionals (N to North), remove punctuation
Website	Remove protocol, "www.", and trailing slashes, lowercase
ID Numbers	Strip all non-alphanumeric characters, uppercase any letters
