Introduction
This SOP describes how to identify and redact personally identifiable information (PII) from a text document and how to generate a list of the redacted items. The process supports privacy compliance and provides a clear audit trail of what information was removed.
Process
Step 1: Receive Input
- Accept the text containing potential PII
Step 2: Define PII Types
- Use the following list of common PII categories as the basis for detection:
  - Full names
  - Email addresses
  - Phone numbers
  - Physical addresses
  - Social Security Numbers (SSN) or national ID numbers
  - Passport numbers
  - Driver's license numbers
  - Financial account numbers (credit cards, bank accounts)
  - Dates of birth
  - Any other unique identifiers specific to the context
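One way to make a few of these categories machine-checkable is a table of regular expressions keyed by placeholder name. The sketch below is illustrative only: the name `PII_PATTERNS` and the patterns themselves are assumptions, they cover US-style formats, and categories such as full names or physical addresses generally need more than regular expressions to detect reliably.

```python
import re

# Hypothetical, deliberately simple patterns (assumptions, not part of the SOP).
# Keys are the placeholder names used later during redaction.
PII_PATTERNS = {
    # local-part@domain.tld
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # US Social Security Number written as 3-2-4 digits
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # US phone number written as 3-3-4 digits with -, . or space separators
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}
```

Context-specific identifiers (employee IDs, customer numbers) could be added to the same table as further entries.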
Step 3: Scan for PII
- Examine the text sequentially to locate every instance that matches one of the PII categories defined in Step 2.
- For each match, record:
  - The exact original text
  - The PII category (e.g., "Email address")
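The scan in Step 3 might be sketched as follows. The function name `scan_for_pii`, the `PATTERNS` table, and the two example patterns are assumptions made for illustration; the start and end offsets are recorded in addition to the original text and category only because they are convenient for later steps.

```python
import re

# Hypothetical patterns for two categories; a real deployment would cover more.
PATTERNS = {
    "Email address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "Phone number": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}

def scan_for_pii(text):
    """Return (start, end, original, category) tuples in order of appearance."""
    matches = []
    for category, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            matches.append((m.start(), m.end(), m.group(), category))
    # Sort by start offset so results follow the order of the text.
    matches.sort(key=lambda item: item[0])
    return matches

found = scan_for_pii("Call 555-123-4567 or write to jane@example.com.")
for start, end, original, category in found:
    print(category, "->", original)
```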
Step 4: Redact Detected PII
- Replace each identified PII instance with a placeholder that indicates the type of information removed. Use the format {{PII_TYPE}}, where PII_TYPE is the category in uppercase, for example:
  - {{NAME}}
  - {{EMAIL}}
  - {{PHONE}}
  - {{ADDRESS}}
  - {{SSN}}
- Preserve the surrounding text and formatting so the document remains readable.
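As a minimal sketch of the replacement step, the hypothetical helper `redact` below takes the text plus a list of `(start, end, pii_type)` spans, which are assumed to be non-overlapping, and substitutes each span with its `{{PII_TYPE}}` placeholder. Working from the end of the text backwards keeps the earlier offsets valid and leaves everything outside the spans untouched.

```python
def redact(text, spans):
    """Replace each (start, end, pii_type) span with a {{PII_TYPE}} placeholder.

    Assumes the spans do not overlap.
    """
    # Process spans from last to first so earlier offsets are not shifted.
    for start, end, pii_type in sorted(spans, reverse=True):
        placeholder = "{{" + pii_type.upper() + "}}"
        text = text[:start] + placeholder + text[end:]
    return text

sample = "Email jane@example.com or call 555-123-4567."
email_at = sample.index("jane@example.com")
phone_at = sample.index("555-123-4567")
spans = [
    (email_at, email_at + len("jane@example.com"), "email"),
    (phone_at, phone_at + len("555-123-4567"), "phone"),
]
print(redact(sample, spans))  # Email {{EMAIL}} or call {{PHONE}}.
```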
Step 5: Compile Redaction Log
- Create a list (array) of objects, each containing:
  - original: the exact text that was redacted
  - type: the PII category
- Order the list in the sequence the items appear in the original text.
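The log could be assembled along these lines. The helper name `compile_redaction_log` and the `(start, original, pii_type)` input shape are assumptions for illustration; the start offset is used only to put entries in document order and is not written to the log.

```python
def compile_redaction_log(matches):
    """Build the redaction log from (start, original, pii_type) tuples."""
    # Sort by position so the log follows the order of the original text.
    ordered = sorted(matches, key=lambda m: m[0])
    return [
        {"original": original, "type": pii_type}
        for _, original, pii_type in ordered
    ]

log = compile_redaction_log([
    (31, "555-123-4567", "Phone number"),
    (6, "jane@example.com", "Email address"),
])
print(log)
```

Because the log holds the original PII values, it is as sensitive as the unredacted document and would normally be stored with the same care.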
Step 6: Return Results
- Output the redacted text.
- Output the redaction log (list of redacted items with their types).
Input
- text: The raw text document that may contain PII.
Output
- redacted_text: The input text with all PII replaced by appropriate placeholders.
- redaction_log: A list of objects detailing each redacted item, including the original value and its PII category.
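To show how the input and the two outputs fit together, here is a compact end-to-end sketch. The function `redact_document`, the `PATTERNS` table, and its two regular expressions are hypothetical and cover only email addresses and US-style phone numbers; the returned dictionary uses the `redacted_text` and `redaction_log` field names defined above.

```python
import re

# Hypothetical table: placeholder name -> (category label, pattern).
PATTERNS = {
    "EMAIL": ("Email address",
              re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    "PHONE": ("Phone number",
              re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")),
}

def redact_document(text):
    """Return {'redacted_text': ..., 'redaction_log': [...]} for the input text."""
    matches = []
    for placeholder, (label, pattern) in PATTERNS.items():
        for m in pattern.finditer(text):
            matches.append((m.start(), m.end(), placeholder, label))
    matches.sort()  # document order

    pieces, log, cursor = [], [], 0
    for start, end, placeholder, label in matches:
        if start < cursor:
            continue  # skip a match that overlaps one already redacted
        pieces.append(text[cursor:start])          # untouched text before the match
        pieces.append("{{" + placeholder + "}}")   # placeholder in place of the PII
        log.append({"original": text[start:end], "type": label})
        cursor = end
    pieces.append(text[cursor:])                   # remaining tail of the document
    return {"redacted_text": "".join(pieces), "redaction_log": log}

result = redact_document("Write to jane@example.com or call 555-123-4567.")
print(result["redacted_text"])  # Write to {{EMAIL}} or call {{PHONE}}.
print(result["redaction_log"])
```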

