Invoice & Receipt Processing
1. Overview
This process reads a scanned invoice or receipt and extracts the key financial details (vendor name, invoice/receipt number, date, individual line‑items, totals, and taxes). It presents them in a clear, organized format that an Accounts Payable clerk can review and use for payment or record‑keeping.
2. Business Value
- Reduces manual data entry – clerks no longer need to copy each field by hand.
- Improves accuracy – OCR‑driven extraction reduces manual transcription errors.
- Speeds up processing – invoices can be posted within minutes of receipt.
- Supports compliance – captured data helps meet audit requirements for traceability and completeness.
3. Operational Context
- When it should run: Whenever a vendor submits an invoice or receipt that needs to be recorded in the accounts‑payable system.
- Who uses it: Accounts Payable clerks, finance supervisors, and auditors.
- How often: Typically daily, but can be run ad‑hoc for any individual invoice/receipt received.
4. Inputs
| Name / Label | Type | Details Provided |
|---|---|---|
| Invoice/Receipt Document | PDF (scanned image) | A single PDF file containing the full visual representation of a vendor invoice or receipt. The PDF must be legible, contain the vendor name, a unique invoice or receipt number, a date, line‑item details (description, quantity, unit price, line total), and the total amount due (including tax if shown). The PDF may be single‑page or multi‑page, but only one invoice or receipt should be present in the file. |
Note: No additional external documents are needed; all required data must be present inside the PDF supplied.
5. Outputs
| Name / Label | Contents | Formatting Rules |
|---|---|---|
| Extracted Invoice Data | A plain‑text summary containing:<br>• Vendor name<br>• Invoice or receipt number<br>• Date (YYYY‑MM‑DD)<br>• Currency (e.g., USD, EUR)<br>• Total amount (including tax if shown)<br>• Tax amount (if any)<br>• Line‑Item Table: each row includes Description, Quantity, Unit Price, Line Total | • Use a bullet‑point list for the summary fields.<br>• Follow the summary with a table of line‑items using the headings Description, Quantity, Unit Price, Line Total.<br>• Numbers must be presented with two decimal places (e.g., 12.34).<br>• Currency symbol or code should be placed before the number (e.g., $1,250.00 or USD 1,250.00).<br>• No additional identifiers or system‑generated IDs are to be created. |
| Processing Status | One of the following plain‑text status messages:<br>• Completed – all required data extracted successfully.<br>• Error – Missing field: [field name] – required information could not be found.<br>• Error – Unreadable PDF – the file could not be read or processed. | • Use exactly the wording above for consistency.<br>• If an error status is returned, no further data is returned. |
6. Detailed Plan & Execution Steps
- Receive PDF – Accept the Invoice/Receipt PDF provided as input.
- Check PDF readability – Attempt to open the PDF and verify that each page can be rendered. If the PDF cannot be opened, set Processing Status to “Error – Unreadable PDF” and stop.
- Run OCR – Convert the visual image into searchable text.
- Locate core fields:
a. Search the OCR text for the vendor’s name (the most prominent text near the top).
b. Locate the invoice/receipt number (usually preceded by “Invoice #”, “Invoice No.”, “Receipt #”, etc.).
c. Identify the date (look for patterns such as MM/DD/YYYY, YYYY‑MM‑DD, or similar).
d. Find the total amount (look for “Total”, “Amount Due”, “Grand Total”, or a bold number at the bottom).
e. If a tax amount is shown (look for “Tax”, “VAT”, “GST”), capture it.
- Extract line‑item table –
a. Identify the block that lists each product or service.
b. For each row, extract: description, quantity, unit price, and line total.
c. If a row lacks a quantity, assume a quantity of 1.
d. If a row lacks a unit price but has a line total and quantity, compute the unit price (line total ÷ quantity).
- Validate required fields – Verify that vendor name, invoice/receipt number, date, and total amount are present and not empty. If any are missing, set Processing Status to “Error – Missing field: [missing field]” and stop.
- Validate numeric totals – Sum the line‑item totals, add the tax amount (if any), and compare the result to the extracted total amount (allow a rounding difference of ±0.01). If they do not match, add a note in the summary: “Warning: summed line items differ from total amount by X.”
- Format output – Construct the Extracted Invoice Data using the formatting rules in Section 5.
- Set Processing Status to “Completed” (or the appropriate error status).
- Return the summary, line‑item table, and status as plain text.
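The stop-on-first-failure logic in the steps above can be sketched in Python. This is a minimal illustration, not the actual implementation: `ExtractedFields` and `determine_status` are hypothetical names, and the OCR and field-location steps are assumed to have already run and filled in whatever values they could find.

```python
from dataclasses import dataclass
from typing import Optional

# Required fields, in the order named in the "Validate required fields" step.
REQUIRED_FIELDS = ("vendor name", "invoice/receipt number", "date", "total amount")


@dataclass
class ExtractedFields:
    """Hypothetical container for values found by the OCR and field-location steps."""
    vendor_name: Optional[str] = None
    invoice_number: Optional[str] = None
    date: Optional[str] = None
    total_amount: Optional[str] = None


def determine_status(pdf_readable: bool, fields: ExtractedFields) -> str:
    """Return the Processing Status string, stopping at the first failure."""
    if not pdf_readable:
        return "Error – Unreadable PDF"
    values = (fields.vendor_name, fields.invoice_number, fields.date, fields.total_amount)
    for label, value in zip(REQUIRED_FIELDS, values):
        # A field counts as missing when it was never found or is blank.
        if value is None or not value.strip():
            return f"Error – Missing field: {label}"
    return "Completed"


if __name__ == "__main__":
    found = ExtractedFields("Acme Supplies Ltd.", "INV-2024-00123", "2024-07-15", "567.00")
    print(determine_status(True, found))
```

Checking readability before any field keeps the sketch aligned with the step order: an unreadable PDF is reported even if no fields were ever extracted.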
7. Validation & Quality Checks
- PDF Readability – Ensure the PDF opens and each page can be rendered; otherwise, flag as unreadable.
- Mandatory Fields – Verify that vendor, invoice/receipt number, date, and total amount are non‑blank.
- Numeric Integrity – Check that the sum of line‑item totals plus the tax amount (if any) matches the total amount within ±0.01.
- Currency Consistency – Ensure all monetary values use the same currency symbol or code throughout the output.
- Date Format – Verify dates follow the YYYY‑MM‑DD pattern; otherwise, flag for review.
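- Error Reporting – If a blocking validation fails (unreadable PDF, missing mandatory field, or multiple invoices), the Processing Status must contain a clear error message and no extracted data should be returned. A totals mismatch alone produces only a warning note in the summary.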
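As an illustration of the Numeric Integrity check, the sketch below compares line items plus tax against the total using `decimal.Decimal` to avoid floating-point rounding. It assumes tax is added to the line-item sum before the comparison, as in the example in Section 9 and the checklist in C4; the function name `check_totals` is hypothetical.

```python
from decimal import Decimal
from typing import Iterable, Optional

TOLERANCE = Decimal("0.01")  # rounding allowance stated in the checks above


def check_totals(line_totals: Iterable[Decimal], tax: Decimal, total: Decimal) -> Optional[str]:
    """Return a warning string if line items plus tax differ from the total, else None."""
    expected = sum(line_totals, Decimal("0")) + tax
    difference = abs(expected - total)
    if difference > TOLERANCE:
        return f"Warning: summed line items differ from total amount by {difference:.2f}."
    return None


if __name__ == "__main__":
    lines = [Decimal("240.00"), Decimal("225.00"), Decimal("50.00")]
    # 515.00 + 52.00 equals 567.00, so no warning is produced.
    print(check_totals(lines, Decimal("52.00"), Decimal("567.00")))
```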
8. Special Rules / Edge Cases
- Multiple Pages – If the PDF contains more than one page, combine all pages before running OCR.
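- Non‑standard Layout – If the vendor name or invoice number is not found in the usual locations, search the entire document for label patterns (e.g., “Vendor:”, “Invoice No.”). Treat “Bill To:” only as a positional cue: it introduces the customer, so the vendor name usually appears before it.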
- Missing Quantity – Assume quantity of 1 when not listed.
- Missing Unit Price – Compute unit price if missing but line total and quantity are present.
- Currency Not Stated – Use the company’s default currency (e.g., USD) if the currency symbol is absent.
- Receipt Instead of Invoice – If no invoice number is present but a receipt number is, treat the receipt number as the “invoice/receipt number”.
- Duplicate Line Items – Preserve each duplicate entry as a separate row; do not combine.
- Tax Exemption – If no tax amount appears, set Tax amount to 0 and note “No tax shown”.
- Multiple Invoices – If the PDF contains more than one invoice/receipt, abort the process, set status to “Error – Multiple invoices detected”, and request a separate file per invoice.
- Unreadable Text – If OCR confidence for any required field is below 80 %, flag the field as missing for manual review.
- Non‑Latin Characters – If the document contains non‑Latin characters (e.g., Chinese, Arabic), the process can still extract data, but any non‑numeric characters in numeric fields cause an “Error – Unrecognizable numeric value” status.
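The Missing Quantity and Missing Unit Price rules above can be expressed as a small helper. This is a sketch under stated assumptions: `fill_line_item` is a hypothetical name, the computed unit price is rounded half-up to two decimals (the document does not specify a rounding mode), and a zero quantity is rejected rather than divided by.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional, Tuple

CENT = Decimal("0.01")


def fill_line_item(quantity: Optional[Decimal],
                   unit_price: Optional[Decimal],
                   line_total: Decimal) -> Tuple[Decimal, Decimal, Decimal]:
    """Apply the Missing Quantity and Missing Unit Price rules to one row."""
    if quantity is None:
        quantity = Decimal("1")  # Missing Quantity rule: assume 1
    if unit_price is None:
        if quantity == 0:
            raise ValueError("cannot compute unit price for a zero quantity")
        # Missing Unit Price rule: line total divided by quantity.
        # Rounding half-up to cents is an assumption of this sketch.
        unit_price = (line_total / quantity).quantize(CENT, rounding=ROUND_HALF_UP)
    return quantity, unit_price, line_total
```

A row that shows only a total price therefore comes back with a quantity of 1 and a unit price equal to the line total.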
9. Example
Input
- Invoice/Receipt Document: PDF named “Acme_Invoice_20240715.pdf”. The PDF shows:
- Vendor name: Acme Supplies Ltd.
- Invoice Number: INV‑2024‑00123
- Date: 07/15/2024
- Currency: $
- Line‑items:
- "Office Chair" – Qty 2 – Unit $120.00 – Line $240.00
- "Desk Lamp" – Qty 5 – Unit $45.00 – Line $225.00
- "Mouse Pad" – Qty 10 – Unit $5.00 – Line $50.00
- Tax: $52.00
- Total: $567.00
Expected Output
Extracted Invoice Data
- Vendor name: Acme Supplies Ltd.
- Invoice number: INV‑2024‑00123
- Date: 2024-07-15
- Currency: USD
- Tax amount: $52.00
- Total amount: $567.00
Line‑Item Table
| Description | Quantity | Unit Price | Line Total |
|---|---|---|---|
| Office Chair | 2 | $120.00 | $240.00 |
| Desk Lamp | 5 | $45.00 | $225.00 |
| Mouse Pad | 10 | $5.00 | $50.00 |
Processing Status: Completed
Note: The sum of line‑items ($515.00) plus tax ($52.00) equals the total ($567.00), so no warning is needed.
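The line-item table in the expected output could be produced by a small formatter along these lines. This is only a sketch: `render_line_items` and the `LineItem` tuple are hypothetical names, and the `$` symbol is hard-coded because the example invoice is in US dollars.

```python
from decimal import Decimal
from typing import List, Tuple

# description, quantity, unit price, line total
LineItem = Tuple[str, int, Decimal, Decimal]


def render_line_items(items: List[LineItem]) -> str:
    """Build a pipe-delimited line-item table with two-decimal amounts."""
    rows = [
        "| Description | Quantity | Unit Price | Line Total |",
        "|---|---|---|---|",
    ]
    for description, quantity, unit_price, line_total in items:
        rows.append(
            f"| {description} | {quantity} | ${unit_price:,.2f} | ${line_total:,.2f} |"
        )
    return "\n".join(rows)


if __name__ == "__main__":
    example = [
        ("Office Chair", 2, Decimal("120.00"), Decimal("240.00")),
        ("Desk Lamp", 5, Decimal("45.00"), Decimal("225.00")),
        ("Mouse Pad", 10, Decimal("5.00"), Decimal("50.00")),
    ]
    print(render_line_items(example))
```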
Appendix A – FAQ
1. What if the PDF is a scanned image with low quality?
If OCR confidence is low for any required field, the system flags that field as missing and returns a status Error – Missing field: [field]. The clerk should request a clearer copy.
2. The invoice uses a different date format (e.g., “15 July 2024”).
The system recognises common date patterns, but if it cannot parse the date, it will flag “Error – Missing field: Date”. The clerk must verify the date manually.
3. The document contains both an invoice and a receipt.
The process expects a single invoice/receipt per PDF. If two documents are detected, the system returns Error – Multiple invoices detected. Submit each document separately.
4. What should be done if the total amount does not match the sum of line‑items?
If the difference exceeds the tolerance (±0.01), the system adds a warning note to the summary and the clerk should verify the amounts manually.
5. The vendor name is a logo image instead of text.
OCR may fail to read stylized logo artwork; if the vendor name is not detected as text, the system returns Error – Missing field: Vendor name. The clerk should then enter the vendor name manually.
6. What if a line item shows only a total price, with no separate quantity or unit price?
The system extracts the total price and assumes a quantity of 1. The unit price is set equal to the line total.
7. How are taxes shown when there are multiple tax types (e.g., VAT and Sales Tax)?
All tax amounts are summed and reported as a single Tax amount in the summary. Individual tax types are not captured unless a separate column is added in a custom version of the SOP.
8. My company uses a different date format (DD/MM/YYYY).
The SOP expects ISO format (YYYY‑MM‑DD) in the output. The system will convert a recognized DD/MM/YYYY date into the required format. If conversion fails, the date field is flagged.
9. What if the total amount includes a currency code not in the system (e.g., “JPY”)?
The system will capture the currency symbol or code. If it is not recognized, the system uses the default company currency (e.g., USD) and adds a note: “Currency not recognized – defaulted to USD”.
10. I need to extract additional fields (e.g., purchase order number).
Modify the SOP to include an extra field in the summary and add the relevant extraction rule. This version only extracts the fields listed in Section 5.
11. What if the line‑item table uses a different column order (e.g., quantity first, description second)?
The OCR extracts based on the column headings it detects (“Qty”, “Quantity”, “Amount”, “Price”, etc.). If headings are ambiguous, the system uses positional cues; however, if it cannot confidently map columns, it will flag the line‑item extraction and set status to Error – Unrecognizable line‑item format.
12. Is it possible to process a batch of invoices at once?
This SOP is designed for a single invoice/receipt per run. For batch processing, repeat the steps for each file individually or use a batch‑processing wrapper that invokes this SOP for each PDF.
13. How do I handle a receipt that has no tax line (e.g., a cash receipt)?
If no tax is found, the system sets Tax amount: $0.00 and includes a note “No tax shown”.
14. The vendor name includes special characters (e.g., “Müller & Sons”).
The system captures all characters, including diacritics, as they appear in the OCR text.
15. What if the PDF has multiple pages and the line‑items span across pages?
All pages are concatenated before extracting line‑items, preserving the order. The final line‑item table combines rows from all pages.
16. My company uses a different decimal separator (e.g., “1,250.00” vs “1.250,00”).
The system expects a period (“.”) as the decimal separator. If a comma is used as a decimal separator, the system will interpret it correctly if the pattern matches standard European format; otherwise, the numeric field is flagged as “Unrecognizable numeric value”.
17. How to handle discounts that are shown as a separate line item?
The system treats a discount line as a regular line‑item. If it contains the word “discount”, the system records it as a negative amount (e.g., –$50.00) in the line‑item table.
18. Is the output compatible with my accounting system?
The output is plain text with a simple table layout that can be copy‑pasted into most accounting software’s import fields. No special formatting is required.
19. Who should I contact if I encounter a persistent error?
Report the issue to the finance technology support team, providing the PDF that caused the error and the exact error message received.
20. Can the process be used for non‑invoice documents like contracts?
No. This SOP only extracts data from invoices and receipts. Other documents require a different SOP.
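To illustrate the decimal-separator behavior described in FAQ 16, the sketch below accepts both the “1,250.00” and “1.250,00” styles and rejects anything else. The patterns and the name `parse_amount` are assumptions for illustration; the sketch also assumes currency symbols and codes have already been stripped from the string. An ambiguous value such as “1.250” is read with the period as the decimal separator, matching the stated default.

```python
import re
from decimal import Decimal

# 1,250.00 style: commas group thousands, a period marks the decimals.
PERIOD_DECIMAL = re.compile(r"\d{1,3}(,\d{3})*(\.\d+)?|\d+(\.\d+)?")
# 1.250,00 style: periods group thousands, a comma marks the decimals.
COMMA_DECIMAL = re.compile(r"\d{1,3}(\.\d{3})*,\d+|\d+,\d+")


def parse_amount(text: str) -> Decimal:
    """Convert an OCR'd amount string to Decimal, or raise ValueError."""
    cleaned = text.strip()
    if PERIOD_DECIMAL.fullmatch(cleaned):
        return Decimal(cleaned.replace(",", ""))
    if COMMA_DECIMAL.fullmatch(cleaned):
        return Decimal(cleaned.replace(".", "").replace(",", "."))
    raise ValueError(f"Unrecognizable numeric value: {text!r}")
```

Trying the period-decimal pattern first means a string is only treated as European format when it cannot be read the default way.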
Appendix B – Glossary
| Term | Definition |
|---|---|
| Vendor | The supplier or service provider who issued the invoice or receipt. |
| Invoice/Receipt | A document that records a transaction, showing what was purchased, the amount owed, and payment terms. |
| Invoice Number | The unique identifier assigned by the vendor for the transaction; may be called “Invoice #”, “Invoice No.”, “Bill No.”, or “Receipt #”. |
| Receipt Number | A number that identifies a receipt; used when no invoice number is present. |
| Date | The date the transaction occurred, as shown on the document. |
| Currency | The monetary unit used (e.g., USD, EUR). |
| Total Amount | The final amount due, including all taxes and fees. |
| Tax Amount | The portion of the total that represents sales tax, VAT, GST, or any other applicable tax. |
| Line‑Item | Each individual product or service listed on the invoice/receipt. |
| Description | The text that describes a line‑item (e.g., “Office Chair”). |
| Quantity | The number of units of the item purchased. |
| Unit Price | The price per single unit of the item (excluding tax). |
| Line Total | The total cost for that line‑item (Quantity × Unit Price). |
| OCR (Optical Character Recognition) | A technology that converts images of text into machine‑readable text. |
| Processing Status | A short text message that tells the user whether the extraction succeeded or why it failed. |
| Formatting Rules | Instructions on how the output should be presented (e.g., date format, number of decimal places). |
Appendix C – Reference Materials
C1 – List of Common Expense Categories (For Classification in Accounting Systems)
| Category | Typical Items |
|---|---|
| Office Supplies | pens, notebooks, printer paper, staplers, printer cartridges |
| Furniture | desks, chairs, cabinets, shelving |
| Technology Equipment | computers, laptops, monitors, keyboards, mouse pads |
| Software & Licenses | software subscriptions, license keys |
| Travel & Transportation | airline tickets, train tickets, mileage, taxi rides |
| Meals & Entertainment | meals, catering, client entertainment |
| Marketing & Advertising | print ads, online ads, promotional items |
| Utilities | electricity, water, gas, internet services |
| Professional Services | consulting fees, legal services, accounting services |
| Rent & Lease | office lease, equipment lease |
| Miscellaneous | any item not fitting above categories |
Formatting Guidelines:
- Use title case for category names (e.g., Office Supplies).
- Use lowercase common nouns, plural where countable, for the “Typical Items” column (e.g., pens, notebooks).
- Do not include any punctuation other than commas separating items.
C2 – Currency Codes and Symbols
| Currency | Code | Symbol |
|---|---|---|
| United States Dollar | USD | $ |
| Euro | EUR | € |
| British Pound | GBP | £ |
| Japanese Yen | JPY | ¥ |
| Canadian Dollar | CAD | $ |
| Australian Dollar | AUD | $ |
| Swiss Franc | CHF | Fr |
| Chinese Yuan | CNY | ¥ |
| Indian Rupee | INR | ₹ |
| Brazilian Real | BRL | R$ |
| South African Rand | ZAR | R |
| Singapore Dollar | SGD | $ |
Formatting Rules:
- Always place the symbol before the numeric value (e.g., $1,250.00).
- When using a code, place it before the amount with a space (e.g., USD 1,250.00).
- Use a comma for thousands separator and a period for decimal separator.
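The three formatting rules above can be sketched as a single helper. The `SYMBOLS` mapping is a deliberately small subset of the table, and `format_amount` is a hypothetical name; because `$` and `¥` are shared by several currencies, the sketch falls back to the three-letter code whenever a currency has no entry in the mapping.

```python
from decimal import Decimal

# Illustrative subset of the symbol table above.
SYMBOLS = {"USD": "$", "EUR": "€", "GBP": "£", "INR": "₹"}


def format_amount(amount: Decimal, code: str, use_symbol: bool = True) -> str:
    """Format an amount with symbol or code first, comma thousands, two decimals."""
    number = f"{amount:,.2f}"
    if use_symbol and code in SYMBOLS:
        return f"{SYMBOLS[code]}{number}"  # symbol directly before the number
    return f"{code} {number}"  # code, a space, then the number


if __name__ == "__main__":
    print(format_amount(Decimal("1250"), "USD"))
    print(format_amount(Decimal("1250"), "USD", use_symbol=False))
```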
C3 – Sample Invoice Formats (for reference)
1. Standard Invoice Layout
[Vendor Logo]
Vendor Name: Acme Supplies Ltd.
Address: 123 Main Street, City, Country
Phone: +1‑555‑1234
Invoice #: INV‑2024‑00123
Date: 07/15/2024
Due Date: 08/15/2024
Purchase Order #: PO‑98765
Bill To:
Company ABC
456 Business Ave.
City, Country
------------------------------------------------------------
Description Quantity Unit Price Line Total
Office Chair 2 $120.00 $240.00
Desk Lamp 5 $45.00 $225.00
Mouse Pad 10 $5.00 $50.00
------------------------------------------------------------
Tax $52.00
Total $567.00
2. Receipt Layout
Receipt
Vendor: ABC Electronics
Date: 2024‑07‑20
Receipt #: R-56789
Cashier: John Doe
Item Qty Price Total
USB Cable 3 $5.00 $15.00
Phone case 1 $12.00 $12.00
Subtotal: $27.00
Tax (8%): $2.16
Total: $29.16
Key Features to Recognize
- Vendor Name appears near top or in header.
- Invoice/Receipt Number is preceded by “Invoice”, “Invoice #”, “Invoice No.”, “Receipt #”.
- Date may appear with “Date:”, “Date of Issue:”.
- Line‑Item Table includes column headings for “Description”, “Qty”, “Quantity”, “Unit Price”, “Price”, “Line Total”, “Total”.
- Total often appears as the last bold number, preceded by “Total”, “Total Amount”, or “Grand Total”.
C4 – Sample Data Validation Checklist
| Check | Description | Pass Condition |
|---|---|---|
| PDF Readable | PDF can be opened and viewed. | Yes |
| Vendor Name | Extracted vendor name is not empty. | Not empty |
| Invoice/Receipt Number | A numeric or alphanumeric identifier present. | Present |
| Date | Recognized as a valid date. | Valid date |
| Currency | Either a symbol or 3‑letter code present. | Present or default applied |
| Total Amount | Numeric, includes decimal places. | Numeric |
| Tax Amount | Numeric (if present). | Numeric or 0 |
| Line Items | At least one line‑item present. | ≥1 |
| Line Totals | Sum of line‑totals + tax equals total amount (±0.01). | Yes |
| Date Format | YYYY‑MM‑DD. | Correct format |
| Currency Format | Symbol or code placed correctly. | Yes |
| No Duplicate Invoice | Only one invoice per PDF. | Yes |
| OCR Confidence | All required fields have confidence ≥80 %. | Yes |
If a check fails, the Processing Status will include an error description and the extracted data will be omitted. The exception is the Line Totals check, which only adds a warning note to the summary.
C5 – Example Formatting Guidelines
- Dates – always use ISO format: YYYY‑MM‑DD (e.g., 2024‑07‑15).
- Numbers – use two decimal places; e.g., $1,250.00.
- Currency – show symbol (e.g., $) or three‑letter code (e.g., USD) before the amount.
- Line‑Item Table – keep columns aligned for readability; use vertical bars (|) and hyphens (‑) for table borders.
- Headers – use bold for section headers (e.g., Vendor name:).
- Error Messages – start with the word Error followed by a colon and a concise description.
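The ISO date rule above implies a conversion step for dates printed in other formats. A minimal sketch follows, assuming a fixed list of candidate formats; `to_iso_date` and `DATE_FORMATS` are hypothetical names. Note the ordering assumption: MM/DD/YYYY is tried before DD/MM/YYYY, so an ambiguous date such as 03/04/2024 is read as March 4. Month names are parsed using the current locale, which is assumed to be English.

```python
from datetime import datetime
from typing import Optional

# Candidate input formats, tried in order (the order is an assumption of this sketch).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d/%m/%Y", "%d %B %Y")


def to_iso_date(text: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no candidate format matches."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next candidate format
    return None


if __name__ == "__main__":
    print(to_iso_date("07/15/2024"))
```

Returning `None` instead of raising lets the caller decide whether to flag the date for review or report it as a missing field.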
C6 – Frequently Used Text Patterns for OCR
| Field | Typical Text Patterns |
|---|---|
| Vendor Name | Appears in the header; often followed by “Invoice”, “Bill to”, or a logo. |
| Invoice Number | “Invoice #”, “Invoice No.”, “Invoice Number”, “Inv:”, “#” followed by digits/letters. |
| Date | “Date:”, “Invoice Date:”, “Date of Issue:”, “MM/DD/YYYY”, “YYYY‑MM‑DD”. |
| Total | “Total”, “Total Amount”, “Grand Total”, “Amount Due”. |
| Tax | “Tax”, “VAT”, “GST”, “Tax Amount”. |
| Quantity | “Qty”, “Quantity”. |
| Unit Price | “Unit Price”, “Price per unit”, “$”, “USD”. |
| Line Total | “Line Total”, “Amount”, “Total”. |
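A few of the text patterns in the table above can be turned into regular expressions, as in the sketch below. These patterns are hypothetical and cover only some of the listed variants; real documents will need more, and OCR output may contain characters (such as non-ASCII hyphens) that these patterns do not handle.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns derived from the table above.
PATTERNS = {
    "invoice_number": re.compile(
        r"(?:Invoice|Receipt)\s*(?:#|No\.|Number)\s*:?\s*([A-Za-z0-9-]+)",
        re.IGNORECASE,
    ),
    "date": re.compile(
        r"Date(?: of Issue)?\s*:\s*(\d{2}/\d{2}/\d{4}|\d{4}-\d{2}-\d{2})",
        re.IGNORECASE,
    ),
    # Anchored to the start of a line so "Line Total" and "Subtotal" are skipped.
    "total": re.compile(
        r"^\s*(?:Grand Total|Total Amount|Amount Due|Total)\s*:?\s*\$?\s*([\d,]+\.\d{2})",
        re.IGNORECASE | re.MULTILINE,
    ),
}


def find_fields(ocr_text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each pattern, or None where nothing matched."""
    results: Dict[str, Optional[str]] = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(ocr_text)
        results[name] = match.group(1) if match else None
    return results
```

Fields that come back as `None` would then feed the mandatory-field check rather than being guessed.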
C7 – Troubleshooting Checklist
- PDF cannot be opened – Verify that the file is not corrupted; re‑save the PDF.
- OCR extracts gibberish – Ensure the scan is high‑resolution (minimum 300 dpi).
- Fields missing – Check if the document uses unusual wording or a different language; ask for a clearer version or add manual notes.
- Totals do not match – Verify that all line items were captured; check for discounts or additional fees not in the line‑item table.
- Currency not recognized – Use a standard currency symbol or code; add the currency code as a note.
Tip: When scanning invoices, set the scanner to “PDF” mode, not “image” or “compressed” mode. Use black‑and‑white (monochrome) for text‑heavy documents and color for receipts that contain colored logos or stamps.
Additional Notes
- Speed: The OCR and parsing steps are typically completed within a few seconds on a standard workstation.
- Review: Even after successful extraction, the Accounts Payable clerk should review the data for any obvious anomalies before posting.
- Auditing: Keep the original PDF and the extracted text together as part of the audit trail.
- Customization: If your organization requires additional fields (e.g., purchase order number, expense code), add those fields to the summary section and adjust the extraction logic accordingly.
- Security: Store PDFs and extracted data in a secure location with access limited to finance personnel.