Category Classification
1. Overview
The Category Classification process determines the correct product category for an e‑commerce item based on its description. It either confirms that a pre‑assigned category is accurate, or selects the most appropriate category from a predefined taxonomy when no valid assignment exists.
2. Business Value
- Improves discoverability – Customers find products faster when they are in the right category.
- Boosts SEO – Search engines rank products better when taxonomy is consistent.
- Reduces manual errors – Standardized classification cuts down on re‑work and mis‑placement of items.
- Enables reporting – Accurate categories support inventory, sales, and trend analysis.
3. Operational Context
| When it runs | Who uses it | Frequency |
|---|---|---|
| When a new product is entered or an existing product’s details are edited. | Catalog Manager (and any team member responsible for product data). | As often as new or updated product data are entered (typically daily). |
4. Inputs
| Name/Label | Type | Details Provided |
|---|---|---|
| Product Description | Text | The full description of the product as it appears on the site (features, dimensions, color, usage, etc.). |
| Existing Category (optional) | Text | The category currently attached to the product in the system, if any. |
Note: All information needed for a single product is supplied in the two fields above. No external files or references are required at runtime. The full taxonomy needed for classification is supplied in Appendix C.
5. Outputs
| Name/Label | Contents | Formatting Rules |
|---|---|---|
| Assigned Category | The full category path chosen from the taxonomy (e.g., “Electronics > Computers > Laptops”). | Present as a single line with “ > ” separators. |
| Validation Status | One of the following words: Valid (existing category confirmed), Assigned (new category assigned), Needs Review (manual verification needed), Error (process could not be completed). | Use the exact capitalized word. |
| Notes (optional) | Brief explanation of why a category was assigned or why review is required. | Sentence‑case, no line breaks. |
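The output contract above can be sketched as a small record type. This is a minimal illustration, not part of the process definition: the names `ValidationStatus`, `ClassificationResult`, and `format_category_path` are hypothetical, and using `None` for an unassigned category is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class ValidationStatus(Enum):
    """The four status words, spelled exactly as the Outputs table requires."""
    VALID = "Valid"
    ASSIGNED = "Assigned"
    NEEDS_REVIEW = "Needs Review"
    ERROR = "Error"


def format_category_path(levels: Sequence[str]) -> str:
    """Join hierarchy levels into a single line with " > " separators."""
    return " > ".join(level.strip() for level in levels)


@dataclass(frozen=True)
class ClassificationResult:
    # Assumption: None means no category was assigned (e.g. an Error outcome).
    assigned_category: Optional[str]
    validation_status: ValidationStatus
    notes: Optional[str] = None

    def __post_init__(self) -> None:
        # Both text outputs must be single-line values.
        for field_name in ("assigned_category", "notes"):
            value = getattr(self, field_name)
            if value is not None and ("\n" in value or "\r" in value):
                raise ValueError(f"{field_name} must not contain line breaks")


result = ClassificationResult(
    assigned_category=format_category_path(["Electronics", "Computers", "Laptops"]),
    validation_status=ValidationStatus.VALID,
    notes="Original category confirmed.",
)
print(result.assigned_category)        # Electronics > Computers > Laptops
print(result.validation_status.value)  # Valid
```

Modelling the status as an enumeration keeps the four allowed words in one place, so a misspelled status cannot be emitted by accident.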
6. Detailed Plan & Execution Steps
1. Collect Inputs – Retrieve the Product Description and, if present, the Existing Category.
2. Check Description Presence – If the Product Description field is empty, stop the process, set Validation Status to Error, and add the note “Missing product description.”
3. Validate Existing Category (if provided):
   a. Search the taxonomy (Appendix C) for an exact match to the Existing Category.
   b. If a match exists, keep the Existing Category as the candidate to confirm and continue to step 4, so that the keyword analysis in step 5 can confirm it.
   c. If no match exists, treat the category as not provided and continue to step 4.
4. Keyword Extraction – From the Product Description, extract meaningful words (e.g., “laptop,” “cotton,” “gaming”).
5. Match Keywords to Taxonomy:
   a. For each category in the taxonomy, count how many of its defined keywords appear in the extracted list.
   b. Record the number of matches for each category.
6. Select Best Category:
   a. Choose the category with the highest keyword match count.
   b. If a tie occurs, select the category with the deepest hierarchical level (e.g., “Laptops” is deeper than “Computers”). If the tied categories are also at the same depth, assign “Miscellaneous > Other” and set Validation Status to Needs Review (see section 8).
   c. If no category has any match, assign “Miscellaneous > Other” and set Validation Status to Needs Review.
7. Determine Validation Status:
   - If the Existing Category was found in the taxonomy and the keyword analysis confirms it, set Validation Status to Valid.
   - If a new category was selected in step 6, set Validation Status to Assigned.
   - If a “Miscellaneous” assignment was made, set Validation Status to Needs Review.
   - If any required check fails (e.g., missing description), set Validation Status to Error.
8. Record Output – Populate Assigned Category, Validation Status, and Notes (if any).
9. Log Outcome – Store a brief log entry: product identifier (if available), date, assigned category, and validation status. (The log is for internal audit; it does not become an output.)
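The selection and status logic (Select Best Category and Determine Validation Status) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the system's actual implementation: `select_category` and `depth` are hypothetical names, categories are assumed to be identified by their full paths, and “confirms” is read as “the chosen category is the Existing Category or one of its descendants”, which is how the laptop example in section 9 treats it.

```python
from typing import Dict, Optional, Tuple

FALLBACK = "Miscellaneous > Other"


def depth(path: str) -> int:
    """Number of levels in a full category path such as "A > B > C"."""
    return len(path.split(" > "))


def select_category(
    match_counts: Dict[str, int],
    existing_category: Optional[str] = None,
) -> Tuple[str, str, str]:
    """Return (assigned category, validation status, note).

    match_counts maps full category paths to keyword match counts.
    existing_category is a full path already found in the taxonomy, or None.
    """
    best = max(match_counts.values(), default=0)
    if best == 0:
        return FALLBACK, "Needs Review", "No keyword matched any category."

    # Keep the highest-scoring categories, then prefer the deepest level.
    tied = [path for path, count in match_counts.items() if count == best]
    deepest = max(depth(path) for path in tied)
    tied = [path for path in tied if depth(path) == deepest]
    if len(tied) > 1:
        # Same count and same depth: the edge-case table sends this to review.
        return FALLBACK, "Needs Review", "Tie on keyword count and depth."

    chosen = tied[0]
    if chosen == FALLBACK:
        return FALLBACK, "Needs Review", "Only the catch-all category matched."
    # Assumption: a descendant of the existing category counts as confirmation.
    if existing_category is not None and (
        chosen == existing_category
        or chosen.startswith(existing_category + " > ")
    ):
        return chosen, "Valid", "Existing category confirmed by keywords."
    return chosen, "Assigned", "Category selected from keyword matches."


counts = {
    "Electronics > Computers > Laptops": 2,
    "Electronics > Computers > Desktops": 0,
}
print(select_category(counts, "Electronics > Computers"))
# ('Electronics > Computers > Laptops', 'Valid', 'Existing category confirmed by keywords.')
```

Returning the note together with the status keeps the reason for a Needs Review outcome next to the decision that produced it.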
7. Validation & Quality Checks
- Description Presence: Must not be empty.
- Category Existence: The final Assigned Category must be listed in the taxonomy (Appendix C).
- Keyword Match: At least one keyword must match a category; otherwise, “Miscellaneous > Other” is used.
- Hierarchy Depth: Prefer the most specific (deepest) category when multiple have the same match count.
- Status Accuracy: Ensure the status value aligns with the outcome (Valid, Assigned, Needs Review, or Error).
- Notes Consistency: If a note is provided, it should be a concise sentence explaining the decision.
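Several of these checks are mechanical and could be run against each output record before it is stored. The sketch below is one possible shape for such a check; `quality_issues` is a hypothetical helper, and it assumes that an Error record is allowed to carry no category.

```python
from typing import List, Optional, Set

ALLOWED_STATUSES = {"Valid", "Assigned", "Needs Review", "Error"}
FALLBACK = "Miscellaneous > Other"


def quality_issues(
    assigned_category: Optional[str],
    validation_status: str,
    notes: Optional[str],
    taxonomy_paths: Set[str],
) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    issues = []
    if validation_status not in ALLOWED_STATUSES:
        issues.append(f"Unknown status: {validation_status!r}")
    # Category Existence: assumed not to apply to Error records.
    if validation_status != "Error" and assigned_category not in taxonomy_paths:
        issues.append("Assigned Category is not in the taxonomy")
    # Status Accuracy: the catch-all category always goes to review.
    if assigned_category == FALLBACK and validation_status != "Needs Review":
        issues.append("Fallback category requires status Needs Review")
    # Notes Consistency: a provided note must be a non-empty single line.
    if notes is not None and ("\n" in notes or not notes.strip()):
        issues.append("Notes must be a non-empty single line")
    return issues


paths = {"Electronics > Computers > Laptops", "Miscellaneous > Other"}
print(quality_issues("Electronics > Computers > Laptops", "Valid", None, paths))  # []
```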
8. Special Rules / Edge Cases
| Situation | Handling |
|---|---|
| Missing Description | Set Validation Status to Error; note “Missing product description.” |
| Existing Category Not in Taxonomy | Treat as not provided; proceed to keyword extraction (step 4). |
| Multiple Categories Tie on Keyword Count | Choose the deepest (most specific) category; if depth is also equal, assign “Miscellaneous > Other” and set Needs Review. |
| Ambiguous or Very General Description (e.g., “nice product”) | Assign “Miscellaneous > Other” and set Needs Review. |
| Product Is a Bundle of Multiple Items | Out of scope: the process handles a single item only. Split the bundle and process each item separately. |
| Digital Products | Must be classified under the “Digital Products” hierarchy in the taxonomy. |
| Prohibited Item (e.g., weapons, illegal drugs) | Do not assign a category; set Validation Status to Error and flag for immediate manual review. |
| Failure of Keyword Extraction (e.g., all words are stop‑words) | Assign “Miscellaneous > Other” and set Needs Review. |
| System Unable to Access Taxonomy (unexpected error) | Set Validation Status to Error and note “Taxonomy unavailable.” |
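The three Error rows in this table (missing description, prohibited item, taxonomy unavailable) can be handled as guard checks before any keyword work starts. The sketch below is illustrative only: `run_prechecks`, `load_taxonomy`, and `prohibited_reason` are hypothetical names, and how a prohibited item is detected is not specified in this document, so the sketch assumes that an upstream step supplies the reason.

```python
from typing import Callable, Optional, Set, Tuple

Result = Tuple[Optional[str], str, str]  # (assigned category, status, note)


def run_prechecks(
    description: Optional[str],
    load_taxonomy: Callable[[], Set[str]],
    prohibited_reason: Optional[str] = None,
) -> Tuple[Optional[Result], Set[str]]:
    """Return (early_result, taxonomy); early_result is None when classification may proceed."""
    if description is None or not description.strip():
        return (None, "Error", "Missing product description."), set()
    if prohibited_reason:
        # No category is assigned to a prohibited item.
        note = f"Prohibited item: {prohibited_reason}. Flagged for immediate manual review."
        return (None, "Error", note), set()
    try:
        taxonomy = load_taxonomy()
    except Exception:
        # Any unexpected failure while reading the taxonomy ends the run.
        return (None, "Error", "Taxonomy unavailable."), set()
    return None, taxonomy


def broken_loader() -> Set[str]:
    raise OSError("taxonomy store is down")


print(run_prechecks("A 13-inch laptop", broken_loader)[0])
# (None, 'Error', 'Taxonomy unavailable.')
```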
9. Example
Input Example
- Product Description: “A portable 13‑inch laptop with a Retina display, 8 GB RAM, 256 GB SSD, silver finish, ideal for students.”
- Existing Category: “Computers”
Execution
- Description present → continue.
- “Computers” exists in the taxonomy → proceed to keyword match.
- Keywords extracted: laptop, 13‑inch, portable, retina, display, 8 GB RAM, 256 GB SSD, silver, students.
- Keyword match yields:
  - Laptops – 3 matches (laptop, portable, 13‑inch)
  - Computers – 2 matches (computer‑related, but “laptop” also maps).
- “Laptops” is deeper in the hierarchy than “Computers”.
- Assigned Category: “Electronics > Computers > Laptops”.
- Validation Status: Valid (because original category “Computers” is confirmed and the more specific “Laptops” is selected).
Output Example
- Assigned Category: “Electronics > Computers > Laptops”
- Validation Status: Valid
- Notes: “Original category confirmed; specific sub‑category ‘Laptops’ selected based on keywords.”
Appendix A – FAQ
Q1: What if the product description is very short or vague?
A: The process will assign “Miscellaneous > Other” and set the status to Needs Review so a human can decide the correct category.
Q2: How are keywords determined?
A: Keywords are predefined for each category (see Appendix C). The process looks for exact word matches, ignoring case and punctuation.
Q3: Can a product belong to more than one category?
A: Only one primary category is selected. If a product truly belongs to multiple categories, the most specific (deepest) category is chosen and the secondary category can be added later by a manual reviewer.
Q4: Where do I find the list of categories?
A: The complete taxonomy is provided in Appendix C – Category Taxonomy & Keyword Mapping.
Q5: How are synonyms handled (e.g., “smartphone” and “mobile phone”)?
A: Synonyms are included in the keyword lists for each category. The process matches any of the listed synonyms.
Q6: What should I do if I get an “Error” status?
A: Review the input. If the description is missing, supply it. If the taxonomy is unavailable (system issue), wait for the system to be restored or flag the product for manual handling.
Q7: Are there categories that are never allowed?
A: Yes. The list of prohibited categories is in Appendix C – Prohibited Items. Items that fall under those categories must be flagged for immediate review and not be assigned a regular category.
Q8: How often should the taxonomy be updated?
A: The taxonomy should be reviewed and updated at least quarterly to reflect new product lines, emerging market trends, and changes in branding. Updates are made in Appendix C.
Q9: Who is responsible for maintaining the keyword lists?
A: The Catalog Management Team maintains the keyword lists, ensuring they match the current product range and marketing language.
Q10: Can I override the system-assigned category?
A: Yes. If a human reviewer determines a different category is more appropriate, they may edit the Assigned Category and change the Validation Status to Valid after confirming the decision.
Appendix B – Glossary
| Term | Definition |
|---|---|
| Category | A label used to group similar products; part of a hierarchical taxonomy (e.g., “Electronics > Computers > Laptops”). |
| Taxonomy | The complete, hierarchical list of all allowable categories in the e‑commerce catalog. |
| Product Description | The text that describes a product’s features, specifications, and usage, as it would appear on the product page. |
| Keyword | A single word or short phrase that the system matches to a category (e.g., “laptop”, “cotton”). |
| Assigned Category | The final category chosen by the process, expressed as a full path through the taxonomy. |
| Validation Status | A label indicating how the assignment was determined: Valid, Assigned, Needs Review, or Error. |
| Notes | Optional short text that explains why a particular category was chosen or why manual review is needed. |
| Missing | A required input (e.g., product description) that is not present. |
| Prohibited Item | An item that must not appear in the catalog (e.g., weapons, illegal substances). |
| Deepest Level | The most specific level in the taxonomy hierarchy (e.g., “Laptops” is deeper than “Computers”). |
| Manual Review | An additional check performed by a person when the process cannot confidently assign a category. |
| Miscellaneous | The catch‑all category for items that do not fit any other defined category. |
Appendix C – Category Taxonomy, Keyword Mapping, Style Guide & Prohibited Items
1. Category Taxonomy (Hierarchical List)
Electronics
- Computers
  - Desktops
  - Laptops
  - Gaming Laptops
  - Tablets
- Mobile Devices
  - Smartphones
  - Tablets (Mobile)
  - Wearables
- Audio & Video
  - Headphones
  - Speakers
  - Televisions
  - Projectors
Apparel
- Men’s Clothing
  - Shirts
  - Pants
  - Jackets
  - Hoodies
  - T‑shirts
- Women’s Clothing
  - Dresses
  - Blouses
  - Skirts
  - Leggings
- Accessories
Home & Garden
- Furniture
  - Living Room
    - Sofas
    - Coffee Tables
    - TV Stands
  - Bedroom
    - Beds
    - Dressers
    - Nightstands
- Kitchen & Dining
  - Cookware
  - Kitchen Appliances
  - Dinnerware
- Garden & Outdoor
  - Patio Furniture
  - Grills
  - Outdoor Lighting
Sports & Outdoors
- Exercise Equipment
  - Treadmills
  - Dumbbells
  - Yoga Mats
- Outdoor Gear
  - Camping Tents
  - Hiking Backpacks
  - Sleeping Bags
Books & Media
- Books
  - Fiction
  - Non‑Fiction
  - Children’s Books
- Media
  - DVDs
  - Blu‑ray Discs
  - Vinyl Records
Beauty & Personal Care
- Skincare
  - Moisturizers
  - Cleansers
  - Sunscreens
- Haircare
  - Shampoos
  - Conditioners
  - Hair Styling Tools
Health & Wellness
- Supplements
- Vitamins
- Wellness Devices
Toys & Games
- Toys
- Games
- Board Games
- Card Games
- Puzzles
Digital Products
- Software
  - Operating Systems
  - Office Productivity
  - Graphic Design
- eBooks
- Digital Music
Miscellaneous
- Other (for items that do not fit any above category)
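Because the tie-break rule depends on hierarchy depth, it can help to hold the taxonomy in a nested structure and derive full paths and depths from it. The sketch below uses a small hypothetical excerpt stored as nested dictionaries; the `TAXONOMY` layout and the `iter_paths` helper are assumptions for illustration, not a prescribed storage format.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical excerpt of the taxonomy; leaf categories map to empty dicts.
TAXONOMY: Dict[str, dict] = {
    "Electronics": {
        "Computers": {"Desktops": {}, "Laptops": {}},
    },
    "Miscellaneous": {"Other": {}},
}


def iter_paths(
    tree: Dict[str, dict], prefix: Tuple[str, ...] = ()
) -> Iterator[Tuple[str, int]]:
    """Yield (full path, depth) for every node, parents before children."""
    for name, children in tree.items():
        levels = prefix + (name,)
        yield " > ".join(levels), len(levels)
        yield from iter_paths(children, levels)


paths = dict(iter_paths(TAXONOMY))
print(paths["Electronics > Computers > Laptops"])  # 3
print("Electronics > Phones" in paths)             # False
```

The resulting mapping serves both the Category Existence check (membership) and the Hierarchy Depth rule (the stored depth).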
2. Keyword Mapping (Category → Keywords)
| Category | Keywords (comma‑separated) |
|---|---|
| Laptops | laptop, notebook, ultrabook, gaming laptop, portable computer, 13‑inch, 15‑inch |
| Gaming Laptops | gaming, high‑performance, GPU, SSD, high‑refresh, gamer |
| Desktops | desktop, tower, PC, workstation, tower PC |
| Tablets | tablet, iPad, Android tablet, portable tablet |
| Smartphones | smartphone, mobile phone, iPhone, Android, smartphone case |
| Wearables | smartwatch, fitness tracker, wearable |
| Headphones | headphone, headphones, earbuds, audio |
| Speakers | speaker, Bluetooth speaker, home audio |
| Men’s Shirts | shirt, cotton, button‑down, polo, dress shirt |
| Women’s Dresses | dress, summer, evening, cocktail, midi |
| Hoodies | hoodie, fleece, hooded sweatshirt |
| T‑shirts | t‑shirt, tee, cotton |
| Bags | backpack, tote, duffel, travel bag |
| Sofas | sofa, couch, sectional, living room couch |
| Beds | bed, mattress, queen, king |
| Cookware | cookware, saucepan, pan, pot |
| Kitchen Appliances | blender, toaster, microwave, kitchen appliance |
| Camping Tents | tent, camping, outdoor shelter |
| Hiking Backpacks | backpack, hiking, trek |
| Treadmills | treadmill, cardio, fitness machine |
| Fiction Books | novel, fiction, literary |
| DVDs | DVD, disc, video |
| Moisturizers | moisturizer, cream, skin care |
| Shampoos | shampoo, hair care, wash |
| Supplements | supplement, vitamin, mineral |
| Software – Operating Systems | operating system, OS, Windows, macOS |
| Digital Music | music, MP3, streaming, audio |
| Other | miscellaneous, other, unclassified, unknown |
The keyword lists are not exhaustive; they are representative examples. The Catalog Manager may add or adjust keywords in the future. The process uses any exact match (case‑insensitive) of a keyword in the product description.
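The exact, case-insensitive matching described above can be sketched as follows. This is a minimal illustration under stated assumptions: `normalize` and `count_matches` are hypothetical helpers, keywords are assumed to be ASCII, and punctuation (including hyphens) is treated as a word boundary so that multi-word keywords such as “travel bag” are matched as whole phrases.

```python
import re
from typing import Dict, List


def normalize(text: str) -> str:
    """Lowercase and replace every run of non-alphanumeric characters with one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def count_matches(description: str, keyword_map: Dict[str, List[str]]) -> Dict[str, int]:
    """Count, per category, how many distinct keywords occur as whole words or phrases."""
    padded = f" {normalize(description)} "
    counts = {}
    for category, keywords in keyword_map.items():
        distinct = {normalize(keyword) for keyword in keywords}
        distinct.discard("")  # ignore keywords that are empty after normalization
        counts[category] = sum(f" {keyword} " in padded for keyword in distinct)
    return counts


# Rows copied from the mapping table above.
KEYWORD_MAP = {
    "Hiking Backpacks": ["backpack", "hiking", "trek"],
    "Bags": ["backpack", "tote", "duffel", "travel bag"],
    "Camping Tents": ["tent", "camping", "outdoor shelter"],
}
description = (
    "Ultra-light 7-kg hiking backpack with 30-liter capacity, "
    "water-resistant, with multiple compartments."
)
print(count_matches(description, KEYWORD_MAP))
# {'Hiking Backpacks': 2, 'Bags': 1, 'Camping Tents': 0}
```

Padding with spaces enforces whole-word matching, so “tent” does not match inside “compartments”. It also means that plural forms match only if they are listed as separate keywords, as the Headphones row does.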
3. Category Naming Style Guide
- Title case – Capitalize the first letter of each word (e.g., “Laptops”, “Mobile Devices”).
- No abbreviations – Use full words (e.g., “Smartphones” not “Phones”).
- Plural nouns – Use the plural form for product categories (e.g., “Sofas”, not “Sofa”), as the taxonomy in section 1 does.
- No trailing spaces – Ensure no extra spaces before or after the category name.
- Separator – Use “ > ” (space‑greater‑space) between hierarchy levels (e.g., “Electronics > Computers > Laptops”).
- No special characters – Avoid symbols like “/”, “\”, “#” within category names.
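The separator, spacing, and special-character rules are mechanical enough to check automatically. The sketch below covers only those three rules; `style_issues` is a hypothetical helper, and the title-case and abbreviation rules are left to human review because they need judgment.

```python
from typing import List

SEPARATOR = " > "
FORBIDDEN_CHARACTERS = set("/\\#")


def style_issues(path: str) -> List[str]:
    """Check a full category path against the mechanically testable style rules."""
    issues = []
    for level in path.split(SEPARATOR):
        if not level or level != level.strip():
            issues.append(f"Extra spaces or empty level: {level!r}")
        if ">" in level:
            # A ">" left inside a level means the separator was not " > ".
            issues.append(f"Malformed separator near: {level!r}")
        if FORBIDDEN_CHARACTERS & set(level):
            issues.append(f"Special character in: {level!r}")
    return issues


print(style_issues("Electronics > Computers > Laptops"))  # []
print(len(style_issues("Home/Garden > Sofas ")))          # 2
```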
4. Prohibited Items
| Category | Reason for Prohibition |
|---|---|
| Weapons | Illegal or restricted product; must be removed from catalog. |
| Illegal Drugs | Controlled substances; prohibited by law. |
| Adult Content | Explicit sexual material; not allowed for general audience. |
| Counterfeit Goods | Violates intellectual property laws. |
| Hazardous Materials | Requires special handling; not permitted in standard e‑commerce. |
| Fraudulent Items | Misleading or false claims. |
If a product falls into any of the above categories, set Validation Status to Error, add a note describing the prohibited nature, and flag for immediate manual review.
5. Worked Example (Full Flow)
Scenario: New product “Ultra‑light 7‑kg hiking backpack with 30‑liter capacity, water‑resistant, with multiple compartments.”
- Extracted keywords: “backpack, hiking, 30‑liter, water‑resistant, multiple compartments”.
- Keyword match: “backpack” and “hiking” → Hiking Backpacks (under “Outdoor Gear”).
- Assigned Category: “Sports & Outdoors > Outdoor Gear > Hiking Backpacks”.
- Validation Status: Assigned.
- Notes: “Product fits best in ‘Hiking Backpacks’ category based on keywords.”
Additional Notes
- The process assumes the Catalog Manager reviews any Needs Review or Error entries promptly.
- Updates to the taxonomy or keyword mapping should be documented in version control.
- The system does not generate any new IDs or create files; it returns only the textual outputs defined above.
- All category decisions must be logged for auditability but remain internal to the catalog management system.