Research QA & Question Answering
1. Overview
This procedure guides an analyst to answer a set of specific questions by carefully reviewing a provided collection of documents. The analyst extracts relevant information, formulates concise answers, and records where each answer originated. All work is based solely on the supplied documents – no external sources are used.
2. Business Value
- Provides timely, accurate answers to client or internal queries, supporting decision‑making and project delivery.
- Guarantees that answers are traceable to original source material, increasing credibility and auditability.
- Reduces the time analysts spend searching for information by standardising the research‑to‑answer workflow.
3. Operational Context
- When it runs: Whenever a client, stakeholder, or internal team submits a list of questions that must be answered using a specific set of documents.
- Who uses it: Analysts, consultants, or research staff who are tasked with delivering factual, document‑based answers.
- How often: On an as‑needed basis; each run handles a single set of documents and a single set of questions.
4. Inputs
| Name/Label | Type | Details Provided |
|---|---|---|
| Document Collection | PDF files (one or more) | The complete set of PDF documents that contain the information needed to answer the questions. |
| Question List | List of text questions | A clear list of the specific questions that need to be answered. Each question should be expressed as a complete sentence or phrase. |
Scope Boundaries
- The analyst will not access any websites, databases, or any files other than the PDF documents supplied.
- No subjective opinions or speculation beyond the information in the documents will be included.
- The process will not generate any new identifiers or codes for the answers.
5. Outputs
| Name/Label | Contents | Formatting Rules |
|---|---|---|
| Answer Report | For each question: (1) the original question, (2) a concise answer drawn only from the documents, (3) citation that includes the document name and page number (e.g., “(Industry Trends 2023, p. 12)”). | – Use a numbered list (1., 2., …). – Each entry starts with the question, followed by the answer on the next line, and the citation in parentheses on the same line as the answer. – Tone: Formal and professional. – No extra identifiers or system‑generated IDs. – If the documents contain no relevant information, write “No information available in the provided documents.” |
Note: The answer report is delivered as plain text or as a simple table; no files (PDF, CSV, etc.) are produced by the process.
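The formatting rules above can be sketched as a small data structure plus a renderer. This is a minimal illustration, not part of the procedure itself: the names `AnswerEntry`, `render_entry`, and `render_report` are hypothetical, and the sketch assumes an empty answer means nothing relevant was found.

```python
from dataclasses import dataclass, field

NO_INFO = "No information available in the provided documents."

@dataclass
class AnswerEntry:
    question: str
    answer: str = ""                               # empty: nothing relevant was found
    citations: list = field(default_factory=list)  # (document title, page) pairs

def render_entry(number: int, entry: AnswerEntry) -> str:
    """Render one numbered entry: question first, answer and citation on the next line."""
    if not entry.answer:
        return f"{number}. Question: {entry.question}\nAnswer: {NO_INFO}"
    cites = "; ".join(f"{title}, p. {page}" for title, page in entry.citations)
    return f"{number}. Question: {entry.question}\nAnswer: {entry.answer} ({cites})"

def render_report(entries) -> str:
    """Join the entries, leaving a blank line between question-answer blocks."""
    return "\n\n".join(render_entry(i, e) for i, e in enumerate(entries, start=1))
```

Calling `render_report` on a list of entries returns the plain-text Answer Report as a single string; no identifiers beyond the sequential numbers are generated.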
6. Detailed Plan & Execution Steps
- Receive Inputs – Confirm receipt of all PDFs listed in the Document Collection and the complete Question List.
- Validate Inputs – Verify each file is a readable PDF; ensure the question list contains at least one question. If any document is unreadable, note the problem and pause for clarification.
- Organise Documents – Create a folder (or logical grouping) for the documents, naming each file exactly as received.
- Read the Questions – List each question in the order provided, assigning a sequential number (1., 2., …).
- Search for Relevant Information
a. Open the first document.
b. Scan for text that directly addresses the current question (use keywords, headings, or tables).
c. Record the exact sentence or paragraph that supports an answer, including the page number.
d. Repeat for all documents until the question is answered or all documents have been examined.
- Formulate the Answer
a. Write the answer using only the wording from the source, adjusting grammar for readability.
b. Do not add information not found in the documents.
c. If multiple sources support the answer, combine them into a single concise answer, noting each source separately.
d. If contradictory information is found, present both viewpoints and note the discrepancy.
- Add Citation – Immediately after the answer, add a citation in the format:
(Document Title, p. X). If multiple documents are used, list the citations inside the same parentheses, separated by semicolons, e.g., (Report A, p. 3; Report B, p. 7).
- Handle Missing Information – If after reviewing all documents no relevant content is found, write “No information available in the provided documents.” and continue to the next question.
- Compile the Report – Assemble all numbered question‑answer‑citation entries into the Answer Report.
- Quality Review
a. Check each question has an answer or a “no information” statement.
b. Verify every citation references a document that was part of the Document Collection and that page numbers are accurate.
c. Confirm the tone is formal and professional; correct any spelling or grammar errors.
- Finalize – Save the Answer Report as plain text (or a table if preferred). No additional files are generated.
- Deliver – Provide the Answer Report to the requester (e.g., via email, shared drive, or other agreed channel).
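The search step above (scan each document, record the supporting sentence with its page number) can be sketched as follows. This is an illustrative sketch under stated assumptions: the page text is assumed to have been extracted from the PDFs already, `find_evidence` is a hypothetical helper, and keyword matching is only a first pass that the analyst still reviews by hand.

```python
import re

def find_evidence(keywords, documents):
    """Return (title, page, sentence) for every sentence that mentions a keyword.

    `documents` maps a document title to a list of page texts (index 0 = page 1).
    Extracting text from the PDFs is assumed to have happened already.
    """
    lowered = [k.lower() for k in keywords]
    hits = []
    for title, pages in documents.items():
        for page_number, text in enumerate(pages, start=1):
            # Naive sentence split: ., ! or ? followed by whitespace.
            for sentence in re.split(r"(?<=[.!?])\s+", text):
                if any(k in sentence.lower() for k in lowered):
                    hits.append((title, page_number, sentence.strip()))
    return hits
```

An empty result after all documents have been searched corresponds to the "No information available in the provided documents." case. The sentence split is deliberately simple and will break on abbreviations such as "p. 12", so recorded sentences should be checked against the source page.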
7. Validation & Quality Checks
- Question‑Answer Presence: Ensure every numbered question appears in the final report with either an answer or a “no information” note.
- Citation Accuracy: Verify each citation includes the correct document title and page number that matches the source text.
- Source Consistency: Confirm that each answer is based solely on the text within the provided PDFs; no external knowledge is used.
- Formatting Compliance: Confirm numbering, spacing, and citation format match the specification.
- Spelling & Grammar: Run a spell‑check and read the report for professional tone.
- Completeness Check: Confirm all questions from the input list have been addressed before finalizing.
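Some of these checks lend themselves to a mechanical first pass. The sketch below covers citation presence, citation format, and whether the cited title belongs to the Document Collection; it cannot confirm that a page number is accurate, which still requires opening the source. The function name `check_answer` and the regular expressions are assumptions made for illustration.

```python
import re

NO_INFO = "No information available in the provided documents."
# A trailing citation group such as "(Report A, p. 3; Report B, p. 7)".
CITATION_GROUP = re.compile(r"\(([^()]+)\)\s*$")
SINGLE_CITATION = re.compile(r"^(?P<title>.+), p\. (?P<page>\d+|N/A)$")

def check_answer(answer: str, collection_titles: set) -> list:
    """Return the problems found in one answer line (an empty list means it passes)."""
    if answer.strip() == NO_INFO:
        return []
    group = CITATION_GROUP.search(answer)
    if not group:
        return ["missing citation"]
    problems = []
    for part in group.group(1).split(";"):
        match = SINGLE_CITATION.match(part.strip())
        if not match:
            problems.append(f"malformed citation: {part.strip()}")
        elif match.group("title") not in collection_titles:
            problems.append(f"unknown document: {match.group('title')}")
    return problems
```

Running `check_answer` over every answer in the report and collecting the non-empty results gives a short list of entries to revisit before the manual read-through.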
8. Special Rules / Edge Cases
- Unreadable Documents: If a PDF cannot be opened, note the filename and flag for manual review; do not proceed with that document.
- Missing Page Numbers: If a PDF lacks visible page numbers, use a logical indicator (e.g., “section 2.3” or “first page”) and note that page numbers are unavailable.
- Multiple Valid Answers: When two or more documents provide valid, but slightly different, information, include both answers and label them as “Option 1,” “Option 2,” etc., citing each source.
- Conflicting Information: If sources contradict each other, present both statements, indicate the conflict, and avoid choosing a side. Example: “Source A states X; Source B states Y. The documents disagree.”
- No Relevant Information: If after reviewing all documents a question cannot be answered, respond with “No information available in the provided documents.” and move to the next question. Do not attempt to infer or guess.
- Duplicate Questions: If identical questions appear multiple times, treat each as a separate entry but use the same answer and citation.
- Partial Information: If only part of the question can be answered with the available documents, answer the part that is supported and note the missing portion with a “Not available in the provided documents” statement for the remaining portion.
- Sensitive Content: If any document contains personal data, confidential information, or restricted content, flag the document and stop the process. Do not include such content in the Answer Report; instead, note “Document contains confidential information – requires manual handling.”
9. Example
Input
- Document Collection
  - Industry Trends 2023.pdf (contains a market growth forecast on page 10).
  - Client Survey Summary.pdf (contains a list of challenges on page 5 and recommended next steps on page 7).
- Question List
  1. What is the projected growth rate for the sector in 2024?
  2. Which three key challenges are identified by the client?
  3. What next steps does the client recommend for the next quarter?
Output – Answer Report
1. Question: What is the projected growth rate for the sector in 2024?
Answer: The sector is expected to grow by 6.5 % in 2024. (Industry Trends 2023, p. 10)

2. Question: Which three key challenges are identified by the client?
Answer: The client identified the following three challenges: (1) declining customer retention, (2) supply‑chain disruptions, and (3) limited digital adoption. (Client Survey Summary, p. 5)

3. Question: What next steps does the client recommend for the next quarter?
Answer: The client recommends: (a) launching a targeted retention campaign, (b) securing alternative suppliers for critical components, and (c) investing in a digital transformation pilot. (Client Survey Summary, p. 7)
Appendix A – FAQ
Q1: What should I do if a PDF file is corrupted?
A: Flag the file, note the filename, and request a replacement before proceeding.
Q2: The question is too broad. How should I answer?
A: Provide the most relevant factual information found in the documents. If the question cannot be answered precisely, note “The provided documents do not contain sufficient detail to answer this question fully.”
Q3: Can I use external sources to supplement missing data?
A: No. Use only the supplied PDF documents. If the information is not present, respond with “No information available in the provided documents.”
Q4: How do I cite a source when the PDF has no page numbers?
A: Use the section heading or a relative location (e.g., “Section 2.1”) if available; otherwise, indicate “Page N/A”.
Q5: What if two documents provide different numbers for the same metric?
A: List both numbers, each with its citation, and note that the sources differ. Do not select one without justification.
Q6: Should I include any personal opinions?
A: No. Only present facts that appear in the documents. All personal opinions or interpretations must be excluded.
Q7: How many sentences should an answer contain?
A: Keep answers concise, typically one to two sentences, unless additional detail is required for clarity.
Q8: Are tables allowed in the Answer Report?
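A: Within an answer, use plain text. Reproduce a table only if it is part of the source document and is quoted verbatim; do not create new tables inside answers. The report as a whole may still be laid out as a simple table, as described in Section 5.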
Q9: What if a question references a figure that is an image in the PDF?
A: Describe the figure in words (e.g., “Figure 2 shows a 15 % increase in sales”) and cite the document and page.
Q10: How should I handle confidential or personal information that appears in the documents?
A: Do not include any confidential or personally identifying information in the Answer Report. Flag the document for manual review and omit the sensitive content.
Appendix B – Glossary
- Document – Any PDF file provided as input that contains information to be used for answering the questions.
- Citation – A brief reference indicating the source document and page number from which an answer was derived (e.g., “(Document Title, p. 12)”).
- Answer Report – The final output that contains each question, its answer, and the citation(s) for each answer.
- Question List – A plain list of the specific queries that need to be answered.
- Formal and professional tone – Language that is courteous, objective, and free of slang or casual phrasing.
- Page Number – The printed or digital page number where the cited information appears. If no page number is visible, use the section heading or indicate “Page N/A”.
Appendix C – Reference Materials
1. Formatting Guide for Answer Report
- Numbered List: Use Arabic numerals with a period (e.g., “1.”, “2.”).
- Question Presentation: Begin with “Question:” followed by the full question text.
- Answer Presentation: Begin with “Answer:” followed by a concise response.
- Citation Format: Use parentheses with the document title exactly as provided, a comma, the letter “p.” followed by the page number. Example: (Market Analysis 2024, p. 12).
- Multiple citations: Separate each citation with a semicolon inside the same parentheses. Example: (Report A, p. 3; Report B, p. 7).
- No extra identifiers: Do not generate or include any IDs that are not already part of the document titles.
- Line Spacing: Insert a blank line between each question‑answer block for readability.
2. Tone and Style Guide
| Guideline | Example |
|---|---|
| Use Full Sentences | “The market is expected to grow 6 %.” |
| Avoid Jargon | Instead of “leveraging synergies,” use “working together”. |
| Passive Voice Usage | “The data was analyzed” is acceptable; “We analyzed” is also acceptable as long as the tone remains formal. |
| Neutral Language | “The report states...” rather than “I think”. |
| Avoid Speculation | Do not write “likely” or “maybe”; only state what is in the documents. |
| Cite Every Statement | Every factual statement must have a citation. |
| No Personal Opinions | Do not add personal recommendations or opinions unless they are directly quoted in a document. |
| Length | Aim for 1–2 concise sentences per answer unless the source text requires longer quotations. |
3. Prohibited Content
- Personally Identifiable Information (PII): Do not include any personal data (e.g., names, addresses, phone numbers) that appears in the documents.
- Confidential Business Information: If a document contains confidential or proprietary information, flag for manual review and exclude from the Answer Report.
- External Sources: No use of external websites, databases, or external documents.
- Speculative Content: Do not generate statements that are not explicitly supported by the documents.
4. Citation Examples
| Source Document | Example Citation |
|---|---|
| Industry Trends 2023.pdf page 12 | (Industry Trends 2023, p. 12) |
| Client Survey Summary.pdf page 5 | (Client Survey Summary, p. 5) |
| Market Report.pdf no page number | (Market Report, p. N/A) |
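The mapping in this table can be expressed as a one-line rule: drop the file extension and append the page, or "N/A" when no page number is available. A minimal sketch, assuming a hypothetical helper named `make_citation`:

```python
from pathlib import PurePath

def make_citation(filename: str, page=None) -> str:
    """Build "(Title, p. X)" from a file name; page=None means no page number."""
    title = PurePath(filename).stem  # drops the ".pdf" extension, keeps the name as received
    return f"({title}, p. {page if page is not None else 'N/A'})"
```

For example, `make_citation("Industry Trends 2023.pdf", 12)` yields `(Industry Trends 2023, p. 12)`, and `make_citation("Market Report.pdf")` yields `(Market Report, p. N/A)`.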
5. Conflict Resolution Procedure
- Identify Conflict: Note when two or more documents provide differing facts.
- Document Both Views: Include each viewpoint in the answer, labeling them “Source 1” and “Source 2”.
- Cite Both Sources: Provide a separate citation for each viewpoint.
- No Decision Making: The analyst does not choose which source is “correct”; instead, present both facts and note that they are contradictory.
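The procedure above amounts to labelling each statement, citing it separately, and adding a neutral note that the sources disagree. The sketch below shows one possible way to assemble such an answer; `present_conflict` is a hypothetical helper and the closing sentence is borrowed from the example in Section 8.

```python
def present_conflict(statements):
    """Format contradictory statements without choosing between them.

    `statements` is a list of (text, document title, page) tuples.
    """
    parts = [
        f"Source {i}: {text} ({title}, p. {page})"
        for i, (text, title, page) in enumerate(statements, start=1)
    ]
    return " ".join(parts) + " The documents disagree."
```

The function only arranges what the analyst has already extracted; it makes no attempt to judge which source is correct.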
6. Sample Work‑Flow Checklist
7. Frequently Used Phrases
| Desired Phrase | Example Use |
|---|---|
| “According to …” | “According to Industry Trends 2023 (p. 12), the growth rate ….” |
| “The documents indicate…” | “The documents indicate that the client’s main concerns are … (Client Survey Summary, p. 5).” |
| “No information…” | “No information is available in the provided documents.” |
| “Source 1: …” | “Source 1: The report states … (Report A, p. 4).” |
8. Handling Non‑PDF Files
If a file is not a PDF (e.g., DOCX, PPT), request a PDF version. Do not attempt to convert or read the file in its current format.
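A quick pre-check can separate non-PDF files from PDFs that fail to open. The sketch below is a heuristic only, with a hypothetical function name and return labels: it looks at the file extension and at the `%PDF-` marker that PDF files begin with, so a file that passes may still turn out to be damaged when opened.

```python
from pathlib import Path

def classify_input(path) -> str:
    """Return "ok", "not_pdf" (request a PDF version) or "unreadable" (flag for manual review)."""
    path = Path(path)
    if path.suffix.lower() != ".pdf":
        return "not_pdf"
    try:
        with open(path, "rb") as handle:
            header = handle.read(5)
    except OSError:
        return "unreadable"
    # PDF files begin with the marker "%PDF-"; anything else is treated as damaged.
    return "ok" if header == b"%PDF-" else "unreadable"
```

No conversion is attempted in either failure case, in line with the rule above.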
9. Version Control
When updates to the SOP are needed, use a version number (e.g., SOP v2.0) and record the date of the change in the “Additional Notes” section.
10. Glossary of Additional Terms
- Reference Material: Any static list or guideline that supports the process (e.g., style guides, citation standards).
- Audit Trail: The record of citations and source documents that provides traceability for each answer.
- Manual Review: A human‑performed check for documents that are unreadable, contain confidential data, or require clarification.
Note: The SOP may be edited to include additional reference documents or refined style guidelines as the organization’s needs evolve.
Additional Notes
- Document Naming: Keep the original file names intact; they are used in citations.
- Version Tracking: When a new version of a document is added, replace the older PDF with the newest version before starting the process.
- Feedback Loop: If the analyst discovers missing information that should have been included in the original document set, note the gap and suggest a supplemental document for future queries.
- Continuous Improvement: Periodically review the SOP for clarity, update the style guide, and incorporate any new best practices for document‑based research.