Recent advances in large language models (LLMs) like ChatGPT have shown promising results on open-domain question answering. However, multi-document question answering (MD-QA), which requires analyzing and reasoning across multiple documents, remains challenging for LLMs.
A new paper from Vanderbilt University and Adobe Research proposes a novel method called Knowledge Graph Prompting (KGP) to improve MD-QA performance. The key idea is to construct a knowledge graph over multiple documents and use it to retrieve relevant contexts for prompting the LLM.
To construct the knowledge graph, the researchers first split each document into passages. For each passage node, they extract features such as TF-IDF keywords or embeddings from sentence encoders. Edges are then added between passages based on lexical similarity (shared keywords) or semantic similarity (embedding distance). Structural nodes for tables, pages, and similar elements are also extracted and connected through relations such as a page containing certain passages. Overall, the graph encodes associations between passages across documents as well as structural relationships within documents. This unified representation allows relevant contexts to be gathered for prompting the language model during question answering. The paper explores and compares different construction methods, including TF-IDF, sentence transformers, and pre-trained models, to build quality knowledge graphs tailored for multi-document QA.
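To make the construction concrete, here is a minimal sketch of the TF-IDF variant, assuming passages have already been extracted; the similarity threshold and the node/edge attributes are illustrative choices, not the paper's exact implementation:

```python
import itertools

import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_passage_graph(passages, threshold=0.3):
    """Connect passages that share TF-IDF keywords above a similarity threshold."""
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(passages)   # one keyword vector per passage
    sims = cosine_similarity(tfidf)              # pairwise lexical similarity

    graph = nx.Graph()
    for i, text in enumerate(passages):
        graph.add_node(i, text=text)             # each passage becomes a node
    for i, j in itertools.combinations(range(len(passages)), 2):
        if sims[i, j] >= threshold:              # edge = lexically similar pair
            graph.add_edge(i, j, weight=float(sims[i, j]))
    return graph
```

Swapping the TF-IDF vectors for sentence-encoder embeddings yields the semantic-similarity variant; the graph structure stays the same.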
During QA, an LM-guided graph traverser navigates the knowledge graph to find pertinent passages. The traverser is fine-tuned to generate the next “evidence” passage given the current context. Retrieved neighbors that best match the generated text are added to the context¹.
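The loop below sketches that traversal. Here `generate_next_evidence` stands in for the fine-tuned language model that drafts the next evidence passage, and `similarity` for whatever text-matching function scores candidates; both names are assumptions for illustration, not the authors' code.

```python
def traverse(graph, question, start_node, generate_next_evidence, similarity, steps=3):
    """Hop through the passage graph, at each step moving to the neighbor
    whose text best matches the LM-drafted next evidence passage."""
    context = [graph.nodes[start_node]["text"]]
    current = start_node
    for _ in range(steps):
        draft = generate_next_evidence(question, context)  # LM proposes evidence
        neighbors = list(graph.neighbors(current))
        if not neighbors:
            break
        # move to the neighbor most similar to the drafted evidence
        current = max(neighbors, key=lambda n: similarity(draft, graph.nodes[n]["text"]))
        context.append(graph.nodes[current]["text"])
    return context  # passages handed to the LLM as the final prompt context
```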
Results on datasets such as HotpotQA, 2WikiMQA, and MuSiQue show that Knowledge Graph Prompting achieves much higher accuracy than most retrieval baselines. On 2WikiMQA, KGP-T5² reaches 63.5% accuracy, compared with 72.6% when golden³ contexts are supplied and 44.4% with no context at all. On the same dataset, KGP achieves 39.8% exact match versus 40.2% for golden contexts, which is effectively the upper bound for the method. The constructed knowledge graphs also exhibit high coverage of the supporting facts needed for QA.
The paper demonstrates how knowledge graphs can enhance prompt engineering for LLMs. By encoding inter-document associations and mimicking multi-step reasoning, KGP provides the context LLMs need for complex MD-QA. This allows fully leveraging their capabilities.
The proposed techniques could enable LLMs to answer questions that require aggregating and connecting insights across long documents, research papers, reports, and more. KGP helps overcome the inherent limitations of LLMs’ knowledge while retaining their effectiveness at comprehension and reasoning over provided contexts.
Productivity applications like search engines, virtual assistants, and enterprise analytics tools can potentially use KGP to handle multi-document inputs. The ability to reason over connected representations of documents opens up new possibilities for LLMs. With further research, KGP could become a core technique for multi-document understanding tasks beyond QA.
1. Once the knowledge graph is constructed, an LM-guided graph traverser navigates it to retrieve pertinent contexts for the given question. The traverser is fine-tuned to generate the next relevant “evidence” passage based on the current context: it prompts the language model to produce this evidence passage, and the generated text is compared to candidate neighbors in the graph to find the most similar node to visit next. This iterative process gathers passages that logically lead toward the answer. For structural questions, relevant table or page contents are retrieved directly. The final retrieved contexts are provided to the language model as a prompt to generate the answer. By leveraging the knowledge graph, connections across documents are encoded to retrieve the necessary contexts, enhancing the language model’s reasoning and answering capabilities. ↩︎
2. KGP-T5: the Knowledge Graph Prompting variant that uses the T5 language model to guide context retrieval from the constructed knowledge graph. ↩︎
3. “golden” refers to prompting the LLM with the ground-truth supporting facts for each question included directly in the prompt. ↩︎