LLMs and knowledge graphs
Integrating Large Language Models (LLMs) with Knowledge Graphs (KGs) in scientific domains to enhance the accuracy and depth of information retrieval and reasoning.
See Graph RAG
Resources
- GitHub - zjukg/KG-LLM-Papers - Papers integrating knowledge graphs (KGs) and large language models (LLMs)
- Blending Large Language Models and Knowledge Graphs - An Introduction
Code
- #CODE OntoGPT - Python package for extracting structured information from text using large language models (LLMs), instruction prompts, and ontology-based grounding
References
- #PAPER Knowledgeable Preference Alignment for LLMs in Domain-specific Question Answering (2024)
- #PAPER Knowledge Graph Based Agent for Complex, Knowledge-Intensive QA in Medicine (2024)
- Harvard Presents NEW Knowledge-Graph AGENT (MedAI) - YouTube
- Large Language Models (LLMs) struggle with medical reasoning due to limitations in retrieving contextually relevant, codified, and structured knowledge.
- Challenges include difficulty combining structured knowledge (e.g., Knowledge Graphs) with non-codified data and the lack of accurate post-retrieval verification in retrieval-augmented generation (RAG) models.
- Harvard introduced a Knowledge Graph Agent (KGA) that integrates LLMs with medical Knowledge Graphs.
- This agent combines the structured data of Knowledge Graphs with LLM reasoning capabilities to improve factual accuracy, contextual relevance, and medical reasoning.
- The methodology has four phases (sketched in code below, after this paper's notes):
- Generate: LLMs generate triplets (Head-Relation-Tail) representing medical concepts and relationships.
- Review: Generated triplets are validated against Knowledge Graphs to ensure factual accuracy.
- Revise: Incorrect or incomplete triplets are adjusted or flagged for special processing.
- Answer: The validated and revised triplets are used for reasoning and answering medical queries.
- Integration with Knowledge Graphs:
- LLM embeddings and Knowledge Graph embeddings are aligned in a shared geometric space using projection layers and self-attention mechanisms
- A linear projection layer maps the pre-trained entity and relation embeddings to the same dimension as the LLM's token embeddings
- Fine-tuning the LLM on Knowledge Graph completion tasks bridges gaps in the graph and enhances reasoning capabilities.
- Improves reasoning by combining structured (Knowledge Graph) and unstructured (LLM) knowledge; can propose novel relationships and fill missing links in partially complete Knowledge Graphs, which is essential in medicine.
- The methodology showed significant accuracy improvements (e.g., 50% → ~70%) in medical reasoning tests compared to standalone LLMs.
- The approach ran on a comparatively modest computing setup (NVIDIA H100 GPUs), suggesting it is feasible and scalable for broader applications.
- The methodology offers a path to replace RAG systems with more powerful, integrated solutions for knowledge-intensive medical reasoning.
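A minimal sketch of the two mechanisms above - the linear projection that aligns KG embeddings with the LLM token-embedding space, and the Generate → Review → Revise → Answer loop. It assumes a toy KG stored as a set of (head, relation, tail) tuples and LLM steps passed in as plain callables; the names, dimensions, and structure are illustrative placeholders, not the paper's implementation.

```python
# Sketch of KG-to-LLM embedding alignment and the Generate -> Review ->
# Revise -> Answer loop. All callables, dimensions, and the set-of-tuples
# KG are hypothetical placeholders.
from dataclasses import dataclass
from typing import Callable, Optional

import torch
import torch.nn as nn


class KGProjector(nn.Module):
    """Maps pre-trained KG entity/relation embeddings (kg_dim) to the same
    dimension as the LLM's token embeddings (llm_dim), so both can flow
    through the model's self-attention layers together."""

    def __init__(self, kg_dim: int = 256, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(kg_dim, llm_dim)

    def forward(self, kg_emb: torch.Tensor) -> torch.Tensor:
        return self.proj(kg_emb)  # (n, kg_dim) -> (n, llm_dim)


@dataclass(frozen=True)
class Triplet:
    head: str
    relation: str
    tail: str


def answer_query(
    question: str,
    kg: set[tuple[str, str, str]],                        # KG as (head, relation, tail) facts
    generate: Callable[[str], list[Triplet]],             # LLM proposes candidate triplets
    revise: Callable[[Triplet, set], Optional[Triplet]],  # LLM fixes a triplet given KG context
    answer: Callable[[str, list[Triplet]], str],          # LLM reasons over verified triplets
) -> str:
    verified: list[Triplet] = []
    for t in generate(question):                          # 1. Generate
        if (t.head, t.relation, t.tail) in kg:            # 2. Review against the KG
            verified.append(t)
            continue
        neighborhood = {f for f in kg if t.head in (f[0], f[2])}
        fixed = revise(t, neighborhood)                   # 3. Revise (or drop/flag if unfixable)
        if fixed is not None:
            verified.append(fixed)
    return answer(question, verified)                     # 4. Answer
```

Passing the LLM steps in as callables keeps the sketch model-agnostic; in the paper's setting these would be prompt templates against an LLM fine-tuned on KG completion.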
- #PAPER #REVIEW Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey (2024)
- #PAPER GLaM: Fine-Tuning Large Language Models for Domain Knowledge Graph Alignment via Neighborhood Partitioning and Generative Subgraph Encoding (2024)
- #PAPER LLMs for knowledge graph construction and reasoning: recent capabilities and future opportunities (2024)
- #PAPER Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph (2024)
- #PAPER GIVE: Structured Reasoning with Knowledge Graph Inspired Veracity Extrapolation (2024)
- ADD LLM TO Knowledge-Graph: NEW GIVE Method (Berkeley) - YouTube
- A novel framework designed to enhance the reasoning capabilities of LLMs when working with sparse knowledge graphs
- Traditional retrieval-based methods often depend on dense, high-quality knowledge sources, which can be costly and impractical to develop for specialized domains. To address this, the authors propose the Graph Inspired Veracity Extrapolation (GIVE) framework, which integrates both parametric (internal) and non-parametric (external) knowledge to improve information retrieval and reasoning processes.
- GIVE operates by prompting LLMs to:
- Decompose Queries: Break down the input question into essential concepts and attributes.
- Construct Entity Groups: Assemble groups of related entities pertinent to the query.
- Build Augmented Reasoning Chains: Explore potential relationships among entities across these groups, incorporating both factual and extrapolated connections.
- This structured approach encourages LLMs to adopt a logical, step-by-step reasoning process similar to expert problem-solving, rather than merely retrieving direct answers.
- GIVE is particularly effective with sparse or incomplete KGs (such as those in biomedical research); see the sketch below
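A rough sketch of the GIVE prompting pattern under the same caveats: the prompts, the naive entity-group construction, and the `call_llm` callable are assumptions made for illustration, not the framework's actual implementation, which scores and filters extrapolated relations more carefully.

```python
# Rough sketch of the GIVE pattern: decompose the query, build entity groups
# from a sparse KG, let the LLM extrapolate plausible relations between the
# groups, then answer over factual plus hypothesized chains.
from typing import Callable


def give_answer(
    question: str,
    kg: set[tuple[str, str, str]],   # sparse KG of (head, relation, tail) facts
    call_llm: Callable[[str], str],  # any text-in/text-out LLM interface
) -> str:
    # 1. Decompose the query into its key concepts/attributes.
    raw = call_llm(
        f"List the key concepts in this question, comma-separated:\n{question}"
    )
    concepts = [c.strip() for c in raw.split(",") if c.strip()]

    # 2. Construct entity groups: for each concept, gather KG entities that
    #    co-occur with it in a triplet (a deliberately naive heuristic).
    groups = {
        c: {h for h, _, t in kg if c in (h, t)} | {t for h, _, t in kg if c in (h, t)}
        for c in concepts
    }

    # 3. Build augmented reasoning chains: keep factual edges between groups
    #    and ask the LLM to extrapolate plausible missing edges, clearly
    #    marked as hypotheses rather than facts.
    factual = [
        f for f in kg
        if any(f[0] in g for g in groups.values())
        and any(f[2] in g for g in groups.values())
    ]
    group_desc = "; ".join(f"{c}: {sorted(g)}" for c, g in groups.items())
    extrapolated = call_llm(
        "Known facts:\n" + "\n".join(f"{h} -[{r}]-> {t}" for h, r, t in factual)
        + f"\nEntity groups: {group_desc}\n"
        "Propose plausible additional relations between these groups, "
        "marking each as HYPOTHESIS."
    )

    # 4. Answer using both factual and extrapolated chains.
    return call_llm(
        f"Question: {question}\nKnown facts:\n"
        + "\n".join(f"{h} -[{r}]-> {t}" for h, r, t in factual)
        + f"\nHypothesized relations:\n{extrapolated}\nAnswer step by step."
    )
```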