Introduction to GraphRAG: Combining Knowledge Graphs with RAG

The Problem with Vanilla RAG

Retrieval-augmented generation (RAG) solved a real problem: LLMs hallucinate when asked about facts not in their training data. By retrieving relevant documents and injecting them into the prompt, RAG grounds the model in external knowledge.

But vanilla RAG has blind spots:

  • Chunk isolation: Documents are split into chunks and embedded independently, so relationships between facts in different chunks are lost.
  • Entity ambiguity: Is "Apple" the fruit or the company? Vector similarity alone often cannot tell them apart.
  • Multi-hop reasoning: "Which employees of companies founded in 2015 worked on GraphRAG papers?" requires joining across documents.
  • No structure: Retrieved chunks are flat text. Relationships, hierarchies, and provenance are invisible to the LLM.
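
To make the multi-hop case concrete, here is a minimal sketch over hypothetical toy data (every company, person, and paper name below is invented for illustration). Each fact would plausibly sit in a different chunk, so no single retrieved chunk answers the question, while a join over explicit relationships does.

```python
# Toy facts, each of which would plausibly sit in a different document chunk.
# All company, person, and paper names are hypothetical.
founded = {"AcmeGraph": 2015, "VectorCo": 2019}
works_at = {"alice": "AcmeGraph", "bob": "VectorCo", "carol": "AcmeGraph"}
authored = {
    "alice": ["GraphRAG at Scale"],
    "bob": ["GraphRAG Benchmarks"],
    "carol": ["Vector Indexing"],
}


def employees_of_2015_companies_with_graphrag_papers():
    """Join three relations: founding year, employment, and authorship."""
    matches = []
    for person, company in works_at.items():
        if founded.get(company) != 2015:
            continue  # hop 1: the employer must have been founded in 2015
        papers = authored.get(person, [])
        if any("GraphRAG" in title for title in papers):
            matches.append(person)  # hop 2: the person worked on a GraphRAG paper
    return sorted(matches)


print(employees_of_2015_companies_with_graphrag_papers())
```

A graph database expresses the same join as a path pattern instead of nested lookups, which is the capability the rest of this article builds on.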

What Is GraphRAG?

GraphRAG augments flat vector retrieval with graph-based retrieval:

  1. Build a knowledge graph from your documents (entities → nodes, relationships → edges)
  2. Index both the graph structure and the document text
  3. On query: Traverse the graph to find relevant sub-graphs AND retrieve related documents
  4. Feed both the structured sub-graph and the document context to the LLM

Query: "What security issues exist in GraphRAG implementations?"

Vector search: [3 chunks about security, partially relevant]
Graph search:  [sub-graph: GraphRAG → implementations → security_audit → CVE-2026-XX]
               [linked documents: "Security audit of GraphRAG v2.1", "CVE report 2026"]

LLM context: structured data + related documents
Output: grounded, multi-hop answer

The GraphRAG Pipeline

1. Entity Extraction

Process documents with an LLM to extract entities and relationships:

entities = [
    {"name": "GraphRAG", "type": "Technology"},
    {"name": "CVE-2026-XX", "type": "Vulnerability"},
    {"name": "Neo4j", "type": "Database"},
]
relationships = [
    ("GraphRAG", "uses", "Neo4j"),
    ("CVE-2026-XX", "affects", "GraphRAG"),
]
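
One way to get from raw text to structures like these is to ask the model for JSON and validate what comes back. The sketch below is an assumption about how that could look, not a prescribed method: `call_llm` is a hypothetical callable (prompt in, string out), the prompt wording is illustrative, and `fake_llm` stands in for a real model so the example runs offline.

```python
import json

EXTRACTION_PROMPT = (
    "Extract entities and relationships from the text below. "
    'Respond with JSON only: {"entities": [{"name": str, "type": str}], '
    '"relationships": [[source, relation, target]]}\n\nText:\n'
)


def extract_graph(text, call_llm):
    """Ask an LLM for entities/relationships and validate the JSON it returns.

    `call_llm` is a hypothetical callable (prompt -> str); plug in any client.
    """
    data = json.loads(call_llm(EXTRACTION_PROMPT + text))
    entities = [
        {"name": e["name"], "type": e["type"]}
        for e in data.get("entities", [])
        if e.get("name") and e.get("type")
    ]
    known = {e["name"] for e in entities}
    # Keep only well-formed triples whose endpoints were also extracted as entities.
    relationships = [
        tuple(r)
        for r in data.get("relationships", [])
        if len(r) == 3 and r[0] in known and r[2] in known
    ]
    return entities, relationships


def fake_llm(prompt):
    """Stand-in for a real model call, returning a canned response."""
    return json.dumps({
        "entities": [
            {"name": "GraphRAG", "type": "Technology"},
            {"name": "Neo4j", "type": "Database"},
        ],
        "relationships": [
            ["GraphRAG", "uses", "Neo4j"],
            ["GraphRAG", "uses", "Unknown"],  # dangling target, filtered out
        ],
    })


entities, relationships = extract_graph("GraphRAG uses Neo4j.", fake_llm)
print(entities)
print(relationships)
```

Dropping triples with unknown endpoints is a design choice: it trades some recall for a graph without dangling edges. A real pipeline would also need to handle malformed JSON and merge duplicate entities across documents.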

2. Graph Construction

Insert entities and relationships into Neo4j:

CREATE (g:Technology {name: "GraphRAG"})
CREATE (c:Vulnerability {name: "CVE-2026-XX"})
CREATE (n:Database {name: "Neo4j"})
CREATE (g)-[:USES]->(n)
CREATE (c)-[:AFFECTS]->(g)
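
When the entities come from an extraction step rather than being typed by hand, the statements are usually generated. The following is a minimal sketch under a few assumptions: every node is keyed by a `name` property, `MERGE` is used instead of `CREATE` so that re-running ingestion does not duplicate nodes, and the helper names (`build_statements`, `_safe_identifier`) are hypothetical. Values travel as query parameters, while labels and relationship types are written into the query text and are therefore checked against a strict pattern first.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _safe_identifier(value):
    """Accept only plain identifiers, since labels and relationship types
    are interpolated into the query text rather than passed as parameters."""
    if not _IDENTIFIER.fullmatch(value):
        raise ValueError(f"unsafe Cypher identifier: {value!r}")
    return value


def build_statements(entities, relationships):
    """Return (query, params) pairs; MERGE keeps re-runs from duplicating data."""
    statements = []
    for entity in entities:
        label = _safe_identifier(entity["type"])
        statements.append(
            (f"MERGE (n:{label} {{name: $name}})", {"name": entity["name"]})
        )
    for source, relation, target in relationships:
        rel_type = _safe_identifier(relation.upper())
        statements.append((
            "MATCH (a {name: $source}), (b {name: $target}) "
            f"MERGE (a)-[:{rel_type}]->(b)",
            {"source": source, "target": target},
        ))
    return statements


entities = [
    {"name": "GraphRAG", "type": "Technology"},
    {"name": "Neo4j", "type": "Database"},
]
relationships = [("GraphRAG", "uses", "Neo4j")]

for query, params in build_statements(entities, relationships):
    print(query, params)
```

Each pair can then be handed to a driver session, for example `session.run(query, params)` with the official `neo4j` Python driver. Matching nodes by `name` without a label is a simplification that assumes names are unique across entity types.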

3. Hybrid Retrieval

On query, perform both:

  • Vector search on document chunks (for broad context)
  • Graph traversal from matched entities (for structured relationships)
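
The two retrieval paths can be sketched in plain Python. Everything here is a toy assumption: the three-dimensional embeddings are made up, entity matching is a naive substring check, and the in-memory `graph` dictionary stands in for a real graph database. The point is only to show the shape of the hybrid step.

```python
import math
from collections import deque

# Hypothetical toy index: chunk text plus a made-up 3-dimensional embedding.
chunks = {
    "c1": ("Security audit of GraphRAG v2.1", [0.9, 0.1, 0.0]),
    "c2": ("Neo4j installation guide", [0.1, 0.9, 0.0]),
    "c3": ("CVE report 2026", [0.8, 0.0, 0.2]),
}
# Directed edges as (relation, target) lists keyed by source entity.
graph = {
    "GraphRAG": [("uses", "Neo4j")],
    "CVE-2026-XX": [("affects", "GraphRAG")],
    "Neo4j": [],
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def vector_search(query_embedding, k=2):
    """Return the ids of the k chunks most similar to the query embedding."""
    ranked = sorted(
        chunks, key=lambda cid: cosine(query_embedding, chunks[cid][1]), reverse=True
    )
    return ranked[:k]


def graph_search(seeds, max_hops=1):
    """Breadth-first traversal collecting triples within max_hops of the seeds,
    following edges in both directions."""
    triples, visited = set(), set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for source, edges in graph.items():
            for relation, target in edges:
                if node in (source, target):
                    triples.add((source, relation, target))
                    other = target if source == node else source
                    if other not in visited:
                        visited.add(other)
                        queue.append((other, depth + 1))
    return sorted(triples)


def hybrid_retrieve(query, query_embedding):
    """Combine vector hits with the sub-graph around entities named in the query."""
    seeds = [name for name in graph if name.lower() in query.lower()]
    return {
        "chunks": [chunks[cid][0] for cid in vector_search(query_embedding)],
        "triples": graph_search(seeds),
    }


result = hybrid_retrieve("What security issues affect GraphRAG?", [1.0, 0.0, 0.0])
print(result["chunks"])
print(result["triples"])
```

In a real system the substring match would be replaced by proper entity linking, and `max_hops` becomes a tuning knob: more hops surface more relationships but also more noise and latency.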

4. Context Assembly

Combine results into a structured prompt:

Relevant documents:
[chunk 1], [chunk 2], [chunk 3]

Knowledge graph sub-graph:
GraphRAG --uses--> Neo4j
CVE-2026-XX --affects--> GraphRAG
CVE-2026-XX --severity--> Critical
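
A prompt in this shape can be produced with a small formatting helper. This is a sketch, with `assemble_context` as a hypothetical function name and the trailing `Question:` line added as an assumption about how the prompt is completed. Numbering the chunks is a design choice that lets the model cite which document supports a claim.

```python
def assemble_context(question, chunks, triples):
    """Render retrieved chunks and graph triples into one prompt string."""
    lines = ["Relevant documents:"]
    # Number each chunk so the answer can point back to its source.
    lines += [f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)]
    lines += ["", "Knowledge graph sub-graph:"]
    lines += [f"{source} --{relation}--> {target}" for source, relation, target in triples]
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


prompt = assemble_context(
    "What security issues affect GraphRAG?",
    ["Security audit of GraphRAG v2.1", "CVE report 2026"],
    [("GraphRAG", "uses", "Neo4j"), ("CVE-2026-XX", "affects", "GraphRAG")],
)
print(prompt)
```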

Implementation Options

Microsoft's GraphRAG

Microsoft Research's GraphRAG paper introduced an automated pipeline that:

  • Extracts entities and relationships, then groups them into communities using Leiden clustering
  • Generates community summaries
  • Answers queries at both global and local scope

Good for: Large document corpora, question answering over broad topics
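
The community idea can be illustrated without any dependencies. Note the substitutions: this sketch uses connected components as a stand-in for Leiden clustering, a string join as a stand-in for LLM-written summaries, and a much simplified notion of local scope. It is not Microsoft's implementation, and the edge list is hypothetical.

```python
from collections import defaultdict

# Hypothetical entity graph as undirected edges.
edges = [
    ("GraphRAG", "Neo4j"),
    ("GraphRAG", "CVE-2026-XX"),
    ("LangChain", "GraphCypherQAChain"),
]


def detect_communities(edges):
    """Group entities into connected components.

    Stand-in only: Leiden clustering can also split one connected graph
    into several densely linked communities, which this cannot do.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, communities = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(neighbors[node] - members)
        seen |= members
        communities.append(sorted(members))
    return communities


def summarize(members):
    """Placeholder for an LLM-written community summary."""
    return "Community covering: " + ", ".join(members)


communities = detect_communities(edges)
summaries = [summarize(members) for members in communities]


def global_context():
    """Global scope: every community summary is a candidate for the answer."""
    return summaries


def local_context(entity):
    """Local scope (simplified): only the community containing the entity."""
    return [s for members, s in zip(communities, summaries) if entity in members]


print(global_context())
print(local_context("Neo4j"))
```

Broad questions ("what are the main themes?") draw on all summaries, while entity-specific questions start from the neighborhood of the entity they mention.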

Custom GraphRAG with Neo4j + LangChain

For production systems, a custom implementation gives more control:

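from langchain_community.graphs import Neo4jGraph
from langchain.chains import GraphCypherQAChain

# `llm` is assumed to be an already-configured LangChain chat model.
graph = Neo4jGraph(url="bolt://localhost:7687", username="neo4j", password="...")
# Recent LangChain releases require an explicit opt-in here, because the
# chain executes LLM-generated Cypher against the database.
chain = GraphCypherQAChain.from_llm(llm=llm, graph=graph, allow_dangerous_requests=True)
result = chain.invoke({"query": "What security issues affect GraphRAG?"})
print(result["result"])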

Good for: Domain-specific applications, precise control over graph schema

When to Use GraphRAG

GraphRAG shines when:

  • Your data has rich entity relationships
  • Multi-hop reasoning is required
  • Entity disambiguation matters
  • You need provenance (which document supports this fact?)

It's overkill when:

  • You're answering simple factoid questions
  • Your documents have no relational structure
  • Latency is the primary concern (graph traversal adds overhead)

Next Steps

The infrastructure for GraphRAG already exists in your Neo4j knowledge graph (see AGENTS.md). The next article will walk through building a production GraphRAG pipeline using Neo4j + an open-source LLM.