Introduction to GraphRAG: Combining Knowledge Graphs with RAG
The Problem with Vanilla RAG
Retrieval-augmented generation (RAG) solved a real problem: LLMs hallucinate when asked about facts not in their training data. By retrieving relevant documents and injecting them into the prompt, RAG grounds the model in external knowledge.
But vanilla RAG has blind spots:
- Chunk isolation: Documents are split into chunks and embedded independently, so connections between related facts that sit in different chunks are lost.
- Entity ambiguity: Is "Apple" the fruit or the company? Vector similarity alone often can't tell them apart.
- Multi-hop reasoning: "Which employees of companies founded in 2015 worked on GraphRAG papers?" requires joining across documents.
- No structure: Retrieved chunks are flat text. Relationships, hierarchies, and provenance are invisible to the LLM.
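To make the chunk isolation point concrete, here is a minimal sketch using a naive word-count chunker. The document text, the names in it, and the chunk size are all made up for illustration; real pipelines use smarter splitters, but the failure mode is the same.

```python
def chunk_words(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# Hypothetical two-sentence document; the company and person are invented.
document = (
    "Acme Labs was founded in 2015 by Dana Reyes. "
    "Years later its lead engineer co-authored a GraphRAG paper."
)

chunks = chunk_words(document, size=9)
for i, chunk in enumerate(chunks):
    print(i, repr(chunk))

# The founding year and the paper land in different chunks, so a retriever
# that returns only one of them cannot connect the two facts.
both = [c for c in chunks if "2015" in c and "GraphRAG" in c]
print("chunks containing both facts:", len(both))
```

Each chunk is embedded on its own, so the link "the company founded in 2015 is the one whose engineer wrote the paper" exists only in the reader's head, not in the index.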
What Is GraphRAG?
GraphRAG augments flat vector retrieval with graph-based retrieval:
- Build a knowledge graph from your documents (entities → nodes, relationships → edges)
- Index both the graph structure and the document text
- On query: traverse the graph to find relevant sub-graphs and retrieve the documents linked to them
- Feed both structured sub-graph and document context to the LLM
Query: "What security issues exist in GraphRAG implementations?"
Vector search: [3 chunks about security, partially relevant]
Graph search: [sub-graph: GraphRAG → implementations → security_audit → CVE-2026-XX]
[linked documents: "Security audit of GraphRAG v2.1", "CVE report 2026"]
LLM context: structured data + related documents
Output: grounded, multi-hop answer
The GraphRAG Pipeline
1. Entity Extraction
Process documents with an LLM to extract entities and relationships:
entities = [
{"name": "GraphRAG", "type": "Technology"},
{"name": "CVE-2026-XX", "type": "Vulnerability"},
{"name": "Neo4j", "type": "Database"},
]
relationships = [
("GraphRAG", "uses", "Neo4j"),
("CVE-2026-XX", "affects", "GraphRAG"),
]
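How the LLM output becomes those two lists is left open above. One possible shape, as a sketch: ask the model for JSON, parse it, and drop relationships whose endpoints were never extracted as entities. The prompt wording and `call_llm` are hypothetical; `call_llm` returns a canned reply here so the example runs without a model.

```python
import json

# Assumed prompt wording; tune it for your model and domain.
EXTRACTION_PROMPT = (
    "Extract entities and relationships from the text below. "
    'Reply with JSON: {"entities": [{"name": ..., "type": ...}], '
    '"relationships": [[source, relation, target], ...]}\n\nText: '
)

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM client; returns a canned reply."""
    return json.dumps({
        "entities": [
            {"name": "GraphRAG", "type": "Technology"},
            {"name": "Neo4j", "type": "Database"},
        ],
        "relationships": [
            ["GraphRAG", "uses", "Neo4j"],
            ["GraphRAG", "uses", "UnknownThing"],  # target was never extracted
        ],
    })

def extract(text: str):
    """Parse the model reply and drop relationships with unknown endpoints."""
    reply = json.loads(call_llm(EXTRACTION_PROMPT + text))
    entities = reply["entities"]
    known = {e["name"] for e in entities}
    relationships = [
        (src, rel, dst)
        for src, rel, dst in reply["relationships"]
        if src in known and dst in known
    ]
    return entities, relationships

entities, relationships = extract("GraphRAG stores its graph in Neo4j.")
print(entities)
print(relationships)
```

Filtering dangling relationships at this stage keeps half-formed triples out of the graph; a real pipeline would also need to handle malformed JSON and merge duplicate entity names.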
2. Graph Construction
Insert entities and relationships into Neo4j:
// MERGE rather than CREATE, so re-running the import does not duplicate nodes
MERGE (g:Technology {name: "GraphRAG"})
MERGE (c:Vulnerability {name: "CVE-2026-XX"})
MERGE (n:Database {name: "Neo4j"})
MERGE (g)-[:USES]->(n)
MERGE (c)-[:AFFECTS]->(g)
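In practice these statements would be generated from the extracted lists rather than written by hand. The sketch below builds parameterized Cypher from Python. It assumes every node is identified by a `name` property, and it sanitizes labels and relationship types before interpolating them, because those generally cannot be passed as query parameters. The helper names are hypothetical.

```python
import re

def to_rel_type(relation: str) -> str:
    """Turn a free-text relation such as 'uses' into a Cypher type like USES."""
    rel = re.sub(r"[^A-Za-z0-9]+", "_", relation).strip("_").upper()
    if not rel or rel[0].isdigit():
        raise ValueError(f"cannot derive a relationship type from {relation!r}")
    return rel

def to_label(entity_type: str) -> str:
    """Keep only characters that are safe inside a node label."""
    label = re.sub(r"[^A-Za-z0-9_]", "", entity_type)
    if not label or label[0].isdigit():
        raise ValueError(f"cannot derive a label from {entity_type!r}")
    return label

def build_statements(entities, relationships):
    """Return (query, params) pairs; names travel as parameters, never inline."""
    statements = []
    for entity in entities:
        label = to_label(entity["type"])
        statements.append(
            (f"MERGE (n:{label} {{name: $name}})", {"name": entity["name"]})
        )
    for src, relation, dst in relationships:
        statements.append((
            "MATCH (a {name: $src}), (b {name: $dst}) "
            f"MERGE (a)-[:{to_rel_type(relation)}]->(b)",
            {"src": src, "dst": dst},
        ))
    return statements

entities = [
    {"name": "GraphRAG", "type": "Technology"},
    {"name": "Neo4j", "type": "Database"},
]
relationships = [("GraphRAG", "uses", "Neo4j")]

for query, params in build_statements(entities, relationships):
    print(query, params)
```

Each pair could then be handed to a driver call such as `session.run(query, params)` in the official Neo4j Python driver. MERGE is used so that re-running the import does not create duplicate nodes or edges.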
3. Hybrid Retrieval
On query, perform both:
- Vector search on document chunks (for broad context)
- Graph traversal from matched entities (for structured relationships)
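The two retrieval paths can be sketched in memory, with no vector store or database. Everything here is a toy assumption: the 2-d "embeddings" are hand-picked, the chunk texts are placeholders, and entities are matched to the query by plain substring search, which stands in for real entity linking.

```python
import math
from collections import deque

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def vector_search(query_vec, chunks, k=2):
    """Return the k chunk texts whose embeddings are closest to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [c["text"] for c in ranked[:k]]

def graph_search(query, edges, max_hops=2):
    """Collect triples within max_hops of any entity named in the query."""
    neighbours = {}
    for src, rel, dst in edges:
        neighbours.setdefault(src, []).append((src, rel, dst))
        neighbours.setdefault(dst, []).append((src, rel, dst))
    # Naive entity matching: an entity is a seed if its name appears in the query.
    seeds = [name for name in neighbours if name.lower() in query.lower()]
    seen, triples = set(seeds), []
    queue = deque((s, 0) for s in seeds)
    while queue:  # breadth-first traversal, bounded by max_hops
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for triple in neighbours[node]:
            if triple not in triples:
                triples.append(triple)
            other = triple[2] if triple[0] == node else triple[0]
            if other not in seen:
                seen.add(other)
                queue.append((other, depth + 1))
    return triples

# Toy data: vectors are made up so the example runs without an embedding model.
chunks = [
    {"text": "Security audit of GraphRAG", "vec": [0.9, 0.1]},
    {"text": "Neo4j installation guide", "vec": [0.1, 0.9]},
    {"text": "CVE report", "vec": [0.8, 0.3]},
]
edges = [
    ("GraphRAG", "uses", "Neo4j"),
    ("CVE-2026-XX", "affects", "GraphRAG"),
]

query = "What security issues affect GraphRAG?"
print(vector_search([1.0, 0.0], chunks))
print(graph_search(query, edges, max_hops=1))
```

The `max_hops` bound matters in practice: without it, a traversal from a well-connected entity can pull in most of the graph and flood the prompt.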
4. Context Assembly
Combine results into a structured prompt:
Relevant documents:
[chunk 1], [chunk 2], [chunk 3]
Knowledge graph sub-graph:
GraphRAG --uses--> Neo4j
CVE-2026-XX --affects--> GraphRAG
CVE-2026-XX --severity--> Critical
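A small formatting function is enough to produce a prompt in this layout. The sketch below assumes chunks arrive as strings and graph results as (source, relation, target) triples; the numbered chunk markers and the trailing question and instruction lines are additions for illustration.

```python
def build_prompt(question, chunks, triples):
    """Render retrieved chunks and graph triples into one prompt string."""
    doc_lines = [f"[{i}] {text}" for i, text in enumerate(chunks, start=1)]
    graph_lines = [f"{src} --{rel}--> {dst}" for src, rel, dst in triples]
    return "\n".join([
        "Relevant documents:",
        *doc_lines,
        "",
        "Knowledge graph sub-graph:",
        *graph_lines,
        "",
        f"Question: {question}",
        "Answer using only the context above.",  # assumed instruction wording
    ])

prompt = build_prompt(
    "What security issues affect GraphRAG?",
    ["chunk 1", "chunk 2"],
    [("GraphRAG", "uses", "Neo4j"), ("CVE-2026-XX", "affects", "GraphRAG")],
)
print(prompt)
```

Keeping the graph section as one triple per line makes it easy for the model to cite a specific relationship, and easy for you to truncate when the context budget is tight.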
Implementation Options
Microsoft's GraphRAG
Microsoft Research's GraphRAG paper introduced an automated pipeline that:
- Detects communities of related entities using Leiden clustering
- Generates community summaries
- Answers queries at both global and local scope
Good for: Large document corpora, question answering over broad topics
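The community idea can be illustrated without a clustering library. The sketch below groups entities by connected components, which is a much cruder stand-in for Leiden: Leiden splits densely connected regions into finer, hierarchical communities, while this only separates parts of the graph that share no edges at all. It is not Microsoft's implementation, and the second cluster in the sample data is invented.

```python
def communities(edges):
    """Group entities into connected components (a crude stand-in for Leiden)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for src, _rel, dst in edges:
        parent[find(src)] = find(dst)  # union the two endpoints

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])

def community_report(members, edges):
    """Text that would be handed to an LLM to summarize one community."""
    facts = [f"{s} {r} {d}" for s, r, d in edges if s in members and d in members]
    return "; ".join(facts)

edges = [
    ("GraphRAG", "uses", "Neo4j"),
    ("CVE-2026-XX", "affects", "GraphRAG"),
    ("Team A", "maintains", "Service B"),  # hypothetical, unrelated cluster
]
for members in communities(edges):
    print(members, "->", community_report(members, edges))
```

In the full pipeline, each report would be summarized by an LLM ahead of time, so that broad "global" questions can be answered from the summaries instead of from individual chunks.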
Custom GraphRAG with Neo4j + LangChain
For production systems, a custom implementation gives more control:
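from langchain_community.graphs import Neo4jGraph
from langchain.chains import GraphCypherQAChain  # import path varies across LangChain versions
# Connect to Neo4j (password elided).
graph = Neo4jGraph(url="bolt://localhost:7687", username="neo4j", password="...")
# llm is assumed to be an already-configured LangChain chat model.
# Recent LangChain releases also require allow_dangerous_requests=True in from_llm.
chain = GraphCypherQAChain.from_llm(llm=llm, graph=graph)
result = chain.invoke({"query": "What security issues affect GraphRAG?"})
print(result["result"])  # the generated answer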
Good for: Domain-specific applications, precise control over graph schema
When to Use GraphRAG
GraphRAG shines when:
- Your data has rich entity relationships
- Multi-hop reasoning is required
- Entity disambiguation matters
- You need provenance (which document supports this fact?)
It's overkill when:
- You're answering simple factoid questions
- Your documents have no relational structure
- Latency is the primary concern (graph traversal adds overhead)
Next Steps
The infrastructure for GraphRAG already exists in your Neo4j knowledge graph (see AGENTS.md). The next article will walk through building a production GraphRAG pipeline using Neo4j + an open-source LLM.