What Are Knowledge Graphs? A Practical Introduction
From Tables to Networks
Most software engineers are trained to think in tables. Relational databases, spreadsheets, ORMs — rows and columns are the default mental model for data. But the world doesn't organise itself into tables. Relationships between entities are often more important than the entities themselves.
A knowledge graph is a data structure that puts relationships first. Instead of a users table and a purchases table joined by a foreign key, a knowledge graph represents everything as interconnected nodes and edges.
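To make the contrast concrete, here is a minimal Python sketch that is not tied to any database. The rows, the names `alice` and `bob`, and the `BOUGHT` relationship are all invented for illustration.

```python
# Relational view: two tables linked by a foreign key (user_id).
users = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
purchases = [
    {"id": 10, "user_id": 1, "item": "laptop"},
    {"id": 11, "user_id": 1, "item": "mouse"},
    {"id": 12, "user_id": 2, "item": "laptop"},
]

# Graph view: the same facts as (source, relationship, target) edges.
edges = [
    (user["name"], "BOUGHT", purchase["item"])
    for purchase in purchases
    for user in users
    if user["id"] == purchase["user_id"]
]

def neighbours(node, relationship):
    """Return every target reachable from `node` over `relationship`."""
    return [dst for src, rel, dst in edges if src == node and rel == relationship]

def sources(node, relationship):
    """Return every node that points at `node` over `relationship`."""
    return [src for src, rel, dst in edges if dst == node and rel == relationship]

print(neighbours("alice", "BOUGHT"))
print(sources("laptop", "BOUGHT"))
```

In the graph view, "who bought a laptop?" and "what did alice buy?" are the same kind of lookup, just walked in opposite directions.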
Core Concepts
Nodes (Vertices)
Nodes represent entities — people, places, concepts, events. Each node has a unique identifier and can carry properties:
Node: Paris
  type: City
  population: 2.1M
  country: France
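In code, a node like this could be modelled as a small record. The sketch below is one possible shape, not the format of any particular graph database; the `Node` class and its field names are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str                                    # unique identifier
    label: str                                      # entity type, e.g. "City"
    properties: dict = field(default_factory=dict)  # free-form key/value data

paris = Node("paris", "City", {"population": 2_100_000, "country": "France"})

# Index nodes by identifier so a lookup does not depend on insertion order.
nodes = {paris.node_id: paris}
print(nodes["paris"].properties["population"])
```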
Edges (Relationships)
Edges represent connections between nodes. In a property graph, edges are directional and can carry their own properties:
(Paris) -[capital_of]-> (France)
  established: 508 AD
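A rough Python equivalent of this edge is sketched below. The `Edge` class and the `outgoing`/`incoming` helpers are hypothetical names; the point is only that direction matters and that the edge holds its own properties.

```python
from dataclasses import dataclass, field

@dataclass
class Edge:
    source: str                                     # node the edge starts from
    edge_type: str                                  # relationship name
    target: str                                     # node the edge points to
    properties: dict = field(default_factory=dict)  # data about the relationship

edges = [Edge("Paris", "capital_of", "France", {"established": "508 AD"})]

def outgoing(node):
    """Edges that start at `node`."""
    return [edge for edge in edges if edge.source == node]

def incoming(node):
    """Edges that end at `node`."""
    return [edge for edge in edges if edge.target == node]

print(outgoing("Paris")[0].target)
print(incoming("France")[0].properties["established"])
```

Because the edge is directional, `outgoing("France")` is empty: France is not the capital of Paris.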
Labels and Types
Nodes and edges are typed. These types give each element a defined meaning, which is a large part of what separates a knowledge graph from a generic graph:
- Node labels: City, Company, Person, Technology
- Edge types: capital_of, employs, developed_by, depends_on
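One way to see what typing buys you is to reject anything outside the declared vocabulary. This is a minimal sketch using the labels and edge types listed above; the `add_node` and `add_edge` helpers and the sample identifiers are assumptions, and real systems enforce schemas in their own ways.

```python
NODE_LABELS = {"City", "Company", "Person", "Technology"}
EDGE_TYPES = {"capital_of", "employs", "developed_by", "depends_on"}

def add_node(graph, node_id, label):
    """Store a node only if its label is part of the declared vocabulary."""
    if label not in NODE_LABELS:
        raise ValueError(f"unknown node label: {label}")
    graph["nodes"][node_id] = label

def add_edge(graph, source, edge_type, target):
    """Store an edge only if its type is known and both endpoints exist."""
    if edge_type not in EDGE_TYPES:
        raise ValueError(f"unknown edge type: {edge_type}")
    if source not in graph["nodes"] or target not in graph["nodes"]:
        raise KeyError("both endpoints must exist before linking them")
    graph["edges"].append((source, edge_type, target))

graph = {"nodes": {}, "edges": []}
add_node(graph, "neo4j", "Technology")
add_node(graph, "neo4j_inc", "Company")
add_edge(graph, "neo4j", "developed_by", "neo4j_inc")
print(graph["edges"])
```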
RDF vs Property Graphs
Two dominant models exist:
RDF (Resource Description Framework)
The semantic web standard. Everything is a triple: subject → predicate → object.
@prefix ex: <http://example.org/> .
ex:Paris ex:capitalOf ex:France .
ex:France ex:hasPopulation 67000000 .
- Strengths: Web-native, standardised (W3C), linked data principles
- Query language: SPARQL
- Best for: Data integration, open data, cross-domain linking
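The triple model is simple enough to imitate in a few lines of plain Python. The sketch below stores triples as tuples and matches patterns with `None` as a wildcard, loosely in the spirit of a SPARQL basic graph pattern. It is an illustration only, not a real RDF store, and the `match` helper is a made-up name.

```python
EX = "http://example.org/"

# Each fact is one (subject, predicate, object) triple.
triples = {
    (EX + "Paris", EX + "capitalOf", EX + "France"),
    (EX + "France", EX + "hasPopulation", 67_000_000),
}

def match(subject=None, predicate=None, obj=None):
    """Return triples that fit the pattern; None matches anything."""
    hits = [
        (s, p, o)
        for s, p, o in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]
    return sorted(hits, key=str)  # stable order for display

print(match(predicate=EX + "capitalOf"))
print(match(subject=EX + "France"))
```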
Property Graph Model
The model used by Neo4j and supported by Amazon Neptune and ArangoDB. In Cypher, Neo4j's query language, the Paris example looks like this:
CREATE (p:City {name: "Paris", population: 2100000})
CREATE (f:Country {name: "France", population: 67000000})
CREATE (p)-[:CAPITAL_OF]->(f)
- Strengths: Intuitive, performant for graph traversals, flexible schema
- Query language: Cypher (Neo4j), Gremlin (Apache TinkerPop)
- Best for: Transactional applications, real-time recommendations, fraud detection
Why Knowledge Graphs Matter for AI
Structured Knowledge for LLMs
Large language models are stateless pattern matchers. They don't know things — they predict tokens. A knowledge graph provides:
- Factual grounding: Graph queries return facts that were explicitly stored and curated, not probabilistic completions
- Relationship traversal: Answer multi-hop questions ("Which employees of Company X worked on projects using Neo4j?")
- Consistency: The same fact queried twice returns the same result
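The multi-hop question above can be sketched as three chained lookups over an edge list. Everything in this example is invented for illustration: the people, the projects, and the `worked_on` and `uses` relationship names.

```python
from collections import defaultdict

edges = [
    ("CompanyX", "employs", "Ada"),
    ("CompanyX", "employs", "Grace"),
    ("CompanyY", "employs", "Linus"),
    ("Ada", "worked_on", "Recommender"),
    ("Grace", "worked_on", "Billing"),
    ("Linus", "worked_on", "Recommender"),
    ("Recommender", "uses", "Neo4j"),
    ("Billing", "uses", "PostgreSQL"),
]

# Adjacency index: (source, relationship) -> list of targets.
out = defaultdict(list)
for src, rel, dst in edges:
    out[(src, rel)].append(dst)

def employees_on_projects_using(company, technology):
    """Hop company -> person -> project -> technology and keep matching people."""
    result = []
    for person in out[(company, "employs")]:
        for project in out[(person, "worked_on")]:
            if technology in out[(project, "uses")]:
                result.append(person)
                break  # one matching project is enough
    return result

print(employees_on_projects_using("CompanyX", "Neo4j"))
```

Running the same call twice walks the same edges and returns the same list, which is the consistency property described above.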
GraphRAG
Combining retrieval-augmented generation (RAG) with knowledge graphs produces GraphRAG — a pattern where:
- The user query is parsed into a graph query
- The graph returns structured sub-graph results
- The results are formatted as context for the LLM prompt
- The LLM generates a response grounded in the graph data
This is the subject of its own article, but the key insight is: graphs give LLMs a reliable memory.
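The four steps listed above can be sketched end to end in plain Python. This is a toy under loud assumptions: the graph is an in-memory list, "parsing" is keyword matching, and `llm` is any callable that maps a prompt string to text, standing in for a real model client.

```python
# Toy graph as (source, relationship, target) edges; all data is illustrative.
GRAPH = [
    ("Paris", "capital_of", "France"),
    ("Lyon", "located_in", "France"),
    ("Berlin", "capital_of", "Germany"),
]

def parse_query(question):
    """Step 1: find known entities in the question (a stand-in for real parsing)."""
    known = {node for src, _, dst in GRAPH for node in (src, dst)}
    words = [word.strip("?.,!") for word in question.split()]
    return [word for word in words if word in known]

def fetch_subgraph(entities):
    """Step 2: return every edge that touches one of the entities."""
    return [edge for edge in GRAPH if edge[0] in entities or edge[2] in entities]

def format_context(subgraph):
    """Step 3: serialise edges as text the LLM can read."""
    return "\n".join(f"({src}) -[{rel}]-> ({dst})" for src, rel, dst in subgraph)

def answer(question, llm):
    """Step 4: hand the grounded prompt to the model callable."""
    context = format_context(fetch_subgraph(parse_query(question)))
    prompt = f"Answer using only these facts:\n{context}\n\nQuestion: {question}"
    return llm(prompt)

# Echo "model" so the sketch runs without an API key; swap in a real client.
print(answer("What is the capital of France?", llm=lambda prompt: prompt))
```

Passing the model in as a callable keeps the graph side testable on its own, which helps when debugging whether a bad answer came from retrieval or from generation.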
Getting Started
The easiest way to start with knowledge graphs:
- Install Neo4j: docker run --publish=7474:7474 --publish=7687:7687 neo4j
- Learn Cypher: The MATCH statement is 80% of what you need
- Model a small domain: Your music library, a project dependency tree, or a customer journey
- Connect it to an LLM: Use the Neo4j Python driver to query your graph and feed results to any LLM
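The last step might look roughly like the sketch below, assuming a 5.x release of the official `neo4j` Python driver, which provides `Driver.execute_query`. The URI, the credentials, and the `fetch_facts` and `build_prompt` helpers are placeholders invented here; the query expects the City and Country nodes from the Cypher example earlier in this article.

```python
CAPITAL_QUERY = (
    "MATCH (c:City)-[:CAPITAL_OF]->(n:Country {name: $country}) "
    "RETURN c.name AS city, n.name AS country"
)

def fetch_facts(driver, country):
    """Run the query and turn each record into a plain-text fact."""
    records, _summary, _keys = driver.execute_query(
        CAPITAL_QUERY, parameters_={"country": country}
    )
    return [f"{r['city']} is the capital of {r['country']}." for r in records]

def build_prompt(question, facts):
    """Combine graph facts and the user question into one LLM prompt."""
    return "Facts:\n" + "\n".join(facts) + f"\n\nQuestion: {question}"

def main():
    # Needs `pip install neo4j` and a running server; values are placeholders.
    from neo4j import GraphDatabase

    uri = "bolt://localhost:7687"
    auth = ("neo4j", "your-password")
    with GraphDatabase.driver(uri, auth=auth) as driver:
        facts = fetch_facts(driver, "France")
        print(build_prompt("What is the capital of France?", facts))

# main()  # uncomment once Neo4j is running and the sample data is loaded
```

The string returned by `build_prompt` can be sent to whichever LLM client you use; the model then answers from the facts the graph supplied.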
A knowledge graph is not a silver bullet. For simple CRUD apps, PostgreSQL is the right tool. But when your data's value comes from how things connect, a knowledge graph becomes indispensable.