What Are Knowledge Graphs? A Practical Introduction

From Tables to Networks

Most software engineers are trained to think in tables. Relational databases, spreadsheets, ORMs — rows and columns are the default mental model for data. But the world doesn't organise itself into tables. Relationships between entities are often more important than the entities themselves.

A knowledge graph is a data structure that puts relationships first. Instead of a users table and a purchases table joined by a foreign key, a knowledge graph represents everything as interconnected nodes and edges.

Core Concepts

Nodes (Vertices)

Nodes represent entities — people, places, concepts, events. Each node has a unique identifier and can carry properties:

Node: Paris
  type: City
  population: 2.1M
  country: France

Edges (Relationships)

Edges represent connections between nodes. In a property graph, edges are directional and can carry their own properties:

(Paris) -[capital_of]-> (France)
  established: 508 AD
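
The node and edge above can be mirrored in ordinary code. The following is a minimal Python sketch, not a real graph library: the `Node` and `Edge` classes and their field names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    id: str                      # unique identifier
    label: str                   # e.g. "City"
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    source: str                  # id of the node the edge starts from
    type: str                    # e.g. "capital_of"
    target: str                  # id of the node the edge points to
    properties: dict[str, Any] = field(default_factory=dict)


paris = Node("paris", "City", {"population": 2_100_000, "country": "France"})
france = Node("france", "Country")
capital_of = Edge(paris.id, "capital_of", france.id, {"established": "508 AD"})

# Direction matters: the edge runs from Paris to France, not the reverse.
print(f"({capital_of.source}) -[{capital_of.type}]-> ({capital_of.target})")
```

Storing node ids on the edge, rather than the node objects themselves, keeps edges cheap to copy and serialise.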

Labels and Types

Nodes and edges are typed, and the types carry meaning. This shared vocabulary of node labels and edge types is what separates a knowledge graph from an arbitrary graph of untyped nodes and links:

Node labels: City, Company, Person, Technology
Edge types: capital_of, employs, developed_by, depends_on
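
One practical use of types is rejecting edges that make no sense. The sketch below assumes a hypothetical schema in which each edge type names the labels it may connect. The `SCHEMA` and `LABELS` tables and the `is_valid_edge` helper are illustrative, not part of any particular database.

```python
# Hypothetical schema: each edge type maps to (source label, target label).
SCHEMA = {
    "capital_of": ("City", "Country"),
    "employs": ("Company", "Person"),
}

# Node id -> label (illustrative data).
LABELS = {"paris": "City", "france": "Country", "acme": "Company"}


def is_valid_edge(source: str, edge_type: str, target: str) -> bool:
    """Return True if the edge type is known and both endpoints carry the expected labels."""
    if edge_type not in SCHEMA:
        return False
    expected_source, expected_target = SCHEMA[edge_type]
    return LABELS.get(source) == expected_source and LABELS.get(target) == expected_target


print(is_valid_edge("paris", "capital_of", "france"))  # True
print(is_valid_edge("acme", "capital_of", "france"))   # False: a Company cannot be a capital
```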

RDF vs Property Graphs

Two dominant models exist:

RDF (Resource Description Framework)

The semantic web standard. Everything is a triple: subject → predicate → object.

@prefix ex: <http://example.org/> .
ex:Paris ex:capitalOf ex:France .
ex:France ex:hasPopulation 67000000 .

Strengths: Web-native, standardised (W3C), linked data principles
Query language: SPARQL
Best for: Data integration, open data, cross-domain linking
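
To make the triple idea concrete, here is a small Python sketch that stores triples as plain tuples and matches them against a pattern, which is roughly what a SPARQL basic graph pattern does. It is a toy, assuming string identifiers in place of full IRIs. The `match` helper is hypothetical and is not an RDF library API.

```python
from typing import Optional

Triple = tuple[str, str, object]

triples: list[Triple] = [
    ("ex:Paris", "ex:capitalOf", "ex:France"),
    ("ex:France", "ex:hasPopulation", 67_000_000),
]


def match(
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[object] = None,
) -> list[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Roughly: SELECT ?city WHERE { ?city ex:capitalOf ex:France }
print([t[0] for t in match(p="ex:capitalOf", o="ex:France")])  # ['ex:Paris']
```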

Property Graph Model

The model used by Neo4j and also supported by Amazon Neptune and ArangoDB. The example below is written in Cypher:

CREATE (p:City {name: "Paris", population: 2100000})
CREATE (f:Country {name: "France", population: 67000000})
CREATE (p)-[:CAPITAL_OF]->(f)

Strengths: Intuitive, performant for graph traversals, flexible schema
Query language: Cypher (Neo4j), Gremlin (Apache TinkerPop)
Best for: Transactional applications, real-time recommendations, fraud detection

Why Knowledge Graphs Matter for AI

Structured Knowledge for LLMs

Large language models are stateless pattern matchers. They don't know things — they predict tokens. A knowledge graph provides:

Factual grounding: Graph queries return the facts stored in the graph, which are only as reliable as the data you curate, rather than probabilistic completions
Relationship traversal: Answer multi-hop questions ("Which employees of Company X worked on projects using Neo4j?")
Consistency: The same fact queried twice returns the same result
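
The multi-hop question above can be sketched as a traversal over typed edges. The data and the edge types `worked_on` and `uses` below are invented for illustration. A real graph database would run the same idea as a single query.

```python
# Hypothetical edges as (source, type, target) tuples.
EDGES = [
    ("CompanyX", "employs", "alice"),
    ("CompanyX", "employs", "bob"),
    ("alice", "worked_on", "search_revamp"),
    ("bob", "worked_on", "billing"),
    ("search_revamp", "uses", "Neo4j"),
    ("billing", "uses", "PostgreSQL"),
]


def neighbours(node: str, edge_type: str) -> list[str]:
    """Follow outgoing edges of one type from a node."""
    return [t for s, e, t in EDGES if s == node and e == edge_type]


def employees_using(company: str, technology: str) -> list[str]:
    """Three hops: company -> employee -> project -> technology."""
    return [
        person
        for person in neighbours(company, "employs")
        if any(
            technology in neighbours(project, "uses")
            for project in neighbours(person, "worked_on")
        )
    ]


print(employees_using("CompanyX", "Neo4j"))  # ['alice']
```

Because the function only reads stored edges, calling it twice on an unchanged graph gives the same answer, which is the consistency property listed above.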

GraphRAG

Combining retrieval-augmented generation (RAG) with knowledge graphs produces GraphRAG — a pattern where:

The user query is parsed into a graph query
The graph returns a structured sub-graph as the result
The result is formatted as context for the LLM prompt
The LLM generates a response grounded in the graph data
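
The four steps above can be sketched end to end in a few lines of Python. This is a toy under heavy assumptions: the graph is an in-memory list, the "parser" is a keyword match, and the `llm` argument is a stand-in callable in place of a real model client. All function names are hypothetical.

```python
from typing import Callable

# Hypothetical in-memory graph as (subject, relation, object) facts.
FACTS = [
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
]


def parse_query(question: str) -> tuple[str, str]:
    """Step 1 (toy parser): pick out a known relation and a known entity."""
    relation = "capital_of" if "capital" in question.lower() else ""
    entity = next((o for _, _, o in FACTS if o.lower() in question.lower()), "")
    return relation, entity


def query_graph(relation: str, entity: str) -> list[tuple[str, str, str]]:
    """Step 2: return the matching sub-graph."""
    return [f for f in FACTS if f[1] == relation and f[2] == entity]


def format_context(facts: list[tuple[str, str, str]]) -> str:
    """Step 3: serialise the facts as prompt context."""
    return "\n".join(f"{s} {r.replace('_', ' ')} {o}." for s, r, o in facts)


def answer(question: str, llm: Callable[[str], str]) -> str:
    """Step 4: hand the grounded prompt to the LLM."""
    context = format_context(query_graph(*parse_query(question)))
    prompt = f"Answer using only these facts:\n{context}\n\nQuestion: {question}"
    return llm(prompt)


# Stand-in for a real model call: it just echoes the prompt it received.
print(answer("What is the capital of France?", llm=lambda prompt: prompt))
```

In practice the first step is the hard one. Production systems typically use entity linking or have the LLM itself draft the graph query.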

This is the subject of its own article, but the key insight is: graphs give LLMs a reliable memory.

Getting Started

The easiest way to start with knowledge graphs:

Install Neo4j: docker run --publish=7474:7474 --publish=7687:7687 neo4j
Learn Cypher: The MATCH clause covers most of what you need at first
Model a small domain: Your music library, a project dependency tree, or a customer journey
Connect it to an LLM: Use the Neo4j Python driver to query your graph and feed results to any LLM
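
As a starting point for the last step, the sketch below uses the official `neo4j` Python driver to run a Cypher query and turn the rows into prompt-ready text. It assumes a graph containing the `City` and `Country` nodes created earlier, a local instance on the default Bolt port, and credentials you have set yourself. The function names are hypothetical.

```python
def fetch_capitals(uri: str, user: str, password: str) -> list[dict]:
    """Run a Cypher query and return each row as a plain dict."""
    # Imported here so the rest of the module loads without the driver installed.
    from neo4j import GraphDatabase  # pip install neo4j

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            result = session.run(
                "MATCH (c:City)-[:CAPITAL_OF]->(k:Country) "
                "RETURN c.name AS city, k.name AS country"
            )
            return [record.data() for record in result]
    finally:
        driver.close()


def rows_to_context(rows: list[dict]) -> str:
    """Turn query rows into plain sentences for an LLM prompt."""
    return "\n".join(
        f"{row['city']} is the capital of {row['country']}." for row in rows
    )


# Example call, assuming a local instance and your own password:
# rows = fetch_capitals("bolt://localhost:7687", "neo4j", "<password>")
# print(rows_to_context(rows))
```

The text returned by `rows_to_context` can be placed in the prompt of whichever LLM client you use.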

A knowledge graph is not a silver bullet. For simple CRUD apps, PostgreSQL is the right tool. But when your data's value comes from how things connect, a knowledge graph becomes indispensable.