GraphRAG vs. Vanilla RAG: Structure, Signal, and the End of Hallucinations
TL;DR: Vanilla RAG (vector search → stuff context → generate) is great for “find and summarize.” It breaks on multi-hop questions, entity-heavy domains, and anything that needs joins, constraints, or causality. GraphRAG adds an explicit knowledge graph—entities, relationships, events—so the model retrieves structured facts first, then uses text passages as evidence. The result: higher factuality, controllable reasoning, and answers you can audit.
1) Quick refresher: what Vanilla RAG solves—and where it fails
Vanilla RAG pipeline
- Chunk documents → embed chunks
- Query → embed → k-NN retrieve top-k chunks
- LLM reads chunks and drafts answer (+ citations)
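As a point of reference, here is a minimal sketch of that pipeline, with `embed`, `vector_index`, and `llm` left as placeholders for whatever embedding model, vector store, and chat model you use:

```python
# Minimal vanilla-RAG loop. embed(), vector_index.search(), and llm() are
# placeholders; any embedding model, vector store, and chat model will do.
def vanilla_rag(question: str, vector_index, embed, llm, k: int = 5) -> str:
    q_vec = embed(question)                        # 1) embed the query
    chunks = vector_index.search(q_vec, top_k=k)   # 2) k-NN retrieve top-k chunks
    context = "\n\n".join(c["text"] for c in chunks)
    prompt = (
        "Answer using only the context below and cite chunk ids.\n\n"
        f"CONTEXT:\n{context}\n\nQUESTION: {question}"
    )
    return llm(prompt)                             # 3) draft answer (+ citations)
```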
Common failure modes
- Lost structure: “Is Alice Bob’s manager’s manager?” requires relational hops, not keyword similarity.
- Context collision: top-k includes near-duplicates or contradictory snippets; the LLM “averages” them.
- Semantic drift: embeddings pull in thematically similar but factually irrelevant text (high recall, low precision).
- No constraints: “only EU orders last 30 days” is temporal + geo filtering—hard to enforce with plain vectors.
2) What is GraphRAG?
GraphRAG augments RAG with a knowledge graph (KG): nodes (entities/events) + edges (relationships). Retrieval becomes a two-stage process, followed by composition:
- Structured retrieval: Use the KG to answer who/what/when/where/how-related questions via graph queries (e.g., Cypher/Gremlin/SQL-graph).
- Evidence expansion: For the nodes/edges returned, fetch the most relevant passages (vectors/BM25) for citations and wording.
- Compose: LLM assembles a grounded answer using the graph facts as the spine and passages as evidence.
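The same loop in GraphRAG form, again as a sketch with the graph client, text-to-Cypher step, passage retriever, and model passed in as placeholders; the point is the order of operations, not the specific APIs:

```python
# GraphRAG in miniature: the graph supplies facts, passages only supply evidence.
def graph_rag(question: str, graph, to_cypher, passage_search, llm) -> str:
    cypher = to_cypher(question)                   # 1) structured retrieval
    facts = graph.query(cypher)                    #    rows of entities/edges/aggregates
    evidence = []
    for row in facts:                              # 2) evidence expansion per fact
        evidence += passage_search(question, filter_ids=list(row.values()), top_k=2)
    prompt = (
        "Use FACTS for every claim; use EVIDENCE only for wording and citations.\n\n"
        f"FACTS: {facts}\n\nEVIDENCE: {evidence}\n\nQUESTION: {question}"
    )
    return llm(prompt)                             # 3) compose the grounded answer
```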
Why it works
- Graphs preserve relationships, constraints, and time.
- You get multi-hop reasoning for free (“customer → orders → returns → root cause”).
- The LLM becomes a narrator, not a database.
3) When GraphRAG decisively beats Vanilla RAG
- Entity-dense domains: customers, SKUs, suppliers, contracts, policies.
- Multi-hop questions: “Which campaigns drove first-order profit for customers who later churned?”
- Constraint queries: time windows, geo/regulatory filters, role-based scopes.
- Why/how questions requiring event chains: incident → mitigation → outcome.
- De-duplication & canonical truth: unify aliases (e.g., RP, Report Pundit, ReportPundit).
4) Architecture blueprint
Ingestion
- NER & linking: extract entities (people, SKUs, tickets, policies), link to canonical IDs.
- Relation/event extraction: “Order123 → returned_due_to → ‘wrong size’ (2025-07-10)”.
- Dedup & merge: consolidate variants, attach provenance (doc, line, timestamp).
Storage
- Graph DB (Neo4j/Neptune/Arango/etc.) for facts
- Vector index for passages (+ BM25 for sparse hits)
Query orchestration
- Intent classifier → graph mode vs. text mode
- If graph mode: run graph query → get nodes/edges
- Expand neighborhood to collect supportive docs
- Rank with hybrid (dense + sparse + recency)
- Generate with grounding schema (facts table + citations)
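The hybrid ranking step might look like the sketch below; the weights are illustrative, and candidates are assumed to carry dense and sparse scores plus a Unix timestamp:

```python
# Hybrid ranking: blend dense similarity, sparse (BM25) score, and recency.
import time

def hybrid_rank(candidates, w_dense=0.6, w_sparse=0.3, w_recency=0.1, top_k=8):
    now = time.time()
    def score(p):
        age_days = (now - p["timestamp"]) / 86400.0
        recency = 1.0 / (1.0 + age_days / 30.0)    # halves at roughly one month
        return (w_dense * p["dense_score"]
                + w_sparse * p["sparse_score"]
                + w_recency * recency)
    return sorted(candidates, key=score, reverse=True)[:top_k]
```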
Observability
- Store traces: graph query, node set, evidence, final answer
- Metrics: multi-hop precision, grounding rate, contradicting-evidence flags
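A per-answer trace can be a single JSON-serializable record; the field names here are illustrative, not a fixed schema:

```python
# One trace per answer, written to whatever log store you already run.
import json, time

trace = {
    "trace_id": "q-000042",                        # illustrative ID scheme
    "question": "Top refund-rate SKUs in the EU, last 30 days?",
    "graph_query": "MATCH (o:Order)-[:IN_REGION]->(...) ...",  # the Cypher actually run
    "node_ids": ["sku:981", "sku:412"],            # graph result set
    "evidence_ids": ["policy:eu-returns#p3", "ticket:5531"],
    "answer": "...",
    "metrics": {"grounding_rate": 0.93, "contradicting_evidence": False},
    "ts": time.time(),
}
print(json.dumps(trace, indent=2))
```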
5) Data model starter kit (ecommerce-flavored)
Entities (nodes): Customer, Order, LineItem, SKU, Ticket, Campaign, Policy, Region, Supplier, Reason
Relationships (edges):
- Customer -PLACED-> Order (t)
- Order -HAS_ITEM-> LineItem
- LineItem -OF-> SKU
- Order -RETURNED_FOR-> Reason (t)
- Ticket -RELATES_TO-> Order / SKU
- Campaign -ATTRIBUTED-> Order (model, confidence)
- SKU -SUPPLIED_BY-> Supplier
- Order -IN_REGION-> Region
Properties: timestamps, numeric values (price, margin, return_rate), booleans (is_promo), lists (tags), and provenance (doc_id, URL, paragraph_id).
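The whole starter ontology fits in one declarative structure that ingestion and query code can share; the literal below simply restates the lists above (edge entries are source label, target label, edge properties):

```python
# Starter ontology as plain data: node labels, edge types, and provenance fields.
ONTOLOGY = {
    "nodes": ["Customer", "Order", "LineItem", "SKU", "Ticket",
              "Campaign", "Policy", "Region", "Supplier", "Reason"],
    "edges": {
        "PLACED":       ("Customer", "Order",      ["t"]),
        "HAS_ITEM":     ("Order",    "LineItem",   []),
        "OF":           ("LineItem", "SKU",        []),
        "RETURNED_FOR": ("Order",    "Reason",     ["t"]),
        "RELATES_TO":   ("Ticket",   "Order|SKU",  []),
        "ATTRIBUTED":   ("Campaign", "Order",      ["model", "confidence"]),
        "SUPPLIED_BY":  ("SKU",      "Supplier",   []),
        "IN_REGION":    ("Order",    "Region",     []),
    },
    "provenance": ["doc_id", "url", "paragraph_id"],
}
```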
6) Example: decomposing a complex question
Question
“Which SKUs had the highest refund rate in the EU in the last 30 days, and what root causes appear most often in support tickets? Provide evidence.”
Plan
- Graph query → eligible orders in EU, 30-day window → returns by SKU
- Aggregate refund rate by SKU, pick top 5
- Traverse to related tickets → cluster root causes
- Pull supporting passages (policy, common ticket notes)
- Generate ranked list with citations
Cypher sketch
```cypher
// Top-5 EU refund-rate SKUs over the last 30 days, plus related ticket summaries
MATCH (o:Order)-[:IN_REGION]->(:Region {code: "EU"})
WHERE o.date >= date() - duration('P30D')
MATCH (o)-[:HAS_ITEM]->(:LineItem)-[:OF]->(s:SKU)
OPTIONAL MATCH (o)-[:RETURNED_FOR]->(r:Reason)
// DISTINCT guards against double-counting orders with multiple line items of the same SKU
WITH s,
     count(DISTINCT o) AS orders,
     count(DISTINCT CASE WHEN r IS NOT NULL THEN o END) AS returned_orders
WITH s, toFloat(returned_orders) / orders AS refund_rate
ORDER BY refund_rate DESC
LIMIT 5
// OPTIONAL keeps SKUs that have no tickets yet
OPTIONAL MATCH (t:Ticket)-[:RELATES_TO]->(s)
WITH s, refund_rate, collect(t.summary)[0..200] AS ticket_summaries
RETURN s.sku_id AS sku, refund_rate, ticket_summaries
ORDER BY refund_rate DESC
```
Evidence expansion
- For each sku, fetch the top passages mentioning return reasons/policies via vector + BM25, constrained to sku_id and the last 90 days.
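A sketch of that constrained fetch, assuming the passage index supports metadata filters (most vector stores and BM25 engines do, though filter syntax varies); `dense_search` and `bm25_search` are placeholders:

```python
# Evidence passages for one SKU, filtered by sku_id and a 90-day window.
from datetime import datetime, timedelta, timezone

def evidence_for_sku(sku_id, dense_search, bm25_search, top_k=6):
    since = (datetime.now(timezone.utc) - timedelta(days=90)).isoformat()
    filters = {"sku_id": sku_id, "timestamp_gte": since}
    query = f"return reasons and policy notes for {sku_id}"
    hits = (dense_search(query, filters=filters, top_k=top_k)
            + bm25_search(query, filters=filters, top_k=top_k))
    best = {}                                      # de-duplicate by passage id
    for h in hits:
        if h["id"] not in best or h["score"] > best[h["id"]]["score"]:
            best[h["id"]] = h
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)[:top_k]
```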
Answer composition guardrails
- Output table (SKU, Refund Rate, Top Causes, Citations)
- Disallow claims without ≥2 supporting passages
- Include confidence based on coverage of orders vs. tickets
7) Prompts that keep answers grounded
System:
“You are a factual analyst. Use ONLY the FACTS table to make claims. Use EVIDENCE passages for wording and quotes. If a claim is not in FACTS, say ‘insufficient data.’ Return a table + bullet summary. Always include citations.”
Inputs to the model
- FACTS: compact JSON from the graph query (aggregates, IDs)
- EVIDENCE: 5–12 short passages (source, snippet, timestamp)
- POLICY: redaction rules (PII), allowed scopes (region, date)
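Put together, the three blocks might be serialized like this before being sent alongside the system prompt (values are toy examples):

```python
# Illustrative shapes for the three grounding inputs; values are toy examples.
import json

FACTS = [{"sku": "SKU-981", "refund_rate": 0.18, "orders": 412, "returned_orders": 74}]
EVIDENCE = [{"source": "ticket:5531", "snippet": "size runs small; customer returned",
             "timestamp": "2025-07-02"}]
POLICY = {"redact": ["email", "phone"], "scopes": {"region": "EU", "window_days": 30}}

user_message = "\n\n".join([
    "FACTS:\n" + json.dumps(FACTS, indent=2),
    "EVIDENCE:\n" + json.dumps(EVIDENCE, indent=2),
    "POLICY:\n" + json.dumps(POLICY, indent=2),
    "QUESTION: Which SKUs had the highest refund rate in the EU in the last 30 days?",
])
```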
8) Evaluation: what to measure (beyond BLEU)
- Grounding rate: % sentences backed by a fact or evidence passage
- Contradiction rate: % outputs contradicted by retrieved evidence
- Multi-hop precision@k: correctness on 2+ hop queries
- Coverage: % of relevant entities touched in the graph result
- Temporal accuracy: correct application of date filters
- Edit distance / CSR time saved: how much human editing is needed before publishing
- Answer reproducibility: same FACTS → same conclusion
Create a golden set of 100–300 questions with canonical answers + graph queries + acceptable evidence ranges.
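A grounding-rate check over that golden set can start as simply as the sketch below; `sentence_is_grounded` stands in for whatever check you use (citation matching, string overlap, or an NLI model):

```python
# Grounding rate: share of answer sentences backed by a fact or evidence passage.
import re

def grounding_rate(answer, facts, evidence, sentence_is_grounded):
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    if not sentences:
        return 0.0
    grounded = sum(1 for s in sentences if sentence_is_grounded(s, facts, evidence))
    return grounded / len(sentences)
# Average over the golden set per release; alert when the rate drops.
```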
9) Cost & latency playbook
- Cache neighborhoods: precompute high-demand subgraphs (top SKUs, VIP customers).
- Two-tier retrieval: fast graph filter → lazy evidence fetch.
- Vector budget: cap to 6–10 passages; prefer shorter, denser chunks.
- Pre-summaries: store nightly rollups (e.g., refund leaders) for sub-100ms answers.
- SLM default: use a small model to compose grounded answers; escalate to LLM only for ambiguous or sparse evidence cases.
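Neighborhood caching (the first bullet above) as a rough sketch; the TTL, hop count, and `graph.neighborhood` call are assumptions about your graph client:

```python
# Cache precomputed 2-hop subgraphs for high-demand entities (top SKUs, VIP customers).
import time

_CACHE: dict = {}

def cached_neighborhood(entity_id, graph, ttl_s=6 * 3600):
    hit = _CACHE.get(entity_id)
    if hit and time.time() - hit["at"] < ttl_s:
        return hit["subgraph"]                         # fast path: no graph round-trip
    subgraph = graph.neighborhood(entity_id, hops=2)   # placeholder client call
    _CACHE[entity_id] = {"subgraph": subgraph, "at": time.time()}
    return subgraph
```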
10) 30-60-90 migration plan
Days 0–30 — MVP GraphRAG
- Pick 1–2 high-value question families (refunds, compliance, SLA breaches).
- Design a minimal ontology (entities/edges/properties).
- Ingest 3–5 core sources; build entity linker; attach provenance.
- Wire graph → evidence → grounded generation for one report.
Days 31–60 — Hardening
- Add temporal & geo constraints; implement RBAC scopes.
- Build dashboards: grounding rate, contradiction rate, latency, cost.
- Introduce critic model to veto ungrounded claims.
Days 61–90 — Scale & catalog
- Expand ontology (campaigns, suppliers, incidents).
- Add job to refresh graph nightly; backfill historical events.
- Create an agent entrypoint (“Why did returns spike for SKU-X?”) that uses GraphRAG under the hood.
- Templatize queries and prompts; publish an internal GraphRAG cookbook.
11) Pitfalls & anti-patterns
- Ontology bloat: start lean; add edges only when a real question needs them.
- No provenance: every node/edge must carry source + timestamp—non-negotiable.
- Over-chunking docs: too many near-duplicates overwhelm ranking.
- Letting the LLM “infer” facts: the graph is the source of truth; the LLM narrates.
- Ignoring freshness: stale graphs destroy trust; schedule updates and mark recency in answers.
12) Quick decision scorecard
| Situation | Vanilla RAG | GraphRAG |
|---|---|---|
| Single-hop Q&A / FAQ | ✅ | – |
| Policy lookup with simple filters | ✅ | – |
| Multi-hop joins (who→what→why) | – | ✅ |
| Strict time/region constraints | ⚠️ | ✅ |
| Entity resolution & dedupe | – | ✅ |
| Auditability & provenance | ⚠️ | ✅ |
(✅ = strong fit, ⚠️ = workable with care, – = not ideal)
The bottom line
If your questions look like joins with rules, you’ve outgrown Vanilla RAG. Adopt GraphRAG: let the graph deliver structured truth, let vectors bring the right words, and let the model compose with receipts. You’ll cut hallucinations, speed up expert workflows, and—most importantly—ship answers your auditors (and customers) can trust.
Starter checklist
- Define a minimal ontology tied to 2–3 real questions
- Build NER + entity linking with canonical IDs & provenance
- Stand up a graph store + vector index; wire a two-stage retriever
- Add grounding-aware prompts + a critic to block unsupported claims
- Track grounding, contradiction, and temporal accuracy on a golden set