Transforming Data into Insights: Building a GraphRAG Pipeline with LangChain, Neo4j, and Qdrant
- Authors: Abdelfettah Latrache (@AbdelVA)
Connecting external knowledge to large language models (LLMs) and retrieving it efficiently is a significant challenge for developers and data scientists. Integrating structured and unstructured data into AI workflows often requires navigating multiple tools, complex pipelines, and time-consuming processes.
In this guide, we demonstrate how to combine LangChain—a framework for building LLM-powered applications—with Neo4j for structured knowledge graphs and Qdrant for vector-based retrieval. By the end, you’ll see how to transform raw text into a knowledge graph and query it for actionable insights.
RAG – A Quick Recap
Retrieval-Augmented Generation (RAG) enhances LLMs by integrating external knowledge sources during inference. It converts data into vector representations and uses vector databases for retrieval. Key benefits include:
- Domain-Specific Integration: Connect specialized data to LLMs.
- Cost Reduction: Focus computation on relevant data.
- Enhanced Accuracy: Augment the LLM’s inherent knowledge.
However, classic RAG approaches can struggle when data spans multiple documents or when relationships need to be inferred. A graph-based approach (GraphRAG) overcomes these limitations by offering a global view of interconnected data.
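To make the vector-retrieval step concrete before we move to graphs, here is a toy similarity search in plain Python. The hand-made three-dimensional vectors are stand-ins for real embeddings (which would come from an embedding model); the ranking logic is the same.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec, corpus, top_k=1):
    # Rank (vector, text) pairs by similarity to the query vector.
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in ranked[:top_k]]

corpus = [
    ([1.0, 0.0, 0.2], "Jessica Miller is a sales manager."),
    ([0.1, 1.0, 0.0], "David Thompson is a graphic designer."),
]
print(retrieve([0.9, 0.1, 0.1], corpus))  # the closest document comes back first
```

A vector database like Qdrant does exactly this ranking, but over millions of vectors with approximate-nearest-neighbor indexes instead of a linear scan.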
Step-by-Step Guide to GraphRAG with LangChain, Neo4j, and Qdrant
1. Setting Up the Environment
Install the required dependencies:
pip install langchain neo4j openai qdrant-client
Set your OpenAI API key:
export OPENAI_API_KEY="your-openai-key-here"
Ensure Neo4j and Qdrant instances are running.
2. Preparing the Dataset
Define a small in-memory dataset of candidate profiles. The later ingestion and Qdrant steps index `documents` as Python dicts, so we define it in Python:
documents = [
    {
        "id": "doc1",
        "text": "Jessica Miller, Experienced Sales Manager with a strong track record in driving sales growth and building high-performing teams."
    },
    {
        "id": "doc2",
        "text": "David Thompson, Creative Graphic Designer with over 8 years of experience in visual design and branding."
    }
]
3. Connecting to Neo4j
from neo4j import GraphDatabase
NEO4J_URI = "bolt://localhost:7687"
NEO4J_USER = "neo4j"
NEO4J_PASSWORD = "password"
driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))
4. Extracting Entities with LangChain
import json
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
entity_extraction_template = PromptTemplate(
input_variables=["text"],
template=(
"Extract the person name and their profession from the following text:\n\n"
"{text}\n\n"
"Return the answer as JSON with keys 'name' and 'profession'. If no person is found, return an empty JSON object."
)
)
entity_extraction_chain = LLMChain(llm=llm, prompt=entity_extraction_template)
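The prompt asks for JSON, but chat models sometimes wrap the object in extra text or code fences, so it pays to parse defensively. `parse_entity` below is a hypothetical helper (not part of LangChain) sketching one way to do that:

```python
import json
import re

def parse_entity(raw: str) -> dict:
    """Extract the first JSON object from an LLM response, or {} on failure."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        return {}
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return {}
    # Keep only the keys the prompt asked for.
    return {k: data[k] for k in ("name", "profession") if k in data}

# Example with a typical response string:
print(parse_entity('{"name": "Jessica Miller", "profession": "Sales Manager"}'))
```

With this in place, `parse_entity(entity_extraction_chain.run(text=doc["text"]))` yields a clean dict for each document, and an empty dict when no person is found.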
5. Ingesting Data into Neo4j
def add_person(tx, name, profession):
query = """
MERGE (p:Person {name: $name})
SET p.profession = $profession
"""
tx.run(query, name=name, profession=profession)
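Tying extraction and ingestion together might look like the loop below. To keep the sketch runnable without a live database or API key, the extractor and the graph writer are injected as callables; in the real pipeline you would pass a JSON-parsing wrapper around `entity_extraction_chain.run` and a wrapper around `session.execute_write(add_person, ...)` instead of the stubs shown here:

```python
def ingest_documents(docs, extract, write_person):
    # extract: text -> {"name": ..., "profession": ...} (or {} if nothing found)
    # write_person: (name, profession) -> None, e.g. a Neo4j write transaction
    ingested = []
    for doc in docs:
        entity = extract(doc["text"])
        if entity.get("name"):
            write_person(entity["name"], entity.get("profession", ""))
            ingested.append(entity["name"])
    return ingested

# Stub extractor and an in-memory "graph" standing in for Neo4j:
fake_graph = {}
stub_extract = lambda text: {"name": text.split(",")[0], "profession": "Unknown"}
names = ingest_documents(
    [{"id": "doc1", "text": "Jessica Miller, Sales Manager"}],
    stub_extract,
    lambda name, prof: fake_graph.__setitem__(name, prof),
)
print(names)  # -> ['Jessica Miller']
```

Injecting the dependencies also makes the pipeline easy to unit-test before pointing it at a real Neo4j instance.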
6. Performing Searches on the Knowledge Graph
def get_persons(tx):
query = "MATCH (p:Person) RETURN p.name AS name"
result = tx.run(query)
return [record["name"] for record in result]
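In a GraphRAG setup, rows returned from the graph are typically folded into a textual context that is handed to the LLM alongside the user's question. A minimal, hypothetical formatter (assuming a query variant that also returns each person's profession):

```python
def build_graph_context(records):
    """Turn (name, profession) rows from the graph into prompt-ready lines."""
    if not records:
        return "No people found in the knowledge graph."
    lines = [f"- {name}: {profession}" for name, profession in records]
    return "People in the knowledge graph:\n" + "\n".join(lines)

rows = [("Jessica Miller", "Sales Manager"), ("David Thompson", "Graphic Designer")]
print(build_graph_context(rows))
```

Prepending this string to the prompt is what lets the LLM answer from graph-level context spanning all documents, rather than from isolated retrieved chunks.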
7. Comparing with a Traditional RAG Approach Using Qdrant
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Qdrant
from langchain.docstore.document import Document as LC_Document
from langchain.chains import RetrievalQA
lc_documents = [
LC_Document(page_content=doc["text"], metadata={"id": doc["id"]})
for doc in documents
]
embeddings = OpenAIEmbeddings()
vectorstore = Qdrant.from_documents(
lc_documents,
embeddings,
collection_name="document_collection",
url="http://localhost:6333"
)
retriever = vectorstore.as_retriever()
qa_chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)
rag_answer = qa_chain.run("Who are the people mentioned in the documents?")
print("\nAnswer based on RAG:", rag_answer)
Conclusion
By leveraging LangChain, Neo4j, and Qdrant, you can build a powerful GraphRAG pipeline that transforms raw textual data into a structured knowledge graph while supporting robust retrieval-augmented generation. This approach enhances AI pipelines by providing both global data context and precise, vector-based retrieval.
Try running this demo in your environment and see how GraphRAG can improve your AI-driven workflows!