Abdel Latrache
Software Engineer, Machine Learning Engineer

Transforming Data into Insights: Building a GraphRAG Pipeline with LangChain, Neo4j, and Qdrant

Connecting external knowledge to large language models (LLMs) and retrieving it efficiently is a significant challenge for developers and data scientists. Integrating structured and unstructured data into AI workflows often requires navigating multiple tools, complex pipelines, and time-consuming processes.

In this guide, we demonstrate how to combine LangChain—a framework for building LLM-powered applications—with Neo4j for structured knowledge graphs and Qdrant for vector-based retrieval. By the end, you’ll see how to transform raw text into a knowledge graph and query it for actionable insights.

RAG – A Quick Recap

Retrieval-Augmented Generation (RAG) enhances LLMs by integrating external knowledge sources during inference. It converts data into vector representations and uses vector databases for retrieval. Key benefits include:

  • Domain-Specific Integration: Connect specialized data to LLMs.
  • Cost Reduction: Focus computation on relevant data.
  • Enhanced Accuracy: Augment the LLM’s inherent knowledge.

However, classic RAG approaches can struggle when data spans multiple documents or when relationships need to be inferred. A graph-based approach (GraphRAG) overcomes these limitations by offering a global view of interconnected data.
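To make that contrast concrete, here is a toy, library-free sketch (all entities and edges are invented for illustration): a multi-hop question needs facts chained across records, which isolated document lookups cannot do but a simple graph traversal can.

```python
from collections import deque

# Toy knowledge graph: subject -> list of related objects.
# All names and relations here are made up for illustration.
graph = {
    "Jessica Miller": ["Acme Corp"],
    "Acme Corp": ["Berlin"],
    "David Thompson": ["Design Studio"],
}

def reachable(start, goal):
    """Breadth-first search: can two entities be connected via graph edges?"""
    queue, seen = deque([start]), {start}
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False

# "Which city is Jessica's employer based in?" requires chaining
# two facts that live in different records.
print(reachable("Jessica Miller", "Berlin"))   # True
print(reachable("David Thompson", "Berlin"))   # False
```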

Step-by-Step Guide to GraphRAG with LangChain, Neo4j, and Qdrant

1. Setting Up the Environment

Install the required dependencies:

pip install langchain neo4j openai qdrant-client

Set your OpenAI API key:

export OPENAI_API_KEY="your-openai-key-here"

Ensure Neo4j and Qdrant instances are running.
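To verify both services are reachable before proceeding, a minimal check (assuming the default ports: 7687 for Neo4j's Bolt protocol, 6333 for Qdrant's HTTP API):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for name, port in [("Neo4j", 7687), ("Qdrant", 6333)]:
    status = "up" if port_open("localhost", port) else "DOWN"
    print(f"{name} on port {port}: {status}")
```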

2. Preparing the Dataset

Define a small in-memory dataset of short professional bios:

documents = [
    {
        "id": "doc1",
        "text": "Jessica Miller, Experienced Sales Manager with a strong track record in driving sales growth and building high-performing teams."
    },
    {
        "id": "doc2",
        "text": "David Thompson, Creative Graphic Designer with over 8 years of experience in visual design and branding."
    }
]

3. Connecting to Neo4j

from neo4j import GraphDatabase

NEO4J_URI = "bolt://localhost:7687"
NEO4J_USER = "neo4j"
NEO4J_PASSWORD = "password"

driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))

4. Extracting Entities with LangChain

import json
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI

# gpt-3.5-turbo is a chat model, so use ChatOpenAI rather than
# the completion-style OpenAI wrapper.
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

entity_extraction_template = PromptTemplate(
    input_variables=["text"],
    template=(
        "Extract the person name and their profession from the following text:\n\n"
        "{text}\n\n"
        "Return the answer as JSON with keys 'name' and 'profession'. If no person is found, return an empty JSON object."
    )
)

entity_extraction_chain = LLMChain(llm=llm, prompt=entity_extraction_template)
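LLMs do not always return clean JSON (they may wrap it in code fences or add commentary), so it is worth parsing the chain's output defensively. A small helper along these lines (hypothetical, not part of the chain above):

```python
import json
import re

def parse_entity(raw: str) -> dict:
    """Extract a JSON object from an LLM response, tolerating code
    fences and surrounding text. Returns {} if nothing parses."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        return {}
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return {}

# A well-behaved response:
print(parse_entity('{"name": "Jessica Miller", "profession": "Sales Manager"}'))
# A response wrapped in a code fence:
print(parse_entity('```json\n{"name": "David Thompson", "profession": "Graphic Designer"}\n```'))
```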

5. Ingesting Data into Neo4j

def add_person(tx, name, profession):
    query = """
    MERGE (p:Person {name: $name})
    SET p.profession = $profession
    """
    tx.run(query, name=name, profession=profession)
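The MERGE clause makes this ingestion idempotent: re-running the pipeline matches the existing Person node instead of creating a duplicate, while SET updates its profession. In pure-Python terms, it behaves like a keyed upsert (an illustrative stand-in, not real Neo4j code):

```python
# In-memory stand-in for the graph, keyed by the MERGE key (name).
people = {}

def merge_person(name, profession):
    """Mimics MERGE (p:Person {name: $name}) SET p.profession = $profession."""
    node = people.setdefault(name, {"name": name})  # match or create
    node["profession"] = profession                 # always update

merge_person("Jessica Miller", "Sales Manager")
merge_person("Jessica Miller", "Experienced Sales Manager")  # updates, no duplicate
print(len(people))                             # 1
print(people["Jessica Miller"]["profession"])  # Experienced Sales Manager
```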

6. Performing Searches on the Knowledge Graph

def get_persons(tx):
    query = "MATCH (p:Person) RETURN p.name AS name"
    result = tx.run(query)
    return [record["name"] for record in result]

7. Comparing with a Traditional RAG Approach Using Qdrant

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Qdrant
from langchain.docstore.document import Document as LC_Document
from langchain.chains import RetrievalQA

lc_documents = [
    LC_Document(page_content=doc["text"], metadata={"id": doc["id"]})
    for doc in documents
]

embeddings = OpenAIEmbeddings()
vectorstore = Qdrant.from_documents(
    lc_documents,
    embeddings,
    collection_name="document_collection",
    url="http://localhost:6333"
)
retriever = vectorstore.as_retriever()
qa_chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)

rag_answer = qa_chain.run("Who are the people mentioned in the documents?")
print("\nAnswer based on RAG:", rag_answer)
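For intuition on what happens inside Qdrant during retrieval: the documents and the query are embedded as vectors, and results are ranked by similarity (commonly cosine). A stripped-down, library-free sketch with made-up 3-dimensional vectors (real embeddings have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings", invented for illustration.
doc_vectors = {
    "doc1": [0.9, 0.1, 0.0],   # sales-flavored
    "doc2": [0.1, 0.8, 0.3],   # design-flavored
}
query = [0.85, 0.15, 0.05]     # a sales-related query

ranked = sorted(doc_vectors,
                key=lambda d: cosine_similarity(query, doc_vectors[d]),
                reverse=True)
print(ranked)   # ['doc1', 'doc2']
```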

Conclusion

By leveraging LangChain, Neo4j, and Qdrant, you can build a powerful GraphRAG pipeline that transforms raw textual data into a structured knowledge graph while supporting robust retrieval-augmented generation. This approach enhances AI pipelines by providing both global data context and precise, vector-based retrieval.

Try running this demo in your environment and see how GraphRAG can improve your AI-driven workflows!
