Build an AI Chatbot with RAG (Step-by-Step Guide)

DodaTech Updated 2026-06-21 5 min read

In this tutorial, you'll learn how to build an AI chatbot with RAG. We cover the key concepts, practical examples, and best practices to help you understand and apply the technique effectively.

Build an AI chatbot powered by Retrieval-Augmented Generation (RAG) using LangChain, OpenAI embeddings, and a Chroma vector database to answer questions from your own documents with source citations.

What You'll Build

You'll build a Python chatbot that reads PDF or text documents, chunks them into searchable pieces, stores them as vector embeddings in ChromaDB, and answers questions using OpenAI's GPT model. When a user asks a question, the system retrieves the most relevant document chunks and feeds them as context to the LLM. The result: answers grounded in your data, not generic training knowledge.

Why RAG Matters

A standard LLM knows only what it learned during training. If you ask about a private document, a recent product manual, or internal company policy, it guesses or says it doesn't know. RAG solves this by giving the model relevant context at query time. It's like giving a student open-book exam access instead of forcing them to memorize everything. At DodaTech, similar retrieval patterns power smart search across compressed archives in DodaZIP, letting users find files by content rather than filename alone.

Prerequisites

Python 3.9+ installed
An OpenAI API key (set as OPENAI_API_KEY environment variable)
Basic familiarity with large language models
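
Both scripts in this guide depend on the API key, so it can help to fail early with a readable message when the variable is missing. The following is a minimal sketch; `require_api_key` is a hypothetical helper, not part of LangChain or the OpenAI client.

```python
import os

def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, or fail early with a readable message."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Export it in your shell before running the scripts."
        )
    return key
```

Calling `require_api_key()` at the top of ingest.py and query.py would surface a missing key before any documents are loaded.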

Step 1: Project Setup

mkdir rag-chatbot
cd rag-chatbot
python -m venv venv
source venv/bin/activate
pip install langchain langchain-community langchain-openai chromadb pypdf

Project structure:

rag-chatbot/
├── ingest.py      # Load and chunk documents, store embeddings
├── query.py       # Ask questions with RAG retrieval
└── documents/     # Place your PDFs or text files here

Step 2: Ingest Documents

The ingestion pipeline loads files, splits them into overlapping chunks, generates embedding vectors, and stores them in ChromaDB.

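# ingest.py
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader
# Newer LangChain releases also expose this class in the langchain_text_splitters package
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# Load every PDF under documents/. PyPDFLoader returns one Document per page.
loader = DirectoryLoader("documents", glob="**/*.pdf", loader_cls=PyPDFLoader)
documents = loader.load()

# Split pages into overlapping chunks, preferring paragraph and sentence boundaries
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    separators=["\n\n", "\n", ".", " "]
)
chunks = text_splitter.split_documents(documents)

# Embed each chunk and store the vectors on disk
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings, persist_directory="./chroma_db")
# Recent Chroma versions write to persist_directory automatically; this call matters only for older ones
vectorstore.persist()

# Count source files rather than pages, since each page is a separate Document
sources = {doc.metadata["source"] for doc in documents}
print(f"Ingested {len(chunks)} chunks from {len(sources)} documents")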

Expected output:

Ingested 87 chunks from 3 documents

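Why these settings? chunk_size=500 caps each chunk at 500 characters (the splitter measures length in characters by default), which is roughly a short paragraph. Chunks this small keep the retrieved context well within the model's token limit while still carrying enough text to answer a question. chunk_overlap=50 repeats up to 50 characters from the end of one chunk at the start of the next, which reduces the chance that a sentence cut at a boundary loses its meaning.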
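
To make the two settings concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is a simplification: RecursiveCharacterTextSplitter also tries to break on the listed separators, which this sketch ignores. `sliding_chunks` is a hypothetical helper, not a LangChain function.

```python
def sliding_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

# Small numbers make the overlap easy to see
sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk begins with the last three characters of the previous one, which is the same effect chunk_overlap=50 has at the 500-character scale.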

Step 3: Query the RAG System

Now build the retrieval-and-generation pipeline that answers questions using your ingested documents.

# query.py
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA

embeddings = OpenAIEmbeddings()
vectorstore = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True
)

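while True:
    question = input("\nAsk a question (or 'quit'): ")
    if question.strip().lower() == "quit":
        break
    result = qa_chain.invoke({"query": question})
    print(f"\nAnswer: {result['result']}")
    # List each source file once; several retrieved chunks can come from the same file
    sources = sorted({doc.metadata["source"] for doc in result["source_documents"]})
    print(f"\nSources: {', '.join(sources) if sources else 'none'}")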

Expected output:

Ask a question (or 'quit'): What is the refund policy?
Answer: According to section 3.1 of the policy document, refunds are processed within 5-7 business days for all products purchased directly through the company website.
Sources: documents/company-policy.pdf

Architecture

flowchart LR
    A[PDF Documents] --> B[Document Loader]
    B --> C[Text Splitter]
    C --> D[Embedding Model]
    D --> E[Chroma Vector Store]
    F[User Question] --> G[Embedding]
    G --> H[Similarity Search]
    H --> I[Relevant Chunks]
    I --> J[LLM + Context]
    J --> K[Grounded Answer]
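
The flow in the diagram can be sketched in miniature without any external services. In this toy version, counting shared words stands in for the embedding model and similarity search, a plain list stands in for the vector store, and the LLM call is left out: the script stops at the prompt that would be sent. All function names here are hypothetical.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks that share the most words with the question."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine the retrieved chunks and the question into one prompt."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

chunks = [
    "Refunds are processed within 5-7 business days.",
    "Our office is closed on public holidays.",
    "Shipping is free for orders above 50 dollars.",
]
question = "How long do refunds take?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Word overlap misses synonyms and paraphrases, which is the gap that learned embeddings fill in the real pipeline.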

Common Errors

1. No module named 'langchain_community'
You installed only langchain but not the community package. Run pip install langchain-community to get the document loaders and vector store integrations.

2. OpenAI rate limit errors
The free tier allows only a low number of requests per minute. Add a delay between queries or upgrade your API tier. For testing, use gpt-4o-mini instead of gpt-4o; it is faster and cheaper.

3. Empty results from retrieval
Your documents may not contain text relevant to the question, or the PDFs may be scanned images without selectable text. For scanned PDFs, run OCR (for example with pytesseract) before ingestion.

4. ChromaDB persistence warning
If you see a "No such collection" error, the persist_directory path probably differs between ingest.py and query.py. Use the same directory in both scripts. Alternatively, keep using the same Chroma object within a single process.

5. Token limit exceeded with large chunks
The "stuff" chain type places all retrieved chunks into a single prompt, so large chunks may exceed the model's context window. Reduce chunk_size to 300 or switch to the map_reduce chain type, which processes chunks individually.
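
For the rate limit errors in item 2, adding a delay usually means retrying with exponential backoff. The sketch below shows the pattern under stated assumptions: `RateLimitError` is a placeholder class defined here, and in real code you would catch whatever exception your API client raises instead. `with_backoff` is a hypothetical helper.

```python
import time

class RateLimitError(Exception):
    """Placeholder for the rate-limit exception raised by your API client."""

def with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(); on a rate-limit error wait base_delay * 2**attempt seconds, then retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries, surface the error to the caller
            sleep(base_delay * 2 ** attempt)

# Demo: a fake call that fails twice before succeeding
attempts = {"count": 0}

def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("slow down")
    return "ok"

waits = []
answer = with_backoff(flaky_call, sleep=waits.append)  # record delays instead of sleeping
print(answer, waits)
# ok [1.0, 2.0]
```

In query.py, the chain call could be wrapped as `with_backoff(lambda: qa_chain.invoke({"query": question}))` once the except clause targets the client's real exception type.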

Practice Questions

1. What does RAG stand for and why is it useful?
Retrieval-Augmented Generation. It combines document retrieval with text generation so the LLM answers based on your data instead of its training data alone.

2. Why do we split documents into chunks instead of sending the whole document?
Documents can exceed the LLM's context window. Chunking also lets the retriever find the most relevant section instead of burying it in a massive document.

3. What is the role of embeddings in this system?
Embeddings convert text into numeric vectors. Texts with similar meaning produce similar vectors, so the retriever can find the most relevant chunks by measuring vector distance.

4. Challenge: Add a relevance score
Modify the query script to print the similarity score for each retrieved chunk. Chroma's similarity_search_with_score method returns a distance alongside each document; lower values mean higher relevance. Display the score next to each source.

5. Challenge: Multi-document summarization
Write a script that retrieves all chunks across all documents on a broad topic (like "pricing") and asks the LLM to generate a summary table comparing pricing across products.
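
As background for question 3 and the relevance score challenge, this sketch shows how a distance ranks chunks. The three-dimensional vectors are made up for illustration (real embeddings have hundreds or thousands of dimensions), and cosine distance is used only as an example; the metric Chroma applies depends on how the collection is configured.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0 means same direction, larger means less similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

# Hypothetical embeddings keyed by source file
chunk_vectors = {
    "refund-policy.pdf": [1.0, 0.0, 0.0],
    "shipping.pdf": [0.6, 0.8, 0.0],
    "holidays.pdf": [0.0, 0.0, 1.0],
}
query_vector = [1.0, 0.0, 0.0]

# Sort ascending so the closest (most relevant) chunk comes first
ranked = sorted(
    (cosine_distance(query_vector, vec), source)
    for source, vec in chunk_vectors.items()
)
for distance, source in ranked:
    print(f"{source}: distance={distance:.2f}")
```

The chunk whose vector points in the same direction as the query gets distance 0 and is listed first, which is the ordering the challenge asks you to display.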

FAQ

What file formats does PyPDFLoader support?

It handles standard PDFs with selectable text. For .docx or .txt, use Docx2txtLoader or TextLoader from langchain_community.document_loaders.

Can I use a local model instead of OpenAI?

Yes. Install the langchain-ollama package, then use OllamaEmbeddings and ChatOllama from langchain_ollama with models like Llama 3 or Mistral running locally via Ollama.

How do I update the index when documents change?

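Delete the chroma_db directory and re-run ingest.py; re-running against an existing index adds the same chunks again. ChromaDB also supports deleting documents by ID and re-adding them. For production, implement incremental indexing with file hash tracking.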
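
File hash tracking can be sketched with the standard library alone. The idea is to store a SHA-256 digest per file in a JSON manifest and re-ingest only the files whose digest changed. The function names and manifest layout below are assumptions for illustration; removing stale chunks from Chroma and handling deleted files are left out. Keep the manifest outside the documents folder so it is not hashed itself.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def file_hash(path: Path) -> str:
    """SHA-256 digest of the file contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def find_changed_files(doc_dir: Path, manifest_path: Path) -> list[Path]:
    """Return files that are new or modified since the last run, then update the manifest."""
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new = {str(p): file_hash(p) for p in sorted(doc_dir.rglob("*")) if p.is_file()}
    changed = [Path(name) for name, digest in new.items() if old.get(name) != digest]
    manifest_path.write_text(json.dumps(new, indent=2))
    return changed

# Demo in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp) / "documents"
    docs.mkdir()
    manifest = Path(tmp) / "manifest.json"
    (docs / "a.txt").write_text("one")
    first_run = find_changed_files(docs, manifest)   # new file is reported
    second_run = find_changed_files(docs, manifest)  # nothing changed
    (docs / "a.txt").write_text("two")
    third_run = find_changed_files(docs, manifest)   # the edit is detected
    print(len(first_run), len(second_run), len(third_run))
    # 1 0 1
```

An incremental ingest script would pass only the returned paths to the loader and splitter instead of reprocessing the whole folder.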

Next Steps

Add a web UI with FastAPI and a chat interface
Explore vector databases like Pinecone or Qdrant for production scale
Build the PDF to Audio Converter project for another document processing use case
Learn about evaluation metrics for RAG systems in the LLM evaluation guide
