Vector Databases — Complete Guide with Chroma & Python
In this tutorial, you'll learn about Vector Databases. We cover key concepts, practical examples, and best practices to help you understand and apply this topic effectively.
Vector databases store and search high-dimensional vector embeddings, enabling similarity search at scale for AI applications. Chroma is an open-source embedding database that can run locally, in-process, with no external services to set up, making it an easy entry point for learning vector search.
What You'll Learn
In this tutorial, you'll learn how vector databases work, how to use ChromaDB with Python to store and query embeddings, how to build a RAG pipeline with LangChain, and how Chroma compares to other vector databases.
Why It Matters
Traditional databases search by exact match or keyword. Vector databases search by semantic meaning. They find items most similar to a query even when no keywords match. This capability powers semantic search, recommendation systems, anomaly detection, and RAG pipelines that retrieve context for LLMs.
Real-World Use
Durga Antivirus Pro uses a vector database to store file behavior embeddings. When a new file is scanned, its behavior vector is compared against known malware and benign file embeddings. Files whose vectors are closest to malware clusters are flagged, catching zero-day threats without signature updates.
How Embeddings Become Searchable
The following diagram shows how raw text becomes searchable through a vector database pipeline:
flowchart TD
    A[Raw Text] --> B[Embedding Model]
    B --> C[Vector Embeddings]
    C --> D[ChromaDB Collection]
    E[User Query] --> F[Embed Query]
    F --> G[Similarity Search]
    D --> G
    G --> H[Top-K Results]
    H --> I[RAG / Semantic Search App]
What Are Vector Databases?
A vector database is a specialized database designed to store, index, and search vector embeddings. Unlike traditional databases that compare rows by exact field matches, vector databases compare vectors by distance metrics.
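To make the contrast with exact-match lookup concrete, here is a minimal brute-force sketch of what a vector search does conceptually. The toy 3-dimensional vectors and the `top_k` helper are illustrative assumptions, not part of Chroma. Real vector databases avoid scanning every vector by using approximate indexes such as HNSW.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query, items, k=2):
    """Return the k (id, distance) pairs closest to the query, nearest first."""
    scored = [(item_id, euclidean(query, vec)) for item_id, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]

# Hypothetical toy vectors; real embeddings have hundreds of dimensions.
items = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.0],
    "car": [0.0, 0.1, 0.9],
}

nearest = top_k([0.9, 0.1, 0.05], items, k=2)
print([item_id for item_id, _ in nearest])
```

No field of the query equals any stored value, yet the search still ranks "cat" and "kitten" ahead of "car" because their vectors lie closer to the query.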
How Embeddings Work
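An embedding model converts text (or images, audio) into a list of floating-point numbers. Similar content produces vectors that are close together in the embedding space. The most common similarity measure for text embeddings is cosine similarity:
cosine_similarity(A, B) = (A . B) / (||A|| * ||B||)
Values range from -1 (opposite direction) to 1 (same direction). What counts as a strong match depends on the embedding model; 0.7 is a common rule of thumb for text embeddings, not a fixed threshold. Vector databases often report cosine distance, defined as 1 - cosine_similarity, so lower values mean closer matches.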
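The formula can be checked with a few lines of plain Python. This is a minimal sketch using hand-picked 2-dimensional vectors (assumptions for illustration); production code would normally use a numerical library instead.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Same direction, perpendicular, and opposite direction
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])
print(round(same_direction, 6), round(orthogonal, 6), round(opposite, 6))
```

The three results should come out at (or within floating-point error of) 1, 0, and -1. Note that vector length does not matter: [2.0, 4.0] is twice as long as [1.0, 2.0], yet the similarity is still 1.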
Use Cases
- Semantic search: Find documents by meaning, not keywords
- RAG pipelines: Retrieve context for LLM question answering
- Recommendation systems: Find similar products or content
- Anomaly detection: Flag vectors far from all known clusters
- Deduplication: Identify near-duplicate documents
- Clustering: Group similar items without labels
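As one illustration of these use cases, the anomaly detection idea can be sketched in plain Python: a vector is flagged when even its nearest cluster centroid is too far away. The centroids, the 0.5 threshold, and the `is_anomaly` helper are hypothetical values chosen for the example, not recommendations.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction, larger means farther."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def is_anomaly(vector, centroids, threshold=0.5):
    """Flag a vector whose nearest cluster centroid is farther than threshold."""
    nearest = min(cosine_distance(vector, c) for c in centroids.values())
    return nearest > threshold

# Hypothetical cluster centroids in a toy 3-dimensional space
centroids = {"cluster_a": [1.0, 0.0, 0.0], "cluster_b": [0.0, 1.0, 0.0]}

close_to_a = is_anomaly([0.9, 0.1, 0.0], centroids)
far_from_all = is_anomaly([0.0, 0.0, 1.0], centroids)
print(close_to_a, far_from_all)
```

In practice the threshold would need tuning against labeled examples, since suitable distances vary by embedding model.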
Understanding embeddings is essential groundwork. See the Machine Learning hub for the full learning path and the OpenAI API Guide for using commercial embedding models.
ChromaDB Overview and Installation
Chroma is an open-source vector database written in Python. It runs in-process as a library, connects to a local server, or deploys to production with a client-server architecture.
Install Chroma using pip:
pip install chromadb
Chroma supports three client modes:
import chromadb
# Ephemeral (in-memory, no persistence)
client = chromadb.Client()
# Persistent (saves to disk)
client = chromadb.PersistentClient(path="./chroma_data")
# HTTP client (connects to a remote server)
client = chromadb.HttpClient(host="localhost", port=8000)
Creating Collections and Adding Documents
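A collection in Chroma is like a table in SQL. It stores documents, their embeddings, and optional metadata. You can let Chroma handle embedding automatically using a built-in embedding function, or provide precomputed embeddings. The example below uses the Sentence Transformers wrapper, which requires the sentence-transformers package (pip install sentence-transformers).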
import chromadb
from chromadb.utils import embedding_functions

# Persist the collection to disk so it survives restarts
client = chromadb.PersistentClient(path="./chroma_data")

# Embed documents locally with a Sentence Transformers model
sentence_transformer_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

# create_collection raises an error if "documents" already exists;
# use get_or_create_collection when re-running this script
collection = client.create_collection(
    name="documents",
    embedding_function=sentence_transformer_ef,
    metadata={"hnsw:space": "cosine"}  # rank results by cosine distance
)

# documents, metadatas, and ids are parallel lists of equal length
collection.add(
    documents=[
        "Vector databases store embeddings for fast similarity search.",
        "Chroma is an open-source embedding database written in Python.",
        "Similarity search finds nearest neighbors in vector space.",
        "RAG retrieves relevant context for LLM generation.",
        "Metadata filtering narrows search results by structured fields."
    ],
    metadatas=[
        {"topic": "vector-databases", "difficulty": "beginner"},
        {"topic": "tools", "difficulty": "intermediate"},
        {"topic": "vector-databases", "difficulty": "beginner"},
        {"topic": "rag", "difficulty": "advanced"},
        {"topic": "vector-databases", "difficulty": "intermediate"}
    ],
    ids=["doc1", "doc2", "doc3", "doc4", "doc5"]
)
print(f"Collection count: {collection.count()}")
Expected output:
Collection count: 5
Querying with Similarity Search
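Chroma ranks results by the distance metric configured on the collection. The default is squared L2; the collection above uses cosine distance because it was created with "hnsw:space": "cosine". You can control how many results to return and which fields (documents, metadatas, distances) to include.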
results = collection.query(
    query_texts=["How do I store and search vectors?"],
    n_results=3
)
print("Query results:")
# Each result field is a list of lists: one inner list per query text
for i, (doc, dist, meta) in enumerate(zip(
    results["documents"][0],
    results["distances"][0],
    results["metadatas"][0]
)):
    print(f" {i+1}. [{meta['topic']}] (dist: {dist:.4f}) {doc}")
Example output (distance values are illustrative and depend on the model and library versions; Chroma returns the closest match first, so distances increase down the list):
Query results:
 1. [vector-databases] (dist: 0.5210) Vector databases store embeddings for fast similarity search.
 2. [vector-databases] (dist: 0.5891) Similarity search finds nearest neighbors in vector space.
 3. [vector-databases] (dist: 0.6532) Metadata filtering narrows search results by structured fields.
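With a cosine-configured collection, the reported distance is 1 minus the cosine similarity, so converting back is a one-line operation. The sketch below uses a hand-written list as a stand-in for results["distances"][0]; the numbers are made up for illustration.

```python
def distances_to_similarities(distances):
    """Convert cosine distances (1 - similarity) back to cosine similarities."""
    return [1.0 - d for d in distances]

# Hand-written stand-in for results["distances"][0]; the values are made up.
example_distances = [0.25, 0.5, 0.75]
similarities = distances_to_similarities(example_distances)
print(similarities)  # [0.75, 0.5, 0.25]
```

Because distances arrive in ascending order, the converted similarities come out in descending order, with the best match first.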
Metadata Filtering
Chroma supports filtering by metadata fields before performing similarity search. This is useful for scoping searches to specific categories, date ranges, or sources.
# Only documents whose metadata has difficulty == "beginner" are candidates
filtered_results = collection.query(
    query_texts=["How do I find similar content?"],
    n_results=2,
    where={"difficulty": "beginner"}
)
print("Filtered results (beginner only):")
for doc in filtered_results["documents"][0]:
    print(f" - {doc}")

# Only one document has topic == "rag", so fewer than n_results come back
filtered_results_advanced = collection.query(
    query_texts=["How does RAG work?"],
    n_results=2,
    where={"topic": "rag"}
)
print("\nFiltered results (rag topic only):")
for doc in filtered_results_advanced["documents"][0]:
    print(f" - {doc}")
Expected output:
Filtered results (beginner only):
- Vector databases store embeddings for fast similarity search.
- Similarity search finds nearest neighbors in vector space.
Filtered results (rag topic only):
- RAG retrieves relevant context for LLM generation.
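Conceptually, a where filter restricts the candidates that similarity ranking is allowed to consider. The sketch below illustrates that idea in plain Python. The record layout, the toy 2-dimensional vectors, and the `filtered_search` helper are assumptions for illustration and do not reflect how Chroma implements filtering internally.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def filtered_search(query_vec, records, where, k=2):
    """Keep records whose metadata matches every key in `where`, then rank them."""
    candidates = [
        r for r in records
        if all(r["metadata"].get(key) == value for key, value in where.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda r: cosine_similarity(query_vec, r["vector"]),
        reverse=True,  # highest similarity first
    )
    return [r["id"] for r in ranked[:k]]

# Hypothetical records; doc2 is the closest vector but has the wrong difficulty.
records = [
    {"id": "doc1", "vector": [1.0, 0.0], "metadata": {"difficulty": "beginner"}},
    {"id": "doc2", "vector": [0.9, 0.1], "metadata": {"difficulty": "intermediate"}},
    {"id": "doc3", "vector": [0.0, 1.0], "metadata": {"difficulty": "beginner"}},
]

beginner_hits = filtered_search([1.0, 0.1], records, where={"difficulty": "beginner"})
print(beginner_hits)
```

The closest vector overall (doc2) is excluded by the filter, which is why a restrictive filter can return results that look less similar than an unfiltered query would.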
Integration with LangChain for RAG
Chroma integrates with LangChain as a vector store through the langchain-chroma package, making it easy to build RAG pipelines. Chroma handles retrieval and an LLM API call handles generation; together they form a complete RAG system. Any LLM provider works for the generation step, including OpenAI and Anthropic (see the OpenAI API Guide and the Anthropic Claude API guide).
from langchain_chroma import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

# Local embedding model; the same model embeds both documents and queries
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
texts = [
    "ChromaDB is an open-source vector database for AI applications.",
    "LangChain provides tools for building RAG pipelines with vector stores.",
    "RAG combines information retrieval with LLM text generation.",
    "Vector databases use ANN indexing for fast similarity search.",
    "Metadata filtering reduces the search space for relevant results."
]

# Embed the texts and store them in a persistent Chroma collection
vector_store = Chroma.from_texts(
    texts=texts,
    embedding=embeddings,
    collection_name="langchain_docs",
    persist_directory="./chroma_langchain"
)

# Retrieve the 2 most similar texts for each question
retriever = vector_store.as_retriever(search_kwargs={"k": 2})

# ChatOpenAI reads the API key from the OPENAI_API_KEY environment variable
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# "stuff" places all retrieved texts into a single prompt
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever
)
response = qa_chain.invoke(
    {"query": "What is ChromaDB and how is it used with LangChain?"}
)
print(f"RAG Answer:\n{response['result']}")
Expected output:
RAG Answer:
ChromaDB is an open-source vector database designed for AI applications. It is used with LangChain by integrating as a vector store for RAG pipelines. LangChain creates embeddings from text, stores them in Chroma, and uses the vector store as a retriever to fetch relevant context for LLM generation.
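The "stuff" chain type used above works by placing every retrieved document into one prompt. The following sketch shows that idea in plain Python. The prompt wording and the `build_stuff_prompt` helper are hypothetical and are not LangChain's actual template.

```python
def build_stuff_prompt(question, retrieved_docs):
    """Concatenate every retrieved document into one prompt string."""
    context = "\n\n".join(retrieved_docs)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Stand-ins for the two texts a retriever with k=2 might return
docs = [
    "ChromaDB is an open-source vector database for AI applications.",
    "LangChain provides tools for building RAG pipelines with vector stores.",
]
prompt = build_stuff_prompt("What is ChromaDB?", docs)
print(prompt)
```

This approach is simple but is bounded by the model's context window, which is one reason the retriever's k is usually kept small.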
Persistence and Deployment
Chroma supports multiple deployment modes. For production, use the HTTP client with a standalone Chroma server.
Start a Chroma server:
chroma run --path ./chroma_data --port 8000
Connect from your application:
import chromadb
from chromadb.utils import embedding_functions

# Connect to the server started with `chroma run`
client = chromadb.HttpClient(host="localhost", port=8000)

# get_or_create_collection is safe to call on every application start
collection = client.get_or_create_collection(
    name="production_docs",
    embedding_function=embedding_functions.SentenceTransformerEmbeddingFunction()
)
collection.add(
    documents=["Production vector search deployed successfully."],
    ids=["prod1"]
)
results = collection.query(
    query_texts=["deployment status"],
    n_results=1
)
print(f"Result: {results['documents'][0][0]}")
Expected output:
Result: Production vector search deployed successfully.
For scaling, Chroma supports:
- Persistence: Data saved to disk with PersistentClient
- Client-server access: Multiple clients connecting to a single server
- Embedding functions: Bring your own model or use built-in wrappers for OpenAI, Cohere, and Sentence Transformers
Comparing Chroma with Pinecone, Weaviate, Qdrant
| Feature | Chroma | Pinecone | Weaviate | Qdrant |
|---|---|---|---|---|
| Hosting | Local / embedded / server | Managed cloud only | Self-hosted / cloud | Self-hosted / cloud |
| Setup | pip install | API key + SDK | Docker Compose | Docker / binary |
| Persistence | PersistentClient | Automatic | Built-in | Built-in |
| Built-in embedding | Yes (plugins) | No (bring your own) | Yes (modules) | No (bring your own) |
| Metadata filtering | Yes | Yes | Yes | Yes |
| ANN algorithm | HNSW | Proprietary | HNSW | HNSW |
| Free tier | Always free | Limited quota | Self-hosted free | Self-hosted free |
| Python SDK | Native | Native | Native | Native |
Chroma is best for prototyping, learning, and local development. Pinecone suits teams that want a fully managed, serverless cloud service. Weaviate fits self-hosted production with built-in NLP modules. Qdrant offers Rust-based performance for high-throughput workloads.
Common Errors and Mistakes
| Mistake | Why It Happens | How to Fix |
|---|---|---|
| Embedding dimension mismatch | Different models produce different dimensions | Use the same model for indexing and querying |
| Wrong distance metric | Cosine, L2, and inner product can produce different rankings | Match the metric to your model; Chroma defaults to squared L2, so set "hnsw:space": "cosine" at collection creation if you need cosine |
| Collection not found | Querying before creating or after restart with ephemeral client | Use PersistentClient or get_or_create_collection |
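| No embedding function set | Omitting embedding_function makes Chroma fall back to its default model, which may differ from the one used for indexing | Pass the same embedding_function when creating and when reopening the collection |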
| Missing metadata filter values | Filtering by a field that does not exist on stored documents | Set metadata on every document during add |
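The first mistake in the table can be caught early with a small guard before vectors are added or queried. This is a minimal sketch; the `check_dimension` helper is hypothetical. The value 384 is the embedding size of all-MiniLM-L6-v2, the model used throughout this tutorial.

```python
EXPECTED_DIM = 384  # all-MiniLM-L6-v2 produces 384-dimensional embeddings

def check_dimension(vector, expected_dim=EXPECTED_DIM):
    """Raise a clear error when a vector does not match the index dimension."""
    if len(vector) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} dimensions, got {len(vector)}; "
            "was a different embedding model used?"
        )
    return vector

check_dimension([0.0] * 384)  # passes silently

try:
    check_dimension([0.0] * 768)  # e.g. a vector from a larger model
except ValueError as err:
    message = str(err)
    print(message)
```

Failing fast with a descriptive message is usually easier to debug than a dimension error raised deep inside the database client.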
Practice Questions
- What is the difference between Chroma's Client and PersistentClient?
Answer: Client() creates an ephemeral in-memory database that disappears when the process ends. PersistentClient(path) saves data to disk at the specified path so it survives restarts.
- Why must the same embedding model be used for indexing and querying?
Answer: Different models produce embeddings in different vector spaces. Cosine similarity is only meaningful when vectors come from the same model. Using mismatched models returns meaningless results.
- How does metadata filtering improve search quality?
Answer: Metadata filtering narrows the search space to only documents matching structured criteria (like topic or date) before computing similarity. This prevents irrelevant results from appearing in the top-K matches.
- What is the role of an embedding function in Chroma?
Answer: The embedding function converts text into vector embeddings automatically. Chroma calls it when you add documents or query with text. It ensures consistent embedding generation without manual encode calls.
- How does Chroma integrate with LangChain for RAG?
Answer: Chroma provides a LangChain-compatible vector store interface. You create a Chroma instance from texts, get a retriever, and pass it to a RetrievalQA chain. The chain retrieves relevant documents and feeds them to an LLM for generation.
Challenge
Build a multi-collection Chroma system for a technical documentation portal. Create separate collections for Python, LangChain, and API reference documents. Implement a query router that detects the topic of the user's question and queries the appropriate collection. Add metadata filtering for document version and difficulty level. Return results with source collection tags so users know where each result came from.
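As a possible starting point for the query router, here is a minimal keyword-overlap sketch in plain Python. The collection names, keyword sets, and `route_query` helper are all hypothetical. A more robust router might compare the question's embedding against a short description of each collection instead.

```python
# Hypothetical keyword sets, one per collection
ROUTES = {
    "python": {"python", "pip", "decorator", "asyncio"},
    "langchain": {"langchain", "retriever", "chain", "rag"},
    "api_reference": {"endpoint", "http", "status", "request"},
}

def route_query(question, default="python"):
    """Pick the collection whose keyword set overlaps the question most."""
    words = set(question.lower().replace("?", " ").split())
    scores = {name: len(words & keywords) for name, keywords in ROUTES.items()}
    best = max(scores, key=scores.get)
    # Fall back to the default collection when no keyword matches
    return best if scores[best] > 0 else default

first = route_query("How do I configure a retriever in LangChain?")
second = route_query("Which HTTP status does the endpoint return?")
print(first, second)
```

The returned name can double as the source collection tag attached to each result.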
Real-World Task
Design a vector search backend for a cybersecurity threat intelligence platform. The ingestion pipeline should embed threat reports using Sentence Transformers and store them in Chroma with metadata (threat type, severity, date, source). Implement a query endpoint that accepts free-text threat descriptions, filters by severity level, returns the top-5 most similar threats, and automatically tags retrieved reports for analyst review. Deploy with a Chroma HTTP server behind a FastAPI wrapper.
Next Steps
Now that you understand vector databases with Chroma, build a complete RAG pipeline with the RAG Systems tutorial. Explore advanced LLM integration with the [LangChain Guide](/machine-learning/LangChain-guide/). Build autonomous applications with AI Agents.
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.