Vector Databases — Complete Guide with Chroma & Python

DodaTech 11 min read

In this tutorial, you'll learn about Vector Databases. We cover key concepts, practical examples, and best practices to help you understand and apply this topic effectively.

Vector databases store and search high-dimensional vector embeddings, enabling similarity search at scale for AI applications. Chroma is an open-source embedding database that runs locally without any external services, which makes it an easy entry point for learning vector search.

What You'll Learn

In this tutorial, you'll learn how vector databases work, how to use ChromaDB with Python to store and query embeddings, how to build a RAG pipeline with LangChain, and how Chroma compares to other vector databases.

Why It Matters

Traditional databases search by exact match or keyword. Vector databases search by semantic meaning. They find items most similar to a query even when no keywords match. This capability powers semantic search, recommendation systems, anomaly detection, and RAG pipelines that retrieve context for LLMs.

Real-World Use

Durga Antivirus Pro uses a vector database to store file behavior embeddings. When a new file is scanned, its behavior vector is compared against known malware and benign file embeddings. Files whose vectors are closest to malware clusters are flagged, catching zero-day threats without signature updates.

How Embeddings Become Searchable

The following diagram shows how raw text becomes searchable through a vector database pipeline:

flowchart TD
  A[Raw Text] --> B[Embedding Model]
  B --> C[Vector Embeddings]
  C --> D[ChromaDB Collection]
  E[User Query] --> F[Embed Query]
  F --> G[Similarity Search]
  D --> G
  G --> H[Top-K Results]
  H --> I[RAG / Semantic Search App]

What Are Vector Databases?

A vector database is a specialized database designed to store, index, and search vector embeddings. Unlike traditional databases that compare rows by exact field matches, vector databases compare vectors by distance metrics.
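
To make "compare vectors by distance" concrete, here is a minimal brute-force sketch in plain Python. It scores every stored vector against the query using squared Euclidean (L2) distance and keeps the closest ones. The function names and the 2-D sample vectors are hypothetical; a real vector database replaces the full scan with an approximate index such as HNSW.

```python
from typing import Dict, List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(
    query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k (id, distance) pairs closest to the query, nearest first."""
    scored = [(vec_id, squared_l2(query, vec)) for vec_id, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Hypothetical 2-D "embeddings"; real models produce hundreds of dimensions.
vectors = {"cat": [1.0, 0.0], "kitten": [0.9, 0.1], "car": [0.0, 1.0]}
print(nearest([1.0, 0.05], vectors))
```

The scan is linear in the number of stored vectors, which is why dedicated indexes matter once a collection grows.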

How Embeddings Work

An embedding model converts text (or images, audio) into a list of floating-point numbers. Similar content produces vectors that are close together in the embedding space. The most common similarity measure for text embeddings is cosine similarity:

cosine_similarity(A, B) = (A . B) / (||A|| * ||B||)

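Values range from -1 (opposite directions) to 1 (same direction). What counts as a strong match depends on the embedding model. For many text models, scores above roughly 0.7 indicate strong semantic similarity, but treat any threshold as something to calibrate on your own data.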
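
The formula can be written directly in plain Python. This is a minimal sketch for illustration; the function name is our own, and production code would normally use NumPy or the database's built-in metric instead.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: (A . B) / (||A|| * ||B||)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))   # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # orthogonal
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Note that vector length does not affect the score: `[1.0, 0.0]` and `[2.0, 0.0]` point the same way, so they count as identical under this measure.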

Use Cases

Semantic search: Find documents by meaning, not keywords
RAG pipelines: Retrieve context for LLM question answering
Recommendation systems: Find similar products or content
Anomaly detection: Flag vectors far from all known clusters
Deduplication: Identify near-duplicate documents
Clustering: Group similar items without labels
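
As one illustration of the list above, the anomaly detection case can be sketched in a few lines: a vector is flagged when it is far from every known cluster centre. The centroids, the threshold of 0.5, and the function names are all hypothetical choices for this example, not values taken from any real system.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 means same direction, 2 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def is_anomaly(
    vector: Sequence[float],
    centroids: Sequence[Sequence[float]],
    threshold: float = 0.5,
) -> bool:
    """True when the vector is farther than `threshold` from every centroid."""
    return all(cosine_distance(vector, c) > threshold for c in centroids)


# Hypothetical cluster centres for "known" behaviour.
known_centroids = [[1.0, 0.1], [0.9, 0.0]]
print(is_anomaly([1.0, 0.0], known_centroids))  # close to both centres
print(is_anomaly([0.0, 1.0], known_centroids))  # far from both centres
```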

Understanding embeddings is essential groundwork. See the Machine Learning hub for the full learning path and the OpenAI API Guide for using commercial embedding models.

ChromaDB Overview and Installation

Chroma is an open-source vector database written in Python. It runs in-process as a library, connects to a local server, or deploys to production with a client-server architecture.

Install Chroma using pip:

pip install chromadb

Chroma supports three client modes:

import chromadb

# Ephemeral (in-memory, no persistence)
client = chromadb.Client()

# Persistent (saves to disk)
client = chromadb.PersistentClient(path="./chroma_data")

# HTTP client (connects to a remote server)
client = chromadb.HttpClient(host="localhost", port=8000)

Creating Collections and Adding Documents

A collection in Chroma is like a table in SQL. It stores documents, their embeddings, and optional metadata. You can let Chroma handle embedding automatically using a built-in embedding function, or provide precomputed embeddings.

import chromadb
from chromadb.utils import embedding_functions

client = chromadb.PersistentClient(path="./chroma_data")

sentence_transformer_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

collection = client.create_collection(
    name="documents",
    embedding_function=sentence_transformer_ef,
    metadata={"hnsw:space": "cosine"}
)

collection.add(
    documents=[
        "Vector databases store embeddings for fast similarity search.",
        "Chroma is an open-source embedding database written in Python.",
        "Similarity search finds nearest neighbors in vector space.",
        "RAG retrieves relevant context for LLM generation.",
        "Metadata filtering narrows search results by structured fields."
    ],
    metadatas=[
        {"topic": "vector-databases", "difficulty": "beginner"},
        {"topic": "tools", "difficulty": "intermediate"},
        {"topic": "vector-databases", "difficulty": "beginner"},
        {"topic": "rag", "difficulty": "advanced"},
        {"topic": "vector-databases", "difficulty": "intermediate"}
    ],
    ids=["doc1", "doc2", "doc3", "doc4", "doc5"]
)

print(f"Collection count: {collection.count()}")

Expected output:

Collection count: 5

Querying with Similarity Search

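Because the collection above was created with `hnsw:space` set to `cosine`, Chroma ranks results by cosine distance (1 minus cosine similarity), so smaller values mean closer matches. Without that setting, Chroma defaults to squared L2 distance. You can control how many results to return and whether to include distances.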

results = collection.query(
    query_texts=["How do I store and search vectors?"],
    n_results=3
)

print("Query results:")
for i, (doc, dist, meta) in enumerate(zip(
    results["documents"][0],
    results["distances"][0],
    results["metadatas"][0]
)):
    print(f"  {i+1}. [{meta['topic']}] (dist: {dist:.4f}) {doc}")

Example output (distances are illustrative and depend on the model and library version; Chroma lists the closest match, with the smallest distance, first):

Query results:
  1. [vector-databases] (dist: 0.3468) Vector databases store embeddings for fast similarity search.
  2. [vector-databases] (dist: 0.4109) Similarity search finds nearest neighbors in vector space.
  3. [vector-databases] (dist: 0.4790) Metadata filtering narrows search results by structured fields.
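
Query results come back as parallel lists, one inner list per query text, which is why the loop above indexes `[0]`. The following sketch flattens one query's results into row dictionaries and adds a similarity score. The helper name and the sample dictionary are hypothetical, and the `1 - distance` conversion only holds for collections configured with cosine space.

```python
from typing import Any, Dict, List


def flatten_query_result(result: Dict[str, Any], query_index: int = 0) -> List[dict]:
    """Turn one query's parallel result lists into a list of row dicts."""
    rows = []
    for doc_id, doc, meta, dist in zip(
        result["ids"][query_index],
        result["documents"][query_index],
        result["metadatas"][query_index],
        result["distances"][query_index],
    ):
        rows.append(
            {
                "id": doc_id,
                "document": doc,
                "metadata": meta,
                "distance": dist,
                # Valid only for cosine space, where distance = 1 - similarity.
                "similarity": 1.0 - dist,
            }
        )
    return rows


# Hand-built stand-in with the same shape as a query result (made-up values).
sample = {
    "ids": [["doc1", "doc3"]],
    "documents": [["first text", "second text"]],
    "metadatas": [[{"topic": "a"}, {"topic": "b"}]],
    "distances": [[0.25, 0.5]],
}
for row in flatten_query_result(sample):
    print(row["id"], row["similarity"])
```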

Metadata Filtering

Chroma supports filtering by metadata fields before performing similarity search. This is useful for scoping searches to specific categories, date ranges, or sources.

filtered_results = collection.query(
    query_texts=["How do I find similar content?"],
    n_results=2,
    where={"difficulty": "beginner"}
)

print("Filtered results (beginner only):")
for doc in filtered_results["documents"][0]:
    print(f"  - {doc}")

filtered_results_advanced = collection.query(
    query_texts=["How does RAG work?"],
    n_results=2,
    where={"topic": "rag"}
)

print("\nFiltered results (rag topic only):")
for doc in filtered_results_advanced["documents"][0]:
    print(f"  - {doc}")

Expected output:

Filtered results (beginner only):
  - Vector databases store embeddings for fast similarity search.
  - Similarity search finds nearest neighbors in vector space.

Filtered results (rag topic only):
  - RAG retrieves relevant context for LLM generation.
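
Filters can also combine several conditions. Chroma's filter syntax includes operators such as `$eq`, `$in`, and `$and`, and a `where` dictionary with more than one condition needs to be wrapped in `$and`. The helper below is a hypothetical convenience function that builds such a dictionary; it is a sketch, not part of the Chroma API.

```python
from typing import Any, Dict


def build_where(**conditions: Any) -> Dict[str, Any]:
    """Build a `where` filter: scalars become $eq, lists become $in."""
    if not conditions:
        raise ValueError("at least one condition is required")
    clauses = []
    for field, value in conditions.items():
        if isinstance(value, (list, tuple)):
            clauses.append({field: {"$in": list(value)}})
        else:
            clauses.append({field: {"$eq": value}})
    # A single clause is used as-is; several clauses are combined with $and.
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}


print(build_where(topic="rag"))
print(build_where(topic="vector-databases", difficulty=["beginner", "intermediate"]))
```

The result can be passed straight to a query, for example `collection.query(query_texts=["..."], n_results=2, where=build_where(topic="rag"))`.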

Integration with LangChain for RAG

Chroma integrates natively with LangChain as a vector store, which makes it straightforward to build RAG pipelines: Chroma handles retrieval, and an API call to an LLM handles generation. Any LLM provider works for the generation step, including OpenAI (see the OpenAI API Guide) and Anthropic's Claude API.

from langchain_chroma import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

texts = [
    "ChromaDB is an open-source vector database for AI applications.",
    "LangChain provides tools for building RAG pipelines with vector stores.",
    "RAG combines information retrieval with LLM text generation.",
    "Vector databases use ANN indexing for fast similarity search.",
    "Metadata filtering reduces the search space for relevant results."
]

vector_store = Chroma.from_texts(
    texts=texts,
    embedding=embeddings,
    collection_name="langchain_docs",
    persist_directory="./chroma_langchain"
)

retriever = vector_store.as_retriever(search_kwargs={"k": 2})

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever
)

response = qa_chain.invoke(
    {"query": "What is ChromaDB and how is it used with LangChain?"}
)

print(f"RAG Answer:\n{response['result']}")

Expected output:

RAG Answer:
ChromaDB is an open-source vector database designed for AI applications. It is used with LangChain by integrating as a vector store for RAG pipelines. LangChain creates embeddings from text, stores them in Chroma, and uses the vector store as a retriever to fetch relevant context for LLM generation.
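
The "stuff" chain type used above places all retrieved documents into a single prompt. The sketch below shows that idea in plain Python so the data flow is visible. The prompt wording and the function name are hypothetical; LangChain uses its own template internally.

```python
from typing import Sequence


def build_stuff_prompt(question: str, documents: Sequence[str]) -> str:
    """Concatenate retrieved documents into one context block, then ask."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Stand-ins for the two documents a retriever with k=2 might return.
retrieved = [
    "ChromaDB is an open-source vector database for AI applications.",
    "LangChain provides tools for building RAG pipelines with vector stores.",
]
print(build_stuff_prompt("What is ChromaDB?", retrieved))
```

Because every retrieved document ends up in the prompt, the value of `k` directly affects prompt length and cost.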

Persistence and Deployment

Chroma supports multiple deployment modes. For production, use the HTTP client with a standalone Chroma server.

Start a Chroma server:

chroma run --path ./chroma_data --port 8000

Connect from your application:

import chromadb
from chromadb.utils import embedding_functions

client = chromadb.HttpClient(host="localhost", port=8000)

collection = client.get_or_create_collection(
    name="production_docs",
    embedding_function=embedding_functions.SentenceTransformerEmbeddingFunction()
)

collection.add(
    documents=["Production vector search deployed successfully."],
    ids=["prod1"]
)

results = collection.query(
    query_texts=["deployment status"],
    n_results=1
)

print(f"Result: {results['documents'][0][0]}")

Expected output:

Result: Production vector search deployed successfully.

For scaling, Chroma supports:

Persistence: Data saved to disk with PersistentClient
Shared access: Multiple clients connecting to a single server
Embedding functions: Bring your own model or use built-in wrappers for OpenAI, Cohere, and Sentence Transformers

Comparing Chroma with Pinecone, Weaviate, Qdrant

Feature	Chroma	Pinecone	Weaviate	Qdrant
Hosting	Local / embedded / server	Managed cloud only	Self-hosted / cloud	Self-hosted / cloud
Setup	pip install	API key + SDK	Docker Compose	Docker / binary
Persistence	PersistentClient	Automatic	Built-in	Built-in
Built-in embedding	Yes (plugins)	No (bring your own)	Yes (modules)	No (bring your own)
Metadata filtering	Yes	Yes	Yes	Yes
ANN algorithm	HNSW	Proprietary	HNSW	HNSW
Free tier	Free (open source)	Limited quota	Self-hosted free	Self-hosted free
Python SDK	Native	Native	Native	Native

Chroma is best for prototyping, learning, and local development. Pinecone suits teams that want a fully managed, serverless cloud service. Weaviate fits self-hosted production with built-in NLP modules. Qdrant offers Rust-based performance for high-throughput workloads.

Common Errors and Mistakes

Mistake	Why It Happens	How to Fix
Embedding dimension mismatch	Different models produce different dimensions	Use the same model for indexing and querying
Wrong distance metric	Cosine and Euclidean (L2) can rank results differently	Match the metric to your model; Chroma defaults to squared L2, so set `hnsw:space` to `cosine` explicitly when you want cosine
Collection not found	Querying before creating or after restart with ephemeral client	Use `PersistentClient` or `get_or_create_collection`
No embedding function set	Chroma falls back to its default embedding model, which may differ from the one used for indexing	Pass the same `embedding_function` whenever you create or get the collection
Missing metadata filter values	Filtering by a field that does not exist on stored documents	Set metadata on every document during `add`
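
The dimension mismatch in the table is easy to catch before any data reaches the database. Below is a minimal sketch of such a guard for precomputed embeddings. The function name is hypothetical; the expected size of 384 matches the `all-MiniLM-L6-v2` model used in this tutorial.

```python
from typing import Sequence


def check_dimensions(embeddings: Sequence[Sequence[float]], expected_dim: int) -> None:
    """Raise ValueError if any embedding has the wrong number of dimensions."""
    for index, vector in enumerate(embeddings):
        if len(vector) != expected_dim:
            raise ValueError(
                f"embedding {index} has dimension {len(vector)}, expected {expected_dim}"
            )


# all-MiniLM-L6-v2 produces 384-dimensional vectors.
good_batch = [[0.0] * 384, [0.1] * 384]
mixed_batch = [[0.0] * 384, [0.0] * 768]  # second vector came from another model

check_dimensions(good_batch, 384)
try:
    check_dimensions(mixed_batch, 384)
except ValueError as error:
    print(f"Rejected batch: {error}")
```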

Practice Questions

What is the difference between Chroma's Client and PersistentClient?

Answer: Client() creates an ephemeral in-memory database that disappears when the process ends. PersistentClient(path) saves data to disk at the specified path so it survives restarts.

Why must the same embedding model be used for indexing and querying?

Answer: Different models produce embeddings in different vector spaces. Cosine similarity is only meaningful when vectors come from the same model. Using mismatched models returns meaningless results.

How does metadata filtering improve search quality?

Answer: Metadata filtering narrows the search space to only documents matching structured criteria (like topic or date) before computing similarity. This prevents irrelevant results from appearing in the top-K matches.

What is the role of an embedding function in Chroma?

Answer: The embedding function converts text into vector embeddings automatically. Chroma calls it when you add documents or query with text. It ensures consistent embedding generation without manual encode calls.
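
To show the shape of what an embedding function does (a list of texts in, a list of equal-length vectors out), here is a toy stand-in that hashes character trigrams into a fixed number of buckets. It is not a real embedding model: it captures spelling overlap, not meaning. The name `toy_embed` and the dimension of 16 are hypothetical, and how a callable like this is wrapped for Chroma depends on the library version, so that part is not shown.

```python
import math
import zlib
from typing import List, Sequence


def toy_embed(texts: Sequence[str], dim: int = 16) -> List[List[float]]:
    """Map each text to a unit-length vector of hashed trigram counts."""
    vectors = []
    for text in texts:
        vector = [0.0] * dim
        lowered = text.lower()
        for start in range(len(lowered) - 2):
            trigram = lowered[start:start + 3]
            # crc32 is deterministic across runs, unlike Python's built-in hash().
            bucket = zlib.crc32(trigram.encode("utf-8")) % dim
            vector[bucket] += 1.0
        norm = math.sqrt(sum(x * x for x in vector))
        # Texts shorter than three characters stay as zero vectors.
        vectors.append([x / norm for x in vector] if norm else vector)
    return vectors


embedded = toy_embed(["vector database", "vector databases"])
print(len(embedded), len(embedded[0]))
```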

How does Chroma integrate with LangChain for RAG?

Answer: Chroma provides a LangChain-compatible vector store interface. You create a Chroma instance from texts, get a retriever, and pass it to a RetrievalQA chain. The chain retrieves relevant documents and feeds them to an LLM for generation.

Challenge

Build a multi-collection Chroma system for a technical documentation portal. Create separate collections for Python, LangChain, and API reference documents. Implement a query router that detects the topic of the user's question and queries the appropriate collection. Add metadata filtering for document version and difficulty level. Return results with source collection tags so users know where each result came from.
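
One possible starting point for the router is simple keyword scoring. The sketch below is only a baseline under stated assumptions: the keyword lists are made up, the function names are hypothetical, and `collections` is assumed to be a dictionary that maps topic names to Chroma collections (or any object with a compatible `query` method).

```python
from typing import Any, Dict, List

# Hypothetical keyword lists; a real router might use embeddings or a classifier.
TOPIC_KEYWORDS = {
    "python": ("python", "pip", "decorator"),
    "langchain": ("langchain", "retriever", "chain"),
    "api": ("endpoint", "http", "status code"),
}


def route(question: str, default: str = "python") -> str:
    """Pick the topic whose keywords appear most often in the question."""
    lowered = question.lower()
    scores = {
        topic: sum(word in lowered for word in words)
        for topic, words in TOPIC_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


def routed_query(question: str, collections: Dict[str, Any], n_results: int = 3) -> List[dict]:
    """Query the routed collection and tag each hit with its source."""
    topic = route(question)
    result = collections[topic].query(query_texts=[question], n_results=n_results)
    return [{"source": topic, "document": doc} for doc in result["documents"][0]]


print(route("How do I build a retriever in LangChain?"))
print(route("What does HTTP status code 404 mean for this endpoint?"))
```

Metadata filters for version and difficulty could be added by passing a `where` argument through `routed_query`.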

Real-World Task

Design a vector search backend for a cybersecurity threat intelligence platform. The ingestion pipeline embeds threat reports using Sentence Transformers and stores them in Chroma with metadata (threat type, severity, date, source). Implement a query endpoint that accepts free-text threat descriptions, filters by severity level, returns the top 5 most similar threats, and automatically tags retrieved reports for analyst review. Deploy with a Chroma HTTP server behind a FastAPI wrapper.

FAQs

What is the difference between a vector database and a traditional database?

A traditional database searches by exact field matches using indexes on columns. A vector database searches by semantic similarity using ANN indexes on embedding vectors. Vector databases find the most similar items even when no field matches exactly. They are designed for the high-dimensional floating-point data produced by embedding models.

Can Chroma handle millions of vectors?

Chroma uses HNSW indexing and can handle millions of vectors on a single machine. For larger scale, use a managed service such as Pinecone's serverless offering, or deploy Weaviate or Qdrant on a cluster. Chroma is best suited for prototyping, small to medium datasets, and local development.

Do I need GPU to use Chroma with embedding models?

No. Chroma runs entirely on CPU. Sentence Transformers and other embedding models work efficiently on CPU for inference. GPU acceleration is optional and only speeds up embedding generation for large batches. Chroma itself does not use GPU for indexing or search.

How do I update or delete documents in Chroma?

Use collection.update(ids, documents, metadatas) to modify existing documents and collection.delete(ids) to remove them. Chroma also supports collection.upsert which inserts documents or updates them if the IDs already exist. You cannot modify the embedding function of an existing collection.
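
A common pattern that combines these calls is keeping a collection in sync with a source of truth: upsert everything that should exist, then delete whatever is left over. The sketch below assumes `collection` is a Chroma collection and relies only on its `upsert`, `get`, and `delete` methods. The function name and the shape of `source_docs` are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def sync_collection(collection: Any, source_docs: Dict[str, Tuple[str, dict]]) -> List[str]:
    """Make the collection match source_docs (id -> (text, metadata)).

    Returns the ids that were deleted because they are no longer in the source.
    """
    if source_docs:
        ids = list(source_docs)
        # upsert inserts new ids and overwrites existing ones in a single call.
        collection.upsert(
            ids=ids,
            documents=[source_docs[doc_id][0] for doc_id in ids],
            metadatas=[source_docs[doc_id][1] for doc_id in ids],
        )
    existing_ids = collection.get()["ids"]
    stale = [doc_id for doc_id in existing_ids if doc_id not in source_docs]
    if stale:
        collection.delete(ids=stale)
    return stale


# Usage, assuming `collection` was created as shown earlier in this tutorial:
# removed = sync_collection(collection, {"doc1": ("Updated text.", {"topic": "tools"})})
```

Fetching every id with `get()` is fine for small collections; for large ones, tracking ids outside the database is likely cheaper.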

Is Chroma free for commercial use?

Yes. Chroma is open source under the Apache 2.0 license. You can use it freely for commercial applications, modify it, and deploy it in production. The open-source project has no license fee or usage limit. The HTTP server and client are also open source.

Next Steps

Now that you understand vector databases with Chroma, build a complete RAG pipeline with the RAG Systems tutorial. Explore advanced LLM integration with the [LangChain Guide](/machine-learning/LangChain-guide/). Build autonomous applications with AI Agents.

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
