What are Embeddings? Vector Embeddings Explained
In this tutorial, you'll learn what embeddings are and how vector embeddings work. It covers the key concepts, practical examples, and best practices so you can understand and apply them effectively.
What You'll Learn
Understand what embeddings are, how they represent meaning as numbers, and how to use them for search, clustering, and recommendations.
Why It Matters
Embeddings are the foundation of modern AI search, RAG, recommendation systems, and semantic understanding.
Real-World Use
Semantic search ("find similar documents"), RAG pipelines, recommendation engines, and anomaly detection.
What is an Embedding?
An embedding is a list of numbers (a vector) that represents the meaning of a piece of text, an image, or any data.
Words or sentences with similar meanings have similar vectors.
"cat" → [0.2, 0.8, -0.3, 0.1, ...] (real models use hundreds to thousands of numbers)
"kitten" → [0.3, 0.7, -0.2, 0.1, ...] (nearby in vector space)
"car" → [0.9, -0.1, 0.4, -0.5, ...] (far from cat)
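To make "nearby" and "far" concrete, the sketch below compares the three illustrative vectors using cosine similarity, which measures how closely two vectors point in the same direction. It uses only the four values shown for each word. These are made-up teaching numbers rather than output from a real model, so the scores only demonstrate the idea.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional vectors taken from the illustration; not real model output.
cat = [0.2, 0.8, -0.3, 0.1]
kitten = [0.3, 0.7, -0.2, 0.1]
car = [0.9, -0.1, 0.4, -0.5]

sim_cat_kitten = cosine_similarity(cat, kitten)
sim_cat_car = cosine_similarity(cat, car)

print(f"cat vs kitten: {sim_cat_kitten:.2f}")  # close to 1: similar direction
print(f"cat vs car:    {sim_cat_car:.2f}")     # close to 0: unrelated direction
```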
Creating Embeddings with OpenAI
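from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Machine learning is fascinating",
)
# One embedding is returned per input; take the first (and only) one
embedding = response.data[0].embedding
print(f"Dimension: {len(embedding)}")  # 1536 by default for this model
print(f"First 5 values: {embedding[:5]}")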
Creating Embeddings with Sentence Transformers
from sentence_transformers import SentenceTransformer

# The model is downloaded on first use, then runs locally
model = SentenceTransformer("all-MiniLM-L6-v2")
embedding = model.encode("Machine learning is fascinating")
print(f"Dimension: {len(embedding)}")  # 384 for this model
Comparing Similarity
import numpy as np
from sentence_transformers import SentenceTransformer

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector lengths
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [
    "I love programming in Python",
    "Python is a great programming language",
    "I enjoy eating pizza",
]
embeddings = model.encode(sentences)  # one vector per sentence
sim_1_2 = cosine_similarity(embeddings[0], embeddings[1])
sim_1_3 = cosine_similarity(embeddings[0], embeddings[2])
print(f"Python vs Python: {sim_1_2:.2f}")  # Expected to be high: both sentences are about Python
print(f"Python vs Pizza: {sim_1_3:.2f}")  # Expected to be much lower: unrelated topics
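Vector stores often compare embeddings with a plain dot product instead of the full cosine formula. The two give the same score once each vector has been scaled to length 1, as this small sketch with made-up vectors suggests. The `normalize` and `dot` helpers are hypothetical names introduced here for illustration.

```python
import math

def dot(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarities."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

# Made-up vectors with different lengths but a similar direction.
a = [3.0, 4.0]
b = [6.0, 8.5]

cosine = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
unit_dot = dot(normalize(a), normalize(b))

print(f"cosine similarity:   {cosine:.6f}")
print(f"dot of unit vectors: {unit_dot:.6f}")  # same value as the line above
```

Normalizing once when the vectors are stored means each later comparison needs only a dot product.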
Applications
| Use Case | How Embeddings Help |
|---|---|
| Semantic search | Find documents by meaning, not keywords |
| RAG | Retrieve relevant context for an LLM |
| Clustering | Group similar documents automatically |
| Recommendations | "More like this" suggestions |
| Classification | Train classifiers on embedded text |
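As one way to picture the semantic search row, the sketch below ranks a few documents by cosine similarity to a query vector. The three-dimensional vectors, the document titles, and the `top_k` helper are all hypothetical stand-ins. In practice the vectors would come from an embedding model such as the ones shown earlier.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, doc_vecs, k=2):
    """Return the k (title, score) pairs most similar to the query, best first."""
    scored = [(title, cosine_similarity(query_vec, vec)) for title, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical pre-computed embeddings (real ones have hundreds of dimensions).
documents = {
    "Intro to Python": [0.9, 0.1, 0.0],
    "Advanced Python tips": [0.8, 0.3, 0.1],
    "Pizza recipes": [0.0, 0.2, 0.9],
}
query = [0.9, 0.15, 0.0]  # stand-in for the embedding of a Python-related question

results = top_k(query, documents, k=2)
for title, score in results:
    print(f"{score:.2f}  {title}")
```

The same ranking step underlies RAG retrieval and "more like this" recommendations; only the source of the query vector changes.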