What are Embeddings? Vector Embeddings Explained

DodaTech

This tutorial explains what vector embeddings are, walks through practical examples of creating and comparing them, and summarizes where they are used.

What You'll Learn

Understand what embeddings are, how they represent meaning as numbers, and how to use them for search, clustering, and recommendations.

Why It Matters

Embeddings are the foundation of modern AI search, retrieval-augmented generation (RAG), recommendation systems, and semantic understanding.

Real-World Use

Semantic search ("find similar documents"), RAG pipelines, recommendation engines, and anomaly detection.

What is an Embedding?

An embedding is a list of numbers (a vector) that represents the meaning of a piece of text, an image, or any other kind of data.

Words or sentences with similar meanings have similar vectors.
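To make "similar vectors" concrete, here is a minimal sketch that scores three toy 4-dimensional vectors with cosine similarity, the measure used later in this tutorial. The numbers are made up for illustration (real embeddings have hundreds of dimensions), and the `cosine` helper name is hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors, made up for illustration; not output from a real model.
cat = [0.2, 0.8, -0.3, 0.1]
kitten = [0.3, 0.7, -0.2, 0.1]
car = [0.9, -0.1, 0.4, -0.5]

cat_kitten = cosine(cat, kitten)
cat_car = cosine(cat, car)
print(f"cat vs kitten: {cat_kitten:.2f}")  # 0.98
print(f"cat vs car:    {cat_car:.2f}")     # -0.07
```

A score near 1 means the vectors point in almost the same direction; a score near 0 means they are unrelated.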

"cat"     → [0.2, 0.8, -0.3, 0.1, ...]  (128+ numbers)
"kitten"  → [0.3, 0.7, -0.2, 0.1, ...]  (nearby in vector space)
"car"     → [0.9, -0.1, 0.4, -0.5, ...]  (far from cat)

Creating Embeddings with OpenAI

from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Machine learning is fascinating"
)

embedding = response.data[0].embedding
print(f"Dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")

Creating Embeddings with Sentence Transformers

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
embedding = model.encode("Machine learning is fascinating")
print(f"Dimension: {len(embedding)}")

Comparing Similarity

import numpy as np
from sentence_transformers import SentenceTransformer

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "I love programming in Python",
    "Python is a great programming language",
    "I enjoy eating pizza",
]
embeddings = model.encode(sentences)

sim_1_2 = cosine_similarity(embeddings[0], embeddings[1])
sim_1_3 = cosine_similarity(embeddings[0], embeddings[2])

# Exact scores depend on the model; the two Python sentences should score clearly higher.
print(f"Python vs Python: {sim_1_2:.2f}")  # Higher
print(f"Python vs Pizza: {sim_1_3:.2f}")   # Lower
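
When vectors are scaled to length 1, cosine similarity reduces to a plain dot product, which is why it is common to normalize embeddings once and then compare them with dot products. The sketch below checks that equivalence on made-up 3-dimensional vectors; the `normalize`, `dot`, and `cosine` helpers are hypothetical names, not part of the libraries above.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to length 1 (assumes v is not all zeros)."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Made-up vectors standing in for real embeddings.
a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]

full = cosine(a, b)                      # cosine on the raw vectors
fast = dot(normalize(a), normalize(b))   # dot product on unit-length vectors
print(f"cosine: {full:.4f}, dot of unit vectors: {fast:.4f}")
```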

Applications

Use Case	How Embeddings Help
Semantic search	Find documents by meaning, not keywords
RAG	Retrieve relevant context for an LLM
Clustering	Group similar documents automatically
Recommendations	"More like this" suggestions
Classification	Train classifiers on embedded text
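
As a rough sketch of the semantic search row, the snippet below ranks documents by cosine similarity to a query vector. The vectors are hand-written stand-ins; in practice the query and the documents would be embedded with the same model, such as the ones shown earlier. The `search` function and the sample document names are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(query_vec, doc_vecs, k=2):
    """Return the k (name, score) pairs most similar to the query, best first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hand-written 3-dimensional stand-ins for real document embeddings.
doc_vecs = {
    "python_tutorial": [0.9, 0.1, 0.0],
    "pizza_recipe": [0.0, 0.2, 0.9],
    "java_guide": [0.8, 0.3, 0.1],
}
query_vec = [1.0, 0.2, 0.0]  # pretend embedding of "learn a programming language"

results = search(query_vec, doc_vecs, k=2)
for name, score in results:
    print(f"{name}: {score:.2f}")
```

The same ranking step is a reasonable starting point for RAG retrieval and "more like this" recommendations; a linear scan like this one is only practical for small collections.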
