Dimensionality Reduction — PCA, t-SNE, and UMAP Explained

DodaTech 4 min read

In this tutorial, you'll learn about Dimensionality Reduction. We cover key concepts, practical examples, and best practices to help you understand and apply this topic effectively.

Dimensionality reduction transforms high-dimensional data into a lower-dimensional representation while preserving meaningful properties, enabling visualization, noise reduction, and faster model training.

What You'll Learn

How to apply PCA for linear dimensionality reduction and feature extraction, t-SNE for visualization of high-dimensional clusters, and UMAP for scalable, structure-preserving embeddings.

Why It Matters

Real-world datasets often have hundreds or thousands of features. The curse of dimensionality makes many models less effective as the number of dimensions grows: data becomes sparse and distances between points become less informative. Reducing dimensions can improve training speed and, in some cases, accuracy and interpretability, while removing redundant features.
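
One way to see the effect is to measure how much the distances from one point to all others differ as the number of dimensions grows. The sketch below is only an illustration on uniform random data; the helper name relative_contrast and the chosen sizes are assumptions made for this example, not part of any library.

```python
import numpy as np

def relative_contrast(n_dims, n_points=500, seed=0):
    """Return (max - min) / min of the distances from the first point to all others."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))  # uniform samples in the unit hypercube
    dists = np.linalg.norm(points[1:] - points[0], axis=1)
    return (dists.max() - dists.min()) / dists.min()

# Hypothetical dimension counts chosen only to show the trend
contrasts = {d: relative_contrast(d) for d in (2, 10, 100, 1000)}
for d, c in contrasts.items():
    print(f"{d:>4} dims: relative contrast = {c:.2f}")
```

As the contrast shrinks, the nearest and farthest neighbors look increasingly alike, which is one reason distance-based models tend to struggle in high dimensions.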

Real-World Use

DodaZIP uses PCA to compress feature vectors for file-type classification, reducing 200+ file attributes to 20 principal components while maintaining 98% classification accuracy. Durga Antivirus Pro uses UMAP to visualize malware family clusters for threat analysis.

Dimensionality Reduction Landscape

flowchart TD
    A[Dimensionality Reduction] --> B[Linear]
    A --> C[Non-Linear]
    B --> D[PCA]
    B --> E[SVD]
    B --> F[LDA]
    C --> G[t-SNE]
    C --> H[UMAP]
    C --> I[Autoencoders]
    D --> J[Fast, interpretable]
    D --> K[Global structure]
    G --> L[Excellent visualization]
    G --> M[Slow, non-deterministic]
    H --> N[Fast, scalable]
    H --> O[Preserves local and some global structure]

Principal Component Analysis (PCA)

from sklearn.decomposition import PCA
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
import numpy as np

digits = load_digits()
X, y = digits.data, digits.target

print(f"Original shape: {X.shape}")
print(f"Number of features: {X.shape[1]}")

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

pca = PCA()
pca.fit(X_scaled)

cumulative_variance = np.cumsum(pca.explained_variance_ratio_)
n_components_95 = np.argmax(cumulative_variance >= 0.95) + 1
n_components_99 = np.argmax(cumulative_variance >= 0.99) + 1

print(f"Components for 95% variance: {n_components_95}")
print(f"Components for 99% variance: {n_components_99}")
print(f"\nFirst 5 explained variance ratios: {pca.explained_variance_ratio_[:5].round(4)}")

Expected output (illustrative; exact values depend on preprocessing and library version):

Original shape: (1797, 64)
Number of features: 64
Components for 95% variance: 29
Components for 99% variance: 41

First 5 explained variance ratios: [0.1452 0.1066 0.0807 0.0685 0.056 ]

PCA reduces 64 pixel features to just 29 components while retaining 95% of the variance. Each component is a weighted combination of the original pixels, and the components are ordered by how much variance they explain.
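
To make "weighted combination" concrete, PCA can be reproduced with a plain SVD of the centered data. This is a minimal sketch on synthetic data (the latent and mixing matrices are made up for illustration), not how scikit-learn is necessarily implemented internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a feature matrix: 200 samples, 5 correlated features
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 5))

# Center the data, then decompose it
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

components = Vt                                  # each row: weights over the original features
explained_variance_ratio = S**2 / np.sum(S**2)   # share of variance per component

# Projecting onto the first two components is a weighted sum of the original features
scores = X_centered @ components[:2].T

print("Weights of the first component:", components[0].round(3))
print("Explained variance ratios:", explained_variance_ratio.round(3))
```

The rows of components play the same role as pca.components_ in scikit-learn, up to a possible sign flip per component.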

pca_2d = PCA(n_components=2)
X_pca_2d = pca_2d.fit_transform(X_scaled)

print("First 10 points in 2D PCA space:")
print(X_pca_2d[:10].round(3))
print(f"\nVariance preserved: {pca_2d.explained_variance_ratio_.sum():.3f}")

recovery_error = np.mean((X_scaled - pca_2d.inverse_transform(X_pca_2d)) ** 2)
print(f"Reconstruction error (MSE): {recovery_error:.4f}")

Expected output:

First 10 points in 2D PCA space:
[[ -1.259   6.161]
 [ 21.797   4.989]
 [ -4.683  -2.507]
 [  1.114 -14.688]
 [  6.159   6.987]
 [ 15.078 -11.384]
 [ -9.441   7.181]
 [ 20.615   4.717]
 [  3.008 -15.806]
 [ -9.398  -2.915]]

Variance preserved: 0.283
Reconstruction error (MSE): 0.7034

2D PCA preserves only 28% of the variance, which is enough for rough visualization but loses fine-grained details.
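
How much detail is lost depends on the number of components kept. The following sketch, under the same preprocessing as above, sweeps a few arbitrary component counts and reports the reconstruction error for each; the specific values of k are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_digits().data)

errors = {}
for k in (2, 8, 16, 32, 64):
    pca_k = PCA(n_components=k).fit(X_scaled)
    # Project down to k components, then map back to the original 64 features
    X_back = pca_k.inverse_transform(pca_k.transform(X_scaled))
    errors[k] = float(np.mean((X_scaled - X_back) ** 2))
    print(f"k={k:>2}: reconstruction MSE = {errors[k]:.4f}")
```

With all 64 components kept, the projection is lossless up to floating-point error, so the last value should be essentially zero.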

t-SNE for Visualization

from sklearn.manifold import TSNE

tsne = TSNE(n_components=2, random_state=42, perplexity=30)
X_tsne = tsne.fit_transform(X_scaled)

print("First 10 points in t-SNE space:")
print(X_tsne[:10].round(3))
print(f"\nt-SNE final KL divergence: {tsne.kl_divergence_:.3f}")

Expected output (exact coordinates vary across scikit-learn versions and platforms):

First 10 points in t-SNE space:
[[ -3.212  -5.548]
 [  6.634  26.591]
 [ -7.914  10.361]
 [-10.382  12.749]
 [  3.917 -18.721]
 [ 15.637  -9.349]
 [ -8.833  -8.278]
 [  7.822  24.522]
 [ -6.798  16.331]
 [  9.541 -14.597]]

t-SNE final KL divergence: 0.843

t-SNE focuses on preserving local neighborhoods (nearby points stay nearby) at the expense of global structure and distances.
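
Local neighborhood preservation can be quantified rather than judged by eye. The sketch below uses scikit-learn's trustworthiness score (between 0 and 1, higher means neighbors in the embedding were also neighbors in the original space) to compare PCA and t-SNE. The 300-sample subset and n_neighbors=10 are assumptions chosen to keep the example quick.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, trustworthiness
from sklearn.preprocessing import StandardScaler

# Small subset so the example runs in a few seconds
X_small = StandardScaler().fit_transform(load_digits().data[:300])

X_pca = PCA(n_components=2).fit_transform(X_small)
X_tsne = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(X_small)

# Fraction-like score of how well 10-nearest-neighbor relations survive the embedding
trust_pca = trustworthiness(X_small, X_pca, n_neighbors=10)
trust_tsne = trustworthiness(X_small, X_tsne, n_neighbors=10)

print(f"Trustworthiness, PCA 2D:   {trust_pca:.3f}")
print(f"Trustworthiness, t-SNE 2D: {trust_tsne:.3f}")
```

A high score says nothing about global layout: distances between well-separated clusters in a t-SNE plot should still not be read literally.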

UMAP

import umap  # provided by the umap-learn package (pip install umap-learn)

reducer = umap.UMAP(n_components=2, random_state=42)
X_umap = reducer.fit_transform(X_scaled)

print("First 10 points in UMAP space:")
print(X_umap[:10].round(3))
# graph_ is the sparse fuzzy neighbor graph; its sum is a total edge weight, not a distance
print(f"\nUMAP graph total edge weight: {reducer.graph_.sum():.2f}")

Expected output (exact values vary across umap-learn versions):

First 10 points in UMAP space:
[[-4.234 -1.876]
 [ 5.455  1.398]
 [-0.431  2.985]
 [ 1.127  6.223]
 [ 0.078 -4.876]
 [ 3.543 -7.812]
 [-5.321  0.123]
 [ 4.876  0.987]
 [ 2.123  5.654]
 [-1.654 -3.211]]

UMAP graph total edge weight: 3421.67

UMAP is typically faster than t-SNE on large datasets and often produces tighter, more distinct clusters.

PCA for Noise Reduction

from sklearn.datasets import load_iris

iris = load_iris()
X_iris = iris.data

print(f"Original first row: {X_iris[0]}")

pca_iris = PCA(n_components=2)
X_reduced = pca_iris.fit_transform(X_iris)
X_reconstructed = pca_iris.inverse_transform(X_reduced)

print(f"Reconstructed first row: {X_reconstructed[0].round(2)}")
reduction_noise = np.mean((X_iris - X_reconstructed) ** 2)
print(f"Reconstruction error: {reduction_noise:.4f}")

Expected output:

Original first row: [5.1 3.5 1.4 0.2]
Reconstructed first row: [5.08 3.51 1.39 0.22]
Reconstruction error: 0.0204

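The reconstruction is nearly identical to the original. The two discarded components carry only a small share of the variance; PCA-based denoising treats that low-variance part as noise, which is a reasonable assumption here but not guaranteed for every dataset.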
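
The iris data has no known clean version to compare against, so the denoising effect is easier to verify on synthetic data where the true signal is known. The sketch below builds a rank-2 signal, adds Gaussian noise, and projects back onto the top two components; all sizes and the noise level are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, rank = 500, 20, 2

# Known low-rank signal plus noise
clean = rng.normal(size=(n_samples, rank)) @ rng.normal(size=(rank, n_features))
noisy = clean + 0.5 * rng.normal(size=clean.shape)

# PCA via SVD: keep only the top `rank` directions and reconstruct
mean = noisy.mean(axis=0)
_, _, Vt = np.linalg.svd(noisy - mean, full_matrices=False)
top = Vt[:rank]
denoised = (noisy - mean) @ top.T @ top + mean

mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
print(f"MSE vs. clean signal, noisy data:    {mse_noisy:.4f}")
print(f"MSE vs. clean signal, after PCA:     {mse_denoised:.4f}")
```

The projection removes the noise that lies outside the signal subspace, which is most of it when the signal occupies only a few directions.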

Practice Questions

  1. What is the curse of dimensionality and how does PCA help address it?
  2. Why is t-SNE better than PCA for visualizing cluster structure?
  3. When would you choose UMAP over t-SNE?

Frequently Asked Questions

Should I scale data before PCA?

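Usually, yes. PCA is sensitive to the variance of each feature. If features are on different scales (e.g., age 0-100 vs income 0-1,000,000), PCA will be dominated by the large-scale feature, so use StandardScaler first. When all features already share the same units and comparable ranges (as with the iris measurements, all in centimeters), scaling is optional.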
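
The age and income case can be checked directly. This sketch generates two independent synthetic features on those scales (the data is made up) and compares the first principal direction before and after standardization. Standardization here uses the population standard deviation, which matches StandardScaler's default behavior.

```python
import numpy as np

rng = np.random.default_rng(0)
age = rng.uniform(0, 100, size=1000)
income = rng.uniform(0, 1_000_000, size=1000)
X = np.column_stack([age, income])

def first_component(data):
    """Return the first principal direction and its explained variance ratio."""
    centered = data - data.mean(axis=0)
    _, S, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[0], (S**2 / np.sum(S**2))[0]

raw_direction, raw_ratio = first_component(X)

X_std = (X - X.mean(axis=0)) / X.std(axis=0)  # zero mean, unit variance per feature
std_direction, std_ratio = first_component(X_std)

print(f"Unscaled: direction={raw_direction.round(4)}, variance ratio={raw_ratio:.4f}")
print(f"Scaled:   direction={std_direction.round(4)}, variance ratio={std_ratio:.4f}")
```

Without scaling, the first component points almost entirely along income; after scaling, the two independent features share the variance roughly equally.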

Can I interpret PCA components?

PCA components are linear combinations of all original features, so they lack direct interpretability unless the features have a strong spatial structure (like pixels). For numeric features, examine the loading magnitudes to see which original features contribute most.
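
As a rough sketch of examining loadings, the rows of pca.components_ can be ranked by absolute weight. Here "loading" is taken to mean the raw component weights; some texts scale them by the square root of the explained variance, which this example does not do.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_std = StandardScaler().fit_transform(iris.data)
pca = PCA(n_components=2).fit(X_std)

# For each component, list the original features from largest to smallest absolute weight
for i, weights in enumerate(pca.components_):
    order = np.argsort(-np.abs(weights))
    ranked = ", ".join(f"{iris.feature_names[j]} ({weights[j]:+.2f})" for j in order)
    print(f"PC{i + 1}: {ranked}")
```

The sign of a whole component is arbitrary, so compare magnitudes and relative signs within a component rather than the absolute sign.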

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.