
Volcano Plot in Matplotlib — Complete Guide

DodaTech Updated 2025-01-15 2 min read

You will learn how to create a volcano plot for differential expression analysis using Matplotlib.

The Problem

Visualizing differentially expressed genes requires a plot that combines the magnitude of change (fold change) with statistical significance (p-value). A poorly constructed volcano plot hides the relevant genes.

The Wrong Way

Plotting the points without marking the significant genes and without clear thresholds:

import matplotlib.pyplot as plt
import numpy as np
log2fc = np.random.normal(0, 1.5, 5000)
pval = -np.log10(np.random.uniform(0, 1, 5000))
plt.scatter(log2fc, pval, s=5)
plt.show()

What happens: up-regulated genes cannot be told apart from down-regulated ones, and the significance thresholds are missing.

The Right Way

Add fold change and p-value thresholds, and color the significant genes:

import matplotlib.pyplot as plt
import numpy as np

# Simulated data for illustration; use real log2FC and p-values in practice
log2fc = np.random.normal(0, 1.5, 5000)
neg_log_pval = -np.log10(np.random.uniform(0, 1, 5000))

# Grey by default; red (up) or blue (down) for genes passing both thresholds
colors = ['grey'] * len(log2fc)
for i in range(len(log2fc)):
    if abs(log2fc[i]) > 1 and neg_log_pval[i] > -np.log10(0.05):
        colors[i] = 'red' if log2fc[i] > 0 else 'blue'

plt.figure(figsize=(8, 6))
plt.scatter(log2fc, neg_log_pval, c=colors, s=8, alpha=0.6)
# Dashed lines mark the fold change and p-value thresholds
plt.axvline(-1, color='grey', linestyle='--', linewidth=0.8)
plt.axvline(1, color='grey', linestyle='--', linewidth=0.8)
plt.axhline(-np.log10(0.05), color='grey', linestyle='--', linewidth=0.8)
plt.xlabel('log2 Fold Change')
plt.ylabel('-log10(p-value)')
plt.title('Volcano Plot - Differential Expression')
plt.tight_layout()
plt.show()

Expected result: up-regulated genes (red) and down-regulated genes (blue) are clearly highlighted, with visible thresholds.

Step-by-Step Fix

1. Compute the fold change and adjusted p-value

Make sure you use the adjusted p-value (FDR/Benjamini-Hochberg), not the raw p-value.
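One way to obtain adjusted p-values is the Benjamini-Hochberg procedure. The sketch below is a minimal NumPy version, assuming the raw p-values sit in a 1-D array; the helper name `bh_adjust` and the sample values are hypothetical and used only for illustration. In practice a tested library routine, such as `statsmodels.stats.multitest.multipletests(pvals, method='fdr_bh')`, is usually the safer choice.

```python
import numpy as np

def bh_adjust(pvals):
    """Return Benjamini-Hochberg adjusted p-values (hypothetical helper)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)                          # indices that sort p ascending
    ranked = p[order] * n / np.arange(1, n + 1)    # p_(i) * n / i for rank i
    # Enforce monotonicity: each value becomes the minimum of itself
    # and every value at a higher rank.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0, 1)        # back to input order, capped at 1
    return adjusted

raw = np.array([0.001, 0.008, 0.039, 0.041, 0.6])  # illustrative values only
padj = bh_adjust(raw)
```

The adjusted values, not the raw ones, are what should be compared against the 0.05 cutoff and plotted on the Y axis as -log10.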

2. Define the significance criteria

fc_threshold = 1  # |log2FC| > 1
pval_threshold = 0.05  # p-adjusted < 0.05
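These two thresholds can be applied to whole arrays at once instead of gene by gene. The sketch below assumes `log2fc` and `padj` (a hypothetical name for the adjusted p-values) are NumPy arrays of equal length; the values are made up for illustration.

```python
import numpy as np

fc_threshold = 1       # |log2FC| > 1
pval_threshold = 0.05  # p-adjusted < 0.05

# Illustrative values only, not real data.
log2fc = np.array([2.3, -1.8, 0.4, 1.5, -0.2])
padj = np.array([0.001, 0.02, 0.0001, 0.30, 0.80])

# A gene is significant only if it passes BOTH thresholds.
significant = (np.abs(log2fc) > fc_threshold) & (padj < pval_threshold)
up = significant & (log2fc > 0)
down = significant & (log2fc < 0)

# np.select assigns the label of the first matching condition;
# genes matching neither condition get the default.
colors = np.select([up, down], ['red', 'blue'], default='grey')
```

The resulting labels can be passed to `plt.scatter` as `c=colors.tolist()`. Note that the third gene has a very small p-value but stays grey because its fold change is below the threshold.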

3. Color and annotate the genes

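# Assumes log2fc, neg_log_pval (-log10 of the adjusted p-values) and gene_names
# are aligned sequences of the same length
for i, (fc, p) in enumerate(zip(log2fc, neg_log_pval)):
    if abs(fc) > fc_threshold and p > -np.log10(pval_threshold):
        # Labels only up-regulated genes; drop this check to label both directions
        if fc > 0:
            plt.annotate(gene_names[i], (fc, p),
                         fontsize=6, alpha=0.7)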

Prevention Tips

  • Use the adjusted p-value (FDR), not the raw p-value, to determine significance
  • State the fold change and p-value thresholds clearly in the legend
  • Annotate the top genes (the most significant ones) to provide biological context
  • Use distinct colors for up-regulated, down-regulated, and non-significant genes
  • Draw the threshold lines in a dashed style for clarity
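The tips above can be combined in one figure. The following is a sketch under stated assumptions, not a definitive implementation: the data are simulated with a seeded generator, `padj` stands in for adjusted p-values, and `gene_names` and `top_n` are hypothetical names introduced here. It draws one scatter per group so each gets a legend entry, puts the thresholds in the legend title, and labels only the few most significant genes.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)             # seeded so the sketch is repeatable
log2fc = rng.normal(0, 1.5, 500)           # simulated values, not real data
padj = rng.uniform(0, 1, 500)              # stand-in for adjusted p-values
neg_log_padj = -np.log10(padj)
gene_names = [f"gene_{i}" for i in range(500)]  # hypothetical labels

fc_threshold, pval_threshold = 1, 0.05
sig = (np.abs(log2fc) > fc_threshold) & (padj < pval_threshold)
up, down = sig & (log2fc > 0), sig & (log2fc < 0)

fig, ax = plt.subplots(figsize=(8, 6))
# One scatter call per group so each group gets its own legend entry.
ax.scatter(log2fc[~sig], neg_log_padj[~sig], c='grey', s=8, alpha=0.6,
           label='not significant')
ax.scatter(log2fc[up], neg_log_padj[up], c='red', s=8, alpha=0.6, label='up')
ax.scatter(log2fc[down], neg_log_padj[down], c='blue', s=8, alpha=0.6, label='down')

# Dashed threshold lines.
for x in (-fc_threshold, fc_threshold):
    ax.axvline(x, color='grey', linestyle='--', linewidth=0.8)
ax.axhline(-np.log10(pval_threshold), color='grey', linestyle='--', linewidth=0.8)

# Label only the top_n significant genes with the smallest adjusted p-value.
top_n = 5
sig_idx = np.flatnonzero(sig)
top_idx = sig_idx[np.argsort(padj[sig_idx])[:top_n]]
for i in top_idx:
    ax.annotate(gene_names[i], (log2fc[i], neg_log_padj[i]), fontsize=6, alpha=0.7)

ax.set_xlabel('log2 Fold Change')
ax.set_ylabel('-log10(adjusted p-value)')
ax.legend(title=f'|log2FC| > {fc_threshold}, p-adj < {pval_threshold}')
fig.tight_layout()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result. Capping the labels at `top_n` keeps the plot readable even when many genes pass the thresholds.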

Common Mistakes

  1. Using the raw p-value instead of the adjusted p-value, which produces many false positives
  2. Not distinguishing up-regulation from down-regulation, which loses the biological difference
  3. Annotating too many genes, which makes the plot unreadable
  4. Choosing arbitrary thresholds without biological justification
  5. Not reporting the number of significant genes in each direction
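The last mistake is cheap to avoid, because the counts fall out of the same boolean masks used for coloring. A minimal sketch, assuming `log2fc` and `padj` arrays (illustrative values, `padj` being a hypothetical name for the adjusted p-values) and the thresholds used throughout this guide:

```python
import numpy as np

# Illustrative values only, not real data.
log2fc = np.array([2.3, -1.8, 0.4, 1.5, -0.2, -3.1])
padj = np.array([0.001, 0.02, 0.0001, 0.30, 0.80, 0.004])

sig = (np.abs(log2fc) > 1) & (padj < 0.05)
n_up = int(np.sum(sig & (log2fc > 0)))      # significant and increased
n_down = int(np.sum(sig & (log2fc < 0)))    # significant and decreased
n_ns = len(log2fc) - n_up - n_down          # everything else

summary = f"{n_up} up-regulated, {n_down} down-regulated, {n_ns} not significant"
print(summary)
```

The `summary` string can be reused in the plot title or legend so the counts travel with the figure.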

Practice Exercise

Analyze an RNA-seq dataset, identify the differentially expressed genes (|log2FC| > 1, p-adj < 0.05), and create the corresponding volcano plot.

FAQ

What is a volcano plot?

A scatter plot that shows log2 fold change on the X axis and -log10(p-value) on the Y axis. Significant genes appear in the upper-left and upper-right corners.

What does up/down-regulated mean?

Up-regulated = increased expression in the test condition (log2FC > 0). Down-regulated = decreased expression (log2FC < 0).

What fold change threshold should I use?

|log2FC| > 1 (equivalent to a 2-fold change in either direction) is a widely used default in biology, though the appropriate cutoff depends on the experiment.
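The arithmetic behind that equivalence is a base-2 logarithm: doubling gives +1, halving gives -1, and no change gives 0. A short check with illustrative fold change values:

```python
import numpy as np

# Ratios of test expression to control expression (illustrative values).
fold_changes = np.array([0.25, 0.5, 1.0, 2.0, 4.0])
log2fc = np.log2(fold_changes)

# |log2FC| > 1 therefore keeps ratios above 2x or below 0.5x.
passes = np.abs(log2fc) > 1
```

Because the scale is symmetric, a gene that is halved sits as far to the left of zero as a gene that is doubled sits to the right.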

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro. DodaTech tools integrate seamlessly with Python Data Science workflows for enhanced productivity and security.
