Volcano Plot in Matplotlib — Complete Guide
You will learn how to create a volcano plot for differential expression analysis using Matplotlib.
The Problem
Visualizing differentially expressed genes requires a plot that combines the magnitude of change (fold change) with statistical significance (p-value). A poorly constructed volcano plot hides the relevant genes.
The Wrong Way
Plotting the points without marking the significant genes and without clear thresholds:
import matplotlib.pyplot as plt
import numpy as np
log2fc = np.random.normal(0, 1.5, 5000)
pval = -np.log10(np.random.uniform(0, 1, 5000))
plt.scatter(log2fc, pval, s=5)
plt.show()
What happens: Up-regulated genes cannot be told apart from down-regulated ones, and the significance thresholds are missing.
The Right Way
Add fold change and p-value thresholds, and color the significant genes:
import matplotlib.pyplot as plt
import numpy as np

# Simulated data, for illustration only
log2fc = np.random.normal(0, 1.5, 5000)
pval = -np.log10(np.random.uniform(0, 1, 5000))  # holds -log10(p-value)

# Grey by default; red (up) or blue (down) for genes passing both thresholds
colors = ['grey'] * len(log2fc)
for i in range(len(log2fc)):
    if abs(log2fc[i]) > 1 and pval[i] > -np.log10(0.05):
        colors[i] = 'red' if log2fc[i] > 0 else 'blue'

plt.figure(figsize=(8, 6))
plt.scatter(log2fc, pval, c=colors, s=8, alpha=0.6)
# Dashed lines mark the fold change and p-value thresholds
plt.axvline(-1, color='grey', linestyle='--', linewidth=0.8)
plt.axvline(1, color='grey', linestyle='--', linewidth=0.8)
plt.axhline(-np.log10(0.05), color='grey', linestyle='--', linewidth=0.8)
plt.xlabel('log2 Fold Change')
plt.ylabel('-log10(p-value)')
plt.title('Volcano Plot - Differential Expression')
plt.tight_layout()
plt.show()
Expected result: Up-regulated (red) and down-regulated (blue) genes are clearly highlighted, with visible thresholds.
Step-by-Step Fix
1. Compute fold change and adjusted p-value
Make sure you use the adjusted p-value (FDR/Benjamini-Hochberg), not the raw p-value.
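To make the adjustment concrete, here is a minimal NumPy sketch of the Benjamini-Hochberg procedure. The function name `benjamini_hochberg` and the five raw p-values are made up for illustration. In practice most differential expression tools report an adjusted p-value column already, and `statsmodels.stats.multitest.multipletests` with `method='fdr_bh'` offers a maintained implementation.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)                          # indices that sort p ascending
    scaled = p[order] * n / np.arange(1, n + 1)    # p_(i) * n / i for rank i
    # Enforce monotonicity starting from the largest p-value, then cap at 1.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0, 1)        # restore the original order
    return adjusted

# Hypothetical raw p-values, for illustration only
raw_p = np.array([0.001, 0.008, 0.039, 0.041, 0.60])
padj = benjamini_hochberg(raw_p)
print(padj)
```

With these example values, the third and fourth genes pass a raw 0.05 cutoff (0.039 and 0.041) but not the adjusted one (both become 0.05125), which is the kind of false positive the adjustment is meant to filter out.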
2. Define the significance criteria
fc_threshold = 1 # |log2FC| > 1
pval_threshold = 0.05 # p-adjusted < 0.05
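One possible way to apply these two thresholds without a Python loop is a vectorized mask. The sketch below assumes the adjusted p-values are available as an array; the helper name `assign_colors` and the four sample genes are hypothetical.

```python
import numpy as np

fc_threshold = 1        # |log2FC| > 1
pval_threshold = 0.05   # adjusted p-value < 0.05

def assign_colors(log2fc, padj, fc_threshold=1, pval_threshold=0.05):
    """Return one color per gene: red (up), blue (down), grey (not significant)."""
    log2fc = np.asarray(log2fc, dtype=float)
    padj = np.asarray(padj, dtype=float)
    significant = padj < pval_threshold
    up = significant & (log2fc > fc_threshold)
    down = significant & (log2fc < -fc_threshold)
    return np.where(up, "red", np.where(down, "blue", "grey"))

# Hypothetical values, for illustration only
colors = assign_colors([2.3, -1.8, 0.4, 3.0], [0.001, 0.01, 0.001, 0.20],
                       fc_threshold, pval_threshold)
print(colors)
```

The resulting array can be passed as `c=colors` to `plt.scatter`. Note that a gene must pass both criteria: the third gene is highly significant but changes too little, and the fourth changes a lot but is not significant, so both stay grey.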
3. Color and annotate the genes
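# gene_names and neg_log_pval (-log10 of the adjusted p-values) are assumed to be defined
for i, (fc, p) in enumerate(zip(log2fc, neg_log_pval)):
    if abs(fc) > fc_threshold and p > -np.log10(pval_threshold):
        if fc > 0:  # as written, only up-regulated genes are labeled
            plt.annotate(gene_names[i], (fc, p),
                         fontsize=6, alpha=0.7)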
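Labeling every gene that passes the thresholds can quickly crowd the plot. A possible refinement, sketched here under the assumption that "top" means the largest -log10(p) among genes passing both thresholds, is to pick the indices to annotate first. The helper `top_gene_indices` and the sample arrays are hypothetical.

```python
import numpy as np

def top_gene_indices(log2fc, neg_log_pval, n_top=10,
                     fc_threshold=1, pval_threshold=0.05):
    """Indices of the n_top most significant genes that pass both thresholds."""
    log2fc = np.asarray(log2fc, dtype=float)
    neg_log_pval = np.asarray(neg_log_pval, dtype=float)
    passes = (np.abs(log2fc) > fc_threshold) & \
             (neg_log_pval > -np.log10(pval_threshold))
    candidates = np.flatnonzero(passes)
    # Order candidates by descending -log10(p) and keep the first n_top.
    ranked = candidates[np.argsort(-neg_log_pval[candidates])]
    return ranked[:n_top]

# Hypothetical values, for illustration only
log2fc = np.array([2.5, -3.0, 0.2, 1.5, -1.2])
neg_log_pval = np.array([4.0, 6.0, 8.0, 2.0, 0.5])
top = top_gene_indices(log2fc, neg_log_pval, n_top=2)
print(top)
```

The annotation loop can then iterate over `top` only, calling `plt.annotate(gene_names[i], (log2fc[i], neg_log_pval[i]))` for each index, which labels both up- and down-regulated genes while keeping the label count bounded.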
Prevention Tips
- Use the adjusted p-value (FDR), not the raw p-value, to determine significance
- State the fold change and p-value thresholds clearly in the legend
- Annotate the top genes (the most significant ones) to provide biological context
- Use distinct colors for up-regulated, down-regulated, and non-significant genes
- Draw the threshold lines in a dashed style for clarity
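Several of these tips can be combined in one plotting helper. The sketch below is one way to do it, not the only one: it draws each group with its own `scatter` call so that Matplotlib can build a legend, and puts the thresholds in the legend title. The function name `volcano_with_legend` and the synthetic data are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def volcano_with_legend(log2fc, padj, fc_threshold=1, pval_threshold=0.05):
    """Volcano plot with one legend entry per group and dashed threshold lines."""
    log2fc = np.asarray(log2fc, dtype=float)
    neg_log_p = -np.log10(np.asarray(padj, dtype=float))  # assumes padj > 0
    sig = neg_log_p > -np.log10(pval_threshold)
    up = sig & (log2fc > fc_threshold)
    down = sig & (log2fc < -fc_threshold)
    groups = [("not significant", ~(up | down), "grey"),
              ("up-regulated", up, "red"),
              ("down-regulated", down, "blue")]

    fig, ax = plt.subplots(figsize=(8, 6))
    for name, mask, color in groups:
        ax.scatter(log2fc[mask], neg_log_p[mask], color=color,
                   s=8, alpha=0.6, label=name)
    for x in (-fc_threshold, fc_threshold):
        ax.axvline(x, color="grey", linestyle="--", linewidth=0.8)
    ax.axhline(-np.log10(pval_threshold), color="grey",
               linestyle="--", linewidth=0.8)
    ax.set_xlabel("log2 Fold Change")
    ax.set_ylabel("-log10(adjusted p-value)")
    ax.legend(title=f"|log2FC| > {fc_threshold}, p-adj < {pval_threshold}")
    fig.tight_layout()
    return fig, ax

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
demo_fc = rng.normal(0, 1.5, 500)
demo_padj = rng.uniform(1e-4, 1, 500)
fig, ax = volcano_with_legend(demo_fc, demo_padj)
```

Call `plt.show()` or `fig.savefig(...)` afterwards to display or save the figure. Returning `fig` and `ax` instead of showing the plot inside the function leaves room to add gene annotations before rendering.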
Common Mistakes
- Using the raw p-value instead of the adjusted p-value: many false positives
- Not distinguishing up from down: the biological difference is lost
- Annotating too many genes: the plot becomes unreadable
- Arbitrary thresholds with no biological justification
- Not reporting the number of significant genes in each direction
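The last point is cheap to address. As a rough sketch, the counts per direction can be computed from the same thresholds and then placed in the plot title or figure caption. The helper `count_significant` and the five sample genes are hypothetical.

```python
import numpy as np

def count_significant(log2fc, padj, fc_threshold=1, pval_threshold=0.05):
    """Count up- and down-regulated genes that pass both thresholds."""
    log2fc = np.asarray(log2fc, dtype=float)
    padj = np.asarray(padj, dtype=float)
    significant = padj < pval_threshold
    n_up = int(np.sum(significant & (log2fc > fc_threshold)))
    n_down = int(np.sum(significant & (log2fc < -fc_threshold)))
    return {"up": n_up, "down": n_down, "total": int(log2fc.size)}

# Hypothetical values, for illustration only
counts = count_significant([2.3, -1.8, 0.4, 3.0, -2.2],
                           [0.001, 0.01, 0.001, 0.20, 0.03])
summary = f"{counts['up']} up, {counts['down']} down of {counts['total']} genes"
print(summary)
```

The `summary` string can be passed to `plt.title` so the numbers travel with the figure.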
Practice Exercise
Analyze an RNA-seq dataset, identify the differentially expressed genes (|log2FC| > 1, p-adj < 0.05), and create the corresponding volcano plot.