
Python for Data Science — Complete Setup Guide

DodaTech

This tutorial walks through setting up Python for data science: installing a distribution, creating an isolated environment, running Jupyter, and confirming that the core libraries import correctly.

What You'll Learn

Set up a complete Python Data Science environment — install Anaconda, Jupyter, and the essential libraries: pandas, NumPy, Matplotlib, and Seaborn.

Why It Matters

A properly configured environment eliminates "library not found" errors and lets you focus on data analysis instead of tooling.

Real-World Use

Starting a new data analysis project, following along with tutorials, or setting up a reproducible research environment.

Installation Options

Option 1: Anaconda (Full Distribution)

# Download the installer from https://www.anaconda.com/download
# Or install from the command line (Linux x86_64):
wget https://repo.anaconda.com/archive/Anaconda3-2024.10-1-Linux-x86_64.sh
bash Anaconda3-2024.10-1-Linux-x86_64.sh

# Verify
conda --version

Option 2: Miniconda (Lightweight)

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

# Install data science packages
conda install pandas numpy matplotlib seaborn scikit-learn jupyter

Option 3: pip (If you already have Python)

pip install pandas numpy matplotlib seaborn scikit-learn jupyter

Creating a Data Science Environment

# Create and activate a dedicated environment
conda create -n datascience python=3.12
conda activate datascience

# Install core libraries
conda install pandas numpy matplotlib seaborn scikit-learn jupyter

# Verify installation
python -c "import pandas; import numpy; import matplotlib; print('All good!')"
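The one-line check above stops at the first failed import and covers only three libraries. A slightly fuller check, sketched below using only the standard library, reports every package in one pass. The `PACKAGES` mapping and the `check_packages` helper are illustrative names, not part of conda or any of the libraries.

```python
from importlib.metadata import PackageNotFoundError, version
from importlib.util import find_spec

# Import name -> distribution name (they differ for scikit-learn).
# This mapping is an illustrative assumption; extend it as needed.
PACKAGES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "sklearn": "scikit-learn",
}


def check_packages(packages):
    """Return {import_name: version string, "unknown", or None if missing}."""
    report = {}
    for import_name, dist_name in packages.items():
        # find_spec locates the module without importing it.
        if find_spec(import_name) is None:
            report[import_name] = None
            continue
        try:
            report[import_name] = version(dist_name)
        except PackageNotFoundError:
            # Importable, but no installed distribution metadata was found.
            report[import_name] = "unknown"
    return report


for name, ver in check_packages(PACKAGES).items():
    print(f"{name:12} {ver if ver else 'MISSING'}")
```

Run it inside the activated environment. Any line marked MISSING points to a package that still needs to be installed there.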

Running Jupyter

# Start Jupyter Notebook
jupyter notebook

# Or Jupyter Lab (newer interface)
jupyter lab

Your First Notebook

Create a new notebook and run:

import pandas as pd
import numpy as np
import matplotlib  # needed so matplotlib.__version__ resolves below
import matplotlib.pyplot as plt
import seaborn as sns

print(f"pandas: {pd.__version__}")
print(f"numpy: {np.__version__}")
print(f"matplotlib: {matplotlib.__version__}")
print(f"seaborn: {sns.__version__}")

# Example output (your version numbers will likely differ):
# pandas: 2.2.0
# numpy: 1.26.0
# matplotlib: 3.8.0
# seaborn: 0.13.0

Project Structure

data-science-project/
├── data/
│   ├── raw/         # Original, unmodified data
│   └── processed/   # Cleaned, transformed data
├── notebooks/       # Jupyter notebooks (.ipynb)
├── src/             # Python scripts (.py)
│   ├── load.py
│   ├── clean.py
│   └── visualize.py
├── outputs/         # Charts, reports
│   └── figures/
├── README.md
└── requirements.txt
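The layout above can be created by hand, or with a short script. The sketch below uses `pathlib` and creates empty placeholder files. The `scaffold` function and the two constant lists are hypothetical names chosen for this example.

```python
from pathlib import Path

# Directories and placeholder files taken from the tree above.
DIRECTORIES = [
    "data/raw",
    "data/processed",
    "notebooks",
    "src",
    "outputs/figures",
]
FILES = [
    "src/load.py",
    "src/clean.py",
    "src/visualize.py",
    "README.md",
    "requirements.txt",
]


def scaffold(root):
    """Create the project skeleton under root and return root as a Path."""
    root = Path(root)
    for directory in DIRECTORIES:
        # parents=True creates intermediate folders; exist_ok makes reruns safe.
        (root / directory).mkdir(parents=True, exist_ok=True)
    for file_name in FILES:
        # touch() creates an empty file and leaves existing content untouched.
        (root / file_name).touch(exist_ok=True)
    return root
```

Calling `scaffold("data-science-project")` from the folder where the project should live builds the whole tree. Running it again does not overwrite existing files.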

Essential Libraries

Library       Purpose                       Used For
pandas        Data structures and analysis  DataFrames, CSV, Excel
NumPy         Numerical computing           Arrays, math operations
Matplotlib    Basic plotting                Charts, histograms
Seaborn       Statistical visualization     Beautiful default plots
scikit-learn  Machine Learning              Models, preprocessing
SciPy         Scientific computing          Statistics, optimization
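For a reproducible setup, it can help to record the exact versions of these libraries in the project's requirements.txt. The sketch below is one possible approach using the standard library's `importlib.metadata`. The `pinned_requirements` helper is a hypothetical name, and the package list assumes the pip distribution names shown in the install commands plus `scipy`.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(dist_names):
    """Return (pinned, missing): 'name==version' lines and names not installed."""
    pinned, missing = [], []
    for name in dist_names:
        try:
            pinned.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            missing.append(name)
    return pinned, missing


pinned, missing = pinned_requirements(
    ["pandas", "numpy", "matplotlib", "seaborn", "scikit-learn", "scipy"]
)
print("\n".join(pinned))
if missing:
    print("# not installed:", ", ".join(missing))
```

Redirecting the output to requirements.txt gives a file that `pip install -r requirements.txt` can consume. In a conda environment, `conda env export` is an alternative that also captures non-Python dependencies.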
