Python for Data Science — Complete Setup Guide
DodaTech
This tutorial walks through setting up Python for data science: choosing an installer, creating an isolated environment, running Jupyter, and checking that the core libraries import correctly.
What You'll Learn
Set up a complete Python Data Science environment — install Anaconda, Jupyter, and the essential libraries: pandas, NumPy, Matplotlib, and Seaborn.
Why It Matters
A properly configured environment eliminates "library not found" errors and lets you focus on data analysis instead of tooling.
Real-World Use
Starting a new data analysis project, following along with tutorials, or setting up a reproducible research environment.
Installation Options
Option 1: Anaconda (Recommended for beginners)
# Download the installer from https://www.anaconda.com/download
# Or install from the command line (Linux x86_64 shown):
wget https://repo.anaconda.com/archive/Anaconda3-2024.10-1-Linux-x86_64.sh
bash Anaconda3-2024.10-1-Linux-x86_64.sh
# Open a new terminal so the updated PATH takes effect, then verify
conda --version
Option 2: Miniconda (Lightweight)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
# Install data science packages
conda install pandas numpy matplotlib seaborn scikit-learn jupyter
Option 3: pip (If you already have Python)
# Running pip through the interpreter installs into that interpreter's environment
python -m pip install pandas numpy matplotlib seaborn scikit-learn jupyter
Creating a Data Science Environment
# Create and activate a dedicated environment
conda create -n datascience python=3.12
conda activate datascience
# Install core libraries
conda install pandas numpy matplotlib seaborn scikit-learn jupyter
# Verify installation
python -c "import pandas, numpy, matplotlib, seaborn, sklearn; print('All good!')"
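The one-line check above stops at the first failed import. As a rough sketch, the script below reports the installed version of each library, or flags it as missing, using the standard-library `importlib.metadata` module. The helper names `installed_versions` and `missing_packages` are made up for this example. Jupyter is left out on the assumption that launching it is the simpler check.

```python
# Sketch: report which libraries from this guide are installed in the
# current environment. Helper names are illustrative, not from any library.
from importlib import metadata

# Distribution names as they appear in the install commands of this guide.
PACKAGES = ["pandas", "numpy", "matplotlib", "seaborn", "scikit-learn"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    found = installed_versions(names)
    return [name for name, version in found.items() if version is None]


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it with the `datascience` environment active. Any line marked `NOT INSTALLED` names a package to install again.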
Running Jupyter
# Start Jupyter Notebook
jupyter notebook
# Or Jupyter Lab (newer interface)
jupyter lab
Your First Notebook
Create a new notebook and run:
import pandas as pd
import numpy as np
import matplotlib  # needed for matplotlib.__version__ below
import matplotlib.pyplot as plt
import seaborn as sns
print(f"pandas: {pd.__version__}")
print(f"numpy: {np.__version__}")
print(f"matplotlib: {matplotlib.__version__}")
print(f"seaborn: {sns.__version__}")
# Example output (your version numbers will differ):
# pandas: 2.2.0
# numpy: 1.26.0
# matplotlib: 3.8.0
# seaborn: 0.13.0
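Printing versions shows that the libraries import, but not that they work together. A small smoke test like the one below may help. It builds a DataFrame from NumPy arrays and computes a derived column and a mean. The data and column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, made up for this smoke test.
df = pd.DataFrame({
    "x": np.arange(5),        # 0, 1, 2, 3, 4
    "y": np.arange(5) ** 2,   # 0, 1, 4, 9, 16
})

# A derived column confirms that vectorized arithmetic works.
df["ratio"] = df["y"] / (df["x"] + 1)

print(df)
print("mean of y:", df["y"].mean())
```

If the cell prints a five-row table and a mean of 6.0, pandas and NumPy are working together.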
Project Structure
data-science-project/
├── data/
│ ├── raw/ # Original, unmodified data
│ └── processed/ # Cleaned, transformed data
├── notebooks/ # Jupyter notebooks (.ipynb)
├── src/ # Python scripts (.py)
│ ├── load.py
│ ├── clean.py
│ └── visualize.py
├── outputs/ # Charts, reports
│ └── figures/
├── README.md
└── requirements.txt
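The layout above can be created by hand, or with a short script. The sketch below uses `pathlib` to build the same tree. The function name `scaffold` is made up for this example. Writing the unpinned package names from the install commands into requirements.txt is an assumption about what that file should hold.

```python
from pathlib import Path

# Directories and files taken from the project tree above.
DIRECTORIES = [
    "data/raw",
    "data/processed",
    "notebooks",
    "src",
    "outputs/figures",
]
EMPTY_FILES = ["src/load.py", "src/clean.py", "src/visualize.py", "README.md"]

# Assumption: requirements.txt lists the packages installed in this guide, unpinned.
REQUIREMENTS = ["pandas", "numpy", "matplotlib", "seaborn", "scikit-learn", "jupyter"]


def scaffold(root):
    """Create the project skeleton under root without overwriting existing files."""
    root = Path(root)
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for name in EMPTY_FILES:
        # touch() creates the file if absent and leaves existing content alone.
        (root / name).touch(exist_ok=True)
    requirements = root / "requirements.txt"
    if not requirements.exists():
        requirements.write_text("\n".join(REQUIREMENTS) + "\n")
    return root
```

Calling `scaffold("data-science-project")` from the directory where the project should live creates the tree. Running it again is harmless, since existing files are left untouched.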
Essential Libraries
| Library | Purpose | Used For |
|---|---|---|
| pandas | Data structures and analysis | DataFrames, CSV, Excel |
| NumPy | Numerical computing | Arrays, math operations |
| Matplotlib | Basic plotting | Charts, histograms |
| Seaborn | Statistical visualization | Beautiful default plots |
| scikit-learn | Machine Learning | Models, preprocessing |
| SciPy | Scientific computing | Statistics, optimization |