Pandas Tutorial — DataFrames and Series Explained

DodaTech

This tutorial introduces pandas, the Python library for tabular data. It covers the key concepts, practical examples, and common practices you need to start using it effectively.

What You'll Learn

Use pandas to work with tabular data — create DataFrames and Series, load CSV files, select rows and columns, filter data, and perform basic operations.

Why It Matters

pandas is one of the most widely used Python libraries for data analysis. If you work with tabular data in Python, you will likely reach for it in most projects.

Real-World Use

Typical tasks include loading a CSV of sales data, filtering for last month's transactions, calculating average order value, and grouping customers by region.

What is pandas?

pandas is a Python library for working with tabular data — data in rows and columns, like a spreadsheet or SQL table.

Series — One Column

A Series is a single column of data with an index:

import pandas as pd

# Create a Series from a list
temperatures = pd.Series([22, 25, 19, 30, 27],
    index=["Mon", "Tue", "Wed", "Thu", "Fri"])

print(temperatures)
# Mon    22
# Tue    25
# Wed    19
# Thu    30
# Fri    27
# dtype: int64

# Access by label
print(temperatures["Mon"])  # 22

# Access by position
print(temperatures.iloc[0])  # 22

# Vectorized operations
print(temperatures.mean())  # 24.6
print(temperatures.max())   # 30
print(temperatures > 25)    # Boolean Series
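
The boolean Series from the last comparison is most useful as a mask for filtering. The following is a minimal sketch that reuses the temperature data from above; the names hot_days and fahrenheit are illustrative only.

```python
import pandas as pd

temperatures = pd.Series(
    [22, 25, 19, 30, 27],
    index=["Mon", "Tue", "Wed", "Thu", "Fri"],
)

# A boolean Series used as a mask keeps only the entries where it is True
hot_days = temperatures[temperatures > 25]
print(hot_days)
# Thu    30
# Fri    27

# Arithmetic applies element by element and keeps the index labels
fahrenheit = temperatures * 9 / 5 + 32
print(fahrenheit["Mon"])  # roughly 71.6
```

No explicit loop is needed in either step, because the comparison and the arithmetic both run over the whole Series at once.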

DataFrame — The Whole Table

A DataFrame is like a spreadsheet — multiple columns, each with a name:

# Create from dictionary
data = {
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "age": [25, 30, 35, 28],
    "salary": [50000, 60000, 70000, 55000],
    "department": ["Engineering", "Sales", "Engineering", "Marketing"],
}

df = pd.DataFrame(data)
print(df)
#       name  age  salary   department
# 0    Alice   25   50000  Engineering
# 1      Bob   30   60000        Sales
# 2  Charlie   35   70000  Engineering
# 3    Diana   28   55000    Marketing
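
Before working with a DataFrame it usually helps to check its size and column types. A short sketch using the same employee data (the variable name ages is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "age": [25, 30, 35, 28],
    "salary": [50000, 60000, 70000, 55000],
    "department": ["Engineering", "Sales", "Engineering", "Marketing"],
})

print(df.shape)          # (4, 4): 4 rows, 4 columns
print(list(df.columns))  # ['name', 'age', 'salary', 'department']
print(df.dtypes)         # one dtype per column

# Each column of a DataFrame is itself a Series
ages = df["age"]
print(type(ages).__name__)  # Series
print(ages.mean())          # 29.5
```

This is why everything shown for a Series, such as mean() and comparisons, also works on a single DataFrame column.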

Loading Data from CSV

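# Read a CSV file (kept in its own variable so the employee df above stays available)
sales = pd.read_csv("sales.csv")

# Read with options
sales = pd.read_csv(
    "sales.csv",
    header=0,           # First row is column names
    index_col=0,        # First column is row index
    parse_dates=["date"],   # Parse the "date" column as datetimes
    na_values=["NA", "N/A"],  # Treat these strings as missing
)

# First 5 rows
sales.head()

# Summary info (column types, non-null counts, memory use)
sales.info()

# Basic statistics for numeric columns
sales.describe()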
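
Because sales.csv is not included with this tutorial, the sketch below reads the same kind of data from an in-memory string so that it runs as-is. The column names (order_id, date, region, amount) and the values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real sales.csv file
csv_text = """order_id,date,region,amount
1001,2024-01-05,North,120.50
1002,2024-01-06,South,N/A
1003,2024-01-06,North,80.00
"""

sales = pd.read_csv(
    io.StringIO(csv_text),    # a file-like object works in place of a path
    index_col="order_id",     # use the order ID as the row index
    parse_dates=["date"],     # convert the date column to datetimes
    na_values=["NA", "N/A"],  # treat these strings as missing
)

print(sales.shape)                    # (3, 3)
print(sales["amount"].isna().sum())   # 1
print(sales["date"].dt.day.tolist())  # [5, 6, 6]
```

With a real file, replacing io.StringIO(csv_text) with the file path should be the only change needed.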

Selecting Data

# Select columns
df["name"]              # Single column → Series
df[["name", "age"]]     # Multiple columns → DataFrame

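# Select rows by position (positions start at 0)
df.iloc[0]              # First row
df.iloc[1:3]            # Second and third rows (end position is excluded)
df.iloc[[0, 2, 3]]     # First, third, and fourth rows

# Select rows by label (if index is meaningful)
df.loc[0]               # Row with index label 0
df.loc[0:2]             # Rows labeled 0 through 2 (end label is included)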

# Select by condition
df[df["age"] > 28]
df[(df["department"] == "Engineering") & (df["salary"] > 50000)]
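
Conditions can also be combined with column selection, negated, or replaced by a membership test. A minimal sketch on the employee table; the result names (senior, mixed, non_engineering) are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "age": [25, 30, 35, 28],
    "salary": [50000, 60000, 70000, 55000],
    "department": ["Engineering", "Sales", "Engineering", "Marketing"],
})

# Rows by condition and columns by name in one step
senior = df.loc[df["age"] > 28, ["name", "salary"]]
print(senior["name"].tolist())  # ['Bob', 'Charlie']

# | means OR and ~ means NOT; each condition needs its own parentheses
mixed = df[~(df["department"] == "Engineering") | (df["age"] < 26)]
print(mixed["name"].tolist())  # ['Alice', 'Bob', 'Diana']

# isin() tests membership in a list of values
non_engineering = df[df["department"].isin(["Sales", "Marketing"])]
print(non_engineering["name"].tolist())  # ['Bob', 'Diana']
```

Python's and, or, and not keywords do not work element by element on a Series, which is why the &, |, and ~ operators are used instead.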

Adding and Removing Columns

# Add a calculated column
df["bonus"] = df["salary"] * 0.1

# Add a constant column
df["active"] = True

# Drop a column
df.drop("bonus", axis=1, inplace=True)

# Rename columns
df.rename(columns={"name": "employee_name"}, inplace=True)
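
The same changes can be made without inplace=True, since these methods return a new DataFrame by default. Many pandas users prefer this style because it chains well and leaves the original untouched. A small sketch with a two-row table (with_bonus and renamed are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "salary": [50000, 60000],
})

# assign() returns a new DataFrame with the extra column
with_bonus = df.assign(bonus=df["salary"] * 0.1)
print(with_bonus["bonus"].tolist())  # [5000.0, 6000.0]

# drop(columns=...) and rename() also return new objects by default
renamed = with_bonus.drop(columns="bonus").rename(columns={"name": "employee_name"})
print(list(renamed.columns))  # ['employee_name', 'salary']

# The original DataFrame is unchanged
print(list(df.columns))  # ['name', 'salary']
```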

Basic Operations

# Summary statistics
df["salary"].mean()      # Average salary
df["salary"].sum()       # Total salary
df["salary"].min()       # Minimum
df["salary"].max()       # Maximum
df["salary"].std()       # Standard deviation

# Value counts
df["department"].value_counts()
# Engineering    2
# Sales          1
# Marketing      1

# Unique values, in order of first appearance (returned as an array)
df["department"].unique()
# ['Engineering', 'Sales', 'Marketing']

# Sort values
df.sort_values("salary", ascending=False)
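
Grouping, mentioned under Real-World Use and in the practice list, builds on the same statistics. A minimal sketch using the employee table from this tutorial (avg_salary and summary are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "age": [25, 30, 35, 28],
    "salary": [50000, 60000, 70000, 55000],
    "department": ["Engineering", "Sales", "Engineering", "Marketing"],
})

# Average salary per department (group keys are sorted by default)
avg_salary = df.groupby("department")["salary"].mean()
print(avg_salary)
# department
# Engineering    60000.0
# Marketing      55000.0
# Sales          60000.0
# Name: salary, dtype: float64

# Several statistics at once with agg()
summary = df.groupby("department")["salary"].agg(["count", "max"])
print(summary.loc["Engineering", "max"])  # 70000
```

The result of groupby(...).mean() is a Series indexed by the group key, so it can be sorted or filtered like any other Series.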

Handling Missing Data

# Check for missing values
df.isnull().sum()

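# Drop rows with any missing values (returns a new DataFrame)
df.dropna()

# Fill missing values (each call returns a new object; assign it to keep the result)
df.fillna(0)                                 # Fill with 0
df["age"].fillna(df["age"].mean())           # Fill with mean
df.ffill()                                   # Forward fill (fillna(method="ffill") is deprecated in recent pandas)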
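
To see these methods on actual missing values, here is a small sketch with one missing age. The data and the variable names (raw, dropped, filled, forward) are made up for illustration.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, np.nan, 35],
})

print(int(raw["age"].isnull().sum()))  # 1

# Drop the row that contains the missing value
dropped = raw.dropna()
print(len(dropped))  # 2

# Replace the missing age with the mean of the known ages
filled = raw.assign(age=raw["age"].fillna(raw["age"].mean()))
print(filled["age"].tolist())  # [25.0, 30.0, 35.0]

# Carry the previous row's value forward
forward = raw.ffill()
print(forward["age"].tolist())  # [25.0, 25.0, 35.0]

# None of the calls above changed the original
print(int(raw["age"].isnull().sum()))  # 1
```

Which strategy fits depends on the data: dropping loses rows, the mean can hide outliers, and forward fill only makes sense when row order is meaningful.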

Practice

# 1. Create a DataFrame with 5 rows
# 2. Load a CSV file
# 3. Filter rows where a column > some value
# 4. Add a calculated column
# 5. Group by a column and calculate mean
