Pandas Tutorial — DataFrames and Series Explained
This tutorial introduces pandas. It covers the core concepts, practical examples, and best practices you need to work with tabular data in Python.
What You'll Learn
Use pandas to work with tabular data — create DataFrames and Series, load CSV files, select rows and columns, filter data, and perform basic operations.
Why It Matters
pandas is one of the most widely used Python libraries for data analysis. If you work with tabular data in Python, you will reach for pandas in most projects.
Real-World Use
Typical tasks include loading a CSV of sales data, filtering for last month's transactions, calculating average order value, and grouping customers by region.
What is pandas?
pandas is a Python library for working with tabular data — data in rows and columns, like a spreadsheet or SQL table.
Series — One Column
A Series is a single column of data with an index:
import pandas as pd
# Create a Series from a list
temperatures = pd.Series(
    [22, 25, 19, 30, 27],
    index=["Mon", "Tue", "Wed", "Thu", "Fri"],
)
print(temperatures)
# Mon 22
# Tue 25
# Wed 19
# Thu 30
# Fri 27
# dtype: int64
# Access by label
print(temperatures["Mon"]) # 22
# Access by position
print(temperatures.iloc[0]) # 22
# Vectorized operations
print(temperatures.mean()) # 24.6
print(temperatures.max()) # 30
print(temperatures > 25) # Boolean Series
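The boolean Series produced by a comparison can be used as a mask to keep only the matching entries, and arithmetic applies element by element in the same way. The following is a minimal sketch; treating the values as degrees Celsius is an assumption made only for the conversion example, and the names is_hot, hot_days, and fahrenheit are illustrative.

```python
import pandas as pd

temperatures = pd.Series(
    [22, 25, 19, 30, 27],
    index=["Mon", "Tue", "Wed", "Thu", "Fri"],
)

# A comparison returns a boolean Series aligned on the same index
is_hot = temperatures > 25

# Indexing with the mask keeps only the entries where the mask is True
hot_days = temperatures[is_hot]
print(hot_days)

# Arithmetic is element-wise as well
# (assumption: the values are Celsius, converted here to Fahrenheit)
fahrenheit = temperatures * 9 / 5 + 32
print(fahrenheit["Thu"])  # 86.0
```

The original Series is left unchanged; both operations return new Series objects.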
DataFrame — The Whole Table
A DataFrame is like a spreadsheet — multiple columns, each with a name:
# Create from dictionary
data = {
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "age": [25, 30, 35, 28],
    "salary": [50000, 60000, 70000, 55000],
    "department": ["Engineering", "Sales", "Engineering", "Marketing"],
}
df = pd.DataFrame(data)
print(df)
#       name  age  salary   department
# 0    Alice   25   50000  Engineering
# 1      Bob   30   60000        Sales
# 2  Charlie   35   70000  Engineering
# 3    Diana   28   55000    Marketing
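To connect the two structures, it can help to inspect a DataFrame and confirm that each of its columns is a Series sharing the same row index. This sketch rebuilds the employee table from above so it runs on its own; the variable name ages is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "age": [25, 30, 35, 28],
    "salary": [50000, 60000, 70000, 55000],
    "department": ["Engineering", "Sales", "Engineering", "Marketing"],
})

# (rows, columns)
print(df.shape)  # (4, 4)

# Column names and the data type pandas inferred for each column
print(df.columns.tolist())
print(df.dtypes)

# Pulling out one column gives a Series that shares the DataFrame's index
ages = df["age"]
print(type(ages).__name__)          # Series
print(ages.index.equals(df.index))  # True
```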
Loading Data from CSV
# Read a CSV file (kept under its own name so the employee df above stays available)
sales = pd.read_csv("sales.csv")
# Read with options
sales = pd.read_csv(
    "sales.csv",
    header=0,                 # First row holds the column names
    index_col=0,              # Use the first column as the row index
    parse_dates=["date"],     # Parse the "date" column as datetimes
    na_values=["NA", "N/A"],  # Treat these strings as missing
)
# First 5 rows
sales.head()
# Summary info
sales.info()
# Basic statistics
sales.describe()
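Because the contents of sales.csv are not shown, the sketch below substitutes a small in-memory CSV so the same read_csv options can be tried without a file on disk. The column names (order_id, date, region, amount) and the values are hypothetical, chosen only for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for sales.csv; columns and values are made up
csv_text = """order_id,date,region,amount
1,2024-01-05,North,120.50
2,2024-01-17,South,N/A
3,2024-02-02,North,80.00
"""

sales = pd.read_csv(
    io.StringIO(csv_text),    # read_csv accepts file-like objects as well as paths
    header=0,                 # First row holds the column names
    index_col=0,              # order_id becomes the row index
    parse_dates=["date"],     # Parse the "date" column as datetimes
    na_values=["NA", "N/A"],  # Treat these strings as missing
)

# Rows from January, selected through the parsed datetime column
january = sales[sales["date"].dt.month == 1]
print(len(january))  # 2

# mean() skips missing values by default
print(sales["amount"].mean())  # 100.25
```

Parsing the date column up front is what makes the .dt accessor available for filtering by month.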
Selecting Data
# Select columns
df["name"] # Single column → Series
df[["name", "age"]] # Multiple columns → DataFrame
# Select rows by position
df.iloc[0] # First row
df.iloc[1:3] # Second and third rows (end position 3 is excluded)
df.iloc[[0, 2, 3]] # First, third, and fourth rows
# Select rows by label (if index is meaningful)
df.loc[0] # Row with index label 0
df.loc[0:2] # Rows labeled 0 through 2 (end label is included)
# Select by condition
df[df["age"] > 28]
df[(df["department"] == "Engineering") & (df["salary"] > 50000)]
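A common stumbling block is that iloc slices exclude the end position while loc slices include the end label. The sketch below shows the difference on the employee table and also combines a row condition with a column list in a single loc call. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "age": [25, 30, 35, 28],
    "salary": [50000, 60000, 70000, 55000],
    "department": ["Engineering", "Sales", "Engineering", "Marketing"],
})

# Position-based slice: positions 1 and 2 only (3 is excluded)
by_position = df.iloc[1:3]
print(len(by_position))  # 2

# Label-based slice: labels 0, 1 and 2 (the end label is included)
by_label = df.loc[0:2]
print(len(by_label))  # 3

# Rows matching a condition, restricted to selected columns
older = df.loc[df["age"] > 28, ["name", "salary"]]
print(older)
```

With the default integer index the labels and positions happen to coincide, which is why the two slices look so similar; with a custom index only loc would accept the labels.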
Adding and Removing Columns
# Add a calculated column
df["bonus"] = df["salary"] * 0.1
# Add a constant column
df["active"] = True
# Drop a column
df.drop("bonus", axis=1, inplace=True)
# Rename columns
df.rename(columns={"name": "employee_name"}, inplace=True)
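The same changes can also be made without inplace=True, by assigning the returned DataFrame to a new name. This is offered as one possible alternative style rather than the only correct one; the intermediate names with_bonus, trimmed, and renamed are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "salary": [50000, 60000],
})

# assign() returns a new DataFrame with the extra columns added
with_bonus = df.assign(bonus=df["salary"] * 0.1, active=True)

# drop() and rename() also return new DataFrames by default
trimmed = with_bonus.drop(columns="bonus")
renamed = trimmed.rename(columns={"name": "employee_name"})

print(df.columns.tolist())       # ['name', 'salary']
print(renamed.columns.tolist())  # ['employee_name', 'salary', 'active']
```

Keeping each step in its own variable leaves the original df untouched, which makes it easier to re-run a single step while experimenting.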
Basic Operations
# Summary statistics
df["salary"].mean() # Average salary
df["salary"].sum() # Total salary
df["salary"].min() # Minimum
df["salary"].max() # Maximum
df["salary"].std() # Standard deviation
# Value counts
df["department"].value_counts()
# Engineering 2
# Sales 1
# Marketing 1
# Unique values
df["department"].unique()
# ['Engineering', 'Sales', 'Marketing']
# Sort values
df.sort_values("salary", ascending=False)
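These operations are often combined. As a small sketch on the employee table, sorting followed by head() gives the top earners, and agg() computes several statistics in one call. The names top_two and summary are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "age": [25, 30, 35, 28],
    "salary": [50000, 60000, 70000, 55000],
    "department": ["Engineering", "Sales", "Engineering", "Marketing"],
})

# sort_values returns a new, sorted DataFrame; df keeps its original order
top_two = df.sort_values("salary", ascending=False).head(2)
print(top_two["name"].tolist())  # ['Charlie', 'Bob']

# Several statistics at once; the result is a Series indexed by statistic name
summary = df["salary"].agg(["min", "max", "mean"])
print(summary["mean"])  # 58750.0
```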
Handling Missing Data
# Check for missing values
df.isnull().sum()
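# Drop rows with any missing values (returns a new DataFrame)
df.dropna()
# Fill missing values (each call also returns a new object)
df.fillna(0) # Fill with 0
df["age"].fillna(df["age"].mean()) # Fill with mean
df.ffill() # Forward fill; fillna(method="ffill") is deprecated in recent pandas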
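The employee table used earlier has no gaps, so the sketch below introduces a couple of missing values to show what each call returns. The gaps are invented for illustration, and the result variable names are hypothetical.

```python
import pandas as pd

# Employee data with two invented gaps (None becomes NaN in numeric columns)
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "age": [25, None, 35, 28],
    "salary": [50000, 60000, None, 55000],
})

# Count of missing values per column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Only rows with no missing values survive
complete_rows = df.dropna()
print(len(complete_rows))  # 2

# Replace missing ages with the mean of the known ages
age_filled = df["age"].fillna(df["age"].mean())

# Carry the previous row's value forward into each gap
forward_filled = df.ffill()
print(forward_filled.loc[2, "salary"])  # 60000.0

# None of the calls above modified df itself
print(int(df["age"].isnull().sum()))  # 1
```

To keep a result, assign it back, for example df = df.dropna().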
Practice
# 1. Create a DataFrame with 5 rows
# 2. Load a CSV file
# 3. Filter rows where a column > some value
# 4. Add a calculated column
# 5. Group by a column and calculate mean
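Exercise 5 relies on groupby, which is not demonstrated in the sections above. As a starting point, here is a minimal sketch on the employee table; it is one way to approach the exercise, not the only one, and the name mean_salary is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana"],
    "age": [25, 30, 35, 28],
    "salary": [50000, 60000, 70000, 55000],
    "department": ["Engineering", "Sales", "Engineering", "Marketing"],
})

# Split rows by department, then average the salary within each group
mean_salary = df.groupby("department")["salary"].mean()
print(mean_salary)
# The result is a Series indexed by department name
print(mean_salary["Engineering"])  # 60000.0
```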