Time Series Analysis with Pandas
In this tutorial, you'll learn about Time Series Analysis with Pandas. We cover key concepts, practical examples, and best practices to help you understand and apply this topic effectively.
What You'll Learn
Analyze time series data with pandas — parse dates, set datetime index, resample to different frequencies, calculate rolling statistics, and shift data for lag analysis.
Why It Matters
Time series data is everywhere — stock prices, website traffic, sensor readings, sales data. Pandas has powerful tools for working with dates and times.
Real-World Use
Analyzing daily website traffic trends, computing 7-day rolling average of sales, comparing month-over-month revenue, or forecasting next quarter's metrics.
Parsing Dates
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Parse dates when loading
df = pd.read_csv("sales.csv", parse_dates=["date"], index_col="date")
# Or convert after loading
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")
# Common date formats that pandas handles:
df = pd.read_csv("data.csv", parse_dates=["date"])
# "2024-01-15"
# "01/15/2024"
# "January 15, 2024"
# "2024-01-15 14:30:00"
Setting Datetime Index
# Create a sample time series
dates = pd.date_range("2024-01-01", periods=100, freq="D")
values = np.random.randn(100).cumsum() + 100
ts = pd.Series(values, index=dates, name="price")
print(ts.head())
# 2024-01-01 100.5
# 2024-01-02 101.2
# 2024-01-03 99.8
# Freq: D, dtype: float64
Creating Date Ranges
# Daily range
pd.date_range("2024-01-01", "2024-01-10", freq="D")
# DatetimeIndex(['2024-01-01', ..., '2024-01-10'], freq='D')
# Business days only
pd.date_range("2024-01-01", "2024-01-10", freq="B")
# Skips weekends
# Hourly
pd.date_range("2024-01-01", periods=5, freq="h")
# ['2024-01-01 00:00:00', '... 01:00:00', ...]
# Monthly
pd.date_range("2024-01-01", periods=12, freq="M")
# Month ends
# Custom frequencies
freqs = {
"D": "Daily",
"B": "Business day",
"W": "Weekly (Sunday)",
"W-MON": "Weekly (Monday)",
"M": "Month end",
"MS": "Month start",
"Q": "Quarter end",
"Y": "Year end",
"h": "Hourly",
"T": "Minutely",
"S": "Secondly",
}
Resampling
Change the frequency of your time series:
# Downsample (higher frequency → lower frequency)
daily_sales = df["sales"].resample("W").sum() # Weekly totals
daily_sales = df["sales"].resample("M").mean() # Monthly averages
daily_sales = df["sales"].resample("Q").sum() # Quarterly totals
# Upsample (lower frequency → higher frequency)
df["sales"].resample("D").ffill() # Forward fill missing days
df["sales"].resample("D").bfill() # Backward fill
df["sales"].resample("D").interpolate() # Linear interpolation
# Resample with multiple aggregations
df.resample("M").agg({
"sales": "sum",
"customers": "mean",
"returns": "sum",
})
Rolling Windows
# 7-day rolling average
ts.rolling(window=7).mean()
# 30-day rolling sum
ts.rolling(window=30).sum()
# Rolling standard deviation
ts.rolling(window=20).std()
# Plot original vs rolling average
plt.figure(figsize=(12, 6))
plt.plot(ts.index, ts.values, alpha=0.5, label="Daily")
plt.plot(ts.index, ts.rolling(7).mean(), label="7-day MA")
plt.plot(ts.index, ts.rolling(30).mean(), label="30-day MA")
plt.legend()
plt.title("Price with Moving Averages")
plt.show()
Shifting (Lags)
Create lagged features for forecasting:
# Shift data forward (lag)
ts.shift(1) # Previous day's value (t-1)
ts.shift(7) # Same day last week
# Shift backward (lead)
ts.shift(-1) # Next day's value (t+1)
# Create lag features
lagged = pd.DataFrame({
"today": ts,
"yesterday": ts.shift(1),
"last_week": ts.shift(7),
"next_day": ts.shift(-1),
})
# Percentage change
ts.pct_change() # Daily returns
ts.pct_change(periods=7) # Week-over-week change
# Difference
ts.diff() # Day-over-day change
ts.diff(periods=7) # Week-over-week difference
DateTime Components
# Extract date components
df["year"] = df.index.year
df["month"] = df.index.month
df["day"] = df.index.day
df["dayofweek"] = df.index.dayofweek # Monday=0, Sunday=6
df["quarter"] = df.index.quarter
df["is_weekend"] = df.index.dayofweek >= 5
# Day of week name
df["day_name"] = df.index.day_name()
# Hour/minute for sub-daily data
df["hour"] = df.index.hour
df["minute"] = df.index.minute
Real-World Example
# Load and analyze daily sales data
df = pd.read_csv("daily_sales.csv",
parse_dates=["date"],
index_col="date")
df["sales"].plot(figsize=(14, 6), title="Daily Sales")
# Calculate and plot weekly trend
weekly = df["sales"].resample("W").sum()
# 4-week rolling average
weekly_ma = weekly.rolling(4).mean()
# Compare month-over-month
monthly = df["sales"].resample("M").sum()
monthly_pct = monthly.pct_change() * 100
# Find best and worst days
print("Best sales day:", df["sales"].idxmax(), df["sales"].max())
print("Worst sales day:", df["sales"].idxmin(), df["sales"].min())
Practice
- Create a daily time series with 365 data points
- Resample to monthly and quarterly frequency
- Calculate 7-day and 30-day rolling averages
- Create lag features (t-1, t-7, t-30)
- Plot the original series with rolling averages
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro