Time Series Analysis with Pandas

DodaTech 3 min read

In this tutorial, you'll learn about Time Series Analysis with Pandas. We cover key concepts, practical examples, and best practices to help you understand and apply this topic effectively.

What You'll Learn

Analyze time series data with pandas — parse dates, set datetime index, resample to different frequencies, calculate rolling statistics, and shift data for lag analysis.

Why It Matters

Time series data is everywhere — stock prices, website traffic, sensor readings, sales data. Pandas has powerful tools for working with dates and times.

Real-World Use

Analyzing daily website traffic trends, computing 7-day rolling average of sales, comparing month-over-month revenue, or forecasting next quarter's metrics.

Parsing Dates

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Parse dates when loading
df = pd.read_csv("sales.csv", parse_dates=["date"], index_col="date")

# Or convert after loading
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")

# Common date formats that pandas handles:
df = pd.read_csv("data.csv", parse_dates=["date"])
# "2024-01-15"
# "01/15/2024"
# "January 15, 2024"
# "2024-01-15 14:30:00"

Setting Datetime Index

# Create a sample time series
dates = pd.date_range("2024-01-01", periods=100, freq="D")
values = np.random.randn(100).cumsum() + 100

ts = pd.Series(values, index=dates, name="price")
print(ts.head())
# 2024-01-01    100.5
# 2024-01-02    101.2
# 2024-01-03     99.8
# Freq: D, dtype: float64

Creating Date Ranges

# Daily range
pd.date_range("2024-01-01", "2024-01-10", freq="D")
# DatetimeIndex(['2024-01-01', ..., '2024-01-10'], freq='D')

# Business days only
pd.date_range("2024-01-01", "2024-01-10", freq="B")
# Skips weekends

# Hourly
pd.date_range("2024-01-01", periods=5, freq="h")
# ['2024-01-01 00:00:00', '... 01:00:00', ...]

# Monthly
pd.date_range("2024-01-01", periods=12, freq="M")
# Month ends

# Custom frequencies
freqs = {
    "D": "Daily",
    "B": "Business day",
    "W": "Weekly (Sunday)",
    "W-MON": "Weekly (Monday)",
    "M": "Month end",
    "MS": "Month start",
    "Q": "Quarter end",
    "Y": "Year end",
    "h": "Hourly",
    "T": "Minutely",
    "S": "Secondly",
}

Resampling

Change the frequency of your time series:

# Downsample (higher frequency → lower frequency)
daily_sales = df["sales"].resample("W").sum()       # Weekly totals
daily_sales = df["sales"].resample("M").mean()      # Monthly averages
daily_sales = df["sales"].resample("Q").sum()       # Quarterly totals

# Upsample (lower frequency → higher frequency)
df["sales"].resample("D").ffill()    # Forward fill missing days
df["sales"].resample("D").bfill()    # Backward fill
df["sales"].resample("D").interpolate()  # Linear interpolation

# Resample with multiple aggregations
df.resample("M").agg({
    "sales": "sum",
    "customers": "mean",
    "returns": "sum",
})

Rolling Windows

# 7-day rolling average
ts.rolling(window=7).mean()

# 30-day rolling sum
ts.rolling(window=30).sum()

# Rolling standard deviation
ts.rolling(window=20).std()

# Plot original vs rolling average
plt.figure(figsize=(12, 6))
plt.plot(ts.index, ts.values, alpha=0.5, label="Daily")
plt.plot(ts.index, ts.rolling(7).mean(), label="7-day MA")
plt.plot(ts.index, ts.rolling(30).mean(), label="30-day MA")
plt.legend()
plt.title("Price with Moving Averages")
plt.show()

Shifting (Lags)

Create lagged features for forecasting:

# Shift data forward (lag)
ts.shift(1)    # Previous day's value (t-1)
ts.shift(7)    # Same day last week

# Shift backward (lead)
ts.shift(-1)   # Next day's value (t+1)

# Create lag features
lagged = pd.DataFrame({
    "today": ts,
    "yesterday": ts.shift(1),
    "last_week": ts.shift(7),
    "next_day": ts.shift(-1),
})

# Percentage change
ts.pct_change()                # Daily returns
ts.pct_change(periods=7)       # Week-over-week change

# Difference
ts.diff()                      # Day-over-day change
ts.diff(periods=7)             # Week-over-week difference

DateTime Components

# Extract date components
df["year"] = df.index.year
df["month"] = df.index.month
df["day"] = df.index.day
df["dayofweek"] = df.index.dayofweek  # Monday=0, Sunday=6
df["quarter"] = df.index.quarter
df["is_weekend"] = df.index.dayofweek >= 5

# Day of week name
df["day_name"] = df.index.day_name()

# Hour/minute for sub-daily data
df["hour"] = df.index.hour
df["minute"] = df.index.minute

Real-World Example

# Load and analyze daily sales data
df = pd.read_csv("daily_sales.csv",
                 parse_dates=["date"],
                 index_col="date")

df["sales"].plot(figsize=(14, 6), title="Daily Sales")

# Calculate and plot weekly trend
weekly = df["sales"].resample("W").sum()

# 4-week rolling average
weekly_ma = weekly.rolling(4).mean()

# Compare month-over-month
monthly = df["sales"].resample("M").sum()
monthly_pct = monthly.pct_change() * 100

# Find best and worst days
print("Best sales day:", df["sales"].idxmax(), df["sales"].max())
print("Worst sales day:", df["sales"].idxmin(), df["sales"].min())

Practice

Create a daily time series with 365 data points
Resample to monthly and quarterly frequency
Calculate 7-day and 30-day rolling averages
Create lag features (t-1, t-7, t-30)
Plot the original series with rolling averages

← Previous Data Wrangling with Pandas — Reshape, Pivot, and Stack Next → Working with CSV, Excel, and SQL in Pandas

Built by the developers of DodaTech

Doda Browser, DodaZIP & Durga Antivirus Pro

Home Browse Data Science