Time Series Forecasting with Machine Learning

DodaTech Updated 2026-06-22 7 min read

In this tutorial, you'll learn about Time Series Forecasting with Machine Learning. We cover key concepts, practical examples, and best practices to help you understand and apply this topic effectively.

Time series forecasting predicts future values based on historical sequential data, enabling demand planning, financial prediction, and anomaly detection across industries that rely on temporal patterns.

What You'll Learn

In this tutorial, you'll learn time series forecasting techniques using ARIMA, Prophet, LSTMs, and gradient boosting, and apply them to demand prediction, financial forecasting, and anomaly detection in sequential data with Python.

Why It Matters

Time series data is everywhere: stock prices, website traffic, sensor readings, sales figures. Accurate forecasts drive inventory management, capacity planning, and financial decisions. Choosing the right model and understanding time series properties (trend, seasonality, stationarity) together determine forecast quality.

Real-World Use

DodaZIP analyzes compression time series to predict storage needs. By forecasting file volume trends over days and weeks, the system proactively allocates compression resources, ensuring consistent performance during usage spikes without manual intervention.

Understanding Time Series Components

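A time series is commonly described by four components: trend (long-term direction), seasonality (regular periodic patterns), cyclical variation (longer-term fluctuations without a fixed period), and residual (irregular noise). Stationarity means statistical properties like mean and variance remain constant over time. Classical models such as ARMA assume stationary data, which is why ARIMA differences the series first; Prophet and tree-based models do not require it. The Augmented Dickey-Fuller (ADF) test checks for stationarity: its null hypothesis is that the series is non-stationary (has a unit root), so a p-value below 0.05 is evidence of stationarity. Differencing (subtracting the previous observation from each value) is the most common way to make a series stationary.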

import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

np.random.seed(42)
dates = pd.date_range('2024-01-01', periods=200, freq='D')
trend = np.linspace(0, 10, 200)
seasonality = 5 * np.sin(2 * np.pi * np.arange(200) / 30)
noise = np.random.randn(200) * 2
series = trend + seasonality + noise

result = adfuller(series)
print(f"ADF Statistic: {result[0]:.4f}")
print(f"p-value: {result[1]:.4f}")
print(f"Critical values:")
for key, value in result[4].items():
    print(f"  {key}: {value:.4f}")

series_diff = np.diff(series)
result_diff = adfuller(series_diff)
print(f"\nAfter differencing - p-value: {result_diff[1]:.4f}")
print(f"Stationary after differencing: {result_diff[1] < 0.05}")

Example output (illustrative; exact values depend on library versions):

ADF Statistic: -1.2345
p-value: 0.6578
Critical values:
  1%: -3.4645
  5%: -2.8765
  10%: -2.5748

After differencing - p-value: 0.0000
Stationary after differencing: True
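
The components described at the start of this section can be separated with a classical additive decomposition. The block below is a minimal NumPy sketch, not a library implementation: the function name decompose_additive is hypothetical, and it assumes the seasonal period is already known (30 here, matching the synthetic series). The trend is estimated with a centered moving average over one full period, and the seasonal part is the average detrended value at each position in the cycle.

```python
import numpy as np

def decompose_additive(y, period):
    """Split y into trend, seasonal, and residual parts (additive model)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Trend: centered moving average spanning one full period.
    # For an even period, use half weights at both ends so the window is symmetric.
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)  # edges stay NaN where the window does not fit
    trend[half:n - half] = np.convolve(y, weights, mode="valid")
    # Seasonal: mean detrended value at each position within the cycle
    detrended = y - trend
    phase = np.arange(n) % period
    seasonal_means = np.array(
        [np.nanmean(detrended[phase == k]) for k in range(period)]
    )
    seasonal_means -= seasonal_means.mean()  # center seasonal effects on zero
    seasonal = seasonal_means[phase]
    residual = y - trend - seasonal
    return trend, seasonal, residual

# Demo on a synthetic series shaped like the one used in this section
rng = np.random.default_rng(0)
t = np.arange(200)
y_demo = np.linspace(0, 10, 200) + 5 * np.sin(2 * np.pi * t / 30) + rng.normal(0, 2, 200)
trend_est, seasonal_est, resid_est = decompose_additive(y_demo, period=30)
print("Largest seasonal effect:", np.round(seasonal_est.max(), 2))
print("Residual std:", np.round(np.nanstd(resid_est), 2))
```

If the residual still shows visible structure after this step, the assumed period or the additive form is probably a poor fit for the data.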

ARIMA Models

ARIMA (AutoRegressive Integrated Moving Average) combines three components: AR(p) uses the previous p values as predictors, I(d) differences the data d times to make it stationary, and MA(q) uses the previous q forecast errors. The order (p, d, q) is chosen from ACF and PACF plots or by automated selection with auto_arima from the pmdarima package. ARIMA is the classical workhorse for univariate time series with clear patterns.

from statsmodels.tsa.arima.model import ARIMA

train = series[:160]
test = series[160:]

model = ARIMA(train, order=(5, 1, 2))
fitted = model.fit()

forecast = fitted.forecast(steps=len(test))
rmse = np.sqrt(np.mean((forecast - test) ** 2))
mae = np.mean(np.abs(forecast - test))

print(f"RMSE: {rmse:.2f}")
print(f"MAE: {mae:.2f}")
print(f"AIC: {fitted.aic:.2f}")
print(f"Forecast first 3: {forecast[:3].round(2)}")
print(f"Actual first 3: {test[:3].round(2)}")

Example output (illustrative; exact values depend on library versions):

RMSE: 2.34
MAE: 1.87
AIC: 845.23
Forecast first 3: [15.23 16.45 17.12]
Actual first 3: [14.89 16.02 17.45]
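
To make the ACF mentioned above concrete, here is a small NumPy sketch of the sample autocorrelation function. The helper name acf and the simulated AR(1) series are assumptions for illustration only; in practice a plotting helper from a statistics library would normally be used. Lags whose autocorrelation falls outside the approximate 95% band of ±1.96/sqrt(n) are the ones usually treated as significant when reading the plot.

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation of x for lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array(
        [np.dot(x[:len(x) - k], x[k:]) / denom for k in range(nlags + 1)]
    )

# Simulate an AR(1) process x_t = 0.8 * x_{t-1} + noise (illustrative data)
rng = np.random.default_rng(0)
n = 5000
eps = rng.standard_normal(n)
ar1 = np.zeros(n)
for i in range(1, n):
    ar1[i] = 0.8 * ar1[i - 1] + eps[i]

r = acf(ar1, nlags=10)
band = 1.96 / np.sqrt(n)  # approximate 95% significance band
for lag, value in enumerate(r[:5]):
    marker = "significant" if abs(value) > band else "within band"
    print(f"lag {lag}: {value:.3f} ({marker})")
```

For an AR(1) process the autocorrelation decays geometrically (roughly 0.8, 0.64, 0.51, ...), which is the slow tail-off pattern that points toward an AR term instead of an MA term.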

Prophet

Prophet (by Facebook/Meta) handles time series with strong seasonal effects and missing data. It decomposes time series into trend, seasonality (multiple periods), and holiday effects. The trend can be linear or logistic growth with changepoints where the growth rate changes. Prophet is robust to outliers and does not require stationarity. Its additive model makes forecasts interpretable by decomposing components.

from prophet import Prophet

df = pd.DataFrame({'ds': dates, 'y': series})

prophet_model = Prophet(
    yearly_seasonality=False,
    weekly_seasonality=False,
    daily_seasonality=False,
    changepoint_prior_scale=0.05
)

prophet_model.add_seasonality(name='monthly', period=30, fourier_order=5)
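# Fit on the first 160 days only so the 40-day test set stays unseen
prophet_model.fit(df.iloc[:160])

# Extends the 160 training dates by 40 days, covering the test period
future = prophet_model.make_future_dataframe(periods=40)
forecast = prophet_model.predict(future)

test_forecast = forecast.iloc[160:][['ds', 'yhat', 'yhat_lower', 'yhat_upper']]
# test is a NumPy array (series[160:]), so it is used directly
prophet_rmse = np.sqrt(np.mean((test_forecast['yhat'].values - test) ** 2))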

print(f"Prophet RMSE: {prophet_rmse:.2f}")
print(f"Forecast components: {list(forecast.columns)[:5]}")
print(f"Changepoints: {len(prophet_model.changepoints)}")

Example output (illustrative; exact values depend on library versions):

Prophet RMSE: 2.51
Forecast components: ['ds', 'trend', 'yhat_lower', 'yhat_upper', 'trend_lower']
Changepoints: 25

LSTM for Time Series

LSTMs (Long Short-Term Memory networks) capture long-range dependencies in sequential data. Unlike ARIMA, LSTMs can model complex non-linear patterns and use multiple input features. The data must be transformed into supervised learning format: create sequences of past observations (lookback window) to predict future values. LSTMs require careful hyperparameter tuning: window size, number of units, dropout rate, and learning rate.

import tensorflow as tf
from tensorflow import keras

def create_sequences(data, seq_length=20):
    X, y = [], []
    for i in range(len(data) - seq_length):
        X.append(data[i:i+seq_length])
        y.append(data[i+seq_length])
    return np.array(X), np.array(y)

scaled_series = (series - series.mean()) / series.std()
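X, y = create_sequences(scaled_series, seq_length=20)
# Keras LSTM layers expect 3D input: (samples, timesteps, features)
X = X.reshape((X.shape[0], X.shape[1], 1))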

split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

lstm_model = keras.Sequential([
    keras.layers.LSTM(50, activation='relu', input_shape=(20, 1)),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(1)
])

lstm_model.compile(optimizer='adam', loss='mse')
lstm_model.fit(X_train, y_train, epochs=30, batch_size=16, validation_split=0.1, verbose=0)

y_pred = lstm_model.predict(X_test, verbose=0).flatten()
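# y_test holds the scaled targets aligned with X_test; undo the scaling on both sides
y_pred_orig = y_pred * series.std() + series.mean()
y_test_orig = y_test * series.std() + series.mean()
lstm_rmse = np.sqrt(np.mean((y_pred_orig - y_test_orig) ** 2))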
print(f"LSTM RMSE (original scale): {lstm_rmse:.2f}")
print(f"LSTM loss: {lstm_model.history.history['loss'][-1]:.4f}")

Example output (illustrative; training is not seeded, so values vary between runs):

LSTM RMSE (original scale): 2.18
LSTM loss: 0.0214

Forecasting Workflow

flowchart TD
  A[Raw Time Series] --> B[Visualize & Analyze]
  B --> C[Check Stationarity]
  C --> D{Stationary?}
  D -->|No| E[Differencing / Transform]
  E --> C
  D -->|Yes| F[Select Model]
  F --> G[ARIMA / Prophet / LSTM]
  G --> H[Train Model]
  H --> I[Evaluate on Test Set]
  I --> J{Accuracy Acceptable?}
  J -->|No| K[Tune Parameters]
  K --> F
  J -->|Yes| L[Forecast Future]
  L --> M[Monitor & Retrain]

Gradient Boosting for Time Series

Tree-based models like XGBoost and LightGBM can forecast time series once the temporal structure is encoded as features: calendar features from the timestamp (hour of day, day of week, month), lag values, rolling averages, and Fourier components. This approach often outperforms pure time series models when external features are available. Feature engineering is critical, because the model treats each row independently: lag features, window statistics, and calendar features are the only way it sees temporal patterns.

import xgboost as xgb

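# The data is daily, so index by date and derive calendar features from real timestamps
df_feat = pd.DataFrame({'y': series}, index=dates)
df_feat['dayofweek'] = df_feat.index.dayofweek
df_feat['dayofmonth'] = df_feat.index.day
df_feat['month'] = df_feat.index.month
df_feat['lag_1'] = df_feat['y'].shift(1)
df_feat['lag_7'] = df_feat['y'].shift(7)
# shift(1) before rolling so each window ends at yesterday and excludes today's target
df_feat['rolling_mean_7'] = df_feat['y'].shift(1).rolling(7).mean()
df_feat['rolling_std_7'] = df_feat['y'].shift(1).rolling(7).std()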

df_feat = df_feat.dropna()
X_feat = df_feat.drop('y', axis=1).values
y_feat = df_feat['y'].values

split = int(len(X_feat) * 0.8)
X_train, X_test = X_feat[:split], X_feat[split:]
y_train, y_test = y_feat[:split], y_feat[split:]

xgb_model = xgb.XGBRegressor(n_estimators=100, learning_rate=0.1, random_state=42)
xgb_model.fit(X_train, y_train)
y_pred_xgb = xgb_model.predict(X_test)
xgb_rmse = np.sqrt(np.mean((y_pred_xgb - y_test) ** 2))
print(f"XGBoost RMSE: {xgb_rmse:.2f}")
print(f"XGBoost feature count: {len(xgb_model.feature_importances_)}")

Example output (illustrative; exact values depend on library versions):

XGBoost RMSE: 2.01
XGBoost feature count: 7
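
The Fourier components mentioned in this section are not shown in the example above. The following is a minimal sketch of how they could be built with NumPy; the function name fourier_features is hypothetical, and the period of 30 is assumed to match the synthetic series. Each harmonic k contributes one sine and one cosine column, giving a tree model a smooth encoding of the position within the cycle.

```python
import numpy as np

def fourier_features(t, period, order):
    """Return an array with sin/cos pairs for harmonics 1..order."""
    t = np.asarray(t, dtype=float)
    columns = []
    for k in range(1, order + 1):
        angle = 2 * np.pi * k * t / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)

# Time index 0..199, as in the synthetic daily series
t = np.arange(200)
F = fourier_features(t, period=30, order=3)
print("Feature matrix shape:", F.shape)  # one row per day, 2 * order columns
```

These columns can be appended to the lag and calendar features before fitting. A higher order lets the model fit sharper seasonal shapes at the cost of more features to overfit on.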

Common Errors and Mistakes

Mistake	Why It Happens	How to Fix
Not checking stationarity	Non-stationary data invalidates ARIMA	Use ADF test, apply differencing
Data leakage from future	Using future info in feature creation	Ensure lag features use only past data
Wrong seasonality period	Model misses important patterns	Plot ACF to identify seasonal periods
Overfitting to noise	Model memorizes random fluctuations	Use simpler model, more regularization
No cross-validation	Single train/test split unreliable	Use time series CV (expanding window)
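
The expanding-window cross-validation named in the last row can be sketched as follows. This is an illustrative helper (the name expanding_window_splits and its parameters are assumptions, not a library API): every fold trains on all observations before its test block, so no future data leaks into training. scikit-learn's TimeSeriesSplit provides a similar scheme.

```python
import numpy as np

def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_idx, test_idx) pairs where training always precedes testing."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for i in range(n_splits):
        start = first_test_start + i * test_size
        # Training window grows with each fold; test block slides forward
        yield np.arange(0, start), np.arange(start, start + test_size)

# 200 observations, 4 folds of 20 test points each
for fold, (train_idx, test_idx) in enumerate(expanding_window_splits(200, 4, 20)):
    print(f"fold {fold}: train 0..{train_idx[-1]}, test {test_idx[0]}..{test_idx[-1]}")
```

Averaging the error across folds gives a more stable estimate than a single train/test split, because each fold tests on a different stretch of the series.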

Practice Questions

What does it mean for a time series to be stationary?

Answer: A stationary time series has constant mean, variance, and autocorrelation structure over time. Classical models such as ARMA assume stationarity. Differencing removes trends, and transformations (log, Box-Cox) stabilize the variance.

How do you determine the p, d, q parameters for ARIMA?

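Answer: d is the number of differencing steps needed for stationarity (confirmed with the ADF test). p is suggested by the PACF plot: it is the last lag with a significant spike before the plot cuts off. q is read from the ACF plot in the same way. Auto-ARIMA automates this selection by comparing candidate orders with an information criterion such as AIC.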

What advantage does Prophet have over ARIMA?

Answer: Prophet handles missing data, outliers, multiple seasonality periods, and holiday effects without requiring stationarity. It provides uncertainty intervals and changepoint detection automatically.

Why create sequences for LSTM time series forecasting?

Answer: LSTM layers expect input shaped (samples, timesteps, features). Slicing the series into overlapping windows of past observations, each paired with the value that follows it, turns forecasting into a supervised learning problem in which the LSTM learns the temporal dependencies within each window.

How does feature engineering help tree-based models for time series?

Answer: Tree models have no built-in temporal understanding. Lag features capture recent history, rolling statistics capture trends, and calendar features capture seasonality. These engineering steps encode temporal structure for the model.

Challenge

Build a multi-model forecasting system for daily electricity demand data. Compare ARIMA, Prophet, XGBoost, and an LSTM ensemble. Use time series cross-validation with expanding windows. Report RMSE, MAE, and MAPE for each model. Analyze which model performs best for short-term (1 day), medium-term (7 day), and long-term (30 day) forecasts.
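
As a starting point for the reporting step, the three metrics can be computed with a small helper like the one below. The function name forecast_metrics and the sample numbers are made up for illustration. Note that MAPE divides by the actual values, so it is undefined when any actual value is zero.

```python
import numpy as np

def forecast_metrics(actual, predicted):
    """Return RMSE, MAE, and MAPE (in percent) for two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if np.any(actual == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    error = predicted - actual
    return {
        "rmse": float(np.sqrt(np.mean(error ** 2))),
        "mae": float(np.mean(np.abs(error))),
        "mape": float(np.mean(np.abs(error / actual)) * 100),
    }

# Hypothetical demand values and forecasts
metrics = forecast_metrics([100, 200, 400], [110, 190, 400])
print({name: round(value, 2) for name, value in metrics.items()})
# prints {'rmse': 8.16, 'mae': 6.67, 'mape': 5.0}
```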

Real-World Task

Design an anomaly detection system for server metrics (CPU, memory, requests per second) using time series forecasting. Train a Prophet model on normal operation data. When new metrics deviate significantly from the forecasted range (outside prediction intervals), trigger an alert. Implement automatic retraining every 24 hours to adapt to gradual system changes.
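
The alerting rule in this task reduces to a comparison against the prediction interval. The sketch below assumes the interval bounds are already available as arrays, for example the yhat_lower and yhat_upper columns of a Prophet forecast; the function name flag_anomalies and the sample values are hypothetical.

```python
import numpy as np

def flag_anomalies(actual, lower, upper):
    """Return a boolean array that is True where actual falls outside [lower, upper]."""
    actual = np.asarray(actual, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return (actual < lower) | (actual > upper)

# Hypothetical CPU readings (%) with forecast interval bounds
cpu = [10, 12, 30, 11, 2]
lower = [8, 9, 9, 9, 8]
upper = [13, 14, 14, 14, 13]
flags = flag_anomalies(cpu, lower, upper)
print("Anomalous positions:", np.flatnonzero(flags).tolist())
# prints Anomalous positions: [2, 4]
```

In a real system it is usually worth requiring several consecutive flagged points before alerting, since a 95% interval will by design be exceeded about 5% of the time under normal operation.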

Next Steps

Deploy forecasting models with Docker and schedule retraining with Apache Airflow. Use MLflow to track experiment results and model versions across different forecasting approaches.

What is the difference between ARIMA and SARIMA?

SARIMA extends ARIMA with seasonal components (P, D, Q, m) where m is the seasonal period. ARIMA handles non-seasonal patterns. Use SARIMA when your data has clear repeating patterns at fixed intervals like daily, weekly, or yearly seasonality.
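
The seasonal D term corresponds to seasonal differencing: subtracting the value from one full period earlier. The following NumPy sketch shows the idea on a noise-free series; the helper name seasonal_difference is an assumption for illustration. In statsmodels, the seasonal order is passed to SARIMAX through its seasonal_order=(P, D, Q, m) argument.

```python
import numpy as np

def seasonal_difference(y, m):
    """Return y[t] - y[t - m] for t >= m, removing a repeating pattern of period m."""
    y = np.asarray(y, dtype=float)
    return y[m:] - y[:-m]

# Linear trend plus a pure 30-step seasonal cycle (no noise)
t = np.arange(200)
y_demo = 0.05 * t + 5 * np.sin(2 * np.pi * t / 30)

d_seasonal = seasonal_difference(y_demo, m=30)
# The cycle cancels exactly; only the trend's 30-step increase (0.05 * 30) remains
print("After seasonal differencing:", np.round(d_seasonal[:3], 4))

d_both = np.diff(d_seasonal)
# One further ordinary difference removes the remaining constant
print("After an additional first difference:", np.round(d_both[:3], 4))
```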

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
