RNNs & LSTMs for Sequential Data: Time Series and Text

DodaTech Updated 2026-06-22 7 min read

In this tutorial, you'll learn about RNNs & LSTMs for Sequential Data: Time Series and Text. We cover key concepts, practical examples, and best practices to help you understand and apply this topic effectively.

Recurrent neural networks (RNNs) are a class of neural networks that process sequential data by maintaining a hidden state that captures information about previous elements. Unlike feedforward networks, which treat each input independently, RNNs explicitly model temporal dependencies: each element in the sequence is processed in the context of everything that came before. This makes RNNs naturally suited to time series, text, audio, and any other data where order matters.

What You'll Learn

In this tutorial, you'll learn how RNNs and LSTMs process sequential data, what the vanishing gradient problem is and how LSTMs mitigate it, and how to build models for time series forecasting, text generation, and sequence classification with TensorFlow/Keras.

Why It Matters

Sequential data is everywhere: stock prices, sensor readings, audio signals, natural language, and video frames. RNNs and LSTMs are the foundation of time series analysis and were the dominant architecture for NLP before transformers. Understanding them is essential for working with any sequential or temporal data. Python and TensorFlow provide the tools for building sequential models.

Real-World Use

Durga Antivirus Pro uses an LSTM-based anomaly detector on system call sequences. The model learns the normal sequence of system calls for each application and flags deviations — detecting ransomware that exhibits unusual file access patterns before encryption completes.

How RNNs Work

An RNN processes a sequence one element at a time, passing a hidden state forward from each time step to the next. At each time step, it takes the current input and the previous hidden state, applies weight matrices and an activation function (typically tanh), and produces a new hidden state. Ideally, this hidden state captures all relevant information from the sequence up to that point. In practice, simple RNNs struggle with long sequences because gradients either vanish (shrink exponentially) or explode (grow exponentially) during backpropagation through time, which is why simple RNNs are rarely used for tasks that require long-term memory.
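The recurrence can be written as h_t = tanh(x_t W_x + h_{t-1} W_h + b). The following is a minimal NumPy sketch of that update, not the Keras implementation; the function name `rnn_forward`, the weight names, and the random initialization are assumptions made for illustration.

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b, h0=None):
    """Run a simple RNN over one sequence.

    inputs: array of shape (timesteps, features)
    W_x:    input weights, shape (features, hidden)
    W_h:    recurrent weights, shape (hidden, hidden)
    b:      bias, shape (hidden,)
    Returns the hidden state at every time step, shape (timesteps, hidden).
    """
    h = np.zeros(W_h.shape[0]) if h0 is None else h0
    states = []
    for x_t in inputs:
        # The new state mixes the current input with the previous state.
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
seq = rng.normal(size=(10, 1))             # 10 time steps, 1 feature
W_x = rng.normal(scale=0.5, size=(1, 4))   # illustrative random weights
W_h = rng.normal(scale=0.5, size=(4, 4))
b = np.zeros(4)

states = rnn_forward(seq, W_x, W_h, b)
print(states.shape)  # (10, 4)
```

The last row of `states` plays the role of the final hidden state that a Keras recurrent layer returns when `return_sequences=False`.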

flowchart LR
  A[X_0] --> B[RNN Cell]
  C[H_0] --> B
  B --> D[H_1]
  D --> E[RNN Cell]
  F[X_1] --> E
  E --> G[H_2]
  G --> H[RNN Cell]
  I[X_2] --> H
  H --> J[H_3]
  J --> K[Output]
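To make the vanishing gradient problem concrete, the sketch below tracks how strongly the hidden state at step t still depends on the initial hidden state. It multiplies the per-step Jacobians, which is the same product that appears in backpropagation through time. The weight scale of 0.1 is an assumption chosen so that the effect shows up quickly; larger recurrent weights can push the product the other way and make it explode.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 16
# Assumed small recurrent weights, so each step's Jacobian shrinks vectors.
W_h = rng.normal(scale=0.1, size=(hidden, hidden))
W_x = rng.normal(scale=0.5, size=(1, hidden))

h = np.zeros(hidden)
grad = np.eye(hidden)   # d h_t / d h_0, starts as the identity
norms = []
for t in range(50):
    x_t = rng.normal(size=1)
    h = np.tanh(x_t @ W_x + h @ W_h)
    # Jacobian of one step: d h_t / d h_{t-1} = diag(1 - h_t^2) @ W_h.T
    jac = (1.0 - h ** 2)[:, None] * W_h.T
    grad = jac @ grad
    norms.append(np.linalg.norm(grad))

for t in (0, 9, 24, 49):
    print(f"step {t + 1:2d}: gradient norm = {norms[t]:.2e}")
```

The printed norms fall by orders of magnitude as the number of steps grows, which is why early inputs contribute almost nothing to the weight updates of a simple RNN.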

Simple RNN for Time Series

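import numpy as np
import tensorflow as tf
from tensorflow import keras

# Fix the Python, NumPy, and TensorFlow seeds so repeated runs are comparable.
keras.utils.set_random_seed(42)

def create_sequences(data, seq_length=10):
    """Slice a 1D series into (window, next value) training pairs."""
    X, y = [], []
    for i in range(len(data) - seq_length):
        X.append(data[i:i+seq_length])   # seq_length consecutive values
        y.append(data[i+seq_length])     # the value that follows the window
    return np.array(X), np.array(y)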

time = np.linspace(0, 100, 500)
sine_wave = np.sin(time) + np.random.normal(0, 0.1, size=time.shape)

X, y = create_sequences(sine_wave, seq_length=10)
X = X.reshape(X.shape[0], X.shape[1], 1)

split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

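model = keras.Sequential([
    # Each sample is 10 timesteps with 1 feature; recent Keras versions
    # prefer an explicit Input over passing input_shape to the first layer.
    keras.Input(shape=(10, 1)),
    keras.layers.SimpleRNN(32, return_sequences=False),
    keras.layers.Dense(1)
])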
model.compile(optimizer='adam', loss='mse')

history = model.fit(X_train, y_train, epochs=10, validation_split=0.1, verbose=0)
print(f"Final training loss: {history.history['loss'][-1]:.4f}")
print(f"Final validation loss: {history.history['val_loss'][-1]:.4f}")

Example output (exact values vary with the random noise, the weight initialization, and the TensorFlow version):

Final training loss: 0.0123
Final validation loss: 0.0147

LSTMs: Long Short-Term Memory

LSTMs introduce a cell state that runs through the entire sequence, acting as an information highway. Three gates control what information flows through this cell state. The forget gate decides what to discard from the previous cell state. The input gate decides what new information to store. The output gate decides what parts of the cell state to output as the hidden state. Because the cell state is updated additively rather than being squashed through an activation at every step, gradients can flow across many time steps without shrinking to zero. This lets LSTMs maintain information over hundreds of time steps and largely mitigates the vanishing gradient problem that plagues simple RNNs.
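The gate equations can be sketched in a few lines of NumPy. This is an illustrative single-step implementation under stated assumptions, not the Keras source: the function name `lstm_step` and the column layout of the weight matrices (input gate, forget gate, candidate, output gate) are choices made for this example. The demo then sets the biases so the forget gate stays open and the input gate stays closed, and checks that the cell state survives 100 steps.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (features, 4*hidden), U: (hidden, 4*hidden), b: (4*hidden,)
    Assumed column layout: [input gate | forget gate | candidate | output gate]
    """
    n = h_prev.shape[0]
    z = x_t @ W + h_prev @ U + b
    i = sigmoid(z[0 * n:1 * n])      # input gate: how much new information to store
    f = sigmoid(z[1 * n:2 * n])      # forget gate: how much old cell state to keep
    g = np.tanh(z[2 * n:3 * n])      # candidate values to write
    o = sigmoid(z[3 * n:4 * n])      # output gate: how much of the cell to expose
    c = f * c_prev + i * g           # additive cell-state update
    h = o * np.tanh(c)               # hidden state passed to the next step
    return h, c

hidden, features = 3, 2
W = np.zeros((features, 4 * hidden))
U = np.zeros((hidden, 4 * hidden))
b = np.zeros(4 * hidden)
b[0 * hidden:1 * hidden] = -20.0     # input gate almost closed
b[1 * hidden:2 * hidden] = 20.0      # forget gate almost fully open

c = np.array([0.5, -0.3, 0.8])
c_start = c.copy()
h = np.zeros(hidden)
rng = np.random.default_rng(0)
for _ in range(100):
    h, c = lstm_step(rng.normal(size=features), h, c, W, U, b)

# The difference is tiny: the cell state was carried across 100 steps.
print(np.abs(c - c_start).max())
```

In a trained LSTM the gate values are learned and depend on the input, so the network itself decides when to hold on to information and when to overwrite it.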

# Reuses keras, X_train, y_train, X_test, and y_test from the SimpleRNN example.
lstm_model = keras.Sequential([
    keras.Input(shape=(10, 1)),
    keras.layers.LSTM(32, return_sequences=False),
    keras.layers.Dense(1)
])
lstm_model.compile(optimizer='adam', loss='mse')

lstm_model.fit(X_train, y_train, epochs=10, validation_split=0.1, verbose=0)
lstm_pred = lstm_model.predict(X_test[:5], verbose=0)
actual = y_test[:5]

print("LSTM Predictions vs Actual:")
for pred, act in zip(lstm_pred.flatten(), actual):
    print(f"  Pred: {pred:.3f}, Actual: {act:.3f}")

Example output (illustrative; your predictions and targets will differ because the series contains random noise):

LSTM Predictions vs Actual:
  Pred: 0.412, Actual: 0.398
  Pred: 0.523, Actual: 0.511
  Pred: 0.634, Actual: 0.628
  Pred: 0.745, Actual: 0.739
  Pred: 0.856, Actual: 0.848

Text Generation with LSTMs

LSTMs can learn the statistical structure of language and generate new text.

# Tiny toy corpus; the vocabulary is the set of distinct characters.
text = "hello world hello tensorflow hello keras"
chars = sorted(list(set(text)))
char_to_idx = {c: i for i, c in enumerate(chars)}
idx_to_char = {i: c for i, c in enumerate(chars)}

seq_length = 5
X_text, y_text = [], []
for i in range(len(text) - seq_length):
    X_text.append([char_to_idx[c] for c in text[i:i+seq_length]])
    y_text.append(char_to_idx[text[i+seq_length]])

X_text = tf.one_hot(X_text, len(chars))
y_text = tf.keras.utils.to_categorical(y_text, len(chars))

char_model = keras.Sequential([
    # Input: seq_length one-hot vectors, each of vocabulary size.
    keras.Input(shape=(seq_length, len(chars))),
    keras.layers.LSTM(32),
    keras.layers.Dense(len(chars), activation='softmax')
])
char_model.compile(optimizer='adam', loss='categorical_crossentropy')

char_model.fit(X_text, y_text, epochs=50, verbose=0)

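seed = "hello"
# One-hot encode the seed to shape (1, seq_length, vocab_size).
# The second argument of tf.one_hot is the depth, i.e. the vocabulary size.
X_seed = tf.one_hot([[char_to_idx[c] for c in seed]], len(chars))
pred = char_model.predict(X_seed, verbose=0)
# pred has shape (1, vocab_size); pick the most probable character.
pred_char = idx_to_char[int(np.argmax(pred[0]))]
print(f"Seed: '{seed}' -> Next char prediction: '{pred_char}'")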

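Expected output once the model has fit the training text, where "hello" is always followed by a space (with so little data, some runs may need more epochs to get there):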

Seed: 'hello' -> Next char prediction: ' '
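The example above predicts a single character with argmax. To generate longer text, a common approach is to sample from the predicted distribution and feed each sampled character back in as input. The sketch below shows that loop with temperature scaling. The helper names `sample_index` and `generate`, and the `predict_fn` callable, are hypothetical; a toy predictor stands in for the trained model so the loop can run on its own.

```python
import numpy as np

def sample_index(probs, temperature=1.0, rng=None):
    """Sample an index from a probability vector.

    temperature < 1 sharpens the distribution (safer choices);
    temperature > 1 flattens it (more surprising choices). Must be > 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-9) / temperature
    logits -= logits.max()                  # numerical stability
    scaled = np.exp(logits)
    scaled /= scaled.sum()
    return int(rng.choice(len(scaled), p=scaled))

def generate(predict_fn, seed_ids, n_chars, temperature=1.0, rng=None):
    """Extend seed_ids by n_chars, feeding each sampled id back in.

    predict_fn(window_ids) must return a probability vector over the vocabulary.
    """
    ids = list(seed_ids)
    window = len(seed_ids)
    for _ in range(n_chars):
        probs = predict_fn(ids[-window:])
        ids.append(sample_index(probs, temperature, rng))
    return ids

def toy_predict(window_ids):
    # Stand-in for a trained model: always predicts (last id + 1) mod 3.
    probs = np.zeros(3)
    probs[(window_ids[-1] + 1) % 3] = 1.0
    return probs

out = generate(toy_predict, [0], n_chars=4, temperature=0.5,
               rng=np.random.default_rng(0))
print(out)  # [0, 1, 2, 0, 1]
```

With the character model from this section, `predict_fn` could plausibly be a small wrapper that one-hot encodes the window and returns `char_model.predict(...)[0]`, with the resulting ids mapped back through `idx_to_char`.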

Stacked LSTMs for Deeper Sequences

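stacked_lstm = keras.Sequential([
    keras.Input(shape=(10, 1)),
    # return_sequences=True passes the full sequence of hidden states
    # to the next LSTM layer, which needs 3D input.
    keras.layers.LSTM(64, return_sequences=True),
    keras.layers.Dropout(0.2),
    # The last LSTM returns only its final hidden state.
    keras.layers.LSTM(32, return_sequences=False),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(1)
])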
stacked_lstm.compile(optimizer='adam', loss='mse')
stacked_lstm.fit(X_train, y_train, epochs=10, validation_split=0.1, verbose=0)

test_loss = stacked_lstm.evaluate(X_test, y_test, verbose=0)
print(f"Stacked LSTM test loss: {test_loss:.4f}")
print(f"Stacked LSTM params: {stacked_lstm.count_params():,}")

Example output (the test loss varies between runs; the parameter count is fixed by the architecture):

Stacked LSTM test loss: 0.0112
Stacked LSTM params: 29,345
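The parameter count can be checked by hand. An LSTM layer has four weight blocks (three gates plus the candidate update), each with input weights, recurrent weights, and a bias. The helper names below are made up for this check, and the formula assumes the default Keras LSTM configuration with a single bias vector per block.

```python
def lstm_params(units, input_dim):
    # 4 blocks x (input weights + recurrent weights + bias)
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    # weights + bias
    return units * input_dim + units

first = lstm_params(64, 1)     # 16,896
second = lstm_params(32, 64)   # 12,416
head = dense_params(1, 32)     # 33
total = first + second + head
print(f"{total:,}")  # 29,345
```

Dropout layers have no trainable parameters, so they do not appear in the sum.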

RNN vs LSTM Comparison

Aspect	Simple RNN	LSTM
Long-term dependencies	Poor (vanishing gradients)	Much better (gated cell state)
Parameters	Fewer	About 4x more for the same number of units (three gates plus the candidate update)
Training speed	Faster	Slower
Performance	Good for short sequences	Better for long sequences
Overfitting	Less prone	More prone (use dropout)

Common Errors and Mistakes

Mistake	Why It Happens	How to Fix
Wrong input shape	RNN expects [samples, timesteps, features]	Reshape data to 3D before training
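return_sequences confusion	The next LSTM layer expects a 3D sequence, but by default an LSTM returns only its final hidden state	Set return_sequences=True on every LSTM layer except the last
Not scaling time series	Large input values saturate activations and can destabilize training	Use MinMaxScaler or StandardScaler, fitted on the training split only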
Too few timesteps	Model misses long-range patterns	Use domain knowledge to set sequence length
No Dropout	LSTMs overfit on small data	Add Dropout between LSTM layers
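The scaling and input-shape rows of the table can be combined in one short sketch. It uses plain NumPy instead of scikit-learn's MinMaxScaler to stay self-contained; the helper names and the toy series are assumptions made for illustration. The scaling statistics come from the training portion only, so no information from the test period leaks into training.

```python
import numpy as np

def fit_minmax(train):
    """Return the min and max of the training portion only."""
    return float(np.min(train)), float(np.max(train))

def apply_minmax(values, lo, hi):
    """Scale values so the training range maps to [0, 1]."""
    return (np.asarray(values, dtype=np.float64) - lo) / (hi - lo)

raw = np.array([100.0, 120.0, 140.0, 160.0, 180.0,
                200.0, 220.0, 240.0, 260.0, 280.0])
split = int(len(raw) * 0.8)              # first 8 points are training data
lo, hi = fit_minmax(raw[:split])
scaled = apply_minmax(raw, lo, hi)       # test points may fall outside [0, 1]

seq_length = 3
windows = np.array([scaled[i:i + seq_length]
                    for i in range(len(scaled) - seq_length)])
X = windows[..., np.newaxis]             # add the features axis
print(X.shape)  # (7, 3, 1) = (samples, timesteps, features)
```

Values above 1 in the test portion are expected when the series trends upward; refitting the scaler on the full series would hide that and leak future information.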

Practice Questions

What problem do LSTMs solve that simple RNNs face?

Answer: LSTMs solve the vanishing gradient problem. Simple RNNs struggle to learn long-term dependencies because gradients shrink exponentially during backpropagation through time. LSTMs use gating mechanisms (input, forget, output gates) to control information flow.

What is the input shape expected by an RNN layer in Keras?

Answer: The input shape is (batch_size, timesteps, features). For a time series with 10 timesteps and 1 feature, the shape per sample is (10, 1).

What does return_sequences=True do in an LSTM layer?

Answer: It returns the full sequence of hidden states (one per timestep) instead of only the final hidden state. This is required when stacking LSTM layers because the next LSTM needs a sequence input.

Why is text generation with LSTMs considered a statistical process?

Answer: The LSTM learns the probability distribution of the next character given the previous characters. Generation samples from this distribution, producing text that mimics the training data statistically.

What is the trade-off between simple RNNs and LSTMs?

Answer: Simple RNNs are faster and have fewer parameters but struggle to capture long-range dependencies. LSTMs capture long-range patterns but are slower to train and typically need more data to generalize.

Challenge

Build an LSTM model for predicting stock prices (or any financial time series). Use 60-day sequences to predict the next day's closing price. Implement walk-forward validation (train on expanding window), compare single vs stacked LSTM, and add technical indicators (moving averages, RSI) as additional features.
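As a starting point for the walk-forward part of the challenge, the sketch below generates expanding-window splits: each fold trains on everything up to a cut-off and tests on the block that follows. The function name `walk_forward_splits` and its parameters are hypothetical, and the sizes in the demo are kept small for readability.

```python
def walk_forward_splits(n_samples, initial_train, test_size):
    """Yield (train_indices, test_indices) with an expanding training window."""
    start = initial_train
    while start + test_size <= n_samples:
        # Train on everything before `start`, test on the next block.
        yield range(0, start), range(start, start + test_size)
        start += test_size

folds = list(walk_forward_splits(n_samples=10, initial_train=4, test_size=2))
for train_idx, test_idx in folds:
    print(f"train 0..{train_idx[-1]} -> test {test_idx[0]}..{test_idx[-1]}")
# train 0..3 -> test 4..5
# train 0..5 -> test 6..7
# train 0..7 -> test 8..9
```

Each fold would refit the scaler and retrain (or fine-tune) the model on its training indices before scoring the test block, so every prediction uses only data that was available at that point in time.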

Real-World Task

Design an LSTM-based system for detecting anomalies in server CPU usage metrics. The system receives CPU, memory, and disk I/O readings every minute. Train an LSTM autoencoder to reconstruct normal patterns, and flag sequences with high reconstruction error as anomalies. This approach is used in production monitoring systems to detect infrastructure issues before they cause outages.
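The flagging step of such a system can be sketched independently of the model. Below, a placeholder function stands in for the trained LSTM autoencoder: it reproduces values inside the normal range and cannot follow values outside it. The placeholder, the synthetic data, and the mean plus three standard deviations threshold are all assumptions made for illustration; a real system would tune the threshold on held-out normal data.

```python
import numpy as np

def reconstruction_errors(windows, reconstructions):
    """Mean squared error per window; inputs shaped (n, timesteps, features)."""
    diff = np.asarray(windows) - np.asarray(reconstructions)
    return np.mean(diff ** 2, axis=(1, 2))

def fit_threshold(normal_errors, k=3.0):
    """Threshold = mean + k * std of the errors on known-normal data."""
    return float(np.mean(normal_errors) + k * np.std(normal_errors))

def stand_in_autoencoder(windows, rng):
    # Hypothetical placeholder for a trained LSTM autoencoder.
    return np.clip(windows, 0.3, 0.7) + rng.normal(0, 0.01, size=windows.shape)

rng = np.random.default_rng(0)
# 200 normal windows of 60 one-minute readings: CPU, memory, disk I/O.
normal = rng.normal(0.5, 0.05, size=(200, 60, 3))
normal_errors = reconstruction_errors(normal, stand_in_autoencoder(normal, rng))
threshold = fit_threshold(normal_errors)

spike = normal[:1].copy()
spike[0, 30:, 0] = 1.0          # CPU pinned at 100% for the last 30 minutes
spike_errors = reconstruction_errors(spike, stand_in_autoencoder(spike, rng))
is_anomaly = spike_errors > threshold
print(bool(is_anomaly[0]))      # the spiked window exceeds the threshold
```

The same two helper functions would apply unchanged once the placeholder is replaced by the reconstructions of a trained autoencoder.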

Next Steps

Now that you understand sequential models, explore NLP with NLP Basics and Hugging Face Transformers. Python and TensorFlow provide the tools for production sequential model deployment.

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
