What is Reinforcement Learning? Explained with Python Examples

DodaTech

This tutorial explains what reinforcement learning is, introduces its key concepts, and walks through a Q-learning example in Python.

What You'll Learn

Understand reinforcement learning fundamentals — agents, environments, rewards, policies — and build a Q-learning agent that learns to navigate a grid.

Why It Matters

Reinforcement learning is the technique behind AlphaGo, and it is applied in robotics, game AI, and research on self-driving cars and automated trading.

Real-World Use

Training a robot to walk, optimizing data center cooling (DeepMind reported cutting the energy used to cool Google's data centers by up to 40%), and teaching game AIs to beat human champions.

What is Reinforcement Learning?

Reinforcement learning (RL) is a type of machine learning in which an agent learns by taking actions in an environment and receiving rewards, much like training a dog with treats.

Agent → Takes action → Environment → Returns reward + new state
Agent ← Learns from reward ← Environment

The agent's goal: maximize total reward over time.
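
The loop above can be sketched in a few lines of Python. This is a minimal illustration, separate from the grid example later in the tutorial: the 1-D corridor, the `step` function, and the always-move-right policy are assumptions made up for this sketch. It also shows one common way to define "total reward over time", namely a discounted sum in which later rewards count slightly less.

```python
def step(position, action):
    """Hypothetical environment: a corridor with cells 0..4 and the goal at cell 4."""
    new_position = min(max(position + action, 0), 4)  # stay inside the corridor
    reward = 1.0 if new_position == 4 else -0.01      # small cost for every other step
    done = new_position == 4
    return new_position, reward, done


def run_episode(policy, max_steps=100):
    """Agent-environment loop: act, observe the reward and new state, repeat."""
    position, rewards = 0, []
    for _ in range(max_steps):
        action = policy(position)                     # agent chooses an action
        position, reward, done = step(position, action)  # environment responds
        rewards.append(reward)
        if done:
            break
    return rewards


def discounted_return(rewards, discount=0.95):
    """Sum of rewards where a reward k steps in the future is scaled by discount**k."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + discount * total
    return total


rewards = run_episode(lambda position: 1)  # assumed policy: always move right
print(len(rewards), sum(rewards), discounted_return(rewards))
```

With this policy the agent reaches the goal in four steps. A learning agent would start with a worse policy and improve it from the rewards it collects.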

Key Concepts

Concept	Definition	Example
Agent	The learner/decision-maker	A game player
Environment	The world the agent interacts with	The game board
Action	What the agent can do	Move left, right, up, down
State	Current situation	Player position
Reward	Feedback signal	+1 for reaching goal, -1 for falling
Policy	Strategy for choosing actions	"Always go toward the goal"
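
To make the "Policy" row concrete, here is a small sketch of a policy written two ways: as a function from state to action, and as a lookup table with one entry per state. The rule, the 5x5 size, and the names `toward_goal` and `policy_table` are illustrative assumptions, not something a library provides.

```python
def toward_goal(state, goal=(4, 4)):
    """Hand-written policy: close the row gap first, then the column gap."""
    row, col = state
    if row < goal[0]:
        return "down"
    if row > goal[0]:
        return "up"
    if col < goal[1]:
        return "right"
    return "left"


# The same policy stored as a table: one chosen action per non-goal state
policy_table = {
    (row, col): toward_goal((row, col))
    for row in range(5)
    for col in range(5)
    if (row, col) != (4, 4)
}

print(policy_table[(0, 0)], policy_table[(4, 0)])
```

A fixed rule like this knows nothing about obstacles or rewards. Reinforcement learning replaces the hand-written rule with one learned from experience.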

Q-Learning from Scratch

Let's build an agent that learns to navigate a 5x5 grid to reach a goal.
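
Before the full listing, it may help to see a single Q-learning update worked through by hand. The sketch below uses the same learning rate, discount, and step reward as the tutorial's code, but the two-state table and its starting values are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
import numpy as np

learning_rate = 0.1
discount = 0.95

# Hypothetical Q-table: 2 states x 2 actions, all estimates start at zero
q_table = np.zeros((2, 2))
q_table[1] = [0.5, 0.2]  # pretend state 1 already has some learned estimates

# One observed transition: in state 0, action 1 gave reward -0.01 and led to state 1
state, action, reward, next_state = 0, 1, -0.01, 1

# Target = immediate reward + discounted value of the best action in the next state
target = reward + discount * np.max(q_table[next_state])   # -0.01 + 0.95 * 0.5
# Move the old estimate a fraction (learning_rate) of the way toward the target
q_table[state, action] += learning_rate * (target - q_table[state, action])

print(q_table[state, action])  # approximately 0.0465
```

Repeating this update over many transitions is what gradually spreads the goal's reward back through the table.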

import numpy as np

# Grid: 0=empty, 1=obstacle, 2=goal
grid = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 2]
])

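# Q-table: one value per (row, col, action); actions are up, down, left, right
q_table = np.zeros((5, 5, 4))

actions = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}
learning_rate = 0.1
discount = 0.95
epsilon = 0.1  # exploration rate: chance of a random action (illustrative value)
episodes = 1000
rng = np.random.default_rng(0)  # fixed seed so runs are repeatable

for _ in range(episodes):
    state = (0, 0)
    while grid[state] != 2:
        # Epsilon-greedy: usually take the best known action, sometimes explore
        if rng.random() < epsilon:
            action = int(rng.integers(4))
        else:
            action = int(np.argmax(q_table[state[0], state[1]]))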
        dr, dc = actions[action]
        new_state = (state[0] + dr, state[1] + dc)

        # Check bounds and obstacles
        if (0 <= new_state[0] < 5 and 0 <= new_state[1] < 5
                and grid[new_state] != 1):
            reward = 1 if grid[new_state] == 2 else -0.01
            # Q-learning update
            best_next = np.max(q_table[new_state[0], new_state[1]])
            q_table[state[0], state[1], action] += learning_rate * (
                reward + discount * best_next -
                q_table[state[0], state[1], action]
            )
            state = new_state
        else:
            # Invalid move (off the grid or into an obstacle): stay in place
            # and lower this action's value so it is chosen less often
            q_table[state[0], state[1], action] -= 0.1

print("Training complete!")
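
Once training finishes, the Q-table can be read as a policy by following the highest-valued action from each cell. The helper below is a sketch under assumptions: `greedy_path` is a hypothetical name, it does not check for obstacles, and the demo uses a small hand-filled table rather than a trained one so that the result is easy to verify.

```python
import numpy as np

ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up, down, left, right


def greedy_path(q_table, start, goal, max_steps=50):
    """Follow the highest-valued action from each cell until the goal or the step cap."""
    rows, cols, _ = q_table.shape
    state, path = start, [start]
    for _ in range(max_steps):
        if state == goal:
            break
        action = int(np.argmax(q_table[state[0], state[1]]))
        dr, dc = ACTIONS[action]
        next_state = (state[0] + dr, state[1] + dc)
        if not (0 <= next_state[0] < rows and 0 <= next_state[1] < cols):
            break  # the best-looking action leaves the grid, so stop here
        state = next_state
        path.append(state)
    return path


# Hypothetical hand-filled Q-table for a 2x2 grid with the goal at (1, 1)
demo_q = np.zeros((2, 2, 4))
demo_q[0, 0, 3] = 0.9  # from (0, 0), "right" looks best
demo_q[0, 1, 1] = 1.0  # from (0, 1), "down" looks best

print(greedy_path(demo_q, (0, 0), (1, 1)))  # [(0, 0), (0, 1), (1, 1)]
```

With the trained table from the listing above, `greedy_path(q_table, (0, 0), (4, 4))` would show the route the agent has learned. A path that stops short of the goal suggests the table needs more training.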

When to Use RL

Good fit	Poor fit
Sequential decision-making	One-shot predictions
Environment is a simulator	Real-world with slow feedback
Exploration is safe	Mistakes are expensive
