System Design Interview Prep Roadmap — Complete 8-Week Study Plan

DodaTech Updated 2026-06-22 10 min read

In this tutorial, you'll learn about the System Design Interview Prep Roadmap. We cover key concepts, practical examples, and best practices.

A complete 8-week system design interview roadmap covering fundamentals, distributed systems patterns, real-world architecture studies, whiteboard practice, and mock interview preparation.

What You'll Learn

You will learn a structured approach to designing large-scale systems, master the common patterns used in system design interviews, understand trade-offs between architectures, and gain confidence through mock interview practice with real-world scenarios.

Why It Matters

System design interviews are the hardest part of the technical interview process for senior engineering roles. They test your ability to think at scale, make architectural trade-offs, communicate clearly, and handle ambiguity. Unlike coding interviews, there is no single correct answer -- interviewers evaluate your process, reasoning, and depth of knowledge.

Real-World Use

The DodaZIP and Doda Browser engineering teams use the same design patterns covered in this roadmap. When Doda Browser needed to build a real-time sync feature across 10 million devices, the engineering lead sketched a solution using the same methodology taught here: start with requirements, estimate scale, design the data model, then iterate on the architecture.

Your Learning Path

flowchart LR
  A[Week 1-2: Fundamentals] --> B[Week 3-4: Core Patterns]
  B --> C[Week 5-6: Real Systems]
  C --> D[Week 7: Mock Practice]
  D --> E[Week 8: Final Review]
  A --> F{You Are Here}
  style F fill:#f90,color:#fff

ℹ️ Info

Prerequisites: 2+ years of software engineering experience. Familiarity with databases, APIs, caching, and basic networking concepts. This roadmap assumes you can already code at a senior engineer level.

Week 1-2: Fundamentals

Scalability Concepts

Concept	Definition	Key Question
Vertical scaling	Add more power to a single machine	When is this still viable?
Horizontal scaling	Add more machines	How do you handle state?
Load balancing	Distribute traffic across servers	Which algorithm? Round-robin, least connections?
Caching	Store frequently accessed data in fast storage	What is the eviction policy?
CDN	Distribute static assets geographically	What should NOT go on a CDN?
Database sharding	Split data across multiple databases	What is the shard key?
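
To make the shard-key question concrete, here is a minimal sketch of hash-based shard routing. The shard count, the `shard_for` helper, and the `user:<n>` key format are illustrative assumptions, not the API of any particular database.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # assumed shard count, for illustration only


def shard_for(shard_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key to a shard index using a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a digest is used to get the same answer on every server.
    """
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Route 10,000 hypothetical user IDs and count how many land on each shard.
distribution = Counter(shard_for(f"user:{i}") for i in range(10_000))
for shard, count in sorted(distribution.items()):
    print(f"shard {shard}: {count} keys")
```

Plain modulo routing remaps most keys whenever the shard count changes, which is why consistent hashing usually comes up as the follow-up question.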

Estimation Skills

Example: Estimate Twitter timeline storage

Daily active users: 300M
Tweets per active user per day: 2
Total tweets per day: 600M
Average tweet size: 280 bytes (text) + ~100 bytes metadata ≈ 400 bytes (rounded up)
Daily storage: 600M * 400 bytes = 240 GB/day
Yearly storage: 240 GB * 365 ≈ 87.6 TB/year
With 3x replication: ~263 TB/year

This tells us we need distributed storage from day 1.

Expected behavior: Running these estimations before designing gives you concrete constraints. If the numbers are too large for a single database, you know you need sharding or partitioning.
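
The same back-of-the-envelope arithmetic can be kept as a small helper so you can vary the inputs quickly. This is a sketch: the function name `estimate_storage` and the use of decimal units (1 GB = 10^9 bytes) are assumptions made for the example.

```python
GB = 10**9   # decimal gigabyte
TB = 10**12  # decimal terabyte


def estimate_storage(daily_active_users, tweets_per_user, bytes_per_tweet,
                     replication_factor=3):
    """Return daily, yearly, and replicated yearly storage in bytes."""
    tweets_per_day = daily_active_users * tweets_per_user
    daily_bytes = tweets_per_day * bytes_per_tweet
    yearly_bytes = daily_bytes * 365
    return {
        "tweets_per_day": tweets_per_day,
        "daily_bytes": daily_bytes,
        "yearly_bytes": yearly_bytes,
        "replicated_yearly_bytes": yearly_bytes * replication_factor,
    }


# Inputs taken from the example above: 300M DAU, 2 tweets each, ~400 bytes per tweet.
result = estimate_storage(300_000_000, 2, 400)
print(f"Tweets per day:   {result['tweets_per_day']:,}")
print(f"Daily storage:    {result['daily_bytes'] / GB:.1f} GB")
print(f"Yearly storage:   {result['yearly_bytes'] / TB:.1f} TB")
print(f"With replication: {result['replicated_yearly_bytes'] / TB:.1f} TB")
```

With these inputs the helper reports 240.0 GB per day, 87.6 TB per year, and 262.8 TB per year after 3x replication.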

Key Trade-offs to Know

Trade-off	Description
Consistency vs Availability	CAP theorem: network partitions cannot be ruled out, so during a partition you must choose between consistency and availability
Read vs Write optimization	Denormalize for reads, normalize for writes
Latency vs Throughput	Batch for throughput, stream for latency
Synchronous vs Asynchronous	Sync is simpler but blocks, async is complex but scales
SQL vs NoSQL	SQL for complex queries, NoSQL for horizontal scale

Week 3-4: Core Design Patterns

Load Balancer Pattern

Client -> DNS -> Load Balancer -> Web Servers
                               -> Cache (Redis)
                               -> Database (Primary/Replica)

Layer	Technology	Purpose
DNS	Route 53, Cloud DNS	Geographic routing
L4 LB	HAProxy, AWS NLB	TCP load balancing
L7 LB	AWS ALB, Envoy, Nginx	HTTP routing, SSL termination
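
Whichever layer the balancer sits at, it still needs a rule for picking a backend. The sketch below contrasts round-robin with least connections; the class names and the `web-N` server names are hypothetical, and real balancers also track health checks and weights.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation, ignoring current load."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Picks the server with the fewest in-flight requests."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # min() returns the first server among ties (dict insertion order).
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


servers = ["web-1", "web-2", "web-3"]

rr = RoundRobinBalancer(servers)
rr_picks = [rr.pick() for _ in range(5)]
print(rr_picks)  # ['web-1', 'web-2', 'web-3', 'web-1', 'web-2']

lc = LeastConnectionsBalancer(servers)
first = lc.acquire()    # web-1 (all servers tied at 0)
second = lc.acquire()   # web-2
lc.release(first)       # web-1 finishes its request
third = lc.acquire()    # web-1 again: back to 0 active connections
print(first, second, third)  # web-1 web-2 web-1
```

Round-robin is stateless and cheap; least connections copes better when request durations vary widely, at the cost of tracking per-server state.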

Database Replication

flowchart LR
  A[Application] -->|writes| B[Primary DB]
  B -->|async replication| C[Replica 1]
  B -->|async replication| D[Replica 2]
  B -->|async replication| E[Replica 3]
  A -->|reads| C
  A -->|reads| D
  A -->|reads| E

  style B fill:#4a9,color:#fff
  style C fill:#49a,color:#fff
  style D fill:#49a,color:#fff
  style E fill:#49a,color:#fff

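Expected behavior: Writes go to the primary. Reads are distributed across replicas. If the primary fails, one replica is promoted. Read capacity grows roughly in proportion to replica count, but write capacity does not, and because replication is asynchronous a replica can briefly serve stale data.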
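
The routing rule in the diagram can be sketched in a few lines. This is an illustration only: `ReplicatedDatabaseRouter` and the node names are hypothetical, and a real failover also needs leader election and a check of how far each replica has caught up.

```python
import itertools


class ReplicatedDatabaseRouter:
    """Routes writes to the primary and spreads reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._read_cycle = itertools.cycle(self.replicas)

    def node_for_write(self):
        return self.primary

    def node_for_read(self):
        # With no replicas left, reads fall back to the primary.
        if not self.replicas:
            return self.primary
        return next(self._read_cycle)

    def promote_replica(self):
        """Simulate failover: the first replica becomes the new primary."""
        if not self.replicas:
            raise RuntimeError("no replica available to promote")
        self.primary = self.replicas.pop(0)
        self._read_cycle = itertools.cycle(self.replicas)
        return self.primary


router = ReplicatedDatabaseRouter("primary", ["replica-1", "replica-2", "replica-3"])
print(router.node_for_write())                      # primary
reads_before = [router.node_for_read() for _ in range(4)]
print(reads_before)   # ['replica-1', 'replica-2', 'replica-3', 'replica-1']

new_primary = router.promote_replica()
print(new_primary)                                  # replica-1
reads_after = [router.node_for_read() for _ in range(2)]
print(reads_after)    # ['replica-2', 'replica-3']
```

Because replication is asynchronous, the promoted replica may be missing the last few writes the old primary accepted. That gap is worth naming explicitly in an interview.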

Caching Strategies

# Cache-aside pattern (lazy loading)
import redis
import json

cache = redis.Redis(host="localhost", port=6379)

def get_user_profile(user_id):
    """Fetch user profile with cache-aside pattern."""

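    # Try cache first
    cached = cache.get(f"user:{user_id}")
    if cached is not None:
        return json.loads(cached)

    # Cache miss: query database
    # NOTE: `database` is a placeholder for your DB client and is not defined here;
    # its query() is assumed to return a JSON-serializable dict or None.
    profile = database.query(
        "SELECT * FROM users WHERE id = %s",
        (user_id,)
    )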

    if profile:
        # Store in cache with TTL
        cache.setex(
            f"user:{user_id}",
            3600,  # 1 hour TTL
            json.dumps(profile)
        )

    return profile

Expected behavior: On first request, cache misses and the database is queried. Subsequent requests hit the cache. After 1 hour, the cache entry expires and the next request refreshes it.

Rate Limiting

import time
from collections import defaultdict

class SlidingWindowRateLimiter:
    """Rate limiter using a sliding window log (one timestamp stored per request)."""

    def __init__(self, max_requests=100, window_seconds=60):
        self.max_requests = max_requests
        self.window = window_seconds
        self.requests = defaultdict(list)

    def allow_request(self, client_id):
        now = time.time()
        window_start = now - self.window

        # Remove expired entries
        self.requests[client_id] = [
            t for t in self.requests[client_id]
            if t > window_start
        ]

        # Check limit
        if len(self.requests[client_id]) >= self.max_requests:
            return False

        self.requests[client_id].append(now)
        return True

limiter = SlidingWindowRateLimiter(max_requests=5, window_seconds=10)

for i in range(8):
    allowed = limiter.allow_request("user-42")
    print(f"Request {i+1}: {'Allowed' if allowed else 'Rate limited'}")
    time.sleep(1)

Expected output:

Request 1: Allowed
Request 2: Allowed
Request 3: Allowed
Request 4: Allowed
Request 5: Allowed
Request 6: Rate limited
Request 7: Rate limited
Request 8: Rate limited

Week 5-6: Study Real Systems

System Design Practice Problems

Problem	Key Concepts	Sample Question
URL Shortener	Hashing, Base62, key generation	"Design TinyURL"
Chat System	WebSockets, presence, persistence	"Design WhatsApp"
News Feed	Fan-out, pull vs push, ranking	"Design Facebook Feed"
Video Streaming	Chunking, CDN, adaptive bitrate	"Design YouTube"
Ride Sharing	Geospatial indexing, matching	"Design Uber"
E-commerce	Inventory, cart, order processing	"Design Amazon"

How to Approach a Design Problem

Step 1: Clarify Requirements (5 min)
  - Functional: What should the system do?
  - Non-functional: Scale, latency, availability, consistency
  - Constraints: Budget, timeline, team size

Step 2: High-Level Design (10 min)
  - Draw the main components
  - Show data flow
  - Identify the core API endpoints

Step 3: Deep Dive (15 min)
  - Data model (schema, storage choice)
  - Key component details
  - Bottlenecks and trade-offs

Step 4: Scale (10 min)
  - Where do you add more servers?
  - How do you handle 10x traffic?
  - What breaks first?

Week 7: Mock Interviews

Practice with real system design prompts under timed conditions.

Mock Prompt: Design a real-time leaderboard for a gaming platform

Requirements:
- 10M daily active users
- Scores update in real-time during matches
- Leaderboard shows top 100 players globally
- Also need friends leaderboard
- Latency: < 500ms for leaderboard view

Expected approach:
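1. Use Redis sorted sets for real-time ranking
2. ZADD for score updates (O(log N) per update)
3. ZREVRANGE for top 100 (O(log N + M)); since Redis 6.2, ZRANGE with the REV option is the preferred form
4. Shard by game region for write scalability, then merge each shard's top 100 to build the global top 100
5. Friends leaderboard: look up scores for the friend list and sort that small set, serving the lookups from read replicas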
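
If scores are sharded by region, the global top 100 has to be assembled from the shards. The sketch below shows the merge step with plain dictionaries standing in for per-region sorted sets; the player names, scores, and function names are made up for illustration, and it assumes each player lives in exactly one shard.

```python
import heapq


def top_n(scores, n):
    """Return the n highest (player, score) pairs from one shard."""
    return heapq.nlargest(n, scores.items(), key=lambda item: item[1])


def global_top_n(shard_scores, n):
    """Merge per-shard top-n lists into one global top-n list."""
    candidates = []
    for scores in shard_scores:
        # Any player in the global top n must also be in their shard's top n.
        candidates.extend(top_n(scores, n))
    return heapq.nlargest(n, candidates, key=lambda item: item[1])


# Hypothetical per-region score tables (player -> score).
shards = [
    {"ana": 920, "bo": 870, "cy": 400},
    {"dee": 990, "eli": 610},
    {"fay": 880, "gus": 930, "hal": 120},
]

leaders = global_top_n(shards, 3)
for rank, (player, score) in enumerate(leaders, start=1):
    print(f"{rank}. {player} {score}")
# 1. dee 990
# 2. gus 930
# 3. ana 920
```

Only n entries per shard cross the network, so the merge cost depends on the number of shards rather than the number of players.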

Common System Design Interview Mistakes

1. Jumping to a Solution Without Requirements

Starting with "we need a load balancer" before clarifying the problem's constraints leads to over-engineering or wrong architecture. Always spend the first 5 minutes on requirements.

2. Ignoring Non-Functional Requirements

Designing for 100 users when the question asks for 100 million means your solution misses the point. Confirm numbers: DAU, write volume, read volume, latency requirements.

3. Not Discussing Trade-offs

Every design choice has trade-offs. Not acknowledging them shows shallow understanding. When you choose a technology, explain what you are giving up: "I chose Cassandra over Postgres because write scalability matters more here than complex queries."

4. Going Too Deep Too Early

Explaining sharding internals before drawing the high-level architecture wastes time. Start top-down: first show the box diagram, then dive into details.

5. Forgetting About Failure Modes

Systems fail. Database replicas lag. Caches evict. Networks partition. An architecture that assumes everything works perfectly is not a production design. Discuss: what happens when Redis goes down? How do you handle a database failover?
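
One way to answer the "what happens when Redis goes down?" question is to show that a cache outage degrades to slower reads rather than errors. The sketch below uses an in-memory stand-in instead of a real Redis client; `FlakyCache`, `CacheUnavailable`, and `DATABASE` are hypothetical names invented for the example.

```python
class CacheUnavailable(Exception):
    """Raised by the stand-in cache when it cannot be reached."""


class FlakyCache:
    """In-memory stand-in for a cache that can be switched 'down'."""

    def __init__(self):
        self.store = {}
        self.is_down = False

    def get(self, key):
        if self.is_down:
            raise CacheUnavailable(key)
        return self.store.get(key)

    def set(self, key, value):
        if self.is_down:
            raise CacheUnavailable(key)
        self.store[key] = value


DATABASE = {"user:1": {"name": "Ada"}}  # hypothetical source of truth


def get_profile(cache, key):
    """Read through the cache, but never fail just because the cache did."""
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached, "cache"
    except CacheUnavailable:
        pass  # degrade to the database instead of returning an error

    value = DATABASE.get(key)
    try:
        if value is not None:
            cache.set(key, value)
    except CacheUnavailable:
        pass  # best effort: skip repopulating a cache that is down
    return value, "database"


cache = FlakyCache()
print(get_profile(cache, "user:1"))  # ({'name': 'Ada'}, 'database')
print(get_profile(cache, "user:1"))  # ({'name': 'Ada'}, 'cache')
cache.is_down = True
print(get_profile(cache, "user:1"))  # ({'name': 'Ada'}, 'database')
```

The fallback keeps requests succeeding, but it also sends the full read load to the database, so the follow-up discussion is usually about whether the database can absorb that load.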

6. Not Using the Whiteboard Effectively

A messy, unlabeled diagram confuses the interviewer. Use clear labels, consistent symbols, and show data flow with arrows. Practice drawing clean diagrams.

7. Ignoring the "So What" Question

After explaining your design, the interviewer will ask follow-ups to test depth. Common follow-ups: "How do you handle 10x more traffic?", "What if this component fails?", "How do you reduce latency by 50%?"

Practice Questions

1. What is the CAP theorem and how does it apply to system design?
The CAP theorem states that a distributed system can guarantee at most two of: Consistency (every read gets the latest write), Availability (every request gets a response), and Partition Tolerance (the system works despite network failures). In practice, partitions happen, so you choose between CP (Consistency + Partition Tolerance) and AP (Availability + Partition Tolerance).

2. What is the difference between vertical and horizontal scaling?
Vertical scaling adds resources (CPU, RAM) to a single machine. Horizontal scaling adds more machines. Vertical scaling has an upper limit and leaves a single point of failure. Horizontal scaling has no hard ceiling in principle but requires handling distributed state.

3. When would you choose a NoSQL database over a SQL database?
Choose NoSQL for horizontal scalability, high write throughput, a flexible schema, and simple query patterns. Choose SQL for complex joins, transactions, strict consistency, and a well-defined schema.

4. What is the cache-aside pattern and when should you use it?
Cache-aside loads data into the cache on demand (lazy loading). On a read, check the cache; on a miss, query the database and populate the cache. It is simple to implement and handles cache failures gracefully (the system still works without the cache). It is best for read-heavy workloads that can tolerate some staleness.

5. Challenge: Design a distributed rate limiter that can handle 1 million requests per second across multiple data centers. Consider accuracy, latency, and fault tolerance.

Mini Project: Design a URL Shortener

Design a complete URL shortening service similar to TinyURL. Your design should include:
- API design for creating and resolving short URLs
- Data model (relational or NoSQL)
- Hash function and collision strategy (Base62 encoding with unique ID)
- Cache layer for frequently accessed URLs
- Database sharding strategy for scaling to billions of URLs
- Analytics tracking for click counts and referrer data
- A written explanation of all trade-offs made

# Conceptual implementation of the core logic
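import itertools
import string

BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase

# Local stand-in so the example runs; a real service would call a
# distributed ID generator here instead of an in-process counter.
_id_counter = itertools.count(1)

def get_next_id():
    """Return the next unique integer ID (placeholder implementation)."""
    return next(_id_counter)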

def encode_base62(num):
    """Convert a number to Base62 string."""
    if num == 0:
        return BASE62[0]

    result = []
    while num > 0:
        result.append(BASE62[num % 62])
        num //= 62
    return "".join(reversed(result))

def generate_short_key(url, user_id):
    """Generate a short key using a unique ID (from a distributed ID generator)."""
    # In production, use a distributed unique ID service (Snowflake, etc.)
    unique_id = get_next_id()  # Imagine this returns a globally unique integer
    return encode_base62(unique_id)

# Example
short_key = generate_short_key("https://example.com/long-url", "user_42")
print(f"Short URL: https://short.ly/{short_key}")

Expected behavior: Every URL gets a unique short key. The key is 7 characters or fewer for billions of URLs. The resolver looks up the key in a distributed cache (Redis), falls back to the database, increments the click counter asynchronously, and redirects with a 302.
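
The resolver flow described above can be sketched with in-memory stand-ins. Everything here is hypothetical scaffolding: `URL_TABLE` plays the database, `CACHE` plays Redis, and `CLICK_QUEUE` plays the asynchronous analytics queue. The first line also checks the key-length claim, since seven Base62 characters give 62^7 possible keys.

```python
KEYSPACE_7_CHARS = 62 ** 7
print(f"{KEYSPACE_7_CHARS:,} possible 7-character keys")  # 3,521,614,606,208

URL_TABLE = {"abc123": "https://example.com/long-url"}  # stand-in database
CACHE = {}         # stand-in for a distributed cache such as Redis
CLICK_QUEUE = []   # stand-in for an async queue feeding click analytics


def resolve(short_key):
    """Return (status_code, location) for a short key."""
    long_url = CACHE.get(short_key)
    if long_url is None:
        # Cache miss: fall back to the database.
        long_url = URL_TABLE.get(short_key)
        if long_url is None:
            return 404, None
        CACHE[short_key] = long_url

    # Record the click without blocking the redirect; a worker would
    # drain this queue and update counters later.
    CLICK_QUEUE.append(short_key)
    return 302, long_url


print(resolve("abc123"))   # (302, 'https://example.com/long-url') via database
print(resolve("abc123"))   # (302, 'https://example.com/long-url') via cache
print(resolve("missing"))  # (404, None)
print(len(CLICK_QUEUE))    # 2
```

A 302 keeps every click flowing through the service so it can be counted; a 301 would let browsers cache the redirect and reduce load, at the cost of less accurate analytics.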

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
