Register Allocation — Graph Coloring and Linear Scan

DodaTech Updated 2026-06-21 7 min read

Register Allocation is the Process of mapping the unlimited virtual registers used in intermediate code to the finite physical registers available on the target CPU, directly determining the performance of generated code.

What You'll Learn & Why It Matters

In this tutorial, you will learn the two most important Register Allocation algorithms — graph coloring and linear scan — and understand how they balance allocation quality against compilation speed. Efficient Register Allocation is the single most impactful optimization for most programs.

Real-world use: Durga Antivirus Pro uses Register Allocation analysis during malware unpacking to reconstruct original instruction sequences that packers have intentionally obfuscated through register reassignment.

Prerequisites

You should understand Code Generation from the code generation tutorial. Familiarity with IR concepts from the IR tutorial is required. Graph theory basics (nodes, edges, coloring) help for the graph coloring algorithm.

Why Register Allocation Matters

CPU registers are 100-500 times faster than main memory. Every value kept in a register instead of memory represents a dramatic speedup. However, CPUs have very few registers:

Architecture	General-Purpose Registers
x86-64	16 (rax, rbx, rcx, rdx, rsi, rdi, r8-r15)
ARM64	31 (x0-x30)
RISC-V	32 (x0-x31)

A typical program may have hundreds of live virtual registers during compilation. The allocator must fit all active values into these few physical registers.

graph TD
    subgraph "Before Allocation"
        V1[Virtual Registers: t1, t2, t3, t4, t5, t6, ...]
    end
    subgraph "After Allocation"
        P1[Physical: eax]
        P2[Physical: ebx]
        P3[Physical: ecx]
        P4[Physical: edx]
        P5[Stack: spilled values]
    end
    V1 --> P1
    V1 --> P2
    V1 --> P3
    V1 --> P4
    V1 --> P5
    style V1 fill:#4CAF50,color:#fff
    style P1 fill:#2196F3,color:#fff
    style P2 fill:#2196F3,color:#fff
    style P3 fill:#2196F3,color:#fff
    style P4 fill:#2196F3,color:#fff
    style P5 fill:#f44336,color:#fff

Live Range Analysis

Before allocation, the compiler computes live ranges: the set of instructions where each virtual register holds a value that will be used later.

class LiveRangeAnalyzer:
    def __init__(self, instructions):
        self.instructions = instructions
        self.live_ranges = {}

    def analyze(self):
        for idx, instr in enumerate(self.instructions):
            defined = self._get_defined(instr)
            used = self._get_used(instr)
            for var in defined:
                if var not in self.live_ranges:
                    self.live_ranges[var] = {"start": idx, "end": idx}
                else:
                    self.live_ranges[var]["end"] = idx
            for var in used:
                if var in self.live_ranges:
                    self.live_ranges[var]["end"] = idx

    def _get_defined(self, instr):
        if " = " in instr:
            return [instr.split(" = ")[0]]
        return []

    def _get_used(self, instr):
        if " = " in instr:
            rhs = instr.split(" = ")[1]
            return [w for w in rhs.split() if w.startswith("t")]
        return []

    def overlaps(self, var1, var2):
        r1 = self.live_ranges.get(var1)
        r2 = self.live_ranges.get(var2)
        if not r1 or not r2:
            return False
        return r1["start"] <= r2["end"] and r2["start"] <= r1["end"]

instructions = [
    "t1 = 5", "t2 = 3", "t3 = t1 + t2",
    "t4 = t3 * 2", "x = t4", "t5 = 10", "y = t5",
]
analyzer = LiveRangeAnalyzer(instructions)
analyzer.analyze()
for var, rng in analyzer.live_ranges.items():
    print(f"{var}: live {rng['start']}-{rng['end']}")

Expected output:

t1: live 0-2
t2: live 1-2
t3: live 2-3
t4: live 3-4
t5: live 5-6

Graph Coloring Algorithm

Graph coloring treats Register Allocation as a graph coloring problem. Each virtual register is a node; edges connect nodes with overlapping live ranges; colors represent physical registers.

class GraphColoringAllocator:
    def __init__(self, num_registers=4):
        self.num_registers = num_registers
        self.colors = {}

    def build_interference_graph(self, liveness):
        graph = {}
        for var1 in liveness.live_ranges:
            graph[var1] = set()
            for var2 in liveness.live_ranges:
                if var1 != var2 and liveness.overlaps(var1, var2):
                    graph[var1].add(var2)
        return graph

    def allocate(self, liveness):
        graph = self.build_interference_graph(liveness)
        reg_names = [f"r{i}" for i in range(self.num_registers)]
        spill_list = []
        for var in sorted(graph.keys(), key=lambda v: len(graph[v]), reverse=True):
            used_colors = set()
            for neighbor in graph[var]:
                if neighbor in self.colors:
                    used_colors.add(self.colors[neighbor])
            assigned = False
            for reg in reg_names:
                if reg not in used_colors:
                    self.colors[var] = reg
                    assigned = True
                    break
            if not assigned:
                spill_list.append(var)
        return self.colors, spill_list

allocator = GraphColoringAllocator(4)
colors, spills = allocator.allocate(analyzer)
for var, reg in colors.items():
    print(f"{var} -> {reg}")
if spills:
    print(f"Spilled to memory: {spills}")

Expected output:

t1 -> r0
t2 -> r1
t3 -> r2
t4 -> r3
t5 -> r0

Linear Scan Algorithm

Linear scan is a simpler, faster alternative to graph coloring, commonly used in JIT compilers.

class LinearScanAllocator:
    def __init__(self, num_registers=4):
        self.num_registers = num_registers

    def allocate(self, liveness):
        intervals = sorted(liveness.live_ranges.items(),
                          key=lambda x: x[1]["start"])
        allocation = {}
        active = []

        for var, interval in intervals:
            active = [(e, v, r) for e, v, r in active if e > interval["start"]]

            if len(active) < self.num_registers:
                used_regs = {r for _, _, r in active}
                for reg in range(self.num_registers):
                    if reg not in used_regs:
                        allocation[var] = f"r{reg}"
                        active.append((interval["end"], var, reg))
                        break
            else:
                furthest = max(active, key=lambda x: x[0])
                if furthest[0] > interval["end"]:
                    active.remove(furthest)
                    allocation[furthest[1]] = "spill"
                    allocation[var] = f"r{furthest[2]}"
                    active.append((interval["end"], var, furthest[2]))
                else:
                    allocation[var] = "spill"

        return allocation

ls_allocator = LinearScanAllocator(4)
ls_result = ls_allocator.allocate(analyzer)
for var, reg in sorted(ls_result.items()):
    print(f"{var} -> {reg}")

Expected output:

t1 -> r0
t2 -> r1
t3 -> r2
t4 -> r3
t5 -> r0

Spilling and Rewriting

When allocation fails, the compiler spills values to memory:

def rewrite_with_spills(instructions, allocation):
    rewritten = []
    for instr in instructions:
        parts = instr.split(" = ", 1)
        if len(parts) == 2:
            target, expr = parts
            if allocation.get(target) == "spill":
                rewritten.append(instr)
                rewritten.append(f"  [spill_{target}] = {target}")
            else:
                for var in [w for w in expr.split() if allocation.get(w) == "spill"]:
                    rewritten.append(f"  {var} = [spill_{var}]")
                rewritten.append(instr)
        else:
            rewritten.append(instr)
    return rewritten

Common Errors in Register Allocation

Error 1: Incorrect Live Range Calculation

Missing a use site causes premature register freeing, leading to incorrect code. Always compute live ranges conservatively.

Error 2: Callee-Saved Register Mismanagement

Caller-saved registers must be saved by the caller. Callee-saved registers must be saved by the callee. The allocator must respect the target ABI's calling convention.

Error 3: Ignoring Fixed Registers

Some registers have special purposes (stack pointer, frame pointer, return address). These must be excluded from the allocator's pool.

Error 4: Inefficient Spill Code

Spilling a value to memory and reloading it immediately wastes cycles. The allocator should minimize spill code by choosing the least-cost spill candidate.

Error 5: No Rematerialization

Computing a value is often cheaper than loading it from memory. Rematerialization recomputes spilled values instead of reloading them, trading computation for memory access.

Practice Questions

Question 1

What is an interference graph in Register Allocation?

Show answer

An interference graph has a node for each virtual register and edges between registers whose live ranges overlap. Two registers that interfere cannot share the same physical register.

Question 2

What happens during register spilling?

Show answer

When no physical register is available, the compiler selects a virtual register to spill (store to memory). The spilled value is loaded back before each use and stored after each definition.

Question 3

How does linear scan differ from graph coloring?

Show answer

Linear scan is faster (O(n log n) vs exponential worst-case for graph coloring) but produces slightly worse allocations. Linear scan is commonly used in JIT compilers where compilation speed matters.

Question 4

What is a calling convention?

Show answer

A calling convention defines which registers are caller-saved (must be saved by the caller), which are callee-saved (must be restored by the callee), and how function arguments and return values are passed.

Question 5

Why is coalescing important in Register Allocation?

Show answer

Coalescing merges related live ranges (like copy instructions) so they can share the same register, eliminating unnecessary register-to-register copy instructions.

Challenge

Implement a register allocator that handles both graph coloring and spilling with a cost model. The allocator should select the cheapest variable to spill based on spill cost (number of uses inside loops weighted by loop depth) and interference degree. Test on a function with 10+ live variables but only 4 physical registers.

FAQ

Can Register Allocation be done optimally?

Optimal Register Allocation is NP-complete (reducible to graph coloring). All practical allocators use heuristics. However, for small programs with few registers, optimal allocation is achievable through integer linear programming.

What is SSA-based Register Allocation?

SSA form simplifies allocation because each variable has exactly one definition. SSA-based allocators can use the SSA graph directly instead of building an interference graph, leading to faster and sometimes better allocation.

How do JIT compilers handle Register Allocation?

JIT compilers favor linear scan for its speed (O(n log n)). V8, HotSpot, and .NET's RyuJIT all use variants of linear scan allocation to keep compilation pauses short.

What is register rematerialization?

Instead of spilling a value to memory and reloading it, rematerialization recomputes the value from its operands. This is beneficial when the computation is cheaper than a memory load (e.g., loading a constant or adding two known values).

← Previous Compiler Optimization Passes — Constant Folding, Inlining and Loop Unrolling Next → Type Checking — Static, Dynamic and Hindley-Milner

Built by the developers of DodaTech

Doda Browser, DodaZIP & Durga Antivirus Pro

Home Browse Compiler Design