Register Allocation — Graph Coloring and Linear Scan
Register Allocation is the Process of mapping the unlimited virtual registers used in intermediate code to the finite physical registers available on the target CPU, directly determining the performance of generated code.
What You'll Learn & Why It Matters
In this tutorial, you will learn the two most important Register Allocation algorithms — graph coloring and linear scan — and understand how they balance allocation quality against compilation speed. Efficient Register Allocation is the single most impactful optimization for most programs.
Real-world use: Durga Antivirus Pro uses Register Allocation analysis during malware unpacking to reconstruct original instruction sequences that packers have intentionally obfuscated through register reassignment.
Prerequisites
You should understand Code Generation from the code generation tutorial. Familiarity with IR concepts from the IR tutorial is required. Graph theory basics (nodes, edges, coloring) help for the graph coloring algorithm.
Why Register Allocation Matters
CPU registers are 100-500 times faster than main memory. Every value kept in a register instead of memory represents a dramatic speedup. However, CPUs have very few registers:
| Architecture | General-Purpose Registers |
|---|---|
| x86-64 | 16 (rax, rbx, rcx, rdx, rsi, rdi, r8-r15) |
| ARM64 | 31 (x0-x30) |
| RISC-V | 32 (x0-x31) |
A typical program may have hundreds of live virtual registers during compilation. The allocator must fit all active values into these few physical registers.
graph TD
subgraph "Before Allocation"
V1[Virtual Registers: t1, t2, t3, t4, t5, t6, ...]
end
subgraph "After Allocation"
P1[Physical: eax]
P2[Physical: ebx]
P3[Physical: ecx]
P4[Physical: edx]
P5[Stack: spilled values]
end
V1 --> P1
V1 --> P2
V1 --> P3
V1 --> P4
V1 --> P5
style V1 fill:#4CAF50,color:#fff
style P1 fill:#2196F3,color:#fff
style P2 fill:#2196F3,color:#fff
style P3 fill:#2196F3,color:#fff
style P4 fill:#2196F3,color:#fff
style P5 fill:#f44336,color:#fff
Live Range Analysis
Before allocation, the compiler computes live ranges: the set of instructions where each virtual register holds a value that will be used later.
class LiveRangeAnalyzer:
def __init__(self, instructions):
self.instructions = instructions
self.live_ranges = {}
def analyze(self):
for idx, instr in enumerate(self.instructions):
defined = self._get_defined(instr)
used = self._get_used(instr)
for var in defined:
if var not in self.live_ranges:
self.live_ranges[var] = {"start": idx, "end": idx}
else:
self.live_ranges[var]["end"] = idx
for var in used:
if var in self.live_ranges:
self.live_ranges[var]["end"] = idx
def _get_defined(self, instr):
if " = " in instr:
return [instr.split(" = ")[0]]
return []
def _get_used(self, instr):
if " = " in instr:
rhs = instr.split(" = ")[1]
return [w for w in rhs.split() if w.startswith("t")]
return []
def overlaps(self, var1, var2):
r1 = self.live_ranges.get(var1)
r2 = self.live_ranges.get(var2)
if not r1 or not r2:
return False
return r1["start"] <= r2["end"] and r2["start"] <= r1["end"]
instructions = [
"t1 = 5", "t2 = 3", "t3 = t1 + t2",
"t4 = t3 * 2", "x = t4", "t5 = 10", "y = t5",
]
analyzer = LiveRangeAnalyzer(instructions)
analyzer.analyze()
for var, rng in analyzer.live_ranges.items():
print(f"{var}: live {rng['start']}-{rng['end']}")
Expected output:
t1: live 0-2
t2: live 1-2
t3: live 2-3
t4: live 3-4
t5: live 5-6
Graph Coloring Algorithm
Graph coloring treats Register Allocation as a graph coloring problem. Each virtual register is a node; edges connect nodes with overlapping live ranges; colors represent physical registers.
class GraphColoringAllocator:
def __init__(self, num_registers=4):
self.num_registers = num_registers
self.colors = {}
def build_interference_graph(self, liveness):
graph = {}
for var1 in liveness.live_ranges:
graph[var1] = set()
for var2 in liveness.live_ranges:
if var1 != var2 and liveness.overlaps(var1, var2):
graph[var1].add(var2)
return graph
def allocate(self, liveness):
graph = self.build_interference_graph(liveness)
reg_names = [f"r{i}" for i in range(self.num_registers)]
spill_list = []
for var in sorted(graph.keys(), key=lambda v: len(graph[v]), reverse=True):
used_colors = set()
for neighbor in graph[var]:
if neighbor in self.colors:
used_colors.add(self.colors[neighbor])
assigned = False
for reg in reg_names:
if reg not in used_colors:
self.colors[var] = reg
assigned = True
break
if not assigned:
spill_list.append(var)
return self.colors, spill_list
allocator = GraphColoringAllocator(4)
colors, spills = allocator.allocate(analyzer)
for var, reg in colors.items():
print(f"{var} -> {reg}")
if spills:
print(f"Spilled to memory: {spills}")
Expected output:
t1 -> r0
t2 -> r1
t3 -> r2
t4 -> r3
t5 -> r0
Linear Scan Algorithm
Linear scan is a simpler, faster alternative to graph coloring, commonly used in JIT compilers.
class LinearScanAllocator:
def __init__(self, num_registers=4):
self.num_registers = num_registers
def allocate(self, liveness):
intervals = sorted(liveness.live_ranges.items(),
key=lambda x: x[1]["start"])
allocation = {}
active = []
for var, interval in intervals:
active = [(e, v, r) for e, v, r in active if e > interval["start"]]
if len(active) < self.num_registers:
used_regs = {r for _, _, r in active}
for reg in range(self.num_registers):
if reg not in used_regs:
allocation[var] = f"r{reg}"
active.append((interval["end"], var, reg))
break
else:
furthest = max(active, key=lambda x: x[0])
if furthest[0] > interval["end"]:
active.remove(furthest)
allocation[furthest[1]] = "spill"
allocation[var] = f"r{furthest[2]}"
active.append((interval["end"], var, furthest[2]))
else:
allocation[var] = "spill"
return allocation
ls_allocator = LinearScanAllocator(4)
ls_result = ls_allocator.allocate(analyzer)
for var, reg in sorted(ls_result.items()):
print(f"{var} -> {reg}")
Expected output:
t1 -> r0
t2 -> r1
t3 -> r2
t4 -> r3
t5 -> r0
Spilling and Rewriting
When allocation fails, the compiler spills values to memory:
def rewrite_with_spills(instructions, allocation):
rewritten = []
for instr in instructions:
parts = instr.split(" = ", 1)
if len(parts) == 2:
target, expr = parts
if allocation.get(target) == "spill":
rewritten.append(instr)
rewritten.append(f" [spill_{target}] = {target}")
else:
for var in [w for w in expr.split() if allocation.get(w) == "spill"]:
rewritten.append(f" {var} = [spill_{var}]")
rewritten.append(instr)
else:
rewritten.append(instr)
return rewritten
Common Errors in Register Allocation
Error 1: Incorrect Live Range Calculation
Missing a use site causes premature register freeing, leading to incorrect code. Always compute live ranges conservatively.
Error 2: Callee-Saved Register Mismanagement
Caller-saved registers must be saved by the caller. Callee-saved registers must be saved by the callee. The allocator must respect the target ABI's calling convention.
Error 3: Ignoring Fixed Registers
Some registers have special purposes (stack pointer, frame pointer, return address). These must be excluded from the allocator's pool.
Error 4: Inefficient Spill Code
Spilling a value to memory and reloading it immediately wastes cycles. The allocator should minimize spill code by choosing the least-cost spill candidate.
Error 5: No Rematerialization
Computing a value is often cheaper than loading it from memory. Rematerialization recomputes spilled values instead of reloading them, trading computation for memory access.
Practice Questions
Question 1
What is an interference graph in Register Allocation?
Show answer
An interference graph has a node for each virtual register and edges between registers whose live ranges overlap. Two registers that interfere cannot share the same physical register.Question 2
What happens during register spilling?
Show answer
When no physical register is available, the compiler selects a virtual register to spill (store to memory). The spilled value is loaded back before each use and stored after each definition.Question 3
How does linear scan differ from graph coloring?
Show answer
Linear scan is faster (O(n log n) vs exponential worst-case for graph coloring) but produces slightly worse allocations. Linear scan is commonly used in JIT compilers where compilation speed matters.Question 4
What is a calling convention?
Show answer
A calling convention defines which registers are caller-saved (must be saved by the caller), which are callee-saved (must be restored by the callee), and how function arguments and return values are passed.Question 5
Why is coalescing important in Register Allocation?
Show answer
Coalescing merges related live ranges (like copy instructions) so they can share the same register, eliminating unnecessary register-to-register copy instructions.Challenge
Implement a register allocator that handles both graph coloring and spilling with a cost model. The allocator should select the cheapest variable to spill based on spill cost (number of uses inside loops weighted by loop depth) and interference degree. Test on a function with 10+ live variables but only 4 physical registers.
FAQ
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro