Compiler Optimization Passes — Constant Folding, Inlining and Loop Unrolling
Compiler Optimization passes transform programs to improve execution speed, reduce memory usage, or decrease power consumption while preserving the program's observable behavior, forming the core of modern compiler backends.
What You'll Learn & Why It Matters
In this tutorial, you will learn the most important Compiler Optimization techniques, how they work at the IR level, and how they interact with each other. Optimizations are why well-written C code often matches hand-tuned assembly in performance.
Real-world use: Firefox's JavaScript engine uses over 30 optimization passes to make web apps run at near-native speed. Durga Antivirus Pro uses optimization-aware decompilation to reverse-engineer obfuscated malware that tries to hide through dead code insertion.
Prerequisites
You should understand intermediate representations from the IR tutorial. Knowledge of the code generation tutorial helps. Familiarity with C programming or Python is assumed.
Categories of Optimizations
Compiler optimizations fall into several categories:
| Category | Examples | Benefit |
|---|---|---|
| Constant propagation | x = 5 * 3 becomes x = 15 |
Eliminates runtime computation |
| Dead code elimination | Remove unused variables and unreachable code | Smaller, faster binaries |
| Loop transformations | Unrolling, invariant code motion | Better cache and pipeline usage |
| Inlining | Replace function call with function body | Eliminates call overhead |
| Value numbering | Replace redundant computations | Reuses previous results |
graph TD
A[Raw IR] --> B[Constant Folding]
B --> C[Dead Code Elimination]
C --> D[Loop Invariant Code Motion]
D --> E[Function Inlining]
E --> F[Register Allocation]
F --> G[Optimized IR]
B --> H{Repeat until stable}
H --> B
style A fill:#4CAF50,color:#fff
style G fill:#2196F3,color:#fff
style B fill:#FF9800,color:#fff
style C fill:#FF9800,color:#fff
style D fill:#FF9800,color:#fff
style E fill:#FF9800,color:#fff
style F fill:#FF9800,color:#fff
Constant Folding and Propagation
Constant folding evaluates constant expressions at compile time. Constant propagation replaces variables known to hold constant values with those values.
class ConstantFolder:
def fold(self, ir_instructions):
constants = {}
optimized = []
for instr in ir_instructions:
parts = instr.split(" = ", 1)
if len(parts) == 2:
target, expr = parts
folded = self.fold_expr(expr, constants)
if self.is_constant(folded):
constants[target] = folded
optimized.append(f"{target} = {folded}")
else:
optimized.append(f"{target} = {folded}")
else:
optimized.append(instr)
return optimized
def fold_expr(self, expr, constants):
tokens = expr.split()
for i, token in enumerate(tokens):
if token in constants:
tokens[i] = constants[token]
if len(tokens) == 3 and tokens[0].isdigit() and tokens[2].isdigit():
left, op, right = int(tokens[0]), tokens[1], int(tokens[2])
if op == "+":
return str(left + right)
if op == "*":
return str(left * right)
if op == "-":
return str(left - right)
return " ".join(tokens)
def is_constant(self, expr):
return expr.lstrip("-").isdigit() or expr == "0"
ir = ["x = 5", "y = 3", "z = x + y", "w = z * 2"]
folder = ConstantFolder()
result = folder.fold(ir)
for line in result:
print(line)
Expected output:
x = 5
y = 3
z = 8
w = 16
Dead Code Elimination
Dead code elimination (DCE) removes instructions whose results are never used. A variable is dead if it is defined but never referenced before being redefined.
class DeadCodeEliminator:
def eliminate(self, ir_instructions):
used = set()
for instr in ir_instructions:
parts = instr.split(" = ", 1)
if len(parts) == 2:
for var in self.find_vars(parts[1]):
used.add(var)
alive = []
for instr in ir_instructions:
parts = instr.split(" = ", 1)
if len(parts) == 2:
if parts[0] in used or "call" in parts[1] or "return" in parts[1]:
alive.append(instr)
else:
alive.append(instr)
return alive
def find_vars(self, expr):
return {w for w in expr.split() if w.isalpha()}
ir = ["t1 = 5", "t2 = t1 + 3", "x = t2", "t3 = 99", "return x"]
dce = DeadCodeEliminator()
result = dce.eliminate(ir)
for line in result:
print(line)
Expected output:
t1 = 5
t2 = t1 + 3
x = t2
return x
Function Inlining
Inlining replaces a function call with the body of the function, eliminating call overhead and enabling further optimizations across the function boundary.
class Inliner:
def inline(self, ir, functions):
optimized = []
for instr in ir:
if "call" in instr:
parts = instr.split()
callee = parts[2] if len(parts) > 2 else ""
if callee in functions:
body = functions[callee]
for line in body:
optimized.append(line)
else:
optimized.append(instr)
else:
optimized.append(instr)
return optimized
functions = {"square": ["t1 = x", "t2 = t1 * t1", "return t2"]}
ir = ["a = 5", "b = call square(a)", "c = b + 1"]
inliner = Inliner()
result = inliner.inline(ir, functions)
for line in result:
print(line)
Expected output:
a = 5
t1 = a
t2 = t1 * t1
b = t2
c = b + 1
Loop Unrolling
Loop unrolling reduces loop overhead by duplicating the loop body multiple times, reducing branch mispredictions and enabling instruction-level parallelism.
class LoopUnroller:
def unroll(self, ir, loop_start, loop_end, factor=4):
body = ir[loop_start + 1:loop_end]
unrolled = ir[:loop_start + 1]
for i in range(factor):
for instr in body:
unrolled.append(instr.replace("#i", str(i)))
unrolled.extend(ir[loop_end + 1:])
return unrolled
loop_ir = [
"i = 0",
"label: L1",
"t1 = A + i",
"t2 = load t1",
"t3 = t2 * 2",
"i = i + 1",
"if i < 4 goto L1",
"result = 0",
]
unroller = LoopUnroller()
result = unroller.unroll(loop_ir, 1, 6, 4)
for line in result:
print(line)
Expected output (abbreviated):
i = 0
label: L1
t1 = A + 0
t2 = load t1
t3 = t2 * 2
t1 = A + 1
t2 = load t1
t3 = t2 * 2
t1 = A + 2
t2 = load t1
t3 = t2 * 2
t1 = A + 3
t2 = load t1
t3 = t2 * 2
result = 0
Optimization Pipeline Design
Optimizations interact in complex ways. Compilers run optimization passes in phases, often repeating passes until no further improvements are possible (fixpoint iteration).
class OptimizationPipeline:
def __init__(self):
self.passes = [ConstantFolder, DeadCodeEliminator, Inliner, LoopUnroller]
def run(self, ir, iterations=3):
for _ in range(iterations):
for opt_class in self.passes:
opt = opt_class()
if hasattr(opt, "fold"):
ir = opt.fold(ir)
elif hasattr(opt, "eliminate"):
ir = opt.eliminate(ir)
elif hasattr(opt, "inline"):
ir = opt.inline(ir, {})
elif hasattr(opt, "unroll"):
ir = opt.unroll(ir, 0, len(ir), 2)
return ir
Common Errors in Optimization
Error 1: Changing Program Semantics
Optimizations must preserve observable behavior. Removing a division by zero or eliminating a null check because it seems redundant changes the program's meaning.
Error 2: Infinite Optimization Loops
Passes that introduce new optimization opportunities must guarantee termination. The compiler should detect when no further progress is possible (fixpoint).
Error 3: Over-Optimization
Aggressive inlining can cause code bloat, destroying instruction cache locality. Always consider code size vs speed tradeoffs.
Error 4: Incorrect Alias Analysis
Assuming Two Pointers do not alias when they might leads to incorrect reordering. Conservative alias analysis is safer but leaves performance on the table.
Error 5: Ignoring Floating-Point Semantics
Floating-point arithmetic is not associative: (a + b) + c differs from a + (b + c). Reordering floating-point operations changes results.
Practice Questions
Question 1
How does constant folding differ from constant propagation?
Show answer
Constant folding evaluates constant expressions at compile time (e.g., `3 + 4` becomes `7`). Constant propagation tracks variables with known constant values and substitutes them at use sites.Question 2
Why is dead code elimination important?
Show answer
Dead code elimination reduces binary size, improves instruction cache usage, shortens compilation time, and removes unnecessary computations that waste CPU cycles.Question 3
What is loop invariant code motion?
Show answer
Loop invariant code motion moves computations that produce the same result on every iteration (like constant-index array accesses) outside the loop body, so they execute only once instead of every iteration.Question 4
What is the tradeoff of function inlining?
Show answer
Inlining eliminates call overhead (saving push/pop instructions and branch mispredictions) but increases code size, which can hurt instruction cache performance. The compiler must balance these factors.Question 5
What is a peephole optimization?
Show answer
A peephole optimization examines a small Sliding Window of instructions and replaces inefficient patterns with better ones, such as replacing `mov eax, 0; add eax, 5` with `mov eax, 5`.Challenge
Implement a peephole optimizer that detects and eliminates redundant load-store sequences in generated assembly: if a value is loaded from memory and immediately stored back unchanged, remove both instructions. Test with a sequence containing at least three such patterns intermixed with useful instructions.
FAQ
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro