Compiler Optimization Passes — Constant Folding, Inlining and Loop Unrolling

DodaTech Updated 2026-06-21 8 min read

Compiler Optimization passes transform programs to improve execution speed, reduce memory usage, or decrease power consumption while preserving the program's observable behavior, forming the core of modern compiler backends.

What You'll Learn & Why It Matters

In this tutorial, you will learn the most important Compiler Optimization techniques, how they work at the IR level, and how they interact with each other. Optimizations are why well-written C code often matches hand-tuned assembly in performance.

Real-world use: Firefox's JavaScript engine uses over 30 optimization passes to make web apps run at near-native speed. Durga Antivirus Pro uses optimization-aware decompilation to reverse-engineer obfuscated malware that tries to hide through dead code insertion.

Prerequisites

You should understand intermediate representations from the IR tutorial. Knowledge of the code generation tutorial helps. Familiarity with C programming or Python is assumed.

Categories of Optimizations

Compiler optimizations fall into several categories:

Category	Examples	Benefit
Constant propagation	`x = 5 * 3` becomes `x = 15`	Eliminates runtime computation
Dead code elimination	Remove unused variables and unreachable code	Smaller, faster binaries
Loop transformations	Unrolling, invariant code motion	Better cache and pipeline usage
Inlining	Replace function call with function body	Eliminates call overhead
Value numbering	Replace redundant computations	Reuses previous results

graph TD
    A[Raw IR] --> B[Constant Folding]
    B --> C[Dead Code Elimination]
    C --> D[Loop Invariant Code Motion]
    D --> E[Function Inlining]
    E --> F[Register Allocation]
    F --> G[Optimized IR]

    B --> H{Repeat until stable}
    H --> B

    style A fill:#4CAF50,color:#fff
    style G fill:#2196F3,color:#fff
    style B fill:#FF9800,color:#fff
    style C fill:#FF9800,color:#fff
    style D fill:#FF9800,color:#fff
    style E fill:#FF9800,color:#fff
    style F fill:#FF9800,color:#fff

Constant Folding and Propagation

Constant folding evaluates constant expressions at compile time. Constant propagation replaces variables known to hold constant values with those values.

class ConstantFolder:
    def fold(self, ir_instructions):
        constants = {}
        optimized = []
        for instr in ir_instructions:
            parts = instr.split(" = ", 1)
            if len(parts) == 2:
                target, expr = parts
                folded = self.fold_expr(expr, constants)
                if self.is_constant(folded):
                    constants[target] = folded
                    optimized.append(f"{target} = {folded}")
                else:
                    optimized.append(f"{target} = {folded}")
            else:
                optimized.append(instr)
        return optimized

    def fold_expr(self, expr, constants):
        tokens = expr.split()
        for i, token in enumerate(tokens):
            if token in constants:
                tokens[i] = constants[token]
        if len(tokens) == 3 and tokens[0].isdigit() and tokens[2].isdigit():
            left, op, right = int(tokens[0]), tokens[1], int(tokens[2])
            if op == "+":
                return str(left + right)
            if op == "*":
                return str(left * right)
            if op == "-":
                return str(left - right)
        return " ".join(tokens)

    def is_constant(self, expr):
        return expr.lstrip("-").isdigit() or expr == "0"

ir = ["x = 5", "y = 3", "z = x + y", "w = z * 2"]
folder = ConstantFolder()
result = folder.fold(ir)
for line in result:
    print(line)

Expected output:

x = 5
y = 3
z = 8
w = 16

Dead Code Elimination

Dead code elimination (DCE) removes instructions whose results are never used. A variable is dead if it is defined but never referenced before being redefined.

class DeadCodeEliminator:
    def eliminate(self, ir_instructions):
        used = set()
        for instr in ir_instructions:
            parts = instr.split(" = ", 1)
            if len(parts) == 2:
                for var in self.find_vars(parts[1]):
                    used.add(var)
        alive = []
        for instr in ir_instructions:
            parts = instr.split(" = ", 1)
            if len(parts) == 2:
                if parts[0] in used or "call" in parts[1] or "return" in parts[1]:
                    alive.append(instr)
            else:
                alive.append(instr)
        return alive

    def find_vars(self, expr):
        return {w for w in expr.split() if w.isalpha()}

ir = ["t1 = 5", "t2 = t1 + 3", "x = t2", "t3 = 99", "return x"]
dce = DeadCodeEliminator()
result = dce.eliminate(ir)
for line in result:
    print(line)

Expected output:

t1 = 5
t2 = t1 + 3
x = t2
return x

Function Inlining

Inlining replaces a function call with the body of the function, eliminating call overhead and enabling further optimizations across the function boundary.

class Inliner:
    def inline(self, ir, functions):
        optimized = []
        for instr in ir:
            if "call" in instr:
                parts = instr.split()
                callee = parts[2] if len(parts) > 2 else ""
                if callee in functions:
                    body = functions[callee]
                    for line in body:
                        optimized.append(line)
                else:
                    optimized.append(instr)
            else:
                optimized.append(instr)
        return optimized

functions = {"square": ["t1 = x", "t2 = t1 * t1", "return t2"]}
ir = ["a = 5", "b = call square(a)", "c = b + 1"]
inliner = Inliner()
result = inliner.inline(ir, functions)
for line in result:
    print(line)

Expected output:

a = 5
t1 = a
t2 = t1 * t1
b = t2
c = b + 1

Loop Unrolling

Loop unrolling reduces loop overhead by duplicating the loop body multiple times, reducing branch mispredictions and enabling instruction-level parallelism.

class LoopUnroller:
    def unroll(self, ir, loop_start, loop_end, factor=4):
        body = ir[loop_start + 1:loop_end]
        unrolled = ir[:loop_start + 1]
        for i in range(factor):
            for instr in body:
                unrolled.append(instr.replace("#i", str(i)))
        unrolled.extend(ir[loop_end + 1:])
        return unrolled

loop_ir = [
    "i = 0",
    "label: L1",
    "t1 = A + i",
    "t2 = load t1",
    "t3 = t2 * 2",
    "i = i + 1",
    "if i < 4 goto L1",
    "result = 0",
]
unroller = LoopUnroller()
result = unroller.unroll(loop_ir, 1, 6, 4)
for line in result:
    print(line)

Expected output (abbreviated):

i = 0
label: L1
t1 = A + 0
t2 = load t1
t3 = t2 * 2
t1 = A + 1
t2 = load t1
t3 = t2 * 2
t1 = A + 2
t2 = load t1
t3 = t2 * 2
t1 = A + 3
t2 = load t1
t3 = t2 * 2
result = 0

Optimization Pipeline Design

Optimizations interact in complex ways. Compilers run optimization passes in phases, often repeating passes until no further improvements are possible (fixpoint iteration).

class OptimizationPipeline:
    def __init__(self):
        self.passes = [ConstantFolder, DeadCodeEliminator, Inliner, LoopUnroller]

    def run(self, ir, iterations=3):
        for _ in range(iterations):
            for opt_class in self.passes:
                opt = opt_class()
                if hasattr(opt, "fold"):
                    ir = opt.fold(ir)
                elif hasattr(opt, "eliminate"):
                    ir = opt.eliminate(ir)
                elif hasattr(opt, "inline"):
                    ir = opt.inline(ir, {})
                elif hasattr(opt, "unroll"):
                    ir = opt.unroll(ir, 0, len(ir), 2)
        return ir

Common Errors in Optimization

Error 1: Changing Program Semantics

Optimizations must preserve observable behavior. Removing a division by zero or eliminating a null check because it seems redundant changes the program's meaning.

Error 2: Infinite Optimization Loops

Passes that introduce new optimization opportunities must guarantee termination. The compiler should detect when no further progress is possible (fixpoint).

Error 3: Over-Optimization

Aggressive inlining can cause code bloat, destroying instruction cache locality. Always consider code size vs speed tradeoffs.

Error 4: Incorrect Alias Analysis

Assuming Two Pointers do not alias when they might leads to incorrect reordering. Conservative alias analysis is safer but leaves performance on the table.

Error 5: Ignoring Floating-Point Semantics

Floating-point arithmetic is not associative: (a + b) + c differs from a + (b + c). Reordering floating-point operations changes results.

Practice Questions

Question 1

How does constant folding differ from constant propagation?

Show answer

Constant folding evaluates constant expressions at compile time (e.g., `3 + 4` becomes `7`). Constant propagation tracks variables with known constant values and substitutes them at use sites.

Question 2

Why is dead code elimination important?

Show answer

Dead code elimination reduces binary size, improves instruction cache usage, shortens compilation time, and removes unnecessary computations that waste CPU cycles.

Question 3

What is loop invariant code motion?

Show answer

Loop invariant code motion moves computations that produce the same result on every iteration (like constant-index array accesses) outside the loop body, so they execute only once instead of every iteration.

Question 4

What is the tradeoff of function inlining?

Show answer

Inlining eliminates call overhead (saving push/pop instructions and branch mispredictions) but increases code size, which can hurt instruction cache performance. The compiler must balance these factors.

Question 5

What is a peephole optimization?

Show answer

A peephole optimization examines a small Sliding Window of instructions and replaces inefficient patterns with better ones, such as replacing `mov eax, 0; add eax, 5` with `mov eax, 5`.

Challenge

Implement a peephole optimizer that detects and eliminates redundant load-store sequences in generated assembly: if a value is loaded from memory and immediately stored back unchanged, remove both instructions. Test with a sequence containing at least three such patterns intermixed with useful instructions.

FAQ

What is the difference between -O2 and -O3 in GCC?

-O3 enables all -O2 optimizations plus more aggressive techniques like function inlining, loop unrolling, and vectorization. -O3 can produce faster code but may increase binary size and compile time.

Can optimizations make code slower?

Yes. Over-inlining causes code bloat, loop unrolling wastes instruction cache, and aggressive speculative execution can introduce pipeline stalls. Compilers use heuristics and cost models to avoid harmful transformations.

What is link-time optimization (LTO)?

LTO delays optimization until link time, when all object files are visible. This enables cross-module inlining, dead code elimination across compilation units, and whole-program optimization that is impossible with separate compilation.

How do optimizations interact with debugging?

Optimized code is harder to debug because variables may be eliminated, instructions reordered, and call sites inlined. Compilers support -Og (optimize for debugging) which applies only safe transformations that preserve debug information.

← Previous Code Generation — From IR to Machine Code Next → Register Allocation — Graph Coloring and Linear Scan

Built by the developers of DodaTech

Doda Browser, DodaZIP & Durga Antivirus Pro

Home Browse Compiler Design