Skip to content

Intermediate Representations — Three-Address Code and SSA

DodaTech Updated 2026-06-21 7 min read

In this tutorial, you'll learn about Intermediate Representations. We cover key concepts, practical examples, and best practices to help you understand and apply this topic effectively.

Intermediate representations are language-independent and machine-independent program representations that sit between the compiler front end and back end, enabling retargetability and architecture-neutral optimization across multiple source languages and target platforms.

What You'll Learn & Why It Matters

In this tutorial, you will learn the two most important IR forms — three-address code (TAC) and static single assignment (SSA) form — and understand why every modern compiler uses an IR to separate language concerns from machine concerns. This separation is what allows GCC to support 50+ backends and LLVM to support 30+ front ends.

Real-world use: Android apps written in Kotlin, Java, and C++ all compile through LLVM IR to produce device-specific machine code. Durga Antivirus Pro uses IR-level analysis to detect malicious patterns across different binary formats without rewriting detection logic for each format.

Prerequisites

You should understand the compiler pipeline from the compiler overview tutorial. Familiarity with AST concepts from the AST tutorial is essential. Basic knowledge of assembly language concepts helps but is not required.

Three-Address Code

Three-address code represents each operation as a quadruple: result = operand1 operator operand2. Each TAC instruction has at most one operator and at most three operands (hence the name). Temporary variables hold intermediate results.

graph LR
    subgraph "AST for x = a + b * c"
        AST[ASSIGN] --> V[x]
        AST --> ADD[+]
        ADD --> A[a]
        ADD --> MUL[*]
        MUL --> B[b]
        MUL --> C[c]
    end
    subgraph "TAC Translation"
        T1[t1 = b * c]
        T2[t2 = a + t1]
        T3[x = t2]
    end
    style AST fill:#4CAF50,color:#fff
    style ADD fill:#FF9800,color:#fff
    style MUL fill:#FF9800,color:#fff

IR Instruction Types

Type Example Meaning
Binary t1 = a + b Arithmetic operation
Unary t1 = -x Negation
Copy x = t1 Value assignment
Load t1 = *p Memory read
Store *p = t1 Memory write
Branch if x > 0 goto L1 Conditional jump
Jump goto L1 Unconditional jump
Call t1 = call f(a, b) Function invocation
Return return t1 Function return

Generating TAC from AST

class TACGenerator:
    def __init__(self):
        self.temporaries = 0
        self.instructions = []

    def new_temp(self):
        self.temporaries += 1
        return f"t{self.temporaries}"

    def generate(self, node):
        method = f"gen_{type(node).__name__}"
        generator = getattr(self, method, self.default_gen)
        return generator(node)

    def gen_NumNode(self, node):
        temp = self.new_temp()
        self.instructions.append(f"{temp} = {node.value}")
        return temp

    def gen_VarNode(self, node):
        temp = self.new_temp()
        self.instructions.append(f"{temp} = {node.name}")
        return temp

    def gen_BinOpNode(self, node):
        left = self.generate(node.left)
        right = self.generate(node.right)
        temp = self.new_temp()
        self.instructions.append(f"{temp} = {left} {node.op} {right}")
        return temp

    def gen_AssignNode(self, node):
        val = self.generate(node.value)
        self.instructions.append(f"{node.name} = {val}")
        return val

    def default_gen(self, node):
        for attr in vars(node).values():
            if hasattr(attr, "__dict__"):
                self.generate(attr)

gen = TACGenerator()
gen.generate(AssignNode("x", BinOpNode(VarNode("a"), "+", BinOpNode(VarNode("b"), "*", VarNode("c")))))
for instr in gen.instructions:
    print(instr)

Expected output:

t1 = c
t2 = b
t3 = t2 * t1
t4 = a
t5 = t4 + t3
t6 = x
x = t5

Static Single Assignment Form

SSA form is a refinement of TAC where every variable is assigned exactly once, and every use of a variable refers to a single definition. This is achieved by:

  1. Renaming variables at each assignment (giving each a unique version)
  2. Inserting phi (φ) functions at join points in control flow
TAC:                SSA:
x = 1               x1 = 1
x = x + 1           x2 = x1 + 1
y = x * 2           y1 = x2 * 2

Phi Functions

At control flow join points, φ functions select the correct value based on which path was taken:

class SSAConverter:
    def __init__(self, tac_instructions, cfg):
        self.instrs = tac_instructions
        self.cfg = cfg
        self.versions = {}
        self.renamed = []

    def fresh_version(self, var):
        if var not in self.versions:
            self.versions[var] = 0
        self.versions[var] += 1
        return f"{var}_{self.versions[var]}"

    def convert(self):
        var_versions = {}
        for instr in self.instrs:
            if " = " in instr:
                target, source = instr.split(" = ", 1)
                for var in self._find_vars(source):
                    if var in var_versions:
                        source = source.replace(var, f"{var}_{var_versions[var]}")
                new_ver = self.fresh_version(target)
                var_versions[target] = self.versions[target]
                self.renamed.append(f"{new_ver} = {source}")
        return self.renamed

    def _find_vars(self, expr):
        return [w for w in expr.split() if w.isalpha() and w not in ("goto", "call", "return", "if")]

Benefits of SSA

SSA form simplifies many compiler optimizations:

Optimization How SSA Helps
Dead code elimination A variable is dead if it is never used after its definition
Constant propagation Each variable has exactly one definition point
Loop-invariant code motion Values defined outside loops are easy to identify
Register Allocation Live ranges are explicit and disjoint
Value numbering Same expression with same SSA versions must produce same result

Common Errors in IR Design

Error 1: Too High-Level or Too Low-Level

An IR that is too close to the source language inhibits optimization. An IR that is too close to machine code prevents retargetability. LLVM IR strikes the right balance with explicit control flow and infinite virtual registers.

Error 2: Overly Verbose Instructions

Each IR instruction should represent exactly one operation. Combining multiple operations in one instruction complicates analysis and optimization passes.

Error 3: Ignoring Control Flow

Optimizations require accurate control flow graphs. Missing edges (fall-through, exception paths) cause incorrect transformations.

Error 4: Phi Node Placement

Inserting φ functions at every block junction bloats the IR. Use dominance frontier algorithms to place φ functions only where needed.

Error 5: Type Information Loss

Discarding type information when lowering to IR prevents type-based alias analysis. Preserve type annotations in the IR even after Semantic Analysis.

Practice Questions

Question 1

Why is three-address code called three-address code?

Show answer Each instruction has at most three operands (two source operands and one destination), representing a single operation like `t1 = a + b`. The operands are addresses of variables or temporaries.

Question 2

What problem does SSA form solve?

Show answer SSA form ensures each variable is assigned exactly once, making data flow explicit. This simplifies optimization because each use uniquely identifies its defining instruction without iterative data-flow analysis.

Question 3

What is a phi function in SSA?

Show answer A phi function at a control flow join point selects which variable version to use based on the preceding block. Written as `x3 = (x1, x2)`, it means x3 gets x1 if coming from block A and x2 if coming from block B.

Question 4

How does IR enable retargetable compilers?

Show answer The IR serves as a common interface: all front ends produce the same IR, and all back ends consume the same IR. Adding a new language requires only a front end; adding a new target requires only a back end.

Question 5

What is a basic block in IR?

Show answer A basic block is a sequence of IR instructions with a single entry point (first instruction) and a single exit point (last instruction). No jumps into or out of the middle of a block are allowed.

Challenge

Implement an SSA construction algorithm that takes a list of TAC instructions with a control flow graph, inserts φ functions at dominance frontier join points, and renames variables to produce valid SSA form. Test with an if-else branch and a loop.

FAQ

What is the difference between high-level IR and low-level IR?

High-level IR (HIR) preserves more source language structure (loops, arrays, types). Low-level IR (LIR) exposes machine details (registers, memory layout). Many compilers use multiple IR levels, progressively lowering the representation.

Why does LLVM use an infinite set of virtual registers?

Virtual registers simplify Code Generation and optimization because there is no pressure to reuse registers. Physical Register Allocation happens later, when the virtual registers are mapped to actual machine registers.

Can IR be executed directly?

Some IRs have interpreters for debugging and testing. LLVM has an Interpreter (lli) that executes IR without compiling to machine code. This is useful for testing optimizations.

How do compilers handle IR for dynamic languages?

Dynamic language compilers (like V8 for JavaScript) use IRs that can represent dynamic types and inline caches. They often use multiple IR tiers: a fast-to-generate low-level IR for startup and an optimized IR for hot code.

Built by the developers of DodaTech

Doda Browser, DodaZIP & Durga Antivirus Pro