Intermediate Representations — Three-Address Code and SSA
In this tutorial, you'll learn about Intermediate Representations. We cover key concepts, practical examples, and best practices to help you understand and apply this topic effectively.
Intermediate representations are language-independent and machine-independent program representations that sit between the compiler front end and back end, enabling retargetability and architecture-neutral optimization across multiple source languages and target platforms.
What You'll Learn & Why It Matters
In this tutorial, you will learn the two most important IR forms — three-address code (TAC) and static single assignment (SSA) form — and understand why every modern compiler uses an IR to separate language concerns from machine concerns. This separation is what allows GCC to support 50+ backends and LLVM to support 30+ front ends.
Real-world use: Android apps written in Kotlin, Java, and C++ all compile through LLVM IR to produce device-specific machine code. Durga Antivirus Pro uses IR-level analysis to detect malicious patterns across different binary formats without rewriting detection logic for each format.
Prerequisites
You should understand the compiler pipeline from the compiler overview tutorial. Familiarity with AST concepts from the AST tutorial is essential. Basic knowledge of assembly language concepts helps but is not required.
Three-Address Code
Three-address code represents each operation as a quadruple: result = operand1 operator operand2. Each TAC instruction has at most one operator and at most three operands (hence the name). Temporary variables hold intermediate results.
graph LR
subgraph "AST for x = a + b * c"
AST[ASSIGN] --> V[x]
AST --> ADD[+]
ADD --> A[a]
ADD --> MUL[*]
MUL --> B[b]
MUL --> C[c]
end
subgraph "TAC Translation"
T1[t1 = b * c]
T2[t2 = a + t1]
T3[x = t2]
end
style AST fill:#4CAF50,color:#fff
style ADD fill:#FF9800,color:#fff
style MUL fill:#FF9800,color:#fff
IR Instruction Types
| Type | Example | Meaning |
|---|---|---|
| Binary | t1 = a + b |
Arithmetic operation |
| Unary | t1 = -x |
Negation |
| Copy | x = t1 |
Value assignment |
| Load | t1 = *p |
Memory read |
| Store | *p = t1 |
Memory write |
| Branch | if x > 0 goto L1 |
Conditional jump |
| Jump | goto L1 |
Unconditional jump |
| Call | t1 = call f(a, b) |
Function invocation |
| Return | return t1 |
Function return |
Generating TAC from AST
class TACGenerator:
def __init__(self):
self.temporaries = 0
self.instructions = []
def new_temp(self):
self.temporaries += 1
return f"t{self.temporaries}"
def generate(self, node):
method = f"gen_{type(node).__name__}"
generator = getattr(self, method, self.default_gen)
return generator(node)
def gen_NumNode(self, node):
temp = self.new_temp()
self.instructions.append(f"{temp} = {node.value}")
return temp
def gen_VarNode(self, node):
temp = self.new_temp()
self.instructions.append(f"{temp} = {node.name}")
return temp
def gen_BinOpNode(self, node):
left = self.generate(node.left)
right = self.generate(node.right)
temp = self.new_temp()
self.instructions.append(f"{temp} = {left} {node.op} {right}")
return temp
def gen_AssignNode(self, node):
val = self.generate(node.value)
self.instructions.append(f"{node.name} = {val}")
return val
def default_gen(self, node):
for attr in vars(node).values():
if hasattr(attr, "__dict__"):
self.generate(attr)
gen = TACGenerator()
gen.generate(AssignNode("x", BinOpNode(VarNode("a"), "+", BinOpNode(VarNode("b"), "*", VarNode("c")))))
for instr in gen.instructions:
print(instr)
Expected output:
t1 = c
t2 = b
t3 = t2 * t1
t4 = a
t5 = t4 + t3
t6 = x
x = t5
Static Single Assignment Form
SSA form is a refinement of TAC where every variable is assigned exactly once, and every use of a variable refers to a single definition. This is achieved by:
- Renaming variables at each assignment (giving each a unique version)
- Inserting phi (φ) functions at join points in control flow
TAC: SSA:
x = 1 x1 = 1
x = x + 1 x2 = x1 + 1
y = x * 2 y1 = x2 * 2
Phi Functions
At control flow join points, φ functions select the correct value based on which path was taken:
class SSAConverter:
def __init__(self, tac_instructions, cfg):
self.instrs = tac_instructions
self.cfg = cfg
self.versions = {}
self.renamed = []
def fresh_version(self, var):
if var not in self.versions:
self.versions[var] = 0
self.versions[var] += 1
return f"{var}_{self.versions[var]}"
def convert(self):
var_versions = {}
for instr in self.instrs:
if " = " in instr:
target, source = instr.split(" = ", 1)
for var in self._find_vars(source):
if var in var_versions:
source = source.replace(var, f"{var}_{var_versions[var]}")
new_ver = self.fresh_version(target)
var_versions[target] = self.versions[target]
self.renamed.append(f"{new_ver} = {source}")
return self.renamed
def _find_vars(self, expr):
return [w for w in expr.split() if w.isalpha() and w not in ("goto", "call", "return", "if")]
Benefits of SSA
SSA form simplifies many compiler optimizations:
| Optimization | How SSA Helps |
|---|---|
| Dead code elimination | A variable is dead if it is never used after its definition |
| Constant propagation | Each variable has exactly one definition point |
| Loop-invariant code motion | Values defined outside loops are easy to identify |
| Register Allocation | Live ranges are explicit and disjoint |
| Value numbering | Same expression with same SSA versions must produce same result |
Common Errors in IR Design
Error 1: Too High-Level or Too Low-Level
An IR that is too close to the source language inhibits optimization. An IR that is too close to machine code prevents retargetability. LLVM IR strikes the right balance with explicit control flow and infinite virtual registers.
Error 2: Overly Verbose Instructions
Each IR instruction should represent exactly one operation. Combining multiple operations in one instruction complicates analysis and optimization passes.
Error 3: Ignoring Control Flow
Optimizations require accurate control flow graphs. Missing edges (fall-through, exception paths) cause incorrect transformations.
Error 4: Phi Node Placement
Inserting φ functions at every block junction bloats the IR. Use dominance frontier algorithms to place φ functions only where needed.
Error 5: Type Information Loss
Discarding type information when lowering to IR prevents type-based alias analysis. Preserve type annotations in the IR even after Semantic Analysis.
Practice Questions
Question 1
Why is three-address code called three-address code?
Show answer
Each instruction has at most three operands (two source operands and one destination), representing a single operation like `t1 = a + b`. The operands are addresses of variables or temporaries.Question 2
What problem does SSA form solve?
Show answer
SSA form ensures each variable is assigned exactly once, making data flow explicit. This simplifies optimization because each use uniquely identifies its defining instruction without iterative data-flow analysis.Question 3
What is a phi function in SSA?
Show answer
A phi function at a control flow join point selects which variable version to use based on the preceding block. Written as `x3 = (x1, x2)`, it means x3 gets x1 if coming from block A and x2 if coming from block B.Question 4
How does IR enable retargetable compilers?
Show answer
The IR serves as a common interface: all front ends produce the same IR, and all back ends consume the same IR. Adding a new language requires only a front end; adding a new target requires only a back end.Question 5
What is a basic block in IR?
Show answer
A basic block is a sequence of IR instructions with a single entry point (first instruction) and a single exit point (last instruction). No jumps into or out of the middle of a block are allowed.Challenge
Implement an SSA construction algorithm that takes a list of TAC instructions with a control flow graph, inserts φ functions at dominance frontier join points, and renames variables to produce valid SSA form. Test with an if-else branch and a loop.
FAQ
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro