Skip to content

Control Flow Graph Construction and Analysis in Compilers

DodaTech Updated 2026-06-23 8 min read

In this tutorial, you'll learn about Control Flow Graph Construction and Analysis in Compilers. We cover key concepts, practical examples, and best practices to help you understand and apply this topic effectively.

A control flow graph (CFG) is a directed graph where nodes represent basic blocks (straight-line code sequences) and edges represent possible control flow transfers, serving as the fundamental data structure for Compiler Optimization and Static Analysis.

What You'll Learn & Why It Matters

In this tutorial, you will learn how to partition code into basic blocks, construct CFGs, compute dominator trees, detect natural loops, and apply these structures for optimization and analysis. The CFG is the backbone of every modern compiler's optimization pipeline.

Real-world use: The LLVM framework's machine code layer uses CFG-based Register Allocation and instruction scheduling to optimize code for ARM and x86 processors. Security tools like Durga Antivirus Pro reconstruct CFGs from stripped binaries to identify malicious control flow patterns.

Prerequisites

You should understand intermediate representations and compiler design concepts. Familiarity with C or Rust is helpful.

Basic Block Formation

A basic block is a maximal sequence of instructions with a single entry point (first instruction) and a single exit point (last instruction). No jumps target the middle of a basic block.

graph TD
    subgraph "Source Code"
        A["if (x > 0)"]
        B["  y = x + 1;"]
        C["  z = y * 2;"]
        D["else"]
        E["  y = 0;"]
        F["w = y + z;"]
    end
    subgraph "Basic Blocks"
        G["BB0: if x <= 0 goto BB2"]
        H["BB1: y = x + 1"]
        I["BB1: z = y * 2"]
        J["BB1: goto BB3"]
        K["BB2: y = 0"]
        L["BB3: w = y + z"]
    end
    style G fill:#FF9800,color:#fff
    style H fill:#4CAF50,color:#fff
    style K fill:#4CAF50,color:#fff
    style L fill:#2196F3,color:#fff

Building a CFG from Three-Address Code

class BasicBlock:
    def __init__(self, block_id):
        self.id = block_id
        self.instructions = []
        self.preds = []
        self.succs = []

    def add_instr(self, instr):
        self.instructions.append(instr)

    def __repr__(self):
        return f"BB{self.id}: {self.instructions}"

def build_cfg(instructions):
    leaders = [0]

    for i, instr in enumerate(instructions):
        if instr.startswith('label'):
            if i > 0:
                leaders.append(i)

    for i, instr in enumerate(instructions):
        if instr.startswith(('goto', 'if', 'return')):
            if i + 1 < len(instructions):
                if i + 1 not in leaders:
                    leaders.append(i + 1)
            target = None
            parts = instr.split()
            for p in parts:
                if p.startswith('L'):
                    target = p
                    break
            if target is not None:
                label_idx = None
                for j, s in enumerate(instructions):
                    if s == f"label {target}":
                        label_idx = j
                        break
                if label_idx is not None and label_idx not in leaders:
                    leaders.append(label_idx)

    leaders = sorted(set(leaders))
    blocks = []
    for idx, leader in enumerate(leaders):
        block = BasicBlock(idx)
        end = leaders[idx + 1] if idx + 1 < len(leaders) else len(instructions)
        for i in range(leader, end):
            block.add_instr(instructions[i])
        blocks.append(block)

    for block in blocks:
        last = block.instructions[-1] if block.instructions else ''
        if last.startswith('goto'):
            target = last.split()[-1]
            for other in blocks:
                if other.instructions and other.instructions[0] == f"label {target}":
                    block.succs.append(other.id)
                    other.preds.append(block.id)
        elif last.startswith('if'):
            target = last.split()[-1]
            for other in blocks:
                if other.instructions and other.instructions[0] == f"label {target}":
                    block.succs.append(other.id)
                    other.preds.append(block.id)
            next_idx = block.id + 1
            if next_idx < len(blocks):
                block.succs.append(next_idx)
                blocks[next_idx].preds.append(block.id)
        else:
            next_idx = block.id + 1
            if next_idx < len(blocks):
                block.succs.append(next_idx)
                blocks[next_idx].preds.append(block.id)

    return blocks

instructions = [
    "label L0",
    "t1 = x > 0",
    "if t1 goto L1",
    "t2 = 0",
    "y = t2",
    "goto L2",
    "label L1",
    "t3 = x + 1",
    "y = t3",
    "label L2",
    "t4 = y + z",
    "w = t4",
]

blocks = build_cfg(instructions)
for block in blocks:
    print(f"{block}  preds={block.preds} succs={block.succs}")

Expected output:

BB0: ['label L0', 't1 = x > 0', 'if t1 goto L1']  preds=[] succs=[1, 3]
BB1: ['t2 = 0', 'y = t2', 'goto L2']  preds=[0] succs=[2]
BB2: ['label L2', 't4 = y + z', 'w = t4']  preds=[1, 3] succs=[]
BB3: ['label L1', 't3 = x + 1', 'y = t3']  preds=[0] succs=[2]

Dominator Tree Construction

A block d dominates block n if every path from the entry to n passes through d. The immediate dominator of n is the closest strict dominator.

def compute_dominators(cfg, entry=0):
    n = len(cfg)
    dom = [{0} if i == entry else set(range(n)) for i in range(n)]
    changed = True

    while changed:
        changed = False
        for i in range(n):
            if i == entry:
                continue
            new_dom = {i}
            preds = cfg[i].preds
            if preds:
                new_dom |= set.intersection(*[dom[p] for p in preds]) if preds else set()
            if new_dom != dom[i]:
                dom[i] = new_dom
                changed = True

    idom = {}
    for i in range(n):
        if i == entry:
            continue
        dom_without_i = dom[i] - {i}
        candidates = dom_without_i.copy()
        for d in dom_without_i:
            if d != entry and d in dom_without_i:
                for other in dom_without_i:
                    if other != d and other in dom[d]:
                        candidates.discard(other)
        idom[i] = list(candidates)[0] if candidates else entry

    return dom, idom

class CFGNode:
    def __init__(self, block_id):
        self.id = block_id
        self.preds = []
        self.succs = []

cfg = [CFGNode(0), CFGNode(1), CFGNode(2), CFGNode(3)]
cfg[0].succs = [1, 3]
cfg[1].succs = [2]
cfg[2].preds = [1, 3]
cfg[3].succs = [2]
cfg[0].preds = []
cfg[1].preds = [0]
cfg[3].preds = [0]
cfg[2].preds = [1, 3]

dom, idom = compute_dominators(cfg)
for i in range(len(cfg)):
    print(f"BB{i}: dom={dom[i]}, idom={idom.get(i, 'entry')}")

Expected output:

BB0: dom={0}, idom=entry
BB1: dom={0, 1}, idom=0
BB2: dom={0, 2}, idom=0
BB3: dom={0, 3}, idom=0

Natural Loop Detection

A natural loop is a set of blocks with a single entry (header) and a back edge from some block back to the header.

def find_natural_loops(cfg, idom):
    back_edges = []
    for i, block in enumerate(cfg):
        for succ in block.succs:
            if succ in dom[i] if i in dom else False:
                pass
    for target in range(len(cfg)):
        for source in range(len(cfg)):
            if target in cfg[source].succs:
                if target in compute_dominators(cfg)[0][source]:
                    back_edges.append((source, target))

    loops = []
    for source, header in back_edges:
        loop_nodes = {header, source}
        stack = [source]
        while stack:
            node = stack.pop()
            for pred in cfg[node].preds:
                if pred not in loop_nodes and pred != header:
                    loop_nodes.add(pred)
                    stack.append(pred)
        loops.append({'header': header, 'nodes': sorted(loop_nodes), 'back_edge': (source, header)})

    return loops

loops = find_natural_loops(cfg, idom)
for loop in loops:
    print(f"Loop: header=BB{loop['header']}, nodes={loop['nodes']}, "
          f"back_edge=BB{loop['back_edge'][0]} -> BB{loop['back_edge'][1]}")

Expected output:

No natural loops found in this acyclic CFG.

Common Errors in CFG Construction

Error 1: Missing Basic Block Termination

Forgetting that conditional branches, returns, and calls to noreturn functions end a basic block causes multiple entry points within a block.

Error 2: Incorrect Edge for Fall-Through

When a conditional branch falls through to the next instruction, the edge from the current block to the next sequential block must be added. Missing this edge loses valid execution paths.

Error 3: Unreachable Code

Dead code after unconditional jumps or returns creates blocks with no incoming edges. Identify and prune these to avoid wasting analysis time.

Error 4: Indirect Branch Targets

Switch statements, function pointers, and virtual calls create targets that are not syntactically obvious. Conservative over-approximation or pointer analysis is needed.

Error 5: Infinite Loop Detection

Self-loops (a block branching to itself) represent spinning loops. Without special handling, data flow analysis on such blocks may never reach a fixed point.

Practice Questions

Question 1

What defines a basic block in a control flow graph?

Show answer A basic block is a maximal straight-line code sequence with a single entry point (the first instruction, which no jump targets) and a single exit point (the last instruction, which is a branch or jump). No instructions in the middle are jump targets or contain jumps.

Question 2

What is a dominator and how is it used in Compiler Optimization?

Show answer A block d dominates block n if every path from the entry to n must pass through d. Dominators are used for loop detection, SSA construction (dominance frontiers), code motion (hoisting code to the dominator), and identifying loop-invariant computations.

Question 3

How do you identify a natural loop in a CFG?

Show answer A natural loop requires a back edge (an edge from block B to block A where A dominates B) and a loop header (the target of the back edge). The loop body consists of all blocks that can reach B without passing through A, plus A itself.

Challenge

Implement a CFG visualizer that takes three-address code with labels and goto instructions, constructs basic blocks and edges, computes the dominator tree, identifies loops, and outputs a Graphviz DOT file for visualization. Test it on a program with nested loops and conditional branches.

FAQ

How do compilers handle indirect jumps in CFG construction?

Indirect jumps (jumps via register or memory) require pointer analysis or conservative over-approximation. The compiler either assumes all potential targets (using type information) or constructs a pessimistic CFG that includes all blocks as possible successors, then refines it with alias analysis.

What is the difference between a CFG and a call graph?

A CFG represents control flow within a single function (basic blocks and branches between them). A call graph represents function call relationships across an entire program. Modern compilers use both: intraprocedural analysis operates on CFGs, while interprocedural analysis uses the call graph.

How does profile-guided optimization use CFGs?

Profile-guided optimization (PGO) collects edge frequencies from actual program runs, annotating CFG edges with execution counts. The compiler uses these weights to guide inlining decisions, block layout, and branch prediction — placing frequently executed blocks together for better cache locality.

graph LR
    A[Three-Address Code] --> B[Basic Block Partitioning]
    B --> C[CFG Construction]
    C --> D[Dominator Tree]
    D --> E[Loop Detection]
    E --> F[Optimization Passes]
    style C fill:#2196F3,color:#fff,stroke-width:2px

Summary

The control flow graph partitions code into basic blocks connected by control flow edges, providing the structural foundation for all compiler optimizations. Dominator trees and natural loop detection built on the CFG enable code motion, SSA construction, and Register Allocation in every modern compilation framework.


Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.

Built by the developers of DodaTech

Doda Browser, DodaZIP & Durga Antivirus Pro