Skip to content

Abstract Syntax Trees — Representing Program Structure

DodaTech Updated 2026-06-21 6 min read

In this tutorial, you'll learn about Abstract Syntax Trees. We cover key concepts, practical examples, and best practices to help you understand and apply this topic effectively.

Abstract syntax trees are hierarchical data structures that represent the essential structure of a program, removing syntactic sugar and punctuation while preserving the logical relationships between operations, declarations, and control flow.

What You'll Learn & Why It Matters

In this tutorial, you will learn what ASTs are, how they differ from concrete parse trees, how to build and traverse them, and why every compiler phase depends on them. ASTs are the universal currency of program analysis, used in linters, formatters, type checkers, and security analyzers.

Real-world use: Browsers build DOM trees from HTML using AST-like structures. Tools like ESLint and Prettier work directly on ASTs. Security scanners including Durga Antivirus Pro use AST analysis to detect malicious code patterns that evade regex-based detection.

Prerequisites

You should understand Parsing from the syntax analysis tutorial. Familiarity with JavaScript and Python object-oriented programming is required. Tree traversal algorithms (DFS, BFS) are helpful background knowledge.

Parse Trees vs ASTs

A parse tree (concrete syntax tree) includes every token from the source: keywords, parentheses, semicolons, and braces. An AST strips away this syntactic scaffolding, keeping only what matters for Semantic Analysis and Code Generation.

Source: x = 3 + 4;

Parse Tree:
        STMT
       /  |  \
    ASSIGN ;  EOF
   /   |   \
  x    =   EXPR
          /  |  \
         3   +   4

AST:
    ASSIGN
   /      \
  x       ADD
         /   \
        3     4

The AST is smaller, simpler, and focuses on meaning rather than syntax.

Building an AST

Step 1: Define AST Node Types

class ASTNode:
    pass

class NumNode(ASTNode):
    def __init__(self, value):
        self.value = value

class VarNode(ASTNode):
    def __init__(self, name):
        self.name = name

class BinOpNode(ASTNode):
    def __init__(self, left, op, right):
        self.left = left
        self.op = op
        self.right = right

class AssignNode(ASTNode):
    def __init__(self, name, value):
        self.name = name
        self.value = value

class PrintNode(ASTNode):
    def __init__(self, expr):
        self.expr = expr

class BlockNode(ASTNode):
    def __init__(self, statements):
        self.statements = statements

Step 2: Modify the Parser to Build ASTs

class ASTParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def parse_expression(self):
        left = self.parse_term()
        while self.pos < len(self.tokens) and self.tokens[self.pos][0] == "OPERATOR":
            op = self.tokens[self.pos][1]
            if op not in ("+", "-"):
                break
            self.pos += 1
            right = self.parse_term()
            left = BinOpNode(left, op, right)
        return left

    def parse_term(self):
        left = self.parse_factor()
        while self.pos < len(self.tokens) and self.tokens[self.pos][0] == "OPERATOR":
            op = self.tokens[self.pos][1]
            if op not in ("*", "/"):
                break
            self.pos += 1
            right = self.parse_factor()
            left = BinOpNode(left, op, right)
        return left

    def parse_factor(self):
        if self.pos >= len(self.tokens):
            raise SyntaxError("Unexpected end of input")
        token_type, token_value = self.tokens[self.pos]
        if token_type == "NUMBER":
            self.pos += 1
            return NumNode(int(token_value))
        if token_type == "IDENTIFIER":
            self.pos += 1
            return VarNode(token_value)
        if token_value == "(":
            self.pos += 1
            expr = self.parse_expression()
            if self.pos < len(self.tokens) and self.tokens[self.pos][1] == ")":
                self.pos += 1
                return expr
            raise SyntaxError("Missing closing parenthesis")
        raise SyntaxError(f"Unexpected token: {token_value}")
graph TD
    subgraph "AST for x = 3 + 4 * 5"
        A[ASSIGN] --> V[VarNode: x]
        A --> B[BinOp: +]
        B --> N1[NumNode: 3]
        B --> BM[BinOp: *]
        BM --> N2[NumNode: 4]
        BM --> N3[NumNode: 5]
    end
    style A fill:#4CAF50,color:#fff
    style V fill:#2196F3,color:#fff
    style B fill:#FF9800,color:#fff
    style BM fill:#FF9800,color:#fff

Step 3: AST Visitors

The visitor pattern separates traversal logic from node processing. This allows multiple operations on the same AST structure.

class ASTVisitor:
    def visit(self, node):
        method_name = f"visit_{type(node).__name__}"
        visitor = getattr(self, method_name, self.generic_visit)
        return visitor(node)

    def generic_visit(self, node):
        for attr in vars(node).values():
            if isinstance(attr, ASTNode):
                self.visit(attr)
            elif isinstance(attr, list):
                for item in attr:
                    if isinstance(item, ASTNode):
                        self.visit(item)

class PrintVisitor(ASTVisitor):
    def visit_NumNode(self, node):
        return str(node.value)

    def visit_VarNode(self, node):
        return node.name

    def visit_BinOpNode(self, node):
        left = self.visit(node.left)
        right = self.visit(node.right)
        return f"({left} {node.op} {right})"

    def visit_AssignNode(self, node):
        value = self.visit(node.value)
        return f"{node.name} = {value}"

    def visit_PrintNode(self, node):
        expr = self.visit(node.expr)
        return f"print({expr})"

    def visit_BlockNode(self, node):
        stmts = [self.visit(s) for s in node.statements]
        return "\n".join(stmts)

Testing the visitor:

tokens = [("IDENTIFIER", "x"), ("OPERATOR", "="), ("NUMBER", "3"),
          ("OPERATOR", "+"), ("NUMBER", "4"), ("SEPARATOR", ";")]
parser = ASTParser(tokens)
parser.parse_expression()
ast = AssignNode("x", BinOpNode(NumNode(3), "+", NumNode(4)))
visitor = PrintVisitor()
result = visitor.visit(ast)
print(result)

Expected output:

x = (3 + 4)

AST Optimizations

ASTs can be transformed directly before lowering to IR:

class FoldConstantsVisitor(ASTVisitor):
    def visit_BinOpNode(self, node):
        left = self.visit(node.left)
        right = self.visit(node.right)
        if isinstance(left, NumNode) and isinstance(right, NumNode):
            if node.op == "+":
                return NumNode(left.value + right.value)
            if node.op == "*":
                return NumNode(left.value * right.value)
        return BinOpNode(left, node.op, right)

Common Errors in AST Construction

Error 1: Including Syntactic Noise

Keeping semicolons, parentheses, and braces in the AST bloats the structure. AST nodes should represent program semantics, not concrete syntax.

Error 2: Flat Statement Lists

Treating blocks as flat lists loses scope information. Each block should be a distinct node type so analyses can track variable scopes correctly.

Error 3: No Visitor Pattern

Embedding analysis logic directly in node classes violates open-closed principles. The visitor pattern keeps node classes clean and supports adding new operations.

Some analyses need to walk upward from child to parent. Store parent references (or compute them during traversal) for these use cases.

Error 5: Mutable AST During Analysis

Modifying the AST while analyzing it causes inconsistent states. Use immutable transformations or explicit copy-on-write patterns.

Practice Questions

Question 1

What information does a parse tree contain that an AST omits?

Show answer Parse trees contain keywords, punctuation (semicolons, parentheses, braces), and other syntactic elements that are not needed for Semantic Analysis or Code Generation.

Question 2

Why is the visitor pattern commonly used with ASTs?

Show answer The visitor pattern separates the AST structure from the operations performed on it, allowing new operations (Type Checking, Code Generation, optimization) without modifying node classes.

Question 3

How does constant folding transform an AST?

Show answer Constant folding replaces subexpressions composed entirely of constants with their computed value. For example, `3 + 4` becomes `7`, reducing work during Code Generation.

Question 4

What is an AST node for a variable declaration?

Show answer A typical `VarDecl` node contains the variable name, type annotation, and optional initializer expression. It may also store scope and Symbol Table information.

Question 5

How do ASTs handle operator precedence?

Show answer Operator precedence is encoded in the tree structure. Higher-precedence operators appear lower in the tree (deeper nesting), ensuring they are evaluated first during traversal.

Challenge

Build an AST for a small JavaScript subset that includes function declarations with parameters, if-else statements, and while loops. Implement a visitor that counts the total number of nodes in the AST and another that computes the maximum nesting depth.

FAQ

What is the difference between an AST and a CST?

A concrete syntax tree (CST) or parse tree includes every token from the source, while an AST abstracts away syntactic details like parentheses, semicolons, and grouping. The AST preserves only the essential semantic information.

Can the same AST represent different source languages?

Yes. Languages with similar semantics can share AST structures. This is how language-agnostic tools like LLVM work: different front ends produce ASTs, which all lower to the same IR.

How are ASTs serialized?

ASTs are serialized as JSON or XML for tool communication. Many linters export ASTs as JSON for custom analysis. JSON Schema can validate AST structures across tools.

What is the maximum practical AST depth?

Most compilers impose a maximum nesting depth (typically 256-512 levels) to prevent stack overflows during recursive traversal. Deeply nested expressions beyond this are rejected.

Built by the developers of DodaTech

Doda Browser, DodaZIP & Durga Antivirus Pro