Abstract Syntax Trees — Representing Program Structure
In this tutorial, you'll learn about Abstract Syntax Trees. We cover key concepts, practical examples, and best practices to help you understand and apply this topic effectively.
Abstract syntax trees are hierarchical data structures that represent the essential structure of a program, removing syntactic sugar and punctuation while preserving the logical relationships between operations, declarations, and control flow.
What You'll Learn & Why It Matters
In this tutorial, you will learn what ASTs are, how they differ from concrete parse trees, how to build and traverse them, and why every compiler phase depends on them. ASTs are the universal currency of program analysis, used in linters, formatters, type checkers, and security analyzers.
Real-world use: Browsers build DOM trees from HTML using AST-like structures. Tools like ESLint and Prettier work directly on ASTs. Security scanners including Durga Antivirus Pro use AST analysis to detect malicious code patterns that evade regex-based detection.
Prerequisites
You should understand Parsing from the syntax analysis tutorial. Familiarity with JavaScript and Python object-oriented programming is required. Tree traversal algorithms (DFS, BFS) are helpful background knowledge.
Parse Trees vs ASTs
A parse tree (concrete syntax tree) includes every token from the source: keywords, parentheses, semicolons, and braces. An AST strips away this syntactic scaffolding, keeping only what matters for Semantic Analysis and Code Generation.
Source: x = 3 + 4;
Parse Tree:
STMT
/ | \
ASSIGN ; EOF
/ | \
x = EXPR
/ | \
3 + 4
AST:
ASSIGN
/ \
x ADD
/ \
3 4
The AST is smaller, simpler, and focuses on meaning rather than syntax.
Building an AST
Step 1: Define AST Node Types
class ASTNode:
pass
class NumNode(ASTNode):
def __init__(self, value):
self.value = value
class VarNode(ASTNode):
def __init__(self, name):
self.name = name
class BinOpNode(ASTNode):
def __init__(self, left, op, right):
self.left = left
self.op = op
self.right = right
class AssignNode(ASTNode):
def __init__(self, name, value):
self.name = name
self.value = value
class PrintNode(ASTNode):
def __init__(self, expr):
self.expr = expr
class BlockNode(ASTNode):
def __init__(self, statements):
self.statements = statements
Step 2: Modify the Parser to Build ASTs
class ASTParser:
def __init__(self, tokens):
self.tokens = tokens
self.pos = 0
def parse_expression(self):
left = self.parse_term()
while self.pos < len(self.tokens) and self.tokens[self.pos][0] == "OPERATOR":
op = self.tokens[self.pos][1]
if op not in ("+", "-"):
break
self.pos += 1
right = self.parse_term()
left = BinOpNode(left, op, right)
return left
def parse_term(self):
left = self.parse_factor()
while self.pos < len(self.tokens) and self.tokens[self.pos][0] == "OPERATOR":
op = self.tokens[self.pos][1]
if op not in ("*", "/"):
break
self.pos += 1
right = self.parse_factor()
left = BinOpNode(left, op, right)
return left
def parse_factor(self):
if self.pos >= len(self.tokens):
raise SyntaxError("Unexpected end of input")
token_type, token_value = self.tokens[self.pos]
if token_type == "NUMBER":
self.pos += 1
return NumNode(int(token_value))
if token_type == "IDENTIFIER":
self.pos += 1
return VarNode(token_value)
if token_value == "(":
self.pos += 1
expr = self.parse_expression()
if self.pos < len(self.tokens) and self.tokens[self.pos][1] == ")":
self.pos += 1
return expr
raise SyntaxError("Missing closing parenthesis")
raise SyntaxError(f"Unexpected token: {token_value}")
graph TD
subgraph "AST for x = 3 + 4 * 5"
A[ASSIGN] --> V[VarNode: x]
A --> B[BinOp: +]
B --> N1[NumNode: 3]
B --> BM[BinOp: *]
BM --> N2[NumNode: 4]
BM --> N3[NumNode: 5]
end
style A fill:#4CAF50,color:#fff
style V fill:#2196F3,color:#fff
style B fill:#FF9800,color:#fff
style BM fill:#FF9800,color:#fff
Step 3: AST Visitors
The visitor pattern separates traversal logic from node processing. This allows multiple operations on the same AST structure.
class ASTVisitor:
def visit(self, node):
method_name = f"visit_{type(node).__name__}"
visitor = getattr(self, method_name, self.generic_visit)
return visitor(node)
def generic_visit(self, node):
for attr in vars(node).values():
if isinstance(attr, ASTNode):
self.visit(attr)
elif isinstance(attr, list):
for item in attr:
if isinstance(item, ASTNode):
self.visit(item)
class PrintVisitor(ASTVisitor):
def visit_NumNode(self, node):
return str(node.value)
def visit_VarNode(self, node):
return node.name
def visit_BinOpNode(self, node):
left = self.visit(node.left)
right = self.visit(node.right)
return f"({left} {node.op} {right})"
def visit_AssignNode(self, node):
value = self.visit(node.value)
return f"{node.name} = {value}"
def visit_PrintNode(self, node):
expr = self.visit(node.expr)
return f"print({expr})"
def visit_BlockNode(self, node):
stmts = [self.visit(s) for s in node.statements]
return "\n".join(stmts)
Testing the visitor:
tokens = [("IDENTIFIER", "x"), ("OPERATOR", "="), ("NUMBER", "3"),
("OPERATOR", "+"), ("NUMBER", "4"), ("SEPARATOR", ";")]
parser = ASTParser(tokens)
parser.parse_expression()
ast = AssignNode("x", BinOpNode(NumNode(3), "+", NumNode(4)))
visitor = PrintVisitor()
result = visitor.visit(ast)
print(result)
Expected output:
x = (3 + 4)
AST Optimizations
ASTs can be transformed directly before lowering to IR:
class FoldConstantsVisitor(ASTVisitor):
def visit_BinOpNode(self, node):
left = self.visit(node.left)
right = self.visit(node.right)
if isinstance(left, NumNode) and isinstance(right, NumNode):
if node.op == "+":
return NumNode(left.value + right.value)
if node.op == "*":
return NumNode(left.value * right.value)
return BinOpNode(left, node.op, right)
Common Errors in AST Construction
Error 1: Including Syntactic Noise
Keeping semicolons, parentheses, and braces in the AST bloats the structure. AST nodes should represent program semantics, not concrete syntax.
Error 2: Flat Statement Lists
Treating blocks as flat lists loses scope information. Each block should be a distinct node type so analyses can track variable scopes correctly.
Error 3: No Visitor Pattern
Embedding analysis logic directly in node classes violates open-closed principles. The visitor pattern keeps node classes clean and supports adding new operations.
Error 4: Missing Parent Links
Some analyses need to walk upward from child to parent. Store parent references (or compute them during traversal) for these use cases.
Error 5: Mutable AST During Analysis
Modifying the AST while analyzing it causes inconsistent states. Use immutable transformations or explicit copy-on-write patterns.
Practice Questions
Question 1
What information does a parse tree contain that an AST omits?
Show answer
Parse trees contain keywords, punctuation (semicolons, parentheses, braces), and other syntactic elements that are not needed for Semantic Analysis or Code Generation.Question 2
Why is the visitor pattern commonly used with ASTs?
Show answer
The visitor pattern separates the AST structure from the operations performed on it, allowing new operations (Type Checking, Code Generation, optimization) without modifying node classes.Question 3
How does constant folding transform an AST?
Show answer
Constant folding replaces subexpressions composed entirely of constants with their computed value. For example, `3 + 4` becomes `7`, reducing work during Code Generation.Question 4
What is an AST node for a variable declaration?
Show answer
A typical `VarDecl` node contains the variable name, type annotation, and optional initializer expression. It may also store scope and Symbol Table information.Question 5
How do ASTs handle operator precedence?
Show answer
Operator precedence is encoded in the tree structure. Higher-precedence operators appear lower in the tree (deeper nesting), ensuring they are evaluated first during traversal.Challenge
Build an AST for a small JavaScript subset that includes function declarations with parameters, if-else statements, and while loops. Implement a visitor that counts the total number of nodes in the AST and another that computes the maximum nesting depth.
FAQ
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro