Interpreter Design — AST Walking and Bytecode VMs
In this tutorial, you'll learn about Interpreter Design. We cover key concepts, practical examples, and best practices to help you understand and apply this topic effectively.
Interpreter design encompasses the techniques used to execute programs directly from their Abstract Syntax Tree or compiled bytecode, without translating to native machine code, offering flexibility and portability at the cost of performance.
What You'll Learn & Why It Matters
In this tutorial, you will learn the two main Interpreter architectures — AST walkers and bytecode VMs — and understand their tradeoffs. Interpreter knowledge is essential for building scripting languages, DSLs, and understanding how JavaScript, Python, and Ruby execute code.
Real-world use: Durga Antivirus Pro includes a custom bytecode Interpreter for analyzing suspicious scripts in a sandboxed environment, detecting malicious behavior without executing native code on the host system.
Prerequisites
You should understand abstract syntax trees from the AST tutorial. Familiarity with Python and stack data structures is expected. Basic knowledge of C programming helps for the bytecode VM section.
AST Walking Interpreter
An AST walker interprets the program by recursively evaluating each node in the AST. This is the simplest Interpreter design.
class ASTInterpreter:
def __init__(self):
self.environment = {}
def evaluate(self, node):
method = f"eval_{type(node).__name__}"
evaluator = getattr(self, method, self.default_eval)
return evaluator(node)
def eval_NumNode(self, node):
return node.value
def eval_StringNode(self, node):
return node.value
def eval_VarNode(self, node):
if node.name in self.environment:
return self.environment[node.name]
raise RuntimeError(f"Undefined variable: {node.name}")
def eval_BinOpNode(self, node):
left = self.evaluate(node.left)
right = self.evaluate(node.right)
if node.op == "+":
return left + right
elif node.op == "-":
return left - right
elif node.op == "*":
return left * right
elif node.op == "/":
return left / right
elif node.op == "==":
return left == right
elif node.op == "<":
return left < right
raise RuntimeError(f"Unknown operator: {node.op}")
def eval_AssignNode(self, node):
value = self.evaluate(node.value)
self.environment[node.name] = value
return value
def eval_IfNode(self, node):
condition = self.evaluate(node.condition)
if condition:
return self.evaluate(node.then_branch)
elif node.else_branch:
return self.evaluate(node.else_branch)
def eval_WhileNode(self, node):
result = None
while self.evaluate(node.condition):
result = self.evaluate(node.body)
return result
def eval_BlockNode(self, node):
result = None
for stmt in node.statements:
result = self.evaluate(stmt)
return result
def default_eval(self, node):
raise RuntimeError(f"Cannot evaluate {type(node).__name__}")
interpreter = ASTInterpreter()
interpreter.evaluate(AssignNode("x", NumNode(10)))
interpreter.evaluate(AssignNode("y", NumNode(20)))
result = interpreter.evaluate(BinOpNode(VarNode("x"), "+", VarNode("y")))
print(f"x + y = {result}")
Expected output:
x + y = 30
graph TD
A[ASTInterpreter.evaluate] --> B{Node type?}
B -->|NumNode| C[Return number value]
B -->|BinOpNode| D[Evaluate left]
B -->|BinOpNode| E[Evaluate right]
B -->|BinOpNode| F[Apply operator]
B -->|AssignNode| G[Evaluate value]
B -->|AssignNode| H[Store in environment]
B -->|VarNode| I[Lookup in environment]
B -->|IfNode| J[Evaluate condition]
B -->|IfNode| K[Branch accordingly]
style A fill:#4CAF50,color:#fff
Bytecode Virtual Machine
A bytecode VM compiles the AST to a compact bytecode representation and executes it using a stack-based or register-based virtual machine. This is much faster than AST walking.
Defining Bytecode Instructions
class Opcode:
LOAD_CONST = 0x01
LOAD_VAR = 0x02
STORE_VAR = 0x03
ADD = 0x04
SUB = 0x05
MUL = 0x06
DIV = 0x07
JUMP_IF_FALSE = 0x08
JUMP = 0x09
CALL = 0x0A
RETURN = 0x0B
PRINT = 0x0C
HALT = 0xFF
class BytecodeCompiler:
def __init__(self):
self.bytecode = []
self.constants = []
self.variables = {}
self.labels = {}
def compile(self, ast):
self._compile_node(ast)
self.bytecode.append(Opcode.HALT)
return self.bytecode, self.constants
def _compile_node(self, node):
method = f"compile_{type(node).__name__}"
compiler = getattr(self, method, None)
if compiler:
compiler(node)
def compile_NumNode(self, node):
idx = len(self.constants)
self.constants.append(node.value)
self.bytecode.extend([Opcode.LOAD_CONST, idx])
def compile_VarNode(self, node):
self.bytecode.extend([Opcode.LOAD_VAR, ord(node.name[0]) - ord('a')])
def compile_AssignNode(self, node):
self._compile_node(node.value)
self.bytecode.extend([Opcode.STORE_VAR, ord(node.name[0]) - ord('a')])
def compile_BinOpNode(self, node):
self._compile_node(node.left)
self._compile_node(node.right)
op_map = {"+": Opcode.ADD, "-": Opcode.SUB, "*": Opcode.MUL, "/": Opcode.DIV}
self.bytecode.append(op_map.get(node.op, Opcode.ADD))
Implementing the Bytecode VM
class BytecodeVM:
def __init__(self):
self.stack = []
self.variables = [0] * 26 # a-z
self.ip = 0 # instruction pointer
def run(self, bytecode, constants):
self.ip = 0
while self.ip < len(bytecode):
opcode = bytecode[self.ip]
self.ip += 1
if opcode == Opcode.LOAD_CONST:
idx = bytecode[self.ip]
self.ip += 1
self.stack.append(constants[idx])
elif opcode == Opcode.LOAD_VAR:
var_idx = bytecode[self.ip]
self.ip += 1
self.stack.append(self.variables[var_idx])
elif opcode == Opcode.STORE_VAR:
var_idx = bytecode[self.ip]
self.ip += 1
self.variables[var_idx] = self.stack.pop()
elif opcode == Opcode.ADD:
right = self.stack.pop()
left = self.stack.pop()
self.stack.append(left + right)
elif opcode == Opcode.MUL:
right = self.stack.pop()
left = self.stack.pop()
self.stack.append(left * right)
elif opcode == Opcode.HALT:
break
elif opcode == Opcode.PRINT:
value = self.stack.pop()
print(value)
return self.stack[-1] if self.stack else None
compiler = BytecodeCompiler()
ast = AssignNode("x", BinOpNode(NumNode(10), "+", NumNode(20)))
compiler.compile(ast)
compiler.bytecode.append(Opcode.PRINT)
bytecode, constants = compiler.bytecode, compiler.constants
print(f"Bytecode: {[hex(b) for b in bytecode]}")
print(f"Constants: {constants}")
vm = BytecodeVM()
result = vm.run(bytecode, constants)
Expected output:
Bytecode: ['0x1', '0x0', '0x1', '0x1', '0x4', '0x3', '0x0', '0xc', '0xff']
Constants: [10, 20]
30
Stack-Based vs Register-Based VMs
| Feature | Stack VM | Register VM |
|---|---|---|
| Example | JVM, CPython | Lua, Dalvik |
| Instruction size | Small (1-2 bytes) | Larger (3-4 bytes) |
| Code density | Higher | Lower |
| Speed | Slower (stack ops) | Faster (fewer ops) |
| Implementation | Simpler | More complex |
Inline Caching
Dynamic language interpreters optimize repeated operations using inline caching:
class InlineCache:
def __init__(self):
self.cache = {} # instruction address -> (type, method)
def lookup_method(self, ip, obj, method_name):
obj_type = type(obj).__name__
key = (ip, obj_type, method_name)
if key in self.cache:
return self.cache[key]
method = getattr(obj, method_name, None)
if method:
self.cache[key] = method
return method
Common Errors in Interpreter Design
Error 1: Tail Call Overflow
Recursive calls in AST walkers overflow the call stack for deeply nested expressions. Use explicit stack or trampolining for Recursion-heavy languages.
Error 2: Mutable Environment Sharing
Sharing the same environment object between nested blocks causes unintended mutation. Copy environments when entering new scopes.
Error 3: Missing Garbage Collection
AST interpreters generate many intermediate objects. Without GC, memory grows unboundedly. Implement reference counting or mark-sweep collection.
Error 4: Incorrect Short-Circuit Evaluation
Boolean operators && and || must short-circuit: if the left operand determines the result, the right operand must not be evaluated.
Error 5: String Concatenation Performance
Building strings with + in a loop is O(n^2). Use string builders or rope data structures for efficient string operations in interpreters.
Practice Questions
Question 1
What is the difference between an AST walker and a bytecode VM?
Show answer
An AST walker interprets the program directly from the tree structure. A bytecode VM compiles the AST to bytecode first, then interprets the bytecode. Bytecode VMs are faster because bytecode is more compact and eliminates repeated tree traversal.Question 2
What is a stack-based virtual machine?
Show answer
A stack-based VM uses a stack for all operations. Instructions pop operands from the stack and push results. The JVM and CPython use stack-based architectures.Question 3
What is inline caching?
Show answer
Inline caching optimizes dynamic method dispatch by caching the result of the first lookup at each call site. Subsequent calls to the same method on the same type skip the expensive lookup.Question 4
How does a register-based VM differ from stack-based?
Show answer
Register VMs use named virtual registers instead of a stack. Instructions specify which registers hold operands and results, reducing the number of instructions needed but making each instruction larger.Question 5
What is trampolining in interpreters?
Show answer
Trampolining converts deep Recursion into iteration by returning thunks (zero-argument functions) instead of making recursive calls. This prevents stack overflow in interpreters that would otherwise recurse deeply.Challenge
Build a bytecode compiler and VM for a small language with if-else statements, while loops, function calls, and local scopes. The bytecode should support function call frames with a call stack. Test with a recursive factorial function.
FAQ
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro