Semantic Analysis — Type Checking and Symbol Tables
Semantic Analysis is the third phase of compilation that verifies the meaning of a program by checking type compatibility, variable declarations, scope rules, and other context-sensitive constraints that cannot be expressed in a context-free grammar.
What You'll Learn & Why It Matters
In this tutorial, you will learn how compilers verify type correctness, manage symbol tables for scoping, and detect semantic errors like undeclared variables, type mismatches, and invalid operations. Semantic Analysis prevents countless runtime errors before the program ever runs.
Real-world use: TypeScript's type checker catches millions of bugs before deployment. Durga Antivirus Pro uses Semantic Analysis to detect obfuscated malicious code that passes lexical and syntactic checks but performs dangerous operations.
Prerequisites
You should understand AST construction from the AST tutorial. Familiarity with C programming or Java type systems helps. Understanding scoping rules from general programming experience is assumed.
What Semantic Analysis Checks
Semantic Analysis answers questions the parser cannot:
| Check | Example of Error | Why Parser Misses It |
|---|---|---|
| Variable declaration | x = 5; but x never declared |
Valid syntax regardless |
| Type compatibility | int x = "hello"; |
Both sides are valid expressions |
| Scope validity | Access x after it goes out of scope |
Block structure is syntactically valid |
| Function arity | f(1, 2) but f expects 3 args |
Call syntax is correct |
| Return type | Function returns int but body returns string |
Return statement syntax is valid |
Symbol Tables
A Symbol Table maps names to their attributes: type, scope level, memory location, and declaration point. It is the compiler's phonebook for looking up identifiers.
class Symbol:
def __init__(self, name, kind, type_name, scope_level, line):
self.name = name
self.kind = kind # "variable", "function", "parameter"
self.type_name = type_name
self.scope_level = scope_level
self.line = line
def __repr__(self):
return f"Symbol({self.name}, {self.kind}, {self.type_name}, scope={self.scope_level})"
class SymbolTable:
def __init__(self):
self.scopes = [{}] # stack of scopes
self.current_level = 0
def enter_scope(self):
self.scopes.append({})
self.current_level += 1
def exit_scope(self):
self.scopes.pop()
self.current_level -= 1
def define(self, name, kind, type_name, line):
if name in self.scopes[self.current_level]:
raise SemanticError(f"Duplicate declaration: {name} at line {line}")
symbol = Symbol(name, kind, type_name, self.current_level, line)
self.scopes[self.current_level][name] = symbol
return symbol
def lookup(self, name):
for scope in reversed(self.scopes):
if name in scope:
return scope[name]
return None
graph TD
subgraph "Scope Chain"
G[Global Scope] --> F1[Function foo scope]
G --> F2[Function bar scope]
F1 --> B1[Block inside foo]
end
subgraph "Lookup for x inside block"
L[Start in current scope] --> C{Found?}
C -->|Yes| R[Return symbol]
C -->|No| P[Check parent scope]
P --> C
P -->|No more scopes| E[Error: undeclared]
end
style G fill:#4CAF50,color:#fff
style F1 fill:#2196F3,color:#fff
style F2 fill:#2196F3,color:#fff
style B1 fill:#FF9800,color:#fff
Type Checking
Type Checking ensures operations are applied to compatible types. A type checker traverses the AST and verifies each expression and statement.
class TypeSystem:
def __init__(self):
self.types = {
"int": {"size": 4, "compatible": ["int", "float"]},
"float": {"size": 8, "compatible": ["int", "float"]},
"string": {"size": 8, "compatible": ["string"]},
"bool": {"size": 1, "compatible": ["bool"]},
"void": {"size": 0, "compatible": []},
}
def is_compatible(self, actual, expected):
if actual == expected:
return True
type_info = self.types.get(actual)
return type_info and expected in type_info["compatible"]
class TypeChecker:
def __init__(self, symbol_table, type_system):
self.symbol_table = symbol_table
self.type_system = type_system
def check(self, node):
method = f"check_{type(node).__name__}"
checker = getattr(self, method, self.default_check)
return checker(node)
def check_NumNode(self, node):
return "int"
def check_VarNode(self, node):
symbol = self.symbol_table.lookup(node.name)
if not symbol:
raise SemanticError(f"Undeclared variable: {node.name}")
return symbol.type_name
def check_BinOpNode(self, node):
left_type = self.check(node.left)
right_type = self.check(node.right)
if not self.type_system.is_compatible(left_type, right_type):
raise SemanticError(
f"Type mismatch: {left_type} {node.op} {right_type}"
)
if node.op in ("+", "-", "*", "/"):
if left_type == "float" or right_type == "float":
return "float"
return "int"
if node.op in ("<", ">", "==", "!="):
return "bool"
return left_type
def check_AssignNode(self, node):
symbol = self.symbol_table.lookup(node.name)
if not symbol:
raise SemanticError(f"Undeclared variable: {node.name}")
value_type = self.check(node.value)
if not self.type_system.is_compatible(value_type, symbol.type_name):
raise SemanticError(
f"Cannot assign {value_type} to variable of type {symbol.type_name}"
)
return symbol.type_name
def check_BlockNode(self, node):
self.symbol_table.enter_scope()
for stmt in node.statements:
self.check(stmt)
self.symbol_table.exit_scope()
def default_check(self, node):
raise SemanticError(f"Unknown node type: {type(node).__name__}")
Testing the type checker:
st = SymbolTable()
st.define("x", "variable", "int", 1)
st.define("y", "variable", "string", 2)
checker = TypeChecker(st, TypeSystem())
from ast_nodes import AssignNode, BinOpNode, NumNode, VarNode
valid = AssignNode("x", BinOpNode(VarNode("x"), "+", NumNode(5)))
try:
result = checker.check(valid)
print(f"Type check passed, result type: {result}")
except SemanticError as e:
print(f"Type error: {e}")
invalid = AssignNode("x", BinOpNode(VarNode("x"), "+", VarNode("y")))
try:
checker.check(invalid)
except SemanticError as e:
print(f"Type error: {e}")
Expected output:
Type check passed, result type: int
Type error: Type mismatch: int + string
Advanced Type Checking
Structural vs Nominal Typing
Nominal typing checks types by name: int and float are different types even if they have the same size. Structural typing checks by structure: two types are compatible if they have the same fields and methods. TypeScript uses structural typing; Java uses nominal.
Type Inference
Some languages infer types automatically, reducing annotation burden:
def infer_type(expression, symbol_table):
if isinstance(expression, NumNode):
inferred = "int"
elif isinstance(expression, VarNode):
sym = symbol_table.lookup(expression.name)
inferred = sym.type_name if sym else "unknown"
elif isinstance(expression, BinOpNode):
left = infer_type(expression.left, symbol_table)
if left in ("int", "float") and expression.op in ("<", ">", "=="):
inferred = "bool"
else:
inferred = left
else:
inferred = "unknown"
return inferred
Common Errors in Semantic Analysis
Error 1: Forward Reference Resolution
Using a variable before its declaration requires multi-pass analysis or deferred checking. Single-pass compilers must require declarations before use.
Error 2: Name Shadowing Confusion
A variable declared in an inner scope with the same name as an outer one shadows the outer. Ensure the Symbol Table correctly handles this by checking the innermost scope first.
Error 3: Type Coercion Missing
Implicit conversions (int to float) reduce errors but can hide bugs. Decide on coercion rules early and document them clearly.
Error 4: Function Overloading Handling
Multiple functions with the same name but different parameters require overload resolution based on argument types and arity.
Error 5: Generic/Polymorphic Type Support
Template functions introduce complexity: type parameters must be resolved at instantiation time, requiring deferred Type Checking.
Practice Questions
Question 1
What is the purpose of a Symbol Table?
Show answer
A Symbol Table stores information about identifiers: their names, types, scope levels, and memory locations. It enables the compiler to verify declarations and resolve references.Question 2
How does lexical scoping work in a compiler?
Show answer
Lexical scoping means a variable's scope is determined by its position in the source code. Inner blocks can access outer variables, but outer blocks cannot access inner ones. The compiler maintains a stack of scopes in the Symbol Table.Question 3
What is type coercion?
Show answer
Type coercion is the automatic conversion of a value from one type to another, such as converting int to float before addition. Implicit coercion reduces errors but can mask bugs.Question 4
Why must the semantic analyzer check all code paths?
Show answer
Unreachable code may hide type errors. The analyzer must check all paths including branches that seem unlikely, because any code that compiles must be type-safe regardless of runtime conditions.Question 5
What is duck typing in Semantic Analysis?
Show answer
Duck typing checks whether an object supports the required methods rather than checking its declared type. "If it walks like a duck and quacks like a duck, it is a duck." This is common in dynamically typed languages.Challenge
Implement a semantic analyzer for a small language with function declarations, variable declarations, nested blocks, and return type verification. Ensure every code path in a function with a non-void return type actually returns a value. Test with recursive and conditional return scenarios.
FAQ
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro