Semantic Analysis — Type Checking and Symbol Tables

DodaTech Updated 2026-06-21 7 min read

Semantic Analysis is the third phase of compilation that verifies the meaning of a program by checking type compatibility, variable declarations, scope rules, and other context-sensitive constraints that cannot be expressed in a context-free grammar.

What You'll Learn & Why It Matters

In this tutorial, you will learn how compilers verify type correctness, manage symbol tables for scoping, and detect semantic errors like undeclared variables, type mismatches, and invalid operations. Semantic Analysis prevents countless runtime errors before the program ever runs.

Real-world use: TypeScript's type checker catches millions of bugs before deployment. Durga Antivirus Pro uses Semantic Analysis to detect obfuscated malicious code that passes lexical and syntactic checks but performs dangerous operations.

Prerequisites

You should understand AST construction from the AST tutorial. Familiarity with C programming or Java type systems helps. Understanding scoping rules from general programming experience is assumed.

What Semantic Analysis Checks

Semantic Analysis answers questions the parser cannot:

Check	Example of Error	Why Parser Misses It
Variable declaration	`x = 5;` but `x` never declared	Valid syntax regardless
Type compatibility	`int x = "hello";`	Both sides are valid expressions
Scope validity	Access `x` after it goes out of scope	Block structure is syntactically valid
Function arity	`f(1, 2)` but `f` expects 3 args	Call syntax is correct
Return type	Function returns `int` but body returns `string`	Return statement syntax is valid

Symbol Tables

A Symbol Table maps names to their attributes: type, scope level, memory location, and declaration point. It is the compiler's phonebook for looking up identifiers.

class Symbol:
    def __init__(self, name, kind, type_name, scope_level, line):
        self.name = name
        self.kind = kind  # "variable", "function", "parameter"
        self.type_name = type_name
        self.scope_level = scope_level
        self.line = line

    def __repr__(self):
        return f"Symbol({self.name}, {self.kind}, {self.type_name}, scope={self.scope_level})"

class SymbolTable:
    def __init__(self):
        self.scopes = [{}]  # stack of scopes
        self.current_level = 0

    def enter_scope(self):
        self.scopes.append({})
        self.current_level += 1

    def exit_scope(self):
        self.scopes.pop()
        self.current_level -= 1

    def define(self, name, kind, type_name, line):
        if name in self.scopes[self.current_level]:
            raise SemanticError(f"Duplicate declaration: {name} at line {line}")
        symbol = Symbol(name, kind, type_name, self.current_level, line)
        self.scopes[self.current_level][name] = symbol
        return symbol

    def lookup(self, name):
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

graph TD
    subgraph "Scope Chain"
        G[Global Scope] --> F1[Function foo scope]
        G --> F2[Function bar scope]
        F1 --> B1[Block inside foo]
    end
    subgraph "Lookup for x inside block"
        L[Start in current scope] --> C{Found?}
        C -->|Yes| R[Return symbol]
        C -->|No| P[Check parent scope]
        P --> C
        P -->|No more scopes| E[Error: undeclared]
    end
    style G fill:#4CAF50,color:#fff
    style F1 fill:#2196F3,color:#fff
    style F2 fill:#2196F3,color:#fff
    style B1 fill:#FF9800,color:#fff

Type Checking

Type Checking ensures operations are applied to compatible types. A type checker traverses the AST and verifies each expression and statement.

class TypeSystem:
    def __init__(self):
        self.types = {
            "int": {"size": 4, "compatible": ["int", "float"]},
            "float": {"size": 8, "compatible": ["int", "float"]},
            "string": {"size": 8, "compatible": ["string"]},
            "bool": {"size": 1, "compatible": ["bool"]},
            "void": {"size": 0, "compatible": []},
        }

    def is_compatible(self, actual, expected):
        if actual == expected:
            return True
        type_info = self.types.get(actual)
        return type_info and expected in type_info["compatible"]

class TypeChecker:
    def __init__(self, symbol_table, type_system):
        self.symbol_table = symbol_table
        self.type_system = type_system

    def check(self, node):
        method = f"check_{type(node).__name__}"
        checker = getattr(self, method, self.default_check)
        return checker(node)

    def check_NumNode(self, node):
        return "int"

    def check_VarNode(self, node):
        symbol = self.symbol_table.lookup(node.name)
        if not symbol:
            raise SemanticError(f"Undeclared variable: {node.name}")
        return symbol.type_name

    def check_BinOpNode(self, node):
        left_type = self.check(node.left)
        right_type = self.check(node.right)
        if not self.type_system.is_compatible(left_type, right_type):
            raise SemanticError(
                f"Type mismatch: {left_type} {node.op} {right_type}"
            )
        if node.op in ("+", "-", "*", "/"):
            if left_type == "float" or right_type == "float":
                return "float"
            return "int"
        if node.op in ("<", ">", "==", "!="):
            return "bool"
        return left_type

    def check_AssignNode(self, node):
        symbol = self.symbol_table.lookup(node.name)
        if not symbol:
            raise SemanticError(f"Undeclared variable: {node.name}")
        value_type = self.check(node.value)
        if not self.type_system.is_compatible(value_type, symbol.type_name):
            raise SemanticError(
                f"Cannot assign {value_type} to variable of type {symbol.type_name}"
            )
        return symbol.type_name

    def check_BlockNode(self, node):
        self.symbol_table.enter_scope()
        for stmt in node.statements:
            self.check(stmt)
        self.symbol_table.exit_scope()

    def default_check(self, node):
        raise SemanticError(f"Unknown node type: {type(node).__name__}")

Testing the type checker:

st = SymbolTable()
st.define("x", "variable", "int", 1)
st.define("y", "variable", "string", 2)
checker = TypeChecker(st, TypeSystem())

from ast_nodes import AssignNode, BinOpNode, NumNode, VarNode
valid = AssignNode("x", BinOpNode(VarNode("x"), "+", NumNode(5)))
try:
    result = checker.check(valid)
    print(f"Type check passed, result type: {result}")
except SemanticError as e:
    print(f"Type error: {e}")

invalid = AssignNode("x", BinOpNode(VarNode("x"), "+", VarNode("y")))
try:
    checker.check(invalid)
except SemanticError as e:
    print(f"Type error: {e}")

Expected output:

Type check passed, result type: int
Type error: Type mismatch: int + string

Advanced Type Checking

Structural vs Nominal Typing

Nominal typing checks types by name: int and float are different types even if they have the same size. Structural typing checks by structure: two types are compatible if they have the same fields and methods. TypeScript uses structural typing; Java uses nominal.

Type Inference

Some languages infer types automatically, reducing annotation burden:

def infer_type(expression, symbol_table):
    if isinstance(expression, NumNode):
        inferred = "int"
    elif isinstance(expression, VarNode):
        sym = symbol_table.lookup(expression.name)
        inferred = sym.type_name if sym else "unknown"
    elif isinstance(expression, BinOpNode):
        left = infer_type(expression.left, symbol_table)
        if left in ("int", "float") and expression.op in ("<", ">", "=="):
            inferred = "bool"
        else:
            inferred = left
    else:
        inferred = "unknown"
    return inferred

Common Errors in Semantic Analysis

Error 1: Forward Reference Resolution

Using a variable before its declaration requires multi-pass analysis or deferred checking. Single-pass compilers must require declarations before use.

Error 2: Name Shadowing Confusion

A variable declared in an inner scope with the same name as an outer one shadows the outer. Ensure the Symbol Table correctly handles this by checking the innermost scope first.

Error 3: Type Coercion Missing

Implicit conversions (int to float) reduce errors but can hide bugs. Decide on coercion rules early and document them clearly.

Error 4: Function Overloading Handling

Multiple functions with the same name but different parameters require overload resolution based on argument types and arity.

Error 5: Generic/Polymorphic Type Support

Template functions introduce complexity: type parameters must be resolved at instantiation time, requiring deferred Type Checking.

Practice Questions

Question 1

What is the purpose of a Symbol Table?

Show answer

A Symbol Table stores information about identifiers: their names, types, scope levels, and memory locations. It enables the compiler to verify declarations and resolve references.

Question 2

How does lexical scoping work in a compiler?

Show answer

Lexical scoping means a variable's scope is determined by its position in the source code. Inner blocks can access outer variables, but outer blocks cannot access inner ones. The compiler maintains a stack of scopes in the Symbol Table.

Question 3

What is type coercion?

Show answer

Type coercion is the automatic conversion of a value from one type to another, such as converting int to float before addition. Implicit coercion reduces errors but can mask bugs.

Question 4

Why must the semantic analyzer check all code paths?

Show answer

Unreachable code may hide type errors. The analyzer must check all paths including branches that seem unlikely, because any code that compiles must be type-safe regardless of runtime conditions.

Question 5

What is duck typing in Semantic Analysis?

Show answer

Duck typing checks whether an object supports the required methods rather than checking its declared type. "If it walks like a duck and quacks like a duck, it is a duck." This is common in dynamically typed languages.

Challenge

Implement a semantic analyzer for a small language with function declarations, variable declarations, nested blocks, and return type verification. Ensure every code path in a function with a non-void return type actually returns a value. Test with recursive and conditional return scenarios.

FAQ

What is the difference between static and dynamic Type Checking?

Static Type Checking happens at compile time (before execution) and catches errors early. Dynamic Type Checking happens at runtime and offers more flexibility at the cost of potential runtime failures.

Can Semantic Analysis detect all runtime errors?

No. Semantic Analysis checks type safety and declaration rules, but cannot detect logical errors, division by zero (in most cases), or null pointer dereferences. These require runtime checks or advanced Static Analysis.

How do compilers handle circular dependencies?

Circular dependencies between modules require forward declarations in C/C++ or separate compilation with link-time resolution. Some languages use multi-pass compilation to handle mutual references.

What is a semantic predicate?

A semantic predicate is a condition attached to a grammar rule that must be satisfied for the rule to apply. Predicates are used in parser generators like ANTLR to resolve ambiguities based on semantic context.

← Previous Abstract Syntax Trees — Representing Program Structure Next → Intermediate Representations — Three-Address Code and SSA

Built by the developers of DodaTech

Doda Browser, DodaZIP & Durga Antivirus Pro

Home Browse Compiler Design