Error Handling and Recovery in Compilers
This tutorial covers error handling and recovery in compilers through key concepts, practical examples, and best practices.
Error handling and recovery in compilers refers to the techniques used to detect, report, and recover from errors in source code, allowing the compiler to continue processing and discover multiple errors in a single compilation session.
What You'll Learn & Why It Matters
In this tutorial, you will learn how production compilers detect errors, report them with actionable messages, and recover to find additional errors. Good error handling often determines whether a compiler is usable or frustrating, because error messages are the part of a compiler that users read most closely.
Real-world use: Durga Antivirus Pro uses compiler-grade error recovery to parse malformed executables that attackers intentionally corrupt to evade detection, recovering meaningful analysis data from broken structures.
Prerequisites
You should understand parsing from the syntax analysis tutorial. Knowledge of lexer and parser implementation from the lexical analysis tutorial and AST tutorial is helpful.
Error Classification
Compilers classify errors into five categories:
| Error Type | Phase | Examples | Severity |
|---|---|---|---|
| Lexical | Tokenization | Invalid character, unterminated string | Recoverable |
| Syntax | Parsing | Missing semicolon, unmatched brace | Recoverable |
| Semantic | Type Checking | Type mismatch, undeclared variable | Recoverable |
| Logical | Runtime | Division by zero, null dereference | Generally not detectable at compile time |
| Fatal | Any | Out of memory, disk full | Unrecoverable |
graph TD
A[Source Code] --> B[Lexical Analysis]
B -->|Error| C[Lexical Error: bad character]
B --> D[Syntax Analysis]
D -->|Error| E[Syntax Error: missing semicolon]
D --> F[Semantic Analysis]
F -->|Error| G[Semantic Error: type mismatch]
F --> H[Code Generation]
H --> I[Executable]
style C fill:#f44336,color:#fff
style E fill:#f44336,color:#fff
style G fill:#f44336,color:#fff
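The lexical stage of this pipeline can be sketched in a few lines. The following is a minimal illustration, assuming a toy language with identifiers, integers, a handful of operators and separators, and double-quoted strings; the function name `tokenize` and the token layout `(kind, text, line, column)` are assumptions made for this example. It reports an invalid character or an unterminated string and then keeps scanning instead of stopping.

```python
def tokenize(source):
    """Scan `source` and return (tokens, errors) without stopping at the first error."""
    tokens, errors = [], []
    line, col, i = 1, 1, 0
    while i < len(source):
        ch = source[i]
        if ch == "\n":
            line, col, i = line + 1, 1, i + 1
        elif ch.isspace():
            col, i = col + 1, i + 1
        elif ch.isalnum() or ch == "_":
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            text = source[start:i]
            kind = "NUMBER" if text.isdigit() else "IDENTIFIER"
            tokens.append((kind, text, line, col))
            col += i - start
        elif ch in "=+-*/;(){}":
            kind = "SEPARATOR" if ch in ";(){}" else "OPERATOR"
            tokens.append((kind, ch, line, col))
            col, i = col + 1, i + 1
        elif ch == '"':
            end = i + 1
            while end < len(source) and source[end] not in '"\n':
                end += 1
            if end < len(source) and source[end] == '"':
                tokens.append(("STRING", source[i + 1:end], line, col))
                end += 1  # consume the closing quote
            else:
                # Recovery: give up on this string at end of line and keep scanning.
                errors.append(f"{line}:{col}: error: unterminated string")
            col += end - i
            i = end
        else:
            # Recovery: report the bad character, skip it, and continue.
            errors.append(f"{line}:{col}: error: invalid character {ch!r}")
            col, i = col + 1, i + 1
    return tokens, errors


tokens, errors = tokenize('x = 5 @ 3;\ns = "abc\ny = 2;')
for message in errors:
    print(message)
```

Because both errors are recovered locally, the statement on the third line is still tokenized and can be handed to the parser.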
Error Reporting
A good error message contains three parts:
- Location: File, line, and column
- Description: What went wrong, in human terms
- Suggestion: How to fix it (when possible)
class CompilerError(Exception):
    """A compile-time error tied to a source location."""

    def __init__(self, message, line, column, filename="<stdin>"):
        super().__init__(message)
        self.message = message
        self.line = line
        self.column = column
        self.filename = filename

    def __str__(self):
        return f"{self.filename}:{self.line}:{self.column}: error: {self.message}"


class ErrorReporter:
    """Collects errors and warnings and prints each one as it is reported."""

    def __init__(self):
        self.errors = []
        self.warnings = []

    def report(self, error):
        self.errors.append(error)
        print(error)

    def warn(self, message, line, column):
        warning = f"<stdin>:{line}:{column}: warning: {message}"
        self.warnings.append(warning)
        print(warning)

    def has_errors(self):
        return len(self.errors) > 0


reporter = ErrorReporter()
reporter.report(CompilerError("Expected ';' after statement", 5, 12))
reporter.warn("Variable 'x' is assigned but never used", 3, 1)
Expected output:
<stdin>:5:12: error: Expected ';' after statement
<stdin>:3:1: warning: Variable 'x' is assigned but never used
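The three parts listed earlier (location, description, suggestion) can also be carried by a single record. The following is a minimal sketch, not part of the reporter above; the class name `Diagnostic` and its fields are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Diagnostic:
    """Hypothetical record holding location, description, and an optional suggestion."""
    message: str
    line: int
    column: int
    filename: str = "<stdin>"
    severity: str = "error"
    suggestion: Optional[str] = None

    def __str__(self):
        text = f"{self.filename}:{self.line}:{self.column}: {self.severity}: {self.message}"
        if self.suggestion:
            text += f"\n  help: {self.suggestion}"
        return text


d = Diagnostic("Expected ';' after statement", 5, 12,
               suggestion="insert ';' at the end of line 5")
print(d)
```

Keeping the suggestion as a separate field lets a tool print it, hide it, or apply it automatically without parsing the message text.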
Panic Mode Recovery
Panic mode is the simplest recovery strategy. When the parser encounters an error, it discards tokens until it finds a synchronization token (typically `;`, `}`, `end`, or another keyword), then resumes parsing from there.
class PanicModeParser:
    def __init__(self, tokens):
        self.tokens = tokens  # list of (kind, text) tuples
        self.pos = 0
        self.reporter = ErrorReporter()
        self.sync_tokens = {";", "}", "end", "else", "fi"}

    def parse(self):
        while self.pos < len(self.tokens):
            start = self.pos
            try:
                self.parse_statement()
            except CompilerError as e:
                self.reporter.report(e)
                self.synchronize()
                if self.pos == start:
                    # The offending token is itself a sync point;
                    # skip it so the loop always makes progress.
                    self.pos += 1

    def synchronize(self):
        # Discard tokens until just past a separator sync token
        # or just before a keyword that can start a statement.
        while self.pos < len(self.tokens):
            token = self.tokens[self.pos]
            if token[0] == "SEPARATOR" and token[1] in self.sync_tokens:
                self.pos += 1  # consume the separator and resume after it
                return
            if token[0] == "KEYWORD":
                return
            self.pos += 1

    def parse_statement(self):
        token = self.peek()
        if token[0] == "IDENTIFIER":
            self.parse_assignment()
        elif token[0] == "KEYWORD" and token[1] == "if":
            self.parse_if()
        else:
            # Tokens carry no position here, so line and column are reported as 0.
            raise CompilerError(f"Unexpected token: {token[1]}", 0, 0)

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("EOF", "")

    def parse_if(self):
        # Simplified form: 'if' <expression> <statement>.
        # A real grammar would also handle blocks and 'else'.
        self.consume()
        self.parse_expression()
        self.parse_statement()

    def parse_assignment(self):
        ident = self.consume()
        if self.peek()[0] != "OPERATOR" or self.peek()[1] != "=":
            raise CompilerError("Expected '=' in assignment", 0, 0)
        self.consume()
        self.parse_expression()
        if self.peek()[0] != "SEPARATOR" or self.peek()[1] != ";":
            raise CompilerError("Expected ';' after assignment", 0, 0)
        self.consume()

    def parse_expression(self):
        if self.peek()[0] == "NUMBER":
            self.consume()
        elif self.peek()[0] == "IDENTIFIER":
            self.consume()
        else:
            raise CompilerError("Expected expression", 0, 0)

    def consume(self):
        token = self.peek()
        self.pos += 1
        return token
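To see panic mode find more than one error in a single pass, it helps to run it on a tiny grammar. The following is a self-contained sketch, assuming statements of the form `IDENTIFIER = NUMBER ;` and tokens given as `(kind, text)` tuples; `ParseError` and `parse_assignments` are names invented for this example.

```python
class ParseError(Exception):
    """Raised when the next token does not match what the grammar expects."""


def parse_assignments(tokens):
    """Parse `IDENTIFIER = NUMBER ;` statements, recovering with panic mode."""
    statements, errors = [], []
    pos = 0

    def expect(kind, text=None):
        nonlocal pos
        tok = tokens[pos] if pos < len(tokens) else ("EOF", "")
        if tok[0] != kind or (text is not None and tok[1] != text):
            wanted = repr(text) if text is not None else kind
            raise ParseError(f"token {pos}: expected {wanted}, found {tok[1]!r}")
        pos += 1
        return tok

    while pos < len(tokens):
        try:
            name = expect("IDENTIFIER")
            expect("OPERATOR", "=")
            value = expect("NUMBER")
            expect("SEPARATOR", ";")
            statements.append((name[1], value[1]))
        except ParseError as err:
            errors.append(str(err))
            # Panic mode: discard tokens up to and including the next ';'.
            while pos < len(tokens) and tokens[pos] != ("SEPARATOR", ";"):
                pos += 1
            pos += 1
    return statements, errors


# Token stream for: a = 1 ; b 2 ; c = ; d = 4 ;
tokens = [
    ("IDENTIFIER", "a"), ("OPERATOR", "="), ("NUMBER", "1"), ("SEPARATOR", ";"),
    ("IDENTIFIER", "b"), ("NUMBER", "2"), ("SEPARATOR", ";"),
    ("IDENTIFIER", "c"), ("OPERATOR", "="), ("SEPARATOR", ";"),
    ("IDENTIFIER", "d"), ("OPERATOR", "="), ("NUMBER", "4"), ("SEPARATOR", ";"),
]
statements, errors = parse_assignments(tokens)
print(statements)
print(errors)
```

Both broken statements are reported, and the valid statements on either side of them are still parsed. Consuming the `;` during recovery is what guarantees the loop always moves forward.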
Error Productions
Error productions extend the grammar with rules that match common mistakes, allowing the parser to produce better error messages and continue parsing.
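class ErrorProductionDecorator:
    """Wraps a parser and adds an error production for 'operand operand'."""

    def __init__(self, parser):
        self.parser = parser

    def parse_expression(self):
        start = self.parser.pos
        result = self.parser.parse_expression()
        # Common pattern: missing operator between operands, e.g. "3 4".
        # The wrapped parser accepts the first number, so check what follows it.
        left = self.parser.tokens[start]
        right = self.parser.peek()
        if left[0] == "NUMBER" and right[0] == "NUMBER":
            raise CompilerError(
                "Missing operator between numbers. "
                f"Did you mean '{left[1]} + {right[1]}'?",
                # Fall back to 0 when the wrapped parser tracks no position.
                getattr(self.parser, "line", 0),
                getattr(self.parser, "column", 0),
            )
        return result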
Error Recovery Strategies Comparison
| Strategy | Complexity | Error Quality | Implementation |
|---|---|---|---|
| Panic mode | Low | Poor | Skip to sync token |
| Error productions | Medium | Good | Add error grammar rules |
| Phrase level | Medium | Good | Local patch of parse stack |
| Global correction | High | Best | Find minimal edit distance |
| Fault tolerance | Medium | Good | Continue with speculative parse |
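The phrase-level row can be made concrete with the most common local patch: treating a missing `;` as if it were present. The following is a minimal sketch under the same toy assumptions as elsewhere in this tutorial (statements of the form `IDENTIFIER = NUMBER ;`, tokens as `(kind, text)` tuples); the function name `parse_with_repair` is hypothetical.

```python
def parse_with_repair(tokens):
    """Parse `IDENTIFIER = NUMBER ;` statements, patching a missing ';' locally."""
    statements, errors = [], []
    pos = 0

    def tok(i):
        return tokens[i] if i < len(tokens) else ("EOF", "")

    while pos < len(tokens):
        name, op, value = tok(pos), tok(pos + 1), tok(pos + 2)
        if name[0] != "IDENTIFIER" or op != ("OPERATOR", "=") or value[0] != "NUMBER":
            errors.append(f"token {pos}: cannot parse statement")
            break  # a fuller parser would fall back to panic mode here
        statements.append((name[1], value[1]))
        pos += 3
        if tok(pos) == ("SEPARATOR", ";"):
            pos += 1
        else:
            # Phrase-level repair: report the problem, then continue
            # exactly as if the ';' had been written.
            errors.append(f"token {pos}: missing ';' after '{value[1]}' (inserted)")
    return statements, errors


# Token stream for: a = 1 b = 2 ;
tokens = [
    ("IDENTIFIER", "a"), ("OPERATOR", "="), ("NUMBER", "1"),
    ("IDENTIFIER", "b"), ("OPERATOR", "="), ("NUMBER", "2"), ("SEPARATOR", ";"),
]
statements, errors = parse_with_repair(tokens)
print(statements)
print(errors)
```

Unlike panic mode, no tokens are discarded, so the statement after the missing `;` is still parsed. The trade-off is that each repair has to be written by hand for a specific mistake.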
Contextual Error Messages
Modern compilers provide rich contextual information:
def format_error_with_context(source, line, column, message, fix=None):
    lines = source.split("\n")
    context_line = lines[line - 1]      # line and column are 1-based
    gutter = " " * len(str(line))       # keeps the '|' bars aligned
    pointer = " " * (column - 1) + "^"  # caret under the offending column
    result = f"error: {message}\n"
    result += f"{gutter}--> line {line}, column {column}\n"
    result += f"{line} | {context_line}\n"
    result += f"{gutter} | {pointer}\n"
    if fix:
        result += f"help: {fix}\n"
    return result


source = "let x = 5\nreturn x + "
msg = format_error_with_context(
    source, 2, 12, "Unexpected end of file",
    "Add the missing expression after the '+' operator."
)
print(msg)
Expected output:
error: Unexpected end of file
 --> line 2, column 12
2 | return x + 
  |            ^
help: Add the missing expression after the '+' operator.
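The help text passed as `fix` can sometimes be generated rather than hand-written. The following is a small sketch of a "did you mean" hint for misspelled keywords, using `difflib.get_close_matches` from the Python standard library; the keyword list and the function name `suggest_keyword` are assumptions made for illustration.

```python
import difflib

KEYWORDS = ["let", "return", "if", "else", "while"]  # hypothetical keyword list


def suggest_keyword(word, keywords=KEYWORDS):
    """Return a 'did you mean' hint for a misspelled keyword, or None."""
    matches = difflib.get_close_matches(word, keywords, n=1, cutoff=0.6)
    return f"Did you mean '{matches[0]}'?" if matches else None


print(suggest_keyword("retrun"))
```

The `cutoff` value controls how similar a candidate must be before it is suggested. A suggestion that is wrong is worse than none, so a fairly strict cutoff is a reasonable starting point.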
Common Errors in Error Handling
Error 1: Cascade Errors
A single missing semicolon can cause dozens of spurious errors. Always synchronize after the first error to avoid error cascades.
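One common way to limit cascades is to drop follow-on errors until the parser has resynchronized. The following is a minimal sketch of that idea, assuming the parser calls a hook once recovery is complete; the class name `CascadeGuard` and its methods are hypothetical.

```python
class CascadeGuard:
    """Keeps the first error of each cascade and drops the rest until resync."""

    def __init__(self):
        self.errors = []
        self.suppressed = 0
        self._panicking = False

    def report(self, message):
        if self._panicking:
            # Still recovering from an earlier error: likely a spurious follow-on.
            self.suppressed += 1
            return
        self._panicking = True
        self.errors.append(message)

    def synchronized(self):
        """Called by the parser once it has reached a synchronization point."""
        self._panicking = False


guard = CascadeGuard()
guard.report("5:12: expected ';' after statement")
guard.report("5:14: unexpected token 'y'")  # follow-on error, dropped
guard.synchronized()
guard.report("9:1: expected expression")    # independent error, kept
print(guard.errors)
```

The counter of suppressed errors is useful when tuning recovery: a high count suggests the synchronization points are too far apart.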
Error 2: Vague Messages
"Syntax error" without location or context is useless. Always include line number, column, and a specific description of what was expected vs found.
Error 3: Stopping at First Error
Stopping compilation at the first error wastes the developer's time. Implement recovery to report multiple errors per compilation.
Error 4: Incorrect Error Recovery
Recovering by skipping too many tokens can cause real errors later to be missed. Choose synchronization points carefully.
Error 5: Ignoring Warnings
Warnings highlight potential bugs. Modern compilers let users promote warnings to errors or suppress specific warning codes.
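How a warning is treated can be decided by a small policy object. The following is a sketch only, loosely modeled on the promote-or-suppress behavior described above; the class name `WarningPolicy` and the warning codes are invented for this example.

```python
class WarningPolicy:
    """Decides how a warning code is treated: ignored, warning, or error."""

    def __init__(self, warnings_as_errors=False, suppressed=()):
        self.warnings_as_errors = warnings_as_errors
        self.suppressed = set(suppressed)

    def classify(self, code):
        if code in self.suppressed:
            return "ignored"
        return "error" if self.warnings_as_errors else "warning"


# Hypothetical warning codes, used only for illustration.
policy = WarningPolicy(warnings_as_errors=True, suppressed={"unused-variable"})
print(policy.classify("unused-variable"))
print(policy.classify("implicit-conversion"))
```

Checking suppression before promotion means a user can turn on strict mode for a whole project and still silence one specific code.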
Practice Questions
Question 1
What is panic mode error recovery?
Answer: Panic mode discards input tokens until a synchronization token (like `;` or `}`) is found, then resumes parsing. It is simple to implement but may skip valid code between the error and the sync point.
Question 2
What are error productions?
Answer: Error productions are grammar rules that match common programming mistakes, allowing the parser to produce specific, helpful error messages and continue parsing normally.
Question 3
What information should every compiler error message include?
Answer: Every error message should include the file name, line number, column number, a description of the problem, what was expected, and ideally a suggestion for fixing it.
Question 4
What is error cascading?
Answer: Error cascading occurs when a single error triggers many subsequent errors because the parser has lost synchronization with the input stream. Proper error recovery techniques prevent cascading.
Question 5
How do modern compilers like Rust and Elm improve error messages?
Answer: Modern compilers show the erroneous code with markers pointing to the problem area, suggest fixes, include examples of correct code, and provide links to documentation for the relevant error code.
Challenge
Implement an error recovery system for a recursive descent parser that handles three common mistakes: missing semicolons, unmatched parentheses, and missing operator between operands. Each recovery strategy should produce a specific error message and resume parsing on the next statement boundary.
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro