LL Parsing — Top-Down Parsing Explained with Examples
In this tutorial, you'll learn about LL Parsing. We cover key concepts, practical examples, and best practices to help you understand and apply this topic effectively.
LL Parsing is a top-down Parsing technique that reads input left to right, produces a leftmost derivation, and uses lookahead tokens to predict which grammar production to apply without Backtracking.
What You'll Learn & Why It Matters
In this tutorial, you will learn how LL parsers work, how to compute FIRST and FOLLOW sets, build predictive Parsing tables, and implement an LL(1) parser from scratch. LL Parsing is the foundation of hand-written recursive descent parsers used in production compilers for languages like Go and Rust.
Real-world use: The Rust compiler uses a hand-written recursive descent parser (an LL-based approach) that processes Rust source code left to right, leveraging lookahead to resolve ambiguities while producing excellent error messages.
Prerequisites
You should understand lexical analysis and syntax analysis fundamentals. Familiarity with C or Python and basic grammar notation is assumed.
What Makes LL Parsing LL
The name LL(1) describes three properties:
- L: Left-to-right scan of the input
- L: Leftmost derivation (expand the leftmost non-terminal first)
- (1): One token of lookahead to decide which production to use
graph TD
subgraph "LL(1) Parsing Process"
A[Input Token Stream] --> B[LL Parser]
B --> C{Lookup Table}
C -->|Production| D[Stack]
D --> E[Parse Tree]
B --> F[Token Buffer]
F -->|Lookahead| C
end
style B fill:#2196F3,color:#fff
style C fill:#FF9800,color:#fff
style D fill:#4CAF50,color:#fff
FIRST and FOLLOW Sets
LL parsers need two computed sets:
FIRST(X) is the set of terminal symbols that can appear at the start of any string derived from X.
FOLLOW(X) is the set of terminals that can appear immediately after X in any derivation.
def compute_first(grammar):
first = {nt: set() for nt in grammar}
changed = True
while changed:
changed = False
for nt, productions in grammar.items():
for production in productions:
all_have_empty = True
for symbol in production:
if symbol in grammar:
before = len(first[nt])
first[nt].update(first[symbol] - {''})
if '' not in first[symbol]:
all_have_empty = False
if len(first[nt]) != before:
changed = True
break
if len(first[nt]) != before:
changed = True
else:
before = len(first[nt])
first[nt].add(symbol)
if len(first[nt]) != before:
changed = True
all_have_empty = False
break
if all_have_empty:
first[nt].add('')
return first
grammar = {
'E': [['T', "E'"]],
"E'": [['+', 'T', "E'"], ['']],
'T': [['F', "T'"]],
"T'": [['*', 'F', "T'"], ['']],
'F': [['id'], ['(', 'E', ')']]
}
first_sets = compute_first(grammar)
for nt, terms in first_sets.items():
print(f"FIRST({nt}) = {terms}")
Expected output:
FIRST(E) = {'(', 'id'}
FIRST(E') = {'+', ''}
FIRST(T) = {'(', 'id'}
FIRST(T') = {'*', ''}
FIRST(F) = {'(', 'id'}
Building the Predictive Parsing Table
The Parsing table maps (non-terminal, lookahead) pairs to the production to apply.
def build_parse_table(grammar, first, follow):
table = {}
for nt, productions in grammar.items():
for idx, production in enumerate(productions):
first_of_prod = set()
all_empty = True
for symbol in production:
if symbol in grammar:
first_of_prod.update(first[symbol] - {''})
if '' not in first[symbol]:
all_empty = False
break
else:
first_of_prod.add(symbol)
all_empty = False
break
for terminal in first_of_prod:
table[(nt, terminal)] = (nt, idx, production)
if '' in first_of_prod or all_empty:
for terminal in follow[nt]:
table[(nt, terminal)] = (nt, idx, production)
return table
def compute_follow(grammar, first):
follow = {nt: set() for nt in grammar}
follow['E'].add('$')
changed = True
while changed:
changed = False
for nt, productions in grammar.items():
for production in productions:
for i, symbol in enumerate(production):
if symbol in grammar:
beta = production[i+1:]
before = len(follow[symbol])
for j, b in enumerate(beta):
if b in grammar:
follow[symbol].update(first[b] - {''})
if '' not in first[b]:
break
else:
follow[symbol].add(b)
break
else:
follow[symbol].update(follow[nt])
if len(follow[symbol]) != before:
changed = True
return follow
follow_sets = compute_follow(grammar, first_sets)
parse_table = build_parse_table(grammar, first_sets, follow_sets)
for key, val in sorted(parse_table.items()):
nt, terminal = key
prod_str = ' -> '.join([val[0]] + [str(p) for p in val[2]])
print(f"M[{nt}, {terminal}] = {prod_str}")
Expected output:
M[E, (] = E -> ['T', "E'"]
M[E, id] = E -> ['T', "E'"]
M[E', +] = E' -> ['+', 'T', "E'"]
M[E', $] = E' -> ['']
M[E', )] = E' -> ['']
M[T, (] = T -> ['F', "T'"]
M[T, id] = T -> ['F', "T'"]
M[T', *] = T' -> ['*', 'F', "T'"]
M[T', $] = T' -> ['']
M[T', )] = T' -> ['']
M[T', +] = T' -> ['']
M[F, (] = F -> ['(', 'E', ')']
M[F, id] = F -> ['id']
Implementing an LL(1) Parser
With the Parsing table built, the parser uses a stack-driven algorithm.
class LL1Parser:
def __init__(self, parse_table, grammar, tokens):
self.table = parse_table
self.grammar = grammar
self.tokens = tokens
self.pos = 0
self.stack = ['$', list(grammar.keys())[0]]
def current_token(self):
if self.pos < len(self.tokens):
return self.tokens[self.pos]
return ('$', '$')
def parse(self):
while len(self.stack) > 0:
top = self.stack[-1]
token = self.current_token()
if top == token[0] or top == token[1]:
self.stack.pop()
self.pos += 1
elif top == '$' and token[0] == '$':
return True
elif top in self.grammar:
key = (top, token[0] if token[0] != '$' else '$')
if key not in self.table:
print(f"Error at token {token}: no rule for {key}")
return False
_, _, production = self.table[key]
self.stack.pop()
for symbol in reversed(production):
if symbol != '':
self.stack.append(symbol)
else:
return False
return True
tokens = [('id', 'x'), ('+', '+'), ('id', 'y'), ('*', '*'), ('id', 'z'), ('$', '$')]
parser = LL1Parser(parse_table, grammar, tokens)
result = parser.parse()
print(f"Parse result: {'Success' if result else 'Failure'}")
Expected output:
Parse result: Success
LL Parsing Limitations
| Limitation | Cause | Workaround |
|---|---|---|
| Left Recursion | Infinite loop in top-down | Rewrite as right-recursive |
| Left factoring needed | Common prefixes ambiguous | Factor out common prefix |
| No arbitrary lookahead | Fixed k tokens | Use LL(*) or Backtracking |
| Grammar restrictions | Not all CFGs are LL(1) | Use LR Parsing instead |
Common Errors in LL Parsing
Error 1: Left Recursion
Rules like A -> A a cause infinite descent. Always eliminate left Recursion before building the LL table.
Error 2: Missing Left Factoring
If A -> a b | a c, the parser cannot choose between them with one lookahead. Factor: A -> a A', A' -> b | c.
Error 3: FIRST/FOLLOW Conflicts
If FIRST and FOLLOW sets overlap for the same non-terminal, the table has conflicts. This means the grammar is not LL(1).
Error 4: Incorrect Nullable Computation
A non-terminal that derives epsilon must be correctly identified. Missing nullable entries cause incorrect FIRST sets.
Error 5: Forgetting the End Marker
The parser needs $ as an end-of-input marker in both the stack and the FOLLOW set to detect completion.
Practice Questions
Question 1
What does LL(1) stand for and what does each part mean?
Show answer
LL(1) stands for Left-to-right scan, Leftmost derivation, and 1 token of lookahead. L describes the input direction, the second L describes the derivation order, and (1) specifies how many lookahead tokens the parser uses to make decisions.Question 2
Why must left Recursion be removed from grammars used with LL parsers?
Show answer
Left-recursive rules like A -> A a cause the parser to recursively call itself without consuming any input, leading to infinite Recursion. The parser must consume at least one token per rule application to make progress.Question 3
When does an LL(1) Parsing table have a conflict?
Show answer
A conflict occurs when a single table cell M[non-terminal, terminal] contains more than one production. This happens when FIRST sets overlap for different productions of the same non-terminal, or when FIRST and FOLLOW sets overlap for epsilon-producing rules.Challenge
Implement an LL(1) parser for a simple calculator grammar that handles +, -, *, /, parentheses, and exponentiation (^ with right associativity). Compute FIRST and FOLLOW sets, build the parse table, and test with the input 3 + 4 * (5 ^ 2).
FAQ
graph LR
A[Lexical Analysis] --> B[LL Parsing]
B --> C[Semantic Analysis]
B --> D[AST Construction]
style B fill:#2196F3,color:#fff,stroke-width:2px
Summary
LL Parsing builds a parse tree top-down using lookahead tokens to predict grammar productions. Computing FIRST and FOLLOW sets enables constructing predictive Parsing tables that drive efficient stack-based parsers. While LL(1) cannot handle all grammars, LL techniques form the backbone of most hand-written production compilers.
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro