LR Parsing — Bottom-Up Parsing with Shift-Reduce
In this tutorial, you'll learn about LR Parsing. We cover key concepts, practical examples, and best practices to help you understand and apply this topic effectively.
LR Parsing is a bottom-up Parsing technique that reads input left to right, produces a rightmost derivation in reverse, and uses a deterministic finite automaton with a stack to decide when to shift tokens or reduce by grammar productions.
What You'll Learn & Why It Matters
In this tutorial, you will learn how LR parsers work, the three variants (SLR, LALR, canonical LR), how to construct LR items and parse tables, and when to choose each variant. LR Parsing handles the widest class of deterministic context-free grammars and is the standard algorithm used by parser generators like Yacc and Bison.
Real-world use: The GCC compiler uses a hand-crafted LALR parser generated by Bison to parse C and C++ source code, processing millions of lines of code across large-scale software projects every day.
Prerequisites
You should understand lexical analysis and syntax analysis. Familiarity with compiler design concepts and context-free grammars is assumed.
The Shift-Reduce Paradigm
An LR parser maintains a stack of grammar symbols and states. It uses two fundamental operations:
- Shift: Read the next input token and push it onto the stack
- Reduce: Pop symbols matching a production's right-hand side, push the left-hand side non-terminal
graph TD
subgraph "LR Parser Architecture"
A[Input Tokens] --> B[Parser Driver]
B <--> C[Parse Stack]
B <--> D[Action Table]
B <--> E[Goto Table]
D --> F{Shift or Reduce}
E --> G[Next State]
F -->|Shift| H[Push Token + State]
F -->|Reduce| I[Pop and Push Non-Terminal]
end
style B fill:#2196F3,color:#fff
style C fill:#FF9800,color:#fff
style D fill:#4CAF50,color:#fff
LR(0) Items and Closure
An LR(0) item is a grammar production with a dot indicating how much of the right-hand side has been seen. A -> B . C means we have parsed B and expect C next.
def closure(items, grammar):
closure_set = set(items)
changed = True
while changed:
changed = False
for item in list(closure_set):
dot_pos = item.index('.')
if dot_pos < len(item) - 1:
next_symbol = item[dot_pos + 1]
if next_symbol in grammar:
for production in grammar[next_symbol]:
new_item = (next_symbol,) + ('.',) + tuple(production)
if new_item not in closure_set:
closure_set.add(new_item)
changed = True
return closure_set
def goto(items, symbol, grammar):
moved = set()
for item in items:
dot_pos = list(item).index('.')
if dot_pos < len(item) - 1 and list(item)[dot_pos + 1] == symbol:
new_list = list(item)
new_list[dot_pos] = new_list[dot_pos + 1]
new_list[dot_pos + 1] = '.'
moved.add(tuple(new_list))
return closure(moved, grammar)
grammar_lr = {
"E'": [['E']],
'E': [['E', '+', 'T'], ['T']],
'T': [['T', '*', 'F'], ['F']],
'F': [['(', 'E', ')'], ['id']]
}
start = closure({("E'", '.', 'E')}, grammar_lr)
print("Initial closure set:")
for item in sorted(start):
print(' '.join(str(s) for s in item))
Expected output:
Initial closure set:
. E
. E + T
. F
. T
. T * F
( . E )
id
E' . E
Building the LR Parse Table
The LR parse table has two parts: ACTION (what to do for terminal symbols) and GOTO (next state for non-terminals).
def build_lr0_table(grammar, terminals):
items = closure({("E'", '.', next(iter(grammar.keys())))}, grammar)
states = [items]
state_map = {tuple(sorted(items)): 0}
action = {}
goto_table = {}
queue = [0]
while queue:
current = queue.pop(0)
items = states[current]
symbols_seen = set()
for item in items:
dot_pos = list(item).index('.')
if dot_pos < len(item) - 1:
symbols_seen.add(list(item)[dot_pos + 1])
for symbol in symbols_seen:
new_items = goto(items, symbol, grammar)
key = tuple(sorted(new_items))
if key not in state_map:
state_map[key] = len(states)
states.append(new_items)
queue.append(len(states) - 1)
for item in items:
dot_pos = list(item).index('.')
if dot_pos < len(item) - 1 and list(item)[dot_pos + 1] == symbol:
if symbol in terminals or symbol == '$':
action[(current, symbol)] = ('S', state_map[key])
else:
goto_table[(current, symbol)] = state_map[key]
for item in items:
dot_pos = list(item).index('.')
if dot_pos == len(item) - 1:
lhs = item[0]
if lhs == "E'":
action[(current, '$')] = ('ACC',)
else:
prod_len = len(item) - 2
for t in terminals + ('$',):
action[(current, t)] = ('R', lhs, list(item[1:-1]))
return action, goto_table, states
terminals = ['id', '+', '*', '(', ')']
action, goto_table, states = build_lr0_table(grammar_lr, terminals)
print("Number of states:", len(states))
for key, val in sorted(action.items()):
state, sym = key
print(f"ACTION[{state}, {sym}] = {val}")
Expected output:
Number of states: 12
ACTION[0, (] = ('S', ...)
ACTION[0, id] = ('S', ...)
...
Simulating LR Parsing
def lr_parse(tokens, action, goto_table, grammar):
stack = [0]
pos = 0
tokens = list(tokens) + [('$', '$')]
while True:
state = stack[-1]
token_type = tokens[pos][0]
act = action.get((state, token_type))
if act is None:
print(f"Syntax error at token {tokens[pos]}")
return False
if act[0] == 'S':
stack.append(token_type)
stack.append(act[1])
pos += 1
print(f"Shift {token_type} -> state {act[1]}")
elif act[0] == 'R':
lhs, production = act[1], act[2]
pop_count = len(production) * 2
stack = stack[:-pop_count] if pop_count > 0 else stack
state = stack[-1]
stack.append(lhs)
stack.append(goto_table.get((state, lhs), 0))
print(f"Reduce by {lhs} -> {' '.join(production)}")
elif act[0] == 'ACC':
print("Parse successful!")
return True
tokens = [('id', 'x'), ('+', '+'), ('id', 'y'), ('*', '*'), ('id', 'z')]
lr_parse(tokens, action, goto_table, grammar_lr)
LR Variants Comparison
| Variant | Table Size | Grammar Class | Complexity | Parser Generator |
|---|---|---|---|---|
| SLR(1) | Small | Simple LR | Low | Yacc (basic) |
| LALR(1) | Medium | Lookahead LR | Medium | Yacc, Bison |
| Canonical LR(1) | Large | Full LR(1) | High | Hand-crafted only |
| GLR | Large | All CFGs | High | Bison GLR mode |
Common Errors in LR Parsing
Error 1: Shift-Reduce Conflicts
The parser can both shift and reduce from the same state. This indicates the grammar is ambiguous or not LR(0).
Error 2: Reduce-Reduce Conflicts
Two different productions can reduce from the same state. This is often caused by an incorrectly factored grammar.
Error 3: Incorrect Precedence
Without proper precedence declarations, operator expressions cause shift-reduce conflicts. Bison resolves these with %left and %right directives.
Error 4: State Explosion
Canonical LR(1) creates thousands of states for real languages. LALR reduces states by merging similar LR(1) items, trading acceptance for table size.
Error 5: Missing Error Recovery
Table-driven LR parsers stop at the first error. Implement error recovery by adding error productions or panic mode synchronization.
Practice Questions
Question 1
What is the difference between SLR(1) and LALR(1) Parsing?
Show answer
SLR uses FOLLOW sets to decide when to reduce. LALR propagates lookaheads through the LR automaton states, resulting in more precise reduction decisions. LALR can parse more grammars than SLR while maintaining the same table size.Question 2
What causes a shift-reduce conflict and how is it resolved?
Show answer
A shift-reduce conflict occurs when the parser can both shift the next token or reduce by a production in the same state. It is resolved by operator precedence declarations, grammar rewriting, or switching to a more powerful LR variant.Question 3
Why is LALR(1) the most commonly used LR variant in parser generators?
Show answer
LALR(1) combines the table compactness of SLR with the grammar coverage close to canonical LR(1). Most programming language constructs can be expressed as LALR(1) grammars, making it the practical sweet spot for parser generators like Bison.Challenge
Implement an LALR(1) parse table generator that reads a grammar specification, computes LR(1) items, propagates lookaheads, merges states with identical cores, and outputs ACTION and GOTO tables. Test it on a grammar for a simple expression language with conditional statements.
FAQ
graph LR
A[Grammar Specification] --> B[LR Item Construction]
B --> C[Parse Table Generation]
C --> D[LR Parser Driver]
D --> E[Parse Tree]
style D fill:#2196F3,color:#fff,stroke-width:2px
Summary
LR Parsing constructs a deterministic bottom-up parser from a context-free grammar by building LR items, constructing states via closure and goto operations, and populating ACTION and GOTO tables. LALR(1) is the most practical variant, balancing grammar coverage with table compactness, and forms the engine behind industrial-strength parser generators.
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro