Skip to content

LR Parsing — Bottom-Up Parsing with Shift-Reduce

DodaTech Updated 2026-06-23 7 min read

In this tutorial, you'll learn about LR Parsing. We cover key concepts, practical examples, and best practices to help you understand and apply this topic effectively.

LR Parsing is a bottom-up Parsing technique that reads input left to right, produces a rightmost derivation in reverse, and uses a deterministic finite automaton with a stack to decide when to shift tokens or reduce by grammar productions.

What You'll Learn & Why It Matters

In this tutorial, you will learn how LR parsers work, the three variants (SLR, LALR, canonical LR), how to construct LR items and parse tables, and when to choose each variant. LR Parsing handles the widest class of deterministic context-free grammars and is the standard algorithm used by parser generators like Yacc and Bison.

Real-world use: The GCC compiler uses a hand-crafted LALR parser generated by Bison to parse C and C++ source code, processing millions of lines of code across large-scale software projects every day.

Prerequisites

You should understand lexical analysis and syntax analysis. Familiarity with compiler design concepts and context-free grammars is assumed.

The Shift-Reduce Paradigm

An LR parser maintains a stack of grammar symbols and states. It uses two fundamental operations:

  • Shift: Read the next input token and push it onto the stack
  • Reduce: Pop symbols matching a production's right-hand side, push the left-hand side non-terminal
graph TD
    subgraph "LR Parser Architecture"
        A[Input Tokens] --> B[Parser Driver]
        B <--> C[Parse Stack]
        B <--> D[Action Table]
        B <--> E[Goto Table]
        D --> F{Shift or Reduce}
        E --> G[Next State]
        F -->|Shift| H[Push Token + State]
        F -->|Reduce| I[Pop and Push Non-Terminal]
    end
    style B fill:#2196F3,color:#fff
    style C fill:#FF9800,color:#fff
    style D fill:#4CAF50,color:#fff

LR(0) Items and Closure

An LR(0) item is a grammar production with a dot indicating how much of the right-hand side has been seen. A -> B . C means we have parsed B and expect C next.

def closure(items, grammar):
    closure_set = set(items)
    changed = True
    while changed:
        changed = False
        for item in list(closure_set):
            dot_pos = item.index('.')
            if dot_pos < len(item) - 1:
                next_symbol = item[dot_pos + 1]
                if next_symbol in grammar:
                    for production in grammar[next_symbol]:
                        new_item = (next_symbol,) + ('.',) + tuple(production)
                        if new_item not in closure_set:
                            closure_set.add(new_item)
                            changed = True
    return closure_set

def goto(items, symbol, grammar):
    moved = set()
    for item in items:
        dot_pos = list(item).index('.')
        if dot_pos < len(item) - 1 and list(item)[dot_pos + 1] == symbol:
            new_list = list(item)
            new_list[dot_pos] = new_list[dot_pos + 1]
            new_list[dot_pos + 1] = '.'
            moved.add(tuple(new_list))
    return closure(moved, grammar)

grammar_lr = {
    "E'": [['E']],
    'E': [['E', '+', 'T'], ['T']],
    'T': [['T', '*', 'F'], ['F']],
    'F': [['(', 'E', ')'], ['id']]
}

start = closure({("E'", '.', 'E')}, grammar_lr)
print("Initial closure set:")
for item in sorted(start):
    print(' '.join(str(s) for s in item))

Expected output:

Initial closure set:
. E
. E + T
. F
. T
. T * F
( . E )
id
E' . E

Building the LR Parse Table

The LR parse table has two parts: ACTION (what to do for terminal symbols) and GOTO (next state for non-terminals).

def build_lr0_table(grammar, terminals):
    items = closure({("E'", '.', next(iter(grammar.keys())))}, grammar)
    states = [items]
    state_map = {tuple(sorted(items)): 0}
    action = {}
    goto_table = {}
    queue = [0]

    while queue:
        current = queue.pop(0)
        items = states[current]
        symbols_seen = set()

        for item in items:
            dot_pos = list(item).index('.')
            if dot_pos < len(item) - 1:
                symbols_seen.add(list(item)[dot_pos + 1])

        for symbol in symbols_seen:
            new_items = goto(items, symbol, grammar)
            key = tuple(sorted(new_items))
            if key not in state_map:
                state_map[key] = len(states)
                states.append(new_items)
                queue.append(len(states) - 1)

            for item in items:
                dot_pos = list(item).index('.')
                if dot_pos < len(item) - 1 and list(item)[dot_pos + 1] == symbol:
                    if symbol in terminals or symbol == '$':
                        action[(current, symbol)] = ('S', state_map[key])
                    else:
                        goto_table[(current, symbol)] = state_map[key]

        for item in items:
            dot_pos = list(item).index('.')
            if dot_pos == len(item) - 1:
                lhs = item[0]
                if lhs == "E'":
                    action[(current, '$')] = ('ACC',)
                else:
                    prod_len = len(item) - 2
                    for t in terminals + ('$',):
                        action[(current, t)] = ('R', lhs, list(item[1:-1]))

    return action, goto_table, states

terminals = ['id', '+', '*', '(', ')']
action, goto_table, states = build_lr0_table(grammar_lr, terminals)

print("Number of states:", len(states))
for key, val in sorted(action.items()):
    state, sym = key
    print(f"ACTION[{state}, {sym}] = {val}")

Expected output:

Number of states: 12
ACTION[0, (] = ('S', ...)
ACTION[0, id] = ('S', ...)
...

Simulating LR Parsing

def lr_parse(tokens, action, goto_table, grammar):
    stack = [0]
    pos = 0
    tokens = list(tokens) + [('$', '$')]

    while True:
        state = stack[-1]
        token_type = tokens[pos][0]

        act = action.get((state, token_type))
        if act is None:
            print(f"Syntax error at token {tokens[pos]}")
            return False

        if act[0] == 'S':
            stack.append(token_type)
            stack.append(act[1])
            pos += 1
            print(f"Shift {token_type} -> state {act[1]}")

        elif act[0] == 'R':
            lhs, production = act[1], act[2]
            pop_count = len(production) * 2
            stack = stack[:-pop_count] if pop_count > 0 else stack
            state = stack[-1]
            stack.append(lhs)
            stack.append(goto_table.get((state, lhs), 0))
            print(f"Reduce by {lhs} -> {' '.join(production)}")

        elif act[0] == 'ACC':
            print("Parse successful!")
            return True

tokens = [('id', 'x'), ('+', '+'), ('id', 'y'), ('*', '*'), ('id', 'z')]
lr_parse(tokens, action, goto_table, grammar_lr)

LR Variants Comparison

Variant Table Size Grammar Class Complexity Parser Generator
SLR(1) Small Simple LR Low Yacc (basic)
LALR(1) Medium Lookahead LR Medium Yacc, Bison
Canonical LR(1) Large Full LR(1) High Hand-crafted only
GLR Large All CFGs High Bison GLR mode

Common Errors in LR Parsing

Error 1: Shift-Reduce Conflicts

The parser can both shift and reduce from the same state. This indicates the grammar is ambiguous or not LR(0).

Error 2: Reduce-Reduce Conflicts

Two different productions can reduce from the same state. This is often caused by an incorrectly factored grammar.

Error 3: Incorrect Precedence

Without proper precedence declarations, operator expressions cause shift-reduce conflicts. Bison resolves these with %left and %right directives.

Error 4: State Explosion

Canonical LR(1) creates thousands of states for real languages. LALR reduces states by merging similar LR(1) items, trading acceptance for table size.

Error 5: Missing Error Recovery

Table-driven LR parsers stop at the first error. Implement error recovery by adding error productions or panic mode synchronization.

Practice Questions

Question 1

What is the difference between SLR(1) and LALR(1) Parsing?

Show answer SLR uses FOLLOW sets to decide when to reduce. LALR propagates lookaheads through the LR automaton states, resulting in more precise reduction decisions. LALR can parse more grammars than SLR while maintaining the same table size.

Question 2

What causes a shift-reduce conflict and how is it resolved?

Show answer A shift-reduce conflict occurs when the parser can both shift the next token or reduce by a production in the same state. It is resolved by operator precedence declarations, grammar rewriting, or switching to a more powerful LR variant.

Question 3

Why is LALR(1) the most commonly used LR variant in parser generators?

Show answer LALR(1) combines the table compactness of SLR with the grammar coverage close to canonical LR(1). Most programming language constructs can be expressed as LALR(1) grammars, making it the practical sweet spot for parser generators like Bison.

Challenge

Implement an LALR(1) parse table generator that reads a grammar specification, computes LR(1) items, propagates lookaheads, merges states with identical cores, and outputs ACTION and GOTO tables. Test it on a grammar for a simple expression language with conditional statements.

FAQ

When should I choose LR over LL Parsing?

Choose LR when you need to handle a wide range of grammars (including left-recursive ones), when using a parser generator like Bison, or when grammar coverage matters more than hand-tuned error messages. Choose LL when writing parsers by hand or when error reporting quality is critical.

Can GLR parsers handle ambiguous grammars?

Yes. GLR (Generalized LR) Parsing explores multiple parse paths in parallel, handling any context-free grammar including ambiguous ones. It stores multiple stack configurations and forks when conflicts occur, merging results at unambiguous points.

What is the relationship between LR Parsing and parser generators?

Parser generators like Yacc and Bison accept a grammar specification and automatically generate LR parse tables. The generated output is a C source file containing the ACTION and GOTO tables plus a driver loop, ready to compile into any application.

graph LR
    A[Grammar Specification] --> B[LR Item Construction]
    B --> C[Parse Table Generation]
    C --> D[LR Parser Driver]
    D --> E[Parse Tree]
    style D fill:#2196F3,color:#fff,stroke-width:2px

Summary

LR Parsing constructs a deterministic bottom-up parser from a context-free grammar by building LR items, constructing states via closure and goto operations, and populating ACTION and GOTO tables. LALR(1) is the most practical variant, balancing grammar coverage with table compactness, and forms the engine behind industrial-strength parser generators.


Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.

Built by the developers of DodaTech

Doda Browser, DodaZIP & Durga Antivirus Pro