LL Parsing — Top-Down Parsing Explained with Examples

DodaTech Updated 2026-06-23 8 min read

In this tutorial, you'll learn about LL Parsing. We cover key concepts, practical examples, and best practices to help you understand and apply this topic effectively.

LL Parsing is a top-down Parsing technique that reads input left to right, produces a leftmost derivation, and uses lookahead tokens to predict which grammar production to apply without Backtracking.

What You'll Learn & Why It Matters

In this tutorial, you will learn how LL parsers work, how to compute FIRST and FOLLOW sets, build predictive Parsing tables, and implement an LL(1) parser from scratch. LL Parsing is the foundation of hand-written recursive descent parsers used in production compilers for languages like Go and Rust.

Real-world use: The Rust compiler uses a hand-written recursive descent parser (an LL-based approach) that processes Rust source code left to right, leveraging lookahead to resolve ambiguities while producing excellent error messages.

Prerequisites

You should understand lexical analysis and syntax analysis fundamentals. Familiarity with C or Python and basic grammar notation is assumed.

What Makes LL Parsing LL

The name LL(1) describes three properties:

L: Left-to-right scan of the input
L: Leftmost derivation (expand the leftmost non-terminal first)
(1): One token of lookahead to decide which production to use

graph TD
    subgraph "LL(1) Parsing Process"
        A[Input Token Stream] --> B[LL Parser]
        B --> C{Lookup Table}
        C -->|Production| D[Stack]
        D --> E[Parse Tree]
        B --> F[Token Buffer]
        F -->|Lookahead| C
    end
    style B fill:#2196F3,color:#fff
    style C fill:#FF9800,color:#fff
    style D fill:#4CAF50,color:#fff

FIRST and FOLLOW Sets

LL parsers need two computed sets:

FIRST(X) is the set of terminal symbols that can appear at the start of any string derived from X.

FOLLOW(X) is the set of terminals that can appear immediately after X in any derivation.

def compute_first(grammar):
    first = {nt: set() for nt in grammar}

    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for production in productions:
                all_have_empty = True
                for symbol in production:
                    if symbol in grammar:
                        before = len(first[nt])
                        first[nt].update(first[symbol] - {''})
                        if '' not in first[symbol]:
                            all_have_empty = False
                            if len(first[nt]) != before:
                                changed = True
                            break
                        if len(first[nt]) != before:
                            changed = True
                    else:
                        before = len(first[nt])
                        first[nt].add(symbol)
                        if len(first[nt]) != before:
                            changed = True
                        all_have_empty = False
                        break
                if all_have_empty:
                    first[nt].add('')

    return first

grammar = {
    'E': [['T', "E'"]],
    "E'": [['+', 'T', "E'"], ['']],
    'T': [['F', "T'"]],
    "T'": [['*', 'F', "T'"], ['']],
    'F': [['id'], ['(', 'E', ')']]
}

first_sets = compute_first(grammar)
for nt, terms in first_sets.items():
    print(f"FIRST({nt}) = {terms}")

Expected output:

FIRST(E) = {'(', 'id'}
FIRST(E') = {'+', ''}
FIRST(T) = {'(', 'id'}
FIRST(T') = {'*', ''}
FIRST(F) = {'(', 'id'}

Building the Predictive Parsing Table

The Parsing table maps (non-terminal, lookahead) pairs to the production to apply.

def build_parse_table(grammar, first, follow):
    table = {}
    for nt, productions in grammar.items():
        for idx, production in enumerate(productions):
            first_of_prod = set()
            all_empty = True

            for symbol in production:
                if symbol in grammar:
                    first_of_prod.update(first[symbol] - {''})
                    if '' not in first[symbol]:
                        all_empty = False
                        break
                else:
                    first_of_prod.add(symbol)
                    all_empty = False
                    break

            for terminal in first_of_prod:
                table[(nt, terminal)] = (nt, idx, production)

            if '' in first_of_prod or all_empty:
                for terminal in follow[nt]:
                    table[(nt, terminal)] = (nt, idx, production)

    return table

def compute_follow(grammar, first):
    follow = {nt: set() for nt in grammar}
    follow['E'].add('$')

    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for production in productions:
                for i, symbol in enumerate(production):
                    if symbol in grammar:
                        beta = production[i+1:]
                        before = len(follow[symbol])

                        for j, b in enumerate(beta):
                            if b in grammar:
                                follow[symbol].update(first[b] - {''})
                                if '' not in first[b]:
                                    break
                            else:
                                follow[symbol].add(b)
                                break
                        else:
                            follow[symbol].update(follow[nt])

                        if len(follow[symbol]) != before:
                            changed = True
    return follow

follow_sets = compute_follow(grammar, first_sets)
parse_table = build_parse_table(grammar, first_sets, follow_sets)

for key, val in sorted(parse_table.items()):
    nt, terminal = key
    prod_str = ' -> '.join([val[0]] + [str(p) for p in val[2]])
    print(f"M[{nt}, {terminal}] = {prod_str}")

Expected output:

M[E, (] = E -> ['T', "E'"]
M[E, id] = E -> ['T', "E'"]
M[E', +] = E' -> ['+', 'T', "E'"]
M[E', $] = E' -> ['']
M[E', )] = E' -> ['']
M[T, (] = T -> ['F', "T'"]
M[T, id] = T -> ['F', "T'"]
M[T', *] = T' -> ['*', 'F', "T'"]
M[T', $] = T' -> ['']
M[T', )] = T' -> ['']
M[T', +] = T' -> ['']
M[F, (] = F -> ['(', 'E', ')']
M[F, id] = F -> ['id']

Implementing an LL(1) Parser

With the Parsing table built, the parser uses a stack-driven algorithm.

class LL1Parser:
    def __init__(self, parse_table, grammar, tokens):
        self.table = parse_table
        self.grammar = grammar
        self.tokens = tokens
        self.pos = 0
        self.stack = ['$', list(grammar.keys())[0]]

    def current_token(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ('$', '$')

    def parse(self):
        while len(self.stack) > 0:
            top = self.stack[-1]
            token = self.current_token()

            if top == token[0] or top == token[1]:
                self.stack.pop()
                self.pos += 1
            elif top == '$' and token[0] == '$':
                return True
            elif top in self.grammar:
                key = (top, token[0] if token[0] != '$' else '$')
                if key not in self.table:
                    print(f"Error at token {token}: no rule for {key}")
                    return False
                _, _, production = self.table[key]
                self.stack.pop()
                for symbol in reversed(production):
                    if symbol != '':
                        self.stack.append(symbol)
            else:
                return False
        return True

tokens = [('id', 'x'), ('+', '+'), ('id', 'y'), ('*', '*'), ('id', 'z'), ('$', '$')]
parser = LL1Parser(parse_table, grammar, tokens)
result = parser.parse()
print(f"Parse result: {'Success' if result else 'Failure'}")

Expected output:

Parse result: Success

LL Parsing Limitations

Limitation	Cause	Workaround
Left Recursion	Infinite loop in top-down	Rewrite as right-recursive
Left factoring needed	Common prefixes ambiguous	Factor out common prefix
No arbitrary lookahead	Fixed k tokens	Use LL(*) or Backtracking
Grammar restrictions	Not all CFGs are LL(1)	Use LR Parsing instead

Common Errors in LL Parsing

Error 1: Left Recursion

Rules like A -> A a cause infinite descent. Always eliminate left Recursion before building the LL table.

Error 2: Missing Left Factoring

If A -> a b | a c, the parser cannot choose between them with one lookahead. Factor: A -> a A', A' -> b | c.

Error 3: FIRST/FOLLOW Conflicts

If FIRST and FOLLOW sets overlap for the same non-terminal, the table has conflicts. This means the grammar is not LL(1).

Error 4: Incorrect Nullable Computation

A non-terminal that derives epsilon must be correctly identified. Missing nullable entries cause incorrect FIRST sets.

Error 5: Forgetting the End Marker

The parser needs $ as an end-of-input marker in both the stack and the FOLLOW set to detect completion.

Practice Questions

Question 1

What does LL(1) stand for and what does each part mean?

Show answer

LL(1) stands for Left-to-right scan, Leftmost derivation, and 1 token of lookahead. L describes the input direction, the second L describes the derivation order, and (1) specifies how many lookahead tokens the parser uses to make decisions.

Question 2

Why must left Recursion be removed from grammars used with LL parsers?

Show answer

Left-recursive rules like A -> A a cause the parser to recursively call itself without consuming any input, leading to infinite Recursion. The parser must consume at least one token per rule application to make progress.

Question 3

When does an LL(1) Parsing table have a conflict?

Show answer

A conflict occurs when a single table cell M[non-terminal, terminal] contains more than one production. This happens when FIRST sets overlap for different productions of the same non-terminal, or when FIRST and FOLLOW sets overlap for epsilon-producing rules.

Challenge

Implement an LL(1) parser for a simple calculator grammar that handles +, -, *, /, parentheses, and exponentiation (^ with right associativity). Compute FIRST and FOLLOW sets, build the parse table, and test with the input 3 + 4 * (5 ^ 2).

FAQ

What is the difference between LL and LR Parsing?

LL Parsing is top-down: it starts from the start symbol and expands toward the input. LR Parsing is bottom-up: it starts from the input and reduces toward the start symbol. LL parsers are easier to write by hand; LR parsers handle a larger class of grammars.

Can LL(1) parse all programming languages?

No. Many programming language constructs require more than one token of lookahead or are inherently ambiguous for LL(1). Most practical compilers use LL(k) with k > 1, LL(*), or augmented with Backtracking and semantic predicates.

How do modern compilers handle LL Parsing limitations?

Compilers like GCC and Clang use hand-written recursive descent parsers with Backtracking, multiple lookahead tokens, and semantic predicates. Rust's parser uses a Pratt Parsing approach for expressions, which is an LL-based technique with operator precedence handling.

graph LR
    A[Lexical Analysis] --> B[LL Parsing]
    B --> C[Semantic Analysis]
    B --> D[AST Construction]
    style B fill:#2196F3,color:#fff,stroke-width:2px

Summary

LL Parsing builds a parse tree top-down using lookahead tokens to predict grammar productions. Computing FIRST and FOLLOW sets enables constructing predictive Parsing tables that drive efficient stack-based parsers. While LL(1) cannot handle all grammars, LL techniques form the backbone of most hand-written production compilers.

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.

← Previous Lexical Analysis and Regular Expressions in Compiler Design Next → LR Parsing — Bottom-Up Parsing with Shift-Reduce

Built by the developers of DodaTech

Doda Browser, DodaZIP & Durga Antivirus Pro

Home Browse Compiler Design