LLVM Framework — Writing a Compiler Backend
This tutorial introduces the LLVM framework: its architecture, its intermediate representation (IR), optimization passes, and JIT compilation, with practical examples and common pitfalls.
LLVM is a collection of modular, reusable compiler and toolchain technologies. It provides a language-independent intermediate representation, an optimization framework, and code generation backends for multiple CPU architectures.
What You'll Learn & Why It Matters
In this tutorial, you will learn how to use the LLVM framework to build compiler backends, write optimization passes, and generate machine code for multiple targets. LLVM powers Clang, Rust, Swift, and many other language implementations.
Real-world use: Durga Antivirus Pro uses LLVM's Intermediate Representation analysis to scan binaries for malicious patterns across x86, ARM, and RISC-V architectures using a single detection engine, thanks to LLVM's target-independent IR.
Prerequisites
You should understand intermediate representations (covered in the IR tutorial) and code generation (covered in the code generation tutorial). LLVM development also requires familiarity with C++.
LLVM Architecture
LLVM follows a three-phase design: front end, optimizer, and back end.
graph TD
subgraph "LLVM Architecture"
F1[C Front End] --> IR[LLVM IR]
F2[C++ Front End] --> IR
F3[Rust Front End] --> IR
F4[Swift Front End] --> IR
IR --> OPT[Optimizer Passes]
OPT --> BE1[x86 Backend]
OPT --> BE2[ARM Backend]
OPT --> BE3[RISC-V Backend]
OPT --> BE4[WebAssembly Backend]
end
style IR fill:#4CAF50,color:#fff
style OPT fill:#FF9800,color:#fff
style F1 fill:#2196F3,color:#fff
style F2 fill:#2196F3,color:#fff
style F3 fill:#2196F3,color:#fff
style F4 fill:#2196F3,color:#fff
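The three-phase split can be illustrated with a toy model in plain Python. This is only a sketch under stated assumptions: the two front ends, the tuple-based IR, and the two "targets" with made-up assembly syntax are all hypothetical and are not LLVM API. The point is that both front ends produce the same IR, so one optimizer and every back end can be reused.

```python
# Toy model of the three-phase design. Nothing here is LLVM API.

def frontend_infix(src):
    """Hypothetical front end for source like '3 + 4'."""
    lhs, _, rhs = src.split()
    return ("add", int(lhs), int(rhs))

def frontend_prefix(src):
    """Hypothetical front end for source like '(+ 3 4)'."""
    _, lhs, rhs = src.strip("()").split()
    return ("add", int(lhs), int(rhs))

def optimize(ir):
    """Shared, target-independent step: fold an add of two constants."""
    if ir[0] == "add":
        return ("const", ir[1] + ir[2])
    return ir

def backend_a(ir):
    """Hypothetical target A (made-up assembly syntax)."""
    if ir[0] == "const":
        return [f"mov r0, {ir[1]}", "ret"]
    return [f"mov r0, {ir[1]}", f"add r0, {ir[2]}", "ret"]

def backend_b(ir):
    """Hypothetical target B (made-up assembly syntax)."""
    if ir[0] == "const":
        return [f"li a0, {ir[1]}", "ret"]
    return [f"li a0, {ir[1]}", f"addi a0, a0, {ir[2]}", "ret"]

def compile_program(src, frontend, backend):
    """Front end -> shared optimizer -> back end."""
    return backend(optimize(frontend(src)))

asm_a = compile_program("3 + 4", frontend_infix, backend_a)
asm_b = compile_program("(+ 3 4)", frontend_prefix, backend_b)
```

With two front ends and two back ends, four source-to-target combinations work while only one optimizer had to be written.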
LLVM Intermediate Representation
LLVM IR is a low-level, strongly typed, SSA-based representation with three forms: textual (.ll), bitcode (.bc), and in-memory.
; LLVM IR example
define i32 @add(i32 %a, i32 %b) {
entry:
%result = add i32 %a, %b
ret i32 %result
}
define i32 @main() {
entry:
%x = call i32 @add(i32 3, i32 4)
ret i32 %x
}
Generating LLVM IR from Python
The llvmlite library provides Python bindings for LLVM:
from llvmlite import ir
module = ir.Module(name="my_module")
# Declare the function type: i32 (i32, i32)
func_type = ir.FunctionType(ir.IntType(32), [ir.IntType(32), ir.IntType(32)])
func = ir.Function(module, func_type, name="add")
# Create the entry block
block = func.append_basic_block(name="entry")
builder = ir.IRBuilder(block)
# Get function arguments
a, b = func.args
a.name = "a"
b.name = "b"
# Generate add instruction
result = builder.add(a, b, name="result")
builder.ret(result)
# Create main function
main_type = ir.FunctionType(ir.IntType(32), [])
main_func = ir.Function(module, main_type, name="main")
main_block = main_func.append_basic_block(name="entry")
builder = ir.IRBuilder(main_block)
# Call add(3, 4)
three = ir.Constant(ir.IntType(32), 3)
four = ir.Constant(ir.IntType(32), 4)
call_result = builder.call(func, [three, four], name="call_result")
builder.ret(call_result)
print(str(module))
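Expected output (simplified: llvmlite also prints target triple and data layout lines and wraps identifiers in double quotes, and exact formatting varies by version):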
; ModuleID = "my_module"
define i32 @add(i32 %a, i32 %b) {
entry:
%result = add i32 %a, %b
ret i32 %result
}
define i32 @main() {
entry:
%call_result = call i32 @add(i32 3, i32 4)
ret i32 %call_result
}
Writing LLVM Optimization Passes
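LLVM optimization passes analyze or transform IR, typically to improve performance. Under the legacy pass manager, each pass is a C++ class that derives from FunctionPass or ModulePass and implements runOnFunction or runOnModule. The new pass manager, the default since LLVM 13, instead has passes derive from PassInfoMixin and implement a run method that returns PreservedAnalyses. The following pass uses the legacy interface: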
#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;
namespace {
struct MyPass : public FunctionPass {
  static char ID;
  MyPass() : FunctionPass(ID) {}

  bool runOnFunction(Function &F) override {
    errs() << "Running MyPass on function: " << F.getName() << "\n";
    for (BasicBlock &BB : F) {
      for (Instruction &I : BB) {
        if (auto *AddOp = dyn_cast<BinaryOperator>(&I)) {
          if (AddOp->getOpcode() == Instruction::Add) {
            errs() << "  Found add instruction: " << I << "\n";
          }
        }
      }
    }
    return false; // Did not modify the function
  }
};
} // end anonymous namespace
char MyPass::ID = 0;
static RegisterPass<MyPass> X("my-pass", "My Custom Pass");
JIT Compilation with LLVM
LLVM's JIT (Just-In-Time) engine compiles IR to machine code at runtime:
import ctypes
from llvmlite import binding as llvm
# One-time LLVM setup (some newer llvmlite releases do this automatically and deprecate llvm.initialize())
llvm.initialize()
llvm.initialize_native_target()
llvm.initialize_native_asmprinter()
# Create an execution engine for the host machine
target = llvm.Target.from_default_triple()
target_machine = target.create_target_machine()
backing_mod = llvm.parse_assembly("")
engine = llvm.create_mcjit_compiler(backing_mod, target_machine)
# Parse and verify the IR built in the llvmlite example above (the `module` object)
mod = llvm.parse_assembly(str(module))
mod.verify()
engine.add_module(mod)
engine.finalize_object()
engine.run_static_constructors()
# Look up main, wrap its address in a ctypes function type, and call it
func_ptr = engine.get_function_address("main")
main_fn = ctypes.CFUNCTYPE(ctypes.c_int)(func_ptr)
result = main_fn()
print(f"Result: {result}")
Expected output:
Result: 7
LLVM Pass Pipeline
Compilers using LLVM organize passes into pipelines. The standard -O2 pipeline includes dozens of passes. The list below is a representative subset and does not reflect the real pipeline's ordering:
# Pseudocode for an optimization pipeline
pass_pipeline = [
    "mem2reg",      # Promote memory to SSA registers
    "instcombine",  # Instruction combining
    "reassociate",  # Expression reassociation
    "gvn",          # Global value numbering
    "simplifycfg",  # CFG simplification
    "licm",         # Loop-invariant code motion
    "loop-unroll",  # Loop unrolling
    "inline",       # Function inlining
    "sccp",         # Sparse conditional constant propagation
    "deadargelim",  # Dead argument elimination
]
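To make the idea of a pipeline concrete, here is a minimal sketch of a pass manager in plain Python. It is an illustration under stated assumptions, not LLVM's implementation: the tuple-based IR, the two passes, and `run_pipeline` are all hypothetical. Each pass returns the new IR plus a flag saying whether it changed anything, similar in spirit to the boolean returned by runOnFunction.

```python
# Toy IR: ("const", dest, value), ("add", dest, lhs, rhs), ("ret", src).

def constant_fold(ir):
    """Replace an add of two known constants with a single constant."""
    consts, out, changed = {}, [], False
    for inst in ir:
        if inst[0] == "add" and inst[2] in consts and inst[3] in consts:
            inst = ("const", inst[1], consts[inst[2]] + consts[inst[3]])
            changed = True
        if inst[0] == "const":
            consts[inst[1]] = inst[2]
        out.append(inst)
    return out, changed

def dead_code_elim(ir):
    """Drop instructions whose result is never used."""
    used = {op for inst in ir for op in inst[2:] if isinstance(op, str)}
    used |= {inst[1] for inst in ir if inst[0] == "ret"}
    out = [inst for inst in ir if inst[0] == "ret" or inst[1] in used]
    return out, len(out) != len(ir)

def run_pipeline(ir, passes):
    """Run each (name, pass) pair in order and report what changed."""
    for name, pass_fn in passes:
        ir, changed = pass_fn(ir)
        print(f"{name}: {'changed' if changed else 'no change'}")
    return ir

program = [("const", "t0", 3), ("const", "t1", 4),
           ("add", "t2", "t0", "t1"), ("ret", "t2")]

fold_then_dce = run_pipeline(program, [("constant-fold", constant_fold),
                                       ("dce", dead_code_elim)])
dce_then_fold = run_pipeline(program, [("dce", dead_code_elim),
                                       ("constant-fold", constant_fold)])
```

Running folding before dead code elimination leaves two instructions, while the reverse order leaves four. Pass ordering matters for the same reason in real pipelines.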
Common Errors with LLVM
Error 1: Mismatched Types
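LLVM IR is strongly typed. Integer arithmetic instructions require both operands to have the same type, so adding an i32 to an i64 without an explicit cast produces invalid IR. Match operand types, inserting zext, sext, or trunc where the widths differ.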
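The rule can be modeled in a few lines of plain Python. This is a hedged sketch, not LLVM or llvmlite code: `Value`, `zext`, and `add` are hypothetical stand-ins that mimic only the type check.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Value:
    name: str
    bits: int  # integer width, for example 32 for i32

def zext(value, bits):
    """Model of a zero-extension cast to a wider integer type."""
    if bits <= value.bits:
        raise TypeError("zext must widen the value")
    return Value(f"{value.name}.ext", bits)

def add(lhs, rhs):
    """Model of add: both operands must have the same type."""
    if lhs.bits != rhs.bits:
        raise TypeError(f"add operands differ: i{lhs.bits} vs i{rhs.bits}")
    return Value(f"{lhs.name}.{rhs.name}.sum", lhs.bits)

a = Value("a", 32)
b = Value("b", 64)

try:
    add(a, b)                   # i32 + i64: rejected
    mismatch_rejected = False
except TypeError:
    mismatch_rejected = True

total = add(zext(a, 64), b)     # widen a to i64 first, then add
```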
Error 2: Invalid SSA Form
Every SSA value must be defined exactly once, and that definition must dominate every use. Phi nodes must list one incoming value for each predecessor block. LLVM's verifier catches these errors.
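A simplified check for the first two rules might look like the following. It is a sketch that assumes straight-line code in a single block, where "defined before use" is just textual order. The real verifier reasons about dominance across the whole control-flow graph, and `check_ssa` is a hypothetical helper, not an LLVM function.

```python
def check_ssa(params, instructions):
    """Return SSA violations for straight-line code.

    instructions is a list of (dest, operands) pairs in program order.
    """
    defined = set(params)
    errors = []
    for dest, operands in instructions:
        for operand in operands:
            if operand not in defined:
                errors.append(f"{operand} used before definition")
        if dest in defined:
            errors.append(f"{dest} defined more than once")
        defined.add(dest)
    return errors

good = check_ssa(["a", "b"], [("x", ["a", "b"]), ("y", ["x", "a"])])
bad = check_ssa(["a"], [("x", ["y"]), ("y", ["a"]), ("x", ["a"])])
```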
Error 3: Incorrect Module Structure
Every function definition must contain at least one basic block, and every basic block must end with a terminator instruction such as ret or br. Missing terminators cause verification failures.
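The structural rule can be sketched as a small checker. The representation (a dict from block label to a list of opcode names) and the `check_blocks` helper are hypothetical, and the terminator set is only a subset of LLVM's terminator instructions.

```python
# A subset of LLVM's terminator opcodes, enough for this illustration.
TERMINATORS = {"ret", "br", "switch", "unreachable"}

def check_blocks(function):
    """function maps a block label to its list of opcode names."""
    errors = []
    if not function:
        errors.append("function has no basic blocks")
    for label, opcodes in function.items():
        if not opcodes or opcodes[-1] not in TERMINATORS:
            errors.append(f"block '{label}' does not end with a terminator")
        elif any(op in TERMINATORS for op in opcodes[:-1]):
            errors.append(f"block '{label}' has a terminator before its end")
    return errors

ok = check_blocks({"entry": ["add", "ret"]})
missing = check_blocks({"entry": ["add"], "exit": ["ret"]})
early = check_blocks({"entry": ["ret", "add", "ret"]})
```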
Error 4: Memory Management
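LLVM IR objects follow an ownership hierarchy: a Module owns its functions, a function owns its basic blocks, a basic block owns its instructions, and everything lives inside an LLVMContext. Holding pointers to erased instructions or to objects in a destroyed module causes use-after-free bugs. Hold modules in std::unique_ptr<llvm::Module> and keep the LLVMContext alive for as long as anything created in it is still in use.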
Error 5: Target Triple Mismatch
Generating ARM code for an x86 target produces incorrect binaries. Always set the target triple to match the intended execution platform.
Practice Questions
Question 1
What is LLVM IR?
Answer: LLVM IR is a low-level, strongly typed, SSA-based intermediate representation that serves as the common interface between language front ends and machine code back ends.
Question 2
What is a pass in LLVM?
Answer: A pass is a transformation or analysis that operates on LLVM IR. Analysis passes collect information; transformation passes modify the IR to improve or instrument the code.
Question 3
How does LLVM support multiple target architectures?
Answer: All front ends produce the same LLVM IR. Each target architecture has a back end that converts IR to target-specific machine code. Adding a new target requires only a new back end, which existing front ends can then use.
Question 4
What is MCJIT in LLVM?
Answer: MCJIT (Machine Code Just-In-Time) compiles LLVM IR to machine code at runtime so that the generated code can be executed immediately. It is used in language runtimes and dynamic code generation systems.
Question 5
What is the difference between Clang and LLVM?
Answer: Clang is a C/C++/Objective-C front end that parses source code and generates LLVM IR. LLVM is the optimizer and code generation framework that transforms IR into machine code. Clang uses LLVM as its backend.
Challenge
Build a small compiler in Python using llvmlite that takes a simple arithmetic expression (like 3 + 4 * 5 - 2), parses it, generates LLVM IR, runs optimization passes, and JIT-compiles it to compute the result. Output both the IR text and the execution result.
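As a possible starting point, the sketch below covers only the parse and IR-generation steps. It takes a shortcut by reusing Python's own `ast` module as the parser instead of a hand-written one, supports only integer literals with +, - and *, and emits IR as plain text rather than through llvmlite's builder. The `emit_ir` helper and the `%t0`, `%t1` naming scheme are assumptions made for illustration.

```python
import ast

def emit_ir(expression):
    """Translate an integer expression with +, -, * into LLVM IR text."""
    opcodes = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul"}
    lines = []

    def visit(node):
        # Integer literals become immediate operands.
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return str(node.value)
        # Binary operators emit one instruction after their operands.
        if isinstance(node, ast.BinOp) and type(node.op) in opcodes:
            lhs, rhs = visit(node.left), visit(node.right)
            name = f"%t{len(lines)}"
            lines.append(f"  {name} = {opcodes[type(node.op)]} i32 {lhs}, {rhs}")
            return name
        raise ValueError("unsupported expression")

    result = visit(ast.parse(expression, mode="eval").body)
    body = "\n".join(lines + [f"  ret i32 {result}"])
    return f"define i32 @main() {{\nentry:\n{body}\n}}"

ir_text = emit_ir("3 + 4 * 5 - 2")
print(ir_text)
```

The resulting text can be passed to llvmlite's parse_assembly in the same way as the module in the JIT section. The optimization and execution steps are left as part of the challenge.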