Source Maps and Debug Information in Compilers
This tutorial covers source maps and debug information in compilers: the key concepts, practical examples, and best practices you need to apply them.
Source maps are data files that map compiled or transpiled code back to its original source, enabling developers to debug minified JavaScript, transpiled TypeScript, or compiled languages in their original form rather than in the generated output.
What You'll Learn & Why It Matters
In this tutorial, you will learn how source maps work, the DWARF debug information format used by native compilers, and how to generate debug information for your own compilers. Debug information is essential for productive development with compiled languages.
Real-world use: Durga Antivirus Pro uses debug information analysis to reconstruct source-level information from stripped binaries during malware analysis, identifying function boundaries and variable names that attackers tried to remove.
Prerequisites
You should understand code generation, which the code generation tutorial covers. Familiarity with JavaScript and TypeScript helps with the source map section, and basic knowledge of how debuggers work is assumed.
Why Debug Information Matters
Without debug information, debugging would show only raw assembly:
;; Without debug info
0x401000: mov dword [rbp-4], 5
0x401007: mov dword [rbp-8], 3
0x40100e: mov eax, [rbp-4]
0x401011: add eax, [rbp-8]
0x401014: mov [rbp-12], eax
;; With debug info
int main() {
    // 0x401000 line 5: int x = 5;
    // 0x401007 line 6: int y = 3;
    // 0x40100e line 7: int z = x + y;
}
Debug information maps machine addresses to source file, line number, column, variable names, and types.
graph TD
A[Source Code] --> B[Compiler]
B --> C[Machine Code]
B --> D[Debug Information]
D --> E[DWARF or Source Map]
C --> F[Debugger]
D --> F
F --> G[Source-Level Debugging]
style A fill:#4CAF50,color:#fff
style C fill:#f44336,color:#fff
style D fill:#2196F3,color:#fff
style G fill:#4CAF50,color:#fff
Source Maps for JavaScript/Web
Source maps allow debugging transpiled/minified code in its original form. They are JSON files with a .map extension.
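Browsers find the map through a `//# sourceMappingURL=` comment at the end of the generated file. The sketch below shows one way to write both files together; the helper name `write_with_source_map` and the demo file names are assumptions made for illustration, not part of any tool.

```python
import json
import tempfile
from pathlib import Path

def write_with_source_map(out_dir, js_name, js_code, source_map):
    """Write generated JS plus its .map file and link them with a comment."""
    out_dir = Path(out_dir)
    map_name = js_name + ".map"
    (out_dir / map_name).write_text(json.dumps(source_map), encoding="utf-8")
    # The trailing comment tells DevTools where to fetch the map.
    linked = js_code.rstrip("\n") + "\n//# sourceMappingURL=" + map_name + "\n"
    (out_dir / js_name).write_text(linked, encoding="utf-8")
    return linked

with tempfile.TemporaryDirectory() as tmp:
    demo_map = {"version": 3, "file": "output.min.js",
                "sources": ["input.ts"], "names": [], "mappings": ""}
    linked = write_with_source_map(tmp, "output.min.js", "console.log(1);", demo_map)
    print(linked.splitlines()[-1])  # //# sourceMappingURL=output.min.js.map
```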
Source Map Format
{
  "version": 3,
  "file": "output.min.js",
  "sources": ["input.ts"],
  "names": ["add", "x", "y", "result"],
  "mappings": "AAAA,IAAIA,GAAG,CAACC,CAAD,EAAMC,CAAN;AAAA;AAAA,QAAYC,MAAM,GAAGF,CAAC,GAAGC,CAAzB;AAAA,SAAeC,MAAf;AAAA;AAAA,CAAX"
}
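The mappings field stores line and column mappings compactly as Base64 VLQ (Variable-Length Quantity) values. Semicolons separate lines of the generated file, and commas separate segments within a line. Each segment holds one, four, or five fields (generated column, source file index, source line, source column, and optionally a name index). Every field is stored as a difference from the previous segment's value, and the generated column starts again from zero on each new line.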
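To make the encoding concrete, here is a minimal sketch of a decoder for a single segment. The function name `decode_vlq_segment` is hypothetical; the two sample segments are the first two from the mappings string above.

```python
VLQ_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def decode_vlq_segment(segment):
    """Decode one mappings segment into a list of signed integers."""
    values = []
    shift = 0
    accum = 0
    for ch in segment:
        digit = VLQ_CHARS.index(ch)
        accum |= (digit & 31) << shift  # the low 5 bits carry data
        if digit & 32:                  # continuation bit: more digits follow
            shift += 5
            continue
        # The lowest bit of the finished value is the sign.
        magnitude = accum >> 1
        values.append(-magnitude if accum & 1 else magnitude)
        shift = 0
        accum = 0
    return values

print(decode_vlq_segment("AAAA"))   # [0, 0, 0, 0]
print(decode_vlq_segment("IAAIA"))  # [4, 0, 0, 4, 0]
```

The decoded numbers are still deltas. A consumer adds each one to the running value from the previous segment to recover absolute positions.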
Generating Source Maps
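class SourceMapGenerator:
    VLQ_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

    def __init__(self, source_file, output_file):
        self.source_file = source_file
        self.output_file = output_file
        self.mappings = []
        self.names = []
        self.name_index = {}

    def add_mapping(self, gen_line, gen_col, source_line, source_col, name=None):
        # Lines and columns are zero-based, as in the source map format.
        entry = {
            "generated": (gen_line, gen_col),
            "source": (source_line, source_col)
        }
        if name:
            if name not in self.name_index:
                self.name_index[name] = len(self.names)
                self.names.append(name)
            entry["name"] = self.name_index[name]
        self.mappings.append(entry)

    def encode_vlq(self, value):
        # Base64 VLQ: the sign moves into the lowest bit, then the value is
        # written 5 bits at a time, lowest group first.
        if value < 0:
            value = ((-value) << 1) | 1
        else:
            value <<= 1
        result = []
        while value > 31:
            # Bit 32 is the continuation flag: more digits follow.
            result.append(self.VLQ_CHARS[32 | (value & 31)])
            value >>= 5
        result.append(self.VLQ_CHARS[value & 31])
        return "".join(result)

    def generate(self):
        lines = []  # one list of encoded segments per generated line
        prev_gen_col = prev_src_line = prev_src_col = prev_name = 0
        for m in sorted(self.mappings, key=lambda m: m["generated"]):
            gen_line, gen_col = m["generated"]
            while len(lines) <= gen_line:
                lines.append([])
                prev_gen_col = 0  # the generated column restarts on every line
            src_line, src_col = m["source"]
            # Fields are deltas from the previous segment. The source index
            # delta is always 0 because there is only one source file.
            fields = [gen_col - prev_gen_col, 0,
                      src_line - prev_src_line, src_col - prev_src_col]
            prev_gen_col, prev_src_line, prev_src_col = gen_col, src_line, src_col
            if "name" in m:
                fields.append(m["name"] - prev_name)
                prev_name = m["name"]
            # A segment is its fields concatenated with no separator.
            lines[gen_line].append("".join(self.encode_vlq(f) for f in fields))
        return {
            "version": 3,
            "file": self.output_file,
            "sources": [self.source_file],
            "names": self.names,
            # Commas separate segments on a line; semicolons separate lines.
            "mappings": ";".join(",".join(segments) for segments in lines),
            "sourceRoot": ""
        }

gen = SourceMapGenerator("input.ts", "output.js")
gen.add_mapping(0, 10, 0, 0, "add")
gen.add_mapping(0, 14, 0, 6, "x")
gen.add_mapping(0, 16, 0, 9, "y")
result = gen.generate()
print(f"Version: {result['version']}")
print(f"Sources: {result['sources']}")
print(f"Names: {result['names']}")
print(f"Mappings: {result['mappings']}")
Expected output:
Version: 3
Sources: ['input.ts']
Names: ['add', 'x', 'y']
Mappings: UAAAA,IAAMC,EAAGC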
DWARF Debug Format
DWARF is the standard debug information format on most Unix-like platforms, emitted by native compilers such as GCC, Clang, and rustc. It stores information as Debugging Information Entries (DIEs) in a tree structure.
class DwarfDIE:
    def __init__(self, tag, attributes=None):
        self.tag = tag
        self.attributes = attributes or {}
        self.children = []

    def add_child(self, child):
        self.children.append(child)

    def print(self, indent=0):
        prefix = "  " * indent
        attrs = ", ".join(f"{k}: {v}" for k, v in self.attributes.items())
        print(f"{prefix}DIE {self.tag} ({attrs})")
        for child in self.children:
            child.print(indent + 1)

# Build a DWARF tree for a simple function.
# This is a simplified model: real DWARF stores DW_AT_language as a constant
# (DW_LANG_C99), DW_AT_type as a reference to a type DIE, and DW_AT_location
# as a location expression (DW_OP_reg5 for rdi on x86-64).
compile_unit = DwarfDIE("DW_TAG_compile_unit", {
    "DW_AT_name": "program.c",
    "DW_AT_language": "C99"
})
function = DwarfDIE("DW_TAG_subprogram", {
    "DW_AT_name": "main",
    "DW_AT_low_pc": "0x401000",
    "DW_AT_high_pc": "0x401050"
})
param_x = DwarfDIE("DW_TAG_formal_parameter", {
    "DW_AT_name": "x",
    "DW_AT_type": "int",
    "DW_AT_location": "reg:rdi"
})
function.add_child(param_x)
compile_unit.add_child(function)
compile_unit.print()
Expected output:
DIE DW_TAG_compile_unit (DW_AT_name: program.c, DW_AT_language: C99)
  DIE DW_TAG_subprogram (DW_AT_name: main, DW_AT_low_pc: 0x401000, DW_AT_high_pc: 0x401050)
    DIE DW_TAG_formal_parameter (DW_AT_name: x, DW_AT_type: int, DW_AT_location: reg:rdi)
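A debugger walks this tree to answer questions such as "which function contains this address?". The sketch below is one possible way to do that over a simplified tree of plain dictionaries. It assumes `high_pc` is an exclusive end address (real DWARF may also store it as an offset from `low_pc`), and the `helper` function is invented for the example.

```python
def find_subprogram(die, pc):
    """Return the name of the subprogram DIE whose PC range covers pc."""
    match = None
    if die["tag"] == "DW_TAG_subprogram":
        if die["low_pc"] <= pc < die["high_pc"]:
            match = die["name"]
    # Descend so that nested subprograms, if any, take precedence.
    for child in die.get("children", []):
        found = find_subprogram(child, pc)
        if found is not None:
            match = found
    return match

unit = {
    "tag": "DW_TAG_compile_unit", "name": "program.c",
    "children": [
        {"tag": "DW_TAG_subprogram", "name": "main",
         "low_pc": 0x401000, "high_pc": 0x401050},
        {"tag": "DW_TAG_subprogram", "name": "helper",  # hypothetical function
         "low_pc": 0x401050, "high_pc": 0x401080},
    ],
}
print(find_subprogram(unit, 0x401007))  # main
print(find_subprogram(unit, 0x401050))  # helper
print(find_subprogram(unit, 0x500000))  # None
```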
Emitting Debug Information from a Compiler
class DebugInfoEmitter:
    def __init__(self):
        self.line_number_program = []
        self.variable_table = []

    def emit_line(self, source_line, machine_addr):
        self.line_number_program.append({
            "source_line": source_line,
            "address": machine_addr
        })

    def emit_variable(self, name, type_name, scope, location):
        self.variable_table.append({
            "name": name,
            "type": type_name,
            "scope": scope,
            "location": location
        })

    def finalize(self):
        return {
            "line_table": self.line_number_program,
            "variables": self.variable_table
        }

debug = DebugInfoEmitter()
debug.emit_line(5, "0x401000")  # int x = 5
debug.emit_line(6, "0x401007")  # int y = 3
debug.emit_line(7, "0x40100e")  # int z = x + y
debug.emit_variable("x", "int", "main", "rbp-4")
debug.emit_variable("y", "int", "main", "rbp-8")
info = debug.finalize()
print("Line table:")
for entry in info["line_table"]:
    print(f"  Line {entry['source_line']} -> {entry['address']}")
print("Variables:")
for var in info["variables"]:
    print(f"  {var['name']}: {var['type']} at {var['location']}")
Expected output:
Line table:
  Line 5 -> 0x401000
  Line 6 -> 0x401007
  Line 7 -> 0x40100e
Variables:
  x: int at rbp-4
  y: int at rbp-8
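A debugger uses the line table in the other direction, from address to line. Each row covers every address from its own up to the next row's, so a binary search for the last row at or below the program counter is enough. The sketch below assumes integer addresses sorted in ascending order and omits the end-of-sequence rows that a real DWARF line table contains; `line_for_address` is an illustrative name.

```python
import bisect

# (address, source line) rows, sorted by address
line_table = [(0x401000, 5), (0x401007, 6), (0x40100E, 7)]
addresses = [addr for addr, _ in line_table]

def line_for_address(pc):
    """Return the source line for pc, or None if pc precedes the table."""
    i = bisect.bisect_right(addresses, pc) - 1
    return line_table[i][1] if i >= 0 else None

print(line_for_address(0x401011))  # 7
print(line_for_address(0x401003))  # 5
print(line_for_address(0x400FFF))  # None
```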
Common Errors in Debug Information
Error 1: Wrong Line Number Mapping
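Off-by-one errors in line number tables cause breakpoints to trigger at incorrect locations. A common cause is mixing conventions: DWARF line numbers are 1-based, while source map lines and columns are 0-based. Always verify mappings with a debugger.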
Error 2: Missing Variable Locations
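If the optimizer keeps a variable in a register, the debug info must name that register, and if the variable moves between locations, it needs a location list that covers each PC range. A variable with no location at the current PC shows up as "optimized out."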
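As a rough sketch of that idea, a variable's location can be modeled as a list of PC ranges, with "optimized out" as the fallback when no range matches. The ranges and the `location_at` helper below are made up for illustration and do not reflect the DWARF binary encoding.

```python
def location_at(location_list, pc):
    """Return where a variable lives at pc, or a marker if nothing covers pc."""
    for low, high, where in location_list:
        if low <= pc < high:  # high is exclusive
            return where
    return "<optimized out>"

# Hypothetical location list for x: on the stack first, then in a register.
x_locations = [
    (0x401000, 0x40100E, "rbp-4"),
    (0x40100E, 0x401014, "reg:eax"),
]
print(location_at(x_locations, 0x401007))  # rbp-4
print(location_at(x_locations, 0x401011))  # reg:eax
print(location_at(x_locations, 0x401020))  # <optimized out>
```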
Error 3: Not Handling Inlined Functions
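Inlining copies a function's body into each call site, so several PC ranges map back to the same source lines. DWARF describes each inlined copy with a DW_TAG_inlined_subroutine entry whose DW_AT_call_file and DW_AT_call_line attributes record the call site.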
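With that information a debugger can rebuild the logical call stack at an address inside inlined code. The following is a simplified sketch over dictionaries: in real DWARF the inlined entry gets its name through DW_AT_abstract_origin, and the `add` function and its address range here are assumptions.

```python
def inline_frames(die, pc):
    """Return descriptions of the functions active at pc, outermost first."""
    if not (die["low_pc"] <= pc < die["high_pc"]):
        return []
    if die["tag"] == "DW_TAG_inlined_subroutine":
        label = f'{die["name"]} (inlined at {die["call_file"]}:{die["call_line"]})'
    else:
        label = die["name"]
    frames = [label]
    for child in die.get("children", []):
        frames.extend(inline_frames(child, pc))
    return frames

main_die = {
    "tag": "DW_TAG_subprogram", "name": "main",
    "low_pc": 0x401000, "high_pc": 0x401050,
    "children": [{
        "tag": "DW_TAG_inlined_subroutine", "name": "add",  # hypothetical callee
        "low_pc": 0x40100E, "high_pc": 0x401014,
        "call_file": "program.c", "call_line": 7,
        "children": [],
    }],
}
print(inline_frames(main_die, 0x401011))  # ['main', 'add (inlined at program.c:7)']
print(inline_frames(main_die, 0x401000))  # ['main']
```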
Error 4: Source Map Size Bloat
Source maps for large minified files can be several times larger than the generated code, especially when they embed the original sources in sourcesContent. Split the map per bundle or serve it compressed.
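Because the mappings string is highly repetitive, general-purpose compression usually helps. The sketch below compresses a synthetic map with Python's standard `gzip` module; the repeated mappings data is artificial, so the ratio it prints is more favorable than a real map would give.

```python
import gzip
import json

# Synthetic source map: 500 generated lines with identical segments.
source_map = {
    "version": 3,
    "file": "output.min.js",
    "sources": ["input.ts"],
    "names": ["add", "x", "y", "result"],
    "mappings": ";".join(["AAAA,IAAIA,GAAG,CAACC"] * 500),
}
raw = json.dumps(source_map).encode("utf-8")
packed = gzip.compress(raw)
print(f"raw: {len(raw)} bytes, gzip: {len(packed)} bytes")

# Decompressing restores the map unchanged.
restored = json.loads(gzip.decompress(packed))
```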
Error 5: Exposing Source Maps in Production
Publishing source maps in production exposes the original source code. Only serve source maps to authorized users or use them for internal debugging.
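One possible build step, sketched here under the assumption that maps are uploaded separately to an internal server, is to strip the `sourceMappingURL` comment from public JavaScript bundles. The function name is illustrative, and the pattern handles only `//` comments (including the legacy `//@` form), not the `/*# ... */` form used in CSS.

```python
import re

# Matches a whole "//# sourceMappingURL=..." line, including its newline.
SOURCE_MAP_COMMENT = re.compile(
    r"^[ \t]*//[#@][ \t]*sourceMappingURL=\S+[ \t]*\n?", re.MULTILINE
)

def strip_source_map_comment(js_code):
    """Remove sourceMappingURL comments so browsers do not fetch the map."""
    return SOURCE_MAP_COMMENT.sub("", js_code)

bundle = "console.log(1);\n//# sourceMappingURL=output.min.js.map\n"
print(repr(strip_source_map_comment(bundle)))  # 'console.log(1);\n'
```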
Practice Questions
Question 1
What information does a source map contain?
Answer: A source map contains the original source file names, a mappings field that maps generated positions to source positions using VLQ encoding, and optional symbol names that appear in the original source.
Question 2
What is DWARF?
Answer: DWARF is a standardized debug data format used by native compilers (GCC, Clang) to describe program types, variables, functions, and line number information in a structured, machine-readable format.
Question 3
How does VLQ encoding work in source maps?
Answer: VLQ (Variable-Length Quantity) encoding represents integers in a compact, variable-length format using base64 characters. Each character carries five bits of data plus a continuation bit, so smaller numbers use fewer characters. Source maps store each field as a difference from the previous mapping, which keeps most values small.
Question 4
Why might a debugger show a variable as "optimized out"?
Answer: When the compiler optimizes away a variable (no register or memory location holds it at a given point), the debugger cannot determine its value. The debug information marks these variables as unavailable.
Question 5
What is the difference between -g and -g0 in GCC?
Answer: `-g` generates debug information at the default level (level 2, the same as `-g2`) in the platform's native format, which is usually DWARF. `-g0` disables all debug information. `-g3` includes extra information like macro definitions.
Challenge
Build a source map generator for a small transpiler that converts a simple arithmetic expression language to JavaScript. The source map should correctly map each generated JavaScript line back to the original source expression, and the output should work with browser DevTools for debugging.
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro