Source Maps and Debug Information in Compilers

DodaTech Updated 2026-06-21 7 min read

This tutorial covers source maps and debug information in compilers: the key concepts, worked examples, and the mistakes to avoid when generating them.

Source maps are data files that map compiled or transpiled code back to its original source, enabling developers to debug minified JavaScript, transpiled TypeScript, or compiled languages in their original form rather than in the generated output.

What You'll Learn & Why It Matters

In this tutorial, you will learn how source maps work, the DWARF debug information format used by native compilers, and how to generate debug information for your own compilers. Debug information is essential for productive development with compiled languages.

Real-world use: Durga Antivirus Pro uses debug information analysis to reconstruct source-level information from stripped binaries during malware analysis, identifying function boundaries and variable names that attackers tried to remove.

Prerequisites

You should understand code generation, covered in the code generation tutorial. Familiarity with JavaScript and TypeScript is helpful for the source map section. Basic knowledge of how debuggers work is assumed.

Why Debug Information Matters

Without debug information, a debugger can show only raw addresses and assembly:

;; Without debug info
0x401000: mov dword ptr [rbp-4], 5
0x401007: mov dword ptr [rbp-8], 3
0x40100e: mov eax, [rbp-4]
0x401011: add eax, [rbp-8]
0x401014: mov [rbp-12], eax

;; With debug info
int main() {
    // 0x401000 line 5: int x = 5;
    // 0x401007 line 6: int y = 3;
    // 0x40100e line 7: int z = x + y;
}

Debug information maps machine addresses to source files, line numbers, and columns, and records the names, types, and locations of variables.

graph TD
    A[Source Code] --> B[Compiler]
    B --> C[Machine Code]
    B --> D[Debug Information]
    D --> E[DWARF or Source Map]
    C --> F[Debugger]
    D --> F
    F --> G[Source-Level Debugging]
    style A fill:#4CAF50,color:#fff
    style C fill:#f44336,color:#fff
    style D fill:#2196F3,color:#fff
    style G fill:#4CAF50,color:#fff

Source Maps for JavaScript/Web

Source maps allow debugging transpiled/minified code in its original form. They are JSON files with a .map extension.

Source Map Format

{
  "version": 3,
  "file": "output.min.js",
  "sources": ["input.ts"],
  "names": ["add", "x", "y", "result"],
  "mappings": "AAAA,IAAIA,GAAG,CAACC,CAAD,EAAMC,CAAN;AAAA;AAAA,QAAYC,MAAM,GAAGF,CAAC,GAAGC,CAAzB;AAAA,SAAeC,MAAf;AAAA;AAAA,CAAX"
}

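The mappings field stores position mappings compactly as Base64 VLQ (Variable-Length Quantity) values. Semicolons separate lines of the generated file, and commas separate segments within a line. Each segment holds one, four, or five fields: generated column, source file index, source line, source column, and optionally a name index. Every field is stored as a difference from the previous segment's value; the generated column restarts at zero on each new line.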
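To make the encoding concrete, here is a minimal decoder sketch. The function names decode_vlq_segment and decode_mappings are illustrative, not part of any library, and the sample string is a shortened piece of the mappings value shown above. It assumes well-formed input and does no error handling.

```python
BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_vlq_segment(segment):
    """Decode one Base64 VLQ segment into a list of signed integers."""
    values = []
    accumulator = 0
    shift = 0
    for char in segment:
        digit = BASE64_CHARS.index(char)
        accumulator |= (digit & 31) << shift  # low 5 bits carry data
        if digit & 32:                        # continuation bit: more groups follow
            shift += 5
            continue
        magnitude = accumulator >> 1          # lowest bit of the value is the sign
        values.append(-magnitude if accumulator & 1 else magnitude)
        accumulator = 0
        shift = 0
    return values


def decode_mappings(mappings):
    """Resolve the relative fields of a mappings string into absolute positions.

    Returns (gen_line, gen_col, source_index, source_line, source_col, name_index)
    tuples; name_index is None for four-field segments.
    """
    decoded = []
    source_index = source_line = source_col = name_index = 0
    for gen_line, line in enumerate(mappings.split(";")):
        gen_col = 0  # the generated column restarts on every output line
        for segment in filter(None, line.split(",")):
            fields = decode_vlq_segment(segment)
            gen_col += fields[0]
            if len(fields) == 1:
                continue  # segment without a source position
            source_index += fields[1]
            source_line += fields[2]
            source_col += fields[3]
            name = None
            if len(fields) == 5:
                name_index += fields[4]
                name = name_index
            decoded.append((gen_line, gen_col, source_index, source_line, source_col, name))
    return decoded


for entry in decode_mappings("AAAA,IAAIA,GAAG;AAAA"):
    print(entry)
```

Each printed tuple is an absolute position, which is the form a debugger needs when it looks up where a generated line and column came from.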

Generating Source Maps

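class SourceMapGenerator:
    def __init__(self, source_file, output_file):
        self.source_file = source_file
        self.output_file = output_file
        self.mappings = []
        self.names = []
        self.name_index = {}

    def add_mapping(self, gen_line, gen_col, source_line, source_col, name=None):
        # All positions are 0-based, as in the source map format
        entry = {
            "generated": (gen_line, gen_col),
            "source": (source_line, source_col)
        }
        if name:
            if name not in self.name_index:
                self.name_index[name] = len(self.names)
                self.names.append(name)
            entry["name"] = self.name_index[name]
        self.mappings.append(entry)

    def encode_vlq(self, value):
        # Base64 VLQ: the sign goes in the lowest bit, then the value is
        # emitted in 5-bit groups, least significant first. Adding 32 to a
        # group marks that another group follows.
        vlq_map = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
        if value < 0:
            value = ((-value) << 1) | 1
        else:
            value <<= 1
        result = []
        while value > 31:
            result.append(vlq_map[32 | (value & 31)])
            value >>= 5
        result.append(vlq_map[value & 31])
        return "".join(result)

    def generate(self):
        # Group mappings by generated line; ";" separates lines in the output
        by_line = {}
        for m in self.mappings:
            by_line.setdefault(m["generated"][0], []).append(m)

        # Every field is a delta. The generated column restarts on each
        # line; source line, source column, and name index carry over.
        prev_src_line = prev_src_col = prev_name = 0
        encoded_lines = []
        for gen_line in range(max(by_line, default=-1) + 1):
            prev_gen_col = 0
            segments = []
            for m in sorted(by_line.get(gen_line, []), key=lambda m: m["generated"][1]):
                gen_col = m["generated"][1]
                src_line, src_col = m["source"]
                fields = [
                    gen_col - prev_gen_col,
                    0,  # source index delta: this generator has a single source
                    src_line - prev_src_line,
                    src_col - prev_src_col,
                ]
                prev_gen_col, prev_src_line, prev_src_col = gen_col, src_line, src_col
                if "name" in m:
                    fields.append(m["name"] - prev_name)
                    prev_name = m["name"]
                # Fields within a segment are concatenated with no separator
                segments.append("".join(self.encode_vlq(f) for f in fields))
            encoded_lines.append(",".join(segments))

        return {
            "version": 3,
            "file": self.output_file,
            "sources": [self.source_file],
            "names": self.names,
            "mappings": ";".join(encoded_lines),
            "sourceRoot": ""
        }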

gen = SourceMapGenerator("input.ts", "output.js")
gen.add_mapping(0, 10, 0, 0, "add")
gen.add_mapping(0, 14, 0, 6, "x")
gen.add_mapping(0, 16, 0, 9, "y")
result = gen.generate()
print(f"Version: {result['version']}")
print(f"Sources: {result['sources']}")
print(f"Names: {result['names']}")

Expected output:

Version: 3
Sources: ['input.ts']
Names: ['add', 'x', 'y']

DWARF Debug Format

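DWARF is the standard debug information format used by native compilers such as GCC, Clang, and rustc, mainly on Unix-like platforms. It stores information as Debugging Information Entries (DIEs) arranged in a tree: a compile unit at the root, with functions, parameters, variables, and types nested beneath it. The class below is a simplified model of that tree, not the binary encoding.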

class DwarfDIE:
    def __init__(self, tag, attributes=None):
        self.tag = tag
        self.attributes = attributes or {}
        self.children = []

    def add_child(self, child):
        self.children.append(child)

    def print(self, indent=0):
        prefix = "  " * indent
        attrs = ", ".join(f"{k}: {v}" for k, v in self.attributes.items())
        print(f"{prefix}DIE {self.tag} ({attrs})")
        for child in self.children:
            child.print(indent + 1)

# Build a DWARF tree for a simple function
compile_unit = DwarfDIE("DW_TAG_compile_unit", {
    "DW_AT_name": "program.c",
    "DW_AT_language": "C99"
})

function = DwarfDIE("DW_TAG_subprogram", {
    "DW_AT_name": "main",
    "DW_AT_low_pc": "0x401000",
    "DW_AT_high_pc": "0x401050"
})

param_x = DwarfDIE("DW_TAG_formal_parameter", {
    "DW_AT_name": "x",
    "DW_AT_type": "int",
    "DW_AT_location": "reg:rdi"
})

function.add_child(param_x)
compile_unit.add_child(function)
compile_unit.print()

Expected output:

DIE DW_TAG_compile_unit (DW_AT_name: program.c, DW_AT_language: C99)
  DIE DW_TAG_subprogram (DW_AT_name: main, DW_AT_low_pc: 0x401000, DW_AT_high_pc: 0x401050)
    DIE DW_TAG_formal_parameter (DW_AT_name: x, DW_AT_type: int, DW_AT_location: reg:rdi)
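A debugger walks this tree to answer questions such as "which function contains this address?". The sketch below shows one way to do that. It assumes a hypothetical Die dataclass with integer addresses, and it treats DW_AT_high_pc as an absolute end address as the example above does (in real DWARF it can also be stored as an offset from DW_AT_low_pc). The helper function is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Die:
    # Hypothetical stand-in for a parsed DIE; real DWARF readers expose richer objects.
    tag: str
    attrs: dict
    children: list = field(default_factory=list)


def find_subprogram(die, pc):
    """Return the name of the DW_TAG_subprogram whose [low_pc, high_pc) range contains pc."""
    if die.tag == "DW_TAG_subprogram":
        if die.attrs["DW_AT_low_pc"] <= pc < die.attrs["DW_AT_high_pc"]:
            return die.attrs["DW_AT_name"]
    for child in die.children:
        found = find_subprogram(child, pc)
        if found is not None:
            return found
    return None


unit = Die("DW_TAG_compile_unit", {"DW_AT_name": "program.c"}, [
    Die("DW_TAG_subprogram",
        {"DW_AT_name": "main", "DW_AT_low_pc": 0x401000, "DW_AT_high_pc": 0x401050}),
    Die("DW_TAG_subprogram",
        {"DW_AT_name": "helper", "DW_AT_low_pc": 0x401050, "DW_AT_high_pc": 0x401080}),
])

print(find_subprogram(unit, 0x401011))  # main
print(find_subprogram(unit, 0x402000))  # None
```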

Emitting Debug Information from a Compiler

class DebugInfoEmitter:
    def __init__(self):
        self.line_number_program = []
        self.variable_table = []

    def emit_line(self, source_line, machine_addr):
        self.line_number_program.append({
            "source_line": source_line,
            "address": machine_addr
        })

    def emit_variable(self, name, type_name, scope, location):
        self.variable_table.append({
            "name": name,
            "type": type_name,
            "scope": scope,
            "location": location
        })

    def finalize(self):
        return {
            "line_table": self.line_number_program,
            "variables": self.variable_table
        }

debug = DebugInfoEmitter()
debug.emit_line(5, "0x401000")  # int x = 5
debug.emit_line(6, "0x401007")  # int y = 3
debug.emit_line(7, "0x40100e")  # int z = x + y
debug.emit_variable("x", "int", "main", "rbp-4")
debug.emit_variable("y", "int", "main", "rbp-8")
info = debug.finalize()
print("Line table:")
for entry in info["line_table"]:
    print(f"  Line {entry['source_line']} -> {entry['address']}")
print("Variables:")
for var in info["variables"]:
    print(f"  {var['name']}: {var['type']} at {var['location']}")

Expected output:

Line table:
  Line 5 -> 0x401000
  Line 6 -> 0x401007
  Line 7 -> 0x40100e
Variables:
  x: int at rbp-4
  y: int at rbp-8
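On the consuming side, a debugger has to turn an arbitrary program counter back into a source line. A minimal sketch of that lookup follows, using the line table above with integer addresses. It assumes each row covers every address up to the next row, and that the function ends at 0x401050, the value used in the DWARF example.

```python
import bisect

# Line table from the example above: (start address, source line), sorted by address.
LINE_TABLE = [(0x401000, 5), (0x401007, 6), (0x40100E, 7)]
FUNCTION_END = 0x401050  # assumed end of main, so later addresses are rejected


def line_for_address(addr):
    """Return the source line whose row covers addr, or None if outside the function."""
    if addr < LINE_TABLE[0][0] or addr >= FUNCTION_END:
        return None
    starts = [start for start, _ in LINE_TABLE]
    # The covering row is the last one whose start address is <= addr.
    row = bisect.bisect_right(starts, addr) - 1
    return LINE_TABLE[row][1]


print(line_for_address(0x401011))  # 7
print(line_for_address(0x401007))  # 6
```

Binary search is used here because real line tables can contain many thousands of rows.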

Common Errors in Debug Information

Error 1: Wrong Line Number Mapping

Off-by-one errors in line number tables cause breakpoints to trigger at incorrect locations. Always verify mappings with a debugger.
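A common source of these errors is mixing 0-based and 1-based line numbers: source map mappings count lines from 0, while DWARF line tables and most editors count from 1. A cheap sanity check can catch some of them before you reach for a debugger. The check_line_table function below is a hypothetical helper, not part of any toolchain.

```python
def check_line_table(line_table, source_line_count):
    """Return a list of problems found in (address, line) rows; empty means none found."""
    problems = []
    previous_addr = -1
    for addr, line in line_table:
        if not 1 <= line <= source_line_count:
            problems.append(f"{addr:#x}: line {line} is outside 1..{source_line_count}")
        if addr <= previous_addr:
            problems.append(f"{addr:#x}: address does not increase")
        previous_addr = addr
    return problems


good = [(0x401000, 1), (0x401007, 2), (0x40100E, 3)]
shifted = [(addr, line - 1) for addr, line in good]  # simulates a 0-based/1-based mix-up

print(check_line_table(good, 3))     # []
print(check_line_table(shifted, 3))  # ['0x401000: line 0 is outside 1..3']
```

A range check like this only catches shifts that push a line outside the file, so it complements stepping through the program in a debugger rather than replacing it.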

Error 2: Missing Variable Locations

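If a variable is kept in a register, or moves between a register and the stack, the debug info must describe where it lives at each point in the code (DWARF uses location lists for this). A variable with no recorded location at the current address appears as "optimized out" in the debugger.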
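The idea can be sketched as a per-variable list of address ranges, each with its own location. The data below is hypothetical and the location strings follow the informal notation used in this tutorial. Real DWARF encodes locations as expressions, not strings.

```python
# Hypothetical location list for one variable: (start_pc, end_pc, location).
# Ranges are half-open; PCs covered by no range have no known location.
X_LOCATIONS = [
    (0x401000, 0x40100E, "rbp-4"),    # lives on the stack at first
    (0x40100E, 0x401014, "reg:eax"),  # then held in a register
]


def locate(location_list, pc):
    """Return where the variable lives at pc, or '<optimized out>' if nowhere."""
    for start, end, location in location_list:
        if start <= pc < end:
            return location
    return "<optimized out>"


print(locate(X_LOCATIONS, 0x401007))  # rbp-4
print(locate(X_LOCATIONS, 0x401011))  # reg:eax
print(locate(X_LOCATIONS, 0x401020))  # <optimized out>
```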

Error 3: Not Handling Inlined Functions

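Inlining copies a function's body into its callers, so the same source lines map to several PC ranges and a single PC can belong to more than one logical function. DWARF describes each inlined copy with a DW_TAG_inlined_subroutine entry, whose DW_AT_call_file and DW_AT_call_line attributes record the call site. A compiler that omits these entries produces misleading stack traces and breakpoints that miss the inlined copies.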
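The sketch below shows how a debugger might rebuild the logical call stack for an address inside inlined code. It uses a flattened, hypothetical record layout; the function names, addresses, and call lines are invented for illustration, and nested inlining is not handled.

```python
# Hypothetical, flattened view of inline records; a real DWARF reader would
# get these from nested DW_TAG_inlined_subroutine DIEs.
FUNCTION = {"name": "main", "low_pc": 0x401000, "high_pc": 0x401050}
INLINED = [
    # callee, PC range of the inlined body, and the line of the call in the caller
    {"name": "square", "low_pc": 0x401010, "high_pc": 0x401020, "call_line": 12},
    {"name": "clamp", "low_pc": 0x401030, "high_pc": 0x401038, "call_line": 14},
]


def frames_at(pc):
    """Return the logical call stack at pc, innermost frame first."""
    if not FUNCTION["low_pc"] <= pc < FUNCTION["high_pc"]:
        return []
    frames = []
    for record in INLINED:
        if record["low_pc"] <= pc < record["high_pc"]:
            frames.append(f"{record['name']} (inlined, called at line {record['call_line']})")
    frames.append(FUNCTION["name"])
    return frames


print(frames_at(0x401014))  # ['square (inlined, called at line 12)', 'main']
print(frames_at(0x401040))  # ['main']
```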

Error 4: Source Map Size Bloat

Source maps for large minified files can be several times larger than the minified code itself. Serve them compressed (for example with gzip), or split the bundle so that each chunk has its own smaller map.
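As a rough illustration of the compression option, the sketch below gzips a synthetic source map with Python's standard library. The mappings string is a repeated placeholder segment, so it compresses far better than a real map would. Treat the sizes as a demonstration of the approach, not a benchmark.

```python
import gzip
import json

# Synthetic source map: the repeated segment only stands in for a large mappings string.
source_map = {
    "version": 3,
    "file": "output.min.js",
    "sources": ["input.ts"],
    "names": [],
    "mappings": ",".join(["IAAI"] * 5000),
}

raw = json.dumps(source_map).encode("utf-8")
compressed = gzip.compress(raw)

print(f"raw: {len(raw)} bytes, gzip: {len(compressed)} bytes")
```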

Error 5: Exposing Source Maps in Production

Publishing source maps in production exposes the original source code. Only serve source maps to authorized users or use them for internal debugging.
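Browsers usually find a source map through a trailing `//# sourceMappingURL=` comment in the generated file. One possible build step, sketched below, removes that comment from production bundles. The function name is hypothetical, and the sketch only handles single-line `//` comments in JavaScript.

```python
import re

# Matches the trailing "//# sourceMappingURL=..." comment (and the older "//@" form).
SOURCE_MAP_COMMENT = re.compile(r"^//[#@]\s*sourceMappingURL=\S+\s*$", re.MULTILINE)


def strip_source_map_comment(js_text):
    """Remove sourceMappingURL comments so browsers are not pointed at the .map file."""
    return SOURCE_MAP_COMMENT.sub("", js_text).rstrip() + "\n"


bundle = "function add(x,y){return x+y}\n//# sourceMappingURL=output.min.js.map\n"
print(strip_source_map_comment(bundle), end="")
```

Removing the comment only stops browsers from discovering the map. The .map file itself must also be kept off the public server or placed behind access control.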

Practice Questions

Question 1

What information does a source map contain?

Answer: A source map contains the original source file names, a mappings field that maps generated positions to source positions using VLQ encoding, and optional symbol names that appear in the original source.

Question 2

What is DWARF?

Answer: DWARF is a standardized debug data format used by native compilers (GCC, Clang) to describe program types, variables, functions, and line number information in a structured, machine-readable format.

Question 3

How does VLQ encoding work in source maps?

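Answer: VLQ (Variable-Length Quantity) encoding represents integers in a compact, variable-length format using Base64 characters. Each character carries five data bits plus a continuation bit, and the sign is stored in the lowest bit of the first character, so small numbers take a single character. Source maps store each field as the difference from the previous mapping, which keeps most values small.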

Question 4

Why might a debugger show a variable as "optimized out"?

Answer: When the compiler optimizes away a variable (no register or memory location holds it at a given point), the debugger cannot determine its value. The debug information marks these variables as unavailable.

Question 5

What is the difference between -g and -g0 in GCC?

Answer: `-g` generates debug information at the default level, which is equivalent to `-g2` (the number is a detail level, not a DWARF version). `-g0` disables all debug information. `-g3` includes extra information such as macro definitions.

Challenge

Build a source map generator for a small transpiler that converts a simple arithmetic expression language to JavaScript. The source map should correctly map each generated JavaScript line back to the original source expression, and the output should work with browser DevTools for debugging.

FAQ

What is the difference between source maps and DWARF?

Source maps are JSON-based and used primarily for web languages (JavaScript, TypeScript). DWARF is a binary format used by native compilers (C, C++, Rust). Source maps focus on line/column mapping; DWARF includes full type information.

Do all compilers support debug information generation?

Most production compilers support debug information. GCC and Clang emit DWARF on Unix-like platforms, and Rust does the same on those targets. TypeScript can generate source maps through its sourceMap compiler option. Many compilers require an explicit flag (such as -g for GCC and Clang) to enable debug output.

How do debuggers use debug information?

Debuggers parse DWARF or source map data to map program addresses to source locations, display variable values using location expressions, and navigate call stacks using frame information. Without debug info, debugging is limited to assembly.

Can debug information be stripped from binaries?

Yes. The strip command removes debug information and symbols to reduce binary size. Debug info can account for a large share of a binary's size, on the order of 50-70% in some builds. Stripped binaries still run correctly but cannot be debugged at the source level.
