Build a Static Site Generator in Python (Step by Step)

DodaTech Updated 2026-06-21 8 min read

Build a static site generator in Python that turns Markdown files into a complete HTML website with templating, auto-generated navigation, and an RSS feed.

What You'll Build

You'll write a Python tool that reads Markdown files from a content/ folder, renders them through Jinja2 HTML templates, generates a navigation sidebar based on the folder structure, and produces an RSS 2.0 feed. Think of it as a miniature version of Hugo or Jekyll — but in under 300 lines of Python.

Why Build Your Own SSG?

Many developers use static site generators daily without understanding what happens under the hood. Building one yourself demystifies the pipeline: reading source files, parsing frontmatter, rendering templates, and writing output files. This same pipeline shows up in documentation generators, email template systems, and automated report builders. At DodaTech, a similar Markdown-to-HTML pipeline powers the DodaZIP documentation output.

Prerequisites

Step 1: Project Setup

mkdir ssg && cd ssg
python -m venv venv
source venv/bin/activate
pip install pyyaml jinja2 markdown

Create this structure:

ssg/
├── ssg.py              # Main generator script
├── content/            # Markdown source files
├── templates/          # Jinja2 HTML templates
└── output/             # Generated static site

Step 2: The Markdown Parser and Frontmatter Handler

Every content file starts with YAML frontmatter between --- delimiters, followed by Markdown body text. We'll split them apart:

# ssg.py
import os
import yaml
import markdown as md_lib
from jinja2 import Environment, FileSystemLoader
from pathlib import Path
from datetime import datetime
import xml.etree.ElementTree as ET
from xml.dom import minidom

CONTENT_DIR = "content"
TEMPLATE_DIR = "templates"
OUTPUT_DIR = "output"

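def parse_file(filepath):
    """Return (metadata dict, Markdown body) for one content file."""
    with open(filepath, encoding="utf-8") as f:
        content = f.read()
    # No frontmatter: treat the whole file as body text.
    if not content.startswith("---"):
        return {}, content
    # Split on the first two --- markers only, so a later --- in the body is kept.
    parts = content.split("---", 2)
    if len(parts) < 3:
        # Opening marker without a closing one: nothing to parse.
        return {}, content
    _, frontmatter, body = parts
    meta = yaml.safe_load(frontmatter) or {}
    return meta, body.strip()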

The function reads a .md file, splits off the YAML block between the first two --- markers, parses it into a dictionary, and returns the metadata alongside the remaining Markdown body.
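
To see the split in isolation, here is a minimal sketch that works on a string instead of a file and leaves the YAML unparsed, so it runs with the standard library alone. The helper name split_frontmatter is hypothetical and not part of ssg.py; it assumes the same --- convention described above.

```python
def split_frontmatter(text):
    """Return (raw_frontmatter, body). Hypothetical helper for illustration."""
    if not text.startswith("---"):
        return "", text
    # maxsplit=2 stops after the closing marker, so a later "---"
    # (a Markdown horizontal rule) stays inside the body.
    parts = text.split("---", 2)
    if len(parts) < 3:
        # Opening marker with no closing marker: treat it all as body.
        return "", text
    _, frontmatter, body = parts
    return frontmatter.strip(), body.strip()


sample = """---
title: Hello
weight: 1
---

# Heading

Intro text.

---

Text after a horizontal rule.
"""

raw_meta, body = split_frontmatter(sample)
print(raw_meta)
print(body.splitlines()[0])
```

The raw frontmatter string is what would be handed to yaml.safe_load() in the real parser.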

Step 3: The Template Renderer and Content Converter

We'll load Jinja2 templates from the templates/ folder and render each page through a base layout:

env = Environment(loader=FileSystemLoader(TEMPLATE_DIR))
env.globals["now"] = datetime.utcnow

def render_page(meta, body, nav_tree):
    html_body = md_lib.markdown(
        body,
        extensions=["fenced_code", "codehilite", "toc"]
    )
    meta.setdefault("title", "Untitled")
    meta.setdefault("date", datetime.today().strftime("%Y-%m-%d"))
    return env.get_template("page.html").render(
        title=meta["title"],
        content=html_body,
        date=meta["date"],
        nav=nav_tree,
    )

Key points:

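  • We extend Markdown with fenced_code (triple-backtick code blocks), codehilite (syntax highlighting, which needs the Pygments package installed), and toc (heading anchors and table-of-contents support)
  • env.globals["now"] exposes datetime.utcnow to templates, so {{ now() }} returns the current UTC time (utcnow is deprecated as of Python 3.12; datetime.now(timezone.utc) is the replacement)
  • The navigation tree is built separately and injected into every page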

Step 4: Navigation Tree Builder

We'll scan the content/ directory and build a flat list of page entries (title, URL, and weight), sorted by weight:

def build_nav():
    tree = []
    for root, dirs, files in os.walk(CONTENT_DIR):
        for fname in sorted(files):
            if not fname.endswith(".md"):
                continue
            full_path = os.path.join(root, fname)
            meta, _ = parse_file(full_path)
            rel_path = os.path.relpath(full_path, CONTENT_DIR)
            slug = rel_path.replace(".md", ".html")
            tree.append({
                "title": meta.get("title", fname),
                "url": slug,
                "weight": meta.get("weight", 99),
            })
    tree.sort(key=lambda p: p["weight"])
    return tree

Each page gets a weight from its frontmatter for ordering. Pages without a weight default to 99 and sort last.
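
build_nav() returns a flat list. If you want the sidebar to mirror the folder structure, one option is to group those entries by their top-level folder before passing them to the template. The sketch below assumes entries shaped like the ones build_nav() produces, with forward slashes in the URL; group_by_section is a hypothetical helper, and using an empty string as the key for root-level pages is an arbitrary choice.

```python
def group_by_section(entries):
    """Group nav entries by the first folder in their URL (hypothetical helper)."""
    sections = {}
    for entry in entries:
        url = entry["url"]
        # "blog/first-post.html" -> "blog"; "index.html" -> "" (site root).
        top = url.split("/")[0] if "/" in url else ""
        sections.setdefault(top, []).append(entry)
    return sections


nav = [
    {"title": "Welcome to My Site", "url": "index.html", "weight": 1},
    {"title": "First Blog Post", "url": "blog/first-post.html", "weight": 2},
    {"title": "About", "url": "about.html", "weight": 99},
]

sections = group_by_section(nav)
for name, items in sections.items():
    print(name or "(root)", [item["title"] for item in items])
```

Because the input is already sorted by weight, each section keeps that order. A template could then loop over sections and render one nested list per folder.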

Step 5: The RSS Feed Generator

def build_rss(pages, site_title="My Site", site_url="https://example.com"):
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "Generated by SSG"

    for page in pages[:10]:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = page["title"]
        ET.SubElement(item, "link").text = f"{site_url}/{page['url']}"
        ET.SubElement(item, "guid").text = f"{site_url}/{page['url']}"

    rough_string = ET.tostring(rss, encoding="unicode")
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="  ")

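This generates a valid RSS 2.0 feed containing up to 10 pages, in the order they appear in the pages list. They are the 10 most recent pages only if you sort that list by date before calling build_rss(). Search engines and RSS readers can consume the feed to discover new content automatically.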
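
build_rss() takes pages in whatever order it is given and writes no pubDate element. As a rough sketch of how both could be added, the helpers below sort page dictionaries newest-first and format a date in the RFC 822 style that RSS 2.0 uses. The names to_date, newest_first, and rss_pub_date are hypothetical; the code assumes each date is a datetime.date (which is what PyYAML produces for an unquoted 2026-06-21), a YYYY-MM-DD string, or missing.

```python
from datetime import date, datetime, timezone
from email.utils import format_datetime


def to_date(value):
    """Normalize a frontmatter date to datetime.date, or None if unusable."""
    if isinstance(value, datetime):  # check datetime first: it subclasses date
        return value.date()
    if isinstance(value, date):
        return value
    try:
        return date.fromisoformat(str(value))
    except ValueError:
        return None


def newest_first(pages, limit=10):
    """Return up to `limit` dated pages, most recent first. Undated pages are skipped."""
    dated = [(to_date(p.get("date")), p) for p in pages]
    dated = [(d, p) for d, p in dated if d is not None]
    dated.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in dated[:limit]]


def rss_pub_date(value):
    """Format a valid date as an RFC 822 timestamp at midnight UTC."""
    d = to_date(value)
    if d is None:
        raise ValueError(f"not a usable date: {value!r}")
    midnight = datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
    return format_datetime(midnight, usegmt=True)


pages = [
    {"title": "Old", "url": "old.html", "date": "2025-01-05"},
    {"title": "Undated", "url": "undated.html", "date": ""},
    {"title": "New", "url": "new.html", "date": date(2026, 6, 21)},
]

recent = newest_first(pages)
print([p["title"] for p in recent])
print(rss_pub_date(recent[0]["date"]))
```

Inside build_rss(), the formatted string would go into an extra ET.SubElement(item, "pubDate"). Skipping undated pages is one possible policy; sorting them last is another.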

Step 6: The Main Build Pipeline

Now we wire everything together. The build() function reads every Markdown file, renders it, writes the HTML output, and generates the RSS feed:

def build():
    pages = []
    # Build the navigation once; it is the same for every page.
    nav = build_nav()
    for root, dirs, files in os.walk(CONTENT_DIR):
        for fname in sorted(files):
            if not fname.endswith(".md"):
                continue
            full_path = os.path.join(root, fname)
            meta, body = parse_file(full_path)
            rel_path = os.path.relpath(full_path, CONTENT_DIR)
            out_path = os.path.join(OUTPUT_DIR, rel_path.replace(".md", ".html"))
            os.makedirs(os.path.dirname(out_path), exist_ok=True)

            # render_page() also fills in a default title and date on meta.
            html = render_page(meta, body, nav)
            with open(out_path, "w", encoding="utf-8") as f:
                f.write(html)
            print(f"Built: {out_path}")

            pages.append({
                "title": meta.get("title", fname),
                "url": rel_path.replace(".md", ".html"),
                "date": meta.get("date", ""),
            })

    rss_xml = build_rss(pages)
    with open(os.path.join(OUTPUT_DIR, "feed.xml"), "w", encoding="utf-8") as f:
        f.write(rss_xml)
    print("RSS feed written to output/feed.xml")

if __name__ == "__main__":
    build()
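
One edge case in the path handling: str.replace(".md", ".html") rewrites every occurrence of .md in the path, not just the file extension. A pathlib-based variant, sketched here with a hypothetical helper named output_path_for, swaps only the final suffix. It uses PurePosixPath so the printed results look the same on every operating system; inside ssg.py you would more likely use Path.

```python
from pathlib import PurePosixPath


def output_path_for(rel_path, output_dir="output"):
    """Map a content-relative Markdown path to its HTML path under output_dir."""
    return PurePosixPath(output_dir) / PurePosixPath(rel_path).with_suffix(".html")


# A normal page behaves the same either way.
print(output_path_for("blog/first-post.md"))

# A folder name that happens to contain ".md" shows the difference.
tricky = "notes.mdx-drafts/todo.md"
print(output_path_for(tricky))           # only the extension changes
print(tricky.replace(".md", ".html"))    # the folder name is mangled too
```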

Step 7: Templates

Create templates/base.html — the layout that wraps every page:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>{{ title }}</title>
    <link rel="alternate" type="application/rss+xml" title="RSS" href="/feed.xml">
</head>
<body>
    <nav>
        <ul>
        {% for item in nav %}
            <li><a href="{{ item.url }}">{{ item.title }}</a></li>
        {% endfor %}
        </ul>
    </nav>
    <main>
        {% block content %}{% endblock %}
    </main>
</body>
</html>

Create templates/page.html which extends the base:

{% extends "base.html" %}
{% block content %}
<article>
    <h1>{{ title }}</h1>
    <p class="date">{{ date }}</p>
    <div>{{ content|safe }}</div>
</article>
{% endblock %}

Step 8: Write a Content File and Build

Create content/index.md:

---
title: Welcome to My Site
weight: 1
---

# Hello

This is a page generated by our static site generator.

And content/blog/first-post.md:

---
title: First Blog Post
weight: 2
date: 2026-06-21
---

This is my first blog post written in Markdown.

Run the build:

python ssg.py

Expected output:

Built: output/index.html
Built: output/blog/first-post.html
RSS feed written to output/feed.xml

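Open output/index.html in a browser and you'll see the rendered page with a navigation list that links to the blog post. The nav URLs are relative to the site root, so links on pages inside subfolders such as blog/ will not resolve until you make them root-relative and serve output/ from a web server.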

Architecture

flowchart LR
    A[Markdown Files] --> B[Frontmatter Parser]
    B --> C[YAML Metadata]
    B --> D[Markdown Body]
    D --> E[Markdown -> HTML]
    C --> F[Jinja2 Template Engine]
    E --> F
    F --> G[HTML Pages]
    G --> H[output/]
    C --> I[Nav Tree Builder]
    I --> F
    B --> J[RSS Generator]
    J --> K[feed.xml]

Common Errors

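1. YAML parsing errors from invalid frontmatter. If your frontmatter has a colon in the wrong place or broken indentation, yaml.safe_load() raises an error such as ScannerError. A missing space after a key's colon (title:Hello) can be subtler: on its own, YAML reads it as a plain string, so meta is not a dictionary and later calls like meta.get() fail. Validate your frontmatter with a YAML linter before building.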

2. Templates not found (FileSystemLoader path). FileSystemLoader resolves TEMPLATE_DIR relative to the current working directory, so running ssg.py from another folder, or mistyping the folder name, raises a TemplateNotFound error. Run the script from the project root, or build an absolute path with os.path.abspath() while debugging.

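3. Markdown content appears as raw HTML. If you see <h1> tags rendered literally in the browser, autoescaping is on and the template is not using {{ content|safe }}. A bare Jinja2 Environment like the one in ssg.py does not autoescape, but once you enable autoescape (as Flask does for HTML templates) you must use the safe filter or wrap the content in Markup().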

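4. RSS feed is empty or malformed. The feed is built from the pages list that build() collects, so it is empty only when no .md files were found under content/. render_page() fills in missing metadata before the feed is generated: a page without a title in frontmatter appears as "Untitled", and a missing date defaults to today's date.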

5. Navigation ordering is inconsistent. Pages without a weight default to 99. If multiple pages share the same weight, their relative order follows the directory traversal order of os.walk(), which can differ between systems. Use distinct weight values for deterministic ordering.
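
If giving every page a unique weight is impractical, a tie-breaker in the sort key achieves the same determinism. This is a minimal sketch, assuming nav entries with the title, url, and weight keys that build_nav() produces; sort_nav is a hypothetical name.

```python
def sort_nav(entries):
    """Sort by weight, then case-insensitive title, then URL as a final tie-breaker."""
    return sorted(
        entries,
        key=lambda p: (p["weight"], str(p["title"]).lower(), p["url"]),
    )


entries = [
    {"title": "Zebra", "url": "zebra.html", "weight": 99},
    {"title": "apple", "url": "apple.html", "weight": 99},
    {"title": "Home", "url": "index.html", "weight": 1},
]

ordered = [p["title"] for p in sort_nav(entries)]
print(ordered)
```

The result no longer depends on the order in which the filesystem returned the files, because every entry has a unique URL to fall back on.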

Practice Questions

1. What is the role of frontmatter in a static site generator? Frontmatter provides per-page metadata (title, date, weight) that the generator uses for rendering, navigation ordering, and RSS feed entries, separating content concerns from presentation.

2. How does the nav tree builder determine page order? It reads the weight field from each page's frontmatter and sorts pages in ascending order. Pages without a weight get a default of 99, so they sort after any page with a lower weight.

3. Why does Jinja2 need the safe filter for HTML content? When autoescaping is enabled, Jinja2 escapes variable output to prevent XSS attacks. Since we trust the HTML that our own Markdown conversion produces, we use {{ content|safe }} so it is inserted as markup instead of escaped text.

4. Challenge: Add a tag-based category system. Extend the frontmatter parser to support a tags: [python, tutorial] list. Generate tag archive pages that list all pages sharing a given tag. Add tag links to the RSS feed.

5. Challenge: File watching and auto-rebuild. Use Python's watchdog library to monitor the content/ directory for changes and automatically rebuild only the changed page. This gives you a live-reload development experience.
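
As a starting point for the tag challenge, the grouping step might look like the sketch below. It assumes each page dictionary carries a tags list copied from frontmatter, which the generator above does not do yet; build_tag_index is a hypothetical helper, and lowercasing tags is one possible normalization.

```python
from collections import defaultdict


def build_tag_index(pages):
    """Map each lowercased tag to the pages that carry it, with tags in sorted order."""
    index = defaultdict(list)
    for page in pages:
        # Pages without a tags key (or with an empty one) are simply skipped.
        for tag in page.get("tags") or []:
            index[str(tag).lower()].append(page)
    return dict(sorted(index.items()))


pages = [
    {"title": "First Blog Post", "url": "blog/first-post.html", "tags": ["python", "tutorial"]},
    {"title": "Welcome to My Site", "url": "index.html"},
    {"title": "Second Post", "url": "blog/second.html", "tags": ["Python"]},
]

tag_index = build_tag_index(pages)
for tag, tagged in tag_index.items():
    print(tag, [p["title"] for p in tagged])
```

Each key in the index could then become one archive page rendered through its own template.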

FAQ

How does a static site generator differ from a CMS like WordPress?

A static site generator pre-builds all HTML files at compile time. A CMS builds pages on-demand from a database. Static sites are faster, more secure, and cheaper to host because there is no server-side processing at request time.

Can I deploy the output to GitHub Pages?

Yes. The output/ directory contains a fully static website. Push it to a GitHub repository's gh-pages branch or configure any static hosting provider (Netlify, Vercel, S3) to serve the output/ folder.

What Markdown features are supported?

The generator uses Python-Markdown with the fenced_code, codehilite, and toc extensions. This supports triple-backtick code blocks, syntax highlighting (when Pygments is installed), and table-of-contents generation. Tables are not enabled as written; add the tables extension for them, and other extensions for footnotes, definition lists, or math rendering.

Next Steps

  • Extend the generator to support custom YAML frontmatter fields like draft to skip unpublished pages
  • Learn how Jinja2 template inheritance enables complex page layouts
  • Try the Mini Search Engine tutorial to add search to your generated site
  • Add syntax highlighting themes from Pygments to your code blocks
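
For the draft idea in the first bullet, the check itself can be small. The sketch below assumes a boolean draft key in frontmatter (a convention borrowed from other generators, not something ssg.py defines) and a hypothetical is_published helper that build() could call right after parse_file().

```python
def is_published(meta):
    """Treat a page as published unless its frontmatter sets draft to a true value."""
    return not bool(meta.get("draft", False))


# (metadata, filename) pairs standing in for parsed content files.
parsed = [
    ({"title": "Welcome to My Site"}, "index.md"),
    ({"title": "Work in progress", "draft": True}, "wip.md"),
    ({"title": "Finished", "draft": False}, "done.md"),
]

to_build = [name for meta, name in parsed if is_published(meta)]
print(to_build)
```

Note that a quoted value such as draft: "false" is a non-empty string and would count as a draft here, so unquoted YAML booleans are the safer convention.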

Built by the developers of DodaTech
