Build a Mini Docker Registry (Step-by-Step Guide)

DodaTech Updated 2026-06-21 6 min read

In this tutorial, you'll learn how to build a mini Docker registry, step by step. We cover the key concepts, practical examples, and best practices you need to understand and apply the protocol.

Build a mini OCI-compatible container image registry in Python — a lightweight HTTP API that stores, retrieves, and manages Docker images using the same protocol Docker Hub uses.

What You'll Build

You will build an HTTP server that speaks the OCI Distribution Spec, the same protocol Docker, Podman, and containerd use to push and pull container images. Your registry will handle blob uploads, manifest management, tag listing, and content-addressable storage, all backed by the local filesystem.

Why Build Your Own Registry?

Every time you run docker pull or docker push, your client talks to a registry speaking the OCI Distribution Spec. Understanding this protocol demystifies how container images are stored, transferred, and verified. Security teams at DodaTech use custom registries to scan images for vulnerabilities before they reach production — a pattern used in Durga Antivirus Pro's container security module.

Prerequisites

  • Python 3.10+ installed
  • Basic Docker concepts — images, layers, tags
  • Familiarity with Flask or any Python web framework

Step 1: Project Setup

mkdir mini-registry
cd mini-registry
python -m venv venv
source venv/bin/activate
pip install flask gunicorn

Step 2: Storage Layer

Container images are made of blobs (layer tarballs) and manifests (JSON descriptors). Blobs are stored content-addressably: the filename is the SHA256 digest of the content. Manifests are stored by repository name and tag so they can be looked up by tag.

# storage.py
import hashlib
import os
import json

BLOBS_DIR = "storage/blobs"
MANIFESTS_DIR = "storage/manifests"
_REPO_DIR = "storage/repos"

def _ensure():
    for d in (BLOBS_DIR, MANIFESTS_DIR, _REPO_DIR):
        os.makedirs(d, exist_ok=True)

def put_blob(data: bytes) -> str:
    _ensure()
    digest = hashlib.sha256(data).hexdigest()
    path = os.path.join(BLOBS_DIR, digest)
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(data)
    return digest

def get_blob(digest: str) -> bytes:
    path = os.path.join(BLOBS_DIR, digest)
    with open(path, "rb") as f:
        return f.read()

def put_manifest(name: str, tag: str, data: dict):
    _ensure()
    repo_dir = os.path.join(MANIFESTS_DIR, name)
    os.makedirs(repo_dir, exist_ok=True)
    path = os.path.join(repo_dir, f"{tag}.json")
    with open(path, "w") as f:
        json.dump(data, f)

def get_manifest(name: str, tag: str) -> dict:
    path = os.path.join(MANIFESTS_DIR, name, f"{tag}.json")
    with open(path) as f:
        return json.load(f)

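def list_tags(name: str) -> list:
    repo_dir = os.path.join(MANIFESTS_DIR, name)
    if not os.path.isdir(repo_dir):
        return []
    # Strip only the trailing ".json" so tags containing ".json" elsewhere survive,
    # and ignore any stray files that are not manifests.
    return sorted(
        f.removesuffix(".json") for f in os.listdir(repo_dir) if f.endswith(".json")
    )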

Expected output: No errors on import. The module creates the storage directories on first blob write.
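
To see the deduplication property in isolation, here is a minimal self-contained sketch. It reimplements the put_blob idea against a temporary directory rather than importing storage.py; the helper name put_blob_in is hypothetical and exists only for this illustration.

```python
import hashlib
import os
import tempfile

def put_blob_in(root: str, data: bytes) -> str:
    """Write data under its SHA256 hex digest inside root (illustrative helper)."""
    digest = hashlib.sha256(data).hexdigest()
    path = os.path.join(root, digest)
    if not os.path.exists(path):  # identical content is written only once
        with open(path, "wb") as f:
            f.write(data)
    return digest

with tempfile.TemporaryDirectory() as root:
    first = put_blob_in(root, b"layer-bytes")
    second = put_blob_in(root, b"layer-bytes")    # same content, same address
    other = put_blob_in(root, b"different-bytes")
    stored_files = sorted(os.listdir(root))

print(first == second)    # True
print(len(stored_files))  # 2
```

Three writes produce only two files, because the second write maps to a path that already exists.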

Step 3: Registry HTTP API

The OCI Distribution Spec defines these endpoints:

  • GET /v2/ - API version check
  • POST /v2/<name>/blobs/uploads/ - start blob upload
  • PUT /v2/<name>/blobs/uploads/<uuid> - complete blob upload
  • GET /v2/<name>/blobs/<digest> - download blob
  • PUT /v2/<name>/manifests/<tag> - upload manifest
  • GET /v2/<name>/manifests/<tag> - download manifest
  • GET /v2/<name>/tags/list - list tags

# registry.py
import uuid
from flask import Flask, request, jsonify, Response
from storage import put_blob, get_blob, put_manifest, get_manifest, list_tags

app = Flask(__name__)

uploads = {}

@app.route("/v2/")
def version_check():
    return jsonify({})

@app.route("/v2/<name>/blobs/uploads/", methods=["POST"])
def start_upload(name):
    upload_id = str(uuid.uuid4())
    uploads[upload_id] = bytearray()
    location = f"/v2/{name}/blobs/uploads/{upload_id}"
    return Response(status=202, headers={"Location": location, "Range": "0-0"})

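@app.route("/v2/<name>/blobs/uploads/<upload_id>", methods=["PUT"])
def complete_upload(name, upload_id):
    # Reject PUTs for sessions that were never opened with POST.
    if upload_id not in uploads:
        return jsonify({"error": "upload not found"}), 404
    data = uploads.pop(upload_id)
    data.extend(request.data)  # body of the final PUT, if any
    expected = request.args.get("digest", "").removeprefix("sha256:")
    # put_blob stores the bytes under their real SHA256, so a mismatch never
    # overwrites another blob; it only leaves an unreferenced file behind.
    stored = put_blob(bytes(data))
    if stored != expected:
        return jsonify({"error": "digest mismatch"}), 400
    location = f"/v2/{name}/blobs/sha256:{stored}"
    return Response(status=201, headers={"Location": location})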

@app.route("/v2/<name>/blobs/<digest>", methods=["GET"])
def get_blob_handler(name, digest):
    if digest.startswith("sha256:"):
        digest = digest[7:]
    try:
        data = get_blob(digest)
        return Response(data, mimetype="application/octet-stream")
    except FileNotFoundError:
        return jsonify({"error": "blob not found"}), 404

@app.route("/v2/<name>/manifests/<tag>", methods=["PUT"])
def put_manifest_handler(name, tag):
    data = request.get_json()
    put_manifest(name, tag, data)
    return Response(status=201)

@app.route("/v2/<name>/manifests/<tag>", methods=["GET"])
def get_manifest_handler(name, tag):
    try:
        return jsonify(get_manifest(name, tag))
    except FileNotFoundError:
        return jsonify({"error": "manifest not found"}), 404

@app.route("/v2/<name>/tags/list")
def tags_list(name):
    return jsonify({"name": name, "tags": list_tags(name)})

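Why this matters: The upload flow uses a two-phase process: start an upload session, then complete it with a digest. In the full protocol, this lets clients send large layers in chunks (via PATCH requests) rather than in one huge request. This mini version skips chunking and buffers the whole blob in memory, which is fine for learning but not for large images.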
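
The streaming idea can be sketched on its own: hashlib accepts data incrementally, so a server could hash each chunk as it arrives instead of holding the full blob. The function name digest_chunks and the chunk size are illustrative assumptions, not part of registry.py.

```python
import hashlib
from typing import Iterable

def digest_chunks(chunks: Iterable[bytes]) -> str:
    """Return 'sha256:<hex>' for data delivered as a sequence of chunks."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)  # only the current chunk needs to be in memory
    return "sha256:" + h.hexdigest()

blob = b"x" * 10_000
# Split the blob into 4096-byte pieces, as a chunked upload might deliver it.
pieces = [blob[i:i + 4096] for i in range(0, len(blob), 4096)]

streamed = digest_chunks(pieces)
whole = "sha256:" + hashlib.sha256(blob).hexdigest()
print(streamed == whole)  # True
```

Hashing chunk by chunk yields the same digest as hashing the whole blob at once, which is what allows a registry to verify a streamed upload at the final PUT.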

Step 4: Run and Test

gunicorn -b 0.0.0.0:5000 registry:app

Test that the registry behaves correctly:

# Version check
curl http://localhost:5000/v2/
# Expected: {}

# Start an upload (-i prints the response headers, including Location)
curl -i -X POST http://localhost:5000/v2/test/blobs/uploads/
# Expected: 202 with Location header

# Push a manifest
curl -X PUT http://localhost:5000/v2/test/manifests/latest \
  -H "Content-Type: application/json" \
  -d '{"schemaVersion":2,"config":{"mediaType":"application/vnd.oci.image.config.v1+json","digest":"sha256:abc","size":0},"layers":[]}'
# Expected: 201

# List tags
curl http://localhost:5000/v2/test/tags/list
# Expected: {"name":"test","tags":["latest"]}
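
The manifest in the curl example uses a placeholder digest (sha256:abc). As a rough sketch, a client could build a manifest whose descriptors carry real digests and sizes like this. The helper names descriptor and build_manifest are hypothetical, the media types are the standard OCI image ones, and the layer bytes are stand-ins rather than real gzipped tarballs.

```python
import hashlib
import json

CONFIG_TYPE = "application/vnd.oci.image.config.v1+json"
LAYER_TYPE = "application/vnd.oci.image.layer.v1.tar+gzip"
MANIFEST_TYPE = "application/vnd.oci.image.manifest.v1+json"

def descriptor(media_type: str, data: bytes) -> dict:
    """Describe a blob by media type, SHA256 digest, and size in bytes."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(data).hexdigest(),
        "size": len(data),
    }

def build_manifest(config: bytes, layers: list[bytes]) -> dict:
    """Assemble an image manifest that references the config and each layer."""
    return {
        "schemaVersion": 2,
        "mediaType": MANIFEST_TYPE,
        "config": descriptor(CONFIG_TYPE, config),
        "layers": [descriptor(LAYER_TYPE, layer) for layer in layers],
    }

# Placeholder content: a tiny config and two fake layers.
config_bytes = json.dumps({"architecture": "amd64", "os": "linux"}).encode()
manifest = build_manifest(config_bytes, [b"fake-layer-1", b"fake-layer-2"])
print(json.dumps(manifest, indent=2))
```

Each blob would be uploaded first, and the printed JSON could then be sent as the body of the PUT to /v2/test/manifests/latest.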

Architecture

flowchart LR
    subgraph Client
        DC[docker CLI / Podman]
    end
    subgraph Registry
        API[Flask HTTP API]
        ST[Storage Layer]
        FS[(Filesystem<br/>storage/blobs)]
    end
    DC -->|POST /v2/.../blobs/uploads| API
    API -->|put_blob| ST
    ST -->|write| FS
    DC -->|PUT manifest| API
    API -->|put_manifest| FS
    DC -->|GET blob| API
    API -->|get_blob| ST
    ST -->|read| FS

Common Errors

1. Digest mismatch on upload
   The digest query parameter must match the SHA256 of the blob content. The server computes the hash itself and compares; our code returns a 400 error with "digest mismatch" when they differ.

2. Manifest not found after push
   Check that the repository name and tag in the GET match the ones used in the PUT. Also make sure the manifest points to blob digests that actually exist in storage. Production registries typically reject a manifest whose referenced blobs are missing, but this mini registry does not check.

3. CORS errors
   CORS is enforced by browsers, not by the Docker CLI, so docker push and docker pull are unaffected. Add CORS headers only if a browser-based client, such as a web UI served from a different origin, calls the registry API.

4. Upload session timeout
   Our in-memory upload dict loses data on server restart. Production registries use persistent storage for upload sessions or implement resumable uploads with Range headers.

5. Conflicting tags
   Pushing to an existing tag overwrites its manifest. Make sure your application expects this rather than treating it as an error.
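
The missing-blob situation from error 2 can be caught before a manifest is accepted. The following is a sketch under assumptions: missing_blobs is a hypothetical helper, and it assumes blobs are stored as files named by their bare hex digest, as in storage.py.

```python
import os
import tempfile

def missing_blobs(manifest: dict, blobs_dir: str) -> list[str]:
    """Return digests referenced by the manifest that have no file in blobs_dir."""
    descriptors = []
    if "config" in manifest:
        descriptors.append(manifest["config"])
    descriptors.extend(manifest.get("layers", []))

    missing = []
    for desc in descriptors:
        digest = desc.get("digest", "")
        filename = digest.removeprefix("sha256:")  # files are named by bare hex
        if not filename or not os.path.isfile(os.path.join(blobs_dir, filename)):
            missing.append(digest)
    return missing

with tempfile.TemporaryDirectory() as blobs_dir:
    # Pretend the blob "aaa" was uploaded earlier; "bbb" was not.
    open(os.path.join(blobs_dir, "aaa"), "wb").close()
    manifest = {
        "config": {"digest": "sha256:aaa"},
        "layers": [{"digest": "sha256:bbb"}],
    }
    result = missing_blobs(manifest, blobs_dir)

print(result)  # ['sha256:bbb']
```

A manifest handler could call such a check and return a 400 response when the list is non-empty.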

Practice Questions

1. What does content-addressable storage mean?
   Each blob is stored at a path derived from its SHA256 hash. The same blob always has the same address, so duplicate layers are automatically deduplicated without any extra logic.

2. Why does the OCI spec use a two-phase upload?
   Large layers (hundreds of MB) need to be streamed. The upload session lets clients send data incrementally and finalize with the digest, so neither side has to buffer the entire blob in memory.

3. How does the registry know which layers belong to an image?
   The manifest JSON lists all layer digests in order. When pulling, the client reads the manifest, then requests each blob by digest.

4. Challenge: Authentication
   Add authentication using HTTP Basic Auth or bearer tokens. The Flask-HTTPAuth extension (pip install flask-httpauth) supports both. Protect the write endpoints while allowing anonymous pulls.

5. Challenge: Garbage Collection
   Write a script that scans all manifests, collects the referenced blob digests, and deletes unreferenced blobs from storage. This prevents the disk from filling up with orphaned layers.
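
As a starting point for questions 3 and 5, here is a sketch of how digests can be read from manifests and compared against what is stored. It works on in-memory data only and deletes nothing; the function names referenced_digests and unreferenced, and the short sample digests, are made up for illustration.

```python
def referenced_digests(manifest: dict) -> list[str]:
    """Config digest first, then layer digests in manifest order."""
    digests = []
    if "config" in manifest:
        digests.append(manifest["config"]["digest"])
    digests.extend(layer["digest"] for layer in manifest.get("layers", []))
    return digests

def unreferenced(all_blobs: set[str], manifests: list[dict]) -> set[str]:
    """Blobs that no manifest points to (candidates for garbage collection)."""
    live = set()
    for manifest in manifests:
        live.update(referenced_digests(manifest))
    return all_blobs - live

manifests = [
    {"config": {"digest": "sha256:c1"},
     "layers": [{"digest": "sha256:l1"}, {"digest": "sha256:l2"}]},
    {"config": {"digest": "sha256:c2"},
     "layers": [{"digest": "sha256:l1"}]},  # shares layer l1 with the first image
]
stored = {"sha256:c1", "sha256:c2", "sha256:l1", "sha256:l2", "sha256:old"}

order = referenced_digests(manifests[0])
orphans = unreferenced(stored, manifests)
print(order)    # ['sha256:c1', 'sha256:l1', 'sha256:l2']
print(orphans)  # {'sha256:old'}
```

Turning this into a real collector still requires loading manifests from disk, listing the blobs directory, and deciding how to handle uploads that are in progress.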

FAQ

What is the OCI Distribution Spec?

It is an open standard defining the HTTP API for container image registries. Docker Hub, GitHub Container Registry, and Quay.io all implement this spec, making them interchangeable from a client perspective.

Can I use this registry with docker push?

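Not as written. docker push sends layer data with PATCH requests to the upload URL before the final PUT, and the code in this tutorial does not implement a PATCH handler. Once you add one, run docker push localhost:5000/myimage:tag. Docker treats registries on localhost as insecure by default; for any other hostname served over plain HTTP, add that host to "insecure-registries" in /etc/docker/daemon.json and restart the daemon.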

How is this different from Docker Registry v2?

Docker Registry v2 is a production-grade implementation with garbage collection, authentication plugins, storage backends (S3, GCS), and horizontal scaling. Our mini version implements a core subset of the protocol in roughly 100 lines of Python for learning purposes.

Next Steps

  • Add TLS support with self-signed certificates for secure pushes
  • Explore multi-architecture images (manifest lists)
  • Build a web UI frontend for browsing stored images
  • Learn how container runtimes use OCI Runtime Spec to spin up containers from these images
