Build a Mini Docker Registry (Step-by-Step Guide)
Build a mini OCI-compatible container image registry in Python — a lightweight HTTP API that stores, retrieves, and manages Docker images using the same protocol Docker Hub uses.
What You'll Build
You will build an HTTP server that speaks the OCI Distribution Spec, the same protocol Docker, Podman, and containerd use to push and pull container images. Your registry will handle blob uploads, manifest management, tag listing, and content-addressable storage, all backed by the local filesystem.
Why Build Your Own Registry?
Every time you run docker pull or docker push, your client talks to a registry speaking the OCI Distribution Spec. Understanding this protocol demystifies how container images are stored, transferred, and verified. Security teams at DodaTech use custom registries to scan images for vulnerabilities before they reach production — a pattern used in Durga Antivirus Pro's container security module.
Prerequisites
- Python 3.10+ installed
- Basic Docker concepts — images, layers, tags
- Familiarity with Flask or any Python web framework
Step 1: Project Setup
mkdir mini-registry
cd mini-registry
python -m venv venv
source venv/bin/activate
pip install flask gunicorn
Step 2: Storage Layer
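Container images are made of blobs (layer tarballs and the image config) and manifests (JSON descriptors). We store blobs content-addressably: the filename is the SHA256 digest of the content. Manifests are stored per repository, in a file named after the tag.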
# storage.py
import hashlib
import json
import os

BLOBS_DIR = "storage/blobs"
MANIFESTS_DIR = "storage/manifests"
_REPO_DIR = "storage/repos"


def _ensure():
    # Create the storage directories if they do not exist yet.
    for d in (BLOBS_DIR, MANIFESTS_DIR, _REPO_DIR):
        os.makedirs(d, exist_ok=True)


def put_blob(data: bytes) -> str:
    # Store the blob under its SHA256 hex digest and return that digest.
    _ensure()
    digest = hashlib.sha256(data).hexdigest()
    path = os.path.join(BLOBS_DIR, digest)
    if not os.path.exists(path):  # identical content is written only once
        with open(path, "wb") as f:
            f.write(data)
    return digest


def get_blob(digest: str) -> bytes:
    # Raises FileNotFoundError if no blob with this digest exists.
    path = os.path.join(BLOBS_DIR, digest)
    with open(path, "rb") as f:
        return f.read()


def put_manifest(name: str, tag: str, data: dict):
    # Manifests live at storage/manifests/<name>/<tag>.json.
    _ensure()
    repo_dir = os.path.join(MANIFESTS_DIR, name)
    os.makedirs(repo_dir, exist_ok=True)
    path = os.path.join(repo_dir, f"{tag}.json")
    with open(path, "w") as f:
        json.dump(data, f)


def get_manifest(name: str, tag: str) -> dict:
    # Raises FileNotFoundError if the tag does not exist.
    path = os.path.join(MANIFESTS_DIR, name, f"{tag}.json")
    with open(path) as f:
        return json.load(f)


def list_tags(name: str) -> list:
    repo_dir = os.path.join(MANIFESTS_DIR, name)
    if not os.path.isdir(repo_dir):
        return []
    # Strip only the trailing ".json" so tags that contain ".json" stay intact.
    return sorted(
        f.removesuffix(".json") for f in os.listdir(repo_dir) if f.endswith(".json")
    )
Expected output: No errors on import. The module creates the storage directories on first blob write.
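To see why content-addressable storage deduplicates automatically, here is a minimal standalone sketch. It does not import storage.py; the helper put_blob_at and the temporary directory are assumptions made for illustration, mirroring what put_blob does.

```python
import hashlib
import tempfile
from pathlib import Path


def put_blob_at(root: Path, data: bytes) -> str:
    """Store data under its SHA256 hex digest and return that digest."""
    digest = hashlib.sha256(data).hexdigest()
    path = root / digest
    if not path.exists():  # identical content is written only once
        path.write_bytes(data)
    return digest


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = put_blob_at(root, b"layer contents")
    second = put_blob_at(root, b"layer contents")  # same bytes, same address
    other = put_blob_at(root, b"different layer")
    stored_files = sorted(p.name for p in root.iterdir())

print(first == second)    # True
print(len(stored_files))  # 2
```

Three writes leave two files on disk, because the two identical payloads map to the same filename.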
Step 3: Registry HTTP API
The OCI Distribution Spec defines these endpoints:
- GET /v2/ (API version check)
- POST /v2/<name>/blobs/uploads/ (start blob upload)
- PUT /v2/<name>/blobs/uploads/<uuid> (complete blob upload)
- GET /v2/<name>/blobs/<digest> (download blob)
- PUT /v2/<name>/manifests/<tag> (upload manifest)
- GET /v2/<name>/manifests/<tag> (download manifest)
- GET /v2/<name>/tags/list (list tags)
# registry.py
import hashlib
import uuid

from flask import Flask, Response, jsonify, request

from storage import get_blob, get_manifest, list_tags, put_blob, put_manifest

app = Flask(__name__)

# In-progress upload sessions, keyed by upload ID. Held in memory only.
uploads = {}


@app.route("/v2/")
def version_check():
    return jsonify({})


@app.route("/v2/<name>/blobs/uploads/", methods=["POST"])
def start_upload(name):
    upload_id = str(uuid.uuid4())
    uploads[upload_id] = bytearray()
    location = f"/v2/{name}/blobs/uploads/{upload_id}"
    return Response(status=202, headers={"Location": location, "Range": "0-0"})


@app.route("/v2/<name>/blobs/uploads/<upload_id>", methods=["PUT"])
def complete_upload(name, upload_id):
    # Reject PUTs for sessions that were never started (or already completed).
    if upload_id not in uploads:
        return jsonify({"error": "upload not found"}), 404
    data = uploads.pop(upload_id)
    data.extend(request.data)
    digest = request.args.get("digest", "").removeprefix("sha256:")
    # Verify the digest before writing so a mismatched blob is never stored.
    if hashlib.sha256(data).hexdigest() != digest:
        return jsonify({"error": "digest mismatch"}), 400
    put_blob(bytes(data))
    location = f"/v2/{name}/blobs/sha256:{digest}"
    return Response(status=201, headers={"Location": location})


@app.route("/v2/<name>/blobs/<digest>", methods=["GET"])
def get_blob_handler(name, digest):
    # Blobs are stored under the bare hex digest, without the algorithm prefix.
    digest = digest.removeprefix("sha256:")
    try:
        data = get_blob(digest)
        return Response(data, mimetype="application/octet-stream")
    except FileNotFoundError:
        return jsonify({"error": "blob not found"}), 404


@app.route("/v2/<name>/manifests/<tag>", methods=["PUT"])
def put_manifest_handler(name, tag):
    # force=True parses the body as JSON even when the Content-Type is a
    # manifest media type rather than application/json.
    data = request.get_json(force=True, silent=True)
    if data is None:
        return jsonify({"error": "invalid manifest"}), 400
    put_manifest(name, tag, data)
    return Response(status=201)


@app.route("/v2/<name>/manifests/<tag>", methods=["GET"])
def get_manifest_handler(name, tag):
    try:
        return jsonify(get_manifest(name, tag))
    except FileNotFoundError:
        return jsonify({"error": "manifest not found"}), 404


@app.route("/v2/<name>/tags/list")
def tags_list(name):
    return jsonify({"name": name, "tags": list_tags(name)})
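Why this matters: The upload flow uses a two-phase process: start an upload session, then complete it with a digest. In the full protocol, the session lets clients send large layers in chunks rather than in one request. This mini registry keeps the session but still buffers each upload in memory before writing it to disk.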
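The handler in registry.py collects the whole upload in a bytearray. As a rough sketch of how a body could instead be hashed and written chunk by chunk, the function below copies any file-like source to disk while updating the SHA256 incrementally. The name stream_blob_to_disk and its parameters are hypothetical and not part of the tutorial's code; in a Flask handler, request.stream could plausibly serve as the source.

```python
import hashlib
import io
import os
import tempfile
from typing import BinaryIO


def stream_blob_to_disk(source: BinaryIO, blobs_dir: str, chunk_size: int = 64 * 1024) -> str:
    """Copy source into blobs_dir in chunks and return the SHA256 hex digest."""
    os.makedirs(blobs_dir, exist_ok=True)
    hasher = hashlib.sha256()
    # Write to a temporary file first: the final name is unknown until hashing ends.
    fd, tmp_path = tempfile.mkstemp(dir=blobs_dir)
    try:
        with os.fdopen(fd, "wb") as out:
            while True:
                chunk = source.read(chunk_size)
                if not chunk:
                    break
                hasher.update(chunk)  # only one chunk is held in memory at a time
                out.write(chunk)
        digest = hasher.hexdigest()
        # Move the finished file to its content address.
        os.replace(tmp_path, os.path.join(blobs_dir, digest))
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return digest


with tempfile.TemporaryDirectory() as tmp:
    payload = b"x" * 200_000
    digest = stream_blob_to_disk(io.BytesIO(payload), tmp, chunk_size=4096)
    matches = digest == hashlib.sha256(payload).hexdigest()
    on_disk = os.path.getsize(os.path.join(tmp, digest))

print(matches)  # True
print(on_disk)  # 200000
```

Writing to a temporary file and renaming it at the end means a half-written blob never appears under a valid digest name.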
Step 4: Run and Test
gunicorn -b 0.0.0.0:5000 registry:app
Test that the registry behaves correctly:
# Version check
curl http://localhost:5000/v2/
# Expected: {}
# Start an upload (-i prints the status line and response headers)
curl -i -X POST http://localhost:5000/v2/test/blobs/uploads/
# Expected: 202 with a Location header
# Push a manifest
curl -i -X PUT http://localhost:5000/v2/test/manifests/latest \
  -H "Content-Type: application/json" \
  -d '{"schemaVersion":2,"config":{"mediaType":"application/vnd.oci.image.config.v1+json","digest":"sha256:abc","size":0},"layers":[]}'
# Expected: 201
# List tags
curl http://localhost:5000/v2/test/tags/list
# Expected: {"name":"test","tags":["latest"]}
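The curl checks above do not complete a blob upload, because the final PUT needs the SHA256 digest of the payload as a query parameter. The sketch below shows one way to build that URL and run both phases from Python with the standard library. The helper names finalize_upload_url and push_blob, and the upload ID 1234, are made up for illustration. push_blob assumes the server from this step is running on localhost:5000 and returns a relative Location, so it is defined here but not called.

```python
import hashlib
import urllib.request


def finalize_upload_url(base_url: str, location: str, data: bytes) -> str:
    """Append ?digest=sha256:<hex> to the upload Location returned by the POST."""
    digest = "sha256:" + hashlib.sha256(data).hexdigest()
    separator = "&" if "?" in location else "?"
    return f"{base_url}{location}{separator}digest={digest}"


def push_blob(base_url: str, name: str, data: bytes) -> int:
    """Open an upload session, then PUT the data with its digest. Returns the final status."""
    start = urllib.request.Request(
        f"{base_url}/v2/{name}/blobs/uploads/", data=b"", method="POST"
    )
    with urllib.request.urlopen(start) as resp:
        location = resp.headers["Location"]  # assumed to be a relative path
    finish = urllib.request.Request(
        finalize_upload_url(base_url, location, data),
        data=data,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(finish) as resp:
        return resp.status


# "1234" stands in for the upload ID the server would generate.
url = finalize_upload_url("http://localhost:5000", "/v2/test/blobs/uploads/1234", b"hello")
print(url)
```

With the registry running, push_blob("http://localhost:5000", "test", b"hello") should return 201 if the digest check passes.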
Architecture
flowchart LR
    subgraph Client
        DC[docker CLI / Podman]
    end
    subgraph Registry
        API[Flask HTTP API]
        ST[Storage Layer]
        FS[("Filesystem<br/>storage/blobs")]
    end
    DC -->|POST /v2/.../blobs/uploads| API
    API -->|put_blob| ST
    ST -->|write| FS
    DC -->|PUT manifest| API
    API -->|put_manifest| ST
    DC -->|GET blob| API
    API -->|get_blob| ST
    ST -->|read| FS
Common Errors
1. Digest mismatch on upload
The digest query parameter must match the SHA256 of the blob content. The server computes the hash itself and compares the two. Our code returns a 400 error with "digest mismatch" when they don't match.
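2. Manifest not found after push
Check that the name and tag in the GET request match the ones used in the PUT. Also make sure the manifest points to blob digests that actually exist in storage. Registries typically reject a manifest that references unknown blobs (the spec defines a MANIFEST_BLOB_UNKNOWN error code for this), but our code does not perform that check, so a pull can fail later on a missing layer.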
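3. CORS errors from browser-based clients
CORS is enforced by browsers, not by the Docker CLI, so docker push and docker pull are unaffected. Add CORS headers only if a browser-based UI calls the registry from a different origin. For localhost testing with curl or Docker, no CORS configuration is needed.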
4. Upload session timeout
Our in-memory upload dict loses data on server restart. Production registries use persistent storage for upload sessions or implement resumable uploads with Range headers.
5. Conflicting tags
Tags are mutable: pushing the same tag twice overwrites the earlier manifest. Make sure your clients expect this rather than treating it as an error.
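For the second error, a registry could check a manifest's references before accepting it. The sketch below is one possible shape for that check, not what registry.py does today. The function names and the blob_exists callback are assumptions; in this project the callback might test whether a file exists under storage/blobs.

```python
from typing import Callable


def referenced_digests(manifest: dict) -> list[str]:
    """Return the config digest followed by each layer digest, in manifest order."""
    digests = []
    config = manifest.get("config") or {}
    if "digest" in config:
        digests.append(config["digest"])
    for layer in manifest.get("layers", []):
        digests.append(layer["digest"])
    return digests


def missing_blobs(manifest: dict, blob_exists: Callable[[str], bool]) -> list[str]:
    """Return every referenced digest for which blob_exists reports False."""
    return [d for d in referenced_digests(manifest) if not blob_exists(d)]


# Placeholder digests for illustration only; real ones are 64 hex characters.
known = {"sha256:cfg", "sha256:layer1"}
manifest = {
    "schemaVersion": 2,
    "config": {"digest": "sha256:cfg"},
    "layers": [{"digest": "sha256:layer1"}, {"digest": "sha256:layer2"}],
}
missing = missing_blobs(manifest, lambda d: d in known)
print(missing)  # ['sha256:layer2']
```

A PUT handler could return an error when the list is non-empty instead of storing the manifest.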
Practice Questions
1. What does content-addressable storage mean?
Each blob is stored at a path derived from its SHA256 hash. The same blob always has the same address, so duplicate layers are automatically deduplicated without any extra logic.
2. Why does the OCI spec use a two-phase upload?
Large layers (hundreds of MB) need to be streamed. The upload session lets clients send data incrementally and finalize with the digest. This avoids buffering the entire blob in memory.
3. How does the registry know which layers belong to an image?
The manifest JSON lists all layer digests in order. When pulling, the client reads the manifest, then requests each blob by digest.
4. Challenge: Authentication
Add authentication using HTTP Basic Auth. Use the Flask-HTTPAuth extension (pip install flask-httpauth) to protect write endpoints while allowing anonymous pulls.
5. Challenge: Garbage Collection
Write a script that scans all manifests, collects referenced blob digests, and deletes unreferenced blobs from storage. This prevents the disk from filling up with orphaned layers.
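As one possible starting point for the garbage collection challenge, here is a small mark-and-sweep sketch. It assumes the layout used by storage.py (manifests as <name>/<tag>.json, blobs named by bare hex digest) and runs against a temporary directory. The function name collect_garbage and its dry_run flag are hypothetical.

```python
import json
import os
import tempfile


def collect_garbage(blobs_dir: str, manifests_dir: str, dry_run: bool = True) -> list[str]:
    """Return blobs that no manifest references; delete them unless dry_run is set."""
    referenced = set()
    # Mark: gather every digest mentioned by any stored manifest.
    for root, _dirs, files in os.walk(manifests_dir):
        for filename in files:
            with open(os.path.join(root, filename)) as f:
                manifest = json.load(f)
            entries = [manifest.get("config") or {}] + list(manifest.get("layers", []))
            for entry in entries:
                digest = entry.get("digest", "")
                # Blob files are named without the "sha256:" prefix.
                referenced.add(digest.removeprefix("sha256:"))
    # Sweep: anything in blobs_dir that was not marked is an orphan.
    orphans = sorted(name for name in os.listdir(blobs_dir) if name not in referenced)
    if not dry_run:
        for name in orphans:
            os.remove(os.path.join(blobs_dir, name))
    return orphans


with tempfile.TemporaryDirectory() as tmp:
    blobs = os.path.join(tmp, "blobs")
    repo = os.path.join(tmp, "manifests", "test")
    os.makedirs(blobs)
    os.makedirs(repo)
    for name in ("aaa", "bbb"):  # placeholder digests, not real hashes
        with open(os.path.join(blobs, name), "wb") as f:
            f.write(b"data")
    with open(os.path.join(repo, "latest.json"), "w") as f:
        json.dump({"config": {"digest": "sha256:aaa"}, "layers": []}, f)
    orphans = collect_garbage(blobs, os.path.join(tmp, "manifests"), dry_run=False)
    remaining = sorted(os.listdir(blobs))

print(orphans)    # ['bbb']
print(remaining)  # ['aaa']
```

A real script would also need to avoid racing with pushes in progress, since a freshly uploaded blob is unreferenced until its manifest arrives.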
Next Steps
- Add TLS support with self-signed certificates for secure pushes
- Explore multi-architecture images (manifest lists)
- Build a web UI frontend for browsing stored images
- Learn how container runtimes use OCI Runtime Spec to spin up containers from these images
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro