
Health Check Endpoint Returning 503 Fix

DodaTech Updated 2026-06-24 3 min read

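In this tutorial, you'll learn why a health check endpoint returns 503 even though the application is running, and how to fix it by separating liveness from readiness, tuning probe thresholds, and degrading gracefully when non-critical dependencies fail.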

Your health check endpoint returns HTTP 503 Service Unavailable: the application is running but reports itself as unhealthy, so Kubernetes, load balancers, or monitoring systems mark the service as down.

The Problem

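# WRONG: health check that fails whenever a non-critical dependency fails
from flask import Flask, jsonify
import redis

app = Flask(__name__)
cache = redis.Redis(host="localhost", port=6379)  # Redis is only a cache here

@app.route('/health')
def health():
    try:
        cache.ping()
        return jsonify({"status": "healthy"})
    except Exception:
        # Any Redis hiccup becomes a 503 for the whole service
        return jsonify({"status": "unhealthy"}), 503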

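Redis is only a caching layer. When Redis restarts, this check returns 503 even though the application can still serve requests by falling back to the database. If the endpoint backs the liveness probe, Kubernetes restarts the container; if it backs the readiness probe, the pod is pulled out of rotation. Either way, a cache blip causes unnecessary disruption.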

Step-by-Step Fix

1. Separate liveness and readiness probes

@app.route('/health/live')
def liveness():
    # Simple: is the app process alive?
    return jsonify({"status": "alive"}), 200

@app.route('/health/ready')
def readiness():
    # More thorough: can the app serve traffic?
    # Each check_* helper returns True or False (see step 4 for examples)
    checks = {
        "database": check_database(),
        "redis": check_redis(),
        "queue": check_queue()
    }
    ready = all(checks.values())
    return jsonify({"status": "ready" if ready else "not ready", "checks": checks}), \
        200 if ready else 503
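To see the liveness/readiness split without Flask, here is a minimal sketch using only Python's standard library `http.server`. The `READINESS_CHECKS` dictionary and its always-passing lambdas are hypothetical placeholders for real database and queue probes; the response shape mirrors the Flask version above.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical readiness checks: name -> zero-argument callable returning bool.
# Replace these placeholders with real database/queue probes.
READINESS_CHECKS = {
    "database": lambda: True,
    "queue": lambda: True,
}


class HealthHandler(BaseHTTPRequestHandler):
    def _send_json(self, payload, status):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/health/live":
            # Liveness: answering at all shows the process is up and not stuck.
            self._send_json({"status": "alive"}, 200)
        elif self.path == "/health/ready":
            # Readiness: every dependency check must pass to receive traffic.
            checks = {name: bool(check()) for name, check in READINESS_CHECKS.items()}
            ready = all(checks.values())
            self._send_json(
                {"status": "ready" if ready else "not ready", "checks": checks},
                200 if ready else 503,
            )
        else:
            self._send_json({"error": "not found"}, 404)

    def log_message(self, format, *args):
        # Silence per-request access logs so probe traffic doesn't flood stderr.
        pass


def serve(port=8080):
    """Block forever, serving both probe endpoints on the given port."""
    ThreadingHTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling `serve()` listens on port 8080, matching the probe configuration in the next step. Liveness deliberately runs no dependency checks, so a failing database can take the pod out of rotation but never triggers a restart loop.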

2. Configure Kubernetes probes correctly

apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
    - name: app
      image: myapp:latest
      livenessProbe:
        httpGet:
          path: /health/live
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 15
      readinessProbe:
        httpGet:
          path: /health/ready
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10
        failureThreshold: 3
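How quickly Kubernetes reacts depends on `periodSeconds` and `failureThreshold` together. The sketch below is a rough worked estimate, not a kubelet reimplementation: it assumes probes fire exactly every period and ignores probe timeouts and scheduling jitter. The function name `action_window` is illustrative. The liveness probe above sets no `failureThreshold`, so it uses the Kubernetes default of 3.

```python
def action_window(period_seconds, failure_threshold=3):
    """Return (fastest, slowest) seconds between a dependency breaking and
    Kubernetes acting on the failing probe.

    Fastest: the break happens just before a probe, so that probe is the first
    failure and the threshold is reached (failure_threshold - 1) periods later.
    Slowest: the break happens just after a successful probe, so the first
    failure arrives one full period later.
    Probe timeouts and kubelet jitter are ignored.
    """
    fastest = (failure_threshold - 1) * period_seconds
    slowest = failure_threshold * period_seconds
    return fastest, slowest


liveness = action_window(period_seconds=15)  # default failureThreshold of 3
readiness = action_window(period_seconds=10, failure_threshold=3)
print("liveness restart after", liveness, "seconds")    # (30, 45)
print("readiness removal after", readiness, "seconds")  # (20, 30)
```

With the values above, a broken dependency keeps the pod in rotation for roughly 20 to 30 seconds before readiness removes it.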

3. Add appropriate timeouts and thresholds

readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  timeoutSeconds: 3  # Fail the probe if there is no response within 3s
  failureThreshold: 3  # Mark not ready only after 3 consecutive failures
  periodSeconds: 10
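The reason `failureThreshold` prevents flapping is that Kubernetes counts *consecutive* results. This replay function is a simplified model of that counting, assuming the pod is already Ready and using the default `successThreshold` of 1; the name `simulate_readiness` is hypothetical.

```python
def simulate_readiness(results, failure_threshold=3, success_threshold=1):
    """Replay probe results (True = pass) and return the Ready state after each.

    The pod becomes not ready after `failure_threshold` consecutive failures
    and ready again after `success_threshold` consecutive successes.
    Assumes the pod starts Ready (it is already serving traffic).
    """
    ready = True
    failures = successes = 0
    states = []
    for passed in results:
        if passed:
            successes += 1
            failures = 0  # any success resets the failure streak
            if successes >= success_threshold:
                ready = True
        else:
            failures += 1
            successes = 0
            if failures >= failure_threshold:
                ready = False
        states.append(ready)
    return states


# One transient failure is absorbed; three in a row flips the pod to not ready.
print(simulate_readiness([True, False, True, True]))
print(simulate_readiness([False, False, False, True]))
```

Setting `failure_threshold=1` in the same replay shows a single blip immediately removing the pod, which is the flapping the threshold is meant to prevent.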

4. Implement graceful degradation checks

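def check_database(timeout=2):
    # `db` stands for your application's database handle; pass the timeout
    # through whatever mechanism your driver provides
    try:
        db.execute("SELECT 1", timeout=timeout)
        return True
    except Exception:
        return False

def check_redis(timeout=1):
    try:
        # socket_timeout bounds how long the ping can block the probe
        redis.Redis(socket_timeout=timeout).ping()
        return True
    except redis.exceptions.RedisError:
        return True  # Degraded: Redis unavailable but app still works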
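Returning `True` from `check_redis` keeps the pod in rotation but also hides the outage from the response. One alternative, sketched here with only the standard library, is to run every check with a shared deadline and classify dependencies as critical or optional. The names `run_checks`, `summarize`, and the `"degraded"` status are illustrative assumptions, not part of Flask or Kubernetes.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def run_checks(checks, timeout=2.0):
    """Run each check concurrently; a check that raises or is still running
    after `timeout` seconds counts as failed."""
    pool = ThreadPoolExecutor(max_workers=max(len(checks), 1))
    try:
        futures = {pool.submit(fn): name for name, fn in checks.items()}
        done, _ = wait(futures, timeout=timeout)
        results = {}
        for future, name in futures.items():
            if future in done and future.exception() is None:
                results[name] = bool(future.result())
            else:
                results[name] = False  # raised an exception or timed out
        return results
    finally:
        # Don't block the probe on a hung check; its thread finishes on its own.
        pool.shutdown(wait=False)


def summarize(results, critical):
    """Map check results to (status, http_code).

    Only a failing critical dependency produces 503. An optional one, such as
    a cache, yields "degraded" with 200 so the pod stays in rotation while the
    JSON body still reports the problem.
    """
    if not all(results[name] for name in critical):
        return "not ready", 503
    if not all(results.values()):
        return "degraded", 200
    return "ready", 200
```

For example, `summarize(run_checks({"database": ..., "redis": ...}), critical={"database"})` returns `("degraded", 200)` when only Redis is down. The shared deadline keeps the whole readiness handler under `timeoutSeconds` even if several dependencies hang at once.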

5. Log health check failures

import logging

logger = logging.getLogger(__name__)

@app.route('/health')
def health():
    try:
        db.execute("SELECT 1")
        db_ok = True
    except Exception as e:
        db_ok = False
        # Record the cause so a 503 can be traced to a specific dependency
        logger.warning("Health check: database unavailable: %s", e)

    status = "healthy" if db_ok else "unhealthy"
    return jsonify({"status": status, "database": db_ok}), 200 if db_ok else 503
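Because probes run every few seconds, logging on every failure can repeat the same warning hundreds of times during one outage. A minimal sketch, assuming you only need state changes, remembers the last result per dependency and logs transitions. `TransitionLogger` is a hypothetical helper built on the standard `logging` module.

```python
import logging

logger = logging.getLogger("health")


class TransitionLogger:
    """Log a dependency only when its health changes, not on every probe."""

    def __init__(self):
        self._last = {}  # dependency name -> last observed result

    def record(self, name, ok, error=None):
        """Store the result and return True if a log record was emitted."""
        previous = self._last.get(name)
        self._last[name] = ok
        if previous == ok:
            return False  # unchanged since the last probe: stay quiet
        if ok:
            if previous is None:
                return False  # first observation and healthy: nothing to report
            logger.info("Health check: %s recovered", name)
        else:
            logger.warning("Health check: %s unavailable: %s", name, error)
        return True
```

Inside the handler, call `tracker.record("database", db_ok, e)` instead of logging directly; the first failure and the recovery are each logged once.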

Expected output (formatted for readability; Flask may print the JSON on one line):

$ curl http://localhost:8080/health/ready
{
  "status": "ready",
  "checks": {
    "database": true,
    "redis": true,
    "queue": true
  }
}
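When the endpoint does return 503, the `checks` object shows which dependency is failing. This small standard-library sketch fetches the readiness body even on a 503 (which `urllib` raises as `HTTPError`) and lists the failing checks; both function names are hypothetical, and the body shape is assumed to match the output above.

```python
import json
import urllib.error
import urllib.request


def fetch_health(url, timeout=3):
    """Return (http_status, parsed_body); a 503 is returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        return err.code, json.loads(err.read())


def failing_checks(body):
    """Names of checks reported as false, e.g. {"checks": {"redis": false}}."""
    return sorted(name for name, ok in body.get("checks", {}).items() if not ok)
```

Usage: `status, body = fetch_health("http://localhost:8080/health/ready")`, then print `failing_checks(body)` when `status == 503`.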

Prevention Tips

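  • Separate liveness (is the process alive?) from readiness (can it serve traffic?)
  • Use initialDelaySeconds, or a startupProbe, so a slow startup isn't counted as a failure
  • Use failureThreshold: 3 so a single transient failure doesn't make the pod flap between ready and not ready
  • Log health check failures, including the underlying error, for debugging
  • Degrade gracefully: don't fail on non-critical dependencies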

Common Mistakes with Health Check 503s

  1. Checking non-critical dependencies, such as a Redis cache, in the liveness probe, so a cache restart triggers container restarts
  2. Catching errors with a bare except: and returning 503 without logging the cause, which leaves nothing to debug
  3. Omitting timeouts on dependency checks, so a hung database makes the probe itself time out and fail

Each of these turns a temporary dependency problem into an outage of otherwise healthy pods.

Practice Exercise

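Extend the readiness endpoint so that it reports "degraded" with HTTP 200 when only Redis is down, and returns 503 only when the database is unavailable. Test it by stopping Redis, then the database, and checking the status code with curl after each step.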

This exercise reinforces the concepts covered in this guide. Try implementing it before checking online solutions.

FAQ

What's the difference between liveness and readiness probes?

Liveness probes check if the application is alive (process running, not stuck). If liveness fails, Kubernetes restarts the pod. Readiness probes check if the application can serve traffic (dependencies available, cache warm). If readiness fails, Kubernetes removes the pod from Service endpoints but doesn't restart it.

Why does my health check endpoint return 503 every 30 seconds?

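The probe is probably checking a dependency with a periodic hiccup, such as a reconnection, cache refresh, or background job. With failureThreshold: 3 and periodSeconds: 10, a failure must persist across three consecutive probes (roughly 20 to 30 seconds) before the pod is marked not ready, so brief blips are tolerated. Check whether the dependency logs errors at the same interval, and raise periodSeconds or failureThreshold if its hiccups last longer than that window.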

Should I include external API status in health checks?

No. Health checks should verify only the application's own dependencies: its database, cache, and internal queues. Failing your probes can't fix an outage in a third-party API; it only pulls every healthy pod out of rotation at once, spreading one external failure into cascading failures across your infrastructure.
