
OpenTelemetry Tracing: Distributed Tracing Setup Guide

DodaTech Updated 2026-06-23 5 min read

This guide walks through setting up distributed tracing with OpenTelemetry. It covers the key concepts, a working two-service example, and the practices that keep tracing useful in production.

What You Will Learn

This tutorial teaches you how to instrument a microservice application with OpenTelemetry, export traces to a backend, and analyze distributed request flows to identify latency bottlenecks.

Why It Matters

In monolithic applications, a single stack trace tells you where a request failed. In microservices, a single request can cross ten services, and no one service sees the whole path. Without distributed tracing, finding the root cause of slowness is like finding a needle in a haystack.

Real-World Use

The DodaTech sync service handles file synchronization across millions of devices. When users reported slow syncs, distributed tracing revealed that an upstream authentication service was adding 800ms of latency per request -- something no individual service dashboard could have shown.

OpenTelemetry is the industry standard for observability. It provides a single set of APIs, SDKs, and tools for generating, collecting, and exporting telemetry data (traces, metrics, and logs). It is a Cloud Native Computing Foundation (CNCF) incubating project and the second most active CNCF project after Kubernetes.


Prerequisites

  • A Kubernetes cluster or Docker environment
  • Basic knowledge of microservices architecture
  • A running backend for trace storage (Jaeger or Zipkin)
  • Python 3.10+ or Node.js 18+ for the sample application

Step-by-Step Tutorial

Step 1: Deploy the Trace Backend with Jaeger

docker run -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:1.57

Expected output: docker prints the ID of the new container and returns. Jaeger then serves its UI on port 16686 and accepts OTLP on ports 4317 (gRPC) and 4318 (HTTP).
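
Before moving on, it can help to confirm that the three published ports actually accept connections. The following is a minimal sketch using only the Python standard library; the helper name port_open is made up for this example, and a successful connection only shows that something is listening, not that it is Jaeger.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


if __name__ == "__main__":
    # Ports published by the docker run command in Step 1
    checks = [("Jaeger UI", 16686), ("OTLP gRPC", 4317), ("OTLP HTTP", 4318)]
    for name, port in checks:
        state = "open" if port_open("localhost", port) else "closed"
        print(f"{name:10} port {port}: {state}")
```

If any port reports closed, docker logs jaeger is usually the quickest way to see why the container did not start.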

Step 2: Create a Sample Python Application

Create a file app.py:

from flask import Flask
import requests
import time
import random

app = Flask(__name__)

@app.route("/")
def home():
    return {"service": "frontend", "status": "ok"}

@app.route("/process")
def process():
    time.sleep(random.uniform(0.1, 0.5))
    resp = requests.get("http://localhost:5001/validate")
    return {"result": "processed", "validate": resp.json()}

if __name__ == "__main__":
    app.run(port=5000)

Step 3: Instrument with OpenTelemetry

Install Flask, requests, the OpenTelemetry SDK, the Flask and requests instrumentations, and the OTLP gRPC exporter:

pip install flask requests \
  opentelemetry-api opentelemetry-sdk \
  opentelemetry-instrumentation-flask \
  opentelemetry-instrumentation-requests \
  opentelemetry-exporter-otlp-proto-grpc

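Update app.py: add the imports and tracing setup below at the top of the file, replacing the existing app = Flask(__name__) line:

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor

# Name the service; without service.name it shows up in Jaeger as unknown_service
resource = Resource.create({"service.name": "frontend"})
provider = TracerProvider(resource=resource)
# Batch finished spans and send them to Jaeger's OTLP gRPC port (no TLS locally)
processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

app = Flask(__name__)
# Creates a server span for every incoming request
FlaskInstrumentor().instrument_app(app)
# Creates a client span for every outgoing requests call and injects trace headers
RequestsInstrumentor().instrument()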

Step 4: Create a Downstream Service

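Create validator.py. It is instrumented the same way as the frontend so that its spans join the frontend's trace:

from flask import Flask
import time
import random
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor

# Same setup as the frontend, with its own service name
provider = TracerProvider(resource=Resource.create({"service.name": "validator"}))
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)))
trace.set_tracer_provider(provider)

app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)

@app.route("/validate")
def validate():
    time.sleep(random.uniform(0.05, 0.2))
    return {"valid": True}

if __name__ == "__main__":
    app.run(port=5001)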

Run both services:

python app.py &
python validator.py &

Step 5: Generate Traffic and View Traces

for i in $(seq 1 20); do curl http://localhost:5000/process; done

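Open http://localhost:16686 (Jaeger UI). In the Service dropdown, select the service that sent the spans. It is listed under the service.name you configured (for example through the OTEL_SERVICE_NAME environment variable), or as unknown_service if none was set. Click a trace to open the waterfall view. Every instrumented service that handled the request contributes its own spans to that trace.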
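
When reading a waterfall, the useful number is often a span's self time: its duration minus the time spent in its children. The sketch below computes that for a hand-written list of spans. The dictionary layout, the span names, and the timings are invented for illustration and are not Jaeger's export format. It also assumes span names are unique and that the children of a span run one after another.

```python
from collections import defaultdict


def self_times(spans):
    """Map span name -> time spent in the span itself, excluding direct children.

    Assumes the children of a span do not overlap each other.
    """
    child_total = defaultdict(float)
    for s in spans:
        if s["parent_id"] is not None:
            child_total[s["parent_id"]] += s["end_ms"] - s["start_ms"]
    return {
        s["name"]: (s["end_ms"] - s["start_ms"]) - child_total[s["span_id"]]
        for s in spans
    }


# Hypothetical spans shaped like one /process request from this tutorial
trace_spans = [
    {"span_id": "a", "parent_id": None, "name": "GET /process", "start_ms": 0, "end_ms": 450},
    {"span_id": "b", "parent_id": "a", "name": "business_logic", "start_ms": 5, "end_ms": 445},
    {"span_id": "c", "parent_id": "b", "name": "GET (client)", "start_ms": 300, "end_ms": 440},
    {"span_id": "d", "parent_id": "c", "name": "GET /validate", "start_ms": 310, "end_ms": 430},
]

times = self_times(trace_spans)
slowest = max(times, key=times.get)
print(slowest, times[slowest])
```

In this made-up trace the bottleneck is the frontend's own work rather than the validator call, a distinction that total span durations alone would hide.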

Step 6: Add Manual Spans for Business Logic

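# Replace the earlier /process handler in app.py with this version;
# Flask raises an error if two view functions are both named "process".
tracer = trace.get_tracer(__name__)

@app.route("/process")
def process():
    # Child of the Flask request span; covers the sleep and the downstream call
    with tracer.start_as_current_span("business_logic") as span:
        span.set_attribute("user.id", "demo-user")
        time.sleep(random.uniform(0.1, 0.5))
        resp = requests.get("http://localhost:5001/validate")
    return {"result": "processed", "validate": resp.json()}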

Step 7: Add Span Attributes and Events

with tracer.start_as_current_span("db_query") as span:
    span.set_attribute("db.system", "postgresql")
    span.set_attribute("db.statement", "SELECT * FROM users")
    span.add_event("query_start", {"query_id": "q001"})
    time.sleep(0.3)
    span.add_event("query_end", {"rows_returned": 42})
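
To make the data model concrete, here is a toy stand-in for a span written with the standard library only. It is not the OpenTelemetry SDK; the names ToySpan and toy_span are invented for this sketch. It shows that a span has a start and an end time plus attributes, that events are timestamped annotations inside it, and that a context manager guarantees the end time is recorded even if the body raises.

```python
import time
from contextlib import contextmanager


class ToySpan:
    """Illustrative stand-in for a span; NOT the OpenTelemetry SDK class."""

    def __init__(self, name):
        self.name = name
        self.attributes = {}
        self.events = []  # list of (timestamp, name, attributes)
        self.start = time.monotonic()
        self.end = None

    def set_attribute(self, key, value):
        self.attributes[key] = value

    def add_event(self, name, attributes=None):
        # An event is a point in time, so it stores one timestamp only
        self.events.append((time.monotonic(), name, attributes or {}))


@contextmanager
def toy_span(name):
    span = ToySpan(name)
    try:
        yield span
    finally:
        # Runs on normal exit and on exceptions, so the span is always ended
        span.end = time.monotonic()


with toy_span("db_query") as span:
    span.set_attribute("db.system", "postgresql")
    span.add_event("query_start", {"query_id": "q001"})
    time.sleep(0.01)
    span.add_event("query_end", {"rows_returned": 42})

print(span.name, span.attributes, [event[1] for event in span.events])
```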

Step 8: Propagate Context Across HTTP Calls

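OpenTelemetry handles context propagation automatically through instrumented libraries. On the calling side, the requests instrumentation injects the current trace context into outgoing HTTP headers (traceparent, tracestate). On the receiving side, the Flask instrumentation extracts those headers and continues the same trace. Both services must therefore be instrumented and must export to the same backend. Verify this in the Jaeger waterfall: the validator's server span should appear as a child of the frontend's client span, linked by its parent span ID.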
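
The header that carries the context is small enough to build and parse by hand, which helps when debugging propagation with curl or a proxy log. The sketch below follows the W3C Trace Context layout for version 00 (version, 32-hex trace ID, 16-hex parent span ID, 2-hex flags). The function names are made up for this example, and it handles only version 00; the instrumentation libraries do all of this for you.

```python
import re
import secrets

# version-traceid-parentid-flags, lowercase hex only
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def build_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Format a version-00 traceparent header value."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str):
    """Return the trace context as a dict, or None if the header is invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if not match:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are defined as invalid
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


# Caller side: the frontend's client span produces the header
trace_id = secrets.token_hex(16)
frontend_span_id = secrets.token_hex(8)
header = build_traceparent(trace_id, frontend_span_id)

# Receiver side: the validator continues the same trace under a new span ID
incoming = parse_traceparent(header)
validator_span = {
    "trace_id": incoming["trace_id"],
    "parent_span_id": incoming["parent_span_id"],
    "span_id": secrets.token_hex(8),
}
print(header)
print(validator_span["trace_id"] == trace_id)
```

The shared trace ID is what places both services in one waterfall; the parent span ID is what draws the link between them.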


Learning Path

flowchart LR
    A[Install Jaeger] --> B[Create Services]
    B --> C[Add Instrumentation]
    C --> D[Generate Traffic]
    D --> E[Analyze Traces]
    E --> F[Add Custom Spans]
    C -.-> G[Auto-Instrumentation]
    C -.-> H[Manual Instrumentation]
    style A fill:#4a90d9,color:#fff
    style E fill:#e67e22,color:#fff

Common Errors

  1. Spans do not appear in Jaeger -- The OTLP exporter endpoint is incorrect. Verify Jaeger is listening on port 4317 (docker logs jaeger).

  2. Trace context is not propagated between services -- The outgoing HTTP client in the calling service, or the web framework in the receiving service, is not instrumented. Install opentelemetry-instrumentation-requests and call RequestsInstrumentor().instrument() in the caller, and instrument the receiving Flask app as well.

  3. All spans appear under one service (unknown_service) -- The service names are not set. Configure the OTEL_SERVICE_NAME environment variable, or the service.name resource attribute, for each process.

  4. Spans are missing or have no duration -- The span was never ended. Use the tracer.start_as_current_span context manager, or call span.end() yourself when you use start_span.

  5. Too many spans causing performance drop -- The sampling rate is too high. Set OTEL_TRACES_SAMPLER=parentbased_traceidratio and OTEL_TRACES_SAMPLER_ARG=0.1 for 10% sampling.

  6. gRPC OTLP exporter fails to connect -- The backend does not support gRPC or the endpoint is wrong. Use the HTTP/protobuf exporter (opentelemetry-exporter-otlp-proto-http) instead.

  7. Custom attributes are not indexed -- Jaeger does not index all attributes by default. Configure Jaeger to index the specific attribute keys you need.
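
To illustrate the sampling setting from item 5, the sketch below shows the idea behind a trace-ID ratio decision: the sampler derives a yes or no from the trace ID itself, so every service that sees the same trace ID reaches the same answer. This is a simplified illustration of the concept, and the SDK's own sampler may differ in detail. The parent-based part of the setting (child spans follow their parent's decision) is not modeled, and should_sample is a name invented for this example.

```python
import random

MAX_64 = 1 << 64


def should_sample(trace_id: int, ratio: float) -> bool:
    """Deterministic ratio decision based on the low 64 bits of the trace ID."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    bound = round(ratio * MAX_64)
    return (trace_id & (MAX_64 - 1)) < bound


# Simulate 10,000 random 128-bit trace IDs with a fixed seed
rng = random.Random(42)
trace_ids = [rng.getrandbits(128) for _ in range(10_000)]
kept = sum(should_sample(t, 0.1) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

Because the decision depends only on the trace ID, a trace is either kept whole or dropped whole, which avoids waterfalls with missing spans.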


Practice Questions

  1. What is a trace in OpenTelemetry? Answer: A trace represents the complete path of a single request as it travels through multiple services, composed of spans.

  2. What is the difference between a span and an event? Answer: A span represents a unit of work with start and end time; an event is a timestamped annotation within a span.

  3. How does context propagation work across HTTP calls? Answer: OpenTelemetry injects trace context into HTTP headers (traceparent, tracestate) that downstream services extract to continue the trace.

  4. What are the three pillars of observability in OpenTelemetry? Answer: Traces, metrics, and logs -- OpenTelemetry supports all three.

  5. Why should you use sampling in production tracing? Answer: To reduce storage and performance overhead while still capturing representative request patterns.


Challenge

Build a three-microservice application (frontend, payment, inventory) where each service is instrumented with OpenTelemetry. Add manual spans for business logic (e.g., "validate_payment", "check_inventory", "update_stock"). Include span attributes that capture order IDs and amounts. Inject a simulated failure in the inventory service and verify that the trace clearly shows where the error occurred. Export all traces to Jaeger and create a screenshot of the waterfall view.


FAQ

What is the difference between OpenTelemetry and OpenTracing?

OpenTelemetry is the merger of OpenTracing and OpenCensus. It provides a single, unified standard. OpenTracing is deprecated in favor of OpenTelemetry.

Does OpenTelemetry support logs?

Yes, OpenTelemetry has a logs signal API and SDK. However, log collection is still evolving compared to traces and metrics.

Can I use OpenTelemetry with non-cloud applications?

Absolutely. OpenTelemetry works on bare metal, VMs, containers, and serverless environments. It is platform-agnostic.

What languages does OpenTelemetry support?

Officially supported languages include Python, Java, Go, JavaScript/TypeScript, .NET, Ruby, PHP, Rust, C++, and Erlang.

How does OpenTelemetry handle sensitive data in traces?

You can configure span processors with custom filters to redact or drop attributes that contain sensitive information.
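
The filtering logic itself can be very small. The sketch below masks a hypothetical set of sensitive attribute keys in a plain dictionary. The key list and the function name are assumptions made for illustration, and how you hook such a filter into your pipeline depends on the SDK or Collector you use. One simple option is to apply it before passing values to span.set_attribute.

```python
# Hypothetical keys; adjust to whatever your services actually record
SENSITIVE_KEYS = frozenset({"user.email", "http.request.header.authorization", "db.statement"})


def redact_attributes(attributes: dict, sensitive_keys=SENSITIVE_KEYS, mask: str = "[REDACTED]") -> dict:
    """Return a copy of attributes with the values of sensitive keys masked."""
    return {
        key: (mask if key in sensitive_keys else value)
        for key, value in attributes.items()
    }


raw = {
    "user.id": "demo-user",
    "user.email": "demo@example.com",
    "db.system": "postgresql",
    "db.statement": "SELECT * FROM users",
}
safe = redact_attributes(raw)
print(safe)
```

Masking keeps the key visible, so you can still see that the attribute was recorded; dropping the key entirely is the stricter alternative.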
