OpenTelemetry Tracing: Distributed Tracing Setup Guide
This guide walks through setting up distributed tracing with OpenTelemetry. It covers the key concepts, a practical two-service example, and best practices for applying tracing effectively.
What You Will Learn
This tutorial teaches you how to instrument a microservice application with OpenTelemetry, export traces to a backend, and analyze distributed request flows to identify latency bottlenecks.
Why It Matters
In a monolithic application, a single stack trace tells you where a request failed. In a microservices system, a single request can cross ten services. Without distributed tracing, finding the root cause of slowness is like finding a needle in a haystack.
Real-World Use
The DodaTech sync service handles file synchronization across millions of devices. When users reported slow syncs, distributed tracing revealed that an upstream authentication service was adding 800ms of latency per request -- something no individual service dashboard could have shown.
OpenTelemetry is the industry standard for observability. It provides a single set of APIs, SDKs, and tools for generating, collecting, and exporting telemetry data (traces, metrics, logs). It is a Cloud Native Computing Foundation (CNCF) incubating project and the second most active CNCF project after Kubernetes.
Prerequisites
- A Kubernetes cluster or Docker environment
- Basic knowledge of microservices architecture
- A running backend for trace storage (Jaeger or Zipkin)
- Python 3.10+ or Node.js 18+ for the sample application
Step-by-Step Tutorial
Step 1: Deploy the Trace Backend with Jaeger
docker run -d --name jaeger \
-e COLLECTOR_OTLP_ENABLED=true \
-p 16686:16686 \
-p 4317:4317 \
-p 4318:4318 \
jaegertracing/all-in-one:1.57
Expected output: Jaeger starts with the UI on port 16686 and OTLP ingest on ports 4317 (gRPC) and 4318 (HTTP).
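Before moving on, it can help to confirm that the published ports actually accept connections. The following is a minimal sketch using only the Python standard library. The helper name port_is_open is hypothetical, and the host and ports assume the docker run command above was executed on the local machine.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports published by the docker run command (assumed to be on localhost).
    for name, port in [("Jaeger UI", 16686), ("OTLP gRPC", 4317), ("OTLP HTTP", 4318)]:
        state = "open" if port_is_open("localhost", port) else "closed"
        print(f"{name:10} port {port}: {state}")
```

An open port only shows that something is listening; docker logs jaeger remains the better check that the listener is Jaeger itself.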
Step 2: Create a Sample Python Application
Create a file app.py:
from flask import Flask
import requests
import time
import random

app = Flask(__name__)

@app.route("/")
def home():
    return {"service": "frontend", "status": "ok"}

@app.route("/process")
def process():
    time.sleep(random.uniform(0.1, 0.5))
    resp = requests.get("http://localhost:5001/validate")
    return {"result": "processed", "validate": resp.json()}

if __name__ == "__main__":
    app.run(port=5000)
Step 3: Instrument with OpenTelemetry
Install the OpenTelemetry SDK and Flask instrumentation:
pip install opentelemetry-api opentelemetry-sdk \
opentelemetry-instrumentation-flask \
opentelemetry-instrumentation-requests \
opentelemetry-exporter-otlp-proto-grpc
Update app.py with instrumentation. Put the setup at the top of the file, in place of the original app = Flask(__name__) line:
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor
# Create the tracer provider and export spans in batches over OTLP/gRPC to Jaeger.
provider = TracerProvider()
processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317"))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
app = Flask(__name__)
# Create server spans for incoming Flask requests.
FlaskInstrumentor().instrument_app(app)
# Create client spans for outgoing requests calls and inject trace context headers.
RequestsInstrumentor().instrument()
Step 4: Create a Downstream Service
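Create validator.py. It needs the same tracing setup as app.py; otherwise it emits no spans and the trace stops at the frontend's outgoing HTTP call:
from flask import Flask
import time
import random
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317")))
trace.set_tracer_provider(provider)
app = Flask(__name__)
# Extracts the incoming trace context so this span joins the caller's trace.
FlaskInstrumentor().instrument_app(app)

@app.route("/validate")
def validate():
    time.sleep(random.uniform(0.05, 0.2))
    return {"valid": True}

if __name__ == "__main__":
    app.run(port=5001)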
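Run both services, giving each process a name through OTEL_SERVICE_NAME so they can be told apart in Jaeger:
OTEL_SERVICE_NAME=frontend python app.py &
OTEL_SERVICE_NAME=validator python validator.py &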
Step 5: Generate Traffic and View Traces
for i in $(seq 1 20); do curl http://localhost:5000/process; done
Open http://localhost:16686 (Jaeger UI). Search for traces by service name; this is the value of OTEL_SERVICE_NAME for the process, or unknown_service if the variable was not set. Click on a trace to see the waterfall view showing both services.
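When reading a waterfall, the useful number is often a span's self time: its duration minus the time spent in its direct children. The sketch below computes that for a simplified span list. The Span dataclass, the span names, and the timings are all made up for illustration and are not Jaeger's export format; the calculation also assumes sibling spans do not overlap and that span names are unique.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    name: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def self_times(spans: list[Span]) -> dict[str, float]:
    """Map span name -> duration minus the total duration of its direct children."""
    child_total: dict[str, float] = {}
    for s in spans:
        if s.parent_id is not None:
            child_total[s.parent_id] = child_total.get(s.parent_id, 0.0) + s.duration_ms
    return {s.name: s.duration_ms - child_total.get(s.span_id, 0.0) for s in spans}


# Hypothetical trace: the frontend handler calls the validator over HTTP.
trace_spans = [
    Span("a", None, "GET /process", 0, 450),
    Span("b", "a", "GET (requests)", 300, 440),
    Span("c", "b", "GET /validate", 310, 430),
]

times = self_times(trace_spans)
slowest = max(times, key=times.get)
print(times)
print("most self time:", slowest)
```

In this invented trace the frontend handler accounts for 310 ms on its own, the validator for 120 ms, and the network hop for only 20 ms, so the first place to look would be the frontend's own code.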
Step 6: Add Manual Spans for Business Logic
tracer = trace.get_tracer(__name__)

# Replaces the earlier process() handler in app.py.
@app.route("/process")
def process():
    with tracer.start_as_current_span("business_logic") as span:
        span.set_attribute("user.id", "demo-user")
        time.sleep(random.uniform(0.1, 0.5))
        resp = requests.get("http://localhost:5001/validate")
        return {"result": "processed", "validate": resp.json()}
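The with form matters because the span is ended when the block exits, even if the handler raises. The toy model below mimics that behaviour using only the standard library. ToySpan, start_span, and the finished list are hypothetical names for illustration and are not part of the OpenTelemetry API; the real SDK's context manager does considerably more.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToySpan:
    name: str
    start: float = field(default_factory=time.monotonic)
    end: Optional[float] = None
    error: Optional[str] = None


finished: list[ToySpan] = []


@contextmanager
def start_span(name: str):
    span = ToySpan(name)
    try:
        yield span
    except Exception as exc:
        span.error = repr(exc)  # record the failure on the span
        raise                   # then let the exception propagate
    finally:
        span.end = time.monotonic()  # always runs, so the span always gets an end time
        finished.append(span)


try:
    with start_span("business_logic"):
        raise ValueError("simulated failure")
except ValueError:
    pass

span = finished[0]
print(span.name, "ended:", span.end is not None, "error:", span.error)
```

A span started manually and never ended has no such guarantee, which is one way spans end up without duration information.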
Step 7: Add Span Attributes and Events
with tracer.start_as_current_span("db_query") as span:
    span.set_attribute("db.system", "postgresql")
    span.set_attribute("db.statement", "SELECT * FROM users")
    span.add_event("query_start", {"query_id": "q001"})
    time.sleep(0.3)
    span.add_event("query_end", {"rows_returned": 42})
Step 8: Propagate Context Across HTTP Calls
OpenTelemetry automatically handles context propagation through instrumented libraries like requests. Verify by looking at the Jaeger waterfall -- you should see the frontend span connected to the validator span by a parentSpanId link.
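Under the hood, the link between the two services travels in the W3C Trace Context traceparent header, which has the layout version-traceid-parentid-flags. The sketch below builds and parses that header by hand to show what the instrumented libraries exchange. The function and class names are hypothetical, it handles only the basic four-field layout, and in a real service the OpenTelemetry propagators do this for you.

```python
import re
import secrets
from typing import NamedTuple, Optional

# version (2 hex) - trace id (32 hex) - parent span id (16 hex) - flags (2 hex)
_TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceContext(NamedTuple):
    trace_id: str   # shared by every span in the trace
    parent_id: str  # span id of the caller
    sampled: bool


def build_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str) -> Optional[TraceContext]:
    m = _TRACEPARENT_RE.match(header.strip())
    if m is None:
        return None
    version, trace_id, parent_id, flags = m.groups()
    # Version ff and all-zero ids are invalid according to the specification.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceContext(trace_id, parent_id, bool(int(flags, 16) & 0x01))


# Caller side: generate ids and build the header for the outgoing request.
trace_id = secrets.token_hex(16)
span_id = secrets.token_hex(8)
header = build_traceparent(trace_id, span_id)

# Callee side: parse the header and continue the same trace.
ctx = parse_traceparent(header)
print(header)
print(ctx)
```

The callee creates its own span with a new span id, reuses trace_id, and records parent_id as its parent, which is what produces the connected waterfall in Jaeger.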
Learning Path
flowchart LR
A[Install Jaeger] --> B[Create Services]
B --> C[Add Instrumentation]
C --> D[Generate Traffic]
D --> E[Analyze Traces]
E --> F[Add Custom Spans]
C -.-> G[Auto-Instrumentation]
C -.-> H[Manual Instrumentation]
style A fill:#4a90d9,color:#fff
style E fill:#e67e22,color:#fff
Common Errors
- Spans do not appear in Jaeger -- The OTLP exporter endpoint is incorrect. Verify Jaeger is listening on port 4317 (docker logs jaeger).
- Trace context is not propagated between services -- The downstream HTTP library is not instrumented. Install the opentelemetry-instrumentation-requests package.
- All spans appear in a single trace root -- The client and server service names are not set. Configure the OTEL_SERVICE_NAME environment variable for each process.
- Spans have no duration information -- The span was not properly closed. Use the tracer.start_as_current_span context manager instead of manual start_span.
- Too many spans causing performance drop -- The sampling rate is too high. Set OTEL_TRACES_SAMPLER=parentbased_traceidratio and OTEL_TRACES_SAMPLER_ARG=0.1 for 10% sampling.
- gRPC OTLP exporter fails to connect -- The backend does not support gRPC or the endpoint is wrong. Use the HTTP/protobuf exporter (opentelemetry-exporter-otlp-proto-http) instead.
- Custom attributes are not indexed -- Jaeger does not index all attributes by default. Configure Jaeger to index the specific attribute keys you need.
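The sampling setting above is easier to reason about with the idea behind a trace-ID ratio sampler in view: the keep-or-drop decision is a deterministic function of the trace ID, so every service that sees the same trace reaches the same answer. The sketch below illustrates that idea with the standard library only. The should_sample function is a hypothetical illustration of the technique, not the SDK's implementation, and the parent-based part (child spans following the caller's decision) is not modelled.

```python
import random

TRACE_ID_LOW_64_MASK = (1 << 64) - 1


def should_sample(trace_id: int, ratio: float) -> bool:
    """Keep a trace when the low 64 bits of its id fall below ratio * 2**64."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    bound = round(ratio * (1 << 64))
    return (trace_id & TRACE_ID_LOW_64_MASK) < bound


rng = random.Random(42)  # fixed seed so the demo is repeatable
trace_ids = [rng.getrandbits(128) for _ in range(10_000)]
kept = sum(should_sample(t, 0.1) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

With a ratio of 0.1 roughly one trace in ten is kept, and because the decision depends only on the trace ID, a kept trace is kept in full rather than losing spans at random.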
Practice Questions
What is a trace in OpenTelemetry? Answer: A trace represents the complete path of a single request as it travels through multiple services, composed of spans.
What is the difference between a span and an event? Answer: A span represents a unit of work with start and end time; an event is a timestamped annotation within a span.
How does context propagation work across HTTP calls? Answer: OpenTelemetry injects trace context into HTTP headers (traceparent, tracestate) that downstream services extract to continue the trace.
What are the three pillars of observability in OpenTelemetry? Answer: Traces, metrics, and logs. OpenTelemetry supports all three.
Why should you use sampling in production tracing? Answer: To reduce storage and performance overhead while still capturing representative request patterns.
Challenge
Build a three-microservice application (frontend, payment, inventory) where each service is instrumented with OpenTelemetry. Add manual spans for business logic (e.g., "validate_payment", "check_inventory", "update_stock"). Include span attributes that capture order IDs and amounts. Inject a simulated failure in the inventory service and verify that the trace clearly shows where the error occurred. Export all traces to Jaeger and create a screenshot of the waterfall view.