Datadog Introduction: APM and Infrastructure Monitoring
This tutorial introduces Datadog's APM and infrastructure monitoring features through key concepts, practical examples, and best practices.
What You Will Learn
This tutorial teaches you how to set up Datadog for infrastructure monitoring, application performance monitoring (APM), log management, and dashboard creation -- all through a single agent.
Why It Matters
Many observability setups require separate agents, backends, and dashboards for metrics, traces, and logs. Datadog unifies all three signals in one platform, which reduces operational overhead and makes it faster to correlate them during incident investigations.
Real-World Use
The Doda Browser team uses Datadog APM to trace every API request from the browser client through the backend services. When a user reports slowness, the team finds the trace, identifies the slowest span, and sees the host-level CPU and memory metrics alongside the trace -- all in one view.
Datadog is a SaaS-based monitoring and analytics platform. It provides infrastructure monitoring, APM, log management, synthetic monitoring, and security monitoring. The Datadog Agent is installed on hosts and collects metrics, traces, and logs, forwarding them to the Datadog backend.
Prerequisites
- A Datadog account (14-day free trial available)
- A Linux server or local VM
- Python 3.8+ for the sample application
- Basic familiarity with Prometheus or another monitoring tool
Step-by-Step Tutorial
Step 1: Install the Datadog Agent
DD_API_KEY=your_api_key_here DD_SITE="datadoghq.com" bash -c "$(curl -L https://install.datadoghq.com/scripts/install_script_agent7.sh)"
Expected output: The Agent installs and starts. Verify with sudo datadog-agent status.
Step 2: Verify Agent Installation
sudo datadog-agent status | head -20
Look for "Running" in the Agent status. Open the Datadog web dashboard and navigate to Infrastructure > Host Map. Your host should appear.
Step 3: Enable Integrations
Datadog provides 700+ integrations. Enable common ones:
# Enable the Redis integration
sudo cp /etc/datadog-agent/conf.d/redisdb.d/conf.yaml.example \
/etc/datadog-agent/conf.d/redisdb.d/conf.yaml
sudo vi /etc/datadog-agent/conf.d/redisdb.d/conf.yaml
# conf.yaml contents: point the check at your Redis instance
init_config:

instances:
  - host: localhost
    port: 6379
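# Enable the Nginx integration (requires the NGINX stub_status module;
# set nginx_status_url in conf.yaml to that status endpoint)
sudo cp /etc/datadog-agent/conf.d/nginx.d/conf.yaml.example \
/etc/datadog-agent/conf.d/nginx.d/conf.yaml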
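An integration check only reports data when its target service is reachable. Before restarting the Agent, a quick pre-check like the sketch below can confirm that Redis answers on the host and port you put in conf.yaml. It uses only the Python standard library, and the helper name redis_ping is hypothetical (not part of Datadog). It sends an inline PING and expects +PONG; a Redis server that requires a password replies with an error instead, which this sketch reports as unreachable.

```python
import socket


def redis_ping(host: str = "localhost", port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if a Redis server at host:port answers PING with +PONG.

    Hypothetical pre-check helper, not part of the Datadog Agent.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # Redis accepts "inline" commands, so a bare PING line is enough.
            sock.sendall(b"PING\r\n")
            reply = sock.recv(64)
    except OSError:
        # Connection refused, timeout, DNS failure, and similar errors.
        return False
    # A healthy, unauthenticated server replies with the simple string +PONG.
    return reply.startswith(b"+PONG")


if __name__ == "__main__":
    print("redis reachable:", redis_ping())
```

If this returns False, fix Redis connectivity first; restarting the Agent will not make the redisdb check report data.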
Step 4: Restart the Agent
sudo systemctl restart datadog-agent
Step 5: Instrument a Python Application with APM
pip install ddtrace flask
Create app.py:
from flask import Flask
import time
import random

app = Flask(__name__)


@app.route("/")
def home():
    return {"message": "Hello from Datadog"}


@app.route("/process")
def process():
    # Simulate 100-300 ms of work so traces show variable latency
    time.sleep(random.uniform(0.1, 0.3))
    return {"status": "processed"}


if __name__ == "__main__":
    app.run(port=5000)
Run with ddtrace:
DD_SERVICE="my-app" DD_ENV="production" DD_VERSION="1.0" \
ddtrace-run python app.py
Generate traffic:
for i in $(seq 1 50); do curl http://localhost:5000/process; done
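The curl loop only generates load. The hypothetical helpers below do the same from Python while also recording client-side latency, which gives you a rough number to compare with the latency Datadog shows for my-app. They use only the standard library and assume the app from Step 5 is listening on localhost:5000. The percentile function uses the nearest-rank method, which may differ slightly from how Datadog computes its percentiles.

```python
import math
import time
import urllib.request
from typing import List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile (pct in 0-100) of a non-empty list."""
    ordered = sorted(samples)
    # Smallest value with at least pct% of the samples at or below it.
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def generate_traffic(url: str, requests: int = 50) -> List[float]:
    """Send GET requests sequentially and return client-side latencies in ms."""
    latencies = []
    for _ in range(requests):
        start = time.perf_counter()
        with urllib.request.urlopen(url, timeout=5) as resp:
            resp.read()  # read the body so the full response time is measured
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies
```

For example, percentile(generate_traffic("http://localhost:5000/process"), 99) returns the p99 latency in milliseconds over 50 requests. Client-side numbers include network and HTTP overhead, so expect them to be a little higher than the span durations Datadog reports.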
Step 6: View Traces in Datadog
In the Datadog web dashboard:
- Navigate to APM > Traces
- Select the my-app service
- Click on a trace to see the waterfall view
Step 7: Add Custom Spans and Span Metrics
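# Add to app.py, above the `if __name__ == "__main__":` block
from ddtrace import tracer


@app.route("/custom")
def custom():
    # Open a span named "custom.operation"; under ddtrace-run it nests inside the Flask request span
    with tracer.trace("custom.operation", service="my-app") as span:
        span.set_tag("user.id", "demo-user")
        start = time.monotonic()
        time.sleep(0.2)
        # Span metrics are numeric values stored on this span, measured here in ms
        span.set_metric("custom.processing_time", (time.monotonic() - start) * 1000)
    return {"custom": "done"}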
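A span metric lives on individual spans. For a standalone custom metric you can graph and alert on, the usual route is the Agent's DogStatsD listener (UDP port 8125 by default), normally through the official datadog Python package (from datadog import statsd). To show what that client sends, here is a standard-library sketch of the DogStatsD datagram format; the function names are hypothetical, and a production app should use the official client instead.

```python
import socket
from typing import Dict, Optional

# DogStatsD metric types: count, gauge, timer, histogram, set, distribution
_VALID_TYPES = {"c", "g", "ms", "h", "s", "d"}


def format_dogstatsd(name: str, value: float, metric_type: str = "g",
                     tags: Optional[Dict[str, str]] = None,
                     sample_rate: float = 1.0) -> str:
    """Build one datagram: <name>:<value>|<type>[|@<rate>][|#k:v,...]."""
    if metric_type not in _VALID_TYPES:
        raise ValueError(f"unsupported metric type: {metric_type}")
    parts = [f"{name}:{value}", metric_type]
    if sample_rate != 1.0:
        parts.append(f"@{sample_rate}")
    if tags:
        parts.append("#" + ",".join(f"{k}:{v}" for k, v in tags.items()))
    return "|".join(parts)


def send_dogstatsd(datagram: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send to the local Agent's DogStatsD listener."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))
```

For example, send_dogstatsd(format_dogstatsd("custom.processing_time", 200, "h", {"service": "my-app"})) sends custom.processing_time:200|h|#service:my-app. UDP keeps the send non-blocking for the app, at the cost of silently dropping metrics if the Agent is not listening.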
Step 8: Create a Dashboard
- In Datadog, go to Dashboards > New Dashboard
- Add a Timeseries widget with the metric system.cpu.user
- Add an APM trace search widget showing traces from my-app
- Add a Log Stream widget showing recent error logs
- Set template variables for env and service
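The same dashboard can also be created programmatically. The sketch below assumes the v1 Dashboards API (POST /api/v1/dashboard on api.datadoghq.com, authenticated with DD-API-KEY and DD-APPLICATION-KEY headers). The helper names, widget titles, and queries are illustrative, only the timeseries and log stream widgets are covered, and accounts on other Datadog sites use a different API host, so check the API reference for the full widget schema before relying on it.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumed endpoint for the US1 site; other sites use a different host.
API_URL = "https://api.datadoghq.com/api/v1/dashboard"


def build_dashboard(service: str) -> Dict[str, Any]:
    """Hypothetical helper: payload with a CPU timeseries and an error log stream."""
    return {
        "title": f"{service} overview",
        "layout_type": "ordered",
        "widgets": [
            {"definition": {
                "type": "timeseries",
                "title": "CPU user %",
                # $env is filled in from the template variable selector
                "requests": [{"q": "avg:system.cpu.user{$env} by {host}",
                              "display_type": "line"}],
            }},
            {"definition": {
                "type": "log_stream",
                "title": "Recent errors",
                "query": f"service:{service} status:error",
            }},
        ],
        "template_variables": [
            {"name": "env", "prefix": "env"},
            {"name": "service", "prefix": "service"},
        ],
    }


def create_dashboard(payload: Dict[str, Any], api_key: str, app_key: str) -> Dict[str, Any]:
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "DD-API-KEY": api_key,
                 "DD-APPLICATION-KEY": app_key},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

Keeping dashboards as code this way lets you review and version them alongside the application, instead of rebuilding them by hand in each environment.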
Step 9: Set Up Monitors and Alerts
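- Go to Monitors > New Monitor
- Choose Metric Monitor
- Define the metric: avg:system.cpu.user{*} by {host}
- Set the alert condition: above 80 on average over the last 5 minutes (equivalent query: avg(last_5m):avg:system.cpu.user{*} by {host} > 80)
- Configure the notification message with a handle such as @slack-<channel> or @pagerduty-<service>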
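To make the alert condition concrete, here is a simplified model of how a query such as avg(last_5m):avg:system.cpu.user{*} by {host} > 80 behaves: each host's samples in the 5-minute window are averaged, then compared with the thresholds, producing one alert per host. This is an illustrative sketch with hypothetical function names, not Datadog's evaluator; it ignores evaluation delay, recovery thresholds, and no-data timeouts.

```python
from typing import Dict, List, Optional


def evaluate_window(values: List[float], critical: float,
                    warning: Optional[float] = None) -> str:
    """Classify one host's window by its average (simplified model)."""
    if not values:
        return "NO DATA"
    avg = sum(values) / len(values)
    if avg > critical:
        return "ALERT"
    if warning is not None and avg > warning:
        return "WARN"
    return "OK"


def evaluate_by_host(series: Dict[str, List[float]], critical: float,
                     warning: Optional[float] = None) -> Dict[str, str]:
    """Mimic "by {host}": evaluate each host's window independently."""
    return {host: evaluate_window(vals, critical, warning)
            for host, vals in series.items()}
```

Because the window is averaged, a single 95% spike in an otherwise quiet window does not fire. If you want the alert to fire only when every sample exceeds the threshold, Datadog offers an "at all times" evaluation option instead of "on average".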
Learning Path
flowchart LR
A[Install Datadog Agent] --> B[Enable Integrations]
B --> C[Infrastructure Metrics]
A --> D[APM Instrumentation]
D --> E[Distributed Traces]
A --> F[Log Collection]
C --> G[Dashboards]
E --> G
F --> G
G --> H[Monitors & Alerts]
style A fill:#4a90d9,color:#fff
style H fill:#e67e22,color:#fff
Common Errors
Agent status shows "not running" -- The Agent service failed to start. Check journalctl -u datadog-agent for error logs and verify that the API key is correct.
Host does not appear in the infrastructure list -- The Agent cannot connect to the Datadog backend. Verify network access to trace.agent.datadoghq.com and api.datadoghq.com.
APM traces do not appear -- The ddtrace library is not instrumenting the application correctly. Make sure you run the app with ddtrace-run or call patch_all().
Integration metrics are missing -- The integration configuration file has syntax errors or the target service is not running. Validate the YAML with yamllint.
Custom metrics are not queryable -- The metric name has a typo or a namespace mismatch. New custom metrics can take up to 10 minutes to appear.
High Agent CPU usage -- Too many integrations are enabled or the DogStatsD metric rate is too high. Disable unused integrations.
Logs not appearing in Datadog -- Log collection is not enabled in the datadog.yaml configuration. Set logs_enabled: true and restart the Agent.
Practice Questions
How does the Datadog Agent collect metrics? Answer: The Agent collects system metrics directly, pulls metrics from integration endpoints, and accepts custom metrics through dogstatsd.
What is ddtrace and how is it used? Answer: ddtrace is Datadog's tracing library; it auto-instruments Python applications for APM. It is invoked with ddtrace-run python app.py.
How do you enable log collection in Datadog? Answer: Set logs_enabled: true in datadog.yaml, configure log integration files, and restart the Agent.
What is the purpose of Datadog monitors? Answer: Monitors evaluate metric thresholds, anomaly conditions, or log patterns and trigger notifications when conditions are met.
How does Datadog APM correlate traces with infrastructure metrics? Answer: Every trace includes host and container metadata, allowing you to see CPU, memory, and network metrics alongside the trace Waterfall.
Challenge
Set up Datadog monitoring for a two-tier application (Flask API + PostgreSQL). Install the Datadog Agent, enable the PostgreSQL integration, instrument the Flask app with ddtrace, and configure log collection. Create a dashboard with: a timeseries of request latency p99, a table of slowest database queries, a heatmap of error rates by endpoint, and a log stream filtered to ERROR level. Set up monitors for: CPU > 80% (warning), Error rate > 5% (critical), and Database connection count > 100 (info). Verify everything works by simulating load and checking the dashboard.
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro