Prometheus Introduction: Metrics and Monitoring Explained
In this tutorial, you'll learn the fundamentals of metrics and monitoring with Prometheus. We cover key concepts, practical examples, and best practices to help you understand and apply them effectively.
What You Will Learn
This tutorial teaches you how Prometheus collects time-series metrics, scrapes targets, and provides a powerful query language for real-time monitoring of systems and applications.
Why It Matters
Without reliable metrics, teams operate blind. Prometheus gives you the data to detect slowdowns, predict capacity issues, and correlate outages with infrastructure changes.
Real-World Use
DodaTech uses Prometheus to monitor every production service -- from the Doda Browser sync servers to the Durga Antivirus Pro scan engines. Alerts from Prometheus trigger runbooks that resolve incidents before users notice.
Prometheus is a graduated Cloud Native Computing Foundation project purpose-built for time-series monitoring. It pulls metrics from instrumented targets at configurable intervals, stores them in its own time-series database (TSDB), and exposes a rich query language called PromQL. Unlike push-based systems, where every component must be configured with the address of the monitoring server, Prometheus uses a pull model: targets simply expose their metrics over HTTP and the server scrapes them. This simplifies discovery and reduces coupling between components.
Prerequisites
- Basic familiarity with Linux command line
- A server or local machine running Linux (Ubuntu 22.04+ recommended)
- Port 9090 accessible in your firewall
- Docker installed if you prefer the containerized setup
Step-by-Step Tutorial
Step 1: Install Prometheus
Download a release from prometheus.io (the commands below pin v2.53.0) or use Docker.
# Using direct download
wget https://github.com/prometheus/prometheus/releases/download/v2.53.0/prometheus-2.53.0.linux-amd64.tar.gz
tar xvf prometheus-2.53.0.linux-amd64.tar.gz
cd prometheus-2.53.0.linux-amd64
# Start the server in the foreground with the bundled config file
./prometheus --config.file=prometheus.yml
# Using Docker
docker run -d --name prometheus -p 9090:9090 prom/prometheus:v2.53.0
Expected output: Prometheus starts (in the foreground for the binary, detached for the container) and serves the web UI on port 9090.
Step 2: Understand the Configuration File
Prometheus reads a YAML configuration file. A minimal prometheus.yml looks like this:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
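This tells Prometheus to scrape its own metrics every 15 seconds (scrape_interval) and to evaluate recording and alerting rules at the same interval (evaluation_interval).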
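The scrape interval also fixes how many samples each series collects. The Python sketch below does that arithmetic for one series: how many samples fall inside a 5-minute range window such as the one rate() uses in Step 4, and how many accumulate per day. The helper names are hypothetical, and real scrapes jitter slightly, so treat the counts as approximate.

```python
from datetime import timedelta


def samples_in_window(window: timedelta, scrape_interval: timedelta) -> int:
    """Roughly how many samples of one series fall inside a range window."""
    return int(window / scrape_interval)


def samples_per_day(scrape_interval: timedelta) -> int:
    """How many samples one series accumulates per day at a fixed interval."""
    return int(timedelta(days=1) / scrape_interval)


interval = timedelta(seconds=15)  # scrape_interval: 15s from prometheus.yml
print(samples_in_window(timedelta(minutes=5), interval))  # 20
print(samples_per_day(interval))  # 5760
```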
Step 3: Verify the Web Interface
Open http://localhost:9090 in your browser. Navigate to Status > Targets. You should see one target with state UP.
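The same information is available from the Prometheus HTTP API at /api/v1/targets, which is handy for scripted checks. The Python sketch below fetches that endpoint and reduces the response to a map from job/instance to health. It assumes Prometheus is listening on localhost:9090 and relies only on the activeTargets, labels, and health fields of the response; the helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Download the JSON target list from a running Prometheus server."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def target_health(payload: dict) -> dict:
    """Map "job/instance" to each active target's health ("up", "down", or "unknown")."""
    health = {}
    for target in payload["data"]["activeTargets"]:
        labels = target["labels"]
        health[f'{labels["job"]}/{labels["instance"]}'] = target["health"]
    return health
```

With the server from Step 1 running, target_health(fetch_targets()) should contain the entry "prometheus/localhost:9090" with the value "up".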
Step 4: Run Your First PromQL Query
In the web UI, go to the Graph tab and enter:
prometheus_http_requests_total
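Click Execute. The table lists the current cumulative value of this counter, one row per handler and status code. Counters only ever increase (until a restart resets them), so the raw totals say little on their own. Wrap the metric in rate() instead: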
rate(prometheus_http_requests_total[5m])
This calculates the per-second request rate over a 5-minute window.
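To make the idea concrete, here is a simplified Python model of what rate() computes from the raw samples of one counter series: the total increase across the window divided by the elapsed time, treating any drop in value as a counter reset. It is a sketch, not the Prometheus implementation: the real rate() also extrapolates to the edges of the range window, so its results differ slightly. The function name simple_rate is hypothetical.

```python
from typing import List, Optional, Tuple


def simple_rate(samples: List[Tuple[float, float]]) -> Optional[float]:
    """Per-second increase of a counter from (unix_seconds, value) samples.

    Handles counter resets the way PromQL does, but skips the
    extrapolation to window edges that the real rate() performs.
    """
    if len(samples) < 2:
        return None  # like rate(), no result without at least two samples
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so the whole new value is growth.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Five scrapes 15 s apart with steady growth of one request per second.
print(simple_rate([(0, 100), (15, 115), (30, 130), (45, 145), (60, 160)]))  # 1.0
# The same window with a process restart between t=30 and t=45.
print(simple_rate([(0, 0), (15, 30), (30, 60), (45, 15), (60, 45)]))  # 1.75
```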
Step 5: Add a Node Exporter Target
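The Node Exporter exposes hardware and OS metrics, by default on port 9100. Install and start it on the same host, then add a second job to the existing scrape_configs list in prometheus.yml:
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]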
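Reload or restart Prometheus after updating the file.
# If running as a binary: SIGHUP tells Prometheus to reload its configuration
kill -HUP $(pidof prometheus)
# If running Docker: the container only sees your edits if prometheus.yml is mounted,
# e.g. -v "$(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml" on docker run
docker restart prometheus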
Check the Targets page to confirm the new target is UP.
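Every exporter, Node Exporter included, serves its metrics as plain text over HTTP at /metrics (try curl http://localhost:9100/metrics), and Prometheus pulls that page on each scrape. The Python sketch below renders one metric family in that text exposition format, to show what a hand-written exporter has to produce. It is illustrative only: the metric name app_jobs_processed_total is hypothetical, HELP text is not escaped, and a real exporter would normally use an official client library such as prometheus_client.

```python
from typing import Dict, Tuple

Labels = Tuple[Tuple[str, str], ...]  # ordered (label_name, label_value) pairs


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name: str, help_text: str, metric_type: str,
                  series: Dict[Labels, float]) -> str:
    """Render one metric family: HELP and TYPE comments, then one line per series."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in series.items():
        label_str = ",".join(f'{key}="{escape_label_value(val)}"' for key, val in labels)
        lines.append(f"{name}{{{label_str}}} {value}" if label_str else f"{name} {value}")
    return "\n".join(lines) + "\n"


print(render_metric(
    "app_jobs_processed_total",  # hypothetical custom metric
    "Jobs processed by the application.",
    "counter",
    {(("queue", "default"),): 42, (("queue", "email"),): 7},
), end="")
# Output:
# # HELP app_jobs_processed_total Jobs processed by the application.
# # TYPE app_jobs_processed_total counter
# app_jobs_processed_total{queue="default"} 42
# app_jobs_processed_total{queue="email"} 7
```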
Step 6: Query Node Metrics
node_cpu_seconds_total
node_memory_MemAvailable_bytes
node_filesystem_size_bytes
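node_cpu_seconds_total is a counter of cumulative CPU seconds per core and mode, while node_memory_MemAvailable_bytes and node_filesystem_size_bytes are gauges that report current values. Counters become meaningful as a rate, so apply rate() to the CPU metric: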
avg by (mode) (rate(node_cpu_seconds_total[5m]))
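Expected output: A table with one row per CPU mode (user, system, idle, iowait, and others such as nice, irq, softirq, and steal). Each value is the average fraction of a core's time spent in that mode, between 0 and 1.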
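Here is a Python sketch of what avg by (mode) does to those rates: rate() returns one series per CPU core and mode, and the aggregation groups them by the mode label, averages each group, and drops every other label (including cpu). The per-core numbers are made up for illustration, and a real host also reports modes such as system and iowait.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def avg_by(series: List[Tuple[Dict[str, str], float]], label: str) -> Dict[str, float]:
    """Average sample values per distinct value of `label`, dropping all other labels."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for labels, value in series:
        groups[labels[label]].append(value)
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical output of rate(node_cpu_seconds_total[5m]) on a 2-core host:
# one series per (cpu, mode) pair, in CPU-seconds per second.
cpu_rates = [
    ({"cpu": "0", "mode": "idle"}, 0.75),
    ({"cpu": "1", "mode": "idle"}, 0.5),
    ({"cpu": "0", "mode": "user"}, 0.25),
    ({"cpu": "1", "mode": "user"}, 0.5),
]
print(avg_by(cpu_rates, "mode"))  # {'idle': 0.625, 'user': 0.375}
```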
Step 7: Configure Recording Rules
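Recording rules precompute expensive queries on a schedule and store the results as new time series, so dashboards load faster. Create a file named rules.yml in the same directory as prometheus.yml: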
groups:
  - name: node_rules
    rules:
      - record: node:memory:percent_used
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
Reference it with a top-level rule_files entry in prometheus.yml, then reload Prometheus:
rule_files:
  - "rules.yml"
Now query node:memory:percent_used directly instead of writing the full expression each time.
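Under the hood the recording rule is ordinary arithmetic applied per series. The / operator pairs up MemAvailable and MemTotal series that carry identical label sets (typically the same job and instance), which is PromQL's one-to-one vector matching. The Python sketch below mimics that with dictionaries keyed by instance; the instance names and byte counts are hypothetical, and keying on a single label simplifies matching on the full label set.

```python
from typing import Dict


def percent_used(available: Dict[str, float], total: Dict[str, float]) -> Dict[str, float]:
    """(1 - available / total) * 100 for each instance present in both inputs."""
    # As with PromQL one-to-one matching, series without a partner are dropped.
    return {
        instance: (1 - available[instance] / total[instance]) * 100
        for instance in available.keys() & total.keys()
    }


GIB = 1024 ** 3
mem_available = {"web-1:9100": 4 * GIB, "db-1:9100": 1 * GIB}
mem_total = {"web-1:9100": 16 * GIB, "db-1:9100": 8 * GIB, "new-1:9100": 8 * GIB}
print(sorted(percent_used(mem_available, mem_total).items()))
# [('db-1:9100', 87.5), ('web-1:9100', 75.0)]
```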
Step 8: Set Up a Simple Alerting Rule
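Save the following rule group in its own file listed under rule_files, or add the node_alerts group to the groups list in rules.yml:
groups:
  - name: node_alerts
    rules:
      - alert: HighMemoryUsage
        expr: node:memory:percent_used > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage above 90%"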
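When node:memory:percent_used stays above 90 for five consecutive minutes, the alert moves from pending to firing. Prometheus only evaluates alerts; delivering notifications (email, chat, paging) is the job of Alertmanager, which you configure in the alerting section of prometheus.yml.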
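The for clause turns a rule into a small state machine: the alert is inactive while the expression is false, pending from the first evaluation where it is true, and firing once it has stayed true for the full for duration; a single false evaluation resets it. The Python class below is a simplified model of that behaviour for one series, with a hypothetical name; Prometheus tracks this state separately for every label set the expression returns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    hold_seconds: float                  # the rule's `for:` duration
    active_since: Optional[float] = None  # when the condition first became true

    def evaluate(self, now: float, condition: bool) -> str:
        """Return "inactive", "pending", or "firing" for one rule evaluation."""
        if not condition:
            self.active_since = None  # any false evaluation resets the timer
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= self.hold_seconds:
            return "firing"
        return "pending"


alert = AlertState(hold_seconds=300)  # for: 5m
print(alert.evaluate(0, True))     # pending
print(alert.evaluate(285, True))   # pending
print(alert.evaluate(300, True))   # firing
print(alert.evaluate(315, False))  # inactive
```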
Learning Path
flowchart LR
A[Install Prometheus] --> B[Configure Targets]
B --> C[Write PromQL Queries]
C --> D[Add Recording Rules]
D --> E[Create Alerts]
E --> F[Visualize in Grafana]
B -.-> G[Node Exporter]
G --> B
style A fill:#4a90d9,color:#fff
style E fill:#e67e22,color:#fff
Common Errors
Target shows DOWN on the Targets page -- The target port is not reachable. Check firewall rules and confirm that the exporter is running.
PromQL returns no data -- The metric name or label filter is incorrect. Use the metric explorer dropdown to verify names.
TSDB corruption after unclean shutdown -- Prometheus usually recovers, but always stop it cleanly (for example with systemctl stop prometheus) before shutting down the host.
High cardinality explosion -- A label with too many unique values (such as a user ID or email address) multiplies the number of time series and blows up the TSDB. Never use unbounded values as labels.
Recording rule not showing up -- Relative paths under rule_files in prometheus.yml are resolved relative to the config file's directory. Use absolute paths or verify the location, and validate the setup with promtool check config prometheus.yml.
rate() returns no results right after startup -- rate() needs at least two samples inside the range window. Wait at least two scrape intervals for enough data points.
Docker container exits immediately -- Check the logs with docker logs prometheus. The config file path inside the container must match the mounted volume.
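The cardinality warning is easiest to see with arithmetic: in the worst case a metric produces one time series per combination of label values, so the series count is the product of each label's number of distinct values. The Python sketch below does that multiplication; the label names and counts are hypothetical, and real counts are usually lower because not every combination occurs.

```python
from math import prod
from typing import Dict


def series_count(label_cardinalities: Dict[str, int]) -> int:
    """Worst-case series for one metric: product of distinct values per label."""
    return prod(label_cardinalities.values())


bounded = {"method": 5, "status_code": 10, "instance": 20}
print(series_count(bounded))  # 1000

unbounded = {**bounded, "user_id": 100_000}  # hypothetical number of users
print(series_count(unbounded))  # 100000000
```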
Practice Questions
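What scrape interval does the configuration in this tutorial use? Answer: 15 seconds, set by scrape_interval in the global section of prometheus.yml. If you omit it, Prometheus falls back to its built-in default of 1 minute.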
How does Prometheus differ from a push-based monitoring system? Answer: Prometheus uses a pull model -- it scrapes targets at intervals rather than waiting for targets to push metrics.
What PromQL function calculates the per-second average rate of increase? Answer: rate() -- it computes the per-second average rate of increase for counter metrics.
Why should you avoid using user IDs or email addresses as label values? Answer: They cause high cardinality, which dramatically increases TSDB storage and query time.
What is the difference between a recording rule and an alerting rule? Answer: Recording rules precompute frequently needed expressions; alerting rules fire notifications when conditions are met.
Challenge
Set up Prometheus to monitor a three-tier application (web server, API, database). Use three separate exporters -- Node Exporter for OS metrics, a custom exporter for your application, and the Postgres Exporter for database metrics. Write at least two recording rules that combine metrics from different exporters and one alerting rule that pages when API latency exceeds 500ms for 5 minutes. Verify everything is scraped correctly in the Targets page.
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro