Monitoring Pipelines: Telegraf, Vector, and Fluentd
This tutorial walks through building monitoring pipelines with Telegraf, Vector, and Fluentd, covering the core concepts, working configurations, and common failure modes for each tool.
What You Will Learn
This tutorial teaches you how to build monitoring pipelines using Telegraf (metrics collection), Vector (unified observability pipeline), and Fluentd (log collection), comparing their strengths and showing when to use each.
Why It Matters
Every observability stack needs a pipeline to collect data from sources, transform it, and route it to storage backends. The pipeline tool you choose affects your data quality, operational cost, and ability to adapt to new data sources.
Real-World Use
The DodaTech Observability team runs a hybrid pipeline: Telegraf collects infrastructure metrics from 500 servers, Vector processes application logs and enriches them with metadata, and Fluentd handles container logs from Kubernetes. All three send to a centralized Kafka cluster for buffering before storage.
A monitoring pipeline consists of three stages: collection (gathering data from sources), processing (transforming, filtering, enriching), and routing (sending to one or more destinations). Telegraf, Vector, and Fluentd are three widely used open-source pipeline tools.
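The three stages can be pictured as chained functions. The following is a minimal Python sketch, not how any of the three tools is implemented; the function names, the "web-1" host value, and the rule that an event with a "value" field counts as a metric are all assumptions made for illustration.

```python
import json
from typing import Dict, Iterable, Iterator, List

Event = Dict[str, object]


def collect(lines: Iterable[str]) -> Iterator[Event]:
    """Collection: wrap each raw source line in an event."""
    for line in lines:
        yield {"message": line}


def process(events: Iterable[Event], host: str) -> Iterator[Event]:
    """Processing: parse JSON, drop unparseable lines, enrich with a host."""
    for event in events:
        try:
            parsed = json.loads(str(event["message"]))
        except json.JSONDecodeError:
            continue  # filtering: skip lines that are not valid JSON
        if not isinstance(parsed, dict):
            continue
        parsed["host"] = host  # enrichment
        yield parsed


def route(events: Iterable[Event], sinks: Dict[str, List[Event]]) -> None:
    """Routing: send each event to a destination (hypothetical rule)."""
    for event in events:
        name = "metrics" if "value" in event else "logs"
        sinks[name].append(event)


def run_pipeline(lines: Iterable[str], host: str = "web-1") -> Dict[str, List[Event]]:
    sinks: Dict[str, List[Event]] = {"metrics": [], "logs": []}
    route(process(collect(lines), host), sinks)
    return sinks
```

Calling run_pipeline with a mix of JSON metric lines, JSON log lines, and plain text places the metrics and logs in separate sinks and silently drops the plain text.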
Prerequisites
- Docker and Docker Compose installed
- A Linux server (Ubuntu 22.04+) for installation
- Basic understanding of Prometheus and Grafana Loki
- Familiarity with YAML configuration
Step-by-Step Tutorial
Step 1: Deploy Telegraf for Metrics Collection
Create telegraf.conf:
[agent]
interval = "15s"
flush_interval = "15s"
[[inputs.cpu]]
percpu = true
totalcpu = true
[[inputs.mem]]
[[inputs.disk]]
ignore_fs = ["tmpfs", "devtmpfs", "overlay"]
[[inputs.net]]
[[outputs.influxdb]]
urls = ["http://influxdb:8086"]
database = "telegraf"
[[outputs.prometheus_client]]
listen = ":9273"
docker run -d --name telegraf \
-v $(pwd)/telegraf.conf:/etc/telegraf/telegraf.conf:ro \
-p 9273:9273 \
telegraf:1.31
Expected output: Telegraf collects CPU, memory, disk, and network metrics every 15 seconds and exposes them at http://localhost:9273/metrics.
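To check what the endpoint returns, it helps to read the Prometheus text format programmatically. Below is a minimal sketch of a parser for that format; it assumes simple label values without escaped quotes, and the sample metric names in the usage note are illustrative rather than captured from a real Telegraf instance.

```python
import re
from typing import Dict, List, Tuple

# name, optional {labels}, then the sample value (a trailing timestamp is ignored)
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_exposition(text: str) -> List[Sample]:
    """Parse Prometheus text exposition lines into (name, labels, value)."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything that does not look like a sample
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

For example, feeding it the two lines cpu_usage_idle{cpu="cpu-total",host="web-1"} 98.5 and mem_used_percent 41.2 yields one sample with two labels and one with none. Against a running container, the text could be fetched with urllib.request.urlopen("http://localhost:9273/metrics") and passed to the same function.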
Step 2: Use Telegraf Input Plugins
# Docker container stats
[[inputs.docker]]
endpoint = "unix:///var/run/docker.sock"
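# Consume metrics published to a Kafka topic
# (this plugin reads messages; it does not report consumer lag)
[[inputs.kafka_consumer]]
brokers = ["kafka:9092"]
topics = ["metrics-topic"]
consumer_group = "telegraf-consumer"
# Assumes producers write InfluxDB line protocol; change to match your data
data_format = "influx"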
# HTTP response time
[[inputs.http_response]]
urls = ["https://api.dodatech.com/health"]
response_timeout = "5s"
Step 3: Deploy Vector for Unified Pipelines
Create vector.toml:
[sources.app_logs]
type = "file"
include = ["/var/log/app/*.log"]
[transforms.parse_json]
type = "remap"
inputs = ["app_logs"]
source = '''
. = parse_json!(.message)
.host = get_hostname!()
'''
[transforms.filter_errors]
type = "filter"
inputs = ["parse_json"]
condition = '.level == "ERROR"'
[sinks.loki]
type = "loki"
inputs = ["filter_errors"]
endpoint = "http://loki:3100"
encoding.codec = "json"
labels = {service = "{{ .service }}", host = "{{ .host }}"}
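# Pass --config explicitly: recent Vector images may default to /etc/vector/vector.yaml
docker run -d --name vector \
-v $(pwd)/vector.toml:/etc/vector/vector.toml:ro \
-v /var/log:/var/log:ro \
timberio/vector:0.38.0-debian \
--config /etc/vector/vector.toml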
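The logic of the two transforms in vector.toml can be mirrored in plain Python to see what reaches the sink. This is a rough emulation under stated assumptions, not VRL: remap and keep_errors are hypothetical names, and how Vector itself treats a line that fails to parse depends on the remap transform's error settings, which this sketch does not model.

```python
import json
import socket
from typing import Dict, Iterable, List


def remap(event: Dict[str, str]) -> Dict[str, object]:
    """Emulate parse_json: replace the event with the parsed .message, add .host."""
    parsed = json.loads(event["message"])  # raises on invalid JSON
    if not isinstance(parsed, dict):
        raise ValueError("log line is JSON but not an object")
    parsed["host"] = socket.gethostname()
    return parsed


def keep_errors(event: Dict[str, object]) -> bool:
    """Emulate filter_errors: keep only events whose level is ERROR."""
    return event.get("level") == "ERROR"


def run(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Apply both steps in order and return what would reach the sink."""
    out = []
    for line in lines:
        event = remap({"message": line})
        if keep_errors(event):
            out.append(event)
    return out
```

Running it on one ERROR line and one INFO line returns only the ERROR event, now carrying the service field from the log body and a host field, which are the two fields the Loki labels refer to.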
Step 4: Deploy Fluentd for Log Collection
Create fluentd.conf:
<source>
@type forward
port 24224
bind 0.0.0.0
</source>
<source>
@type tail
path /var/log/containers/*.log
tag kubernetes.*
read_from_head true
<parse>
@type json
</parse>
</source>
<filter kubernetes.**>
@type record_transformer
<record>
hostname ${hostname}
</record>
</filter>
<match **>
@type elasticsearch
host elasticsearch
port 9200
logstash_format true
flush_interval 10s
</match>
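# The stock image reads /fluentd/etc/fluent.conf, so point FLUENTD_CONF at this file.
# It also does not bundle fluent-plugin-elasticsearch; build an image that installs it first.
docker run -d --name fluentd \
-p 24224:24224 \
-e FLUENTD_CONF=fluentd.conf \
-v $(pwd)/fluentd.conf:/fluentd/etc/fluentd.conf:ro \
-v /var/log:/var/log:ro \
fluent/fluentd:v1.17-1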
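The filter and match blocks in fluentd.conf select events by tag pattern. As documented for Fluentd, * matches exactly one tag part and ** matches zero or more parts. The Python function below is a simplified sketch of that rule for experimenting with patterns; it is not Fluentd's own matcher and leaves out the {a,b} alternation syntax and space-separated pattern lists.

```python
from typing import List


def tag_matches(pattern: str, tag: str) -> bool:
    """Return True if a Fluentd-style pattern matches a dot-separated tag."""
    return _match(pattern.split("."), tag.split("."))


def _match(pattern_parts: List[str], tag_parts: List[str]) -> bool:
    if not pattern_parts:
        return not tag_parts  # both exhausted means a full match
    head, rest = pattern_parts[0], pattern_parts[1:]
    if head == "**":
        # ** may consume zero or more tag parts
        return any(_match(rest, tag_parts[i:]) for i in range(len(tag_parts) + 1))
    if not tag_parts:
        return False
    if head == "*" or head == tag_parts[0]:
        # * consumes exactly one tag part; a literal must match exactly
        return _match(rest, tag_parts[1:])
    return False
```

This shows why the filter uses kubernetes.** rather than kubernetes.*: a tailed container log typically gets a multi-part tag such as kubernetes.var.log.containers.app.log, which the single-star pattern does not match.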
Step 5: Use Vector for Metrics Transformation
[sources.prometheus_metrics]
type = "prometheus_scrape"
endpoints = ["http://telegraf:9273/metrics"]
[transforms.aggregate_metrics]
type = "aggregate"
inputs = ["prometheus_metrics"]
# The aggregate transform takes its window in milliseconds
interval_ms = 60000
# Prometheus must be started with --web.enable-remote-write-receiver to accept these writes
[sinks.prometheus_remote_write]
type = "prometheus_remote_write"
inputs = ["aggregate_metrics"]
endpoint = "http://prometheus:9090/api/v1/write"
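To reason about what one aggregation window emits, here is a simplified Python model. It assumes the behaviour described in Vector's documentation for the aggregate transform: incremental metrics in a window are summed, and absolute metrics keep only the latest value. The tuple layout and function name are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# (series name, kind, value); kind is "incremental" or "absolute"
Metric = Tuple[str, str, float]


def aggregate_window(metrics: Iterable[Metric]) -> Dict[str, float]:
    """Collapse all metrics seen in one window to a single value per series."""
    out: Dict[str, float] = {}
    for name, kind, value in metrics:
        if kind == "incremental":
            out[name] = out.get(name, 0.0) + value  # deltas add up
        elif kind == "absolute":
            out[name] = value  # the latest reading replaces earlier ones
        else:
            raise ValueError(f"unknown metric kind: {kind}")
    return out
```

Under this model, four scrapes of an absolute gauge inside one 60-second window leave the sink with a single sample, which is how the transform reduces write volume.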
Step 6: Compare Pipeline Performance
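# Dry-run Telegraf: gather every input once and print the metrics to stdout
docker exec telegraf telegraf --test --config /etc/telegraf/telegraf.conf
# Sample live events leaving the app_logs source (requires the Vector API: [api] enabled = true)
docker exec vector vector tap app_logs
# Watch per-component event throughput in Vector (also requires the API)
docker exec -it vector vector top
Expected output: telegraf --test prints one batch of collected metrics and exits without sending them to any output, vector tap streams sampled events, and vector top shows events in and out per component. None of these commands reports a latency distribution; to compare throughput across tools, use the self-monitoring metrics from Step 8.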
Step 7: Set Up Kafka as a Buffer Layer
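services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.6.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.6.0
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      # Address that clients on the Compose network use to reach the broker
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      # Single broker: internal topics cannot use the default replication factor of 3
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    ports:
      - "9092:9092"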
Configure Vector to use Kafka as a sink:
[sinks.kafka]
type = "kafka"
inputs = ["parse_json"]
bootstrap_servers = "kafka:9092"
topic = "observability-logs"
encoding.codec = "json"
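Kafka works as a buffer because it keeps records in an append-only log and tracks a separate read position for each consumer group. The toy Python model below illustrates that idea for a single partition held in memory. It is not a Kafka client, and the TopicLog class and its methods are hypothetical names.

```python
from typing import Dict, List


class TopicLog:
    """Toy single-partition topic with one committed offset per consumer group."""

    def __init__(self) -> None:
        self._records: List[str] = []
        self._offsets: Dict[str, int] = {}

    def produce(self, record: str) -> None:
        """Append a record; producers never wait for consumers."""
        self._records.append(record)

    def poll(self, group: str, max_records: int = 10) -> List[str]:
        """Return the next batch for a group and advance only that group's offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def lag(self, group: str) -> int:
        """Records written but not yet read by this group."""
        return len(self._records) - self._offsets.get(group, 0)
```

If a hypothetical Loki writer polls three of five records while an Elasticsearch writer is down, the first group's lag is 2 and the second's is 5. When the second writer comes back, it still reads all five records from the start.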
Step 8: Monitor the Pipeline Itself
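# Telegraf self-monitoring (agent, gather, and write statistics)
[[inputs.internal]]
# Scrape Vector's own metrics. Assumes Vector runs an internal_metrics source
# feeding a prometheus_exporter sink on its default port 9598 (8686 is the API port)
[[inputs.prometheus]]
urls = ["http://vector:9598/metrics"]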
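Self-monitoring metrics are mostly counters, so a dashboard panel such as events per second is derived from the difference between scrapes. The following Python sketch shows that calculation with basic counter-reset handling. It is a simplification of what a Prometheus rate query does (it does no extrapolation), and the function name is hypothetical.

```python
from typing import List, Tuple


def per_second_rate(samples: List[Tuple[float, float]]) -> float:
    """Average per-second increase of a counter over (timestamp, value) samples.

    A drop in value is treated as a counter reset (for example, a tool
    restart), so the post-reset value counts as the increase for that step.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    return increase / elapsed
```

With scrapes 15 seconds apart reading 100, 250, and 400 events, the result is 10 events per second.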
Learning Path
flowchart LR
A[Data Sources] --> B{Collection Layer}
B --> C[Telegraf]
B --> D[Vector]
B --> E[Fluentd]
C --> F[Processing]
D --> F
E --> F
F --> G[Buffer / Kafka]
G --> H[Storage Backends]
H --> I[Prometheus]
H --> J[Loki]
H --> K[Elasticsearch]
style B fill:#4a90d9,color:#fff
style G fill:#e67e22,color:#fff
Common Errors
- Telegraf cannot connect to InfluxDB -- The urls value in the output config is incorrect or InfluxDB is not running. Verify the endpoint with curl.
- Vector VRL script fails to compile -- The syntax of the remap transformation is wrong. Run vector validate /etc/vector/vector.toml to check.
- Fluentd buffer fills up and blocks input -- The flush interval is too long or the output is slow. Shorten flush_interval or raise flush_thread_count so chunks drain faster.
- Telegraf plugin returns permission denied -- The input plugin needs access to a system file or socket. Add the Telegraf user to the required group, or run Telegraf as root if no group grants access.
- Vector drops events under load -- The buffer.type is set to memory with a small limit. Switch to a disk buffer for production deployments.
- Fluentd tag routing does not match -- The <match> pattern is too strict. Use ** for broad matching or specify exact tags.
- Kafka consumer lag grows unbounded -- The pipeline sink is slower than the source. Add partitions with matching consumers, or scale the processing layer.
Practice Questions
What are the three stages of a monitoring pipeline? Answer: Collection (gathering data), processing (transforming, filtering, enriching), and routing (sending to destinations).
When should you use Telegraf over Fluentd? Answer: Telegraf is the better fit for metrics collection (CPU, memory, disk) and ships with 300+ plugins. Fluentd is the better fit for log collection with tag-based routing and filtering.
What makes Vector different from Telegraf and Fluentd? Answer: Vector is a unified pipeline that handles both metrics and logs with a single binary, supporting VRL (Vector Remap Language) for transformations.
Why use Kafka as a buffer in the monitoring pipeline? Answer: Kafka decouples the collection from storage, providing durability during back-end outages and allowing multiple consumers to read the same data.
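How do you self-monitor the pipeline tool itself? Answer: All three tools can expose internal metrics (event rate, error rate, buffer size) for Prometheus to scrape once the feature is enabled: Telegraf through its internal input, Vector through the internal_metrics source, and Fluentd through monitor_agent or the Prometheus plugin.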
Challenge
Build a complete monitoring pipeline that collects metrics and logs from three sources: a Docker container (CPU/memory via Telegraf Docker input), an application log file (JSON structured logs via Vector), and a Kubernetes pod log (via Fluentd). Configure Vector to parse and enrich the application logs with hostname and timestamp. Route Telegraf metrics to Prometheus via remote write, Vector logs to Loki, and Fluentd logs to Elasticsearch. Add a Kafka buffer between the collection layer and the storage backends. Self-monitor all three pipeline tools with Prometheus and create a Grafana dashboard showing events per second, error rate, and buffer size for each tool.