Monitoring Pipelines: Telegraf, Vector, and Fluentd

DodaTech Updated 2026-06-23 6 min read

Telegraf, Vector, and Fluentd each move telemetry from where it is produced to where it is stored. This tutorial covers the key concepts, practical examples, and best practices for all three.

What You Will Learn

This tutorial teaches you how to build monitoring pipelines using Telegraf (metrics collection), Vector (unified observability pipeline), and Fluentd (log collection). It compares their strengths and shows when to use each.

Why It Matters

Every observability stack needs a pipeline to collect data from sources, transform it, and route it to storage backends. The pipeline tool you choose affects your data quality, your operational cost, and how easily you can add new data sources.

Real-World Use

The DodaTech Observability team runs a hybrid pipeline: Telegraf collects infrastructure metrics from 500 servers, Vector processes application logs and enriches them with metadata, and Fluentd handles container logs from Kubernetes. All three send to a centralized Kafka cluster for buffering before storage.

A monitoring pipeline consists of three stages: collection (gathering data from sources), processing (transforming, filtering, enriching), and routing (sending to one or more destinations). Telegraf, Vector, and Fluentd are three widely used open-source pipeline tools.
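
The three stages can be pictured with a small Python sketch. It is an illustration only, not how any of the three tools is implemented; the function names, the `web-1` host, and the in-memory lists standing in for storage backends are all hypothetical.

```python
from typing import Callable, Iterable

Event = dict


def collect(raw_lines: Iterable[str]) -> list:
    """Collection: turn raw input lines into events."""
    return [{"message": line} for line in raw_lines]


def process(events: Iterable[Event], host: str) -> list:
    """Processing: drop blank messages and enrich the rest with a host field."""
    return [{**e, "host": host} for e in events if e["message"].strip()]


def route(events: Iterable[Event], sinks: dict) -> None:
    """Routing: deliver every event to every configured destination."""
    for event in events:
        for send in sinks.values():
            send(event)


# Two in-memory lists stand in for real backends (hypothetical).
storage = []
archive = []
route(
    process(collect(["disk full", "   ", "login ok"]), host="web-1"),
    {"storage": storage.append, "archive": archive.append},
)
print(len(storage), len(archive))  # 2 2
```

The blank line is dropped in the processing stage, and the two remaining events reach both destinations, which is the fan-out behavior the routing stage describes.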


Prerequisites

  • Docker and Docker Compose installed
  • A Linux server (Ubuntu 22.04+) for installation
  • Basic understanding of Prometheus and Grafana Loki
  • Familiarity with YAML configuration

Step-by-Step Tutorial

Step 1: Deploy Telegraf for Metrics Collection

Create telegraf.conf:

[agent]
  interval = "15s"
  flush_interval = "15s"

[[inputs.cpu]]
  percpu = true
  totalcpu = true

[[inputs.mem]]

[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "overlay"]

[[inputs.net]]

[[outputs.influxdb]]
  urls = ["http://influxdb:8086"]
  database = "telegraf"

[[outputs.prometheus_client]]
  listen = ":9273"

Start the container:

docker run -d --name telegraf \
  -v "$(pwd)/telegraf.conf":/etc/telegraf/telegraf.conf:ro \
  -p 9273:9273 \
  telegraf:1.31

Expected output: Telegraf collects CPU, memory, disk, and network metrics every 15 seconds and exposes them at http://localhost:9273/metrics.
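
To check that the endpoint serves what you expect, you can parse the Prometheus text format it returns. The following is a minimal sketch that handles only simple sample lines; the `SAMPLE` text, including the metric names `cpu_usage_idle` and `mem_used_percent` and the host `web-1`, is an illustrative stand-in for a real scrape, not captured output.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

# Hypothetical scrape body used for illustration.
SAMPLE = """\
# HELP cpu_usage_idle Telegraf collected metric
# TYPE cpu_usage_idle gauge
cpu_usage_idle{cpu="cpu-total",host="web-1"} 97.5
mem_used_percent{host="web-1"} 41.2
"""


def parse_metrics(text):
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        labels = dict(LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels, value)
```

Against a running container, the same function could be fed the body fetched from http://localhost:9273/metrics instead of `SAMPLE`.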

Step 2: Use Telegraf Input Plugins

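# Docker container stats (mount /var/run/docker.sock into the Telegraf container)
[[inputs.docker]]
  endpoint = "unix:///var/run/docker.sock"

# Consume metrics published to a Kafka topic
# (this plugin reads messages; it does not report consumer lag)
[[inputs.kafka_consumer]]
  brokers = ["kafka:9092"]
  topics = ["metrics-topic"]
  consumer_group = "telegraf-consumer"
  data_format = "influx"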

# HTTP response time
[[inputs.http_response]]
  urls = ["https://api.dodatech.com/health"]
  response_timeout = "5s"

Step 3: Deploy Vector for Unified Pipelines

Create vector.toml:

[sources.app_logs]
  type = "file"
  include = ["/var/log/app/*.log"]

[transforms.parse_json]
  type = "remap"
  inputs = ["app_logs"]
  source = '''
    . = parse_json!(.message)
    .host = get_hostname!()
  '''

[transforms.filter_errors]
  type = "filter"
  inputs = ["parse_json"]
  condition = '.level == "ERROR"'

[sinks.loki]
  type = "loki"
  inputs = ["filter_errors"]
  endpoint = "http://loki:3100"
  encoding.codec = "json"
  labels = {service = "{{ .service }}", host = "{{ .host }}"}
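
Start the container, passing the config path explicitly so that Vector loads the TOML file rather than its default config path:

docker run -d --name vector \
  -v "$(pwd)/vector.toml":/etc/vector/vector.toml:ro \
  -v /var/log:/var/log:ro \
  timberio/vector:0.38.0-debian \
  --config /etc/vector/vector.toml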
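
For readers new to VRL, the logic of the parse_json and filter_errors transforms can be mirrored in plain Python. This is a sketch for understanding, not Vector's implementation. In particular, it simply skips lines that are not valid JSON, which is an assumption: how Vector treats such events depends on the remap transform's error-handling settings.

```python
import json
import socket
from typing import Optional


def parse_and_enrich(event: dict) -> Optional[dict]:
    """Mirror of parse_json: replace the event with the decoded message, then add host."""
    try:
        parsed = json.loads(event["message"])
    except (KeyError, json.JSONDecodeError):
        return None  # assumption: skip lines that are not valid JSON
    if not isinstance(parsed, dict):
        return None
    parsed["host"] = socket.gethostname()
    return parsed


def is_error(event: dict) -> bool:
    """Mirror of filter_errors: keep events whose level is exactly "ERROR"."""
    return event.get("level") == "ERROR"


def run(lines):
    """Apply both steps to raw log lines and return the surviving events."""
    kept = []
    for line in lines:
        event = parse_and_enrich({"message": line})
        if event is not None and is_error(event):
            kept.append(event)
    return kept


lines = [
    '{"level": "ERROR", "service": "api", "msg": "boom"}',
    '{"level": "INFO", "service": "api"}',
    'not json',
    '{"level": "error"}',
]
print(len(run(lines)))  # 1
```

Two details carry over to the real config: assigning the parsed object to the whole event discards the original message field, and the filter comparison is case-sensitive, so a lowercase "error" level is dropped.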

Step 4: Deploy Fluentd for Log Collection

Create fluentd.conf:

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

<source>
  @type tail
  path /var/log/containers/*.log
  tag kubernetes.*
  read_from_head true
  # Fluentd v1 declares the parser in a <parse> section
  <parse>
    @type json
  </parse>
</source>

<filter kubernetes.**>
  @type record_transformer
  <record>
    hostname ${hostname}
  </record>
</filter>

<match **>
  @type elasticsearch
  host elasticsearch
  port 9200
  logstash_format true
  flush_interval 10s
</match>
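
Start the container. The image reads /fluentd/etc/fluent.conf by default, so mount the file under that name. The stock image does not bundle the Elasticsearch output, so @type elasticsearch requires an image with fluent-plugin-elasticsearch installed.

docker run -d --name fluentd \
  -p 24224:24224 \
  -v "$(pwd)/fluentd.conf":/fluentd/etc/fluent.conf:ro \
  -v /var/log:/var/log:ro \
  fluent/fluentd:v1.17-1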

Step 5: Use Vector for Metrics Transformation

[sources.prometheus_metrics]
  type = "prometheus_scrape"
  endpoints = ["http://telegraf:9273/metrics"]

[transforms.aggregate_metrics]
  type = "aggregate"
  inputs = ["prometheus_metrics"]
  interval_ms = 60000  # emit one aggregated sample per series every 60 seconds

[sinks.prometheus_remote_write]
  type = "prometheus_remote_write"
  inputs = ["aggregate_metrics"]
  endpoint = "http://prometheus:9090/api/v1/write"
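
What a windowed aggregation does to the samples inside one interval can be sketched in Python. The sketch assumes the behavior usually described for Vector's aggregate transform: incremental values (deltas) are summed, and for absolute values (readings) the latest one wins. The tuple layout and the metric names are hypothetical.

```python
def aggregate_window(samples):
    """Collapse the samples seen in one window into one sample per series.

    Each sample is (name, labels, kind, value), where kind is
    "incremental" (a delta) or "absolute" (a reading).
    """
    merged = {}
    for name, labels, kind, value in samples:
        key = (name, tuple(sorted(labels.items())), kind)
        if kind == "incremental" and key in merged:
            merged[key] += value  # deltas add up across the window
        else:
            merged[key] = value   # absolute: the latest reading wins
    return [
        (name, dict(labels), kind, value)
        for (name, labels, kind), value in merged.items()
    ]


window = [
    ("http_requests", {"host": "a"}, "incremental", 3.0),
    ("http_requests", {"host": "a"}, "incremental", 4.0),
    ("cpu_usage_idle", {"host": "a"}, "absolute", 90.0),
    ("cpu_usage_idle", {"host": "a"}, "absolute", 95.0),
]
for sample in aggregate_window(window):
    print(sample)
```

If scraped Prometheus metrics arrive as absolute values, as they likely do here, the practical effect of this step is downsampling: one reading per series per interval rather than a sum.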

Step 6: Compare Pipeline Performance

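# Run every Telegraf input once and print the gathered metrics to stdout
docker exec telegraf telegraf --test --config /etc/telegraf/telegraf.conf

# Stream live events leaving the app_logs source (requires Vector's API to be enabled)
docker exec vector vector tap --outputs-of app_logs

# Show per-component event throughput (requires Vector's API to be enabled)
docker exec -it vector vector top

Expected output: telegraf --test prints one line per collected metric, vector tap prints a sample of the events flowing through the component, and vector top shows events per second for each component. None of these commands reports a latency distribution; for a real comparison, feed each tool the same load and compare the self-monitoring metrics from Step 8.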

Step 7: Set Up Kafka as a Buffer Layer

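services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.6.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka:
    image: confluentinc/cp-kafka:7.6.0
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      # Address that clients inside the Compose network use to reach the broker
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      # Single broker, so internal topics cannot be replicated three ways
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    ports:
      - "9092:9092"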

Configure Vector to use Kafka as a sink:

[sinks.kafka]
  type = "kafka"
  inputs = ["parse_json"]
  bootstrap_servers = "kafka:9092"
  topic = "observability-logs"
  encoding.codec = "json"

Step 8: Monitor the Pipeline Itself

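# Telegraf self-monitoring
[[inputs.internal]]

# Vector metrics; assumes Vector runs an internal_metrics source feeding a
# prometheus_exporter sink on its default address, 0.0.0.0:9598
[[inputs.prometheus]]
  urls = ["http://vector:9598/metrics"]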
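
Internal metrics such as processed-event totals are typically counters, so a dashboard derives events per second from two successive scrapes. The sketch below shows the arithmetic under simple assumptions; the function name is hypothetical, and Prometheus's own rate() function does more than this (for example, extrapolating to the window edges).

```python
def events_per_second(prev, curr):
    """Rate between two scrapes, each given as (unix_seconds, counter_value)."""
    (t0, v0), (t1, v1) = prev, curr
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("scrapes must be in increasing time order")
    # A counter that went down means the process restarted; count from zero.
    delta = v1 - v0 if v1 >= v0 else v1
    return delta / elapsed


print(events_per_second((0, 100), (15, 250)))  # 10.0
print(events_per_second((0, 500), (10, 50)))   # 5.0
```

The second call shows the restart case: the counter dropped from 500 to 50, so the 50 events counted since the restart are spread over the 10-second gap.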

Learning Path

flowchart LR
    A[Data Sources] --> B{Collection Layer}
    B --> C[Telegraf]
    B --> D[Vector]
    B --> E[Fluentd]
    C --> F[Processing]
    D --> F
    E --> F
    F --> G[Buffer / Kafka]
    G --> H[Storage Backends]
    H --> I[Prometheus]
    H --> J[Loki]
    H --> K[Elasticsearch]
    style B fill:#4a90d9,color:#fff
    style G fill:#e67e22,color:#fff

Common Errors

  1. Telegraf cannot connect to InfluxDB -- The urls value in the output config is incorrect or InfluxDB is not running. Verify the endpoint with curl.

  2. Vector VRL script fails to compile -- The syntax of the remap transformation is wrong. Use vector validate /etc/vector/vector.toml to check.

  3. Fluentd buffer fills up and blocks input -- Chunks are flushed too rarely or the output is slow. Lower flush_interval, raise flush_thread_count, or increase the buffer size limits.

  4. Telegraf plugin returns permission denied -- The input plugin needs access to a system file or socket. Run Telegraf as root or add the user to the required group.

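  5. Vector drops events under load -- The sink uses an in-memory buffer with a small max_events and when_full set to drop_newest, or buffered events are lost when Vector restarts. Switch to a disk buffer (buffer.type = "disk") with when_full = "block" for production deployments.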

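  6. Fluentd tag routing does not match -- The <match> pattern is too strict, or an earlier <match> block already consumed the event. In a pattern, * matches exactly one tag part and ** matches zero or more. Use ** for broad matching or specify exact tags.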

  7. Kafka consumer lag grows unbounded -- The consumers draining the topic are slower than the producers writing to it. Add partitions to the topic and consumers to the group, or scale the processing layer.


Practice Questions

  1. What are the three stages of a monitoring pipeline? Answer: Collection (gathering data), processing (transforming, filtering, enriching), and routing (sending to destinations).

  2. When should you use Telegraf over Fluentd? Answer: Telegraf is better for metrics collection (CPU, memory, disk) and ships with more than 300 plugins. Fluentd is better for log collection with advanced filtering.

  3. What makes Vector different from Telegraf and Fluentd? Answer: Vector is a unified pipeline that handles both metrics and logs with a single binary, supporting VRL (Vector Remap Language) for transformations.

  4. Why use Kafka as a buffer in the monitoring pipeline? Answer: Kafka decouples the collection from storage, providing durability during back-end outages and allowing multiple consumers to read the same data.

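  5. How do you self-monitor the pipeline tool itself? Answer: All three tools can report internal metrics (event rate, error rate, buffer size) for Prometheus to scrape: Telegraf through its internal input, Vector through an internal_metrics source, and Fluentd through its monitoring or Prometheus plugins.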
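
The buffering idea behind question 4 can be sketched as an append-only log with one read offset per consumer group. This is a toy model under simplifying assumptions, not Kafka's API: the `BufferLog` class and the group names are hypothetical, and real Kafka adds partitions, retention, and explicit offset commits.

```python
class BufferLog:
    """Append-only log with one read offset per consumer group."""

    def __init__(self):
        self._records = []
        self._offsets = {}

    def produce(self, record):
        """Collectors append records without waiting for any consumer."""
        self._records.append(record)

    def poll(self, group, max_records=10):
        """Return the next batch for this group and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def lag(self, group):
        """Records written but not yet read by this group."""
        return len(self._records) - self._offsets.get(group, 0)


log = BufferLog()
for seq in range(5):
    log.produce({"seq": seq})

# One writer keeps up partially; the other is down and reads nothing.
log.poll("loki-writer", max_records=2)
print(log.lag("loki-writer"), log.lag("es-writer"))  # 3 5
```

Because each group tracks its own offset, the writer that was down can later read all five records from the start, which is what makes the buffer useful during a backend outage. The lag value is the same quantity that grows without bound when consumers cannot keep up.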


Challenge

Build a complete monitoring pipeline that collects metrics and logs from three sources: a Docker container (CPU/memory via Telegraf Docker input), an application log file (JSON structured logs via Vector), and a Kubernetes pod log (via Fluentd). Configure Vector to parse and enrich the application logs with hostname and timestamp. Route Telegraf metrics to Prometheus via remote write, Vector logs to Loki, and Fluentd logs to Elasticsearch. Add a Kafka buffer between the collection layer and the storage backends. Self-monitor all three pipeline tools with Prometheus and create a Grafana dashboard showing events per second, error rate, and buffer size for each tool.


FAQ

Can I use all three tools together?

Yes, many production deployments use all three: Telegraf for infrastructure metrics, Vector for application observability, and Fluentd for Kubernetes log forwarding.

Which tool has the best performance?

Vector is generally the fastest because it is written in Rust. Telegraf (Go) is also fast. Fluentd (Ruby) has higher resource usage but the richest plugin ecosystem.

Do any of these tools support OpenTelemetry?

Yes. Vector has native OTLP support as both source and sink. Telegraf has an OpenTelemetry input plugin. Fluentd has community plugins for OpenTelemetry.

Which is easiest to configure?

Telegraf uses simple TOML with flat key-value pairs. Vector uses TOML with VRL for transformations. Fluentd uses its own directive-based syntax (<source>, <filter>, <match>) that can embed Ruby expressions, which is more powerful but more complex.

Can I replace Logstash with these tools?

Yes, Vector and Fluentd are common Logstash alternatives. They provide similar filtering and routing capabilities with lower resource usage.
