Metrics Overview

Klag exports its metrics through Micrometer, so the exact name format depends on the reporter (Prometheus, Datadog, or OTLP). The logical metrics are the same everywhere.
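
As a rough illustration of how one logical name can surface differently per backend, the sketch below converts a dotted name into the snake_case form that Prometheus requires. It assumes the metrics are plain gauges with no unit suffix; the helper name `prometheus_name` is hypothetical, and the real Micrometer naming convention may add suffixes this sketch does not model.

```python
import re


def prometheus_name(logical_name: str) -> str:
    """Approximate the Prometheus form of a dotted logical metric name.

    Assumption: the metric is a gauge without a base unit, so no suffix
    (such as _total or _seconds) is appended.
    """
    # Prometheus metric names may only contain letters, digits, "_" and ":".
    return re.sub(r"[^a-zA-Z0-9_:]", "_", logical_name)


for name in (
    "klag.consumer.lag",
    "klag.consumer.lag.velocity",
    "klag.partition.log_end_offset",
):
    print(f"{name} -> {prometheus_name(name)}")
```

Backends that accept dots in metric names would typically keep the logical name unchanged, so a query written for one reporter may need its names adjusted for another.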

All metrics are tagged with consumer_group, topic, and partition where applicable.

| Metric | Description |
| --- | --- |
| klag.consumer.lag | Current lag per partition (also .sum, .max, .min). |
| klag.consumer.lag.velocity | Rate of change; positive means falling behind. See Lag Velocity. |
| klag.consumer.committed_offset | Last committed offset per consumer. |
| klag.partition.log_end_offset | Latest offset per partition. |
| klag.partition.log_start_offset | Earliest available offset per partition. |
| klag.topic.partitions | Partition count per topic. |
| klag.consumer.group.state | Group health: Stable, Rebalancing, Dead, Empty. |
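
To make the relationship between these metrics concrete, here is a minimal sketch using the conventional definition of consumer lag (log end offset minus committed offset). It is an assumption that Klag computes lag exactly this way; the function names, the clamping to zero, and the per-second velocity unit are illustrative choices, not taken from Klag itself.

```python
from typing import Dict, Tuple

TopicPartition = Tuple[str, int]


def partition_lag(log_end_offset: int, committed_offset: int) -> int:
    """Lag for one partition, clamped at zero (clamping is an assumption)."""
    return max(0, log_end_offset - committed_offset)


def lag_summary(
    log_end_offsets: Dict[TopicPartition, int],
    committed_offsets: Dict[TopicPartition, int],
) -> Dict[str, object]:
    """Per-partition lag plus the sum, max, and min aggregates."""
    lags = {
        tp: partition_lag(log_end_offsets[tp], committed)
        for tp, committed in committed_offsets.items()
    }
    values = list(lags.values())
    if not values:
        return {"per_partition": {}, "sum": 0, "max": 0, "min": 0}
    return {
        "per_partition": lags,
        "sum": sum(values),
        "max": max(values),
        "min": min(values),
    }


def lag_velocity(previous_lag: float, current_lag: float, interval_seconds: float) -> float:
    """Change in lag per second; positive means the consumer is falling behind."""
    return (current_lag - previous_lag) / interval_seconds


log_end = {("orders", 0): 1500, ("orders", 1): 900}
committed = {("orders", 0): 1200, ("orders", 1): 900}
print(lag_summary(log_end, committed))
print(lag_velocity(previous_lag=300, current_lag=420, interval_seconds=60))
```

In this example partition 0 lags by 300 messages and partition 1 is fully caught up, and a lag that grows from 300 to 420 over 60 seconds gives a positive velocity.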

Reported only when statistical outliers exist (see Hot Partitions):

| Metric | Description |
| --- | --- |
| klag.hot_partition | Partition throughput × 100 when statistically high. Tags: topic, partition only. |
| klag.hot_partition.lag | Partition lag when statistically high. |
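
This page does not say which statistical test Klag applies. Purely as an illustration of what "statistically high" can mean, the sketch below flags partitions whose throughput z-score exceeds a threshold. The z-score approach, the default threshold of 2.0, and the function name `hot_partitions` are all assumptions, not Klag's documented method.

```python
from statistics import mean, pstdev
from typing import Dict


def hot_partitions(
    throughput_by_partition: Dict[int, float],
    z_threshold: float = 2.0,  # hypothetical threshold, not a Klag setting
) -> Dict[int, float]:
    """Return the partitions whose throughput is an upper outlier by z-score."""
    values = list(throughput_by_partition.values())
    if len(values) < 2:
        return {}
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All partitions carry the same load, so nothing stands out.
        return {}
    return {
        partition: value
        for partition, value in throughput_by_partition.items()
        if (value - mu) / sigma > z_threshold
    }


throughput = {0: 100.0, 1: 110.0, 2: 95.0, 3: 105.0, 4: 98.0, 5: 102.0, 6: 107.0, 7: 900.0}
print(hot_partitions(throughput))
```

With evenly loaded partitions the function returns an empty result, which mirrors the behavior described above: nothing is reported unless an outlier exists.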

Time-based lag metrics; see Time-Based Lag. Tags: consumer_group, topic.

| Metric | Description |
| --- | --- |
| klag.consumer.lag.ms | Lag in milliseconds, from Kafka log timestamps (poll-history fallback). |
| klag.consumer.lag.time_to_close_seconds | Estimated seconds until lag reaches zero (only when catching up). |
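
One plausible way to estimate the time to close is to divide the current lag by the rate at which it is shrinking. The sketch below assumes lag is measured in messages and velocity in messages per second; the function name and the use of `None` for "not catching up" are illustrative, and Klag's actual estimator may differ.

```python
from typing import Optional


def time_to_close_seconds(current_lag: float, velocity: float) -> Optional[float]:
    """Estimate the seconds until lag reaches zero.

    velocity is the change in lag per second: negative while catching up.
    Returns None when the consumer is not catching up, matching the
    "only when catching up" behavior described for the metric.
    """
    if velocity >= 0:
        return None
    return max(0.0, current_lag) / -velocity


print(time_to_close_seconds(current_lag=1200, velocity=-20))  # lag shrinking
print(time_to_close_seconds(current_lag=1200, velocity=5))    # lag growing
```

A linear estimate like this is only as stable as the velocity it is based on, so a short-lived burst of consumption can make the value swing noticeably between samples.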

Retention-based metrics; see Data Loss Prevention. Tags: consumer_group, topic.

| Metric | Description |
| --- | --- |
| klag.consumer.lag.retention_percent | Lag as a percentage of the retention window (value × 100). 100% means data loss. |
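
As a worked example of the percentage, the sketch below assumes the metric compares time-based lag against the topic's retention period, both in milliseconds. That choice of inputs and the function name `retention_percent` are assumptions; this page does not spell out the exact formula.

```python
def retention_percent(lag_ms: float, retention_ms: float) -> float:
    """Lag expressed as a percentage of the retention window."""
    if retention_ms <= 0:
        raise ValueError("retention_ms must be positive")
    return lag_ms / retention_ms * 100.0


six_hours_ms = 6 * 60 * 60 * 1000
one_day_ms = 24 * 60 * 60 * 1000
print(retention_percent(six_hours_ms, one_day_ms))  # 25.0
```

A consumer that is six hours behind on a topic with 24-hour retention sits at 25%, so alerting well below 100 leaves time to react before unread records are deleted.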

When METRICS_JVM_ENABLED=true, standard Micrometer JVM metrics (memory, GC, threads, classes, CPU) are exported too, and visualized in the Grafana dashboard.