MCP Endpoint

Klag exposes an optional MCP (Model Context Protocol) endpoint so that AI agents (SRE copilots, dev assistants) can query consumer-lag state as part of their normal workflows.

It is opt-in, read-only, and zero-impact when off. The endpoint serves an in-memory snapshot the metrics collector publishes after each cycle; it never queries Kafka or touches the collection flow.

| Variable | Default | Description |
| --- | --- | --- |
| MCP_ENABLED | false | Expose the /mcp endpoint. |
| MCP_AUTH_TOKEN | (empty) | When set, requires Authorization: Bearer <token>. Empty = open (logged warning). |
| MCP_PATH | /mcp | HTTP path of the endpoint. |
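To make the defaults concrete, here is a minimal Python sketch of how the three variables might resolve and how a client would derive its Authorization header from them. It is illustrative only and is not Klag's code: `McpSettings`, `load_mcp_settings`, and `auth_headers` are hypothetical names, and the rule that only the string "true" enables the endpoint is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class McpSettings:
    """Hypothetical container for the three MCP variables."""
    enabled: bool
    auth_token: str
    path: str


def load_mcp_settings(env: Mapping[str, str] = os.environ) -> McpSettings:
    """Resolve the MCP variables, falling back to the documented defaults."""
    return McpSettings(
        # Assumption: only the string "true" (any case) enables the endpoint.
        enabled=env.get("MCP_ENABLED", "false").strip().lower() == "true",
        auth_token=env.get("MCP_AUTH_TOKEN", ""),
        path=env.get("MCP_PATH", "/mcp"),
    )


def auth_headers(settings: McpSettings) -> dict[str, str]:
    """Headers a client needs; empty when no token is configured (open endpoint)."""
    if not settings.auth_token:
        return {}
    return {"Authorization": f"Bearer {settings.auth_token}"}
```

With an empty environment this yields a disabled endpoint at /mcp and no auth header, matching the defaults in the table.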

MCP requires METRICS_REPORTER to be set, because the snapshot is populated only when metrics collection runs.

The transport is Streamable HTTP: JSON-RPC 2.0 messages sent over POST. A GET request returns 405 (Method Not Allowed).
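As a rough sketch of what such a POST looks like, the Python below builds (but does not send) a JSON-RPC 2.0 request using only the standard library. The helper name `build_jsonrpc_request` and the host and port are assumptions for illustration; `tools/list` is a standard MCP method, and the Accept header follows the Streamable HTTP transport in the MCP specification.

```python
import json
import urllib.request


def build_jsonrpc_request(url, method, params=None, request_id=1, token=""):
    """Build (but do not send) a POST carrying one JSON-RPC 2.0 message."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    headers = {
        "Content-Type": "application/json",
        # Streamable HTTP clients advertise both response formats.
        "Accept": "application/json, text/event-stream",
    }
    if token:
        # Only needed when MCP_AUTH_TOKEN is set on the server.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url,
        data=json.dumps(message).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Assumption: Klag reachable on localhost:8888 with the default MCP_PATH.
request = build_jsonrpc_request("http://localhost:8888/mcp", "tools/list")
# To send it against a running instance: urllib.request.urlopen(request).read()
```

Per the MCP specification a client normally sends an `initialize` request before other methods; whether Klag enforces that ordering is not stated here.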

| Tool | Purpose |
| --- | --- |
| list_consumer_groups | List groups, each with its overallTrend. |
| get_consumer_group_lag | Lag detail for a group, plus trends, overallTrend, and recentTransitions. |
| find_lagging_groups | Groups currently lagging, with overallTrend. |
| diagnose | Composite severity assessment; flags frequent state changes (rebalance storm / flapping). |
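An agent invokes one of these tools with the standard MCP `tools/call` method. The sketch below builds such a payload in Python; the tool names come from the table, but the argument key `groupId`, the group name, and the helper `tool_call` are hypothetical, since the tools' input schemas are not documented here (a `tools/list` call would return them).

```python
import json

# Tool names as listed in the table above.
KLAG_TOOLS = (
    "list_consumer_groups",
    "get_consumer_group_lag",
    "find_lagging_groups",
    "diagnose",
)


def tool_call(name, arguments=None, request_id=1):
    """Build a JSON-RPC 2.0 'tools/call' message for one of the Klag tools."""
    if name not in KLAG_TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# "groupId" is a hypothetical argument name; check the schema from tools/list.
payload = tool_call("get_consumer_group_lag", {"groupId": "orders-consumer"})
print(json.dumps(payload, indent=2))
```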

Each group snapshot carries a basic lag trend (growing / shrinking / stable, per-topic plus an overallTrend rollup) derived from lag velocity via LAG_TREND_DEADBAND_MSG_PER_SEC, and a rolling state-change history (last 10 from→to transitions). diagnose uses the transition history to flag rebalance storms and flapping groups.
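The deadband and the bounded history can be pictured with a short Python sketch. This is not Klag's implementation: the function names, the symmetric treatment of the deadband, the flapping thresholds (`window_sec`, `min_transitions`), and the state names (borrowed from Kafka's consumer-group states) are all assumptions made for illustration.

```python
from collections import deque

HISTORY_SIZE = 10  # from the text: the last 10 from->to transitions are kept


def classify_trend(velocity_msg_per_sec, deadband_msg_per_sec):
    """Map a lag velocity to a trend, treating small velocities as noise."""
    if velocity_msg_per_sec > deadband_msg_per_sec:
        return "growing"
    if velocity_msg_per_sec < -deadband_msg_per_sec:
        return "shrinking"
    return "stable"


def record_transition(history, timestamp, from_state, to_state):
    """Append a transition; the deque drops the oldest entry once full."""
    history.append((timestamp, from_state, to_state))


def looks_unstable(history, now, window_sec=300.0, min_transitions=5):
    """Hypothetical flapping check: many transitions inside a recent window."""
    recent = [entry for entry in history if now - entry[0] <= window_sec]
    return len(recent) >= min_transitions


history = deque(maxlen=HISTORY_SIZE)
for second in range(12):
    # Alternate between two states to simulate a flapping group.
    if second % 2 == 0:
        record_transition(history, float(second), "Stable", "PreparingRebalance")
    else:
        record_transition(history, float(second), "PreparingRebalance", "Stable")
```

After the loop the deque holds only the 10 most recent transitions, which is the bounded behaviour the text describes; a check along the lines of `looks_unstable` is one way diagnose could turn that history into a flapping flag.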

The MCP layer reads from a SnapshotStore populated by the metrics collector, never from direct Kafka calls. See the design doc: docs/superpowers/specs/2026-06-01-mcp-support-design.md.
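The publish-then-read pattern described here can be sketched in a few lines of Python. This is an illustration of the idea only, not Klag's actual SnapshotStore: the method names `publish` and `current` and the snapshot shape are assumptions.

```python
import threading
from types import MappingProxyType


class SnapshotStore:
    """Holds the latest read-only snapshot; a writer swaps it in whole."""

    def __init__(self):
        self._lock = threading.Lock()
        self._snapshot = MappingProxyType({})

    def publish(self, groups):
        """Called by the collector after each cycle with fresh group data."""
        # Copy the top level so later changes to the source dict are not visible.
        frozen = MappingProxyType(dict(groups))
        with self._lock:
            self._snapshot = frozen

    def current(self):
        """Called by the MCP layer; never triggers any Kafka work."""
        with self._lock:
            return self._snapshot


store = SnapshotStore()
source = {"orders-consumer": {"totalLag": 120, "overallTrend": "growing"}}
store.publish(source)
source["late-group"] = {"totalLag": 0, "overallTrend": "stable"}  # not visible to readers
```

Because readers only ever receive a reference to an already-built mapping, serving a request costs a lock acquisition and a pointer read, which is consistent with the endpoint staying out of the collection flow.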