This page describes the metrics exposed in Prometheus format by the inference router and related services. These are numeric time series intended for dashboards, SLOs, and alerts.
Inference router metrics
Inference router metrics describe queues, scheduling, and request lifecycle in the core inference layer.

Inference router metrics table
| Metric | Category | Prometheus Name | Description | Granularity |
|---|---|---|---|---|
| Queue length | Queue | queue_length | Number of requests currently queued in the router. | Per model, QoS, and/or user |
| Max queue wait time | Queue | queue_max_wait_seconds | Maximum age (seconds) of any request currently in the queue. | Per model, QoS |
| Customer queue length | Queue | customer_queue_length | Queue length per customer per model. | Per user, model |
| Submitted requests | Traffic | submitted_total | Total number of requests submitted to the router. | Per model, QoS, user, status |
| Completed requests | Traffic | completed_total | Total number of completed requests, labeled with completion status (success, error, etc.). | Per model, QoS, user, status |
| Response codes | Traffic | response_code_total | Count of HTTP responses by status code. | Per HTTP code, route, user |
| Response latency | Latency | response_duration_ms | End-to-end response latency in milliseconds (often as a histogram or summary). | Per model, QoS, customer |
| Connection state | Workers | connection_state_ratio | Fraction of workers in each state (idle, busy, draining, unhealthy, etc.). | Per worker state, model, pool |
| Active users | Adoption | active_users | Number of active users observed by the router. | Global and/or per user |
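
To show how the metrics in the table might be consumed, here is a minimal sketch that parses a small sample of Prometheus text exposition output and aggregates two of them. The sample data, the label names (`model`, `qos`, `status`), and the helper functions are assumptions made for illustration. The actual label sets exposed by the router may differ, and the parser deliberately ignores timestamps and escaped quotes in label values.

```python
import re
from collections import defaultdict

# Hypothetical scrape output. Label names and values are assumptions,
# not the router's documented schema.
SAMPLE = """\
# TYPE queue_length gauge
queue_length{model="model-a",qos="standard"} 4
queue_length{model="model-a",qos="priority"} 1
queue_length{model="model-b",qos="standard"} 7
# TYPE completed_total counter
completed_total{model="model-a",status="success"} 90
completed_total{model="model-a",status="error"} 10
"""

# Matches: metric_name{optional labels} value
SAMPLE_RE = re.compile(r'^(\w+)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Yield (name, labels, value) for each sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser cannot handle
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


def sum_by_label(text, metric, label):
    """Sum the values of one metric, grouped by a single label."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == metric:
            totals[labels.get(label, "")] += value
    return dict(totals)


# Queue depth per model, summed across QoS classes.
queue_by_model = sum_by_label(SAMPLE, "queue_length", "model")

# Share of completed requests that ended with an error status.
completed_by_status = sum_by_label(SAMPLE, "completed_total", "status")
error_ratio = completed_by_status["error"] / sum(completed_by_status.values())

print(queue_by_model)
print(error_ratio)
```

In a real deployment these aggregations would normally be written as queries against a Prometheus server rather than computed by hand. The sketch only illustrates how the gauge (`queue_length`) and counter (`completed_total`) metrics combine with their labels.
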
Metric names and label sets may evolve over time. Refer to the release notes for changes in metric schema.
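
Because metric names can change between releases, dashboards and alerts may benefit from a check that fails loudly when an expected series disappears. The following is a minimal sketch of such a check, not a supported tool: the function name `missing_metrics` and the sample scrape are hypothetical, and the expected names are copied from the table on this page. It assumes that a histogram or summary named `response_duration_ms` would appear in a scrape with the standard `_bucket`, `_sum`, and `_count` suffixes.

```python
import re

# Metric names taken from the table on this page.
EXPECTED_METRICS = {
    "queue_length",
    "queue_max_wait_seconds",
    "customer_queue_length",
    "submitted_total",
    "completed_total",
    "response_code_total",
    "response_duration_ms",
    "connection_state_ratio",
    "active_users",
}

# Suffixes that Prometheus histograms and summaries add to the base name.
DERIVED_SUFFIXES = ("_bucket", "_sum", "_count")


def missing_metrics(exposition_text, expected=EXPECTED_METRICS):
    """Return the expected metric names that do not appear in a scrape."""
    seen = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" or whitespace character.
        name = re.split(r"[{\s]", line, maxsplit=1)[0]
        seen.add(name)
        for suffix in DERIVED_SUFFIXES:
            if name.endswith(suffix):
                seen.add(name[: -len(suffix)])
    return sorted(set(expected) - seen)


# Hypothetical scrape containing only two of the expected metrics.
scrape = (
    'queue_length{model="model-a"} 3\n'
    'response_duration_ms_bucket{le="100"} 12\n'
)
missing = missing_metrics(scrape)
print(missing)
```

Running a check like this after each upgrade, alongside a read of the release notes, is one way to catch a renamed metric before an alert silently stops firing.
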
Related topics
- Monitoring and Observability – Conceptual overview and hierarchy.
- Logs – Log/manifest event schema and usage.

