This page describes the logging-based telemetry emitted by the platform. These are log events, not Prometheus metrics, and are intended for detailed debugging, performance analysis, and forensics. Log events are typically ingested into a log backend such as OpenSearch or Loki and queried via Grafana or a similar tool.
RDU manifest events
RDU manifest events are structured logs emitted per request by the model runtime. They contain token counts, high-level latencies, and a set of detailed timing fields. These events are typically indexed into a log index (for example, an OpenSearch index) and can be filtered by fields such as model, tenant, pod, and time range.

RDU manifest fields

| Field | Category | Key | Description |
|---|---|---|---|
| Prompt tokens | Tokens | prompt_tokens_count | Number of input tokens in the prompt. |
| Completion tokens | Tokens | completion_tokens_count | Number of output tokens generated. |
| Total latency | Latency | total_latency | End-to-end time from request start to last token (includes queue + compute). |
| Time to first token (TTFT) | Latency | time_to_first_token | Time from request submission to first token. |
| Completion tokens per second | Throughput | completion_tokens_per_sec | Effective throughput over the entire completion. |
| Tokens/sec after first token | Throughput | completion_tokens_after_first_per_sec | Decode throughput after first token (steady-state). |
| Acceptance rate | Spec decoding | acceptance_rate | Acceptance rate for speculative decoding. |
| Decode queue time | Internal timing | decode_queue_time | Time spent in decode-related queues (e.g. continuous batching queues). |
| Tensor transfer time | Internal timing | tensor_transfer_time | Time spent transferring tensors between components. |
| Cache transfer time | Internal timing | cache_transfer_time | Time spent transferring cache (e.g. KV cache). |

These fields are part of log events rather than a stable metrics interface, so the schema may change between releases: fields can be added, renamed, or removed.
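
Because the schema can change, a consumer of these events may want to tolerate missing or unknown keys. The following is a minimal sketch, assuming each event arrives as a flat JSON object on one line (the actual wire format is not specified on this page). The field names come from the table above; `ManifestEvent`, `parse_manifest_event`, the sample values, and the `new_field` key are hypothetical.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ManifestEvent:
    # Keys from the RDU manifest fields table. Every field is optional so that
    # an event missing a key still parses after a schema change.
    prompt_tokens_count: Optional[int] = None
    completion_tokens_count: Optional[int] = None
    total_latency: Optional[float] = None
    time_to_first_token: Optional[float] = None
    completion_tokens_per_sec: Optional[float] = None
    completion_tokens_after_first_per_sec: Optional[float] = None
    acceptance_rate: Optional[float] = None
    decode_queue_time: Optional[float] = None
    tensor_transfer_time: Optional[float] = None
    cache_transfer_time: Optional[float] = None


def parse_manifest_event(line: str) -> ManifestEvent:
    """Parse one JSON log line, keeping known keys and dropping unknown ones."""
    raw = json.loads(line)
    known = {f.name for f in fields(ManifestEvent)}
    return ManifestEvent(**{k: v for k, v in raw.items() if k in known})


# Hypothetical event: the values and the extra "new_field" key are made up.
sample = '{"prompt_tokens_count": 128, "total_latency": 1.5, "new_field": "x"}'
event = parse_manifest_event(sample)
```

Keys absent from the event stay `None`, and keys the dataclass does not know about are ignored instead of raising an error.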
Example queries
Examples assume a log backend that supports a query language (e.g., OpenSearch or Loki) and a timestamp on each event.

p95 total latency per model (last 15 minutes)
- Filter: model:"<model_name>" AND @timestamp:[now-15m TO now]
- Aggregate: percentile 95 on total_latency, grouped by model.

- Group by model, then compute p50/p90 for both time_to_first_token and total_latency.

- Filter: decode_queue_time > <threshold>
- Group by model or tenant to identify where queueing is highest.

- Filter: model:"<model_name>"
- Aggregate: average acceptance_rate over time.
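
For OpenSearch specifically, the p95 latency and decode queue examples above could be expressed as search request bodies along the following lines. This is a sketch, not a tested query against the platform's index: it assumes `model` and `tenant` are mapped as keyword fields and that `total_latency` and `decode_queue_time` are numeric. The function names are hypothetical, and the index name is left to the caller.

```python
import json


def p95_total_latency_query(model_name: str, window: str = "15m") -> dict:
    """Body for: p95 of total_latency per model over the last `window`."""
    return {
        "size": 0,  # return aggregation results only, no raw hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"model": model_name}},
                    {"range": {"@timestamp": {"gte": f"now-{window}", "lte": "now"}}},
                ]
            }
        },
        "aggs": {
            "by_model": {
                "terms": {"field": "model"},
                "aggs": {
                    "latency_p95": {
                        "percentiles": {"field": "total_latency", "percents": [95]}
                    }
                },
            }
        },
    }


def high_decode_queue_query(threshold: float, group_by: str = "model") -> dict:
    """Body for: count events with decode_queue_time above `threshold`, per group."""
    return {
        "size": 0,
        "query": {"range": {"decode_queue_time": {"gt": threshold}}},
        "aggs": {"by_group": {"terms": {"field": group_by}}},
    }


# "<model_name>" is the same placeholder used in the examples above.
body = p95_total_latency_query("<model_name>")
print(json.dumps(body, indent=2))
```

Either body can be sent to the `_search` endpoint of the index that holds the manifest events. Passing `group_by="tenant"` to the second function groups queueing by tenant instead of by model.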
Related topics
- Monitoring and Observability – High-level telemetry breakdown and hierarchy.
- Metrics – Router-level metrics reference.

