This page describes the logging-based telemetry emitted by the platform. These are log events, not Prometheus metrics, and are intended for detailed debugging, performance analysis, and forensics. Log events are typically ingested into a log backend such as OpenSearch or Loki and queried via Grafana or a similar tool.

RDU manifest events

RDU manifest events are structured logs emitted per request by the model runtime. They contain token counts, high-level latencies, and a set of detailed timing fields. These events are typically indexed into a log index (for example, an OpenSearch index) and can be filtered by fields such as model, tenant, pod, and time range.

RDU manifest fields

| Field | Category | Key | Description |
|---|---|---|---|
| Prompt tokens | Tokens | prompt_tokens_count | Number of input tokens in the prompt. |
| Completion tokens | Tokens | completion_tokens_count | Number of output tokens generated. |
| Total latency | Latency | total_latency | End-to-end time from request start to last token (includes queue + compute). |
| Time to first token (TTFT) | Latency | time_to_first_token | Time from request submission to first token. |
| Completion tokens per second | Throughput | completion_tokens_per_sec | Effective throughput over the entire completion. |
| Tokens/sec after first token | Throughput | completion_tokens_after_first_per_sec | Decode throughput after the first token (steady state). |
| Acceptance rate | Spec decoding | acceptance_rate | Acceptance rate for speculative decoding. |
| Decode queue time | Internal timing | decode_queue_time | Time spent in decode-related queues (e.g., continuous batching queues). |
| Tensor transfer time | Internal timing | tensor_transfer_time | Time spent transferring tensors between components. |
| Cache transfer time | Internal timing | cache_transfer_time | Time spent transferring cache (e.g., KV cache). |
Additional internal timing fields (~15) are available for fine-grained analysis of execution stages. Contact your SambaNova representative for details.
These fields are emitted as log events, not metrics, and their schema may evolve over time.
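
The sketch below shows what a single manifest event might look like once parsed from a JSON log line. It is a minimal, hypothetical example: the keys are the documented field names from the table above, while the envelope fields (@timestamp, model, tenant, pod), the values, and the units are illustrative assumptions and may differ in your deployment.

```python
# Hypothetical example: parsing one RDU manifest event from a JSON log line.
# Keys match the documented fields; values, envelope fields, and units are
# illustrative assumptions only.
import json

raw_line = """{
  "@timestamp": "2024-01-01T00:00:00Z",
  "model": "<model_name>",
  "tenant": "<tenant>",
  "pod": "<pod>",
  "prompt_tokens_count": 512,
  "completion_tokens_count": 128,
  "total_latency": 1.84,
  "time_to_first_token": 0.32,
  "completion_tokens_per_sec": 69.6,
  "completion_tokens_after_first_per_sec": 84.2,
  "acceptance_rate": 0.71,
  "decode_queue_time": 0.05,
  "tensor_transfer_time": 0.01,
  "cache_transfer_time": 0.02
}"""

event = json.loads(raw_line)
print(event["model"], event["total_latency"], event["time_to_first_token"])
```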

Example queries

Examples assume a log backend that supports a query language (for example, OpenSearch or Loki) and a timestamp on each event. Each example below is followed by an illustrative sketch using the OpenSearch Python client (opensearch-py) with an assumed index name and field mappings; adapt them to your backend and schema.

p95 total latency per model (last 15 minutes)
  • Filter: model:"<model_name>" AND @timestamp:[now-15m TO now]
  • Aggregate: percentile 95 on total_latency grouped by model.
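
A minimal sketch of the query above, assuming the events are indexed in OpenSearch. The index name (rdu-manifest-*), connection settings, and the model keyword mapping are assumptions.

```python
# Sketch: p95 total_latency per model over the last 15 minutes.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

body = {
    "size": 0,
    # Restrict to the last 15 minutes of events.
    "query": {"range": {"@timestamp": {"gte": "now-15m", "lte": "now"}}},
    "aggs": {
        "per_model": {
            "terms": {"field": "model", "size": 50},  # assumes keyword mapping
            "aggs": {
                "latency_p95": {
                    "percentiles": {"field": "total_latency", "percents": [95]}
                }
            },
        }
    },
}

response = client.search(index="rdu-manifest-*", body=body)
for bucket in response["aggregations"]["per_model"]["buckets"]:
    print(bucket["key"], bucket["latency_p95"]["values"]["95.0"])
```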
TTFT vs total latency comparison
  • Group by model, compute p50/p90 for both time_to_first_token and total_latency.
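
A sketch under the same assumptions (OpenSearch index, keyword mapping for model), computing p50/p90 for both time_to_first_token and total_latency per model.

```python
# Sketch: compare p50/p90 of TTFT and total latency per model.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

body = {
    "size": 0,
    "aggs": {
        "per_model": {
            "terms": {"field": "model", "size": 50},
            "aggs": {
                "ttft": {
                    "percentiles": {"field": "time_to_first_token", "percents": [50, 90]}
                },
                "total": {
                    "percentiles": {"field": "total_latency", "percents": [50, 90]}
                },
            },
        }
    },
}

response = client.search(index="rdu-manifest-*", body=body)
for b in response["aggregations"]["per_model"]["buckets"]:
    print(b["key"], b["ttft"]["values"], b["total"]["values"])
```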
Decode queue time hotspots
  • Filter: decode_queue_time > <threshold>
  • Group by model or tenant to identify where queueing is highest.
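
A sketch of the hotspot query, with a hypothetical THRESHOLD standing in for <threshold> (choose a value in the unit your deployment uses); the index name and mappings remain assumptions.

```python
# Sketch: find where decode_queue_time exceeds a threshold, grouped by model.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

THRESHOLD = 0.5  # hypothetical value standing in for <threshold>

body = {
    "size": 0,
    "query": {"range": {"decode_queue_time": {"gt": THRESHOLD}}},
    "aggs": {
        "per_model": {  # swap "model" for "tenant" to group by tenant instead
            "terms": {"field": "model", "size": 50},
            "aggs": {"avg_queue": {"avg": {"field": "decode_queue_time"}}},
        }
    },
}

response = client.search(index="rdu-manifest-*", body=body)
for b in response["aggregations"]["per_model"]["buckets"]:
    print(b["key"], b["doc_count"], b["avg_queue"]["value"])
```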
Speculative decoding acceptance rate
  • Filter: model:"<model_name>"
  • Aggregate: average acceptance_rate over time.
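
A sketch of the acceptance-rate trend for a single model, bucketed into 5-minute intervals. The interval, index name, and keyword mapping for model are assumptions.

```python
# Sketch: average speculative-decoding acceptance_rate over time for one model.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

body = {
    "size": 0,
    "query": {"term": {"model": "<model_name>"}},
    "aggs": {
        "over_time": {
            "date_histogram": {"field": "@timestamp", "fixed_interval": "5m"},
            "aggs": {"avg_acceptance": {"avg": {"field": "acceptance_rate"}}},
        }
    },
}

response = client.search(index="rdu-manifest-*", body=body)
for b in response["aggregations"]["over_time"]["buckets"]:
    print(b["key_as_string"], b["avg_acceptance"]["value"])
```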