Whisper-Large-v3
- Model: Whisper-Large-v3
- Description: State-of-the-art automatic speech recognition (ASR) and translation model. Developed by OpenAI and trained on 5M+ hours of labeled audio. Excels in multilingual and zero-shot speech tasks across diverse domains.
- Model ID: Whisper-Large-v3
- Supported languages: Multilingual
Core capabilities
- Transcribes and translates extended audio inputs (up to 25 MB).
- Demonstrates high accuracy in speech recognition and translation tasks.
- Provides OpenAI-compatible endpoints for transcriptions and translations.
Request parameters
| Parameter | Type | Description | Default | Endpoints |
|---|---|---|---|---|
| model | String | The ID of the model to use. | Required | transcriptions, translations |
| file | File | Audio file in FLAC, MP3, MP4, MPEG, MPGA, M4A, Ogg, WAV, or WebM format. File size limit: 25 MB. | Required | transcriptions, translations |
| prompt | String | Prompt to influence transcription style or vocabulary. Example: “Please transcribe carefully, including pauses and hesitations.” | Optional | transcriptions, translations |
| response_format | String | Output format: either json or text. | json | transcriptions, translations |
| language | String | The language of the input audio. Using ISO-639-1 format (e.g., en) improves accuracy and latency. | Optional | transcriptions, translations |
| stream | Boolean | Enables streaming responses. | false | transcriptions, translations |
| stream_options | Object | Additional streaming configuration (e.g., {"include_usage": true}). | Optional | transcriptions, translations |
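The parameters above are sent as multipart/form-data to an OpenAI-compatible transcriptions endpoint. The sketch below shows the request shape using only the Python standard library; the base URL and the API_KEY environment variable are placeholders for your provider's values, not part of this model card.

```python
# A minimal sketch of a transcriptions request against an OpenAI-compatible
# /audio/transcriptions endpoint. BASE_URL and the API_KEY environment
# variable are placeholders; substitute your provider's values.
import io
import os
import urllib.request
import uuid

BASE_URL = "https://api.example.com/v1"   # placeholder: your provider's URL


def build_multipart(fields: dict, file_name: str, file_bytes: bytes):
    """Encode the form fields plus one audio file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        buf.write(f"{value}\r\n".encode())
    buf.write(f"--{boundary}\r\n".encode())
    file_header = (
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    buf.write(file_header.encode())
    buf.write(file_bytes)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return boundary, buf.getvalue()


fields = {
    "model": "Whisper-Large-v3",
    "language": "en",            # ISO-639-1 hint: improves accuracy and latency
    "response_format": "json",   # or "text"
    "prompt": "Please transcribe carefully, including pauses and hesitations.",
}

if __name__ == "__main__" and os.environ.get("API_KEY"):
    with open("meeting.mp3", "rb") as f:          # placeholder audio file
        boundary, body = build_multipart(fields, "meeting.mp3", f.read())
    req = urllib.request.Request(
        f"{BASE_URL}/audio/transcriptions",
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['API_KEY']}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode())
```

To stream the response instead, add `"stream": "true"` (and optionally `"stream_options": '{"include_usage": true}'`) to the form fields.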
Example usage
Translations
The translations endpoint accepts audio in any supported language and returns the output in English. Use the language parameter to specify the language of the input audio in ISO 639-1 format (for example, "es" for Spanish) to improve accuracy and reduce latency.
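Assuming the same OpenAI-compatible layout, a translations request differs from a transcription only in the endpoint path. The sketch below shows the form fields for a Spanish recording; BASE_URL is a placeholder for your provider's URL.

```python
# Form fields for a translations request against an OpenAI-compatible
# /audio/translations endpoint. The output is English regardless of the
# input language. BASE_URL is a placeholder.
BASE_URL = "https://api.example.com/v1"   # placeholder: your provider's URL
ENDPOINT = f"{BASE_URL}/audio/translations"

fields = {
    "model": "Whisper-Large-v3",
    "language": "es",            # input audio is Spanish; hint improves accuracy/latency
    "response_format": "text",   # return the English translation as plain text
}
# Send `fields` plus the audio file as multipart/form-data with Bearer auth,
# exactly as with the transcriptions endpoint.
```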
