Whisper-Large-v3
- Model: Whisper-Large-v3
- Description: State-of-the-art automatic speech recognition (ASR) and translation model. Developed by OpenAI and trained on 5M+ hours of labeled audio. Excels in multilingual and zero-shot speech tasks across diverse domains.
- Model ID: Whisper-Large-v3
- Supported languages: Multilingual
Core capabilities
- Transcribes and translates extended audio inputs (up to 25 MB).
- Demonstrates high accuracy in speech recognition and translation tasks.
- Provides OpenAI-compatible endpoints for transcriptions and translations.
Request parameters
| Parameter | Type | Description | Default | Endpoints |
|---|---|---|---|---|
| model | String | The ID of the model to use. | Required | transcriptions, translations |
| file | File | Audio file in FLAC, MP3, MP4, MPEG, MPGA, M4A, Ogg, WAV, or WebM format. File size limit: 25 MB. | Required | transcriptions, translations |
| prompt | String | Prompt to influence transcription style or vocabulary. Example: “Please transcribe carefully, including pauses and hesitations.” | Optional | transcriptions, translations |
| response_format | String | Output format: either json or text. | json | transcriptions, translations |
| language | String | The language of the input audio. Using ISO-639-1 format (e.g., en) improves accuracy and latency. | Optional | transcriptions, translations |
| stream | Boolean | Enables streaming responses. | false | transcriptions, translations |
| stream_options | Object | Additional streaming configuration (e.g., {"include_usage": true}). | Optional | transcriptions, translations |
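The parameters above are sent as multipart/form-data to an OpenAI-compatible transcriptions endpoint. The sketch below shows the request shape using only the Python standard library; the base URL and the API_KEY environment variable are placeholders for your provider's values, not part of this model card.

```python
# A minimal sketch of a transcriptions request against an OpenAI-compatible
# /audio/transcriptions endpoint. BASE_URL and the API_KEY environment
# variable are placeholders; substitute your provider's values.
import io
import os
import urllib.request
import uuid

BASE_URL = "https://api.example.com/v1"   # placeholder: your provider's URL


def build_multipart(fields: dict, file_name: str, file_bytes: bytes):
    """Encode the form fields plus one audio file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        buf.write(f"{value}\r\n".encode())
    buf.write(f"--{boundary}\r\n".encode())
    file_header = (
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    buf.write(file_header.encode())
    buf.write(file_bytes)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return boundary, buf.getvalue()


fields = {
    "model": "Whisper-Large-v3",
    "language": "en",            # ISO-639-1 hint: improves accuracy and latency
    "response_format": "json",   # or "text"
    "prompt": "Please transcribe carefully, including pauses and hesitations.",
}

if __name__ == "__main__" and os.environ.get("API_KEY"):
    with open("meeting.mp3", "rb") as f:          # placeholder audio file
        boundary, body = build_multipart(fields, "meeting.mp3", f.read())
    req = urllib.request.Request(
        f"{BASE_URL}/audio/transcriptions",
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['API_KEY']}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode())
```

To stream the response instead, add `"stream": "true"` (and optionally `"stream_options": '{"include_usage": true}'`) to the form fields.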
Example usage
Translations
The translations endpoint accepts audio in any supported language and returns the output in English. Use the language parameter to specify the language of the input audio in ISO 639-1 format (for example, "es" for Spanish) to improve accuracy and reduce latency.
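Assuming the same OpenAI-compatible layout, a translations request differs from a transcription only in the endpoint path. The sketch below shows the form fields for a Spanish recording; BASE_URL is a placeholder for your provider's URL.

```python
# Form fields for a translations request against an OpenAI-compatible
# /audio/translations endpoint. The output is English regardless of the
# input language. BASE_URL is a placeholder.
BASE_URL = "https://api.example.com/v1"   # placeholder: your provider's URL
ENDPOINT = f"{BASE_URL}/audio/translations"

fields = {
    "model": "Whisper-Large-v3",
    "language": "es",            # input audio is Spanish; hint improves accuracy/latency
    "response_format": "text",   # return the English translation as plain text
}
# Send `fields` plus the audio file as multipart/form-data with Bearer auth,
# exactly as with the transcriptions endpoint.
```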
