October 2025
Release Date: October 7, 2025

New features and enhancements
API Reference Documentation Upgrade

Upgraded the API Reference documentation with a fully automated, Swagger-style interface that dynamically syncs with the OpenAPI specification.

- Interactive, auto-generated API documentation with full model definitions.
- Real-time updates to stay aligned with the latest OpenAPI spec.
- Up-to-date usage examples in Python and TypeScript, powered by the official SambaNova SDKs.
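Since the documented API follows an OpenAI-compatible OpenAPI specification, a chat-completions request body can be assembled as below. This is a minimal sketch: the endpoint URL and model name are illustrative assumptions, not quoted from these release notes.

```python
import json

# Assumed OpenAI-compatible chat-completions path (illustrative, not
# confirmed by the release notes).
API_URL = "https://api.sambanova.ai/v1/chat/completions"

def build_chat_request(model: str, prompt: str, stream: bool = False) -> str:
    """Return the JSON body for an OpenAI-style chat completion call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    return json.dumps(payload)

# Model name below is a placeholder; pick one from the model list.
body = build_chat_request("Meta-Llama-3.1-8B-Instruct", "Hello!")
print(body)
```

The body would then be POSTed to the endpoint with an `Authorization: Bearer <key>` header, matching the Python and TypeScript examples the upgraded reference generates.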
June 2025
Release Date: June 17, 2025

New features and enhancements
Japanese Language Support

Added Japanese language support for SambaNova documentation.

- Go to the SambaNova documentation homepage.
- Open the language dropdown menu and select Japanese.
May 2025
Release Date: May 28, 2025

New features and enhancements
AWS Marketplace Integration

SambaCloud is now available as a SaaS offering on AWS Marketplace.

- Subscribe using your AWS account and streamline billing through AWS.
- PrivateLink support for secure, low-latency connections between your AWS VPC and SambaCloud—no public internet exposure.
- Fast onboarding with models like Llama 4, DeepSeek-R1 671B, and Whisper.
- Up to 10x faster inference vs. GPUs with SambaNova’s RDU architecture.
- Privacy-first architecture: SambaNova never stores your data.
- Support for fine-tuned model deployment without code changes.
April 2025
New features and enhancements
New Models

- Qwen3-32B (April 29, 2025)
  - Added as a Preview model. High-capacity, multilingual LLM with strong performance across question answering, summarization, reasoning, and coding.
  - Available via Playground and API.
- Whisper-Large-V3 (April 18, 2025)
  - OpenAI’s latest large-scale automatic speech recognition (ASR) model with enhanced transcription accuracy, improved multilingual support, and better robustness against noisy audio.
  - Available via API.
- Llama-4-Maverick-17B-128E-Instruct (April 9, 2025)
  - 400B parameter mixture-of-experts model with 17B active parameters and 128 experts.
  - Added as a Preview model. Competitive with Gemma 3, Gemini 2.0 Flash, and Mistral 3.1.
  - Available via Playground and API.
- Llama-4-Scout-17B-16E-Instruct (April 7, 2025)
  - 109B parameter mixture-of-experts model with 17B active parameters and 16 experts.
  - Added as a Preview model. Competitive with Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1.
  - Available via Playground and API.
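Whisper-Large-V3 is exposed via API; a transcription call of the OpenAI audio-API shape would upload an audio file plus form fields like those built below. The endpoint path and field names follow the OpenAI convention and are assumptions here, not details taken from these notes.

```python
from typing import Optional

# Assumed OpenAI-style transcription path (illustrative only).
TRANSCRIPTION_URL = "https://api.sambanova.ai/v1/audio/transcriptions"

def build_transcription_fields(model: str, language: Optional[str] = None) -> dict:
    """Form fields that accompany the uploaded audio file in a
    multipart/form-data transcription request."""
    fields = {"model": model, "response_format": "json"}
    if language:
        # ISO-639-1 hint, e.g. "ja"; Whisper can also auto-detect.
        fields["language"] = language
    return fields

print(build_transcription_fields("Whisper-Large-V3", language="en"))
```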
Deprecations
Release Date: April 14, 2025

The following models have been deprecated and are scheduled for removal from active endpoints:

- Llama-3.1-Swallow-70B-Instruct-v0.3
- Llama-3.1-Tulu-3-405B
- Llama-3.2-11B-Vision-Instruct
- Llama-3.2-90B-Vision-Instruct
- Meta-Llama-3.1-70B-Instruct
- Qwen2.5-72B-Instruct
- Qwen2.5-Coder-32B-Instruct
March 2025
New features and enhancements
New Models

- DeepSeek-V3-0324 (March 27, 2025)
  - First open-source non-reasoning model to outperform proprietary non-reasoning models.
  - Major boost in reasoning performance, stronger front-end development skills, and smarter tool-use capabilities.
  - Added as a Preview model. Available via Playground and API.
- E5-Mistral-7B-Instruct (March 20, 2025)
  - Embedding model with a Mistral architecture backbone.
  - Recommended for English use. Available via API.
  - See the Embeddings capabilities and endpoint documentation.
- QwQ-32B (March 6, 2025)
  - State-of-the-art reasoning model from the Alibaba Qwen team.
  - Delivers performance comparable to the 671B-parameter DeepSeek-R1 with far fewer parameters.
  - Integrates advanced agent-related features for critical thinking and external tool usage.
- Llama-3.1-Swallow-8B-Instruct-v0.3 and Llama-3.1-Swallow-70B-Instruct-v0.3 (March 5, 2025)
  - Japanese-optimized language models developed through continual pre-training of Meta’s Llama 3.1.
  - Trained on 200B tokens from diverse sources including web corpora, technical content, and multilingual Wikipedia.
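For the E5-Mistral-7B-Instruct embeddings endpoint mentioned above, a request body can be sketched as follows, assuming an OpenAI-compatible `/v1/embeddings` path (an assumption, not quoted from the notes). Note that E5-family models conventionally expect `query:` / `passage:` prefixes on input text.

```python
import json

def build_embeddings_request(texts: list, model: str = "E5-Mistral-7B-Instruct") -> str:
    """JSON body for an OpenAI-style embeddings call.

    The model identifier and endpoint shape are illustrative assumptions.
    """
    payload = {"model": model, "input": texts}
    return json.dumps(payload)

# E5 models are typically prompted with "query: " / "passage: " prefixes.
print(build_embeddings_request(["query: how fast is RDU inference?"]))
```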
API improvements and fixes
Model List Endpoint

Release Date: March 18, 2025

Added the model list endpoint that provides information about currently available models in SambaCloud.

Context Length Increases

- DeepSeek-R1-Distill-Llama-70B (March 13, 2025): Increased to 128k tokens.
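A response from the model list endpoint can be parsed as below. Per the OpenAI list convention, responses carry a `data` array of model entries; the sample payload and its fields are illustrative, only the endpoint's existence comes from these notes.

```python
import json

# Illustrative sample of an OpenAI-style model-list response; field
# names beyond "data"/"id" are assumptions for the sketch.
sample_response = json.dumps({
    "object": "list",
    "data": [
        {"id": "DeepSeek-R1-Distill-Llama-70B", "object": "model"},
        {"id": "QwQ-32B", "object": "model"},
    ],
})

def list_model_ids(raw: str) -> list:
    """Extract model identifiers from a model-list response body."""
    return [entry["id"] for entry in json.loads(raw)["data"]]

print(list_model_ids(sample_response))
```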
February 2025
New features and enhancements
New Models

- DeepSeek-R1 (February 13, 2025)
  - 671B parameter MoE model with performance comparable to OpenAI’s o1 across mathematics, coding, and reasoning.
  - Added as a Preview model. SambaCloud delivers the fastest DeepSeek-R1 deployment in the world.
- Llama-3.1-Tulu-3-405B (February 4, 2025)
  - Open-source model from the Allen Institute for AI (AI2) that outperforms DeepSeek-V3.
  - Trained using Reinforcement Learning with Verifiable Rewards (RLVR).
  - Competitive with or superior to GPT-4o and DeepSeek-V3, with a notable advantage in safety benchmarks.
- DeepSeek-R1-Distill-Llama-70B (January 30, 2025)
  - Fine-tuned on Llama 3.3 70B using samples generated by DeepSeek-R1.
  - Outperforms GPT-4o, o1-mini, and Claude-3.5-Sonnet across AIME, MATH-500, GPQA, and LiveCodeBench.

Context Length Increases

- Llama-3.3-70B (February 25, 2025): Increased to 128k tokens. Available as a production model.
- DeepSeek-R1 (February 21, 2025): Increased to 8k tokens. Available as a preview model.

For API access and higher rate limits for DeepSeek-R1, please complete this form to join the waitlist.
December 2024
New features and enhancements
New Models

- Llama 3.3 70B (December 11, 2024)
  - Delivers performance comparable to Llama 3.1 405B.
  - Competes closely with OpenAI’s GPT-4o and Google’s Gemini Pro 1.5.
- QwQ 32B Preview (December 11, 2024)
  - Experimental reasoning model from Alibaba’s Qwen team with 32.5B parameters.
  - Excels in mathematics and programming: 65.2% on GPQA, 50.0% on AIME, 90.6% on MATH-500, 50.0% on LiveCodeBench.
- Qwen2.5 72B (December 5, 2024)
  - 72B-parameter model trained on 18T tokens. Supports context lengths up to 128k tokens.
  - Supports 29+ languages including English, Chinese, French, and Spanish.
- Qwen2.5 Coder 32B (December 5, 2024)
  - 32B-parameter model tailored for code-related tasks across 92 programming languages.
  - HumanEval score of 92.7%, matching GPT-4o coding capability.
- Llama Guard 3 8B (December 5, 2024)
  - Fine-tuned for content safety classification, aligned with the 14-hazard MLCommons standardized taxonomy.

Context Length Increases

- Llama 3.2 1B: Increased from 4k to 16k.
- Llama 3.1 70B: Increased from 64k to 128k.
- Llama 3.1 405B: Increased from 8k to 16k.
October 2024
New features and enhancements
New Models

- Llama 3.2 11B and 90B (October 29, 2024)
  - Multimodality support for text and image inputs.
- Llama 3.2 1B and 3B (October 1, 2024)
  - Available to all tiers at the fastest inference speed.
API improvements and fixes
Automatic Sequence Length Routing

Release Date: October 10, 2024

Automatic routing based on sequence length; there is no need to change model names to specify different sequence lengths.

Context Length Increases

- Llama 3.1 8B (October 10, 2024): Increased from 8k to 16k.
- Llama 3.1 70B (October 10, 2024): Increased from 8k to 64k.
User experience improvements
- How to Use API guide with example curl code for text and image inputs.
- Streamlined access to updated code snippets.
- New Clear Chat option in Playground.
- New UI components with tooltips.
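The How to Use API guide covers curl examples for text and image inputs; the message structure behind such a multimodal call can be sketched as below, assuming the OpenAI vision-style content schema (the schema and model choice are assumptions, not quoted from the guide).

```python
import base64
import json

def build_multimodal_message(prompt: str, image_bytes: bytes) -> dict:
    """One user message combining text and an inline base64 image,
    in the OpenAI vision-style content-parts format (assumed)."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }

# Placeholder bytes stand in for a real PNG file read from disk.
msg = build_multimodal_message("Describe this chart.", b"\x89PNG...")
print(json.dumps(msg)[:80])
```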
Updated AI starter kits
- Multimodal Retriever: Chart, image, and figure understanding with advanced retrieval combining visual and textual data.
- Llama-3.1-Instruct-o1: Enhanced reasoning with Llama-3.1-405B, hosted on Hugging Face Spaces.
September 2024
Release Date: September 10, 2024

New features and enhancements
SambaCloud Public Launch

- Public launch of the SambaCloud portal, API, and community.
- Access to Llama 3.1 8B, 70B, and 405B at full precision and 10x faster inference compared to GPUs.
- Launched with two tiers: free and enterprise (paid).
