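SambaStack models

SambaStack supports a range of models that can be deployed in both on-premises and hosted environments. To find out which models are enabled in your environment, contact your organization's system administrator.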

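Deployment options

When deploying a model on SambaStack, administrators can choose from various combinations of context length and batch size.

Smaller batch sizes provide higher token throughput (tokens/second).
Larger batch sizes provide better concurrency for multiple users.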
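
The trade-off between throughput and concurrency can be sketched as a simple selection rule: choose the smallest available batch size that still covers the expected number of concurrent requests. This is only an illustrative heuristic, not SambaStack's own scheduling logic; the function name `pick_batch_size` and its parameters are hypothetical.

```python
def pick_batch_size(available_batch_sizes, expected_concurrent_requests):
    """Return the smallest available batch size that covers the expected
    concurrency (hypothetical helper, not part of SambaStack).

    Falls back to the largest available size when none is large enough.
    """
    sizes = sorted(available_batch_sizes)
    for size in sizes:
        if size >= expected_concurrent_requests:
            return size
    return sizes[-1]


# Batch sizes offered at one context length (example values).
print(pick_batch_size([1, 2, 4, 8, 16, 32], 3))  # 4
print(pick_batch_size([1, 2, 4], 10))            # 4 (largest available)
```

Preferring the smallest sufficient batch size reflects the guidance above: anything larger adds concurrency headroom at the cost of token throughput.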

Supported models

The following table lists the supported models together with their context lengths, batch sizes, and features.

Developer/Model ID	Model type	Context length (batch sizes)	Features and optimizations	Hugging Face link
Meta
`Meta-Llama-3.3-70B-Instruct`	Text	4K (1,2,4,8,16,32); 8K (1,2,4,8); 16K (1,2,4); 32K (1,2,4); 64K (1); 128K (1)	Endpoint: Chat completions; Capabilities: Function calling, JSON mode; Import checkpoint: Yes; Optimizations: Speculative decoding	Model card
`Meta-Llama-3.1-8B-Instruct`	Text	4K (1,2,4,8); 8K (1,2,4,8); 16K (1,2,4)	Endpoint: Chat completions; Capabilities: Function calling, JSON mode; Import checkpoint: Yes; Optimizations: None	Model card
`Meta-Llama-3.1-405B-Instruct`	Text	4K (1,2,4); 8K (1); 16K (1)	Endpoint: Chat completions; Capabilities: Function calling, JSON mode; Import checkpoint: No; Optimizations: Speculative decoding	Model card
`Llama-4-Maverick-17B-128E-Instruct`	Image, Text	4K (1,4); 8K (1)	Endpoint: Chat completions; Capabilities: Function calling, JSON mode; Import checkpoint: No; Optimizations: None	Model card
DeepSeek
`DeepSeek-R1-0528`	Reasoning, Text	4K (4); 8K (1); 16K (1); 32K (1)	Endpoint: Chat completions; Capabilities: Function calling, JSON mode; Import checkpoint: No; Optimizations: None	Model card
`DeepSeek-R1-Distill-Llama-70B`	Reasoning, Text	4K (1,2,4,8,16,32); 8K (1,2,4,8); 16K (1,2,4); 32K (1,2,4); 64K (1); 128K (1)	Endpoint: Chat completions; Capabilities: None; Import checkpoint: Yes; Optimizations: Speculative decoding	Model card
`DeepSeek-V3-0324`	Text	4K (4); 8K (1); 16K (1); 32K (1)	Endpoint: Chat completions; Capabilities: Function calling, JSON mode; Import checkpoint: No; Optimizations: None	Model card
`DeepSeek-V3.1`	Reasoning, Text	4K (4); 8K (1); 16K (1); 32K (1)	Endpoint: Chat completions; Capabilities: Function calling, JSON mode; Import checkpoint: No; Optimizations: None	Model card
OpenAI
`gpt-oss-120b`	Text	8K (2); 32K (2); 64K (2); 128K (2)	Endpoint: Chat completions; Capabilities: Function calling, JSON mode; Import checkpoint: No; Optimizations: None	Model card
`Whisper-Large-v3`	Audio	4K (1,16,32)	Endpoint: Translation, Transcription; Capabilities: None; Import checkpoint: No; Optimizations: None	Model card
Qwen
`Qwen3-32B`	Reasoning, Text	8K (1)	Endpoint: Chat completions; Capabilities: None; Import checkpoint: No; Optimizations: None	Model card
Tokyotech-llm
`Llama-3.3-Swallow-70B-Instruct-v0.4`	Text	4K (1,2,4,8,16); 8K (1,2,4,8,16); 16K (1,2,4); 32K (1,2,4); 64K (1); 128K (1)	Endpoint: Chat completions; Capabilities: None; Import checkpoint: No; Optimizations: Speculative decoding	Model card
Other
`E5-Mistral-7B-Instruct`	Embedding	4K (1,2,4,8,16,32)	Endpoint: Embeddings; Capabilities: None; Import checkpoint: No; Optimizations: None	Model card
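
As a rough illustration of how to read the table, the sketch below encodes two of its rows as a dictionary and looks up the smallest context length that fits a given token count. The names `SUPPORTED` and `smallest_context` are hypothetical, and treating "4K" as 4 × 1024 tokens is an assumption; the table itself does not define the unit.

```python
# Two rows copied from the table: context length in K -> batch sizes.
SUPPORTED = {
    "Meta-Llama-3.3-70B-Instruct": {
        4: [1, 2, 4, 8, 16, 32],
        8: [1, 2, 4, 8],
        16: [1, 2, 4],
        32: [1, 2, 4],
        64: [1],
        128: [1],
    },
    "gpt-oss-120b": {8: [2], 32: [2], 64: [2], 128: [2]},
}


def smallest_context(model, tokens_needed):
    """Return (context_length_k, batch_sizes) for the smallest listed context
    length that holds `tokens_needed` tokens, or None if none is large enough.

    Assumption: 1K = 1024 tokens.
    """
    for length_k in sorted(SUPPORTED[model]):
        if length_k * 1024 >= tokens_needed:
            return length_k, SUPPORTED[model][length_k]
    return None


print(smallest_context("Meta-Llama-3.3-70B-Instruct", 10_000))  # (16, [1, 2, 4])
print(smallest_context("gpt-oss-120b", 3_000))                  # (8, [2])
```

Longer context lengths in the table come with fewer batch-size options, so a lookup like this makes the available concurrency for a given prompt size explicit.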

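Sample bundles

In SambaStack, models are deployed as bundles rather than individually. A bundle is a deployment unit that packages one or more models together with their configurations (batch size, sequence length, precision, and so on). For example, deploying the Meta-Llama-3.3-70B model with batch size 4 and sequence length 16K is one configuration, whereas a bundle can contain multiple configurations across the same or different models.

SambaNova's RDU technology allows multiple models and configurations to be loaded at the same time within a single deployment, so you can switch between models and batch/sequence profiles instantly as needed. Whereas traditional GPU systems mostly rely on static, single-model deployments, SambaStack supports multi-model, multi-configuration bundles. This approach delivers high efficiency, flexibility, and throughput while maintaining low latency.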
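
The relationship between a configuration and a bundle can be modelled in a few lines. The sketch below is purely conceptual: the class names `ModelConfig` and `Bundle`, their fields, and the bundle name `example-bundle` are hypothetical and do not reflect SambaStack's actual configuration schema.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class ModelConfig:
    """One configuration: a model at a given sequence length and batch size."""
    model: str
    seq_length_k: int
    batch_size: int


@dataclass
class Bundle:
    """A deployment unit that groups one or more configurations."""
    name: str
    configs: List[ModelConfig] = field(default_factory=list)

    def models(self) -> Set[str]:
        return {c.model for c in self.configs}

    def supports(self, model: str, seq_length_k: int, batch_size: int) -> bool:
        return ModelConfig(model, seq_length_k, batch_size) in self.configs


# Hypothetical bundle: two configurations of one model plus a second model.
bundle = Bundle(
    name="example-bundle",
    configs=[
        ModelConfig("Meta-Llama-3.3-70B", 16, 4),
        ModelConfig("Meta-Llama-3.3-70B", 32, 1),
        ModelConfig("DeepSeek-R1-Distill-Llama-70B", 16, 4),
    ],
)

print(bundle.supports("Meta-Llama-3.3-70B", 16, 4))  # True
print(bundle.supports("Meta-Llama-3.3-70B", 64, 1))  # False
print(sorted(bundle.models()))
```

Keeping each configuration as its own immutable record mirrors the description above: a single model can appear several times in one bundle, once per batch/sequence profile.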

Bundle template	Bundle description	Bundle configuration
70b-3dot3-ss-16k-32k-64k-128k	Speculative decoding of: `Meta-Llama-3.3-70B` (Target), `Meta-Llama-3.2-1B` (Draft). Medium to large context length with low batch size	Target Models: `Meta-Llama-3.3-70B-Instruct` (Seq Length: 16K, BS: 1, 2, 4; Seq Length: 32K, BS: 1, 2; Seq Length: 64K, BS: 1; Seq Length: 128K, BS: 1). Draft Models: `Meta-Llama-3.2-1B-Instruct` (Seq Length: 16K, BS: 1, 2, 4; Seq Length: 32K, BS: 1, 2; Seq Length: 64K, BS: 1; Seq Length: 128K, BS: 1)
70b-3dot3-ss-8-16-32k-batching	Speculative decoding of: `Meta-Llama-3.3-70B` (Target), `Meta-Llama-3.2-1B` (Draft). Small to medium context length with low-medium batch sizes	Target Models: `Meta-Llama-3.3-70B-Instruct` (Seq Length: 8K, BS: 1, 2, 4, 8; Seq Length: 16K, BS: 1, 2, 4; Seq Length: 32K, BS: 1, 2). Draft Models: `Meta-Llama-3.2-1B-Instruct` (Seq Length: 8K, BS: 1, 2, 4, 8; Seq Length: 16K, BS: 1, 2, 4; Seq Length: 32K, BS: 1, 2)
70b-ss-8-16-32k	Speculative decoding of: `Meta-Llama-3.3-70B` (Target), `DeepSeek-R1-Distill-Llama-70B` (Target), `Meta-Llama-3.2-1B` (Draft), `Meta-Llama-3.2-1B-Distill-Instruct` (Draft). Small to medium context length with low-medium batch sizes	Target Models: `DeepSeek-R1-Distill-Llama-70B` (Seq Length: 8K, BS: 1, 2, 4, 8; Seq Length: 16K, BS: 1, 2, 4; Seq Length: 32K, BS: 1, 4), `Meta-Llama-3.3-70B-Instruct` (Seq Length: 8K, BS: 1, 2, 4, 8; Seq Length: 16K, BS: 1, 2, 4; Seq Length: 32K, BS: 1, 4). Draft Models: `Meta-Llama-3.2-1B-Distill-Instruct` (Seq Length: 8K, BS: 1, 2, 4, 8; Seq Length: 16K, BS: 1, 2, 4; Seq Length: 32K, BS: 1, 4), `Meta-Llama-3.2-1B-Instruct` (Seq Length: 8K, BS: 1, 2, 4, 8; Seq Length: 16K, BS: 1, 2, 4; Seq Length: 32K, BS: 1, 4)
llama-405b-s-m	Speculative decoding of: `Meta-Llama-3.1-405B` (Target), `Meta-Llama-3.1-8B` (Draft), `Meta-Llama-3.2-3B` (Draft). Small context length with low batch sizes	Target Models: `Meta-Llama-3.1-405B-Instruct` (Seq Length: 16K, BS: 1; Seq Length: 8K, BS: 1; Seq Length: 4K, BS: 1, 2, 4). Draft Models: `Meta-Llama-3.1-8B-Instruct-16k` (Seq Length: 16K, BS: 1), `Meta-Llama-3.2-3B-Instruct` (Seq Length: 8K, BS: 1; Seq Length: 4K, BS: 1, 2, 4)
deepseek-r1-v3-fp8-32k	Combination of: `DeepSeek-R1-0528`, `DeepSeek-V3-0324`. High context length with low batch size	Models: `DeepSeek-R1-0528` (Seq Length: 32K, BS: 1), `DeepSeek-V3-0324` (Seq Length: 32K, BS: 1)
deepseek-r1-v3-fp8-16k	Combination of: `DeepSeek-R1-0528`, `DeepSeek-V3-0324`. Medium context length with low batch size	Models: `DeepSeek-R1-0528` (Seq Length: 16K, BS: 1), `DeepSeek-V3-0324` (Seq Length: 16K, BS: 1)
deepseek-r1-v3-fp8-4-8k	Combination of: `DeepSeek-R1-0528`, `DeepSeek-V3-0324`. Low context length with low batch size	Models: `DeepSeek-R1-0528` (Seq Length: 8K, BS: 1; Seq Length: 4K, BS: 4), `DeepSeek-V3-0324` (Seq Length: 8K, BS: 1; Seq Length: 4K, BS: 4)
deepseek-r1-v31-fp8-16k	Combination of: `DeepSeek-R1-0528`, `DeepSeek-V3.1`. Medium context length with low batch size	Models: `DeepSeek-R1-0528` (Seq Length: 16K, BS: 1), `DeepSeek-V3.1` (Seq Length: 16K, BS: 1)
deepseek-r1-v31-fp8-32k	Combination of: `DeepSeek-R1-0528`, `DeepSeek-V3.1`. Large context length with low batch size	Models: `DeepSeek-R1-0528` (Seq Length: 32K, BS: 1), `DeepSeek-V3.1` (Seq Length: 32K, BS: 1)
deepseek-r1-v31-fp8-4k	Combination of: `DeepSeek-R1-0528`, `DeepSeek-V3.1`. Small context length with low batch size	Models: `DeepSeek-R1-0528` (Seq Length: 4K, BS: 1, 4), `DeepSeek-V3.1` (Seq Length: 4K, BS: 1, 4)
deepseek-r1-v31-fp8-8k	Combination of: `DeepSeek-R1-0528`, `DeepSeek-V3.1`. Small context length with low batch size	Models: `DeepSeek-R1-0528` (Seq Length: 8K, BS: 1), `DeepSeek-V3.1` (Seq Length: 8K, BS: 1)
llama-4-medium-8-16-32-64-128k	`Llama-4-Maverick-17B-128E-Instruct`. Small to large context length with low batch size	`Llama-4-Maverick-17B-128E-Instruct` (Seq Length: 8K, BS: 1; Seq Length: 16K, BS: 1; Seq Length: 32K, BS: 1; Seq Length: 64K, BS: 1; Seq Length: 128K, BS: 1)
qwen3-32b-whisper-e5-mistral	Combination of: `Qwen-3-32B`, `Whisper-Large-v3`, `E5-Mistral-7B-Instruct`. Small to medium context length with varied batch size	`E5-Mistral-7B-Instruct` (Seq Length: 4K, BS: 1, 4, 8, 16, 32), `Qwen-3-32B` (Seq Length: 8K, BS: 1, 4; Seq Length: 16K, BS: 1; Seq Length: 32K, BS: 1, 2), `Whisper-Large-v3` (BS: 1, 16, 32)
gpt-oss-120b-8k	`gpt-oss-120b`. Small context length with low batch size	`gpt-oss-120b` (Seq Length: 8K, BS: 2)
gpt-oss-120b-32k	`gpt-oss-120b`. Medium context length with low batch size	`gpt-oss-120b` (Seq Length: 32K, BS: 2)
gpt-oss-120b-64-128k	`gpt-oss-120b`. Large context length with low batch size	`gpt-oss-120b` (Seq Length: 64K, BS: 2; Seq Length: 128K, BS: 2)
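
To show how the bundle table might be queried, the sketch below copies a handful of its rows into a nested dictionary and returns every bundle that serves a given model at a given sequence length and batch size. The structure and the name `find_bundles` are hypothetical; SambaStack's real bundle definitions may look quite different.

```python
# Subset of the table: bundle -> model -> sequence length in K -> batch sizes.
SAMPLE_BUNDLES = {
    "deepseek-r1-v3-fp8-16k": {
        "DeepSeek-R1-0528": {16: [1]},
        "DeepSeek-V3-0324": {16: [1]},
    },
    "deepseek-r1-v31-fp8-16k": {
        "DeepSeek-R1-0528": {16: [1]},
        "DeepSeek-V3.1": {16: [1]},
    },
    "gpt-oss-120b-8k": {"gpt-oss-120b": {8: [2]}},
    "gpt-oss-120b-32k": {"gpt-oss-120b": {32: [2]}},
    "gpt-oss-120b-64-128k": {"gpt-oss-120b": {64: [2], 128: [2]}},
}


def find_bundles(model, seq_length_k, batch_size):
    """List the bundles (in table order) that include the requested
    model / sequence length / batch size combination."""
    return [
        name
        for name, models in SAMPLE_BUNDLES.items()
        if batch_size in models.get(model, {}).get(seq_length_k, [])
    ]


print(find_bundles("DeepSeek-R1-0528", 16, 1))
# ['deepseek-r1-v3-fp8-16k', 'deepseek-r1-v31-fp8-16k']
print(find_bundles("gpt-oss-120b", 128, 2))  # ['gpt-oss-120b-64-128k']
print(find_bundles("gpt-oss-120b", 8, 1))    # []
```

The first query returns two bundles because `DeepSeek-R1-0528` is packaged with a different companion model in each, which is the kind of choice an administrator makes when picking a bundle.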

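Recommended bundles

The following table lists the recommended bundle templates for models available on SambaStack. Each entry pairs a model with its best-suited deployment configuration, enabling efficient setup and operation.

Model name	Bundle template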
`Meta-Llama-3.3-70B-Instruct`	70b-3dot3-ss-8-16-32k-batching
`Llama-4-Maverick-17B-128E-Instruct`	llama-4-medium-8-16-32-64-128k
`DeepSeek-R1-0528`	deepseek-r1-v31-fp8-16k
`DeepSeek-R1-Distill-Llama-70B`	70b-ss-8-16-32k
`DeepSeek-V3-0324`	deepseek-r1-v3-fp8-16k
`DeepSeek-V3.1`	deepseek-r1-v31-fp8-16k
`Whisper-Large-v3`	qwen3-32b-whisper-e5-mistral
`Qwen3-32B`	qwen3-32b-whisper-e5-mistral
`E5-Mistral-7B-Instruct`	qwen3-32b-whisper-e5-mistral
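
Several models in the table share the same recommended bundle, so a single deployment can cover more than one of them. The sketch below inverts the table to make that grouping visible; the names `RECOMMENDED` and `models_by_bundle` are hypothetical, and the data is copied from the table above.

```python
from collections import defaultdict

# Model name -> recommended bundle template, as listed in the table.
RECOMMENDED = {
    "Meta-Llama-3.3-70B-Instruct": "70b-3dot3-ss-8-16-32k-batching",
    "Llama-4-Maverick-17B-128E-Instruct": "llama-4-medium-8-16-32-64-128k",
    "DeepSeek-R1-0528": "deepseek-r1-v31-fp8-16k",
    "DeepSeek-R1-Distill-Llama-70B": "70b-ss-8-16-32k",
    "DeepSeek-V3-0324": "deepseek-r1-v3-fp8-16k",
    "DeepSeek-V3.1": "deepseek-r1-v31-fp8-16k",
    "Whisper-Large-v3": "qwen3-32b-whisper-e5-mistral",
    "Qwen3-32B": "qwen3-32b-whisper-e5-mistral",
    "E5-Mistral-7B-Instruct": "qwen3-32b-whisper-e5-mistral",
}


def models_by_bundle(recommended):
    """Group model names by their recommended bundle template."""
    grouped = defaultdict(list)
    for model, bundle in recommended.items():
        grouped[bundle].append(model)
    return dict(grouped)


for bundle, models in models_by_bundle(RECOMMENDED).items():
    print(f"{bundle}: {', '.join(models)}")
```

In this grouping, `qwen3-32b-whisper-e5-mistral` covers three models and `deepseek-r1-v31-fp8-16k` covers two, while the remaining bundles are each recommended for a single model.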
