This guide describes how to create custom BundleTemplates and Bundles to deploy models with specific configurations on SambaStack. In SambaStack, bundles are the fundamental deployment unit. Rather than deploying individual models, you deploy bundles that group one or more models together with their deployment configurations, including batch sizes and sequence lengths. This approach uses the SambaNova Reconfigurable Dataflow Unit (RDU) to support multiple models and configurations in a single deployment, enabling instant switching between configurations for improved efficiency and flexibility.
This guide covers creating custom bundles. For deploying pre-configured bundles provided by SambaNova, see Deploying Bundles.
Prerequisites
Before creating custom bundles, complete the quickstart that applies to you:
- Quickstart - Hosted - System setup for hosted SambaStack
- Quickstart - On-prem - System setup for on-prem SambaStack

Also review the following pages:
- Model Bundle Deployment - Bundle deployment concepts and workflows
- Supported Models and Bundles - Available model checkpoints
- Speculative Decoding Deployment Guidelines - Required if configuring speculative decoding pairs
Terminology
| Term | Definition |
|---|---|
| RDU | Reconfigurable Dataflow Unit - SambaNova’s proprietary processor architecture |
| PEF | Processor Executable Format - Compiled model binaries that run on RDUs |
| Expert | A sequence length profile configuration (for example, 8k, 16k, 32k) within a model |
| Speculative Decoding | An optimization technique using a smaller draft model to accelerate inference from a larger target model |
| Legalizer | A validation process that verifies a bundle fits within RDU memory constraints |
| CR (Custom Resource) | A Kubernetes extension that defines custom resource types such as Pef, Bundle, and BundleTemplate |
Concepts
Bundle architecture
For an overview of core bundle concepts, including what bundles and bundle templates are, see Deploying Model Bundles. This section covers the specific three-tier structure used when creating custom bundles:
- BundleTemplate - Defines how models can be run: available sequence length profiles, batch sizes, and PEF mappings. Also defines target and draft model relationships for speculative decoding pairs.
- Bundle - Binds a template to specific checkpoints in storage, making it ready for deployment
- BundleDeployment - Instantiates one or more replicas of a bundle on the cluster

This separation allows you to:
- Reuse templates across multiple checkpoints, including custom checkpoints for fine-tuned models
- Bundle different checkpoints of the same underlying model while sharing the same template
- Deploy the same bundle configuration with different replica counts
- Update checkpoints without modifying deployment configurations
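
The reference chain between the three tiers can be pictured with a small in-memory model. This is only an illustrative sketch: the class and field names below are hypothetical stand-ins and do not reproduce the actual custom resource schemas, and the `gs://example-bucket/...` paths are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BundleTemplate:
    name: str
    models: tuple  # model names the template knows how to run


@dataclass(frozen=True)
class Bundle:
    name: str
    template: BundleTemplate  # a Bundle points at exactly one template
    checkpoints: dict         # model name -> checkpoint location (placeholder paths)


@dataclass(frozen=True)
class BundleDeployment:
    bundle: Bundle            # a deployment points at exactly one Bundle
    min_replicas: int


MODEL = "Meta-Llama-3.3-70B-Instruct"
template = BundleTemplate("example-template", (MODEL,))

# One template reused for two different checkpoints of the same model.
base = Bundle("example-base", template, {MODEL: "gs://example-bucket/base"})
tuned = Bundle("example-finetuned", template, {MODEL: "gs://example-bucket/finetuned"})

# One bundle deployed with two different replica counts.
small = BundleDeployment(base, min_replicas=1)
large = BundleDeployment(base, min_replicas=4)
```

Because each tier only references the tier above it, swapping a checkpoint means creating or editing a Bundle, while the template and the deployment settings stay untouched.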
BundleTemplate structure
A BundleTemplate defines the deployment capabilities for one or more models. For example, a multi-model template with speculative decoding configuration might contain:
- gpt-oss-120b: 2 configs
- Meta-Llama-3.3-70B-Instruct: 5 configs, 3 of which use speculative decoding with Meta-Llama-3.1-8B-Instruct as the draft model
- Meta-Llama-3.1-8B-Instruct: 3 configs
| Field | Required | Description |
|---|---|---|
| `spec.models` | Yes | Defines the models and their expert configurations. See Models and Experts. |
| `spec.owner` | Yes | Email address of the bundle template owner for tracking and notifications. |
| `spec.secretNames` | Yes | List of Kubernetes secrets used to access artifacts. Must match secrets configured in your environment. |
| `spec.usePefCRs` | Yes | Set to `true` to use PEF custom resources for deployment. |
- `8k`, `16k`, `32k`, `64k`, `128k` - Fixed sequence length configurations
- `default` - Standard configuration when no specific length is required
| Parameter | Required | Description |
|---|---|---|
| `pef` | Yes | Reference to a PEF custom resource in the format `<pef-name>:<version>`. Use version 1 unless a higher version is confirmed via `kubectl describe pef`. The `<pef-name>` includes the batch size after the `bs` characters. |
| `spec_decoding` | No | Speculative decoding configuration. Only specify for target models, not draft models. |
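
The `<pef-name>:<version>` format and the `bs` naming convention can be checked programmatically before a template is applied. The following is a minimal sketch, not a SambaStack tool: the function name is made up for illustration, the PEF name `example-model-ss8192-bs4` is hypothetical, and the assumption that `bs` is preceded by a `-` or `_` separator may not hold for every PEF name.

```python
import re


def parse_pef_ref(ref: str) -> dict:
    """Split '<pef-name>:<version>' and read the batch size that follows 'bs' in the name."""
    name, sep, version = ref.rpartition(":")
    if not sep or not name or not version.isdigit():
        raise ValueError(f"expected '<pef-name>:<version>', got {ref!r}")
    # Assumption: 'bs' starts the name or follows a '-' or '_' separator.
    match = re.search(r"(?:^|[-_])bs(\d+)", name)
    return {
        "name": name,
        "version": int(version),
        "batch_size": int(match.group(1)) if match else None,
    }


# Hypothetical PEF reference, used only to exercise the parser.
ref = parse_pef_ref("example-model-ss8192-bs4:1")
```

A reference without a version suffix raises `ValueError`, which makes a missing `:<version>` easy to catch early.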
Speculative decoding parameters
Parameters within `spec_decoding` (target models only):

| Parameter | Description |
|---|---|
| `draft_model` | Name of the draft model in the same BundleTemplate |
| `draft_expert` | Expert profile of the draft model to use. Should match the sequence length of the target model expert (for example, use a 16k draft expert with a 16k target expert). |

For detailed guidance, see Speculative Decoding Deployment Guidelines.
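
The two rules above (the draft model must exist in the same BundleTemplate, and the draft expert should match the target expert) lend themselves to a quick consistency check. The sketch below assumes a simplified in-memory layout of `model -> expert -> list of configs`; that nesting, the function name, and the PEF names are illustrative assumptions rather than the actual BundleTemplate schema.

```python
def check_spec_decoding(models: dict) -> list:
    """Return a list of human-readable problems found in spec_decoding settings."""
    problems = []
    for model_name, experts in models.items():
        for expert_name, configs in experts.items():
            for config in configs:
                sd = config.get("spec_decoding")
                if sd is None:
                    continue  # config does not use speculative decoding
                where = f"{model_name}/{expert_name}"
                draft_model = sd.get("draft_model")
                draft_expert = sd.get("draft_expert")
                if draft_model not in models:
                    problems.append(f"{where}: draft model {draft_model!r} is not in the template")
                elif draft_expert not in models[draft_model]:
                    problems.append(f"{where}: draft model has no expert {draft_expert!r}")
                elif draft_expert != expert_name:
                    problems.append(f"{where}: draft expert {draft_expert!r} differs from target expert")
    return problems


# Hypothetical template contents; the PEF names are placeholders.
models = {
    "Meta-Llama-3.3-70B-Instruct": {
        "16k": [{
            "pef": "example-target-bs2:1",
            "spec_decoding": {
                "draft_model": "Meta-Llama-3.1-8B-Instruct",
                "draft_expert": "16k",
            },
        }],
    },
    "Meta-Llama-3.1-8B-Instruct": {
        "16k": [{"pef": "example-draft-bs2:1"}],
    },
}
problems = check_spec_decoding(models)  # empty list when everything lines up
```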
Bundle structure
A Bundle binds a BundleTemplate to specific checkpoints. The following sections explain each part of the Bundle manifest.

Resource Identity

Define the Bundle name and resource type:
- `metadata.name` - The Bundle name used to reference this bundle in deployments
- `apiVersion` and `kind` - Keep these values the same for all bundles
| Field | Description |
|---|---|
| `source` | GCS path pointing to the model checkpoint. Find available checkpoints in Model and Bundle Directory. |
| `toolSupport` | Boolean flag indicating whether this checkpoint is compatible with tools and function-calling (if supported by the product). |

| Field | Description |
|---|---|
| `<model-key>` (for example, `Meta-Llama-3.3-70B-Instruct`) | The API model name that users will send inference requests to. Must match a name in the BundleTemplate's `spec.models` section. |
| `checkpoint` | The checkpoint alias (from `spec.checkpoints`) this model should use. |
| `template` | The model template in the BundleTemplate's `spec.models` to use. This value must exactly match a model name defined under `spec.models` in the BundleTemplate. |

| Field | Description |
|---|---|
| `template` | References the BundleTemplate by its `metadata.name`. This connects the Bundle to the deployment configurations defined in that template. |
| `secretNames` | Credentials used to read checkpoints from GCS. Must match the secrets configured in your environment. |
source fields above.
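
The cross-references described in the tables above (each model's `checkpoint` must be a defined checkpoint alias, and its `template` must name a model in the BundleTemplate) can be verified before the manifest is applied. This is a rough sketch over a plain dictionary: the dictionary layout, the function name, the `llama-70b-ckpt` alias, and the `gs://example-bucket/...` path are assumptions made for illustration.

```python
def check_bundle(bundle_spec: dict, template_models: set) -> list:
    """Return problems with checkpoint aliases and template model names in a Bundle spec."""
    problems = []
    checkpoints = bundle_spec.get("checkpoints", {})
    for api_name, entry in bundle_spec.get("models", {}).items():
        alias = entry.get("checkpoint")
        if alias not in checkpoints:
            problems.append(f"{api_name}: checkpoint alias {alias!r} is not defined")
        model_template = entry.get("template")
        if model_template not in template_models:
            problems.append(f"{api_name}: template model {model_template!r} is not in the BundleTemplate")
    return problems


# Hypothetical Bundle spec; alias and path are placeholders.
bundle_spec = {
    "checkpoints": {
        "llama-70b-ckpt": {"source": "gs://example-bucket/llama-70b", "toolSupport": True},
    },
    "models": {
        "Meta-Llama-3.3-70B-Instruct": {
            "checkpoint": "llama-70b-ckpt",
            "template": "Meta-Llama-3.3-70B-Instruct",
        },
    },
}
problems = check_bundle(bundle_spec, {"Meta-Llama-3.3-70B-Instruct"})
```

An empty result means every model entry resolves to a defined checkpoint and a known template model; a name mismatch here is also one of the causes listed under Deployment failures.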
BundleDeployment structure
A BundleDeployment instantiates a bundle on the cluster. For detailed deployment information, see Quickstart - Hosted or Quickstart - On-prem.

| Field | Description |
|---|---|
spec.bundle | Name of the Bundle to deploy |
spec.groups[].name | Name identifier for the deployment group |
spec.groups[].minReplicas | Minimum number of bundle replicas to maintain |
spec.groups[].qosList | Quality of service classes for request prioritization |
spec.owner | Email address of the deployment owner for tracking and notifications |
spec.secretNames | Credentials used to access artifacts. Must match secrets configured in your environment. |
PEF and checkpoint lifecycle status
SambaStack assigns a `pef_status` field to PEF CR versions and a `checkpoint_status` field to model CR checkpoint versions to indicate their support lifecycle. Understanding these statuses helps you make informed decisions when selecting PEF or checkpoint versions for custom bundles.
PEF and checkpoint version status values
Each version entry in a PEF CR includes a pef_status field. Checkpoint CR versions use checkpoint_status. Both share the same set of values:
| Status | Description |
|---|---|
| `preview` | Not fully tested or supported. May have unknown reliability or performance issues, or limited functionality (for example, partial function calling support). Not recommended for production workloads. |
| `stable` | Fully supported and tested. |
| `deprecated` | Has known reliability or performance issues. Still available for a limited transition period (up to 3 months from the deprecation announcement) to allow migration to a stable version. |
| `removed` | No longer usable. The version entry is retained in the PEF CR or model CR for traceability and auditability, but the path may no longer exist, causing deployment to fail if referenced. |
To check the status of a version, run `kubectl describe pef <pef-name>` or `kubectl describe model <model-name>` and review the `pef_status` or `checkpoint_status` field in the Versions section.
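
One way to apply the status values when choosing a version is to take the highest version whose status is acceptable for your workload. The sketch below assumes the version entries have already been copied into a list of dictionaries with a numeric `version` key; that shape and the function name are illustrative assumptions, not the format `kubectl` prints.

```python
def pick_version(versions: list, status_field: str = "pef_status", allowed=("stable",)):
    """Return the highest version number whose status is in `allowed`, or None."""
    usable = [v for v in versions if v.get(status_field) in allowed]
    if not usable:
        return None
    return max(v["version"] for v in usable)


# Hypothetical version entries for a single PEF.
versions = [
    {"version": 1, "pef_status": "deprecated"},
    {"version": 2, "pef_status": "stable"},
    {"version": 3, "pef_status": "preview"},
]
production_choice = pick_version(versions)
experimental_choice = pick_version(versions, allowed=("stable", "preview"))
```

Passing `status_field="checkpoint_status"` applies the same selection to checkpoint versions. Defaulting to `stable` only keeps `preview`, `deprecated`, and `removed` versions out of production templates unless you opt in explicitly.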
Procedures
Identify available PEFs
Before creating a BundleTemplate, identify the PEF resources available for your model.
List available PEFs
List the PEFs matching your model and sequence length requirements, for example by running `kubectl get pefs` and filtering the output by model name and sequence length.
View PEF details
View PEF details with `kubectl describe pef <pef-name>` to understand supported configurations and check for higher versions.

Review the `Spec.Metadata` section for:
- `batch_size` - Supported batch size
- `max_seq_length` - Maximum sequence length
- `num_rdus` - Required RDU count
- `rdu_arch` - Required RDU architecture
- `seq_lengths` - Supported sequence lengths
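
Once the metadata for several PEFs has been collected, the fields listed above can be used to shortlist candidates for a template. The following sketch assumes the metadata has been copied by hand into dictionaries, that `seq_lengths` is a list of integers, and that `rdu_arch` is a plain string; the PEF names and the `example-arch` value are hypothetical.

```python
def matching_pefs(pefs: dict, seq_length: int, rdu_arch: str) -> list:
    """Return names of PEFs that support seq_length on rdu_arch, smallest batch size first."""
    hits = [
        (meta["batch_size"], name)
        for name, meta in pefs.items()
        if meta["rdu_arch"] == rdu_arch and seq_length in meta["seq_lengths"]
    ]
    return [name for _, name in sorted(hits)]


# Hypothetical metadata, keyed by PEF name.
pefs = {
    "example-model-bs8": {"batch_size": 8, "rdu_arch": "example-arch", "seq_lengths": [8192, 16384]},
    "example-model-bs2": {"batch_size": 2, "rdu_arch": "example-arch", "seq_lengths": [16384]},
    "example-model-bs4": {"batch_size": 4, "rdu_arch": "other-arch", "seq_lengths": [16384]},
}
candidates = matching_pefs(pefs, seq_length=16384, rdu_arch="example-arch")
```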
Review the Versions section to determine if a higher PEF version is available and to review each version's `pef_status` before referencing it in a BundleTemplate.
Create a BundleTemplate
Create the YAML file
Create a YAML file for your BundleTemplate. A single-model template defines one model under `spec.models`. For multi-model templates with speculative decoding, see the example in BundleTemplate structure.
Create a bundle
Create the YAML file
Deploy the bundle
Update or remove a bundle/BundleTemplate
- Update a bundle
- Remove a bundle
Troubleshooting
Legalizer validation failures
| Error Pattern | Cause | Resolution |
|---|---|---|
| `PEF pef1 and pef2 are not checkpoint compatible (checkpoint #0)` | The PEFs are assigned the same `ckpt_sharing_uuid` but cannot share checkpoint memory | Assign different `ckpt_sharing_uuid` values to the incompatible PEFs |
| `Bundle exceeds memory constraints` | Combined PEF and checkpoint size exceeds RDU memory | Reduce the number of experts or batch sizes in the template |
| `PEF not found: <pef-name>` | Referenced PEF does not exist | Verify the PEF name with `kubectl get pefs` |
Deployment failures
| Symptom | Possible Cause | Resolution |
|---|---|---|
| Deployment stuck in pending | Insufficient RDU resources | Check cluster capacity; reduce minReplicas |
| Checkpoint download fails | Invalid GCS path or missing credentials | Verify source path; confirm sambanova-artifact-reader secret exists |
| Model not accessible via API | Model name mismatch | Verify spec.models.<name> matches expected API endpoint |
Related documentation
- Model Deployment - Bundle deployment concepts and workflows
- Supported Models and Bundles - Catalogue of models and bundles available for deployment
- Custom checkpoint deployment - Deploy your own custom or fine-tuned checkpoints
- Checkpoint Conversion Tool - Convert checkpoints to compatible formats

