This guide covers creating custom bundles. For deploying pre-configured bundles provided by SambaNova, see Model Deployment.
Prerequisites
Before creating custom bundles, complete the following:
- Installation Prerequisites - Required system and environment setup
- SambaStack Setup - Core cluster installation and configuration
- Optional Configuration - Additional configuration as needed

Also review the following:
- Model Deployment - Bundle deployment concepts and workflows
- SambaStack Models - Available model checkpoints
- Speculative Decoding Deployment Guidelines - Required if configuring speculative decoding pairs
Terminology
| Term | Definition |
|---|---|
| RDU | Reconfigurable Dataflow Unit - SambaNova’s proprietary processor architecture |
| PEF | Processor Executable Format - Compiled model binaries that run on RDUs |
| Expert | A sequence length profile configuration (for example, 8k, 16k, 32k) within a model |
| Speculative Decoding | An optimization technique using a smaller draft model to accelerate inference from a larger target model |
| Legalizer | A validation process that verifies a bundle fits within RDU memory constraints |
| BYOC | Bring Your Own Checkpoint - Support for custom or fine-tuned model checkpoints |
Concepts
Bundle Architecture
SambaStack uses a three-tier architecture for model deployment:
- BundleTemplate - Defines how models can be run: available sequence length profiles, batch sizes, and PEF mappings. Also defines target and draft model relationships for speculative decoding pairs.
- Bundle - Binds a template to specific checkpoints in storage, making it ready for deployment
- BundleDeployment - Instantiates one or more replicas of a bundle on the cluster

This separation lets you:
- Reuse templates across multiple checkpoints, including Bring Your Own Checkpoint (BYOC) for custom fine-tuned models
- Bundle different checkpoints of the same underlying model while sharing the same template
- Deploy the same bundle configuration with different replica counts
- Update checkpoints without modifying deployment configurations
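
The relationship between the three tiers can be pictured with a small Python model. This is only an illustrative sketch: the class layouts, resource names, and checkpoint placeholders are assumptions made for the example, not SambaStack's actual resource schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BundleTemplate:
    name: str
    models: tuple  # model names the template knows how to run


@dataclass(frozen=True)
class Bundle:
    name: str
    template: BundleTemplate  # which template this bundle binds
    checkpoints: dict         # model name -> checkpoint location (placeholder)


@dataclass(frozen=True)
class BundleDeployment:
    name: str
    bundle: Bundle
    min_replicas: int


# Hypothetical names throughout; only the model name comes from this guide.
MODEL = "Meta-Llama-3.3-70B-Instruct"
template = BundleTemplate("example-template", (MODEL,))

# One template, two checkpoints (for example, a base and a BYOC fine-tune).
base = Bundle("example-base", template, {MODEL: "<base-checkpoint-path>"})
tuned = Bundle("example-finetuned", template, {MODEL: "<byoc-checkpoint-path>"})

# One bundle, two deployments with different replica counts.
dev = BundleDeployment("example-dev", base, min_replicas=1)
prod = BundleDeployment("example-prod", base, min_replicas=4)
```

Here both bundles point at the same template object, and both deployments point at the same bundle, which mirrors the reuse described in the list above.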
BundleTemplate Structure
A BundleTemplate defines the deployment capabilities for one or more models. In a multi-model template with speculative decoding configuration, for example:
- Meta-Llama-3.2-1B-Instruct serves as the draft model for speculative decoding
- Meta-Llama-3.3-70B-Instruct is the target model, with spec_decoding configuration pointing to the draft model
BundleTemplate Top-Level Fields
| Field | Required | Description |
|---|---|---|
| spec.models | Yes | Defines the models and their expert configurations. See Models and Experts. |
| spec.owner | Yes | Email address of the bundle template owner for tracking and notifications. |
| spec.secretNames | Yes | List of Kubernetes secrets used to access artifacts. Must match secrets configured in your environment. |
| spec.usePefCRs | Yes | Set to true to use PEF custom resources for deployment. |
Models and Experts
Each model in a BundleTemplate contains one or more experts, which represent sequence length profiles. Common profiles include:
- 8k, 16k, 32k, 64k, 128k - Fixed sequence length configurations
- default - Standard configuration when no specific length is required
Expert Configuration Parameters
Each expert contains one or more configurations with the following parameters:

| Parameter | Required | Description |
|---|---|---|
| batch_size | Yes | Number of concurrent requests the expert can handle. Higher values increase throughput but may increase individual request latency. Include multiple batch sizes to allow the inference engine to select the optimal configuration based on workload. |
| pef | Yes | Reference to a PEF custom resource in format <pef-name>:<version>. Use version 1 unless a higher version is confirmed via kubectl describe pef. |
| ckpt_sharing_uuid | Yes | Identifier for checkpoint memory sharing. PEFs with the same UUID share checkpoint memory. See Determining Checkpoint Compatibility for the recommended workflow. |
| num_tokens_at_a_time | Yes | Number of tokens processed per decoding step. Values are not hard-wired to a specific PEF but tuned per model size and use case. Use 20 for standard models unless you have a clear reason to believe a different value will deliver better performance. Must be 1 for target models in speculative decoding pairs. |
| resubmit_to | No | Fallback expert for requests exceeding the current expert’s sequence length (for example, resubmit to a 32k expert in case of a failure with a 16k expert). Specify the next higher sequence length profile. Omit if no higher option exists. Do not include for draft models in speculative decoding pairs. |
| spec_decoding | No | Speculative decoding configuration. Only specify for target models, not draft models. |
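
The rules in the table above can be checked before applying a template. The following Python sketch is a hypothetical pre-flight check, not part of SambaStack: it assumes the template's models have been loaded into a nested dict of the form model -> expert -> list of configurations, and that the version in a pef reference is a plain integer. The model and PEF names in the sample data are placeholders.

```python
import re

REQUIRED_KEYS = ("batch_size", "pef", "ckpt_sharing_uuid", "num_tokens_at_a_time")
# Assumption: the version part of <pef-name>:<version> is a plain integer.
PEF_REFERENCE = re.compile(r"^[^:\s]+:\d+$")


def iter_configs(models):
    """Yield (model, expert, config) for every expert configuration."""
    for model, experts in models.items():
        for expert, configs in experts.items():
            for config in configs:
                yield model, expert, config


def validate_expert_configs(models):
    """Return a list of rule violations (empty if none were found)."""
    # A model counts as a draft if any target config names it as draft_model.
    draft_models = {
        config["spec_decoding"]["draft_model"]
        for _, _, config in iter_configs(models)
        if "spec_decoding" in config
    }
    problems = []
    for model, expert, config in iter_configs(models):
        where = f"{model}/{expert}"
        for key in REQUIRED_KEYS:
            if key not in config:
                problems.append(f"{where}: missing required parameter {key}")
        if "pef" in config and not PEF_REFERENCE.match(str(config["pef"])):
            problems.append(f"{where}: pef must look like <pef-name>:<version>")
        if "spec_decoding" in config and config.get("num_tokens_at_a_time") != 1:
            problems.append(f"{where}: target models must use num_tokens_at_a_time 1")
        if model in draft_models and "resubmit_to" in config:
            problems.append(f"{where}: draft models must not set resubmit_to")
        if model in draft_models and "spec_decoding" in config:
            problems.append(f"{where}: draft models must not set spec_decoding")
    return problems


# Placeholder data containing three deliberate mistakes.
models = {
    "draft-model": {
        "16k": [{"batch_size": 1, "pef": "draft-pef-16k:1",
                 "ckpt_sharing_uuid": "draft-group", "num_tokens_at_a_time": 20,
                 "resubmit_to": "32k"}],
    },
    "target-model": {
        "16k": [{"batch_size": 1, "pef": "target-pef-16k",
                 "ckpt_sharing_uuid": "target-group", "num_tokens_at_a_time": 20,
                 "spec_decoding": {"draft_model": "draft-model",
                                   "draft_expert": "16k"}}],
    },
}

problems = validate_expert_configs(models)
for problem in problems:
    print(problem)
```

The sample data trips three rules: the draft model sets resubmit_to, the target's pef reference lacks a version, and the target uses a num_tokens_at_a_time other than 1.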
Speculative decoding parameters
Parameters within spec_decoding (target models only):

| Parameter | Description |
|---|---|
| draft_model | Name of the draft model in the same BundleTemplate |
| draft_expert | Expert profile of the draft model to use. Should match the sequence length of the target model expert (for example, use a 16k draft expert with a 16k target expert). |

For detailed guidance, see Speculative Decoding Deployment Guidelines.
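
As a rough illustration of the pairing guidance above, the sketch below walks each target expert and reports a draft_model that is not in the template, a draft_expert that the draft model does not define, or a draft expert whose profile name differs from the target's. The nested-dict layout (model -> expert -> list of configurations) and all model names are assumptions made for the example.

```python
def check_draft_pairing(models):
    """Return findings about spec_decoding references in a template dict."""
    findings = []
    for model, experts in models.items():
        for expert, configs in experts.items():
            for config in configs:
                spec = config.get("spec_decoding")
                if spec is None:
                    continue  # not a target configuration
                draft_model = spec["draft_model"]
                draft_expert = spec["draft_expert"]
                where = f"{model}/{expert}"
                if draft_model not in models:
                    findings.append(f"{where}: draft model {draft_model} is not in the template")
                elif draft_expert not in models[draft_model]:
                    findings.append(f"{where}: {draft_model} has no expert {draft_expert}")
                elif draft_expert != expert:
                    findings.append(f"{where}: draft expert {draft_expert} does not match the target profile")
    return findings


# Placeholder template: the 32k target expert points at a 16k draft expert.
models = {
    "draft-model": {"16k": [{}]},
    "target-model": {
        "16k": [{"spec_decoding": {"draft_model": "draft-model", "draft_expert": "16k"}}],
        "32k": [{"spec_decoding": {"draft_model": "draft-model", "draft_expert": "16k"}}],
    },
}

findings = check_draft_pairing(models)
```

Because the table says the draft expert "should" match, the last check is best treated as a warning rather than a hard failure.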
Coming soon: Automatic ckpt_sharing_uuid management is planned for a future release, reducing or eliminating the need for manual assignment.
Bundle Structure
A Bundle binds a BundleTemplate to specific checkpoints. The following sections explain each part of the Bundle manifest.
Resource Identity
Define the Bundle name and resource type:
- metadata.name - The Bundle name used to reference this bundle in deployments
- apiVersion and kind - Keep these values the same for all bundles
Checkpoints
Define the model checkpoints to use:

| Field | Description |
|---|---|
| source | GCS path pointing to the model checkpoint. Find available checkpoints in SambaStack Models. |
| toolSupport | Boolean flag indicating whether this checkpoint is compatible with tools and function-calling (if supported by the product). |
Models
Map model names to checkpoints and templates:

| Field | Description |
|---|---|
| <model-key> (e.g., Meta-Llama-3.3-70B-Instruct) | The API model name that users will send inference requests to. Must match a name in the BundleTemplate’s spec.models section. |
| checkpoint | The checkpoint alias (from spec.checkpoints) this model should use. |
| template | The model entry in the BundleTemplate’s spec.models to use. Must exactly match a model name defined there. |
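
The cross-references described in the two tables above (model key, checkpoint alias, and template name) are easy to get wrong by a single character. The following Python sketch shows one way to check them offline. It assumes the Bundle spec has been loaded into a dict with checkpoints and models keys shaped as the tables suggest; the aliases and the gs:// placeholders are hypothetical, and the exact manifest layout may differ.

```python
def check_bundle_references(bundle_spec, template_models):
    """Return problems with the Bundle's references to checkpoints and template models."""
    problems = []
    checkpoints = bundle_spec.get("checkpoints", {})
    for model_key, entry in bundle_spec.get("models", {}).items():
        if model_key not in template_models:
            problems.append(f"{model_key}: not defined in the BundleTemplate")
        if entry.get("checkpoint") not in checkpoints:
            problems.append(f"{model_key}: unknown checkpoint alias {entry.get('checkpoint')!r}")
        if entry.get("template") not in template_models:
            problems.append(f"{model_key}: unknown template model {entry.get('template')!r}")
    return problems


# Model names defined under spec.models in the (assumed) BundleTemplate.
template_models = {"Meta-Llama-3.3-70B-Instruct", "Meta-Llama-3.2-1B-Instruct"}

# Placeholder Bundle spec; "drafts_ckpt" is a deliberate typo for "draft_ckpt".
bundle_spec = {
    "checkpoints": {
        "target_ckpt": {"source": "gs://<bucket>/<target-path>", "toolSupport": True},
        "draft_ckpt": {"source": "gs://<bucket>/<draft-path>", "toolSupport": False},
    },
    "models": {
        "Meta-Llama-3.3-70B-Instruct": {
            "checkpoint": "target_ckpt",
            "template": "Meta-Llama-3.3-70B-Instruct",
        },
        "Meta-Llama-3.2-1B-Instruct": {
            "checkpoint": "drafts_ckpt",
            "template": "Meta-Llama-3.2-1B-Instruct",
        },
    },
}

problems = check_bundle_references(bundle_spec, template_models)
```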
Template and Secrets
Connect the Bundle to its BundleTemplate and credentials:

| Field | Description |
|---|---|
| template | References the BundleTemplate by its metadata.name. This connects the Bundle to the deployment configurations defined in that template. |
| secretNames | Credentials used to read checkpoints from GCS. Must match the secrets configured in your environment. |
Complete Bundle Example
The following example shows a complete multi-model Bundle for speculative decoding:
BundleDeployment Structure
A BundleDeployment instantiates a bundle on the cluster. For detailed deployment information, see SambaStack Setup.

| Field | Description |
|---|---|
| spec.bundle | Name of the Bundle to deploy |
| spec.groups[].name | Name identifier for the deployment group |
| spec.groups[].minReplicas | Minimum number of bundle replicas to maintain |
| spec.groups[].qosList | Quality of service classes for request prioritization |
| spec.owner | Email address of the deployment owner for tracking and notifications |
| spec.secretNames | Credentials used to access artifacts. Must match secrets configured in your environment. |
Procedures
Identify Available PEFs
Before creating a BundleTemplate, identify the PEF resources available for your model.
1. List available PEFs
List PEFs matching your model and sequence length requirements (for example, with kubectl get pefs).
2. View PEF details
View PEF details (for example, with kubectl describe pef) to understand supported configurations and check for higher versions.
Review the Spec.Metadata section for:
- batch_size - Supported batch size
- max_seq_length - Maximum sequence length
- num_rdus - Required RDU count
- rdu_arch - Required RDU architecture
- seq_lengths - Supported sequence lengths

Review the Versions section to determine if a higher PEF version is available.
Create a BundleTemplate
1. Create the YAML file
Create a YAML file for your BundleTemplate, either single-model or multi-model. For multi-model templates with speculative decoding, see the BundleTemplate Structure example.
2. Apply the BundleTemplate
3. Verify creation
Determining Checkpoint Compatibility
Use the following workflow to determine valid ckpt_sharing_uuid groupings:
1. Assign initial UUIDs
Start by assigning a common ckpt_sharing_uuid (for example, group1) to all PEFs in your BundleTemplate.
2. Apply and inspect
Apply the bundle and check the status.
3. Check for compatibility errors
If checkpoint sharing is not valid between two PEFs, you will see a legalizer error (see Legalizer Validation Failures for the error pattern).
4. Adjust and retry
Assign different ckpt_sharing_uuid values to the incompatible PEFs and reapply. Repeat until the bundle validates successfully.
Create a Bundle
1. Create the YAML file
2. Apply the Bundle
3. Verify legalizer validation
The legalizer runs automatically when you apply the bundle and validates whether the bundle fits in RDU memory. Validation either succeeds or fails; for failures, see Legalizer Validation Failures.
Deploy the Bundle
1. Create a BundleDeployment
2. Apply the BundleDeployment
3. Monitor deployment status
Update or Remove a Bundle/BundleTemplate
- Update a bundle
- Remove a bundle

To update a Bundle or BundleTemplate:
1. Modify the YAML file
Edit the Bundle or BundleTemplate YAML file with your changes.
2. Reapply the configuration
Troubleshooting
Legalizer Validation Failures
| Error Pattern | Cause | Resolution |
|---|---|---|
| PEF pef1 and pef2 are not checkpoint compatible (checkpoint #0) | PEFs with the same ckpt_sharing_uuid cannot share checkpoint memory | Assign different ckpt_sharing_uuid values to the incompatible PEFs |
| Bundle exceeds memory constraints | Combined PEF and checkpoint size exceeds RDU memory | Reduce the number of experts or batch sizes in the template |
| PEF not found: <pef-name> | Referenced PEF does not exist | Verify PEF name with kubectl get pefs |
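
For the first error pattern in the table, the adjust-and-retry step can be sketched as a small helper that reads the two PEF names from the message and moves one of them into a fresh sharing group. This is a hypothetical illustration: it assumes the message text matches the pattern shown in the table, and the group naming scheme (group1, group2, ...) is simply the example convention used in this guide.

```python
import re

# Assumes the legalizer message follows the pattern shown in the table above.
INCOMPATIBLE = re.compile(r"PEF (\S+) and (\S+) are not checkpoint compatible")


def reassign_after_error(uuid_by_pef, error_message):
    """Return a copy of the PEF -> ckpt_sharing_uuid map with the conflict split."""
    updated = dict(uuid_by_pef)
    match = INCOMPATIBLE.search(error_message)
    if match is None:
        return updated  # not a checkpoint-compatibility error; change nothing
    first, second = match.groups()
    if updated.get(first) == updated.get(second):
        # Move the second PEF into the first unused group name.
        used = set(updated.values())
        n = 1
        while f"group{n}" in used:
            n += 1
        updated[second] = f"group{n}"
    return updated


assignment = {"pef1": "group1", "pef2": "group1", "pef3": "group1"}
error = "PEF pef1 and pef2 are not checkpoint compatible (checkpoint #0)"
assignment = reassign_after_error(assignment, error)
```

Moving only the second PEF is a simple heuristic. It does not try to find the grouping with the fewest UUIDs, so a later error may still require another pass, as the procedure's "repeat until the bundle validates" step implies.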
Checkpoint Compatibility Issues
When checkpoint sharing validation fails, follow the Determining Checkpoint Compatibility procedure to identify and resolve incompatible PEF groupings.
Deployment Failures
| Symptom | Possible Cause | Resolution |
|---|---|---|
| Deployment stuck in pending | Insufficient RDU resources | Check cluster capacity; reduce minReplicas |
| Checkpoint download fails | Invalid GCS path or missing credentials | Verify source path; confirm sambanova-artifact-reader secret exists |
| Model not accessible via API | Model name mismatch | Verify spec.models.<name> matches expected API endpoint |
