Custom Bundle Deployment

This guide describes how to create custom BundleTemplates and Bundles to deploy models with specific configurations on SambaStack. In SambaStack, bundles are the fundamental deployment unit. Rather than deploying individual models, you deploy bundles that group one or more models together with their deployment configurations, including batch sizes and sequence lengths. This approach uses the SambaNova Reconfigurable Dataflow Unit (RDU) to support multiple models and configurations in a single deployment, enabling instant switching between configurations for improved efficiency and flexibility.

This guide covers creating custom bundles. For deploying pre-configured bundles provided by SambaNova, see Deploying Bundles.

Prerequisites

Before creating custom bundles, complete the quickstart that applies to your environment:

Quickstart - Hosted - System setup for hosted SambaStack
Quickstart - On-prem - System setup for on-prem SambaStack

Additionally, review the following documentation:

Model Bundle Deployment - Bundle deployment concepts and workflows
Supported Models and Bundles - Available model checkpoints
Speculative Decoding Deployment Guidelines - Required if configuring speculative decoding pairs

Terminology

Term	Definition
RDU	Reconfigurable Dataflow Unit - SambaNova’s proprietary processor architecture
PEF	Processor Executable Format - Compiled model binaries that run on RDUs
Expert	A sequence length profile configuration (for example, 8k, 16k, 32k) within a model
Speculative Decoding	An optimization technique using a smaller draft model to accelerate inference from a larger target model
Legalizer	A validation process that verifies a bundle fits within RDU memory constraints
CR (Custom Resource)	A Kubernetes extension that defines custom resource types such as `Pef`, `Bundle`, and `BundleTemplate`

Concepts

Bundle architecture

For an overview of core bundle concepts, including what bundles and bundle templates are, see Deploying Model Bundles. This section covers the three-tier structure used when creating custom bundles:

BundleTemplate - Defines how models can be run: available sequence length profiles, batch sizes, and PEF mappings. Also defines target and draft model relationships for speculative decoding pairs.
Bundle - Binds a template to specific checkpoints in storage, making it ready for deployment
BundleDeployment - Instantiates one or more replicas of a bundle on the cluster

This separation allows you to:

Reuse templates across multiple checkpoints, including custom checkpoints for fine-tuned models
Bundle different checkpoints of the same underlying model while sharing the same template
Deploy the same bundle configuration with different replica counts
Update checkpoints without modifying deployment configurations

BundleTemplate structure

A BundleTemplate defines the deployment capabilities for one or more models. The following example shows a multi-model template with speculative decoding configuration:

apiVersion: sambanova.ai/v1alpha1
kind: BundleTemplate
metadata:
  name: bt-gpt120-llama70sd8-llama8
spec:
  models:
    gpt-oss-120b:
      experts:
        8k:
          configs:
          - pef: gpt-oss-fp8-ss8192-bs2:1
        32k:
          configs:
          - pef: gpt-oss-fp8-ss32768-bs2:1
    Meta-Llama-3.3-70B-Instruct:
      experts:
        4k:
          configs:
          - pef: llama-3p1-70b-ss4096-bs4-sd5:3
            spec_decoding:
              draft_model: Meta-Llama-3.1-8B-Instruct
          - pef: llama-3p1-70b-ss4096-bs32-sd5:2
        8k:
          configs:
          - pef: llama-3p1-70b-ss8192-bs1-sd5:1
          - pef: llama-3p1-70b-ss8192-bs8-sd5:2
          default_config_values:
            spec_decoding:
              draft_model: Meta-Llama-3.1-8B-Instruct
        128k:
          configs:
          - pef: llama-3p1-70b-ss131072-bs1-sd5:2
    Meta-Llama-3.1-8B-Instruct:
      experts:
        4k:
          configs:
          - pef: llama-3p1-8b-ss4096-bs4:1
        8k:
          configs:
          - pef: llama-3p1-8b-ss8192-bs1:1
          - pef: llama-3p1-8b-ss8192-bs8:1
  owner: no-reply@sambanova.ai
  secretNames:
  - sambanova-artifact-reader
  usePefCRs: true

The example above includes configurations for three models:

gpt-oss-120b: 2 configs
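Meta-Llama-3.3-70B-Instruct: 5 configs, 3 of which use speculative decoding with Meta-Llama-3.1-8B-Instruct as the draft model (one 4k config sets spec_decoding directly, and both 8k configs pick it up from default_config_values)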
Meta-Llama-3.1-8B-Instruct: 3 configs
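The counts above can be reproduced with a short script. The following is a minimal sketch and not part of SambaStack: it assumes the template's spec.models section has been loaded into a plain Python dict (TEMPLATE_MODELS and summarize are hypothetical names), and it assumes that default_config_values is merged into every config of its expert, which is inferred from the counts above rather than confirmed behavior.

```python
# Hypothetical in-memory copy of the spec.models section from the example above.
SD = {"spec_decoding": {"draft_model": "Meta-Llama-3.1-8B-Instruct"}}

TEMPLATE_MODELS = {
    "gpt-oss-120b": {"experts": {
        "8k": {"configs": [{"pef": "gpt-oss-fp8-ss8192-bs2:1"}]},
        "32k": {"configs": [{"pef": "gpt-oss-fp8-ss32768-bs2:1"}]},
    }},
    "Meta-Llama-3.3-70B-Instruct": {"experts": {
        "4k": {"configs": [
            {"pef": "llama-3p1-70b-ss4096-bs4-sd5:3", **SD},
            {"pef": "llama-3p1-70b-ss4096-bs32-sd5:2"},
        ]},
        "8k": {
            "configs": [
                {"pef": "llama-3p1-70b-ss8192-bs1-sd5:1"},
                {"pef": "llama-3p1-70b-ss8192-bs8-sd5:2"},
            ],
            "default_config_values": dict(SD),
        },
        "128k": {"configs": [{"pef": "llama-3p1-70b-ss131072-bs1-sd5:2"}]},
    }},
    "Meta-Llama-3.1-8B-Instruct": {"experts": {
        "4k": {"configs": [{"pef": "llama-3p1-8b-ss4096-bs4:1"}]},
        "8k": {"configs": [
            {"pef": "llama-3p1-8b-ss8192-bs1:1"},
            {"pef": "llama-3p1-8b-ss8192-bs8:1"},
        ]},
    }},
}


def summarize(models):
    """Map each model name to (total configs, configs using speculative decoding)."""
    summary = {}
    for name, model in models.items():
        total = with_sd = 0
        for expert in model["experts"].values():
            # Assumption: expert-level defaults apply to every config in that expert,
            # and a value set on the config itself takes precedence.
            defaults = expert.get("default_config_values", {})
            for config in expert["configs"]:
                effective = {**defaults, **config}
                total += 1
                with_sd += "spec_decoding" in effective
        summary[name] = (total, with_sd)
    return summary


for model_name, (total, with_sd) in summarize(TEMPLATE_MODELS).items():
    print(f"{model_name}: {total} configs, {with_sd} with speculative decoding")
```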

For more details on the speculative decoding fields, see the Speculative Decoding Deployment Guidelines.

BundleTemplate Top-Level Fields

Field	Required	Description
`spec.models`	Yes	Defines the models and their expert configurations. See Models and Experts.
`spec.owner`	Yes	Email address of the bundle template owner for tracking and notifications.
`spec.secretNames`	Yes	List of Kubernetes secrets used to access artifacts. Must match secrets configured in your environment.
`spec.usePefCRs`	Yes	Set to `true` to use PEF custom resources for deployment.

Models and Experts

Each model in a BundleTemplate contains one or more experts, which represent sequence length profiles. Common profiles include:

8k, 16k, 32k, 64k, 128k - Fixed sequence length configurations
default - Standard configuration when no specific length is required

spec:
  models:
    Meta-Llama-3.1-8B-Instruct:
      experts:
        128k:
          configs:
            - <config>
        64k:
          configs:
            - <config>
        32k:
          configs:
            - <config>
        16k:
          configs:
            - <config>
        8k:
          configs:
            - <config>
        default:
          configs:
            - <config>

Expert Configuration Parameters

Each expert contains one or more configurations with the following parameters:

Parameter	Required	Description
`pef`	Yes	Reference to a PEF custom resource in format `<pef-name>:<version>`. Use version `1` unless a higher version is confirmed via `kubectl describe pef`. The `<pef-name>` includes the batch size after the `bs` characters.
`spec_decoding`	No	Speculative decoding configuration. Only specify for target models, not draft models.
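To make the `<pef-name>:<version>` format concrete, the following sketch splits a reference into its parts and reads the batch size that follows `bs`. It is an illustration only: PefRef and parse_pef_ref are hypothetical names, and reading the sequence length from the digits after `ss` is an assumption based on the PEF names shown in this guide.

```python
import re
from typing import NamedTuple, Optional


class PefRef(NamedTuple):
    name: str
    version: int
    seq_length: Optional[int]  # assumption: digits after "-ss"
    batch_size: Optional[int]  # digits after "-bs", as described in the table above


_REF = re.compile(r"^(?P<name>[^:\s]+):(?P<version>\d+)$")


def parse_pef_ref(ref: str) -> PefRef:
    """Split '<pef-name>:<version>' and extract the numbers embedded in the name."""
    match = _REF.match(ref)
    if match is None:
        raise ValueError(f"expected <pef-name>:<version>, got {ref!r}")
    name = match["name"]
    ss = re.search(r"-ss(\d+)(?:-|$)", name)
    bs = re.search(r"-bs(\d+)(?:-|$)", name)
    return PefRef(
        name=name,
        version=int(match["version"]),
        seq_length=int(ss.group(1)) if ss else None,
        batch_size=int(bs.group(1)) if bs else None,
    )


print(parse_pef_ref("llama-3p1-70b-ss4096-bs4-sd5:3"))
```

A reference without a version suffix raises ValueError, which catches a common typo before the manifest is applied.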

Speculative decoding parameters

Parameters within spec_decoding (target models only):

Parameter	Description
`draft_model`	Name of the draft model in the same BundleTemplate
`draft_expert`	Expert profile of the draft model to use. Should match the sequence length of the target model expert (for example, use a 16k draft expert with a 16k target expert).

For detailed guidance, see Speculative Decoding Deployment Guidelines.
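The two rules above (the draft model must be in the same BundleTemplate, and the draft expert should match the target expert) can be checked before applying a template. The sketch below is a hypothetical pre-flight check, not a SambaStack tool. It assumes spec.models has been loaded into a Python dict, that default_config_values applies to every config in its expert, and that a missing draft_expert needs no further comparison. The model and PEF names in EXAMPLE are made up.

```python
def check_spec_decoding(models):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for target, model in models.items():
        for expert_name, expert in model["experts"].items():
            defaults = expert.get("default_config_values", {})
            for config in expert["configs"]:
                sd = {**defaults, **config}.get("spec_decoding")
                if sd is None:
                    continue  # not a speculative decoding config
                where = f"{target}/{expert_name}/{config['pef']}"
                draft = sd.get("draft_model")
                if draft not in models:
                    problems.append(f"{where}: draft_model {draft!r} is not in this template")
                    continue
                draft_expert = sd.get("draft_expert")
                if draft_expert is None:
                    continue  # field omitted; nothing further to compare
                if draft_expert not in models[draft]["experts"]:
                    problems.append(f"{where}: draft model has no {draft_expert!r} expert")
                elif draft_expert != expert_name:
                    problems.append(
                        f"{where}: draft_expert {draft_expert!r} differs from "
                        f"target expert {expert_name!r}"
                    )
    return problems


# Made-up template with a 16k target expert paired with a 16k draft expert.
EXAMPLE = {
    "target-model": {"experts": {"16k": {"configs": [
        {"pef": "target-ss16384-bs1:1",
         "spec_decoding": {"draft_model": "draft-model", "draft_expert": "16k"}},
    ]}}},
    "draft-model": {"experts": {"16k": {"configs": [{"pef": "draft-ss16384-bs1:1"}]}}},
}

print(check_spec_decoding(EXAMPLE) or "no problems found")
```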

Bundle structure

A Bundle binds a BundleTemplate to specific checkpoints. The following sections explain each part of the Bundle manifest.

Resource Identity

Define the Bundle name and resource type:

apiVersion: sambanova.ai/v1alpha1
kind: Bundle
metadata:
  name: b-70b-3dot3-ss-16-32k-bs-4

metadata.name - The Bundle name used to reference this bundle in deployments
apiVersion and kind - Keep these values the same for all bundles

Checkpoints

Define the model checkpoints to use:

spec:
  checkpoints:
    GPT_OSS_120B_CKPT:
      source: gs://<SAMBASTACK_ARTIFACTS_BUCKET>/path/to/checkpoint
      toolSupport: true
    META_LLAMA_3_3_70B_INSTRUCT_CKPT:
      source: gs://<SAMBASTACK_ARTIFACTS_BUCKET>/path/to/checkpoint
      toolSupport: true
    META_LLAMA_3_1_8B_INSTRUCT_CKPT:
      source: gs://<SAMBASTACK_ARTIFACTS_BUCKET>/path/to/checkpoint
      toolSupport: true

Field	Description
`source`	GCS path pointing to the model checkpoint. Find available checkpoints in Model and Bundle Directory.
`toolSupport`	Boolean flag indicating whether this checkpoint is compatible with tools and function-calling (if supported by the product).

Models

Map model names to checkpoints and templates:

spec:
  models:
    gpt-oss-120b:
      checkpoint: GPT_OSS_120B_CKPT
      template: gpt-oss-120b
    Meta-Llama-3.3-70B-Instruct:
      checkpoint: META_LLAMA_3_3_70B_INSTRUCT_CKPT
      template: Meta-Llama-3.3-70B-Instruct
    Meta-Llama-3.1-8B-Instruct:
      checkpoint: META_LLAMA_3_1_8B_INSTRUCT_CKPT
      template: Meta-Llama-3.1-8B-Instruct

Field	Description
`<model-key>` (for example, `Meta-Llama-3.3-70B-Instruct`)	The API model name that users will send inference requests to. Must match a name in the BundleTemplate’s `spec.models` section.
`checkpoint`	The checkpoint alias (from `spec.checkpoints`) this model should use.
`template`	The model template in the BundleTemplate’s `spec.models` to use. This value must exactly match a model name defined under `spec.models` in the BundleTemplate.
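Because a Bundle refers to names defined in two other places (its own spec.checkpoints and the BundleTemplate's spec.models), a quick cross-check can catch typos before the legalizer runs. The following is a minimal sketch under the assumption that both manifests have been loaded into Python dicts. check_bundle is a hypothetical helper, not a SambaStack API.

```python
def check_bundle(bundle_spec, template_model_names):
    """Compare a Bundle's spec against the model names defined in its BundleTemplate."""
    problems = []
    checkpoints = bundle_spec.get("checkpoints", {})
    for model_name, model in bundle_spec.get("models", {}).items():
        if model_name not in template_model_names:
            problems.append(f"{model_name}: model key is not defined in the BundleTemplate")
        if model.get("template") not in template_model_names:
            problems.append(
                f"{model_name}: template {model.get('template')!r} is not defined in the BundleTemplate"
            )
        if model.get("checkpoint") not in checkpoints:
            problems.append(
                f"{model_name}: checkpoint alias {model.get('checkpoint')!r} is not in spec.checkpoints"
            )
    return problems


# Names taken from the examples in this guide; the source path is a placeholder.
BUNDLE_SPEC = {
    "checkpoints": {
        "GPT_OSS_120B_CKPT": {"source": "gs://<SAMBASTACK_ARTIFACTS_BUCKET>/path/to/checkpoint"},
    },
    "models": {
        "gpt-oss-120b": {"checkpoint": "GPT_OSS_120B_CKPT", "template": "gpt-oss-120b"},
    },
}

print(check_bundle(BUNDLE_SPEC, {"gpt-oss-120b"}) or "no problems found")
```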

Template and Secrets

Connect the Bundle to its BundleTemplate and credentials:

spec:
  template: bt-gpt120-llama70sd8-llama8
  secretNames:
  - sambanova-artifact-reader

Field	Description
`template`	References the BundleTemplate by its `metadata.name`. This connects the Bundle to the deployment configurations defined in that template.
`secretNames`	Credentials used to read checkpoints from GCS. Must match the secrets configured in your environment.

Complete Bundle Example

The following example shows a complete multi-model Bundle:

apiVersion: sambanova.ai/v1alpha1
kind: Bundle
metadata:
  name: b-gpt120-llama70sd8-llama8
spec:
  checkpoints:
    GPT_OSS_120B_CKPT:
      source: gs://<SAMBASTACK_ARTIFACTS_BUCKET>/path/to/checkpoint
      toolSupport: true
    META_LLAMA_3_3_70B_INSTRUCT_CKPT:
      source: gs://<SAMBASTACK_ARTIFACTS_BUCKET>/path/to/checkpoint
      toolSupport: true
    META_LLAMA_3_1_8B_INSTRUCT_CKPT:
      source: gs://<SAMBASTACK_ARTIFACTS_BUCKET>/path/to/checkpoint
      toolSupport: true
  models:
    gpt-oss-120b:
      checkpoint: GPT_OSS_120B_CKPT
      template: gpt-oss-120b
    Meta-Llama-3.3-70B-Instruct:
      checkpoint: META_LLAMA_3_3_70B_INSTRUCT_CKPT
      template: Meta-Llama-3.3-70B-Instruct
    Meta-Llama-3.1-8B-Instruct:
      checkpoint: META_LLAMA_3_1_8B_INSTRUCT_CKPT
      template: Meta-Llama-3.1-8B-Instruct
  secretNames:
  - sambanova-artifact-reader
  template: bt-gpt120-llama70sd8-llama8

The paths to checkpoints hosted by SambaNova will be provided to you by your SambaNova contact. If you have hosted your own checkpoints, you can include those paths in the source fields above.

BundleDeployment structure

A BundleDeployment instantiates a bundle on the cluster. For detailed deployment information, see Quickstart - Hosted or Quickstart - On-prem.

apiVersion: sambanova.ai/v1alpha1
kind: BundleDeployment
metadata:
  name: bd-gpt120-llama70sd8-llama8
spec:
  bundle: b-gpt120-llama70sd8-llama8
  groups:
  - minReplicas: 1
    name: default
    qosList:
    - free
  owner: no-reply@sambanova.ai
  secretNames:
  - sambanova-artifact-reader

Field	Description
`spec.bundle`	Name of the Bundle to deploy
`spec.groups[].name`	Name identifier for the deployment group
`spec.groups[].minReplicas`	Minimum number of bundle replicas to maintain
`spec.groups[].qosList`	Quality of service classes for request prioritization
`spec.owner`	Email address of the deployment owner for tracking and notifications
`spec.secretNames`	Credentials used to access artifacts. Must match secrets configured in your environment.
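The three resources are linked by name: a BundleDeployment's spec.bundle points at a Bundle's metadata.name, and that Bundle's spec.template points at a BundleTemplate's metadata.name. The sketch below follows that chain for manifests held in memory. It is illustrative only, and resolve_chain and the sample dicts are hypothetical.

```python
def resolve_chain(deployment, bundles, templates):
    """Follow BundleDeployment -> Bundle -> BundleTemplate and return the two names."""
    bundle_name = deployment["spec"]["bundle"]
    bundle = bundles.get(bundle_name)
    if bundle is None:
        raise LookupError(f"Bundle {bundle_name!r} not found")
    template_name = bundle["spec"]["template"]
    if template_name not in templates:
        raise LookupError(f"BundleTemplate {template_name!r} not found")
    return bundle_name, template_name


# Manifests keyed by metadata.name, using the names from the examples in this guide.
DEPLOYMENT = {"metadata": {"name": "bd-gpt120-llama70sd8-llama8"},
              "spec": {"bundle": "b-gpt120-llama70sd8-llama8"}}
BUNDLES = {"b-gpt120-llama70sd8-llama8": {"spec": {"template": "bt-gpt120-llama70sd8-llama8"}}}
TEMPLATES = {"bt-gpt120-llama70sd8-llama8": {"spec": {}}}

print(resolve_chain(DEPLOYMENT, BUNDLES, TEMPLATES))
```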

PEF and checkpoint lifecycle status

SambaStack assigns a pef_status field to PEF CR versions and a checkpoint_status field to model CR checkpoint versions to indicate their support lifecycle. Understanding these statuses helps you make informed decisions when selecting PEF or checkpoint versions for custom bundles.

PEF and checkpoint version status values

Each version entry in a PEF CR includes a pef_status field; checkpoint versions in a model CR use checkpoint_status. Both fields share the same set of values:

Status	Description
`preview`	Not fully tested or supported. May have unknown reliability or performance issues, or limited functionality (for example, partial function calling support). Not recommended for production workloads.
`stable`	Fully supported and tested.
`deprecated`	Has known reliability or performance issues. Still available for a limited transition period (up to 3 months from the deprecation announcement) to allow migration to a stable version.
`removed`	No longer usable. The version entry is retained in the PEF CR or model CR for traceability and auditability, but the path may no longer exist, causing deployment to fail if referenced.

Example PEF CR versions with status

versions:
  '1':
    source: gs://ext-sambastack-artifacts-prod-0/path/to/pef_v1.pef
    pef_status: deprecated
  '2':
    source: gs://ext-sambastack-artifacts-prod-0/path/to/pef_v2.pef
    pef_status: stable

To check version statuses, run kubectl describe pef <pef-name> or kubectl describe model <model-name> and review the pef_status or checkpoint_status field in the Versions section.
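One way to apply the status table when choosing a version is to keep only the versions whose status you accept and take the highest of those. The sketch below does that for the example versions block. The "highest stable version" policy and the pick_version helper are assumptions for illustration, not a SambaStack rule.

```python
from typing import Optional

# Versions block copied from the example PEF CR above.
VERSIONS = {
    "1": {"source": "gs://ext-sambastack-artifacts-prod-0/path/to/pef_v1.pef",
          "pef_status": "deprecated"},
    "2": {"source": "gs://ext-sambastack-artifacts-prod-0/path/to/pef_v2.pef",
          "pef_status": "stable"},
}


def pick_version(versions, allowed=("stable",), status_field="pef_status") -> Optional[str]:
    """Return the highest version whose status is in `allowed`, or None if there is none."""
    candidates = [
        version for version, entry in versions.items()
        if entry.get(status_field) in allowed
    ]
    # Version keys are strings, so compare them numerically.
    return max(candidates, key=int) if candidates else None


print(pick_version(VERSIONS))  # prints 2
```

Pass status_field="checkpoint_status" to apply the same selection to checkpoint versions, or widen allowed (for example, to include preview) for non-production testing.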

The following procedures describe the step-by-step workflow for creating and deploying custom bundles using the concepts and structures described above.

Procedures

Identify available PEFs

Before creating a BundleTemplate, identify the PEF resources available for your model.

List available PEFs

List PEFs matching your model and sequence length requirements:

Hosted:

kubectl get pefs | grep <model-pattern>

Example:

kubectl get pefs | grep llama-3p1-70b-ss4096

On-prem:

kubectl -n <namespace> get pefs.sambanova.ai | grep <model-pattern>

Example:

kubectl -n <namespace> get pefs.sambanova.ai | grep llama-3p1-70b-ss4096

Output:

llama-3p1-70b-ss4096-bs1-sd9     17h
llama-3p1-70b-ss4096-bs16-sd5    17h
llama-3p1-70b-ss4096-bs2-sd5     17h
llama-3p1-70b-ss4096-bs32-sd5    17h
llama-3p1-70b-ss4096-bs4-sd5     17h
llama-3p1-70b-ss4096-bs8-sd5     17h
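When the listing is long, it can help to group the PEF names by sequence length and list the batch sizes available for each. The following sketch parses the text output shown above. It assumes the `-ss<length>-bs<batch>` naming pattern seen in this guide's examples, and batch_sizes_by_seq_length is a hypothetical helper.

```python
import re
from collections import defaultdict

# Listing copied from the example output above.
LISTING = """\
llama-3p1-70b-ss4096-bs1-sd9     17h
llama-3p1-70b-ss4096-bs16-sd5    17h
llama-3p1-70b-ss4096-bs2-sd5     17h
llama-3p1-70b-ss4096-bs32-sd5    17h
llama-3p1-70b-ss4096-bs4-sd5     17h
llama-3p1-70b-ss4096-bs8-sd5     17h
"""


def batch_sizes_by_seq_length(listing):
    """Map each sequence length found in the PEF names to its sorted batch sizes."""
    groups = defaultdict(list)
    for line in listing.splitlines():
        fields = line.split()
        if not fields or fields[0] == "NAME":  # skip blank lines and the kubectl header
            continue
        match = re.search(r"-ss(\d+)-bs(\d+)", fields[0])
        if match:
            groups[int(match.group(1))].append(int(match.group(2)))
    return {seq_len: sorted(sizes) for seq_len, sizes in groups.items()}


print(batch_sizes_by_seq_length(LISTING))  # {4096: [1, 2, 4, 8, 16, 32]}
```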

View PEF details

View PEF details to understand supported configurations and check for higher versions:

Hosted:

kubectl describe pef <pef-name>

On-prem:

kubectl -n <namespace> describe pef.sambanova.ai <pef-name>

Example output (hosted command shown; the on-prem command returns the same fields):

$ kubectl describe pef deepseek-ss131072-bs1
Name:         deepseek-ss131072-bs1
...
Spec:
  copy_pef:                gs://<SAMBASTACK_ARTIFACTS_BUCKET>/path/to/copy_pef
  copy_pef_name_override:  COPY_PEF_DEEPSEEK_R1_128K_PEF_BS1
  Metadata:
    batch_size:             1
    is_prompt_caching:      false
    job_type:               infer
    max_completion_tokens:  131072
    max_seq_length:         131072
    num_rdus:               16
    rdu_arch:               SN40L-16
    seq_lengths:
      65536
      131072
  model_arch:         deepseek
  pef_name_override:  DEEPSEEK_R1_128K_PEF_BS1
  Versions:
    1:
      cached_path:  gs://<SAMBASTACK_ARTIFACTS_BUCKET>/path/to/pef_v1.pef
      pef_status:   deprecated
      Source:       gs://<SAMBASTACK_ARTIFACTS_BUCKET>/path/to/pef_v1.pef
    2:
      cached_path:  gs://<SAMBASTACK_ARTIFACTS_BUCKET>/path/to/pef_v2.pef
      pef_status:   stable
      Source:       gs://<SAMBASTACK_ARTIFACTS_BUCKET>/path/to/pef_v2.pef

Review the Spec.Metadata section for:

batch_size - Supported batch size
max_seq_length - Maximum sequence length
num_rdus - Required RDU count
rdu_arch - Required RDU architecture
seq_lengths - Supported sequence lengths

Check the Versions section to determine if a higher PEF version is available and to review each version’s pef_status before referencing it in a BundleTemplate.

Use version 1 in your PEF references (for example, llama-3p1-70b-ss16384-bs4-sd5:1) unless kubectl describe pef confirms a higher version is available.

Create a BundleTemplate

Create the YAML file

Create a YAML file for your BundleTemplate. For a single-model template:

apiVersion: sambanova.ai/v1alpha1
kind: BundleTemplate
metadata:
  name: bt-gpt120
spec:
  models:
    gpt-oss-120b:
      experts:
        8k:
          configs:
          - pef: gpt-oss-fp8-ss8192-bs2:1
        32k:
          configs:
          - pef: gpt-oss-fp8-ss32768-bs2:1
        64k:
          configs:
          - pef: gpt-oss-fp8-ss65536-bs2:1
        128k:
          configs:
          - pef: gpt-oss-fp8-ss131072-bs2:1
  owner: no-reply@sambanova.ai
  secretNames:
  - sambanova-artifact-reader
  usePefCRs: true

For multi-model templates with speculative decoding, see the BundleTemplate Structure example.

Apply the BundleTemplate

Hosted:

kubectl apply -f <bundletemplate-file>.yaml

On-prem:

kubectl -n <namespace> apply -f <bundletemplate-file>.yaml

Verify creation

Hosted:

kubectl get bundletemplates

On-prem:

kubectl -n <namespace> get bundletemplates.sambanova.ai <bundletemplate-name>

Include multiple batch sizes for each expert so that the inference engine can select the smallest and fastest configuration for the current workload.

Create a bundle

Create the YAML file

Create a YAML file for your Bundle:

apiVersion: sambanova.ai/v1alpha1
kind: Bundle
metadata:
  name: b-gpt120
spec:
  checkpoints:
    GPT_OSS_120B_CKPT:
      source: gs://<SAMBASTACK_ARTIFACTS_BUCKET>/path/to/checkpoint
      toolSupport: true
  models:
    gpt-oss-120b:
      checkpoint: GPT_OSS_120B_CKPT
      template: gpt-oss-120b
  secretNames:
  - sambanova-artifact-reader
  template: bt-gpt120

For multi-model bundles, see the Complete Bundle Example.

Apply the Bundle

Hosted:

kubectl apply -f <bundle-file>.yaml

On-prem:

kubectl -n <namespace> apply -f <bundle-file>.yaml

Verify legalizer validation

The legalizer automatically runs when you apply the bundle and validates whether the bundle fits in RDU memory.

Hosted:

kubectl describe bundle <bundle-name>

On-prem:

kubectl -n <namespace> describe bundle.sambanova.ai <bundle-name>

Successful validation:

Status:
  Conditions:
    Last Transition Time:  2025-12-22T21:11:05.689262+00:00
    Message:               Bundle is Valid
    Observed Generation:   1
    Reason:                ValidationSucceeded
    Status:                True
    Type:                  Valid

Failed validation:

Status:
  Conditions:
    Last Transition Time:  2025-12-22T21:11:54.975311+00:00
    Message:               <error-details>
    Reason:                ValidationFailed
    Status:                False
    Type:                  Valid

The Message field contains error details, including legalizer errors if any.

Do not proceed to deployment until the bundle shows ValidationSucceeded.
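If you script the deployment, the same check can be made against the Bundle's JSON representation (for example, saved from kubectl get with -o json). The sketch below is an assumption-heavy illustration: it supposes the JSON exposes status.conditions with lowercase type, status, reason, and message keys, mirroring the describe output above, and validation_state is a hypothetical helper.

```python
def validation_state(bundle):
    """Return (is_valid, message) based on the condition whose type is 'Valid'."""
    for condition in bundle.get("status", {}).get("conditions", []):
        if condition.get("type") == "Valid":
            # The status may be the string "True" or a boolean; accept both.
            is_valid = str(condition.get("status")) == "True"
            return is_valid, condition.get("message", "")
    return False, "no Valid condition reported yet"


# Hand-written sample mirroring the successful describe output above.
SAMPLE = {"status": {"conditions": [{
    "type": "Valid",
    "status": "True",
    "reason": "ValidationSucceeded",
    "message": "Bundle is Valid",
}]}}

ok, message = validation_state(SAMPLE)
print("proceed to deployment" if ok else f"fix the bundle first: {message}")
```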

Deploy the bundle

Create a BundleDeployment

apiVersion: sambanova.ai/v1alpha1
kind: BundleDeployment
metadata:
  name: bd-gpt120
spec:
  bundle: b-gpt120
  groups:
  - minReplicas: 1
    name: default
    qosList:
    - free
  owner: no-reply@sambanova.ai
  secretNames:
  - sambanova-artifact-reader

Apply the BundleDeployment

Hosted:

kubectl apply -f <bundledeployment-file>.yaml

On-prem:

kubectl -n <namespace> apply -f <bundledeployment-file>.yaml

Monitor deployment status

Hosted:

kubectl get bundledeployments
kubectl describe bundledeployment <deployment-name>

On-prem:

kubectl -n <namespace> get bundledeployments.sambanova.ai
kubectl -n <namespace> describe bundledeployment.sambanova.ai <deployment-name>

Update or remove a bundle/BundleTemplate

Update a bundle

Modify the YAML file

Edit the Bundle or BundleTemplate YAML file with your changes.

Reapply the configuration

Hosted:

kubectl apply -f <modified-file>.yaml

On-prem:

kubectl -n <namespace> apply -f <modified-file>.yaml

The legalizer automatically revalidates the changes.

Remove a bundle

Delete the BundleDeployment

Hosted:

kubectl delete bundledeployment <deployment-name>

On-prem:

kubectl -n <namespace> delete bundledeployment <deployment-name>

Delete the Bundle

Hosted:

kubectl delete bundle <bundle-name>

On-prem:

kubectl -n <namespace> delete bundle <bundle-name>

Delete the BundleTemplate (optional)

Hosted:

kubectl delete bundletemplate <template-name>

On-prem:

kubectl -n <namespace> delete bundletemplate <template-name>

Troubleshooting

Legalizer validation failures

Error Pattern	Cause	Resolution
`PEF pef1 and pef2 are not checkpoint compatible (checkpoint #0)`	PEFs with the same `ckpt_sharing_uuid` cannot share checkpoint memory	Assign different `ckpt_sharing_uuid` values to the incompatible PEFs
`Bundle exceeds memory constraints`	Combined PEF and checkpoint size exceeds RDU memory	Reduce the number of experts or batch sizes in the template
`PEF not found: <pef-name>`	Referenced PEF does not exist	Verify PEF name with `kubectl get pefs`

Deployment failures

Symptom	Possible Cause	Resolution
Deployment stuck in pending	Insufficient RDU resources	Check cluster capacity; reduce `minReplicas`
Checkpoint download fails	Invalid GCS path or missing credentials	Verify `source` path; confirm `sambanova-artifact-reader` secret exists
Model not accessible via API	Model name mismatch	Verify that the model key under the Bundle's `spec.models` matches the model name used in API requests

Related documentation

Model Deployment - Bundle deployment concepts and workflows
Supported Models and Bundles - Catalogue of models and bundles available for deployment
Custom checkpoint deployment - Deploy your own custom or fine-tuned checkpoints
Checkpoint Conversion Tool - Convert checkpoints to compatible formats
