This guide describes how to create custom BundleTemplates and Bundles to deploy models with specific configurations on SambaStack. In SambaStack, bundles are the fundamental deployment unit. Rather than deploying individual models, you deploy bundles that group one or more models together with their deployment configurations, including batch sizes and sequence lengths. This approach uses the SambaNova Reconfigurable Dataflow Unit (RDU) to support multiple models and configurations in a single deployment, enabling instant switching between configurations for improved efficiency and flexibility.
This guide covers creating custom bundles. For deploying pre-configured bundles provided by SambaNova, see Model Deployment.

Prerequisites

Before creating custom bundles, complete the prerequisite setup steps and review the related documentation.

Terminology

Term | Definition
RDU | Reconfigurable Dataflow Unit - SambaNova’s proprietary processor architecture
PEF | Processor Executable Format - Compiled model binaries that run on RDUs
Expert | A sequence length profile configuration (for example, 8k, 16k, 32k) within a model
Speculative Decoding | An optimization technique that uses a smaller draft model to accelerate inference for a larger target model
Legalizer | A validation process that verifies a bundle fits within RDU memory constraints
BYOC | Bring Your Own Checkpoint - Support for custom or fine-tuned model checkpoints

Concepts

Bundle Architecture

SambaStack uses a three-tier architecture for model deployment:
  1. BundleTemplate - Defines how models can be run: available sequence length profiles, batch sizes, and PEF mappings. Also defines target and draft model relationships for speculative decoding pairs.
  2. Bundle - Binds a template to specific checkpoints in storage, making it ready for deployment.
  3. BundleDeployment - Instantiates one or more replicas of a bundle on the cluster.
This separation allows you to:
  • Reuse templates across multiple checkpoints, including Bring Your Own Checkpoint (BYOC) for custom fine-tuned models
  • Bundle different checkpoints of the same underlying model while sharing the same template
  • Deploy the same bundle configuration with different replica counts
  • Update checkpoints without modifying deployment configurations
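As a sketch of how the three tiers reference each other by name (resource names here are illustrative; complete manifests with all required fields appear later in this guide):

```yaml
# Illustrative sketch only: each tier points at the one above it by name.
kind: BundleTemplate
metadata:
  name: bt-example            # defines how the model can be run
---
kind: Bundle
metadata:
  name: b-example
spec:
  template: bt-example        # binds the template to specific checkpoints
---
kind: BundleDeployment
metadata:
  name: bd-example
spec:
  bundle: b-example           # instantiates replicas of the bundle
```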

BundleTemplate Structure

A BundleTemplate defines the deployment capabilities for one or more models. The following example shows a multi-model template with speculative decoding configuration:
apiVersion: sambanova.ai/v1alpha1
kind: BundleTemplate
metadata:
  name: bt-70b-3dot3-ss-16-32k-bs-4
spec:
  models:
    Meta-Llama-3.2-1B-Instruct:
      experts:
        16k:
          configs:
          - batch_size: 4
            ckpt_sharing_uuid: id1
            num_tokens_at_a_time: 20
            pef: llama-3p1-1b-ss16384-bs4:1
        32k:
          configs:
          - batch_size: 4
            ckpt_sharing_uuid: id2
            num_tokens_at_a_time: 20
            pef: llama-3p1-1b-ss32768-bs4:1          
    Meta-Llama-3.3-70B-Instruct:
      experts:
        16k:
          configs:
          - batch_size: 4
            ckpt_sharing_uuid: id3
            num_tokens_at_a_time: 1
            pef: llama-3p1-70b-ss16384-bs4-sd5:1
            resubmit_to: Meta-Llama-3.3-70B-Instruct-32k
            spec_decoding:
              draft_expert: 16k
              draft_model: Meta-Llama-3.2-1B-Instruct
        32k:
          configs:
          - batch_size: 4
            ckpt_sharing_uuid: id4
            num_tokens_at_a_time: 1
            pef: llama-3p1-70b-ss32768-bs4-sd5:1
            resubmit_to: Meta-Llama-3.3-70B-Instruct-64k
            spec_decoding:
              draft_expert: 32k
              draft_model: Meta-Llama-3.2-1B-Instruct
  owner: no-reply@sambanova.ai
  secretNames:
  - sambanova-artifact-reader
  usePefCRs: true
In this example:
  • Meta-Llama-3.2-1B-Instruct serves as the draft model for speculative decoding
  • Meta-Llama-3.3-70B-Instruct is the target model with spec_decoding configuration pointing to the draft model

BundleTemplate Top-Level Fields

Field | Required | Description
spec.models | Yes | Defines the models and their expert configurations. See Models and Experts.
spec.owner | Yes | Email address of the bundle template owner for tracking and notifications.
spec.secretNames | Yes | List of Kubernetes secrets used to access artifacts. Must match secrets configured in your environment.
spec.usePefCRs | Yes | Set to true to use PEF custom resources for deployment.

Models and Experts

Each model in a BundleTemplate contains one or more experts, which represent sequence length profiles. Common profiles include:
  • 8k, 16k, 32k, 64k, 128k - Fixed sequence length configurations
  • default - Standard configuration when no specific length is required
spec:
  models:
    Meta-Llama-3.1-8B-Instruct:
      experts:
        128k:
          configs:
            - <config>
        64k:
          configs:
            - <config>
        32k:
          configs:
            - <config>
        16k:
          configs:
            - <config>
        8k:
          configs:
            - <config>
        default:
          configs:
            - <config>

Expert Configuration Parameters

Each expert contains one or more configurations with the following parameters:
Parameter | Required | Description
batch_size | Yes | Number of concurrent requests the expert can handle. Higher values increase throughput but may increase individual request latency. Include multiple batch sizes to allow the inference engine to select the optimal configuration based on workload.
pef | Yes | Reference to a PEF custom resource in the format <pef-name>:<version>. Use version 1 unless a higher version is confirmed via kubectl describe pef.
ckpt_sharing_uuid | Yes | Identifier for checkpoint memory sharing. PEFs with the same UUID share checkpoint memory. See Determining Checkpoint Compatibility for the recommended workflow.
num_tokens_at_a_time | Yes | Number of tokens processed per decoding step. Values are not hard-wired to a specific PEF; they are tuned per model size and use case. Use 20 for standard models unless you have evidence that a different value performs better. Must be 1 for target models in speculative decoding pairs.
resubmit_to | No | Fallback expert for requests that exceed the current expert’s sequence length (for example, a 16k expert can resubmit to a 32k expert). Specify the next higher sequence length profile, or omit if no higher profile exists. Do not set for draft models in speculative decoding pairs.
spec_decoding | No | Speculative decoding configuration. Specify only for target models, never for draft models.
Parameters within spec_decoding (target models only):
Parameter | Description
draft_model | Name of the draft model in the same BundleTemplate
draft_expert | Expert profile of the draft model to use. Should match the sequence length of the target model expert (for example, use a 16k draft expert with a 16k target expert).
For detailed guidance, see Speculative Decoding Deployment Guidelines.
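Putting these parameters together, a single target-model config might look like the following sketch (names and values are illustrative, reusing the multi-model example above):

```yaml
configs:
  - batch_size: 4                        # concurrent requests for this expert
    ckpt_sharing_uuid: group1            # PEFs sharing this UUID share checkpoint memory
    num_tokens_at_a_time: 1              # must be 1 for spec-decoding target models
    pef: llama-3p1-70b-ss16384-bs4-sd5:1 # <pef-name>:<version>
    resubmit_to: Meta-Llama-3.3-70B-Instruct-32k  # next higher sequence length profile
    spec_decoding:                       # target models only
      draft_model: Meta-Llama-3.2-1B-Instruct
      draft_expert: 16k                  # matches the target expert's sequence length
```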
Coming soon: Automatic ckpt_sharing_uuid management is planned for a future release, reducing or eliminating the need for manual assignment.

Bundle Structure

A Bundle binds a BundleTemplate to specific checkpoints. The following sections explain each part of the Bundle manifest.

Resource Identity

Define the Bundle name and resource type:
apiVersion: sambanova.ai/v1alpha1
kind: Bundle
metadata:
  name: b-70b-3dot3-ss-16-32k-bs-4
  • metadata.name - The Bundle name used to reference this bundle in deployments
  • apiVersion and kind - Keep these values the same for all bundles

Checkpoints

Define the model checkpoints to use:
spec:
  checkpoints:
    LLAMA3D2_1B_CKPT:
      source: gs://ext-sambastack-artifacts-prod-0/version/0.1.0/pefs-checkpoints/ckpts/meta-llama-Llama-3.2-1B-Instruct_untie
    LLAMA3_70B_3_3_CKPT:
      source: gs://ext-sambastack-artifacts-prod-0/version/0.1.0/pefs-checkpoints/ckpts/Llama-3.3-70B-Instruct
      toolSupport: true
Field | Description
source | GCS path pointing to the model checkpoint. Find available checkpoints in SambaStack Models.
toolSupport | Boolean flag indicating whether this checkpoint is compatible with tools and function-calling (if supported by the product).

Models

Map model names to checkpoints and templates:
spec:
  models:
    Meta-Llama-3.2-1B-Instruct:
      checkpoint: LLAMA3D2_1B_CKPT
      template: Meta-Llama-3.2-1B-Instruct
    Meta-Llama-3.3-70B-Instruct:
      checkpoint: LLAMA3_70B_3_3_CKPT
      template: Meta-Llama-3.3-70B-Instruct
Field | Description
<model-key> (for example, Meta-Llama-3.3-70B-Instruct) | The API model name that users send inference requests to. Must match a name in the BundleTemplate’s spec.models section.
checkpoint | The checkpoint alias (from spec.checkpoints) this model should use.
template | The model template in the BundleTemplate’s spec.models to use. Must exactly match a model name defined under spec.models in the BundleTemplate.

Template and Secrets

Connect the Bundle to its BundleTemplate and credentials:
spec:
  template: bt-70b-3dot3-ss-16-32k-bs-4
  secretNames:
  - sambanova-artifact-reader
Field | Description
template | References the BundleTemplate by its metadata.name. This connects the Bundle to the deployment configurations defined in that template.
secretNames | Credentials used to read checkpoints from GCS. Must match the secrets configured in your environment.

Complete Bundle Example

The following example shows a complete multi-model Bundle for speculative decoding:
apiVersion: sambanova.ai/v1alpha1
kind: Bundle
metadata:
  name: b-70b-3dot3-ss-16-32k-bs-4
spec:
  checkpoints:
    LLAMA3D2_1B_CKPT:
      source: gs://ext-sambastack-artifacts-prod-0/version/0.1.0/pefs-checkpoints/ckpts/meta-llama-Llama-3.2-1B-Instruct_untie
    LLAMA3_70B_3_3_CKPT:
      source: gs://ext-sambastack-artifacts-prod-0/version/0.1.0/pefs-checkpoints/ckpts/Llama-3.3-70B-Instruct
      toolSupport: true
  models:
    Meta-Llama-3.2-1B-Instruct:
      checkpoint: LLAMA3D2_1B_CKPT
      template: Meta-Llama-3.2-1B-Instruct
    Meta-Llama-3.3-70B-Instruct:
      checkpoint: LLAMA3_70B_3_3_CKPT
      template: Meta-Llama-3.3-70B-Instruct
  secretNames:
  - sambanova-artifact-reader
  template: bt-70b-3dot3-ss-16-32k-bs-4

BundleDeployment Structure

A BundleDeployment instantiates a bundle on the cluster. For detailed deployment information, see SambaStack Setup.
apiVersion: sambanova.ai/v1alpha1
kind: BundleDeployment
metadata:
  name: bd-70b-3dot3-ss-16-32k-bs-4
spec:
  bundle: b-70b-3dot3-ss-16-32k-bs-4
  groups:
  - minReplicas: 1
    name: default
    qosList:
    - free 
  owner: no-reply@sambanova.ai
  secretNames:
  - sambanova-artifact-reader
Field | Description
spec.bundle | Name of the Bundle to deploy
spec.groups[].name | Name identifier for the deployment group
spec.groups[].minReplicas | Minimum number of bundle replicas to maintain
spec.groups[].qosList | Quality of service classes for request prioritization
spec.owner | Email address of the deployment owner for tracking and notifications
spec.secretNames | Credentials used to access artifacts. Must match secrets configured in your environment.

Procedures

Identify Available PEFs

Before creating a BundleTemplate, identify the PEF resources available for your model.
1. List available PEFs

List PEFs matching your model and sequence length requirements:
    kubectl get pefs | grep <model-pattern>
Example:
    kubectl get pefs | grep llama-3p1-70b-ss4096
Output:
    llama-3p1-70b-ss4096-bs1-sd9     17h
    llama-3p1-70b-ss4096-bs16-sd5    17h
    llama-3p1-70b-ss4096-bs2-sd5     17h
    llama-3p1-70b-ss4096-bs32-sd5    17h
    llama-3p1-70b-ss4096-bs4-sd5     17h
    llama-3p1-70b-ss4096-bs8-sd5     17h
2. View PEF details

View PEF details to understand supported configurations and check for higher versions:
    kubectl describe pef <pef-name>
Example output:
    $ kubectl describe pef deepseek-ss131072-bs1
    Name:         deepseek-ss131072-bs1
    ...
    Spec:
      copy_pef:                gs://ext-sambastack-artifacts-prod-0/version/deepseek-0.0.4/pefs-checkpoints/pefs/PEF_1782/coe_pef_bs1_ss1024_4096_8192_16384_65536_131072/copy_out/copy_out.pef
      copy_pef_name_override:  COPY_PEF_DEEPSEEK_R1_128K_PEF_BS1
      Metadata:
        batch_size:             1
        is_prompt_caching:      false
        job_type:               infer
        max_completion_tokens:  131072
        max_seq_length:         131072
        num_rdus:               16
        rdu_arch:               SN40L-16
        seq_lengths:
          65536
          131072
      model_arch:         deepseek
      pef_name_override:  DEEPSEEK_R1_128K_PEF_BS1
      Versions:
    ...
Review the Spec.Metadata section for:
  • batch_size - Supported batch size
  • max_seq_length - Maximum sequence length
  • num_rdus - Required RDU count
  • rdu_arch - Required RDU architecture
  • seq_lengths - Supported sequence lengths
Check the Versions section to determine if a higher PEF version is available.
Use version 1 in your PEF references (for example, llama-3p1-70b-ss16384-bs4-sd5:1) unless kubectl describe pef confirms a higher version is available.
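To spot-check the metadata and version information without reading the full describe output, you can filter it with grep (assuming the output shape shown above; the PEF name is the example from this guide):

```shell
# Show only the Metadata and Versions sections of the PEF description
kubectl describe pef llama-3p1-70b-ss16384-bs4-sd5 | grep -E -A 10 'Metadata:|Versions:'
```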

Create a BundleTemplate

1. Create the YAML file

Create a YAML file for your BundleTemplate. For a single-model template:
    apiVersion: sambanova.ai/v1alpha1
    kind: BundleTemplate
    metadata:
      name: bt-llama3-8b-custom
    spec:
      models:
        Meta-Llama-3.1-8B-Instruct:
          experts:
            16k:
              configs:
                - batch_size: 4
                  ckpt_sharing_uuid: group1
                  num_tokens_at_a_time: 20
                  pef: llama-3p1-8b-ss16384-bs4:1
                  resubmit_to: Meta-Llama-3.1-8B-Instruct-32k
            32k:
              configs:
                - batch_size: 4
                  ckpt_sharing_uuid: group1
                  num_tokens_at_a_time: 20
                  pef: llama-3p1-8b-ss32768-bs4:1
      owner: admin@example.com
      secretNames:
        - sambanova-artifact-reader
      usePefCRs: true
For multi-model templates with speculative decoding, see the BundleTemplate Structure example.
2. Apply the BundleTemplate

    kubectl apply -f <bundletemplate-file>.yaml
3. Verify creation

    kubectl get bundletemplates
Including multiple batch sizes for each expert allows the inference engine to select the smallest and fastest configuration based on current workload.
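For example, an expert can list several configs that differ only in batch size (PEF names are illustrative):

```yaml
16k:
  configs:
    - batch_size: 1                  # low-latency option for light load
      ckpt_sharing_uuid: group1
      num_tokens_at_a_time: 20
      pef: llama-3p1-8b-ss16384-bs1:1
    - batch_size: 8                  # higher-throughput option for heavy load
      ckpt_sharing_uuid: group1
      num_tokens_at_a_time: 20
      pef: llama-3p1-8b-ss16384-bs8:1
```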

Determining Checkpoint Compatibility

Use the following workflow to determine valid ckpt_sharing_uuid groupings:
1. Assign initial UUIDs

Start by assigning a common ckpt_sharing_uuid (for example, group1) to all PEFs in your BundleTemplate.
2. Apply and inspect

Apply the bundle and check the status:
    kubectl apply -f <bundle-file>.yaml
    kubectl describe bundle <bundle-name>
3. Check for compatibility errors

If checkpoint sharing is not valid between two PEFs, you will see a legalizer error:
    PEF pef1 and pef2 are not checkpoint compatible (checkpoint #0)
4. Adjust and retry

Assign different ckpt_sharing_uuid values to the incompatible PEFs and reapply. Repeat until the bundle validates successfully.

Create a Bundle

1. Create the YAML file

Create a YAML file for your Bundle:
    apiVersion: sambanova.ai/v1alpha1
    kind: Bundle
    metadata:
      name: b-llama3-8b-custom
    spec:
      checkpoints:
        LLAMA3_8B_CKPT:
          source: gs://ext-sambastack-artifacts-prod-0/version/0.1.0/pefs-checkpoints/ckpts/Meta-Llama-3.1-8B-Instruct
          toolSupport: true
      models:
        Meta-Llama-3.1-8B-Instruct:
          checkpoint: LLAMA3_8B_CKPT
          template: Meta-Llama-3.1-8B-Instruct
      template: bt-llama3-8b-custom
      secretNames:
        - sambanova-artifact-reader
For multi-model bundles, see the Complete Bundle Example.
2. Apply the Bundle

    kubectl apply -f <bundle-file>.yaml
3. Verify legalizer validation

The legalizer automatically runs when you apply the bundle and validates whether the bundle fits in RDU memory.
    kubectl describe bundle <bundle-name>
Example output:
    Status:
      Conditions:
        Last Transition Time:  2025-12-22T21:11:05.689262+00:00
        Message:               Bundle is Valid
        Observed Generation:   1
        Reason:                ValidationSucceeded
        Status:                True
        Type:                  Valid
Do not proceed to deployment until the bundle shows ValidationSucceeded.
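If you script this step, `kubectl wait` can block until the legalizer finishes, assuming the `Valid` condition type shown in the status output above:

```shell
# Wait up to 5 minutes for the legalizer to mark the bundle Valid
kubectl wait bundle/<bundle-name> --for=condition=Valid --timeout=300s
```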

Deploy the Bundle

1. Create a BundleDeployment

    apiVersion: sambanova.ai/v1alpha1
    kind: BundleDeployment
    metadata:
      name: bd-llama3-8b-custom
    spec:
      bundle: b-llama3-8b-custom
      groups:
        - name: default
          minReplicas: 1
          qosList:
            - free
      owner: admin@example.com
      secretNames:
        - sambanova-artifact-reader
2. Apply the BundleDeployment

    kubectl apply -f <bundledeployment-file>.yaml
3. Monitor deployment status

    kubectl get bundledeployments
    kubectl describe bundledeployment <deployment-name>

Update or Remove a Bundle/BundleTemplate

1. Modify the YAML file

Edit the Bundle or BundleTemplate YAML file with your changes.
2. Reapply the configuration

    kubectl apply -f <modified-file>.yaml
The legalizer automatically revalidates the changes.

Troubleshooting

Legalizer Validation Failures

Error Pattern | Cause | Resolution
PEF pef1 and pef2 are not checkpoint compatible (checkpoint #0) | PEFs with the same ckpt_sharing_uuid cannot share checkpoint memory | Assign different ckpt_sharing_uuid values to the incompatible PEFs
Bundle exceeds memory constraints | Combined PEF and checkpoint size exceeds RDU memory | Reduce the number of experts or batch sizes in the template
PEF not found: <pef-name> | Referenced PEF does not exist | Verify the PEF name with kubectl get pefs

Checkpoint Compatibility Issues

When checkpoint sharing validation fails, follow the Determining Checkpoint Compatibility procedure to identify and resolve incompatible PEF groupings.

Deployment Failures

Symptom | Possible Cause | Resolution
Deployment stuck in pending | Insufficient RDU resources | Check cluster capacity; reduce minReplicas
Checkpoint download fails | Invalid GCS path or missing credentials | Verify the source path; confirm the sambanova-artifact-reader secret exists
Model not accessible via API | Model name mismatch | Verify that spec.models.<name> matches the expected API endpoint
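A quick triage sequence for deployment failures, using the example resource names from this guide:

```shell
# Confirm the referenced PEFs exist
kubectl get pefs | grep llama-3p1
# Confirm the artifact-reader secret is present
kubectl get secret sambanova-artifact-reader
# Inspect validation and deployment status for errors
kubectl describe bundle b-llama3-8b-custom
kubectl describe bundledeployment bd-llama3-8b-custom
```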