Checkpoint conversion is a substep of deploying custom checkpoints on SambaStack or SambaCloud. See the Deploying custom checkpoints page for the high-level workflow.
Overview
The Checkpoint Conversion Tool provides two primary capabilities:

- Checkpoint conversion: Transforms HuggingFace-format checkpoints into SambaNova-compatible format for deployment on SN40L hardware
- Speculative decoding validation: Verifies whether a converted draft checkpoint is compatible with a target checkpoint for speculative decoding deployments
Prerequisites
System requirements
Before using the Checkpoint Conversion Tool, ensure your system meets the following requirements:

| Requirement | Specification |
|---|---|
| Memory | At least 2.5x the size of your checkpoint’s SafeTensor or .bin files. For example, a 10GB checkpoint requires a minimum of 25GB available memory. |
| Storage | Equal to the input checkpoint size for the output. For example, converting a 14GB checkpoint requires 14GB of free storage for the converted output (plus the original). |
| Operating System | macOS or Linux-based operating systems. Windows OS is not supported. |
Estimated conversion times
| Checkpoint Size | Example Model | Estimated Time |
|---|---|---|
| ~8GB | Meta-Llama-3.1-8B-Instruct | ~5 minutes |
| ~140GB | Meta-Llama-3.3-70B-Instruct | ~1 hour |
Required software
- Docker Desktop or Docker Engine - Installation guide
- Google Cloud CLI - Installation guide
Required access
- Network access to your SambaStack instance endpoint
- Authentication credentials for Google Cloud
Supported models and checkpoint formats
Supported model architectures
Custom checkpoint deployment is supported for a growing set of base models. To check whether custom checkpoints are supported for a model family, use the Supported models list and check the Import checkpoint field in the Features and optimizations column.

Checkpoint format requirements
Checkpoints are accepted in the HuggingFace format. The tensors should be in the safetensors format, and the checkpoint directory should contain the same relevant config files as the base model for the custom checkpoint. For example, if the custom checkpoint is a fine-tuned variant of meta-llama/Llama-3.3-70B-Instruct, then the custom checkpoint should contain files such as:

- config.json
- generation_config.json
- model-00001-of-00030.safetensors, …, model-00030-of-00030.safetensors
- model.safetensors.index.json
- special_tokens_map.json
- tokenizer.json
- tokenizer_config.json
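As a quick sanity check before conversion, you can verify that the common config and tokenizer files are present in a checkpoint directory. The following is an illustrative sketch only; the exact file set (in particular the number of safetensors shards) depends on the base model:

```shell
# check_ckpt_files: warn about missing common HuggingFace checkpoint files.
# Illustrative sketch; adjust the file list for your base model.
check_ckpt_files() {
  ckpt_dir="$1"
  missing=0
  for f in config.json generation_config.json model.safetensors.index.json \
           special_tokens_map.json tokenizer.json tokenizer_config.json; do
    if [ ! -f "$ckpt_dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  # At least one safetensors shard should be present.
  if ! ls "$ckpt_dir"/*.safetensors >/dev/null 2>&1; then
    echo "missing: *.safetensors shards"
    missing=1
  fi
  [ "$missing" -eq 0 ] && echo "all expected files present"
  return "$missing"
}
```

Running `check_ckpt_files /path/to/checkpoint` prints one line per missing file and returns a nonzero status if anything is absent.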
Checkpoint compatibility
Given that a checkpoint is fine-tuned or derived from one of the supported models for your platform, checkpoints are compatible when their computational graph has not been modified from the original checkpoint (i.e., tensor shapes and model structure). Aspects that must remain unchanged:

- Number of attention heads
- Rope type (rope theta)
- Model vocabulary size
- Optimizer type
- Static architectural attributes in config.json, such as: head_dim, hidden_act, intermediate_size, attention_bias, attention_dropout, vocab_size

Aspects that may be modified:

- Model weights (model weight tensor values)
- Tokenizer and vocabulary (as long as it retains the exact vocabulary size of the original model checkpoint; useful for multilingual use cases)
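A quick way to spot incompatible changes is to compare the static attributes in the base model's config.json against your custom checkpoint's. The sketch below uses a naive grep-based comparison (a JSON-aware tool would be more robust; keys absent from both files compare as equal):

```shell
# compare_config_attrs: compare static architectural attributes between two
# config.json files. Illustrative sketch; not a substitute for the tool's checks.
compare_config_attrs() {
  base="$1"
  custom="$2"
  mismatch=0
  for key in head_dim hidden_act intermediate_size attention_bias attention_dropout vocab_size; do
    b=$(grep -o "\"$key\":[^,}]*" "$base")
    c=$(grep -o "\"$key\":[^,}]*" "$custom")
    if [ "$b" != "$c" ]; then
      echo "mismatch on $key: base=$b custom=$c"
      mismatch=1
    fi
  done
  [ "$mismatch" -eq 0 ] && echo "static attributes match"
  return "$mismatch"
}
```

Usage: `compare_config_attrs base/config.json custom/config.json`; any mismatched attribute is printed and the function returns a nonzero status.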
Practical compatibility examples
Take the base model meta-llama/Llama-3.3-70B-Instruct (a base model supported by SambaNova). The following checkpoints use the same computational graph as the original 70B model and can be converted and deployed on SambaNova platforms. These checkpoints have undergone updates to their model weights, which have been adjusted and refined to improve performance or adapt to specific tasks or datasets.

Download and set up
The Checkpoint Conversion Tool is distributed as a Docker container that encapsulates all conversion and validation utilities. Container path:
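The path below is the example registry path used throughout this guide; your organization's registry path and version tag may differ:

```shell
# Example image path (the version tag is illustrative; use the current release for your deployment)
export IMAGE_NAME="us-docker.pkg.dev/acp-artifacts-development-38/oci-us-public-development/sn-byoc:0.0.12"
```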
Step 1: Install Docker Desktop or Docker Engine
Install the Docker engine in your conversion environment. You can follow the official Docker Engine installation guide. After installation, start Docker (on macOS, this can be done via the Docker Desktop application).
Step 2: Install Google Cloud CLI
Install Google Cloud CLI in your conversion environment. You can follow the official Google Cloud CLI Installation guide.
Step 3: Authenticate and pull the container image
First, authenticate with Google Cloud. Then configure the Docker client to authenticate with us-docker.pkg.dev (one-time setup), and pull the Docker image:
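Assuming Docker and the gcloud CLI are installed, the sequence looks like the following sketch (the image tag shown is the example used in this guide; substitute your current version):

```shell
# Log in to Google Cloud (opens a browser window for authentication)
gcloud auth login

# Configure the Docker client to use gcloud credentials for us-docker.pkg.dev (one-time setup)
gcloud auth configure-docker us-docker.pkg.dev

# Pull the Checkpoint Conversion Tool image
export IMAGE_NAME="us-docker.pkg.dev/acp-artifacts-development-38/oci-us-public-development/sn-byoc:0.0.12"
docker pull "$IMAGE_NAME"
```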
Step 4: Sync model metadata
This step downloads platform-specific model metadata from your SambaStack instance and stores it locally. The tool uses this metadata to perform checkpoint conversions. You can additionally inspect the model metadata to understand how artifacts are converted to run on SambaNova’s SN40L hardware. The metadata is fetched and cached by running the Checkpoint Conversion Tool container with the download-serving-cache command.

Command template
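A sketch of the command, assembled from the parameters described below. The directory names are illustrative, and the exact flag names accepted by download-serving-cache should be confirmed against your tool version:

```shell
export HOST_WORKING_DIR="$HOME/byoc"        # host directory to mount (illustrative)
export DOCKER_WORKING_DIR="/workspace"      # mount point inside the container (illustrative)
export IMAGE_NAME="us-docker.pkg.dev/acp-artifacts-development-38/oci-us-public-development/sn-byoc:0.0.12"
export SERVER="https://api.sambanova.ai/"   # base endpoint URL of your SambaStack instance
export CACHE_LOCATION="serving_cache"       # metadata cache subdirectory (illustrative)

docker run --platform linux/amd64 \
  -v "$HOST_WORKING_DIR:$DOCKER_WORKING_DIR" \
  "$IMAGE_NAME" \
  download-serving-cache \
  --server "$SERVER" \
  --cache_location "$DOCKER_WORKING_DIR/$CACHE_LOCATION"
```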
On native Linux/amd64 environments, the --platform linux/amd64 flag is optional and can be omitted. It is primarily helpful on macOS (especially Apple Silicon) to ensure the correct architecture is used.

Parameters
| Input | Type | Description |
|---|---|---|
| HOST_WORKING_DIR | str | Directory on the host machine that will be mounted into the container. This directory will receive the downloaded metadata and may also contain checkpoints you plan to convert. |
| DOCKER_WORKING_DIR | str | Path inside the container where HOST_WORKING_DIR is mounted. This must match the right-hand side of the -v flag. |
| IMAGE_NAME | str | Full name of the Checkpoint Conversion Tool Docker image. Example: us-docker.pkg.dev/acp-artifacts-development-38/oci-us-public-development/sn-byoc:0.0.12 |
| SERVER | str | Base endpoint URL for your SambaStack platform and product version. Refer to the Manage API Keys and Endpoints section of the documentation to obtain this value. |
| CACHE_LOCATION | str | Path (relative to DOCKER_WORKING_DIR) where metadata will be stored inside the container. The metadata will be visible on the host under the corresponding path in HOST_WORKING_DIR. |
Your setup should now be complete. The full setup only needs to be done once. You may need to repeat Step 4 under the following conditions:
- SambaNova or your organization makes more model architectures available for custom checkpoint inference
- Updates to any model metadata are released for your SambaStack instance (scoped by the base URL)
- You are switching to another platform (e.g., SambaCloud) or your organization is using multiple SambaStack instances (scoped by base URLs)
Convert and validate checkpoint
The Checkpoint Conversion Tool converts custom checkpoints (from HuggingFace or otherwise) into a format that can run on SambaNova’s SN40L hardware. This step occurs prior to uploading or deploying custom checkpoints on any SambaStack or SambaCloud instance.

Command template
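The flags and host-level variables described below can be assembled into an invocation like the following sketch. The directory names and the --model value are illustrative, and the exact flag set should be confirmed against your tool version:

```shell
export HOST_WORKING_DIR="$HOME/byoc"        # host directory containing the input checkpoint (illustrative)
export DOCKER_WORKING_DIR="/workspace"      # mount point inside the container (illustrative)
export IMAGE_NAME="us-docker.pkg.dev/acp-artifacts-development-38/oci-us-public-development/sn-byoc:0.0.12"
export CHECKPOINT_DIR="my-finetuned-llama"  # subdirectory of HOST_WORKING_DIR (illustrative)
export OUTPUT_DIR="converted"               # output subdirectory (illustrative)
export SERVER="https://api.sambanova.ai/"
export CACHE_LOCATION="serving_cache"

docker run --platform linux/amd64 \
  -v "$HOST_WORKING_DIR:$DOCKER_WORKING_DIR" \
  "$IMAGE_NAME" \
  prepare-ckpt \
  --model llama3-70b \
  --original_checkpoint_path "$DOCKER_WORKING_DIR/$CHECKPOINT_DIR" \
  --output_checkpoint_path "$DOCKER_WORKING_DIR/$OUTPUT_DIR" \
  --transformers_version 4.45.1 \
  --server "$SERVER" \
  --cache_location "$DOCKER_WORKING_DIR/$CACHE_LOCATION"
```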
Parameters
These are the primary input flags to the prepare-ckpt command:
| Flag / Variable | Type | Description |
|---|---|---|
| --model / MODEL_NAME | str | The base model to convert (e.g., llama3-70b). To check whether custom checkpoints are supported for a model family, use the Supported models list and the Import checkpoint field in the Features and optimizations column. |
| --original_checkpoint_path / CHECKPOINT_DIR | str | Path inside the container to the directory containing the input checkpoint (e.g., config, tokenizer, safetensors). This is typically $DOCKER_WORKING_DIR/<subdir>, where <subdir> is mounted from HOST_WORKING_DIR. |
| --output_checkpoint_path / OUTPUT_DIR | str | Path inside the container where the converted checkpoint will be written. The converted artifacts will appear on the host under $HOST_WORKING_DIR/$OUTPUT_DIR. |
| --transformers_version / TRANSFORMERS_VERSION | str | (Optional but recommended.) Specifies the Hugging Face transformers version required for deployment (e.g., 4.45.1). The tool checks that the version used to save the model is ≤ the deployment version. If this validation is not needed, omit the flag. If you encounter version errors, re-save the checkpoint with a newer transformers version or update this value. |
| --ignore_transformers_version | bool | When enabled, skips all transformers version checking. Default is False. Useful if you encounter a version validation error and explicitly want to bypass it. Ignoring the Transformers version of your checkpoint uses the default backend version for converting your checkpoints (i.e., 4.45.1). |
| --server / SERVER | str | Source of serving metadata. Can be embedded, a local path, or a remote URL such as https://api.sambanova.ai/. Typically this is the base endpoint URL for your SambaStack instance. |
| --cache_location / CACHE_LOCATION | str | Location (inside the container) where serving metadata/configs are stored. Usually a subdirectory of $DOCKER_WORKING_DIR and visible on the host under $HOST_WORKING_DIR/$CACHE_LOCATION. |
Host-level variables
In addition to the flags above, you will typically set the following environment variables for the Docker command:

| Variable | Description |
|---|---|
| HOST_WORKING_DIR | Directory on the host that contains your input checkpoint and where output will be written. Must be writable. |
| DOCKER_WORKING_DIR | Directory inside the container where HOST_WORKING_DIR is mounted (matches the right-hand side of -v). |
| IMAGE_NAME | Full image name of the Checkpoint Conversion Tool container. |
| CHECKPOINT_DIR | Subdirectory under HOST_WORKING_DIR (mirrored under DOCKER_WORKING_DIR) containing the original checkpoint. |
| OUTPUT_DIR | Subdirectory under HOST_WORKING_DIR (mirrored under DOCKER_WORKING_DIR) where converted checkpoints will be written. |
Validating the conversion output
After a successful conversion, verify that the output directory is complete and consistent:

Checklist

- The output directory contains the expected set of safetensors files, typically named model-00001-of-000NN.safetensors, model-00002-of-000NN.safetensors, …, up to NN
- The output directory contains a DONE file indicating successful completion
- The tool logs show no errors
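The checklist above can be sketched as a small shell check. This is illustrative only; it verifies the DONE marker and the presence of safetensors shards, but does not validate shard contents:

```shell
# verify_conversion_output: basic completeness check on the converted output directory.
# Checks for the DONE marker and at least one safetensors shard.
verify_conversion_output() {
  out_dir="$1"
  if [ ! -f "$out_dir/DONE" ]; then
    echo "FAIL: DONE marker not found"
    return 1
  fi
  # Count converted shards (named model-*.safetensors).
  shards=$(ls "$out_dir"/model-*.safetensors 2>/dev/null | wc -l | tr -d ' ')
  if [ "$shards" -eq 0 ]; then
    echo "FAIL: no safetensors shards found"
    return 1
  fi
  echo "OK: DONE marker present, $shards shard(s) found"
}
```

Usage: `verify_conversion_output "$HOST_WORKING_DIR/$OUTPUT_DIR"`; a nonzero return status indicates an incomplete output directory.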
Successful conversion log
A successful run ends with the status line "The process is completed without any errors".

Failed conversion log

If conversion fails, you will see error entries followed by the status line "One or more failures have occurred. Do not deploy". If you do not see either "One or more failures have occurred. Do not deploy" or "The process is completed without any errors" and the process exits unexpectedly, you may have hit an out-of-memory (OOM) condition. In that case, increase available memory and rerun the conversion.

Once you’ve confirmed that the conversion completed successfully and the output directory contains all expected files, you can proceed to the upload and deployment steps in Deploying custom checkpoints.
Interpreting output logs and troubleshooting
The Checkpoint Conversion Tool prints a structured log for each run to help diagnose issues, identify which part of the process failed, and decide what to do next.

Log structure overview

At a high level, each run includes:

- A final report header showing the command that was executed
- A sequence of STEP entries indicating which module or test is running
- An Errors section (if any failures occur) with details and suggestions
- A final status line indicating whether the process completed successfully or failed
Example output log
Log components explained
Final report header: Shows the exact command and arguments used.

Final status line:

- If there are no errors, the log ends with "The process is completed without any errors".
- If a step results in an error, an error block is printed for each failure and the log ends with "One or more failures have occurred. Do not deploy".
Error block components
If a step failed, each error block will contain the following fields:

| Field | Description |
|---|---|
| Id | A marker that can be used to trace the specific check or process that failed when inspecting the container. |
| Responsible | Indicates whether the error can be remedied by you (ByocErrorOwner.USAGE) or whether there is an issue with the conversion tool itself (ByocErrorOwner.LIBRARY). |
| Reason | A human-readable explanation of what may have gone wrong. |
| Suggestion | A recommendation that can be used to fix or triage the error. |
| Stack Trace | Helpful to identify where the error took place. |
Common errors and solutions
| Error ID | Cause | Solution |
|---|---|---|
| TRANSFORMERS_VERSION_ERROR | Checkpoint was saved with a Transformers version newer than supported | Re-save the checkpoint with Transformers ≤ 4.45.1, or use the --ignore_transformers_version flag |
| Out-of-memory (OOM) | Insufficient system memory for checkpoint size | Increase available memory to at least 2.5x the checkpoint size |
| Missing files | Incomplete checkpoint directory | Ensure all required files (config.json, safetensors, tokenizer files) are present |
Next steps
After successfully converting your checkpoint:

- Upload the converted checkpoint to your GCS bucket
- Register the checkpoint with a Model Manifest
- Deploy the checkpoint using a Bundle configuration
