The Checkpoint Conversion Tool is a utility that converts external model artifacts—such as checkpoints—into the SambaNova-compatible format required for deployment on SambaRack and SambaCloud systems. The tool retrieves platform-specific metadata so it can stay aligned with the model architectures and releases supported by your SambaStack instance.
Checkpoint conversion is a substep of deploying custom checkpoints on SambaStack or SambaCloud. See the Deploying custom checkpoints page for the high-level workflow.

Overview

The Checkpoint Conversion Tool provides two primary capabilities:
  1. Checkpoint conversion: Transforms HuggingFace-format checkpoints into SambaNova-compatible format for deployment on SN40L hardware
  2. Speculative decoding validation: Verifies whether a converted draft checkpoint is compatible with a target checkpoint for speculative decoding deployments

Prerequisites

System requirements

Before using the Checkpoint Conversion Tool, ensure your system meets the following requirements:
Requirement | Specification
Memory | At least 2.5x the size of your checkpoint's safetensors or .bin files. For example, a 10GB checkpoint requires a minimum of 25GB of available memory.
Storage | Free space equal to the input checkpoint size, for the output. For example, converting a 14GB checkpoint requires 14GB of free storage for the converted output (in addition to the original).
Operating System | macOS or a Linux-based operating system. Windows is not supported.
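A quick way to sanity-check these requirements before converting (a sketch; ./my-checkpoint is a placeholder for your checkpoint directory, and free is Linux-only):
# Total size of the checkpoint's weight files (safetensors/.bin)
du -ch ./my-checkpoint/*.safetensors ./my-checkpoint/*.bin 2>/dev/null | tail -1
# Available memory (Linux) and free disk space in the current directory
free -h
df -h .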

Estimated conversion times

Checkpoint Size | Example Model | Estimated Time
~8GB | Meta-Llama-3.1-8B-Instruct | ~5 minutes
~140GB | Meta-Llama-3.3-70B-Instruct | ~1 hour
We recommend running the conversion locally, on a machine or workspace that has access to your checkpoint's storage mount. A cloud compute instance can also be used, but note that transferring large checkpoints to it can take a long time.

Required software

  • Docker Desktop or Docker Engine (see setup Step 1)
  • Google Cloud CLI (see setup Step 2)

Required access

  • Network access to your SambaStack instance endpoint
  • Authentication credentials for Google Cloud

Supported models and checkpoint formats

Supported model architectures

Custom checkpoint deployment is supported for a growing set of base models. To check whether custom checkpoints are supported for a model family, use the Supported models list and check the Import checkpoint field in the Features and optimizations column.

Checkpoint format requirements

Checkpoints are accepted in HuggingFace format. Tensors should be stored as safetensors, and the checkpoint directory should contain the same relevant config files as the base model of the custom checkpoint. For example, if the custom checkpoint is a fine-tuned variant of meta-llama/Llama-3.3-70B-Instruct, it should contain files such as the following (a quick file check is sketched after the list):
  • config.json
  • generation_config.json
  • model-00001-of-00030.safetensors, …, model-00030-of-00030.safetensors
  • model.safetensors.index.json
  • special_tokens_map.json
  • tokenizer.json
  • tokenizer_config.json
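A minimal presence check for these files, assuming a sharded checkpoint in a hypothetical ./my-checkpoint directory:
CKPT=./my-checkpoint   # placeholder: path to your custom checkpoint
for f in config.json generation_config.json model.safetensors.index.json \
         special_tokens_map.json tokenizer.json tokenizer_config.json; do
  [ -f "$CKPT/$f" ] || echo "missing: $f"
done
ls "$CKPT"/model-*.safetensors | wc -l   # shard count; should match the index file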

Checkpoint compatibility

A checkpoint that is fine-tuned or otherwise derived from one of the supported models for your platform is compatible as long as its computational graph has not been modified from the original checkpoint (i.e., tensor names and shapes are unchanged). The following aspects must remain unchanged:
  • Number of attention heads
  • RoPE type (rope_theta)
  • Model vocabulary size
  • Optimizer type
  • Static architectural attributes in config.json such as: head_dim, hidden_act, intermediate_size, attention_bias, attention_dropout, vocab_size
Aspects that can be modified:
  • Model weights or model weight tensor values
  • Tokenizer and vocabulary (as long as it retains the exact vocabulary size of the original model checkpoint—useful for multilingual use-cases)
It can be helpful to think about this in terms of a static graph. Aspects of a model that are typically static in engines such as TensorRT-LLM are also static for custom checkpoints.
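One way to spot-check the static attributes is to compare them against the base model's config.json (a sketch; assumes jq is installed and that base-config.json has been downloaded from the base model's HuggingFace repository):
for key in head_dim hidden_act intermediate_size attention_bias \
           attention_dropout vocab_size num_attention_heads rope_theta; do
  base=$(jq -r ".$key" base-config.json)
  mine=$(jq -r ".$key" ./my-checkpoint/config.json)
  [ "$base" = "$mine" ] || echo "$key differs: base=$base custom=$mine"
done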

Practical compatibility examples

Take the base model meta-llama/Llama-3.3-70B-Instruct (a base model supported by SambaNova). Fine-tuned variants of this model, whose weights have been adjusted and refined to improve performance or to adapt to specific tasks or datasets but whose computational graph is unchanged, can be converted and deployed on SambaNova platforms.

Download and set up

The Checkpoint Conversion Tool is distributed as a Docker container that encapsulates all conversion and validation utilities.

Container path:
us-docker.pkg.dev/acp-artifacts-development-38/oci-us-public-development/sn-byoc:0.0.12

Step 1: Install Docker Desktop or Docker Engine

Install the Docker engine in your conversion environment; you can follow the official Docker Engine installation guide. After installation, start Docker (on macOS, this can be done via the Docker Desktop application).

Step 2: Install Google Cloud CLI

Install Google Cloud CLI in your conversion environment. You can follow the official Google Cloud CLI Installation guide.

Step 3: Authenticate and pull the container image

First, configure the Docker client to authenticate with us-docker.pkg.dev (one-time setup):
gcloud auth configure-docker us-docker.pkg.dev
Authenticate with Google Cloud:
gcloud auth login
Pull the Docker image:
docker pull us-docker.pkg.dev/acp-artifacts-development-38/oci-us-public-development/sn-byoc:0.0.12
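To confirm the pull succeeded, you can list the image locally:
docker images us-docker.pkg.dev/acp-artifacts-development-38/oci-us-public-development/sn-byoc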

Step 4: Sync model metadata

This step downloads platform-specific model metadata from your SambaStack instance and stores it locally; the tool uses this metadata to perform checkpoint conversions. You can also inspect the metadata to understand how artifacts are converted to run on SambaNova's SN40L hardware. The metadata is fetched and cached by running the Checkpoint Conversion Tool container with the download-serving-cache command.

Command template

docker run -v $HOST_WORKING_DIR:$DOCKER_WORKING_DIR --rm -it \
    --platform linux/amd64 \
    $IMAGE_NAME \
    download-serving-cache \
    --server $SERVER \
    --cache_location $DOCKER_WORKING_DIR/$CACHE_LOCATION
On native Linux/amd64 environments, the --platform linux/amd64 flag is optional and can be omitted. It is primarily helpful on macOS (especially Apple Silicon) to ensure the correct architecture is used.

Parameters

Input | Type | Description
HOST_WORKING_DIR | str | Directory on the host machine that is mounted into the container. This directory receives the downloaded metadata and may also contain checkpoints you plan to convert.
DOCKER_WORKING_DIR | str | Path inside the container where HOST_WORKING_DIR is mounted. This must match the right-hand side of the -v flag.
IMAGE_NAME | str | Full name of the Checkpoint Conversion Tool Docker image. Example: us-docker.pkg.dev/acp-artifacts-development-38/oci-us-public-development/sn-byoc:0.0.12
SERVER | str | Base endpoint URL for your SambaStack platform and product version. Refer to the Manage API Keys and Endpoints section of the documentation to obtain this value.
CACHE_LOCATION | str | Path (relative to DOCKER_WORKING_DIR) where metadata is stored inside the container. The metadata is visible on the host under the corresponding path in HOST_WORKING_DIR.
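For example, a filled-in invocation might look like the following (all values are illustrative placeholders; substitute your own paths and your SambaStack base URL):
export HOST_WORKING_DIR=$HOME/sn-byoc   # placeholder host directory
export DOCKER_WORKING_DIR=/data
export IMAGE_NAME=us-docker.pkg.dev/acp-artifacts-development-38/oci-us-public-development/sn-byoc:0.0.12
export SERVER=https://api.sambanova.ai/   # placeholder; use your SambaStack endpoint
export CACHE_LOCATION=serving-cache

docker run -v $HOST_WORKING_DIR:$DOCKER_WORKING_DIR --rm -it \
    --platform linux/amd64 \
    $IMAGE_NAME \
    download-serving-cache \
    --server $SERVER \
    --cache_location $DOCKER_WORKING_DIR/$CACHE_LOCATION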
Your setup should now be complete. The full setup only needs to be done once. You may need to repeat Step 4 under the following conditions:
  • SambaNova or your organization makes more model architectures available for custom checkpoint inference
  • Updates to any model metadata are released for your SambaStack instance (scoped by the base URL)
  • You are switching to another platform (e.g., SambaCloud) or your organization is using multiple SambaStack instances (scoped by base URLs)

Convert and validate checkpoint

The Checkpoint Conversion Tool converts custom checkpoints (from HuggingFace or elsewhere) into a format that can run on SambaNova's SN40L hardware. This step occurs prior to uploading or deploying custom checkpoints on any SambaStack or SambaCloud instance.

Command template

docker run -v $HOST_WORKING_DIR:$DOCKER_WORKING_DIR --rm -it \
    --platform linux/amd64 \
    $IMAGE_NAME \
    prepare-ckpt \
    --model $MODEL_NAME \
    --original_checkpoint_path "$DOCKER_WORKING_DIR/$CHECKPOINT_DIR" \
    --output_checkpoint_path "$DOCKER_WORKING_DIR/$OUTPUT_DIR" \
    --transformers_version $TRANSFORMERS_VERSION \
    --server $SERVER \
    --cache_location $CACHE_LOCATION \
    --ignore_transformers_version

Parameters

These are the primary input flags to the prepare-ckpt command:
Flag / Variable | Type | Description
--model / MODEL_NAME | str | The base model to convert (e.g., llama3-70b). To check whether custom checkpoints are supported for a model family, use the Supported models list and the Import checkpoint field in the Features and optimizations column.
--original_checkpoint_path / CHECKPOINT_DIR | str | Path inside the container to the directory containing the input checkpoint (config, tokenizer, safetensors). This is typically $DOCKER_WORKING_DIR/<subdir>, where <subdir> is mounted from HOST_WORKING_DIR.
--output_checkpoint_path / OUTPUT_DIR | str | Path inside the container where the converted checkpoint is written. The converted artifacts appear on the host under $HOST_WORKING_DIR/$OUTPUT_DIR.
--transformers_version / TRANSFORMERS_VERSION | str | Optional but recommended. Specifies the HuggingFace transformers version required for deployment (e.g., 4.45.1). The tool checks that the version used to save the model is ≤ the deployment version. If this validation is not needed, omit the flag. If you encounter version errors, re-save the checkpoint with a transformers version no newer than the deployment version, or update this value.
--ignore_transformers_version | bool | When enabled, skips all transformers version checking (default: False). Useful if you encounter a version validation error and explicitly want to bypass it. Ignoring the checkpoint's transformers version means the default backend version (i.e., 4.45.1) is used for conversion.
--server / SERVER | str | Source of serving metadata. Can be embedded, a local path, or a remote URL such as https://api.sambanova.ai/. Typically this is the base endpoint URL for your SambaStack instance.
--cache_location / CACHE_LOCATION | str | Location (inside the container) where serving metadata/configs are stored. Usually a subdirectory of $DOCKER_WORKING_DIR, visible on the host under $HOST_WORKING_DIR/$CACHE_LOCATION.
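HuggingFace checkpoints typically record the transformers version they were saved with in config.json, so you can check it before converting (assumes jq is installed; ./my-checkpoint is a placeholder):
jq -r '.transformers_version' ./my-checkpoint/config.json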

Host-level variables

In addition to the flags above, you will typically set the following environment variables for the Docker command:
Variable | Description
HOST_WORKING_DIR | Directory on the host that contains your input checkpoint and where output is written. Must be writable.
DOCKER_WORKING_DIR | Directory inside the container where HOST_WORKING_DIR is mounted (matches the right-hand side of -v).
IMAGE_NAME | Full image name of the Checkpoint Conversion Tool container.
CHECKPOINT_DIR | Subdirectory under HOST_WORKING_DIR (mirrored under DOCKER_WORKING_DIR) containing the original checkpoint.
OUTPUT_DIR | Subdirectory under HOST_WORKING_DIR (mirrored under DOCKER_WORKING_DIR) where converted checkpoints are written.
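Putting it together, a filled-in run might look like this (values are illustrative placeholders; --ignore_transformers_version is omitted here and should be added only if you explicitly want to bypass version validation):
export HOST_WORKING_DIR=$HOME/sn-byoc
export DOCKER_WORKING_DIR=/data
export IMAGE_NAME=us-docker.pkg.dev/acp-artifacts-development-38/oci-us-public-development/sn-byoc:0.0.12
export MODEL_NAME=llama3-70b                      # placeholder; see Supported models
export CHECKPOINT_DIR=my-checkpoint               # placeholder subdirectory under HOST_WORKING_DIR
export OUTPUT_DIR=outputs/my-checkpoint-converted
export TRANSFORMERS_VERSION=4.45.1
export SERVER=https://api.sambanova.ai/           # placeholder; use your SambaStack endpoint
export CACHE_LOCATION=$DOCKER_WORKING_DIR/serving-cache   # metadata synced in setup Step 4

docker run -v $HOST_WORKING_DIR:$DOCKER_WORKING_DIR --rm -it \
    --platform linux/amd64 \
    $IMAGE_NAME \
    prepare-ckpt \
    --model $MODEL_NAME \
    --original_checkpoint_path "$DOCKER_WORKING_DIR/$CHECKPOINT_DIR" \
    --output_checkpoint_path "$DOCKER_WORKING_DIR/$OUTPUT_DIR" \
    --transformers_version $TRANSFORMERS_VERSION \
    --server $SERVER \
    --cache_location $CACHE_LOCATION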

Validating the conversion output

After a successful conversion, verify that the output directory is complete and consistent:

Checklist

  • The output directory contains the expected set of safetensors files, typically named model-00001-of-000NN.safetensors, model-00002-of-000NN.safetensors, …, up to NN
  • The output directory contains a DONE file indicating successful completion
  • The tool logs show no errors
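These checks can be scripted (a sketch; the variables refer to the host-side paths used during conversion):
OUT=$HOST_WORKING_DIR/$OUTPUT_DIR
[ -f "$OUT/DONE" ] && echo "DONE marker present" || echo "DONE marker missing"
ls "$OUT"/model-*-of-*.safetensors | wc -l   # should equal the expected shard count NN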

Successful conversion log

A successful run ends with a log similar to:
-----------------------------------------------
[1] STEP: parse_args - Succeeded
-----------------------------------------------


> Errors (None)

-----------------------------------------------
[2] STEP: legalizer - Succeeded
-----------------------------------------------


> Errors (None)

***************************************************
YYYY-MM-DDTHH:MM:SSUTC - byoc_lib - INFO - The process is completed without any errors.

Failed conversion log

If conversion fails, you may see error entries like:
-----------------------------------------------
[2] STEP: validate_transformer - Failed
-----------------------------------------------


> Errors
  Id: ByocErrorId.TRANSFORMERS_VERSION_ERROR
  Responsible: ByocErrorOwner.USAGE
  Reason: The transformers version of your checkpoint (4.52.4) is larger than the maximum supported transformers version (4.45.1).
  Suggestion: To fix this, please save your checkpoint with a transformers environment <= 4.45.1. Alternatively, you can include the --ignore_transformers_version flag, but this may lead to errors due to transformers version compatibility issues.

  Stack Trace:
    ...
If you do not see either "One or more failures have occurred. Do not deploy" or "The process is completed without any errors" and the process exits unexpectedly, you may have hit an out-of-memory (OOM) condition. In that case, increase available memory and rerun the conversion.
Once you’ve confirmed that the conversion completed successfully and the output directory contains all expected files, you can proceed to the upload and deployment steps in Deploying custom checkpoints.

Interpreting output logs and troubleshooting

The Checkpoint Conversion Tool prints a structured log for each run to help diagnose issues, identify which part of the process failed, and decide what to do next.

Log structure overview

At a high level, each run includes:
  • A final report header showing the command that was executed
  • A sequence of STEP entries indicating which module or test is running
  • An Errors section (if any failures occur) with details and suggestions
  • A final status line indicating whether the process completed successfully or failed

Example output log

FINAL REPORT FOR: `prepare-ckpt --model llama3-1b --original_checkpoint_path /data/Llama-3.2-1B --output_checkpoint_path /data/outputs/llama3-1b --transformers_version 4.45.1`
***************************************************

-----------------------------------------------
[1] STEP: parse_args - Succeeded
-----------------------------------------------


> Errors (None)

-----------------------------------------------
[2] STEP: validate_transformer - Failed
-----------------------------------------------


> Errors
  Id: ByocErrorId.TRANSFORMERS_VERSION_ERROR
  Responsible: ByocErrorOwner.USAGE
  Reason: The transformers version of your checkpoint (4.52.4) is larger than the maximum supported transformers version (4.45.1).
  Suggestion: To fix this, please save your checkpoint with a transformers environment <= 4.45.1. Alternatively, you can include the --ignore_transformers_version flag, but this may lead to errors due to transformers version compatibility issues.

  Stack Trace:
    Traceback (most recent call last):
    File "/byoc_core/main.runfiles/_main/byoc_core/byoc.py", line 46, in _byoc
    warning = validate_transformer(convert_checkpoint_path,
    File "/byoc_core/main.runfiles/_main/byoc_core/transformers_validator/check_transformers.py", line 115, in validate_transformer
    raise TransformersVersionError(model_config_transformers_version,
    byoc_core.transformers_validator.check_transformers.TransformersVersionError: (<Version('4.52.4')>, '4.45.1')
    (<Version('4.52.4')>, '4.45.1')


***************************************************
2025-08-25T17:49:10UTC - byoc_lib - ERROR - One or more failures have occurred. Do not deploy

Log components explained

Final report header

Shows the exact command and arguments used:
FINAL REPORT FOR: `prepare-ckpt --model llama3-1b --original_checkpoint_path /data/Llama-3.2-1B --output_checkpoint_path /data/outputs/llama3-1b --transformers_version 4.45.1`
***************************************************
STEP blocks

Each step represents a modular phase of the workflow (argument parsing, validation, conversion, etc.):
-----------------------------------------------
[1] STEP: parse_args - Succeeded
-----------------------------------------------
...
-----------------------------------------------
[2] STEP: validate_transformer - Failed
-----------------------------------------------
Error blocks

For each step, an Errors block follows with structured details:
  • If there are no errors:
    > Errors (None)
    
  • If a step results in error:
    > Errors
      Id: ByocErrorId.TRANSFORMERS_VERSION_ERROR
      Responsible: ByocErrorOwner.USAGE
      Reason: The transformers version of your checkpoint (4.52.4) is larger than the maximum supported transformers version (4.45.1).
      Suggestion: To fix this, please save your checkpoint with a transformers environment <= 4.45.1. Alternatively, you can include the --ignore_transformers_version flag, but this may lead to errors due to transformers version compatibility issues.
    
      Stack Trace:
        ...
    
Final status line

At the end of the log, you'll see either a success or a failure message:
2025-08-25T17:49:10UTC - byoc_lib - ERROR - One or more failures have occurred. Do not deploy

Error block components

If a step fails, its error block contains the following fields:
Field | Description
Id | A marker that can be used to trace the specific check or process that failed when inspecting the container.
Responsible | Indicates whether the error can be remedied by you (ByocErrorOwner.USAGE) or whether there is an issue with the conversion tool itself (ByocErrorOwner.LIBRARY).
Reason | A human-readable explanation of what may have gone wrong.
Suggestion | A recommendation for fixing or triaging the error.
Stack Trace | Helps identify where the error took place.

Common errors and solutions

Error ID | Cause | Solution
TRANSFORMERS_VERSION_ERROR | Checkpoint was saved with a transformers version newer than supported | Re-save the checkpoint with transformers ≤ 4.45.1, or use the --ignore_transformers_version flag
Out-of-memory (OOM) | Insufficient system memory for the checkpoint size | Increase available memory to at least 2.5x the checkpoint size
Missing files | Incomplete checkpoint directory | Ensure all required files (config.json, safetensors, tokenizer files) are present
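For TRANSFORMERS_VERSION_ERROR, one way to re-save a checkpoint under an older transformers release is a short script like the following (a sketch; the paths are placeholders, the model is assumed to be a causal LM, and loading it requires enough memory):
pip install "transformers==4.45.1" torch safetensors

python - <<'PY'
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "./my-checkpoint"          # placeholder: checkpoint saved with a newer transformers
dst = "./my-checkpoint-resaved"  # placeholder: output directory for the re-saved checkpoint

model = AutoModelForCausalLM.from_pretrained(src, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(src)

model.save_pretrained(dst, safe_serialization=True)  # writes safetensors shards + index
tokenizer.save_pretrained(dst)
PY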

Next steps

After successfully converting your checkpoint:
  1. Upload the converted checkpoint to your GCS bucket
  2. Register the checkpoint with a Model Manifest
  3. Deploy the checkpoint using a Bundle configuration
See Deploying custom checkpoints for the complete workflow. For speculative decoding deployments, see Deploying with speculative decoding for draft-target validation and deployment instructions.