Checkpoint conversion is a substep of deploying custom checkpoints on SambaStack or SambaCloud. See the Deploying custom checkpoints page for the high-level workflow.
Overview
The Checkpoint Conversion Tool provides two primary capabilities:

- Checkpoint conversion: Transforms HuggingFace-format checkpoints into SambaNova-compatible format for deployment on SN40L hardware
- Speculative decoding validation: Verifies whether a converted draft checkpoint is compatible with a target checkpoint for speculative decoding deployments
Prerequisites
System requirements
Before using the Checkpoint Conversion Tool, ensure your system meets the following requirements:

| Requirement | Specification |
|---|---|
| Memory | At least 2.5x the size of your checkpoint’s SafeTensor or .bin files. For example, a 10GB checkpoint requires a minimum of 25GB available memory. |
| Storage | Equal to the input checkpoint size for the output. For example, converting a 14GB checkpoint requires 14GB of free storage for the converted output (plus the original). |
| Operating System | macOS or Linux-based operating systems. Windows OS is not supported. |
Estimated conversion times
| Checkpoint Size | Example Model | Estimated Time |
|---|---|---|
| ~8GB | Meta-Llama-3.1-8B-Instruct | ~5 minutes |
| ~140GB | Meta-Llama-3.3-70B-Instruct | ~1 hour |
Required software
- Docker Desktop or Docker Engine - Installation guide
- Google Cloud CLI - Installation guide
Required access
- Network access to your SambaStack instance endpoint
- Authentication credentials for Google Cloud
Supported models and checkpoint formats
Supported model architectures
Custom checkpoint deployment is supported for a growing set of base models. To check whether custom checkpoints are supported for a model family, use the Supported models list and check the Import checkpoint field in the Features and optimizations column.

Checkpoint format requirements
Checkpoints are accepted in the HuggingFace format. The tensors should be in the safetensors format, and the checkpoint directory should contain the same relevant config files as the base model for the custom checkpoint. For example, if the custom checkpoint is a fine-tuned variant of meta-llama/Llama-3.3-70B-Instruct, then the custom checkpoint should contain files such as:

- config.json
- generation_config.json
- model-00001-of-00030.safetensors, …, model-00030-of-00030.safetensors
- model.safetensors.index.json
- special_tokens_map.json
- tokenizer.json
- tokenizer_config.json
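As a quick sanity check before conversion, you can verify that the common config and tokenizer files are present in a checkpoint directory. The following is an illustrative sketch only; the exact file set (in particular the number of safetensors shards) depends on the base model:

```shell
# check_ckpt_files: warn about missing common HuggingFace checkpoint files.
# Illustrative sketch; adjust the file list for your base model.
check_ckpt_files() {
  ckpt_dir="$1"
  missing=0
  for f in config.json generation_config.json model.safetensors.index.json \
           special_tokens_map.json tokenizer.json tokenizer_config.json; do
    if [ ! -f "$ckpt_dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  # At least one safetensors shard should be present.
  if ! ls "$ckpt_dir"/*.safetensors >/dev/null 2>&1; then
    echo "missing: *.safetensors shards"
    missing=1
  fi
  [ "$missing" -eq 0 ] && echo "all expected files present"
  return "$missing"
}
```

Running `check_ckpt_files /path/to/checkpoint` prints one line per missing file and returns a nonzero status if anything is absent.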
Checkpoint compatibility
Given that a checkpoint is fine-tuned or derived from one of the supported models for your platform, checkpoints are compatible when their computational graph has not been modified from the original checkpoint (i.e., tensor shapes and model structure). Aspects that must remain unchanged:

- Number of attention heads
- Rope type (rope theta)
- Model vocabulary size
- Optimizer type
- Static architectural attributes in config.json, such as: head_dim, hidden_act, intermediate_size, attention_bias, attention_dropout, vocab_size

Aspects that may be modified:

- Model weights (model weight tensor values)
- Tokenizer and vocabulary (as long as it retains the exact vocabulary size of the original model checkpoint; useful for multilingual use cases)
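A quick way to spot incompatible changes is to compare the static attributes in the base model's config.json against your custom checkpoint's. The sketch below uses a naive grep-based comparison (a JSON-aware tool would be more robust; keys absent from both files compare as equal):

```shell
# compare_config_attrs: compare static architectural attributes between two
# config.json files. Illustrative sketch; not a substitute for the tool's checks.
compare_config_attrs() {
  base="$1"
  custom="$2"
  mismatch=0
  for key in head_dim hidden_act intermediate_size attention_bias attention_dropout vocab_size; do
    b=$(grep -o "\"$key\":[^,}]*" "$base")
    c=$(grep -o "\"$key\":[^,}]*" "$custom")
    if [ "$b" != "$c" ]; then
      echo "mismatch on $key: base=$b custom=$c"
      mismatch=1
    fi
  done
  [ "$mismatch" -eq 0 ] && echo "static attributes match"
  return "$mismatch"
}
```

Usage: `compare_config_attrs base/config.json custom/config.json`; any mismatched attribute is printed and the function returns a nonzero status.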
Practical compatibility examples
Take the base model meta-llama/Llama-3.3-70B-Instruct (a base model supported by SambaNova). The following checkpoints use the same computational graph as the original 70B model and can be converted and deployed on SambaNova platforms. These checkpoints have undergone updates to their model weights, which have been adjusted and refined to improve performance or adapt to specific tasks or datasets.

Download and set up
The Checkpoint Conversion Tool is distributed as a Docker container that encapsulates all conversion and validation utilities. Container path:
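The path below is the example registry path used throughout this guide; your organization's registry path and version tag may differ:

```shell
# Example image path (the version tag is illustrative; use the current release for your deployment)
export IMAGE_NAME="us-docker.pkg.dev/acp-artifacts-development-38/oci-us-public-development/sn-byoc:0.0.12"
```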
Step 1: Install Docker Desktop or Docker Engine
Install the Docker engine in your conversion environment. You can follow the official Docker Engine installation guide. After installation, start Docker (on macOS, this can be done via the Docker Desktop application).
Step 2: Install Google Cloud CLI
Install Google Cloud CLI in your conversion environment. You can follow the official Google Cloud CLI Installation guide.
Step 3: Authenticate and pull the container image
First, authenticate with Google Cloud. Then configure the Docker client to authenticate with us-docker.pkg.dev (one-time setup), and pull the Docker image:
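Assuming Docker and the gcloud CLI are installed, the sequence looks like the following sketch (the image tag shown is the example used in this guide; substitute your current version):

```shell
# Log in to Google Cloud (opens a browser window for authentication)
gcloud auth login

# Configure the Docker client to use gcloud credentials for us-docker.pkg.dev (one-time setup)
gcloud auth configure-docker us-docker.pkg.dev

# Pull the Checkpoint Conversion Tool image
export IMAGE_NAME="us-docker.pkg.dev/acp-artifacts-development-38/oci-us-public-development/sn-byoc:0.0.12"
docker pull "$IMAGE_NAME"
```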
Step 4: Sync model metadata
This step downloads platform-specific model metadata from your SambaStack instance and stores it locally. The tool uses this metadata to perform checkpoint conversions. You can additionally inspect the model metadata to understand how artifacts are converted to run on SambaNova’s SN40L hardware. The metadata is fetched and cached by running the Checkpoint Conversion Tool container with the download-serving-cache command.

Command template
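A sketch of the command, assembled from the parameters described below. The directory names are illustrative, and the exact flag names accepted by download-serving-cache should be confirmed against your tool version:

```shell
export HOST_WORKING_DIR="$HOME/byoc"        # host directory to mount (illustrative)
export DOCKER_WORKING_DIR="/workspace"      # mount point inside the container (illustrative)
export IMAGE_NAME="us-docker.pkg.dev/acp-artifacts-development-38/oci-us-public-development/sn-byoc:0.0.12"
export SERVER="https://api.sambanova.ai/"   # base endpoint URL of your SambaStack instance
export CACHE_LOCATION="serving_cache"       # metadata cache subdirectory (illustrative)

docker run --platform linux/amd64 \
  -v "$HOST_WORKING_DIR:$DOCKER_WORKING_DIR" \
  "$IMAGE_NAME" \
  download-serving-cache \
  --server "$SERVER" \
  --cache_location "$DOCKER_WORKING_DIR/$CACHE_LOCATION"
```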
On native Linux/amd64 environments, the --platform linux/amd64 flag is optional and can be omitted. It is primarily helpful on macOS (especially Apple Silicon) to ensure the correct architecture is used.

Parameters
| Input | Type | Description |
|---|---|---|
| HOST_WORKING_DIR | str | Directory on the host machine that will be mounted into the container. This directory will receive the downloaded metadata and may also contain checkpoints you plan to convert. |
| DOCKER_WORKING_DIR | str | Path inside the container where HOST_WORKING_DIR is mounted. This must match the right-hand side of the -v flag. |
| IMAGE_NAME | str | Full name of the Checkpoint Conversion Tool Docker image. Example: us-docker.pkg.dev/acp-artifacts-development-38/oci-us-public-development/sn-byoc:0.0.12 |
| SERVER | str | Base endpoint URL for your SambaStack platform and product version. Refer to the Manage API Keys and Endpoints section of the documentation to obtain this value. |
| CACHE_LOCATION | str | Path (relative to DOCKER_WORKING_DIR) where metadata will be stored inside the container. The metadata will be visible on the host under the corresponding path in HOST_WORKING_DIR. |
Your setup should now be complete. The full setup only needs to be done once. You may need to repeat Step 4 under the following conditions:
- SambaNova or your organization makes more model architectures available for custom checkpoint inference
- Updates to any model metadata are released for your SambaStack instance (scoped by the base URL)
- You are switching to another platform (e.g., SambaCloud) or your organization is using multiple SambaStack instances (scoped by base URLs)
Convert and validate checkpoint
The Checkpoint Conversion Tool converts custom checkpoints (from HuggingFace or otherwise) into a format that can run on SambaNova’s SN40L hardware. This step occurs prior to uploading or deploying custom checkpoints on any SambaStack or SambaCloud instance.

Command template
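The flags and host-level variables described below can be assembled into an invocation like the following sketch. The directory names and the --model value are illustrative, and the exact flag set should be confirmed against your tool version:

```shell
export HOST_WORKING_DIR="$HOME/byoc"        # host directory containing the input checkpoint (illustrative)
export DOCKER_WORKING_DIR="/workspace"      # mount point inside the container (illustrative)
export IMAGE_NAME="us-docker.pkg.dev/acp-artifacts-development-38/oci-us-public-development/sn-byoc:0.0.12"
export CHECKPOINT_DIR="my-finetuned-llama"  # subdirectory of HOST_WORKING_DIR (illustrative)
export OUTPUT_DIR="converted"               # output subdirectory (illustrative)
export SERVER="https://api.sambanova.ai/"
export CACHE_LOCATION="serving_cache"

docker run --platform linux/amd64 \
  -v "$HOST_WORKING_DIR:$DOCKER_WORKING_DIR" \
  "$IMAGE_NAME" \
  prepare-ckpt \
  --model llama3-70b \
  --original_checkpoint_path "$DOCKER_WORKING_DIR/$CHECKPOINT_DIR" \
  --output_checkpoint_path "$DOCKER_WORKING_DIR/$OUTPUT_DIR" \
  --transformers_version 4.45.1 \
  --server "$SERVER" \
  --cache_location "$DOCKER_WORKING_DIR/$CACHE_LOCATION"
```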
Parameters
These are the primary input flags to the prepare-ckpt command:
| Flag / Variable | Type | Description |
|---|---|---|
| --model / MODEL_NAME | str | The base model to convert (e.g., llama3-70b). To check whether custom checkpoints are supported for a model family, use the Supported models list and the Import checkpoint field in the Features and optimizations column. |
| --original_checkpoint_path / CHECKPOINT_DIR | str | Path inside the container to the directory containing the input checkpoint (e.g., config, tokenizer, safetensors). This is typically $DOCKER_WORKING_DIR/<subdir>, where <subdir> is mounted from HOST_WORKING_DIR. |
| --output_checkpoint_path / OUTPUT_DIR | str | Path inside the container where the converted checkpoint will be written. The converted artifacts will appear on the host under $HOST_WORKING_DIR/$OUTPUT_DIR. |
| --transformers_version / TRANSFORMERS_VERSION | str | (Optional but recommended.) Specifies the Hugging Face transformers version required for deployment (e.g., 4.45.1). The tool checks that the version used to save the model is ≤ the deployment version. If this validation is not needed, omit the flag. If you encounter version errors, re-save the checkpoint with a newer transformers version or update this value. |
| --ignore_transformers_version | bool | When enabled, skips all transformers version checking. Default is False. Useful if you encounter a version validation error and explicitly want to bypass it. Ignoring the Transformers version of your checkpoint uses the default backend version for converting your checkpoints (i.e., 4.45.1). |
| --server / SERVER | str | Source of serving metadata. Can be embedded, a local path, or a remote URL such as https://api.sambanova.ai/. Typically this is the base endpoint URL for your SambaStack instance. |
| --cache_location / CACHE_LOCATION | str | Location (inside the container) where serving metadata/configs are stored. Usually a subdirectory of $DOCKER_WORKING_DIR and visible on the host under $HOST_WORKING_DIR/$CACHE_LOCATION. |
Host-level variables
In addition to the flags above, you will typically set the following environment variables for the Docker command:

| Variable | Description |
|---|---|
| HOST_WORKING_DIR | Directory on the host that contains your input checkpoint and where output will be written. Must be writable. |
| DOCKER_WORKING_DIR | Directory inside the container where HOST_WORKING_DIR is mounted (matches the right-hand side of -v). |
| IMAGE_NAME | Full image name of the Checkpoint Conversion Tool container. |
| CHECKPOINT_DIR | Subdirectory under HOST_WORKING_DIR (mirrored under DOCKER_WORKING_DIR) containing the original checkpoint. |
| OUTPUT_DIR | Subdirectory under HOST_WORKING_DIR (mirrored under DOCKER_WORKING_DIR) where converted checkpoints will be written. |
Validating the conversion output
After a successful conversion, verify that the output directory is complete and consistent:

Checklist

- The output directory contains the expected set of safetensors files, typically named model-00001-of-000NN.safetensors, model-00002-of-000NN.safetensors, …, up to NN
- The output directory contains a DONE file indicating successful completion
- The tool logs show no errors
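The checklist above can be sketched as a small shell check. This is illustrative only; it verifies the DONE marker and the presence of safetensors shards, but does not validate shard contents:

```shell
# verify_conversion_output: basic completeness check on the converted output directory.
# Checks for the DONE marker and at least one safetensors shard.
verify_conversion_output() {
  out_dir="$1"
  if [ ! -f "$out_dir/DONE" ]; then
    echo "FAIL: DONE marker not found"
    return 1
  fi
  # Count converted shards (named model-*.safetensors).
  shards=$(ls "$out_dir"/model-*.safetensors 2>/dev/null | wc -l | tr -d ' ')
  if [ "$shards" -eq 0 ]; then
    echo "FAIL: no safetensors shards found"
    return 1
  fi
  echo "OK: DONE marker present, $shards shard(s) found"
}
```

Usage: `verify_conversion_output "$HOST_WORKING_DIR/$OUTPUT_DIR"`; a nonzero return status indicates an incomplete output directory.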
Successful conversion log
A successful run ends with the status line "The process is completed without any errors".

Failed conversion log

If conversion fails, you will see error entries followed by the status line "One or more failures have occurred. Do not deploy". If you do not see either "One or more failures have occurred. Do not deploy" or "The process is completed without any errors" and the process exits unexpectedly, you may have hit an out-of-memory (OOM) condition. In that case, increase available memory and rerun the conversion.

Once you’ve confirmed that the conversion completed successfully and the output directory contains all expected files, you can proceed to the upload and deployment steps in Deploying custom checkpoints.
Interpreting output logs and troubleshooting
The Checkpoint Conversion Tool prints a structured log for each run to help diagnose issues, identify which part of the process failed, and decide what to do next.

Log structure overview

At a high level, each run includes:

- A final report header showing the command that was executed
- A sequence of STEP entries indicating which module or test is running
- An Errors section (if any failures occur) with details and suggestions
- A final status line indicating whether the process completed successfully or failed
Example output log
Log components explained
Final report header: Shows the exact command and arguments used.

Final status line:

- If there are no errors, the log ends with "The process is completed without any errors".
- If a step results in an error, an error block is printed for each failure and the log ends with "One or more failures have occurred. Do not deploy".
Error block components
If a step failed, each error block will contain the following fields:

| Field | Description |
|---|---|
| Id | A marker that can be used to trace the specific check or process that failed when inspecting the container. |
| Responsible | Indicates whether the error can be remedied by you (ByocErrorOwner.USAGE) or whether there is an issue with the conversion tool itself (ByocErrorOwner.LIBRARY). |
| Reason | A human-readable explanation of what may have gone wrong. |
| Suggestion | A recommendation that can be used to fix or triage the error. |
| Stack Trace | Helpful to identify where the error took place. |
Common errors and solutions
| Error ID | Cause | Solution |
|---|---|---|
| TRANSFORMERS_VERSION_ERROR | Checkpoint was saved with a Transformers version newer than supported | Re-save the checkpoint with Transformers ≤ 4.45.1, or use the --ignore_transformers_version flag |
| Out-of-memory (OOM) | Insufficient system memory for checkpoint size | Increase available memory to at least 2.5x the checkpoint size |
| Missing files | Incomplete checkpoint directory | Ensure all required files (config.json, safetensors, tokenizer files) are present |
Next steps
After successfully converting your checkpoint:

- Upload the converted checkpoint to your GCS bucket
- Register the checkpoint with a Model Manifest
- Deploy the checkpoint using a Bundle configuration
