When you call the SambaNova Chat Completions API, the platform applies the model’s default Jinja-based chat template server-side, formatting your messages into the raw prompt the model receives. For most use cases this is the right behavior. However, some scenarios require you to take control of prompt formatting and output parsing on the client side. This page explains when and why to use the Completions API with custom templates, and how to implement custom output parsers. For a complete interactive walkthrough, see the Custom Chat Templates AI Starter Kit.Documentation Index
Fetch the complete documentation index at: https://sambanova-systems.mintlify.dev/docs/llms.txt
Use this file to discover all available pages before exploring further.
When to use custom chat templates
Use the Completions API with a custom chat template instead of the Chat Completions API when:- You need full control over prompt structure. Some workflows require injecting custom variables, special tokens, or instructions that are not exposed through the Chat Completions API parameters. For standard models available on SambaCloud with no customization, the Chat Completions API with built-in function calling is the recommended approach. See Function calling and JSON mode.
- You are using a BYOC (Bring Your Own Checkpoint) model. Fine-tuned checkpoints deployed on SambaStack may use a different chat template than the base model. Letting the server apply the base model’s default template produces incorrect prompts for these checkpoints.
- Your model uses a non-standard tool-call output format. Fine-tuned models may emit tool calls in a format the default parsers do not handle, for example XML markers instead of JSON.
How it works
Instead of calling/v1/chat/completions, you render the prompt string yourself and send it directly to /v1/completions. The server receives a raw string and continues generation from it, applying no template of its own.
The workflow has four steps:
- Load a chat template. Either pull the Jinja template from a Hugging Face tokenizer or write a custom one.
- Render the prompt. Apply the template to your messages and tool definitions to produce a raw prompt string.
- Call the Completions API. Send the rendered string to
/v1/completions. - Parse the output. Convert the raw text response into a structured assistant message with tool calls.
Load a chat template
From a Hugging Face model
Use thetransformers library to load the tokenizer for your base model and extract its built-in chat template.
Define a custom Jinja template
If your checkpoint uses a different template than the base model, write a Jinja2 template directly. Your template must handle themessages, tools, and add_generation_prompt variables at minimum.
Render the prompt
Apply the template to your messages and tool definitions using Jinja2. Pass tokenizer attributes such asbos_token and eos_token as context variables when using a template loaded from a tokenizer. For custom templates, supply these values explicitly.
Call the Completions API
Send the rendered prompt string to the/v1/completions endpoint. This endpoint accepts a raw string and returns a raw string — no template is applied server-side.
Parse model output
The raw text response must be parsed into a structured assistant message. The correct parser depends on the tool-call format your model emits.JSON format (Llama-style)
Llama instruction-tuned models emit tool calls as JSON objects in the response text.XML format (DeepSeek-style)
DeepSeek models use XML markers to delimit tool calls.Build the assistant message
Once tool calls are extracted, assemble the final assistant message in OpenAI-compatible format.Custom parsers
If your model emits tool calls in a format other than the default one, implement a custom parser. Your parser must accept the raw response string and return a list of tool-call dicts in OpenAI-compatible format.Note: Custom parsers execute user-supplied code. Only run code you trust. This is not a sandboxed execution environment.
Next steps
- Explore the full end-to-end workflow, interactive Streamlit app, and Jupyter notebook in the Custom Chat Templates AI Starter Kit.
- For standard function calling without custom templates, see Function calling and JSON mode.

