The SambaNova Messages API (POST /v1/messages) is compatible with the Anthropic Messages API standard. Existing Anthropic SDK clients can target SambaNova by changing only the base URL, API key, and model identifier. This endpoint is designed for conversational, tool-capable, and reasoning-oriented integrations. The Messages API complements the existing Chat Completions and Responses API endpoints; it does not replace them.

Supported models

In the initial release, the Messages API is available for gpt-oss-120b. Additional models may be added in later releases — see the supported models page for the current list.

How it works

The Messages API structures model output as typed content blocks — text, tool_use, and thinking — rather than a single assistant text field. Each request returns a message object containing one or more of these blocks, depending on the model’s behavior. Key characteristics:
  • Client-executed tools only. When a tool is needed, the model returns a tool_use content block. Your application executes the function and returns the result in a follow-up request via a tool_result content block. Server-side tools are not supported.
  • Thinking passthrough. Reasoning-capable models expose thinking content via a thinking content block alongside the text block, with no extra request parameters.
  • System prompt as a top-level field. Unlike Chat Completions, Anthropic-style requests pass the system prompt via the top-level system field rather than as a message with role: "system".
  • Structured streaming. Streaming responses use typed Server-Sent Events (SSE) that follow this event sequence: message_start → content_block_start → content_block_delta → content_block_stop → message_delta → message_stop.
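
To make the event sequence concrete, here is a minimal sketch of how a client might rebuild content blocks from already-decoded SSE payloads. The payload field names (index, content_block, delta, text_delta, thinking_delta, input_json_delta, partial_json) are assumed to match the Anthropic streaming format; this page only names the event types, so check the shapes against real responses. The function name assemble_blocks and the sample events are hypothetical.

```python
import json

def assemble_blocks(events):
    """Rebuild content blocks from an ordered list of decoded SSE event payloads.

    Assumption: payloads follow the Anthropic streaming shapes; only the
    event type names are confirmed by this page.
    """
    blocks = {}  # content block index -> block under construction
    for event in events:
        kind = event["type"]
        if kind == "content_block_start":
            block = dict(event["content_block"])
            block["_partial_json"] = ""  # scratch buffer for streamed tool input
            blocks[event["index"]] = block
        elif kind == "content_block_delta":
            block = blocks[event["index"]]
            delta = event["delta"]
            if delta["type"] == "text_delta":
                block["text"] = block.get("text", "") + delta["text"]
            elif delta["type"] == "thinking_delta":
                block["thinking"] = block.get("thinking", "") + delta["thinking"]
            elif delta["type"] == "input_json_delta":
                block["_partial_json"] += delta["partial_json"]
        elif kind == "content_block_stop":
            block = blocks[event["index"]]
            partial = block.pop("_partial_json")
            if block["type"] == "tool_use":
                # Tool input arrives as JSON fragments; parse once the block closes.
                block["input"] = json.loads(partial) if partial else {}
        # message_start, message_delta, and message_stop carry message-level
        # metadata and are ignored in this sketch.
    return [blocks[i] for i in sorted(blocks)]

# Hypothetical event stream: one text block followed by one tool_use block.
sample_events = [
    {"type": "message_start", "message": {"role": "assistant", "content": []}},
    {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "content_block_start", "index": 1,
     "content_block": {"type": "tool_use", "id": "toolu_1", "name": "get_weather", "input": {}}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "input_json_delta", "partial_json": "{\"location\": "}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "input_json_delta", "partial_json": "\"Paris, FR\"}"}},
    {"type": "content_block_stop", "index": 1},
    {"type": "message_delta", "delta": {"stop_reason": "tool_use"}},
    {"type": "message_stop"},
]

blocks = assemble_blocks(sample_events)
print(blocks)
```

In practice the SDK's streaming helper does this accumulation for you; the sketch is only meant to show what each event contributes.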

Limitations

Read these before migrating an existing Anthropic-based application:
  • Server-side tools (web_search, code_execution, bash, text_editor) are not supported and return a 400 error. Only client-executed function tools are available.
  • document content blocks (PDF input) are not supported and return a 400 error.
  • URL image sources are not supported. Use base64-encoded images instead.
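
When migrating, it can help to scan a request payload for these unsupported features before sending it. The sketch below is one possible pre-flight check over a plain dict payload. It assumes that client-executed tools either omit "type" or set it to "custom", that server-side tools carry some other "type" value, and that image blocks use the Anthropic source layout. The helper names find_unsupported_features and base64_image_block are hypothetical and not part of any SDK.

```python
import base64

def find_unsupported_features(request: dict) -> list[str]:
    """List features in a request payload that this page says return a 400 or are unsupported."""
    problems = []
    for tool in request.get("tools", []):
        # Assumption: any tool "type" other than None/"custom" marks a server-side tool.
        tool_type = tool.get("type")
        if tool_type not in (None, "custom"):
            problems.append(f"server-side tool: {tool_type}")
    for message in request.get("messages", []):
        content = message.get("content")
        if isinstance(content, str):
            continue  # plain string content has no typed blocks to inspect
        for block in content or []:
            if block.get("type") == "document":
                problems.append("document content block")
            elif block.get("type") == "image" and block.get("source", {}).get("type") == "url":
                problems.append("URL image source")
    return problems

def base64_image_block(data: bytes, media_type: str) -> dict:
    """Build an image block with a base64 source (assumed Anthropic block layout)."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("ascii"),
        },
    }

# Illustrative payload that mixes supported and unsupported features.
request = {
    "model": "gpt-oss-120b",
    "max_tokens": 1024,
    "tools": [
        {"name": "get_weather", "input_schema": {"type": "object", "properties": {}}},
        {"type": "web_search_20250305", "name": "web_search"},
    ],
    "messages": [
        {"role": "user", "content": [
            {"type": "image", "source": {"type": "url", "url": "https://example.com/cat.png"}},
            {"type": "text", "text": "What is in this image?"},
        ]},
    ],
}

problems = find_unsupported_features(request)
print(problems)
```

A URL image flagged this way can be downloaded by your application and passed through base64_image_block before the request is sent.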

Usage

All examples below use the Anthropic Python SDK pointed at SambaNova. Install with pip install anthropic and configure as shown in the Anthropic compatibility page.

Simple generation

import anthropic

client = anthropic.Anthropic(
    base_url="https://api.sambanova.ai/v1",
    api_key="your-sambanova-api-key"
)

message = client.messages.create(
    model="gpt-oss-120b",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain the difference between supervised and unsupervised learning in two sentences."}
    ]
)

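# The first block may be a thinking block, so print only the text blocks.
print("".join(block.text for block in message.content if block.type == "text"))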
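
Because the system prompt is a top-level field here rather than a message with role "system", code ported from Chat Completions needs to pull system messages out of the message list. The following is a small sketch of that conversion. The helper name split_system_prompt is hypothetical, and joining multiple system messages with a blank line is a design choice of this sketch, not documented behavior.

```python
def split_system_prompt(chat_messages):
    """Split Chat Completions style messages into (system, messages).

    Assumption: each message's "content" is a plain string.
    """
    system_parts = []
    messages = []
    for message in chat_messages:
        if message["role"] == "system":
            system_parts.append(message["content"])
        else:
            messages.append({"role": message["role"], "content": message["content"]})
    return "\n\n".join(system_parts), messages

chat_messages = [
    {"role": "system", "content": "You are concise."},
    {"role": "user", "content": "Define overfitting."},
]

system, messages = split_system_prompt(chat_messages)
print(system)
print(messages)

# The result would then be passed to the SDK along these lines (not executed here):
# client.messages.create(model="gpt-oss-120b", max_tokens=1024,
#                        system=system, messages=messages)
```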

Streaming response

Use client.messages.stream(...) to receive typed SSE events as the response is generated. This example reuses the client created in Simple generation.

with client.messages.stream(
    model="gpt-oss-120b",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a short poem about speed."}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Thinking

Reasoning-capable models expose thinking content via a thinking content block alongside the text block. No additional parameters are required; thinking content is surfaced automatically when the model produces it. This example reuses the client created in Simple generation.

response = client.messages.create(
    model="gpt-oss-120b",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is 27 * 43?"}
    ]
)

for block in response.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking}")
    elif block.type == "text":
        print(f"Response: {block.text}")

Tool calling

When tools are provided, the model may return a tool_use content block. Your application is responsible for executing the function and returning the result.

Step 1: Send a request with tools defined.

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a given location.",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and country, e.g. Paris, FR"
                }
            },
            "required": ["location"]
        }
    }
]

response = client.messages.create(
    model="gpt-oss-120b",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What is the weather in Paris?"}
    ]
)

for block in response.content:
    if block.type == "tool_use":
        print(f"Tool: {block.name}, Input: {block.input}")

Step 2: Execute the tool and return the result. This example continues from Step 1; client, response, and tools are reused.

import json

def get_weather(location: str) -> dict:
    # Replace with a real weather API call
    return {"location": location, "temperature_celsius": 22, "condition": "Sunny"}

tool_use_block = next(block for block in response.content if block.type == "tool_use")
result = get_weather(tool_use_block.input["location"])

follow_up = client.messages.create(
    model="gpt-oss-120b",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What is the weather in Paris?"},
        {"role": "assistant", "content": response.content},
        {
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": tool_use_block.id,
                    "content": json.dumps(result)
                }
            ]
        }
    ]
)

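# The first block may be a thinking block, so print only the text blocks.
print("".join(block.text for block in follow_up.content if block.type == "text"))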
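
The two steps above handle a single tool call. A response may contain more than one tool_use block, or name a tool your application does not implement, so a slightly more general sketch maps tool names to handlers and builds one tool_result block per call. The helper run_tools, the registry dict, and the SimpleNamespace stand-ins for SDK content blocks are all hypothetical. The is_error field comes from the Anthropic tool_result format and is assumed, not confirmed, to be accepted by this endpoint.

```python
import json
from types import SimpleNamespace

def run_tools(content_blocks, registry):
    """Return one tool_result block for each tool_use block in content_blocks."""
    results = []
    for block in content_blocks:
        if block.type != "tool_use":
            continue  # skip text and thinking blocks
        handler = registry.get(block.name)
        if handler is None:
            output, is_error = f"unknown tool: {block.name}", True
        else:
            try:
                output, is_error = json.dumps(handler(**block.input)), False
            except Exception as exc:
                # Report the failure to the model instead of crashing the loop.
                output, is_error = f"{type(exc).__name__}: {exc}", True
        results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": output,
            "is_error": is_error,  # assumption: field is accepted as in the Anthropic format
        })
    return results

# Stand-ins for response.content; real SDK blocks expose the same attributes.
fake_content = [
    SimpleNamespace(type="text", text="Let me check."),
    SimpleNamespace(type="tool_use", id="toolu_1", name="get_weather",
                    input={"location": "Paris, FR"}),
    SimpleNamespace(type="tool_use", id="toolu_2", name="get_time", input={}),
]
registry = {
    "get_weather": lambda location: {"location": location, "temperature_celsius": 22},
}

tool_results = run_tools(fake_content, registry)
print(tool_results)
```

The returned list can be used directly as the content of the follow-up user message, in place of the single hand-built tool_result block.
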
For the full parameter list, see the API reference.