SambaNova vision models support multimodal inputs, allowing users to process both text and images. These models analyze images and generate context-aware text responses. Learn how to query SambaNova vision models using either the SambaNova or OpenAI Python client.
Make a query with an image
On SambaNova, vision model requests follow OpenAI's multimodal input format, which accepts both text and image inputs in a structured payload. The call is similar to Text Generation, but differs by including an encoded image file, referenced via the image_path variable. A helper function converts this image into a base64 string, allowing it to be passed alongside the text in the request.
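The helper might look like the sketch below (the name encode_image is illustrative, not a SambaNova-provided function):

```python
import base64

def encode_image(image_path: str) -> str:
    """Read the image file and return its contents as a base64 string."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")
```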
Step 1
Make a new Python file and copy the code below.
This example uses the Llama-4-Maverick-17B-128E-Instruct model.
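A minimal sketch of that file, using the OpenAI Python client; the prompt text and image path are illustrative, so replace them with your own:

```python
import base64
import openai

# Path to the image you want the model to analyze (illustrative).
image_path = "path/to/your/image.jpg"

def encode_image(path: str) -> str:
    """Read the image file and return its contents as a base64 string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

base64_image = encode_image(image_path)

# Placeholders replaced in Step 2 below.
client = openai.OpenAI(
    api_key="your-sambanova-api-key",
    base_url="your-sambanova-base-url",
)

response = client.chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                # Text and image parts travel together in one content array.
                {"type": "text", "text": "What do you see in this image?"},
                {
                    # The base64 string is passed inline as a data URI.
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```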
Step 2
Use your SambaNova API key and base URL from the API keys and URLs page to replace the string fields "your-sambanova-api-key" and "your-sambanova-base-url" in the construction of the client.
