This guide covers different aspects of text generation, including types of generation, model selection, creating prompts, and managing multi-turn conversations.
Use the following code to perform text generation with the SambaNova or OpenAI Python client in a non-streaming manner.
```python
from sambanova import SambaNova

client = SambaNova(
    base_url="your-sambanova-base-url",
    api_key="your-sambanova-api-key",
)

completion = client.chat.completions.create(
    model="Meta-Llama-3.3-70B-Instruct",
    messages=[
        {"role": "system", "content": "Answer the question in a couple sentences."},
        {"role": "user", "content": "Share a happy story with me"}
    ]
)

print(completion.choices[0].message.content)
```
Use the n parameter to generate multiple independent completions for a single prompt. The response includes each completion in choices[0] through choices[n-1].
| Parameter | Type | Default | Valid range |
| --- | --- | --- | --- |
| `n` | integer | 1 | 1–8 |
Set temperature greater than 0 to get varied outputs across completions. With temperature=0, all completions are identical.
Setting n greater than 1 is not supported with function calling or tools; combining them returns a 400 error.
```python
from sambanova import SambaNova

client = SambaNova(
    base_url="your-sambanova-base-url",
    api_key="your-sambanova-api-key",
)

completion = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",
    messages=[
        {"role": "user", "content": "Write a one-sentence tagline for a coffee shop."}
    ],
    n=3,
    temperature=0.7
)

for i, choice in enumerate(completion.choices):
    print(f"Completion {i + 1}: {choice.message.content}")
```
Prompt engineering is the practice of designing and refining prompts to optimize responses from large language models (LLMs). This process is iterative and requires experimentation to achieve the best possible outcomes.
A basic prompt can be as simple as a few words to elicit a response from the LLM. However, for more complex use cases, you may need additional elements:
| Element | Description |
| --- | --- |
| Defining a persona | Assigning a specific role to the model (e.g., “You are a financial advisor”). |
| Providing context | Supplying background information to guide the model’s response. |
| Specifying output format | Instructing the model to respond in a particular style (e.g., JSON, bullet points, structured text). |
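The three elements above can be layered into a single system message. The sketch below shows one way to do this; the helper name `build_prompt_messages` is hypothetical and not part of any SDK — the resulting list is what you would pass as `messages` to the chat completions call.

```python
# Hypothetical helper: combines persona, context, and output-format
# instructions into a chat message list. Not part of any SDK.
def build_prompt_messages(persona, context, task, output_format):
    system_content = (
        f"{persona} "
        f"Context: {context} "
        f"Respond as {output_format}."
    )
    return [
        {"role": "system", "content": system_content},
        {"role": "user", "content": task},
    ]

messages = build_prompt_messages(
    persona="You are a financial advisor.",
    context="The user is saving for retirement over a 30-year horizon.",
    task="Suggest three asset classes to consider.",
    output_format="a JSON array of strings",
)
```

Keeping persona, context, and format in the system message leaves the user turn free to carry only the actual task, which makes the prompt easier to iterate on.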
To maintain context across multiple exchanges, messages in a conversational AI system are typically stored as a list of dictionaries, each with keys specifying the sender’s role and the message content. This structure lets the system track context across multiple turns in a conversation. Below is an example of how a multi-turn conversation is structured using the Meta-Llama-3.3-70B-Instruct model:
Structuring multi-turn conversations using Meta-Llama-3.3-70B-Instruct
```python
completion = client.chat.completions.create(
    model="Meta-Llama-3.3-70B-Instruct",
    messages=[
        {"role": "user", "content": "Hi! My name is Peter and I am 31 years old. What is 1+1?"},
        {"role": "assistant", "content": "Nice to meet you, Peter. 1 + 1 is equal to 2"},
        {"role": "user", "content": "What is my age?"}
    ],
    stream=True
)

for chunk in completion:
    # The final streamed chunk may carry no content, so guard against None.
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
```
After running the program, you should see an output similar to the following.
Example output
You told me earlier, Peter. You're 31 years old.
By structuring conversations this way, the model can maintain context, recall prior user inputs, and provide more coherent responses.
When engaging in long conversations with LLMs, certain factors such as token limits and memory constraints must be considered to ensure accuracy and coherence.
- Token limits - LLMs have a fixed context window, limiting the number of tokens they can process in a single request. If the input exceeds this limit, the system might truncate it, leading to incomplete or incoherent responses.
- Memory constraints - The model does not retain context beyond its input window. To preserve context, past messages should be re-included in prompts.
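One common way to handle both constraints is to re-send recent history while dropping the oldest turns once an estimated token budget is exceeded. The sketch below uses a rough 4-characters-per-token heuristic rather than a real tokenizer, and the function name `trim_history` is illustrative only; production code should count tokens with the model’s actual tokenizer.

```python
# Illustrative sketch: keep only the most recent messages that fit a
# token budget. The 4-chars-per-token ratio is a rough heuristic, not
# an exact tokenizer.
def estimate_tokens(text):
    return max(1, len(text) // 4)

def trim_history(messages, max_tokens):
    """Keep the newest messages whose combined estimate fits the budget."""
    kept = []
    total = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if total + cost > max_tokens:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order

history = [
    {"role": "user", "content": "Hi! My name is Peter."},
    {"role": "assistant", "content": "Nice to meet you, Peter."},
    {"role": "user", "content": "What is my name?"},
]
trimmed = trim_history(history, max_tokens=12)
```

The trimmed list is what you would pass as `messages` on the next request. Note that dropping old turns also drops the facts they contained (here, the user’s name), which is exactly the memory trade-off described above; summarizing older turns is a common refinement.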
By structuring prompts effectively and managing conversation history, you can optimize interactions with LLMs for better accuracy and coherence.