Basic Usage
Let’s take a look back at the quickstart program:
from kani import Kani, chat_in_terminal
from kani.engines.openai import OpenAIEngine
api_key = "sk-..."
engine = OpenAIEngine(api_key, model="gpt-4o-mini")
ai = Kani(engine)
chat_in_terminal(ai)
kani consists of two main parts: the engine, which is the interface between kani and the language model, and the kani, which is responsible for tracking chat history, prompting the engine, and handling function calls.
In this section, we’ll look at how to initialize a Kani class and core concepts in the library.
Kani
See also
The Kani API documentation.
To initialize a kani, only the engine is required, though you can configure much more:
- Kani.__init__(
- engine: BaseEngine,
- system_prompt: str = None,
- always_included_messages: list[ChatMessage] = None,
- desired_response_tokens: int = None,
- chat_history: list[ChatMessage] = None,
- functions: list[AIFunction] = None,
- retry_attempts: int = 1,
- )
- Parameters:
engine – The LM engine implementation to use.
system_prompt – The system prompt to provide to the LM. The prompt will not be included in chat_history.
always_included_messages – A list of messages to always include as a prefix in all chat rounds (i.e., evict newer messages rather than these to manage context length). These will not be included in chat_history.
desired_response_tokens – The minimum amount of space to leave in max context size - tokens in prompt. To control the maximum number of tokens generated more precisely, you may be able to configure the engine (e.g. OpenAIEngine(..., max_tokens=250)). Defaults to 10% of the engine’s context length or 8192 tokens, whichever is smaller.
chat_history – The chat history to start with (not including system prompt or always included messages), for advanced use cases. By default, each kani starts with a new conversation session.
Caution
If you pass another kani’s chat history here without copying it, the same list will be mutated! Use chat_history=mykani.chat_history.copy() to pass a copy.
functions – A list of AIFunction to expose to the model (for dynamic function calling). Use ai_function() to define static functions (see Function Calling).
retry_attempts – How many attempts the LM may take per full round if any tool call raises an exception.
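The default for desired_response_tokens described above can be worked through with a little arithmetic. This is only a sketch: the function name is hypothetical, and the rounding (integer division) is an assumption rather than something the parameter description specifies.

```python
def default_desired_response_tokens(max_context_size: int) -> int:
    """Hypothetical helper: 10% of the context length or 8192, whichever is smaller.

    Rounding down with integer division is an assumption made for this sketch.
    """
    return min(max_context_size // 10, 8192)


# A small context window: 10% is the smaller value.
small = default_desired_response_tokens(8192)

# A large context window: the 8192-token cap is the smaller value.
large = default_desired_response_tokens(128_000)

# The space left for the prompt is the context size minus the reserved response space.
prompt_budget = 128_000 - large

print(small, large, prompt_budget)
```

Under these assumptions, only very large context windows (above 81,920 tokens) hit the 8192-token cap.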
>>> from kani import Kani, chat_in_terminal
>>> from kani.engines.openai import OpenAIEngine
>>> api_key = "sk-..."
>>> engine = OpenAIEngine(api_key, model="gpt-4o-mini")
>>> ai = Kani(engine, system_prompt="You are a sarcastic assistant.")
>>> chat_in_terminal(ai, rounds=1)
USER: Hello kani!
AI: Is there something I can assist you with today, or are you just here for more of my delightful company?
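The caution about chat_history above comes down to ordinary Python list aliasing. The sketch below uses plain strings as stand-ins for ChatMessage objects, so it runs without kani installed; the variable names are illustrative only.

```python
# Plain strings stand in for ChatMessage objects in this sketch.
first_history = ["user: hi", "assistant: hello"]

shared = first_history          # same list object: appends are visible to both
copied = first_history.copy()   # independent shallow copy of the list

shared.append("user: again")        # also changes first_history
copied.append("user: different")    # leaves first_history untouched

print(first_history)
print(copied)
```

Because `.copy()` is shallow, the two lists still reference the same message objects; only appending, removing, or reordering entries is isolated.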
Entrypoints
While chat_in_terminal() is helpful in development, let’s look at how to use a Kani in a larger
application.
The two standard entrypoints are Kani.chat_round() and Kani.full_round(), and their _str counterparts:
- async Kani.chat_round(
- query: str | Sequence[MessagePart | str] | None,
- **kwargs,
- )
Perform a single chat round (user -> model -> user, no functions allowed).
- Parameters:
query – The contents of the user’s chat message. Can be None to generate a completion without a user prompt.
kwargs – Additional arguments to pass to the model engine (e.g. decoding arguments).
- Returns:
The model’s reply.
- async Kani.full_round( ) -> AsyncIterable[ChatMessage]
Perform a full chat round (user -> model [-> function -> model -> …] -> user).
Yields each non-user ChatMessage created during the round. A ChatMessage will have at least one of (content, function_call).
Use this in an async for loop, like so:
async for msg in kani.full_round("How's the weather?"):
    print(msg.text)
- Parameters:
query – The content of the user’s chat message. Can be None to generate a completion without a user prompt.
max_function_rounds – The maximum number of function calling rounds to perform in this round. If this number is reached, the model is allowed to generate a final response without any functions defined. Default unlimited (continues until model’s response does not contain a function call).
kwargs – Additional arguments to pass to the model engine (e.g. decoding arguments).
Important
These are asynchronous methods, which means you’ll need to be in an async context.
Web frameworks like FastAPI and Flask 2 allow your route methods to be async, meaning you can await a kani method from within your route method without having to get too in the weeds with asyncio.
Otherwise, you can create an async context by defining an async function and using asyncio.run(). For example,
here’s how you might implement a simple chat:
import asyncio
from kani import Kani
from kani.engines.openai import OpenAIEngine
api_key = "sk-..."
engine = OpenAIEngine(api_key, model="gpt-4o-mini")
ai = Kani(engine, system_prompt="You are a helpful assistant.")
# define your function normally, using `async def` instead of `def`
async def chat_with_kani():
    while True:
        user_message = input("USER: ")
        # now, you can use `await` to call kani's async methods
        message = await ai.chat_round_str(user_message)
        print("AI:", message)

# use `asyncio.run` to call your async function to start the program
asyncio.run(chat_with_kani())
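To make the full-round flow (user -> model [-> function -> model -> …] -> user) more concrete, here is a minimal sketch that runs without kani or an API key. Everything in it is a hypothetical stand-in: the scripted replies replace a real model, the dicts replace ChatMessage objects, and `sketch_full_round` only imitates the shape of the loop, not kani's actual implementation (it omits max_function_rounds, retries, and context management).

```python
import asyncio

# Scripted stand-in for a model: first it requests a function call, then it answers.
SCRIPT = [
    {"role": "assistant", "text": None, "call": ("get_weather", "Tokyo")},
    {"role": "assistant", "text": "It is sunny in Tokyo.", "call": None},
]


def get_weather(location):
    # Mocked result, as in a test double
    return f"Weather in {location}: Sunny"


FUNCTIONS = {"get_weather": get_weather}


async def sketch_full_round(query):
    """Yield every non-user message produced while answering `query`."""
    replies = iter(SCRIPT)
    while True:
        reply = next(replies)
        yield reply
        if reply["call"] is None:
            return  # no function requested: control returns to the user
        name, argument = reply["call"]
        result = FUNCTIONS[name](argument)
        # The function result is fed back, and the model gets another turn
        yield {"role": "function", "text": result, "call": None}


async def main():
    return [msg async for msg in sketch_full_round("What's the weather in Tokyo?")]


messages = asyncio.run(main())
roles = [m["role"] for m in messages]
print(roles)
```

The caller sees three messages for a single user query, which is why full_round is consumed with `async for` rather than a single `await`.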
Engines
Engines are responsible for interfacing with a language model.
This table lists the engines built in to kani:
| Model Name | Extra | Capabilities | Engine |
|---|---|---|---|
| All OpenAI Models | | 🛠️ 🖼 | |
| All Anthropic Models | | 🛠️ 🖼 | |
| All Google AI Models | | 🛠️ 🖼 | |
| 🤗 transformers[3] | | (model-specific) | |
| llama.cpp[2] | | (model-specific) | |
| vLLM[2] | | (model-specific) | |
Additional models using the classes above are also supported - see the model zoo for a more comprehensive list of models!
Legend
🛠️: Supports function calling.
🖼: Supports multimodal inputs.
See also
We won’t go too far into implementation details here - if you are interested in implementing your own engine, check
out Engines or the BaseEngine API documentation.
When you are finished with an engine, release its resources with BaseEngine.close().
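One way to make sure an engine is always closed is a try/finally block. The sketch below uses a hypothetical `StubEngine` so that it runs without kani, and it assumes close() is a coroutine that must be awaited; check the BaseEngine API documentation for the actual signature before relying on this.

```python
import asyncio


class StubEngine:
    """Hypothetical stand-in for an engine; only tracks whether it was closed."""

    def __init__(self):
        self.closed = False

    async def close(self):
        # Assumption: close() is a coroutine, so callers must await it
        self.closed = True


async def main():
    engine = StubEngine()
    try:
        pass  # ... build a Kani with the engine and chat here ...
    finally:
        # Runs even if the chat code above raises
        await engine.close()
    return engine


engine = asyncio.run(main())
print(engine.closed)
```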
Concept: Chat Messages
At a high level, a Kani is responsible for managing a list of ChatMessage: the chat session
associated with it. You can access the chat messages through the Kani.chat_history attribute.
Each message contains the role (a ChatRole: system, assistant, user, or function) that sent the message
and the content of the message. Optionally, a user message can also contain a name (for multi-user
conversations), and an assistant message can contain a function_call (discussed in Function Calling).
- class kani.ChatMessage(
- *,
- role: ChatRole,
- content: str | list[Annotated[MessagePart, SerializeAsAny()] | str] | None,
- name: str | None = None,
- tool_call_id: str | None = None,
- tool_calls: list[ToolCall] | None = None,
- is_tool_call_error: bool | None = None,
- extra: dict = {},
- )
Represents a message in the chat context.
- role: ChatRole
Who said the message?
- content: str | list[Annotated[MessagePart, SerializeAsAny()] | str] | None
The data used to create this message. Generally, you should use text or parts instead.
- property text: str | None
The content of the message, as a string. Can be None only if the message is a requested function call from the assistant. If the message is comprised of multiple parts, concatenates the parts.
- property parts: list[MessagePart | str]
The parts of the message that make up its content. Can be an empty tuple only if the message is a requested function call from the assistant.
This is a read-only list; changes here will not affect the message’s content. To mutate the message content, use copy_with() and set text, parts, or content.
- tool_call_id: str | None
The ID for a requested ToolCall which this message is a response to (function messages only).
- is_tool_call_error: bool | None
If this is a FUNCTION message containing the results of a function call, whether the function call raised an exception.
- property function_call: FunctionCall | None
If there is exactly one tool call to a function, return that tool call’s requested function.
This is mostly provided for backwards-compatibility purposes; iterating over tool_calls should be preferred.
- extra: dict
Specific engines may store additional extra data in this dictionary. See an engine’s documentation for details about any extras it may store or expect.
This key will only be persisted to disk on a best-effort basis – any value that is not JSON-serializable or a Pydantic class will be cast to a repr. Upon loading, values may not retain the same type as they were saved as (Pydantic objects will be loaded as a dict).
- classmethod system(content: str | Sequence[MessagePart | str], **kwargs)
Create a new system message.
- classmethod user(content: str | Sequence[MessagePart | str], **kwargs)
Create a new user message.
- classmethod assistant(content: str | Sequence[MessagePart | str] | None, **kwargs)
Create a new assistant message.
- classmethod function( )
Create a new function message.
- copy_with(**new_values)
Make a shallow copy of this object, updating the passed attributes (if any) to new values.
This does not validate the updated attributes! This is mostly just a convenience wrapper around .model_copy.
Only one of (content, text, parts) may be passed and will update the other two attributes accordingly.
Only one of (tool_calls, function_call) may be passed and will update the other accordingly.
You may even modify the chat history (e.g. append or delete ChatMessages or edit a message’s content) to change the prompt at any time.
Warning
In some advanced use cases, ChatMessage.content may be a list of MessagePart or str rather
than a string. ChatMessage exposes ChatMessage.text (always a string or None) and
ChatMessage.parts (always a list of message parts), which we recommend using instead of
ChatMessage.content. See Message Parts for more information.
These properties are dynamically generated based on the underlying content, and it is safe to mix messages with different content types in a single Kani.
>>> from kani import Kani, chat_in_terminal
>>> from kani.engines.openai import OpenAIEngine
>>> api_key = "sk-..."
>>> engine = OpenAIEngine(api_key, model="gpt-4o-mini")
>>> ai = Kani(engine, system_prompt="You are a helpful assistant.")
>>> chat_in_terminal(ai, rounds=1)
USER: Hello kani!
AI: Hello! How can I assist you today?
>>> ai.chat_history
[
ChatMessage(role=ChatRole.USER, content="Hello kani!"),
ChatMessage(role=ChatRole.ASSISTANT, content="Hello! How can I assist you today?"),
]
>>> await ai.get_prompt()
# The system prompt is passed to the engine, but isn't part of chat_history
# - this will be useful later in advanced use cases.
[
ChatMessage(role=ChatRole.SYSTEM, content="You are a helpful assistant."),
ChatMessage(role=ChatRole.USER, content="Hello kani!"),
ChatMessage(role=ChatRole.ASSISTANT, content="Hello! How can I assist you today?"),
]
Function Calling
Function calling gives language models the ability to choose when to call a function you provide, based on its documentation.
With kani, you can write functions in Python and expose them to the model with just one line of code: the
@ai_function decorator.
# import the library
import asyncio
from typing import Annotated
from kani import AIParam, Kani, ai_function, chat_in_terminal, ChatRole
from kani.engines.openai import OpenAIEngine
# set up the engine as above
api_key = "sk-..."
engine = OpenAIEngine(api_key, model="gpt-4o-mini")
# subclass Kani to add AI functions
class MyKani(Kani):
    # Adding the annotation to a method exposes it to the AI
    @ai_function()
    def get_weather(
        self,
        # and you can provide extra documentation about specific parameters
        location: Annotated[str, AIParam(desc="The city and state, e.g. San Francisco, CA")],
    ):
        """Get the current weather in a given location."""
        # In this example, we mock the return, but you could call a real weather API
        return f"Weather in {location}: Sunny, 72 degrees fahrenheit."

ai = MyKani(engine)
# the terminal utility allows you to test function calls...
chat_in_terminal(ai)

# and you can track multiple rounds programmatically.
async def main():
    async for msg in ai.full_round("What's the weather in Tokyo?"):
        print(msg.role, msg.text)

if __name__ == "__main__":
    asyncio.run(main())
kani guarantees that function calls are valid by the time they reach your methods while allowing you to focus on writing code. For more information, check out the function calling docs.
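To illustrate how a function's documentation can be made available to a model, the sketch below reads a docstring and `Annotated` parameter descriptions using only the standard library. This is not kani's implementation: `describe_function` is a hypothetical helper, and a plain string is used as the annotation metadata in place of `AIParam` so the example runs without kani installed.

```python
import inspect
from typing import Annotated, get_args, get_origin, get_type_hints


def describe_function(func):
    """Hypothetical helper: collect a function's name, docstring, and parameter docs."""
    hints = get_type_hints(func, include_extras=True)
    params = {}
    for name, hint in hints.items():
        if name == "return":
            continue
        if get_origin(hint) is Annotated:
            # Annotated[T, meta...] unpacks to (T, meta...)
            base, *metadata = get_args(hint)
            desc = next((m for m in metadata if isinstance(m, str)), None)
        else:
            base, desc = hint, None
        params[name] = {"type": base.__name__, "desc": desc}
    return {"name": func.__name__, "doc": inspect.getdoc(func), "params": params}


def get_weather(location: Annotated[str, "The city and state, e.g. San Francisco, CA"]):
    """Get the current weather in a given location."""
    return f"Weather in {location}: Sunny, 72 degrees fahrenheit."


spec = describe_function(get_weather)
print(spec)
```

A description like `spec` is the kind of information a model needs in order to decide when to call a function and with which arguments.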
Streaming
kani supports streaming to print tokens from the engine as they are received. Streaming is designed to be a drop-in
superset of the chat_round and full_round methods, allowing you to gradually refactor your code without ever
leaving it in a broken state.
To request a stream from the engine, use Kani.chat_round_stream() or Kani.full_round_stream(). These
methods will return a StreamManager, which you can use in different ways to consume the stream.
The simplest way to consume the stream is to iterate over it with async for, which will yield a stream of
str.
# CHAT ROUND (no function calling):
stream = ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
async for token in stream:
    print(token, end="")
msg = await stream.message()

# FULL ROUND (with function calling):
async for stream in ai.full_round_stream("What is the airspeed velocity of an unladen swallow?"):
    async for token in stream:
        print(token, end="")
    msg = await stream.message()
kani also provides a helper to print streams (kani.print_stream()):
stream = ai.chat_round_stream("What is the most interesting train line in Tokyo?")
await kani.print_stream(stream)
After a stream finishes, its contents will be available as a ChatMessage. You can retrieve the final
message or BaseCompletion with:
msg = await stream.message()
completion = await stream.completion()
The final ChatMessage may contain non-yielded tokens (e.g. a request for a function call). If the final
message or completion is requested before the stream is iterated over, the stream manager will consume the entire
stream.
Tip
For compatibility and ease of refactoring, awaiting the stream itself will also return the message, i.e.:
msg = await ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
(note the await that is not present in the above examples). This allows you to refactor your code by changing
chat_round to chat_round_stream without other changes.
- msg = await ai.chat_round("What is the airspeed velocity of an unladen swallow?")
+ msg = await ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
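The drop-in behavior described above relies on one object supporting both `async for` and `await`. The sketch below shows how such an object could be built with the standard library. `SketchStream` is a hypothetical illustration, not kani's StreamManager; it joins string tokens instead of building a ChatMessage.

```python
import asyncio


class SketchStream:
    """Hypothetical stream that can be iterated for tokens or awaited for the full text."""

    def __init__(self, tokens):
        self._source = iter(tokens)
        self._seen = []

    async def __aiter__(self):
        # Yield tokens as they arrive, remembering each one for the final message
        for token in self._source:
            self._seen.append(token)
            yield token

    async def message(self):
        # Drain whatever has not been consumed yet, then return the full text
        async for _ in self:
            pass
        return "".join(self._seen)

    def __await__(self):
        # Awaiting the stream itself is equivalent to awaiting message()
        return self.message().__await__()


async def demo():
    stream = SketchStream(["Hel", "lo"])
    tokens = [token async for token in stream]
    msg = await stream.message()
    # Awaiting without iterating consumes the whole stream first
    direct = await SketchStream(["a", "b"])
    return tokens, msg, direct


result = asyncio.run(demo())
print(result)
```

Because the stream object records the tokens it has already yielded, asking for the message after iterating does not repeat any work, and asking before iterating simply consumes everything first.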
Multimodal Inputs
kani optionally supports multimodal inputs (images, audio, video) for various language models. To use multimodal inputs,
install the kani-multimodal-core extension package or use pip install "kani[multimodal]". See the
kani-multimodal-core documentation for more info.
from kani import Kani
from kani.engines.openai import OpenAIEngine
from kani.ext.multimodal_core import ImagePart
engine = OpenAIEngine(model="gpt-4.1-nano")
ai = Kani(engine)
# notice how the arg is a list of parts rather than a single str!
msg = await ai.chat_round_str([
    "Please describe these images:",
    ImagePart.from_file("path/to/image.png"),
    await ImagePart.from_url(
        "https://upload.wikimedia.org/wikipedia/commons/thumb/5/53/Whitehead%27s_Trogon_0A2A6014.jpg/1024px-Whitehead%27s_Trogon_0A2A6014.jpg"
    ),
])
print(msg)
Multimodal handling is deeply integrated with the rest of the kani ecosystem, so you get all the benefits of kani’s fluent tool usage and automatic context management with minimal development cost!
kani CLI
kani comes with a CLI for you to chat with a model in your terminal with zero setup.
The kani CLI takes the form of $ kani <provider>:<model-id>. Use kani --help for more information.
Examples:
$ kani openai:gpt-4.1-nano
$ kani huggingface:meta-llama/Meta-Llama-3-8B-Instruct
$ kani anthropic:claude-sonnet-4-0
$ kani google:gemini-2.5-flash
This CLI helper automatically creates an engine and a Kani instance, and calls chat_in_terminal() so you can test LLMs
faster. When kani-multimodal-core is installed, you can provide multimodal media on your disk or on the internet
to the model by prepending a path or URL with an @ symbol:
USER: Please describe this image: @path/to/image.png and also this one: @https://example.com/image.png
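The `<provider>:<model-id>` form used by the CLI splits on the first colon only, which matters because model IDs may contain slashes or further punctuation. The helper below is a hypothetical sketch of that parsing, not the CLI's actual code.

```python
def split_model_spec(spec: str) -> tuple[str, str]:
    """Hypothetical helper: split '<provider>:<model-id>' on the first colon."""
    provider, _, model_id = spec.partition(":")
    return provider, model_id


openai_spec = split_model_spec("openai:gpt-4.1-nano")
hf_spec = split_model_spec("huggingface:meta-llama/Meta-Llama-3-8B-Instruct")
print(openai_spec, hf_spec)
```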
Few-Shot Prompting
Few-shot prompting (AKA in-context learning) is the idea that language models can “learn” the task the user wants to accomplish through examples provided to it in its prompt.
To few-shot prompt a language model with kani, you can initialize it with an existing chat history. In this example, we give the model a few-shot prompt in which it translates English to Japanese, and see that it continues to do so in the chat session despite never being explicitly prompted to do so.
>>> from kani import ChatMessage
>>> fewshot = [
... ChatMessage.user("thank you"),
... ChatMessage.assistant("arigato"),
... ChatMessage.user("good morning"),
... ChatMessage.assistant("ohayo"),
... ]
>>> ai = Kani(engine, chat_history=fewshot)
>>> chat_in_terminal(ai, rounds=1)
USER: crab
AI: kani
Tip
Passing the fewshot prompt as chat_history allows kani to manage it as normal - meaning it can slide out of the
context window. For kani to always include the fewshot prompt, use always_included_messages.
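The difference between chat_history and always_included_messages in the tip above can be sketched as a sliding window. This is a simplified illustration rather than kani's actual prompt-building logic: `build_prompt` is a hypothetical function, messages are plain strings, and character counts stand in for token counts.

```python
def build_prompt(always_included, chat_history, max_tokens, count_tokens=len):
    """Hypothetical sketch: keep the fixed prefix, then as many recent messages as fit."""
    budget = max_tokens - sum(count_tokens(m) for m in always_included)
    kept = []
    # Walk backwards from the newest message so the oldest ones are evicted first
    for message in reversed(chat_history):
        cost = count_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return list(always_included) + kept[::-1]


# Few-shot examples passed as ordinary chat history can slide out of the window...
as_history = build_prompt([], ["shot", "chat", "more", "last"], max_tokens=12)

# ...but the same examples are retained when they are always included.
as_prefix = build_prompt(["shot"], ["chat", "more", "last"], max_tokens=12)

print(as_history)
print(as_prefix)
```

In the first call the few-shot message is the oldest entry and is dropped once the budget runs out; in the second it is reserved up front and a regular message is dropped instead.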
Saving & Loading Chats
You can save or load a kani’s chat state using Kani.save() and Kani.load(). This will dump the state to
a specified JSON file, which you can load into a later kani instance:
- Kani.save(fp: str | bytes | PathLike, *, save_format: Literal['json', 'kani'] | None = None, **kwargs)
Save the chat state of this kani to a .kani file or JSON. This will overwrite the file if it exists!
- Parameters:
fp – The path to the file to save.
save_format – Whether to save the chat state as a .kani file or JSON. If not set, determines format by file path extension (defaulting to .kani if uncertain).
kwargs – Additional arguments to pass to Pydantic’s model_dump_json.
- Kani.load(fp: str | bytes | PathLike, **kwargs)
Load a chat state from a .kani file or JSON file into this instance. This will overwrite any existing chat state!
- Parameters:
fp – The path to the file containing the chat state.
kwargs – Additional arguments to pass to Pydantic’s model_validate_json.
If you’d like more manual control over how you store chat state, there are two attributes you need to save:
Kani.always_included_messages and Kani.chat_history (both lists of ChatMessage).
These are pydantic models, which you can save and load using
ChatMessage.model_dump() and ChatMessage.model_validate().
You could, for example, save the chat state to a database and load it when necessary. A common pattern is to save
only the chat_history and use always_included_messages as an application-specific prompt.
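As a rough sketch of the database pattern mentioned above, the example below stores a chat history in SQLite as a JSON string. To stay runnable without kani, the messages are plain dicts standing in for what `ChatMessage.model_dump(mode="json")` might produce; the table name, column names, and session ID are all hypothetical.

```python
import json
import sqlite3

# Plain dicts stand in for dumped ChatMessage objects in this sketch.
chat_history = [
    {"role": "user", "content": "crab"},
    {"role": "assistant", "content": "kani"},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chats (session_id TEXT PRIMARY KEY, history TEXT)")

# Save: serialize the whole history as one JSON document per session
conn.execute("INSERT INTO chats VALUES (?, ?)", ("demo", json.dumps(chat_history)))

# Load: read the JSON back and decode it into a list of dicts
row = conn.execute(
    "SELECT history FROM chats WHERE session_id = ?", ("demo",)
).fetchone()
restored = json.loads(row[0])
conn.close()

print(restored)
```

With kani installed, each restored dict would then be passed to ChatMessage.model_validate() before handing the list to a new Kani as chat_history.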
Next Steps
In the next section, we’ll look at subclassing Kani in order to supply functions to the language model.
Then, we’ll look at how you can override and/or extend the implementations of kani methods to control each part of
a chat round.