Basic Usage¶

Let’s take a look back at the quickstart program:

from kani import Kani, chat_in_terminal
from kani.engines.openai import OpenAIEngine

api_key = "sk-..."
engine = OpenAIEngine(api_key, model="gpt-4o-mini")
ai = Kani(engine)
chat_in_terminal(ai)

kani consists of two main parts: the engine, which is the interface between kani and the language model, and the kani, which is responsible for tracking chat history, prompting the engine, and handling function calls.

In this section, we’ll look at how to initialize a Kani class and core concepts in the library.

Kani¶

Entrypoints¶

While chat_in_terminal() is helpful in development, let’s look at how to use a Kani in a larger application.

The two standard entrypoints are Kani.chat_round() and Kani.full_round(), and their _str counterparts:

async Kani.chat_round(query: str | Sequence[MessagePart | str] | None, **kwargs) → ChatMessage

Perform a single chat round (user -> model -> user, no functions allowed).

Parameters:

query – The contents of the user’s chat message. Can be None to generate a completion without a user prompt.
kwargs – Additional arguments to pass to the model engine (e.g. decoding arguments).

Returns:

The model’s reply.

async Kani.full_round(query: str | Sequence[MessagePart | str] | None, *, max_function_rounds: int = None, **kwargs) → AsyncIterable[ChatMessage]

Perform a full chat round (user -> model [-> function -> model -> …] -> user).

Yields each non-user ChatMessage created during the round. A ChatMessage will have at least one of (content, function_call).

Use this in an async for loop, like so:

async for msg in ai.full_round("How's the weather?"):
    print(msg.text)

Parameters:

query – The content of the user’s chat message. Can be None to generate a completion without a user prompt.
max_function_rounds – The maximum number of function calling rounds to perform in this round. If this number is reached, the model is allowed to generate a final response without any functions defined. Default unlimited (continues until model’s response does not contain a function call).
kwargs – Additional arguments to pass to the model engine (e.g. decoding arguments).

Important

These are asynchronous methods, which means you’ll need to be in an async context.

Web frameworks like FastAPI and Flask 2 allow your route methods to be async, meaning you can await a kani method from within your route method without getting too deep into the weeds of asyncio.

Otherwise, you can create an async context by defining an async function and using asyncio.run(). For example, here’s how you might implement a simple chat:

import asyncio
from kani import Kani
from kani.engines.openai import OpenAIEngine

api_key = "sk-..."
engine = OpenAIEngine(api_key, model="gpt-4o-mini")
ai = Kani(engine, system_prompt="You are a helpful assistant.")

# define your function normally, using `async def` instead of `def`
async def chat_with_kani():
    while True:
        user_message = input("USER: ")
        # now, you can use `await` to call kani's async methods
        message = await ai.chat_round_str(user_message)
        print("AI:", message)

# use `asyncio.run` to call your async function to start the program
asyncio.run(chat_with_kani())
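
In a larger application, it can be handy to gather every message from a full round before responding. The sketch below is a minimal illustration and not part of kani: FakeKani and FakeMessage are hypothetical stand-ins so the snippet runs without an API key, and collect_round is a helper name chosen here. It only relies on full_round() being usable in an async for loop, so it should work the same way with a real Kani.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a chat message (not kani's ChatMessage)."""
    role: str
    text: Optional[str]


class FakeKani:
    """Hypothetical stand-in that mimics the shape of Kani.full_round()."""

    async def full_round(self, query, **kwargs):
        # a canned round: function call request -> function result -> final reply
        yield FakeMessage("assistant", None)
        yield FakeMessage("function", "Sunny, 72 degrees fahrenheit.")
        yield FakeMessage("assistant", "It is sunny and 72 degrees.")


async def collect_round(ai, query):
    """Run one full round and return every yielded message as a list."""
    return [msg async for msg in ai.full_round(query)]


messages = asyncio.run(collect_round(FakeKani(), "How's the weather?"))
for msg in messages:
    print(msg.role, msg.text)
```

The last element of the list is the model's final reply; the earlier elements are the intermediate function-calling messages.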

Engines¶

Engines are responsible for interfacing with a language model.

This table lists the engines built in to kani:

Model Name	Extra	Capabilities	Engine
All OpenAI Models	`openai`	🛠️ 🖼	`kani.engines.openai.OpenAIEngine`
All Anthropic Models	`anthropic`	🛠️ 🖼	`kani.engines.anthropic.AnthropicEngine`
All Google AI Models	`google`	🛠️ 🖼	`kani.engines.google.GoogleAIEngine`
🤗 transformers[3]	`huggingface`[1]	(model-specific)	`kani.engines.huggingface.HuggingEngine`
llama.cpp[2]	`cpp`	(model-specific)	`kani.engines.llamacpp.LlamaCppEngine`
vLLM[2]	`vllm`	(model-specific)	`kani.ext.vllm.VLLMEngine`, `VLLMServerEngine`, or `VLLMOpenAIEngine`

Additional models using the classes above are also supported - see the model zoo for a more comprehensive list of models!

Legend

🛠️: Supports function calling.
🖼: Supports multimodal inputs.

Concept: Chat Messages¶

At a high level, a Kani is responsible for managing a list of ChatMessage: the chat session associated with it. You can access the chat messages through the Kani.chat_history attribute.

Each message contains the role (a ChatRole: system, assistant, user, or function) that sent the message and the content of the message. Optionally, a user message can also contain a name (for multi-user conversations), and an assistant message can contain a function_call (discussed in Function Calling).

class kani.ChatMessage(*, role: ChatRole, content: str | list[Annotated[MessagePart, SerializeAsAny()] | str] | None, name: str | None = None, tool_call_id: str | None = None, tool_calls: list[ToolCall] | None = None, is_tool_call_error: bool | None = None, extra: dict = {})

Represents a message in the chat context.

role: ChatRole: Who said the message?

content: str | list[Annotated[MessagePart, SerializeAsAny()] | str] | None: The data used to create this message. Generally, you should use text or parts instead.

property text: str | None: The content of the message, as a string. Can be None only if the message is a requested function call from the assistant. If the message consists of multiple parts, concatenates the parts.

property parts: list[MessagePart | str]

The parts of the message that make up its content. Can be an empty list only if the message is a requested function call from the assistant.

This is a read-only list; changes here will not affect the message’s content. To mutate the message content, use copy_with() and set text, parts, or content.

name: str | None: The name of the user who sent the message, if set (user/function messages only).

tool_call_id: str | None: The ID for a requested ToolCall which this message is a response to (function messages only).

tool_calls: list[ToolCall] | None: The tool calls requested by the model (assistant messages only).

is_tool_call_error: bool | None: If this is a FUNCTION message containing the results of a function call, whether the function call raised an exception.

property function_call: FunctionCall | None

If there is exactly one tool call to a function, return that tool call’s requested function.

This is mostly provided for backwards-compatibility purposes; iterating over tool_calls should be preferred.

extra: dict

Specific engines may store additional extra data in this dictionary. See an engine’s documentation for details about any extras it may store or expect.

This key will only be persisted to disk on a best-effort basis – any value that is not JSON-serializable or a Pydantic class will be cast to a repr. Upon loading, values may not retain the same type as they were saved as (Pydantic objects will be loaded as a dict).

classmethod system(content: str | Sequence[MessagePart | str], **kwargs): Create a new system message.

classmethod user(content: str | Sequence[MessagePart | str], **kwargs): Create a new user message.

classmethod assistant(content: str | Sequence[MessagePart | str] | None, **kwargs): Create a new assistant message.

classmethod function(name: str | None, content: str | Sequence[MessagePart | str], tool_call_id: str = None, **kwargs): Create a new function message.

copy_with(**new_values)

Make a shallow copy of this object, updating the passed attributes (if any) to new values.

This does not validate the updated attributes! This is mostly just a convenience wrapper around .model_copy.

Only one of (content, text, parts) may be passed and will update the other two attributes accordingly.

Only one of (tool_calls, function_call) may be passed and will update the other accordingly.

You may even modify the chat history (e.g. append or delete ChatMessages or edit a message’s content) to change the prompt at any time.

Warning

In some advanced use cases, ChatMessage.content may be a list of MessagePart or str rather than a string. ChatMessage exposes ChatMessage.text (always a string or None) and ChatMessage.parts (always a list of message parts), which we recommend using instead of ChatMessage.content. See Message Parts for more information.

These properties are dynamically generated based on the underlying content, and it is safe to mix messages with different content types in a single Kani.

>>> from kani import Kani, chat_in_terminal
>>> from kani.engines.openai import OpenAIEngine
>>> api_key = "sk-..."
>>> engine = OpenAIEngine(api_key, model="gpt-4o-mini")
>>> ai = Kani(engine, system_prompt="You are a helpful assistant.")
>>> chat_in_terminal(ai, rounds=1)
USER: Hello kani!
AI: Hello! How can I assist you today?
>>> ai.chat_history
[
    ChatMessage(role=ChatRole.USER, content="Hello kani!"),
    ChatMessage(role=ChatRole.ASSISTANT, content="Hello! How can I assist you today?"),
]
>>> await ai.get_prompt()
# The system prompt is passed to the engine, but isn't part of chat_history
# - this will be useful later in advanced use cases.
[
    ChatMessage(role=ChatRole.SYSTEM, content="You are a helpful assistant."),
    ChatMessage(role=ChatRole.USER, content="Hello kani!"),
    ChatMessage(role=ChatRole.ASSISTANT, content="Hello! How can I assist you today?"),
]
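
Because the chat history is an ordinary list, editing it needs nothing special. As a rough sketch of the kind of edit described above, the helper below removes the most recent exchange (an "undo"). Msg is a hypothetical stand-in for ChatMessage and uses plain strings for roles; with real messages you would presumably compare against ChatRole.USER instead. The helper name drop_last_exchange is also invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    """Hypothetical stand-in for a chat message with a role and text."""
    role: str
    text: str


def drop_last_exchange(history):
    """Remove trailing messages up to and including the latest user message, in place."""
    while history:
        removed = history.pop()
        if removed.role == "user":
            break
    return history


history = [
    Msg("user", "Hello kani!"),
    Msg("assistant", "Hello! How can I assist you today?"),
    Msg("user", "What's the weather in Tokyo?"),
    Msg("assistant", "I don't have access to live weather data."),
]
drop_last_exchange(history)
print([m.text for m in history])
```

After the call, only the first user message and its reply remain, so the next round starts from that earlier point in the conversation.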

Function Calling¶

Function calling gives language models the ability to choose when to call a function you provide based on its documentation.

With kani, you can write functions in Python and expose them to the model with just one line of code: the @ai_function decorator.

# import the library
import asyncio
from typing import Annotated
from kani import AIParam, Kani, ai_function, chat_in_terminal, ChatRole
from kani.engines.openai import OpenAIEngine

# set up the engine as above
api_key = "sk-..."
engine = OpenAIEngine(api_key, model="gpt-4o-mini")

# subclass Kani to add AI functions
class MyKani(Kani):
    # Adding the decorator to a method exposes it to the AI
    @ai_function()
    def get_weather(
        self,
        # and you can provide extra documentation about specific parameters
        location: Annotated[str, AIParam(desc="The city and state, e.g. San Francisco, CA")],
    ):
        """Get the current weather in a given location."""
        # In this example, we mock the return, but you could call a real weather API
        return f"Weather in {location}: Sunny, 72 degrees fahrenheit."

ai = MyKani(engine)

# the terminal utility allows you to test function calls...
chat_in_terminal(ai)

# and you can track multiple rounds programmatically.
async def main():
    async for msg in ai.full_round("What's the weather in Tokyo?"):
        print(msg.role, msg.text)

if __name__ == "__main__":
    asyncio.run(main())

kani guarantees that function calls are valid by the time they reach your methods while allowing you to focus on writing code. For more information, check out the function calling docs.
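
To build some intuition for how a function's documentation can reach the model, here is a rough sketch that turns a type-annotated Python function into a plain description using only the standard library. It is not how kani implements @ai_function: describe_function is a hypothetical helper, and a plain string stands in for AIParam.

```python
import inspect
from typing import Annotated, get_type_hints


def describe_function(func):
    """Build a dict describing func from its name, docstring, and annotations."""
    hints = get_type_hints(func, include_extras=True)
    params = {}
    for name in inspect.signature(func).parameters:
        hint = hints.get(name)
        # Annotated[T, extra] exposes T as __origin__ and the extras as __metadata__
        metadata = getattr(hint, "__metadata__", ())
        base = hint.__origin__ if metadata else hint
        params[name] = {
            "type": getattr(base, "__name__", str(base)),
            "desc": metadata[0] if metadata else None,
        }
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func),
        "parameters": params,
    }


def get_weather(
    location: Annotated[str, "The city and state, e.g. San Francisco, CA"],
):
    """Get the current weather in a given location."""
    return f"Weather in {location}: Sunny, 72 degrees fahrenheit."


schema = describe_function(get_weather)
print(schema)
```

A description like this is the sort of information a model needs in order to decide when a call is appropriate and what arguments to supply.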

Streaming¶

kani supports streaming to print tokens from the engine as they are received. Streaming is designed to be a drop-in superset of the chat_round and full_round methods, allowing you to gradually refactor your code without ever leaving it in a broken state.

To request a stream from the engine, use Kani.chat_round_stream() or Kani.full_round_stream(). These methods will return a StreamManager, which you can use in different ways to consume the stream.

The simplest way to consume the stream is to iterate over it with async for, which will yield a stream of str.

# CHAT ROUND (no function calling):
stream = ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
async for token in stream:
    print(token, end="")
msg = await stream.message()

# FULL ROUND (with function calling):
async for stream in ai.full_round_stream("What is the airspeed velocity of an unladen swallow?"):
    async for token in stream:
        print(token, end="")
    msg = await stream.message()

kani also provides a helper to print streams (kani.print_stream()):

stream = ai.chat_round_stream("What is the most interesting train line in Tokyo?")
await kani.print_stream(stream)

After a stream finishes, its contents will be available as a ChatMessage. You can retrieve the final message or BaseCompletion with:

msg = await stream.message()
completion = await stream.completion()

The final ChatMessage may contain non-yielded tokens (e.g. a request for a function call). If the final message or completion is requested before the stream is iterated over, the stream manager will consume the entire stream.

Tip

For compatibility and ease of refactoring, awaiting the stream itself will also return the message, i.e.:

msg = await ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")

(note the await that is not present in the above examples). This allows you to refactor your code by changing chat_round to chat_round_stream without other changes.

- msg = await ai.chat_round("What is the airspeed velocity of an unladen swallow?")
+ msg = await ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
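
The three ways of consuming a stream (iterating over it, asking for the final message, or awaiting the stream directly) can be modeled with a small toy class. This is only a sketch of the behavior described in this section, assuming string tokens and a string "message"; ToyStream is hypothetical and is not kani's StreamManager.

```python
import asyncio


class ToyStream:
    """Toy model of a stream that can be iterated, awaited, or asked for its message."""

    def __init__(self, tokens):
        self._source = iter(tokens)
        self._seen = []
        self._done = False

    def __aiter__(self):
        return self._iterate()

    async def _iterate(self):
        # yield each remaining token while remembering it for the final message
        for token in self._source:
            self._seen.append(token)
            yield token
        self._done = True

    async def message(self):
        # if the stream has not been fully iterated yet, consume the rest first
        if not self._done:
            async for _ in self:
                pass
        return "".join(self._seen)

    def __await__(self):
        # awaiting the stream itself is equivalent to awaiting message()
        return self.message().__await__()


async def demo():
    stream = ToyStream(["Hel", "lo", "!"])
    tokens = [token async for token in stream]
    full = await stream.message()
    direct = await ToyStream(["a", "b"])  # never iterated explicitly
    return tokens, full, direct


tokens, full, direct = asyncio.run(demo())
print(tokens, full, direct)
```

Defining __await__ in terms of message() is what lets the same object serve both the token-by-token and the drop-in usage styles.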

Multimodal Inputs¶

kani optionally supports multimodal inputs (images, audio, video) for various language models. To use multimodal inputs, install the kani-multimodal-core extension package or use pip install "kani[multimodal]". See the kani-multimodal-core documentation for more info.


from kani import Kani
from kani.engines.openai import OpenAIEngine
from kani.ext.multimodal_core import ImagePart

engine = OpenAIEngine(model="gpt-4.1-nano")
ai = Kani(engine)

# notice how the arg is a list of parts rather than a single str!
msg = await ai.chat_round_str([
    "Please describe these images:",
    ImagePart.from_file("path/to/image.png"),
    await ImagePart.from_url(
        "https://upload.wikimedia.org/wikipedia/commons/thumb/5/53/Whitehead%27s_Trogon_0A2A6014.jpg/1024px-Whitehead%27s_Trogon_0A2A6014.jpg"
    ),
])
print(msg)

Multimodal handling is deeply integrated with the rest of the kani ecosystem, so you get all the benefits of kani’s fluent tool usage and automatic context management with minimal development cost!

kani CLI¶

kani comes with a CLI for you to chat with a model in your terminal with zero setup.

The kani CLI takes the form of $ kani <provider>:<model-id>. Use kani --help for more information.

Examples:

$ kani openai:gpt-4.1-nano
$ kani huggingface:meta-llama/Meta-Llama-3-8B-Instruct
$ kani anthropic:claude-sonnet-4-0
$ kani google:gemini-2.5-flash

This CLI helper automatically creates an Engine and Kani instance, and calls chat_in_terminal() so you can test LLMs faster. When kani-multimodal-core is installed, you can provide multimodal media on your disk or on the internet to the model by prepending a path or URL with an @ symbol:

USER: Please describe this image: @path/to/image.png and also this one: @https://example.com/image.png

Few-Shot Prompting¶

Few-shot prompting (AKA in-context learning) is the idea that language models can “learn” the task the user wants to accomplish through examples provided in their prompt.

To few-shot prompt a language model with kani, you can initialize it with an existing chat history. In this example, we give the model a few-shot prompt in which it translates English to Japanese, and see that it continues to do so in the chat session despite never being explicitly prompted to do so.

>>> from kani import ChatMessage
>>> fewshot = [
...     ChatMessage.user("thank you"),
...     ChatMessage.assistant("arigato"),
...     ChatMessage.user("good morning"),
...     ChatMessage.assistant("ohayo"),
... ]
>>> ai = Kani(engine, chat_history=fewshot)
>>> chat_in_terminal(ai, rounds=1)
USER: crab
AI: kani

Tip

Passing the fewshot prompt as chat_history allows kani to manage it as normal - meaning it can slide out of the context window. For kani to always include the fewshot prompt, use always_included_messages.
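
The difference between the two options can be illustrated with a toy context window. The sketch below is a deliberate simplification and not kani's actual algorithm: messages are plain strings, the budget is counted in words, and build_prompt is a hypothetical helper. It keeps every always-included message and then as many of the most recent history messages as still fit.

```python
def build_prompt(always_included, chat_history, budget):
    """Return always_included plus the newest chat_history messages that fit the budget."""

    def cost(message):
        # toy cost: one unit per word
        return len(message.split())

    remaining = budget - sum(cost(m) for m in always_included)
    kept = []
    for message in reversed(chat_history):
        if cost(message) > remaining:
            break
        remaining -= cost(message)
        kept.append(message)
    return always_included + kept[::-1]


fewshot = ["thank you", "arigato", "good morning", "ohayo"]
chat = ["crab", "kani", "octopus"]

# few-shot examples stored in the history: the oldest ones slide out first
in_history = build_prompt([], fewshot + chat, budget=5)

# few-shot examples always included: the regular history is trimmed instead
pinned = build_prompt(fewshot, chat, budget=8)

print(in_history)
print(pinned)
```

In the first call only the last few-shot line survives, while in the second all four few-shot lines are kept and the oldest chat message is dropped instead.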

Saving & Loading Chats¶

You can save or load a kani’s chat state using Kani.save() and Kani.load(). Kani.save() dumps the state to the specified .kani or JSON file, which you can later load into another kani instance:

Kani.save(fp: str | bytes | PathLike, *, save_format: Literal['json', 'kani'] | None = None, **kwargs)

Save the chat state of this kani to a .kani file or JSON. This will overwrite the file if it exists!

Parameters:

fp – The path to the file to save.
save_format – Whether to save the chat state as a .kani file or JSON. If not set, determines format by file path extension (defaulting to .kani if uncertain).
kwargs – Additional arguments to pass to Pydantic’s model_dump_json.

Kani.load(fp: str | bytes | PathLike, **kwargs)

Load a chat state from a .kani file or JSON file into this instance. This will overwrite any existing chat state!

Parameters:

fp – The path to the file containing the chat state.
kwargs – Additional arguments to pass to Pydantic’s model_validate_json.

If you’d like more manual control over how you store chat state, there are two attributes you need to save: Kani.always_included_messages and Kani.chat_history (both lists of ChatMessage).

These are Pydantic models, which you can save and load using ChatMessage.model_dump() and ChatMessage.model_validate().

You could, for example, save the chat state to a database and load it when necessary. A common pattern is to save only the chat_history and use always_included_messages as an application-specific prompt.
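
As a minimal sketch of the database pattern, the snippet below stores a dumped chat history as JSON text in SQLite and reads it back. The table layout and the helper names (init_db, save_history, load_history) are assumptions made for this example, and plain dicts stand in for the output of ChatMessage.model_dump(); with kani you would presumably dump each message in chat_history before saving and pass each loaded dict to ChatMessage.model_validate().

```python
import json
import sqlite3


def init_db(conn):
    """Create the (hypothetical) table that holds one JSON history per session."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chats (session_id TEXT PRIMARY KEY, history TEXT NOT NULL)"
    )


def save_history(conn, session_id, dumped_messages):
    """Store a list of JSON-serializable message dicts, replacing any earlier copy."""
    conn.execute(
        "INSERT OR REPLACE INTO chats (session_id, history) VALUES (?, ?)",
        (session_id, json.dumps(dumped_messages)),
    )
    conn.commit()


def load_history(conn, session_id):
    """Return the stored message dicts, or an empty list for an unknown session."""
    row = conn.execute(
        "SELECT history FROM chats WHERE session_id = ?", (session_id,)
    ).fetchone()
    return json.loads(row[0]) if row else []


conn = sqlite3.connect(":memory:")
init_db(conn)

# stand-ins for dumped ChatMessage objects
dumped = [
    {"role": "user", "content": "Hello kani!"},
    {"role": "assistant", "content": "Hello! How can I assist you today?"},
]
save_history(conn, "session-1", dumped)
restored = load_history(conn, "session-1")
print(restored)
```

Keeping only the history in the database matches the pattern above: the application-specific prompt lives in code and is supplied again whenever a session is restored.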

Next Steps¶

In the next section, we’ll look at subclassing Kani in order to supply functions to the language model. Then, we’ll look at how you can override and/or extend the implementations of kani methods to control each part of a chat round.