API Reference

Kani

class kani.Kani(
engine: BaseEngine,
system_prompt: str = None,
always_included_messages: list[ChatMessage] = None,
desired_response_tokens: int = None,
chat_history: list[ChatMessage] = None,
functions: list[AIFunction] = None,
retry_attempts: int = 1,
)[source]

Base class for all kani.

Entrypoints

chat_round(query: str, **kwargs) -> ChatMessage

chat_round_str(query: str, **kwargs) -> str

chat_round_stream(query: str, **kwargs) -> StreamManager

full_round(query: str, **kwargs) -> AsyncIterable[ChatMessage]

full_round_str(query: str, message_formatter: Callable[[ChatMessage], str], **kwargs) -> AsyncIterable[str]

full_round_stream(query: str, **kwargs) -> AsyncIterable[StreamManager]

Function Calling

Subclass Kani and use @ai_function() to register functions. The schema will be autogenerated from the function signature (see ai_function()).

To perform a chat round with functions, use full_round() as an async iterator:

async for msg in kani.full_round(prompt):
    # responses...

Each response will be a ChatMessage.

Alternatively, you can use full_round_str() and control the format of each yielded message with message_formatter.

Retry & Model Feedback

If the model makes an error when attempting to call a function (e.g. calling a function that does not exist or passing params with incorrect and non-coercible types) or the function raises an exception, Kani will send the error in a system message to the model, allowing it up to retry_attempts to correct itself and retry the call.
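The retry behaviour described above can be pictured with a small, library-free sketch. Everything in it (call_with_retries, fake_model, the message dicts) is hypothetical and only mimics the control flow; it is not kani's implementation, and it assumes that retry_attempts counts retries after the first attempt.

```python
import json


def call_with_retries(model, functions, retry_attempts=1):
    """Ask `model` for a function call; feed errors back until it succeeds or attempts run out."""
    feedback = []  # error messages the model has been shown so far
    for _attempt in range(retry_attempts + 1):
        name, raw_args = model(feedback)
        try:
            if name not in functions:
                raise LookupError(f"no function named {name!r}")
            return functions[name](**json.loads(raw_args)), feedback
        except Exception as err:  # unknown name, bad arguments, or the function itself failed
            feedback.append({"role": "system", "content": f"{type(err).__name__}: {err}"})
    return None, feedback


def fake_model(feedback):
    # The first try misspells the function name; after seeing an error it corrects itself.
    return ("add", '{"a": 1, "b": 2}') if feedback else ("ad", '{"a": 1, "b": 2}')


result, feedback = call_with_retries(fake_model, {"add": lambda a, b: a + b}, retry_attempts=1)
print(result, len(feedback))
```

With retry_attempts=0 the same toy model would never get the chance to correct its first mistake, and the sketch returns None.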

Parameters:
  • engine – The LM engine implementation to use.

  • system_prompt – The system prompt to provide to the LM. The prompt will not be included in chat_history.

  • always_included_messages – A list of messages to always include as a prefix in all chat rounds (i.e., evict newer messages rather than these to manage context length). These will not be included in chat_history.

  • desired_response_tokens – The minimum number of tokens to reserve for the response, i.e. the smallest allowed value of (max context size - tokens in prompt). To control the maximum number of tokens generated more precisely, you may be able to configure the engine (e.g. OpenAIEngine(..., max_tokens=250)). Defaults to 10% of the engine’s context length or 8192 tokens, whichever is smaller.

  • chat_history

    The chat history to start with (not including system prompt or always included messages), for advanced use cases. By default, each kani starts with a new conversation session.

    Caution

    If you pass another kani’s chat history here without copying it, the same list will be mutated! Use chat_history=mykani.chat_history.copy() to pass a copy.

  • functions – A list of AIFunction to expose to the model (for dynamic function calling). Use ai_function() to define static functions (see Function Calling).

  • retry_attempts – How many attempts the LM may take per full round if any tool call raises an exception.
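The caution on chat_history comes down to ordinary Python list aliasing. The sketch below uses plain strings as stand-ins for ChatMessage objects, so it runs without kani installed.

```python
history = ["hello", "hi there"]  # stand-ins for ChatMessage objects

shared = history            # same list object: later appends show up in both names
snapshot = history.copy()   # shallow copy: a separate list holding the same items

shared.append("new message")

print(len(history))   # the original list grew, because `shared` is the same object
print(len(snapshot))  # the copy did not
```

Note that .copy() is shallow: the list is new, but the items inside it are still shared between the two lists.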

always_included_messages: list[ChatMessage]

Chat messages that are always included as a prefix in the model’s prompt. Includes the system message, if supplied.

chat_history: list[ChatMessage]

All messages in the current chat state, not including system or always included messages.

async chat_round(
query: str | Sequence[MessagePart | str] | None,
**kwargs,
) ChatMessage[source]

Perform a single chat round (user -> model -> user, no functions allowed).

Parameters:
  • query – The contents of the user’s chat message. Can be None to generate a completion without a user prompt.

  • kwargs – Additional arguments to pass to the model engine (e.g. decoding arguments).

Returns:

The model’s reply.

async chat_round_str(query: str | Sequence[MessagePart | str] | None, **kwargs) str[source]

Like chat_round(), but only returns the text content of the message.

chat_round_stream(
query: str | Sequence[MessagePart | str] | None,
**kwargs,
) StreamManager[source]

Returns a stream of tokens from the engine as they are generated.

To consume tokens from the stream, use it like so:

stream = ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
async for token in stream:
    print(token, end="")
msg = await stream.message()

Tip

For compatibility and ease of refactoring, awaiting the stream itself will also return the message, i.e.:

msg = await ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")

(note the await that is not present in the above examples).

The arguments are the same as chat_round().

async full_round(
query: str | Sequence[MessagePart | str] | None,
*,
max_function_rounds: int = None,
**kwargs,
) AsyncIterable[ChatMessage][source]

Perform a full chat round (user -> model [-> function -> model -> …] -> user).

Yields each non-user ChatMessage created during the round. A ChatMessage will have at least one of (content, function_call).

Use this in an async for loop, like so:

async for msg in kani.full_round("How's the weather?"):
    print(msg.text)

Parameters:
  • query – The content of the user’s chat message. Can be None to generate a completion without a user prompt.

  • max_function_rounds – The maximum number of function calling rounds to perform in this round. If this number is reached, the model is allowed to generate a final response without any functions defined. Default unlimited (continues until model’s response does not contain a function call).

  • kwargs – Additional arguments to pass to the model engine (e.g. decoding arguments).
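As a rough illustration of the loop described above (user -> model [-> function -> model -> …] -> user), here is a synchronous toy version. The message dicts, toy_model, and the "call" key are all hypothetical stand-ins, and the real method is an async generator over ChatMessage objects.

```python
def full_round(model, functions, query, max_function_rounds=None):
    """Yield every non-user message produced while answering `query`."""
    history = [{"role": "user", "content": query}]
    rounds = 0
    while True:
        # Once the limit is reached, the model has to answer without functions.
        can_call = max_function_rounds is None or rounds < max_function_rounds
        reply = model(history, can_call)
        history.append(reply)
        yield reply
        call = reply.get("call")
        if call is None:
            return  # plain text reply: control goes back to the user
        result = {"role": "function", "content": str(functions[call]())}
        history.append(result)
        yield result
        rounds += 1


def toy_model(history, can_call):
    # Calls `get_time` once (if allowed), then answers using the function result.
    if can_call and history[-1]["role"] == "user":
        return {"role": "assistant", "content": None, "call": "get_time"}
    return {"role": "assistant", "content": f"Done after {len(history)} messages.", "call": None}


tools = {"get_time": lambda: "12:00"}
roles = [m["role"] for m in full_round(toy_model, tools, "What time is it?")]
print(roles)
```

Passing max_function_rounds=0 to this sketch makes the toy model answer immediately, which mirrors the "final response without any functions defined" case.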

async full_round_str(
query: str | Sequence[MessagePart | str] | None,
message_formatter: Callable[[ChatMessage], str | None] = assistant_message_contents,
*,
max_function_rounds: int = None,
**kwargs,
) AsyncIterable[str][source]

Like full_round(), but each yielded element is a str rather than a ChatMessage.

Parameters:
  • query – The content of the user’s chat message.

  • message_formatter – A function that returns a string to yield for each message. By default, full_round_str yields the content of each assistant message.

  • max_function_rounds – The maximum number of function calling rounds to perform in this round. If this number is reached, the model is allowed to generate a final response without any functions defined. Default unlimited (continues until model’s response does not contain a function call).

  • kwargs – Additional arguments to pass to the model engine (e.g. hyperparameters).

async full_round_stream(
query: str | Sequence[MessagePart | str] | None,
*,
max_function_rounds: int = None,
**kwargs,
) AsyncIterable[StreamManager][source]

Perform a full chat round (user -> model [-> function -> model -> …] -> user).

Yields a stream of tokens for each non-user ChatMessage created during the round.

To consume tokens from each stream, use it like so:

async for stream in ai.full_round_stream("What is the airspeed velocity of an unladen swallow?"):
    async for token in stream:
        print(token, end="")
    msg = await stream.message()

Each StreamManager object yielded by this method contains a StreamManager.role attribute that can be used to determine if a message is from the engine or a function call. This attribute will be available before iterating over the stream.

The arguments are the same as full_round().

async prompt_token_len(
messages: list[ChatMessage],
functions: list[AIFunction] | None = None,
**kwargs,
)[source]

Returns the number of tokens used by the given prompt (i.e., list of messages and functions).

In general, this is preferred over message_token_len().

async get_model_completion(include_functions: bool = True, **kwargs) BaseCompletion[source]

Get the model’s completion with the current chat state.

Compared to chat_round() and full_round(), this lower-level method does not save the model’s reply to the chat history or mutate the chat state; it is intended to help with logging or to repeat a call multiple times.

Parameters:
  • include_functions – Whether to pass this kani’s function definitions to the engine.

  • kwargs – Arguments to pass to the model engine.

async get_model_stream(
include_functions: bool = True,
**kwargs,
) AsyncIterable[str | BaseCompletion][source]

Get the model’s completion with the current chat state as a stream. This is a low-level method like get_model_completion() but for streams.

async get_prompt(include_functions=True, **kwargs) list[ChatMessage][source]

Called each time before asking the LM engine for a completion to generate the chat prompt. Returns a list of messages such that the total token count in the messages is less than (self.max_context_size - self.desired_response_tokens).

Always includes the system prompt plus any always_included_messages at the start of the prompt.

You may override this to get more fine-grained control over what is exposed in the model’s memory at any given call.

Parameters:
  • include_functions – Whether to account for the tokens that will be used for function definitions in the context length.

  • kwargs – Additional arguments that were passed to the model engine from chat_round() or full_round() (e.g. decoding arguments).
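The token budget described for get_prompt() can be sketched without kani. The function names below are hypothetical, string length stands in for a token count, and dropping the oldest messages first is an assumption about the default strategy rather than something this page states.

```python
def default_response_tokens(max_context_size):
    # Assumed reading of the documented default: 10% of the context, capped at 8192.
    return min(max_context_size // 10, 8192)


def build_prompt(always_included, chat_history, max_context_size, desired_response_tokens, token_len=len):
    """Return the fixed prefix plus the newest messages that fit in the token budget."""
    budget = max_context_size - desired_response_tokens
    used = sum(token_len(m) for m in always_included)
    kept = []
    # Walk backwards from the newest message so that older messages are dropped first.
    for message in reversed(chat_history):
        cost = token_len(message)
        if used + cost >= budget:  # the total must stay strictly below the budget
            break
        kept.append(message)
        used += cost
    return list(always_included) + kept[::-1]


prompt = build_prompt(["sys"], ["aaaa", "bbbb", "cccc"], max_context_size=20, desired_response_tokens=8)
print(prompt)
```

An override of get_prompt() could apply a different policy (for example, summarizing old messages) as long as the result still fits in the same budget.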

get_enabled_functions() list[AIFunction][source]

Get the list of currently enabled AIFunctions. By default this returns all AIFunctions in self.functions where AIFunction.enabled is truthy.

async do_function_call(call: FunctionCall, tool_call_id: str = None) FunctionCallResult[source]

Resolve a single function call.

By default, any exception raised from this method will be an instance of a FunctionCallException.

You may implement an override to add instrumentation around function calls (e.g. tracking success counts for varying prompts). See Handle a Function Call.

Parameters:
  • call – The name of the function to call and arguments to call it with.

  • tool_call_id – The tool_call_id to set in the returned FUNCTION message.

Returns:

A FunctionCallResult including whose turn it is next and the message with the result of the function call.

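Raises:
  • NoSuchFunction – The model attempted to call a function that does not exist.

  • WrappedCallException – The @ai_function raised an exception.
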
async handle_function_call_exception(
call: FunctionCall,
err: FunctionCallException,
attempt: int,
tool_call_id: str = None,
) ExceptionHandleResult[source]

Called when a function call raises an exception.

By default, returns a message telling the LM about the error and allows a retry if the error is recoverable and there are remaining retry attempts.

You may implement an override to customize the error prompt, log the error, or use custom retry logic. See Handle a Function Call Exception.

Parameters:
  • call – The FunctionCall the model was attempting to make.

  • err – The error the call raised. Usually this is NoSuchFunction or WrappedCallException, although it may be any exception raised by do_function_call().

  • attempt – The attempt number for the current call (0-indexed).

  • tool_call_id – The tool_call_id to set in the returned FUNCTION message.

Returns:

An ExceptionHandleResult detailing whether the model should retry and the message to add to the chat history.

async add_completion_to_history(completion: BaseCompletion)[source]

Add the message in the given completion to the chat history and return it.

You might want to override this to log token counts. By default, this calls add_to_history().

This method differs from add_to_history() in that it is only called on model completions (stream and non-stream) rather than on each message, and takes a BaseCompletion as input.

async add_to_history(message: ChatMessage)[source]

Add the given message to the chat history.

You might want to override this to log messages to an external store or to control how messages are saved to the chat session’s memory. By default, this appends to chat_history.

save(fp: str | bytes | PathLike, *, save_format: Literal['json', 'kani'] | None = None, **kwargs)[source]

Save the chat state of this kani to a .kani file or JSON. This will overwrite the file if it exists!

Parameters:
  • fp – The path to the file to save.

  • save_format – Whether to save the chat state as a .kani file or JSON. If not set, determines format by file path extension (defaulting to .kani if uncertain).

  • kwargs – Additional arguments to pass to Pydantic’s model_dump_json.
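One plausible reading of the save_format rule is sketched below. infer_save_format is a hypothetical helper, not part of kani, and treating the extension case-insensitively is an assumption.

```python
import os
from pathlib import Path


def infer_save_format(fp, save_format=None):
    """Pick "json" or "kani": an explicit choice wins, then the extension, then "kani"."""
    if save_format is not None:
        return save_format
    # os.fsdecode accepts str, bytes, and os.PathLike, matching the type of `fp`.
    suffix = Path(os.fsdecode(fp)).suffix.lower()
    return "json" if suffix == ".json" else "kani"


print(infer_save_format("chat.json"))
print(infer_save_format(b"chat.kani"))
print(infer_save_format("chat.txt"))
```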

load(fp: str | bytes | PathLike, **kwargs)[source]

Load a chat state from a .kani file or JSON file into this instance. This will overwrite any existing chat state!

Parameters:
  • fp – The path to the file containing the chat state.

  • kwargs – Additional arguments to pass to Pydantic’s model_validate_json.

property always_len: int

Returns the number of tokens that will always be reserved.

(e.g. for system prompts, always included messages, the engine, and the response).

message_token_len(message: ChatMessage)[source]

Returns the estimated number of tokens used by a single given message.

Deprecated since version 1.7.0: Use prompt_token_len() instead.

Note

The token count returned by this may not exactly reflect the actual token count (e.g., due to prompt formatting or not having access to the tokenizer). It should, however, be a safe overestimate to use as an upper bound.

Warning

This method may not be available for all models (e.g., models which do not expose a local tokenization method and require API calls to count tokens, or models enforcing strict constraints on prompt formats). Use prompt_token_len() instead.

Common Models

class kani.ChatRole(value)[source]

Represents who said a chat message.

SYSTEM = 'system'

The message is from the system (usually a steering prompt).

USER = 'user'

The message is from the user.

ASSISTANT = 'assistant'

The message is from the language model.

FUNCTION = 'function'

The message is the result of a function call.

class kani.FunctionCall(*, name: str, arguments: str)[source]

Represents a model’s request to call a single function.

name: str

The name of the requested function.

arguments: str

The arguments to call it with, encoded in JSON.

property kwargs: dict[str, Any]

The arguments to call the function with, as a Python dictionary.

classmethod with_args(__name: str, /, **kwargs)[source]

Create a function call with the given arguments (e.g. for few-shot prompting).
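The relationship between arguments (a JSON string), kwargs (a dictionary), and with_args() can be mimicked with a small dataclass. The Call class below is a hypothetical stand-in written for illustration; it is not kani.FunctionCall.

```python
import json
from dataclasses import dataclass


@dataclass
class Call:
    name: str
    arguments: str  # a JSON-encoded object, as the model would produce it

    @property
    def kwargs(self):
        # Decode the JSON arguments into a Python dictionary.
        return json.loads(self.arguments)

    @classmethod
    def with_args(cls, name, /, **kwargs):
        # Encode keyword arguments to JSON, e.g. to hand-write few-shot examples.
        return cls(name=name, arguments=json.dumps(kwargs))


call = Call.with_args("get_weather", location="Tokyo", unit="celsius")
print(call.arguments)
print(call.kwargs["location"])
```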

class kani.ToolCall(*, id: str, type: str, function: FunctionCall)[source]

Represents a model’s request to call a tool with a unique request ID.

See Internal Representation for more information about tool calls vs function calls.

id: str

The request ID created by the engine. This should be passed back to the engine in ChatMessage.tool_call_id in order to associate a FUNCTION message with this request.

type: str

The type of tool requested (currently only “function”).

function: FunctionCall

The requested function call.

classmethod from_function(__name: str, /, *, call_id_: str = None, **kwargs)[source]

Create a tool call request for a function with the given name and arguments.

Parameters:

call_id_ – The ID to assign to the request. If not passed, generates a random ID.

classmethod from_function_call(call: FunctionCall, call_id_: str = None)[source]

Create a tool call request from an existing FunctionCall.

Parameters:

call_id_ – The ID to assign to the request. If not passed, generates a random ID.

class kani.MessagePart(*, extra: dict = {})[source]

Base class for a part of a message.

Engines should inherit from this class to tag substrings with metadata or provide multimodality to an engine. By default, if coerced to a string, will raise a warning noting that rich message part data was lost. For more information see Message Parts.

__str__()[source]

Used to define the fallback behaviour when a part is serialized to a string (e.g. via ChatMessage.text). Override this to specify the canonical string representation of your message part.

Engines that support message parts should generally not use this, preferring to iterate over ChatMessage.parts instead.

extra: dict

Specific engines may store additional extra data in this dictionary. See an engine’s documentation for details about any extras it may store or expect.

This key will only be persisted to disk on a best-effort basis – any value that is not JSON-serializable or a Pydantic class will be cast to a repr. Upon loading, values may not retain the same type as they were saved as (Pydantic objects will be loaded as a dict).
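To see what "best-effort" persistence can look like in practice, the sketch below uses only the standard library: values that JSON cannot represent fall back to their repr and come back as plain strings. This illustrates the general idea and is not kani's actual serialization code.

```python
import json
from fractions import Fraction

extra = {"score": 0.5, "ratio": Fraction(1, 3)}

# Values json cannot encode natively are passed to `default`, here repr().
saved = json.dumps(extra, default=repr)
loaded = json.loads(saved)

print(loaded["score"])        # still a float
print(type(loaded["ratio"]))  # now a str holding the repr, not a Fraction
```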

class kani.ChatMessage(
*,
role: ChatRole,
content: str | list[Annotated[MessagePart, SerializeAsAny()] | str] | None,
name: str | None = None,
tool_call_id: str | None = None,
tool_calls: list[ToolCall] | None = None,
is_tool_call_error: bool | None = None,
extra: dict = {},
)[source]

Represents a message in the chat context.

role: ChatRole

Who said the message?

content: str | list[Annotated[MessagePart, SerializeAsAny()] | str] | None

The data used to create this message. Generally, you should use text or parts instead.

property text: str | None

The content of the message, as a string. Can be None only if the message is a requested function call from the assistant. If the message is composed of multiple parts, concatenates the parts.

property parts: list[MessagePart | str]

The parts of the message that make up its content. Can be an empty list only if the message is a requested function call from the assistant.

This is a read-only list; changes here will not affect the message’s content. To mutate the message content, use copy_with() and set text, parts, or content.

name: str | None

The name of the user who sent the message, if set (user/function messages only).

tool_call_id: str | None

The ID for a requested ToolCall which this message is a response to (function messages only).

tool_calls: list[ToolCall] | None

The tool calls requested by the model (assistant messages only).

is_tool_call_error: bool | None

If this is a FUNCTION message containing the results of a function call, whether the function call raised an exception.

property function_call: FunctionCall | None

If there is exactly one tool call to a function, return that tool call’s requested function.

This is mostly provided for backwards-compatibility purposes; iterating over tool_calls should be preferred.

extra: dict

Specific engines may store additional extra data in this dictionary. See an engine’s documentation for details about any extras it may store or expect.

This key will only be persisted to disk on a best-effort basis – any value that is not JSON-serializable or a Pydantic class will be cast to a repr. Upon loading, values may not retain the same type as they were saved as (Pydantic objects will be loaded as a dict).

classmethod system(content: str | Sequence[MessagePart | str], **kwargs)[source]

Create a new system message.

classmethod user(content: str | Sequence[MessagePart | str], **kwargs)[source]

Create a new user message.

classmethod assistant(content: str | Sequence[MessagePart | str] | None, **kwargs)[source]

Create a new assistant message.

classmethod function(
name: str | None,
content: str | Sequence[MessagePart | str],
tool_call_id: str = None,
**kwargs,
)[source]

Create a new function message.

copy_with(**new_values)[source]

Make a shallow copy of this object, updating the passed attributes (if any) to new values.

This does not validate the updated attributes! This is mostly just a convenience wrapper around .model_copy.

Only one of (content, text, parts) may be passed and will update the other two attributes accordingly.

Only one of (tool_calls, function_call) may be passed and will update the other accordingly.

AI Function

kani.ai_function(
func=None,
*,
after: ChatRole = ChatRole.ASSISTANT,
name: str | None = None,
desc: str | None = None,
auto_retry: bool = True,
json_schema: dict | None = None,
auto_truncate: int | None = None,
enabled: bool = True,
)[source]

Decorator to mark a method of a Kani to expose to the AI.

Parameters:
  • after – Who should speak next after the function call completes (see Next Actor). Defaults to the model.

  • name – The name of the function (defaults to the name of the function in source code).

  • desc – The function’s description (defaults to the function’s docstring).

  • auto_retry – Whether the model should retry calling the function if it gets it wrong (see Retry & Model Feedback).

  • json_schema – A JSON Schema document describing the function’s parameters. By default, kani will automatically generate one, but this can be helpful for overriding it in any tricky cases.

  • auto_truncate

    If a function response is longer than this many characters, truncate it until it is at most this many characters and add “…” to the end. By default, no responses will be truncated. This uses a paragraph-aware truncation algorithm.

    Changed in version 1.7.0: This parameter now truncates to a certain number of characters, rather than tokens, since it is not possible to reliably determine the token count of a message out of prompt context for all engines.

  • enabled – Whether the function should be included in the prompt passed to the model. Disabled functions will still be executed if the model generates a call to them despite not being passed to the model.
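The auto_truncate description mentions a paragraph-aware algorithm without spelling it out. The sketch below is one hypothetical way to do it (prefer a paragraph break, then a line break, then a space) and should not be read as kani's actual algorithm.

```python
def truncate(text, max_len, ellipsis="…"):
    """Cut `text` to at most `max_len` characters, preferring paragraph, then line, then word boundaries."""
    if len(text) <= max_len:
        return text
    room = max_len - len(ellipsis)
    if room <= 0:
        return ellipsis[:max_len]
    head = text[:room]
    for boundary in ("\n\n", "\n", " "):
        cut = head.rfind(boundary)
        if cut > 0:
            head = head[:cut]
            break
    return head.rstrip() + ellipsis


text = "First paragraph.\n\nSecond paragraph that is long."
short = truncate(text, 30)
print(repr(short))
```

Because the cut lands on the paragraph break, the result keeps the whole first paragraph and drops the partial second one.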

class kani.AIFunction(
inner,
after: ChatRole = ChatRole.ASSISTANT,
name: str | None = None,
desc: str | None = None,
auto_retry: bool = True,
json_schema: dict | None = None,
auto_truncate: int | None = None,
enabled: bool = True,
)[source]

Wrapper around a function to expose to a language model.

Parameters:
  • inner – The function implementation.

  • after – Who should speak next after the function call completes (see Next Actor). Defaults to the model.

  • name – The name of the function (defaults to the name of the function in source code).

  • desc – The function’s description (defaults to the function’s docstring).

  • auto_retry – Whether the model should retry calling the function if it gets it wrong (see Retry & Model Feedback).

  • json_schema – A JSON Schema document describing the function’s parameters. By default, kani will automatically generate one, but this can be helpful for overriding it in any tricky cases.

  • auto_truncate

    If a function response is longer than this many characters, truncate it until it is at most this many characters and add “…” to the end. By default, no responses will be truncated. This uses a paragraph-aware truncation algorithm.

    Changed in version 1.7.0: This parameter now truncates to a certain number of characters, rather than tokens, since it is not possible to reliably determine the token count of a message out of prompt context for all engines.

  • enabled – Whether the function should be included in the prompt passed to the model. Disabled functions will still be executed if the model generates a call to them despite not being passed to the model.

create_json_schema(include_desc=True) dict[source]

Create a JSON schema representing this function’s parameters as a JSON object.

Parameters:

include_desc – Whether to include the AIFunction’s description in the generated JSON schema.

class kani.AIParam(desc: str, *, title: str = None)[source]

Special tag to annotate types with in order to provide parameter-level metadata to kani.

Parameters:
  • desc – The description of the parameter.

  • title – If set, set the title of this parameter in generated JSON schema to this; otherwise omit the title (as it is already the key of the parameter in the schema).
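Parameter-level metadata of this kind rides on typing.Annotated. The sketch below shows the general mechanism with a hypothetical Param tag standing in for AIParam, so it runs without kani; how kani itself reads the annotation is not shown on this page.

```python
from typing import Annotated, get_type_hints


class Param:
    """Hypothetical stand-in for a parameter-description tag."""

    def __init__(self, desc):
        self.desc = desc


def get_weather(location: Annotated[str, Param("City name, e.g. Tokyo")], days: int = 1):
    ...


def param_descriptions(func):
    """Collect the description attached to each annotated parameter."""
    hints = get_type_hints(func, include_extras=True)  # keep the Annotated metadata
    found = {}
    for name, hint in hints.items():
        for meta in getattr(hint, "__metadata__", ()):
            if isinstance(meta, Param):
                found[name] = meta.desc
    return found


descriptions = param_descriptions(get_weather)
print(descriptions)
```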

Common MessageParts

class kani.parts.ReasoningPart(*, extra: dict = {}, content: str)[source]

A long chain of thought (CoT) that should not be shown to the user (e.g. GPT-OSS, Anthropic extended thinking, DeepSeek R1).

When using a low-level text engine (e.g., HuggingEngine), these parts will not be automatically extracted. Use a parser instead (e.g., GPTOSSParser for GPT-OSS).

content: str

The reasoning content.

Exceptions

exception kani.exceptions.KaniException[source]

Base class for all Kani exceptions/errors.

exception kani.exceptions.PromptTooLong[source]

A given prompt was too long to tokenize or generate a completion for.

exception kani.exceptions.MessageTooLong[source]

This chat message will never fit in the context window.

exception kani.exceptions.FunctionCallException(retry: bool)[source]

Base class for exceptions that occur when a model calls an @ai_function.

exception kani.exceptions.WrappedCallException(retry, original)[source]

The @ai_function raised an exception.

exception kani.exceptions.NoSuchFunction(name)[source]

The model attempted to call a function that does not exist.

exception kani.exceptions.FunctionSpecError[source]

This @ai_function spec is invalid.

exception kani.exceptions.MissingModelDependencies[source]

You are trying to use an engine but do not have engine-specific packages installed.

exception kani.exceptions.PromptError[source]

For some reason, the input to this model is invalid.

exception kani.exceptions.MissingMessagePartType(fqn: str, msg: str)[source]

While loading a saved kani, a message part was found with a type that is not currently defined in the runtime.

Parameters:

fqn – The fully qualified name of the type that is missing.

Streaming

class kani.streaming.StreamManager(
stream_iter: AsyncIterable[str | BaseCompletion],
role: ChatRole,
*,
after=None,
lock: Lock = None,
)[source]

This class is responsible for managing a stream returned by an engine. It should not be constructed manually.

To consume tokens from a stream, use this class like so:

# CHAT ROUND:
stream = ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
async for token in stream:
    print(token, end="")
msg = await stream.message()

# FULL ROUND:
async for stream in ai.full_round_stream("What is the airspeed velocity of an unladen swallow?"):
    async for token in stream:
        print(token, end="")
    msg = await stream.message()

After a stream finishes, its contents will be available as a ChatMessage. You can retrieve the final message or BaseCompletion with:

msg = await stream.message()
completion = await stream.completion()

The final ChatMessage may contain non-yielded tokens (e.g. a request for a function call). If the final message or completion is requested before the stream is iterated over, the stream manager will consume the entire stream.
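An object that can be both iterated for tokens and awaited for the final message can be built from __aiter__ and __await__. ToyStream below is a hypothetical, much-simplified stand-in that only demonstrates this dual interface; it is not how StreamManager is implemented.

```python
import asyncio


class ToyStream:
    """Hypothetical stand-in: iterate for tokens, or await for the whole message."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._pos = 0  # index of the next token that has not been yielded yet

    async def __aiter__(self):
        while self._pos < len(self._tokens):
            await asyncio.sleep(0)  # pretend to wait on the engine
            token = self._tokens[self._pos]
            self._pos += 1
            yield token

    async def message(self):
        # If the stream was not (fully) iterated yet, consume the rest first.
        async for _ in self:
            pass
        return "".join(self._tokens)

    def __await__(self):
        # Awaiting the stream itself is the same as awaiting message().
        return self.message().__await__()


async def main():
    streamed = [token async for token in ToyStream(["Hel", "lo"])]
    whole = await ToyStream(["Hel", "lo"])
    return streamed, whole


streamed, whole = asyncio.run(main())
print(streamed, whole)
```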

Tip

For compatibility and ease of refactoring, awaiting the stream itself will also return the message, i.e.:

msg = await ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")

(note the await that is not present in the above examples).

Parameters:
  • stream_iter – The async iterable that generates elements of the stream.

  • role – The role of the message that will be returned eventually.

  • after – A coroutine function to call with the generated completion as its argument after the stream is fully consumed.

  • lock – A lock to hold for the duration of the stream run.

__await__()[source]

Awaiting the StreamManager is equivalent to awaiting message().

__aiter__() AsyncIterable[str][source]

Iterate over tokens yielded from the engine.

role

The role of the message that this stream will return.

async completion() BaseCompletion[source]

Get the final BaseCompletion generated by the model.

async message() ChatMessage[source]

Get the final ChatMessage generated by the model.

Prompting

This submodule contains utilities to transform a list of Kani ChatMessage into low-level formats to be consumed by an engine (e.g. str, list[dict], or torch.Tensor).

class kani.PromptPipeline(steps: list[PipelineStep] = None)[source]

This class creates a reproducible pipeline for translating a list of ChatMessage into an engine-specific format using fluent-style chaining.

To build a pipeline, create an instance of PromptPipeline() and add steps by calling the step methods documented below. Most pipelines will end with a call to one of the terminals, which translates the intermediate form into the desired output format.

Usage

To use the pipeline, call the created pipeline object with a list of kani chat messages.

To inspect the inputs/outputs of your pipeline, you can use explain() to print a detailed explanation of the pipeline and multiple examples (selected based on the pipeline steps).

Example

Here’s an example using the PromptPipeline to build a LLaMA 2 chat-style prompt:

from kani import PromptPipeline, ChatRole

pipe = (
    PromptPipeline()

    # System messages should be wrapped with this tag. We'll translate them to USER
    # messages since a system and user message go together in a single [INST] pair.
    .wrap(role=ChatRole.SYSTEM, prefix="<<SYS>>\n", suffix="\n<</SYS>>\n")
    .translate_role(role=ChatRole.SYSTEM, to=ChatRole.USER)

    # If we see two consecutive USER messages, merge them together into one with a
    # newline in between.
    .merge_consecutive(role=ChatRole.USER, sep="\n")
    # Similarly for ASSISTANT, but with a space (kani automatically strips whitespace from the ends of
    # generations).
    .merge_consecutive(role=ChatRole.ASSISTANT, sep=" ")

    # Finally, wrap USER and ASSISTANT messages in the instruction tokens. If our
    # message list ends with an ASSISTANT message, don't add the EOS token
    # (we want the model to continue the generation).
    .conversation_fmt(
        user_prefix="<s>[INST] ",
        user_suffix=" [/INST]",
        assistant_prefix=" ",
        assistant_suffix=" </s>",
        assistant_suffix_if_last="",
    )
)

# We can see what this pipeline does by calling explain()...
pipe.explain()

# And use it in our engine to build a string prompt for the LLM.
prompt = pipe(await ai.get_prompt())

__call__(
msgs: list[ChatMessage],
functions: list[AIFunction] = None,
**kwargs,
) T[source]

Apply the pipeline to a list of kani messages. The return type will vary based on the steps in the pipeline; if no steps are defined the return type will be a copy of the input messages.

translate_role(
*,
to: ChatRole,
warn: str = None,
role: ChatRole | Collection[ChatRole] = None,
predicate: Callable[[ChatMessage], bool] = None,
) Self[source]

Change the role of the matching messages. (e.g. for models which do not support native function calling, make all FUNCTION messages a USER message)

Parameters:
  • to – The new role to translate the matching messages to.

  • warn – A warning to emit if any messages are translated (e.g. if a model does not support certain roles).

  • role – The role (if a single role is given) or roles (if a list is given) to apply this operation to. If not set, ignores the role of the message.

  • predicate – A function that takes a ChatMessage and returns a boolean specifying whether to operate on this message or not.

If multiple filter params are supplied, this method will only operate on messages that match ALL of the filters.

wrap(
*,
prefix: str = None,
suffix: str = None,
role: ChatRole | Collection[ChatRole] = None,
predicate: Callable[[ChatMessage], bool] = None,
) Self[source]

Wrap the matching messages with a given string prefix and/or suffix.

For more fine-grained control over user/assistant message pairs as the last step in a pipeline, use conversation_fmt() instead.

Parameters:
  • prefix – The prefix to add before each matching message, if any.

  • suffix – The suffix to add after each matching message, if any.

  • role – The role (if a single role is given) or roles (if a list is given) to apply this operation to. If not set, ignores the role of the message.

  • predicate – A function that takes a ChatMessage and returns a boolean specifying whether to operate on this message or not.

If multiple filter params are supplied, this method will only operate on messages that match ALL of the filters.

merge_consecutive(
*,
sep: str = None,
joiner: Callable[[list[ChatMessage]], str | list[MessagePart | str] | None] = None,
out_role: ChatRole = None,
role: ChatRole | Collection[ChatRole] = None,
predicate: Callable[[ChatMessage], bool] = None,
) Self[source]

If multiple messages that match are found consecutively, merge them by either joining their contents with a string or calling a joiner function.

Caution

If multiple roles are specified, this method will merge them as a group (e.g. if role=(USER, ASSISTANT), a USER message followed by an ASSISTANT message will be merged together into one with a role of out_role).

Similarly, if a predicate is specified, this method will merge all consecutive messages which match the given predicate.

Parameters:
  • sep – The string to add between each matching message. Mutually exclusive with joiner. If this is set, this is roughly equivalent to joiner=lambda msgs: sep.join(m.text for m in msgs).

  • joiner – A function that will take a list of all messages in a consecutive group and return the final string. Mutually exclusive with sep.

  • out_role – The role of the merged message to use. This is required if multiple roles are specified or role is not set; otherwise it defaults to the common role of the merged messages.

  • role – The role (if a single role is given) or roles (if a list is given) to apply this operation to. If not set, ignores the role of the message.

  • predicate – A function that takes a ChatMessage and returns a boolean specifying whether to operate on this message or not.

If multiple filter params are supplied, this method will only operate on messages that match ALL of the filters.
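The grouping behaviour described in the caution above can be sketched as follows. This is an assumption-laden illustration, not kani's code: Msg and this merge_consecutive are hypothetical, only sep (not joiner or predicate) is modelled, and a lone matching message is simply left unchanged here.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Msg:
    """Hypothetical stand-in for ChatMessage."""
    role: str
    text: str


def merge_consecutive(msgs, *, sep, role, out_role=None):
    """Merge runs of consecutive messages whose role is in `role`."""
    roles = {role} if isinstance(role, str) else set(role)
    if out_role is None and len(roles) != 1:
        raise ValueError("out_role is required when multiple roles are given")
    merged = []
    # groupby yields maximal runs of messages that all match (or all do not match).
    for is_match, group in groupby(msgs, key=lambda m: m.role in roles):
        group = list(group)
        if is_match and len(group) > 1:
            merged.append(Msg(out_role or group[0].role, sep.join(m.text for m in group)))
        else:
            merged.extend(group)
    return merged


msgs = [Msg("user", "a"), Msg("user", "b"), Msg("assistant", "c"), Msg("user", "d")]
only_users = merge_consecutive(msgs, sep="\n", role="user")
as_group = merge_consecutive(msgs, sep=" ", role=("user", "assistant"), out_role="user")
print(only_users)
print(as_group)
```

With a single role, only the two adjacent USER messages are merged; with both roles given, the whole conversation collapses into one message with the role out_role.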

function_call_fmt(
func: Callable[[ToolCall], str | None],
*,
prefix: str = '\n',
sep: str = '',
suffix: str = '',
) -> Self[source]

For each message with one or more requested tool calls, call the provided function on each requested tool call and append it to the message’s content.

Parameters:
  • func – A function taking a ToolCall and returning a string to append to the content of the message containing the requested call, or None to ignore the tool call.

  • prefix – If at least one tool call is formatted, a prefix to insert after the message’s contents and before the formatted string.

  • sep – If two or more tool calls are formatted, the string to insert between them.

  • suffix – If at least one tool call is formatted, a suffix to insert after the formatted string.
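How prefix, sep, and suffix combine can be sketched in plain Python. Call and append_tool_calls below are hypothetical names used only for illustration (Call stands in for ToolCall), so treat this as a reading of the parameter descriptions above rather than the library's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Hypothetical stand-in for ToolCall."""
    name: str
    arguments: str


def append_tool_calls(text, calls, func, *, prefix="\n", sep="", suffix=""):
    """Append the formatted tool calls to the message text."""
    # Calls for which func returns None are ignored entirely.
    formatted = [s for s in (func(c) for c in calls) if s is not None]
    if not formatted:
        return text  # nothing was formatted: no prefix or suffix is added
    return f"{text}{prefix}{sep.join(formatted)}{suffix}"


def fmt(call):
    """Example formatter that skips a call named 'noop'."""
    return None if call.name == "noop" else f"{call.name}({call.arguments})"


one = append_tool_calls("Let me check.", [Call("get_weather", '{"city": "Tokyo"}'), Call("noop", "{}")], fmt)
two = append_tool_calls("Hi", [Call("a", "{}"), Call("b", "{}")], fmt, prefix=" [", sep=", ", suffix="]")
none = append_tool_calls("Hi", [Call("noop", "{}")], fmt)
print(one, two, none, sep="\n")
```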

remove(
*,
role: ChatRole | Collection[ChatRole] = None,
predicate: Callable[[ChatMessage], bool] = None,
) -> Self[source]

Remove all messages that match the filters from the output.

Parameters:
  • role – The role (if a single role is given) or roles (if a list is given) to apply this operation to. If not set, ignores the role of the message.

  • predicate – A function that takes a ChatMessage and returns a boolean specifying whether to operate on this message or not.

If multiple filter params are supplied, this method will only operate on messages that match ALL of the filters.

ensure_start(
*,
role: ChatRole | Collection[ChatRole] = None,
predicate: Callable[[ChatMessage], bool] = None,
) -> Self[source]

Ensure that the output starts with a message with the given role by removing all messages from the start that do NOT match the given filters, such that the first message in the output matches.

This should NOT be used to ensure that a system prompt is passed; the intent of this step is to prevent an orphaned FUNCTION result or ASSISTANT reply after earlier messages were context-managed out.

Parameters:
  • role – The role (if a single role is given) or roles (if a list is given) to apply this operation to. If not set, ignores the role of the message.

  • predicate – A function that takes a ChatMessage and returns a boolean specifying whether to operate on this message or not.

If multiple filter params are supplied, this method will only operate on messages that match ALL of the filters.
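The trimming behaviour can be sketched with itertools.dropwhile. Msg and this ensure_start are hypothetical stand-ins that only filter by a single role; the sketch shows the intended effect on an orphaned FUNCTION result, not kani's implementation.

```python
from dataclasses import dataclass
from itertools import dropwhile


@dataclass(frozen=True)
class Msg:
    """Hypothetical stand-in for ChatMessage."""
    role: str
    text: str


def ensure_start(msgs, *, role):
    """Drop messages from the start until the first one with the given role."""
    return list(dropwhile(lambda m: m.role != role, msgs))


# The request that produced the first FUNCTION result was already managed out of context.
history = [
    Msg("function", "4"),
    Msg("assistant", "The answer is 4."),
    Msg("user", "Thanks!"),
    Msg("assistant", "You're welcome."),
]
trimmed = ensure_start(history, role="user")
print([m.role for m in trimmed])
```

Only the leading messages are removed; later FUNCTION or ASSISTANT messages are unaffected.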

ensure_bound_function_calls(id_translator: Callable[[str], str] = None) -> Self[source]

Ensure that each FUNCTION message is preceded by an ASSISTANT message requesting it, and that each FUNCTION message’s tool_call_id matches the request. If a FUNCTION message has no tool_call_id (e.g. a few-shot prompt), bind it to a preceding ASSISTANT message if it is unambiguous.

Will remove hanging FUNCTION messages (i.e. messages where the corresponding request was managed out of the model’s context) from the beginning of the prompt if necessary.

Parameters:
  • id_translator – A function that takes a function ID (usually a UUID4 string) and returns a translated ID. Used for engines that require the function_call_id to be in particular formats (e.g., Mistral).

Raises:

PromptError – if it is impossible to bind each function call to a request unambiguously.
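An id_translator is just a Callable[[str], str]. The sketch below shows one possible translator; short_alnum_id is a hypothetical helper, and the target format (9 alphanumeric characters) is an assumption chosen for illustration, so check the requirements of the engine you are targeting.

```python
import hashlib
import uuid


def short_alnum_id(function_id: str, length: int = 9) -> str:
    """Map an arbitrary ID to a short alphanumeric one.

    Hashing keeps the mapping deterministic, so a request and its matching
    result translate to the same ID.
    """
    return hashlib.sha256(function_id.encode("utf-8")).hexdigest()[:length]


original = str(uuid.uuid4())
translated = short_alnum_id(original)
print(original, "->", translated)
```

A function like this could then be passed as ensure_bound_function_calls(id_translator=short_alnum_id).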

apply(
func: Callable[[ChatMessage], ApplyResultT] | Callable[[ChatMessage, ApplyContext], ApplyResultT],
*,
role: ChatRole | Collection[ChatRole] = None,
predicate: Callable[[ChatMessage], bool] = None,
) -> PromptPipeline[list[ApplyResultT]][source]

Apply the given function to all matched messages. Replace the message with the function’s return value.

The function may take 1-2 positional parameters: the first will always be the matched message at the current pipeline step, and the second will be the context this operation is occurring in (an ApplyContext).

Parameters:
  • func – A function that takes 1-2 positional parameters (msg, ctx) that will be called on each matching message. If this function does not return a ChatMessage, it should be the last step in the pipeline. If this function returns None, the input message will be removed from the output.

  • role – The role (if a single role is given) or roles (if a list is given) to apply this operation to. If not set, ignores the role of the message.

  • predicate – A function that takes a ChatMessage and returns a boolean specifying whether to operate on this message or not.

If multiple filter params are supplied, this method will only operate on messages that match ALL of the filters.
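The calling convention (one or two positional parameters, with None removing the message) can be sketched as below. Msg, Ctx, and this apply are hypothetical stand-ins, and inspecting the signature to decide whether to pass the context is an assumption of this sketch, not a statement about how kani does it.

```python
import inspect
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Msg:
    """Hypothetical stand-in for ChatMessage."""
    role: str
    text: str


@dataclass(frozen=True)
class Ctx:
    """Hypothetical stand-in for ApplyContext (only a subset of its fields)."""
    msg: Msg
    is_last: bool
    idx: int


def apply(msgs, func):
    """Call func on each message; drop messages for which it returns None."""
    wants_ctx = len(inspect.signature(func).parameters) >= 2
    out = []
    for idx, msg in enumerate(msgs):
        if wants_ctx:
            result = func(msg, Ctx(msg, idx == len(msgs) - 1, idx))
        else:
            result = func(msg)
        if result is not None:
            out.append(result)
    return out


def drop_empty(msg):
    """One-parameter form: remove messages without text."""
    return msg if msg.text else None


def mark_last(msg, ctx):
    """Two-parameter form: use the context to tag the final message."""
    return replace(msg, text=msg.text + " <END>") if ctx.is_last else msg


msgs = [Msg("user", "Hi"), Msg("assistant", ""), Msg("user", "Bye")]
cleaned = apply(msgs, drop_empty)
marked = apply(cleaned, mark_last)
print([m.text for m in marked])
```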

macro_apply(
func: Callable[[list[ChatMessage], list[AIFunction]], list[MacroApplyResultT]],
) -> PromptPipeline[list[MacroApplyResultT]][source]

Apply the given function to the list of all messages in the pipeline. This step can effectively be used to create an ad-hoc step.

The function must take 2 positional parameters: the first is the list of messages, and the second is the list of available functions.

Parameters:
  • func – A function that takes 2 positional parameters (messages, functions) that will be called on the list of messages. If this function does not return a list[ChatMessage], it should be the last step in the pipeline.

conversation_fmt(
*,
prefix: str = '',
sep: str = '',
suffix: str = '',
generation_suffix: str = '',
user_prefix: str = '',
user_suffix: str = '',
assistant_prefix: str = '',
assistant_suffix: str = '',
assistant_suffix_if_last: str = None,
system_prefix: str = '',
system_suffix: str = '',
function_prefix: str = None,
function_suffix: str = None,
) -> PromptPipeline[str][source]

Takes in the list of messages and joins them into a single conversation-formatted string by:

  • wrapping messages with the defined prefixes/suffixes by role

  • joining the messages’ contents with the defined sep

  • adding a generation suffix, if necessary.

This method should be the last step in a pipeline and will cause the pipeline to return a str.

Parameters:
  • prefix – A string to insert once before the rest of the prompt, unconditionally.

  • sep – A string to insert between messages, if any. Similar to sep.join(...).

  • suffix – A string to insert once after the rest of the prompt, unconditionally.

  • generation_suffix – A string to add to the end of the prompt to prompt the model to begin its turn.

  • user_prefix – A prefix to add before each USER message.

  • user_suffix – A suffix to add after each USER message.

  • assistant_prefix – A prefix to add before each ASSISTANT message.

  • assistant_suffix – A suffix to add after each ASSISTANT message.

  • assistant_suffix_if_last – If not None and the prompt ends with an ASSISTANT message, this string will be added to the end of the prompt instead of the assistant_suffix + generation_suffix. This is intended to allow consecutive ASSISTANT messages to continue generation from an unfinished prior message.

  • system_prefix – A prefix to add before each SYSTEM message.

  • system_suffix – A suffix to add after each SYSTEM message.

  • function_prefix – A prefix to add before each FUNCTION message.

  • function_suffix – A suffix to add after each FUNCTION message.
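The interaction between the per-role wrappers, sep, generation_suffix, and assistant_suffix_if_last can be sketched as follows. This is a simplified model under stated assumptions: Msg and this conversation_fmt are hypothetical, the per-role prefixes and suffixes are passed as one wrappers dict for brevity, and the global prefix and suffix parameters are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    """Hypothetical stand-in for ChatMessage."""
    role: str
    text: str


def conversation_fmt(msgs, *, wrappers, sep="", generation_suffix="", assistant_suffix_if_last=None):
    """Join messages into one prompt string; wrappers maps role -> (prefix, suffix)."""
    # If the prompt ends with an ASSISTANT message, the model continues that message.
    continuing = (
        assistant_suffix_if_last is not None and bool(msgs) and msgs[-1].role == "assistant"
    )
    parts = []
    for i, m in enumerate(msgs):
        pre, post = wrappers.get(m.role, ("", ""))
        if continuing and i == len(msgs) - 1:
            post = assistant_suffix_if_last  # replaces assistant_suffix + generation_suffix
        parts.append(f"{pre}{m.text}{post}")
    body = sep.join(parts)
    return body if continuing else body + generation_suffix


wrappers = {"user": ("User: ", "\n"), "assistant": ("Assistant: ", "\n")}
new_turn = conversation_fmt([Msg("user", "Hi")], wrappers=wrappers, generation_suffix="Assistant: ")
resumed = conversation_fmt(
    [Msg("user", "Hi"), Msg("assistant", "Hel")],
    wrappers=wrappers,
    generation_suffix="Assistant: ",
    assistant_suffix_if_last="",
)
print(repr(new_turn))
print(repr(resumed))
```

In the second call the prompt stops right after the unfinished assistant text, so the model can continue it instead of starting a new turn.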

conversation_dict(
*,
system_role: str = 'system',
user_role: str = 'user',
assistant_role: str = 'assistant',
function_role: str = 'tool',
content_transform: Callable[[ChatMessage], Any] = lambda msg: ...,
additional_keys: Callable[[ChatMessage], dict] = lambda msg: ...,
) -> PromptPipeline[list[dict[str, Any]]][source]

Takes in the list of messages and returns a list of dictionaries with (“role”, “content”) keys.

By default, the “role” key will be “system”, “user”, “assistant”, or “tool” unless the respective role override is specified.

By default, the “content” key will be message.text unless the content_transform argument is specified.

This method should be the last step in a pipeline and will cause the pipeline to return a list[dict].

Caution

By default, this step will truncate tool calling metadata! Use additional_keys to provide tool call requests on ASSISTANT messages and additional metadata like tool call IDs on FUNCTION messages.

Parameters:
  • system_role – The role to give to SYSTEM messages (default “system”).

  • user_role – The role to give to USER messages (default “user”).

  • assistant_role – The role to give to ASSISTANT messages (default “assistant”).

  • function_role – The role to give to FUNCTION messages (default “tool”).

  • content_transform – A function taking in the message and returning the contents of the “content” key (defaults to msg.text).

  • additional_keys – A function taking in the message and returning a dictionary containing any additional keys to add to the message’s dict.
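The caution about truncated tool calling metadata is easier to see with a small model of the output. Msg and this conversation_dict are hypothetical stand-ins, and tool_call_id is stored directly on the stand-in message for simplicity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Msg:
    """Hypothetical stand-in for ChatMessage."""
    role: str
    text: str
    tool_call_id: Optional[str] = None


DEFAULT_ROLES = {"system": "system", "user": "user", "assistant": "assistant", "function": "tool"}


def conversation_dict(msgs, *, content_transform=lambda m: m.text, additional_keys=lambda m: {}):
    """Turn messages into dicts with "role" and "content" keys plus any extra keys."""
    return [
        {"role": DEFAULT_ROLES[m.role], "content": content_transform(m), **additional_keys(m)}
        for m in msgs
    ]


msgs = [Msg("user", "What is 2 + 2?"), Msg("function", "4", tool_call_id="call_1")]

# Without additional_keys, the tool call ID is silently dropped.
plain = conversation_dict(msgs)

# With additional_keys, FUNCTION messages keep their ID.
with_ids = conversation_dict(
    msgs,
    additional_keys=lambda m: {"tool_call_id": m.tool_call_id} if m.role == "function" else {},
)
print(plain)
print(with_ids)
```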

execute(
msgs: list[ChatMessage],
functions: list[AIFunction] = None,
*,
deepcopy=False,
for_measurement=False,
) -> T[source]

Apply the pipeline to a list of kani messages. The return type will vary based on the steps in the pipeline; if no steps are defined, the return value will be a copy of the input messages.

This lower-level method offers more fine-grained control over the steps that are run (e.g. to measure the length of a single message).

Parameters:
  • msgs – The messages to apply the pipeline to.

  • functions – Any functions available to the model.

  • deepcopy – Whether to deep-copy each message before running the pipeline.

  • for_measurement – If the pipeline is being run to measure the length of a single message. In this case, any ensure_start steps will be ignored, and the returned message may not be a valid prompt - the only guarantee is on the length.

explain(
example: list[ChatMessage] = None,
functions: list[AIFunction] = None,
*,
all_cases=False,
**kwargs,
)[source]

Print out a summary of the pipeline and an example conversation transformation based on the steps in the pipeline.

Caution

This method will run the pipeline on an example constructed based on the steps in this pipeline. You may encounter unexpected side effects if your pipeline uses apply() with a function with side effects.

class kani.prompts.PipelineStep[source]

The base class for all pipeline steps.

If needed, you can subclass this and manually add steps to a PromptPipeline, but this is generally not necessary (consider using PromptPipeline.apply() instead).

execute(msgs: list[ChatMessage], functions: list[AIFunction])[source]

Apply this step’s effects on the pipeline.

explain() -> str[source]

Return a string explaining what this step does.

explain_example_kwargs() -> dict[str, bool][source]

Return a dict of kwargs to pass to examples.build_conversation to ensure relevant examples are included.

class kani.prompts.ApplyContext(
msg: ChatMessage,
is_last: bool,
idx: int,
messages: list[ChatMessage],
functions: list[AIFunction],
)[source]

Context about where a message lives in the pipeline for an arbitrary Apply operation.

msg: ChatMessage

The message being operated on.

is_last: bool

Whether the message being operated on is the last message (of all types) in the chat prompt.

idx: int

The index of the message in the chat prompt.

messages: list[ChatMessage]

The list of all messages in the chat prompt.

functions: list[AIFunction]

The list of functions available in the chat prompt.

property is_last_of_type: bool

Whether this message is the last one of its role in the chat prompt.
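The difference between is_last and is_last_of_type can be sketched with a small stand-in. Msg and Ctx are hypothetical, and the property below is one straightforward way to compute the documented meaning, not necessarily how kani computes it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Msg:
    """Hypothetical stand-in for ChatMessage."""
    role: str
    text: str


@dataclass(frozen=True)
class Ctx:
    """Hypothetical stand-in for ApplyContext (functions omitted)."""
    msg: Msg
    is_last: bool
    idx: int
    messages: List[Msg]

    @property
    def is_last_of_type(self) -> bool:
        # True if no later message in the prompt shares this message's role.
        return not any(m.role == self.msg.role for m in self.messages[self.idx + 1:])


msgs = [Msg("user", "a"), Msg("assistant", "b"), Msg("user", "c"), Msg("function", "d")]
contexts = [Ctx(m, i == len(msgs) - 1, i, msgs) for i, m in enumerate(msgs)]
print([(c.is_last, c.is_last_of_type) for c in contexts])
```

The ASSISTANT message at index 1 is not the last message in the prompt, but it is the last message of its role.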

Internals

class kani.FunctionCallResult(is_model_turn: bool, message: ChatMessage)[source]

A model requested a function call, and the kani runtime resolved it.

Parameters:
  • is_model_turn – True if the model should immediately react; False if the user speaks next.

  • message – The message containing the result of the function call, to add to the chat history.

class kani.ExceptionHandleResult(should_retry: bool, message: ChatMessage)[source]

A function call raised an exception, and the kani runtime has prompted the model with exception information.

Parameters:
  • should_retry – Whether the model should be allowed to retry the call that caused this exception.

  • message – The message containing details about the exception and/or instructions to retry, to add to the chat history.

Engines

See Engine Reference.

Utilities

kani.chat_in_terminal(
kani: Kani,
*,
rounds: int = 0,
stopword: str = None,
echo: bool = False,
ai_first: bool = False,
width: int = None,
show_function_args: bool = False,
show_function_returns: bool = False,
verbose: bool = False,
stream: bool = True,
)[source]

Chat with a kani right in your terminal.

Useful for playing with kani, quick prompt engineering, or demoing the library.

If the environment variable KANI_DEBUG is set, debug logging will be enabled.

If kani-multimodal-core is installed, you can send multimodal media to a compatible engine with a file path or URL after an @ symbol (e.g. “Describe this image: @image.png”). Use quotes (e.g. @"path/to/my image.png") for paths with spaces in their names.

Warning

This function is only a development utility and should not be used in production.

Parameters:
  • rounds (int) – The number of chat rounds to play (defaults to 0 for infinite).

  • stopword (str) – Break out of the chat loop if the user sends this message.

  • echo (bool) – Whether to echo the user’s input to stdout after they send a message (e.g. to save in interactive notebook outputs; default false).

  • ai_first (bool) – Whether the user should send the first message (default) or the model should generate a completion before prompting the user for a message.

  • width (int) – The maximum width of the printed outputs (default unlimited).

  • show_function_args (bool) – Whether to print the arguments the model is calling functions with for each call (default false).

  • show_function_returns (bool) – Whether to print the results of each function call (default false).

  • verbose (bool) – Equivalent to setting echo, show_function_args, and show_function_returns to True.

  • stream (bool) – Whether or not to print tokens as soon as they are generated by the model (default true).

async kani.chat_in_terminal_async(
kani: Kani,
*,
rounds: int = 0,
stopword: str = None,
echo: bool = False,
ai_first: bool = False,
width: int = None,
show_function_args: bool = False,
show_function_returns: bool = False,
verbose: bool = False,
stream: bool = True,
)[source]

Async version of chat_in_terminal(). Use in environments where an asyncio loop is already running (e.g. Google Colab).

kani.format_width(msg: str, width: int = None, prefix: str = '') -> str[source]

Format the given message such that the width of each line is less than width. If prefix and width are provided, indents each line after the first by the length of the prefix.
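The hanging-indent behaviour can be approximated with the standard library's textwrap module. wrap_with_prefix is a hypothetical helper, and whether the prefix itself appears in the output is an assumption of this sketch. Note also that textwrap.fill collapses existing newlines in the message and limits lines to at most width characters.

```python
import textwrap


def wrap_with_prefix(msg, width=None, prefix=""):
    """Wrap msg to `width`, indenting continuation lines by len(prefix)."""
    if width is None:
        return prefix + msg  # no width limit: nothing to wrap
    return textwrap.fill(
        msg,
        width=width,
        initial_indent=prefix,
        subsequent_indent=" " * len(prefix),
    )


wrapped = wrap_with_prefix("The quick brown fox jumps over the lazy dog", width=20, prefix="AI: ")
print(wrapped)
```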

async kani.format_stream(stream: StreamManager, width: int = None, prefix: str = '') -> AsyncIterable[str][source]

Yield formatted tokens from a stream such that if concatenated, the width of each line is less than width. If prefix and width are provided, indents each line after the first by the length of the prefix.

kani.print_width(msg: str, width: int = None, prefix: str = '')[source]

Print the given message such that the width of each line is less than width. If prefix and width are provided, indents each line after the first by the length of the prefix.

async kani.print_stream(stream: StreamManager, width: int = None, prefix: str = '')[source]

Print tokens from a stream to the terminal, with the width of each line less than width. If prefix and width are provided, indents each line after the first by the length of the prefix.

This is a helper function intended to be used with Kani.chat_round_stream() or Kani.full_round_stream().

Message Formatters

A couple of convenience formatters to customize Kani.full_round_str().

You can pass any of these functions in with, e.g., Kani.full_round_str(..., message_formatter=all_message_contents).

kani.utils.message_formatters.all_message_contents(msg: ChatMessage)[source]

Return the content of any message.

kani.utils.message_formatters.assistant_message_contents(msg: ChatMessage, show_reasoning=True, color=True)[source]

Return the content of any assistant message; otherwise don’t return anything.

Parameters:
  • show_reasoning – If True, include any ReasoningParts in the output.

  • color – If True, the returned reasoning parts will be surrounded in ANSI codes to make it appear gray.

kani.utils.message_formatters.assistant_message_contents_thinking(msg: ChatMessage, show_args=False, show_reasoning=True, color=True)[source]

Return the content of any assistant message, and “Thinking…” on function calls.

You can use this in full_round_str by using a partial, e.g.: ai.full_round_str(..., message_formatter=functools.partial(assistant_message_contents_thinking, show_args=True))

Parameters:
  • show_args – If True, include the arguments to each function call.

  • show_reasoning – If True, include any ReasoningParts in the output.

  • color – If True, the returned reasoning parts will be surrounded in ANSI codes to make it appear gray.

kani.utils.message_formatters.assistant_message_thinking(msg: ChatMessage, show_args=False)[source]

Return “Thinking…” on assistant messages with function calls, ignoring any content.

This is useful if you are streaming the message’s contents.

If show_args is True, include the arguments to each function call.
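A custom formatter follows the same shape as the ones above: a function that takes a message and returns a string (or nothing). The sketch below is self-contained and does not import kani. Msg is a hypothetical stand-in for ChatMessage, thinking_formatter is a made-up formatter, and the final list comprehension assumes that empty results are skipped when messages are rendered.

```python
from dataclasses import dataclass
from functools import partial


@dataclass(frozen=True)
class Msg:
    """Hypothetical stand-in for ChatMessage; tool_calls holds (name, arguments) pairs."""
    role: str
    text: str
    tool_calls: tuple = ()


def thinking_formatter(msg, show_args=False):
    """Show assistant text, or a placeholder when the assistant requests tool calls."""
    if msg.role != "assistant":
        return None
    if msg.tool_calls:
        if show_args:
            calls = ", ".join(f"{name}({args})" for name, args in msg.tool_calls)
            return f"Thinking... [{calls}]"
        return "Thinking..."
    return msg.text


round_msgs = [
    Msg("assistant", "", (("get_weather", '{"city": "Tokyo"}'),)),
    Msg("function", "Sunny"),
    Msg("assistant", "It is sunny in Tokyo."),
]
# functools.partial fixes the keyword argument, leaving a one-argument formatter.
formatter = partial(thinking_formatter, show_args=True)
lines = [s for s in map(formatter, round_msgs) if s]
print(lines)
```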

Model-Specific Parsers

Model parsers are used when you have an LLM’s text output, which may contain tool calls or other interleaved content in their raw format (e.g., reasoning output). They translate the raw text format into Kani’s tool calling specification and MessageParts.

Model parsers are WrapperEngines: to use one, wrap the text-only engine (e.g., a HuggingEngine) like so:

from kani.engines.huggingface import HuggingEngine
from kani.model_specific.gpt_oss import GPTOSSParser

model = HuggingEngine("openai/gpt-oss-20b")
engine = GPTOSSParser(model)

class kani.model_specific.BaseParser(
*args,
tool_call_start_token: str | None = None,
tool_call_end_token: str | None = None,
reasoning_start_token: str | None = None,
reasoning_end_token: str | None = None,
reasoning_always_at_start=False,
show_reasoning_in_stream=False,
reasoning_in_stream_color=True,
**kwargs,
)[source]

Abstract base class for model-specific tool call/reasoning parsers.

To implement your own tool call/reasoning parser, subclass this class and:

  • implement parse_tool_calls(self, content: str) -> tuple[str, list[ToolCall]]

  • implement parse_reasoning(self, content: str) -> list[MessagePart]

  • pass default values of tool_call_start_token, tool_call_end_token, reasoning_start_token, and reasoning_end_token to super().__init__(...)

This class will handle calling the parser and interrupting streams when tool calls/reasoning are detected.

Parameters:
  • tool_call_start_token – The token used to delimit the start of a tool call. Used to determine when to buffer streams.

  • tool_call_end_token – The token used to delimit the end of a tool call. Used to determine when to yield streams.

  • reasoning_start_token – The token used to delimit the start of a reasoning segment. Used to determine when to buffer streams.

  • reasoning_end_token – The token used to delimit the end of a reasoning segment. Used to determine when to buffer streams and implement default reasoning parsing behaviour.

  • reasoning_always_at_start – Whether the model’s response always starts with reasoning and should be buffered until the reasoning_end_token is seen while streaming.

  • show_reasoning_in_stream – Whether reasoning tokens should be yielded during streams. By default, only non-reasoning tokens will be yielded, and reasoning tokens will be included in a ReasoningPart in the final ChatMessage. This does not change the final returned ChatMessage, it only affects streamed tokens.

  • reasoning_in_stream_color – If True, wraps yielded reasoning tokens in an ANSI color code to make them appear gray when printed in a terminal. Only takes effect when show_reasoning_in_stream=True.

parse_tool_calls(content: str) -> tuple[str, list[ToolCall]][source]

Given the string completion of the model, return the content without tool calls and the parsed tool calls.

parse_reasoning(content: str) -> str | list[MessagePart | str][source]

Given the string completion of the model (after parsing tool calls), return the content with reasoning transformed to ReasoningParts.
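As an illustration of the kind of work a reasoning parser has to do, the sketch below separates delimited reasoning segments from the visible text. split_reasoning is a hypothetical helper that returns plain strings; a real parse_reasoning() implementation would return ReasoningParts instead, and the delimiter tokens here are only example values.

```python
import re


def split_reasoning(content, start="<think>", end="</think>"):
    """Return (reasoning_segments, visible_text) for a raw completion string."""
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)
    # Collect the text inside each delimited segment...
    reasoning = [segment.strip() for segment in pattern.findall(content)]
    # ...and remove the segments (delimiters included) from the visible output.
    visible = pattern.sub("", content).strip()
    return reasoning, visible


reasoning, visible = split_reasoning("<think>2 + 2 is 4.</think>The answer is 4.")
print(reasoning, visible)
```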

parse_completion(completion: BaseCompletion) -> BaseCompletion[source]

Single-step parsing, if you prefer handling it all in one place. By default, calls parse_tool_calls() and parse_reasoning().

async predict(messages, functions=None, **hyperparams) -> BaseCompletion[source]

Given the current context of messages and available functions, get the next predicted chat message from the LM.

Parameters:
  • messages – The messages in the current chat context. prompt_len(messages, functions) is guaranteed to be less than max_context_size.

  • functions – The functions the LM is allowed to call.

  • hyperparams – Any additional parameters to pass to the engine.

async stream(messages, functions=None, **hyperparams)[source]

Optional: Stream a completion from the engine, token-by-token.

This method’s signature is the same as BaseEngine.predict().

This method should yield strings as an asynchronous iterable.

Optionally, this method may also yield a BaseCompletion. If it does, it MUST be the last item yielded by this method.

If an engine does not implement streaming, this method will yield the entire text of the completion in a single chunk by default.

Parameters:
  • messages – The messages in the current chat context. prompt_len(messages, functions) is guaranteed to be less than max_context_size.

  • functions – The functions the LM is allowed to call.

  • hyperparams – Any additional parameters to pass to the engine.

class kani.model_specific.gpt_oss.GPTOSSParser(*args, **kwargs)[source]

Automatically handles the parsing of GPT-OSS reasoning segments and tool calls.

Reasoning segments are returned as ReasoningParts.

Parameters:
  • tool_call_start_token – The token used to delimit the start of a tool call. Used to determine when to buffer streams.

  • tool_call_end_token – The token used to delimit the end of a tool call. Used to determine when to yield streams.

  • reasoning_start_token – The token used to delimit the start of a reasoning segment. Used to determine when to buffer streams.

  • reasoning_end_token – The token used to delimit the end of a reasoning segment. Used to determine when to buffer streams and implement default reasoning parsing behaviour.

  • reasoning_always_at_start – Whether the model’s response always starts with reasoning and should be buffered until the reasoning_end_token is seen while streaming.

  • show_reasoning_in_stream – Whether reasoning tokens should be yielded during streams. By default, only non-reasoning tokens will be yielded, and reasoning tokens will be included in a ReasoningPart in the final ChatMessage. This does not change the final returned ChatMessage, it only affects streamed tokens.

  • reasoning_in_stream_color – If True, wraps yielded reasoning tokens in an ANSI color code to make them appear gray when printed in a terminal. Only takes effect when show_reasoning_in_stream=True.

class kani.model_specific.json.NaiveJSONToolCallParser(*args, **kwargs)[source]

If the model’s output contains only valid JSON of form:

{
    "name": "function_name",
    "parameters": {
        "key": "value..."
    }
}

then assume it is a function call. Otherwise, return the content unchanged.
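The detection rule can be sketched with the standard json module. parse_naive_tool_call is a hypothetical function, and two details are assumptions of this sketch: that exactly the keys "name" and "parameters" must be present, and that the remaining content becomes an empty string when a call is detected.

```python
import json


def parse_naive_tool_call(content: str):
    """Return (remaining_content, call), where call is (name, parameters) or None."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        return content, None  # not JSON at all: leave the content unchanged
    is_call = (
        isinstance(data, dict)
        and set(data) == {"name", "parameters"}
        and isinstance(data["name"], str)
        and isinstance(data["parameters"], dict)
    )
    if not is_call:
        return content, None  # valid JSON, but not in the expected form
    return "", (data["name"], data["parameters"])


text, call = parse_naive_tool_call('{"name": "get_weather", "parameters": {"city": "Tokyo"}}')
plain_text, no_call = parse_naive_tool_call("The weather is sunny.")
print(call, repr(plain_text))
```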

Parameters:
  • tool_call_start_token – The token used to delimit the start of a tool call. Used to determine when to buffer streams.

  • tool_call_end_token – The token used to delimit the end of a tool call. Used to determine when to yield streams.

  • reasoning_start_token – The token used to delimit the start of a reasoning segment. Used to determine when to buffer streams.

  • reasoning_end_token – The token used to delimit the end of a reasoning segment. Used to determine when to buffer streams and implement default reasoning parsing behaviour.

  • reasoning_always_at_start – Whether the model’s response always starts with reasoning and should be buffered until the reasoning_end_token is seen while streaming.

  • show_reasoning_in_stream – Whether reasoning tokens should be yielded during streams. By default, only non-reasoning tokens will be yielded, and reasoning tokens will be included in a ReasoningPart in the final ChatMessage. This does not change the final returned ChatMessage, it only affects streamed tokens.

  • reasoning_in_stream_color – If True, wraps yielded reasoning tokens in an ANSI color code to make them appear gray when printed in a terminal. Only takes effect when show_reasoning_in_stream=True.

class kani.model_specific.mistral.MistralToolCallParser(*args, tool_call_start_token: str = '[TOOL_CALLS]', tool_call_end_token: str = '</s>', **kwargs)[source]

Tool calling adapter for Mistral models using the v3 or v7 tokenizer:

--- v3 ---
mistral-tiny-2407
open-mixtral-8x22b-2404
mistral-small-2409
mistral-large-2407
codestral-2405
codestral-mamba-2407
--- v7 ---
mistral-large-2411

Parameters:
  • tool_call_start_token – The token used to delimit the start of a tool call. Used to determine when to buffer streams.

  • tool_call_end_token – The token used to delimit the end of a tool call. Used to determine when to yield streams.

  • reasoning_start_token – The token used to delimit the start of a reasoning segment. Used to determine when to buffer streams.

  • reasoning_end_token – The token used to delimit the end of a reasoning segment. Used to determine when to buffer streams and implement default reasoning parsing behaviour.

  • reasoning_always_at_start – Whether the model’s response always starts with reasoning and should be buffered until the reasoning_end_token is seen while streaming.

  • show_reasoning_in_stream – Whether reasoning tokens should be yielded during streams. By default, only non-reasoning tokens will be yielded, and reasoning tokens will be included in a ReasoningPart in the final ChatMessage. This does not change the final returned ChatMessage, it only affects streamed tokens.

  • reasoning_in_stream_color – If True, wraps yielded reasoning tokens in an ANSI color code to make them appear gray when printed in a terminal. Only takes effect when show_reasoning_in_stream=True.

class kani.model_specific.deepseek.DeepSeekR1Parser(
*args,
tool_call_start_token: str = '<|tool▁calls▁begin|>',
tool_call_end_token: str = '<|tool▁outputs▁end|>',
reasoning_start_token: str | None = '<think>',
reasoning_end_token: str | None = '</think>',
reasoning_always_at_start=True,
**kwargs,
)[source]

Tool calling adapter for DeepSeek models using the R1 tool call format:

deepseek-ai/DeepSeek-R1
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
deepseek-ai/DeepSeek-R1-Distill-Llama-8B
deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
deepseek-ai/DeepSeek-R1-Distill-Llama-70B

Reasoning segments are returned as ReasoningParts.

Parameters:
  • tool_call_start_token – The token used to delimit the start of a tool call. Used to determine when to buffer streams.

  • tool_call_end_token – The token used to delimit the end of a tool call. Used to determine when to yield streams.

  • reasoning_start_token – The token used to delimit the start of a reasoning segment. Used to determine when to buffer streams.

  • reasoning_end_token – The token used to delimit the end of a reasoning segment. Used to determine when to buffer streams and implement default reasoning parsing behaviour.

  • reasoning_always_at_start – Whether the model’s response always starts with reasoning and should be buffered until the reasoning_end_token is seen while streaming.

  • show_reasoning_in_stream – Whether reasoning tokens should be yielded during streams. By default, only non-reasoning tokens will be yielded, and reasoning tokens will be included in a ReasoningPart in the final ChatMessage. This does not change the final returned ChatMessage, it only affects streamed tokens.

  • reasoning_in_stream_color – If True, wraps yielded reasoning tokens in an ANSI color code to make them appear gray when printed in a terminal. Only takes effect when show_reasoning_in_stream=True.

class kani.model_specific.qwen3.Qwen3Parser(
*args,
tool_call_start_token='<tool_call>',
tool_call_end_token='</tool_call>',
reasoning_start_token='<think>',
reasoning_end_token='</think>',
reasoning_always_at_start=False,
**kwargs,
)[source]

Tool calling + reasoning adapter for Qwen3 models:

Qwen/Qwen3-*

Reasoning segments are returned as ReasoningParts.

Parameters:
  • tool_call_start_token – The token used to delimit the start of a tool call. Used to determine when to buffer streams.

  • tool_call_end_token – The token used to delimit the end of a tool call. Used to determine when to yield streams.

  • reasoning_start_token – The token used to delimit the start of a reasoning segment. Used to determine when to buffer streams.

  • reasoning_end_token – The token used to delimit the end of a reasoning segment. Used to determine when to buffer streams and implement default reasoning parsing behaviour.

  • reasoning_always_at_start – Whether the model’s response always starts with reasoning and should be buffered until the reasoning_end_token is seen while streaming.

  • show_reasoning_in_stream – Whether reasoning tokens should be yielded during streams. By default, only non-reasoning tokens will be yielded, and reasoning tokens will be included in a ReasoningPart in the final ChatMessage. This does not change the final returned ChatMessage, it only affects streamed tokens.

  • reasoning_in_stream_color – If True, wraps yielded reasoning tokens in an ANSI color code to make them appear gray when printed in a terminal. Only takes effect when show_reasoning_in_stream=True.

Saving/Loading

kani.utils.saveload.get_ctx(info) -> KaniZipSaveContext | None[source]

Get the KaniZipSaveContext from a SerializationInfo/ValidationInfo object.

class kani.utils.saveload.KaniZipSaveContext(zf: zipfile.ZipFile)[source]

save_bytes(data: bytes, suffix: str = '') -> str[source]

Save the given bytes to the zip file and return its path. Filename is automatically determined by SHA256 hash. If suffix is given, the filename will end with the given suffix.

load_bytes(fp: str) -> bytes[source]

Read the bytes from the given path in the archive.
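The idea of hash-named entries in a zip archive can be sketched with the standard library. The two functions below are hypothetical free-function versions of the methods above; the exact archive layout kani uses (for example, any subdirectory) is not specified here, so storing entries at the archive root is an assumption.

```python
import hashlib
import io
import zipfile


def save_bytes(zf: zipfile.ZipFile, data: bytes, suffix: str = "") -> str:
    """Store data under a name derived from its SHA256 hash and return that name."""
    name = hashlib.sha256(data).hexdigest() + suffix
    if name not in zf.namelist():  # identical content is only written once
        zf.writestr(name, data)
    return name


def load_bytes(zf: zipfile.ZipFile, fp: str) -> bytes:
    """Read the bytes stored at the given path in the archive."""
    return zf.read(fp)


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    path = save_bytes(zf, b"hello", ".bin")
    same = save_bytes(zf, b"hello", ".bin")  # same content, same path

with zipfile.ZipFile(buf) as zf:
    restored = load_bytes(zf, path)
    entries = zf.namelist()

print(path, restored)
```

Because the filename is determined by the content, saving the same bytes twice yields the same path and a single archive entry.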