API Reference
Kani
- class kani.Kani(engine: BaseEngine, system_prompt: str = None, always_included_messages: list[ChatMessage] = None, desired_response_tokens: int = None, chat_history: list[ChatMessage] = None, functions: list[AIFunction] = None, retry_attempts: int = 1)
Base class for all kani.
Entrypoints
chat_round(query: str, **kwargs) -> ChatMessage
chat_round_str(query: str, **kwargs) -> str
chat_round_stream(query: str, **kwargs) -> StreamManager
full_round(query: str, **kwargs) -> AsyncIterable[ChatMessage]
full_round_str(query: str, message_formatter: Callable[[ChatMessage], str], **kwargs) -> AsyncIterable[str]
full_round_stream(query: str, **kwargs) -> AsyncIterable[StreamManager]
Function Calling
Subclass and use @ai_function() to register functions. The schema will be autogenerated from the function signature (see ai_function()).
To perform a chat round with functions, use full_round() as an async iterator:
    async for msg in kani.full_round(prompt):
        print(msg.text)  # handle each response
Each response will be a ChatMessage.
Alternatively, you can use full_round_str() and control the format of each yielded message with message_formatter.
Retry & Model Feedback
If the model makes an error when attempting to call a function (e.g. calling a function that does not exist or passing params with incorrect and non-coercible types) or the function raises an exception, Kani will send the error in a system message to the model, allowing it up to retry_attempts attempts to correct itself and retry the call.
- Parameters:
engine – The LM engine implementation to use.
system_prompt – The system prompt to provide to the LM. The prompt will not be included in chat_history.
always_included_messages – A list of messages to always include as a prefix in all chat rounds (i.e., evict newer messages rather than these to manage context length). These will not be included in chat_history.
desired_response_tokens – The minimum amount of space to leave in max context size - tokens in prompt. To control the maximum number of tokens generated more precisely, you may be able to configure the engine (e.g. OpenAIEngine(..., max_tokens=250)). Defaults to 10% of the engine’s context length or 8192 tokens, whichever is smaller.
chat_history – The chat history to start with (not including system prompt or always included messages), for advanced use cases. By default, each kani starts with a new conversation session.
Caution
If you pass another kani’s chat history here without copying it, the same list will be mutated! Use chat_history=mykani.chat_history.copy() to pass a copy.
functions – A list of AIFunction to expose to the model (for dynamic function calling). Use ai_function() to define static functions (see Function Calling).
retry_attempts – How many attempts the LM may take per full round if any tool call raises an exception.
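The default for desired_response_tokens described above can be worked through with plain arithmetic. The following is a minimal sketch, not kani's implementation: both helper names are hypothetical, and it assumes the 10% figure is rounded down.

```python
def default_desired_response_tokens(max_context_size: int) -> int:
    """10% of the context length or 8192 tokens, whichever is smaller.

    Assumption: the 10% figure is rounded down; the library may round differently.
    """
    return min(max_context_size // 10, 8192)


def prompt_budget(max_context_size: int, desired_response_tokens: int) -> int:
    """Tokens left for the prompt once space for the response is reserved."""
    return max_context_size - desired_response_tokens


for ctx in (4096, 32768, 200000):
    reserved = default_desired_response_tokens(ctx)
    print(ctx, reserved, prompt_budget(ctx, reserved))
```

For small context windows the 10% rule applies; for large ones the 8192-token cap takes over.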
- always_included_messages: list[ChatMessage]
Chat messages that are always included as a prefix in the model’s prompt. Includes the system message, if supplied.
- chat_history: list[ChatMessage]
All messages in the current chat state, not including system or always included messages.
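The caution about sharing a chat history list is ordinary Python list aliasing. The sketch below uses a hypothetical Session class as a stand-in (it is not part of kani) to show the difference between passing a list and passing a copy of it.

```python
from typing import Optional


class Session:
    """Hypothetical stand-in that stores the list it is given without copying it."""

    def __init__(self, chat_history: Optional[list] = None):
        self.chat_history = chat_history if chat_history is not None else []

    def add(self, message: str) -> None:
        self.chat_history.append(message)


first = Session()
first.add("hello")

shared = Session(chat_history=first.chat_history)         # same list object
copied = Session(chat_history=first.chat_history.copy())  # independent list

shared.add("written via shared")
copied.add("written via copied")

# The message added through `shared` shows up in `first`; the one added through `copied` does not.
print(first.chat_history)
```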
- async chat_round(query: str | Sequence[MessagePart | str] | None, **kwargs)
Perform a single chat round (user -> model -> user, no functions allowed).
- Parameters:
query – The contents of the user’s chat message. Can be None to generate a completion without a user prompt.
kwargs – Additional arguments to pass to the model engine (e.g. decoding arguments).
- Returns:
The model’s reply.
- async chat_round_str(query: str | Sequence[MessagePart | str] | None, **kwargs) -> str
Like chat_round(), but only returns the text content of the message.
- chat_round_stream(query: str | Sequence[MessagePart | str] | None, **kwargs)
Returns a stream of tokens from the engine as they are generated.
To consume tokens from the returned stream, iterate over it like so:
    stream = ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
    async for token in stream:
        print(token, end="")
    msg = await stream.message()
Tip
For compatibility and ease of refactoring, awaiting the stream itself will also return the message, i.e.:
    msg = await ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
(note the await that is not present in the above example).
The arguments are the same as chat_round().
- async full_round( ) -> AsyncIterable[ChatMessage]
Perform a full chat round (user -> model [-> function -> model -> …] -> user).
Yields each non-user ChatMessage created during the round. A ChatMessage will have at least one of (content, function_call).
Use this in an async for loop, like so:
    async for msg in kani.full_round("How's the weather?"):
        print(msg.text)
- Parameters:
query – The content of the user’s chat message. Can be None to generate a completion without a user prompt.
max_function_rounds – The maximum number of function calling rounds to perform in this round. If this number is reached, the model is allowed to generate a final response without any functions defined. Default unlimited (continues until model’s response does not contain a function call).
kwargs – Additional arguments to pass to the model engine (e.g. decoding arguments).
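The control flow of a full round, including the effect of max_function_rounds, can be sketched with stand-ins. Everything below (Msg, fake_model, full_round_sketch) is hypothetical and only mirrors the behaviour described above; it is not kani's implementation. The toy model requests a function whenever it is allowed to, so leaving the limit unset would never terminate in this sketch.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class Msg:
    """Hypothetical stand-in for a chat message."""
    role: str
    text: Optional[str] = None
    function_call: Optional[str] = None


async def fake_model(history, functions_enabled: bool) -> Msg:
    """Toy model: requests a function for as long as functions are offered."""
    if functions_enabled:
        return Msg("assistant", function_call="get_weather")
    return Msg("assistant", text="It is sunny.")


async def full_round_sketch(query: str, max_function_rounds: Optional[int] = None):
    history = [Msg("user", text=query)]
    function_rounds = 0
    while True:
        # Once the limit is reached, the model must answer without functions.
        enabled = max_function_rounds is None or function_rounds < max_function_rounds
        reply = await fake_model(history, enabled)
        history.append(reply)
        yield reply
        if reply.function_call is None:
            return  # no function requested: control goes back to the user
        result = Msg("function", text="72F and sunny")
        history.append(result)
        yield result
        function_rounds += 1


async def collect(max_function_rounds: int):
    return [m.role async for m in full_round_sketch("How's the weather?", max_function_rounds)]


print(asyncio.run(collect(2)))
```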
- async full_round_str(query: str | Sequence[MessagePart | str] | None, message_formatter: Callable[[ChatMessage], str | None] = assistant_message_contents, *, max_function_rounds: int = None, **kwargs) -> AsyncIterable[str]
Like full_round(), but each yielded element is a str rather than a ChatMessage.
- Parameters:
query – The content of the user’s chat message.
message_formatter – A function that returns a string to yield for each message. By default, full_round_str yields the content of each assistant message.
max_function_rounds – The maximum number of function calling rounds to perform in this round. If this number is reached, the model is allowed to generate a final response without any functions defined. Default unlimited (continues until model’s response does not contain a function call).
kwargs – Additional arguments to pass to the model engine (e.g. hyperparameters).
- async full_round_stream( ) -> AsyncIterable[StreamManager]
Perform a full chat round (user -> model [-> function -> model -> …] -> user).
Yields a stream of tokens for each non-user ChatMessage created during the round.
To consume tokens, iterate over each yielded stream like so:
    async for stream in ai.full_round_stream("What is the airspeed velocity of an unladen swallow?"):
        async for token in stream:
            print(token, end="")
        msg = await stream.message()
Each StreamManager object yielded by this method contains a StreamManager.role attribute that can be used to determine if a message is from the engine or a function call. This attribute will be available before iterating over the stream.
The arguments are the same as full_round().
- async prompt_token_len(messages: list[ChatMessage], functions: list[AIFunction] | None = None, **kwargs)
Returns the number of tokens used by the given prompt (i.e., list of messages and functions).
In general, this is preferred over message_token_len().
- async get_model_completion(include_functions: bool = True, **kwargs) -> BaseCompletion
Get the model’s completion with the current chat state.
Compared to chat_round() and full_round(), this lower-level method does not save the model’s reply to the chat history or mutate the chat state; it is intended to help with logging or to repeat a call multiple times.
- Parameters:
include_functions – Whether to pass this kani’s function definitions to the engine.
kwargs – Arguments to pass to the model engine.
- async get_model_stream(include_functions: bool = True, **kwargs)
Get the model’s completion with the current chat state as a stream. This is a low-level method like get_model_completion() but for streams.
- async get_prompt(include_functions=True, **kwargs) -> list[ChatMessage]
Called each time before asking the LM engine for a completion to generate the chat prompt. Returns a list of messages such that the total token count in the messages is less than (self.max_context_size - self.desired_response_tokens).
Always includes the system prompt plus any always_included_messages at the start of the prompt.
You may override this to get more fine-grained control over what is exposed in the model’s memory at any given call.
- Parameters:
include_functions – Whether to account for the tokens that will be used for function definitions in the context length.
kwargs – Additional arguments that were passed to the model engine from chat_round() or full_round() (e.g. decoding arguments).
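One way to satisfy the contract above is to keep the always-included prefix and then add the most recent messages while they still fit. This is a sketch under stated assumptions rather than the library's actual logic: count_tokens is a hypothetical word-based counter, messages are plain strings, and the oldest history is what gets dropped.

```python
def count_tokens(text: str) -> int:
    """Hypothetical token counter: one token per whitespace-separated word."""
    return len(text.split())


def build_prompt(always_included, chat_history, max_context_size, desired_response_tokens):
    """Keep the always-included prefix, then as many of the newest messages as fit."""
    budget = max_context_size - desired_response_tokens
    used = sum(count_tokens(m) for m in always_included)
    kept = []
    # Walk backwards so the most recent messages are considered first.
    for message in reversed(chat_history):
        cost = count_tokens(message)
        if used + cost >= budget:  # the total must stay strictly below the budget
            break
        kept.append(message)
        used += cost
    return list(always_included) + kept[::-1]


history = ["one two three", "four five", "six seven eight nine", "ten"]
prompt = build_prompt(["system prompt"], history, max_context_size=12, desired_response_tokens=4)
print(prompt)
```

An override of get_prompt() could apply a different policy, for example summarising the dropped messages instead of discarding them.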
- get_enabled_functions() -> list[AIFunction]
Get the list of currently enabled AIFunctions. By default this returns all AIFunctions in self.functions where AIFunction.enabled is truthy.
- async do_function_call(call: FunctionCall, tool_call_id: str = None) -> FunctionCallResult
Resolve a single function call.
By default, any exception raised from this method will be an instance of a FunctionCallException.
You may implement an override to add instrumentation around function calls (e.g. tracking success counts for varying prompts). See Handle a Function Call.
- Parameters:
call – The name of the function to call and arguments to call it with.
tool_call_id – The tool_call_id to set in the returned FUNCTION message.
- Returns:
A FunctionCallResult including whose turn it is next and the message with the result of the function call.
- Raises:
NoSuchFunction – The requested function does not exist.
WrappedCallException – The function raised an exception.
- async handle_function_call_exception(call: FunctionCall, err: FunctionCallException, attempt: int, tool_call_id: str = None)
Called when a function call raises an exception.
By default, returns a message telling the LM about the error and allows a retry if the error is recoverable and there are remaining retry attempts.
You may implement an override to customize the error prompt, log the error, or use custom retry logic. See Handle a Function Call Exception.
- Parameters:
call – The FunctionCall the model was attempting to make.
err – The error the call raised. Usually this is NoSuchFunction or WrappedCallException, although it may be any exception raised by do_function_call().
attempt – The attempt number for the current call (0-indexed).
tool_call_id – The tool_call_id to set in the returned FUNCTION message.
- Returns:
An ExceptionHandleResult detailing whether the model should retry and the message to add to the chat history.
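The default retry decision described above (retry only if the error is recoverable and attempts remain) can be expressed in a few lines. The classes and the function below are hypothetical stand-ins written for illustration, and the exact comparison against retry_attempts is an assumption.

```python
from dataclasses import dataclass


class FunctionCallError(Exception):
    """Hypothetical stand-in for an error that carries a ``retry`` flag."""

    def __init__(self, message: str, retry: bool):
        super().__init__(message)
        self.retry = retry


@dataclass
class HandleResult:
    """Hypothetical stand-in: whether to retry, plus the message shown to the model."""
    should_retry: bool
    message: str


def handle_exception(err: FunctionCallError, attempt: int, retry_attempts: int) -> HandleResult:
    # attempt is 0-indexed; retry only while attempts remain and the error is recoverable.
    should_retry = err.retry and attempt < retry_attempts
    return HandleResult(should_retry, f"The call raised an exception: {err}")


result = handle_exception(FunctionCallError("bad argument type", retry=True), attempt=0, retry_attempts=1)
print(result.should_retry, result.message)
```

A custom override might keep the same decision rule and only change the wording of the message, or log the error before returning.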
- async add_completion_to_history(completion: BaseCompletion)
Add the message in the given completion to the chat history and return it.
You might want to override this to log token counts. By default, this calls add_to_history().
This method differs from add_to_history() in that it is only called on model completions (stream and non-stream) rather than on each message, and takes a BaseCompletion as input.
- async add_to_history(message: ChatMessage)
Add the given message to the chat history.
You might want to override this to log messages to an external store or control how messages are saved to the chat session’s memory. By default, this appends to chat_history.
- save(fp: str | bytes | PathLike, *, save_format: Literal['json', 'kani'] | None = None, **kwargs)
Save the chat state of this kani to a .kani file or JSON. This will overwrite the file if it exists!
- Parameters:
fp – The path to the file to save.
save_format – Whether to save the chat state as a .kani file or JSON. If not set, determines format by file path extension (defaulting to .kani if uncertain).
kwargs – Additional arguments to pass to Pydantic’s model_dump_json.
- load(fp: str | bytes | PathLike, **kwargs)
Load a chat state from a .kani file or JSON file into this instance. This will overwrite any existing chat state!
- Parameters:
fp – The path to the file containing the chat state.
kwargs – Additional arguments to pass to Pydantic’s model_validate_json.
- property always_len: int
Returns the number of tokens that will always be reserved (e.g. for system prompts, always included messages, the engine, and the response).
- message_token_len(message: ChatMessage)
Returns the estimated number of tokens used by a single given message.
Deprecated since version 1.7.0: Use prompt_token_len() instead.
Note
The token count returned by this may not exactly reflect the actual token count (e.g., due to prompt formatting or not having access to the tokenizer). It should, however, be a safe overestimate to use as an upper bound.
Warning
This method may not be available for all models (e.g., models which do not expose a local tokenization method and require API calls to count tokens, or models enforcing strict constraints on prompt formats). Use prompt_token_len() instead.
Common Models
- class kani.ChatRole(value)
Represents who said a chat message.
- SYSTEM = 'system'
The message is from the system (usually a steering prompt).
- USER = 'user'
The message is from the user.
- ASSISTANT = 'assistant'
The message is from the language model.
- FUNCTION = 'function'
The message is the result of a function call.
- class kani.FunctionCall(*, name: str, arguments: str)
Represents a model’s request to call a single function.
- class kani.ToolCall(*, id: str, type: str, function: FunctionCall)
Represents a model’s request to call a tool with a unique request ID.
See Internal Representation for more information about tool calls vs function calls.
- id: str
The request ID created by the engine. This should be passed back to the engine in ChatMessage.tool_call_id in order to associate a FUNCTION message with this request.
- function: FunctionCall
The requested function call.
- classmethod from_function(_ToolCall__name: str, /, *, call_id_: str = None, **kwargs)
Create a tool call request for a function with the given name and arguments.
- Parameters:
call_id_ – The ID to assign to the request. If not passed, generates a random ID.
- classmethod from_function_call(call: FunctionCall, call_id_: str = None)
Create a tool call request from an existing FunctionCall.
- Parameters:
call_id_ – The ID to assign to the request. If not passed, generates a random ID.
- class kani.MessagePart(*, extra: dict = {})
Base class for a part of a message.
Engines should inherit from this class to tag substrings with metadata or provide multimodality to an engine. By default, if coerced to a string, will raise a warning noting that rich message part data was lost. For more information see Message Parts.
- __str__()
Used to define the fallback behaviour when a part is serialized to a string (e.g. via ChatMessage.text). Override this to specify the canonical string representation of your message part.
Engines that support message parts should generally not use this, preferring to iterate over ChatMessage.parts instead.
- extra: dict
Specific engines may store additional extra data in this dictionary. See an engine’s documentation for details about any extras it may store or expect.
This key will only be persisted to disk on a best-effort basis – any value that is not JSON-serializable or a Pydantic class will be cast to a repr. Upon loading, values may not retain the same type as they were saved as (Pydantic objects will be loaded as a dict).
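The "best-effort" persistence described above (values that cannot be serialized are cast to their repr) can be reproduced with the standard json module. This sketch does not use kani's own serialization code; the Opaque class is a hypothetical example of a non-serializable value.

```python
import json


class Opaque:
    """A value that json cannot serialize natively."""

    def __repr__(self) -> str:
        return "Opaque()"


extra = {"temperature": 0.7, "handle": Opaque()}

# default=repr is called only for values json cannot encode on its own.
saved = json.dumps(extra, default=repr)
loaded = json.loads(saved)

# The float survives the round trip; the Opaque instance comes back as a string.
print(loaded)
```

The type change is the reason loaded values should not be assumed to match what was saved.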
- class kani.ChatMessage(*, role: ChatRole, content: str | list[Annotated[MessagePart, SerializeAsAny()] | str] | None, name: str | None = None, tool_call_id: str | None = None, tool_calls: list[ToolCall] | None = None, is_tool_call_error: bool | None = None, extra: dict = {})
Represents a message in the chat context.
- content: str | list[Annotated[MessagePart, SerializeAsAny()] | str] | None
The data used to create this message. Generally, you should use text or parts instead.
- property text: str | None
The content of the message, as a string. Can be None only if the message is a requested function call from the assistant. If the message is composed of multiple parts, concatenates the parts.
- property parts: list[MessagePart | str]
The parts of the message that make up its content. Can be an empty list only if the message is a requested function call from the assistant.
This is a read-only list; changes here will not affect the message’s content. To mutate the message content, use copy_with() and set text, parts, or content.
- tool_call_id: str | None
The ID for a requested ToolCall which this message is a response to (function messages only).
- is_tool_call_error: bool | None
If this is a FUNCTION message containing the results of a function call, whether the function call raised an exception.
- property function_call: FunctionCall | None
If there is exactly one tool call to a function, return that tool call’s requested function.
This is mostly provided for backwards-compatibility purposes; iterating over tool_calls should be preferred.
- extra: dict
Specific engines may store additional extra data in this dictionary. See an engine’s documentation for details about any extras it may store or expect.
This key will only be persisted to disk on a best-effort basis – any value that is not JSON-serializable or a Pydantic class will be cast to a repr. Upon loading, values may not retain the same type as they were saved as (Pydantic objects will be loaded as a dict).
- classmethod system(content: str | Sequence[MessagePart | str], **kwargs)
Create a new system message.
- classmethod user(content: str | Sequence[MessagePart | str], **kwargs)
Create a new user message.
- classmethod assistant(content: str | Sequence[MessagePart | str] | None, **kwargs)
Create a new assistant message.
- copy_with(**new_values)
Make a shallow copy of this object, updating the passed attributes (if any) to new values.
This does not validate the updated attributes! This is mostly just a convenience wrapper around .model_copy.
Only one of (content, text, parts) may be passed and will update the other two attributes accordingly.
Only one of (tool_calls, function_call) may be passed and will update the other accordingly.
AI Function
- kani.ai_function(func=None, *, after: ChatRole = ChatRole.ASSISTANT, name: str | None = None, desc: str | None = None, auto_retry: bool = True, json_schema: dict | None = None, auto_truncate: int | None = None, enabled: bool = True)
Decorator to mark a method of a Kani to expose to the AI.
- Parameters:
after – Who should speak next after the function call completes (see Next Actor). Defaults to the model.
name – The name of the function (defaults to the name of the function in source code).
desc – The function’s description (defaults to the function’s docstring).
auto_retry – Whether the model should retry calling the function if it gets it wrong (see Retry & Model Feedback).
json_schema – A JSON Schema document describing the function’s parameters. By default, kani will automatically generate one, but this can be helpful for overriding it in any tricky cases.
auto_truncate – If a function response is longer than this many characters, truncate it until it is at most this many characters and add “…” to the end. By default, no responses will be truncated. This uses a paragraph-aware truncation algorithm.
Changed in version 1.7.0: This parameter now truncates to a certain number of characters, rather than tokens, since it is not possible to reliably determine the token count of a message out of prompt context for all engines.
enabled – Whether the function should be included in the prompt passed to the model. Disabled functions will still be executed if the model generates a call to them despite not being passed to the model.
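The reference does not spell out the paragraph-aware truncation algorithm behind auto_truncate. As an illustration only, the hypothetical helper below shows one plausible reading: keep whole paragraphs while they fit within the character limit, fall back to a hard cut if even the first paragraph is too long, and append the marker.

```python
def truncate_paragraphs(text: str, max_chars: int, marker: str = "…") -> str:
    """Truncate to at most max_chars, preferring to cut at a paragraph boundary."""
    if len(text) <= max_chars:
        return text
    room = max_chars - len(marker)
    if room <= 0:
        return marker[:max_chars]
    kept = ""
    for paragraph in text.split("\n\n"):
        candidate = paragraph if not kept else kept + "\n\n" + paragraph
        if len(candidate) > room:
            break
        kept = candidate
    if not kept:
        # Even the first paragraph is too long, so fall back to a hard cut.
        kept = text[:room]
    return kept + marker


sample = "First paragraph.\n\nSecond paragraph is longer.\n\nThird."
print(repr(truncate_paragraphs(sample, 30)))
```

Because the limit counts characters rather than tokens, the result is independent of any particular engine's tokenizer.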
- class kani.AIFunction(inner, after: ChatRole = ChatRole.ASSISTANT, name: str | None = None, desc: str | None = None, auto_retry: bool = True, json_schema: dict | None = None, auto_truncate: int | None = None, enabled: bool = True)
Wrapper around a function to expose to a language model.
- Parameters:
inner – The function implementation.
after – Who should speak next after the function call completes (see Next Actor). Defaults to the model.
name – The name of the function (defaults to the name of the function in source code).
desc – The function’s description (defaults to the function’s docstring).
auto_retry – Whether the model should retry calling the function if it gets it wrong (see Retry & Model Feedback).
json_schema – A JSON Schema document describing the function’s parameters. By default, kani will automatically generate one, but this can be helpful for overriding it in any tricky cases.
auto_truncate – If a function response is longer than this many characters, truncate it until it is at most this many characters and add “…” to the end. By default, no responses will be truncated. This uses a paragraph-aware truncation algorithm.
Changed in version 1.7.0: This parameter now truncates to a certain number of characters, rather than tokens, since it is not possible to reliably determine the token count of a message out of prompt context for all engines.
enabled – Whether the function should be included in the prompt passed to the model. Disabled functions will still be executed if the model generates a call to them despite not being passed to the model.
- class kani.AIParam(desc: str, *, title: str = None)
Special tag to annotate types with in order to provide parameter-level metadata to kani.
- Parameters:
desc – The description of the parameter.
title – If set, set the title of this parameter in generated JSON schema to this; otherwise omit the title (as it is already the key of the parameter in the schema).
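Tags of this kind are typically attached with typing.Annotated and read back through type-hint introspection. The sketch below shows that general mechanism with a hypothetical ParamInfo class standing in for the tag; it does not reproduce how kani itself builds its JSON schema.

```python
import inspect
from typing import Annotated, get_type_hints


class ParamInfo:
    """Hypothetical stand-in for a parameter-level metadata tag."""

    def __init__(self, desc: str):
        self.desc = desc


def get_weather(
    location: Annotated[str, ParamInfo("The city and state, e.g. San Francisco, CA")],
    unit: Annotated[str, ParamInfo("Either 'celsius' or 'fahrenheit'")] = "celsius",
):
    """Get the current weather in a given location."""


def describe_params(func) -> dict:
    """Collect the description attached to each annotated parameter."""
    hints = get_type_hints(func, include_extras=True)
    out = {}
    for name in inspect.signature(func).parameters:
        # Annotated[...] hints expose their extra arguments as __metadata__.
        for meta in getattr(hints.get(name), "__metadata__", ()):
            if isinstance(meta, ParamInfo):
                out[name] = meta.desc
    return out


print(describe_params(get_weather))
```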
Common MessageParts
- class kani.parts.ReasoningPart(*, extra: dict = {}, content: str)
A long chain of thought (CoT) that should not be shown to the user (e.g. GPT-OSS, Anthropic extended thinking, Deepseek R1).
When using a low-level text engine (e.g., HuggingEngine), these parts will not be automatically extracted. Use a parser instead (e.g., GPTOSSParser for GPT-OSS).
Exceptions
- exception kani.exceptions.PromptTooLong
A given prompt was too long to tokenize or generate a completion for.
- exception kani.exceptions.MessageTooLong
This chat message will never fit in the context window.
- exception kani.exceptions.FunctionCallException(retry: bool)
Base class for exceptions that occur when a model calls an @ai_function.
- exception kani.exceptions.WrappedCallException(retry, original)
The @ai_function raised an exception.
- exception kani.exceptions.NoSuchFunction(name)
The model attempted to call a function that does not exist.
Streaming
- class kani.streaming.StreamManager(stream_iter: AsyncIterable[str | BaseCompletion], role: ChatRole, *, after=None, lock: Lock = None)
This class is responsible for managing a stream returned by an engine. It should not be constructed manually.
To consume tokens from a stream, use this class like so:
    # CHAT ROUND:
    stream = ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
    async for token in stream:
        print(token, end="")
    msg = await stream.message()

    # FULL ROUND:
    async for stream in ai.full_round_stream("What is the airspeed velocity of an unladen swallow?"):
        async for token in stream:
            print(token, end="")
        msg = await stream.message()
After a stream finishes, its contents will be available as a ChatMessage. You can retrieve the final message or BaseCompletion with:
    msg = await stream.message()
    completion = await stream.completion()
The final ChatMessage may contain non-yielded tokens (e.g. a request for a function call). If the final message or completion is requested before the stream is iterated over, the stream manager will consume the entire stream.
Tip
For compatibility and ease of refactoring, awaiting the stream itself will also return the message, i.e.:
    msg = await ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
(note the await that is not present in the above examples).
- Parameters:
stream_iter – The async iterable that generates elements of the stream.
role – The role of the message that will be returned eventually.
after – A coroutine to call with the generated completion as its argument after the stream is fully consumed.
lock – A lock to hold for the duration of the stream run.
- __aiter__() -> AsyncIterable[str]
Iterate over tokens yielded from the engine.
- role
The role of the message that this stream will return.
- async completion() -> BaseCompletion
Get the final BaseCompletion generated by the model.
- async message() -> ChatMessage
Get the final ChatMessage generated by the model.
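An object that can be both iterated with async for and awaited directly relies on three special methods: __aiter__, __anext__ and __await__. The StreamSketch class below is a hypothetical stand-in that illustrates the pattern with a fixed list of tokens; it is not the StreamManager implementation.

```python
import asyncio


class StreamSketch:
    """Hypothetical stand-in: iterate for tokens, or await for the whole message."""

    def __init__(self, tokens):
        self._source = iter(tokens)
        self._seen = []

    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            token = next(self._source)
        except StopIteration:
            raise StopAsyncIteration from None
        self._seen.append(token)
        return token

    async def message(self) -> str:
        # Requesting the message consumes whatever has not been iterated yet.
        async for _ in self:
            pass
        return "".join(self._seen)

    def __await__(self):
        # Awaiting the stream itself is the same as awaiting message().
        return self.message().__await__()


async def main():
    stream = StreamSketch(["An ", "African ", "or ", "European ", "swallow?"])
    streamed = [token async for token in stream]
    full = await stream.message()
    awaited = await StreamSketch(["one ", "two"])
    return streamed, full, awaited


print(asyncio.run(main()))
```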
Prompting
This submodule contains utilities to transform a list of kani ChatMessage objects into low-level formats to be consumed by an engine (e.g. str, list[dict], or torch.Tensor).
- class kani.PromptPipeline(steps: list[PipelineStep] = None)
This class creates a reproducible pipeline for translating a list of ChatMessage into an engine-specific format using fluent-style chaining.
To build a pipeline, create an instance of PromptPipeline() and add steps by calling the step methods documented below. Most pipelines will end with a call to one of the terminals, which translates the intermediate form into the desired output format.
Usage
To use the pipeline, call the created pipeline object with a list of kani chat messages.
To inspect the inputs/outputs of your pipeline, you can use explain() to print a detailed explanation of the pipeline and multiple examples (selected based on the pipeline steps).
Example
Here’s an example using the PromptPipeline to build a LLaMA 2 chat-style prompt:
    from kani import PromptPipeline, ChatRole

    pipe = (
        PromptPipeline()
        # System messages should be wrapped with this tag. We'll translate them to USER
        # messages since a system and user message go together in a single [INST] pair.
        .wrap(role=ChatRole.SYSTEM, prefix="<<SYS>>\n", suffix="\n<</SYS>>\n")
        .translate_role(role=ChatRole.SYSTEM, to=ChatRole.USER)
        # If we see two consecutive USER messages, merge them together into one with a
        # newline in between.
        .merge_consecutive(role=ChatRole.USER, sep="\n")
        # Similarly for ASSISTANT, but with a space (kani automatically strips whitespace
        # from the ends of generations).
        .merge_consecutive(role=ChatRole.ASSISTANT, sep=" ")
        # Finally, wrap USER and ASSISTANT messages in the instruction tokens. If our
        # message list ends with an ASSISTANT message, don't add the EOS token
        # (we want the model to continue the generation).
        .conversation_fmt(
            user_prefix="<s>[INST] ",
            user_suffix=" [/INST]",
            assistant_prefix=" ",
            assistant_suffix=" </s>",
            assistant_suffix_if_last="",
        )
    )

    # We can see what this pipeline does by calling explain()...
    pipe.explain()

    # And use it in our engine to build a string prompt for the LLM.
    # get_prompt() is a coroutine, so it must be awaited (inside an async function).
    prompt = pipe(await ai.get_prompt())
- __call__(msgs: list[ChatMessage], functions: list[AIFunction] = None, **kwargs)
Apply the pipeline to a list of kani messages. The return type will vary based on the steps in the pipeline; if no steps are defined the return type will be a copy of the input messages.
- translate_role(*, to: ChatRole, warn: str = None, role: ChatRole | Collection[ChatRole] = None, predicate: Callable[[ChatMessage], bool] = None)
Change the role of the matching messages. (e.g. for models which do not support native function calling, make all FUNCTION messages a USER message)
- Parameters:
to – The new role to translate the matching messages to.
warn – A warning to emit if any messages are translated (e.g. if a model does not support certain roles).
role – The role (if a single role is given) or roles (if a list is given) to apply this operation to. If not set, ignores the role of the message.
predicate – A function that takes a ChatMessage and returns a boolean specifying whether to operate on this message or not.
If multiple filter params are supplied, this method will only operate on messages that match ALL of the filters.
- wrap(*, prefix: str = None, suffix: str = None, role: ChatRole | Collection[ChatRole] = None, predicate: Callable[[ChatMessage], bool] = None)
Wrap the matching messages with a given string prefix and/or suffix.
For more fine-grained control over user/assistant message pairs as the last step in a pipeline, use conversation_fmt() instead.
- Parameters:
prefix – The prefix to add before each matching message, if any.
suffix – The suffix to add after each matching message, if any.
role – The role (if a single role is given) or roles (if a list is given) to apply this operation to. If not set, ignores the role of the message.
predicate – A function that takes a ChatMessage and returns a boolean specifying whether to operate on this message or not.
If multiple filter params are supplied, this method will only operate on messages that match ALL of the filters.
- merge_consecutive(*, sep: str = None, joiner: Callable[[list[ChatMessage]], str | list[MessagePart | str] | None] = None, out_role: ChatRole = None, role: ChatRole | Collection[ChatRole] = None, predicate: Callable[[ChatMessage], bool] = None)
If multiple messages that match are found consecutively, merge them by either joining their contents with a string or calling a joiner function.
Caution
If multiple roles are specified, this method will merge them as a group (e.g. if role=(USER, ASSISTANT), a USER message followed by an ASSISTANT message will be merged together into one with a role of out_role).
Similarly, if a predicate is specified, this method will merge all consecutive messages which match the given predicate.
- Parameters:
sep – The string to add between each matching message. Mutually exclusive with joiner. If this is set, this is roughly equivalent to joiner=lambda msgs: sep.join(m.text for m in msgs).
joiner – A function that will take a list of all messages in a consecutive group and return the final string. Mutually exclusive with sep.
out_role – The role of the merged message to use. This is required if multiple roles are specified or role is not set; otherwise it defaults to the common role of the merged messages.
role – The role (if a single role is given) or roles (if a list is given) to apply this operation to. If not set, ignores the role of the message.
predicate – A function that takes a ChatMessage and returns a boolean specifying whether to operate on this message or not.
If multiple filter params are supplied, this method will only operate on messages that match ALL of the filters.
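The effect of merging consecutive messages of a single role with sep can be illustrated on plain (role, text) tuples. The function below is a hypothetical stand-in built on itertools.groupby, not the pipeline step itself, and it covers only the single-role case.

```python
from itertools import groupby

messages = [
    ("user", "Hello"),
    ("user", "Are you there?"),
    ("assistant", "Yes."),
    ("user", "Great."),
]


def merge_consecutive(msgs, role: str, sep: str):
    """Merge runs of adjacent messages with the given role; leave the rest untouched."""
    merged = []
    # groupby yields one group per run of adjacent messages sharing the same key.
    for matches, group in groupby(msgs, key=lambda m: m[0] == role):
        group = list(group)
        if matches:
            merged.append((role, sep.join(text for _, text in group)))
        else:
            merged.extend(group)
    return merged


print(merge_consecutive(messages, role="user", sep="\n"))
```

Only adjacent messages are merged: the final user message stays separate because an assistant message sits between it and the earlier run.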
- function_call_fmt( ) -> Self
For each message with one or more requested tool calls, call the provided function on each requested tool call and append it to the message’s content.
- Parameters:
func – A function taking a ToolCall and returning a string to append to the content of the message containing the requested call, or None to ignore the tool call.
prefix – If at least one tool call is formatted, a prefix to insert after the message’s contents and before the formatted string.
sep – If two or more tool calls are formatted, the string to insert between them.
suffix – If at least one tool call is formatted, a suffix to insert after the formatted string.
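How func, prefix, sep and suffix combine can be sketched on plain dictionaries. The helper and its default values below are assumptions made for illustration (the signature above is elided), and as_json is a hypothetical formatter.

```python
import json


def format_tool_calls(content, tool_calls, func, prefix="\n", sep="", suffix=""):
    """Append formatted tool calls to a message's content (hypothetical helper)."""
    formatted = [s for s in (func(tc) for tc in tool_calls) if s is not None]
    if not formatted:
        return content  # nothing was formatted, so no prefix or suffix is added
    return f"{content or ''}{prefix}{sep.join(formatted)}{suffix}"


def as_json(tool_call: dict):
    # Returning None tells the formatter to skip this tool call.
    if tool_call["name"].startswith("_"):
        return None
    return json.dumps(tool_call)


calls = [
    {"name": "get_weather", "arguments": {"city": "Tokyo"}},
    {"name": "_internal", "arguments": {}},
]
formatted_text = format_tool_calls("Let me check.", calls, as_json, prefix="\n<tool>", suffix="</tool>")
print(formatted_text)
```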
- remove(*, role: ChatRole | Collection[ChatRole] = None, predicate: Callable[[ChatMessage], bool] = None)
Remove all messages that match the filters from the output.
- Parameters:
role – The role (if a single role is given) or roles (if a list is given) to apply this operation to. If not set, ignores the role of the message.
predicate – A function that takes a ChatMessage and returns a boolean specifying whether to operate on this message or not.
If multiple filter params are supplied, this method will only operate on messages that match ALL of the filters.
- ensure_start(*, role: ChatRole | Collection[ChatRole] = None, predicate: Callable[[ChatMessage], bool] = None)
Ensure that the output starts with a message with the given role by removing all messages from the start that do NOT match the given filters, such that the first message in the output matches.
This should NOT be used to ensure that a system prompt is passed; the intent of this step is to prevent an orphaned FUNCTION result or ASSISTANT reply after earlier messages were context-managed out.
- Parameters:
role – The role (if a single role is given) or roles (if a list is given) to apply this operation to. If not set, ignores the role of the message.
predicate – A function that takes a ChatMessage and returns a boolean specifying whether to operate on this message or not.
If multiple filter params are supplied, this method will only operate on messages that match ALL of the filters.
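As a rough illustration of the trimming behaviour, the sketch below drops leading messages until one matches all supplied filters. It uses plain dicts instead of ChatMessage, and returning an empty list when nothing matches is an assumption of this sketch, not documented behaviour:

```python
def ensure_start(msgs, role=None, predicate=None):
    """Drop messages from the start until the first one matches ALL given filters."""
    roles = None if role is None else ([role] if isinstance(role, str) else list(role))

    def matches(m):
        if roles is not None and m["role"] not in roles:
            return False
        if predicate is not None and not predicate(m):
            return False
        return True

    for i, m in enumerate(msgs):
        if matches(m):
            return msgs[i:]
    return []  # assumption: nothing matched, so nothing is kept


history = [
    {"role": "function", "text": "72F"},         # orphaned result; its request was evicted
    {"role": "assistant", "text": "It is 72F."},
    {"role": "user", "text": "Thanks!"},
    {"role": "assistant", "text": "You're welcome."},
]
trimmed = ensure_start(history, role="user")
```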
- ensure_bound_function_calls(id_translator: Callable[[str], str] = None) Self[source]¶
Ensure that each FUNCTION message is preceded by an ASSISTANT message requesting it, and that each FUNCTION message’s tool_call_id matches the request. If a FUNCTION message has no tool_call_id (e.g. a few-shot prompt), bind it to a preceding ASSISTANT message if it is unambiguous.
Will remove hanging FUNCTION messages (i.e. messages where the corresponding request was managed out of the model’s context) from the beginning of the prompt if necessary.
- Parameters:
id_translator – A function that takes a function ID (usually a UUID4 string) and returns a translated ID. Used for engines that require the function_call_id to be in particular formats (e.g., Mistral).
- Raises:
PromptError – if it is impossible to bind each function call to a request unambiguously.
- apply(
- func: Callable[[ChatMessage], ApplyResultT] | Callable[[ChatMessage, ApplyContext], ApplyResultT],
- *,
- role: ChatRole | Collection[ChatRole] = None,
- predicate: Callable[[ChatMessage], bool] = None,
Apply the given function to all matched messages. Replace the message with the function’s return value.
The function may take 1-2 positional parameters: the first will always be the matched message at the current pipeline step, and the second will be the context this operation is occurring in (an ApplyContext).
- Parameters:
func – A function that takes 1-2 positional parameters (msg, ctx) that will be called on each matching message. If this function does not return a ChatMessage, it should be the last step in the pipeline. If this function returns None, the input message will be removed from the output.
role – The role (if a single role is given) or roles (if a list is given) to apply this operation to. If not set, ignores the role of the message.
predicate – A function that takes a ChatMessage and returns a boolean specifying whether to operate on this message or not.
If multiple filter params are supplied, this method will only operate on messages that match ALL of the filters.
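The replace-or-remove semantics can be sketched in plain Python. The Ctx dataclass and apply helper below are hypothetical stand-ins for ApplyContext and the pipeline step, messages are plain dicts, and passing unmatched messages through unchanged is an assumption of this sketch:

```python
from dataclasses import dataclass


@dataclass
class Ctx:
    """Hypothetical stand-in carrying a subset of the ApplyContext fields."""
    msg: dict
    is_last: bool
    idx: int
    messages: list


def apply(msgs, func, role=None):
    """Call func(msg, ctx) on matching messages; a None result removes the message."""
    out = []
    for idx, msg in enumerate(msgs):
        if role is not None and msg["role"] != role:
            out.append(msg)  # assumption: unmatched messages are kept as-is
            continue
        ctx = Ctx(msg=msg, is_last=idx == len(msgs) - 1, idx=idx, messages=msgs)
        result = func(msg, ctx)
        if result is not None:
            out.append(result)
    return out


def tidy(msg, ctx):
    # Drop empty messages and tag the final message of the prompt.
    if not msg["text"]:
        return None
    return {**msg, "text": msg["text"] + (" [last]" if ctx.is_last else "")}


history = [
    {"role": "user", "text": ""},
    {"role": "assistant", "text": "Hi"},
    {"role": "user", "text": "Bye"},
]
cleaned = apply(history, tidy, role="user")
```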
- macro_apply(
- func: Callable[[list[ChatMessage], list[AIFunction]], list[MacroApplyResultT]],
Apply the given function to the list of all messages in the pipeline. This step can effectively be used to create an ad-hoc step.
The function must take 2 positional parameters: the first is the list of messages, and the second is the list of available functions.
- Parameters:
func – A function that takes 2 positional parameters (messages, functions) that will be called on the list of messages. If this function does not return a list[ChatMessage], it should be the last step in the pipeline.
- conversation_fmt(
- *,
- prefix: str = '',
- sep: str = '',
- suffix: str = '',
- generation_suffix: str = '',
- user_prefix: str = '',
- user_suffix: str = '',
- assistant_prefix: str = '',
- assistant_suffix: str = '',
- assistant_suffix_if_last: str = None,
- system_prefix: str = '',
- system_suffix: str = '',
- function_prefix: str = None,
- function_suffix: str = None,
Takes in the list of messages and joins them into a single conversation-formatted string by:
wrapping messages with the defined prefixes/suffixes by role
joining the messages’ contents with the defined sep
adding a generation suffix, if necessary.
This method should be the last step in a pipeline and will cause the pipeline to return a str.
- Parameters:
prefix – A string to insert once before the rest of the prompt, unconditionally.
sep – A string to insert between messages, if any. Similar to sep.join(...).
suffix – A string to insert once after the rest of the prompt, unconditionally.
generation_suffix – A string to add to the end of the prompt to prompt the model to begin its turn.
user_prefix – A prefix to add before each USER message.
user_suffix – A suffix to add after each USER message.
assistant_prefix – A prefix to add before each ASSISTANT message.
assistant_suffix – A suffix to add after each ASSISTANT message.
assistant_suffix_if_last – If not None and the prompt ends with an ASSISTANT message, this string will be added to the end of the prompt instead of the assistant_suffix + generation_suffix. This is intended to allow consecutive ASSISTANT messages to continue generation from an unfinished prior message.
system_prefix – A prefix to add before each SYSTEM message.
system_suffix – A suffix to add after each SYSTEM message.
function_prefix – A prefix to add before each FUNCTION message.
function_suffix – A suffix to add after each FUNCTION message.
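To make the interaction between the per-role wrappers, sep, generation_suffix, and assistant_suffix_if_last concrete, here is a simplified, library-free sketch. The conversation_fmt function below is a hypothetical re-implementation covering only USER and ASSISTANT messages (given as (role, text) tuples), and it deliberately omits the global suffix parameter:

```python
def conversation_fmt(msgs, *, prefix="", sep="", generation_suffix="",
                     user_prefix="", user_suffix="",
                     assistant_prefix="", assistant_suffix="",
                     assistant_suffix_if_last=None):
    """Join (role, text) pairs into one prompt string (sketch; two roles only)."""
    wrappers = {
        "user": (user_prefix, user_suffix),
        "assistant": (assistant_prefix, assistant_suffix),
    }
    # The prompt is "open" if it ends on an ASSISTANT message the model should continue.
    open_assistant = (
        bool(msgs) and msgs[-1][0] == "assistant" and assistant_suffix_if_last is not None
    )
    parts = []
    for i, (role, text) in enumerate(msgs):
        pre, post = wrappers[role]
        if open_assistant and i == len(msgs) - 1:
            post = assistant_suffix_if_last  # replaces assistant_suffix + generation_suffix
        parts.append(pre + text + post)
    gen = "" if open_assistant else generation_suffix
    return prefix + sep.join(parts) + gen


chat = [("user", "Hello"), ("assistant", "Hi!"), ("user", "How are you?")]
prompt = conversation_fmt(
    chat, sep="\n", user_prefix="USER: ", assistant_prefix="ASSISTANT: ",
    generation_suffix="\nASSISTANT:",
)
continued = conversation_fmt(
    chat + [("assistant", "I am")], sep="\n", user_prefix="USER: ",
    assistant_prefix="ASSISTANT: ", assistant_suffix="</s>",
    generation_suffix="\nASSISTANT:", assistant_suffix_if_last="",
)
```

In the second call the prompt ends with an unfinished ASSISTANT message, so neither the end-of-turn marker nor the generation suffix is appended and the model can continue that message.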
- conversation_dict(
- *,
- system_role: str = 'system',
- user_role: str = 'user',
- assistant_role: str = 'assistant',
- function_role: str = 'tool',
- content_transform: Callable[[ChatMessage], Any] = lambda msg: ...,
- additional_keys: Callable[[ChatMessage], dict] = lambda msg: ...,
Takes in the list of messages and returns a list of dictionaries with (“role”, “content”) keys.
By default, the “role” key will be “system”, “user”, “assistant”, or “tool” unless the respective role override is specified.
By default, the “content” key will be message.text unless the content_transform argument is specified.
This method should be the last step in a pipeline and will cause the pipeline to return a list[dict].
Caution
By default, this step will truncate tool calling metadata! Use additional_keys to provide tool call requests on ASSISTANT messages and additional metadata like tool call IDs on FUNCTION messages.
- Parameters:
system_role – The role to give to SYSTEM messages (default “system”).
user_role – The role to give to USER messages (default “user”).
assistant_role – The role to give to ASSISTANT messages (default “assistant”).
function_role – The role to give to FUNCTION messages (default “tool”).
content_transform – A function taking in the message and returning the contents of the “content” key (defaults to msg.text).
additional_keys – A function taking in the message and returning a dictionary containing any additional keys to add to the message’s dict.
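The caution about tool calling metadata is easier to see with a small example. The sketch below is a hypothetical, library-free version of this step operating on plain dicts; it only shows how additional_keys could carry a tool call ID through to the output:

```python
def conversation_dict(msgs, *, function_role="tool", content_transform=None, additional_keys=None):
    """Convert messages to role/content dicts (sketch using plain dict messages)."""
    role_names = {"system": "system", "user": "user",
                  "assistant": "assistant", "function": function_role}
    out = []
    for m in msgs:
        content = content_transform(m) if content_transform else m["text"]
        d = {"role": role_names[m["role"]], "content": content}
        if additional_keys:
            d.update(additional_keys(m))  # extra metadata, e.g. tool call IDs
        out.append(d)
    return out


history = [
    {"role": "user", "text": "What time is it?"},
    {"role": "function", "text": "12:00", "tool_call_id": "call_1"},
]
# Without additional_keys, the tool call ID is lost.
plain = conversation_dict(history)
# With additional_keys, FUNCTION messages keep their ID.
with_ids = conversation_dict(
    history,
    additional_keys=lambda m: {"tool_call_id": m["tool_call_id"]} if m["role"] == "function" else {},
)
```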
- execute(
- msgs: list[ChatMessage],
- functions: list[AIFunction] = None,
- *,
- deepcopy=False,
- for_measurement=False,
Apply the pipeline to a list of kani messages. The return type will vary based on the steps in the pipeline; if no steps are defined the return type will be a copy of the input messages.
This lower-level method offers more fine-grained control over the steps that are run (e.g. to measure the length of a single message).
- Parameters:
msgs – The messages to apply the pipeline to.
functions – Any functions available to the model.
deepcopy – Whether to deep-copy each message before running the pipeline.
for_measurement – If the pipeline is being run to measure the length of a single message. In this case, any ensure_start steps will be ignored, and the returned message may not be a valid prompt - the only guarantee is on the length.
- explain(
- example: list[ChatMessage] = None,
- functions: list[AIFunction] = None,
- *,
- all_cases=False,
- **kwargs,
Print out a summary of the pipeline and an example conversation transformation based on the steps in the pipeline.
Caution
This method will run the pipeline on an example constructed based on the steps in this pipeline. You may encounter unexpected side effects if your pipeline uses apply() with a function that has side effects.
- class kani.prompts.PipelineStep[source]¶
The base class for all pipeline steps.
If needed, you can subclass this and manually add steps to a PromptPipeline, but this is generally not necessary (consider using PromptPipeline.apply() instead).
- execute(msgs: list[ChatMessage], functions: list[AIFunction])[source]¶
Apply this step’s effects on the pipeline.
- class kani.prompts.ApplyContext(
- msg: ChatMessage,
- is_last: bool,
- idx: int,
- messages: list[ChatMessage],
- functions: list[AIFunction],
Context about where a message lives in the pipeline for an arbitrary Apply operation.
- msg: ChatMessage¶
The message being operated on.
- is_last: bool¶
Whether the message being operated on is the last message (of all types) in the chat prompt.
- messages: list[ChatMessage]¶
The list of all messages in the chat prompt.
- functions: list[AIFunction]¶
The list of functions available in the chat prompt.
Internals¶
- class kani.FunctionCallResult(is_model_turn: bool, message: ChatMessage)[source]¶
A model requested a function call, and the kani runtime resolved it.
- Parameters:
is_model_turn – True if the model should immediately react; False if the user speaks next.
message – The message containing the result of the function call, to add to the chat history.
- class kani.ExceptionHandleResult(should_retry: bool, message: ChatMessage)[source]¶
A function call raised an exception, and the kani runtime has prompted the model with exception information.
- Parameters:
should_retry – Whether the model should be allowed to retry the call that caused this exception.
message – The message containing details about the exception and/or instructions to retry, to add to the chat history.
Engines¶
See Engine Reference.
Utilities¶
- kani.chat_in_terminal(
- kani: Kani,
- *,
- rounds: int = 0,
- stopword: str = None,
- echo: bool = False,
- ai_first: bool = False,
- width: int = None,
- show_function_args: bool = False,
- show_function_returns: bool = False,
- verbose: bool = False,
- stream: bool = True,
Chat with a kani right in your terminal.
Useful for playing with kani, quick prompt engineering, or demoing the library.
If the environment variable KANI_DEBUG is set, debug logging will be enabled.
If kani-multimodal-core is installed, you can send multimodal media to a compatible engine with a file path or URL after an @ symbol (e.g. “Describe this image: @image.png”). Use quotes (e.g. @"path/to/my image.png") for paths with spaces in their names.
Warning
This function is only a development utility and should not be used in production.
- Parameters:
rounds (int) – The number of chat rounds to play (defaults to 0 for infinite).
stopword (str) – Break out of the chat loop if the user sends this message.
echo (bool) – Whether to echo the user’s input to stdout after they send a message (e.g. to save in interactive notebook outputs; default false)
ai_first (bool) – Whether the user should send the first message (default) or the model should generate a completion before prompting the user for a message.
width (int) – The maximum width of the printed outputs (default unlimited).
show_function_args (bool) – Whether to print the arguments the model is calling functions with for each call (default false).
show_function_returns (bool) – Whether to print the results of each function call (default false).
verbose (bool) – Equivalent to setting echo, show_function_args, and show_function_returns to True.
stream (bool) – Whether or not to print tokens as soon as they are generated by the model (default true).
- async kani.chat_in_terminal_async(
- kani: Kani,
- *,
- rounds: int = 0,
- stopword: str = None,
- echo: bool = False,
- ai_first: bool = False,
- width: int = None,
- show_function_args: bool = False,
- show_function_returns: bool = False,
- verbose: bool = False,
- stream: bool = True,
Async version of chat_in_terminal(). Use in environments when there is already an asyncio loop running (e.g. Google Colab).
- kani.format_width(msg: str, width: int = None, prefix: str = '') str[source]¶
Format the given message such that the width of each line is less than width. If prefix and width are provided, indents each line after the first by the length of the prefix.
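The wrapping behaviour described here resembles what the standard library's textwrap module provides. The following is a minimal sketch of that idea, not the library's implementation; it assumes the prefix is emitted at the start of the first line and that an unset width means no wrapping:

```python
import textwrap


def format_width(msg, width=None, prefix=""):
    """Wrap msg to `width` columns; continuation lines are indented to align past the prefix."""
    if not width:
        return prefix + msg  # assumption: no width means no wrapping
    wrapper = textwrap.TextWrapper(
        width=width,
        initial_indent=prefix,
        subsequent_indent=" " * len(prefix),
    )
    return wrapper.fill(msg)


wrapped = format_width("The quick brown fox jumps over the lazy dog", width=20, prefix="AI: ")
lines = wrapped.splitlines()
```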
- async kani.format_stream(stream: StreamManager, width: int = None, prefix: str = '') AsyncIterable[str][source]¶
Yield formatted tokens from a stream such that if concatenated, the width of each line is less than width. If prefix and width are provided, indents each line after the first by the length of the prefix.
- kani.print_width(msg: str, width: int = None, prefix: str = '')[source]¶
Print the given message such that the width of each line is less than width. If prefix and width are provided, indents each line after the first by the length of the prefix.
- async kani.print_stream(stream: StreamManager, width: int = None, prefix: str = '')[source]¶
Print tokens from a stream to the terminal, with the width of each line less than width. If prefix and width are provided, indents each line after the first by the length of the prefix.
This is a helper function intended to be used with Kani.chat_round_stream() or Kani.full_round_stream().
Message Formatters¶
A couple of convenience formatters to customize Kani.full_round_str().
You can pass any of these functions in with, e.g., Kani.full_round_str(..., message_formatter=all_message_contents).
- kani.utils.message_formatters.all_message_contents(msg: ChatMessage)[source]¶
Return the content of any message.
- kani.utils.message_formatters.assistant_message_contents(msg: ChatMessage, show_reasoning=True, color=True)[source]¶
Return the content of any assistant message; otherwise don’t return anything.
- Parameters:
show_reasoning – If True, include any ReasoningParts in the output.
color – If True, the returned reasoning parts will be surrounded in ANSI codes to make it appear gray.
- kani.utils.message_formatters.assistant_message_contents_thinking(msg: ChatMessage, show_args=False, show_reasoning=True, color=True)[source]¶
Return the content of any assistant message, and “Thinking…” on function calls.
You can use this in full_round_str by using a partial, e.g.:
ai.full_round_str(..., message_formatter=functools.partial(assistant_message_contents_thinking, show_args=True))
- Parameters:
show_args – If True, include the arguments to each function call.
show_reasoning – If True, include any ReasoningParts in the output.
color – If True, the returned reasoning parts will be surrounded in ANSI codes to make it appear gray.
- kani.utils.message_formatters.assistant_message_thinking(msg: ChatMessage, show_args=False)[source]¶
Return “Thinking…” on assistant messages with function calls, ignoring any content.
This is useful if you are streaming the message’s contents.
If show_args is True, include the arguments to each function call.
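A message formatter is simply a callable that takes a message and returns a string (or nothing), so extra options are typically bound with functools.partial. The sketch below shows that pattern with a hypothetical thinking_formatter operating on plain dicts; the exact strings the library's formatters emit are not reproduced here:

```python
import functools


def thinking_formatter(msg, show_args=False):
    """Hypothetical formatter: msg is a plain dict standing in for ChatMessage."""
    if msg["role"] != "assistant":
        return None  # non-assistant messages produce no output
    lines = [msg["text"]] if msg.get("text") else []
    for call in msg.get("tool_calls", []):
        detail = f" ({call['name']}: {call['arguments']})" if show_args else ""
        lines.append(f"Thinking...{detail}")
    return "\n".join(lines)


# Bind show_args=True so the result has the one-argument shape a formatter needs.
verbose_formatter = functools.partial(thinking_formatter, show_args=True)

reply = {
    "role": "assistant",
    "text": "One moment.",
    "tool_calls": [{"name": "get_time", "arguments": "{}"}],
}
brief = thinking_formatter(reply)
detailed = verbose_formatter(reply)
```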
Model-Specific Parsers¶
Model parsers are used when you have an LLM’s text output, which may contain tool calls or other interleaved content in their raw format (e.g., reasoning output). They translate the raw text format into Kani’s tool calling specification and MessageParts.
Tool parsers are WrapperEngines – this means to use them, you should wrap the text-only engine (e.g., a HuggingEngine) like so:
from kani.engines.huggingface import HuggingEngine
from kani.model_specific.gpt_oss import GPTOSSParser
model = HuggingEngine("openai/gpt-oss-20b")
engine = GPTOSSParser(model)
- class kani.model_specific.BaseParser(
- *args,
- tool_call_start_token: str | None = None,
- tool_call_end_token: str | None = None,
- reasoning_start_token: str | None = None,
- reasoning_end_token: str | None = None,
- reasoning_always_at_start=False,
- show_reasoning_in_stream=False,
- reasoning_in_stream_color=True,
- **kwargs,
Abstract base class for model-specific tool call/reasoning parsers.
To implement your own tool call/reasoning parser, subclass this class and:
implement parse_tool_calls(self, content: str) -> tuple[str, list[ToolCall]]
implement parse_reasoning(self, content: str) -> list[MessagePart]
pass default values of tool_call_start_token, tool_call_end_token, reasoning_start_token, and reasoning_end_token to super().__init__(...)
This class will handle calling the parser and interrupting streams when tool calls/reasoning are detected.
- Parameters:
tool_call_start_token – The token used to delimit the start of a tool call. Used to determine when to buffer streams.
tool_call_end_token – The token used to delimit the end of a tool call. Used to determine when to yield streams.
reasoning_start_token – The token used to delimit the start of a reasoning segment. Used to determine when to buffer streams.
reasoning_end_token – The token used to delimit the end of a reasoning segment. Used to determine when to buffer streams and implement default reasoning parsing behaviour.
reasoning_always_at_start – Whether the model’s response always starts with reasoning and should be buffered until the reasoning_end_token is seen while streaming.
show_reasoning_in_stream – Whether reasoning tokens should be yielded during streams. By default, only non-reasoning tokens will be yielded, and reasoning tokens will be included in a ReasoningPart in the final ChatMessage. This does not change the final returned ChatMessage, it only affects streamed tokens.
reasoning_in_stream_color – If True, wraps yielded reasoning tokens in an ANSI color code to make them appear gray when printed in a terminal. Only takes effect when show_reasoning_in_stream=True.
- parse_tool_calls(content: str) tuple[str, list[ToolCall]][source]¶
Given the string completion of the model, return the content without tool calls and the parsed tool calls.
- parse_reasoning(content: str) str | list[MessagePart | str][source]¶
Given the string completion of the model (after parsing tool calls), return the content with reasoning transformed to ReasoningParts.
- parse_completion(completion: BaseCompletion) BaseCompletion[source]¶
Single-step parsing, if you prefer handling it all in one place. By default, calls parse_tool_calls() and parse_reasoning().
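The two parsing stages can be sketched as standalone functions, independent of the library. The example below assumes the `<tool_call>` and `<think>` delimiters and a JSON tool call payload; tool calls are returned as plain dicts and reasoning as `{"reasoning": ...}` dicts, which are stand-ins for ToolCall and ReasoningPart rather than the real types:

```python
import json
import re


def parse_tool_calls(content, start="<tool_call>", end="</tool_call>"):
    """Return (content without tool calls, list of parsed calls)."""
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)
    calls = [json.loads(body) for body in pattern.findall(content)]
    return pattern.sub("", content).strip(), calls


def parse_reasoning(content, start="<think>", end="</think>"):
    """Split content into plain strings and {"reasoning": ...} parts, in order."""
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)
    parts, pos = [], 0
    for m in pattern.finditer(content):
        before = content[pos:m.start()].strip()
        if before:
            parts.append(before)
        parts.append({"reasoning": m.group(1).strip()})
        pos = m.end()
    rest = content[pos:].strip()
    if rest:
        parts.append(rest)
    return parts


raw = (
    "<think>Need the weather.</think>Checking now."
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Tokyo"}}</tool_call>'
)
text, calls = parse_tool_calls(raw)   # first strip and parse tool calls
parts = parse_reasoning(text)         # then split out reasoning segments
```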
- async predict(messages, functions=None, **hyperparams) BaseCompletion[source]¶
Given the current context of messages and available functions, get the next predicted chat message from the LM.
- Parameters:
messages – The messages in the current chat context. prompt_len(messages, functions) is guaranteed to be less than max_context_size.
functions – The functions the LM is allowed to call.
hyperparams – Any additional parameters to pass to the engine.
- async stream(messages, functions=None, **hyperparams)[source]¶
Optional: Stream a completion from the engine, token-by-token.
This method’s signature is the same as BaseEngine.predict().
This method should yield strings as an asynchronous iterable.
Optionally, this method may also yield a BaseCompletion. If it does, it MUST be the last item yielded by this method.
If an engine does not implement streaming, this method will yield the entire text of the completion in a single chunk by default.
- Parameters:
messages – The messages in the current chat context. prompt_len(messages, functions) is guaranteed to be less than max_context_size.
functions – The functions the LM is allowed to call.
hyperparams – Any additional parameters to pass to the engine.
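The streaming contract (strings first, then optionally one completion object as the final item) can be illustrated with a small asyncio sketch. The Completion class and the token list are hypothetical; only the yield ordering is taken from the description above:

```python
import asyncio


class Completion:
    """Hypothetical stand-in for BaseCompletion."""
    def __init__(self, text):
        self.text = text


async def stream(tokens):
    """Yield each token as a string, then the completion object as the LAST item."""
    for token in tokens:
        await asyncio.sleep(0)  # stand-in for waiting on the model
        yield token
    yield Completion("".join(tokens))


async def consume():
    chunks, final = [], None
    async for item in stream(["Hel", "lo", "!"]):
        if isinstance(item, str):
            chunks.append(item)
        else:
            final = item  # the completion arrives after every string chunk
    return chunks, final


chunks, final = asyncio.run(consume())
```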
- class kani.model_specific.gpt_oss.GPTOSSParser(*args, **kwargs)[source]¶
Automatically handles the parsing of GPT-OSS reasoning segments and tool calls.
Reasoning segments are returned as ReasoningParts.
- Parameters:
tool_call_start_token – The token used to delimit the start of a tool call. Used to determine when to buffer streams.
tool_call_end_token – The token used to delimit the end of a tool call. Used to determine when to yield streams.
reasoning_start_token – The token used to delimit the start of a reasoning segment. Used to determine when to buffer streams.
reasoning_end_token – The token used to delimit the end of a reasoning segment. Used to determine when to buffer streams and implement default reasoning parsing behaviour.
reasoning_always_at_start – Whether the model’s response always starts with reasoning and should be buffered until the reasoning_end_token is seen while streaming.
show_reasoning_in_stream – Whether reasoning tokens should be yielded during streams. By default, only non-reasoning tokens will be yielded, and reasoning tokens will be included in a ReasoningPart in the final ChatMessage. This does not change the final returned ChatMessage, it only affects streamed tokens.
reasoning_in_stream_color – If True, wraps yielded reasoning tokens in an ANSI color code to make them appear gray when printed in a terminal. Only takes effect when show_reasoning_in_stream=True.
- class kani.model_specific.json.NaiveJSONToolCallParser(*args, **kwargs)[source]¶
If the model’s output contains only valid JSON of form:
{ "name": "function_name", "parameters": { "key": "value..." } }
then assume it is a function call. Otherwise, return the content unchanged.
- Parameters:
tool_call_start_token – The token used to delimit the start of a tool call. Used to determine when to buffer streams.
tool_call_end_token – The token used to delimit the end of a tool call. Used to determine when to yield streams.
reasoning_start_token – The token used to delimit the start of a reasoning segment. Used to determine when to buffer streams.
reasoning_end_token – The token used to delimit the end of a reasoning segment. Used to determine when to buffer streams and implement default reasoning parsing behaviour.
reasoning_always_at_start – Whether the model’s response always starts with reasoning and should be buffered until the reasoning_end_token is seen while streaming.
show_reasoning_in_stream – Whether reasoning tokens should be yielded during streams. By default, only non-reasoning tokens will be yielded, and reasoning tokens will be included in a ReasoningPart in the final ChatMessage. This does not change the final returned ChatMessage, it only affects streamed tokens.
reasoning_in_stream_color – If True, wraps yielded reasoning tokens in an ANSI color code to make them appear gray when printed in a terminal. Only takes effect when show_reasoning_in_stream=True.
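The "only valid JSON of this form" check for the naive JSON parser can be sketched with the standard json module. The naive_json_tool_call helper is hypothetical, and returning an empty content string alongside a detected call is an assumption of this sketch:

```python
import json


def naive_json_tool_call(content):
    """Return (content, call): call is a dict if content is exactly a name/parameters object."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        return content, None  # not JSON at all: leave the content unchanged
    is_call = (
        isinstance(data, dict)
        and set(data) == {"name", "parameters"}
        and isinstance(data["name"], str)
        and isinstance(data["parameters"], dict)
    )
    if not is_call:
        return content, None  # valid JSON, but not the expected shape
    return "", data  # assumption: the text content is dropped when a call is found


text, call = naive_json_tool_call('{"name": "get_weather", "parameters": {"city": "Tokyo"}}')
```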
- class kani.model_specific.mistral.MistralToolCallParser(*args, tool_call_start_token: str = '[TOOL_CALLS]', tool_call_end_token: str = '</s>', **kwargs)[source]¶
Tool calling adapter for Mistral models using the v3 or v7 tokenizer:
--- v3 ---
mistral-tiny-2407
open-mixtral-8x22b-2404
mistral-small-2409
mistral-large-2407
codestral-2405
codestral-mamba-2407
--- v7 ---
mistral-large-2411
- Parameters:
tool_call_start_token – The token used to delimit the start of a tool call. Used to determine when to buffer streams.
tool_call_end_token – The token used to delimit the end of a tool call. Used to determine when to yield streams.
reasoning_start_token – The token used to delimit the start of a reasoning segment. Used to determine when to buffer streams.
reasoning_end_token – The token used to delimit the end of a reasoning segment. Used to determine when to buffer streams and implement default reasoning parsing behaviour.
reasoning_always_at_start – Whether the model’s response always starts with reasoning and should be buffered until the reasoning_end_token is seen while streaming.
show_reasoning_in_stream – Whether reasoning tokens should be yielded during streams. By default, only non-reasoning tokens will be yielded, and reasoning tokens will be included in a ReasoningPart in the final ChatMessage. This does not change the final returned ChatMessage, it only affects streamed tokens.
reasoning_in_stream_color – If True, wraps yielded reasoning tokens in an ANSI color code to make them appear gray when printed in a terminal. Only takes effect when show_reasoning_in_stream=True.
- class kani.model_specific.deepseek.DeepSeekR1Parser(
- *args,
- tool_call_start_token: str = '<|tool▁calls▁begin|>',
- tool_call_end_token: str = '<|tool▁outputs▁end|>',
- reasoning_start_token: str | None = '<think>',
- reasoning_end_token: str | None = '</think>',
- reasoning_always_at_start=True,
- **kwargs,
Tool calling adapter for DeepSeek models using the R1 tool call format:
deepseek-ai/DeepSeek-R1
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
deepseek-ai/DeepSeek-R1-Distill-Llama-8B
deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
deepseek-ai/DeepSeek-R1-Distill-Llama-70B
Reasoning segments are returned as ReasoningParts.
- Parameters:
tool_call_start_token – The token used to delimit the start of a tool call. Used to determine when to buffer streams.
tool_call_end_token – The token used to delimit the end of a tool call. Used to determine when to yield streams.
reasoning_start_token – The token used to delimit the start of a reasoning segment. Used to determine when to buffer streams.
reasoning_end_token – The token used to delimit the end of a reasoning segment. Used to determine when to buffer streams and implement default reasoning parsing behaviour.
reasoning_always_at_start – Whether the model’s response always starts with reasoning and should be buffered until the reasoning_end_token is seen while streaming.
show_reasoning_in_stream – Whether reasoning tokens should be yielded during streams. By default, only non-reasoning tokens will be yielded, and reasoning tokens will be included in a ReasoningPart in the final ChatMessage. This does not change the final returned ChatMessage, it only affects streamed tokens.
reasoning_in_stream_color – If True, wraps yielded reasoning tokens in an ANSI color code to make them appear gray when printed in a terminal. Only takes effect when show_reasoning_in_stream=True.
- class kani.model_specific.qwen3.Qwen3Parser(
- *args,
- tool_call_start_token='<tool_call>',
- tool_call_end_token='</tool_call>',
- reasoning_start_token='<think>',
- reasoning_end_token='</think>',
- reasoning_always_at_start=False,
- **kwargs,
Tool calling + reasoning adapter for Qwen3 models:
Qwen/Qwen3-*
Reasoning segments are returned as ReasoningParts.
- Parameters:
tool_call_start_token – The token used to delimit the start of a tool call. Used to determine when to buffer streams.
tool_call_end_token – The token used to delimit the end of a tool call. Used to determine when to yield streams.
reasoning_start_token – The token used to delimit the start of a reasoning segment. Used to determine when to buffer streams.
reasoning_end_token – The token used to delimit the end of a reasoning segment. Used to determine when to buffer streams and implement default reasoning parsing behaviour.
reasoning_always_at_start – Whether the model’s response always starts with reasoning and should be buffered until the reasoning_end_token is seen while streaming.
show_reasoning_in_stream – Whether reasoning tokens should be yielded during streams. By default, only non-reasoning tokens will be yielded, and reasoning tokens will be included in a ReasoningPart in the final ChatMessage. This does not change the final returned ChatMessage, it only affects streamed tokens.
reasoning_in_stream_color – If True, wraps yielded reasoning tokens in an ANSI color code to make them appear gray when printed in a terminal. Only takes effect when show_reasoning_in_stream=True.
Saving/Loading¶
- kani.utils.saveload.get_ctx(info) KaniZipSaveContext | None[source]¶
Get the KaniZipSaveContext from a SerializationInfo/ValidationInfo object.
- class kani.utils.saveload.KaniZipSaveContext(zf: zipfile.ZipFile)[source]¶