Message Extras (Metadata)

Message Extras are a way to attach engine-specific metadata to a ChatMessage object. They are stored in a dictionary that remains attached to the message for its entire lifetime.

Warning

Don’t confuse Message Extras with Message Parts. Message Extras are used for engine-specific metadata about a certain message (e.g., an internal ID or detailed engine-specific usage data), whereas Message Parts are used for engine-agnostic inputs that require richer representation than a string (e.g., a multimodal input or a hidden chain of thought).

You can use Message Extras to store additional semi-structured information for your own downstream use. Certain engines may also use Message Extras to return engine-specific metadata.

ChatMessage.extra: dict

Specific engines may store additional extra data in this dictionary. See an engine’s documentation for details about any extras it may store or expect.

This dictionary is persisted to disk only on a best-effort basis: any value that is neither JSON-serializable nor a Pydantic class is saved as its repr. Upon loading, values may not have the same type they were saved with (for example, Pydantic objects are loaded as dicts).
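To make the best-effort behavior concrete, here is a minimal sketch using only the standard library. It is not kani's persistence code: the helper name `dump_extra_best_effort` is hypothetical, and Pydantic handling is left out. It only illustrates how a value that JSON cannot encode ends up as a string after a save/load round trip.

```python
import json


def dump_extra_best_effort(extra: dict) -> str:
    """Hypothetical helper (not part of kani): serialize an extras dict,
    falling back to repr() for any value that json cannot encode natively."""
    # `default` is only called for values the encoder does not know how to handle
    return json.dumps(extra, default=repr)


extra = {
    "request_id": "abc123",  # JSON-serializable: survives unchanged
    "latency_ms": 41.5,      # JSON-serializable: survives unchanged
    "tags": {"draft"},       # a set is not JSON-serializable: stored as its repr
}

saved = dump_extra_best_effort(extra)
loaded = json.loads(saved)

print(type(loaded["tags"]).__name__)  # the set comes back as a str
```

If you need a value to round-trip with its original type, a safer assumption is to store it in a JSON-friendly form (for example, a sorted list instead of a set) before putting it in the extras dictionary.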

The MessagePart class also contains a similar attribute for the same purpose:

MessagePart.extra: dict

Specific engines may store additional extra data in this dictionary. See an engine’s documentation for details about any extras it may store or expect.

Example (OpenAI)

One example of using Message Extras is to retrieve an OpenAI-specific usage object. Although kani returns basic prompt_tokens and completion_tokens usage attributes with most completions, OpenAI completions contain much more detailed usage:

"usage": {
    "prompt_tokens": 1117,
    "completion_tokens": 46,
    "total_tokens": 1163,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
}

However, this detailed usage is only available when using the OpenAIEngine. To store it without interfering with other engines, kani saves the usage object to the "openai_usage" message extra.

class kani.engines.openai.OpenAIEngine(
    api_key: str = None,
    model='gpt-4.1-nano',
    max_context_size: int = None,
    *,
    api_type: Literal['chat_completions', 'responses'] = None,
    organization: str = None,
    retry: int = 5,
    api_base: str = 'https://api.openai.com/v1',
    headers: dict = None,
    client: AsyncOpenAI = None,
    tokenizer=None,
    **hyperparams,
)

Engine for using the OpenAI API.

This engine supports all chat-based models and fine-tunes.

Multimodal support: images, audio.

Message Extras

"openai_completion": The ChatCompletion (raw response) returned by the OpenAI servers, as a dictionary. Non-streaming responses only.
"openai_usage": The usage data (raw response) returned by the OpenAI servers, as a dictionary.

Parameters:

api_key – Your OpenAI API key. By default, the API key will be read from the OPENAI_API_KEY environment variable.
model – The id of the model to use (e.g. “gpt-4o-mini”, “ft:gpt-3.5-turbo:my-org:custom_suffix:id”).
max_context_size – The maximum number of tokens allowed in the chat prompt. If None, uses the given model’s full context size.
api_type – Whether to use the Chat Completions API (default for most models) or Responses API (default for “deep-reasoning” style models). If unset, the best API type for the given model will be chosen.
organization – The OpenAI organization to use in requests. By default, the org ID is read from the OPENAI_ORG_ID environment variable (defaults to the API key’s default org if not set).
retry – How many times the engine should retry failed HTTP calls with exponential backoff (default 5).
api_base – The base URL of the OpenAI API to use.
headers – A dict of HTTP headers to include with each request.
client – An instance of openai.AsyncOpenAI (for reusing the same client in multiple engines). You must specify exactly one of (api_key, client). If this is passed, the organization, retry, api_base, and headers params will be ignored.
tokenizer – The tokenizer to use for token estimation; for OpenAI models this will be loaded automatically. It should expose a .encode(text: str) method that returns a list (usually of token IDs).
hyperparams – The arguments to pass to the create_chat_completion call with each request. See https://platform.openai.com/docs/api-reference/chat/create for a full list of params.

We can therefore access this usage as follows:

from kani.engines.openai import OpenAIEngine
from kani import Kani

async def detailed_usage():
    engine = OpenAIEngine(model="gpt-5-nano")
    ai = Kani(engine)
    msg = await ai.chat_round("How many 'o's are in 'pneumonoultramicroscopicsilicovolcanoconiosis'?")
    print(msg.text)
    print(msg.extra.get("openai_usage"))

>>> import asyncio
>>> asyncio.run(detailed_usage())
9
Reason: Break it into segments—pneumo, noultra, micro, scopic, silico, volcano, coniosis—each contains 1,1,1,1,1,2,2 o’s respectively, totaling 9.

CompletionUsage(
    completion_tokens=1663,
    prompt_tokens=30,
    total_tokens=1693,
    completion_tokens_details=CompletionTokensDetails(
        accepted_prediction_tokens=0,
        audio_tokens=0,
        reasoning_tokens=1600,
        rejected_prediction_tokens=0
    ),
    prompt_tokens_details=PromptTokensDetails(
        audio_tokens=0,
        cached_tokens=0
    )
)
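Because the extra may arrive as a plain dictionary or as an object like the one printed above, downstream code may want to normalize it before reading nested fields. The following is a sketch under assumptions: `usage_to_dict` and `reasoning_tokens` are hypothetical helpers, not part of kani, and the object branch assumes a Pydantic v2 style `model_dump()` method.

```python
def usage_to_dict(usage):
    """Hypothetical helper: return the usage data as a plain dict, or None.

    Accepts a dict, or an object exposing model_dump() (assumed here to be
    a Pydantic v2 style model); any other shape is treated as unavailable.
    """
    if usage is None:
        return None
    if isinstance(usage, dict):
        return usage
    if hasattr(usage, "model_dump"):
        return usage.model_dump()
    return None


def reasoning_tokens(usage) -> int:
    """Hypothetical helper: read the nested reasoning token count, or 0 if absent."""
    data = usage_to_dict(usage) or {}
    details = data.get("completion_tokens_details") or {}
    return details.get("reasoning_tokens") or 0


# Values taken from the example output above
example_usage = {
    "completion_tokens": 1663,
    "prompt_tokens": 30,
    "total_tokens": 1693,
    "completion_tokens_details": {"reasoning_tokens": 1600},
}
print(reasoning_tokens(example_usage))  # 1600
print(reasoning_tokens(None))           # 0
```

With a real message, this would be called as `reasoning_tokens(msg.extra.get("openai_usage"))`, which also covers messages produced by engines that do not set the extra.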

Best Practices

A given extra MAY be absent from a ChatMessage returned by a different engine.
Downstream code SHOULD NOT rely on the presence of a certain extra, but MAY conditionally check for the presence of certain extras for logging purposes. Reliance on certain extras tightly couples code with a certain engine.
An engine SHOULD NOT rely on an extra it set in a past round being present in future rounds.
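The conditional-check guideline can be sketched as follows. This is an illustration, not kani code: `log_usage_if_present` is a hypothetical helper, and `SimpleNamespace` objects stand in for ChatMessage instances so the snippet runs without an API key. The only property it relies on is that a message has an `extra` dictionary.

```python
import logging
from types import SimpleNamespace

log = logging.getLogger("usage")


def log_usage_if_present(msg) -> bool:
    """Hypothetical helper: log OpenAI usage when the extra exists.

    Returns True if something was logged. A missing extra is not an error,
    so the same code works with messages from any engine.
    """
    usage = msg.extra.get("openai_usage")
    if usage is None:
        return False
    log.info("OpenAI usage: %s", usage)
    return True


# Stand-ins for ChatMessage objects (assumption: only `.extra` is needed)
from_openai = SimpleNamespace(extra={"openai_usage": {"total_tokens": 1693}})
from_other_engine = SimpleNamespace(extra={})

print(log_usage_if_present(from_openai))        # True
print(log_usage_if_present(from_other_engine))  # False
```

Using `.get()` rather than indexing keeps the logging path optional, so swapping the engine does not break the calling code.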