Extensions

Extensions are 3rd-party packages that extend the core functionality available in kani with more domain-specific tooling. With kani’s philosophy of hackability, extensions can provide new engines, feature sets, prompting frameworks, and even multimodality.

For 3rd party packages, we recommend using the kani.ext.* namespace (e.g. kani.ext.my_cool_extension).

To get started with building a kani extension package, you can use the kani extension template repository! This repo contains the right directory layout to use the kani.ext.* namespace and the PyPI packaging basics to make your package available on pip.

Community Extensions

If you’ve made a cool extension, add it to this list with a PR!

  • kani-multimodal-core: Adds support for multimodal inputs: images, audio, and video.

  • kani-ratelimits: Adds a wrapper engine to enforce request-per-minute (RPM), token-per-minute (TPM), and/or max-concurrency ratelimits before making requests to an underlying engine.

  • kani-ext-vllm: Adds support for loading models using vLLM with a local engine or managing a vLLM server process. The local vLLM engine is a drop-in replacement for a HuggingEngine.

  • kani-ext-realtime: Adds support for OpenAI’s Realtime API and full-duplex voice models with Kani.

Design Considerations

Engine Extensions

When building an engine extension, you have a couple options. If your extension provides an implementation for a new LLM, you can simply have your extension provide a new subclass of BaseEngine (see Engines for more information on how to build an engine).

Engine extensions do not necessarily have to be an LLM, however; for example, you might provide specific prompting frameworks as an engine that are compatible with any LLM of their own. In this case, we suggest having your engine wrap another engine:

"""An example showing how to wrap another kani engine."""

from kani.engines import WrapperEngine

# subclassing WrapperEngine automatically implements passthrough of untouched attributes
# to the wrapped engine!
class MyEngineWrapper(WrapperEngine):
    def prompt_len(self, messages, functions=None, **kwargs):
        # wrap the inner messages with the prompt framework...
        prompted_messages = [ChatMessage(...) for m in messages]
        return super().prompt_len(prompted_messages, functions, **kwargs)

    async def predict(self, messages, functions=None, **hyperparams):
        # wrap the messages with the prompt framework and pass it to the inner engine
        prompted_messages = [ChatMessage(...) for m in messages]
        prompted_completion = await super().predict(prompted_messages, ...)
        # unwrap the resulting message (if necessary) and store the metadata separately
        completion = self.unwrap(prompted_completion)
        return completion

The kani.engines.WrapperEngine is a base class that automatically creates a constructor that takes in the engine to wrap, and passes through any non-overriden attributes to the wrapped engine.

CLI Extensions

An extension package can define CLI entrypoints for use with the Kani CLI. In order to do this, an extension package must use the kani.ext.* namespace and define a CLI_PROVIDERS name in its __init__.py. This variable should be a 3-tuple: (name: str, aliases: list[str], entrypoint: (model_id) -> Engine).

For example, for the vLLM engine, which is provided through the kani.ext.vllm extension package, we define:

# kani/ext/vllm/__init__.py

# ...
CLI_PROVIDERS = [
    ("vllm", [], VLLMServerEngine),
]

The Kani CLI will automatically discover and provide CLI entrypoints defined in this way when called.