LlamaCppEngine ============== If your language model backend is available with GGUF, kani includes a base engine that implements a prediction pipeline. **TL;DR** .. code-block:: python from kani.engines.huggingface import ChatTemplatePromptPipeline from kani.engines.llamacpp import LlamaCppEngine pipeline = ChatTemplatePromptPipeline.from_pretrained("org-id/base-model-id") engine = LlamaCppEngine(repo_id="org-id/quant-model-id", filename="*.your-quant-type.gguf", prompt_pipeline=pipeline) .. important:: .. versionadded:: 1.4.0 For most models that use a chat template, you won't need to create a new engine class - kani will automatically use a `Chat Template `_ if a model has one included. This means you can safely ignore this section of the documentation for most use cases! Just use: .. code-block:: python from kani.engines.llamacpp import LlamaCppEngine engine = LlamaCppEngine(repo_id="your-org/your-model-id", filename="*Q4_K_M.gguf") kani uses `llama-cpp-python `_ for binding to the llama.cpp runtime. .. versionadded:: 1.0.0 For most models that use a chat format, you won't even need to create a new engine class - instead, you can pass a :class:`.PromptPipeline` to the :class:`.LlamaCppEngine`. If you do create a new engine, instead of having to implement the prediction logic, all you have to do is subclass :class:`.LlamaCppEngine` and implement :meth:`~.LlamaCppEngine.build_prompt`. .. autoclass:: kani.engines.llamacpp.LlamaCppEngine :noindex: .. automethod:: kani.engines.llamacpp.LlamaCppEngine.build_prompt :noindex: