High-performance Android SDK for on-device LLM inference (GGUF). Privacy-focused, offline-first, and powered by llama.cpp with a clean Kotlin Coroutines API.
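The entry above advertises a Kotlin Coroutines API over llama.cpp. As a rough sketch of what such a binding usually looks like, the blocking native (JNI) inference call is wrapped in a `suspend` function so callers get structured concurrency instead of callbacks. All names here (`NativeEngine`, `completeBlocking`, `generate`) are hypothetical, not the SDK's actual API:

```kotlin
import kotlin.coroutines.resume
import kotlin.coroutines.suspendCoroutine

// Stand-in for a JNI binding to a native llama.cpp engine.
// In a real SDK this would load a GGUF model and run inference off-device-cloud,
// entirely on the phone; here it just emits fixed tokens for illustration.
class NativeEngine {
    fun completeBlocking(prompt: String, onToken: (String) -> Unit) {
        for (tok in listOf("Hello", ",", " world")) onToken(tok)
    }
}

// Suspend wrapper: bridges the synchronous native call into coroutine land,
// so UI code can `launch { engine.generate(...) }` without blocking the main thread.
suspend fun NativeEngine.generate(prompt: String): String =
    suspendCoroutine { cont ->
        val sb = StringBuilder()
        completeBlocking(prompt) { sb.append(it) }
        cont.resume(sb.toString())
    }

suspend fun main() {
    val engine = NativeEngine()
    println(engine.generate("Say hi"))  // → Hello, world
}
```

In a production binding the native call would run on a background dispatcher and support cancellation; this sketch only shows the callback-to-suspend bridge.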
KnoLo Core is a local-first knowledge base engine built for small language models (SLMs). It packages your documents into a compact .knolo file and enables fully deterministic querying: no embeddings, no vector databases, no cloud services required. Designed for on-device and edge LLM deployments.
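Deterministic, embedding-free retrieval of the kind KnoLo Core describes can be sketched with a plain inverted index: documents are ranked by exact term overlap, with ties broken by document id so results are fully reproducible. This is an illustrative sketch, not KnoLo's actual API or file format:

```kotlin
// Illustrative inverted-index retrieval: no vectors, no cloud calls,
// identical input always yields identical output.
fun tokenize(text: String): List<String> =
    text.lowercase().split(Regex("[^a-z0-9]+")).filter { it.isNotEmpty() }

class KnowledgeIndex(docs: Map<String, String>) {
    // term -> set of document ids containing that term
    private val postings = mutableMapOf<String, MutableSet<String>>()

    init {
        for ((id, body) in docs)
            for (term in tokenize(body).toSet())
                postings.getOrPut(term) { mutableSetOf() }.add(id)
    }

    // Rank by number of distinct query terms matched; break ties by id
    // so the ordering is deterministic across runs and devices.
    fun query(q: String): List<String> {
        val scores = mutableMapOf<String, Int>()
        for (term in tokenize(q).toSet())
            for (id in postings[term].orEmpty())
                scores.merge(id, 1, Int::plus)
        return scores.entries
            .sortedWith(
                compareByDescending<Map.Entry<String, Int>> { it.value }
                    .thenBy { it.key }
            )
            .map { it.key }
    }
}

fun main() {
    val index = KnowledgeIndex(mapOf(
        "doc1" to "Quantization shrinks model weights for edge devices",
        "doc2" to "Edge deployment of small language models",
        "doc3" to "Cloud services and vector databases"
    ))
    println(index.query("edge model quantization"))  // → [doc1, doc2]
}
```

The absence of embeddings trades semantic matching for exact reproducibility, which is often the right call when the retrieved passages feed a small on-device model.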
Documentation for MobileTransformers - a lightweight, modular framework based on ONNX Runtime for running and adapting large language models (LLMs) directly on mobile and edge devices. It supports on-device fine-tuning (PEFT), efficient inference, quantization, weight merging, and direct inference from merged models.
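The "weight merging" step mentioned above is, in LoRA-style PEFT, just the arithmetic W' = W + (α / r) · B·A: once the low-rank adapter (B, A) is folded into the base weight, inference runs on W' alone with no adapter overhead. A minimal sketch of that arithmetic (this is not MobileTransformers code):

```kotlin
typealias Matrix = Array<DoubleArray>

// Naive dense matrix product, sufficient for the small adapter matrices here.
fun matmul(b: Matrix, a: Matrix): Matrix {
    val out = Array(b.size) { DoubleArray(a[0].size) }
    for (i in b.indices)
        for (k in a.indices)
            for (j in a[0].indices)
                out[i][j] += b[i][k] * a[k][j]
    return out
}

// Fold a LoRA adapter into the base weight: W' = W + (alpha / rank) * (B x A).
// After merging, B and A can be discarded and inference uses W' directly.
fun mergeLora(w: Matrix, bMat: Matrix, aMat: Matrix, alpha: Double, rank: Int): Matrix {
    val delta = matmul(bMat, aMat)
    val scale = alpha / rank
    return Array(w.size) { i ->
        DoubleArray(w[0].size) { j -> w[i][j] + scale * delta[i][j] }
    }
}

fun main() {
    val w = arrayOf(doubleArrayOf(1.0, 0.0), doubleArrayOf(0.0, 1.0))  // 2 x 2 base weight
    val b = arrayOf(doubleArrayOf(1.0), doubleArrayOf(2.0))            // 2 x r, rank r = 1
    val a = arrayOf(doubleArrayOf(0.5, 0.5))                           // r x 2
    val merged = mergeLora(w, b, a, alpha = 2.0, rank = 1)
    println(merged.joinToString { it.joinToString(prefix = "[", postfix = "]") })
    // → [2.0, 1.0], [2.0, 3.0]
}
```

Merging matters on mobile because it removes the per-layer adapter matmul at inference time, which is exactly the "direct inference from merged models" capability the description lists.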