diff --git a/.gitignore b/.gitignore
index 42c6d12..7881c42 100644
--- a/.gitignore
+++ b/.gitignore
@@ -164,4 +164,5 @@ Thumbs.db
env
env/*
-__pycache__/
\ No newline at end of file
+__pycache__/
+devlog/
\ No newline at end of file
diff --git a/deployment/on-device/android/advanced-features.mdx b/deployment/on-device/android/advanced-features.mdx
index fd22267..2766be5 100644
--- a/deployment/on-device/android/advanced-features.mdx
+++ b/deployment/on-device/android/advanced-features.mdx
@@ -3,6 +3,11 @@ title: "Advanced Features"
description: "API reference for constrained generation and function calling in the LEAP Android SDK"
---
+
+**Legacy Documentation** — This page documents the standalone iOS/Android SDK which has been superseded by the unified [LEAP SDK](/deployment/on-device/leap-sdk/quick-start-guide). New projects should use the LEAP SDK.
+
+
+
## Constrained Generation
Please refer to the [Constrained Generation](./constrained-generation) guide for detailed usage.
diff --git a/deployment/on-device/android/ai-agent-usage-guide.mdx b/deployment/on-device/android/ai-agent-usage-guide.mdx
index fa3a9ac..d636f29 100644
--- a/deployment/on-device/android/ai-agent-usage-guide.mdx
+++ b/deployment/on-device/android/ai-agent-usage-guide.mdx
@@ -3,6 +3,11 @@ title: "AI Agent Usage Guide"
description: "Complete reference for using the LEAP Android SDK"
---
+
+**Legacy Documentation** — This page documents the standalone iOS/Android SDK which has been superseded by the unified [LEAP SDK](/deployment/on-device/leap-sdk/quick-start-guide). New projects should use the LEAP SDK.
+
+
+
## Core Architecture
```
diff --git a/deployment/on-device/android/android-quick-start-guide.mdx b/deployment/on-device/android/android-quick-start-guide.mdx
index 36a9b1d..f4ba22e 100644
--- a/deployment/on-device/android/android-quick-start-guide.mdx
+++ b/deployment/on-device/android/android-quick-start-guide.mdx
@@ -3,6 +3,11 @@ title: "Quick Start Guide"
description: "Get up and running with the LEAP Android SDK in minutes. Install the SDK, load models, and start generating content."
---
+
+**Legacy Documentation** — This page documents the standalone iOS/Android SDK which has been superseded by the unified [LEAP SDK](/deployment/on-device/leap-sdk/quick-start-guide). New projects should use the LEAP SDK.
+
+
+
Latest version: `v0.9.7`
diff --git a/deployment/on-device/android/cloud-ai-comparison.mdx b/deployment/on-device/android/cloud-ai-comparison.mdx
index 2f07445..7d562d5 100644
--- a/deployment/on-device/android/cloud-ai-comparison.mdx
+++ b/deployment/on-device/android/cloud-ai-comparison.mdx
@@ -4,6 +4,11 @@ description: "Compare LEAP Android SDK with cloud-based AI APIs like OpenAI"
sidebar_position: 5
---
+
+**Legacy Documentation** — This page documents the standalone iOS/Android SDK which has been superseded by the unified [LEAP SDK](/deployment/on-device/leap-sdk/quick-start-guide). New projects should use the LEAP SDK.
+
+
+
If you are familiar with cloud-based AI APIs (e.g. [OpenAI API](https://openai.com/api/)), this document
shows the similarities and differences between these cloud APIs and Leap.
diff --git a/deployment/on-device/android/constrained-generation.mdx b/deployment/on-device/android/constrained-generation.mdx
index f01220e..b3ee5af 100644
--- a/deployment/on-device/android/constrained-generation.mdx
+++ b/deployment/on-device/android/constrained-generation.mdx
@@ -4,6 +4,11 @@ description: "Generate structured JSON output with compile-time validation using
sidebar_position: 3
---
+
+**Legacy Documentation** — This page documents the standalone iOS/Android SDK which has been superseded by the unified [LEAP SDK](/deployment/on-device/leap-sdk/quick-start-guide). New projects should use the LEAP SDK.
+
+
+
Setting the `jsonSchemaConstraint` field in [`GenerationOptions`](./conversation-generation#generationoptions) will enable constrained generation. While it is possible to
directly set the constraint with raw JSON Schema strings, we recommend creating the constraints with the `Generatable` annotation.
diff --git a/deployment/on-device/android/conversation-generation.mdx b/deployment/on-device/android/conversation-generation.mdx
index c5b6b26..07a4124 100644
--- a/deployment/on-device/android/conversation-generation.mdx
+++ b/deployment/on-device/android/conversation-generation.mdx
@@ -3,6 +3,11 @@ title: "Conversation & Generation"
description: "API reference for conversations, model runners, and generation in the LEAP Android SDK"
---
+
+**Legacy Documentation** — This page documents the standalone iOS/Android SDK which has been superseded by the unified [LEAP SDK](/deployment/on-device/leap-sdk/quick-start-guide). New projects should use the LEAP SDK.
+
+
+
All functions listed in this document are safe to call from the main or UI thread and all callbacks will be run on the main thread, unless there are explicit instructions or explanations.
diff --git a/deployment/on-device/android/function-calling.mdx b/deployment/on-device/android/function-calling.mdx
index 0bfdc81..99369db 100644
--- a/deployment/on-device/android/function-calling.mdx
+++ b/deployment/on-device/android/function-calling.mdx
@@ -4,6 +4,11 @@ description: "Function calling allows the model to make requests to call some pr
sidebar_position: 4
---
+
+**Legacy Documentation** — This page documents the standalone iOS/Android SDK which has been superseded by the unified [LEAP SDK](/deployment/on-device/leap-sdk/quick-start-guide). New projects should use the LEAP SDK.
+
+
+
Not all models support function calling. Please check the model card before using the model for function calling.
diff --git a/deployment/on-device/android/messages-content.mdx b/deployment/on-device/android/messages-content.mdx
index 58daedc..80d74eb 100644
--- a/deployment/on-device/android/messages-content.mdx
+++ b/deployment/on-device/android/messages-content.mdx
@@ -3,6 +3,11 @@ title: "Messages & Content"
description: "API reference for chat messages and content types in the LEAP Android SDK"
---
+
+**Legacy Documentation** — This page documents the standalone iOS/Android SDK which has been superseded by the unified [LEAP SDK](/deployment/on-device/leap-sdk/quick-start-guide). New projects should use the LEAP SDK.
+
+
+
## `ChatMessage`
Data class that is compatible with the message object in OpenAI chat completion API.
diff --git a/deployment/on-device/android/model-loading.mdx b/deployment/on-device/android/model-loading.mdx
index db61957..e3024e5 100644
--- a/deployment/on-device/android/model-loading.mdx
+++ b/deployment/on-device/android/model-loading.mdx
@@ -3,6 +3,11 @@ title: "Model Loading"
description: "API reference for loading models in the LEAP SDK"
---
+
+**Legacy Documentation** — This page documents the standalone iOS/Android SDK which has been superseded by the unified [LEAP SDK](/deployment/on-device/leap-sdk/quick-start-guide). New projects should use the LEAP SDK.
+
+
+
The LEAP SDK is now **Kotlin Multiplatform** and provides two model loading options:
- **`LeapModelDownloader`** - Android-specific, recommended for Android apps (background downloads, notifications, WorkManager)
diff --git a/deployment/on-device/android/utilities.mdx b/deployment/on-device/android/utilities.mdx
index 7ff1772..45a8f77 100644
--- a/deployment/on-device/android/utilities.mdx
+++ b/deployment/on-device/android/utilities.mdx
@@ -3,6 +3,11 @@ title: "Utilities"
description: "API reference for error handling, serialization, and utilities in the LEAP Android SDK"
---
+
+**Legacy Documentation** — This page documents the standalone iOS/Android SDK which has been superseded by the unified [LEAP SDK](/deployment/on-device/leap-sdk/quick-start-guide). New projects should use the LEAP SDK.
+
+
+
## Error Handling
All errors are thrown as `LeapException`, which has the following subclasses:
diff --git a/deployment/on-device/ios/advanced-features.mdx b/deployment/on-device/ios/advanced-features.mdx
index 43c29cb..835da3d 100644
--- a/deployment/on-device/ios/advanced-features.mdx
+++ b/deployment/on-device/ios/advanced-features.mdx
@@ -3,6 +3,11 @@ title: "Advanced Features"
description: "API reference for constrained generation and function calling in the LEAP iOS SDK"
---
+
+**Legacy Documentation** — This page documents the standalone iOS/Android SDK which has been superseded by the unified [LEAP SDK](/deployment/on-device/leap-sdk/quick-start-guide). New projects should use the LEAP SDK.
+
+
+
## `GenerationOptions`
Tune generation behavior with `GenerationOptions`.
diff --git a/deployment/on-device/ios/ai-agent-usage-guide.mdx b/deployment/on-device/ios/ai-agent-usage-guide.mdx
index 442d17f..5ad6904 100644
--- a/deployment/on-device/ios/ai-agent-usage-guide.mdx
+++ b/deployment/on-device/ios/ai-agent-usage-guide.mdx
@@ -3,6 +3,11 @@ title: "AI Agent Usage Guide"
description: "Complete reference for using the LEAP iOS SDK"
---
+
+**Legacy Documentation** — This page documents the standalone iOS/Android SDK which has been superseded by the unified [LEAP SDK](/deployment/on-device/leap-sdk/quick-start-guide). New projects should use the LEAP SDK.
+
+
+
## Core Architecture
```
diff --git a/deployment/on-device/ios/cloud-ai-comparison.mdx b/deployment/on-device/ios/cloud-ai-comparison.mdx
index 1f2ae36..c6e3368 100644
--- a/deployment/on-device/ios/cloud-ai-comparison.mdx
+++ b/deployment/on-device/ios/cloud-ai-comparison.mdx
@@ -4,6 +4,11 @@ description: "Compare LEAP iOS SDK with cloud-based AI APIs like OpenAI"
sidebar_position: 5
---
+
+**Legacy Documentation** — This page documents the standalone iOS/Android SDK which has been superseded by the unified [LEAP SDK](/deployment/on-device/leap-sdk/quick-start-guide). New projects should use the LEAP SDK.
+
+
+
If you are familiar with cloud-based AI APIs (e.g. [OpenAI API](https://openai.com/api/)), this document
shows the similarities and differences between these cloud APIs and Leap.
diff --git a/deployment/on-device/ios/constrained-generation.mdx b/deployment/on-device/ios/constrained-generation.mdx
index 0b78cc6..849365d 100644
--- a/deployment/on-device/ios/constrained-generation.mdx
+++ b/deployment/on-device/ios/constrained-generation.mdx
@@ -4,6 +4,11 @@ description: "Generate structured JSON output with compile-time validation using
sidebar_position: 3
---
+
+**Legacy Documentation** — This page documents the standalone iOS/Android SDK which has been superseded by the unified [LEAP SDK](/deployment/on-device/leap-sdk/quick-start-guide). New projects should use the LEAP SDK.
+
+
+
LeapSDK provides powerful constrained generation capabilities using Swift macros that enable you to generate structured JSON output with compile-time validation. This feature ensures the AI model produces responses that conform to your predefined Swift types.
## Overview
diff --git a/deployment/on-device/ios/conversation-generation.mdx b/deployment/on-device/ios/conversation-generation.mdx
index 6a68942..04e874a 100644
--- a/deployment/on-device/ios/conversation-generation.mdx
+++ b/deployment/on-device/ios/conversation-generation.mdx
@@ -3,6 +3,11 @@ title: "Conversation & Generation"
description: "API reference for conversations, model runners, and generation in the LEAP iOS SDK"
---
+
+**Legacy Documentation** — This page documents the standalone iOS/Android SDK which has been superseded by the unified [LEAP SDK](/deployment/on-device/leap-sdk/quick-start-guide). New projects should use the LEAP SDK.
+
+
+
All functions listed in this document are safe to call from the main thread and all callbacks will be run on the main thread, unless there are explicit instructions or explanations.
diff --git a/deployment/on-device/ios/function-calling.mdx b/deployment/on-device/ios/function-calling.mdx
index 401e153..e4e7650 100644
--- a/deployment/on-device/ios/function-calling.mdx
+++ b/deployment/on-device/ios/function-calling.mdx
@@ -4,6 +4,11 @@ description: "Function calling allows the model to make requests to call some pr
sidebar_position: 4
---
+
+**Legacy Documentation** — This page documents the standalone iOS/Android SDK which has been superseded by the unified [LEAP SDK](/deployment/on-device/leap-sdk/quick-start-guide). New projects should use the LEAP SDK.
+
+
+
Not all models support function calling. Please check the model card before using the model for function calling.
diff --git a/deployment/on-device/ios/ios-quick-start-guide.mdx b/deployment/on-device/ios/ios-quick-start-guide.mdx
index d7d9a28..ccd1f89 100644
--- a/deployment/on-device/ios/ios-quick-start-guide.mdx
+++ b/deployment/on-device/ios/ios-quick-start-guide.mdx
@@ -3,6 +3,11 @@ title: "Quick Start Guide"
description: "Get up and running with the LEAP iOS SDK in minutes. Install the SDK, load models, and start generating content."
---
+
+**Legacy Documentation** — This page documents the standalone iOS/Android SDK which has been superseded by the unified [LEAP SDK](/deployment/on-device/leap-sdk/quick-start-guide). New projects should use the LEAP SDK.
+
+
+
Latest version: `v0.9.2`
## Prerequisites
diff --git a/deployment/on-device/ios/messages-content.mdx b/deployment/on-device/ios/messages-content.mdx
index d710d59..44144bc 100644
--- a/deployment/on-device/ios/messages-content.mdx
+++ b/deployment/on-device/ios/messages-content.mdx
@@ -3,6 +3,11 @@ title: "Messages & Content"
description: "API reference for chat messages and content types in the LEAP iOS SDK"
---
+
+**Legacy Documentation** — This page documents the standalone iOS/Android SDK which has been superseded by the unified [LEAP SDK](/deployment/on-device/leap-sdk/quick-start-guide). New projects should use the LEAP SDK.
+
+
+
## Chat Messages
### Roles
diff --git a/deployment/on-device/ios/model-loading.mdx b/deployment/on-device/ios/model-loading.mdx
index 4c56242..d96cfd5 100644
--- a/deployment/on-device/ios/model-loading.mdx
+++ b/deployment/on-device/ios/model-loading.mdx
@@ -3,6 +3,11 @@ title: "Model Loading"
description: "API reference for loading models in the LEAP iOS SDK"
---
+
+**Legacy Documentation** — This page documents the standalone iOS/Android SDK which has been superseded by the unified [LEAP SDK](/deployment/on-device/leap-sdk/quick-start-guide). New projects should use the LEAP SDK.
+
+
+
## `Leap`
`Leap` is the static entry point for loading on-device models.
diff --git a/deployment/on-device/ios/utilities.mdx b/deployment/on-device/ios/utilities.mdx
index 2218b18..794705e 100644
--- a/deployment/on-device/ios/utilities.mdx
+++ b/deployment/on-device/ios/utilities.mdx
@@ -3,6 +3,11 @@ title: "Utilities"
description: "API reference for error handling and utilities in the LEAP iOS SDK"
---
+
+**Legacy Documentation** — This page documents the standalone iOS/Android SDK which has been superseded by the unified [LEAP SDK](/deployment/on-device/leap-sdk/quick-start-guide). New projects should use the LEAP SDK.
+
+
+
## Errors
Errors are surfaced as `LeapError` values. The most common cases are:
diff --git a/deployment/on-device/leap-sdk/advanced-features.mdx b/deployment/on-device/leap-sdk/advanced-features.mdx
new file mode 100644
index 0000000..7b1043d
--- /dev/null
+++ b/deployment/on-device/leap-sdk/advanced-features.mdx
@@ -0,0 +1,455 @@
+---
+title: "Advanced Features"
+description: "API reference for advanced generation options, constrained generation, and function calling in the LEAP SDK"
+---
+
+## `GenerationOptions`
+
+Tune generation behavior with `GenerationOptions`. Leave a field as `nil` / `null` to fall back to the defaults packaged with the model bundle.
+
+
+
+
+
+```kotlin
+data class GenerationOptions(
+ val temperature: Float? = null,
+ val topP: Float? = null,
+ val minP: Float? = null,
+ val repetitionPenalty: Float? = null,
+ val jsonSchemaConstraint: String? = null,
+ val functionCallParser: LeapFunctionCallParser? = LFMFunctionCallParser(),
+ val injectSchemaIntoPrompt: Boolean = true,
+ val topK: Int? = null,
+ val rngSeed: Long? = null,
+ val enableThinking: Boolean = false,
+)
+```
+
+| Field | Type | Description |
+|---|---|---|
+| `temperature` | `Float?` | Sampling temperature. |
+| `topP` | `Float?` | Nucleus sampling probability mass. |
+| `minP` | `Float?` | Minimum probability threshold. |
+| `repetitionPenalty` | `Float?` | Penalizes repeated tokens. |
+| `jsonSchemaConstraint` | `String?` | JSON schema string for [Constrained Generation](./constrained-generation). |
+| `functionCallParser` | `LeapFunctionCallParser?` | Parser for tool-call tokens. Default is `LFMFunctionCallParser`. Use `HermesFunctionCallParser` for Hermes/Qwen3 formats, or `null` for raw text. |
+| `injectSchemaIntoPrompt` | `Boolean` | Whether to inject the JSON schema into the system prompt. |
+| `topK` | `Int?` | Top-K sampling: only the K most likely tokens are considered. |
+| `rngSeed` | `Long?` | Random number generator seed for reproducible outputs. |
+| `enableThinking` | `Boolean` | Enable the model's thinking/reasoning mode. |
+
+Use `setResponseFormat` to populate `jsonSchemaConstraint` from a `@Generatable`-annotated data class:
+
+```kotlin
+val options = GenerationOptions(temperature = 0.6f, topP = 0.9f)
+ .apply { setResponseFormat(CityFact::class) }
+
+conversation.generateResponse(
+ message = user,
+ generationOptions = options,
+).collect { response ->
+ // Handle structured output
+}
+```
+
+
+
+
+
+```swift
+public struct GenerationOptions {
+ public var temperature: Float?
+ public var topP: Float?
+ public var minP: Float?
+ public var repetitionPenalty: Float?
+ public var jsonSchemaConstraint: String?
+ public var functionCallParser: LeapFunctionCallParserProtocol?
+ public var topK: Int?
+ public var rngSeed: UInt64?
+ public var enableThinking: Bool
+ public var maxOutputTokens: UInt32?
+ public var sequenceLength: UInt32?
+ public var cacheControl: CacheControl?
+
+ public init(
+ temperature: Float? = nil,
+ topP: Float? = nil,
+ minP: Float? = nil,
+ repetitionPenalty: Float? = nil,
+ jsonSchemaConstraint: String? = nil,
+ functionCallParser: LeapFunctionCallParserProtocol? = LFMFunctionCallParser(),
+ topK: Int? = nil,
+ rngSeed: UInt64? = nil,
+ enableThinking: Bool = false,
+ maxOutputTokens: UInt32? = nil,
+ sequenceLength: UInt32? = nil,
+ cacheControl: CacheControl? = nil
+ )
+}
+```
+
+| Field | Type | Description |
+|---|---|---|
+| `temperature` | `Float?` | Sampling temperature. |
+| `topP` | `Float?` | Nucleus sampling probability mass. |
+| `minP` | `Float?` | Minimum probability threshold. |
+| `repetitionPenalty` | `Float?` | Penalizes repeated tokens. |
+| `jsonSchemaConstraint` | `String?` | JSON schema string for [Constrained Generation](./constrained-generation). |
+| `functionCallParser` | `LeapFunctionCallParserProtocol?` | Parser for tool-call tokens. Default is `LFMFunctionCallParser`. Use `HermesFunctionCallParser()` for Hermes/Qwen3 formats, or `nil` for raw text. |
+| `topK` | `Int?` | Top-K sampling: only the K most likely tokens are considered. |
+| `rngSeed` | `UInt64?` | Random number generator seed for reproducible outputs. |
+| `enableThinking` | `Bool` | Enable the model's thinking/reasoning mode. |
+| `maxOutputTokens` | `UInt32?` | Maximum number of tokens to generate. |
+| `sequenceLength` | `UInt32?` | Total sequence length (prompt + output). |
+| `cacheControl` | `CacheControl?` | Controls KV-cache behavior for the session. |
+
+Use `setResponseFormat(type:)` to populate `jsonSchemaConstraint` from a type annotated with the `@Generatable` macro:
+
+```swift
+extension GenerationOptions {
+ public mutating func setResponseFormat<T: GeneratableType>(type: T.Type) throws {
+ self.jsonSchemaConstraint = try JSONSchemaGenerator.getJSONSchema(for: type)
+ }
+}
+```
+
+```swift
+var options = GenerationOptions(temperature: 0.6, topP: 0.9)
+try options.setResponseFormat(type: CityFact.self)
+
+for try await response in conversation.generateResponse(
+ message: user,
+ generationOptions: options
+) {
+ // Handle structured output
+}
+```
+
+
+
+
+
+## Constrained Generation Utilities
+
+For full usage details, see the [Constrained Generation](./constrained-generation) guide.
+
+### `JSONSchemaGenerator`
+
+Generates a JSON schema string from a `@Generatable`-annotated type. This schema is passed to `GenerationOptions.jsonSchemaConstraint` to activate constrained generation.
+
+
+
+
+
+```kotlin
+package ai.liquid.leap.structuredoutput
+
+object JSONSchemaGenerator {
+ @Throws(LeapGeneratableSchematizationException::class)
+ fun <T : Any> getJSONSchema(
+ klass: KClass<T>,
+ indentSpaces: Int? = null,
+ ): String
+}
+```
+
+- `klass` -- the Kotlin class object created from `T::class`. It must be a data class annotated with `@Generatable`.
+- `indentSpaces` -- a non-null value pretty-prints the JSON output with the given number of indent spaces; `null` (the default) produces compact output.
+
+If the data class cannot be supported or any other issue blocks JSON schema generation, a `LeapGeneratableSchematizationException` is thrown.
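+
+As a hedged sketch, the schema for the `CityFact` class used in the `GenerationOptions` example above could be generated like this (the class definition here is illustrative):
+
+```kotlin
+import ai.liquid.leap.structuredoutput.Generatable
+import ai.liquid.leap.structuredoutput.Guide
+import ai.liquid.leap.structuredoutput.JSONSchemaGenerator
+
+// Illustrative data class; annotate with @Generatable and @Guide
+@Generatable("A fact about a city")
+data class CityFact(
+    @Guide("Name of the city") val city: String,
+    @Guide("One interesting fact about the city") val fact: String,
+)
+
+// Pretty-print the schema with two-space indentation
+val schema = JSONSchemaGenerator.getJSONSchema(CityFact::class, indentSpaces = 2)
+```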
+
+
+
+
+
+```swift
+public enum JSONSchemaGenerator {
+ public static func getJSONSchema<T: GeneratableType>(for type: T.Type) throws -> String
+}
+```
+
+Pass the result directly to `GenerationOptions.jsonSchemaConstraint` or use the convenience method `setResponseFormat(type:)`.
+
+
+
+
+
+### `GeneratableFactory`
+
+Deserializes a JSON object into an instance of a `@Generatable`-annotated type. Available on all platforms (commonMain).
+
+
+
+
+
+```kotlin
+package ai.liquid.leap.structuredoutput
+
+object GeneratableFactory {
+ @Throws(LeapGeneratableDeserializationException::class)
+ fun <T : Any> createFromJSONObject(
+ jsonObject: JSONObject,
+ klass: KClass<T>,
+ ): T
+
+ @Throws(LeapGeneratableDeserializationException::class)
+ inline fun <reified T : Any> createFromJSONObject(jsonObject: JSONObject): T {
+ return createFromJSONObject(jsonObject, T::class)
+ }
+}
+```
+
+The single-parameter version can be called when the return type can be inferred from context. It is a convenience wrapper around the full version.
+
+- `jsonObject` -- the JSON object used as the data source for creating the generatable data class instance.
+- `klass` -- the Kotlin class object created from `T::class`. It must be a data class annotated with `@Generatable`.
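+
+A minimal usage sketch (assuming `jsonObject` was parsed from the model's constrained output and `CityFact` is a `@Generatable` data class):
+
+```kotlin
+// Inferred-type convenience form; equivalent to
+// GeneratableFactory.createFromJSONObject(jsonObject, CityFact::class)
+val fact: CityFact = GeneratableFactory.createFromJSONObject(jsonObject)
+```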
+
+
+
+
+
+In Swift, constrained generation output is decoded using standard `Codable` conformance synthesized by the `@Generatable` macro. Use `JSONDecoder` to create instances from JSON data returned by the model.
+
+```swift
+let decoder = JSONDecoder()
+let cityFact = try decoder.decode(CityFact.self, from: jsonData)
+```
+
+
+
+
+
+### Annotations
+
+
+
+
+
+```kotlin
+package ai.liquid.leap.structuredoutput
+
+@Target(AnnotationTarget.CLASS)
+annotation class Generatable(val description: String)
+
+@Target(AnnotationTarget.PROPERTY)
+annotation class Guide(val description: String)
+```
+
+- `@Generatable` marks a data class for use as a generation constraint.
+- `@Guide` adds a human-readable description to a field, helping the model produce accurate values.
+
+
+
+
+
+```swift
+@attached(member, conformances: GeneratableType)
+public macro Generatable(description: String)
+
+@attached(peer)
+public macro Guide(description: String)
+```
+
+- `@Generatable` marks a struct for constrained generation and synthesizes `GeneratableType` conformance.
+- `@Guide` adds a description to a property to guide the model's output.
+
+
+
+
+
+## Function Calling Types
+
+For full usage details, see the [Function Calling](./function-calling) guide.
+
+### `LeapFunction`
+
+Describes the signature of a function that can be called by the model.
+
+
+
+
+
+```kotlin
+data class LeapFunction(
+ val name: String,
+ val description: String,
+ val parameters: List<LeapFunctionParameter>,
+)
+```
+
+
+
+
+
+```swift
+public struct LeapFunction {
+ public let name: String
+ public let description: String
+ public let parameters: [LeapFunctionParameter]
+}
+```
+
+
+
+
+
+- `name` -- name of the function.
+- `description` -- a human- and LLM-readable description of the function.
+- `parameters` -- the list of parameters accepted by the function.
+
+### `LeapFunctionParameter`
+
+Describes the signature of a parameter in a function.
+
+
+
+
+
+```kotlin
+data class LeapFunctionParameter(
+ val name: String,
+ val type: LeapFunctionParameterType,
+ val description: String,
+ val optional: Boolean = false,
+)
+```
+
+
+
+
+
+```swift
+public struct LeapFunctionParameter {
+ public let name: String
+ public let type: LeapFunctionParameterType
+ public let description: String
+ public let optional: Bool
+}
+```
+
+
+
+
+
+- `name` -- name of the parameter.
+- `type` -- data type of the parameter.
+- `description` -- a human- and LLM-readable description of the parameter.
+- `optional` -- whether this parameter is optional.
+
+### `LeapFunctionParameterType`
+
+Represents a data type for function parameters. All types must be valid [JSON Schema](https://json-schema.org/) types.
+
+
+
+
+
+```kotlin
+sealed class LeapFunctionParameterType(description: kotlin.String? = null) {
+ val description: kotlin.String? = description
+
+ class String(val enumValues: List<kotlin.String>? = null, description: kotlin.String? = null)
+ class Number(val enumValues: List<Double>? = null, description: kotlin.String? = null)
+ class Integer(val enumValues: List<Int>? = null, description: kotlin.String? = null)
+ class Boolean(description: kotlin.String? = null)
+ class Null
+ class Array(val itemType: LeapFunctionParameterType, description: kotlin.String? = null)
+ class Object(
+ val properties: Map<kotlin.String, LeapFunctionParameterType>,
+ val required: List<kotlin.String> = listOf(),
+ description: kotlin.String? = null,
+ )
+}
+```
+
+
+
+
+
+```swift
+public indirect enum LeapFunctionParameterType: Codable, Equatable {
+ case string(StringType)
+ case number(NumberType)
+ case integer(IntegerType)
+ case boolean(BooleanType)
+ case array(ArrayType)
+ case object(ObjectType)
+ case null(NullType)
+}
+```
+
+
+
+
+
+| Variant | Description |
+|---|---|
+| `String` | A string literal. `enumValues` restricts accepted values. |
+| `Number` | A number (integer or floating point). `enumValues` restricts accepted values. |
+| `Integer` | An integer literal. `enumValues` restricts accepted values. |
+| `Boolean` | A boolean literal. |
+| `Null` | Accepts only null. |
+| `Array` | An array. `itemType` describes the element type. |
+| `Object` | An object. `properties` maps property names to types; `required` lists mandatory properties. |
+
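+Putting these together, here is a hedged Kotlin sketch of a function description (the `get_weather` function and its fields are illustrative, not part of the SDK):
+
+```kotlin
+val getWeather = LeapFunction(
+    name = "get_weather",
+    description = "Look up the current weather for a city",
+    parameters = listOf(
+        LeapFunctionParameter(
+            name = "city",
+            type = LeapFunctionParameterType.String(),
+            description = "City name",
+        ),
+        LeapFunctionParameter(
+            name = "unit",
+            // enumValues restricts the accepted strings
+            type = LeapFunctionParameterType.String(enumValues = listOf("celsius", "fahrenheit")),
+            description = "Temperature unit",
+            optional = true,
+        ),
+    ),
+)
+```
+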
+### `LeapFunctionCall`
+
+Describes a function call request generated by the model.
+
+
+
+
+
+```kotlin
+data class LeapFunctionCall(
+ val name: String,
+ val arguments: Map<String, Any?>,
+)
+```
+
+
+
+
+
+```swift
+public struct LeapFunctionCall {
+ public let name: String
+ public let arguments: [String: Any]
+}
+```
+
+
+
+
+
+- `name` -- name of the function to be called.
+- `arguments` -- the arguments for the call. Values can be strings, numbers, booleans, null, lists (arrays), or maps/dictionaries (objects).
+
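+A typical Kotlin dispatch sketch (the `get_weather` case and `lookupWeather` helper are hypothetical app code):
+
+```kotlin
+fun handle(call: LeapFunctionCall): String {
+    return when (call.name) {
+        "get_weather" -> {
+            // Arguments arrive as plain values; cast to the expected type
+            val city = call.arguments["city"] as? String
+                ?: return "Missing required argument: city"
+            lookupWeather(city) // hypothetical app-side implementation
+        }
+        else -> "Unknown function: ${call.name}"
+    }
+}
+```
+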
+### Function Call Parsers
+
+Function call parsers convert raw tool-call tokens from the model into `LeapFunctionCall` instances.
+
+
+
+
+
+Two built-in implementations of `LeapFunctionCallParser`:
+
+- **`LFMFunctionCallParser`** -- parses Liquid Foundation Model (LFM2) Pythonic function calls. This is the default.
+- **`HermesFunctionCallParser`** -- parses Hermes/Qwen3 function calling format.
+
+
+
+
+
+Two built-in implementations of `LeapFunctionCallParserProtocol`:
+
+- **`LFMFunctionCallParser`** -- parses Liquid Foundation Model (LFM2) Pythonic function calls. This is the default.
+- **`HermesFunctionCallParser`** -- parses Hermes/Qwen3 function calling format.
+
+
+
+
+
+Set the parser to `nil` / `null` in `GenerationOptions` to receive raw tool-call text instead.
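+
+For example, in Kotlin:
+
+```kotlin
+// Receive raw tool-call text instead of parsed LeapFunctionCall objects
+val rawOptions = GenerationOptions(functionCallParser = null)
+```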
diff --git a/deployment/on-device/leap-sdk/ai-agent-usage-guide.mdx b/deployment/on-device/leap-sdk/ai-agent-usage-guide.mdx
new file mode 100644
index 0000000..23ebe4a
--- /dev/null
+++ b/deployment/on-device/leap-sdk/ai-agent-usage-guide.mdx
@@ -0,0 +1,1797 @@
+---
+title: "AI Agent Usage Guide"
+description: "Complete reference for using the LEAP SDK"
+---
+
+## Core Architecture
+
+```
+LeapModelDownloader / LeapDownloader / Leap.load()
+ ↓
+ModelRunner
+ ↓
+Conversation
+ ↓
+MessageResponse (streaming)
+```
+
+The LEAP SDK uses Kotlin Multiplatform (KMP) to share core inference logic across Android, iOS, and macOS. Platform-specific wrappers (`LeapModelDownloader` on Android, `Leap.load()` on Apple) provide native ergonomics while the shared `ModelRunner`, `Conversation`, and `MessageResponse` layer remains consistent.
+
+## Installation
+
+
+
+
+### Gradle Dependencies
+
+**Recommended**: Use a version catalog for dependency management.
+
+```toml
+# gradle/libs.versions.toml
+[versions]
+leapSdk = "0.10.0-SNAPSHOT"
+
+[libraries]
+leap-sdk = { module = "ai.liquid.leap:leap-sdk", version.ref = "leapSdk" }
+leap-model-downloader = { module = "ai.liquid.leap:leap-model-downloader", version.ref = "leapSdk" }
+```
+
+```kotlin
+// app/build.gradle.kts
+dependencies {
+ implementation(libs.leap.sdk)
+ implementation(libs.leap.model.downloader) // For Android notifications & background downloads
+}
+```
+
+**Alternative**: Direct dependencies
+
+```kotlin
+// app/build.gradle.kts
+dependencies {
+ implementation("ai.liquid.leap:leap-sdk:0.10.0-SNAPSHOT")
+ implementation("ai.liquid.leap:leap-model-downloader:0.10.0-SNAPSHOT")
+}
+```
+
+### Required Permissions
+
+Add to `AndroidManifest.xml`:
+
+```xml
+<!-- INTERNET and POST_NOTIFICATIONS are required per the sections below; the
+     foreground-service permissions are a plausible requirement for background downloads -->
+<uses-permission android:name="android.permission.INTERNET" />
+<uses-permission android:name="android.permission.POST_NOTIFICATIONS" />
+<uses-permission android:name="android.permission.FOREGROUND_SERVICE" />
+<uses-permission android:name="android.permission.FOREGROUND_SERVICE_DATA_SYNC" />
+```
+
+### Runtime Permissions (Android 13+)
+
+Request notification permission before downloading:
+
+```kotlin
+// In Activity
+private val permissionLauncher = registerForActivityResult(
+ ActivityResultContracts.RequestPermission()
+) { isGranted ->
+ if (isGranted) {
+ // Permission granted, proceed with download
+ } else {
+ // Permission denied, handle gracefully
+ }
+}
+
+// Before downloading
+if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.TIRAMISU) {
+ if (ContextCompat.checkSelfPermission(this, POST_NOTIFICATIONS) != PERMISSION_GRANTED) {
+ permissionLauncher.launch(android.Manifest.permission.POST_NOTIFICATIONS)
+ }
+}
+```
+
+
+
+
+### Swift Package Manager
+
+```swift
+// In Xcode: File → Add Package Dependencies
+// Repository: https://github.com/Liquid4All/leap-sdk
+// Version: 0.10.0-SNAPSHOT
+
+dependencies: [
+ .package(url: "https://github.com/Liquid4All/leap-sdk", from: "0.10.0-SNAPSHOT")
+]
+
+targets: [
+ .target(
+ name: "YourApp",
+ dependencies: [
+ .product(name: "LeapSDK", package: "leap-sdk"),
+ .product(name: "LeapModelDownloader", package: "leap-sdk") // Optional
+ ]
+ )
+]
+```
+
+
+
+
+## Loading Models
+
+### Method 1: Automatic Download (Recommended)
+
+The simplest approach -- specify the model name and quantization, and the SDK handles everything:
+
+
+
+
+```kotlin
+import ai.liquid.leap.downloader.LeapModelDownloader
+import ai.liquid.leap.downloader.LeapModelDownloaderNotificationConfig
+
+class ChatViewModel(application: Application) : AndroidViewModel(application) {
+ private val downloader = LeapModelDownloader(
+ application,
+ notificationConfig = LeapModelDownloaderNotificationConfig.build {
+ notificationTitleDownloading = "Downloading AI model..."
+ notificationTitleDownloaded = "Model ready!"
+ }
+ )
+
+ private var modelRunner: ModelRunner? = null
+
+ fun loadModel() {
+ viewModelScope.launch {
+ try {
+ // Downloads if not cached, then loads
+ modelRunner = downloader.loadModel(
+ modelSlug = "LFM2.5-1.2B-Instruct",
+ quantizationSlug = "Q4_K_M",
+ progress = { progressData ->
+ // progressData.progress: Float (0.0 to 1.0)
+ Log.d(TAG, "Progress: ${(progressData.progress * 100).toInt()}%")
+ }
+ )
+ } catch (e: Exception) {
+ Log.e(TAG, "Failed to load model", e)
+ }
+ }
+ }
+
+ override fun onCleared() {
+ super.onCleared()
+
+ // Unload model asynchronously to avoid ANR
+ // Do NOT use runBlocking - it blocks the main thread and can cause ANRs
+ CoroutineScope(Dispatchers.IO).launch {
+ try {
+ modelRunner?.unload()
+ } catch (e: Exception) {
+ Log.e(TAG, "Error unloading model", e)
+ }
+ }
+ }
+}
+```
+
+
+
+
+```swift
+import LeapSDK
+
+// Load model with automatic download and caching
+let modelRunner = try await Leap.load(
+ model: "LFM2.5-1.2B-Instruct",
+ quantization: "Q4_K_M"
+) { progress, speed in
+ // progress: Double (0.0 to 1.0)
+ // speed: Int64 (bytes per second)
+ print("Download progress: \(Int(progress * 100))% at \(speed) bytes/s")
+}
+```
+
+
+
+
+Available models and quantizations: [LEAP Model Library](https://leap.liquid.ai/models)
+
+### Method 2: Download Without Loading
+
+Separate download from loading for better control:
+
+
+
+
+```kotlin
+import ai.liquid.leap.downloader.LeapModelDownloader
+
+class ChatViewModel(application: Application) : AndroidViewModel(application) {
+ private val downloader = LeapModelDownloader(application)
+ private var modelRunner: ModelRunner? = null
+
+ // Step 1: Download model to cache (doesn't load into memory)
+ suspend fun downloadModel() {
+ try {
+ downloader.downloadModel(
+ modelSlug = "LFM2.5-1.2B-Instruct",
+ quantizationSlug = "Q4_K_M",
+ progress = { progressData ->
+ Log.d(TAG, "Download: ${(progressData.progress * 100).toInt()}%")
+ }
+ )
+ // Model is now cached locally
+ } catch (e: Exception) {
+ Log.e(TAG, "Download failed", e)
+ }
+ }
+
+ // Step 2: Later, load from cache (no download)
+ suspend fun loadCachedModel() {
+ try {
+ modelRunner = downloader.loadModel(
+ modelSlug = "LFM2.5-1.2B-Instruct",
+ quantizationSlug = "Q4_K_M"
+ )
+ // Loads immediately from cache, no network request
+ } catch (e: Exception) {
+ Log.e(TAG, "Load failed", e)
+ }
+ }
+
+ override fun onCleared() {
+ super.onCleared()
+ CoroutineScope(Dispatchers.IO).launch {
+ try {
+ modelRunner?.unload()
+ } catch (e: Exception) {
+ Log.e(TAG, "Error unloading model", e)
+ }
+ }
+ }
+}
+```
+
+
+
+
+```swift
+import LeapModelDownloader
+
+let downloader = ModelDownloader()
+
+// Download model to cache
+let manifest = try await downloader.downloadModel(
+ "LFM2.5-1.2B-Instruct",
+ quantization: "Q4_K_M"
+) { progress, speed in
+ print("Progress: \(Int(progress * 100))%")
+}
+
+// Later, load from cache (no download)
+let modelRunner = try await Leap.load(
+ model: "LFM2.5-1.2B-Instruct",
+ quantization: "Q4_K_M"
+)
+```
+
+
+
+
+### Method 3: Cross-Platform LeapDownloader (Kotlin Multiplatform)
+
+For KMP projects targeting iOS, macOS, JVM, and Android:
+
+```kotlin
+import ai.liquid.leap.LeapDownloader
+import ai.liquid.leap.LeapDownloaderConfig
+
+val downloader = LeapDownloader(
+ config = LeapDownloaderConfig(saveDir = "/path/to/models")
+)
+
+// Load model (downloads if not cached)
+val modelRunner = downloader.loadModel(
+ modelSlug = "LFM2.5-1.2B-Instruct",
+ quantizationSlug = "Q4_K_M"
+)
+```
+
+
+`LeapDownloader` does not provide Android-specific features like notifications or WorkManager integration. Use `LeapModelDownloader` for better UX on Android.
+
+
+### Method 4: Custom Manifest URL (Swift only)
+
+Load from a custom manifest:
+
+```swift
+let manifestURL = URL(string: "https://your-server.com/model-manifest.json")!
+
+let modelRunner = try await Leap.load(
+ manifestURL: manifestURL,
+ downloadProgressHandler: { progress, speed in
+ print("Progress: \(Int(progress * 100))%")
+ }
+)
+```
+
+### Method 5: Local Bundle (Swift only, Legacy)
+
+Load from a local `.bundle` or `.gguf` file:
+
+```swift
+guard let bundleURL = Bundle.main.url(forResource: "model", withExtension: "bundle") else {
+ fatalError("Model bundle not found")
+}
+
+let modelRunner = try await Leap.load(
+ url: bundleURL,
+ options: LiquidInferenceEngineOptions(
+ bundlePath: bundleURL.path,
+ cpuThreads: 6,
+ contextSize: 8192,
+ nGpuLayers: 8 // Metal GPU acceleration on macOS
+ )
+)
+```
+
+## Core Classes
+
+### `ModelRunner`
+
+The loaded model instance. Create conversations from this.
+
+
+
+
+**Methods:**
+- `createConversation(systemPrompt: String? = null): Conversation` -- Start new chat
+- `createConversationFromHistory(history: List<ChatMessage>): Conversation` -- Restore chat
+- `suspend fun unload()` -- Free memory (MUST call in `onCleared`)
+
+```kotlin
+val conversation = modelRunner.createConversation(
+ systemPrompt = "Explain it to me like I'm 5 years old"
+)
+
+// Or restore from saved history
+val conversation = modelRunner.createConversationFromHistory(savedHistory)
+```
+
+
+
+
+```swift
+protocol ModelRunner {
+ func createConversation(systemPrompt: String?) -> Conversation
+ func createConversationFromHistory(history: [ChatMessage]) -> Conversation
+}
+```
+
+**Usage:**
+
+```swift
+let conversation = modelRunner.createConversation(
+ systemPrompt: "Explain it to me like I'm 5 years old"
+)
+
+// Or restore from saved history
+let savedHistory: [ChatMessage] = loadHistoryFromDisk()
+let conversation = modelRunner.createConversationFromHistory(history: savedHistory)
+```
+
+
+
+
+### `Conversation`
+
+Manages chat history and generation.
+
+
+
+
+**Fields:**
+- `history: List<ChatMessage>` -- Full message history (returns a copy, immutable)
+- `isGenerating: Boolean` -- Thread-safe generation status
+
+**Methods:**
+- `generateResponse(userTextMessage: String, options: GenerationOptions? = null): Flow<MessageResponse>`
+- `generateResponse(message: ChatMessage, options: GenerationOptions? = null): Flow<MessageResponse>`
+- `registerFunction(function: LeapFunction)` -- Add tool for function calling
+- `appendToHistory(message: ChatMessage)` -- Add message without generating
+
+
+
+
+```swift
+class Conversation {
+ let modelRunner: ModelRunner
+ private(set) var history: [ChatMessage]
+ private(set) var isGenerating: Bool
+
+ func generateResponse(
+ message: ChatMessage,
+ generationOptions: GenerationOptions?
+ ) -> AsyncThrowingStream<MessageResponse, Error>
+
+ func generateResponse(
+ userTextMessage: String,
+ generationOptions: GenerationOptions?
+ ) -> AsyncThrowingStream<MessageResponse, Error>
+
+ func registerFunction(_ function: LeapFunction)
+ func exportToJSON() throws -> [[String: Any]]
+}
+```
+
+**Properties:**
+- `history` -- Array of `ChatMessage` objects representing the conversation
+- `isGenerating` -- Boolean indicating if generation is in progress
+
+
+
+
+### `ChatMessage`
+
+Represents a single message in the conversation.
+
+
+
+
+```kotlin
+data class ChatMessage(
+ val role: Role, // USER, ASSISTANT, SYSTEM, TOOL
+ val content: List<ChatMessageContent>,
+ val reasoningContent: String? = null, // From reasoning models
+ val functionCalls: List<LeapFunctionCall>? = null
+)
+
+enum class Role { USER, ASSISTANT, SYSTEM, TOOL }
+```
+
+
+
+
+```swift
+struct ChatMessage {
+ var role: ChatMessageRole // .system, .user, .assistant, .tool
+ var content: [ChatMessageContent]
+ var reasoningContent: String? // For reasoning models
+ var functionCalls: [LeapFunctionCall]?
+}
+
+enum ChatMessageRole: String {
+ case user = "user"
+ case system = "system"
+ case assistant = "assistant"
+ case tool = "tool"
+}
+```
+
+
+
+
+### `ChatMessageContent`
+
+Content types supported in messages.
+
+
+
+
+```kotlin
+ChatMessageContent.Text(text: String)
+ChatMessageContent.Image(jpegByteArray: ByteArray) // JPEG only
+ChatMessageContent.Audio(wavByteArray: ByteArray) // WAV only
+```
+
+
+
+
+```swift
+enum ChatMessageContent {
+ case text(String)
+ case image(Data) // JPEG encoded
+ case audio(Data) // WAV encoded (16kHz, mono, PCM)
+}
+```
+
+**Creating Audio Content:**
+
+```swift
+// From WAV file
+let wavData = try Data(contentsOf: audioFileURL)
+let audioContent = ChatMessageContent.audio(wavData)
+
+// From float samples
+let samples: [Float] = [0.1, 0.2, 0.15, ...] // Normalized to -1.0 to 1.0
+let audioContent = ChatMessageContent.fromFloatSamples(samples, sampleRate: 16000)
+```
+
+
+
+
+**Audio Requirements (both platforms):**
+- Format: WAV (RIFF) only -- no MP3/AAC/OGG
+- Sample Rate: 16 kHz (mono channel required)
+- Encoding: PCM (Float32, Int16, Int24, or Int32)
+- Channels: Mono (1 channel) -- stereo will be rejected
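+
+These constraints can be checked up front instead of waiting for the SDK to reject the input. Below is a minimal pre-flight sketch -- a hypothetical helper, not an SDK API, assuming the common RIFF layout where the `fmt ` chunk sits at byte offset 12:
+
+```kotlin
+// Hypothetical helper -- not part of the LEAP SDK.
+// Reads the RIFF/WAVE "fmt " chunk and verifies 16 kHz mono PCM.
+fun isLeapCompatibleWav(bytes: ByteArray): Boolean {
+    if (bytes.size < 28) return false
+    fun u16(i: Int) = (bytes[i].toInt() and 0xFF) or ((bytes[i + 1].toInt() and 0xFF) shl 8)
+    fun u32(i: Int) = u16(i).toLong() or (u16(i + 2).toLong() shl 16)
+    if (String(bytes, 0, 4) != "RIFF" || String(bytes, 8, 4) != "WAVE") return false
+    if (String(bytes, 12, 4) != "fmt ") return false // assumes no extra chunks before "fmt "
+    val audioFormat = u16(20) // 1 = integer PCM, 3 = IEEE float
+    val channels = u16(22)
+    val sampleRate = u32(24)
+    return (audioFormat == 1 || audioFormat == 3) && channels == 1 && sampleRate == 16_000L
+}
+```
+
+Running a check like this on the WAV bytes before wrapping them in audio content lets the app fail fast with a clear error message.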
+
+### `MessageResponse`
+
+Streaming response types from generation.
+
+
+
+
+```kotlin
+MessageResponse.Chunk(text: String) // Text token
+MessageResponse.ReasoningChunk(reasoning: String) // Thinking (LFM2.5-1.2B-Thinking)
+MessageResponse.FunctionCalls(functionCalls: List<LeapFunctionCall>) // Tool calls requested
+MessageResponse.AudioSample(samples: FloatArray, sampleRate: Int) // Audio output (24kHz)
+MessageResponse.Complete(
+ fullMessage: ChatMessage,
+ finishReason: GenerationFinishReason, // STOP or EXCEED_CONTEXT
+ stats: GenerationStats? // Token counts, tokens/sec
+)
+```
+
+
+
+
+```swift
+enum MessageResponse {
+ case chunk(String) // Text chunk
+ case reasoningChunk(String) // Reasoning text (thinking models only)
+ case audioSample(samples: [Float], sampleRate: Int) // Audio output (24kHz typically)
+ case functionCall([LeapFunctionCall]) // Function call requests
+ case complete(MessageCompletion) // Generation complete
+}
+```
+
+**`MessageCompletion` Fields:**
+
+```swift
+struct MessageCompletion {
+ let message: ChatMessage // Complete assistant message
+ let finishReason: GenerationFinishReason // .stop or .exceed_context
+ let stats: GenerationStats? // Token counts and speed
+}
+
+struct GenerationStats {
+ var promptTokens: UInt64
+ var completionTokens: UInt64
+ var totalTokens: UInt64
+ var tokenPerSecond: Float
+}
+```
+
+
+
+
+### `GenerationOptions`
+
+Control generation behavior.
+
+
+
+
+```kotlin
+val options = GenerationOptions(
+ temperature = 0.7f, // Randomness (0.0 = deterministic, 1.0+ = creative)
+ topP = 0.9f, // Nucleus sampling
+ minP = 0.05f, // Minimum probability
+ repetitionPenalty = 1.1f, // Prevent repetition
+ jsonSchemaConstraint = """{"type":"object",...}""", // Force JSON output
+ functionCallParser = LFMFunctionCallParser(), // Enable function calling (null to disable)
+ inlineThinkingTags = false // Emit ReasoningChunk separately (for thinking models)
+)
+
+conversation.generateResponse(userInput, options).collect { ... }
+```
+
+
+
+
+```swift
+struct GenerationOptions {
+ var temperature: Float? // Randomness (0.0 to 2.0)
+ var topP: Float? // Nucleus sampling
+ var minP: Float? // Minimum probability
+ var repetitionPenalty: Float? // Reduce repetition
+ var rngSeed: UInt64? // Seed for deterministic output
+ var jsonSchemaConstraint: String? // JSON schema for structured output
+ var functionCallParser: LeapFunctionCallParserProtocol?
+ var resetHistory: Bool // Default true
+ var sequenceLength: UInt32? // Override context length
+ var maxOutputTokens: UInt32? // Limit output length
+ var enableThinking: Bool // Surface reasoning blocks (thinking models)
+ var cacheControl: CacheControl?
+}
+```
+
+**Example:**
+
+```swift
+var options = GenerationOptions(
+ temperature: 0.7,
+ maxOutputTokens: 512,
+ enableThinking: false
+)
+
+// For structured output
+try options.setResponseFormat(type: MyStruct.self)
+```
+
+
+
+
+## Generation Patterns
+
+### Basic Text Generation
+
+
+
+
+```kotlin
+class ChatViewModel : ViewModel() {
+ private var generationJob: Job? = null
+ private val _responseText = MutableStateFlow("")
+
+ fun generate(userInput: String) {
+ generationJob?.cancel() // Cancel previous generation
+
+ generationJob = viewModelScope.launch {
+ conversation?.generateResponse(userInput)
+ ?.onEach { response ->
+ when (response) {
+ is MessageResponse.Chunk -> {
+ _responseText.value += response.text
+ }
+ is MessageResponse.Complete -> {
+ Log.d(TAG, "Tokens/sec: ${response.stats?.tokenPerSecond}")
+ }
+ else -> {}
+ }
+ }
+ ?.catch { e ->
+ // Handle error
+ }
+ ?.collect()
+ }
+ }
+
+ fun stopGeneration() {
+ generationJob?.cancel()
+ }
+}
+```
+
+
+
+
+```swift
+import LeapSDK
+
+@MainActor
+final class ChatViewModel: ObservableObject {
+ @Published var messages: [ChatMessage] = []
+ @Published var isGenerating = false
+ @Published var currentResponse = ""
+
+ private var modelRunner: ModelRunner?
+ private var conversation: Conversation?
+ private var generationTask: Task<Void, Never>?
+
+ func loadModel() async {
+ do {
+ modelRunner = try await Leap.load(
+ model: "LFM2.5-1.2B-Instruct",
+ quantization: "Q4_K_M"
+ ) { progress, _ in
+ print("Loading: \(Int(progress * 100))%")
+ }
+ conversation = modelRunner?.createConversation(
+ systemPrompt: "Explain it to me like I'm 5 years old"
+ )
+ } catch {
+ print("Failed to load model: \(error)")
+ }
+ }
+
+ func send(_ text: String) {
+ guard let conversation else { return }
+
+ // Cancel any ongoing generation
+ generationTask?.cancel()
+
+ let userMessage = ChatMessage(role: .user, content: [.text(text)])
+ currentResponse = ""
+ isGenerating = true
+
+ generationTask = Task {
+ do {
+ for try await response in conversation.generateResponse(
+ message: userMessage,
+ generationOptions: GenerationOptions(temperature: 0.7)
+ ) {
+ await handleResponse(response)
+ }
+ } catch {
+ print("Generation error: \(error)")
+ }
+ isGenerating = false
+ }
+ }
+
+ func stopGeneration() {
+ generationTask?.cancel()
+ generationTask = nil
+ isGenerating = false
+ }
+
+ @MainActor
+ private func handleResponse(_ response: MessageResponse) {
+ switch response {
+ case .chunk(let text):
+ currentResponse += text
+
+ case .reasoningChunk(let reasoning):
+ print("Thinking: \(reasoning)")
+
+ case .audioSample(let samples, let sampleRate):
+ // Handle audio output (typically 24kHz)
+ playAudio(samples: samples, sampleRate: sampleRate)
+
+ case .functionCall(let calls):
+ // Handle function calls
+ handleFunctionCalls(calls)
+
+ case .complete(let completion):
+ if let stats = completion.stats {
+ print("Generated \(stats.completionTokens) tokens at \(stats.tokenPerSecond) tok/s")
+ }
+ // Final message is already in conversation.history
+ messages = conversation?.history ?? []
+ currentResponse = ""
+ }
+ }
+}
+```
+
+
+
+
+### Multimodal Input (Vision)
+
+
+
+
+```kotlin
+val imageBytes = File("image.jpg").readBytes() // JPEG only
+
+val message = ChatMessage(
+ role = ChatMessage.Role.USER,
+ content = listOf(
+ ChatMessageContent.Image(imageBytes),
+ ChatMessageContent.Text("What's in this image?")
+ )
+)
+
+conversation.generateResponse(message).collect { ... }
+```
+
+
+
+
+```swift
+func sendImageMessage(image: UIImage, prompt: String) {
+ guard let jpegData = image.jpegData(compressionQuality: 0.8) else { return }
+
+ let message = ChatMessage(
+ role: .user,
+ content: [
+ .text(prompt),
+ .image(jpegData)
+ ]
+ )
+
+ Task {
+ for try await response in conversation.generateResponse(message: message) {
+ await handleResponse(response)
+ }
+ }
+}
+```
+
+
+
+
+### Audio Input
+
+
+
+
+```kotlin
+import ai.liquid.leap.audio.FloatAudioBuffer
+
+// From raw PCM samples
+val audioBuffer = FloatAudioBuffer(sampleRate = 16000)
+audioBuffer.add(floatArrayOf(...)) // Float samples normalized -1.0 to 1.0
+val wavBytes = audioBuffer.createWavBytes()
+
+val message = ChatMessage(
+ role = ChatMessage.Role.USER,
+ content = listOf(
+ ChatMessageContent.Audio(wavBytes),
+ ChatMessageContent.Text("Transcribe this audio")
+ )
+)
+
+conversation.generateResponse(message).collect { ... }
+```
+
+
+
+
+```swift
+import AVFoundation
+
+func transcribeAudio(audioFileURL: URL) async {
+ // Load WAV file (must be 16kHz, mono, PCM)
+ guard let wavData = try? Data(contentsOf: audioFileURL) else { return }
+
+ let message = ChatMessage(
+ role: .user,
+ content: [
+ .text("Transcribe this audio:"),
+ .audio(wavData)
+ ]
+ )
+
+ Task {
+ for try await response in conversation.generateResponse(message: message) {
+ await handleResponse(response)
+ }
+ }
+}
+
+// Recording audio with AVAudioRecorder
+class AudioRecorder {
+ private var audioRecorder: AVAudioRecorder?
+
+ func startRecording(to url: URL) throws {
+ let settings: [String: Any] = [
+ AVFormatIDKey: Int(kAudioFormatLinearPCM),
+ AVSampleRateKey: 16000.0, // 16 kHz required
+ AVNumberOfChannelsKey: 1, // Mono required
+ AVEncoderBitDepthKey: 16, // 16-bit PCM
+ AVEncoderAudioQualityKey: AVAudioQuality.high.rawValue
+ ]
+
+ audioRecorder = try AVAudioRecorder(url: url, settings: settings)
+ audioRecorder?.record()
+ }
+
+ func stopRecording() -> URL? {
+ audioRecorder?.stop()
+ return audioRecorder?.url
+ }
+}
+```
+
+
+
+
+### Audio Output (Text-to-Speech)
+
+
+
+
+```kotlin
+val audioSamples = mutableListOf<FloatArray>()
+
+conversation.generateResponse("Say hello").collect { response ->
+ when (response) {
+ is MessageResponse.AudioSample -> {
+ // samples: FloatArray (Float32 PCM, -1.0 to 1.0)
+ // sampleRate: Int (typically 24000 Hz)
+ audioSamples.add(response.samples)
+ playAudio(response.samples, response.sampleRate)
+ }
+ }
+}
+```
+
+
+
+
+```swift
+for try await response in conversation.generateResponse(message: message) {
+ switch response {
+ case .audioSample(let samples, let sampleRate):
+ // samples: [Float] (Float32 PCM, -1.0 to 1.0)
+ // sampleRate: Int (typically 24000 Hz)
+ playAudio(samples: samples, sampleRate: sampleRate)
+ default:
+ break
+ }
+}
+```
+
+
+
+
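+The `playAudio` helper in these snippets is left to the app. On Android, `AudioTrack` can consume the Float32 samples directly via `ENCODING_PCM_FLOAT`; if your playback path requires 16-bit PCM instead, the conversion is a clamp plus scale. A hedged sketch of that conversion (hypothetical helper, not an SDK API):
+
+```kotlin
+// Hypothetical helper -- converts Float32 PCM (-1.0..1.0) to 16-bit PCM.
+// Out-of-range samples are clamped before scaling to avoid wraparound.
+fun floatToPcm16(samples: FloatArray): ShortArray =
+    ShortArray(samples.size) { i ->
+        val clamped = samples[i].coerceIn(-1.0f, 1.0f)
+        (clamped * 32767f).toInt().toShort() // 32767 = Short.MAX_VALUE
+    }
+```
+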
+## Function Calling
+
+Register functions for the model to invoke. See also the [Function Calling guide](./function-calling).
+
+
+
+
+```kotlin
+// 1. Define function
+val getWeather = LeapFunction(
+ name = "get_weather",
+ description = "Get current weather for a city",
+ parameters = """
+ {
+ "type": "object",
+ "properties": {
+ "city": {"type": "string"},
+ "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
+ },
+ "required": ["city"]
+ }
+ """
+)
+
+// 2. Register function
+conversation.registerFunction(getWeather)
+
+// 3. Handle function calls
+conversation.generateResponse("What's the weather in Tokyo?").collect { response ->
+ when (response) {
+ is MessageResponse.FunctionCalls -> {
+ response.functionCalls.forEach { call ->
+ // call.name: String
+ // call.arguments: String (JSON)
+ val result = executeTool(call.name, call.arguments)
+
+ // Add result back to conversation
+ val toolMessage = ChatMessage(
+ role = ChatMessage.Role.TOOL,
+ content = listOf(ChatMessageContent.Text(result))
+ )
+ conversation.appendToHistory(toolMessage)
+
+ // Generate next response
+ conversation.generateResponse("").collect { ... }
+ }
+ }
+ }
+}
+```
+
+
+
+
+```swift
+// Define function
+let weatherFunction = LeapFunction(
+ name: "get_weather",
+ description: "Get the current weather for a location",
+ parameters: [
+ LeapFunctionParameter(
+ name: "location",
+ description: "City name",
+ type: .string,
+ required: true
+ ),
+ LeapFunctionParameter(
+ name: "unit",
+ description: "Temperature unit",
+ type: .string,
+ required: false,
+ enumValues: ["celsius", "fahrenheit"]
+ )
+ ]
+)
+
+// Register with conversation
+conversation.registerFunction(weatherFunction)
+
+// Handle function calls in response
+func handleResponse(_ response: MessageResponse) {
+ switch response {
+ case .functionCall(let calls):
+ for call in calls {
+ if call.name == "get_weather" {
+ let location = call.arguments["location"] as? String ?? "Unknown"
+ let result = getWeather(location: location)
+
+ // Add tool result back to conversation
+ let toolMessage = ChatMessage(
+ role: .tool,
+ content: [.text(result)]
+ )
+
+ // Create new conversation with updated history
+ let updatedHistory = conversation.history + [toolMessage]
+ conversation = modelRunner.createConversationFromHistory(
+ history: updatedHistory
+ )
+ }
+ }
+ default:
+ break
+ }
+}
+```
+
+
+
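+The `executeTool` call in the Kotlin example above is app code, not an SDK API. One way to wire it up is a name-to-handler map, so adding a tool is a single registration next to its `LeapFunction` definition. A sketch under that assumption (the handler signature is illustrative):
+
+```kotlin
+// Hypothetical dispatch table -- each handler receives the raw JSON
+// arguments string and returns the tool result to feed back as a TOOL message.
+class ToolRegistry {
+    private val handlers = mutableMapOf<String, (String) -> String>()
+
+    fun register(name: String, handler: (String) -> String) {
+        handlers[name] = handler
+    }
+
+    fun execute(name: String, argumentsJson: String): String =
+        handlers[name]?.invoke(argumentsJson)
+            ?: """{"error": "unknown tool: $name"}"""
+}
+```
+
+With this in place the handler body becomes `val result = registry.execute(call.name, call.arguments)`, and an unknown tool name produces a structured error the model can react to instead of crashing the app.
+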
+
+## Structured Output (Constrained Generation)
+
+Use the `@Generatable` annotation/macro for type-safe JSON output. See also the [Constrained Generation guide](./constrained-generation).
+
+
+
+
+```kotlin
+@Serializable
+@Generatable("Recipe information")
+data class Recipe(
+ val name: String,
+ val ingredients: List<String>,
+ val steps: List<String>
+)
+
+val options = GenerationOptions().apply {
+ setResponseFormatType<Recipe>() // Auto-generates JSON schema
+}
+
+conversation.generateResponse("Generate a pasta recipe", options).collect { response ->
+ if (response is MessageResponse.Complete) {
+ val text = (response.fullMessage.content[0] as ChatMessageContent.Text).text
+ val recipe = LeapJson.decodeFromString<Recipe>(text)
+ }
+}
+```
+
+
+
+
+```swift
+import LeapSDK
+
+@Generatable
+struct Recipe {
+ let name: String
+ let ingredients: [String]
+ let steps: [String]
+ let cookingTime: Int
+}
+
+// Configure generation
+var options = GenerationOptions()
+try options.setResponseFormat(type: Recipe.self)
+
+// Generate
+for try await response in conversation.generateResponse(
+ message: ChatMessage(role: .user, content: [.text("Give me a pasta recipe")]),
+ generationOptions: options
+) {
+ if case .complete(let completion) = response {
+ // Parse JSON response into Recipe struct
+ if case .text(let json) = completion.message.content.first {
+ let recipe = try JSONDecoder().decode(Recipe.self, from: json.data(using: .utf8)!)
+ print("Recipe: \(recipe.name)")
+ }
+ }
+}
+```
+
+
+
+
+## Conversation Persistence
+
+
+
+
+```kotlin
+// Save conversation
+val json = LeapJson.encodeToString(conversation.history)
+
+// Restore conversation
+val history = LeapJson.decodeFromString<List<ChatMessage>>(json)
+val conversation = modelRunner.createConversationFromHistory(history)
+```
+
+
+
+
+```swift
+import Foundation
+
+// Save conversation
+func saveConversation() throws {
+ let jsonArray = try conversation.exportToJSON()
+ let data = try JSONSerialization.data(withJSONObject: jsonArray)
+ try data.write(to: conversationFileURL)
+}
+
+// Restore conversation
+func restoreConversation() throws {
+ let data = try Data(contentsOf: conversationFileURL)
+ let jsonArray = try JSONSerialization.jsonObject(with: data) as! [[String: Any]]
+
+ let history = try jsonArray.map { json in
+ try ChatMessage(from: json)
+ }
+
+ conversation = modelRunner.createConversationFromHistory(history: history)
+}
+
+// Using Codable (alternative)
+func saveWithCodable() throws {
+ let encoder = JSONEncoder()
+ let data = try encoder.encode(conversation.history)
+ try data.write(to: conversationFileURL)
+}
+
+func restoreWithCodable() throws {
+ let data = try Data(contentsOf: conversationFileURL)
+ let decoder = JSONDecoder()
+ let history = try decoder.decode([ChatMessage].self, from: data)
+ conversation = modelRunner.createConversationFromHistory(history: history)
+}
+```
+
+
+
+
+## Model Download Management
+
+Query download status and manage cached models.
+
+
+
+
+```kotlin
+import ai.liquid.leap.downloader.LeapModelDownloader
+
+val downloader = LeapModelDownloader(application)
+
+// Query status for a specific model
+viewModelScope.launch {
+ val status = downloader.queryStatus(
+ modelSlug = "LFM2.5-1.2B-Instruct",
+ quantizationSlug = "Q4_K_M"
+ )
+
+ when (status) {
+ is ModelDownloadStatus.NotOnLocal -> {
+ Log.d(TAG, "Model not downloaded")
+ }
+ is ModelDownloadStatus.DownloadInProgress -> {
+ val progressPercent = (status.progress * 100).toInt()
+ Log.d(TAG, "Downloading: $progressPercent%")
+ }
+ is ModelDownloadStatus.Downloaded -> {
+ Log.d(TAG, "Model ready to load")
+ }
+ }
+}
+
+// Get total model size before downloading
+val totalBytes = downloader.getModelSize(
+ modelSlug = "LFM2.5-1.2B-Instruct",
+ quantizationSlug = "Q4_K_M"
+)
+val totalMB = totalBytes / (1024 * 1024)
+
+// Remove a specific model from cache
+downloader.removeModel(
+ modelSlug = "LFM2.5-1.2B-Instruct",
+ quantizationSlug = "Q4_K_M"
+)
+
+// Cancel an in-progress download
+downloader.cancelDownload(
+ modelSlug = "LFM2.5-1.2B-Instruct",
+ quantizationSlug = "Q4_K_M"
+)
+```
+
+**Download Status Types:**
+
+```kotlin
+sealed interface ModelDownloadStatus {
+ object NotOnLocal : ModelDownloadStatus
+ data class DownloadInProgress(val progress: Float) : ModelDownloadStatus // 0.0 to 1.0
+ object Downloaded : ModelDownloadStatus
+}
+```
+
+
+
+
+```swift
+import LeapModelDownloader
+
+let downloader = ModelDownloader()
+
+// Check download status
+let status = downloader.queryStatus("LFM2.5-1.2B-Instruct", quantization: "Q4_K_M")
+
+switch status {
+case .notOnLocal:
+ print("Model not downloaded")
+case .downloadInProgress(let progress):
+ print("Downloading: \(Int(progress * 100))%")
+case .downloaded:
+ print("Model ready")
+}
+
+// Get model size before downloading
+let sizeInBytes = try await downloader.getModelSize(
+ modelName: "LFM2.5-1.2B-Instruct",
+ quantization: "Q4_K_M"
+)
+print("Model size: \(sizeInBytes / 1_000_000) MB")
+
+// Remove downloaded model
+try downloader.removeModel("LFM2.5-1.2B-Instruct", quantization: "Q4_K_M")
+
+// Cancel ongoing download
+downloader.requestStopDownload(model) // 'model' is the model handle used to start the download
+```
+
+
+
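+Because `getModelSize` works before any download starts, a pre-flight free-space check can prevent downloads that are doomed to fail on full devices. A sketch using plain `java.io.File` (a production Android app might prefer `StatFs`; the 10% headroom factor is an arbitrary assumption):
+
+```kotlin
+import java.io.File
+
+// Hypothetical pre-flight check -- compares the reported model size plus
+// headroom against the usable space at the intended download location.
+fun hasSpaceFor(modelBytes: Long, downloadDir: File, headroom: Double = 1.1): Boolean {
+    val required = (modelBytes * headroom).toLong()
+    return downloadDir.usableSpace >= required
+}
+```
+
+Call it with the value returned by `getModelSize` before kicking off `downloadModel`, and surface a clear "free up space" message instead of a generic download failure.
+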
+
+## Complete ViewModel Example
+
+
+
+
+```kotlin
+import ai.liquid.leap.*
+import ai.liquid.leap.downloader.*
+import ai.liquid.leap.message.*
+import android.app.Application
+import androidx.lifecycle.AndroidViewModel
+import androidx.lifecycle.viewModelScope
+import kotlinx.coroutines.*
+import kotlinx.coroutines.flow.*
+
+class ChatViewModel(application: Application) : AndroidViewModel(application) {
+ private val downloader = LeapModelDownloader(
+ application,
+ notificationConfig = LeapModelDownloaderNotificationConfig.build {
+ notificationTitleDownloading = "Downloading model..."
+ notificationTitleDownloaded = "Model ready!"
+ }
+ )
+
+ private var modelRunner: ModelRunner? = null
+ private var conversation: Conversation? = null
+ private var generationJob: Job? = null
+
+ private val _messages = MutableStateFlow<List<ChatMessage>>(emptyList())
+ val messages: StateFlow<List<ChatMessage>> = _messages.asStateFlow()
+
+ private val _isLoading = MutableStateFlow(false)
+ val isLoading: StateFlow<Boolean> = _isLoading.asStateFlow()
+
+ private val _isGenerating = MutableStateFlow(false)
+ val isGenerating: StateFlow<Boolean> = _isGenerating.asStateFlow()
+
+ private val _currentResponse = MutableStateFlow("")
+ val currentResponse: StateFlow<String> = _currentResponse.asStateFlow()
+
+ fun loadModel() {
+ viewModelScope.launch {
+ _isLoading.value = true
+ try {
+ modelRunner = downloader.loadModel(
+ modelSlug = "LFM2.5-1.2B-Instruct",
+ quantizationSlug = "Q4_K_M"
+ )
+ conversation = modelRunner?.createConversation(
+ systemPrompt = "Explain it to me like I'm 5 years old"
+ )
+ } catch (e: Exception) {
+ // Handle error
+ } finally {
+ _isLoading.value = false
+ }
+ }
+ }
+
+ fun sendMessage(text: String) {
+ generationJob?.cancel()
+ _currentResponse.value = ""
+
+ generationJob = viewModelScope.launch {
+ _isGenerating.value = true
+ try {
+ conversation?.generateResponse(text)
+ ?.onEach { response ->
+ when (response) {
+ is MessageResponse.Chunk -> {
+ _currentResponse.value += response.text
+ }
+ is MessageResponse.Complete -> {
+ _messages.value = conversation?.history ?: emptyList()
+ _currentResponse.value = ""
+ }
+ else -> {}
+ }
+ }
+ ?.catch { e ->
+ // Handle generation error
+ }
+ ?.collect()
+ } finally {
+ _isGenerating.value = false
+ }
+ }
+ }
+
+ fun stopGeneration() {
+ generationJob?.cancel()
+ _isGenerating.value = false
+ }
+
+ override fun onCleared() {
+ super.onCleared()
+ generationJob?.cancel()
+ CoroutineScope(Dispatchers.IO).launch {
+ try {
+ modelRunner?.unload()
+ } catch (e: Exception) {
+ Log.e(TAG, "Error unloading model", e)
+ }
+ }
+ }
+}
+```
+
+
+
+
+```swift
+import SwiftUI
+import LeapSDK
+import LeapModelDownloader
+
+@MainActor
+final class ChatViewModel: ObservableObject {
+ @Published var messages: [ChatMessage] = []
+ @Published var currentResponse = ""
+ @Published var isGenerating = false
+ @Published var isLoadingModel = false
+ @Published var downloadProgress: Double = 0.0
+ @Published var error: String?
+
+ private var modelRunner: ModelRunner?
+ private var conversation: Conversation?
+ private var generationTask: Task<Void, Never>?
+
+ func loadModel() async {
+ isLoadingModel = true
+ downloadProgress = 0.0
+ error = nil
+
+ do {
+ modelRunner = try await Leap.load(
+ model: "LFM2.5-1.2B-Instruct",
+ quantization: "Q4_K_M"
+ ) { [weak self] progress, speed in
+ Task { @MainActor in
+ self?.downloadProgress = progress
+ }
+ }
+
+ conversation = modelRunner?.createConversation(
+ systemPrompt: "Explain it to me like I'm 5 years old"
+ )
+
+ } catch {
+ self.error = "Failed to load model: \(error.localizedDescription)"
+ }
+
+ isLoadingModel = false
+ }
+
+ func send(_ text: String) {
+ guard let conversation, !text.isEmpty else { return }
+
+ generationTask?.cancel()
+
+ let userMessage = ChatMessage(role: .user, content: [.text(text)])
+ messages.append(userMessage)
+ currentResponse = ""
+ isGenerating = true
+
+ generationTask = Task {
+ do {
+ for try await response in conversation.generateResponse(
+ message: userMessage,
+ generationOptions: GenerationOptions(
+ temperature: 0.7,
+ maxOutputTokens: 512
+ )
+ ) {
+ await handleResponse(response)
+ }
+ } catch is CancellationError {
+ // Generation was cancelled
+ } catch {
+ self.error = "Generation failed: \(error.localizedDescription)"
+ }
+
+ isGenerating = false
+ }
+ }
+
+ func stopGeneration() {
+ generationTask?.cancel()
+ generationTask = nil
+ isGenerating = false
+ }
+
+ @MainActor
+ private func handleResponse(_ response: MessageResponse) {
+ switch response {
+ case .chunk(let text):
+ currentResponse += text
+
+ case .reasoningChunk(let reasoning):
+ print("Thinking: \(reasoning)")
+
+ case .audioSample(let samples, let sampleRate):
+ // Handle audio playback
+ break
+
+ case .functionCall(let calls):
+ // Handle function calls
+ break
+
+ case .complete(let completion):
+ if let stats = completion.stats {
+ print("Stats: \(stats.totalTokens) tokens, \(stats.tokenPerSecond) tok/s")
+ }
+ messages = conversation?.history ?? []
+ currentResponse = ""
+ }
+ }
+
+ deinit {
+ generationTask?.cancel()
+ }
+}
+```
+
+
+
+
+## Error Handling
+
+
+
+
+```kotlin
+sealed class LeapException : Exception()
+class LeapModelLoadingException : LeapException()
+class LeapGenerationException : LeapException()
+class LeapGenerationPromptExceedContextLengthException : LeapException()
+class LeapSerializationException : LeapException()
+
+try {
+ modelRunner = downloader.loadModel(...)
+} catch (e: LeapModelLoadingException) {
+ // Model failed to load
+} catch (e: LeapGenerationPromptExceedContextLengthException) {
+ // Prompt too long
+} catch (e: Exception) {
+ // Other errors
+}
+```
+
+
+
+
+```swift
+enum LeapError: Error {
+ case modelLoadingFailure(String, Error?)
+ case generationFailure(String, Error?)
+ case serializationFailure(String, Error?)
+ case invalidInput(String)
+}
+
+// Handling errors
+do {
+ let modelRunner = try await Leap.load(model: "LFM2.5-1.2B-Instruct", quantization: "Q4_K_M")
+} catch let error as LeapError {
+ switch error {
+ case .modelLoadingFailure(let message, _):
+ print("Model loading failed: \(message)")
+ case .generationFailure(let message, _):
+ print("Generation failed: \(message)")
+ case .serializationFailure(let message, _):
+ print("Serialization failed: \(message)")
+ case .invalidInput(let message):
+ print("Invalid input: \(message)")
+ }
+} catch {
+ print("Unexpected error: \(error)")
+}
+```
+
+
+
+
+## Imports Reference
+
+
+
+
+**Android (LeapModelDownloader):**
+
+```kotlin
+import ai.liquid.leap.Conversation
+import ai.liquid.leap.ModelRunner
+import ai.liquid.leap.downloader.LeapModelDownloader
+import ai.liquid.leap.downloader.LeapModelDownloaderNotificationConfig
+import ai.liquid.leap.message.ChatMessage
+import ai.liquid.leap.message.ChatMessageContent
+import ai.liquid.leap.message.MessageResponse
+import ai.liquid.leap.generation.GenerationOptions
+import ai.liquid.leap.LeapException
+```
+
+**Cross-Platform (LeapDownloader):**
+
+```kotlin
+import ai.liquid.leap.Conversation
+import ai.liquid.leap.ModelRunner
+import ai.liquid.leap.LeapDownloader
+import ai.liquid.leap.LeapDownloaderConfig
+import ai.liquid.leap.message.ChatMessage
+import ai.liquid.leap.message.ChatMessageContent
+import ai.liquid.leap.message.MessageResponse
+import ai.liquid.leap.generation.GenerationOptions
+```
+
+
+
+
+```swift
+// Core SDK
+import LeapSDK
+
+// Optional model downloader
+import LeapModelDownloader
+
+// SwiftUI integration
+import SwiftUI
+import Combine
+
+// Audio handling
+import AVFoundation
+
+// Image processing
+import UIKit // iOS
+import AppKit // macOS
+```
+
+
+
+
+## Model Selection Guide
+
+### Text Models
+- **LFM2.5-1.2B-Instruct**: General purpose (recommended)
+- **LFM2.5-1.2B-Thinking**: Extended reasoning (emits `ReasoningChunk`)
+- **LFM2-1.2B**: Stable version
+- **LFM2-1.2B-Tool**: Optimized for function calling
+
+### Multimodal Models
+- **LFM2.5-VL-1.6B**: Vision + text
+- **LFM2.5-Audio-1.5B**: Audio + text (TTS, ASR, voice chat)
+
+## Quantization Guide
+
+Choose the right balance of speed vs quality:
+
+| Quantization | Quality | Size | Speed | Use Case |
+|---|---|---|---|---|
+| **Q4_0** | Lowest | Smallest | Fastest | Prototyping, low-end devices |
+| **Q4_K_M** | Good | Small | Fast | **Recommended for most apps** |
+| **Q5_K_M** | Better | Medium | Medium | Quality-sensitive applications |
+| **Q6_K** | High | Large | Slower | High-quality responses needed |
+| **Q8_0** | Near-original | Larger | Slow | Maximum quality |
+| **F16** | Original | Largest | Slowest | Research, benchmarking |
+
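+As a rule of thumb, the table above can be collapsed into a small selection helper. The following is an illustrative sketch, not an SDK API; the string values are the quantization slugs from the table:
+
+```kotlin
+// Illustrative helper (not part of the SDK): pick a quantization slug
+// based on a coarse speed/quality trade-off.
+fun pickQuantization(lowEndDevice: Boolean, qualityCritical: Boolean): String =
+    when {
+        lowEndDevice -> "Q4_0"      // smallest, fastest
+        qualityCritical -> "Q6_K"   // higher quality, larger model
+        else -> "Q4_K_M"            // recommended default
+    }
+```
+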
+## Critical Best Practices
+
+### 1. Model Unloading (REQUIRED)
+
+Always release model resources when you are done. On Android, unload asynchronously to avoid ANR (Application Not Responding) errors. On iOS, nil out the references.
+
+
+
+
+```kotlin
+override fun onCleared() {
+ super.onCleared()
+
+ // Unload model asynchronously to avoid ANR
+ // NEVER use runBlocking - it blocks the main thread and causes ANRs
+ CoroutineScope(Dispatchers.IO).launch {
+ try {
+ modelRunner?.unload()
+ } catch (e: Exception) {
+ Log.e(TAG, "Error unloading model", e)
+ }
+ }
+}
+```
+
+
+
+
+```swift
+// Explicitly unload model when done
+modelRunner = nil
+conversation = nil
+```
+
+
+
+
+### 2. Generation Cancellation
+
+
+
+
+```kotlin
+// Generation auto-cancels when Flow collection is cancelled
+generationJob?.cancel()
+
+// Or when viewModelScope is cleared (ViewModel destroyed)
+```
+
+
+
+
+```swift
+generationTask?.cancel()
+generationTask = nil
+isGenerating = false
+```
+
+
+
+
+### 3. Thread Safety
+
+- All SDK operations are main-thread safe on both platforms
+- **Kotlin:** Use `viewModelScope.launch` for all suspend functions
+- **Swift:** Use `@MainActor` for UI-bound ViewModels and `Task {}` for async work
+- Callbacks run on the main thread
+
+### 4. History Management
+
+Both platforms return a copy of the history that is safe to read without synchronization:
+
+
+
+
+```kotlin
+// conversation.history returns a COPY
+val history = conversation.history // Safe to read
+
+// To restore conversation
+val newConversation = modelRunner.createConversationFromHistory(savedHistory)
+```
+
+
+
+
+```swift
+// conversation.history returns a copy
+let history = conversation.history // Safe to read
+
+// To restore conversation
+let newConversation = modelRunner.createConversationFromHistory(history: savedHistory)
+```
+
+
+
+
+### 5. Serialization
+
+
+
+
+```kotlin
+// Save conversation
+val json = LeapJson.encodeToString(conversation.history)
+
+// Restore conversation
+val history = LeapJson.decodeFromString<List<ChatMessage>>(json)
+val conversation = modelRunner.createConversationFromHistory(history)
+```
+
+
+
+
+```swift
+// Save conversation
+let data = try JSONEncoder().encode(conversation.history)
+try data.write(to: fileURL)
+
+// Restore conversation
+let data = try Data(contentsOf: fileURL)
+let history = try JSONDecoder().decode([ChatMessage].self, from: data)
+let conversation = modelRunner.createConversationFromHistory(history: history)
+```
+
+
+
+
+## Troubleshooting
+
+### Model Fails to Load
+- Check internet connection (first download requires network)
+- **Android:** Verify `minSdk = 31` in `build.gradle.kts`; use physical device (emulators may crash)
+- **iOS/macOS:** Test on physical device (simulator is much slower)
+- Check storage space -- models typically need 500MB to 2GB
+
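+The storage-space bullet can be checked before starting a download. A minimal sketch; the directory and the 2GB threshold are assumptions based on the typical sizes above, not SDK constants:
+
+```kotlin
+import java.io.File
+
+// Illustrative pre-flight check (not an SDK API): verify free space
+// before asking the downloader to fetch a model.
+fun hasSpaceForModel(dir: File, requiredBytes: Long = 2L * 1024 * 1024 * 1024): Boolean =
+    dir.usableSpace >= requiredBytes
+```
+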
+### Generation is Slow
+- Test on a physical device (simulators and emulators are much slower)
+- Use smaller quantization (`Q4_K_M` instead of `Q8_0`)
+- Reduce context size in options
+- **macOS:** Increase `nGpuLayers` for Metal GPU acceleration
+
+### Audio Not Working
+- Verify WAV format (16kHz, mono, PCM) -- no MP3/AAC/OGG
+- Check that the model supports audio (LFM2.5-Audio models)
+- Ensure mono channel -- stereo will be rejected
+- Audio output is typically 24kHz (different from 16kHz input)
+
+### Memory Issues
+- Always unload the model when done (see Critical Best Practices above)
+- Do not load multiple models simultaneously
+- Use appropriate quantization (`Q4_K_M` recommended)
+- Use smaller models on devices with limited RAM (e.g., LFM2-350M for 3GB devices, LFM2.5-1.2B for 6GB+ devices)
+
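+The last bullet can be written down as a simple selection rule. The thresholds mirror the guidance above; the helper itself is an illustrative sketch, not an SDK API:
+
+```kotlin
+// Illustrative mapping (not part of the SDK) from device RAM to a model slug.
+fun recommendedModelSlug(deviceRamGb: Int): String =
+    if (deviceRamGb >= 6) "LFM2.5-1.2B-Instruct" else "LFM2-350M"
+```
+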
+### Generation Fails
+- Check prompt length vs context window
+- Verify the model supports the feature you are using (vision, audio, function calling)
+- Check `isGenerating` before starting a new generation
+
+## Platform Requirements
+
+| Requirement | Android | iOS | macOS |
+|---|---|---|---|
+| **Minimum OS** | API 31 (Android 12) | 14.0+ | 11.0+ |
+| **Build tools** | Gradle + AGP | Xcode 15+ / Swift 5.9+ | Xcode 15+ / Swift 5.9+ |
+| **Distribution** | Maven (Gradle) | SPM | SPM |
+| **Device RAM** | 3GB min (6GB+ recommended) | 3GB min (6GB+ recommended) | 6GB+ recommended |
+| **Storage** | 500MB - 2GB per model | 500MB - 2GB per model | 500MB - 2GB per model |
+
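+On Android, the minimum-OS row corresponds to the `minSdk` setting in your module's `build.gradle.kts`. A fragment showing only the relevant setting:
+
+```kotlin
+android {
+    defaultConfig {
+        minSdk = 31 // API 31 (Android 12)
+    }
+}
+```
+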
+## Related Guides
+
+- [Quick Start Guide](./quick-start-guide) -- Get up and running in minutes
+- [Constrained Generation](./constrained-generation) -- Structured JSON output with schemas
+- [Function Calling](./function-calling) -- Tool use and agentic workflows
+- [Conversation & Generation](./conversation-generation) -- Deep dive into conversation management
+- [Messages & Content](./messages-content) -- Multimodal message types
+- [Model Loading](./model-loading) -- Advanced loading options and configuration
diff --git a/deployment/on-device/leap-sdk/cloud-ai-comparison.mdx b/deployment/on-device/leap-sdk/cloud-ai-comparison.mdx
new file mode 100644
index 0000000..4a04fb7
--- /dev/null
+++ b/deployment/on-device/leap-sdk/cloud-ai-comparison.mdx
@@ -0,0 +1,285 @@
+---
+title: "Cloud AI Comparison"
+description: "Compare the LEAP SDK with cloud-based AI APIs like OpenAI"
+sidebar_position: 5
+---
+
+If you are familiar with cloud-based AI APIs (e.g. [OpenAI API](https://openai.com/api/)), this document
+shows the similarities and differences between cloud APIs and the LEAP SDK.
+
+We will inspect this Python-based OpenAI API chat completion request and show how to achieve the same with LeapSDK.
+This example is modified from [OpenAI API documentation](https://platform.openai.com/docs/guides/streaming-responses?api-mode=chat).
+
+```python
+from openai import OpenAI
+client = OpenAI()
+
+stream = client.chat.completions.create(
+ model="gpt-4.1",
+ messages=[
+ {
+ "role": "user",
+ "content": "Say 'double bubble bath' ten times fast.",
+ },
+ ],
+ stream=True,
+)
+
+for chunk in stream:
+ if chunk.choices:
+ delta_content = chunk.choices[0].delta.content
+ if delta_content:
+ print(delta_content, end="", flush=True)
+
+print("")
+print("Generation done!")
+```
+
+## Loading the Model
+
+While cloud APIs let you use models immediately after creating a client, LeapSDK requires you to explicitly load the model first — because the model runs locally. This step generally takes a few seconds depending on model size and device performance.
+
+On cloud API, you create an API client:
+
+```python
+client = OpenAI()
+```
+
+In LeapSDK, you download and load the model to create a model runner:
+
+
+
+ ```kotlin
+ // Using LeapModelDownloader (Android - recommended)
+ val downloader = LeapModelDownloader(context)
+ val modelRunner = downloader.loadModel(
+ modelSlug = "LFM2.5-1.2B-Instruct",
+ quantizationSlug = "Q4_K_M"
+ )
+
+ // OR using LeapDownloader (cross-platform)
+ val downloader = LeapDownloader()
+ val modelRunner = downloader.loadModel(
+ modelSlug = "LFM2.5-1.2B-Instruct",
+ quantizationSlug = "Q4_K_M"
+ )
+ ```
+
+
+ ```swift
+ let modelRunner = try await Leap.load(
+ model: "LFM2.5-1.2B-Instruct",
+ quantization: "Q4_K_M"
+ )
+ ```
+
+
+
+The return value is a "model runner" which plays a similar role to the client object in the cloud API — except that it carries the model weights. If the model runner is released, the app has to reload the model before requesting new generations.
+
+## Requesting Generation
+
+In the cloud API, `client.chat.completions.create` returns a stream object:
+
+```python
+stream = client.chat.completions.create(
+ model="gpt-4.1",
+ messages=[
+ {
+ "role": "user",
+ "content": "Say 'double bubble bath' ten times fast.",
+ },
+ ],
+ stream=True,
+)
+```
+
+In LeapSDK, use `generateResponse` on the conversation object to get a stream for generation. Since the model runner already contains all model information, you don't need to specify the model name again:
+
+
+
+ ```kotlin
+ val conversation = modelRunner.createConversation()
+ val stream = conversation.generateResponse(
+ ChatMessage(
+ ChatMessage.Role.USER,
+ listOf(ChatMessageContent.Text("Say 'double bubble bath' ten times fast."))
+ )
+ )
+
+ // Simplified version with the same effect:
+ val stream = conversation.generateResponse("Say 'double bubble bath' ten times fast.")
+ ```
+
+
+ ```swift
+ let conversation = modelRunner.createConversation()
+ let stream = conversation.generateResponse(
+ message: ChatMessage(
+ role: .user,
+ content: [.text("Say 'double bubble bath' ten times fast.")]
+ )
+ )
+
+ // Simplified version with the same effect:
+ let stream = conversation.generateResponse(
+ userTextMessage: "Say 'double bubble bath' ten times fast."
+ )
+ ```
+
+
+
+## Processing Generated Content
+
+In cloud API Python code, a for-loop retrieves the content:
+
+```python
+for chunk in stream:
+ if chunk.choices:
+ delta_content = chunk.choices[0].delta.content
+ if delta_content:
+ print(delta_content, end="", flush=True)
+
+print("")
+print("Generation done!")
+```
+
+
+
+ In LeapSDK, call `onEach` on the Kotlin Flow to process content. Call `collect()` to start generation:
+
+ ```kotlin
+ stream.onEach { chunk ->
+ when (chunk) {
+ is MessageResponse.Chunk -> {
+ print(chunk.text)
+ }
+ else -> {}
+ }
+ }.onCompletion {
+ print("")
+ print("Generation done!")
+ }.collect()
+ ```
+
+
+ In LeapSDK, use a `for try await` loop on the AsyncThrowingStream:
+
+ ```swift
+ for try await response in stream {
+ switch response {
+ case .chunk(let text):
+ print(text, terminator: "")
+ case .reasoningChunk:
+ break
+ case .complete(let completion):
+ print("")
+ print("Generation done!")
+ if let stats = completion.stats {
+ print("Tokens: \(stats.totalTokens), Speed: \(stats.tokenPerSecond) tok/s")
+ }
+ default:
+ break
+ }
+ }
+ ```
+
+
+
+## Async Context
+
+Most LeapSDK APIs are asynchronous. You need an async context to execute them:
+
+
+
+ LeapSDK Android APIs use [Kotlin coroutines](https://kotlinlang.org/docs/coroutines-basics.html). Use `viewModelScope` in a ViewModel:
+
+ ```kotlin
+ class ChatViewModel(application: Application) : AndroidViewModel(application) {
+ private val downloader = LeapModelDownloader(application)
+ private var modelRunner: ModelRunner? = null
+ private var conversation: Conversation? = null
+
+ fun loadModelAndGenerate() {
+ viewModelScope.launch {
+ modelRunner = downloader.loadModel(
+ modelSlug = "LFM2.5-1.2B-Instruct",
+ quantizationSlug = "Q4_K_M"
+ )
+
+ conversation = modelRunner?.createConversation()
+
+ conversation?.generateResponse("Say 'double bubble bath' ten times fast.")
+ ?.onEach { chunk ->
+ when (chunk) {
+ is MessageResponse.Chunk -> print(chunk.text)
+ else -> {}
+ }
+ }?.onCompletion {
+ println("\nGeneration done!")
+ }?.collect()
+ }
+ }
+
+ override fun onCleared() {
+ super.onCleared()
+ // Unload off the main thread; runBlocking here would block and risk an ANR
+ CoroutineScope(Dispatchers.IO).launch {
+ modelRunner?.unload()
+ }
+ }
+ }
+ ```
+
+
+ LeapSDK iOS/macOS APIs use [Swift async/await](https://docs.swift.org/swift-book/documentation/the-swift-programming-language/concurrency/). Use `Task` or `async` functions within SwiftUI views:
+
+ ```swift
+ @MainActor
+ final class ChatViewModel: ObservableObject {
+ @Published var currentResponse = ""
+ private var modelRunner: ModelRunner?
+ private var conversation: Conversation?
+
+ func loadModel() async {
+ do {
+ modelRunner = try await Leap.load(
+ model: "LFM2.5-1.2B-Instruct",
+ quantization: "Q4_K_M"
+ )
+ conversation = modelRunner?.createConversation()
+ } catch {
+ print("Failed to load model: \(error)")
+ }
+ }
+
+ func sendMessage(_ text: String) {
+ guard let conversation else { return }
+
+ Task {
+ do {
+ for try await response in conversation.generateResponse(
+ message: ChatMessage(role: .user, content: [.text(text)])
+ ) {
+ switch response {
+ case .chunk(let text):
+ currentResponse += text
+ case .complete:
+ print("Generation done!")
+ currentResponse = ""
+ default:
+ break
+ }
+ }
+ } catch {
+ print("Generation error: \(error)")
+ }
+ }
+ }
+ }
+ ```
+
+
+
+## Next Steps
+
+For more information, see the [Quick Start Guide](./quick-start-guide).
diff --git a/deployment/on-device/leap-sdk/constrained-generation.mdx b/deployment/on-device/leap-sdk/constrained-generation.mdx
new file mode 100644
index 0000000..b3b7f7a
--- /dev/null
+++ b/deployment/on-device/leap-sdk/constrained-generation.mdx
@@ -0,0 +1,804 @@
+---
+title: "Constrained Generation"
+description: "Generate structured JSON output with compile-time validation using constrained generation"
+---
+
+LeapSDK provides powerful constrained generation capabilities that enable you to generate structured JSON output with compile-time validation. This feature ensures the AI model produces responses that conform to your predefined types, and works across both iOS and Android platforms.
+
+## Overview
+
+Constrained generation allows you to:
+
+- Define structured output formats using native types on each platform
+- Get compile-time validation of your type definitions
+- Generate JSON responses that are guaranteed to match your structures
+- Decode responses directly into type-safe objects
+
+## How It Works
+
+The constrained generation system works through a three-step process:
+
+1. **Compile-time**: The `@Generatable` annotation (Kotlin) or macro (Swift) analyzes your type and generates a JSON schema based on property types and `@Guide` descriptions
+2. **Runtime**: The generation options are configured with the generated schema to constrain the model's output
+3. **Generation**: The LLM produces valid JSON that conforms to your structure, which you can deserialize directly into your typed object
+
+
+The JSON schema generation happens at compile time, not runtime, ensuring optimal performance.
+
+
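+For illustration, the schema generated in step 1 for a type with a string `text` field and an integer `rating` field might resemble the following. The exact shape emitted by the SDK may differ; this is only to show what "constraining the output" means:
+
+```json
+{
+  "type": "object",
+  "description": "A short joke",
+  "properties": {
+    "text": { "type": "string", "description": "The joke text" },
+    "rating": { "type": "integer", "description": "Humor rating from 1-10" }
+  },
+  "required": ["text", "rating"]
+}
+```
+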
+## Setup
+
+
+
+
+The constrained generation annotations are included in the LeapSDK dependency. No additional setup is required beyond your existing LeapSDK integration.
+
+```kotlin
+import ai.liquid.leap.structuredoutput.Generatable
+import ai.liquid.leap.structuredoutput.Guide
+```
+
+
+
+
+When adding LeapSDK via Swift Package Manager, the constrained generation macros are automatically available. No additional setup is required.
+
+```swift
+import LeapSDK
+import Foundation
+```
+
+
+Constrained generation requires Swift 5.9+ and uses Swift macros for compile-time code generation.
+
+
+
+
+
+## Defining Structured Types
+
+Use the `@Generatable` and `@Guide` annotations (Kotlin) or macros (Swift) to define types for structured output.
+
+### Basic Example
+
+
+
+
+Only Kotlin data classes can be annotated with `@Generatable`, and all fields must be declared as parameters of the primary constructor. The `@Guide` annotation adds a description to each field to help the AI understand what it should contain.
+
+```kotlin
+@Generatable(description = "A joke with metadata")
+data class Joke(
+ @Guide(description = "The joke text")
+ val text: String,
+
+ @Guide(description = "The category of humor (pun, dad-joke, programming, etc.)")
+ val category: String,
+
+ @Guide(description = "Humor rating from 1-10")
+ val rating: Int,
+
+ @Guide(description = "Whether the joke is suitable for children")
+ val kidFriendly: Boolean,
+)
+```
+
+
+
+
+The `@Generatable` macro automatically generates conformance to the `GeneratableType` protocol, a `typeDescription` property, and a `jsonSchema()` method. The `@Guide` macro provides descriptions for individual properties that help the AI understand what each field should contain.
+
+```swift
+@Generatable("A joke with metadata")
+struct Joke: Codable {
+ @Guide("The joke text")
+ let text: String
+
+ @Guide("The category of humor (pun, dad-joke, programming, etc.)")
+ let category: String
+
+ @Guide("Humor rating from 1-10")
+ let rating: Int
+
+ @Guide("Whether the joke is suitable for children")
+ let kidFriendly: Bool
+}
+```
+
+
+
+
+## Setting the Response Format
+
+
+
+
+Use `setResponseFormatType()` in `GenerationOptions` to set up the constraint:
+
+```kotlin
+val options = GenerationOptions.build {
+ // Set the response format to follow `Joke`
+ setResponseFormatType(Joke::class)
+ // Example of other parameters
+ minP = 0.0f
+ temperature = 0.7f
+}
+
+conversation.generateResponse("Create a programming joke in JSON format", options)
+```
+
+If you want to add the JSON Schema into the prompt to help the generation, you can get the raw JSON Schema with `JSONSchemaGenerator`:
+
+```kotlin
+val jsonSchema = JSONSchemaGenerator.getJSONSchema(Joke::class)
+conversation.generateResponse(
+ "Create a programming joke following this JSON Schema: $jsonSchema",
+ options
+)
+```
+
+If the JSON Schema cannot be created from the provided data class, a `LeapGeneratableSchematizationException` will be thrown.
+
+
+
+
+Use `setResponseFormat(type:)` on `GenerationOptions` to configure the response format:
+
+```swift
+var options = GenerationOptions()
+options.temperature = 0.7
+
+do {
+ // Set the response format to your custom type
+ try options.setResponseFormat(type: Joke.self)
+} catch {
+ print("Failed to set response format: \(error)")
+}
+```
+
+
+
+
+## Deserializing Output
+
+
+
+
+Use `GeneratableFactory.createFromJSONObject()` to deserialize the JSON string generated by the model into the generatable data class:
+
+```kotlin
+import ai.liquid.leap.structuredoutput.GeneratableFactory
+
+conversation.generateResponse(
+ "Create a programming joke.",
+ options
+).onEach {
+ if (it is MessageResponse.Complete) {
+ val message = it.fullMessage
+ val jsonContent = (message.content.first() as ChatMessageContent.Text).text
+
+ // Deserialize the content as a `Joke` object.
+ val joke: Joke = GeneratableFactory.createFromJSONObject(
+ JSONObject(jsonContent),
+ )
+
+ println("Text: ${joke.text}")
+ println("Category: ${joke.category}")
+ println("Rating: ${joke.rating}/10")
+ println("Kid-friendly: ${joke.kidFriendly}")
+ }
+}.collect()
+```
+
+If the JSON string generated by the model is not valid for creating instances of the generatable data class, a `LeapGeneratableDeserializationException` will be thrown.
+
+
+
+
+Decode the JSON response using `JSONDecoder` or `GeneratableFactory.createFromJSONObject()`:
+
+```swift
+func generateStructuredJoke() async {
+ guard let conversation = conversation else { return }
+
+ var options = GenerationOptions()
+ options.temperature = 0.7
+
+ do {
+ try options.setResponseFormat(type: Joke.self)
+
+ let message = ChatMessage(
+ role: .user,
+ content: [.text("Create a programming joke in JSON format")]
+ )
+
+ for try await response in conversation.generateResponse(
+ message: message,
+ generationOptions: options
+ ) {
+ switch response {
+ case .chunk(let token):
+ print(token, terminator: "")
+ case .audioSample:
+ break
+ case .reasoningChunk:
+ break
+ case .complete(let completion):
+ let jsonFragments = completion.message.content.compactMap { part -> String? in
+ if case .text(let value) = part { return value }
+ return nil
+ }
+ let jsonText = jsonFragments.joined()
+ guard !jsonText.isEmpty else { continue }
+
+ if let jokeData = jsonText.data(using: .utf8) {
+ let joke = try JSONDecoder().decode(Joke.self, from: jokeData)
+ print("Text: \(joke.text)")
+ print("Category: \(joke.category)")
+ print("Rating: \(joke.rating)/10")
+ print("Kid-friendly: \(joke.kidFriendly)")
+ }
+ }
+ }
+ } catch {
+ print("Failed: \(error)")
+ }
+}
+```
+
+
+`completion.message.content` can contain multiple fragments (text, audio, images). The JSON payload you need for decoding typically lives in the `.text` fragments. Filter and join those fragments before decoding, as shown above.
+
+
+
+
+
+## Supported Data Types
+
+Not all data types are supported in constrained generation. Here is the list of supported JSON Schema types:
+
+| JSON Schema Type | Kotlin Types | Swift Types |
+|---|---|---|
+| String | `String` | `String` |
+| Integer | `Int`, `Long` | `Int` |
+| Number | `Float`, `Double` | `Double` |
+| Boolean | `Boolean` | `Bool` |
+| Enum | Enum class (plain name strings as values) | — |
+| Object | Data classes annotated with `@Generatable` | Structs with `@Generatable` macro |
+| Array | `List`, `MutableList` of supported types; arrays of integer, float, and boolean | `[T]` (Array) of supported types |
+| Optional | — | `T?` (Optional) |
+
+Object and array types can be used only when their field types and item types, respectively, are themselves supported.
+
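+The enum row in the table has no example elsewhere on this page, so here is a sketch of what an enum-typed field might look like. This is an assumption based on the table (enum values serialized as plain name strings); the exact declaration requirements may differ:
+
+```kotlin
+// Sketch only: enum field on a @Generatable data class (Kotlin).
+enum class Verdict { POSITIVE, MIXED, NEGATIVE }
+
+@Generatable(description = "A movie review")
+data class MovieReview(
+    @Guide(description = "Overall verdict, one of the Verdict names")
+    val verdict: Verdict,
+)
+```
+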
+## Advanced Examples
+
+### Complex Nested Structures
+
+
+
+
+```kotlin
+@Generatable(description = "Facts about a city")
+data class CityFact(
+ @Guide(description = "Name of the city")
+ val name: String,
+
+ @Guide(description = "State/province of the city")
+ val state: String,
+
+ @Guide(description = "Country name")
+ val country: String,
+
+ @Guide(description = "Places of interest in the city")
+ val placeOfInterests: List<String>,
+)
+
+// Usage
+val options = GenerationOptions.build {
+ setResponseFormatType(CityFact::class)
+ temperature = 0.7f
+}
+
+conversation.generateResponse("Show the city facts about Tokyo", options)
+ .onEach {
+ if (it is MessageResponse.Complete) {
+ val message = it.fullMessage
+ val jsonContent = (message.content.first() as ChatMessageContent.Text).text
+ val cityFact: CityFact = GeneratableFactory.createFromJSONObject(
+ JSONObject(jsonContent),
+ )
+ }
+ }.collect()
+```
+
+
+
+
+```swift
+@Generatable("A recipe with ingredients and instructions")
+struct Recipe: Codable {
+ @Guide("Name of the dish")
+ let name: String
+
+ @Guide("List of ingredients with quantities")
+ let ingredients: [String]
+
+ @Guide("Step-by-step cooking instructions")
+ let instructions: [String]
+
+ @Guide("Cooking time in minutes")
+ let cookingTimeMinutes: Int
+
+ @Guide("Difficulty level: easy, medium, or hard")
+ let difficulty: String
+
+ @Guide("Number of servings this recipe makes")
+ let servings: Int?
+
+ @Guide("Nutritional information if available")
+ let nutrition: NutritionInfo?
+}
+
+@Generatable("Nutritional information for a recipe")
+struct NutritionInfo: Codable {
+ @Guide("Calories per serving")
+ let caloriesPerServing: Int
+
+ @Guide("Protein in grams")
+ let proteinGrams: Double
+
+ @Guide("Carbohydrates in grams")
+ let carbsGrams: Double
+}
+```
+
+
+
+
+### Mathematical Problem Solving
+
+
+
+
+```kotlin
+@Generatable(description = "Mathematical calculation result with detailed steps")
+data class MathResult(
+ @Guide(description = "The mathematical expression that was solved")
+ val expression: String,
+
+ @Guide(description = "The final numeric result")
+ val result: Double,
+
+ @Guide(description = "Step-by-step solution process")
+ val steps: List<String>,
+
+ @Guide(description = "The mathematical operation type (addition, multiplication, etc.)")
+ val operationType: String,
+
+ @Guide(description = "Whether the solution is exact or approximate")
+ val isExact: Boolean,
+)
+
+// Usage
+val options = GenerationOptions.build {
+ setResponseFormatType(MathResult::class)
+ temperature = 0.3f // Lower temperature for mathematical accuracy
+}
+
+conversation.generateResponse(
+ "Solve: 15 x 4 + 8 / 2. Show your work step by step.",
+ options
+)
+```
+
+
+
+
+```swift
+@Generatable("Mathematical calculation result with detailed steps")
+struct MathResult: Codable {
+ @Guide("The mathematical expression that was solved")
+ let expression: String
+
+ @Guide("The final numeric result")
+ let result: Double
+
+ @Guide("Step-by-step solution process")
+ let steps: [String]
+
+ @Guide("The mathematical operation type (addition, multiplication, etc.)")
+ let operationType: String
+
+ @Guide("Whether the solution is exact or approximate")
+ let isExact: Bool
+}
+
+// Usage
+var options = GenerationOptions()
+options.temperature = 0.3 // Lower temperature for mathematical accuracy
+
+try options.setResponseFormat(type: MathResult.self)
+
+let message = ChatMessage(
+ role: .user,
+ content: [.text("Solve: 15 x 4 + 8 / 2. Show your work step by step.")]
+)
+
+// Process the response...
+```
+
+
+
+
+### Data Analysis Results
+
+
+
+
+```kotlin
+@Generatable(description = "Statistical summary of data")
+data class StatisticalSummary(
+ @Guide(description = "Total number of data points")
+ val totalPoints: Int,
+
+ @Guide(description = "Mean value")
+ val mean: Double,
+
+ @Guide(description = "Standard deviation")
+ val standardDeviation: Double,
+
+ @Guide(description = "Minimum value observed")
+ val minimum: Double,
+
+ @Guide(description = "Maximum value observed")
+ val maximum: Double,
+)
+
+@Generatable(description = "Analysis results for a dataset")
+data class DataAnalysis(
+ @Guide(description = "Name or description of the dataset")
+ val datasetName: String,
+
+ @Guide(description = "Key insights discovered")
+ val insights: List<String>,
+
+ @Guide(description = "Statistical summary")
+ val statistics: StatisticalSummary,
+
+ @Guide(description = "Recommended next steps")
+ val recommendations: List<String>,
+)
+```
+
+
+
+
+```swift
+@Generatable("Analysis results for a dataset")
+struct DataAnalysis: Codable {
+ @Guide("Name or description of the dataset")
+ let datasetName: String
+
+ @Guide("Key insights discovered")
+ let insights: [String]
+
+ @Guide("Statistical summary")
+ let statistics: StatisticalSummary
+
+ @Guide("Recommended next steps")
+ let recommendations: [String]
+}
+
+@Generatable("Statistical summary of data")
+struct StatisticalSummary: Codable {
+ @Guide("Total number of data points")
+ let totalPoints: Int
+
+ @Guide("Mean value")
+ let mean: Double
+
+ @Guide("Standard deviation")
+ let standardDeviation: Double
+
+ @Guide("Minimum value observed")
+ let minimum: Double
+
+ @Guide("Maximum value observed")
+ let maximum: Double
+}
+```
+
+
+
+
+## Best Practices
+
+### 1. Use Descriptive Guide Annotations
+
+Good `@Guide` descriptions help the AI understand what each field should contain:
+
+
+
+
+```kotlin
+// Good - specific and descriptive
+@Guide(description = "The programming language name (e.g., Kotlin, Python, JavaScript)")
+val language: String
+
+// Less helpful - too generic
+@Guide(description = "A string")
+val language: String
+```
+
+
+
+
+```swift
+// Good - specific and descriptive
+@Guide("The programming language name (e.g., Swift, Python, JavaScript)")
+let language: String
+
+// Less helpful - too generic
+@Guide("A string")
+let language: String
+```
+
+
+
+
+### 2. Keep Structures Focused
+
+Smaller, well-defined types work better than large complex ones:
+
+
+
+
+```kotlin
+// Good - focused single responsibility
+@Generatable(description = "A user's basic profile information")
+data class UserProfile(
+ @Guide(description = "Full name") val name: String,
+ @Guide(description = "Email address") val email: String,
+ @Guide(description = "Age in years") val age: Int,
+)
+
+// Less ideal - too many responsibilities
+@Generatable(description = "Everything about a user")
+data class ComplexUser(
+ // ... 20+ properties mixing profile, preferences, history, etc.
+)
+```
+
+
+
+
+```swift
+// Good - focused single responsibility
+@Generatable("A user's basic profile information")
+struct UserProfile: Codable {
+ @Guide("Full name") let name: String
+ @Guide("Email address") let email: String
+ @Guide("Age in years") let age: Int
+}
+
+// Less ideal - too many responsibilities
+@Generatable("Everything about a user")
+struct ComplexUser: Codable {
+ // ... 20+ properties mixing profile, preferences, history, etc.
+}
+```
+
+
+
+
+### 3. Handle Optional Fields Appropriately
+
+Use optional types when fields might not always be present:
+
+
+
+
+```kotlin
+@Generatable(description = "A book review")
+data class BookReview(
+ @Guide(description = "The book title")
+ val title: String,
+
+ @Guide(description = "Review text")
+ val reviewText: String,
+
+ @Guide(description = "Rating from 1-5 stars, if provided")
+ val rating: Int?, // Nullable - reviewer might not provide a rating
+
+ @Guide(description = "Reviewer's name, if available")
+ val reviewerName: String?, // Nullable - might be anonymous
+)
+```
+
+
+
+
+```swift
+@Generatable("A book review")
+struct BookReview: Codable {
+ @Guide("The book title")
+ let title: String
+
+ @Guide("Review text")
+ let reviewText: String
+
+ @Guide("Rating from 1-5 stars, if provided")
+ let rating: Int? // Optional - reviewer might not provide a rating
+
+ @Guide("Reviewer's name, if available")
+ let reviewerName: String? // Optional - might be anonymous
+}
+```
+
+
+
+
+### 4. Validate Generated Output
+
+Always handle potential parsing errors gracefully:
+
+
+
+
+```kotlin
+try {
+ val result: Joke = GeneratableFactory.createFromJSONObject(
+ JSONObject(jsonContent),
+ )
+ processJoke(result)
+} catch (e: LeapGeneratableDeserializationException) {
+ println("Failed to deserialize structured response: ${e.message}")
+ // Fallback to treating as plain text
+ processPlainText(jsonContent)
+}
+```
+
+
+
+
+```swift
+private func parseResponse<T: Decodable>(_ jsonText: String, as type: T.Type) -> T? {
+ guard let data = jsonText.data(using: .utf8) else {
+ print("Failed to convert response to data")
+ return nil
+ }
+
+ do {
+ return try JSONDecoder().decode(type, from: data)
+ } catch {
+ print("Failed to decode response as \(type): \(error)")
+ return nil
+ }
+}
+```
+
+
+
+
+## Error Handling
+
+
+
+
+### Schema Errors
+
+If the JSON Schema cannot be created from the provided data class, a `LeapGeneratableSchematizationException` will be thrown at the point where `setResponseFormatType()` is called.
+
+### Deserialization Errors
+
+If the JSON string generated by the model is not valid for creating instances of the generatable data class, a `LeapGeneratableDeserializationException` will be thrown by `GeneratableFactory.createFromJSONObject()`.
+
+```kotlin
+try {
+ val result: Joke = GeneratableFactory.createFromJSONObject(
+ JSONObject(jsonContent),
+ )
+ processJoke(result)
+} catch (e: LeapGeneratableDeserializationException) {
+ println("Deserialization failed: ${e.message}")
+ processPlainText(jsonContent)
+} catch (e: LeapGeneratableSchematizationException) {
+ println("Schema generation failed: ${e.message}")
+}
+```
+
+
+
+
+### Compile-time Errors
+
+All properties in a `@Generatable` struct must have a `@Guide` annotation:
+
+```swift
+// Error: Missing @Guide annotation
+@Generatable("A person")
+struct Person: Codable {
+ let name: String // Missing @Guide - compile error
+ @Guide("Age in years") let age: Int
+}
+
+// Fixed: All properties must have @Guide
+@Generatable("A person")
+struct Person: Codable {
+ @Guide("Full name") let name: String
+ @Guide("Age in years") let age: Int
+}
+```
+
+### Runtime Parsing Errors
+
+Handle cases where the AI generates invalid JSON:
+
+```swift
+func handleResponse(_ jsonText: String) {
+ do {
+ let data = jsonText.data(using: .utf8)!
+ let result = try JSONDecoder().decode(Joke.self, from: data)
+ processJoke(result)
+ } catch {
+ print("Failed to parse structured response: \(error)")
+ // Fallback to treating as plain text
+ processPlainText(jsonText)
+ }
+}
+```
+
+
+
+
+## Troubleshooting
+
+
+
+
+### Generated JSON doesn't match expected format
+
+- Check your `@Guide` descriptions are clear and specific
+- Try adjusting the temperature in `GenerationOptions` (lower values like 0.3-0.5 can improve structured output)
+- Include the JSON Schema in the prompt using `JSONSchemaGenerator.getJSONSchema()` to give the model additional guidance
+- Ensure your prompt clearly requests JSON format output
+
+### `LeapGeneratableSchematizationException` thrown
+
+- Verify that only supported data types are used in your data class fields
+- Ensure the class is a `data class`, not a regular class
+- All fields must be declared in the primary constructor
+
+
+
+
+### "Cannot find type 'GeneratableType' in scope"
+
+Make sure you've imported the constrained generation package:
+
+```swift
+import LeapSDK
+import LeapSDKConstrainedGeneration // Required for macros
+```
+
+### "External macro implementation could not be found"
+
+This typically means there's an issue with the macro plugin. Try:
+
+1. Clean your build folder (Cmd+Shift+K)
+2. Restart Xcode
+3. Ensure you're using Swift 5.9 or later
+
+### Generated JSON doesn't match expected format
+
+- Check your `@Guide` descriptions are clear and specific
+- Try adjusting the temperature in `GenerationOptions` (lower values like 0.3-0.5 can improve structured output)
+- Ensure your prompt clearly requests JSON format output
+
+
+
+
+
+If you encounter persistent issues with constrained generation, try testing with a simpler
+structure first to verify the basic functionality is working.
+
diff --git a/deployment/on-device/leap-sdk/conversation-generation.mdx b/deployment/on-device/leap-sdk/conversation-generation.mdx
new file mode 100644
index 0000000..21c5939
--- /dev/null
+++ b/deployment/on-device/leap-sdk/conversation-generation.mdx
@@ -0,0 +1,571 @@
+---
+title: "Conversation & Generation"
+description: "API reference for conversations, model runners, and generation in the LEAP SDK"
+---
+
+
+All functions in this document are safe to call from the main thread, and all callbacks run on the main thread, unless explicitly noted otherwise.
+
+
+## ModelRunner
+
+A `ModelRunner` represents a loaded model instance that creates conversations and drives generation.
+
+
+
+```kotlin
+interface ModelRunner {
+ fun createConversation(systemPrompt: String? = null): Conversation
+ fun createConversationFromHistory(history: List<ChatMessage>): Conversation
+ suspend fun unload()
+ fun generateFromConversation(
+ conversation: Conversation,
+ callback: GenerationCallback,
+ generationOptions: GenerationOptions? = null,
+ ): GenerationHandler
+}
+```
+
+
+```swift
+public protocol ModelRunner {
+ func createConversation(systemPrompt: String?) -> Conversation
+ func createConversationFromHistory(history: [ChatMessage]) -> Conversation
+ func generateResponse(
+ conversation: Conversation,
+ generationOptions: GenerationOptions?,
+ onResponseCallback: @escaping (MessageResponse) -> Void,
+ onErrorCallback: ((LeapError) -> Void)?
+ ) -> GenerationHandler
+ func unload() async
+ var modelId: String { get }
+}
+```
+
+
+
+### Lifecycle
+
+- Create conversations using `createConversation(systemPrompt:)` or `createConversationFromHistory(history:)`.
+- Hold a strong reference to the `ModelRunner` for as long as you need to perform generations. If the model runner is destroyed, any conversations it created will fail to generate.
+- Call `unload()` when you are done to release native resources.
+- On iOS, `unload()` is `async` and cleanup also happens automatically on `deinit`. Access `modelId` to identify the loaded model.
+- On Android, if you need your model runner to survive after the destruction of activities, you may need to wrap it in an [Android Service](https://developer.android.com/develop/background-work/services).
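+
+Putting the lifecycle together, a minimal Kotlin sketch looks like this (the `scope` coroutine scope and the way `modelRunner` was loaded are assumptions, not part of this API):
+
+```kotlin
+// Assumes `modelRunner` was obtained from your model-loading code.
+val conversation = modelRunner.createConversation(
+    systemPrompt = "You are a helpful assistant."
+)
+
+// ... generate responses with `conversation` ...
+
+// Release native resources when done. `unload()` is a suspend
+// function, so call it from a coroutine.
+scope.launch {
+    modelRunner.unload()
+}
+```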
+
+### Low-level generation API
+
+Both platforms expose a lower-level generation method that returns a `GenerationHandler` for cancellation. Most apps should use the higher-level streaming helpers on `Conversation`, but you can invoke this method directly when you need fine-grained control.
+
+
+
+`generateFromConversation(...)` is an internal interface for the model runner implementation. `Conversation.generateResponse` is the recommended wrapper, which relies on Kotlin coroutines for lifecycle-aware components.
+
+
+`generateFromConversation` may block the calling thread. If you must use it, invoke it off the main thread.
+
+
+
+`generateResponse(...)` drives generation with callbacks and returns a `GenerationHandler` you can store to cancel the run.
+
+```swift
+let handler = runner.generateResponse(
+ conversation: conversation,
+ generationOptions: options,
+ onResponseCallback: { message in
+ // Handle MessageResponse values here
+ },
+ onErrorCallback: { error in
+ // Handle LeapError
+ }
+)
+
+// Stop generation early if needed
+handler.stop()
+```
+
+
+
+## GenerationHandler
+
+The handler returned by the low-level generation API or `Conversation.generateResponse` lets you cancel generation without tearing down the conversation.
+
+
+
+```kotlin
+interface GenerationHandler {
+ fun stop()
+}
+```
+
+
+```swift
+public protocol GenerationHandler: Sendable {
+ func stop()
+}
+```
+
+
+
+## Conversation
+
+`Conversation` tracks chat state and provides streaming helpers built on top of the model runner. Instances should always be created from a `ModelRunner`, not initialized directly.
+
+
+
+```kotlin
+interface Conversation {
+ val history: List<ChatMessage>
+ val isGenerating: Boolean
+
+ fun generateResponse(
+ userTextMessage: String,
+ generationOptions: GenerationOptions? = null
+ ): Flow<MessageResponse>
+
+ fun generateResponse(
+ message: ChatMessage,
+ generationOptions: GenerationOptions? = null
+ ): Flow<MessageResponse>
+
+ fun registerFunction(function: LeapFunction)
+ fun exportToJSONArray(): JSONArray
+}
+```
+
+
+```swift
+public class Conversation {
+ public let modelRunner: ModelRunner
+ public private(set) var history: [ChatMessage]
+ public private(set) var functions: [LeapFunction]
+ public private(set) var isGenerating: Bool
+
+ public init(modelRunner: ModelRunner, history: [ChatMessage])
+
+ public func registerFunction(_ function: LeapFunction)
+ public func exportToJSON() throws -> [[String: Any]]
+
+ public func generateResponse(
+ userTextMessage: String,
+ generationOptions: GenerationOptions? = nil
+ ) -> AsyncThrowingStream<MessageResponse, Error>
+
+ public func generateResponse(
+ message: ChatMessage,
+ generationOptions: GenerationOptions? = nil
+ ) -> AsyncThrowingStream<MessageResponse, Error>
+
+ @discardableResult
+ public func generateResponse(
+ message: ChatMessage,
+ generationOptions: GenerationOptions? = nil,
+ onResponse: @escaping (MessageResponse) -> Void
+ ) -> GenerationHandler?
+}
+```
+
+
+
+### Properties
+
+- **`history`** -- Returns a **copy** of the accumulated chat messages. The SDK appends the assistant reply when a generation finishes successfully. During an ongoing generation, the partial message may not be present.
+- **`isGenerating`** -- `true` while a generation is running. On Android, its value is consistent across all threads. On iOS, attempting to start a new generation while this is `true` immediately finishes with an empty stream (or a `nil` handler for the callback variant).
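+
+For example, these properties can guard against overlapping requests (a Kotlin sketch; `startGeneration` is a hypothetical helper that collects the flow):
+
+```kotlin
+fun trySend(userInput: String) {
+    // Ignore input while a generation is already running.
+    if (conversation.isGenerating) {
+        Log.w(TAG, "Generation in progress; ignoring input")
+        return
+    }
+    Log.d(TAG, "History so far: ${conversation.history.size} messages")
+    startGeneration(userInput) // hypothetical helper
+}
+```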
+
+### Streaming Generation
+
+The primary pattern for generating responses is to collect the stream returned by `generateResponse`.
+
+
+
+The return value is a Kotlin [asynchronous Flow](https://kotlinlang.org/docs/flow.html). Generation does **not** start until the flow is collected. Refer to the [Android documentation](https://developer.android.com/topic/libraries/architecture/coroutines) on how to properly handle flows with lifecycle-aware components.
+
+```kotlin
+viewModelScope.launch {
+ conversation.generateResponse(userInput)
+ .onEach { response ->
+ when (response) {
+ is MessageResponse.Chunk -> {
+ generatedText += response.text
+ }
+ is MessageResponse.ReasoningChunk -> {
+ Log.d(TAG, "Reasoning: ${response.reasoning}")
+ }
+ is MessageResponse.FunctionCalls -> {
+ handleFunctionCalls(response.functionCalls)
+ }
+ is MessageResponse.AudioSample -> {
+ audioRenderer.enqueue(response.samples, response.sampleRate)
+ }
+ is MessageResponse.Complete -> {
+ Log.d(TAG, "Generation is done!")
+ }
+ }
+ }
+ .catch { e -> Log.e(TAG, "Generation failed", e) }
+ .collect()
+}
+```
+
+
+Errors will be thrown as `LeapGenerationException` in the stream. Use `.catch` to capture errors from the generation.
+
+
+
+If a generation is already running, further generation requests block until the current generation finishes. Queued requests are not guaranteed to be processed in the order they were submitted.
+
+
+
+The async-stream helpers return an `AsyncThrowingStream`. Iterate with `for try await` inside a `Task`.
+
+```swift
+let user = ChatMessage(role: .user, content: [.text("Hello! What can you do?")])
+
+Task {
+ do {
+ for try await response in conversation.generateResponse(
+ message: user,
+ generationOptions: GenerationOptions(temperature: 0.7)
+ ) {
+ switch response {
+ case .chunk(let delta):
+ print(delta, terminator: "")
+ case .reasoningChunk(let thought):
+ print("Reasoning:", thought)
+ case .functionCall(let calls):
+ handleFunctionCalls(calls)
+ case .audioSample(let samples, let sampleRate):
+ audioRenderer.enqueue(samples, sampleRate: sampleRate)
+ case .complete(let completion):
+ let text = completion.message.content.compactMap { item in
+ if case .text(let value) = item { return value }
+ return nil
+ }.joined()
+ print("\nComplete:", text)
+ if let stats = completion.stats {
+ print("Prompt tokens: \(stats.promptTokens), completions: \(stats.completionTokens)")
+ }
+ }
+ }
+ } catch {
+ print("Generation failed: \(error)")
+ }
+}
+```
+
+Cancelling the `Task` that iterates the stream stops generation and cleans up native resources.
+
+
+
+### Callback Convenience (Swift only)
+
+On Swift, use `generateResponse(message:onResponse:)` when you prefer callbacks or need to integrate with imperative UI components:
+
+```swift
+let handler = conversation.generateResponse(message: user) { response in
+ updateUI(with: response)
+}
+
+// Later
+handler?.stop()
+```
+
+If a generation is already running, the method returns `nil` and emits a `.complete` message with `finishReason == .stop` via the callback.
+
+
+The callback overload does not surface generation errors. Use the async-stream helper or call
+`ModelRunner.generateResponse` with `onErrorCallback` when you need error handling.
+
+
+### Function Registration
+
+Register functions for the model to invoke during generation. See the [Function Calling](./function-calling) guide for detailed usage.
+
+### Export Chat History
+
+Export the conversation history into a serialized format that mirrors OpenAI's chat-completions schema. Useful for persistence, analytics, or debugging.
+
+
+
+`exportToJSONArray()` returns a `JSONArray`. Each element can be interpreted as a `ChatCompletionRequestMessage` instance in the OpenAI API schema.
+
+See also: [Serialization Support](./utilities#serialization-support).
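+
+For example, the exported history can be persisted as a string (a sketch; `saveChatHistory` is a hypothetical persistence helper):
+
+```kotlin
+val historyJson = conversation.exportToJSONArray().toString()
+saveChatHistory(historyJson) // e.g. write to a file or database
+```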
+
+
+`exportToJSON()` returns a `[[String: Any]]` payload.
+
+
+
+### Cancellation
+
+
+
+Generation stops when the coroutine `Job` that collects the flow is cancelled. We highly recommend using a `ViewModel` with `viewModelScope` to manage the generation lifecycle. The generation will be automatically cancelled when the ViewModel is cleared.
+
+```kotlin
+import kotlinx.coroutines.Dispatchers
+import kotlinx.coroutines.Job
+import kotlinx.coroutines.runBlocking
+
+class ChatViewModel(application: Application) : AndroidViewModel(application) {
+ private var conversation: Conversation? = null
+ private var modelRunner: ModelRunner? = null
+ private var generationJob: Job? = null
+
+ private val _generatedText = MutableStateFlow("")
+ val generatedText: StateFlow<String> = _generatedText.asStateFlow()
+
+ fun generateResponse(userInput: String) {
+ generationJob = viewModelScope.launch {
+ _generatedText.value = ""
+ conversation?.generateResponse(userInput)
+ ?.onEach { response ->
+ when (response) {
+ is MessageResponse.Chunk -> {
+ _generatedText.value += response.text
+ }
+ is MessageResponse.Complete -> {
+ Log.d(TAG, "Generation is done!")
+ }
+ else -> {}
+ }
+ }
+ ?.collect()
+ }
+ }
+
+ fun stopGeneration() {
+ generationJob?.cancel()
+ generationJob = null
+ }
+
+ override fun onCleared() {
+ super.onCleared()
+ generationJob?.cancel()
+
+ // Use runBlocking to ensure model is unloaded before ViewModel is destroyed
+ // viewModelScope is cancelled during clearing, so we need a non-cancelled context
+ runBlocking(Dispatchers.IO) {
+ modelRunner?.unload()
+ }
+ }
+
+ companion object {
+ private const val TAG = "ChatViewModel"
+ }
+}
+```
+
+
+Cancel the `Task` that iterates the `AsyncThrowingStream` to stop generation and clean up native resources. Alternatively, call `stop()` on the `GenerationHandler` returned by the callback-based API.
+
+```swift
+// Store the task
+let generationTask = Task {
+ for try await response in conversation.generateResponse(message: user) {
+ handleResponse(response)
+ }
+}
+
+// Cancel later
+generationTask.cancel()
+```
+
+
+
+## MessageResponse
+
+The response emitted during generation. Text is streamed as chunks, with a final completion signal when the model finishes.
+
+
+
+```kotlin
+sealed interface MessageResponse {
+ class Chunk(val text: String) : MessageResponse
+ class ReasoningChunk(val reasoning: String) : MessageResponse
+ class FunctionCalls(val functionCalls: List<LeapFunctionCall>) : MessageResponse
+ class AudioSample(val samples: FloatArray, val sampleRate: Int) : MessageResponse
+ class Complete(
+ val fullMessage: ChatMessage,
+ val finishReason: GenerationFinishReason,
+ val stats: GenerationStats?,
+ ) : MessageResponse
+}
+```
+
+
+```swift
+public enum MessageResponse {
+ case chunk(String)
+ case reasoningChunk(String)
+ case audioSample(samples: [Float], sampleRate: Int)
+ case functionCall([LeapFunctionCall])
+ case complete(MessageCompletion)
+}
+
+public struct MessageCompletion {
+ public let message: ChatMessage
+ public let finishReason: GenerationFinishReason
+ public let stats: GenerationStats?
+
+ public var info: GenerationCompleteInfo { get }
+}
+
+public struct GenerationCompleteInfo {
+ public let finishReason: GenerationFinishReason
+ public let stats: GenerationStats?
+}
+```
+
+
+
+### Response types
+
+- **Chunk** -- Partial assistant text emitted during streaming.
+- **ReasoningChunk** -- Model reasoning tokens (only for models that expose reasoning traces, which wrap reasoning between `<think>` and `</think>` tokens).
+- **AudioSample** -- PCM audio frames streamed from audio-capable checkpoints. Feed them into an audio renderer or buffer for later playback. The sample rate remains constant throughout a generation.
+- **FunctionCall / FunctionCalls** -- One or more function/tool invocations requested by the model. See the [Function Calling](./function-calling) guide.
+- **Complete** -- Signals the end of generation. Access the assembled assistant reply through the full message. The `finishReason` indicates why generation stopped (`STOP` means the model decided to stop; `EXCEED_CONTEXT` means the maximum context length was reached). The optional `stats` field contains generation statistics.
+
+Errors during streaming are delivered through the thrown error of `AsyncThrowingStream` (Swift) or as `LeapGenerationException` in the `Flow` (Kotlin).
+
+## GenerationStats
+
+Statistics about a completed generation.
+
+
+
+```kotlin
+data class GenerationStats(
+ val promptTokens: Long,
+ val completionTokens: Long,
+ val totalTokens: Long,
+ val tokenPerSecond: Float,
+)
+```
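+
+For example, stats can be logged when the `Complete` response arrives (a Kotlin sketch):
+
+```kotlin
+conversation.generateResponse(userInput).onEach { response ->
+    if (response is MessageResponse.Complete) {
+        response.stats?.let { stats ->
+            Log.d(
+                TAG,
+                "Generated ${stats.completionTokens} tokens " +
+                    "at ${stats.tokenPerSecond} tok/s (${stats.totalTokens} total)"
+            )
+        }
+    }
+}.collect()
+```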
+
+
+```swift
+public struct GenerationStats {
+ public var promptTokens: UInt64
+ public var completionTokens: UInt64
+ public var totalTokens: UInt64
+ public var tokenPerSecond: Float
+}
+```
+
+
+
+## GenerationOptions
+
+Tune generation behavior per request. Leave a field as `nil`/`null` to fall back to the defaults packaged with the model bundle.
+
+
+
+```kotlin
+data class GenerationOptions(
+ var temperature: Float? = null,
+ var topP: Float? = null,
+ var minP: Float? = null,
+ var repetitionPenalty: Float? = null,
+ var jsonSchemaConstraint: String? = null,
+ var functionCallParser: LeapFunctionCallParser? = LFMFunctionCallParser(),
+) {
+ fun setResponseFormatType(kClass: KClass<*>)
+
+ companion object {
+ fun build(buildAction: GenerationOptions.() -> Unit): GenerationOptions
+ }
+}
+```
+
+**Fields:**
+
+- `temperature` -- Sampling temperature. Higher values produce more random output; lower values produce more focused, deterministic output.
+- `topP` -- Nucleus sampling parameter. The model only considers tokens with cumulative probability mass up to `topP`.
+- `minP` -- Minimum probability for a token to be considered during generation.
+- `repetitionPenalty` -- Penalizes repeated tokens. A positive value decreases the likelihood of repeating the same line verbatim.
+- `jsonSchemaConstraint` -- Enable constrained generation with a [JSON Schema](https://json-schema.org/). See [constrained generation](./constrained-generation) for details.
+- `functionCallParser` -- Parser for function calling requests. `LFMFunctionCallParser` (the default) handles Liquid Foundation Model Pythonic function calling. See the [Function Calling](./function-calling) guide for details.
+
+**Builder pattern:**
+
+```kotlin
+val options = GenerationOptions.build {
+ setResponseFormatType(MyDataType::class)
+ temperature = 0.5f
+}
+```
+
+
+```swift
+public struct GenerationOptions {
+ public var temperature: Float?
+ public var topP: Float?
+ public var topK: Int?
+ public var minP: Float?
+ public var repetitionPenalty: Float?
+ public var rngSeed: UInt64?
+ public var enableThinking: Bool?
+ public var maxOutputTokens: Int?
+ public var sequenceLength: Int?
+ public var cacheControl: CacheControl?
+ public var jsonSchemaConstraint: String?
+ public var functionCallParser: LeapFunctionCallParserProtocol?
+
+ public init(
+ temperature: Float? = nil,
+ topP: Float? = nil,
+ topK: Int? = nil,
+ minP: Float? = nil,
+ repetitionPenalty: Float? = nil,
+ rngSeed: UInt64? = nil,
+ enableThinking: Bool? = nil,
+ maxOutputTokens: Int? = nil,
+ sequenceLength: Int? = nil,
+ cacheControl: CacheControl? = nil,
+ jsonSchemaConstraint: String? = nil,
+ functionCallParser: LeapFunctionCallParserProtocol? = LFMFunctionCallParser()
+ )
+}
+```
+
+**Fields:**
+
+- `temperature` -- Sampling temperature. Higher values produce more random output; lower values produce more focused, deterministic output.
+- `topP` -- Nucleus sampling parameter.
+- `topK` -- Top-K sampling parameter. Limits the token pool to the K most probable candidates.
+- `minP` -- Minimum probability for a token to be considered during generation.
+- `repetitionPenalty` -- Penalizes repeated tokens.
+- `rngSeed` -- Seed for the random number generator, for reproducible output.
+- `enableThinking` -- Enable or disable the model's reasoning trace (for thinking models).
+- `maxOutputTokens` -- Maximum number of tokens to generate.
+- `sequenceLength` -- Maximum sequence length (prompt + output).
+- `cacheControl` -- Controls KV-cache behavior for the generation.
+- `jsonSchemaConstraint` -- Enable constrained generation with a [JSON Schema](https://json-schema.org/). See [constrained generation](./constrained-generation) for details.
+- `functionCallParser` -- Parser for function calling requests. `LFMFunctionCallParser` (the default) handles Liquid Foundation Model Pythonic function calling. Supply `HermesFunctionCallParser()` for Hermes/Qwen3 formats, or set the parser to `nil` to receive raw tool-call text in `MessageResponse.chunk`.
+
+**Constrained generation helper:**
+
+```swift
+extension GenerationOptions {
+ public mutating func setResponseFormat<T: GeneratableType>(type: T.Type) throws {
+ self.jsonSchemaConstraint = try JSONSchemaGenerator.getJSONSchema(for: type)
+ }
+}
+```
+
+```swift
+var options = GenerationOptions(temperature: 0.6, topP: 0.9)
+try options.setResponseFormat(type: CityFact.self)
+
+for try await response in conversation.generateResponse(
+ message: user,
+ generationOptions: options
+) {
+ // Handle structured output
+}
+```
+
+`LiquidInferenceEngineRunner` exposes advanced utilities such as `getPromptTokensSize(messages:addBosToken:)` for applications that need to budget tokens ahead of time. These methods are backend-specific and may be elevated to the `ModelRunner` protocol in a future release.
+
+
diff --git a/deployment/on-device/leap-sdk/function-calling.mdx b/deployment/on-device/leap-sdk/function-calling.mdx
new file mode 100644
index 0000000..fcc09dd
--- /dev/null
+++ b/deployment/on-device/leap-sdk/function-calling.mdx
@@ -0,0 +1,369 @@
+---
+title: "Function Calling"
+description: "Function calling allows the model to make requests to call some predefined functions provided by the app to interact with the environment."
+---
+
+
+Not all models support function calling. Please check the model card before using the model for function calling.
+
+
+
+Vision and audio-capable models require companion files. Bundles embed these references; GGUF
+checkpoints expect siblings such as `mmproj-*.gguf` (vision) and audio decoder/tokenizer files.
+When detected, you can attach image and audio parts to your messages and tool responses.
+
+
+## Register Functions to Conversations
+
+To enable function calling, function definitions should be registered to the [`Conversation`](./conversation-generation#conversation) instance before content generation.
+`Conversation.registerFunction` takes a `LeapFunction` instance as the input, which describes the name, parameters, and ability of the function.
+
+
+
+```kotlin
+val conversation = modelRunner.createConversation("You are a helpful assistant.")
+
+conversation.registerFunction(
+ LeapFunction(
+ name = "get_weather",
+ description = "Get the weather forecast of a city",
+ parameters = listOf(
+ LeapFunctionParameter(
+ name = "city",
+ type = LeapFunctionParameterType.String(),
+ description = "The city name",
+ ),
+ ),
+ ),
+)
+```
+
+
+```swift
+conversation.registerFunction(
+ LeapFunction(
+ name: "get_weather",
+ description: "Query the weather of a city",
+ parameters: [
+ LeapFunctionParameter(
+ name: "city",
+ type: LeapFunctionParameterType.string(StringType()),
+ description: "The city to query weather for"
+ ),
+ LeapFunctionParameter(
+ name: "unit",
+ type: LeapFunctionParameterType.string(
+ StringType(enumValues: ["celsius", "fahrenheit"])),
+ description: "Temperature unit (celsius or fahrenheit)"
+ ),
+ ]
+ )
+)
+```
+
+
+
+Generally speaking, function names and parameter names should be normal identifiers that are recognized by most common programming languages (e.g. Python, JavaScript, etc.).
+We recommend using descriptive names composed of only letters, underscores, and digits (not starting with digits).
+
+## Handle Function Calling Response
+
+Function calling requests by the model are returned as part of the response stream from `generateResponse`. Each platform represents them differently:
+
+
+
+On Android, function call requests arrive as a `MessageResponse.FunctionCalls` instance containing a list of function calls.
+
+```kotlin
+data class FunctionCalls(val functionCalls: List<LeapFunctionCall>) : MessageResponse
+```
+
+Each `LeapFunctionCall` instance contains the name and arguments of the function call request. The `arguments` field is a map from `String` to `Any?`.
+The app needs to check whether the required parameters are filled by the model. It is possible (even though very unlikely) that some
+parameters are missing or the function name is invalid.
+
+```kotlin
+data class LeapFunctionCall(
+ val name: String,
+ val arguments: Map<String, Any?>,
+)
+```
+
+To handle the function call response, add a new branch to match responses from the `generateResponse` flow:
+
+```kotlin
+conversation.generateResponse(userMessage).onEach { response ->
+    when (response) {
+        is MessageResponse.Chunk -> {
+            // process text chunk
+        }
+        is MessageResponse.FunctionCalls -> {
+            response.functionCalls.forEach { call ->
+                // Process function calls here
+                Log.d(TAG, "Call function: ${call.name}, arguments: ${call.arguments}")
+            }
+        }
+        else -> {
+            // other responses
+        }
+    }
+}.collect()
+```
+
+The function calls are also included in the assistant message generated by the model, so it is possible to delay function call processing until generation is complete:
+
+```kotlin
+conversation.generateResponse(userMessage).onEach { response ->
+ when (response) {
+ is MessageResponse.Complete -> {
+ val assistantMessage = response.fullMessage
+ val functionCalls = assistantMessage.functionCalls
+ functionCalls?.forEach { call ->
+ // process function calls here
+ Log.d(TAG, "Call function: ${call.name}, arguments: ${call.arguments}")
+ }
+ }
+ else -> {
+ // process chunks
+ }
+ }
+}.collect()
+```
+
+
+On iOS, function call requests arrive as the `.functionCall([LeapFunctionCall])` case of the `MessageResponse` enum.
+
+```swift
+public enum MessageResponse {
+ case functionCall([LeapFunctionCall])
+ // ...
+}
+```
+
+Each `LeapFunctionCall` instance contains the name and arguments of the function call request. The `arguments` field is a map from `String` to `Any?`.
+The app needs to check whether the required parameters are filled by the model. It is possible (even though very unlikely) that some
+parameters are missing or the function name is invalid.
+
+```swift
+public struct LeapFunctionCall {
+ public let name: String
+ public let arguments: [String: Any?]
+}
+```
+
+To handle the function call response, add a new case to match responses from the `generateResponse` stream:
+
+```swift
+let userMessage = ChatMessage(role: .user, content: [.text("What's the weather in NYC?")])
+
+for try await response in conversation.generateResponse(
+ message: userMessage
+) {
+ switch response {
+ case .functionCall(let calls):
+ for call in calls {
+ // process function call here
+ print("Function call: \(call.name), \(call.arguments)")
+ }
+ case .audioSample:
+ break // Optional: route audio output elsewhere
+ default:
+ // process other responses
+ break
+ }
+}
+```
+
+
+
+## Function Definition
+
+Functions for models to call are defined by `LeapFunction` instances. A `LeapFunction` has three fields: `name`, `description`, and `parameters`.
+
+
+
+```kotlin
+data class LeapFunction(
+ val name: String,
+ val description: String,
+ val parameters: List<LeapFunctionParameter>,
+)
+```
+
+
+```swift
+public struct LeapFunction: Equatable {
+ public let name: String
+ public let description: String
+ public let parameters: [LeapFunctionParameter]
+}
+```
+
+
+
+- `name` is the function name. It is recommended to use only English letters, underscores, and digits (not starting with digits) because this format is supported by most models.
+- `description` tells the model what this function does.
+- `parameters` declares what arguments the function accepts.
+
+### LeapFunctionParameter
+
+The items of `parameters` are instances of `LeapFunctionParameter`.
+
+
+
+```kotlin
+data class LeapFunctionParameter(
+ val name: String,
+ val type: LeapFunctionParameterType,
+ val description: String,
+ val optional: Boolean = false,
+)
+```
+
+
+```swift
+public struct LeapFunctionParameter: Equatable {
+ public let name: String
+ public let type: LeapFunctionParameterType
+ public let description: String
+ public let optional: Bool
+}
+```
+
+
+
+- `name` -- The name of the parameter.
+- `type` -- Data type of the parameter.
+- `description` -- Tells the model what this parameter is about.
+- `optional` -- Whether the parameter is optional.
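+
+For instance, a parameter the model may omit can be declared as optional (a Kotlin sketch; the parameter itself is illustrative):
+
+```kotlin
+LeapFunctionParameter(
+    name = "country",
+    type = LeapFunctionParameterType.String(),
+    description = "Country to disambiguate the city name",
+    optional = true,
+)
+```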
+
+### LeapFunctionParameterType
+
+`LeapFunctionParameterType` describes the data types of the parameters. They are translated into JSON Schema for the model to understand.
+The following types are supported:
+
+
+
+```kotlin
+LeapFunctionParameterType.String(enumValues: List<kotlin.String>? = null, description: kotlin.String? = null)
+LeapFunctionParameterType.Number(enumValues: List<Double>? = null, description: kotlin.String? = null)
+LeapFunctionParameterType.Integer(enumValues: List<Int>? = null, description: kotlin.String? = null)
+LeapFunctionParameterType.Boolean(description: kotlin.String? = null)
+LeapFunctionParameterType.Array(itemType: LeapFunctionParameterType, description: kotlin.String? = null)
+LeapFunctionParameterType.Object(
+    properties: Map<kotlin.String, LeapFunctionParameterType>,
+    required: List<kotlin.String> = listOf(),
+    description: kotlin.String? = null,
+)
+```
+
+
+```swift
+public indirect enum LeapFunctionParameterType: Codable, Equatable {
+ case string(StringType)
+ case number(NumberType)
+ case integer(IntegerType)
+ case boolean(BooleanType)
+ case array(ArrayType)
+ case object(ObjectType)
+ case null(NullType)
+}
+```
+
+
+
+- **String** -- String literals. Accepts optional `enumValues` to restrict valid values.
+- **Number** -- Number literals including integers and floating point numbers. Accepts optional `enumValues`.
+- **Integer** -- Integer literals. Accepts optional `enumValues`.
+- **Boolean** -- Boolean literals.
+- **Array** -- Arrays of a defined type. The `itemType` parameter describes the data type of its items.
+- **Object** -- Objects with their own properties. `properties` maps property names to their data types. `required` lists the names of all non-optional properties.
+
+All types accept an optional `description`, but it will be overridden if the type is used directly as `LeapFunctionParameter.type`. The description only takes effect when the type instance is used as `Array.itemType` or as a type within object properties.
+
+### Comprehensive Example
+
+
+
+```kotlin
+LeapFunction(
+ name = "get_weather",
+ description = "Get the weather forecast of cities",
+ parameters = listOf(
+ LeapFunctionParameter(
+ name = "cities",
+ type = LeapFunctionParameterType.Array(
+ itemType = LeapFunctionParameterType.String()
+ ),
+ description = "City names to query",
+ ),
+ LeapFunctionParameter(
+ name = "temperature_unit",
+ type = LeapFunctionParameterType.String(
+ enumValues = listOf(
+ "Fahrenheit", "Celsius", "Kelvin"
+ )
+ ),
+ description = "Units for temperature",
+ ),
+ ),
+)
+```
+
+
+```swift
+LeapFunction(
+ name: "get_weather",
+ description: "Query the weather of cities",
+ parameters: [
+ LeapFunctionParameter(
+ name: "cities",
+ type: LeapFunctionParameterType.array(
+ ArrayType(itemType: .string(StringType()))
+ ),
+ description: "Names of the cities to query weather for"
+ ),
+ LeapFunctionParameter(
+ name: "unit",
+ type: LeapFunctionParameterType.string(
+ StringType(enumValues: ["celsius", "fahrenheit"])),
+ description: "Temperature unit (celsius or fahrenheit)"
+ ),
+ ]
+)
+```
+
+
+
+## Function Call Parser
+
+Function call parsers translate the model's tool-call tokens into `LeapFunctionCall` values. Different models emit tool calls in different formats, so you need to use the parser that matches your checkpoint.
+
+By default, `LFMFunctionCallParser` is used. It supports Liquid Foundation Model (LFM2) Pythonic-style control tokens.
+
+For Qwen3 models and other models that use the [Hermes function calling format](https://github.com/NousResearch/Hermes-Function-Calling),
+apply `HermesFunctionCallParser` by injecting a parser instance on the generation options:
+
+
+
+```kotlin
+val options = GenerationOptions.build {
+ functionCallParser = HermesFunctionCallParser()
+}
+conversation.generateResponse(userMessage, options).onEach {
+ // process message response here
+}
+```
+
+
+```swift
+var options = GenerationOptions()
+options.functionCallParser = HermesFunctionCallParser()
+for try await response in conversation.generateResponse(
+ message: userMessage,
+ generationOptions: options
+) {
+ // process message response here
+}
+```
+
+
diff --git a/deployment/on-device/leap-sdk/messages-content.mdx b/deployment/on-device/leap-sdk/messages-content.mdx
new file mode 100644
index 0000000..33703ea
--- /dev/null
+++ b/deployment/on-device/leap-sdk/messages-content.mdx
@@ -0,0 +1,470 @@
+---
+title: "Messages & Content"
+description: "API reference for chat messages and content types in the LEAP SDK"
+---
+
+## Chat Messages
+
+### Roles
+
+
+
+
+
+Roles of the chat messages follow the OpenAI API definition:
+
+```kotlin
+enum class Role(val type: String) {
+ SYSTEM("system"),
+ USER("user"),
+ ASSISTANT("assistant"),
+ TOOL("tool"),
+}
+```
+
+- `SYSTEM`: Indicates the associated content is part of the system prompt. It is generally the first message, to provide guidance on how the model should behave.
+- `USER`: Indicates the associated content is user input.
+- `ASSISTANT`: Indicates the associated content is model-generated output.
+- `TOOL`: Used when appending function-call results back into the conversation.
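+
+For illustration, a well-formed history starts with a `SYSTEM` message and then alternates `USER` and `ASSISTANT` turns. A minimal sketch (`SimpleMessage` is a hypothetical stand-in for `ChatMessage`, with content reduced to a string):
+
+```kotlin
+enum class Role(val type: String) {
+    SYSTEM("system"), USER("user"), ASSISTANT("assistant"), TOOL("tool")
+}
+
+// Hypothetical stand-in for ChatMessage, for ordering illustration only.
+data class SimpleMessage(val role: Role, val text: String)
+
+val history = listOf(
+    SimpleMessage(Role.SYSTEM, "You are a concise assistant."), // guidance first
+    SimpleMessage(Role.USER, "Summarize LEAP in one sentence."), // then user input
+)
+```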
+
+
+
+
+
+```swift
+public enum ChatMessageRole: String {
+ case user
+ case system
+ case assistant
+ case tool
+}
+```
+
+Include `.tool` messages when you append function-call results back into the conversation.
+
+
+
+
+
+### Message Structure
+
+
+
+
+
+```kotlin
+data class ChatMessage(
+ val role: Role,
+ val content: List<ChatMessageContent>,
+ val reasoningContent: String? = null,
+ val functionCalls: List<LeapFunctionCall>? = null,
+) {
+ fun toJSONObject(): JSONObject
+}
+
+ChatMessage.fromJSONObject(obj: JSONObject): ChatMessage
+```
+
+#### Fields
+
+- `role`: The role of this message (see [`ChatMessage.Role`](#roles)).
+- `content`: A list of message contents. Each element is an instance of [`ChatMessageContent`](#message-content).
+- `reasoningContent`: Reasoning text generated by reasoning models. For other models or other roles, this field is `null`.
+- `functionCalls`: Function call requests generated by the model. See the Function Calling guide for more details.
+
+#### `toJSONObject`
+
+Returns a `JSONObject` that represents the chat message. The returned object is compatible with `ChatCompletionRequestMessage` from the OpenAI API and contains two fields: `role` and `content`.
+
+#### `fromJSONObject`
+
+Constructs a `ChatMessage` instance from a `JSONObject`. Not all JSON object variants in `ChatCompletionRequestMessage` of the OpenAI API are acceptable. As of now, `role` supports `user`, `system` and `assistant`; `content` can be a string or an array.
+
+
+`LeapSerializationException` will be thrown if the provided JSONObject cannot be recognized as a
+message.
+
+
+
+
+
+
+```swift
+public struct ChatMessage {
+ public var role: ChatMessageRole
+ public var content: [ChatMessageContent]
+ public var reasoningContent: String?
+ public var functionCalls: [LeapFunctionCall]?
+
+ public init(
+ role: ChatMessageRole,
+ content: [ChatMessageContent],
+ reasoningContent: String? = nil,
+ functionCalls: [LeapFunctionCall]? = nil
+ )
+
+ public init(from json: [String: Any]) throws
+}
+```
+
+#### Fields
+
+- `content`: Ordered fragments of the message. The SDK supports `.text`, `.image`, and `.audio` parts.
+- `reasoningContent`: Optional reasoning text produced by eligible reasoning models.
+- `functionCalls`: Attach the calls returned by `MessageResponse.functionCall` when you include tool execution results in the history.
+
+
+
+
+
+### Message Content
+
+
+
+
+
+A sealed interface whose implementations are compatible with the content object in the OpenAI chat completion API.
+
+```kotlin
+abstract interface ChatMessageContent {
+ fun clone(): ChatMessageContent
+ fun toJSONObject(): JSONObject
+}
+fun ChatMessageContent.fromJSONObject(obj: JSONObject): ChatMessageContent
+
+data class ChatMessageContent.Text(val text: String): ChatMessageContent
+data class ChatMessageContent.Image(val jpegByteArray: ByteArray): ChatMessageContent
+data class ChatMessageContent.Audio(val wavByteArray: ByteArray): ChatMessageContent
+```
+
+- `toJSONObject` returns an OpenAI API compatible content object (with a `type` field and the real content fields).
+- `fromJSONObject` receives an OpenAI API compatible content object to build a message content. Not all OpenAI content objects are accepted.
+
+Currently, the following content types are supported:
+
+- `Text`: Pure text content.
+- `Image`: JPEG-encoded image content.
+- `Audio`: WAV-encoded audio content.
+
+
+`LeapSerializationException` will be thrown if the provided JSONObject cannot be recognized as a
+message content.
+
+
+#### `ChatMessageContent.Text`
+
+```kotlin
+data class ChatMessageContent.Text(val text: String): ChatMessageContent
+```
+
+Pure text content. The content is available in the `text` field.
+
+#### `ChatMessageContent.Image`
+
+```kotlin
+data class ChatMessageContent.Image(val jpegByteArray: ByteArray): ChatMessageContent {
+ companion object {
+ suspend fun fromBitmap(
+ bitmap: android.graphics.Bitmap,
+ compressionQuality: Int = 85,
+ ): ChatMessageContent.Image
+ }
+}
+```
+
+Image content. Only JPEG-encoded data is supported. The `fromBitmap` helper function creates a `ChatMessageContent.Image` from an Android `Bitmap` object (the image will be compressed).
+
+
+Only the models with vision capabilities can process image content. Sending image content to other
+models may result in unexpected outputs or errors.
+
+
+#### `ChatMessageContent.Audio`
+
+```kotlin
+data class ChatMessageContent.Audio(val wavByteArray: ByteArray) : ChatMessageContent
+```
+
+Audio content for speech recognition and audio understanding. The inference engine requires **WAV-encoded audio** with specific format requirements (see [Audio Format Requirements](#audio-format-requirements) below).
+
+
+
+
+
+```swift
+public enum ChatMessageContent {
+ case text(String)
+ case image(Data) // JPEG bytes
+ case audio(Data) // WAV bytes
+
+ public init(from json: [String: Any]) throws
+}
+```
+
+Provide JPEG-encoded bytes for `.image` and WAV data for `.audio`. Helper initializers such as `ChatMessageContent.fromUIImage`, `ChatMessageContent.fromNSImage`, `ChatMessageContent.fromWAVData`, and `ChatMessageContent.fromFloatSamples(_:sampleRate:channelCount:)` simplify interop with platform-native buffers. On the wire, image parts are encoded as OpenAI-style `image_url` payloads and audio parts as `input_audio` arrays with Base64 data.
+
+
+
+
+
+## Audio
+
+### Audio Format Requirements
+
+The LEAP inference engine requires **WAV-encoded audio** with specific format requirements:
+
+| Property | Required Value | Notes |
+|----------|---------------|-------|
+| **Format** | WAV (RIFF) | Only WAV format is supported |
+| **Sample Rate** | 16000 Hz (16 kHz) recommended | Other sample rates are automatically resampled to 16 kHz |
+| **Encoding** | PCM (various bit depths) | Supports Float32, Int16, Int24, Int32 |
+| **Channels** | Mono (1 channel) | **Required** - stereo audio will be rejected |
+| **Byte Order** | Little-endian | Standard WAV format |
+
+**Supported PCM Encodings:**
+- **Float32**: 32-bit floating point, normalized to [-1.0, 1.0]
+- **Int16**: 16-bit signed integer, range [-32768, 32767] (recommended)
+- **Int24**: 24-bit signed integer, range [-8388608, 8388607]
+- **Int32**: 32-bit signed integer, range [-2147483648, 2147483647]
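+
+Integer encodings map onto the normalized float range by dividing by the type's full scale. A self-contained sketch of the Int16 case (the normalization convention shown here is the standard one; it is an illustration, not the engine's internals):
+
+```kotlin
+// Convert 16-bit signed PCM samples to Float32 normalized to [-1.0, 1.0].
+fun int16ToFloat(samples: ShortArray): FloatArray =
+    FloatArray(samples.size) { i -> samples[i] / 32768.0f }
+
+// -32768 maps exactly to -1.0; 32767 maps just under 1.0.
+```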
+
+
+The inference engine **only accepts WAV format**. M4A, MP3, AAC, OGG, or other compressed formats are not supported and will cause errors. Audio must be converted to WAV before sending to the model.
+
+
+
+**Automatic Resampling**: The inference engine automatically resamples audio to 16 kHz if provided at a different sample rate. However, for best performance and quality, provide audio at 16 kHz to avoid resampling overhead.
+
+
+
+**Mono Channel Required**: The inference engine strictly requires single-channel (mono) audio. Multi-channel or stereo WAV files will be rejected with an error. Convert stereo audio to mono before sending.
+
+
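+One common way to meet the mono requirement is to average interleaved channel pairs before building the WAV data. A minimal sketch for interleaved float samples (not part of the SDK):
+
+```kotlin
+// Downmix interleaved stereo float samples [L0, R0, L1, R1, ...] to mono by averaging.
+fun stereoToMono(interleaved: FloatArray): FloatArray {
+    require(interleaved.size % 2 == 0) { "Expected interleaved stereo data" }
+    return FloatArray(interleaved.size / 2) { i ->
+        (interleaved[2 * i] + interleaved[2 * i + 1]) / 2f
+    }
+}
+```
+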
+### Creating Audio Content from WAV Files
+
+
+
+
+
+```kotlin
+val audioFile = File("/path/to/audio.wav")
+val wavBytes = audioFile.readBytes()
+val audioContent = ChatMessageContent.Audio(wavBytes)
+
+val message = ChatMessage(
+ role = ChatMessage.Role.USER,
+ content = listOf(
+ ChatMessageContent.Text("What is being said in this audio?"),
+ audioContent
+ )
+)
+```
+
+
+
+
+
+```swift
+import LeapSDK
+
+// Load WAV file
+let wavURL = Bundle.main.url(forResource: "audio", withExtension: "wav")!
+let wavData = try Data(contentsOf: wavURL)
+
+let message = ChatMessage(
+ role: .user,
+ content: [
+ .text("What is being said in this audio?"),
+ .audio(wavData)
+ ]
+)
+```
+
+
+
+
+
+### Creating Audio Content from Raw PCM Samples
+
+
+
+
+
+If you're recording audio or have raw PCM data, use the `FloatAudioBuffer` utility to create properly formatted WAV files:
+
+```kotlin
+import ai.liquid.leap.audio.FloatAudioBuffer
+
+// Collect audio samples (32-bit float PCM, normalized to -1.0 to 1.0)
+val audioBuffer = FloatAudioBuffer(sampleRate = 16000)
+
+// Add audio chunks as they arrive
+audioBuffer.add(floatArrayOf(0.1f, 0.2f, 0.15f, ...))
+audioBuffer.add(floatArrayOf(0.3f, 0.25f, ...))
+
+// Create WAV-encoded bytes
+val wavBytes = audioBuffer.createWavBytes()
+val audioContent = ChatMessageContent.Audio(wavBytes)
+```
+
+
+`FloatAudioBuffer` automatically creates a valid WAV header and encodes the samples as 32-bit float PCM in a WAV container, which is compatible with the inference engine.
+
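+For reference, the WAV container that a helper like `FloatAudioBuffer` produces can be sketched by hand: a 44-byte RIFF header followed by the float samples. This standalone version is illustrative only, not the SDK implementation:
+
+```kotlin
+import java.nio.ByteBuffer
+import java.nio.ByteOrder
+
+// Write a minimal mono float-PCM WAV container: RIFF header + fmt chunk
+// (format code 3 = IEEE float) + data chunk, all little-endian.
+fun floatSamplesToWav(samples: FloatArray, sampleRate: Int = 16000): ByteArray {
+    val dataSize = samples.size * 4
+    val buf = ByteBuffer.allocate(44 + dataSize).order(ByteOrder.LITTLE_ENDIAN)
+    buf.put("RIFF".toByteArray()).putInt(36 + dataSize).put("WAVE".toByteArray())
+    buf.put("fmt ".toByteArray()).putInt(16)
+    buf.putShort(3)                // audio format 3 = IEEE float
+    buf.putShort(1)                // channels: mono
+    buf.putInt(sampleRate)         // sample rate, e.g. 16000
+    buf.putInt(sampleRate * 4)     // byte rate = sampleRate * blockAlign
+    buf.putShort(4)                // block align = channels * bytesPerSample
+    buf.putShort(32)               // bits per sample
+    buf.put("data".toByteArray()).putInt(dataSize)
+    samples.forEach { buf.putFloat(it) }
+    return buf.array()
+}
+```
+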
+
+
+
+
+
+Use the `fromFloatSamples` helper to create WAV-encoded data from raw audio samples:
+
+```swift
+import AVFoundation
+
+// Float samples normalized to -1.0 to 1.0
+let samples: [Float] = [0.1, 0.2, 0.15, -0.3, ...]
+
+// Create WAV-encoded Data
+let audioContent = ChatMessageContent.fromFloatSamples(
+ samples,
+ sampleRate: 16000,
+ channelCount: 1
+)
+
+let message = ChatMessage(
+ role: .user,
+ content: [
+ .text("Transcribe this audio"),
+ audioContent
+ ]
+)
+```
+
+
+
+
+
+### Recording Audio
+
+
+
+
+
+When recording audio from the device microphone, configure `AudioRecord` or use a library like `WaveRecorder` with the correct settings:
+
+```kotlin
+import com.github.squti.androidwaverecorder.WaveRecorder
+
+val waveRecorder = WaveRecorder(outputFilePath)
+waveRecorder.configureWaveSettings {
+ sampleRate = 16000 // 16 kHz
+ channels = android.media.AudioFormat.CHANNEL_IN_MONO // Mono
+ audioEncoding = android.media.AudioFormat.ENCODING_PCM_16BIT // 16-bit PCM
+}
+
+waveRecorder.startRecording()
+// ... wait for user to finish speaking ...
+waveRecorder.stopRecording()
+
+// Read the WAV file
+val audioFile = File(outputFilePath)
+val wavBytes = audioFile.readBytes()
+val audioContent = ChatMessageContent.Audio(wavBytes)
+```
+
+
+
+
+
+When recording audio from the device microphone, configure `AVAudioRecorder` with the correct settings:
+
+```swift
+import AVFoundation
+
+let audioURL = FileManager.default.temporaryDirectory
+ .appendingPathComponent("recording.wav")
+
+let settings: [String: Any] = [
+ AVFormatIDKey: kAudioFormatLinearPCM, // Linear PCM
+ AVSampleRateKey: 16000.0, // 16 kHz
+ AVNumberOfChannelsKey: 1, // Mono
+ AVLinearPCMBitDepthKey: 16, // 16-bit
+ AVLinearPCMIsFloatKey: false, // Integer samples
+ AVLinearPCMIsBigEndianKey: false // Little-endian
+]
+
+let audioRecorder = try AVAudioRecorder(url: audioURL, settings: settings)
+audioRecorder.record()
+
+// ... wait for user to finish speaking ...
+
+audioRecorder.stop()
+
+// Read the WAV file
+let wavData = try Data(contentsOf: audioURL)
+let audioContent: ChatMessageContent = .audio(wavData)
+```
+
+
+
+
+
+### Audio Duration Considerations
+
+- **Minimum duration**: At least 1 second of audio is recommended for reliable speech recognition
+- **Maximum duration**: Limited by the model's context window (typically several minutes)
+- **Silence**: Trim excessive silence from the beginning and end for better results
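+
+A simple amplitude-threshold trim is often enough for the silence recommendation above. A hedged sketch (the threshold value is an assumption; tune it for your microphone and noise floor):
+
+```kotlin
+import kotlin.math.abs
+
+// Drop leading/trailing samples whose absolute amplitude stays below `threshold`.
+fun trimSilence(samples: FloatArray, threshold: Float = 0.01f): FloatArray {
+    val first = samples.indexOfFirst { abs(it) >= threshold }
+    if (first == -1) return FloatArray(0) // all silence
+    val last = samples.indexOfLast { abs(it) >= threshold }
+    return samples.copyOfRange(first, last + 1)
+}
+```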
+
+### Audio Output from Models
+
+When generating audio responses (e.g., with `LFM2.5-Audio-1.5B`), the model outputs audio at **24 kHz sample rate**:
+
+
+
+
+
+```kotlin
+conversation.generateResponse(userMessage)
+ .onEach { response ->
+ when (response) {
+ is MessageResponse.AudioSample -> {
+ // samples: FloatArray (32-bit float PCM)
+ // sampleRate: Int (typically 24000 Hz for audio generation models)
+ val samples = response.samples
+ val sampleRate = response.sampleRate
+
+ // Accumulate or play audio samples
+ audioBuffer.add(samples)
+ }
+ }
+ }
+ .collect()
+```
+
+
+
+
+
+```swift
+for try await response in conversation.generateResponse(message: userMessage) {
+ switch response {
+ case .audioSample(let samples, let sampleRate):
+ // samples: [Float] (32-bit float PCM, normalized -1.0 to 1.0)
+ // sampleRate: Int (typically 24000 Hz for audio generation models)
+
+ // Accumulate samples or play immediately
+ audioPlayer.enqueue(samples: samples, sampleRate: sampleRate)
+
+ default:
+ break
+ }
+}
+```
+
+
+
+
+
+
+**Note**: Audio **input** should be 16 kHz, but audio **output** from generation models is typically 24 kHz. Make sure your audio playback code supports the correct sample rate.
+
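+If your playback path expects 16-bit PCM (for example Android's `AudioTrack`), convert the generated floats and keep the reported sample rate. A minimal conversion sketch, not part of the SDK:
+
+```kotlin
+// Convert normalized Float32 samples to 16-bit PCM for playback.
+// Clamp first: model output may slightly exceed [-1.0, 1.0].
+fun floatToInt16(samples: FloatArray): ShortArray =
+    ShortArray(samples.size) { i ->
+        val clamped = samples[i].coerceIn(-1.0f, 1.0f)
+        (clamped * 32767.0f).toInt().toShort()
+    }
+```
+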
diff --git a/deployment/on-device/leap-sdk/model-loading.mdx b/deployment/on-device/leap-sdk/model-loading.mdx
new file mode 100644
index 0000000..61fec54
--- /dev/null
+++ b/deployment/on-device/leap-sdk/model-loading.mdx
@@ -0,0 +1,336 @@
+---
+title: "Model Loading"
+description: "API reference for loading models in the LEAP SDK"
+---
+
+
+The LEAP SDK provides multiple model loading options:
+- **`LeapModelDownloader`** — Platform-specific downloader with background download support (Android: WorkManager + notifications; Apple: NSURLSession background downloads)
+- **`LeapDownloader`** — Lightweight, cross-platform downloader (Android, iOS, macOS, JVM)
+- **`Leap.load()`** — Swift convenience API via the compatibility layer
+
+
+## `LeapModelDownloader`
+
+The recommended option for production apps on Android and Apple platforms. It provides platform-specific features for robust background downloads.
+
+
+
+ ```kotlin
+ class LeapModelDownloader(
+ private val context: Context,
+ modelFileDir: File? = null,
+ private val extraHTTPRequestHeaders: Map<String, String> = mapOf(),
+ private val notificationConfig: LeapModelDownloaderNotificationConfig = LeapModelDownloaderNotificationConfig(),
+ )
+ ```
+
+
+ This class is part of the `ai.liquid.leap:leap-model-downloader` module. It uses Android WorkManager for background downloads and displays foreground service notifications.
+
+
+ ### Constructor Parameters
+
+ | Field | Type | Required | Default | Description |
+ |-------|------|----------|---------|-------------|
+ | `context` | `Context` | Yes | - | Android context (Activity or Application) |
+ | `modelFileDir` | `File` | No | `null` | Directory to save models. If null, uses app's external files directory. |
+ | `extraHTTPRequestHeaders` | `Map<String, String>` | No | `mapOf()` | Additional HTTP headers for download requests |
+ | `notificationConfig` | `LeapModelDownloaderNotificationConfig` | No | `LeapModelDownloaderNotificationConfig()` | Notification configuration for the foreground service |
+
+ ### `loadModel`
+
+ Download and load a model in one operation. If the model is already cached, it loads directly without downloading.
+
+ | Name | Type | Required | Default | Description |
+ |------|------|----------|---------|-------------|
+ | `modelSlug` | `String` | Yes | - | The model name (e.g., "LFM2-1.2B"). See the [LEAP Model Library](https://leap.liquid.ai/models). |
+ | `quantizationSlug` | `String` | Yes | - | The quantization level (e.g., "Q4_K_M", "Q5_K_M"). |
+ | `modelLoadingOptions` | `ModelLoadingOptions` | No | `null` | Options for loading the model. See [`ModelLoadingOptions`](#modelloadingoptions). |
+ | `generationTimeParameters` | `GenerationTimeParameters` | No | `null` | Parameters to control generation. See [`GenerationTimeParameters`](#generationtimeparameters). |
+ | `progress` | `(ProgressData) -> Unit` | No | `{}` | Callback for download progress updates |
+
+ **Returns:** [`ModelRunner`](./conversation-generation#modelrunner) instance.
+
+ ### `downloadModel`
+
+ Download a model without loading it into memory. Useful for pre-downloading models in the background.
+
+ | Name | Type | Required | Default | Description |
+ |------|------|----------|---------|-------------|
+ | `modelSlug` | `String` | Yes | - | The model name |
+ | `quantizationSlug` | `String` | Yes | - | The quantization level |
+ | `progress` | `(ProgressData) -> Unit` | No | `{}` | Callback for download progress |
+
+ **Returns:** `Manifest` metadata about the downloaded model.
+
+ ### Example
+
+ ```kotlin
+ import ai.liquid.leap.model_downloader.LeapModelDownloader
+ import ai.liquid.leap.model_downloader.LeapModelDownloaderNotificationConfig
+
+ val modelDownloader = LeapModelDownloader(
+ context,
+ notificationConfig = LeapModelDownloaderNotificationConfig.build {
+ notificationTitleDownloading = "Downloading AI model..."
+ notificationTitleDownloaded = "Model ready!"
+ }
+ )
+
+ lifecycleScope.launch {
+ val modelRunner = modelDownloader.loadModel(
+ modelSlug = "LFM2-1.2B",
+ quantizationSlug = "Q5_K_M",
+ progress = { progressData ->
+ println("Progress: ${progressData.progress * 100}%")
+ }
+ )
+ }
+ ```
+
+
+ On Apple platforms, `LeapModelDownloader` supports background downloads via `NSURLSessionConfiguration`.
+
+ ```swift
+ import LeapModelDownloader
+
+ let downloader = LeapModelDownloader()
+
+ // Download and get manifest
+ let manifest = try await downloader.downloadModel(
+ model: "LFM2-1.2B",
+ quantization: "Q5_K_M"
+ ) { fraction, bytesPerSecond in
+ print("Progress: \(Int(fraction * 100))%")
+ }
+
+ // Query download status
+ let status = await downloader.queryStatus(
+ model: "LFM2-1.2B",
+ quantization: "Q5_K_M"
+ )
+ switch status {
+ case .notOnLocal:
+ print("Model not downloaded")
+ case .downloadInProgress(let progress):
+ print("Downloading: \(Int(progress * 100))%")
+ case .downloaded:
+ print("Model ready")
+ }
+
+ // Remove downloaded model
+ try await downloader.removeModel(
+ model: "LFM2-1.2B",
+ quantization: "Q5_K_M"
+ )
+ ```
+
+
+
+## `Leap.load()` (Swift)
+
+`Leap` is the static entry point for loading models on Apple platforms via the Swift compatibility layer.
+
+### `Leap.load(model:quantization:options:downloadProgressHandler:)`
+
+Download a model from the LEAP Model Library and load it into memory. If already downloaded, it loads from the local cache.
+
+```swift
+public struct Leap {
+ public static func load(
+ model: String,
+ quantization: String,
+ options: LiquidInferenceEngineManifestOptions? = nil,
+ downloadProgressHandler: @escaping (_ progress: Double, _ speed: Int64) -> Void
+ ) async throws -> ModelRunner
+}
+```
+
+| Name | Type | Required | Default | Description |
+|------|------|----------|---------|-------------|
+| `model` | `String` | Yes | - | The model name. See the [LEAP Model Library](https://leap.liquid.ai/models). |
+| `quantization` | `String` | Yes | - | The quantization level. |
+| `options` | `LiquidInferenceEngineManifestOptions` | No | `nil` | Override options for loading (advanced). |
+| `downloadProgressHandler` | `(Double, Int64) -> Void` | Yes | - | Progress callback (0–1 fraction, bytes/sec). |
+
+**Returns:** [`ModelRunner`](./conversation-generation#modelrunner) instance.
+
+
+ Load a local model file (`.bundle` or `.gguf`) directly:
+
+ ```swift
+ public struct Leap {
+ public static func load(
+ url: URL,
+ options: LiquidInferenceEngineOptions? = nil
+ ) async throws -> ModelRunner
+ }
+ ```
+
+ ```swift
+ // ExecuTorch backend via .bundle
+ let bundleURL = Bundle.main.url(forResource: "qwen3-0_6b", withExtension: "bundle")!
+ let runner = try await Leap.load(url: bundleURL)
+
+ // llama.cpp backend via .gguf
+ let ggufURL = Bundle.main.url(forResource: "qwen3-0_6b", withExtension: "gguf")!
+ let ggufRunner = try await Leap.load(url: ggufURL)
+ ```
+
+
+### `LiquidInferenceEngineOptions`
+
+Override runtime configuration when loading from a local file:
+
+```swift
+public struct LiquidInferenceEngineOptions {
+ public var bundlePath: String
+ public let cacheOptions: LiquidCacheOptions?
+ public let cpuThreads: UInt32?
+ public let contextSize: UInt32?
+ public let nGpuLayers: UInt32?
+ public let mmProjPath: String?
+ public let audioDecoderPath: String?
+ public let chatTemplate: String?
+ public let audioTokenizerPath: String?
+ public let extras: String?
+}
+```
+
+- `cpuThreads`: Number of CPU threads for token generation.
+- `contextSize`: Override maximum context length.
+- `nGpuLayers`: Layers to offload to GPU (macOS/Mac Catalyst with Metal).
+- `mmProjPath`: Auxiliary multimodal projection model path. Leave `nil` to auto-detect `mmproj-*.gguf`.
+- `audioDecoderPath`: Audio decoder model. Leave `nil` to auto-detect.
+
+```swift
+let options = LiquidInferenceEngineOptions(
+ bundlePath: bundleURL.path,
+ cpuThreads: 6,
+ contextSize: 8192
+)
+let runner = try await Leap.load(url: bundleURL, options: options)
+```
+
+## `LeapDownloader` (Cross-Platform)
+
+A lightweight model loader available in the core `leap-sdk` module, supported on all platforms (Android, iOS, macOS, JVM).
+
+```kotlin
+class LeapDownloader(config: LeapDownloaderConfig = LeapDownloaderConfig())
+```
+
+### `loadModel`
+
+Download and load a model. If already cached, loads from the local cache.
+
+| Name | Type | Required | Default | Description |
+|------|------|----------|---------|-------------|
+| `modelSlug` | `String` | Yes | - | The model name. See the [LEAP Model Library](https://leap.liquid.ai/models). |
+| `quantizationSlug` | `String` | Yes | - | The quantization level. |
+| `modelLoadingOptions` | `ModelLoadingOptions` | No | `null` | Options for loading. See [`ModelLoadingOptions`](#modelloadingoptions). |
+| `generationTimeParameters` | `GenerationTimeParameters` | No | `null` | Generation parameters. See [`GenerationTimeParameters`](#generationtimeparameters). |
+| `progress` | `(ProgressData) -> Unit` | No | `{}` | Download progress callback. |
+
+**Returns:** [`ModelRunner`](./conversation-generation#modelrunner) instance.
+
+### `downloadModel`
+
+Download a model without loading it into memory.
+
+| Name | Type | Required | Default | Description |
+|------|------|----------|---------|-------------|
+| `modelSlug` | `String` | Yes | - | The model name. |
+| `quantizationSlug` | `String` | Yes | - | The quantization level. |
+| `progress` | `(ProgressData) -> Unit` | No | `{}` | Download progress callback. |
+
+**Returns:** `Manifest` metadata about the downloaded model.
+
+## Supporting Types
+
+### `LeapDownloaderConfig`
+
+```kotlin
+data class LeapDownloaderConfig(
+ val saveDir: String = "leap_models",
+ val validateSha256: Boolean = true,
+)
+```
+
+### `GenerationTimeParameters`
+
+```kotlin
+data class GenerationTimeParameters(
+ val samplingParameters: SamplingParameters? = null,
+ val numberOfDecodingThreads: Int? = null,
+)
+```
+
+### `SamplingParameters`
+
+```kotlin
+data class SamplingParameters(
+ val temperature: Double? = null,
+ val topP: Double? = null,
+ val minP: Double? = null,
+ val repetitionPenalty: Double? = null,
+)
+```
+
+
+LEAP models are trained to perform well with the default parameters from the model manifest. Overriding with `SamplingParameters` can degrade output quality.
+
+
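+If you do need to override, set only the fields you care about and leave the rest `null` so the manifest defaults still apply. A self-contained sketch restating the data classes above:
+
+```kotlin
+data class SamplingParameters(
+    val temperature: Double? = null,
+    val topP: Double? = null,
+    val minP: Double? = null,
+    val repetitionPenalty: Double? = null,
+)
+
+data class GenerationTimeParameters(
+    val samplingParameters: SamplingParameters? = null,
+    val numberOfDecodingThreads: Int? = null,
+)
+
+// Override only temperature; topP, minP, repetitionPenalty stay at manifest defaults.
+val params = GenerationTimeParameters(
+    samplingParameters = SamplingParameters(temperature = 0.3)
+)
+```
+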
+### `ProgressData`
+
+```kotlin
+data class ProgressData(
+ val bytes: Long,
+ val total: Long,
+) {
+ val progress: Float // 0.0 to 1.0
+}
+```
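+
+For UI display, `bytes` and `total` combine naturally into a percentage and size string. A sketch (the `progress` getter is re-implemented here for self-containment; the SDK provides it):
+
+```kotlin
+data class ProgressData(val bytes: Long, val total: Long) {
+    // Assumed implementation for illustration; guard against total == 0.
+    val progress: Float get() = if (total > 0) bytes.toFloat() / total else 0f
+}
+
+// Render a "50% (500 / 1000 MB)"-style progress string for a download UI.
+fun formatProgress(p: ProgressData): String {
+    val pct = (p.progress * 100).toInt()
+    return "$pct% (${p.bytes / 1_000_000} / ${p.total / 1_000_000} MB)"
+}
+```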
+
+### `Manifest`
+
+```kotlin
+data class Manifest(
+ val schemaVersion: String,
+ val inferenceType: String,
+ val loadTimeParameters: LoadTimeParameters,
+ val generationTimeParameters: GenerationTimeParameters? = null,
+ val originalUrl: String? = null,
+ val pathOnDisk: String? = null,
+)
+```
+
+### `ModelLoadingOptions`
+
+```kotlin
+data class ModelLoadingOptions(
+ var randomSeed: Long? = null,
+ var cpuThreads: Int = 2,
+) {
+ companion object {
+ fun build(action: ModelLoadingOptions.() -> Unit): ModelLoadingOptions
+ }
+}
+```
+
+- `randomSeed`: Set the random seed to reproduce output.
+- `cpuThreads`: Number of threads for generation.
+
+
+ The `LeapClient` class is a legacy entry point. Use `LeapDownloader` or `LeapModelDownloader` instead.
+
+ ```kotlin
+ object LeapClient {
+ suspend fun loadModel(path: String, options: ModelLoadingOptions? = null): ModelRunner
+ suspend fun loadModelAsResult(path: String, options: ModelLoadingOptions? = null): Result<ModelRunner>
+ suspend fun loadModel(bundlePath: String, mmprojPath: String, options: ModelLoadingOptions? = null): ModelRunner
+ suspend fun loadModel(model: AudioGenerationModelDescriptor, options: ModelLoadingOptions? = null): ModelRunner
+ }
+ ```
+
diff --git a/deployment/on-device/leap-sdk/quick-start-guide.mdx b/deployment/on-device/leap-sdk/quick-start-guide.mdx
new file mode 100644
index 0000000..0a0b220
--- /dev/null
+++ b/deployment/on-device/leap-sdk/quick-start-guide.mdx
@@ -0,0 +1,577 @@
+---
+title: "Quick Start Guide"
+description: "Get up and running with the LEAP SDK in minutes. Install the SDK, load models, and start generating content on Android, iOS, macOS, and more."
+---
+
+Latest version: `v0.10.0 (preview)`
+
+
+The LEAP SDK is a **Kotlin Multiplatform** library. It supports Android, iOS, macOS, JVM, and more from a single codebase. Choose your platform below to get started.
+
+
+## 1. Prerequisites
+
+
+
+ You should already have:
+
+ - An Android project created in Android Studio. The LEAP SDK is Kotlin-first.
+ - [Kotlin Android plugin](https://kotlinlang.org/docs/releases.html#update-to-a-new-kotlin-version) v2.3.0 or above and [Android Gradle Plugin](https://developer.android.com/build/releases/gradle-plugin) v8.13.0 or above. Declare them in your root `build.gradle.kts`:
+ ```kotlin
+ plugins {
+ id("com.android.application") version "8.13.2" apply false
+ id("com.android.library") version "8.13.2" apply false
+ id("org.jetbrains.kotlin.android") version "2.3.10" apply false
+ }
+ ```
+ - A working Android device that supports `arm64-v8a` ABI with [developer mode enabled](https://developer.android.com/studio/debug/dev-options). We recommend 3GB+ RAM.
+ - Minimum SDK requirement is API 31:
+ ```kotlin
+ android { defaultConfig { minSdk = 31; targetSdk = 36 } }
+ ```
+
+
+ The SDK may crash when loading models on emulators. A physical Android device is recommended.
+
+
+
+ You should already have:
+
+ - Xcode 15.0 or later with Swift 5.9.
+ - An iOS project targeting **iOS 14.0+** (macOS 11.0+ is also supported).
+ - A physical iPhone or iPad with at least 3 GB RAM for best performance. The simulator works for development but runs models much slower.
+
+
+ Always test on a real device before shipping. Simulator performance is not representative of production behaviour.
+
+
+
+
+## 2. Install the SDK
+
+
+
+ **Option A: Version catalog (recommended)**
+
+ In `gradle/libs.versions.toml`:
+
+ ```toml
+ [versions]
+ leapSdk = "0.10.0-SNAPSHOT"
+
+ [libraries]
+ leap-sdk = { module = "ai.liquid.leap:leap-sdk", version.ref = "leapSdk" }
+ leap-model-downloader = { module = "ai.liquid.leap:leap-model-downloader", version.ref = "leapSdk" }
+ ```
+
+ Then in `app/build.gradle.kts`:
+
+ ```kotlin
+ dependencies {
+ implementation(libs.leap.sdk)
+ implementation(libs.leap.model.downloader)
+ }
+ ```
+
+ **Option B: Direct dependency declaration**
+
+ ```kotlin
+ dependencies {
+ implementation("ai.liquid.leap:leap-sdk:0.10.0-SNAPSHOT")
+ implementation("ai.liquid.leap:leap-model-downloader:0.10.0-SNAPSHOT")
+ }
+ ```
+
+ Then perform a project sync in Android Studio to fetch the LeapSDK artifacts.
+
+
+ Install via **Swift Package Manager**:
+
+ 1. In Xcode choose **File -> Add Package Dependencies**.
+ 2. Enter `https://github.com/Liquid4All/leap-sdk`.
+ 3. Select the `0.10.0-SNAPSHOT` tag.
+ 4. Add the **`LeapSDK`** product to your app target.
+ 5. (Optional) Add **`LeapModelDownloader`** if you plan to download models at runtime.
+
+
+ The constrained-generation macros (`@Generatable`, `@Guide`) ship inside the `LeapSDK` product. No additional package is required.
+
+
+
+
+## 3. Configure Permissions (Android only)
+
+
+
+ The `LeapModelDownloader` runs as a foreground service and displays notifications during downloads. Add the following permissions to your `AndroidManifest.xml`:
+
+ ```xml
+ <uses-permission android:name="android.permission.INTERNET" />
+
+ <!-- Required for the foreground download service -->
+ <uses-permission android:name="android.permission.FOREGROUND_SERVICE" />
+ <uses-permission android:name="android.permission.FOREGROUND_SERVICE_DATA_SYNC" />
+
+ <!-- Required to show download notifications on Android 13+ -->
+ <uses-permission android:name="android.permission.POST_NOTIFICATIONS" />
+ ```
+
+
+ The `POST_NOTIFICATIONS` permission requires a runtime permission request on Android 13 (API 33) and above. See the code example in step 4 for how to request this permission.
+
+
+
+ No special permissions are required on iOS or macOS. The SDK handles network access and file storage automatically.
+
+
+
+## 4. Load a Model
+
+The SDK uses **GGUF manifests** for loading models. Given a model name and quantization (from the [LEAP Model Library](https://leap.liquid.ai/models)), the SDK automatically downloads the necessary files and loads the model with optimal parameters.
+
+
+
+ **Using LeapModelDownloader (recommended for Android)**
+
+ `LeapModelDownloader` provides background downloads with WorkManager integration and notification support.
+
+ **ViewModel**
+
+ ```kotlin
+ import android.app.Application
+ import androidx.lifecycle.AndroidViewModel
+ import androidx.lifecycle.viewModelScope
+ import ai.liquid.leap.Conversation
+ import ai.liquid.leap.ModelRunner
+ import ai.liquid.leap.model_downloader.LeapModelDownloader
+ import ai.liquid.leap.model_downloader.LeapModelDownloaderNotificationConfig
+ import kotlinx.coroutines.Dispatchers
+ import kotlinx.coroutines.flow.MutableStateFlow
+ import kotlinx.coroutines.flow.StateFlow
+ import kotlinx.coroutines.flow.asStateFlow
+ import kotlinx.coroutines.launch
+ import kotlinx.coroutines.runBlocking
+
+ class ChatViewModel(application: Application) : AndroidViewModel(application) {
+ private val modelDownloader = LeapModelDownloader(
+ application,
+ notificationConfig = LeapModelDownloaderNotificationConfig.build {
+ notificationTitleDownloading = "Downloading AI model..."
+ notificationTitleDownloaded = "Model ready!"
+ notificationContentDownloading = "Please wait while the model downloads"
+ }
+ )
+
+ private var modelRunner: ModelRunner? = null
+ private var conversation: Conversation? = null
+
+ private val _isLoading = MutableStateFlow(false)
+ val isLoading: StateFlow<Boolean> = _isLoading.asStateFlow()
+
+ private val _downloadProgress = MutableStateFlow(0f)
+ val downloadProgress: StateFlow<Float> = _downloadProgress.asStateFlow()
+
+ private val _errorMessage = MutableStateFlow<String?>(null)
+ val errorMessage: StateFlow<String?> = _errorMessage.asStateFlow()
+
+ fun loadModel() {
+ viewModelScope.launch {
+ _isLoading.value = true
+ _errorMessage.value = null
+ try {
+ modelRunner = modelDownloader.loadModel(
+ modelSlug = "LFM2-1.2B",
+ quantizationSlug = "Q5_K_M",
+ progress = { progressData ->
+ _downloadProgress.value = progressData.progress
+ }
+ )
+ conversation = modelRunner?.createConversation()
+ _isLoading.value = false
+ } catch (e: Exception) {
+ _errorMessage.value = "Failed to load model: ${e.message}"
+ _isLoading.value = false
+ }
+ }
+ }
+
+ override fun onCleared() {
+ super.onCleared()
+ runBlocking(Dispatchers.IO) {
+ modelRunner?.unload()
+ }
+ }
+ }
+ ```
+
+ **Activity**
+
+ ```kotlin
+ import android.os.Build
+ import android.os.Bundle
+ import androidx.activity.result.contract.ActivityResultContracts
+ import androidx.activity.viewModels
+ import androidx.appcompat.app.AppCompatActivity
+ import androidx.core.content.ContextCompat
+ import androidx.lifecycle.lifecycleScope
+ import android.content.pm.PackageManager
+ import kotlinx.coroutines.launch
+
+ class MainActivity : AppCompatActivity() {
+ private val viewModel: ChatViewModel by viewModels()
+
+ private val requestPermissionLauncher = registerForActivityResult(
+ ActivityResultContracts.RequestPermission()
+ ) { isGranted: Boolean ->
+ viewModel.loadModel()
+ }
+
+ override fun onCreate(savedInstanceState: Bundle?) {
+ super.onCreate(savedInstanceState)
+ setContentView(R.layout.activity_main)
+
+ lifecycleScope.launch {
+ viewModel.isLoading.collect { isLoading ->
+ // Update UI loading indicator
+ }
+ }
+
+ lifecycleScope.launch {
+ viewModel.downloadProgress.collect { progress ->
+ // Update download progress UI (0.0 to 1.0)
+ }
+ }
+
+ checkPermissionsAndLoadModel()
+ }
+
+ private fun checkPermissionsAndLoadModel() {
+ if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.TIRAMISU) {
+ when {
+ ContextCompat.checkSelfPermission(
+ this,
+ android.Manifest.permission.POST_NOTIFICATIONS
+ ) == PackageManager.PERMISSION_GRANTED -> {
+ viewModel.loadModel()
+ }
+ else -> {
+ requestPermissionLauncher.launch(android.Manifest.permission.POST_NOTIFICATIONS)
+ }
+ }
+ } else {
+ viewModel.loadModel()
+ }
+ }
+ }
+ ```
+
+
+ For cross-platform projects, or if you don't need Android-specific features, use `LeapDownloader` from the core `leap-sdk` module:
+
+ ```kotlin
+ import ai.liquid.leap.LeapDownloader
+ import ai.liquid.leap.LeapDownloaderConfig
+
+ lifecycleScope.launch {
+ try {
+ val baseDir = File(context.filesDir, "model_files").absolutePath
+ val modelDownloader = LeapDownloader(config = LeapDownloaderConfig(saveDir = baseDir))
+ val modelRunner = modelDownloader.loadModel(
+ modelSlug = "LFM2-1.2B",
+ quantizationSlug = "Q5_K_M"
+ )
+ } catch (e: LeapModelLoadingException) {
+ Log.e(TAG, "Failed to load the model. Error message: ${e.message}")
+ }
+ }
+ ```
+
+ This approach works on all platforms (Android, iOS, macOS, JVM) but doesn't provide Android-specific features like background downloads or notifications.
+
+
+
+ Browse the [Leap Model Library](https://leap.liquid.ai/models) to download a model bundle.
+
+ Push the bundle to the device:
+
+ ```bash
+ adb shell mkdir -p /data/local/tmp/leap
+ adb push ~/Downloads/model.bundle /data/local/tmp/leap/model.bundle
+ ```
+
+ Load from the local bundle:
+
+ ```kotlin
+ lifecycleScope.launch {
+ try {
+ modelRunner = LeapClient.loadModel("/data/local/tmp/leap/model.bundle")
+ }
+ catch (e: LeapModelLoadingException) {
+ Log.e(TAG, "Failed to load the model. Error message: ${e.message}")
+ }
+ }
+ ```
+
+
+
+ ```swift
+ import LeapSDK
+
+ @MainActor
+ final class ChatViewModel: ObservableObject {
+ @Published var isLoading = false
+ @Published var conversation: Conversation?
+ private var modelRunner: ModelRunner?
+ private var generationTask: Task<Void, Never>?
+
+ func loadModel() async {
+ isLoading = true
+ defer { isLoading = false }
+ do {
+ let modelRunner = try await Leap.load(
+ model: "LFM2-1.2B",
+ quantization: "Q5_K_M",
+ downloadProgressHandler: { progress, speed in
+ // progress: Double (0...1), speed: bytes per second
+ }
+ )
+ conversation = modelRunner.createConversation(
+ systemPrompt: "You are a helpful assistant."
+ )
+ self.modelRunner = modelRunner
+ } catch {
+ print("Failed to load model: \(error)")
+ }
+ }
+ }
+ ```
+
+
+ Browse the [Leap Model Library](https://leap.liquid.ai/models) and download a `.bundle` file.
+
+ Ship it with your app by dragging the bundle into your Xcode project, or download at runtime using `LeapModelDownloader`.
+
+ ```swift
+ guard let bundleURL = Bundle.main.url(
+ forResource: "LFM2-350-ENJP-MT",
+ withExtension: "bundle"
+ ) else {
+ assertionFailure("Model bundle missing")
+ return
+ }
+
+ let modelRunner = try await Leap.load(url: bundleURL)
+ let conversation = modelRunner.createConversation(
+ systemPrompt: "You are a helpful assistant."
+ )
+ ```
+
+ Override runtime settings with `LiquidInferenceEngineOptions`:
+
+ ```swift
+ let options = LiquidInferenceEngineOptions(
+ bundlePath: bundleURL.path,
+ cpuThreads: 6,
+ contextSize: 8192,
+ nGpuLayers: 8
+ )
+ let runner = try await Leap.load(url: bundleURL, options: options)
+ ```
+
+
+
+
+## 5. Generate Content
+
+
+
+ Use the conversation object to generate content. `Conversation.generateResponse` returns a Kotlin [Flow](https://kotlinlang.org/docs/flow.html) of `MessageResponse`.
+
+ ```kotlin
+ import ai.liquid.leap.MessageResponse
+ import kotlinx.coroutines.Job
+ import kotlinx.coroutines.flow.catch
+ import kotlinx.coroutines.flow.onCompletion
+ import kotlinx.coroutines.flow.onEach
+
+ class ChatViewModel(application: Application) : AndroidViewModel(application) {
+ // ... previous code ...
+
+ private val _responseText = MutableStateFlow("")
+ val responseText: StateFlow<String> = _responseText.asStateFlow()
+
+ private val _isGenerating = MutableStateFlow(false)
+ val isGenerating: StateFlow<Boolean> = _isGenerating.asStateFlow()
+
+ private var generationJob: Job? = null
+
+ fun generateResponse(userMessage: String) {
+ generationJob?.cancel()
+
+ generationJob = viewModelScope.launch {
+ _isGenerating.value = true
+ _responseText.value = ""
+
+ conversation?.generateResponse(userMessage)
+ ?.onEach { response ->
+ when (response) {
+ is MessageResponse.Chunk -> {
+ _responseText.value += response.text
+ }
+ is MessageResponse.ReasoningChunk -> {
+ Log.d(TAG, "Reasoning: ${response.text}")
+ }
+ is MessageResponse.Complete -> {
+ Log.d(TAG, "Generation done. Stats: ${response.stats}")
+ }
+ else -> {}
+ }
+ }
+ ?.onCompletion {
+ _isGenerating.value = false
+ }
+ ?.catch { exception ->
+ Log.e(TAG, "Generation failed: ${exception.message}")
+ _isGenerating.value = false
+ }
+ ?.collect()
+ }
+ }
+
+ fun stopGeneration() {
+ generationJob?.cancel()
+ _isGenerating.value = false
+ }
+
+ companion object {
+ private const val TAG = "ChatViewModel"
+ }
+ }
+ ```
+
+ - `onEach` is called for each generated chunk
+ - `onCompletion` fires when generation finishes — at this point, `conversation.history` contains the complete conversation
+ - `catch` handles exceptions during generation
+ - Cancel `generationJob` to stop generation early
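+
+ In the hosting Activity, the streamed text and generation state can be collected the same way as the loading flows earlier. A minimal sketch; `responseTextView` and `stopButton` are placeholders for your own UI:
+
+ ```kotlin
+ // In MainActivity.onCreate
+ lifecycleScope.launch {
+     viewModel.responseText.collect { text ->
+         responseTextView.text = text // render the streamed reply as it grows
+     }
+ }
+
+ lifecycleScope.launch {
+     viewModel.isGenerating.collect { generating ->
+         stopButton.isEnabled = generating // only allow "Stop" while a request is in flight
+     }
+ }
+ ```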
+
+
+ `Conversation.generateResponse` returns an `AsyncThrowingStream`. Iterate it with `for try await`:
+
+ ```swift
+ extension ChatViewModel {
+ func send(_ text: String) {
+ guard let conversation else { return }
+ generationTask?.cancel()
+ let userMessage = ChatMessage(role: .user, content: [.text(text)])
+
+ generationTask = Task { [weak self] in
+ do {
+ for try await response in conversation.generateResponse(
+ message: userMessage
+ ) {
+ self?.handle(response)
+ }
+ } catch {
+ print("Generation failed: \(error)")
+ }
+ }
+ }
+
+ func stopGeneration() {
+ generationTask?.cancel()
+ }
+
+ @MainActor
+ private func handle(_ response: MessageResponse) {
+ switch response {
+ case .chunk(let delta):
+ print(delta, terminator: "")
+ case .reasoningChunk(let thought):
+ print("Reasoning:", thought)
+ case .audioSample(let samples, let sr):
+ print("Audio samples: \(samples.count) at \(sr)Hz")
+ case .functionCall(let calls):
+ print("Function calls: \(calls)")
+ case .complete(let completion):
+ if let stats = completion.stats {
+ print("Finished: \(stats.totalTokens) tokens")
+ }
+ }
+ }
+ }
+ ```
+
+ Cancel the task to stop generation early. You can also observe `conversation.isGenerating` to disable UI controls while a request is in flight.
+
+
+
+### Send Images and Audio (optional)
+
+When the loaded model supports multimodal input, you can include image and audio content in messages:
+
+
+
+ ```kotlin
+ // Image input (JPEG bytes)
+ val imageMessage = ChatMessage(
+ role = ChatMessage.Role.USER,
+ content = listOf(
+ ChatMessageContent.Text("Describe what you see."),
+ ChatMessageContent.Image(jpegBytes)
+ )
+ )
+
+ // Audio input (WAV bytes, 16kHz mono)
+ val audioMessage = ChatMessage(
+ role = ChatMessage.Role.USER,
+ content = listOf(
+ ChatMessageContent.Text("Transcribe this audio."),
+ ChatMessageContent.Audio(wavBytes)
+ )
+ )
+ ```
+
+
+ ```swift
+ // Image input (JPEG Data)
+ let imageMessage = ChatMessage(
+ role: .user,
+ content: [
+ .text("Describe what you see."),
+ .image(jpegData)
+ ]
+ )
+
+ // Audio input (WAV Data, 16kHz mono)
+ let audioMessage = ChatMessage(
+ role: .user,
+ content: [
+ .text("Transcribe this audio."),
+ .audio(wavData)
+ ]
+ )
+
+ // From raw PCM float samples
+ let pcmMessage = ChatMessage(
+ role: .user,
+ content: [
+ .text("Give feedback on my pronunciation."),
+ ChatMessageContent.fromFloatSamples(samples, sampleRate: 16000)
+ ]
+ )
+ ```
+
+
+
+## 6. Examples
+
+See [LeapSDK-Examples](https://github.com/Liquid4All/LeapSDK-Examples) for complete example apps.
+
+## Next Steps
+
+- Learn about structured JSON output with [Constrained Generation](./constrained-generation).
+- Wire up tools and external APIs with [Function Calling](./function-calling).
+- Compare on-device and cloud behaviour in [Cloud AI Comparison](./cloud-ai-comparison).
+- Explore the full API in [Model Loading](./model-loading) and [Conversation & Generation](./conversation-generation).
diff --git a/deployment/on-device/leap-sdk/utilities.mdx b/deployment/on-device/leap-sdk/utilities.mdx
new file mode 100644
index 0000000..b8f8a59
--- /dev/null
+++ b/deployment/on-device/leap-sdk/utilities.mdx
@@ -0,0 +1,498 @@
+---
+title: "Utilities"
+description: "API reference for error handling, serialization, and utilities in the LEAP SDK"
+---
+
+## Error Handling
+
+
+
+
+
+All errors are thrown as `LeapException`, which has the following subclasses:
+
+- `LeapModelLoadingException`: Error loading the model.
+- `LeapGenerationException`: Error generating content.
+- `LeapGenerationPromptExceedContextLengthException`: The prompt exceeds the maximum context length, so no content will be generated.
+- `LeapSerializationException`: Error serializing or deserializing data.
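+
+When finer-grained handling is useful, the subclasses can be caught separately. A minimal sketch (the prompt text and recovery actions are illustrative):
+
+```kotlin
+try {
+    conversation.generateResponse("Summarize this document").collect { response ->
+        // handle MessageResponse values
+    }
+} catch (e: LeapGenerationPromptExceedContextLengthException) {
+    // Prompt too long: trim the history or start a new conversation
+} catch (e: LeapGenerationException) {
+    // Any other generation failure
+} catch (e: LeapException) {
+    // Catch-all for remaining SDK errors (loading, serialization, ...)
+}
+```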
+
+
+
+
+
+Errors are surfaced as `LeapError` values. The most common cases are:
+
+- `LeapError.modelLoadingFailure`: Problems reading or validating the model bundle.
+- `LeapError.generationFailure`: Unexpected native inference errors.
+- `LeapError.promptExceedContextLengthFailure`: Prompt length exceeded the configured context size.
+- `LeapError.serializationFailure`: JSON encoding/decoding problems when working with chat history or function calls.
+
+Handle thrown errors with `do` / `catch` when using async streams, or use the `onErrorCallback` in the lower-level API.
+
+
+
+
+
+## Serialization Support
+
+
+
+
+
+The LEAP SDK uses [kotlinx.serialization](https://github.com/Kotlin/kotlinx.serialization) for JSON serialization and deserialization. This is built into the core SDK and requires no additional dependencies.
+
+The following types are `@Serializable`:
+
+- [`ChatMessage`](./messages-content#chatmessage)
+- [`ChatMessageContent`](./messages-content#chatmessagecontent)
+- [`LeapFunctionCall`](./function-calling)
+- [`Manifest`](./model-loading#manifest)
+
+### Serializing and Deserializing Conversation History
+
+**Add kotlinx.serialization to your project:**
+
+```kotlin
+// app/build.gradle.kts
+plugins {
+ id("org.jetbrains.kotlin.plugin.serialization") version "2.3.10"
+}
+
+dependencies {
+ implementation("org.jetbrains.kotlinx:kotlinx-serialization-json:1.7.3")
+}
+```
+
+**Save conversation history:**
+
+```kotlin
+import kotlinx.serialization.json.Json
+import kotlinx.serialization.encodeToString
+
+val json = Json { ignoreUnknownKeys = true }
+
+// Serialize to JSON string
+val jsonString = json.encodeToString(conversation.history)
+
+// Save to SharedPreferences, file, database, etc.
+sharedPreferences.edit().putString("conversation_history", jsonString).apply()
+```
+
+**Restore conversation history:**
+
+```kotlin
+import kotlinx.serialization.decodeFromString
+
+// Load from storage
+val jsonString = sharedPreferences.getString("conversation_history", null)
+
+if (jsonString != null) {
+ val history = json.decodeFromString<List<ChatMessage>>(jsonString)
+ val restoredConversation = modelRunner.createConversationFromHistory(history)
+}
+```
+
+**Serialize a single message:**
+
+```kotlin
+val message = ChatMessage(
+ role = ChatMessage.Role.USER,
+ content = listOf(ChatMessageContent.Text("Hello"))
+)
+
+val messageJson = json.encodeToString(message)
+```
+
+
+
+
+
+The LEAP SDK provides two approaches for serializing conversation history: `exportToJSON()` and Swift's `Codable` protocol.
+
+### Using exportToJSON
+
+```swift
+import Foundation
+
+// Save conversation
+func saveConversation() throws {
+ let jsonArray = try conversation.exportToJSON()
+ let data = try JSONSerialization.data(withJSONObject: jsonArray)
+ try data.write(to: conversationFileURL)
+}
+
+// Restore conversation
+func restoreConversation() throws {
+ let data = try Data(contentsOf: conversationFileURL)
+ if let history = try JSONSerialization.jsonObject(with: data) as? [[String: Any]] {
+ conversation = modelRunner.createConversationFromHistory(history: history)
+ }
+}
+```
+
+### Using Codable
+
+`ChatMessage` conforms to `Codable`, so you can use `JSONEncoder` and `JSONDecoder` directly:
+
+```swift
+func saveWithCodable() throws {
+ let encoder = JSONEncoder()
+ let data = try encoder.encode(conversation.history)
+ try data.write(to: conversationFileURL)
+}
+
+func restoreWithCodable() throws {
+ let data = try Data(contentsOf: conversationFileURL)
+ let decoder = JSONDecoder()
+ let history = try decoder.decode([ChatMessage].self, from: data)
+ conversation = modelRunner.createConversationFromHistory(history: history)
+}
+```
+
+
+
+
+
+## LeapModelDownloader
+
+
+
+
+
+
+`LeapModelDownloader` is the **recommended** option for Android applications. It provides background downloads with WorkManager, foreground service notifications, and robust handling of network interruptions.
+
+
+The LeapSDK Android Model Downloader module is a production-ready helper for downloading models from the LEAP Model Library on Android. It runs as a [foreground service](https://developer.android.com/develop/background-work/services/fgs) and displays notifications to users during downloads.
+
+### Permission Setup
+
+The model downloader requires notification permissions to display download progress. You need to:
+
+1. **Add permissions to AndroidManifest.xml**:
+```xml
+<!-- Required by the model downloader: network access, notifications, and the foreground download service -->
+<uses-permission android:name="android.permission.INTERNET" />
+<uses-permission android:name="android.permission.POST_NOTIFICATIONS" />
+<uses-permission android:name="android.permission.FOREGROUND_SERVICE" />
+<uses-permission android:name="android.permission.FOREGROUND_SERVICE_DATA_SYNC" />
+```
+
+2. **Request notification permission at runtime** (Android 13+):
+```kotlin
+// In your Activity
+private val requestPermissionLauncher = registerForActivityResult(
+ ActivityResultContracts.RequestPermission()
+) { isGranted: Boolean ->
+ if (isGranted) {
+ Log.d(TAG, "Notification permission granted")
+ // Proceed with download
+ } else {
+ Log.w(TAG, "Notification permission denied")
+ // Handle permission denial
+ }
+}
+
+// Before downloading
+if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.TIRAMISU) {
+ if (ContextCompat.checkSelfPermission(
+ this,
+ android.Manifest.permission.POST_NOTIFICATIONS
+ ) != PackageManager.PERMISSION_GRANTED
+ ) {
+ requestPermissionLauncher.launch(android.Manifest.permission.POST_NOTIFICATIONS)
+ }
+}
+```
+
+### Installation
+
+```kotlin
+// In build.gradle.kts
+dependencies {
+ implementation("ai.liquid.leap:leap-sdk:0.9.7")
+ implementation("ai.liquid.leap:leap-model-downloader:0.9.7")
+}
+```
+
+### Basic Usage
+
+```kotlin
+import ai.liquid.leap.model_downloader.LeapModelDownloader
+import ai.liquid.leap.model_downloader.LeapModelDownloaderNotificationConfig
+
+// Initialize (in onCreate or similar)
+val modelDownloader = LeapModelDownloader(
+ context,
+ notificationConfig = LeapModelDownloaderNotificationConfig.build {
+ notificationTitleDownloading = "Downloading AI model..."
+ notificationTitleDownloaded = "Model ready!"
+ notificationContentDownloading = "Please wait..."
+ }
+)
+
+// Download and load model
+lifecycleScope.launch {
+ try {
+ val modelRunner = modelDownloader.loadModel(
+ modelSlug = "LFM2-1.2B",
+ quantizationSlug = "Q5_K_M",
+ progress = { progressData ->
+ Log.d(TAG, "Progress: ${progressData.progress * 100}%")
+ }
+ )
+ // Model is ready to use
+ } catch (e: Exception) {
+ Log.e(TAG, "Failed to load model: ${e.message}")
+ }
+}
+```
+
+### API Reference
+
+#### LeapModelDownloader
+
+`LeapModelDownloader` is the entry point for requesting model downloads and for querying the status of a download request.
+
+```kotlin
+class LeapModelDownloader(
+ private val context: Context,
+ modelFileDir: File? = null,
+ private val extraHTTPRequestHeaders: Map<String, String> = mapOf(),
+ private val notificationConfig: LeapModelDownloaderNotificationConfig = LeapModelDownloaderNotificationConfig(),
+) {
+ fun getModelFile(model: DownloadableModel): File
+ fun requestDownloadModel(model: DownloadableModel, forceDownload: Boolean = false)
+ fun requestStopDownload(model: DownloadableModel)
+ suspend fun queryStatus(model: DownloadableModel): ModelDownloadStatus
+ fun requestStopService()
+}
+```
+
+**Constructor parameters:**
+
+- `context`: The Android context to retrieve cache directory and launch services. The activity context works for this purpose.
+- `modelFileDir`: The directory where model files are stored. If not set, a directory inside the app's external files directory is used.
+- `extraHTTPRequestHeaders`: Any extra HTTP request headers to send when downloading a model.
+- `notificationConfig`: Configuration for the content of Android notifications visible to the users.
+
+**getModelFile** -- Returns the `File` where the model bundle for the given [`DownloadableModel`](#downloadablemodel) is (or will be) stored. The file may not exist yet.
+
+**requestDownloadModel** -- Makes a request to download the model. If the model file already exists locally, it will not be downloaded.
+
+- `model`: A [`DownloadableModel`](#downloadablemodel) instance.
+- `forceDownload`: If true, the downloader will remove the model bundle file that exists locally and re-download it.
+
+**requestStopDownload** -- Makes a request to stop downloading a model.
+
+**queryStatus** -- Queries the status of the model. Returns a [`ModelDownloadStatus`](#modeldownloadstatus) object.
+
+**requestStopService** -- Makes a request to stop the foreground service of the model downloader.
+
+#### DownloadableModel
+
+`DownloadableModel` is an interface describing a model that can be downloaded by the LeapSDK Model Downloader.
+
+```kotlin
+interface DownloadableModel {
+ val uri: Uri
+ val name: String
+ val localFilename: String
+}
+```
+
+- `uri`: The URI of the model to download.
+- `name`: A user-friendly name of the model. It will be displayed in the notification.
+- `localFilename`: The filename to store the model bundle file locally.
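+
+Any object implementing this interface can be handed to the downloader. A hypothetical custom source for self-hosted models (the URL and filenames are placeholders, not a real endpoint):
+
+```kotlin
+val customModel = object : DownloadableModel {
+    override val uri: Uri = Uri.parse("https://example.com/models/custom-model.bundle")
+    override val name: String = "Custom Model"
+    override val localFilename: String = "custom-model.bundle"
+}
+
+modelDownloader.requestDownloadModel(customModel)
+```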
+
+#### LeapDownloadableModel
+
+`LeapDownloadableModel` implements [`DownloadableModel`](#downloadablemodel) and is designed for models hosted in the LEAP Model Library. Use the `resolve` method to look up a model by its slugs.
+
+```kotlin
+class LeapDownloadableModel {
+ companion object {
+ suspend fun resolve(modelSlug: String, quantizationSlug: String): LeapDownloadableModel?
+ }
+}
+```
+
+The `resolve` method accepts two parameters:
+
+- `modelSlug`: The model slug that identifies the model. It is usually the lowercase string of the model name. For example, the slug of `LFM2-1.2B` is `lfm2-1.2b`.
+- `quantizationSlug`: The model quantization slug. It can be found in the "Available quantizations" section of the model card.
+
+#### ModelDownloadStatus
+
+```kotlin
+sealed interface ModelDownloadStatus {
+ data object NotOnLocal: ModelDownloadStatus
+ data class DownloadInProgress(
+ val totalSizeInBytes: Long,
+ val downloadedSizeInBytes: Long,
+ ): ModelDownloadStatus
+ data class Downloaded(
+ val totalSizeInBytes: Long,
+ ) : ModelDownloadStatus
+}
+```
+
+There are three possible value types:
+
+- `NotOnLocal`: The model file has not been downloaded or has already been deleted.
+- `DownloadInProgress`: The model file is still being downloaded. `totalSizeInBytes` is the total size of the file and `downloadedSizeInBytes` is the size of the downloaded portion. If the total size is not available, `totalSizeInBytes` will be -1.
+- `Downloaded`: The file has been downloaded. `totalSizeInBytes` is the file size.
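+
+Putting the lower-level pieces together, a request-and-poll sketch (the slug values are illustrative, and a real app would drive its UI from this loop rather than logging):
+
+```kotlin
+lifecycleScope.launch {
+    // Resolve the model from the LEAP Model Library; returns null if the slugs don't match a model
+    val model = LeapDownloadableModel.resolve("lfm2-1.2b", "Q5_K_M") ?: return@launch
+    modelDownloader.requestDownloadModel(model)
+
+    while (true) {
+        when (val status = modelDownloader.queryStatus(model)) {
+            is ModelDownloadStatus.DownloadInProgress -> {
+                if (status.totalSizeInBytes > 0) {
+                    val pct = 100 * status.downloadedSizeInBytes / status.totalSizeInBytes
+                    Log.d(TAG, "Downloading: $pct%")
+                }
+            }
+            is ModelDownloadStatus.Downloaded -> {
+                // Load the downloaded bundle with the core SDK
+                val modelRunner = LeapClient.loadModel(modelDownloader.getModelFile(model).path)
+                return@launch
+            }
+            ModelDownloadStatus.NotOnLocal -> { /* not started yet, or the file was removed */ }
+        }
+        delay(500)
+    }
+}
+```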
+
+
+
+
+
+
+`LeapModelDownloader` is the **recommended** option for Apple platforms. It supports background downloads and provides utilities for querying download status and managing cached models.
+
+
+### Installation
+
+Add the `LeapModelDownloader` product when adding the LEAP SDK to your project:
+
+- **Swift Package Manager**: Select both `LeapSDK` and `LeapModelDownloader` when adding `https://github.com/Liquid4All/leap-ios.git`.
+- **Manual**: Download `LeapModelDownloader.xcframework.zip` from the [GitHub releases](https://github.com/Liquid4All/leap-ios/releases), unzip, and embed in your target.
+
+### Basic Usage
+
+```swift
+import LeapModelDownloader
+
+let downloader = ModelDownloader()
+
+// Download model to cache
+let manifest = try await downloader.downloadModel(
+ "LFM2.5-1.2B-Instruct",
+ quantization: "Q4_K_M"
+) { progress, speed in
+ print("Progress: \(Int(progress * 100))%")
+}
+
+// Later, load from cache (no download needed)
+let modelRunner = try await Leap.load(
+ model: "LFM2.5-1.2B-Instruct",
+ quantization: "Q4_K_M"
+)
+```
+
+### Query Status
+
+```swift
+let status = downloader.queryStatus("LFM2.5-1.2B-Instruct", quantization: "Q4_K_M")
+
+switch status {
+case .notOnLocal:
+ print("Model not downloaded")
+case .downloadInProgress(let progress):
+ print("Downloading: \(Int(progress * 100))%")
+case .downloaded:
+ print("Model ready")
+}
+```
+
+### Remove a Downloaded Model
+
+```swift
+try downloader.removeModel("LFM2.5-1.2B-Instruct", quantization: "Q4_K_M")
+```
+
+### Get Model Size
+
+Check the model size before downloading:
+
+```swift
+let sizeInBytes = try await downloader.getModelSize(
+ modelName: "LFM2.5-1.2B-Instruct",
+ quantization: "Q4_K_M"
+)
+print("Model size: \(sizeInBytes / 1_000_000) MB")
+```
+
+### Get Available Disk Space
+
+```swift
+if let freeSpace = downloader.getAvailableDiskSpace() {
+ print("Free space: \(freeSpace / 1_000_000_000) GB")
+}
+```
+
+### Cancel an Ongoing Download
+
+```swift
+downloader.requestStopDownload(model)
+```
+
+
+
+
+
+## Putting It Together
+
+
+
+
+
+```kotlin
+val modelRunner = modelDownloader.loadModel(
+ modelSlug = "LFM2-1.2B",
+ quantizationSlug = "Q5_K_M"
+)
+
+val conversation = modelRunner.createConversation(
+ systemPrompt = "You are a travel assistant."
+)
+
+conversation.registerFunction(weatherFunction)
+
+val options = GenerationOptions(temperature = 0.8f)
+
+val userMessage = ChatMessage(
+ role = ChatMessage.Role.USER,
+ content = listOf(ChatMessageContent.Text("Plan a 3-day trip to Kyoto with food highlights"))
+)
+
+conversation.generateResponse(
+ message = userMessage,
+ generationOptions = options
+).collect { response ->
+ process(response)
+}
+```
+
+Refer to the [Quick Start](./quick-start-guide) for end-to-end project setup, [Function Calling](./function-calling) for tool invocation, and [Constrained Generation](./advanced-features) for structured outputs.
+
+
+
+
+
+```swift
+let runner = try await Leap.load(url: bundleURL)
+let conversation = runner.createConversation(systemPrompt: "You are a travel assistant.")
+
+conversation.registerFunction(weatherFunction)
+
+var options = GenerationOptions(temperature: 0.8)
+try options.setResponseFormat(type: TripRecommendation.self)
+
+let userMessage = ChatMessage(
+ role: .user,
+ content: [.text("Plan a 3-day trip to Kyoto with food highlights")]
+)
+
+for try await response in conversation.generateResponse(
+ message: userMessage,
+ generationOptions: options
+) {
+ process(response)
+}
+```
+
+Refer to the [Quick Start](./quick-start-guide) for end-to-end project setup, [Function Calling](./function-calling) for tool invocation, and [Constrained Generation](./advanced-features) for structured outputs.
+
+
+
+
diff --git a/docs.json b/docs.json
index 76d9ade..01fcee6 100644
--- a/docs.json
+++ b/docs.json
@@ -140,7 +140,23 @@
"icon": "mobile",
"pages": [
{
- "group": "iOS SDK",
+ "group": "LEAP SDK",
+ "icon": "microchip-ai",
+ "pages": [
+ "deployment/on-device/leap-sdk/quick-start-guide",
+ "deployment/on-device/leap-sdk/ai-agent-usage-guide",
+ "deployment/on-device/leap-sdk/model-loading",
+ "deployment/on-device/leap-sdk/conversation-generation",
+ "deployment/on-device/leap-sdk/messages-content",
+ "deployment/on-device/leap-sdk/advanced-features",
+ "deployment/on-device/leap-sdk/utilities",
+ "deployment/on-device/leap-sdk/cloud-ai-comparison",
+ "deployment/on-device/leap-sdk/constrained-generation",
+ "deployment/on-device/leap-sdk/function-calling"
+ ]
+ },
+ {
+ "group": "iOS SDK (Legacy)",
"icon": "apple",
"pages": [
"deployment/on-device/ios/ios-quick-start-guide",
@@ -156,7 +172,7 @@
]
},
{
- "group": "Android SDK",
+ "group": "Android SDK (Legacy)",
"icon": "robot",
"pages": [
"deployment/on-device/android/android-quick-start-guide",
@@ -351,15 +367,15 @@
},
{
"source": "/leap/edge-sdk/overview",
- "destination": "/deployment/on-device/ios/ios-quick-start-guide"
+ "destination": "/deployment/on-device/leap-sdk/quick-start-guide"
},
{
"source": "/leap/edge-sdk/ios/:slug*",
- "destination": "/deployment/on-device/ios/:slug*"
+ "destination": "/deployment/on-device/leap-sdk/:slug*"
},
{
"source": "/leap/edge-sdk/android/:slug*",
- "destination": "/deployment/on-device/android/:slug*"
+ "destination": "/deployment/on-device/leap-sdk/:slug*"
},
{
"source": "/leap/leap-bundle/:slug*",
diff --git a/leap/edge-sdk/overview.mdx b/leap/edge-sdk/overview.mdx
index fa91a45..619cd4a 100644
--- a/leap/edge-sdk/overview.mdx
+++ b/leap/edge-sdk/overview.mdx
@@ -1,39 +1,41 @@
---
title: "Overview"
-description: "The LEAP Edge SDK is a native framework for running LFMs (and other open source models) on mobile devices."
+description: "The LEAP SDK is a Kotlin Multiplatform framework for running LFMs (and other open source models) on-device across Android, iOS, macOS, JVM, and more."
---
-## Improving access[](#improving-access "Direct link to Improving access")
+## Improving Access
-Up until now, deploying small language models (SLMs) on mobile devices has been an extremely painful process, generally accessible to only inference engineers or AI/ML programmers.
+Up until now, deploying small language models (SLMs) on mobile and edge devices has been an extremely painful process, generally accessible only to inference engineers or AI/ML programmers.
-Written for Android (Kotlin) and iOS (Swift), the goal of the Edge SDK is to make SLM deployment as easy as calling a cloud LLM API endpoint - for any app developer.
+Built with **Kotlin Multiplatform**, the LEAP SDK provides a unified API across Android, iOS, macOS, JVM, and more — making SLM deployment as easy as calling a cloud LLM API endpoint, for any app developer.
-## Get started[](#get-started "Direct link to Get started")
+## Get Started
-Choose your platform to get started
+
+ Get started with the LEAP SDK. Install via Gradle (Kotlin) or Swift Package Manager (Swift), load models, and start generating content on any supported platform.
+
-
-
- Get started with the LEAP Edge SDK for iOS using Swift. Deploy models directly in your iOS app.
-
+## Supported Platforms
-
- Get started with the LEAP Edge SDK for Android using Kotlin. Deploy models directly in your Android app.
-
-
+| Platform | Language | Status |
+|----------|----------|--------|
+| **Android** | Kotlin | Production-ready |
+| **iOS** | Swift | Production-ready |
+| **macOS** | Swift | Production-ready |
+| **JVM/Desktop** | Kotlin | In testing |
+| **Linux** | Kotlin/Native | In testing |
+| **Windows** | Kotlin/Native | In testing |
+| **Web (WASM)** | Kotlin/JS | In testing |
-## Features[](#features "Direct link to Features")
+## Features
-The current list of main features includes:
+- **Model downloading** — automatic download and caching from the LEAP Model Library
+- **Chat completion** — streaming text generation with conversation history
+- **Multimodal input** — image and audio support for compatible models
+- **Audio generation** — text-to-speech output from audio models
+- **Constrained generation** — structured JSON output with compile-time schema validation
+- **Function calling** — tool use with automatic parsing of model requests
+- **Reasoning models** — support for thinking/reasoning token streams
+- **Cross-platform** — single codebase, multiple platforms via Kotlin Multiplatform
-* Model downloading service
-* Chat completion (generation)
-* Constrained generation
-* Function calling
-* Gson support (Android)
-* Image support (for LFM2-VL)
-
-We are consistently adding to this list - see our [changelog](/leap/changelog) for detailed updates.
-
-[Edit this page](https://github.com/Liquid4All/docs/tree/main/deployment/on-device/ios/ios-quick-start-guide.mdx)
+We are consistently adding to this list — see our [changelog](/leap/changelog) for detailed updates.