# ZeticMLangeLLMModel

API reference for running LLM inference on Android with `ZeticMLangeLLMModel`.
The ZeticMLangeLLMModel class provides on-device LLM inference for Android. It handles model downloading, quantization selection, KV cache management, and token-by-token streaming through a single unified API.
## Package

`com.zeticai.mlange.core.model.llm`

## Import

```kotlin
import com.zeticai.mlange.core.model.llm.ZeticMLangeLLMModel
```

## Constructors

### Automatic Configuration (Recommended)
Creates a new LLM model instance with default settings optimized for the device.
```kotlin
ZeticMLangeLLMModel(context: Context, personalKey: String, name: String)
```

| Parameter | Type | Description |
|---|---|---|
| `context` | `Context` | The Android application or activity context. |
| `personalKey` | `String` | Your personal authentication key from the Melange Dashboard. |
| `name` | `String` | Identifier for the model. Accepts a pre-built model key or a Hugging Face repository ID. |

```kotlin
val model = ZeticMLangeLLMModel(this, "YOUR_PERSONAL_KEY", "YOUR_MODEL_NAME")
```

The constructor performs a network call on first use to download the model binary, so call it from a background thread or handle threading appropriately. The binary is cached locally after the first download.
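Because the first construction downloads the model binary over the network, it must not run on the main thread. One generic way to move that work off the main thread is sketched below with a plain `Thread`; `loadInBackground` is an illustrative helper, not part of the SDK.

```kotlin
// Sketch: run a blocking factory (such as the ZeticMLangeLLMModel
// constructor) off the calling thread and deliver the result via callback.
// In real Android code, post `onReady` back to the main thread
// (e.g., with a Handler or runOnUiThread) before touching any UI.
fun <T> loadInBackground(create: () -> T, onReady: (T) -> Unit) {
    Thread {
        val model = create()   // blocking work: first-use network download
        onReady(model)
    }.start()
}

// Illustrative usage:
// loadInBackground({ ZeticMLangeLLMModel(this, "YOUR_PERSONAL_KEY", "YOUR_MODEL_NAME") }) { model ->
//     // model is ready; hand it to your inference layer
// }
```

Kotlin coroutines (`withContext(Dispatchers.IO) { ... }`) would serve the same purpose if your project already uses them.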
### Custom Configuration (Advanced)
Creates a new LLM model instance with full control over inference mode, dataset evaluation, KV cache policy, and download progress reporting.
```kotlin
ZeticMLangeLLMModel(
    context: Context,
    personalKey: String,
    name: String,
    version: Int? = null,
    modelMode: LLMModelMode = LLMModelMode.RUN_SPEED,
    dataSetType: LLMDataSetType = LLMDataSetType.NONE,
    kvCacheCleanupPolicy: LLMKVCacheCleanupPolicy = LLMKVCacheCleanupPolicy.CLEAN_UP_ON_FULL,
    onProgress: ((Float) -> Unit)? = null
)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `context` | `Context` | - | The Android application or activity context. |
| `personalKey` | `String` | - | Your personal authentication key. |
| `name` | `String` | - | Identifier for the model to download. |
| `version` | `Int?` | `null` | Specific model version. If `null`, the latest version is used. |
| `modelMode` | `LLMModelMode` | `RUN_SPEED` | Inference mode for backend selection. See `LLMModelMode`. |
| `dataSetType` | `LLMDataSetType` | `NONE` | Dataset used for accuracy-based model selection. See `LLMDataSetType`. |
| `kvCacheCleanupPolicy` | `LLMKVCacheCleanupPolicy` | `CLEAN_UP_ON_FULL` | Policy for handling the KV cache when it is full. See `LLMKVCacheCleanupPolicy`. |
| `onProgress` | `((Float) -> Unit)?` | `null` | Callback reporting download progress as a `Float` from 0.0 to 1.0. |
```kotlin
val model = ZeticMLangeLLMModel(
    this,
    personalKey = "YOUR_PERSONAL_KEY",
    name = "google/gemma-3-4b-it",
    version = null,
    modelMode = LLMModelMode.RUN_SPEED,
    dataSetType = LLMDataSetType.NONE,
    kvCacheCleanupPolicy = LLMKVCacheCleanupPolicy.CLEAN_UP_ON_FULL,
    onProgress = { progress ->
        println("Download progress: ${(progress * 100).toInt()}%")
    }
)
```

## Methods
### run(prompt)
Starts a generation context with the provided prompt. After calling run(), consume generated tokens by calling waitForNextToken() in a loop.
```kotlin
fun run(prompt: String): LLMRunResult
```

| Parameter | Type | Description |
|---|---|---|
| `prompt` | `String` | The input prompt to begin generation. |

**Returns:** `LLMRunResult`, the initial run result.
### waitForNextToken()
Blocks until the next token is generated and returns it. Call this in a loop after run() to stream the full response.
```kotlin
fun waitForNextToken(): LLMNextTokenResult
```

**Returns:** `LLMNextTokenResult` with the following properties:

| Property | Type | Description |
|---|---|---|
| `token` | `String` | The generated token text. May be empty for special tokens. |
| `generatedTokens` | `Int` | Number of tokens generated so far. Returns 0 when generation is complete. |
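The `token`/`generatedTokens` contract above lends itself to a small drain loop. Below is a generic sketch of that pattern; `nextToken` stands in for `model.waitForNextToken()`, a `Pair` mirrors its two properties, and the helper name is illustrative, not part of the SDK.

```kotlin
// Sketch of the streaming pattern: keep pulling tokens until the
// source reports zero generated tokens, skipping empty (special) tokens.
fun collectTokens(nextToken: () -> Pair<String, Int>): String {
    val sb = StringBuilder()
    while (true) {
        val (token, generatedTokens) = nextToken()
        if (generatedTokens == 0) break          // generation finished
        if (token.isNotEmpty()) sb.append(token) // skip special tokens
    }
    return sb.toString()
}

// Against the real API this would be called as, roughly:
// collectTokens { model.waitForNextToken().let { r -> r.token to r.generatedTokens } }
```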
### cleanUp()

Cleans up the model's context and releases resources. Required before starting a new conversation when using the `DO_NOT_CLEAN_UP` KV cache policy.

```kotlin
fun cleanUp()
```

When using `LLMKVCacheCleanupPolicy.DO_NOT_CLEAN_UP`, you must call `cleanUp()` before calling `run()` again. Failing to do so may cause unexpected behavior.
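The cleanup rule can be encoded in a small guard so callers cannot forget the reset between conversations. The following is a generic sketch under that assumption; `GenerationSession`, `reset`, and `start` are illustrative stand-ins for `model.cleanUp()` and `model.run()`, not SDK names.

```kotlin
// Sketch of the DO_NOT_CLEAN_UP contract: every run() after the first
// must be preceded by cleanUp(). This wrapper enforces the rule.
class GenerationSession(
    private val reset: () -> Unit,        // stand-in for model.cleanUp()
    private val start: (String) -> Unit   // stand-in for model.run(prompt)
) {
    private var started = false

    fun newConversation(prompt: String) {
        if (started) reset()              // required under DO_NOT_CLEAN_UP
        start(prompt)
        started = true
    }
}
```

Wired to a real model it might be constructed as `GenerationSession({ model.cleanUp() }, { model.run(it) })`, making the mandatory cleanup impossible to skip.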
## Compatible Model Inputs
`ZeticMLangeLLMModel` accepts two types of model identifiers for the `name` parameter:
- Pre-built models from the Melange Dashboard: select a ready-to-use model and use its key.
- Hugging Face repository IDs: pass the full repo ID directly (e.g., `google/gemma-3-4b-it`, `LiquidAI/LFM2.5-1.2B-Instruct`).
The SDK currently supports public Hugging Face repositories with permissive open-source licenses; private repository authentication is on the roadmap.
## Full Working Example
```kotlin
import com.zeticai.mlange.core.model.llm.ZeticMLangeLLMModel

// (1) Initialize the model
val model = ZeticMLangeLLMModel(context, PERSONAL_KEY, "google/gemma-3-4b-it")

// (2) Start generation
model.run("Explain on-device AI in one paragraph.")

// (3) Stream tokens
val sb = StringBuilder()
while (true) {
    val waitResult = model.waitForNextToken()
    val token = waitResult.token
    val generatedTokens = waitResult.generatedTokens
    if (generatedTokens == 0) break
    if (token.isNotEmpty()) sb.append(token)
}
val output = sb.toString()

// (4) Clean up when done
model.cleanUp()
```

## See Also
- ZeticMLangeLLMModel (iOS): iOS equivalent
- LLM Inference Modes: Choosing the right inference mode
- Enums and Constants: All enum types used by this class
- Android Integration Guide: Step-by-step setup guide