Melange API Reference (Android)

ZeticMLangeLLMModel

API reference for running LLM inference on Android with ZeticMLangeLLMModel.

The ZeticMLangeLLMModel class provides on-device LLM inference for Android. It handles model downloading, quantization selection, KV cache management, and token-by-token streaming through a single unified API.

Package

com.zeticai.mlange.core.model.llm

Import

import com.zeticai.mlange.core.model.llm.ZeticMLangeLLMModel

Constructors

Default

Creates a new LLM model instance with default settings optimized for the device.

ZeticMLangeLLMModel(context: Context, personalKey: String, name: String)
| Parameter | Type | Description |
| --- | --- | --- |
| context | Context | The Android application or activity context. |
| personalKey | String | Your personal authentication key from the Melange Dashboard. |
| name | String | Identifier for the model. Accepts a pre-built model key or a Hugging Face repository ID. |
val model = ZeticMLangeLLMModel(this, "YOUR_PERSONAL_KEY", "YOUR_MODEL_NAME")

The constructor performs a network call on first use to download the model binary. Ensure you call it from a background thread or handle threading appropriately. The binary is cached locally after the first download.
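Since the first construction can block on the download, a common pattern is to build the model on a background thread and hand it back through a callback once ready. The sketch below shows that call order with plain Kotlin threads; createModelBlocking and loadModelAsync are illustrative stand-ins for the SDK constructor, not part of its API:

```kotlin
import java.util.concurrent.CountDownLatch
import kotlin.concurrent.thread

// Hypothetical stand-in for the blocking constructor call; in a real app this
// would be ZeticMLangeLLMModel(context, personalKey, name).
fun createModelBlocking(name: String): String {
    Thread.sleep(50) // simulate the first-use download
    return "model:$name"
}

// Construct the model on a background thread, then deliver it via a callback.
fun loadModelAsync(name: String, onReady: (String) -> Unit) {
    thread {
        val model = createModelBlocking(name)
        onReady(model)
    }
}

fun main() {
    val latch = CountDownLatch(1)
    var loaded: String? = null
    loadModelAsync("google/gemma-3-4b-it") { model ->
        loaded = model
        latch.countDown()
    }
    latch.await() // in an Activity, post back to the main thread instead of blocking
    println(loaded)
}
```

In production code a coroutine on Dispatchers.IO (or an executor) plays the role of `thread {}` here; the point is only that the first construction must not run on the main thread.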


Custom Configuration (Advanced)

Creates a new LLM model instance with full control over inference mode, dataset evaluation, KV cache policy, and download progress reporting.

ZeticMLangeLLMModel(
    context: Context,
    personalKey: String,
    name: String,
    version: Int? = null,
    modelMode: LLMModelMode = LLMModelMode.RUN_SPEED,
    dataSetType: LLMDataSetType = LLMDataSetType.NONE,
    kvCacheCleanupPolicy: LLMKVCacheCleanupPolicy = LLMKVCacheCleanupPolicy.CLEAN_UP_ON_FULL,
    onProgress: ((Float) -> Unit)? = null
)
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| context | Context | - | The Android application or activity context. |
| personalKey | String | - | Your personal authentication key. |
| name | String | - | Identifier for the model to download. |
| version | Int? | null | Specific model version. If null, the latest version is used. |
| modelMode | LLMModelMode | RUN_SPEED | Inference mode for backend selection. See LLMModelMode. |
| dataSetType | LLMDataSetType | NONE | Dataset used for accuracy-based model selection. See LLMDataSetType. |
| kvCacheCleanupPolicy | LLMKVCacheCleanupPolicy | CLEAN_UP_ON_FULL | Policy for handling the KV cache when full. See LLMKVCacheCleanupPolicy. |
| onProgress | ((Float) -> Unit)? | null | Callback reporting download progress as a Float from 0.0 to 1.0. |
val model = ZeticMLangeLLMModel(
    this,
    personalKey = "YOUR_PERSONAL_KEY",
    name = "google/gemma-3-4b-it",
    version = null,
    modelMode = LLMModelMode.RUN_SPEED,
    dataSetType = LLMDataSetType.NONE,
    kvCacheCleanupPolicy = LLMKVCacheCleanupPolicy.CLEAN_UP_ON_FULL,
    onProgress = { progress ->
        println("Download progress: ${(progress * 100).toInt()}%")
    }
)

Methods

run(prompt)

Starts a generation context with the provided prompt. After calling run(), consume generated tokens by calling waitForNextToken() in a loop.

fun run(prompt: String): LLMRunResult
| Parameter | Type | Description |
| --- | --- | --- |
| prompt | String | The input prompt to begin generation. |

Returns: LLMRunResult, the initial run result.


waitForNextToken()

Blocks until the next token is generated and returns it. Call this in a loop after run() to stream the full response.

fun waitForNextToken(): LLMNextTokenResult

Returns: LLMNextTokenResult with the following properties:

| Property | Type | Description |
| --- | --- | --- |
| token | String | The generated token text. May be empty for special tokens. |
| generatedTokens | Int | Number of tokens generated so far. Equals 0 when generation is complete. |
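These sentinel semantics (empty token strings for special tokens, generatedTokens == 0 at the end) drive the standard consumption loop. The sketch below isolates that loop against a minimal stand-in; FakeLLM is illustrative and not part of the SDK:

```kotlin
data class LLMNextTokenResult(val token: String, val generatedTokens: Int)

// Minimal stand-in mimicking waitForNextToken(): emits each queued token,
// then signals completion with generatedTokens == 0.
class FakeLLM(private val tokens: List<String>) {
    private var index = 0
    fun waitForNextToken(): LLMNextTokenResult =
        if (index < tokens.size) LLMNextTokenResult(tokens[index], ++index)
        else LLMNextTokenResult("", 0)
}

// Drain the stream into a single string, skipping empty special tokens.
fun collectResponse(model: FakeLLM): String {
    val sb = StringBuilder()
    while (true) {
        val result = model.waitForNextToken()
        if (result.generatedTokens == 0) break
        if (result.token.isNotEmpty()) sb.append(result.token)
    }
    return sb.toString()
}

fun main() {
    val model = FakeLLM(listOf("On-device ", "", "AI ", "runs ", "locally."))
    println(collectResponse(model)) // prints "On-device AI runs locally."
}
```

Note that the empty-token check and the completion check are independent: an empty token mid-stream is skipped, while generatedTokens == 0 always ends the loop.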

cleanUp()

Cleans up the model's generation context and releases its resources. Required before starting a new conversation when using the DO_NOT_CLEAN_UP KV cache policy.

fun cleanUp()

When using LLMKVCacheCleanupPolicy.DO_NOT_CLEAN_UP, you must call cleanUp() before calling run() again. Failing to do so may cause unexpected behavior.
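With DO_NOT_CLEAN_UP, the KV cache survives across run() calls, so a fresh conversation requires an explicit cleanUp() first. The sketch below captures that call order with a small stand-in; ConversationalLLM is illustrative, not the SDK class:

```kotlin
// Stand-in tracking whether the KV cache still holds a previous conversation.
class ConversationalLLM {
    var cacheInUse = false
        private set

    fun run(prompt: String) {
        // Under DO_NOT_CLEAN_UP, running on top of a stale context is an error.
        check(!cacheInUse) { "call cleanUp() before starting a new conversation" }
        cacheInUse = true
    }

    fun cleanUp() {
        cacheInUse = false
    }
}

fun main() {
    val model = ConversationalLLM()
    model.run("First conversation")
    model.cleanUp()           // required before the next run()
    model.run("Second conversation")
    println(model.cacheInUse) // prints "true"
}
```

The stand-in fails fast on a missing cleanUp(); the real SDK only promises "unexpected behavior", which is harder to debug, so treating run()-after-run() as a programming error in your own wrapper is a reasonable defensive choice.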


Compatible Model Inputs

ZeticMLangeLLMModel accepts two types of model identifiers for the name parameter:

  1. Pre-built models from the Melange Dashboard: select a ready-to-use model and use its key.
  2. Hugging Face repository IDs: pass the full repo ID directly (e.g., google/gemma-3-4b-it, LiquidAI/LFM2.5-1.2B-Instruct).

The SDK currently supports public Hugging Face repositories with permissive open-source licenses. Private repository authentication is on the roadmap.


Full Working Example

import com.zeticai.mlange.core.model.llm.ZeticMLangeLLMModel

// (1) Initialize the model
val model = ZeticMLangeLLMModel(context, PERSONAL_KEY, "google/gemma-3-4b-it")

// (2) Start generation
model.run("Explain on-device AI in one paragraph.")

// (3) Stream tokens
val sb = StringBuilder()

while (true) {
    val waitResult = model.waitForNextToken()
    val token = waitResult.token
    val generatedTokens = waitResult.generatedTokens

    if (generatedTokens == 0) break

    if (token.isNotEmpty()) sb.append(token)
}

val output = sb.toString()

// (4) Clean up when done
model.cleanUp()

See Also