# ZeticMLangeLLMModel

API reference for running LLM inference on Android with ZeticMLangeLLMModel.

This page reflects ZeticMLange Android 1.7.0-beta.1.
ZeticMLangeLLMModel is the Android entry point for on-device LLM inference. The current API has two constructor families:

- Automatic selection by LLMModelMode
- Explicit selection by LLMTarget, LLMQuantType, and APType

Use the automatic constructor first. Use the explicit constructor when you need fixed GGUF selection or processor control.
## Package

com.zeticai.mlange.core.model.llm

## Import

```kotlin
import com.zeticai.mlange.core.model.llm.ZeticMLangeLLMModel
```

## Constructors

### Automatic Selection (Recommended)
This constructor selects the runtime and quantization automatically from model metadata.
```kotlin
ZeticMLangeLLMModel(
    context: Context,
    personalKey: String,
    name: String,
    version: Int? = null,
    modelMode: LLMModelMode = LLMModelMode.RUN_AUTO,
    dataSetType: LLMDataSetType? = null,
    onProgress: ((Float) -> Unit)? = null,
    onStatusChanged: ((ModelLoadingStatus) -> Unit)? = null,
    cacheHandlingPolicy: ModelCacheHandlingPolicy = ModelCacheHandlingPolicy.REMOVE_OVERLAPPING,
    initOption: LLMInitOption = LLMInitOption(),
)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| context | Context | - | Android context used for cache and file access. |
| personalKey | String | - | Personal key. See Personal Key. |
| name | String | - | Pre-built model key or Hugging Face repository ID. |
| version | Int? | null | Model version. null loads the latest version. |
| modelMode | LLMModelMode | RUN_AUTO | Automatic selection strategy. |
| dataSetType | LLMDataSetType? | null | Optional dataset hint for accuracy-oriented selection. |
| onProgress | ((Float) -> Unit)? | null | Download progress callback from 0.0 to 1.0. |
| onStatusChanged | ((ModelLoadingStatus) -> Unit)? | null | Loading status callback for asset-pack or download state changes. |
| cacheHandlingPolicy | ModelCacheHandlingPolicy | REMOVE_OVERLAPPING | Managed artifact cache policy. |
| initOption | LLMInitOption | LLMInitOption() | LLM initialization options such as KV-cache cleanup and requested context length. |
Detailed cacheHandlingPolicy behavior and ModelCacheManager usage are currently TBD. See Cache Management.
```kotlin
val model = ZeticMLangeLLMModel(
    context = context,
    personalKey = PERSONAL_KEY,
    name = "google/gemma-3-4b-it",
    modelMode = LLMModelMode.RUN_AUTO,
    initOption = LLMInitOption(
        kvCacheCleanupPolicy = LLMKVCacheCleanupPolicy.CLEAN_UP_ON_FULL,
        nCtx = 4096,
    ),
)
```

Automatic selection also accepts initOption. If you need to force GPU or NPU, switch to the explicit constructor below, because apType is not configurable in this path.
### Explicit Runtime Selection
Use this constructor when you want to choose the runtime family, GGUF quantization, and processor type directly.
```kotlin
ZeticMLangeLLMModel(
    context: Context,
    personalKey: String,
    name: String,
    version: Int? = null,
    target: LLMTarget,
    quantType: LLMQuantType,
    apType: APType = APType.CPU,
    onProgress: ((Float) -> Unit)? = null,
    onStatusChanged: ((ModelLoadingStatus) -> Unit)? = null,
    cacheHandlingPolicy: ModelCacheHandlingPolicy = ModelCacheHandlingPolicy.REMOVE_OVERLAPPING,
    initOption: LLMInitOption = LLMInitOption(),
)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| context | Context | - | Android context used for cache and file access. |
| personalKey | String | - | Personal key. See Personal Key. |
| name | String | - | Pre-built model key or Hugging Face repository ID. |
| version | Int? | null | Model version. null loads the latest version. |
| target | LLMTarget | - | Runtime family to load. Use LLMTarget.LLAMA_CPP. |
| quantType | LLMQuantType | - | GGUF quantization to load. |
| apType | APType | CPU | Processor type for the selected runtime. |
| onProgress | ((Float) -> Unit)? | null | Download progress callback from 0.0 to 1.0. |
| onStatusChanged | ((ModelLoadingStatus) -> Unit)? | null | Loading status callback. |
| cacheHandlingPolicy | ModelCacheHandlingPolicy | REMOVE_OVERLAPPING | Managed artifact cache policy. |
| initOption | LLMInitOption | LLMInitOption() | LLM initialization options such as KV-cache cleanup and requested context length. |
Detailed cacheHandlingPolicy behavior and ModelCacheManager usage are currently TBD. See Cache Management.
```kotlin
val model = ZeticMLangeLLMModel(
    context = context,
    personalKey = PERSONAL_KEY,
    name = "google/gemma-3-4b-it",
    target = LLMTarget.LLAMA_CPP,
    quantType = LLMQuantType.GGUF_QUANT_Q4_K_M,
    apType = APType.GPU,
    initOption = LLMInitOption(
        kvCacheCleanupPolicy = LLMKVCacheCleanupPolicy.DO_NOT_CLEAN_UP,
        nCtx = 4096,
    ),
)
```

## initOption
initOption now contains LLM runtime initialization settings.
```kotlin
data class LLMInitOption(
    val kvCacheCleanupPolicy: LLMKVCacheCleanupPolicy = LLMKVCacheCleanupPolicy.CLEAN_UP_ON_FULL,
    val nCtx: Int = 2048,
)
```

| Field | Type | Default | Description |
|---|---|---|---|
| kvCacheCleanupPolicy | LLMKVCacheCleanupPolicy | CLEAN_UP_ON_FULL | Conversation KV-cache policy. |
| nCtx | Int | 2048 | Requested context length. |
cacheHandlingPolicy and initOption.kvCacheCleanupPolicy are different settings. cacheHandlingPolicy controls downloaded model artifacts on disk. kvCacheCleanupPolicy controls the in-memory conversation KV cache during generation.
More detailed managed cache behavior is documented as TBD in Cache Management.
nCtx is a requested value, not an exact guarantee. The runtime can normalize it internally depending on the model, backend, or device.
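That normalization can be pictured with a small sketch. The clamp rule below is illustrative only; `resolveContextLength`, `modelMax`, and `backendMax` are hypothetical names, and the SDK's actual internal rule may differ.

```kotlin
// Illustrative only: a runtime may clamp a requested context length to what the
// model and backend actually support. Not the SDK's real normalization logic.
fun resolveContextLength(requested: Int, modelMax: Int, backendMax: Int): Int =
    requested.coerceIn(1, minOf(modelMax, backendMax))
```

Under this sketch, requesting nCtx = 4096 against a backend that caps at 2048 would effectively run with 2048.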
## apType Support
apType is relevant when you use the explicit constructor and choose target = LLMTarget.LLAMA_CPP.
| Device / runtime | Supported apType |
|---|---|
| Qualcomm Android + LLAMA_CPP | CPU, GPU, NPU |
| Other Android devices + LLAMA_CPP | CPU |
For current Android LLaMA.cpp usage, non-Qualcomm devices should use APType.CPU.
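A defensive app can encode that matrix before constructing the model. The helper below is hypothetical (not SDK API); it uses plain strings in place of APType and an `isQualcommSoc` flag whose detection is left to the app.

```kotlin
// Hypothetical fallback helper mirroring the support matrix above:
// GPU/NPU requests are downgraded to CPU on non-Qualcomm devices.
// Strings stand in for APType to keep the sketch self-contained.
fun effectiveApType(requested: String, isQualcommSoc: Boolean): String =
    if (isQualcommSoc || requested == "CPU") requested else "CPU"
```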
## Methods

### run(text)
Starts generation for a prompt.
```kotlin
fun run(text: String): LLMRunResult
```

| Parameter | Type | Description |
|---|---|---|
| text | String | Prompt text to start generation with. |
Returns: LLMRunResult
| Property | Type | Description |
|---|---|---|
| status | Int | Native status code. |
| promptTokens | Int | Number of prompt tokens consumed. |
### waitForNextToken()
Blocks until the next token is available.
```kotlin
fun waitForNextToken(): LLMNextTokenResult
```

Returns: LLMNextTokenResult
| Property | Type | Description |
|---|---|---|
| status | Int | Native status code. |
| token | String | Generated token text. |
| generatedTokens | Int | Number of generated tokens so far. 0 means generation is complete. |
### cleanUp()
Resets the current conversation state without destroying the model instance.
```kotlin
fun cleanUp()
```

If you use LLMKVCacheCleanupPolicy.DO_NOT_CLEAN_UP, call cleanUp() before starting the next conversation.
### deinit()
Fully releases the underlying target model.
```kotlin
fun deinit()
```

## Multimodal (Beta)
Multimodal embedding injection is Beta in ZeticMLange Android 1.7.0-beta.1. See LLM Inference: Multimodal for design background and the Audio Understanding tutorial for an end-to-end example.
These methods are supported only when the loaded target is the llama.cpp backend (i.e. implements ZeticMLangeMultimodalCapable). They throw with a clear message on other backends.
### runWithEmbeddings(embeddings)
Prefill the decoder with a flat embedding sequence (e.g. audio encoder output, or a chat template assembled by the SDK layer). Positions continue from the current KV-cache length, so this composes with prior run() / runWithEmbeddings() turns.
```kotlin
fun runWithEmbeddings(embeddings: FloatArray)
```

| Parameter | Type | Description |
|---|---|---|
| embeddings | FloatArray | Flat embedding buffer. Length must be a multiple of the model's embedding dimension; the SDK validates and rejects mismatched buffers. |
After this call returns, the embedding batch is queued. Drive token decode + sampling with waitForNextToken() as you would for run(text).
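The length contract can be restated as a small check. `embeddingTokenCount` is a hypothetical re-statement of the validation, not the SDK's code.

```kotlin
// Hypothetical validation sketch: a flat embedding buffer must have a length
// that is an exact multiple of the model's embedding dimension (n_embd), and
// the quotient is the number of positions it will prefill.
fun embeddingTokenCount(embeddings: FloatArray, nEmbd: Int): Int {
    require(nEmbd > 0) { "embedding dimension must be positive" }
    require(embeddings.size % nEmbd == 0) {
        "buffer length ${embeddings.size} is not a multiple of n_embd=$nEmbd"
    }
    return embeddings.size / nEmbd
}
```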
### tokenize(text, parseSpecial)
Tokenize text using the model's vocabulary. With parseSpecial = true, special tokens (e.g. <|audio_bos|>, <|im_start|>) in the input are recognized as single tokens rather than split by BPE.
```kotlin
fun tokenize(text: String, parseSpecial: Boolean): IntArray
```

| Parameter | Type | Description |
|---|---|---|
| text | String | Text to tokenize. |
| parseSpecial | Boolean | When true, recognize special-token literal forms in the input. |
Returns: IntArray of token ids. Empty on failure.
### tokenEmbeddings(tokenIds)
Look up per-token embedding vectors from the model's tok_embd tensor and return them concatenated into a flat [tokenIds.size * n_embd] buffer. Quantized rows are dequantized to float32.
```kotlin
fun tokenEmbeddings(tokenIds: IntArray): FloatArray
```

| Parameter | Type | Description |
|---|---|---|
| tokenIds | IntArray | Token ids to look up. |
Returns: FloatArray of length tokenIds.size * n_embd. Empty on failure.
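Reading one token's vector back out of that flat buffer is a matter of slicing. `tokenVector` below is a hypothetical convenience, not part of the SDK.

```kotlin
// Hypothetical slicing helper for the flat [tokenIds.size * n_embd] buffer
// returned by tokenEmbeddings(): token i occupies indices [i*nEmbd, (i+1)*nEmbd).
fun tokenVector(flat: FloatArray, index: Int, nEmbd: Int): FloatArray =
    flat.copyOfRange(index * nEmbd, (index + 1) * nEmbd)
```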
### specialTokenId(name)
Resolve a special token by its surface form (e.g. "<|audio_bos|>") to its vocabulary id.
```kotlin
fun specialTokenId(name: String): Int
```

| Parameter | Type | Description |
|---|---|---|
| name | String | Special token surface form. |
Returns: Token id, or -1 if the string does not resolve to a single special token in this model's vocab.
## Multimodal Helpers
The SDK ships supporting types in com.zeticai.mlange.core.model.multimodal:
| Symbol | Purpose |
|---|---|
| MultimodalProfile | Declares required special tokens for a multimodal model (e.g. MultimodalProfile.QWEN_OMNI_AUDIO). |
| ZeticMLangeLLMModel.validate(profile) | Init-time check that the loaded model carries every required token. Throws with a clear message naming missing markers. |
| QwenOmniAudioChatTemplate | Builds a flat audio-prompt embedding buffer ready for runWithEmbeddings. |
| ZeticMLangeMultimodalCapable | Capability interface implemented by llama.cpp targets that support these methods. Used internally to gate runWithEmbeddings and friends. |
See the Multimodal page for usage examples.
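For intuition, a chat-template builder ultimately concatenates flat embedding segments (marker-token embeddings, encoder output, trailing prompt-token embeddings) into one buffer for runWithEmbeddings. The sketch below shows only the concatenation step; QwenOmniAudioChatTemplate is the supported path, and `concatEmbeddings` is a hypothetical name.

```kotlin
// Hypothetical concatenation sketch: every segment must already share the same
// n_embd so the combined buffer stays a valid flat embedding sequence.
fun concatEmbeddings(vararg segments: FloatArray): FloatArray {
    val out = FloatArray(segments.sumOf { it.size })
    var offset = 0
    for (s in segments) {
        s.copyInto(out, offset)
        offset += s.size
    }
    return out
}
```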
## Full Examples

### Automatic Selection
```kotlin
import com.zeticai.mlange.core.model.llm.LLMInitOption
import com.zeticai.mlange.core.model.llm.LLMKVCacheCleanupPolicy
import com.zeticai.mlange.core.model.llm.LLMModelMode
import com.zeticai.mlange.core.model.llm.ZeticMLangeLLMModel

val model = ZeticMLangeLLMModel(
    context = context,
    personalKey = PERSONAL_KEY,
    name = "google/gemma-3-4b-it",
    modelMode = LLMModelMode.RUN_AUTO,
    initOption = LLMInitOption(
        kvCacheCleanupPolicy = LLMKVCacheCleanupPolicy.CLEAN_UP_ON_FULL,
        nCtx = 4096,
    ),
)

model.run("Explain on-device AI in one paragraph.")

val sb = StringBuilder()
while (true) {
    val result = model.waitForNextToken()
    if (result.generatedTokens == 0) break
    if (result.token.isNotEmpty()) sb.append(result.token)
}
val output = sb.toString()

model.cleanUp()
model.deinit()
```

### Explicit Qualcomm Selection
```kotlin
import com.zeticai.mlange.core.model.APType
import com.zeticai.mlange.core.model.llm.LLMInitOption
import com.zeticai.mlange.core.model.llm.LLMKVCacheCleanupPolicy
import com.zeticai.mlange.core.model.llm.LLMQuantType
import com.zeticai.mlange.core.model.llm.LLMTarget
import com.zeticai.mlange.core.model.llm.ZeticMLangeLLMModel

val model = ZeticMLangeLLMModel(
    context = context,
    personalKey = PERSONAL_KEY,
    name = "changgeun/tiny-llama",
    target = LLMTarget.LLAMA_CPP,
    quantType = LLMQuantType.GGUF_QUANT_Q4_K_M,
    apType = APType.NPU,
    initOption = LLMInitOption(
        kvCacheCleanupPolicy = LLMKVCacheCleanupPolicy.DO_NOT_CLEAN_UP,
        nCtx = 4096,
    ),
)
```

## See Also
- ZeticMLangeLLMModel (iOS): iOS equivalent
- LLM Inference Overview: Recommended initialization paths
- Streaming Token Generation: Streaming patterns and cleanup
- Enums and Constants: LLM enums and config types