
ZeticMLangeModel

Complete API reference for the ZeticMLangeModel class on Android.

This page reflects ZeticMLange Android 1.6.1.

The ZeticMLangeModel class is the primary interface for running on-device AI inference on Android. It handles model downloading, NPU context initialization, and hardware-accelerated execution through a single unified API.

Package

com.zeticai.mlange.core.model

Import

import com.zeticai.mlange.core.model.ZeticMLangeModel

Constructors

ZeticMLangeModel exposes three constructors. Use the default (automatic) constructor unless you need to pin the runtime backend or processor:

  • Default (Automatic Selection) — selects the optimal runtime via ModelMode. Available on all tiers.
  • Explicit Target — pins the runtime backend. Requires Lite tier or higher.
  • Explicit Target + APType — pins both the runtime backend and the application processor. Requires Lite tier or higher.

Default (Automatic Selection)

Creates a new model instance using automatic runtime selection. All parameters after name have defaults, so the shortest call takes only context, personalKey, and name.

@JvmOverloads
ZeticMLangeModel(
    context: Context,
    personalKey: String,
    name: String,
    version: Int? = null,
    modelMode: ModelMode = ModelMode.RUN_AUTO,
    quantType: QuantType? = null,
    onProgress: ((Float) -> Unit)? = null,
    onStatusChanged: ((ModelLoadingStatus) -> Unit)? = null,
    cacheHandlingPolicy: ModelCacheHandlingPolicy = ModelCacheHandlingPolicy.REMOVE_OVERLAPPING,
)
Parameters:

  • context (Context): Android application or activity context. Used for file storage and native library loading.
  • personalKey (String): Your personal authentication key. See Personal Key.
  • name (String): Full model identifier in account_name/project_name format (e.g., "Steve/YOLOv11_comparison").
  • version (Int?, default null): Specific model version to load. null uses the latest.
  • modelMode (ModelMode, default RUN_AUTO): Inference strategy used by automatic selection. See Enums.
  • quantType (QuantType?, default null): Quantization precision filter (FP32, FP16, INT). When set, only targets matching this precision are considered during automatic selection; null disables precision-based filtering. See Enums → QuantType.
  • onProgress (((Float) -> Unit)?, default null): Optional download progress callback from 0.0 to 1.0.
  • onStatusChanged (((ModelLoadingStatus) -> Unit)?, default null): Loading status callback (download, extraction, etc.).
  • cacheHandlingPolicy (ModelCacheHandlingPolicy, default REMOVE_OVERLAPPING): Managed artifact cache policy. See Cache Management.

Throws: ZeticMLangeException if name is not in account_name/project_name format, or a runtime error if the model cannot be downloaded or initialized.

// Shortest call
val model = ZeticMLangeModel(this, PERSONAL_KEY, "Steve/YOLOv11_comparison")

// With optional parameters
val model = ZeticMLangeModel(
    context = this,
    personalKey = PERSONAL_KEY,
    name = "Steve/YOLOv11_comparison",
    modelMode = ModelMode.RUN_AUTO,
    quantType = QuantType.FP16,   // only consider FP16 targets during automatic selection
    onProgress = { progress -> Log.d("Melange", "downloading: $progress") },
)

Free-tier NPU availability. Under the Free tier, NPU execution is only permitted on Galaxy S25 and S25 Ultra. Other devices automatically fall back to CPU/GPU. Upgrade to Lite tier or higher to use explicit Target / APType selection on all supported devices.

The constructor performs a network call on first use to download the model binary. Call it from a background thread. The binary is cached locally after the first download.
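The background-thread requirement can be sketched with Kotlin coroutines. This is a minimal sketch, not part of the library's API: it assumes your project depends on kotlinx-coroutines and AndroidX lifecycle-runtime-ktx (for lifecycleScope), and onModelReady is a hypothetical callback in your own Activity:

import androidx.lifecycle.lifecycleScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch
import kotlinx.coroutines.withContext

// Inside an Activity:
lifecycleScope.launch {
    // Construct (and, on first use, download) off the main thread
    val model = withContext(Dispatchers.IO) {
        ZeticMLangeModel(this@MainActivity, PERSONAL_KEY, "Steve/YOLOv11_comparison")
    }
    // Back on the main thread: safe to wire the model into UI state
    onModelReady(model)
}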


Explicit Target

Pins the runtime backend (e.g., QNN, LiteRT, TFLite, Exynos) instead of relying on ModelMode's automatic pick.

@JvmOverloads
ZeticMLangeModel(
    context: Context,
    personalKey: String,
    name: String,
    version: Int? = null,
    target: Target,
    onProgress: ((Float) -> Unit)? = null,
    onStatusChanged: ((ModelLoadingStatus) -> Unit)? = null,
    cacheHandlingPolicy: ModelCacheHandlingPolicy = ModelCacheHandlingPolicy.REMOVE_OVERLAPPING,
)
Parameters:

  • target (Target): Runtime target (e.g., Target.ZETIC_MLANGE_TARGET_QNN, Target.ZETIC_MLANGE_TARGET_LITERT_FP16).

Other parameters match the default constructor above.

val model = ZeticMLangeModel(
    context = this,
    personalKey = PERSONAL_KEY,
    name = "Steve/YOLOv11_comparison",
    target = Target.ZETIC_MLANGE_TARGET_QNN,
)

Requires a Lite tier or higher subscription. Free-tier keys cannot use explicit Target selection.


Explicit Target + APType

Pins both the runtime backend and the application processor (CPU/GPU/NPU).

@JvmOverloads
ZeticMLangeModel(
    context: Context,
    personalKey: String,
    name: String,
    version: Int? = null,
    target: Target,
    apType: APType,
    onProgress: ((Float) -> Unit)? = null,
    onStatusChanged: ((ModelLoadingStatus) -> Unit)? = null,
    cacheHandlingPolicy: ModelCacheHandlingPolicy = ModelCacheHandlingPolicy.REMOVE_OVERLAPPING,
)
Parameters:

  • target (Target): Runtime target.
  • apType (APType): Application processor: CPU, GPU, or NPU.

Other parameters match the default constructor above.

val model = ZeticMLangeModel(
    context = this,
    personalKey = PERSONAL_KEY,
    name = "Steve/YOLOv11_comparison",
    target = Target.ZETIC_MLANGE_TARGET_QNN,
    apType = APType.NPU,
)

Requires a Lite tier or higher subscription.


Methods

run(inputs)

Executes inference on the loaded model using the provided input tensors.

fun run(inputs: Array<Tensor> = emptyArray()): Array<Tensor>
Parameters:

  • inputs (Array<Tensor>, optional): Input tensors matching the model's expected shapes and data types. When provided, each tensor's bytes are memcpy'd into the model's own getInputBufferAt(i) before running. When empty, inference runs against whatever is already loaded in the model-owned input buffers (useful in combination with getInputBuffers()).

Returns: Array<Tensor>: The model's output tensors. These wrap model-owned buffers that are reused across calls — do not hold them past the next run() if you need a snapshot.

Throws: RuntimeException if input shapes do not match the model's expected inputs, or if inference execution fails.

val outputs = model.run(inputs)

run(inputs) and run() both execute against getInputBufferAt(i) — the only difference is the byte copy. For hot loops, prefer getInputBuffers() + run() without arguments to skip that copy entirely.


getInputBuffers()

Returns the model's internal input tensors so you can fill them in place instead of allocating new tensors for every inference call.

fun getInputBuffers(): Array<Tensor>

Returns: Array<Tensor>: Tensors wrapping the model-owned input buffers. Their lifetime is tied to the model — they are valid until close() is called.

val buffers = model.getInputBuffers()
buffers[0].from(sourceTensor) // copies source bytes into the model's input buffer
val outputs = model.run()     // no inputs argument needed — buffers are already filled

getInputBuffers() is the allocation-free path for hot loops (e.g. per-frame camera inference). Fill the returned tensors directly and call run() with no arguments.


close()

Releases the underlying native model and its input/output buffers.

override fun close()

After close(), any tensor returned from getInputBuffers() or a prior run() call is invalid. ZeticMLangeModel implements Closeable, so you can use it with use { ... }.
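Because ZeticMLangeModel implements Closeable, a one-shot inference can lean on Kotlin's standard use { } to guarantee cleanup. A minimal sketch; process is a hypothetical consumer of your outputs:

// use { } calls close() even if run() throws
ZeticMLangeModel(this, PERSONAL_KEY, "Steve/YOLOv11_comparison").use { model ->
    val outputs = model.run(inputs)
    // Consume outputs inside the block: they wrap model-owned
    // buffers and become invalid once use { } calls close()
    process(outputs)
}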


Memory Management Recommendation

Tensors returned from getInputBuffers() and the output of run() wrap model-owned direct buffers. Prefer using those over allocating your own tensors:

  • Model-owned (preferred): fill inputs via getInputBuffers()[i].from(...) / .copy(...), read outputs directly from run()'s return. No extra off-heap allocation per frame; the buffers are released once when close() runs. Leave these tensors with the default immediateRelease = false.
  • User-allocated: if you build inputs with Tensor.of(...) (e.g. converting a FloatArray into a new tensor each frame) and pass them to run(), the model copies your bytes into its own input buffer and your tensor is no longer needed. In that case set immediateRelease = true so TensorCleaner reclaims the off-heap memory promptly.
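The two ownership cases above can be sketched side by side. The exact way immediateRelease is passed when constructing a tensor is an assumption here; check the Tensor reference for the real signature:

// Case 1: user-allocated input. The model copies these bytes on run(),
// so the tensor is disposable; immediateRelease = true lets TensorCleaner
// reclaim its off-heap memory promptly. (Passing the flag through
// Tensor.of is assumed; see the Tensor reference for the exact API.)
val input = Tensor.of(frameFloats, immediateRelease = true)
val outputs = model.run(arrayOf(input))

// Case 2: model-owned buffers. Leave immediateRelease at its
// default (false); never enable it on these shared tensors.
val buffers = model.getInputBuffers()
buffers[0].from(sourceTensor)
val reusedOutputs = model.run()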

Never set immediateRelease = true on tensors obtained from getInputBuffers() or run(). Those tensors share memory with the model; if the cleaner fires, the model's next run() will read from already-freed native memory and crash.

See Tensor — Memory Management Recommendation for the full discussion.


Full Working Example

import com.zeticai.mlange.core.model.ZeticMLangeModel
import com.zeticai.mlange.core.tensor.Tensor

class MainActivity : AppCompatActivity() {
    private lateinit var model: ZeticMLangeModel
    private lateinit var inputBuffers: Array<Tensor>

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)

        // (1) Load model
        // Downloads the optimized binary on first run, then caches locally.
        // Shown inline for brevity; in production, construct on a
        // background thread as noted above.
        model = ZeticMLangeModel(this, PERSONAL_KEY, "Steve/YOLOv11_comparison")

        // (2) Grab the model-owned input buffers once — reuse for every frame
        inputBuffers = model.getInputBuffers()
    }

    private fun runFrame(sourceTensors: Array<Tensor>) {
        // (3) Fill the model's input buffers in place (no extra allocation)
        for (i in inputBuffers.indices) {
            inputBuffers[i].from(sourceTensors[i])
        }

        // (4) Run inference — no need to pass inputs again
        val outputs = model.run()

        // (5) Outputs wrap model-owned buffers, reused across calls
        for (output in outputs) {
            // Process each output tensor before the next run()
        }
    }
}

Always ensure your input tensor shapes exactly match what the model expects. A shape mismatch will throw a RuntimeException. Check the model's input specification on the Melange Dashboard.
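Since a shape mismatch surfaces as a RuntimeException at run() time, a defensive call site can catch it rather than crash. A minimal sketch:

// Keep one bad frame from taking down the app
val outputs = try {
    model.run(inputs)
} catch (e: RuntimeException) {
    Log.e("Melange", "inference failed; check input shapes on the Dashboard", e)
    return
}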


Gradle Setup

The ZeticMLangeModel class requires the Melange AAR dependency. Add the following to your app-level build.gradle:

android {
    ...
    packagingOptions {
        jniLibs {
            useLegacyPackaging true
        }
    }
}

dependencies {
    implementation("com.zeticai.mlange:mlange:1.6.1+")
}

The useLegacyPackaging true setting is required. Without it, the native JNI libraries for NPU acceleration will not load correctly. See Common Errors for details.

