ZeticMLangeModel
Complete API reference for the ZeticMLangeModel class on Android.
This page reflects ZeticMLange Android 1.6.1.
The ZeticMLangeModel class is the primary interface for running on-device AI inference on Android. It handles model downloading, NPU context initialization, and hardware-accelerated execution through a single unified API.
Package
com.zeticai.mlange.core.model
Import
import com.zeticai.mlange.core.model.ZeticMLangeModel
Constructors
ZeticMLangeModel exposes three constructors. Use the default (automatic) constructor unless you need to pin the runtime backend or processor:
- Default (Automatic Selection) — selects the optimal runtime via ModelMode. Available on all tiers.
- Explicit Target — pins the runtime backend. Requires Lite tier or higher.
- Explicit Target + APType — pins both the runtime backend and the application processor. Requires Lite tier or higher.
Default (Automatic Selection)
Creates a new model instance using automatic runtime selection. All parameters after name have defaults, so the shortest call takes only context, personalKey, and name.
@JvmOverloads
ZeticMLangeModel(
context: Context,
personalKey: String,
name: String,
version: Int? = null,
modelMode: ModelMode = ModelMode.RUN_AUTO,
quantType: QuantType? = null,
onProgress: ((Float) -> Unit)? = null,
onStatusChanged: ((ModelLoadingStatus) -> Unit)? = null,
cacheHandlingPolicy: ModelCacheHandlingPolicy = ModelCacheHandlingPolicy.REMOVE_OVERLAPPING,
)
| Parameter | Type | Default | Description |
|---|---|---|---|
context | Context | — | Android application or activity context. Used for file storage and native library loading. |
personalKey | String | — | Your personal authentication key. See Personal Key. |
name | String | — | Full model identifier in account_name/project_name format (e.g., "Steve/YOLOv11_comparison"). |
version | Int? | null | Specific model version to load. null uses the latest. |
modelMode | ModelMode | RUN_AUTO | Inference strategy used by automatic selection. See Enums. |
quantType | QuantType? | null | Quantization precision filter (FP32, FP16, INT). When set, only targets matching this precision are considered during automatic selection. null disables precision-based filtering. See Enums → QuantType. |
onProgress | ((Float) -> Unit)? | null | Optional download progress callback from 0.0 to 1.0. |
onStatusChanged | ((ModelLoadingStatus) -> Unit)? | null | Loading status callback (download, extraction, etc.). |
cacheHandlingPolicy | ModelCacheHandlingPolicy | REMOVE_OVERLAPPING | Managed artifact cache policy. See Cache Management. |
Throws: ZeticMLangeException if name is not in account_name/project_name format, or a runtime error if the model cannot be downloaded or initialized.
val model = ZeticMLangeModel(this, PERSONAL_KEY, "Steve/YOLOv11_comparison")
val model = ZeticMLangeModel(
context = this,
personalKey = PERSONAL_KEY,
name = "Steve/YOLOv11_comparison",
modelMode = ModelMode.RUN_AUTO,
quantType = QuantType.FP16, // only consider FP16 targets during automatic selection
onProgress = { progress -> Log.d("Melange", "downloading: $progress") },
)
Free-tier NPU availability. Under the Free tier, NPU execution is only permitted on Galaxy S25 and S25 Ultra. Other devices automatically fall back to CPU/GPU. Upgrade to Lite tier or higher to use explicit Target / APType selection on all supported devices.
The constructor performs a network call on first use to download the model binary. Call it from a background thread. The binary is cached locally after the first download.
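One way to keep the download off the main thread is a coroutine on the IO dispatcher. A minimal sketch, assuming kotlinx.coroutines and AndroidX lifecycle are on the classpath (PERSONAL_KEY and the model name are the placeholders used throughout this page):

```kotlin
// Sketch: construct the model on a background dispatcher so the first-run
// download does not block the UI thread. The callbacks and identifiers are
// the same ones shown in the constructor examples above.
lifecycleScope.launch(Dispatchers.IO) {
    val model = ZeticMLangeModel(
        context = this@MainActivity,
        personalKey = PERSONAL_KEY,
        name = "Steve/YOLOv11_comparison",
        onProgress = { progress -> Log.d("Melange", "downloading: $progress") },
    )
    withContext(Dispatchers.Main) {
        // Model is initialized; safe to update UI state here
    }
}
```

On subsequent launches the cached binary is used, but constructing off the main thread is still advisable because NPU context initialization is not free.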
Explicit Target
Pins the runtime backend (e.g., QNN, LiteRT, TFLite, Exynos) instead of relying on ModelMode's automatic pick.
@JvmOverloads
ZeticMLangeModel(
context: Context,
personalKey: String,
name: String,
version: Int? = null,
target: Target,
onProgress: ((Float) -> Unit)? = null,
onStatusChanged: ((ModelLoadingStatus) -> Unit)? = null,
cacheHandlingPolicy: ModelCacheHandlingPolicy = ModelCacheHandlingPolicy.REMOVE_OVERLAPPING,
)
| Parameter | Type | Default | Description |
|---|---|---|---|
target | Target | — | Runtime target (e.g., Target.ZETIC_MLANGE_TARGET_QNN, Target.ZETIC_MLANGE_TARGET_LITERT_FP16). |
Other parameters match the default constructor above.
val model = ZeticMLangeModel(
context = this,
personalKey = PERSONAL_KEY,
name = "Steve/YOLOv11_comparison",
target = Target.ZETIC_MLANGE_TARGET_QNN,
)
Requires a Lite tier or higher subscription. Free-tier keys cannot use explicit Target selection.
Explicit Target + APType
Pins both the runtime backend and the application processor (CPU/GPU/NPU).
@JvmOverloads
ZeticMLangeModel(
context: Context,
personalKey: String,
name: String,
version: Int? = null,
target: Target,
apType: APType,
onProgress: ((Float) -> Unit)? = null,
onStatusChanged: ((ModelLoadingStatus) -> Unit)? = null,
cacheHandlingPolicy: ModelCacheHandlingPolicy = ModelCacheHandlingPolicy.REMOVE_OVERLAPPING,
)
| Parameter | Type | Default | Description |
|---|---|---|---|
target | Target | — | Runtime target. |
apType | APType | — | Application processor: CPU, GPU, or NPU. |
Other parameters match the default constructor above.
val model = ZeticMLangeModel(
context = this,
personalKey = PERSONAL_KEY,
name = "Steve/YOLOv11_comparison",
target = Target.ZETIC_MLANGE_TARGET_QNN,
apType = APType.NPU,
)
Requires a Lite tier or higher subscription.
Methods
run(inputs)
Executes inference on the loaded model using the provided input tensors.
fun run(inputs: Array<Tensor> = emptyArray()): Array<Tensor>
| Parameter | Type | Description |
|---|---|---|
inputs | Array<Tensor> | (Optional) Input tensors matching the model's expected shapes and data types. When provided, each tensor's bytes are memcpy'd into the model's own getInputBufferAt(i) before running. When empty, inference runs against whatever is already loaded in the model-owned input buffers (useful in combination with getInputBuffers()). |
Returns: Array<Tensor>: The model's output tensors. These wrap model-owned buffers that are reused across calls — do not hold them past the next run() if you need a snapshot.
Throws: RuntimeException if input shapes do not match the model's expected inputs, or if inference execution fails.
val outputs = model.run(inputs)
run(inputs) and run() both execute against getInputBufferAt(i) — the only difference is the byte copy. For hot loops, prefer getInputBuffers() + run() without arguments to skip that copy entirely.
getInputBuffers()
Returns the model's internal input tensors so you can fill them in place instead of allocating new tensors for every inference call.
fun getInputBuffers(): Array<Tensor>
Returns: Array<Tensor>: Tensors wrapping the model-owned input buffers. Their lifetime is tied to the model — they are valid until close() is called.
val buffers = model.getInputBuffers()
buffers[0].from(sourceTensor) // copies source bytes into the model's input buffer
val outputs = model.run() // no inputs argument needed — buffers are already filled
getInputBuffers() is the allocation-free path for hot loops (e.g. per-frame camera inference). Fill the returned tensors directly and call run() with no arguments.
close()
Releases the underlying native model and its input/output buffers.
override fun close()
After close(), any tensor returned from getInputBuffers() or a prior run() call is invalid. ZeticMLangeModel implements Closeable, so you can use it with use { ... }.
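For one-shot inference, scoping the model with use guarantees release even if run() throws. A sketch, using the same placeholder key and model name as the examples above:

```kotlin
// Sketch: because ZeticMLangeModel implements Closeable, `use` calls close()
// when the block exits — normally or via an exception.
ZeticMLangeModel(this, PERSONAL_KEY, "Steve/YOLOv11_comparison").use { model ->
    val outputs = model.run(inputs)
    // Consume outputs inside this block: they wrap model-owned buffers
    // and become invalid as soon as use{} calls close()
}
```

For long-lived models (e.g. per-frame inference in an Activity), prefer holding the instance and calling close() explicitly in onDestroy() instead.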
Memory Management Recommendation
Tensors returned from getInputBuffers() and the output of run() wrap model-owned direct buffers. Prefer using those over allocating your own tensors:
- Model-owned (preferred): fill inputs via getInputBuffers()[i].from(...) / .copy(...), and read outputs directly from run()'s return. No extra off-heap allocation per frame; the buffers are released once when close() runs. Leave these tensors with the default immediateRelease = false.
- User-allocated: if you build inputs with Tensor.of(...) (e.g. converting a FloatArray into a new tensor each frame) and pass them to run(), the model copies your bytes into its own input buffer and your tensor is no longer needed. In that case, set immediateRelease = true so TensorCleaner reclaims the off-heap memory promptly.
Never set immediateRelease = true on tensors obtained from getInputBuffers() or run(). Those tensors share memory with the model; if the cleaner fires, the model's next run() will read from already-freed native memory and crash.
See Tensor — Memory Management Recommendation for the full discussion.
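The user-allocated path above can be sketched as follows. Treat the exact shape of Tensor.of(...) and the immediateRelease property as assumptions taken from the description here; see the Tensor page for the authoritative API:

```kotlin
// Sketch: a short-lived, user-allocated input tensor (signatures assumed).
val input = Tensor.of(floatArray).apply { immediateRelease = true }
val outputs = model.run(arrayOf(input))
// run() has copied the bytes into the model's own input buffer, so `input`
// is no longer needed and TensorCleaner may reclaim its off-heap memory.
// Never set immediateRelease = true on tensors from getInputBuffers() or
// run() — those share memory with the model.
```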
Full Working Example
import com.zeticai.mlange.core.model.ZeticMLangeModel
import com.zeticai.mlange.core.tensor.Tensor
class MainActivity : AppCompatActivity() {
private lateinit var model: ZeticMLangeModel
private lateinit var inputBuffers: Array<Tensor>
override fun onCreate(savedInstanceState: Bundle?) {
super.onCreate(savedInstanceState)
        // (1) Load model
        // Downloads the optimized binary on first run, then caches locally.
        // In production, construct the model off the main thread (see the note above).
        model = ZeticMLangeModel(this, PERSONAL_KEY, "Steve/YOLOv11_comparison")
// (2) Grab the model-owned input buffers once — reuse for every frame
inputBuffers = model.getInputBuffers()
}
private fun runFrame(sourceTensors: Array<Tensor>) {
// (3) Fill the model's input buffers in place (no extra allocation)
for (i in inputBuffers.indices) {
inputBuffers[i].from(sourceTensors[i])
}
// (4) Run inference — no need to pass inputs again
val outputs = model.run()
// (5) Outputs wrap model-owned buffers, reused across calls
for (output in outputs) {
// Process each output tensor before the next run()
}
}
}
Always ensure your input tensor shapes exactly match what the model expects. A shape mismatch will throw a RuntimeException. Check the model's input specification on the Melange Dashboard.
Gradle Setup
The ZeticMLangeModel class requires the Melange AAR dependency. Add the following to your app-level build.gradle:
android {
...
packagingOptions {
jniLibs {
useLegacyPackaging true
}
}
}
dependencies {
implementation("com.zeticai.mlange:mlange:1.6.1+")
}
The useLegacyPackaging true setting is required. Without it, the native JNI libraries for NPU acceleration will not load correctly. See Common Errors for details.
See Also
- Android Integration Guide: Step-by-step setup guide
- ZeticMLangeModel (iOS): iOS equivalent
- Common Errors: Troubleshooting guide