
Basic Inference

Run your first AI model inference on Android with ZETIC Melange.

This guide shows how to run inference on Android after completing the SDK setup.

Prerequisites

Complete the Android SDK setup before running inference.

Running Inference

Kotlin

// (1) Load model
// This handles model download (if needed) and NPU context creation
val model = ZeticMLangeModel(CONTEXT, PERSONAL_KEY, MODEL_NAME)

// (2) Prepare model inputs
// Ensure input shapes match your model's requirements (e.g., Float32 arrays)
val inputs: Array<Tensor> = ... // Prepare your inputs

// (3) Run Inference
// Executes the hardware-accelerated graph. This is a blocking call.
val outputs = model.run(inputs)

Java

// (1) Load model
// This handles model download (if needed) and NPU context creation
ZeticMLangeModel model = new ZeticMLangeModel(CONTEXT, PERSONAL_KEY, MODEL_NAME);

// (2) Prepare model inputs
// Ensure input shapes match your model's requirements (e.g., Float32 arrays)
Tensor[] inputs = ...; // Prepare your inputs

// (3) Run Inference
// Executes the hardware-accelerated graph. This is a blocking call.
Tensor[] outputs = model.run(inputs);
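Step (2), preparing inputs, is model-specific. As an illustrative sketch only (the NCHW layout, the `packNchw` helper, and the 0–1 normalization below are assumptions, not part of the Melange API), packing ARGB pixel data into a Float32 array for an image model might look like this:

```java
// Sketch: pack ARGB pixels into a normalized Float32 NCHW buffer.
// The NCHW layout and /255 normalization are assumptions; check your
// model's input specification on the Melange Dashboard.
public class InputPacker {
    public static float[] packNchw(int[] argbPixels, int width, int height) {
        int plane = height * width;
        float[] out = new float[3 * plane];
        for (int i = 0; i < plane; i++) {
            int p = argbPixels[i];
            out[i]             = ((p >> 16) & 0xFF) / 255.0f; // R plane
            out[plane + i]     = ((p >> 8) & 0xFF) / 255.0f;  // G plane
            out[2 * plane + i] = (p & 0xFF) / 255.0f;         // B plane
        }
        return out;
    }
}
```

On Android, the `int[]` of pixels would typically come from `Bitmap.getPixels`; the packed array is then wrapped into the tensor type your model expects.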

Understanding the Flow

  1. Model Download: On first use, the SDK downloads the pre-compiled, hardware-optimized model binary from the Melange CDN. This binary is specific to your device's NPU chipset.
  2. NPU Context Creation: Melange initializes the appropriate hardware accelerator (Qualcomm HTP, MediaTek APU, Samsung DSP) and loads the model into NPU memory using zero-copy memory mapping.
  3. Inference Execution: Your input tensor is processed through the NPU-accelerated computation graph, and the output tensor is returned. No data leaves the device.

Always ensure your input tensor shapes exactly match what the model expects. A shape mismatch will throw a RuntimeException. Check the model's input specification on the Melange Dashboard.
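A cheap defensive check before calling run() can turn a runtime shape error into an actionable message. A minimal sketch, assuming you have the expected shape from the Melange Dashboard (the `ShapeGuard` helper below is illustrative, not part of the Melange API):

```java
import java.util.Arrays;

// Illustrative guard: compare an input's shape against the expected
// shape from the model's input specification before running inference.
public class ShapeGuard {
    public static void requireShape(int[] expected, int[] actual) {
        if (!Arrays.equals(expected, actual)) {
            throw new IllegalArgumentException(
                "Input shape " + Arrays.toString(actual)
                + " does not match expected " + Arrays.toString(expected));
        }
    }
}
```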

Zero-Copy Input Path

For hot loops (e.g. per-frame camera inference), skip the per-call byte copy by filling the model's own input buffers in place:

val inputBuffers = model.getInputBuffers()

for (i in inputBuffers.indices) {
    inputBuffers[i].from(sourceTensors[i])
}

val outputs = model.run()

The tensors returned from run() already wrap model-owned output buffers — read them directly for zero-copy post-processing, but consume them before the next run() call since the buffers are reused. See getInputBuffers() for details.
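Because the output buffers are reused, anything you need to keep past the next run() must be copied first. The stand-in class below mimics that reuse pattern in plain Java to show why a deferred read observes the newer call's data; nothing in it is the Melange API:

```java
import java.util.Arrays;

// Stand-in for a model that reuses one output buffer across run() calls,
// mimicking the buffer-reuse behavior described above.
public class ReusedBufferDemo {
    private final float[] output = new float[4];

    public float[] run(float fillValue) {
        Arrays.fill(output, fillValue); // overwrite in place, like a reused buffer
        return output;                  // caller receives the live buffer, not a copy
    }
}
```

The practical rule: read or clone the returned tensor's data before invoking run() again; a reference held across calls will silently see the next inference's results.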

Sample Application

Please refer to the ZETIC Melange Apps repository for complete sample applications and more details.
