
Basic Inference

Run your first AI model inference on Android with ZETIC Melange.

This guide shows how to run inference on Android after completing the SDK setup.

Prerequisites

Complete the Android SDK setup before running inference.

Running Inference

Kotlin

// (1) Load model
// This handles model download (if needed) and NPU context creation
val model = ZeticMLangeModel(CONTEXT, PERSONAL_KEY, MODEL_NAME)

// (2) Prepare model inputs
// Ensure input shapes match your model's requirements (e.g., Float32 arrays)
val inputs: Array<Tensor> = ... // Prepare your inputs

// (3) Run Inference
// Executes the hardware-accelerated graph. This is a blocking call.
val outputs = model.run(inputs)

Java

// (1) Load model
// This handles model download (if needed) and NPU context creation
ZeticMLangeModel model = new ZeticMLangeModel(CONTEXT, PERSONAL_KEY, MODEL_NAME);

// (2) Prepare model inputs
// Ensure input shapes match your model's requirements (e.g., Float32 arrays)
Tensor[] inputs = ...; // Prepare your inputs

// (3) Run Inference
// Executes the hardware-accelerated graph. This is a blocking call.
Tensor[] outputs = model.run(inputs);
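Step (2), preparing inputs, is model-specific. As an illustrative sketch only (the NCHW layout, the `packNchw` helper, and the 0–1 normalization below are assumptions, not part of the Melange API), packing ARGB pixel data into a Float32 array for an image model might look like this:

```java
// Sketch: pack ARGB pixels into a normalized Float32 NCHW buffer.
// The NCHW layout and /255 normalization are assumptions; check your
// model's input specification on the Melange Dashboard.
public class InputPacker {
    public static float[] packNchw(int[] argbPixels, int width, int height) {
        int plane = height * width;
        float[] out = new float[3 * plane];
        for (int i = 0; i < plane; i++) {
            int p = argbPixels[i];
            out[i]             = ((p >> 16) & 0xFF) / 255.0f; // R plane
            out[plane + i]     = ((p >> 8) & 0xFF) / 255.0f;  // G plane
            out[2 * plane + i] = (p & 0xFF) / 255.0f;         // B plane
        }
        return out;
    }
}
```

On Android, the `int[]` of pixels would typically come from `Bitmap.getPixels`; the packed array is then wrapped into the tensor type your model expects.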

Understanding the Flow

  1. Model Download: On first use, the SDK downloads the pre-compiled, hardware-optimized model binary from the Melange CDN. This binary is specific to your device's NPU chipset.
  2. NPU Context Creation: Melange initializes the appropriate hardware accelerator (Qualcomm HTP, MediaTek APU, Samsung DSP) and loads the model into NPU memory using zero-copy memory mapping.
  3. Inference Execution: Your input tensor is processed through the NPU-accelerated computation graph, and the output tensor is returned. No data leaves the device.

Always ensure your input tensor shapes exactly match what the model expects. A shape mismatch will throw a RuntimeException. Check the model's input specification on the Melange Dashboard.
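A cheap defensive check before calling run() can turn a runtime shape error into an actionable message. A minimal sketch, assuming you have the expected shape from the Melange Dashboard (the `ShapeGuard` helper below is illustrative, not part of the Melange API):

```java
import java.util.Arrays;

// Illustrative guard: compare an input's shape against the expected
// shape from the model's input specification before running inference.
public class ShapeGuard {
    public static void requireShape(int[] expected, int[] actual) {
        if (!Arrays.equals(expected, actual)) {
            throw new IllegalArgumentException(
                "Input shape " + Arrays.toString(actual)
                + " does not match expected " + Arrays.toString(expected));
        }
    }
}
```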

Zero-Copy Input Path

For hot loops (e.g. per-frame camera inference), skip the per-call byte copy by filling the model's own input buffers in place:

val inputBuffers = model.getInputBuffers()

for (i in inputBuffers.indices) {
    inputBuffers[i].from(sourceTensors[i])
}

val outputs = model.run()

The tensors returned from run() already wrap model-owned output buffers — read them directly for zero-copy post-processing, but consume them before the next run() call since the buffers are reused. See getInputBuffers() for details.
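Because the output buffers are reused, anything you need to keep past the next run() must be copied first. The stand-in class below mimics that reuse pattern in plain Java to show why a deferred read observes the newer call's data; nothing in it is the Melange API:

```java
import java.util.Arrays;

// Stand-in for a model that reuses one output buffer across run() calls,
// mimicking the buffer-reuse behavior described above.
public class ReusedBufferDemo {
    private final float[] output = new float[4];

    public float[] run(float fillValue) {
        Arrays.fill(output, fillValue); // overwrite in place, like a reused buffer
        return output;                  // caller receives the live buffer, not a copy
    }
}
```

The practical rule: read or clone the returned tensor's data before invoking run() again; a reference held across calls will silently see the next inference's results.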

Sample Application

Please refer to the ZETIC Melange Apps repository for complete sample applications and more details.
