
Performance Optimization

Tips for optimizing on-device AI performance with ZETIC Melange.

This guide covers strategies for maximizing the performance of your on-device AI applications with ZETIC Melange.

Inference Mode Selection

The most impactful optimization is choosing the right inference mode. See Inference Mode Selection for details.

| Mode | Trade-off |
| --- | --- |
| RUN_SPEED | Fastest inference, may sacrifice some accuracy |
| RUN_AUTO | Balanced: fast while maintaining SNR > 20 dB |
| RUN_ACCURACY | Highest precision, may be slower |
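If the SDK lets you choose the mode when the model is created, selection might look like the sketch below. Note that the mode argument shown here (its name, type, and position) is an assumption for illustration; check the SDK reference for the actual constructor signature.

```kotlin
// Hypothetical sketch: the mode parameter is an assumption, not the confirmed API.
val model = ZeticMLangeModel(
    context,
    PERSONAL_KEY,
    MODEL_NAME,
    Mode.RUN_AUTO  // or RUN_SPEED / RUN_ACCURACY, per the table above
)
```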

Model Format Selection

  • Prefer .pt2 or .onnx over TorchScript (.pt). The newer formats produce cleaner computation graphs that optimize better for NPU execution.
  • Simplify ONNX models with onnxsim before uploading to reduce redundant operations.
pip install onnxsim
onnxsim input_model.onnx output_model.onnx

Input Optimization

  • Use fixed input shapes. Dynamic shapes prevent NPU compilation. Export with static dimensions.
  • Match expected input sizes. Do not make inputs larger than necessary; smaller inputs mean faster inference.
  • Use Float32 inputs. Melange handles quantization internally, so provide full-precision inputs.
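As a sketch of the points above, preprocessing can pack pixels into a fixed-shape Float32 buffer before inference. The 224×224 RGB shape, NCHW layout, and mean/std values here are illustrative assumptions, not requirements of Melange; use whatever your model was trained with.

```kotlin
// Illustrative: pack ARGB pixels into a fixed-shape float32 array
// (1 x 3 x 224 x 224, NCHW) with per-channel normalization.
fun toFloat32Input(pixels: IntArray, width: Int = 224, height: Int = 224): FloatArray {
    require(pixels.size == width * height) { "expected a fixed $width x $height input" }
    val mean = floatArrayOf(0.485f, 0.456f, 0.406f)
    val std = floatArrayOf(0.229f, 0.224f, 0.225f)
    val plane = width * height
    val out = FloatArray(3 * plane)
    for (i in pixels.indices) {
        val p = pixels[i]
        val r = ((p shr 16) and 0xFF) / 255f
        val g = ((p shr 8) and 0xFF) / 255f
        val b = (p and 0xFF) / 255f
        out[i] = (r - mean[0]) / std[0]              // R plane
        out[plane + i] = (g - mean[1]) / std[1]      // G plane
        out[2 * plane + i] = (b - mean[2]) / std[2]  // B plane
    }
    return out
}
```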

Runtime Best Practices

Initialize Once, Run Many

Model initialization involves downloading the model and creating the NPU context. Do this once and reuse the model instance:

// Do this once
val model = ZeticMLangeModel(context, PERSONAL_KEY, MODEL_NAME)

// Reuse for multiple inferences
for (frame in videoFrames) {
    val outputs = model.run(preprocessFrame(frame))
}
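One way to enforce the init-once pattern across an app is a lazily initialized holder so every caller shares a single instance. This is a sketch using standard double-checked locking; `PERSONAL_KEY` and `MODEL_NAME` are the same placeholders as above, and your app's threading needs may differ.

```kotlin
// Sketch: a process-wide holder so the model is created exactly once.
object ModelHolder {
    @Volatile private var model: ZeticMLangeModel? = null

    fun get(context: Context): ZeticMLangeModel =
        model ?: synchronized(this) {
            model ?: ZeticMLangeModel(context, PERSONAL_KEY, MODEL_NAME)
                .also { model = it }
        }
}
```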

Background Threading

Always run inference on a background thread to keep the UI responsive:

Kotlin:

lifecycleScope.launch(Dispatchers.IO) {
    val outputs = model.run(inputs)
    withContext(Dispatchers.Main) {
        updateUI(outputs)
    }
}

Swift:

DispatchQueue.global().async {
    do {
        let outputs = try model.run(inputs)
        DispatchQueue.main.async {
            self.updateUI(outputs)
        }
    } catch {
        print("Inference failed: \(error)")
    }
}

Minimize Preprocessing Overhead

Preprocessing (image resize, normalization) can become a bottleneck. Profile your preprocessing code alongside inference time.
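A quick way to see where time goes is to time the two stages separately. This sketch uses plain `System.nanoTime()` from the standard library; `preprocessFrame` is the hypothetical helper from the earlier example, and the `Log` tag is arbitrary.

```kotlin
// Time preprocessing and inference separately to find the bottleneck.
var t0 = System.nanoTime()
val input = preprocessFrame(frame)
val preMs = (System.nanoTime() - t0) / 1_000_000

t0 = System.nanoTime()
val outputs = model.run(input)
val runMs = (System.nanoTime() - t0) / 1_000_000

Log.d("Perf", "preprocess=${preMs}ms inference=${runMs}ms")
```

If preprocessing dominates, shrinking the input earlier in the pipeline or moving resize work off the hot path usually pays off more than further inference tuning.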

Device Considerations

  • Physical devices only. Emulators and simulators do not have NPU hardware.
  • Keep firmware updated. NPU driver updates can improve performance.
  • Test on target devices. Performance varies significantly across chipsets.

Melange automatically selects the optimal compiled binary for each device through Performance-Adaptive Deployment. Your model is benchmarked on 200+ physical devices to ensure the best possible performance on each hardware configuration.

