Performance Optimization
Tips for optimizing on-device AI performance with ZETIC Melange.
This guide covers strategies for maximizing the performance of your on-device AI applications with ZETIC Melange.
Inference Mode Selection
The most impactful optimization is choosing the right inference mode. See Inference Mode Selection for details.
| Mode | Trade-off |
|---|---|
| RUN_SPEED | Fastest inference, may sacrifice some accuracy |
| RUN_AUTO | Balanced: fast while maintaining SNR > 20 dB |
| RUN_ACCURACY | Highest precision, may be slower |
Model Format Selection
- Prefer `.pt2` or `.onnx` over TorchScript (`.pt`). The newer formats produce cleaner computation graphs that optimize better for NPU execution.
- Simplify ONNX models with `onnxsim` before uploading to reduce redundant operations.

```shell
pip install onnxsim
onnxsim input_model.onnx output_model.onnx
```

Input Optimization
- Use fixed input shapes. Dynamic shapes prevent NPU compilation. Export with static dimensions.
- Match expected input sizes. Do not upload inputs larger than necessary. Smaller inputs mean faster inference.
- Use Float32 inputs. Melange handles quantization internally: provide full-precision inputs.
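As a minimal sketch of the last two points, the snippet below packs an RGB frame into a fixed-shape Float32 buffer. The 224x224 dimensions and the [0, 1] normalization are assumptions for illustration; use whatever static shape and scaling your model was exported with.

```kotlin
// Hypothetical 224x224 RGB frame as packed bytes (values 0-255).
val width = 224
val height = 224
val rgbBytes = ByteArray(width * height * 3) { (it % 256).toByte() }

// Convert to a fixed-shape Float32 tensor in [0, 1], matching the
// static dimensions the model was exported with.
fun toFloat32Input(bytes: ByteArray): FloatArray =
    FloatArray(bytes.size) { (bytes[it].toInt() and 0xFF) / 255.0f }

val input = toFloat32Input(rgbBytes)
```

Allocating the `FloatArray` once and reusing it across frames also avoids per-inference garbage-collection pressure.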
Runtime Best Practices
Initialize Once, Run Many
Model initialization involves downloading and NPU context creation. Do this once and reuse the model instance:
```kotlin
// Do this once
val model = ZeticMLangeModel(context, PERSONAL_KEY, MODEL_NAME)

// Reuse for multiple inferences
for (frame in videoFrames) {
    val outputs = model.run(preprocessFrame(frame))
}
```

Background Threading
Always run inference on a background thread to keep the UI responsive:
```kotlin
lifecycleScope.launch(Dispatchers.IO) {
    val outputs = model.run(inputs)
    withContext(Dispatchers.Main) {
        updateUI(outputs)
    }
}
```

```swift
DispatchQueue.global().async {
    guard let outputs = try? model.run(inputs) else { return }
    DispatchQueue.main.async {
        self.updateUI(outputs)
    }
}
```

Minimize Preprocessing Overhead
Preprocessing (image resize, normalization) can become a bottleneck. Profile your preprocessing code alongside inference time.
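A minimal way to compare the two costs is to time each stage separately. The `preprocessFrame` and `runInference` functions below are placeholder stand-ins, not the Melange API; substitute your real preprocessing and `model.run` calls.

```kotlin
import kotlin.system.measureNanoTime

// Placeholder stages; replace with your real preprocessing and model.run.
fun preprocessFrame(): FloatArray = FloatArray(224 * 224 * 3) { it / 1000f }
fun runInference(input: FloatArray): Float = input.sum()

val preprocessNs = measureNanoTime { preprocessFrame() }
val inferenceNs = measureNanoTime { runInference(preprocessFrame()) }
println("preprocess: ${preprocessNs / 1_000} us, inference: ${inferenceNs / 1_000} us")
```

If preprocessing takes a comparable time to inference, it, not the NPU, is likely your bottleneck; consider reusing buffers or moving resize/normalization to a lower-level API.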
Device Considerations
- Physical devices only. Emulators and simulators do not have NPU hardware.
- Keep firmware updated. NPU driver updates can improve performance.
- Test on target devices. Performance varies significantly across chipsets.
Melange automatically selects the optimal compiled binary for each device through Performance-Adaptive Deployment. Your model is benchmarked on 200+ physical devices to ensure the best possible performance on each hardware configuration.
Next Steps
- Inference Mode Selection: Choose the right mode
- Device Compatibility: Supported NPU chipsets
- Benchmark Methodology: How performance is measured