
Performance Optimization

Tips for optimizing on-device AI performance with ZETIC Melange.

This guide covers strategies for maximizing the performance of your on-device AI applications with ZETIC Melange.

Inference Mode Selection

The most impactful optimization is choosing the right inference mode. See Inference Mode Selection for details.

| Mode | Trade-off |
| --- | --- |
| RUN_SPEED | Fastest inference, may sacrifice some accuracy |
| RUN_AUTO | Balanced: fast while maintaining SNR > 20 dB |
| RUN_ACCURACY | Highest precision, may be slower |
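If the SDK lets you choose the mode when the model is created, selection might look like the sketch below. Note that the mode argument shown here (its name, type, and position) is an assumption for illustration; check the SDK reference for the actual constructor signature.

```kotlin
// Hypothetical sketch: the mode parameter is an assumption, not the confirmed API.
val model = ZeticMLangeModel(
    context,
    PERSONAL_KEY,
    MODEL_NAME,
    Mode.RUN_AUTO  // or RUN_SPEED / RUN_ACCURACY, per the table above
)
```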

Model Format Selection

  • Prefer .pt2 or .onnx over TorchScript (.pt). The newer formats produce cleaner computation graphs that optimize better for NPU execution.
  • Simplify ONNX models with onnxsim before uploading to reduce redundant operations.
pip install onnxsim
onnxsim input_model.onnx output_model.onnx

Input Optimization

  • Use fixed input shapes. Dynamic shapes prevent NPU compilation. Export with static dimensions.
  • Match expected input sizes. Do not make inputs larger than necessary; smaller inputs mean faster inference.
  • Use Float32 inputs. Melange handles quantization internally, so provide full-precision inputs.
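As a sketch of the points above, preprocessing can pack pixels into a fixed-shape Float32 buffer before inference. The 224×224 RGB shape, NCHW layout, and mean/std values here are illustrative assumptions, not requirements of Melange; use whatever your model was trained with.

```kotlin
// Illustrative: pack ARGB pixels into a fixed-shape float32 array
// (1 x 3 x 224 x 224, NCHW) with per-channel normalization.
fun toFloat32Input(pixels: IntArray, width: Int = 224, height: Int = 224): FloatArray {
    require(pixels.size == width * height) { "expected a fixed $width x $height input" }
    val mean = floatArrayOf(0.485f, 0.456f, 0.406f)
    val std = floatArrayOf(0.229f, 0.224f, 0.225f)
    val plane = width * height
    val out = FloatArray(3 * plane)
    for (i in pixels.indices) {
        val p = pixels[i]
        val r = ((p shr 16) and 0xFF) / 255f
        val g = ((p shr 8) and 0xFF) / 255f
        val b = (p and 0xFF) / 255f
        out[i] = (r - mean[0]) / std[0]              // R plane
        out[plane + i] = (g - mean[1]) / std[1]      // G plane
        out[2 * plane + i] = (b - mean[2]) / std[2]  // B plane
    }
    return out
}
```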

Runtime Best Practices

Initialize Once, Run Many

Model initialization involves downloading the model and creating the NPU context. Do this once and reuse the model instance:

// Do this once
val model = ZeticMLangeModel(context, PERSONAL_KEY, MODEL_NAME)

// Reuse for multiple inferences
for (frame in videoFrames) {
    val outputs = model.run(preprocessFrame(frame))
}
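One way to enforce the init-once pattern across an app is a lazily initialized holder so every caller shares a single instance. This is a sketch using standard double-checked locking; `PERSONAL_KEY` and `MODEL_NAME` are the same placeholders as above, and your app's threading needs may differ.

```kotlin
// Sketch: a process-wide holder so the model is created exactly once.
object ModelHolder {
    @Volatile private var model: ZeticMLangeModel? = null

    fun get(context: Context): ZeticMLangeModel =
        model ?: synchronized(this) {
            model ?: ZeticMLangeModel(context, PERSONAL_KEY, MODEL_NAME)
                .also { model = it }
        }
}
```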

Background Threading

Always run inference on a background thread to keep the UI responsive:

Kotlin:

lifecycleScope.launch(Dispatchers.IO) {
    val outputs = model.run(inputs)
    withContext(Dispatchers.Main) {
        updateUI(outputs)
    }
}

Swift:

DispatchQueue.global().async {
    do {
        let outputs = try model.run(inputs)
        DispatchQueue.main.async {
            self.updateUI(outputs)
        }
    } catch {
        print("Inference failed: \(error)")
    }
}

Minimize Preprocessing Overhead

Preprocessing (image resize, normalization) can become a bottleneck. Profile your preprocessing code alongside inference time.
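A quick way to see where time goes is to time the two stages separately. This sketch uses plain `System.nanoTime()` from the standard library; `preprocessFrame` is the hypothetical helper from the earlier example, and the `Log` tag is arbitrary.

```kotlin
// Time preprocessing and inference separately to find the bottleneck.
var t0 = System.nanoTime()
val input = preprocessFrame(frame)
val preMs = (System.nanoTime() - t0) / 1_000_000

t0 = System.nanoTime()
val outputs = model.run(input)
val runMs = (System.nanoTime() - t0) / 1_000_000

Log.d("Perf", "preprocess=${preMs}ms inference=${runMs}ms")
```

If preprocessing dominates, shrinking the input earlier in the pipeline or moving resize work off the hot path usually pays off more than further inference tuning.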

Device Considerations

  • Physical devices only. Emulators and simulators do not have NPU hardware.
  • Keep firmware updated. NPU driver updates can improve performance.
  • Test on target devices. Performance varies significantly across chipsets.

Melange automatically selects the optimal compiled binary for each device through Performance-Adaptive Deployment. Your model is benchmarked on 200+ physical devices to ensure the best possible performance on each hardware configuration.

