Inference Mode Selection
Choose between CPU, GPU, and NPU inference modes in ZETIC Melange.
Melange provides several inference modes for general (non-LLM) models so that you can trade off speed against accuracy according to your application's requirements.
Available Modes
Default / Auto (RUN_AUTO)
Balances speed and accuracy: this mode automatically selects the fastest configuration whose output quality stays above an SNR of 20 dB. This is the recommended mode for most use cases.
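For a sense of scale, an SNR of 20 dB corresponds to a signal-to-noise power ratio of 100:1. The sketch below uses one common way of scoring a model output against a reference output. It assumes the reference output is the "signal" and the element-wise difference is the "noise". The helper `snrDb` is hypothetical and not part of the Melange SDK, and Melange's own SNR measurement may be defined differently.

```kotlin
import kotlin.math.log10

// Hypothetical helper, not part of the Melange SDK.
// Treats `reference` as the signal and (reference - candidate) as the noise.
fun snrDb(reference: FloatArray, candidate: FloatArray): Double {
    require(reference.size == candidate.size) { "outputs must have the same length" }
    var signalPower = 0.0
    var noisePower = 0.0
    for (i in reference.indices) {
        val r = reference[i].toDouble()
        val e = r - candidate[i].toDouble()
        signalPower += r * r
        noisePower += e * e
    }
    // Identical outputs contain no noise; report infinity instead of dividing by zero.
    if (noisePower == 0.0) return Double.POSITIVE_INFINITY
    return 10.0 * log10(signalPower / noisePower)
}
```

Under this definition, an output that is off by 10% on every element scores about 20 dB, which sits right at the threshold quoted above.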
Speed-First (RUN_SPEED)
Minimizes inference latency. Recommended for real-time applications where response time is the top priority.
Accuracy-First (RUN_ACCURACY)
Selects the configuration with the highest SNR score. Best suited for applications where accuracy is more critical than speed.
The optimal mode is automatically determined based on:
- Speed metrics: Inference time (latency in ms)
- Accuracy metrics: SNR (Signal-to-Noise Ratio in dB)
You can override this automatic selection by explicitly specifying a mode.
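One way to picture how the two metrics could drive each mode is as a selection rule over a set of measured candidate configurations. The following is a minimal sketch, not Melange's actual implementation: `Mode`, `Candidate`, `pick`, the candidate names, and the measurements are all hypothetical, and the fallback used when no candidate clears 20 dB is an assumption.

```kotlin
// Hypothetical types for illustration; these are not Melange SDK classes.
enum class Mode { RUN_AUTO, RUN_SPEED, RUN_ACCURACY }

data class Candidate(val name: String, val latencyMs: Double, val snrDb: Double)

// Threshold quoted in the mode description for RUN_AUTO.
const val AUTO_SNR_THRESHOLD_DB = 20.0

fun pick(mode: Mode, candidates: List<Candidate>): Candidate? = when (mode) {
    // Lowest latency wins.
    Mode.RUN_SPEED -> candidates.minByOrNull { it.latencyMs }
    // Highest SNR wins.
    Mode.RUN_ACCURACY -> candidates.maxByOrNull { it.snrDb }
    // Fastest candidate above the SNR threshold.
    // Assumption: if none qualifies, fall back to the most accurate candidate.
    Mode.RUN_AUTO -> candidates
        .filter { it.snrDb > AUTO_SNR_THRESHOLD_DB }
        .minByOrNull { it.latencyMs }
        ?: candidates.maxByOrNull { it.snrDb }
}

// Made-up measurements, purely illustrative.
val sampleCandidates = listOf(
    Candidate("npu-int8", latencyMs = 2.1, snrDb = 14.0),
    Candidate("gpu-fp16", latencyMs = 4.8, snrDb = 31.5),
    Candidate("cpu-fp32", latencyMs = 19.3, snrDb = 58.0),
)
```

With these sample values, `RUN_SPEED` picks `npu-int8`, `RUN_ACCURACY` picks `cpu-fp32`, and `RUN_AUTO` picks `gpu-fp16`, since it is the fastest candidate above 20 dB.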
API Usage
Kotlin (Android)

// Default: Auto mode
// Speed first, but maintains SNR above 20dB
val modelDefault = ZeticMLangeModel(
    context = this,
    personalKey = PERSONAL_KEY,
    name = MODEL_NAME,
    modelMode = ModelMode.RUN_AUTO
)

// Speed First Mode
val modelFast = ZeticMLangeModel(
    context = this,
    personalKey = PERSONAL_KEY,
    name = MODEL_NAME,
    modelMode = ModelMode.RUN_SPEED
)

// Accuracy First Mode
val modelAccurate = ZeticMLangeModel(
    context = this,
    personalKey = PERSONAL_KEY,
    name = MODEL_NAME,
    modelMode = ModelMode.RUN_ACCURACY
)

Swift (iOS)

// Default: Auto mode
// Speed first, but maintains SNR above 20dB
let modelDefault = try ZeticMLangeModel(
    personalKey: PERSONAL_KEY,
    name: MODEL_NAME,
    modelMode: .runAuto
)

// Speed First Mode
let modelFast = try ZeticMLangeModel(
    personalKey: PERSONAL_KEY,
    name: MODEL_NAME,
    modelMode: .runSpeed
)

// Accuracy First Mode
let modelAccurate = try ZeticMLangeModel(
    personalKey: PERSONAL_KEY,
    name: MODEL_NAME,
    modelMode: .runAccuracy
)

Choosing the Right Mode
| Use Case | Recommended Mode | Why |
|---|---|---|
| Real-time video processing | RUN_SPEED | Minimize frame processing latency |
| Medical image analysis | RUN_ACCURACY | Precision is critical |
| General mobile app | RUN_AUTO | Best balance for most users |
| Prototype / testing | RUN_AUTO | Good default behavior |
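If an app serves more than one of these use cases, the table can be encoded once and reused wherever a model is created. This is only a sketch: `UseCase` and `recommendedMode` are hypothetical app-level names, and the `ModelMode` enum here is a local stand-in for the SDK type so that the snippet compiles on its own.

```kotlin
// Local stand-in for the SDK's ModelMode so this sketch is self-contained.
enum class ModelMode { RUN_AUTO, RUN_SPEED, RUN_ACCURACY }

// Hypothetical app-level categories mirroring the table rows.
enum class UseCase { REALTIME_VIDEO, MEDICAL_IMAGING, GENERAL_APP, PROTOTYPE }

// Maps each use case to the mode recommended in the table.
fun recommendedMode(useCase: UseCase): ModelMode = when (useCase) {
    UseCase.REALTIME_VIDEO -> ModelMode.RUN_SPEED
    UseCase.MEDICAL_IMAGING -> ModelMode.RUN_ACCURACY
    UseCase.GENERAL_APP, UseCase.PROTOTYPE -> ModelMode.RUN_AUTO
}
```

In a real project, drop the stand-in enum and pass the result as the `modelMode` argument shown in the API Usage section.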
Next Steps
- LLM Inference Modes: Modes specific to LLM models
- Performance Optimization: Additional tuning tips
- Performance-Adaptive Deployment: How Melange selects optimal binaries