Advanced Configuration
Advanced Melange configuration options for Flutter.
This guide covers advanced configuration options available in the Melange Flutter SDK.
Inference Mode Selection
Melange supports multiple inference modes to balance speed and accuracy. By default, the Flutter SDK uses ModelMode.runAuto.
// Default: automatic runtime selection
final modelDefault = await ZeticMLangeModel.create(
  personalKey: personalKey,
  name: modelName,
  modelMode: ModelMode.runAuto,
);

// Speed-first
final modelFast = await ZeticMLangeModel.create(
  personalKey: personalKey,
  name: modelName,
  modelMode: ModelMode.runSpeed,
);

// Accuracy-first
final modelAccurate = await ZeticMLangeModel.create(
  personalKey: personalKey,
  name: modelName,
  modelMode: ModelMode.runAccuracy,
);

For a detailed explanation of each mode, see Inference Mode Selection.
Model Version Pinning
By default, the SDK loads the latest model version. Pin a specific version for production stability:
final model = await ZeticMLangeModel.create(
  personalKey: personalKey,
  name: modelName,
  version: 2,
);

Explicit Runtime Selection
When your account tier and deployed model support it, pass target and apType to request a specific runtime backend or processor:
final model = await ZeticMLangeModel.create(
  personalKey: personalKey,
  name: modelName,
  target: Target.qnn,
  apType: APType.npu,
);

Use automatic selection unless you have measured a device-specific reason to pin a backend. Explicit target selection follows the same tier restrictions as Android and iOS.
Quantization Preference
Use quantType to request a preferred precision during automatic selection:
final model = await ZeticMLangeModel.create(
  personalKey: personalKey,
  name: modelName,
  quantType: QuantType.fp16,
);

Flutter keeps the Dart API aligned across platforms. Availability of a specific Target, APType, or quantization depends on the model artifacts deployed for the current device.
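Because availability depends on the deployed artifacts, an app that pins a backend or precision may want a fallback to automatic selection. The following is a minimal sketch of that pattern, not part of the SDK. It assumes that an unavailable configuration surfaces as a thrown Exception, which this guide does not confirm. createWithFallback, createPinned, and createAuto are hypothetical names, and the two stand-in functions take the place of real ZeticMLangeModel.create calls.

```dart
/// Tries [preferred] first and falls back to [fallback] if it throws.
Future<T> createWithFallback<T>({
  required Future<T> Function() preferred,
  required Future<T> Function() fallback,
}) async {
  try {
    return await preferred();
  } on Exception catch (error) {
    // Assumption: an unavailable backend surfaces as a thrown Exception.
    print('Preferred configuration failed ($error); falling back.');
    return await fallback();
  }
}

// Hypothetical stand-ins for ZeticMLangeModel.create(...) calls.
Future<String> createPinned() async =>
    throw Exception('pinned artifact not deployed for this device');

Future<String> createAuto() async => 'model (automatic selection)';

Future<void> main() async {
  final model = await createWithFallback(
    preferred: createPinned,
    fallback: createAuto,
  );
  print(model);
}
```

In an app, the two closures would wrap the pinned and the default create calls shown above. Check how the SDK actually reports an unavailable configuration before relying on this pattern.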
Hugging Face Models
Use ZeticMLangeHFModel for supported Hugging Face repositories:
final hfModel = await ZeticMLangeHFModel.create(
  'owner/repository',
  userAccessToken: hfToken,
  manifestDir: 'optional-manifest-directory',
  index: 0,
);
final outputs = hfModel.run(inputs);
hfModel.close();

manifestDir is Android-only. iOS accepts the Dart parameter through the Flutter API and ignores it.
LLM Models
Use ZeticMLangeLLMModel for text generation:
final llm = await ZeticMLangeLLMModel.create(
  personalKey: personalKey,
  name: llmModelName,
  initOption: const LLMInitOption(
    nCtx: 4096,
    kvCacheCleanupPolicy: LLMKVCacheCleanupPolicy.cleanUpOnFull,
  ),
  onDownload: (progress) {
    print('Loading LLM ${(progress * 100).round()}%');
  },
);
llm.run('Explain on-device AI in one paragraph.');
while (true) {
  final next = llm.waitForNextToken();
  if (next.isFinished) {
    break;
  }
  print(next.token);
}
llm.cleanUp();
llm.close();

waitForNextToken() blocks until the native runtime returns the next token. When building an interactive chat screen, run generation in a worker isolate, or schedule UI updates carefully so the blocking loop does not stall the UI.
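The worker-isolate approach can be sketched with plain dart:isolate. This is an illustration of the pattern only. generateTokens and tokenStream are hypothetical names, and the hard-coded token list stands in for the blocking waitForNextToken() loop. Whether the model can be created and used inside a background isolate is an assumption to verify against the SDK.

```dart
import 'dart:async';
import 'dart:isolate';

// Hypothetical stand-in for the blocking generation loop. A real worker
// would create the model here and forward each token as it arrives.
void generateTokens(SendPort out) {
  for (final token in ['On', '-device', ' AI']) {
    out.send(token);
  }
  out.send(null); // null marks the end of generation
}

/// Runs [generateTokens] in a worker isolate and exposes tokens as a stream.
Stream<String> tokenStream() async* {
  final port = ReceivePort();
  await Isolate.spawn(generateTokens, port.sendPort);
  await for (final message in port) {
    if (message == null) break;
    yield message as String;
  }
  port.close();
}

Future<void> main() async {
  final buffer = StringBuffer();
  await for (final token in tokenStream()) {
    buffer.write(token); // in a widget, update state here instead
  }
  print(buffer);
}
```

The calling isolate only awaits stream events, so it stays free to render frames between tokens.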
Threading Considerations
Flutter model creation is asynchronous. Keep model initialization out of frame-critical paths and update UI state after the Future completes:
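Future<void> loadModel() async {
  setState(() => isLoading = true);
  try {
    model = await ZeticMLangeModel.create(
      personalKey: personalKey,
      name: modelName,
      onProgress: (progress) {
        // Skip the update if the widget was disposed while loading.
        if (mounted) setState(() => loadingProgress = progress);
      },
    );
  } finally {
    // The Future may complete after the widget has left the tree.
    if (mounted) setState(() => isLoading = false);
  }
}

Next Steps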
- Inference Mode Selection: Detailed mode comparison
- Performance Optimization: Tips for best performance
- Flutter API Reference: Full Dart API documentation