Advanced Configuration

This guide covers advanced configuration options available in the Melange Flutter SDK.

Inference Mode Selection

Melange supports multiple inference modes to balance speed and accuracy. By default, the Flutter SDK uses ModelMode.runAuto.

// Default: automatic runtime selection
final modelDefault = await ZeticMLangeModel.create(
  personalKey: personalKey,
  name: modelName,
  modelMode: ModelMode.runAuto,
);

// Speed-first
final modelFast = await ZeticMLangeModel.create(
  personalKey: personalKey,
  name: modelName,
  modelMode: ModelMode.runSpeed,
);

// Accuracy-first
final modelAccurate = await ZeticMLangeModel.create(
  personalKey: personalKey,
  name: modelName,
  modelMode: ModelMode.runAccuracy,
);

For a detailed explanation of each mode, see Inference Mode Selection.

Model Version Pinning

By default, the SDK loads the latest model version. Pin a specific version for production stability:

final model = await ZeticMLangeModel.create(
  personalKey: personalKey,
  name: modelName,
  version: 2,
);

Explicit Runtime Selection

When your account tier and deployed model support it, pass target and apType to request a specific runtime backend or processor:

final model = await ZeticMLangeModel.create(
  personalKey: personalKey,
  name: modelName,
  target: Target.qnn,
  apType: APType.npu,
);

Use automatic selection unless you have measured a device-specific reason to pin a backend. Explicit target selection is subject to the same tier restrictions as on Android and iOS.
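One way to follow this advice is to treat the explicit runtime as an optional first attempt and fall back to automatic selection when it cannot be created. The sketch below is plain Dart and does not call the SDK directly. createWithFallback is a hypothetical helper, and it assumes that an unavailable target surfaces as a thrown error from ZeticMLangeModel.create, which this guide does not confirm.

```dart
import 'dart:async';

/// Hypothetical helper (not part of the Melange SDK): tries [preferred]
/// first and falls back to [fallback] if it throws.
Future<T> createWithFallback<T>({
  required Future<T> Function() preferred,
  required Future<T> Function() fallback,
  void Function(Object error)? onFallback,
}) async {
  try {
    return await preferred();
  } catch (error) {
    // Assumption: an unavailable target or tier restriction is reported
    // as a thrown error rather than a null or partially usable model.
    onFallback?.call(error);
    return fallback();
  }
}
```

In an app, preferred would wrap the create call that passes target and apType, and fallback would wrap the call that uses ModelMode.runAuto. Logging the error in onFallback keeps a record of which devices rejected the pinned backend.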

Quantization Preference

Use quantType to request a preferred precision during automatic selection:

final model = await ZeticMLangeModel.create(
  personalKey: personalKey,
  name: modelName,
  quantType: QuantType.fp16,
);

Flutter keeps the Dart API aligned across platforms. Availability of a specific Target, APType, or quantization depends on the model artifacts deployed for the current device.

Hugging Face Models

Use ZeticMLangeHFModel for supported Hugging Face repositories:

final hfModel = await ZeticMLangeHFModel.create(
  'owner/repository',
  userAccessToken: hfToken,
  manifestDir: 'optional-manifest-directory',
  index: 0,
);

final outputs = hfModel.run(inputs);

hfModel.close();

manifestDir is Android-only. iOS accepts the Dart parameter through the Flutter API and ignores it.
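If run throws, the close() call in the snippet above is never reached. A minimal sketch of one way to guard against that is to release the model in a finally block. withModel is a hypothetical helper written in plain Dart, not an SDK function, and it assumes that close() is what releases the model's native resources.

```dart
import 'dart:async';

/// Hypothetical helper (not part of the Melange SDK): creates a resource,
/// passes it to [use], and always calls [close] afterwards.
Future<R> withModel<M, R>({
  required Future<M> Function() create,
  required FutureOr<R> Function(M model) use,
  required void Function(M model) close,
}) async {
  final model = await create();
  try {
    return await use(model);
  } finally {
    // Runs whether [use] completed normally or threw.
    close(model);
  }
}
```

With the SDK, create would wrap ZeticMLangeHFModel.create, use would call run, and close would call the model's close() method. The same shape applies to the other model classes in this guide.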

LLM Models

Use ZeticMLangeLLMModel for text generation:

final llm = await ZeticMLangeLLMModel.create(
  personalKey: personalKey,
  name: llmModelName,
  initOption: const LLMInitOption(
    nCtx: 4096,
    kvCacheCleanupPolicy: LLMKVCacheCleanupPolicy.cleanUpOnFull,
  ),
  onDownload: (progress) {
    print('Loading LLM ${(progress * 100).round()}%');
  },
);

llm.run('Explain on-device AI in one paragraph.');

while (true) {
  final next = llm.waitForNextToken();
  if (next.isFinished) {
    break;
  }
  print(next.token);
}

llm.cleanUp();
llm.close();

waitForNextToken() blocks until the native runtime returns the next token. When building an interactive chat screen, run generation in a worker isolate, or schedule UI updates so that the main isolate can still render frames between tokens.
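As a rough illustration of scheduling updates between tokens, the sketch below wraps a blocking token source in a Stream and yields to the event loop after every few tokens. It is plain Dart (records require Dart 3). NextToken and streamTokens are hypothetical names, and NextToken only mirrors the token and isFinished fields used in the loop above. Each individual call still blocks while the runtime produces a token, so a worker isolate remains the safer choice when tokens arrive slowly.

```dart
import 'dart:async';

/// Hypothetical stand-in for the value returned by waitForNextToken();
/// only the two fields used in the loop above are modeled.
typedef NextToken = ({String token, bool isFinished});

/// Emits tokens from a blocking [next] callback, yielding to the event loop
/// after every [batchSize] tokens so other work can run in between.
Stream<String> streamTokens(
  NextToken Function() next, {
  int batchSize = 4,
}) async* {
  var sinceYield = 0;
  while (true) {
    final result = next();
    if (result.isFinished) return;
    yield result.token;
    if (++sinceYield >= batchSize) {
      sinceYield = 0;
      // Give the event loop a turn before pulling more tokens.
      await Future<void>.delayed(Duration.zero);
    }
  }
}
```

A chat screen could consume this with await for and append each token to the visible message. Adapting it to the SDK would need a small closure that maps the real return value of waitForNextToken() to the record type.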

Threading Considerations

Flutter model creation is asynchronous. Keep model initialization out of frame-critical paths and update UI state after the Future completes:

Future<void> loadModel() async {
  setState(() => isLoading = true);
  try {
    model = await ZeticMLangeModel.create(
      personalKey: personalKey,
      name: modelName,
      onProgress: (progress) {
        // Skip updates if the widget was disposed while loading.
        if (mounted) setState(() => loadingProgress = progress);
      },
    );
  } finally {
    if (mounted) setState(() => isLoading = false);
  }
}
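Calling setState on every progress callback can trigger more rebuilds than a progress indicator can show. The sketch below wraps a callback so it only fires when the whole-number percentage changes. percentThrottled is a hypothetical helper, not part of the SDK, and it assumes progress is a double between 0.0 and 1.0, as the (progress * 100).round() call in the LLM example suggests.

```dart
/// Hypothetical helper (not part of the Melange SDK): wraps a progress
/// callback so it only fires when the rounded percentage changes.
void Function(double) percentThrottled(void Function(int percent) onPercent) {
  int? last;
  return (double progress) {
    // Assumption: progress is reported in the range 0.0 to 1.0.
    final percent = (progress.clamp(0.0, 1.0) * 100).round();
    if (percent != last) {
      last = percent;
      onPercent(percent);
    }
  };
}
```

The returned function can be passed as the progress callback in the loader above, with onPercent performing the setState call. This caps the number of rebuilds at 101 per load regardless of how often the native side reports progress.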

Next Steps

Inference Mode Selection: Detailed mode comparison
Performance Optimization: Tips for best performance
Flutter API Reference: Full Dart API documentation
