
ZeticMLangeLLMModel

Complete API reference for the ZeticMLangeLLMModel class in Flutter.

This page reflects zetic_mlange 1.8.1.

ZeticMLangeLLMModel runs on-device LLM prompts and exposes token-by-token generation through the native SDKs.

Import

import 'package:zetic_mlange/zetic_mlange.dart';

Constructors

create

static Future<ZeticMLangeLLMModel> create({
  required String personalKey,
  required String name,
  int? version,
  LLMModelMode modelMode = LLMModelMode.runAuto,
  APType? apType,
  LLMQuantType? quantType,
  ModelCacheHandlingPolicy cacheHandlingPolicy =
      ModelCacheHandlingPolicy.removeOverlapping,
  LLMInitOption? initOption,
  LLMKVCacheCleanupPolicy kvCacheCleanupPolicy =
      LLMKVCacheCleanupPolicy.cleanUpOnFull,
  MlangeProgressCallback? onDownload,
})

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| personalKey | String | (required) | Your personal authentication key. |
| name | String | (required) | Full LLM model identifier in account_name/project_name format. |
| version | int? | null | Specific model version to load. |
| modelMode | LLMModelMode | runAuto | LLM backend selection mode. |
| apType | APType? | null | Optional processor preference. |
| quantType | LLMQuantType? | null | Optional LLM quantization preference. |
| cacheHandlingPolicy | ModelCacheHandlingPolicy | removeOverlapping | Native model cache policy. |
| initOption | LLMInitOption? | null | LLM initialization options such as context size and KV cache cleanup policy. |
| kvCacheCleanupPolicy | LLMKVCacheCleanupPolicy | cleanUpOnFull | Backward-compatible shortcut used when initOption is omitted. |
| onDownload | MlangeProgressCallback? | null | Callback that receives download progress as a value from 0.0 to 1.0. |

final model = await ZeticMLangeLLMModel.create(
  personalKey: personalKey,
  name: llmModelName,
  initOption: const LLMInitOption(nCtx: 4096),
);

When initOption is provided, its kvCacheCleanupPolicy and nCtx values are used. The standalone kvCacheCleanupPolicy parameter is only used when initOption is omitted.
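
As a hedged sketch of the precedence rule above, the snippet below sets both values through initOption and reports download progress. It assumes that LLMInitOption accepts kvCacheCleanupPolicy as a named constructor argument alongside nCtx, and that MlangeProgressCallback receives a single double; neither signature is spelled out on this page. loadModel is a hypothetical helper name.

```dart
import 'package:zetic_mlange/zetic_mlange.dart';

// Hypothetical helper: loads an LLM with an explicit context size.
Future<ZeticMLangeLLMModel> loadModel(String personalKey, String name) {
  return ZeticMLangeLLMModel.create(
    personalKey: personalKey,
    name: name,
    // Assumption: kvCacheCleanupPolicy is a named argument of LLMInitOption.
    initOption: const LLMInitOption(
      nCtx: 4096,
      kvCacheCleanupPolicy: LLMKVCacheCleanupPolicy.cleanUpOnFull,
    ),
    // Assumption: the callback receives one double in the range 0.0 to 1.0.
    onDownload: (progress) {
      print('Download: ${(progress * 100).toStringAsFixed(0)}%');
    },
  );
}
```

Because initOption is supplied here, the standalone kvCacheCleanupPolicy parameter of create is left out; per the rule above it would not be used.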


Properties

isClosed

bool get isClosed

Returns true after close releases the native LLM model handle.
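
A minimal sketch of how isClosed might be used to make shutdown idempotent, since a second close() call throws. closeIfOpen is a hypothetical helper name, not part of the package.

```dart
import 'package:zetic_mlange/zetic_mlange.dart';

// Hypothetical helper: closes the model only if it is still open,
// so that calling it twice does not trigger MlangeException.
void closeIfOpen(ZeticMLangeLLMModel model) {
  if (!model.isClosed) {
    model.close();
  }
}
```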


Methods

run

LLMRunResult run(String text)

Starts generation for the prompt and returns prompt-token metadata.

Returns: LLMRunResult, which contains status and promptTokens.
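
For illustration, the returned metadata could be inspected before reading tokens. This is only a sketch: startGeneration is a hypothetical helper name, and the meaning of individual status values is not documented on this page, so the value is logged rather than interpreted.

```dart
import 'package:zetic_mlange/zetic_mlange.dart';

// Hypothetical helper: starts generation and logs the prompt metadata.
LLMRunResult startGeneration(ZeticMLangeLLMModel llm, String prompt) {
  final result = llm.run(prompt);
  // status and promptTokens are the two fields listed for LLMRunResult.
  print('status=${result.status}, promptTokens=${result.promptTokens}');
  return result;
}
```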

waitForNextToken

LLMNextTokenResult waitForNextToken()

Blocks until the native runtime returns the next token snapshot.

Returns: LLMNextTokenResult, which contains:

| Property | Type | Description |
| --- | --- | --- |
| token | String | Generated token text. |
| generatedTokens | int | Number of generated tokens reported by the native runtime. |
| code / status | int | Native status code. |
| timeUs | int | Token timing in microseconds when provided by the native runtime. |
| isFirst | bool | true for the first token when reported by the native runtime. |
| isFinal | bool | true when generation is complete. |
| isFinished | bool | Convenience getter. true when isFinal is true or the token is empty. |
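
The properties above can be combined into a small loop that assembles the full response. The sketch below assumes run() has already been called on the same model; collectResponse and its maxTokens cap are hypothetical, and this page does not state whether cleanUp() is required after stopping early.

```dart
import 'package:zetic_mlange/zetic_mlange.dart';

// Hypothetical helper: drains tokens after run() into a single string,
// stopping early once maxTokens generated tokens have been reported.
String collectResponse(ZeticMLangeLLMModel llm, {int maxTokens = 256}) {
  final buffer = StringBuffer();
  while (true) {
    final next = llm.waitForNextToken();
    // isFinished covers both isFinal and an empty token.
    if (next.isFinished) break;
    if (next.isFirst) {
      // timeUs is only meaningful when the native runtime provides it.
      print('First token received, timeUs=${next.timeUs}');
    }
    buffer.write(next.token);
    if (next.generatedTokens >= maxTokens) break;
  }
  return buffer.toString();
}
```

Writing into a StringBuffer avoids printing each token on its own line and leaves the caller free to decide how to display the text.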

cleanUp

void cleanUp()

Cleans up LLM runtime state, including KV cache state managed by the native SDK.

close

void close()

Force-deinitializes the native LLM model handle.

After close() returns, isClosed is true and the handle can no longer be used: calling run, waitForNextToken, cleanUp, or close again throws MlangeException.
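
A hedged sketch of guarding a call against a model that may already be closed. tryRun is a hypothetical helper name; MlangeException may also be thrown for reasons not covered on this page, so the handler only logs the error.

```dart
import 'package:zetic_mlange/zetic_mlange.dart';

// Hypothetical helper: returns true if run() succeeded, or false if the
// call threw MlangeException (for example, because close() was already called).
bool tryRun(ZeticMLangeLLMModel llm, String prompt) {
  try {
    llm.run(prompt);
    return true;
  } on MlangeException catch (e) {
    print('Model call failed: $e');
    return false;
  }
}
```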


Generation Example

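final llm = await ZeticMLangeLLMModel.create(
  personalKey: personalKey,
  name: llmModelName,
);

try {
  // Start generation for the prompt.
  llm.run('Write one sentence about on-device AI.');

  // Pull tokens one at a time until the runtime reports completion.
  while (true) {
    final next = llm.waitForNextToken();
    if (next.isFinished) {
      break;
    }
    print(next.token);
  }

  // Reset runtime state, including the KV cache.
  llm.cleanUp();
} finally {
  // Release the native handle even if generation throws.
  llm.close();
}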
