ZeticMLangeLLMModel
Complete API reference for the ZeticMLangeLLMModel class in Flutter.
This page reflects zetic_mlange 1.8.1.
ZeticMLangeLLMModel runs on-device LLM prompts and exposes token-by-token generation through the native SDKs.
Import
import 'package:zetic_mlange/zetic_mlange.dart';
Constructors
create
static Future<ZeticMLangeLLMModel> create({
  required String personalKey,
  required String name,
  int? version,
  LLMModelMode modelMode = LLMModelMode.runAuto,
  APType? apType,
  LLMQuantType? quantType,
  ModelCacheHandlingPolicy cacheHandlingPolicy =
      ModelCacheHandlingPolicy.removeOverlapping,
  LLMInitOption? initOption,
  LLMKVCacheCleanupPolicy kvCacheCleanupPolicy =
      LLMKVCacheCleanupPolicy.cleanUpOnFull,
  MlangeProgressCallback? onDownload,
})
| Parameter | Type | Default | Description |
|---|---|---|---|
| personalKey | String | required | Your personal authentication key. |
| name | String | required | Full LLM model identifier in account_name/project_name format. |
| version | int? | null | Specific model version to load. |
| modelMode | LLMModelMode | runAuto | LLM backend selection mode. |
| apType | APType? | null | Optional processor preference. |
| quantType | LLMQuantType? | null | Optional LLM quantization preference. |
| cacheHandlingPolicy | ModelCacheHandlingPolicy | removeOverlapping | Native model cache policy. |
| initOption | LLMInitOption? | null | LLM initialization options such as context size and KV cache cleanup policy. |
| kvCacheCleanupPolicy | LLMKVCacheCleanupPolicy | cleanUpOnFull | Backward-compatible shortcut used when initOption is omitted. |
| onDownload | MlangeProgressCallback? | null | Download progress callback from 0.0 to 1.0. |
final model = await ZeticMLangeLLMModel.create(
  personalKey: personalKey,
  name: llmModelName,
  initOption: const LLMInitOption(nCtx: 4096),
);
When initOption is provided, its kvCacheCleanupPolicy and nCtx values are used. The standalone kvCacheCleanupPolicy parameter is only used when initOption is omitted.
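The precedence rule above can be illustrated with a small stand-alone sketch. The types here are simplified stand-ins declared locally, not the package's real classes. Only the names nCtx, kvCacheCleanupPolicy, and cleanUpOnFull come from this page; KvPolicySketch, InitOptionSketch, effectivePolicy, the otherPolicy value, and the assumed default inside the option object are hypothetical.

```dart
// Stand-in types for illustration only; they are NOT the zetic_mlange classes.
enum KvPolicySketch {
  cleanUpOnFull, // name taken from this page
  otherPolicy, // hypothetical second value, used only to show precedence
}

class InitOptionSketch {
  const InitOptionSketch({
    this.nCtx,
    // Assumed default; the real LLMInitOption default is not stated here.
    this.kvCacheCleanupPolicy = KvPolicySketch.cleanUpOnFull,
  });

  final int? nCtx;
  final KvPolicySketch kvCacheCleanupPolicy;
}

/// Returns the policy that takes effect under the rule described above:
/// initOption wins when present; the standalone parameter is the fallback.
KvPolicySketch effectivePolicy({
  InitOptionSketch? initOption,
  KvPolicySketch kvCacheCleanupPolicy = KvPolicySketch.cleanUpOnFull,
}) {
  return initOption?.kvCacheCleanupPolicy ?? kvCacheCleanupPolicy;
}
```

Under this sketch, passing both arguments means the standalone parameter is ignored, so setting the policy inside initOption is likely the less surprising choice when an option object is already in use.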
Properties
isClosed
bool get isClosed
Returns true after close releases the native LLM model handle.
Methods
run
LLMRunResult run(String text)
Starts generation for the prompt and returns prompt-token metadata.
Returns: LLMRunResult, which contains status and promptTokens.
waitForNextToken
LLMNextTokenResult waitForNextToken()
Blocks until the native runtime returns the next token snapshot.
Returns: LLMNextTokenResult, which contains:
| Property | Type | Description |
|---|---|---|
| token | String | Generated token text. |
| generatedTokens | int | Number of generated tokens reported by the native runtime. |
| code / status | int | Native status code. |
| timeUs | int | Token timing in microseconds when provided by the native runtime. |
| isFirst | bool | true for the first token when reported by the native runtime. |
| isFinal | bool | true when generation is complete. |
| isFinished | bool | Convenience getter. true when isFinal is true or the token is empty. |
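As a rough sketch of how these fields might drive a read loop, the snippet below joins token text until a finished snapshot arrives. TokenSnapshot is a local stand-in reduced to the fields used here, and collectText and replay are hypothetical helpers, not part of the package. Whether the finishing snapshot carries text is not stated on this page, so the sketch writes it anyway, which is harmless when the token is empty.

```dart
// Stand-in for LLMNextTokenResult, reduced to the fields this sketch needs.
class TokenSnapshot {
  const TokenSnapshot(this.token, {this.isFinal = false});

  final String token;
  final bool isFinal;

  // Mirrors the documented convenience getter: final flag set or empty token.
  bool get isFinished => isFinal || token.isEmpty;
}

/// Polls [nextToken] until a finished snapshot arrives and joins the text.
/// A non-empty token on the finishing snapshot is kept rather than dropped.
String collectText(TokenSnapshot Function() nextToken) {
  final buffer = StringBuffer();
  while (true) {
    final next = nextToken();
    buffer.write(next.token);
    if (next.isFinished) {
      return buffer.toString();
    }
  }
}

/// Hypothetical fake source that replays a fixed list of snapshots and
/// then reports completion with an empty final snapshot.
TokenSnapshot Function() replay(List<TokenSnapshot> snapshots) {
  final iterator = snapshots.iterator;
  return () => iterator.moveNext()
      ? iterator.current
      : const TokenSnapshot('', isFinal: true);
}
```

With the real model, the same loop shape would presumably wrap waitForNextToken after a call to run. Accumulating into a StringBuffer avoids building a new string for every token.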
cleanUp
void cleanUp()
Cleans up LLM runtime state, including KV cache state managed by the native SDK.
close
void close()
Force-deinitializes the native LLM model handle.
After close(), the model handle is closed. Calling run, waitForNextToken, cleanUp, or close again throws MlangeException.
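Because a second close throws, one way to guard against it is to release the handle in a finally block and check isClosed first. The sketch below uses a local FakeModel that only mimics the documented behavior; StateError stands in for MlangeException, and FakeModel and useAndClose are hypothetical names.

```dart
// Minimal stand-in that mimics the documented close() behavior.
// StateError stands in for MlangeException, which is not defined here.
class FakeModel {
  bool _closed = false;

  bool get isClosed => _closed;

  String run(String text) {
    _ensureOpen();
    return 'ran: $text';
  }

  void close() {
    _ensureOpen();
    _closed = true;
  }

  void _ensureOpen() {
    if (_closed) throw StateError('model handle is closed');
  }
}

/// Runs [body] and always closes [model] afterwards, even if [body] throws.
/// Checking isClosed first avoids the error raised by a second close().
T useAndClose<T>(FakeModel model, T Function(FakeModel model) body) {
  try {
    return body(model);
  } finally {
    if (!model.isClosed) model.close();
  }
}
```

A call such as useAndClose(model, (m) => m.run('hi')) returns the result of the body and leaves the model closed, so any later call on it fails fast instead of touching a released handle.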
Generation Example
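final llm = await ZeticMLangeLLMModel.create(
  personalKey: personalKey,
  name: llmModelName,
);
// Start generation for the prompt.
llm.run('Write one sentence about on-device AI.');
// Read token snapshots until the runtime reports completion.
while (true) {
  final next = llm.waitForNextToken();
  if (next.isFinished) {
    break;
  }
  print(next.token);
}
// Reset runtime state, then release the native handle.
llm.cleanUp();
llm.close();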