ZeticMLangeLLMModel
API reference for running LLM inference on iOS with ZeticMLangeLLMModel.
The ZeticMLangeLLMModel class provides on-device LLM inference for iOS. It handles model downloading, quantization selection, KV cache management, and token-by-token streaming through a single unified Swift API.
Import
```swift
import ZeticMLange
```
Automatic Configuration (Recommended)
Creates a new LLM model instance with default settings optimized for the device.
```swift
init(personalKey: String, name: String) throws
```

| Parameter | Type | Description |
|---|---|---|
| personalKey | String | Your personal authentication key from the Melange Dashboard. |
| name | String | Identifier for the model. Accepts a pre-built model key or a Hugging Face repository ID. |
```swift
let model = try ZeticMLangeLLMModel(personalKey: "YOUR_PERSONAL_KEY", name: "YOUR_MODEL_NAME")
```

The initializer performs a network call on first use to download the model binary. The binary is cached locally after the first download, so subsequent initializations are fast.
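Because the first initialization downloads the model binary over the network, you may want to keep it off the main thread. A minimal sketch, assuming a standard GCD setup (the helper name and completion-handler shape are ours, not part of the SDK):

```swift
import Foundation
import ZeticMLange

// Illustrative helper: initialize the model on a background queue so the
// first-run download does not block the UI, then hand the result back on main.
func loadModelInBackground(completion: @escaping (Result<ZeticMLangeLLMModel, Error>) -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        do {
            let model = try ZeticMLangeLLMModel(
                personalKey: "YOUR_PERSONAL_KEY",
                name: "YOUR_MODEL_NAME"
            )
            DispatchQueue.main.async { completion(.success(model)) }
        } catch {
            DispatchQueue.main.async { completion(.failure(error)) }
        }
    }
}
```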
Custom Configuration (Advanced)
Creates a new LLM model instance with full control over inference mode, dataset evaluation, KV cache policy, and download progress reporting.
```swift
init(
    personalKey: String,
    name: String,
    version: Int? = nil,
    modelMode: LLMModelMode = .runSpeed,
    dataSetType: LLMDataSetType = .none,
    kvCacheCleanupPolicy: LLMKVCacheCleanupPolicy = .cleanUpOnFull,
    onProgress: ((Float) -> Void)? = nil
) throws
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| personalKey | String | - | Your personal authentication key. |
| name | String | - | Identifier for the model to download. |
| version | Int? | nil | Specific model version. If nil, the latest version is used. |
| modelMode | LLMModelMode | .runSpeed | Inference mode for backend selection. See LLMModelMode. |
| dataSetType | LLMDataSetType | .none | Dataset used for accuracy-based model selection. See LLMDataSetType. |
| kvCacheCleanupPolicy | LLMKVCacheCleanupPolicy | .cleanUpOnFull | Policy for handling the KV cache when it is full. See LLMKVCacheCleanupPolicy. |
| onProgress | ((Float) -> Void)? | nil | Callback reporting download progress as a Float from 0.0 to 1.0. |
```swift
let model = try ZeticMLangeLLMModel(
    personalKey: "YOUR_PERSONAL_KEY",
    name: "google/gemma-3-4b-it",
    version: nil,
    modelMode: .runSpeed,
    dataSetType: .none,
    kvCacheCleanupPolicy: .cleanUpOnFull,
    onProgress: { progress in
        print("Download progress: \(Int(progress * 100))%")
    }
)
```

Methods
run(_:)
Starts a generation context with the provided prompt. After calling run(), consume generated tokens by calling waitForNextToken() in a loop.
```swift
func run(_ prompt: String) throws -> LLMRunResult
```

| Parameter | Type | Description |
|---|---|---|
| prompt | String | The input prompt to begin generation. |

Returns: An LLMRunResult describing the initial run.
Throws: An error if the prompt cannot be processed.
waitForNextToken()
Blocks until the next token is generated and returns it. Call this in a loop after run() to stream the full response.
```swift
func waitForNextToken() -> LLMNextTokenResult
```

Returns: An LLMNextTokenResult with the following properties:

| Property | Type | Description |
|---|---|---|
| token | String | The generated token text. May be empty for special tokens. |
| generatedTokens | Int | Number of tokens generated so far. A value of 0 indicates generation is complete. |
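These two properties drive the standard streaming pattern: loop until generatedTokens returns 0, appending each token along the way. A compact sketch (the generate helper is illustrative, not part of the SDK):

```swift
import ZeticMLange

// Illustrative helper: run a prompt and collect the streamed tokens into a
// single string, stopping when generatedTokens reports 0.
func generate(with model: ZeticMLangeLLMModel, prompt: String) throws -> String {
    _ = try model.run(prompt)
    var output = ""
    while true {
        let result = model.waitForNextToken()
        if result.generatedTokens == 0 { break }  // generation finished
        output.append(result.token)               // token may be empty for special tokens
    }
    return output
}
```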
cleanUp()
Cleans up the model's context and releases resources. Required when using .doNotCleanUp KV cache policy before starting a new conversation.
```swift
func cleanUp()
```

When using LLMKVCacheCleanupPolicy.doNotCleanUp, you must call cleanUp() before calling run() again. Failing to do so may cause unexpected behavior.
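Under the .doNotCleanUp policy, a multi-turn flow therefore needs an explicit cleanUp() between conversations. A sketch under that assumption (the prompts are placeholders):

```swift
import ZeticMLange

// Sketch: two consecutive conversations under .doNotCleanUp, with the
// required cleanUp() call between them.
let model = try ZeticMLangeLLMModel(
    personalKey: "YOUR_PERSONAL_KEY",
    name: "YOUR_MODEL_NAME",
    kvCacheCleanupPolicy: .doNotCleanUp
)

_ = try model.run("First question")
while model.waitForNextToken().generatedTokens != 0 { /* consume tokens */ }

// Required before the next run(_:) when using .doNotCleanUp
model.cleanUp()

_ = try model.run("Second question")
```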
Compatible Model Inputs
ZeticMLangeLLMModel accepts two types of model identifiers for the name parameter:
- Pre-built models from the Melange Dashboard: select a ready-to-use model and use its key.
- Hugging Face repository IDs: pass the full repo ID directly (e.g., google/gemma-3-4b-it, LiquidAI/LFM2.5-1.2B-Instruct).
Currently supports public Hugging Face repositories with permissive open-source licenses. Private repository authentication is on the roadmap.
Full Working Example
```swift
import UIKit
import ZeticMLange

class ViewController: UIViewController {
    override func viewDidLoad() {
        super.viewDidLoad()
        do {
            // (1) Initialize the model
            let model = try ZeticMLangeLLMModel(
                personalKey: PERSONAL_KEY,
                name: "google/gemma-3-4b-it"
            )

            // (2) Start generation
            _ = try model.run("Explain on-device AI in one paragraph.")

            // (3) Stream tokens until generatedTokens reports 0
            var buffer = ""
            while true {
                let result = model.waitForNextToken()
                if result.generatedTokens == 0 {
                    break
                }
                buffer.append(result.token)
            }
            print(buffer)

            // (4) Clean up when done
            model.cleanUp()
        } catch {
            print("Melange error: \(error)")
        }
    }
}
```

Swift Package Manager Setup
Add the Melange package to your Xcode project:
- Open your project in Xcode
- Go to File > Add Package Dependencies
- Enter the repository URL: https://github.com/zetic-ai/ZeticMLangeiOS
- Click Add Package
- Select ZeticMLange and link it to your app target
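If you manage dependencies through a Package.swift manifest rather than the Xcode UI, the equivalent entry looks roughly like this. The version requirement is a placeholder, not a published release number; check the repository's tags before pinning:

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyApp",
    dependencies: [
        // Version requirement is illustrative; pin to a real released tag.
        .package(url: "https://github.com/zetic-ai/ZeticMLangeiOS", from: "1.0.0")
    ],
    targets: [
        .target(
            name: "MyApp",
            dependencies: [.product(name: "ZeticMLange", package: "ZeticMLangeiOS")]
        )
    ]
)
```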
See Also
- ZeticMLangeLLMModel (Android): Android equivalent
- LLM Inference Modes: Choosing the right inference mode
- Enums and Constants: All enum types used by this class
- iOS Integration Guide: Step-by-step setup guide