Melange
API Reference: iOS

ZeticMLangeLLMModel

API reference for running LLM inference on iOS with ZeticMLangeLLMModel.

The ZeticMLangeLLMModel class provides on-device LLM inference for iOS. It handles model downloading, quantization selection, KV cache management, and token-by-token streaming through a single unified Swift API.

Import

import ZeticMLange

Initializers

Default Configuration

Creates a new LLM model instance with default settings optimized for the device.

init(personalKey: String, name: String) throws
| Parameter | Type | Description |
| --- | --- | --- |
| personalKey | String | Your personal authentication key from the Melange Dashboard. |
| name | String | Identifier for the model. Accepts a pre-built model key or a Hugging Face repository ID. |

let model = try ZeticMLangeLLMModel(personalKey: "YOUR_PERSONAL_KEY", name: "YOUR_MODEL_NAME")

The initializer performs a network call on first use to download the model binary. The binary is cached locally after the first download, so subsequent initializations are fast.
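Because that first download can take time, you may prefer to construct the model off the main thread. A minimal sketch, assuming the wrapper type and queue choice below (which are illustrative, not part of the SDK):

```swift
import ZeticMLange
import Foundation

final class ModelProvider {
    // Illustrative holder; ZeticMLange does not require this wrapper.
    private(set) var model: ZeticMLangeLLMModel?

    func load(personalKey: String, name: String,
              completion: @escaping (Result<ZeticMLangeLLMModel, Error>) -> Void) {
        // The first call downloads the model binary; later calls hit the local cache.
        DispatchQueue.global(qos: .userInitiated).async {
            do {
                let model = try ZeticMLangeLLMModel(personalKey: personalKey, name: name)
                DispatchQueue.main.async {
                    self.model = model
                    completion(.success(model))
                }
            } catch {
                DispatchQueue.main.async { completion(.failure(error)) }
            }
        }
    }
}
```

Dispatching back to the main queue keeps the completion handler safe to use for UI updates.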


Custom Configuration (Advanced)

Creates a new LLM model instance with full control over inference mode, dataset evaluation, KV cache policy, and download progress reporting.

init(
    personalKey: String,
    name: String,
    version: Int? = nil,
    modelMode: LLMModelMode = .runSpeed,
    dataSetType: LLMDataSetType = .none,
    kvCacheCleanupPolicy: LLMKVCacheCleanupPolicy = .cleanUpOnFull,
    onProgress: ((Float) -> Void)? = nil
) throws
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| personalKey | String | - | Your personal authentication key. |
| name | String | - | Identifier for the model to download. |
| version | Int? | nil | Specific model version. If nil, the latest version is used. |
| modelMode | LLMModelMode | .runSpeed | Inference mode for backend selection. See LLMModelMode. |
| dataSetType | LLMDataSetType | .none | Dataset used for accuracy-based model selection. See LLMDataSetType. |
| kvCacheCleanupPolicy | LLMKVCacheCleanupPolicy | .cleanUpOnFull | Policy for handling the KV cache when it is full. See LLMKVCacheCleanupPolicy. |
| onProgress | ((Float) -> Void)? | nil | Callback reporting download progress as a Float from 0.0 to 1.0. |

let model = try ZeticMLangeLLMModel(
    personalKey: "YOUR_PERSONAL_KEY",
    name: "google/gemma-3-4b-it",
    version: nil,
    modelMode: .runSpeed,
    dataSetType: .none,
    kvCacheCleanupPolicy: .cleanUpOnFull,
    onProgress: { progress in
        print("Download progress: \(Int(progress * 100))%")
    }
)

Methods

run(_:)

Starts a generation context with the provided prompt. After calling run(), consume generated tokens by calling waitForNextToken() in a loop.

func run(_ prompt: String) throws -> LLMRunResult
| Parameter | Type | Description |
| --- | --- | --- |
| prompt | String | The input prompt to begin generation. |

Returns: An LLMRunResult describing the initial run.

Throws: An error if the prompt cannot be processed.
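A minimal call looks like the sketch below (the prompt is illustrative, and `model` is assumed to be an already-initialized ZeticMLangeLLMModel):

```swift
// Assumes `model` is an initialized ZeticMLangeLLMModel.
_ = try model.run("Summarize the benefits of on-device inference.")
// Tokens are now available via waitForNextToken().
```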


waitForNextToken()

Blocks until the next token is generated and returns it. Call this in a loop after run() to stream the full response.

func waitForNextToken() -> LLMNextTokenResult

Returns: LLMNextTokenResult with the following properties:

| Property | Type | Description |
| --- | --- | --- |
| token | String | The generated token text. May be empty for special tokens. |
| generatedTokens | Int | Number of tokens generated so far. A value of 0 indicates generation is complete. |
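The typical consumption pattern is a blocking loop, usually run off the main thread. A sketch, assuming `model` is already initialized and `onToken` is your own callback (both names are illustrative):

```swift
import ZeticMLange

// Streams one full response, forwarding each non-empty token to the caller.
func streamResponse(from model: ZeticMLangeLLMModel,
                    prompt: String,
                    onToken: (String) -> Void) throws {
    _ = try model.run(prompt)
    while true {
        let result = model.waitForNextToken()
        // generatedTokens == 0 signals the end of generation.
        guard result.generatedTokens > 0 else { break }
        if !result.token.isEmpty {
            onToken(result.token)
        }
    }
}
```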

cleanUp()

Cleans up the model's generation context and releases its resources. When the .doNotCleanUp KV cache policy is in use, this call is required before starting a new conversation.

func cleanUp()

When using LLMKVCacheCleanupPolicy.doNotCleanUp, you must call cleanUp() before calling run() again. Failing to do so may cause unexpected behavior.
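For example, under .doNotCleanUp a second conversation must be preceded by an explicit cleanup. A sketch (the prompts and key placeholders are illustrative):

```swift
import ZeticMLange

let model = try ZeticMLangeLLMModel(
    personalKey: "YOUR_PERSONAL_KEY",
    name: "YOUR_MODEL_NAME",
    kvCacheCleanupPolicy: .doNotCleanUp
)

// First conversation: drain all tokens.
_ = try model.run("First conversation prompt")
while model.waitForNextToken().generatedTokens > 0 { /* consume tokens */ }

// Required before the next run() under .doNotCleanUp.
model.cleanUp()

// Second conversation starts from a fresh context.
_ = try model.run("Second conversation prompt")
```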


Compatible Model Inputs

ZeticMLangeLLMModel accepts two types of model identifiers for the name parameter:

  1. Pre-built models from the Melange Dashboard: select a ready-to-use model and use its key.
  2. Hugging Face repository IDs: pass the full repo ID directly (e.g., google/gemma-3-4b-it, LiquidAI/LFM2.5-1.2B-Instruct).

Currently, only public Hugging Face repositories with permissive open-source licenses are supported. Private repository authentication is on the roadmap.


Full Working Example

import ZeticMLange

class ViewController: UIViewController {
    override func viewDidLoad() {
        super.viewDidLoad()

        do {
            // (1) Initialize the model
            let model = try ZeticMLangeLLMModel(
                personalKey: PERSONAL_KEY,
                name: "google/gemma-3-4b-it"
            )

            // (2) Start generation
            _ = try model.run("Explain on-device AI in one paragraph.")

            // (3) Stream tokens (this loop blocks; run it off the main thread in production)
            var buffer = ""

            while true {
                let waitResult = model.waitForNextToken()
                let token = waitResult.token
                let generatedTokens = waitResult.generatedTokens

                if generatedTokens == 0 {
                    break
                }

                buffer.append(token)
            }

            let output = buffer
            print(output)

            // (4) Clean up when done
            model.cleanUp()
        } catch {
            print("Melange error: \(error)")
        }
    }
}

Swift Package Manager Setup

Add the Melange package to your Xcode project:

  1. Open your project in Xcode
  2. Go to File > Add Package Dependencies
  3. Enter the repository URL: https://github.com/zetic-ai/ZeticMLangeiOS
  4. Click Add Package
  5. Select ZeticMLange and link it to your app target
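If you manage dependencies in a Package.swift manifest rather than the Xcode UI, the equivalent declaration looks roughly like this. The version requirement, platform minimum, and target name are illustrative; pin to the release you actually use:

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyApp",
    platforms: [.iOS(.v15)],  // illustrative minimum; check the SDK's requirements
    dependencies: [
        // Version is illustrative; pin to a real release tag.
        .package(url: "https://github.com/zetic-ai/ZeticMLangeiOS", from: "1.0.0"),
    ],
    targets: [
        .target(
            name: "MyApp",
            dependencies: [
                .product(name: "ZeticMLange", package: "ZeticMLangeiOS")
            ]
        )
    ]
)
```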

See Also