LLM Inference Overview

Run large language models on-device with ZETIC Melange.

Melange provides instant on-device LLM deployment, abstracting the complexity of memory management, tensor offloading, and backend fragmentation into a simple API.

How It Works

  1. Select a model: Choose from pre-built models on the Melange Dashboard or use a Hugging Face Repository ID.
  2. Initialize: Set up your workspace via the Dashboard and generate a Personal Key.
  3. Stream tokens: Initialize the LLM engine and start streaming tokens in your app.

Supported Input Sources

  • Pre-built Models: Select a ready-to-use model from the Melange Dashboard.
  • Hugging Face Repository ID: Use models like google/gemma-3-4b-it or LiquidAI/LFM2.5-1.2B-Instruct.

Melange currently supports public repositories with permissive open-source licenses. Private repository authentication is on the roadmap.
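Both input sources are addressed through the same MODEL_NAME parameter; for a Hugging Face model, pass the repository ID directly. A minimal sketch, assuming the Kotlin constructor shown in the Quick Example below:

```kotlin
// Hugging Face repository ID used as the model name (from the supported list above).
val MODEL_NAME = "google/gemma-3-4b-it"

// Same constructor works for pre-built Dashboard models and repository IDs alike.
val model = ZeticMLangeLLMModel(context, PERSONAL_KEY, MODEL_NAME)
```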

Quick Example

Kotlin:

val model = ZeticMLangeLLMModel(context, PERSONAL_KEY, MODEL_NAME)

// Start generation
model.run("What is on-device AI?")

val sb = StringBuilder()
while (true) {
    val result = model.waitForNextToken()
    if (result.generatedTokens == 0) break
    sb.append(result.token)
}

val output = sb.toString()

Swift:

let model = try ZeticMLangeLLMModel(personalKey: PERSONAL_KEY, name: MODEL_NAME)

// Start generation
try model.run("What is on-device AI?")

var buffer = ""
while true {
    let result = model.waitForNextToken()
    if result.generatedTokens == 0 { break }
    buffer.append(result.token)
}

let output = buffer

Quick Start Templates

Build a complete chat app with just your PERSONAL_KEY and MODEL_NAME. Check each template repository's README for detailed setup instructions.

API Reference

Initialization

Automatic configuration (Recommended):

init(personalKey: String, name: String)

Automatically downloads and initializes the model with default settings optimized for the device.

Custom configuration (Advanced):

init(
    personalKey: String,
    name: String,
    version: String? = nil,
    modelMode: LLMModelMode,
    dataSetType: LLMDataSetType,
    kvCacheCleanupPolicy: LLMKVCacheCleanupPolicy = CLEAN_UP_ON_FULL,
    onProgress: ((Float) -> Void)? = nil
)
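A custom initialization on Android might look like the sketch below. It assumes the Kotlin constructor mirrors the Swift initializer above; the enum case names LLMModelMode.DEFAULT and LLMDataSetType.DEFAULT are illustrative placeholders, not confirmed SDK identifiers.

```kotlin
// Sketch only: assumes a Kotlin constructor mirroring the Swift initializer above.
// LLMModelMode.DEFAULT and LLMDataSetType.DEFAULT are placeholder case names.
val model = ZeticMLangeLLMModel(
    context,
    PERSONAL_KEY,
    MODEL_NAME,
    version = null,                 // null = latest published version
    modelMode = LLMModelMode.DEFAULT,
    dataSetType = LLMDataSetType.DEFAULT,
    kvCacheCleanupPolicy = LLMKVCacheCleanupPolicy.CLEAN_UP_ON_FULL,
    onProgress = { progress ->
        // Called with download progress in [0.0, 1.0].
        println("Downloading model: ${(progress * 100).toInt()}%")
    }
)
```

Passing onProgress is useful for showing a download indicator on first launch, since the model weights are fetched before the first run.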

Context Management

  • run(prompt): Starts a conversation with the provided prompt. Returns LLMRunResult.
  • waitForNextToken(): Returns the next generated token. An empty token (generatedTokens == 0) indicates completion.
  • cleanUp(): Cleans up the context of the running model.
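The streaming loop from the Quick Example can be wrapped in a helper that drains tokens and then releases the context. The sketch below stubs the model with hypothetical TokenResult/StubModel types purely to show the control flow; only the method names mirror the real API.

```kotlin
// Hypothetical stand-ins for the SDK types, for illustration only.
data class TokenResult(val token: String, val generatedTokens: Int)

class StubModel(private val tokens: List<String>) {
    private var i = 0
    fun run(prompt: String) { i = 0 }
    fun waitForNextToken(): TokenResult {
        // Empty token with generatedTokens == 0 signals completion,
        // mirroring the break condition in the Quick Example.
        if (i >= tokens.size) return TokenResult("", 0)
        return TokenResult(tokens[i++], 1)
    }
    fun cleanUp() { i = 0 }
}

// Drain all tokens for one prompt, then release the context.
fun generate(model: StubModel, prompt: String): String {
    model.run(prompt)
    val sb = StringBuilder()
    while (true) {
        val result = model.waitForNextToken()
        if (result.generatedTokens == 0) break
        sb.append(result.token)
    }
    model.cleanUp()
    return sb.toString()
}

fun main() {
    println(generate(StubModel(listOf("On-device ", "AI")), "What is on-device AI?"))
    // prints "On-device AI"
}
```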

Next Steps