Key Concepts

Essential terminology and concepts for working with ZETIC Melange.

This page defines the core terms and concepts you will encounter when working with ZETIC Melange.

Model Key

A unique identifier for a deployed model on the Melange platform. Model keys follow the format owner/model-name (e.g., google/MediaPipe-Face-Detection, OpenAI/whisper-tiny-encoder). When you upload a model through the dashboard or CLI, Melange assigns it a model key that you reference in your mobile application code.
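The owner/model-name format can be handled with a small helper. This is an illustrative sketch, not part of the Melange SDK; `parseModelKey` and `ModelKey` are hypothetical names:

```kotlin
// Illustrative helper (not part of the Melange SDK): split a model key
// of the form "owner/model-name" into its two components.
data class ModelKey(val owner: String, val name: String)

fun parseModelKey(key: String): ModelKey {
    val parts = key.split("/")
    require(parts.size == 2 && parts.all { it.isNotBlank() }) {
        "Model key must have the form owner/model-name: $key"
    }
    return ModelKey(parts[0], parts[1])
}
```

For example, `parseModelKey("google/MediaPipe-Face-Detection")` yields owner `google` and name `MediaPipe-Face-Detection`.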

Personal Key

An authentication token that identifies your Melange account. You generate a Personal Key from the Melange Dashboard and use it when initializing models in your Android or iOS application. The Personal Key controls access to the models associated with your account.

Keep your Personal Key secure. Do not commit it to public repositories or embed it in client-side code that can be easily decompiled.
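On Android, one common way to keep the key out of version control is to load it from an untracked properties file at build time. The sketch below uses Gradle's Kotlin DSL; the property name `melange.personalKey` and the field name `MELANGE_PERSONAL_KEY` are illustrative choices, not Melange conventions:

```kotlin
// app/build.gradle.kts (sketch): read the key from local.properties,
// which is excluded from version control by default.
import java.util.Properties

val localProps = Properties().apply {
    val f = rootProject.file("local.properties")
    if (f.exists()) f.inputStream().use { load(it) }
}

android {
    defaultConfig {
        // Access in code as BuildConfig.MELANGE_PERSONAL_KEY
        buildConfigField(
            "String",
            "MELANGE_PERSONAL_KEY",
            "\"${localProps.getProperty("melange.personalKey", "")}\""
        )
    }
    buildFeatures { buildConfig = true }
}
```

Note that a key embedded this way still ships inside the APK; this only keeps it out of your repository, not out of the hands of someone decompiling the app.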

Inference Mode

Melange supports multiple inference modes that balance speed and accuracy for your deployed models.

General Model Modes

  • RUN_AUTO (default): Automatically selects the fastest configuration while maintaining high accuracy (SNR > 20 dB). Best for most use cases.
  • RUN_SPEED: Maximizes inference speed with minimum latency. Best for real-time applications where response time is the top priority.
  • RUN_ACCURACY: Delivers the highest precision based on maximum SNR scores. Best for applications where accuracy is more critical than speed.
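The trade-off between the three modes can be pictured as a simple selection rule over candidate configurations. This is an illustrative sketch of the idea, not the SDK's actual implementation; the 20 dB threshold is taken from the mode descriptions above, and `Candidate` is a hypothetical type:

```kotlin
// Illustrative sketch of the mode trade-off (not the SDK's real logic).
enum class InferenceMode { RUN_AUTO, RUN_SPEED, RUN_ACCURACY }

data class Candidate(val name: String, val latencyMs: Double, val snrDb: Double)

fun select(mode: InferenceMode, candidates: List<Candidate>): Candidate = when (mode) {
    // Fastest configuration that still keeps SNR above 20 dB.
    InferenceMode.RUN_AUTO ->
        candidates.filter { it.snrDb > 20.0 }.minByOrNull { it.latencyMs }
            ?: candidates.maxByOrNull { it.snrDb }!!
    // Minimum latency regardless of accuracy.
    InferenceMode.RUN_SPEED -> candidates.minByOrNull { it.latencyMs }!!
    // Maximum SNR regardless of speed.
    InferenceMode.RUN_ACCURACY -> candidates.maxByOrNull { it.snrDb }!!
}
```

Given, say, an FP16 build at 5 ms / 35 dB, an INT8 build at 3 ms / 22 dB, and an INT4 build at 2 ms / 12 dB, RUN_SPEED picks INT4, RUN_ACCURACY picks FP16, and RUN_AUTO picks INT8 (the fastest option above the accuracy floor).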

LLM Model Modes

LLMs have a dedicated set of inference modes optimized for token generation on different hardware backends. See the LLM Inference Modes documentation for details.

Performance-Adaptive Deployment

Melange does not rely on static rules (e.g., "use GPU if version > X") to determine which model binary to serve. Instead, it performs on-target performance measurement by running your model on a farm of 200+ physical devices covering a wide range of chipsets and OS versions.

Based on actual latency, throughput, and stability measurements, Melange selects the optimal model binary for each specific device model. When a user installs your app, the Melange runtime automatically fetches the best-performing binary for their device.
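The measurement-driven selection described above can be sketched as a lookup from device model to the best measured binary. This is an illustrative sketch of the concept, not the platform's actual code; `Measurement`, `bestBinary`, and the fallback name are hypothetical:

```kotlin
// Illustrative sketch (not the platform's actual code): pick the binary
// with the best measured latency for a given device model, falling back
// to a generic CPU build when no measurement exists.
data class Measurement(val binary: String, val latencyMs: Double, val stable: Boolean)

fun bestBinary(
    measurements: Map<String, List<Measurement>>,  // device model -> benchmark results
    deviceModel: String,
    fallback: String = "generic-cpu"
): String =
    measurements[deviceModel]
        ?.filter { it.stable }          // discard unstable configurations
        ?.minByOrNull { it.latencyMs }  // fastest of the remainder
        ?.binary
        ?: fallback
```

The key design point is that the decision keys on measured results per device model rather than on static rules about chipset or OS version.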

For details, see Performance-Adaptive Deployment.

ZeticMLangeModel

The primary SDK class for running general AI models (computer vision, audio, etc.) on-device. It handles model download, NPU initialization, and inference execution.

// Kotlin
val model = ZeticMLangeModel(context, PERSONAL_KEY, MODEL_NAME)
val outputs = model.run(inputs)

// Swift
let model = try ZeticMLangeModel(personalKey: PERSONAL_KEY, name: MODEL_NAME)
let outputs = try model.run(inputs)

ZeticMLangeLLMModel

A specialized SDK class for running Large Language Models on-device with token streaming. It manages the full LLM lifecycle including KV-cache management, token generation, and multi-backend orchestration.

// Kotlin
val model = ZeticMLangeLLMModel(context, PERSONAL_KEY, MODEL_NAME)
model.run("prompt")

while (true) {
    val result = model.waitForNextToken()
    if (result.generatedTokens == 0) break
    print(result.token)
}

// Swift
let model = try ZeticMLangeLLMModel(personalKey: PERSONAL_KEY, name: MODEL_NAME)
try model.run("prompt")

while true {
    let result = model.waitForNextToken()
    if result.generatedTokens == 0 { break }
    print(result.token)
}

ZeticMLangeHFModel

A convenience class for loading models directly from Hugging Face using only a repository ID. No personal key or model key is required. The SDK automatically downloads, compiles, and caches the model on first use.

// Kotlin
val model = ZeticMLangeHFModel(context, "zetic-ai/yolov11n")
val outputs = model.run(arrayOf(inputTensor))

// Swift
let model = try await ZeticMLangeHFModel("zetic-ai/yolov11n")
let outputs = try model.run(inputs: [inputTensor])

See Hugging Face Models for details.

Global Device Benchmark

Melange maintains a continuously updated benchmark database spanning 200+ physical devices across Android and iOS. This benchmark farm covers:

  • Chipset vendors: Qualcomm Snapdragon, MediaTek Dimensity, Samsung Exynos, Apple A-series and M-series
  • Processing units: CPU, GPU, and NPU on each device
  • Device tiers: Flagship, mid-range, and budget devices

The benchmark data drives the Performance-Adaptive Deployment system, ensuring every end user receives the best-performing model binary for their specific hardware.

Model Pipeline

Some applications require chaining multiple models together. For example, face landmark detection uses a two-model pipeline:

  1. Face Detection: Locates faces in the image
  2. Face Landmark: Extracts landmark points from the detected face region

Each model in the pipeline is a separate ZeticMLangeModel instance, and you pass the output of one model as input to the next. See the Face Landmark tutorial for a complete example.
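The chaining pattern can be sketched with generic stages, where each stage stands in for one model and the output type of one stage is the input type of the next. This is an illustrative sketch; `Stage`, `Box`, and `pipeline` are hypothetical names, and the real SDK passes tensors between ZeticMLangeModel instances:

```kotlin
// Illustrative two-stage pipeline (names are hypothetical, not SDK API).
fun interface Stage<I, O> { fun run(input: I): O }

data class Box(val x: Int, val y: Int, val w: Int, val h: Int)

// Compose two stages so the first stage's output feeds the second.
fun <I, M, O> pipeline(first: Stage<I, M>, second: Stage<M, O>): Stage<I, O> =
    Stage { input -> second.run(first.run(input)) }

fun main() {
    // Stand-ins for the face-detection and face-landmark models.
    val detect = Stage<String, Box> { _ -> Box(10, 20, 64, 64) }
    val landmarks = Stage<Box, List<Pair<Int, Int>>> { box ->
        listOf(box.x to box.y, box.x + box.w to box.y + box.h)
    }
    val faceLandmark = pipeline(detect, landmarks)
    println(faceLandmark.run("image"))
}
```

The type parameters make the contract explicit: the second model can only consume what the first model produces.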

Next Steps