Key Concepts
Essential terminology and concepts for working with ZETIC Melange.
The sections below define the core terms you will encounter when deploying and running models with ZETIC Melange.
Model Key
A unique identifier for a deployed model on the Melange platform. Model keys follow the format owner/model-name (e.g., google/MediaPipe-Face-Detection, OpenAI/whisper-tiny-encoder). When you upload a model through the dashboard or CLI, Melange assigns it a model key that you reference in your mobile application code.
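Because a model key always has the two-part `owner/model-name` shape, it can be validated and split before use. A minimal illustrative sketch; this helper is not part of the ZETIC Melange SDK:

```kotlin
// Illustrative only: splits a Melange model key of the form "owner/model-name"
// into its owner and model-name parts, or returns null if malformed.
// Not part of the ZETIC Melange SDK.
fun parseModelKey(key: String): Pair<String, String>? {
    val parts = key.split("/")
    if (parts.size != 2 || parts.any { it.isBlank() }) return null
    return Pair(parts[0], parts[1])
}
```

For example, `parseModelKey("google/MediaPipe-Face-Detection")` yields the owner `google` and the model name `MediaPipe-Face-Detection`.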
Personal Key
An authentication token that identifies your Melange account. You generate a Personal Key from the Melange Dashboard and use it when initializing models in your Android or iOS application. The Personal Key controls access to the models associated with your account.
Keep your Personal Key secure. Do not commit it to public repositories or embed it in client-side code that can be easily decompiled.
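One common pattern is to inject the key at build or run time (for example from `local.properties`, a CI secret, or an environment variable) instead of hardcoding it. A hedged sketch, assuming the key is exposed through an environment map; the variable name `MELANGE_PERSONAL_KEY` is a hypothetical choice, not a name defined by the SDK:

```kotlin
// Hypothetical helper: resolves the Personal Key from a configuration map
// (e.g. the result of System.getenv()) so the secret never lands in
// source control. The variable name MELANGE_PERSONAL_KEY is illustrative.
fun resolvePersonalKey(env: Map<String, String>): String =
    env["MELANGE_PERSONAL_KEY"]?.takeIf { it.isNotBlank() }
        ?: error("MELANGE_PERSONAL_KEY is not set; configure it outside source control")
```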
Inference Mode
Melange supports multiple inference modes that balance speed and accuracy for your deployed models.
General Model Modes
| Mode | Description | Best For |
|---|---|---|
| RUN_AUTO (Default) | Automatically selects the fastest configuration while maintaining high accuracy (SNR > 20dB) | Most use cases |
| RUN_SPEED | Maximizes inference speed with minimum latency | Real-time applications where response time is the top priority |
| RUN_ACCURACY | Delivers the highest precision based on maximum SNR scores | Applications where accuracy is more critical than speed |
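The RUN_AUTO rule in the table (fastest configuration whose accuracy stays above the SNR > 20 dB threshold) can be illustrated with a small selection sketch. The data class and threshold handling here are illustrative only, not SDK code:

```kotlin
// Illustrative sketch of the RUN_AUTO rule described above: among candidate
// configurations, keep those with SNR above 20 dB, then pick the one with
// the lowest latency. Not part of the ZETIC Melange SDK.
data class Candidate(val name: String, val latencyMs: Double, val snrDb: Double)

fun selectAuto(candidates: List<Candidate>, minSnrDb: Double = 20.0): Candidate? =
    candidates.filter { it.snrDb > minSnrDb }.minByOrNull { it.latencyMs }
```

Under the same sketch, RUN_SPEED would simply take the global latency minimum and RUN_ACCURACY the global SNR maximum.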
LLM Model Modes
LLM models have their own set of inference modes optimized for token generation on different hardware backends. See the LLM Inference Modes documentation for details.
Performance-Adaptive Deployment
Melange does not rely on static rules (e.g., "use GPU if version > X") to determine which model binary to serve. Instead, it performs on-target performance measurement by running your model on a farm of 200+ physical devices covering a wide range of chipsets and OS versions.
Based on actual latency, throughput, and stability measurements, Melange selects the optimal model binary for each specific device model. When a user installs your app, the Melange runtime automatically fetches the best-performing binary for their device.
For details, see Performance-Adaptive Deployment.
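The measurement-driven selection described above can be sketched as a reduction over benchmark runs; the data shapes here are assumptions for illustration, not Melange internals:

```kotlin
// Illustrative sketch of performance-adaptive selection: for each device
// model, keep only stable runs and pick the binary with the best measured
// latency. The Measurement shape is an assumption, not a Melange type.
data class Measurement(
    val device: String,
    val binary: String,
    val latencyMs: Double,
    val stable: Boolean
)

fun bestBinaryPerDevice(measurements: List<Measurement>): Map<String, String> =
    measurements
        .filter { it.stable }
        .groupBy { it.device }
        .mapValues { (_, runs) -> runs.minByOrNull { it.latencyMs }!!.binary }
```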
ZeticMLangeModel
The primary SDK class for running general AI models (computer vision, audio, etc.) on-device. It handles model download, NPU initialization, and inference execution.
Kotlin (Android):

```kotlin
val model = ZeticMLangeModel(context, PERSONAL_KEY, MODEL_NAME)
val outputs = model.run(inputs)
```

Swift (iOS):

```swift
let model = try ZeticMLangeModel(personalKey: PERSONAL_KEY, name: MODEL_NAME)
let outputs = try model.run(inputs)
```

ZeticMLangeLLMModel
A specialized SDK class for running Large Language Models on-device with token streaming. It manages the full LLM lifecycle including KV-cache management, token generation, and multi-backend orchestration.
Kotlin (Android):

```kotlin
val model = ZeticMLangeLLMModel(context, PERSONAL_KEY, MODEL_NAME)
model.run("prompt")
while (true) {
    val result = model.waitForNextToken()
    if (result.generatedTokens == 0) break
    print(result.token)
}
```

Swift (iOS):

```swift
let model = try ZeticMLangeLLMModel(personalKey: PERSONAL_KEY, name: MODEL_NAME)
try model.run("prompt")
while true {
    let result = model.waitForNextToken()
    if result.generatedTokens == 0 { break }
    print(result.token)
}
```

ZeticMLangeHFModel
A convenience class for loading models directly from Hugging Face using only a repository ID. No personal key or model key is required. The SDK automatically downloads, compiles, and caches the model on first use.
Kotlin (Android):

```kotlin
val model = ZeticMLangeHFModel(context, "zetic-ai/yolov11n")
val outputs = model.run(arrayOf(inputTensor))
```

Swift (iOS):

```swift
let model = try await ZeticMLangeHFModel("zetic-ai/yolov11n")
let outputs = try model.run(inputs: [inputTensor])
```

See Hugging Face Models for details.
Global Device Benchmark
Melange maintains a continuously updated benchmark database spanning 200+ physical devices across Android and iOS. This benchmark farm covers:
- Chipset vendors: Qualcomm Snapdragon, MediaTek Dimensity, Samsung Exynos, Apple A-series and M-series
- Processing units: CPU, GPU, and NPU on each device
- Device tiers: Flagship, mid-range, and budget devices
The benchmark data drives the Performance-Adaptive Deployment system, ensuring every end user receives the best-performing model binary for their specific hardware.
Model Pipeline
Some applications require chaining multiple models together. For example, face landmark detection uses a two-model pipeline:
- Face Detection: Locates faces in the image
- Face Landmark: Extracts landmark points from the detected face region
Each model in the pipeline is a separate ZeticMLangeModel instance, and you pass the output of one model as input to the next. See the Face Landmark tutorial for a complete example.
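The chaining pattern above can be sketched generically; plain functions stand in for the two `ZeticMLangeModel` instances, and the `Region` type is an illustrative placeholder rather than an SDK tensor type:

```kotlin
// Illustrative two-stage pipeline: the detector's output (a face region)
// becomes the landmark model's input, mirroring the Face Detection ->
// Face Landmark flow described above. Not SDK code.
data class Region(val x: Int, val y: Int, val w: Int, val h: Int)

fun <A, B, C> pipeline(detect: (A) -> B, refine: (B) -> C): (A) -> C =
    { image -> refine(detect(image)) }
```

Composing the stages this way keeps each model independently testable while the combined function runs them back to back.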
Next Steps
- Platform Support Matrix: Check supported devices and formats
- Quick Start: Deploy your first model
- Tutorials: Step-by-step guides for specific models