ZETIC.MLange

Architecture & Workflow

The comprehensive pipeline for model acquisition, provisioning, and runtime integration

Model Ingestion & Analysis

ZETIC.MLange natively supports industry-standard formats (ONNX, PyTorch Exported Program). TorchScript is also supported but is slated for deprecation. The pipeline begins with a detailed graph analysis that determines the optimal compilation strategy for heterogeneous NPU targets.

For details, refer to Model Preparation.
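To make the graph-analysis step concrete, here is a minimal, purely illustrative sketch of the kind of operator-coverage check such a pass might run. The operator names and the NPU support table are hypothetical and do not reflect ZETIC.MLange's actual internals:

```python
# Toy graph analysis: partition a model's operators into NPU-eligible ops
# and CPU fallbacks, then report NPU coverage. All names are hypothetical.
from collections import Counter

NPU_SUPPORTED_OPS = {"Conv", "Relu", "MatMul", "Add", "Softmax"}

def plan_compilation(op_graph):
    """Split a linear op list into NPU-eligible ops and CPU fallbacks."""
    counts = Counter(op_graph)
    npu_ops = {op for op in counts if op in NPU_SUPPORTED_OPS}
    cpu_fallbacks = set(counts) - npu_ops
    coverage = sum(counts[op] for op in npu_ops) / sum(counts.values())
    return {"npu_ops": npu_ops, "cpu_fallbacks": cpu_fallbacks,
            "npu_coverage": coverage}

plan = plan_compilation(["Conv", "Relu", "Conv", "Relu",
                         "MatMul", "TopK", "Softmax"])
```

A low coverage figure would steer the compiler toward a hybrid NPU/CPU partition rather than a single NPU binary.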

Automated Infrastructure Pipeline

Eliminate manual hardware configuration. Initializing a project triggers our fully automated backend orchestration, which handles the entire optimization lifecycle:

  • Hardware-Aware Graph Optimization: Automated operator fusion and IR lowering tailored for specific NPU architectures (HTP, DSP, ANE).
  • Distributed On-Device Profiling: Parallel execution on 200+ physical devices to validate optimizations in the exact environment where your application is deployed.
  • Optimal Model Dispatch: Data-driven deployment that maps the highest-throughput binary to each user's specific SoC.
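The dispatch step above amounts to a lookup built from profiling data. A minimal sketch follows; the SoC names, binary variants, and throughput figures are hypothetical, not real profiling output:

```python
# Toy data-driven dispatch: map each SoC to the compiled binary with the
# highest measured throughput. All names and numbers are hypothetical.

# profiling_results[soc][binary] = measured throughput (inferences/sec)
profiling_results = {
    "snapdragon-8-gen-3": {"htp-int8": 410.0, "gpu-fp16": 190.0, "cpu-fp32": 35.0},
    "dimensity-9300":     {"apu-int8": 380.0, "gpu-fp16": 210.0, "cpu-fp32": 40.0},
    "legacy-soc":         {"cpu-fp32": 22.0},
}

def build_dispatch_table(results):
    """Pick the highest-throughput binary for each profiled SoC."""
    return {soc: max(binaries, key=binaries.get)
            for soc, binaries in results.items()}

dispatch = build_dispatch_table(profiling_results)
```

At serve time, the runtime would look up the user's SoC in this table and download only the matching binary.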

Use the Web Dashboard or CLI to start.

Unified Runtime Execution

Deploy optimized models using a single, standardized API. The ZETIC.MLange engine abstracts all low-level hardware execution logic, including NPU context scheduling and zero-copy memory allocation, and handles it transparently at runtime.

Refer to the Android Integration Guide.

// Initialize the model with your credentials and project name
val model = ZeticMLangeModel(CONTEXT, PERSONAL_KEY, PROJECT_NAME)
// Run inference; hardware selection is handled transparently by the runtime
val output = model.run(YOUR_INPUT_TENSORS)

Refer to the iOS Integration Guide.

// Initialize the model with your credentials and project name
let model = try ZeticMLangeModel(PERSONAL_KEY, PROJECT_NAME)
// Run inference; hardware selection is handled transparently by the runtime
let output = try model.run(YOUR_INPUT_TENSORS)

Performance Telemetry

With the Web Dashboard, you gain deep visibility into your model's on-device behavior via centralized analytics:

  • Real-Time Pipeline Observability: Track the granular status of graph compilation, quantization, and NPU kernel generation across the cluster.
  • Device-Tier Performance Matrix: Visualize latency distributions and throughput (inferences per second, IPS) across high-end, mid-range, and legacy mobile SoCs.
  • Resource Efficiency Analysis: Monitor memory footprint and thermal stability to ensure your model meets strict production Service Level Objectives (SLOs).
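The device-tier matrix boils down to percentile aggregation over per-device latency samples. A minimal sketch, using hypothetical tier names and sample values rather than real dashboard data:

```python
# Toy telemetry aggregation: summarize per-tier latency samples into
# p50/p95 figures. Tier names and sample values are hypothetical.
from statistics import median, quantiles

latency_ms = {
    "high-end":  [8.1, 8.4, 9.0, 8.7, 8.2, 8.9],
    "mid-range": [15.2, 16.8, 14.9, 17.3, 15.5, 16.1],
    "legacy":    [41.0, 44.5, 39.8, 46.2, 42.7, 40.9],
}

def tier_summary(samples):
    """Report median (p50) and tail (p95) latency for one device tier."""
    p95 = quantiles(samples, n=100)[94]  # 95th-percentile cut point
    return {"p50": median(samples), "p95": p95}

matrix = {tier: tier_summary(s) for tier, s in latency_ms.items()}
```

Tracking p95 alongside p50 is what makes SLO checks meaningful: a tier can have an acceptable median while its tail latency still violates the target.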