Performance-Adaptive Deployment
Deployment-ready availability via measurement-based model selection
ZETIC.MLange provides the best user experience by benchmarking the performance of AI models on a pool of real-world devices. It benchmarks CPU, GPU, and NPU performance across chipsets from multiple manufacturers. Based on these results, MLange ensures optimal performance on each user's target device, regardless of the device type.
Objective
Guarantee installation of the optimized library for each target device
ZETIC.MLange ensures Deployment-Ready Availability by rigorously validating performance across a global cluster of physical devices. We do not rely on heuristic rules or theoretical specs. Instead, we perform On-Target Performance Measurement to empirically determine the optimal model for every single device.
Performance-Based, Not Rule-Based
Traditional deployment uses static rules (e.g., "Use GPU if version > X"). This often fails due to driver fragmentation and thermal throttling.
ZETIC.MLange is different. We establish ground truth by measuring:
Actual Latency
Millisecond-precision inference time measured on physical devices.
Throughput
Real-world tokens/frames per second capacity.
Stability
Continuous execution reliability under thermal stress.
Based on this data, we identify the specific model binary that yields the highest performance for each specific device model.
Global Deployment Assurance
By testing against the fragmented landscape of Android and iOS hardware, we guarantee:
Guaranteed Runtime Compatibility
Your model is rigorously verified to load and execute correctly across the fragmented range of Android and iOS targets.
Adaptive Binary Selection
The runtime dynamically resolves the exact quantized binary that yields maximum throughput for the specific NPU chipset.
Optimal Deployment Strategy
Deployment decisions are governed by deterministic benchmark data from our device farm, eliminating theoretical guesswork.
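Conceptually, adaptive binary selection reduces to a lookup from device identifier to the pre-benchmarked "winning" binary. The index below is a hypothetical sketch: the names (`BINARY_INDEX`, `resolve_binary`) and file names are illustrative, not part of the ZETIC.MLange runtime.

```python
# Hypothetical sketch of adaptive binary selection. The index maps a
# device identifier to its pre-benchmarked winning quantized binary;
# unknown devices fall back to a portable CPU build.

BINARY_INDEX = {
    "SM-S911B": "yolov11-qnn-npu-int8.bin",  # Galaxy S23: NPU wins
    "SM-A346B": "yolov11-gpu-fp16.bin",      # Galaxy A34: GPU wins
}

def resolve_binary(device_id: str) -> str:
    return BINARY_INDEX.get(device_id, "yolov11-cpu-fp32.bin")
```

The fallback entry matters: the CPU build is the only variant guaranteed to execute everywhere, so it is the safe default for devices not yet covered by the benchmark farm.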
Validation Workflow
Provision Test Environment
We instantiate an isolated, on-device runtime environment mirroring the target OS and hardware configuration.
Distributed Workload Execution
The compilation artifacts, model metadata, and test vectors are dispatched to a distributed device farm. We execute the model on over 200 physical devices to capture real-world metrics.
Telemetry Analysis & Winner Selection
We aggregate the performance data to select the "Winning Model" for each device identifier.
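The aggregation step can be sketched as follows, under an assumed record format: each telemetry row is a `(device_id, backend, latency_ms)` tuple. Taking the median per (device, backend) pair makes the winner robust against thermal-throttling outliers; the function names are illustrative only.

```python
# Sketch of "winner selection": aggregate repeated runs by median
# latency per (device, backend), then pick the fastest backend
# for each device identifier.
from collections import defaultdict
from statistics import median

def select_winners(rows):
    samples = defaultdict(list)
    for device, backend, latency in rows:
        samples[(device, backend)].append(latency)
    winners = {}  # device -> (backend, median latency)
    for (device, backend), vals in samples.items():
        m = median(vals)
        if device not in winners or m < winners[device][1]:
            winners[device] = (backend, m)
    return {d: b for d, (b, _) in winners.items()}

rows = [
    ("galaxy-s23", "CPU", 90.1), ("galaxy-s23", "CPU", 89.0),
    ("galaxy-s23", "NPU", 5.3), ("galaxy-s23", "NPU", 5.2),
    ("iphone-16", "GPU", 7.9), ("iphone-16", "NPU", 1.9),
]
print(select_winners(rows))  # {'galaxy-s23': 'NPU', 'iphone-16': 'NPU'}
```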
YOLOv11 Benchmark Results
| Device | SoC Manufacturer | CPU | GPU | NPU | Speedup (best vs. CPU) |
|---|---|---|---|---|---|
| Samsung Galaxy A34 | MediaTek | 172.08 ms | 96.38 ms | 249.41 ms | x1.79 |
| Samsung Galaxy S22 5G | Qualcomm | 79.76 ms | 36.99 ms | 8 ms | x9.97 |
| Samsung Galaxy S23 | Qualcomm | 89.56 ms | 27.5 ms | 5.24 ms | x17.09 |
| Samsung Galaxy S24+ | Qualcomm | 60.43 ms | 21.46 ms | 3.92 ms | x15.42 |
| Samsung Galaxy S25 | Qualcomm | 53.69 ms | 17.22 ms | 3.72 ms | x14.43 |
| Apple iPhone 12 | Apple | 123.12 ms | 22.73 ms | 3.51 ms | x35.08 |
| Apple iPhone 14 | Apple | 111.29 ms | 15.75 ms | 3.75 ms | x29.68 |
| Apple iPhone 15 Pro Max | Apple | 96.36 ms | 7.72 ms | 2.05 ms | x47.00 |
| Apple iPhone 16 | Apple | 102.09 ms | 7.9 ms | 1.9 ms | x53.73 |
Source: Original Benchmark Report
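The speedup column in the table is the CPU latency divided by the best latency among the three backends. A minimal sketch of that calculation, using rows from the table above:

```python
# Speedup of the best backend relative to CPU, as in the table's
# final column. Note the Galaxy A34 row: its NPU is slower than its
# CPU, so the GPU is the winner there.
def speedup(cpu_ms: float, gpu_ms: float, npu_ms: float) -> float:
    return round(cpu_ms / min(cpu_ms, gpu_ms, npu_ms), 2)

print(speedup(89.56, 27.5, 5.24))      # Galaxy S23 -> 17.09 (NPU wins)
print(speedup(172.08, 96.38, 249.41))  # Galaxy A34 -> 1.79 (GPU wins)
```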
Automatic Distribution
When a user installs your app, the ZETIC.MLange Runtime automatically fetches the "Winning Model" for their device. This creates a seamless, high-performance experience without any manual configuration from the developer.
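The client-side flow described above can be sketched as a cached fetch keyed by device identifier. Everything here is hypothetical (the endpoint layout, `fetch_winning_model`, and the cache path are assumptions, not the real ZETIC.MLange runtime API):

```python
# Hedged sketch of fetch-on-first-launch with a local cache; the URL
# scheme and names are illustrative, not the actual runtime.
import os
import urllib.request

CACHE_DIR = "model_cache"

def fetch_winning_model(base_url: str, device_id: str) -> str:
    """Download the pre-selected model for this device, caching locally."""
    path = os.path.join(CACHE_DIR, f"{device_id}.bin")
    if not os.path.exists(path):  # already fetched on a prior launch?
        os.makedirs(CACHE_DIR, exist_ok=True)
        urllib.request.urlretrieve(f"{base_url}/models/{device_id}", path)
    return path
```

Caching means the download cost is paid once per install; subsequent launches load the winning binary directly from local storage.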
Advanced Telemetry Report (Premium)
We execute profiling for all users to guarantee the best performance of your on-device AI app. However, detailed profiling reports are currently available to Starter-tier users only.
Please contact us for more information.