ZETIC.MLange

Android

Deploy your on-device AI application to Android using Android Studio

NPU Acceleration, Simplified.

Implementing NPU acceleration on Android typically involves handling fragmented hardware drivers, complex JNI bridges, and manual memory management.

ZETIC.MLange abstracts this entire stack. You don't need to write a single line of C++ or OpenCL. Just import our library, and we handle the heterogeneous hardware orchestration for you.

Prerequisites

Before you begin, make sure you have:

- An Android Studio project targeting your Android device
- Your ZETIC.MLange personal key (PERSONAL_KEY)
- The name of the MLange project that contains your deployed model (PROJECT_NAME)

Step-by-step Guide

Add ZeticMLange Dependency

Integrate the MLange AAR (Android Archive), which contains the Unified HAL for Android devices.

Native Library Handling

We enable useLegacyPackaging so that our C++ NPU driver libraries (JNI) are extracted to device storage at install time and can be loaded reliably at runtime.

build.gradle

    android {
        ...
        packagingOptions {
            jniLibs {
                useLegacyPackaging true
            }
        }
    }

    dependencies {
        implementation 'com.zeticai.mlange:mlange:+'
    }

build.gradle.kts

    android {
        ...
        packaging {
            jniLibs {
                useLegacyPackaging = true
            }
        }
    }

    dependencies {
        implementation("com.zeticai.mlange:mlange:+")
    }
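
The + version wildcard resolves to the latest published MLange artifact on every sync. If you prefer reproducible builds, you can pin an explicit release instead; a minimal sketch (the <version> placeholder below is not a real release number):

    dependencies {
        // Pin a specific release instead of the '+' wildcard for reproducible builds
        implementation("com.zeticai.mlange:mlange:<version>")
    }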

Initialize and Run ZeticMLangeModel

Initialize the model to trigger the Zero-Copy Model Loader, which maps your model directly to NPU memory for instant readiness.

Kotlin

// (1) Load Zetic MLange model
// This handles model download (if needed) and NPU context creation
val model = ZeticMLangeModel(CONTEXT, PERSONAL_KEY, PROJECT_NAME)

// (2) Prepare model inputs
// Ensure input shapes match your model's requirement (e.g., Float32 arrays)
val inputs: Array<Tensor> = // Prepare your inputs

// (3) Run Inference
// Executes the fully automated hardware graph.
// No manual delegate configuration or memory syncing required.
val outputs = model.run(inputs)

Java

// (1) Load Zetic MLange model
// This handles model download (if needed) and NPU context creation
ZeticMLangeModel model = new ZeticMLangeModel(CONTEXT, PERSONAL_KEY, PROJECT_NAME);

// (2) Prepare model inputs
// Ensure input shapes match your model's requirement (e.g., Float32 arrays)
Tensor[] inputs = // Prepare your inputs;

// (3) Run Inference
// Executes the hardware-accelerated graph. This is a blocking call.
Tensor[] outputs = model.run(inputs);
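
Because model.run blocks until inference finishes, you will usually want to keep it off the main thread in a real app. The sketch below is one way to do that with Kotlin coroutines; it only reuses the ZeticMLangeModel and Tensor types shown above (imported from the MLange library) together with kotlinx.coroutines, and the runInference helper name is hypothetical.

import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

// Hypothetical helper: moves the blocking model.run call onto a background
// dispatcher so the UI thread stays responsive.
suspend fun runInference(model: ZeticMLangeModel, inputs: Array<Tensor>): Array<Tensor> =
    withContext(Dispatchers.Default) {
        model.run(inputs)
    }

You can then call it from any coroutine scope, for example lifecycleScope.launch { val outputs = runInference(model, inputs) }.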

Sample Application

Please refer to the ZETIC MLange Apps repository for complete sample applications and more details.