
Custom Preprocessing

Implement custom input preprocessing for your Melange models.

Most AI models require input preprocessing before inference: resizing images, normalizing pixel values, tokenizing text, or converting audio samples. This guide covers common preprocessing patterns for ZETIC Melange.

Image Preprocessing

Vision models typically expect inputs in a specific format (e.g., [1, 3, 640, 640] in NCHW layout with normalized pixel values).

Android (Kotlin):

import android.graphics.Bitmap

fun preprocessImage(bitmap: Bitmap, targetWidth: Int, targetHeight: Int): FloatArray {
    // Resize the image
    val resized = Bitmap.createScaledBitmap(bitmap, targetWidth, targetHeight, true)

    // Convert to float array with normalization (0.0 to 1.0)
    val pixels = IntArray(targetWidth * targetHeight)
    resized.getPixels(pixels, 0, targetWidth, 0, 0, targetWidth, targetHeight)

    val floatArray = FloatArray(3 * targetWidth * targetHeight)
    for (i in pixels.indices) {
        val pixel = pixels[i]
        // NCHW layout: separate R, G, B channels
        floatArray[i] = ((pixel shr 16) and 0xFF) / 255.0f  // R
        floatArray[i + pixels.size] = ((pixel shr 8) and 0xFF) / 255.0f   // G
        floatArray[i + 2 * pixels.size] = (pixel and 0xFF) / 255.0f        // B
    }

    return floatArray
}

iOS (Swift):

import UIKit
import CoreGraphics

func preprocessImage(_ image: UIImage, targetSize: CGSize) -> [Float] {
    // Resize the image
    UIGraphicsBeginImageContextWithOptions(targetSize, false, 1.0)
    image.draw(in: CGRect(origin: .zero, size: targetSize))
    let resized = UIGraphicsGetImageFromCurrentImageContext()!
    UIGraphicsEndImageContext()

    // Convert to float array with normalization
    guard let cgImage = resized.cgImage else { return [] }
    let width = Int(targetSize.width)
    let height = Int(targetSize.height)
    var pixelData = [UInt8](repeating: 0, count: width * height * 4)

    let context = CGContext(
        data: &pixelData, width: width, height: height,
        bitsPerComponent: 8, bytesPerRow: width * 4,
        space: CGColorSpaceCreateDeviceRGB(),
        bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue
    )
    context?.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height))

    // Normalize to 0.0-1.0
    var floatArray = [Float](repeating: 0, count: 3 * width * height)
    for i in 0..<(width * height) {
        floatArray[i] = Float(pixelData[i * 4]) / 255.0       // R
        floatArray[i + width * height] = Float(pixelData[i * 4 + 1]) / 255.0  // G
        floatArray[i + 2 * width * height] = Float(pixelData[i * 4 + 2]) / 255.0  // B
    }

    return floatArray
}

Different models expect different input formats. Some models use NCHW layout (batch, channels, height, width) while others use NHWC (batch, height, width, channels). Check your model's specification.
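Off device (for example, when validating inputs in Python before uploading), the two layouts differ only by a transpose. A minimal NumPy sketch, using a made-up 640x480 RGB image:

```python
import numpy as np

# Hypothetical input: a 480x640 RGB image as an HxWxC uint8 array
image_hwc = np.zeros((480, 640, 3), dtype=np.uint8)

# Normalize to 0.0-1.0 and move channels first: HWC -> CHW
image_chw = np.transpose(image_hwc.astype(np.float32) / 255.0, (2, 0, 1))

# Add the batch dimension: CHW -> NCHW
nchw = image_chw[np.newaxis, ...]  # shape (1, 3, 480, 640)

# NHWC keeps the original channel-last order, batch axis added in front
nhwc = (image_hwc.astype(np.float32) / 255.0)[np.newaxis, ...]  # (1, 480, 640, 3)

print(nchw.shape, nhwc.shape)
```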

Audio Preprocessing

Audio models like Whisper typically expect specific sample rates and formats.

# Example: preparing audio input for upload
import numpy as np
import librosa

# Load and resample audio to 16kHz
audio, sr = librosa.load("audio.wav", sr=16000)

# Ensure float32 dtype and save as .npy
np_audio = audio.astype(np.float32)
np.save("audio_input.npy", np_audio)
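Models with fixed-length inputs also need padding or truncation; Whisper, for instance, expects 30 seconds of 16 kHz audio (480,000 samples). A NumPy sketch, assuming a mono 16 kHz signal:

```python
import numpy as np

def pad_or_trim(audio: np.ndarray, target_len: int = 480_000) -> np.ndarray:
    """Zero-pad or truncate a 1-D audio signal to a fixed sample count."""
    if len(audio) >= target_len:
        return audio[:target_len]
    return np.pad(audio, (0, target_len - len(audio)))

short = np.ones(16_000, dtype=np.float32)  # 1 second of audio
fixed = pad_or_trim(short)
print(fixed.shape)  # (480000,)
```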

Common Preprocessing Patterns

Model Type           | Common Preprocessing
---------------------|----------------------------------------------------------
Image classification | Resize, normalize (0-1 or ImageNet mean/std), NCHW layout
Object detection     | Resize to fixed size, normalize, add batch dimension
Audio classification | Resample to target rate, convert to mel spectrogram
Text / NLP           | Tokenize, pad to fixed length, create attention masks
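For the text row, a real tokenizer (SentencePiece, HuggingFace tokenizers, etc.) produces the token IDs, but the padding and attention-mask step follows the same pattern everywhere. A sketch with made-up token IDs and sequence length:

```python
import numpy as np

def pad_and_mask(token_ids: list, max_len: int, pad_id: int = 0):
    """Pad token IDs to max_len and build the matching attention mask."""
    ids = token_ids[:max_len]
    mask = [1] * len(ids) + [0] * (max_len - len(ids))
    ids = ids + [pad_id] * (max_len - len(ids))
    return np.array(ids, dtype=np.int64), np.array(mask, dtype=np.int64)

# Hypothetical token IDs from a tokenizer
ids, mask = pad_and_mask([101, 7592, 2088, 102], max_len=8)
print(ids)   # [ 101 7592 2088  102    0    0    0    0]
print(mask)  # [1 1 1 1 0 0 0 0]
```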

Feature Extractor Modules

For common model architectures, Melange provides feature extractor modules that handle preprocessing automatically. See the ZETIC Melange Apps repository for examples.


Next Steps