
Custom Preprocessing

Implement custom input preprocessing for your Melange models.

Most AI models require input preprocessing before inference: resizing images, normalizing pixel values, tokenizing text, or converting audio samples. This guide covers common preprocessing patterns for ZETIC Melange.

Image Preprocessing

Vision models typically expect inputs in a specific format (e.g., [1, 3, 640, 640] in NCHW layout with normalized pixel values).

Android (Kotlin):

import android.graphics.Bitmap

fun preprocessImage(bitmap: Bitmap, targetWidth: Int, targetHeight: Int): FloatArray {
    // Resize the image
    val resized = Bitmap.createScaledBitmap(bitmap, targetWidth, targetHeight, true)

    // Convert to float array with normalization (0.0 to 1.0)
    val pixels = IntArray(targetWidth * targetHeight)
    resized.getPixels(pixels, 0, targetWidth, 0, 0, targetWidth, targetHeight)

    val floatArray = FloatArray(3 * targetWidth * targetHeight)
    for (i in pixels.indices) {
        val pixel = pixels[i]
        // NCHW layout: separate R, G, B channels
        floatArray[i] = ((pixel shr 16) and 0xFF) / 255.0f  // R
        floatArray[i + pixels.size] = ((pixel shr 8) and 0xFF) / 255.0f   // G
        floatArray[i + 2 * pixels.size] = (pixel and 0xFF) / 255.0f        // B
    }

    return floatArray
}

iOS (Swift):

import UIKit
import CoreGraphics

func preprocessImage(_ image: UIImage, targetSize: CGSize) -> [Float] {
    // Resize the image
    UIGraphicsBeginImageContextWithOptions(targetSize, false, 1.0)
    image.draw(in: CGRect(origin: .zero, size: targetSize))
    let resized = UIGraphicsGetImageFromCurrentImageContext()!
    UIGraphicsEndImageContext()

    // Convert to float array with normalization
    guard let cgImage = resized.cgImage else { return [] }
    let width = Int(targetSize.width)
    let height = Int(targetSize.height)
    var pixelData = [UInt8](repeating: 0, count: width * height * 4)

    let context = CGContext(
        data: &pixelData, width: width, height: height,
        bitsPerComponent: 8, bytesPerRow: width * 4,
        space: CGColorSpaceCreateDeviceRGB(),
        bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue
    )
    context?.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height))

    // Normalize to 0.0-1.0
    var floatArray = [Float](repeating: 0, count: 3 * width * height)
    for i in 0..<(width * height) {
        floatArray[i] = Float(pixelData[i * 4]) / 255.0       // R
        floatArray[i + width * height] = Float(pixelData[i * 4 + 1]) / 255.0  // G
        floatArray[i + 2 * width * height] = Float(pixelData[i * 4 + 2]) / 255.0  // B
    }

    return floatArray
}

Different models expect different input formats. Some models use NCHW layout (batch, channels, height, width) while others use NHWC (batch, height, width, channels). Check your model's specification.
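Off device (for example, when validating inputs in Python before uploading), the two layouts differ only by a transpose. A minimal NumPy sketch, using a made-up 640x480 RGB image:

```python
import numpy as np

# Hypothetical input: a 480x640 RGB image as an HxWxC uint8 array
image_hwc = np.zeros((480, 640, 3), dtype=np.uint8)

# Normalize to 0.0-1.0 and move channels first: HWC -> CHW
image_chw = np.transpose(image_hwc.astype(np.float32) / 255.0, (2, 0, 1))

# Add the batch dimension: CHW -> NCHW
nchw = image_chw[np.newaxis, ...]  # shape (1, 3, 480, 640)

# NHWC keeps the original channel-last order, batch axis added in front
nhwc = (image_hwc.astype(np.float32) / 255.0)[np.newaxis, ...]  # (1, 480, 640, 3)

print(nchw.shape, nhwc.shape)
```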

Audio Preprocessing

Audio models like Whisper typically expect specific sample rates and formats.

# Example: preparing audio input for upload
import numpy as np
import librosa

# Load and resample audio to 16kHz
audio, sr = librosa.load("audio.wav", sr=16000)

# Ensure float32 dtype and save as .npy
np_audio = audio.astype(np.float32)
np.save("audio_input.npy", np_audio)
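Models with fixed-length inputs also need padding or truncation; Whisper, for instance, expects 30 seconds of 16 kHz audio (480,000 samples). A NumPy sketch, assuming a mono 16 kHz signal:

```python
import numpy as np

def pad_or_trim(audio: np.ndarray, target_len: int = 480_000) -> np.ndarray:
    """Zero-pad or truncate a 1-D audio signal to a fixed sample count."""
    if len(audio) >= target_len:
        return audio[:target_len]
    return np.pad(audio, (0, target_len - len(audio)))

short = np.ones(16_000, dtype=np.float32)  # 1 second of audio
fixed = pad_or_trim(short)
print(fixed.shape)  # (480000,)
```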

Common Preprocessing Patterns

Model Type           | Common Preprocessing
---------------------|----------------------------------------------------------
Image classification | Resize, normalize (0-1 or ImageNet mean/std), NCHW layout
Object detection     | Resize to fixed size, normalize, add batch dimension
Audio classification | Resample to target rate, convert to mel spectrogram
Text / NLP           | Tokenize, pad to fixed length, create attention masks
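For the text row, a real tokenizer (SentencePiece, HuggingFace tokenizers, etc.) produces the token IDs, but the padding and attention-mask step follows the same pattern everywhere. A sketch with made-up token IDs and sequence length:

```python
import numpy as np

def pad_and_mask(token_ids: list, max_len: int, pad_id: int = 0):
    """Pad token IDs to max_len and build the matching attention mask."""
    ids = token_ids[:max_len]
    mask = [1] * len(ids) + [0] * (max_len - len(ids))
    ids = ids + [pad_id] * (max_len - len(ids))
    return np.array(ids, dtype=np.int64), np.array(mask, dtype=np.int64)

# Hypothetical token IDs from a tokenizer
ids, mask = pad_and_mask([101, 7592, 2088, 102], max_len=8)
print(ids)   # [ 101 7592 2088  102    0    0    0    0]
print(mask)  # [1 1 1 1 0 0 0 0]
```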

Feature Extractor Modules

For common model architectures, Melange provides feature extractor modules that handle preprocessing automatically. See the ZETIC Melange Apps repository for examples.


Next Steps