Custom Preprocessing
Implement custom input preprocessing for your Melange models.
Most AI models require input preprocessing before inference: resizing images, normalizing pixel values, tokenizing text, or converting audio samples. This guide covers common preprocessing patterns for ZETIC Melange.
Image Preprocessing
Vision models typically expect inputs in a specific format (e.g., [1, 3, 640, 640] in NCHW layout with normalized pixel values).
```kotlin
import android.graphics.Bitmap

fun preprocessImage(bitmap: Bitmap, targetWidth: Int, targetHeight: Int): FloatArray {
    // Resize to the model's expected dimensions
    val resized = Bitmap.createScaledBitmap(bitmap, targetWidth, targetHeight, true)

    // Read ARGB pixels
    val pixels = IntArray(targetWidth * targetHeight)
    resized.getPixels(pixels, 0, targetWidth, 0, 0, targetWidth, targetHeight)

    // NCHW layout: planar R, G, B channels, normalized to 0.0-1.0
    val floatArray = FloatArray(3 * targetWidth * targetHeight)
    for (i in pixels.indices) {
        val pixel = pixels[i]
        floatArray[i] = ((pixel shr 16) and 0xFF) / 255.0f              // R
        floatArray[i + pixels.size] = ((pixel shr 8) and 0xFF) / 255.0f // G
        floatArray[i + 2 * pixels.size] = (pixel and 0xFF) / 255.0f     // B
    }
    return floatArray
}
```

```swift
import UIKit
import CoreGraphics

func preprocessImage(_ image: UIImage, targetSize: CGSize) -> [Float] {
    // Resize to the model's expected dimensions
    UIGraphicsBeginImageContextWithOptions(targetSize, false, 1.0)
    image.draw(in: CGRect(origin: .zero, size: targetSize))
    let resized = UIGraphicsGetImageFromCurrentImageContext()
    UIGraphicsEndImageContext()

    guard let cgImage = resized?.cgImage else { return [] }
    let width = Int(targetSize.width)
    let height = Int(targetSize.height)

    // Draw into an RGBA byte buffer
    var pixelData = [UInt8](repeating: 0, count: width * height * 4)
    let context = CGContext(
        data: &pixelData, width: width, height: height,
        bitsPerComponent: 8, bytesPerRow: width * 4,
        space: CGColorSpaceCreateDeviceRGB(),
        bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue
    )
    context?.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height))

    // NCHW layout: planar R, G, B channels, normalized to 0.0-1.0
    var floatArray = [Float](repeating: 0, count: 3 * width * height)
    for i in 0..<(width * height) {
        floatArray[i] = Float(pixelData[i * 4]) / 255.0                          // R
        floatArray[i + width * height] = Float(pixelData[i * 4 + 1]) / 255.0     // G
        floatArray[i + 2 * width * height] = Float(pixelData[i * 4 + 2]) / 255.0 // B
    }
    return floatArray
}
```

Different models expect different input formats: some use NCHW layout (batch, channels, height, width) while others use NHWC (batch, height, width, channels). Check your model's input specification.
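The two layouts differ only in axis order, so converting between them is a single transpose. A minimal numpy sketch (the array names and 640×640 shape are illustrative, not part of any Melange API):

```python
import numpy as np

# A dummy NCHW tensor: batch=1, channels=3, height=640, width=640
nchw = np.zeros((1, 3, 640, 640), dtype=np.float32)

# NCHW -> NHWC: move the channel axis to the end
nhwc = np.transpose(nchw, (0, 2, 3, 1))
print(nhwc.shape)  # (1, 640, 640, 3)

# NHWC -> NCHW: move the channel axis back
back = np.transpose(nhwc, (0, 3, 1, 2))
print(back.shape)  # (1, 3, 640, 640)
```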
Audio Preprocessing
Audio models like Whisper typically expect specific sample rates and formats.
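Whisper, for instance, operates on 30-second windows of 16 kHz mono audio (480,000 samples), so shorter clips are zero-padded and longer ones trimmed. A minimal sketch of that step (the 30-second window is Whisper-specific; confirm the expected length for your model):

```python
import numpy as np

def pad_or_trim(audio: np.ndarray, target_len: int = 16000 * 30) -> np.ndarray:
    """Zero-pad or trim a 1-D float32 waveform to target_len samples."""
    if audio.shape[0] >= target_len:
        return audio[:target_len]
    padding = np.zeros(target_len - audio.shape[0], dtype=audio.dtype)
    return np.concatenate([audio, padding])

# A 5-second clip at 16 kHz becomes a full 30-second window
clip = np.random.randn(16000 * 5).astype(np.float32)
print(pad_or_trim(clip).shape)  # (480000,)
```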
```python
# Example: preparing audio input for upload
import numpy as np
import librosa

# Load and resample audio to 16 kHz mono
audio, sr = librosa.load("audio.wav", sr=16000)

# Convert to float32 and save as .npy
np_audio = audio.astype(np.float32)
np.save("audio_input.npy", np_audio)
```

Common Preprocessing Patterns
| Model Type | Common Preprocessing |
|---|---|
| Image classification | Resize, normalize (0-1 or ImageNet mean/std), NCHW layout |
| Object detection | Resize to fixed size, normalize, add batch dimension |
| Audio classification | Resample to target rate, convert to mel spectrogram |
| Text / NLP | Tokenize, pad to fixed length, create attention masks |
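As an illustration of the first row, ImageNet-style normalization subtracts a per-channel mean and divides by a per-channel standard deviation after scaling pixels to 0-1. The constants below are the widely used ImageNet statistics; confirm them against your model's training recipe:

```python
import numpy as np

# Standard ImageNet per-channel statistics (RGB order)
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_imagenet(chw: np.ndarray) -> np.ndarray:
    """Normalize a [3, H, W] float array already scaled to 0.0-1.0."""
    return (chw - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]

image = np.full((3, 224, 224), 0.5, dtype=np.float32)
normalized = normalize_imagenet(image)
print(normalized.shape)  # (3, 224, 224)
```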
Feature Extractor Modules
For common model architectures, Melange provides feature extractor modules that handle preprocessing automatically. See the ZETIC Melange Apps repository for examples.
Next Steps
- Multi-Model Pipelines: Chain models together
- Basic Inference (Android): Android inference guide
- Basic Inference (iOS): iOS inference guide