Audio Classification (YAMNet)
Build on-device AI audio classification applications with ZETIC.MLange
What is YAMNet?
YAMNet is a deep neural network that predicts audio events from the AudioSet-YouTube corpus.
- Trained on the AudioSet dataset with 521 audio event classes
- Model on TensorFlow Hub: https://tfhub.dev/google/yamnet/1
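Across a clip, YAMNet produces one 521-way score vector per audio frame; averaging the frame scores and taking the highest-scoring classes yields clip-level labels. A minimal Kotlin sketch of that reduction (array sizes here are arbitrary; a real run maps the resulting indices to names from yamnet_class_map.csv):

```kotlin
// Average per-frame score vectors, then return the indices of the top-k classes.
// In a real run `frameScores` is shaped [numFrames][521].
fun topClasses(frameScores: List<FloatArray>, k: Int): List<Int> {
    val numClasses = frameScores[0].size
    val mean = FloatArray(numClasses)
    for (frame in frameScores)
        for (c in 0 until numClasses) mean[c] += frame[c] / frameScores.size
    return mean.indices.sortedByDescending { mean[it] }.take(k)
}
```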
Step-by-step Implementation
Prerequisites
Prepare the YAMNet model from TensorFlow Hub or Hugging Face and convert it to ONNX format.
Convert YAMNet to ONNX:
import tensorflow as tf
import tensorflow_hub as hub
import tf2onnx
import numpy as np
model = hub.load('https://tfhub.dev/google/yamnet/1')
concrete_func = model.signatures['serving_default']

tf.saved_model.save(
    model,
    "yamnet_saved_model",
    signatures=concrete_func
)
# Now use tf2onnx command line
# python -m tf2onnx.convert --saved-model yamnet_saved_model --output yamnet.onnx --opset 13

Prepare sample input:
import numpy as np
sample_rate = 16000
duration = 1 # 1 second
waveform = np.sin(2 * np.pi * 440 * np.linspace(0, duration, sample_rate))
waveform = waveform.astype(np.float32)
waveform = np.expand_dims(waveform, axis=0)
np.save('waveform.npy', waveform)

Generate ZETIC.MLange Model
If you want to generate your own model, you can upload the model and input with MLange Dashboard,
or use CLI:
zetic gen -p $PROJECT_NAME -i waveform.npy yamnet.onnx

Implement ZeticMLangeModel
We provide a model key for the demo app: yamnet. You can use this model key to try the ZETIC.MLange Application.
For detailed application setup, please follow the Deploy to Android Studio guide.
val yamnetModel = ZeticMLangeModel(this, "yamnet")
yamnetModel.run(inputs)
val outputs = yamnetModel.outputBuffers

For detailed application setup, please follow the Deploy to Xcode guide.
let yamnetModel = ZeticMLangeModel("yamnet")
yamnetModel.run(inputs)
let outputs = yamnetModel.getOutputDataArray()

Prepare Audio feature extractor
We provide an Audio Feature Extractor as an Android and iOS module.
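The extractor's internals aren't shown here, but conceptually the preprocess step converts captured PCM audio into the normalized float waveform YAMNet expects (mono, 16 kHz, values in [-1, 1]). A hypothetical Kotlin sketch of that conversion (`preprocessPcm16` is illustrative, not the module's actual API):

```kotlin
// Hypothetical sketch: normalize signed 16-bit PCM samples into the
// [-1, 1] float waveform YAMNet consumes. Resampling to 16 kHz and
// downmixing to mono would happen before this step.
fun preprocessPcm16(samples: ShortArray): FloatArray =
    FloatArray(samples.size) { i -> samples[i] / 32768.0f }
```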
// (1) Preprocess audio data and get processed float array
val inputs = preprocess(audioData)
// ... run model ...
// (2) Postprocess model outputs
val results = postprocess(outputs)

import ZeticMLange
// (1) Preprocess audio data and get processed float array
let inputs = preprocess(audioData)
// ... run model ...
// (2) Postprocess model outputs
let results = postprocess(&outputs)

Complete Audio Classification Implementation
// (0) Initialize model
val yamnetModel = ZeticMLangeModel(this, "yamnet")
// (1) Preprocess audio
val inputs = preprocess(audioData)
// (2) Run model
yamnetModel.run(inputs)
val outputs = yamnetModel.outputBuffers
// (3) Postprocess results
val predictions = postprocess(outputs)

// (0) Initialize model
let yamnetModel = ZeticMLangeModel("yamnet")
// (1) Preprocess audio
let inputs = preprocess(audioData)
// (2) Run model
yamnetModel.run(inputs)
let outputs = yamnetModel.getOutputDataArray()
// (3) Postprocess results
let predictions = postprocess(&outputs)

Conclusion
With ZETIC.MLange, implementing on-device audio classification with NPU acceleration is straightforward and efficient. YAMNet provides robust audio event detection capabilities across a wide range of categories. The simple pipeline of audio preprocessing and classification makes it easy to integrate into your applications.
We're continuously adding new models to our examples and Hugging Face page.
Stay tuned, and contact us for collaborations!