Convmelspec: Convertible Melspectrograms via 1D Convolutions

Convertible melspectrograms for ONNX and CoreML.

About

For a large class of audio neural network models, a Mel-scaled short-time Fourier transform or Melspectrogram operator is needed. The Melspectrogram operator, however, is not typically implemented in on-device machine learning frameworks such as CoreML (and previously ONNX), which significantly complicates cross-platform deployment of audio machine learning models. To mitigate this, we reuse standardized, interoperable neural network operators to implement a convertible Melspectrogram, computing the short-time Fourier transform (STFT) via 1D convolutions.
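
The core idea is that each DFT bin of a windowed frame is a dot product with a cosine (real part) and a sine (imaginary part) basis vector, which maps directly onto a strided 1D convolution whose kernels are the windowed DFT basis. Below is a minimal sketch of this idea (not the library's implementation), assuming a Hann window and a window length equal to n_fft:

import math

import torch
import torch.nn.functional as F

def conv_stft_magnitude(x, n_fft=1024, hop=512):
    # Windowed DFT basis as conv1d kernels: one output channel per DFT bin,
    # with weight shape (n_fft // 2 + 1, 1, n_fft).
    n = torch.arange(n_fft, dtype=torch.float32)
    k = torch.arange(n_fft // 2 + 1, dtype=torch.float32).unsqueeze(1)
    window = torch.hann_window(n_fft)
    cos_kernel = (torch.cos(2 * math.pi * k * n / n_fft) * window).unsqueeze(1)
    sin_kernel = (-torch.sin(2 * math.pi * k * n / n_fft) * window).unsqueeze(1)

    # x: (batch, samples) -> (batch, 1, samples); the strided conv frames the
    # signal and applies the DFT basis in one operation.
    x = x.unsqueeze(1)
    real = F.conv1d(x, cos_kernel, stride=hop)
    imag = F.conv1d(x, sin_kernel, stride=hop)
    return torch.sqrt(real ** 2 + imag ** 2)  # (batch, n_fft // 2 + 1, frames)

A Melspectrogram then only requires multiplying the result by a fixed mel filterbank matrix.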

Beyond basic functionality (known to many), however, we offer the ability to trade off module storage size and inference speed. To do so, we provide three modes for computing the discrete Fourier transform (DFT) matrix needed for the STFT: store, input, and on-the-fly. The store mode precomputes the DFT matrix and stores it directly in your model file (fastest inference, larger model, easy), the input mode assumes the DFT matrix is provided as an input parameter to your model (fast inference, small model, hard), and the on-the-fly mode dynamically constructs the DFT matrix at inference time (slower inference, small model, easy). The module can also be used as a pass-through to torchaudio for training and then switched to a DFT mode for conversion, and it is set up to be compatible with the recent native ONNX STFT, which still requires a custom compilation setup. Further, we also show how to convert the native torchaudio Melspectrogram layers directly via CoreML Model Intermediate Language (MIL) ops.

In total, we implement Melspectrograms in a standardized, cross-platform way with minimal impact on model size and reasonable speed. Try it out, let us know how it goes, and submit PRs with fixes!

Setup

  • Create a new Python environment, e.g. via conda
conda create -n convmelspec python=3.9 -y
conda activate convmelspec
  • Install the source code
# Install editable from source
cd <convmelspec>

# Install as editable (for developers)
pip install -e .

# Alternatively, install as read-only
pip install .

Usage

The easiest way to convert your own PyTorch models to ONNX and CoreML is to use our custom ConvertibleSpectrogram module within your model instead of using torchaudio directly. Once you do this, you can export to ONNX or CoreML with a few lines of code. Internally, the module either calls torchaudio directly or implements the required short-time Fourier transform operations using 1D convolutions, depending on the mode of operation. For CoreML, we further show how you can use CoreML's Model Intermediate Language (MIL) to implement the short-time Fourier transform (again using 1D convolutions) without needing our layer at all.

import torch
import librosa
import numpy as np
from convmelspec.stft import ConvertibleSpectrogram as Spectrogram
import coremltools as ct

# Create an example input: one second of audio at 16 kHz (all zeros here)
sr = 16000
x = torch.zeros(1, sr)

# Create the layer
melspec = Spectrogram(
    sr=sr,
    n_fft=1024,
    hop_size=512,
    n_mel=64,
)

# Switch to eval for inference and conversion
melspec.eval()
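
The layer is a standard torch.nn.Module, so a forward call produces the Melspectrogram. As a quick sanity check (a minimal sketch; the exact output shape depends on the layer's options):

# Compute a Melspectrogram for the example input
with torch.no_grad():
    S = melspec(x)
print(S.shape)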

Training

For training, we recommend you create and use the layer in torchaudio mode. Once training is complete, you can switch the layer to one of the DFT modes for conversion to ONNX and CoreML.
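
For example (a sketch, assuming the torchaudio pass-through is selected with the same set_mode API used for the DFT modes below; check the layer's documentation for the exact mode names):

# Train with the torchaudio-backed implementation (assumed mode name)
melspec.set_mode("torchaudio")
melspec.train()

# ... training loop ...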

Convert to ONNX

To convert your model to ONNX, you can use the built-in PyTorch ONNX export function.


# Set the export mode (pick one)
melspec.set_mode("DFT", "input")
melspec.set_mode("DFT", "store")
melspec.set_mode("DFT", "on_the_fly")

# Export to ONNX
output_path = '/tmp/melspec.onnx'
torch.onnx.export(melspec, x, output_path)
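
To sanity-check the exported model, you can run it with onnxruntime (a minimal sketch, assuming onnxruntime is installed; the input name is queried from the session rather than hard-coded). Note that in the input DFT mode the model expects the DFT matrix as an additional input, so the single-input feed below only applies to the store and on_the_fly modes.

import onnxruntime as ort

# Run the exported model on the same example input used for export
session = ort.InferenceSession(output_path)
input_name = session.get_inputs()[0].name
onnx_out = session.run(None, {input_name: x.numpy()})[0]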

Convert to ONNX with Opset 17

The ONNX standard and runtime have added support for an STFT operator and related functionality (e.g. pytorch/audio#982). As noted there, however, PyTorch itself does not yet support exporting with opset 17, so a custom build of PyTorch is required (this works, but is not yet documented here).
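
If you do have such a build, the export call itself only needs the opset_version argument (a sketch, assuming everything else is unchanged from the export above):

# Requires a PyTorch build that supports exporting STFT with opset 17
torch.onnx.export(melspec, x, output_path, opset_version=17)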

Convert to CoreML

To convert your model to CoreML, you can use the coremltools Python package.


# Export to CoreML
output_path = '/tmp/melspec.mlmodel'

# To reduce the size of the exported CoreML model (tradeoff with speed)
pipeline = ct.PassPipeline()
pipeline.set_options("common::const_elimination", {"skip_const_by_size": "1e6"})

# Trace the model
traced_model = torch.jit.trace(melspec, x)

# Convert traced model to CoreML
input_tensors = [ct.TensorType(name="input", shape=(x.shape))]
mlmodel = ct.convert(model=traced_model,
                     inputs=input_tensors,
                     compute_units=ct.ComputeUnit.ALL,
                     minimum_deployment_target=None,
                     pass_pipeline=pipeline)

# Save to disk
mlmodel.save(output_path)
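
To verify the converted model, you can run a prediction with coremltools (a minimal sketch; CoreML prediction only runs on macOS, and the output dictionary keys are printed rather than assumed):

# Run the CoreML model (macOS only) and inspect its outputs
coreml_out = mlmodel.predict({"input": x.numpy()})
print(coreml_out.keys())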

Convert to CoreML via MIL

In addition to using our PyTorch layer to convert to CoreML, we also provide an example of how to use the native torchaudio Melspectrogram together with coremltools Model Intermediate Language (MIL) operators for conversion. To do this, please see the example below and the corresponding unit tests.

The MIL implementation is provided as an illustrative example; for regular use, prefer the native STFT conversion implementation provided in coremltools.

import torchaudio

output_path = '/tmp/melspec-mil.mlmodel'

# Use native torchaudio melspec + CoreMLTools MIL
melspec = torchaudio.transforms.MelSpectrogram(
                sample_rate=16000,
                n_fft=1024,
                hop_length=512,
                power=2.0)

# Trace model
traced_model = torch.jit.trace(melspec, x)

# Convert traced model to CoreML
input_tensors = [ct.TensorType(name="input", shape=(x.shape))]
mlmodel = ct.convert(model=traced_model,
                     inputs=input_tensors,
                     compute_units=ct.ComputeUnit.ALL,
                     minimum_deployment_target=None)

# Save to disk
mlmodel.save(output_path)

Unit test

To run our unit tests and inspect code examples for each mode of operation per platform, please see below.

cd <convmelspec>

python -m unittest discover tests

License and Citation

This code is licensed under an Apache 2.0 license. If you use code from this work in academic publications, please cite our repo:

@misc{convmelspec,
  author = {Nicholas J. Bryan and Oriol Nieto and Juan-Pablo Caceres},
  title = {Convmelspec: Melspectrograms for On-Device Audio Machine Learning},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{http://github.com/adobe-research/convmelspec}},
}

Authors

Contributors include Nicholas J. Bryan, Oriol Nieto, and Juan-Pablo Caceres.

convmelspec's Issues

Does the ONNX Variant allow variable input length

Hi guys, thank you for the library. I am running into the error below after exporting the ONNX file with an input length of 16000; however, the audio I want to process can be of variable length.

Cell In[23], line 18
     15 encoder_path = "melspec.onnx"
     16 encoder_session = ort.InferenceSession(encoder_path)
---> 18 features = encoder_session.run(
     19     None,
     20     {
     21         "input": input_signal.numpy(),
     22     },
     23 )

File /opt/homebrew/Caskroom/miniconda/base/envs/nemo/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:220, in Session.run(self, output_names, input_feed, run_options)
    218     output_names = [output.name for output in self._outputs_meta]
    219 try:
--> 220     return self._sess.run(output_names, input_feed, run_options)
    221 except C.EPFail as err:
    222     if self._enable_fallback:

InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Got invalid dimensions for input: input for the following indices
 index: 1 Got: 1536768 Expected: 16000
 Please fix either the inputs or the model.

Spectrograms Not Equal When Running Tests

Hi, thanks for the repo, it seems promising. We ran some tests with your test file and got:

AssertionError: 
Not equal to tolerance rtol=1e-07, atol=0.0001
(shapes (513, 30), (513, 1, 14977) mismatch)
 x: array([[4.878030e-09, 4.919218e-07, 3.511721e-04, ..., 1.196331e-01,
        3.467581e-02, 9.270541e-03],
       [1.071588e-08, 1.510490e-06, 3.254034e-04, ..., 9.578773e-02,...
 y: array([[[4.892740e-09, 5.184347e-09, 5.963557e-09, ..., 1.972358e-02,
         1.956094e-02, 1.939876e-02]],

This is just for the spectrograms; we are not currently testing the mel spectrograms. We are testing test_melpec_vs_torchaudio().
Any ideas on the possible reasons?

UPDATE: similar outcome with test_melpec_vs_librosa()
