teapoly / spectrumaugmenter

Performs data augmentation according to the SpecAugment paper. Modified from Lingvo (TensorFlow > 1.10.0).

Python 100.00%
tensorflow specaugment lingvo asr spectrumaugmenter

spectrumaugmenter's Introduction

SpectrumAugmenter

Performs data augmentation according to the SpecAugment paper.

Modified from Lingvo; the test audio file is taken from Sound Examples.

Requirements

  • TensorFlow

For visualization (optional)

  • matplotlib
  • librosa
  • numpy

How to use

from __future__ import absolute_import, division, print_function

import librosa
import tensorflow as tf

from spectrum_augmenter import SpectrumAugmenter


if __name__ == '__main__':
    # Load an audio file as a floating point time series.
    audio, sampling_rate = librosa.load("test.wav")

    # Compute a mel-scaled spectrogram.
    mel_spectrogram = librosa.feature.melspectrogram(y=audio,
                                                     sr=sampling_rate,
                                                     n_mels=256,
                                                     hop_length=128,
                                                     fmax=8000)

    # (frequency, time) -> (time, frequency)
    mel_spectrogram = mel_spectrogram.transpose()

    # Insert a leading batch dimension of size 1.
    # (time, frequency) -> (batch_size, time, frequency)
    mel_spectrogram = mel_spectrogram.reshape(
        (1, mel_spectrogram.shape[0], mel_spectrogram.shape[1]))

    config = dict(
        # Maximum number of frequency bins of frequency masking.
        freq_mask_max_bins=30,
        # Number of times we apply masking on the frequency axis.
        freq_mask_count=2,
        # Maximum number of frames of time masking. Overridden when use_dynamic_time_mask_max_frames = True.
        time_mask_max_frames=40,
        # Number of times we apply masking on the time axis. Acts as upper-bound when time_masks_per_frame > 0.
        time_mask_count=2,
        # Maximum number of frames for shifting in time warping.
        time_warp_max_frames=80,
    )

    specaug = SpectrumAugmenter(config)

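    # (batch_size, time, frequency)
    # After the reshape above, shape[0] is the batch size (1) and
    # shape[1] is the number of time frames, so seq_len uses shape[1].
    warped_masked_spectrogram = specaug(
        tf.convert_to_tensor(mel_spectrogram),
        tf.convert_to_tensor([mel_spectrogram.shape[1]])  # seq_len
    )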
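
To make the masking parameters in the config above more concrete, here is a minimal NumPy sketch of frequency and time masking on a single (time, frequency) spectrogram. It is an illustration only, not the Lingvo or SpectrumAugmenter implementation: the function name mask_spectrogram is hypothetical, masked regions are simply set to zero, mask widths are drawn uniformly, and time warping is left out.

```python
import numpy as np


def mask_spectrogram(spec, seq_len, freq_mask_max_bins=30, freq_mask_count=2,
                     time_mask_max_frames=40, time_mask_count=2, rng=None):
    """Zero out random frequency bands and time spans of a (time, freq) array.

    Hypothetical helper for illustration; it does not reproduce the
    exact sampling scheme used by Lingvo.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()
    num_frames, num_bins = out.shape

    # Frequency masking: each mask covers up to freq_mask_max_bins bins.
    for _ in range(freq_mask_count):
        width = int(rng.integers(0, min(freq_mask_max_bins, num_bins) + 1))
        start = int(rng.integers(0, num_bins - width + 1))
        out[:, start:start + width] = 0.0

    # Time masking: each mask covers up to time_mask_max_frames frames
    # and stays inside the valid (unpadded) part of the sequence.
    valid = min(seq_len, num_frames)
    for _ in range(time_mask_count):
        width = int(rng.integers(0, min(time_mask_max_frames, valid) + 1))
        start = int(rng.integers(0, valid - width + 1))
        out[start:start + width, :] = 0.0

    return out


if __name__ == "__main__":
    dummy = np.ones((200, 256), dtype=np.float32)
    masked = mask_spectrogram(dummy, seq_len=200)
    print(masked.shape, int((masked == 0).sum()), "values masked")
```

With freq_mask_count=2 and freq_mask_max_bins=30, at most 60 frequency bins are masked per utterance; likewise at most 80 frames with the time settings shown.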

Reference

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

spectrumaugmenter's People

Contributors

teapoly

Forkers

runngezhang-jx

spectrumaugmenter's Issues

regenerate the wav file after the augmentation

Hi,
I'm trying to extend this code and regenerate the wav file from the spectrogram after the augmentation. I added the line below at line 85:
wav=librosa.feature.inverse.mel_to_audio(warped_masked_spectrogram)

However, I'm getting the following error. Any idea how this can be fixed?
Traceback (most recent call last):
  File "optuna.py", line 512, in <module>
    main(sys.argv[1:])
  File "optuna.py", line 427, in main
    X_train, y_train , X_test , y_test = create_dataset(path)
  File "optuna.py", line 76, in create_dataset
    extract_features(wav, cls, model, samples , labels , aug_samples , aug_labels )
  File "optuna.py", line 366, in extract_features
    wav=librosa.feature.inverse.mel_to_audio(warped_masked_spectrogram)
  File "C:\Users\ash_j\anaconda3\envs\yamnet\lib\site-packages\librosa\feature\inverse.py", line 172, in mel_to_audio
    stft = mel_to_stft(M, sr=sr, n_fft=n_fft, power=power, **kwargs)
  File "C:\Users\ash_j\anaconda3\envs\yamnet\lib\site-packages\librosa\feature\inverse.py", line 83, in mel_to_stft
    mel_basis = filters.mel(sr, n_fft, n_mels=M.shape[0], dtype=M.dtype, **kwargs)
  File "C:\Users\ash_j\anaconda3\envs\yamnet\lib\site-packages\librosa\filters.py", line 209, in mel
    weights = np.zeros((n_mels, int(1 + n_fft // 2)), dtype=dtype)
TypeError: Cannot interpret 'tf.float32' as a data type
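
Judging from the traceback, librosa is receiving a TensorFlow tensor (its dtype is tf.float32) where it expects a NumPy array in (frequency, time) layout. The sketch below shows one possible conversion, assuming the augmented output has already been evaluated to array-like values of shape (batch_size, time, frequency); in TF1 graph mode that would mean running the tensor in a session first. The helper names to_librosa_layout and mel_to_wav are hypothetical, and hop_length=128 and fmax=8000 are assumed to match the settings used when the mel spectrogram was computed.

```python
import numpy as np


def to_librosa_layout(augmented, index=0):
    """Pick one utterance and convert (batch, time, freq) -> (freq, time).

    Hypothetical helper: `augmented` must be array-like values, not a
    symbolic TensorFlow tensor.
    """
    mel = np.asarray(augmented, dtype=np.float32)
    if mel.ndim != 3:
        raise ValueError("expected shape (batch_size, time, frequency)")
    # Drop the batch dimension and transpose back to librosa's layout.
    return np.ascontiguousarray(mel[index].T)


def mel_to_wav(mel, sr, hop_length=128, fmax=8000):
    """Approximate the waveform from a (freq, time) mel spectrogram."""
    import librosa  # imported lazily so the helper above works without it

    # The parameters must match those used for melspectrogram();
    # fmax is forwarded to the mel filter bank construction.
    return librosa.feature.inverse.mel_to_audio(
        mel, sr=sr, hop_length=hop_length, fmax=fmax)


if __name__ == "__main__":
    fake_output = np.random.rand(1, 50, 256)  # stand-in for the evaluated tensor
    mel = to_librosa_layout(fake_output)
    print(mel.shape, mel.dtype)
```

The inversion is only approximate, and masked regions will come back as silence or attenuated bands rather than the original audio.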

Tensorflow 2

Hello,
The current code doesn't work on TensorFlow 2. Is there any chance of updating it?
Thanks
