
wsrglow's Introduction

WSRGlow

The official implementation of the Interspeech 2021 paper WSRGlow: A Glow-based Waveform Generative Model for Audio Super-Resolution. Audio samples can be found here. Interactive web demo on Replicate: https://replicate.ai/zkx06111/wsrglow

Feel free to create issues or send an email to [email protected] if you have problems running the code.

Before running the code, you need to install the dependencies with pip install -r requirements.txt.

The configs for the model architecture and training scheme are saved in config.yaml. You can override some of the attributes by adding the --hparams flag when running a command. The general way to run a Python script is

python $SRC$ --config $CONFIG$ --hparams $KEY1$=$VALUE1$,$KEY2$=$VALUE2$,...

See hparams.py for more details.
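The --hparams flag takes a comma-separated list of key=value pairs. A minimal sketch of how such a string can be parsed into an override dict; this is illustrative only, and the real logic in hparams.py may also cast types and merge with the values loaded from config.yaml:

```python
def parse_hparams(hparams_str):
    """Parse a comma-separated 'key=value' string into a dict of overrides.

    Toy version for illustration: values are kept as strings, whereas the
    repository's parser may coerce them to numbers or booleans.
    """
    overrides = {}
    if not hparams_str:
        return overrides
    for pair in hparams_str.split(','):
        key, value = pair.split('=', 1)
        overrides[key.strip()] = value.strip()
    return overrides

print(parse_hparams('lr=0.0002,max_updates=1000000'))
```

Overrides parsed this way are applied on top of the attributes read from config.yaml.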

To prepare data

Before training, you need to binarize the data first. The raw wav files should be placed in hparams['raw_data_path']; the binarized data will be written to hparams['binary_data_path'].

Specifically, for the VCTK corpus, the file structure should be like

.
|--data
    |--raw
        |--VCTK-Corpus
            |--wav48
                |--$WAVS
|--checkpoints
    |--wsrglow
    

where the model checkpoints are in checkpoints/wsrglow.

The command to binarize is

python binarizer.py --config config.yaml
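Conceptually, binarization walks the raw data directory and serializes the audio into a single binary file for fast loading during training. A hedged stand-in for what binarizer.py does (the on-disk format is an assumption; to stay dependency-free this sketch reads .npy arrays rather than actual .wav files):

```python
import os
import pickle

import numpy as np

def binarize(raw_data_path, binary_data_path):
    """Toy stand-in for binarizer.py: collect audio arrays into one file.

    The real script reads the wav files under hparams['raw_data_path'] and
    follows config.yaml for both paths; here the 'wavs' are .npy arrays.
    """
    os.makedirs(binary_data_path, exist_ok=True)
    items = []
    for fn in sorted(os.listdir(raw_data_path)):
        if fn.endswith('.npy'):
            wav = np.load(os.path.join(raw_data_path, fn))
            items.append({'name': fn, 'wav': wav})
    out_fn = os.path.join(binary_data_path, 'data.pkl')
    with open(out_fn, 'wb') as f:
        pickle.dump(items, f)
    return out_fn
```

The point of this step is simply to avoid decoding and resampling wavs on every training epoch.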

To modify the architecture of the model

The current WSRGlow model in model.py is designed for x4 super-resolution and takes waveform, spectrogram and phase information as input.
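The spectrogram and phase inputs mentioned above can be illustrated with a plain framed FFT. This sketch uses NumPy rather than the repository's actual feature extraction, and the frame/hop sizes are arbitrary assumptions:

```python
import numpy as np

def stft_mag_phase(wav, n_fft=512, hop=128):
    """Compute framed FFT magnitude and phase, the two spectral quantities
    (alongside the raw waveform) that the model conditions on.

    Sizes here are illustrative; the model's real feature pipeline may
    use different windows, hops, and normalization.
    """
    window = np.hanning(n_fft)
    frames = [wav[s:s + n_fft] * window
              for s in range(0, len(wav) - n_fft + 1, hop)]
    spec = np.fft.rfft(np.stack(frames), axis=-1)  # (frames, n_fft//2 + 1)
    return np.abs(spec), np.angle(spec)

tone = np.sin(2 * np.pi * 440 * np.arange(4096) / 16000)
mag, phase = stft_mag_phase(tone)
print(mag.shape, phase.shape)
```

Magnitude and phase together carry the full complex spectrum, which is why providing both gives the model more information than a magnitude spectrogram alone.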

To train

Run python train.py --config config.yaml on a GPU.

To infer

Edit infer.py to specify the checkpoint you want to load and the sample inputs you want to run inference on (i.e., set the correct paths for the checkpoint and wav files), then run python infer.py --config config.yaml on a GPU.

Colab Sample

You can experiment with the colab sample and trained checkpoint here.

Note that the released checkpoint is trained for 2x super-resolution. You should select the GPU runtime to run the sample.

p225_001_lr.wav and p225_001_hr.wav are the LR version and HR version of the same utterance taken from the test set.


wsrglow's Issues

distorted spectrograms after model

Hi! I tried your pretrained checkpoint in colab and got some extra values at the spectrogram in the first case and broken harmonics in the second case.
First audio is 44100Hz real speech (converted to 24k and then upscaled to 48k).
Second audio is the output of text-to-speech system (22050, upscaled to 44100)

I don't hear any noticeable difference in either audio. Is this expected?


This is the spectrogram representation in Audacity (mel scale); the upper one is before, the bottom one after.

Real world application, upsampling historic recordings?

Hi, I've been testing your model for a side project I'm working on. I'd like to take early historic recordings (1890s-1920s), denoise them, and upsample them. I've already denoised them (amazingly well!), but upsampling with your model doesn't seem to be doing much. I used the code from the Colab and the config that's in the repo.

Is this not a good application of the model or did I do something incorrectly?

Here are the results I produced:
example_and_prediction_wav_files.zip

Spectrogram - top is the example wav (Thomas Edison speaking, 1912), bottom is the prediction. I can't hear a discernible difference, and I'm well versed in audio engineering.


from infer import *

set_hparams(config='config.yaml')

model = WaveGlowMelHF(**hparams['waveglow_config']).cuda()
load_ckpt(model, 'model_ckpt_best.pt')
model.eval()

fns = ['te_small.wav']

sigma = 1
for lr_fn in fns:
    lr, sr = load_wav(lr_fn)
    print(f'sampling rate (lr) = {sr}')
    print(f'lr.shape = {lr.shape}', flush=True)
    with torch.no_grad():
        pred = run(model, lr, sigma=sigma)
    print(lr.shape, pred.shape)
    pred_fn = f'pred_{lr_fn}'
    print(f'sampling rate = {sr * 2}')
    sf.write(open(pred_fn, 'wb'), pred, sr * 2)

checkpoint cannot be downloaded

I am trying to run inference with this model but I cannot download the checkpoint. Is there any way you could provide it to me?

Having Trouble in training: utils.tensors_to_scalars

Hello, I'm trying to run your code.
I just ran train.py with the command in the readme (with the additional argument --hparams work_dir=ccc), but faced this error.

File "train.py", line 143, in training_step
log_outputs = utils.tensors_to_scalars(log_outputs)
AttributeError: module 'utils' has no attribute 'tensors_to_scalars'

I looked over commit logs, but utils.py never had that function.

FileNotFoundError: [Errno 2] No such file or directory: ''

Hi, authors,
Thank you for open sourcing this great repository.

I ran python train.py --config config.yaml, and got this error: FileNotFoundError: [Errno 2] No such file or directory: ''

Traceback (most recent call last):
  File "/home/wschoi/PycharmProjects/WSRGlow/train.py", line 345, in <module>
    WaveGlowTask4.start()
  File "/home/wschoi/PycharmProjects/WSRGlow/train.py", line 274, in start
    period=1 if hparams['save_ckpt'] else 100000
  File "/home/wschoi/PycharmProjects/WSRGlow/training_utils.py", line 23, in __init__
    os.makedirs(filepath, exist_ok=True)
  File "/home/wschoi/miniconda3/envs/wsrglow/lib/python3.7/os.py", line 223, in makedirs
    mkdir(name, mode)
FileNotFoundError: [Errno 2] No such file or directory: ''

Process finished with exit code 1

I guess this error occurred because args_work_dir is set to '' whenever args.exp_name is left at its default value.

WSRGlow/hparams.py

Lines 39 to 42 in 1b8fc49

args_work_dir = ''
if args.exp_name != '':
    args.work_dir = args.exp_name
    args_work_dir = f'checkpoints/{args.work_dir}'

and then, hparams_['work_dir'] is set to args_work_dir regardless of work_dir of config.yaml.

WSRGlow/hparams.py

Lines 84 to 86 in 1b8fc49

if not args.reset:
    hparams_.update(saved_hparams)
hparams_['work_dir'] = args_work_dir


TLDR;

This error occurs only when args.exp_name == ''.

For those who want to quickly reproduce train.py, I would recommend a command like the one below.

python train.py --config config.yaml --exp_name WSRGlow

Different sampling rates

Hi,
I would like to train the model to upsample from 8 kHz to 16 kHz. What should I change?
Thank you!
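Whatever the config changes turn out to be, adapting to a new rate pair also means producing matched LR/HR training pairs. A naive sketch of deriving an 8 kHz input from a 16 kHz recording (FFT low-pass then decimation; the repository's actual data pipeline and config keys are not shown here, and a proper resampler would use a better anti-aliasing filter):

```python
import numpy as np

def make_lr(hr, factor=2):
    """Create a low-resolution copy of a high-resolution signal.

    Low-passes via FFT (zeroing bins above the new Nyquist) and then
    decimates. Illustration only, not the repository's pipeline.
    """
    spec = np.fft.rfft(hr)
    keep = len(spec) // factor           # bins below the new Nyquist
    spec[keep:] = 0.0
    low_passed = np.fft.irfft(spec, n=len(hr))
    return low_passed[::factor]

hr = np.random.randn(16000)              # one second at 16 kHz
lr = make_lr(hr, factor=2)               # the same second at 8 kHz
print(lr.shape)
```

The model then learns to predict the 16 kHz signal from its 8 kHz counterpart.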

Is it possible to implement reading any other files than WAV? (e.g. MKA (Matroska) files)

Google Colab and replicate.com virtual machines require chunking a 44 kHz stereo WAV file into 53-second parts; otherwise the run fails with a CUDA out-of-memory error.

I find it very convenient to chunk WAVs using MKVToolnix, which losslessly places the WAV inside a Matroska container.
I kind of worked around the MKA issue by using MKVExtractGUI-2 with version 20 of MKVToolnix (the only compatible one).
I also tried LosslessCut for plain WAVs, but it's more cumbersome and can't automatically split a file every 53 seconds. At least it has a merge option.
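Fixed-length chunking of plain WAVs can also be done directly with Python's standard-library wave module, without any Matroska tooling. A self-contained sketch (the 53-second default mirrors the limit mentioned above):

```python
import wave

def chunk_wav(in_fn, out_prefix, seconds=53):
    """Split a WAV into consecutive fixed-length chunks.

    Returns the list of chunk filenames written. Pure stdlib, so sample
    rate, channel count, and sample width are preserved exactly.
    """
    out_fns = []
    with wave.open(in_fn, 'rb') as src:
        params = src.getparams()
        frames_per_chunk = params.framerate * seconds
        idx = 0
        while True:
            data = src.readframes(frames_per_chunk)
            if not data:
                break
            out_fn = f'{out_prefix}_{idx:03d}.wav'
            with wave.open(out_fn, 'wb') as dst:
                dst.setparams(params)
                dst.writeframes(data)
            out_fns.append(out_fn)
            idx += 1
    return out_fns
```

Each chunk can then be upsampled independently and the outputs concatenated afterwards.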
