
maum-ai / nuwave

280.0 11.0 20.0 27.11 MB

NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling @ INTERSPEECH 2021

Home Page: https://mindslab-ai.github.io/nuwave/

License: BSD 3-Clause "New" or "Revised" License

Dockerfile 1.95% Python 98.05%
upsampling super-resolution deep-learning deep-generative-model pytorch neural-audio-upsampling

nuwave's People

Contributors

seungwoo0326

nuwave's Issues

Contribution: checkpoints AVAILABLE!

Hi guys,
First, I would like to thank @junjun3518 for the excellent work of developing and sharing the code. I trained the model following the paper's settings for two weeks on a V100 GPU, using ratio=2 and ratio=3. I would like to contribute to the project by sharing the checkpoints. Below are the download links.

nuwave x2:
https://drive.google.com/file/d/1pegayKs-i78yWlPuLIp-BCU8KxxCpBzd/view?usp=sharing

nuwave x3:
https://drive.google.com/file/d/12RUMjEALAs0EoEw6Fqf9ZkpTm3COX6sf/view?usp=sharing

The following are images of the training logs.

[Figures: nuwave x2 training logs: epoch, loss, val loss]

[Figures: nuwave x3 training logs: epoch, loss, val loss]

About pretrained model weight

Hi, thank you very much for publishing the source code. Could you share the pretrained model weights?

pretrained weight

Are there any pretrained weights? I want to test the model on my own audio dataset.

About to be a Contribution

I decided to make a Colab notebook. It took hours to sort out the pip packages and all the other version mismatches, but the notebook now seems to work: no more packaging errors or ModuleNotFoundError. The problem is that it doesn't really upscale. I used the x3 last.ckpt from freds0, which should upscale 16 kHz to 48 kHz, and fed it a 16 kHz file provided on the demo page. This is what happened:

[Screenshot: Annotation 2022-06-08 032011]

By the way, I don't understand why it produced so many outputs. I took "sample_0_48000.wav", and there is no upscaling improvement at all :( The interesting thing is that any audio software reports it as 48 kHz, but it doesn't sound like it, and the spectrogram doesn't look like it either. The 16 kHz file I experimented on was 66 KB; after the supposed upscaling it is 197 KB, but it doesn't sound like 48 kHz at ALL. Here comes the crazy part: when I imported the same 16 kHz, 66 KB file into Adobe Audition and exported it at 48 kHz (which is the wrong way to do it, but I did it anyway), the result was also 197 KB and had the same spectrogram as the NU-Wave output. That's when I knew something was wrong and that NU-Wave isn't really working properly. P.S. During inference some warnings were also printed, but they didn't stop it.
The command was:

    !python sampling.py -c last.ckpt -f sample_0.wav

and it printed:

    /usr/local/lib/python3.7/dist-packages/torch/functional.py:472: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at /pytorch/aten/src/ATen/native/SpectralOps.cpp:664.)
      normalized, onesided, return_complex)
    /usr/local/lib/python3.7/dist-packages/torch/functional.py:546: UserWarning: istft will require a complex-valued input tensor in a future PyTorch release. Matching the output from stft with return_complex=True. (Triggered internally at /pytorch/aten/src/ATen/native/SpectralOps.cpp:817.)
      normalized, onesided, length, return_complex)

I hope you can reply as soon as possible, because this problem of the file claiming to be 48 kHz without sounding like it is really driving me crazy :)

Have a nice day/night
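The symptom described in this issue can be checked numerically. Assuming 16-bit mono PCM, a 16 kHz file stores 32,000 bytes per second, so 66 KB is about 2 s of audio; the same duration at 48 kHz takes about three times as many bytes (roughly 198 KB) whether or not any new content was generated, so the file size alone says nothing about upsampling quality. A more direct check is the fraction of spectral energy above the input's Nyquist frequency (8 kHz for a 16 kHz input). The following is a minimal NumPy sketch; the function names are hypothetical and not part of the nuwave repository, and only 16-bit PCM WAV files are handled.

```python
import wave

import numpy as np


def load_wav_mono(path):
    """Read a 16-bit PCM WAV file into a float array in [-1, 1) plus its sample rate."""
    with wave.open(path, "rb") as f:
        sr = f.getframerate()
        n_ch = f.getnchannels()
        if f.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = f.readframes(f.getnframes())
    x = np.frombuffer(raw, dtype="<i2").astype(np.float64) / 32768.0
    if n_ch > 1:
        x = x.reshape(-1, n_ch).mean(axis=1)  # average the channels down to mono
    return x, sr


def high_band_energy_ratio(x, sr, cutoff_hz):
    """Fraction of spectral energy above cutoff_hz (e.g. the input file's Nyquist)."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    total = power.sum()
    if total == 0:
        return 0.0  # silent signal: no energy anywhere
    return float(power[freqs > cutoff_hz].sum() / total)
```

For example, `high_band_energy_ratio(*load_wav_mono("sample_0_48000.wav"), cutoff_hz=8000)` returning a value near zero would mean nothing was added above 8 kHz, which is also what plain resampling (as in the Adobe Audition comparison) produces.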

How do I upsample a wav once the model is trained?

I think this is a very interesting project and I'd like to test it but I'm not a data scientist. I see how to train and test but I don't see any examples of how to use it. I looked through the code but didn't see anything that gave me a clear indication on how to use it. How do I upsample a wav once the model is trained?
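Another issue on this page runs inference with `!python sampling.py -c last.ckpt -f sample_0.wav`, i.e. a trained checkpoint passed with `-c` and an input WAV passed with `-f`. Below is a minimal Python sketch that wraps that same command; only those two flags are taken from the page, the helper names are hypothetical, and the script is assumed to be run inside a checkout of the nuwave repository.

```python
import subprocess
import sys


def build_sampling_cmd(checkpoint, wav_path, python=sys.executable):
    """Build the sampling.py command line seen on this page (-c checkpoint, -f input wav)."""
    return [python, "sampling.py", "-c", checkpoint, "-f", wav_path]


def upsample(checkpoint, wav_path, repo_dir="."):
    """Run sampling.py from the nuwave checkout; raises CalledProcessError on failure.

    Relative paths for checkpoint and wav_path are resolved against repo_dir.
    """
    subprocess.run(build_sampling_cmd(checkpoint, wav_path), cwd=repo_dir, check=True)
```

Usage would look like `upsample("last.ckpt", "sample_0.wav", repo_dir="path/to/nuwave")` (the path is a placeholder). In the Colab report on this page, such a run wrote several files, including sample_0_48000.wav.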

Upsample file has static in the background or complete silence

Hello there, Junhyeok & Seungu,

My name is David. I'm writing an article about your awesome repository for the Level Up Coding publication on Medium.

I'm still new to deep learning so I've been stumbling a bit through your implementation.

I think your paper mentioned that 8 epochs produced similar results to 1000 epochs.

So far I have trained the model for 7 epochs on a GTX 1080 Ti (11 GB) with a batch size of 3.
It created a checkpoint file for the 5th epoch.
It also created an EMA checkpoint file for the 7th epoch.

Here's the strange part...

The regular checkpoints produce an upsample file that has constant static in the background.
The ema checkpoints produce an upsample file with complete silence.

Would either of you be able to help shed some light on how to make the most of your awesome repository?

With appreciation,

David
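One way to reason about the EMA checkpoint is to look at what an exponential moving average of weights does early in training. In the usual formulation, a shadow copy is updated after each step as shadow <- decay * shadow + (1 - decay) * weights; if the weights move from their initial value to a new value and stay there, the shadow has covered only 1 - decay^n of that distance after n steps. The sketch below illustrates this update in plain Python; the decay value in the example, and whether nuwave's EMACallback uses exactly this update rule, are assumptions rather than facts taken from the repository.

```python
def ema_update(shadow, params, decay):
    """One EMA step per parameter: shadow <- decay * shadow + (1 - decay) * params."""
    return {name: decay * shadow[name] + (1.0 - decay) * params[name] for name in shadow}


def ema_after_steps(init, target, decay, steps):
    """Shadow value after `steps` updates toward a constant `target`, starting from `init`."""
    shadow = {"w": init}
    for _ in range(steps):
        shadow = ema_update(shadow, {"w": target}, decay)
    return shadow["w"]
```

With a hypothetical decay of 0.9999, `ema_after_steps(0.0, 1.0, 0.9999, 1000)` is about 0.095, so after a thousand steps the averaged weights would still sit about 90% of the way back at their initialization. If the real decay is similar, an early EMA checkpoint could behave very differently from the raw one.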

Pretrained model

Hi! Thank you for sharing your code.

Do you plan to share pretrained models?

robustness issue: use of methods marked for deprecation

Hi,

  1. Do you have pretrained models? I don't see them linked on GitHub. It would be great to have them.

  2. To test your model I'm retraining it, and I noticed a couple of easy fixes that would make the code robust to current library versions:
    librosa 0.9 and pytorch-lightning 1.4
    -- I get that you pinned older libraries (librosa 0.8 and pytorch-lightning 1.1.6) in the requirements, but the APIs I changed were already marked for deprecation, and since my environment was already built I didn't want to install older libraries. So this is for your consideration only: you may want to keep the old code, but it doesn't work for me. I forked the repo, and while I don't know whether every step runs correctly, it seems to be training all right.

file: nuwave/utils/wav2pt.py
On librosa 0.9.0, effects.trim() requires keyword arguments for all but the first argument; the minimal change is:

    rosa.effects.trim(y, top_db=15)

file: nuwave/trainer.py
pytorch-lightning has the terrible habit of deprecating and renaming things; I think these changes should also work in the older version, since the old arguments were already slated for deprecation. From the CHANGELOG:
(#5321) Removed deprecated checkpoint argument filepath Use dirpath + filename instead
(#6162) Removed deprecated ModelCheckpoint arguments prefix, mode="auto"

    checkpoint_callback = ModelCheckpoint(dirpath=hparams.log.checkpoint_dir,
                                          filename=ckpt_path,
                                          verbose=True,
                                          save_last=True,
                                          save_top_k=3,
                                          monitor='val_loss',
                                          mode='min')

The Trainer() class does not accept the checkpoint_callback kwarg.
(#9754) Deprecate checkpoint_callback from the Trainer constructor in favour of enable_checkpointing

    trainer = Trainer(
        # Deprecated by #9754; newer releases use enable_checkpointing=True instead.
        checkpoint_callback=True,
        gpus=hparams.train.gpus,
        accelerator='ddp' if hparams.train.gpus > 1 else None,
        #plugins='ddp_sharded',
        amp_backend='apex',
        amp_level='O2',
        #num_sanity_val_steps = -1,
        check_val_every_n_epoch=2,
        gradient_clip_val=0.5,
        max_epochs=200000,
        logger=tblogger,
        progress_bar_refresh_rate=4,
        callbacks=[
            EMACallback(os.path.join(hparams.log.checkpoint_dir,
                                     f'{hparams.name}_epoch={{epoch}}_EMA')),
            checkpoint_callback,
        ],
        # Resume from the last checkpoint matching the requested epoch, unless restarting.
        resume_from_checkpoint=None
        if args.resume_from is None or args.restart else sorted(
            glob(
                os.path.join(hparams.log.checkpoint_dir,
                             f'*_epoch={args.resume_from}.ckpt')))[-1])
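The resume_from_checkpoint expression in the Trainer call above packs a conditional and a glob into one argument. The following sketch pulls the same logic into a standalone helper so it can be tested on its own; the function name is hypothetical, and unlike the one-liner, which raises IndexError when nothing matches, it raises a clearer FileNotFoundError. Because the pattern ends in `.ckpt`, files written by EMACallback with the `_EMA` suffix are never selected.

```python
import glob
import os


def find_resume_checkpoint(checkpoint_dir, resume_from=None, restart=False):
    """Mirror of the resume_from_checkpoint expression in the Trainer call.

    Returns None (start fresh) when no epoch is given or a restart is requested;
    otherwise returns the last path, in sorted order, matching '*_epoch=<N>.ckpt'.
    """
    if resume_from is None or restart:
        return None
    pattern = os.path.join(checkpoint_dir, f"*_epoch={resume_from}.ckpt")
    matches = sorted(glob.glob(pattern))
    if not matches:
        # The original one-liner would fail here with an IndexError on [-1].
        raise FileNotFoundError(f"no checkpoint matches {pattern}")
    return matches[-1]
```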

(#11578) Deprecated Callback.on_epoch_end hook in favour of Callback.on_{train/val/test}_epoch_end

    @rank_zero_only
    def on_train_epoch_end(self, trainer, pl_module):
        self.queue.append(trainer.current_epoch)
        ...

Different sampling rates

Hi!

Did you try training with other sampling-rate pairs, such as 8K->16K, 8K->22K, or 16K->22K (different from the demo page)?

And what changes would be needed to train on such data (maybe hop length, n_fft, noise_schedule, pos_emb_scale, etc.)?
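Part of the answer depends on the upsampling ratio. The hparameter.yaml posted further down this page sets `audio.ratio` as a single integer (`ratio: 2 #upscale_ratio`) alongside `sr: 48000`, which suggests the pipeline is built around integer ratios; that is an inference from the config, not something confirmed by the authors. The sketch below checks which of the requested pairs have an integer ratio and which band the model would have to synthesize (from the source Nyquist up to the target Nyquist). It reads "22K" as 22,050 Hz, which is also an assumption.

```python
from fractions import Fraction


def upsample_ratio(source_sr, target_sr):
    """Return the integer ratio target_sr / source_sr, or None if it is not an integer."""
    r = Fraction(target_sr, source_sr)
    return int(r) if r.denominator == 1 else None


def missing_band_hz(source_sr, target_sr):
    """Band the model must generate: from the source Nyquist up to the target Nyquist."""
    return source_sr / 2, target_sr / 2


for src, tgt in [(8000, 16000), (8000, 22050), (16000, 22050), (16000, 48000)]:
    print(src, "->", tgt, "ratio:", upsample_ratio(src, tgt), "band:", missing_band_hz(src, tgt))
```

Pairs such as 8 kHz to 22.05 kHz come out as non-integer, so they would presumably need either a different target rate or changes to the data pipeline; this has not been tested here.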

_pickle.UnpicklingError running test.py script (SOLVED)

Hi @junjun3518,

First, congratulations on the work. I trained the nuwave model for r=2 and r=3, but I'm having trouble running the test.py script.

If you can help me, that would be great. The following error occurs when running:

    # python test.py -r=645 -e
    checkpoint_ratio_2/nuwave_x2_01_07_22_epoch=645_EMA
    GPU available: True, used: True
    TPU available: None, using: 0 TPU cores
    LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
    full speakers 109
    2 9
    num_workers:  64
    Testing:   0%|          | 0/3552 [00:00<?, ?it/s]
    Exception in thread Thread-3:
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.6/threading.py", line 916, in _bootstrap_inner
        self.run()
      File "/opt/conda/lib/python3.6/site-packages/prefetch_generator/__init__.py", line 80, in run
        for item in self.generator:
      File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 388, in __next__
        data = self._next_data()
      File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1018, in _next_data
        return self._process_data(data)
      File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1043, in _process_data
        data.reraise()
      File "/opt/conda/lib/python3.6/site-packages/torch/_utils.py", line 420, in reraise
        raise self.exc_type(msg)
    _pickle.UnpicklingError: Caught UnpicklingError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
        data = fetcher.fetch(index)
      File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/root/Documentos/Upsampling/nuwave/dataloader.py", line 104, in __getitem__
        wav = torch.load(self.data_list[index])
      File "/opt/conda/lib/python3.6/site-packages/torch/serialization.py", line 595, in load
        return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
      File "/opt/conda/lib/python3.6/site-packages/torch/serialization.py", line 765, in _legacy_load
        magic_number = pickle_module.load(f, **pickle_load_args)
    _pickle.UnpicklingError: unpickling stack underflow

I'm using a Docker container. My hparameter.yaml file is as follows:

    train:
      batch_size: 16
      lr: 0.00003
      weight_decay: 0.00
      num_workers: 64
      gpus: 1 #ddp
      opt_eps: 1e-9
      beta1: 0.5
      beta2: 0.999

    data:
      dir: './VCTK-Corpus/wav48/' #dir/spk/format
      format: '*.wav'
      cv_ratio: (100./108., 8./108., 0.00) #train/val/test

    audio:
      sr: 48000
      nfft: 1024
      hop: 256
      ratio: 2 #upscale_ratio
      length: 32768 #32*1024 ~ 1sec

    arch:
      residual_layers: 30 #
      residual_channels: 64
      dilation_cycle_length: 10
      pos_emb_dim: 512 

    ddpm:
      max_step: 1000
      noise_schedule: "torch.linspace(1e-6, 0.006, hparams.ddpm.max_step)"
      pos_emb_scale: 50000
      pos_emb_channels: 128 
      infer_step: 8
      infer_schedule: "torch.tensor([1e-6,2e-6,1e-5,1e-4,1e-3,1e-2,1e-1,9e-1])"

    log:
      name: 'nuwave_x2'
      checkpoint_dir: 'checkpoint_ratio_2/'
      tensorboard_dir: 'tensorboard_ratio_2/'
      test_result_dir: 'test_sample/result'
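Two quick calculations help in reading this config. First, `length: 32768` samples at `sr: 48000` is 32768 / 48000, or about 0.68 s per training segment. Second, assuming (as in other DDPM-style audio models) that `noise_schedule` and `infer_schedule` both list per-step noise variances beta_t, the cumulative coefficient alpha_bar = prod(1 - beta_t) shows how much of the clean signal survives at the last step. Below is a plain-Python sketch that re-implements the two torch expressions by hand; the interpretation of the schedules is an assumption, not taken from the nuwave code.

```python
def linspace(start, stop, num):
    """Evenly spaced values including both endpoints, like torch.linspace."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def cumulative_alpha(betas):
    """alpha_bar_t = prod_{s <= t} (1 - beta_s) for every step t."""
    out, acc = [], 1.0
    for b in betas:
        acc *= 1.0 - b
        out.append(acc)
    return out


# Values copied from the hparameter.yaml above.
train_betas = linspace(1e-6, 0.006, 1000)  # noise_schedule with max_step = 1000
infer_betas = [1e-6, 2e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 9e-1]  # infer_schedule, infer_step = 8

segment_seconds = 32768 / 48000
train_alpha_bar = cumulative_alpha(train_betas)
infer_alpha_bar = cumulative_alpha(infer_betas)

print(f"segment length: {segment_seconds:.3f} s")
print(f"final alpha_bar (training, 1000 steps): {train_alpha_bar[-1]:.4f}")
print(f"final alpha_bar (inference, 8 steps):   {infer_alpha_bar[-1]:.4f}")
```

Under this reading, both schedules end with alpha_bar well below 0.1, i.e. the last step is dominated by noise in both cases even though inference uses only 8 steps.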

Thank you very much in advance.
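The traceback in this issue ends inside `torch.load(self.data_list[index])` in dataloader.py with `unpickling stack underflow`, the kind of error produced when a serialized file is truncated or otherwise corrupted. The issue is marked as solved without the fix being shown, so the following is only a hedged diagnostic sketch: it tries to load every preprocessed file with a caller-supplied loader and reports the ones that fail. The helper names and the file location in the usage note are hypothetical.

```python
import os


def find_unreadable(paths, load_fn):
    """Try load_fn(path) on every path; return (path, error) pairs for the files that fail."""
    bad = []
    for path in paths:
        try:
            load_fn(path)
        except Exception as exc:  # e.g. UnpicklingError, EOFError, RuntimeError
            bad.append((path, f"{type(exc).__name__}: {exc}"))
    return bad


def empty_files(paths):
    """Zero-byte files, one possible sign of an interrupted preprocessing run."""
    return [p for p in paths if os.path.getsize(p) == 0]
```

With torch installed, something like `find_unreadable(glob.glob("VCTK-Corpus/wav48/*/*.pt"), torch.load)` would list the offending files, assuming the preprocessed .pt files sit next to the WAVs; the glob pattern is a guess, so adjust it to wherever wav2pt.py wrote its output. Regenerating any files it reports would be the natural next step.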
