dansuh17 / segan-pytorch

SEGAN pytorch implementation https://arxiv.org/abs/1703.09452

License: GNU General Public License v3.0

Python 100.00%
segan pytorch data-preprocessing segan-pytorch audio speech-enhancement source-separation mir

segan-pytorch's Introduction

Pytorch Implementation of SEGAN (Speech Enhancement GAN)

A PyTorch implementation of SEGAN (Pascual et al., 2017). The original TensorFlow version can be found here.

Prerequisites

  • python v3.5.2 or higher
  • pytorch v0.4.0
  • CUDA preferred
  • noisy speech dataset downloaded from here
  • libraries specified in requirements.txt

Installing Required Libraries

pip install -r requirements.txt

Data Preprocessing

Use the data_preprocess.py file to preprocess the downloaded data. Adjust the file paths at the beginning of the file so that they point to the data files, output folder, etc. Uncomment the functions in __main__ to run the desired preprocessing stage.

Data preprocessing consists of three main stages:

  1. Downsampling - downsample the original audio files (48 kHz) to a sampling rate of 16 kHz.
  2. Serialization - split the audio files into 2^14-sample (about 1 second) snippets.
  3. Verification - check whether each snippet contains the proper number of samples.

Note that the second stage takes a fairly long time - more than an hour.
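
The serialization and verification stages can be sketched roughly as follows. This is a minimal illustration and not the code in data_preprocess.py: the function names slice_signal and verify_snippets are hypothetical, and the non-overlapping default stride is an assumption.

```python
import numpy as np

WINDOW = 2 ** 14  # 16384 samples, about 1.02 s at 16 kHz


def slice_signal(signal, window=WINDOW, stride=WINDOW):
    """Split a 1-D signal into fixed-length snippets.

    stride == window gives non-overlapping snippets. That default is an
    assumption for illustration; the repository may use a different stride.
    Trailing samples that do not fill a whole window are dropped.
    """
    signal = np.asarray(signal)
    starts = range(0, len(signal) - window + 1, stride)
    return [signal[s:s + window] for s in starts]


def verify_snippets(snippets, window=WINDOW):
    """Return True if every snippet has exactly `window` samples."""
    return all(len(s) == window for s in snippets)


# Three seconds of stand-in audio at 16 kHz (48000 samples).
audio = np.zeros(3 * 16000, dtype=np.float32)
snippets = slice_signal(audio)
print(len(snippets), verify_snippets(snippets))  # 2 True
```

Passing a smaller stride (for example WINDOW // 2) produces overlapping snippets, at the cost of more files on disk.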

Training

python model.py

Again, adjust the data paths in model.py according to your needs. In particular, provide the correct path to where the serialized data are stored.

Using Tensorboard

In order to use tensorboard, you need to first install tensorboard:

pip install tensorboard

Then run tensorboard by specifying the log directory.

tensorboard --logdir=segan_data_out/tblogs

segan-pytorch's Issues

Save the Optimizer

Hi @densuh ,

It seems that resuming the training process is incomplete as the optimizers for both generator and discriminator networks are not being saved.

Best Regards
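
A minimal sketch of what saving and restoring the optimizers alongside the networks could look like. The helper names, the dictionary keys, the tiny stand-in networks, and the choice of RMSprop are all assumptions for illustration and are not taken from model.py.

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_checkpoint(path, epoch, generator, discriminator, g_optimizer, d_optimizer):
    """Store both networks and both optimizers in a single file."""
    torch.save({
        'epoch': epoch,
        'generator': generator.state_dict(),
        'discriminator': discriminator.state_dict(),
        'g_optimizer': g_optimizer.state_dict(),
        'd_optimizer': d_optimizer.state_dict(),
    }, path)


def load_checkpoint(path, generator, discriminator, g_optimizer, d_optimizer):
    """Restore all four objects in place and return the saved epoch."""
    checkpoint = torch.load(path, map_location='cpu')
    generator.load_state_dict(checkpoint['generator'])
    discriminator.load_state_dict(checkpoint['discriminator'])
    g_optimizer.load_state_dict(checkpoint['g_optimizer'])
    d_optimizer.load_state_dict(checkpoint['d_optimizer'])
    return checkpoint['epoch']


# Stand-in networks and optimizers (hypothetical, not the SEGAN models).
gen, disc = nn.Linear(4, 4), nn.Linear(4, 1)
g_opt = torch.optim.RMSprop(gen.parameters(), lr=1e-4)
d_opt = torch.optim.RMSprop(disc.parameters(), lr=1e-4)

# One update so that the optimizers hold running statistics worth saving.
disc(gen(torch.randn(2, 4))).mean().backward()
g_opt.step()
d_opt.step()

path = os.path.join(tempfile.mkdtemp(), 'checkpoint.pt')
save_checkpoint(path, 5, gen, disc, g_opt, d_opt)

# Fresh objects, as they would exist after restarting the script.
new_gen, new_disc = nn.Linear(4, 4), nn.Linear(4, 1)
new_g_opt = torch.optim.RMSprop(new_gen.parameters(), lr=1e-4)
new_d_opt = torch.optim.RMSprop(new_disc.parameters(), lr=1e-4)
resumed_epoch = load_checkpoint(path, new_gen, new_disc, new_g_opt, new_d_opt)
print('resuming from epoch', resumed_epoch)
```

Restoring the optimizer state matters for optimizers that keep running averages, since otherwise those statistics restart from zero after a resume.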

How to compute the SSNR and other evaluation parameters

Hi,
Thank you for the nice implementation.
model.py contains only the training code, and I don't know how to test the model or compute the SSNR and the other evaluation metrics used in the original paper.
Could you give me some advice to solve this problem?

Thanks!
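
For reference, a segmental SNR can be sketched with NumPy as below. This is one common formulation, not the evaluation code used for the paper: the frame length of 512 samples, the lack of frame overlap, and the clamping range of -10 dB to 35 dB are assumptions, and the function name segmental_snr is hypothetical. Published scores may use different framing, so the numbers are not directly comparable.

```python
import numpy as np


def segmental_snr(clean, enhanced, frame_len=512, min_db=-10.0, max_db=35.0, eps=1e-10):
    """Mean per-frame SNR in dB between a clean reference and an enhanced signal.

    Frames are non-overlapping, and each frame's SNR is clamped to
    [min_db, max_db] before averaging. All defaults are assumptions.
    """
    clean = np.asarray(clean, dtype=np.float64)
    enhanced = np.asarray(enhanced, dtype=np.float64)
    n_frames = min(len(clean), len(enhanced)) // frame_len
    if n_frames == 0:
        raise ValueError('signals are shorter than one frame')
    usable = n_frames * frame_len
    clean_frames = clean[:usable].reshape(n_frames, frame_len)
    error_frames = (clean[:usable] - enhanced[:usable]).reshape(n_frames, frame_len)
    signal_energy = np.sum(clean_frames ** 2, axis=1)
    error_energy = np.sum(error_frames ** 2, axis=1)
    snr_db = 10.0 * np.log10((signal_energy + eps) / (error_energy + eps))
    return float(np.mean(np.clip(snr_db, min_db, max_db)))


# Stand-in signals: a 440 Hz tone and a copy with a 10 % amplitude error.
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440 * t)
enhanced = 1.1 * clean
print('SSNR: {:.2f} dB'.format(segmental_snr(clean, enhanced)))
```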

Update the script to work with pytorch v0.4

apply possible API changes

 model.py:254: UserWarning: nn.init.xavier_normal is now deprecated in favor of nn.init.xavier_normal_.
model.py:389: UserWarning: nn.init.normal is now deprecated in favor of nn.init.normal_
2018-06-26 13:56:18,462 STDOUT model.py:430: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number                                             
2018-06-26 13:56:18,463 STDOUT   epoch + 1, i + 1, clean_loss.data[0],                                                
2018-06-26 13:56:18,463 STDOUT model.py:431: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
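
The replacements named in these warnings can be sketched as follows. The layer sizes and variable names are stand-ins chosen for illustration, not the actual lines of model.py.

```python
import torch
import torch.nn as nn

# Stand-in layer and tensors (hypothetical shapes).
layer = nn.Conv1d(in_channels=1, out_channels=4, kernel_size=3)
z = torch.empty(2, 8)

# In-place initializers carry a trailing underscore from v0.4 onward.
nn.init.xavier_normal_(layer.weight)  # was: nn.init.xavier_normal(layer.weight)
nn.init.normal_(z)                    # was: nn.init.normal(z)

# A loss is a 0-dim tensor; read its value with .item() instead of .data[0].
clean_loss = layer(torch.ones(1, 1, 16)).pow(2).mean()
loss_value = clean_loss.item()        # was: clean_loss.data[0]
print('clean_loss: {:.4f}'.format(loss_value))
```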

Improving other noisy data instead of Voice bank

Hi,
I have used a GAN for speech enhancement on the AMI speech dataset, but it does not improve the noisy far-field speech samples, although it works well on the Voice Bank dataset as mentioned in the paper.
Could you please give any comments?
Thanks

Script to generate audio

It would be good to have a separate script just to generate audio from a trained model.
For example, you feed a noisy signal to a trained generator and get an enhanced signal back.

Train result is not consistent with the original paper

Hi,
Thanks for your code. I used it to train the SEGAN network, but the results are not consistent with the original paper: the noisy signal is denoised less. Have you compared your results with the original paper?
Thanks in advance.

why your test data is in your train batch

Thanks for your code, it helps me a lot. I have some questions that came up during training:
I found that your batch size is 200 while the original paper used 400. Have you tried other batch sizes?
Your test data is in your training batch. Is there any reason for doing this?
Why is the hyper-parameter of the activation function used in the discriminator "negative_slope = 0.03" (different from the original paper)?

thank you

Discriminator Sigmoid layer

Hi,

Thanks for this clear implementation. I would like to know why you used a Sigmoid function as the output layer of the discriminator network.

Because I cannot find it used in the TF implementation.

Best Regards

Results

Hello Danush,
Can I know the STOI and PESQ scores with the 56-speaker dataset?

RuntimeError: tensors are on different GPUs

I'm getting this error with:

  • Python 3.5
  • CUDA 8

Any idea how this can be solved?

Generator created
DataLoader created
Test samples loaded
Starting Training...
Traceback (most recent call last):
  File "model.py", line 396, in <module>
    outputs = discriminator(batch_pairs_var, ref_batch_var)  # output : [40 x 1 x 8]
  File "/home/xxxxx/seganPyTorch-env/lib/python3.5/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xxxxx/seganPyTorch-env/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 66, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/xxxxx/seganPyTorch-env/lib/python3.5/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "model.py", line 109, in forward
    ref_x = self.conv1(ref_x)
  File "/home/xxxxx/seganPyTorch-env/lib/python3.5/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xxxxx/seganPyTorch-env/lib/python3.5/site-packages/torch/nn/modules/conv.py", line 166, in forward
    self.padding, self.dilation, self.groups)
  File "/home/xxxxx/seganPyTorch-env/lib/python3.5/site-packages/torch/nn/functional.py", line 54, in conv1d
    return f(input, weight, bias)
RuntimeError: tensors are on different GPUs

tensorboard audio logging error

seeing errors like:

2019-02-26 21:08:22,621 STDOUT warning: audio amplitude out of range, auto clipped.
2019-02-26 21:08:22,659 STDOUT warning: audio amplitude out of range, auto clipped.

during training. Normalizing the audio signals might help.
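
A minimal sketch of such a normalization step, applied to a waveform before it is handed to the audio logger. The function name peak_normalize and the 0.99 target peak are assumptions; whether rescaling is acceptable for the logged samples depends on what the audio is meant to show.

```python
import numpy as np


def peak_normalize(signal, peak=0.99):
    """Scale a waveform down so that its largest absolute sample is at most `peak`.

    Signals that are already within range, silent, or empty are returned unscaled.
    """
    signal = np.asarray(signal, dtype=np.float32)
    if signal.size == 0:
        return signal
    max_abs = float(np.max(np.abs(signal)))
    if max_abs <= peak:
        return signal
    return signal * np.float32(peak / max_abs)


too_loud = np.array([0.5, -2.0, 1.5], dtype=np.float32)
scaled = peak_normalize(too_loud)
print(np.max(np.abs(scaled)))
```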

OSError: [Errno 5] Input/output error

Hello, how can I resolve this problem? Thank you.
Traceback (most recent call last):
  File "model.py", line 399, in <module>
    for i, sample_batch_pairs in enumerate(random_data_loader):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 623, in __next__
    return self._process_next_batch(batch)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
OSError: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/content/drive/SEGAN/data_generator.py", line 56, in __getitem__
    pair = np.load(self.filepaths[idx])
  File "/usr/local/lib/python3.6/dist-packages/numpy/lib/npyio.py", line 404, in load
    magic = fid.read(N)
OSError: [Errno 5] Input/output error

D loss is very low but G loss is very high

From about epoch 4 onward, the D loss dropped below 0.001 while the G loss stayed very high, at about 100. I don't know whether the cause is that I changed batch_size to 32. I hope to get your answer, thanks!

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1.

Hello, I really like the implementation you wrote.
I used the TIMIT and NoiseX92 to synthesize my own dataset, but at runtime, I am prompted with the following error:

  File "model.py", line 291, in forward
    encoded = torch.cat((c, z), dim=1)

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 32 and 3 in dimension 0 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:83
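
One plausible cause, which the traceback alone does not confirm, is that the last batch of an epoch holds only 3 samples while z is created with a fixed batch size of 32. Under that assumption, the sketch below shows two possible workarounds: dropping the incomplete batch with the DataLoader's drop_last option, or sizing z from the batch that actually arrived. The dataset and tensor shapes are stand-ins, not those of the repository.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

BATCH_SIZE = 32
# 67 stand-in samples: two full batches plus a remainder of 3.
dataset = TensorDataset(torch.zeros(67, 1, 16))


def batch_sizes(loader):
    """Collect the size of dimension 0 for every batch a loader yields."""
    return [batch[0].size(0) for batch in loader]


default_loader = DataLoader(dataset, batch_size=BATCH_SIZE)
full_only_loader = DataLoader(dataset, batch_size=BATCH_SIZE, drop_last=True)

print(batch_sizes(default_loader))    # [32, 32, 3]
print(batch_sizes(full_only_loader))  # [32, 32]

# Alternative: build z to match whatever batch size was received.
encoded_shapes = []
for (c,) in default_loader:
    z = torch.randn(c.size(0), 1, 16)
    encoded = torch.cat((c, z), dim=1)
    encoded_shapes.append(tuple(encoded.shape))
```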
