dansuh17 / segan-pytorch

SEGAN pytorch implementation https://arxiv.org/abs/1703.09452

License: GNU General Public License v3.0

Python 100.00%

segan pytorch data-preprocessing segan-pytorch audio speech-enhancement source-separation mir

segan-pytorch's Introduction

PyTorch Implementation of SEGAN (Speech Enhancement GAN)

A PyTorch implementation of SEGAN (Pascual et al., 2017). The original TensorFlow version can be found here.

Prerequisites

python v3.5.2 or higher
pytorch v0.4.0
CUDA preferred
noisy speech dataset downloaded from here
libraries specified in requirements.txt

Installing Required Libraries

pip install -r requirements.txt

Data Preprocessing

Use data_preprocess.py to preprocess the downloaded data. Adjust the file paths at the beginning of the file so that they point to the data files, output folder, etc. Uncomment the functions in __main__ to run the desired preprocessing stage.

Data preprocessing consists of three main stages:

Downsampling - downsample the original audio files from 48 kHz to a sampling rate of 16 kHz.
Serialization - split the audio files into 2^14-sample (16384 samples, about 1 second at 16 kHz) snippets.
Verification - check that each serialized snippet contains the expected number of samples.

Note that the second stage takes a fairly long time - more than an hour.
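The serialization and verification stages can be sketched as follows. This is a minimal illustration, not the code in data_preprocess.py: the function names, the 50% overlap between windows, and the choice to drop a trailing partial window are all assumptions. Downsampling is left out because it needs a resampling library.

```python
import numpy as np

WINDOW_SIZE = 2 ** 14  # 16384 samples, about 1.024 s at 16 kHz


def slice_signal(signal, window_size=WINDOW_SIZE, stride=0.5):
    """Split a 1-D signal into fixed-length windows.

    `stride` is the hop expressed as a fraction of the window size
    (0.5 means 50% overlap, an assumed value). A trailing piece that is
    shorter than one window is dropped in this sketch.
    """
    hop = int(window_size * stride)
    windows = []
    for start in range(0, len(signal) - window_size + 1, hop):
        windows.append(signal[start:start + window_size])
    return windows


def verify_windows(windows, window_size=WINDOW_SIZE):
    """Return True if every window has exactly `window_size` samples."""
    return all(len(w) == window_size for w in windows)
```

For a 3-second signal at 16 kHz (48000 samples), slice_signal yields four overlapping windows, each of which passes verify_windows.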

Training

python model.py

As with preprocessing, adjust the data paths in model.py to match your setup. In particular, set the path to the directory where the serialized data are stored.

Using Tensorboard

To use tensorboard, first install it:

pip install tensorboard

Then run tensorboard, specifying the log directory:

tensorboard --logdir=segan_data_out/tblogs

segan-pytorch's Issues

de_emphasis function is broken

It looks like your de_emphasis function is broken: batch != de_emphasis(pre_emphasis(batch)). Did you test it?
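For reference, a pre-emphasis/de-emphasis pair that does round-trip can be sketched as below. This is not the repository's code: the coefficient of 0.95 and the 1-D NumPy interface are assumptions, and a batch would need the functions applied per signal. One way such a pair can fail to round-trip is when de-emphasis adds back the previous filtered sample instead of the previous restored sample; whether that is the cause here is not established.

```python
import numpy as np


def pre_emphasis(signal, coeff=0.95):
    """First-order high-pass: y[n] = x[n] - coeff * x[n-1], with y[0] = x[0]."""
    signal = np.asarray(signal, dtype=np.float64)
    return np.concatenate(([signal[0]], signal[1:] - coeff * signal[:-1]))


def de_emphasis(signal, coeff=0.95):
    """Inverse filter: x[n] = y[n] + coeff * x[n-1], with x[0] = y[0].

    The recursion uses the previously *restored* sample, which is what
    makes this the exact inverse of pre_emphasis.
    """
    signal = np.asarray(signal, dtype=np.float64)
    restored = np.empty_like(signal)
    restored[0] = signal[0]
    for n in range(1, len(signal)):
        restored[n] = signal[n] + coeff * restored[n - 1]
    return restored
```

A quick check is np.allclose(de_emphasis(pre_emphasis(x)), x) on a random 1-D array x.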

The output of discriminator is [1 x 8]?

As a true-or-false indicator, shouldn't the output of the discriminator be a scalar?

The generated audio is empty

The waveform of the denoised audio is all zeros.

OSError: [Errno 5] Input/output error

Hello, how can I resolve this problem? Thank you.
Traceback (most recent call last):
  File "model.py", line 399, in <module>
    for i, sample_batch_pairs in enumerate(random_data_loader):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 623, in __next__
    return self._process_next_batch(batch)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
OSError: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/content/drive/SEGAN/data_generator.py", line 56, in __getitem__
    pair = np.load(self.filepaths[idx])
  File "/usr/local/lib/python3.6/dist-packages/numpy/lib/npyio.py", line 404, in load
    magic = fid.read(N)
OSError: [Errno 5] Input/output error

D loss is very low but G loss is very high

From around epoch 4 onward, the D loss dropped below 0.001 while the G loss stayed very high, at about 100. I don't know whether the cause is that I changed batch_size to 32. I hope you can answer, thanks!

tensorboard audio logging error

Seeing warnings like:

2019-02-26 21:08:22,621 STDOUT warning: audio amplitude out of range, auto clipped.
2019-02-26 21:08:22,659 STDOUT warning: audio amplitude out of range, auto clipped.

during training. Normalizing the audio signals might help.
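One possible form of that normalization is sketched below: the signal is scaled down only when its peak exceeds 1, so that it fits the [-1, 1] range before being logged. The function name is hypothetical, and whether to scale only out-of-range signals or to peak-normalize everything is a design choice this issue does not settle.

```python
import numpy as np


def fit_to_unit_range(audio):
    """Scale audio into [-1, 1] if its peak amplitude exceeds 1.

    Signals already inside the range are returned unscaled, so quiet
    samples keep their original loudness.
    """
    audio = np.asarray(audio, dtype=np.float32)
    peak = float(np.max(np.abs(audio))) if audio.size else 0.0
    return audio / max(peak, 1.0)
```

The result can then be passed to whatever audio-logging call the training script uses.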

Discriminator Sigmoid layer

Hi,

Thanks for this clear implementation. I would like to know why you used a Sigmoid function as the output layer of the discriminator network.

I cannot find it used in the TF implementation.

Best Regards

how to predict

Hi,
thank you for the nice implementation.

For prediction, should I use the original TensorFlow version?
(https://github.com/santi-pdp/segan/blob/master/main.py)

Your README only explains training, so I would like to confirm how to run prediction with the model.

Regards,

Train result is not consistent with the original paper

Hi,
Thanks for your code. I used it to train the SEGAN network, but the results are not consistent with the original paper: the noisy signal is denoised less. Have you compared your results with the original paper?
Thanks in advance.

Add example audio files.

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1.

Hello, I really like the implementation you wrote.
I used TIMIT and NOISEX-92 to synthesize my own dataset, but at runtime I get the following error:

  File "model.py", line 291, in forward
    encoded = torch.cat((c, z), dim=1)

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 32 and 3 in dimension 0 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:83

RuntimeError: tensors are on different GPUs

I'm getting this error with:

Python 3.5
CUDA 8

Any idea how this can be solved?

Generator created
DataLoader created
Test samples loaded
Starting Training...
Traceback (most recent call last):
  File "model.py", line 396, in <module>
    outputs = discriminator(batch_pairs_var, ref_batch_var)  # output : [40 x 1 x 8]
  File "/home/xxxxx/seganPyTorch-env/lib/python3.5/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xxxxx/seganPyTorch-env/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 66, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/xxxxx/seganPyTorch-env/lib/python3.5/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "model.py", line 109, in forward
    ref_x = self.conv1(ref_x)
  File "/home/xxxxx/seganPyTorch-env/lib/python3.5/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xxxxx/seganPyTorch-env/lib/python3.5/site-packages/torch/nn/modules/conv.py", line 166, in forward
    self.padding, self.dilation, self.groups)
  File "/home/xxxxx/seganPyTorch-env/lib/python3.5/site-packages/torch/nn/functional.py", line 54, in conv1d
    return f(input, weight, bias)
RuntimeError: tensors are on different GPUs

use tensorboard for viz

https://github.com/lanpa/tensorboard-pytorch

Remove unused directory information in model.py

Some directory info is unused and can be confusing.

Improving other noisy data instead of Voice bank

Hi,
I have used the GAN for speech enhancement on the AMI speech dataset, but it does not improve the noisy far-field speech samples, although it works well on the Voice Bank dataset as mentioned in the paper.
Could you please comment on this?
Thanks

Define conditional losses.

Script to generate audio

It would be good to have a separate script just to generate audio from a trained model.
For example, you feed a noisy signal to a trained generator and get an enhanced signal back.

log more info in tensorboard

epoch
sample diff (L2 loss)
clean vs noisy vs enhanced audio samples
etc.

Update the script to work with pytorch v0.4

apply possible API changes

model.py:254: UserWarning: nn.init.xavier_normal is now deprecated in favor of nn.init.xavier_normal_.
model.py:389: UserWarning: nn.init.normal is now deprecated in favor of nn.init.normal_

2018-06-26 13:56:18,462 STDOUT model.py:430: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
2018-06-26 13:56:18,463 STDOUT   epoch + 1, i + 1, clean_loss.data[0],
2018-06-26 13:56:18,463 STDOUT model.py:431: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number

why your test data is in your train batch

Thanks for your code, it helps me a lot. I ran into some questions while training:
I found that your batch size is 200 while the original paper used 400. Have you tried other batch sizes?
Your test data is in your train batch. Is there any reason for doing this?
Why is the activation function in the discriminator using "negative_slope = 0.03" (different from the original paper)?

Thank you.

How to compute the SSNR and other evaluation parameters

Hi,
Thank you for the nice implementation.
model.py contains only the training code, and I don't know how to test the model and compute the SSNR and the other evaluation metrics used in the original paper.
Could you give me some advice to solve this problem?

Thanks!
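As a starting point, segmental SNR (SSNR) can be sketched as the mean of per-frame SNR values between the clean reference and the enhanced signal, with each frame value clamped to a fixed range. The frame length, hop, and the [-10, 35] dB clamp below are assumed conventions, not values taken from this repository, and results may differ from the evaluation tooling used for the paper.

```python
import numpy as np


def segmental_snr(clean, enhanced, frame_len=512, hop=256,
                  min_db=-10.0, max_db=35.0, eps=1e-10):
    """Mean per-frame SNR in dB between a clean reference and an estimate.

    Frame parameters and the clamping range are assumptions; adjust them
    to match the evaluation protocol you want to reproduce.
    """
    clean = np.asarray(clean, dtype=np.float64)
    enhanced = np.asarray(enhanced, dtype=np.float64)
    n = min(len(clean), len(enhanced))  # compare only the overlapping part

    frame_snrs = []
    for start in range(0, n - frame_len + 1, hop):
        ref = clean[start:start + frame_len]
        err = ref - enhanced[start:start + frame_len]
        snr = 10.0 * np.log10((np.sum(ref ** 2) + eps) / (np.sum(err ** 2) + eps))
        frame_snrs.append(np.clip(snr, min_db, max_db))

    if not frame_snrs:
        raise ValueError("signals are shorter than one frame")
    return float(np.mean(frame_snrs))
```

Both signals should share the same sampling rate and be time-aligned before the comparison; a higher value means the enhanced signal is closer to the clean reference.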

Save the Optimizer

Hi @densuh ,

It seems that resuming the training process is incomplete as the optimizers for both generator and discriminator networks are not being saved.