breizhn / dtln

Tensorflow 2.x implementation of the DTLN real time speech denoising model. With TF-lite, ONNX and real-time audio processing support.

License: MIT License

Python 100.00%

noise-reduction deep-learning audio real-time-audio audio-processing noise-suppression tensorflow dns-challenge dtln-model speech-denoising

dtln's Introduction

Dual-signal Transformation LSTM Network

Tensorflow 2.x implementation of the stacked dual-signal transformation LSTM network (DTLN) for real-time noise suppression.
This repository provides the code for training, inferring and serving the DTLN model in Python. It also provides pretrained models in SavedModel, TF-lite and ONNX format, which can be used as a baseline for your own projects. The model is able to run with real time audio on a Raspberry Pi.
If you are doing cool things with this repo, tell me about it. I am always curious about what you are doing with this code or these models.

The DTLN model was handed in to the deep noise suppression challenge (DNS-Challenge) and the paper was presented at Interspeech 2020.

This approach combines a short-time Fourier transform (STFT) and a learned analysis and synthesis basis in a stacked-network approach with less than one million parameters. The model was trained on 500h of noisy speech provided by the challenge organizers. The network is capable of real-time processing (one frame in, one frame out) and reaches competitive results. Combining these two types of signal transformations enables the DTLN to robustly extract information from magnitude spectra and incorporate phase information from the learned feature basis. The method shows state-of-the-art performance and outperforms the DNS-Challenge baseline by 0.24 points absolute in terms of the mean opinion score (MOS).

For more information see the paper. The results of the DNS-Challenge are published here. We reached a competitive 8th place out of 17 teams in the real time track.

For baseline usage and to reproduce the processing used for the paper run:

$ python run_evaluation.py -i in/folder/with/wav -o target/folder/processed/files -m ./pretrained_model/model.h5

The pretrained DTLN-aec (the DTLN applied to acoustic echo cancellation) can be found in the DTLN-aec repository.

Author: Nils L. Westhausen (Communication Acoustics, Carl von Ossietzky University, Oldenburg, Germany)

This code is licensed under the terms of the MIT license.

Citing:

If you are using the DTLN model, please cite:

@inproceedings{Westhausen2020,
  author={Nils L. Westhausen and Bernd T. Meyer},
  title={{Dual-Signal Transformation LSTM Network for Real-Time Noise Suppression}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={2477--2481},
  doi={10.21437/Interspeech.2020-2631},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2631}
}

Contents of the README:

Results
Execution Times
Audio Samples
Contents of the repository
Python dependencies
Training data preparation
Run a training of the DTLN model
Measuring the execution time of the DTLN model with the SavedModel format
Real time processing with the SavedModel format
Real time processing with tf-lite
Real time audio with sounddevice and tf-lite
Model conversion and real time processing with ONNX

Results:

Results on the DNS-Challenge non reverberant test set:

Model	PESQ [MOS]	STOI [%]	SI-SDR [dB]	TF version
unprocessed	2.45	91.52	9.07
NsNet (Baseline)	2.70	90.56	12.57
DTLN (500h)	3.04	94.76	16.34	2.1
DTLN (500h)	2.98	94.75	16.20	TF-lite
DTLN (500h)	2.95	94.47	15.71	TF-lite quantized
DTLN norm (500h)	3.04	94.47	16.10	2.2
DTLN norm (40h)	3.05	94.57	16.88	2.2
DTLN norm (40h)	2.98	94.56	16.58	TF-lite
DTLN norm (40h)	2.98	94.51	16.22	TF-lite quantized

The conversion to TF-lite slightly reduces the performance.
The dynamic range quantization of TF-lite also reduces the performance a bit and introduces some quantization noise. But the audio quality is still at a high level and the model is real-time capable on the Raspberry Pi 3 B+.
The normalization of the log magnitude of the STFT does not decrease the model performance and makes it more robust against level variations.
With data augmentation during training it is possible to train the DTLN model on just 40h of noise and speech data. If you have any questions regarding this, just contact me.


Execution Times:

Execution times for SavedModel are measured with TF 2.2 and for TF-lite with the TF-lite runtime:

System	Processor	#Cores	SavedModel	TF-lite	TF-lite quantized
Ubuntu 18.04	Intel I5 6600k @ 3.5 GHz	4	0.65 ms	0.36 ms	0.27 ms
Macbook Air mid 2012	Intel I7 3667U @ 2.0 GHz	2	1.4 ms	0.6 ms	0.4 ms
Raspberry Pi 3 B+	ARM Cortex A53 @ 1.4 GHz	4	15.54 ms	9.6 ms	2.2 ms

For real-time capability the execution time must be below 8 ms.
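
The 8 ms limit corresponds to the block shift: a block has to be finished before the next 8 ms of audio arrive. As a rough check, the sketch below compares the measured times from the table above with that budget. The dictionary layout and the helper name `is_real_time` are illustrative choices, not part of the repository.

```python
# Real-time budget: one block must be processed within one block shift (8 ms).
BLOCK_SHIFT_MS = 8.0

# Execution times per block in ms, copied from the table above.
measurements = {
    "Ubuntu 18.04": {"SavedModel": 0.65, "TF-lite": 0.36, "TF-lite quantized": 0.27},
    "Macbook Air mid 2012": {"SavedModel": 1.4, "TF-lite": 0.6, "TF-lite quantized": 0.4},
    "Raspberry Pi 3 B+": {"SavedModel": 15.54, "TF-lite": 9.6, "TF-lite quantized": 2.2},
}


def is_real_time(exec_time_ms, budget_ms=BLOCK_SHIFT_MS):
    """Return True if one block is processed faster than new audio arrives."""
    return exec_time_ms < budget_ms


for system, times in measurements.items():
    for model_format, ms in times.items():
        status = "ok" if is_real_time(ms) else "too slow"
        print(f"{system:22s} {model_format:18s} {ms:6.2f} ms  {status}")
```

With the numbers from the table, only the quantized TF-lite model stays within the budget on the Raspberry Pi 3 B+.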


Audio Samples:

Here are some audio samples created with the tf-lite model. Sadly, audio cannot be embedded directly in markdown.

Noisy	Enhanced	Noise type
Sample 1	Sample 1	Air conditioning
Sample 2	Sample 2	Music
Sample 3	Sample 3	Bus


Contents of the repository:

DTLN_model.py
This file contains the model, data generator and the training routine.
run_training.py
Script to run the training. Before you can start the training with $ python run_training.py, you have to set the paths to your training and validation data inside the script. The training script uses a default setup.
run_evaluation.py
Script to process a folder with optional subfolders containing .wav files with a trained DTLN model. With the pretrained model delivered with this repository a folder can be processed as follows:
$ python run_evaluation.py -i /path/to/input -o /path/for/processed -m ./pretrained_model/model.h5
The evaluation script will create the new folder with the same structure as the input folder and the files will have the same name as the input files.
measure_execution_time.py
Script for measuring the execution time with the saved DTLN model in ./pretrained_model/dtln_saved_model/. For further information see this section.
real_time_processing.py
Script that explains how real time processing with the SavedModel works. For more information see this section.

./pretrained_model/
- model.h5: Model weights as used in the DNS-Challenge DTLN model.
- DTLN_norm_500h.h5: Model weights trained on 500h with normalization of STFT log magnitudes.
- DTLN_norm_40h.h5: Model weights trained on 40h with normalization of STFT log magnitudes.
- ./dtln_saved_model: same as model.h5 but as a stateful model in SavedModel format.
- ./DTLN_norm_500h_saved_model: same as DTLN_norm_500h.h5 but as a stateful model in SavedModel format.
- ./DTLN_norm_40h_saved_model: same as DTLN_norm_40h.h5 but as a stateful model in SavedModel format.
- model_1.tflite together with model_2.tflite: same as model.h5 but as TF-lite model with external state handling.
- model_quant_1.tflite together with model_quant_2.tflite: same as model.h5 but as TF-lite model with external state handling and dynamic range quantization.
- model_1.onnx together with model_2.onnx: same as model.h5 but as ONNX model with external state handling.


Python dependencies:

The following packages will be required for this repository:

TensorFlow (2.x)
librosa
wavinfo

All additional packages (numpy, soundfile, etc.) should be installed on the fly when using conda or pip. I recommend using conda environments or pyenv virtualenv for the Python environment. For training, a GPU with at least 5 GB of memory is required. I recommend at least Tensorflow 2.1 with Nvidia driver 418 and CUDA 10.1. If you use conda, CUDA will be installed on the fly and you just need the driver. For evaluation only, the CPU version of Tensorflow is enough. Everything was tested on Ubuntu 18.04.

Conda environments for training (with CUDA) and for evaluation (CPU only) can be created as follows:

For the training environment:

$ conda env create -f train_env.yml

For the evaluation environment:

$ conda env create -f eval_env.yml

For the tf-lite environment:

$ conda env create -f tflite_env.yml

The tf-lite runtime must be downloaded from here.


Training data preparation:

Clone the forked DNS-Challenge repository. Before cloning the repository make sure git-lfs is installed. Also make sure your disk has enough space. I recommend downloading the data to an SSD for faster dataset creation.
Run noisyspeech_synthesizer_multiprocessing.py to create the dataset. noisyspeech_synthesizer.cfg was changed according to my training setup used for the DNS-Challenge.
Run split_dns_corpus.py to divide the dataset into training and validation data. The classic 80:20 split is applied. This file was added to the forked repository by me.
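
The 80:20 split in the last step can be pictured with the following sketch. It is not the code of split_dns_corpus.py; the function name, the fixed seed and the file names are assumptions made for illustration only.

```python
import random


def split_file_list(file_names, train_fraction=0.8, seed=42):
    """Shuffle file names and split them into training and validation lists."""
    names = sorted(file_names)          # sort first so the result is reproducible
    random.Random(seed).shuffle(names)  # local RNG, global random state is untouched
    n_train = int(round(train_fraction * len(names)))
    return names[:n_train], names[n_train:]


# Hypothetical file names; a real corpus would be listed from disk.
files = [f"fileid_{i}.wav" for i in range(10)]
train_files, val_files = split_file_list(files)
print(len(train_files), len(val_files))
```

For a corpus of paired noisy and clean files, the same split would have to be applied to both folders so that every noisy file keeps its clean target.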


Run a training of the DTLN model:

Make sure all dependencies are installed in your python environment.
Change the paths to your training and validation dataset in run_training.py.
Run $ python run_training.py.

One epoch takes around 21 minutes on an Nvidia RTX 2080 Ti when loading the training data from an SSD.


Measuring the execution time of the DTLN model with the SavedModel format:

In total there are three ways to measure the execution time for one block of the model: running a sequence in Keras and dividing by the number of blocks in the sequence, building a stateful model in Keras and running it block by block, and saving the stateful model in Tensorflow's SavedModel format and calling it block by block. In the following I explain how to run the model in the SavedModel format, because it is the most portable version and can also be called from Tensorflow Serving.

A Keras model can be saved to the saved model format:

import tensorflow as tf
# Build your Keras model here.
tf.saved_model.save(your_keras_model, 'name_save_path')

For real time block-by-block processing it is important to make the LSTM layers stateful, so they can remember the states from the previous block.

The model can be imported with

model = tf.saved_model.load('name_save_path')

For inference, first get the function behind the serving signature (the signatures map signature names to functions):

infer = model.signatures['serving_default']

and then, to infer the block x, call

y = infer(tf.constant(x))['conv1d_1']

This command gives you the result at the node 'conv1d_1', which is our output node for real time processing. For more information on using the SavedModel format and obtaining the output node, see this Guide.

To make this easier, this repository provides a stateful DTLN SavedModel. To measure the execution time, call:

$ python measure_execution_time.py
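
The block-by-block measurement described in this section can be sketched without Tensorflow. In the sketch, `process_block` is a hypothetical stand-in for the `infer` call shown above, and the input shape of (1, 512) assumes one 32 ms block at 16 kHz. It is not the content of measure_execution_time.py.

```python
import time

import numpy as np


def measure_block_time(process_block, block_len=512, n_blocks=200, warmup=10):
    """Return the mean execution time of process_block in milliseconds."""
    x = np.random.randn(1, block_len).astype("float32")
    for _ in range(warmup):             # warm-up calls are not timed
        process_block(x)
    start = time.perf_counter()
    for _ in range(n_blocks):
        process_block(x)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / n_blocks


def dummy_model(x):
    # Stand-in for the model call, e.g. infer(tf.constant(x))['conv1d_1'].
    return x * 0.5


mean_ms = measure_block_time(dummy_model)
print(f"mean execution time per block: {mean_ms:.4f} ms")
```

Averaging over many blocks after a few warm-up calls keeps one-off initialization costs out of the result.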


Real time processing with the SavedModel format:

For an explanation, see real_time_processing.py.

Here are some considerations for integrating this model into your project:

The sampling rate of this model is fixed at 16 kHz. It will not work smoothly with other sampling rates.
The block length of 32 ms and the block shift of 8 ms are also fixed. For changing these values, the model must be retrained.
The delay created by the model is the block length, so the input-output delay is 32 ms.
For real time capability on your system, the execution time must be below the length of the block shift, so below 8 ms.
I cannot give you support on the hardware side (sound cards, drivers and so on). Be aware that a lot of artifacts can come from this side.
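
The fixed sizes listed above translate into 512-sample blocks that advance by 128 samples. The sketch below shows one possible way to organize the input and output buffers with overlap-add. `process_block` is a hypothetical placeholder for the model call; the scaling it applies is only there so the example can be checked against its input.

```python
import numpy as np

FS = 16000          # sampling rate in Hz, fixed by the model
BLOCK_LEN = 512     # 32 ms at 16 kHz
BLOCK_SHIFT = 128   # 8 ms at 16 kHz


def process_block(block):
    # Placeholder for the model. Scaling by shift/length makes the
    # overlap-add of the four overlapping blocks sum back to the input.
    return block * (BLOCK_SHIFT / BLOCK_LEN)


def run_blockwise(audio):
    in_buffer = np.zeros(BLOCK_LEN, dtype="float32")
    out_buffer = np.zeros(BLOCK_LEN, dtype="float32")
    n_blocks = len(audio) // BLOCK_SHIFT
    out = np.zeros(n_blocks * BLOCK_SHIFT, dtype="float32")
    for idx in range(n_blocks):
        start = idx * BLOCK_SHIFT
        # shift the input buffer to the left and append the new samples
        in_buffer[:-BLOCK_SHIFT] = in_buffer[BLOCK_SHIFT:]
        in_buffer[-BLOCK_SHIFT:] = audio[start:start + BLOCK_SHIFT]
        out_block = process_block(in_buffer)
        # shift the output buffer, clear its tail, then overlap-add
        out_buffer[:-BLOCK_SHIFT] = out_buffer[BLOCK_SHIFT:]
        out_buffer[-BLOCK_SHIFT:] = 0.0
        out_buffer += out_block
        out[start:start + BLOCK_SHIFT] = out_buffer[:BLOCK_SHIFT]
    return out


audio = np.random.randn(FS).astype("float32")   # one second of test noise
out = run_blockwise(audio)
offset = BLOCK_LEN - BLOCK_SHIFT
print(np.allclose(out[offset:], audio[:len(out) - offset], atol=1e-6))
```

In this sketch the output array lags the input array by 384 samples. Adding the 128 samples that have to be collected before a block can be processed gives 512 samples, which matches the 32 ms delay stated above.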


Real time processing with tf-lite:

With TF 2.3 it is finally possible to convert LSTMs to tf-lite. It is still not perfect, because the states must be handled separately for a stateful model and tf-lite does not support complex numbers. That means the model is split into two submodels when converting it to tf-lite, and the calculation of the FFT and iFFT is performed outside the model. I provide an example script that explains how real time processing with the tf-lite model works (real_time_processing_tf_lite.py). In this script the tf-lite runtime is used. The runtime can be downloaded here. Quantization works now.
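
The split into two submodels with the FFT and iFFT computed outside can be illustrated with NumPy alone. In this sketch, `model_1` and `model_2` are hypothetical stand-ins for the two tf-lite interpreters: each takes its input together with a state array and returns an output together with the updated state. The state shape and the pass-through behaviour are assumptions chosen to keep the example runnable; the actual interpreter calls are in real_time_processing_tf_lite.py.

```python
import numpy as np

BLOCK_LEN = 512
STATE_SHAPE = (1, 2, 128, 2)   # assumed shape of the externally handled states


def model_1(magnitude, states):
    """Stand-in for the first submodel: magnitude spectrum -> mask, new states."""
    mask = np.ones_like(magnitude)   # an all-ones mask leaves the signal unchanged
    return mask, states


def model_2(block, states):
    """Stand-in for the second submodel: time-domain block -> output block."""
    return block, states


def process_block(in_block, states_1, states_2):
    # FFT outside the model, because complex numbers are not supported inside
    spectrum = np.fft.rfft(in_block)
    magnitude = np.abs(spectrum).astype("float32")
    phase = np.angle(spectrum)
    # first stage estimates a mask on the magnitude; states are passed explicitly
    mask, states_1 = model_1(magnitude, states_1)
    # apply the mask, reuse the input phase and return to the time domain
    estimated = np.fft.irfft(magnitude * mask * np.exp(1j * phase), n=BLOCK_LEN)
    # second stage works on the time-domain block
    out_block, states_2 = model_2(estimated.astype("float32"), states_2)
    return out_block, states_1, states_2


states_1 = np.zeros(STATE_SHAPE, dtype="float32")
states_2 = np.zeros(STATE_SHAPE, dtype="float32")
block = np.random.randn(BLOCK_LEN).astype("float32")
out_block, states_1, states_2 = process_block(block, states_1, states_2)
print(out_block.shape)
```

Passing the states in and out on every call is what replaces the stateful LSTM layers of the SavedModel version.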

Using the tf-lite DTLN model and the tf-lite runtime the execution time on an old Macbook Air mid 2012 can be decreased to 0.6 ms.


Real time audio with sounddevice and tf-lite:

The file real_time_dtln_audio.py is an example of how real time audio with the tf-lite model and the sounddevice toolbox can be implemented. The script is based on the wire.py example. It works fine on an old Macbook Air mid 2012, so it will probably run on most newer devices. The quantized version was successfully tested on a Raspberry Pi 3 B+.

First check for your audio devices:

$ python real_time_dtln_audio.py --list-devices

Choose the index of an input and an output device and call:

$ python real_time_dtln_audio.py -i in_device_idx -o out_device_idx

If the script shows too many input underflows, restart the script. If that does not help, increase the latency with the --latency option. The default value is 0.2.


Model conversion and real time processing with ONNX:

Finally I got the ONNX model working. For converting the model, TF 2.1 and keras2onnx are required. keras2onnx can be downloaded here and must be installed from source as described in the README. When all dependencies are installed, call:

$ python convert_weights_to_onnx.py -m /name/of/the/model.h5 -t onnx_model_name

to convert the model to the ONNX format. The model is split into two parts, as for the TF-lite model. The conversion does not work on macOS. The real time processing works similarly to the TF-lite model and can be found in the following file: real_time_processing_onnx.py. The ONNX runtime required for this script can be installed with:

$ pip install onnxruntime

The execution time on the Macbook Air mid 2012 is around 1.13 ms for one block.


dtln's Issues

Model B1 from paper

Hi, is the B1 model from the paper (4 layers of LSTM with STFT) also implemented in this code? Maybe I'm missing something..

PortAudioError: Error opening Stream: Invalid sample rate [PaErrorCode -9997]

Hi, I'm currently using a Raspberry Pi 4, I have installed the dependencies, I'm trying to make it work but I'm stuck on this error. Can you please help? Thank you.

pi@raspberrypi:~/Downloads/DTLN-master $ python3 real_time_dtln_audio.py --list-devices
0 bcm2835 HDMI 1: - (hw:0,0), ALSA (0 in, 8 out)
1 bcm2835 Headphones: - (hw:1,0), ALSA (0 in, 8 out)
2 USB PnP Sound Device: Audio (hw:2,0), ALSA (1 in, 2 out)
3 sysdefault, ALSA (0 in, 128 out)
4 lavrate, ALSA (0 in, 128 out)
5 samplerate, ALSA (0 in, 128 out)
6 speexrate, ALSA (0 in, 128 out)
7 pulse, ALSA (32 in, 32 out)
8 upmix, ALSA (0 in, 8 out)
9 vdownmix, ALSA (0 in, 6 out)
10 dmix, ALSA (0 in, 2 out)

11 default, ALSA (32 in, 32 out)

pi@raspberrypi:~/Downloads/DTLN-master $ python3 real_time_dtln_audio.py -i 2 -o 2
Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2048
Expression 'PaAlsaStreamComponent_InitialConfigure( &self->capture, inParams, self->primeBuffers, hwParamsCapture, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2719
Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2843
PortAudioError: Error opening Stream: Invalid sample rate [PaErrorCode -9997]

My USB headset with microphone is at 2 USB PnP Sound Device: Audio (hw:2,0), ALSA (1 in, 2 out)

Thank you very much!

nan when training

Hi Breizhn, thanks for your work.
I have a question about training: when I change the variable len_samples to 5, nan appears. Does len_samples have to be set longer?

@shilsircar the latency needs to be less than 8ms for a 32ms block as @breizhn says in the Execution Times section of this repo's ReadMe.
Also, since you've got a working model, I have a couple of questions:

Which model did you use for conversion? .h5 or savedmodel?

.h5 norm model

Does tfjs have stateful LSTMs? Or did you handle states outside the model?

I didn't have to handle it outside. Tfjs handles stateful LSTMs.

Trouble is TFJS team told me they don't have any immediate plans to implement conv1dwithbias and causal padding. I feel it's a bug since the last layer bias is false.
Issue open: tensorflow/tfjs#3578

I just found that in tfjs, conv1d doesn't have an option to set usebias to false. It only supports usebias - true.

Yup you can modify and still get reasonable results.

Originally posted by @hchintada in #4 (comment)

Result of retrain is not good as pretrain models provided.

Thanks for your wonderful job. @breizhn
I used this project to retrain on the DNS-Challenge dataset that was updated recently. The denoised results of the retrained model are a little worse than those of the models provided in this project (both the 40h and the 500h model). I only set norm_stft=True.
Any advice to improve the performance of retraining?

Looking forward to your reply.

question about B3 model in paper

Hi @breizhn, thanks for your job. The parameters of B3 model are as follows. Is it correct?

Model: "functional_1"

Layer (type)                     Output Shape          Param #   Connected to
input_1 (InputLayer)             [(None, None)]        0
lambda (Lambda)                  [(None, None, 257),   0         input_1[0][0]
tf_op_layer_AddV2 (TensorFlowOp  [(None, None, 257)]   0         lambda[0][0]
tf_op_layer_Log (TensorFlowOpLa  [(None, None, 257)]   0         tf_op_layer_AddV2[0][0]
instant_layer_normalization (In  (None, None, 257)     514       tf_op_layer_Log[0][0]
lstm (LSTM)                      (None, None, 156)     258336    instant_layer_normalization[0][0]
dropout (Dropout)                (None, None, 156)     0         lstm[0][0]
lstm_1 (LSTM)                    (None, None, 156)     195312    dropout[0][0]
dense (Dense)                    (None, None, 257)     40349     lstm_1[0][0]
activation (Activation)          (None, None, 257)     0         dense[0][0]
multiply (Multiply)              (None, None, 257)     0         lambda[0][0]
                                                                 activation[0][0]
lstm_2 (LSTM)                    (None, None, 156)     258336    multiply[0][0]
dropout_1 (Dropout)              (None, None, 156)     0         lstm_2[0][0]
lstm_3 (LSTM)                    (None, None, 156)     195312    dropout_1[0][0]
dense_1 (Dense)                  (None, None, 257)     40349     lstm_3[0][0]
activation_1 (Activation)        (None, None, 257)     0         dense_1[0][0]
multiply_1 (Multiply)            (None, None, 257)     0         multiply[0][0]
                                                                 activation_1[0][0]
lambda_1 (Lambda)                (None, None, 512)     0         multiply_1[0][0]
                                                                 lambda[0][1]
lambda_2 (Lambda)                (None, None)          0         lambda_1[0][0]

Total params: 988,508
Trainable params: 988,508
Non-trainable params: 0

How to run on Raspberry Pi?

Hi Nils, I'm trying to run your model on Raspberry Pi, however it seems miniconda doesn't support python3.7 yet. Should I degrade to python3.6 or install packages manually? Thanks.

Training got error

Epoch 00095: val_loss did not improve from -16.76465
2021-01-21 00:24:52.466681: W tensorflow/core/kernels/data/generator_dataset_op.cc:107] Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized. The process may be terminated.
[[{{node PyFunc}}]]

can't convert same onnx by provided python file.

I converted the model to ONNX with convert_weights_to_onnx.py. The converted ONNX files are not the same as the ONNX files provided.

some error after python run_training.py

Traceback (most recent call last):
  File "run_training.py", line 56, in <module>
    path_to_val_mix, path_to_val_speech)
  File "/data1/dtln/DTLN/DTLN_model.py", line 555, in train_model
    self.fs, train_flag=True)
  File "/data1/dtln/DTLN/DTLN_model.py", line 54, in __init__
    self.count_samples()
  File "/data1/dtln/DTLN/DTLN_model.py", line 68, in count_samples
    info = WavInfoReader(os.path.join(self.path_to_input, file))
  File "/root/.conda/envs/train_env/lib/python3.7/site-packages/wavinfo/wave_reader.py", line 52, in __init__
    self.main_list = chunks.children
AttributeError: 'ChunkDescriptor' object has no attribute 'children'

Transfer Learning with DTLN model weights to remove the block shift

To do away with block processing at inference, I'm trying to use your pre-trained weights, and retrain the network, after replacing the stftLayer with fftLayer.

Using mag normalization as is.

I can set stateful=False for both separation kernels while training?

Train for around 20 epochs.

Does this idea make sense @breizhn? Or should I start from zero to effect this change?

Also, can you please provide guidance on how to use data augmentation/pre-processing to train the network only with 40 hours data?

validation loss -16.83, but performance is not better than model.h5.

model.h5 : /pretrained_model/model.h5 (not normalized)

As a result of performance comparison between my_model.h5 and model.h5, model.h5 is the best.
I looked at the reason.
The size of the weights(weight,bias) values of "my_model.h5" is small compared to the weights values of "model.h5"

figure1. weights result

figure2. signal result

tensorflow-gpu == 2.4.0
tensorflow==2.4.0
GPU : GeForce GTX 1050
The parameters are the same as the code. The dataset was DNS-Challenge2020. I used clean,noise.
snr_lowr : -5
snr_upper : 25
total_hours : 40
norm_stft : False

Training data was created with the code provided by DNS-Challenge2020.
I set it to epoch 200, but because it is set to patient 10, it stopped at 96.
validation loss : -16.83

Is there a way to increase the weight value?
Is it a dataset issue?
Any advice would be appreciated.
Thank you.

Real-time-processing Tf_lite inference program ERROR

Hi, even after changing the input_details as you did in the file, the code is still giving an error

    interpreter_2.set_tensor(input_details_2[0]['index'], estimated_block)#input_details_2[0]
  File "/home/purna/.local/lib/python3.6/site-packages/tflite_runtime/interpreter.py", line 399, in set_tensor
    self._interpreter.SetTensor(tensor_index, value)
  File "/home/purna/.local/lib/python3.6/site-packages/tflite_runtime/interpreter_wrapper.py", line 148, in SetTensor
    return _interpreter_wrapper.InterpreterWrapper_SetTensor(self, i, value)
ValueError: Cannot set tensor: Dimension mismatch. Got 512 but expected 257 for dimension 2 of input 0.

and the input_details_2 is below along with shapes we want

states2
(1, 2, 128, 2)
input_details_2[1]['shape']
[  1   2 128   2]
estimated_block
(1, 1, 512)
input_details_2[0]['shape']
[  1   1 257]

any help would be appreciated.
Thanks in advance

the inference time of tflite_quant is larger than tflite

I use the supported model file (model_1.tflite, model_2.tflite, model_quant_1.tflite and model_quant_2.tflite) and the script "real_time_processing_tf_lite.py" to compare the inference time.
My implementation configs: Ubuntu 18.04, tf2.0.
the processing times are shown as follows:
TF-lite: 0.383403 ms; TF-lite quantized: 0.4470351 ms
It is a little abnormal that TF-lite quantized model is slower than TF-lite model during inference. I found the script is required in tf2.3.0 when running tflite model. Does it mean that tf2.0 has some limitations in your script? Looking forward to your reply

Unable to test the Pre-trained model with GPU

Thank you so much for providing this open source solution, it is very helpful. I came across a problem that I thought would be best answered by the creators of DTLN. I tested the pretrained models on a simple Linux machine on CPU and that works fine. I then moved to a Google Cloud Platform (GCP) instance with an Nvidia Tesla K80 GPU, which I also use for other processes such as speaker diarization, and there I am unable to test the pretrained models. The run does not end in an error; some GPU libraries just report errors, the same ones I saw in my earlier speaker diarization tests in which the GPU is used. Is it possible that DTLN does not support a GPU for testing pretrained models, or does not work with a GPU at all?

Your response will be highly appreciated. I'm attaching an output snippet below for your reference. As you can see, the GPU is visible and the code executes successfully, but it does not produce any noise-suppressed audio files.

Thank you

Output:

2021-09-30 06:42:41.828956: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-09-30 06:42:41.829007: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-09-30 06:42:45.115012: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-09-30 06:42:47.497740: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:42:47.498557: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:00:04.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2021-09-30 06:42:47.498729: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-09-30 06:42:47.498885: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-09-30 06:42:47.499052: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-09-30 06:42:47.501305: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-09-30 06:42:47.502306: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-09-30 06:42:47.502448: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-09-30 06:42:47.502617: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-09-30 06:42:47.502790: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-09-30 06:42:47.502831: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-09-30 06:42:47.503317: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-09-30 06:42:47.504158: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-09-30 06:42:47.504203: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]
Processing finished.

Voice Flickering during RT Use

Hi,

Thank you for this great repo, the model and pre-trained weights.

I tried to use the model you provided (dtln_saved_model) for real-time denoising on my laptop.
It has successfully removed the background noise, but the resultant speech signal has a lot of flickering.

Please check the audio files in below links.

Using Windows OS, CPU, TF 2.2. pyaudio for real time processing.

Can you suggest what is causing this and how to avoid it?

noisy - clyp.it/opnfnagd?token=1f91ac255bf94fce0dac66f2fe2cc36c
cleaned - clyp.it/55rwflcn?token=cec87a02aa62634e7ff5da4d9e43d5c2

error on run noisyspeech_synthesizer_multiprocessing.py

I cloned your DNS-Challenge fork and ran python noisyspeech_synthesizer_multiprocessing.py, but I get the error shown below:

WARNING: Audio type not supported
Generating file #59032
WARNING: Audio type not supported
Generating file #59220
WARNING: Audio type not supported
Generating file #59408
WARNING: Audio type not supported
Generating file #59596
WARNING: Audio type not supported
Generating file #59784
WARNING: Audio type not supported
Generating file #59972
WARNING: Audio type not supported
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/root/.conda/envs/train_env/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/root/.conda/envs/train_env/lib/python3.7/multiprocessing/pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "noisyspeech_synthesizer_multiprocessing.py", line 156, in main_gen
    gen_audio(True, params, filenum)
  File "noisyspeech_synthesizer_multiprocessing.py", line 124, in gen_audio
    build_audio(is_clean, params, filenum, audio_samples_length)
  File "noisyspeech_synthesizer_multiprocessing.py", line 75, in build_audio
    input_audio, fs_input = audioread(source_files[idx])
  File "/data1/dtln/fork-dns-challenge/audiolib.py", line 46, in audioread
    if len(audio.shape) == 1: # mono
AttributeError: 'NoneType' object has no attribute 'shape'
"""

[Question] Preparation of dataset?

Hi Nils,

I trained your model on speech & noise provided by the DNS challenge, but it seems that the model you provided in the pretrained_model/ folder performs better than the model I trained (especially for the data with_reverb). Therefore I have some questions about how you prepared your data and trained your model.

In your paper you mentioned that WHAMR corpus was also used, did you use it as train & cross validation & test or only test set?
Did you use the DNS-Challenge script provided by Microsoft to create the training set (I think they did not consider room impulse responses in their script)? Or did you add any RIRs to create your training set?
In the DNS-Challenge repo you forked, I think you just randomly split the noisy data into train and val, which means that same speaker(s) may appear in both train and val, right?
Compared with DTLN_norm_500h.h5, did you use norm_stft=false and other parameters unchanged to get the model.h5 in pretrained_model?

TensorflowJS conversion

I am trying to convert the savedmodel using tensorflowjs 2.X using the following command:

tensorflowjs_converter --control_flow_v2=False --input_format=tf_saved_model --saved_model_tags=serve --signature_name=serving_default --strip_debug_ops=False --weight_shard_size_bytes=4194304 C:\Users\ss\Documents\workspace\DTLN\DTLN-master\pretrained_model\DTLN_norm_500h_saved_model C:\Users\ss\Documents\workspace\DTLN\tfjs

I get the following two errors:

Traceback (most recent call last):
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\importer.py", line 497, in _import_graph_def_internal
graph._c_graph, serialized, options) # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input 0 of node StatefulPartitionedCall/model/lstm/AssignVariableOp was passed float from Func/StatefulPartitionedCall/input/_4:0 incompatible with expected resource.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflowjs\converters\tf_saved_model_conversion_v2.py", line 482, in convert_tf_saved_model
frozen_graph = _freeze_saved_model_v2(concrete_func, control_flow_v2)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflowjs\converters\tf_saved_model_conversion_v2.py", line 352, in _freeze_saved_model_v2
concrete_func, lower_control_flow=not control_flow_v2).graph
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\convert_to_constants.py", line 680, in convert_variables_to_constants_v2
return _construct_concrete_function(func, output_graph_def, converted_inputs)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\convert_to_constants.py", line 406, in _construct_concrete_function
new_output_names)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\eager\wrap_function.py", line 633, in function_from_graph_def
wrapped_import = wrap_function(_imports_graph_def, [])
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\eager\wrap_function.py", line 611, in wrap_function
collections={}),
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\func_graph.py", line 981, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\eager\wrap_function.py", line 86, in call
return self.call_with_variable_creator_scope(self._fn)(*args, **kwargs)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\eager\wrap_function.py", line 92, in wrapped
return fn(*args, **kwargs)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\eager\wrap_function.py", line 631, in _imports_graph_def
importer.import_graph_def(graph_def, name="")
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\importer.py", line 405, in import_graph_def
producer_op_list=producer_op_list)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\importer.py", line 501, in _import_graph_def_internal
raise ValueError(str(e))
ValueError: Input 0 of node StatefulPartitionedCall/model/lstm/AssignVariableOp was passed float from Func/StatefulPartitionedCall/input/_4:0 incompatible with expected resource.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflowjs\converters\wizard.py", line 606, in run
converter.convert(arguments)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflowjs\converters\converter.py", line 681, in convert
control_flow_v2=args.control_flow_v2)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflowjs\converters\tf_saved_model_conversion_v2.py", line 485, in convert_tf_saved_model
output_node_names)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflowjs\converters\tf_saved_model_conversion_v2.py", line 342, in _freeze_saved_model_v1
sess, g.as_graph_def(), output_node_names)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\util\deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\graph_util_impl.py", line 359, in convert_variables_to_constants
inference_graph = extract_sub_graph(input_graph_def, output_node_names)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\util\deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\graph_util_impl.py", line 205, in extract_sub_graph
_assert_nodes_are_present(name_to_node, dest_nodes)
File "c:\users\ss\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\graph_util_impl.py", line 160, in _assert_nodes_are_present
assert d in name_to_node, "%s is not in graph" % d
AssertionError: Identity is not in graph

How is y_pred connected to the output of the generator implicitly in Keras?

Hi, thanks for sharing the code!

I am new to Keras. I am trying to train your model, but I am confused about how y_pred is generated.

The model input is yielded by the create_generator function:

        yield in_dat.astype('float32'), tar_dat.astype('float32')

then tf_data_set is created from the generator

 self.tf_data_set = tf.data.Dataset.from_generator(
                self.create_generator,
                (tf.float32, tf.float32),
                output_shapes=(tf.TensorShape([self.len_of_samples]), \
                               tf.TensorShape([self.len_of_samples])),
                args=None
                )

the data batches are generated by .batch op

generator_input = audio_generator(path_to_train_mix, 
                                  path_to_train_speech, 
                                  len_in_samples, 
                                  self.fs, train_flag=True)
dataset = generator_input.tf_data_set
dataset = dataset.batch(self.batchsize, drop_remainder=True).repeat()
# calculate number of training steps in one epoch
steps_train = generator_input.total_samples//self.batchsize
# create data generator for validation data
generator_val = audio_generator(path_to_val_mix,
                                path_to_val_speech, 
                                len_in_samples, self.fs)
dataset_val = generator_val.tf_data_set
dataset_val = dataset_val.batch(self.batchsize, drop_remainder=True).repeat()

Then model.fit is called to fit the model:

self.model.fit(
    x=dataset, 
    batch_size=None,
    steps_per_epoch=steps_train, 
    epochs=self.max_epochs,
    verbose=1,
    validation_data=dataset_val,
    validation_steps=steps_val, 
    callbacks=[checkpointer, reduce_lr, csv_logger, early_stopping],
    max_queue_size=50,
    workers=4,
    use_multiprocessing=True)

In terms of calculation of the loss

    loss = tf.squeeze(self.cost_function(y_pred,y_true))
    # calculate mean over batches
    loss = tf.reduce_mean(loss)

y_pred,y_true is passed to the cost_function.

In other code that I've read, y_pred and y_true are usually calculated explicitly, for example:

y_pred = Model(x_train_data) # output of model
y_true = x_label

I understand that here y_pred is the output of the model when in_dat is fed to it, and y_true is tar_dat. However, in your model I cannot find such calculations.

To summarize, my questions are:

How is y_pred connected to the output of the generator (in_dat)?
If I want to modify the output of the generator and the loss function (e.g. to add the original noise data and filenames), what should I do to distinguish which data are inputs to the model (such as in_dat) and which are not (such as tar_dat)?
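
On the first question, the connection is made inside `fit` rather than in the repository's code. When Keras is given a dataset whose elements are `(inputs, targets)` tuples, it feeds the first element to the model and hands the second element to the loss as `y_true`, calling the loss as `loss(y_true, y_pred)`. The pure-Python sketch below only mimics that data flow; `fit_one_epoch`, `toy_model` and `toy_loss` are illustrative stand-ins, not Keras internals.

```python
def fit_one_epoch(model, loss_fn, dataset):
    """Mimic the Keras fit data flow for a dataset of (inputs, targets) pairs."""
    losses = []
    for in_dat, tar_dat in dataset:
        y_pred = model(in_dat)                   # first element is the model input
        losses.append(loss_fn(tar_dat, y_pred))  # second element arrives as y_true
    return sum(losses) / len(losses)


def toy_model(x):
    # Stand-in for the network: doubles every sample.
    return [2.0 * v for v in x]


def toy_loss(y_true, y_pred):
    # Mean squared error over one example.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


dataset = [([1.0, 2.0], [2.0, 4.0]), ([0.5, 1.5], [1.0, 3.0])]
print(fit_one_epoch(toy_model, toy_loss, dataset))
```

Under this convention, anything the network should see belongs in the first tuple element (which may itself be a tuple or dict for multi-input models), and anything only the loss should see belongs in the second. Extra items such as filenames do not fit either slot directly, so they would likely need a separate lookup or a custom training loop.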

Training a 48 kHz model: block_len, block_shift, encoder_size, numUnits?

Thanks for your wonderful work, @breizhn. For a 48 kHz or 44.1 kHz model, should I change block_len, block_shift, encoder_size and numUnits?
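
One way to reason about block_len and block_shift is in terms of time rather than samples. The repository's 16 kHz defaults are 512-sample blocks with a 128-sample shift, i.e. 32 ms and 8 ms. The sketch below converts those durations to other sampling rates. Keeping the same durations is an assumption made for illustration, not a recommendation from the author, and `ms_to_samples` is a hypothetical helper. Whether encoder_size and numUnits should grow as well is not something this calculation can answer.

```python
def ms_to_samples(duration_ms, fs):
    """Convert a duration in milliseconds to a whole number of samples at rate fs."""
    total = duration_ms * fs
    if total % 1000:
        raise ValueError(
            f"{duration_ms} ms is not a whole number of samples at {fs} Hz"
        )
    return total // 1000


# Block length (32 ms) and block shift (8 ms) at a few sampling rates.
for fs in (16000, 32000, 48000):
    print(fs, ms_to_samples(32, fs), ms_to_samples(8, fs))
```

At 44.1 kHz, 32 ms corresponds to 1411.2 samples, so the helper raises an error; a nearby integer block length would have to be chosen by hand.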

About training a real-time model

Hello, in your training code you use 15 s of data, which is then divided into frames, so the number of time steps fed to the LSTM equals the number of frames. If I want to train a real-time processing model, should the training input be single-frame data, like (batch, timestep=1, 512), or 15 s data, like (batch, timestep=1873, 512)?

Modify to 32K sample rate

I tried to modify the DTLN model to 32 kHz, but I always get an error.
''----------------------------------------------------------------------------------------------------------------
File "/Library/Python/3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1110, in fit
raise ValueError('Expect x to be a non-empty array or dataset.')
ValueError: Expect x to be a non-empty array or dataset.
''---------------------------------------------------------------------------------------------------------------

I changed the dataset to 32 kHz, set self.fs = 32000, and updated everything else related to fs.

Please help...

How to retrain for 10 ms audio

How can I retrain this model for 10 ms audio frames? :)

problem in running fixed size batch processing with realtime_processing_tflite.py

it gives different outputs with different batch sizes.
Input with batch size 1 gives a decent output.
but with batch size 10, it gives a disturbing audio output.

Any tricks to improve DTLN performance on Raspberry?

I tested real_time_audio_device.py. When there are two human voices in an environment, I want to extract one of them from the mixed speech, but the performance is not good. Are there any tricks to improve this?

Onnx in Javascript

@breizhn I am experimenting with loading the ONNX model in JS like this and am getting an error. It looks like something in the input to lstm4 is not correct. I checked that all the ops, such as Dropout, Activation etc., are supported in ONNX here: https://github.com/microsoft/onnxjs/blob/master/docs/operators.md

Any ideas why ?

    async function loadDTLN_modelOnnx()
    {
      
        const model1 = new onnx.InferenceSession();
        const model2 = new onnx.InferenceSession();
        var output1 = await model1.loadModel("./model_1.onnx");
        var output2 = await model2.loadModel("./model_2.onnx")
     };

It throws an error:

graph.ts:313 Uncaught (in promise) Error: unrecognized input '' for node: lstm_4
at t.buildGraph (graph.ts:313)
at new t (graph.ts:139)
at Object.from (graph.ts:77)
at t.load (model.ts:25)
at session.ts:85
at t.event (instrument.ts:294)
at e.initialize (session.ts:81)
at e. (session.ts:63)
at inference-session-impl.ts:16
at Object.next (inference-session-impl.ts:16)

How to calculate those metrics in the DTLN paper?

Could you please share some code for calculating the metrics used in the DTLN paper, such as PESQ, SI-SDR and STOI?
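
Of the three metrics, SI-SDR is simple enough to write out directly: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. The function below is a plain-Python sketch of that standard definition, not the evaluation code used for the paper. The `eps` guard is an added assumption, and mean removal, which some implementations apply first, is omitted. PESQ and STOI are more involved and are normally taken from dedicated packages.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB for equal-length sequences."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    scale = dot / (ref_energy + eps)                      # optimal gain for the reference
    target = [scale * r for r in reference]               # part explained by the reference
    residual = [e - t for e, t in zip(estimate, target)]  # everything else
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))
```

Because the estimate is rescaled before the comparison, multiplying the estimate by a constant gain leaves the score unchanged, which is the main difference from a plain SNR.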

Google research

https://github.com/google-research/google-research/tree/master/kws_streaming

In their models they have embedded the MFCC, so you just feed in the 16k audio stream in chunks and the streaming KWS works.

There is an FFT in the Python ops, and I just wondered whether, with the TFlite models, a look at what they did above could improve performance.
This is all beyond me, but after some testing the embedded MFCC seems approx. 2x faster than an external routine with Librosa.

Dunno if the above is of any use to you. @breizhn

The B3 model in paper

Hi, thank you for your fantastic implementation of DTLN! I would like to know more details about model B3 in the paper: does B3 have only one STFT & iFFT and 2*2 LSTM layers? What is the difference compared to B1?

ERROR: Could not find a version that satisfies the requirement tflite-runtime==2.1.0.post1

I just want to run the tflite model, so I executed
conda env create -f tflite_env.yml
but I got this error at the end.
My system is macOS Big Sur 11.6.
Could someone please help me?

Paper link is not working

update paper link. The current paper link is not working.

Valid paper link should be https://www.isca-speech.org/archive_v0/Interspeech_2020/pdfs/2631.pdf

40h training early stopped between 80~90 epochs

Hi Nils,
Thank you for your great work! Recently I tried to train my model exactly as you described in the environment created with train_env.yml, the only difference is my Ubuntu 20.04, however my training always early stopped between 80-90 epochs, val_loss -16~-17, no matter how I recreated dataset for training. Any suggestions to improve?
Thanks,
Junmin Guo

Would work with audio streaming?

Hi, first of all great work!

I'm wondering if it would be possible to use this method on streaming audio, given the block shift that is used.

Would it be possible? If so, should I make some modifications, for example not using the block shift?

Thanks!

question about LSTM states

I'm trying to run your model on an ARM platform with TVM or another inference engine, but nothing except tf-lite supports stateful LSTMs. In my experiments, the model without states performs badly.
Is there any way to achieve good performance with a stateless LSTM?
By that I mean:
model_1 = Model(inputs=mag, outputs=mask_1)
model_2 = Model(inputs=estimated_frame_1, outputs=decoded_frame)
I need your help.
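
A drop in quality without states is expected for any recurrent layer that is called one frame at a time, because each call then starts from a blank memory. The toy recurrence below (a leaky accumulator, not an LSTM) illustrates the usual workaround for engines without stateful layers: pass the state into the model and read it back out explicitly. The tf-lite models appear to follow the same idea, with a separate state tensor as input and output. All function names here are hypothetical.

```python
def run_sequence(frames, decay=0.5):
    """Process all frames in one call; the state lives inside the loop."""
    state, out = 0.0, []
    for x in frames:
        state = decay * state + x
        out.append(state)
    return out


def run_frame(x, state, decay=0.5):
    """Process one frame; the caller passes the state in and gets it back."""
    new_state = decay * state + x
    return new_state, new_state  # (output, updated state)


frames = [1.0, 2.0, 3.0]

# Frame-by-frame with the state carried across calls.
state, carried = 0.0, []
for x in frames:
    y, state = run_frame(x, state)
    carried.append(y)

# Frame-by-frame with the state reset on every call ("stateless").
reset = [run_frame(x, 0.0)[0] for x in frames]

print(run_sequence(frames), carried, reset)
```

Only the variant that carries the state reproduces the full-sequence output; the reset variant loses everything the recurrence had accumulated.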

Model fails with exception if given epoch more than 62

Hi,

I'm trying to run your model on python 3.8.5 with the Tensorflow version of 2.3.1.
It works fine with a lower number of epochs, but as soon as the number of epochs is 62 or higher, the model fails with the exception given below:

2021-01-06 15:28:33.866558: W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized. The process may be terminated.
[[{{node PyFunc}}]

Have you faced such an error or can you tell what can be the solution for it?

Thanks in advance!

Training the model with no shift loss doesn't improve

@breizhn Hi Nils,
I have pulled your prepared sample set for training and only modified blk_shift to 512, i.e. no shift. I notice no significant improvement after 40+ epochs. Is this expected? What loss and val_loss did you get in your training?

Epoch 00055: val_loss improved from -13.22399 to -13.22746, saving model to ./models_DTLN_model_512_norm/DTLN_model_512_norm.h5
3000/3000 [==============================] - 8885s 3s/step - loss: -13.2950 - val_loss: -13.2275
Epoch 56/200
3000/3000 [==============================] - ETA: 0s - loss: -13.2961
Epoch 00056: val_loss did not improve from -13.22746
3000/3000 [==============================] - 8886s 3s/step - loss: -13.2961 - val_loss: -13.2204
Epoch 57/200
3000/3000 [==============================] - ETA: 0s - loss: -13.2977
Epoch 00057: val_loss did not improve from -13.22746
3000/3000 [==============================] - 8894s 3s/step - loss: -13.2977 - val_loss: -13.2184
Epoch 58/200
3000/3000 [==============================] - ETA: 0s - loss: -13.2971
Epoch 00058: val_loss improved from -13.22746 to -13.23236, saving model to ./models_DTLN_model_512_norm/DTLN_model_512_norm.h5
3000/3000 [==============================] - 8888s 3s/step - loss: -13.2971 - val_loss: -13.2324
Epoch 59/200
3000/3000 [==============================] - ETA: 0s - loss: -13.2989
Epoch 00059: val_loss did not improve from -13.23236
3000/3000 [==============================] - 8890s 3s/step - loss: -13.2989 - val_loss: -13.2236
Epoch 60/200
3000/3000 [==============================] - ETA: 0s - loss: -13.2993
Epoch 00060: val_loss did not improve from -13.23236
3000/3000 [==============================] - 8889s 3s/step - loss: -13.2993 - val_loss: -13.2229
Epoch 61/200
2698/3000 [=========================>....] - ETA: 13:08 - loss: -13.3098

Data augmentation method used for the 40h training dataset

Hi! I would like to know which data augmentation method you used for the 40h training dataset to achieve better performance than with the 500h training dataset.

Convert model.h5 to tflite

I'm trying to convert the provided pretrained model to tflite:

converter = tf.lite.TocoConverter.from_keras_model_file("./model.h5")
tflite_model = converter.convert()
open(os.path.join(".", "model.tflite"), 'wb').write(tflite_model)

And I get the following error:

ValueError: No model found in config file.

Full error message:

Traceback (most recent call last):
File "/home/qendrim/solaborate/repos/solaborate/Solaborate.ML.KPP/src/s4_export_to_tflite.py", line 18, in <module>
converter = tf.lite.TocoConverter.from_keras_model_file("./model.h5")
File "/home/qendrim/anaconda3/envs/solaborate-ml/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "/home/qendrim/anaconda3/envs/solaborate-ml/lib/python3.7/site-packages/tensorflow/lite/python/lite.py", line 1002, in from_keras_model_file
input_shapes, output_arrays)
File "/home/qendrim/anaconda3/envs/solaborate-ml/lib/python3.7/site-packages/tensorflow/lite/python/lite.py", line 747, in from_keras_model_file
keras_model = _keras.models.load_model(model_file, custom_objects)
File "/home/qendrim/anaconda3/envs/solaborate-ml/lib/python3.7/site-packages/tensorflow/python/keras/saving/save.py", line 146, in load_model
return hdf5_format.load_model_from_hdf5(filepath, custom_objects, compile)
File "/home/qendrim/anaconda3/envs/solaborate-ml/lib/python3.7/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 209, in load_model_from_hdf5
raise ValueError('No model found in config file.')
ValueError: No model found in config file.

python: 3.7
tensorflow: 1.4

What can I do to convert this model to tflite?

TFLite Android

Hi,
I want to use the tflite model in an Android project. When I load the model into Android Studio, it generates code like the following:

`
val model = Dtln.newInstance(context)

// Creates inputs for reference.
val inputFeature0 = TensorBuffer.createFixedSize(intArrayOf(1, 1, 512), DataType.FLOAT32)
inputFeature0.loadBuffer(byteBuffer)
val inputFeature1 = TensorBuffer.createFixedSize(intArrayOf(1, 2, 128, 2), DataType.FLOAT32)
inputFeature1.loadBuffer(byteBuffer)

// Runs model inference and gets result.
val outputs = model.process(inputFeature0, inputFeature1)
val outputFeature0 = outputs.outputFeature0AsTensorBuffer
val outputFeature1 = outputs.outputFeature1AsTensorBuffer

// Releases model resources if no longer used.
model.close()
`

My question is: what are inputFeature0 and inputFeature1 in this code? Should I read the wav file as a byte array and then reshape it? Or should I create a feature vector from the wav file? Can you help me with this?

Thanks

Low version tf model training

Hello, can I use tensorflow version 1.15 to train the model？thanks

run_training.py, shape problem

I'm trying to train a new model using run_training.py.
When I read my .wav files using scipy.io.wavfile.read I get a 2d array, something like the following.

[[ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 1 1], [ 2 2], [ 2 2], [ 2 2], [ 2 2], [ 3 3], [ 5 5], [ 6 6], [ 6 6], [ 5 5], [ 3 3], [ 3 3], [ 2 3], [ 3 3], [ 2 2], [ 1 1], [ 0 0], [ 0 0], [ 0 0], [ 1 1], [ 2 2], [ 2 2], [ 0 0], [-2 -3], [-5 -5], [-5 -5], [-2 -2], [ 0 0], [ 2 2], [ 3 3], [ 1 1], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 1 1], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 0 0], [ 1 1], [ 1 1], [ 2 2], [ 1 1], [ 0 0], [ 0 0], [ 0 0], [ 1 1], [ 2 2], [ 1 1], [ 0 0], [-1 -1], [-2 -2], [-2 -2], [-1 -1], [ 0 0], [ 1 1], [ 2 2], [ 2 2]]

When I run the script, I get the following error:

ValueError: generator yielded an element of shape (720000, 2) where an element of shape (720000,) was expected.

Full error message:

`
Traceback (most recent call last):
File "/home/qendrim/solaborate/repos/ML/NoiseCancellation/DTLN/run_training.py", line 51, in
modelTrainer.train_model(runName, path_to_train_mix, path_to_train_speech, path_to_val_mix, path_to_val_speech)
File "/home/qendrim/solaborate/repos/ML/NoiseCancellation/DTLN/DTLN_model.py", line 383, in train_model
use_multiprocessing=True)
File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 819, in fit
use_multiprocessing=use_multiprocessing)
File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 342, in fit
total_epochs=epochs)
File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py", line 128, in run_one_epoch
batch_outs = execution_function(iterator)
File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 98, in execution_function
distributed_function(input_fn))
File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 568, in call
result = self._call(*args, **kwds)
File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 632, in _call
return self._stateless_fn(*args, **kwds)
File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 2363, in call
return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1611, in _filtered_call
self.captured_inputs)
File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1692, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 545, in call
ctx=ctx)
File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: ValueError: generator yielded an element of shape (720000, 2) where an element of shape (720000,) was expected.
Traceback (most recent call last):

File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/ops/script_ops.py", line 236, in call
ret = func(*args)

File "/home/qendrim/anaconda3/envs/solab-py37-tf2/lib/python3.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 825, in generator_py_func
"of shape %s was expected." % (ret_array.shape, expected_shape))

ValueError: generator yielded an element of shape (720000, 2) where an element of shape (720000,) was expected.

[[{{node PyFunc}}]]
[[IteratorGetNext]] [Op:__inference_distributed_function_10659]

Function call stack:
distributed_function
`

python: 3.7
tensorflow: 2.1
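
The error says the generator produced two-channel audio, shape (720000, 2), where single-channel audio, shape (720000,), was expected, so the training files most likely need to be mono. Below is a minimal standard-library sketch for downmixing 16-bit PCM stereo WAV files. `stereo_to_mono` is a hypothetical helper, not part of the repository, and averaging the channels is just one possible choice (keeping only the left channel would work too).

```python
import array
import sys
import wave


def stereo_to_mono(src_path, dst_path):
    """Average the channels of a 16-bit PCM stereo WAV and write a mono WAV."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo PCM file")
        rate = src.getframerate()
        samples = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Samples are interleaved as left, right, left, right, ...
    mono = array.array(
        "h", ((samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2))
    )
    if sys.byteorder == "big":
        mono.byteswap()  # back to little-endian before writing
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())
    return len(mono)  # number of mono samples written
```

The sampling rate is left untouched here, so files that are not already at the rate the model expects would still need resampling.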

Microphone sampling rate

I was trying to get real_time_dtln_audio.py to work. Should I change my microphone input and speaker output to 16 kHz as well?

Is there no code for training data preparation?

onnx quantization

Thanks for your code! If I want to quantize the ONNX model, could you give me some advice?

question about B1 model.

Hi @breizhn, thanks for your work.
I found the B1 model (4 layers, STFT) in your paper. As I understand it, the B1 model has only one separation kernel, which uses an STFT analysis and synthesis basis and four LSTM layers (the second separation core is removed), and the training target is the negative SNR. Do I understand correctly?

Some questions about real-time denoising

Breizhn, I have a few questions about real-time denoising.

First, what does the 'quantization' argument in 'convert_weights_to_tf_lite.py' mean? If I want to get the tf-lite model, I just use the TF model (.h5) I trained as input to 'convert_weights_to_tf_lite.py', right? And if I set the 'quantization' argument to True, the denoising result should be better, right? Or am I misunderstanding?

Also, when I run 'real_time_dtln_audio.py', I notice that the output is much worse than the result I get from 'run_evaluation.py' with the TF model. Is that because I used the tf-lite model without quantization?

Last but not least, if I want to change the values of block_len_ms and block_shift_ms, I need to retrain the model with new values of batchsize, blockLen and block_shift in DTLN_model(), right?

Thank you a lot.

Issue with tflite interpreter

Hi,
In the DTLN model, I made some changes to the separation kernel, which now looks as follows:
    x = keras.layers.Conv1D(B, 1, use_bias=False)(x)

    y = keras.layers.Conv1D(N, 1, use_bias=False)(x)
    y = keras.activations.relu(y, alpha=0.01)

    y = InstantLayerNormalization()(y)
    y = keras.layers.Conv1D(N, kernel_size=P, strides=1, padding="causal", dilation_rate=2**0, groups=N, use_bias=False)(y)

    y = keras.activations.relu(y, alpha=0.01)
    y = InstantLayerNormalization()(y)
    v = keras.layers.Conv1D(B, 1, use_bias=False)(y)
    z = y
    x = x + v

When I try the tflite conversion and run the tflite interpreter, I get the error below:
"RuntimeError: tensorflow/lite/kernels/conv.cc:238 input->dims->data[3] != filter->dims->data[3] (32 != 1)Node number 1 (CONV_2D) failed to prepare"
I tried to debug the issue and found that the error appears if I keep x = x + v or use v as the input for the next layers (due to the dilated convolution's dependency on v).

1. How can I resolve this error?
2. If this cannot be done directly with tflite, is there an alternative method for post-training quantization of a model like the code above?

Any help would be greatly appreciated. Thanks in advance!

Real-time-processing Tf_lite inference program ERROR

Hi, I loaded a noisy single-channel .wav file to run inference with real_time_processing_tf_lite.py
and got the following dimension mismatch error.

Traceback (most recent call last):
File "real_time_processing_tf_lite.py", line 70, in
interpreter_1.set_tensor(input_details_1[0]['index'], states_1)
File "/Users/cyberon/anaconda3/envs/dtln_py/lib/python3.7/site-packages/tflite_runtime/interpreter.py", line 399, in set_tensor
self._interpreter.SetTensor(tensor_index, value)
File "/Users/cyberon/anaconda3/envs/dtln_py/lib/python3.7/site-packages/tflite_runtime/interpreter_wrapper.py", line 148, in SetTensor
return _interpreter_wrapper.InterpreterWrapper_SetTensor(self, i, value)
ValueError: Cannot set tensor: Dimension mismatch. Got 4 but expected 3 for input 0.

Any inputs on this will be helpful!
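
The message "Got 4 but expected 3 for input 0" suggests that the 4-D state tensor was written to the slot that expects the 3-D audio frame, which can happen if the order of the entries returned by `get_input_details()` differs from what the script assumes. A defensive sketch is to select the inputs by tensor rank instead of by position. `split_inputs` is a hypothetical helper; it only relies on the `'shape'` and `'index'` keys of the input-detail dictionaries.

```python
def split_inputs(input_details):
    """Return (frame_index, state_index), chosen by tensor rank, not list position."""
    frames = [d for d in input_details if len(d["shape"]) == 3]
    states = [d for d in input_details if len(d["shape"]) == 4]
    if len(frames) != 1 or len(states) != 1:
        raise ValueError("expected exactly one 3-D and one 4-D input")
    return frames[0]["index"], states[0]["index"]


# Stand-in for interpreter.get_input_details() with the state tensor listed first.
details = [
    {"index": 7, "shape": (1, 2, 128, 2)},
    {"index": 3, "shape": (1, 1, 512)},
]
frame_index, state_index = split_inputs(details)
print(frame_index, state_index)
```

The two indices can then be passed to `set_tensor` for the frame and the states respectively, regardless of the order in which the interpreter reports them.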
