
fgnt / nara_wpe

Different implementations of "Weighted Prediction Error" for speech dereverberation

License: MIT License

Python 100.00%
dereverberation audio audio-processing signal-processing enhancement

nara_wpe's People

Contributors

boeddeker, danielhkl, jazcarretao, jheymann85, lukasdrude, lzljbsc


nara_wpe's Issues

Time consumption for online version

Hi,
Can the online WPE version be used as a front-end processing system for a real-time integrated ASR application?
I notice that it takes 0.8 s to process a single frame, which I think makes it unsuitable for a real-time system compared with the spectral subtraction method; on the other hand, spectral subtraction did not give good recognition performance after dereverberation.

Comment error regarding dimension of filter_taps in online_wpe_step

In the comment describing the input filter_taps in online_wpe_step (line 702 of wpe.py), it says the dimensions are (F, taps*D, taps).

Is that incorrect? Shouldn't it be (F, taps*D, D)? The two terms that make up the pred calculation end up with different sizes and are not broadcastable. The third dimension being equal to D would also be consistent with how filter_taps is declared in the __init__ function of class OnlineWPE.

The wav files in the data folder

Thanks for sharing your work.
My question is: how were the 8 wav files in the data folder recorded?
Was there a special microphone arrangement?

AttributeError: module 'scipy.signal' has no attribute 'blackman'

I'm encountering an issue while using nara_wpe.utils.istft in my project. The error is triggered by the default parameter for window, which is set to scipy.signal.blackman. However, I discovered that scipy.signal does not have a blackman attribute. Instead, the correct way to access the blackman function is by using scipy.signal.windows.blackman.

Expected Behavior:
The blackman function should be accessible directly from scipy.signal.

Actual Behavior:
Encountered an AttributeError due to the absence of the blackman attribute in scipy.signal.

Workaround:
Use scipy.signal.windows.blackman instead of scipy.signal.blackman.

Environment:
nara_wpe version: >=0.0.7
scipy version: >= 1.6.0
Python version: 3.9
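A sketch of the workaround in code, assuming (as the issue suggests) that nara_wpe.utils.istft accepts a `window` callable via its `window` parameter:

```python
from scipy.signal import windows

# Workaround sketch: pass the window explicitly instead of relying on the
# default, which references the scipy.signal.blackman alias removed in
# newer SciPy releases.
window = windows.blackman

# z = istft(Z, size=512, shift=128, window=window)

# Alternatively, restore the old alias before importing nara_wpe.utils:
# import scipy.signal
# scipy.signal.blackman = windows.blackman

# Sanity check: the windows submodule provides the same window shape.
w = window(8)
assert len(w) == 8
assert abs(w[0]) < 1e-8  # Blackman endpoints are (numerically) zero
```
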

Iteration running on Numpy Online example

Hi.
Thanks for this great tool. When I run the NumPy online example notebook, an iterator runs (986it [00:30, 32.21it/s]). The online implementation is supposed to be non-iterative (according to the paper), right?
Also, I noticed that NumPy offline is much faster than NumPy online; I expected the opposite.
Is there anything I should change to avoid iterating?
Thanks.
Kanthasamy

PyTorch module emits UserWarning

When running nara_wpe.torch_wpe.wpe_v6 with PyTorch 1.11, I'm seeing the following warning:

torch.linalg.solve has its arguments reversed and does not return the LU factorization.
To get the LU factorization see torch.lu, which can be used with torch.lu_solve or torch.lu_unpack.
X = torch.solve(B, A).solution
should be replaced with
X = torch.linalg.solve(A, B) (Triggered internally at  ../aten/src/ATen/native/BatchLinearAlgebra.cpp:766.)
  G, _ = torch.solve(P, R)

It seems modifying G, _ = torch.solve(P, R) to G = torch.linalg.solve(R, P) does the trick.
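The argument-order change can be illustrated with NumPy, whose np.linalg.solve follows the same convention as torch.linalg.solve (solve(A, B) returns X such that A X = B). This is a generic sketch, not nara_wpe code:

```python
import numpy as np

# Deprecated: G, _ = torch.solve(P, R)      -> solves R G = P (RHS first)
# Replacement: G = torch.linalg.solve(R, P) -> solves R G = P (matrix first)
rng = np.random.default_rng(0)
R = rng.standard_normal((4, 4)) + 4 * np.eye(4)  # well-conditioned stand-in "R"
P = rng.standard_normal((4, 2))                  # right-hand side stand-in "P"

G = np.linalg.solve(R, P)  # corresponds to torch.linalg.solve(R, P)
assert np.allclose(R @ G, P)
```
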

Question about handle long duration stereo audio file

Great work, and thanks for sharing!

When I run the WPE_Numpy_offline Jupyter notebook on my audio file, the result does not sound dereverberated, so I need some help adjusting the parameters, or any other suggestions.

My audio file is a four-minute, two-channel karaoke soundtrack recorded by a friend, and I need to extract her clean voice track. First I used spleeter to remove the background music, yielding a two-channel voice track. Then I dereverberated the voice track with your Jupyter scripts. For the stereo input processing I simply followed the method in #42. Here are my visualized outputs:
[screenshots: input and output spectrograms]
And here are my parameter settings:
[screenshot: parameter settings]

Single stereo file implementation

Hi, thank you for the great work.

Any hints on how to test the algorithm on a single stereo file?
The notebooks show batch processing, and I am failing to apply it to a single stereo input.
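A minimal sketch of one way to do this, assuming the example pipeline expects y with shape (channels, samples) as in the notebooks; the file name and shapes here are illustrative:

```python
import numpy as np
# import soundfile as sf  # for a real file

# soundfile returns (samples, channels) for a stereo file, while the
# examples use (channels, samples), so the two channels of one stereo
# file can serve as the D = 2 input channels after a transpose.
# y_stereo, sr = sf.read('stereo.wav')   # shape (samples, 2)
y_stereo = np.random.randn(16000, 2)     # stand-in for the real file
y = y_stereo.T                           # shape (2, samples)
assert y.shape == (2, 16000)
# Y = stft(y, size=512, shift=128).transpose(2, 0, 1)
# Z = wpe(Y).transpose(1, 2, 0)
```
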

Thank you.

Can this python code remove real time reverb?

Hey!

I have a wet recording of a clap inside a small room, 4.3 x 1.8 x 2.4 meters. I want to remove the reverb from the sound before passing it on.

Later on, I want to use this code to remove reverb in a machine room on a ship, with several electric motors, hydraulics, and valves. Is it possible to remove that reverb with this code?

MIMO version of WPE

Hi,

Firstly, thank you for the great work!

I would like to use WPE as a preprocessing step to a source localization algorithm. To do that I'd need to use a "MIMO" version of WPE which would preserve the delays between the channels.

It seems like this option is not yet implemented. If so, I would be happy to contribute to that, although I'd need some guidance on the algorithmic part. Would this be interesting to you?

Kind regards,

Eric

python 2 support - utils.py

@LukasDrude I think the file utils.py does not work with Python 2; I get this error:

from nara_wpe.utils import stft
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/export/b18/asubraman/espnet-local/espnet/tools/venv/local/lib/python2.7/site-packages/nara_wpe/utils.py", line 111
*,
^
SyntaxError: invalid syntax
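For context, the bare `*,` that Python 2 rejects is Python 3's keyword-only-argument marker: every parameter after it must be passed by name. A minimal sketch (stft_sketch is an illustrative stand-in, not the real nara_wpe signature):

```python
# A bare * in a parameter list marks the start of keyword-only parameters.
# Python 2 cannot parse this syntax at all, hence the SyntaxError above.
def stft_sketch(signal, size, *, shift=None, window=None):
    shift = size // 4 if shift is None else shift
    return size, shift, window

assert stft_sketch([0.0], 512, shift=128) == (512, 128, None)
# stft_sketch([0.0], 512, 128)  # TypeError in Python 3: shift is keyword-only
```
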

Difference from the previous wpe algorithms

Hello!
I tested two-channel speech using the NumPy implementation you offer. The effect is very good! So I want to ask: what is the difference between your WPE algorithm and previous WPE algorithms?
You mentioned the recursive formulation used in the Google Home speech assistant hardware; is your recursive formulation the same as that one?

Syntax error in nara_wpe/utils.py", line 204

I get this when trying to run the first cell from WPE_Numpy_offline:

  File "/home/hhrutz/Documents/devel/nara_wpe/nara_wpe/utils.py", line 204
    *,
     ^
SyntaxError: invalid syntax

What to do?

[Screenshot from 2019-11-24 23-55-16]

Some questions about the Online version?

I encountered some problems when using nara_wpe.
After reading the paper "NARA-WPE: A Python package for weighted prediction error dereverberation in Numpy and Tensorflow for online and offline processing", I wanted to try nara_wpe for dereverberation, so I learned how to use it from the IPython notebooks in nara_wpe/examples.
According to Table 1 in the paper, there is no block-online version of the NumPy implementation, while the TensorFlow implementation does have a block-online version.
When I read the notebooks, I saw the frame-online version in WPE_Numpy_online.ipynb, but in WPE_Tensorflow_online.ipynb I found only the frame-online version and could not find the block-online version.
(I also read the source code of nara_wpe and found a comment in wpe.py -> class OnlineWPE -> def _get_prediction:
# TODO: Only block shift of 1 works.
I wonder: when block shift = 1, is block-online equivalent to frame-online?)
So my first question is: how do I enable the block-online version?

Then I tested my code on the test set (simulated 8-channel wavs) with the offline and frame-online versions of the NumPy implementation, recorded the time cost, and calculated the word error rate (WER).
My results are as follows:

Unprocessed reverberant wav: WER = 9.97.
Offline: WER = 7.10, real-time factor = 1.7.
Frame-online: WER = 12.39, real-time factor = 9.87.

The results show that both the WER and the real-time factor of the frame-online version are high.
So my second question is: is this result reasonable, and if so, why is the frame-online version so slow?

Help in running

import numpy as np
import soundfile as sf
from tqdm import tqdm
from nara_wpe.wpe import wpe
from nara_wpe.wpe import get_power
from nara_wpe.utils import stft, istft, get_stft_center_frequencies
from nara_wpe import project_root

stft_options = dict(size=512, shift=128)

channels = 2
sampling_rate = 48000
delay = 3
iterations = 5
taps = 10
alpha=0.9999

file_template = 'r:/reverb.wav'
signal_list = [
    sf.read(str(project_root / 'data' / file_template.format(d + 1)))[0]
    for d in range(channels)
]
y = np.stack(signal_list, axis=0)

Y = stft(y, **stft_options).transpose(2, 0, 1)

Z = wpe(
    Y,
    taps=taps,
    delay=delay,
    iterations=iterations,
    statistics_mode='full'
).transpose(1, 2, 0)
z = istft(Z, size=stft_options['size'], shift=stft_options['shift'])

from scipy.io import wavfile
wavfile.write('new_audio.wav', sampling_rate, z.T)
sf.write('new_audio.wav', z.T, sampling_rate)   

Result:


    Y = stft(y, **stft_options).transpose(2, 0, 1)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: axes don't match array

It looks nice, but I can't use it yet.
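A likely explanation, offered as an assumption since the input file is not available: 'r:/reverb.wav' contains no '{}' placeholder, so file_template.format(d + 1) reads the same file twice, and each sf.read of a stereo file returns shape (samples, 2). Stacking two such reads yields a 3-D array, the STFT output then has one axis more than expected, and transpose(2, 0, 1) fails. A sketch of a fix for one stereo file:

```python
import numpy as np

# Stand-in for: y_stereo, sr = sf.read('r:/reverb.wav')  # (samples, 2)
y_stereo = np.random.randn(48000, 2)

# Use the file's own two channels instead of stacking repeated reads,
# so y is 2-D as the example pipeline expects:
y = y_stereo.T                 # (channels=2, samples)
assert y.ndim == 2
# Y = stft(y, **stft_options).transpose(2, 0, 1)  # now 3-D as expected
```
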

Save audio file after WPE processing

I generated an audio file after preprocessing with NumPy WPE, but the resulting audio is not audible.
I used soundfile and scipy.io.wavfile to save the audio file, but I was not able to recover a clean, audible signal.

Nothing left after using offline method, need help plz!

Hi, I tried both the NumPy offline and online methods following the ipynb files in the examples folder. For the same audio (generated, 8 kHz, about 8 s long, 4 channels, RT60 = 0.2), the results are quite different.
[figures: online result (nara_online) and offline result (nara_offline) spectrograms; left panels show the original spectrogram]
Compared to the online result, it seems nothing is left when using the offline method. I used the offline method with default parameters and tried both the wpe_v8 and wpe APIs. The norm of the values in the offline result is more than 10 times lower than in the online result.
The online result actually seems good; it improved the TDOA estimation. So I think there must be something wrong with how I used the offline method.

Do you have any suggestions? What did I miss?

How to run examples

Hello. Newbie question from someone not really familiar with pip: if I follow the README and run pip install --editable ., this happens, but how do I run the example code?

Obtaining file:///home/hhrutz/Documents/devel/nara_wpe
Requirement already satisfied: pathlib2; python_version < "3.0" in /usr/lib/python2.7/dist-packages (from nara-wpe==0.0.6) (2.3.3)
Requirement already satisfied: numpy in /home/hhrutz/.local/lib/python2.7/site-packages (from nara-wpe==0.0.6) (1.12.1)
Collecting tqdm (from nara-wpe==0.0.6)
  Downloading https://files.pythonhosted.org/packages/bb/62/6f823501b3bf2bac242bd3c320b592ad1516b3081d82c77c1d813f076856/tqdm-4.39.0-py2.py3-none-any.whl (53kB)
    100% |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 61kB 675kB/s 
Collecting soundfile (from nara-wpe==0.0.6)
  Downloading https://files.pythonhosted.org/packages/68/64/1191352221e2ec90db7492b4bf0c04fd9d2508de67b3f39cbf093cd6bd86/SoundFile-0.10.2-py2.py3-none-any.whl
Collecting bottleneck (from nara-wpe==0.0.6)
  Downloading https://files.pythonhosted.org/packages/62/d0/55bbb49f4fade3497de2399af70ec0a06e432c786b8623c878b11e90d456/Bottleneck-1.3.1.tar.gz (88kB)
    100% |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 92kB 1.7MB/s 
  Installing build dependencies ... done
Requirement already satisfied: click in /usr/lib/python2.7/dist-packages (from nara-wpe==0.0.6) (7.0)
Requirement already satisfied: scandir in /usr/lib/python2.7/dist-packages (from pathlib2; python_version < "3.0"->nara-wpe==0.0.6) (1.9.0)
Collecting cffi>=1.0 (from soundfile->nara-wpe==0.0.6)
  Downloading https://files.pythonhosted.org/packages/93/5d/c4f950891251e478929036ca07b22f0b10324460c1d0a4434c584481db51/cffi-1.13.2-cp27-cp27mu-manylinux1_x86_64.whl (384kB)
    100% |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 389kB 1.5MB/s 
Collecting pycparser (from cffi>=1.0->soundfile->nara-wpe==0.0.6)
  Downloading https://files.pythonhosted.org/packages/68/9e/49196946aee219aead1290e00d1e7fdeab8567783e83e1b9ab5585e6206a/pycparser-2.19.tar.gz (158kB)
    100% |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 163kB 912kB/s 
Building wheels for collected packages: bottleneck, pycparser
  Running setup.py bdist_wheel for bottleneck ... done
  Stored in directory: /home/hhrutz/.cache/pip/wheels/31/36/8f/1ed7e6f1b3295499c8bbab934262f2494d0f6aebe0c5860754
  Running setup.py bdist_wheel for pycparser ... done
  Stored in directory: /home/hhrutz/.cache/pip/wheels/f2/9a/90/de94f8556265ddc9d9c8b271b0f63e57b26fb1d67a45564511
Successfully built bottleneck pycparser
Installing collected packages: tqdm, pycparser, cffi, soundfile, bottleneck, nara-wpe
  The script tqdm is installed in '/home/hhrutz/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  Running setup.py develop for nara-wpe
Successfully installed bottleneck-1.3.1 cffi-1.13.2 nara-wpe pycparser-2.19 soundfile-0.10.2 tqdm-4.39.0

I'm kind of lost here as to the next step. I installed Jupyter via pip install jupyter --user, then ran jupyter-notebook and tried to open an example in the browser. But something is not right:

$ jupyter-notebook 
[I 23:10:37.685 NotebookApp] Serving notebooks from local directory: /home/hhrutz/Documents/devel/nara_wpe
[I 23:10:37.685 NotebookApp] The Jupyter Notebook is running at:
[I 23:10:37.685 NotebookApp] http://localhost:8888/?token=c7eb70e17b0f6c7495ec505fd788f22dfcd5879c1dd157aa
[I 23:10:37.685 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 23:10:37.688 NotebookApp] 
    
    To access the notebook, open this file in a browser:
        file:///home/hhrutz/.local/share/jupyter/runtime/nbserver-3688-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/?token=c7eb70e17b0f6c7495ec505fd788f22dfcd5879c1dd157aa
[E 23:10:43.634 NotebookApp] Uncaught exception GET /notebooks/examples/WPE_Numpy_offline.ipynb (::1)
    HTTPServerRequest(protocol='http', host='localhost:8888', method='GET', uri='/notebooks/examples/WPE_Numpy_offline.ipynb', version='HTTP/1.1', remote_ip='::1')
    Traceback (most recent call last):
      File "/home/hhrutz/.local/lib/python2.7/site-packages/tornado/web.py", line 1590, in _execute
        result = method(*self.path_args, **self.path_kwargs)
      File "/home/hhrutz/.local/lib/python2.7/site-packages/tornado/web.py", line 3006, in wrapper
        return method(self, *args, **kwargs)
      File "/home/hhrutz/.local/lib/python2.7/site-packages/notebook/notebook/handlers.py", line 59, in get
        get_custom_frontend_exporters=get_custom_frontend_exporters
      File "/home/hhrutz/.local/lib/python2.7/site-packages/notebook/base/handlers.py", line 519, in render_template
        return template.render(**ns)
      File "/home/hhrutz/.local/lib/python2.7/site-packages/jinja2/environment.py", line 1008, in render
        return self.environment.handle_exception(exc_info, True)
      File "/home/hhrutz/.local/lib/python2.7/site-packages/jinja2/environment.py", line 780, in handle_exception
        reraise(exc_type, exc_value, tb)
      File "/home/hhrutz/.local/lib/python2.7/site-packages/notebook/templates/notebook.html", line 1, in top-level template code
        {% extends "page.html" %}
      File "/home/hhrutz/.local/lib/python2.7/site-packages/notebook/templates/page.html", line 154, in top-level template code
        {% block header %}
      File "/home/hhrutz/.local/lib/python2.7/site-packages/notebook/templates/notebook.html", line 120, in block "header"
        {% for exporter in get_custom_frontend_exporters() %}
      File "/home/hhrutz/.local/lib/python2.7/site-packages/notebook/notebook/handlers.py", line 19, in get_custom_frontend_exporters
        from nbconvert.exporters.base import get_export_names, get_exporter
      File "/home/hhrutz/.local/lib/python2.7/site-packages/nbconvert/__init__.py", line 4, in <module>
        from .exporters import *
      File "/home/hhrutz/.local/lib/python2.7/site-packages/nbconvert/exporters/__init__.py", line 1, in <module>
        from .base import (export, get_exporter,
      File "/home/hhrutz/.local/lib/python2.7/site-packages/nbconvert/exporters/base.py", line 8, in <module>
        import entrypoints
      File "/usr/lib/python2.7/dist-packages/entrypoints.py", line 16, in <module>
        import configparser
      File "/home/hhrutz/.local/lib/python2.7/site-packages/configparser.py", line 11, in <module>
        from backports.configparser import (
    ImportError: cannot import name ConverterMapping

a small bug in utils.py

Hi all,
I found a small bug in the function segment_axis(): the parameter 'overlap' seems to behave as a shift instead of an overlap.

Best
Sining
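The naming confusion can be shown with a tiny sliding-window sketch (a generic illustration, not nara_wpe's actual segment_axis code): a parameter used directly as the per-frame advance behaves as a shift, whereas a true overlap would make the advance length - overlap.

```python
import numpy as np

# With frame length 4 and overlap 2, consecutive frames should advance by
# length - overlap = 2. If a parameter is used directly as the advance,
# it is really a shift, not an overlap.
def frames(x, length, shift):
    n = (len(x) - length) // shift + 1
    return np.stack([x[i * shift:i * shift + length] for i in range(n)])

x = np.arange(8)
f = frames(x, length=4, shift=2)  # advance of 2 == "overlap" of 2 here
assert f.shape == (3, 4)
assert (f[1] == np.array([2, 3, 4, 5])).all()
```
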

Documentation about the variants

It is not clear exactly how all the wpe versions relate to each other. Are they variants, or different types of optimization? And is it possible to know which one is recommended?

online TF outputs NAN

Hi,
I am trying online WPE with my own 2-channel data. After 2 or 3 iterations, the outputs became NaN. The official example works properly. How do I fix this problem?

Many thanks!

Save result as wav file

I'm trying to save the result as a wav file but I get only silence. Am I using the wrong format?

I've added this to WPE_Numpy_offline.ipynb after the Iterative WPE cell:

import soundfile
soundfile.write('output.wav', z[0].astype(np.int16), sampling_rate, 'PCM_16')
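A probable cause, assuming z holds float samples in roughly [-1, 1] as istft returns: astype(np.int16) truncates nearly every sample to zero, which produces a silent file. A sketch of two ways to write it correctly, with z_demo standing in for the real z[0]:

```python
import numpy as np

# Stand-in for the dereverberated float signal z[0] (values in [-0.5, 0.5]).
z_demo = 0.5 * np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)

naive = z_demo.astype(np.int16)                           # truncates to 0
pcm = (z_demo * np.iinfo(np.int16).max).astype(np.int16)  # scaled, audible

assert np.abs(naive).max() == 0   # this is the silent file
assert np.abs(pcm).max() > 10000  # this one carries signal

# Either write the float signal directly, or write the scaled PCM data:
# soundfile.write('output.wav', z_demo, sampling_rate)                # float
# soundfile.write('output.wav', pcm, sampling_rate, subtype='PCM_16') # int16
```
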

Are there any differences between these versions of WPE?

I notice there are three versions in the examples: tensorflow, numpy, and NTT. Are there any differences between these versions of WPE?
Is NTT a new version that has not been released before? Can you help me understand the differences between these three versions?
Thank you!

Online WPE can't handle audio signals that have segments of absolute silence

Hello,
I ran into an issue with WPE_online: it turns out it can't handle audio signals that have segments of absolute silence (a continuous constant value). The way I found out was as follows:
While running the online WPE class on a ~2.5 hr session, I got this warning message:

31104it [00:15, 2050.81it/s]//anaconda3/lib/python3.7/site-packages/nara_wpe/wpe.py:790: RuntimeWarning: invalid value encountered in true_divide
self.kalman_gain = nominator / denominator[:, None]
37489it [00:18, 1997.07it/s]

Also, when I listened to the output file, I noticed it was fine up to about 4 minutes in, and then it went completely silent for the rest of the 2.5 hr file. This happened for both multi- and single-channel files.
When I took a look at the signal, it turned out that the input has a short silent segment right where the output goes silent. Here is an example of the input and output plotted on the same time scale:
[figure: input and output waveforms plotted on the same time scale]
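A minimal illustration of the suspected failure mode (a sketch, not nara_wpe's actual Kalman-gain code): an all-zero block makes the denominator zero, the division produces NaN, and NaN then propagates through every later frame. A common guard is flooring the denominator with a small epsilon:

```python
import numpy as np

# An absolutely silent block gives a zero normalization denominator.
nominator = np.zeros(4)
denominator = 0.0

with np.errstate(invalid='ignore', divide='ignore'):
    gain_unguarded = nominator / denominator  # 0/0 -> NaN, then propagates

gain_guarded = nominator / max(denominator, 1e-10)  # epsilon floor

assert np.isnan(gain_unguarded).all()
assert (gain_guarded == 0).all()
```
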

Question about asr training data preparations in paper.

Thanks for your work.
I see that your NN-WPE method outperforms other traditional methods in the paper's ASR WER results, whether the ASR system is fed data processed by WPE or unprocessed data.

Was your ASR training data processed by WPE in advance?
And would the WER improve further if the training data were processed by NN-WPE?


Something weird between Tensorflow-offline-wpe and numpy-offline-wpe

Hi, thanks for your work on nara_wpe. I have learned quite a lot from your implementation and your paper.

I tried to integrate TensorFlow offline WPE into my ASR system. However, the time spent on a 3.5 s audio file is ~7 s for tf-offline-wpe, while the NumPy version takes only ~200 ms.

I ran tf-offline-wpe on a GPU. All I did was run the WPE dereverberation under one tf.Session for all the audio files, so my code is something like:

with tf.Session(config=config) as session:
    with tf.device('/gpu:0'):
        <wpe for all the audio file in test_data_set>

But it takes more time than the NumPy version, which confuses me a lot. I expected the TensorFlow version of nara_wpe to be faster than the NumPy one.

Best

Add iterations to frame-online / comparison with block-online

Hi there, I finally got the notebooks working; thank you very much for this great repository. I have two related questions:

  • Would it be correct to assume that I could add iterativity to the frame-online (NumPy) version by simply cascading the entire chain multiple times (running the output through the frame-online algorithm again)?
  • Frame-online seems to be relatively slow. My understanding is that although the paper suggests a way to do block-online, it is not available in the notebook examples (NumPy). Could I just slide a large window, compute the block ("batch") WPE, and then crossfade the filter coefficients? Any other pointers for implementing the block-online variant?
