fgnt / nara_wpe
Different implementations of "Weighted Prediction Error" for speech dereverberation
License: MIT License
Hi,
Can the online WPE version be used as front-end processing for a real-time integrated ASR application?
I notice that it takes 0.8 s to process a single frame, which I think is not viable for a real-time system compared with the spectral subtraction method; however, spectral subtraction did not give good recognition performance after dereverberation.
In the comment describing the input filter_taps in online_wpe_step (line 702 of wpe.py), it says the dimensions are (F, taps*D, taps). Is that incorrect? Shouldn't it be (F, taps*D, D)? The two terms that make up the pred calculation end up being different sizes and are not broadcastable. The third dimension being equal to D would also be consistent with how filter_taps is declared in the __init__ function of class OnlineWPE.
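A quick NumPy shape check illustrates the mismatch. This is a sketch, not the library's code: it assumes the prediction is a per-frequency contraction of the conjugated filter taps with the stacked history Y_tilde of shape (F, taps*D), which must yield one value per channel:

```python
import numpy as np

F, D, taps = 257, 8, 10  # frequency bins, channels, filter taps

# Stacked history of the last `taps` frames per frequency bin: (F, taps*D)
Y_tilde = np.ones((F, taps * D), dtype=complex)

# Shape as declared in OnlineWPE.__init__: (F, taps*D, D)
filter_taps = np.zeros((F, taps * D, D), dtype=complex)

# Per-frequency prediction of the late reverberation, one value per channel:
prediction = np.einsum('fid,fi->fd', np.conj(filter_taps), Y_tilde)
assert prediction.shape == (F, D)  # matches the current observation (F, D)

# With the docstring's shape (F, taps*D, taps), the last axis would be
# `taps` instead of `D` and no longer match the observation:
wrong = np.zeros((F, taps * D, taps), dtype=complex)
assert np.einsum('fid,fi->fd', np.conj(wrong), Y_tilde).shape == (F, taps)
```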
Thanks for sharing.
My question is: how were the 8 wav files in the data folder recorded?
Is there some special microphone arrangement?
I'm encountering an issue while using nara_wpe.utils.istft in my project. The error is triggered by the default parameter for window, which is set to scipy.signal.blackman. However, I discovered that scipy.signal does not have a blackman attribute. Instead, the correct way to access the blackman function is by using scipy.signal.windows.blackman.
Expected Behavior:
The blackman function should be accessible directly from scipy.signal.
Actual Behavior:
Encountered an AttributeError due to the absence of the blackman attribute in scipy.signal.
Workaround:
Use scipy.signal.windows.blackman instead of scipy.signal.blackman.
Environment:
nara_wpe version: >=0.0.7
scipy version: >=1.6.0
Python version: 3.9
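For reference, the window itself is unchanged; only its import location moved. A minimal sketch of the workaround (the istft call in the comment assumes the helper accepts a window callable, as the numpy utilities do):

```python
import numpy as np
from scipy.signal import windows

# On scipy versions where scipy.signal.blackman is gone, the same window
# is available from the windows submodule:
win = windows.blackman(512)
assert win.shape == (512,)
assert abs(win[0]) < 1e-12  # Blackman window is ~0 at the edges

# Then pass it explicitly instead of relying on the default, e.g.:
# z = istft(Z, size=512, shift=128, window=windows.blackman)
```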
Hi.
Thanks for this great tool. When I run the numpy online example notebook, an iterator runs (986it [00:30, 32.21it/s]). The online implementation is supposed to be non-iterative (according to the paper), right?
Also, I noticed that numpy offline is much faster than numpy online. I expected the opposite.
Is there anything I should edit to avoid iterating?
Thanks.
Kanthasamy
When running nara_wpe.torch_wpe.wpe_v6
with PyTorch 1.11, I'm seeing the following warning:
torch.linalg.solve has its arguments reversed and does not return the LU factorization.
To get the LU factorization see torch.lu, which can be used with torch.lu_solve or torch.lu_unpack.
X = torch.solve(B, A).solution
should be replaced with
X = torch.linalg.solve(A, B) (Triggered internally at ../aten/src/ATen/native/BatchLinearAlgebra.cpp:766.)
G, _ = torch.solve(P, R)
It seems modifying G, _ = torch.solve(P, R)
to G = torch.linalg.solve(R, P)
does the trick.
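A small self-contained check of that replacement (R and P here are stand-ins for the correlation matrix and cross-power used in wpe_v6; note the swapped argument order and that torch.linalg.solve returns the solution directly rather than a tuple):

```python
import torch

torch.manual_seed(0)
n, d = 4, 2
A = torch.randn(n, n, dtype=torch.float64)
R = A @ A.T + n * torch.eye(n, dtype=torch.float64)  # well-conditioned matrix
P = torch.randn(n, d, dtype=torch.float64)

# Old API (removed): G, _ = torch.solve(P, R)
# New API, arguments reversed, no LU factorization returned:
G = torch.linalg.solve(R, P)

# G solves R @ G = P
assert torch.allclose(R @ G, P)
```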
Great work, and thanks for sharing!
When I run the WPE_Numpy_offline jupyter notebook with my audio file, the result does not sound dereverberated, so I need some help with adjusting the parameters, or any other suggestions.
My audio file is a 4-minute two-channel karaoke soundtrack recorded by my friends, and I need to extract her clean voice track. First I use spleeter to remove the background music and get a two-channel voice track. Then I dereverberate the voice track with your jupyter scripts. For the stereo input processing I simply followed the method in #42. Here are my visualized outputs:
(spectrogram screenshots)
And here is my parameter setting:
(screenshot)
Hi, thank you for the great work.
Any hints on how to test the algorithm on a single stereo file?
Notebooks show batch editing and I am failing to test it on a single stereo input.
Thank you.
Hey!
I have a wet sound of a clap inside a small room 4.3 x 1.8 x 2.4 meters. I want to remove the reverb from the sound before passing it on.
Later on I want to use this code to remove reverb from a machine room on a ship with several electric motors, hydraulics and valves. Is it possible to remove reverb with this code?
Hi,
Firstly, thank you for the great work!
I would like to use WPE as a preprocessing step to a source localization algorithm. To do that I'd need to use a "MIMO" version of WPE which would preserve the delays between the channels.
It seems like this option is not yet implemented. If so, I would be happy to contribute to that, although I'd need some guidance on the algorithmic part. Would this be interesting to you?
Kind regards,
Eric
Hi, what is the difference between offline and online? In what scenarios should the two different types of code be used? Thank you.
When I use the Tensorflow offline WPE, I get this problem. Is there any solution?
@LukasDrude Could you please explain why the STFT size 512 (with shift 128) was chosen? Is it related to the coherence bandwidth of the RIR?
@LukasDrude I think the file utils.py is not working with Python2, I get the error:
from nara_wpe.utils import stft
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/export/b18/asubraman/espnet-local/espnet/tools/venv/local/lib/python2.7/site-packages/nara_wpe/utils.py", line 111
*,
^
SyntaxError: invalid syntax
Hello!
I tested a two-channel speech sample using the numpy implementation you offered. The effect is very good! So I want to ask: what is the difference between your WPE algorithm and the previous WPE algorithms?
You mentioned the recursive formulation used in the Google Home speech assistant hardware; is your recursive formulation the same as that one?
How can I solve the tensorflow version problem?
In 'tf_wpe.py', the tensorflow.contrib code causes a problem.
Tensorflow 2.x doesn't support it.
I encountered some problems when using nara_wpe.
After reading the paper "NARA-WPE: A Python package for weighted prediction error dereverberation in Numpy and Tensorflow for online and offline processing", I wanted to try nara_wpe for dereverberation, so I learned how to use it from the IPython Notebooks in nara_wpe/examples.
According to the description of Table 1 in the paper, there is no Block-Online version of the Numpy implementation, while the TensorFlow implementation has a Block-Online version.
When I read the code in the IPython Notebooks, I saw the Frame-Online version of the code in WPE_Numpy_online.ipynb, but in WPE_Tensorflow_online.ipynb I found only the Frame-Online version and could not find the Block-Online version.
(I also read the source code of nara_wpe and found a comment in "wpe.py -> class OnlineWPE -> def _get_prediction":
# TODO: Only block shift of 1 works.
I wonder: when block shift = 1, is Block-Online equivalent to Frame-Online?)
So my first question is: how do I enable the Block-Online version?
Then I tested my code on the test set (simulated 8-channel wavs) with the Offline and Frame-Online versions of the Numpy implementation, recorded the time cost, and calculated the word error rate (WER).
My result is as follows:
Unprocessed reverberant wav: WER=9.97.
Offline: WER=7.10, RealTimeFactor=1.7.
Frame-Online: WER=12.39, RealTimeFactor=9.87.
The results show that the Frame-Online WER and RealTimeFactor are both high.
So my second question is: is this result reasonable, and if it is, why is the Frame-Online version so slow?
import numpy as np
import soundfile as sf
from tqdm import tqdm
from nara_wpe.wpe import wpe
from nara_wpe.wpe import get_power
from nara_wpe.utils import stft, istft, get_stft_center_frequencies
from nara_wpe import project_root
stft_options = dict(size=512, shift=128)
channels = 2
sampling_rate = 48000
delay = 3
iterations = 5
taps = 10
alpha=0.9999
file_template = 'r:/reverb.wav'
signal_list = [
    sf.read(str(project_root / 'data' / file_template.format(d + 1)))[0]
    for d in range(channels)
]
y = np.stack(signal_list, axis=0)
Y = stft(y, **stft_options).transpose(2, 0, 1)
Z = wpe(
    Y,
    taps=taps,
    delay=delay,
    iterations=iterations,
    statistics_mode='full'
).transpose(1, 2, 0)
z = istft(Z, size=stft_options['size'], shift=stft_options['shift'])
from scipy.io import wavfile
wavfile.write('new_audio.wav', sampling_rate, z.T)
sf.write('new_audio.wav', z.T, sampling_rate)
Result,
Y = stft(y, **stft_options).transpose(2, 0, 1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: axes don't match array
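For what it's worth, this error can be reproduced with shapes alone. A sketch under the assumption that reverb.wav is stereo: sf.read then returns (samples, channels), stacking two such reads yields a 3-D array, and the STFT of that has four axes, which transpose(2, 0, 1) cannot handle:

```python
import numpy as np

# sf.read on a stereo file returns (samples, channels), not (samples,).
stereo_read = np.zeros((48000, 2))

# Stacking two such reads therefore yields a 3-D array instead of the
# intended (channels, samples):
y = np.stack([stereo_read, stereo_read], axis=0)
assert y.ndim == 3  # (2, 48000, 2) -- one axis too many

# The STFT adds frame and frequency axes, so its output here is 4-D,
# while .transpose(2, 0, 1) only names three axes:
fake_Y = np.zeros((2, 2, 373, 257))  # stand-in for stft(y, size=512, shift=128)
raised = False
try:
    fake_Y.transpose(2, 0, 1)
except ValueError as e:
    raised = "axes don't match array" in str(e)
assert raised

# Reading the stereo file once and transposing avoids the problem:
y_ok = np.ascontiguousarray(stereo_read.T)  # (channels, samples)
assert y_ok.shape == (2, 48000)
```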
It looks nice, but I can't use it yet.
I have generated an audio file after pre-processing with numpy WPE, but the resulting audio is not audible.
I have used soundfile and scipy.io.wavfile to save the audio file, but I am not able to clean the audio and recover the original clean signal.
Hi, I tried to test both the NumPy offline and online methods following the ipynb files under the example folder. For the same audio (generated, 8 kHz, about 8 s long, 4 channels, RT60 = 0.2) the results are quite different.
online result:
offline result:
The left figures are the original spectrograms.
Compared to the online result, it seems almost nothing is left when using the offline method. I used the offline method with default parameters and tried both the wpe_v8 and wpe APIs. The norm of the values in the offline results is more than 10 times lower than online.
And I found the result of the online method seems good; it improved the TDOA estimation result. So I think there must be something wrong with how I used the offline one.
Do you have any suggestions? What did I miss?
I'm a bit confused here. The paper in equations 5 and 17 uses the psd context parameter δ for psd estimation; however, both the OnlineWPE class's __call__ as well as the get_power_online used by the notebook example use taps + delay + 1 (K + Δ + 1) instead.
Hello. Noobie question, as someone not really familiar with pip - if I follow the README and run pip install --editable .
, this happens, but how do I run example code?
Obtaining file:///home/hhrutz/Documents/devel/nara_wpe
Requirement already satisfied: pathlib2; python_version < "3.0" in /usr/lib/python2.7/dist-packages (from nara-wpe==0.0.6) (2.3.3)
Requirement already satisfied: numpy in /home/hhrutz/.local/lib/python2.7/site-packages (from nara-wpe==0.0.6) (1.12.1)
Collecting tqdm (from nara-wpe==0.0.6)
Downloading https://files.pythonhosted.org/packages/bb/62/6f823501b3bf2bac242bd3c320b592ad1516b3081d82c77c1d813f076856/tqdm-4.39.0-py2.py3-none-any.whl (53kB)
100% |████████████████████████████████| 61kB 675kB/s
Collecting soundfile (from nara-wpe==0.0.6)
Downloading https://files.pythonhosted.org/packages/68/64/1191352221e2ec90db7492b4bf0c04fd9d2508de67b3f39cbf093cd6bd86/SoundFile-0.10.2-py2.py3-none-any.whl
Collecting bottleneck (from nara-wpe==0.0.6)
Downloading https://files.pythonhosted.org/packages/62/d0/55bbb49f4fade3497de2399af70ec0a06e432c786b8623c878b11e90d456/Bottleneck-1.3.1.tar.gz (88kB)
100% |████████████████████████████████| 92kB 1.7MB/s
Installing build dependencies ... done
Requirement already satisfied: click in /usr/lib/python2.7/dist-packages (from nara-wpe==0.0.6) (7.0)
Requirement already satisfied: scandir in /usr/lib/python2.7/dist-packages (from pathlib2; python_version < "3.0"->nara-wpe==0.0.6) (1.9.0)
Collecting cffi>=1.0 (from soundfile->nara-wpe==0.0.6)
Downloading https://files.pythonhosted.org/packages/93/5d/c4f950891251e478929036ca07b22f0b10324460c1d0a4434c584481db51/cffi-1.13.2-cp27-cp27mu-manylinux1_x86_64.whl (384kB)
100% |████████████████████████████████| 389kB 1.5MB/s
Collecting pycparser (from cffi>=1.0->soundfile->nara-wpe==0.0.6)
Downloading https://files.pythonhosted.org/packages/68/9e/49196946aee219aead1290e00d1e7fdeab8567783e83e1b9ab5585e6206a/pycparser-2.19.tar.gz (158kB)
100% |████████████████████████████████| 163kB 912kB/s
Building wheels for collected packages: bottleneck, pycparser
Running setup.py bdist_wheel for bottleneck ... done
Stored in directory: /home/hhrutz/.cache/pip/wheels/31/36/8f/1ed7e6f1b3295499c8bbab934262f2494d0f6aebe0c5860754
Running setup.py bdist_wheel for pycparser ... done
Stored in directory: /home/hhrutz/.cache/pip/wheels/f2/9a/90/de94f8556265ddc9d9c8b271b0f63e57b26fb1d67a45564511
Successfully built bottleneck pycparser
Installing collected packages: tqdm, pycparser, cffi, soundfile, bottleneck, nara-wpe
The script tqdm is installed in '/home/hhrutz/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Running setup.py develop for nara-wpe
Successfully installed bottleneck-1.3.1 cffi-1.13.2 nara-wpe pycparser-2.19 soundfile-0.10.2 tqdm-4.39.0
I'm kind of lost here as to the next step. I installed jupyter via pip install jupyter --user, then ran jupyter-notebook and tried to open an example in the browser. But something is not right:
$ jupyter-notebook
[I 23:10:37.685 NotebookApp] Serving notebooks from local directory: /home/hhrutz/Documents/devel/nara_wpe
[I 23:10:37.685 NotebookApp] The Jupyter Notebook is running at:
[I 23:10:37.685 NotebookApp] http://localhost:8888/?token=c7eb70e17b0f6c7495ec505fd788f22dfcd5879c1dd157aa
[I 23:10:37.685 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 23:10:37.688 NotebookApp]
To access the notebook, open this file in a browser:
file:///home/hhrutz/.local/share/jupyter/runtime/nbserver-3688-open.html
Or copy and paste one of these URLs:
http://localhost:8888/?token=c7eb70e17b0f6c7495ec505fd788f22dfcd5879c1dd157aa
[E 23:10:43.634 NotebookApp] Uncaught exception GET /notebooks/examples/WPE_Numpy_offline.ipynb (::1)
HTTPServerRequest(protocol='http', host='localhost:8888', method='GET', uri='/notebooks/examples/WPE_Numpy_offline.ipynb', version='HTTP/1.1', remote_ip='::1')
Traceback (most recent call last):
File "/home/hhrutz/.local/lib/python2.7/site-packages/tornado/web.py", line 1590, in _execute
result = method(*self.path_args, **self.path_kwargs)
File "/home/hhrutz/.local/lib/python2.7/site-packages/tornado/web.py", line 3006, in wrapper
return method(self, *args, **kwargs)
File "/home/hhrutz/.local/lib/python2.7/site-packages/notebook/notebook/handlers.py", line 59, in get
get_custom_frontend_exporters=get_custom_frontend_exporters
File "/home/hhrutz/.local/lib/python2.7/site-packages/notebook/base/handlers.py", line 519, in render_template
return template.render(**ns)
File "/home/hhrutz/.local/lib/python2.7/site-packages/jinja2/environment.py", line 1008, in render
return self.environment.handle_exception(exc_info, True)
File "/home/hhrutz/.local/lib/python2.7/site-packages/jinja2/environment.py", line 780, in handle_exception
reraise(exc_type, exc_value, tb)
File "/home/hhrutz/.local/lib/python2.7/site-packages/notebook/templates/notebook.html", line 1, in top-level template code
{% extends "page.html" %}
File "/home/hhrutz/.local/lib/python2.7/site-packages/notebook/templates/page.html", line 154, in top-level template code
{% block header %}
File "/home/hhrutz/.local/lib/python2.7/site-packages/notebook/templates/notebook.html", line 120, in block "header"
{% for exporter in get_custom_frontend_exporters() %}
File "/home/hhrutz/.local/lib/python2.7/site-packages/notebook/notebook/handlers.py", line 19, in get_custom_frontend_exporters
from nbconvert.exporters.base import get_export_names, get_exporter
File "/home/hhrutz/.local/lib/python2.7/site-packages/nbconvert/__init__.py", line 4, in <module>
from .exporters import *
File "/home/hhrutz/.local/lib/python2.7/site-packages/nbconvert/exporters/__init__.py", line 1, in <module>
from .base import (export, get_exporter,
File "/home/hhrutz/.local/lib/python2.7/site-packages/nbconvert/exporters/base.py", line 8, in <module>
import entrypoints
File "/usr/lib/python2.7/dist-packages/entrypoints.py", line 16, in <module>
import configparser
File "/home/hhrutz/.local/lib/python2.7/site-packages/configparser.py", line 11, in <module>
from backports.configparser import (
ImportError: cannot import name ConverterMapping
Hi all,
I found that there is a small bug in the function segment_axis(). The parameter 'overlap' seems to be shift instead of overlap.
Best
Sining
@danielhkl Currently, the notebooks are not tested. There might be a way to automatically convert them to Python scripts and then at least run a smoke test.
It is not clear exactly how all the wpe versions relate to each other. Are they variants, or different types of optimizations? And is it possible to know which one is recommended?
Hi,
I am trying online WPE with my own 2-channel data. After 2 or 3 iterations the outputs became NaN. The official example works properly. How do I fix this problem?
Many thanks!
I'm trying to save the result as wav file but getting only silence. Am I using the wrong format?
I've added this to WPE_Numpy_offline.ipynb after the Iterative WPE cell:
import soundfile
soundfile.write('output.wav', z[0].astype(np.int16), sampling_rate, 'PCM_16')
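The silence is consistent with a dtype issue: if z is a float array in [-1, 1], astype(np.int16) truncates nearly every sample to 0. A sketch of two alternatives (a hedged suggestion, not the notebook's own code):

```python
import numpy as np

# WPE output is float, roughly in [-1, 1].
z0 = np.array([0.5, -0.25, 0.9, -0.8])

# astype(np.int16) truncates toward zero, so almost everything becomes 0:
assert (z0.astype(np.int16) == 0).all()

# Option 1: let soundfile convert -- pass the float array directly, e.g.
#   soundfile.write('output.wav', z0, sampling_rate, subtype='PCM_16')
# Option 2: scale to the int16 range before casting:
z_int = (z0 * np.iinfo(np.int16).max).astype(np.int16)
assert z_int[0] == 16383  # 0.5 * 32767, truncated
```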
I notice there are three versions of exampleοΌtensorflowοΌnumpy and NTTοΌ Are there any differences between these versions of WPEοΌ
NTT is new version that has never been beforeοΌCan you help me to solve the difference of these three versions.
Thank you!
Hello,
I ran into an issue with WPE_online: it turns out it can't handle audio signals that have segments of absolute silence (a continuous constant value). The way I found out about this was as follows:
While running WPE online (class fashion, etc.) on a ~2.5 hr session, I got this warning message:
31104it [00:15, 2050.81it/s]//anaconda3/lib/python3.7/site-packages/nara_wpe/wpe.py:790: RuntimeWarning: invalid value encountered in true_divide
self.kalman_gain = nominator / denominator[:, None]
37489it [00:18, 1997.07it/s]
Also, when I listened to the output file I noticed it was fine up to about 4 minutes in, and then it went completely silent for the rest of the 2.5 hr file. This happened for both multi- and single-channel files.
When I took a look at the signal, it turned out that the input signal has a short silent segment right where the output file went silent. Here is an example of the input and output plotted on the same time scale:
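The warning points at a division by a power estimate that is exactly zero during digital silence. A minimal sketch of the failure mode and a possible guard (the variable names mirror the warning; the epsilon floor is a suggestion, not the library's current behavior):

```python
import numpy as np

# Per-frequency denominator of the Kalman gain, proportional to the
# estimated signal power -- exactly zero during digital silence.
denominator = np.array([2.0, 0.0, 1.5])
nominator = np.zeros((3, 4))

with np.errstate(invalid='ignore'):
    kalman_gain = nominator / denominator[:, None]
assert np.isnan(kalman_gain[1]).all()  # 0/0 -> NaN, as in the warning

# Once a NaN enters the recursive statistics, every subsequent frame is
# NaN, which matches the output staying silent after the silent segment.

# A possible guard: floor the denominator with a small epsilon.
eps = 1e-10
safe_gain = nominator / np.maximum(denominator, eps)[:, None]
assert np.isfinite(safe_gain).all()
```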
I cannot see a significant improvement with numpy WPE, although I changed some parameters, e.g. delay -> 5 and iterations -> 15.
I think the speech enhancement is not that significant.
Please find the attached audio sample:
20210909143636_26_395_sad_sam__5231.wav.zip
Hi, thanks for your work on nara_wpe; I learned quite a lot from your implementation and your paper.
I tried to integrate tensorflow offline WPE with my ASR system.
However, the time spent on a 3.5 s audio clip with tf offline WPE is ~7 s, while the numpy version only takes ~200 ms.
I tried tf offline WPE on GPU. What I have done is just run the WPE dereverberation processing under a tf.Session for all the audio files,
so my code is something like:
with tf.Session(config=config) as session:
    with tf.device('/gpu:0'):
        <wpe for all the audio files in test_data_set>
But it takes more time than the numpy version, which confuses me a lot. I expected the tf nara_wpe version to be faster than the numpy one.
Best
Hi there, I finally got the notebooks working, thank you very much for this great repository. I have two related questions:
Hi,
Are you planning to support the python 2 version?
@LukasDrude, is it possible to include the numpy installation in setup.py
? We have a small problem of including nara_wpe to our package due to https://github.com/fgnt/nara_wpe/blob/master/setup.py#L11
We can avoid it manually by installing numpy in advance (see espnet/espnet#541), but it would be great if nara_wpe handled this by itself, so that such a manual installation step is not needed.
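One common way to sidestep the import-at-build-time problem (a sketch of the PEP 518 approach, not an actual patch to this repository) is to declare numpy as a build dependency, so pip installs it before setup.py runs:

```toml
# pyproject.toml (hypothetical addition)
[build-system]
requires = ["setuptools", "wheel", "numpy"]
```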