rookiejunchen / fullsubnet-plus
The official PyTorch implementation of "FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement".
License: Apache License 2.0
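Hello, could you help me figure out what is causing this problem? I don't understand what the initial message "use specified dataset_dir_list: ['result/data'], instead of in config" means. I have tried both absolute and relative paths and neither works. Thank you for your help.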
`(speech_enhance) 123@123-Lenovo-Legion-R7000P2020H:/media/123/Axuan2/FullSubNet-plus$ ./A.sh
use specified dataset_dir_list: ['result/data'], instead of in config
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
[2022-11-11 01:25:27.873] Using CPU in the experiment.
[2022-11-11 01:25:27.874] Loading inference dataset...
[2022-11-11 01:25:27.883] Loading model...
[2022-11-11 01:25:27.996] 当前正在处理 tar 格式的模型断点,其 epoch 为:194.
[2022-11-11 01:25:28.019] Configurations are as follows:
[2022-11-11 01:25:28.020] [acoustics]
n_fft = 512
win_length = 512
sr = 16000
hop_length = 256
[inferencer]
path = "fullsubnet_plus.inferencer.inferencer.Inferencer"
type = "mag_complex_full_band_crm_mask"
[dataset]
path = "fullsubnet.dataset.dataset_inference.Dataset"
[model]
path = "fullsubnet_plus.model.fullsubnet_plus.FullSubNet_Plus"
[inferencer.args]
n_neighbor = 15
[dataset.args]
dataset_dir_list = [ "result/data",]
sr = 16000
[model.args]
sb_num_neighbors = 15
fb_num_neighbors = 0
num_freqs = 257
look_ahead = 2
sequence_model = "LSTM"
fb_output_activate_function = "ReLU"
sb_output_activate_function = false
channel_attention_model = "TSSE"
fb_model_hidden_size = 512
sb_model_hidden_size = 384
weight_init = false
norm_type = "offline_laplace_norm"
num_groups_in_drop_band = 2
kersize = [ 3, 5, 10,]
subband_num = 1
Traceback (most recent call last):
File "/home/123/anaconda3/envs/speech_enhance/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/123/anaconda3/envs/speech_enhance/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/media/123/Axuan2/FullSubNet-plus/speech_enhance/tools/inference.py", line 37, in <module>
main(configuration, checkpoint_path, output_dir)
File "/media/123/Axuan2/FullSubNet-plus/speech_enhance/tools/inference.py", line 13, in main
inferencer = inferencer_class(
File "/media/123/Axuan2/FullSubNet-plus/speech_enhance/fullsubnet_plus/inferencer/inferencer.py", line 54, in __init__
super().__init__(config, checkpoint_path, output_dir)
File "/media/123/Axuan2/FullSubNet-plus/speech_enhance/audio_zen/inferencer/base_inferencer.py", line 59, in __init__
with open((root_dir / f"{time.strftime('%Y-%m-%d %H:%M:%S')}.toml").as_posix(), "w") as handle:
OSError: [Errno 22] Invalid argument: '/media/123/Axuan2/FullSubNet-plus/result/output/2022-11-11 01:25:28.toml'`
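One possible cause, offered as an assumption rather than a confirmed diagnosis: the repository sits under /media/..., which is often an NTFS or exFAT drive, and those filesystems reject `:` in file names, so the timestamped .toml name in the traceback would fail with Errno 22. A minimal sketch of a colon-free timestamp follows; the helper name `safe_config_name` is hypothetical and not part of the repository.

```python
import time

def safe_config_name(when=None):
    """Return a timestamped .toml file name without ':' or spaces."""
    when = time.localtime() if when is None else when
    return time.strftime("%Y-%m-%d_%H-%M-%S", when) + ".toml"

# Same instant as in the log above, parsed back into a struct_time.
stamp = time.strptime("2022-11-11 01:25:28", "%Y-%m-%d %H:%M:%S")
name = safe_config_name(stamp)
print(name)  # 2022-11-11_01-25-28.toml
```

If the drive is a native Linux filesystem, the colon is legal and the cause lies elsewhere.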
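Hello, while reproducing your work I found that train.toml fails. The cause seems to be the lines snr_range = [-5,20] and metrics = ["WB_PESQ", "NB_PESQ", "STOI", "SI_SDR"]; commenting out these two lines makes the error go away. I could not find a solution online. Have you run into this problem? Looking forward to your reply.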
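Hello, building on your work I have been exploring streaming inference. I have also read the discussions in several issues about why causality prevents real-time operation, ran some experiments, and would like to ask your advice.
My first attempt was to take a 60 s utterance and, using the following command:
ffmpeg -i input.wav -ss 00:00:xx -t 00:00:01 output.wav
write a bash script that cuts it into 60 .wav files, enhance each one with inference, and then concatenate the results with ffmpeg.
However, I noticed a problem: 1 s segments that contain speech are still enhanced properly, but some segments where the raw audio is silent produce howling.
The following three spectrograms show, from top to bottom, the original audio, direct enhancement, and enhancement of 1 s segments followed by concatenation:
Impulses are also visible in the spectrograms.
However, not every silent segment produces howling, so I ran the following experiment:
wav = 0.0000001*np.random.randn(100000,) generates white noise with very low energy.
With a sampling rate of 16 kHz, I saved it as a .wav file and enhanced it, and I also tried splitting it before enhancement. In both cases there was no howling; only the white noise itself was enhanced.
Based on the principles of the algorithm, what are your thoughts as the author on this kind of problem?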
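The setup of this experiment can be sketched in NumPy as follows. It only prepares the low-energy noise and the 1 s chunks; the enhancement step is left out, so the concatenation here is a stand-in for "enhance each chunk, then join". The function name `split_into_chunks` and the fixed random seed are assumptions made for the illustration.

```python
import numpy as np

SR = 16000  # sampling rate used in the issue

def split_into_chunks(wav, chunk_len):
    """Split a 1-D signal into consecutive chunks; the last one may be shorter."""
    return [wav[i:i + chunk_len] for i in range(0, len(wav), chunk_len)]

rng = np.random.default_rng(0)
wav = 1e-7 * rng.standard_normal(100000)  # low-energy white noise
chunks = split_into_chunks(wav, SR)       # 1 s chunks at 16 kHz
restored = np.concatenate(chunks)         # stand-in for enhance-then-join

print(len(chunks), len(chunks[-1]))  # 7 4000
```

Note that the last chunk is shorter than 1 s, so any per-chunk statistics are computed over less data there.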
While trying to install the requirements, I got the error shown below. I'm using Ubuntu 22.04 with an Nvidia GeForce 2080Ti, and I followed the instructions on the page. How can I get past this? Is there a specific version I need to install? I already tried downgrading pypesq, which still failed.
ERROR: Failed building wheel for pypesq
Running setup.py clean for pypesq
Failed to build pypesq
Installing collected packages: pypesq
Running setup.py install for pypesq ... error
ERROR: Command errored out with exit status 1:
command: /home/username/anaconda3/envs/speech_enhance/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-zzkoowzc/pypesq_c363aa2277764f5585f4781bcbe8b6fc/setup.py'"'"'; file='"'"'/tmp/pip-install-zzkoowzc/pypesq_c363aa2277764f5585f4781bcbe8b6fc/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-zcyemj5u/install-record.txt --single-version-externally-managed --compile --install-headers /home/username/anaconda3/envs/speech_enhance/include/python3.6m/pypesq
cwd: /tmp/pip-install-zzkoowzc/pypesq_c363aa2277764f5585f4781bcbe8b6fc/
Complete output (26 lines):
running install
running build
running build_py
file numpy.py (for module numpy) not found
creating build
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/pypesq
copying pypesq/__init__.py -> build/lib.linux-x86_64-3.6/pypesq
file numpy.py (for module numpy) not found
running build_ext
building 'pesq_core' extension
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/pypesq
gcc -pthread -B /home/choi1022linux/anaconda3/envs/speech_enhance/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/choi1022linux/anaconda3/envs/speech_enhance/lib/python3.6/site-packages/numpy/core/include/numpy -I/home/choi1022linux/anaconda3/envs/speech_enhance/include/python3.6m -c pypesq/pesq.c -o build/temp.linux-x86_64-3.6/pypesq/pesq.o
In file included from /home/username/anaconda3/envs/speech_enhance/lib/python3.6/site-packages/numpy/core/include/numpy/ndarraytypes.h:1822,
from /home/username/anaconda3/envs/speech_enhance/lib/python3.6/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
from /home/username/anaconda3/envs/speech_enhance/lib/python3.6/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from pypesq/pesq.c:2:
/home/username/anaconda3/envs/speech_enhance/lib/python3.6/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
17 | #warning "Using deprecated NumPy API, disable it with "
| ^~~~~~~
pypesq/pesq.c:5:10: fatal error: pesq.h: No such file or directory
5 | #include "pesq.h"
| ^~~~~~~~
compilation terminated.
error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: /home/username/anaconda3/envs/speech_enhance/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-zzkoowzc/pypesq_c363aa2277764f5585f4781bcbe8b6fc/setup.py'"'"'; file='"'"'/tmp/pip-install-zzkoowzc/pypesq_c363aa2277764f5585f4781bcbe8b6fc/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-zcyemj5u/install-record.txt --single-version-externally-managed --compile --install-headers /home/username/anaconda3/envs/speech_enhance/include/python3.6m/pypesq Check the logs for full command output.
Hello, I have read the fullsubnet_plus code and have a question.
Although look_ahead = 2 indicates that the model uses information from two future frames, MulCA computes the weight of each frequency band with AdaptiveAvgPool1d. Does that mean information from the whole utterance, across all frames, is used?
Would it be possible to use AvgPool1d over only a limited span of past frames, so that the whole model could be deployed in a streaming fashion? Would that degrade performance?
Thanks
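The difference between the two pooling strategies in the question can be sketched in plain Python on a single channel. This is only an illustration of the idea, not the repository's MulCA code; both function names and the `history` parameter are hypothetical.

```python
def global_average(frames):
    """Average over the whole time axis: needs every frame, including future ones."""
    return sum(frames) / len(frames)

def causal_running_average(frames, history=None):
    """Average each frame with past frames only (optionally a fixed-length history)."""
    out = []
    for t in range(len(frames)):
        start = 0 if history is None else max(0, t + 1 - history)
        window = frames[start:t + 1]
        out.append(sum(window) / len(window))
    return out

frames = [1.0, 2.0, 3.0, 4.0]
changed = [1.0, 2.0, 3.0, 40.0]  # only the last (future) frame differs

print(global_average(frames), global_average(changed))  # 2.5 11.5
print(causal_running_average(frames, history=2))        # [1.0, 1.5, 2.5, 3.5]
```

The global average changes when a later frame changes, whereas the causal outputs for the first three frames stay the same. Whether a model trained with global pooling keeps its quality under causal pooling is an empirical question this sketch does not answer.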
When I tried to run inference/test on my sample audio (.wav) file, the process gets killed.
Command executed:
python3 -m speech_enhance.tools.inference \
-C config/inference.toml -M /home/arvik/Desktop/ParadisoAI/FullSubNet-plus/best_model.tar \
-I /home/arvik/Desktop/ParadisoAI/FullSubNet-plus/input \
-O /home/arvik/Desktop/ParadisoAI/FullSubNet-plus/output
Is the model causal? It seems that during both training and inference the ChannelTimeSenseSELayer is used, where average pooling is taken along the frame axis. Or am I supposed to process the audio chunk by chunk to obtain an honest result that uses only the limited look-ahead?
I understand this issue occurs because, from torch 2.0 onward, real-valued inputs are no longer supported. Is there a solution for this problem?
Dear author,
Thank you for this interesting solution.
Please check the comment from PINTO0309 in:
PINTO0309/PINTO_model_zoo#187
"The sound transformation model depends on the audio length, but how would you like the input width to be fixed? ONNX with a variable input width will cause an error in Reshape operation and cannot be used."
Additionally, PINTO0309 converted the model to different formats:
https://github.com/PINTO0309/PINTO_model_zoo/tree/main/254_FullSubNet-plus
Thank you.
I tried a quick run using the pre-trained checkpoint, but got this error:
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: A load persistent id instruction was encountered,
but no persistent_load function was specified.
My torch version is 1.7.1 and Python is 3.8. The command is:
python -m speech_enhance.tools.inference -C config/inference.toml -M archive/data.pkl -I ~/mini_test_set/test.wav -O ~/fullout
What can I do to run the demo successfully?
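A possible explanation, stated as an assumption: since version 1.6, torch.save writes a zip archive by default, and data.pkl is one member of that archive. Loading the extracted data.pkl on its own produces exactly this kind of unpickling error, while the original, unextracted checkpoint file is what the loader expects. The sketch below uses only the standard library to check whether a path looks like such an archive; the helper name is hypothetical.

```python
import zipfile

def looks_like_torch_zip_checkpoint(path):
    """True if `path` is a zip archive that contains a data.pkl member."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        return any(name.endswith("data.pkl") for name in archive.namelist())
```

If the check returns True for the downloaded file, pass that file to -M directly instead of the extracted archive/data.pkl.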
Hello, and thank you for your work.
I ran into the following problem when trying to train:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/lee/Documents/software/anaconda3/envs/fsnetplus/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/home/lee/Desktop/workspace/project/se/ns/tests/FullSubNet-plus/speech_enhance/tools/train.py", line 58, in entry
model = initialize_module(config["model"]["path"], args=config["model"]["args"])
File "/home/lee/Desktop/workspace/project/se/ns/tests/FullSubNet-plus/speech_enhance/audio_zen/utils.py", line 91, in initialize_module
class_or_function = getattr(module, class_or_function_name)
AttributeError: module 'fullsubnet_plus.model.fullsubnet_plus' has no attribute 'Sub_FullSubNet_Plus'
I searched the whole project and could not find this function or file. How can I solve this?
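The traceback shows the configured model path naming a class that the module does not define. The following sketch illustrates dotted-path loading with an error message that lists what the module actually exports, which makes a mismatch between the config and the code easier to spot. It is a hypothetical helper written for illustration, not the repository's `initialize_module`.

```python
import importlib

def load_by_dotted_path(path):
    """Import `package.module.Name` and return the attribute `Name`."""
    module_path, _, attr_name = path.rpartition(".")
    module = importlib.import_module(module_path)
    try:
        return getattr(module, attr_name)
    except AttributeError:
        # List public names so a typo or renamed class is easy to see.
        available = [n for n in dir(module) if not n.startswith("_")]
        raise AttributeError(
            f"{module_path!r} has no attribute {attr_name!r}; available: {available}"
        ) from None

ordered_dict_cls = load_by_dotted_path("collections.OrderedDict")
```

Comparing the listed names with the `path` value under [model] in the .toml file shows which class name the config should use.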
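Hello, I have a question about causality. In the code, TCNBlock uses groupnorm(1, channel) for normalization, but the data fed to the norm has shape (B, channel, T). Is this causal? It seems that every channel then contains information from all frames.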
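The concern can be checked on a toy example in plain Python. A single group spanning all channels computes its mean and variance over both the channel and time axes, so a change in a later frame alters the normalized value of an earlier one; statistics taken per frame over channels only do not have this property. The sketch ignores the learnable affine parameters, and all function names are hypothetical.

```python
import math

def normalize(values, eps=1e-5):
    """Zero-mean, unit-variance normalization of a flat list."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def global_norm(x):
    """Statistics over all channels and all frames, like one group spanning (C, T)."""
    num_frames = len(x[0])
    flat = normalize([v for channel in x for v in channel])
    return [flat[c * num_frames:(c + 1) * num_frames] for c in range(len(x))]

def per_frame_norm(x):
    """Statistics over channels only, computed separately for every frame."""
    columns = [normalize([channel[t] for channel in x]) for t in range(len(x[0]))]
    return [[columns[t][c] for t in range(len(columns))] for c in range(len(x))]

def first_frame(y):
    return [channel[0] for channel in y]

x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]           # shape (C=2, T=3)
x_future = [[1.0, 2.0, 30.0], [4.0, 5.0, 60.0]]  # only the last frame differs

print(first_frame(global_norm(x)) == first_frame(global_norm(x_future)))        # False
print(first_frame(per_frame_norm(x)) == first_frame(per_frame_norm(x_future)))  # True
```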
Thank you for sharing your great code. 😺
What is the license for this model? I'd like to cite it in the repository I'm working on if possible, but I want to state the license correctly.
https://github.com/PINTO0309/PINTO_model_zoo
https://github.com/PINTO0309/PINTO_model_zoo/tree/main/254_FullSubNet-plus
Thank you.
First of all, thank you so much for making your implementation public. I have a question regarding the causality of the published model.
The paper states that the proposed architecture is real-time, and I can see the Inferencer code dealing with chunks of audio. Yet I came across a comment saying that the model published in the paper and implemented here on GitHub is non-causal.
If it is indeed non-causal, would it be possible to list the changes needed to make it causal? Thanks.
Hey,
First of all, I'd like to thank you for this great model and for sharing it on GitHub!
A small bug I found:
As we know, the cIRM isn't bounded, so mask amplitudes can be larger than 1.
This can cause clipping in the enhanced signal.
The code currently checks for this with:
if abs(enhanced).any() > 1:
    print(f"Warning: enhanced is not in the range [-1, 1], {name}")
I think you meant:
if (abs(enhanced) > 1).any():
    print(f"Warning: enhanced is not in the range [-1, 1], {name}")
After fixing this, I see quite a lot of clipping...
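The difference between the two conditions can be reproduced on a small NumPy array. The sample values are made up for the illustration.

```python
import numpy as np

enhanced = np.array([0.2, -1.5, 0.7])  # one sample exceeds the [-1, 1] range

# .any() collapses the array to a single bool first, and a bool is never > 1.
buggy_check = bool(abs(enhanced).any() > 1)

# Comparing element-wise first, then reducing, detects the out-of-range sample.
fixed_check = bool((abs(enhanced) > 1).any())

print(buggy_check, fixed_check)  # False True
```

With the original condition the warning can never fire, which would explain why the clipping went unnoticed until the check was corrected.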
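I have a question: if num_groups_in_drop_band is not 1, doesn't the F dimension of the output mask change? When the speech is reconstructed afterwards, the mask would no longer match the size of the original magnitude spectrum.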
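To make the size concern concrete, here is a sketch that assumes band dropping keeps every num_groups-th frequency bin per group, after trimming the bins so that all groups are the same size. Whether the repository uses exactly this scheme, and whether it is active outside training, is not confirmed here; the function name is hypothetical.

```python
def kept_frequency_bins(num_freqs, num_groups):
    """Bin indices kept for each group when every num_groups-th bin is retained."""
    if num_groups <= 1:
        return [list(range(num_freqs))]
    usable = num_freqs - (num_freqs % num_groups)  # trim so groups are equal-sized
    return [list(range(g, usable, num_groups)) for g in range(num_groups)]

groups = kept_frequency_bins(num_freqs=257, num_groups=2)
print([len(g) for g in groups])  # [128, 128]
```

Under this assumption, num_freqs = 257 and num_groups_in_drop_band = 2 leave 128 bins per sample, which would not line up with a 257-bin magnitude spectrum.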
I tried to use your pre-trained checkpoint, data.pkl, to run inference on noisy signals, but torch.load() fails to load the .pkl file. I searched the Internet, and the suggestions were to install an appropriate torch version. I tried a few versions, but it still doesn't work.
Is the loss function used for FullSubNet and FullSubNet+ the same? Also, could you clarify specifically what loss you used, as there is inconsistency within the repository -- e.g. loss.py is defined but never called. Thank you for the clarification.
Hi,
I encountered the following error after training FullSubNet-Plus for about 100 epochs on the DNS dataset. What is the cause, and how can I solve it?
Error:
File "/venv/py365/lib/python3.6/site-packages/librosa/util/utils.py", line 310, in valid_audio
raise ParameterError("Audio buffer is not finite everywhere")
librosa.util.exceptions.ParameterError: Audio buffer is not finite everywhere.
Looking forward to your reply!!
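The librosa message means that the audio buffer contains NaN or Inf values. A small, hypothetical helper like the one below can locate the first such sample in a buffer, which helps tell whether the problem comes from the input data or from the model output; it is a debugging sketch, not a fix for the underlying cause.

```python
import math

def first_non_finite(samples):
    """Return the index of the first NaN/Inf sample, or None if all are finite."""
    for index, value in enumerate(samples):
        if not math.isfinite(value):
            return index
    return None

print(first_non_finite([0.1, -0.2, float("nan"), 0.3]))  # 2
```

Calling it on the noisy input and on the enhanced output before they are passed on narrows down where the non-finite values first appear.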
I am trying to reproduce FullSubNet+ on some speech enhancement datasets. The results are amazing; the noise suppression ability of this method is very impressive! 🤩🤩🤩
I stumbled across an implementation detail in the paper and code that piqued my curiosity. It concerns the MulCA module in the paper (if I'm not misunderstanding, its code is implemented in ChannelTimeSenseSELayer). Three concurrently processed nn.Sequential blocks are used here, each of which contains, in turn, a depthwise Conv1d, AdaptiveAvgPool1d, and ReLU. These features are then merged using fully connected layers.
What puzzles me is that if Conv1d is applied first and AdaptiveAvgPool1d afterwards, then by the distributive law of multiplication the convolution can be approximated by the following simplified form (stride=1):
AvgPool1d(Conv1d(A,weight,bias)) ≈ A.sum(-1)*weight.sum(-1)/(A.shape[-1]-kernel_size+1)+bias + small_sided_error
(a similar formula may hold for subband_num>1)
We might be able to define weight.sum(-1)/(A.shape[-1]-kernel_size+1) and bias as two float32 parameters, then use the Maxout activation function to combine the three-way convolution into a simple channel summation.
Are there any special considerations behind designing MulCA with Conv1d? I think the simplified implementation is very similar to ChannelSELayer.
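The approximation in the question can be checked numerically for a single channel in plain Python. The sketch assumes stride 1, no padding, and cross-correlation (which is what a Conv1d layer computes); the signal, weights, and bias are random values chosen only for the illustration.

```python
import random

def conv1d_valid(signal, weight, bias):
    """1-D cross-correlation with stride 1 and no padding, plus a bias."""
    k = len(weight)
    return [sum(weight[j] * signal[t + j] for j in range(k)) + bias
            for t in range(len(signal) - k + 1)]

random.seed(0)
length, kernel_size = 2000, 5
signal = [random.uniform(-1.0, 1.0) for _ in range(length)]
weight = [random.uniform(-1.0, 1.0) for _ in range(kernel_size)]
bias = 0.3

outputs = conv1d_valid(signal, weight, bias)
exact = sum(outputs) / len(outputs)                       # AvgPool(Conv1d(A))
approx = sum(signal) * sum(weight) / len(outputs) + bias  # simplified form
# Each tap misses at most kernel_size - 1 edge samples, which bounds the gap.
bound = (sum(abs(w) for w in weight) * (kernel_size - 1)
         * max(abs(s) for s in signal) / len(outputs))

print(abs(exact - approx) <= bound)  # True
```

The gap between the two forms comes only from the edge samples and shrinks as the sequence gets longer relative to the kernel. Since the ReLU is applied after the pooling, the linear argument holds up to that point.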
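Hi, the attention module contains an AdaptiveAvgPool1d operation that maps [B, num_channels, T] to [B, num_channels, 1]. This step uses information from the entire time axis. Doesn't that prevent real-time operation?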
If I want to train on my own dataset, what should the structure of the dataset be? Should the names of the files in the clean and noise folders correspond one to one? Thank you!
I found a problem when training my model:
soundfile.LibsndfileError: Error opening 'xx/xx/xx.wav': File contains data in an unknown format.
I am running this on Ubuntu.
I have tried many methods. What can I do about this problem? Thank you!
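One way to narrow this down, assuming the files are meant to be ordinary RIFF/WAVE files, is to inspect the first 12 bytes of each file. A file with a .wav extension but different content (for example an empty or renamed file) fails this check. The helper name is hypothetical, and since libsndfile also reads other containers, a False result is a hint rather than proof.

```python
def has_riff_wave_header(path):
    """True if the file starts with a RIFF container header of type WAVE."""
    with open(path, "rb") as handle:
        header = handle.read(12)
    return len(header) == 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE"
```

Running it over the dataset directory and listing the paths that return False shows which files to re-export or remove before training.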