speech-editing-toolkit's People

Contributors

zain-jiang

speech-editing-toolkit's Issues

Does this support editing multiple segments at the same time?

For example:
Sentence a: 我叫小明,我买了从北京到上海的票。 ("My name is Xiaoming, and I bought a ticket from Beijing to Shanghai.")
Sentence b: 我叫小红,我买了从天津到广东的票。 ("My name is Xiaohong, and I bought a ticket from Tianjin to Guangdong.")

I would like to simultaneously replace 小明 with 小红, 北京 with 天津, and 上海 with 广东. Is this possible?
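In mask-based editing systems of this kind, editing several words in one pass amounts to building a frame-level time mask with multiple disjoint masked spans. This is an illustrative sketch only (not the toolkit's actual API); the function name and span format are made up for the example:

```python
# Toy sketch: several simultaneous edits can be expressed as multiple
# masked spans in a single frame-level mask. Frames marked 1 would be
# regenerated by the model; frames marked 0 stay untouched.

def build_mask(num_frames, edit_spans):
    """edit_spans: list of (start, end) frame ranges to regenerate."""
    mask = [0] * num_frames
    for start, end in edit_spans:
        for i in range(start, end):
            mask[i] = 1
    return mask

# e.g. replace two separate words occupying frames 2-4 and 7-9
print(build_mask(10, [(2, 4), (7, 9)]))  # [0, 0, 1, 1, 0, 0, 0, 1, 1, 0]
```

Whether the released inference scripts accept more than one edited region per utterance is a separate question for the authors; the sketch only shows that nothing about the masking formulation itself rules it out.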

During inference, is it generated as a whole or partially?

Hi,
I recently found in my experiments that audio generated with FluentSpeech also differs from the original audio in the non-modified parts. Below are the waveforms of the unmodified portion of both, with the original audio on top and the generated audio below.
[screenshot: waveform comparison of original vs. generated audio]

However, the FluentSpeech paper states that only the masked part is generated by reverse diffusion, and the non-masked part is left unchanged. Here is the figure from the paper.
[screenshot: figure from the paper]

So is the output generated as a whole, or only the edited part?
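For context, the frame-wise blend described in the paper can be sketched as follows. This is illustrative pseudocode, not the toolkit's actual implementation; the function name is made up, and the "frames" are scalars standing in for mel-spectrogram frames:

```python
# Illustrative sketch: how a time-domain mask combines the reverse-diffusion
# output with the original mel-spectrogram. Frames where mask == 1 come from
# the model; frames where mask == 0 are copied from the original recording.

def combine_with_mask(mel_original, mel_generated, time_mel_mask):
    """Frame-wise blend: masked frames from the model, the rest untouched."""
    return [
        gen if m == 1 else orig
        for orig, gen, m in zip(mel_original, mel_generated, time_mel_mask)
    ]

orig = [0.1, 0.2, 0.3, 0.4]
gen  = [0.9, 0.8, 0.7, 0.6]
mask = [0,   1,   1,   0]    # edit only the middle two frames
print(combine_with_mask(orig, gen, mask))  # [0.1, 0.8, 0.7, 0.4]
```

One plausible explanation for the observed waveform differences, even if the mel frames outside the mask are copied exactly: the vocoder resynthesizes the entire waveform from the blended mel-spectrogram, so sample-level differences in the unedited regions are expected.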

Is there a Chinese example

Dear Jiang,

#9 (comment)

You mentioned that the current package also supports Chinese; could you share an example? Thanks for your excellent work. Today, after spending the whole day on it, I finally finished installing MFA and successfully built a model by running "run_mfa_train_align.sh". I plan to build a Chinese model soon; is there a suitable dataset you would suggest, such as Aishell2?

Kelvin

an error when trying to infer with spec_denoiser.py

Thanks for your excellent work, but I encountered an error when trying to infer with python inference/tts/spec_denoiser.py

Traceback (most recent call last):
  File "inference/tts/spec_denoiser.py", line 272, in <module>
    StutterSpeechInfer.example_run()
  File "inference/tts/spec_denoiser.py", line 259, in example_run
    wav_out, wav_gt, mel_out, mel_gt, masked_mel_out, masked_mel_gt = infer_ins.infer_once(inp)
  File "/data3/liukaiyang/Speech-Editing-Toolkit/inference/tts/base_tts_infer.py", line 97, in infer_once
    output = self.forward_model(inp)
  File "inference/tts/spec_denoiser.py", line 119, in forward_model
    output = self.model(edited_txt_tokens, time_mel_masks=time_mel_masks, mel2ph=edited_mel2ph, spk_embed=sample['spk_embed'],
  File "/root/anaconda3/envs/LKYBase/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data3/liukaiyang/Speech-Editing-Toolkit/modules/speech_editing/spec_denoiser/spec_denoiser.py", line 159, in forward
    ret = self.fs(txt_tokens, time_mel_masks, mel2ph, spk_embed, f0, uv, energy,
  File "/root/anaconda3/envs/LKYBase/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data3/liukaiyang/Speech-Editing-Toolkit/modules/speech_editing/spec_denoiser/fs.py", line 92, in forward
    mel2ph = self.forward_dur(dur_inp, time_mel_masks, mel2ph, txt_tokens, ret, use_pred_mel2ph=use_pred_mel2ph)
  File "/data3/liukaiyang/Speech-Editing-Toolkit/modules/speech_editing/spec_denoiser/fs.py", line 137, in forward_dur
    masked_dur_gt = mel2token_to_dur(mel2ph * (1-time_mel_masks).squeeze(-1).long(), T) * nonpadding
  File "/data3/liukaiyang/Speech-Editing-Toolkit/utils/audio/align.py", line 85, in mel2token_to_dur
    dur = mel2token.new_zeros(B, T_txt + 1).scatter_add(1, mel2token, torch.ones_like(mel2token))
RuntimeError: index 86 is out of bounds for dimension 1 with size 86
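For anyone hitting the same error: `scatter_add` writes into a tensor of width `T_txt + 1`, so every entry of `mel2token` must satisfy `0 <= idx <= T_txt`. The message "index 86 is out of bounds for dimension 1 with size 86" means some alignment index points one token past the end of the text sequence, i.e. `mel2ph` and `edited_txt_tokens` are out of sync. A pure-Python stand-in (the function name is made up; the real code uses torch) demonstrates the invariant:

```python
# Hypothetical repro of the failing invariant in mel2token_to_dur:
# the duration histogram has T_txt + 1 slots, so any mel2token index
# greater than T_txt triggers the out-of-bounds error seen above.

def mel2token_to_dur_py(mel2token, t_txt):
    """Pure-Python stand-in for the torch scatter_add duration count."""
    dur = [0] * (t_txt + 1)
    for idx in mel2token:
        if not 0 <= idx <= t_txt:
            raise IndexError(f"index {idx} is out of bounds for size {t_txt + 1}")
        dur[idx] += 1  # one mel frame attributed to token idx
    return dur

print(mel2token_to_dur_py([1, 1, 2, 3], t_txt=3))  # [0, 2, 1, 1]
```

A likely place to look is how `edited_mel2ph` is built during inference: if the MFA alignment refers to more phonemes than the edited token sequence contains, the maximum `mel2ph` value will exceed the token count.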

VCTK checkpoint

Thanks for the great work on this repository, really useful!

Wondering if there is a VCTK checkpoint that could be accessed, for use with speakers with UK accent?

Again thanks for this repository!

Automatic Stutter Removal

Are a text transcript, defined region, and defined edited_region all required for inference and training on automatic stutter removal? Is there any way to provide only the raw audio and destutter it? If so, would this be done by running spec_denoiser.py or another script?

Where to find mfa_dict.txt and mfa_model.zip?

Hi! I'm getting the following error when running python inference/tts/spec_denoiser.py --exp_name spec_denoiser. Where can I find the required files? I'm trying to run the basic pre-trained inference of FluentSpeech.

Traceback (most recent call last):
  File "inference/tts/spec_denoiser.py", line 350, in <module>
    dataset_info = data_preprocess(test_file_path, test_wav_directory, dictionary_path, acoustic_model_path,
  File "inference/tts/spec_denoiser.py", line 297, in data_preprocess
    assert os.path.exists(file_path) and os.path.exists(input_directory) and os.path.exists(acoustic_model_path), \
AssertionError: inference/example.csv,inference/audio,data/processed/libritts/mfa_dict.txt,data/processed/libritts/mfa_model.zip
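The assertion in `data_preprocess` only checks that these four paths exist before MFA is invoked, so placing the downloaded MFA dictionary and acoustic model at the paths named in the error message should satisfy it. A minimal sketch (path list copied from the assertion message; this is not part of the toolkit) to see which paths are still missing:

```python
# Check which of the paths required by data_preprocess are absent.
import os

required = [
    "inference/example.csv",
    "inference/audio",
    "data/processed/libritts/mfa_dict.txt",
    "data/processed/libritts/mfa_model.zip",
]
missing = [p for p in required if not os.path.exists(p)]
print("missing:", missing)
```

Where to obtain the dictionary and model themselves (official MFA downloads vs. files released by the authors) is the actual question here, and only the maintainers can answer it.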

inference_acl missing

from inference_acl.tts.infer_utils import get_align_from_mfa_output, extract_f0_uv

ModuleNotFoundError: No module named 'inference_acl'

A typo in README

In the Data Preprocess part of the README, the second instruction has a spelling mistake: it should be run_mfa_train_align.sh instead of run_mfa_train_aligh.sh.

Any checkpoints

Hi, thanks for this amazing work. I am just wondering whether you could provide any checkpoints to run speech editing with.

speech edit on Arabic audio

Hello, thanks for the great repo!

If I want to edit Arabic speech, what do you suggest as best practice?
Should I train/finetune FluentSpeech on Arabic audio and keep the vocoder as is?
Also, regarding the code, do I need to change any files for it to be able to edit Arabic speech?

How to process and inference?

Hi,
I used the pre-trained model for inference and found that mfa_model.zip and mfa_dict.txt were missing. I downloaded the relevant models from the official MFA site and created the folders myself to put them in.

However, the output shows:
[screenshot: error output]

Do I need to perform the data-processing part first?
After entering the following command:
[screenshot: command]
it shows:
[screenshot: error output]

How should I solve this problem? I need help!
