Describe the bug Examples listed in the egs/tts/VALLE/README.md f


[BUG]: prompt_examples/*.wav missing about amphion HOT 3 CLOSED

open-mmlab commented on July 16, 2024

[BUG]: prompt_examples/*.wav missing

from amphion.

Comments (3)

xvdp commented on July 16, 2024

I recorded my own .wav for the text prompt and ran it. I got something sounding "kind of like my voice", but as if inside a glass jar, with all the words mangled.

This was after cloning your code, installing the required dependencies, making the symlink, recording "But even the unsuccessful dramatist has his moments." to
/home/data/Language/7176_92135_000004_000000.wav

and running this command:

 sh egs/tts/VALLE/run.sh  --stage 3 --gpu "0" --config "ckpts/tts/valle_libritts/args.json" \
--infer_expt_dir ckpts/tts/valle_libritts \
--infer_output_dir $OUT_DIR \
--infer_mode "single" \
--infer_text "This is a clip of generated speech with the given text from Amphion Vall-E model."  \
--infer_text_prompt "But even the unsuccessful dramatist has his moments." \
--infer_audio_prompt /home/data/Language/7176_92135_000004_000000.wav

Looking at the log, I see a couple of lines that may be the problem:

WARNING:phonemizer:words count mismatch on 200.0% of the lines (2/1)
WARNING:phonemizer:words count mismatch on 100.0% of the lines (1/1)

appuser@zL:~/Amphion$ sh egs/tts/VALLE/run.sh --stage 3 --gpu "0" --config "ckpts/tts/valle_libritts/args.json" --infer_expt_dir ckpts/tts/valle_libritts --infer_output_dir $OUT_DIR --infer_mode "single" --infer_text "This is a clip of generated speech with the given text from Amphion Vall-E model." --infer_text_prompt "But even the unsuccessful dramatist has his moments." --infer_audio_prompt /home/data/Language/7176_92135_000004_000000.wav
Exprimental Configuration File: ckpts/tts/valle_libritts/args.json
Text: This is a clip of generated speech with the given text from Amphion Vall-E model.
The following values were not passed to accelerate launch and had defaults used instead:
--num_processes was set to a value of 1
--num_machines was set to a value of 1
--mixed_precision was set to a value of 'no'
--dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
DEBUG:matplotlib:matplotlib data path: /opt/conda/lib/python3.9/site-packages/matplotlib/mpl-data
DEBUG:matplotlib:CONFIGDIR=/home/appuser/.config/matplotlib
DEBUG:matplotlib:interactive is False
DEBUG:matplotlib:platform is linux
DEBUG:matplotlib:CACHEDIR=/home/weights/matplotlib
DEBUG:matplotlib.font_manager:Using fontManager instance from /home/weights/matplotlib/fontlist-v330.json
Namespace(config='ckpts/tts/valle_libritts/args.json', dataset=None, testing_set='test', test_list_file='None', speaker_name=None, text='This is a clip of generated speech with the given text from Amphion Vall-E model.', vocoder_dir=None, acoustics_dir='ckpts/tts/valle_libritts', checkpoint_path=None, mode='single', log_level='debug', pitch_control=1.0, energy_control=1.0, duration_control=1.0, output_dir='/home/data/Language/VallE', text_prompt='But even the unsuccessful dramatist has his moments.', audio_prompt='/home/data/Language/7176_92135_000004_000000.wav', top_k=-100, temperature=1.0, continual=False, copysyn=False, ref_audio='', device='cuda', inference_step=200)
INFO:inference:========================================================
INFO:inference:|| New inference process started. ||
INFO:inference:========================================================
INFO:inference:

DEBUG:inference:Acoustic model dir: ckpts/tts/valle_libritts
DEBUG:inference:Setting random seed done in 0.28ms
DEBUG:inference:Random seed: 10086
INFO:inference:Building model...
INFO:inference:Building model done in 607.009ms
INFO:inference:Initializing accelerate...
INFO:inference:Initializing accelerate done in 242.029ms
INFO:inference:Loading checkpoint...
INFO:accelerate.accelerator:Loading states from ckpts/tts/valle_libritts/checkpoint/final_epoch-0100_step-0837900_loss-3.883116
INFO:accelerate.checkpointing:All model weights loaded successfully
INFO:accelerate.checkpointing:All optimizer states loaded successfully
INFO:accelerate.checkpointing:All scheduler states loaded successfully
INFO:accelerate.checkpointing:All dataloader sampler states loaded successfully
INFO:accelerate.checkpointing:Could not load random states
INFO:accelerate.accelerator:Loading in 0 custom states
INFO:inference:Loading checkpoint done in 537.945ms
/opt/conda/lib/python3.9/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
WARNING:phonemizer:words count mismatch on 200.0% of the lines (2/1)
WARNING:phonemizer:words count mismatch on 100.0% of the lines (1/1)
Saved to: /home/data/Language/VallE/single
(base) appuser@zL:~/Amphion$ sh egs/tts/VALLE/run.sh --stage 3 --gpu "0" --config "ckpts/tts/valle_libritts/args.json" --infer_expt_dir ckpts/tts/valle_libritts --infer_output_dir $OUT_DIR --infer_mode "single" --infer_text "This is a clip of generated speech with the given text from Amphion Vall-E model." --infer_text_prompt "But even the unsuccessful dramatist has his moments." --infer_audio_prompt /home/data/Language/7176_92135_000004_000000.wav

Driver Version: 535.129.03 CUDA Version: 12.2
torch.version '2.1.2'
running inside a Docker container
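
One more thing that might be worth comparing is the basic format of my recording against the prompt examples shipped with the recipe. The snippet below is only a minimal sketch, assuming the prompt is a PCM WAV file: `describe_wav` is a hypothetical helper written for this comment, not part of Amphion, and nothing in this thread states which sample rate or channel layout the VALL-E recipe expects, so the output is just something to compare between files.

```python
import sys
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file (hypothetical helper, stdlib only).

    Note: the standard-library `wave` module reads PCM WAV files only;
    other encodings (for example 32-bit float) raise wave.Error.
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate if rate else 0.0,
        }


if __name__ == "__main__":
    # Print the properties of every file passed on the command line.
    for wav_path in sys.argv[1:]:
        print(wav_path, describe_wav(wav_path))
```

Running it on both /home/data/Language/7176_92135_000004_000000.wav and one of the provided prompt examples would show whether the two differ in sample rate, channel count, or length. Whether any such difference explains the mangled output is an open question.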


lmxue commented on July 16, 2024

Thank you for your feedback. You can double-check on the prompt examples we provided.


RMSnow commented on July 16, 2024

Hi @xvdp, if you have any further questions, feel free to re-open this issue. We are glad to follow up!

