Comments (3)
I recorded my own .wav for the text prompt, ran inference, and got something that sounds "kind of like my voice", but as if it were inside a glass jar, with all the words mangled.
After cloning your code, installing the required dependencies, and making the symlink, I recorded "But even the unsuccessful dramatist has his moments." to
/home/data/Language/7176_92135_000004_000000.wav
and ran this command:
sh egs/tts/VALLE/run.sh --stage 3 --gpu "0" --config "ckpts/tts/valle_libritts/args.json" \
--infer_expt_dir ckpts/tts/valle_libritts \
--infer_output_dir $OUT_DIR \
--infer_mode "single" \
--infer_text "This is a clip of generated speech with the given text from Amphion Vall-E model." \
--infer_text_prompt "But even the unsuccessful dramatist has his moments." \
--infer_audio_prompt /home/data/Language/7176_92135_000004_000000.wav
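Because the audio prompt here is a self-recorded file, one thing worth ruling out is a format difference between it and the prompt examples shipped with the repo. Below is a minimal sketch using only Python's standard-library `wave` module to print the basic properties of a PCM .wav file. The helper name `describe_wav` is made up for illustration, and whether any of these properties explains the mangled output is an assumption, not something confirmed in this thread.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM .wav file (hypothetical helper)."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "duration_s": frames / rate,
        }
```

Calling `describe_wav("/home/data/Language/7176_92135_000004_000000.wav")` on the recording and on one of the provided prompt examples, then comparing the two results, would show whether sample rate, channel count, or clip length differ. Note that the `wave` module only reads uncompressed PCM files.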
Looking at the log, I see a couple of lines that may be the problem:
WARNING:phonemizer:words count mismatch on 200.0% of the lines (2/1)
WARNING:phonemizer:words count mismatch on 100.0% of the lines (1/1)
appuser@zL:~/Amphion$ sh egs/tts/VALLE/run.sh --stage 3 --gpu "0" --config "ckpts/tts/valle_libritts/args.json" --infer_expt_dir ckpts/tts/valle_libritts --infer_output_dir $OUT_DIR --infer_mode "single" --infer_text "This is a clip of generated speech with the given text from Amphion Vall-E model." --infer_text_prompt "But even the unsuccessful dramatist has his moments." --infer_audio_prompt /home/data/Language/7176_92135_000004_000000.wav
Exprimental Configuration File: ckpts/tts/valle_libritts/args.json
Text: This is a clip of generated speech with the given text from Amphion Vall-E model.
The following values were not passed to accelerate launch and had defaults used instead:
--num_processes was set to a value of 1
--num_machines was set to a value of 1
--mixed_precision was set to a value of 'no'
--dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
DEBUG:matplotlib:matplotlib data path: /opt/conda/lib/python3.9/site-packages/matplotlib/mpl-data
DEBUG:matplotlib:CONFIGDIR=/home/appuser/.config/matplotlib
DEBUG:matplotlib:interactive is False
DEBUG:matplotlib:platform is linux
DEBUG:matplotlib:CACHEDIR=/home/weights/matplotlib
DEBUG:matplotlib.font_manager:Using fontManager instance from /home/weights/matplotlib/fontlist-v330.json
Namespace(config='ckpts/tts/valle_libritts/args.json', dataset=None, testing_set='test', test_list_file='None', speaker_name=None, text='This is a clip of generated speech with the given text from Amphion Vall-E model.', vocoder_dir=None, acoustics_dir='ckpts/tts/valle_libritts', checkpoint_path=None, mode='single', log_level='debug', pitch_control=1.0, energy_control=1.0, duration_control=1.0, output_dir='/home/data/Language/VallE', text_prompt='But even the unsuccessful dramatist has his moments.', audio_prompt='/home/data/Language/7176_92135_000004_000000.wav', top_k=-100, temperature=1.0, continual=False, copysyn=False, ref_audio='', device='cuda', inference_step=200)
INFO:inference:========================================================
INFO:inference:|| New inference process started. ||
INFO:inference:========================================================
INFO:inference:
DEBUG:inference:Acoustic model dir: ckpts/tts/valle_libritts
DEBUG:inference:Setting random seed done in 0.28ms
DEBUG:inference:Random seed: 10086
INFO:inference:Building model...
INFO:inference:Building model done in 607.009ms
INFO:inference:Initializing accelerate...
INFO:inference:Initializing accelerate done in 242.029ms
INFO:inference:Loading checkpoint...
INFO:accelerate.accelerator:Loading states from ckpts/tts/valle_libritts/checkpoint/final_epoch-0100_step-0837900_loss-3.883116
INFO:accelerate.checkpointing:All model weights loaded successfully
INFO:accelerate.checkpointing:All optimizer states loaded successfully
INFO:accelerate.checkpointing:All scheduler states loaded successfully
INFO:accelerate.checkpointing:All dataloader sampler states loaded successfully
INFO:accelerate.checkpointing:Could not load random states
INFO:accelerate.accelerator:Loading in 0 custom states
INFO:inference:Loading checkpoint done in 537.945ms
/opt/conda/lib/python3.9/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
WARNING:phonemizer:words count mismatch on 200.0% of the lines (2/1)
WARNING:phonemizer:words count mismatch on 100.0% of the lines (1/1)
Saved to: /home/data/Language/VallE/single
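The two phonemizer warnings are easy to miss among the DEBUG and INFO lines. Below is a small sketch, assuming the output has been captured as a list of text lines, that keeps only entries at WARNING level or above in the `LEVEL:logger:message` form seen in this log. The name `extract_warnings` is hypothetical, and lines that do not follow that form (such as the `UserWarning` printed by torch) are skipped.

```python
import re

# Matches Python logging's default "LEVEL:logger_name:message" layout.
LOG_LINE = re.compile(r"^(DEBUG|INFO|WARNING|ERROR|CRITICAL):([^:]+):(.*)$")


def extract_warnings(lines):
    """Return (logger, message) pairs for WARNING-or-worse log lines."""
    found = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group(1) in ("WARNING", "ERROR", "CRITICAL"):
            found.append((match.group(2), match.group(3)))
    return found
```

With the log saved to a file, `extract_warnings(open("run.log"))` (the file name is an assumption) would return just the phonemizer entries for this run.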
Driver Version: 535.129.03 CUDA Version: 12.2
torch version: 2.1.2
Running inside a Docker container.
Thank you for your feedback. You can double-check against the prompt examples we provided.
Hi @xvdp, if you have any further questions, feel free to re-open this issue. We are glad to follow up!