Comments (3)

xvdp commented on July 16, 2024

I recorded my own .wav for the text prompt and ran inference. The output sounds "kind of like my voice", but as if spoken inside a glass jar, with all the words mangled.

After cloning your code, installing the required dependencies, and making the symlink, I recorded "But even the unsuccessful dramatist has his moments." to
/home/data/Language/7176_92135_000004_000000.wav

and ran this command:

 sh egs/tts/VALLE/run.sh  --stage 3 --gpu "0" --config "ckpts/tts/valle_libritts/args.json" \
--infer_expt_dir ckpts/tts/valle_libritts \
--infer_output_dir $OUT_DIR \
--infer_mode "single" \
--infer_text "This is a clip of generated speech with the given text from Amphion Vall-E model."  \
--infer_text_prompt "But even the unsuccessful dramatist has his moments." \
--infer_audio_prompt /home/data/Language/7176_92135_000004_000000.wav

Looking at the log, I see a couple of lines that may be the problem:

WARNING:phonemizer:words count mismatch on 200.0% of the lines (2/1)
WARNING:phonemizer:words count mismatch on 100.0% of the lines (1/1)

appuser@zL:~/Amphion$ sh egs/tts/VALLE/run.sh --stage 3 --gpu "0" --config "ckpts/tts/valle_libritts/args.json" --infer_expt_dir ckpts/tts/valle_libritts --infer_output_dir $OUT_DIR --infer_mode "single" --infer_text "This is a clip of generated speech with the given text from Amphion Vall-E model." --infer_text_prompt "But even the unsuccessful dramatist has his moments." --infer_audio_prompt /home/data/Language/7176_92135_000004_000000.wav
Exprimental Configuration File: ckpts/tts/valle_libritts/args.json
Text: This is a clip of generated speech with the given text from Amphion Vall-E model.
The following values were not passed to accelerate launch and had defaults used instead:
--num_processes was set to a value of 1
--num_machines was set to a value of 1
--mixed_precision was set to a value of 'no'
--dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
DEBUG:matplotlib:matplotlib data path: /opt/conda/lib/python3.9/site-packages/matplotlib/mpl-data
DEBUG:matplotlib:CONFIGDIR=/home/appuser/.config/matplotlib
DEBUG:matplotlib:interactive is False
DEBUG:matplotlib:platform is linux
DEBUG:matplotlib:CACHEDIR=/home/weights/matplotlib
DEBUG:matplotlib.font_manager:Using fontManager instance from /home/weights/matplotlib/fontlist-v330.json
Namespace(config='ckpts/tts/valle_libritts/args.json', dataset=None, testing_set='test', test_list_file='None', speaker_name=None, text='This is a clip of generated speech with the given text from Amphion Vall-E model.', vocoder_dir=None, acoustics_dir='ckpts/tts/valle_libritts', checkpoint_path=None, mode='single', log_level='debug', pitch_control=1.0, energy_control=1.0, duration_control=1.0, output_dir='/home/data/Language/VallE', text_prompt='But even the unsuccessful dramatist has his moments.', audio_prompt='/home/data/Language/7176_92135_000004_000000.wav', top_k=-100, temperature=1.0, continual=False, copysyn=False, ref_audio='', device='cuda', inference_step=200)
INFO:inference:========================================================
INFO:inference:|| New inference process started. ||
INFO:inference:========================================================
INFO:inference:

DEBUG:inference:Acoustic model dir: ckpts/tts/valle_libritts
DEBUG:inference:Setting random seed done in 0.28ms
DEBUG:inference:Random seed: 10086
INFO:inference:Building model...
INFO:inference:Building model done in 607.009ms
INFO:inference:Initializing accelerate...
INFO:inference:Initializing accelerate done in 242.029ms
INFO:inference:Loading checkpoint...
INFO:accelerate.accelerator:Loading states from ckpts/tts/valle_libritts/checkpoint/final_epoch-0100_step-0837900_loss-3.883116
INFO:accelerate.checkpointing:All model weights loaded successfully
INFO:accelerate.checkpointing:All optimizer states loaded successfully
INFO:accelerate.checkpointing:All scheduler states loaded successfully
INFO:accelerate.checkpointing:All dataloader sampler states loaded successfully
INFO:accelerate.checkpointing:Could not load random states
INFO:accelerate.accelerator:Loading in 0 custom states
INFO:inference:Loading checkpoint done in 537.945ms
/opt/conda/lib/python3.9/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
WARNING:phonemizer:words count mismatch on 200.0% of the lines (2/1)
WARNING:phonemizer:words count mismatch on 100.0% of the lines (1/1)
Saved to: /home/data/Language/VallE/single

Environment:
Driver Version: 535.129.03, CUDA Version: 12.2
torch version: '2.1.2'
Running inside a Docker container

lmxue commented on July 16, 2024

Thank you for your feedback. Please double-check your recording against the prompt examples we provided.
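
One way to make that comparison concrete is to print the basic header fields of your own recording next to one of the provided prompt examples. The sketch below is only a diagnostic aid, not part of Amphion: the helper names `wav_summary` and `compare_prompts` are hypothetical, and it assumes both files are plain PCM WAV (Python's standard `wave` module rejects other encodings). A difference in sample rate or channel count is one possible cause worth ruling out. Nothing in this thread confirms it is the cause here.

```python
import sys
import wave


def wav_summary(path):
    """Return basic header fields of a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wf.getsampwidth(),
            "duration_s": frames / rate,
        }


def compare_prompts(mine, reference):
    """Map each differing header field to (mine, reference).

    Duration is skipped because two prompts are expected to differ in length.
    """
    a, b = wav_summary(mine), wav_summary(reference)
    return {k: (a[k], b[k]) for k in a if k != "duration_s" and a[k] != b[k]}


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_prompt.py <my_prompt.wav> <provided_example.wav>
    print("mine:     ", wav_summary(sys.argv[1]))
    print("reference:", wav_summary(sys.argv[2]))
    print("differs:  ", compare_prompts(sys.argv[1], sys.argv[2]) or "nothing")
```

If the two files differ, resampling or down-mixing the recording to match the example before passing it to `--infer_audio_prompt` is a cheap thing to try. The script name `check_prompt.py` in the usage comment is likewise just a placeholder.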

RMSnow commented on July 16, 2024

Hi @xvdp, if you have any further questions, feel free to re-open this issue. We are glad to follow up!
