Comments (3)
Hey @rtruszkowski - I imagine the question is about getting a translation model that generates a specific voice? In that case you only need to train your own vocoder. We trained our vocoder following the HifiGAN implementation in https://github.com/facebookresearch/speech-resynthesis; our multilingual version differs slightly in its auxiliary (aux) embedding.
Given a dataset of your own voice:
(1) Extract discrete units with our UPCOMING unit_extraction pipeline (XLSR + kmeans) (WIP #17 by @kauterry)
(2) Train a vocoder on those units using the library above
(3) At inference time, replace our multilingual vocoder with your own vocoder
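Step (1) turns speech into discrete units by running k-means over XLSR features. The sketch below illustrates only that quantization idea in plain NumPy; it is not the seamless_communication unit_extraction API. It assumes you already have a (T, D) matrix of frame-level XLSR features for an utterance; which XLSR layer to use, the number of clusters `k`, and the helper names `fit_kmeans`, `assign_units` and `dedup` are hypothetical choices for illustration. Initialization uses a deterministic farthest-point rule, a simplification of k-means++.

```python
import numpy as np


def assign_units(features: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Map each frame (row of `features`) to the index of its nearest centroid."""
    # Squared Euclidean distance between every frame and every centroid: shape (T, k).
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def fit_kmeans(features: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Lloyd's k-means with farthest-point initialization; returns (k, D) centroids."""
    rng = np.random.default_rng(seed)
    # Start from one random frame, then repeatedly add the frame farthest from all chosen centroids.
    chosen = [features[rng.integers(len(features))]]
    for _ in range(1, k):
        d = np.min([((features - c) ** 2).sum(axis=1) for c in chosen], axis=0)
        chosen.append(features[d.argmax()])
    centroids = np.array(chosen, dtype=float)
    for _ in range(iters):
        labels = assign_units(features, centroids)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the previous centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def dedup(units: np.ndarray) -> np.ndarray:
    """Collapse runs of repeated units, e.g. [3, 3, 7, 7, 7, 3] -> [3, 7, 3]."""
    if len(units) == 0:
        return units
    keep = np.concatenate(([True], units[1:] != units[:-1]))
    return units[keep]


if __name__ == "__main__":
    # Synthetic stand-in for XLSR features: two well-separated groups of 4-dim frames.
    rng = np.random.default_rng(1)
    feats = np.vstack([rng.normal(0.0, 0.1, size=(50, 4)), rng.normal(5.0, 0.1, size=(50, 4))])
    cents = fit_kmeans(feats, k=2)
    units = assign_units(feats, cents)
    print(dedup(np.array([3, 3, 7, 7, 7, 3])))  # prints [3 7 3]
```

In a real setup the centroids would be fitted once over features from many utterances and then reused to label every utterance in your voice dataset. Whether the unit sequences fed to the vocoder in step (2) should be deduplicated depends on how that vocoder is trained, so treat `dedup` as optional.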
#17 has been merged, so this should now be possible.
Thanks for the suggestion. I'm curious about the amount of voice data: how many seconds of voice data do I need to collect to train a stable vocoder? And how long does the training process take?
Also, is there a more convenient way?
Thank you for your reply :P