Comments (5)
Hey @barinov274 - Unfortunately, it's not trivial to get ASR timestamps from our model. Because the model is shared with translation tasks, the decoding process is not "monotonic" as it is in other ASR approaches (e.g. CTC). Technically, the i-th generated token t_i could attend to the x-th source token s_x, and t_j to s_y, with i < j but x > y.
from seamless_communication.
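To make the non-monotonic case above concrete, here is a minimal sketch that counts token pairs (i < j) whose peak-attention source positions are out of order (x > y). The attention matrix is made-up toy data and `count_inversions` is a hypothetical helper, not part of seamless_communication.

```python
import numpy as np

def count_inversions(attn: np.ndarray) -> int:
    """Count target-token pairs (i < j) whose peak source positions satisfy x > y."""
    peaks = attn.argmax(axis=1)  # peak source position for each target token
    n = len(peaks)
    return sum(1 for i in range(n) for j in range(i + 1, n) if peaks[i] > peaks[j])

# Toy attention weights: 3 target tokens (rows) x 4 source positions (columns).
attn = np.array([
    [0.10, 0.10, 0.70, 0.10],  # t_0 peaks at s_2
    [0.80, 0.10, 0.05, 0.05],  # t_1 peaks at s_0, i.e. earlier than t_0's peak
    [0.10, 0.10, 0.10, 0.70],  # t_2 peaks at s_3
])
print(count_inversions(attn))  # 1: only the pair (t_0, t_1) is out of order
```

A count of zero would mean the peak positions are already monotonic; any positive count means timestamps cannot be read directly off the attention peaks.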
Hi @barinov274! Although we and Whisper both use an encoder-decoder architecture, unlike Whisper we didn't train for ASR with timestamp tokens. Our focus is translation, and ASR is treated as S2TT within the same source language. As @cndn mentioned, the decoder can technically attend to the source audio in a non-monotonic fashion. That said, we can potentially leverage the encoder-decoder attention matrices to infer a monotonic alignment between the source audio and the target text, and use that to output timestamps. I'll try this option, see whether it's accurate, and share updates here.
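One way the attention-based idea above could be sketched is with dynamic time warping (DTW) over a single target-by-source attention matrix. This is only an illustration under stated assumptions, not the maintainers' method: the matrix is toy data, `dtw_path` and `token_timestamps` are hypothetical helpers, and `frame_sec` (the duration of one encoder frame) is an assumed value that depends on the actual model. How to extract the attention weights from the model is not shown.

```python
import numpy as np

def dtw_path(cost: np.ndarray) -> list:
    """Return a monotonic path [(token_idx, frame_idx), ...] with minimal total cost."""
    n_tok, n_frm = cost.shape
    acc = np.full((n_tok + 1, n_frm + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n_tok + 1):
        for j in range(1, n_frm + 1):
            best_prev = min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev
    # Backtrack from the last cell to the first, always taking the cheapest predecessor.
    i, j = n_tok, n_frm
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(acc[i - 1, j - 1], i - 1, j - 1),
                 (acc[i - 1, j], i - 1, j),
                 (acc[i, j - 1], i, j - 1)]
        _, i, j = min(steps, key=lambda s: s[0])
    return path[::-1]

def token_timestamps(attn: np.ndarray, frame_sec: float) -> list:
    """Map each target token to a (start, end) time in seconds via DTW on -attn."""
    spans = {}
    for tok, frm in dtw_path(-attn):  # high attention = low cost
        start, end = spans.get(tok, (frm, frm))
        spans[tok] = (min(start, frm), max(end, frm))
    return [(spans[t][0] * frame_sec, (spans[t][1] + 1) * frame_sec)
            for t in range(attn.shape[0])]

# Toy attention: 3 target tokens x 6 source frames (made-up values).
attn = np.array([
    [0.60, 0.30, 0.05, 0.03, 0.01, 0.01],
    [0.05, 0.10, 0.50, 0.30, 0.03, 0.02],
    [0.01, 0.02, 0.05, 0.12, 0.40, 0.40],
])
for tok, (start, end) in enumerate(token_timestamps(attn, frame_sec=0.02)):
    print(f"token {tok}: {start:.2f}s to {end:.2f}s")
```

DTW forces the alignment to move forward in both tokens and frames, so it yields usable spans even when the raw attention peaks are out of order. Whether those spans are accurate for a given model is exactly what would still need to be evaluated.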
I would also like to know this as it would be incredibly beneficial to have this as an optional feature!
I'll try something using this approach. For VOD content this is fine, because waiting for the result and producing a subtitle file (e.g. SRT) afterwards is not a problem. I'm trying to understand the whole picture because I'm looking for models and approaches that would make this work with live-streaming content.
What resources would you recommend for this, thinking of a production environment?
Any progress on this? I am very far removed from the machine learning world, so I am unable to contribute, but I'm keen on using either this model or Meta MMS to generate subtitles for a low-resource language (Whisper is utterly incapable of getting good results for that language).