Is it possible to generate subtitles like whisper ai? about seamless_communication HOT 5 OPEN

facebookresearch commented on August 15, 2024 2

Is it possible to generate subtitles like whisper ai?

from seamless_communication.

Comments (5)

cndn commented on August 15, 2024 7

Hey @barinov274 - Unfortunately, it's not trivial to get ASR timestamps from our model. Since the model is shared with translation tasks, the decoding process is not "monotonic" the way it is in other ASR approaches (e.g. CTC). Technically, the i-th generated token t_i could attend to the x-th source token s_x, and t_j to s_y, with i < j but x > y.
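To make the non-monotonic case concrete, here is a minimal sketch in plain Python. The attention weights are made-up toy numbers, and the helper names `attended_frames` and `is_monotonic` are hypothetical; nothing here is taken from the seamless_communication code base.

```python
def attended_frames(attn):
    """For each target token (row), return the index of the source
    position (column) that receives the largest attention weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in attn]


def is_monotonic(frames):
    """True if the attended source positions never move backwards."""
    return all(a <= b for a, b in zip(frames, frames[1:]))


# Toy cross-attention: 3 target tokens x 4 source positions (invented values).
toy_attn = [
    [0.1, 0.2, 0.6, 0.1],  # t_0 attends mostly to s_2
    [0.7, 0.1, 0.1, 0.1],  # t_1 attends mostly to s_0 (earlier than t_0's peak)
    [0.1, 0.1, 0.1, 0.7],  # t_2 attends mostly to s_3
]

frames = attended_frames(toy_attn)
print(frames, is_monotonic(frames))
```

Here t_0 peaks at s_2 while t_1 peaks at s_0, which is exactly the i < j but x > y situation, so reading timestamps straight off the attention peaks would produce times that go backwards.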

elbayadm commented on August 15, 2024 6

Hi @barinov274! Unlike Whisper, and although we both use an encoder-decoder architecture, we didn't train for ASR with timestamp tokens. Our focus is translation, and ASR is treated as S2TT into the same source language. As @cndn mentioned, the decoder can technically attend to the source audio in a non-monotonic fashion. That said, we can potentially leverage the encoder-decoder attention matrices to infer a monotonic alignment between the source audio and the target text, and use that to output timestamps. I'll try this option, see if it's accurate, and then share updates here.
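One way the attention-based idea could be sketched is with dynamic time warping (DTW) over a cross-attention matrix, which forces a monotonic path through it. This is only an illustration under stated assumptions: the thread does not say which alignment method would be used, the matrix below is invented, and `seconds_per_frame` is a hypothetical parameter standing in for the encoder's real frame duration.

```python
def dtw_path(attn):
    """Find a monotonic path through attn (target tokens x source frames)
    that maximizes total attention, i.e. minimizes the sum of -attn."""
    n_tok, n_frm = len(attn), len(attn[0])
    inf = float("inf")
    cost = [[inf] * (n_frm + 1) for _ in range(n_tok + 1)]
    back = [[None] * (n_frm + 1) for _ in range(n_tok + 1)]
    cost[0][0] = 0.0
    for i in range(1, n_tok + 1):
        for j in range(1, n_frm + 1):
            # Allowed predecessors: diagonal, previous token, previous frame.
            best_cost, best_prev = min(
                (cost[i - 1][j - 1], (i - 1, j - 1)),
                (cost[i - 1][j], (i - 1, j)),
                (cost[i][j - 1], (i, j - 1)),
            )
            cost[i][j] = best_cost - attn[i - 1][j - 1]
            back[i][j] = best_prev
    # Walk back from the last cell to recover the path.
    path, (i, j) = [], (n_tok, n_frm)
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        i, j = back[i][j]
    return path[::-1]


def token_start_times(attn, seconds_per_frame):
    """Start time of each token = first frame assigned to it on the DTW path."""
    starts = {}
    for tok, frm in dtw_path(attn):
        starts.setdefault(tok, frm * seconds_per_frame)
    return [starts[t] for t in range(len(attn))]


# Invented attention weights: 3 target tokens x 4 source frames.
toy_attn = [
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.9, 0.0],
    [0.0, 0.0, 0.1, 0.9],
]
# seconds_per_frame=0.5 is an arbitrary value chosen for readability.
print(token_start_times(toy_attn, seconds_per_frame=0.5))
```

Whether the resulting times are accurate enough for subtitles depends on how sharply the real attention weights track the audio, which is the open question raised in the comment above.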

dillfrescott commented on August 15, 2024

I would also like to know this as it would be incredibly beneficial to have this as an optional feature!

Leeaandrob commented on August 15, 2024

I'll try something using this approach. For VOD content this is fine, because we have no problem waiting for the result and producing an SRT or similar subtitle file as output. I'm trying to understand the whole picture because I'm looking for models and approaches that would make this work with live-streaming content.

In terms of the resources needed to do this, what would you recommend, particularly for a production environment?
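For the offline (VOD) case, once start and end times are available from some alignment step, writing the subtitle file itself is straightforward. The sketch below formats segments as SubRip (SRT) text; the `(start, end, text)` tuple layout and the function names are assumptions made for illustration, and the example segments are invented.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start_seconds, end_seconds, text) tuples as SRT cues,
    numbered from 1 and separated by blank lines."""
    cues = []
    for number, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(cues)


# Invented example segments.
example = [(0.0, 1.5, "Hello"), (1.5, 3.25, "world")]
print(to_srt(example))
```

For live streaming the same formatting applies per segment, but the alignment step would have to run incrementally, which this sketch does not address.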

jtlonsako commented on August 15, 2024

Any progress on this? I am very far removed from the machine learning world, so I am unable to contribute, but I'm keen on using either this model or Meta MMS to generate subtitles for a low-resource language (Whisper is utterly incapable of getting good results for that language).
