Comments (9)

ankitgurua commented on August 14, 2024

> It works and I understand what you mean. Regarding the lack of timestamp information for numbers, I've been inferring and filling in this data using the timestamps of adjacent words. However, during processing, the original spaCy file likely uses the token information from the JSON file generated by Whisper, a feature that WhisperX lacks. I'm uncertain whether this difference impacts spaCy's functionality.

I think I finally found a perfect way to get subtitles: I get near-perfect timestamp alignment, and the sentences are split at natural lengths, using this spaCy script.

What I did was basically split what WhisperX does into 3 parts:

Transcription, Aligning, Segmenting

I ran each of them separately and then combined those different methods so they work with each other.

It's a bit long, but it works perfectly.

ViiTetrix commented on August 14, 2024

Did you solve this problem?

ankitgurua commented on August 14, 2024

> Did you solve this problem?

Yes, I did.

Basically, a few changes to the script made it run with WhisperX as well.

ViiTetrix commented on August 14, 2024

So can you teach me how to do this? I tried to change it many times, but I failed.

ViiTetrix commented on August 14, 2024

I have only implemented it on Whisper. The JSON file output by WhisperX does not contain the relevant token information, and I do not know how to handle this part in the script.

ankitgurua commented on August 14, 2024

> I have only implemented it on Whisper. The JSON file output by WhisperX does not contain the relevant token information, and I do not know how to handle this part in the script.

Yes, I created different spaCy scripts for WhisperX and whisper-timestamped:
https://gist.github.com/ankitgurua/eac069ed0c95e1ce5924a10923883133
https://gist.github.com/ankitgurua/7b0db06baa8e2c7288cbbf396169120d

The problem with WhisperX is that its alignment model cannot align numbers, so its JSON output has numbers in the sentences but no timestamps for them. To deal with WhisperX output, use --suppress_numerals in your command and then use the spaCy script I provided to segment it.
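
As a rough illustration of the command described above, the sketch below builds a WhisperX invocation with `--suppress_numerals` so that numbers are transcribed as words the alignment model can handle. The file name `talk.wav`, the output directory `out`, and the helper name `build_whisperx_command` are placeholders for illustration, not taken from the thread; the other flags are the usual WhisperX CLI options, but check `whisperx --help` for your installed version.

```python
import shlex


def build_whisperx_command(audio_path, output_dir="out"):
    """Return the argv list for a WhisperX run that spells numbers out as words."""
    return [
        "whisperx", audio_path,
        "--suppress_numerals",       # avoid digits, which the aligner cannot timestamp
        "--output_format", "json",   # JSON keeps the word-level timestamps
        "--output_dir", output_dir,  # hypothetical output location
    ]


cmd = build_whisperx_command("talk.wav")
print(shlex.join(cmd))
# To actually run it (requires WhisperX to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```

The resulting JSON can then be passed to the segmentation script from the gists above.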

ViiTetrix commented on August 14, 2024

It works and I understand what you mean. Regarding the lack of timestamp information for numbers, I've been inferring and filling in this data using the timestamps of adjacent words. However, during processing, the original spaCy file likely uses the token information from the JSON file generated by Whisper, a feature that WhisperX lacks. I'm uncertain whether this difference impacts spaCy's functionality.
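
One possible way to infer the missing timestamps from adjacent words is sketched below. It is not the code from the thread. It assumes each word is a dict with a `word` key and, when alignment succeeded, `start` and `end` keys in seconds (the shape WhisperX typically writes to its JSON output). The function name `fill_missing_timestamps` and the sample data are made up for illustration. A run of untimed words is spread evenly across the gap between its timed neighbours.

```python
def _timed(word):
    """True if the word dict carries both a start and an end time."""
    return "start" in word and "end" in word


def fill_missing_timestamps(words):
    """Return a copy of `words` where untimed entries get times from their neighbours."""
    filled = [dict(w) for w in words]
    n = len(filled)
    i = 0
    while i < n:
        if _timed(filled[i]):
            i += 1
            continue
        # Find the end of this run of untimed words: indices i .. j-1.
        j = i
        while j < n and not _timed(filled[j]):
            j += 1
        left = filled[i - 1]["end"] if i > 0 else None
        right = filled[j]["start"] if j < n else None
        if left is None and right is None:
            break  # no timed word anywhere, nothing to anchor on
        if left is None:
            left = right
        if right is None:
            right = left
        # Spread the run evenly over the gap between the neighbours.
        step = (right - left) / (j - i)
        for k in range(i, j):
            filled[k]["start"] = round(left + (k - i) * step, 3)
            filled[k]["end"] = round(left + (k - i + 1) * step, 3)
        i = j
    return filled


sample = [
    {"word": "I", "start": 0.0, "end": 0.2},
    {"word": "have", "start": 0.2, "end": 0.5},
    {"word": "3"},  # the aligner produced no timestamps for the numeral
    {"word": "cats.", "start": 0.9, "end": 1.3},
]
filled = fill_missing_timestamps(sample)
```

Here the numeral ends up spanning the gap between "have" and "cats.". A run at the very start or end of the list collapses to a zero-length interval at the nearest timed word, which may or may not be acceptable for subtitles.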

hongyuhei7722 commented on August 14, 2024

> What I did was basically split what WhisperX does into 3 parts: Transcription, Aligning, Segmenting

Could you tell me how to do that? Thank you very much.

ViiTetrix commented on August 14, 2024

Transcription → WhisperX
Aligning → WhisperX (wav2vec2)
Segmenting → spaCy

You can find the script and environment requirements above.
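
To make the segmenting step more concrete, here is a minimal sketch of how sentence boundaries could be mapped back onto word-level timestamps. It is not the script from the gists. It assumes every word dict has `word`, `start` and `end` keys and contains no internal whitespace. The helper names and the sample data are made up for illustration, and a simple punctuation-based splitter stands in for spaCy so the example runs without extra dependencies.

```python
import re


def split_sentences(text):
    """Stand-in splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def segment_words(words, splitter=split_sentences):
    """Group timed words into sentence cues with a start and end time."""
    text = " ".join(w["word"] for w in words)
    cues = []
    pos = 0
    for sentence in splitter(text):
        # Each sentence consumes as many words as it has whitespace-separated tokens.
        count = len(sentence.split())
        chunk = words[pos:pos + count]
        pos += count
        if chunk:
            cues.append({
                "text": sentence,
                "start": chunk[0]["start"],
                "end": chunk[-1]["end"],
            })
    return cues


aligned = [
    {"word": "Hello", "start": 0.0, "end": 0.4},
    {"word": "there.", "start": 0.4, "end": 0.9},
    {"word": "How", "start": 1.0, "end": 1.2},
    {"word": "are", "start": 1.2, "end": 1.4},
    {"word": "you?", "start": 1.4, "end": 1.8},
]
cues = segment_words(aligned)
```

With spaCy installed and a pipeline loaded as `nlp`, a splitter such as `lambda t: [s.text for s in nlp(t).sents]` could probably be passed in instead, as long as its sentences never break inside a whitespace-delimited word.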
