egorlakomkin / ktspeechcrawler

Automatically constructing a corpus for automatic speech recognition from YouTube videos

Home Page: https://arxiv.org/abs/1903.00216

License: MIT License

Python 75.87% HTML 20.92% Shell 3.21% Dockerfile 0.01%

asr crawler speech-recognition youtube

ktspeechcrawler's Issues

Why does GOOGLE_TEST default to OK? Why wasn't the GoogleRandomSubsetWERFilter class added in process.py?

Why does GOOGLE_TEST default to OK?

Why wasn't the GoogleRandomSubsetWERFilter class added to the pipeline in process.py?

First of all, thank you for releasing this project as open source. Awesome work. Thank you so much, KTSpeechCrawler team. :)

I tried the KTSpeechCrawler project to collect YouTube audio datasets for an ASR (speech-to-text) task.

I completed all the steps and then checked the transcripts against the corresponding audio files (.wav, .txt).

I found that 11 out of 100 audio files have incorrect transcripts.

If we apply google_speech_test and remove the samples that fall below a threshold (threshold=0.85), we should be left with clean audio files and transcripts.

Could you please tell me where I need to start, and where to add this module, in order to run google_speech_test?

Will using google_speech_test introduce any complications?

This is the pipeline:

pipeline = Pipeline([
    OverlappingSubtitlesRemover(),
    SubtitleCaptionTextFilter(),
    CaptionNormalizer(),
    CaptionRegexMatcher(good_chars_regexp),
    CaptionLengthFilter(min_length=5),
    CaptionLeaveOnlyAlphaNumCharacters(),
    SubtitleMerger(max_len_merged_sec=10),
    CaptionDurationFilter(min_length=1, max_length=20.0)
])

Where in this pipeline should I add that module? Is adding it at the end enough?

Thank you, sir. :)
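
The interface of GoogleRandomSubsetWERFilter is not shown in this thread, so the following is only a minimal sketch of the idea being asked about: compare each caption with an ASR hypothesis and keep the sample only when the two agree closely enough. Everything here is an assumption for illustration. The class name TranscriptAgreementFilter, the `transcribe` callable (a stand-in for whatever speech API is used), and the reading of threshold=0.85 as a minimum word accuracy (1 - WER) are hypothetical and are not taken from process.py. Because such a check needs the final audio segment and its final text, running it after the last stage of the pipeline seems like the natural place, but that is also an assumption.

```python
from typing import Callable, Iterable, List, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # Row-by-row dynamic programming for Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


class TranscriptAgreementFilter:
    """Hypothetical filter (not the project's class): keeps samples whose
    caption agrees with an ASR hypothesis of the same audio."""

    def __init__(self, transcribe: Callable[[str], str], threshold: float = 0.85):
        self.transcribe = transcribe  # assumed: maps a .wav path to text
        self.threshold = threshold    # assumed: minimum word accuracy

    def __call__(self, samples: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
        kept = []
        for wav_path, caption in samples:
            accuracy = 1.0 - word_error_rate(caption, self.transcribe(wav_path))
            if accuracy >= self.threshold:
                kept.append((wav_path, caption))
        return kept
```

With a real speech API plugged in as `transcribe`, each call costs one recognition request per sample, which may be why a filter of this kind is applied to a random subset rather than to every segment.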

0 videos downloaded

[youtube:search_url] in: Downloading webpage
[download] Downloading playlist: in
[youtube:search_url] playlist in: Downloading 0 videos
[download] Finished downloading playlist: in
[youtube:search_url] is: Downloading webpage
[download] Downloading playlist: is
[youtube:search_url] playlist is: Downloading 0 videos

What might I have done wrong?
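
In the log above, the search queries "in" and "is" each resolve to a playlist of 0 videos, so there is nothing to download for them. As a rough aid for checking a longer log, here is a minimal sketch that lists every query whose search returned no videos. It assumes the log lines have exactly the format shown in the excerpt above; the function name zero_result_queries is hypothetical and is not part of the project.

```python
import re
from typing import List

# Matches lines such as:
#   [youtube:search_url] playlist in: Downloading 0 videos
ZERO_RESULT = re.compile(
    r"^\[youtube:search_url\] playlist (?P<query>.+): Downloading 0 videos$"
)


def zero_result_queries(log_text: str) -> List[str]:
    """Return the search queries whose playlist contained 0 videos."""
    queries = []
    for line in log_text.splitlines():
        match = ZERO_RESULT.match(line.strip())
        if match:
            queries.append(match.group("query"))
    return queries
```

If every query in the log shows up in this list, the problem probably lies in the search step itself rather than in the individual keywords.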

