david-else / simple-youtube-chapter-extractor

Copy the text containing chapter information directly from YouTube and convert it into simple mkvmerge chapter format to embed in your downloaded YouTube video.

TypeScript 100.00%
youtube-chapter-extractor youtube deno mkvtoolnix mkvmerge

simple-youtube-chapter-extractor's Issues

Suggestion on regex improvements

The current regex won't capture the chapters for this video, for example:
https://www.youtube.com/watch?v=Dorf8i6lCuk

I have written a media player that parses the text containing chapter information and displays the chapters. In my experience, YouTube simply removes whatever matches its timestamp regex from the string and uses what is left as the chapter title, after some tweaks.
For example, this text in the description:

#2*\+  Wh#at# is\ Re#act*00:00 sdds#
#3 First React Code 01:15sdsdd

generates the following chapters:
(screenshot of the generated chapters, not preserved)

I am currently updating my chapter-capturing function to behave as described above. I am thinking about using the regex below to remove the timestamps from a string containing a chapter:
/([^a-zA-Z0-9_](?:(\d{2}):)?(\d{2}):(\d{2}))/gm (still a WIP)

Edit: I have come up with a way to extract the chapter title and its timestamp, based on my experience with YouTube chapters:
Qt C++ code:

#include <QCoreApplication>
#include <QDebug>
#include <QRegularExpression>
#include <QString>

int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);

    QString str = "#2 (00:10:00) variables ✘";
    str.prepend(' ');  // force timestamp capture when it is at the beginning of the string
    // [hh:]mm:ss preceded by a character that is not a word character, '=' or ':'
    QRegularExpression re("(?:[^a-zA-Z0-9_=:])((?:(\\d{1,2}):)?(\\d{1,2}):(\\d{1,2}))(?:[^a-zA-Z0-9_=:])?");
    // strips unwanted leading and trailing characters from a title fragment
    QRegularExpression deletionRE("^[^a-zA-Z0-9!'\"_`\\[{(\\?]*|[^a-zA-Z0-9!)'\"_`}\\]\\.\\?]*$");
    QRegularExpressionMatch match = re.match(str);

    if (match.hasMatch())
    {
        const QString timestamp = match.captured(1);
        qDebug() << "Match:" << timestamp;
        qDebug() << "Hour:" << match.captured(2);
        qDebug() << "Minute:" << match.captured(3);
        qDebug() << "Second:" << match.captured(4);

        // text before and after the timestamp
        QString part1 = str.mid(0, str.indexOf(timestamp));
        QString part2 = str.mid(str.lastIndexOf(timestamp) + timestamp.size());
        qDebug() << "part1:" << part1;
        qDebug() << "part2:" << part2;

        part1.remove(deletionRE);
        part2.remove(deletionRE);

        // join the cleaned fragments with '.' when both are present
        QString chapterTitle;
        if (!part1.isEmpty() && !part2.isEmpty())
        {
            chapterTitle = part1 + "." + part2;
        }
        else
        {
            chapterTitle = part1.isEmpty() ? part2 : part1;
        }

        chapterTitle.remove(deletionRE);

        qDebug() << "Chapter Title" << chapterTitle;
    }

    return 0;
}

Output:

Match: "00:10:00"
Hour: "00"
Minute: "10"
Second: "00"
part1: " #2 ("
part2: ") variables ✘"
Chapter Title "2.variables"
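
Since this project is written in TypeScript, the same idea could be sketched roughly as below. This is only an illustration, not code from the extractor: `parseChapterLine`, `ParsedChapter`, `TIMESTAMP_RE` and `TRIM_RE` are hypothetical names, the two regexes are ported from the Qt example, and the text before and after the timestamp is split at the match position rather than with `indexOf`/`lastIndexOf`.

```typescript
// Hypothetical result shape; not part of the extractor's actual API.
interface ParsedChapter {
  timestamp: string;
  title: string;
}

// [hh:]mm:ss preceded by one character that is not a word character, '=' or ':'.
const TIMESTAMP_RE = /[^a-zA-Z0-9_=:]((?:\d{1,2}:)?\d{1,2}:\d{1,2})/;
// Strips unwanted leading and trailing characters from a title fragment.
const TRIM_RE = /^[^a-zA-Z0-9!'"_`\[{(?]*|[^a-zA-Z0-9!)'"_`}\].?]*$/g;

function parseChapterLine(line: string): ParsedChapter | null {
  // Prepend a space so a timestamp at the very start still has a leading delimiter.
  const padded = " " + line;
  const match = TIMESTAMP_RE.exec(padded);
  if (match === null) return null;

  const timestamp = match[1];
  const start = match.index + 1; // skip the single delimiter character
  const before = padded.slice(0, start).replace(TRIM_RE, "");
  const after = padded.slice(start + timestamp.length).replace(TRIM_RE, "");

  // Join the cleaned fragments with '.' when both are present, then clean once more.
  const title = [before, after]
    .filter((part) => part.length > 0)
    .join(".")
    .replace(TRIM_RE, "");

  return { timestamp, title };
}

console.log(parseChapterLine("#2 (00:10:00) variables ✘"));
```

For the sample line this gives the timestamp `00:10:00` and the title `2.variables`, the same as the Qt output above. Whether it matches YouTube's behaviour for other descriptions is untested.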

The extractor is not saving correctly

I tried to get the chapters of this video: https://www.youtube.com/watch?v=_eLnSQq4KRM. Before extracting, I formatted the entire description as your readme.md describes, but the chapters are not saved properly. They are saved as:

CHAPTER0=01:02.000
CHAPTER0NAME=Heike
CHAPTER1=01:42.000
CHAPTER1NAME=Bercthun
CHAPTER2=02:16.000
CHAPTER2NAME=Hrothgar
CHAPTER3=02:56.000
CHAPTER3NAME=Cudberct
CHAPTER4=03:37.000
CHAPTER4NAME=Horsa
CHAPTER5=04:11.000
CHAPTER5NAME=Osgar
CHAPTER6=04:52.000
CHAPTER6NAME=Kendall
CHAPTER7=05:34.000
CHAPTER7NAME=Beorhtsige
CHAPTER8=06:23.000
CHAPTER8NAME=Wealdmaer
CHAPTER9=07:00.000
CHAPTER9NAME=Cola
CHAPTER10=07:34.000
CHAPTER10NAME=Callin
CHAPTER11=08:09.000
CHAPTER11NAME=Eorforwine
CHAPTER12=08:49.000
CHAPTER12NAME=Redwalda
CHAPTER13=09:33.000
CHAPTER13NAME=Wuffa
CHAPTER14=10:09.000
CHAPTER14NAME=Yohanes Loukas (The Oil)
CHAPTER15=11:55.000
CHAPTER15NAME=Heika of Friesland (The Sickle)
CHAPTER16=13:24.000
CHAPTER16NAME=Beneseck of Bath (The Bell)
CHAPTER17=14:52.000
CHAPTER17NAME=Hilda (The Quill)
CHAPTER18=15:51.000
CHAPTER18NAME=Selwyn (The Gallows)
CHAPTER19=16:35.000
CHAPTER19NAME=Ealhferth (The Seax)
CHAPTER20=17:10.000
CHAPTER20NAME=Havelok (The Billhook)
CHAPTER21=18:17.000
CHAPTER21NAME=Patrick (The Anvil)
CHAPTER22=19:42.000
CHAPTER22NAME=Mucel (The Lathe)
CHAPTER23=22:16.000
CHAPTER23NAME=Gifle (The Ash-Spear)
CHAPTER24=23:54.000
CHAPTER24NAME=Wigmund (The Tang)
CHAPTER25=24:36.000
CHAPTER25NAME=Bishop Herefrith (The Crozier)
CHAPTER26=25:10.000
CHAPTER26NAME=Eanbhert (The Vellum)
CHAPTER27=26:00.000
CHAPTER27NAME=Tata (The Dart) 
CHAPTER28=27:17.000
CHAPTER28NAME=Gunilla (The Adze) 
CHAPTER29=29:01.000
CHAPTER29NAME=Abbess Ingeborg (The Firebrand) 
CHAPTER30=29:33.000
CHAPTER30NAME=Grigorii (The Needle)
CHAPTER31=30:22.000
CHAPTER31NAME=Audun (The Vault)
CHAPTER32=31:11.000
CHAPTER32NAME=Kjotve the Cruel
CHAPTER33=31:27.000
CHAPTER33NAME=Leofgifu (The Scabbard)
CHAPTER34=32:18.000
CHAPTER34NAME=Hunta, son of Hunta (The Baldric)
CHAPTER35=33:17.000
CHAPTER35NAME=Sister Frideswid (The Leech)
CHAPTER36=34:10.000
CHAPTER36NAME=Avgos Spearhand (The Arrow)
CHAPTER37=34:48.000
CHAPTER37NAME=Vicelin (The Compass)
CHAPTER38=35:37.000
CHAPTER38NAME=Sister Blaeswith (The Rake)
CHAPTER39=36:26.000
CHAPTER39NAME=Tatfrid (The Lyre)
CHAPTER40=37:12.000
CHAPTER40NAME=Fulke (The Instrument)
CHAPTER41=37:43.000
CHAPTER41NAME=Reeve Derby (The Vice)
CHAPTER42=38:16.000
CHAPTER42NAME=Gorm Kjotvesson (The Keel)
CHAPTER43=41:55.000
CHAPTER43NAME=The Father

They should be saved as:

CHAPTER00=00:00:00.000
CHAPTER00NAME=Woden
CHAPTER01=00:01:02.000
CHAPTER01NAME=Heike
CHAPTER02=00:01:42.000
CHAPTER02NAME=Bercthun
CHAPTER03=00:02:16.000
CHAPTER03NAME=Hrothgar
CHAPTER04=00:02:56.000
CHAPTER04NAME=Cudberct
CHAPTER05=00:03:37.000
CHAPTER05NAME=Horsa
CHAPTER06=00:04:11.000
CHAPTER06NAME=Osgar
CHAPTER07=00:04:52.000
CHAPTER07NAME=Kendall
CHAPTER08=00:05:34.000
CHAPTER08NAME=Beorhtsige
CHAPTER09=00:06:23.000
CHAPTER09NAME=Wealdmaer
CHAPTER10=00:07:00.000
CHAPTER10NAME=Cola
CHAPTER11=00:07:34.000
CHAPTER11NAME=Callin
CHAPTER12=00:08:09.000
CHAPTER12NAME=Eorforwine
CHAPTER13=00:08:49.000
CHAPTER13NAME=Redwalda
CHAPTER14=00:09:33.000
CHAPTER14NAME=Wuffa
CHAPTER15=00:10:09.000
CHAPTER15NAME=Yohanes Loukas (The Oil)
CHAPTER16=00:11:55.000
CHAPTER16NAME=Heika of Friesland (The Sickle)
CHAPTER17=00:13:24.000
CHAPTER17NAME=Beneseck of Bath (The Bell)
CHAPTER18=00:14:52.000
CHAPTER18NAME=Hilda (The Quill)
CHAPTER19=00:15:51.000
CHAPTER19NAME=Selwyn (The Gallows)
CHAPTER20=00:16:35.000
CHAPTER20NAME=Ealhferth (The Seax)
CHAPTER21=00:17:10.000
CHAPTER21NAME=Havelok (The Billhook)
CHAPTER22=00:18:17.000
CHAPTER22NAME=Patrick (The Anvil)
CHAPTER23=00:19:42.000
CHAPTER23NAME=Mucel (The Lathe)
CHAPTER24=00:22:16.000
CHAPTER24NAME=Gifle (The Ash-Spear)
CHAPTER25=00:23:54.000
CHAPTER25NAME=Wigmund (The Tang)
CHAPTER26=00:24:36.000
CHAPTER26NAME=Bishop Herefrith (The Crozier)
CHAPTER27=00:25:10.000
CHAPTER27NAME=Eanbhert (The Vellum)
CHAPTER28=00:26:00.000
CHAPTER28NAME=Tata (The Dart) 
CHAPTER29=00:27:17.000
CHAPTER29NAME=Gunilla (The Adze) 
CHAPTER30=00:29:01.000
CHAPTER30NAME=Abbess Ingeborg (The Firebrand) 
CHAPTER31=00:29:33.000
CHAPTER31NAME=Grigorii (The Needle)
CHAPTER32=00:30:22.000
CHAPTER32NAME=Audun (The Vault)
CHAPTER33=00:31:11.000
CHAPTER33NAME=Kjotve the Cruel
CHAPTER34=00:31:27.000
CHAPTER34NAME=Leofgifu (The Scabbard)
CHAPTER35=00:32:18.000
CHAPTER35NAME=Hunta, son of Hunta (The Baldric)
CHAPTER36=00:33:17.000
CHAPTER36NAME=Sister Frideswid (The Leech)
CHAPTER37=00:34:10.000
CHAPTER37NAME=Avgos Spearhand (The Arrow)
CHAPTER38=00:34:48.000
CHAPTER38NAME=Vicelin (The Compass)
CHAPTER39=00:35:37.000
CHAPTER39NAME=Sister Blaeswith (The Rake)
CHAPTER40=00:36:26.000
CHAPTER40NAME=Tatfrid (The Lyre)
CHAPTER41=00:37:12.000
CHAPTER41NAME=Fulke (The Instrument)
CHAPTER42=00:37:43.000
CHAPTER42NAME=Reeve Derby (The Vice)
CHAPTER43=00:38:16.000
CHAPTER43NAME=Gorm Kjotvesson (The Keel)
CHAPTER44=00:41:55.000
CHAPTER44NAME=The Father

As you can see, the extractor does not catch the first chapter:

CHAPTER00=00:00:00.000
CHAPTER00NAME=Woden

The other problem is that the time is not correct. As you can also see, your extractor saves:

CHAPTER0=01:02.000
CHAPTER0NAME=Heike

It should be saved as:

CHAPTER01=00:01:02.000
CHAPTER01NAME=Heike

Notice the 00 before the time, which indicates the hour. Your extractor's output does not include it, so importing the file always fails with an error.
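
To make the expected format concrete, here is a minimal TypeScript sketch of the output step. It is not the extractor's actual code: `normalizeTimestamp`, `toSimpleChapters` and `ChapterEntry` are hypothetical names. It assumes timestamps arrive as `m:ss`, `mm:ss` or `h:mm:ss` text, and it numbers chapters with two digits starting at 00 only because the expected output above does so.

```typescript
// Hypothetical input shape; not part of the extractor's actual API.
interface ChapterEntry {
  time: string; // e.g. "0:00", "01:02" or "1:02:03"
  name: string;
}

// Pads a YouTube-style timestamp to HH:MM:SS.mmm, adding the hour field when missing.
function normalizeTimestamp(time: string): string {
  const parts = time.trim().split(":");
  const valid =
    (parts.length === 2 || parts.length === 3) &&
    parts.every((part) => /^\d{1,2}$/.test(part));
  if (!valid) {
    throw new Error(`Unrecognized timestamp: ${time}`);
  }
  if (parts.length === 2) {
    parts.unshift("00"); // no hour given, so assume hour 0
  }
  return parts.map((part) => part.padStart(2, "0")).join(":") + ".000";
}

// Emits CHAPTERxx / CHAPTERxxNAME line pairs with zero-padded chapter numbers.
function toSimpleChapters(chapters: ChapterEntry[]): string {
  return chapters
    .map((chapter, index) => {
      const n = String(index).padStart(2, "0");
      const time = normalizeTimestamp(chapter.time);
      return `CHAPTER${n}=${time}\nCHAPTER${n}NAME=${chapter.name.trim()}`;
    })
    .join("\n");
}

console.log(
  toSimpleChapters([
    { time: "0:00", name: "Woden" },
    { time: "01:02", name: "Heike" },
  ]),
);
```

This only covers the formatting half of the report. Why the first chapter (Woden at 00:00) is skipped depends on the extractor's matching regex, which is not shown here.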
