Migrating the entire v1 database schema is probably too much effort. Instead, it might


Improve migration from v1 to v2 (ytcc issue, closed)

woefe commented on August 26, 2024

Improve migration from v1 to v2

from ytcc.

Comments (13)

EmRowlands commented on August 26, 2024 1

I am currently working on a (pair of) script(s) that will perform the migration in a somewhat automated manner. This will include all videos, whether they have been watched, and anything else that's in the version 1 database. The first script is run with version 1 installed and the second with v2, because Python cannot import two modules with the same name simultaneously.


woefe commented on August 26, 2024

First approach depends on #43


woefe commented on August 26, 2024

First approach has been implemented in 1da677b and 0d03bac


woefe commented on August 26, 2024

I'm going to leave this issue open for now. Hoping to get more feedback.


EmRowlands commented on August 26, 2024

I'm currently trying to work out how to calculate the extractor_hash from a video URL, but I'm not getting very far. Could you offer some help on this? I think this is the only thing holding back the entire script.


woefe commented on August 26, 2024

@EmRowlands Awesome!!

The extractor_hash is calculated as a sha256 of youtube-dl's (unprocessed) information extractor output.

Basically pseudocode for one item:

sha256(YoutubeDL(...).extract_info(..., process=False).entries[0])

Relevant lines:

ytcc/ytcc/core.py

Line 50 in d97eebd

def extractor_hash(data: Dict[str, str]) -> str:

ytcc/ytcc/core.py

Line 84 in d97eebd

e_hash = extractor_hash(entry)
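As a rough illustration of the idea above, hashing one unprocessed entry could look like the sketch below. This is not the code in ytcc/core.py, whose function body is not shown here: the serialisation step (sorted key/value pairs fed into the hash) and the sample entry are assumptions made for the example.

```python
import hashlib
from typing import Dict


def extractor_hash(data: Dict[str, str]) -> str:
    """Hypothetical sketch: SHA-256 over a flat entry's sorted key/value pairs."""
    digest = hashlib.sha256()
    for key in sorted(data):
        # Assumption: keys and values are plain strings, as the type hint says.
        digest.update(key.encode("utf-8"))
        digest.update(b"\0")
        digest.update(data[key].encode("utf-8"))
        digest.update(b"\0")
    return digest.hexdigest()


# Made-up stand-in for one unprocessed playlist entry.
entry = {"_type": "url", "ie_key": "Youtube", "id": "dQw4w9WgXcQ", "url": "dQw4w9WgXcQ"}
e_hash = extractor_hash(entry)
```

With a scheme like this, any difference in the entry (for example one extra field) produces a different hash, so the value cannot be recomputed from a video ID alone.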


woefe commented on August 26, 2024

Importantly, the hashed entry comes from a playlist, not from the video page itself. I'm not sure whether we can easily reverse it from a yt_video_id, which is probably what we would need when converting from v1.


EmRowlands commented on August 26, 2024

I knew how it was being created, I just couldn't reproduce it because I didn't have access to the playlist data. I'm also not sure this way of generating the hash makes sense, since if a video is in multiple playlists it will have multiple extractor_hashes (unless this is intentional). I considered suggesting using the same method, but using the extractor info for the specific video instead, but that would require youtube-dling the info for every single video that is being imported.

Perhaps it would be better to use something like the format provided by --download-archive, which provides strings that look like this:

youtube dQw4w9WgXcQ

Where the first part is the name of the extractor, and the second is a site-specific string that uniquely identifies a video.
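A minimal sketch of reading that format, assuming each archive line holds exactly one extractor name and one ID separated by a single space. The helper name parse_archive_line is made up for this example and is not part of youtube-dl.

```python
from typing import Tuple


def parse_archive_line(line: str) -> Tuple[str, str]:
    """Split one archive line into (extractor name, site-specific video id)."""
    # Split on the first space only, in case an id ever contains spaces.
    extractor, video_id = line.strip().split(" ", 1)
    return extractor, video_id


extractor, video_id = parse_archive_line("youtube dQw4w9WgXcQ\n")
```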


woefe commented on August 26, 2024

Admittedly, the extractor_hash approach has problems. I actually found cases where it won't work with the current function that relies on a Dict[str, str], which is not always the output of processors. Sometimes the values might be more complex structures.

The hash should be the same for videos of different playlists. At least, for all examples that I checked it was the same.

I have looked into the --download-archive option again. It uses _make_archive_id to create the id, which can be generated from an unprocessed result and therefore does not require more network requests than the current approach. I think using _make_archive_id is more reliable, because then we rely on existing youtube_dl internals, which should work nicer with the rest of youtube_dl.

It is possible to replace the extractor_hash() with _make_archive_id(). Ytcc will simply resync all playlist content on the next update. I'll commit my changes and release a second beta soon.


EmRowlands commented on August 26, 2024

I've done some testing, and it appears that this approach will work with my scripts in a drop-in way. Since all of the videos from v1 will be from YouTube, it's trivial to reimplement _make_archive_id() so that it does not require network access. I'm also not sure how to contribute these scripts, since they require v1 to be installed, with the v2 code sitting in a different directory (or vice versa, for the price of a trivial change).
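A network-free reimplementation along those lines might look like the sketch below. It assumes the archive ID is the lowercased extractor key followed by a space and the video ID, matching the "youtube dQw4w9WgXcQ" format quoted earlier in the thread. youtube_dl's real _make_archive_id is a private method and may behave differently between versions; both function names here are hypothetical.

```python
from typing import Dict, Optional


def make_archive_id(info: Dict[str, str]) -> Optional[str]:
    """Simplified, offline stand-in for the archive id (hypothetical helper)."""
    video_id = info.get("id")
    # Playlist entries carry "ie_key"; fully extracted videos carry "extractor_key".
    extractor = info.get("extractor_key") or info.get("ie_key")
    if not video_id or not extractor:
        return None
    return extractor.lower() + " " + video_id


def archive_id_from_v1(yt_video_id: str) -> str:
    # Assumption: every video stored in a v1 database comes from YouTube.
    return "youtube " + yt_video_id
```

For example, archive_id_from_v1("dQw4w9WgXcQ") and make_archive_id({"ie_key": "Youtube", "id": "dQw4w9WgXcQ"}) both yield "youtube dQw4w9WgXcQ", so IDs built from v1 rows would line up with IDs built from playlist entries under this assumption.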


EmRowlands commented on August 26, 2024

I have finished my implementation, but there are some caveats:

It requires version 1 to be installed
It requires a copy of the source code of version 2
It exists entirely "out of tree"

As such, I'm not really sure how to submit it for review. It could sit in the scripts directory if it was only a single file, but it includes 5 files (a common file, an export and import script, a config file, and a migration shell script which runs them all).

If you think it would be acceptable to put them in a subdirectory of scripts, I'm happy to submit a PR.


woefe commented on August 26, 2024

@EmRowlands, I'm not sure how to handle it. Is it public somewhere for me to see? Can you maybe push it to a new branch on your fork, in a new subfolder of scripts/? Then we can still decide where to put it when we merge it. Maybe we could create an orphan branch (git checkout --orphan ...).


EmRowlands commented on August 26, 2024

I have added them in #50

