Issues for moomoo, from nolanbconaway

Album artist max not working

I'm still getting a lot of tracks from the same album artist, probably because we only check whether the count is below the limit for the track artist here:

moomoo/http/src/moomoo_http/playlist_generator/base.py

Line 167 in 076aaa8

if artist_counts[track.artist_mbid] < limit_per_artist:

We need logic for the album artist there as well.
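
A minimal sketch of what the extra check could look like. Only `artist_counts`, `track.artist_mbid`, and `limit_per_artist` come from the line quoted above; the `Track` dataclass, the `album_artist_mbid` field, and `select_tracks` are hypothetical stand-ins, not the actual code in `base.py`.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Track:
    """Hypothetical stand-in for the generator's track object."""

    filepath: str
    artist_mbid: str
    album_artist_mbid: Optional[str] = None


def select_tracks(
    candidates: Iterable[Track], limit: int, limit_per_artist: int
) -> list[Track]:
    """Pick up to `limit` tracks, capping both artist and album artist."""
    artist_counts: Counter = Counter()
    album_artist_counts: Counter = Counter()
    selected: list[Track] = []

    for track in candidates:
        if len(selected) >= limit:
            break
        if artist_counts[track.artist_mbid] >= limit_per_artist:
            continue
        # Assumption: tracks without an album artist are limited by artist only.
        album_artist = track.album_artist_mbid
        if (
            album_artist is not None
            and album_artist_counts[album_artist] >= limit_per_artist
        ):
            continue

        selected.append(track)
        artist_counts[track.artist_mbid] += 1
        if album_artist is not None:
            album_artist_counts[album_artist] += 1

    return selected
```

Candidates are assumed to arrive already ordered by preference (e.g. by distance), so skipping a track simply moves on to the next best one.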

Use new pgvector action

https://github.com/pgvector/setup-pgvector

http
ingest
ml

Auto suggest themed playlists

Right now the app suggests artists with high listen counts and releases to revisit. Maybe a daily selection of playlists built from similar songs by different artists would be good?

Locally store ml artifacts

Give huggingface a break with the downloads.

Set up cache in github actions.

Store selected playlists from the client.

http endpoint to POST a selection
db schema to store it
client handler to send post request

Surface new artist suggestions in the gui

Store playlists in the database

To enable basic analytics on listens, etc

dbt table mapping listens to local files.

Probably best to add an ingest via https://listenbrainz.readthedocs.io/en/latest/users/api-usage.html#lookup-mbids

This is a baseline feature needed to understand which files are most played, etc., and to map listening taste to local files when making playlists.

Should probably measure % of recent listens with a mapped file, keep track in a view.

would support #31, #32, #42, etc, so we know which files are popular
would support analytics on playlist quality later
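
The coverage measure mentioned above could be sketched like this. It assumes the mapping step yields one entry per listen, holding the matched local filepath or `None`; that shape and the function name are hypothetical, and the real thing would more likely live in a dbt view.

```python
from typing import Iterable, Optional


def mapped_listen_share(listen_filepaths: Iterable[Optional[str]]) -> float:
    """Return the percentage of listens that resolved to a local file.

    Each element is the local filepath matched to one listen, or None when
    no file was matched (a hypothetical shape for the mapping output).
    """
    total = 0
    mapped = 0
    for filepath in listen_filepaths:
        total += 1
        if filepath is not None:
            mapped += 1
    if total == 0:
        return 0.0
    return 100.0 * mapped / total
```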

Weigh artist playlists source tracks by most listened

Or maybe just use a nonrandom sample and pick the top N?
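
A rough sketch of the two options, assuming listen counts are available as a `filepath -> count` dict (a hypothetical input shape). The `+1` smoothing in the weighted variant is also an assumption, added so unplayed tracks can still be drawn.

```python
import random
from typing import Optional


def top_n_tracks(listen_counts: dict[str, int], n: int) -> list[str]:
    """Deterministic option: the n most-listened tracks, ties broken by path."""
    ranked = sorted(listen_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [path for path, _ in ranked[:n]]


def weighted_sample_tracks(
    listen_counts: dict[str, int], n: int, rng: Optional[random.Random] = None
) -> list[str]:
    """Random option: sample n tracks without replacement, weighted by listens."""
    rng = rng or random.Random()
    remaining = dict(listen_counts)
    picked: list[str] = []
    while remaining and len(picked) < n:
        paths = list(remaining)
        # +1 smoothing (an assumption) so unplayed tracks can still be drawn.
        weights = [remaining[p] + 1 for p in paths]
        choice = rng.choices(paths, weights=weights, k=1)[0]
        picked.append(choice)
        del remaining[choice]
    return picked
```

The top-N version always yields the same source tracks for an artist, so the weighted sample may be the better fit if the playlists should vary from day to day.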

Drop/update embeds

Drop if the file has been removed; update if mtime is later than the embed's create time, or if the file hash has changed?
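
One way the decision could be expressed, as a sketch only. The function names, the idea of storing a hash next to each embedding, and the choice of SHA-256 are all assumptions; the hash check is optional because it is much slower than comparing timestamps.

```python
import hashlib
import os
from typing import Optional


def file_hash(path: str) -> str:
    """Hash file contents in chunks so large audio files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def embed_action(
    path: str, embed_created_ts: float, stored_hash: Optional[str] = None
) -> str:
    """Return "drop", "update", or "keep" for one stored embedding."""
    if not os.path.exists(path):
        return "drop"
    if os.path.getmtime(path) > embed_created_ts:
        return "update"
    # Optional, slower check: only runs when a stored hash is available.
    if stored_hash is not None and file_hash(path) != stored_hash:
        return "update"
    return "keep"
```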

Improve mapping between CollectionItem and Playlist

Make it so that the playlist can be recreated exactly from the collection item.

store seeds
factory method, etc.

Ingest release groups

This would be a requirement to use time data in playlisting, since releases are not always stamped with the original year.

TODO:

dbt support for release groups in the local files mbids
re ingest release annotations with the release-groups include
add annotations data back into the mbids for release groups in dbt

Pregenerate playlists for client consumption

Pregenerate artist playlists, also those from #112 and store in the db. Write to be general for other kinds.

Likely needs separate package, playlist/ or something.
- write so that the generator can be imported in http for from-path purposes. or maybe let that be?
Can revert to true pairwise distance compute rather than via avgs.
scheduled service to make playlists.
edit http endpoints to pull from stored playlists
bump http to py 3.10

schema something like

id: int? uuid?
playlist: [ {path: ..., metadata:..., ...}, ... ]
title: str
username: str
created_ts: ts

Plus collection fields, so we can get the latest artist playlists, etc.:
collection_name: str (??)
collection_ts: ts

Need something to index latest of collection X, so that the http can pull, e.g., the latest artist playlists.
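
A sketch of the schema and the "latest of collection X" lookup in plain Python. Field names follow the list above; the class name, the choice of uuid over int, and doing the lookup in memory rather than in the database are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from uuid import UUID, uuid4


@dataclass
class StoredPlaylist:
    """One pregenerated playlist row, mirroring the fields listed above."""

    playlist: list[dict]  # [{"path": ..., "metadata": ...}, ...]
    title: str
    username: str
    created_ts: datetime
    collection_name: str
    collection_ts: datetime
    id: UUID = field(default_factory=uuid4)  # assumption: uuid rather than int


def latest_collection(
    rows: list[StoredPlaylist], collection_name: str
) -> list[StoredPlaylist]:
    """Return every playlist from the newest run of one collection."""
    matching = [r for r in rows if r.collection_name == collection_name]
    if not matching:
        return []
    newest = max(r.collection_ts for r in matching)
    return [r for r in matching if r.collection_ts == newest]
```

Sharing one `collection_ts` across all playlists written by a single scheduled run is what makes the lookup possible; in the database this would probably be an index on `(collection_name, collection_ts)`.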

HTTP/Client integration test

Improve revisit releases playlists

I see files that are not in the release folder but match on the name (via compilations, remixes, etc.). Maybe filter down to files with the correct release group assigned?
Parse the track number and order on that explicitly.
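
A sketch of the track-number parsing and ordering. It assumes tag values arrive as strings such as "3", "03", or "3/12", and that each track is a dict with `filepath`, `track_number`, and optionally `disc_number` keys; those key names are hypothetical.

```python
import re
from typing import Optional

_LEADING_INT = re.compile(r"\s*(\d+)")


def parse_track_number(raw: Optional[str]) -> Optional[int]:
    """Parse tag values such as "3", "03", or "3/12"; None if unparseable."""
    if raw is None:
        return None
    match = _LEADING_INT.match(raw)
    return int(match.group(1)) if match else None


def order_release_tracks(tracks: list[dict]) -> list[dict]:
    """Sort by (disc, track); within a disc, unnumbered files go last."""

    def sort_key(track: dict) -> tuple:
        # Assumption: a missing or zero disc number means disc 1.
        disc = parse_track_number(track.get("disc_number")) or 1
        number = parse_track_number(track.get("track_number"))
        return (disc, number is None, number or 0, track["filepath"])

    return sorted(tracks, key=sort_key)
```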

Re-enrich mbids every once in a while

Todo: a CLI to pick out enriched data of a certain age, and re-enrich. Maybe a setup to pop N out, to smooth spikes in ingest.

If possible, check updated stamps to determine need for API calls.
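
The "pop N out" selection could look roughly like this. The input is assumed to be a `mbid -> last enriched timestamp` dict, and the function name and parameters are hypothetical; the CLI would wrap something like it.

```python
from datetime import datetime, timedelta


def pick_stale_mbids(
    enriched_at: dict[str, datetime],
    max_age: timedelta,
    batch_size: int,
    now: datetime,
) -> list[str]:
    """Return up to `batch_size` mbids older than `max_age`, oldest first."""
    cutoff = now - max_age
    stale = [(ts, mbid) for mbid, ts in enriched_at.items() if ts < cutoff]
    stale.sort()
    return [mbid for _, mbid in stale[:batch_size]]
```

Capping each run at `batch_size` is what smooths the ingest spikes: the backlog drains a little on every scheduled run instead of all at once.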

Client interface to choose premade playlists

Playlist: "loved" tracks

via track play spikes and/or loved tracks in listenbrainz
add logic to minimize same artist.
add logic to skip anything recently played

Limit one per album artist as well as artist

Ingest listenbrainz global stats per mbid

To enable comparison of local and global stats.

Try artist/release partitioned playlist ranking

Instead of top N tracks max X per artist. Try top N releases by avg distance; top track per release.

Or, try using other tracks from the artist as well (but at a lower weighting). Anything to be a little more contextual about the media.

SQL

with base as (
    select filepath, embedding
    from moomoo.local_files_flat
    where filepath = '...'
)
, distances as (
    select
        local_files_flat.filepath as filepath
        , avg(base.embedding <-> local_files_flat.embedding) as distance

    from base
    cross join moomoo.local_files_flat

    where local_files_flat.embedding_success
      and local_files_flat.embedding_duration_seconds >= 60
      and local_files_flat.artist_mbid is not null
      and local_files_flat.filepath not in (select filepath from base)

    group by local_files_flat.filepath
)

, release_ranks as (
    select
        local_files_flat.release_mbid
        , row_number() over (order by avg(distances.distance)) as distance_rank

    from distances
    inner join moomoo.local_files_flat using (filepath)
    where release_mbid is not null
    group by 1
)

, tracks as (
    select
        local_files_flat.filepath
        , distances.distance
        , row_number() over (
            partition by local_files_flat.release_mbid order by distance
        ) as track_rank

    from distances
    inner join moomoo.local_files_flat using (filepath)
    inner join (select release_mbid from release_ranks where distance_rank <= 20 ) as releases
        using (release_mbid)

)


select filepath
from tracks
where track_rank <= 1
order by distance
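
The same ranking idea in Python, as a sketch for experimenting outside the database. It assumes distances to the seed track are already computed and passed in as dicts; the function name and input shapes are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rank_by_release(
    distances: dict[str, float], release_of: dict[str, str], n_releases: int = 20
) -> list[str]:
    """Top track from each of the `n_releases` closest releases.

    distances: filepath -> distance from the seed track.
    release_of: filepath -> release mbid (files without one are left out).
    """
    by_release: dict[str, list[str]] = defaultdict(list)
    for path in distances:
        release = release_of.get(path)
        if release is not None:
            by_release[release].append(path)

    # Rank releases by the average distance of their tracks.
    ranked = sorted(
        by_release, key=lambda r: mean(distances[p] for p in by_release[r])
    )

    # Keep the single closest track from each of the top releases.
    best = [
        min(by_release[r], key=distances.__getitem__) for r in ranked[:n_releases]
    ]
    return sorted(best, key=distances.__getitem__)
```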

Ingest loved tracks from listenbrainz

https://listenbrainz.readthedocs.io/en/latest/users/api/recordings.html#get--1-feedback-user-(user_name)-get-feedback

HTTP server for playlists

If this is done:

could set up auth via api key
user would not need to manage db creds
writes to db could be easily managed (see #55, no more readonly db users needed)
no need for thin clients, maybe just a requests wrapper.

File browser, media player integration

Playlist: songs like X, Y, Z

cluster tracks from most played artists etc. create themed playlists from those clusters.

Thin client for playlist creating, cli, etc

so torch does not need to be installed

Remove username logic

No need, this is a single person app anyway

remove from ingest and insert via envvar
remove require from http
remove from client
drop ingest columns

Smarter selection of top artists playlists

Right now I take any artist with more than X historical listens; a random N of those become playlists.

It would be better to serve a mix of

artists with a lot of lifetime listens; but this group is slow changing
artists with a lot of recent listens, which changes quickly.

It would also be good to ensure that there is a sufficient diversity of choices; maybe using an avgd vector.
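
A sketch of the mixed selection, assuming per-artist listen counts for both windows are available as dicts. The `recent_share` and `pool_size` knobs are hypothetical, and the diversity check via an averaged vector is not covered here.

```python
import random
from typing import Optional


def pick_artists(
    lifetime_listens: dict[str, int],
    recent_listens: dict[str, int],
    n: int,
    recent_share: float = 0.5,
    pool_size: int = 50,
    rng: Optional[random.Random] = None,
) -> list[str]:
    """Pick n artists: some from the recent top pool, the rest from lifetime."""
    rng = rng or random.Random()

    def top(counts: dict[str, int]) -> list[str]:
        return sorted(counts, key=lambda a: (-counts[a], a))[:pool_size]

    recent_pool = top(recent_listens)
    n_recent = min(round(n * recent_share), len(recent_pool))
    picked = rng.sample(recent_pool, n_recent)

    # Fill the remainder from lifetime favorites not already chosen.
    lifetime_pool = [a for a in top(lifetime_listens) if a not in picked]
    picked += rng.sample(lifetime_pool, min(n - n_recent, len(lifetime_pool)))
    return picked
```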

Maybe pivot to fastapi for http?

Pydantic to store playlist objects, easier to manage schemas, etc.

playlist option to weigh unplayed tracks more

Similar user ingest option for a specific time period

or maybe just always all time

Split monopackage to projects

Probably

Ingest
Web
Dbt
Client

Maybe redis for storing embeddings?

Vicki did it and she seemed happy about it.

https://vickiboykis.com/2024/01/05/retro-on-viberary/

Inspect / resolve failed scorer runs

I see many in the db with a '' as the exception?

Fix fresh releases dbt

I see a lot of artists that I think I have already listened to. Maybe check that the model is not busted?
Maybe add a condition that the artist is not extremely popular?

Pip installable client package.

Or just even instructions on how to do it.

Should be pipx targetable and easily updated.

External docker artifactory?

Maybe use https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry#pulling-container-images

Try pandora style radio via embeddings model

https://huggingface.co/m-a-p/music2vec-v1

Factory method for collection -> [playlists]

Right now I have to do:

playlists = [item.to_playlist() for item in collection.playlists]
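
A sketch of what the factory method might look like. Only `collection.playlists` and `item.to_playlist()` come from the snippet above; the class layouts and the `to_playlists` name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Playlist:
    title: str
    paths: list[str]


@dataclass
class CollectionItem:
    """Hypothetical stored row; the real fields may differ."""

    title: str
    paths: list[str]

    def to_playlist(self) -> Playlist:
        return Playlist(title=self.title, paths=list(self.paths))


@dataclass
class Collection:
    playlists: list[CollectionItem] = field(default_factory=list)

    def to_playlists(self) -> list[Playlist]:
        """Factory method replacing the per-item comprehension."""
        return [item.to_playlist() for item in self.playlists]
```

Callers would then write `playlists = collection.to_playlists()`.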

Test pipeline with postgres fixture

Local files to playlist generator

To enable strawberry, clementine, etc playlists.

Library cleanup: duplicate songs

Something like:

with t as (
    select recording_md5, count(1) as cnt
    from moomoo.local_files_flat
    where album_name is not null
    group by 1
    having count(*) > 1
      and max(track_length_seconds) - min(track_length_seconds) < 10

)
select t.*, artist_name, album_name, track_name, track_length_seconds
from t
inner join moomoo.local_files_flat
on t.recording_md5 = moomoo.local_files_flat.recording_md5
order by artist_name, album_name, track_name

select recording_msid, count(distinct lower(track_name || '-' || artist_name)) as num_tracks
from moomoo.listens_flat
group by 1
having count(distinct lower(track_name || '-' || artist_name)) > 1;

Playlist: Albums to revisit

Albums with some high number of listens, without any recently.
