moomoo's Issues

Auto suggest themed playlists

Right now the app suggests artists with high listen counts and releases to revisit. Maybe a daily selection of playlists built from similar songs by different artists would be good?

Drop/update embeds

Drop if the file has been removed; update if mtime > create time, or if the file hash is not the same?
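One possible reading of that rule, as a minimal sketch: use mtime as a cheap first check and the hash to confirm that the contents really changed. The `EmbedRecord` shape and the `embed_action` name are hypothetical, not part of moomoo, and the sketch assumes each stored embedding keeps its creation time and a content hash.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass
class EmbedRecord:
    # Hypothetical shape of one stored embedding row.
    filepath: Path
    created_ts: float  # unix timestamp of when the embedding was made
    file_md5: str  # hash of the file contents at embed time


def file_md5(path: Path) -> str:
    """Hash the file contents in 1 MiB chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def embed_action(rec: EmbedRecord) -> str:
    """Return 'drop', 'update', or 'keep' for one stored embedding."""
    if not rec.filepath.exists():
        return "drop"
    if rec.filepath.stat().st_mtime <= rec.created_ts:
        return "keep"
    # mtime is newer than the embedding; re-embed only if contents changed
    return "update" if file_md5(rec.filepath) != rec.file_md5 else "keep"
```

Checking the hash only after the mtime test avoids reading every file on each run; if mtimes are not trustworthy, the hash check could be made unconditional instead.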

Ingest release groups

This would be a requirement to use time data in playlisting, since releases are not always stamped with the original year.

TODO:

  • dbt support for release groups in the local files mbids
  • re-ingest release annotations with the release-groups include
  • add annotations data back into the mbids for release groups in dbt

Pregenerate playlists for client consumption

Pregenerate artist playlists, and also those from #112, and store them in the db. Write it to be general enough for other kinds of playlists.

  • Likely needs separate package, playlist/ or something.
    • Write so that the generator can be imported in http for from-path purposes. Or maybe let that be?
  • Can revert to true pairwise distance compute rather than via avgs.
  • scheduled service to make playlists.
  • edit http endpoints to pull from stored playlists
  • bump http to py 3.10

Schema something like:

id: int? uuid?
playlist: [ {path: ..., metadata: ..., ...}, ... ]
title: str
username: str
created_ts: ts
collection_name: str (??)
collection_ts: ts

The collection fields are there so we can get the latest artist playlists, etc. Need something to index the latest of collection X, so that the http can pull, e.g., the latest artist playlists.
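A minimal in-memory sketch of the "latest of collection X" lookup, assuming every playlist made in one scheduled run shares the same collection_ts. `StoredPlaylist` and `latest_of_collection` are hypothetical names, and the id type is arbitrarily taken to be an int here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class StoredPlaylist:
    # Mirrors the draft schema; all names here are illustrative.
    id: int
    playlist: list  # [{"path": ..., "metadata": ...}, ...]
    title: str
    username: str
    created_ts: datetime
    collection_name: str
    collection_ts: datetime


def latest_of_collection(rows, name):
    """Return every playlist from the most recent run of one collection."""
    matching = [r for r in rows if r.collection_name == name]
    if not matching:
        return []
    latest = max(r.collection_ts for r in matching)
    return [r for r in matching if r.collection_ts == latest]
```

In the db the same lookup would presumably be served by an index on (collection_name, collection_ts), but that is a guess rather than a decided design.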

Improve revisit releases playlists

  • I see files that are not in the release folder but which match on the name (via compilations, remixes, etc.). Maybe filter down to files with the correct release group assigned?
  • parse track number and order on that explicitly.
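Parsing and ordering on the track number could look roughly like the sketch below. It assumes the tag arrives as a string such as "3", "03" or "3/12"; the dict keys `track_number` and `filepath` and both function names are hypothetical.

```python
import re
from typing import Optional

_LEADING_INT = re.compile(r"\s*(\d+)")


def parse_track_number(raw: Optional[str]) -> Optional[int]:
    """Parse tags like '3', '03' or '3/12' into an int; None if unparseable."""
    if not raw:
        return None
    m = _LEADING_INT.match(raw)
    return int(m.group(1)) if m else None


def order_tracks(tracks):
    """Sort by parsed track number; tracks without one go last, by path."""

    def key(t):
        n = parse_track_number(t.get("track_number"))
        return (n is None, n or 0, t["filepath"])

    return sorted(tracks, key=key)
```

Multi-disc releases would also need the disc number in the sort key, which this sketch ignores.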

Re-enrich mbids every once in a while

Todo: a CLI to pick out enriched data of a certain age, and re-enrich. Maybe a setup to pop N out, to smooth spikes in ingest.

If possible, check updated stamps to determine need for API calls.
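The "pop N out" selection might be sketched as below, assuming the last-enriched timestamp per mbid is available as a mapping. `pick_stale` and its parameters are hypothetical names, not an existing moomoo function.

```python
from datetime import datetime, timedelta, timezone


def pick_stale(enriched, max_age, limit, now=None):
    """Return up to `limit` mbids enriched longer than `max_age` ago, oldest first.

    `enriched` maps mbid -> timezone-aware timestamp of the last enrichment.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - max_age
    stale = sorted((ts, mbid) for mbid, ts in enriched.items() if ts < cutoff)
    return [mbid for _, mbid in stale[:limit]]
```

Running this on a schedule with a small `limit` spreads the re-enrichment over several runs instead of hitting the API with everything at once.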

Playlist: "loved" tracks

  • via track play spikes and/or loved tracks in listenbrainz
  • add logic to minimize same artist.
  • add logic to skip anything recently played
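The two filters could be sketched as a single pass over candidates that are already ordered best first. The function name, the dict keys `filepath` and `artist`, and the default limits are all assumptions for illustration.

```python
from collections import Counter


def build_loved_playlist(candidates, recently_played, max_per_artist=1, limit=20):
    """Pick filepaths from best-first candidates.

    Skips anything in `recently_played` (a set of filepaths) and caps how
    many tracks a single artist can contribute.
    """
    per_artist = Counter()
    picked = []
    for track in candidates:
        if track["filepath"] in recently_played:
            continue
        if per_artist[track["artist"]] >= max_per_artist:
            continue
        per_artist[track["artist"]] += 1
        picked.append(track["filepath"])
        if len(picked) == limit:
            break
    return picked
```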

Try artist/release partitioned playlist ranking

Instead of top N tracks with max X per artist, try top N releases by avg distance, taking the top track per release.

Or, try using other tracks from the artist as well (but at a lower weighting). Anything to be a little more contextual about the media.

SQL

-- seed file(s) to measure distances from
with base as (
    select filepath, embedding
    from moomoo.local_files_flat
    where filepath = '...'
)
-- avg embedding distance from the seed(s) to every other usable file
, distances as (
    select
        local_files_flat.filepath as filepath
        , avg(base.embedding <-> local_files_flat.embedding) as distance

    from base
    cross join moomoo.local_files_flat

    where local_files_flat.embedding_success
      and local_files_flat.embedding_duration_seconds >= 60
      and local_files_flat.artist_mbid is not null
      and local_files_flat.filepath not in (select filepath from base)

    group by local_files_flat.filepath
)

-- rank releases by the avg distance of their tracks
, release_ranks as (
    select
        local_files_flat.release_mbid
        , row_number() over (order by avg(distances.distance)) as distance_rank

    from distances
    inner join moomoo.local_files_flat using (filepath)
    where release_mbid is not null
    group by 1
)

-- rank tracks by distance within each of the top 20 releases
, tracks as (
    select
        local_files_flat.filepath
        , distances.distance
        , row_number() over (
            partition by local_files_flat.release_mbid order by distance
        ) as track_rank

    from distances
    inner join moomoo.local_files_flat using (filepath)
    inner join (select release_mbid from release_ranks where distance_rank <= 20 ) as releases
        using (release_mbid)

)


select filepath
from tracks
where track_rank <= 1
order by distance

HTTP server for playlists

If this is done:

  • could set up auth via api key
  • user would not need to manage db creds
  • writes to db could be easily managed (see #55, no more readonly db users needed)
  • no need for thin clients, maybe just a requests wrapper.

Remove username logic

No need, this is a single-person app anyway.

  • remove from ingest and insert via envvar
  • remove require from http
  • remove from client
  • drop ingest columns

Smarter selection of top artists playlists

Right now I take any artist with > X listens historically; then a random N of those become playlists.

It would be better to serve a mix of

  1. artists with a lot of lifetime listens; but this group is slow changing
  2. artists with a lot of recent listens, which changes quickly.

It would also be good to ensure that there is a sufficient diversity of choices; maybe using an averaged vector.
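A rough sketch of the two-group mix only (it does not attempt the diversity check). It assumes listen counts per artist are available as two mappings, one lifetime and one recent; `pick_artists`, `pool_size` and the other parameter names are hypothetical.

```python
import random


def pick_artists(lifetime, recent, n_lifetime, n_recent, pool_size=50, seed=None):
    """Sample artists from the top lifetime pool and the top recent pool.

    `lifetime` and `recent` map artist -> listen count. Artists already
    picked from the lifetime pool are excluded from the recent pool.
    """
    rng = random.Random(seed)

    def top(counts):
        return sorted(counts, key=counts.get, reverse=True)[:pool_size]

    lifetime_pool = top(lifetime)
    picks = rng.sample(lifetime_pool, min(n_lifetime, len(lifetime_pool)))
    recent_pool = [a for a in top(recent) if a not in picks]
    picks += rng.sample(recent_pool, min(n_recent, len(recent_pool)))
    return picks
```

The split between `n_lifetime` and `n_recent` controls how fast the suggestions turn over: the lifetime pool barely changes between runs, while the recent pool does.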

Fix fresh releases dbt

  • I see a lot of artists that I think I have listened to. Maybe check that it is not busted?
  • maybe add some condition of the artist not being extremely popular?

Library cleanup: duplicate songs

Something like:

-- recordings that appear more than once with near-identical lengths
with t as (
    select recording_md5, count(*) as cnt
    from moomoo.local_files_flat
    where album_name is not null
    group by 1
    having count(*) > 1
      and max(track_length_seconds) - min(track_length_seconds) < 10

)
select t.*, artist_name, album_name, track_name, track_length_seconds
from t
inner join moomoo.local_files_flat
on t.recording_md5 = moomoo.local_files_flat.recording_md5
order by artist_name, album_name, track_name

Map msid to local files

Seems like msids are unique to the track name and artist name, so we can map local file track-artist names to msids in listens.

select recording_msid, count(distinct lower(track_name || '-' || artist_name)) as num_tracks
from moomoo.listens_flat
group by 1
having count(distinct lower(track_name || '-' || artist_name)) > 1;
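If the query above comes back empty, the mapping could be built on the same lowercased track-artist key. A minimal sketch, assuming listens and local files are available as dicts with the column names used above; `name_key` and `map_files_to_msids` are hypothetical names. Since one name pair could still correspond to several msids, the sketch keeps a set per file.

```python
from collections import defaultdict


def name_key(track_name, artist_name):
    """Same key as the SQL: lower(track_name || '-' || artist_name)."""
    return f"{track_name}-{artist_name}".lower()


def map_files_to_msids(listens, local_files):
    """Return {filepath: set of recording_msid} for files whose names match a listen."""
    msids_by_key = defaultdict(set)
    for row in listens:
        key = name_key(row["track_name"], row["artist_name"])
        msids_by_key[key].add(row["recording_msid"])

    mapped = {}
    for f in local_files:
        if not f.get("track_name") or not f.get("artist_name"):
            continue  # untagged files cannot be matched by name
        key = name_key(f["track_name"], f["artist_name"])
        if key in msids_by_key:
            mapped[f["filepath"]] = set(msids_by_key[key])
    return mapped
```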

Option to re-ingest entire feedback table

Right now each run gets the last 100, so data would be lost if the user has >100 feedback entries and the table gets dropped.

Maybe best to be able to nuke it if something weird happens?
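A full re-ingest would probably page through the feedback until a short page comes back. The sketch below assumes the source can be read by offset and page size; `fetch_page` is a hypothetical callable standing in for whatever the real API call is, and nothing here reflects an actual client signature.

```python
def fetch_all_feedback(fetch_page, page_size=100):
    """Collect every feedback row by paging until a short page is returned.

    `fetch_page(offset, count)` is a hypothetical callable that returns a
    list of at most `count` rows starting at `offset`.
    """
    rows = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        rows.extend(page)
        if len(page) < page_size:
            return rows
        offset += page_size
```

For example, with an in-memory list `data`, `fetch_all_feedback(lambda offset, count: data[offset:offset + count])` returns all of `data`.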
