moomoo's Introduction

moomoo: Nolan's Homemade Music Recommendation System.

I want to ditch Spotify and the other major streaming platforms, but I also love the recommendation and playlisting products they provide. I won't leave as long as I can't get that service elsewhere, so let's see if open source software can get me what I need.

Moomoo is very much an ongoing effort. Nobody else should even read this, let alone try to deploy it themselves.

CI Status

  • ingest
  • ml
  • dbt
  • client
  • playlist
  • http

Architecture

Moomoo is currently composed of 6 components that work together through a central Postgres database. The pgvector extension is needed in that database to store and manage ML embeddings.

The general setup is:

  • ml and ingest populate base tables in Postgres (with some exceptions in ingest). These modules are run via a scheduler such as Airflow.
  • dbt merges tables and populates tested/consumable data. It also populates a main list of mbids which are consumed by some ingest jobs.
  • playlist contains a combination of library code for creating playlists (for re-use in http) and CLI handlers for saving collections of playlists to the database.
  • http provides a webserver through which playlists are requested. Database access from the client is managed exclusively through the http api.
  • client provides an installable (via pipx, etc) package for local clients. Its requirements are minimal and should be lightweight. It currently only contains a CLI to generate playlists, but will likely evolve into a larger GUI application.

Each component requires its own envvars and other setup, so see the docs within each component's folder. I have no advice on how to orchestrate these services.

moomoo's People

Contributors

nolanbconaway

moomoo's Issues

Playlist: "loved" tracks

  • via track play spikes and/or loved tracks in listenbrainz
  • add logic to minimize same artist.
  • add logic to skip anything recently played
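
The two filtering rules above could be sketched roughly as follows. This is only an illustration: the `(filepath, artist)` candidate shape, the `last_played` mapping, and the `max_per_artist` / `recent_days` defaults are all assumptions, not anything that exists in moomoo.

```python
from datetime import datetime, timedelta


def filter_candidates(candidates, last_played, now, max_per_artist=2, recent_days=7):
    """Keep ranked candidates, capping tracks per artist and skipping recent plays.

    candidates: list of (filepath, artist) tuples, best first (hypothetical shape).
    last_played: dict mapping filepath -> datetime of the most recent listen.
    """
    cutoff = now - timedelta(days=recent_days)
    per_artist = {}
    kept = []
    for filepath, artist in candidates:
        played = last_played.get(filepath)
        if played is not None and played >= cutoff:
            continue  # played too recently
        if per_artist.get(artist, 0) >= max_per_artist:
            continue  # already enough tracks from this artist
        per_artist[artist] = per_artist.get(artist, 0) + 1
        kept.append(filepath)
    return kept
```

Because the candidates are walked in rank order, the cap keeps each artist's best tracks and drops the rest.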

Map msid to local files

Seems like msids are unique to the track name and artist name, so we can map local file track-artist names to msids in listens.

select recording_msid, count(distinct lower(track_name || '-' || artist_name)) as num_tracks
from moomoo.listens_flat
group by 1
having count(distinct lower(track_name || '-' || artist_name)) > 1;

Try artist/release partitioned playlist ranking

Instead of top N tracks with a max of X per artist, try top N releases by avg distance, then the top track per release.

Or, try using other tracks from the artist as well (but at a lower weighting). Anything to be a little more contextual about the media.

SQL

with base as (
    select filepath, embedding
    from moomoo.local_files_flat
    where filepath = '...'
)
, distances as (
    select
        local_files_flat.filepath as filepath
        , avg(base.embedding <-> local_files_flat.embedding) as distance

    from base
    cross join moomoo.local_files_flat

    where local_files_flat.embedding_success
      and local_files_flat.embedding_duration_seconds >= 60
      and local_files_flat.artist_mbid is not null
      and local_files_flat.filepath not in (select filepath from base)

    group by local_files_flat.filepath
)

, release_ranks as (
    select
        local_files_flat.release_mbid
        , row_number() over (order by avg(distances.distance)) as distance_rank

    from distances
    inner join moomoo.local_files_flat using (filepath)
    where release_mbid is not null
    group by 1
)

, tracks as (
    select
        local_files_flat.filepath
        , distances.distance
        , row_number() over (
            partition by local_files_flat.release_mbid order by distance
        ) as track_rank

    from distances
    inner join moomoo.local_files_flat using (filepath)
    inner join (select release_mbid from release_ranks where distance_rank <= 20 ) as releases
        using (release_mbid)

)


select filepath
from tracks
where track_rank <= 1
order by distance
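
For experimenting outside the database, the same ranking idea (top N releases by average distance, then the closest track per release) might look like this in Python. It is a sketch over in-memory tuples; the `(filepath, release_mbid, distance)` input shape is an assumption, and the distances are taken as already computed.

```python
from collections import defaultdict


def release_partitioned(tracks, n_releases=20):
    """Return one filepath per top release, ordered by track distance.

    tracks: list of (filepath, release_mbid, distance) tuples (hypothetical shape).
    """
    by_release = defaultdict(list)
    for filepath, release_mbid, distance in tracks:
        if release_mbid is not None:  # mirror the SQL: skip tracks without a release
            by_release[release_mbid].append((distance, filepath))

    def avg_distance(release_mbid):
        rows = by_release[release_mbid]
        return sum(d for d, _ in rows) / len(rows)

    top_releases = sorted(by_release, key=avg_distance)[:n_releases]
    best_tracks = [min(by_release[r]) for r in top_releases]  # closest track per release
    return [filepath for _, filepath in sorted(best_tracks)]
```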

Remove username logic

No need; this is a single-person app anyway.

  • remove from ingest and insert via envvar
  • remove require from http
  • remove from client
  • drop ingest columns

Option to re-ingest entire feedback table

Right now each run gets the last 100, so data would be lost if the user has >100 feedback entries and the table gets dropped.

Maybe best to be able to nuke it if something weird happens?
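
A full re-ingest would presumably page through all feedback instead of taking only the latest 100. A minimal sketch of that loop, assuming a hypothetical `fetch_page(offset, count)` callable that wraps whatever API call ingest already makes:

```python
def fetch_all_feedback(fetch_page, page_size=100):
    """Collect every feedback row by paging until a short or empty page comes back.

    fetch_page(offset, count) is a hypothetical callable returning a list of rows.
    """
    rows = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        rows.extend(page)
        if len(page) < page_size:
            return rows  # a short page means nothing is left
        offset += page_size
```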

Drop/update embeds

Drop if the file has been removed; update if mtime is > create time, or if the file hash is not the same?
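
One possible reading of that rule, as a sketch: prefer the hash when both a stored and a current hash are available, and fall back to the mtime comparison otherwise. The function names and the choice of md5 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def inspect_file(path):
    """Return (exists, mtime, md5) for a path; mtime and md5 are None if missing."""
    p = Path(path)
    if not p.exists():
        return False, None, None
    return True, p.stat().st_mtime, hashlib.md5(p.read_bytes()).hexdigest()


def embed_action(exists, mtime, embed_created_ts, current_md5=None, stored_md5=None):
    """Return 'drop', 'update', or 'keep' for one stored embedding."""
    if not exists:
        return "drop"
    if stored_md5 is not None and current_md5 is not None:
        # Prefer the hash when both sides have one: mtime can change without content changing.
        return "update" if current_md5 != stored_md5 else "keep"
    return "update" if mtime > embed_created_ts else "keep"
```

Keeping the decision separate from the filesystem lookup makes the rule easy to test without touching real files.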

Improve revisit releases playlists

  • I see files that are not in the release folder but which match on the name (via compilations, remixes, etc). Maybe filter down to files with the correct release group assigned?
  • parse track number and order on that explicitly.
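
Parsing the track number could be sketched like this, assuming the tag arrives as a string such as "3", "03" or "3/12" (or is missing). Disc numbers are ignored here, which a real version would probably need to handle.

```python
import re


def parse_track_number(raw):
    """Return the leading integer of a track-number tag, or None if there is none."""
    if raw is None:
        return None
    match = re.match(r"\s*(\d+)", str(raw))
    return int(match.group(1)) if match else None


def order_release_tracks(files):
    """Sort (filepath, raw_track_number) pairs by parsed number; unparsable ones go last."""
    def key(item):
        filepath, raw = item
        number = parse_track_number(raw)
        return (number is None, number or 0, filepath)

    return [filepath for filepath, _ in sorted(files, key=key)]
```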

Re-enrich mbids every once in a while

Todo: a CLI to pick out enriched data of a certain age, and re-enrich. Maybe a setup to pop N out, to smooth spikes in ingest.

If possible, check updated stamps to determine need for API calls.
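
The "pop N out" selection might look something like the sketch below. The `enriched_at` mapping and the 90-day / 100-row defaults are placeholders; the real data would come from the enrichment tables.

```python
from datetime import datetime, timedelta


def pick_stale_mbids(enriched_at, now, max_age_days=90, limit=100):
    """Return up to `limit` mbids enriched more than max_age_days ago, oldest first.

    enriched_at: dict mapping mbid -> datetime of the last enrichment (hypothetical shape).
    """
    cutoff = now - timedelta(days=max_age_days)
    stale = sorted((ts, mbid) for mbid, ts in enriched_at.items() if ts < cutoff)
    return [mbid for _, mbid in stale[:limit]]
```

Capping each run at `limit` spreads a backlog over several runs, which is the smoothing mentioned above.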

Library cleanup: duplicate songs

Something like:

with t as (
    select recording_md5, count(1) as cnt
    from moomoo.local_files_flat
    where album_name is not null
    group by 1
    having count(*) > 1
      and max(track_length_seconds) - min(track_length_seconds) < 10

)
select t.*, artist_name, album_name, track_name, track_length_seconds
from t
inner join moomoo.local_files_flat
on t.recording_md5 = moomoo.local_files_flat.recording_md5
order by artist_name, album_name, track_name

HTTP server for playlists

If this is done:

  • could set up auth via api key
  • user would not need to manage db creds
  • writes to db could be easily managed (see #55, no more readonly db users needed)
  • no need for thin clients, maybe just a requests wrapper.

Ingest release groups

This would be a requirement to use time data in playlisting, since releases are not always stamped with the original year.

TODO:

  • dbt support for release groups in the local files mbids
  • re-ingest release annotations with the release-groups include
  • add annotations data back into the mbids for release groups in dbt

Auto suggest themed playlists

Right now the app will suggest artists with high listens and releases to revisit. But maybe a daily selection of playlists from similar songs by different artists would be good?

Pregenerate playlists for client consumption

Pregenerate artist playlists, plus those from #112, and store them in the db. Write it to be general for other kinds.

  • Likely needs separate package, playlist/ or something.
    • write so that the generator can be imported in http for from-path purposes. or maybe let that be?
  • Can revert to true pairwise distance compute rather than via avgs.
  • scheduled service to make playlists.
  • edit http endpoints to pull from stored playlists
  • bump http to py 3.10

schema something like

id: int? uuid?
playlist: [ {path: ..., metadata:..., ...}, ... ]
title: str
username: str
created_ts: ts

Plus, so we can get the latest artist playlists, etc.:
collection_name: str (??)
collection_ts: ts

Need something to index latest of collection X, so that the http can pull, e.g., the latest artist playlists.
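
To make the schema idea concrete, here is a rough in-memory sketch using a dataclass, with a helper that pulls the latest batch of a collection. The `uuid` id, the field types, and the collection names used with it are assumptions; the real thing would be a database table and a query.

```python
from dataclasses import dataclass, field
from datetime import datetime
from uuid import UUID, uuid4


@dataclass
class StoredPlaylist:
    title: str
    username: str
    playlist: list  # e.g. [{"path": ..., "metadata": ...}, ...]
    collection_name: str
    collection_ts: datetime
    created_ts: datetime
    id: UUID = field(default_factory=uuid4)


def latest_collection(rows, collection_name):
    """Return the playlists from the most recent batch of one collection."""
    matching = [r for r in rows if r.collection_name == collection_name]
    if not matching:
        return []
    latest_ts = max(r.collection_ts for r in matching)
    return [r for r in matching if r.collection_ts == latest_ts]
```

Stamping every playlist in a batch with the same `collection_ts` is what lets "latest of collection X" be a simple max-then-filter.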

Smarter selection of top artists playlists

Right now I take any artist with > X listens historically; then a random N of those become playlists.

It would be better to serve a mix of

  1. artists with a lot of lifetime listens; but this group is slow changing
  2. artists with a lot of recent listens, which changes quickly.

It would also be good to ensure that there is a sufficient diversity of choices; maybe using an averaged vector.
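
The lifetime/recent mix could be sketched as two random draws from two pools. Everything here (the count dicts, the thresholds, the defaults) is hypothetical, and the diversity check is left out.

```python
import random


def pick_artists(lifetime_counts, recent_counts, n_lifetime=3, n_recent=3,
                 min_lifetime=50, min_recent=5, seed=None):
    """Pick artists from a slow-changing lifetime pool and a fast-changing recent pool.

    lifetime_counts / recent_counts: dicts mapping artist -> listen count.
    """
    rng = random.Random(seed)
    lifetime_pool = sorted(a for a, c in lifetime_counts.items() if c >= min_lifetime)
    picked_lifetime = rng.sample(lifetime_pool, min(n_lifetime, len(lifetime_pool)))
    # Exclude artists already chosen so the two groups never overlap.
    recent_pool = sorted(
        a for a, c in recent_counts.items()
        if c >= min_recent and a not in picked_lifetime
    )
    picked_recent = rng.sample(recent_pool, min(n_recent, len(recent_pool)))
    return picked_lifetime + picked_recent
```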

Fix fresh releases dbt

  • I see a lot of artists that I think I have listened to. Maybe check that it's not busted?
  • maybe add some condition of the artist not being extremely popular?
