moomoo's Issues
Album artist max not working
I'm still getting a lot of tracks from the same album artist. Probably because we only check if the count is above the limit for the artist here:
We need logic for the album artist there as well.
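A minimal sketch of what checking both limits could look like. The `Track` record and `limit_per_artist` function are hypothetical names for illustration, not the existing code:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    # Hypothetical minimal record; the real model likely carries more fields.
    filepath: str
    artist: str
    album_artist: str


def limit_per_artist(tracks, limit):
    """Keep tracks in order, skipping any whose artist OR album artist hit the limit."""
    artist_counts = Counter()
    album_artist_counts = Counter()
    kept = []
    for track in tracks:
        if artist_counts[track.artist] >= limit:
            continue
        if album_artist_counts[track.album_artist] >= limit:
            continue
        artist_counts[track.artist] += 1
        album_artist_counts[track.album_artist] += 1
        kept.append(track)
    return kept
```

With a limit of 2, a third track from a "Various Artists" compilation would be skipped even if its track artist is new.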
Use new pgvector action
https://github.com/pgvector/setup-pgvector
- http
- ingest
- ml
Auto suggest themed playlists
Right now the app will suggest artists with high listens and releases to revisit. But maybe a daily selection of playlists from similar songs by different artists would be good?
Locally store ml artifacts
Give huggingface a break with the downloads.
Set up cache in github actions.
Store selected playlists from the client.
- http endpoint to POST a selection
- db schema to store it
- client handler to send post request
Surface new artist suggests in the gui
Store playlists in the database
To enable basic analytics on listens, etc
dbt table mapping listens to local files.
Probably best to add an ingest via https://listenbrainz.readthedocs.io/en/latest/users/api-usage.html#lookup-mbids
This is a baseline feature needed to understand which files are most played, etc; mapping listen taste to local files to make playlists.
Should probably measure % of recent listens with a mapped file, keep track in a view.
Weigh artist playlists source tracks by most listened
Or maybe just use a nonrandom sample and pick the top N?
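Both options could sit behind one function. A rough sketch, assuming listen counts are available as a `{filepath: count}` mapping; `pick_source_tracks` is a made-up name, and the weighted branch uses Efraimidis-Spirakis keys for sampling without replacement:

```python
import random


def pick_source_tracks(listen_counts, n, weighted=True, seed=None):
    """Pick n source tracks from a {filepath: listen_count} mapping."""
    if not weighted:
        # Non-random option: top n by listens, ties broken by path.
        ranked = sorted(listen_counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return [path for path, _ in ranked[:n]]

    rng = random.Random(seed)
    # Weighted sampling without replacement: each track gets the key
    # u ** (1 / weight) and the n largest keys win. Unplayed tracks are skipped.
    keyed = [
        (rng.random() ** (1.0 / count), path)
        for path, count in listen_counts.items()
        if count > 0
    ]
    keyed.sort(reverse=True)
    return [path for _, path in keyed[:n]]
```

The weighted branch still favors heavily played tracks but lets less-played ones through occasionally, so repeated runs do not always seed from the same files.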
Drop/ update embeds
drop if file has been removed, update if mtime is > create time, or if file hash not the same?
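A sketch of that decision as a single function, assuming the embed row stores its creation time as a POSIX timestamp and optionally a file hash. `embed_action` and `file_md5` are hypothetical helpers:

```python
import hashlib
from pathlib import Path


def file_md5(path):
    """Hash the file in chunks so large audio files are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def embed_action(path, embedded_at, stored_hash=None):
    """Return 'drop', 'update' or 'keep' for one stored embedding.

    embedded_at is the embed's create time as a POSIX timestamp (assumption).
    """
    file = Path(path)
    if not file.exists():
        return "drop"  # file was removed from the library
    if file.stat().st_mtime > embedded_at:
        return "update"  # file changed after the embed was created
    if stored_hash is not None and file_md5(file) != stored_hash:
        return "update"  # contents differ even though mtime looks old
    return "keep"
```

The mtime check runs first because it is cheap; the hash is only computed when mtime does not already force an update.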
Improve mapping between CollectionItem and Playlist
Make it so that the playlist can be recreated exactly from the collection item.
- store seeds
- factory method, etc.
Ingest release groups
This would be a requirement to use time data in playlisting, since releases are not always stamped with the original year.
TODO:
- dbt support for release groups in the local files mbids
- re-ingest release annotations with the release-groups include
- add annotations data back into the mbids for release groups in dbt
Pregenerate playlists for client consumption
Pregenerate artist playlists, also those from #112 and store in the db. Write to be general for other kinds.
- Likely needs a separate package, playlist/ or something.
- write so that the generator can be imported in http for from-path purposes. or maybe let that be?
- Can revert to true pairwise distance compute rather than via avgs.
- scheduled service to make playlists.
- edit http endpoints to pull from stored playlists
- bump http to py 3.10
schema something like
id: int? uuid?
playlist: [ {path: ..., metadata:..., ...}, ... ]
title: str
username: str
created_ts: ts
so we can get the latest artist playlists, etc.
collection_name: str (??)
collection_ts: ts
Need something to index latest of collection X, so that the http can pull, e.g., the latest artist playlists.
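One way the schema and the "latest of collection X" lookup might look, sketched in memory with a dataclass. `StoredPlaylist` and `latest_collection` are hypothetical names, and uuid is picked for the id only for illustration:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import UUID, uuid4


@dataclass
class StoredPlaylist:
    playlist: list  # e.g. [{"path": ..., "metadata": ...}, ...]
    title: str
    username: str
    collection_name: str
    collection_ts: datetime  # shared by every playlist made in one generator run
    created_ts: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    id: UUID = field(default_factory=uuid4)


def latest_collection(rows, collection_name):
    """Return the playlists from the most recent run of one collection."""
    matching = [row for row in rows if row.collection_name == collection_name]
    if not matching:
        return []
    latest_ts = max(row.collection_ts for row in matching)
    return [row for row in matching if row.collection_ts == latest_ts]
```

In the database the same lookup would probably be an index on (collection_name, collection_ts) plus a max over collection_ts.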
HTTP/Client integration test
Improve revisit releases playlists
- I see files not in the release folder but which match on the name (via compilations, remixes etc). maybe filter down to files with the correct release group assigned?
- parse track number and order on that explicitly.
Re-enrich mbids every once in a while
Todo: a CLI to pick out enriched data of a certain age, and re-enrich. Maybe a setup to pop N out, to smooth spikes in ingest.
If possible, check updated stamps to determine need for API calls.
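The "pop N out" idea could be sketched like this, assuming the last enrichment time per mbid is available as a mapping. `pick_stale` is a hypothetical helper, not part of the current CLI:

```python
from datetime import datetime, timedelta


def pick_stale(enriched_at, max_age, batch_size, now):
    """Return up to batch_size mbids whose enrichment is older than max_age.

    enriched_at maps mbid -> datetime of the last enrichment (assumed shape).
    Oldest entries come first, so repeated runs work through the backlog
    in small batches instead of one ingest spike.
    """
    cutoff = now - max_age
    stale = sorted((ts, mbid) for mbid, ts in enriched_at.items() if ts < cutoff)
    return [mbid for _, mbid in stale[:batch_size]]
```

Running it on a schedule with a small `batch_size` keeps the number of API calls per run bounded.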
Client interface to choose premade playlists
Playlist: "loved" tracks
- via track play spikes and/or loved tracks in listenbrainz
- add logic to minimize same artist.
- add logic to skip anything recently played
Limit one per album artist as well as artist
Ingest listenbrainz global stats per mbid
To enable comparison of local and global stats.
Try artist/release partitioned playlist ranking
Instead of top N tracks max X per artist. Try top N releases by avg distance; top track per release.
Or, try using other tracks from the artist as well (but at a lower weighting). Anything to be a little more contextual about the media.
SQL
with base as (
    select filepath, embedding
    from moomoo.local_files_flat
    where filepath = '...'
)
, distances as (
    select
        local_files_flat.filepath as filepath
        , avg(base.embedding <-> local_files_flat.embedding) as distance
    from base
    cross join moomoo.local_files_flat
    where local_files_flat.embedding_success
        and local_files_flat.embedding_duration_seconds >= 60
        and local_files_flat.artist_mbid is not null
        and local_files_flat.filepath not in (select filepath from base)
    group by local_files_flat.filepath
)
, release_ranks as (
    select
        local_files_flat.release_mbid
        , row_number() over (order by avg(distances.distance)) as distance_rank
    from distances
    inner join moomoo.local_files_flat using (filepath)
    where release_mbid is not null
    group by 1
)
, tracks as (
    select
        local_files_flat.filepath
        , distances.distance
        , row_number() over (
            partition by local_files_flat.release_mbid order by distance
        ) as track_rank
    from distances
    inner join moomoo.local_files_flat using (filepath)
    inner join (select release_mbid from release_ranks where distance_rank <= 20 ) as releases
        using (release_mbid)
)
select filepath
from tracks
where track_rank <= 1
order by distance
Ingest loved tracks from listenbrainz
HTTP server for playlists
If this is done:
- could set up auth via api key
- user would not need to manage db creds
- writes to db could be easily managed (see #55, no more readonly db users needed)
- no need for thin clients, maybe just a requests wrapper.
File browser, media player integration
Playlist: songs like X, Y, Z
cluster tracks from most played artists etc. create themed playlists from those clusters.
Thin client for playlist creating, cli, etc
so torch does not need to be installed
Remove username logic
No need, this is a single person app anyway
- remove from ingest and insert via envvar
- remove require from http
- remove from client
- drop ingest columns
Smarter selection of top artists playlists
Rn i take any artist with > X listens historically; then a random N of those become playlists.
It would be better to serve a mix of
- artists with a lot of lifetime listens; but this group is slow changing
- artists with a lot of recent listens, which changes quickly.
It would also be good to ensure that there is a sufficient diversity of choices; maybe using an avgd vector.
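A rough sketch of the lifetime/recent mix, leaving the diversity check aside. It assumes the two candidate lists are already computed elsewhere; `pick_artists` and `recent_share` are made-up names:

```python
import random


def pick_artists(lifetime_top, recent_top, n, recent_share=0.5, seed=None):
    """Pick n artists, mixing recent favorites with lifetime favorites.

    lifetime_top / recent_top are lists of artist ids (assumed inputs).
    recent_share is the fraction of slots reserved for recent artists.
    """
    rng = random.Random(seed)
    n_recent = min(len(recent_top), round(n * recent_share))
    picked = rng.sample(list(recent_top), n_recent)
    # Fill the remaining slots from lifetime artists not already picked.
    remaining = [artist for artist in lifetime_top if artist not in picked]
    picked += rng.sample(remaining, min(len(remaining), n - len(picked)))
    return picked
```

The recent slots change quickly while the lifetime slots stay fairly stable, which matches the two groups described above.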
Maybe pivot to fastapi for http?
Pydantic to store playlist objects, easier to manage schemas, etc.
playlist option to weigh unplayed tracks more
Similar user ingest option for a specific time period
or maybe just always all time
Split monopackage to projects
Probably
- Ingest
- Web
- Dbt
- Client
Maybe redis for storing embeddings?
Vicki did it and she seemed happy about it.
Inspect / resolve failed scorer runs
I see many in the db with a '' as the exception?
Fix fresh releases dbt
- i see a lot of artists that i think i have listened to. maybe check not busted?
- maybe add some condition of the artist not being extremely popular?
Pip installable client package.
Or just even instructions on how to do it.
Should be pipx targetable, easily updated.
External docker artifactory?
Try pandora style radio via embeddings model
Factory method for collection -> [playlists]
Rn i have to do
playlists = [item.to_playlist() for item in collection.playlists]
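The factory could be as small as a method on the collection. A sketch with stand-in classes; only `to_playlist()` and `collection.playlists` come from the snippet above, the fields and the `to_playlists` name are assumptions:

```python
from dataclasses import dataclass, field


@dataclass
class Playlist:
    # Stand-in for the real Playlist class.
    title: str
    tracks: list


@dataclass
class CollectionItem:
    # Stand-in for the real collection item; fields are illustrative only.
    title: str
    tracks: list

    def to_playlist(self):
        return Playlist(title=self.title, tracks=list(self.tracks))


@dataclass
class PlaylistCollection:
    playlists: list = field(default_factory=list)

    def to_playlists(self):
        """Hypothetical factory: convert every item in one call."""
        return [item.to_playlist() for item in self.playlists]
```

Callers would then write `collection.to_playlists()` instead of repeating the list comprehension.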
Test pipeline with postgres fixture
Local files to playlist generator
To enable strawberry, clementine, etc playlists.
Library cleanup: duplicate songs
Something like:
with t as (
    select recording_md5, count(1) as cnt
    from moomoo.local_files_flat
    where album_name is not null
    group by 1
    having count(*) > 1
        and max(track_length_seconds) - min(track_length_seconds) < 10
)
select t.*, artist_name, album_name, track_name, track_length_seconds
from t
inner join moomoo.local_files_flat
    on t.recording_md5 = moomoo.local_files_flat.recording_md5
order by artist_name, album_name, track_name
Do not lock up app while db refresh is running
I have noticed some live-db playlist functions error out when the refresh is running.
Is dbt dropping views on refresh? maybe make them tables
Precompute pairwise distances
So that way making a playlist is as fast as possible.
Map msid to local files
Seems like msids are unique to the track name and artist name, so we can map local file track-artist names to msids in listens.
select recording_msid, count(distinct lower(track_name || '-' || artist_name)) as num_tracks
from moomoo.listens_flat
group by 1
having count(distinct lower(track_name || '-' || artist_name)) > 1;
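If the query confirms the assumption, the mapping itself could use the same lowercased "track-artist" key. A sketch over plain tuples; `name_key` and `map_files_to_msids` are hypothetical names, and the tuple shapes are assumptions:

```python
def name_key(track_name, artist_name):
    """Same key as the query: lower(track_name || '-' || artist_name)."""
    return f"{track_name}-{artist_name}".lower()


def map_files_to_msids(listens, local_files):
    """Map each local file to the msids that share its track/artist name.

    listens: iterable of (recording_msid, track_name, artist_name)
    local_files: iterable of (filepath, track_name, artist_name)
    """
    key_to_msids = {}
    for msid, track, artist in listens:
        key_to_msids.setdefault(name_key(track, artist), set()).add(msid)

    mapping = {}
    for filepath, track, artist in local_files:
        # Sorted list so the output is stable; empty when nothing matches.
        mapping[filepath] = sorted(key_to_msids.get(name_key(track, artist), ()))
    return mapping
```

Files that map to an empty list are the ones that would count against the "% of listens with a mapped file" style of metric.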
Playlist: Albums to revisit
- Albums with some high number of listens, without any recently.
Option to re-ingest entire feedback table
rn each run gets the last 100; so data would be lost if the user has >100 feedback and the table gets dropped.
maybe best to be able to nuke it if something weird happens?
Assert PlaylistCollection is unique at the collection name / username level.
also collection item is unique at collection index / collection id level
Package playlist for install in http
without using a git hash
Surface HTTP version in the GUI.