Giter Site home page Giter Site logo

cody-lange / milestone-2-spotify-music-clustering-with-unsupervised-learning Goto Github PK

View Code? Open in Web Editor NEW
5.0 1.0 0.0 8.41 MB

Clustering music genres with audio data fetched from the Spotify API, features generated from Librosa, K-Means clustering, agglomerative clustering, and PCA/ t-SNE dimensionality reduction

Jupyter Notebook 54.64% HTML 45.36%
spotify music unsupervised-learning pca t-sne k-means-clustering agglomerative-clustering librosa audio-processing

milestone-2-spotify-music-clustering-with-unsupervised-learning's Introduction

Milestone-2-Spotify-Music-Clustering-with-Unsupervised-Learning

3D Plots of clusters viewable here: https://nbviewer.jupyter.org/github/Cody-Lange/Milestone-2-Spotify-Music-Clustering-with-Unsupervised-Learning/blob/main/Music_Clustering.ipynb

Motivation

Music is always evolving, and at the core of its evolution is the formation of new genres and the reimagination of existing genres. Genres are created and reformed due to the fluidity of their inherent characteristics. Qualities such as form, style, and subject matter are in flux and can overlap between genres and subgenres. The main motivation for our music genre classification project was to explore the differences between genres: how well-defined genre boundaries are according to the characteristics of songs as well as whether an unsupervised learning model would generate classifications similar to an average music listener. We theorize that the borders between genres are becoming blurred and that the broad classifications are similar.

Data Source

We used Spotipy, a Python library designed to integrate with the Spotify Web API. We focused on Spotify-created genre specific playlists spanning across a number of genres including rock, metal, indie, EDM, hip hop, pop, rap, country, classical, Latin, and K-pop. We collected 6 playlists for each genre and used Spotipy to collect details and characteristics about each song including acousticness, danceability, energy, key, loudness, mode, speechiness, instrumentalness, liveness, valence, tempo, duration, and time signature. We also collected mp3s from each song’s 30 second preview. In addition to Spotify’s song characteristics, we leveraged the Librosa package which is used for audio and music analysis. Librosa offered us the ability to analyze waveforms of audio for each song and compute several characteristics of said audio. The main characteristics we chose to incorporate are Mel-frequency cepstral coefficients (MFCCs), commonly used in speech recognition, and spectral contrast which measures the dynamic range between peaks and valleys of the audio’s spectrograph. After removing records for which there were no audio previews available, we ended up with 3089 records to analyze, about 280 records per genre.

Unsupervised Learning Methods

After the data was mined, standard scaling was applied to make the data more suitable for clustering. Afterwards, either principal component analysis or t-sne was applied to find three features that best capture the variance of the data and can generate 3d plots for visualization. PCA was chosen because it works well with larger datasets and preserves global distances between clusters.T-Sne was also chosen because it preserves local distances as opposed to global distances, which was initially thought to possibly make the clusters more distinguishable, as it is a very different approach to dimensional reduction. Models were then clustered via either K-Means clustering or agglomerative clustering, with the first pass attempting to gage whether or not unsupervised models could reconstruct existing genre clusters. K-means was chosen because we already know 11 clusters exist naturally, it is simple, and is guaranteed to converge. Agglomerative clustering was chosen because it was unknown whether or not music data inherently follows a “treelike structure'', it still clusters a predetermined amount of clusters, and captures groups of varying shapes and sizes better. After initial visualizations for clusters of 11 were made, the elbow plot in figure 2 was used to determine a mathematically more “appropriate” cluster size. Elbow plots iterate through K-Means models of different cluster sizes and sums up the squared distances from each point to the centroid of its cluster. (“inertia) It is commonly observed that lower inertia values give clusters that are more distinguishable, and the cluster sizes with acceptable inertia values fall within the range of 3 to 6 clusters. The next set of 3D clustering visualizations used a cluster size of 6 in order to be closer to the ground truth of 11 genres.

image

Figure 1. An elbow plot used assist in determining “appropriate” cluster size

image

Figure 2. K-means Clustering w/ t-SNE - 6 clusters^^

image

Figure 3. K-means Clustering w/ t-SNE - 11 clusters^^

image

Figure 4. K-means Clustering w/ PCA - 6 clusters^^

image

Figure 5. K-Means Clustering w/ PCA - 11 Clusters^^

image

Figure 6. Agglomerative Clustering w/ t-SNE - 11 Clusters^^

image

Figure 7. Agglomerative Clustering w/ PCA - 11 Clusters^^

Evaluation

The 3D clustering visualizations are all listed in an appendix at the end for space reasons.

image

Table 1. Davies Bouldin and Calinski-Harabasz scores for all clusters

We were interested to see what number of clusters would produce the most distinguishable results and how this would compare to the 11 genres we began with. We found that the lines between genres are heavily blurred, especially when talking about ‘modern’ music - pop, rap, and EDM for example. We found that 6 clusters produced the most distinguishable results when differentiating between songs according to the feature representations we chose to use. Built-in audio characteristics such as key, loudness, and energy for instance may well be similar across different genres, especially considering that popular modern music across the top genres carry very similar qualities according to what the public responds positively to. High tempos, danceability, and valence (emotional positivity) are rife among popular music, and these songs are usually the ones inputted into spotify playlists. The PCA biplot in Figure 3. Shows that the first principal components mostly depended on acousticness, loudness, energy, and danceability, although acousticness was inversely related with loudness and energy. This can be attributed to the classical class having large acoustic values. The internal computation of these metrics were loosely defined on their development site, but according to their metrics, genres blur to a very high degree according to these characteristics.

image

Figure 8. PCA biplot of the Spotify data

It is interesting, however, to observe the differences in scores across different clustering methods as applied to our dataset of 3098 songs. We used the Davies Bouldin and Calinski Harabasz scores as our main evaluation metrics. The Davies-Bouldin score dictates the average similarity between clusters, essentially measuring how much overlap exists between separate clusters. Clusters which are more distinct and farther apart from each other will receive a better score, with lower scores indicating better clusters. Overall there was not a lot of variance in the Davies-Bouldin scores, with the K-Means-PCA 11 cluster group having the lowest score, and therefore the most distinguishability among clusters. This came out to be similar to the t-SNE K-means 11 cluster group. The worst score and therefore the most indistinguishable clusters by similarity and distance were in the agglomerative-PCA 11 cluster group and the K-means-PCA 6 cluster group. Interestingly, the K-means 6 cluster group prepared with t-SNE scored much better on the Davies-Bouldin index. These different groups can be referenced visually with the 3-D plots in the appendix section.

The second metric we used to evaluate our clusters was the Calinski-Harabasz index. The Calinski-Harabasz index, also known as Variance Ratio Criterion, is defined as the ratio between within-cluster dispersion and between-cluster dispersion. In this case, a higher score indicates better performance. The Calinski-Harabasz score rewards clusters that are both dense and well-separated, which is intuitive for clustering. In this case, we saw a very clear relationship between the dimensionality reduction technique and the Calinski-Harabasz score. When PCA was used, we saw scores ranging from 111-332, with the K-means-PCA 6 cluster group scoring the best. However, when t-SNE was used we observed a significant increase in scores as compared to the PCA groups. In this case, the K-mean-PCA 6 cluster group was still the best performer at 1252, but the other two t-SNE groups also scored above 1000 on the Calinski-Harabasz score. This proved to us that t-SNE resulted in much denser clusters that were also more distinct. Once again, these different groups can be referenced visually with the 3-D plots in the appendix section.

Discussion It was learned that there is a lot of crossover between the main genres of music, but the K-means t-SNE 6 cluster model best represents the differences between the genres. Although 11 official genres were included in the dataset, it was surprising to find that the songs included were best represented by six different clusters. If we had more time and resources, we could compile more songs into our dataset as well as incorporate more subgenres which would yield interesting results. It would also be interesting to incorporate more characteristics of the spectral waveform for each audio sample to see if those features play a larger role in the classification of songs.

milestone-2-spotify-music-clustering-with-unsupervised-learning's People

Contributors

cody-lange avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.