This repository contains a dataset of Spotify tracks across various genres, complete with audio features. The goal is to predict track popularity based on these features and gain insights into the most popular genres and artists.
Before you begin, ensure you have the following requirements in place:
- Python 3.x
- Libraries listed in requirements.txt. Install them using pip install -r requirements.txt.
- Clone this repository to your local machine: git clone https://github.com/yourusername/spotify_data_insights.git
- Navigate to the project directory: cd spotify_data_insights
- Install the dependencies: pip install -r requirements.txt
To get started, open the Jupyter Notebook provided in this repository. Follow the instructions below:
- Open Spotify_Tracks_Popularity_Prediction.ipynb in Jupyter Notebook.
- Run the cells in order to load and preprocess the dataset, perform exploratory data analysis (EDA), train machine learning models, and visualize the results.
Here's a breakdown of the dataset columns:
- genre: The genre of the track.
- track_id: The Spotify ID of the track.
- track_name: The name of the track.
- popularity: The popularity of the track.
- year: The year the track was released.
- danceability: A measure of how suitable the track is for dancing.
- duration_ms: The duration of the track in milliseconds.
- energy: A measure of intensity and activity.
- loudness: The overall loudness of the track in decibels (dB).
- mode: Indicates the modality (major or minor) of the track.
- key: The estimated overall key of the track.
- acousticness: A confidence measure of whether the track is acoustic.
- speechiness: Detects the presence of spoken words in a track.
- instrumentalness: Predicts whether a track contains no vocals.
- liveness: Detects the presence of an audience in the recording.
- valence: A measure describing the musical positiveness conveyed by a track.
- tempo: The overall estimated tempo of a track in beats per minute (BPM).
- time_signature: An estimated overall time signature of a track.
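To make the column list concrete, here is a minimal sketch that checks a CSV header against the columns above and converts one row to typed values. It uses only the standard library so it runs without the project's dependencies. The split into integer and float columns, the helper names `missing_columns` and `parse_row`, and the one-row sample are assumptions made for illustration; they are not taken from the notebook or the dataset.

```python
import csv
import io

# Column names as listed in this README.
EXPECTED_COLUMNS = [
    "genre", "track_id", "track_name", "popularity", "year",
    "danceability", "duration_ms", "energy", "loudness", "mode", "key",
    "acousticness", "speechiness", "instrumentalness", "liveness",
    "valence", "tempo", "time_signature",
]
# Assumed types: which columns hold whole numbers and which hold decimals.
INT_COLUMNS = {"popularity", "year", "duration_ms", "mode", "key", "time_signature"}
FLOAT_COLUMNS = {
    "danceability", "energy", "loudness", "acousticness", "speechiness",
    "instrumentalness", "liveness", "valence", "tempo",
}


def missing_columns(header):
    """Return the expected columns that are absent from a CSV header."""
    present = set(header)
    return [name for name in EXPECTED_COLUMNS if name not in present]


def parse_row(row):
    """Convert one csv.DictReader row (all strings) into typed values."""
    typed = {}
    for name, value in row.items():
        if name in INT_COLUMNS:
            typed[name] = int(float(value))  # tolerate values written as "4.0"
        elif name in FLOAT_COLUMNS:
            typed[name] = float(value)
        else:
            typed[name] = value  # text columns and unlisted columns stay strings
    return typed


# Hypothetical one-row sample; the values are made up, not from the dataset.
SAMPLE_CSV = (
    ",".join(EXPECTED_COLUMNS) + "\n"
    "pop,abc123,Example Track,70,2020,0.65,210000,0.8,-5.2,1,7,"
    "0.1,0.05,0.0,0.12,0.55,120.0,4\n"
)

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
print(missing_columns(reader.fieldnames))
first = parse_row(next(reader))
print(first["genre"], first["popularity"], first["tempo"])
```

To try this against the real file, replace the `io.StringIO(SAMPLE_CSV)` argument with an open handle to data/spotify_tracks.csv; any columns in the file that are not listed above are passed through unchanged.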
The Jupyter Notebook includes comprehensive EDA, highlighting key insights and visualizations to help understand the dataset better.
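One typical EDA step is ranking genres by popularity. The sketch below shows one way to do that with the standard library alone. It assumes "most popular" means highest mean popularity per genre, which this README does not state, and the helper name `rank_genres` and the sample rows are hypothetical.

```python
from collections import defaultdict


def rank_genres(tracks, top_n=3):
    """Return the top_n (genre, mean popularity) pairs, highest mean first."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for track in tracks:
        totals[track["genre"]] += track["popularity"]
        counts[track["genre"]] += 1
    means = {genre: totals[genre] / counts[genre] for genre in totals}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Made-up rows for illustration only; not taken from the dataset.
sample_tracks = [
    {"genre": "pop", "popularity": 80},
    {"genre": "pop", "popularity": 60},
    {"genre": "rock", "popularity": 50},
    {"genre": "jazz", "popularity": 30},
    {"genre": "jazz", "popularity": 40},
]

for genre, mean_popularity in rank_genres(sample_tracks):
    print(f"{genre}: {mean_popularity:.1f}")
```

With pandas installed, the same ranking is usually a one-line group-by on the genre column; the explicit loop here is only meant to show what that aggregation does.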
- Spotify_Tracks_Popularity_Prediction.ipynb: The Jupyter Notebook for the project.
- data/: Directory containing the dataset (spotify_tracks.csv).
- requirements.txt: List of Python dependencies for the project.
The exploratory data analysis revealed that the most popular genres were Pop, Hip-Hop, and Rock. The most popular artists were NewJeans, Elley Duhe, and Rema. The most popular tracks were "Shakira: Bzrp Music Sessions", "Die For You - Remix", and "Calm Down (with Selena Gomez)". Some of the charts generated during the EDA are shown below:
The best-performing model was a Random Forest Regressor with a mean absolute error (MAE) of 9.5. A second Random Forest Regressor was then trained on only the important features identified by the first model, and it achieved an MAE of 0.49.
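For readers unfamiliar with the metric, MAE is the average absolute difference between predicted and actual values, in the same units as the target. Spotify reports popularity on a 0 to 100 scale, so if the popularity column keeps that scale, an MAE of 9.5 means predictions are off by about 9.5 points on average. The sketch below is a plain-Python stand-in for `sklearn.metrics.mean_absolute_error`; the sample scores are made up and are not results from this project.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one value is required")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up popularity scores for illustration; not results from this project.
actual_popularity = [70, 45, 10, 88]
predicted_popularity = [62, 50, 25, 80]

# Absolute errors are 8, 5, 15, and 8, so the mean is 36 / 4.
print(mean_absolute_error(actual_popularity, predicted_popularity))
```

MAE values are only comparable when both models are scored on the same target scale and the same held-out data, which is worth confirming in the notebook before comparing the two figures above.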