The requirement for this challenge was to use the unsupervised learning technique of k-Means clustering to group cryptocurrencies by their performance in order to create portfolio recommendations.
crypto_market_data.csv - market data of different cryptocurrencies during different time periods
The data was first normalized with StandardScaler. The elbow curve method was then used to find the optimal k value for a k-Means model that uses all of the original features of the dataset.
Elbow curve plot showing a value of 4 for k to be optimal for the dataset with all features
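The normalization and elbow steps can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the columns of `crypto_market_data.csv` are not listed here, so synthetic data from `make_blobs` stands in for it, and the names `scaled_df` and `elbow_df` are hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the market data (assumption: 40 coins, 5 numeric features).
features, _ = make_blobs(n_samples=40, n_features=5, centers=4, random_state=1)
coin_names = [f"coin_{i}" for i in range(40)]

# Normalize each feature to zero mean and unit variance.
scaled_df = pd.DataFrame(
    StandardScaler().fit_transform(features),
    index=coin_names,
    columns=[f"feature_{j}" for j in range(5)],
)

# Fit one k-Means model per candidate k and record its inertia.
k_values = list(range(1, 11))
inertia = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=1)
    model.fit(scaled_df)
    inertia.append(model.inertia_)

# The "elbow" is the k after which inertia stops dropping sharply.
elbow_df = pd.DataFrame({"k": k_values, "inertia": inertia})
print(elbow_df)
```

Plotting `elbow_df` with `k` on the x-axis and `inertia` on the y-axis gives the elbow curve.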
A k-Means model was fit with the best k value and used to predict a cluster for each cryptocurrency, resulting in four clusters. The clusters' inertia was large enough to justify reducing the number of features.
A scatter plot showing 4 clusters with high inertia
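Fitting the model and assigning clusters might look like the sketch below. It again uses synthetic stand-in data rather than the real dataset, and `clustered_df` and the `cluster` column name are assumptions made for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled market data (assumption).
features, _ = make_blobs(n_samples=40, n_features=5, centers=4, random_state=1)
scaled_df = pd.DataFrame(
    StandardScaler().fit_transform(features),
    index=[f"coin_{i}" for i in range(40)],
    columns=[f"feature_{j}" for j in range(5)],
)

# Fit k-Means with k=4 and predict a cluster label for every row.
model = KMeans(n_clusters=4, n_init=10, random_state=1)
labels = model.fit_predict(scaled_df)

# Keep the labels next to the features so they can be used to color a scatter plot.
clustered_df = scaled_df.copy()
clustered_df["cluster"] = labels

print(clustered_df["cluster"].value_counts())
print(f"inertia: {model.inertia_:.2f}")
```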
To reduce the number of features used, Principal Component Analysis (PCA) was applied to reduce the data to three principal components.
DataFrame holding 3 principal components as columns and cryptocurrency as index
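A minimal sketch of the PCA step, under the same assumptions as before (synthetic stand-in data; the column names `PC1` to `PC3` and the variable `pca_df` are hypothetical):

```python
import pandas as pd
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled market data (assumption).
features, _ = make_blobs(n_samples=40, n_features=5, centers=4, random_state=1)
coin_names = [f"coin_{i}" for i in range(40)]
scaled = StandardScaler().fit_transform(features)

# Project the scaled features onto the first three principal components.
pca = PCA(n_components=3)
components = pca.fit_transform(scaled)

# One row per cryptocurrency, one column per principal component.
pca_df = pd.DataFrame(components, index=coin_names, columns=["PC1", "PC2", "PC3"])

print(pca_df.head())
# Share of the total variance retained by each component.
print(pca.explained_variance_ratio_)
```

The sum of `explained_variance_ratio_` indicates how much of the original variance the three components retain, which is one way to judge whether three is enough.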
Then the PCA data was used to recalculate the optimal k value for the k-Means model.
Elbow curve line plot from the PCA data that shows 4 to be the optimal k value
Finally, a new set of clusters was generated by fitting a k-Means model to the PCA data with the best k value.
Scatter plot showing 4 low inertia clusters generated using the PCA dataframe
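Re-clustering on the PCA data and comparing inertia against the full-feature model could be sketched as below. The data is synthetic and the variable names are hypothetical, so the printed numbers will not match the project's results.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled market data (assumption).
features, _ = make_blobs(n_samples=40, n_features=5, centers=4, random_state=1)
coin_names = [f"coin_{i}" for i in range(40)]
scaled = StandardScaler().fit_transform(features)

# Reduce to three principal components.
pca_df = pd.DataFrame(
    PCA(n_components=3).fit_transform(scaled),
    index=coin_names,
    columns=["PC1", "PC2", "PC3"],
)

# Baseline: k-Means on all scaled features.
full_model = KMeans(n_clusters=4, n_init=10, random_state=1).fit(scaled)

# k-Means on the PCA data with the same k.
pca_model = KMeans(n_clusters=4, n_init=10, random_state=1)
pca_clusters = pca_df.copy()
pca_clusters["cluster"] = pca_model.fit_predict(pca_df)

print(f"inertia, all features: {full_model.inertia_:.2f}")
print(f"inertia, PCA features: {pca_model.inertia_:.2f}")
```

When comparing the two values, note that inertia is a sum of squared distances, so part of any drop comes simply from measuring distances in fewer dimensions.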
This project runs in Jupyter Notebook with a Python 3 kernel.
Dependencies used:
- [Jupyter] - Running code
- [Conda] - Dev environment
- [Pandas] - Data analysis
- [Matplotlib] - Data visualization
- [Numpy] - Data calculations & Pandas support
- [hvPlot] - Interactive Pandas plots
- [scikit-learn] - k-Means clustering, PCA, and StandardScaler