Giter Site home page Giter Site logo

shumph10 / cryptocurrencies Goto Github PK

View Code? Open in Web Editor NEW
0.0 1.0 0.0 10.11 MB

Unsupervised machine learning was used to establish a classification system for actively trading cryptocurrencies for potential investment prospects.

Jupyter Notebook 100.00%
cleaning-data crypocurrency data-analysis elbow-curves hvplot pca-analysis pandas-python scikit-learn

cryptocurrencies's Introduction

Cryptocurrencies

Analysis of crytpocurrency data from CryptoCompare. Unsupervised machine learning was used to establish a classification system for actively trading cryptocurrencies for potential investment prospects.

##Resources

Data Source: Crypto_data.csv, CryptoCompare

Software: Python 3.9.7, VisualStudio Code 1.69.1, Jupyter Notebook 6.4.0, Scikit-Learn 1.0, Pandas 1.3.5, hvplot 0.7.3


##Preprocessing:

  • Data was cleaned to remove features unable to be processed through machine learning, null values and cryptocurrencies that were not actively trading were dropped.The resulting dataframe is as shown below:

cleaned_df

  • The data was scaled to ensure all factors were weighted accurately in training, and string values were encoded to be numerically representated.
  • The dimensions of the data were then reduced using Principle Component Analysis, yeilding 3 principal components which represent original data through mathematical manipulation involving the creation of matricies between data points and calculation of eignenvalues and eigenvectors. The resulting dataframe and its dimensions are as shown:

pcs_df


##Clustering using KMeans:

  • After this preprocessing was complete, an elbow curve was created using the KMeans inertia of different k values(# of clusters) to find the ideal number of clusters to be used for analysis. KMeans will be used to cluster points based on the mean of all points belonging to that cluster. Inertia was used as the objective function here because it measures the amount of variation in the dataset. Using the value for k where the line switches from a vertical to horizontal direction to minimize variation and maximize functionality, we can see from the resulting elbow graph, the best value for k is 4. Therefore out analysis will utilize 4 clusters or categories for the cryptocurrencies.

elbow_curve

  • KMeans algorithm was applied to the PCA crypto data with 4 clusters as discussed above. The model was used to make predictions on what cluster (or class) the cryptocurrency would belong to. The class data was then added to the original cleaned dataframe, and joined with the PCA dataframe so all information would be readily available in one dataset as shown below:

clustered_df


##Visualization of Results

  • The PCA data and the predicted class from the unsupervised learning model were plotted in a 3D graph for best possible visualization of cluster seperation. From this graph the class #2(shown in orange) is shown to only have one data point within it, since it is relatively far from all other groups this is most likely an outlier and after due diligence inspection may be considered to be dropped as an outlier. From all angles shown there is minimal seperation between classes 0 and 1 (shown in purple and pink respectively) and further investigation using a different unsupervised learning method may be required to establish the seperation between the two.
  • The resulting graph is shown from two angles for best visualization of results below:

3D_plot_clusters 3D_plot_clusters_angle_2

  • A table was then created to show the original features vs the class assigned by the KMeans algorithm. Original, unscaled data was provided in this graph for better depiction of difference total coins mined and coin supply. Table in crypto_clustering.ipynb is interactive allowing sorting (ascending or decending) by column. When sorting the table by total coins mined from most to least, it appears as through the coins with the most coins mined are typically in classes 0 or 1. Additional graphing or analysis will be done in the next section to visualize this observation. A sample of the table is as shown below:

tradable_crypto_table_sorted

  • For additional clarity and visualization purposed, a new dataframe with only the scaled total coin supply and total coins mined, along with name and class was created. Scaling of these features allowed graphing to explore the connection between these features and class as discussed in the previous section. The graph shows 2 outliers rather than one as previously observed by the 3D graph. When discounting these outliers and zooming in on the area of the graph the rest of the data is concentrated there again is not a clear distinction between classes 0 and 1. The data points in class 3 appear to be clustered around the total coin supply and coins mined being nearly zero, which may be a reason the unsupervised machine learning model placed them in a group.
  • The graph created is show below, first at full size, then zoomed into the more concentrated data area.

coins_mined_vs_supply coins_mined_vs_supply_zoomed


##Summary

From the analysis and cleaning of the data, 532 cryptocurrencies were able to be analyzed through unsupervised machine learning using the KMeans algorithm. From this, four classes of cryptocurrencies were identified. Of these four, class 2 appears to contain only one data point that is likely an outlier. Class 3 contains cryptocurrencies that have both a low total coins mined and low total coin supply count. This could indicate cryptocurrencies within class 3 may not be the best candidates for investment as there is both low supply and low demand. Classes 0 and 1 would need additional research and analysis to determine the features that seperate the two and what features make them the best candidates for investment. Additional analysis could be done on sale/buy percentages for a timeframe to see consumer trends, price fluctuation, if the type of coin has significant influence, or if the platform the currency is avaiable for sale on has influence.


##Contact Me

Email: [email protected]
LinkedIn

cryptocurrencies's People

Contributors

shumph10 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.