Data mining is a fundamental skill for massive data analysis. At a high level, it allows the analyst to discover patterns in data and transform them into a usable product. This course teaches data mining algorithms for analyzing very large data sets. It has an applied focus: it prepares students to use data mining techniques to solve real-world problems.

No. | Main Application | Programming | Tags | Version | Score |
---|---|---|---|---|---|
1 | Spark Data Exploration | Python & Spark & Scala | MapReduce Spark Pyspark | Python 3.6, JDK 1.8, Scala 2.11, and Spark 2.4.4 | 7.7/7 |
2 | Frequent itemsets | Python & Spark | SON Limited-Pass MapReduce | Python 3.6, JDK 1.8, Scala 2.12, and Spark 3.1.2 | 7/7 |
3 | Collaborative-filtering recommendation systems | Python & Spark | LSH MinHash XGBregressor | Python 3.6, JDK 1.8, Scala 2.12, and Spark 3.1.2 | 7/7 |
4 | Community Detection | Python & Spark | Girvan-Newman algorithm Betweenness Graph LPA GraphFrames | Python 3.6, JDK 1.8, Scala 2.12, and Spark 3.1.2 | 7/7 |
5 | Mining Stream Data | Python & Spark | Bloom Filtering Flajolet-Martin algorithm Fixed Size Sampling | Python 3.6, JDK 1.8, Scala 2.12, and Spark 3.1.2 | 7/7 |
6 | Clustering | Python & Spark | KMeans BFR Algorithm | Python 3.6, JDK 1.8, Scala 2.12, and Spark 3.1.2 | 7/7 |