Some mini projects for Introduction to Data mining techniques
Note that the notebooks do not all focus on solving the challenges; many are experiments and comparisons of models, algorithms, and engineering techniques.
The following mini projects are included:
- 1. bbc_text_categorization
- 2. emnist_handwriten_character_digits_recognition
- 3. fake_jobs_classification
- 4. market_basket_association_rules
- 5. mushroom_classification
- 6. people_interest_clustering
- 7. red_wine_quality
- Notebooks can be found at bbc_text_categorization/.
- Dataset from this Kaggle Challenge: BBC articles fulltext and category
- Tests several Naive Bayes variants for document categorization with different feature extraction methods: binary vectorization, count vectorization, and TF-IDF.
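
As a rough illustration of that comparison, the sketch below pairs each feature extraction method with a Naive Bayes model using scikit-learn. The toy corpus, the labels, and the choice of `BernoulliNB` for binary features and `MultinomialNB` for counts and TF-IDF are assumptions made for this example; the notebooks may be set up differently.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy corpus standing in for the BBC articles (hypothetical data).
docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "shares fell as the market closed",
    "the bank raised interest rates",
]
labels = ["sport", "sport", "business", "business"]

# One pipeline per feature extraction method.
pipelines = {
    "binary": make_pipeline(CountVectorizer(binary=True), BernoulliNB()),
    "count": make_pipeline(CountVectorizer(), MultinomialNB()),
    "tfidf": make_pipeline(TfidfVectorizer(), MultinomialNB()),
}


def fit_and_predict(texts):
    """Fit every pipeline on the toy corpus and predict labels for `texts`."""
    return {
        name: list(pipe.fit(docs, labels).predict(texts))
        for name, pipe in pipelines.items()
    }


predictions = fit_and_predict(["the goal decided the match"])
for name, predicted in predictions.items():
    print(name, predicted)
```

On a real dataset, the same dictionary of pipelines can be passed through `cross_val_score` to compare the three representations on equal terms.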
- Notebooks can be found at emnist_handwriten_character_recognition/
- Data is sampled from a part of the EMNIST handwritten character digits dataset
- Experiments with CNNs and other deep learning techniques using the Keras framework.
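
For orientation, here is a minimal sketch of a small CNN in Keras for 28x28 grayscale EMNIST images. The layer sizes, the dropout rate, and `NUM_CLASSES = 47` (the class count of the EMNIST Balanced split) are assumptions for this example, not the architecture used in the notebooks.

```python
import numpy as np
import keras
from keras import layers

# Assumption: 47 classes, as in the EMNIST "Balanced" split.
# Adjust to match the sampled subset actually used.
NUM_CLASSES = 47


def build_cnn(num_classes=NUM_CLASSES):
    """Build and compile a small convolutional classifier."""
    model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),           # 28x28 grayscale images
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dropout(0.5),                      # regularization
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",   # integer class labels
        metrics=["accuracy"],
    )
    return model


model = build_cnn()
# Forward pass on dummy input, only to check the output shape.
probs = model.predict(np.zeros((2, 28, 28, 1), dtype="float32"), verbose=0)
print(probs.shape)
```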
- Notebooks can be found at fake_jobs_classification/.
- Dataset from this Kaggle Challenge: [Real or Fake] Fake JobPosting Prediction
- Notebooks can be found at market_basket_association_rules/.
- Dataset from this Kaggle Challenge: market_basket
- Association rule mining with the Apriori algorithm.
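
To make the idea concrete, the following is a minimal pure-Python sketch of Apriori and rule generation on a toy set of baskets. The baskets, thresholds, and function names are hypothetical; the notebooks may rely on a library implementation instead.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items}
    level = {s for s in level if support(s) >= min_support}

    frequent = {}
    k = 1
    while level:
        frequent.update({s: support(s) for s in level})
        k += 1
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c) >= min_support}
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules above the threshold."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((antecedent, itemset - antecedent, confidence))
    return found


# Hypothetical baskets for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
]
frequent = apriori(baskets, min_support=0.5)
found = association_rules(frequent, min_confidence=0.8)
```

With these thresholds the only rule that survives is butter implies bread, since every basket containing butter also contains bread.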
- Notebooks can be found at mushroom_classification/.
- Dataset from this Kaggle Challenge: Mushroom Classification
- Testing and visualizing the Decision Tree algorithm.
- Notebooks can be found at people_interest_clustering/.
- Dataset from this Kaggle Challenge: Clustering Categorical Peoples Interests
- Clustering algorithms: K-means and DBSCAN, with cluster visualization.
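
The sketch below runs both algorithms with scikit-learn on synthetic two-dimensional data. The blobs, `n_clusters=2`, `eps=1.5`, and `min_samples=5` are assumptions chosen so the example is easy to follow; the actual dataset is categorical and would need its own preprocessing and parameter tuning.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

# Two well-separated synthetic blobs (hypothetical data).
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.3, size=(30, 2))
blob_b = rng.normal(loc=5.0, scale=0.3, size=(30, 2))
X = np.vstack([blob_a, blob_b])

# K-means needs the number of clusters up front.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN infers the number of clusters from density; label -1 marks noise.
dbscan_labels = DBSCAN(eps=1.5, min_samples=5).fit_predict(X)

print("K-means clusters:", sorted(set(kmeans_labels)))
print("DBSCAN clusters:", sorted(set(dbscan_labels) - {-1}))
```

The label arrays can be passed as the `c` argument of a matplotlib scatter plot to visualize the clusters.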
- Notebooks can be found at red_wine_quality/.
- Dataset from this Kaggle Challenge: Red Wine Quality
- Testing and visualizing the Decision Tree algorithm.