This project classifies wines as either red or white using a range of machine learning algorithms, trained on the Wine dataset.
## Table of Contents

- Introduction
- Dataset
- Preprocessing
- Models and Results
- Conclusion
- Installation
- Usage
- Contributing
- License
## Introduction

The goal of this project is to predict whether a given wine is red or white. We applied several machine learning algorithms and evaluated their performance.
## Dataset

The dataset contains various physicochemical features of wines, such as acidity, sugar levels, and pH. It is divided into two classes: red wine and white wine.
## Preprocessing

Different preprocessing techniques were applied to the data, including missing-value imputation and feature scaling. The specific preprocessing steps varied per model to optimize its performance.
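A minimal sketch of such a pipeline, assuming scikit-learn; the imputation strategy and scaler shown here are illustrative, since the exact steps varied per model:

```python
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative pipeline: the project varied imputation and scaling per model.
preprocess = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),  # fill missing values
    ("scaler", StandardScaler()),                   # zero mean, unit variance
])
# X_train_prepared = preprocess.fit_transform(X_train)
# X_test_prepared = preprocess.transform(X_test)
```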
## Models and Results

Here are the models used and their respective results.
### K-Nearest Neighbors (KNN)

- Accuracy: 99.67%
- Hyperparameters: `imputer=1, n_neighbors=3, p=1, weights=distance`
- Best parameters: `{'p': 1, 'weights': 'distance'}`
- Model: `KNeighborsClassifier(n_neighbors=11, p=1, weights='distance')`
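A sketch of how this kind of tuning might look with scikit-learn's `GridSearchCV`; the parameter grid and cross-validation settings below are illustrative assumptions, not the project's exact search:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Illustrative grid; the project's full search space is not listed above.
param_grid = {
    "n_neighbors": [3, 5, 11],
    "p": [1, 2],                       # 1 = Manhattan, 2 = Euclidean distance
    "weights": ["uniform", "distance"],
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
# search.fit(X_train_prepared, y_train)
# print(search.best_params_, search.best_score_)
```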
### Logistic Regression

- Accuracy: 97.06%
- Hyperparameters: `{'C': 10.0, 'penalty': 'l1'}`
- Model: `LogisticRegression(C=10.0, max_iter=1000000, penalty='l1', solver='saga')`
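The reported model can be instantiated directly. Note that the `l1` penalty is only supported by solvers such as `saga`, which explains the solver choice:

```python
from sklearn.linear_model import LogisticRegression

# 'l1' is only supported by the 'liblinear' and 'saga' solvers; the very
# high max_iter guards against saga's slow convergence.
clf = LogisticRegression(C=10.0, penalty="l1", solver="saga", max_iter=1_000_000)
# clf.fit(X_train_prepared, y_train)
# print(clf.score(X_test_prepared, y_test))
```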
### Gaussian Naive Bayes

- Accuracy:
  - Without scaling: 96.88%
  - Min-max scaling: 97.48%
  - Min-max scaling + power transform: 99.02%
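A plausible pipeline for the best-performing variant, assuming scikit-learn's `MinMaxScaler` and `PowerTransformer` with default settings:

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, PowerTransformer

# Making features more Gaussian fits GaussianNB's per-feature normality
# assumption, consistent with the accuracy gains reported above.
model = Pipeline([
    ("minmax", MinMaxScaler()),
    ("power", PowerTransformer()),  # Yeo-Johnson by default (an assumption)
    ("nb", GaussianNB()),
])
# model.fit(X_train, y_train)
```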
### Multinomial Naive Bayes

- Accuracy:
  - Without scaling: 92.21%
  - With scaling: 75.76%
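`MultinomialNB` expects non-negative, count-like features, which may help explain why rescaling hurt accuracy here; a minimal sketch, assuming min-max scaling was the "with scaling" variant:

```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# MultinomialNB treats features as non-negative counts, so squashing the
# raw physicochemical values into [0, 1] can distort the magnitudes it
# relies on (the min-max step here is an assumption about "with scaling").
unscaled = MultinomialNB()
scaled = Pipeline([("minmax", MinMaxScaler()), ("nb", MultinomialNB())])
# Compare: unscaled.fit(X_train, y_train) vs. scaled.fit(X_train, y_train)
```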
### Bernoulli Naive Bayes

- Accuracy:
  - Without scaling: 77.58%
  - With scaling: 77.62%
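`BernoulliNB` thresholds each feature into a binary indicator before fitting, so much of the continuous information is discarded whether or not the features were scaled first, which is consistent with the near-identical accuracies above. A minimal sketch with the default threshold:

```python
from sklearn.naive_bayes import BernoulliNB

# BernoulliNB binarizes each feature at a fixed cutoff (default
# binarize=0.0), reducing continuous inputs to on/off indicators.
clf = BernoulliNB(binarize=0.0)
# clf.fit(X_train, y_train)
```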
### Support Vector Machine (SVM)

- Accuracy: 99.67%
- Hyperparameters: `{'C': 1.0, 'gamma': 1.0, 'kernel': 'rbf'}`
- Model: `SVC(gamma=1.0)`
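In scikit-learn's `SVC`, `C=1.0` and `kernel='rbf'` are the defaults, which is why only `gamma` appears in the model's repr. A minimal sketch:

```python
from sklearn.svm import SVC

# C=1.0 and kernel='rbf' are SVC defaults, so only gamma shows in the repr.
# RBF SVMs are scale-sensitive, so scaled inputs are assumed here.
clf = SVC(C=1.0, kernel="rbf", gamma=1.0)
# clf.fit(X_train_prepared, y_train)
```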
### Decision Tree

- Accuracy: 99.67%
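No hyperparameters are reported for this model; a minimal sketch with scikit-learn defaults (the `random_state` and cross-validation below are assumptions):

```python
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Trees split on raw thresholds, so no feature scaling is needed.
tree = DecisionTreeClassifier(random_state=0)  # random_state is an assumption
# scores = cross_val_score(tree, X_train, y_train, cv=5)
# print(scores.mean())
```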
### Bagging Classifier with Support Vector Machine (SVM)

- Accuracy: 99.72%
- Model: `BaggingClassifier(estimator=SVC(), n_estimators=10, random_state=0)`
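The ensemble trains `n_estimators` SVMs on bootstrap samples of the training data and aggregates their votes, which tends to reduce variance relative to a single SVM. The reported model, reproduced as a sketch:

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

# Trains 10 SVMs on bootstrap samples and majority-votes their predictions.
# Note: the estimator= keyword requires scikit-learn >= 1.2.
ensemble = BaggingClassifier(estimator=SVC(), n_estimators=10, random_state=0)
# ensemble.fit(X_train_prepared, y_train)
```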
## Conclusion

The Bagging Classifier with SVM achieved the highest accuracy, 99.72%, in predicting whether a wine is red or white. Preprocessing techniques such as min-max scaling and the power transform significantly improved the performance of the Naive Bayes models.