i10brook / uci_ml_archive
This project forked from philipuit/uci_ml_archive
from the Bank Marketing data set from the UCI ML Archive (http://archive.ics.uci.edu/ml/datasets/Bank+Marketing). The data set has 20 feature columns plus one result column, and we need to do some work to get it ready for further processing.

1. Reference the bank-additional-names.txt file for the column types and for what the names mean.
2. Make the following changes:
   - Change column names to remove abbreviations, capitalize, add spaces, and generally make the names more "meaningful" to casual readers.
   - Change column types to match the associated feature types.
   - Replace word separators in strings, such as "-" or ".", with spaces.
3. Missing Attribute Values: Several categorical attributes have missing values, all coded with the "unknown" label. These missing values can be treated as a possible class label, or handled with deletion or imputation techniques.
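The steps above can be sketched with pandas. This is a minimal illustration, not the repository's actual code: the `SAMPLE` rows, the `RENAMES` mapping, the `clean` function, and its `unknown_as_missing` flag are all hypothetical names chosen for this example, and only a handful of the 21 columns are shown. The real names and types should follow bank-additional-names.txt.

```python
import io

import pandas as pd

# Tiny in-memory sample using the semicolon-separated layout of the
# bank-additional CSV files (only 6 of the 21 columns, values for illustration).
SAMPLE = """age;job;education;default;emp.var.rate;y
56;housemaid;basic.4y;no;1.1;no
57;services;high.school;unknown;1.1;no
41;blue-collar;unknown;unknown;1.1;yes
"""

# Hypothetical "meaningful" names; adjust after reading bank-additional-names.txt.
RENAMES = {
    "age": "Age",
    "job": "Job",
    "education": "Education",
    "default": "Has Credit In Default",
    "emp.var.rate": "Employment Variation Rate",
    "y": "Subscribed Term Deposit",
}

# Columns treated as categorical features in this sketch.
CATEGORICAL = ["Job", "Education", "Has Credit In Default"]


def clean(df, unknown_as_missing=True):
    """Rename columns, tidy strings, and set column types on a copy of df."""
    df = df.rename(columns=RENAMES)  # rename returns a new DataFrame

    for col in CATEGORICAL:
        # Replace "-" and "." word separators with spaces, then trim the ends.
        values = df[col].str.replace(r"[-.]", " ", regex=True).str.strip()
        if unknown_as_missing:
            # Turn the "unknown" label into a real missing value so that
            # deletion (dropna) or imputation (fillna) can be applied later.
            values = values.where(values != "unknown")
        # Otherwise "unknown" simply stays as one more class label.
        df[col] = values.astype("category")

    # The result column is a yes/no flag, so store it as a boolean.
    df["Subscribed Term Deposit"] = df["Subscribed Term Deposit"] == "yes"
    return df


raw = pd.read_csv(io.StringIO(SAMPLE), sep=";")
cleaned = clean(raw)
print(cleaned.dtypes)
print(cleaned)
```

Calling `clean(raw, unknown_as_missing=False)` keeps "unknown" as its own category instead, which corresponds to the "possible class label" option in step 3. With the default setting, `cleaned.dropna()` would be one way to apply deletion, and `fillna` with a column's most frequent value would be one simple form of imputation.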