

DT-SVM-NB

DMML ASSIGNMENT: CLASSIFICATION PROBLEM WITH DECISION TREE, SVM, AND NAIVE BAYES CLASSIFIERS

ABOUT: We created an IPython project with a Python 3.7 kernel on a data set available at: https://archive.ics.uci.edu/ml/datasets/Bank+Marketing . We used the bank-additional-full.csv file. The data relates to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit (variable y). We built 3 classification models: a Decision Tree, a Support Vector Machine, and a Naive Bayes classifier.

Libraries: The following Python 3 libraries have been used:
	numpy
	pandas
	time
	math
	sklearn.tree.DecisionTreeClassifier
	sklearn.model_selection.cross_validate
	sklearn.model_selection.train_test_split
	sklearn.metrics.accuracy_score
	sklearn.metrics.precision_recall_fscore_support
	sklearn.neighbors.DistanceMetric

PARAMETERS USED:
	For Decision Tree: max depth of tree = 7, min samples in a leaf = 50 (to avoid overfitting), criterion = Gini index, and class weights = {yes: 2, no: 1}.
	For SVM: kernel = RBF, class weights = {yes: 2.35, no: 1}.
	For Naive Bayes: no parameters have been taken as input.
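The parameters above map onto scikit-learn estimators roughly as follows. This is a minimal sketch, not the notebook's actual code: it assumes the target has already been encoded as yes = 1 and no = 0, that the SVM is `sklearn.svm.SVC`, and that the Naive Bayes model is the Gaussian variant (the description does not say which variant was used).

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: labels are encoded as yes -> 1, no -> 0 before fitting.
decision_tree = DecisionTreeClassifier(
    max_depth=7,               # shallow tree to limit overfitting
    min_samples_leaf=50,       # every leaf must hold at least 50 samples
    criterion="gini",
    class_weight={1: 2, 0: 1}, # "yes" counts twice as much as "no"
)

svm = SVC(kernel="rbf", class_weight={1: 2.35, 0: 1})

# Assumption: Gaussian Naive Bayes, left at its defaults.
naive_bayes = GaussianNB()
```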

PROGRAM DESCRIPTION:
Functions:
	sampling(dataframe,percentage):
		returns percentage% of the dataframe after sampling.
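One possible shape for the sampling helper is sketched below. It assumes rows are drawn uniformly at random without replacement; the `seed` argument is a hypothetical addition for reproducibility and is not part of the signature described above.

```python
import pandas as pd


def sampling(dataframe: pd.DataFrame, percentage: float, seed=None) -> pd.DataFrame:
    """Return roughly `percentage` percent of the rows, chosen at random."""
    # `seed` is a hypothetical extra parameter, added only for repeatable runs.
    return dataframe.sample(frac=percentage / 100.0, random_state=seed)


# Toy stand-in for the bank data: 200 rows, two columns.
demo = pd.DataFrame({"age": range(200), "y": [0, 1] * 100})
subset = sampling(demo, 25, seed=0)
```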

	preprocess_word_to_num(dataframe):
		Takes a dataframe as input and returns a dataframe with all word-valued (categorical) features mapped to numeric class features.
		A simple example:
			if column=="y":
				df_copy[column]=df_copy[column].replace({"yes":1,"no":0})
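A fuller version of this mapping might look like the sketch below. Only the yes/no mapping for `y` comes from the example above; encoding every other non-numeric column as alphabetically ordered integer codes is an assumption made here for illustration, and the project may have used hand-written mappings per column instead.

```python
import pandas as pd


def preprocess_word_to_num(dataframe: pd.DataFrame) -> pd.DataFrame:
    """Map every word-valued column to numeric class labels."""
    df_copy = dataframe.copy()
    for column in df_copy.columns:
        if column == "y":
            # Mapping taken from the example in the description.
            df_copy[column] = df_copy[column].map({"yes": 1, "no": 0})
        elif not pd.api.types.is_numeric_dtype(df_copy[column]):
            # Assumption: integer codes assigned in alphabetical order.
            df_copy[column] = df_copy[column].astype("category").cat.codes
    return df_copy


demo = pd.DataFrame(
    {
        "age": [30, 45, 52],
        "job": ["admin.", "technician", "admin."],
        "y": ["yes", "no", "yes"],
    }
)
encoded = preprocess_word_to_num(demo)
```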

	normalize(dataframe):
		Normalizes all numeric feature data points in the dataframe by
		replacing xi with (xi - ū)/σ, where ū is the feature mean and σ is the feature standard deviation.

(N.B: NORMALIZATION IS IMPORTANT FOR SVM, AS OTHERWISE THE SVM CANNOT SEPARATE THE DATA WITHOUT BIAS TOWARDS LARGE-SCALE FEATURES)

	drop_column(dataframe,col):
		Drops unnecessary features from the dataframe.
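The normalization and column-dropping helpers can be sketched as follows. Two details are assumptions: the population standard deviation (`ddof=0`) is used, and the target column is skipped through a hypothetical `exclude` argument so that the class labels stay 0/1.

```python
import pandas as pd


def normalize(dataframe: pd.DataFrame, exclude=("y",)) -> pd.DataFrame:
    """Replace each numeric value x with (x - mean) / std, column by column."""
    df_copy = dataframe.copy()
    for column in df_copy.select_dtypes(include="number").columns:
        if column in exclude:  # hypothetical guard: leave the target untouched
            continue
        mean = df_copy[column].mean()
        std = df_copy[column].std(ddof=0)  # assumption: population std
        if std > 0:  # constant columns are left as they are
            df_copy[column] = (df_copy[column] - mean) / std
    return df_copy


def drop_column(dataframe: pd.DataFrame, col) -> pd.DataFrame:
    """Return a copy of the dataframe without the given column(s)."""
    return dataframe.drop(columns=col)


demo = pd.DataFrame({"age": [20, 30, 40], "pdays": [999, 999, 999], "y": [0, 1, 0]})
scaled = normalize(demo)
trimmed = drop_column(scaled, "pdays")
```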

Clustering:
	The idea:
		Since most people spend according to their income, whether they agree to a term deposit depends greatly on their monthly savings capability. In a consumer-based economy, a person's savings vary greatly with the price of daily household necessities, employment, and other economic conditions. The consumer price index, consumer confidence index, and employment variation rate are good estimates of these factors.

	The clustering:
		Since the data is skewed, with outcomes no:yes = (approx.) 9:1, we needed to undersample, but random undersampling may lose important data. So we divided the dataset on a monthly basis and measured the distance between each pair of data points using the taxicab (Manhattan) metric. This way, data on the consumer price index and other monthly and quarterly indicators are not lost.
		For every data point we considered a neighbourhood of radius 2, and if it contains ≥ 4 points we discarded all the points in that neighbourhood except the centre. Since this way of clustering is very primitive, some centres are also deleted. But by taking the neighbourhood radius = 2, we do not discard any whole cluster.

(N.B: WHILE CLUSTERING WE HAVE USED ONLY DATA POINTS WITH NEGATIVE OUTCOME SO THAT THE DATA SET BECOMES BALANCED)
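The undersampling idea can be sketched as below. This is an illustrative reading of the description, not the project's code. It assumes the neighbour count excludes the centre itself, that points are visited in row order, that points already discarded are not used as centres, and that the month and target columns are named `month` and `y`. The function names are hypothetical. The dense pairwise distance matrix is only practical for small groups.

```python
import numpy as np
import pandas as pd


def thin_dense_neighbourhoods(points, radius=2.0, min_points=4):
    """Return a boolean mask of the points kept after thinning dense areas."""
    points = np.asarray(points, dtype=float)
    # Pairwise taxicab (Manhattan) distances: sum of absolute differences.
    distances = np.abs(points[:, None, :] - points[None, :, :]).sum(axis=2)
    keep = np.ones(len(points), dtype=bool)
    for centre in range(len(points)):
        if not keep[centre]:
            continue  # assumption: discarded points do not act as centres
        neighbours = np.flatnonzero((distances[centre] <= radius) & keep)
        neighbours = neighbours[neighbours != centre]
        if len(neighbours) >= min_points:
            keep[neighbours] = False  # keep only the centre of a dense area
    return keep


def undersample_negatives(dataframe, feature_cols, target="y", month_col="month"):
    """Thin only the negative-outcome rows, one month at a time."""
    negatives = dataframe[dataframe[target] == 0]
    kept_parts = [dataframe[dataframe[target] != 0]]
    for _, group in negatives.groupby(month_col):
        mask = thin_dense_neighbourhoods(group[feature_cols].to_numpy())
        kept_parts.append(group[mask])
    return pd.concat(kept_parts).sort_index()


demo = pd.DataFrame(
    {
        "month": [1, 1, 1, 1, 1, 1, 2],
        "f1": [0.0, 0.1, 0.0, 0.1, 0.2, 10.0, 0.0],
        "f2": [0.0, 0.0, 0.1, 0.1, 0.0, 10.0, 0.0],
        "y": [0, 0, 0, 0, 0, 0, 1],
    }
)
mask = thin_dense_neighbourhoods(demo.loc[:5, ["f1", "f2"]].to_numpy())
balanced = undersample_negatives(demo, ["f1", "f2"])
```

In the toy data, rows 0 to 4 form one dense group, so only row 0 survives from it, while the isolated negative row and the positive row are left untouched.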

   Dropping features:
	We dropped some features that were not contributing much to the classification.

(N.B: DURATION HAS NOT BEEN DROPPED)

   Model fitting and cross validation:
	We have used sklearn.model_selection.cross_validate for cross validation of the different trained models.
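A call to `cross_validate` that collects the four metrics reported in this README could look like the sketch below. The synthetic data from `make_classification` is only a stand-in for the preprocessed bank data, and the 5-fold split is an assumption, since the number of folds is not stated.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: imbalanced binary problem, NOT the bank marketing data.
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.7, 0.3], random_state=0
)

model = DecisionTreeClassifier(
    max_depth=7, min_samples_leaf=50, criterion="gini", class_weight={1: 2, 0: 1}
)

metrics = ("accuracy", "precision", "recall", "f1")
# Assumption: 5 folds; the original fold count is not given.
scores = cross_validate(model, X, y, cv=5, scoring=metrics)

for name in metrics:
    print(name, round(scores[f"test_{name}"].mean(), 3))
```

The same call works for the SVM and Naive Bayes models by swapping the estimator passed as `model`.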

OUTCOME:
	Decision Tree: accuracy ~ 0.87, recall ~ 0.82, precision ~ 0.63, F-score ~ 0.71.
	SVM: accuracy ~ 0.87, recall ~ 0.83, precision ~ 0.6, F-score ~ 0.7.
	Naive Bayes: accuracy ~ 0.81, recall ~ 0.63, precision ~ 0.51, F-score ~ 0.56.
	The detailed output is stored in: https://drive.google.com/drive/folders/1-c0UFsi4qBCb7fJ6qxstzZttHCa5sFon
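As a quick consistency check, the reported F-scores can be compared with the harmonic mean of the reported precision and recall. This assumes the F-score is F1 (beta = 1), which is the default of `precision_recall_fscore_support`. Because the reported figures are rounded and may be averaged over folds, only approximate agreement is expected.

```python
def f_score(precision: float, recall: float) -> float:
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) as listed in OUTCOME.
reported = {
    "Decision Tree": (0.63, 0.82, 0.71),
    "SVM": (0.60, 0.83, 0.70),
    "Naive Bayes": (0.51, 0.63, 0.56),
}

for name, (precision, recall, reported_f) in reported.items():
    print(name, round(f_score(precision, recall), 3), "reported:", reported_f)
```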

CONCLUSION: The models have used the "duration" feature. Since the duration of a call is not known before the call is made, these results can only be used as a benchmark and not for real-time prediction.

CONTRIBUTORS: shadow23-cmi
