banu-prasanth-pulicharla / inducing-decision-trees

Implement the decision tree learning algorithm using Information gain heuristic & Variance impurity heuristic

License: MIT License

python python3 machine-learning-algorithms machine-learning machinelearning decision-tree-classifier decision-trees pandas heuristics information-gain variance-impurity


Inducing Decision Trees

Description

Implement and test the decision tree learning algorithm.

  • Download the two datasets available in the repo. Each dataset is divided into three sets: the training set, the validation set, and the test set. The datasets are in CSV format. The first line of each file gives the attribute names; each line after that is a training (or test) example containing a list of attribute values separated by commas. The last attribute is the class variable. Assume that all attributes take values from the domain {0, 1}.

  • Implement the decision tree learning algorithm. The main step in decision tree learning is choosing the next attribute to split on. Implement the following two heuristics for selecting the next attribute:

  1. Information gain heuristic.
  2. Variance impurity heuristic, described below. Let K denote the number of examples in the training set, K0 the number of training examples that have class = 0, and K1 the number that have class = 1. The variance impurity of the training set S is defined as:

VI(S) = (K0 / K) * (K1 / K)

Notice that the impurity is 0 when the data is pure. The gain for this impurity is defined as usual:

Gain(S, X) = VI(S) - Σx Pr(x) * VI(Sx)

where X is an attribute, Sx denotes the set of training examples that have X = x and Pr(x) is the fraction of the training examples that have X = x (i.e., the number of training examples that have X = x divided by the number of training examples in S).
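The two formulas above can be sketched in Python with pandas. This is an illustrative sketch, not the repo's actual code: the target column name "Class" and the function names are assumptions, and an entropy function is included for the information gain heuristic.

```python
import math
import pandas as pd

def variance_impurity(df, target="Class"):
    """VI(S) = (K0/K) * (K1/K)."""
    k = len(df)
    if k == 0:
        return 0.0
    k1 = int(df[target].sum())  # class values are 0/1, so the sum counts class-1 rows
    return ((k - k1) / k) * (k1 / k)

def entropy(df, target="Class"):
    """H(S) = -sum_c p_c * log2(p_c), used by the information gain heuristic."""
    k = len(df)
    h = 0.0
    for count in df[target].value_counts():
        p = count / k
        h -= p * math.log2(p)
    return h

def gain(df, attr, impurity, target="Class"):
    """Gain(S, X) = impurity(S) - sum over x of Pr(x) * impurity(Sx)."""
    g = impurity(df)
    for _, subset in df.groupby(attr):
        g -= (len(subset) / len(df)) * impurity(subset)
    return g
```

Passing `variance_impurity` or `entropy` as the `impurity` argument selects between the two heuristics, since both gains share the same weighted-average form.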

Implement a function to print the decision tree to standard output. We will use the following format:

(tree format example image omitted)

According to this tree, if wesley = 0 and honor = 0 and barclay = 0, then the class value of the corresponding instance should be 1. In other words, the value appearing before a colon is an attribute value, and the value appearing after a colon is a class value.
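The original image is not reproduced here, but a tree in this format might look like the following sketch. Only the path wesley = 0, honor = 0, barclay = 0 → class 1 comes from the text above; the remaining class values are illustrative:

```
wesley = 0 :
| honor = 0 :
| | barclay = 0 : 1
| | barclay = 1 : 0
| honor = 1 : 0
wesley = 1 : 0
```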

How to Run?

a. Place the file DecisionTree.py in a directory.
b. Run the script with the following command:

python DecisionTree.py

c. The script will then prompt for its parameters. Provide them in the following format:

<Training dataset path> <Validation dataset path> <Test dataset path> <Print tree? yes/no> <Heuristic? h1/h2>

Ex:-

D:\data_TEMP\training_set.csv D:\data_TEMP\validation_set.csv D:\data_TEMP\test_set.csv yes h1

d. That's it! The output shows the accuracies for the training, validation, and test data, along with the decision tree if printing was requested.
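The overall flow the script implements can be sketched as follows. This is a compact illustration with hypothetical helper names (DecisionTree.py itself may be organized differently): build a tree with a chosen impurity heuristic, print it in the pipe-indented format, and compute accuracy on a dataset.

```python
import pandas as pd

def vi(df, target="Class"):
    # Variance impurity VI(S) = (K0/K) * (K1/K), as defined above.
    k, k1 = len(df), int(df[target].sum())
    return (k - k1) * k1 / (k * k) if k else 0.0

def build_tree(df, attrs, impurity=vi, target="Class"):
    # Leaf: the node is pure, or there is nothing left to split on.
    if df[target].nunique() == 1 or not attrs:
        return int(df[target].mode()[0])
    def gain(a):  # Gain(S, X) under the chosen impurity
        return impurity(df) - sum(
            len(s) / len(df) * impurity(s) for _, s in df.groupby(a))
    best = max(attrs, key=gain)
    rest = [a for a in attrs if a != best]
    children = {}
    for v in (0, 1):  # all attributes take values from {0, 1}
        sub = df[df[best] == v]
        children[v] = (build_tree(sub, rest, impurity, target)
                       if len(sub) else int(df[target].mode()[0]))
    return {"attr": best, "children": children}

def print_tree(node, depth=0):
    # One line per branch; '|' indentation marks depth, leaves end with ': class'.
    for v, child in node["children"].items():
        line = "| " * depth + f"{node['attr']} = {v} :"
        if isinstance(child, dict):
            print(line)
            print_tree(child, depth + 1)
        else:
            print(line, child)

def predict(node, row):
    # Walk from the root down to an integer leaf.
    while isinstance(node, dict):
        node = node["children"][row[node["attr"]]]
    return node

def accuracy(node, df, target="Class"):
    return sum(predict(node, r) == r[target] for _, r in df.iterrows()) / len(df)
```

Calling `accuracy` on each of the three loaded DataFrames yields the three reported numbers; `print_tree` is invoked only when the print-tree parameter is yes.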

Contributors: banu-prasanth-pulicharla
