
python-trees

Python implementation of ID3 classification trees. ID3 is a machine learning algorithm for building classification (decision) trees, developed by Ross Quinlan around 1986.

The algorithm is greedy and recursive: it partitions a data set on the attribute that maximizes information gain, then repeats the process on each partition. The information gain of an attribute A is the difference between the entropy of a data set S and the size-weighted average entropy of the subsets of S produced by splitting on A.
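In symbols, with p_i the proportion of examples in S that belong to class i, Entropy(S) = -Σ p_i log2 p_i, and Gain(S, A) = Entropy(S) - Σ_v (|S_v| / |S|) Entropy(S_v), where S_v is the subset of S whose value of A is v. The sketch below is an independent illustration of these definitions and of the greedy recursion, not the code in id3.py; the function names and the nested-dict tree representation are assumptions. It uses Mitchell's 14-row PlayTennis table, typed in by hand, with Mitchell's Wind column renamed Windy and encoded as True (Strong) / False (Weak) to match the column names in the example config; how resources/tennis.csv actually encodes that column is also an assumption.

```python
import math
from collections import Counter

# Mitchell's PlayTennis table, typed in by hand. "Wind" is renamed "Windy"
# (True = Strong, False = Weak) to match the README's config; this encoding
# is an assumption and may differ from resources/tennis.csv.
COLUMNS = ["Outlook", "Temperature", "Humidity", "Windy", "PlayTennis"]
RAW = """\
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rain Mild High False Yes
Rain Cool Normal False Yes
Rain Cool Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Sunny Cool Normal False Yes
Rain Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rain Mild High True No"""
ROWS = [dict(zip(COLUMNS, line.split())) for line in RAW.splitlines()]


def entropy(rows, target):
    """Shannon entropy (in bits) of the target attribute over rows."""
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, attribute, target):
    """Entropy of rows minus the size-weighted entropy of each split."""
    total = len(rows)
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row for row in rows if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(rows, target) - remainder


def id3(rows, attributes, target):
    """Greedily build a tree as nested dicts: {attribute: {value: subtree}}.

    Leaves are class labels. On equal gains, max() keeps the earliest attribute.
    """
    labels = Counter(row[target] for row in rows)
    if len(labels) == 1 or not attributes:
        # Pure node, or nothing left to split on: return the majority label.
        return labels.most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        # Recurse on each partition with the chosen attribute removed.
        branches[value] = id3(subset, remaining, target)
    return {best: branches}


if __name__ == "__main__":
    features = COLUMNS[:-1]
    for attr in features:
        print(f"Gain(S, {attr}) = {information_gain(ROWS, attr, 'PlayTennis'):.3f}")
    print(id3(ROWS, features, "PlayTennis"))
```

On this data Outlook has the highest gain (about 0.25, against about 0.15 for Humidity), so it becomes the root, and the Sunny and Rain branches are then split on Humidity and Windy respectively.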

This implementation was informed by Dr. Lutz Hamel's notes. A widely cited text on decision trees is Machine Learning by Tom Mitchell, which covers ID3.

There are also readable notes on information gain from the University of Washington.

Running the code

Run the code with the python interpreter:

python id3.py ./resources/<config.cfg>

Here config.cfg is a plain-text configuration file whose contents are a Python expression, namely a dict literal with the following fields:

{ 'data_file' : '\\resources\\tennis.csv', 'data_project_columns' : ['Outlook', 'Temperature', 'Humidity', 'Windy', 'PlayTennis'], 'target_attribute' : 'PlayTennis' }

You have to specify:

  • data_file: the relative path to the CSV data file
  • data_project_columns: which columns to project from the file (useful if you have a large input file and are only interested in a subset of columns)
  • target_attribute: the attribute you want to predict
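How id3.py reads the config is not shown here. One safe way to parse a Python dict literal is the standard library's ast.literal_eval, which the sketch below uses as an assumption. parse_config and project_rows are hypothetical helpers showing how such a config could be validated and how the projected columns could be read from a CSV file whose first line holds the column names (also an assumption). The check that target_attribute is among the projected columns is an illustrative choice, not documented behavior.

```python
import ast
import csv
import io

REQUIRED_KEYS = {"data_file", "data_project_columns", "target_attribute"}


def parse_config(text):
    """Parse a config written as a Python dict literal (hypothetical helper).

    ast.literal_eval only accepts literals, so unlike eval() it will not run
    arbitrary code found in the config file; it raises ValueError instead.
    """
    config = ast.literal_eval(text)
    if not isinstance(config, dict):
        raise ValueError("config must be a dict literal")
    missing = REQUIRED_KEYS - set(config)
    if missing:
        raise ValueError(f"config is missing keys: {sorted(missing)}")
    # Illustrative sanity check: the target must survive the projection.
    if config["target_attribute"] not in config["data_project_columns"]:
        raise ValueError("target_attribute must be one of the projected columns")
    return config


def project_rows(csv_file, columns):
    """Read CSV rows (first line is the header), keeping only `columns`."""
    return [{c: row[c] for c in columns} for row in csv.DictReader(csv_file)]


if __name__ == "__main__":
    cfg = parse_config(
        "{'data_file': 'resources/tennis.csv',"
        " 'data_project_columns': ['Outlook', 'PlayTennis'],"
        " 'target_attribute': 'PlayTennis'}"
    )
    # In-memory stand-in for a CSV file with an extra, unprojected column.
    sample = io.StringIO("Day,Outlook,PlayTennis\nD1,Sunny,No\nD3,Overcast,Yes\n")
    print(project_rows(sample, cfg["data_project_columns"]))
```

With a real file, pass an open handle instead, e.g. `with open(cfg["data_file"], newline="") as f: rows = project_rows(f, cfg["data_project_columns"])`.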

Docker

A Dockerfile that clones this repository and uses id3.py as the entrypoint:

FROM python:3.6.8-alpine

WORKDIR /usr/src/app
RUN apk add --no-cache git && git clone https://github.com/tofti/python-id3-trees.git

WORKDIR /usr/src/app/python-id3-trees

ENTRYPOINT [ "python", "id3.py" ]

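Build the image, tagging it with the name used in the commands below:

docker build -t tofti-id3-trees .

To run the built-in examples: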

docker run tofti-id3-trees ./resources/tennis.cfg

Or run your own example, after creating a config file and a CSV data file, by mounting the directory that holds them into the container:

docker run -v "<localpath>:<dockerpath>" tofti-id3-trees <dockerpath>/config.cfg

e.g.

docker run -v "/c/Users/tofti/dvol/id3:/data" tofti-id3-trees /data/credithistory_test.cfg

Examples

  1. tennis.cfg is the 'Play Tennis' example from Machine Learning by Tom Mitchell, also used by Dr. Lutz Hamel in his lecture notes; both are referenced above.
  2. credithistory.cfg is the credit risk assessment example from Artificial Intelligence: Structures and Strategies for Complex Problem Solving (6th Edition) by Luger; see Table 10.1 and Figure 10.14 (the full text was available online as of 11/19/2017).

Results

[Image: results]

TODO

  • Add code to classify data.
  • Add code to prune rules (C4.5 modifications).
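The first TODO item could look something like the sketch below, assuming the learned tree is stored as nested dicts of the form {attribute: {value: subtree}} with class labels at the leaves; that representation is an assumption, not necessarily what id3.py produces. classify is a hypothetical function, and PLAY_TENNIS_TREE is the tree ID3 learns from Mitchell's PlayTennis data, written out by hand with the Windy values encoded as the strings "True"/"False" (also an assumption).

```python
def classify(tree, example, default=None):
    """Walk a nested-dict tree ({attribute: {value: subtree}}) to a leaf label.

    Returns `default` when the example is missing the tested attribute or has
    a value for it that no branch covers.
    """
    while isinstance(tree, dict):
        # Each internal node tests exactly one attribute.
        attribute, branches = next(iter(tree.items()))
        value = example.get(attribute)
        if value not in branches:
            return default
        tree = branches[value]
    return tree  # a leaf: the predicted class label


# The tree ID3 learns from Mitchell's PlayTennis data, written out by hand.
PLAY_TENNIS_TREE = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"Windy": {"True": "No", "False": "Yes"}},
    }
}

if __name__ == "__main__":
    today = {"Outlook": "Sunny", "Temperature": "Cool", "Humidity": "High", "Windy": "True"}
    print(classify(PLAY_TENNIS_TREE, today))  # → No
```

Returning a caller-supplied default for unseen attribute values is one simple policy; falling back to the majority class at the node where the lookup fails is a common alternative.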
