renatoosousa / geneticalgorithmforfeatureselection

Search for the best feature subset for your classification model

License: MIT License

Python 100.00%

geneticalgorithmforfeatureselection's Introduction

Genetic Algorithm For Feature Selection

Search for the best feature subset for your classification model

Description

Feature selection is the process of finding the most relevant variables for a predictive model. These techniques can be used to identify and remove unneeded, irrelevant, and redundant features that do not contribute to the accuracy of the predictive model, or that even decrease it.

In nature, the genes of organisms tend to evolve over successive generations to better adapt to the environment. The Genetic Algorithm is a heuristic optimization method inspired by the processes of natural evolution.
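As a rough illustration of that evolutionary loop, here is a minimal sketch using only the Python standard library. It is not the repository's implementation (which relies on DEAP). The function name `evolve`, its parameters, the probabilities, and the toy fitness function are all assumptions made for this example.

```python
import random


def evolve(fitness, n_genes, n_population=20, n_generation=6,
           cx_prob=0.5, mut_prob=0.2, flip_prob=0.05, seed=0):
    """Evolve fixed-length bit lists and return the best one seen.

    Assumes n_genes >= 2 and n_population >= 3 (hypothetical defaults).
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(n_population)]
    best = max(population, key=fitness)

    for _ in range(n_generation):
        # Selection: tournaments of size 3 favour the fitter individuals.
        selected = [list(max(rng.sample(population, 3), key=fitness))
                    for _ in range(n_population)]
        # Crossover: neighbouring pairs may swap tails at a random cut point.
        for a, b in zip(selected[::2], selected[1::2]):
            if rng.random() < cx_prob:
                cut = rng.randint(1, n_genes - 1)
                a[cut:], b[cut:] = b[cut:], a[cut:]
        # Mutation: some individuals get each bit flipped with a small probability.
        for ind in selected:
            if rng.random() < mut_prob:
                for i in range(n_genes):
                    if rng.random() < flip_prob:
                        ind[i] = 1 - ind[i]
        population = selected
        # Keep the best individual seen so far across generations.
        best = max(population + [best], key=fitness)
    return best


# Toy stand-in for a real fitness: count positions that match a hidden target.
target = [1, 0, 1, 1, 0, 0, 1, 0]


def toy_fitness(individual):
    return sum(g == t for g, t in zip(individual, target))


best = evolve(toy_fitness, n_genes=len(target))
print(best, toy_fitness(best))
```

In feature selection each bit would mean "keep this column" or "drop this column", and the toy fitness would be replaced by a model's measured accuracy.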

In feature selection, the function to optimize is the generalization performance of a predictive model. More specifically, we want to minimize the error of the model on an independent data set not used to create the model.
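To make the objective concrete, the sketch below scores a feature mask by the accuracy of a model on rows that were not used to fit it. A tiny nearest-centroid classifier stands in for the real classifier so that the example has no dependencies. The function name `holdout_accuracy` and the toy data are assumptions for illustration, not taken from the repository.

```python
def holdout_accuracy(mask, X_train, y_train, X_test, y_test):
    """Accuracy on held-out rows of a nearest-centroid classifier that
    only sees the columns switched on in `mask`."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot predict anything

    def project(row):
        return [row[i] for i in cols]

    # "Training": one mean vector (centroid) per class, from training rows only.
    groups = {}
    for row, label in zip(X_train, y_train):
        groups.setdefault(label, []).append(project(row))
    centroids = {label: [sum(col) / len(rows) for col in zip(*rows)]
                 for label, rows in groups.items()}

    def predict(row):
        point = project(row)
        # Pick the class whose centroid is closest (squared Euclidean distance).
        return min(centroids, key=lambda label: sum(
            (p - c) ** 2 for p, c in zip(point, centroids[label])))

    hits = sum(predict(row) == label for row, label in zip(X_test, y_test))
    return hits / len(y_test)


# Toy data: column 0 is informative; column 1 separates the classes in the
# training rows only by coincidence and misleads on the held-out rows.
X_train = [[1.0, 0.0], [2.0, 0.0], [4.0, 10.0], [5.0, 10.0]]
y_train = [0, 0, 1, 1]
X_test = [[2.0, 10.0], [4.0, 0.0]]
y_test = [0, 1]

for mask in ([1, 0], [1, 1]):
    print(mask, holdout_accuracy(mask, X_train, y_train, X_test, y_test))
```

On this toy data, dropping the misleading column improves held-out accuracy, which is the effect the search is meant to find.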

Dependencies

Pandas

NumPy

scikit-learn

DEAP

Usage

  1. Go to the repository folder
  2. Run:
python gaFeatureSelection.py path n_population n_generation

Notes:

  • path should be the path to a dataset in CSV format
  • n_population and n_generation must be integers
  • To optimize the search for a different classifier, change the classifier in the code.
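The three positional arguments described above could be read and validated along these lines. This is only a sketch with `argparse`: the script itself may parse its arguments differently, and the helper name `parse_args` and the range checks are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse `path n_population n_generation` (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="GA feature selection (illustrative CLI sketch)")
    parser.add_argument("path", help="path to a dataset in CSV format")
    parser.add_argument("n_population", type=int,
                        help="number of individuals per generation")
    parser.add_argument("n_generation", type=int,
                        help="number of generations to run")
    args = parser.parse_args(argv)
    # Assumed sanity checks; the original script may not enforce these.
    if args.n_population < 1 or args.n_generation < 0:
        parser.error("n_population must be >= 1 and n_generation >= 0")
    return args


args = parse_args(["datasets/nuclear.csv", "20", "6"])
print(args.path, args.n_population, args.n_generation)
```

With `type=int`, a non-integer value such as `twenty` makes `argparse` print a usage message and exit instead of failing later inside the search.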

Usage Example

python gaFeatureSelection.py datasets/nuclear.csv 20 6

Returns:

Accuracy with all features: 	(0.90833333333333344,)

gen	nevals	avg     	min     	max     
0  	20    	0.849167	0.683333	0.941667
1  	12    	0.919167	0.766667	0.966667
2  	18    	0.934167	0.908333	0.966667
3  	9     	0.941667	0.908333	0.966667
4  	9     	0.946667	0.908333	0.966667
5  	12    	0.955833	0.908333	0.966667
6  	12    	0.9625  	0.883333	0.966667
Best Accuracy: 	(0.96666666666666679,)
Number of Features in Subset: 	5
Individual: 		[1, 1, 1, 0, 0, 1, 1, 0, 0, 0]
Feature Subset	: ['cost', 'date', 't1', 'pr', 'ne']


creating a new classifier with the result
Accuracy with Feature Subset: 	0.966666666667
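The relationship between the "Individual" bit list and the "Feature Subset" in the output above can be sketched as follows, assuming each bit corresponds to one feature column in order. The helper name `decode` is hypothetical. Only the five selected column names appear in the output, so the `unused_*` names are placeholders rather than the dataset's real headers.

```python
def decode(individual, feature_names):
    """Return the names of the features whose bit is set to 1."""
    if len(individual) != len(feature_names):
        raise ValueError("mask and header must have the same length")
    return [name for bit, name in zip(individual, feature_names) if bit == 1]


individual = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]
# Placeholder names stand in for the columns that were not selected.
feature_names = ["cost", "date", "t1", "unused_3", "unused_4",
                 "pr", "ne", "unused_7", "unused_8", "unused_9"]

subset = decode(individual, feature_names)
print(len(subset), subset)  # 5 ['cost', 'date', 't1', 'pr', 'ne']
```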

Sources

  1. This repository is heavily based on GeneticAlgorithmFeatureSelection
  2. The description borrows part of the introduction of Genetic algorithms for feature selection in Data Analytics. Great text.

Author: Renato Sousa

geneticalgorithmforfeatureselection's Issues

Not choosing best individual

Dear Renato Sousa,

I'm just writing to inform you that while testing your code, I found that it does not return the best individual (the individual with the highest accuracy).
A screenshot of the execution trace is attached below.

Waiting for your feedback,

Thank you in advance.

[Screenshot: GA]

Using GA for optimization of CNN model

I am doing research on image classification. This code uses LogisticRegression. Can I use a Keras or TensorFlow CNN model instead and optimize its accuracy?
Thank you.

TypeError: '>' not supported between instances of 'tuple' and 'float'

Traceback (most recent call last):
  File "gaFeatureSelection.py", line 129, in <module>
    accuracy, individual, header = bestIndividual(hof, X, y)
  File "gaFeatureSelection.py", line 84, in bestIndividual
    if(individual.fitness.values > maxAccurcy):
TypeError: '>' not supported between instances of 'tuple' and 'float'

Can you help me with this?
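One likely cause, offered as a sketch rather than a confirmed fix: DEAP stores `fitness.values` as a tuple of objective values, even for a single objective (the usage output prints accuracies as one-element tuples such as `(0.96666666666666679,)`), and Python 3 refuses to order a tuple against a float. Comparing the first element of the tuple avoids the error. The names `best_individual` and `fake` below are hypothetical, and `SimpleNamespace` objects stand in for real DEAP individuals.

```python
from types import SimpleNamespace


def best_individual(hall_of_fame):
    """Return the individual with the highest accuracy and that accuracy."""
    best, max_accuracy = None, 0.0
    for individual in hall_of_fame:
        # fitness.values is a tuple, so take the single objective out of it
        # before comparing against a float.
        accuracy = individual.fitness.values[0]
        if accuracy > max_accuracy:
            best, max_accuracy = individual, accuracy
    return best, max_accuracy


def fake(bits, accuracy):
    """Build a stand-in object shaped like an individual with a fitness."""
    return SimpleNamespace(bits=bits,
                           fitness=SimpleNamespace(values=(accuracy,)))


hof = [fake([1, 0, 1], 0.90), fake([1, 1, 0], 0.96), fake([0, 1, 1], 0.88)]
best, accuracy = best_individual(hof)
print(best.bits, accuracy)  # [1, 1, 0] 0.96
```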
