jingweitoo / wrapper-feature-selection-toolbox-python

This toolbox offers 13 wrapper feature selection methods (PSO, GA, GWO, HHO, BA, WOA, etc.) with examples. It is simple and easy to implement.

License: BSD 3-Clause "New" or "Revised" License

whale-optimization-algorithm particle-swarm-optimization genetic-algorithm differential-evolution salp-swarm-algorithm grey-wolf-optimizer sine-cosine-algorithm wrapper feature-selection classification

wrapper-feature-selection-toolbox-python's Introduction

Jx-WFST : Wrapper Feature Selection Toolbox



"Toward Talent Scientist: Sharing and Learning Together" --- Jingwei Too



Introduction

  • This toolbox offers 13 wrapper feature selection methods
  • The Demo_PSO provides an example of how to apply PSO to a benchmark dataset
  • The source code of these methods is written based on the pseudocode in their original papers

Usage

The main function jfs performs feature selection. You can switch the algorithm by changing the pso in from FS.pso import jfs to another abbreviation

  • If you wish to use particle swarm optimization ( PSO ) then you may write
from FS.pso import jfs
  • If you want to use differential evolution ( DE ) then you may write
from FS.de import jfs

Input

  • feat : feature vector matrix ( Instances x Features )
  • label : label matrix ( Instances x 1 )
  • opts : parameter settings
    • N : number of solutions / population size ( for all methods )
    • T : maximum number of iterations ( for all methods )
    • k : k-value in k-nearest neighbor

Output

  • Acc : accuracy of the validation model
  • fmdl : feature selection model ( it contains several results )
    • sf : indices of selected features
    • nf : number of selected features
    • c : convergence curve

Example 1 : Particle Swarm Optimization ( PSO )

import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from FS.pso import jfs   # change this to switch algorithm 
import matplotlib.pyplot as plt


# load data
data  = pd.read_csv('ionosphere.csv')
data  = data.values
feat  = np.asarray(data[:, 0:-1])   # feature vector
label = np.asarray(data[:, -1])     # label vector

# split data into train & validation (70 -- 30)
xtrain, xtest, ytrain, ytest = train_test_split(feat, label, test_size=0.3, stratify=label)
fold = {'xt':xtrain, 'yt':ytrain, 'xv':xtest, 'yv':ytest}

# parameter
k    = 5     # k-value in KNN
N    = 10    # number of particles
T    = 100   # maximum number of iterations
w    = 0.9
c1   = 2
c2   = 2
opts = {'k':k, 'fold':fold, 'N':N, 'T':T, 'w':w, 'c1':c1, 'c2':c2}

# perform feature selection
fmdl = jfs(feat, label, opts)
sf   = fmdl['sf']

# model with selected features
num_train = np.size(xtrain, 0)
num_valid = np.size(xtest, 0)
x_train   = xtrain[:, sf]
y_train   = ytrain.reshape(num_train)  # flatten labels to a 1-D array
x_valid   = xtest[:, sf]
y_valid   = ytest.reshape(num_valid)   # flatten labels to a 1-D array

mdl       = KNeighborsClassifier(n_neighbors = k) 
mdl.fit(x_train, y_train)

# accuracy
y_pred    = mdl.predict(x_valid)
Acc       = np.sum(y_valid == y_pred)  / num_valid
print("Accuracy:", 100 * Acc)

# number of selected features
num_feat = fmdl['nf']
print("Feature Size:", num_feat)

# plot convergence
curve   = fmdl['c']
curve   = curve.reshape(np.size(curve, 1))    # flatten the (1, T) convergence curve
x       = np.arange(0, opts['T'], 1.0) + 1.0  # iterations numbered from 1

fig, ax = plt.subplots()
ax.plot(x, curve, 'o-')
ax.set_xlabel('Number of Iterations')
ax.set_ylabel('Fitness')
ax.set_title('PSO')
ax.grid()
plt.show()

Example 2 : Genetic Algorithm ( GA )

import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from FS.ga import jfs   # change this to switch algorithm 
import matplotlib.pyplot as plt


# load data
data  = pd.read_csv('ionosphere.csv')
data  = data.values
feat  = np.asarray(data[:, 0:-1])
label = np.asarray(data[:, -1])

# split data into train & validation (70 -- 30)
xtrain, xtest, ytrain, ytest = train_test_split(feat, label, test_size=0.3, stratify=label)
fold = {'xt':xtrain, 'yt':ytrain, 'xv':xtest, 'yv':ytest}

# parameter
k    = 5     # k-value in KNN
N    = 10    # number of chromosomes
T    = 100   # maximum number of generations
CR   = 0.8
MR   = 0.01
opts = {'k':k, 'fold':fold, 'N':N, 'T':T, 'CR':CR, 'MR':MR}

# perform feature selection
fmdl = jfs(feat, label, opts)
sf   = fmdl['sf']

# model with selected features
num_train = np.size(xtrain, 0)
num_valid = np.size(xtest, 0)
x_train   = xtrain[:, sf]
y_train   = ytrain.reshape(num_train)  # flatten labels to a 1-D array
x_valid   = xtest[:, sf]
y_valid   = ytest.reshape(num_valid)   # flatten labels to a 1-D array

mdl       = KNeighborsClassifier(n_neighbors = k) 
mdl.fit(x_train, y_train)

# accuracy
y_pred    = mdl.predict(x_valid)
Acc       = np.sum(y_valid == y_pred)  / num_valid
print("Accuracy:", 100 * Acc)

# number of selected features
num_feat = fmdl['nf']
print("Feature Size:", num_feat)

# plot convergence
curve   = fmdl['c']
curve   = curve.reshape(np.size(curve, 1))    # flatten the (1, T) convergence curve
x       = np.arange(0, opts['T'], 1.0) + 1.0  # iterations numbered from 1

fig, ax = plt.subplots()
ax.plot(x, curve, 'o-')
ax.set_xlabel('Number of Iterations')
ax.set_ylabel('Fitness')
ax.set_title('GA')
ax.grid()
plt.show()

Requirements

  • Python 3
  • NumPy
  • Pandas
  • Scikit-learn
  • Matplotlib

List of available wrapper feature selection methods

  • Note that the methods are altered so that they can be used in feature selection tasks
  • The extra parameters are the parameter(s) other than the population size and the maximum number of iterations
  • Click on the name of a method to view how to set its extra parameter(s)
  • Use opts to set the method-specific parameters
  • If you do not set the extra parameters, the algorithm will use its default settings (see the sketch after the table below)
No. | Abbreviation | Name                          | Year | Extra Parameters
----|--------------|-------------------------------|------|-----------------
13  | hho          | Harris Hawk Optimization      | 2019 | No
12  | ssa          | Salp Swarm Algorithm          | 2017 | No
11  | woa          | Whale Optimization Algorithm  | 2016 | Yes
10  | sca          | Sine Cosine Algorithm         | 2016 | Yes
09  | ja           | Jaya Algorithm                | 2016 | No
08  | gwo          | Grey Wolf Optimizer           | 2014 | No
07  | fpa          | Flower Pollination Algorithm  | 2012 | Yes
06  | ba           | Bat Algorithm                 | 2010 | Yes
05  | fa           | Firefly Algorithm             | 2010 | Yes
04  | cs           | Cuckoo Search Algorithm       | 2009 | Yes
03  | de           | Differential Evolution        | 1997 | Yes
02  | pso          | Particle Swarm Optimization   | 1995 | Yes
01  | ga           | Genetic Algorithm             | -    | Yes
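
As a quick illustration of the default behaviour, here is a minimal sketch (not an official demo) that runs a method with no extra parameters, passing only the common settings; feat, label, and fold are as in the examples above:

from FS.gwo import jfs   # Grey Wolf Optimizer: no extra parameters in the table above

opts = {'k': 5, 'fold': fold, 'N': 10, 'T': 100}  # unset extras fall back to the defaults
fmdl = jfs(feat, label, opts)
print("Feature Size:", fmdl['nf'])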


wrapper-feature-selection-toolbox-python's Issues

Wrapper feature selection for multi-class and continuous variables?

Hi there,

Thank you for your hard work on the Python version of the wrapper feature selection toolbox. As I understand it, all the code is for binary classification problems. You said we can adapt the code to multi-class and/or continuous variables, but we would need to rewrite all the functions.

So, will you release another version of this wrapper feature selection toolbox for multi-class and/or continuous variables in the future? We used the Genetic Algorithm and PSO previously, and would love to extend the experiments to other metaheuristic algorithms.

All the best,
Thang

Fitness Function

Hi there,
Which fitness function is used by the algorithm: is it overall accuracy or something else?
Could you please explain, since I am new to the topic of wrapper feature selection methods?

Adding another .csv file to PSO example

Hi! I really appreciate Mr. Jingwei Too for this. I tried Demo_PSO with ionosphere.csv and it ran perfectly well... But when I replaced the ionosphere.csv file with another file, an error occurred:

ValueError: Unknown label type: continuous. Maybe you are trying to fit a classifier, which expects discrete classes on a regression target with continuous values.

How to solve this problem?

regression problem

When I replaced KNN with linear regression, and replaced accuracy with R² and mean squared error,
I got this error: (ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.)
My dataset consists of 5 inputs and 1 output, all numerical.
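
For what it is worth, this particular ValueError comes from train_test_split(..., stratify=label) in the demo: stratification requires discrete classes with at least two members each, so it fails on a continuous target. A minimal sketch of the split without stratification:

    from sklearn.model_selection import train_test_split

    # for a regression target, drop the stratify argument used in the demo
    xtrain, xtest, ytrain, ytest = train_test_split(feat, label, test_size=0.3)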

Error

parameter

    IndexError                                Traceback (most recent call last)
    in ()
          1
          2 print("parameter")
    ----> 3 jfs(xtrain, ytrain, opts)

    C:\Users\karkorum computer\FS\FS\hho.pyc in jfs(xtrain, ytrain, opts)
         59     # Parameters
         60     ub    = 1
    ---> 61     lb    = 0
         62     thres = 0.5
         63     beta  = 1.5     # levy component

    IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

INSTALL

from FS.pso import jfs

How do I pip install this library in Colab? It shows that no library is found. Could you please give a solution?
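
A hedged workaround, assuming FS is a package folder shipped inside the repository (as the path ...\FS\FS\hho.pyc in the traceback above suggests) rather than a PyPI package: clone the repository and make sure the folder containing FS is on sys.path.

    # In Colab (hypothetical paths):
    #   !git clone https://github.com/jingweitoo/wrapper-feature-selection-toolbox-python
    import sys
    sys.path.append('wrapper-feature-selection-toolbox-python')  # folder that contains FS/
    from FS.pso import jfs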

feature selection

Hello, I would like to take the selected optimal features and feed them into a neural network for classification. How can I achieve this?
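
One possible approach, sketched here as an assumption rather than the maintainer's answer: take fmdl['sf'] as in the README examples to subset the feature columns, then fit any classifier on them, e.g. scikit-learn's MLPClassifier as a stand-in neural network.

    from sklearn.neural_network import MLPClassifier

    # sf = fmdl['sf']; xtrain, xtest, ytrain, ytest as in the README examples
    nn = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)  # hypothetical settings
    nn.fit(xtrain[:, sf], ytrain)
    print("NN accuracy:", nn.score(xtest[:, sf], ytest))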

Install

How can I install the library via conda or pip?

ValueError

Hi, I wonder why the following error appears after I copy your code and run it:
ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

A bug in function `boundary` (pso.py).

The original boundary returns the bounds lb and ub if x spills over;
however, what needs to be limited is the overspilled value, not the whole vector.
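
A minimal sketch of the fix the reporter seems to have in mind, assuming boundary receives a position vector x and scalar bounds lb and ub: np.clip limits only the out-of-range entries instead of replacing the whole vector.

    import numpy as np

    def boundary(x, lb, ub):
        # clip each element into [lb, ub]; in-range values pass through unchanged
        return np.clip(x, lb, ub)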

alpha value in fitness

Hi,
Can you please tell me why the alpha value in the fitness function is chosen as 0.99 specifically?

Question?

Hi JingweiTOO,

Thank you for your hard work on the Python version of the wrapper feature selection toolbox.
For feature selection, shouldn't cross-validation be used when the fitness function is calculated?
How should the fitness function change for regression problems? I would appreciate some hints or code.

What does k-value used for?

The README.md says the k-value is used in KNN, and the examples use KNN, so we can decide which k-value to use. But what happens when KNN is not used? I want to use this toolbox with XGBoost.

To change fitness with XGBoost or Random forest?

Hello Jing Wei,
Thank you for the hard work. Could you please advise how to use your wrapper with a different fitness classifier such as XGBoost or random forest? I can see there are quite a lot of code steps in the file named functionHO.py, and I don't want to mess things up.

Best Regards,
rpr.
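
A hedged sketch, not the toolbox's API: changing the fitness classifier itself would mean editing the error-rate code in functionHO.py, but nothing stops you from training a different final model on the selected features, e.g. a random forest.

    from sklearn.ensemble import RandomForestClassifier

    # sf = fmdl['sf']; data splits as in the README examples
    rf = RandomForestClassifier(n_estimators=100)  # hypothetical settings
    rf.fit(xtrain[:, sf], ytrain)
    print("RF accuracy:", rf.score(xtest[:, sf], ytest))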

About cost in FunctionHO.py

Hi, good morning,
I wonder what the cost code here is and what it does:

cost = alpha * error + beta * (num_feat / max_feat)

And I know that you use the error rate as fitness, but why use the cost value as fitness instead of the error rate?
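
For illustration, a minimal sketch of the cost shown above, assuming alpha + beta = 1 (the value alpha = 0.99 is mentioned in another issue on this page): the cost trades the validation error against the fraction of selected features, so at equal error a solution using fewer features scores a lower (better) fitness.

    def cost(error, num_feat, max_feat, alpha=0.99):
        beta = 1 - alpha                    # assumption: the two weights sum to 1
        return alpha * error + beta * (num_feat / max_feat)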

Question

Hello dear Jingwei Too.
I am a graduate student in data science, and I used your code in my dissertation.
I would like to know whether you determined the parameters of the algorithms based on the original articles?
Thank you in advance for your response.
[email protected]

Using selected features with other Neural Network models like LSTM

Hello Jing Wei. I have to say this is amazing content for feature selection, maybe the best. I just want to ask: I am trying to build a hybrid model with PSO and LSTM, and I think your algorithm suits this well. What do you think? Is it possible to select features using PSO (with the error calculated by KNN) and then model the selected features using an LSTM?

solution please

When I run
from FS.pso import jfs
in Spyder, the message "No module named 'FS'" appears.

A huge bug in `error_rate` function, the evaluation is wrong.

The original code is:

    # Number of instances
    num_train = np.size(xt, 0)
    num_valid = np.size(xv, 0)
    # Define selected features
    xtrain  = xt[:, x == 1]
    ytrain  = yt.reshape(num_train)  # Solve bug
    xvalid  = xv[:, x == 1]
    yvalid  = yv.reshape(num_valid)  # Solve bug   

However, code such as xt[:, x == 1] will not select the features; the "Solve bug" lines are still wrong.

In fact, the code should be:

    xtrain  = xt[:, np.where(x == 1)[0]]
    xvalid  = xv[:, np.where(x == 1)[0]]

    # Training
    mdl     = KNeighborsClassifier(n_neighbors = k)
    mdl.fit(xtrain, yt)
    # Prediction
    ypred   = mdl.predict(xvalid)
    acc     = np.sum(yv == ypred) / len(yv)

Binary Conversion

Hi JingweiToo
First I want to really congratulate and thank you for this great contribution.
Can you please elaborate on the significance of and need for the binary conversion function in the HHO.py module?
Thanks in advance.
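
For context, a hypothetical sketch of the usual threshold-based binary conversion in wrapper methods (the thres = 0.5 in the hho.py source quoted in the traceback above hints at this form): a continuous position vector is mapped to a 0/1 mask that decides which features are selected.

    import numpy as np

    def binary_conversion(X, thres=0.5):
        # positions above the threshold select the corresponding feature
        return (X > thres).astype(int)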

Number of Selected Feature

Hi there,

Is it possible to specify the number of selected features? Or does the algorithm determine the number on its own?

Thank you.
