Tournament Feature Selection

A hierarchical, iterative approach to feature selection (and a discontinued research idea). Features are randomly partitioned into small groups of size m (m is an input parameter). For each group, we determine feature importance and select a subset of features (the variants are listed below). The selected features are merged and split again into groups of size m, their importance is evaluated again, and so on. The approach is inspired by sports tournaments (hierarchical elimination) and crowdsourcing (decisions based on small tasks). The code is an early prototype at best. We also have not pinned the versions of our R packages or the environment, so you need to install the dependencies manually.
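
The partition, select, merge cycle described above can be sketched in a few lines. This is a minimal illustration in Python, not the repository's R code. The names `partition`, `tournament`, `select_group` and `toy_score` are hypothetical, and the stopping rule (stop once at most `target` features remain, or once a round eliminates nothing) is an assumption, since the description does not state one.

```python
import random

def partition(features, m, rng):
    """Shuffle the features and cut them into groups of at most m."""
    shuffled = list(features)
    rng.shuffle(shuffled)
    return [shuffled[i:i + m] for i in range(0, len(shuffled), m)]

def tournament(features, m, select_group, target, seed=0):
    """Repeat partition -> per-group selection -> merge.

    Assumed stopping rule: stop when at most `target` features remain
    or when a round does not eliminate any feature.
    """
    rng = random.Random(seed)
    remaining = list(features)
    while len(remaining) > target:
        survivors = []
        for group in partition(remaining, m, rng):
            survivors.extend(select_group(group))
        if len(survivors) >= len(remaining):  # no progress, give up
            break
        remaining = survivors
    return remaining

# Toy stand-in for a real importance measure: one fixed score per feature.
toy_score = {f"x{i}": i for i in range(16)}

def keep_top_2(group):
    """Per-group rule for this demo: keep the two highest-scoring features."""
    return sorted(group, key=toy_score.get, reverse=True)[:2]

winners = tournament(list(toy_score), m=4, select_group=keep_top_2, target=2)
print(sorted(winners))  # ['x14', 'x15']
```

With 16 features, m = 4 and two survivors per group, the rounds shrink the pool to 8, 4 and finally 2 features. In a real run, `select_group` would wrap one of the selection variants listed below instead of a fixed score table.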

TournamentFeatureSelection.R provides:

  • a train-test split on a demo dataset
  • calls to Tournament FS and genetic-algorithm FS
  • calls to three classifiers: kNN, SVM and xgboost

The script is intended for interactive use, not for being run from start to finish.

TournamentFeatureSelectionUtility.R provides several functions:

  • the generator for a demo binary-classification dataset in which it is clear which features are useful and which are not (the target variable is a Boolean combination of decision-tree-like splits on some features), with a certain amount of label noise (target variable flipped)
  • classification performance measures (MCC and a penalized version considering the number of selected features)
  • variants of tournament FS:
    • classification-score-based: For each feature group of size m, run a classifier (implemented: kNN, logistic regression, Naive Bayes, SVM, xgboost) for all feature subsets of size k (input parameter). Aggregate classification performance:
      • local max: Per group, select the feature set of size k with the highest classification performance.
      • local mean: For each feature, take the average classification performance over all feature subsets of size k the feature is involved in. Per group, select the k features with the highest average classification performance.
      • local median: As before, just using the median instead of the mean.
      • global mean (median or max would also be possible): For each feature, take the average classification performance over all feature subsets of size k the feature is involved in. Instead of ranking this classification performance locally per group, create a global ranking over all groups and select the top k * m features.
    • penalized version of classification-score-based: Instead of just trying feature sets of size k per group, try all feature subsets of sizes 1..m. However, classification performance is penalized linearly with the number of features. The selection of the best feature subset per group works as in the unpenalized classification scoring (i.e., there are several options).
    • importance-score-based: For each group of size m, run a classifier (here: xgboost) with an internal importance measure once (without running it for smaller subsets). Aggregate this:
      • local xgboost-based: From each group, select k features with the highest importance.
      • global xgboost-based: Instead of doing an importance ranking locally per group, create a global ranking over all groups and select the top k * m features.
    • model-based with xgboost: For each group of size m, train a tree-based xgboost model and select all features which occur in any of the trees. Thus, the number of features selected per group is variable.
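
The classification-score-based aggregation rules listed above (local max, local mean, global mean) differ only in how subset scores are turned into a selection. The following Python sketch illustrates that difference. It is not the repository's R code, and the `score` callable is a hypothetical stand-in for "train a classifier on this subset and measure its performance". It assumes every group has at least k features.

```python
from itertools import combinations
from statistics import mean

def subset_scores(group, k, score):
    """Score every size-k subset of one group."""
    return {subset: score(subset) for subset in combinations(group, k)}

def local_max(group, k, score):
    """Local max: keep the single best-scoring subset of size k."""
    scores = subset_scores(group, k, score)
    return list(max(scores, key=scores.get))

def feature_means(group, k, score):
    """Per feature, average the scores of all size-k subsets containing it."""
    scores = subset_scores(group, k, score)
    return {f: mean(s for subset, s in scores.items() if f in subset)
            for f in group}

def local_mean(group, k, score):
    """Local mean: keep the k features with the highest average score."""
    means = feature_means(group, k, score)
    return sorted(group, key=means.get, reverse=True)[:k]

def global_mean(groups, k, score, n_select):
    """Global mean: rank the per-feature averages across all groups."""
    means = {}
    for group in groups:
        means.update(feature_means(group, k, score))
    return sorted(means, key=means.get, reverse=True)[:n_select]

# Toy score: sum of made-up usefulness values (integers avoid rounding ties).
usefulness = {"a": 1, "b": 2, "c": 6, "d": 3, "e": 5, "f": 0}

def toy_score(subset):
    return sum(usefulness[f] for f in subset)

groups = [["a", "b", "c"], ["d", "e", "f"]]
print(local_max(groups[0], 2, toy_score))             # ['b', 'c']
print(local_mean(groups[0], 2, toy_score))            # ['c', 'b']
print(global_mean(groups, 2, toy_score, n_select=2))  # ['c', 'e']
```

The local median variant would replace `mean` with `statistics.median`. The number of features kept by the global ranking is left as the parameter `n_select` here; the description above sets it to k * m.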
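
The performance measures mentioned above (MCC and a penalized version) can be written down compactly. The sketch below is in Python rather than the repository's R. The MCC formula is the standard one; the exact penalty used by the project is not given, so `penalized_mcc` assumes the simplest linear form, MCC minus a constant times the number of selected features, with a hypothetical default of 0.01.

```python
from math import sqrt

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1 or bool)."""
    pairs = [(bool(t), bool(p)) for t, p in zip(y_true, y_pred)]
    tp = sum(t and p for t, p in pairs)
    tn = sum(not t and not p for t, p in pairs)
    fp = sum(not t and p for t, p in pairs)
    fn = sum(t and not p for t, p in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0 when a row or column of the
    # confusion matrix is empty and the ratio is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0

def penalized_mcc(y_true, y_pred, n_features, penalty=0.01):
    """Assumed linear penalty: subtract `penalty` per selected feature."""
    return mcc(y_true, y_pred) - penalty * n_features

y_true = [1, 1, 0, 0]
print(mcc(y_true, [1, 1, 0, 0]))  # 1.0  (perfect prediction)
print(mcc(y_true, [0, 0, 1, 1]))  # -1.0 (fully inverted)
print(mcc(y_true, [1, 0, 1, 0]))  # 0.0  (no better than chance)
```

With such a penalty, a subset of three features has to beat a single feature by more than 2 * `penalty` in MCC to be preferred, which is what lets the penalized variant compare subsets of different sizes within a group.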
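
The demo dataset described above (a Boolean combination of decision-tree-like splits, with flipped labels as noise) might look like the following Python sketch. It is not the repository's generator: the concrete rule in `clean_label`, the thresholds, and the parameter names are made up for illustration. It assumes at least three features, of which only the first three influence the label.

```python
import random

def clean_label(row):
    """Hypothetical ground truth: Boolean combination of threshold splits
    on features 0, 1 and 2; all other features are irrelevant."""
    return (row[0] > 0.5 and row[1] <= 0.3) or row[2] > 0.8

def make_demo_data(n_rows, n_features, noise=0.1, seed=0):
    """Uniform random features; each label is flipped with probability `noise`."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(n_features)] for _ in range(n_rows)]
    y = []
    for row in X:
        label = clean_label(row)
        if rng.random() < noise:  # label noise: flip the target
            label = not label
        y.append(label)
    return X, y

X, y = make_demo_data(n_rows=200, n_features=10, noise=0.1, seed=42)
flipped = sum(label != clean_label(row) for row, label in zip(X, y))
print(f"{flipped} of {len(y)} labels were flipped")
```

Because the useful features are known by construction, a feature-selection run on such data can be checked directly: a good method should keep features 0, 1 and 2 and discard the rest.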
