bnfs-qa's Introduction

Bayesian Network Feature Selection (BNFS)

Feature selection for classification and regression.

Final application project at the University of Trento

Table of Contents

Introduction
Installation
Running BNFS

Introduction

Bayesian Network Feature Selection (BNFS) tackles the feature selection problem with Bayesian networks. A Bayesian network models the relationships between the variables of a dataset and quantifies the strength of these relationships through probabilities. These probabilities can then serve as a measure of each feature's importance, so that the features most relevant to the prediction problem can be selected.

In brief, the pipeline consists of the following steps:

  1. Preprocess the data: clean and normalize the data so that it can be used to train a Bayesian network. This step also includes the discretization of the data.
  2. Bayesian Network Structure Learning: determine which variables are included as nodes in the network and how they are connected.
  3. Markov Blanket: determine the Markov blanket of the target feature.
  4. Feature Selection: select the features of the Markov blanket that are most relevant for predicting the target feature.

Installation

Prerequisites:

Make sure you have installed all of the following prerequisites on your development machine:

  • python3.6+
  • pip3

BNFS installation:

pip3 install bnfs

Running BNFS

Step 1: data preparation

Before running the tool, prepare a CSV table containing the samples and the target variable (the variable whose values are modeled and predicted from the other variables):

Example
Feature 1   Feature 2   Feature 3   TARGET
17.27       3           ETVDA       True
44.59       105         FBAER       False
...         ...         ...         ...
26.89       19          DDFBDF      False
15.56       298         CSDSD       True

Note that feature columns may be of any type (integer, float, or string) and that the target variable must be the last column of the table.
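As a minimal sketch, a table like the one above could be written with Python's standard csv module. The helper write_dataset and the file name data.csv are hypothetical; the helper simply moves the chosen target column to the last position before writing, which is the layout BNFS expects.

```python
import csv

def write_dataset(path, header, rows, target):
    """Write rows to a CSV file, moving the `target` column to the last position."""
    t = header.index(target)
    order = [i for i in range(len(header)) if i != t] + [t]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([header[i] for i in order])
        for row in rows:
            writer.writerow([row[i] for i in order])

# In-memory data with the target in the first column (illustrative values).
header = ["TARGET", "Feature 1", "Feature 2", "Feature 3"]
rows = [
    [True, 17.27, 3, "ETVDA"],
    [False, 44.59, 105, "FBAER"],
    [False, 26.89, 19, "DDFBDF"],
    [True, 15.56, 298, "CSDSD"],
]
write_dataset("data.csv", header, rows, target="TARGET")

# Read the file back: the target is now the last column.
with open("data.csv", newline="") as f:
    written = list(csv.reader(f))
print(written[0])  # ['Feature 1', 'Feature 2', 'Feature 3', 'TARGET']
```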

Step 2: creating the configuration file

The configuration file is a JSON file containing all customizable parameters of the feature selection algorithm.
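A minimal sketch of a configuration file, generated here with Python's json module. All values are illustrative, and the layout (flat top-level keys, with QA_kwargs and bnlearn_kwargs as nested objects) is an assumption inferred from the parameter list, not taken from the BNFS source. bnsl_data_path is omitted on the assumption that it is only needed when discretization is skipped.

```python
import json

# Illustrative values only; key names come from the parameter list,
# the nesting of QA_kwargs / bnlearn_kwargs is an assumption.
config = {
    "data_path": "data.csv",          # relative to this file's directory
    "output_dir": "results",
    "random_state": 42,
    "verbose": True,
    "full_Markov_blanket": True,
    # Discretization
    "discretize": True,
    "labels": [2],                    # assumed 0-based: "Feature 3" is categorical
    "n_bins": 5,
    "discretizer_strategy": "quantile",
    "keep_file": False,
    "divide_et_impera": False,
    # Bayesian network structure learning
    "dei_n": 2,
    "bnsl_strategy": "bnlearn",
    "QA_kwargs": {"reads": 100, "annealing_time": 20},
    "bnlearn_kwargs": {"metric": "bic", "search_algorithm": "hc"},
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=4)  # Python True/False become JSON true/false
```

The resulting config.json would then be passed to the tool with bnfs -c config.json.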

Available parameters

🔴 NOTE: all paths to files or directories can be either absolute or relative to the directory containing the configuration file.
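The path rule in the note can be illustrated with a small pathlib helper. resolve_path is hypothetical and only mirrors the documented behavior: absolute paths are kept, relative paths are joined to the directory of the configuration file. The example paths are POSIX-style.

```python
from pathlib import Path

def resolve_path(config_file, value):
    """Resolve a path read from the configuration file.

    Absolute paths are returned unchanged; relative paths are interpreted
    relative to the directory that contains the configuration file.
    """
    path = Path(value)
    if path.is_absolute():
        return path
    return Path(config_file).parent / path

print(resolve_path("/home/user/exp/config.json", "data.csv"))     # /home/user/exp/data.csv
print(resolve_path("/home/user/exp/config.json", "/data/x.csv"))  # /data/x.csv
```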

  • data_path Path to the CSV table of the data.

  • output_dir Path to directory for output files. If it doesn't exist, it will be created.

  • random_state Random seed (set to an arbitrary integer for reproducibility).

  • verbose If true, print info messages at each step (discretization, bnlearn and Markov blanket search).

  • full_Markov_blanket If true, the selected features are the union of the target node's parents, its children, and its children's other parents; otherwise, only its parents and children are selected.
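To illustrate what full_Markov_blanket changes, here is a sketch that extracts the Markov blanket of a target node from a DAG adjacency matrix. It assumes the convention adj[i][j] == 1 for an edge i -> j; the function is illustrative and is not the BNFS implementation.

```python
def markov_blanket(adj, target, full=True):
    """Return the sorted Markov blanket of `target` in a DAG.

    adj[i][j] == 1 means there is an edge i -> j (assumed convention).
    full=True:  parents + children + children's other parents.
    full=False: parents + children only.
    """
    n = len(adj)
    parents = {i for i in range(n) if adj[i][target]}
    children = {j for j in range(n) if adj[target][j]}
    blanket = parents | children
    if full:
        # Add the other parents ("spouses") of each child.
        for c in children:
            blanket |= {i for i in range(n) if adj[i][c]}
    blanket.discard(target)  # the target is itself a parent of its children
    return sorted(blanket)

# Toy DAG over 5 nodes; node 4 plays the role of the target (last column).
edges = [(3, 0), (0, 4), (4, 2), (1, 2)]
n = 5
adj = [[0] * n for _ in range(n)]
for i, j in edges:
    adj[i][j] = 1

print(markov_blanket(adj, 4))              # [0, 1, 2]
print(markov_blanket(adj, 4, full=False))  # [0, 2]
```

Node 1 is selected only in the full case, because it is a co-parent of the target's child 2. Node 3, a parent of the target's parent, is excluded in both cases.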

Discretization

  • discretize If false, the discretization step is skipped.

  • labels List of the index positions of the features that are categorical and therefore need label encoding.

  • n_bins Number of bins for discretization.

  • discretizer_strategy Strategy used to define the widths of the bins. {‘uniform’, ‘quantile’, ‘kmeans’}

  • keep_file If true, writes a CSV file containing the discretized dataset.

  • divide_et_impera If true, executes the steps of the divide et impera approach.
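A simplified sketch of the discretization step in plain Python: columns whose index appears in labels are label-encoded, and the remaining columns are split into n_bins equal-width bins. This corresponds to the 'uniform' strategy only ('quantile' and 'kmeans' are not implemented here). Indices in labels are assumed to be 0-based, the helper names are hypothetical, and BNFS's own discretizer may differ in details such as bin edges.

```python
def label_encode(values):
    """Map each distinct category to an integer code (codes follow sorted order)."""
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]

def uniform_bins(values, n_bins):
    """Equal-width binning: split [min, max] into n_bins intervals and return
    the 0-based bin index of each value (the maximum goes into the last bin)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant column: a single bin
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def discretize(columns, labels, n_bins):
    """Discretize feature columns; `labels` holds the (0-based) indices of
    categorical columns, which are label-encoded instead of binned."""
    return [label_encode(col) if idx in labels else uniform_bins(col, n_bins)
            for idx, col in enumerate(columns)]

# Feature columns from the example table (target excluded).
columns = [
    [17.27, 44.59, 26.89, 15.56],
    [3, 105, 19, 298],
    ["ETVDA", "FBAER", "DDFBDF", "CSDSD"],
]
print(discretize(columns, labels=[2], n_bins=3))
# [[0, 2, 1, 0], [0, 1, 0, 2], [2, 3, 1, 0]]
```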

Bayesian Network Structure Learning

  • dei_n Number of splits for the divide et impera approach.

  • bnsl_data_path Path to the CSV table of the discretized data (used when the discretization step is skipped).

  • bnsl_strategy Strategy used to learn the structure of the Bayesian network from data. {QA, SA, bnlearn}

QA_kwargs

  • reads Number of reads for the annealing method.

  • annealing_time Quantum annealing time per read, in microseconds.

bnlearn_kwargs

  • metric Scoring function that measures how well the Bayesian network fits the data. {k2, bic, bdeu}

  • search_algorithm Search algorithm used to explore the space of all possible DAGs. {ex, hc, cl, tan, cs, naivebayes}
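To give a concrete idea of what a scoring function such as bic measures, here is the textbook BIC score of a discrete Bayesian network: the log-likelihood of the data under maximum-likelihood parameters, minus (log N / 2) times the number of free parameters. This is a sketch, not the scoring code used by bnlearn, and it takes variable cardinalities from the observed data. A search algorithm such as hc compares candidate DAGs by scores of this kind.

```python
import math
from collections import Counter

def bic_score(data, parents):
    """Textbook BIC score of a discrete Bayesian network structure.

    data: list of rows (tuples of discrete values), one position per variable.
    parents: dict mapping a variable index to the list of its parent indices
             (variables missing from the dict have no parents).
    """
    n_rows = len(data)
    n_vars = len(data[0])
    # Cardinality r_i of each variable, taken from the values observed in data.
    card = [len({row[i] for row in data}) for i in range(n_vars)]
    score = 0.0
    for i in range(n_vars):
        pa = parents.get(i, [])
        # N_ijk: counts of (parent configuration, value); N_ij: counts of each configuration.
        joint = Counter((tuple(row[p] for p in pa), row[i]) for row in data)
        parent_counts = Counter(tuple(row[p] for p in pa) for row in data)
        # Log-likelihood term: sum of N_ijk * log(N_ijk / N_ij).
        score += sum(n * math.log(n / parent_counts[cfg]) for (cfg, _), n in joint.items())
        # Penalty: q_i * (r_i - 1) free parameters, q_i = product of parent cardinalities.
        q = 1
        for p in pa:
            q *= card[p]
        score -= 0.5 * math.log(n_rows) * q * (card[i] - 1)
    return score

# Two binary variables: perfectly correlated vs. independent.
correlated = [(0, 0)] * 10 + [(1, 1)] * 10
independent = [(0, 0), (0, 1), (1, 0), (1, 1)] * 5

for name, data in [("correlated", correlated), ("independent", independent)]:
    empty = bic_score(data, {})
    edge = bic_score(data, {1: [0]})
    print(name, "empty:", round(empty, 2), "edge 0->1:", round(edge, 2))
```

On the correlated data the edge 0 -> 1 scores higher than the empty graph; on the independent data the extra parameters are penalized and the empty graph wins.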

Step 3: running the pipeline

Once the input data and the configuration file are ready, run the algorithm as follows:

bnfs -c <config_file>

This prints several info messages to the console and writes a summary file to the specified output directory:

  • res.txt: contains the structure of the learned Bayesian network as an adjacency matrix, and the list of features selected with the Markov blanket method.

bnfs-qa's People

Contributors

mathiasdallapalma
