gnssnwo / noisefiltersr

This project forked from melissakey/noisefiltersr

This is a modified copy of the NoiseFiltersR package which should run significantly faster for many algorithms. I take no credit for the algorithms themselves or the implementation - I just sped a few things up.

noisefiltersr's Introduction

Description

NoiseFiltersR contains an extensive implementation of state-of-the-art and classical label noise preprocessing algorithms for classification problems. Such a collection was missing for R statistical software.

Specifically, NoiseFiltersR includes 30 label noise filters. All of them are documented with a general explanation of the method and the exact reference where it was first published. Moreover, they can be called in an R-user-friendly manner, and their results are unified by means of the filter class, which also benefits from adapted print and summary methods.

Installation

Use install.packages to install NoiseFiltersR and its dependencies from CRAN:

install.packages("NoiseFiltersR")

Once installed, use the command library to attach the package:

library("NoiseFiltersR")

Example of use

Once the package is installed and attached, the user can apply any of the implemented algorithms.

The following instruction shows how to use the well-known Iterative Partitioning Filter (IPF) (Khoshgoftaar & Rebours, 2007) to filter out class noise from the iris dataset. The formula indicates the classification variable, and the algorithm's default parameters are used:

out <- IPF(Species~., data = iris)

Then, the variable out is an object of class filter. This is a list with seven elements:

  • cleanData: a data frame containing the filtered dataset.
  • remIdx: a vector of integers indicating the indexes for removed instances (i.e. their row number with respect to the original data frame).
  • repIdx: a vector of integers indicating the indexes for repaired/relabelled instances (i.e. their row number with respect to the original data frame).
  • repLab: a factor containing the new labels for repaired instances.
  • parameters: a list containing the tuning parameters used for the filter.
  • call: an expression that contains the original call to the filter.
  • extraInf: a character that includes additional relevant information not covered by previous items.
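As a minimal sketch of how these elements might be inspected, the snippet below runs IPF on iris and looks at a few fields of the result. It assumes NoiseFiltersR and its dependencies are installed; the seed is only there in case the filter partitions the data at random, and the name manual is a hypothetical helper variable. For a filter such as IPF that removes instances rather than relabelling them, repIdx and repLab may well be empty.

```r
library(NoiseFiltersR)

set.seed(1)  # assumption: fixes any random partitioning so reruns agree
out <- IPF(Species ~ ., data = iris)

# Names of the elements stored in the filter object
names(out)

# Row numbers (with respect to iris) of the instances flagged as noisy
out$remIdx

# Dimensions of the filtered dataset
dim(out$cleanData)

# Dropping remIdx from the original data should give the same number of rows;
# the guard covers the case where no instance was removed
manual <- if (length(out$remIdx) > 0) iris[-out$remIdx, ] else iris
nrow(manual) == nrow(out$cleanData)

# Tuning parameters used for this run
out$parameters
```

Keeping remIdx alongside cleanData makes it possible to review which rows were discarded before committing to the filtered dataset.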

To appropriately display the information contained in a filter object, general functions print and summary can be used (more details about their output can be found in the package vignette):

print(out)
summary(out)

Finally, all the implemented algorithms can also be used without a formula argument, just indicating the dataset to be preprocessed and the column that contains the classification variable (last column is assumed by default):

out <- IPF(iris, classColumn = 5)
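When the classification variable is not the last column, classColumn has to be given explicitly. The following sketch illustrates this by moving Species to the first column of a copy of iris; the names iris_first and out_first are hypothetical and chosen only for this example, and the snippet assumes NoiseFiltersR is installed.

```r
library(NoiseFiltersR)

# Copy of iris with the class variable in the first column instead of the last
iris_first <- iris[, c("Species", setdiff(names(iris), "Species"))]

set.seed(1)  # assumption: fixes any random partitioning so reruns agree
out_first <- IPF(iris_first, classColumn = 1)

# The result is a filter object, so print and summary apply as before
summary(out_first)

# The filtered data keeps the column order of the input
names(out_first$cleanData)
```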

For more specific information on how to use each filter, please refer to that function's documentation page and the examples it contains. For a general overview of the NoiseFiltersR package, please consult the associated vignette.

Contributors

jluengo
