cissagatto / jaccard


License: GNU General Public License v3.0

R 100.00%
jaccard-index machine-learning multilabel-classification multilabel-partition partitioning similarity-measures

Jaccard

This code generates partitions for a multilabel dataset using the Jaccard index as the similarity measure. We use HCLUST with 6 linkage metrics to generate several partitions. You may build the partition with the highest coefficient. This code also provides an analysis of the partitioning.
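The idea can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the code from this repository: the toy label matrix is made up, the six linkage methods are simply six of the methods that `stats::hclust` accepts, the "coefficient" is taken to be the cophenetic correlation, and the cut into two groups is arbitrary.

```r
# Toy binary label matrix: rows are instances, columns are labels (made-up data).
labels <- matrix(
  c(1, 1, 0, 0,
    1, 1, 0, 1,
    0, 0, 1, 1,
    0, 1, 1, 1,
    1, 0, 0, 0),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("L1", "L2", "L3", "L4"))
)

# Jaccard index between every pair of labels: |A and B| / |A or B|,
# where A and B are the sets of instances that carry each label.
jaccard_similarity <- function(y) {
  both <- crossprod(y)                     # co-occurrence counts
  freq <- diag(both)                       # how often each label occurs
  either <- outer(freq, freq, "+") - both  # size of the union
  sim <- both / either
  sim[either == 0] <- 0                    # two labels that never occur
  sim
}

sim <- jaccard_similarity(labels)
dist_labels <- as.dist(1 - sim)            # hclust expects dissimilarities

# ASSUMPTION: six of the linkage methods offered by stats::hclust.
methods <- c("single", "complete", "average", "mcquitty", "ward.D", "ward.D2")

# ASSUMPTION: score each dendrogram by its cophenetic correlation.
coefficients <- sapply(methods, function(m) {
  tree <- hclust(dist_labels, method = m)
  cor(as.vector(dist_labels), as.vector(cophenetic(tree)))
})

# Build the partition from the best-scoring linkage (k = 2 is arbitrary here).
best <- names(which.max(coefficients))
partition <- cutree(hclust(dist_labels, method = best), k = 2)
print(coefficients)
print(partition)
```

Each element of `partition` gives the group a label was assigned to, so labels that share a number would be handled together.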

Preparing your experiment

Step-1

Confirms if the folder .......

Step-2

Copy this code to a location of your choice. The folder is configured as "~/jaccard".

Step-3

A file called datasets-original.csv must be in the root folder of the project. The code reads the information about each dataset from this file. The .csv file currently describes 90 multilabel datasets. To use another dataset, add the following information about it to the file:

| Parameter | Status | Description |
|---|---|---|
| Id | mandatory | Integer number to identify the dataset |
| Name | mandatory | Dataset name (please follow the benchmark) |
| Domain | optional | Dataset domain |
| Instances | mandatory | Total number of dataset instances |
| Attributes | mandatory | Total number of dataset attributes |
| Labels | mandatory | Total number of labels in the label space |
| Inputs | mandatory | Total number of dataset input attributes |
| Cardinality | optional | |
| Density | optional | |
| Labelsets | optional | |
| Single | optional | |
| Max.freq | optional | |
| Mean.IR | optional | |
| Scumble | optional | |
| TCS | optional | |
| AttStart | mandatory | Column number where the attribute space begins* |
| AttEnd | mandatory | Column number where the attribute space ends |
| LabelStart | mandatory | Column number where the label space begins |
| LabelEnd | mandatory | Column number where the label space ends |
| Distinct | optional | |
| xn | mandatory | Value for dimension X of the Kohonen map |
| yn | mandatory | Value for dimension Y of the Kohonen map |
| gridn | mandatory | X times Y value. Kohonen's map must be square |
| max.neigbors | mandatory | The maximum number of neighbors is given by LABELS - 1 |

* Because it is the first column, the number is always 1.
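Before adding a new row, it may help to check it against the rules stated in the table. The sketch below is only an illustration: the one-row file and all of its values are invented, `check_dataset_row` is a hypothetical helper that is not part of this repository, and the check that `Labels` equals the width of the label range is an assumption derived from the column descriptions.

```r
# Write a one-row toy metadata file; every value here is invented.
csv_path <- tempfile(fileext = ".csv")
toy <- data.frame(
  Id = 1, Name = "toy", Instances = 100, Attributes = 16,
  Labels = 6, Inputs = 10,
  AttStart = 1, AttEnd = 10, LabelStart = 11, LabelEnd = 16,
  xn = 4, yn = 4, gridn = 16, max.neigbors = 5
)
write.csv(toy, csv_path, row.names = FALSE)

datasets <- read.csv(csv_path, stringsAsFactors = FALSE)

# Hypothetical helper: checks one row against the rules in the table.
check_dataset_row <- function(ds) {
  required <- c("Id", "Name", "Instances", "Attributes", "Labels", "Inputs",
                "AttStart", "AttEnd", "LabelStart", "LabelEnd",
                "xn", "yn", "gridn", "max.neigbors")
  missing_cols <- setdiff(required, names(ds))
  if (length(missing_cols) > 0) {
    stop("Missing mandatory columns: ", paste(missing_cols, collapse = ", "))
  }
  c(
    att_start_is_one   = ds$AttStart == 1,                 # first column is always 1
    square_map         = ds$xn == ds$yn,                   # Kohonen map must be square
    grid_is_product    = ds$gridn == ds$xn * ds$yn,        # gridn is X times Y
    neighbors_ok       = ds$max.neigbors == ds$Labels - 1, # LABELS - 1
    # ASSUMPTION: the label range spans exactly `Labels` columns.
    labels_match_range = ds$LabelEnd - ds$LabelStart + 1 == ds$Labels
  )
}

checks <- check_dataset_row(datasets[1, ])
print(checks)
```

Any `FALSE` entry in `checks` points at the column that disagrees with the table.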

Step-4

All the R packages required to execute this code must already be installed on your machine. The file libraries.R shows which packages are needed. This code does not install any package automatically! You can also use the Conda environment that I created to perform this experiment. The links below download its files.
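Since nothing is installed automatically, a quick check for missing packages can save a failed run. The sketch below uses a placeholder package list, which is an assumption: the real names are the ones loaded in libraries.R.

```r
# ASSUMPTION: placeholder names; replace them with the packages from libraries.R.
required_packages <- c("stats", "utils", "aPackageThatIsNotInstalled")

# requireNamespace() returns FALSE instead of failing when a package is absent.
is_installed <- vapply(required_packages, requireNamespace,
                       logical(1), quietly = TRUE)
missing_packages <- required_packages[!is_installed]

if (length(missing_packages) > 0) {
  message("Install these packages before running: ",
          paste(missing_packages, collapse = ", "))
}
```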

| download txt | download yml | download yaml |

Run

To run, first enter the folder ~/jaccard/R in a terminal and then type (check this information):

Rscript jaccard.R [number_dataset] [number_cores] [number_folds] [validation] [folder]

Where:

number_dataset is the dataset number in the datasets.csv file

number_cores is the number of cores that you want to use in parallel

number_folds is the number of folds you want for cross-validation

validation is 0 if you do not want a validation set and 1 if you do

folder is a temporary folder, such as SHM or SCRATCH, used to speed up the process
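The five positional arguments could be read and validated along these lines. This is a sketch of one possible approach, not the parsing code in jaccard.R: `parse_jaccard_args` is a hypothetical helper and the example values are invented.

```r
# Hypothetical helper: turns the five positional arguments into a named list.
parse_jaccard_args <- function(args) {
  if (length(args) != 5) {
    stop("Usage: Rscript jaccard.R [number_dataset] [number_cores] ",
         "[number_folds] [validation] [folder]")
  }
  parsed <- list(
    number_dataset = suppressWarnings(as.integer(args[1])),
    number_cores   = suppressWarnings(as.integer(args[2])),
    number_folds   = suppressWarnings(as.integer(args[3])),
    validation     = suppressWarnings(as.integer(args[4])),
    folder         = args[5]
  )
  if (anyNA(unlist(parsed[1:4]))) {
    stop("The first four arguments must be integers")
  }
  if (!parsed$validation %in% c(0L, 1L)) {
    stop("validation must be 0 or 1")
  }
  parsed
}

# Inside a script run with Rscript, the arguments would come from:
#   parse_jaccard_args(commandArgs(trailingOnly = TRUE))
# Here the values are passed directly (invented for illustration).
params <- parse_jaccard_args(c("17", "4", "10", "1", "/dev/shm"))
str(params)
```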

Folder Structure

Acknowledgment

  • This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.
  • This study was financed in part by the Conselho Nacional de Desenvolvimento Científico e Tecnológico - Brasil (CNPQ) - Process number 200371/2022-3.
  • The authors also thank the Brazilian research agency FAPESP for financial support.

Contact

[email protected]

Links

| Site | Post-Graduate Program in Computer Science | Computer Department | Biomal | CNPQ | Ku Leuven | Embarcados | Read Prensa | Linkedin Company | Linkedin Profile | Instagram | Facebook | Twitter | Twitch | Youtube |

Thanks
