Artist filtering with tag balancing


1) DESCRIPTION

This R script splits a data set into n folds such that each artist, together with its songs and tags, appears in exactly one fold, and the number of artists and songs per tag is balanced across the folds.
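
To make the idea of artist-level splitting concrete, here is a minimal sketch. It is not the repository's implementation: the helper name assign_artist_folds, the column names (artist, song, tag) and the greedy balancing rule are all assumptions chosen for illustration. It only shows one way to keep every artist inside a single fold while keeping per-tag song counts roughly even.

```r
# Hypothetical helper (not the repository's code): place whole artists into
# folds, greedily keeping the per-tag song counts of the folds close.
assign_artist_folds <- function(df, nfolds = 3) {
  artist <- as.character(df$artist)
  tags <- unique(as.character(df$tag))
  # counts[f, t] = number of songs with tag t already placed in fold f
  counts <- matrix(0, nrow = nfolds, ncol = length(tags))
  # Handle artists with the most rows first, since they are hardest to place
  sizes <- sort(table(artist), decreasing = TRUE)
  fold_of <- integer(0)
  for (a in names(sizes)) {
    # Songs per tag for this artist, aligned with the columns of counts
    tc <- as.vector(table(factor(df$tag[artist == a], levels = tags)))
    # Current load of each fold, restricted to this artist's tags
    load <- rowSums(counts[, tc > 0, drop = FALSE])
    f <- which.min(load)
    counts[f, ] <- counts[f, ] + tc
    fold_of[a] <- f
  }
  df$fold <- unname(fold_of[artist])
  df
}

# Small illustrative data set (assumed column names)
songs <- data.frame(
  artist = c("artist1", "artist1", "artist2", "artist3", "artist6", "artist7"),
  song   = c("song1.mp3", "song2.mp3", "song4.mp3", "song5.mp3",
             "song11.mp3", "song33.mp3"),
  tag    = c("tag1", "tag1", "tag1", "tag1", "tag2", "tag2"),
  stringsAsFactors = FALSE
)
folded <- assign_artist_folds(songs, nfolds = 3)
```

Because the fold is chosen once per artist and then copied to all of that artist's rows, no artist can end up in two folds, whatever balancing rule is used.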


2) HOW TO USE IT?

In the R environment, load the R script with the command: source('artist_filtering_with_tag_balancing.R')

Then run it with the following command: artist_filtering_with_tag_balancing(input='weaklabel.k3p30.h.csv', nfolds=3, minpertag=5)

The function takes 3 parameters:

- input: The input file;
- nfolds: Number of folds into which the data set is split (default 3);
- minpertag: Minimum number of songs per tag that each fold must contain. A tag that does not reach this minimum is removed, together with its songs, from all folds (default 5).
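
As an illustration of how nfolds and minpertag interact: a tag can only have minpertag songs in every fold if it has at least nfolds * minpertag songs overall. The sketch below flags tags that fail this necessary condition. The helper name tags_below_minimum and the column names are assumptions, and the script itself may apply a different or stricter rule.

```r
# Hypothetical helper: tags with too few songs to place `minpertag`
# songs in each of `nfolds` folds (a necessary condition only).
tags_below_minimum <- function(df, nfolds = 3, minpertag = 5) {
  # Number of distinct song files per tag
  songs_per_tag <- tapply(df$song, df$tag, function(s) length(unique(s)))
  names(songs_per_tag)[songs_per_tag < nfolds * minpertag]
}

# Illustrative data with assumed column names: 7 songs for tag1, 6 for tag2
songs <- data.frame(
  artist = c(rep("artist1", 3), "artist2", "artist3", "artist4", "artist5",
             rep("artist6", 2), "artist7", rep("artist8", 3)),
  song   = paste0("song", 1:13, ".mp3"),
  tag    = c(rep("tag1", 7), rep("tag2", 6)),
  stringsAsFactors = FALSE
)

too_small_default <- tags_below_minimum(songs, nfolds = 3, minpertag = 5)
too_small_relaxed <- tags_below_minimum(songs, nfolds = 3, minpertag = 2)
```

With the defaults, both tags fall below the 15 songs needed, so a data set this small would need a lower minpertag to keep any tags.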


3) INPUT DATA SET FORMAT

The input is a text file with 3 columns separated by the tab character ("\t"). The first column holds the artist, the second the song file name, and the third the tag. An illustrative example of the format follows. A real input file ("input.csv") can be found in the directory "example/input/".

artist1	song1.mp3	tag1
artist1	song2.mp3	tag1
artist1	song3.mp3	tag1
artist2	song4.mp3	tag1
artist3	song5.mp3	tag1
artist4	song34.mp3	tag1
artist5	song44.mp3	tag1
artist6	song11.mp3	tag2
artist6	song23.mp3	tag2
artist7	song33.mp3	tag2
artist8	song55.mp3	tag2
artist8	song56.mp3	tag2
artist8	song57.mp3	tag2
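
To inspect a file in this format before running the script, it can be read with base R. This is a minimal sketch that assumes the file has no header row (the example above shows none); the column names artist, song and tag are labels chosen here, not names defined by the script.

```r
# Write a few rows in the documented tab-separated format to a temporary
# file, so the example is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("artist1\tsong1.mp3\ttag1",
             "artist2\tsong4.mp3\ttag1",
             "artist6\tsong11.mp3\ttag2"), path)

# Read it back; header = FALSE is an assumption based on the example above
songs <- read.delim(path, header = FALSE,
                    col.names = c("artist", "song", "tag"),
                    stringsAsFactors = FALSE)

# Quick summary: number of songs per tag
songs_per_tag <- table(songs$tag)
```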


4) OUTPUT FILES

For a run with 3 folds, the R script writes the 9 files described below:

- dataset_fold_1.csv: A file containing the artists, songs and tags of fold 1. The data fields are separated by the tab character ("\t");
- dataset_fold_2.csv: A file containing the artists, songs and tags of fold 2. The data fields are separated by the tab character ("\t");
- dataset_fold_3.csv: A file containing the artists, songs and tags of fold 3. The data fields are separated by the tab character ("\t");
- statistic_fold_1.txt: Statistics for fold 1;
- statistic_fold_2.txt: Statistics for fold 2;
- statistic_fold_3.txt: Statistics for fold 3;
- songFiles.txt: List of the song file names in all folds;
- tagVocabulary.txt: List of the tags in all folds;
- tags_with_few_examples.txt: List of the tags with fewer songs than the minimum set in the parameter "minpertag". These tags, with their respective artists and songs, are removed from all folds.

A real example of output files can be found in the directory "example/output/".
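
One way to sanity-check the result is to confirm that no artist appears in more than one fold. The sketch below works on data frames already loaded into memory. The helper name artists_are_disjoint and the column name artist are assumptions; when loading the real dataset_fold_*.csv files (for instance with read.delim), whether they contain a header row should be checked first.

```r
# Hypothetical check: TRUE if no artist occurs in more than one fold.
# `folds` is a list of data frames, one per fold, each with an `artist` column.
artists_are_disjoint <- function(folds) {
  # Collect each fold's distinct artists, then look for repeats across folds
  all_artists <- unlist(lapply(folds, function(f) unique(as.character(f$artist))))
  !any(duplicated(all_artists))
}

fold1 <- data.frame(artist = c("artist1", "artist1"), stringsAsFactors = FALSE)
fold2 <- data.frame(artist = c("artist2", "artist3"), stringsAsFactors = FALSE)
fold3 <- data.frame(artist = c("artist3"), stringsAsFactors = FALSE)

ok_split  <- artists_are_disjoint(list(fold1, fold2))
bad_split <- artists_are_disjoint(list(fold1, fold2, fold3))
```

Here ok_split is TRUE, while bad_split is FALSE because artist3 occurs in both fold2 and fold3.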

