When training a classifier, if the input contains labels that are clustered, the precis

Low performance of trained classifier when the input labels are clustered about fasttext HOT 4 CLOSED

facebookresearch commented on June 15, 2024 18

Low performance of trained classifier when the input labels are clustered

from fasttext.

Comments (4)

riyadparvez commented on June 15, 2024 1

Wouldn't it be easier to just shuffle training data before training in fastText itself so that users do not get surprised?

adam2326 commented on June 15, 2024 1

@riyadparvez No. Since fastText does not accept IDs separately, it would be difficult to join the data back if it was part of a larger data set. I don't want to append an ID to the text itself. So: shuffle -> break text off -> fastText -> join. I can count on the order this way.
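
One possible reading of the shuffle -> break text off -> fastText -> join workflow is sketched below. The records, the `__label__` formatting of the toy lines, and the `fake_predict` helper are all assumptions made for illustration; `fake_predict` is a stand-in for whatever fastText step is run and is not a fastText API. The only property the sketch relies on is that the outputs come back one per input line, in input order.

```python
import random

# Hypothetical records from a larger data set: (id, label, text).
records = [
    (101, "sports", "the match went to overtime"),
    (102, "sports", "the striker scored twice"),
    (103, "politics", "the vote was postponed"),
    (104, "politics", "the senate passed the bill"),
]

# 1. Shuffle whole records, so each ID travels with its text.
shuffled = list(records)
random.Random(0).shuffle(shuffled)

# 2. Break the text off: IDs in one list, fastText-style lines in another,
#    both in the same (shuffled) order.
ids = [rec_id for rec_id, _, _ in shuffled]
fasttext_lines = [f"__label__{label} {text}" for _, label, text in shuffled]

# 3. Stand-in for the fastText step (hypothetical helper, NOT a fastText API).
#    It returns one output per input line, in input order.
def fake_predict(lines):
    return [line.split(" ", 1)[0] for line in lines]

outputs = fake_predict(fasttext_lines)

# 4. Join back by position: the i-th output belongs to the i-th ID.
joined = dict(zip(ids, outputs))
```

Because the join is purely positional, it only holds as long as nothing between steps 2 and 4 reorders or drops lines.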

EdouardGrave commented on June 15, 2024

Yes, this is expected behavior. fastText uses stochastic gradient descent to learn the model, and the examples are processed "in order". Thus, it is important to shuffle the training data, so that the model does not see all the examples of one class first, then all the examples of a second class, etc.
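
A minimal sketch of shuffling the training data before it is handed to fastText, using only the Python standard library. The function names, the toy examples, and the fixed seed are assumptions for illustration; the whole file is read into memory, so this is only suitable for data sets that fit in RAM.

```python
import random

def shuffle_examples(lines, seed=0):
    """Return a shuffled copy of the training lines; the input list is left untouched."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    return shuffled

def shuffle_file(src_path, dst_path, seed=0):
    """Read one example per line from src_path and write them shuffled to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        lines = src.read().splitlines()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(line + "\n" for line in shuffle_examples(lines, seed))

# Toy data clustered by label: every example of one class, then every example of the next.
clustered = (
    ["__label__sports the match went to overtime"] * 3
    + ["__label__politics the vote was postponed"] * 3
)
shuffled = shuffle_examples(clustered, seed=42)
```

The shuffled file, rather than the original clustered one, would then be passed to training.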

xiamx commented on June 15, 2024

Thanks for the clarification 👍
