Hi David, I'm seeing the following error when I try to run your … on my test …

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than ... (text-classification issue, closed)

davidsbatista commented on July 27, 2024

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than
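A minimal reproduction, assuming scikit-learn's train_test_split and made-up labels (X, y below are hypothetical) in which one class has a single sample. Passing stratify=y asks for each class to be represented in both splits, which a one-sample class cannot satisfy:

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: ten samples, and class "c" occurs exactly once.
X = [[i] for i in range(10)]
y = ["a"] * 5 + ["b"] * 4 + ["c"]

message = None
try:
    # Stratified split: every class must be split between train and test,
    # which is impossible for a class with a single sample.
    train_test_split(X, y, test_size=0.33, random_state=42, stratify=y)
except ValueError as err:
    message = str(err)

print(message)
```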

from text-classification.

Comments (9)

valkiii commented on July 27, 2024

Try using x_train, x_test, y_train, y_test = train_test_split(data_x, data_y, test_size=0.33, random_state=42), that is, without the stratify argument. It should work.
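A small sketch of the same call on hypothetical toy data where one class has a single member. Without stratify, train_test_split does a plain shuffled split, so it succeeds; the trade-off is that the rare class lands on only one side:

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: class "c" has only one sample.
X = [[i] for i in range(10)]
y = ["a"] * 5 + ["b"] * 4 + ["c"]

# No stratify argument, so this is an ordinary shuffled split.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# test_size=0.33 of 10 samples is rounded up to 4 test samples.
print(len(x_train), len(x_test))  # 6 4

# The lone "c" sample ends up in exactly one of the two splits.
print(y_train.count("c") + y_test.count("c"))  # 1
```

If "c" falls into the training set, the test set contains no example of it, so any per-class metric for "c" goes unmeasured.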

SHi-ON commented on July 27, 2024

This is because of the nature of stratification. Setting the stratify parameter makes the split allocate a test_size fraction of each class to the test set. In this case, one of your classes does not have enough samples to keep the split ratio equal to test_size.
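To make the per-class allocation concrete, here is a simplified illustration (not scikit-learn's exact allocation routine; proportional_split_counts is a hypothetical helper) that splits each class separately in the ratio test_size. A class with one sample can only go to one side:

```python
from collections import Counter


def proportional_split_counts(labels, test_size):
    """Simplified stratification sketch: for each class, return
    (n_train, n_test) if that class alone were split by test_size."""
    plan = {}
    for cls, n in Counter(labels).items():
        n_test = round(n * test_size)
        plan[cls] = (n - n_test, n_test)
    return plan


labels = ["a"] * 8 + ["b"] * 4 + ["c"]
plan = proportional_split_counts(labels, test_size=0.25)
print(plan)  # {'a': (6, 2), 'b': (3, 1), 'c': (1, 0)}
```

Class "c" gets no test sample at all, which is why a stratified splitter refuses classes with fewer than two members.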

WajdiBenSaad commented on July 27, 2024

I confirm the above explanation. I have encountered this situation when dealing with a class that has a very low count. You can either take a random (non-stratified) sample or try different test_size values, so that the split is large enough to hold all of your labels.
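Both suggestions can be combined into one hypothetical helper (safe_split is not a scikit-learn function): stratify only when every class has at least two members and both splits can hold one sample per class, otherwise fall back to a plain random split. The size checks assume a float test_size, as in this thread:

```python
import math
from collections import Counter

from sklearn.model_selection import train_test_split


def safe_split(X, y, test_size=0.33, random_state=42):
    """Hypothetical helper: stratified split when the labels allow it,
    otherwise a plain random (non-stratified) split."""
    counts = Counter(y)
    n_classes = len(counts)
    n_test = math.ceil(test_size * len(y))  # float test_size is rounded up
    n_train = len(y) - n_test
    can_stratify = (
        min(counts.values()) >= 2       # every class needs at least 2 members
        and n_test >= n_classes         # test set must fit one sample per class
        and n_train >= n_classes        # and so must the training set
    )
    stratify = y if can_stratify else None
    return train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=stratify
    )


X = [[i] for i in range(10)]

# Falls back to a random split: "c" has a single member.
_, _, y_tr, y_te = safe_split(X, ["a"] * 5 + ["b"] * 4 + ["c"])

# Stratified: both classes have 5 members.
_, _, y_tr2, y_te2 = safe_split(X, ["a"] * 5 + ["b"] * 5)
print(sorted(y_te2))  # ['a', 'a', 'b', 'b']
```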

fatemerhmi commented on July 27, 2024

It might be because you have a multi-label dataset, in which case you can use this tutorial from sklearn.
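As far as I know, when y is a 2D label-indicator matrix, scikit-learn's stratified splitting treats each distinct row (each label combination) as its own class, so a rare combination can trigger the same error even if every individual label is common. A quick check on a made-up matrix (rare_label_combinations is a hypothetical helper):

```python
from collections import Counter


def rare_label_combinations(Y, min_count=2):
    """Return each row of a 0/1 indicator matrix (as a tuple) that
    occurs fewer than min_count times, mapped to its count."""
    combos = Counter(tuple(row) for row in Y)
    return {combo: n for combo, n in combos.items() if n < min_count}


# Hypothetical indicator matrix: 3 labels, 5 samples.
Y = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
    [0, 1, 0],
    [1, 1, 0],  # this combination appears only once
]
print(rare_label_combinations(Y))  # {(1, 1, 0): 1}
```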

fingoldo commented on July 27, 2024

@WajdiBenSaad wrote: "...a class that has a very low count. You can either take a random sample (not stratified) or try different test_size values, to be able to have an adequate size that could hold all your various labels."

I think sklearn should handle such situations automatically. It's frustrating, and not immediately clear, that it can be solved by slightly fine-tuning test_size.
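Tuning test_size addresses a related stratification check: the test set (and the training set) must be large enough to hold at least one sample of every class. A sketch on hypothetical data with five classes of two samples each; note that this does not change the one-member check, which depends only on the class counts:

```python
from sklearn.model_selection import train_test_split

# Hypothetical data: 5 classes with 2 samples each (10 samples total).
X = [[i] for i in range(10)]
y = [label for label in "abcde" for _ in range(2)]

# test_size=0.2 gives a 2-sample test set, too small for 5 classes.
small_error = None
try:
    train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)
except ValueError as err:
    small_error = str(err)

# test_size=0.5 gives 5 test samples: exactly one per class.
_, _, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)
print(small_error)
print(sorted(y_test))  # ['a', 'b', 'c', 'd', 'e']
```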

osancus commented on July 27, 2024

I am having the same problem as @vikramkone. Can anyone suggest how I can solve it?

shishir13 commented on July 27, 2024

I faced the same issue too. I was working on the spam text classification problem, where there are usually far fewer spam messages than ham. But when I checked the counts, I found that spam and ham messages were equal in number; I had applied stratify = data['label'] without looking at the counts. I removed the stratify part and the issue was solved.
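Because the error depends only on how many rows each distinct label value has, it may help to count the values of data['label'] before passing it to stratify. One possible cause (an assumption, not confirmed in this thread) is a stray label variant, such as the hypothetical capitalized "Spam" below, which becomes its own one-member class even when spam and ham are balanced:

```python
import pandas as pd

# Hypothetical spam/ham data with one inconsistently written label.
data = pd.DataFrame({
    "text": [f"message {i}" for i in range(7)],
    "label": ["ham", "ham", "ham", "spam", "spam", "spam", "Spam"],
})

# Every distinct value counts as a separate class for stratification.
counts = data["label"].value_counts()
rare = counts[counts < 2]
print(rare.to_dict())  # {'Spam': 1}

# Normalizing the labels merges the stray variant back into "spam".
data["label"] = data["label"].str.strip().str.lower()
print(data["label"].value_counts().to_dict())  # {'spam': 4, 'ham': 3}
```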

jiviteshoo7 commented on July 27, 2024

How can we fix this? I think random_state can be any integer, because it is only used to seed the permutation.
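Indeed, random_state only seeds the shuffling, and the class-count check does not depend on it, so no seed avoids the error. A quick sketch on hypothetical data:

```python
from sklearn.model_selection import train_test_split

X = [[i] for i in range(10)]
y = ["a"] * 5 + ["b"] * 4 + ["c"]  # hypothetical: "c" has one member

failures = 0
for seed in range(20):
    try:
        train_test_split(X, y, test_size=0.33, random_state=seed, stratify=y)
    except ValueError:
        failures += 1  # raised for every seed

print(failures)  # 20
```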

Adesoji1 commented on July 27, 2024

@fatemerhmi wrote: "It might be because you have a multi-label dataset. Which in this case you can use this tutorial from sklearn."

Nope, I have 1,114 fake labels and 475 real ones, and now I know this is the reason for ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2. @WajdiBenSaad is 101% correct. I am doing a binary classification problem.
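Since the message is about the least populated class, counting the labels shows which class is responsible. For comparison, a sketch with hypothetical imbalanced binary labels in which both classes have at least two members; stratification succeeds and keeps the class ratio in each split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced binary labels: 12 "fake", 4 "real".
y = np.array(["fake"] * 12 + ["real"] * 4)
X = np.arange(len(y)).reshape(-1, 1)

# The error concerns the smallest class count, so check it first.
classes, counts = np.unique(y, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))  # {'fake': 12, 'real': 4}

# Both classes have >= 2 members, so the stratified split works.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
print(np.unique(y_test, return_counts=True)[1].tolist())  # [3, 1]
```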
