
Comments (9)

haifengl avatar haifengl commented on April 28, 2024

Sure, you can remove them, at least the first one. But I am a little worried about this case. This function (the iterative biconjugate gradient method) should usually converge quickly and thus print little output. Please check whether your LASSO model works well on the test data. Thanks!

from smile.

Xyclade avatar Xyclade commented on April 28, 2024

The LASSO model is not working well on the test data, but that is actually the goal this time. I'm writing an example of how it can go wrong. Thank you for the heads-up though! 👍


haifengl avatar haifengl commented on April 28, 2024

Interesting. Does it fail because LASSO is not a good fit for the problem, or because of the parameter settings (e.g. the regularization factor)?


Xyclade avatar Xyclade commented on April 28, 2024

It's because there are too few data points and no actual statistical relationship within the data.

As John Tukey once said:

The data may not contain the answer. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.

This is the case in my example, which is meant to make people aware that machine learning cannot perform miracles on every dataset.


haifengl avatar haifengl commented on April 28, 2024

Is your data size less than the dimensionality?


Xyclade avatar Xyclade commented on April 28, 2024

Yes, it's 100 data points (with ranks 1 to 100) and 27,000+ features, so this can never go well :) But since it's an example for my blog rather than an actual dataset that I want to use for predictions, I still worked it out to show what goes wrong when you do this kind of thing.

In the end, the trained LASSO model should predict a rank value for a new data point, but it predicts worse than just predicting the average (50). This makes it a perfect example of what can go wrong if you have no clue what you are doing 👍
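
For readers who want to reproduce the "worse than the average" check, here is a minimal sketch in plain Python. It is not SMILE code and not the data from this thread: the 80/20 split, the helper names `rmse` and `mean_baseline_rmse`, and the random stand-in predictions are all assumptions made for illustration. The idea is only that any regression model should be compared against the error of always predicting the training mean.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def mean_baseline_rmse(y_train, y_test):
    """RMSE obtained by always predicting the mean of the training targets."""
    mean = sum(y_train) / len(y_train)
    return rmse(y_test, [mean] * len(y_test))


# Hypothetical targets: ranks 1..100, shuffled and split 80/20 (assumed split).
ranks = list(range(1, 101))
random.Random(0).shuffle(ranks)
y_train, y_test = ranks[:80], ranks[80:]

baseline = mean_baseline_rmse(y_train, y_test)

# Stand-in for real model output: uniform random guesses, NOT a LASSO fit.
# Replace this list with the predictions of the model being evaluated.
rng = random.Random(1)
model_predictions = [rng.uniform(1, 100) for _ in y_test]
model_error = rmse(y_test, model_predictions)

print(f"baseline RMSE: {baseline:.2f}, model RMSE: {model_error:.2f}")
if model_error >= baseline:
    print("The model does no better than predicting the mean.")
```

If the model's RMSE on held-out data is not clearly below the baseline, the features are probably not carrying usable signal, which is the situation described above.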


haifengl avatar haifengl commented on April 28, 2024

That is the true reason. It is known as the small sample size problem. I have a paper (http://lectures.molgen.mpg.de/networkanalysis13/LDA_cancer_classif.pdf) that deals with it.


Xyclade avatar Xyclade commented on April 28, 2024

Cool, thanks! I'll read that and see if I can incorporate it, if that's OK with you?


haifengl avatar haifengl commented on April 28, 2024

Try FLD in SMILE for your case. I don't remember if I implemented this algorithm there. Thanks!

