Comments (9)
Sure, you can remove them, at least the first one. But I am a little worried about this case. This function (the iterative biconjugate gradient method) should usually converge quickly and therefore print very little. Please check whether your LASSO model works well on the test data. Thanks!
from smile.
The Lasso model is not working well for the test data, but that is the actual goal this time. I'm writing an example of how it can go wrong, thank you for the heads up though! 👍
Interesting. It doesn't work well because LASSO is not a good fit for the problem? Or the parameter settings (e.g. regularization factor)?
It's because there are too few datapoints and no actual statistical relationship within the data.
As John Tukey once said:
The data may not contain the answer. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
That is the case in my example, which is meant to make people aware that Machine Learning cannot perform miracles on every single dataset.
Is your data size less than the dimensionality?
Yes, it's 100 datapoints (with ranks 1 to 100) and 27,000+ features, so this can never go well :) But since it's an example for my blog rather than an actual dataset that I want to use for predictions, I still worked it out to show what goes wrong when you do this kind of thing.
In the end, the trained LASSO model should predict a rank value for a new datapoint, but it predicts worse than just predicting the average (50). This makes it a perfect example of what can go wrong if you have no clue what you are doing 👍
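For anyone who wants to run the same sanity check, here is a minimal sketch of the "just predict the average" baseline. It assumes the targets are the ranks 1 to 100 mentioned above; the function names `rmse` and `mean_baseline_rmse` are made up for this illustration and are not part of SMILE.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mean_baseline_rmse(y_train, y_test):
    """RMSE of always predicting the training mean, ignoring every feature."""
    mean = sum(y_train) / len(y_train)
    return rmse(y_test, [mean] * len(y_test))


# Assumption: targets are the ranks 1..100 from the thread.
# Their exact mean is 50.5, i.e. roughly the "50" quoted above.
ranks = list(range(1, 101))
baseline = mean_baseline_rmse(ranks, ranks)
print(f"mean-baseline RMSE: {baseline:.2f}")  # prints 28.87
```

A trained model is only worth keeping if its RMSE on held-out data is below this number; a model that scores above it has learned nothing useful from the features.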
That is the true reason. It is known as the small sample size problem. I have a paper (http://lectures.molgen.mpg.de/networkanalysis13/LDA_cancer_classif.pdf) that deals with it.
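A rough way to see the small sample size problem in isolation is the sketch below. It is not LASSO and does not use SMILE: it swaps in plain minimum-norm least squares on pure-noise features so that it stays dependency-free, and the sizes (`n = 20`, `p = 200`) and all function names are assumptions chosen for illustration. With more features than datapoints the fit reproduces the training targets essentially exactly, even though the features carry no information at all.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_min_norm(X, y):
    """Minimum-norm least squares, w = X^T (X X^T)^-1 y, for p > n."""
    gram = [[dot(xi, xj) for xj in X] for xi in X]
    alpha = solve(gram, y)
    return [sum(alpha[i] * X[i][j] for i in range(len(X)))
            for j in range(len(X[0]))]


def mse(X, y, w, offset):
    errors = [(yi - (dot(xi, w) + offset)) ** 2 for xi, yi in zip(X, y)]
    return sum(errors) / len(errors)


rng = random.Random(0)
n, p = 20, 200  # far fewer datapoints than features (illustrative sizes)


def noise(rows):
    return [[rng.gauss(0, 1) for _ in range(p)] for _ in range(rows)]


# Features and targets are independent noise: there is nothing to learn.
X_train, X_test = noise(n), noise(n)
y_train = [rng.uniform(1, 100) for _ in range(n)]
y_test = [rng.uniform(1, 100) for _ in range(n)]

mean = sum(y_train) / n
w = fit_min_norm(X_train, [yi - mean for yi in y_train])

train_mse = mse(X_train, y_train, w, mean)
test_mse = mse(X_test, y_test, w, mean)
baseline_mse = mse(X_test, y_test, [0.0] * p, mean)  # predict the mean only

print(f"train MSE:    {train_mse:.6f}")
print(f"test MSE:     {test_mse:.2f}")
print(f"baseline MSE: {baseline_mse:.2f}")
```

The training error comes out at essentially zero, while the test error typically ends up no better than the mean-only baseline. A perfect training fit says nothing when the data size is smaller than the dimensionality.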
Cool, thanks! I'll read that and see if I can incorporate it, if that's OK with you.
Try FLD in SMILE for your case. I don't remember if I implemented this algorithm there. Thanks!