Comments (2)
Unfortunately no. I will add some details below for future reference.
from glum.
A Python API for Vowpal Wabbit (VW) exists, but I think VW is still mainly used as a command-line tool.
Features:
- Loss functions: squared, Poisson, logistic, quantile, hinge (no gamma!)
- Regularization: L1, L2, or both
- Sample weights are supported
- Online algorithm -> Can digest "infinite" data. Can set fixed memory bounds in advance.
- Fast
- Offsets are supported
- Categoricals need not be dummy encoded: the category value can be used directly as a feature name, as long as it doesn't clash with a value from another categorical variable.
- Can specify (possibly higher-order) interactions between groups of features without actually creating them. VW will create them per observation on the fly.
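As a rough illustration of how several of these features map onto the command line, here is a minimal sketch that only assembles an argument list for vw. The helper name build_vw_command and the file names are made up for this example; the flags (-d, -f, --loss_function, -b, --l1, --l2, -q) are the ones I know from the VW command line, but check them against your installed version.

```python
from typing import List, Sequence


def build_vw_command(
    data_path: str,
    model_path: str,
    loss_function: str = "poisson",
    l1: float = 0.0,
    l2: float = 0.0,
    bits: int = 18,
    quadratic: Sequence[str] = (),
) -> List[str]:
    """Assemble (but do not run) an argument list for the vw executable."""
    cmd = [
        "vw",
        "-d", data_path,                     # input data in VW text format
        "-f", model_path,                    # where to store the fitted model
        "--loss_function", loss_function,    # squared, poisson, logistic, quantile, hinge
        "-b", str(bits),                     # 2**bits hashed weights: the fixed memory bound
    ]
    if l1 > 0:
        cmd += ["--l1", str(l1)]
    if l2 > 0:
        cmd += ["--l2", str(l2)]
    for pair in quadratic:
        # "-q ab" crosses the feature groups whose names start with "a" and "b";
        # VW builds the interaction terms per observation on the fly.
        cmd += ["-q", pair]
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_vw_command("train.vw.gz", "model.vw", l1=1e-6, quadratic=["ab"])
print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run(cmd, check=True) on a machine where vw is installed.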
Data format:
Very similar to the libsvm format.
In a nutshell, every observation is a line in a (possibly gzipped) text file with the following format:
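label weight 'tag |feature_group_name feature_name:feature_value ... |feature_group_name2 feature_name:feature_value ...
The group name must follow the | directly, without a space, and the tag is marked by a leading ' unless it touches the |.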
If a feature_name is not present in a row, VW will use 0 as the default value.
This makes it possible to represent dummy features in a sparse format.
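To make the format concrete, here is a minimal sketch of a formatter that turns one observation into such a line. The function name to_vw_line and the example feature names are hypothetical. It places the group name directly after the | and prefixes the tag with ', which is how I understand VW's input format, and it drops zero-valued features to get the sparse representation described above.

```python
from typing import Mapping, Optional


def to_vw_line(
    label: float,
    groups: Mapping[str, Mapping[str, float]],
    weight: Optional[float] = None,
    tag: Optional[str] = None,
) -> str:
    """Format one observation as a line of VW text input."""
    head = [str(label)]
    if weight is not None:
        head.append(str(weight))
    if tag is not None:
        head.append("'" + tag)  # leading quote marks the token as a tag
    parts = [" ".join(head)]
    for group_name, features in groups.items():
        tokens = [group_name]
        for name, value in features.items():
            if value == 0:
                continue  # absent features default to 0, so zeros can be omitted
            tokens.append(f"{name}:{value}")
        parts.append("|" + " ".join(tokens))
    return " ".join(parts)


line = to_vw_line(
    3,
    {
        "a": {"age": 42, "region_north": 1, "region_south": 0},
        "b": {"exposure": 0.5},
    },
    weight=2.0,
    tag="row1",
)
print(line)  # 3 2.0 'row1 |a age:42 region_north:1 |b exposure:0.5
```

Note that region_south never appears in the output: a dummy-encoded categorical costs one token per row, regardless of the number of levels.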
The bad:
- The online algorithm is very sensitive to its learning parameters (and there are at least 3 of them).
- Requires a lot of tuning -> Goodbye, speed advantage.
- Dumping large data sets to disk in VW format might take several minutes.
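To illustrate why the tuning eats the speed advantage, here is a small sketch that enumerates a grid over three learning-related flags. Which three parameters are meant above is my assumption: I picked --learning_rate, --power_t and --initial_t, which exist on the VW command line, and the grid values are arbitrary. The helper tuning_grid is made up for this example.

```python
from itertools import product
from typing import Iterator, List, Sequence


def tuning_grid(
    learning_rates: Sequence[float],
    power_ts: Sequence[float],
    initial_ts: Sequence[float],
) -> Iterator[List[str]]:
    """Yield the extra vw arguments for every combination in the grid."""
    for lr, p, t0 in product(learning_rates, power_ts, initial_ts):
        yield [
            "--learning_rate", str(lr),
            "--power_t", str(p),
            "--initial_t", str(t0),
        ]


# Arbitrary illustrative values; each combination means one full training run.
grid = list(tuning_grid([0.1, 0.5, 1.0], [0.0, 0.5, 1.0], [0.0, 1.0]))
print(len(grid))  # 18
```

Even this coarse grid needs 18 fits, before any regularization strength is varied.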
Bottom line:
I think this is an excellent tool for plain lasso-based variable selection on large data sets with many regressors, and it is especially useful for detecting important interactions.