xgbfi's People

Contributors

far0n

xgbfi's Issues

Support LightGBM

As we know, LightGBM is becoming more and more popular thanks to its outstanding efficiency. Is it possible to add a feature so that feature importance for LightGBM models is supported? That would be a huge help.

Thanks a lot!

Feature's Own Interaction

Hi Mathias,

Not sure where to ask this question but unfortunately xgbfi doesn't have much documentation (yet).

So my understanding is that f10|f40 means there are notable interactions between feature 10 and feature 40, and I could potentially "help" the classifier by adding new features such as f10 - f40 or f10 * f40. But for a problem I am working on now, the top interactions are f10|f10, f10|f10|f10, f10|f10|f10|f10, and so on. What does that mean? Should I create a new feature f10 * f10, or should I add an identical copy of f10 as a new feature so that it can be split at more than one node?

Appreciate your clarification!

Li
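
For readers following along, the feature-engineering idea in this question (adding f10 - f40, f10 * f40, or f10 * f10 as extra columns) can be sketched in a few lines. This is only an illustration: the helper name `add_interaction_features` and the toy rows are hypothetical, and whether such columns actually help depends on the data.

```python
def add_interaction_features(rows, i, j):
    """Return copies of rows with three extra columns appended:
    row[i] - row[j], row[i] * row[j], and row[i] squared.

    rows: list of feature vectors (lists of floats); i, j: column indices.
    """
    out = []
    for row in rows:
        a, b = row[i], row[j]
        out.append(list(row) + [a - b, a * b, a * a])
    return out


# Toy data, purely illustrative.
rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
augmented = add_interaction_features(rows, 0, 2)
print(augmented[0])  # [1.0, 2.0, 3.0, -2.0, 3.0, 1.0]
```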

Error in Ubuntu 15.10

```
Missing method .ctor in assembly /media/vladimir/1ab2d5e6-a134-47e7-ba27-b2d70ac5ffc5/workspace/xgbfi/bin/lib/EPPlus.dll, type System.IO.FileFormatException
ERROR: Could not load file or assembly 'WindowsBase, Version=3.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35' or one of its dependencies.
```

```
uname --all

Linux t430 4.2.0-35-generic #40-Ubuntu SMP Tue Mar 15 22:15:45 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
```

```
mono --version

Mono JIT compiler version 3.2.8 (Debian 3.2.8+dfsg-4ubuntu4)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
TLS: __thread
SIGSEGV: altstack
Notifications: epoll
Architecture: amd64
Disabled: none
Misc: softdebug
LLVM: supported, not enabled.
GC: sgen
```

Query about the Feature names

Hello,
Thanks for sharing your work.
I was just exploring it and I got confused in one area.

My data has features named v1, v2, v3, ...
I built a feature map and the xgb.dump file.

In the Excel report, the interaction column contains f2|f3, f10|f11, etc.
Do I need to manually map these names to the actual variable names?

Or am I doing something wrong?
Please comment.
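
As a possible workaround for the naming question above, here is a minimal sketch that builds feature-map lines and translates labels such as f2|f3 back to variable names after the fact. It assumes the usual XGBoost feature-map layout (zero-based index, name, and type separated by tabs, with `q` for a quantitative feature). The helper names `fmap_lines` and `rename_interaction` are made up for this example and are not part of xgbfi.

```python
import re


def fmap_lines(names, ftype="q"):
    """Build feature-map lines: zero-based index, name and type, tab-separated."""
    return ["%d\t%s\t%s" % (i, name, ftype) for i, name in enumerate(names)]


def rename_interaction(label, names):
    """Replace each 'f<index>' token in a label such as 'f2|f3' with its name."""
    return re.sub(r"\bf(\d+)\b", lambda m: names[int(m.group(1))], label)


names = ["v1", "v2", "v3", "v4"]
print("\n".join(fmap_lines(names)))
print(rename_interaction("f2|f3", names))  # v3|v4
```

Note that the indices are zero-based, so under this assumption f2 refers to the third column (v3 here), not to v2.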

How to calculate the interaction gain

I am trying to replicate the interaction gain by adding up the individual gains of the features in an interaction, but the results don't match. Is it a pure summation, or is other information used? There is no example of the interaction gain calculation.

Thanks,

Meaning of Expected Gain

Hi,

I'm thinking of using this package to get some more insights into the behaviour of my xgboost models, as the extra features look very interesting, and looking at interactions is something I have wanted to do for a while, so thanks for this!

But I'm struggling to find information on how some of the extra features are defined. Most importantly for me, expected gain. The readme of this package says:

Total gain of each feature or feature interaction weighted by the probability to gather the gain

I scanned through your code a bit, and the gain for each node seems to be scaled by the path_proba of that node. Is path_proba the number of samples involved in a node divided by the total number of samples?

If so, I was wondering what the specific reasoning behind this is? As far as I understand, the original gain already scales with the number of samples involved in the node (albeit nonlinearly). Does Expected gain represent better what the gain per sample is than gain, or does it serve a different goal?

Thanks!
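
To make the question concrete, here is a small sketch of one possible reading of "weighted by the probability to gather the gain". It assumes, without confirmation from the xgbfi source, that the path probability of a node is the product of cover(child) / cover(parent) along the path from the root, which telescopes to cover(node) / cover(root). The node table and the function name `path_probability` are hypothetical. Keep in mind that cover in an XGBoost dump is the sum of second-order gradients, which equals the sample count only for some objectives such as squared error.

```python
def path_probability(node_id, nodes):
    """Multiply cover(child) / cover(parent) from node_id up to the root.

    ASSUMPTION: this is one plausible definition, not taken from xgbfi.
    """
    proba = 1.0
    while nodes[node_id]["parent"] is not None:
        parent = nodes[node_id]["parent"]
        proba *= nodes[node_id]["cover"] / nodes[parent]["cover"]
        node_id = parent
    return proba


# Hypothetical split nodes of a single tree: id -> parent, gain, cover.
nodes = {
    0: {"parent": None, "gain": 100.0, "cover": 200.0},
    1: {"parent": 0, "gain": 40.0, "cover": 50.0},
    2: {"parent": 1, "gain": 10.0, "cover": 20.0},
}

for node_id, node in nodes.items():
    p = path_probability(node_id, nodes)
    # Gain scaled by the assumed probability of reaching this node.
    print(node_id, p, node["gain"] * p)
```

Under this reading, a split deep in the tree that few samples reach contributes little expected gain even when its raw gain is large.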

Understanding how to use these metrics

Hi,

I came across this tool while trying to find something that can return the gains of a boosted tree. I would like to understand how the other metrics can be useful for interpreting an XGBoost model. Could you kindly share any paper or reference material that would help me understand these metrics?

Thanks and Best,
Param

Feature map (xgb.fmap) and XGBoost dump file (xgb.dump) upload request

Hi Far0n,
I am currently using XGBoost to select feature interactions for my LR model. I find your work very interesting, but I have run into a problem getting XgbFeatureInteractions.exe to run with my xgb.dump file.
I am using Scala and the XGBoost4j package on Spark 1.6, so I am not sure whether the XGBoost dump file my project creates has a different format from yours. It would be really nice if you could upload the xgb.fmap and xgb.dump files mentioned in README.md, so that I can check their format and see what is wrong with my dump file.
Thank you.
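
Until reference files are available, a rough format check along these lines might help. The sketch assumes the plain-text dump layout that XGBoost writes when statistics are enabled (a `booster[n]:` header per tree, split lines with `yes`, `no`, `missing`, `gain` and `cover`, and leaf lines with `cover`). The sample text is a made-up excerpt, and `check_dump` is a hypothetical helper, not part of xgbfi or XGBoost.

```python
import re

# Split line, e.g. "0:[f0<0.5] yes=1,no=2,missing=1,gain=12.5,cover=100"
SPLIT_RE = re.compile(
    r"^\s*(\d+):\[(\w+)<([^\]]+)\] yes=(\d+),no=(\d+),missing=(\d+)"
    r",gain=([^,]+),cover=(\S+)$"
)
# Leaf line, e.g. "1:leaf=-0.2,cover=40"
LEAF_RE = re.compile(r"^\s*(\d+):leaf=([^,]+),cover=(\S+)$")


def check_dump(text):
    """Return (tree_count, split_count, leaf_count, unrecognized_lines)."""
    trees = splits = leaves = 0
    bad = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("booster["):
            trees += 1
        elif SPLIT_RE.match(line):
            splits += 1
        elif LEAF_RE.match(line):
            leaves += 1
        else:
            bad.append(line)
    return trees, splits, leaves, bad


# Made-up excerpt in the assumed layout.
sample = "\n".join([
    "booster[0]:",
    "0:[f0<0.5] yes=1,no=2,missing=1,gain=12.5,cover=100",
    "\t1:leaf=-0.2,cover=40",
    "\t2:leaf=0.3,cover=60",
])
print(check_dump(sample))  # (1, 1, 2, [])
```

Lines that end up in the last element of the result (for example split lines without `gain` and `cover`) would point to a dump written without statistics or in a different format.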

Multiclass Classification

It would be really nice to get multiclass classification support.
I would be interested in the different feature importance for each class.

XGBoost builds one tree per class in each boosting round and then starts over for the next round, so the dump contains (number of classes × number of rounds) trees in total.
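
The tree ordering described here can be sketched as follows, assuming that trees appear in the dump round by round so that tree i belongs to class i modulo the number of classes. The function name `trees_by_class` is made up for illustration.

```python
def trees_by_class(num_trees, num_class):
    """Group tree indices by class, assuming tree i belongs to class i % num_class."""
    groups = {c: [] for c in range(num_class)}
    for i in range(num_trees):
        groups[i % num_class].append(i)
    return groups


# Two boosting rounds with three classes give six trees.
print(trees_by_class(6, 3))  # {0: [0, 3], 1: [1, 4], 2: [2, 5]}
```

With such a grouping, per-class importance could in principle be computed by analyzing each group of trees separately; whether xgbfi can be fed a per-class subset of the dump is not something this thread confirms.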

Example/description for how gain is calculated for interaction depth > 0

Thank you for your interesting application! I'd like to use it in my research, but for the journal I need to include a description of how the gain is calculated for two-way and higher-order interactions (the Interaction Depth 1 and 2 sheets). Would it be possible to add a quick example for features 1 and 2 to the image you already have on the README page?

How to define a 4-way interaction

I tried MaxInteractionDepth, MaxDeepening, and other combinations, but could not get four-way interactions. Any suggestions?

Documentation

Far0n,

Would you be able to write up some short documentation on how to run this? I have downloaded Visual Studio Express and loaded the project, but I am not sure what to do next. I tried clicking on the project and selecting "Start"; some stuff runs, but the output only says it exited with a certain code and nothing else.

How do you tell the program where the dump file is? That might be my problem, but I couldn't figure it out.

Thanks

How to build the project in VS?

Hi Far0n,
Sorry to disturb you. I am new to C#. When I open this project in VS2015, choose Release, and build, the following error occurs:

Severity Code Description Project File Line Suppression State
Error CS0246 The type or namespace name 'SplitValueHistogram' could not be found (are you missing a using directive or an assembly reference?) XgbFeatureInteractions

How can I solve this problem?
