Giter Site home page Giter Site logo

efficient-decision-tree-notes's Introduction

efficient-decision-tree-notes

这个笔记记录高效的决策树系列算法,主要阅读论文,结合一些开源框架,希望在弄清算法的基础上,尝试着改进算法,尝试着工程实现.

我们知道,目前比较流行的两个GBDT开源框架是XGBoost和LightGBM,无论内存占用还是计算速度,它们都做到了淋漓尽致.在LightGBM的Feature上提到了,XGBoost的decision tree用的是pre-sorted based的算法,也就是在tree building之前对各维特征先排序,代表性的算法是SLIQ[1]和SPRINT[2].而LightGBM的decision tree是histogram based的算法,也就是先将特征离散化,代表性的算法是CLOUDS[3],Mcrank[4]和Machado[5].

SLIQ和SPRINT算法的特点决定了树生长的方式是level-wise(breadth-first)的,与之对应的是leaf-wise(depth-wise,best-wise[6])的方式,LightGBM正是采用leaf-wise的方式.

内容大致按以下几部分展开:

参考文献

[1] Mehta, Manish, Rakesh Agrawal, and Jorma Rissanen. “SLIQ: A fast scalable classifier for data mining.” International Conference on Extending Database Technology. Springer Berlin Heidelberg, 1996.

[2] Shafer, John, Rakesh Agrawal, and Manish Mehta. “SPRINT: A scalable parallel classifier for data mining.” Proc. 1996 Int. Conf. Very Large Data Bases. 1996.

[3] Ranka, Sanjay, and V. Singh. “CLOUDS: A decision tree classifier for large datasets.” Proceedings of the 4th Knowledge Discovery and Data Mining Conference. 1998.

[4] Machado, F. P. “Communication and memory efficient parallel decision tree construction.” (2003).

[5] Li, Ping, Qiang Wu, and Christopher J. Burges. “Mcrank: Learning to rank using multiple classification and gradient boosting.” Advances in neural information processing systems. 2007.

[6] Shi, Haijian. “Best-first decision tree learning.” Diss. The University of Waikato, 2007.

efficient-decision-tree-notes's People

Contributors

wepe avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.