4paradigm / autox

AutoX is an efficient AutoML tool aimed mainly at data mining tasks on tabular data.

Home Page: https://autox.readthedocs.io

License: Apache License 2.0

Languages: Python 10.31%, Jupyter Notebook 89.51%, Shell 0.17%, Dockerfile 0.01%, Makefile 0.01%, Batchfile 0.01%
Topics: kaggle, machine-learning, python

autox's Introduction

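What is AutoX?

AutoX is an efficient automated machine learning tool. Its features include:

  • Strong performance: on multiple Kaggle datasets, AutoX significantly outperforms other solutions (see Performance comparison).
  • Easy to use: AutoX's interface is similar to sklearn's, so it is easy to get started.
  • General-purpose: suitable for classification and regression problems.
  • Automated: data cleaning, feature engineering, model tuning, and other steps run fully automatically, with no manual intervention.
  • Flexible: the components are decoupled and can be used on their own. Where the automated results are unsatisfactory, expert knowledge can be brought in through the flexible interfaces AutoX provides.
  • Competition lessons learned: score-improving techniques from past competitions are collected and published.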

What AutoX contains

Join the community

[Figure: AutoX community]

Framework

autox_competition

[Figure: autox_competition framework]

autox_recommend

[Figure: autox_recommend framework]

autox_video

[Figure: autox_video framework]

How to contribute to AutoX

Table of contents

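Installation

Install from the GitHub repository

git clone https://github.com/4paradigm/autox.git
pip install ./autox

Install with pip

## The pip package may not be updated promptly; installing from GitHub is recommended to get the latest version
!pip install automl-x -i https://www.pypi.org/simple/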
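
After either installation route, it can be useful to confirm which version ended up in the environment. The following is a minimal sketch using only the standard library. It assumes the distribution is registered under the name `automl-x` used in the pip command; the helper name `installed_version` is hypothetical and not part of AutoX.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "automl-x" is the name taken from the pip command; adjust it if the
# GitHub install registers the package under a different name.
print(installed_version("automl-x"))
```

A result of `None` means no distribution with that name is installed in the current environment.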

Quick start

Community cases

Car sales forecasting

Competition cases

See the demo folder.

Dataset download link: https://pan.baidu.com/s/1p38OuP8_FJp2P_wJwhdFiw?pwd=8mxf

Performance comparison

Percentage improvement by task type

| data_type | vs. AutoGluon | vs. H2o |
| --- | --- | --- |
| binary classification | 20.44% | 2.98% |
| regression | 37.54% | 39.66% |
| time-series | 28.40% | 32.46% |

Detailed per-dataset comparison

| data_type | single-or-multi | data_name | metric | AutoX | AutoGluon | H2o |
| --- | --- | --- | --- | --- | --- | --- |
| binary classification | single-table | Springleaf | auc | 0.78865 | 0.61141 | 0.78186 |
| binary classification-nlp | single-table | stumbleupon | auc | 0.87177 | 0.81025 | 0.79039 |
| binary classification | single-table | santander | auc | 0.89196 | 0.64643 | 0.88775 |
| binary classification | multi-table | IEEE | accuracy | 0.920809 | 0.724925 | 0.907818 |
| regression | single-table | ventilator | mae | 0.755 | 8.434 | 4.221 |
| regression | single-table | Allstate Claims Severity | mae | 1137.07885 | 1173.35917 | 1163.12014 |
| regression | single-table | zhidemai | mse | 1.0034 | 1.9466 | 1.1927 |
| regression | single-table | Tabular Playground Series - Aug 2021 | rmse | 7.87731 | 10.3944 | 7.8895 |
| regression | single-table | House Prices | rmse | 0.13043 | 0.13104 | 0.13161 |
| regression | single-table | Restaurant Revenue | rmse | 2133204.32146 | 31913829.59876 | 28958013.69639 |
| regression | multi-table | Elo Merchant Category Recommendation | rmse | 3.72228 | 3.80801 | 22.88899 |
| regression-ts | single-table | Demand Forecasting | smape | 13.79241 | 25.39182 | 18.89678 |
| regression-ts | multi-table | Walmart Recruiting | wmae | 4660.99174 | 5024.16179 | 5128.31622 |
| regression-ts | multi-table | Rossmann Store Sales | RMSPE | 0.13850 | 0.20453 | 0.35757 |
| regression-cv | single-table | PetFinder | rmse | 20.1327 | 23.1732 | 21.0586 |
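
To compare tools on a single dataset, the scores above can be converted into a relative improvement. The sketch below shows one common way to do that, under the assumption that auc and accuracy are higher-is-better and every other metric in the table is an error measure where lower is better. The function name and the formula are illustrative only; the source does not state how the per-task percentages were aggregated, and this sketch is not claimed to reproduce them.

```python
# Metrics from the table where a larger value is better (assumption).
HIGHER_IS_BETTER = {"auc", "accuracy"}


def relative_improvement(metric, autox_score, baseline_score):
    """Relative gain of AutoX over a baseline, as a fraction of the baseline score."""
    if metric.lower() in HIGHER_IS_BETTER:
        return (autox_score - baseline_score) / baseline_score
    # Error metrics: a smaller AutoX score is an improvement.
    return (baseline_score - autox_score) / baseline_score


# Springleaf (auc), AutoX vs. AutoGluon
print(f"{relative_improvement('auc', 0.78865, 0.61141):.2%}")
# House Prices (rmse), AutoX vs. AutoGluon
print(f"{relative_improvement('rmse', 0.13043, 0.13104):.2%}")
```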

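AutoX achievements

Enterprise support

Competition awards

TODO

Once a feature has been developed, a corresponding usage demo will be released.

  • Multi-class classification tasks

If there are other features you would like AutoX to support, feel free to open an issue! You are also welcome to fill in the user survey to help make AutoX better!

Troubleshooting

| Error message | Solution |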

autox's People

Contributors

artificialzeng, caixc97, dhengw, enjoysport2022, fxzero, kelelexu, kian98, liyaooi, mingyang1996, peiqialan, utopianet, yang-charles, yqkenanwang, yulv-git, zhhwss

autox's Issues

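Memory optimization issue

Running the "Zhidemai" (值得买) dataset in a Kaggle environment exhausts the 16 GB of memory. A first analysis suggests the cause is that the brute-force loops in feature engineering generate a large number of derived features. Consider borrowing the approach of the memory-reduce code on Kaggle to optimize memory usage.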
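
The core idea of the memory-reduce approach mentioned here is to store each column in the narrowest dtype that can hold its values. The following is a minimal, dependency-free sketch of that dtype selection for integer columns only. The names `smallest_int_dtype` and `plan_downcast` and the sample columns are hypothetical; this is not AutoX code, and float columns would additionally need a precision check.

```python
# Inclusive value ranges of the signed integer dtypes, smallest first.
INT_RANGES = [
    ("int8", -2**7, 2**7 - 1),
    ("int16", -2**15, 2**15 - 1),
    ("int32", -2**31, 2**31 - 1),
    ("int64", -2**63, 2**63 - 1),
]


def smallest_int_dtype(lo, hi):
    """Return the name of the narrowest signed integer dtype covering [lo, hi]."""
    for name, dmin, dmax in INT_RANGES:
        if dmin <= lo and hi <= dmax:
            return name
    raise OverflowError(f"range [{lo}, {hi}] does not fit in int64")


def plan_downcast(columns):
    """Map each column name to the narrowest dtype that holds all of its values."""
    return {
        name: smallest_int_dtype(min(values), max(values))
        for name, values in columns.items()
    }


# Illustrative data only.
plan = plan_downcast({
    "age": [18, 35, 70],
    "price": [120, 45000, 99],
    "user_id": [1, 3_000_000_000],
})
print(plan)
```

With pandas, a mapping of this shape can be passed to `DataFrame.astype` to apply the narrower dtypes to the corresponding columns.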

Sample selection

I would like to ask if AutoX has any plans for sample selection?

Now many data sets are so large that the computing power of individuals and small companies cannot afford.

Can a part of the data be selected for training to approximate the effect of full data training?
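
One simple baseline for this kind of request is stratified random subsampling, which keeps the class proportions of the full dataset in the subset. The sketch below only illustrates the idea with the standard library. It is not an existing AutoX feature, the function name `stratified_sample` is hypothetical, and whether a subsample approximates full-data training depends on the dataset.

```python
import random
from collections import defaultdict


def stratified_sample(labels, fraction, seed=0):
    """Return sorted row indices of a subsample that preserves class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for idxs in by_class.values():
        # Keep at least one row per class so rare classes are not dropped.
        k = max(1, round(len(idxs) * fraction))
        chosen.extend(rng.sample(idxs, k))
    return sorted(chosen)


# Illustrative labels: 90 negatives and 10 positives, keep 10% of the rows.
labels = [0] * 90 + [1] * 10
subset = stratified_sample(labels, fraction=0.1)
print(len(subset), sum(labels[i] for i in subset))
```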

Installation

Does installing autox require the deep learning framework Keras to be installed beforehand? Is PyTorch or any other framework supported?

lightgbm.train bug(lightgbm==3.3.2.99)

On Mac with lightgbm==3.3.2.99, lightgbm.train no longer accepts the verbose_eval and early_stopping_rounds arguments and expects the callbacks argument instead, so calling the lgb model raises an error.

File ~/miniforge3/envs/lx/lib/python3.9/site-packages/autox/autox_competition/models/regressor_ts.py:231, in LgbRegressionTs.fit(self, train, test, used_features, target, time_col, ts_unit, Early_Stopping_Rounds, N_round, Verbose, log1p, custom_metric, weight_for_mae)
    226     model = lgb.train(self.params_, trn_data, num_boost_round=self.N_round, valid_sets=[trn_data, val_data],
    227                       verbose_eval=self.Verbose,
    228                       early_stopping_rounds=self.Early_Stopping_Rounds,
    229                       feval=weighted_mae_lgb(weight=weight_for_mae))
    230 else:
--> 231     model = lgb.train(self.params_, trn_data, num_boost_round=self.N_round, valid_sets=[trn_data, val_data],
...
    233                     early_stopping_rounds=self.Early_Stopping_Rounds)
    234 val = model.predict(train.iloc[valid_idx][used_features])
    235 if log1p:

TypeError: train() got an unexpected keyword argument 'verbose_eval'
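
A possible workaround is to translate the two removed arguments into LightGBM callbacks. The sketch below assumes a LightGBM version that provides `lgb.early_stopping` and `lgb.log_evaluation`. The helper name `lgb_callbacks` is hypothetical and is not part of AutoX or LightGBM, and the module is passed in as a parameter so the sketch itself has no hard dependency.

```python
def lgb_callbacks(lgb, early_stopping_rounds=None, verbose=None):
    """Build a callbacks list equivalent to the old early_stopping_rounds / verbose_eval."""
    callbacks = []
    if early_stopping_rounds:
        callbacks.append(lgb.early_stopping(stopping_rounds=early_stopping_rounds))
    if verbose:
        # The old verbose_eval=True logged every iteration; an int n logged every n iterations.
        period = 1 if verbose is True else int(verbose)
        callbacks.append(lgb.log_evaluation(period=period))
    return callbacks


# Usage (requires lightgbm, so it is shown as a comment):
# import lightgbm as lgb
# model = lgb.train(params, trn_data, num_boost_round=n_round,
#                   valid_sets=[trn_data, val_data],
#                   callbacks=lgb_callbacks(lgb, early_stopping_rounds=100, verbose=50))
```

In the traceback above, the values of self.Early_Stopping_Rounds and self.Verbose would be passed to such a helper in place of the removed keyword arguments.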

task1_baseline.ipynb

Did you split up the entities within a single record?
Does one record correspond to one entity? To one sentiment?

Welcome update to OpenMMLab 2.0

I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.

Here are the OpenMMLab 2.0 repos branches:

| Repo | OpenMMLab 1.0 branch | OpenMMLab 2.0 branch |
| --- | --- | --- |
| MMEngine | | 0.x |
| MMCV | 1.x | 2.x |
| MMDetection | 0.x, 1.x, 2.x | 3.x |
| MMAction2 | 0.x | 1.x |
| MMClassification | 0.x | 1.x |
| MMSegmentation | 0.x | 1.x |
| MMDetection3D | 0.x | 1.x |
| MMEditing | 0.x | 1.x |
| MMPose | 0.x | 1.x |
| MMDeploy | 0.x | 1.x |
| MMTracking | 0.x | 1.x |
| MMOCR | 0.x | 1.x |
| MMRazor | 0.x | 1.x |
| MMSelfSup | 0.x | 1.x |
| MMRotate | 1.x | 1.x |
| MMYOLO | | 0.x |

Attention: please create a new virtual environment for OpenMMLab 2.0.
