jvmncs / default-risk



default-risk's Issues

Module masking for missing credit histories

Some applicants likely won't have balances for previous credit applications within Home Credit or in the credit bureau, and some may not have any previous credit applications at all. To handle this, our model will need a masking input paired with the input to each module (e.g. x_bureau, x_burbal, etc. in the current code). Specifically, each mask is a single Boolean value that determines whether to mask the entire module output. For example, for a previous application without credit card balances, given the Boolean feature has_ccbal, we'd want to do something like:

# how it is now:
ccbal = self.ccbal(x_ccbal)
# how it should be:
ccbal = self.ccbal(x_ccbal) * has_ccbal
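
The masking semantics can be sketched without any framework. This is only an illustration in plain Python, not the model code: `mask_module_output` is a hypothetical helper, and the NaN-filled placeholder row is an assumption about how a missing table might be represented. It also shows one caveat of multiplying by the flag: if the unmasked output contains NaN, `NaN * 0` is still NaN, so selecting zeros is safer than multiplying.

```python
import math


def mask_module_output(output, has_data):
    """Return one applicant's module output, zeroed when the table has no rows.

    output:   list of floats, a module's output vector for a single applicant
    has_data: Boolean flag such as has_ccbal
    (Hypothetical helper, for illustration only.)
    """
    if not has_data:
        # Select zeros rather than multiplying by 0: NaN * 0 is still NaN,
        # so a NaN-filled placeholder would leak through `output * 0`.
        return [0.0] * len(output)
    return list(output)


# Two applicants: the second has no credit card balances (assumed NaN placeholder).
batch_outputs = [[0.3, -1.2], [float("nan"), float("nan")]]
batch_flags = [True, False]

masked = [mask_module_output(o, f) for o, f in zip(batch_outputs, batch_flags)]
nan_times_zero = float("nan") * 0.0  # stays NaN, which is why we select zeros
```

In a tensor framework the same idea applies per batch, with the flag reshaped so that it broadcasts across the feature dimension of the module output.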

Data exploration: installments_payments.csv

Similar to #4, see that issue for details. The same information is needed here, although there are fewer kernels available that have done EDA on this table.

Data exploration: bureau_balance.csv

Similar to #4, see that issue for details. The same information is needed here, although there are fewer kernels available that have done EDA on this table.

Data exploration: previous_application.csv

Similar to #4, see that issue for details. The same information is needed here.

Data exploration: POS_cash_balance.csv

Similar to #4, see that issue for details. The same information is needed here, although there are fewer kernels available that have done EDA on this table.

Complete Risk model

Initial stab at the model in core/model.py using the PooledLSTM module in core/layers.py, useful for developing the rest of the codebase. This issue should include all debugging, as well as unit tests.

Required by #17.

Data exploration: credit_card_balance.csv

Similar to #4, see that issue for details. The same information is needed here, although there are fewer kernels available that have done EDA on this table.

Fix "Trial finished without reporting result!" error with Ray

I think I screwed up somewhere, so that the script doesn't know where my directory is.

Set up

git checkout ray
pip install -r requirements.txt
python dev/mnist-ray.py --epochs=1

It will start training and report

Process STDOUT and STDERR is being redirected to /tmp/raylogs/.
Waiting for redis server at 127.0.0.1:62651 to respond...
Waiting for redis server at 127.0.0.1:22361 to respond...
Starting local scheduler with the following resources: {'CPU': 8, 'GPU': 0}.

======================================================================
View the web UI at http://localhost:8894/notebooks/ray_ui34599.ipynb?token=298685d42e77e7e460e34c71da0e3d27257a1ad1a42a1c5a
======================================================================

== Status ==
Using FIFO scheduling algorithm.
Result logdir: /home/yxu/ray_results/awesome
PENDING trials:
 - train_0_lr=0.55999,momentum=0.7021:  PENDING
 - train_1_lr=0.015444,momentum=0.7021: PENDING
 - train_2_lr=0.55999,momentum=0.89643: PENDING
 - train_3_lr=0.015444,momentum=0.89643:        PENDING

.....


Final model stored at "/home/yxu/Documents/default-risk/checkpoint/net2018-08-04 13:37-best.pth.tar".
Test set: Average loss: 0.1230, Accuracy: 9613/10000 (96%)

Final model stored at "/home/yxu/Documents/default-risk/checkpoint/net2018-08-04 13:37-best.pth.tar".

Then the error message below appears:

================== TESTING ==================
Test set: Average loss: 2.3449, Accuracy: 958/10000 (10%)

Final model stored at "/home/yxu/Documents/default-risk/checkpoint/net2018-08-04 13:37-best.pth.tar".
Test set: Average loss: 0.1230, Accuracy: 9613/10000 (96%)

Final model stored at "/home/yxu/Documents/default-risk/checkpoint/net2018-08-04 13:37-best.pth.tar".
Remote function train failed with:

Traceback (most recent call last):
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/worker.py", line 891, in _process_task
    *arguments)
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/actor.py", line 261, in actor_method_executor
    method_returns = method(actor, *args)
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/tune/trainable.py", line 117, in train
    result = self._train()
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/tune/function_runner.py", line 114, in _train
    result = self._status_reporter._get_and_clear_status()
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/tune/function_runner.py", line 42, in _get_and_clear_status
    raise TuneError("Trial finished without reporting result!")
ray.tune.error.TuneError: Trial finished without reporting result!

Error processing event: Traceback (most recent call last):
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 255, in _process_events
    result = ray.get(result_id)
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/worker.py", line 2776, in get
    raise RayGetError(object_ids, value)
ray.worker.RayGetError: Could not get objectid ObjectID(888e77e8b61177963bd332b03dec3ac3d6aa12f9). It was created by remote function train which failed with:

Remote function train failed with:

Traceback (most recent call last):
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/worker.py", line 891, in _process_task
    *arguments)
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/actor.py", line 261, in actor_method_executor
    method_returns = method(actor, *args)
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/tune/trainable.py", line 117, in train
    result = self._train()
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/tune/function_runner.py", line 114, in _train
    result = self._status_reporter._get_and_clear_status()
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/tune/function_runner.py", line 42, in _get_and_clear_status
    raise TuneError("Trial finished without reporting result!")
ray.tune.error.TuneError: Trial finished without reporting result!


Suppressing duplicate error message.
Worker ip unknown, skipping log sync for /home/yxu/ray_results/awesome/train_2_lr=0.55999,momentum=0.89643_2018-08-04_13-37-27t_x_vep8
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 3/8 CPUs, 0/0 GPUs
Result logdir: /home/yxu/ray_results/awesome
ERROR trials:
 - train_2_lr=0.55999,momentum=0.89643: ERROR, 1 failures: /home/yxu/ray_results/awesome/train_2_lr=0.55999,momentum=0.89643_2018-08-04_13-37-27t_x_vep8/error_2018-08-04_13-39-21.txt
RUNNING trials:
 - train_0_lr=0.55999,momentum=0.7021:  RUNNING
 - train_1_lr=0.015444,momentum=0.7021: RUNNING
 - train_3_lr=0.015444,momentum=0.89643:        RUNNING

Error processing event: Traceback (most recent call last):
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 255, in _process_events
    result = ray.get(result_id)
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/worker.py", line 2776, in get
    raise RayGetError(object_ids, value)
ray.worker.RayGetError: Could not get objectid ObjectID(1af518971277c3ede6df2b728c40c5195a99e2b6). It was created by remote function train which failed with:

Remote function train failed with:

Traceback (most recent call last):
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/worker.py", line 891, in _process_task
    *arguments)
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/actor.py", line 261, in actor_method_executor
    method_returns = method(actor, *args)
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/tune/trainable.py", line 117, in train
    result = self._train()
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/tune/function_runner.py", line 114, in _train
    result = self._status_reporter._get_and_clear_status()
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/tune/function_runner.py", line 42, in _get_and_clear_status
    raise TuneError("Trial finished without reporting result!")
ray.tune.error.TuneError: Trial finished without reporting result!


/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96,got 88
  return f(*args, **kwds)
/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96,got 88
  return f(*args, **kwds)
Worker ip unknown, skipping log sync for /home/yxu/ray_results/awesome/train_3_lr=0.015444,momentum=0.89643_2018-08-04_13-37-27ynfch_di

Validation set: Average loss: 0.1793, Accuracy: 11327/11968 (95%)


================== TESTING ==================
Test set: Average loss: 2.3101, Accuracy: 958/10000 (10%)

Final model stored at "/home/yxu/Documents/default-risk/checkpoint/net2018-08-04 13:37-best.pth.tar".
Error processing event: Traceback (most recent call last):
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 255, in _process_events
    result = ray.get(result_id)
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/worker.py", line 2776, in get
    raise RayGetError(object_ids, value)
ray.worker.RayGetError: Could not get objectid ObjectID(f17f0f114cee8c6d34a8a8a55feaabafee1496c1). It was created by remote function train which failed with:

Remote function train failed with:

Traceback (most recent call last):
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/worker.py", line 891, in _process_task
    *arguments)
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/actor.py", line 261, in actor_method_executor
    method_returns = method(actor, *args)
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/tune/trainable.py", line 117, in train
    result = self._train()
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/tune/function_runner.py", line 114, in _train
    result = self._status_reporter._get_and_clear_status()
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/tune/function_runner.py", line 42, in _get_and_clear_status
    raise TuneError("Trial finished without reporting result!")
ray.tune.error.TuneError: Trial finished without reporting result!

Suppressing duplicate error message.

/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96,got 88
  return f(*args, **kwds)
Worker ip unknown, skipping log sync for /home/yxu/ray_results/awesome/train_0_lr=0.55999,momentum=0.7021_2018-08-04_13-37-263b0iemnu
Test set: Average loss: 0.1684, Accuracy: 9464/10000 (95%)

Final model stored at "/home/yxu/Documents/default-risk/checkpoint/net2018-08-04 13:37-best.pth.tar".
Error processing event: Traceback (most recent call last):
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 255, in _process_events
    result = ray.get(result_id)
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/worker.py", line 2776, in get
    raise RayGetError(object_ids, value)
ray.worker.RayGetError: Could not get objectid ObjectID(6faf591f18be2eaed89f51cfe0f21c68f9075879). It was created by remote function train which failed with:

Remote function train failed with:

Traceback (most recent call last):
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/worker.py", line 891, in _process_task
    *arguments)
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/actor.py", line 261, in actor_method_executor
    method_returns = method(actor, *args)
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/tune/trainable.py", line 117, in train
    result = self._train()
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/tune/function_runner.py", line 114, in _train
    result = self._status_reporter._get_and_clear_status()
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/tune/function_runner.py", line 42, in _get_and_clear_status
    raise TuneError("Trial finished without reporting result!")
ray.tune.error.TuneError: Trial finished without reporting result!


Suppressing duplicate error message.
/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96,got 88
  return f(*args, **kwds)
Worker ip unknown, skipping log sync for /home/yxu/ray_results/awesome/train_1_lr=0.015444,momentum=0.7021_2018-08-04_13-37-271fw4grhb
== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/8 CPUs, 0/0 GPUs
Result logdir: /home/yxu/ray_results/awesome
ERROR trials:
 - train_0_lr=0.55999,momentum=0.7021:  ERROR, 1 failures: /home/yxu/ray_results/awesome/train_0_lr=0.55999,momentum=0.7021_2018-08-04_13-37-263b0iemnu/error_2018-08-04_13-39-22.txt
 - train_1_lr=0.015444,momentum=0.7021: ERROR, 1 failures: /home/yxu/ray_results/awesome/train_1_lr=0.015444,momentum=0.7021_2018-08-04_13-37-271fw4grhb/error_2018-08-04_13-39-23.txt
 - train_2_lr=0.55999,momentum=0.89643: ERROR, 1 failures: /home/yxu/ray_results/awesome/train_2_lr=0.55999,momentum=0.89643_2018-08-04_13-37-27t_x_vep8/error_2018-08-04_13-39-21.txt
 - train_3_lr=0.015444,momentum=0.89643:        ERROR, 1 failures: /home/yxu/ray_results/awesome/train_3_lr=0.015444,momentum=0.89643_2018-08-04_13-37-27ynfch_di/error_2018-08-04_13-39-21.txt

Traceback (most recent call last):
  File "dev/mnist-ray.py", line 291, in <module>
    }
  File "/home/yxu/.local/share/virtualenvs/default-risk-mgdyo4BW/lib/python3.6/site-packages/ray/tune/tune.py", line 104, in run_experiments
    raise TuneError("Trials did not complete", errored_trials)
ray.tune.error.TuneError: ('Trials did not complete', [train_0_lr=0.55999,momentum=0.7021, train_1_lr=0.015444,momentum=0.7021, train_2_lr=0.55999,momentum=0.89643, train_3_lr=0.015444,momentum=0.89643])
/ray/src/local_scheduler/local_scheduler.cc:177: Killed worker pid 13852 which hadn't started yet.
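
The traceback above is raised from `_get_and_clear_status` in `ray/tune/function_runner.py`, which suggests that each trial's training function returned without ever handing a result to Tune's reporter, even though training itself ran (the accuracies are printed). The following is a minimal simulation of that contract, not Ray's code: `Reporter`, `TrialError`, and `run_trial` are hypothetical stand-ins, and the `mean_accuracy` metric name is an assumption. The real reporter API should be checked against the installed Ray version.

```python
class TrialError(Exception):
    """Hypothetical stand-in for ray.tune.error.TuneError."""


class Reporter:
    """Hypothetical stand-in for the status reporter passed to a trial."""

    def __init__(self):
        self._latest = None

    def __call__(self, **metrics):
        # Remember the most recent metrics reported by the training function.
        self._latest = metrics

    def get_and_clear(self):
        if self._latest is None:
            raise TrialError("Trial finished without reporting result!")
        result, self._latest = self._latest, None
        return result


def run_trial(train_fn, config):
    """Run one trial and collect whatever it reported."""
    reporter = Reporter()
    train_fn(config, reporter)
    return reporter.get_and_clear()


def train_silent(config, reporter):
    accuracy = 0.96  # placeholder metric, only printed and never reported
    print("Test set accuracy:", accuracy)


def train_reporting(config, reporter):
    accuracy = 0.96  # placeholder metric
    reporter(mean_accuracy=accuracy)  # hands the result back to the runner
```

Here `run_trial(train_reporting, {})` returns the reported metrics, while `run_trial(train_silent, {})` raises the same message seen in the log. If this is the cause, the fix would be for the training function in dev/mnist-ray.py to report its metrics through the reporter it receives rather than only printing them.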

Risk model specific training/inference

Depends on #14.

Convert the training and inference loops from cle-mnist to work with our Risk model.

Data exploration: bureau.csv

This issue is for exploring the bureau table. It's sequential data, so most of the exploration will involve analysing the time series of points associated with each applicant.

At a minimum, we'll want summary statistics about the time-series nature of this table. Two statistics come to mind: (1) the number/percentage of applicants with previous credits in bureau.csv, and (2) the average number of credits in the table per applicant with at least one credit. In particular, (2) will inform what kind of module we use to model the table (it's currently an LSTM, but that could change depending on these results).

Both can be computed with a few simple pandas functions, e.g. something roughly similar to

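import pandas as pd

application = pd.read_csv('application_train.csv')
bureau = pd.read_csv('bureau.csv')
has_bureau = application['SK_ID_CURR'].isin(bureau['SK_ID_CURR'])
print(has_bureau.mean())  # gives (1) above: share of applicants with a bureau credit
counts = bureau.groupby('SK_ID_CURR').size()  # credits per applicant in bureau.csv
print(counts.mean())  # gives (2) above

assuming the CSVs are in the working directory; SK_ID_CURR is the applicant ID column shared by both tables 🙂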

We'll also need a good understanding of each feature. In particular, any systemic missing-ness should be made clear by this task. Hopefully, we'll have an understanding of how we want to represent each feature in the time series by the end of it, so that we'll be able to process accordingly.

There are some kernels available exploring this table, although there will be fewer than are available for application_train.csv.

Training prerequisites epic

This epic contains all tasks related to preparing the initial model and the supporting code required to be able to train it. For the initial version, we'll be extending the training code from cle-mnist.

This will include the following tasks:
(1a) #15 Complete RiskDataset in core/dataset.py with tests/debugging
(1b) #14 Complete Risk model code in core/model.py with tests/debugging (begun in #2)
(2a) #16 Implement prepare_data function as in cle-mnist using the RiskDataset
(2b) #17 Convert cle-mnist training/inference loops to be compatible with Risk model
(3) #18 Audit of training code, model, dataset.

The tasks above are mostly blocking for all downstream tasks. The tasks below are not. They are optional, but will be extremely useful during training and tuning, and should be picked up whenever all higher-priority tasks are complete. Note I've split them out as two tasks, although they're strongly intermingled and could be completed by a single person or two people working in parallel with effective communication.

(4) #12 Modify logging from stdout to logging files in a directory.
(5) #12 Extend (4) to work with TensorboardX

Training prereqs audit

Audit of the critical path work from the training prereq #13.

Sanity check to make sure the main parts of the epic were well-merged, including a dry run of the training code with fake data.

Implement prepare_data with RiskDataset

Depends on #15.

Reimplement the prepare_data function from cle-mnist to use the RiskDataset.

Data exploration: application_{train/test}.csv

For this issue, the motivation is to gain an understanding of the application-level data. This is ultimately what we're classifying default risk from, so the features we choose here will be extremely important. I suggest we leverage the kernels on Kaggle. There are a ton to choose from, and all of them perform EDA on this table. There are similarly quite a few notebooks performing feature selection and engineering on this table, and we should try to leverage those as well.

One question that will hopefully be answered here is which categorical features we select for use in the model, and how to represent them. There are far too many to represent their combinations with a one-hot encoding, and also too many combinations to train an embedding. Individual embeddings for each categorical are also out of the question, as most of the categoricals don't have enough levels for an embedding to work well. My (naïve) suggestion would be to whittle down the number of discrete variables to the point where we can train an embedding over their combinations, but hopefully the kernels will give guidance here.

The goal of this will be to select features to use as input to the Application module.
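
To make the combination problem above concrete: the number of distinct combinations is the product of the level counts of the selected categoricals, so it grows very quickly as columns are added. The sketch below uses made-up column names and level counts (they are not measured from application_train.csv) just to show how one might compare a full set of categoricals against a whittled-down subset.

```python
from math import prod

# Hypothetical level counts per categorical column (illustrative values only).
levels = {"cat_a": 2, "cat_b": 3, "cat_c": 8, "cat_d": 18, "cat_e": 58}


def joint_cardinality(level_counts):
    """Number of distinct combinations across the given categorical columns."""
    return prod(level_counts.values())


# All five columns together versus a reduced subset of three.
full = joint_cardinality(levels)
subset = joint_cardinality({k: levels[k] for k in ("cat_a", "cat_b", "cat_c")})
print(full, subset)
```

With these illustrative counts, the full set has 50112 possible combinations while the three-column subset has 48, which is the kind of gap that decides whether an embedding over combinations is feasible.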

Complete RiskDataset

Complete RiskDataset in core/dataset.py. This issue should include all debugging, as well as unit tests.

Required by #16.

Data exploration epic

The goal here is to gather the insights we need to make informed choices about data processing and downstream modeling tasks. Although it's not super exciting work, everything else depends on this being completed. It's also highly parallelizable, which means that we should be able to get it done fairly quickly if we have enough people volunteer.

There's an issue for performing EDA on each table. All of the issues below except for #3 follow the same basic workflow, while #3 consists of gathering research into existing kernels on Kaggle. If you're going to pick up an issue, please assign it to yourself so we don't end up repeating work!

#3 application_{train/test}.csv
#4 bureau.csv
#5 bureau_balance.csv
#6 previous_application.csv
#7 POS_cash_balance.csv
#8 credit_card_balance.csv
#9 installments_payments.csv

Training code: logging / monitoring for training

Add facilities to the training code for logging and monitoring the training of the model.

Bonus points: integrate https://github.com/lanpa/tensorboardX into the model
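
As a starting point for the file-based logging part, here is a minimal sketch using only Python's standard `logging` module. The helper name `get_file_logger`, the log file name, the message format, and the example metric values are all assumptions for illustration, and a temporary directory stands in for whatever log directory the project chooses.

```python
import logging
import os
import tempfile


def get_file_logger(log_dir, name="train"):
    """Return a logger that writes to <log_dir>/<name>.log instead of stdout."""
    os.makedirs(log_dir, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root (stdout/stderr) handlers
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.FileHandler(os.path.join(log_dir, name + ".log"))
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


# Usage: a temporary directory stands in for the real log directory.
log_dir = tempfile.mkdtemp()
logger = get_file_logger(log_dir)
logger.info("epoch=%d val_loss=%.4f", 1, 0.1793)  # example values only
```

The same call sites that log to file could later forward scalar metrics to a tensorboardX writer, so it may be worth routing all metric reporting through one helper from the start.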
