
customs-fraud-detection's People

Contributors

chaeyoonjeong, jaechan-so, john-mai-2605, roytsai27, seondong, sungwon-han

customs-fraud-detection's Issues

Bug report

Some preprocessed data contains infinite values, which cause errors during training.
Please add error handling for this in dataset.py, in firstCheck() of the Import_declarations class.
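A minimal sketch of one way to handle this, assuming the features reach firstCheck() as a 2-D numeric array (the actual data structure in dataset.py is not shown in this issue). The function name clip_infinite is hypothetical. It replaces each +inf/-inf with the largest/smallest finite value of its column; dropping the affected rows or converting them to NaN are equally valid alternatives. NaN entries are left untouched.

```python
from typing import Tuple

import numpy as np


def clip_infinite(x) -> Tuple[np.ndarray, int]:
    """Replace +inf/-inf in each column of a 2-D array with that column's finite max/min.

    Columns with no finite values get 0.0. Returns the cleaned copy and the number
    of entries that were replaced, so the caller can log how many were fixed.
    """
    x = np.array(x, dtype=float)  # copy, so the caller's data is not modified
    if x.ndim != 2:
        raise ValueError("expected a 2-D array (rows x features)")
    n_replaced = int(np.isinf(x).sum())
    for j in range(x.shape[1]):
        col = x[:, j]  # a view: assignments below write into x
        finite = col[np.isfinite(col)]
        hi = finite.max() if finite.size else 0.0
        lo = finite.min() if finite.size else 0.0
        col[col == np.inf] = hi
        col[col == -np.inf] = lo
    return x, n_replaced
```

Logging the returned count before training makes it easy to see how often the preprocessing produces infinite values in the first place.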

deepSAD & multideepSAD module error

When I select a GPU and run deepSAD.py, I get a module error in the training phase.

Command: `export CUDA_VISIBLE_DEVICES=3 && python main.py --data real-n --batch_size 128 --sampling deepSAD --mode scratch --train_from 20130101 --test_from 20130701 --test_length 15 --valid_length 15 --initial_inspection_rate 30 --final_inspection_rate 5 --epoch 5 --closs bce --rloss full --save 0 --numweeks 100 --inspection_plan direct_decay`

Line that raises the error:
self.model.module.get_average_hidden_vec(train_loader)

Error message:

Training deepSAD model ...
Traceback (most recent call last):
  File "main.py", line 341, in <module>
    chosen = sampler.query(num_samples)
  File "/home/sundong/WCO/sundong/DATE-active/WCO-project/query_strategies/deepSAD.py", line 198, in query
    self.train_deepSAD_model()
  File "/home/sundong/WCO/sundong/DATE-active/WCO-project/query_strategies/deepSAD.py", line 183, in train_deepSAD_model
    self.date_model.train(self.args)
  File "/home/sundong/WCO/sundong/DATE-active/WCO-project/query_strategies/deepSAD.py", line 478, in train
    self.model.module.get_average_hidden_vec(train_loader)
  File "/home/sundong/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 576, in __getattr__
    type(self).__name__, name))
AttributeError: 'AnomalyDATEModel' object has no attribute 'module'
Traceback (most recent call last):
  File "main.py", line 348, in <module>
    logger.info("%s, %s, %s", len(set(chosen)), len(chosen), num_samples)
NameError: name 'chosen' is not defined
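The traceback suggests that self.model is only wrapped in nn.DataParallel (which stores the original network in .module) in some configurations, presumably multi-GPU runs; with a single visible GPU the bare AnomalyDATEModel has no .module attribute. A minimal sketch of a helper that works in both cases; the name unwrap_model is hypothetical. It uses duck typing so it has no torch dependency, with the caveat noted in the docstring.

```python
def unwrap_model(model):
    """Return the network inside a DataParallel/DistributedDataParallel wrapper, if any.

    Both torch wrappers keep the original network in `.module`. A bare nn.Module
    raises AttributeError for a missing attribute, so getattr falls back to the
    model itself. Caveat: a bare model that registers a child named "module" would
    be mistaken for a wrapper; an isinstance check against the torch wrapper
    classes avoids that.
    """
    return getattr(model, "module", model)


# Hypothetical call site in deepSAD.py, replacing
# self.model.module.get_average_hidden_vec(train_loader):
# unwrap_model(self.model).get_average_hidden_vec(train_loader)
```

The second traceback (NameError for chosen) appears to be a follow-on failure: chosen is never assigned because sampler.query() raised first.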

Also, the 'temp' directory is not initialized; save the intermediary files in the ./intermediary/ directory instead:

torch.save(normality_scores, "intermediary/multideepSAD_models/normality_scores_valid{}.ckpt".format(epoch))
torch.save(self.data.valid_cls_label, "intermediary/multideepSAD_models/xgb_validy{}.ckpt".format(epoch))

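Add a few lines at line 147 of main.py so that these directories exist before anything is saved:
import pathlib  # if not already imported in main.py
pathlib.Path('./intermediary/deepSAD_models').mkdir(parents=True, exist_ok=True)
pathlib.Path('./intermediary/multideepSAD_models').mkdir(parents=True, exist_ok=True)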

Uncertainty evaluation

Make a self-supervised algorithm for uncertainty evaluation.
Input: a tuple data that unpacks into (train_loader, valid_loader, test_loader, leaf_num, importer_size, item_size, xgb_validy, xgb_testy, revenue_valid, revenue_test)

Output: a list of scores with the same length as the test set, i.e. test_loader.dataset.tensors[-1].shape[0]
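A minimal sketch of the input/output contract described above; it does not implement any uncertainty method. The field order of UncertaintyInput is taken from the issue, while the names UncertaintyInput, n_test_rows and check_scores are hypothetical. It assumes test_loader is a torch DataLoader over a TensorDataset, so the dataset is reached via .dataset and its tensors via .tensors.

```python
from typing import Any, NamedTuple, Sequence


class UncertaintyInput(NamedTuple):
    """Named view of the input tuple, in the order listed in the issue."""
    train_loader: Any
    valid_loader: Any
    test_loader: Any
    leaf_num: Any
    importer_size: Any
    item_size: Any
    xgb_validy: Any
    xgb_testy: Any
    revenue_valid: Any
    revenue_test: Any


def n_test_rows(test_loader) -> int:
    # A DataLoader exposes its dataset as `.dataset`; a TensorDataset keeps its
    # tensors in `.tensors`, and the last tensor has one entry per test row.
    return int(test_loader.dataset.tensors[-1].shape[0])


def check_scores(scores: Sequence[float], data: tuple) -> None:
    """Raise ValueError unless there is exactly one score per test-set row."""
    inp = UncertaintyInput._make(data)  # TypeError if the tuple has the wrong length
    expected = n_test_rows(inp.test_loader)
    if len(scores) != expected:
        raise ValueError(f"expected {expected} scores, got {len(scores)}")
```

Calling check_scores on the algorithm's output before it is merged with the other strategies catches length mismatches early.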

Fix evaluation metric

For now, the model evaluation still uses the results from DATE.
It needs to be changed to use active_DATE (upDATE).

Splitter value should be updated every epoch

Hi, if we want to use the most recent data as a validation set, shouldn't the values in splitter be updated whenever new weekly data is added to the training set? (active_DATE.py, line 170)
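A minimal sketch of recomputing the split each week, assuming the splitter can be expressed as date boundaries and that validation should cover the most recent valid_length days (the command above uses --valid_length 15). The function update_splitter and its return format are hypothetical; the real splitter in active_DATE.py may be structured differently.

```python
from datetime import date, timedelta
from typing import Tuple


def update_splitter(train_from: date, latest: date, valid_length: int) -> Tuple[date, date, date]:
    """Hold out the most recent `valid_length` days for validation.

    Returns (train_from, valid_from, latest): training covers [train_from, valid_from)
    and validation covers [valid_from, latest).
    """
    valid_from = latest - timedelta(days=valid_length)
    if valid_from <= train_from:
        raise ValueError("training window is shorter than valid_length")
    return train_from, valid_from, latest


# Recompute after every weekly data addition instead of keeping the initial split.
train_from, first_test_day = date(2013, 1, 1), date(2013, 7, 1)
for week in range(3):
    print(update_splitter(train_from, first_test_day + timedelta(weeks=week), valid_length=15))
```

Each week the validation window slides forward by seven days, so the model is always validated on the newest labeled data.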

Problems in Adaptive weight (adahybrid)

I'm not sure the module 'adahybrid' is working as you intended.

Some errors:

  • Edge cases need to be handled: when an arm with value 0 or 1 is selected, the hybrid module does not work because num_sample = 0 for one of the subsamplers.
  • weight_sampler.update_dists(1-norm_precision) does not produce any meaningful update. After update_dists, weight_sampler.l is still
    {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0.0, 10: 0}.

Please use a debugger (pdb.set_trace()) to check that the code works as intended, and then commit again.
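A minimal sketch of the first edge case, assuming the selected arm maps to a weight in [0, 1] that gives the first subsampler's share of the budget (the 11 keys 0 to 10 in weight_sampler.l suggest weights 0.0, 0.1, ..., 1.0, but that mapping is an assumption). The function split_budget is hypothetical: it drops any subsampler whose share is 0, so the hybrid never calls a subsampler with num_samples == 0.

```python
from typing import List, Tuple


def split_budget(num_samples: int, weight: float) -> List[Tuple[int, int]]:
    """Split num_samples between subsampler 0 and subsampler 1.

    `weight` is subsampler 0's share. Returns (subsampler_index, count) pairs,
    omitting any subsampler that would receive 0 samples.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    first = int(round(weight * num_samples))
    shares = [(0, first), (1, num_samples - first)]
    return [(i, n) for i, n in shares if n > 0]


print(split_budget(100, 0.0))
print(split_budget(100, 0.3))
```

With weight 0.0 or 1.0 only one subsampler is queried and it receives the whole budget, which is exactly the case that currently breaks.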

Code testing

@nguyenthi47

  • Create an Overleaf project (or a Google Doc) to collect the result graphs, and share the link (an editable link is preferred, so @Jaechan-So and @john-mai-2605 can add any results we get).
  • If you want faster code, please wait until I add the fine-tune mode (expected tonight).
  • With DATE_badge, please try different functions of revenue (x, ln(1+x), ...) and see which works best.
    Don't forget to save the indices, as @Sungwon-Han said.
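A minimal sketch for comparing the two revenue functions named above. Only x and ln(1+x) come from the issue; weighting each item's base score by f(revenue) in the hypothetical weighted_scores function is an assumption about how DATE_badge uses revenue, and further candidates from the "..." would be added as new dictionary entries.

```python
import numpy as np

# Candidate revenue transforms from the issue: x and ln(1 + x).
REVENUE_TRANSFORMS = {
    "identity": lambda x: np.asarray(x, dtype=float),
    "log1p": lambda x: np.log1p(np.asarray(x, dtype=float)),  # ln(1 + x), accurate near 0
}


def weighted_scores(base_scores, revenue, transform="log1p"):
    """Hypothetical: scale each item's base score by f(revenue)."""
    f = REVENUE_TRANSFORMS[transform]
    return np.asarray(base_scores, dtype=float) * f(revenue)


for name in REVENUE_TRANSFORMS:
    print(name, weighted_scores([0.2, 0.9], [100.0, 5000.0], transform=name))
```

ln(1+x) compresses large revenues, so a few very high-value declarations dominate the selection less than with the raw x.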

Hybrid - edge case: when two subsets have intersection

This is a minor issue, but it should be addressed later.

At line 238 of active_DATE.py: what if the samples chosen by the two strategies overlap?
assert len(chosen) == num_samples
If this assertion fails, the model should select additional samples (num_samples - len(chosen) of them) for a fair comparison.
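A minimal sketch of that top-up step. The function top_up and its ranked_candidates argument are hypothetical: ranked_candidates stands for some fallback ordering of the remaining items (for example, one strategy's scores), since the issue does not say which strategy should supply the extra samples.

```python
from typing import Hashable, Iterable, List


def top_up(chosen: Iterable[Hashable], ranked_candidates: Iterable[Hashable], num_samples: int) -> List[Hashable]:
    """Deduplicate the two strategies' picks, then fill up to num_samples.

    Items already chosen are skipped when walking `ranked_candidates`.
    """
    picked = list(dict.fromkeys(chosen))  # drop duplicates, keep first-seen order
    seen = set(picked)
    for idx in ranked_candidates:
        if len(picked) >= num_samples:
            break
        if idx not in seen:
            picked.append(idx)
            seen.add(idx)
    if len(picked) != num_samples:
        raise ValueError(f"{len(picked)} distinct samples available, need {num_samples}")
    return picked


# Strategy A picked [1, 2, 3], strategy B picked [3, 4]: item 3 overlaps.
print(top_up([1, 2, 3] + [3, 4], ranked_candidates=[5, 1, 6, 7], num_samples=6))
```

After top_up, the existing assert len(chosen) == num_samples holds whenever enough distinct candidates exist.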

Integrate uncertainty to batch

Tasks:

  • Make a new strategy file (let's call it badge_DATE.py) - John will do this (Done)
  • Normalize badge in badge_DATE.py - John will do this (Done)
  • Integrate uncertainty into badge_DATE.py - @Jaechan-So will do this

Important changes:
badge.py, badge_DATE.py and uncertainty.py have been moved to the query_strategies folder.
uncertainty.py has been renamed to utils.py (query_strategies/utils.py).
