mkirchmeyer / adaptation-imputation

Unsupervised domain adaptation with non-stochastic missing data

Home Page: https://rdcu.be/czoZ5

License: MIT License

Python 100.00%
domain-adaptation imputation digital-advertising missing-data

Unsupervised Domain Adaptation with non-stochastic missing data

This repository contains the code for the paper "Unsupervised Domain Adaptation with non-stochastic missing data", accepted at the ECML 2021 journal track (Data Mining and Knowledge Discovery).

Dependencies

The code requires the following Python modules, which are specified in requirements.txt:

  • Numpy
  • Matplotlib
  • POT (Python Optimal Transport library)
  • PyTorch
  • scikit-learn (sklearn)

Quickstart

  • Install Miniconda, then run source ~/.bashrc
  • Create the conda environment: conda create --name adaptation-imputation python=3.6 -y
  • Activate the environment: source activate adaptation-imputation
  • Install the requirements in this environment: pip install -r requirements.txt
  • Install the package from the repository root: pip install -e .
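Once the environment is set up, a quick import check can confirm that nothing is missing. The sketch below is not part of the repository; it assumes the usual import names for the listed packages (ot for POT, torch for PyTorch, sklearn for scikit-learn), and the helper name check_dependencies is hypothetical.

```python
import importlib.util

# Import names assumed for the packages listed under Dependencies.
REQUIRED_MODULES = ["numpy", "matplotlib", "ot", "torch", "sklearn"]


def check_dependencies(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # Print one status line per module without actually importing it.
    for name, found in check_dependencies().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Using find_spec avoids importing heavy packages such as PyTorch just to test for their presence.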

Run an experiment

  • cd adaptation-imputation/experiments/launcher
  • python ../../orchestration/launcher.py --experiment "dann_mnist_usps" --gpu_id=1

The experiment argument is defined in adaptation-imputation/experiments/__init__.py. All hyperparameters are stored in a .py file.

Extensions of both DANN and DeepJDOT to missing data are in this repository. The suffix of the experiment name selects the variant:

  • with no suffix, the model runs on full data (for digits); otherwise it runs on missing data
  • with the "ignore" suffix, the model runs ignoring the missing component
  • with the "zeroimput" suffix, the model runs with zero imputation for the missing component
  • with the "imput" suffix, the model runs with conditional generation of the missing component
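To make the difference between the variants concrete, here is a minimal sketch of how an input could be assembled when one component of the features is missing. It is an illustration only, not the repository's code: the function build_input, the "full" label and the list-based feature representation are assumptions. The "imput" variant is left out because it relies on a learned conditional generator.

```python
def build_input(observed, missing, variant):
    """Assemble a model input from an observed and a possibly missing component.

    observed: list of floats that is always available
    missing:  list of floats for the component that may be missing; only its
              length is used by the "zeroimput" variant
    variant:  "full", "ignore" or "zeroimput" (hypothetical labels)
    """
    if variant == "full":
        # Full data: both components are used as they are.
        return list(observed) + list(missing)
    if variant == "ignore":
        # The missing component is dropped; the model sees fewer features.
        return list(observed)
    if variant == "zeroimput":
        # The missing component is replaced by zeros of the same size.
        return list(observed) + [0.0] * len(missing)
    raise ValueError(f"unknown variant: {variant}")


x_obs, x_miss = [0.5, 1.5], [2.0, 3.0, 4.0]
print(build_input(x_obs, x_miss, "ignore"))
print(build_input(x_obs, x_miss, "zeroimput"))
```

Note that "ignore" changes the input dimension, while "zeroimput" keeps it identical to the full-data case.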

gpu_id specifies which GPU to use. Jobs can also be run on CPU, but training will take much longer.
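The two command-line arguments can be pictured with a small argparse sketch. This is a hypothetical reconstruction, not the actual launcher: only the flag names --experiment and --gpu_id come from the command above, while the default value, the helper names and the CPU fallback are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags used in the launch command (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Launcher CLI sketch")
    parser.add_argument("--experiment", required=True,
                        help="experiment name, e.g. dann_mnist_usps")
    parser.add_argument("--gpu_id", type=int, default=0,
                        help="index of the GPU to use (default is an assumption)")
    return parser.parse_args(argv)


def device_string(gpu_id, cuda_available):
    """Return a PyTorch-style device string, falling back to CPU."""
    return f"cuda:{gpu_id}" if cuda_available else "cpu"


args = parse_args(["--experiment", "dann_mnist_usps", "--gpu_id=1"])
# In a real run, cuda_available would come from torch.cuda.is_available().
print(args.experiment, device_string(args.gpu_id, cuda_available=False))
```

The resulting string can be passed to torch.device to place the model and tensors.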

Notes

Figures are saved in the figures folder, and logs in a separate results folder that is created when the job is launched.

Utility functions are in utils.

Datasets

Digits dataset

The digits datasets are downloaded as part of the training script. The code is taken from existing GitHub repositories, and credit is given in the .py files.

Criteo dataset

Preprocessed data is available here. If necessary, follow the steps below to regenerate the data.

Regenerate Criteo data

  • Download the Criteo Kaggle dataset from here into the data folder

  • Run python data_preprocessing.py in the data folder

  • Run the following UNIX commands:

    • Define a UNIX function for seeding:

    get_seeded_random() { seed="$1"; openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt </dev/zero 2>/dev/null; }

    • shuf -o total_source_shuffled.txt < total_source.txt --random-source=<(get_seeded_random 42)
    • sed -n -e '1,1183117p' total_source_shuffled.txt > total_source_data.txt
    • rm total_source_shuffled.txt total_source.txt
  • The training script is then ready to be run
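The shuffle-and-truncate step can also be sketched in Python for readers without shuf or openssl. This is an illustrative alternative under stated assumptions, not the procedure used for the paper: Python's random module yields a different permutation than shuf with the openssl-based random source, so the resulting subset will not match the preprocessed data, and the function name shuffle_and_truncate is hypothetical.

```python
import random


def shuffle_and_truncate(src_path, dst_path, n_lines=1183117, seed=42):
    """Shuffle the lines of src_path with a fixed seed and keep the first n_lines.

    The whole file is read into memory, which is simple but memory-hungry
    for very large files.
    """
    with open(src_path, "r", encoding="utf-8") as src:
        lines = src.readlines()
    # A dedicated Random instance keeps the shuffle reproducible and
    # independent of the global random state.
    random.Random(seed).shuffle(lines)
    kept = lines[:n_lines]
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(kept)
    return len(kept)
```

A call such as shuffle_and_truncate("total_source.txt", "total_source_data.txt") mirrors the file names used in the commands above.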

Citation

@article{Kirchmeyer2021,
  author = {Kirchmeyer, Matthieu and Gallinari, Patrick and Rakotomamonjy, Alain and Mantrach, Amin},
  doi = {10.1007/s10618-021-00775-3},
  isbn = {1573-756X},
  journal = {Data Mining and Knowledge Discovery},
  title = {Unsupervised domain adaptation with non-stochastic missing data},
  url = {https://doi.org/10.1007/s10618-021-00775-3},
  year = {2021}
}
