Giter Site home page Giter Site logo

autonlab / weasel Goto Github PK

View Code? Open in Web Editor NEW
153.0 4.0 11.0 27.17 MB

Weakly Supervised End-to-End Learning (NeurIPS 2021)

License: Apache License 2.0

Python 100.00%
machine-learning deep-learning weak-supervision end-to-end-learning pytorch-lightning multi-source-weak-supervision weakly-supervised-learning

weasel's Introduction

WeaSEL: Weakly Supervised End-to-end Learning

Python PyTorch Lightning Config: hydra license

This is a PyTorch-Lightning-based framework, based on our End-to-End Weak Supervision paper (NeurIPS 2021), that allows you to train your favorite neural network for weakly-supervised classification1

  • only with multiple labeling functions (LFs)2, i.e. without any labeled training data!
  • in an end-to-end manner, i.e. directly train and evaluate your neural net (end-model from here on), there's no need to train a separate label model any more as in Snorkel & co,
  • with better test set performance and enhanced robustness against correlated or inaccurate LFs than prior methods like Snorkel

1 This includes learning from crowdsourced labels or annotations!
2 LFs are labeling heuristics, that output noisy labels for (subsets of) the training data (e.g. crowdworkers or keyword detectors).

If you use this code, please consider citing our work

End-to-End Weak Supervision
Salva Rühling Cachay, Benedikt Boecking, and Artur Dubrawski
Advances in Neural Information Processing Systems (NeurIPS), 2021
arXiv:2107.02233v3

Credits

Getting Started

This library assumes familiarity with (multi-source) weak supervision, if that's not the case you may want to first learn its basics in e.g. this overview slides from Stanford or this Snorkel tutorial.

That being said, have a look at our examples and the notebooks therein showing you how to use Weasel for your own dataset, LF set, or end-model. E.g.:

Reproducibility

Please have a look at the research code branch, which operates on pure PyTorch.

Installation

1. New environment (recommended, but optional)
conda create --name weasel python=3.9
conda activate weasel  
2a: From source
python -m pip install git+https://github.com/autonlab/weasel#egg=weasel[all]
2b: From source, editable install
git clone https://github.com/autonlab/weasel.git
cd weasel
pip install -e .[all]

Minimal dependencies

Minimal dependencies, in particular not using Hydra, can be installed with

python -m pip install git+https://github.com/autonlab/weasel

The needed environment corresponds to conda env create -f env_gpu_minimal.yml.

If you choose to use this variant, you won't be able to run some of the examples: You may want to have a look at this notebook that walks you through how to use Weasel without Hydra as the config manager.

Note: Weasel is under active development, some uncovered edge cases might exist, and any feedback is very welcomed!

Apply WeaSEL to your own problem

Configuration with Hydra

Optional: This template config will help you get started with your own application, an analogous config is used in this tutorial script that you may want to check out.

Pre-defined or custom downstream models & Baselines

Please have a look at the detailed instructions in this Readme.

Using your own dataset and/or labeling heuristics

Please have a look at the detailed instructions in this Readme.

Citation

@article{cachay2021endtoend,
  author={R{\"u}hling Cachay, Salva and Boecking, Benedikt and Dubrawski, Artur},
  journal={Advances in Neural Information Processing Systems}, 
  title={End-to-End Weak Supervision},
  year={2021}
}

weasel's People

Contributors

benbo avatar salvarc avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

weasel's Issues

About the "strange error" in tutorial notebook

Thanks for the excellent work and code! When I try to run the tutorial 1_bias_bios, I face the same error mentioned in the notebook (ValueError: dictionary update sequence element #0 has length 1; 2 is required). Although rerunning the cell solves the problem, such a solution is not available when running the python code directly. Do you have any hints on what might cause the problem, and what can be done to fix it? It seems to me that the error might be related to hydra, since the corresponding part 1_bias_bios_no_hydra works fine for me.

Broken links to examples files

Thanks for the work! I would like to point out that the links to the tutorials in this page are broken, location of files is not correct, just in case you are not aware of it.

Multi-Label

Is there anything to consider for multi-labelling problems?

Errors in example scripts and notebook

Hello, thank you for sharing this work! I've been trying to run the example scripts and notebooks before running weasel on a different dataset. I keep on coming across the following exceptions when trying to fit weasel and the end-model:

  1. TypeError in 0_full_pipeline.ipynb

TypeError: vars() argument must have dict attribute

  1. MisconfigurationException in all other files in the examples folder

lightning_fabric.utilities.exceptions.MisconfigurationException: You are trying to self.log() but it is not managed by the Trainer control flow

Do you know what might be the problem?

raw data?

Hi,

Great work! Just wondering if the raw data, eg, raw text for IMDb and Amazon could be also released?

Request for Labeling Functions

Thanks for open-sourcing the code! Quick question: where can I find the additional labeling functions for Amazon, IMDB (136 LFs), and BiasBios (99 LFs)?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.