stonybrooknlp / bionli


[EMNLP2022] BioNLI: Generating a Biomedical NLI Dataset Using Lexico-semantic Constraints for Adversarial Examples

Home Page: https://StonyBrookNLP.github.io/BioNLI

biomedical biomedical-text controllable-generation generation natural-language-inference neurological nli text-generation transformers

What is BioNLI?

BioNLI is a biomedical NLI dataset built using controllable text generation.

This is the official page for the paper:

BioNLI: Generating a Biomedical NLI Dataset Using Lexico-semantic Constraints for Adversarial Examples
accepted at EMNLP2022 (Findings).

BioNLI is the first dataset in biomedical natural language inference. It contains abstracts from the biomedical literature paired with mechanistic hypotheses generated with nine different strategies.

Example

The following is an example of an entry in the BioNLI dataset. Some supporting text was removed to save space. The premise is a set of sentences about two biomedical entities. The consistent hypothesis is the original conclusion sentence from the abstract; the inconsistent hypothesis is a sentence generated with one of the nine strategies.

Coming Soon
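
As a rough illustration of the entry structure described above, the sketch below models one entry as a small Python record and expands it into labeled NLI pairs. The class name, field names, label strings, and placeholder texts are all assumptions made for illustration; they are not the released schema and not real dataset content.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BioNLIEntry:
    """Hypothetical record layout; field names are assumptions, not the released schema."""
    premise: List[str]             # supporting sentences taken from the abstract
    consistent_hypothesis: str     # original conclusion sentence of the abstract
    inconsistent_hypothesis: str   # generated (perturbed) conclusion sentence
    strategy: str                  # which strategy produced the perturbation (illustrative)

    def to_nli_pairs(self) -> List[Tuple[str, str, str]]:
        """Expand the entry into (premise, hypothesis, label) triples."""
        text = " ".join(self.premise)
        return [
            (text, self.consistent_hypothesis, "consistent"),
            (text, self.inconsistent_hypothesis, "inconsistent"),
        ]


# Placeholder strings only; no real dataset content is shown here.
entry = BioNLIEntry(
    premise=["<premise sentence 1>", "<premise sentence 2>"],
    consistent_hypothesis="<original conclusion>",
    inconsistent_hypothesis="<perturbed conclusion>",
    strategy="<strategy name>",
)
pairs = entry.to_nli_pairs()
```

Each entry yields one consistent and one inconsistent pair that share the same premise text.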

Dataset Statistics

There are two versions of this dataset: the full distribution, which contains all possible perturbations, and the balanced distribution. Both share the same test set. For the full distribution, we generate as many perturbations as possible for the dev and test sets, but each training instance is perturbed only once.
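
The per-split policy for the full distribution can be sketched as follows. This is a minimal illustration, not the authors' generation code: the function `perturb_instance`, the split names, and the two toy string-rewriting strategies are hypothetical stand-ins for the nine strategies from the paper.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# A perturbation returns a modified sentence, or None when it does not apply.
Perturbation = Callable[[str], Optional[str]]


def perturb_instance(
    conclusion: str,
    strategies: Dict[str, Perturbation],
    split: str,
    rng: random.Random,
) -> List[Tuple[str, str]]:
    """Return (strategy_name, perturbed_sentence) pairs for one instance."""
    candidates = []
    for name, fn in strategies.items():
        out = fn(conclusion)
        if out is not None and out != conclusion:
            candidates.append((name, out))
    if split == "train":
        # Training: each instance is perturbed once.
        return [rng.choice(candidates)] if candidates else []
    # Dev and test: keep every perturbation that applies.
    return candidates


# Toy strategies for illustration only; they are not the paper's strategies.
def flip_direction(s: str) -> Optional[str]:
    return s.replace("increases", "decreases") if "increases" in s else None


def negate(s: str) -> Optional[str]:
    return s.replace(" is ", " is not ", 1) if " is " in s else None


toy_strategies = {"flip_direction": flip_direction, "negate": negate}
sentence = "A increases B and the effect is stable."

dev_out = perturb_instance(sentence, toy_strategies, "dev", random.Random(0))
train_out = perturb_instance(sentence, toy_strategies, "train", random.Random(0))
```

Under these assumptions, a dev or test instance keeps every applicable perturbation, while a training instance keeps a single randomly chosen one.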

Full Distribution:

Image of full stats

Balanced Distribution:

Image of balanced stats

Download the data

The dataset can be downloaded here:

The full set can be downloaded from here.

The balanced set can be downloaded from here.

To access the test set, please contact me.

License

BioNLI is distributed under the CC BY 4.0 License.

Liked us? Cite us!

Please use the following BibTeX entry:

@inproceedings{bastan-etal-2022-bionli,
    title = "{B}io{NLI}: Generating a Biomedical {NLI} Dataset Using Lexico-semantic Constraints for Adversarial Examples",
    author = "Bastan, Mohaddeseh  and
      Surdeanu, Mihai  and
      Balasubramanian, Niranjan",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-emnlp.374",
    pages = "5093--5104",
    
}

bionli's Issues

Dataset Issue

The dataset links are invalid and always return a 404 error. Kindly update the README with valid links for downloading the BioNLI dataset.

Dataset Access

Hi,

Thank you very much for the great work. Just wondering if you could kindly share the dataset? It looks like the links provided in README.md are invalid. Much appreciated.
