yarn

Yarn is a system for creating vectorial concept representations from an ontology containing descriptions of these concepts. These concept representations can then be used to disambiguate terms, and link them to the appropriate concept.
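The idea described above can be illustrated with a small, self-contained sketch. It is not Yarn's actual API: the function names, the toy two-dimensional word vectors, the concept IDs, and the choice of averaging word vectors and comparing by cosine similarity are all assumptions made for illustration.

```python
import math

# Toy 2-dimensional word vectors (hypothetical); real experiments would use
# pretrained vectors with many more dimensions.
WORD_VECTORS = {
    "cold": [0.5, 0.5],
    "temperature": [1.0, 0.0],
    "low": [0.9, 0.1],
    "weather": [0.8, 0.2],
    "virus": [0.0, 1.0],
    "infection": [0.1, 0.9],
}


def mean_vector(texts, vectors):
    """Average the vectors of all known words in a list of strings."""
    words = [w for text in texts for w in text.lower().split() if w in vectors]
    if not words:
        raise ValueError("no known words in input")
    dim = len(next(iter(vectors.values())))
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def disambiguate(document, concept_descriptions, vectors):
    """Return the concept ID whose description vector is closest to the document."""
    doc_vec = mean_vector([document], vectors)
    scores = {
        concept_id: cosine(doc_vec, mean_vector(descriptions, vectors))
        for concept_id, descriptions in concept_descriptions.items()
    }
    return max(scores, key=scores.get)


# Hypothetical concept IDs and descriptions for the ambiguous term "cold".
concepts_for_cold = {
    "C_TEMPERATURE": ["low temperature", "cold weather"],
    "C_INFECTION": ["virus infection"],
}

best = disambiguate("the cold virus", concepts_for_cold, WORD_VECTORS)
print(best)
```

Words without a vector (here "the") are simply skipped, which is one of several possible design choices.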

For more information, see the paper Using Distributed Representations to Disambiguate Biomedical and Clinical Concepts by Stéphan Tulkens, Simon Šuster and Walter Daelemans, which was presented at the BioNLP Workshop at ACL 2016.

License

MIT

Contributors

Stéphan Tulkens, Simon Šuster, and Walter Daelemans. If you use this work or build upon it, please cite our paper as follows:

@inproceedings{tulkens2016using,
  title={Using Distributed Representations to Disambiguate Biomedical and Clinical Concepts},
  author={Tulkens, St{\'e}phan and {\v{S}}uster, Simon and Daelemans, Walter},
  booktitle={Proceedings of the 15th Workshop on Biomedical Natural Language Processing},
  pages={77--82},
  year={2016}
}

Requirements

All are available from pip

Usage

Yarn requires:

  • A set of word vectors
  • A set of concepts, with their descriptions
  • A set of documents with their ambiguous terms marked

The word vectors we used can be downloaded from the BioASQ website.

If you want to replicate the original experiments, you need to adhere to the formats below. If you want to use Yarn for your own experiments, e.g. just creating concept representations, you can choose your own format.

concepts

Concepts are stored in a nested dictionary: each ambiguous term maps to the concept IDs it can refer to, and each concept ID maps to a list of descriptions (strings) of that concept.

{"term":
  {"concept_id_1":
    [description_1,
     description_2,
     ...
     description_n],
   "concept_id_2":
    [description_1,
     description_2,
     ...
     description_n]
  }
}
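As a concrete instance of this layout, the sketch below builds a small concepts dictionary, checks its nesting, and round-trips it through JSON. The term, concept IDs, descriptions, and the `validate_concepts` helper are hypothetical, and JSON as the storage format is an assumption; the description above only specifies a dictionary.

```python
import json


def validate_concepts(concepts):
    """Check the term -> concept ID -> list-of-strings nesting.

    Hypothetical helper, not part of Yarn.
    """
    for term, by_concept in concepts.items():
        if not isinstance(by_concept, dict):
            raise TypeError(f"{term!r} must map to a dictionary of concept IDs")
        for concept_id, descriptions in by_concept.items():
            if not isinstance(descriptions, list):
                raise TypeError(f"{concept_id!r} must map to a list")
            if not all(isinstance(d, str) for d in descriptions):
                raise TypeError(f"descriptions of {concept_id!r} must be strings")
    return True


# Illustrative data only; real concept IDs and descriptions come from an ontology.
concepts = {
    "cold": {
        "concept_id_1": ["a low temperature", "absence of heat"],
        "concept_id_2": ["a viral infection of the upper respiratory tract"],
    }
}

validate_concepts(concepts)

# Round-trip through JSON (assumed storage format).
serialized = json.dumps(concepts, indent=2)
restored = json.loads(serialized)
```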

documents

Documents to be disambiguated use the same nested layout, with a list of documents in place of the list of descriptions. Note that each document must contain at least one occurrence of the ambiguous term under which it is classified.

{"term":
  {"concept_id_1":
    [document_1,
     document_2,
     ...
     document_n],
   "concept_id_2":
    [document_1,
     document_2,
     ...
     document_n]
  }
}
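The requirement that every document contains its ambiguous term can be checked before running an experiment. The following is a minimal sketch: `documents_missing_term` is a hypothetical helper, the data is made up, and matching the term case-insensitively as a whole word in plain-string documents is an assumption rather than Yarn's documented behaviour.

```python
import re


def documents_missing_term(documents):
    """Return (term, concept_id, index) for each document lacking its term.

    Hypothetical helper; assumes documents are plain strings.
    """
    missing = []
    for term, by_concept in documents.items():
        # Whole-word, case-insensitive match (an assumption, see lead-in).
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for concept_id, docs in by_concept.items():
            for index, doc in enumerate(docs):
                if not pattern.search(doc):
                    missing.append((term, concept_id, index))
    return missing


# Illustrative data only.
documents = {
    "cold": {
        "concept_id_1": ["It was a cold morning.", "The patient felt chilly."],
        "concept_id_2": ["She caught a cold last week."],
    }
}

problems = documents_missing_term(documents)
print(problems)
```

An empty result means every document satisfies the constraint; any reported entry points at the term, concept ID, and list position of a document to fix or drop.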

The original Yarn experiments were run with the MSH dataset (Jimeno-Yepes 2011) and the 2015AB release of the UMLS. Because these resources are not freely distributable, we were not able to redistribute them with this package.
