ResourceMiner

Join the chat at https://gitter.im/RiffynInc/ResourceMiner

Getting started

  1. Look through the issues here in the GitHub repo and the wiki.
  2. Check out these useful resources
  3. Fork the repo and start coding!

See also the Details section below. Papers and results can be accessed through our MongoDB server:

mongo ec2-52-26-49-156.us-west-2.compute.amazonaws.com/plos -u mozsprint -p plos.

Papers are stored in collections named after their journals. The results (term-subject matches) are in the term_subject collection.

About Riffyn

Billions of dollars are lost each year on materials and life science research with 10% reproducibility rates, and on multi-million dollar failures during process transfer to manufacturing. Riffyn addresses this problem with a cloud-based operating system for laboratory R&D processes. It combines computer-aided design of structured protocols (including controlled vocabularies), real-time data acquisition, and automated statistical quality analytics.

Details

We would like to develop a natural language processing toolkit for extracting controlled vocabularies (ontologies) of research resources used in the scientific literature, and organizing them into an associative knowledge graph. The tool will report on the frequency and context of use of terms that are fed to it from Riffyn’s (or any) ontology of research resources.
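
As a rough illustration of "frequency and context of use", the sketch below counts case-insensitive, whole-word occurrences of each term in a plaintext string and keeps a short snippet around each hit. The function name `find_term_contexts`, the `window` parameter, and the sample sentence are assumptions made for this example; they are not part of the ResourceMiner code.

```python
import re
from collections import Counter


def find_term_contexts(text, terms, window=30):
    """Count each term in `text` and collect a snippet around every hit.

    Hypothetical helper: matching is case-insensitive and requires that the
    term is not embedded in a longer word.
    """
    counts = Counter()
    contexts = {}
    for term in terms:
        pattern = re.compile(
            r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE
        )
        for match in pattern.finditer(text):
            counts[term] += 1
            start = max(0, match.start() - window)
            end = min(len(text), match.end() + window)
            contexts.setdefault(term, []).append(text[start:end].strip())
    return counts, contexts


# Made-up sample text, for illustration only.
sample = (
    "Cells were lysed and the lysate was loaded onto an agarose gel. "
    "The agarose gel was run at 100 V."
)
counts, contexts = find_term_contexts(sample, ["agarose gel", "centrifuge"])
print(counts["agarose gel"], contexts.get("agarose gel"))
```

Terms that never occur simply have a count of zero and no entry in `contexts`, which keeps the result small for a large term set.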

To build the knowledge graph, the tool will annotate the ontology terms with keywords and subject area tags obtained from article annotations in PLOS and other journals. If time permits, the tool may also be extended to extract additional ontology terms from journal articles using a predictive algorithm. The tag information could also flow in the reverse direction (from the ontology to the journal) to suggest tags for articles, much as LinkedIn suggests a person's skills for you to endorse.
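
One way to picture both directions of this flow is a mapping from each term to the subject tags of the articles that mention it, which can then be used to vote on tags for a new article. The sketch below is a minimal version under stated assumptions: the article dictionaries with `"text"` and `"subjects"` keys, the function names, and the naive substring matching are all hypothetical simplifications, not the project's actual data model.

```python
from collections import Counter, defaultdict


def build_term_tag_graph(articles, terms):
    """Map each term to a Counter of subject tags from articles mentioning it.

    `articles` is an iterable of dicts with the assumed keys "text" and
    "subjects". Matching is a plain case-insensitive substring test.
    """
    graph = defaultdict(Counter)
    for article in articles:
        text = article["text"].lower()
        for term in terms:
            if term.lower() in text:
                graph[term].update(article["subjects"])
    return graph


def suggest_tags(text, graph, top_n=3):
    """Suggest tags for new text by summing tag counts of the terms it mentions."""
    votes = Counter()
    lowered = text.lower()
    for term, tag_counts in graph.items():
        if term.lower() in lowered:
            votes.update(tag_counts)
    return [tag for tag, _ in votes.most_common(top_n)]


# Made-up articles, for illustration only.
articles = [
    {"text": "We used PCR and gel electrophoresis.",
     "subjects": ["Molecular biology", "Genetics"]},
    {"text": "PCR amplification of the target gene.",
     "subjects": ["Genetics"]},
    {"text": "Samples were analyzed by mass spectrometry.",
     "subjects": ["Proteomics"]},
]
graph = build_term_tag_graph(articles, ["PCR", "mass spectrometry"])
suggestions = suggest_tags("A new PCR protocol.", graph)
print(suggestions)
```

Raw co-occurrence counts favor common tags; a real implementation would probably normalize them, but that choice is outside what this README specifies.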

The toolkit will be developed as the two subprojects (deliverables) below.

Deliverables

  1. A Python package to calculate usage statistics for a given list of terms in PLOS articles. For our test case, we will have two products:

    • a report of usage statistics for Riffyn’s 50,000+ resource term set
    • a JSON file associating PLOS subject area tags and terms from Riffyn's ontology
  2. A predictive model for resource terms in a research paper. Train a model on the PLOS library, using Riffyn’s ontologies, to identify additional terms to add to the term set.
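
For the first deliverable, a usage report could aggregate term counts across many papers and be serialized as JSON. The following is only a sketch: the report fields (`"occurrences"`, `"papers"`), the paper ids, and the function name `usage_report` are assumptions chosen for illustration, and the README does not define the real report format.

```python
import json
import re
from collections import Counter


def usage_report(papers, terms):
    """Return {term: {"occurrences": int, "papers": int}} across all papers.

    `papers` maps a hypothetical paper id to its plaintext content.
    """
    patterns = {
        term: re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
        for term in terms
    }
    occurrences = Counter()   # total hits per term
    paper_counts = Counter()  # number of papers with at least one hit
    for text in papers.values():
        for term, pattern in patterns.items():
            hits = len(pattern.findall(text))
            if hits:
                occurrences[term] += hits
                paper_counts[term] += 1
    return {
        term: {"occurrences": occurrences[term], "papers": paper_counts[term]}
        for term in terms
    }


# Made-up papers, for illustration only.
papers = {
    "paper-1": "HPLC was used twice: HPLC before and after filtration.",
    "paper-2": "No chromatography here, only a pipette.",
}
report = usage_report(papers, ["HPLC", "pipette", "autoclave"])
report_json = json.dumps(report, indent=2, sort_keys=True)
print(report_json)
```

The same `json.dumps` call could serialize a term-to-subject-tag mapping for the second product, provided the mapping uses plain dictionaries, lists, and numbers.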

Structure

  • bin -- code
  • config -- configuration files
  • data/papers -- plaintext paper content
  • data/terms -- term lists
  • docs -- further documentation
  • test -- reference files for testing
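
Given this layout, loading inputs might look like the sketch below. It assumes that term lists are `.txt` files with one term per line and that each paper is a single `.txt` file; the README does not state the file formats, so treat both as guesses. The helper names `load_terms` and `load_papers` are hypothetical, and the demonstration runs against a temporary copy of the directory layout rather than the real data.

```python
import tempfile
from pathlib import Path


def load_terms(terms_dir):
    """Read every *.txt file in terms_dir, assuming one term per line."""
    terms = set()
    for path in sorted(Path(terms_dir).glob("*.txt")):
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line:  # skip blank lines
                terms.add(line)
    return sorted(terms)


def load_papers(papers_dir):
    """Map each file stem in papers_dir to its plaintext content."""
    return {
        path.stem: path.read_text(encoding="utf-8")
        for path in sorted(Path(papers_dir).glob("*.txt"))
    }


# Build a throwaway data/terms and data/papers tree for the demonstration.
with tempfile.TemporaryDirectory() as root:
    terms_dir = Path(root, "data", "terms")
    papers_dir = Path(root, "data", "papers")
    terms_dir.mkdir(parents=True)
    papers_dir.mkdir(parents=True)
    (terms_dir / "equipment.txt").write_text(
        "centrifuge\n\npipette\n", encoding="utf-8"
    )
    (papers_dir / "example.txt").write_text(
        "A centrifuge was used.", encoding="utf-8"
    )
    terms = load_terms(terms_dir)
    papers = load_papers(papers_dir)

print(terms, papers)
```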

