ResourceMiner

Getting started

Look through the issues here in the GitHub repo and the wiki.
Check out these useful resources
Fork the repo and start coding!

See also the Details section below. Papers and results can be accessed through our MongoDB server:

mongo ec2-52-26-49-156.us-west-2.compute.amazonaws.com/plos -u mozsprint -p plos.

Papers are stored in collections with the journal names. The results (term-subject matches) are in the term_subject collection.
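The same data can also be read from Python. The following is a minimal sketch, assuming the pymongo driver is installed and that the server accepts a standard mongodb:// URI. The helper names build_mongo_uri and fetch_term_subject_matches are hypothetical and not part of this repo, and the document layout inside term_subject is not described here, so the sketch returns raw documents only.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host, database, user, password):
    """Build a standard mongodb:// connection URI, escaping the credentials."""
    return "mongodb://{}:{}@{}/{}".format(
        quote_plus(user), quote_plus(password), host, database
    )


def fetch_term_subject_matches(uri, database="plos", limit=5):
    """Return the first few documents of the term_subject collection."""
    # Imported lazily so the URI helper works without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        # term_subject is the collection name given above; its fields are
        # not documented here, so no projection or filter is assumed.
        return list(client[database]["term_subject"].find().limit(limit))
    finally:
        client.close()
```

To use it, pass the host, database, user, and password from the mongo command above to build_mongo_uri, then hand the resulting URI to fetch_term_subject_matches. Papers can be read the same way by replacing term_subject with a journal-name collection.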

About Riffyn

Billions of dollars are lost each year on materials and life science research with reproducibility rates of 10%, and on multi-million dollar failures during process transfer to manufacturing. Riffyn addresses this problem with a cloud-based operating system for laboratory R&D processes. It combines computer-aided design of structured protocols (including controlled vocabularies), real-time data acquisition, and automated statistical quality analytics.

Details

We would like to develop a natural language processing toolkit for extracting controlled vocabularies (ontologies) of research resources used in the scientific literature, and organizing them into an associative knowledge graph. The tool will report on the frequency and context of use of terms that are fed to it from Riffyn’s (or any) ontology of research resources.
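As an illustration of what "frequency and context of use" could mean in practice, here is a minimal sketch that counts whole-word, case-insensitive matches of each term in a text and keeps a short snippet around every hit. The function name term_usage, the window parameter, and the matching rule are assumptions made for this example, not the toolkit's actual design.

```python
import re


def term_usage(text, terms, window=30):
    """Count each term in text and collect a snippet of context around each hit."""
    usage = {}
    for term in terms:
        # Whole-word, case-insensitive match; re.escape keeps punctuation literal.
        pattern = re.compile(
            r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE
        )
        contexts = []
        for match in pattern.finditer(text):
            # Keep up to `window` characters on each side of the match.
            start = max(0, match.start() - window)
            end = min(len(text), match.end() + window)
            contexts.append(text[start:end].strip())
        usage[term] = {"count": len(contexts), "contexts": contexts}
    return usage
```

Terms that never occur are still reported, with a count of zero, so the output covers the full input term list.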

To build the knowledge graph, the tool will annotate the ontology terms with keywords and subject area tags obtained from article annotations in PLOS and other journals. If time permits, the tool may also be extended to extract additional ontology terms from journal articles using a predictive algorithm. The tag information could also be engineered to flow in the reverse direction (from the ontology to the journal) to suggest tags for articles, analogous to when LinkedIn suggests skills of a person for you to endorse.
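The two directions of tag flow described above can be sketched as follows. This is a minimal example under stated assumptions: each article is a dict with hypothetical "text" and "subjects" keys, terms are matched by plain substring search, and the function names build_term_subject_graph and suggest_subjects are invented for illustration.

```python
from collections import Counter, defaultdict


def build_term_subject_graph(articles, terms):
    """Map each term to a Counter of subject tags from articles that mention it."""
    graph = defaultdict(Counter)
    for article in articles:
        text = article["text"].lower()
        for term in terms:
            # Simple substring match, for illustration only.
            if term.lower() in text:
                graph[term].update(article["subjects"])
    return graph


def suggest_subjects(text, graph, top_n=3):
    """Suggest subject tags for new text, based on the terms it contains."""
    scores = Counter()
    lowered = text.lower()
    for term, subjects in graph.items():
        if term.lower() in lowered:
            # Each matching term votes with the tags it was seen alongside.
            scores.update(subjects)
    return [subject for subject, _ in scores.most_common(top_n)]
```

The first function annotates ontology terms with journal subject tags; the second runs the association in reverse, from ontology terms found in an article to candidate tags for that article.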

The toolkit will be developed as the two subprojects (deliverables) listed below.

Deliverables

1. A Python package to calculate usage statistics in PLOS articles for a given list of terms. For our test case, we will have two products:
   - a report of usage statistics for Riffyn’s 50,000+ resource term set
   - a JSON file associating PLOS subject area tags and terms from Riffyn’s ontology
2. A predictive model for resource terms in a research paper. Train the model on the PLOS library using Riffyn’s ontologies, with the goal of identifying additional terms to add to the term set.
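A corpus-level usage report for the first deliverable might look like the sketch below. It assumes papers are available as a mapping from a paper identifier to plaintext, and it reports, per term, the total number of occurrences and the number of papers that mention the term. The names corpus_usage_stats and to_json_report and the output fields are hypothetical.

```python
import json
import re


def corpus_usage_stats(papers, terms):
    """Return per-term totals: occurrences and number of papers mentioning the term."""
    stats = {term: {"occurrences": 0, "papers": 0} for term in terms}
    for text in papers.values():
        for term in terms:
            # Whole-word, case-insensitive match on the literal term.
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
            if hits:
                stats[term]["occurrences"] += hits
                stats[term]["papers"] += 1
    return stats


def to_json_report(stats):
    """Serialize the statistics as indented JSON with stable key order."""
    return json.dumps(stats, indent=2, sort_keys=True)
```

Scanning every paper once per term is fine for a small test list but would be slow for a term set of 50,000+ entries; a real implementation would likely tokenize each paper once or use a multi-pattern matcher.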

Structure

bin -- code
config -- configuration files
data/papers -- plaintext paper content
data/terms -- term lists
docs -- further documentation
test -- reference files for testing
