retriever_app

Welcome to the retrieverApp! This is an open source tool used for publication progress reporting for large collaboration networks, consortia, and individual investigators alike.

This is a Python-based application that automatically gathers data, primarily from the NCBI E-utilities, and generates an HTML file that summarizes this data. It exclusively queries publications in the PubMed database and gathers the associated data from the GEO, SRA, dbGaP, and clinical trials databases.

How to install:

Clone the GitHub repository locally and change into the working directory:

$ git clone https://github.com/philippadoherty/retriever_app.git
$ cd retriever_app

Install the dependencies using pip:

$ pip install .

After installation, you should be able to access two command-line tools: retriever_get and retriever_refresh.

Note: We recommend using a venv (current testing is with Python version ~3.9).

Additional setup:

To use this application you need an NCBI API key, which you can get by creating a free NCBI account.

To get a key, go to NCBI and create an account or log in.

Then go to Account Settings and scroll down to API Key Management to get your key.

Run the following commands to store your variables:

$ export [email protected]
$ export NCBI_API_KEY=your_api_key_here
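Before starting a long retrieval it can help to confirm that the key is actually visible to Python. The following is a minimal sketch, not part of retriever_app itself: the helper name `load_ncbi_api_key` is hypothetical, and the only thing it assumes from this README is the `NCBI_API_KEY` variable name.

```python
import os


def load_ncbi_api_key(environ=None):
    """Return the NCBI API key from the environment, or raise a clear error.

    `environ` defaults to os.environ; passing a plain dict makes the
    function easy to test. This helper is illustrative only.
    """
    env = os.environ if environ is None else environ
    key = env.get("NCBI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "NCBI_API_KEY is not set; export it before running retriever_get."
        )
    return key
```

Calling `load_ncbi_api_key()` in a Python shell opened from the same terminal session shows whether the `export` took effect.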

Usage:

  1. Go to your working directory and create a text file with your list of grants (or edit the sample file, example_grants.txt).

  2. Run the following command to retrieve your data, specifying the text file that contains your grant list:

$ retriever_get -grants example_grants.txt

  3. Open the HTML file, retriever_app.html, in a browser to see the data summaries and tables.

To edit your data:

  1. Edit the Excel files: navigate to the sheets_for_editing folder and edit the Excel sheet corresponding to the tab on the HTML page that you want to change. To change which data is displayed, change the value in the last column, display, from y to n. Save the Excel sheet.

  2. To apply these changes, run retriever_refresh -f file_name, where file_name is the file that has been modified.

Valid -f options are:

  • data_catalog,
  • pub_cite,
  • clinical_trials,
  • software_catalog, or
  • dbgap_data.

Specify the file name without the file extension:

$ retriever_refresh -f data_catalog
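If you script the refresh step, a small wrapper can catch a mistyped or extension-bearing file name before the tool runs. This is a sketch under stated assumptions: `refresh_command` is a hypothetical helper, and the list of valid names is copied from the -f options above rather than read from the application.

```python
from pathlib import Path

# Option names copied from the list of -f options in this README.
VALID_REFRESH_TARGETS = (
    "data_catalog",
    "pub_cite",
    "clinical_trials",
    "software_catalog",
    "dbgap_data",
)


def refresh_command(name):
    """Build the argument list for retriever_refresh.

    Accepts either a bare option name or a file name with an extension,
    and strips the extension because -f expects the name without it.
    """
    stem = Path(name).stem
    if stem not in VALID_REFRESH_TARGETS:
        raise ValueError(
            f"{name!r} is not one of: {', '.join(VALID_REFRESH_TARGETS)}"
        )
    return ["retriever_refresh", "-f", stem]
```

The returned list can be passed to `subprocess.run(refresh_command("data_catalog.xlsx"), check=True)` from the directory where you ran retriever_get.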

Detailed description and more usage tips to come.

Tips:

  1. JSON files are not intended to be edited directly.
  2. You can edit or fill in any other columns in the Excel sheets and apply those changes.

Explanation of how we gather the data:

  1. Query PubMed by grant and pull all publications (PubMed IDs) associated with these grants.
  2. Query Entrez ELink to find PMID:GEO and PMID:SRA links, then query the GEO and SRA databases.
  3. Query Entrez ELink for PMID:PMCID links, then query PMC for available text.
  4. Perform regex text searches for potential matches of clinical trial NCT IDs, dbGaP accession numbers, and GitHub repositories.
  5. Based on the potential matches, query the clinical trials API, query the GitHub API, and web-scrape dbGaP for study metadata.
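The steps above can be sketched roughly as follows. This is an illustration, not the application's actual code: the function names, the `retmax` default, and the three regular expressions are assumptions based on the public E-utilities URL scheme and the usual identifier formats (NCT followed by eight digits, phs followed by six digits, github.com/owner/repo). The patterns retriever_app really uses may differ.

```python
import re
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(grant, api_key, retmax=200):
    """Step 1: URL that asks PubMed for PMIDs tagged with a grant number."""
    params = {
        "db": "pubmed",
        "term": f"{grant}[Grant Number]",
        "retmode": "json",
        "retmax": retmax,
        "api_key": api_key,
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def elink_url(pmid, target_db, api_key):
    """Steps 2-3: URL that links a PMID to another Entrez database.

    target_db would be "gds" (GEO DataSets), "sra", or "pmc".
    """
    params = {
        "dbfrom": "pubmed",
        "db": target_db,
        "id": pmid,
        "retmode": "json",
        "api_key": api_key,
    }
    return f"{EUTILS_BASE}/elink.fcgi?{urlencode(params)}"


# Step 4: assumed patterns for the identifiers searched in article text.
NCT_RE = re.compile(r"\bNCT\d{8}\b")
DBGAP_RE = re.compile(r"\bphs\d{6}(?:\.v\d+\.p\d+)?\b")
GITHUB_RE = re.compile(r"github\.com/([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)")


def find_candidates(text):
    """Return de-duplicated, sorted candidate identifiers found in text."""
    return {
        "nct_ids": sorted(set(NCT_RE.findall(text))),
        "dbgap": sorted(set(DBGAP_RE.findall(text))),
        # Trim a trailing period picked up from the end of a sentence.
        "github": sorted(
            {f"{owner}/{repo.rstrip('.')}" for owner, repo in GITHUB_RE.findall(text)}
        ),
    }
```

The matches are only candidates, which is why step 5 confirms each one against the corresponding API or study page before it is reported.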

Contributors: philippadoherty, taoliu
