
metadata_sepsis's Introduction

The completeness of metadata accompanying omics studies

Preprint Available

This project contains links to the datasets and figures used in our study, "Improving the completeness of public metadata accompanying omics studies".

Table of contents

Datasets

Extraction of metadata

We examined a total of 3,125 samples across 29 studies. We manually surveyed the original journal publications to gather information about the nine clinical phenotypes in question, and we personally contacted the authors who own the data to obtain the complete data analyzed in each study. To extract metadata from the public repository, we used two Python scripts, which are available here. The first script retrieves the files from the repository in XML format; the second extracts the information from each XML file into a CSV file. The summary files from the repository can be found here, and the data summarized from the original publications can be found here.
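The XML-to-CSV step described above can be sketched as follows. This is a minimal illustration, not the project's actual script: it assumes the repository files follow NCBI GEO's MINiML layout (a `Sample` element with an `iid` attribute and nested `Characteristics` elements carrying a `tag` attribute), and the namespace URI, function names, and embedded example record are all assumptions made for the illustration. The download step is omitted.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed MINiML namespace; adjust if the downloaded files declare another one.
NS = {"m": "http://www.ncbi.nlm.nih.gov/geo/info/MINiML"}


def samples_to_rows(xml_text):
    """Return one dict per Sample, mapping each Characteristics tag to its text."""
    root = ET.fromstring(xml_text)
    rows = []
    for sample in root.findall("m:Sample", NS):
        row = {"sample_id": sample.get("iid", "")}
        for ch in sample.findall(".//m:Characteristics", NS):
            # Untagged characteristics fall back to a generic column name.
            tag = (ch.get("tag") or "characteristics").strip()
            row[tag] = (ch.text or "").strip()
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text, using the union of all keys as columns."""
    extra = sorted({key for row in rows for key in row} - {"sample_id"})
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["sample_id"] + extra, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical two-sample record, made up for illustration only.
EXAMPLE_XML = """<MINiML xmlns="http://www.ncbi.nlm.nih.gov/geo/info/MINiML">
  <Sample iid="GSM0000001">
    <Channel position="1">
      <Characteristics tag="age">54</Characteristics>
      <Characteristics tag="gender">female</Characteristics>
    </Channel>
  </Sample>
  <Sample iid="GSM0000002">
    <Channel position="1">
      <Characteristics tag="age">61</Characteristics>
    </Channel>
  </Sample>
</MINiML>"""

if __name__ == "__main__":
    print(rows_to_csv(samples_to_rows(EXAMPLE_XML)))
```

Missing phenotypes are left as empty cells rather than dropped, so an unreported field stays visible in the CSV as a gap.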

Description of metadata accompanying sepsis studies

Four CSV files were used to produce the results of the analysis.

  • sepsis_clinical_phenotypes.csv contains the number of times each clinical phenotype was reported on each platform (the publication and the public repository). The total for a phenotype is the sum across the two platforms, and it is also expressed as a percentage.

  • sepsis_comparison.csv reports, for each cohort, the number of clinical phenotypes reported on each platform. Nine clinical phenotypes were considered, and the count is expressed as a percentage of that total (all nine being reported corresponds to 100%).

  • sepsis_completeness.csv was used to identify the most and least complete cohorts. For each cohort, the number of clinical phenotypes reported in the publication and in the public repository was counted and summed, and the total was expressed as a percentage.

  • sepsis_individual_phenonotypes contains the data used to identify the individual phenotypes with the largest and smallest discrepancies between the two platforms.
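The percentage arithmetic behind these files can be illustrated with a short sketch. The function names are hypothetical and the inputs are made-up counts, not values from the study. The per-platform calculation follows the rule stated above (nine of nine phenotypes is 100%). The combined figure divides the summed count by eighteen, which is only one reading of "counted, summed and expressed as a percentage" and is an assumption here.

```python
TOTAL_PHENOTYPES = 9  # the study considered nine clinical phenotypes


def completeness_percent(n_reported, total=TOTAL_PHENOTYPES):
    """Percentage of phenotypes reported on one platform for one cohort."""
    if not 0 <= n_reported <= total:
        raise ValueError("n_reported must be between 0 and total")
    return 100.0 * n_reported / total


def combined_completeness_percent(n_publication, n_repository, total=TOTAL_PHENOTYPES):
    """Summed count over both platforms as a percentage.

    Assumption: the denominator is 2 * total, i.e. nine phenotypes on each
    of the two platforms. The source does not state the denominator.
    """
    return 100.0 * (n_publication + n_repository) / (2 * total)


if __name__ == "__main__":
    # Hypothetical cohort: 7 phenotypes in the publication, 3 in the repository.
    print(round(completeness_percent(7), 1))
    print(round(completeness_percent(3), 1))
    print(round(combined_completeness_percent(7, 3), 1))
```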

Reproducing results

We have prepared Jupyter Notebooks that use the data described above to visualize and reproduce the results presented in our editorial.

Acknowledgements

We specifically thank Jeremy Rotman for helping to write the two Python scripts that extract metadata from NCBI GEO. We also thank Henry Fu for his help with the initial manual work of going through publications to collect data.

Contact

Please do not hesitate to contact us ([email protected], [email protected]) if you have any comments, suggestions, or clarification requests regarding the study or if you would like to contribute to the extended analysis involving more disease conditions.

