Analysing acronyms in PubMed data

R code to read and analyse data to examine the use of acronyms in published papers over time. The analysis examines titles and abstracts published on PubMed up until 2019. Our definition of acronym includes initialisms and abbreviations.

The folder animation contains animations of the top ten acronyms per year over time in titles and abstracts.

The folder data contains the following data on acronyms and the meta-data on papers:

  • titles[x].rds meta-data on the 24,873,372 included titles in rds format
  • titles_sample.txt a random sample of 1,000 included titles in tab-delimited format
  • abstracts[x].rds meta-data on the 18,249,091 included abstracts in rds format
  • abstracts_sample.txt a random sample of 1,000 included abstracts in tab-delimited format
  • acronyms[x].rds the 139,959,947 acronyms
  • acronyms_sample.txt a random sample of acronyms from 1,000 papers in tab-delimited format

The data are very large and hence have been split into multiple files. For the tab-delimited files I've given a random sample as an easily accessible taster of the data.
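
Because the rds data are split across several files, they need to be stacked back together before analysis. The following is a minimal sketch of one way to do that, not the repository's own code. It assumes that the [x] in names such as titles[x].rds is a numeric index (e.g., titles1.rds, titles2.rds) and that each file holds a data frame with the same columns. The helper name read_split_rds and the demo files are hypothetical.

```r
# Sketch: combine split .rds files into one data frame.
# Assumption: files are named <prefix><number>.rds and each holds a data frame.

# Hypothetical helper: read every matching file in a folder and stack the rows
read_split_rds <- function(folder, prefix) {
  pattern <- paste0("^", prefix, "[0-9]+\\.rds$")
  files <- sort(list.files(folder, pattern = pattern, full.names = TRUE))
  if (length(files) == 0) stop("No files found for prefix: ", prefix)
  parts <- lapply(files, readRDS)  # one data frame per file
  do.call(rbind, parts)            # row order follows the sorted file names
}

# Demo with two tiny made-up files in a temporary folder
demo_dir <- file.path(tempdir(), "acronyms_demo")
dir.create(demo_dir, showWarnings = FALSE)
part1 <- data.frame(pmid = c(1L, 2L), n.words = c(12L, 9L))
part2 <- data.frame(pmid = 3L, n.words = 15L)
saveRDS(part1, file.path(demo_dir, "titles1.rds"))
saveRDS(part2, file.path(demo_dir, "titles2.rds"))

titles <- read_split_rds(demo_dir, "titles")
print(nrow(titles))
```

With the real files the combined data frame has millions of rows, so it may be worth reading only the parts that are needed, or starting with the tab-delimited samples.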

The data were sourced directly from PubMed in XML format (available here), hosted by the National Library of Medicine. The data here do not reflect the most current or accurate data available from the National Library of Medicine. The data were downloaded between 14 and 22 April 2020.

The variables in titles[x].rds, titles_sample.txt, abstracts[x].rds and abstracts_sample.txt are:

  • pmid PubMed ID number
  • date date published on PubMed
  • type article type, e.g., "Journal Article" or "Editorial"
  • jabbrv journal abbreviation, e.g., "Biochem Med"
  • n.authors number of authors
  • n.words number of words in the title or abstract

The variables in acronyms_sample.txt and acronyms[x].rds are:

  • pmid PubMed ID number
  • acronyms the acronym (e.g., "HIV")
  • nchar the number of characters in the acronym
  • source 'Title' or 'Abstract'
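
Since pmid appears in both sets of files, the acronyms can be linked to the paper meta-data. The sketch below shows one possible way to count the most common acronyms per year using base R. It is an illustration under stated assumptions, not the code used for the animations: the rows are made up, date is assumed to be stored as (or convertible to) a Date, and the helper name top_n_per_year is hypothetical.

```r
# Made-up rows using the variable names listed above
papers <- data.frame(
  pmid = c(1L, 2L, 3L),
  date = as.Date(c("2018-03-01", "2018-07-15", "2019-01-20"))  # assumed Date class
)
acronyms <- data.frame(
  pmid = c(1L, 1L, 2L, 3L, 3L),
  acronyms = c("HIV", "DNA", "HIV", "DNA", "RNA"),
  source = c("Title", "Abstract", "Title", "Title", "Abstract"),
  stringsAsFactors = FALSE
)
acronyms$nchar <- nchar(acronyms$acronyms)

# Attach each acronym to its paper's publication date via pmid
merged <- merge(acronyms, papers, by = "pmid")
merged$year <- as.integer(format(merged$date, "%Y"))

# Count how often each acronym appears in each year
counts <- as.data.frame(table(year = merged$year, acronym = merged$acronyms),
                        stringsAsFactors = FALSE)
counts <- counts[counts$Freq > 0, ]
counts$year <- as.integer(counts$year)

# Hypothetical helper: keep the n most frequent acronyms within each year
# (ties are broken alphabetically)
top_n_per_year <- function(counts, n = 10) {
  ordered <- counts[order(counts$year, -counts$Freq, counts$acronym), ]
  by_year <- split(ordered, ordered$year)
  out <- do.call(rbind, lapply(by_year, head, n = n))
  rownames(out) <- NULL
  out
}

top <- top_n_per_year(counts, n = 1)
print(top)
```

To restrict the counts to titles or to abstracts, filter on source before merging, for example acronyms[acronyms$source == "Title", ].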

The acronyms used above are:

  • RDS = R data source???
  • XML = Extensible Markup Language
