Giter Site home page Giter Site logo

arctic-papers's Introduction

Dataset of Arctic-related papers

The dataset contains two main graphs:

  • A bipartite graph of authors and papers
  • A tripartite graph of authors, papers, and institutions. This is the main graph. To operate over 'pairs' of nodesets, it is only necessary to remove nodes of category 'paper', 'inst' or 'author'.

The code to ceate the datasets is in scratch/. Their functions are the following:

  • scratch/scrapePapers.py - It scrapes all papers containing the word 'arctic' in their title, keywords or abstract, every year back to 1945.
  • scratch/scrapeIssn.py - Takes in a list of ISSNs (for instance in data/issn.txt) and downloads data about their publication
  • scratch/makeAuthorPaperGraph.py - It takes in data from the two previous scripts (check first few lines), and outputs a bipartite graph of authors and papers
  • scratch/makeTripartite.py - It takes in data from the two previous scripts (check first few lines), and outputs a tripartite graph that includes papers, authors, and institutions.

Notes on what the data files are

  • data/acia.csv is a list of author ids that we want to mark in the graph.
  • data/paperData.json is the list of paper information as scraped from the SCOPUS database.
  • data/scopusRefined.json is a refined version of data/paperData.json. The paper data is spread into several dictionaries, due to how openrefine works.

Graph conventions

Some data from Scopus was 'corrupted', so there will be a few papers without publication, author or affiliation information.

Node ids capture what entity the node represents by their first letter: an author("a"), paper("p") or institution("i"). For instance, node "a7563051" would represent an author, node "i7563957" would represent an institution, and node "p67865" would represent a paper.

Also, in tripartite.graphml, the nodes contain a category field which has a value of 'author','paper', or 'inst', for each entity.

Stats for tripartite graph and derivative graphs

50189 papers. 78901 authors. 13150 institutions.

The author-publication graph has 129090 nodes, 7916 connected components.

Their sizes: [95637, 295, 102, 101, 96, 77, 64, 55, 47, 46], etc.

50639 authors have ONE publication.

11646 authors have TWO publications.

5196 authors have THREE publications.

11420 authors have MORE THAN THREE publications.

8491 papers have ONE author.

10458 papers have TWO authors.

9397 papers have THREE authors.

21843 papers have MORE THAN THREE authors.

The author-institution graph has 92051 nodes, 7493 connected components.

Their sizes: [73729, 67, 53, 47, 45, 38, 36, 35, 35, 29], etc.

58527 authors have ONE institutions.

12546 authors have TWO instututions.

3370 authors have THREE institutions.

4458 authors have MORE THAN THREE institutions.

6850 institutions have ONE author.

1942 institutions have TWO authors.

1022 institutions have THREE authors.

3336 institutions have MORE THAN THREE authors.

arctic-papers's People

Contributors

pabloem avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.