CORD-19

Code and data for the CORD-19 Research Challenge on Kaggle: https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge/

  • Question n°1: Where and how to find relevant information?
    • Build: papers metadata file
    • Build: authors metadata file
    • Build: authors network
    • Build: paper network

Structure

Scrape

I scrape articles by source, using the paper URLs listed in the dataframe 'metadata.csv'. For now I only scrape papers from 'bioRxiv', since that is the only HTML page layout I have written scraping code for.

Scraped articles are saved as .json files in the data folder, grouped by source for clarity.
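
The selection and storage steps described above could be sketched as follows. This is a minimal illustration, not the repository's actual code: the column names `source_x` and `url`, the helper names `select_rows` and `save_article`, and the `data/<source>/<id>.json` layout are all assumptions. Downloading and parsing the bioRxiv HTML is deliberately left out.

```python
import csv
import io
import json
from pathlib import Path

# Assumed column names; the real 'metadata.csv' header may differ.
SOURCE_COL = "source_x"
URL_COL = "url"


def select_rows(csv_text, source="bioRxiv"):
    """Return the metadata rows for one source that also have a non-empty URL."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if row.get(SOURCE_COL) == source and row.get(URL_COL)
    ]


def save_article(article, source, article_id, data_dir="data"):
    """Write one scraped article to <data_dir>/<source>/<article_id>.json."""
    out_dir = Path(data_dir) / source
    out_dir.mkdir(parents=True, exist_ok=True)  # create the source folder if needed
    out_path = out_dir / f"{article_id}.json"
    out_path.write_text(
        json.dumps(article, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return out_path
```

With the real file, `select_rows(Path("metadata.csv").read_text(encoding="utf-8"))` would give the rows to scrape, and each parsed article would then be passed to `save_article`.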

Scraping and storage Ideas:

  • Scrape all sources
  • Manage all exceptions
  • Store articles and authors with Neo4j

Analysis

  • For each article, I want to extract keywords, key sentences and key paragraphs.
  • I want to get statistics for each article: number of authors, number of references and so on.
  • I want to compare articles with each other to find relevant terms, similarities and differences.
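
One simple way to approach the extraction and statistics goals above is plain word-frequency scoring. The sketch below is only an illustration under stated assumptions: the tiny stop-word list, the function names, and the `authors` / `references` keys of the article dictionary are hypothetical, and frequency counting stands in for whatever extraction method the project ends up using.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real run would need a fuller one.
STOP_WORDS = {
    "the", "a", "an", "of", "and", "in", "to", "is", "for", "on",
    "with", "that", "this", "are", "was",
}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def top_keywords(text, n=5):
    """Return the n most frequent non-stop-word terms."""
    return [word for word, _ in Counter(tokenize(text)).most_common(n)]


def key_sentences(text, n=2):
    """Score each sentence by the summed frequency of its words and
    return the n best-scoring sentences in their original order."""
    freq = Counter(tokenize(text))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: -sum(freq[w] for w in tokenize(sentences[i])),
    )
    return [sentences[i] for i in sorted(ranked[:n])]


def article_stats(article):
    """Basic counts for one article stored as a dictionary."""
    return {
        "n_authors": len(article.get("authors", [])),
        "n_references": len(article.get("references", [])),
    }
```

Key paragraphs could be ranked the same way as key sentences by splitting on blank lines instead of sentence boundaries.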

Analysis Ideas:

  • Build authors / co-authors network
  • Build article references network
  • Find isolated articles and articles that are highly connected
  • [!] Compare articles using word vectors in addition to network structure
  • Recommend article reading and in-depth human analysis
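
A few of these ideas can be prototyped with the standard library before moving to a graph database. The sketch below assumes each article is a dictionary with hypothetical `id`, `authors` and `references` keys, where `references` holds ids of other articles. The similarity function uses bag-of-words counts as a stand-in for real word vectors, so it only illustrates the comparison step.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def coauthor_graph(articles):
    """Map each author to the set of authors they share an article with."""
    graph = defaultdict(set)
    for article in articles:
        authors = sorted(set(article.get("authors", [])))
        for author in authors:
            graph[author]  # touch the key so solo authors still get a node
        for a, b in combinations(authors, 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def reference_degrees(articles):
    """Count, per article id, the citation links (citing or cited)
    that connect it to other articles in the same collection."""
    ids = {a["id"] for a in articles}
    degree = {i: 0 for i in ids}
    for article in articles:
        for ref in set(article.get("references", [])):
            if ref in ids and ref != article["id"]:
                degree[article["id"]] += 1
                degree[ref] += 1
    return degree


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0
```

Articles whose degree is 0 are isolated within the collection, and sorting by degree in descending order surfaces the highly connected ones.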

Visualization

  • I want to visualize each article in HTML format in the browser.
  • I want to visualize keywords, sentences and paragraphs with highlighting.
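
A minimal sketch of the highlighting step, assuming an article is a dictionary with hypothetical `title` and `paragraphs` keys and that the keywords were extracted beforehand. The function names are illustrative. Text is HTML-escaped piece by piece so that article content cannot inject markup, and matches are wrapped in `<mark>` tags, which browsers render as highlighted text.

```python
import html
import re


def highlight_keywords(text, keywords):
    """Return HTML-escaped text with whole-word keyword matches in <mark> tags."""
    keywords = [k for k in keywords if k]
    if not keywords:
        return html.escape(text)
    # Longest keywords first so multi-word terms win over their prefixes.
    alternatives = "|".join(
        re.escape(k) for k in sorted(keywords, key=len, reverse=True)
    )
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    parts, last = [], 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[last:match.start()]))
        parts.append(f"<mark>{html.escape(match.group(0))}</mark>")
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


def article_to_html(article, keywords):
    """Render one article as a standalone HTML page with highlighted keywords."""
    title = html.escape(article.get("title", ""))
    body = "\n".join(
        f"<p>{highlight_keywords(p, keywords)}</p>"
        for p in article.get("paragraphs", [])
    )
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{title}</title></head>\n'
        f"<body><h1>{title}</h1>\n{body}\n</body></html>"
    )
```

Writing the returned string to a `.html` file and opening it in the browser is enough to view a single article.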

Visualization Ideas:

  • Build a research article navigator: display the list of available articles and choose which ones to view.
  • View article statistics and content.
  • Click on author to display author information
  • Click on reference (if data available) to open reference in new window
  • Click on highlighted term to see associated information: similar articles, etc.
  • Visualize references network
  • Visualize authors network
