
cambridgechemistryworkshopsep2015's Introduction

Automatic validation and extraction of data from publications in Chemical and Materials Sciences

Workshop at the Department of Chemistry, University of Cambridge

Register [here](http://www.eventbrite.co.uk/e/contentmine-chemistry-hack-tickets-18534620549) (registration is FREE, places limited to 25)



Location: U202, Department of Chemistry, Lensfield Road CB2 1EW

Dates: 18-19 September 2015

| 18 September 2015 | 19 September 2015 |
| --- | --- |
| Training Workshop & Publisher Panel Session | Hackday |
| 9:00 - 18:00 | 10:00 - 17:00 |

Contact us via [@TheContentMine](https://twitter.com/TheContentMine) or [email protected]

Trainers:

Please read the [Pre-workshop Installation Instructions](https://github.com/ContentMine/vms/blob/master/installation_intructions.md)

We would also appreciate your feedback

Workshop Purpose

Ever found that the key data you want is published in a text-based PDF journal?

  • ...found yourself manually downloading 100 papers click-by-click?
  • ...redrawing structures/spectra/graphs so you can recompute/analyze them?
  • ...retyping data from tables?
  • ...wishing that a computer could do the really boring discovery and retrieval of the data in the literature?

We all have. But new approaches are solving these problems. That's why Content-Mining (aka text-and-data mining, TDM) is one of the most exciting areas in scientific data. It has even been intensively debated in the European Parliament and Commission. The UK is leading the way with new exemptions from copyright, which makes universities like Cambridge ideal places to learn and develop the new techniques.

The workshop will bring together:

  • scientists with a need to discover data, especially in chemistry, materials, molecular bioscience - both experimental and computational
  • scientific publishers
  • library staff
  • technology developers.

We'll show how Open software can be used to

  • crawl the literature effectively using search APIs
  • scrape all the content from publisher web pages (supplemental data, structures)
  • normalize PDFs into semantic HTML
  • run search plugins to discover particular.
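
The last two steps in the list above (normalizing to HTML, then searching it) can be illustrated with a minimal sketch. This is not the ContentMine software: `TextExtractor`, `html_to_text`, `count_terms`, the sample HTML and the term list are all hypothetical stand-ins, assuming the paper has already been normalized to HTML. It uses only the Python standard library.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML document, skipping script/style."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Return the document text with all whitespace runs collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def count_terms(text, terms):
    """Count case-insensitive whole-word hits for each dictionary term."""
    counts = Counter()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Hypothetical sample standing in for a normalized paper.
SAMPLE_HTML = """
<html><body>
<h1>Synthesis of a copper complex</h1>
<p>The product was recrystallised from ethanol and dried.</p>
<p>Ethanol was removed under vacuum; toluene was not used.</p>
</body></html>
"""

if __name__ == "__main__":
    text = html_to_text(SAMPLE_HTML)
    print(count_terms(text, ["ethanol", "toluene", "acetone"]))
```

A dictionary lookup like this is the simplest possible "search plugin"; it finds exact terms only and knows nothing about synonyms or chemical structure.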

The first day will include overviews, installation of technology [1], a hands-on introduction, and a panel of experts drawn from the participants to discuss policy and practice. The second day will be a project-based hack where small groups will tackle their own communal problems. The event is sponsored by the EPSRC-IAA Knowledge Transfer Fund of the Chemistry Department. Facilitators are from Chemistry and Plant Sciences. Coffee, lunches and a Friday dinner are provided.

[1] All essential technology is Open and comes from contentmine.org, an Open project funded by the Shuttleworth Foundation.

Training Workshop and Publisher Panel Session Agenda

| Times | Session |
| --- | --- |
| 9:00 | Introductions |
| 9:15 | What is content mining?<br>• Overview presentation from ContentMine staff |
| 9:30 | Think like a content miner<br>• Hands-on activity facilitated by ContentMine staff introducing entity extraction techniques, precision and recall. |
| | Scraping and the anatomy of scrapers |
| 11:00 | Preparations for panel discussion with publishers |
| 12:30 | Lunch |
| 13:30 | Publishers Q&A |
| 15:30 | Tea time |
| 16:00 | Entity recognition using AMI<br>• Hands-on activity facilitated by ContentMine staff including extracting species names from OA papers using AMI-species. |
| 18:00 onwards | Informal social event (dinner)<br>• Move as a group to nearby pub or late opening cafe (to discuss hackday projects).<br>Reservation to be confirmed at Browns from 18:00 onwards. |
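
The agenda mentions entity extraction, precision and recall, and extracting species names. As a rough illustration of how these fit together, here is a minimal sketch. It does not use AMI-species: the regular expression is a deliberately naive stand-in, and `BINOMIAL`, `extract_candidates`, `precision_recall`, the sample text and the hand-labelled gold set are all hypothetical. Precision is the fraction of extracted names that are correct, TP / (TP + FP); recall is the fraction of correct names that were extracted, TP / (TP + FN).

```python
import re

# Naive pattern for Latin binomials: a capitalised word followed by a
# lowercase word of three or more letters. An assumption for illustration only.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")


def extract_candidates(text):
    """Return the set of 'Genus species' candidates matched by the pattern."""
    return {f"{genus} {species}" for genus, species in BINOMIAL.findall(text)}


def precision_recall(predicted, gold):
    """Compute precision and recall of a predicted set against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical sample text and hand-labelled answers.
TEXT = (
    "Samples of Escherichia coli and Arabidopsis thaliana were compared. "
    "These results differ from those reported for Homo sapiens "
    "and D. melanogaster."
)
GOLD = {
    "Escherichia coli",
    "Arabidopsis thaliana",
    "Homo sapiens",
    "D. melanogaster",
}

if __name__ == "__main__":
    predicted = extract_candidates(TEXT)
    precision, recall = precision_recall(predicted, GOLD)
    print(sorted(predicted))
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

On this sample the pattern wrongly accepts "These results" (a false positive, which lowers precision) and misses the abbreviated "D. melanogaster" (a false negative, which lowers recall), so both scores come out at 0.75.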

Workshop Hackday Agenda

| Times | Session |
| --- | --- |
| 10:00 | **Hacking in teams working on AMICHEM, Chemical tagger,...** |
| 12:30 | Lunch |
| 13:30 | **Hacking in teams working on AMICHEM, Chemical tagger,...** |
| 15:30 | Coffee Break |
| 16:00 | Presentation of hackday projects<br>• Presentations delivered by participants, including future scope for development of their projects. |
| 16:30 | Panel discussion on accelerating uptake of content mining.<br>• Panel and Q&A with audience including workshop participants. |
| 17:00 | Event close |

Intended Audience

This two-day event is intended for researchers or research-related staff who are not currently heavily involved in text and data mining but have at least some pre-existing computational skills. At a minimum, we expect familiarity with a command line interface and basic coding abilities in some language.

Click here to be advised of future ContentMine Workshops

Contributors: chreman, grahamsteel, jbr36, jcmolloy, petermr
