table_extractor

Code and data used in the paper "A Machine Learning Approach to Zeolite Synthesis Enabled by Automatic Literature Data Extraction".

There are two main components to this repository:

  1. table_extractor code
  2. zeolite synthesis data

1. Table Extraction Code

This code extracts tables from HTML/XML files into JSON format. The researcher must supply the HTML/XML files. The code is written in Python 3. To run the code:

  1. Fork this repository
  2. Download the Olivetti group materials science FastText word embeddings
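  3. Install all dependencies
    • pandas, spacy, bs4 (installed as beautifulsoup4), gensim, numpy, unidecode, sklearn (installed as scikit-learn), scipy
    • json and traceback are part of the Python standard library and need no separate installation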
  4. Place all files in tableextractor/data
  5. Use Jupyter (Table Extractor Tutorial) to run the code

The code takes in a list of files and corresponding DOIs and returns a list of all tables extracted from the files as JSON objects. Currently, the code supports files from ACS, APS, Elsevier, Wiley, Springer, and RSC.
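
As a rough illustration of the input and output shape described above, the sketch below pulls `<table>` elements out of an HTML string and pairs each one with a DOI. It is not the repository's API: the function name `extract_tables`, the output keys (`doi`, `table_index`, `rows`), and the sample DOI are all hypothetical, and it uses only Python's standard `html.parser` rather than the dependencies listed above. It also ignores nested tables and publisher-specific markup.

```python
import json
from html.parser import HTMLParser


class _TableParser(HTMLParser):
    """Collect the cell text of every <table> (nested tables not handled)."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables, each a list of rows
        self._rows = None  # rows of the table being read, or None
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._rows.append(self._row)
            self._row = None
        elif tag == "table" and self._rows is not None:
            self.tables.append(self._rows)
            self._rows = None


def extract_tables(html, doi):
    """Hypothetical helper: return one JSON-serializable dict per table."""
    parser = _TableParser()
    parser.feed(html)
    parser.close()
    return [
        {"doi": doi, "table_index": i, "rows": rows}
        for i, rows in enumerate(parser.tables)
    ]


sample_html = """
<table>
  <tr><th>Sample</th><th>Temp (C)</th></tr>
  <tr><td>A</td><td>150</td></tr>
</table>
"""
# The DOI below is a placeholder, not a real article.
tables = extract_tables(sample_html, doi="10.0000/hypothetical")
print(json.dumps(tables, indent=2))
```

For the actual workflow, follow the Table Extractor Tutorial notebook; this sketch only shows what "a list of tables as JSON objects" can look like in practice.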

2. Zeolite Synthesis Data

The germanium-containing zeolite data set used in the paper is publicly available in both Excel and CSV formats. Here is a description of each feature:

doi- DOI of the paper the synthesis route comes from

Si:B- molar amount of each element/compound/molecule used in the synthesis. Amounts are normalized to give Si=1 or Ge=1 if Si=0

Time- crystallization time in hours

Temp- crystallization temperature in °C

SDA Type- name given to the organic structure directing agent (OSDA) molecule in the paper

SMILES- the SMILES representation of the OSDA molecule

SDA_Vol- the DFT calculated molar volume of the OSDA molecule in bohr^3

SDA_SA- the DFT calculated surface area of the OSDA molecule in bohr^2

SDA_KFI- the DFT calculated Kier flexibility index of the OSDA molecule

From?- the location within the paper from which the compositional information was extracted: Table, Text, or Supplemental

Extracted- Products of the synthesis as they appear in the paper

Zeo1- the primary zeolite (zeotype) material made in the synthesis

Zeo2- the secondary zeolite (zeotype) material made in the synthesis

Dense1- the primary dense phase made in the synthesis

Dense2- the secondary dense phase made in the synthesis

Am- whether an amorphous phase is made in (or remains after) the synthesis

Other- any other unidentified phases made in the synthesis

ITQ- whether the synthesis made a zeolite in the ITQ series

FD1- the framework density of Zeo1

MR1- the maximum ring size of Zeo1

FD2- the framework density of Zeo2

MR2- the maximum ring size of Zeo2
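
The normalization rule for the molar amounts (Si=1, or Ge=1 if Si=0) can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the repository: the function name `normalize_amounts` and the example compositions are hypothetical, and amounts are assumed to arrive as a plain dictionary keyed by species name.

```python
def normalize_amounts(amounts):
    """Scale molar amounts so that Si = 1, or Ge = 1 when no Si is present.

    `amounts` maps a species name to its molar amount (hypothetical layout).
    """
    si = amounts.get("Si", 0.0)
    ge = amounts.get("Ge", 0.0)
    if si > 0:
        reference = si
    elif ge > 0:
        reference = ge
    else:
        raise ValueError("need a nonzero Si or Ge amount to normalize against")
    return {species: value / reference for species, value in amounts.items()}


# Illustrative gel compositions (made-up numbers, not taken from the data set).
normalized = normalize_amounts({"Si": 0.5, "Ge": 0.25, "H2O": 10.0})
ge_only = normalize_amounts({"Si": 0.0, "Ge": 2.0, "H2O": 8.0})

print(normalized)  # Si is the reference here
print(ge_only)     # Ge is the reference because Si is zero
```

Raising an error when both Si and Ge are zero is a design choice of this sketch; the source does not say how such rows, if any, are handled.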

Citing

If you use this code or data, please cite the following as appropriate.

Zach Jensen, Edward Kim, Soonhyoung Kwon, Terry Z. H. Gani, Yuriy Román-Leshkov, Manuel Moliner, Avelino Corma, and Elsa Olivetti. "A Machine Learning Approach to Zeolite Synthesis Enabled by Automatic Literature Data Extraction." ACS Central Science, Article ASAP. DOI: 10.1021/acscentsci.9b00193
