
simd's Introduction

A Public Database of Thermoelectric Materials and System-Identified Material Representation for Data-Driven Discovery

Thermoelectric materials have received much attention for energy harvesting devices and power generators. However, discovering novel high-performance thermoelectric materials is challenging because of the diversity and structural complexity of thermoelectric materials, which contain alloys and dopants. For efficient data-driven discovery of novel thermoelectric materials, we constructed a public dataset of experimentally synthesized thermoelectric materials and their experimentally measured thermoelectric properties. On this dataset, we achieved $R^2$ scores greater than 0.9 in regression problems that predict the experimentally measured thermoelectric properties of the materials from their chemical compositions. Furthermore, we devised a material descriptor for the chemical compositions of the materials to improve the extrapolation capability of machine learning methods. Using transfer learning with the proposed material descriptor, we improved the $R^2$ score from 0.13 to 0.71 in predicting the experimental ZTs of materials from completely unseen material groups.
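For readers unfamiliar with the metric, the $R^2$ score quoted above is the coefficient of determination, $R^2 = 1 - SS_{res} / SS_{tot}$. The following is a minimal sketch of that formula in plain Python. The function name and the sample numbers are illustrative assumptions and are not taken from the repository or the dataset.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot


# Illustrative numbers only (hypothetical, not measured ZT values).
measured = [0.2, 0.5, 0.9, 1.3]
predicted = [0.25, 0.45, 1.0, 1.2]
score = r2_score(measured, predicted)
```

A score of 1 means perfect prediction, and a score of 0 means the model does no better than always predicting the mean of the measured values.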

Reference: https://doi.org/10.1038/s41524-022-00897-2

Run

This repository provides an implementation of transfer learning based on System-Identified Material Representation (SIMD). Running exec.py trains and evaluates an XGBoost regressor with SIMD to predict the ZTs of thermoelectric materials from unexplored material groups.
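Evaluating on "unexplored material groups" means that every material in the test set belongs to a group that never appears in the training set. The following is a minimal sketch of such a group-wise hold-out split. It assumes each record is a dictionary with a hypothetical `group` key. The function name, the keys, and the toy records are assumptions for illustration, and the actual grouping used by exec.py may differ.

```python
def split_by_group(records, held_out_group, group_key="group"):
    """Split records so that one whole material group is used only for testing."""
    train = [r for r in records if r[group_key] != held_out_group]
    test = [r for r in records if r[group_key] == held_out_group]
    if not train or not test:
        raise ValueError("both the training and the held-out split must be non-empty")
    return train, test


# Toy records with placeholder names and values (hypothetical, not from ESTM).
records = [
    {"formula": "material_1", "group": "group_a", "zt": 0.4},
    {"formula": "material_2", "group": "group_a", "zt": 0.6},
    {"formula": "material_3", "group": "group_b", "zt": 1.1},
    {"formula": "material_4", "group": "group_c", "zt": 0.9},
]

train, test = split_by_group(records, held_out_group="group_b")
train_groups = {r["group"] for r in train}
test_groups = {r["group"] for r in test}
```

Because the held-out group is absent from the training split, a score computed on `test` measures extrapolation rather than interpolation within known groups.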

Datasets

To reproduce the extrapolation results of SIMD, prepare the following two datasets of thermoelectric materials.

  • Starry dataset: A large materials dataset that includes thermoelectric materials. Because it was collected by text mining, it must be pre-processed to remove invalid data (reference: https://www.starrydata2.org).
  • ESTM dataset: A refined thermoelectric materials dataset for machine learning. It contains 5,205 experimental observations of thermoelectric materials and their properties (reference: https://doi.org/10.1038/s41524-022-00897-2).
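The pre-processing needed for text-mined data is not specified here, so the following is only a sketch of the kind of validity filter one might apply. The field names (`formula`, `temperature`, `zt`) and the validity rules (non-empty formula, finite and non-negative values) are assumptions for illustration, not the repository's actual procedure.

```python
import math


def is_valid(record):
    """Return True if the record has a formula and finite, non-negative values."""
    formula = record.get("formula")
    if not isinstance(formula, str) or not formula.strip():
        return False
    for key in ("temperature", "zt"):
        value = record.get(key)
        # Reject missing, non-numeric, and boolean entries.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        # Reject NaN, infinities, and negative values.
        if not math.isfinite(value) or value < 0:
            return False
    return True


# Toy records with placeholder values (hypothetical, not from Starrydata).
raw = [
    {"formula": "material_1", "temperature": 300.0, "zt": 0.5},
    {"formula": "", "temperature": 300.0, "zt": 0.5},            # empty formula
    {"formula": "material_2", "temperature": float("nan"), "zt": 0.7},
    {"formula": "material_3", "temperature": 400.0, "zt": -1.0},  # negative ZT
    {"formula": "material_4", "temperature": 500.0},              # missing ZT
]
cleaned = [r for r in raw if is_valid(r)]
```

Stricter rules, such as checking that the formula parses into known elements, could be added to `is_valid` in the same style.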

Notes

  • This repository contains only a subset of the source Starry dataset because of the dataset license. Please visit Starrydata to download the full source Starry dataset.
  • The full ESTM dataset is provided in the dataset folder of this repository.
  • The results folder provides the extrapolation results obtained on the full Starry and ESTM datasets, so you can inspect the extrapolation results reported in the paper there.
