This repository contains a dataset of perovskite oxide work functions computed from first principles, together with the electronic descriptors used for data-driven analysis. This work aims to identify the electronic factors that control the work functions of perovskite oxides.
The following paper describes the details of the analysis:
Data-driven analysis of the electronic factors controlling the work functions of perovskite oxides
biokit is used to plot the correlation matrix, and pdpbox is used for partial dependence analysis (PDP). The prerequisites can also be installed using pip:
pip install -r requirements.txt
The analysis of the perovskite work functions consists of three main steps. The help message for each script can be displayed with:
python correlated_matrix.py -h
python rfecv.py -h
python model_analysis.py -h
The dataset, which contains the original 38 electronic features and the first-principles-derived work functions, is in the data folder. The definitions of the features can be found in our manuscript: Data-driven analysis of the electronic factors controlling the work functions of perovskite oxides
- Remove correlated features, by default using the enumeration method, and dump the plot:
python correlated_matrix.py -e -p
This step creates a new dataset, perovskite_wf_data_uncorr.csv, in the data directory.
- Recursive feature elimination with cross-validation, with hyperparameters tuned at each step, for example:
python rfecv.py --term A --rfecv
This performs recursive feature elimination for the A-terminated perovskites using a random forest regressor, with the hyperparameters selected at each step. The optimized features and the transformed dataset are saved in the rfecv_results and data directories, respectively.
- Analyze the model's performance, including partial dependence analysis:
python model_analysis.py --term A --pdp_1d --pdp_2d
The --pdp_1d and --pdp_2d flags perform the partial dependence analysis; the available feature names will be displayed.
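For orientation, the three steps above can be sketched on synthetic data with pandas and scikit-learn. This is a minimal illustration under stated assumptions, not the code in the repository scripts: the column names (f0 to f5), the 0.95 correlation threshold, the greedy filter in drop_correlated, and the fixed random forest settings are all hypothetical. The sketch does not tune hyperparameters at each elimination step, and it uses scikit-learn's partial_dependence in place of pdpbox.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV
from sklearn.inspection import partial_dependence

# Synthetic stand-in data: the names f0..f5 are hypothetical and are not
# the electronic descriptors shipped with this repository.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(80, 5)), columns=[f"f{i}" for i in range(5)])
X["f5"] = X["f0"] + 0.01 * rng.normal(size=80)  # near-duplicate of f0
y = 2.0 * X["f0"] - 1.5 * X["f1"] + 0.1 * rng.normal(size=80)


def drop_correlated(frame, threshold=0.95):
    """Keep a column only if its |Pearson r| with every kept column is <= threshold."""
    corr = frame.corr().abs()
    keep = []
    for col in frame.columns:
        if all(corr.loc[col, kept] <= threshold for kept in keep):
            keep.append(col)
    return frame[keep]


# Step 1: remove correlated features (a simple greedy filter, assumed here).
X_uncorr = drop_correlated(X)

# Step 2: recursive feature elimination with cross-validation,
# using a random forest with fixed (untuned) hyperparameters.
selector = RFECV(
    RandomForestRegressor(n_estimators=50, random_state=0),
    step=1,
    cv=3,
    scoring="neg_mean_absolute_error",
)
selector.fit(X_uncorr, y)
selected = list(X_uncorr.columns[selector.support_])

# Step 3: 1D partial dependence of the prediction on the first selected feature.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_uncorr[selected], y)
pdp = partial_dependence(
    model, X_uncorr[selected], features=[0], kind="average", grid_resolution=10
)
pdp_curve = pdp["average"][0]  # averaged prediction at each grid point

print("kept after correlation filter:", list(X_uncorr.columns))
print("selected by RFECV:", selected)
```

The greedy filter keeps the first column of each highly correlated pair, which is only one of several reasonable conventions; the enumeration method used by correlated_matrix.py may choose differently.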
The RFECV and PDP results can be plotted using plotter.py in the plot_tools directory:
python plotter.py --rfecv --pdp
The resulting figures are stored in the rfecv_results and results directories, respectively.
The trained models for the AO- and BO2-terminated interfaces are stored in model_checkpoints.