hackingmaterials / matminer_examples

A repo of examples for the matminer (https://github.com/hackingmaterials/matminer) code

License: Other

Python 0.04% Jupyter Notebook 99.96%

matminer_examples's Issues

PlotlyFig examples - use matminer data sets?

@ardunn @albalu

Should the PlotlyFig examples use the matminer example data sets?

They are just as easy to load as sklearn data sets
They would be more relevant / interesting to materials scientists than plotting boston housing prices or something

add ternary plot to matminer_examples

and also put an image of it in the README so it shows up in the visual gallery

@albalu

Citrine informatics in the matminer_examples not working

Getting the following error:

Traceback (most recent call last):
File "MP_Citrine_MDF_MPDS_Masher.py", line 37, in <module>
from matminer.data_retrieval.retrieve_Citrine import CitrineDataRetrieval
File "matminer/data_retrieval/retrieve_Citrine.py", line 2, in <module>
from citrination_client import CitrinationClient, ChemicalFieldQuery,
ModuleNotFoundError: No module named 'citrination_client'
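
The traceback shows that `citrination_client`, which `retrieve_Citrine.py` imports, cannot be found in the active environment. A minimal sketch of a pre-flight check follows, using only the standard library. The helper name `missing_modules` is hypothetical, and the module list contains only the name reported in the traceback.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module name taken from the traceback above.
required = ["citrination_client"]
missing = missing_modules(required)
if missing:
    print("Missing optional dependencies:", ", ".join(missing))
else:
    print("All optional dependencies found.")
```

If the module is reported as missing, installing it into the same environment that runs the script should resolve the import error. To my knowledge the package is published on PyPI as `citrination-client`, but that name is worth double-checking.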

pipeline tutorial could benefit from some background

@spacedome

It would be nice if the sklearn Pipeline tutorial had an intro section briefly explaining the concept of a pipeline and why it is useful. I think most people reviewing the examples repo will not be familiar with sklearn pipelines as they are materials scientists. So it would help them to gain some context as to why they want to create a pipeline.

I don't think you need to go overboard. Maybe 1 paragraph or 2, with appropriate links to externally hosted docs as needed.
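
As a possible starting point for such an intro, here is a minimal sketch of what a pipeline buys you. The data are synthetic (an assumption standing in for a featurized materials data set), and the step names `scale` and `model` are arbitrary labels.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a featurized data set (assumption: 200 samples,
# 5 numeric features on very different scales).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([1.0, 10.0, 100.0, 0.1, 5.0])
y = X @ np.array([2.0, -0.3, 0.01, 4.0, 0.5]) + rng.normal(scale=0.1, size=200)

# A pipeline chains preprocessing and the model into one estimator, so the
# scaler is re-fit on the training part of every cross-validation fold and
# never sees the held-out data.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", Ridge(alpha=1.0)),
])

scores = cross_val_score(pipe, X, y, cv=5, scoring="r2")
mean_r2 = scores.mean()
print(f"mean cross-validated r^2: {mean_r2:.3f}")
```

The same `pipe` object can be fit on a different data set without rewriting the preprocessing steps, which is the main reason to use one.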

create demo notebook for functionfeaturizer

as requested on matminer help list (somehow not showing up on Google groups)

Predicting bulk modulus - r^2 values

The bulk modulus notebook takes the absolute value of r^2 when computing the scoring metric. But, it should not do that - if r^2 is less than zero, it is worse than predicting the mean.

Probably it won't affect this specific notebook, but better not to have that code in there. Very simple fix - just remove np.abs() around any r2 computation.
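
To illustrate why the absolute value is misleading, here is a small self-contained sketch with made-up numbers. The `r2_score` function below is a plain re-implementation of the usual definition, not the notebook's code.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
bad_pred = [4.0, 3.0, 2.0, 1.0]    # anti-correlated with the truth
mean_pred = [2.5, 2.5, 2.5, 2.5]   # always predicts the mean

r2_bad = r2_score(y_true, bad_pred)
r2_mean = r2_score(y_true, mean_pred)
print(r2_bad, r2_mean, abs(r2_bad))  # -3.0 0.0 3.0
```

Taking `abs()` turns the worst model here (r^2 = -3) into an apparent score of 3, which would rank it above a model that simply predicts the mean.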

There is an Inconsistency in matminer.featurizers.structure.CoulombMatrix()

In the featurize function, the docstring reads:
"""
Get Coulomb matrix of input structure.

    Args:
        s: input Structure (or Molecule) object.

    Returns:
        m: (Nsites x Nsites matrix) Coulomb matrix.

"""
This says that the function returns an N x N matrix.

In practice, however, it returns an N-dimensional vector, because it returns the eigenvalues of the Coulomb matrix.
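
The difference between the two return shapes can be sketched with NumPy alone. This is an illustration of the standard Coulomb matrix definition, not matminer's implementation; the function name `coulomb_matrix` and the toy coordinates are assumptions, and no unit conversion is applied.

```python
import numpy as np


def coulomb_matrix(atomic_numbers, coords):
    """Return the (N x N) Coulomb matrix for N sites.

    Diagonal: 0.5 * Z_i ** 2.4; off-diagonal: Z_i * Z_j / |R_i - R_j|.
    """
    z = np.asarray(atomic_numbers, dtype=float)
    r = np.asarray(coords, dtype=float)
    n = len(z)
    m = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                m[i, j] = 0.5 * z[i] ** 2.4
            else:
                m[i, j] = z[i] * z[j] / np.linalg.norm(r[i] - r[j])
    return m


# Toy water-like molecule (coordinates are illustrative, not optimized).
zs = [8, 1, 1]
xyz = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]

matrix = coulomb_matrix(zs, xyz)
# The matrix is symmetric, so eigvalsh applies; sort in descending order.
eigenvalues = np.sort(np.linalg.eigvalsh(matrix))[::-1]

print(matrix.shape)       # (3, 3)
print(eigenvalues.shape)  # (3,)
```

The docstring describes the first shape, while the behavior reported in this issue corresponds to the second.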

Binder examples cannot be run interactively

When running the notebooks in binder (as per the documentation), the lack of API keys means the notebooks cannot be run interactively. Another issue is that matminer is not installed with its optional dependencies, so the MPDS examples fail with another error message.

Possible solutions:

Set the proper environment variables in binder. I don't know if this can be done without the keys being public in any way.
Remove the link to binder as the notebooks can be viewed in the GitHub repository, just not run interactively.

If binder provides other benefits further to running the notebooks interactively then I guess it could make sense to leave the link in the documentation.

Can the figrecipe examples be ported to notebooks?

I think it would be nice to have these as notebooks. Users would be able to see the figures rendered here on GitHub. Plus, iteratively tweaking figures seems much faster in notebooks (at least for me) and I'd like to advocate their use here.

What is your take on porting these examples to notebooks?

Example notebooks should show people how to plot the cross validation model errors

Right now you are plotting the errors on the training set with PlotlyFig which is misleading

You should generate the cross-validation plot and update the examples to show how to do that
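
A minimal sketch of the difference, using scikit-learn's `cross_val_predict` on synthetic data (the data, the decision tree model, and the fold count are all assumptions for illustration). The point is that `cv_pred`, not `train_pred`, is what should be plotted against the true values.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)

model = DecisionTreeRegressor(random_state=0)

# Training-set predictions: the model has already seen every target value.
train_pred = model.fit(X, y).predict(X)

# Cross-validated predictions: each point is predicted by a model that was
# fit on the other folds only.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
cv_pred = cross_val_predict(model, X, y, cv=cv)

train_mae = mean_absolute_error(y, train_pred)
cv_mae = mean_absolute_error(y, cv_pred)
print(f"training MAE: {train_mae:.3f}, cross-validated MAE: {cv_mae:.3f}")
```

With a fully grown tree the training error is close to zero, while the cross-validated error reflects the noise in the data, so a plot of training predictions overstates model quality.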

all notebooks should specify last known working version of matminer they are compatible with

e.g., if a notebook stops working for the latest version of the code, this will tell a user what version of matminer to roll back to get things working at least in the short term.
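
One way a notebook could surface this is a first cell that compares the installed version with the last tested one. The sketch below uses only the standard library (Python 3.8+); the helper name `version_note` is hypothetical and "0.0.0" is a placeholder, not a real tested version.

```python
from importlib.metadata import PackageNotFoundError, version


def version_note(package, tested_version):
    """Describe how the installed version compares to the tested one."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return f"{package} is not installed (notebook tested with {tested_version})"
    if installed == tested_version:
        return f"{package} {installed} matches the tested version"
    return (
        f"{package} {installed} is installed, but this notebook was last "
        f"tested with {tested_version}"
    )


# "0.0.0" is a placeholder, not a real tested version.
print(version_note("matminer", "0.0.0"))
```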

error: "NameError: name 'staticmethodzoom' is not defined"

I am using the Anaconda environment and the IPython notebook.

When using:
from matminer.featurizers.composition import ElementProperty

I get the following error:

NameError Traceback (most recent call last)
in
----> 1 from matminer.featurizers.composition import ElementProperty

~/anaconda/envs/py3/lib/python3.6/site-packages/matminer/featurizers/composition.py in
16
17 from matminer.featurizers.base import BaseFeaturizer
---> 18 from matminer.featurizers.utils.stats import PropertyStats
19 from matminer.utils.data import DemlData, MagpieData, PymatgenData,
20 CohesiveEnergyData, MixingEnthalpy, MatscholarElementData

~/anaconda/envs/py3/lib/python3.6/site-packages/matminer/featurizers/utils/stats.py in
13
14
---> 15 class PropertyStats(object):
16 """This class contains statistical operations that are commonly employed
17 when computing features.

~/anaconda/envs/py3/lib/python3.6/site-packages/matminer/featurizers/utils/stats.py in PropertyStats()
337 return np.array(data_lst).flatten()
338
--> 339 @staticmethodzoom
340 def quantile(data_lst, weights=None, q=0.5):
341 """

NameError: name 'staticmethodzoom' is not defined

JSON issues with 'kernel_ridge_SCM_OFM' example

The 'kernel_ridge_scm_ofm' example script produces the following error when trying to load the 'flla' dataset:

REMOVE UNSTABLE ENTRIES: False
USE FABER DATASET: True
USE TERNARY OXIDE DATASET: False
NUMBER OF JOBS: 24
DEBUG MODE: False
Traceback (most recent call last):
File "kernel_ridge_SCM_OFM.py", line 67, in <module>
df = load_dataset("flla")
File "/home/dennis/.local/lib/python3.5/site-packages/matminer/datasets/dataset_retrieval.py", line 63, in load_dataset
df = load_dataframe_from_json(data_path)
File "/home/dennis/.local/lib/python3.5/site-packages/matminer/utils/io.py", line 58, in load_dataframe_from_json
dataframe_data = json.load(f, cls=MontyDecoder)
File "/usr/lib/python3.5/json/__init__.py", line 268, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "/usr/lib/python3.5/json/__init__.py", line 312, in loads
s.__class__.__name__))
TypeError: the JSON object must be str, not 'bytes'

Any help in debugging is appreciated.
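
The traceback points at Python 3.5, where `json.loads` accepts only `str`; bytes input is accepted from Python 3.6 onward. One plausible reading is that the file handle yields bytes. The sketch below shows the general workaround on a made-up record; it does not reproduce matminer's loader.

```python
import json

# Bytes as they would come from a file opened in binary mode (made-up record).
raw = b'{"formula": "NaCl", "n_sites": 2}'

# Decoding first works on every Python 3 version; json.loads accepts
# bytes directly only from Python 3.6 onward.
record = json.loads(raw.decode("utf-8"))
print(record["formula"])
```

Upgrading to Python 3.6 or later would be the other way to avoid this particular error.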

Example including tensorflow, keras, or pytorch

We have the graphic on the docs and in the paper:

We have several sklearn examples but no Keras or neural network examples. Might be good to include at least one?

add an example of using sklearn Pipeline

might be nice to see how to re-use the same ML pipeline for different data problems

update examples to use new seaborn-style data set loaders

Probably for @Doppe1g4nger

No module named 'figrecipes'

Hi,

I am following the matminer figrecipes example, but it fails at the first import line:

from matminer.figrecipes.plot import PlotlyFig
ModuleNotFoundError: No module named 'matminer.figrecipes'

from figrecipes import PlotlyFig
ModuleNotFoundError: No module named 'figrecipes'

** matminer (0.7.2)
** plotly (4.14.3)

problem with importing matminer

I want to know how to import matminer into a Jupyter notebook and what it requires to run correctly.

You must run 'fit' first!

When I run the PRDF class, I get this error: "You must run 'fit' first!".

from automatminer import MatPipe -> IndexError: list index out of range

Sorry, but I just installed the library with pip, and the import fails.
Could anyone give me an idea of what is going wrong?
Thanks

Python 3.7.4 (default, Dec 17 2019, 17:07:17) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from automatminer import MatPipe
/home/inode01/xiaotong/code/mossbauer_preprocess/venv_moss/lib/python3.7/site-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.metrics.scorer module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.metrics. Anything that cannot be imported from sklearn.metrics is now part of the private API.
  warnings.warn(message, FutureWarning)
/home/inode01/xiaotong/code/mossbauer_preprocess/venv_moss/lib/python3.7/site-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.feature_selection.base module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.feature_selection. Anything that cannot be imported from sklearn.feature_selection is now part of the private API.
  warnings.warn(message, FutureWarning)
/home/inode01/xiaotong/code/mossbauer_preprocess/venv_moss/lib/python3.7/site-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.neighbors.unsupervised module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.neighbors. Anything that cannot be imported from sklearn.neighbors is now part of the private API.
  warnings.warn(message, FutureWarning)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/inode01/xiaotong/code/mossbauer_preprocess/venv_moss/lib/python3.7/site-packages/automatminer/__init__.py", line 2, in <module>
    from automatminer.featurization import AutoFeaturizer  # noqa
  File "/home/inode01/xiaotong/code/mossbauer_preprocess/venv_moss/lib/python3.7/site-packages/automatminer/featurization/__init__.py", line 1, in <module>
    from .core import AutoFeaturizer  # noqa
  File "/home/inode01/xiaotong/code/mossbauer_preprocess/venv_moss/lib/python3.7/site-packages/automatminer/featurization/core.py", line 27, in <module>
    from automatminer.featurization.sets import (
  File "/home/inode01/xiaotong/code/mossbauer_preprocess/venv_moss/lib/python3.7/site-packages/automatminer/featurization/sets.py", line 10, in <module>
    import matminer.featurizers.structure as sf
  File "/home/inode01/xiaotong/code/mossbauer_preprocess/venv_moss/lib/python3.7/site-packages/matminer/featurizers/structure.py", line 36, in <module>
    from matminer.featurizers.site import OPSiteFingerprint, \
  File "/home/inode01/xiaotong/code/mossbauer_preprocess/venv_moss/lib/python3.7/site-packages/matminer/featurizers/site.py", line 1670, in <module>
    class LocalPropertyDifference(BaseFeaturizer):
  File "/home/inode01/xiaotong/code/mossbauer_preprocess/venv_moss/lib/python3.7/site-packages/matminer/featurizers/site.py", line 1695, in LocalPropertyDifference
    def __init__(self, data_source=MagpieData(), weight='area',
  File "/home/inode01/xiaotong/code/mossbauer_preprocess/venv_moss/lib/python3.7/site-packages/matminer/utils/data.py", line 215, in __init__
    prop_value = float(lines[atomic_no - 1])
IndexError: list index out of range

add multi-indexing example

probably @ardunn

notebooks should warn users at top to start Jupyter with jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000000

Otherwise I get errors with visualization, eg. see computed vs experimental bandgaps notebook

error in bulk_modulus notebook

@albalu
@ardunn


---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-21-c406415a642e> in <module>()
     13 
     14 hist_plot["data"][0]['name'] = 'train'
---> 15 hist_plot["data"][1]['name'] = 'test'
     16 pf_rf.create_plot(hist_plot)

IndexError: list index out of range
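
The error indicates that `hist_plot["data"]` holds fewer than two traces at that point. A defensive sketch of the naming step follows; the `hist_plot` dictionary here is a stand-in (an assumption about its shape), and this only avoids the crash rather than explaining why the second trace is missing.

```python
# Stand-in for the figure dictionary from the notebook (assumption: it has
# a "data" list with one entry per histogram trace).
hist_plot = {"data": [{"x": [1, 2, 3]}]}  # only one trace was produced

# zip() stops at the shorter sequence, so a missing trace is skipped
# instead of raising IndexError.
for trace, name in zip(hist_plot["data"], ["train", "test"]):
    trace["name"] = name

print([trace["name"] for trace in hist_plot["data"]])  # ['train']
```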

Add example on loading your own data files

e.g.
https://groups.google.com/forum/#!topic/matminer/Vs7FxTeH1XA

CGCNNFeaturizer Example

I've been struggling a lot with getting CGCNNFeaturizer to work. The Google group doesn't seem to have much discussion on this, nor does the CGCNN repo have much to say about it other than a clarification about Python bindings. The matminer documentation is also a bit hard to follow, but I think I've understood the main components. Does someone have (or could someone put together) a basic .ipynb showing how to get features from one of the pretrained models (e.g. 'bulk-moduli')?

PlotlyFig examples - easy images

I think someone just browsing the repo should be able to see the PlotlyFig example outputs.

I'd suggest:

putting an image file (at least one for each plot type) in the repo itself that one can directly see
having a README file in the PlotlyFig example directory that links to those figures. Thus, when you navigate to the PlotlyFig examples directory, the README just directly shows all the image outputs. Much nicer than cloning and running all the code.

If you want to see an example README with images, just look at the main matminer README

Jupyter notebooks should clearly state what needs to be in place BEFORE starting

e.g., if you need to have an API key (like Citrine) set up
if you need to set certain data limits, like jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000000
what version of matminer this was tested with

using figrecipes in google colab

Hi,

I am using the import from matminer.figrecipes.plot import PlotlyFig
and trying this basic code:

# A Simple XY plot
pf = PlotlyFig(title="Basic Example", mode='notebook')

# Inputs are tuples containing a list of x variables and y variables
pf.xy(([1, 2, 3], [4, 5, 6]))

but the plot is not shown.

So, can you tell me how to show this plot in Google Colab?

Typo and API key shown in notebook

I've noticed a couple of typos and other minor bugs in the basic data retrieval notebook. I will keep track of them in this issue and submit a pull request containing fixes when I've run through all the notebook examples.

In data_retrieval_basics.ipynb:

The order of cell execution is non-linear (so the numbers at the side of the cells are not incremental). This isn't necessarily a problem but may be confusing for people new to notebooks.
Under the Materials Project heading, example 2: df.to_cv(...) should be df.to_csv(...)
Under Citrine informatics: an API key is given when initializing the CitrineDataRetrieval. This should be removed.
Capitalise Globus
Empty cell at end of notebook

(I'll edit this message if I find anything else)
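
For the API key item, one common approach is to read the key from an environment variable so it never appears in the notebook. This is a minimal sketch; the helper `get_api_key` is hypothetical, and the variable name `CITRINE_KEY` is an assumption that should be checked against the matminer documentation.

```python
import os


def get_api_key(env_var="CITRINE_KEY"):
    """Read an API key from the environment instead of hard-coding it."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running this notebook."
        )
    return key


# Demonstration only: a dummy value is set here so the snippet runs.
os.environ.setdefault("CITRINE_KEY", "dummy-key-for-illustration")
api_key = get_api_key()
print("key loaded, length:", len(api_key))
```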

The 'Structure' object?

Hello! How can I get the same format as the 'structure' field in the example datasets? I only have some POSCAR files. Can someone help me?
Best wishes!

issue with higher version of numpy.

File "/Users/jason/opt/anaconda3/lib/python3.9/site-packages/matminer/featurizers/structure/matrix.py", line 292, in __init__
my_ohvs[Z] = self.get_ohv(el, period_tag)
File "/Users/jason/opt/anaconda3/lib/python3.9/site-packages/matminer/featurizers/structure/matrix.py", line 338, in get_ohv
my_ohv = np.zeros(self.size, np.int)
File "/Users/jason/opt/anaconda3/lib/python3.9/site-packages/numpy/__init__.py", line 305, in __getattr__
raise AttributeError(former_attrs[attr])
AttributeError: module 'numpy' has no attribute 'int'.
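
The `np.int` alias was deprecated in NumPy 1.20 and removed in NumPy 1.24, which is why the call in the traceback fails on newer installations. A minimal sketch of the equivalent call that works on both old and new NumPy versions (the `size` value is a placeholder):

```python
import numpy as np

size = 5  # placeholder length; the real value comes from the featurizer

# np.int was only an alias for the builtin int and was removed in NumPy 1.24;
# passing int (or a sized type such as np.int64) works on old and new versions.
my_ohv = np.zeros(size, dtype=int)
print(my_ohv.dtype.kind)  # 'i' for a signed integer dtype
```

Until the installed matminer version contains such a change, pinning NumPy below 1.24 is the usual short-term workaround.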