Comments (6)
My bad for not handling pickle files separately in load_train_test(). Maybe we should rename the function to load() since it's clearly morphing into more than just a training and test set loader. @pbenner Curious to hear your opinion!
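One way a renamed load() could treat pickle files separately is to dispatch on the file extension. The sketch below is stdlib-only and hypothetical: `load`, `READERS` and the `_read_*` helpers are illustrative names, not the actual matbench_discovery API (which passes pandas readers around, as the traceback further down shows).

```python
import gzip
import json
import pickle
from pathlib import Path
from typing import Any, Callable, Dict, Union


def _read_json(path: Path) -> Any:
    with open(path, encoding="utf-8") as file:
        return json.load(file)


def _read_json_gz(path: Path) -> Any:
    with gzip.open(path, mode="rt", encoding="utf-8") as file:
        return json.load(file)


def _read_pickle(path: Path) -> Any:
    # Only unpickle files that come from a trusted source.
    with open(path, "rb") as file:
        return pickle.load(file)


# Hypothetical table mapping a file name suffix to its reader.
READERS: Dict[str, Callable[[Path], Any]] = {
    ".json.gz": _read_json_gz,
    ".json": _read_json,
    ".pkl": _read_pickle,
}


def load(path: Union[str, Path]) -> Any:
    """Pick a reader based on the file name suffix and return its result."""
    path = Path(path)
    for suffix, reader in READERS.items():
        if path.name.endswith(suffix):
            return reader(path)
    raise ValueError(f"no reader registered for {path.name!r}")
```

Keeping the suffix-to-reader mapping in one table means a new file type only needs a new entry, and the old load_train_test() name could stay around as a thin alias during a transition.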
from matbench-discovery.
Yes, sounds good! Also, fetch_process_wbm_dataset.py could be fully integrated and called the first time load() runs.
This is the error I get using the new branch:
Traceback (most recent call last):
File "/home/pbenner/Source/tmp/matbench-discovery/data/wbm/fetch_process_wbm_dataset.py", line 322, in <module>
assert sum(no_id_mask := df_summary.index.isna()) == 6, f"{sum(no_id_mask)=}"
AssertionError: sum(no_id_mask)=0
Are you using pandas v1.x.x? I just changed the code from v1 to v2 compatibility. I'll add a lower-bound pin on pandas in pyproject.toml to avoid this in the future.
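In addition to the pin in pyproject.toml, a runtime guard could fail early with a clear message when an older pandas is installed. This is a minimal sketch, assuming 2.0.0 is the intended lower bound (the thread only says the code moved from v1 to v2 compatibility); `parse_version` and `require_min_version` are hypothetical helpers, not part of matbench-discovery.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.5.3' or '2.0.0rc1' into a tuple of its leading integers."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def require_min_version(name: str, installed: str, minimum: str) -> None:
    """Raise ImportError if the installed version is below the minimum."""
    have, need = parse_version(installed), parse_version(minimum)
    # Pad with zeros so that '2.0' and '2.0.0' compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    if have < need:
        raise ImportError(f"{name}>={minimum} is required, found {installed}")
```

It could be called once at import time, for example as require_min_version("pandas", pd.__version__, "2.0.0"), so that a pandas 1.5 environment stops with an explicit message instead of an unrelated AssertionError.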
Indeed, I had pandas 1.5; I'm now checking with pandas 2.0. Meanwhile, I think 2023-02-07-mp-elemental-reference-entries.json.gz was modified:
python data/wbm/fetch_process_wbm_dataset.py
Loading 'wbm_summary' from cached file at '/home/pbenner/.cache/matbench-discovery/1.0.0/wbm/2022-10-19-wbm-summary.csv'
Warning: '/home/pbenner/.cache/matbench-discovery/1.0.0/mp/2023-02-07-mp-elemental-reference-entries.json.gz' associated with key='mp_elemental_ref_entries' does not exist. Would you like to download it now using matbench_discovery.data.load_train_test('mp_elemental_ref_entries'). This will cache the file for future use. [y/n] y
Downloading 'mp_elemental_ref_entries' from https://figshare.com/ndownloader/files/40344445
variable dump:
file='mp/2023-02-07-mp-elemental-reference-entries.json.gz',
url='https://figshare.com/ndownloader/files/40344445',
reader=<function read_json at 0x7f9898a875b0>,
kwargs={'compression': 'gzip'}
Traceback (most recent call last):
File "/home/pbenner/Source/tmp/matbench-discovery/data/wbm/fetch_process_wbm_dataset.py", line 24, in <module>
from matbench_discovery.energy import get_e_form_per_atom
File "/home/pbenner/Source/tmp/matbench-discovery/matbench_discovery/energy.py", line 66, in <module>
pd.read_json(DATA_FILES.mp_elemental_ref_entries, typ="series")
File "/home/pbenner/Source/tmp/matbench-discovery/matbench_discovery/data.py", line 217, in __getattribute__
self._on_not_found(key, msg)
File "/home/pbenner/Source/tmp/matbench-discovery/matbench_discovery/data.py", line 239, in _on_not_found
load_train_test(key) # download and cache data file
File "/home/pbenner/Source/tmp/matbench-discovery/matbench_discovery/data.py", line 111, in load_train_test
df = reader(url, **kwargs)
File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/site-packages/pandas/io/json/_json.py", line 760, in read_json
json_reader = JsonReader(
File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/site-packages/pandas/io/json/_json.py", line 862, in __init__
self.data = self._preprocess_data(data)
File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/site-packages/pandas/io/json/_json.py", line 874, in _preprocess_data
data = data.read()
File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/gzip.py", line 301, in read
return self._buffer.read(size)
File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/_compression.py", line 118, in readall
while data := self.read(sys.maxsize):
File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/gzip.py", line 488, in read
if not self._read_gzip_header():
File "/home/pbenner/.local/opt/anaconda3/envs/crysfeat/lib/python3.10/gzip.py", line 436, in _read_gzip_header
raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'{\n')
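The last line of the traceback shows the payload starting with b'{\n', which suggests the download was plain JSON while the reader was called with compression='gzip'. One defensive option, offered as a sketch rather than the project's actual fix, is to check for the two-byte gzip magic number before decompressing. `loads_maybe_gzipped` is a hypothetical helper that works on raw bytes and uses only the standard library.

```python
import gzip
import json
from typing import Any

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def loads_maybe_gzipped(raw: bytes) -> Any:
    """Parse JSON from bytes that may or may not be gzip-compressed."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))
```

With a check like this the same call handles both the old compressed upload and an uncompressed replacement, at the cost of hiding a mismatch between the file name and its contents.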
Yeah, I was in the process of updating the Figshare files but then got carried away. That error will be fixed before I merge #26.