Comments (10)
If parallel_duke_processes: false, it works. Parallel processing is somehow broken.
from powerplantmatching.
Thanks for reporting. It seems that your version of multiprocessing does not correctly reference subfunctions. For now, I have set the default of parallel_duke_processes to false. However, on Python 3 (the conda environment for the updated version on the master branch) the parallel processing works fine. So if you still want to use it, you can pull the current version of powerplantmatching, update your conda environment, and redo the calculation.
from powerplantmatching.
Hi everyone,
there is a fundamental difference between how multiprocessing works on Linux (i.e. by forking) and on Windows (i.e. by starting new processes which re-import the module). With the former, your child processes maintain the full state of the interpreter, i.e. variables, functions and so on, while with the latter they start from scratch, re-import the module, and then pickle the function call and its arguments across. Thus, in the latter setting, your child processes are not able to access any inline functions or variables (you can enable this behaviour explicitly on Linux using mp.set_start_method('spawn'), but it is the default on Windows; refer to the multiprocessing docs).
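The failure mode can be reproduced without Windows, because the "spawn" start method has to pickle the target function across to the child, and pickle serialises functions by qualified name. A minimal sketch (the names here are illustrative, not from powerplantmatching):

```python
import pickle

def module_level(x):
    # Defined at module top level: pickled by reference (its qualified
    # name), so a spawned child can re-import the module and find it.
    return x * x

def can_pickle(obj):
    # Try to serialise obj the way multiprocessing's "spawn" start
    # method would when handing a task to a child process.
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False

print(can_pickle(module_level))     # True
print(can_pickle(lambda x: x * x))  # False: a lambda has no importable name
```

Under fork, the child inherits the parent's memory, so inline functions work; under spawn, anything that fails this pickling check fails at process start.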
from powerplantmatching.
Hi! I just had the same error. I also ran it on Windows 7, with the env installed via conda -f requirements.yaml.
After setting parallel_duke_processes: false and trying pm.collection.matched_data() again, I receive this:
AttributeError Traceback (most recent call last)
in <module>()
----> 1 pm.collection.matched_data()
C:\Users\TUBAF\powerplantmatching\powerplantmatching\collection.py in matched_data(config, stored, extend_by_vres, extendby_kwargs, subsume_uncommon_fueltypes, **collection_kwargs)
175 matching_sources = [list(to_dict_if_string(a))[0] for a in
176 config['matching_sources']]
--> 177 matched = collect(matching_sources, **collection_kwargs)
178
179 if isinstance(config['fully_included_sources'], list):
C:\Users\TUBAF\powerplantmatching\powerplantmatching\collection.py in collect(datasets, update, use_saved_aggregation, use_saved_matches, reduced, custom_config, config, **dukeargs)
98
99 if update:
--> 100 dfs = parmap(df_by_name, datasets)
101 matched = combine_multiple_datasets(
102 dfs, datasets, use_saved_matches=use_saved_matches,
C:\Users\TUBAF\powerplantmatching\powerplantmatching\utils.py in parmap(f, arg_list, config)
328 return [x for i, x in sorted(res)]
329 else:
--> 330 return list(map(f, arg_list))
331
332
C:\Users\TUBAF\powerplantmatching\powerplantmatching\collection.py in df_by_name(name)
69
70 df = conf['read_function'](config=config,
---> 71 **conf.get('read_kwargs', {}))
72 if not conf.get('aggregated_units', False):
73 return aggregate_units(df,
C:\Users\TUBAF\powerplantmatching\powerplantmatching\data.py in CARMA(raw, config)
272 'plant': 'Name',
273 'plant.id': 'projectID'})
--> 274 .assign(projectID=lambda df: 'CARMA' + df.projectID.astype(str))
275 .loc[lambda df: df.Country.isin(config['target_countries'])]
276 .replace(dict(Fueltype={'COAL': 'Hard Coal',
C:\Users\TUBAF\Anaconda3\envs\powerplantmatching\lib\site-packages\pandas\core\frame.pyc in assign(self, **kwargs)
3313 results = OrderedDict()
3314 for k, v in kwargs.items():
-> 3315 results[k] = com._apply_if_callable(v, data)
3316
3317 # <= 3.5 and earlier
C:\Users\TUBAF\Anaconda3\envs\powerplantmatching\lib\site-packages\pandas\core\common.pyc in _apply_if_callable(maybe_callable, obj, **kwargs)
406
407 if callable(maybe_callable):
--> 408 return maybe_callable(obj, **kwargs)
409
410 return maybe_callable
C:\Users\TUBAF\powerplantmatching\powerplantmatching\data.py in <lambda>(df)
272 'plant': 'Name',
273 'plant.id': 'projectID'})
--> 274 .assign(projectID=lambda df: 'CARMA' + df.projectID.astype(str))
275 .loc[lambda df: df.Country.isin(config['target_countries'])]
276 .replace(dict(Fueltype={'COAL': 'Hard Coal',
C:\Users\TUBAF\Anaconda3\envs\powerplantmatching\lib\site-packages\pandas\core\generic.pyc in __getattr__(self, name)
4374 if self._info_axis._can_hold_identifiers_and_holds_name(name):
4375 return self[name]
-> 4376 return object.__getattribute__(self, name)
4377
4378 def __setattr__(self, name, value):
AttributeError: 'DataFrame' object has no attribute 'projectID'
I am fairly inexperienced and don't know whether the parallel_duke_processes fix has anything to do with the above error, or whether it is totally unrelated?
from powerplantmatching.
Hey! Thank you for your report. This error is not related to parallel_duke_processes.
The problem occurs in the assignment of a new column when reading the CARMA data. I cannot reproduce the error, though.
I can just guess:
Can you please check your pandas version with
pd.__version__
? It might be that your version does not support the function pd.DataFrame.assign with a lambda function correctly.
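For reference, here is a minimal sketch of the assign pattern that fails in the traceback (the data is made up; the chain mirrors the CARMA reader in data.py). DataFrame.assign has accepted callables since pandas 0.18, and the lambda receives the intermediate DataFrame, so the prefixed ID can be built from the freshly renamed column:

```python
import pandas as pd

# Made-up raw input resembling the CARMA CSV's columns.
raw = pd.DataFrame({"plant": ["A", "B"], "plant.id": [1, 2]})

# Rename to the target schema, then derive a prefixed projectID.
# The callable passed to assign() sees the renamed frame.
df = (raw.rename(columns={"plant": "Name", "plant.id": "projectID"})
         .assign(projectID=lambda df: "CARMA" + df.projectID.astype(str)))

print(df.projectID.tolist())  # ['CARMA1', 'CARMA2']
```

If the rename never happens, e.g. because the raw file does not actually contain a plant.id column, then df.projectID raises exactly the AttributeError shown in the traceback.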
Another guess:
Your data reference is not correct. Can you check the output of
pd.DataFrame(pm.data.data_config).loc['source_file']
and check if the paths are pointing to the correct files and that these files exist?
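A quick way to automate that check (the paths and the dict layout here are illustrative, mirroring the name -> source_file mapping returned above):

```python
import os

# Hypothetical stand-in for powerplantmatching's data_config mapping;
# the file paths below are made up for illustration.
data_config = {
    "CARMA": {"source_file": "data/in/Full_CARMA_2009_Dataset_1.csv"},
    "WEPP": {"source_file": "data/in/platts_wepp.csv"},
}

# Report which configured source files actually exist on disk.
status = {name: os.path.isfile(conf["source_file"])
          for name, conf in data_config.items()}
for name, ok in status.items():
    print(name, "ok" if ok else "no such file")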
from powerplantmatching.
Thank you very much @FabianHofmann!
I checked pandas:
import pandas as pd
pd.__version__
C:....\powerplantmatching\lib\site-packages\matplotlib\colors.py:298: MatplotlibDeprecationWarning: The is_string_like function was deprecated in version 2.1.
if cbook.is_string_like(arg):
u'0.23.4'
And the output of the line:
pd.DataFrame(pm.data.data_config).loc['source_file']
BNETZA powerplantmatching..\data\in\Kraftwerksliste_...
CARMA powerplantmatching..\data\in\Full_CARMA_2009_...
ENTSOE powerplantmatching..\data\in\entsoe_powerplan...
ESE powerplantmatching..\data\in\projects.xls
GEO powerplantmatching..\data\in\global_energy_ob...
GPD powerplantmatching..\data\in\global_power_pla...
IWPDCY powerplantmatching..\data\in\IWPDCY.csv
OPSD [powerplantmatching..\data\in\conventional_po...
UBA powerplantmatching..\data\in\kraftwerke-de-ab...
WEPP powerplantmatching..\data\in\platts_wepp.csv
Name: source_file, dtype: object
BNETZA no such file
CARMA Full_CARMA_2009_Dataset_1.csv
ENTSOE entsoe_powerplants.csv
ESE projects.xlsx
GEO global_energy_observatory_power_plants.sqlite
GPD global_power_plant_database.csv
IWPDCY no such file
OPSD conventional_power_plants_DE.csv ; conventional_power_plants_EU.csv
UBA no such file
WEPP no such file
from powerplantmatching.
okay, could you try:
pd.read_csv(pm.data.data_config['CARMA']['source_file'])
it should return a DataFrame with about 50k rows. Does that work?
from powerplantmatching.
yes, that's the problem. You do not have the full files, only the references. Please check if you have git lfs installed.
In your bash, in the repo, please try
git-lfs pull
or
git-lfs fetch
if nothing is happening, please delete your repo and clone it again (make sure git lfs is installed).
If this is also not working, we have to see how to work it out...
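If git lfs was not installed when the repository was cloned, the working tree contains small text pointer files instead of the real data; the first line of such a pointer is the LFS spec URL. A small helper to detect them (my own sketch, not part of powerplantmatching):

```python
def is_lfs_pointer(path):
    # A Git LFS pointer file is a tiny text file whose first line names
    # the LFS pointer spec; a real data file will not start with this.
    try:
        with open(path, "rb") as f:
            first_line = f.readline()
    except OSError:
        return False
    return first_line.startswith(b"version https://git-lfs.github.com/spec/")
```

Reading such a pointer with pd.read_csv yields a tiny, meaningless DataFrame, which fits the missing projectID column seen in the traceback above.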
from powerplantmatching.