Comments (10)

vogt31337 commented on June 14, 2024

If parallel_duke_processes: false, it works. Parallel processing is somehow broken.

FabianHofmann commented on June 14, 2024

Thanks for reporting. It seems that your version of multiprocessing does not correctly reference subfunctions. For now, I have set the default of parallel_duke_processes to false. However, on Python 3 (conda environment for the updated version on the master branch) the parallel processing works fine. So if you still want to use it, you can pull the current version of powerplantmatching, update your conda environment and redo the calculation.

coroa commented on June 14, 2024

Hi everyone,

there is a fundamental difference between how multiprocessing works on Linux (i.e. by forking) and on Windows (i.e. by starting new processes which re-import the module). With the former, your child processes maintain the full state of the interpreter, i.e. variables, functions and so on, while with the latter they start from scratch, re-import the module and then pickle the function call and its arguments across. Thus, in the latter setting your child processes are not able to access any inline functions or variables (you can enable this behaviour explicitly using mp.set_start_method('spawn') on Linux, but it is the default on Windows; refer to the multiprocessing docs).
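
As an illustration (a minimal sketch, not powerplantmatching code): under 'fork' the snippet below runs fine because the child inherits the nested function, whereas under 'spawn' (the Windows default) p.start() fails while trying to pickle the local target.

import multiprocessing as mp

def main():
    def inline_worker(q):              # nested, i.e. not importable by the child
        q.put('hello from child')

    q = mp.Queue()
    p = mp.Process(target=inline_worker, args=(q,))
    p.start()                          # raises under 'spawn': can't pickle local object
    print(q.get())
    p.join()

if __name__ == '__main__':
    # mp.set_start_method('spawn')     # uncomment to reproduce the Windows behaviour on Linux
    main()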

KristinaGov commented on June 14, 2024

Hi! I just had the same error. I also run it on Windows 7, with the environment installed via conda -f requirements.yaml.
After setting parallel_duke_processes: false and trying pm.collection.matched_data() again, I receive this:

AttributeError                            Traceback (most recent call last)
<ipython-input-...> in <module>()
----> 1 pm.collection.matched_data()

C:\Users\TUBAF\powerplantmatching\powerplantmatching\collection.py in matched_data(config, stored, extend_by_vres, extendby_kwargs, subsume_uncommon_fueltypes, **collection_kwargs)
175 matching_sources = [list(to_dict_if_string(a))[0] for a in
176 config['matching_sources']]
--> 177 matched = collect(matching_sources, **collection_kwargs)
178
179 if isinstance(config['fully_included_sources'], list):

C:\Users\TUBAF\powerplantmatching\powerplantmatching\collection.py in collect(datasets, update, use_saved_aggregation, use_saved_matches, reduced, custom_config, config, **dukeargs)
98
99 if update:
--> 100 dfs = parmap(df_by_name, datasets)
101 matched = combine_multiple_datasets(
102 dfs, datasets, use_saved_matches=use_saved_matches,

C:\Users\TUBAF\powerplantmatching\powerplantmatching\utils.py in parmap(f, arg_list, config)
328 return [x for i, x in sorted(res)]
329 else:
--> 330 return list(map(f, arg_list))
331
332

C:\Users\TUBAF\powerplantmatching\powerplantmatching\collection.py in df_by_name(name)
69
70 df = conf['read_function'](config=config,
---> 71 **conf.get('read_kwargs', {}))
72 if not conf.get('aggregated_units', False):
73 return aggregate_units(df,

C:\Users\TUBAF\powerplantmatching\powerplantmatching\data.py in CARMA(raw, config)
272 'plant': 'Name',
273 'plant.id': 'projectID'})
--> 274 .assign(projectID=lambda df: 'CARMA' + df.projectID.astype(str))
275 .loc[lambda df: df.Country.isin(config['target_countries'])]
276 .replace(dict(Fueltype={'COAL': 'Hard Coal',

C:\Users\TUBAF\Anaconda3\envs\powerplantmatching\lib\site-packages\pandas\core\frame.pyc in assign(self, **kwargs)
3313 results = OrderedDict()
3314 for k, v in kwargs.items():
-> 3315 results[k] = com._apply_if_callable(v, data)
3316
3317 # <= 3.5 and earlier

C:\Users\TUBAF\Anaconda3\envs\powerplantmatching\lib\site-packages\pandas\core\common.pyc in _apply_if_callable(maybe_callable, obj, **kwargs)
406
407 if callable(maybe_callable):
--> 408 return maybe_callable(obj, **kwargs)
409
410 return maybe_callable

C:\Users\TUBAF\powerplantmatching\powerplantmatching\data.py in <lambda>(df)
272 'plant': 'Name',
273 'plant.id': 'projectID'})
--> 274 .assign(projectID=lambda df: 'CARMA' + df.projectID.astype(str))
275 .loc[lambda df: df.Country.isin(config['target_countries'])]
276 .replace(dict(Fueltype={'COAL': 'Hard Coal',

C:\Users\TUBAF\Anaconda3\envs\powerplantmatching\lib\site-packages\pandas\core\generic.pyc in __getattr__(self, name)
4374 if self._info_axis._can_hold_identifiers_and_holds_name(name):
4375 return self[name]
-> 4376 return object.__getattribute__(self, name)
4377
4378 def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'projectID'

I am fairly inexperienced and don't know whether fixing parallel_duke_processes has anything to do with this error, or whether it is totally unrelated?

Untitled.pdf

FabianHofmann commented on June 14, 2024

Hey! Thank you for your report. This error is not related to parallel_duke_processes.
The problem occurs in the assignment of a new column when reading the CARMA data. I cannot reproduce the error, though.

I can only guess:
Can you please check your pandas version with

pd.__version__

? It might be that your version does not correctly support pd.DataFrame.assign with a lambda function.
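
For reference, the failing lines in data.py follow this pattern (a simplified sketch with made-up data, not the original code); if the rename does not actually produce a projectID column, for instance because the source file was not read correctly, the lambda raises exactly this AttributeError:

import pandas as pd

df = pd.DataFrame({'plant': ['A', 'B'], 'plant.id': [1, 2]})
df = (df.rename(columns={'plant': 'Name', 'plant.id': 'projectID'})
        .assign(projectID=lambda df: 'CARMA' + df.projectID.astype(str)))
print(df)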

Another guess:
Your data references might not be correct. Can you check the output of

pd.DataFrame(pm.data.data_config).loc['source_file']

and check whether the paths point to the correct files and whether these files exist?
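
A quick way to check all sources at once (a minimal sketch, assuming pm.data.data_config is a dict of per-source dicts, each with a source_file entry that is a single path or a list of paths):

import os
import powerplantmatching as pm

for name, conf in pm.data.data_config.items():
    src = conf.get('source_file')
    paths = src if isinstance(src, list) else [src]
    for p in paths:
        print(name, p, 'exists' if p and os.path.exists(p) else 'MISSING')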

KristinaGov commented on June 14, 2024

Thank you very much @FabianHofmann!

I checked pandas:
import pandas as pd
pd.__version__
C:....\powerplantmatching\lib\site-packages\matplotlib\colors.py:298: MatplotlibDeprecationWarning: The is_string_like function was deprecated in version 2.1.
if cbook.is_string_like(arg):
u'0.23.4'

And the output of the line:

pd.DataFrame(pm.data.data_config).loc['source_file']
BNETZA powerplantmatching..\data\in\Kraftwerksliste_...
CARMA powerplantmatching..\data\in\Full_CARMA_2009_...
ENTSOE powerplantmatching..\data\in\entsoe_powerplan...
ESE powerplantmatching..\data\in\projects.xls
GEO powerplantmatching..\data\in\global_energy_ob...
GPD powerplantmatching..\data\in\global_power_pla...
IWPDCY powerplantmatching..\data\in\IWPDCY.csv
OPSD [powerplantmatching..\data\in\conventional_po...
UBA powerplantmatching..\data\in\kraftwerke-de-ab...
WEPP powerplantmatching..\data\in\platts_wepp.csv
Name: source_file, dtype: object

BNETZA no such file
CARMA Full_CARMA_2009_Dataset_1.csv
ENTSOE entsoe_powerplants.csv
ESE projects.xlsx
GEO global_energy_observatory_power_plants.sqlite
GPD global_power_plant_database.csv
IWPDCY no such file
OPSD conventional_power_plants_DE.csv ; conventional_power_plants_EU.csv
UBA no such file
WEPP no such file

FabianHofmann commented on June 14, 2024

okay, could you try:

cd
pd.read_csv(pm.data.data_config['CARMA']['source_file'])

it should return a DataFrame with about 50k rows. Does that work?

KristinaGov commented on June 14, 2024

I bet it does not work like it should:
Unbenannt

FabianHofmann commented on June 14, 2024

yes, that's the problem. You do not have the full files, only the references. Please check if you have git lfs installed.
In your bash, in the repo, please try

git-lfs pull

or

git-lfs fetch

If nothing happens, please delete your repo and clone it again (make sure git lfs is installed).
If this is also not working, we will have to see how to work it out...
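
To verify whether a file is only a Git LFS pointer rather than the real data, you can also peek at its first line from Python (a minimal sketch, using the CARMA path from data_config; an un-fetched LFS file starts with the standard pointer header instead of a CSV header row):

import powerplantmatching as pm

path = pm.data.data_config['CARMA']['source_file']
with open(path) as f:
    first_line = f.readline()
# a pointer file begins with: version https://git-lfs.github.com/spec/v1
print(first_line)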
