
pypsa / powerplantmatching


Set of tools to combine multiple power plant databases

Home Page: https://powerplantmatching.readthedocs.io/en/latest/

License: GNU General Public License v3.0

Python 100.00%

powerplantmatching's Introduction

PyPSA - Python for Power System Analysis


PyPSA stands for "Python for Power System Analysis". It is pronounced "pipes-ah".

PyPSA is an open source toolbox for simulating and optimising modern power and energy systems that include features such as conventional generators with unit commitment, variable wind and solar generation, storage units, coupling to other energy sectors, and mixed alternating and direct current networks. PyPSA is designed to scale well with large networks and long time series.

This project is maintained by the Department of Digital Transformation in Energy Systems at the Technical University of Berlin. Previous versions were developed by the Energy System Modelling group at the Institute for Automation and Applied Informatics at the Karlsruhe Institute of Technology funded by the Helmholtz Association, and by the Renewable Energy Group at FIAS to carry out simulations for the CoNDyNet project, financed by the German Federal Ministry for Education and Research (BMBF) as part of the Stromnetze Research Initiative.

Functionality

PyPSA can calculate:

  • static power flow (using both the full non-linear network equations and the linearised network equations)
  • linear optimal power flow (least-cost optimisation of power plant and storage dispatch within network constraints, using the linear network equations, over several snapshots)
  • security-constrained linear optimal power flow
  • total electricity/energy system least-cost investment optimisation (using linear network equations, over several snapshots and investment periods simultaneously for optimisation of generation and storage dispatch and investment in the capacities of generation, storage, transmission and other infrastructure)

It has models for:

  • meshed multiply-connected AC and DC networks, with controllable converters between AC and DC networks
  • standard types for lines and transformers following the implementation in pandapower
  • conventional dispatchable generators and links with unit commitment
  • generators with time-varying power availability, such as wind and solar generators
  • storage units with efficiency losses
  • simple hydroelectricity with inflow and spillage
  • coupling with other energy carriers (e.g. resistive Power-to-Heat (P2H), Power-to-Gas (P2G), battery electric vehicles (BEVs), Fischer-Tropsch, direct air capture (DAC))
  • basic components out of which more complicated assets can be built, such as Combined Heat and Power (CHP) units and heat pumps.
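As a rough illustration of building a more complicated asset from these basic components (a sketch with made-up bus and component names, not an official example), a simple heat pump can be modelled as a Link between an electricity bus and a heat bus:

import pypsa

n = pypsa.Network()
n.add("Bus", "electricity")
n.add("Bus", "heat")
n.add("Load", "heat demand", bus="heat", p_set=50)
n.add("Generator", "grid", bus="electricity", p_nom=100, marginal_cost=30)
# a heat pump is just a Link converting electricity to heat with a COP of 3
n.add("Link", "heat pump", bus0="electricity", bus1="heat", p_nom=20, efficiency=3.0)
n.optimize()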

Documentation

Documentation

Quick start

Examples

Known users of PyPSA

Installation

pip:

pip install pypsa

conda/mamba:

conda install -c conda-forge pypsa

Additionally, install a solver.
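For example, the open-source HiGHS solver can be installed via its Python wrapper (one option among several; any solver supported by linopy works):

pip install highspy

or

conda install -c conda-forge highspy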

Usage

import pypsa

# create a new network
n = pypsa.Network()
n.add("Bus", "mybus")
n.add("Load", "myload", bus="mybus", p_set=100)
n.add("Generator", "mygen", bus="mybus", p_nom=100, marginal_cost=20)

# load an example network
n = pypsa.examples.ac_dc_meshed()

# run the optimisation
n.optimize()

# plot results
n.generators_t.p.plot()
n.plot()

# get statistics
n.statistics()
n.statistics.energy_balance()

There are more extensive examples available as Jupyter notebooks. They are also described in doc/examples.rst and are available as Python scripts in examples/.

Screenshots

PyPSA-Eur optimising capacities of generation, storage and transmission lines (9% line volume expansion allowed) for a 95% reduction in CO2 emissions in Europe compared to 1990 levels


SciGRID model simulating the German power system for 2015.


Dependencies

PyPSA is written and tested to be compatible with Python 3.7 and above. The last release supporting Python 2.7 was PyPSA 0.15.0.

It leans heavily on the following Python packages:

  • pandas for storing data about components and time series
  • numpy and scipy for calculations, such as linear algebra and sparse matrix calculations
  • networkx for some network calculations
  • matplotlib for static plotting
  • linopy for preparing optimisation problems (currently only linear and mixed integer linear optimisation)
  • cartopy for plotting the baselayer map
  • pytest for unit testing
  • logging for managing messages

The optimisation uses interface libraries like linopy which are independent of the preferred solver. You can use e.g. one of the free solvers HiGHS, GLPK and CLP/CBC or the commercial solver Gurobi for which free academic licenses are available.
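For instance, the solver can be selected when running the optimisation (shown here for HiGHS; the keyword follows the current linopy-based interface, so check the documentation of your installed version):

n.optimize(solver_name="highs")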

Documentation

Please check the documentation.

Contributing and Support

We strongly welcome anyone interested in contributing to this project. If you have any ideas, suggestions or encounter problems, feel invited to file issues or make pull requests on GitHub.

  • In case of code-related questions, please post on Stack Overflow.
  • For non-programming and more general questions, please refer to the mailing list.
  • To discuss with other PyPSA users, organise projects, share news, and get in touch with the community, you can use the Discord server.
  • For bugs and feature requests, please use the PyPSA GitHub Issues page.
  • For troubleshooting, please check the troubleshooting section in the documentation.

Code of Conduct

Please respect our code of conduct.

Citing PyPSA

If you use PyPSA for your research, we would appreciate it if you would cite the following paper:

Please use the following BibTeX:

@article{PyPSA,
   author = {T. Brown and J. H\"orsch and D. Schlachtberger},
   title = {{PyPSA: Python for Power System Analysis}},
   journal = {Journal of Open Research Software},
   volume = {6},
   issue = {1},
   number = {4},
   year = {2018},
   eprint = {1707.09913},
   url = {https://doi.org/10.5334/jors.188},
   doi = {10.5334/jors.188}
}

If you want to cite a specific PyPSA version, each release of PyPSA is stored on Zenodo with a release-specific DOI. The release-specific DOIs can be found linked from the overall PyPSA Zenodo DOI for Version 0.17.1 and onwards:


or from the overall PyPSA Zenodo DOI for Versions up to 0.17.0:


Licence

Copyright 2015-2024 PyPSA Developers

PyPSA is licensed under the open source MIT License.

powerplantmatching's People

Contributors

coroa, davide-f, energyls, eugenio2192, euronion, fabianhofmann, febinka, fgotzens, fneum, irieo, jensch-dlr, martacki, martinhjel, pre-commit-ci[bot], pz-max, rbaard1, tomkourou


powerplantmatching's Issues

countrycode issue in data.py

Hello,

I found a piece of code in data.py that raises an error when running it.
This part (data.py, lines 90-91):

opsd.Country = countrycode(codes=opsd.Country.tolist(), target='country_name', origin='iso2c')

raises the following error:
TypeError: 'module' object is not callable

The solution for me was to change it to:
opsd.Country = countrycode.countrycode(codes=opsd.Country.tolist(), target='country_name', origin='iso2c')

I used Python 3.5 via Spyder

Best regards

Return unit level aggregation instead of power plant level

I was wondering whether there is a way of returning the power plants at unit level, without the vertical aggregation? E.g. instead of returning Belleville, Nuclear, 2620 MW, there would be two rows: Belleville 1, Nuclear, 1310 MW and Belleville 2, Nuclear, 1310 MW. (A possible approach is sketched below.)
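Not an authoritative answer, but one possible approach: the per-source readers return the data before the aggregation step, so skipping aggregate_units() keeps the individual blocks/units (a sketch, shown here for ENTSO-E):

import powerplantmatching as pm

entsoe_units = pm.data.ENTSOE()                              # unit/block level
entsoe_plants = entsoe_units.powerplant.aggregate_units()    # aggregated to plant level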

ESE database not found

Hi,

When trying to compile for the first time, the ESE dataset is missing and both suggested links are invalid. Can anyone update the links, please?

Aggregate_units() does not work

Hello,

I've been trying to run the "Example of use.ipynb" and everything works fine except for aggregate_units(), which leads to the error message below.

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-13-a18a659aafe9> in <module>
----> 1 dfs = [geo.powerplant.aggregate_units(), entsoe.powerplant.aggregate_units()]
      2 intersection = pm.matching.combine_multiple_datasets(dfs)

~\Anaconda3\lib\site-packages\powerplantmatching\cleaning.py in aggregate_units(df, dataset_name, pre_clean_name, save_aggregation, country_wise, use_saved_aggregation, config)
    339         if country_wise:
    340             duplicates = pd.concat([duke(df.query('Country == @c'))
--> 341                                     for c in df.Country.unique()])
    342         else:
    343             duplicates = duke(df)

~\Anaconda3\lib\site-packages\powerplantmatching\cleaning.py in <listcomp>(.0)
    339         if country_wise:
    340             duplicates = pd.concat([duke(df.query('Country == @c'))
--> 341                                     for c in df.Country.unique()])
    342         else:
    343             duplicates = duke(df)

~\Anaconda3\lib\site-packages\powerplantmatching\duke.py in duke(datasets, labels, singlematch, showmatches, keepfiles, showoutput)
    109 
    110         run = sub.Popen(args, stderr=sub.PIPE, cwd=tmpdir, stdout=stdout,
--> 111                         universal_newlines=True)
    112         _, stderr = run.communicate()
    113 

~\Anaconda3\lib\subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors, text)
    773                                 c2pread, c2pwrite,
    774                                 errread, errwrite,
--> 775                                 restore_signals, start_new_session)
    776         except:
    777             # Cleanup if the child failed starting.

~\Anaconda3\lib\subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_start_new_session)
   1176                                          env,
   1177                                          os.fspath(cwd) if cwd is not None else None,
-> 1178                                          startupinfo)
   1179             finally:
   1180                 # Child is launched. Close the parent's copy of those pipe

FileNotFoundError: [WinError 2] The system cannot find the file specified

PHS capacity

Hi,

With regard to pumped-hydro storage (PHS), the newest version of powerplantmatching shows an existing energy storage capacity of 4.3 TWh (Europe-aggregate, assuming a 6-hour duration for plants that do not have a duration specified). In the earlier version 0.4.8, this number was higher (10 TWh). For comparison, Geth et al. (2015) reported a PHS energy capacity of 1.3 TWh (including Norway and Switzerland) based on 2012 numbers. PHS power capacity has increased from roughly 50 to 55 GW between 2014 and 2020 (iha, 2015, 2021), so the energy storage capacity has most likely increased as well. But is it fair to say that the energy storage capacity has almost quadrupled since 2012 (from 1.3 TWh to 4.3 TWh)? Or how should we interpret this discrepancy?

Looking forward to hearing from you.

Sources:
Geth et al., 2015, https://doi.org/10.1016/j.rser.2015.07.145
iha, 2015, https://www.aler-renovaveis.org/contents/lerpublication/iha_2015_sept_hydropower-status-report.pdf
iha, 2021, https://assets-global.website-files.com/5f749e4b9399c80b5e421384/60c37321987070812596e26a_IHA20212405-status-report-02_LR.pdf

Undocumented dependency (java)

PPM requires Java for Duke. Java is not documented as a dependency.

For conda users, openjdk is available on conda-forge (conda install -c conda-forge openjdk). Not sure about the options for a pip installation.

The dependency should at least be documented somewhere.

Ping pypsa-meets-earth/pypsa-earth#279

Add datasets to custom configuration

Hi,

How can I add additional datasets to a custom configuration of the yaml? And where should I save the modified file for the changes to be applied?
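Not an official answer, but based on the interface used elsewhere in these issues (pm.get_config and the config keyword), something along these lines should work; the file path is only an example:

import powerplantmatching as pm

config = pm.get_config("./custom.yaml")            # load the modified configuration
df = pm.powerplants(config=config, update=True)    # rebuild with the custom settings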

Downloading collection matched data I get 'TypeError'

When I attempt to download the processed power plant data I get a TypeError.
I have verified my pandas version: it is 0.23.4, and powerplantmatching requires pandas 0.23.0 or greater, so it should be fine...

Code:

import powerplantmatching as pm
import pandas as pd

all_data = pm.collection.matched_data()

Error:

File "C:\Users\sab\AppData\Local\Continuum\anaconda3\lib\site-packages\powerplantmatching\cleaning.py", line 290, in mode
    return x.mode(dropna=False).at[0]

TypeError: mode() got an unexpected keyword argument 'dropna'

Problem with some power plants start year (DateIn)

Hi there!

I've been running some simulations with the latest version of pypsa-eur-sec, but every time I tried to do a myopic run, I got the following error:

Traceback (most recent call last):
File "/home/parisr/projects/pypsa-eur-sec/.snakemake/scripts/tmpy5ti4rwe.add_existing_baseyear.py", line 491, in
add_power_capacities_installed_before_baseyear(n, grouping_years, costs, baseyear)
File "/home/parisr/projects/pypsa-eur-sec/.snakemake/scripts/tmpy5ti4rwe.add_existing_baseyear.py", line 177, in add_power_capacities_installed_before_baseyear
df_agg["grouping_year"] = np.take(
File "<array_function internals>", line 180, in take
File "/home/parisr/miniconda3/envs/pypsa-eur/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 190, in take
return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
File "/home/parisr/miniconda3/envs/pypsa-eur/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 54, in _wrapfunc
return _wrapit(obj, method, *args, **kwds)
File "/home/parisr/miniconda3/envs/pypsa-eur/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 43, in _wrapit
result = getattr(asarray(obj), method)(*args, **kwds)
IndexError: index 9 is out of bounds for axis 0 with size 9

After some debugging, I realised there were some power plants in powerplants.csv whose starting year is after the last year in my grouping years (grouping_years: [1980, 1985, 1990, 1995, 2000, 2005, 2010, 2015, 2019]). I changed my end year to 2022 instead of 2019, which solved the issue for some power plants, but then I noticed that there are currently 5 nuclear plants with a starting year of 2027, which seemed incorrect after searching their names on Google. I managed to solve my problem by changing the starting years to 1983, 1996, etc. I just wanted to know whether this is intentional or a mistake. Here are the links for the 5 power plants:

https://github.com/FRESNA/powerplantmatching/blob/b0e5a05773b88d40e99f73fd28606cdc6ea3b240/powerplants.csv#L86
https://github.com/FRESNA/powerplantmatching/blob/b0e5a05773b88d40e99f73fd28606cdc6ea3b240/powerplants.csv#L96
https://github.com/FRESNA/powerplantmatching/blob/b0e5a05773b88d40e99f73fd28606cdc6ea3b240/powerplants.csv#L401
https://github.com/FRESNA/powerplantmatching/blob/b0e5a05773b88d40e99f73fd28606cdc6ea3b240/powerplants.csv#L512
https://github.com/FRESNA/powerplantmatching/blob/b0e5a05773b88d40e99f73fd28606cdc6ea3b240/powerplants.csv#L559

Thanks :)

Add options for automated sub-national aggregation

It would be cool to have an automated option for aggregating power plants into sub-national clusters within each country, based on standard sub-national units.

For instance, it would be really nice if the user could choose, as an option, the level of spatial aggregation, e.g.:

  • NUTS2
  • NUTS3
  • GADM

So, instead of having as output just the aggregate capacity of each European country, one could have the aggregate capacity of each sub-national region of interest. This would greatly facilitate coupling the project to any power system model. (A rough sketch of one way to do this today is given below.)
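Not a feature of the package, just a rough sketch of how such an aggregation could be done manually with a spatial join, assuming a NUTS boundary file has been downloaded locally (the file name below is a placeholder):

import geopandas as gpd
import powerplantmatching as pm

plants = pm.powerplants(from_url=True)
gdf = gpd.GeoDataFrame(
    plants, geometry=gpd.points_from_xy(plants.lon, plants.lat), crs="EPSG:4326"
)
# placeholder path; any NUTS/GADM polygon layer with a region identifier works
nuts = gpd.read_file("NUTS_RG_01M_2021_4326.geojson")
joined = gpd.sjoin(gdf, nuts[["NUTS_ID", "geometry"]], how="left", predicate="within")
capacity_by_region = joined.groupby(["NUTS_ID", "Fueltype"]).Capacity.sum()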

heuristics.py:58 - IndexingError

Hey guys,

while running the updated code I got the following Warning/Error:

powerplantmatching/heuristics.py:58: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  extend_by = extend_by[extend_by_b]

File "anaconda/envs/py35/lib/python3.5/site-packages/pandas/core/indexing.py", line 1817, in check_bool_indexer
    raise IndexingError('Unalignable boolean Series key provided')
IndexingError: Unalignable boolean Series key provided

I tried to run the code with Python 2.7 as well as 3.5. I attached a screenshot of the error.

Best regards
Simon


Beware LIO Code = 47 not always Ireland (the Country)

tbh, not entirely sure whether I'd class this as a 'bug', but definitely a health warning on how the data is structured as related to Northern Ireland.

This is not Ireland (the country), of course, but Ireland the island, which is what EirGrid covers. Just to note that some of the 47 plants are in the UK, which I'm not sure is really captured in the data structures in place, unless I'm missing something.

Set not correctly assigned

The Set column is sometimes not correctly assigned. Some pumped hydro storage plants are declared as PP, and all run-of-river plants as Store.

import powerplantmatching as pm
df = pm.powerplants(from_url=True)
df.groupby("Set").Technology.value_counts()

pm.powerplants(reduced=False) not working

Hello,

I'm very much new to Python and programming in general. I tried to use powerplantmatching for my research and installed it as described with "pip install powerplantmatching entsoe-py --no-deps" and
"conda install pandas networkx pycountry xlrd seaborn pyyaml requests matplotlib geopy beautifulsoup4 cartopy". But for some reason it doesn't seem to work properly. I'm using macOS and work with Python in a Jupyter Notebook.
Now if I use "pm.powerplants(from_url=True)" it works and shows me the database,
but if I use "pm.powerplants(reduced=False)" it shows me this error:
"FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/z2/9bwgw7_x5pl5wwx_405s91t00000gn/T/tmpka0l6pr6/linkfile.txt'".

I hope you can help me with my issue.

can't update / load initial

Hi,

When trying:

import powerplantmatching as pm
pm.collection.matched_data()

Running Windows 7, installed env with conda -f requirements.yaml

I get:

INFO:powerplantmatching.collection:Collect combined dataset for CARMA, ENTSOE, ESE, GEO, GPD, IWPDCY, OPSD
INFO:powerplantmatching.utils:Run process with 2 parallel threads.
Traceback (most recent call last):
  File "C:\Anaconda3\envs\ppm\lib\runpy.py", line 163, in _run_module_as_main
    mod_name, _Error)
  File "C:\Anaconda3\envs\ppm\lib\runpy.py", line 102, in _get_module_details
    loader = get_loader(mod_name)
  File "C:\Anaconda3\envs\ppm\lib\pkgutil.py", line 462, in get_loader
    return find_loader(fullname)
  File "C:\Anaconda3\envs\ppm\lib\pkgutil.py", line 472, in find_loader
    for importer in iter_importers(fullname):
  File "C:\Anaconda3\envs\ppm\lib\pkgutil.py", line 428, in iter_importers
    __import__(pkg)
  File "update_all_matches.py", line 13, in <module>
    update=True, use_saved_aggregation=True, use_saved_matches=False)
  File "powerplantmatching\collection.py", line 177, in matched_data
    matched = collect(matching_sources, **collection_kwargs)
  File "powerplantmatching\collection.py", line 100, in collect
    dfs = parmap(df_by_name, datasets)
  File "powerplantmatching\utils.py", line 320, in parmap
    p.start()
  File "C:\Anaconda3\envs\ppm\lib\multiprocessing\process.py", line 130, in start
    self._popen = Popen(self)
  File "C:\Anaconda3\envs\ppm\lib\multiprocessing\forking.py", line 277, in __init__
    dump(process_obj, to_child, HIGHEST_PROTOCOL)
  File "C:\Anaconda3\envs\ppm\lib\multiprocessing\forking.py", line 199, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "C:\Anaconda3\envs\ppm\lib\pickle.py", line 224, in dump
    self.save(obj)
  File "C:\Anaconda3\envs\ppm\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Anaconda3\envs\ppm\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Anaconda3\envs\ppm\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Anaconda3\envs\ppm\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Anaconda3\envs\ppm\lib\pickle.py", line 687, in _batch_setitems
    save(v)
  File "C:\Anaconda3\envs\ppm\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Anaconda3\envs\ppm\lib\pickle.py", line 554, in save_tuple
    save(element)
  File "C:\Anaconda3\envs\ppm\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Anaconda3\envs\ppm\lib\pickle.py", line 754, in save_global
    (obj, module, name))
pickle.PicklingError: Can't pickle <function df_by_name at 0x000000000B6289E8>: it's not found as powerplantmatching.collection.df_by_name

conda list:

asn1crypto                0.24.0                py27_1003    conda-forge
backports                 1.0                        py_2    conda-forge
backports.functools_lru_cache 1.5                        py_1    conda-forge
backports.shutil_get_terminal_size 1.0.0                      py_3    conda-fo
backports_abc             0.5                        py_1    conda-forge
basemap                   1.1.0                    py27_2    conda-forge
blas                      1.0                         mkl
ca-certificates           2018.10.15           ha4d7672_0    conda-forge
certifi                   2018.10.15            py27_1000    conda-forge
cffi                      1.11.5          py27h0c8e037_1001    conda-forge
chardet                   3.0.4                 py27_1003    conda-forge
colorama                  0.4.0                      py_0    conda-forge
cryptography              2.3.1            py27hc64555f_0    conda-forge
cryptography-vectors      2.3.1                 py27_1000    conda-forge
cycler                    0.10.0                     py_1    conda-forge
decorator                 4.3.0                      py_0    conda-forge
enum34                    1.1.6                 py27_1001    conda-forge
freetype                  2.9.1             hf819b56_1004    conda-forge
functools32               3.2.3.2                    py_3    conda-forge
futures                   3.2.0                 py27_1000    conda-forge
geographiclib             1.49                       py_0    conda-forge
geopy                     1.17.0                     py_0    conda-forge
geos                      3.5.1                     vc9_1  [vc9]  conda-forge
icc_rt                    2017.0.4             h97af966_0
icu                       58.2                      vc9_0  [vc9]  conda-forge
idna                      2.7                   py27_1002    conda-forge
intel-openmp              2019.0                      118
ipaddress                 1.0.22                     py_1    conda-forge
ipython                   5.8.0                    py27_0    conda-forge
ipython_genutils          0.2.0                      py_1    conda-forge
jpeg                      9b                        vc9_2  [vc9]  conda-forge
kiwisolver                1.0.1           py27hdc96acc_1002    conda-forge
libpng                    1.6.34                    vc9_0  [vc9]  conda-forge
matplotlib                2.2.3            py27h7423b85_0    conda-forge
mkl                       2019.0                      118
mkl_fft                   1.0.6                    py27_0    conda-forge
networkx                  2.2                        py_1    conda-forge
numpy                     1.15.4           py27hbe4291b_0
numpy-base                1.15.4           py27h2753ae9_0
openssl                   1.0.2o                    vc9_0  [vc9]  conda-forge
pandas                    0.23.4          py27h39f3610_1000    conda-forge
pathlib2                  2.3.2                 py27_1000    conda-forge
patsy                     0.5.1                      py_0    conda-forge
pickleshare               0.7.5                 py27_1000    conda-forge
pip                       18.1                  py27_1000    conda-forge
powerplantmatching        0.10.0                    <pip>
prompt_toolkit            1.0.15                     py_1    conda-forge
pycountry                 18.5.26                   <pip>
pycparser                 2.19                       py_0    conda-forge
pygments                  2.2.0                      py_1    conda-forge
pyopenssl                 18.0.0                py27_1000    conda-forge
pyparsing                 2.3.0                      py_0    conda-forge
pyproj                    1.9.4                    py27_0    conda-forge
pyqt                      5.6.0            py27h4cbc711_7    conda-forge
pyshp                     2.0.0                      py_0    conda-forge
pysocks                   1.6.8                 py27_1002    conda-forge
python                    2.7.15            h2880e7c_1004    conda-forge
python-dateutil           2.7.5                      py_0    conda-forge
pytz                      2018.7                     py_0    conda-forge
pyyaml                    3.13            py27h0c8e037_1001    conda-forge
qt                        5.6.2                     vc9_1  [vc9]  conda-forge
requests                  2.20.1                py27_1000    conda-forge
scandir                   1.9.0           py27h0c8e037_1000    conda-forge
scipy                     1.1.0            py27h2df7626_1
seaborn                   0.9.0                      py_0    conda-forge
setuptools                40.5.0                   py27_0    conda-forge
simplegeneric             0.8.1                      py_1    conda-forge
singledispatch            3.4.0.3               py27_1000    conda-forge
sip                       4.18.1           py27hc56fc5f_0    conda-forge
six                       1.11.0                py27_1001    conda-forge
statsmodels               0.9.0           py27h0c8e037_1000    conda-forge
tornado                   5.1.1           py27h0c8e037_1000    conda-forge
traitlets                 4.3.2                 py27_1000    conda-forge
urllib3                   1.23                  py27_1001    conda-forge
vc                        9                             0    conda-forge
vs2008_runtime            9.0.30729.6161                0    conda-forge
wcwidth                   0.1.7                      py_1    conda-forge
wheel                     0.32.2                   py27_0    conda-forge
win_inet_pton             1.0.1                    py27_2    conda-forge
win_unicode_console       0.5                   py27_1000    conda-forge
wincertstore              0.2                   py27_1002    conda-forge
xlrd                      1.1.0                      py_2    conda-forge
yaml                      0.1.7                     vc9_0  [vc9]  conda-forge
zlib                      1.2.11                    vc9_0  [vc9]  conda-forge

config.yaml:

# io config
entsoe_token: XXX
google_api_key: 
opsd_vres_base_year: 2016

#matching config
# add a pandas query statement after the source name to filter the sources individually, e.g. - Carma: Fueltype=='Natural Gas'
# see http://pandas.pydata.org/pandas-docs/stable/indexing.html#the-query-method for info about query syntax
matching_sources:
    - CARMA
    - ENTSOE
    - ESE
    - GEO
    - GPD
    - IWPDCY
    - OPSD
    # - WEPP

fully_included_sources:
    - OPSD
    - ESE: Country not in ['Switzerland']
    - ENTSOE: Country not in ['Spain', 'Switzerland']
    - CARMA: Country == 'France' and Fueltype == 'Hydro'
    # - IWPDCY: lat == lat and Country == 'Sweden'

parallel_duke_processes: true
process_limit: 2

#data config
display_net_caps: true
remove_missing_coords: true
target_columns:
    - Name
    - Fueltype
    - Technology
    - Set
    - Country
    - Capacity
#    - Efficiency
    - Duration
    - YearCommissioned
    - Retrofit
    - lat
    - lon
    - File
    - projectID
target_countries:
#    - Austria
#    - Belgium
#    - Bulgaria
#    - Croatia
#    - Czech Republic
#    - Denmark
#    - Estonia
#    - Finland
#    - France
    - Germany
#    - Greece
#    - Hungary
#    - Ireland
#    - Italy
#    - Latvia
#    - Lithuania
#    - Luxembourg
#    - Netherlands
#    - Norway
#    - Poland
#    - Portugal
#    - Romania
#    - Slovakia
#    - Slovenia
#    - Spain
#    - Sweden
#    - Switzerland
#    - United Kingdom
target_fueltypes:
    - Bioenergy
    - Geothermal
    - Hard Coal
    - Hydro
    - Lignite
    - Natural Gas
    - Nuclear
    - Oil
    - Other
    - Solar
    - Waste
    - Wind
target_sets:
    - CHP
    - PP
    - Stores
target_technologies:
    - CCGT
    - OCGT
    - Steam Turbine
    - Combustion Engine
    - Run-Of-River
    - Pumped Storage
    - Reservoir
    - Marine
    - Onshore
    - Offshore
    - PV
    - CSP

# Allowed countries for matches of only CARMA and GEO
CARMA_GEO_countries:
#    -  Austria
#    -  Belgium
#    -  Bulgaria
#    -  Croatia
#    -  Denmark
#    -  Estonia
#    -  France
    -  Germany
#    -  Ireland
#    -  Italy
#    -  Luxembourg
#    -  Netherlands
#    -  Romania
#    -  Slovakia
#    -  Sweden
    
# heuristic config
fuel_to_lifetime:
    Bioenergy: 20
    Geothermal: 15
    Hard Coal: 45
    Hydro: 100
    Lignite: 45
    Natural Gas: 40
    Nuclear: 50
    Oil: 40
    Other: 5
    Solar: 25
    Waste: 25
    Wind: 25

# plotting config
fuel_to_color:
    OCGT: darkorange
    Hydro: royalblue
    Run-of-river: navy
    Ror: navy
    Lignite: orangered
    Nuclear: yellow
    Solar: gold
    Windoff: cornflowerblue
    Windon: steelblue
    Offshore: cornflowerblue
    Onshore: steelblue
    Wind: steelblue
    Bioenergy: g
    Natural Gas: firebrick
    CCGT: firebrick
    Coal: k
    Hard Coal: dimgray
    Oil: darkgreen
    Other: silver
    Waste: grey
    Geothermal: orange
    Battery: purple
    Hydrogen Storage: teal
    Electro-mechanical: teal
    Total: gold

TypeError in duke.py

Hello,

I noticed a TypeError in duke.py when running the code:
TypeError: a bytes-like object is required, not 'str'

in line 99:
if 'Error' in stderr: raise RuntimeError("duke failed: {}".format(stderr))

I could fix it by setting 'Error' as a byte instead of a string:
if b'Error' in stderr: raise RuntimeError("duke failed: {}".format(stderr))

Best regards

Any untrapped NaN values in Technology cast to string "nan", then treated as matches by Duke

I am not sure how confident I am that this is a 'bug' in the original code / combination of datasets, but I think it is worth reporting as at least a potential weakness. The line that creates the issue, a potential one at least, is:

.assign(**{col: df[col].astype(str) for col in ['Name', 'Country', 'Fueltype', 'Technology', 'Set', 'File']

which occurs in the aggregate_units function in the cleaning module. If the text "nan" is saved out to the input.csv that is then used by Duke, these values are seen as a match, thereby inappropriately affecting the outcome probability score. Duke seems to want to see empty strings/cells in the CSV in order to (properly) ignore values as missing.

There is also a coding style question as to whether type enforcement should be confirmed earlier in the cleaning processing and/or whether this should be embedded as a fixed list of column names, particularly as the XML editing suggests that column names can be user-defined.
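A minimal sketch of one way to avoid this (the column list and data are only illustrative, not the package's actual fix): blank out missing values before the string cast so Duke sees empty cells rather than the literal text "nan".

import pandas as pd

df = pd.DataFrame({"Name": ["Belleville", None], "Technology": [None, "CCGT"]})
str_cols = ["Name", "Technology"]
# replace NaN with empty strings before casting, so "nan" never reaches input.csv
df = df.assign(**{col: df[col].fillna("").astype(str) for col in str_cols})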

Adding Open Street Map (OSM) as a source

Some non-EU countries are harder to find consistent data for. Could OSM be added as a source for these countries? Not all generators have names, but some do have names and capacities.

A quick example for Turkey with the Overpass-turbo API: https://overpass-turbo.eu/#

/*
This has been generated by the overpass-turbo wizard.
The original search was:
“power=plant in Turkey”
*/
[out:json][timeout:110];
// fetch area “Turkey” to search in
{{geocodeArea:Turkey}}->.searchArea;
// gather results
(
  // query part for: “power=plant”
  node["power"="plant"](area.searchArea);
  way["power"="plant"](area.searchArea);
  relation["power"="plant"](area.searchArea);
  node["power"="generator"](area.searchArea);
  way["power"="generator"](area.searchArea);
  relation["power"="generator"](area.searchArea);
);
// print results
out body;
>;
out skel qt;


The system cannot find the file specified, when aggregating

Hello everyone!
I am new using the package, and I still feel a bit lost. I am trying to do the example that is in the repository (example.ipynb), but I'm facing an error when I run the line that puts the data sets on the same level of aggregation.

dfs = [geo.powerplant.aggregate_units(), opsd.powerplant.aggregate_units()]

I think it might be due to my configuration, maybe a missing library. Originally I thought I was running the script in the wrong environment or folder, but I checked and all of that is fine, so I don't know what might be causing the error.
This is the error I'm facing:

INFO:powerplantmatching.cleaning:Aggregating blocks in data source 'GEO'.
ERROR:powerplantmatching.duke:Java was not found on your system.

FileNotFoundError Traceback (most recent call last)
File ~.conda\envs\ThesisCarlos\lib\site-packages\powerplantmatching\duke.py:130, in duke(datasets, labels, singlematch, showmatches, keepfiles, showoutput)
129 try:
--> 130 run = sub.Popen(
131 args,
132 stderr=sub.PIPE,
133 cwd=tmpdir,
134 stdout=stdout,
135 universal_newlines=True,
136 )
137 except FileNotFoundError:

File ~.conda\envs\ThesisCarlos\lib\subprocess.py:858, in Popen.init(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors, text)
855 self.stderr = io.TextIOWrapper(self.stderr,
856 encoding=encoding, errors=errors)
--> 858 self._execute_child(args, executable, preexec_fn, close_fds,
859 pass_fds, cwd, env,
860 startupinfo, creationflags, shell,
861 p2cread, p2cwrite,
862 c2pread, c2pwrite,
863 errread, errwrite,
864 restore_signals, start_new_session)
865 except:
866 # Cleanup if the child failed starting.

File ~.conda\envs\ThesisCarlos\lib\subprocess.py:1311, in Popen._execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_start_new_session)
1310 try:
-> 1311 hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
1312 # no special security
1313 None, None,
1314 int(not close_fds),
1315 creationflags,
1316 env,
1317 cwd,
1318 startupinfo)
1319 finally:
1320 # Child is launched. Close the parent's copy of those pipe
1321 # handles that only the child should have open. You need
(...)
1324 # pipe will not close when the child process exits and the
1325 # ReadFile will hang.

FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

FileNotFoundError Traceback (most recent call last)
c:\Users\c.perez.villanueva\Documents\Thesis\Codes\Mergingdata.py in <cell line: 1>()
----> 15 dfs = [geo.powerplant.aggregate_units(), opsd.powerplant.aggregate_units()]
16 '''dfs = [geo.powerplant.aggregate_units(), beyondcoal.powerplant.aggregate_units(), opsd.powerplant.aggregate_units(), gpd.powerplant.aggregate_units()]'''

File ~.conda\envs\ThesisCarlos\lib\site-packages\powerplantmatching\cleaning.py:447, in aggregate_units(df, dataset_name, pre_clean_name, country_wise, config, **kwargs)
445 if country_wise:
446 countries = df.Country.unique()
--> 447 duplicates = pd.concat([duke(df.query("Country == @c")) for c in countries])
448 else:
449 duplicates = duke(df)

File ~.conda\envs\ThesisCarlos\lib\site-packages\powerplantmatching\cleaning.py:447, in <listcomp>(.0)
445 if country_wise:
446 countries = df.Country.unique()
--> 447 duplicates = pd.concat([duke(df.query("Country == @c")) for c in countries])
448 else:
449 duplicates = duke(df)

File ~.conda\envs\ThesisCarlos\lib\site-packages\powerplantmatching\duke.py:140, in duke(datasets, labels, singlematch, showmatches, keepfiles, showoutput)
138 err = "Java was not found on your system."
139 logger.error(err)
--> 140 raise FileNotFoundError(err)
142 _, stderr = run.communicate()
144 if showmatches:

FileNotFoundError: Java was not found on your system.

Thank you in advance for your help

Issue with reloading and rematching

Setting from_url=False (the default) apparently does not reload all the necessary data files.
Adding update_all=True doesn't work either.
How can I rerun the whole pipeline after adapting the manual config file?

License of the pictures

Hi,

thank you for sharing your powerplantmatching tool under an open license!
I was wondering whether your pictures are also available under an open license, so that they can be used for talks and so on. In particular, I am interested in the comparison picture of #1 (comment).

Thank you!

How do I know the integration of a new dataset was successful

Hi,

I followed your instructions Integrating a new dataset:

In the last step I got this:

In[9]
pm.data.JRC_OPEN_UNITS()
INFO:numexpr.utils:NumExpr defaulting to 4 threads.

Out[9]
Empty DataFrame
Columns: [Name, Fueltype, Technology, Set, Country, Capacity, Efficiency, Duration, 
Volume_Mm3, DamHeight_m, YearCommissioned, Retrofit, lat, lon, projectID]
Index: []

Before updating, is there a way to check what I have done? With an empty Index, I'm not sure what I have done is correct.


It might be useful to include the function I added to data.py:

def JRC_OPEN_UNITS(raw=False, config=None, update=False):

    config = get_config() if config is None else config

    df = parse_if_not_stored('JRC_OPEN_UNITS', update, config=config)
    if raw:
        return df
    return (df.rename(columns={'id': 'projectID',
                               'name_p': 'Name',
                               'capacity_p': 'Capacity',
                               'country': 'Country',
                               'type_g': 'Fueltype',
                               'lat': 'lat',
                               'lon': 'lon',
                               'year_commissioned': 'YearCommissioned'})
            .loc[lambda df: df.Country.isin(config['target_countries'])]
            .replace(dict(Fueltype={'Fossil Gas': 'Natural Gas',
                                    'Fossil Coal-derived gas': 'Natural Gas',
                                    'Hydro Run-of-river and poundage': 'Hydro',
                                    'Hydro Pumped Storage': 'Hydro',
                                    'Hydro Water Reservoir': 'Hydro',
                                    'Fossil Hard coal': 'Hard Coal',
                                    'Fossil Hard Coal': 'Hard Coal',
                                    'Fossil Brown coal/Lignite': 'Lignite',
                                    'Fossil Oil': 'Oil',
                                    'Fossil Oil shale': 'Oil',
                                    'Wind Offshore': 'Wind',
                                    'Wind Onshore': 'Wind',
                                    'Biomass': 'Bioenergy',
                                    'Fossil Peat': 'Bioenergy',
                                    'Marine': 'Other'}))
            .drop(columns=['name_g', 'capacity_g', 'NUTS2', 'status_g', 'year_decommissioned'])
            .powerplant.convert_alpha2_to_country()
            .pipe(set_column_name, 'JRC_OPEN_UNITS')
            .pipe(config_filter))

WEPP Integration is outdated

I tried adding the WEPP dataset, but it seems that the data function uses what appears to be an old method; it references a config entry that is not there ("config['WEPP']['source_file']"). I solved it locally by implementing the "parse_if_not_stored" function, but I don't know if that's ideal.
I had to use custom kwargs because when I exported the dataset on my computer it used a European locale, so I had to specify the separator symbols etc. Maybe include this as part of the config options? I have an example from my own project (kudos, because I partially use yours); look at the InputConfig field:

https://gitlab.com/dlr-ve/autumn/-/blob/master/autumn/config.yaml

CSV Files missing

I installed the package, but running any of the functions fails because of missing CSV files. Any hint on how to get hold of the required CSVs would be very much appreciated.

Append BNetzA-MaStR renewables to power plants

I would like to add the wind and solar power plants from the BNetzA-MaStR to the matched power plant dataset (i.e. no matching of solar and wind, just concatenated at the end). I assume I need to create a function in data.py that translates the data to the common format? I read there's already the option to add wind and solar from OPSD. Where can I find this? I could draw some inspiration from that.
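Not an official answer, just a rough sketch of what such a translation function in data.py could look like, following the pattern of the other readers (get_config, parse_if_not_stored, set_column_name and config_filter are the helpers already used there); all MaStR column names are assumptions and would need to be checked against the actual export:

def BNETZA_MASTR(raw=False, config=None, update=False):
    config = get_config() if config is None else config

    df = parse_if_not_stored('BNETZA_MASTR', update, config=config)
    if raw:
        return df
    return (df.rename(columns={'EinheitMastrNummer': 'projectID',        # hypothetical column names
                               'NameStromerzeugungseinheit': 'Name',
                               'Nettonennleistung': 'Capacity',
                               'Energietraeger': 'Fueltype',
                               'Breitengrad': 'lat',
                               'Laengengrad': 'lon'})
            .assign(Country='Germany', Set='PP')
            .pipe(set_column_name, 'BNETZA_MASTR')
            .pipe(config_filter))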

TypeError in extended_by_non_matched

I am running the module from a custom config file. Basically, I am selecting AT as target country and {Natural Gas, Hydro, Wind} as target fuel types. Main point, I am trying to obtain subsets of the already existing database.

I am importing the config file as:
config = pm.get_config(path_to_config)

And I call the powerplants module as:
plants = pm.powerplants(config, from_url=False, stored=False)

The following error pops up, with the printed (and trimmed to 5 elements only) lists being included_ids defined in the extended_by_non_matched function:

INFO:powerplantmatching.collection:Collect combined dataset for ENTSOE, GEO, GPD, JRC, OPSD
['OEU5814', 'OEU5717', 'OEU5816', 'OEU5815', 'OEU5809']
INFO:powerplantmatching.cleaning:Aggregating blocks to entire units in 'OPSD'.
['H284', 'H242', 'H174', 'H225', 'H194']
INFO:powerplantmatching.cleaning:Aggregating blocks to entire units in 'JRC'.
['14W-BFR-KW-----A', '14W-TZG-KW-----H', '14W-BYP-KW-----6', '14W-BWM-KW-----L', '14W-TZR-KW-----I']
INFO:powerplantmatching.cleaning:Aggregating blocks to entire units in 'ENTSOE'.
0

TypeError: only list-like objects are allowed to be passed to isin(), you passed a [int]

As can be seen, there is an int popping up in (what I think is) the aggregation of the matched database with extra elements from one of the reliable sources. Could this be caused by my approach to getting customised results?

Improve final dataset creation

In the last version there were duplicated entries in the final dataset. These likely come from geo-positions which were not correctly parsed. A better approach would be to disable the fill_geoposition step in the OPSD and ENTSOE datasets, and enable the geoparsing for the matched dataset to fill up coordinates.

AttributeError: 'DataFrame' object has no attribute 'Name'

Error when trying to load any data. (using e.g. pm.data.ENTSOE() or pm.powerplants() )

Example:

import powerplantmatching as pm
entsoe = pm.data.ENTSOE()

AttributeError Traceback (most recent call last)
in
----> 1 entsoe = pm.data.ENTSOE()

/media/chadhat/4d105adc-3356-4a16-9761-ee0dcd7f23dc/Work/Networks/database-matching/env/lib/python3.8/site-packages/powerplantmatching/data.py in ENTSOE(raw, update, config, entsoe_token, **fill_geoposition_kwargs)
721
722 return (
--> 723 df.rename_axis(index="projectID")
724 .reset_index()
725 .rename(columns=RENAME_COLUMNS)

/media/chadhat/4d105adc-3356-4a16-9761-ee0dcd7f23dc/Work/Networks/database-matching/env/lib/python3.8/site-packages/pandas/core/frame.py in assign(self, **kwargs)
4484
4485 for k, v in kwargs.items():
-> 4486 data[k] = com.apply_if_callable(v, data)
4487 return data
4488

/media/chadhat/4d105adc-3356-4a16-9761-ee0dcd7f23dc/Work/Networks/database-matching/env/lib/python3.8/site-packages/pandas/core/common.py in apply_if_callable(maybe_callable, obj, **kwargs)
356 """
357 if callable(maybe_callable):
--> 358 return maybe_callable(obj, **kwargs)
359
360 return maybe_callable

/media/chadhat/4d105adc-3356-4a16-9761-ee0dcd7f23dc/Work/Networks/database-matching/env/lib/python3.8/site-packages/powerplantmatching/data.py in <lambda>(df)
726 .drop_duplicates("projectID")
727 .assign(
--> 728 Name=lambda df: df.Name.str.replace("_", " "), # for geoparsing
729 EIC=lambda df: df.projectID,
730 Country=lambda df: df.projectID.str[:2].map(COUNTRY_MAP),

/media/chadhat/4d105adc-3356-4a16-9761-ee0dcd7f23dc/Work/Networks/database-matching/env/lib/python3.8/site-packages/pandas/core/generic.py in __getattr__(self, name)
5485 ):
5486 return self[name]
-> 5487 return object.__getattribute__(self, name)
5488
5489 def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'Name'

Custom config path is never accessed

According to the instructions in the README to add a custom dataset the package looks for custom configs in the directory returned by:

pm.core.package_config['custom_config']

However, the get_config function never accesses this directory; it goes instead into "repo_data_dir".

Line 28 of core.py:

    package_config = _package_data('config.yaml')
    custom_config = filename if filename else _package_data('custom.yaml')

I placed my custom config there, but I don't think it's intended to be like that.

Possible bug: where is Wind in Italy?

Hi,

I noticed that, for some reason, your data show 0 wind power plants in Italy, whereas Italy has about 10 GW of wind power installed to date. Where has this wind gone? It seems strange, considering the variety of databases you use, that this issue occurs.

Error in heuristics.py

Hey guys,

I spotted an issue caused by code in "heuristics.py" and "collection.py"

File "powerplantmatching/collection.py", line 117, in MATCHED_dataset
    matched = extend_by_non_matched(matched, OPSD(), 'OPSD', clean_added_data=True)

  File "powerplantmatching/heuristics.py", line 52, in extend_by_non_matched
    not_included = pd.DataFrame(extend_by.projectID.tolist())

  File "anaconda/envs/py35/lib/python3.5/site-packages/pandas/core/generic.py", line 2744, in __getattr__
    return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'projectID'

Best regards

Issue with aggregate_units()

Hi!
I am trying to run the 'example of use.ipynb' file and run into the error message shown below (local paths removed).

thanks!
Samarth

c:\<>\powerplantmatching\powerplantmatching\cleaning.py in aggregate_units(df, dataset_name, pre_clean_name, save_aggregation, country_wise, use_saved_aggregation, config)

    339         if country_wise:
    340             duplicates = pd.concat([duke(df.query('Country == @c'))
--> 341                                     for c in df.Country.unique()])
    342         else:
    343             duplicates = duke(df)

c:<>\powerplantmatching\powerplantmatching\cleaning.py in <listcomp>(.0)
    339         if country_wise:
    340             duplicates = pd.concat([duke(df.query('Country == @c'))
--> 341                                     for c in df.Country.unique()])
    342         else:
    343             duplicates = duke(df)

c:<>\powerplantmatching\powerplantmatching\duke.py in duke(datasets, labels, singlematch, showmatches, keepfiles, showoutput)
    109 
    110         run = sub.Popen(args, stderr=sub.PIPE, cwd=tmpdir, stdout=stdout,
--> 111                         universal_newlines=True)
    112         _, stderr = run.communicate()
    113 

~\Anaconda3\envs\db\lib\subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors)
    727                                 c2pread, c2pwrite,
    728                                 errread, errwrite,
--> 729                                 restore_signals, start_new_session)
    730         except:
    731             # Cleanup if the child failed starting.

~\Anaconda3\envs\db\lib\subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_start_new_session)
   1015                                          env,
   1016                                          os.fspath(cwd) if cwd is not None else None,
-> 1017                                          startupinfo)
   1018             finally:
   1019                 # Child is launched. Close the parent's copy of those pipe

FileNotFoundError: [WinError 2] The system cannot find the file specified

clean_powerplantname function dropping 8 IDs from GPD database

I've been doing some refactoring, but I believe this is an issue in the code I forked not long ago.

In the clean_powerplantname function, the "common word" test looks for more than a (hard-wired) 20 occurrences of a word and then drops these words -- this is on top of the fixed list of stop words. This is a clever idea and I have no problem with the concept.

However, if a plant has ONLY such common words in its name, then the name Series at that point devolves to null strings for those entries. These are then dropped before the revised df is returned. (I changed the code to revisit the Series to determine whether null strings were created and, if so, revert back to the previous clean point -- this works fine.)

Related to this, one might consider whether dropping null-string names coming out of this function is a good idea, or whether it should just be error-trapped / reported to the console. I wouldn't expect a name-cleaning function to potentially remove rows from the dataset.
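A minimal sketch of the guard described above (variable names and data are only illustrative): if stripping common words leaves an empty name, revert to the name from the previous cleaning step instead of letting the row be dropped later.

import pandas as pd

previous_names = pd.Series(["Black Hill", "Western Wood Energy Plant"])
stripped = pd.Series(["", "Western Wood Energy"])   # names after common-word stripping
# keep the stripped name only if something is left, otherwise revert to the previous one
names = stripped.where(stripped.str.strip() != "", previous_names)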

In the GPD file I was using, I have eight occurrences that were dropped:

Porto Santo | WRI1022641
Pó | WRI1023421
Black Hill | GBR0003574
Black Rock | GBR0000485
Hill and Moor Landfill Scheme | GBR0000555
Long Hill Road | GBR0003101
River E | GBR0000428
Western Wood Energy Plant | GBR0000057

Introduce/Include MaSTR data

Hello dear friends,
I just finished a new data download of the MaStR database. The data quality is slowly improving.
Now I am facing some problems with the wind dataset. There are around 10000 duplicate entries I would like to find. Has someone already tried MaStR and PPM?

Originally posted by @Ludee in #11 (comment)

KeyError: 'Efficiency' when updating matched dataset

After integrating the dataset in data.py.

I ran

 pm.powerplants(update_all=True)

And got the following error:

INFO:powerplantmatching.collection:Collect combined dataset for CARMA, ENTSOE, GEO, GPD, JRC, 
JRC_OPEN_update, OPSD
INFO:numexpr.utils:NumExpr defaulting to 4 threads.
INFO:powerplantmatching.cleaning:Aggregating blocks to entire units in 'CARMA'.
WARNING:powerplantmatching.core:Geoparsing not possible as no google api key was found, please add 
the key to your config.yaml if you want to enable it.
INFO:powerplantmatching.cleaning:Aggregating blocks to entire units in 'ENTSOE'.
INFO:powerplantmatching.cleaning:Aggregating blocks to entire units in 'GEO'.
INFO:powerplantmatching.cleaning:Aggregating blocks to entire units in 'GPD'.
INFO:powerplantmatching.cleaning:Aggregating blocks to entire units in 'JRC'.
Traceback (most recent call last):

  File "<ipython-input-5-2737e403b9c0>", line 1, in <module>
    pm.powerplants(update_all=True)

  File "c:\users\sab\desktop\powerplantmatching-0.4.1\powerplantmatching-0.4.1\powerplantmatching\collection.py", line 213, in matched_data
    matched = collect(matching_sources, **collection_kwargs)

  File "c:\users\sab\desktop\powerplantmatching-0.4.1\powerplantmatching-0.4.1\powerplantmatching\collection.py", line 99, in collect
    dfs = parmap(df_by_name, datasets)

  File "c:\users\sab\desktop\powerplantmatching-0.4.1\powerplantmatching-0.4.1\powerplantmatching\utils.py", line 364, in parmap
    return list(map(f, arg_list))

  File "c:\users\sab\desktop\powerplantmatching-0.4.1\powerplantmatching-0.4.1\powerplantmatching\collection.py", line 75, in df_by_name
    config=config)

  File "c:\users\sab\desktop\powerplantmatching-0.4.1\powerplantmatching-0.4.1\powerplantmatching\cleaning.py", line 281, in aggregate_units
    df = (df.assign(**{col: df[col] * df.Capacity for col in weighted_cols})

  File "c:\users\sab\desktop\powerplantmatching-0.4.1\powerplantmatching-0.4.1\powerplantmatching\cleaning.py", line 281, in <dictcomp>
    df = (df.assign(**{col: df[col] * df.Capacity for col in weighted_cols})

  File "C:\Users\sab\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 2917, in __getitem__
    indexer = self.columns.get_loc(key)

  File "C:\Users\sab\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2604, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))

  File "pandas\_libs\index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc

  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc

  File "pandas\_libs\hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item

  File "pandas\_libs\hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'Efficiency'

Issue with pm.data.wepp

Hello Fabian,
I just noticed that there seems to be an issue with the import of WEPP data:

  1. "source_file" should probably be replaced by "fn"
  2. WEPP uses "correct_manually" from utils.py which in turn relies on data_config which does not exist.
def correct_manually(df, name, config=None):
    from .data import data_config

Error Message:
File "C:\ProgramData\Anaconda3\lib\site-packages\powerplantmatching\utils.py", line 143, in correct_manually
from .data import data_config
ImportError: cannot import name 'data_config' from 'powerplantmatching.data' (C:\ProgramData\Anaconda3\lib\site-packages\powerplantmatching\data.py)

Data-Sources

Some data sources are not mentioned in the documentation, or newer versions are available. Some sources are only mentioned and explained in the data.py file. A small comment in the documentation would be helpful.
Missing sources:
IWPDCY (International Water Power & Dam Country Yearbook)
WEPP (Platts, World Electric Power Plants Database)
UBA (Umweltbundesamt Datenbank "Kraftwerke in Deutschland")

New versions available:
IRENA stats (IRENA Capacity Statistics 2017 Database): version 2018 is now available, but I was not able to find a .csv file on their website.

Raise error if java not installed

The code should check whether Java is installed. Not having it installed leads to confusion, especially on Windows machines where the resulting error is very opaque.
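A minimal sketch of such a check using only the standard library (not the package's actual implementation): look for a java executable on the PATH before invoking Duke and fail early with a readable message.

import shutil

if shutil.which("java") is None:
    raise FileNotFoundError(
        "Java was not found on your system. powerplantmatching needs a Java "
        "runtime (e.g. openjdk from conda-forge) to run the Duke deduplication."
    )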

Using powerplantmatching for a new dataset

Is it possible to use powerplantmatching on a new different dataset?

We have a network dataset where some of the information such as geographical location is missing. Would it be possible to match it with the datasets already included in powerplantmatching?

If yes, what is the way to do that? Do you have some documentation for that?
