majeau-bettez / ecospold2matrix

Class for recasting Ecospold2 LCA dataset into Leontief matrix representations or Supply and Use Tables

License: BSD 2-Clause "Simplified" License

Python 100.00%

ecospold2matrix's Introduction

Ecospold2Matrix

A Python class to parse an ecospold2 life cycle assessment dataset and arrange it as matrices for further calculations.

It can recast an Ecoinvent 3 database into a Leontief coefficient matrix with extensions, or it can arrange the unallocated Ecoinvent data as Supply and Use tables (SUT).

Basic functionality

Conveniently store and log all parameters relevant to the data manipulation
Perform basic quality checks on data, and fix some potential issues
Arrange allocated data as Leontief technical coefficient matrix, with environmental extensions and labels
Arrange unallocated data as SUT
Optionally, change sign conventions for waste treatment
Optionally, scale elementary and intermediate flows to recorded production volumes
Save matrices in various formats

Installation

The code can be installed with:

pip install git+https://github.com/repo_owner/ecospold2matrix#egg=ecospold2matrix

where repo_owner could be majeau-bettez, tngTUDOR, or the owner of any other fork of the project.

Simple Use case

import ecospold2matrix as e2m

# Define parser object, with default and project-specific parameters
# Make sure that the path database/location holds the datasets and MasterData folders
parser = e2m.Ecospold2Matrix('/database/location/', project_name='eco31_cons', positive_waste=True)

# Assemble matrices, including scaled-up flow matrices, and save to csv-files
parser.ecospold_to_Leontief(fileformats=['csv'], with_absolute_flows=True)

Or, with more parameters set explicitly:

import ecospold2matrix as em
charfile = 'ecoinvent_3.3_LCIA_implementation/LCIA_implementation_3.3.xlsx'
parser = em.Ecospold2Matrix(
        sys_dir = 'ecoinvent_3.3_consequential_ecoSpold02/',
        lci_dir = 'ecoinvent_3.3_consequential_lci_ecoSpold02/datasets',
        positive_waste=True,
        prefer_pickles=True,
        project_name='ecoinvent33cons',
        version_name='ecoinvent33')

parser.ecospold_to_Leontief(characterisation_file=charfile, lci_check=True)

Short Demo

Have a look at this IPython notebook for a demo of typical usage.

Dependencies

Python 3
Pandas
Numpy
Scipy
lxml
Six
xlrd
xlwt
h5py

Open Source

This tool incorporates some code from the open-source Brightway2 project. Ecospold2Matrix is also open-source, so feel most welcome to use, give feedback, modify or contribute.

ecospold2matrix's Issues

xml files contain BOM

Hi Guillaume,

When I try to parse the XML files at lines 785 and 1011, I get an error message saying that the '<' sign is not at line 1, column 1. Inspecting the file with open(fp).readline(), we discovered that the XML files contain a byte order mark (BOM). We were able to get around this by adjusting lines 784 and 1010 as follows:

with open(fp, 'r', encoding="utf-8") as fh:

Ecoinvent now parses properly, but we haven't been able to produce a C-matrix yet.
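
One possible way to make the read robust to a byte order mark is Python's built-in `utf-8-sig` codec, which decodes UTF-8 and drops a leading BOM when one is present. The sketch below is an assumption about how the file could be opened, not the code in the repository; `read_xml_text` is a hypothetical helper name.

```python
import os
import tempfile

def read_xml_text(path):
    """Return the file content as text, without a leading UTF-8 BOM.

    Hypothetical helper: 'utf-8-sig' behaves like 'utf-8' but skips the
    byte order mark when the file starts with one.
    """
    with open(path, 'r', encoding='utf-8-sig') as fh:
        return fh.read()

# Demonstration on a throwaway file that starts with a BOM
with tempfile.NamedTemporaryFile(suffix='.xml', delete=False) as tmp:
    tmp.write(b'\xef\xbb\xbf<?xml version="1.0"?><root/>')
text = read_xml_text(tmp.name)
os.remove(tmp.name)
print(text[0])  # prints "<"
```

Files without a BOM are read the same way as with plain `utf-8`, so the codec can be used unconditionally.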

How can I obtain the Z matrix?

Hi majeau-bettez,
Based on the "ecospold2matrix" class you shared, I successfully recast the Ecospold2 LCA dataset into Leontief matrix representations. However, my results contain only A, F, and C, with no Z matrix. How can I modify the code to obtain the Z matrix? And what should I do if I want to get the SUT?

Looking forward to your reply.
Thanks!
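
For context, in input-output notation Z usually denotes absolute inter-process flows, related to the coefficient matrix by Z = A·diag(x), where x holds the production volume of each process. The README describes `with_absolute_flows=True` as producing scaled-up flow matrices; the sketch below only illustrates the generic relation with made-up numbers and does not use the package's own attribute names.

```python
import numpy as np

# Illustrative values only (not taken from ecoinvent)
A = np.array([[0.0, 0.2],
              [0.5, 0.0]])    # coefficients: inputs per unit of output
x = np.array([10.0, 4.0])     # production volume of each process

# Scale each column of A by the output of the corresponding process
Z = A @ np.diag(x)
print(Z)
```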

closing parenthesis bug

Hi Guillaume,
there is a small bug: a closing parenthesis is missing in the characterisation branch. I do not have push rights,
so I am attaching a diff you can apply, or you can just type it yourself. It is just one ')'.
fix_closing_parenthesis.txt

fillna behavior

Hi,
I cannot export SparseMatrix with the current master branch.
I used:

import ecospold2matrix as em2
parser = em2.Ecospold2Matrix('/home/tomas/dbs/3.1/cutoff_ecospold02', project_name='eco31_cutoff', positive_waste=True )
parser.ecospold_to_Leontief(fileformats=['SparseMatrix'], with_absolute_flows=True)

but the process cannot finish because of a fillna problem.

If IMP is initialized as a DataFrame instead of a numpy array, the process completes correctly (see tngTUDOR@2298089), but I'm not sure this is the best way to go. What do you think?

Here is the complete output:

WARNING: Attempting to work in a virtualenv. If you encounter problems, please install IPython inside the virtualenv.
2016-05-10 16:54:42,937 - eco31_cutoff - INFO - Ecospold2Matrix Processing
fatal: Not a git repository (or any parent up to mount point /home)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
2016-05-10 16:54:42,940 - eco31_cutoff - INFO - Project name: eco31_cutoff
2016-05-10 16:54:42,940 - eco31_cutoff - INFO - Unit process and Master data directory: /home/tomas/dbs/3.1/cutoff_ecospold02
2016-05-10 16:54:42,940 - eco31_cutoff - INFO - Data saved in: /home/tomas/virtualenvs/ecospold2matrix-dev
2016-05-10 16:54:42,940 - eco31_cutoff - INFO - Sign conventions changed to make waste flows positive
2016-05-10 16:54:42,940 - eco31_cutoff - INFO - Pickle intermediate results to files
2016-05-10 16:54:42,940 - eco31_cutoff - INFO - Order processes based on: ISIC, activityName
2016-05-10 16:54:42,940 - eco31_cutoff - INFO - Order elementary exchanges based on: comp, name, subcomp
2016-05-10 16:54:44,502 - eco31_cutoff - WARNING - obs2char_subcomps constraints temporarily relaxed because not full recipe parsed
2016-05-10 16:54:44,553 - eco31_cutoff - INFO - Products extracted from IntermediateExchanges.xml with SHA-1 of ca2c05c4dff035265fc44c53c7b534a3a711ff70
2016-05-10 16:54:59,855 - eco31_cutoff - WARNING - Removed 176 duplicate rows from activity_list, see duplicate_activity_list.csv.
2016-05-10 16:54:59,877 - eco31_cutoff - INFO - Activities extracted from ActivityIndex.xml with SHA-1 of c579d38fb6fa4a52ec4e09e5b04b873df77ce4c9
2016-05-10 16:54:59,904 - eco31_cutoff - INFO - Processing 11301 files in /home/tomas/dbs/3.1/cutoff_ecospold02/datasets
2016-05-10 16:55:52,495 - eco31_cutoff - INFO - Flows saved in /home/tomas/dbs/3.1/cutoff_ecospold02/flows.pickle with SHA-1 of ee7fcb8b40433af5e79b09a20d884ac01a2e7189
2016-05-10 16:55:52,542 - eco31_cutoff - INFO - Processing 11301 files - this may take a while ...
/home/tomas/virtualenvs/ecospold2matrix-dev/src/ecospold2matrix/ecospold2matrix/ecospold2matrix.py:1111: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
  self.PRO = PRO.sort(columns=self.PRO_order)
/home/tomas/virtualenvs/ecospold2matrix-dev/src/ecospold2matrix/ecospold2matrix/ecospold2matrix.py:971: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
  self.STR = STR.sort(columns=self.STR_order)
2016-05-10 16:56:40,862 - eco31_cutoff - INFO - Elementary flows extracted from ElementaryExchanges.xml with SHA-1 of 8a3a0a95e8a023950f42704eebc248014164166c
2016-05-10 16:56:40,910 - eco31_cutoff - INFO - Labels saved in /home/tomas/dbs/3.1/cutoff_ecospold02/rawlabels.pickle with SHA-1 of 2be0897a16fd2b0814f8aa6f49424f9b8f131650
2016-05-10 16:56:40,928 - eco31_cutoff - INFO - OK.   No untraceable flows.
2016-05-10 16:56:41,241 - eco31_cutoff - INFO - OK. Source activities seem in order. Each product traceable to an activity that actually does produce or distribute this product.
/home/tomas/virtualenvs/ecospold2matrix-dev/src/ecospold2matrix/ecospold2matrix/ecospold2matrix.py:1475: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
  self.PRO = self.PRO.sort(columns=self.PRO_order)
/home/tomas/virtualenvs/ecospold2matrix-dev/src/ecospold2matrix/ecospold2matrix/ecospold2matrix.py:1476: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
  self.STR = self.STR.sort(columns=self.STR_order)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/home/tomas/virtualenvs/ecospold2matrix-dev/export_3.1_leontief.py in <module>()
      1 import ecospold2matrix as em2
      2 parser = em2.Ecospold2Matrix('/home/tomas/dbs/3.1/cutoff_ecospold02', project_name='eco31_cutoff', positive_waste=True )
----> 3 parser.ecospold_to_Leontief(fileformats=['SparseMatrix'], with_absolute_flows=True)

/home/tomas/virtualenvs/ecospold2matrix-dev/src/ecospold2matrix/ecospold2matrix/ecospold2matrix.py in ecospold_to_Leontief(self, fileformats, with_absolute_flows, lci_check, rtol, atol, imax, characterisation_file, ardaidmatching_file)
    449 
    450         # Save system to file
--> 451         self.save_system(fileformats)
    452 
    453         # Read/load lci cummulative emissions and perform quality check

/home/tomas/virtualenvs/ecospold2matrix-dev/src/ecospold2matrix/ecospold2matrix/ecospold2matrix.py in save_system(self, file_formats)
   2015             PRO = self.PRO.fillna('').values
   2016             STR = self.STR.fillna('').values
-> 2017             IMP = self.IMP.fillna('').values
   2018             PRO_header = self.PRO.columns.values
   2019             PRO_header = PRO_header.reshape((1,len(PRO_header)))

AttributeError: 'numpy.ndarray' object has no attribute 'fillna'
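
The traceback shows `fillna` being called on a numpy array, which has no such method. As an alternative to changing how IMP is initialized, the save step could wrap whatever it receives in a DataFrame first. This is a minimal sketch under that assumption; `filled_values` is a hypothetical helper and not part of ecospold2matrix.

```python
import numpy as np
import pandas as pd

def filled_values(table):
    """Return table as a 2-D array with missing entries replaced by ''.

    Hypothetical helper: accepts either a DataFrame or a numpy array.
    """
    return pd.DataFrame(table).fillna('').values

# Toy label table with one missing entry
labels = np.array([['a', np.nan], ['b', 'c']], dtype=object)
print(filled_values(labels))
```

The same call works when `table` is already a DataFrame, so the three `fillna` lines in `save_system` would not need to know which type they were given.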

build_AF tries to set elements of sparse matrix using loc which is not allowed

In line 1687, setting the absolute values for waste streams throws an error, as it seems that the operation is not allowed on sparse arrays:

if self.positive_waste:
    # In cutoff version of ecoinvent, some dummy waste processes do
    # not seem to have negative reference outputs. These must then be
    # identified more crudely based on string recognition, and their
    # rows forced positive in the A-matrix
    bo_cutoff = self.PRO.activityName.str.contains(self.__CUTOFFTXT)
    self.A.loc[bo_cutoff,:] = self.A.loc[bo_cutoff,:].abs()

I found this workaround, but perhaps there's a better way?

This is the log output/error message:

2020-06-30 14:19:24,609 - plcaio_test - INFO - Starting to assemble the matrices
2020-06-30 14:19:28,229 - plcaio_test - INFO - fillna
2020-06-30 14:20:11,665 - plcaio_test - INFO - Starting normalizing matrices

TypeError Traceback (most recent call last)
in
8 characterisation_file=
9 '/home/jakobs/data/ecoinvent/ecoinvent 3.5_LCIA_implementation/LCIA_implementation_3.5.xlsx', )
---> 10 parser.ecospold_to_Leontief(fileformats='Pandas',with_absolute_flows=False)

~/Documents/IndEcol/OASES/ecospold2matrix/ecospold2matrix/ecospold2matrix.py in ecospold_to_Leontief(self, fileformats, with_absolute_flows, lci_check, rtol, atol, imax, characterisation_file, ardaidmatching_file)
405
406 # Finally, assemble normalized, symmetric matrices
--> 407 self.build_AF()
408
409 if with_absolute_flows:

~/Documents/IndEcol/OASES/ecospold2matrix/ecospold2matrix/ecospold2matrix.py in build_AF(self)
1685 # rows forced positive in the A-matrix
1686 bo_cutoff = self.PRO.activityName.str.contains(self.__CUTOFFTXT)
-> 1687 self.A.loc[bo_cutoff,:] = self.A.loc[bo_cutoff,:].abs()
1688
1689 if self.force_all_positive:

~/anaconda3/envs/pylcaio/lib/python3.7/site-packages/pandas/core/indexing.py in setitem(self, key, value)
669 key = com.apply_if_callable(key, self.obj)
670 indexer = self._get_setitem_indexer(key)
--> 671 self._setitem_with_indexer(indexer, value)
672
673 def _validate_key(self, key, axis: int):

~/anaconda3/envs/pylcaio/lib/python3.7/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
984 v = np.nan
985
--> 986 setter(item, v)
987
988 # we have an equal len ndarray/convertible to our labels

~/anaconda3/envs/pylcaio/lib/python3.7/site-packages/pandas/core/indexing.py in setter(item, v)
960 s._consolidate_inplace()
961 s = s.copy()
--> 962 s._data = s._data.setitem(indexer=pi, value=v)
963 s._maybe_update_cacher(clear=True)
964

~/anaconda3/envs/pylcaio/lib/python3.7/site-packages/pandas/core/internals/managers.py in setitem(self, **kwargs)
559
560 def setitem(self, **kwargs):
--> 561 return self.apply("setitem", **kwargs)
562
563 def putmask(self, **kwargs):

~/anaconda3/envs/pylcaio/lib/python3.7/site-packages/pandas/core/internals/managers.py in apply(self, f, filter, **kwargs)
440 applied = b.apply(f, **kwargs)
441 else:
--> 442 applied = getattr(b, f)(**kwargs)
443 result_blocks = _extend_blocks(applied, result_blocks)
444

~/anaconda3/envs/pylcaio/lib/python3.7/site-packages/pandas/core/internals/blocks.py in setitem(self, indexer, value)
1801
1802 check_setitem_lengths(indexer, value, self.values)
-> 1803 self.values[indexer] = value
1804 return self
1805

~/anaconda3/envs/pylcaio/lib/python3.7/site-packages/pandas/core/arrays/sparse/array.py in setitem(self, key, value)
459 # ExtensionBlock.where
460 msg = "SparseArray does not support item assignment via setitem"
--> 461 raise TypeError(msg)
462
463 @classmethod

TypeError: SparseArray does not support item assignment via setitem
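
As the final line of the traceback states, pandas sparse arrays do not support item assignment. One way around this, sketched here as an assumption rather than as the fix adopted in the repository, is to do the update on a dense copy and convert the result back to sparse. `abs_rows` is a hypothetical helper name.

```python
import pandas as pd

def abs_rows(A, row_mask):
    """Return a sparse copy of A with the masked rows made non-negative.

    Hypothetical helper: sparse columns reject item assignment, so the
    update is done on a dense copy that is converted back afterwards.
    """
    dense = A.sparse.to_dense()
    dense.loc[row_mask, :] = dense.loc[row_mask, :].abs()
    return dense.astype(pd.SparseDtype(float, 0.0))

# Toy sparse matrix; only the first row is selected by the mask
A = pd.DataFrame([[-1.0, 0.0], [0.0, -2.0]]).astype(pd.SparseDtype(float, 0.0))
mask = pd.Series([True, False])
print(abs_rows(A, mask).sparse.to_dense())
```

The dense copy trades memory for simplicity, which may matter for a full ecoinvent A-matrix.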

`ecospold_to_Leontief` throws error `OperationalError: duplicate column name: id`

Attempting to run ecospold2matrix like so:

parser = e2m.Ecospold2Matrix(
    sys_dir = path_dir_ecoinvent_input,
    lci_dir = os.path.join(path_dir_ecoinvent_input, 'datasets'),
    project_name = e2m_project_name,
    characterisation_file = path_file_ecoinvent_characterisation,
    out_dir = path_dir_databases_pickle,
    positive_waste = False,
    nan2null = True
)
parser.ecospold_to_Leontief(
    fileformats = 'Pandas',
    with_absolute_flows=True
)

with Ecoinvent 3.8 returns the following error:

---------------------------------------------------------------------------
OperationalError                          Traceback (most recent call last)
Cell In [29], line 1
----> 1 parser.ecospold_to_Leontief(
      2     fileformats = 'Pandas',
      3     with_absolute_flows=True
      4 )

File ~/miniconda3/envs/hybridization_bw_241/lib/python3.9/site-packages/ecospold2matrix/ecospold2matrix.py:425, in Ecospold2Matrix.ecospold_to_Leontief(self, fileformats, with_absolute_flows, lci_check, rtol, atol, imax, characterisation_file, ardaidmatching_file)
    423 else:
    424     self.prepare_matching_load_parameters()
--> 425     self.process_inventory_elementary_flows()
    426     self.read_characterisation()
    427     self.populate_complementary_tables()

File ~/miniconda3/envs/hybridization_bw_241/lib/python3.9/site-packages/ecospold2matrix/ecospold2matrix.py:2803, in Ecospold2Matrix.process_inventory_elementary_flows(self)
   2801 # export to tmp SQL table
   2802 c = self.conn.cursor()
-> 2803 self.STR.to_sql('dirty_inventory',
   2804                 self.conn,
   2805                 index_label='id',
   2806                 if_exists='replace')
   2807 c.execute( """
   2808 INSERT INTO raw_inventory(id, name, comp, subcomp, unit, cas)
   2809 SELECT DISTINCT id, name, comp, subcomp, unit, cas
   2810 FROM dirty_inventory;
   2811 """)
   2813 self.clean_label('raw_inventory')

File ~/miniconda3/envs/hybridization_bw_241/lib/python3.9/site-packages/pandas/core/generic.py:2987, in NDFrame.to_sql(self, name, con, schema, if_exists, index, index_label, chunksize, dtype, method)
   2830 """
   2831 Write records stored in a DataFrame to a SQL database.
   2832 
   (...)
   2983 [(1,), (None,), (2,)]
   2984 """  # noqa:E501
   2985 from pandas.io import sql
-> 2987 return sql.to_sql(
   2988     self,
   2989     name,
   2990     con,
   2991     schema=schema,
   2992     if_exists=if_exists,
   2993     index=index,
   2994     index_label=index_label,
   2995     chunksize=chunksize,
   2996     dtype=dtype,
   2997     method=method,
   2998 )


File ~/miniconda3/envs/hybridization_bw_241/lib/python3.9/site-packages/pandas/io/sql.py:695, in to_sql(frame, name, con, schema, if_exists, index, index_label, chunksize, dtype, method, engine, **engine_kwargs)
    690 elif not isinstance(frame, DataFrame):
    691     raise NotImplementedError(
    692         "'frame' argument should be either a Series or a DataFrame"
    693     )
--> 695 return pandas_sql.to_sql(
    696     frame,
    697     name,
    698     if_exists=if_exists,
    699     index=index,
    700     index_label=index_label,
    701     schema=schema,
    702     chunksize=chunksize,
    703     dtype=dtype,
    704     method=method,
    705     engine=engine,
    706     **engine_kwargs,
    707 )

File ~/miniconda3/envs/hybridization_bw_241/lib/python3.9/site-packages/pandas/io/sql.py:2187, in SQLiteDatabase.to_sql(self, frame, name, if_exists, index, index_label, schema, chunksize, dtype, method, **kwargs)
   2176             raise ValueError(f"{col} ({my_type}) not a string")
   2178 table = SQLiteTable(
   2179     name,
   2180     self,
   (...)
   2185     dtype=dtype,
   2186 )
-> 2187 table.create()
   2188 return table.insert(chunksize, method)

File ~/miniconda3/envs/hybridization_bw_241/lib/python3.9/site-packages/pandas/io/sql.py:838, in SQLTable.create(self)
    836         raise ValueError(f"'{self.if_exists}' is not valid for if_exists")
    837 else:
--> 838     self._execute_create()

File ~/miniconda3/envs/hybridization_bw_241/lib/python3.9/site-packages/pandas/io/sql.py:1871, in SQLiteTable._execute_create(self)
   1869 with self.pd_sql.run_transaction() as conn:
   1870     for stmt in self.table:
-> 1871         conn.execute(stmt)

OperationalError: duplicate column name: id
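
SQLite raises this error when a `CREATE TABLE` statement names the same column twice. With `index_label='id'`, that would happen if the DataFrame already had a column called `id`; whether this is the case for `self.STR` with Ecoinvent 3.8 is an assumption that the traceback does not confirm. The sketch reproduces the SQLite error on its own and shows one possible guard, where `to_sql_with_id` is a hypothetical helper.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')

# Reproduce the SQLite error itself: the same column name listed twice
try:
    conn.execute('CREATE TABLE demo (id INTEGER, id INTEGER)')
    message = None
except sqlite3.OperationalError as err:
    message = str(err)

def to_sql_with_id(frame, conn, table):
    """Write frame to SQLite so that the table has exactly one 'id' column."""
    if 'id' in frame.columns:
        # Assumption: the existing column is the identifier to keep
        frame.to_sql(table, conn, index=False, if_exists='replace')
    else:
        frame.to_sql(table, conn, index_label='id', if_exists='replace')

# Toy frame that already carries an 'id' column
frame = pd.DataFrame({'id': [10, 11], 'name': ['CO2', 'CH4']})
to_sql_with_id(frame, conn, 'dirty_inventory')
columns = [row[1] for row in conn.execute('PRAGMA table_info(dirty_inventory)')]
print(message, columns)
```

If the index rather than the existing column is the identifier that downstream queries expect, dropping or renaming the column before export would be the alternative.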

Error when running both master and dev branches

import ecospold2matrix as e2m
parser = e2m.Ecospold2Matrix('/home/radekl/Desktop/ecoinvent 3.4_cutoff_lci_ecoSpold02/datasets', project_name='eco34', out_dir='/home/radekl/eco_matrices', positive_waste=True)
2018-02-22 10:58:36,955 - eco34 - INFO - Ecospold2Matrix Processing
2018-02-22 10:58:36,961 - eco34 - INFO - Current git commit: ca2593a
2018-02-22 10:58:36,962 - eco34 - INFO - Project name: eco34
2018-02-22 10:58:36,962 - eco34 - INFO - Unit process and Master data directory: /home/radekl/Desktop/ecoinvent 3.4_cutoff_lci_ecoSpold02/datasets
2018-02-22 10:58:36,962 - eco34 - INFO - Data saved in: /home/radekl/eco_matrices
2018-02-22 10:58:36,962 - eco34 - INFO - Sign conventions changed to make waste flows positive
2018-02-22 10:58:36,962 - eco34 - INFO - Pickle intermediate results to files
2018-02-22 10:58:36,962 - eco34 - INFO - Order processes based on: ISIC, activityName
2018-02-22 10:58:36,962 - eco34 - INFO - Order elementary exchanges based on: comp, name, subcomp
2018-02-22 10:58:36,972 - eco34 - WARNING - Could not establish connection to database
parser.ecospold_to_Leontief(with_absolute_flows=True)
Traceback (most recent call last):
File "", line 1, in
File "/home/radekl/workspace/ecospold2matrix/ecospold2matrix/ecospold2matrix.py", line 363, in ecospold_to_Leontief
self.extract_products()
File "/home/radekl/workspace/ecospold2matrix/ecospold2matrix/ecospold2matrix.py", line 669, in extract_products
assert os.path.exists(fp), "Can't find " + self.__INTERMEXCHANGE
AssertionError: Can't find IntermediateExchanges.xml
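
The README's usage note says the path given to the parser should hold both the datasets and MasterData folders, whereas the path above points at the datasets folder itself. Whether that is the cause here is an assumption; a quick pre-flight check along these lines might help rule it out. `missing_subfolders` is a hypothetical helper, not part of ecospold2matrix.

```python
import os
import tempfile

def missing_subfolders(sys_dir, expected=('datasets', 'MasterData')):
    """List the expected subfolders that are absent from sys_dir."""
    return [name for name in expected
            if not os.path.isdir(os.path.join(sys_dir, name))]

# Demonstration on a temporary directory tree holding only 'datasets'
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, 'datasets'))
    missing_at_root = missing_subfolders(root)
    missing_in_datasets = missing_subfolders(os.path.join(root, 'datasets'))

print(missing_at_root)        # ['MasterData']
print(missing_in_datasets)    # ['datasets', 'MasterData']
```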

Leontief coefficients format

Hi,

I was wondering what you mean by Leontief coefficients format. In the code, the Leontief matrix is called A. Do you mean that your code transforms the ecoinvent data into a symmetric A matrix only, or is it also inverted into the form (I-A)^-1? What are the other attributes of the resulting A matrix? Does it include self-inputs, and is each process normalized to the same output, as in IO where an output of 1 euro is assumed?

Thank you!
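
To make the two forms in the question concrete, the sketch below contrasts a coefficient matrix A with its Leontief inverse (I-A)^-1 on made-up numbers. It illustrates the general input-output relation only and says nothing about which form ecospold2matrix writes out.

```python
import numpy as np

A = np.array([[0.0, 0.2],
              [0.5, 0.0]])      # illustrative coefficients
y = np.array([1.0, 0.0])        # final demand: one unit of product 0

L = np.linalg.inv(np.eye(2) - A)   # Leontief inverse
x = L @ y                          # total output needed to deliver y

# x satisfies the balance x = A x + y
print(np.allclose(x, A @ x + y))
```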

What's the difference between "F matrix" and "E matrix"?

Hi Guillaume Majeau-Bettez,

Could you please help me distinguish between the "F matrix" and the "E matrix"? What is the difference between these two matrices?
Thank you for your assistance.
Looking forward to your reply.

Best regards,
Xiaodan Han

String error in read_characterisation

Hi Guillaume,

I received an error when executing the following line (2437):
foo.cas = foo.cas.str.replace('^[0]*','')

This only occurred for those sheets in ReCiPe111.xlsx where no CAS numbers are specified for any of the stressors (LOP, LTP, WDP, MDP, FDP). The NaNs are not recognized by str.replace.

A quick fix was to add the following check:

if foo.cas.values.dtype != 'float64':
    foo.cas = foo.cas.str.replace('^[0]*','')

Probably needs a more permanent solution though.
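
A somewhat more general guard could test for any numeric dtype rather than `float64` specifically. The sketch below assumes that a sheet without CAS numbers yields an all-NaN numeric column; `strip_leading_zeros` is a hypothetical helper. It also passes `regex=True` explicitly, because pandas 2.0 changed the default of `Series.str.replace` to literal matching.

```python
import pandas as pd

def strip_leading_zeros(cas):
    """Remove leading zeros from CAS strings, leaving empty columns alone.

    Hypothetical helper: a column holding only NaN is read as a numeric
    dtype and has no .str accessor, so it is returned unchanged.
    """
    if pd.api.types.is_numeric_dtype(cas):
        return cas
    return cas.str.replace(r'^0+', '', regex=True)

print(strip_leading_zeros(pd.Series(['000050-00-0', None])).tolist())
```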