Comments (14)
The StEWI question is one for @bl-young.
Can we use that API as-is without having to get a key, etc.?
One alternative solution that I believe I discussed with Ben before is using personal API keys to download the data and then make that data available in repositories. He touched on this in issue 205 #205 (comment). It's not as transparent as I think we would ultimately like it, but it will save potential users the trouble of having to get their own API keys to get at the data.
Matt, the API requires a key.
I'm hearing a few options, please clarify which you prefer:
- We create an 'ELCI' API key and include it with the electricitylci package. The annual facility-level emissions should be well under the 1 M record limit, but this leaves the key exposed.
- We query the API and provide annual emissions (e.g., 2016–2022), but this requires either a public server to store the data or bloating the repository data directory even more than it already is.
- We integrate the API key as a configuration parameter in the YAML, but this requires users to apply for their own key.
Or some combination?
P.S. I don't think requesting an API key was burdensome and I think it helps government agencies track (and justify) their data management budgets.
from electricitylci.
The other reason I would advocate for this is that I know my machine runs into issues with the RCRAInfo (I think) data pulls because it relies on a chrome plugin that is blocked by my government machine.
In this regard you are in luck! @dt-woods found a solution there a few weeks ago USEPA/standardizedinventories#146
from electricitylci.
Here's the site I found for the new HTTPS site:
from electricitylci.
I found what looks like it might be CEMS data here:
but I'm not getting that it's by state or by quarter.
from electricitylci.
Ugh. https://github.com/USEPA/cam-api-examples
from electricitylci.
A challenge with the API solution is managing the API key, which can be done within the config.yml, similar to what was done in the scenario modeler.
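As a rough illustration of the config.yml idea, here is a minimal sketch of looking up the key. It assumes the YAML has already been parsed into a dict (for example with `yaml.safe_load`); the `epa_api_key` entry and the `EPA_API_KEY` environment variable are hypothetical names, not existing electricitylci settings.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(config: Mapping, env_var: str = "EPA_API_KEY") -> Optional[str]:
    """Return an API key from the parsed config, else from the environment.

    `config` is assumed to be the dict parsed from config.yml; the
    "epa_api_key" entry and the default variable name are hypothetical.
    """
    # Check the config entry first, then the environment variable.
    for candidate in (config.get("epa_api_key"), os.environ.get(env_var)):
        # Ignore missing, non-string, or whitespace-only values.
        if isinstance(candidate, str) and candidate.strip():
            return candidate.strip()
    return None
```

Falling back to an environment variable would let users keep the key out of any file that might get committed.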
I found the CAMPD custom data download:
Their website calls the following API (e.g., for Pennsylvania, Q1, 2016):
Looking at cems_data.py, it seems that build_cems_df groups (aggregates) the quarterly results... do we need hourly data?
Also, does/could stewi support CAMS emission inventory?
from electricitylci.
The StEWI question is one for @bl-young.
Can we use that API as-is without having to get a key, etc.?
One alternative solution that I believe I discussed with Ben before is using personal API keys to download the data and then make that data available in repositories. He touched on this in issue 205 #205 (comment). It's not as transparent as I think we would ultimately like it, but it will save potential users the trouble of having to get their own API keys to get at the data.
from electricitylci.
Flowsa has dealt with API key issues in the past (see here, though I think there is talk of tweaking the approach in the future). I think that if this is facility data, and it falls within the schema available in stewi (e.g., FlowByFacility), then it could be a candidate for hosting this workflow. I am not very familiar with the nuances of this specific source. StEWI does not currently have an approach for handling API keys. Also seems like this could be a problem:
But yes, I think Matt's idea in general is a good one. Have a processed version available somewhere already, but allow users the ability to generate their own for whatever reason if they get an API key. Though I would recommend not storing it in the repository itself but rather some other public spot. Or vice versa, first check if an API key exists and process locally, and if it does not, go grab the processed data from your external source. That way it is still transparent and shows the script/fxn used to generate the processed data but users don't have to run that chunk of code.
Metadata is crucial here for reproducibility because users may be using different versions of the CEMS data depending on when it was pulled and whether it changed.
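The branching described above could look something like the following sketch. Both callables are hypothetical placeholders (one would wrap the API request, the other would download the pre-processed file from wherever it is hosted), and the metadata fields are only an assumption about what would be useful to record.

```python
from datetime import datetime, timezone
from typing import Callable, Optional


def load_annual_emissions(
    year: int,
    api_key: Optional[str],
    pull_from_api: Callable[[int, str], list],
    pull_processed: Callable[[int], list],
) -> dict:
    """Use the API when a key is available, else fall back to hosted data.

    `pull_from_api` and `pull_processed` are hypothetical callables
    supplied by the caller; neither exists in electricitylci or stewi.
    """
    if api_key:
        records, source = pull_from_api(year, api_key), "api"
    else:
        records, source = pull_processed(year), "processed"
    # Record where and when the data came from so runs can be compared.
    metadata = {
        "year": year,
        "source": source,
        "retrieved": datetime.now(timezone.utc).isoformat(),
    }
    return {"records": records, "metadata": metadata}
```

Keeping the retrieval timestamp next to the records is one way to tell later whether two users were working from different pulls of the CEMS data.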
from electricitylci.
Oh I forgot to add, we don't need hourly data. I think I grabbed the quarterly data originally for this very reason. I believe we only got daily data that way. At this point for the eLCI, annual emissions are all that are needed - that would match every other data source, and that's how the quarterly data is aggregated anyways.
In some future version, it might be nice to be able to generate data at some fraction of the year, seasonal or even daily, but for now, I wouldn't worry about that at all. That is if annual data is somehow available through the API, I would take that - much smaller download.
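For reference, rolling sub-annual records up to annual totals can be sketched as below. This is not the logic in build_cems_df, just an illustration; the field names follow the API response shown later in this thread, and the handling of `None` (skip it, and leave a field as `None` if no period reported it) is an assumption.

```python
from collections import defaultdict

# Mass fields as they appear in the API response in this thread.
MASS_FIELDS = ("so2Mass", "co2Mass", "noxMass")


def annual_totals(records):
    """Sum sub-annual records into one total per (facilityId, year).

    Missing values (None) are skipped; a field stays None if no
    period reported a value for it.
    """
    totals = defaultdict(lambda: {f: None for f in MASS_FIELDS})
    for rec in records:
        entry = totals[(rec["facilityId"], rec["year"])]
        for field in MASS_FIELDS:
            value = rec.get(field)
            if value is not None:
                entry[field] = (entry[field] or 0.0) + value
    return dict(totals)
```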
from electricitylci.
It's pretty straightforward with the API key, which took all of about 30 seconds to request from here; see snippet:
>>> import requests
>>> s_url = "https://api.epa.gov/easey/streaming-services/emissions/apportioned/annual/by-facility"
>>> params = {'api_key': 'abcXYZ', 'year': 2016, 'stateCode': 'PA'}
>>> r = requests.get(s_url, params=params)
>>> len(r.json()) # number of facilities in PA for 2016
74
>>> r.json()
[{'stateCode': 'PA',
'facilityName': 'Brunot Island Power Station',
'facilityId': 3096,
'year': 2016,
'grossLoad': 44642.41,
'steamLoad': None,
'so2Mass': 0.177,
'co2Mass': 34891.613,
'noxMass': 9.454,
'heatInput': 587078.638},
...
{'stateCode': 'PA',
'facilityName': 'ETMT Marcus Hook Terminal',
'facilityId': 880107,
'year': 2016,
'grossLoad': None,
'steamLoad': 2161856.78,
'so2Mass': None,
'co2Mass': None,
'noxMass': 41.047,
'heatInput': 3261517.455}]
from electricitylci.
For completeness, here is where I referenced for my API call snippet:
from electricitylci.
I know this question is not for me... but I agree with these choices and would of course not recommend 1. I still think 2 and 3 are not mutually exclusive given that 2 provides a useful way towards reproducibility without worrying about changes in the CEMS data, or if the API goes down.
EPA's data management system has been quite easy to work with https://dmap-data-commons-ord.s3.amazonaws.com/index.html?prefix= and esupy is already configured to use it, if that is the route you decide to take.
from electricitylci.
I would say I do agree with Ben's approach - if the user does not have an API key or if they choose to download, then we can provide "canonical" datasets on the AWS site mentioned above. In the short term and in the interest of getting newer data and a working version of elci out there, I would suggest that we focus on getting the canonical data sorted using manual pulls if necessary, while keeping up the metadata standard that exists on the EPA data management system. The other reason I would advocate for this is that I know my machine runs into issues with the RCRAInfo (I think) data pulls because it relies on a chrome plugin that is blocked by my government machine.
The branched approach and building up all the API calls to me at least sounds like more effort than I intended for this year and is maybe something to shoot for next year, as we likely have to integrate EIA API calls as well. Up to two keys needed!
from electricitylci.
The backend API call for CAMS needs to be updated, see here.
A preliminary search over each state's annual facility emissions suggests the largest result is around 150 records (for Texas). A single page request over the API returns at most 500 records. Be warned that a future year in which a state has more than 500 facility records will need to handle multiple page requests!
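If multiple page requests ever become necessary, the loop could be sketched as follows. `fetch_page` is a hypothetical callable that would wrap the `requests.get` call and pass the page number and page size using whatever paging parameters the API expects; those parameter names are not shown here because they are not confirmed in this thread.

```python
from typing import Callable, List


def fetch_all_pages(
    fetch_page: Callable[[int, int], List[dict]],
    per_page: int = 500,
) -> List[dict]:
    """Collect records page by page until a short (or empty) page is returned.

    `fetch_page(page, per_page)` is a hypothetical caller-supplied function
    that returns the list of records for one page (pages start at 1).
    """
    records: List[dict] = []
    page = 1
    while True:
        batch = fetch_page(page, per_page)
        records.extend(batch)
        # A page with fewer than per_page records is the last one.
        if len(batch) < per_page:
            break
        page += 1
    return records
```

With the current record counts this would make a single request per state, so the extra cost is negligible.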
from electricitylci.