Not Found: ftp://newftp.epa.gov (electricitylci issue, open)

dt-woods commented on July 18, 2024

Not Found: ftp://newftp.epa.gov

from electricitylci.

Comments (14)

dt-woods commented on July 18, 2024

> The StEWI question is one for @bl-young.
>
> Can we use that API as-is without having to get a key, etc.?
>
> One alternative solution that I believe I discussed with Ben before is using personal API keys to download the data and then make that data available in repositories. He touched on this in issue 205 #205 (comment). It's not as transparent as I think we would ultimately like it, but it will save potential users the trouble of having to get their own API keys to get at the data.

Matt, the API requires a key.

I'm hearing a few options, please clarify which you prefer:

1. We create an 'ELCI' API key and include it with the electricitylci package. The annual facility-level emissions should be well under the 1 M record limit, but this leaves the key exposed.
2. We query the API and provide annual emissions (e.g., 2016–2022), but this requires a public server to store the data, or bloats the repository data directory even more than it already is.
3. We integrate the API key as a configuration parameter in the YAML, but this requires users to apply for their own key.

Or some combination?

P.S. I don't think requesting an API key was burdensome and I think it helps government agencies track (and justify) their data management budgets.
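
As a rough illustration of option 3, here is a minimal sketch of how a key could be resolved at run time. It assumes the YAML has already been parsed into a dict; the config entry `epa_cam_api_key` and the environment variable `EPA_API_KEY` are hypothetical names, not anything electricitylci defines today.

```python
import os


def resolve_api_key(config, env_var="EPA_API_KEY"):
    """Return an API key from the parsed config, else the environment, else None.

    `config` is the dict obtained from parsing config.yml. The key name
    'epa_cam_api_key' and the default `env_var` are hypothetical.
    """
    key = (config or {}).get("epa_cam_api_key") or os.environ.get(env_var)
    if key and key.strip():
        return key.strip()
    return None
```

Returning `None` instead of raising leaves the caller free to fall back to a pre-processed dataset when no key is configured.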

bl-young commented on July 18, 2024

> The other reason I would advocate for this is that I know my machine runs into issues with the RCRAInfo (I think) data pulls because it relies on a chrome plugin that is blocked by my government machine.

In this regard you are in luck! @dt-woods found a solution there a few weeks ago USEPA/standardizedinventories#146

dt-woods commented on July 18, 2024

Here's the site I found for the new HTTPS site:

https://www.epa.gov/air-emissions-inventories/ftp-download-not-working-what-do-i-do

dt-woods commented on July 18, 2024

I found what looks like it might be CEMS data here:

https://gaftp.epa.gov/Air/emismod/2016/v3/2016emissions/

but I'm not getting that it's by state or by quarter.

m-jamieson commented on July 18, 2024

Ugh. https://github.com/USEPA/cam-api-examples

dt-woods commented on July 18, 2024

A challenge with the API solution is managing the API key, which can be done within the config.yml, similar to what was done in the scenario modeler.

I found the CAMPD custom data download:

https://campd.epa.gov/data/custom-data-download

Their website calls the following API (e.g., for Pennsylvania, Q1, 2016):

https://api.epa.gov/easey/streaming-services/emissions/apportioned/quarterly/by-facility?=&year=2016&quarter=1&stateCode=PA

Looking at cems_data.py, it seems that build_cems_df groups (aggregates) the quarterly results... do we need hourly data?

Also, does/could stewi support CAMS emission inventory?

m-jamieson commented on July 18, 2024

The StEWI question is one for @bl-young.

Can we use that API as-is without having to get a key, etc.?

One alternative solution that I believe I discussed with Ben before is using personal API keys to download the data and then make that data available in repositories. He touched on this in issue 205 #205 (comment). It's not as transparent as I think we would ultimately like it, but it will save potential users the trouble of having to get their own API keys to get at the data.

bl-young commented on July 18, 2024

Flowsa has dealt with API key issues in the past (see here, though I think there is talk of tweaking the approach in the future). I think that if this is facility data, and it falls within the schema available in stewi (e.g. FlowByFacility), then it could be a candidate for hosting this workflow. I am not very familiar with the nuances of this specific source. StEWI does not currently have an approach for handling API keys. Also seems like this could be a problem:

But yes, I think Matt's idea in general is a good one. Have a processed version available somewhere already, but allow users the ability to generate their own for whatever reason if they get an API key. Though I would recommend not storing it in the repository itself but rather some other public spot. Or vice versa, first check if an API key exists and process locally, and if it does not, go grab the processed data from your external source. That way it is still transparent and shows the script/fxn used to generate the processed data but users don't have to run that chunk of code.

Metadata is crucial here for reproducibility because users may be using different versions of the CEMS data depending on when it was pulled and whether it changed.
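
The fallback order and the metadata point above can be sketched roughly as follows. This is only an illustration under stated assumptions: `fetch_from_api` and `fetch_processed` are hypothetical callables supplied by the caller (neither exists in electricitylci, stewi, or esupy), and the metadata fields are one possible choice rather than an established standard.

```python
import hashlib
import json
from datetime import datetime, timezone


def load_annual_cems(year, api_key, fetch_from_api, fetch_processed):
    """Pull from the API when a key is available, else use the hosted copy.

    fetch_from_api(year, api_key) and fetch_processed(year) are hypothetical
    callables; each is expected to return a list of record dicts.
    """
    if api_key:
        records, source = fetch_from_api(year, api_key), "api"
    else:
        records, source = fetch_processed(year), "processed"

    # Hash a canonical JSON dump so that two pulls can be compared later.
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    metadata = {
        "source": source,
        "year": year,
        "retrieved_utc": datetime.now(timezone.utc).isoformat(),
        "record_count": len(records),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }
    return records, metadata
```

Storing the hash alongside the retrieval time makes it possible to tell whether a local pull matches the hosted dataset or whether the CEMS data changed in between.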

m-jamieson commented on July 18, 2024

Oh I forgot to add, we don't need hourly data. I think I grabbed the quarterly data originally for this very reason. I believe we only got daily data that way. At this point for the eLCI, annual emissions are all that are needed - that would match every other data source, and that's how the quarterly data is aggregated anyways.

In some future version, it might be nice to be able to generate data at some fraction of the year, seasonal or even daily, but for now, I wouldn't worry about that at all. That said, if annual data is somehow available through the API, I would take that: it is a much smaller download.
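
For reference, the quarterly-to-annual aggregation mentioned above amounts to a per-facility sum. The sketch below is not the code in cems_data.py; it assumes quarterly records are dicts carrying `facilityId`, `year`, and the same field names that appear in the annual API output elsewhere in this thread, with missing values given as `None`.

```python
from collections import defaultdict

# Field names assumed to match the annual API response shown in this thread.
SUM_FIELDS = ("grossLoad", "steamLoad", "so2Mass", "co2Mass", "noxMass", "heatInput")


def quarterly_to_annual(records):
    """Sum quarterly records into one record per (facilityId, year).

    A field that is None in every quarter stays None in the annual record.
    """
    totals = defaultdict(dict)
    for rec in records:
        sums = totals[(rec["facilityId"], rec["year"])]
        for field in SUM_FIELDS:
            value = rec.get(field)
            if value is not None:
                sums[field] = sums.get(field, 0.0) + value
    return [
        {"facilityId": fid, "year": yr, **{f: sums.get(f) for f in SUM_FIELDS}}
        for (fid, yr), sums in sorted(totals.items())
    ]
```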

dt-woods commented on July 18, 2024

It's pretty straightforward with the API key, which took all of about 30 seconds to request from here; see snippet:

>>> import requests
>>> s_url = "https://api.epa.gov/easey/streaming-services/emissions/apportioned/annual/by-facility"
>>> params = {'api_key': 'abcXYZ', 'year': 2016, 'stateCode': 'PA'}
>>> r = requests.get(s_url, params=params)
>>> len(r.json())  # number of facilities in PA for 2016
74
>>> r.json()
[{'stateCode': 'PA',
  'facilityName': 'Brunot Island Power Station',
  'facilityId': 3096,
  'year': 2016,
  'grossLoad': 44642.41,
  'steamLoad': None,
  'so2Mass': 0.177,
  'co2Mass': 34891.613,
  'noxMass': 9.454,
  'heatInput': 587078.638},
...
{'stateCode': 'PA',
  'facilityName': 'ETMT Marcus Hook Terminal',
  'facilityId': 880107,
  'year': 2016,
  'grossLoad': None,
  'steamLoad': 2161856.78,
  'so2Mass': None,
  'co2Mass': None,
  'noxMass': 41.047,
  'heatInput': 3261517.455}]

dt-woods commented on July 18, 2024

For completeness, here is where I referenced for my API call snippet:

https://github.com/USEPA/cam-api-examples/blob/main/Python/facility_data_demo.py

bl-young commented on July 18, 2024

I know this question is not for me... but I agree with these choices and would of course not recommend 1. I still think 2 and 3 are not mutually exclusive given that 2 provides a useful way towards reproducibility without worrying about changes in the CEMS data, or if the API goes down.

EPA's data management system has been quite easy to work with https://dmap-data-commons-ord.s3.amazonaws.com/index.html?prefix= and esupy is already configured to use it, if that is the route you decide to take.

m-jamieson commented on July 18, 2024

I would say I do agree with Ben's approach - if the user does not have an API key or if they choose to download, then we can provide "canonical" datasets on the AWS site mentioned above. In the short term and in the interest of getting newer data and a working version of elci out there, I would suggest that we focus on getting the canonical data sorted using manual pulls if necessary, while keeping up the metadata standard that exists on the EPA data management system. The other reason I would advocate for this is that I know my machine runs into issues with the RCRAInfo (I think) data pulls because it relies on a chrome plugin that is blocked by my government machine.

The branched approach and building up all the API calls to me at least sounds like more effort than I intended for this year and is maybe something to shoot for next year, as we likely have to integrate EIA API calls as well. Up to two keys needed!

dt-woods commented on July 18, 2024

The backend API call for CAMS needs to be updated, see here.

A preliminary search over each state's annual facility emissions shows a maximum of around 150 records (for Texas). A single page request over the API returns at most 500 records. Be warned that a future year in which a state has more than 500 facility records will need to handle multiple page requests!
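
A minimal sketch of how multiple page requests could be handled if that ever happens. It is written against a caller-supplied `get_page(page, per_page)` callable so the loop can be exercised without network access. The query parameter names `page` and `perPage` in `make_getter` are assumptions to be checked against the API documentation, and pages are assumed to be numbered from 1.

```python
def fetch_all_pages(get_page, per_page=500, max_pages=1000):
    """Collect records across pages until a short (or empty) page comes back.

    get_page(page, per_page) is a hypothetical callable returning one page of
    results as a list. max_pages guards against an endless loop.
    """
    records = []
    for page in range(1, max_pages + 1):
        batch = get_page(page, per_page)
        records.extend(batch)
        if len(batch) < per_page:
            break
    return records


def make_getter(url, params):
    """Build a get_page callable around requests.get for a given endpoint."""
    import requests  # third-party; imported lazily so the loop above has no dependency

    def get_page(page, per_page):
        # 'page' and 'perPage' are assumed parameter names, not confirmed here.
        query = dict(params, page=page, perPage=per_page)
        resp = requests.get(url, params=query, timeout=60)
        resp.raise_for_status()
        return resp.json()

    return get_page
```

With the annual endpoint from the snippet earlier in the thread, usage would look like `fetch_all_pages(make_getter(s_url, params))`.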
