Comments (14)

dt-woods avatar dt-woods commented on July 18, 2024 1

> The StEWI question is one for @bl-young.
>
> Can we use that API as-is without having to get a key, etc.?
>
> One alternative solution that I believe I discussed with Ben before is using personal API keys to download the data and then making that data available in repositories. He touched on this in issue #205 (comment). It's not as transparent as I think we would ultimately like, but it will save potential users the trouble of having to get their own API keys to get at the data.

Matt, the API requires a key.

I'm hearing a few options; please clarify which you prefer:

  1. We create an 'ELCI' API key and include it with the electricitylci package. The annual facility-level emissions should be well under the 1 M record limit, but this leaves the key exposed.
  2. We query the API and provide annual emissions (e.g., 2016–2022), but this requires a public server to store the data or bloats the repository data directory even more than it already is.
  3. We integrate the API key as a configuration parameter in the YAML, but this requires users to apply for their own key.

Or some combination?

P.S. I don't think requesting an API key was burdensome and I think it helps government agencies track (and justify) their data management budgets.

from electricitylci.

bl-young avatar bl-young commented on July 18, 2024 1

> The other reason I would advocate for this is that I know my machine runs into issues with the RCRAInfo (I think) data pulls because it relies on a chrome plugin that is blocked by my government machine.

In this regard you are in luck! @dt-woods found a solution there a few weeks ago: USEPA/standardizedinventories#146

dt-woods avatar dt-woods commented on July 18, 2024

Here's what I found for the new HTTPS site:

dt-woods avatar dt-woods commented on July 18, 2024

I found what looks like it might be CEMS data here:

but it doesn't appear to be broken out by state or by quarter.

m-jamieson avatar m-jamieson commented on July 18, 2024

Ugh. https://github.com/USEPA/cam-api-examples

dt-woods avatar dt-woods commented on July 18, 2024

A challenge with the API solution is managing the API key, which can be done within the config.yml, similar to what was done in the scenario modeler.
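A rough sketch of how the key lookup could work, assuming the config.yml has already been parsed into a dict. The `epa_api_key` config entry and the `EPA_API_KEY` environment variable are hypothetical names for illustration, not anything that exists in electricitylci today:

```python
import os


def resolve_api_key(config, env_var="EPA_API_KEY", environ=None):
    """Return an API key from a parsed config mapping or the environment.

    `config` is the already-parsed config.yml (a dict or None).
    The 'epa_api_key' entry and the EPA_API_KEY variable are hypothetical names.
    Returns None when no key is configured anywhere.
    """
    environ = os.environ if environ is None else environ
    # Prefer the config entry, then fall back to the environment variable.
    key = (config or {}).get("epa_api_key") or environ.get(env_var)
    if key is None:
        return None
    # Treat a blank or whitespace-only entry the same as a missing key.
    key = str(key).strip()
    return key or None
```

Falling back to an environment variable would let users keep the key out of any file that might get committed by accident.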

I found the CAMPD custom data download:

Their website calls the following API (e.g., for Pennsylvania, Q1, 2016):

Looking at cems_data.py, it seems that build_cems_df groups (aggregates) the quarterly results... do we need hourly data?

Also, does/could stewi support CAMS emission inventory?

m-jamieson avatar m-jamieson commented on July 18, 2024

The StEWI question is one for @bl-young.

Can we use that API as-is without having to get a key, etc.?

One alternative solution that I believe I discussed with Ben before is using personal API keys to download the data and then making that data available in repositories. He touched on this in issue #205 (comment). It's not as transparent as I think we would ultimately like, but it will save potential users the trouble of having to get their own API keys to get at the data.

bl-young avatar bl-young commented on July 18, 2024

Flowsa has dealt with API key issues in the past (see here, though I think there is talk of tweaking the approach in the future). I think that if this is facility data, and it falls within the schema available in stewi (e.g. FlowByFacility), then it could be a candidate for hosting this workflow. I am not very familiar with the nuances of this specific source. StEWI does not currently have an approach for handling API keys. This also seems like it could be a problem:
image

But yes, I think Matt's idea in general is a good one: have a processed version available somewhere already, but allow users to generate their own for whatever reason if they get an API key. I would recommend not storing it in the repository itself, though, but rather in some other public spot. Or vice versa: first check if an API key exists and process locally, and if it does not, go grab the processed data from your external source. That way it is still transparent and shows the script/function used to generate the processed data, but users don't have to run that chunk of code.
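To make the branching concrete, here is a minimal sketch of the "key first, processed data otherwise" logic. The function name and the two callables (`fetch_from_api`, `fetch_processed`) are placeholders I made up for illustration; the real download code would be passed in:

```python
def load_annual_emissions(year, api_key, fetch_from_api, fetch_processed):
    """Return (records, source) for one year.

    fetch_from_api(year, api_key) and fetch_processed(year) are hypothetical
    callables supplied by the caller; neither exists in electricitylci today.
    """
    if api_key:
        try:
            return fetch_from_api(year, api_key), "api"
        except OSError:
            # Network-level failures (requests' exceptions derive from OSError)
            # fall through to the pre-processed copy.
            pass
    return fetch_processed(year), "processed"
```

Returning the source label alongside the records makes it easy to log which path produced a given run.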

Metadata is crucial here for reproducibility because users may be using different versions of the CEMS data depending on when it was pulled and whether it changed.
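As a sketch of the kind of metadata I mean, something like the following could be stored next to each pull. The field names here are made up for illustration and are not the esupy/StEWI metadata schema; the `api_key` parameter name is assumed to be how the key is passed in the query:

```python
import hashlib
import json
from datetime import datetime, timezone


def build_metadata(url, params, records):
    """Describe one data pull so a later run can tell whether the data changed.

    Field names are illustrative only (not the esupy metadata format).
    """
    # Never write the API key into metadata that gets published.
    public_params = {k: v for k, v in params.items() if k != "api_key"}
    # Hash a canonical JSON dump so identical records give identical hashes.
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return {
        "source_url": url,
        "query": public_params,
        "retrieved_utc": datetime.now(timezone.utc).isoformat(),
        "record_count": len(records),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }
```

Comparing the `sha256` values of two pulls is a cheap way to detect that the upstream CEMS data changed between them.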

m-jamieson avatar m-jamieson commented on July 18, 2024

Oh I forgot to add, we don't need hourly data. I think I grabbed the quarterly data originally for this very reason. I believe we only got daily data that way. At this point for the eLCI, annual emissions are all that are needed - that would match every other data source, and that's how the quarterly data is aggregated anyways.

In some future version, it might be nice to be able to generate data at some fraction of the year, seasonal or even daily, but for now, I wouldn't worry about that at all. That is, if annual data is somehow available through the API, I would take that, since it is a much smaller download.
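For illustration, rolling quarterly records up to annual totals per facility amounts to something like the sketch below. This is not the actual build_cems_df code; the record layout and field names (`facilityId`, `year`, `so2Mass`, `co2Mass`, `noxMass`) are assumptions, and missing values are assumed to arrive as `None`:

```python
from collections import defaultdict

MASS_FIELDS = ("so2Mass", "co2Mass", "noxMass")


def annual_totals(quarterly_records, fields=MASS_FIELDS):
    """Sum quarterly values into one record per (facilityId, year).

    A field stays None when every quarter reported None for it.
    """
    totals = defaultdict(dict)
    for rec in quarterly_records:
        # Touch the bucket first so all-None facilities still appear.
        bucket = totals[(rec["facilityId"], rec["year"])]
        for field in fields:
            value = rec.get(field)
            if value is not None:
                bucket[field] = bucket.get(field, 0.0) + value
    return [
        {"facilityId": fid, "year": year, **{f: vals.get(f) for f in fields}}
        for (fid, year), vals in totals.items()
    ]
```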

dt-woods avatar dt-woods commented on July 18, 2024

It's pretty straightforward with the API; the key took all of about 30 seconds to request from here. See snippet:

>>> import requests
>>> s_url = "https://api.epa.gov/easey/streaming-services/emissions/apportioned/annual/by-facility"
>>> params = {'api_key': 'abcXYZ', 'year': 2016, 'stateCode': 'PA'}
>>> r = requests.get(s_url, params=params)
>>> len(r.json())  # number of facilities in PA for 2016
74
>>> r.json()
[{'stateCode': 'PA',
  'facilityName': 'Brunot Island Power Station',
  'facilityId': 3096,
  'year': 2016,
  'grossLoad': 44642.41,
  'steamLoad': None,
  'so2Mass': 0.177,
  'co2Mass': 34891.613,
  'noxMass': 9.454,
  'heatInput': 587078.638},
...
{'stateCode': 'PA',
  'facilityName': 'ETMT Marcus Hook Terminal',
  'facilityId': 880107,
  'year': 2016,
  'grossLoad': None,
  'steamLoad': 2161856.78,
  'so2Mass': None,
  'co2Mass': None,
  'noxMass': 41.047,
  'heatInput': 3261517.455}]

dt-woods avatar dt-woods commented on July 18, 2024

For completeness, here is the reference I used for my API call snippet:

bl-young avatar bl-young commented on July 18, 2024

I know this question is not for me... but I agree with these choices and would of course not recommend 1. I still think 2 and 3 are not mutually exclusive given that 2 provides a useful way towards reproducibility without worrying about changes in the CEMS data, or if the API goes down.

EPA's data management system has been quite easy to work with (https://dmap-data-commons-ord.s3.amazonaws.com/index.html?prefix=) and esupy is already configured to use it, if that is the route you decide to take.

m-jamieson avatar m-jamieson commented on July 18, 2024

I would say I do agree with Ben's approach: if the user does not have an API key or if they choose to download, then we can provide "canonical" datasets on the AWS site mentioned above. In the short term, and in the interest of getting newer data and a working version of elci out there, I would suggest that we focus on getting the canonical data sorted, using manual pulls if necessary, while keeping up the metadata standard that exists on the EPA data management system. The other reason I would advocate for this is that I know my machine runs into issues with the RCRAInfo (I think) data pulls because it relies on a chrome plugin that is blocked by my government machine.

The branched approach and building up all the API calls to me at least sounds like more effort than I intended for this year and is maybe something to shoot for next year, as we likely have to integrate EIA API calls as well. Up to two keys needed!

dt-woods avatar dt-woods commented on July 18, 2024

The backend API call for CAMS needs to be updated; see here.

A preliminary search over each state's annual facility emissions shows a maximum of around 150 records (for Texas). The maximum returned from a single page request over the API is 500 records. Be warned that a future year with a state with more than 500 facility records will need to handle multiple page requests!
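If it ever comes to that, the multi-page handling could look roughly like this sketch. `get_page` is a hypothetical callable that would wrap the actual `requests.get` call with whatever paging parameters the API documents; I have not checked the exact parameter names, so they are left out on purpose:

```python
def fetch_all_pages(get_page, per_page=500, max_pages=1000):
    """Collect records from a paged endpoint.

    get_page(page, per_page) is a hypothetical callable returning a list of
    records for a 1-based page number. A short (or empty) page marks the end.
    """
    records = []
    for page in range(1, max_pages + 1):
        batch = get_page(page, per_page)
        records.extend(batch)
        if len(batch) < per_page:
            return records
    # Guard against an endpoint that never returns a short page.
    raise RuntimeError("page limit reached before the last page")
```

With the current data (at most about 150 records per state), this would return after the first request.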
