openaq / openaq-api

OpenAQ Platform API - NO LONGER IN USE see https://github.com/openaq/openaq-api-v2

License: Other

Shell 1.15% JavaScript 98.04% Dockerfile 0.82%

openaq-api's People

olafveerman dolugen cclauss cherryshoe fpedrera jancarloviray anilprasad nickolasclarke argysamo moddyy chegejames wselviz huikyole abarciauskas-bgse datakind-dc webbkyr tsigie wrfcoin wegiangb seemachouhan imagineer-aman wswapped minh5 dominicwhite prabz 56eo kinow varunkrishnaswamy iamgoingtoforgetmyusername andrewharvey mon-air-cm kerenvascs masterofnone69 aditya-kukde ketaki-t bitner vcc-lg debug6290 rayxke n8ers andrewillomitzer

openaq-api's Issues

Poland - Sources

About 14 sites across the country.

Home page:
http://sojp.wios.warszawa.pl/

View hourly levels at a given station:
http://sojp.wios.warszawa.pl/?page=hourly-report&data=04-10-2015&site_id=12&csq_id=1414&dane=w1

View station description:
http://sojp.wios.warszawa.pl/index.php?page=site-description&t=1&o=2&site_id=69

cc: @jflasher

Store organization / provider

For proper attribution, it could make sense to store the organization providing the data. This could either be set on source level, or set per measurement by the adapter.

In the case of the Dutch (#25) and the Chilean (#29) data, there is a difference between data provider and the maintainer of the station.

@jflasher @RocketD0g Any idea if and how you want to store this?

Comment from Knight Foundation Application - Chisato

As a social scientist conducting research on air pollution management in Ulaanbaatar, Mongolia, I highly commend this Open AQ initiative. Air pollution is certainly the most pressing environmental health issue in this capital city. Since 2012, there have been increased efforts to improve air quality monitoring by installing more air quality monitors throughout Ulaanbaatar. However, more monitors do not necessarily catalyze better data sharing. Even if these technologies produce reliable, real-time air quality data, this data will not make an impact on the research community, development field, and most importantly, the public-at-large if there is no sustainable, user-friendly, robust system set in place for data sharing. After all, air pollution is an inherently social problem. We need to connect the data to people. I believe that Open AQ would provide this foundational connection. Open source sharing of air quality data would create and sustain a global commitment around air pollution issues -- connecting people across cities and regions to examine the various ways to tackle this environmental challenge. The Open AQ initiative also values the socio-cultural dimensions of the air pollution issue. Different cultures with different political-economic systems approach the air pollution problem, its management, and potential solutions in different ways. Open AQ will host its first workshop in Ulaanbaatar this November with the goal of bringing together local experts, media, and community members to develop, dispute, and deploy strategies that would best suit Ulaanbaatar. This demonstrates that the Open AQ initiative will engage with local communities as key members of developing this platform.

As a suggestion (once the air quality data is calibrated and complete), I recommend including a section or layer on guidelines and policies from different countries. I think it would be beneficial to make information about different interventions (especially related to health) accessible. How is Delhi tackling the air pollution issue? Can Mexico City use the same model? Why are Chinese residents wearing masks but not residents in Jakarta? People across the globe can learn how different governments are tackling the air pollution problem and/or how local communities are using the data to hold different actors and institutions accountable for air pollution reduction. For example, a lot of information on air pollution protection and air pollution-induced illness is not readily available or part of the public discourse. In order for people to take ownership over their health as inhabitants of polluted cities, I think that a guidelines "layer" would catalyze more urgency around this issue and strengthen efforts to improve air quality from within communities. I foresee a global "blog" on air pollution issues where people discuss, debate, and learn from each other on how to best tackle the air pollution problem in their own communities.

Romania - Sources

142 sources for Romania, updated hourly:
http://www.calitateaer.ro/valori.php

London file

https://www.kimonolabs.com/api/ap7bzdcw

Add timeouts to the requests in adapters

If we don't set timeouts on the requests, Heroku may time us out which feels worse. Makes me wonder if we should have a system-wide request object that gets passed around so we can set defaults in one place?
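One way the "defaults in one place" idea could look is sketched below. This is only an illustration: `REQUEST_DEFAULTS`, `buildRequestOptions`, `fetchText`, the 60-second timeout, and the User-Agent string are all hypothetical names and values, and the sketch relies on the global `fetch` and `AbortSignal.timeout` available in Node.js 18+ rather than on whatever request library the adapters currently use.

```javascript
// Hypothetical shared defaults; the 60-second timeout is an assumed value.
const REQUEST_DEFAULTS = Object.freeze({
  timeoutMs: 60 * 1000,
  headers: { 'User-Agent': 'openaq-fetch-sketch' }
});

// Merge per-adapter overrides on top of the shared defaults.
function buildRequestOptions (overrides = {}) {
  return {
    ...REQUEST_DEFAULTS,
    ...overrides,
    headers: { ...REQUEST_DEFAULTS.headers, ...(overrides.headers || {}) }
  };
}

// Fetch a URL as text, aborting once the timeout elapses.
// Relies on the global fetch and AbortSignal.timeout (Node.js 18+).
async function fetchText (url, overrides = {}) {
  const { timeoutMs, headers } = buildRequestOptions(overrides);
  const response = await fetch(url, {
    headers,
    signal: AbortSignal.timeout(timeoutMs)
  });
  if (!response.ok) {
    throw new Error(`Request to ${url} failed with status ${response.status}`);
  }
  return response.text();
}
```

An adapter for a slow source could then call `fetchText(url, { timeoutMs: 120000 })` without touching the shared defaults.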

Make measurements.value field a number

Right now it's a string, which doesn't work well if we intend to allow querying by value ranges.
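The underlying problem is that strings compare character by character, so `'9' > '10'` is true. A minimal sketch of coercing values before insert might look like this; `toNumericValue` is a hypothetical helper, and returning `null` for unparseable input is an assumed convention rather than an agreed one.

```javascript
// Convert a scraped value to a finite number, or null when it cannot be parsed.
function toNumericValue (raw) {
  if (typeof raw === 'number') return Number.isFinite(raw) ? raw : null;
  if (typeof raw !== 'string' || raw.trim() === '') return null;
  const parsed = Number(raw.trim());
  return Number.isFinite(parsed) ? parsed : null;
}
```

Sources that use a decimal comma or thousands separators would need adapter-specific handling before this step.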

Comment from Knight Foundation Application - MH

Connecting individuals (including the media, politicians, citizen-scientists, and even other scientists) with scientific data is a constant challenge, and OpenAQ is an outstanding leap forward. By providing data in a programmatic method that ANYONE can utilize, you all are setting the standard for how this should be done. High school student writing about air pollution downwind from a power plant in your neighborhood? Click a button and get data. Reporter writing a story on how pollution in your city compares to another across the world? Click two buttons and download the data. Scientist wanting to do complex queries across multiple locations, adjusting for seasonality and time of day? Incorporate the API into your Python script.

One suggestion for long-term, future work. I agree with many of the other commenters that OpenAQ would make for a fantastic platform to extend to additional types of data. Meteorological, sea ice, and terrestrial flux data, for example, can be difficult data sets to access for scientists and non-scientists alike. Often this data is squirreled away on a server in a proprietary format, is confusing to access, is not available by API, etc. Your platform could and should set the standard for how all types of scientific data are made easily and quickly accessible to the public.

How to include alternate unit sources

This just came up when looking to include Chilean data #37. For some of the measurements, they're reporting data in ppb or ppm (it looks like we may also be able to get it in ug/m3, but for the sake of argument, forget about that). We have the unit field in the measurement record for exactly this scenario, but do we actually want to use it? If some of the sources are reporting in ug/m3 and some are reporting in ppb or another alternate unit, it would seem to severely lessen the ability to directly visualize the data next to each other.

Is there an easy way to convert between ppb and ug/m3 or should we even do that?

cc/ @RocketD0g @olafveerman
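On the conversion question: for gases there is a standard relation, µg/m³ = ppb × molar mass / molar volume, where the molar volume comes from the ideal gas law. The sketch below assumes a reference condition of 25 °C and 1 atm (101.325 kPa); agencies differ on the reference temperature, so that default is an assumption that would need confirming per source. `ppbToUgm3` and `MOLAR_MASS` are hypothetical names. Particulate matter (PM2.5, PM10) is already a mass concentration and cannot be converted this way.

```javascript
const GAS_CONSTANT = 8.314462618; // J/(mol*K), equivalently kPa*L/(mol*K)

// Molar masses in g/mol.
const MOLAR_MASS = { o3: 48.00, no2: 46.01, so2: 64.07, co: 28.01 };

// Convert a gas concentration from ppb to micrograms per cubic metre.
function ppbToUgm3 (ppb, parameter, { tempC = 25, pressureKpa = 101.325 } = {}) {
  const molarMass = MOLAR_MASS[parameter];
  if (molarMass === undefined) {
    throw new Error(`No molar mass known for parameter: ${parameter}`);
  }
  // Molar volume of an ideal gas in L/mol (about 24.47 at 25 C and 1 atm).
  const molarVolume = GAS_CONSTANT * (tempC + 273.15) / pressureKpa;
  return ppb * molarMass / molarVolume;
}
```

A value reported in ppm can be multiplied by 1000 first. Whether to convert at all, or to store the reported unit and convert only at display time, remains the open design question.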

Moscow, Russia - Data Sources

48 hour Moscow data
http://mosecom.ru/air/air-today/station/spirid/table.html

Better email handling

Too many emails are getting sent; we need to figure out a saner way to handle this. They are currently disabled via the Heroku scheduler task until this is fixed.

Add ability to sort by value ranges

Now that values are numbers, we can do this.
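As a rough illustration of what numeric values make possible, here is an in-memory sketch of a range filter with ascending sort. The option names `from` and `to` are hypothetical, and a real implementation would presumably push this into the database query instead.

```javascript
// Keep measurements whose value lies within [from, to], sorted ascending.
function filterByValueRange (measurements, { from = -Infinity, to = Infinity } = {}) {
  return measurements
    .filter(m => m.value >= from && m.value <= to)
    .sort((a, b) => a.value - b.value);
}
```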

Taiwan - Sources

Taiwan PM2.5:

http://taqm.epa.gov.tw/pm25/en/PM25A.aspx?area=10

Station descriptions (including coordinates) available by clicking on station name on link directly above:
http://taqm.epa.gov.tw/taqm/en/Site/Keelung.aspx

Ozone, as well as PM2.5, is available on the main page, but not sure how it is accessible programmatically:
http://taqm.epa.gov.tw/taqm/en/

2015 GBD/WHO template for including data in their global databases

No Immediate Action Intended - Background Info

FYI, below is a useful template showing the type of information collected for the upcoming 2015 WHO and GBD global databases of annual average PM2.5 and PM10 pollution (they are currently looking primarily for 2014 data). I can't find the issue, but I think @olafveerman brought up the categorization of sites before (e.g. what are the criteria for residential, urban, industrial, etc.?). It has been indicated that there are currently no strict criteria for this, and countries are directed to fill out the template using their best judgement.

http://www.who.int/entity/phe/health_topics/outdoorair/databases/PHE-Template-OAP-database-entries-June2015.xls?ua=1

More China data sources

This should be in the exact same format as Beijing, and it is only PM2.5.

Chengdu - https://www.kimonolabs.com/api/bd87c7js
Guangzhou - https://www.kimonolabs.com/api/d4wfxfl2
Shanghai - https://www.kimonolabs.com/api/7jec7wh2
Shenyang - https://www.kimonolabs.com/apis/4lw816j4

New York City - Data Sources

There are several on this site for NYC and also NY State. Here is one for NY City (CCNY):
http://www.dec.ny.gov/airmon/stationStatus.php?stationNo=73

Automatically create csv of last day's data and put on S3

Paraphrasing Slack conversation:

Basically, I’d like some way to dump either/both database dumps and daily/weekly/monthly csv dumps to an S3 bucket and make them available for easy download. Want to make it easy for someone to grab all of our data at once, and that’s probably not through the API.
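The CSV half of this could start from something like the sketch below, which only builds the file contents; uploading to S3 is left out. `toCsv` and `escapeCell` are hypothetical helpers, and the field names in the usage note are assumptions about the measurement record.

```javascript
// Quote a cell when it contains a comma, quote, or line break (RFC 4180 style).
function escapeCell (value) {
  const text = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build a CSV string with a header row followed by one row per record.
function toCsv (records, fields) {
  const header = fields.map(escapeCell).join(',');
  const rows = records.map(record =>
    fields.map(field => escapeCell(record[field])).join(',')
  );
  return [header, ...rows].join('\n');
}
```

A daily job could call `toCsv(yesterdaysMeasurements, ['location', 'parameter', 'value', 'unit', 'date'])` and write the result to the bucket under a date-based key.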

Dutch data source

http://www.lml.rivm.nl/tabel/

Use new averagingPeriod style across all sources

Documentation

I'm working on some project related documentation. Mostly a glossary of the project (source, station, measurement), the application's flow and some guidelines on how to contribute.

I can imagine this living in a couple of places:

in the wiki of the openaq.github.io repo
Advantage: easy to edit
as a chapter on the Open AQ website
Advantage: easy to read, Disadvantage: less easy to contribute
as markdown files in the /docs folder of a repo
Disadvantage: not easy to read, not easy to contribute

@RocketD0g @jflasher Any thoughts on how you want to set this up?

Data Sources - Most polluted counties in California

Hourly AQ data for several places in CA that have high levels of PM relative to the rest of the country (specifically: Kern + Merced Counties, CA, the city of Fresno, CA, etc.):

http://www.valleyair.org/Programs/RAAN/raan_monitoring_system.htm

Figure out how to best handle the limit flag on locations endpoint.

Does paging make sense in the return data context? Right now it's just hardcoded to 500 to make sure we get all the results, but should probably respect limit.
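If paging is adopted, respecting `limit` while keeping the current 500 as an upper bound might look like the sketch below. The `page` parameter and the `meta` shape are assumptions for illustration, not the existing response format.

```javascript
const MAX_LIMIT = 500; // mirrors the currently hardcoded value

// Slice a full result list according to the limit and page query parameters.
function paginate (items, query = {}) {
  const requested = parseInt(query.limit, 10) || MAX_LIMIT;
  const limit = Math.min(Math.max(requested, 1), MAX_LIMIT);
  const page = Math.max(parseInt(query.page, 10) || 1, 1);
  const start = (page - 1) * limit;
  return {
    meta: { found: items.length, limit, page },
    results: items.slice(start, start + limit)
  };
}
```

Missing or unparseable parameters fall back to the first page with the maximum limit, so existing clients would see the same results as today.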

Skopje, Macedonia - Data Sources

Hourly data for each pollutant, using the following configuration shown in this attached image:

Add New Relic

Handle dates in a better way

We need to do some thinking about how to best handle dates across the platform. Dates should be stored in UTC in the database, but we probably need to keep some track of timezones for location and whether it supports DST? ugh.
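One possible approach, sketched under the assumption that each adapter can produce a local timestamp with an explicit UTC offset: keep the UTC instant for storage and queries, and keep the local string alongside it so the offset in effect at measurement time (DST included) is never lost. `toDateRecord` and the `{ utc, local }` shape are hypothetical.

```javascript
// Require an explicit offset such as "Z" or "+08:00" at the end of the timestamp.
const OFFSET_SUFFIX = /(Z|[+-]\d{2}:\d{2})$/;

// Turn a local ISO 8601 timestamp with offset into a { utc, local } pair.
function toDateRecord (localWithOffset) {
  if (!OFFSET_SUFFIX.test(localWithOffset)) {
    throw new Error(`Timestamp has no UTC offset: ${localWithOffset}`);
  }
  const parsed = new Date(localWithOffset);
  if (Number.isNaN(parsed.getTime())) {
    throw new Error(`Unparseable timestamp: ${localWithOffset}`);
  }
  return { utc: parsed.toISOString(), local: localWithOffset };
}
```

For example, `toDateRecord('2015-10-04T12:00:00+08:00')` yields a `utc` of `2015-10-04T04:00:00.000Z`. Sources that publish only a wall-clock time would still need a timezone lookup per location before this step.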

Chile data sources

Sinca is the Chilean AQ information system. It contains measurements from 194 stations, including the one in Valdivia (see #28).

Have to check:

the license under which this is published
whether there is an API we can use

Belgian sources

Measurements for a lot of Belgian measuring stations: http://www.ircel.be/nl/luchtkwaliteit/metingen
This page shows rolling averages for most parameters.

When you drill down, it's possible to get the actual hourly measurements and not the rolling averages. For example:

go to this page
click on table with detailed info per monitoring site

Have not found a programmatic way to access this data; it might need to be scraped.

Turkey - Sources

A map of stations with coordinates, together with each station's current readings, is here (though stations do not have unique URLs):

http://www.havaizleme.gov.tr/Default.ltr.aspx

'TÜM İSTASYONLAR' = all stations

Clicking on the stations reveals site coordinates (click 'station description') and pollutant types measured.

(Sidenote: They use an AQI system with the same breakpoints as the US EPA)

Add a deploy to Heroku button

https://devcenter.heroku.com/articles/heroku-button

Add status page

Comment from Knight Foundation Application - Lodoysamba

This is a constructive comment that was posted on our open Knight Foundation News Challenge (url at bottom).

This is an important work as scientists, researchers, and students should have access to air quality data via an internet portal. This is crucial for timely review and analysis at both the level of individual cities and regions, and at the international level. Superficial reports of atmospheric conditions solely from air quality stations are not adequate. Air quality depends not only on source emissions but also on weather conditions, population activity, and other factors. In some cases, initial data are not always available. In other cases, air quality offices refuse to share their data. Hence, while the environmental scientist aims to analyze and interpret the data, these problems of poor data quality and restricted access injure the scientist’s ability to generate quality measures to reduce air pollution. On the other hand, when comprehensive scientific data is available, policymakers and air quality officers are better able to orient their strategies to reduce air pollution.
An internet forum to warehouse data will help scientists from all nations to learn from one another. Such a forum would lead to improved methods to record data, analyze data, and utilize data more efficiently. By accessing a data warehouse, scientists in less developed countries could quickly learn how reduction measures affect air quality in other cities around the globe.
Furthermore, many students and researchers from non-environmental disciplines could also find value in the data. Mathematicians, for example, could use these types of data sets to improve tools of statistical data analysis.
The prototype http://openaq.org contains data for some analysis, but could use certain enhancements. For instance, the site ought to include information concerning the air quality measuring station type, along with the extent of validation or calibration of recording media to give researchers more confidence in the quality of the information.

prof.S.Lodoysamba, Mongolia

https://www.newschallenge.org/challenge/data/entries/openaq-the-first-open-air-quality-data-hub-for-the-world#c-b367e525a7e574817c19ad24b7b35607

Ulaanbaatar, Mongolia - Data Sources

Two sources of data for UB:

agaar.mn
http://www.ub-air.info/ub-air/en/laq/average-30min.html

Brazil - Sources

São Paulo - Internews is interested in this.

Hourly data is available by station:
http://sistemasinter.cetesb.sp.gov.br/Ar/php/ar_dados_horarios.php

Station description with pollutant + location here:
http://ar.cetesb.sp.gov.br/configuracao-da-rede-automatica/
Note: As far as I can tell, geographic coordinates not given and will have to be determined by contact, using address, or verifying through other sources.

Add a check on data insert to remove non-standard fields

This is probably part of a larger utility piece that would verify the data altogether (verify date is a date, value is a number, etc).
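A first cut of such a utility could both drop unknown fields and report validation problems, as in the sketch below. The `STANDARD_FIELDS` list is an assumption about the schema, and `sanitizeMeasurement` is a hypothetical name.

```javascript
// Assumed set of standard fields; the real schema may differ.
const STANDARD_FIELDS = ['location', 'parameter', 'value', 'unit', 'date'];

// Copy only standard fields and collect validation errors for the result.
function sanitizeMeasurement (raw) {
  const clean = {};
  for (const field of STANDARD_FIELDS) {
    if (raw[field] !== undefined) clean[field] = raw[field];
  }
  const errors = [];
  if (typeof clean.value !== 'number' || !Number.isFinite(clean.value)) {
    errors.push('value is not a number');
  }
  if (!(clean.date instanceof Date) || Number.isNaN(clean.date.getTime())) {
    errors.push('date is not a valid date');
  }
  return { measurement: clean, errors };
}
```

Returning the errors instead of throwing lets the fetch script decide whether to skip a single measurement or fail the whole source.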

Add a way to do a dryrun of the fetch script for testing purposes

Maybe a flag for --nodb and --noemail, make them default?
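A minimal sketch of the flag handling, using the `--nodb` and `--noemail` names proposed above plus a hypothetical `--dryrun` shorthand that turns both on. It checks for exact flag strings only and does not use an argument-parsing library.

```javascript
// Derive run options from command-line arguments.
function parseFetchFlags (argv) {
  const dryrun = argv.includes('--dryrun');
  return {
    nodb: dryrun || argv.includes('--nodb'),
    noemail: dryrun || argv.includes('--noemail')
  };
}

// Usage in the fetch script: const flags = parseFetchFlags(process.argv.slice(2));
```

If the safe behavior should be the default, the logic would flip so that writing to the database and sending email each require an explicit opt-in flag.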

Australian source

Hourly data available here.
No API found

Valdivia (Chile) data source

Data source reported by @ignacionf

http://recursos.datos.gob.cl/datastreams/94396/estado-del-aire-en-valdivia-2015/
with a JSON like source in: http://api.recursos.datos.gob.cl/datastreams/invoke/ESTAD-DEL-AIRE-EN-VALDI?auth_key=994baa562bdb5f34d17e78dd7957c233b6c0a5f5

The one thing to confirm is whether MPF refers to PM2.5 and MPG refers to PM10.

cc @jflasher

Rename Heroku app

Or else I will forget what it's doing in 2 months and try to delete it.

Peru - Data Sources

Lima Metropolitan area:

Station locations and coordinates:
http://calidaddelaire.minam.gob.pe/estaciones.php

Click on the little red dots in the above to get the hourly AQ data for each station. Looks like unique URLs:
http://www.senamhi.gob.pe/?p=0412&txt=112192

Add way to cut down number of fields in response for /measurements

With more data being added to the measurements record, we should add a way to specify response fields. Either a summary type flag or maybe a fields=date,value,... type option.
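The `fields=date,value,...` variant could be handled with a small projection step like the sketch below; `pickFields` is a hypothetical helper, and unknown field names are silently ignored here, which is a design assumption.

```javascript
// Return only the requested fields of a record; no parameter means no filtering.
function pickFields (record, fieldsParam) {
  if (!fieldsParam) return record;
  const wanted = fieldsParam.split(',').map(f => f.trim()).filter(Boolean);
  const picked = {};
  for (const field of wanted) {
    if (Object.prototype.hasOwnProperty.call(record, field)) {
      picked[field] = record[field];
    }
  }
  return picked;
}
```

Applied per result, for example `results.map(r => pickFields(r, query.fields))`, this keeps the default response unchanged for clients that do not pass the parameter.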

Japanese Sources

Sources for Yokohama:

http://cgi.city.yokohama.lg.jp/kankyou/saigai/data/taiki/all/all_0000_00_001.html
http://www.ihe.pref.miyagi.jp/telem/dayreportitem/?itemSelect=10&day=2015%E5%B9%B410%E6%9C%8804%E6%97%A5

Appears to be hourly but unsure. Joe is contacting Miyagi Prefecture regarding details, potentially existing API, station coordinates.

Tokyo:
Just oxides? Need help with translation:
http://www.ox.kankyo.metro.tokyo.jp/index.php?chiku=1
http://www.ox.kankyo.metro.tokyo.jp/

Main page: http://www.kankyo.metro.tokyo.jp/nature/index.html

Validated / unvalidated

@RocketD0g How do you feel about adding a validated / un-validated flag? This might be valuable information, especially when we start adding validated sources.

@jflasher's fine with it. Just checked with him.

How to handle negative values?

With the inclusion of #61, we are going to be pulling some negative values into the platform. There may already be some, just noticed with latest data source. Some of the negative values are -0.25 and some are -999.

For right now, I think we just store these as is, but in the future do we throw out measurements with negative values? Do we keep them in the platform and leave it up to others to remove them?
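If the values stay in the platform, one middle ground is to flag them so that consumers can filter easily. The sketch below assumes -999 is a "no data" sentinel, which is a common convention but would need confirming per source, and treats any other negative reading as a separate case. `flagValue` and the flag names are hypothetical.

```javascript
// Assumption: -999 is a "no data" sentinel; confirm this for each source.
const SENTINELS = [-999];

// Classify a numeric measurement value without altering it.
function flagValue (value) {
  if (SENTINELS.includes(value)) return 'missing';
  if (value < 0) return 'negative';
  return 'ok';
}
```

Small negatives such as -0.25 can be legitimate instrument readings near the detection limit, which is one argument for flagging rather than discarding them.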

Comment from Knight Foundation Application - Langley

As a research scientist working on air pollution issues, more accessible AQ data in different regions of the world is highly critical to understanding sources, transport, and transformation of air pollutants in the atmosphere. A major global health issue, atmospheric particulate formation and transport is still not fully understood by the scientific community, and increasing the geospatial resolution of available AQ measurements for modellers and researchers could really raise our understanding of these issues. Additionally, this platform could help inform the public about local air quality issues and provide needed data for medical workers and journalists.
During my time working in a developing country, I have found it frustrating to try to scour scientific papers for names of scientists that may or may not know where AQ data is kept (when preparing proposals, briefings, and other official reports).
I like the suggestion from Chisato about including a section on guidelines and policies in different countries. This will allow direct impact of regulations to be observed. I am currently working in a developing country attempting to regulate AQ, and it is difficult for the officials to decide what type of AQ monitoring equipment to purchase and which regulations to push initially. Knowing what countries with similar air pollution sources and available resources have done in the past, and how this worked, would really be an asset for developing countries beginning to address AQ issues.
On a more scientific note, if possible and if available, including the meteorological data often captured by AQ monitoring stations (wind direction and wind speed) and a general description of measurement locations would help scientists best utilize these data in models.

Averaging period

As mentioned in #36, there are a couple of sources that report rolling averages. We seem to agree to store this with every measurement, but how to go about it?

We can either do a general purpose note field that can be used for anything:

{
  parameter: 'pm25',
  value: 4,
  note: '24 hour rolling average'
}

or we can attempt to standardize it in some way:

{
  parameter: 'pm25',
  value: 4,
  averagingPeriod: 24
}

@RocketD0g Do averaging periods tend to fall within a 4 - 24 hour range? Thoughts @jflasher ?
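The two options are not mutually exclusive: an adapter could start with the free-text note and derive the standardized field from it where possible. The sketch below shows that derivation for the numeric `averagingPeriod` form above, assuming the period is expressed in hours; `averagingPeriodFromNote` and `standardizeMeasurement` are hypothetical names.

```javascript
// Extract a number of hours from notes like '24 hour rolling average' or '8-hour mean'.
function averagingPeriodFromNote (note) {
  const match = /(\d+(?:\.\d+)?)[\s-]*(?:hours?|hrs?|h)\b/i.exec(note || '');
  return match ? Number(match[1]) : null;
}

// Replace a parseable note with averagingPeriod; leave the record alone otherwise.
function standardizeMeasurement (measurement) {
  const { note, ...rest } = measurement;
  const hours = averagingPeriodFromNote(note);
  return hours === null ? measurement : { ...rest, averagingPeriod: hours };
}
```

With the first example above as input, `standardizeMeasurement` produces the second one, with `averagingPeriod: 24` in place of the note.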