aaronpenne / get_noaa_ghcn_data
A tool to interface with and download Global Historical Climatology Network (GHCN) data into easily readable CSVs.
License: MIT License
The usual concat issue arose:
/get_noaa_ghcn_data/get_dly.py:202: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version of pandas will change to not sort by default.
To accept the future behavior, pass 'sort=False'.
To retain the current behavior and silence the warning, pass 'sort=True'.
Just add 'sort=False' to the pd.concat call on line 202 of get_dly.py:
df_all = pd.concat(list_dfs, axis=1, sort=False)
I'm getting the following exception when searching for stations, using the example from the screenshot on the README page. It fails while casting a column to float. I'm running this on the Windows Subsystem for Linux, so I know my setup is not typical, but I'm not sure whether that has anything to do with it.
I'd appreciate any help. It would be awesome to get this working, thanks!
230-****** WARNING ** WARNING ** WARNING ** WARNING ** WARNING ******
** This is a United States Department of Commerce computer **
** system, which may be accessed and used only for **
** official Government business by authorized personnel. **
** Unauthorized access or use of this computer system may **
** subject violators to criminal, civil, and/or administrative **
** action. All information on this computer system may be **
** intercepted, recorded, read, copied, and disclosed by and **
** to authorized personnel for official purposes, including **
** criminal investigations. Access or use of this computer **
** system by any person, whether authorized or unauthorized, **
** constitutes consent to these terms. **
****** WARNING ** WARNING ** WARNING ** WARNING ** WARNING ******
230 Anonymous access granted, restrictions apply
Enter station name, full or partial. (ex. Washington, san fran, USC): chico
Traceback (most recent call last):
  File "/home/mpaulus/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 1655, in _cast_types
    values = astype_nansafe(values, cast_type, copy=True)
  File "/home/mpaulus/.local/lib/python3.5/site-packages/pandas/core/dtypes/cast.py", line 703, in astype_nansafe
    return arr.astype(dtype)
ValueError: could not convert string to float: 'GSN'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "get_noaa_ghcn_data.py", line 17, in <module>
    station_id = get_station_id.get_station_id(ftp)
  File "/mnt/c/Users/mpaulus/get_noaa_ghcn_data-master/get_station_id.py", line 88, in get_station_id
    df = pd.read_fwf(local_full_path, widths=widths, names=names, dtype=dtype, header=None)
  File "/home/mpaulus/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 741, in read_fwf
    return _read(filepath_or_buffer, kwds)
  File "/home/mpaulus/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 455, in _read
    data = parser.read(nrows)
  File "/home/mpaulus/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 1069, in read
    ret = self._engine.read(nrows)
  File "/home/mpaulus/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 2269, in read
    data = self._convert_data(data)
  File "/home/mpaulus/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 2338, in _convert_data
    clean_conv, clean_dtypes)
  File "/home/mpaulus/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 1562, in _convert_to_ndarrays
    cvals = self._cast_types(cvals, cast_type, c)
  File "/home/mpaulus/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 1658, in _cast_types
    "type %s" % (column, cast_type))
ValueError: Unable to convert column LATITUDE to type <class 'float'>
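The traceback shows a non-numeric token ('GSN') ending up in the LATITUDE column, which suggests that for at least one row the column boundaries did not line up with the file. The actual cause inside get_station_id.py is not confirmed here. As a rough diagnostic sketch, one record at a time can be sliced by explicit column positions so that the offending line is reported instead of the whole column failing. The column positions below are an assumption taken from the GHCN-Daily readme's description of ghcnd-stations.txt; parse_station_line and the sample record are hypothetical and not part of this tool.

```python
# Column layout ASSUMED from the GHCN-Daily readme (1-based, inclusive).
STATION_COLUMNS = {
    "ID": (1, 11),
    "LATITUDE": (13, 20),
    "LONGITUDE": (22, 30),
    "ELEVATION": (32, 37),
    "STATE": (39, 40),
    "NAME": (42, 71),
    "GSN_FLAG": (73, 75),
    "HCN_CRN_FLAG": (77, 79),
    "WMO_ID": (81, 85),
}
NUMERIC = {"LATITUDE", "LONGITUDE", "ELEVATION"}


def parse_station_line(line):
    """Slice one fixed-width record into a dict, converting numeric fields."""
    record = {}
    for name, (start, end) in STATION_COLUMNS.items():
        text = line[start - 1:end].strip()
        if name in NUMERIC:
            try:
                record[name] = float(text)
            except ValueError:
                # Report the offending text and line rather than the whole column.
                raise ValueError(
                    "{}: cannot convert {!r} in line {!r}".format(name, text, line)
                ) from None
        else:
            record[name] = text
    return record


# Hypothetical record, built with the same widths so the columns line up.
sample = "{:<11} {:>8} {:>9} {:>6} {:<2} {:<30} {:<3} {:<3} {:<5}".format(
    "XX000000001", "39.7300", "-121.8400", "60.0", "CA",
    "EXAMPLE STATION", "GSN", "", "",
)
record = parse_station_line(sample)
print(record["NAME"], record["LATITUDE"], record["GSN_FLAG"])
```

Running this over every line of the downloaded station file, and catching ValueError, would point to the first row whose layout differs from the assumed widths.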
It would be convenient to grab a specific station using codes from other systems (IATA, etc.).
It would be convenient to find all stations within a region; this needs the appropriate lookups and cleansing.
The fuzzywuzzy library is an obvious first choice.
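To make the idea concrete without adding a dependency, here is a minimal sketch that uses difflib from the standard library in place of fuzzywuzzy (a deliberate swap, not what the issue proposes). The function name find_stations, the cutoff value, and the station names are all hypothetical.

```python
import difflib


def find_stations(query, station_names, limit=5, cutoff=0.6):
    """Return names matching the query: substring hits first, then fuzzy hits."""
    q = query.strip().upper()
    by_upper = {name.upper(): name for name in station_names}
    # Exact substring matches cover the common "partial name" case.
    substring_hits = [name for upper, name in by_upper.items() if q in upper]
    # Fuzzy matches catch misspellings; cutoff is a similarity ratio in [0, 1].
    close = difflib.get_close_matches(q, list(by_upper), n=limit, cutoff=cutoff)
    fuzzy_hits = [by_upper[u] for u in close if by_upper[u] not in substring_hits]
    return (substring_hits + fuzzy_hits)[:limit]


# Hypothetical station names for illustration only.
names = ["CHICO UNIV FARM", "SAN FRANCISCO DWTN", "WASHINGTON REAGAN AP"]
print(find_stations("chico", names))
print(find_stations("chcio", ["CHICO"]))  # transposed letters still match
```

Putting substring hits first keeps the current partial-name behavior, and the fuzzy pass only adds candidates the substring test missed.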
When using this tool on the following station:
USC00400081 Allardt, TN
The day with the maximum snow depth, 2/4/1998, is missing.
If the CSV is downloaded directly from NOAA, that day has snow data but no temperature data.
I didn't investigate, but does this tool remove incomplete entries?
Locally for now, eventually in the browser for #11
Porting this to Lambda would immediately make it a scalable (basic) API for downloading output files. It would likely need to stash the file in an S3 bucket and provide a link for download. This is a prep step toward making this thing a webapp.
This ties into removing the Pandas dependency #7 and porting to a web interface #11
Instead of cherry-picking one particular station in a city, return an aggregate of the data available for that city. This likely means finding the center coordinates of a city/area and using a mile-radius calculation to find the stations that fall within it, then doing a smart weighted average where data overlaps between stations.
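One possible shape for this, as a sketch only: a haversine great-circle distance picks the stations inside the radius, and an inverse-distance weight combines overlapping readings. The weighting scheme, the function names, and the station coordinates are all assumptions for illustration; the issue does not specify what "smart" weighting should mean.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def stations_within(center, stations, radius_miles):
    """Return (station, distance) pairs for stations inside the radius."""
    lat0, lon0 = center
    hits = []
    for station in stations:
        d = haversine_miles(lat0, lon0, station["lat"], station["lon"])
        if d <= radius_miles:
            hits.append((station, d))
    return hits


def weighted_mean(readings):
    """Inverse-distance weighted mean of (value, distance_miles) pairs.

    The +1.0 keeps the weight finite for a station at the center itself.
    """
    weights = [1.0 / (d + 1.0) for _, d in readings]
    return sum(w * v for w, (v, _) in zip(weights, readings)) / sum(weights)


# Hypothetical stations and same-day temperature readings.
stations = [
    {"id": "A", "lat": 0.0, "lon": 0.1, "tmax": 10.0},
    {"id": "B", "lat": 0.0, "lon": 1.0, "tmax": 20.0},
]
nearby = stations_within((0.0, 0.0), stations, radius_miles=10)
print([s["id"] for s, _ in nearby])
print(weighted_mean([(s["tmax"], d) for s, d in nearby]))
```

A real aggregate would also have to handle days where only some stations report, for example by computing the weighted mean per day over whichever stations have a value.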
Fun serverless challenge