hyriver / pygeohydro

A part of HyRiver software stack for accessing hydrology data through web services

Home Page: https://docs.hyriver.io

License: Other

Python 100.00%
Topics: hydrology, watershed, data-visualization, webservices, hydrologic-database, climate-data, python, usgs

pygeohydro's Introduction


The stack consists of the following packages:

  • PyNHD: Navigate and subset NHDPlus (MR and HR) using web services
  • Py3DEP: Access topographic data through the National Map's 3DEP web service
  • PyGeoHydro: Access NWIS, NID, WQP, eHydro, NLCD, CAMELS, and SSEBop databases
  • PyDaymet: Access daily, monthly, and annual climate data via Daymet
  • PyGridMET: Access daily climate data via GridMet
  • PyNLDAS2: Access hourly NLDAS-2 data via web services
  • HydroSignatures: A collection of tools for computing hydrological signatures
  • AsyncRetriever: High-level API for asynchronous requests with persistent caching
  • PyGeoOGC: Send queries to any ArcGIS RESTful-, WMS-, and WFS-based service
  • PyGeoUtils: Utilities for manipulating geospatial, (Geo)JSON, and (Geo)TIFF data

HyRiver: Hydroclimate Data Retriever

Features

HyRiver is a software stack of ten Python libraries designed to aid hydroclimate analysis through web services. Currently, the project covers hydrology and climatology data for the US only. Some major capabilities of HyRiver are:

  • Easy access to many web services for subsetting data server-side and returning the results as masked Datasets or GeoDataFrames.
  • Splitting large requests into smaller chunks, under the hood, since web services often limit the number of features per request; as a result, the only bottleneck for subsetting the data is your local machine's memory.
  • Navigating and subsetting NHDPlus database (both medium- and high-resolution) using web services.
  • Cleaning up the vector NHDPlus data, fixing some common issues, and computing vector-based accumulation through a river network.
  • A URL inventory for many popular (and tested) web services.
  • Some utilities for manipulating the obtained data and their visualization.
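The request-chunking behavior described above can be sketched roughly as follows; `chunked` is an illustrative helper with an arbitrary chunk size, not the actual HyRiver internal:

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def chunked(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield successive chunks of at most ``size`` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk

# A large list of feature IDs is split into service-sized requests;
# the partial responses would then be concatenated afterwards.
feature_ids = list(range(250))
requests = list(chunked(feature_ids, 100))
print([len(r) for r in requests])  # → [100, 100, 50]
```

The key point is that the caller never sees the chunking; only the final, merged result comes back.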


Please visit the examples webpage to see some example notebooks. You can also watch these videos for a quick overview of HyRiver's capabilities:

You can also try this project without installing it on your system by clicking the Binder badge. A JupyterLab instance with the HyRiver software stack pre-installed will launch in your web browser, and you can start coding!

Please note that this project is in its early development stages; while the provided functionality should be stable, API changes are possible in new releases. We appreciate you giving the project a try and providing feedback. Contributions are most welcome.

Moreover, requests for additional databases and functionality can be submitted via the issue trackers of the individual packages.

Citation

If you use any of HyRiver packages in your research, we appreciate citations:

@article{Chegini_2021,
    author = {Chegini, Taher and Li, Hong-Yi and Leung, L. Ruby},
    doi = {10.21105/joss.03175},
    journal = {Journal of Open Source Software},
    month = {10},
    number = {66},
    pages = {1--3},
    title = {{HyRiver: Hydroclimate Data Retriever}},
    volume = {6},
    year = {2021}
}

Installation

You can install all the packages using pip:

$ pip install py3dep pynhd pygeohydro pydaymet pygridmet pynldas2 hydrosignatures pygeoogc pygeoutils async-retriever

Please note that installation with pip fails if libgdal is not installed on your system; you should install it manually beforehand. For example, on Ubuntu-based distros the required package is libgdal-dev. Once it is installed, you should be able to run gdal-config --version successfully.

Alternatively, you can install them using conda:

$ conda install -c conda-forge py3dep pynhd pygeohydro pydaymet pygridmet pynldas2 hydrosignatures pygeoogc pygeoutils async-retriever

or mambaforge (recommended):

$ mamba install py3dep pynhd pygeohydro pydaymet pygridmet pynldas2 hydrosignatures pygeoogc pygeoutils async-retriever

Additionally, using the provided environment.yml file, you can create a new environment named hyriver with all the packages and optional dependencies installed via mambaforge:

$ mamba env create -f ./environment.yml
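After installing, a quick way to confirm that every package is importable is to probe each module with importlib. This is just a convenience check, not part of HyRiver; note that the async-retriever distribution imports as async_retriever:

```python
import importlib.util

# Map PyPI distribution names to importable module names.
packages = {
    "py3dep": "py3dep",
    "pynhd": "pynhd",
    "pygeohydro": "pygeohydro",
    "pydaymet": "pydaymet",
    "pygridmet": "pygridmet",
    "pynldas2": "pynldas2",
    "hydrosignatures": "hydrosignatures",
    "pygeoogc": "pygeoogc",
    "pygeoutils": "pygeoutils",
    "async-retriever": "async_retriever",
}

# find_spec returns None for modules that cannot be imported.
missing = [pypi for pypi, mod in packages.items()
           if importlib.util.find_spec(mod) is None]
print("missing:", missing or "none")
```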


pygeohydro's People

Contributors

aaraney, cheginit, deepsourcebot, dependabot-preview[bot], dependabot[bot], emiliom, fernando-aristizabal, pre-commit-ci[bot]


pygeohydro's Issues

py3dep.elevation_bycoords returns 'None' for valid locations

What happened:
Passing a list of zipped lat,lon values to py3dep.elevation_bycoords returned 'None' for all pairs.

What you expected to happen:

Expected a list of elevations.

Minimal Complete Verifiable Example:

import py3dep

coords = [
    (42.69513, -71.030437),
    (42.694901, -71.027653),
    (42.695388, -71.026931),
    (42.695383, -71.026942),
    (42.696471, -71.023837),
    (42.697136, -71.023545),
    (42.699233, -71.024387),
    (42.698356, -71.021488),
    (42.696643, -71.023499),
    (42.694305, -71.030054),
    (42.693343, -71.03474),
    (42.693349, -71.034757),
    (42.694002, -71.035491),
    (42.693452, -71.033743),
]

elev = py3dep.elevation_bycoords(coords, crs="epsg:4326")
print(elev)

[None, None, None, None, None, None, None, None, None, None, None, None, None, None]
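While the service issue is being investigated, a small defensive wrapper can separate resolved coordinates from failed ones instead of silently returning all Nones. `split_elevations` below is a hypothetical helper; `fetch` stands in for a call such as `py3dep.elevation_bycoords`:

```python
def split_elevations(coords, fetch):
    """Call ``fetch(coords)`` and split results into resolved and failed pairs.

    ``fetch`` is any callable returning one value (or None) per coordinate,
    e.g. ``lambda c: py3dep.elevation_bycoords(c, crs="epsg:4326")``.
    """
    results = fetch(coords)
    resolved = {c: e for c, e in zip(coords, results) if e is not None}
    failed = [c for c, e in zip(coords, results) if e is None]
    return resolved, failed

# With a stand-in fetcher that fails for the second point:
coords = [(42.69513, -71.030437), (42.694901, -71.027653)]
fake = lambda cs: [10.2, None]
resolved, failed = split_elevations(coords, fake)
print(failed)  # → [(42.694901, -71.027653)]
```

This makes it obvious which points need to be retried rather than producing a list of Nones with no context.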

Anything else we need to know?:

Environment:

Output of pygeohydro.show_versions():

INSTALLED VERSIONS
------------------
commit: 9729f67e75fe31fa6b5eb122562e4c0c22792c6d
python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:24:02) [Clang 11.1.0]
python-bits: 64
OS: Darwin
OS-release: 21.1.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

aiohttp: 3.8.1
aiohttp-client-cache: 0.5.2
aiosqlite: 0.17.0
async-retriever: 0.2.0
async-retriever>=0.2: None
click: 8.0.3
cytoolz: 0.11.2
dask: 2021.12.0
defusedxml: 0.7.1
geopandas>=0.7: None
lxml: 4.7.1
nest-asyncio: installed
netCDF4: 1.5.8
networkx: 2.6.3
numpy>=1.17: None
orjson: 3.6.5
owslib: 0.25.0
pandas>=1.0: None
pip: 21.3.1
py3dep: 0.11.4
py3dep>=0.11.3: None
pyarrow: 6.0.1
pydantic: 1.8.2
pydaymet: 0.11.4
pygeohydro: 0.11.4
pygeoogc: 0.11.7
pygeoogc>=0.11: None
pygeoogc>=0.11.5: None
pygeoutils: 0.11.7
pygeoutils>=0.11: None
pygeoutils>=0.11.5: None
pynhd: 0.11.1
pyproj>=2.2: None
pytest: None
rasterio>=1.2: None
requests: 2.26.0
requests-cache>=0.8: None
rioxarray>=0.8: None
scipy: 1.7.3
setuptools: 59.4.0
shapely>=1.6: None
simplejson: 3.17.6
ujson: 4.2.0
urllib3: 1.26.7
ward: None
xarray>=0.18: None
yaml: 6.0


`pygeohydro.soil_gnatsgo()` function erroring out

What happened?

I was following the soil storage capacity tutorial and was unable to run the pygeohydro.soil_gnatsgo() function without getting errors. My code is below. I'm not sure whether this is an issue with my conda environment or something related to the gNATSGO database, so I've also provided my environment yaml and the error message. It's uploaded as a text file because I couldn't upload it as a .yaml file.

I'm on Windows 11 with Python 3.12.2. I've defined my PROJ_LIB and PROJ_DATA paths at the start of the script, and those should point to the environment created by the yaml. I've tried to update rasterio with mamba update rasterio, but it says everything is up to date.

What did you expect to happen?

I expected output similar to the soil thickness shown in the soil storage capacity tutorial, but my code errors out and I'm not able to access the thickness data from gNATSGO. I've tried other STAC variables as well, with no luck there either.

Minimal Complete Verifiable Example

# get basin
test_basin = pynhd.NLDI().get_basins("11092450")

# get basin wkt string
test_basin_rasterio_wkt = rasterio.crs.CRS.from_wkt(test_basin.crs.to_wkt())

# get basin geom
test_basin_geom = test_basin.geometry["USGS-11092450"]

# get soil properties data (this works fine)
test_soils_data = pygeohydro.soil_properties() # this runs with rasterio warnings but gives result

# mask soil properties data with basin geom
test_soils_data_mask = pygeoutils.xarray_geomask(test_soils_data, test_basin_geom, test_basin_rasterio_wkt)
# i kept getting rasterio errors if i used test_basin.crs.to_wkt() here rather than test_basin_rasterio_wkt

# get soil thickness data
test_thickness_data = pygeohydro.soil_gnatsgo("tk0_999a", test_basin_geom, test_basin_rasterio_wkt)
# this has similar rasterio warnings as above but errors out with more rasterio errors
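When rasterio reports "The WKT could not be parsed", it can help to first check whether the WKT string handed to it is even structurally plausible. The rough, library-free check below is illustrative only; it catches truncation and bracket imbalance but cannot validate PROJ semantics:

```python
def wkt_looks_plausible(wkt: str) -> bool:
    """Rough structural check: non-empty, starts with a known WKT keyword,
    and has balanced square brackets. Not a real parser."""
    keywords = ("PROJCS", "GEOGCS", "PROJCRS", "GEOGCRS", "COMPD_CS", "BOUNDCRS")
    wkt = wkt.strip()
    if not wkt.upper().startswith(keywords):
        return False
    depth = 0
    for ch in wkt:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

print(wkt_looks_plausible('GEOGCS["WGS 84",DATUM["WGS_1984"]]'))  # → True
print(wkt_looks_plausible('GEOGCS["WGS 84",DATUM["WGS_1984"]'))   # → False
```

If the string passes this kind of check yet rasterio still fails, the problem is more likely in the PROJ database (as the SQLite warnings in the log suggest) than in the WKT itself.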

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

Error messages:

>>> test_mukey_data = pygeohydro.soil_gnatsgo("tk0_999a", test_field_geom, test_basin_rasterio_wkt)

WARNING:rasterio._env:CPLE_AppDefined in PROJ: proj_create_from_database: SQLite error on SELECT name, type, coordinate_system_auth_name, coordinate_system_code, datum_auth_name, datum_code, area_of_use_auth_name, area_of_use_code, text_definition, deprecated FROM geodetic_crs WHERE auth_name = ? AND code = ?: no such column: area_of_use_auth_name
WARNING:rasterio._env:CPLE_AppDefined in PROJ: proj_create_from_database: SQLite error on SELECT name, ellipsoid_auth_name, ellipsoid_code, prime_meridian_auth_name, prime_meridian_code, area_of_use_auth_name, area_of_use_code, deprecated FROM geodetic_datum WHERE auth_name = ? AND code = ?: no such column: area_of_use_auth_name
Traceback (most recent call last):
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\xarray\backends\file_manager.py", line 211, in _acquire_with_cache_info
    file = self._cache[self._key]
           ~~~~~~~~~~~^^^^^^^^^^^
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\xarray\backends\lru_cache.py", line 56, in __getitem__
    value = self._cache[key]
            ~~~~~~~~~~~^^^^^
KeyError: [<function open at 0x000001F4F2AAE3E0>, (WindowsPath('cache/005089ad56d76b182f3308ea5dc486455e1b3e28e2af21f4c554edab4c89a04a.tiff'),), 'r', (('sharing', False),), '69f84f8f-772c-4148-8a69-80468c6750b4']

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "rasterio\\crs.pyx", line 730, in rasterio.crs.CRS.from_wkt
  File "rasterio\\_err.pyx", line 209, in rasterio._err.exc_wrap_ogrerr
rasterio._err.CPLE_BaseError: OGR Error code 5

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\pygeohydro\pygeohydro.py", line 800, in soil_gnatsgo
    ds = xr.merge((get_layer(lyr) for lyr in lyrs), combine_attrs="drop_conflicts")
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\xarray\core\merge.py", line 1015, in merge
    for obj in objects:
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\pygeohydro\pygeohydro.py", line 800, in <genexpr>
    ds = xr.merge((get_layer(lyr) for lyr in lyrs), combine_attrs="drop_conflicts")
                   ^^^^^^^^^^^^^^
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\pygeohydro\pygeohydro.py", line 791, in get_layer
    ds = xr.merge(_open_tiff(f, lyr) for f in fpaths)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\xarray\core\merge.py", line 1015, in merge
    for obj in objects:
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\pygeohydro\pygeohydro.py", line 791, in <genexpr>
    ds = xr.merge(_open_tiff(f, lyr) for f in fpaths)
                  ^^^^^^^^^^^^^^^^^^
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\pygeohydro\pygeohydro.py", line 740, in _open_tiff
    ds = rxr.open_rasterio(file)
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\rioxarray\_io.py", line 1124, in open_rasterio
    riods = manager.acquire()
            ^^^^^^^^^^^^^^^^^
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\xarray\backends\file_manager.py", line 193, in acquire
    file, _ = self._acquire_with_cache_info(needs_lock)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\xarray\backends\file_manager.py", line 217, in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\rasterio\env.py", line 451, in wrapper
    return f(*args, **kwds)
           ^^^^^^^^^^^^^^^^
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\rasterio\__init__.py", line 304, in open
    dataset = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "rasterio\\_base.pyx", line 331, in rasterio._base.DatasetBase.__init__
  File "rasterio\\_base.pyx", line 350, in rasterio._base.DatasetBase._set_attrs_from_dataset_handle
  File "rasterio\\_base.pyx", line 408, in rasterio._base.DatasetBase.read_crs
  File "rasterio\\_base.pyx", line 385, in rasterio._base.DatasetBase._handle_crswkt
  File "rasterio\\crs.pyx", line 732, in rasterio.crs.CRS.from_wkt
rasterio.errors.CRSError: The WKT could not be parsed. OGR Error code 5

Anything else we need to know?

Thank you for creating the HyRiver python tools and the tutorials! They are very helpful and I'm excited to use them more in my work. 💦🐍

Environment

sheila_env_yaml.txt

StreamStats vs. NLDI Watershed

I was compiling a list of projects/users of the StreamStats services, and Dave Blodgett indicated you were using StreamStats. I see you've replaced StreamStats with NLDI services. I have a couple of additional pieces of information for you:

  • NLDI and StreamStats are working together to revise the NLDI delineation tools so they will delineate from a click point, not just from the catchment.
  • The data-processing and quality-assurance work, as well as the underlying data in StreamStats, typically mean that delineations from StreamStats will be more accurate than those from the NHDPlus datasets queried by NLDI. For example, South Carolina data is based on lidar, and we're currently working on 3-meter lidar data in Nebraska. Thus, depending on the use, you may want to offer StreamStats as an option alongside NLDI.

Thanks!


Shapely import issue on Darwin

  • Hydrodata version: 0.4.4
  • Python version: 3.7.7
  • Operating System: macOS 10.14.4
  • Using Conda

Description

I found an issue when importing anything from shapely.geometry on Shapely 1.7.0.
I get the following:

File "/Users/austinraney/miniconda3/envs/hydrodata/lib/python3.7/site-packages/shapely/geos.py", line 62, in load_dll
    libname, fallbacks or []))
OSError: Could not find lib cxx or load any of its variants [].

I found the related issue, and it seems the PR fixed it (I tested the change on my system, at least). Just something to be aware of. Hopefully they will update the package on PyPI soon.

What I Did

python -c "from shapely.geometry import Point"

Monthly & annual SSEBop ET available via OPeNDAP from USGS THREDDS server

Re: https://github.com/cheginit/hydrodata/blob/master/hydrodata/datasets.py#L854

Since there's still no web service available for subsetting SSEBop, the data first
needs to be downloaded for the requested period then it is masked by the
region of interest locally. Therefore, it's not as fast as other functions and
the bottleneck could be the download speed.
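Since the data must be downloaded for the whole requested period, one practical mitigation is to split the period into the per-month pieces the product is distributed at and fetch them individually. A standard-library sketch of that date chunking (the helper name is illustrative, not part of HyRiver):

```python
from datetime import date

def month_starts(start: date, end: date) -> list[date]:
    """First day of every month between start and end, inclusive."""
    months = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        months.append(date(y, m, 1))
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return months

print(month_starts(date(2020, 11, 5), date(2021, 2, 1)))
# → [datetime.date(2020, 11, 1), datetime.date(2020, 12, 1),
#    datetime.date(2021, 1, 1), datetime.date(2021, 2, 1)]
```

Each month start would then map to one download, so a failed transfer only needs that month retried rather than the whole period.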

FYI, there is an OPeNDAP endpoint available from the USGS CIDA THREDDS server (managed by David Blodgett, I think) for monthly and annual SSEBop ET -- though not daily:

NLCD not working

What happened:
Running any nlcd* fails.

What you expected to happen:
It should work!

Minimal Complete Verifiable Example:

import pygeohydro as gh

nlcd = gh.nlcd_bycoords([(-87.11890, 34.70421)])

Anything else we need to know?:

Environment:

Output of pygeohydro.show_versions()
INSTALLED VERSIONS
------------------
commit: 2d4c4ed0aa39f85ff62a47c52645dbc80b9dceb0
python: 3.10.2 | packaged by conda-forge | (main, Jan 14 2022, 08:03:02) [Clang 11.1.0 ]
python-bits: 64
OS: Darwin
OS-release: 21.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.12.1
libnetcdf: 4.8.1

aiodns: 3.0.0
aiohttp: 3.8.1
aiohttp-client-cache: 0.6.1
aiosqlite: 0.17.0
async-retriever: 0.3.2.dev20+g8f29acf
bottleneck: 1.3.2
brotli: installed
cchardet: 2.1.7
click: 8.0.3
cytoolz: 0.11.2
dask: 2022.02.0
defusedxml: 0.7.1
folium: 0.12.1.post1
geopandas: 0.10.2
lxml: 4.7.1
matplotlib: 3.5.1
netCDF4: 1.5.8
networkx: 2.6.3
numpy: 1.21.5
owslib: 0.25.0
pandas: 1.4.1
py3dep: 0.12.3.dev9+g469244f
pyarrow: 7.0.0
pydantic: 1.8.1
pydaymet: 0.12.3
pygeohydro: 0.12.4
pygeoogc: 0.12.3.dev16+g99eff81
pygeos: 0.12.0
pygeoutils: 0.12.4.dev2+g7f077f9
pynhd: 0.3.2.dev20+g8f29acf
pyproj: 3.3.0
pytest: 7.0.1
pytest-cov: 3.0.0
rasterio: 1.2.10
requests: 2.26.0
requests-cache: 0.9.2
richdem: 2.3.0
rioxarray: 0.10.0
scipy: 1.8.0
shapely: 1.8.0
tables: 3.7.0
ujson: 5.1.0
urllib3: 1.26.7
xarray: 0.21.1
xdist: 2.5.0
yaml: 6.0
None

Did something change with NWIS?

What happened:
my NWIS example stopped working

What you expected to happen:

Minimal Complete Verifiable Example:

from pygeohydro import NWIS

nwis = NWIS()

start = '1979-02-01T01:00:00'
stop =  '2020-12-31T23:00:00'

sta = ['USGS-01030350', 'USGS-01030500']

ds_obs = nwis.get_streamflow(sta, (start,stop), to_xarray=True)

I tried pygeohydro versions 0.13.0 and 0.13.1

NWIS data retrieval enhancement ideas

It would be great to be able to set a parameter to ensure that the retrieved NWIS data are in UTC.

Also, it would be nice to be able to return the data (along with metadata such as units!) as an xarray Dataset instead of a pandas DataFrame.

There is example NWIS code here by @dnowacki-usgs that optionally returns an xarray Dataset.
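Until such a parameter exists, the UTC conversion can be done client-side with the standard library. This is a sketch, not a pygeohydro API; the offset value assumes the station reports in EST (UTC-5):

```python
from datetime import datetime, timezone, timedelta

# NWIS reports local times with a timezone code; EST is UTC-5.
est = timezone(timedelta(hours=-5), "EST")
local = datetime(2020, 1, 15, 9, 30, tzinfo=est)

# Convert the tz-aware local timestamp to UTC.
utc = local.astimezone(timezone.utc)
print(utc.isoformat())  # → 2020-01-15T14:30:00+00:00
```

For daylight-saving-aware codes (EDT/EST mixed in one record), a proper tz database lookup would be needed rather than a fixed offset.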

WBD Feature returning keyerror: 'layers'

What happened?

When loading a HUC6 by ID, I received KeyError: 'layers'.

What did you expect to happen?

Expected the HUC6 170900 polygon to be loaded into the notebook.

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

Inclusion of NLCD 2021 Data

Is your feature request related to a problem?

Following the NLCD 2021 data release announcement, I'm looking forward to updating my workflows to use the most recent land cover data. Would it be possible to add the 2021 data to the pygeohydro nlcd functions?

Describe the solution you'd like

Adding 2021 as a selectable option for the "years" arguments.

Describe alternatives you've considered

No response

Additional context

No response

'utf-8' codec error from pynhd

What happened?

When passing nhd_info=True to the nwis.get_info() function, I got an error. I was able to replicate this error in a fresh Colab environment with pygeohydro 0.16.0 and pynhd 0.16.2:

from pygeohydro import NWIS

Outlet = '01500500'
ParamCd = '00060'

nwis = NWIS()

query = {
    "site": Outlet,
    "parameterCd": ParamCd,
    "siteTypeCd": "ST",
    "hasDataTypeCd": "dv"
}
Outlet_gdf = nwis.get_info(query, expanded=True, nhd_info=True)

What did you expect to happen?

It had been working until recently, but I got this error today.

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

UnicodeDecodeError                        Traceback (most recent call last)
[/usr/local/lib/python3.10/dist-packages/async_retriever/_utils.py](https://localhost:8080/#) in retriever(uid, url, s_kwds, session, read_type, r_kwds, raise_status)
     81         try:
---> 82             return uid, await getattr(response, read_type)(**r_kwds)
     83         except (ClientResponseError, ValueError) as ex:

17 frames
[/usr/local/lib/python3.10/dist-packages/aiohttp/client_reqrep.py](https://localhost:8080/#) in text(self, encoding, errors)
   1147 
-> 1148         return self._body.decode(  # type: ignore[no-any-return,union-attr]
   1149             encoding, errors=errors

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 31378: invalid start byte

During handling of the above exception, another exception occurred:

UnicodeDecodeError                        Traceback (most recent call last)
[<ipython-input-2-89c816bf29ae>](https://localhost:8080/#) in <cell line: 14>()
     12     "hasDataTypeCd": "dv"
     13 }
---> 14 Outlet_gdf = nwis.get_info(query, expanded=True, nhd_info=True)

[/usr/local/lib/python3.10/dist-packages/pygeohydro/nwis.py](https://localhost:8080/#) in get_info(self, queries, expanded, fix_names, nhd_info)
    385 
    386         if nhd_info:
--> 387             nhd = self._nhd_info(sites["site_no"].to_list())
    388             sites = pd.merge(sites, nhd, left_on="site_no", right_on="site_no", how="left")
    389 

[/usr/local/lib/python3.10/dist-packages/pygeohydro/nwis.py](https://localhost:8080/#) in _nhd_info(site_ids)
    296         except (TypeError, IntCastingNaNError):
    297             area["comid"] = area["comid"].astype("Int32")
--> 298         nhd_area = pynhd.streamcat("fert", comids=area["comid"].dropna().to_list())
    299         area = area.merge(
    300             nhd_area[["COMID", "WSAREASQKM"]], left_on="comid", right_on="COMID", how="left"

[/usr/local/lib/python3.10/dist-packages/pynhd/nhdplus_derived.py](https://localhost:8080/#) in streamcat(metric_names, metric_areas, comids, regions, states, counties, conus, percent_full, area_sqkm)
    666         A dataframe with the requested metrics.
    667     """
--> 668     sc = StreamCatValidator()
    669     names = [metric_names] if isinstance(metric_names, str) else metric_names
    670     names = [sc.alt_names.get(s.lower(), s.lower()) for s in names]

[/usr/local/lib/python3.10/dist-packages/pynhd/nhdplus_derived.py](https://localhost:8080/#) in __init__(self)
    533 class StreamCatValidator(StreamCat):
    534     def __init__(self) -> None:
--> 535         super().__init__()
    536 
    537     def validate(

[/usr/local/lib/python3.10/dist-packages/pynhd/nhdplus_derived.py](https://localhost:8080/#) in __init__(self)
    508 
    509         url_vars = f"{self.base_url}/variable_info.csv"
--> 510         names = pd.read_csv(io.StringIO(ar.retrieve_text([url_vars])[0]))
    511         names["METRIC_NAME"] = names["METRIC_NAME"].str.replace(r"\[AOI\]|Slp[12]0", "", regex=True)
    512         names["SLOPE"] = [

[/usr/local/lib/python3.10/dist-packages/async_retriever/async_retriever.py](https://localhost:8080/#) in retrieve_text(urls, request_kwds, request_method, max_workers, cache_name, timeout, expire_after, ssl, disable, raise_status)
    500     '01646500'
    501     """
--> 502     return retrieve(
    503         urls,
    504         "text",

[/usr/local/lib/python3.10/dist-packages/async_retriever/async_retriever.py](https://localhost:8080/#) in retrieve(urls, read_method, request_kwds, request_method, max_workers, cache_name, timeout, expire_after, ssl, disable, raise_status)
    433     results = (loop.run_until_complete(session(url_kwds=c)) for c in chunked_reqs)
    434 
--> 435     resp = [r for _, r in sorted(tlz.concat(results))]
    436     if new_loop:
    437         loop.close()

[/usr/local/lib/python3.10/dist-packages/async_retriever/async_retriever.py](https://localhost:8080/#) in <genexpr>(.0)
    431     chunked_reqs = tlz.partition_all(max_workers, inp.url_kwds)
    432     loop, new_loop = utils.get_event_loop()
--> 433     results = (loop.run_until_complete(session(url_kwds=c)) for c in chunked_reqs)
    434 
    435     resp = [r for _, r in sorted(tlz.concat(results))]

[/usr/local/lib/python3.10/dist-packages/nest_asyncio.py](https://localhost:8080/#) in run_until_complete(self, future)
     96                 raise RuntimeError(
     97                     'Event loop stopped before Future completed.')
---> 98             return f.result()
     99 
    100     def _run_once(self):

[/usr/lib/python3.10/asyncio/futures.py](https://localhost:8080/#) in result(self)
    199         self.__log_traceback = False
    200         if self._exception is not None:
--> 201             raise self._exception.with_traceback(self._exception_tb)
    202         return self._result
    203 

[/usr/lib/python3.10/asyncio/tasks.py](https://localhost:8080/#) in __step(***failed resolving arguments***)
    230                 # We use the `send` method directly, because coroutines
    231                 # don't have `__iter__` and `__next__` methods.
--> 232                 result = coro.send(None)
    233             else:
    234                 result = coro.throw(exc)

[/usr/local/lib/python3.10/dist-packages/async_retriever/async_retriever.py](https://localhost:8080/#) in async_session_with_cache(url_kwds, read, r_kwds, request_method, cache_name, timeout, expire_after, ssl, raise_status)
    233             for uid, url, kwds in url_kwds
    234         )
--> 235         return await asyncio.gather(*tasks)  # pyright: ignore[reportGeneralTypeIssues]
    236 
    237 

[/usr/lib/python3.10/asyncio/tasks.py](https://localhost:8080/#) in __wakeup(self, future)
    302     def __wakeup(self, future):
    303         try:
--> 304             future.result()
    305         except BaseException as exc:
    306             # This may also be a cancellation.

[/usr/lib/python3.10/asyncio/tasks.py](https://localhost:8080/#) in __step(***failed resolving arguments***)
    230                 # We use the `send` method directly, because coroutines
    231                 # don't have `__iter__` and `__next__` methods.
--> 232                 result = coro.send(None)
    233             else:
    234                 result = coro.throw(exc)

[/usr/local/lib/python3.10/dist-packages/async_retriever/_utils.py](https://localhost:8080/#) in retriever(uid, url, s_kwds, session, read_type, r_kwds, raise_status)
     83         except (ClientResponseError, ValueError) as ex:
     84             if raise_status:
---> 85                 raise ServiceError(await response.text(), str(response.url)) from ex
     86             return uid, None
     87 

[/usr/local/lib/python3.10/dist-packages/aiohttp/client_reqrep.py](https://localhost:8080/#) in text(self, encoding, errors)
   1146             encoding = self.get_encoding()
   1147 
-> 1148         return self._body.decode(  # type: ignore[no-any-return,union-attr]
   1149             encoding, errors=errors
   1150         )

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 31378: invalid start byte
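The offending byte (0xb0 is the degree sign in Latin-1) suggests the remote CSV is not valid UTF-8. Until there is an upstream fix, a tolerant decode avoids the crash; this is a generic sketch, not the async-retriever API:

```python
def decode_lenient(body: bytes) -> str:
    """Try UTF-8 first; fall back to Latin-1, which maps every byte."""
    try:
        return body.decode("utf-8")
    except UnicodeDecodeError:
        return body.decode("latin-1")

# b"\xb0" is the degree sign in Latin-1 but an invalid UTF-8 start byte.
print(decode_lenient(b"30\xb0C"))  # → 30°C
```

The trade-off is that a genuinely mis-encoded file decodes without error but may contain mojibake, so the fallback is best logged when it triggers.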

Anything else we need to know?

No response

Environment

SYS INFO

commit: None
python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
python-bits: 64
OS: Linux
OS-release: 6.1.58+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')

PACKAGE VERSION

async-retriever 0.16.0
pygeoogc 0.16.1
pygeoutils 0.16.1
py3dep N/A
pynhd 0.16.2
pygridmet N/A
pydaymet N/A
hydrosignatures 0.16.0
pynldas2 N/A
pygeohydro 0.16.0
aiohttp 3.9.3
aiohttp-client-cache 0.11.0
aiosqlite 0.20.0
cytoolz 0.12.3
ujson 5.9.0
defusedxml 0.7.1
joblib 1.3.2
multidict 6.0.5
owslib 0.30.0
pyproj 3.6.1
requests 2.31.0
requests-cache 1.2.0
shapely 2.0.3
url-normalize 1.4.3
urllib3 2.0.7
yarl 1.9.4
geopandas 0.13.2
netcdf4 1.6.5
numpy 1.25.2
rasterio 1.3.9
rioxarray 0.15.3
scipy 1.11.4
xarray 2023.7.0
click 8.1.7
pyflwdir N/A
networkx 3.2.1
pyarrow 14.0.2
folium 0.14.0
h5netcdf 1.3.0
matplotlib 3.7.1
pandas 2.0.3
numba 0.58.1
bottleneck N/A
py7zr N/A
pyogrio N/A

Error when using readme tutorial on gh.nlcd_bygeom() and gh.cover_statistics()... Updated syntax?

Hello,

First of all thanks for this tutorial & code, super helpful. Just wondering if the syntax has been updated or if I'm just misunderstanding:

I tried to run the following code from the readme:

import pygeohydro as gh
from pynhd import NLDI

basins = NLDI().get_basins(["01031450", "01031500", "01031510"])
lulc = gh.nlcd_bygeom(geometry, 100, years={"cover": [2016, 2019]})
stats = gh.cover_statistics(lulc.cover_2016) 

and got the following error:
NameError: name 'geometry' is not defined

Well, no surprise... I changed "geometry" to "basins" and reran, getting the following error message:
AttributeError: 'dict' object has no attribute 'cover_2016'

So ultimately I ran something like this:


basins = NLDI().get_basins(["01031450", "01031500", "01031510"])
lulc = gh.nlcd_bygeom(basins, 100, years={"cover": [2016, 2019]})
stats = gh.cover_statistics(lulc["01031450"]['cover_2016'])
stats

{'classes': {'Open Water': 2.846106932303314,
             'Developed, Open Space': 2.240292742427323,
             'Developed, Low Intensity': 0.5488920512299248,
             'Developed, Medium Intensity': 0.1910957511689368,
             'Developed, High Intensity': 0.028461069323033134,
             'Deciduous Forest': 28.660296808294365,
             'Evergreen Forest': 14.474486684285424,
             'Mixed Forest': 29.58731449481602,
             'Shrub-Forest': 11.20552957918276,
             'Herbaceous-Forest': 2.4232567595039645,
             'Shrub/Scrub': 0.14637121366131328,
             'Grassland/Herbaceous': 0.028461069323033134,
             'Pasture/Hay': 0.5976824557836958,
             'Cultivated Crops': 0.06505387273836145,
             'Woody Wetlands': 6.753405163651148,
             'Emergent Herbaceous Wetlands': 0.20329335230737955},
 'categories': {'Background': 0.0,
                'Unclassified': 0.0,
                'Water': 2.846106932303314,
                'Developed': 3.0087416141492174,
                'Barren': 0.0,
                'Forest': 86.35088432608254,
                'Shrubland': 0.14637121366131328,
                'Herbaceous': 0.028461069323033134,
                'Planted/Cultivated': 0.6627363285220573,
                'Wetlands': 6.956698515958529}}

NHDPlus Implementation

Before I offer my suggestion, I may be missing the utility of shipping the NHD with the repo. With that in mind, and if you don't mind elaborating later, what are your thoughts on moving away from shipping the NHDPlus dataset to users and instead relying on the USGS's API to verify and obtain gauge metadata? It should be a straightforward call that doesn't require a key.

Missing examples/tutorial.ipynb

Description

It seems examples/tutorial.ipynb, which was added in commit 68fe37f, has been removed. What are the future plans for examples/? If you would like me to write something up, do you have features in mind you would like me to showcase?

NLCD not working

What happened:
The NLCD service is down.

What you expected to happen:
Layer names have been changed and some of the science products are not available as well.

Minimal Complete Verifiable Example:

import pygeohydro as gh
from pynhd import NLDI

geometry = NLDI().get_basins("01031500").geometry[0]
lulc = gh.nlcd(geometry, 100, years={"impervious": None, "cover": 2016, "canopy": None})

Anything else we need to know?:

The MRLC developers are working on updating the database and adding the 2019 version. On their website they state that the science products will be added soon. I contacted them, and they gave a two-week time frame for bringing the service back up with the new dataset.

I have already added support for the new dataset (with this commit) and tested it with the mrlc_display layers (not the science products). Once the service is back up, I will carry out the final tests and release a new version.

Environment:

Output of pygeohydro.show_versions()
INSTALLED VERSIONS
------------------
commit: 90ee816f2741e0f969327406fd17f77676a8a62e
python: 3.9.5 | packaged by conda-forge | (default, Jun 19 2021, 00:32:32) 
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.8.0-59-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

affine: 2.3.0
aiohttp: 3.7.4.post0
aiohttp-client-cache: 0.4.0
aiosqlite: 0.17.0
async-retriever: 0.2.1.dev28+ga7c8f33.d20210706
click: 7.1.2
cytoolz: 0.11.0
dask: 2021.06.2
defusedxml: 0.7.1
folium: unknown
geopandas: 0.9.0
lxml: 4.6.3
matplotlib: 3.4.2
nest-asyncio: installed
netCDF4: 1.5.6
networkx: 2.5.1
numpy: 1.21.0
orjson: 3.5.4
owslib: 0.24.1
pandas: 1.3.0
pip: 21.1.3
py3dep: 0.11.1.dev18+g13e8ea1
pyarrow: 4.0.0
pydantic: 1.8.2
pydaymet: 0.11.1.dev12+g1096693.d20210706
pygeohydro: 0.11.1.dev11+g90ee816.d20210706
pygeoogc: 0.11.1.dev31+g1c457f5.d20210706
pygeoutils: 0.11.2.dev25+g05892c6.d20210706
pynhd: 0.11.1.dev13+g3b76c3d.d20210706
pyproj: 3.1.0
pytest: 6.2.4
rasterio: 1.2.6
requests: 2.25.1
requests-cache: 0.6.4
scipy: 1.7.0
setuptools: 49.6.0.post20210108
shapely: 1.7.1
simplejson: 3.17.2
urllib3: 1.26.6
ward: None
xarray: 0.18.2
yaml: 5.4.1
None

Inconsistent results between groups of coordinates and single coordinates.

    import geopandas as gpd
    import pygeohydro as gh

    DATA_URL = (
        "Resources/Overlays/Landmarks/Energy_-_Nuclear/Energy_-_Nuclear.shp"
    )
    gdf = gpd.read_file(DATA_URL).to_crs("epsg:4326").head(3)
    coords = list(zip(gdf.geometry.x, gdf.geometry.y))

    print("Data read in...")

    result = gh.nlcd_bycoords(coords, years={"cover": [2019]})
    print(f"PyGeoHydro\n----------\n{result}")

    print("\n\nTesting single points\n")

    coord = (gdf.geometry[0].x, gdf.geometry[0].y)
    result = gh.nlcd_bycoords([coord], years={"cover": [2019]})
    print(f"PyGeoHydro\n----------\n{result}")

This produces:

Data read in...
PyGeoHydro
----------
                     geometry  cover_2019
0  POINT (-87.11890 34.70420)          23
1  POINT (-88.83390 40.17190)          24
2  POINT (-95.68978 38.23926)          24


Testing single points

PyGeoHydro
----------
                     geometry  cover_2019
0  POINT (-87.11890 34.70420)          24

The first point is the same in each case but the land use value differs.

Anything else we need to know?:

Environment:

Output of pygeohydro.show_versions()

INSTALLED VERSIONS
------------------
commit: 9729f67e75fe31fa6b5eb122562e4c0c22792c6d
python: 3.9.9 | packaged by conda-forge | (main, Dec 20 2021, 02:41:07) [Clang 11.1.0 ]
python-bits: 64
OS: Darwin
OS-release: 21.1.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

aiohttp-client-cache>=0.5.1: None
aiohttp>=3.8.1: None
aiosqlite: 0.17.0
async-retriever: 0.3.1
async-retriever>=0.3.1: None
cytoolz: 0.11.2
dask: 2021.12.0
defusedxml: 0.7.1
folium: 0.12.1.post1
geopandas>=0.7: None
lxml: 4.7.1
matplotlib>=3.0: None
netCDF4: 1.5.8
networkx: 2.6.3
numpy>=1.17: None
owslib: 0.25.0
pandas>=1.0: None
pip: 21.3.1
py3dep: None
pyarrow: 6.0.1
pydantic: 1.9.0
pydaymet: None
pygeohydro: 0.12.2
pygeoogc: 0.12.1
pygeoogc>=0.12: None
pygeoutils: 0.12.1
pygeoutils>=0.12: None
pynhd: 0.3.1
pynhd>=0.12: None
pyproj>=2.2: None
pytest: None
rasterio>=1.2: None
requests: 2.27.1
requests-cache>=0.8: None
rioxarray>=0.8: None
scipy: 1.7.3
setuptools: 60.5.0
shapely>=1.6: None
ujson: 5.1.0
urllib3: 1.26.8
ward: None
xarray>=0.18: None
yaml: 6.0


ImportError: DLL load failed: The specified procedure could not be found.

What happened?

I just installed pygeohydro into my Anaconda environment using the conda prompt. When I imported the package, this error occurred.

What did you expect to happen?

No response

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

ImportError                               Traceback (most recent call last)
<ipython-input-3-9e1575f9c961> in <module>
----> 1 import pygeohydro

C:\Anaconda\lib\site-packages\pygeohydro\__init__.py in <module>
      4 from .exceptions import InvalidInputRange, InvalidInputType, InvalidInputValue, ZeroMatched
      5 from .print_versions import show_versions
----> 6 from .pygeohydro import (
      7     NID,
      8     NWIS,

C:\Anaconda\lib\site-packages\pygeohydro\pygeohydro.py in <module>
     14 import pandas as pd
     15 import pygeoogc as ogc
---> 16 import pygeoutils as geoutils
     17 import rasterio as rio
     18 import xarray as xr

C:\Anaconda\lib\site-packages\pygeoutils\__init__.py in <module>
      4 from .exceptions import InvalidInputType, InvalidInputValue
      5 from .print_versions import show_versions
----> 6 from .pygeoutils import MatchCRS, arcgis2geojson, gtiff2xarray, json2geodf
      7 
      8 try:

C:\Anaconda\lib\site-packages\pygeoutils\pygeoutils.py in <module>
     14 import orjson as json
     15 import pyproj
---> 16 import rasterio as rio
     17 import rasterio.mask as rio_mask
     18 import rasterio.transform as rio_transform

C:\Anaconda\lib\site-packages\rasterio\__init__.py in <module>
     20             pass
     21 
---> 22 from rasterio._base import gdal_version
     23 from rasterio.drivers import is_blacklisted
     24 from rasterio.dtypes import (

ImportError: DLL load failed: The specified procedure could not be found.

Anything else we need to know?

I changed the Python version to 3.6 and the DLL issue went away; however, a new error appeared:

AttributeError                               Traceback (most recent call last)
in <module>
----> 1 import pygeohydro

C:\Anaconda\envs\env\lib\site-packages\pygeohydro\__init__.py in <module>
      1 from pkg_resources import DistributionNotFound, get_distribution
      2
----> 3 from . import helpers, plot
      4 from .exceptions import InvalidInputRange, InvalidInputType, InvalidInputValue
      5 from .print_versions import show_versions

C:\Anaconda\envs\env\lib\site-packages\pygeohydro\helpers.py in <module>
      5 import numpy as np
      6 import pandas as pd
----> 7 from pygeoogc import RetrySession

C:\Anaconda\envs\env\lib\site-packages\pygeoogc\__init__.py in <module>
     24
     25 if sys.platform.startswith("win"):
---> 26     asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

AttributeError: module 'asyncio' has no attribute 'WindowsSelectorEventLoopPolicy'
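For context, asyncio.WindowsSelectorEventLoopPolicy was only added in Python 3.7, so the unguarded call in pygeoogc fails on Python 3.6. A minimal sketch of a defensive guard (a hypothetical helper, not the library's actual code):

```python
import asyncio
import sys


def set_windows_selector_policy() -> bool:
    """Apply the Windows selector event-loop policy only when available.

    WindowsSelectorEventLoopPolicy exists on Windows builds of
    Python >= 3.7; a hasattr check avoids an AttributeError at
    import time on older interpreters. Returns True if applied.
    """
    if sys.platform.startswith("win") and hasattr(asyncio, "WindowsSelectorEventLoopPolicy"):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
        return True
    return False
```

On non-Windows platforms (or Python 3.6) the function simply returns False and leaves the default policy in place.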

Environment

0.11.0

The tree canopy layer name has changed

What happened?

The MRLC web service has changed the layer names of the tree canopy layers.

What did you expect to happen?

Requesting for canopy shouldn't fail.

Minimal Complete Verifiable Example

from pygeoogc import WMS, ServiceURL

wms = WMS(
    ServiceURL().wms.mrlc,
    layers="NLCD_2011_Tree_Canopy_L48",
    outformat="image/geotiff",
    crs=4326,
    validation=False,
)
wms_resp = wms.getmap_bybox(
    (-69.77, 45.07, -69.31, 45.45),
    1e3,
)

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

SYS INFO

commit: None
python: 3.10.11 | packaged by conda-forge | (main, May 10 2023, 19:07:22) [Clang 14.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 22.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: en_US.UTF-8
libhdf5: 1.14.0
libnetcdf: 4.9.2

PACKAGE VERSION

aiodns 3.0.0
aiohttp 3.8.4
aiohttp-client-cache 0.8.1
aiosqlite 0.19.0
async-retriever 0.15.0
bottleneck 1.3.7
brotli 1.0.9
click 8.0.3
cytoolz 0.12.0
dask 2023.5.1
defusedxml 0.7.1
folium 0.14.0
geopandas 0.13.0
h5netcdf 1.2.0
hydrosignatures 0.15.1.dev3+gfaa6354
lxml 4.9.2
matplotlib 3.7.1
netCDF4 1.6.3
networkx 3.1
numba 0.57.0
numpy 1.24.3
owslib 0.29.2
pandas 2.0.2
py3dep 0.14.1.dev30+g4eb740f
pyarrow 12.0.0
pydaymet 0.14.1.dev20+g9aa0d8b
pygeohydro 0.15.1.dev1+g1c902b0.d20230523
pygeoogc 0.14.1.dev31+g0c6d4f1
pygeos N/A
pygeoutils 0.14.1.dev22+gf377c19
pynhd 0.14.1.dev38+g7d12f75
pynldas2 0.14.1.dev27+g8d2f7cb
pyproj 3.5.0
pytest 7.3.1
pytest-cov 4.1.0
rasterio 1.3.7
requests 2.31.0
requests-cache 1.0.1
richdem N/A
rioxarray 0.14.1
scipy 1.10.1
shapely 2.0.1
tables 3.8.0
ujson 5.7.0
urllib3 2.0.2
xarray 2023.5.0
xdist N/A
yaml N/A

NLCD by location

For an aviation use case - "Where did the drone launch?" - I'd like to get the land use for a lot of points in the U.S.

A similar use case of mine is "What is the elevation at a particular point?" To answer this, I run an https://open-elevation.com/ docker instance and use its API to pass in thousands of lat/lon pairs. It returns each pair with its elevation, which I add as a column to my dataframe.

A good solution would be a function that operated on a local copy of the NLCD database, took lat/lon pairs, and returned the text description of the land use. The pairs could be distinct lat / lon values or possibly a geodataframe with one or more points.

If this function was vectorized and could process large numbers of points quickly that would be a bonus but not necessary.

The best option I could come up with involved setting a bounding box around each point. (See Discussions for details.)
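The core of such a local lookup is just an affine index into the raster grid. A minimal sketch under stated assumptions: the toy legend below is a small subset of the real NLCD legend, and a plain nested list stands in for a loaded north-up NLCD raster.

```python
# Toy subset of the NLCD legend (the real legend has ~20 classes).
NLCD_CLASSES = {
    11: "Open Water",
    23: "Developed, Medium Intensity",
    41: "Deciduous Forest",
    82: "Cultivated Crops",
}


def sample_codes(grid, origin, cell, lonlats):
    """Return NLCD class names for (lon, lat) pairs against a local grid.

    grid: 2-D nested list of NLCD codes (north-up);
    origin: (west, north) of the top-left corner;
    cell: square pixel size in degrees.
    """
    west, north = origin
    names = []
    for lon, lat in lonlats:
        col = int((lon - west) / cell)   # eastward offset -> column
        row = int((north - lat) / cell)  # southward offset -> row
        names.append(NLCD_CLASSES.get(grid[row][col], "Unclassified"))
    return names
```

In practice you would read the grid and its transform from a downloaded NLCD GeoTIFF (e.g. with rasterio) and reproject lon/lat into the raster's CRS before indexing; rasterio's dataset `sample()` method does the equivalent lookup and is vectorized enough for large batches of points.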

Add support for Water Quality Portal

Is your feature request related to a problem? Please describe.
The Water Quality Portal has a RESTful service that could be useful.

Describe the solution you'd like
The documentation for this service can be found here

Describe alternatives you've considered
N/A

Additional context
N/A

Add support for SensorThings

Is your feature request related to a problem? Please describe.
No. USGS water data has a new web service called SensorThings that provides access to many USGS datasets.

Describe the solution you'd like
A demo repository of its initial implementation is here.

Describe alternatives you've considered
N/A

Additional context
N/A

Retrieve records availability from NWIS

Is your feature request related to a problem?

Currently, the get_info function from NWIS does not return information such as the data availability range.

from pygeohydro import NWIS

nwis = NWIS()
SiteID = "01636500"
ParamCd = "00060"
query = {
    "site": SiteID,
    "parameterCd": ParamCd,
    "siteStatus": "all",
}
SiteInfo = nwis.get_info(query, expanded=True)
print(SiteInfo.columns)
Index(['agency_cd', 'site_no', 'station_nm', 'site_tp_cd', 'dec_lat_va',
       'dec_long_va', 'coord_acy_cd', 'dec_coord_datum_cd', 'alt_va',
       'alt_acy_va', 'alt_datum_cd', 'huc_cd', 'lat_va', 'long_va',
       'coord_meth_cd', 'coord_datum_cd', 'district_cd', 'state_cd',
       'county_cd', 'country_cd', 'land_net_ds', 'map_nm', 'map_scale_fc',
       'alt_meth_cd', 'basin_cd', 'topo_cd', 'instruments_cd',
       'construction_dt', 'inventory_dt', 'drain_area_va',
       'contrib_drain_area_va', 'tz_cd', 'local_time_fg', 'reliability_cd',
       'gw_file_cd', 'hcdn_2009', 'geometry'],
      dtype='object')

But it's available in the xarray Dataset returned by the get_streamflow function, as begin_date and end_date.

SiteFlow = nwis.get_streamflow(SiteID, dates=("2010-01-01", "2010-01-05"), to_xarray=True)
SiteFlow


I feel it's better to examine the availability range, then decide the dates we use in get_streamflow.

Describe the solution you'd like

It's directly available through the NWIS site service by setting seriesCatalogOutput to true:

import pandas as pd
import requests

url = (
    "https://waterservices.usgs.gov/nwis/site/?format=rdb&sites=01636500"
    "&seriesCatalogOutput=true&siteStatus=all&hasDataTypeCd=dv&outputDataTypeCd=dv"
)
r = requests.get(url, allow_redirects=True)
content = r.content.decode("utf-8")
lines = content.split("\n")
# Skip the "#" comment header; the first non-comment line holds the
# column names and the line after it is the column-format row.
start_index = next(i for i, line in enumerate(lines) if not line.startswith("#"))
column_names = lines[start_index].split("\t")
data_rows = [line.split("\t") for line in lines[start_index + 2 :] if line.strip()]
df = pd.DataFrame(data_rows, columns=column_names)
df


Describe alternatives you've considered

No response

Additional context

No response

Example code for NWIS query does not work

What happened:
Running the example code to generate a list of NWIS sites throws the following error: 'NWIS' object has no attribute 'query_bybox'.

Minimal Complete Verifiable Example:

from pygeohydro import NWIS

nwis = NWIS()
bbox = (-69.5, 45.07, -69.3, 45.45)  # example bbox (west, south, east, north); any bbox reproduces the error
query = {
    **nwis.query_bybox(bbox),
    "hasDataTypeCd": "dv",
    "outputDataTypeCd": "dv",
}
info_box = nwis.get_info(query)
dates = ("2000-01-01", "2010-12-31")
stations = info_box[
    (info_box.begin_date <= dates[0]) & (info_box.end_date >= dates[1])
].site_no.tolist()
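A possible client-side workaround until the example is updated (an assumption on my part — the removed helper's documented replacement may differ) is to build the NWIS bBox parameter by hand and pass it straight to get_info:

```python
# Hypothetical workaround: construct the NWIS "bBox" query parameter
# manually instead of calling the removed query_bybox helper.
bbox = (-69.5, 45.07, -69.3, 45.45)  # (west, south, east, north), example values
query = {
    "bBox": ",".join(f"{b:.6f}" for b in bbox),
    "hasDataTypeCd": "dv",
    "outputDataTypeCd": "dv",
}
# info_box = NWIS().get_info(query)  # requires network access
```

The rest of the example (filtering on begin_date/end_date) should work unchanged on the resulting dataframe.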

Environment:

Output of pygeohydro.show_versions() INSTALLED VERSIONS

commit: None
python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:10)
[GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-167-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US
LOCALE: en_US.ISO8859-1
libhdf5: 1.12.1
libnetcdf: 4.8.1

aiodns: 3.0.0
aiohttp: 3.8.1
aiohttp-client-cache: 0.7.1
aiosqlite: 0.17.0
async-retriever: 0.3.3
bottleneck: 1.3.4
brotli: installed
cchardet: 2.1.7
click: 6.7
cytoolz: 0.11.2
dask: 2022.6.1
defusedxml: 0.7.1
folium: 0.12.1.post1
geopandas: 0.11.0
lxml: 4.8.0
matplotlib: 3.4.3
netCDF4: 1.6.0
networkx: 2.8.4
numpy: 1.23.0
owslib: 0.25.0
pandas: 1.4.3
py3dep: 0.13.1
pyarrow: 6.0.1
pydantic: 1.9.1
pydaymet: 0.13.1
pygeohydro: 0.13.2
pygeoogc: 0.13.2
pygeos: 0.12.0
pygeoutils: 0.13.2
pynhd: 0.13.2
pyproj: 3.3.0
pytest: None
pytest-cov: None
rasterio: 1.2.10
requests: 2.28.1
requests-cache: 0.9.4
richdem: 0.3.4
rioxarray: 0.11.1
scipy: 1.8.1
shapely: 1.8.2
tables: 3.7.0
ujson: 5.3.0
urllib3: 1.26.9
xarray: 2022.3.0
xdist: None
yaml: 6.0


NLCD not working

What happened?

Running any nlcd* fails, with the following error:
ServiceUnavailableError: Service is currently not available, try again later: https://www.mrlc.gov/geoserver/mrlc_download/wms

What did you expect to happen?

I expected the service to be available again within a few days, but the outage is continuing longer than I expected. I am wondering if there is a breaking change in the API.

Minimal Complete Verifiable Example

import pygeohydro as gh

nlcd = gh.nlcd_bycoords([(-87, 34)])

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:56:21)
[GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-72-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.8.0

aiodns: 3.0.0
aiohttp: 3.8.3
aiohttp-client-cache: 0.7.3
aiosqlite: 0.17.0
async-retriever: 0.3.6
bottleneck: 1.3.5
brotli: installed
cchardet: 2.1.7
click: 7.1.2
cytoolz: 0.12.0
dask: 2022.10.0
defusedxml: 0.7.1
folium: 0.13.0
geopandas: 0.11.1
lxml: 4.8.0
matplotlib: 3.4.3
netCDF4: 1.5.7
networkx: 2.8.7
numpy: 1.23.4
owslib: 0.27.2
pandas: 1.5.1
py3dep: 0.13.6
pyarrow: 5.0.0
pydantic: 1.10.2
pydaymet: 0.13.6
pygeohydro: 0.13.6
pygeoogc: 0.13.6
pygeos: 0.10.2
pygeoutils: 0.13.6
pynhd: 0.13.6
pyproj: 3.3.1
pytest: None
pytest-cov: None
rasterio: 1.2.1
requests: 2.28.1
requests-cache: 0.9.6
richdem: 0.3.4
rioxarray: 0.12.2
scipy: 1.9.2
shapely: 1.8.0
tables: None
ujson: 5.5.0
urllib3: 1.26.11
xarray: 2022.10.0
xdist: None
yaml: 6.0

Handling server disconnects

I need to run millions of points against the new gh.nlcd_bycoords. During a recent run, the server disconnected:

  File "/home/kovar/anaconda3/envs/a50-dev/lib/python3.9/site-packages/aiohttp/c
lient.py", line 559, in _request
    await resp.start(conn)
  File "/home/kovar/anaconda3/envs/a50-dev/lib/python3.9/site-packages/aiohttp/c
lient_reqrep.py", line 898, in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
  File "/home/kovar/anaconda3/envs/a50-dev/lib/python3.9/site-packages/aiohttp/s
treams.py", line 616, in read
    await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected

Questions:

  1. Should pygeohydro handle this more gracefully or:
  2. Is it up to me to handle it and, if so, how?
  3. I need to rerun this, and many other collections of points. Am I abusing the server? Is there a way to do this locally?

Thank you.

-David
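One client-side pattern for questions 1 and 2 is to chunk the coordinates and retry each chunk on transient failures. A sketch under stated assumptions: retry_chunks is a hypothetical helper (not part of pygeohydro), and the chunk size and delay are arbitrary starting points.

```python
import time


def retry_chunks(func, items, chunk_size=5000, retries=3, delay=10):
    """Split a long list of items into chunks and retry each chunk on
    transient errors such as aiohttp's ServerDisconnectedError."""
    results = []
    for i in range(0, len(items), chunk_size):
        chunk = items[i : i + chunk_size]
        for attempt in range(retries):
            try:
                results.append(func(chunk))
                break
            except Exception:
                if attempt == retries - 1:
                    raise  # give up after the final attempt
                time.sleep(delay)  # back off before retrying
    return results
```

Usage against nlcd_bycoords might look like `frames = retry_chunks(lambda c: gh.nlcd_bycoords(c, years={"cover": [2019]}), coords)`, concatenating the returned frames afterward. Chunking also keeps any one failed request from losing the whole run.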

get_streamflow() to_xarray inconsistent dtypes

What happened:
Repeated calls to get_streamflow() returning an xarray DataSet have different dtypes for some fields (notably, strings).

What you expected to happen:
The returned encodings/schema would be consistent for all calls, and match the internal
schema of the NWIS database from which the data is fetched.

Minimal Complete Verifiable Example:

from pygeohydro import NWIS

nwis = NWIS()
DATE_RANGE = ("2020-01-01", "2020-12-31")
site_A = nwis.get_streamflow("USGS-402114105350101", DATE_RANGE, to_xarray=True)
site_B = nwis.get_streamflow("USGS-02277600", DATE_RANGE, to_xarray=True)

assert site_A['station_nm'].dtype == site_B['station_nm'].dtype
## fails

assert site_A['alt_datum_cd'].dtype == site_B['alt_datum_cd'].dtype
## fails

Anything else we need to know?:
This has come up for me as I try to fetch streamflow data one gage at a time as part of a parallelized workflow -- each worker fetches one streamgage, manipulates it, then appends to a common dataset (in my case, a zarr store). The common zarr store was templated using NWIS.get_streamflow() data, which established the 'standard' dtypes.

The dtypes for these particular fields (station_nm and alt_datum_cd) are unicode strings, with the length of the string (and the dtype) being that of the returned data for a given request. That is, the dtype for Site_A's alt_datum_cd (above) is '<U6' because the data happens to be 6 chars for that gage. For Site_B's alt_datum_cd, the dtype is '<U1'. It isn't just that the string is shorter, the dtype is different, which causes the zarr write to fail.

I can work around this by re-casting in the case of these two strings:

site_B['alt_datum_cd'] = xr.DataArray(data=site_B['alt_datum_cd'].values.astype('<U6'), dims='gage_id')

But in the case of the station name field, I don't know what the max length might be in the database. I can cast to '<U46' (the dtype of site_A's station_nm), but other gages may have longer names, which would be truncated when cast to this dtype.

It would be useful to have get_streamflow() return the same string encoding/dtype in all cases, so that separate calls can be treated identically.
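One way to stabilize the schema on the client side, pending a fix, is to widen every string variable to a single fixed width before appending. A sketch under stated assumptions: widen_str is a hypothetical helper, and 64 is an arbitrary width chosen to exceed NWIS station-name lengths, not anything pygeohydro provides.

```python
import numpy as np


def widen_str(values, width=64):
    """Cast a unicode array to a fixed '<U{width}' dtype.

    Applying this to each string variable (e.g. station_nm,
    alt_datum_cd) before appending makes separate get_streamflow
    fetches share one zarr-compatible schema. Values longer than
    `width` would be silently truncated, so pick a width comfortably
    above the longest expected station name. Non-string arrays are
    returned unchanged.
    """
    arr = np.asarray(values)
    return arr.astype(f"<U{width}") if arr.dtype.kind == "U" else arr
```

Per variable this would look like `site_B["alt_datum_cd"] = xr.DataArray(widen_str(site_B["alt_datum_cd"].values), dims="gage_id")`.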

Environment:

Output of pygeohydro.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:56:21) 
[GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.181-99.354.amzn2.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.12.2
libnetcdf: 4.8.1
aiodns: 3.0.0
aiohttp: 3.8.3
aiohttp-client-cache: 0.7.3
aiosqlite: 0.17.0
async-retriever: 0.3.6
bottleneck: 1.3.5
brotli: installed
cchardet: 2.1.7
click: 8.0.2
cytoolz: 0.12.0
dask: 2022.04.2
defusedxml: 0.7.1
folium: 0.13.0
geopandas: 0.11.1
lxml: 4.9.1
matplotlib: 3.4.3
netCDF4: 1.6.0
networkx: 2.8.7
numpy: 1.23.3
owslib: 0.27.2
pandas: 1.4.2
py3dep: 0.13.6
pyarrow: 9.0.0
pydantic: 1.10.2
pydaymet: 0.13.6
pygeohydro: 0.13.6
pygeoogc: 0.13.6
pygeos: 0.13
pygeoutils: 0.13.6
pynhd: 0.13.6
pyproj: 3.4.0
pytest: None
pytest-cov: None
rasterio: 1.3.2
requests: 2.28.1
requests-cache: 0.9.6
richdem: None
rioxarray: 0.12.2
scipy: 1.9.1
shapely: 1.8.4
tables: 3.7.0
ujson: 5.5.0
urllib3: 1.26.11
xarray: 2022.9.0
xdist: None
yaml: 5.4.1

Error "cannot import name 'MatchCRS' from 'pygeoogc'" when importing pynhd or pygeohydro

What happened?

Got error: "cannot import name 'MatchCRS' from 'pygeoogc'" when importing pynhd. Windows conda env info:

  • python: 3.9.16
  • pynhd: 0.2.0
  • pygeoogc: 0.14.0
  • pygeohydro: 0.11.0

What did you expect to happen?

No response

Minimal Complete Verifiable Example

import pynhd

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[3], line 1
----> 1 import pynhd
      2 import pandas
      3 import geopandas

File ~\anaconda3\envs\flow_Ml\lib\site-packages\pynhd\__init__.py:7
      5 from .network_tools import prepare_nhdplus, topoogical_sort, vector_accumulation
      6 from .print_versions import show_versions
----> 7 from .pynhd import NLDI, NHDPlusHR, WaterData
      9 try:
     10     __version__ = get_distribution(__name__).version

File ~\anaconda3\envs\flow_Ml\lib\site-packages\pynhd\pynhd.py:10
      8 import pygeoogc as ogc
      9 import pygeoutils as geoutils
---> 10 from pygeoogc import WFS, ArcGISRESTful, MatchCRS, RetrySession, ServiceURL
     11 from requests import Response
     12 from shapely.geometry import MultiPolygon, Polygon

ImportError: cannot import name 'MatchCRS' from 'pygeoogc'

Anything else we need to know?

No response

Environment
