pysatMadrigal: pysat support for Madrigal data sets

pysatMadrigal allows users to import data from the Madrigal database into pysat (pysat documentation).

Installation

The following instructions provide a guide for installing pysatMadrigal and give some examples on how to use the routines.

Prerequisites

pysatMadrigal uses common Python modules, as well as modules developed by and for the Space Physics community. This module officially supports Python 3.6+.

Common modules    Community modules
h5py              madrigalWeb >= 2.6
numpy             pysat >= 3.1.0
pandas
xarray

PyPI Installation

pip install pysatMadrigal

GitHub Installation

git clone https://github.com/pysat/pysatMadrigal.git

Change directories into the repository folder, then build and install the package. For a local install use the "--user" flag after "install".

cd pysatMadrigal/
python -m build .
pip install .

Examples

The instrument modules are portable and designed to be run like any pysat instrument.

import pysat
from pysatMadrigal.instruments import dmsp_ivm
ivm = pysat.Instrument(inst_module=dmsp_ivm, tag='utd', inst_id='f15')

Another way to use the instruments in an external repository is to register the instruments. This only needs to be done the first time you load an instrument. Afterward, pysat will identify them using the platform and name keywords.

pysat.utils.registry.register('pysatMadrigal.instruments.dmsp_ivm')
ivm = pysat.Instrument('dmsp', 'ivm', tag='utd', inst_id='f15')

The package also includes analysis tools. Detailed examples are in the documentation.

pysatmadrigal's People

Contributors

aburrell, asher-pembroke, asherp, jklenzing, jonathonmsmith, rstoneback

Forkers

cnsnln

pysatmadrigal's Issues

Unit tests DMSP methods

Description

Add unit tests for DMSP methods. Example in documentation provides a place to start for running functions.

Potential impact

  • Is the feature related to an existing problem?
    • Code needs tests.
  • How critical is this feature to your workflow?
  • How wide of an impact do you anticipate this enhancement having?
    • Confirms utility of existing functions.
  • Would this break any existing functionality?
    • Not likely.

Potential solution(s)

Add unit tests for the uncovered DMSP methods.

Alternatives

Leaving as is.

Additional context

Example in documentation runs some of the functions at hand.

pysat imports

The current setup needs to be changed to ensure all the pysat dependencies are installed.

BUG: remote_file_list

When calling inst.remote_file_list:

        for i, temp in enumerate(files):

            for j, key in enumerate(keys):

>               val = temp[key_str_idx[0][j]:key_str_idx[1][j]]

E               TypeError: 'MadrigalExperimentFile' object is not subscriptable

Now that remote_file_list is part of the tests inherited from pysat, this bug has appeared in the unit tests.
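The traceback suggests that the parsing loop slices each entry of `files` as if it were a string, while the entries are `MadrigalExperimentFile` objects. A minimal sketch of one possible fix follows, assuming the file path is exposed through a `name` attribute. `FakeExperimentFile`, `file_names`, and the example file name are hypothetical and only serve to make the example runnable without a Madrigal connection.

```python
from dataclasses import dataclass


@dataclass
class FakeExperimentFile:
    """Hypothetical stand-in for madrigalWeb's MadrigalExperimentFile."""

    name: str


def file_names(files):
    """Return plain strings so that later slicing works.

    Entries that are already strings pass through unchanged; other
    objects are assumed to expose their path through a `name` attribute.
    """
    return [ff if isinstance(ff, str) else ff.name for ff in files]


# Made-up file name used only for illustration
files = [FakeExperimentFile("example_200101.hdf5")]
names = file_names(files)

# Slicing now operates on a string instead of the file object
date_str = names[0][8:14]
```

Converting to strings once, before the loop, keeps the existing index-based parsing unchanged.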

ENH/BUG: "single-file" instruments in general pandas instrument

Description

Some of the tags in the general pandas instrument have a single file for all data. The current setup does not make fake files for the file list, so if I load tag='120', I have to know to load data for 11-27-1963 to get the data I want. Any other data will return an empty data array. This is not intuitive.

Potential impact

  • Problem identified as part of #89
  • During unit tests, each file is loaded ~8 times, due to the different cleaning levels and strict_time_flag tests.
  • Downstream users may have to change how they access data

Potential solution(s)

Move these files out of the generalized pandas instrument (see #89), which may also speed up the unit tests. See below for run times.

Alternatives

A simple solution to the bug part of this is to fix the list_files routine to create a fake daily list where all dates point to the core file and potentially down-select data after the load.
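The fake daily list could be sketched roughly as below. This is an illustration under assumptions, not the package's implementation: the helper names and the file name are hypothetical, and the date is appended to the file name so that each entry is unique. A real `list_files` routine would return a pandas Series indexed by date rather than a dict.

```python
import datetime as dt


def fake_daily_files(fname, start, stop):
    """Map every day from `start` through `stop` to the same core file.

    The date is appended to the file name so each entry is unique; the
    load routine would need to strip this suffix again.
    """
    ndays = (stop - start).days + 1
    days = [start + dt.timedelta(days=i) for i in range(ndays)]
    return {day: "{:s}_{:s}".format(fname, day.strftime("%Y-%m-%d"))
            for day in days}


def split_fake_name(fake_name):
    """Recover the real file name and the date from a fake daily entry."""
    fname, date_str = fake_name[:-11], fake_name[-10:]
    return fname, dt.datetime.strptime(date_str, "%Y-%m-%d")


# Hypothetical single-file data set covering three days
files = fake_daily_files("core_file.hdf5", dt.datetime(1963, 11, 26),
                         dt.datetime(1963, 11, 28))
real_name, load_date = split_fake_name(files[dt.datetime(1963, 11, 27)])
```

With the date recovered at load time, the routine can open the core file and down-select to the requested day.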

Additional context

Stats are for the affected instruments that have unit tests only; other instruments may also be affected.

Tag Load time
120 56.5s
210 21.9s
212 48.9s

TST: Expand instrument method coverage

There are a number of additional functions in the instruments here not in the standard output. Additional tests can be added using the pysat test instruments.

Load simple files

Allow simple text files to be loaded by the general pysatMadrigal methods load routine. This will expand the file support to all standard file formats.

BUG: netCDF with Python 3.12 on Mac

Description

self = <pysatMadrigal.tests.test_instruments.TestInstruments object at 0x119549a30>
clean_level = 'clean'
inst_dict = {'inst_id': '', 'inst_module': <module 'pysatMadrigal.instruments.gnss_tec' from '/Users/runner/work/pysatMadrigal/pys...uments/gnss_tec.py'>, 'tag': 'site', 'user_info': {'password': '[email protected]', 'user': 'pysat+CI_tests'}}

    @pytest.mark.second
    # Need to maintain download mark for backwards compatibility.
    # Can remove once pysat 3.1.0 is released and libraries are updated.
    @pytest.mark.load_options
    @pytest.mark.download
    @pytest.mark.parametrize("clean_level", ['none', 'dirty', 'dusty', 'clean'])
    def test_load(self, clean_level, inst_dict):
        """Test that instruments load at each cleaning level.
    
        Parameters
        ----------
        clean_level : str
            Cleanliness level for loaded instrument data.
        inst_dict : dict
            Dictionary containing info to instantiate a specific instrument.
            Set automatically from instruments['download'] when
            `initialize_test_package` is run.
    
        """
    
        test_inst, date = initialize_test_inst_and_date(inst_dict)
        if len(test_inst.files.files) > 0:
            # Set Clean Level
            test_inst.clean_level = clean_level
            target = 'Fake Data to be cleared'
            test_inst.data = [target]
            try:
>               test_inst.load(date=date, use_header=True)

/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pysat/tests/classes/cls_instrument_library.py:343: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pysat/_instrument.py:3332: in load
    self.data, meta = self._load_data(date=self.date, fid=self._fid,
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pysat/_instrument.py:1619: in _load_data
    data, mdata = self._load_rtn(load_fname, tag=self.tag,
pysatMadrigal/instruments/gnss_tec.py:315: in load
    data, meta, lat_keys, lon_keys = gnss.load_site(fnames)
pysatMadrigal/instruments/methods/gnss.py:154: in load_site
    data, meta = general.load(fnames, 'site', '', xarray_coords=xcoords)
pysatMadrigal/instruments/methods/general.py:780: in load
    file_data = xr.open_dataset(load_file_types["netCDF4"][0],
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/xarray/backends/api.py:573: in open_dataset
    backend_ds = backend.open_dataset(
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:646: in open_dataset
    store = NetCDF4DataStore.open(
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:409: in open
    return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:356: in __init__
    self.format = self.ds.data_model
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:418: in ds
    return self._acquire()
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:412: in _acquire
    with self._manager.acquire_context(needs_lock) as root:
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/contextlib.py:137: in __enter__
    return next(self.gen)
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/xarray/backends/file_manager.py:199: in acquire_context
    file, cached = self._acquire_with_cache_info(needs_lock)
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/xarray/backends/file_manager.py:217: in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
src/netCDF4/_netCDF4.pyx:2492: in netCDF4._netCDF4.Dataset.__init__
    ???
src/netCDF4/_netCDF4.pyx:1927: in netCDF4._netCDF4._get_vars
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   RuntimeError: NetCDF: HDF error

src/netCDF4/_netCDF4.pyx:2034: RuntimeError

Test configuration

  • OS: "mac-latest"
  • Version: Python 3.12
  • Other details about your setup that could be relevant: netCDF4-1.6.5-cp312-cp312-macosx_10_9_x86_64.whl.metadata

ENH: add line-of-sight TEC support to `gnss_tec` instrument

Madrigal provides two types of TEC files. The first is the VTEC map. The second, LOS, provides slant TEC. These files are much larger and should not be loaded in their entirety. Because this is more complicated, adding this data set is being treated as a separate issue.

When doing this, we need to determine the best way to load the LOS data, and how to best use sat_id to define method.

For help with this, refer to: http://cedar.openmadrigal.org/static/siteSpecific/programming_tips.pdf

ENH: stop pytest at first failure

Description

Unit tests occasionally fail because of issues accessing Madrigal. These need to be restarted and don't need to run all the way through.

Potential impact

  • Is the feature related to an existing problem? Tests take a long time to run
  • How critical is this feature to your workflow? Moderate
  • How wide of an impact do you anticipate this enhancement having? Making devs' lives easier
  • Would this break any existing functionality? No

Potential solution(s)

Add the -x flag to the pytest command, see: https://stackoverflow.com/questions/36804181/long-running-py-test-stop-at-first-failure

Alternatives

It can be informative to see where all the failures are at once, but here we frequently get unit test failures early on due to the downloads where all other tests pass.

BUG: list_remote_files and two_digit_year_break

Description

pysatMadrigal.general.list_remote_files only accepts a single input for two_digit_year_break, although the value of this keyword may need to vary between tags. For example, in gnss_tec the vtec tag has a two-digit year in the filename, while site does not.
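One way to picture a per-tag setting is a dictionary keyed by tag, as in the sketch below. The dictionary, the helper name, and the break value of 99 are hypothetical. The conversion follows the pysat convention for this keyword: years at or above the break are placed in the 1900s, years below it in the 2000s, and `None` means four-digit years are assumed.

```python
# Hypothetical per-tag settings: None means the file names carry
# four-digit years and no conversion is needed.
YEAR_BREAKS = {"vtec": 99, "site": None}


def expand_year(year, two_digit_year_break=None):
    """Convert a two-digit year to four digits.

    Years at or above the break are placed in the 1900s, years below it
    in the 2000s. With no break, the year is returned unchanged.
    """
    if two_digit_year_break is None:
        return year
    return year + (1900 if year >= two_digit_year_break else 2000)


vtec_year = expand_year(20, YEAR_BREAKS["vtec"])
site_year = expand_year(2020, YEAR_BREAKS["site"])
```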

To Reproduce this bug:

Steps to reproduce the behavior:

  1. gnss = pysat.Instrument("gnss", "tec", 'vtec')
  2. gnss.remote_file_list(dt.datetime(2020, 12, 1), dt.datetime(2020, 12, 2), user=user, password=password)
  3. See error

Test configuration

  • OS: MacOS
  • Version Python 3.9
  • Other details about your setup that could be relevant

ENH: use directory creation from pysat

Description

Pysat version > 3.0.1 will have a routine in pysat.utils.files that creates missing directories in a structure. This will be useful in at least the test_methods_general classes.
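For context, the local test code that an upstream routine would replace probably resembles the standard-library sketch below. The helper name and the directory layout are hypothetical, and the upstream pysat function may differ in name, arguments, and return value.

```python
import os
import tempfile


def make_missing_dirs(path):
    """Create `path` and any missing parents; return True if created."""
    if os.path.isdir(path):
        return False
    os.makedirs(path)
    return True


# Build a nested data directory inside a temporary location
with tempfile.TemporaryDirectory() as tmp_dir:
    data_dir = os.path.join(tmp_dir, "gnss", "tec", "vtec")
    first = make_missing_dirs(data_dir)   # Directory did not exist yet
    second = make_missing_dirs(data_dir)  # Nothing left to create
```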

Potential impact

  • Is the feature related to an existing problem? Reduce code/duplicate code in unit tests
  • How critical is this feature to your workflow? Critical for testing.
  • How wide of an impact do you anticipate this enhancement having? Local impact.
  • Would this break any existing functionality? No, functionality remains the same.

Potential solution(s)

Replace local code with upstream function.

Alternatives

Keep local code.

STY: Update standards in conf.py

  • Update variables for better maintainability
  • Link authors to zenodo.json
  • Scrub for flake8
  • Remove exclude = conf.py from setup.cfg

TEC with HDF5 and netCDF4

Line 115 of methods/general.py: file_data = filed['Data']['Table Layout']

From @rstoneback

I think I sorted out where the issue pops up. If I only work with hdf files, or only netcdf files, things work. However, if I have, say, hdf files and then download netcdf files, the original hdf files no longer work. I think the same is true the other way around. As long as folks stick with only one file type it works, but there is potential for users to break things down the road.

I'm sure you could fix it, but since the hdfs take much longer to load, you may want to consider going netcdf only for this instrument.
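Short of dropping one format, a guard against mixed file types could look like the sketch below. The function names and file names are hypothetical, and grouping by extension is an assumption about how the two formats can be told apart.

```python
import os
from collections import defaultdict


def group_by_extension(fnames):
    """Group file names by their lower-case extension."""
    groups = defaultdict(list)
    for fname in fnames:
        groups[os.path.splitext(fname)[1].lower()].append(fname)
    return dict(groups)


def check_single_type(fnames):
    """Raise ValueError if more than one file type is present."""
    groups = group_by_extension(fnames)
    if len(groups) > 1:
        raise ValueError("mixed file types found: {:}".format(sorted(groups)))
    return groups


# Hypothetical file names for illustration
mixed = ["gps171119g.002.hdf5", "gps171119g.002.netCDF4"]
groups = group_by_extension(mixed)
```

Calling `check_single_type` before loading would turn the silent breakage into an explicit error, while `group_by_extension` alone would let a loader handle each type separately.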

DOC: add more examples

Improve documentation by adding examples for:

  • plotting VTEC (re-creating gnss_tec_vtec_example.png)
  • using methods.dmsp.smooth_ram_drifts
  • using methods.dmsp.update_DMSP_ephemeris
  • using methods.dmsp.add_drift_unit_vectors
  • using methods.dmsp.add_drifts_polar_cap_x_y
  • using methods.jro.calc_measurement_loc

TST: Expand testing for pandas to xarray

Description

The pandas to xarray conversion method is non-trivial. My experience is any area not tested has bugs. Expanded testing for the pandas to xarray conversion will help ensure that everything is working as expected.

Generalised Madrigal Instrument

The new general methods for Madrigal data, which use madrigalWeb, make it possible to create a general instrument object for any data stored there. Users could supply the Madrigal instrument and data codes at instantiation instead of a name; the only things lacking relative to the specific instruments would be targeted acknowledgements and the clean routine. Using the Madrigal instrument and experiment codes could be made easier by having the general Madrigal init routine grab these keyword arguments and use them along with functools.partial to set the load and download routines as needed.
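The functools.partial idea can be sketched as follows. Everything here is hypothetical: `general_download`, `FakeInstrument`, the `inst_code` and `kindat` keyword names, and the code values are placeholders rather than the package's API.

```python
import functools


def general_download(date_array, inst_code=None, kindat=None):
    """Hypothetical generic routine; returns what it would request."""
    return [(date, inst_code, kindat) for date in date_array]


def init(inst):
    """Bind the user-supplied codes to the generic download routine."""
    inst.download = functools.partial(general_download,
                                      inst_code=inst.kwargs["inst_code"],
                                      kindat=inst.kwargs["kindat"])


class FakeInstrument:
    """Hypothetical stand-in for a pysat.Instrument."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs
        init(self)


# Placeholder codes supplied at instantiation instead of a name
inst = FakeInstrument(inst_code=8100, kindat=10241)
requests = inst.download(["2020-12-01"])
```

Because the codes are bound once in `init`, the download call keeps the same signature as in the specific instruments.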

Restructure Madrigal methods

It is a bit silly to have a file named madrigal.py in pysatMadrigal. This file contains the general instrument methods. Possible ways to rename this file sensibly:

  • pysatMadrigal.instruments.methods.general,
  • pysatMadrigal.utils.methods, or
  • pysatMadrigal.instruments.methods and remove the methods directory.

pysatMadrigal CI update

Describe the bug
GitHub Actions is deprecating an old version of Node. To fix this, we need to replace:

    - uses: actions/checkout@v2
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v2

with

    - uses: actions/checkout@v3
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v4

in the .yml files.

To Reproduce
Go to the actions and read the warnings: https://github.com/pysat/pysat/actions/runs/3283879182

Expected behavior
No warnings in the actions

Desktop (please complete the following information):

  • OS: GitHub Actions

BUG: gnss_tec cannot load for xarray 0.18.0

Loading vtec data into xarray is not working with 0.18.0. Test code (assuming data has been downloaded):

import datetime as dt
import pysat
import pysatMadrigal

test_date = dt.datetime(2017, 11, 19)
gns = pysat.Instrument(inst_module=pysatMadrigal.instruments.gnss_tec, tag='vtec')
gns.load(date=test_date)

Error message:

~/code/core/pysatMadrigal/pysatMadrigal/instruments/methods/general.py in load(fnames, tag, inst_id, xarray_coords, file_type)
     92         # Xarray natively opens netCDF data into a Dataset
     93         if len(fnames) == 1:
---> 94             file_data = xr.open_dataset(fnames[0])
     95         else:
     96             file_data = xr.open_mfdataset(fnames, combine='by_coords')

/opt/anaconda3/lib/python3.8/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    483 
    484     if engine is None:
--> 485         engine = plugins.guess_engine(filename_or_obj)
    486 
    487     backend = plugins.get_backend(engine)

/opt/anaconda3/lib/python3.8/site-packages/xarray/backends/plugins.py in guess_engine(store_spec)
    110             warnings.warn(f"{engine!r} fails while guessing", RuntimeWarning)
    111 
--> 112     raise ValueError("cannot guess the engine, try passing one explicitly")
    113 
    114 

ValueError: cannot guess the engine, try passing one explicitly

Discovered during work on #45. Previous CI tests using conda are still using xarray 0.17.0.
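The error message itself suggests passing the engine explicitly. A minimal sketch of that workaround follows; the helper names and the extension mapping are assumptions, and the `netcdf4` engine requires the netCDF4 package to be installed. This may not be the fix adopted in the package.

```python
import os

# Assumed mapping from file extension to an xarray backend name
ENGINES = {".nc": "netcdf4", ".nc4": "netcdf4", ".netcdf4": "netcdf4"}


def pick_engine(fname, default="netcdf4"):
    """Choose an xarray engine name from the file extension."""
    ext = os.path.splitext(fname)[1].lower()
    return ENGINES.get(ext, default)


def open_dataset(fname):
    """Open a file with an explicit engine instead of letting xarray guess."""
    # Imported here so that `pick_engine` has no third-party dependencies
    import xarray as xr

    return xr.open_dataset(fname, engine=pick_engine(fname))


# Hypothetical file name for illustration
engine = pick_engine("gps171119g.002.netCDF4")
```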

ENH: Use the pysat functions for testing warnings

Description

Use the pysat provided methods for testing warnings, pysat.utils.testing.eval_warnings

Potential solution(s)

Use provided tests as per listed standards for using pysat functions when possible

Alternatives

Leave as is, tests currently work

BUG: pandas future warnings

Description

There are several future warnings to address

To Reproduce this bug:

  /home/runner/work/pysatMadrigal/pysatMadrigal/pysatMadrigal/instruments/methods/general.py:1363: FutureWarning: The behavior of array concatenation with empty entries is deprecated. In a future version, this will no longer exclude empty items when determining the result dtype. To retain the old behavior, exclude the empty entries before the concat operation.
    out = pds.concat(out_series).sort_index()

pysatMadrigal/tests/test_instruments.py: 18 warnings
  /opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/pysat/_instrument.py:2205: FutureWarning: 'S' is deprecated and will be removed in a future version, please use 's' instead.
    file_inc = pds.tseries.frequencies.to_offset(file_freq)

pysatMadrigal/tests/test_instruments.py::TestInstruments::test_download[inst_dict13]
  /home/runner/work/pysatMadrigal/pysatMadrigal/pysatMadrigal/instruments/dmsp_ssj.py:270: FutureWarning: 'AS-JAN' is deprecated and will be removed in a future version, please use 'YS-JAN' instead.
    if date_array.freq not in ['AS-JAN', 'YS', 'AS']:

pysatMadrigal/tests/test_instruments.py::TestInstruments::test_download[inst_dict13]
  /home/runner/work/pysatMadrigal/pysatMadrigal/pysatMadrigal/instruments/dmsp_ssj.py:270: FutureWarning: 'AS' is deprecated and will be removed in a future version, please use 'YS' instead.
    if date_array.freq not in ['AS-JAN', 'YS', 'AS']:

Test configuration

  • OS: All
  • Version: All
  • Other details about your setup that could be relevant: pandas v2.2.1

BUG: datetime future warning

Description

DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC)
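A possible replacement is sketched below, using `datetime.timezone.utc` so that it also runs on Python versions that predate `datetime.UTC`. Stripping the time zone afterwards is an assumption that the surrounding code expects naive datetimes, as `utcnow()` returned; the helper name is hypothetical.

```python
import datetime as dt


def utc_now_naive():
    """Return the current UTC time without tzinfo, as utcnow() did."""
    return dt.datetime.now(dt.timezone.utc).replace(tzinfo=None)


aware = dt.datetime.now(dt.timezone.utc)  # Timezone-aware UTC time
naive = utc_now_naive()                   # Drop-in for utcnow()
```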

Test configuration

  • OS: All
  • Version: Python 3.12
  • Other details about your setup that could be relevant

DOC: create package documentation

Create package documentation that includes:

  • Overview of package
  • Installation instructions
  • Citation guidelines
  • Guidelines for contributing
  • API
  • Examples for using instruments

Create Logo

Create a nice version of the sketched logo currently available.
