ph5's People

Contributors

damhuonglan, derick-hess, dthomas1953, gbbofh, ktjacobs, maevap, nick-falco, rsdeazevedo, timronan

ph5's Issues

ph5tomsAPI.py not correctly filtering by shotid or shotline

After some further testing, it seems that the ph5tomsAPI.py (v. 2016.298) program is not correctly filtering by shotid or shotline. When I enter "junk" values for shotline and shotid the program still returns data.

Perhaps it is ignoring these values completely and retrieving all event information, or maybe the large value for length is causing the problem.

e.g. This command returns data for seisorz
python ph5tomsAPI.py --nickname master.ph5 --ph5path /hdf5-data2/PH5_Experiments/pn4/13-005/ -e 99999 --shotline 99999 --array 001 --length 435000 --stream

ph5tostationxml.py starttime & endtime wildcards

The starttime and endtime arguments in ph5tostationxml.py need to accept the * wildcard. The ? wildcard does not need to be supported though.

Also, I'd suggest raising a custom exception when an invalid starttime or endtime is entered, instead of just allowing the code to break.

(venv) falco:apps nick$ python ph5tostationxml.py -n master.ph5 -p /hdf5-data2/PH5_Experiments/pn4/13-005/ --starttime=*

Traceback (most recent call last):
  File "ph5tostationxml.py", line 586, in <module>
    ph5sxml = PH5toStationXML(args_dict)
  File "ph5tostationxml.py", line 187, in __init__
    self.args.get('start_time'), "%Y:%j:%H:%M:%S.%f")
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data '*' does not match format '%Y:%j:%H:%M:%S.%f'
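
A minimal sketch of how both suggestions could fit together. The helper `parse_time_arg` and the exception `PH5TimeFormatError` are hypothetical names, not part of ph5tostationxml.py; only the format string is taken from the traceback above.

```python
from datetime import datetime

TIME_FORMAT = "%Y:%j:%H:%M:%S.%f"  # format string shown in the traceback


class PH5TimeFormatError(ValueError):
    """Hypothetical exception for an unparseable starttime/endtime."""


def parse_time_arg(value):
    """Return None for a missing or '*' value, otherwise a datetime."""
    if value is None or value == "*":
        return None  # None means "no constraint on this bound"
    try:
        return datetime.strptime(value, TIME_FORMAT)
    except ValueError:
        # Re-raise as a specific, catchable error instead of a bare ValueError
        raise PH5TimeFormatError(
            "time %r does not match format %r" % (value, TIME_FORMAT))
```

Callers can then treat `None` as an open-ended bound and catch `PH5TimeFormatError` to report a bad argument cleanly.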

Add support for radial geographic constraints to ph5tostationxml.py

It would be nice to add support for radial geographic constraints to ph5tostationxml.py. A radial geographic constraint consists of a latitude/longitude geographic position and a radius distance boundary. Data is only returned for stations found within that geographic point/radius boundary.

i.e. Radial Geographic Constraints

parameter | examples | discussion | default | type
lat[itude] | 35 | Specify the central latitude point. | | degrees
lon[gitude] | 170 | Specify the central longitude point. | | degrees
maxradius | 20 | Specify maximum distance from the geographic point defined by latitude and longitude. | | degrees
minradius | 19 | Specify minimum distance from the geographic point defined by latitude and longitude. | 0 | degrees
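
One way the radial check could be sketched, assuming radii are measured as great-circle distance in degrees of arc. The function names are hypothetical and nothing here is taken from ph5tostationxml.py.

```python
import math


def angular_distance_deg(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in degrees of arc."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine formula on a unit sphere
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def within_radius(sta_lat, sta_lon, lat, lon, maxradius, minradius=0.0):
    """True if the station lies between minradius and maxradius (degrees)."""
    distance = angular_distance_deg(lat, lon, sta_lat, sta_lon)
    return minradius <= distance <= maxradius
```

A station would be kept only when `within_radius` returns True for the requested point and radii.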

PH5 StationTXT is missing some information

Can we add the following information to the ObsPy Inventory returned by ph5tostationxml.py? StationTXT is incomplete without this information.

Channel level StationTXT Missing:
Azimuth|Dip|SensorDescription|Scale|ScaleFreq|ScaleUnits

Network level StationTXT Missing:
TotalStations

Examples about how these fields should be labeled/added may be found in the ObsPy StationTXT tests file (test_station_text_parsing.py).

ph5tostationxml.py array argument does not filter results correctly

I noticed that the array argument in ph5tostationxml.py does not filter results correctly. I believe one of the problems is with the ph5API module not returning all arrays for a given experiment.

For example:
python ph5tostationxml.py --nickname=master --ph5path=/hdf5-data2/PH5_Experiments/pn4/13-005/ --basepath=/hdf5-data2/PH5_Experiments/pn4/13-005/ --array=*
returns a list of Array_t_names containing only the first array, even though array was set to *:
i.e. returns self.ph5.Array_t_names = ['Array_t_001']

PH5 web form needs to be updated to access PN4 experiments

The PH5 web form needs to be updated to access the /pn4 directory so that updated (PN4) experiments can remain available in SEG-Y.

A notice should also be added to the form informing users to use the PH5 Web Services for all non SEG-Y data requests.

Tasks:

  • Update web form to access PN4 experiments
  • Add a notice to the web form telling users to use the PH5 Web Services

ph5tostationxml.py response level requests do not return response information

Response level requests made to ph5tostationxml.py currently do not return any response level information. I believe this was because response info is being stored in PH5 as RESP files, and there is no easy way to read RESP files in Python.

What is the current status of this? Last I remember hearing was that a developer at the PIC had developed a library for this purpose, and was going to add it to ObsPy for the upcoming release.

Raise custom exceptions for known potential problems in PH5 APIs

Runtime errors in the ph5tostationxml.py and ph5tomsAPI.py APIs need to be caught and re-raised as custom exceptions.

You can create your own custom exceptions by subclassing the builtin Python Exception class.

In the Web Services code, I’ll then catch the known custom exception types raised by the PH5 APIs and take an action that doesn’t break the services (i.e. skip the bad experiment and log the error).


For example:

Currently if one PH5 Experiment in the archive is missing a required piece of information, such as a SEED network code, it will break both of the Web Services.

If a custom exception were raised in the APIs when the code tried to access the non-existent info, I could catch that exception, skip the bad experiment, and log the error.
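
The pattern described above might look roughly like this. All class and function names are hypothetical, and the experiment is a plain dict standing in for real PH5 metadata.

```python
import logging

logger = logging.getLogger(__name__)


class PH5Error(Exception):
    """Hypothetical base class for errors raised by the PH5 APIs."""


class PH5MissingMetadataError(PH5Error):
    """Hypothetical error for a missing required field."""


def get_network_code(experiment):
    # `experiment` is a plain dict used only for illustration
    code = experiment.get("net_code")
    if not code:
        raise PH5MissingMetadataError(
            "experiment %s has no SEED network code" % experiment.get("id"))
    return code


def collect_network_codes(experiments):
    """Web-service side: skip and log bad experiments, keep the rest."""
    codes = []
    for exp in experiments:
        try:
            codes.append(get_network_code(exp))
        except PH5Error as err:
            logger.error("skipping experiment: %s", err)
    return codes
```

Catching the shared base class lets the services handle every known PH5 failure in one place while unknown errors still surface.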

ph5tostationxml needs fixing

The ObsPy pull request that Lloyd has been working on (SEED -> Inventory read support plus the NRL client) is now finalized. However, Lion renamed some functions, moved things around, and removed a few things, so the RESP code in ph5tostationxml will no longer work.

It will be a few minor changes to get it working. I'll do that this week.

Output metadata in csv format

David Okaya suggested having ph5tostationxml (or a new tool) output the metadata as a CSV-style file where the user can set the delimiter. He said this would be really useful in many active source software tools, as well as for creating maps with Generic Mapping Tools (GMT).

This would be an easy addition to ph5tostationxml.
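
A minimal sketch of delimiter-configurable output using Python 3's standard `csv` module. The function name and the column names in the usage line are made up for illustration.

```python
import csv
import io


def write_delimited(rows, fieldnames, delimiter=","):
    """Serialize a list of dicts as delimited text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames,
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

For example, `write_delimited(rows, ["station", "lat", "lon"], delimiter="|")` would produce pipe-separated text.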

Version tagging

I have looked over the PH5 document we sent to Kent and the working group. It was decided that the major version number of the software would correspond with the major version number of the PH5 format. In the original document, software version 4.0 would be the initial release, with 4.1 being released when RESP was integrated into PH5. The thinking was that it would take much longer for the changes to land in ObsPy.

I think we should release 4.0 when the current two pull requests (#92 and #93) are done. Bug fixes would then get incremental patch releases such as 4.0.1, and new features not tied to the next major release (5.0) would get minor releases (4.1.x, 4.2.x, etc.), following semantic versioning (MAJOR.MINOR.PATCH).

PH5 version 5.0 and software has the following possible additions:

  • Adding provenance model(seis-provenance)
  • Additions to event table to store earthquake metadata
  • Possibly more, as suggested by the community and the PH5 working group

The working group's current focus is to get to 5.0.

Validating PH5 arguments

I'm working on adding better validation checks for web service query arguments. Here are some questions regarding certain PH5 query arguments that I have:

  • Will report numbers always contain a two digit first number followed by a dash followed by a three digit number? (e.g. 15-021)
  • Will an array-id always be a three-digit numeric value? (e.g. 001)
  • Will a shot-id always be a four-digit numeric value? (e.g. 1001)
  • What is the format for receiver ids?
  • Does the "length" parameter need to be a non-negative integer? (e.g. 0, 1, 2, etc..)
  • Does the "offset" parameter need to be a non-negative integer?
  • Does the "component" parameter need to be a non-negative integer? Is there a restriction on the number of digits like with the SEED channel code?
  • Does the "reduction velocity" parameter need to be a non-negative integer?
  • Can the extended header parameter only be "P", "S", or "U"?
  • What would some example minlat, maxlat, minlon, and maxlon values look like in the ph5tostationxml.py service?
  • What does an example "shotline" argument value look like?

Thanks

ph5toms memory usage

ph5toms uses a lot of memory (>1 GB per request). This affects our ability to scale the web services. We need to work on making it more efficient.

The memory usage graph looks like this for the following request on my local machine:
mprof run --include-children ph5toms --nickname master --ph5path /hdf5-data2/PH5_Experiments/pn4/16-015/ -o . --station 100? --channel DPZ --starttime 2016-06-26T00:00:00.0 --stoptime 2016-06-27T00:00:00.0
[Figure: memory usage graph for the request above (yw 100 dpz 2016_ph5dataselect)]

Creating a test PH5 Dataset

@derick-hess @rsdeazevedo
I would like to create a very small test PH5 dataset that I can use for unit testing. Do you know which files I would have to modify to create the smallest possible PH5 dataset that can be tracked in Git?

ph5tomsAPI not handling wildcards correctly

@derick-hess @rsdeazevedo The ph5tomsAPI.py program no longer handles wildcards correctly for a number of cases. Support for wildcards is still required for request by shot to work.

A few examples of requests that return no data when they should are:
python ph5tomsAPI.py -n master.ph5 -p /hdf5-data2/PH5_Experiments/pn4/13-005/ --stream --shotline 001 --eventnumbers 5003 --array 001 --length 60 --station 100?

python ph5tomsAPI.py -n master.ph5 -p /hdf5-data2/PH5_Experiments/pn4/13-005/ --stream --shotline 001 --eventnumbers 5003 --array 001 --length 60 --station *

python ph5tomsAPI.py -n master.ph5 -p /hdf5-data2/PH5_Experiments/pn4/13-005/ --stream --shotline 001 --eventnumbers 5003 --array 001 --length 60 --channel EP?

python ph5tomsAPI.py -n master.ph5 -p /hdf5-data2/PH5_Experiments/pn4/13-005/ --stream --station=* --starttime 2014-11-23T10:00:00.0 --stoptime 2014-11-23T15:00:00.0

This one actually errors out.
python ph5tomsAPI.py -n master.ph5 -p /hdf5-data2/PH5_Experiments/pn4/13-005/ --stream --shotline 001 --eventnumbers 5003 --array=* --length 60

Make ph5tostationxml accept lists of requests for a given network

Another improvement I want to make is to have the ph5tostationxml.run_ph5_to_stationxml(sta_xml_obj) method accept a list of request objects for a given network. This will vastly speed up a lot of repetitive POST requests that currently time out. If you are refactoring the ph5tostationxml.py module, please keep this in mind.

For example, currently the ObsPy Fed Catalog client makes POST requests, formatted like the example below, that can be hundreds of lines.

level = 'station'
YW 1001 * * <start-time> <end-time>
YW 1002 * * <start-time> <end-time>
YW 1003 * * <start-time> <end-time>
YW 1004 * * <start-time> <end-time>
... etc.

Long requests currently time-out largely because we process each request independently (performing the same work of extracting all stations/channels more than one time). Adding support for lists of requests to the ph5tostationxml.py API will fix this problem since large amounts of data will only have to be read from each requested network one time.
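
The grouping step could be sketched as follows, so that each network is read once and all of its request lines are handled together. `Request`, `parse_post_lines`, and `group_by_network` are hypothetical names, and the six-column line layout is assumed from the example above.

```python
from collections import OrderedDict, namedtuple

Request = namedtuple("Request", "network station location channel start end")


def parse_post_lines(lines):
    """Yield a Request for every six-column line; skip anything else."""
    for line in lines:
        parts = line.split()
        if len(parts) == 6:
            yield Request(*parts)


def group_by_network(requests):
    """Collect requests per network, preserving first-seen order."""
    grouped = OrderedDict()
    for req in requests:
        grouped.setdefault(req.network, []).append(req)
    return grouped
```

Each value in the returned mapping is the list that would be handed to the API in a single call for that network.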

Add restricted status to ph5tostationxml.py

The StationXML produced by ph5tostationxml.py should include the restricted status.

For non-restricted experiments the status would be restrictedStatus="open". For restricted experiments the status would be restrictedStatus="closed".

Station text sensitivity incorrect

When I check the instrument sensitivity from the ObsPy response object for station 518, channel HHZ, it is reported as:

Instrument Sensitivity:
Value: 944485763.626
Frequency: 1.0
Input units: M/S
Input units description: None
Output units: COUNTS
Output units description: None

but the station text reports it as:

YW|518|01|HHZ|36.5967756197|-97.6347917006|328.42|0.0|0.0|-90.0|Guralp cmg-3t|629129.0|0.05|M/S|100.0|2016-06-19T00:00:00|2016-10-30T00:00:00

ph5tostationxml.py Station names not compared correctly

I found the following bug while testing the web services with experiment 15-018.

Description:
Requests for all stations, like the one below, will return data:
https://service.iris.edu/ph5beta/dataselect/1/query?network=4C&start=2015-08-04T16:30:00&end=2015-08-05T19:30:00&nodata=404

The same request with the station parameter defined will return no data for all stations:
e.g. https://service.iris.edu/ph5beta/dataselect/1/query?network=4C&station=DAN&start=2015-08-04T16:30:00&end=2015-08-05T19:30:00&nodata=404

Location of bug:
In the Parse_Station_list method where you are checking if a station exists in a particular experiment you are comparing station array order ids to the station parameter entered by the user.

For the case of station code “DAN” this leads to “DAN” being compared with “['1001', '1002', '1003', '1004', '1005']”. Station code "DAN" obviously doesn’t match any of these, and results in no data being returned.

i.e. On ph5tostationxml.py line 495:
pattern = DAN, all_stations = ['1001', '1002', '1003', '1004', '1005'] resulting in no match
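
A sketch of matching the user's pattern against SEED station codes instead of station ids. The function name and the id-to-code pairing in the usage note are hypothetical and are not taken from experiment 15-018.

```python
from fnmatch import fnmatchcase


def match_stations(pattern, stations):
    """Return ids of stations whose SEED code matches the pattern.

    `stations` is a list of (station_id, seed_station_code) pairs.
    """
    return [sta_id for sta_id, code in stations
            if fnmatchcase(code, pattern)]
```

With `stations = [("1001", "DAN"), ("1002", "IVY")]`, the pattern `DAN` now selects the first entry, because it is compared with the codes rather than with `'1001'` and `'1002'`.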

Modify APIs to accept a list of PH5 source directories

I think that it would be helpful for the PH5 APIs to accept a list of ph5 data paths.

This would allow the PH5 Beta Web Services to additionally access a PH5 data directory that isn't visible to the production web services (e.g. /hdf5-data2/PH5_Experiments/pn4-beta/).

With the addition of the pn4-beta directory, experiments could be tested with Web Services before being added to the final PH5 archive.

ph5toexml.py needs to enforce deploy/pickup times

Currently ph5toexml.py does not enforce deploy/pickup times but ph5tomsAPI.py does.

We need to check event start times against the deploy/pickup times, so that users don't see events outside of the accepted date range.

@derick-hess and @rsdeazevedo I'm not sure of the best way to read the deploy/pickup times to compare against without reading the entire Array_t table. Ideas?

ph5torec.py and ph5toevt.py should be changed to match ph5tomsAPI.py model

The ph5torec.py and ph5toevt.py scripts for converting PH5 to SEG-Y should be changed to match the general model used in ph5tomsAPI.py.

From what I can tell, this implies the following:

  1. ph5torec.py and ph5toevt.py should be combined into one file (e.g. ph5tosegy.py)
  2. The combined file (ph5tosegy.py) should be a module that includes a class called PH5toSEGY. e.g. https://github.com/PIC-IRIS/PH5/blob/master/webservices/ph5tomsAPI.py#L93
  3. The PH5toSEGY class would contain methods for returning SEG-Y data objects in Shot (event) and Receiver order (e.g. def get_data_shot(), def get_data_receiver()). These methods may be further decomposed to reuse code.
  4. The get_data_shot() and get_data_receiver() methods in the above example would return SEG-Y data objects. These objects would have a write method that writes their contents to a file-like object or standard out.

With this model, running the program from the command line is trivial:

  1. The user would supply parameters including a request type.
  2. The program will then instantiate a PH5toSEGY object instance with the supplied parameters
  3. The program will call the requested data selection method (get_data_shot() or get_data_receiver()) on that PH5toSEGY instance. This will return a SEG-Y data object.
  4. The program will call the SEG-Y data object's write method, which will in turn write the SEG-Y data to a file-like object or standard out.
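
The model above can be sketched as a skeleton. The class and method names follow the proposal (with `receiver` spelled out); `SEGYData`, `_gather`, and `run` are hypothetical, and the placeholder bytes stand in for real SEG-Y output that this sketch does not produce.

```python
import io


class SEGYData(object):
    """Hypothetical container holding SEG-Y bytes."""

    def __init__(self, payload):
        self.payload = payload

    def write(self, fileobj):
        # Write the contents to any binary file-like object
        fileobj.write(self.payload)


class PH5toSEGY(object):
    def __init__(self, **params):
        self.params = params

    def _gather(self, order):
        # Placeholder: real code would read traces from PH5 here
        return SEGYData(("%s-order gather" % order).encode("ascii"))

    def get_data_shot(self):
        return self._gather("shot")

    def get_data_receiver(self):
        return self._gather("receiver")


def run(request_type, out, **params):
    """Command line flow: instantiate, select a method, write the result."""
    converter = PH5toSEGY(**params)
    methods = {"shot": converter.get_data_shot,
               "receiver": converter.get_data_receiver}
    methods[request_type]().write(out)
```

Sharing `_gather` between the two public methods is one way the "further decomposed to reuse code" point could be realized.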

PH5View using PyQt4 Webkit is no longer supported (insecure)

PH5View uses the webkit module of PyQt4. This has been deemed insecure, and major Linux distributions (CentOS, Ubuntu, etc.) no longer include webkit in PyQt4. PH5View needs to be updated to no longer use webkit.

Lan is the software developer at the PIC who is in charge of developing and maintaining PH5View. When she gets back from vacation we should have her fix this.

ph5tostationxml.py not correctly filtering data

@rsdeazevedo @derick-hess When a request is made to ph5tostationxml.py the results are not filtered correctly. It seems that the service is only enforcing the query filters at one level. For example if a request is made for an unknown channel, then no channel information is displayed but station and network info is still displayed. In this case no information should be displayed at all.

Here are a few examples:

Still displays network and station info when an unknown channel is requested.
python ph5tostationxml.py --basepath /hdf5-data2/PH5_Experiments/pn4/13-005/ -p /hdf5-data2/PH5_Experiments/pn4/13-005/ -n master.ph5 --level=channel --station 1001,1002 --channel XXX
Still displays network and station info when an unknown location is requested.
python ph5tostationxml.py --basepath /hdf5-data2/PH5_Experiments/pn4/13-005/ -p /hdf5-data2/PH5_Experiments/pn4/13-005/ -n master.ph5 --level=channel --station 1001,1002 --location XX
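
The expected behavior can be illustrated with a small sketch that filters at the channel level and then prunes stations and networks left empty. The nested-dict inventory layout and the function name are assumptions for illustration, not the ObsPy Inventory structure.

```python
from fnmatch import fnmatchcase


def filter_inventory(inventory, channel="*", location="*"):
    """Filter {network: {station: [(location, channel), ...]}}.

    Stations with no matching channels, and networks with no
    remaining stations, are dropped from the result.
    """
    result = {}
    for net, stations in inventory.items():
        kept_stations = {}
        for sta, chans in stations.items():
            kept = [(loc, cha) for loc, cha in chans
                    if fnmatchcase(cha, channel)
                    and fnmatchcase(loc, location)]
            if kept:
                kept_stations[sta] = kept
        if kept_stations:
            result[net] = kept_stations
    return result
```

With this approach a request for channel `XXX` yields an empty result rather than bare network and station entries.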

Package and Module names should be lower case

A lot of the modules in the PH5 project contain upper case letters. These should be changed to all lower case.

The PEP8 Style Guideline states the following regarding module and package names:

Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.

When an extension module written in C or C++ has an accompanying Python module that provides a higher level (e.g. more object oriented) interface, the C/C++ module has a leading underscore (e.g. _socket).

TODO:

  • Change all module and package names to all lower case.

Receiver ID missing from StationXML

We currently provide a receiver-id filter for the PH5WS Station Web Service, but it is not in the resulting StationXML. We need to add receiver-id to the ObsPy Inventory in ph5tostationxml.py.

PH5 Repo / Documentation Needs Work

The working group made a good point that an arbitrary user should be able to install and use the various PH5 APIs and utility code. Currently there is not enough documentation for them to realistically do this.

Below are a few ideas for improving the repo. Please let me know what you think.

  • Reorganize the GitHub repository

    • Remove any code and utilities that are unnecessary for someone to interact with a PH5 dataset.
    • Separate core functionality into separate directories. (i.e. PH5 APIs in one directory, Utilities for updating (modifying) PH5 datasets in another directory, Tools for viewing PH5 in another directory, etc.)
  • Develop installation instructions for an arbitrary user.

  • Add a tutorial outlining how someone would interact with a PH5 experiment using the command line APIs, Web Services, and other utilities.

  • Add a high level overview document describing the scope of PH5 (probably already exists in some form and was requested by the working group)

Feel free to add to this list. I think several of these items could be done fairly easily and would make PH5 much more appealing/useful to outsiders.

Issue with ph5tostationxml.py handling empty location codes

@rsdeazevedo @derick-hess

Issue description
After we updated the Seisorz experiment to correctly store empty string location codes, instead of "--", I found a problem with the way the Web Services pass empty location codes to the ph5tostationxml.py API.

The problem was caused because there is not a reliable way, that I know of, to represent empty-string location codes in a comma separated string. This issue makes it impossible to request the Seisorz experiment that contains only empty location codes, as the PH5toStationXML class expects comma separated strings for location values.

Fixing the issue required some small changes to ph5tostationxml.py:

  • I changed the network_list, reportnum_list, sta_list, array_list, location_list, channel_list and ph5path parameters in ph5tostationxml.py to expect a Python List object instead of a comma separated string of values.
  • I moved related conversions from comma separated strings to Python List objects to the __main__ method of ph5tostationxml.py, and out of the PH5toStationXML class. This way the ability to use the program as a command line application should remain unchanged.
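
A rough sketch of that split, with the string-to-list conversion kept on the command line side and the class taking lists. `split_arg` and `StationRequest` are hypothetical stand-ins, not the actual PH5toStationXML code.

```python
from fnmatch import fnmatchcase


def split_arg(value):
    """Command line side: turn 'a,b,c' into ['a', 'b', 'c']."""
    if value is None:
        return []
    return [item.strip() for item in value.split(",")]


class StationRequest(object):
    """Hypothetical stand-in for a class that accepts list arguments."""

    def __init__(self, location_list=None):
        # No list at all means "match every location"
        if location_list is None:
            location_list = ["*"]
        self.location_list = location_list

    def matches_location(self, location):
        return any(pattern == location or fnmatchcase(location, pattern)
                   for pattern in self.location_list)
```

A caller such as the Web Services can pass `[""]` directly to request only empty location codes, which a comma separated string cannot express unambiguously.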

If it is okay with both of you, I am going to submit a pull request with my changes.

Thanks,
Nick

ph5tostationxml.py queries against both PH5 Station-ID and SEED Station Code

Currently the ph5tostationxml.py program compares values for the station argument against both the SEED Station Code and the PH5 Station-ID.

This leads to misleading results. For example:
http://service.iris.edu/ph5beta/station/1/query?network=4C,ZI&station=100?,DAN,IVY,LGC&format=text&level=channel
returns all stations for 4C, when you expect to see only DAN, IVY, and LGC.

I think the logic should be changed to only compare against the SEED Station Code if one is defined. @derick-hess what do you think? Do all experiments have SEED Station Codes?

Change ph5tomsAPI to not access (open) a PH5 file for every request

Our method for adding support for HTTP POST requests to the web service causes the program to produce a list of requests, where each request in the list contains only one Network, Station, Location, and Channel (SNCL), plus a Starttime (ST) and Endtime (ET). The program then calls the ph5tomsAPI for each request in the list, causing it to open the PH5 experiment many times.

We propose a solution where the ph5tomsAPI.PH5toMSeed class constructor can accept an optional PH5API object. This would allow the program to read the PH5 experiment once per network (experiment), and reuse the same PH5API object for each subsequent SNCL request for that network.

I'm not sure of all the details that would be involved in this implementation, but this is the basic concept. A similar implementation will also be needed in the ph5tostationxml.py program, and the ph5tosegyAPI once it is completed. Thoughts?
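
The basic concept could be sketched like this. `FakePH5API` is a stand-in that only counts how often an experiment is opened, and the `ph5api_object` keyword is a hypothetical constructor argument, not the existing PH5toMSeed signature.

```python
class FakePH5API(object):
    """Stand-in for the real PH5API object; counts open operations."""

    opened = 0

    def __init__(self, path):
        FakePH5API.opened += 1
        self.path = path


class PH5toMSeed(object):
    def __init__(self, path, ph5api_object=None):
        # Reuse an already open object when one is supplied
        if ph5api_object is None:
            ph5api_object = FakePH5API(path)
        self.ph5 = ph5api_object


def process_requests(path, requests):
    """Open the experiment once and share it across all SNCL requests."""
    shared = FakePH5API(path)
    return [PH5toMSeed(path, ph5api_object=shared) for _ in requests]
```

Keeping the argument optional means existing callers that pass only a path would continue to work unchanged.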

Stationxml/Stationtxt sample rate issue.

I noticed that the sample rate in the VM channels is wrong. This is an issue in ph5tostationxml. I will fix it as soon as I get back from lunch and do a pull request.

What is happening is that when it puts the sample rate into the inventory object, it isn't taking the sample rate multiplier into account. It should just be a single line that needs to be updated.
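
For illustration, assuming the effective rate is the stored sample rate divided by the multiplier (an assumption about the PH5 convention that should be checked against the format documentation), the correction might look like this. The function name is hypothetical.

```python
def effective_sample_rate(sample_rate, multiplier):
    """Combine a stored sample rate with its multiplier.

    Assumes rate = sample_rate / multiplier; verify against the
    PH5 format definition before relying on this.
    """
    if not multiplier:
        raise ValueError("sample rate multiplier must be non-zero")
    return float(sample_rate) / multiplier
```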

PH5toStationXML Response Management

We need to add a key for sample rate. Since there are different responses based on sample rate that can have the same sensor_key and data_logger key, another key based on sample_rate needs to be added.

I found this when the VM stations were added. There is currently no response for VM in the PH5, but I noticed that since they are still cmg-3t/rt130 instruments, it's using the last loaded response and loading that into the inventory object as the response for the VM channels.

I think this error would only ever come up if there is no response for a channel and it matched keys from a previous channel.
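
A sketch of a lookup keyed on sensor, data logger, and sample rate together, so a channel with no stored response gets nothing instead of the last loaded one. The class name and the sample rate values in the usage note are made up for illustration.

```python
class ResponseStore(object):
    """Hypothetical response lookup keyed on three values."""

    def __init__(self):
        self._responses = {}

    def add(self, sensor_key, datalogger_key, sample_rate, response):
        key = (sensor_key, datalogger_key, sample_rate)
        self._responses[key] = response

    def get(self, sensor_key, datalogger_key, sample_rate):
        # Returns None when no response exists for this exact combination
        return self._responses.get(
            (sensor_key, datalogger_key, sample_rate))
```

After `store.add("cmg-3t", "rt130", 100.0, resp)`, a lookup for the same instruments at a different sample rate returns `None` rather than `resp`.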

Syntax errors & other small problems

Below is a list of syntax errors and other problems that I have found in some of the PH5 scripts.

ph5api linreg() error

In the rare case of a 0 value, the function does not return enough values. A fix is in a PR.

ph5api returns a trace of 0 samples in certain cases on LH Channels.

While testing Wavefields we noticed the following behavior. No matter the day, if you request data that includes anything outside the first hour of the day on LH channels ph5api.cut returns a trace of length 0 samples.

The following request returns LH channels:

https://service.iris.edu/ph5beta/dataselect/1/query?network=YW&station=1001,2019,518,601&starttime=2016-06-24T00:00:00&endtime=2016-06-24T01:00:00&nodata=404

however

https://service.iris.edu/ph5beta/dataselect/1/query?network=YW&station=1001,2019,518,601&starttime=2016-06-24T00:00:00&endtime=2016-06-25T00:00:00&nodata=404

does not return LH channels even though the first request was a subset of this request.

Duplicate code should be moved to ph5utils.py

There is a lot of duplicate code between the utilities. This code should be pulled into the ph5utils.py core module.

Some of the heavily duplicated classes I found were: Index_t_Info, Resp, Das_Groups, Offset_Azimuth, and Rows_Keys.

ph5tostationxml.py not filtering at the appropriate level of granularity

ph5tostationxml.py is only filtering results to the level of granularity defined by the "level" query argument.

The ph5tostationxml.py program should always filter at the channel level, regardless of the "level" argument. The "level" argument should only define what level of detail the returned StationXML/Text contains, not the level at which results are filtered.

Example:
The request with channel="bad" at level=station returns data when it should not:
http://service.iris.edu/ph5beta/station/1/query?channel=bad&level=station&nodata=404

However for level=channel this request correctly returns no-data, because the channel query argument is now being enforced:
http://service.iris.edu/ph5beta/station/1/query?channel=bad&level=channel&nodata=404

Let me know if you have any questions.

Request by shot filename and actual event times mismatch

I’ve noticed that the dates/times in the names of SAC files returned for a “request by shot” differ considerably from the event start times returned by the PH5 event web service.

For example:
http://service.iris.edu/ph5beta/dataselect/1/query?reqtype=shot&shotline=001&shotid=5011&array=001&length=60&reportnum=15-016&station=1002&format=sac&nodata=404

Returns a file labeled ZI.1002..DPZ.2015.180.04.00.00.SAC

http://service.iris.edu/ph5beta/event/1/query?catalog=15-016&format=shottext

The shot starttime for shotid=5011 however is 2015-06-29T09:33:19.000000Z

2015.180.04.00.00 != 2015-06-29T09:33:19.0
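
To quantify the mismatch, the two timestamps above can be parsed with the standard library. Day 180 of 2015 is June 29, so both fall on the same calendar day and differ only in the time of day.

```python
from datetime import datetime

# Timestamp embedded in the SAC file name (year.julian-day.hour.min.sec)
file_time = datetime.strptime("2015.180.04.00.00", "%Y.%j.%H.%M.%S")

# Shot start time reported by the event web service
shot_time = datetime.strptime("2015-06-29T09:33:19.000000Z",
                              "%Y-%m-%dT%H:%M:%S.%fZ")

offset = shot_time - file_time
print(file_time.date() == shot_time.date())  # True
print(offset)  # 5:33:19
```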

We need to track-down and fix the cause of these time differences.
