
geoscienceaustralia / hiperseis


High Performance Seismologic Data and Metadata Processing System

License: GNU General Public License v3.0

Python 14.98% Shell 0.70% Makefile 0.02% Perl 0.58% Jupyter Notebook 82.40% Dockerfile 0.01% C++ 0.03% MATLAB 0.86% Fortran 0.43%
geophysics tomography inversion seismology

hiperseis's Issues

Explore and analyse the 2 options for integrating the receiver function code with station metadata, catalogs and temporary survey waveforms

We have 2 options for using the receiver function codebase at https://github.com/trichter/rf.git to run the analysis over the temporary stations data:

  1. Ingest waveform data, temporary station XMLs and event data into a Seiscomp3 instance, then use the receiver function codebase as-is by creating an FDSN client connected to the FDSN webservice running on the Seiscomp3 instance.

  2. Directly call the receiver function code on the waveform miniseed and skip the ingestion process, as suggested by @alexgorb.

Both these approaches have their merits and demerits.

The first approach is easier, less error-prone and faster to implement, because we would be using the receiver function codebase without any tweaking. But it will take the extra step of ingesting everything into Seiscomp3.
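
For option 1, something like the following should work largely out of the box (a sketch, untested; the FDSN base URL and network code are placeholders, and the rf calls follow its documented tutorial API as I understand it):

from obspy.clients.fdsn import Client
from rf import RFStream, iter_event_data

# FDSN webservice exposed by the Seiscomp3 instance (placeholder URL).
client = Client('http://our-sc3-host:8081')
events = client.get_events(minmagnitude=5.5)
inventory = client.get_stations(network='7X', level='channel')  # placeholder network code

# iter_event_data cuts event-centred windows via the FDSN client and yields
# RFStreams with the rf-specific headers already populated.
stream = RFStream()
for windowed in iter_event_data(events, inventory, client.get_waveforms):
    stream.extend(windowed)
stream.rf()  # compute receiver functions with rf's defaults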

The second approach will require tweaking the receiver function codebase to a great extent, primarily around how it extracts the waveform window centred on the event and how it uses several routines related to rfstats, which in turn use others. Those routines are scantily documented: only one line describes what each routine does, and the code inside them is largely undocumented.
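
By contrast, option 2 would look something like this (a rough sketch only; the paths are hypothetical and the rfstats calling convention is taken from its docstring, which is exactly the scantily documented code we would have to tweak):

from obspy import read, read_events, read_inventory
from rf import RFStream, rfstats

st = read('/data/survey/XX.STA01..BH?.mseed')               # hypothetical paths
event = read_events('/data/survey/event.xml')[0]
station = read_inventory('/data/survey/station.xml')[0][0]  # first station

stream = RFStream(st)
for tr in stream:
    # rfstats computes ray-specific headers (onset, slowness, back azimuth)
    # for the event/station pair; it returns None outside the phase's
    # distance range.
    stats = rfstats(station=station, event=event, phase='P')
    if stats is not None:
        tr.stats.update(stats)
stream.rf()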

Will have another discussion with @rh-downunder and @alexgorb before deciding which approach to pursue.

Distributed cluster gather/sort/filter/match operations

As of 41c83f7, we can process a small number of events, say up to 5k. This works as a single process, with in-memory sort/filter/join operations on pandas DataFrames.

However, we need to process upwards of 500k events; from ISC and Engdahl alone we have 300k+.
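
For reference, the current in-memory flow is roughly the following (file and column names are made up for illustration):

import pandas as pd

# Single-process, in-memory sort/filter/join as currently done for ~5k events.
events = pd.read_csv('events.csv')
arrivals = pd.read_csv('arrivals.csv')

df = arrivals.merge(events, on='event_id')          # match/join
df = df[df['residual'].abs() < 5.0]                 # filter
df = df.sort_values(['event_id', 'arrival_time'])   # sort

At 500k+ events these joins stop fitting comfortably in one process, hence the need for distributed gather/sort/filter/match operations.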

Implement associated amplitude for picks

Event objects produced during custom picking should contain associated amplitudes for the picks.
Required for location algorithms to run.

The amplitudes need to be associated with the corresponding picks, as in the obspy/seiscomp3 datamodel.
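
In the obspy datamodel the association goes through Amplitude.pick_id; a minimal sketch (the amplitude value and stream codes are made up):

from obspy import UTCDateTime
from obspy.core.event import Amplitude, Event, Pick, WaveformStreamID

pick = Pick(time=UTCDateTime('2015-03-03T02:44:20'),
            phase_hint='P',
            waveform_id=WaveformStreamID(network_code='GE',
                                         station_code='FAKI',
                                         channel_code='BHZ'))
amp = Amplitude(generic_amplitude=1.2e-6,   # made-up value
                unit='m/s',
                pick_id=pick.resource_id,   # the pick association
                waveform_id=pick.waveform_id)
event = Event(picks=[pick], amplitudes=[amp])
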
Jira - pst-192

travel time ellipticity correction

We need ellipticity correction for travel times before we use the travel times in inversion products.

We need: ftp://rses.anu.edu.au/pub/ak135/ellip
and that requires this: ftp://rses.anu.edu.au/pub/ak135/tau
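
For context, these tables feed the standard three-coefficient correction of Dziewonski & Gilbert (1976) as tabulated by Kennett & Gudmundsson (1996). A sketch of the final combination step, assuming tau0..tau2 have already been interpolated from the ellip tables for the phase, distance and depth in question (formula as I recall it; worth checking against the tau/ellip source):

import numpy as np

def ellipticity_correction(tau0, tau1, tau2, source_colatitude, azimuth):
    """Travel-time correction in seconds; angles in radians
    (source colatitude, source-to-receiver azimuth)."""
    s32 = np.sqrt(3.0) / 2.0
    return (0.25 * (1.0 + 3.0 * np.cos(2.0 * source_colatitude)) * tau0
            + s32 * np.sin(2.0 * source_colatitude) * np.cos(azimuth) * tau1
            + s32 * np.sin(source_colatitude) ** 2 * np.cos(2.0 * azimuth) * tau2)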

Analyse CWB Query data issues

The following cwb query data issues have been analysed so far:

  1. For some network/station/channel combinations and for a particular time window, the size of the data being returned just blows up. For example, for the below query string:

java -jar -Xmx1600m ~/CWBQuery/CWBQuery.jar -h 54.153.144.205 -t dcc -s "IUPET..BH200" -b "2015/03/01 00:00:00" -d 31d

The output generated is:

centos@proc:~/miniseed$ java -jar -Xmx1600m ~/CWBQuery/CWBQuery.jar -h 54.153.144.205 -t dcc -s "IUPET..BH200" -b "2015/03/01 00:00:00" -d 31d
04:13:25 Query on IUPET BH200 184868 mini-seed blks 2015 060:00:00:00.0499 2015 090:00:00:00.000 ns=51312360 #dups=8
04:15:25.869 Thread-2 ReadTimeout went off. close the socket 120602 waitlen=120000 sock.isCLosed()=false loop=162

And the size of the miniseed file being generated just blows up, reaching 100 GB plus. If I abort this query, the returned file's size drops to zero after some time, so I cannot analyse it either. Is this a known issue for some network/station/channel combinations?

  2. For some net/sta/cha combinations, the number of blocks for a given time period is higher than for others. It's possible that the sampling frequency for those combinations is higher. Can we query the sampling frequency of such combinations, so we can better manage the query by dividing it into appropriate time periods? Otherwise, the response for such combinations takes too long (sometimes half an hour) and is returned in batches, which makes it difficult to manage. If you know any pertinent information, please share.

  3. On querying CWB for a given time period, the response miniseed files have data that overflows the requested time window. For example, when I query as below:

centos@proc:~/miniseed$ java -jar -Xmx1600m ~/CWBQuery/CWBQuery.jar -h 54.153.144.205 -t dcc -s "GEFAKI.BHZ" -b "2015/03/15 10:00:00" -d 7200
04:34:01 Query on GEFAKI BHZ 000299 mini-seed blks 2015 074:09:59:42.8695 2015 074:12:00:15.370 ns=144651 #dups=0
299 Total blocks transferred in 463 ms 645 b/s 0 #dups=0
centos@proc:~/miniseed$ ls -la
total 53424
drwxrwxr-x 2 centos centos 126 Sep 19 04:34 .
drwx------. 7 centos centos 270 Sep 19 04:14 ..
-rw-rw-r-- 1 centos centos 139264 Sep 19 04:34 GEFAKI_BHZ__.msd

And when I look at the starttime and endtime for this miniseed:

In [1]: from obspy.core import read
In [2]: st = read('GEFAKI_BHZ__.msd')
In [3]: tr = st[0]
In [4]: tr.stats
Out[4]:
network: GE
station: FAKI
location:
channel: BHZ
starttime: 2015-03-15T09:59:42.869538Z
endtime: 2015-03-15T12:00:15.369538Z
sampling_rate: 20.0
delta: 0.05
npts: 144651
calib: 1.0
_format: MSEED
mseed: AttribDict({'record_length': 4096, 'encoding': u'STEIM2', 'filesize': 139264, u'dataquality': u'D', 'number_of_records': 34, 'byteorder': u'>'})
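
Until the server-side behaviour is understood, the overflow can at least be trimmed client-side with obspy:

from obspy import UTCDateTime, read

st = read('GEFAKI_BHZ__.msd')
# Trim back to the requested 2-hour window (start time from the query above).
t0 = UTCDateTime('2015-03-15T10:00:00')
st.trim(starttime=t0, endtime=t0 + 7200)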

`zone`ing functionality after `cluster`ing

@alexgorb

We need to separate the clustered data file into 3 files:

  • one with regional data,
  • a second with data that intersects the region, and
  • a third with global data

The region can be specified like the following:

upperlat = 90.
bottomlat= 50.
leftlon  = 240.
rightlon = 280.
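
A sketch of the split, assuming the clustered file carries source (lon, lat) and station (slon, slat) columns, and reading "intersect" as exactly one endpoint of the ray falling inside the region (both assumptions to be confirmed):

import pandas as pd

upperlat, bottomlat = 90.0, 50.0
leftlon, rightlon = 240.0, 280.0

df = pd.read_csv('clustered.csv')  # hypothetical clustered data file
src_in = df['lat'].between(bottomlat, upperlat) & df['lon'].between(leftlon, rightlon)
sta_in = df['slat'].between(bottomlat, upperlat) & df['slon'].between(leftlon, rightlon)

df[src_in & sta_in].to_csv('regional.csv', index=False)      # both endpoints inside
df[src_in ^ sta_in].to_csv('intersecting.csv', index=False)  # exactly one inside
df[~(src_in | sta_in)].to_csv('global.csv', index=False)     # both outside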

Temporary stations remaining tasks for Q1 deliverable

  • Temporary stations metadata generation - make sure it can be ingested into sc3.
  • Ingestion of temporary stations time series data into sc3 after QA/QC - we need to write the complete functionality, from the available time series directory structure through to sc3 ingestion. Check with scart that this data can be queried via sc3. This is dependent on the item above.

picker for P & S waves

  • Picking algorithms and parameters must be user-configurable.
  • Automated by event id: the event id is to be pulled from seiscomp3, miniseed dumped, P and S picked, and the result saved to the seiscomp3 database as a new event to be used by iLoc.
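
As a starting point, obspy's AR-AIC picker gives P and S picks in one call, and every numeric argument below could be surfaced in the user configuration (values here are the obspy documentation example, not tuned):

from obspy import read
from obspy.signal.trigger import ar_pick

st = read('/data/event_12345.mseed')  # hypothetical miniseed dumped for one event id
z = st.select(channel='BHZ')[0]
n = st.select(channel='BHN')[0]
e = st.select(channel='BHE')[0]

p_pick, s_pick = ar_pick(z.data, n.data, e.data, z.stats.sampling_rate,
                         1.0, 20.0,  # bandpass corners f1, f2 (Hz)
                         1.0, 0.1,   # P LTA, STA lengths (s)
                         4.0, 1.0,   # S LTA, STA lengths (s)
                         2, 8,       # AR model orders for P and S
                         0.1, 0.2)   # variance window lengths (s)
print('P:', z.stats.starttime + p_pick, 'S:', z.stats.starttime + s_pick)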

Explore iLoc functionalities without requiring seiscomp3 database access

My research indicates that all we need is IMS1.0/ISF1.0 or ISF2.0 input files and a station list file.

The isf_stationfile is a simple comma and space separated text file with stationcode, alternate_stationcode, lat, lon, elevation.
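
An illustration of that format, with made-up station values:

STA01, STA01, -23.6650, 133.9510, 605.0
STA02, ALT02, -19.9400, 134.3400, 389.0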

This is desirable for many reasons including:

  • users not requiring write permission to seiscomp3 database
  • not creating/duplicating events in the database
  • more flexibility in our overall architecture and a more isolated working environment, and,
  • not requiring two seiscomp3 VMs

Explore the receiver function concept

investigate the corrupted data being returned from CWB server for some stations

  1. We need to retrieve overlap-corrected waveform data from CWB; using the "-t dcc" switch achieves that. But for some of the stations, this is returning corrupted data.
  2. The data was viewed and analysed with scrttv and found to be erroneous and noise-laden.
  3. On investigating further, it was found that for such stations only particular time windows have this issue, not the entire 1-month window.

Repeated picks in Engdahl event

Testing found the following.

In Engdahl event 8881.xml we have these two picks. They are identical except for the pick id.

Is that parsed correctly, i.e. does Engdahl actually report duplicate picks?

    <pick publicID="smi:engdahl.ga.gov.au/pick/2418177">
      <time>
        <value>2008-04-11T10:04:27.000900Z</value>
      </time>
      <waveformID networkCode="" stationCode="CHTO" channelCode="BHZ"/>
      <backazimuth>
        <value>85.91</value>
      </backazimuth>
      <onset>emergent</onset>
      <phaseHint>P</phaseHint>
      <evaluationMode>automatic</evaluationMode>
      <creationInfo>
        <agencyID>ga-engdahl</agencyID>
        <agencyURI>smi:engdahl.ga.gov.au/ga-engdahl</agencyURI>
        <author>niket_engdahl_parser</author>
        <creationTime>2017-11-10T07:53:57.118115Z</creationTime>
      </creationInfo>
    </pick>
    <pick publicID="smi:engdahl.ga.gov.au/pick/2418178">
      <time>
        <value>2008-04-11T10:04:27.000900Z</value>
      </time>
      <waveformID networkCode="" stationCode="CHTO" channelCode="BHZ"/>
      <backazimuth>
        <value>85.91</value>
      </backazimuth>
      <onset>emergent</onset>
      <phaseHint>P</phaseHint>
      <evaluationMode>automatic</evaluationMode>
      <creationInfo>
        <agencyID>ga-engdahl</agencyID>
        <agencyURI>smi:engdahl.ga.gov.au/ga-engdahl</agencyURI>
        <author>niket_engdahl_parser</author>
        <creationTime>2017-11-10T07:53:57.118115Z</creationTime>
      </creationInfo>
    </pick>
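
A quick check like the following could flag such duplicates across all the Engdahl event files (a sketch; the grouping key is my choice):

from obspy import read_events

ev = read_events('8881.xml')[0]
seen = {}
for p in ev.picks:
    # Group picks by everything except their id.
    key = (str(p.time), p.waveform_id.get_seed_string(), p.phase_hint)
    seen.setdefault(key, []).append(p.resource_id.id)
duplicates = {k: ids for k, ids in seen.items() if len(ids) > 1}
print(duplicates)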

Generate 3D travel time inversion input

@alexgorb Did I get these right? Can you fill in the remaining ones?

nblock: block id of the source
nst: block id of the station
resid: time residual of arrival
nev: ?
vlon: source longitude
vlat: source latitude
dep: source depth
slon: station longitude
slat: station latitude 
obstt: ?

The output of this exercise will be used in the 3D inversion model.
The output format needs to be the following, with the column names as above, in that order:

   17503  266751   0.3  592032   55.673  86.934  21.6   87.695  43.628  484.800   43.800   1
   17503  272459   0.8  592032   55.673  86.934  21.6   74.620  42.637  490.800   44.500   1
   17503  288465  -0.1  592032   55.673  86.934  21.6  116.175  39.850  522.400   48.700   1
   17503  291184  -1.4  592032   55.673  86.934  21.6   75.980  39.266  516.100   47.900   1
   17503  292720  -0.1  592032   55.673  86.934  21.6   99.814  39.221  522.200   48.600   1
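
For the writer, one way to emit a row in this layout (field widths eyeballed from the sample above; the two unnamed trailing columns are passed through as-is):

# Values taken from the first sample row above.
row = (17503, 266751, 0.3, 592032, 55.673, 86.934, 21.6, 87.695, 43.628, 484.800, 43.800, 1)
fmt = '{:8d}{:8d}{:6.1f}{:8d}{:9.3f}{:8.3f}{:6.1f}{:9.3f}{:8.3f}{:9.3f}{:9.3f}{:4d}'
print(fmt.format(*row))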

CWB data ingestion into seiscomp3

This can proceed as follows:

  1. Use CWB query to download gap-filled and overlap-corrected miniseed from the CWB server.
  2. Push the miniseed files into slarchive (can use scart).
  3. Push events (from antelope for now) into sc3. Make sure the events overlap the time series data. This will help here: https://github.com/GeoscienceAustralia/passive-seismic/tree/master/antelope.
  4. Test that event-based miniseed queries work via sc3 utilities (the scevtstreams and scart combination).
  5. Automate the whole process.

POC target: 1 month's worth of historical data for all primary stations in AWS.

Example of cwb query: query -h localhost -t ms -s ".*" -b "2005-02-08 05:37:52.98" -d 1000

Clarification on 3:

  • You cannot create the antelope virtualenv inside our sc3 image in AWS, since antelope uses proprietary python libraries. Instead, you will have to export the events in seiscomp3 XML format and copy them across to the sc3 image. This part will require some ingenuity to automate.
  • You will also need to make sure the CWB instance contains waveform data that overlaps the events you are exporting from antelope and importing into sc3 in the previous step. @zhang01GA will help with additional waveform data if needed.

Some antelope migrated events in seiscomp3 have no arrivals for `preferred_origins`

My tests found that for some (or all?) antelope events the preferred origins are missing arrivals. We need to investigate.

pytest test_cluster.py

These are the preferred origins for the two events showing no arrivals.

/home/sudipta/repos/passive-seismic/tests/mocks/events/00974105.xml
Origin
               resource_id: ResourceIdentifier(id="quakeml:ga.ga.gov.au/origin/1126705")
                      time: UTCDateTime(2015, 3, 3, 2, 44, 20, 169000)
                 longitude: 176.1633
                  latitude: -14.8737
                     depth: 10000.0
                   quality: OriginQuality(associated_phase_count=17, used_phase_count=13)
           evaluation_mode: 'automatic'
         evaluation_status: 'preliminary'
             creation_info: CreationInfo(author='regpMwp ms', version='1425351689.25')
                      ---------
                  comments: 1 Elements
/home/sudipta/repos/passive-seismic/tests/mocks/events/00967739.xml
Origin
               resource_id: ResourceIdentifier(id="quakeml:ga.ga.gov.au/origin/1133422")
                      time: UTCDateTime(2015, 3, 21, 0, 18, 54, 304000)
                 longitude: 118.4494
                  latitude: -9.9875
                     depth: 0.0
                   quality: OriginQuality(associated_phase_count=29, used_phase_count=18)
           evaluation_mode: 'automatic'
         evaluation_status: 'preliminary'
             creation_info: CreationInfo(author='rega', version='1426898089.18')
                      ---------
                  comments: 1 Elements
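
The check itself boils down to something like this (a sketch over the two mock files above):

from obspy import read_events

for path in ['tests/mocks/events/00974105.xml', 'tests/mocks/events/00967739.xml']:
    ev = read_events(path)[0]
    origin = ev.preferred_origin() or ev.origins[0]
    if not origin.arrivals:
        print(path, 'preferred origin has no arrivals:', origin.resource_id)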

Station XMLs for missing ENGDAHL/ISC stations

We have some additional stations in the ENGDAHL/ISC events that are not available in our seiscomp3.
For now, if possible, any workflow that requires station information should use the files that were emailed by ISC recently. Alternatively, for any statistical measure (like inputs to inversion), we can ignore those specific stations.
