
geoscienceaustralia / hiperseis


High Performance Seismologic Data and Metadata Processing System

License: GNU General Public License v3.0

Python 14.98% Shell 0.70% Makefile 0.02% Perl 0.58% Jupyter Notebook 82.40% Dockerfile 0.01% C++ 0.03% MATLAB 0.86% Fortran 0.43%
geophysics tomography inversion seismology

hiperseis's Issues

Explore and analyse the 2 options for integrating the receiver function code with station metadata, catalogs and temporary survey waveforms

We have 2 options for using the receiver function codebase at https://github.com/trichter/rf.git to run the analysis over the temporary stations data:

  1. Ingest waveform data, temporary station XMLs and event data into a Seiscomp3 instance, then use the receiver function codebase as-is by creating an FDSN client connected to the FDSN webservice running on the Seiscomp3 instance.

  2. Directly call the receiver function code on the waveform miniseed and skip the ingestion process, as suggested by @alexgorb.

Both these approaches have their merits and demerits.

The first approach is easier, less error-prone and faster to implement, because we would be using the receiver function codebase without any tweaking. But it will take the extra step of ingesting everything into Seiscomp3.
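
For option 1, something like the following should work largely out of the box (a sketch, untested; the FDSN base URL and network code are placeholders, and the rf calls follow its documented tutorial API as I understand it):

from obspy.clients.fdsn import Client
from rf import RFStream, iter_event_data

# FDSN webservice exposed by the Seiscomp3 instance (placeholder URL).
client = Client('http://our-sc3-host:8081')
events = client.get_events(minmagnitude=5.5)
inventory = client.get_stations(network='7X', level='channel')  # placeholder network code

# iter_event_data cuts event-centred windows via the FDSN client and yields
# RFStreams with the rf-specific headers already populated.
stream = RFStream()
for windowed in iter_event_data(events, inventory, client.get_waveforms):
    stream.extend(windowed)
stream.rf()  # compute receiver functions with rf's defaults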

The second approach will require tweaking the receiver function codebase to a great extent, primarily around how it extracts the waveform window centred on the event and how it uses several routines related to rfstats, which in turn use others. Those routines are scantily documented: only one line describes what each routine does, and the code inside them is largely undocumented.
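
By contrast, option 2 would look something like this (a rough sketch only; the paths are hypothetical and the rfstats calling convention is taken from its docstring, which is exactly the scantily documented code we would have to tweak):

from obspy import read, read_events, read_inventory
from rf import RFStream, rfstats

st = read('/data/survey/XX.STA01..BH?.mseed')               # hypothetical paths
event = read_events('/data/survey/event.xml')[0]
station = read_inventory('/data/survey/station.xml')[0][0]  # first station

stream = RFStream(st)
for tr in stream:
    # rfstats computes ray-specific headers (onset, slowness, back azimuth)
    # for the event/station pair; it returns None outside the phase's
    # distance range.
    stats = rfstats(station=station, event=event, phase='P')
    if stats is not None:
        tr.stats.update(stats)
stream.rf()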

Will have another discussion with @rh-downunder and @alexgorb before deciding which approach to pursue.

Distributed cluster gather/sort/filter/match operations

As of 41c83f7, we can process a small number of events, say up to 5k. This works as a single process, with in-memory sort/filter/join operations on pandas DataFrames.

However, we need to process upwards of 500k events; from ISC and Engdahl alone we have 300k+.
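
For reference, the current in-memory flow is roughly the following (file and column names are made up for illustration):

import pandas as pd

# Single-process, in-memory sort/filter/join as currently done for ~5k events.
events = pd.read_csv('events.csv')
arrivals = pd.read_csv('arrivals.csv')

df = arrivals.merge(events, on='event_id')          # match/join
df = df[df['residual'].abs() < 5.0]                 # filter
df = df.sort_values(['event_id', 'arrival_time'])   # sort

At 500k+ events these joins stop fitting comfortably in one process, hence the need for distributed gather/sort/filter/match operations.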

Implement associated amplitude for picks

Event objects produced during custom picking should contain associated amplitudes for the picks.
Required for location algorithms to run.

The amplitudes need to be associated with the corresponding picks, as in the obspy/seiscomp3 datamodel.
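
In the obspy datamodel the association goes through Amplitude.pick_id; a minimal sketch (the amplitude value and stream codes are made up):

from obspy import UTCDateTime
from obspy.core.event import Amplitude, Event, Pick, WaveformStreamID

pick = Pick(time=UTCDateTime('2015-03-03T02:44:20'),
            phase_hint='P',
            waveform_id=WaveformStreamID(network_code='GE',
                                         station_code='FAKI',
                                         channel_code='BHZ'))
amp = Amplitude(generic_amplitude=1.2e-6,   # made-up value
                unit='m/s',
                pick_id=pick.resource_id,   # the pick association
                waveform_id=pick.waveform_id)
event = Event(picks=[pick], amplitudes=[amp])
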
Jira - pst-192

travel time ellipticity correction

We need ellipticity correction for travel times before we use the travel times in inversion products.

We need: ftp://rses.anu.edu.au/pub/ak135/ellip
and that requires this: ftp://rses.anu.edu.au/pub/ak135/tau
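
For context, these tables feed the standard three-coefficient correction of Dziewonski & Gilbert (1976) as tabulated by Kennett & Gudmundsson (1996). A sketch of the final combination step, assuming tau0..tau2 have already been interpolated from the ellip tables for the phase, distance and depth in question (formula as I recall it; worth checking against the tau/ellip source):

import numpy as np

def ellipticity_correction(tau0, tau1, tau2, source_colatitude, azimuth):
    """Travel-time correction in seconds; angles in radians
    (source colatitude, source-to-receiver azimuth)."""
    s32 = np.sqrt(3.0) / 2.0
    return (0.25 * (1.0 + 3.0 * np.cos(2.0 * source_colatitude)) * tau0
            + s32 * np.sin(2.0 * source_colatitude) * np.cos(azimuth) * tau1
            + s32 * np.sin(source_colatitude) ** 2 * np.cos(2.0 * azimuth) * tau2)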

Analyse CWB Query data issues

The following cwb query data issues have been analysed so far:

  1. For some network/station/channel combinations and for a particular time window, the size of the data being returned just blows up. For example, for the below query string:

java -jar -Xmx1600m ~/CWBQuery/CWBQuery.jar -h 54.153.144.205 -t dcc -s "IUPET..BH200" -b "2015/03/01 00:00:00" -d 31d

The output generated is:

centos@proc:~/miniseed$ java -jar -Xmx1600m ~/CWBQuery/CWBQuery.jar -h 54.153.144.205 -t dcc -s "IUPET..BH200" -b "2015/03/01 00:00:00" -d 31d
04:13:25 Query on IUPET BH200 184868 mini-seed blks 2015 060:00:00:00.0499 2015 090:00:00:00.000 ns=51312360 #dups=8
04:15:25.869 Thread-2 ReadTimeout went off. close the socket 120602 waitlen=120000 sock.isCLosed()=false loop=162

And the size of the miniseed file being generated just blows up, reaching 100 GB plus. If I abort this query, the returned file's size drops to zero after some time, so I cannot analyse it either. Is this a known issue for some network/station/channel combinations?

  2. For some net/sta/cha combinations, the number of blocks for a given time period is higher than for others. It's possible that the sampling frequency for those combinations is higher. Can we query the sampling frequency of such combinations, so we can better manage the query by dividing it into appropriate time periods? Otherwise, the response for such combinations takes too long (sometimes half an hour) and is returned in batches, which makes it difficult to manage. If you know any pertinent information, please share.

  3. On querying CWB for a given time period, the response miniseed files have data that overflows the requested time window. For example, when I query as below:

centos@proc:~/miniseed$ java -jar -Xmx1600m ~/CWBQuery/CWBQuery.jar -h 54.153.144.205 -t dcc -s "GEFAKI.BHZ" -b "2015/03/15 10:00:00" -d 7200
04:34:01 Query on GEFAKI BHZ 000299 mini-seed blks 2015 074:09:59:42.8695 2015 074:12:00:15.370 ns=144651 #dups=0
299 Total blocks transferred in 463 ms 645 b/s 0 #dups=0
centos@proc:~/miniseed$ ls -la
total 53424
drwxrwxr-x 2 centos centos 126 Sep 19 04:34 .
drwx------. 7 centos centos 270 Sep 19 04:14 ..
-rw-rw-r-- 1 centos centos 139264 Sep 19 04:34 GEFAKI_BHZ__.msd

And when I look at the starttime and endtime for this miniseed:

In [1]: from obspy.core import read
In [2]: st = read('GEFAKI_BHZ__.msd')
In [3]: tr = st[0]
In [4]: tr.stats
Out[4]:
network: GE
station: FAKI
location:
channel: BHZ
starttime: 2015-03-15T09:59:42.869538Z
endtime: 2015-03-15T12:00:15.369538Z
sampling_rate: 20.0
delta: 0.05
npts: 144651
calib: 1.0
_format: MSEED
mseed: AttribDict({'record_length': 4096, 'encoding': u'STEIM2', 'filesize': 139264, u'dataquality': u'D', 'number_of_records': 34, 'byteorder': u'>'})
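
Until the server-side behaviour is understood, the overflow can at least be trimmed client-side with obspy:

from obspy import UTCDateTime, read

st = read('GEFAKI_BHZ__.msd')
# Trim back to the requested 2-hour window (start time from the query above).
t0 = UTCDateTime('2015-03-15T10:00:00')
st.trim(starttime=t0, endtime=t0 + 7200)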

`zone`ing functionality after `cluster`ing

@alexgorb

We need to separate the clustered data file into 3 files:

  • one with regional data,
  • a second with data that intersects the region, and
  • a third with global data

The region can be specified like the following:

upperlat = 90.
bottomlat= 50.
leftlon  = 240.
rightlon = 280.
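
A sketch of the split, assuming the clustered file carries source (lon, lat) and station (slon, slat) columns, and reading "intersect" as exactly one endpoint of the ray falling inside the region (both assumptions to be confirmed):

import pandas as pd

upperlat, bottomlat = 90.0, 50.0
leftlon, rightlon = 240.0, 280.0

df = pd.read_csv('clustered.csv')  # hypothetical clustered data file
src_in = df['lat'].between(bottomlat, upperlat) & df['lon'].between(leftlon, rightlon)
sta_in = df['slat'].between(bottomlat, upperlat) & df['slon'].between(leftlon, rightlon)

df[src_in & sta_in].to_csv('regional.csv', index=False)      # both endpoints inside
df[src_in ^ sta_in].to_csv('intersecting.csv', index=False)  # exactly one inside
df[~(src_in | sta_in)].to_csv('global.csv', index=False)     # both outside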

Temporary stations remaining tasks for Q1 deliverable

  • Temporary stations metadata generation - make sure it can be ingested into sc3.
  • Ingestion of temporary stations time series data into sc3 after QA/QC - we need to write the complete functionality, from the available time series directory structure through to sc3 ingestion. Check with scart that this data can be queried via sc3. This is dependent on the item above.

picker for P & S waves

  • Picking algorithms and parameters must be user-configurable.
  • Automated by event id: the event id is to be pulled from seiscomp3, miniseed dumped, P and S picked, and the result saved to the seiscomp3 database as a new event to be used by iLoc.
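
As a starting point, obspy's AR-AIC picker gives P and S picks in one call, and every numeric argument below could be surfaced in the user configuration (values here are the obspy documentation example, not tuned):

from obspy import read
from obspy.signal.trigger import ar_pick

st = read('/data/event_12345.mseed')  # hypothetical miniseed dumped for one event id
z = st.select(channel='BHZ')[0]
n = st.select(channel='BHN')[0]
e = st.select(channel='BHE')[0]

p_pick, s_pick = ar_pick(z.data, n.data, e.data, z.stats.sampling_rate,
                         1.0, 20.0,  # bandpass corners f1, f2 (Hz)
                         1.0, 0.1,   # P LTA, STA lengths (s)
                         4.0, 1.0,   # S LTA, STA lengths (s)
                         2, 8,       # AR model orders for P and S
                         0.1, 0.2)   # variance window lengths (s)
print('P:', z.stats.starttime + p_pick, 'S:', z.stats.starttime + s_pick)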

Explore iLoc functionalities without requiring seiscomp3 database access

My research indicates that all we need is IMS1.0/ISF1.0 or ISF2.0 input files and a station list file.

The isf_stationfile is a simple comma and space separated text file with stationcode, alternate_stationcode, lat, lon, elevation.
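
An illustration of that format, with made-up station values:

STA01, STA01, -23.6650, 133.9510, 605.0
STA02, ALT02, -19.9400, 134.3400, 389.0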

This is desirable for many reasons including:

  • users not requiring write permission to seiscomp3 database
  • not creating/duplicating events in the database
  • more flexibility in our overall architecture and a more isolated working environment, and,
  • not requiring two seiscomp3 VMs

Explore the receiver function concept

investigate the corrupted data being returned from CWB server for some stations

  1. We need to retrieve overlap-corrected waveform data from CWB; using the "-t dcc" switch achieves that. But for some of the stations, this is returning corrupted data.
  2. The data was viewed and analysed with scrttv and found to be erroneous and noise-laden.
  3. On investigating further, it was found that for such stations only particular time windows have this issue, not the entire 1-month window.

Repeated picks in Engdahl event

Testing found the following.

In Engdahl event 8881.xml we have these two picks. They are identical except for the pick id.

Is that parsed correctly, i.e. does Engdahl actually report duplicate picks?

    <pick publicID="smi:engdahl.ga.gov.au/pick/2418177">
      <time>
        <value>2008-04-11T10:04:27.000900Z</value>
      </time>
      <waveformID networkCode="" stationCode="CHTO" channelCode="BHZ"/>
      <backazimuth>
        <value>85.91</value>
      </backazimuth>
      <onset>emergent</onset>
      <phaseHint>P</phaseHint>
      <evaluationMode>automatic</evaluationMode>
      <creationInfo>
        <agencyID>ga-engdahl</agencyID>
        <agencyURI>smi:engdahl.ga.gov.au/ga-engdahl</agencyURI>
        <author>niket_engdahl_parser</author>
        <creationTime>2017-11-10T07:53:57.118115Z</creationTime>
      </creationInfo>
    </pick>
    <pick publicID="smi:engdahl.ga.gov.au/pick/2418178">
      <time>
        <value>2008-04-11T10:04:27.000900Z</value>
      </time>
      <waveformID networkCode="" stationCode="CHTO" channelCode="BHZ"/>
      <backazimuth>
        <value>85.91</value>
      </backazimuth>
      <onset>emergent</onset>
      <phaseHint>P</phaseHint>
      <evaluationMode>automatic</evaluationMode>
      <creationInfo>
        <agencyID>ga-engdahl</agencyID>
        <agencyURI>smi:engdahl.ga.gov.au/ga-engdahl</agencyURI>
        <author>niket_engdahl_parser</author>
        <creationTime>2017-11-10T07:53:57.118115Z</creationTime>
      </creationInfo>
    </pick>
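
A quick check like the following could flag such duplicates across all the Engdahl event files (a sketch; the grouping key is my choice):

from obspy import read_events

ev = read_events('8881.xml')[0]
seen = {}
for p in ev.picks:
    # Group picks by everything except their id.
    key = (str(p.time), p.waveform_id.get_seed_string(), p.phase_hint)
    seen.setdefault(key, []).append(p.resource_id.id)
duplicates = {k: ids for k, ids in seen.items() if len(ids) > 1}
print(duplicates)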

Generate 3D travel time inversion input

@alexgorb Did I get these right? Can you fill in the remaining ones?

nblock: block id of the source
nst: block id of the station
resid: time residual of arrival
nev: ?
vlon: source longitude
vlat: source latitude
dep: source depth
slon: station longitude
slat: station latitude 
obstt: ?

The output of this exercise will be used in the 3D inversion model.
The output format needs to be the following, with the column names as above, in that order:

   17503  266751   0.3  592032   55.673  86.934  21.6   87.695  43.628  484.800   43.800   1
   17503  272459   0.8  592032   55.673  86.934  21.6   74.620  42.637  490.800   44.500   1
   17503  288465  -0.1  592032   55.673  86.934  21.6  116.175  39.850  522.400   48.700   1
   17503  291184  -1.4  592032   55.673  86.934  21.6   75.980  39.266  516.100   47.900   1
   17503  292720  -0.1  592032   55.673  86.934  21.6   99.814  39.221  522.200   48.600   1
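
For the writer, one way to emit a row in this layout (field widths eyeballed from the sample above; the two unnamed trailing columns are passed through as-is):

# Values taken from the first sample row above.
row = (17503, 266751, 0.3, 592032, 55.673, 86.934, 21.6, 87.695, 43.628, 484.800, 43.800, 1)
fmt = '{:8d}{:8d}{:6.1f}{:8d}{:9.3f}{:8.3f}{:6.1f}{:9.3f}{:8.3f}{:9.3f}{:9.3f}{:4d}'
print(fmt.format(*row))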

CWB data ingestion into seiscomp3

This can proceed as follows:

  1. Use CWB query to download gap-filled and overlap-corrected miniseed from the CWB server.
  2. Push the miniseed files into slarchive (can use scart).
  3. Push events (from antelope for now) into sc3. Make sure the events overlap the time series data. This will help here: https://github.com/GeoscienceAustralia/passive-seismic/tree/master/antelope.
  4. Test that event-based miniseed queries work via sc3 utilities (the scevtstreams and scart combination).
  5. Automate the whole process.

POC target: 1 month's worth of historical data for all primary stations in AWS.

Example of cwb query: query -h localhost -t ms -s ".*" -b "2005-02-08 05:37:52.98" -d 1000

Clarification on 3:

  • You cannot create the antelope virtualenv inside our sc3 image in AWS, since antelope uses proprietary python libraries. Instead, you will have to export the events in seiscomp3 XML format and copy them across to the sc3 image. This part will require some ingenuity to automate.
  • You will also need to make sure the CWB instance contains waveform data that overlaps the events you are exporting from antelope and importing into sc3 in the previous step. @zhang01GA will help with additional waveform data if needed.

Some antelope migrated events in seiscomp3 have no arrivals for `preferred_origins`

My tests found that for some (or all?) antelope events the preferred origins are missing arrivals. We need to investigate.

pytest test_cluster.py

These are the preferred origins for the two events showing no arrivals.

/home/sudipta/repos/passive-seismic/tests/mocks/events/00974105.xml
Origin
               resource_id: ResourceIdentifier(id="quakeml:ga.ga.gov.au/origin/1126705")
                      time: UTCDateTime(2015, 3, 3, 2, 44, 20, 169000)
                 longitude: 176.1633
                  latitude: -14.8737
                     depth: 10000.0
                   quality: OriginQuality(associated_phase_count=17, used_phase_count=13)
           evaluation_mode: 'automatic'
         evaluation_status: 'preliminary'
             creation_info: CreationInfo(author='regpMwp ms', version='1425351689.25')
                      ---------
                  comments: 1 Elements
/home/sudipta/repos/passive-seismic/tests/mocks/events/00967739.xml
Origin
               resource_id: ResourceIdentifier(id="quakeml:ga.ga.gov.au/origin/1133422")
                      time: UTCDateTime(2015, 3, 21, 0, 18, 54, 304000)
                 longitude: 118.4494
                  latitude: -9.9875
                     depth: 0.0
                   quality: OriginQuality(associated_phase_count=29, used_phase_count=18)
           evaluation_mode: 'automatic'
         evaluation_status: 'preliminary'
             creation_info: CreationInfo(author='rega', version='1426898089.18')
                      ---------
                  comments: 1 Elements
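
The check itself boils down to something like this (a sketch over the two mock files above):

from obspy import read_events

for path in ['tests/mocks/events/00974105.xml', 'tests/mocks/events/00967739.xml']:
    ev = read_events(path)[0]
    origin = ev.preferred_origin() or ev.origins[0]
    if not origin.arrivals:
        print(path, 'preferred origin has no arrivals:', origin.resource_id)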

Station XMLs for missing ENGDAHL/ISC stations

We have some additional stations in the ENGDAHL/ISC events that are not available in our seiscomp3.
For now, if possible, any workflow that requires station information should use the files that were emailed by ISC recently. Alternatively, for any statistical measure (like inputs to inversion), we can ignore those specific stations.
