isce-framework / dolphin Goto Github PK

View Code? Open in Web Editor NEW

55.0 10.0 17.0 53.04 MB

InSAR phase linking library for PS/DS processing

Home Page: https://dolphin-insar.readthedocs.io

License: Other

Dockerfile 0.43% Shell 0.60% Python 98.91% Julia 0.06%

insar time-series deformation squeesar phase-linking geoscience geospatial-processing remote-sensing

dolphin's Issues

Updating existing PS file in workflow

We've already created the logic to update an existing amplitude dispersion file from a new SLC:

https://github.com/opera-adt/dolphin/blob/11a4a2c4f076d5b1bbe7f137f2295104de7a14f4/src/dolphin/ps.py#L178

However, that was made with the Fringe data format in mind, and we may be using different ones.

How should we specify in the workflow that we want to update?
How should we save the metadata telling what SLCs have gone into an existing amplitude dispersion/mean file?

I want to know the connections and features of COMPASS, dolphin, s1-reader, sweets, and fringe.

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Describe the feature request

I need or want [...]

Edges of bursts have artifacts and need to be pruned

The stitched temporal correlation has visible burst lines:

even though this barely affects the stitched phase:

It's very likely caused by just having not a full window of data. With a half window of {x: 11, y: 5}, we see about 11/2 columns and 5/2 rows for each burst that have bad values in the temporal correlation:

but, again, this actually doesn't affect the phase in most places. Only certain areas show problems.

The fix should be either

don't write out a buffer equal to the half window for each processed block during the phase linking step, or
during stitching, morphologically erode some number of pixels equal to a half window.

(1) is probably the better solution

Allow user to specify output spacing in meters, instead of just `strides`

Right now there is a field for output_resolution, but it's not implemented: the strides field must be used to downsample the output.

This should be a relatively simple feature to

get the pixel size/ GeoTransform of the input data
convert the meters given for the output_resolution into a strides
overwrite the strides field (raising an error if they tried to specify both strides and resolution)

Right now we're not attempted to allow a different output projection than the input SLCs, so we shouldn't need to account for any differences in units.

Add regularization so matrix inversion during phase linking doesn't fail

We have the ability to do inversion by specifying beta here:
https://github.com/opera-adt/dolphin/blob/cc74bd516e0cb2ba94bbea9e277ce74fc15fc447/src/dolphin/phase_link/mle.py#L28

but we aren't trying to catch and problem when there's numerical instability/near-singular correlation matrices.

Note of why we haven't had problems so far: numpy seems to be pretty good at giving some inverse even when the correlation matrix is rank 1:

In [35]: n = 5; v = np.exp(1j * np.random.rand(n)); C = np.outer(v, v.conj())

In [36]: np.abs(C)
Out[36]:
array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])


In [37]: np.angle(C)
Out[37]:
array([[ 0.  , -0.13,  0.5 , -0.16, -0.42],
       [ 0.13,  0.  ,  0.63, -0.03, -0.29],
       [-0.5 , -0.63,  0.  , -0.66, -0.91],
       [ 0.16,  0.03,  0.66,  0.  , -0.26],
       [ 0.42,  0.29,  0.91,  0.26,  0.  ]])


In [51]: def reg(G, beta=0.05): return (1 - beta) * G + beta * np.eye(len(G))


In [52]: reg(np.abs(C))
Out[52]:
array([[1.  , 0.95, 0.95, 0.95, 0.95],
       [0.95, 1.  , 0.95, 0.95, 0.95],
       [0.95, 0.95, 1.  , 0.95, 0.95],
       [0.95, 0.95, 0.95, 1.  , 0.95],
       [0.95, 0.95, 0.95, 0.95, 1.  ]])

Doing the direct inverse on the outer product (rank 1) means the result is close to singular. Sometimes it works, other times it gives this error:

In [111]: lambdas, evecs = la.eigh(la.inv(np.abs(C)) * C)
---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
Input In [111], in <cell line: 1>()
----> 1 lambdas, evecs = la.eigh(la.inv(np.abs(C)) * C)

File <__array_function__ internals>:180, in inv(*args, **kwargs)

File ~/miniconda3/envs/mapping/lib/python3.10/site-packages/numpy/linalg/linalg.py:545, in inv(a)
    543 signature = 'D->D' if isComplexType(t) else 'd->d'
    544 extobj = get_linalg_error_extobj(_raise_linalgerror_singular)
--> 545 ainv = _umath_linalg.inv(a, signature=signature, extobj=extobj)
    546 return wrap(ainv.astype(result_t, copy=False))

File ~/miniconda3/envs/mapping/lib/python3.10/site-packages/numpy/linalg/linalg.py:88, in _raise_linalgerror_singular(err, flag)
     87 def _raise_linalgerror_singular(err, flag):
---> 88     raise LinAlgError("Singular matrix")

LinAlgError: Singular matrix

but Cupy will actually not raise any error and just give nans.

If we do a very small amount of regularization, it looks fine:

In [107]: lambdas, evecs = la.eigh(la.inv(reg_alpha(np.abs(C), .0001)) * C)

In [108]: np.angle(evecs[:, 0] * evecs[:, 0][0].conj())
Out[108]: array([ 0.        ,  0.13063866, -0.49638746,  0.16285045,  0.41785788])
# which matches what we want:

In [110]: np.angle(v * v[0].conj())
Out[110]: array([ 0.        ,  0.13063866, -0.49638746,  0.16285045,  0.41785788])

so it seems like the good first step is just to always to a tiny bit of regularization before inverting, since it's the same thing as deflating all the correlation values (which we know are often over estimated anyway). Doing something like checking the matrix_rank takes longer than just calculating the inverse:

In [116]: %timeit la.inv(np.abs(C).astype('float32'))
957 µs ± 33.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [117]: %timeit la.matrix_rank(np.abs(C).astype('float32'))
1.07 ms ± 68.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Update to pydantic V2

some CI tests have started failing because of this: pydantic/pydantic#6079 with the response that it's fixed in V2.

Create the single-update version of the s1_disp workflow

Currently we have a version of the ministack-based workflow (dolphin/workflows/s1_disp_stack.py) to process a historical archive of data. We need the ability to ingest one new SLC, pull some number of old SLCs + saved compressed SLCs to produce a new wrapped phase estimate.

The structure of the core modules (ps.py and the 'phase_link` subpackage) should make it straighforward to add the single-update version; it will be mostly bookkeeping on where/how to pull the required data.

The general outline of this new workflow should be

Locate the SLCs and the Compressed SLCs we will use
Create a VRTStack https://github.com/opera-adt/dolphin/blob/eda6c25057568cf421c8937f0e01a69c774ba514/src/dolphin/workflows/wrapped_phase.py#L37-L41 which should sort the compressed SLCs before the normal ones
(possibly deferred until later): update the pre-computed PS map: https://github.com/opera-adt/dolphin/blob/eda6c25057568cf421c8937f0e01a69c774ba514/src/dolphin/ps.py#L146-L151 . Re-save to some location
Pass the PS pixels and VRT stack through the phase_link.run_mle : https://github.com/opera-adt/dolphin/blob/eda6c25057568cf421c8937f0e01a69c774ba514/src/dolphin/sequential.py#L147-L178
- Note that we won't need to use the sequential.py module; we're only passing one stack through run_mle, so we don't need to do all the mini-stack juggling
- We also only need to save the last layer that will come out of run_mle, since it a number of images equal to the number of input SLCs
Reference the newly estimated SLC phase to something (TODO: make a separate issue for that research topic)

This will probably be divided into several issues for easier testing/tracking.

integration testing setup

Setup a nightly or weekly integration test that runs through the primary workflow on some golden dataset.

The related parts to this are

Create a golden dataset: #15
Set up some way to store this data and pull it for the test (possibly using Zenodo or https://github.com/fatiando/pooch, depending on how big it is)
Figure out how to set up the CI system to have a longer-running test, and to trigger it on a schedule rather than on-push/on-PR

Converting to the new CSLC product specification

We have to switch over some hard coded HDF5 paths now that /science is gone.

We should also probably tolerate both version, at least while we're still testing and have many terrabytes of CSLCs with /science/CSLC/

Create a golden test dataset

Get some number of input bursts
run it through the wrapped phase workflow
stitch and unwrap
verify the output products

Used both for integration testing and for demonstration.

Add type annotations to function which are missing them

Before merging to the stable branch, functions missing type annotations should get them added.

Add automatic reference point selection based on highest quality PS

The default reference point at the end can be a PS with

the lowest value of $D_A$
the highest amplitude

Singular matrix error for CPU workflow

From @taliboliver 's testing:

[02/08 13:44:16] [INFO s1_disp_stack.py] Running wrapped phase estimation
[02/08 13:44:16] [INFO wrapped_phase.py] Creating persistent scatterer file /mnt/aurora-r0/cabrera/data/compass_test/scratch/PS/ps_pixels.tif
100%|███████████████████████████████████████████████████████████████████████████████████| 12/12 [01:01<00:00,  5.14s/it]
[02/08 13:45:19] [INFO ps.py] Waiting to write 0 blocks of data.
[02/08 13:45:19] [INFO ps.py] Finished writing out PS files
[02/08 13:45:19] [INFO wrapped_phase.py] Running sequential EMI step in /mnt/aurora-r0/cabrera/data/compass_test/scratch/linked_phase
[02/08 13:45:20] [INFO sequential.py] VRTStack(19 bands, outfile=/mnt/aurora-r0/cabrera/data/compass_test/scratch/slc_stack.vrt): from /mnt/aurora-r0/cabrera/data/compass_test/stack/t013_026559_iw1/20210601/t013_026559_iw1_20210601_VV.h5 to /mnt/aurora-r0/cabrera/data/compass_test/stack/t013_026559_iw1/20211222/t013_026559_iw1_20211222_VV.h5
[02/08 13:45:20] [INFO sequential.py] Processing 15 files + 0 compressed. Output folder: /mnt/aurora-r0/cabrera/data/compass_test/scratch/linked_phase/20210601_20211104
[02/08 13:45:21] [INFO sequential.py] VRTStack(15 bands, outfile=/mnt/aurora-r0/cabrera/data/compass_test/scratch/linked_phase/20210601_20211104/20210601_20211104.vrt): from t013_026559_iw1_20210601_VV.h5 to t013_026559_iw1_20211104_VV.h5
 11%|████████▊                                                                         | 20/185 [00:09<01:20,  2.06it/s]Traceback (most recent call last):
  File "/home/cabrera/python/miniconda3/envs/opera/bin/dolphin", line 8, in <module>
    sys.exit(main())
  File "/home/cabrera/python/miniconda3/envs/opera/lib/python3.9/site-packages/dolphin/cli.py", line 22, in main
    run_func(**arg_dict)
  File "/home/cabrera/python/miniconda3/envs/opera/lib/python3.9/site-packages/dolphin/_log.py", line 94, in wrapper
    result = f(*args, **kwargs)
  File "/home/cabrera/python/miniconda3/envs/opera/lib/python3.9/site-packages/dolphin/workflows/_run_cli.py", line 26, in run
    s1_disp_stack.run(cfg, debug=debug)
  File "/home/cabrera/python/miniconda3/envs/opera/lib/python3.9/site-packages/dolphin/_log.py", line 94, in wrapper
    result = f(*args, **kwargs)
  File "/home/cabrera/python/miniconda3/envs/opera/lib/python3.9/site-packages/dolphin/workflows/s1_disp_stack.py", line 61, in run
    cur_ifg_list = wrapped_phase.run(burst_cfg, debug=debug)
  File "/home/cabrera/python/miniconda3/envs/opera/lib/python3.9/site-packages/dolphin/_log.py", line 94, in wrapper
    result = f(*args, **kwargs)
  File "/home/cabrera/python/miniconda3/envs/opera/lib/python3.9/site-packages/dolphin/workflows/wrapped_phase.py", line 71, in run
    sequential.run_evd_sequential(
  File "/home/cabrera/python/miniconda3/envs/opera/lib/python3.9/site-packages/dolphin/sequential.py", line 165, in run_evd_sequential
    cur_mle_stack, tcorr = run_mle(
  File "/home/cabrera/python/miniconda3/envs/opera/lib/python3.9/site-packages/dolphin/phase_link/mle.py", line 125, in run_mle
    mle_est, temp_coh = _run_cpu(
  File "/home/cabrera/python/miniconda3/envs/opera/lib/python3.9/site-packages/dolphin/phase_link/_mle_cpu.py", line 66, in run_cpu
    output_phase = mle_stack(C_arrays, beta, reference_idx)
  File "/home/cabrera/python/miniconda3/envs/opera/lib/python3.9/site-packages/dolphin/phase_link/mle.py", line 208, in mle_stack
    Gamma_inv = xp.linalg.inv(Gamma)
  File "<__array_function__ internals>", line 180, in inv
  File "/home/cabrera/python/miniconda3/envs/opera/lib/python3.9/site-packages/numpy/linalg/linalg.py", line 552, in inv
    ainv = _umath_linalg.inv(a, signature=signature, extobj=extobj)
  File "/home/cabrera/python/miniconda3/envs/opera/lib/python3.9/site-packages/numpy/linalg/linalg.py", line 89, in _raise_linalgerror_singular
    raise LinAlgError("Singular matrix")
numpy.linalg.LinAlgError: Singular matrix

This is not happening for the GPU workflow. Numpy is louder with it's erroring for inversion than cupy is.

Add ability to use SHP neighborhoods during phase linking

WIP PR in #67

Switch `provision-with-micromamba` to `setup-micromamba`

Deprecation warnings have started printing, need to switch Docker/github actions to this https://github.com/mamba-org/setup-micromamba

[Bug]: PS masks and rasters have no `nodata`

Checked for duplicates

Yes - I've already checked

Describe the bug

The PS pixels currently look like this

What did you expect?

Two problems:

the top is zero when it should be nodata (it was skipped in the iteration)
The nodata wasn't saved

Environment

>>> dolphin.show_versions()
dolphin info:
     dolphin: 0.0.2.post1.dev4+g9999481
       isce3: 0.9.0-dev+16d8777ce-dirty
     compass: 0.1.3

System:
      python: 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:26:04) [GCC 10.4.0]
  executable: /u/aurora-r0/staniewi/miniconda3/envs/mapping/bin/python
     machine: Linux-3.10.0-1160.59.1.el7.x86_64-x86_64-with-glibc2.17

Python deps:
       numpy: 1.23.4
       numba: 0.56.3
  osgeo.gdal: 3.6.1
        h5py: 3.7.0
 ruamel_yaml: 0.15.80
    pydantic: 1.10.2
  setuptools: 62.6.0

Write out the `Workflow` config description into YAML as a comment

The YAML file currently gets the default filled, but the descriptions are in python only. @gmgunter point out that it would be a very nice thing if the descriptions could be written as comments so that a user could output all the schema to easily view the options.

This can likely be done with the .schema method for the pydantic Model objects. For example,

In [24]: from dolphin.workflows import config
In [25]: o = config.Outputs()

In [27]: o.schema()
Out[27]:
{
    'title': 'Outputs',
    'description': 'Options for the output format/compressions.',
    'type': 'object',
    'properties': {
        'output_format': {'description': 'Output raster format for the workflow final product.', 'default': 'NetCDF', 'allOf': [{'$ref': '#/definitions/OutputFormat'}]},
...

In [28]: {name: v['description'] for name, v in o.schema()['properties'].items()}
Out[28]:
{
    'output_format': 'Output raster format for the workflow final product.',
    'scratch_directory': 'Directory to store intermediate files.',
    'output_directory': 'Directory to store final output files.',
    'output_resolution': 'Output (x, y) resolution (in units of input data)',
    'strides': 'Alternative to specifying output resolution: Specify the (x, y) strides (decimation factor) to perform while processing input. For example, strides of [4, 2] would turn an input resolution of [5, 10] into an output resolution of [20, 20].',
    'hdf5_creation_options': 'Options for `create_dataset` with h5py.',
    'gtiff_creation_options': 'GDAL creation options for GeoTIFF files'
}

Geoff also pointed out the built in textwrap library that can be used to format the longer descriptions into a nicer comment

What format do I need to prepare data for processing.

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Describe the feature request

I need or want [...]

`sequential.py` only checks for full completion before reprocessing

Found by @mirzaees , the sequential workflow will reprocess all ministacks unless it reached full completion. It should check for existing ministacks.

Related to #110

Corrections to unwrapped phase results

Placeholder issue for which corrections we will use (e.g. topospheric, ionospheric, Solid Earth Tide, bulk plate motion) and how we will implement this.

An ionospheric data download script was added here: opera-adt/disp-s1#9
We can get the input date list by pointing at the SLCs or whichever files and using group_by_date:

import operator
from disp_s1 import ionosphere
from dolphin import utils

dates_to_files = utils.group_by_date(Path("delivery_data_medium/input_slcs/").glob("*.h5"))
# {(datetime.date(2022, 11, 7),): [PosixPath('delivery_data_medium/input_slcs/t042_088906_iw1_20221107.h5')], ...
# The keys are tuples of all dates in the filenames. We just want the first one in each filename
date_list = list(map(operator.itemgetter(0), dates_to_files.keys()))
out_dir = Path('.')
ionosphere.download_ionex(date_list, out_dir)

[Bug]: Compressed SLCs are getting created incorrectly for the single update workflow

Here's a sample correlation matrix using 1 Fringe-created CompSlc and 5 normal SLCs:

Here's the same correlation matrix, testing with dolphin functions (top row)

(edit: this was not the result of a workflow, but was from testing each step without referencing to the first date)

The fact that it's lower correlation than all others is a clear bug, though it was basically getting ignored in the full sequential workflow (why it's been undetected until now).

Using the Frame DB

Add the capability to pull from some version of the Frame/Burst database so we can input Frame ID, output a List[BurstId].

About tutorial

Is there a tutorial for general users? A Jupyter notebook would be very useful for beginners to learn about this package.

MacOS CI test is failing, possibly due to pymp

https://github.com/opera-adt/dolphin/actions/runs/5892579338/job/15982198860?pr=111

Not sure why this is happening. Turning off Numba JIT did not work in this case either: https://github.com/opera-adt/dolphin/actions/runs/5869555083/job/15914774930 so it doesn't seem like a numba-caused segfault. It might be related to calling fork, possibly badly interacting with pytest-xdist.

Best option might be to drop pymp and finish the no-pymp branch, or use one of the ideas from here

Setting thread counts for each process

For the purposes of benchmarking the CPU-version runtime, we'll want to control both the number of processes, and also the number of threads per process that numpy uses.

At first, we can simply do export OMP_NUM_THREADS=1 or OPENBLAS_NUM_THREADS, either manually ourselves or in the code like fringe does, though a more robust way might use https://github.com/joblib/threadpoolctl to detect the correct number
Since this will affect the runtime, we should record the number in addition to the cpu_count used

Add Tophu option for unwrapping step

Add the ability to use Tophu if isce3+tophu are installed
This also means we need to decide how optional to make isce3/tophu for Dolphin (which may be useful for people who would like to do their own unwrapping)

Example walkthough notebook

Create a notebook which demonstrates the basic usage of Dolphin which

shows what the necessary inputs are
demonstrates some of the possible configuration parameters a user might tinker with
(optional) shows how to set up the GPU capabilities (maybe this should be separate)
(TODO: do we want to provide the golden dataset for demonstration from #15 ? or tell them how to make their own CSLC stack?)
Plots the outputs and computed quality metrics

possible extra idea: the demo notebook is also the integration test #13 , maybe by using some library like https://papermill.readthedocs.io/en/latest/usage-execute.html , or just similar to how https://github.com/fastai/nbdev executes the notebooks as a verification test

Set up `nox` for local multi-python testing

[Bug]: geotransform of `linked_phase` doesn't account for the `strides` that are set

$ grep -n2 strides dolphin_config.yaml
741:  strides:
742-    x: 6
743-    y: 3


(mapping) [staniewi@aurora full-frame-30m]$ shape scratch/t087_185678_iw2/PS/ps_pixels.tif
1,4710,20230
(mapping) [staniewi@aurora full-frame-30m]$ gdalinfo scratch/t087_185678_iw2/PS/ps_pixels.tif | grep Pixel
Pixel Size = (5.000000000000000,-10.000000000000000)

(mapping) [staniewi@aurora full-frame-30m]$ shape scratch/t087_185678_iw2/linked_phase/20180306.slc.tif
1,1570,3371
(mapping) [staniewi@aurora full-frame-30m]$ gdalinfo scratch/t087_185678_iw2/linked_phase/20180306.slc.tif | grep Pixel
Pixel Size = (5.000000000000000,-10.000000000000000)

The output is right, it's just the geo-metadata that needs to be updated for the stride factor.

Benchmarking runtimes

Top level issue tracking the improvements/additions needed to track a full-frame workflow runtime.

Current knobs to test are

Using CPU vs GPU
SLC input stack size
Output posting (or equivalently strides)
Size of blocks loaded at one time from the input stack
For CPU, number of CPUs/number of threads per CPU
Algorithm for phase linking (MLE vs EVD)

I've put these in the order of my guess for which will have the biggest effect, but we clearly need to do the tests to see.

Things we need for good testing

the single-update workflow script (#11)
recording the threads (#28)
recording the block size/fixing the max_ram_gb option (#32)
adding the ability to use EVD instead of MLE (#138 )
using vmtouch -e on the SLC stack before starting the workflow: https://github.com/hoytech/vmtouch . This will make sure we don't get very fast runs just because the SLC data as cached, as we can't count on that happening for the production runs.

[Bug]: workflows with large `strides` fail on the `fill_ps` stage

Checked for duplicates

Yes - I've already checked

Describe the bug

For larger striding windows, the s1_disp workflow is failing here:

  File "/u/aurora-r0/staniewi/repos/dolphin/src/dolphin/phase_link/mle.py", line 319, in _fill_ps_pixels
    mle_est[i][fill_mask] = slc_stack[i][slc_r_idxs, slc_c_idxs] * ref
ValueError: NumPy boolean array indexing assignment cannot assign 595 input values to the 596 output values where the mask is true

Block size setting and fixing the `max_ram_gb` option of `dolphin config`

The --max-ram-gb option of dolfin config is a little confusing right now:

It's used to determine the block size to load at one time:

https://github.com/scottstanie/dolphin/blob/236273ffd5d69bc3125d33c6ebd5862529631c89/src/dolphin/io.py#L729

# Find size of 3D chunk to load while staying at ~`max_bytes` bytes of RAM

The extra logic in the functions is because we want to load entire HDF5 chunks together, rather than just arbitrary pixels.

But it's not setting a hard limit on the total RAM used by the workflow.

Implementing the latter feature could be difficult, but we'll need to know it for AWS deployment purposes. We might have to just see through testing how the size of the loaded blocks affects the peak RAM usage.

We should probably use psutil to get the system resources if the user does not provide a desired memory limit.

In [3]: psutil.virtual_memory()
Out[3]: svmem(total=34359738368, available=14455209984, percent=57.9, used=16429285376, free=508936192, active=13963444224, inactive=13808140288, wired=2465841152)

In [4]: psutil.virtual_memory().total / 2**30
Out[4]: 32.0

In [5]: psutil.virtual_memory().available / 2**30
Out[5]: 13.456863403320312

In the mean time, we should make sure that we log the size of blocks we're loading, and maybe log the peak ram it leads to.

Using `SharedArray` instead of pymp

Reason: pymp works okay, but has some weirdnesses like not being able to run with python -m pdb, and possible leaving open memory in processes.

Demo: https://gist.github.com/scottstanie/2c39bcfc2de2cd652f4520f116b9ce81

Make a class to hold the original array
Create the shared memory pointer

Add a `dolphin plot` option with some summary plots

It seems like having a dolphin plot to run after dolphin run would be good to inspect the results, quality check, etc.

We could have a plot/ folder at that has it's own optional requirements
This might be a good use for papermill to run some pre-made jupyter notebook and pass the parameters of the output file locations
We could make things interactive if we have it run a jupyter notebook, like having a covariance-matrix-plotter that responds to clicks

Copying over platform attributes from CSLC products

Talking with @taliboliver about what mintpy would like to see, there's a bunch of stuff we need to grab from the CSLC metadata (which needs to be included anyone in the more complete product)

start/end time from all bursts
radar wavelength

(to be completed/added to)

Add the ability to use EVD for the phase linking algorithm

Right now we only have the MLE algorithm implemented for the actual workflow. Add EVD as an option in case people want that instead, as it might be faster (especially for the CPU)

Make cleaner separation between workflow stages to help with AWS deployment

For the possibility of running each of the burst wrapped phase estimation on separate small AWS nodes, the first part of the workflow should be broken out slightly more. Currently it's easy to run in parallel on one machine, but the multi-machine case may require more output staging.
This would also have the benefit of making the first section more resilient to terminated spot instances.

I'm picturing that instead of just one Workflow which is groups for inputs, outputs, ps, phase_linking, interferograms, unwrapping, we might have 3 high level config classes,

WrappedPhase, controlling what's currently run in workflows.wrapped_phase.py
~~Stitch, the first part of stitch_and_unwrap.py~~
Unwrap

The reasons for this split:

WrappedPhase happens independently for 27 bursts in a frame. It doesn't require much RAM, and it could be run cheaply on like a 4 CPU machine (easier than provisioning one 4*27 CPU machine). This does the PS, phase linking, and ends with some interferograms being formed from the phase linking outputs.
- This step could use GPU machines effectively, but isn't necessary
~~Stitch is basically all input/output, mostly using gdal_merge.py, and could really be like a 1 CPU machine.~~ (not worth separating this from the unwrap stage)
Unwraping is a lot more variable... TBD on best machine types for this, but it could be small-ish, and have one job launched per interferogram

Progress bar of wrapped phase is broken

E.g.

Ignoring the mean of empty slice error, it ended at 86/195 progress.
This means that this part is probably wrong: https://github.com/opera-adt/dolphin/blob/cc74bd516e0cb2ba94bbea9e277ce74fc15fc447/src/dolphin/sequential.py#L151-L153

Creating a separate object from the main `Workflow` for SAS/PGE purposes

There needs to be a new YAML interface for SDS which looks something like this:

runconfig:
  name:
  groups:
    pge_name_group:
        pge_name:

    input_file_group:
      input_file_path:
    dynamic_ancillary_file_group:
        ....
      *algorithm_parameters*:
    primary_executable:
      product_type:

    product_path_group:
      product_path:
      scratch_path:
      sas_output_path:
      product_version:

    log_file:

where algorithm_parameters has the rest of the things in the current Workflow.

Current plan: make a new separate SasWorkflow which contains (probably easier than subclassing) the Workflow, then has more custom logic for going to/from YAML.

Background loading / data pre-fetching

Currently the data IO might be the largest percentage of the total run time prior to unwrapping.

We should use something like https://github.com/isce-framework/isce3/blob/develop/python/packages/isce3/io/background.py to pre-fetch data blocks

Set up intermediate files with a `tempdir`, only transfer to file name after completion

To make the workflows more robust against interruption/AWS instance termination, the intermediate files should have a different name from the final one as the write out results block-by-block. The idea is similar to the browser download which writes to a ".random characters.tmp", then moves the result to <filename>.

Similarly, the unwrapping result should use this, as the raster is created at the start and written. This might require an isce3 change.

Make sure the `nodata` mask for burst interferograms happens at intersection, not union, of `nodata`

Seems like one of the two SLCs extends further down than the other, leading to garbage at edges
This might be only happening in the PS mask.

Ministack of size 1 fails for the stack workflow

@taliboliver pointed out the cryptic error

[02/07 11:22:58] [INFO sequential.py] VRTStack(1 bands, outfile=/mnt/aurora-r0/cabrera/data/compass_test/scratch/t013_026557_iw2/linked_phase/20210806_20210806/20210806_20210806.vrt): from t013_026557_iw2_20210806_VV.h5 to t013_026557_iw2_20210806_VV.h5
^M  0%|          | 0/1 [00:00<?, ?it/s]Traceback (most recent call last):
...
"/home/cabrera/python/miniconda3/envs/opera/lib/python3.9/site-packages/dolphin/sequential.py", line 165, in run_evd_sequential
    cur_mle_stack, tcorr = run_mle(
  File "/home/cabrera/python/miniconda3/envs/opera/lib/python3.9/site-packages/dolphin/phase_link/mle.py", line 91, in run_mle
    _, rows, cols = slc_stack.shape
ValueError: not enough values to unpack (expected 3, got 2)

This can happen if 1 date is passed to a stack, or if COMPASS bursts are passed that only have 1 date.

There's probably two fixes needed

better logging/warning/error if the user passes a single date, instead of a stack. Maybe enforce some minimum stack size, under which the phase linking will likely be garbage
adjustment to the ministacks so that we don't get this error if, e.g., 16 dates are passed with a ministack size of 15

Skipping nodata based on the provided polygon

take away the "get_nodata_mask" that reads image data
Make a reader for the COMPASS nodata Polygon

Demo

# This one would make us add "shapely" as a requirement
from shapely import ops, wkt
import h5py

polygons = []
for f in slc_filenames:
    with h5py.File(f) as hf:
        poly = wkt.loads(hf["science/SENTINEL1/identification/bounding_polygon"][()].decode("utf-8"))
        polygons.append(poly)

ops.unary_union(polygons)

without shapely:

from osgeo import gdal, ogr
geom1 =  ogr.CreateGeometryFromWkt(wkt1)
geom2 =  ogr.CreateGeometryFromWkt(wkt2)
geom = geom1.Buffer(50).Union(geom2)

# ... do in a for-loop
geom.ExportToWkt()

# https://pcjericks.github.io/py-gdalogr-cookbook/raster_layers.html?highlight=rasterize
# something with gdal.Rasterize(...)
# or maybe this 
# https://github.com/rasterio/rasterio/blob/51efb0d7a8afbac2df2cc31aa7840e4f8a32b4bf/rasterio/_features.pyx#L260

[Bug]: Maximum-brightness selection for PS in the strided output is broken

The phase linking result with no ps_mask looks like this for one day:

The code where the PS pixels are filled by taking the brightest magnitude pixel in each stride window has bad artifacts:

For a temporary fix, the new PR will just average the phase of PS pixels within the stride window (which is much easier to get the implementation right):

Two side notes

the PS mask, even at 0.25 $D_A$ threshold, looks like this, where most of Mauna Loa is a PS:

2. It looks like the PS pixels are still somewhat noisy... (zooming in)

this might require the addition of Sara's PS pruning methods, even at the lower 0.25 threshold.

Save the stitched PS mask to an output layer

For multi-burst workflows, the PS files are in separate folders.

Save the multi-looked version that matches what is used to fill the output grid
Stitch them together to make one mask
Save (compression should make it very small)

Document the unwrapping setup if not using Tophu

The current unwrapping script is assuming that there's the snaphu executable compiled and on the $PATH.

We should document this for people who want the unwrapping to run after the interferogram stitching
Ideally snaphu would be conda-installable. This won't be a lot of work, but i'll need to send my early attempt to ryan to check what the conda-forge maintainers will want changed about the build script

Logging improvements

There are some additions/fixes we can do to the current logging setup. Opening this now while waiting to get more specific requirements on what PGE might need

@gmgunter pointed out that the "Max memory usage" doesn't make it to the log file, if you pass one. that's a bug
There's work to check that this isn't happening in other spots. The initial file logging was set up quickly and not tested much

It's also worth checking for any places where I might be forcing the output of logs in the "library" portion of dolphin (as opposed to the "application" portion, i.e. dolphin.workflows). The python logging recommendations are to only attach a NullHandler and let applications decide how they want your logs.

possible useful model: https://stackoverflow.com/questions/19425736/how-to-redirect-stdout-and-stderr-to-logger-in-python

Combining the PS-selection with SHP-finding and phase-linking

Currently PS selection happens in a separate step. This means we load the data stack once, save/update the PS, then reload all blocks to phase link.

Should we do the PS selection + SHP finding + phase link all in same loading period?

The timings in #6 show that when we calculate PS on a whole stack, the loading time is the majority of the time and we'd get a large improvement by only loading stack blocks once.

But, it won't be as a big a speedup for online-updating of the PS, where we'd only pull and iterate over 1-2 SLCs to get a new PS map. Thus this is low priority.

Marking "noisy" acquisitions/output products

Q. How could we mark "noisy" acquisitions that aren't worth pulling?

We will be pulling $k$ old SLCs along with each new SLC to estimate/produce a new product. We almost certainly to not want to just pull the $k$ most recent SLCs (c.f. the Norwegian InSAR service, who basically skips the entire winter).

This is a two part issue:

What quality metrics can we produce that best tell is what are the old SLCs that are worth re-pulling?
Operationally, how can we mark our output products (or label Sentinel products) so we know that certain historical ones aren't worth using to estimate the latest phase?

isce-framework / dolphin Goto Github PK

dolphin's Issues

Checked for duplicates

Alternatives considered

Related problems

Describe the feature request

Checked for duplicates

Describe the bug

What did you expect?

Environment

Checked for duplicates

Alternatives considered

Related problems

Describe the feature request

Checked for duplicates

Describe the bug

Recommend Projects

Recommend Topics

Recommend Org