i4ds / karabo-pipeline

The Karabo Pipeline can be used as a digital twin for the SKA

Home Page: https://i4ds.github.io/Karabo-Pipeline/

License: MIT License

karabo-pipeline's People

Contributors

acsillaghy, chrisfinlay, cvoegele, fschramka, jejestern, kenfus, lmachadopolettivalle, lukas113, mpluess, rohitcbscient, sfiruch, trailfog

karabo-pipeline's Issues

Simulated data from SKA Data Challenge

Idea by Daniel Schärer ([email protected]): Use the simulations that were used for the SKA Data Challenge as well. The people organizing the challenge have probably invested a lot of time to make sure that the data is representative.

Maybe we could ask those people for the input data and re-create their instrument simulation and benchmarking?

Investigate worker file access solutions

When using a distributed cluster, all workers must be able to access certain files. For example, for the rascil_imager the workers need the MS file on their file system. When working locally, you can simply point to the directory.

However, for actual distributed clusters a proper solution is needed, especially because Dask's current APIs for uploading files to workers are not satisfactory.
For example, client.upload_file pushes a file to all workers, which is slow when there are many workers; moreover, an MS file is a directory, so it cannot be uploaded with this function at all.

Maybe we need more functions, such as an upload-directory helper (recursive file upload), or better integration with a distributed file system? A rough workaround sketch follows.
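An untested sketch of such a workaround, using only the existing Dask APIs client.upload_file and client.run, and assuming uploaded files land in each worker's local_directory; distribute_ms is a hypothetical helper, not existing Karabo code:

import os
import shutil
import zipfile

from distributed import Client, get_worker

def distribute_ms(client: Client, ms_path: str) -> None:
    # Zip the MS directory next to the original ("<ms_path>.zip"),
    # since upload_file can only handle single files.
    archive = shutil.make_archive(ms_path, "zip", root_dir=ms_path)
    client.upload_file(archive)  # slow: the blob is sent to every worker

    def unpack() -> str:
        worker = get_worker()
        src = os.path.join(worker.local_directory, os.path.basename(archive))
        dst = os.path.join(worker.local_directory, "ms")
        with zipfile.ZipFile(src) as zf:
            zf.extractall(dst)
        return dst

    client.run(unpack)  # run the unpack step once on every worker

This still pays the per-worker upload cost, so a shared or distributed file system would remain the better long-term solution.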

Identify CSCS execution models

We should get answers to these questions:

  • Can we develop remotely on CSCS?
  • Can we mix local/remote running? I.e. develop locally, and run computationally intensive parts on CSCS?
  • (How) can we let the public use CSCS infra?
  • Should we use containers/Docker or not?
  • How should we deal with Big Data? What are our options? Do we co-locate computation with data?

We should come up with a written-down vision of how we could/should use the CSCS infrastructure.

Fix CUDA builds for OSKAR.

The Linux builds for OSKAR with CUDA are working, but when they are then imported on a host machine, the linker cannot find the CUDA runtime libraries.
Investigate why the libraries cannot be found on the host even though cudatoolkit is added as a conda dependency.
Once the libraries are found, it should be possible to run the CUDA-enabled OSKAR build even on devices without CUDA, since the OSKAR implementation always checks whether CUDA devices are actually present.

For now, the OSKAR conda packages work, but only with CPU computing, which is slower.

Make plot axis in Source Detection notebooks consistent with standard practice

(From Marc Audard) Currently, the coordinate axes in the Source Detection notebook are unexpected for scientists: they usually expect the x-axis (right ascension) to increase in the opposite direction. In our notebooks we currently work around this by negating the coordinates and then plotting them on a positive x-axis.

Instead, we should leave the coordinates unchanged and flip the axis direction in our plots, as in the sketch below.
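A minimal matplotlib sketch of the intended behaviour, with dummy coordinates standing in for real detected sources:

import matplotlib.pyplot as plt
import numpy as np

# Dummy source positions, for illustration only.
rng = np.random.default_rng(0)
ra_deg = 250 + rng.uniform(-0.5, 0.5, 20)
dec_deg = -80 + rng.uniform(-0.5, 0.5, 20)

fig, ax = plt.subplots()
ax.scatter(ra_deg, dec_deg)
ax.set_xlabel("RA [deg]")
ax.set_ylabel("Dec [deg]")
ax.invert_xaxis()  # RA increases to the left, as scientists expect
plt.show()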

Support galaxy classification/parametrization scenarios

Slava is interested in classification of galaxies. Emma knows about groups that are looking for asymmetric galaxies. For those kinds of problems, our pipeline should support development and benchmarking of classification and parameter estimation models.

Support multiple ALMA configuration cycles

Omkar Bait Today at 3:58 PM
Hi All. Is it possible to change the ALMA telescope configuration depending on the ALMA cycle id in the current pipeline?

4 replies

Rohit Sharma 1 hour ago
Yes, definitely. Just point me to the link of the configurations.

Omkar Bait 45 minutes ago
Thanks. This link has all the configurations up to Cycle 8: https://almascience.eso.org/tools/casa-simulator

Mark Sargent 44 minutes ago
Note, however, that within a given cycle various configurations are used.

Omkar Bait 42 minutes ago
Yes, the configuration file is tagged as cycle_number.x, where the index x denotes different configurations within a cycle. I guess this can be used as an input parameter.
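If we go this way, the lookup could be as simple as the sketch below. It assumes the CASA simulator naming convention (files like alma.cycle8.3.cfg); get_alma_config and the config_dir location are hypothetical, not existing Karabo API:

from pathlib import Path

def get_alma_config(cycle: int, configuration: int,
                    config_dir: Path = Path("alma_configs")) -> Path:
    # CASA antenna configuration files are named alma.cycle<N>.<x>.cfg
    path = config_dir / f"alma.cycle{cycle}.{configuration}.cfg"
    if not path.exists():
        raise FileNotFoundError(path)
    return path

# e.g. get_alma_config(8, 3) -> alma_configs/alma.cycle8.3.cfg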

Fix BDSF Build for MacOS

Building BDSF for macOS is not working.
Support for macOS is not crucial; the team at PyBDSF does not really "support" macOS either, and installation there is more of a community effort.

Implement a Proof-of-Concept "Quality Benchmark"/Analysis Module

Implement a very basic first analysis module for the pipeline. Ideas:

  • Compute the difference between a reconstructed image and the ground-truth image (sketched below)
  • Compute the difference between estimated parameters and the ground-truth parameters of synthetic sources
  • Record execution times and compare them with a baseline (e.g. my algorithm vs. CLEAN)
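A minimal sketch of the first idea, comparing a reconstructed FITS image against ground truth pixel by pixel; image_rmse is a hypothetical helper, and using astropy for FITS I/O is an assumption:

import numpy as np
from astropy.io import fits

def image_rmse(reconstructed: str, ground_truth: str) -> float:
    # Root-mean-square error between two equally sized FITS images.
    a = fits.getdata(reconstructed).squeeze()
    b = fits.getdata(ground_truth).squeeze()
    return float(np.sqrt(np.mean((a - b) ** 2)))

Other metrics (peak error, structural similarity, ...) could be added behind the same interface later.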

What type of data storage does CSCS provide?

Can different users use the same data storage? Is something like this already implemented on CSCS?
This is useful when massive files are of interest to several users.

Enable the pipeline to be used for testing/development by Slava's group

Contact: Slava Voloshynovskiy http://sip.unige.ch/team/prof-sviatoslav-voloshynovskiy/

Slava would like to use the pipeline for various algorithm development. Having a scientifically accurate simulation & test pipeline would be interesting to his team. Kinds of algorithms they would like to develop:

  • Image reconstruction
  • Parameter estimation

They are not radio-astronomy specialists: they don't know the file formats, tools, problems, file sizes, etc. The pipeline should make it easy to focus on parameter estimation only (Visibility or Image --> Parameters), without having to deal with other aspects.

  • Make sure there is a representative data set for development with Ground Truth
  • Contact Slava
  • Discuss his team's requirements (Runtime, Programming Languages, Platform, Data Amounts, Benchmarks, ...)
  • Implement MVP/prototype

Version controlling of packages

Currently, the latest versions of the packages we use, such as RASCIL or OSKAR, are installed. However, these are still under development, so we have no exact control over what we get, and upstream changes may break our framework. We should pin the versions we depend on; a runtime guard like the sketch below could complement such pins.
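The actual fix is to pin exact versions in our conda/pip requirements. On top of that, a small runtime guard could catch accidental drift; the package names and version numbers below are purely illustrative, not the project's real pins:

from importlib.metadata import version

# Hypothetical pins; the real versions still have to be decided.
EXPECTED = {"rascil": "0.4.0", "oskarpy": "2.8.3"}

for pkg, want in EXPECTED.items():
    got = version(pkg)
    if got != want:
        raise RuntimeError(f"{pkg}=={got}, expected {want}")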

Prepare first data set

The pipeline should come with a set of test data. Could be simulated, real-world data, or both.

For #8 we'd like to have realistic, real-world data sets and simple, synthetic tests.

Fix test errors in branch "tm_configurations"

Your changes in the (renamed) "tm_configurations" branch lead to test failures. Can you fix the tests and create another PR?

Traceback (most recent call last):
  File "/__w/Karabo-Pipeline/Karabo-Pipeline/karabo/test/test_telescope.py", line 36, in test_read_meerkat_file
    self.assertEqual(len(tel.stations), 64)
AssertionError: 80 != 64

Documentation for Software/Algorithm developers

We should make sure that developers on other teams have a smooth entry when they work with our code.

Brief explanations of how Karabo can be extended with new:

  • Data sets
  • Imaging algorithms
  • Source Detection algorithms
  • RFI/noise simulations

Visibility output format

How are simulated visibilities stored and output? What API and file format should we use initially? What existing formats are there, and what are their pros and cons?

tools21cm

UZH provides a simulation tool for 21 cm radiation and for the simulation of foreground signals.

  • Should we consider this package for our pipeline?
  • Is this kind of radiation covered by the UZH simulation of WP-C331?

Sky simulation output format

What is the interface between the sky simulation and the telescope module? What file format/API should we use initially? What are the pros and cons of the existing formats?

Enable the pipeline to be used for Bluebild testing/development

Contact: Emma Tolley (https://people.epfl.ch/emma.tolley)

Emma would like to use the pipeline for Bluebild development. Having a scientifically accurate simulation & test pipeline would be interesting to her team. Bluebild does the "Visibility --> Image" conversion, probably without calibration. Emma would like the pipeline as-is as a starting point.

Add data sets from Olga Taran

From Olga Taran ([email protected]):

[...] we expect to have the first simulated data (about 1 thousand) next week.
With respect to the MS files I am not sure, because they are quite heavy, and in this case we need to find proper storage,
since you cannot have direct access to our servers due to the university rules, and it might take some time.
At the same time, we will share the true sky model and the dirty and clean images with you via one of our clouds, since they are less heavy. The produced images will be of size 512 x 512 pixels, and the sources will be of different intensity and shape.

Generated telescope.tm as tmp does not work

I've seen that constructing a temporary telescope.tm file and setting it in the OSKAR simulation (simulation.set_telescope_model in interferometer.py) leads to incorrect visibilities (I did not see any sources in the image produced from these visibilities). However, when I take the same constructed telescope.tm file and refer to it in the settings tree, I get the correct visibilities. Therefore, I made a hotfix where I simply refer to the telescope.tm file in the data folder.

Bluebild Team Requirements

  • Runtime?
  • Programming Languages
    • C++, CUDA and Python
  • Platform
    • Linux (seems so)
  • Data Amounts
  • Benchmarks
    • Simulated data, single timestep

Imaging with bigger cellsize

During my tests with source detection, I noticed that when I wanted a larger image, the sources disappeared. I did this with a filtering of the GLEAM survey at ra=250, dec=-80 and a filtering radius of 0.55 degrees from the phase center. The imaging parameters were 2048 pixels and a cellsize of 1.1079e-5 (1.1079e-5 * 2048 * 180/pi = 1.3 degrees, which should completely cover the 1.1 degrees from the filtering). However, when I used a cellsize of 3.8785e-5 (about 4.5 degrees), the sources were present at the calculated locations as expected. If I took a cellsize of 1.70044231e-5 (ca. 2 degrees), the sources were at the expected locations, but they showed an interesting pattern (major-axis FWHM, minor-axis FWHM and position angle were also provided).

This was only the result of a few tests, i.e. it was not tested extensively. It is therefore worthwhile to reproduce this and find out the reason. My guess is that I did not understand all the imaging parameters and therefore did not set them correctly.
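For reference, the field of view implied by each cellsize can be checked with a few lines, assuming the cellsize is given in radians per pixel:

import math

def fov_deg(cellsize_rad: float, npixels: int = 2048) -> float:
    # Field of view in degrees for a given cellsize (radians per pixel).
    return cellsize_rad * npixels * 180.0 / math.pi

for cs in (1.1079e-5, 1.70044231e-5, 3.8785e-5):
    print(f"{cs:.5e} rad/px -> {fov_deg(cs):.2f} deg")
# prints roughly 1.30, 1.99 and 4.55 degrees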

Add source detection and benchmarking metrics

Research existing source detection packages to detect sources in the deconvolved images.
Also add the possibility to benchmark the source detection; research existing benchmark metrics for source detection.

  • Implementation
  • Use telescope module
  • Package imaging
  • Package sky model
  • WCS conversion without image generation
  • implement automatic assignment of predicted sources to ground-truth sources (see the sketch after this list)
  • Imaging
  • Source detection (https://www.astromatic.net/software/sextractor/ or PyBDSF)
  • #87
  • #92
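For the automatic source assignment, a sketch based on scipy's linear_sum_assignment; the flat-sky distance and the max_sep_deg cut-off are simplifying assumptions, and assign_sources is a hypothetical helper:

import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_sources(truth: np.ndarray, detected: np.ndarray,
                   max_sep_deg: float = 0.01):
    # truth: (N, 2) and detected: (M, 2) arrays of (ra, dec) in degrees.
    # Flat-sky pairwise distances; adequate for narrow fields of view.
    d = np.linalg.norm(truth[:, None, :] - detected[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(d)  # globally optimal matching
    keep = d[rows, cols] <= max_sep_deg    # drop far-away "matches"
    return rows[keep], cols[keep]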
