i4ds / karabo-pipeline

The Karabo Pipeline can be used as a digital twin for the SKA

Home Page: https://i4ds.github.io/Karabo-Pipeline/

License: MIT License

karabo-pipeline's People

Contributors

acsillaghy, chrisfinlay, cvoegele, fschramka, jejestern, kenfus, lmachadopolettivalle, lukas113, mpluess, rohitcbscient, sfiruch, trailfog

karabo-pipeline's Issues

Simulated data from SKA Data Challenge

Idea by Daniel Schärer ([email protected]): Use the simulations that were used for the SKA Data Challenge as well. The people organizing the challenge have probably invested a lot of time to make sure that the data is representative.

Maybe we could ask those people for the input data and re-create their instrument simulation and benchmarking?

Investigate worker file access solutions

When using a distributed cluster, all workers must be able to access certain files. For example, for the rascil_imager the workers need the MS file on their file system. When working locally, you can simply point to the directory.

However, for actual distributed clusters a proper solution is needed, especially because Dask's current APIs for uploading files to workers are not satisfactory.
For example, client.upload_file pushes a file to all workers, which is slow when there are many workers; moreover, an MS file is a directory, so it cannot be uploaded with this function at all.

Maybe we need more functions, such as an upload-directory helper (recursive file upload), or better integration with a distributed file system? A rough workaround sketch follows.
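An untested sketch of such a workaround, using only the existing Dask APIs client.upload_file and client.run, and assuming uploaded files land in each worker's local_directory; distribute_ms is a hypothetical helper, not existing Karabo code:

import os
import shutil
import zipfile

from distributed import Client, get_worker

def distribute_ms(client: Client, ms_path: str) -> None:
    # Zip the MS directory next to the original ("<ms_path>.zip"),
    # since upload_file can only handle single files.
    archive = shutil.make_archive(ms_path, "zip", root_dir=ms_path)
    client.upload_file(archive)  # slow: the blob is sent to every worker

    def unpack() -> str:
        worker = get_worker()
        src = os.path.join(worker.local_directory, os.path.basename(archive))
        dst = os.path.join(worker.local_directory, "ms")
        with zipfile.ZipFile(src) as zf:
            zf.extractall(dst)
        return dst

    client.run(unpack)  # run the unpack step once on every worker

This still pays the per-worker upload cost, so a shared or distributed file system would remain the better long-term solution.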

Identify CSCS execution models

We should get answers to these questions:

  • Can we develop remotely on CSCS?
  • Can we mix local/remote running? I.e. develop locally, and run computationally intensive parts on CSCS?
  • (How) can we let the public use CSCS infra?
  • Should we use containers/Docker or not?
  • How should we deal with Big Data? What are our options? Do we co-locate computation with data?

We should come up with a written-down vision of how we could/should use the CSCS infrastructure.

Fix CUDA builds for OSKAR.

The Linux builds for OSKAR with CUDA are working, but when they are then imported on a host machine, the linker cannot find the CUDA runtime libraries.
Investigate why the libraries cannot be found on the host even though cudatoolkit is added as a conda dependency.
Once the libraries are found, it should be possible to run the CUDA-enabled OSKAR build even on devices without CUDA, since the OSKAR implementation always checks whether CUDA devices are actually present.

For now, the OSKAR conda packages work, but only with CPU computing, which is slower.

Make plot axis in Source Detection notebooks consistent with standard practice

(From Marc Audard) Currently, the coordinate axes in the Source Detection notebook are unexpected for scientists: they usually expect the x-axis (right ascension) to increase in the opposite direction. In our notebooks we currently work around this by negating the coordinates and then plotting them on a positive x-axis.

Instead, we should leave the coordinates unchanged and flip the axis direction in our plots, as in the sketch below.
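A minimal matplotlib sketch of the intended behaviour, with dummy coordinates standing in for real detected sources:

import matplotlib.pyplot as plt
import numpy as np

# Dummy source positions, for illustration only.
rng = np.random.default_rng(0)
ra_deg = 250 + rng.uniform(-0.5, 0.5, 20)
dec_deg = -80 + rng.uniform(-0.5, 0.5, 20)

fig, ax = plt.subplots()
ax.scatter(ra_deg, dec_deg)
ax.set_xlabel("RA [deg]")
ax.set_ylabel("Dec [deg]")
ax.invert_xaxis()  # RA increases to the left, as scientists expect
plt.show()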

Support galaxy classification/parametrization scenarios

Slava is interested in classification of galaxies. Emma knows about groups that are looking for asymmetric galaxies. For those kinds of problems, our pipeline should support development and benchmarking of classification and parameter estimation models.

Support multiple ALMA configuration cycles

Omkar Bait Today at 3:58 PM
Hi All. Is it possible to change the ALMA telescope configuration depending on the ALMA cycle id in the current pipeline?

4 replies

Rohit Sharma 1 hour ago
Yes, definitely. Just point me to the link of the configurations.

Omkar Bait 45 minutes ago
Thanks. This link has all the configurations up to Cycle 8: https://almascience.eso.org/tools/casa-simulator

Mark Sargent 44 minutes ago
Note, however, that within a given cycle various configurations are used.

Omkar Bait 42 minutes ago
Yes, the configuration file is tagged as cycle_number.x, where the index x denotes different configurations within a cycle. I guess this can be used as an input parameter.
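If we go this way, the lookup could be as simple as the sketch below. It assumes the CASA simulator naming convention (files like alma.cycle8.3.cfg); get_alma_config and the config_dir location are hypothetical, not existing Karabo API:

from pathlib import Path

def get_alma_config(cycle: int, configuration: int,
                    config_dir: Path = Path("alma_configs")) -> Path:
    # CASA antenna configuration files are named alma.cycle<N>.<x>.cfg
    path = config_dir / f"alma.cycle{cycle}.{configuration}.cfg"
    if not path.exists():
        raise FileNotFoundError(path)
    return path

# e.g. get_alma_config(8, 3) -> alma_configs/alma.cycle8.3.cfg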

Fix BDSF Build for MacOS

Building BDSF for macOS is not working.
Support for macOS is not crucial; the team at PyBDSF does not really "support" macOS either, and installation there is more of a community effort.

Implement a Proof-of-Concept "Quality Benchmark"/Analysis Module

Implement a very basic first analysis module for the pipeline. Ideas:

  • Compute the difference between a reconstructed image and the ground-truth image (sketched below)
  • Compute the difference between estimated parameters and the ground-truth parameters of synthetic sources
  • Record execution times and compare them with a baseline (e.g. my algorithm vs. CLEAN)
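A minimal sketch of the first idea, comparing a reconstructed FITS image against ground truth pixel by pixel; image_rmse is a hypothetical helper, and using astropy for FITS I/O is an assumption:

import numpy as np
from astropy.io import fits

def image_rmse(reconstructed: str, ground_truth: str) -> float:
    # Root-mean-square error between two equally sized FITS images.
    a = fits.getdata(reconstructed).squeeze()
    b = fits.getdata(ground_truth).squeeze()
    return float(np.sqrt(np.mean((a - b) ** 2)))

Other metrics (peak error, structural similarity, ...) could be added behind the same interface later.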

What type of data storage does CSCS provide?

Can different users use the same data storage? Is something like this already implemented on CSCS?
This is useful when massive files are of interest to several users.

Enable the pipeline to be used for testing/development by Slava's group

Contact: Slava Voloshynovskiy http://sip.unige.ch/team/prof-sviatoslav-voloshynovskiy/

Slava would like to use the pipeline for various algorithm development. Having a scientifically accurate simulation & test pipeline would be interesting to his team. Kinds of algorithms they would like to develop:

  • Image reconstruction
  • Parameter estimation

They are not radio-astronomy specialists: they don't know the file formats, tools, problems, file sizes, etc. The pipeline should make it easy to focus on parameter estimation only (Visibility or Image --> Parameters), without having to deal with other aspects.

  • Make sure there is a representative data set for development with Ground Truth
  • Contact Slava
  • Discuss his team's requirements (Runtime, Programming Languages, Platform, Data Amounts, Benchmarks, ...)
  • Implement MVP/prototype

Version controlling of packages

Currently, the latest versions of the packages we use, such as RASCIL or OSKAR, are installed. However, these are still under development, so we have no exact control over what we get, and upstream changes may break our framework. We should pin the versions we depend on; a runtime guard like the sketch below could complement such pins.
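The actual fix is to pin exact versions in our conda/pip requirements. On top of that, a small runtime guard could catch accidental drift; the package names and version numbers below are purely illustrative, not the project's real pins:

from importlib.metadata import version

# Hypothetical pins; the real versions still have to be decided.
EXPECTED = {"rascil": "0.4.0", "oskarpy": "2.8.3"}

for pkg, want in EXPECTED.items():
    got = version(pkg)
    if got != want:
        raise RuntimeError(f"{pkg}=={got}, expected {want}")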

Prepare first data set

The pipeline should come with a set of test data. Could be simulated, real-world data, or both.

For #8 we'd like to have realistic, real-world data sets and simple, synthetic tests.

Fix test errors in branch "tm_configurations"

Your changes in the (renamed) "tm_configurations" branch lead to test failures. Can you fix the tests and create another PR?

Traceback (most recent call last):
  File "/__w/Karabo-Pipeline/Karabo-Pipeline/karabo/test/test_telescope.py", line 36, in test_read_meerkat_file
    self.assertEqual(len(tel.stations), 64)
AssertionError: 80 != 64

Documentation for Software/Algorithm developers

We should make sure that developers on other teams have a smooth entry when they work with our code.

Brief explanations of how Karabo can be extended with new:

  • Data sets
  • Imaging algorithms
  • Source Detection algorithms
  • RFI/noise simulations

Visibility output format

How are simulated visibilities stored and output? What API and file format should we use initially? What existing formats are there, and what are their pros and cons?

tools21cm

UZH provides a simulation tool for 21 cm radiation and for the simulation of foreground signals.

  • Should we consider this package for our pipeline?
  • Is this kind of radiation covered by the UZH simulation of WP-C331?

Sky simulation output format

What is the interface between the sky simulation and the telescope module? What file format/API should we use initially? What are the pros and cons of the existing formats?

Enable the pipeline to be used for Bluebild testing/development

Contact: Emma Tolley (https://people.epfl.ch/emma.tolley)

Emma would like to use the pipeline for Bluebild development. Having a scientifically accurate simulation & test pipeline would be interesting to her team. Bluebild does the "Visibility --> Image" conversion, probably without calibration. Emma would like the pipeline as-is as a starting point.

Add data sets from Olga Taran

From Olga Taran ([email protected]):

[...] we expect to have the first simulated data (about 1 thousand) next week.
With respect to the MS files I am not sure, because they are quite heavy, and in this case we need to find proper storage,
since you cannot have direct access to our servers due to the university rules, and it might take some time.
At the same time, we will share the true sky model and the dirty and clean images with you via one of our clouds, since they are less heavy. The produced images will be of size 512 x 512 pixels, and the sources will be of different intensity and shape.

Generated telescope.tm as tmp does not work

I've seen that constructing a temporary telescope.tm file and setting it in the OSKAR simulation (simulation.set_telescope_model in interferometer.py) leads to incorrect visibilities (I did not see any sources in the image produced from these visibilities). However, when I take the same constructed telescope.tm file and refer to it in the settings tree, I get the correct visibilities. Therefore, I made a hotfix where I simply refer to the telescope.tm file in the data folder.

Bluebild Team Requirements

  • Runtime?
  • Programming Languages
    • C++, CUDA and Python
  • Platform
    • Linux (seems so)
  • Data Amounts
  • Benchmarks
    • Simulated data, single timestep

Imaging with bigger cellsize

During my tests with source detection, I noticed that when I wanted a larger image, the sources disappeared. I did this with a filtering of the GLEAM survey at ra=250, dec=-80 and a filtering radius of 0.55 degrees from the phase center. The imaging parameters were 2048 pixels and a cellsize of 1.1079e-5 (1.1079e-5 * 2048 * 180/pi = 1.3 degrees, which should completely cover the 1.1 degrees from the filtering). However, when I used a cellsize of 3.8785e-5 (about 4.5 degrees), the sources were present at the calculated locations as expected. If I took a cellsize of 1.70044231e-5 (ca. 2 degrees), the sources were at the expected locations, but they showed an interesting pattern (major-axis FWHM, minor-axis FWHM and position angle were also provided).

This was only the result of a few tests, i.e. it was not tested extensively. It is therefore worthwhile to reproduce this and find out the reason. My guess is that I did not understand all the imaging parameters and therefore did not set them correctly.
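For reference, the field of view implied by each cellsize can be checked with a few lines, assuming the cellsize is given in radians per pixel:

import math

def fov_deg(cellsize_rad: float, npixels: int = 2048) -> float:
    # Field of view in degrees for a given cellsize (radians per pixel).
    return cellsize_rad * npixels * 180.0 / math.pi

for cs in (1.1079e-5, 1.70044231e-5, 3.8785e-5):
    print(f"{cs:.5e} rad/px -> {fov_deg(cs):.2f} deg")
# prints roughly 1.30, 1.99 and 4.55 degrees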

Add source detection and benchmarking metrics

Research existing source detection packages to detect sources in the deconvolved images.
Also add the possibility to benchmark the source detection; research existing benchmark metrics for source detection.

  • Implementation
  • Use telescope module
  • Package imaging
  • Package sky model
  • WCS conversion without image generation
  • implement automatic assignment of predicted sources to ground-truth sources (see the sketch after this list)
  • Imaging
  • Source detection (https://www.astromatic.net/software/sextractor/ or PyBDSF)
  • #87
  • #92
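For the automatic source assignment, a sketch based on scipy's linear_sum_assignment; the flat-sky distance and the max_sep_deg cut-off are simplifying assumptions, and assign_sources is a hypothetical helper:

import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_sources(truth: np.ndarray, detected: np.ndarray,
                   max_sep_deg: float = 0.01):
    # truth: (N, 2) and detected: (M, 2) arrays of (ra, dec) in degrees.
    # Flat-sky pairwise distances; adequate for narrow fields of view.
    d = np.linalg.norm(truth[:, None, :] - detected[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(d)  # globally optimal matching
    keep = d[rows, cols] <= max_sep_deg    # drop far-away "matches"
    return rows[keep], cols[keep]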
