i4ds / karabo-pipeline
The Karabo Pipeline can be used as a Digital Twin for SKA.
Home Page: https://i4ds.github.io/Karabo-Pipeline/
License: MIT License
The data-directory environment variable is not picked up in Jupyter kernels. Provide a more robust fix, e.g. placing the data directory at a fixed, correct destination so that no variable is needed.
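One way to avoid the Jupyter problem is to resolve the directory with a fixed default and treat the environment variable as an optional override. A minimal sketch; the variable name KARABO_DATA and the default path are assumptions for illustration, not the pipeline's actual names:

```python
import os
from pathlib import Path

def data_dir() -> Path:
    """Resolve the data directory with a fixed default, so Jupyter
    kernels that do not inherit the shell environment still work.
    "KARABO_DATA" is a hypothetical variable name."""
    default = Path.home() / ".karabo" / "data"
    path = Path(os.environ.get("KARABO_DATA", default))
    path.mkdir(parents=True, exist_ok=True)
    return path
```

With this, code never depends on the variable being set; it only honours it when present.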
Idea by Daniel Schärer ([email protected]): Use the simulations that were used for the SKA Data Challenge as well. The people organizing the challenge have probably invested a lot of time to make sure that the data is representative.
Maybe we could ask the organizers for the input data and re-create their instrument simulation and benchmarks?
When using a distributed cluster, all workers must be able to access certain files. For example, for the rascil_imager the workers need the MS file on their file system. When working locally you can just set the directory.
For actual distributed clusters, however, a proper solution is needed, also because Dask's current APIs for uploading files to the workers are not satisfactory.
For example, client.upload_file uploads a file to all workers, but this is slow when there are many workers, and an MS file is a directory, so it cannot be uploaded with this function at all.
Maybe we need more functions, like an upload_directory (recursive file upload), or some better integration with a distributed file system?
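One stopgap, sketched below with the standard library only, is to zip the Measurement Set directory before the upload and unpack it on each worker. The function names are assumptions; the upload itself would still go through Dask's client.upload_file, which accepts single files:

```python
import os
import shutil
import zipfile

def pack_ms(ms_path: str) -> str:
    """Zip an MS directory into a single file that client.upload_file accepts."""
    return shutil.make_archive(ms_path.rstrip("/"), "zip", ms_path)

def unpack_ms(archive_path: str, target_dir: str) -> str:
    """Unpack the uploaded archive (e.g. on a worker) and return the MS path."""
    name = os.path.splitext(os.path.basename(archive_path))[0]
    ms_dir = os.path.join(target_dir, name)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(ms_dir)
    return ms_dir
```

On the client one would call client.upload_file(pack_ms(path)) and then run unpack_ms on every worker via client.run; this does not fix the scaling problem with many workers, only the directory limitation.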
The ALMA configuration should not have this many antennas. This needs fixing.
We should get answers to these questions:
We should come up with a written-down vision of how we could/should use the CSCS infrastructure.
The Linux builds of OSKAR with CUDA are working, but when they are imported on a host machine, the linker cannot find the CUDA runtime libraries.
Investigate why the libraries cannot be found on the host even though cudatoolkit is listed as a conda dependency.
If the libraries are found, it should be possible to run the CUDA-enabled OSKAR build even on devices without CUDA, since the OSKAR implementation always checks whether CUDA devices are actually present.
For now the OSKAR conda packages work, but only with CPU computing, which is somewhat slower.
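A quick way to check whether the problem is the loader's search path (rather than the package itself) is to ask the dynamic linker for the CUDA runtime directly; a small diagnostic sketch:

```python
import ctypes.util

# If this prints None, the CUDA runtime is not on the dynamic loader's
# search path (e.g. missing from LD_LIBRARY_PATH or ldconfig's cache),
# which would explain why the OSKAR import fails on the host.
print(ctypes.util.find_library("cudart"))
```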
The radio sources from the GLEAM survey need to be transferred to all-sky HEALPix coordinates.
(From Marc Audard) Currently, the coordinate axes in the Source Detection notebook are unexpected for scientists: they usually expect the x-axis to run in the opposite direction. In our notebooks we work around this by negating the coordinates and then plotting them against a positive x-axis.
Instead, we should not negate the coordinates but flip the x-axis in our plots.
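Matplotlib can flip an axis directly, so no coordinate negation is needed; a minimal sketch with made-up RA/Dec values:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted runs
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.scatter([10.0, 10.5, 11.0], [-45.0, -45.2, -44.8])  # RA/Dec in degrees
ax.set_xlabel("RA [deg]")
ax.set_ylabel("Dec [deg]")
ax.invert_xaxis()  # RA increases to the left, as astronomers expect
fig.savefig("sources.png")
```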
We should provide an introduction & overview for science users.
Create a module for simulation with specified functions that encapsulate the current usage of OSKAR and make it easier to use.
Slava is interested in classification of galaxies. Emma knows about groups that are looking for asymmetric galaxies. For those kinds of problems, our pipeline should support development and benchmarking of classification and parameter estimation models.
Marta Spinelli has code that does RFI modelling from satellites (maybe other sources as well). Having RFI is important for intensity mapping, probably for other use cases as well.
Omkar Bait Today at 3:58 PM
Hi All. Is it possible to change the ALMA telescope configuration depending on the ALMA cycle id in the current pipeline?
4 replies
Rohit Sharma 1 hour ago
Yes, definitely. Just point me to the link of the configurations.
Omkar Bait 45 minutes ago
Thanks. This link has all the configurations up to Cycle 8: https://almascience.eso.org/tools/casa-simulator
Mark Sargent 44 minutes ago
Note, however, that within a given cycle various configurations are used.
Omkar Bait 42 minutes ago
Yes, the configuration file is tagged as cycle_number.x, where the index x denotes different configurations within a cycle. I guess this can be used as an input parameter.
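Following the thread above, a helper mapping (cycle, configuration index) to a CASA-simulator-style file name could look like this; the exact naming scheme is an assumption based on the "cycle_number.x" tagging mentioned:

```python
def alma_config_name(cycle: int, config: int) -> str:
    """Map an ALMA cycle and intra-cycle configuration index to a
    configuration file name (scheme assumed from the thread: cycle_number.x)."""
    return f"alma.cycle{cycle}.{config}.cfg"
```

This would let the cycle and configuration be plain input parameters of the telescope setup.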
Yes, the ideal example would be to look at the all-sky radio maps. Here is the link to download the FITS files: https://lambda.gsfc.nasa.gov/product/foreground/fg_2014_haslam_408_get.cfm
We can use these files for the sky module to simulate the sky.
Originally posted by @rohitcbscient in #1 (comment)
Building PyBDSF for macOS is not working.
Support for macOS is not crucial. The PyBDSF team does not really "support" macOS either; the installation is more of a community effort.
In addition, @rohitcbscient has downloaded some observations in the .fits format of "Cygnus A" & "Centaurus A" from:
We should eventually be able to feed them into OSKAR.
Originally posted by @Lukas113 in #5 (comment)
We should have unit tests for our features, integrated with the CI pipeline.
Integrate https://github.com/pigimonaco/Pinocchio into Karabo. @PascalHitz knows the Pinocchio Code and how to use it.
Goals:
Slava suggested we send him our initial test data sets (#5), and his team should be able to create new synthetic data with a similar distribution that might be interesting for more tests.
Implement a very basic, first analysis module of the pipeline. Ideas:
Does Dask suffice to make use of the CSCS infrastructure? Do we need more?
Can different users use the same data storage? Is there already something like this implemented at CSCS?
This is useful when massive files are of interest to different users.
The Stimela interface is mentioned in an issue as an example of a system that could be adapted with more algorithms; see: https://github.com/ratt-ru/Stimela
Contact: Slava Voloshynovskiy http://sip.unige.ch/team/prof-sviatoslav-voloshynovskiy/
Slava would like to use the pipeline for various algorithm development. Having a scientifically accurate simulation & test pipeline would be interesting to his team. Kinds of algorithms they would like to develop:
They are not radio-astronomy specialists and do not know the file formats, tools, problems, file sizes, etc. The pipeline should make it easy to focus on parameter estimation only (Visibility or Image --> Parameter), without having to deal with other aspects.
Currently, the latest versions of the packages we use, such as RASCIL or OSKAR, are installed. These packages are still under development, so we have no exact control over them, and updates may break our framework.
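Pinning the dependencies would give us that control. A conda environment sketch; the package names and version numbers below are placeholders, not tested pins:

```yaml
# environment.yml (illustrative only; pin to versions we have actually tested)
name: karabo
channels:
  - conda-forge
dependencies:
  - python=3.9
  - oskarpy=2.8.*      # package name/version are assumptions
  - pip
  - pip:
      - rascil==0.5.0  # placeholder pin
```

Exact pins (`==`) give reproducibility; compatible-release pins (`2.8.*`) still pick up bug fixes while excluding breaking releases.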
The pipeline should come with a set of test data. Could be simulated, real-world data, or both.
For #8 we'd like to have realistic, real-world data sets and simple, synthetic tests.
WSClean is nicely packaged with Spack in http://git.astron.nl/RD/schaap-spack (Stefano Corda @ EPFL knows more, Manuel Stutz is in contact)
Your changes in the (renamed) "tm_configurations" branch lead to test failures. Can you fix the tests and create another PR?
Traceback (most recent call last):
File "/__w/Karabo-Pipeline/Karabo-Pipeline/karabo/test/test_telescope.py", line 36, in test_read_meerkat_file
self.assertEqual(len(tel.stations), 64)
AssertionError: 80 != 64
Add a new module named Imaging and define interfaces for imaging, with RASCIL as an implementation.
Hide Dask features and other implementation details in our implementation.
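A minimal sketch of what such an interface could look like; the class and method names, and the signature, are assumptions rather than the final API:

```python
from abc import ABC, abstractmethod
from typing import Any

class Imager(ABC):
    """Interface sketch for the proposed Imaging module."""

    @abstractmethod
    def create_dirty_image(self, visibilities: Any, npixel: int,
                           cellsize_rad: float) -> Any:
        """Grid the visibilities and return a dirty image."""

class RascilImager(Imager):
    """RASCIL-backed implementation; Dask usage stays hidden in here."""

    def create_dirty_image(self, visibilities, npixel, cellsize_rad):
        # Would delegate to RASCIL's imaging functions internally.
        raise NotImplementedError
```

Callers then depend only on Imager, so a different backend (or a Dask-free one) can be swapped in without touching user code.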
Currently, rascil is only installable via pip. Create a Conda package for rascil, so that the installation of the karabo-pipeline is truly a one-liner.
We should make sure that developers on other teams have a smooth entry when they work with our code.
Brief explanations of how Karabo can be extended with new
How are simulated visibilities stored/output? What is the API and File Format we should use initially? What existing formats are there? Pros/Cons?
UZH provides a simulation tool for 21cm radiation and simulation of foreground signals
The Docker image is currently called ghcr.io/i4ds/ska:main. It should be renamed to "karabo" or "karabo-pipeline"/...
From Omkar Bait: Also, can we add random and systematic noise (from atmosphere) to the visibilities, particularly for ALMA like frequencies? For example, the CASA simulator has a pwv parameter for noise from water vapour: https://casadocs.readthedocs.io/en/stable/api/tt/casatasks.simulation.simalma.html#casatasks.simulation.simalma
What is the interface between the Sky simulation and the Telescope module? What file format/API should we use initially? What are the pros/cons of the already existing formats?
https://github.com/darafferty/losito
Can this package be useful for us?
Contact: Emma Tolley (https://people.epfl.ch/emma.tolley)
Emma would like to use the pipeline for Bluebild development. Having a scientifically accurate simulation & test pipeline would be interesting to her team. Bluebild does the "Visibility --> Image" conversion, probably without calibration. Emma would like the pipeline already as-is as starting point.
From Olga Taran ([email protected]):
[...] we expect to have the first simulated data (about 1 thousand) next week.
With respect to the MS files I am not sure, because they are quite heavy, and in this case we need to find proper storage, since you cannot have direct access to our servers due to university rules, and it might take some time.
At the same time, we will share the true sky model and the dirty and clean images with you via one of our clouds, since they are less heavy. The produced images will be 512 x 512 pixels, and the sources will vary in intensity and shape.
I've seen that temporarily constructing a telescope.tm file and setting it in the OSKAR simulation (simulation.set_telescope_model in interferometer.py) leads to incorrect visibilities (I did not see any sources in the image produced from these visibilities). However, when I take the same constructed telescope.tm file and refer to it in the settings tree, I get the correct visibilities. Therefore, I made a hotfix that simply refers to the telescope.tm file in the data folder.
Slava suggested we should test the source detection workflow directly on the dirty image, without running CLEAN.
Create new functions in telescope.py so that users can get the SKA LOW and SKA MID telescope configurations with a single function call.
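A sketch of such convenience constructors; the class shape, function names, and configuration paths are assumptions for illustration, not the pipeline's actual API:

```python
from dataclasses import dataclass

@dataclass
class Telescope:
    """Minimal stand-in for the pipeline's telescope class (names assumed)."""
    name: str
    config_dir: str

def get_ska_low() -> Telescope:
    # Path is an assumption; the real .tm directory would ship with the package.
    return Telescope("SKA1-LOW", "data/ska1low.tm")

def get_ska_mid() -> Telescope:
    return Telescope("SKA1-MID", "data/ska1mid.tm")
```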
During my tests with source detection, I noticed that the sources disappeared when I asked for a larger image. I filtered the GLEAM survey at ra=250, dec=-80 with a radius of 0.55 degrees around the phase centre. The imaging parameters were 2048 pixels and a cellsize of 1.1079e-5 rad (1.1079e-5 × 2048 × 180/π ≈ 1.3 degrees, which should completely cover the 1.1 degrees from the filtering). However, with a cellsize of 3.8785e-5 (about 4.5 degrees) the sources were present at the calculated locations, as expected. With a cellsize of 1.70044231e-5 (ca. 2 degrees) the sources were at the expected locations, but they showed an interesting pattern (major-axis FWHM, minor-axis FWHM, and position angle were also provided).
This was only the result of a few tests, i.e. it has not been tested extensively. It is therefore worthwhile to reproduce this and find out the reason. My guess is that I did not understand all the imaging parameters and therefore did not set them correctly.
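The field-of-view arithmetic above can be checked with a one-liner (cellsize in radians per pixel):

```python
import math

def fov_deg(cellsize_rad: float, npixel: int) -> float:
    """Field of view in degrees for a given cellsize (rad/px) and image width."""
    return cellsize_rad * npixel * 180.0 / math.pi

print(fov_deg(1.1079e-5, 2048))   # ≈ 1.3 degrees
print(fov_deg(3.8785e-5, 2048))   # ≈ 4.55 degrees
```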
Research existing source-detection packages for detecting sources in the deconvolved images.
Also add the possibility to benchmark the source detection, and research existing benchmark metrics for source detection.