frb's Introduction

Search for Fast Radio Bursts in Radioastron project archive data

Installing

$ git clone https://github.com/akutkin/frb.git
  • Create & activate virtual environment for installing dependencies
$ cd frb
$ wget https://github.com/pypa/virtualenv/archive/master.zip
$ unzip master.zip
$ python2 virtualenv-master/virtualenv.py ./venv
$ source venv/bin/activate
  • Install dependencies inside virtual environment
$ pip2 install scipy astropy scikit-learn scikit-image matplotlib h5py sqlalchemy

Finding injected pulses in one file

  • Download sample data
$ cd examples
$ wget https://www.dropbox.com/s/ag7rz88kjnblqzv/data.tgz
$ tar -xvzf data.tgz
  • Run script
$ python2 caching.py
  • Deactivate virtual environment
$ deactivate

This script injects pulses into raw data and searches for them using three algorithms. Each one begins with non-coherent de-dispersion. The first shears the t-DM plane by a universal value and averages over frequencies to find peaks in time and then in DM. The last two pre-process the resulting t-DM plane to reduce the noise and exclude extended regions of atypically high amplitude, then find blobs of high intensity. Next, 2D elliptical Gaussians are fitted to the regions of individual blobs in the original t-DM plane. The second algorithm selects candidates using an automatically chosen threshold on the Gaussian amplitudes, together with other Gaussian parameters that are specific to narrow dispersed pulses. The third algorithm trains a Gradient Boosting Classifier on artificially injected pulses (currently not used). It uses the features of the fitted Gaussians as well as numerous blob properties to build a decision surface in feature space.
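
As a rough illustration of the non-coherent de-dispersion step that all three algorithms start with, the sketch below builds a t-DM plane with NumPy. It is not the code used in this repository: the function names `dedisperse` and `t_dm_plane`, the (n_nu, n_t) array layout and the cyclic shift at the array edges are assumptions made for the example.

```python
import numpy as np

# Dispersion constant in s MHz^2 pc^-1 cm^3
K_DM = 4.148808e3


def dedisperse(dsp, freqs_mhz, d_t, dm):
    """Shift each frequency channel by its dispersion delay and average.

    dsp       : 2D array of shape (n_nu, n_t), the dynamical spectrum
    freqs_mhz : 1D array of channel centre frequencies in MHz
    d_t       : time step in seconds
    dm        : trial dispersion measure in pc cm^-3
    """
    nu_ref = freqs_mhz.max()
    # Delay of every channel relative to the highest frequency
    delays = K_DM * dm * (freqs_mhz ** -2 - nu_ref ** -2)
    shifts = np.round(delays / d_t).astype(int)
    out = np.zeros(dsp.shape[1])
    for row, shift in zip(dsp, shifts):
        # Cyclic shift: samples wrap around at the edges (a simplification)
        out += np.roll(row, -shift)
    return out / dsp.shape[0]


def t_dm_plane(dsp, freqs_mhz, d_t, dm_grid):
    """Stack de-dispersed time series for every trial DM: shape (n_dm, n_t)."""
    return np.array([dedisperse(dsp, freqs_mhz, d_t, dm) for dm in dm_grid])
```

A pulse dispersed with some DM shows up as a compact peak in the row of the plane that corresponds to that DM and is smeared out in the other rows, which is why peaks and blobs in this plane are natural candidates.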

Searching for pulses using fitted elliptical Gaussians in the t-DM plane is much faster than using the Gradient Boosting Classifier. This is because the latter needs a training sample to be constructed and analyzed, and it also finds the best classifier parameters by searching a grid of their values. All these steps (training of the classifier) need to be done only once, on a small portion of the data.

Currently, the amplitudes of injected pulses in the training phase are set by hand. This will be fixed soon by analyzing the amplitudes of noise pulses in a small chunk of data known a priori to be pulse-free.

The script creates PNG plots of the found candidates in the original dynamical spectra and the t-DM plane in the frb/examples directory, and dumps data on the found candidates and on the data searched to the frb/frb/frb.db SQLite database. The database can be viewed and queried in Firefox with the SQLite Manager add-on.
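
The database can also be queried from Python with the standard `sqlite3` module. The sketch below is only an illustration: the table names `searched_data` and `candidates` are the ones used by the project, but every column name and all rows are hypothetical, and an in-memory database stands in for frb.db so that the example runs on its own.

```python
import sqlite3

# In-memory stand-in for frb/frb/frb.db; the columns below are assumptions
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE searched_data (id INTEGER PRIMARY KEY, antenna TEXT);
    CREATE TABLE candidates (
        id INTEGER PRIMARY KEY,
        searched_data_id INTEGER REFERENCES searched_data(id),
        t REAL,
        dm REAL
    );
    INSERT INTO searched_data VALUES (1, 'Noto'), (2, 'Yebes');
    INSERT INTO candidates VALUES (1, 1, 12.5, 300.0), (2, 2, 12.6, 310.0);
""")

# One searched dynamical spectrum has many candidates, so join on that key
rows = conn.execute(
    "SELECT s.antenna, c.t, c.dm "
    "FROM candidates AS c "
    "JOIN searched_data AS s ON s.id = c.searched_data_id "
    "ORDER BY c.t"
).fetchall()
conn.close()
```

To try this against the real file, replace `":memory:"` with the path to frb.db, drop the `executescript` call and adjust the column names to the actual schema.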

Process experiment

  • Log in to the frb computer with your credentials
  • Clone frb repository, install virtual environment, activate it & install dependencies inside virtual environment (see Installing)
  • Download experiment CFX-file
$ cd frb/frb
$ wget https://www.dropbox.com/s/8pcmgmed36fo8uy/RADIOASTRON_RAKS12EC_C_20151030T210000_ASC_V1.cfx
  • Compile my5spec program for converting raw Mk5 data to txt-format
$ cd ../my5spec; /usr/bin/make
  • Run example
$ cd ../frb
$ python2 pipeline.py

The script processes an experiment (raks12ec, C-band, Noto & Yebes radio telescopes). Results on the data searched and on the pulse candidates are dumped to the frb/frb/frb.db SQLite database. Finally, the script checks the DB to find pulse candidates that are close (in time & DM) among the searched antennas.
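
The final cross-antenna check can be pictured with the following minimal sketch. It does not reproduce what pipeline.py does: the function name `find_coincidences`, the dictionary keys and both tolerances are hypothetical, and any geometric delay between the telescopes is ignored.

```python
from itertools import combinations


def find_coincidences(candidates, max_dt=0.1, max_ddm=50.0):
    """Return pairs of candidates from different antennas close in t and DM.

    candidates : list of dicts with keys 'antenna', 't' (s), 'dm' (pc cm^-3)
    max_dt     : largest allowed time difference in seconds (assumed value)
    max_ddm    : largest allowed DM difference in pc cm^-3 (assumed value)
    """
    pairs = []
    for a, b in combinations(candidates, 2):
        if (a["antenna"] != b["antenna"]
                and abs(a["t"] - b["t"]) <= max_dt
                and abs(a["dm"] - b["dm"]) <= max_ddm):
            pairs.append((a, b))
    return pairs
```

Comparing all pairs is quadratic in the number of candidates, which is acceptable for short candidate lists. For long lists, sorting by time first would avoid most of the comparisons.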

Using docker

There's no need to install any packages except docker itself

  • Install docker package on your machine
  • Run a container from the image and mount a directory host_dir from your host machine into it. It will take some time to download the image with the preinstalled OS & software.
$ docker run -it -v host_dir:/home/frb-dev/data ipashchenko/frb /bin/bash
  • Inside container load data and start script
# cd frb-dev/examples
# wget https://www.dropbox.com/s/ag7rz88kjnblqzv/data.tgz
# tar -xvzf data.tgz
# python2 caching.py
  • After its execution, copy the results (images & DB file) to the mounted directory
# cp *.png ../data/.
# cp ../frb/frb.db ../data/.
  • Results can be viewed in directory host_dir on your host machine.

TODOs

  • Currently, my5spec fails to read raw data in some formats (see issue #7) and fails to read the ends of files (see issue #13).

frb's People

Contributors

ipashchenko, akutkin, ivstog

frb's Issues

my5spec writes out 0 spectra

In [20]: ds = m5.create_dspec(**dspec_params)
../my5spec/./my5spec -a 1 -n 64 -l 0.5 /home/ilya/code/frb/data/raw_data/raks12er/tr/rk12er_tr_no0002 MKIV1_4-256-4-2 /home/ilya/code/frb/data/rk12er_tr_no0002_a1n64l0.5_dspec
Start at:
mjd = 53679
sec = 47701
ns  = 157500000.000000
nint = 250
Real time step = 1.000000 ms
mark5_stream_decode failed. End of file?
0 spectra (0.000 sec) were wrote in each file

Could it be that frb-computer doesn't have mark5access installed?

Handle ends of Mark5 files

When processing the last chunk of a raw data file:

Start at:
mjd = 325
sec = 79102
ns  = 0.000000
nint = 250
Real time step = 1.000000 ms
mark5_stream_decode failed. End of file?
98999 spectra (98.999 sec) were wrote in each file

This currently causes an AssertionError. Choose n_t depending on the number of spectra actually written.
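
One possible way to get that number is to read it from the my5spec output shown above. This is only a sketch that assumes the output keeps the `N spectra (T sec)` wording; the helper name `written_spectra` is hypothetical and is not part of the repository.

```python
import re

# Matches e.g. "98999 spectra (98.999 sec) were wrote in each file"
SPECTRA_RE = re.compile(r"(\d+) spectra \(([\d.]+) sec\)")


def written_spectra(output):
    """Return the number of spectra reported by my5spec, or None if absent."""
    match = SPECTRA_RE.search(output)
    return int(match.group(1)) if match else None
```

The returned value could then be used as n_t for the last chunk instead of the requested number of time steps.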

Dockerize

Create image with pre-installed dependencies & frb repo. Mount data volume & connect DB

Huge repo size

Commit f1927c0 removed the big m5 file, but it seems to still be in the repo history, so the repo size is now 100 MB.

Try elliptical hough transform

Try skimage.transform.hough_ellipse on the t-DM plane. But first one has to find the perimeters of the ellipses. This could be a problem, because if the threshold is too low, there will be no ellipses.

Refactor M5 to follow API of subsequent processing

Could you refactor raw_data.py so that M5 instances return not only a numpy array with a chunk of dynamical spectra, but also a metadata dictionary with the following keys: n_nu, n_t, nu_0, d_nu, d_t, t_0, exp_code, antenna, freq, band, pol? I have finally merged the metadata with the dynamical spectra array in one class. After creating DynSpectra instances with the array & metadata, we don't need anything else for subsequent processing.

It would be nice if the method that does this returned a generator, so that the arrays are not all stored in memory but are created from Mark5 only when they are needed, in small chunks (5 minutes or even less).
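
A minimal sketch of such a generator is given below. Everything in it is an assumption made for illustration: `iter_dspec` is a hypothetical name, `read_chunk` stands in for whatever actually reads the samples from a Mark5 file, and `t_0` is taken to be a plain number of seconds.

```python
def iter_dspec(read_chunk, n_total, chunk_size, meta):
    """Lazily yield (chunk, metadata) pairs covering n_total time samples.

    read_chunk : callable (start, stop) -> array with samples [start, stop)
    n_total    : total number of time samples available
    chunk_size : number of time samples per chunk
    meta       : metadata shared by all chunks; must contain 't_0' and 'd_t'
    """
    for start in range(0, n_total, chunk_size):
        stop = min(start + chunk_size, n_total)
        # Copy the shared metadata and adjust the chunk-specific keys
        chunk_meta = dict(meta)
        chunk_meta["n_t"] = stop - start
        chunk_meta["t_0"] = meta["t_0"] + start * meta["d_t"]
        # The chunk is read only when the consumer asks for it
        yield read_chunk(start, stop), chunk_meta
```

Because nothing is read until the consumer iterates, only one chunk needs to be in memory at a time, and the last chunk is simply shorter when the file does not divide evenly.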

Another option is to put the data in HDF5 format, which can store metadata too. See utils.save_hdf5 for an example.

Class for DSP + meta-data

What about joining the dsp array and its metadata in one class? That class should have methods to de-disperse the dsp array, average it, etc. Check the frames.Frame class, which handles dynamical spectra operations. If your code can create an instance of that class from m5 files, then we can pass this instance straight to the software for preprocessing (optional, e.g. de-dispersion) and searching for candidates (I've just created a Searcher class that implements something like that).

Mask noisy frequency channels

In the test data, the highest-frequency channel has a much higher amplitude at all times. Check the results with this channel masked out.
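
One way to try this is sketched below. It is an assumption-laden example and not code from the repository: `mask_noisy_channels` is a hypothetical helper, the outlier test (median absolute deviation of the time-averaged amplitudes) and the 5-sigma threshold are arbitrary choices, and flagged channels are replaced with the median level rather than removed.

```python
import numpy as np


def mask_noisy_channels(dsp, n_sigma=5.0):
    """Replace channels with an outlying mean amplitude by the median level.

    dsp : 2D array of shape (n_nu, n_t)
    Returns the cleaned copy and a boolean array flagging the bad channels.
    """
    mean_amp = dsp.mean(axis=1)
    med = np.median(mean_amp)
    # Robust scatter estimate: 1.4826 * MAD equals sigma for Gaussian noise
    mad = np.median(np.abs(mean_amp - med))
    bad = np.abs(mean_amp - med) > n_sigma * 1.4826 * mad
    cleaned = dsp.copy()
    cleaned[bad] = med
    return cleaned, bad
```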

More features to supply to classifiers

Currently there are only 9 features, and only 2-3 of them are informative (for GBC). Should I use feature selection or let the algorithm decide which are the most informative ones (as GBC does)?

my5spec fails...

CalledProcessError: Command '['../my5spec/./my5spec', '-a', '1', '-n', '64', '-l', '300', '-o', '900', '/mnt/frb_data/raw_data/2015_303_raks12ec/nt/rk12ec_nt_no0004.m5a', 'Mark5B-256-4-2', '/home/ilya/code/frb/frb/rk12ec_nt_no0004_a1n64l300o900_dspec']' returned non-zero exit status 1

WTF? End of file? See issue #11

testing DB+search use case

First, one needs to create the DB with the related tables. Just run candidates.py and it will create frb.db in the code's directory (the location of the DB file can easily be changed). Then run search_candidates.py. It will create fake dynamical spectra, add some fake FRBs, process them with de-dispersion and different pre-processing and search options (using caching of the de-dispersion and pre-processing results), and put the results in frb.db in the tables searched_data & candidates. These 2 tables are related via a one-to-many relation: one searched dynamical spectrum has many candidates. It is an SQLite DB and can be viewed with the Firefox extension SQLite Manager.

``M5.create_dspec`` error

OSError: [Errno 2] No such file or directory

I changed dspec_path in raw_data.py to my local FRB /home/ilya. It tries to use the outfile /home/ilya/rk12ep_ar_no0001_a1n64l0.5_dspec and finally raises OSError.

Better training sample creation

Somehow choose the amplitudes of the training sample so that they are neither too big nor too low to be detected. Probably train classifiers with training samples of different amplitudes. Should amp be a parameter to some method of PulseClasiifier?

Need configuration file or smth.

To store the DB location, the number of frequency channels to make, and the time step. Frequency and channel width could be found/calculated. It would be nice to have a local (test) regime and a production one on frb-host.
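
One possible shape for this is an INI file with one section per regime, read with the standard library parser. The sketch below is only a suggestion: the section names, the key names `db_path`, `n_nu` and `d_t`, the example paths and the `load_config` helper are all hypothetical.

```python
try:
    from configparser import ConfigParser  # Python 3
except ImportError:
    from ConfigParser import ConfigParser  # Python 2

# Example file contents with one section per regime (all values are made up)
EXAMPLE_CONFIG = """\
[local]
db_path = frb.db
n_nu = 64
d_t = 0.001

[production]
db_path = /path/on/frb-host/frb.db
n_nu = 64
d_t = 0.001
"""


def load_config(path, regime):
    """Read the settings of one regime ('local' or 'production') from path."""
    parser = ConfigParser()
    parser.read(path)
    return {
        "db_path": parser.get(regime, "db_path"),
        "n_nu": parser.getint(regime, "n_nu"),
        "d_t": parser.getfloat(regime, "d_t"),
    }
```

With `EXAMPLE_CONFIG` saved as, say, frb.cfg, `load_config("frb.cfg", "local")` would return the test settings and `load_config("frb.cfg", "production")` the ones for frb-host.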
