
An open-source package of utilities for direct modeling of radio dynamic spectra.

Home Page: https://chimefrb.github.io/fitburst/

License: MIT License

Language: Python (100.00%)
Topics: frbs, pulsars, radio-astronomy

fitburst's Introduction

fitburst

This repository contains functions, objects, and scripts for modeling dynamic spectra of dispersed astrophysical signals at radio frequencies.

Installation

Currently, fitburst can be installed by cloning the repo and running pip in the following way:

pc> git clone git@github.com:CHIMEFRB/fitburst.git
pc> cd fitburst
pc/fitburst> pip install . # add the --user option if you want to install into your per-user site-packages.

Usage and Documentation

Please refer to the documentation linked above to find descriptions of the codebase and examples for interacting with it.

Publication

The theory behind the modeling and analysis routines is presented in a publication that is currently under review but available on the arXiv. This publication includes a variety of fitting examples and discussions of the treatment of biasing effects (e.g., intra-channel smearing from pulse dispersion) that can be accounted for within fitburst. If you use this codebase and publish results obtained with it, we ask that you cite the fitburst publication using the following BibTeX entry:

@article{fpb+23,
       author = {{Fonseca}, Emmanuel and {Pleunis}, Ziggy and {Breitman}, Daniela and {Sand}, Ketan R. and {Kharel}, Bikash and {Boyle}, Patrick J. and {Brar}, Charanjot and {Giri}, Utkarsh and {Kaspi}, Victoria M. and {Masui}, Kiyoshi W. and {Meyers}, Bradley W. and {Patel}, Chitrang and {Scholz}, Paul and {Smith}, Kendrick},
        title = "{Modeling the Morphology of Fast Radio Bursts and Radio Pulsars with fitburst}",
      journal = {arXiv e-prints},
     keywords = {Astrophysics - High Energy Astrophysical Phenomena, Astrophysics - Instrumentation and Methods for Astrophysics},
         year = 2023,
        month = nov,
          eid = {arXiv:2311.05829},
        pages = {arXiv:2311.05829},
          doi = {10.48550/arXiv.2311.05829},
archivePrefix = {arXiv},
       eprint = {2311.05829},
 primaryClass = {astro-ph.HE},
       adsurl = {https://ui.adsabs.harvard.edu/abs/2023arXiv231105829F},
      adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

Credits

All authors of the fitburst papers are the founding developers of fitburst, with Emmanuel Fonseca leading the development team. We welcome novel and meaningful contributions from interested users!

This package was built using namoona.

fitburst's People

Contributors

bwmeyers, danielabreitman, emmanuelfonseca, github-actions[bot], kharelb, patilswarali, shinybrar


fitburst's Issues

ensure parameter/attribute naming is consistent

Part of this lies beyond fitburst, but it would be ideal to have the minimum required set of parameters (and their naming conventions!) listed somewhere.

For context, this came up when trying to use the generic reader to load some extracted CHIME/Pulsar filterbank data and follow along with one of the example scripts. The model setup and the attribute naming conventions of the containing object seem to have changed slightly. If we can set down a specification for this, it will make writing the reader classes (and whatever external code is used to make the data input) much easier.
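As an illustration only, a minimal specification might look like the dictionary below; the key names are copied from the burst-parameter printout that appears in a later issue on this page (dm, width, arrival_time, reference_freq, amplitude), and the units in the comments are assumptions rather than a confirmed fitburst requirement.

```python
# Hypothetical sketch of a minimal per-burst parameter specification; the key
# names follow fitburst output shown elsewhere on this page, but the exact
# required set (and units) is what this issue asks to pin down.
minimal_burst_parameters = {
    "dm": [106.4],                # dispersion measure (pc cm^-3), one entry per component
    "arrival_time": [58850.085],  # arrival time at the reference frequency (epoch convention TBD)
    "width": [1.97e-3],           # temporal width (s)
    "reference_freq": [600.0],    # reference frequency (MHz)
    "amplitude": [1.0],           # per-component amplitude (scale convention also TBD)
}
```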

DM keyword not working as expected with new baseband pipeline

Supplying the DM to make_input should skip the S/N-maximizing DM correction and go straight to DM_phase (if fit_DM = True; otherwise, the DM is just the input DM throughout).
Currently, even if a DM is supplied, it is passed to get_snr, which performs the S/N-maximizing DM optimization. This may yield a DM that is so far off that DM_phase is unable to fix it afterward (since it only has a range of +/- 0.15). Example where this occurs: event 203935225.
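For illustration only, the intended logic might look like the sketch below; the helper functions are stand-ins for get_snr's S/N-maximizing search and the DM_phase refinement, not the real baseband-analysis API.

```python
# Hypothetical control flow for the requested behaviour; the helpers are
# placeholders, not functions from the actual pipeline.

def snr_maximizing_dm(data):
    """Stand-in for the S/N-maximizing DM search currently done in get_snr."""
    return 106.4  # placeholder value


def refine_dm_phase(data, dm0, dm_range=0.15):
    """Stand-in for the phase-based refinement; limited to dm0 +/- dm_range."""
    return dm0  # a real implementation would search within the window


def choose_initial_dm(data, dm_input=None, fit_dm=True):
    """Skip the S/N-maximizing search whenever the user supplies a DM."""
    dm0 = dm_input if dm_input is not None else snr_maximizing_dm(data)
    if fit_dm:
        dm0 = refine_dm_phase(data, dm0, dm_range=0.15)
    return dm0
```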

ValueError: x0 is infeasible. Singlebeam file 71665813

/data/user-data/ksand/baseband-analysis/baseband_analysis/core/signal.py in get_weights(power_org, f_id, diagnostic_plots, spectrum_lim)
440
441 if spectrum_lim:
--> 442 f_lim = get_spectrum_lim(f_id, power, diagnostic_plots=diagnostic_plots)
443 if f_lim[1] - f_lim[0] < 20:
444 # Ignore if less than 20 channels are left

/data/user-data/ksand/baseband-analysis/baseband_analysis/core/signal.py in get_spectrum_lim(f_id, power, diagnostic_plots)
350 try:
351 popt, pcov = curve_fit(
--> 352 model_gaussian, f, spect, p0=p0, bounds=bounds, maxfev=1e5
353 )
354 except RuntimeError:

~/.local/lib/python3.7/site-packages/scipy/optimize/minpack.py in curve_fit(f, xdata, ydata, p0, sigma, absolute_sigma, check_finite, bounds, method, jac, **kwargs)
799
800 res = least_squares(func, p0, jac=jac, bounds=bounds, method=method,
--> 801 **kwargs)
802
803 if not res.success:

~/.local/lib/python3.7/site-packages/scipy/optimize/_lsq/least_squares.py in least_squares(fun, x0, jac, bounds, method, ftol, xtol, gtol, x_scale, loss, f_scale, diff_step, tr_solver, tr_options, jac_sparsity, max_nfev, verbose, args, kwargs)
806
807 if not in_bounds(x0, lb, ub):
--> 808 raise ValueError("x0 is infeasible.")
809
810 x_scale = check_x_scale(x_scale, x0)

ValueError: x0 is infeasible.
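The root cause is an initial guess (p0) that lies outside the bounds handed to scipy's curve_fit. A generic, hedged workaround, independent of the baseband-analysis internals, is to clip the guess into the feasible box before fitting:

```python
import numpy as np
from scipy.optimize import curve_fit

def safe_curve_fit(model, x, y, p0, bounds, eps=1e-12):
    """Clip the initial guess into the (lb, ub) box so that the underlying
    least_squares call never raises 'x0 is infeasible'."""
    lb = np.asarray(bounds[0], dtype=float)
    ub = np.asarray(bounds[1], dtype=float)
    p0 = np.clip(np.asarray(p0, dtype=float), lb + eps, ub - eps)
    return curve_fit(model, x, y, p0=p0, bounds=(lb, ub))
```

Clipping silently moves a bad guess onto the boundary, so it guards against the hard failure rather than fixing whatever produced the infeasible guess in the first place.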

infinite reference frequency causes errors

If the reference frequency is stated as not finite (e.g., np.inf) then the following errors appear:

{'dm': [106.4], 'width': [0.0019660800000000003], 'arrival_time': [58850.08545024392], 'reference_freq': [inf], 'amplitude': [1.0]}
/home/bmeyers/code/CHIME-Pulsar/fitburst/fitburst/routines/ism.py:108: RuntimeWarning: divide by zero encountered in power
  sc_time = sc_time_ref * (freq / freq_ref) ** sc_idx
/home/bmeyers/code/CHIME-Pulsar/fitburst/fitburst/routines/ism.py:108: RuntimeWarning: invalid value encountered in multiply
  sc_time = sc_time_ref * (freq / freq_ref) ** sc_idx
/home/bmeyers/code/CHIME-Pulsar/fitburst/fitburst/routines/spectrum.py:55: RuntimeWarning: divide by zero encountered in log
  log_freq = np.log(freqs / freq_ref)
Traceback (most recent call last):
  File "fitburst_example_simulated_data.py", line 31, in <module>
    fitter.fit(data.times, data.freqs, data.data_full)
  File "/home/bmeyers/code/CHIME-Pulsar/fitburst/fitburst/analysis/fitter.py", line 50, in fit
    lsfit = least_squares(
  File "/home/bmeyers/code/CHIME-Pulsar/fitburst/venv/lib/python3.8/site-packages/scipy/optimize/_lsq/least_squares.py", line 824, in least_squares
    raise ValueError("Residuals are not finite in the initial point.")
ValueError: Residuals are not finite in the initial point.

fitburst should realize this and then scale things back to a sensible default, perhaps 1 GHz?
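A hedged sketch of the suggested guard; the 1 GHz fallback comes from the suggestion above and is not an existing fitburst default.

```python
import numpy as np

DEFAULT_REF_FREQ = 1000.0  # MHz; fallback value suggested above, not a fitburst constant

def sanitize_reference_freq(freq_ref, default=DEFAULT_REF_FREQ):
    """Replace a non-finite reference frequency so that expressions such as
    (freq / freq_ref) ** sc_idx and np.log(freqs / freq_ref) stay finite."""
    return float(freq_ref) if np.isfinite(freq_ref) else default
```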

Extracting width per frequency channel (VLBI)

The purpose of this feature is to extract the width per frequency channel, giving a start and stop timestamp. This information will then be used within the VLBI pipeline and applied at the outrigger sites. Applying this custom mask to each data set will reduce the noise and improve the correlation strength.
The idea is to run fitburst as soon as an event has been triggered and received by the outrigger sites, optimize the DM, and share this set of parameters with the outrigger sites (or perhaps a common site?). Then apply the mask and correlate.

  • Run fitburst with baseband data (@ketansand branch)
  • Currently the software only returns a single width parameter; find and extract the full matrix of width parameters per channel
  • Define the width per frequency channel given a central timestamp (per channel) and apply the geometric delay correction for the outrigger sites (see the sketch after this list)
  • Test first with PSR B0531+21 and older data sets, check whether the correlation strength improves
  • Test with the VLBI paper FRB, check whether the correlation strength improves
  • Integrate this feature in the main fitburst package
  • Integrate this feature in the future VLBI pipeline and automate the process
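A minimal sketch of the per-channel start/stop computation referenced in the list above, assuming a per-channel centre time and width are already available from the fit; the window size and everything else here (including the dispersion and geometric-delay handling, which are omitted) are assumptions.

```python
import numpy as np

def per_channel_window(centre_times, widths, n_widths=3.0):
    """Return (start, stop) timestamps per frequency channel, taking a window
    of +/- n_widths * width around each channel's centre time.

    centre_times and widths are 1D arrays with one entry per channel; the
    choice of n_widths = 3 is illustrative, not a fitburst default."""
    centre_times = np.asarray(centre_times, dtype=float)
    widths = np.asarray(widths, dtype=float)
    return centre_times - n_widths * widths, centre_times + n_widths * widths
```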

Small scattering time close to the threshold to change to Gaussian profile model causes a sharp cut in the model.

Problem: when the scattering time is small and close to the threshold to change models (https://github.com/CHIMEFRB/fitburst/blob/main/fitburst/analysis/model.py#L162), compute_profile_pbf malfunctions and creates an unphysical sharp cut in the model spectrum.
Examples:
Events 65547659, 163362088 (R3) and 151677988 (R3).

[Images not reproduced: model spectra for events 65547659, 163362088, and 151677988, each showing the unphysical sharp cut.]

(Temporary) solution:
Here: https://github.com/CHIMEFRB/fitburst/blob/main/fitburst/analysis/model.py#L162
change 0.05 to 0.15. Results for the same three events after implementing this solution:

[Images not reproduced: model spectra for the same three events after raising the threshold; the sharp cut is gone.]
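Purely as an illustration of the switch being tuned here (the real comparison lives in fitburst/analysis/model.py, linked above, and its exact form is not reproduced here), the change amounts to raising the threshold at which the scattered profile is replaced by a plain Gaussian:

```python
# Illustrative only: assume the model drops the pulse-broadening function (PBF)
# once the scattering timescale is small relative to the burst width.  The real
# condition in model.py may differ; this just shows where 0.05 -> 0.15 enters.
GAUSSIAN_SWITCH_THRESHOLD = 0.15  # raised from 0.05 per the temporary fix above

def use_gaussian_profile(sc_time, burst_width, threshold=GAUSSIAN_SWITCH_THRESHOLD):
    """Return True when scattering is negligible enough to skip compute_profile_pbf."""
    return sc_time < threshold * burst_width
```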

have DM be a fittable parameter

as of this posting, the new fitburst can fit all per-burst morphological parameters (spectral index/running, arrival time, width, and amplitude). however, we need to add DM as a fittable parameter, which basically means either replicating what the production/on-site fitburst does or using a more efficient algorithm.

whatever is decided, this should largely entail (a) supplying the LSFitter (and, by extension, the SpectrumModeler) with the appropriate timestamp array in order to dedisperse to the loaded DM, and (b) adding a for-loop and a call to the DM time-delay function in fitburst/routines/. i will experiment with this over the next week or two, with the goal of getting a working version by the end of February.
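For reference, the cold-plasma time delay that such a DM routine would compute is the standard dispersion relation; the sketch below is a generic implementation, not the actual function in fitburst/routines/.

```python
# Generic cold-plasma dispersion delay; not the fitburst routine itself.
K_DM = 4.148808  # ms GHz^2 cm^3 pc^-1 (dispersion constant)

def dm_delay_ms(dm, freq_ghz, ref_freq_ghz):
    """Delay (ms) of a signal at freq_ghz relative to ref_freq_ghz for a
    dispersion measure dm (pc cm^-3); positive when freq_ghz < ref_freq_ghz."""
    return K_DM * dm * (freq_ghz ** -2 - ref_freq_ghz ** -2)
```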

have CHIME/Pulsar filterbank data be read into `fitburst`

hi @bwmeyers, @cherryng,

it is totally possible (and part of my intention) to have fitburst be run on CHIME/Pulsar data. this would essentially involve writing a separate DataReader class in the fitburst/backends/ area for CHIME/Pulsar, which will handle the I/O and gathering of metadata needed for the rest of fitburst processing.

it will be best to first decide the format of the input data. for example, did you want fitburst to read directly from filterbank files? or is it instead possible to have the burst be extracted and stored into, for example, a .npz format? also, would you want to stash a dedispersed spectrum into a file as a matrix? or instead record a trimmed, raw (i.e., not dedispersed) filterbank segment centered on the burst?

finally, is it possible to get a sense of what metadata can be provided to fitburst by the input filterbank data? for example, can we supply a best-guess DM, an arrival time at some reference frequency, and a boxcar width?

once we have a sense of what the input data can look like, we can chat about how to move forward on developing the structure necessary for getting fitburst to run on Pulsar data.
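Purely as a sketch of the second option above (an extracted burst stored in .npz), the snippet below packs a cutout plus minimal metadata; the key names are illustrative assumptions and not the documented "generic" fitburst format.

```python
import numpy as np

def save_burst_cutout(filename, spectrum, freqs_mhz, res_time_s, times_bin0,
                      dm_guess, width_guess_s):
    """Write a burst cutout (nfreq x ntime array) and minimal metadata to .npz.

    The field names here are placeholders; the actual keys expected by the
    generic DataReader are defined in the fitburst documentation."""
    np.savez(
        filename,
        data_full=np.asarray(spectrum),  # dynamic spectrum, frequency x time
        freqs=np.asarray(freqs_mhz),     # channel centre frequencies (MHz)
        res_time=float(res_time_s),      # time resolution (s)
        times_bin0=float(times_bin0),    # epoch of the first time bin
        dm=float(dm_guess),              # best-guess DM (pc cm^-3)
        width=float(width_guess_s),      # best-guess boxcar width (s)
    )
```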

Clarity of reference times in documentation

One thing I noted in the recent PR #19 was that there's some ambiguity in the metadata requested and how it's used.

Specifically, the arrival times are to be given in seconds, but seconds since what reference epoch? Is that what times_bin0 is used for? What time format is times_bin0 assumed to be in? Is the arrival_time relative to times_bin0 or to some other reference epoch? Internally we know it's relative to times_bin0, since we create a times array from 0 to num_times * res_time seconds, but it's best to state that explicitly.

e.g., "a floating-point scalar indicating the value of the time bin at index 0, in seconds" is ambiguous - we should be very explicit: is it Unix time, MJD, JD, etc.?

Component-wise DM correction

Problem: Certain multi-component bursts seem to have components with different DMs. It would be useful to add a feature that corrects the DM for every component separately. I think this feature can only work when there is some space around each component; if the components are too close to one another, this may be trickier. I think it should be an on/off feature (off by default) that someone can turn on if they notice the DMs are different and the pulses are far enough apart.

Example: Event 65547659 (see image)
[Image not reproduced: dynamic spectrum of event 65547659 showing components with apparently different DMs.]

Fitburst file location:/data/user-data/danielabreitman/fitburst_65547659.npz on frb-analysis.

RuntimeError: Signal not found in the data - Singlebeam file 171855709

/usr/local/lib/python3.7/site-packages/fitburst/backend/baseband.py in fit_profile(path, nwalkers, nchain, ncores, time_range, downsample, downsample2, DM, fit_DM, logl, save, spectrum_lim, fill_missing_time, show_chains, peaks, diagnostic_plots)
252 # Fit pulse profile
253 event_id = path.split('/')[-2].split('_')[-1]
--> 254 data, freq_id, freq, power, valid_channels, DM, DM_err, downsampling_factor, profile, t_res, peak_times, start, full_power = cut_profile(path, diagnostic_plots = diagnostic_plots, time_range = time_range, fill_missing_time = fill_missing_time, downsample = downsample, downsample2 = downsample2, DM = DM, fit_DM = fit_DM, peaks = peaks, spectrum_lim = spectrum_lim, return_full = True)
255 profile_pars, profile_pars_errs = fit_emg_mcmc(profile, t_res * np.arange(len(profile)), peaks=peak_times, logl = logl, res = t_res, nwalkers=nwalkers,
256 nchain=nchain, event_id=event_id, ncores=ncores, show_chains = show_chains,

/usr/local/lib/python3.7/site-packages/fitburst/backend/baseband.py in cut_profile(path, downsample, downsample2, peaks, time_range, DM, fit_DM, spectrum_lim, fill_missing_time, return_full, diagnostic_plots)
82 diagnostic_plots=False,
83 spectrum_lim = spectrum_lim,
---> 84 return_full=True
85 )
86 dm_max_range = 0.3

/data/user-data/ksand/baseband-analysis/baseband_analysis/analysis/snr.py in get_snr(data, DM, diagnostic_plots, w, valid_channels, time_range, return_full, spectrum_lim, DM_range, downsample, refine_RFI, fill_missing_time, thres_mean, thres_std, doublecheck_RFI, DM_step, raise_missing_signal, check_old_process, check_channel_number)
255 error_msg = "Signal not found in the data"
256 if raise_missing_signal:
--> 257 raise RuntimeError(error_msg)
258 else:
259 log.warning(error_msg)

RuntimeError: Signal not found in the data

`mkdocs` not required for install

The packages listed in setup.cfg should be the bare minimum required so that fitburst installs and is runnable. mkdocs does not fit that picture, as its only purpose is to generate the documentation, which can be found online.

(As it happens it also causes issues on Cedar when trying to install several newer versions of system-default packages.)

ValueError: array must not contain infs or NaNs

This error comes up when we do spectral fitting

```
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_11914/3411376667.py in <module>
     27            save = 'pm_', # Prefix for saving MCMC results
     28            diagnostic_plots=True,
---> 29            output = fit_file #prefix for fitburst npz input file
     30           )

/usr/local/lib/python3.7/site-packages/fitburst/pipelines/make_input_file.py in make_input(path, peaks, nwalkers, nchain, ncores, show_chains, DM, fit_DM, logl, save, diagnostic_plots, downsample2, downsample, spectrum_lim, fill_missing_time, ref_freq, time_range, output)
     71                    downsample = downsample, DM = DM, fit_DM = fit_DM, logl = logl, peaks = peaks,
     72                    fill_missing_time = fill_missing_time, save = save, spectrum_lim = spectrum_lim,
---> 73                    time_range = time_range, show_chains=show_chains, diagnostic_plots=diagnostic_plots)
     74 
     75     params, scattering_t = np.array(params[:-1]), params[-1]

/usr/local/lib/python3.7/site-packages/fitburst/backend/baseband.py in fit_profile(path, nwalkers, nchain, ncores, time_range, downsample, downsample2, DM, fit_DM, logl, save, spectrum_lim, fill_missing_time, show_chains, peaks, diagnostic_plots)
    314         spectrum_pars, spectrum_errs = fit_rpl_mcmc(spectrum, freq, peaks=spect_peaks, res = 400. / 1024., nwalkers=100, 
    315                  nchain=5000, event_id=event_id, ncores=ncores, show_chains = show_chains,
--> 316                  diagnostic_plots=diagnostic_plots
    317                  )
    318         amps[i] = spectrum_pars[0]

/usr/local/lib/python3.7/site-packages/fitburst/analysis/mcmc_fitter.py in fit_rpl_mcmc(profile, xvals, peaks, res, nwalkers, nchain, ncores, ICs, event_id, show_chains, diagnostic_plots)
    363     for i in range(len(profile)):
    364         print(profile[i])
--> 365     fit, popt, popt_e,pcov = fitter.fit_LS(profile, xvals, peaks, event_id, ICs = ICs, model='rpl', res=res)
    366     tau, flat_samples, final_pars, final_errs = mcmc(popt, 
    367         profile, xvals, res,event_id = event_id, nwalkers=nwalkers, nchain=nchain,

/usr/local/lib/python3.7/site-packages/fitburst/analysis/fitter.py in fit_LS(profile, xvals, peaks, event_id, model, res, ICs, tight, bounds, diagnostic_plots)
    402     plt.plot(x,f(x,ICs),color = 'orange')
    403     plt.show()
--> 404     popt,pcov = curve_fit(f,x, y, p0=ICs, bounds = (lb, ub),maxfev = 100000000)
    405     if diagnostic_plots:
    406         print('Curve_fit params: ' + str(popt))

~/.local/lib/python3.7/site-packages/scipy/optimize/minpack.py in curve_fit(f, xdata, ydata, p0, sigma, absolute_sigma, check_finite, bounds, method, jac, **kwargs)
    799 
    800         res = least_squares(func, p0, jac=jac, bounds=bounds, method=method,
--> 801                             **kwargs)
    802 
    803         if not res.success:

~/.local/lib/python3.7/site-packages/scipy/optimize/_lsq/least_squares.py in least_squares(fun, x0, jac, bounds, method, ftol, xtol, gtol, x_scale, loss, f_scale, diff_step, tr_solver, tr_options, jac_sparsity, max_nfev, verbose, args, kwargs)
    928         result = trf(fun_wrapped, jac_wrapped, x0, f0, J0, lb, ub, ftol, xtol,
    929                      gtol, max_nfev, x_scale, loss_function, tr_solver,
--> 930                      tr_options.copy(), verbose)
    931 
    932     elif method == 'dogbox':

~/.local/lib/python3.7/site-packages/scipy/optimize/_lsq/trf.py in trf(fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev, x_scale, loss_function, tr_solver, tr_options, verbose)
    123         return trf_bounds(
    124             fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev, x_scale,
--> 125             loss_function, tr_solver, tr_options, verbose)
    126 
    127 

~/.local/lib/python3.7/site-packages/scipy/optimize/_lsq/trf.py in trf_bounds(fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev, x_scale, loss_function, tr_solver, tr_options, verbose)
    299             J_h = J_augmented[:m]  # Memory view.
    300             J_augmented[m:] = np.diag(diag_h**0.5)
--> 301             U, s, V = svd(J_augmented, full_matrices=False)
    302             V = V.T
    303             uf = U.T.dot(f_augmented)

~/.local/lib/python3.7/site-packages/scipy/linalg/decomp_svd.py in svd(a, full_matrices, compute_uv, overwrite_a, check_finite, lapack_driver)
    106 
    107     """
--> 108     a1 = _asarray_validated(a, check_finite=check_finite)
    109     if len(a1.shape) != 2:
    110         raise ValueError('expected matrix')

~/.local/lib/python3.7/site-packages/scipy/_lib/_util.py in _asarray_validated(a, check_finite, sparse_ok, objects_ok, mask_ok, as_inexact)
    291             raise ValueError('masked arrays are not supported')
    292     toarray = np.asarray_chkfinite if check_finite else np.asarray
--> 293     a = toarray(a)
    294     if not objects_ok:
    295         if a.dtype is np.dtype('O'):

/usr/local/lib/python3.7/site-packages/numpy/lib/function_base.py in asarray_chkfinite(a, dtype, order)
    487     if a.dtype.char in typecodes['AllFloat'] and not np.isfinite(a).all():
    488         raise ValueError(
--> 489             "array must not contain infs or NaNs")
    490     return a
    491 

ValueError: array must not contain infs or NaNs
```
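The immediate trigger is non-finite values (e.g., from masked or zero-weight channels) reaching scipy's least-squares machinery. A generic mitigation, offered only as a sketch and not as the fix adopted in fitburst, is to drop non-finite samples before the spectral fit:

```python
import numpy as np

def finite_only(xvals, profile):
    """Return only the samples where both arrays are finite, so that the
    downstream curve_fit / least_squares call never sees NaNs or infs."""
    xvals = np.asarray(xvals, dtype=float)
    profile = np.asarray(profile, dtype=float)
    keep = np.isfinite(xvals) & np.isfinite(profile)
    return xvals[keep], profile[keep]
```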

add functionality for fitting an arbitrary number of burst components

the current fitburst only fits for a single component, though the parameter definitions allow for setting multiple values per parameter. add the appropriate functionality to use this for fitting multiple burst components in the manner done by CHIME/FRB (i.e., DM and scattering parameters are "global" parameters; all others are per-burst).
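To make the intended layout concrete, a two-component parameter set might look like the sketch below; the key names are indicative only, while the split into one shared value for DM/scattering and one value per component for everything else follows the CHIME/FRB convention described above.

```python
# Illustrative two-component parameter layout (key names are placeholders):
# "global" parameters carry a single value, per-burst parameters carry one
# value per component.
two_component_parameters = {
    "dm": [106.4],                              # global: shared by all components
    "scattering_timescale": [2.0e-3],           # global: shared by all components
    "arrival_time": [58850.0854, 58850.0861],   # per component
    "width": [1.9e-3, 2.4e-3],                  # per component
    "amplitude": [1.0, 0.6],                    # per component
    "spectral_index": [-1.5, -0.8],             # per component
}
```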

add capability to plot summary of fit

I have been adding functionality to downsample 1D and 2D data, partly to enable plotting of data/model/residuals in a manner similar to the on-site/production version of fitburst. once these are tested, i will then mostly import the triptych routine written by @zpleunis to create fit summaries.
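As a rough, generic illustration of what such a data/model/residuals summary could look like (this is not the @zpleunis triptych routine):

```python
import matplotlib.pyplot as plt

def plot_triptych(data, model, freqs, times):
    """Three-panel fit summary: data, best-fit model, and residuals.
    data and model are (nfreq x ntime) arrays on the same grid."""
    residuals = data - model
    extent = [times[0], times[-1], freqs[0], freqs[-1]]
    fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharex=True, sharey=True)
    for axis, image, title in zip(axes, (data, model, residuals),
                                  ("data", "model", "residuals")):
        axis.imshow(image, aspect="auto", origin="lower", extent=extent)
        axis.set_title(title)
        axis.set_xlabel("time (s)")
    axes[0].set_ylabel("frequency (MHz)")
    fig.tight_layout()
    return fig
```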

create a pre-fit algorithm for producing refined guesses of time and amplitude parameters

currently, fitburst uses a set of user-generated values of the model parameters as "initial guesses" for the fitting portion of the pipeline. (these values are supplied to fitburst from the input .npz files described in the documentation.) however, it is often useful to refine these guesses further. this will be most important for "multi-component" spectra, i.e., profiles with several distinct "sub-bursts" in frequency and time.

let's add some code that uses one or more methods for determining initial guesses from the input spectrum after data are read in via the DataReader. the code can live in the fitburst/analysis section of the codebase and should be importable and used optionally if specified with a command-line option.

for starters, let's develop a Python object (likely a class) that can be invoked to run (at least) a peak-finding algorithm as a method within the class; I suggest looking into, and considering using, the find_peaks() function from the scipy package for our first method.

I've created a test data set for you to develop the algorithm against; I'll generate more data for us to test against once we have a working algorithm, but for starters use the data located here: /data/user-data/efonseca/data/fitburst/frb/code_development. the file format of these data is the same as the fitburst-compliant "generic" format, so that you can read in these data using the generic DataReader:

(chimefrb-py3) [efonseca@frb-analysis code_development]$ python
...
>>> from fitburst.backend.generic import DataReader
>>> data = DataReader("test_data_CHIMEFRB_77049697.npz")
>>> data.load_data()
>>> data.data_full.max(), data.data_full.min()
(1.991973876953125, -0.5901955366134644)

a summary file of the test data -- from the CHIME/FRB event with ID 77049697 -- is posted here:

summary 77049697
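A minimal sketch of the kind of pre-fit guess generator described above, built around scipy.signal.find_peaks on the band-averaged profile; the class name, its interface, and the significance threshold are placeholders rather than an agreed design.

```python
import numpy as np
from scipy.signal import find_peaks

class PeakGuesser:
    """Placeholder pre-fit helper: estimate arrival-time and amplitude guesses
    from a dynamic spectrum by peak-finding on the band-averaged profile."""

    def __init__(self, min_significance=5.0):
        self.min_significance = min_significance

    def guess(self, spectrum, times):
        """spectrum is (nfreq x ntime); returns candidate arrival times and
        peak amplitudes, one entry per detected component."""
        profile = np.nanmean(spectrum, axis=0)
        profile = profile - np.nanmedian(profile)  # crude baseline removal
        noise = np.nanstd(profile)                 # crude noise estimate
        peaks, properties = find_peaks(profile, height=self.min_significance * noise)
        return np.asarray(times)[peaks].tolist(), properties["peak_heights"].tolist()
```

Run against the test file above (test_data_CHIMEFRB_77049697.npz read via the generic DataReader), something like this would return lists that could seed the arrival-time and amplitude guesses for each detected component.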

add a `save_results` method to `DataReader` class

we should add some functionality to have fitburst record or push fit results to user-specified places. for example, a general user will likely want things recorded to ASCII or .npz, whereas CHIME/FRB will want results reported to their database. my current thinking is to add a generic method to the DataReader base class, and then overload this in the fitburst/backend/chimefrb.py version of the DataReader to push things to FRBMaster.
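A hedged sketch of where such a generic method could live; the method name comes from this issue, while the class body and file handling below are assumptions to be replaced by the real design (and the CHIME/FRB reader would override it to push to FRBMaster).

```python
import json
import numpy as np

class DataReader:
    """Trimmed-down stand-in for the fitburst DataReader base class, shown
    only to illustrate a generic save_results method."""

    def save_results(self, results: dict, filename: str) -> None:
        """Write best-fit results to .npz or JSON depending on the extension;
        backend-specific readers (e.g., fitburst/backend/chimefrb.py) would
        override this to report results to their own database instead."""
        if filename.endswith(".npz"):
            np.savez(filename, **results)
        else:
            serializable = {key: np.asarray(value).tolist() for key, value in results.items()}
            with open(filename, "w") as handle:
                json.dump(serializable, handle, indent=2)
```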

Repository Updates

  • Move fitburst to solely use a pyproject.toml config
  • Update dependencies for tests, chimefrb extras, and docs
  • Add a pre-commit config for automated code checks
  • Add a GitHub Action to run pre-commit and pytest
  • Add a GitHub Action to automatically create a release

Once these actions are done, we can add fitburst as a dependency in chimefrb/baseband-analysis

Please supply DM value error

If we don't supply a DM value, we get this error. It should automatically look up the value if the file is coherently de-dispersed; if not, it should perform coherent dedispersion first.

Add estimate on amplitude

Some bursts are not being fit with the correct amplitude. We can provide the guess from our MCMC fits. This affects very bright events.
[Screenshot not reproduced: a bright event with a poorly fitted amplitude.]

design and add structure for computing post-fit statistics

the current fitburst only computes an F-test based on separate fits where scattering is either directly modeled or ignored entirely. however, there are a number of other statistical tests to be performed, whether for model comparison and/or quality of fit (e.g., tests of Gaussianity in the best-fit residuals). we should design and implement code to perform these tests, probably as a separate class that receives several inputs from the best-fit model and computes a variety of statistical measures.
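As one hedged example of the residual tests mentioned above (a sketch, not a design decision), such a class could start with a simple chi-squared summary and a normality test on the residuals:

```python
import numpy as np
from scipy import stats

class FitStatistics:
    """Illustrative post-fit statistics container; the layout is an assumption,
    not the agreed fitburst design."""

    def __init__(self, data, model):
        self.residuals = np.asarray(data, dtype=float) - np.asarray(model, dtype=float)

    def chisq_per_sample(self, noise_std):
        """Chi-squared per finite sample (degrees of freedom ignored for brevity)."""
        finite = self.residuals[np.isfinite(self.residuals)]
        return float(np.sum((finite / noise_std) ** 2) / finite.size)

    def gaussianity(self):
        """D'Agostino-Pearson normality test on the finite residuals;
        returns (statistic, p-value)."""
        finite = self.residuals[np.isfinite(self.residuals)].ravel()
        return stats.normaltest(finite)
```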
