franpoz / sherlock
Easy and versatile open-source code to explore Kepler, K2 and TESS data in the search for exoplanets.
License: MIT License
This would be useful for batch executions where the user is not very familiar with the content of the light curves.
Remove the binning plot from the detrending plots.
It does not offer much information; it is better to keep only the corrected, un-binned plots.
Since we apply a Savitzky-Golay filter for short-cadence data, we could provide a hook function, as we do for ois filtering, so the user can apply any custom processing before SHERLOCK begins with the runs.
We could add a setup method allowing the user to override the SignalSelector interface and provide their own implementation to SHERLOCK.
Save the time, flux and flux_err after SHERLOCK operations in a CSV file so the vetting and fitting steps can proceed afterwards.
For a given signal that we want to vet and fit, we need as inputs its period and T0, which have been provided by SHERLOCK previously.
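The save step above could look like the minimal sketch below. The file name and function name are illustrative, not SHERLOCK's actual API; the idea is simply to persist the post-SHERLOCK arrays so the later vetting/fitting steps can reload them.

```python
import csv

def save_lightcurve(path, time, flux, flux_err):
    # Persist the arrays left after SHERLOCK operations so the vetting and
    # fitting steps can reload them later without re-running the search.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["#time", "flux", "flux_err"])
        for row in zip(time, flux, flux_err):
            writer.writerow(row)

save_lightcurve("lc.csv", [0.0, 0.0104], [1.0009, 0.9997], [0.002, 0.002])
```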
That way the pipeline could be executed from any path and it would always find the data, and thus it wouldn't need to download it again.
We are generating a considerable amount of plots in the results. It is more practical to save each SHERLOCK run independently in its own folder.
Like habitable zones or optimistic zones, ice limit, tidal lock range etc
As LATTE plots are not very useful for shallow transits, we might add our own custom plots of the transits with the flux values limited by a depth value. That way we could focus the view on the values that we really want to explore.
We are using only the essential parameters for the fit. We could add more, like the limb-darkening parameters and others that are not critical for the fit but could help increase its accuracy.
Validation is currently anecdotal and needs to be exposed at least at the user-input level.
Right now we choose the lowest period among the strongest ones. It might be better to just choose the strongest one, or to think about a different approach.
One of the analysis steps not yet solved by SHERLOCK is the visual inspection of every detrend result for each run. We could add a new mechanism to reject false positives returned by TLS by analyzing the environment of the found transit. For instance, we could assess sinusoidal trends, high signal variability, empty measurements around the transit...
Sometimes SHERLOCK leaves one or several signals that could also be relevant out of the final list of promising candidates. Thus, the planet radius and the r_star/r_planet ratio would be useful when printed beside each found signal for each detrend.
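A sketch of the radius computation to print beside each signal, assuming the TLS depth ratio (rp/rs) and a stellar radius in solar units are available; the function name and the rounded constant are ours:

```python
# Solar-to-Earth radius ratio (approximate constant).
R_SUN_IN_R_EARTH = 109.2

def planet_radius_rearth(rp_rs, r_star_rsun):
    """Rough planet radius in Earth radii from rp/rs and R_star in solar radii."""
    return rp_rs * r_star_rsun * R_SUN_IN_R_EARTH

radius = planet_radius_rearth(0.0105, 1.02)  # roughly 1.17 Earth radii
```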
In order to apply a proper fitting of the transits via [ALLESFITTER](#25), we need to provide the ratio between the planet and the star. This can be easily written to the log files using the results.rp_rs value from TLS.
We should include an option to run a vetting similar to how it is done in LATTE.
This way we avoid, for instance, limiting ourselves to 2 sectors which are too far apart in time.
Example for TIC 299798795:
tls_results.folded_phase[np.argwhere(tls_results.folded_phase_model < 1)]
This returns a range between 0.495 and 0.505 for the period of 4.17, which would be a duration of approximately 60 minutes. However, the TLS-returned duration is 13 minutes.
Possible bug in transitleastsquares.
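The cross-check above is just arithmetic on the folded-phase width; a small helper (name ours) makes it explicit:

```python
def duration_minutes(phase_lo, phase_hi, period_days):
    # A transit spanning (phase_hi - phase_lo) of the folded phase lasts that
    # same fraction of the orbital period.
    return (phase_hi - phase_lo) * period_days * 24 * 60

# TIC 299798795 numbers from above: phases 0.495-0.505, P = 4.17 d
d = duration_minutes(0.495, 0.505, 4.17)  # ~60 min, vs the 13 min TLS reports
```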
When running TIC 467179258, a signal is detected in the 3rd run for which all the detected transits are at borders. However, the border score indicates that they are not (border score = 1.00).
This needs further exploration.
win_size Period Per_err N.Tran Mean Depth (ppt) T. dur (min) T0 SNR SDE FAP Border_score
PDCSAP_FLUX 32.01687 0.104577 1 1.899 138.0 1710.2785 8.614 24.781 8.0032e-05 1.00
Example:
vet.py --object_dir MIS_TIC_307210830_all --candidate 5
Traceback (most recent call last):
File "/home/martin/git_repositories/sherlockpipe/sherlockpipe/vet.py", line 162, in __process
dec, self.args)
File "/home/martin/git_repositories/sherlockpipe/watson/lib/python3.6/site-packages/LATTE/LATTEbrew.py", line 363, in brew_LATTE
ldv.LATTE_DV(tic, indir, syspath, transit_list, sectors_all, target_ra, target_dec, tessmag, teff, srad, [0], [0], tpf_corrupt, astroquery_corrupt, FFI = False, bls = False, model = model, mpi = args.mpi)
File "/home/martin/git_repositories/sherlockpipe/watson/lib/python3.6/site-packages/LATTE/LATTE_DV.py", line 89, in LATTE_DV
TOI = (float(TOIpl["Full TOI ID"]))
File "/home/martin/git_repositories/sherlockpipe/watson/lib/python3.6/site-packages/pandas/core/series.py", line 129, in wrapper
raise TypeError(f"cannot convert the series to {converter}")
TypeError: cannot convert the series to <class 'float'>
couldn't download TP but continue anyway
The first draft is already posted in the paper folder. It needs further details and development.
The intention is to submit to https://joss.theoj.org
To run the vetting via the LATTE package, an old version of matplotlib must be used. We may use the same version for the whole of SHERLOCK to avoid needing a virtual env for the vetting part.
Harmonics, subharmonics and the source signal can be detected by several detrends in the same run. Maybe we can enhance the signal scoring by detecting whether a found signal is a harmonic or subharmonic of others found in the same run.
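The harmonic/subharmonic check could be sketched as below; the function name, multiple limit and tolerance are our assumptions, not an existing SHERLOCK API:

```python
def is_harmonic(period, reference, max_multiple=4, tol=0.01):
    """True if period is close to reference * n or reference / n for integer n."""
    for n in range(1, max_multiple + 1):
        for candidate in (reference * n, reference / n):
            # Relative tolerance so the check scales with the period value.
            if abs(period - candidate) / candidate < tol:
                return True
    return False

is_harmonic(2.0, 4.0)   # subharmonic (P = P_ref / 2) -> True
is_harmonic(3.1, 4.0)   # unrelated period -> False
```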
The current nomenclature for working folders is MIS_TIC XXXXX, which is no longer convenient.
Let's move to a new one such as TICXXXXX_2MIN or TICXXXXXX_FFI, etc.
It seems like a TLS issue, but I am creating it here until we confirm it. It happened in the first run of this execution:
sherlock = Sherlock([]).setup_detrend(initial_rms_threshold=2.5, initial_rms_bin_hours=5)\
.setup_transit_adjust_params(snr_min=8, period_min=0.5, period_max=70, max_runs=8)\
.load_ois().filter_hj_ois().limit_ois(7, 8).run()
The exception was:
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/martin/.local/lib/python3.6/site-packages/transitleastsquares/core.py", line 153, in search_period
duration_min_in_samples = int(numpy.floor(duration_min * len(y)))
ValueError: cannot convert float NaN to integer
Give several examples of the usage of Sherlock through sherlock_user_properties.python
From MIS_TIC_XXXXXX_all to TICXXXXXX_2MIN_all
So the algorithm is open to new definitions given by any user without needing to push their custom solution into the main branch.
We need to update how YAML files are loaded. The current version yields this warning:
/usr/local/lib/python3.6/dist-packages/sherlockpipe/__main__.py:16: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
sherlock_user_properties = yaml.load(open(resources_dir + "/" + 'properties.yaml'))
/usr/local/lib/python3.6/dist-packages/sherlockpipe/__main__.py:17: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
user_properties = yaml.load(open(args.properties))
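The fix is to pass an explicit Loader (or use yaml.safe_load). A minimal self-contained demo, assuming PyYAML is installed; the file name and contents here are just for illustration:

```python
import yaml

# Write a tiny properties file for the demo.
with open("properties.yaml", "w") as f:
    f.write("CPU_CORES: 4\n")

# An explicit (safe) Loader silences the YAMLLoadWarning shown above and
# avoids the unsafe default loader.
with open("properties.yaml") as properties_file:
    user_properties = yaml.load(properties_file, Loader=yaml.SafeLoader)
# yaml.safe_load(properties_file) is an equivalent shorthand.
```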
That way long executions which are interrupted can be resumed instead of restarted.
Complete the list of required installed packages with allesfitter and yaml.
To provide an in-depth vetting of a promising signal, we should consider adapting the LATTE criteria.
tls_results.power represents the SDE vs. period of every detrended signal. We could look for every found signal period in the other detrended signals' power spectra to see whether it is also strong enough there (even if it is not the strongest for its source signal).
After a detection by SHERLOCK, the solution should be improved with a proper fit to obtain the best possible ephemeris for ground-based follow-up. We might use Allesfitter/Juliet/Exofast for that.
Right now auto-detrend only works for one period. It would be a great addition to detrend or subtract pulsations from pulsating stars. For this we can follow the asteroseismology examples from Lightkurve. To do that, we need to automatically characterize the kind of object we are processing so as to select the proper detrending algorithm.
For fast rotators, auto-detrend should be done only for the strongest period and not for the shortest one within a range of strong periods.
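Selecting the strongest period instead of the shortest one is a one-liner over the periodogram arrays; the names `periods` and `power` here stand in for whatever periodogram output is used:

```python
import numpy as np

# Pick the strongest periodogram peak instead of the shortest strong period.
periods = np.array([0.8, 1.6, 3.2])   # candidate rotation periods (days)
power = np.array([12.0, 30.0, 18.0])  # corresponding peak strengths
strongest_period = periods[np.argmax(power)]  # 1.6, not the shortest (0.8)
```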
The current version of the file that one needs for the fitting looks like:
# This example YAML file illustrates the provisioning of planet and star parameters. The star mass here is important
# because the semi-major axis is not being provided and thus, it'd need to be calculated from the period and the M_star.
settings:
  cpus: 7
planet:
  id: 231663901
  period: 1.4309133565904666
  t0: 1326.004206688168
  rp_rs: 0.10455530183261129
star:
  R_star: 1.02
  M_star: 1.05
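The semi-major axis computation mentioned in the file's comment is Kepler's third law; in solar units it reduces to a[AU] = (M_star[M_sun] * P[yr]^2)^(1/3). A sketch with the values from the YAML above (function name ours):

```python
def semi_major_axis_au(period_days, m_star_msun):
    """Kepler's third law in solar units: a[AU] = (M * P_yr^2)^(1/3)."""
    period_years = period_days / 365.25
    return (m_star_msun * period_years ** 2) ** (1.0 / 3.0)

# Values from the YAML above: P = 1.43091 d, M_star = 1.05 M_sun
a = semi_major_axis_au(1.4309133565904666, 1.05)  # ~0.025 AU
```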
But we also need to use the errors given by TLS, at least for the period. For the case of T0, TLS does not provide any error, but I would say that we can use a reasonable value of +-0.01 days.
This is useful not only for reviewing reports but also for subsequent fitting.
To obtain the best possible solution once we have the TLS results, we need to fit the data properly. We choose Allesfitter.
We will need three files:
1) photometric file: TESS.csv
It should be a comma-separated CSV file with three columns: time, flux, flux_err
# time,flux,flux_err
0.000000000000000000e+00,1.000993428306022448e+00,2.000000000000000042e-03
1.041666666666666609e-02,9.997246288021938154e-01,2.000000000000000042e-03
2.083333333333333218e-02,1.001297691868046513e+00,2.000000000000000042e-03
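Generating TESS.csv in exactly that format is straightforward with numpy; the sample values below are taken from the excerpt above:

```python
import numpy as np

# Write the allesfitter photometric input in the format shown above.
time = np.array([0.0, 0.0104166667, 0.0208333333])
flux = np.array([1.0009934, 0.9997246, 1.0012977])
flux_err = np.full_like(time, 2e-3)  # constant 2 ppt uncertainty

# Default '%.18e' format matches the scientific notation in the excerpt;
# comments="# " reproduces the "# time,flux,flux_err" header line.
np.savetxt("TESS.csv", np.column_stack((time, flux, flux_err)),
           delimiter=",", header="time,flux,flux_err", comments="# ")
```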
2) settings.csv
This file can be constant for all our trials; in principle there is no need to change many things. It looks like:
#name,value
###############################################################################,
# General settings,
###############################################################################,
companions_phot,b
companions_rv,
inst_phot,Leonardo
inst_rv,
###############################################################################,
# Fit performance settings,
###############################################################################,
multiprocess,True
multiprocess_cores,4
fast_fit,True
fast_fit_width,0.3333333333333333
secondary_eclipse,False
phase_curve,False
shift_epoch,True
inst_for_b_epoch,all
###############################################################################,
# MCMC settings,
###############################################################################,
mcmc_nwalkers,100
mcmc_total_steps,2000
mcmc_burn_steps,1000
mcmc_thin_by,1
###############################################################################,
# Nested Sampling settings,
###############################################################################,
ns_modus,dynamic
ns_nlive,500
ns_bound,single
ns_sample,rwalk
ns_tol,0.01
###############################################################################,
# Limb darkening law per object and instrument,
# if 'lin' one corresponding parameter called 'ldc_q1_inst' has to be given in params.csv,
# if 'quad' two corresponding parameter called 'ldc_q1_inst' and 'ldc_q2_inst' have to be given in params.csv,
# if 'sing' three corresponding parameter called 'ldc_q1_inst'; 'ldc_q2_inst' and 'ldc_q3_inst' have to be given in params.csv,
###############################################################################,
host_ld_law_Leonardo,quad
###############################################################################,
# Baseline settings per instrument,
# baseline params per instrument: sample_offset / sample_linear / sample_GP / hybrid_offset / hybrid_poly_1 / hybrid_poly_2 / hybrid_poly_3 / hybrid_pol_4 / hybrid_spline / hybrid_GP,
# if 'sample_offset' one corresponding parameter called 'baseline_offset_key_inst' has to be given in params.csv,
# if 'sample_linear' two corresponding parameters called 'baseline_a_key_inst' and 'baseline_b_key_inst' have to be given in params.csv,
# if 'sample_GP' two corresponding parameters called 'baseline_gp1_key_inst' and 'baseline_gp2_key_inst' have to be given in params.csv,
###############################################################################,
baseline_flux_Leonardo,hybrid_offset
###############################################################################,
# Error settings per instrument,
# errors (overall scaling) per instrument: sample / hybrid,
# if 'sample' one corresponding parameter called 'log_err_key_inst' (photometry) or 'log_jitter_key_inst' (RV) has to be given in params.csv,
###############################################################################,
error_flux_Leonardo,sample
###############################################################################,
# Exposure times for interpolation,
# needs to be in the same units as the time series,
# if not given the observing times will not be interpolated leading to biased results,
###############################################################################,
t_exp_Leonardo,
###############################################################################,
# Number of points for exposure interpolation,
# Sample as fine as possible; generally at least with a 2 min sampling for photometry,
# n_int=5 was found to be a good number of interpolation points for any short photometric cadence t_exp;,
# increase to at least n_int=10 for 30 min phot. cadence,
# the impact on RV is not as drastic and generally n_int=5 is fine enough,
###############################################################################,
t_exp_n_int_Leonardo,
###############################################################################,
# Number of spots per object and instrument,
###############################################################################,
host_N_spots_Leonardo,
###############################################################################,
# Number of flares (in total),
###############################################################################,
N_flares,
###############################################################################,
# TTVs,
###############################################################################,
fit_ttvs,False
###############################################################################,
# Stellar grid per object and instrument,
###############################################################################,
host_grid_Leonardo,default
###############################################################################,
# Stellar shape per object and instrument,
###############################################################################,
host_shape_Leonardo,sphere
###############################################################################,
# Flux weighted RVs per object and instrument,
# ("Yes" for Rossiter-McLaughlin effect),
###############################################################################,
3) params.csv
The main parameters that need to be modified from candidate to candidate are:
b_rr,0.1,1,trunc_normal 0 1 0.1 0.05,$R_b / R_\star$,
b_rsuma,0.2,1,trunc_normal 0 1 0.2 1.5,$(R_\star + R_b) / a_b$,
b_cosi,0.0,1,uniform 0.0 0.03,$\cos{i_b}$,
b_epoch,1.09,1,trunc_normal -1000000000000.0 1000000000000.0 1.09 0.05,$T_{0;b}$,$\mathrm{BJD}$
b_period,3.41,1,trunc_normal -1000000000000.0 1000000000000.0 3.41 0.05,$P_b$,$\mathrm{d}$
b_f_c,0.0,0,trunc_normal -1 1 0.0 0.0,$\sqrt{e_b} \cos{\omega_b}$,
b_f_s,0.0,0,trunc_normal -1 1 0.0 0.0,$\sqrt{e_b} \sin{\omega_b}$,
The full file looks like:
#name,value,fit,bounds,label,unit
#companion b astrophysical params,,,,,
b_rr,0.1,1,trunc_normal 0 1 0.1 0.05,$R_b / R_\star$,
b_rsuma,0.2,1,trunc_normal 0 1 0.2 1.5,$(R_\star + R_b) / a_b$,
b_cosi,0.0,1,uniform 0.0 0.03,$\cos{i_b}$,
b_epoch,1.09,1,trunc_normal -1000000000000.0 1000000000000.0 1.09 0.05,$T_{0;b}$,$\mathrm{BJD}$
b_period,3.41,1,trunc_normal -1000000000000.0 1000000000000.0 3.41 0.05,$P_b$,$\mathrm{d}$
b_f_c,0.0,0,trunc_normal -1 1 0.0 0.0,$\sqrt{e_b} \cos{\omega_b}$,
b_f_s,0.0,0,trunc_normal -1 1 0.0 0.0,$\sqrt{e_b} \sin{\omega_b}$,
#dilution per instrument,,,,,
dil_Leonardo,0.0,0,trunc_normal 0 1 0.0 0.0,$D_\mathrm{0; Leonardo}$,
#limb darkening coefficients per instrument,,,,,
host_ldc_q1_Leonardo,0.5,1,uniform 0.0 1.0,$q_{1; \mathrm{Leonardo}}$,
host_ldc_q2_Leonardo,0.5,1,uniform 0.0 1.0,$q_{2; \mathrm{Leonardo}}$,
#surface brightness per instrument and companion,,,,,
b_sbratio_Leonardo,0.0,0,trunc_normal 0 1 0.0 0.0,$J_{b; \mathrm{Leonardo}}$,
#albedo per instrument and companion,,,,,
host_geom_albedo_Leonardo,0.0,0,trunc_normal 0 1 0.0 0.0,$A_{\mathrm{geom}; host; \mathrm{Leonardo}}$,
b_geom_albedo_Leonardo,0.0,0,trunc_normal 0 1 0.0 0.0,$A_{\mathrm{geom}; b; \mathrm{Leonardo}}$,
#gravity darkening per instrument and companion,,,,,
host_gdc_Leonardo,0.0,0,trunc_normal 0 1 0.0 0.0,$Grav. dark._{b; \mathrm{Leonardo}}$,
#spots per instrument and companion,,,,,
#errors per instrument,
log_err_flux_Leonardo,-7.0,1,uniform -15.0 0.0,$\log{\sigma_\mathrm{Leonardo}}$,$\log{ \mathrm{rel. flux.} }$
#baseline per instrument,
baseline_gp_matern32_lnsigma_flux_Leonardo,0.0,1,uniform -15.0 15.0,$\mathrm{gp: \ln{\sigma} (Leonardo)}$,
baseline_gp_matern32_lnrho_flux_Leonardo,0.0,1,uniform -15.0 15.0,$\mathrm{gp: \ln{\rho} (Leonardo)}$,
We need a function that takes the main results from TLS and generates the three files needed to run the allesfitter fit.
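For the candidate-specific part of params.csv, such a generator could emit the rows shown above from the TLS ephemeris. This is a hypothetical sketch (function name and prior widths are ours), mirroring the allesfitter row format in the example file:

```python
def params_rows(period, t0, rp_rs):
    """Build the candidate-specific params.csv rows from TLS results."""
    # Wide truncated-normal prior around the TLS value, as in the example file.
    prior = "trunc_normal -1000000000000.0 1000000000000.0 {v} 0.05"
    return [
        f"b_rr,{rp_rs},1,trunc_normal 0 1 {rp_rs} 0.05,$R_b / R_\\star$,",
        f"b_epoch,{t0},1,{prior.format(v=t0)},$T_{{0;b}}$,$\\mathrm{{BJD}}$",
        f"b_period,{period},1,{prior.format(v=period)},$P_b$,$\\mathrm{{d}}$",
    ]

rows = params_rows(3.41, 1.09, 0.1)
```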
To launch the fit we just need to run run.py, which looks like:
import allesfitter
allesfitter.show_initial_guess('allesfit')
allesfitter.ns_fit('allesfit')
allesfitter.ns_output('allesfit')