quatrope / feets Goto Github PK

View Code? Open in Web Editor NEW

52.0 6.0 14.0 24.75 MB

feATURE eXTRACTOR FOR tIME sERIES

License: MIT License

Python 66.37% Jupyter Notebook 21.42% TeX 11.73% Shell 0.48%

machine-learning feature-extraction timeseries astronomy python2 python3 scipy astropy

feets's People

Contributors

Stargazers

Watchers

Forkers

gnarayan ryanjmccall stevenstetzler jfastudillo clooney2007 lvferrero jorgemarpa neonyyyyy tristanttran bernardbisi neutrinoceros nikkliapets feliclaria

feets's Issues

Create plot routine for extractor ``feets.extractors.ext_stetson.StetsonJ``

Create plot routine for extractor StetsonJ.

Path: feets.extractors.ext_stetson.py

Features

StetsonJ

Extractor Documentation

These three features are based on the Welch/Stetson variability index I (Stetson, 1996) defined by the equation:

$$I = \sqrt{\frac{1}{n(n-1)}} \sum_{i=1}^n { (\frac{b_i-\hat{b}}{\sigma_{b,i}}) (\frac{v_i - \hat{v}}{\sigma_{v,i}})}$$

where :math:b_i and v_i are the apparent magnitudes obtained for the candidate star in two observations closely spaced in time on some occasion i, σ_b, i and σ_v, i are the standard errors of those magnitudes, b̂ and hat{v} are the weighted mean magnitudes in the two filters, and n is the number of observation pairs.

Since a given frame pair may include data from two filters which did not have equal numbers of observations overall, the "relative error" is calculated as follows:

$$\delta = \sqrt{\frac{n}{n-1}} \frac{v-\hat{v}}{\sigma_v}$$

allowing all residuals to be compared on an equal basis.

StetsonJ

Stetson J is a robust version of the variability index. It is calculated based on two simultaneous light curves of a same star and is defined as:

$$J = \sum_{k=1}^n sgn(P_k) \sqrt{|P_k|}$$

with P_k = δ_{i_k}δ_{j_k}

For a Gaussian magnitude distribution, J should take a value close to zero:
>>> fs = feets.FeatureSpace(only=['StetsonJ'])
>>> features, values = fs.extract(**lc_normal)
>>> dict(zip(features, values))
{'StetsonJ': 0.010765631555204736}
References

Warnings

The original FATS documentation says that the result of StetsonJ must be ~0 for gausian distribution but the result is ~-0.41

Create plot routine for extractor ``feets.extractors.ext_beyond1_std.Beyond1Std``

Create plot routine for extractor Beyond1Std.

Path: feets.extractors.ext_beyond1_std.py

Features

Beyond1Std

Extractor Documentation

Beyond1Std

Percentage of points beyond one standard deviation from the weighted mean. For a normal distribution, it should take a value close to 0.32:
>>> fs = feets.FeatureSpace(only=['Beyond1Std'])
>>> features, values = fs.extract(**lc_normal)
>>> dict(zip(features, values))
{'Beyond1Std': 0.317}
References

Create plot routine for extractor ``feets.extractors.ext_color.Color``

Create plot routine for extractor Color.

Path: feets.extractors.ext_color.py

Features

Color

Extractor Documentation

Color

The color is defined as the difference between the average magnitude of two different bands observations.
>>> fs = feets.FeatureSpace(only=['Color'])
>>> features, values = fs.extract(**lc)
>>> dict(zip(features, values))
{'Color': -0.33325502453332145}
References

Create plot routine for extractor ``feets.extractors.ext_percent_amplitude.PercentAmplitude``

Create plot routine for extractor PercentAmplitude.

Path: feets.extractors.ext_percent_amplitude.py

Features

PercentAmplitude

Extractor Documentation

PercentAmplitude

Largest percentage difference between either the max or min magnitude and the median.
>>> fs = feets.FeatureSpace(only=['PercentAmplitude'])
>>> features, values = fs.extract(**lc_normal)
>>> dict(zip(features, values))
{'PercentAmplitude': -168.991253993057}
References

Create plot routine for extractor ``feets.extractors.ext_autocor_length.AutocorLength``

Create plot routine for extractor AutocorLength.

Path: feets.extractors.ext_autocor_length.py

Features

Autocor_length

Extractor Documentation

Autocor_length

The autocorrelation, also known as serial correlation, is the cross-correlation of a signal with itself. Informally, it is the similarity between observations as a function of the time lag between them. It is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies.

For an observed series y₁, y₂, …, y_T with sample mean ȳ, the sample lag −h autocorrelation is given by:

$$\rho_h = \frac{\sum_{t=h+1}^T (y_t - \bar{y})(y_{t-h}-\bar{y})} {\sum_{t=1}^T (y_t - \bar{y})^2}$$

Since the autocorrelation fuction of a light curve is given by a vector and we can only return one value as a feature, we define the length of the autocorrelation function where its value is smaller than e⁻¹ .

References

missing the dict unpack **

Ryan J. McCall to me
Show more
20 Dec
Sorry, the formatting was message up in the last email:

The following line is missing the dict unpack **

https://github.com/carpyncho/feets/blob/10ef238db0c36a903526c84742748feb12b2208c/feets/datasets/ogle3.py#L206

…

Feature StetsonJ return unexpected value

The extractor feets.extractors.ext_stetson.StetsonJ documentation says

For a Gaussian magnitude distribution, J should take a value close to zero.

but the real result is of the feature is ~-0.41

This test is implemented in this test: https://github.com/carpyncho/feets/blob/40157f9bbfad87cc23753bad76a8414614e56814/feets/tests/test_extractors.py#L197-L209

Feature AndersonDarling return unexpected value

The extractor feets.extractors.ext_anderson_darling.AndersonDarling documentation says

For a normal distribution the Anderson-Darling statistic should take values close to 0.25.

but the real result is of the feature is ~-0.60

This test is implemented in this test: https://github.com/carpyncho/feets/blob/40157f9bbfad87cc23753bad76a8414614e56814/feets/tests/test_extractors.py#L123-L130

Create plot routine for extractor ``feets.extractors.ext_linear_trend.LinearTrend``

Create plot routine for extractor LinearTrend.

Path: feets.extractors.ext_linear_trend.py

Features

LinearTrend

Extractor Documentation

LinearTrend

Slope of a linear fit to the light-curve.

>>> fs = feets.FeatureSpace(only=['LinearTrend'])
>>> features, values = fs.extract(**lc_normal)
>>> dict(zip(features, values))
{'LinearTrend': -3.2084065290292509e-06}

References

Create plot routine for extractor ``feets.extractors.ext_flux_percentile_ratio.FluxPercentileRatioMid20``

Create plot routine for extractor FluxPercentileRatioMid20.

Path: feets.extractors.ext_flux_percentile_ratio.py

Features

FluxPercentileRatioMid20

Extractor Documentation

In order to caracterize the sorted magnitudes distribution we use percentiles. If F_5, 95 is the difference between 95% and 5% magnitude values, we calculate the following:

flux_percentile_ratio_mid20: ratio F_40, 60/F_5, 95
flux_percentile_ratio_mid35: ratio F_32.5, 67.5/F_5, 95
flux_percentile_ratio_mid50: ratio F_25, 75/F_5, 95
flux_percentile_ratio_mid65: ratio F_17.5, 82.5/F_5, 95
flux_percentile_ratio_mid80: ratio F_10, 90/F_5, 95

For the first feature for example, in the case of a normal distribution, this is equivalente to calculate:

$$\frac{erf^{-1}(2 \cdot 0.6-1)-erf^{-1}(2 \cdot 0.4-1)} {erf^{-1}(2 \cdot 0.95-1)-erf^{-1}(2 \cdot 0.05-1)}$$

So, the expected values for each of the flux percentile features are:

flux_percentile_ratio_mid20 = 0.154
flux_percentile_ratio_mid35 = 0.275
flux_percentile_ratio_mid50 = 0.410
flux_percentile_ratio_mid65 = 0.568
flux_percentile_ratio_mid80 = 0.779

References

Create plot routine for extractor ``feets.extractors.ext_car.CAR``

Create plot routine for extractor CAR.

Path: feets.extractors.ext_car.py

Features

CAR_mean
CAR_sigma
CAR_tau

Extractor Documentation

In order to model the irregular sampled times series we use CAR
(Brockwell and Davis, 2002), a continious time auto regressive model.

CAR process has three parameters, it provides a natural and consistent way of estimating a characteristic time scale and variance of light-curves. CAR process is described by the following stochastic differential equation:

$$dX(t) = - \frac{1}{\tau} X(t)dt + \sigma_C \sqrt{dt} \epsilon(t) + bdt, \\\ for \: \tau, \sigma_C, t \geq 0$$

where the mean value of the lightcurve X(t) is b**τ and the variance is $\frac{\tau\sigma_C^2}{2}$. τ is the relaxation time of the process X(t), it can be interpreted as describing the variability amplitude of the time series. σ_C can be interpreted as describing the variability of the time series on time scales shorter than τ. ϵ(t) is a white noise process with zero mean and variance equal to one.

The likelihood function of a CAR model for a light-curve with observations x − {x₁, …, x_n} observed at times {t₁, …, t_n} with measurements error variances {δ₁², …, δ_n²} is:

$$p (x|b,\sigma_C,\tau) = \prod_{i=1}^n \frac{1}{ [2 \pi (\Omega_i + \delta_i^2 )]^{1/2}} exp \{-\frac{1}{2} \frac{(\hat{x}_i - x^*_i )^2}{\Omega_i + \delta^2_i}\} \\$$ $$x_i^* = x_i - b\tau \\$$ $$\hat{x}_0 = 0 \\$$ $$\Omega_0 = \frac{\tau \sigma^2_C}{2} \\$$ $$\hat{x}_i = a_i\hat{x}_{i-1} + \frac{a_i \Omega_{i-1}}{\Omega_{i-1} + \delta^2_{i-1}} (x^*_{i-1} + \hat{x}_{i-1}) \\$$ $$\Omega_i = \Omega_0 (1- a_i^2 ) + a_i^2 \Omega_{i-1} (1 - \frac{\Omega_{i-1}}{\Omega_{i-1} + \delta^2_{i-1}} )$$

To find the optimal parameters we maximize the likelihood with respect to σ_C and τ and calculate b as the mean magnitude of the light-curve divided by τ.

>>> fs = feets.FeatureSpace(
...     only=['CAR_sigma', 'CAR_tau','CAR_mean'])
>>> features, values = fs.extract(**lc_periodic)
>>> dict(zip(features, values))
{'CAR_mean': -9.230698873903961,
 'CAR_sigma': -0.21928049298842511,
 'CAR_tau': 0.64112037377348619}

References

Create plot routine for extractor ``feets.extractors.ext_eta_color.EtaColor``

Create plot routine for extractor EtaColor.

Path: feets.extractors.ext_eta_color.py

Features

Eta_color

Extractor Documentation

Eta_color (η_color)

Variability index Eta_e (η^e) calculated from the color light-curve.
>>> fs = feets.FeatureSpace(only=['Eta_color'])
>>> features, values = fs.extract(**lc_normal)
>>> dict(zip(features, values))
{'Eta_color': 1.991749074648397}
References

Create plot routine for extractor ``feets.extractors.ext_q31.Q31``

Create plot routine for extractor Q31.

Path: feets.extractors.ext_q31.py

Features

Extractor Documentation

Q31 (Q_3 − 1)

Q_3 − 1 is the difference between the third quartile, Q₃, and the first quartile, Q₁, of a raw light curve. Q₁ is a split between the lowest 25% and the highest 75% of data. Q₃ is a split between the lowest 75% and the highest 25% of data.
>>> fs = feets.FeatureSpace(only=['Q31'])
>>> features, values = fs.extract(**lc_normal)
>>> dict(zip(features, values))
{'Q31': 1.3320376563134508}
References

Create plot routine for extractor ``feets.extractors.ext_lomb_scargle.LombScargle``

Create plot routine for extractor LombScargle.

Path: feets.extractors.ext_lomb_scargle.py

Features

Psi_eta
PeriodLS
Psi_CS
Period_fit

Extractor Documentation

PeriodLS

The Lomb-Scargle (L-S) algorithm (Scargle, 1982) is a variation of the Discrete Fourier Transform (DFT), in which a time series is decomposed into a linear combination of sinusoidal functions. The basis of sinusoidal functions transforms the data from the time domain to the frequency domain. DFT techniques often assume evenly spaced data points in the time series, but this is rarely the case with astrophysical time-series data. Scargle has derived a formula for transform coefficients that is similiar to the DFT in the limit of evenly spaced observations. In addition, an adjustment of the values used to calculate the transform coefficients makes the transform invariant to time shifts.

The Lomb-Scargle periodogram is optimized to identify sinusoidal-shaped periodic signals in time-series data. Particular applications include radial velocity data and searches for pulsating variable stars. L-S is not optimal for detecting signals from transiting exoplanets, where the shape of the periodic light-curve is not sinusoidal.

Period_fit

The false alarm probability of the peaks-largest periodogram value. Let's test it for a normal distributed data and for a periodic one.

Psi_CS (Ψ_C**S)

R_C**S applied to the phase-folded light curve (generated using the periods estimated from the Lomb-Scargle method).

Psi_eta (Ψ_η)

η^e index calculated from the folded light curve.

References

Create plot routine for extractor ``feets.extractors.ext_rcs.RCS``

Create plot routine for extractor RCS.

Path: feets.extractors.ext_rcs.py

Features

Extractor Documentation

Rcs - Range of cumulative sum (R_c**s)

R_c**s is the range of a cumulative sum (Ellaway 1978) of each light-curve and is defined as:

$$R_{cs} = max(S) - min(S) \\\ S = \frac{1}{N \sigma} \sum_{i=1}^l (m_i - \bar{m})$$

where max(min) is the maximum (minimum) value of S and l = 1, 2, …, N.

R_c**s should take a value close to zero for any symmetric distribution:
>>> fs = feets.FeatureSpace(only=['Rcs'])
>>> features, values = fs.extract(**lc_normal)
>>> dict(zip(features, values))
{'Rcs': 0.0094459606901065168}
References

Feature StetsonK return unexpected value

The extractor feets.extractors.ext_stetson.StetsonK documentation says

For a Gaussian magnitude distribution, J should take a value 0.798.

but the real result is of the feature is ~0.2

This test is implemented in this test: https://github.com/carpyncho/feets/blob/40157f9bbfad87cc23753bad76a8414614e56814/feets/tests/test_extractors.py#L211-L219

Create plot routine for extractor ``feets.extractors.ext_flux_percentile_ratio.FluxPercentileRatioMid65``

Create plot routine for extractor FluxPercentileRatioMid65.

Path: feets.extractors.ext_flux_percentile_ratio.py

Features

FluxPercentileRatioMid65

Extractor Documentation

In order to caracterize the sorted magnitudes distribution we use percentiles. If F_5, 95 is the difference between 95% and 5% magnitude values, we calculate the following:

flux_percentile_ratio_mid20: ratio F_40, 60/F_5, 95
flux_percentile_ratio_mid35: ratio F_32.5, 67.5/F_5, 95
flux_percentile_ratio_mid50: ratio F_25, 75/F_5, 95
flux_percentile_ratio_mid65: ratio F_17.5, 82.5/F_5, 95
flux_percentile_ratio_mid80: ratio F_10, 90/F_5, 95

For the first feature for example, in the case of a normal distribution, this is equivalente to calculate:

$$\frac{erf^{-1}(2 \cdot 0.6-1)-erf^{-1}(2 \cdot 0.4-1)} {erf^{-1}(2 \cdot 0.95-1)-erf^{-1}(2 \cdot 0.05-1)}$$

So, the expected values for each of the flux percentile features are:

flux_percentile_ratio_mid20 = 0.154
flux_percentile_ratio_mid35 = 0.275
flux_percentile_ratio_mid50 = 0.410
flux_percentile_ratio_mid65 = 0.568
flux_percentile_ratio_mid80 = 0.779

References

Create plot routine for extractor ``feets.extractors.ext_stetson.StetsonK``

Create plot routine for extractor StetsonK.

Path: feets.extractors.ext_stetson.py

Features

StetsonK

Extractor Documentation

These three features are based on the Welch/Stetson variability index I (Stetson, 1996) defined by the equation:

$$I = \sqrt{\frac{1}{n(n-1)}} \sum_{i=1}^n { (\frac{b_i-\hat{b}}{\sigma_{b,i}}) (\frac{v_i - \hat{v}}{\sigma_{v,i}})}$$

where :math:b_i and v_i are the apparent magnitudes obtained for the candidate star in two observations closely spaced in time on some occasion i, σ_b, i and σ_v, i are the standard errors of those magnitudes, b̂ and hat{v} are the weighted mean magnitudes in the two filters, and n is the number of observation pairs.

Since a given frame pair may include data from two filters which did not have equal numbers of observations overall, the "relative error" is calculated as follows:

$$\delta = \sqrt{\frac{n}{n-1}} \frac{v-\hat{v}}{\sigma_v}$$

allowing all residuals to be compared on an equal basis.

StetsonK

Stetson K is a robust kurtosis measure:

$$\frac{1/N \sum_{i=1}^N |\delta_i|}{\sqrt{1/N \sum_{i=1}^N \delta_i^2}}$$

where the index i runs over all N observations available for the star without regard to pairing. For a Gaussian magnitude distribution K should take a value close to $\sqrt{2/\pi} = 0.798$:
>>> fs = feets.FeatureSpace(only=['StetsonK'])
>>> features, values = fs.extract(**lc_normal)
>>> dict(zip(features, values))
{'StetsonK': 0.79914938521401002}
References

Warnings

The original FATS documentation says that the result of StetsonK must be 2/pi=0.798 for gausian distribution but the result is ~0.2

Create plot routine for extractor ``feets.extractors.ext_con.Con``

Create plot routine for extractor Con.

Path: feets.extractors.ext_con.py

Features

Extractor Documentation

Con

Index introduced for the selection of variable stars from the OGLE database (Wozniak 2000). To calculate Con, we count the number of three consecutive data points that are brighter or fainter than 2σ and normalize the number by N − 2.

For a normal distribution and by considering just one star, Con should take values close to 0.045:
>>> fs = feets.FeatureSpace(only=['Con'])
>>> features, values = fs.extract(**lc_normal)
>>> dict(zip(features, values))
{'Con': 0.0476}
References

Documentation for extractor `Signature`

Please add the documentation for the feets.extractors.ext_signature.Signature extractor.

Create plot routine for extractor ``feets.extractors.ext_flux_percentile_ratio.FluxPercentileRatioMid50``

Create plot routine for extractor FluxPercentileRatioMid50.

Path: feets.extractors.ext_flux_percentile_ratio.py

Features

FluxPercentileRatioMid50

Extractor Documentation

In order to caracterize the sorted magnitudes distribution we use percentiles. If F_5, 95 is the difference between 95% and 5% magnitude values, we calculate the following:

flux_percentile_ratio_mid20: ratio F_40, 60/F_5, 95
flux_percentile_ratio_mid35: ratio F_32.5, 67.5/F_5, 95
flux_percentile_ratio_mid50: ratio F_25, 75/F_5, 95
flux_percentile_ratio_mid65: ratio F_17.5, 82.5/F_5, 95
flux_percentile_ratio_mid80: ratio F_10, 90/F_5, 95

For the first feature for example, in the case of a normal distribution, this is equivalente to calculate:

$$\frac{erf^{-1}(2 \cdot 0.6-1)-erf^{-1}(2 \cdot 0.4-1)} {erf^{-1}(2 \cdot 0.95-1)-erf^{-1}(2 \cdot 0.05-1)}$$

So, the expected values for each of the flux percentile features are:

flux_percentile_ratio_mid20 = 0.154
flux_percentile_ratio_mid35 = 0.275
flux_percentile_ratio_mid50 = 0.410
flux_percentile_ratio_mid65 = 0.568
flux_percentile_ratio_mid80 = 0.779

References

OO Results set

Implement a new result container, possible an object.

It should include features with arbitrary structure, and focusing on
readability of the results.
An example would be to generate visualization for features with a common background,
such as fourier components, or statistical moments.

It should have a method .as_array() which shall return the previous output.

Create plot routine for extractor ``feets.extractors.ext_stetson.StetsonKAC``

Create plot routine for extractor StetsonKAC.

Path: feets.extractors.ext_stetson.py

Features

StetsonK_AC

Extractor Documentation

These three features are based on the Welch/Stetson variability index I (Stetson, 1996) defined by the equation:

$$I = \sqrt{\frac{1}{n(n-1)}} \sum_{i=1}^n { (\frac{b_i-\hat{b}}{\sigma_{b,i}}) (\frac{v_i - \hat{v}}{\sigma_{v,i}})}$$

where :math:b_i and v_i are the apparent magnitudes obtained for the candidate star in two observations closely spaced in time on some occasion i, σ_b, i and σ_v, i are the standard errors of those magnitudes, b̂ and hat{v} are the weighted mean magnitudes in the two filters, and n is the number of observation pairs.

Since a given frame pair may include data from two filters which did not have equal numbers of observations overall, the "relative error" is calculated as follows:

$$\delta = \sqrt{\frac{n}{n-1}} \frac{v-\hat{v}}{\sigma_v}$$

allowing all residuals to be compared on an equal basis.

StetsonK_AC

Stetson K applied to the slotted autocorrelation function of the light-curve.
>>> fs = feets.FeatureSpace(only=['SlottedA_length','StetsonK_AC'])
>>> features, values = fs.extract(**lc_normal)
>>> dict(zip(features, values))
{'SlottedA_length': 1.0, 'StetsonK_AC': 0.20917402545294403}
Parameters

T: tau - slot size in days (default=1).

References

Create plot routine for extractor ``feets.extractors.ext_mean_variance.MeanVariance``

Create plot routine for extractor MeanVariance.

Path: feets.extractors.ext_mean_variance.py

Features

Meanvariance

Extractor Documentation

Meanvariance ($\frac{\sigma}{\bar{m}}$)

This is a simple variability index and is defined as the ratio of the standard deviation σ, to the mean magnitude, m̄. If a light curve has strong variability, $\frac{\sigma}{\bar{m}}$ of the light curve is generally large.

For a uniform distribution from 0 to 1, the mean is equal to 0.5 and the variance is equal to 1/12, thus the mean-variance should take a value close to 0.577:
>>> fs = feets.FeatureSpace(only=['Meanvariance'])
>>> features, values = fs.extract(**lc_uniform)
>>> dict(zip(features, values))
{'Meanvariance': 0.5816791217381897}
References

Create plot routine for extractor ``feets.extractors.ext_std.Std``

Create plot routine for extractor Std.

Path: feets.extractors.ext_std.py

Features

Extractor Documentation

Std - Standard deviation of the magnitudes

The standard deviation σ of the sample is defined as:

$$\sigma=\frac{1}{N-1}\sum_{i} (y_{i}-\hat{y})^2$$

For example, a white noise time serie should have σ = 1
>>> fs = feets.FeatureSpace(only=['Std'])
>>> features, values = fs.extract(**lc_normal)
>>> dict(zip(features, values))
{'Std': 0.99320419310116881}
References

Create plot routine for extractor ``feets.extractors.ext_skew.Skew``

Create plot routine for extractor Skew.

Path: feets.extractors.ext_skew.py

Features

Skew

Extractor Documentation

Skew

The skewness of a sample is defined as follow:

$$Skewness = \frac{N}{(N-1)(N-2)} \sum_{i=1}^N (\frac{m_i-\hat{m}}{\sigma})^3$$

Example:

For a normal distribution it should be equal to zero:
>>> fs = feets.FeatureSpace(only=['Skew'])
>>> features, values = fs.extract(**lc_normal)
>>> dict(zip(features, values))
{'Skew': -0.00023325826785278685}
References

Create plot routine for extractor ``feets.extractors.ext_eta_e.Eta_e``

Create plot routine for extractor Eta_e.

Path: feets.extractors.ext_eta_e.py

Features

Eta_e

Extractor Documentation

Eta_e (η^e)

Variability index η is the ratio of the mean of the square of successive differences to the variance of data points. The index was originally proposed to check whether the successive data points are independent or not. In other words, the index was developed to check if any trends exist in the data (von Neumann 1941). It is defined as:

$$\eta = \frac{1}{(N-1)\sigma^2} \sum_{i=1}^{N-1} (m_{i+1}-m_i)^2$$

The variability index should take a value close to 2 for a normal distribution.

Although η is a powerful index for quantifying variability characteristics of a time series, it does not take into account unequal sampling. Thus η^r is defined as:

$$\eta^e = \bar{w} \, (t_{N-1} - t_1)^2 \frac{\sum_{i=1}^{N-1} w_i (m_{i+1} - m_i)^2} {\sigma^2 \sum_{i=1}^{N-1} w_i}$$

Where:

$$w_i = \frac{1}{(t_{i+1} - t_i)^2}$$

Example:
>>> fs = feets.FeatureSpace(only=['Eta_e'])
>>> features, values = fs.extract(**lc_normal)
>>> dict(zip(features, values))
{'Eta_e': 2.0028592616231866}
References

Create plot routine for extractor ``feets.extractors.ext_stetson.StetsonL``

Create plot routine for extractor StetsonL.

Path: feets.extractors.ext_stetson.py

Features

StetsonL

Extractor Documentation

These three features are based on the Welch/Stetson variability index I (Stetson, 1996) defined by the equation:

$$I = \sqrt{\frac{1}{n(n-1)}} \sum_{i=1}^n { (\frac{b_i-\hat{b}}{\sigma_{b,i}}) (\frac{v_i - \hat{v}}{\sigma_{v,i}})}$$

where :math:b_i and v_i are the apparent magnitudes obtained for the candidate star in two observations closely spaced in time on some occasion i, σ_b, i and σ_v, i are the standard errors of those magnitudes, b̂ and hat{v} are the weighted mean magnitudes in the two filters, and n is the number of observation pairs.

Since a given frame pair may include data from two filters which did not have equal numbers of observations overall, the "relative error" is calculated as follows:

$$\delta = \sqrt{\frac{n}{n-1}} \frac{v-\hat{v}}{\sigma_v}$$

allowing all residuals to be compared on an equal basis.

StetsonL

Stetson L variability index describes the synchronous variability of different bands and is defined as:

$$L = \frac{JK}{0.798}$$

Again, for a Gaussian magnitude distribution, L should take a value close to zero:
>>> fs = feets.FeatureSpace(only=['SlottedL'])
>>> features, values = fs.extract(**lc_normal)
>>> dict(zip(features, values))
{'StetsonL': 0.0085957106316273714}
References

Create plot routine for extractor ``feets.extractors.ext_gskew.Gskew``

Create plot routine for extractor Gskew.

Path: feets.extractors.ext_gskew.py

Features

Gskew

Extractor Documentation

Median-of-magnitudes based measure of the skew.

Gskew = m_q3 + m_q97 − 2m

Where:

m_q3 is the median of magnitudes lesser or equal than the quantile 3.

m_q97 is the median of magnitudes greater or equal than the quantile 97.

m is the median of magnitudes.

Create plot routine for extractor ``feets.extractors.ext_max_slope.MaxSlope``

Create plot routine for extractor MaxSlope.

Path: feets.extractors.ext_max_slope.py

Features

MaxSlope

Extractor Documentation

MaxSlope

Maximum absolute magnitude slope between two consecutive observations.

Examining successive (time-sorted) magnitudes, the maximal first difference (value of delta magnitude over delta time)
>>> fs = feets.FeatureSpace(only=['MaxSlope'])
>>> features, values = fs.extract(**lc_normal)
>>> dict(zip(features, values))
{'MaxSlope': 5.4943105823904741}
References

Create plot routine for extractor ``feets.extractors.ext_slotted_a_length.SlottedA_length``

Create plot routine for extractor SlottedA_length.

Path: feets.extractors.ext_slotted_a_length.py

Features

SlottedA_length

Extractor Documentation

SlottedA_length - Slotted Autocorrelation

In slotted autocorrelation, time lags are defined as intervals or slots instead of single values. The slotted autocorrelation function at a certain time lag slot is computed by averaging the cross product between samples whose time differences fall in the given slot.

$$\hat{\rho}(\tau=kh) = \frac {1}{\hat{\rho}(0)\,N_\tau} \sum_{t_i}\sum_{t_j= t_i+(k-1/2)h }^{t_i+(k+1/2)h} \bar{y}_i(t_i)\,\, \bar{y}_j(t_j)$$

Where h is the slot size, ȳ is the normalized magnitude, ρ̂(0) is the slotted autocorrelation for the first lag, and N_τ is the number of pairs that fall in the given slot.
>>> fs = feets.FeatureSpace(
...     only=['SlottedA_length'], SlottedA_length={"t": 1})
>>> features, values = fs.extract(**lc_normal)
>>> dict(zip(features, values))
{'SlottedA_length': 1.}
Parameters

T: tau - slot size in days (default=1).

References

Create plot routine for extractor ``feets.extractors.ext_percent_difference_flux_percentile.PercentDifferenceFluxPercentile``

Create plot routine for extractor PercentDifferenceFluxPercentile.

Path: feets.extractors.ext_percent_difference_flux_percentile.py

Features

PercentDifferenceFluxPercentile

Extractor Documentation

PercentDifferenceFluxPercentile

Ratio of F_5, 95 over the median magnitude.

>>> fs = feets.FeatureSpace(only=['PercentDifferenceFluxPercentile'])
>>> features, values = fs.extract(**lc_normal)
>>> dict(zip(features, values))
{'PercentDifferenceFluxPercentile': -134.93590403825007}

References

Create plot routine for extractor ``feets.extractors.ext_q31.Q31Color``

Create plot routine for extractor Q31Color.

Path: feets.extractors.ext_q31.py

Features

Q31_color

Extractor Documentation

Q31_color (Q_{3 − 1|B − R})

Q_3 − 1 applied to the difference between both bands of a light curve (B-R).
>>> fs = feets.FeatureSpace(only=['Q31_color'])
>>> features, values = fs.extract(**lc_normal)
>>> dict(zip(features, values))
{'Q31_color': 1.8840489594535512}
References

Create plot routine for extractor ``feets.extractors.ext_small_kurtosis.SmallKurtosis``

Create plot routine for extractor SmallKurtosis.

Path: feets.extractors.ext_small_kurtosis.py

Features

SmallKurtosis

Extractor Documentation

SmallKurtosis

Small sample kurtosis of the magnitudes.

$$SmallKurtosis = \frac{N (N+1)}{(N-1)(N-2)(N-3)} \sum_{i=1}^N (\frac{m_i-\hat{m}}{\sigma})^4 - \frac{3( N-1 )^2}{(N-2) (N-3)}$$

For a normal distribution, the small kurtosis should be zero:
>>> fs = feets.FeatureSpace(only=['SmallKurtosis'])
>>> features, values = fs.extract(**lc_normal)
>>> dict(zip(features, values))
{'SmallKurtosis': 0.044451779515607193}
See http://www.xycoon.com/peakedness_small_sample_test_1.htm

References

Create plot routine for extractor ``feets.extractors.ext_median_abs_dev.MedianAbsDev``

Create plot routine for extractor MedianAbsDev.

Path: feets.extractors.ext_median_abs_dev.py

Features

MedianAbsDev

Extractor Documentation

MedianAbsDev

The median absolute deviation is defined as the median discrepancy of the data from the median data:

MedianAbsoluteDeviation = media**n(|mag − media**n(mag)|)

It should take a value close to 0.675 for a normal distribution:
>>> fs = feets.FeatureSpace(only=['MedianAbsDev'])
>>> features, values = fs.extract(**lc_normal)
>>> dict(zip(features, values))
{'MedianAbsDev': 0.66332131466690614}
References

Create plot routine for extractor ``feets.extractors.ext_flux_percentile_ratio.FluxPercentileRatioMid35``

Create plot routine for extractor FluxPercentileRatioMid35.

Path: feets.extractors.ext_flux_percentile_ratio.py

Features

FluxPercentileRatioMid35

Extractor Documentation

In order to caracterize the sorted magnitudes distribution we use percentiles. If F_5, 95 is the difference between 95% and 5% magnitude values, we calculate the following:

flux_percentile_ratio_mid20: ratio F_40, 60/F_5, 95
flux_percentile_ratio_mid35: ratio F_32.5, 67.5/F_5, 95
flux_percentile_ratio_mid50: ratio F_25, 75/F_5, 95
flux_percentile_ratio_mid65: ratio F_17.5, 82.5/F_5, 95
flux_percentile_ratio_mid80: ratio F_10, 90/F_5, 95

For the first feature for example, in the case of a normal distribution, this is equivalente to calculate:

$$\frac{erf^{-1}(2 \cdot 0.6-1)-erf^{-1}(2 \cdot 0.4-1)} {erf^{-1}(2 \cdot 0.95-1)-erf^{-1}(2 \cdot 0.05-1)}$$

So, the expected values for each of the flux percentile features are:

flux_percentile_ratio_mid20 = 0.154
flux_percentile_ratio_mid35 = 0.275
flux_percentile_ratio_mid50 = 0.410
flux_percentile_ratio_mid65 = 0.568
flux_percentile_ratio_mid80 = 0.779

References

Create plot routine for extractor ``feets.extractors.ext_mean.Mean``

Create plot routine for extractor Mean.

Path: feets.extractors.ext_mean.py

Features

Mean

Extractor Documentation

Mean

Mean magnitude. For a normal distribution it should take a value close to zero:
>>> fs = feets.FeatureSpace(only=['Mean'])
>>> features, values = fs.extract(**lc_normal)
>>> dict(zip(features, values))
{'Mean': 0.0082611563419413246}
References

Create plot routine for extractor ``feets.extractors.ext_fourier_components.FourierComponents``

Create plot routine for extractor FourierComponents.

Path: feets.extractors.ext_fourier_components.py

Features

Freq2_harmonics
Freq3_harmonics
Freq1_harmonics

Extractor Documentation

Periodic features extracted from light-curves using Lomb-Scargle (Richards et al., 2011)

Here, we adopt a model where the time series of the photometric magnitudes of variable stars is modeled as a superposition of sines and cosines:

y_i(t|f_i)=a_isin(2π**f_it)+b_icos(2π**f_it)+b_i, ∘

where a and b are normalization constants for the sinusoids of frequency f_i and b_i, ∘ is the magnitude offset.

To find periodic variations in the data, we fit the equation above by minimizing the sum of squares, which we denote χ²:

$$\chi^2 = \sum_k \frac{(d_k - y_i(t_k))^2}{\sigma_k^2}$$

where σ_k is the measurement uncertainty in data point d_k. We allow the mean to float, leading to more robust period estimates in the case where the periodic phase is not uniformly sampled; in these cases, the model light curve has a non-zero mean. This can be important when searching for periods on the order of the data span T_tot. Now, define

$$\chi^2_{\circ} = \sum_k \frac{(d_k - \mu)^2}{\sigma_k^2}$$

where μ is the weighted mean

$$\mu = \frac{\sum_k d_k / \sigma_k^2}{\sum_k 1/\sigma_k^2}$$

Then, the generalized Lomb-Scargle periodogram is:

$$P_f(f) = \frac{(N-1)}{2} \frac{\chi_{\circ}^2 - \chi_m^2(f)} {\chi_{\circ}^2}$$

where χ_m²(f) is χ² minimized with respect to a, b and b_∘.

Following Debosscher et al. (2007), we fit each light curve with a linear term plus a harmonic sum of sinusoids:

$$y(t) = ct + \sum_{i=1}^{3}\sum_{j=1}^{4} y_i(t|jf_i)$$

where each of the three test frequencies f_i is allowed to have four harmonics at frequencies f_i, j = j**f_i. The three test frequencies f_i are found iteratively, by successfully finding and removing periodic signal producing a peak in P_f(f) , where P_f(f) is the Lomb-Scargle periodogram as defined above.

Given a peak in P_f(f), we whiten the data with respect to that frequency by fitting away a model containing that frequency as well as components with frequencies at 2, 3, and 4 times that fundamental frequency (harmonics). Then, we subtract that model from the data, update χ_∘², and recalculate P_f(f) to find more periodic components.

Algorithm:

For i = 1, 2, 3

Calculate Lomb-Scargle periodogram P_f(f) for light curve.

Find peak in P_f(f), subtract that model from data.

Update χ_∘², return to Step 1.

Then, the features extracted are given as an amplitude and a phase:

$$A_{i,j} = \sqrt{a_{i,j}^2 + b_{i,j}^2}\\\ \textrm{PH}_{i,j} = \arctan(\frac{b_{i,j}}{a_{i,j}})$$

where A_i, j is the amplitude of the j − t**h harmonic of the i − t**h frequency component and PH_i, j is the phase component, which we then correct to a relative phase with respect to the phase of the first component:

PH′_i, j = PH_i, j − PH₀₀

and remapped to | − π, +π|

References

Create plot routine for extractor ``feets.extractors.ext_dmdt.DeltamDeltat``

Create plot routine for extractor DeltamDeltat.

Path: feets.extractors.ext_dmdt.py

Features

DMDT

Extractor Documentation

DMDT (Delta Magnitude - Delta Time Mapping)

The 2D representations - called dmdt-images hereafter -reflect the underlying structure from variability of the source.

The dmdt-images are translation independent as they consider only the differences in time.

For each pair of points in a light curve we determine the difference in magnitude (dm) and the difference in time (dt). This gives us $p = n/2 = n ∗ (n − 1)/2$ points for a light curve of length $n$. . These points are then binned for a range of dm and dt values. The resulting binned 2D representation is our 2D mapping from the light curve.
>>> fs = feets.FeatureSpace(only=['DMDT'])
>>> rs = fs.extract(**lc_normal)
>>> rs.as_dict()
{'DMDT': array([[0, 0, 1, 1, ..., ]])},
References

Implement signature

The code exists here: https://github.com/carpyncho/feets/blob/master/experiments/ext_signature.py

Ideas:

Metaclass creator for use with this feature?
Posibility in feets to has variable numbers of features in extractor?

Create plot routine for extractor ``feets.extractors.ext_structure_functions.StructureFunctions``

Create plot routine for extractor StructureFunctions.

Path: feets.extractors.ext_structure_functions.py

Features

StructureFunction_index_21
StructureFunction_index_32
StructureFunction_index_31

Extractor Documentation

The structure function of rotation measures (RMs) contains information
on electron density and magnetic field fluctuations.

References

Create plot routine for extractor ``feets.extractors.ext_median_brp.MedianBRP``

Create plot routine for extractor MedianBRP.

Path: feets.extractors.ext_median_brp.py

Features

MedianBRP

Extractor Documentation

MedianBRP (Median buffer range percentage)

Fraction (<= 1) of photometric points within amplitude/10 of the median magnitude
>>> fs = feets.FeatureSpace(only=['MedianBRP'])
>>> features, values = fs.extract(**lc_normal)
>>> dict(zip(features, values))
{'MedianBRP': 0.559}
References

Return the locations and FAPs of the top few peaks - LombScargle extractor

Regarding the Lomb-Scargle extractor... I think it might be useful to return the locations and FAPs of the top few peaks. Sometimes due to aliasing issues the single highest peak does not actually correspond with the true frequency (I discuss that a bit in this paper: https://arxiv.org/abs/1703.09824)

remove python 2 as supported language

Remove from tests
Remove six as dependency
support python +3.6

Create plot routine for extractor ``feets.extractors.ext_signature.Signature``

Create plot routine for extractor Signature.

Path: feets.extractors.ext_signature.py

Features

SignaturePhMag

Extractor Documentation

Create plot routine for extractor ``feets.extractors.ext_amplitude.Amplitude``

Create plot routine for extractor Amplitude.

Path: feets.extractors.ext_amplitude.py

Features

Amplitude

Extractor Documentation

Amplitude

The amplitude is defined as the half of the difference between the median of the maximum 5% and the median of the minimum 5% magnitudes. For a sequence of numbers from 0 to 1000 the amplitude should be equal to 475.5.

References

Upper limits

Hi,

I know this is a question that should be better defined, at least referring to a specific statistical test. However, I wonder if you have thought to a way to include upper limits in the analysed time series.

Thanks,
Stefano

flux_percentile_ratio Index off by one error for small time series

I am running a full feature extraction on a time series with 14 measurements (using something like):

import feets
fs = feets.FeatureSpace(data = ['time', 'magnitude', 'error'])
features, values = fs.extract(*ts)

and I receive the following error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-11-7060bccca8c7> in <module>
----> 1 feets_all_features(ts_lists[2116])

<ipython-input-9-c2ebdeac3e58> in feets_all_features(ts_pbs)
     11 
     12     for pb in range(0, 6):
---> 13         features, values = fs.extract(*ts_pbs[pb][0])
     14 
     15         features = [f + f"_{pb}" for f in features]

/epyc/projects/plasticc-ml/envs/PLASTICC/lib/python3.6/site-packages/feets/core.py in extract(self, time, magnitude, error, magnitude2, aligned_time, aligned_magnitude, aligned_magnitude2, aligned_error, aligned_error2)
    290         features = {}
    291         for fextractor in self._execution_plan:
--> 292             result = fextractor.extract(features=features, **kwargs)
    293             features.update(result)
    294 

/epyc/projects/plasticc-ml/envs/PLASTICC/lib/python3.6/site-packages/feets/extractors/core.py in extract(self, **kwargs)
    290             # setup & run te extractor
    291             self.setup()
--> 292             result = self.fit(**fit_kwargs)
    293 
    294             # validate if the extractors generates the expected features

/epyc/projects/plasticc-ml/envs/PLASTICC/lib/python3.6/site-packages/feets/extractors/ext_flux_percentile_ratio.py in fit(self, magnitude)
    203 
    204         F_10_90 = sorted_data[F_90_index] - sorted_data[F_10_index]
--> 205         F_5_95 = sorted_data[F_95_index] - sorted_data[F_5_index]
    206         F_mid80 = F_10_90 / F_5_95
    207 

IndexError: index 14 is out of bounds for axis 0 with size 14

This is due to the fact that ext_flux_percentile_ratio.py calculates the location of the flux percentiles as:

F_10_index = int(math.ceil(0.10 * lc_length))
F_90_index = int(math.ceil(0.90 * lc_length))
F_5_index = int(math.ceil(0.05 * lc_length))
F_95_index = int(math.ceil(0.95 * lc_length))

For my example, lc_length = 14 thus:

F_10_index = 2
F_90_index = 13
F_5_index = 1
F_95_index = 14

but 14 is not a valid index for sorted_data[F_95_index] since sorted_data is of length 14.

Note that this is also an issue in ext_flux_percentile_ratio.py.

I am fixing this right now but just decrementing F_95_index by 1 if it is the same as lc_length, but it would probably be better to do bounds checking on all F_X_index variables or even just redefining lc_length to be lc_length - 1 to make it an index and not a length might work.

Create plot routine for extractor ``feets.extractors.ext_pair_slope_trend.PairSlopeTrend``

Create plot routine for extractor PairSlopeTrend.

Path: feets.extractors.ext_pair_slope_trend.py

Features

PairSlopeTrend

Extractor Documentation

PairSlopeTrend

Considering the last 30 (time-sorted) measurements of source magnitude, the fraction of increasing first differences minus the fraction of decreasing first differences.
>>> fs = feets.FeatureSpace(only=['PairSlopeTrend'])
>>> features, values = fs.extract(**lc_normal)
>>> dict(zip(features, values))
{'PairSlopeTrend': -0.16666666666666666}
References

Create plot routine for extractor ``feets.extractors.ext_flux_percentile_ratio.FluxPercentileRatioMid80``

Create plot routine for extractor FluxPercentileRatioMid80.

Path: feets.extractors.ext_flux_percentile_ratio.py

Features

FluxPercentileRatioMid80

Extractor Documentation

In order to caracterize the sorted magnitudes distribution we use percentiles. If F_5, 95 is the difference between 95% and 5% magnitude values, we calculate the following:

flux_percentile_ratio_mid20: ratio F_40, 60/F_5, 95
flux_percentile_ratio_mid35: ratio F_32.5, 67.5/F_5, 95
flux_percentile_ratio_mid50: ratio F_25, 75/F_5, 95
flux_percentile_ratio_mid65: ratio F_17.5, 82.5/F_5, 95
flux_percentile_ratio_mid80: ratio F_10, 90/F_5, 95

For the first feature for example, in the case of a normal distribution, this is equivalente to calculate:

$$\frac{erf^{-1}(2 \cdot 0.6-1)-erf^{-1}(2 \cdot 0.4-1)} {erf^{-1}(2 \cdot 0.95-1)-erf^{-1}(2 \cdot 0.05-1)}$$

So, the expected values for each of the flux percentile features are:

flux_percentile_ratio_mid20 = 0.154
flux_percentile_ratio_mid35 = 0.275
flux_percentile_ratio_mid50 = 0.410
flux_percentile_ratio_mid65 = 0.568
flux_percentile_ratio_mid80 = 0.779