twitter / anomalydetection

Anomaly Detection with R

License: GNU General Public License v3.0

R 100.00%

anomalydetection's Issues

Can't install AnomalyDetection

On OS X 10.9.5
R 3.1.2

library(devtools)
devtools::install_github("twitter/AnomalyDetection")
Downloading github repo twitter/AnomalyDetection@master
Error in function (type, msg, asError = TRUE) :
SSL certificate problem: unable to get local issuer certificate

Why the software history was not kept?

Hi there,

I'm a researcher studying software evolution. As part of my current research, I'm studying the implications of open-sourcing proprietary software, for instance, whether the project succeeds in attracting newcomers. AnomalyDetection was on my list. However, I observed that the history from the period when the software was developed as proprietary software was not kept after the transition to GitHub.

Knowing that software history is indispensable for developers (e.g., developers need to refer to history several times a day), I would like to ask AnomalyDetection developers the following four brief questions:

Why did you decide not to keep the software history?
Did the core developers face any problems when trying to refer to the old history? If so, how did they solve them?
Did newcomers face any problems when trying to refer to the old history? If so, how did they solve them?
How has the lack of history affected software evolution? Did it place any burden on understanding and evolving the software?

Thanks in advance for your collaboration,

Gustavo Pinto, PhD
http://www.gustavopinto.org

csv import

Hello

I might be asking a dumb question but I tried to use my brain and it failed ;-)
When trying to use the library I import data from a .csv: data_raw <- read.csv(file = file_name, header = FALSE, sep = ",").
I get a table with two columns, one containing a timestamp and the second a value.

My problem is that whenever I try res = AnomalyDetectionTs(data_raw, max_anoms=0.02, direction='both', plot=TRUE)
or res = AnomalyDetectionVec(data_raw[1], max_anoms=0.02, direction='both', plot=TRUE)
I get Error in Summary.factor(1:339, na.rm = FALSE) : ‘max’ not meaningful for factors
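
A likely cause, assuming the timestamp column was read as a factor (the default for strings in read.csv before R 4.0), is that max is undefined for factors. The following is a minimal sketch that reads the file with strings kept as character and converts the timestamp explicitly. The file contents and the column names are made up for illustration.

```r
# Hypothetical two-column CSV without a header: timestamp, value
csv_path <- tempfile(fileext = ".csv")
writeLines(c("2015-01-01 00:00:00,10.5",
             "2015-01-01 00:01:00,11.2",
             "2015-01-01 00:02:00,9.8"), csv_path)

# Read strings as character (not factor) and name the columns explicitly
data_raw <- read.csv(csv_path, header = FALSE, sep = ",",
                     col.names = c("timestamp", "count"),
                     stringsAsFactors = FALSE)

# Convert the first column to POSIXct and make sure the second is numeric
data_raw$timestamp <- as.POSIXct(data_raw$timestamp,
                                 format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
data_raw$count <- as.numeric(data_raw$count)

str(data_raw)
```

With columns of these types the data frame should be in the shape AnomalyDetectionTs expects. For AnomalyDetectionVec, pass the numeric column itself (for example data_raw[[2]]) rather than data_raw[1], which is the timestamp column.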

Error when running anomaly detection on a constant series

Hi, when I try anomaly detection on a constant series, there is an error. I know it's impossible to find anomalies in that kind of data. I just think it would be better to report "there is no anomaly" than to throw an error.

test <- rep(1,1000)
AnomalyDetectionVec(test, period=14, plot=T, direction='both')
Error in if (R > lam) num_anoms <- i :
missing value where TRUE/FALSE needed
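
Until the package handles this case, one caller-side workaround is to test for a constant series before calling the detector. This is a minimal sketch; is_constant_series is a hypothetical helper, not part of AnomalyDetection, and the detector call is left as a comment because it is not exercised here.

```r
# Hypothetical helper: TRUE when the series has no variation to test against
is_constant_series <- function(x, tol = 1e-12) {
  x <- x[!is.na(x)]
  length(x) < 2 || (max(x) - min(x)) <= tol
}

test <- rep(1, 1000)

if (is_constant_series(test)) {
  message("No anomalies detected: the series is constant.")
} else {
  # AnomalyDetectionVec(test, period = 14, plot = TRUE, direction = "both")
}
```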

Failed to install Anomaly Detection

OS X 10.9.2
R version 3.1.1
here is the error message:
devtools::install_github("twitter/AnomalyDetection")
Downloading github repo twitter/AnomalyDetection@master
Error in download(dest, src, auth) : client error: (403) Forbidden

It seems other people are facing the same problem. Any solution?

using anomalydetection for a time series data package but there is an error i am getting

Please find the dataset below: weekly data for a metric. I want to detect anomalies in this time series. The error I get is:
Error in if (data_sigma == 0) break :
missing value where TRUE/FALSE needed
1 2013-01-01 59.94
2 2013-01-08 59.65
3 2013-01-15 61.56
4 2013-01-22 58.37
5 2013-01-29 58.07
6 2013-02-05 57.31
7 2013-02-12 58.53
8 2013-02-19 63.22
9 2013-02-26 60.21
10 2013-03-05 59.09
11 2013-03-12 57.19
12 2013-03-19 55.97
13 2013-03-26 59.96

Is there an optimal way to do point anomaly detection ?

Does anyone know how best to use this to check whether a given data point is an anomaly? Specifically, the use case is to take a 1-3 month dataset aggregated at 1-minute intervals as input and decide whether the next 1-minute data point is an anomaly. I am also interested to see if anyone has adapted this into an online anomaly detection engine. I appreciate any pointers. What I am doing right now is calling AnomalyDetectionTs with only_last='hr' for every incoming data point, and it tends to be pretty slow.

No more auto-print of results if "No anomalies detected."

Suppose I have a twelve-month-periodic vector vec, for which there are no anomalies. The following should not happen automatically, but currently it does.

anomaly_detection_obj <- AnomalyDetectionVec(vec, period = 12, threshold = "p95")
# [1] "No amomalies detected."

At the very least, whether to auto-print this message should be an argument in the function.

[Question] What's a good way to choose "longterm_period"?

This is a n00b question. Looking at the article (https://blog.twitter.com/2015/introducing-practical-and-robust-anomaly-detection-in-a-time-series) and code, it seems that AnomalyDetectionTs requires piecewise_median_period_weeks to be greater than or equal to 2 weeks.

If I have higher frequency data in a shorter timeline, I presume that I should use AnomalyDetectionVec and calibrate longterm_period manually?

What would be a good way to determine an optimal longterm_period and what are the drawbacks of choosing a shorter/longer period? Thanks in advance for your insights!

Error in data.frame

I am getting following error message:
Error in data.frame(timestamp = all_anoms[[1]], anoms = all_anoms[[2]], :
arguments imply differing number of rows: 1, 0

Data looks like this:
1 2014-12-28 00:00:00 46.25243
2 2014-12-28 01:00:00 43.16433
3 2014-12-28 02:00:00 40.06927
4 2014-12-28 03:00:00 39.27673
5 2014-12-28 04:00:00 40.28478
6 2014-12-28 05:00:00 47.17522
7 2014-12-28 06:00:00 56.34756
8 2014-12-28 07:00:00 66.45515

and method call is like this:
AnomalyDetectionTs(data, max_anoms=0.05, threshold = "None", direction='both', plot=FALSE, only_last = "day", e_value = TRUE)

Suggestion: Identify and Remove Linear Trend Along with Seasonal Component

The generalized ESD method normalizes deviation from the mean based on an estimate of the population variance. If the data has an uncompensated, appreciable linear trend this is equivalent to estimating the noise in the data to be much higher than true noise in the signal and many outlying data points will be removed.

This package uses stl from the R stats library to remove the seasonal component. It also identifies the trend in the data, but doesn't remove it before doing the ESD analysis. My suggestion is to just use the remainder column of data_decomp for ESD analysis (optionally subtracting the median).

From https://github.com/twitter/AnomalyDetection/blob/master/R/detect_anoms.R

# -- Step 1: Decompose data. This returns a univarite remainder which will be used for anomaly detection. Optionally, we might NOT decompose.
    data_decomp <- stl(ts(data[[2L]], frequency = num_obs_per_period),
                       s.window = "periodic", robust = TRUE)

    # Remove the seasonal component, and the median of the data to create the univariate remainder
    data <- data.frame(timestamp = data[[1L]], count = (data[[2L]]-data_decomp$time.series[,"seasonal"]-median(data[[2L]])))

Here is a trivial example of the kind of issue this can cause:
Run the example:
AnomalyDetectionVec(raw_data[,2], max_anoms=0.02, period=1440, direction='both', plot=TRUE)

Add a linear trend to the count column and run again:
new_data = raw_data
new_data[,2] = raw_data[,2] + 0.01*(1:14398)
AnomalyDetectionVec(new_data[,2], max_anoms=0.02, period=1440, direction='both', plot=TRUE)
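
The effect of the suggestion can be illustrated without the package, using only stats::stl on synthetic data. This is a sketch under stated assumptions: the series (sine seasonality, linear trend, Gaussian noise) is invented, and the two residuals only mirror the step quoted above and the proposed alternative; it does not run the ESD test itself.

```r
set.seed(42)

# Synthetic series: 14 periods of 24 observations with a linear trend
period <- 24
n <- period * 14
t_idx <- seq_len(n)
x <- 10 * sin(2 * pi * t_idx / period) + 0.05 * t_idx + rnorm(n)

decomp <- stl(ts(x, frequency = period), s.window = "periodic", robust = TRUE)

# Step as quoted above: remove the seasonal component and the median only
resid_current <- as.numeric(x - decomp$time.series[, "seasonal"] - median(x))

# Suggested alternative: use the STL remainder (seasonal and trend removed)
resid_suggested <- as.numeric(decomp$time.series[, "remainder"])

# The leftover trend inflates the spread that ESD would normalize by
sd_current <- sd(resid_current)
sd_suggested <- sd(resid_suggested)
c(current = sd_current, suggested = sd_suggested)
```

A larger spread in the residual means a larger denominator in the ESD statistic, so genuine outliers look less extreme when the trend is left in.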

case when max_outliers = 0

If the user is running AnomalyDetectionTs() we can assume that they are looking for outliers. Therefore, could a warning be thrown if the user sets a percentage (max_anoms) that results in max_outliers being 0?

On a similar note, I think this degenerate case demonstrates that it is safer to iterate over seq_len(max_outliers) rather than 1:max_outliers.
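
A short base R illustration of why the two forms differ when max_outliers is 0. The counter variables are only there to make the number of loop iterations visible.

```r
max_outliers <- 0L

# 1:0 evaluates to c(1, 0), so the loop body runs twice
visits_colon <- 0L
for (i in 1:max_outliers) visits_colon <- visits_colon + 1L

# seq_len(0) is an empty vector, so the loop body never runs
visits_seq <- 0L
for (i in seq_len(max_outliers)) visits_seq <- visits_seq + 1L

c(colon = visits_colon, seq_len = visits_seq)
```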

Consider changing 'plot.new()' to 'NULL' in the output when 'plot = FALSE'

Lines 287-291 of vec_anom_detection.R (and similarly in ts) are:

 if(plot){
    return (list(anoms = anoms, plot = xgraph))
  } else {
    return (list(anoms = anoms, plot = plot.new()))
  }

Consider changing plot = plot.new() to plot = NULL, or removing it altogether. When used in a non-interactive environment (such as a Shiny app), this can run into issues with plot devices, since plot.new() interactively builds a plot.

The current workaround is to set plot = TRUE in the function call and then just never use the plot in your code but it would be cleaner to change this in the code directly.

Error message: Error in if (gran >= 86400) { : missing value where TRUE/FALSE needed

I'm trying to use AnomalyDetectionTs() exactly as described in the example, but with my own data.
When I execute this command:

res = AnomalyDetectionTs(my_data, max_anoms=0.02, direction='both', plot=TRUE)

I get the following error:

Error in if (gran >= 86400) { : missing value where TRUE/FALSE needed

This is what my data looks like:

str(my_data)
'data.frame': 3841 obs. of 2 variables:
$ INFO_DATE : POSIXct, format: "2015-01-11 00:01:21" "2015-01-11 00:20:55" ...
$ QUANTITY : int 5881 9565 11268 12376 12983 13454 13956 14409 15613 21024 ...

Do you have any idea how to solve this problem?

Issues using daily data with the "long_term" option

I'm not sure that this package was meant to be used on daily data, as Twitter seems to be using it for very granular minutely data. But anyways, here are the issues I've encountered

Data Set: Daily timestamp/count pairs for the past two years (so around 730 rows)

With "long_term=true" and daily data (therefore "gran=day" "period = 7"), AnomalyDetectionTs will split the dataset into two week periods of 14 rows for each day. (ts_anom_detection.R, lines 168-177)

This causes two issues:

1. detect_anoms is passed a dataset of 14 rows and num_obs_per_period of 7, which causes the STL function to throw the error "stl : series is not periodic or has less than two periods"

stl(ts(data[[2]], frequency=num_obs_per_period), s.window="periodic", robust=TRUE)
(detect_anoms.R, line 33)

I think this happens for one of two reasons. One, the STL function needs the dataset to have at least 2*frequency + 1 observations, which is a given for minutely/hourly data in a two-week period, but not for daily data (14 days in two weeks). Two, it could happen when the last two-week subset is shorter than two weeks. For example, 53 weeks of data with long_term enabled will create 26 two-week intervals and 1 one-week interval; the last one-week interval will throw "series is not periodic or has less than two periods" when passed into STL.

2. max_anoms on two-week intervals of daily data will always end up being 0 (0.02 * 14 days = 0.28, which ends up as 0), unless you have a very large max_anoms. Two-week periods are probably too small for daily data.

Apologies if the expectation was to fix the issues and create a pull-request :), I'm not sure if the S-H-ESD is meant to be used on daily data.

-Arwin from Adroll
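
The arithmetic in this report can be checked with a few lines of base R. This is a sketch, not the package's code: the window length comes from the description above, and the assumption that the per-window cap is the truncated product of window size and max_anoms is mine.

```r
n_days <- 53 * 7     # 53 weeks of daily data
chunk_len <- 14      # two-week windows, as described above
max_anoms <- 0.02

# Start index and size of each window; the last one may be shorter
starts <- seq(1, n_days, by = chunk_len)
chunk_sizes <- pmin(chunk_len, n_days - starts + 1)
table(chunk_sizes)

# Assumption: the cap per window is trunc(size * max_anoms)
max_outliers <- trunc(chunk_sizes * max_anoms)
unique(max_outliers)
```

Under these assumptions every window gets a cap of 0, and the final 7-day window holds only one weekly period.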

License for AnomalyDetection

Is there any chance of this going under another open source license, say MIT?

AnomalyDetection Installation Failed

Hey, i've had trouble installing AnomalyDetection, the code that it returns is below. R Version 3.1.2, Windows Version is Windows 7 Service Pack 1.

Code is below

devtools::install_github('twitter/AnomalyDetection')
Downloading github repo twitter/AnomalyDetection@master
Installing AnomalyDetection
"C:/R/R-3.1.2/bin/x64/R" --vanilla CMD INSTALL "C:\Users\Colin Glaes\AppData\Local\Temp\RtmpWsGLMv\devtools2464d8b3899\twitter-AnomalyDetection-4eb1baf"
--library="C:/R/R-3.1.2/library" --install-tests

installing source package 'AnomalyDetection' ...
** R
** data
*** moving datasets to lazyload DB
** tests
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
ARGUMENT 'Glaes\AppData\Local\Temp\Rtmpsnxpqp\Rin363c5c64245' ignored

Error: object 'ÿþ' not found
Execution halted
*** arch - x64
ARGUMENT 'Glaes\AppData\Local\Temp\Rtmpsnxpqp\Rin363c574477c2' ignored

Error: object 'ÿþ' not found
Execution halted
ERROR: loading failed for 'i386', 'x64'

removing 'C:/R/R-3.1.2/library/AnomalyDetection'
Error: Command failed (1)

Issue - period length for time series decomposition

Hello team,

I started exploring this package and I am stuck.
I have a data.frame which contains some parameter values captured every 15 minutes, hence 96 records per day. I have data for 27 days.

I get the below error when I try to run:

> names(a)
[1] "DTime"          "Paramter"


> unique(as.Date(a$DTime))
 [1] "2016-06-27" "2016-06-28" "2016-06-29" "2016-06-30" "2016-06-09"
 [6] "2016-06-10" "2016-06-11" "2016-06-12" "2016-06-13" "2016-06-14"
[11] "2016-06-15" "2016-06-16" "2016-06-17" "2016-06-18" "2016-06-19"
[16] "2016-06-20" "2016-06-21" "2016-06-22" "2016-06-23" "2016-06-24"
[21] "2016-06-25" "2016-06-26" "2016-07-01" "2016-07-02" "2016-07-03"
[26] "2016-07-04" "2016-06-08"

> head(a)
                DTime Paramter
1 2016-06-27 00:00:00          13.03
2 2016-06-27 00:15:00           1.58
3 2016-06-27 00:30:00           1.39
4 2016-06-27 00:45:00           1.61
5 2016-06-27 01:00:00           6.99
6 2016-06-27 01:15:00           1.71

> AnomalyDetectionTs(a,   max_anoms = 0.01)
Error in detect_anoms(all_data[[i]], k = max_anoms, alpha = alpha, num_obs_per_period = period,  :
  must supply period length for time series decomposition

I tried longterm = T but it didn't help. Please let me know how to solve this.
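
One possible workaround, assuming the seasonality is daily, is to derive the number of observations per day from the timestamps and pass it explicitly as period to AnomalyDetectionVec. The sketch below uses invented timestamps at the same 15-minute spacing; the final call is left as a comment and has not been run against this data.

```r
# Hypothetical timestamps: 27 days sampled every 15 minutes
dtime <- seq(as.POSIXct("2016-06-08 00:00:00", tz = "UTC"),
             by = "15 min", length.out = 96 * 27)

# Sort first: out-of-order rows would distort the estimated sampling interval
dtime <- sort(dtime)
step_secs <- median(diff(as.numeric(dtime)))

# Observations per day, assuming a daily seasonal cycle
period <- as.integer(round(86400 / step_secs))
period

# With the data frame sorted by DTime, something like:
# AnomalyDetectionVec(a[[2]], max_anoms = 0.01, period = period)
```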

Error Messages from AnomalyDetection: R_idx <- data[[1]][temp_max_id] and if(R > lam)

Hey guys,

I've run some numbers through the library and run into a few issues.

When I was running a large number of datasets through the package, I kept getting the error message below, referencing line 89 of "detect_anoms.R":

Error in R_idx[i] <- data[[1]][temp_max_idx] : replacement has length zero

I noticed that the datasets that tripped the error were near-constant time series with a low number of unique values (e.g. constant 0's for 1 month, then constant 1's for 2 months), so I set a minimum number of unique values for the dataset to get around it (I started at one and went up to nine). This allowed more datasets to get through, but I eventually ran into the error message below, referencing line 101, when I set the minimum number of unique values to 9.

Error in if (R > lam) num_anoms <- i :
missing value where TRUE/FALSE needed

I successfully ran all my datasets after setting the minimum number of unique values to 10. However, I would like to know whether it is possible to run the package without this threshold.

Thanks!

AnomalyDetectionVec Vector needs to be periodic?

I got this error when running AnomalyDetectionVec():

Error in stl(ts(data[[2L]], frequency = num_obs_per_period), s.window = "periodic",  : 
              series is not periodic or has less than two periods

which is triggered from running this stl(ts(c(2,4,5,5,4,3)), s.window = "periodic")

It looks like this vector needs to be periodic? Any workaround for this? Let me know and I might be able to help you make changes in the code.

Thanks,
Yuan
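
The error comes from stl itself rather than from the detector. As far as I can tell, stl needs a series frequency of at least 2 and more than two full periods of observations. The sketch below reproduces the failure and shows a series that passes; can_decompose is a hypothetical helper that mirrors that condition.

```r
# Hypothetical helper mirroring the stl() precondition
# (assumption: frequency >= 2 and n > 2 * frequency)
can_decompose <- function(n, period) period >= 2 && n > 2 * period

short <- c(2, 4, 5, 5, 4, 3)

# ts() defaults to frequency = 1, so stl() rejects the series
res_short <- tryCatch(stl(ts(short), s.window = "periodic"),
                      error = function(e) conditionMessage(e))
res_short

# Four repetitions with an explicit frequency of 6 can be decomposed
long <- rep(short, 4)
res_long <- stl(ts(long, frequency = 6), s.window = "periodic")

can_decompose(length(short), 1)   # FALSE
can_decompose(length(long), 6)    # TRUE
can_decompose(14, 7)              # FALSE: exactly two periods is not enough
```

So the vector does not have to be perfectly periodic, but it must come with a period of at least 2 and span more than two of those periods.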

period problem with AnomalyDetectionTs

Hi everybody,

After successfully running the example, I created my own data set, myData, which has the same structure as raw_data. There are still two differences:

It contains missing values in the second column (raw_data has no missing values)
The timestamps cover just one day, at 15-second intervals (raw_data has a 5-day history at one-minute intervals)

It looks like:
1 1970-01-01 01:00:55 NA
2 1970-01-01 01:00:10 NA
3 1970-01-01 01:00:25 2.871
4 1970-01-01 01:00:40 2.654
5 1970-01-01 01:00:55 3.060
6 1970-01-01 01:00:10 9.074

After I run the same command as in the example:

res = AnomalyDetectionTs(myData, max_anoms=0.02, direction='both', plot=TRUE)

I got the error message:

Error in detect_anoms(all_data[[i]], k = max_anoms, alpha = alpha, num_obs_per_period = period, : must supply period length for time series decomposition

How can I fix this problem?

If I don't know the period, can I still find the anomalies?

Thanks very much for the great work!

Best Regards

Conny

Anom detection needs at least 2 periods worth of data

str(bar)
'data.frame': 506 obs. of 2 variables:
$ timestamp: POSIXct, format: "2014-08-25 00:00:00" "2014-08-25 00:10:00" ...
$ count : num 40465895 54157589 34727655 38576160 36686470 ...

res = AnomalyDetectionTs(bar, direction='both', max_anoms=0.02, plot=TRUE)
Error in detect_anoms(all_data[[i]], k = max_anoms, alpha = alpha, num_obs_per_period = period, :
Anom detection needs at least 2 periods worth of data

What's the definition of period here? The data contains a time series for about 4 days with granularity of 10 minutes.

Posting the data frame "bar" here
https://www.dropbox.com/s/1j263k6srq18qpp/bar.Rda?dl=0

Cannot detect anomaly with custom dataset

I keep getting the error below when trying to detect anomalies with a custom data set that contains only a list of <timestamp, integer> pairs:
res = AnomalyDetectionTs(data, max_anoms=0.1, direction='both', plot=TRUE)
Error in detect_anoms(all_data[[i]], k = max_anoms, alpha = alpha, num_obs_per_period = period, : must supply period length for time series decomposition

Below are the first few lines of my data set:

                     date   size
1     2014-11-09 03:39:31  19512
2     2014-11-09 03:42:20   5308
3     2014-11-09 03:46:14      0
4     2014-11-09 03:46:15   5270
5     2014-11-09 03:50:19    822
6     2014-11-09 03:52:58   5319
7     2014-11-09 03:53:23   5379
8     2014-11-09 03:53:23    266
9     2014-11-09 03:53:23     21
10    2014-11-09 03:53:23   7199
11    2014-11-09 03:53:23  15414
12    2014-11-09 03:53:23  95786
13    2014-11-09 03:53:24  12417
14    2014-11-09 03:53:26  29156
15    2014-11-09 03:53:27    462
16    2014-11-09 04:00:28      0
17    2014-11-09 04:00:29   5270
18    2014-11-09 04:01:54  51491
19    2014-11-09 04:02:05   5326
20    2014-11-09 04:06:10  47288

Error when running AnomalyDetection with Rscript

Hello,

When running AnomalyDetection inside R (GUI or interactive terminal) I get no errors, but when running with Rscript I get the following error:

Error in .setupMethodsTables(fdef, initialize = TRUE) :
  trying to get slot "group" from an object of a basic class ("NULL") with no slots
Calls: AnomalyDetectionTs ... getMethodsForDispatch -> .getMethodsTable -> .setupMethodsTables
Execution halted

To fix this I had to include library(methods) in my script.

Although it runs OK with this, it generates an Rplots.pdf file after each iteration, which may point to the cause of the above error.

Short time series error

Hello there,

I'm trying to apply the anomaly detection function to my time series, which consists of only 60 observations and has 2 periods.

This is how I set it up in the first place:

res <- AnomalyDetectionVec(data$value, max_anoms=0.4, period=29, 
                                           direction='neg', only_last=FALSE, 
                                           plot=TRUE)

# This is the output
$anoms
data frame with 0 columns and 0 rows
$plot
NULL

I don't know how, but I was able to run it the first time. Now I can't even get the graph; all the output variables are NULL.
I already checked whether the object's class was compatible, and it matches the class of the dataset used in the example ("raw_data").

This is the data:
vector_data.txt

Diego Della Justina, PhD

detect_anoms erroneously reports at least one anomaly, regardless of data

I think I've found a bug in detect_anoms.

Before the main loop, num_anoms is initialized to 0.

At the end of each iteration, you update num_anoms if R is greater than lambda.

Then after the loop, you return R_idx[1L:num_anoms].

So if no elements made R exceed lambda, the return value works out to R_idx[1L:0L]. But this range subscript gives you the first element, not an empty vector:

> foo = c(4,5,6,7)
> foo[1:0]
[1] 4
>

So won't it always report the most extreme value as an outlier, no matter what data you give it? (Of course the user won't see this if they've set a threshold in AnomalyDetection, but they might not do that...)
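
A base R sketch of the subscript behaviour described above, next to a form that returns an empty vector when nothing was flagged. The variable names follow the report; the values are the ones from the foo example.

```r
r_idx <- c(4, 5, 6, 7)
num_anoms <- 0L

# 1L:0L is c(1, 0); the zero index is dropped, so the first element survives
picked_colon <- r_idx[1L:num_anoms]

# seq_len(0) is empty, so nothing is selected
picked_seq <- r_idx[seq_len(num_anoms)]

picked_colon
picked_seq
```

Using seq_len(num_anoms) (or head(r_idx, num_anoms)) keeps the result empty when num_anoms is 0 and is unchanged for positive counts.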

Error while running from Rscript

AnomalyDetectionTs works fine in the R GUI; however, if I run it as a script I get the following error:

Error in initFields(scales = scales) :
could not find function "initRefFields"
Calls: AnomalyDetectionTs ... initialize -> initialize -> <Anonymous> -> initFields
Execution halted

The same thing in the script worked fine in the GUI.

Trivial anomalies are NOT detected

x = 1:5000
x[4900:4910] = 3000
AnomalyDetectionVec(x, period=1440, direction = 'both', e_value = T, plot = T)

I get the following disappointing result:
$anoms
data frame with 0 columns and 0 rows

Removing leading NA's and subtracting the median

Hi guys,

I came across the package which looks great. I have the following 2 questions on the code in 'detect_anoms.R':

In line 51, any leading NA's are replaced by 1. Shouldn't it be 0 (zero)?
In line 37, the median is subtracted from the data. In lines 72-80, the median is subtracted again. Is this correct?

I don't know the details of 'S-H-ESD' algorithm, so excuse me if I'm wrong!

Thanks!

Documentation to describe all dependencies needed to install the R package

I am on a private network disconnected from the internet and I would like to install the R AnomalyDetection packages. Installing local on my laptop from the internet seems to pull in a bunch of other packages. It would be really really useful if there was documentation on the exact packages I would need to transfer in order to install.

I'm also new to R so maybe it's possible there's some equivalent in 'install.packages()' similar to maven's "copy-dependencies" where I can put everything in a folder and tar it up.

Using AnomalyDetection in parallel or in any forked environment fails

Using AnomalyDetection in parallel across a data.frame currently fails with the following error:

Error in (function (display = "", width, height, pointsize, gamma, bg,  :  
    a forked child should not open a graphics device

Here is a trivial example to reproduce the problem:

library(parallel)
library(AnomalyDetection)
mclapply(as.data.frame(ts.union(BJsales, BJsales.lead)), AnomalyDetectionVec, period = 5)

Which produces the above errors.

Setting e_value=T causes "differing number of rows" error

Hi, great package.

However, when trying to extract the expected values from my dataset, I get this error:

## a_data holds daily count observations
> str(a_data)
'data.frame':   30 obs. of  2 variables:
 $ date  : POSIXct, format: "2013-01-15 01:00:00" "2013-01-16 01:00:00" "2013-01-17 01:00:00" ...
 $ metric: num  192 123 196 193 172 195 123 158 103 115 ...

## works
> AnomalyDetectionTs(a_data, max_anoms=0.02, direction='both')
$anoms
   timestamp anoms
1 2013-01-20   195

$plot
NULL

## error
> AnomalyDetectionTs(a_data, max_anoms=0.02, direction='both', e_value = T)
Error in data.frame(timestamp = all_anoms[[1]], anoms = all_anoms[[2]],  : 
  arguments imply differing number of rows: 1, 0

The same command works fine with the demo raw_data in the package

> AnomalyDetectionTs(raw_data, max_anoms=0.02, direction='both', e_value=T)
$anoms
              timestamp    anoms expected_value
1   1980-09-25 16:05:00  21.3510            129
2   1980-09-29 06:40:00 193.1036             97
3   1980-09-29 21:44:00 148.1740             96
...

> str(raw_data)
'data.frame':   14398 obs. of  2 variables:
 $ timestamp: POSIXlt, format: "1980-09-25 14:01:00" "1980-09-25 14:02:00" "1980-09-25 14:03:00" ...
 $ count    : num  182 176 184 178 165 ...

Here is a copy of my data used above (limited to 30 rows). The original data is 900 observations.

                  date (none)
1  2013-01-15 01:00:00    192
2  2013-01-16 01:00:00    123
3  2013-01-17 01:00:00    196
4  2013-01-18 01:00:00    193
5  2013-01-19 01:00:00    172
6  2013-01-20 01:00:00    195
7  2013-01-21 01:00:00    123
8  2013-01-22 01:00:00    158
9  2013-01-23 01:00:00    103
10 2013-01-24 01:00:00    115
11 2013-01-25 01:00:00    138
12 2013-01-26 01:00:00     95
13 2013-01-27 01:00:00    121
14 2013-01-28 01:00:00    143
15 2013-01-29 01:00:00    118
16 2013-01-30 01:00:00    110
17 2013-01-31 01:00:00    107
18 2013-02-01 01:00:00    120
19 2013-02-02 01:00:00     91
20 2013-02-03 01:00:00     93
21 2013-02-04 01:00:00    149
22 2013-02-05 01:00:00    112
23 2013-02-06 01:00:00    109
24 2013-02-07 01:00:00    109
25 2013-02-08 01:00:00     90
26 2013-02-09 01:00:00     74
27 2013-02-10 01:00:00     85
28 2013-02-11 01:00:00    113
29 2013-02-12 01:00:00    107
30 2013-02-13 01:00:00    110

data.frame Column Error

I created a data.frame called foo and attempted to format it exactly like raw_data, but when I set res, I get an error.

My data.frame:

head(foo)
timestamp count
1 2015-05-11 13:54:00 42748.0
2 2015-05-11 13:55:00 44152.0
3 2015-05-11 13:56:00 43642.0
4 2015-05-11 13:57:00 42544.0
5 2015-05-11 13:58:00 41627.0
6 2015-05-11 13:59:00 42138.0

Setting res, getting an error:

res = AnomalyDetectionTs(foo, max_anoms=0.02, direction='both', plot=TRUE)
Error in AnomalyDetectionTs(foo, max_anoms = 0.02, direction = "both", :
data must be a 2 column data.frame, with the first column being a set of timestamps, and the second coloumn being numeric values.

raw_data looks quite like foo:

head(raw_data)
timestamp count
1 1980-09-25 14:01:00 182.478
2 1980-09-25 14:02:00 176.231
3 1980-09-25 14:03:00 183.917
4 1980-09-25 14:04:00 177.798
5 1980-09-25 14:05:00 165.469
6 1980-09-25 14:06:00 181.878

Any idea what I'm doing wrong?

Thanks,

Steve

Possible bug in alpha parameter input check?

Seems like a bug:
https://github.com/twitter/AnomalyDetection/blob/master/R/ts_anom_detection.R#L110

if(!(0.01 <= alpha || alpha <= 0.1))
should be:
if(!(0.01 <= alpha && alpha <= 0.1))
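
The difference between the two conditions can be verified directly. In this sketch, flags_or and flags_and are hypothetical wrappers around the two expressions quoted above; each returns TRUE when the check would fire for a given alpha.

```r
# Condition as currently written: can never be TRUE for a numeric alpha
flags_or  <- function(alpha) !(0.01 <= alpha || alpha <= 0.1)

# Proposed condition: TRUE whenever alpha is outside [0.01, 0.1]
flags_and <- function(alpha) !(0.01 <= alpha && alpha <= 0.1)

alphas <- c(0.001, 0.05, 0.5)
sapply(alphas, flags_or)    # FALSE FALSE FALSE
sapply(alphas, flags_and)   # TRUE FALSE TRUE
```

Every number is either at least 0.01 or at most 0.1, so the || version never triggers, while the && version flags only out-of-range values.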

Are parameters "use_decomp" and "use_esd" meaningless for function "detect_anoms"?

The parameters use_decomp and use_esd are not used in the implementation of the function detect_anoms. I don't know why they are kept. Can anyone explain?

Is this repo dead?

Plot is empty if any point is <= 0 and y_log is used

To reproduce:

data(raw_data)
raw_data[4000, "count"] <- 0
AnomalyDetectionTs(raw_data, y_log = F, plot = T)
AnomalyDetectionTs(raw_data, y_log = T, plot = T)

This seems to be a problem with coord_trans, as transforming the data to be plotted (after anomaly detection) and skipping add_formatted_y (it can't handle the presence of NA/NaN/Inf) will show all points with the undefined value set to 0.

Alternatively, if the transformation is only for ease of visualization,

AnomalyDetectionTs(raw_data, y_log = F, plot = T)$plot + scale_y_log10()

will transform the axis without changing the value of the data, which might be preferable. It will overwrite add_formatted_y, though.

I haven't submitted a fix since I'm not sure if this is desired behaviour or not, but I'd be happy to if it would be useful. And thank you for the awesome package!

Definition of period in AnomalyDetectionVec

Hi,

I am confused about the 'period' parameter in the function AnomalyDetectionVec. I have minute-level data for one day, which is 1440 records in total. I want to use AnomalyDetectionVec to find anomalies in the dataset. Should I set period = 24 or period = 60? Can someone explain in more detail how the period parameter works in AnomalyDetectionVec?

Thank you
Jim

Tag first release

consistent output for AnomalyDetectionTs()

If there are no anomalies detected then list(anoms = NULL, plot = NULL) is returned. Does this special case need to be made? Instead could a data frame (with 0 rows) and a plot (if plot = TRUE) still be returned for consistency?

What's a good way to choose longterm_period

Failed to install the AnomalyDetection

OS: Windows
R version: 3.2.2

When I run devtools::install_github("twitter/AnomalyDetection")

the following error is reported:
Downloading GitHub repo twitter/AnomalyDetection@master
Error in curl::curl_fetch_memory(url, handle = handle) :
Timeout was reached

What happened?

thanks
ndoors.

AnomalyDetectionTs drops timezone from POSIXct objects and converts POSIXlt to POSIXct

AnomalyDetectionTs should keep the timestamp column of the output dataset as-is, rather than converting to POSIXct and dropping the timezone attribute:

library(AnomalyDetection)
data(raw_data)
data <- raw_data
data$timestamp <- as.POSIXct(data$timestamp)
attr(data$timestamp, "tzone") 
attr(data$timestamp, "tzone") <- "America/New_York"

res = AnomalyDetectionTs(data, max_anoms=0.002, direction='both', plot=FALSE)
attr(res$anoms$timestamp, 'tzone')

Dropping the timezone is problematic if you wish to merge the anomalies back to the main dataset:

> merge(data, res$anoms, by='timestamp')
             timestamp   count    anoms
1  1980-09-28 22:40:00 114.308 193.1036
2  1980-09-30 12:26:00 130.222 180.8990
3  1980-09-30 12:30:00 126.721 178.8220
4  1980-09-30 12:31:00 152.956 198.3260
5  1980-09-30 12:32:00 136.004 203.9010
6  1980-09-30 12:33:00 134.589 200.3090
7  1980-09-30 12:34:00 122.490 178.4910
8  1980-09-30 12:36:00 126.806 183.0180
9  1980-09-30 12:38:00 117.334 186.8230
10 1980-09-30 12:39:00 121.061 183.6600
11 1980-09-30 12:40:00 116.924 179.2760
12 1980-09-30 12:41:00 129.097 197.2830
13 1980-09-30 12:42:00 119.566 191.0970
14 1980-09-30 12:43:00 137.694 194.6700
15 1980-09-30 12:46:00 136.876 200.8160
16 1980-09-30 12:47:00 125.126 186.2350
17 1980-09-30 12:48:00 122.008 185.4210
18 1980-09-30 12:49:00 127.935 178.9580
19 1980-09-30 12:51:00 138.159 203.2310
20 1980-09-30 12:52:00 130.939 181.3540
21 1980-09-30 12:53:00 122.351 186.7780
22 1980-09-30 12:55:00 121.120 176.1250
23 1980-09-30 12:56:00 122.707 181.5140
24 1980-09-30 12:57:00 118.378 175.2610
25 1980-10-05 05:18:00 101.332  40.0000
26 1980-10-05 05:28:00 103.798 250.0000
27 1980-10-05 05:38:00 100.839  40.0000

Period not set if granularity is "sec"

Hi there,

I'm trying to run an analysis for a time series that has a granularity of 20 seconds. Unfortunately, there is a bug in the code...

In AnomalyDetectionTs, I run into the following problem:
I have tested the get_gran function, and it does return "sec". Later on the data is aggregated to minutes, but unfortunately the variable gran is never set to "min". When period is defined, the switch statement has no case for "sec" (which is still the value of gran), so period remains NULL. This makes the call to detect_anoms crash with the error message "must supply period length for time series decomposition".

  # Aggregate data to minutely if secondly
  if(gran == "sec"){
    x <- format_timestamp(aggregate(x[2], format(x[1], "%Y-%m-%d %H:%M:00"), eval(parse(text="sum"))))
  }

  period = switch(gran,
                  min = 1440,
                  hr = 24,
                  # if the data is daily, then we need to bump the period to weekly to get multiple examples
                  day = 7)
  num_obs <- length(x[[2]])

  if(max_anoms < 1/num_obs){
    max_anoms <- 1/num_obs
  }

I'm pretty sure that setting gran <- "min" in the if(gran=="sec") block would fix the problem.
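
To illustrate the suggested fix, here is a minimal, self-contained sketch. The helper name period_for_gran is hypothetical and not part of the package; it only mirrors the granularity-to-period mapping quoted above, with "sec" treated as "min" because secondly data has already been aggregated to minutes at that point.

```r
# Hypothetical helper (not part of AnomalyDetection) that mirrors the
# switch statement quoted above, with the proposed "sec" handling added.
period_for_gran <- function(gran) {
  # Secondly data is aggregated to minutes earlier on,
  # so the granularity should be updated to match.
  if (gran == "sec") {
    gran <- "min"
  }

  # switch() returns NULL when no branch matches and there is no default.
  period <- switch(gran,
                   min = 1440,
                   hr = 24,
                   day = 7)

  if (is.null(period)) {
    stop("must supply period length for time series decomposition")
  }
  period
}

period_for_gran("sec")
```

With this change, "sec" resolves to the minutely period instead of leaving period as NULL.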

Error: "data must be a single data frame, list, or vector that holds numeric values" When data is a dataframe in correct format...

Hey everyone.

I am trying to run my dataframe through AnomalyDetectionVec(). My dataframe is a small one, for now, and I believe it is in the correct format.

Here is the dataframe:

> str(es_out)

'data.frame':   500 obs. of  2 variables:
 $ timestamp_list: POSIXct, format: "2015-07-23 04:10:56" "2015-07-23 04:10:51" "2015-07-23 04:11:11" ...
 $ in_bytes_list : int  3893 3893 2335 2319 3893 125 71 71 52 657 ...

When I try to run it through AnomalyDetectionVec(), I get an error:

> AnomalyDetectionVec(es_out, period=500, plot=TRUE, verbose=TRUE)

Error in AnomalyDetectionVec(es_out, period = 500, plot = TRUE, verbose = TRUE) : 
  data must be a single data frame, list, or vector that holds numeric values.

What is going wrong here? I cannot seem to figure it out...

Here is a dput() of my dataset and my dataframe conversion function in a pastebin, for cleanliness.

Dataset dput(): http://pastebin.com/WkY7pvwt
Dataframe conversion function: http://pastebin.com/EsAcVNbV

Any help would be greatly appreciated. As far as I can tell, my dataframe is in the correct format, but I guess it actually isn't.

Thanks!
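
One possible cause, offered as an assumption based on the wording of the error message rather than a confirmed diagnosis: AnomalyDetectionVec() seems to expect a bare numeric vector or a single-column data frame, whereas es_out has two columns, one of them POSIXct. The sketch below uses made-up data shaped like es_out and extracts only the numeric column. The call to AnomalyDetectionVec() is left commented out so the snippet runs without the package, and period = 50 is an arbitrary illustrative value.

```r
# Made-up stand-in for es_out: a POSIXct column plus an integer column.
es_out <- data.frame(
  timestamp_list = as.POSIXct("2015-07-23 04:10:00", tz = "UTC") +
    seq(0, by = 5, length.out = 500),
  in_bytes_list = as.integer(round(1000 + 500 * sin(seq_len(500) / 8)))
)

# Keep only the observations, as a plain numeric vector.
in_bytes <- as.numeric(es_out$in_bytes_list)
str(in_bytes)

# Not run here (requires the AnomalyDetection package); period = 50 is an
# illustrative assumption, not a recommendation:
# res <- AnomalyDetection::AnomalyDetectionVec(in_bytes, period = 50, plot = FALSE)
```

If the timestamps matter for the analysis, AnomalyDetectionTs() is the variant that takes a timestamp column alongside the values.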

Error in if (data_sigma == 0) break :

Below is a sample of weekly data for a metric. I want to detect anomalies in this time series. The error I encounter while running the code is:
Error in if (data_sigma == 0) break :
missing value where TRUE/FALSE needed
1 2013-01-01 59.94
2 2013-01-08 59.65
3 2013-01-15 61.56
4 2013-01-22 58.37
5 2013-01-29 58.07
6 2013-02-05 57.31
7 2013-02-12 58.53
8 2013-02-19 63.22
9 2013-02-26 60.21
10 2013-03-05 59.09
11 2013-03-12 57.19
12 2013-03-19 55.97
13 2013-03-26 59.96

Error when granularity is daily and only_last is null

There seems to be a bug for data with daily granularity. If gran == "day", AnomalyDetectionTs does a check:

if(only_last == 'hr')

However, only_last can also be NULL. If it is, this check generates an error that keeps AnomalyDetectionTs from finishing:

Error in if (only_last == "hr") { : argument is of length zero

I submitted a pull request which fixes the problem by checking only_last for null first.
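
The pull request itself is not reproduced here, so the following is only a sketch of the kind of guard described, using a hypothetical helper named is_last_hour. It relies on && short-circuiting, so the comparison is never evaluated when only_last is NULL.

```r
# Hypothetical helper (not part of AnomalyDetection): TRUE only when
# only_last is supplied and equals "hr".
is_last_hour <- function(only_last = NULL) {
  # is.null() is checked first; && stops there when only_last is NULL,
  # which avoids the "argument is of length zero" error.
  !is.null(only_last) && only_last == "hr"
}

is_last_hour(NULL)
is_last_hour("hr")
is_last_hour("day")
```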

Also, thanks for a super package.

Failed to Install AnomalyDetection

I am on Windows 8. In R 3.1.1 x64, I get the following error:

Warning in library(pkg_name, lib.loc = lib, character.only = TRUE, logical.return = TRUE) :
no library trees found in 'lib.loc'
Error: loading failed
Execution halted
*** arch - x64
Warning in library(pkg_name, lib.loc = lib, character.only = TRUE, logical.return = TRUE) :
no library trees found in 'lib.loc'
Error: loading failed
Execution halted
ERROR: loading failed for 'i386', 'x64'

I also tried installing from R in a Cygwin bash shell and got the following error:

Downloading github repo twitter/AnomalyDetection@master
Installing AnomalyDetection
Error in parse_deps(paste(deps, collapse = ",")) :
Invalid comparison operator in dependency:

(the dependency is left blank)

High frequency data sets anomaly detection

I am trying to perform anomaly detection on a data set with a very high frequency (more than 5/10 rows per second), and the timestamps are not consecutive (sometimes there is no row for a second).

Example: 09:23:59 2014-12-19 09:23:59 2014-12-19 09:24:00 2014-12-19 09:24:00 2014-12-19 09:24:02 2014-12-19 09:24:02 2014-12-19 09:24:02

I understand that I should use AnomalyDetectionTs to perform the detection on this type of set.

My set has 50K rows, but the function cannot compute the detection and crashes. Maybe this is also because the observations are not spaced at a fixed interval (sometimes 1 second apart, sometimes 0, or even 2 seconds)?

What are your recommendations for working with this type of dataset?

Thanks,

Flow
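
One option for data like this, offered as a sketch rather than a confirmed recommendation from the maintainers, is to aggregate the rows into regular one-minute buckets before running the detection, so each timestamp appears once and the spacing is fixed. The timestamps below are taken from the example above; the count values are made up, and the column names are assumptions.

```r
# Timestamps from the example above; the counts are invented for illustration.
raw <- data.frame(
  timestamp = as.POSIXct(c(
    "2014-12-19 09:23:59", "2014-12-19 09:23:59",
    "2014-12-19 09:24:00", "2014-12-19 09:24:00",
    "2014-12-19 09:24:02", "2014-12-19 09:24:02",
    "2014-12-19 09:24:02"
  ), tz = "UTC"),
  count = c(3, 5, 2, 4, 1, 6, 2)
)

# Truncate each timestamp to the start of its minute.
raw$minute <- as.POSIXct(trunc(raw$timestamp, units = "mins"))

# Sum the counts within each minute: one row per minute.
per_min <- aggregate(count ~ minute, data = raw, FUN = sum)
print(per_min)
```

Whether sum, mean, or another summary is the right aggregate depends on what the metric represents. Minutes with no rows at all would still be missing after this step and would need to be filled separately.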

License Clarification

Hello,

The DESCRIPTION file licenses this project under Apache 2.0:
https://github.com/twitter/AnomalyDetection/blob/master/DESCRIPTION#L15

However, the LICENSE file indicates that Twitter, Inc. licenses this project under GPLv3:
https://github.com/twitter/AnomalyDetection/blob/master/LICENSE

The README file specifies that Twitter, Inc. and an unspecified list of "other contributors" license this project under GPLv3 as well:
https://github.com/twitter/AnomalyDetection#copyright-and-license

Can you please clarify the license under which this project is released and the copyright owner(s)? Thanks!
