earthgecko / skyline

Anomaly detection

Home Page: http://earthgecko-skyline.readthedocs.io/en/latest/

License: Other

Shell 3.48% Python 90.47% JavaScript 0.55% CSS 0.17% HTML 5.32% Dockerfile 0.01%
anomaly detection anomaly-detection python timeseries timeseries-analysis graphite influxdb prometheus telegraf

skyline's Introduction

Skyline

Skyline is a real-time anomaly detection, time series analysis and performance monitoring system, built to enable passive monitoring of metrics without the need to configure a model or thresholds for each one. It is designed to be used wherever there is a large quantity of high-resolution time series that need constant monitoring. Once a metrics stream is set up, additional metrics are automatically added to Skyline for analysis. Skyline's algorithms attempt to automatically detect what it means for each metric to be anomalous. Once set up and running, Skyline allows the user to train it on what is not anomalous, on a per-metric basis.

Documentation

Skyline documentation is available online at http://earthgecko-skyline.readthedocs.io/en/latest/

The documentation for your version is also viewable locally in a clone, in your browser at file://<PATH_TO_YOUR_CLONE>/docs/_build/html/index.html and via the Skyline Webapp frontend via the docs tab.

Free Managed Service

Anomify is a cutting-edge version of Skyline, built and managed by the team behind Skyline. With a brand new dashboard, a full-spec API, and an intuitive UI, it will help you and your organisation unlock the full power of Skyline and more. Currently we're offering it as a free service for Skyline users. Find out more at https://anomify.ai/skyline

Other

https://gitter.im/earthgecko-skyline/Lobby

skyline's People

Contributors

asapien, bflad, blak3r, cbowns, chr4, dependabot[bot], dmitrychait, draco2003, dvirm, earthgecko, gescheit, jonlives, lihengthu, mabrek, maxgabriel, oxtopus, shz, ssgelm, vbichov, vinicius0026, wfloutier, xiongchiamiov, zerobfd


skyline's Issues

seed_data.py testing with UDP does not work

seed_data.py testing with UDP does not work if the HORIZON_IP is not bound to 0.0.0.0

The logic of using a socket UDP test is flawed as it will always pass:

    test_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    try:
        # sendto on a UDP socket does not confirm delivery, so this call
        # always succeeds and horizon_params_ok is always set to True
        test_sock.sendto(packet, (socket.gethostname(), settings.UDP_PORT))
        horizon_params_ok = True
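
A minimal demonstration of why the test always passes (the port number here is an arbitrary assumption with nothing listening on it):

    # sendto on an unconnected UDP socket succeeds even when nothing is
    # listening on the destination port; the datagram is handed to the
    # kernel, not delivered
    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sent = sock.sendto(b'test', ('127.0.0.1', 59999))  # assume no listener here
    print(sent)  # prints 4, even though nothing received the datagram
    sock.close()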

ANALYZER_ENABLED False - rename Redis keys error

This is an edge case issue.

With the introduction of ANALYZER_ENABLED, an error is reported when ANALYZER_ENABLED is set to False: the renaming of the Redis keys new_derivative_metrics and new_non_derivative_metrics reports an error. The error is technically correct; however, in this configuration it is expected, as these sets are not created when Analyzer is disabled.
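
A minimal sketch of a guard, assuming the usual Skyline redis_conn and logger objects and that the keys are renamed to their non-new_ counterparts (the rename targets are assumptions):

    # only attempt the rename when Analyzer is enabled, as the new_* sets
    # are only created by a running Analyzer
    if settings.ANALYZER_ENABLED:
        for key in ['new_derivative_metrics', 'new_non_derivative_metrics']:
            try:
                redis_conn.rename(key, key.replace('new_', '', 1))
            except Exception as e:
                logger.error('error :: failed to rename Redis key %s - %s' % (key, e))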

seed_data.py SKYLINE_URL

Has a missing settings. prefix in v1.2.2

Traceback (most recent call last):
   File "seed_data.py", line 139, in <module>
     seed()
   File "seed_data.py", line 129, in seed
     print ('info :: at %s' % str(SKYLINE_URL))
 NameError: global name 'SKYLINE_URL' is not defined
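
The fix is presumably just the missing prefix on line 129 of seed_data.py:

    print ('info :: at %s' % str(settings.SKYLINE_URL))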

Webapp js - first element in the list does not load timeseries data

On the initial Panorama page load the Dygraph is not populated with data,
because the data fetch is triggered by a mouseover on an element. I believe
the initial load "takes" the first element's place, which results in a
mouseover on the first element not triggering the data fetch.

I stopped seeing this as a bug a long time ago, as your brain quickly learns to
autocorrect for it: first select any other metric in the list, which loads its
data, then select the first element in the list, which then does load its data,
and the element is once again active.

Handle z_score agent.py RuntimeWarning

This is not a bug in algorithms per se; however, on starting the service, the
tests that analyzer/agent.py runs against the algorithms can now result in the
user being issued a RuntimeWarning, which always makes people think there is a
problem. It is not a problem; it is a matter of the same exception handling not
having been applied in the agent.py algorithm tests.

Seeing as this is a modification to one of the core original algorithms, the
change is being documented.

[root@skyline-dev ~] /etc/init.d/analyzer start
Testing algorithms
/opt/skyline/github/skyline/skyline/analyzer/algorithms.py:139: RuntimeWarning: invalid value encountered in double_scalars
  z_score = (tail_average - mean) / stdDev
Testing algorithms done
analyzer started with pid 17303
                                                           [  OK  ]
[root@skyline-dev ~]

Reproducible when executed in the same context as agent.py

>>> import numpy as np
>>> import scipy
>>> import statsmodels.api as sm
>>>
>>> timeseries = [[1479741232.0, 1], [1479741233.0, 1], [1479741234.0, 1], [1479741235.0, 1], [1479741236.0, 1], [1479741237.0, 1], [1479741238.0, 1], [1479741239.0, 1]]
>>>
>>> try:
...     t = (timeseries[-1][1] + timeseries[-2][1] + timeseries[-3][1]) / 3
...     print('pass')
...     print(str(t))
... except IndexError:
...     print('fail')
...     t = timeseries[-1][1]
...
pass
1
>>> series = scipy.array([x[1] for x in timeseries])
>>> stdDev = scipy.std(series)
>>> print(stdDev)
0.0
>>>
>>> mean = np.mean(series)
>>> print(mean)
1.0
>>>
>>> tail_average = t
>>> print(tail_average)
1
>>>
>>> z_score = (tail_average - mean) / stdDev
__main__:1: RuntimeWarning: invalid value encountered in double_scalars
>>> print(z_score)
nan
>>>
>>> len_series = len(series)
>>> print(len_series)
8
>>>
>>> threshold = scipy.stats.t.isf(.05 / (2 * len_series), len_series - 2)
>>> print(threshold)
4.1151701176
>>>
>>>
>>>
>>> threshold_squared = threshold * threshold
>>> print(threshold_squared)
16.9346250968
>>>
>>> grubbs_score = ((len_series - 1) / np.sqrt(len_series)) * np.sqrt(threshold_squared / (len_series - 2 + threshold_squared))
>>> print(grubbs_score)
2.1266450872
>>>
>>> if z_score > grubbs_score:
...     print('True')
... else:
...     print('False')
...
False
>>>
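
One way to handle this in the agent.py tests is to guard the zero standard deviation case before dividing; a minimal sketch using the variables from the session above:

    # a flat series has stdDev == 0.0, and dividing by it produces the
    # RuntimeWarning and a nan z_score, so guard it first
    if stdDev == 0:
        z_score = 0.0  # no deviation from the mean is possible
    else:
        z_score = (tail_average - mean) / stdDev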

Panorama dygraph not aligning correctly

The Panorama dygraph is not aligning correctly, which is caused by the Panorama
timeshift and the user's time zone.

The timeshift was not proportional across the multiple possible full_duration
values. It used a hard-coded / 24 in the calculation of the Graphite data
retrieval parameters, inherited from the old skyline.js that panorama.js was
patterned on.

Time zone - the dygraphs are compiled in the user's time zone, not the time
series' time zone. This has always been a problem, just never really seen: the
original Skyline UI was probably only looking at the last 24 hours of Redis
data, so with no timeshifting it was never really a problem, although it would
have applied if your Skyline server was in a different time zone from your
browser.

Add additional exception handling to Analyzer

There has been a new report of Analyzer stalling with nothing in the log.

The extensions and modifications of Analyzer in previous changes did not add except blocks to anything that did not already have one, whereas all newer additions to Skyline have used except blocks much more extensively.

except blocks are to be added to Analyzer wherever required, in a manner suitable to Analyzer: not vomiting into the log, but handling failures more gracefully.
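
A minimal sketch of the intended pattern; redis_conn and logger are assumed to be the usual Skyline app objects and the key name is illustrative:

    # fail gracefully with a single error line rather than an unhandled
    # exception that stalls the process
    try:
        derivative_metrics = list(redis_conn.smembers('derivative_metrics'))
    except Exception as e:
        derivative_metrics = []
        logger.error('error :: failed to get derivative_metrics from Redis - %s' % e)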

alert on stale metrics

Add functionality for Skyline to send a digest alert for metrics that stop sending data. The digest alert will be sent only once per metric, while the metric is between ALERT_ON_STALE_PERIOD and STALE_PERIOD old, with a key set to expire after STALE_PERIOD once the metric has been alerted on. After STALE_PERIOD has passed, the alert evaluation is no longer done, ensuring that these alerts are not noisy.
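
A minimal sketch of the proposed evaluation, with hypothetical key naming; redis_conn, settings, timeseries and base_name are assumed to be the usual Analyzer objects:

    # alert once per metric while it is between ALERT_ON_STALE_PERIOD and
    # STALE_PERIOD old
    age = int(time()) - int(timeseries[-1][0])
    if settings.ALERT_ON_STALE_PERIOD < age < settings.STALE_PERIOD:
        stale_key = 'analyzer.stale_digest_alert.%s' % base_name
        if not redis_conn.get(stale_key):
            stale_metrics_to_alert_on.append(base_name)
            # the key expiry of STALE_PERIOD ensures one digest alert per metric
            redis_conn.setex(stale_key, settings.STALE_PERIOD, int(time()))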

Panorama check file fails

This has always been an occasional error and has not been successfully debugged as of yet. It is related to the occasional errors reported about Panorama not being able to determine the from_timestamp variable from the metric check file. Strangely, Mirage does not suffer from this load_metric_vars bug.

 error :: failed to read from_timestamp variable from check file

It is intermittent, but it is a bug and needs to be debugged further. In progress.

[root@skyline-dev-3 ~] cat /var/log/skyline/panorama.log | grep -c "error :: failed to read"
25
[root@skyline-dev-3 ~] cat /var/log/skyline/panorama.log | grep -c "loading metric variables from import"
840
[root@zpf-skyline-dev-3 ~]

Traceback in develop

2016-08-22 16:42:05 :: 7874 :: Traceback (most recent call last):
  File "/opt/skyline/github/skyline/skyline/panorama/panorama.py", line 297, in spin_process
    metric_vars.from_timestamp
AttributeError: 'module' object has no attribute 'from_timestamp'

2016-08-22 16:42:05 :: 7874 :: error :: failed to read from_timestamp variable from check file - /opt/skyline/panaroma/check/1471884121.stats.statsd.graphiteStats.flush_length.txt

In develop, Mirage has been moved over to the skyline_functions.py
load_metric_vars method in an attempt to remove all the previous
global metric_vars declarations, but this has not resolved the issue; it
persists with Panorama.
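
Until the root cause is found, a defensive check is one hedge; a minimal sketch, with the surrounding handling assumed:

    # verify the loaded module has the expected attribute before use,
    # rather than raising AttributeError mid-process
    metric_vars = load_metric_vars(skyline_app, str(metric_check_file))
    if not hasattr(metric_vars, 'from_timestamp'):
        logger.error(
            'error :: failed to read from_timestamp variable from check file - %s' % (
                str(metric_check_file)))
        # move the check file aside here rather than retrying it forever
    else:
        from_timestamp = metric_vars.from_timestamp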

Luminosity's possible process preventing condition

At the section below in the luminosity.py spin_process function, there is a possible, noisy condition I detected that prevents Luminosity processing. This is probably not faced by environments which have been processing their anomalies without data loss. I previously loaded test metrics and removed most of them, and probably cleaned up some other metrics too, during Skyline processing. This causes the Luminosity process to get stuck at this point, since the metrics cannot be found in the DB:

if not correlated_metrics:
    logger.info('no correlations found for %s anomaly id %s' % (
        base_name, str(anomaly_id)))
    return False

So this section always returns from the spin process without marking the found unprocessed anomaly_id as processed; there was data loss or a cleanup of some kind, but the anomaly_id was left in the DB. The reference anomaly_id is actually taken from this section:

if not last_processed_anomaly_id:
    query = 'SELECT id FROM luminosity WHERE id=(SELECT MAX(id) FROM luminosity) ORDER BY id DESC LIMIT 1'
    results = None
    try:
        results = mysql_select(skyline_app, query)
    except:
        logger.error(traceback.format_exc())
        logger.error('error :: MySQL query failed - %s' % query)
    if results:
        try:
            last_processed_anomaly_id = int(results[0][0])
            logger.info('last_processed_anomaly_id found from DB - %s' % str(last_processed_anomaly_id))
        except:
            logger.error(traceback.format_exc())

Due to the data loss in terms of metrics, we should alter this query: SELECT id FROM luminosity WHERE id=(SELECT MAX(id) FROM luminosity) ORDER BY id DESC LIMIT 1.
In any case, this query should simply be rewritten, since we do not need a subselect here: SELECT MAX(id) FROM luminosity.

But I am still uncomfortable with this query, which does not properly handle the unprocessed Luminosity anomaly ids. First of all, MAX(id) does not guarantee the latest unprocessed anomaly id, as you know, because sequence primary ids in a table can be reissued from previously deleted record ids if those are available at that moment.

I tried the following query after the above one in my environment; it works and does not get the spin process stuck on a non-processable anomaly id due to non-existent metrics:

now = int(time())
after = now - 600
query = 'SELECT id FROM anomalies WHERE id NOT IN (SELECT DISTINCT id FROM luminosity) AND anomaly_timestamp > \'%s\' ORDER BY anomaly_timestamp ASC LIMIT 1' % str(after)

The time range can be tuned to an optimum window, but this is closer to an ideal DB query. I am still uncomfortable with the condition WHERE id NOT IN (SELECT DISTINCT id FROM luminosity), since the luminosity table has a continuously increasing record count and this would cause a performance decrease in the very long run. It could be better to have a luminosity_processed flag in the anomalies table for this case, and the condition could be changed like this:

query = 'SELECT id FROM anomalies WHERE luminosity_processed=0 AND anomaly_timestamp > \'%s\' ORDER BY anomaly_timestamp ASC LIMIT 1' % str(after)

Cross-Site Scripting Security Vulnerability

Hello,

I noticed a Cross-Site Scripting (XSS) security vulnerability in skyline/webapp/webapp.py.

return resp, 404

The vulnerability can be triggered by accessing "/ionosphere?...&fp_matches=true&fp_id=<script>alert(evil_code)</script>". Here the value of the fp_id HTTP parameter is injected into the resulting HTML page without any prior sanitization. This allows attackers to make users execute arbitrary code, which is a serious security risk.

If your application is meant to be deployed in multi-user environments, where some of the users are not trusted, I would suggest fixing this issue. A trivial fix would be to sanitize the resulting HTML page:

return flask.escape(resp), 404

I found the bug while testing DeepCode's AI Code Review. The tool can help you automate the process of finding such bugs (and many other types). You can sign up your repo (free for open source) to receive notifications whenever new bugs are detected.

Any feedback is more than welcome at [email protected].

Cheers, Victor.

ValueError: can't have unbuffered text I/O

Using docker compose, when running Skyline I got the error ValueError: can't have unbuffered text I/O.
I tried modifying utils/python-daemon/runner.3.0.0.py line 135, but it doesn't work.

skyline-skyline-docker-skyline-1-1 | error - failed to start panorama
skyline-skyline-docker-skyline-1-1 | Traceback (most recent call last):
skyline-skyline-docker-skyline-1-1 | File "/skyline/skyline/horizon/agent.py", line 203, in
skyline-skyline-docker-skyline-1-1 | run()
skyline-skyline-docker-skyline-1-1 | File "/skyline/skyline/horizon/agent.py", line 197, in run
skyline-skyline-docker-skyline-1-1 | daemon_runner = runner.DaemonRunner(horizon)
skyline-skyline-docker-skyline-1-1 | File "/usr/local/lib/python3.8/site-packages/daemon/runner.py", line 114, in init
skyline-skyline-docker-skyline-1-1 | self._open_streams_from_app_stream_paths(app)
skyline-skyline-docker-skyline-1-1 | File "/usr/local/lib/python3.8/site-packages/daemon/runner.py", line 134, in _open_streams_from_app_stream_paths
skyline-skyline-docker-skyline-1-1 | self.daemon_context.stderr = open(
skyline-skyline-docker-skyline-1-1 | ValueError: can't have unbuffered text I/O
skyline-skyline-docker-skyline-1-1 | error - failed to start horizon
skyline-skyline-docker-skyline-1-1 | [2022-01-07 08:18:19 +0000] [144] [INFO] Starting gunicorn 20.1.0
skyline-skyline-docker-skyline-1-1 | [2022-01-07 08:18:19 +0000] [144] [INFO] Listening at: http://127.0.0.1:8000 (144)
skyline-skyline-docker-skyline-1-1 | [2022-01-07 08:18:19 +0000] [144] [INFO] Using worker: sync
skyline-skyline-docker-skyline-1-1 | [2022-01-07 08:18:19 +0000] [147] [INFO] Booting worker with pid: 147
skyline-skyline-docker-skyline-1-1 | /usr/local/lib/python3.8/importlib/__init__.py:127: DeprecatedWarning: Call to deprecated function __init__(...). API class may be removed in a future release, use falcon.App instead.
skyline-skyline-docker-skyline-1-1 | return _bootstrap._gcd_import(name[level:], package, level)
skyline-skyline-docker-skyline-1-1 | flux started with pid 144
skyline-skyline-docker-skyline-1-1 | Traceback (most recent call last):
skyline-skyline-docker-skyline-1-1 | File "/skyline/skyline/vista/agent.py", line 117, in
skyline-skyline-docker-skyline-1-1 | run()
skyline-skyline-docker-skyline-1-1 | File "/skyline/skyline/vista/agent.py", line 108, in run
skyline-skyline-docker-skyline-1-1 | daemon_runner = runner.DaemonRunner(vista)
skyline-skyline-docker-skyline-1-1 | File "/usr/local/lib/python3.8/site-packages/daemon/runner.py", line 114, in init
skyline-skyline-docker-skyline-1-1 | self._open_streams_from_app_stream_paths(app)
skyline-skyline-docker-skyline-1-1 | File "/usr/local/lib/python3.8/site-packages/daemon/runner.py", line 134, in _open_streams_from_app_stream_paths
skyline-skyline-docker-skyline-1-1 | self.daemon_context.stderr = open(
skyline-skyline-docker-skyline-1-1 | ValueError: can't have unbuffered text I/O
skyline-skyline-docker-skyline-1-1 | error - failed to start vista
skyline-skyline-docker-skyline-1-1 | Traceback (most recent call last):
skyline-skyline-docker-skyline-1-1 | File "/skyline/skyline/analyzer/agent.py", line 147, in
skyline-skyline-docker-skyline-1-1 | run()
skyline-skyline-docker-skyline-1-1 | File "/skyline/skyline/analyzer/agent.py", line 138, in run
skyline-skyline-docker-skyline-1-1 | daemon_runner = runner.DaemonRunner(analyzer)
skyline-skyline-docker-skyline-1-1 | File "/usr/local/lib/python3.8/site-packages/daemon/runner.py", line 114, in init
skyline-skyline-docker-skyline-1-1 | self._open_streams_from_app_stream_paths(app)
skyline-skyline-docker-skyline-1-1 | File "/usr/local/lib/python3.8/site-packages/daemon/runner.py", line 134, in _open_streams_from_app_stream_paths
skyline-skyline-docker-skyline-1-1 | self.daemon_context.stderr = open(
skyline-skyline-docker-skyline-1-1 | ValueError: can't have unbuffered text I/O
skyline-skyline-docker-skyline-1-1 | error - failed to start analyzer
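
The underlying Python 3 behaviour is easy to reproduce: unbuffered streams are only valid in binary mode, and the traceback shows the runner opening the stderr stream in text mode with buffering=0. A minimal sketch (not the project's fix):

    # reproducing the error outside the daemon runner
    open('/tmp/err.log', 'w+t', buffering=0)  # ValueError: can't have unbuffered text I/O
    # either of these succeeds:
    open('/tmp/err.log', 'w+t', buffering=1)  # line-buffered text mode
    open('/tmp/err.log', 'w+b', buffering=0)  # unbuffered binary mode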

webapp is not running

I can't start the webapp.
In settings.py it is configured to work with gunicorn, and the following error occurs (even though the file exists!):
Error: './skyline/webapp/gunicorn.py' doesn't exist
webapp started with pid unknown

Memory leak in Analyzer

Analyzer leaks memory under certain conditions in relation to Mirage: an unused Mirage list that is appended to but never reset, and matplotlib savefig in the alerters context.

How Does Skyline Handle Seasonal Data.

I am interested in knowing how Skyline integrates with seasonal data. I have looked through the 3-sigma algorithms and feel that FIRST_HOUR_AVERAGE is the only algorithm that can detect seasonal sequence anomalies. For example, if data on a particular day has lower peaks than the previous day, the Skyline algorithms won't mark it as an anomaly. Or am I understanding the algorithms wrong? Can you please give me some insight on this?

Handle Panorama stampede on restart after not running

Although Panorama is set up to record anomalies with the actual data in the
check file, similar to a queue, if Panorama happens to stall and is restarted
later, the Panorama anomaly submissions are somewhat skewed, and as it
populates the DB it will probably fire any Skyline Analyzer or Mirage alerts
that are set on any mysql metrics namespaces :)

Although it is creating a Panorama key, the keys will not match the event if
these are old check files. Realistically we still want to record the anomalies
in the DB with the same logic that is applied to current data, in terms of
setting keys or some other similar method that achieves the desired goal.

However, I do not think that is as trivial as it may seem.

Current workaround

Move the check files or let them be processed.

If they are processed, they will skew other metrics, even Skyline metrics and
possibly any related MySQL metrics too.

# Move the check files
PANORAMA_DIR="<YOUR_SKYLINE_PANORAMA_DIR>"  # e.g. PANORAMA_DIR="/opt/skyline/panaroma"

# STOP your Panorama if it is running

mkdir -p "${PANORAMA_DIR}/manually_skipped"
for i in $(find "${PANORAMA_DIR}/check" -type f -name "*.txt")
do
  /bin/mv -f "$i" "${PANORAMA_DIR}/manually_skipped/"
done

# START your Panorama, sorry some anomalies are not recorded

MySQL key_block_size

The declaration of mysql_key_block_size breaks z_fp_ and z_ts_ table creation on MySQL 5.7:

DatabaseError: (mysql.connector.errors.DatabaseError) 1031 (HY000): Table storage engine for 'z_fp_12345' doesn't have this option [SQL: '\nCREATE TABLE z_fp_12345 (\n\tid INTEGER NOT NULL AUTO_INCREMENT, \n\tfp_id INTEGER NOT NULL, \n\tfeature_id INTEGER NOT NULL, \n\tvalue DOUBLE, \n\tPRIMARY KEY (id)\n)ENGINE=InnoDB CHARSET=utf8 KEY_BLOCK_SIZE=255\n\n']

This was fine up to MySQL 5.6 apparently: https://stackoverflow.com/a/46290127

Removing the mysql_key_block_size='255' parameter fixes this and is backwards compatible.
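
A minimal sketch of the equivalent SQLAlchemy table definition with the option removed, with the columns taken from the failing CREATE TABLE above:

    from sqlalchemy import Column, Integer, MetaData, Table
    from sqlalchemy.dialects.mysql import DOUBLE

    metadata = MetaData()
    fp_table = Table(
        'z_fp_12345', metadata,
        Column('id', Integer, primary_key=True, autoincrement=True),
        Column('fp_id', Integer, nullable=False),
        Column('feature_id', Integer, nullable=False),
        Column('value', DOUBLE()),
        mysql_engine='InnoDB',
        mysql_charset='utf8',
        # mysql_key_block_size='255',  # removed: InnoDB on MySQL 5.7 rejects it
    )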

nonNegativeDerivative applied twice in learn.py to existing json data

With the introduction of calculating the nonNegativeDerivative in both Analyzer and Mirage before the initial analysis, both apps now pass the preprocessed time series to the send_anomalous_metric_to function. However, in skyline/ionosphere/learn.py the metric is still checked to determine whether it is in derivative_metrics, and the nonNegativeDerivative function is applied to the existing json data if that json data already exists. This is incorrect. It is correct in the context of learn.py having to fetch the data from Graphite; however, applying nonNegativeDerivative to existing json data is wrong and does not have the desired result.
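
A minimal sketch of the corrected condition, with hypothetical variable names (fetched_from_graphite is illustrative):

    # only apply the transform to data fetched fresh from Graphite, never
    # to the json data that Analyzer or Mirage already preprocessed
    if base_name in derivative_metrics and fetched_from_graphite:
        timeseries = nonNegativeDerivative(timeseries)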

[Q] How do I make a metric appear and select it in the "metric name" section?

I have successfully installed Skyline and Graphite on the same VM using the script provided on GitHub.
As there is no special error in the Skyline log, everything seems to be normal.

Is there anyone who can help me?

Incorrect scale in some graphs

It appears that the Graphite graphs created with a from=- use the highest resolution scale, which is incorrect if the graph covers data beyond the first retention. However, if the Graphite graph is created with parameters of e.g. from=00%3A00_20180801&until=00%3A00_20180807, the graph scale renders correctly.

This has bothered me for a long time; finally getting round to fixing it.

'AxesSubplot' object has no attribute 'set_axis_bgcolor'

skyline/analyzer/alerters.py and skyline/mirage/mirage_alerters.py make use of a Matplotlib method that was removed in 2.2.0.
With the Luminosity branch upgrade to matplotlib==2.2.2 for IssueID #2272, an error now occurs:

2018-04-17 15:21:58 :: 30308 :: Traceback (most recent call last):
  File "/opt/skyline/github/skyline/skyline/mirage/mirage_alerters.py", line 473, in alert_smtp
    ax.set_axis_bgcolor('black')
AttributeError: 'AxesSubplot' object has no attribute 'set_axis_bgcolor'

2018-04-17 15:21:58 :: 30308 :: error :: alert_smtp - could not build plot

As per scikit-learn/scikit-learn#10762 and https://github.com/scikit-learn/scikit-learn/pull/10763/files
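
The replacement API is set_facecolor, which superseded set_axis_bgcolor in matplotlib 2.0 before the old name was removed in 2.2; a minimal sketch that works on either side of the change, with ax being the AxesSubplot from the alerter plot:

    # set_facecolor is available from matplotlib 2.0 onwards
    if hasattr(ax, 'set_facecolor'):
        ax.set_facecolor('black')
    else:
        ax.set_axis_bgcolor('black')  # matplotlib < 2.0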

Fix SQL errors

ERROR 1060 (42S21) at line 204: Duplicate column name 'last_checked'
ERROR 1072 (42000) at line 204: Key column 'layer_id' doesn't exist in table

Mutliple SQL Injection Security Vulnerabilities

Hello,

I noticed several SQL Injections in skyline/webapp/ionosphere_backend.py.

for row in engine.execute(stmt):

Unsanitized user input from HTTP parameters:

  1. from_timestamp (line 1200)
  2. until_timestamp (line 1217)
  3. generation_greater_than (line 1233)
  4. layers_id_greater_than (line 1245)
  5. matched_greater_than (line 1275)

are used to build an SQL query, which then gets executed. This allows attackers to own the database (see the OWASP SQL Injection page for a complete list of risks).

The same story applies in the line

results = connection.execute(metrics_like_query)

and the argument metric_like (line 1352).

If this code is running on a publicly available server, then this is a serious security risk and you might want to fix it. As a fix I would advise using prepared statements.
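
A minimal sketch of the suggested fix using SQLAlchemy bound parameters, in the engine.execute style the file already uses (the table and column names here are illustrative):

    # bind user input as a parameter instead of interpolating it into SQL
    from sqlalchemy import text

    stmt = text('SELECT * FROM ionosphere WHERE anomaly_timestamp > :from_ts')
    for row in engine.execute(stmt, from_ts=int(from_timestamp)):
        print(row)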

I found the bug while testing DeepCode's AI Code Review.

Cheers, Victor.

v3.0.0 misspelt file replaced

Release v3.0.0 was updated and replaced to correct a missing/misspelt function file in v3.0.0.

The release from 2022-05-01 (d8cafc7) was replaced because there was a misspelling in a lost function file that was recreated.
skyline/functions/metrics_manager/namesapce_analysed_events.py
Should be:
skyline/functions/metrics_manager/namespace_analysed_events.py

Redis exception in horizon

While trying to set up Skyline, I am getting a Redis exception in the Horizon logs. The Redis server restarted once due to a server reboot.

2021-12-21 16:28:27 :: 9752 :: horizon.worker :: total queue values known for the last 60 seconds - 4
2021-12-21 16:28:27 :: 9752 :: horizon.worker :: average queue size for the last 60 seconds - 0
2021-12-21 16:28:27 :: 9752 :: horizon.worker :: datapoints_sent_to_redis in last 60 seconds - 0
2021-12-21 16:28:27 :: 9752 :: Traceback (most recent call last):
  File "/opt/skyline/github/skyline/skyline/horizon/worker.py", line 505, in run
    self.redis_conn.sadd('horizon.metrics_received', *set(metrics_received))
  File "/opt/python_virtualenv/projects/skyline-py3810/lib/python3.8/site-packages/redis/client.py", line 2243, in sadd
    return self.execute_command('SADD', name, *values)
  File "/opt/python_virtualenv/projects/skyline-py3810/lib/python3.8/site-packages/redis/client.py", line 901, in execute_command
    return self.parse_response(conn, command_name, **options)
  File "/opt/python_virtualenv/projects/skyline-py3810/lib/python3.8/site-packages/redis/client.py", line 915, in parse_response
    response = connection.read_response()
  File "/opt/python_virtualenv/projects/skyline-py3810/lib/python3.8/site-packages/redis/connection.py", line 756, in read_response
    raise response
redis.exceptions.ResponseError: wrong number of arguments for 'sadd' command

2021-12-21 16:28:27 :: 9752 :: horizon.worker :: error adding horizon.metrics_received
2021-12-21 16:28:27 :: 9752 :: renaming key horizon.metrics_received to aet.horizon.metrics_received
2021-12-21 16:28:27 :: 9752 :: Traceback (most recent call last):
  File "/opt/skyline/github/skyline/skyline/horizon/worker.py", line 512, in run
    self.redis_conn.rename('horizon.metrics_received', 'aet.horizon.metrics_received')
  File "/opt/python_virtualenv/projects/skyline-py3810/lib/python3.8/site-packages/redis/client.py", line 1747, in rename
    return self.execute_command('RENAME', src, dst)
  File "/opt/python_virtualenv/projects/skyline-py3810/lib/python3.8/site-packages/redis/client.py", line 901, in execute_command
    return self.parse_response(conn, command_name, **options)
  File "/opt/python_virtualenv/projects/skyline-py3810/lib/python3.8/site-packages/redis/client.py", line 915, in parse_response
    response = connection.read_response()
  File "/opt/python_virtualenv/projects/skyline-py3810/lib/python3.8/site-packages/redis/connection.py", line 756, in read_response
    raise response
redis.exceptions.ResponseError: no such key

2021-12-21 16:28:27 :: 9752 :: error :: failed to rename Redis key horizon.metrics_received to aet.horizon.metrics_received - no such key
2021-12-21 16:28:42 :: 9754 :: horizon.worker :: worker queue is empty and timed out
2021-12-21 16:28:42 :: 9752 :: horizon.worker :: worker queue is empty and timed out
2021-12-21 16:28:42 :: 9752 :: horizon.worker :: total queue size for the last 10 seconds - 0
2021-12-21 16:28:42 :: 9752 :: horizon.worker :: total queue values known for the last 10 seconds - 1

I checked Redis to see whether those keys existed:

redis /var/run/redis/redis.sock> get aet.horizon.metrics_received
(nil)
redis /var/run/redis/redis.sock> get horizon.metrics_received
(nil)
redis /var/run/redis/redis.sock> type horizon.metrics_received
none
redis /var/run/redis/redis.sock> type aet.horizon.metrics_received
none
redis /var/run/redis/redis.sock>
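
The keys do not exist because the SADD never succeeded. Judging from the traceback, metrics_received was empty, so *set(metrics_received) expanded to zero members, the Redis server rejected the command, and the subsequent RENAME then found no key. A minimal reproduction:

    import redis

    redis_conn = redis.StrictRedis()
    metrics_received = []
    # expands to SADD horizon.metrics_received with no members, raising
    # redis.exceptions.ResponseError: wrong number of arguments for 'sadd' command
    redis_conn.sadd('horizon.metrics_received', *set(metrics_received))
    # guarding the call avoids both this error and the rename failure
    if metrics_received:
        redis_conn.sadd('horizon.metrics_received', *set(metrics_received))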

I have configured Graphite to relay metrics to Horizon, and I can see I am receiving data using tcpdump, but I can't get any data into Skyline.

Can someone help me with the Redis issue?

Ionosphere validate learnt features profiles page

Skyline needs a page for validating, all on one page, every LEARNT features
profile that requires validation.

The page needs to be populated with a table/blocks for each features profile
that needs to be validated for a metric. Each block/row per features profile
needs to show the Graphite image at its full duration (and learnt duration?)
to compare against the same Graphite images of its parent features profile at
the same resolutions, with 2 tick boxes per features profile: validate and
disable. However, the disable tickbox is a little tricky, as disabling
disables all progeny as well; a page refresh would update it though.

This will allow the operator to assess all the features profiles that need to
be validated for a metric in one place, without having to use a multitude of
browser tabs. This page will not remove the current method of validating a
features profile; that will still be possible.

Yet another flask context and template, and more fun in cyclomatic complexity.

The features profile images are available via the API from:

settings.SKYLINE_URL/ionosphere_images?image=/opt/skyline/ionosphere/features_profiles/stats/<graphite_metric_namespace_dir_path>/<timestamp>/<graphite_metric_namespace>.graphite_now.<RESOLUTION_IN_HOURS>h.png
# e.g.
https://skyline-server-eu-1.example.org/ionosphere_images?image=/opt/skyline/ionosphere/features_profiles/stats/skyline-server-eu-1/io/received/1526312070/stats.skyline-server-eu-1.io.received.graphite_now.168h.png

Analyzer also alerting on Mirage metrics now

And so is Mirage.

The previous change that reintroduced 432e3da#diff-54c72b335b6c00a0c5e1b1f4720341a2R683 has resulted in the "order matters" principle being compromised, so to speak.

The issue presents itself when a metric namespace or wildcard has the absolute or wildcard path mapped to a tuple with SECOND_ORDER_RESOLUTION_HOURS declared, such as:

            ('mysql.counters', 'smtp', 3600, 168),
...
...
            ('stats.mysql-master-1.mysql.counters.handlerRead_rnd', 'smtp', 3600, 168),
...
...
            ('stats.*', 'smtp', 3600),

stats.mysql-master-1.mysql.counters.handlerRead_rnd matches and gets sent to Mirage, OK:

2016-08-14 17:06:49 :: 21921 :: mirage check :: stats.mysql-master-1.mysql.counters.handlerRead_rnd at 168 hours - matched by substring
2016-08-14 17:06:49 :: 21921 :: added mirage check :: stats.mysql-master-1.mysql.counters.handlerRead_rnd,742316.0,168
2016-08-14 17:06:49 :: 21921 :: mirage check :: stats.mysql-master-1.mysql.counters.handlerRead_rnd at 168 hours - matched by regex
2016-08-14 17:06:49 :: 21921 :: added mirage check :: stats.mysql-master-1.mysql.counters.handlerRead_rnd,742316.0,168

But if there is an ALERTS tuple further down the ALERTS tuples, such as:

            ('stats.*', 'smtp', 3600),

this run will send the metric to the Analyzer alerter as well, as it should.
However, that does make it more difficult to separate out Mirage metrics by
wildcards:

2016-08-14 17:06:49 :: 21921 :: triggering alert :: stats.mysql-master-1.mysql.counters.handlerRead_rnd 742316.0 via smtp - matched by regex

So the simple solution here will be to add a Mirage Redis metric key, with the
alert expiry, for any Mirage metric encountered higher up the ALERTS tuples,
and have the analyzer_metrics block first check whether a Mirage metric key
exists for the metric, only processing via the Analyzer alerters if not.

That should do it.
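
A minimal sketch of that, with hypothetical key naming and an assumed trigger_alert alerter call:

    # record Mirage ownership with the alert expiry when a Mirage ALERTS
    # tuple matches the metric...
    mirage_metric_key = 'mirage.metrics.%s' % base_name
    redis_conn.setex(mirage_metric_key, alert_expiry_seconds, 1)

    # ...and in the analyzer_metrics block, only alert via the Analyzer
    # alerters if Mirage does not own the metric
    if not redis_conn.get(mirage_metric_key):
        trigger_alert(alert, metric)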

Unescaped Graphite target

Mirage throws an error when trying to fetch a timeseries with a ":" in the
target name. This probably applies to a number of other patterns that may
occur in metric namespaces, such as ones that include open/close parentheses,
"(" and ")"; probably all metacharacters, even though ":" is not a
metacharacter per se. This also affects Crucible, I would think.

2016-08-03 08:34:43 :: 25532 :: graphite url - http://graphite:8888/render/?from=09:31_20160802&until=09:40_20160803&target=carbon.relays.graphite-a.destinations.123_213_124_214:2024:None.relayMaxQueueLength&format=json
2016-08-03 08:34:43 :: 25532 :: surfacing timeseries data for carbon.relays.graphite-a.destinations.123_213_124_214:2024:None.relayMaxQueueLength from graphite from 09:31_20160802 to 09:40_20160803
2016-08-03 08:34:43 :: 25532 :: error :: data retrieval failed

This is in a similar vein to this - graphite-project/graphite-web#242

Fixing Skyline to a Graphite version is probably a step too far :)
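
A minimal sketch of percent-encoding the target before building the render URL (Skyline at the time was on Python 2, where the equivalent is urllib.quote):

    # percent-encode the metric name so ":" and other reserved characters
    # survive the Graphite render request
    from urllib.parse import quote  # Python 2: from urllib import quote

    target = 'carbon.relays.graphite-a.destinations.123_213_124_214:2024:None.relayMaxQueueLength'
    url = ('http://graphite:8888/render/?from=09:31_20160802'
           '&until=09:40_20160803&target=%s&format=json' % quote(target))
    print(url)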

Ionosphere - fluid approximation - approximately_close on layers

If the value in the D layer is approximately close to the anomalous value,
have a scaled percentage tolerance that takes the range into account, as with
Min-Max scaling.

This could be equally true for the D1 and E layers (if the time series has
sufficient range).

This will suit high-range metrics more than low-range metrics, but it should
make anomalies like D >= 30 with an anomalous data point of 31 less frequent,
which is not a bad thing.
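
A minimal sketch of a hypothetical approximately_close check along these lines:

    # scale the tolerance to the range of the time series, as in Min-Max
    # scaling, rather than using an absolute threshold
    def approximately_close(layer_value, datapoint, series_min, series_max,
                            tolerance_percent=5.0):
        value_range = series_max - series_min
        tolerance = value_range * (tolerance_percent / 100.0)
        return abs(datapoint - layer_value) <= tolerance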

Test dependency updates

Dependency updates to be tested with Python-2.7.12

scipy==0.17.1 to scipy==0.18.0
pytz==2016.4 to pytz==2016.6.1
pyparsing==2.1.5 to pyparsing==2.1.8
matplotlib==1.5.1 to matplotlib==1.5.2
msgpack-python==0.4.7 to msgpack-python==0.4.8
requests==2.10.0 to requests==2.11.1

Ionosphere learn countdown clock at 0 when client and server timezone differs

For the duration of the Ionosphere learn countdown, if the client and server have different time zone settings, the countdown clock will display 00:00:00 for the entire duration of the countdown. Once the Ionosphere learn period has passed, the countdown clock is removed as normal and replaced with

Countdown :: valid to learn :: True

Apply time zone fix to skyline.js

The Panorama UI implemented moment.js and moment-timezone to allow the
Dygraph rendering of the timeseries to use either a fixed time zone, e.g. the
Graphite server's time zone, or the browser's time zone, as per
#14

This should be ported to the now view at some point; for now, pointing it out
in the UI under the Known bugs section.

AttributeError: 'list' object has no attribute 'metric' - Panorama not working

Hello Team,

I am getting the following error in a fresh installation:

2020-07-11 03:17:06 :: 29074 :: Traceback (most recent call last):
  File "/opt/skyline/github/skyline/skyline/panorama/panorama.py", line 740, in spin_process
    metric_vars_array = self.new_load_metric_vars(str(metric_check_file))
  File "/opt/skyline/github/skyline/skyline/panorama/panorama.py", line 478, in new_load_metric_vars
    'debug :: metric_vars determined - metric variable - metric - %s' % str(metric_vars.metric))
AttributeError: 'list' object has no attribute 'metric'

I cannot see any entries in the Skyline database: no entries in the anomalies table or the metrics table. Panorama is not working.

Hard code IONOSPHERE_PROCESSES to 1

Currently Ionosphere has settings.IONOSPHERE_PROCESSES, which allows the
operator to specify how many processes Ionosphere can spawn. However,
Ionosphere should never need more than 1, and it should effectively be hard
coded as such. This variable is only declared for the purpose of maintaining
a standard set up in each module, and to possibly enable more than one
process on Ionosphere in the future, should there be a requirement for
Ionosphere to analyse the metrics quicker. Running Ionosphere with more than
one process is untested, so it is being hard coded to 1.

Example

You don't perhaps have an example floating around?

New slack messaging does not handle public channel

The new slack messaging released in v1.2.6 does not handle a public channel, as the slack response for a public channel has a different schema. That also goes for free or hosted slack. v1.2.16 only works with a free slack workspace and a private channel.

[Q] The "horizon.test.pickle" test is getting an error.

I am trying to install the latest version of Skyline, so after changing the installation script as shown below, I installed Skyline on CentOS 8.

[root@localhost dawn]# vim skyline.dawn.sh
...
...
if [ ! -f "/tmp/skyline.dawn.skyline.${SKYLINE_RELEASE}.txt" ]; then
  echo "Checking out Skyline at $SKYLINE_RELEASE"
  sleep 1
  cd /opt/skyline/github/skyline || exit 1
  #git checkout $SKYLINE_RELEASE
  if [ $? -ne 0 ]; then
    echo "error :: failed to check out Skyline at $SKYLINE_RELEASE"
    exit 1
  fi
...
...
echo "Seeding Skyline with data"
sleep 2
cd "${PYTHON_VIRTUALENV_DIR}/projects/${PROJECT}" || exit 1
source bin/activate
#bin/python${PYTHON_MAJOR_VERSION} /opt/skyline/github/skyline/utils/seed_data.py
deactivate
cd /tmp || exit
...
...

After installation, the "Seeding Skyline with data" step in the installation script produced an error.
I have tried it several times, but I always get the same error.

I ran the tests below.

- I opened a terminal window to check whether the connection to port 2024 was successful. (Connection succeeded)
[root@localhost ~]# nc -vz 127.0.0.1 2024
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 127.0.0.1:2024.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.

- I opened another terminal window and executed the commands below, but a connection error occurred.
[root@localhost ~]# PYTHON_VIRTUALENV_DIR="/opt/python_virtualenv"
[root@localhost ~]# PROJECT="skyline-py386"
[root@localhost ~]# cd "${PYTHON_VIRTUALENV_DIR}/projects/${PROJECT}"
[root@localhost skyline-py386]# source bin/activate
(skyline-py386) [root@localhost skyline-py386]#
(skyline-py386) [root@localhost skyline-py386]# bin/python${PYTHON_MAJOR_VERSION} /opt/skyline/github/skyline/utils/seed_data.py
notice :: testing the Horizon parameters
info   :: settings.UDP_PORT :: 2025
info   :: settings.HORIZON_IP :: 127.0.0.1
info   :: connect_test_metric :: horizon.test.params
notice :: Horizon parameters OK
notice :: 1000 data points to push via a pickle to Horizon
info   :: for metric :: horizon.test.pickle
info   :: using end_timestamp 1616837281 and initial 1616836281
notice :: adding anomalous data point - [1616837266, 33672.0] - value was 11393 and was modified with + 22279
notice :: adding anomalous data point - [1616837267, 30720.0] - value was 11534 and was modified with + 19186
notice :: adding anomalous data point - [1616837268, 33696.0] - value was 11451 and was modified with + 22245
notice :: adding anomalous data point - [1616837269, 30183.0] - value was 11392 and was modified with + 18791
notice :: adding anomalous data point - [1616837270, 30392.0] - value was 11449 and was modified with + 18943
notice :: adding anomalous data point - [1616837271, 33254.0] - value was 11458 and was modified with + 21796
notice :: adding anomalous data point - [1616837272, 34505.0] - value was 11438 and was modified with + 23067
notice :: adding anomalous data point - [1616837273, 32679.0] - value was 11423 and was modified with + 21256
notice :: adding anomalous data point - [1616837274, 33418.0] - value was 11388 and was modified with + 22030
notice :: adding anomalous data point - [1616837275, 35372.0] - value was 11456 and was modified with + 23916
notice :: adding anomalous data point - [1616837276, 30460.0] - value was 11471 and was modified with + 18989
notice :: adding anomalous data point - [1616837277, 31075.0] - value was 11340 and was modified with + 19735
notice :: adding anomalous data point - [1616837278, 32567.0] - value was 11398 and was modified with + 21169
notice :: adding anomalous data point - [1616837279, 31994.0] - value was 11409 and was modified with + 20585
notice :: adding anomalous data point - [1616837280, 31082.0] - value was 11473 and was modified with + 19609
notice :: sending 1000 data points
sent 100 of 1000 data points to Horizon via pickle for horizon.test.pickle to 127.0.0.1:2024
sent 200 of 1000 data points to Horizon via pickle for horizon.test.pickle to 127.0.0.1:2024
sent 300 of 1000 data points to Horizon via pickle for horizon.test.pickle to 127.0.0.1:2024
sent 400 of 1000 data points to Horizon via pickle for horizon.test.pickle to 127.0.0.1:2024
Traceback (most recent call last):
  File "/opt/skyline/github/skyline/utils/seed_data.py", line 59, in pickle_data_to_horizon
    sock.connect((ip, port))
TimeoutError: [Errno 110] Connection timed out

error :: failed to send pickle data to Horizon
error :: failed to send 100 data points to Horizon via pickle for horizon.test.pickle

- There seems to be no special error log.
[root@localhost ~]# find /var/log/skyline -type f -name "*.log" | while read skyline_logfile
> do
>   echo "#####
> # Checking for errors in $skyline_logfile"
>   cat "$skyline_logfile" | grep -B2 -A10 -i "error ::\|traceback" | tail -n 60
>   echo ""
>   echo ""
> done
#####
# Checking for errors in /var/log/skyline/webapp.access.log

#####
# Checking for errors in /var/log/skyline/boundary.log

#####
# Checking for errors in /var/log/skyline/ionosphere.log

#####
# Checking for errors in /var/log/skyline/luminosity.log

#####
# Checking for errors in /var/log/skyline/panorama.log

#####
# Checking for errors in /var/log/skyline/analyzer.log

#####
# Checking for errors in /var/log/skyline/horizon.log

#####
# Checking for errors in /var/log/skyline/webapp.log

- However, a connection error now occurs.
[root@localhost ~]# nc -vz 127.0.0.1 2024
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connection timed out.
