getsentry / snuba

Search the seas for your lost treasure.

License: Other

snuba's Introduction


Snuba is a service that provides a rich data model on top of Clickhouse together with a fast ingestion consumer and a query optimizer.

Snuba was originally developed to replace a combination of Postgres and Redis to search and provide aggregated data on Sentry errors. Since then it has evolved into the current form where it supports most time series related Sentry features over several data sets.

Click here for the full documentation.

Features:

  • Provides a database access layer to the Clickhouse distributed data store.
  • Provides a logical graph data model that clients can query through the SnQL language, which offers functionality similar to SQL (see the example after this list).
  • Supports multiple separate data sets in a single installation.
  • Provides a rule-based query optimizer.
  • Provides a migration system to apply DDL changes to Clickhouse in both single-node and distributed environments.
  • Ingests data directly from Kafka.
  • Supports both point-in-time queries and streaming queries.
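
As a quick illustration, a SnQL query can be POSTed to a running Snuba API. The sketch below is illustrative only: the endpoint path, port, and query shape are assumptions, not documented API guarantees.

import json
import urllib.request

# A hypothetical SnQL query against the events dataset (illustrative syntax).
snql = """
MATCH (events)
SELECT count() AS count BY title
WHERE project_id = 1
AND timestamp >= toDateTime('2021-01-01T00:00:00')
AND timestamp < toDateTime('2021-01-02T00:00:00')
LIMIT 10
"""

request = urllib.request.Request(
    "http://localhost:1218/events/snql",  # assumed local API port and endpoint
    data=json.dumps({"query": snql}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(request).read().decode("utf-8"))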

snuba's People

Contributors

alex-hofsteede, asottile-sentry, ayirr7, barkbarkimashark, bretthoerner, byk, cmanallen, dbanda, dependabot[bot], enochtangg, evanh, fpacifici, getsentry-bot, john-z-yang, joshferge, jtcunning, lauryndbrown, lynnagara, mattrobenolt, meredithanya, nikhars, onewland, phacops, rahul-kumar-saini, swatinem, tkaemming, untitaker, volokluev, wedamija, zylphrex

snuba's Issues

Would enabling threads improve anything?

I don't mean to be naive, but I was looking at the startup logs of a Sentry 9.1.1 Docker image and a Snuba:latest Docker image and assumed they would be the same. However, I noticed that Sentry explicitly logs that Python threads support is enabled, whereas Snuba explicitly logs the opposite: *** Python threads support is disabled. You can enable it with --enable-threads ***.

I just wanted to verify that Snuba would not benefit from this setting given that Sentry does.

This is pretty low-pri as it's just a question.

Snuba Log
Snuba Startup

Sentry Log
Sentry 91 Startup

PyPI release

Hi.
I could not find Snuba on PyPI, and I could not find documentation on Snuba packaging either.
Is there any plan to release Snuba on PyPI?

Migration from sentry_local -> errors_local seems to have not taken place

Hi folks,

We run Sentry 10 internally, but do not use the "onpremise" setup, as we want to split out the containers for use with Kubernetes orchestration/separate hosts. In our current install we are running the 20.12.1 release for both Sentry and Snuba, though these were upgraded from an older version months ago.

In #1354 it appears we migrate events in Clickhouse to live in the errors_local table, and this migration seems to have a side-effect of fixing a bug that forced retention to 30d (getsentry/self-hosted#612).

However, even after applying Snuba migrations such that the migration above was applied:

I have no name!@qa-snubabox--000346c809ef45482:/usr/src/snuba$ /pay/snuba/stripe-snuba-entrypoint.sh snuba bootstrap --force
2021-01-13 00:22:14,466 Failed to create topic events
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 91, in bootstrap
    future.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'events' already exists."}
2021-01-13 00:22:14,467 Failed to create topic errors-replacements
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 91, in bootstrap
    future.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'errors-replacements' already exists."}
2021-01-13 00:22:14,467 Failed to create topic cdc
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 91, in bootstrap
    future.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'cdc' already exists."}
2021-01-13 00:22:14,467 Failed to create topic outcomes
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 91, in bootstrap
    future.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'outcomes' already exists."}
2021-01-13 00:22:14,467 Failed to create topic ingest-sessions
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 91, in bootstrap
    future.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'ingest-sessions' already exists."}
2021-01-13 00:22:14,468 Failed to create topic event-replacements
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 91, in bootstrap
    future.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'event-replacements' already exists."}
2021-01-13 00:22:14,468 Failed to create topic snuba-commit-log
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/cli/bootstrap.py", line 91, in bootstrap
    future.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=TOPIC_ALREADY_EXISTS,val=36,str="Topic 'snuba-commit-log' already exists."}
2021-01-13 00:22:14,492 Running migration: 0012_errors_make_level_nullable
2021-01-13 00:22:14,511 Finished: 0012_errors_make_level_nullable
2021-01-13 00:22:14,520 Running migration: 0009_transactions_fix_title_and_message
2021-01-13 00:22:14,543 Finished: 0009_transactions_fix_title_and_message
2021-01-13 00:22:14,552 Running migration: 0002_discover_add_deleted_tags_hash_map
2021-01-13 00:22:14,570 Finished: 0002_discover_add_deleted_tags_hash_map
2021-01-13 00:22:14,579 Running migration: 0003_discover_fix_user_column
2021-01-13 00:22:14,592 Finished: 0003_discover_fix_user_column
2021-01-13 00:22:14,601 Running migration: 0004_discover_fix_title_and_message
2021-01-13 00:22:14,619 Finished: 0004_discover_fix_title_and_message
2021-01-13 00:22:14,628 Running migration: 0002_sessions_aggregates
2021-01-13 00:22:14,692 Finished: 0002_sessions_aggregates

Events still seem to live in the sentry_local table, although errors_local does exist:

qa-clickhousebox--09bfa9ed07cedab60.northwest.stripe.io :) select count(*) from errors_local

SELECT count(*)
FROM errors_local

┌─count()─┐
│       0 │
└─────────┘

1 rows in set. Elapsed: 0.002 sec.

qa-clickhousebox--09bfa9ed07cedab60.northwest.stripe.io :) select count(*) from sentry_local

SELECT count(*)
FROM sentry_local

┌─count()─┐
│ 4911118 │
└─────────┘

1 rows in set. Elapsed: 0.001 sec.

When we run Snuba's cleanup task with DEFAULT_RETENTION_DAYS = 1 we get a log event back saying that zero partitions have been dropped.

Are we missing something w.r.t. running Snuba migrations, or is there a bug in the code somewhere? The output of snuba migrations list appears to indicate that migration 0011_rebuild_errors was applied successfully:

events
[X]  0001_events_initial
[X]  0002_events_onpremise_compatibility
[X]  0003_errors
[X]  0004_errors_onpremise_compatibility
[X]  0005_events_tags_hash_map
[X]  0006_errors_tags_hash_map
[X]  0007_groupedmessages
[X]  0008_groupassignees
[X]  0009_errors_add_http_fields
[X]  0010_groupedmessages_onpremise_compatibility
[X]  0011_rebuild_errors
[X]  0012_errors_make_level_nullable

Thank you!

[Question] Snuba upgrade process

Hi

I see we are now doing regular releases every month. PostgreSQL and Clickhouse are the two schema-based systems in the whole architecture. I am aware that PostgreSQL schema changes are handled via Django migrations, which work flawlessly, but I am unsure how Clickhouse schema changes are managed right now. Do we need to run the snuba migrate command during each version upgrade? I am worried about messing up my production Snuba setup.

I see most of your configuration code goes into your ops repo, which isn't open-sourced, and there isn't much documentation around the operations side of Snuba. It would be great to cover these things somewhere.

Thank you in advance.

/cc @lynnagara

cimpl.KafkaException: KafkaError{code=CLUSTER_AUTHORIZATION_FAILED

Hello. When I used Snuba I hit an error, and I would appreciate your help in looking into it; my Kafka cluster does not require authentication.

2020-06-04 14:23:39,309 Failed to create topic snuba-commit-log
Traceback (most recent call last):
  File "/home/maintain/workspaces/snuba/lib/python3.7/site-packages/snuba-0.0.0-py3.7.egg/snuba/cli/bootstrap.py", line 93, in bootstrap
    future.result()
  File "/usr/local/python3/lib/python3.7/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/local/python3/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=CLUSTER_AUTHORIZATION_FAILED,val=31,str="Broker: Cluster authorization failed"}
2020-06-04 14:23:39,309 Failed to create topic outcomes
Traceback (most recent call last):
  File "/home/maintain/workspaces/snuba/lib/python3.7/site-packages/snuba-0.0.0-py3.7.egg/snuba/cli/bootstrap.py", line 93, in bootstrap
    future.result()
  File "/usr/local/python3/lib/python3.7/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/local/python3/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
cimpl.KafkaException: KafkaError{code=CLUSTER_AUTHORIZATION_FAILED,val=31,str="Broker: Cluster authorization failed"}

Releases?

Any plans on using proper releases for this project?

Having to run off latest and getting unintended upgrades/updates/bugs/incompatibilities with other apps isn't very nice.

Edit: To clarify, I'm talking about both Github releases and Dockerhub tags

Connection aborted.', RemoteDisconnected('Remote end closed connection without response')

Snuba service has been restarted recently. The error message is as follows:

2020-03-30 10:44:48,315 Worker flush took 21ms
2020-03-30 10:44:51,315 Flushing 3 items (from {Partition(topic=Topic(name='events'), index=0): Offsets(lo=1437250, hi=1437252)}): forced:False size:False time:True
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 603, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 383, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.7/site-packages/sentry_sdk/integrations/stdlib.py", line 102, in getresponse
    rv = real_getresponse(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/http/client.py", line 1344, in getresponse
    response.begin()
  File "/usr/local/lib/python3.7/http/client.py", line 306, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.7/http/client.py", line 275, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/snuba", line 11, in <module>
    load_entry_point('snuba', 'console_scripts', 'snuba')()
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/src/snuba/snuba/cli/consumer.py", line 156, in consumer
    consumer.run()
  File "/usr/src/snuba/snuba/utils/streams/batching.py", line 137, in run
    self._run_once()
  File "/usr/src/snuba/snuba/utils/streams/batching.py", line 142, in _run_once
    self._flush()
  File "/usr/src/snuba/snuba/utils/streams/batching.py", line 242, in _flush
    self.worker.flush_batch(self.__batch_results)
  File "/usr/src/snuba/snuba/consumer.py", line 98, in flush_batch
    self.__writer.write(inserts)
  File "/usr/src/snuba/snuba/clickhouse/http.py", line 73, in write
    chunked=True,
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 641, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 368, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.7/site-packages/urllib3/packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 603, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 383, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.7/site-packages/sentry_sdk/integrations/stdlib.py", line 102, in getresponse
    rv = real_getresponse(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/http/client.py", line 1344, in getresponse
    response.begin()
  File "/usr/local/lib/python3.7/http/client.py", line 306, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.7/http/client.py", line 275, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

Is this because there are too many connections, or because the network connection is broken?

Kafka TLS encryption unavailable as a configurable option

logger.debug("Attempting to connect to Kafka (attempt %d)", attempts)
client = AdminClient(
    {
        "bootstrap.servers": ",".join(bootstrap_server),
        "socket.timeout.ms": 1000,
    }
)
client.list_topics(timeout=1)

When connecting to a Kafka cluster via the confluent_kafka client, it is not possible to use
a Kafka cluster with TLS set up, because the parameters used to construct the client are
hardcoded and not configurable from elsewhere (as shown in the snippet above).

Specifically, we would need ssl.key.location, ssl.certificate.location, and security.protocol
from the Kafka config to be able to achieve this.

The right change to make here is probably to add an environment variable pointing to the path
where the Kafka config is written, so that the configuration for the Kafka client can be
overridden with it.
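
A minimal sketch of that proposal, assuming a new, hypothetical KAFKA_CONFIG_PATH environment variable pointing at a JSON file of librdkafka overrides (not Snuba's actual code):

import json
import os

from confluent_kafka.admin import AdminClient

# Start from the hardcoded defaults shown above...
config = {
    "bootstrap.servers": "localhost:9092",
    "socket.timeout.ms": 1000,
}

# ...then merge overrides (e.g. security.protocol, ssl.key.location,
# ssl.certificate.location) from a JSON file, if one is configured.
config_path = os.environ.get("KAFKA_CONFIG_PATH")  # hypothetical variable name
if config_path:
    with open(config_path) as f:
        config.update(json.load(f))

client = AdminClient(config)
client.list_topics(timeout=1)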

Upgrade from 20.8.9.6 to 21.2.2.8 is failing

Environment

We run sentry10 on Kubernetes with EKS. Sentry is used to track our application errors following this pipeline: https://develop.sentry.dev/architecture/#event-pipeline

Our setup is based on the following helm chart:
https://github.com/sentry-kubernetes/charts/tree/develop/sentry

We are currently running the 20.8.9.6 clickhouse version in our setup and would like to upgrade to 21.2.2.8. Unfortunately, upon the upgrade, snuba is throwing an error complaining about not being able to reach sentry-clickhouse.

snuba.clickhouse.errors.ClickhouseWriterError: [170] Requested cluster 'sentry-clickhouse' not found (version 21.2.2.8 (official build))
Caught ClickhouseWriterError(code=170, message="Requested cluster 'sentry-clickhouse' not found (version 21.2.2.8 (official build))", row=None),

2021-02-22 15:53:09,529 Starting
2021-02-22 15:53:12,543 New partitions assigned: {Partition(topic=Topic(name='events'), index=0): 6973887}
2021-02-22 15:53:12,544 Initialized processing strategy: <snuba.utils.streams.processing.strategies.streaming.transform.TransformStep object at 0x7f4a33804ca0>
2021-02-22 15:53:13,306 Time limit reached, closing <Batch: 443 messages, open for 0.75 seconds>...
2021-02-22 15:53:13,307 Starting new HTTP connection (1): clickhouse:8123
2021-02-22 15:53:13,318 Finished sending data from <HTTPWriteBatch: 443 rows (2443898 bytes)>.
2021-02-22 15:53:13,344 http://clickhouse:8123 "POST /?load_balancing=in_order&insert_distributed_sync=1&query=INSERT+INTO+default.sentry_dist+FORMAT+JSONEachRow HTTP/1
2021-02-22 15:53:13,345 Received response for <HTTPWriteBatch: 443 rows (2443898 bytes)>.
2021-02-22 15:53:13,345 Caught ClickhouseWriterError(code=170, message="Requested cluster 'sentry-clickhouse' not found (version 21.2.2.8 (official build))", row=None),
2021-02-22 15:53:13,345 Terminating <snuba.utils.streams.processing.strategies.streaming.transform.TransformStep object at 0x7f4a33804ca0>...
2021-02-22 15:53:13,345 Terminating <snuba.utils.streams.processing.strategies.streaming.collect.CollectStep object at 0x7f4a33804ac0>...
2021-02-22 15:53:13,345 Terminating <snuba.consumer.ProcessedMessageBatchWriter object at 0x7f4a3373f4c0>...

Steps to Reproduce

  1. Our current setup is running with the 20.8.9.6 on EKS
  2. We just change the image tag from 20.8.9.6 to 21.2.2.8
  3. We get this error message on our snuba consumers.

Expected Result

The upgrade would go smoothly.

Actual Result

The upgrade didn't go smoothly 😕

Please let us know if we need to provide more info regarding this.
Thank you in advance for your time!

Add ability to change Clickhouse database name

Currently, the database name in the connection isn't set, so clickhouse_driver uses the default database name, and we can't change it via env variables. But the CLI scripts (like cleanup, migration, etc.) have a database option where we can set it.

It would be great if the database name were also configurable via an env variable.

We're using Sentry on-premise with external Clickhouse installation, where we store some other applications data, so the default database name is confusing.
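
A sketch of what the requested knob could look like; CLICKHOUSE_DATABASE is a hypothetical name for illustration, not an existing Snuba setting:

import os

# Hypothetical setting; today the connection falls back to the "default" database.
CLICKHOUSE_DATABASE = os.environ.get("CLICKHOUSE_DATABASE", "default")

# It would then be threaded through wherever connections are built, e.g.:
# client = clickhouse_driver.Client(host=host, port=port, database=CLICKHOUSE_DATABASE)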

Migrate CI from Travis to GitHub Actions

Travis sprung some surprise billing changes on us a while ago, so we are now prioritizing removing usage of TravisCI completely. This issue will be used by @getsentry/productivity to track overall progress across the organization, but please let us know here if there's anything we should be aware of.

Move events-specific query processing into the events dataset

Like in #294, a lot of the query processing in util.py is events-specific and does not make sense for other datasets (like tags and promoted tags).

This logic should move into the dataset classes so that we can evolve the query language independently of dataset-specific code, and so that dataset-specific needs can be preserved.

This is an interesting way of doing this
https://github.com/getsentry/snuba/pull/294/files#diff-d66c14952fe0964290d998b69ff61affR20

Support for array indexing

As seen in the following query:

SELECT
    exception_stacks.type[1],
    count(),
    topK(3)(project_id),
    uniq(project_id)
FROM dev
WHERE (project_id IN (1, 2, 3, 4, 5, 6, 7, 8)) AND notEmpty(exception_stacks.type)
GROUP BY exception_stacks.type[1]
ORDER BY count() DESC
LIMIT 5

It was also suggested to try doing:

SELECT any(exception_stacks.type) AS type ... GROUP BY type ...

But this yields a different error:

DB::Exception: Unknown identifier (in GROUP BY): any(exception_stacks.type)

Array indexing cannot be used in either the SELECT itself or the GROUP BY; I assume the same applies to other operators.

snuba consumer left kafka cluster peacefully without exiting

Environment

How do you use Sentry?
self-hosted (Sentry 10.1.0.dev0)

Steps to Reproduce

Cannot reproduce easily. The service was operating until there was a timeout in polling which exceeded max.poll.interval.ms, and the snuba consumer decided to leave the consumer group.

Expected Result

Since we run our Sentry on k8s, it would be handled properly if the snuba consumer exited when librdkafka decides to leave the consumer group.

Actual Result

The snuba consumer stayed running even though it was no longer subscribed to the topic. As a result, no events were processed.

Logs:

%4|1624919321.320|MAXPOLL|rdkafka#consumer-2| [thrd:main]: Application maximum poll interval (300000ms) exceeded by 481ms (adjust max.poll.interval.ms for long-running message processing): leaving group

This was also the last log message of the snuba consumer; it stayed there doing nothing afterwards.

TypeError: getattr(): attribute name must be string

I am running sentry onpremise and am seeing this stack trace coming from the snuba-cleanup process

snuba-cleanup_1           |   File "/usr/local/bin/snuba", line 11, in <module>
snuba-cleanup_1           |     load_entry_point('snuba', 'console_scripts', 'snuba')()
snuba-cleanup_1           |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 722, in __call__
snuba-cleanup_1           |     return self.main(*args, **kwargs)
snuba-cleanup_1           |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 697, in main
snuba-cleanup_1           |     rv = self.invoke(ctx)
snuba-cleanup_1           |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
snuba-cleanup_1           |     return _process_result(sub_ctx.command.invoke(sub_ctx))
snuba-cleanup_1           |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 895, in invoke
snuba-cleanup_1           |     return ctx.invoke(self.callback, **ctx.params)
snuba-cleanup_1           |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 535, in invoke
snuba-cleanup_1           |     return callback(*args, **kwargs)
snuba-cleanup_1           |   File "/usr/src/snuba/snuba/cli/cleanup.py", line 50, in cleanup
snuba-cleanup_1           |     setup_logging(log_level)
snuba-cleanup_1           |   File "/usr/src/snuba/snuba/environment.py", line 16, in setup_logging
snuba-cleanup_1           |     level=getattr(logging, level, settings.LOG_LEVEL).upper(),
snuba-cleanup_1           | TypeError: getattr(): attribute name must be string

I can't tell if this is a configuration error in how Sentry on-premise is set up, or a lower-level issue with Snuba itself.

[onprem] [20.11.1] Column "_tags_hash_map" causes exception

Upgraded to 20.11.1 recently and ran the migrations. One of them added a materialized column _tags_hash_map.

The snuba consumer --storage events process then goes into a crash loop because it tries to write to that column. I'm not sure why yet; it's tough to debug in production without stopping the event flow, so I will try to debug tonight during off-peak hours, but I thought I would mention the exception here.

# Exception after adding the column to the sentry_dist table
2020-12-01 19:34:32,536 Caught ClickhouseWriterError(code=44, message='Received from ch-shard0-production.default.svc.cluster.local:9000. DB::Exception: Cannot insert column _tags_hash_map, because it is MATERIALIZED column.. (version 20.3.21.2 (official build))', row=None), shutting down...
Traceback (most recent call last):
  File "/usr/local/bin/snuba", line 33, in <module>
    sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/src/snuba/snuba/cli/consumer.py", line 161, in consumer
    consumer.run()
  File "/usr/src/snuba/snuba/utils/streams/processing/processor.py", line 109, in run
    self._run_once()
  File "/usr/src/snuba/snuba/utils/streams/processing/processor.py", line 144, in _run_once
    self.__processing_strategy.poll()
  File "/usr/src/snuba/snuba/utils/streams/processing/strategies/streaming/transform.py", line 55, in poll
    self.__next_step.poll()
  File "/usr/src/snuba/snuba/utils/streams/processing/strategies/streaming/collect.py", line 122, in poll
    self.__close_and_reset_batch()
  File "/usr/src/snuba/snuba/utils/streams/processing/strategies/streaming/collect.py", line 104, in __close_and_reset_batch
    self.__batch.close()
  File "/usr/src/snuba/snuba/utils/streams/processing/strategies/streaming/collect.py", line 64, in close
    self.__step.close()
  File "/usr/src/snuba/snuba/consumer.py", line 217, in close
    self.__insert_batch_writer.close()
  File "/usr/src/snuba/snuba/consumer.py", line 90, in close
    self.__writer.write(
  File "/usr/src/snuba/snuba/clickhouse/http.py", line 175, in write
    batch.join()
  File "/usr/src/snuba/snuba/clickhouse/http.py", line 129, in join
    raise ClickhouseWriterError(code, message, row)
snuba.clickhouse.errors.ClickhouseWriterError: [44] Received from ch-shard0-production.default.svc.cluster.local:9000. DB::Exception: Cannot insert column _tags_hash_map, because it is MATERIALIZED column.. (version 20.3.21.2 (official build))

The workaround for now seems to be dropping the column from the sentry_dist table; leaving it in the _local table doesn't seem to be causing any problems. The only downside is that sometimes there are exceptions from the organization-stats api about the missing column, but it doesn't seem to be in the critical path.

# The non-critical error
DB::Exception: Missing columns: '_tags_hash_map' while processing query: 'SELECT toDateTime(intDiv(toUInt32(timestamp), 300) * 300, 'Universal') AS time, count() AS count FROM sentry_dist PREWHERE environment = 'production' WHERE (deleted = 0) AND (timestamp >= toDateTime('2020-11-30T10:37:01', 'Universal')) AND (timestamp < toDateTime('2020-12-01T10:37:01', 'Universal')) AND (project_id IN tuple(15)) AND has(_tags_hash_map, cityHash64('requestType=graphql')) 

Allow configuring sasl arguments for connecting to kafka

Hi there!

Our Kafka cluster requires when bootstrapping connection:

  • sasl_mechanism
  • sasl_plain_username
  • sasl_plain_password

Any chance this can be configured in snuba? I can make a PR if you point me into the right direction.
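
For reference, those settings map onto the following librdkafka/confluent-kafka property names; how Snuba would surface them is exactly what this issue asks for (values are placeholders):

sasl_config = {
    "bootstrap.servers": "broker:9093",
    "security.protocol": "SASL_SSL",  # or SASL_PLAINTEXT
    "sasl.mechanism": "PLAIN",        # maps to sasl_mechanism
    "sasl.username": "user",          # maps to sasl_plain_username
    "sasl.password": "secret",        # maps to sasl_plain_password
}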

Add ability to configure clickhouse username and password

Currently it uses the default user of Clickhouse, which is configured without any password in a default installation. If we need to use a shared Clickhouse cluster, we would need to specify a username/password to enforce per-user resource quotas.

snuba (?) requires ClickHouse to run in UTC timezone.

snuba (?) requires ClickHouse to run in the UTC timezone; otherwise event details cannot be retrieved, due to the time gap between the logged and the requested datetime.

No configuration knob is present at this time, and there are no debugging messages either.

[onprem] [20.11.1] "Could not construct valid time interval" errors in subscription consumer

I upgraded to snuba 20.11.1 yesterday and one error I haven't been able to figure out yet is this constant error in snuba-subscription-consumer-events that produces invalid timestamp ranges.

2020-12-01 19:03:31,990 Offset commit took 2ms
2020-12-01 19:03:33,833 Could not construct valid time interval between MessageDetails(offset=147809945, timestamp=datetime.datetime(2020, 12, 1, 19, 3, 32, 220000)) and Message(partition=Partition(topic=Topic(name='events'), index=10), offset=147809946)!
Traceback (most recent call last):
  File "/usr/src/snuba/snuba/subscriptions/consumer.py", line 129, in poll
    time_interval = Interval(previous_message.timestamp, message.timestamp)
  File "<string>", line 5, in __init__
  File "/usr/src/snuba/snuba/utils/types.py", line 67, in __post_init__
    raise InvalidRangeError(self.lower, self.upper)
snuba.utils.types.InvalidRangeError: (datetime.datetime(2020, 12, 1, 19, 3, 32, 220000), datetime.datetime(2020, 12, 1, 19, 3, 32, 195000))
2020-12-01 19:03:34,104 Flushing 29 items (from {Partition(topic=Topic(name='events'), index=5): Offsets(lo=268199555, hi=268199556), Partition(topic=Topic(name='events'), index=1): Offsets(lo=9283937, hi=9283938), Partition(topic=Topic(name='events'), index=7): Offsets(lo=173129991, hi=173130006), Partition(topic=Topic(name='events'), index=10): Offsets(lo=147809943, hi=147809956)})

snuba-subscription-consumer-events

        - snuba
        - subscriptions
        - --auto-offset-reset=latest
        - --consumer-group=snuba-events-subscriptions-consumers
        - --topic=events
        - --result-topic=events-subscription-results
        - --dataset=events
        - --commit-log-topic=snuba-commit-log
        - --commit-log-group=snuba-consumers
        - --delay-seconds=60
        - --schedule-ttl=60
        - --log-level=DEBUG
        - --partitions=12

Any ideas?

Segmentation Fault when running bootstrap

So I've seen this happen in a couple different scenarios. I'm trying to run Sentry in Kubernetes. After getting everything installed, I go to bootstrap Snuba (create the Kafka topics). I've experienced this with both Confluent Kafka as well as Apache Kafka (and multiple versions of each). I've also experienced this in both Minikube and AWS EKS clusters.

/usr/src/snuba# LOG_LEVEL=debug snuba bootstrap --force
2020-01-08 18:39:41,151 Using Kafka with ('kafka-cp-kafka-headless.kafka:9092',)
2020-01-08 18:39:41,165 Attempting to connect to Kafka (attempt 0)
Segmentation fault (core dumped)

But if I add some debug log statements, it starts to work.
Here is the git diff that suddenly made it work:

diff --git a/snuba/cli/bootstrap.py b/snuba/cli/bootstrap.py
index 28f52f8..23a85fb 100644
--- a/snuba/cli/bootstrap.py
+++ b/snuba/cli/bootstrap.py
@@ -35,7 +35,6 @@ def bootstrap(
     if kafka:
         logger.debug("Using Kafka with %r", bootstrap_server)
         from confluent_kafka.admin import AdminClient, NewTopic

         attempts = 0
         while True:
             try:
@@ -58,6 +57,7 @@ def bootstrap(
                 time.sleep(1)

         topics = []
+        logger.debug("Made Connection Successfully")
         for name in DATASET_NAMES:
             dataset = get_dataset(name)
             table_writer = dataset.get_table_writer()
@@ -71,14 +71,14 @@ def bootstrap(
                             replication_factor=topic_spec.replication_factor,
                         )
                     )
-
+        print("Created Topics")
         for topic, future in client.create_topics(topics).items():
             try:
                 future.result()
                 logger.info("Topic %s created", topic)
             except Exception as e:
                 logger.error("Failed to create topic %s", topic, exc_info=e)
-
+        print("Actually created topics now")
     from snuba.clickhouse.native import ClickhousePool

     attempts = 0

It started to work after the 3rd log statement was added.

Has anyone else experienced this?

Snuba cleanup for sentry onpremise

Environment

Sentry self-hosted 21.3.0 (based on docker-compose from here https://github.com/getsentry/onpremise/blob/21.3.0/docker-compose.yml)

Steps to Reproduce

  1. Setup all containers and up snuba-cleanup container
  2. Check the logs for snuba-cleanup: every 5 minutes the log shows Dropped 0 partitions on None.
    It looks like the variable CLICKHOUSE_HOST is ignored here:
    "--clickhouse-host", help="Clickhouse server to write to.",

    After manually running the command in the container - snuba cleanup --clickhouse-host CLICKHOUSE_HOST_HERE --dry-run True -
    I got Dropped 0 partitions on CLICKHOUSE_HOST_HERE

Expected Result

Pass variable https://github.com/getsentry/onpremise/blob/bdd2686021cfea07507bc07d2756ac34a775c680/docker-compose.yml#L44 into cleanup command

Actual Result

variable is None instead of clickhouse host

I'm not sure whether this is a bug or not.

Unknown field found while parsing JSONEachRow format

I'm getting the above error while receiving an error event. Currently, I'm using the self-hosted Docker version of Sentry. I have also asked about it in the forum, but haven't received any response. Here is the stack trace of the error.

snuba-consumer_1          | 2020-02-07 02:22:33,189 New partitions assigned: {Partition(topic=Topic(name='events'), index=0): 450}
snuba-consumer_1          | 2020-02-07 02:22:40,701 Flushing 1 items (from {Partition(topic=Topic(name='events'), index=0): Offsets(lo=450, hi=450)}): forced:False size:False time:True
snuba-consumer_1          | Traceback (most recent call last):
snuba-consumer_1          |   File "/usr/local/bin/snuba", line 11, in <module>
snuba-consumer_1          |     load_entry_point('snuba', 'console_scripts', 'snuba')()
snuba-consumer_1          |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 722, in __call__
snuba-consumer_1          |     return self.main(*args, **kwargs)
snuba-consumer_1          |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 697, in main
snuba-consumer_1          |     rv = self.invoke(ctx)
snuba-consumer_1          |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
snuba-consumer_1          |     return _process_result(sub_ctx.command.invoke(sub_ctx))
snuba-consumer_1          |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 895, in invoke
snuba-consumer_1          |     return ctx.invoke(self.callback, **ctx.params)
snuba-consumer_1          |   File "/usr/local/lib/python3.7/site-packages/click/core.py", line 535, in invoke
snuba-consumer_1          |     return callback(*args, **kwargs)
snuba-consumer_1          |   File "/usr/src/snuba/snuba/cli/consumer.py", line 156, in consumer
snuba-consumer_1          |     consumer.run()
snuba-consumer_1          |   File "/usr/src/snuba/snuba/utils/streams/batching.py", line 137, in run
snuba-consumer_1          |     self._run_once()
snuba-consumer_1          |   File "/usr/src/snuba/snuba/utils/streams/batching.py", line 142, in _run_once
snuba-consumer_1          |     self._flush()
snuba-consumer_1          |   File "/usr/src/snuba/snuba/utils/streams/batching.py", line 233, in _flush
snuba-consumer_1          |     self.worker.flush_batch(self.__batch_results)
snuba-consumer_1          |   File "/usr/src/snuba/snuba/consumer.py", line 98, in flush_batch
snuba-consumer_1          |     self.__writer.write(inserts)
snuba-consumer_1          |   File "/usr/src/snuba/snuba/clickhouse/http.py", line 95, in write
snuba-consumer_1          |     raise ClickHouseError(int(code), type, message)
snuba-consumer_1          | snuba.clickhouse.http.ClickHouseError: [117] DB::Exception: Unknown field found while parsing JSONEachRow format: _tags_flattened: (at row 1)
snuba-consumer_1          | + '[' consumer = bash ']'
snuba-consumer_1          | + '[' c = - ']'
snuba-consumer_1          | + '[' consumer = api ']'
snuba-consumer_1          | + snuba consumer --help
snuba-consumer_1          | + set -- snuba consumer --auto-offset-reset=latest --max-batch-time-ms 750
snuba-consumer_1          | + exec gosu snuba snuba consumer --auto-offset-reset=latest --max-batch-time-ms 750
snuba-consumer_1          | 2020-02-07 02:22:50,589 New partitions assigned: {Partition(topic=Topic(name='events'), index=0): 451}

I would love some help finding and fixing this issue.

snuba 21.1.0 fails to run: ImportError: cannot import name 'PickleBuffer' from 'pickle'

Env:
Self-hosted sentry, built from sources.
Snuba: 21.1.0
Python 3.7.7 (default, Jun 7 2020, 08:15:55)

Newly installed Snuba fails to run (by the way, why does the egg always have version 0.0.0?):

Jan 24 10:06:05 sentry snuba[18690]: ImportError: cannot import name 'PickleBuffer' from 'pickle' (/usr/local/lib/python3.7/pickle.py)
Jan 24 10:06:05 sentry snuba[18684]: Traceback (most recent call last):
Jan 24 10:06:05 sentry snuba[18684]:   File "/usr/local/bin/snuba", line 11, in <module>
Jan 24 10:06:05 sentry snuba[18684]:     load_entry_point('snuba==0.0.0', 'console_scripts', 'snuba')()
Jan 24 10:06:05 sentry snuba[18684]:   File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 489, in load_entry_point
Jan 24 10:06:05 sentry snuba[18684]:     return get_distribution(dist).load_entry_point(group, name)
Jan 24 10:06:05 sentry snuba[18684]:   File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2852, in load_entry_point
Jan 24 10:06:05 sentry snuba[18684]:     return ep.load()
Jan 24 10:06:05 sentry snuba[18684]:   File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2443, in load
Jan 24 10:06:05 sentry snuba[18684]:     return self.resolve()
Jan 24 10:06:05 sentry snuba[18684]:   File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2449, in resolve
Jan 24 10:06:05 sentry snuba[18684]:     module = __import__(self.module_name, fromlist=['__name__'], level=0)
Jan 24 10:06:05 sentry snuba[18684]:   File "/usr/local/lib/python3.7/site-packages/snuba-0.0.0-py3.7.egg/snuba/cli/__init__.py", line 21, in <module>
Jan 24 10:06:05 sentry snuba[18684]:     module = __import__(module_name, globals(), locals(), ["__name__"])
Jan 24 10:06:05 sentry snuba[18684]:   File "/usr/local/lib/python3.7/site-packages/snuba-0.0.0-py3.7.egg/snuba/cli/bootstrap.py", line 6, in <module>
Jan 24 10:06:05 sentry snuba[18684]:     from snuba.datasets.factory import ACTIVE_DATASET_NAMES, get_dataset
Jan 24 10:06:05 sentry snuba[18684]:   File "/usr/local/lib/python3.7/site-packages/snuba-0.0.0-py3.7.egg/snuba/datasets/factory.py", line 4, in <module>
Jan 24 10:06:05 sentry snuba[18684]:     from snuba.datasets.dataset import Dataset
Jan 24 10:06:05 sentry snuba[18684]:   File "/usr/local/lib/python3.7/site-packages/snuba-0.0.0-py3.7.egg/snuba/datasets/dataset.py", line 6, in <module>
Jan 24 10:06:05 sentry snuba[18684]:     from snuba.datasets.entities.factory import get_entity
Jan 24 10:06:05 sentry snuba[18684]:   File "/usr/local/lib/python3.7/site-packages/snuba-0.0.0-py3.7.egg/snuba/datasets/entities/factory.py", line 4, in <module>
Jan 24 10:06:05 sentry snuba[18684]:     from snuba.datasets.entity import Entity
Jan 24 10:06:05 sentry snuba[18684]:   File "/usr/local/lib/python3.7/site-packages/snuba-0.0.0-py3.7.egg/snuba/datasets/entity.py", line 5, in <module>
Jan 24 10:06:05 sentry snuba[18684]:     from snuba.datasets.plans.query_plan import ClickhouseQueryPlan
Jan 24 10:06:05 sentry snuba[18684]:   File "/usr/local/lib/python3.7/site-packages/snuba-0.0.0-py3.7.egg/snuba/datasets/plans/query_plan.py", line 7, in <module>
Jan 24 10:06:05 sentry snuba[18684]:     from snuba.clickhouse.processors import QueryProcessor
Jan 24 10:06:05 sentry snuba[18684]:   File "/usr/local/lib/python3.7/site-packages/snuba-0.0.0-py3.7.egg/snuba/clickhouse/processors.py", line 6, in <module>
Jan 24 10:06:05 sentry snuba[18684]:     from snuba.request.request_settings import RequestSettings
Jan 24 10:06:05 sentry snuba[18684]:   File "/usr/local/lib/python3.7/site-packages/snuba-0.0.0-py3.7.egg/snuba/request/__init__.py", line 12, in <module>
Jan 24 10:06:05 sentry snuba[18684]:     from snuba.request.request_settings import RequestSettings
Jan 24 10:06:05 sentry snuba[18684]:   File "/usr/local/lib/python3.7/site-packages/snuba-0.0.0-py3.7.egg/snuba/request/request_settings.py", line 5, in <module>
Jan 24 10:06:05 sentry snuba[18684]:     from snuba.state.rate_limit import get_global_rate_limit_params, RateLimitParameters
Jan 24 10:06:05 sentry snuba[18684]:   File "/usr/local/lib/python3.7/site-packages/snuba-0.0.0-py3.7.egg/snuba/state/__init__.py", line 18, in <module>
Jan 24 10:06:05 sentry snuba[18684]:     from snuba.utils.streams.backends.kafka import (
Jan 24 10:06:05 sentry snuba[18684]:   File "/usr/local/lib/python3.7/site-packages/snuba-0.0.0-py3.7.egg/snuba/utils/streams/backends/kafka.py", line 9, in <module>
Jan 24 10:06:05 sentry snuba[18684]:     from pickle import PickleBuffer
Jan 24 10:06:05 sentry snuba[18684]: ImportError: cannot import name 'PickleBuffer' from 'pickle' (/usr/local/lib/python3.7/pickle.py)

Create a unified Snuba consumer

Currently we have many Snuba consumers and seems like we are going to add more as we add more features around Snuba. This poses a maintenance and synchronization issue for on-premise as the growing number of consumers is hard to maintain. People also keep forgetting to add new consumers to the self-hosted repo, causing issues or missing features (this has happened twice now).

Proposal: introduce a consumers command to Snuba CLI that runs all core consumers with default settings (the settings we use in self-hosted) using multiprocessing so the consumers are still logically separate.

/cc @fpacifici @lynnagara @chadwhitacre
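
A minimal sketch of what such a command could look like, purely illustrative: the storage list and the subprocess wiring are assumptions, not the actual proposal's design.

import multiprocessing
import subprocess

# Assumed set of core consumers; the real list would come from Snuba itself.
CORE_STORAGES = ["events", "transactions", "outcomes_raw", "sessions_raw"]

def run_consumer(storage: str) -> None:
    # Each child process runs the existing single-storage consumer command.
    subprocess.run(["snuba", "consumer", "--storage", storage], check=True)

def main() -> None:
    processes = [
        multiprocessing.Process(target=run_consumer, args=(storage,))
        for storage in CORE_STORAGES
    ]
    for process in processes:
        process.start()
    for process in processes:
        process.join()

if __name__ == "__main__":
    main()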

Snuba Issue startup

Hello,

We are installing Sentry in an OpenShift environment, but we have trouble with sentry-snuba-api.
Startup log:

Image: getsentry/snuba:fdcc3229b24b659a48b4934e4ffc0ccaa6364a37

+ '[' a = - ']'
+ snuba api --help
+ set -- snuba api
+ set gosu snuba snuba api
+ exec gosu snuba snuba api
error: failed switching to "snuba": operation not permitted

any ideas ?

Kafka with tls auth

Hello
We are using a Kafka cluster with authentication, but I can't find where to add credentials in the Snuba settings.

Syntax error in example query from docs

Environment

The sample query in the README.md has a syntax error (missing comma after the to_date field).

I'm happy to put together a pull request to fix it, but I'm not sure if it's autogenerated from somewhere else, since it's also used as the homepage for the web server itself.

I will note that, even after adding the trailing comma, we still get an error; see below.

Steps to Reproduce

Current contents (in HEAD from master):

{
    "project":[1,2],
    "selected_columns": ["tags[environment]"],
    "aggregations": [
        ["max", "received", "last_seen"]
    ],
    "conditions": [
        ["tags[environment]", "=", "prod"]
    ],
    "from_date": "2011-07-01T19:54:15",
    "to_date": "2018-07-06T19:54:15"
    "granularity": 3600,
    "groupby": ["group_id", "time"],
    "having": [],
    "issues": [],
}

which yields

{
    "error": {
        "type": "json",
        "message": "Expecting ',' delimiter or '}': line 12 column 5 (char 291)"
    }
}

Note that adding the trailing comma yields a different syntax error; however, I'm not sure what the fix is here, because the property names all appear to be in single quotes. (Removing the trailing comma from the last entry doesn't fix it either - instead of a 400 we get an empty HTTP response.)

{
    "error": {
        "type": "json",
        "message": "Expecting property name enclosed in double quotes: line 16 column 1 (char 386)"
    }
}

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 38: invalid continuation byte

The Snuba consumer fails when processing messages that contain invalid UTF-8. I think this problem comes from rapidjson (not from Snuba). If we have a message like this:

{"name":"root","funcName":"handle_message","lineno":127,"message":"'utf-8' codec can't encode character '\\udce4' in position 181: surrogates not allowed","exc_info":"File \"/ve/lib/python3.8/site-packages/aiohttp/http_writer.py\", line 161, in _py_serialize_headers\n    return line.encode('utf-8') + b'\\r\\n'\nUnicodeEncodeError: 'utf-8' codec can't encode character '\\udce4' in position 181: surrogates not allowed","timestamp":"2020-07-28T08:52:56.788995+00:00","interpreter":"python","level_name":"ERROR"}

The Snuba consumer raises an exception and exits with exit code 1:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 38: invalid continuation byte
  File "snuba", line 33, in <module>
    sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())
  File "click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "snuba/cli/consumer.py", line 146, in consumer
    consumer.run()
  File "snuba/utils/streams/processing.py", line 132, in run
    self._run_once()
  File "snuba/utils/streams/processing.py", line 143, in _run_once
    self.__processing_strategy.process(msg)
  File "snuba/utils/streams/batching.py", line 151, in process
    result = self.__worker.process_message(message)
  File "snuba/consumer.py", line 60, in process_message
    rapidjson.loads(message.payload.value),
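
A hedged sketch of one possible mitigation at the decode step shown above (illustrative only; whether lossy decoding is acceptable for Snuba is a separate product decision):

import rapidjson

def safe_loads(raw: bytes):
    # rapidjson.JSONDecodeError and UnicodeDecodeError are both ValueError
    # subclasses, so a single except clause covers strict-parse failures.
    try:
        return rapidjson.loads(raw)
    except ValueError:
        # Lossily re-decode so invalid bytes become U+FFFD instead of
        # crashing the consumer; the original payload is not preserved.
        return rapidjson.loads(raw.decode("utf-8", errors="replace"))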

Associate to datasets a way to generate the clickhouse tables for production enviroinment

The current dataset implementation (see the Dataset class) provides methods that generate the DDL statements to create and maintain the underlying table in Clickhouse.
This is limited to the dev environment, in that it assumes a single-node Clickhouse and no distributed tables.

In production things would be different:

  • api.py should not automatically create a table where it does not exist
  • the DDL statement would be different for distributed tables
  • the dataset right now does not allow storing different values for DDL parameters between prod and dev (only local vs distributed is supported)

Before all those steps, we need to define the process for deploying a dataset in production (i.e., how to create the tables) and how to perform migrations.

Make Snuba bootstrap less noisy

Currently, the bootstrap command prints transient errors to the console for debugging purposes. This causes confusion for self-hosted users during installation.

Instead of printing these at all times, we should only show them when the logging level is set to debug/verbose.

Add possibility to prefix snuba kafka topics

I have an on-prem deployment, and the Kafka deployment we have is not used only by Sentry. Currently the topic names are generic, like "event", so it is not possible to tell that a topic belongs to Sentry. It would be nice to have an option to pass a prefix value so that all Sentry topics are prefixed with it (see the sketch below).

I am not sure if this is the right place to open this issue or the main Sentry repo, but since I think it's very specific to Snuba, I opened it here; I can reopen it in the Sentry repo if @BYK thinks otherwise.
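
A sketch of the requested behavior; KAFKA_TOPIC_PREFIX is a hypothetical variable name, not an existing Snuba setting:

import os

TOPIC_PREFIX = os.environ.get("KAFKA_TOPIC_PREFIX", "")

def prefixed(topic: str) -> str:
    # Applied wherever topic names are resolved, so all topics share the prefix.
    return TOPIC_PREFIX + topic

# With KAFKA_TOPIC_PREFIX="sentry.", prefixed("events") == "sentry.events".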

slow snuba healthcheck response

Environment

On-premise
Sentry 21.1.0

Steps to Reproduce

After upgrading from 20.8 to 21.1, the load on Clickhouse roughly doubled.

Then I found that the "sentry-snuba-api" pods restarted from time to time. The reason:

Events:
  Type     Reason     Age                     From     Message
  ----     ------     ----                    ----     -------
  Warning  Unhealthy  6m23s (x1534 over 18h)  kubelet  Readiness probe failed: Get http://XXX:1218/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  2m19s (x1532 over 18h)  kubelet  Liveness probe failed: Get http://XXX:1218/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

->

XXX - - [16/Feb/2021:07:52:04 +0000] "POST /query HTTP/1.1" 200 420 "tsdb" "-"
XXX- - [16/Feb/2021:07:52:04 +0000] "POST /query HTTP/1.1" 200 582 "tsdb" "-"
SIGINT/SIGQUIT received...killing workers...
worker 1 buried after 1 seconds
goodbye to uWSGI.
stream closed

We use Sentry helm chart https://github.com/sentry-kubernetes/charts/blob/9.0.0/sentry/templates/deployment-snuba-api.yaml#L67-L86

and they set timeout = 2 sec

Almost all the time, a response from / takes about 200 ms, but sometimes it takes more than 2 seconds.

For example:

kubectl -n sentry exec -ti sentry-snuba-api-6ff68d6c77-24zlg -- bash

time curl http://localhost:1218/

<html/>
...
</html>
real	0m3.339s
user	0m0.008s
sys	0m0.000s

Expected Result

Reduce count of killing container events

Actual Result

Could you please explain: do we need to increase the timeout from 2 seconds to, for example, 5 seconds?

Or is it a problem with the default uwsgi settings, where the uwsgi pool queues up and even a query to "/" takes a long time?

Make replacer.py and the replacer script events specific.

With the basic dataset implementation we put a lot of events-specific methods in the abstract dataset, because the replacer still reasons in terms of the generic dataset. This is not needed: all the replacer logic is very specific to the events data model, so we can make it depend directly on the events dataset.

Snuba still writes to table sentry_local instead of errors_local

Environment

Sentry self-hosted/on-premise 21.6.1

Steps to Reproduce

  1. Upgraded from 20.8.0 to 21.6.1
  2. Migration created table errors_local and moved data there.
SELECT COUNT(*)
FROM errors_local

┌─COUNT()─┐
│ 7489513 │
└─────────┘
  3. After all Snuba processes started (21.6.1), the table errors_local doesn't increase, but the table sentry_local started increasing
SELECT COUNT(*)
FROM sentry_local

┌─COUNT()─┐
│   16672 │
└─────────┘
  4. In the web dashboard I see only old data. The web UI is using the new table errors_local.

Expected Result

As far as I understand, the table sentry_local is deprecated and Snuba should work with the table errors_local.

Actual Result

The table sentry_local is still being used by Snuba.

Consumer Decode Error: The surrogate pair in string is invalid.

On startup, the Snuba consumer appears to read from Kafka and fails to parse a message, with the following error:

Traceback (most recent call last):
  File "/usr/local/bin/snuba", line 11, in <module>
    load_entry_point('snuba', 'console_scripts', 'snuba')()
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/src/snuba/snuba/cli/consumer.py", line 162, in consumer
    consumer.run()
  File "/usr/src/snuba/snuba/utils/streams/batching.py", line 137, in run
    self._run_once()
  File "/usr/src/snuba/snuba/utils/streams/batching.py", line 152, in _run_once
    self._handle_message(msg)
  File "/usr/src/snuba/snuba/utils/streams/batching.py", line 177, in _handle_message
    result = self.worker.process_message(msg)
  File "/usr/src/snuba/snuba/consumer.py", line 60, in process_message
    value = rapidjson.loads(message.payload.value)
rapidjson.JSONDecodeError: Parse error at offset 35912: The surrogate pair in string is invalid.

I've tried adjusting auto-offset-reset to earliest to change position, but it hasn't worked. I'm looking into other options, but raising this as a bug.

Improve docker tag names

It seems that old builds listed here:

https://hub.docker.com/r/getsentry/snuba/builds

are not available for pull

root@baops05:/opt/onpremise# docker pull getsentry/snuba:ab0ca24f2aa1058fcf6df036617f765a23275d86
Error response from daemon: manifest for getsentry/snuba:ab0ca24f2aa1058fcf6df036617f765a23275d86 not found

This is because all images are tagged just like "latest", and currently there is no way to pull a working image.

Can you please remove the corrupted image and tag the previous working one as latest?
It would also be good to keep the possibility of pulling by commit hash, for easier rollbacks in case something breaks.

Add DROP table support to the dataset abstraction

The new dataset abstraction provides everything needed to build tables and bootstrap a dev environment. This works well for test and dev environments.
The test scripts should also be able to tear down a dataset, but right now the DROP TABLE call is hardcoded in multiple places.
We should move this statement into the dataset class, the same way it was done in #294.

Add support for replicated tables to bootstrap

In a production Clickhouse setup, it's generally recommended to use replicated tables from the MergeTree family for high availability. Right now, there is no way to create replicated tables via snuba bootstrap.

snuba silently stops processing new events

Versions:
Python 2.7.18
snuba, version 0.0.0, built from git, last commit in the master branch:

commit 048d7cf1cc0df866830194f76d5ec46a2dd1f51c (HEAD -> master, origin/master, origin/HEAD)
Author: Filippo Pacifici <[email protected]>
Date:   Thu Jun 11 11:38:30 2020 -0700

Snuba silently stops processing new events. This is really all I can say: it still shows new HTTP activity in its log, but new events aren't appearing in Sentry.

Snuba is running with SNUBA_LOG_LEVEL="WARNING".

Workaround: restarting the service every 6 hours.

Support Redis Authentication

I'm trying to install Snuba on my Kubernetes cluster alongside Sentry.
Sentry's Helm chart installs Redis with a password (it generates a secret), but there was no option for me to specify that password for Snuba.

I looked at the source code and a simple solution seems possible:
another setting (REDIS_PASSWORD) that would be passed alongside startup_nodes and to StrictRedis' constructor in the snuba/redis.py module.
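A minimal sketch of that change, assuming redis-py's StrictRedis (and, for cluster mode, a client that takes the same keyword); the exact wiring inside snuba/redis.py may differ:

import os

from redis import StrictRedis

REDIS_HOST = os.environ.get("REDIS_HOST", "localhost")
REDIS_PORT = int(os.environ.get("REDIS_PORT", 6379))
REDIS_PASSWORD = os.environ.get("REDIS_PASSWORD")  # proposed new setting

# StrictRedis already accepts a password keyword; the missing piece is
# just threading the new setting through. The cluster client built from
# startup_nodes takes the same argument.
redis_client = StrictRedis(
    host=REDIS_HOST,
    port=REDIS_PORT,
    password=REDIS_PASSWORD,
)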

The environment variable DATASET_MODE does not work

We deployed Sentry ourselves, with ClickHouse running as a cluster, so we wanted snuba to create distributed tables when it creates tables. I set DATASET_MODE to both dist and distributed, but the created tables are still local tables.

Table names:
errors_local
groupassignee_local
groupedmessage_local
outcomes_hourly_local
outcomes_mv_hourly_local
outcomes_raw_local
sentry_local
sessions_hourly_local
sessions_hourly_mv_local
sessions_raw_local
transactions_local

I also changed settings.py to read DATASET_MODE from the environment variable:
https://github.com/getsentry/snuba/blob/master/snuba/settings.py#L15
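For reproducibility, the settings.py change amounts to something like this (the "local" default is an assumption):

import os

# snuba/settings.py -- read the dataset mode from the environment so a
# containerised deployment can ask for distributed tables.
DATASET_MODE = os.environ.get("DATASET_MODE", "local")

Whether anything downstream of bootstrap actually consults this value is exactly what seems broken.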

snuba doesn't fully log Kafka/Confluent connectivity issues.

Right now snuba doesn't fully log Kafka/Confluent connectivity issues.

For instance, when Snuba is able to connect to Kafka and retrieve events, the log looks like this:

May 25 02:32:50 dev syslogd: last message repeated 7 times
May 25 02:33:07 dev snuba[53439]: 2020-05-25 02:33:07,431 Flushing 1 items (from {Partition(topic=Topic(name='events'), index=0): Offsets(lo=1170, hi=1170)}): forced:False size:False time:True
May 25 02:33:08 dev snuba[53439]: 2020-05-25 02:33:08,347 Worker flush took 915ms
May 25 02:33:10 dev snuba[53445]: 2020-05-25 02:33:10,385 Flushing 1 items (from {Partition(topic=Topic(name='event-replacements'), index=0): Offsets(lo=1, hi=1)}): forced:False size:False time:True
May 25 02:33:10 dev snuba[53445]: 2020-05-25 02:33:10,518 Replacing 1268 rows took 76ms
May 25 02:33:10 dev snuba[53445]: 2020-05-25 02:33:10,518 Worker flush took 132ms

But when, for example, the Kafka hostname cannot be resolved (including a broker address returned in the connection metadata), Snuba is silent about it. The failure can only be diagnosed with strace/truss/ktrace/systemtap/dtrace-style tools.
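One way to surface transport-level failures is librdkafka's error callback, which confluent-kafka exposes through the error_cb configuration key; a minimal sketch (broker address and group id are placeholders):

import logging

from confluent_kafka import Consumer, KafkaError

logger = logging.getLogger(__name__)

def on_broker_error(error: KafkaError) -> None:
    # Fires for transport problems such as unresolvable broker
    # hostnames, which otherwise never reach the application log.
    logger.error("Kafka transport error: %s", error)

consumer = Consumer({
    "bootstrap.servers": "kafka:9092",  # placeholder
    "group.id": "snuba-consumers",      # placeholder
    "error_cb": on_broker_error,
})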

ClickHouse queries time out for issues with events in the millions

Hi,

We are running Sentry version 20.7.2. We have a few issues with 10-20 million events each, and these fail to load in the dashboard. Looking at the snuba-api logs, I came across this exception:

2020-09-03 15:20:57,494 Error running query: SELECT (arrayElement((arrayJoin(arrayFilter((pair -> in(arrayElement(pair, 1), tuple('device.family', 'browser.name', 'url', 'client_os', 'colo', 'level', 'logger', 'browser', 'sentry:release', 'device', 'client_os.name', 'os'))), arrayMap((x, y -> [x, y]), tags.key, tags.value))) AS snuba_all_tags), 1) AS tags_key), (arrayElement(snuba_all_tags, 2) AS tags_value), (count() AS count), (min(timestamp) AS first_seen), (max(timestamp) AS last_seen) FROM sentry_local PREWHERE in((nullIf(group_id, 0) AS group_id), tuple(362352)) WHERE equals(deleted, 0) AND greaterOrEquals(timestamp, toDateTime('2020-08-12T12:02:46', 'Universal')) AND less(timestamp, toDateTime('2020-09-03T15:20:28', 'Universal')) AND in(project_id, tuple(44)) AND in(project_id, tuple(44)) AND in(tags_key, tuple('browser', 'browser.name', 'client_os', 'client_os.name', 'colo', 'device', 'device.family', 'level', 'logger', 'os', 'sentry:release', 'url')) GROUP BY (tags_key, tags_value) ORDER BY count DESC LIMIT 9 BY tags_key LIMIT 1000 OFFSET 0
timed out waiting for value

This looks like a very expensive, non-indexed full-scan ClickHouse query to me.
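If it helps with reproducing, the query can be replayed with an explicit time budget so it fails fast instead of hanging; a sketch using clickhouse-driver (host, query, and limit are placeholders, and using the native client here is an assumption):

from clickhouse_driver import Client

client = Client(host="clickhouse")  # placeholder host

try:
    rows = client.execute(
        "SELECT count() FROM sentry_local PREWHERE group_id = 362352",
        settings={"max_execution_time": 30},  # seconds; abort rather than hang
    )
    print(rows)
except Exception as exc:
    # ClickHouse raises once the time limit is exceeded.
    print("query aborted:", exc)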

Thank you.

Sentry components upgrade fails for snuba: 20.6.0 -> 20.7.2

Hi,

I tried upgrading each component of my Sentry setup (sentry, snuba, and relay) from 20.6.0 to 20.7.2. As far as I know, in this release (20.7.2) relay is a mandatory component and events are ingested into Kafka directly from relay. After upgrading, I noticed that snuba-consumer started failing with this stack trace:

2020-08-02 16:21:17,992 New partitions assigned: {Partition(topic=Topic(name='sentry.staging.events'), index=0): 1}
Traceback (most recent call last):
  File "/usr/local/bin/snuba", line 33, in <module>
    sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/src/snuba/snuba/cli/consumer.py", line 146, in consumer
    consumer.run()
  File "/usr/src/snuba/snuba/utils/streams/batching.py", line 285, in run
    self._run_once()
  File "/usr/src/snuba/snuba/utils/streams/batching.py", line 296, in _run_once
    self.__processor.process(msg)
  File "/usr/src/snuba/snuba/utils/streams/batching.py", line 139, in process
    result = self.__worker.process_message(message)
  File "/usr/src/snuba/snuba/consumer.py", line 60, in process_message
    rapidjson.loads(message.payload.value),
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x87 in position 0: invalid start byte

Looking at the Kafka message on the events topic, I found that it contains bytes that are not valid UTF-8:
https://pastebin.com/raw/iwhMW3Jd
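For what it's worth, 0x87 is the msgpack header byte for a seven-entry fixmap, which suggests the payload is msgpack-encoded rather than the JSON the snuba consumer expects. A quick diagnostic sketch (the dual decode is only for inspection, not a fix):

import msgpack
import rapidjson

def inspect_payload(raw: bytes):
    # Try JSON first (what snuba's consumer does), then fall back to
    # msgpack, which matches the 0x87 leading byte in the traceback.
    try:
        return "json", rapidjson.loads(raw)
    except (ValueError, UnicodeDecodeError):
        return "msgpack", msgpack.unpackb(raw, raw=False)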

Logs on the relay side look like this:

2020-08-02T16:02:36Z [relay_server::actors::project] DEBUG: project 2 initialized without state
2020-08-02T16:02:36Z [relay_server::actors::project] DEBUG: project 2 state requested
2020-08-02T16:02:36Z [relay_server::actors::project] DEBUG: project 2 state request amended
2020-08-02T16:02:36Z [relay_server::actors::project_upstream] DEBUG: updating project states for 1/1 projects (attempt 1)
2020-08-02T16:02:36Z [relay_server::actors::project_upstream] DEBUG: sending request of size 1
2020-08-02T16:02:36Z [relay_server::actors::project] DEBUG: project 2 state updated
2020-08-02T16:11:15Z [relay_server::actors::project] DEBUG: project 2 state requested
2020-08-02T16:11:15Z [relay_server::actors::project] DEBUG: project 2 state request amended
2020-08-02T16:11:15Z [relay_server::actors::project_upstream] DEBUG: updating project states for 1/1 projects (attempt 1)
2020-08-02T16:11:15Z [relay_server::actors::project_upstream] DEBUG: sending request of size 1
2020-08-02T16:11:15Z [relay_server::actors::project] DEBUG: project 2 state updated

I am not sure what I am missing. Thank you in advance for the help.
