eclipsetrading / ksnap

KSnap is for making Kafka Data Snapshots
License: MIT License
```
Python runtime state: initialized

Current thread 0x00007fb377639740 (most recent call first):
  File "/opt/miniconda3/envs/ksnap/lib/python3.9/site-packages/ksnap/manager.py", line 80 in run
  File "/opt/miniconda3/envs/ksnap/lib/python3.9/site-packages/ksnap/__main__.py", line 23 in main
  File "/opt/miniconda3/envs/ksnap/bin/ksnap", line 8 in <module>

/bin/kafka-data-snapshot: line 67: 26988 Aborted "${ksnap_bin}" backup -b "${broker_list}" -t "${topics}" -d "${dest_tmp_dir}" --ignore-missing-topics
```
The backup process read the messages from every topic before writing any of them to disk. This could consume excessive memory on the host when backing up a large number of topics.
To address this, the backup process now writes each topic's data to disk as soon as that topic has been read, instead of buffering all topics first. This iterative approach keeps memory usage bounded as the process works through all the topics selected for backup.
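The per-topic flush can be sketched as follows. The helper names here are illustrative, not ksnap's actual API; the point is that peak memory is one topic's worth of messages rather than all of them.

```python
import json
import tempfile
from pathlib import Path

def read_topic(topic):
    # Stand-in for consuming a whole topic; real code would use a Kafka consumer.
    return [{"topic": topic, "offset": i} for i in range(3)]

def backup_topics(topics, dest_dir):
    dest = Path(dest_dir)
    for topic in topics:
        messages = read_topic(topic)          # peak memory: one topic at a time
        (dest / f"{topic}.json").write_text(json.dumps(messages))
        del messages                          # drop the data before the next topic

with tempfile.TemporaryDirectory() as d:
    backup_topics(["orders", "payments"], d)
    print(sorted(p.name for p in Path(d).iterdir()))  # → ['orders.json', 'payments.json']
```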
There is a mechanism where the ksnap reader waits for messages and exits if no new messages arrive after x seconds. Its purpose is to consume all messages and then leave the consuming stage, which works when ksnap has a good, fast connection to the target Kafka cluster.
I think we should instead fetch the latest offsets of all topics up front and wait until we have received the messages at those offsets before exiting, keeping a generous consumer timeout only as a fallback.
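The exit condition this suggests can be sketched like so (illustrative names; with kafka-python, the end offsets could come from a single `consumer.end_offsets(partitions)` call at the start of the backup):

```python
def caught_up(positions, end_offsets):
    """True once every partition's next-fetch position has reached the
    end offset snapshotted at the start of the backup."""
    return all(positions.get(tp, 0) >= end for tp, end in end_offsets.items())

end_offsets = {("esp.commands", 0): 100, ("esp.commands", 1): 50}
# Partition 0 is done, partition 1 is still one message behind:
print(caught_up({("esp.commands", 0): 100, ("esp.commands", 1): 49}, end_offsets))  # False
print(caught_up({("esp.commands", 0): 100, ("esp.commands", 1): 50}, end_offsets))  # True
```

The idle timeout then only guards against a partition that never catches up, rather than deciding when a healthy backup is finished.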
Currently, the backup workflow exits on the first error (e.g. a topic does not exist). It should instead check for all errors first (without starting the backup itself) and print every error found, if any.
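A sketch of that up-front validation (hypothetical helper, not ksnap's real code): compare every requested topic against the cluster's topic list, collect all failures, and only then decide whether to start the backup.

```python
def validate_topics(requested, existing):
    """Return a list of all validation errors instead of raising on the first."""
    return [f"topic does not exist: {t}" for t in requested if t not in existing]

errors = validate_topics(["orders", "ghost.a", "ghost.b"], {"orders", "payments"})
for e in errors:
    print(e)  # report every problem before aborting, not just the first
```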
```
  File "/opt/miniconda3/envs/ksnap/bin/ksnap", line 8, in <module>
    sys.exit(main())
  File "/opt/miniconda3/envs/ksnap/lib/python3.9/site-packages/ksnap/__main__.py", line 23, in main
    ksnap_manager.run()
  File "/opt/miniconda3/envs/ksnap/lib/python3.9/site-packages/ksnap/manager.py", line 81, in run
    self.backup()
  File "/opt/miniconda3/envs/ksnap/lib/python3.9/site-packages/ksnap/manager.py", line 55, in backup
    data_flow_manager.write(offsets, partitions)
  File "/opt/miniconda3/envs/ksnap/lib/python3.9/site-packages/ksnap/data_flow.py", line 60, in write
    partition.to_file(file_path)
  File "/opt/miniconda3/envs/ksnap/lib/python3.9/site-packages/ksnap/partition.py", line 45, in to_file
    [m.to_row() for m in self.messages])
  File "/opt/miniconda3/envs/ksnap/lib/python3.9/site-packages/ksnap/partition.py", line 45, in <listcomp>
    [m.to_row() for m in self.messages])
  File "/opt/miniconda3/envs/ksnap/lib/python3.9/site-packages/ksnap/message.py", line 21, in to_row
    headers = [
  File "/opt/miniconda3/envs/ksnap/lib/python3.9/site-packages/ksnap/message.py", line 22, in <listcomp>
    {"key": key, "val": b64encode(val).decode("ascii")}
  File "/opt/miniconda3/envs/ksnap/lib/python3.9/base64.py", line 58, in b64encode
    encoded = binascii.b2a_base64(s, newline=False)
TypeError: a bytes-like object is required, not 'NoneType'
```
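The `TypeError` comes from base64-encoding a message header whose value is `None` (Kafka allows null header values), since `b64encode` requires a bytes-like object. One possible guard, with illustrative names rather than ksnap's shipped fix, is to pass `None` through untouched:

```python
from base64 import b64encode

def header_to_row(key, val):
    # b64encode requires bytes; null header values must be kept as None.
    return {"key": key, "val": b64encode(val).decode("ascii") if val is not None else None}

print(header_to_row("trace-id", b"abc123"))  # {'key': 'trace-id', 'val': 'YWJjMTIz'}
print(header_to_row("trace-id", None))       # {'key': 'trace-id', 'val': None}
```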
Commit 51b7eee addressed the high memory consumption during the backup process. However, it made backups take considerably longer.
This issue aims to find a more optimal solution that balances reduced memory usage against backup processing time, improving the efficiency of the backup process without compromising its overall performance.
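One possible middle ground (an assumption on my part, not a ksnap design decision): consume topics in small batches, so memory is bounded by the batch size while the consumer still amortises per-topic overhead across several topics per pass.

```python
def batched(items, size):
    """Yield successive fixed-size slices of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

topics = ["t1", "t2", "t3", "t4", "t5"]
print(list(batched(topics, 2)))  # [['t1', 't2'], ['t3', 't4'], ['t5']]
```

Tuning the batch size would then trade memory against wall-clock time explicitly, instead of committing to either extreme.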
What about SSL/SASL_SSL?
How do I pass the configuration needed to connect?
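For reference, these are the kafka-python parameter names a secured cluster typically needs; whether and how ksnap forwards them to its consumer is exactly the open question here (the values below are placeholders):

```python
secure_config = {
    "bootstrap_servers": "broker:9093",
    "security_protocol": "SASL_SSL",       # encrypt and authenticate
    "ssl_cafile": "/etc/ssl/certs/ca.pem", # CA used to verify the brokers
    "sasl_mechanism": "PLAIN",
    "sasl_plain_username": "backup-user",
    "sasl_plain_password": "secret",
}
# e.g. KafkaConsumer(topic, **secure_config), if the tool exposed a hook for it.
print(secure_config["security_protocol"])  # SASL_SSL
```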
Hey
Version v2.0.5 backup results in the following errors
```
2021-06-03 22:59:53,327 WARNING Unknown error fetching data for topic-partition TopicPartition(topic='esp.commands', partition=2)
2021-06-03 22:59:53,327 WARNING Unknown error fetching data for topic-partition TopicPartition(topic='esp.commands', partition=11)
2021-06-03 22:59:53,328 WARNING Unknown error fetching data for topic-partition TopicPartition(topic='esp.commands', partition=8)
2021-06-03 22:59:53,328 WARNING Unknown error fetching data for topic-partition TopicPartition(topic='esp.commands', partition=5)
2021-06-03 22:59:53,447 WARNING Unknown error fetching data for topic-partition TopicPartition(topic='esp.commands', partition=1)
2021-06-03 22:59:53,447 WARNING Unknown error fetching data for topic-partition TopicPartition(topic='esp.commands', partition=4)
```
Version v2.0.3 works fine
Hi,
I learned about this tool from the discussion in itadventurer/kafka-backup#52.
I have a similar requirement for taking point-in-time snapshots of Kafka data, and this tool covers most of my needs.
I have a few questions:
Thank you
The tool should not require both the confluent and kafka-python libraries to be installed in order to work.