eclipsetrading / ksnap

KSnap is for making Kafka Data Snapshots
License: MIT License
```
Python runtime state: initialized

Current thread 0x00007fb377639740 (most recent call first):
  File "/opt/miniconda3/envs/ksnap/lib/python3.9/site-packages/ksnap/manager.py", line 80 in run
  File "/opt/miniconda3/envs/ksnap/lib/python3.9/site-packages/ksnap/__main__.py", line 23 in main
  File "/opt/miniconda3/envs/ksnap/bin/ksnap", line 8 in <module>

/bin/kafka-data-snapshot: line 67: 26988 Aborted "${ksnap_bin}" backup -b "${broker_list}" -t "${topics}" -d "${dest_tmp_dir}" --ignore-missing-topics
```
The backup process read the messages from every topic before writing any of them to disk. This could consume excessive memory on the host when backing up a large number of topics.
To address this, the backup process now writes each topic's data to disk as soon as that topic has been read, instead of buffering all topics first. This iterative approach keeps memory usage bounded as the process works through all the topics selected for backup.
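The per-topic flush can be sketched as follows. The helper names here are illustrative, not ksnap's actual API; the point is that peak memory is one topic's worth of messages rather than all of them.

```python
import json
import tempfile
from pathlib import Path

def read_topic(topic):
    # Stand-in for consuming a whole topic; real code would use a Kafka consumer.
    return [{"topic": topic, "offset": i} for i in range(3)]

def backup_topics(topics, dest_dir):
    dest = Path(dest_dir)
    for topic in topics:
        messages = read_topic(topic)          # peak memory: one topic at a time
        (dest / f"{topic}.json").write_text(json.dumps(messages))
        del messages                          # drop the data before the next topic

with tempfile.TemporaryDirectory() as d:
    backup_topics(["orders", "payments"], d)
    print(sorted(p.name for p in Path(d).iterdir()))  # → ['orders.json', 'payments.json']
```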
There is a mechanism where the ksnap reader waits for messages and exits if no new messages arrive after x seconds. Its purpose is to consume all messages and then leave the consuming stage, which works when ksnap has a good, fast connection to the target Kafka cluster.
I think we should instead fetch the latest offsets of all topics up front and wait until we have received the messages at those offsets before exiting, keeping a generous consumer timeout only as a fallback.
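The exit condition this suggests can be sketched like so (illustrative names; with kafka-python, the end offsets could come from a single `consumer.end_offsets(partitions)` call at the start of the backup):

```python
def caught_up(positions, end_offsets):
    """True once every partition's next-fetch position has reached the
    end offset snapshotted at the start of the backup."""
    return all(positions.get(tp, 0) >= end for tp, end in end_offsets.items())

end_offsets = {("esp.commands", 0): 100, ("esp.commands", 1): 50}
# Partition 0 is done, partition 1 is still one message behind:
print(caught_up({("esp.commands", 0): 100, ("esp.commands", 1): 49}, end_offsets))  # False
print(caught_up({("esp.commands", 0): 100, ("esp.commands", 1): 50}, end_offsets))  # True
```

The idle timeout then only guards against a partition that never catches up, rather than deciding when a healthy backup is finished.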
Currently, the backup workflow exits on the first error (e.g. a topic does not exist). It should instead check for all errors first (without starting the backup itself) and print every error found, if any.
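A sketch of that up-front validation (hypothetical helper, not ksnap's real code): compare every requested topic against the cluster's topic list, collect all failures, and only then decide whether to start the backup.

```python
def validate_topics(requested, existing):
    """Return a list of all validation errors instead of raising on the first."""
    return [f"topic does not exist: {t}" for t in requested if t not in existing]

errors = validate_topics(["orders", "ghost.a", "ghost.b"], {"orders", "payments"})
for e in errors:
    print(e)  # report every problem before aborting, not just the first
```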
```
  File "/opt/miniconda3/envs/ksnap/bin/ksnap", line 8, in <module>
    sys.exit(main())
  File "/opt/miniconda3/envs/ksnap/lib/python3.9/site-packages/ksnap/__main__.py", line 23, in main
    ksnap_manager.run()
  File "/opt/miniconda3/envs/ksnap/lib/python3.9/site-packages/ksnap/manager.py", line 81, in run
    self.backup()
  File "/opt/miniconda3/envs/ksnap/lib/python3.9/site-packages/ksnap/manager.py", line 55, in backup
    data_flow_manager.write(offsets, partitions)
  File "/opt/miniconda3/envs/ksnap/lib/python3.9/site-packages/ksnap/data_flow.py", line 60, in write
    partition.to_file(file_path)
  File "/opt/miniconda3/envs/ksnap/lib/python3.9/site-packages/ksnap/partition.py", line 45, in to_file
    [m.to_row() for m in self.messages])
  File "/opt/miniconda3/envs/ksnap/lib/python3.9/site-packages/ksnap/partition.py", line 45, in <listcomp>
    [m.to_row() for m in self.messages])
  File "/opt/miniconda3/envs/ksnap/lib/python3.9/site-packages/ksnap/message.py", line 21, in to_row
    headers = [
  File "/opt/miniconda3/envs/ksnap/lib/python3.9/site-packages/ksnap/message.py", line 22, in <listcomp>
    {"key": key, "val": b64encode(val).decode("ascii")}
  File "/opt/miniconda3/envs/ksnap/lib/python3.9/base64.py", line 58, in b64encode
    encoded = binascii.b2a_base64(s, newline=False)
TypeError: a bytes-like object is required, not 'NoneType'
```
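The `TypeError` comes from base64-encoding a message header whose value is `None` (Kafka allows null header values), since `b64encode` requires a bytes-like object. One possible guard, with illustrative names rather than ksnap's shipped fix, is to pass `None` through untouched:

```python
from base64 import b64encode

def header_to_row(key, val):
    # b64encode requires bytes; null header values must be kept as None.
    return {"key": key, "val": b64encode(val).decode("ascii") if val is not None else None}

print(header_to_row("trace-id", b"abc123"))  # {'key': 'trace-id', 'val': 'YWJjMTIz'}
print(header_to_row("trace-id", None))       # {'key': 'trace-id', 'val': None}
```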
Commit 51b7eee addressed the high memory consumption during the backup process. However, it made backups take considerably longer.
This issue aims to find a more optimal solution that balances reduced memory usage against backup processing time, improving the efficiency of the backup process without compromising its overall performance.
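One possible middle ground (an assumption on my part, not a ksnap design decision): consume topics in small batches, so memory is bounded by the batch size while the consumer still amortises per-topic overhead across several topics per pass.

```python
def batched(items, size):
    """Yield successive fixed-size slices of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

topics = ["t1", "t2", "t3", "t4", "t5"]
print(list(batched(topics, 2)))  # [['t1', 't2'], ['t3', 't4'], ['t5']]
```

Tuning the batch size would then trade memory against wall-clock time explicitly, instead of committing to either extreme.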
What about SSL/SASL_SSL?
How do I pass the configuration needed to connect?
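For reference, these are the kafka-python parameter names a secured cluster typically needs; whether and how ksnap forwards them to its consumer is exactly the open question here (the values below are placeholders):

```python
secure_config = {
    "bootstrap_servers": "broker:9093",
    "security_protocol": "SASL_SSL",       # encrypt and authenticate
    "ssl_cafile": "/etc/ssl/certs/ca.pem", # CA used to verify the brokers
    "sasl_mechanism": "PLAIN",
    "sasl_plain_username": "backup-user",
    "sasl_plain_password": "secret",
}
# e.g. KafkaConsumer(topic, **secure_config), if the tool exposed a hook for it.
print(secure_config["security_protocol"])  # SASL_SSL
```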
Hey
Version v2.0.5 backup results in the following errors
```
2021-06-03 22:59:53,327 WARNING Unknown error fetching data for topic-partition TopicPartition(topic='esp.commands', partition=2)
2021-06-03 22:59:53,327 WARNING Unknown error fetching data for topic-partition TopicPartition(topic='esp.commands', partition=11)
2021-06-03 22:59:53,328 WARNING Unknown error fetching data for topic-partition TopicPartition(topic='esp.commands', partition=8)
2021-06-03 22:59:53,328 WARNING Unknown error fetching data for topic-partition TopicPartition(topic='esp.commands', partition=5)
2021-06-03 22:59:53,447 WARNING Unknown error fetching data for topic-partition TopicPartition(topic='esp.commands', partition=1)
2021-06-03 22:59:53,447 WARNING Unknown error fetching data for topic-partition TopicPartition(topic='esp.commands', partition=4)
```
Version v2.0.3 works fine
Hi,
I learned about this tool from the discussion in itadventurer/kafka-backup#52.
I have a similar requirement for taking point-in-time snapshots of Kafka data, and this tool covers most of my needs.
I have a few questions:
Thank you
The tool should not require both the confluent and kafka-python libraries to be installed in order to work.