Comments (21)
Hey guys,
is the compaction still not planned? We take an incremental snapshot every 5 minutes, so over one day we end up with 288 incremental snapshots (~10 MB). Restoring from these takes ~30-45 min, which is completely impractical. We will not be able to move a control plane to other seeds with such long restore times.
So please consider compacting the incremental snapshots so that we can significantly speed up the restoration process.
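For a sense of scale, the snapshot count follows directly from the interval, and the restore time from the per-delta replay cost. A quick back-of-envelope check (the per-delta cost below is an assumed illustrative figure, not a measurement):

```python
# Delta snapshots accumulated per day at a 5-minute interval,
# and a rough sequential restore-time estimate.
SNAPSHOT_INTERVAL_MIN = 5
snapshots_per_day = 24 * 60 // SNAPSHOT_INTERVAL_MIN

SECONDS_PER_DELTA = 7  # assumed average fetch + replay cost per delta
restore_minutes = snapshots_per_day * SECONDS_PER_DELTA / 60

print(snapshots_per_day)       # 288
print(round(restore_minutes))  # 34 -- in the observed 30-45 min range
```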
from etcd-druid.
We had listed 4 topics for optimization for etcd-backup-restore.
- Delta snapshot - Completed (except for the large value case)
- Full snapshot - Completed
- Database verification - ongoing
- Database restoration - next up
I think that this requirement for compaction of delta snapshots falls under the database restoration optimisation and should be picked up along with it. We could even pick it up in parallel with the database verification topic if we have the bandwidth.
We haven't planned to compact the incremental backup files into one as of now. But if you want, we can create a subcommand that takes the incremental snapshots, compacts them, and pushes the result at command execution.
We just need this for our restore process, which is manual at the moment. So it would be enough if there were a subcommand that works on a folder containing the full backup and the incremental updates (e.g. downloaded from the corresponding S3 bucket folder) and creates a compacted full backup at the end. We could run this locally and then easily feed the result in when we restore the cluster.
There is one way to do this indirectly. If you delete the member directory and do a data directory initialization, the data directory will be restored from the latest full and incremental snapshot. Can you check if this is sufficient?
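Conceptually, such a compaction is just "restore, then re-snapshot": replay every delta onto the last full snapshot and emit the result as a new full snapshot. A minimal sketch of that idea, modeling snapshots as plain dicts (the real tool operates on etcd revisions and boltdb files, so this is only an illustration):

```python
def compact(full: dict, deltas: list) -> dict:
    """Replay delta snapshots (oldest first) onto a full snapshot
    and return the equivalent compacted full snapshot."""
    state = dict(full)
    for delta in deltas:
        for key, value in delta.items():
            if value is None:         # tombstone: key was deleted
                state.pop(key, None)
            else:                     # create or update
                state[key] = value
    return state

full = {"/a": 1, "/b": 2}
deltas = [{"/b": 3}, {"/c": 4, "/a": None}]
print(compact(full, deltas))  # {'/b': 3, '/c': 4}
```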
You can use the `etcdbrctl restore` subcommand to restore from a full snapshot including a set of incremental snapshots. This will give you a working `<etcd-data-directory>`. If this is not your requirement and you only want a full snapshot file to feed into kubify, you can use the db file from `<etcd-data-directory>/member/snap/db`.
Thanks guys! I will have a look into it.
Yes. This is needed.
@shreyas-s-rao @swapnilgm Should this ticket be rather moved to https://github.com/gardener/etcd-druid/issues?
While we are at it, maybe there are more issues in this repo that should go there?
Makes sense. These issues were created prior to introducing etcd-druid.
I think there is no real benefit in doing incrementals at all. Looking at our environment, a full backup is ~100 MB and an increment ~30-40 MB uncompressed. I would really propose switching to full backups only, but compressing them with lz4; this will lead to smaller full backups than the current incremental files. The decompression will add ~0.2 s per file before the actual restore can start.
But overall, the restoration time will decrease by a large factor.
related to: gardener/etcd-backup-restore#263
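The compress-then-restore trade-off above is easy to sanity-check. A sketch using `zlib` from the Python standard library as a stand-in for lz4 (lz4 decompresses considerably faster, and the payload here is synthetic rather than a real etcd db file):

```python
import time
import zlib

# Synthetic payload standing in for a full snapshot; real etcd data
# (keys, serialized Kubernetes objects) is similarly repetitive and
# compresses well.
payload = b"somewhat-repetitive-kubernetes-object-data " * 50_000

compressed = zlib.compress(payload, level=6)
ratio = len(compressed) / len(payload)

start = time.perf_counter()
restored = zlib.decompress(compressed)
decompress_s = time.perf_counter() - start

assert restored == payload
print(f"ratio={ratio:.3f}, decompress={decompress_s:.3f}s")
```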
@majst01 We require incremental snapshots in general to avoid frequent large full snapshots, which usually incur network costs. If you do want to avoid delta snapshots, you can simply configure `chart/etcd-backup-restore/values.yaml` by setting the `backup.deltaSnapshotPeriod` value to anything less than `1s` to completely disable delta snapshots, and you can also configure the `backup.schedule` value to set the full snapshot schedule to a higher frequency.
Regarding backup compression, we have opened gardener/etcd-backup-restore#255 and it's on the project roadmap.
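The "anything less than 1s disables delta snapshots" rule can be sketched as a small predicate (a simplification of the snapshotter's actual handling; the helper name is mine, not part of etcd-backup-restore):

```python
from datetime import timedelta

def delta_snapshots_enabled(delta_snapshot_period: timedelta) -> bool:
    # Per the comment above: any period below one second
    # effectively disables delta snapshotting.
    return delta_snapshot_period >= timedelta(seconds=1)

print(delta_snapshots_enabled(timedelta(seconds=0)))  # False
print(delta_snapshots_enabled(timedelta(minutes=5)))  # True
```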
Thanks @shreyas-s-rao for the hints.
We tried ourselves to set the backup to do full backups only, but in our experience setting `backup.schedule` had no effect.
Maybe @Gerrit91 can give you more information on that.
We instead modified the generated `etcd` resource manually, which is not the way to go.
> We tried ourselves to set the backup to do full backups only, but in our experience setting `backup.schedule` had no effect.
@majst01 I just tried the following, and delta snapshots were disabled with only full snapshots enabled.
```yaml
backup:
  ...
  fullSnapshotSchedule: "* * * * *"
  ...
  deltaSnapshotPeriod: 0s
  ...
```
```
time="2020-10-01T07:09:19Z" level=info msg="Taking scheduled snapshot for time: 2020-10-01 07:09:19.1179478 +0000 UTC" actor=snapshotter
{"level":"warn","ts":"2020-10-01T07:09:19.126Z","caller":"clientv3/retry_interceptor.go:116","msg":"retry stream intercept"}
time="2020-10-01T07:09:19Z" level=info msg="Successfully opened snapshot reader on etcd" actor=snapshotter
time="2020-10-01T07:09:19Z" level=info msg="Total time to save snapshot: 0.004196 seconds."
time="2020-10-01T07:09:19Z" level=info msg="Successfully saved full snapshot at: Backup-1601536159/Full-00000000-00000001-1601536159" actor=snapshotter
time="2020-10-01T07:09:19Z" level=info msg="Will take next full snapshot at time: 2020-10-01 07:10:00 +0000 UTC" actor=snapshotter
time="2020-10-01T07:09:19Z" level=info msg="Setting status to : 200" actor=backup-restore-server
time="2020-10-01T07:09:19Z" level=info msg="Starting snapshotter..." actor=backup-restore-server
time="2020-10-01T07:09:19Z" level=info msg="Taking scheduled snapshot for time: 2020-10-01 07:09:19.132893 +0000 UTC" actor=snapshotter
time="2020-10-01T07:09:19Z" level=info msg="There are no updates since last snapshot, skipping full snapshot." actor=snapshotter
time="2020-10-01T07:09:19Z" level=info msg="Stopping full snapshot..." actor=snapshotter
time="2020-10-01T07:09:19Z" level=info msg="Resetting full snapshot to run after 40.8607142s" actor=snapshotter
```
> We instead modified the generated `etcd` resource manually, which is not the way to go.
Did you mean the above?
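As an aside, the log line "Will take next full snapshot at time: 2020-10-01 07:10:00" follows directly from the `* * * * *` schedule, which fires at the top of every minute. A minimal sketch of that next-tick computation (not etcd-backup-restore's actual cron code):

```python
from datetime import datetime, timedelta

def next_minute_tick(now: datetime) -> datetime:
    """Next run of the '* * * * *' cron schedule: top of the next minute."""
    return now.replace(second=0, microsecond=0) + timedelta(minutes=1)

# The snapshot taken at 07:09:19 schedules the next one for 07:10:00,
# matching the log output above.
print(next_minute_tick(datetime(2020, 10, 1, 7, 9, 19)))  # 2020-10-01 07:10:00
```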
> I think there is no real benefit in doing incrementals at all. Looking at our environment, a full backup is ~100 MB
It really depends on the workload pattern in the cluster. In most clusters the delta snapshots are much smaller than 100 MB, but in the seed and garden control planes they are quite large. As a consequence, the full snapshot size is also quite large (2-3 GB).
We have an issue to compress/decompress snapshots (full as well as delta) gardener/etcd-backup-restore#255.
We also have this issue to compact the delta snapshots in the background.
In the special case where the full snapshots are small and comparable to the delta snapshots, it might still make sense to do only full snapshots, as you mentioned. Would it make sense to make this configurable in Gardener?
@amshuman-kr in which resource did you set these settings?
```yaml
backup:
  ...
  fullSnapshotSchedule: "* * * * *"
  ...
  deltaSnapshotPeriod: 0s
  ...
```
The `etcd` resource is deployed by Gardener, and I think only the extension providers modify the values via webhooks. The gardener-extension-provider-gcp has a `Schedule` field in the deployment config, but as far as I can see from the code, this config flag is not used.
> @amshuman-kr in which resource did you set these settings?

The `Etcd` resource of `etcd-druid`.
/assign
This issue was partially addressed in gardener/etcd-backup-restore#301. The functionality will be complete once #191 is completed.
This issue is now fully addressed with #197. Hence closing it.
The snapshot compaction feature will be available in the etcd-druid `v0.6.0` release shortly.