Comments (9)
I have not done too much digging into the code to see how the state is managed, but is disk space so scarce that a cleanup event must be triggered every N minutes?
My suggestion would be to have the cleanup event run at program start (possibly delayed) and expose a cleanup command for the end-user to run on demand (i.e. `dura cleanup`). For those users who never turn off their computer, you could even have the cleanup run once every day.
Anyway, this is a neat issue to have and I commend you for wanting to save end-user disks. I look forward to seeing how this issue gets resolved.
from dura.
Perhaps there could be a system similar to log rotation, 'branch rotation'. Dura could create `dura-0`, `dura-1`, `dura-2`, ..., and rotate those after a period of time or number of commits. That'd give you an "active" branch, and "archived" branches that are removed after enough time.
Not sure if that's stepping into the territory of being too complex?
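To gauge the complexity, here is a minimal sketch of the rotation idea as an in-memory model. It assumes `tips[i]` is the commit that branch `dura-{i}` points at; the `rotate` and `branch_names` helpers are hypothetical and do not touch a real repository.

```rust
/// Hypothetical model: `tips[i]` is the commit that `dura-{i}` points at.
/// Index 0 is the "active" branch; higher indexes are "archived".
/// Returns the tip that fell off the end, if any, so it can be left to `git gc`.
fn rotate(tips: &mut Vec<String>, new_tip: &str, max_branches: usize) -> Option<String> {
    // Every existing tip shifts up by one slot, like log.0 becoming log.1.
    tips.insert(0, new_tip.to_string());
    if tips.len() > max_branches {
        tips.pop()
    } else {
        None
    }
}

/// Renders the branch names implied by the current slots.
fn branch_names(tips: &[String]) -> Vec<String> {
    tips.iter()
        .enumerate()
        .map(|(i, tip)| format!("dura-{} -> {}", i, tip))
        .collect()
}
```

Note that each rotation moves every branch, which is the main cost of index-based names.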
from dura.
Following up on my last message, an octopus merge is a merge commit with more than 2 parents. There's apparently no hard limit, except that the GitHub history viewer won't handle 100k and a 66-way merge in the Linux kernel seemed to break viewers.
You could do either the ring buffer approach or the B-tree approach. They both seem to do a lot better as you increase parents.
I think I'll take a stab at this. I'll make it configurable, so that you can effectively toggle the behavior on/off, reduce/enhance the effect, etc. I think I need to see it in action.
from dura.
iirc, Mac and Linux file systems throw a file save event. You would write code to look for that event and filter to the git folder watched.
from dura.
That's more in line with the efficiency concerns of #5.
This issue is about removing branches (and objects) that aren't needed. I'm thinking about heuristics like
- Remove if the dura branch is based on a commit that no longer exists
- Remove if the dura branch is more than 60 days old
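The two heuristics above could be sketched roughly like this. The `DuraBranch` struct, its fields, and the `existing_commits` set are assumptions made for illustration, not dura's actual types; a real version would look these up in the repository.

```rust
use std::collections::HashSet;
use std::time::{Duration, SystemTime};

/// Hypothetical view of a dura branch; field names are illustrative only.
struct DuraBranch {
    name: String,
    /// The regular commit the dura branch was based on.
    base_commit: String,
    /// When dura last wrote a snapshot to this branch.
    last_updated: SystemTime,
}

/// True if either heuristic says the branch can go.
fn should_remove(
    branch: &DuraBranch,
    existing_commits: &HashSet<String>,
    now: SystemTime,
    max_age: Duration,
) -> bool {
    // Heuristic 1: the base commit no longer exists.
    let base_missing = !existing_commits.contains(&branch.base_commit);
    // Heuristic 2: the branch is older than `max_age` (e.g. 60 days).
    // A timestamp in the future is treated as "not old".
    let too_old = now
        .duration_since(branch.last_updated)
        .map(|age| age > max_age)
        .unwrap_or(false);
    base_missing || too_old
}
```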
from dura.
Ring Buffer approach
I'll call @JakeStanger's idea the "ring buffer" approach. There's a lot of variations in it, but it amounts to
- keep the last `N` branches
- abandon the rest to `git gc`
(fwiw using a date instead of an index solves some of your problems)
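As a rough sketch of the date-keyed variant: if each snapshot carries a creation time, "keep the last `N`" is a sort and a split, and no branch ever needs renaming. The `Snapshot` struct and `keep_newest` function are hypothetical names for illustration.

```rust
/// Hypothetical snapshot record; `created_unix` is seconds since the epoch.
#[derive(Debug, Clone, PartialEq)]
struct Snapshot {
    branch: String,
    created_unix: u64,
}

/// Splits snapshots into (kept, abandoned): the `keep` newest stay,
/// and the rest would be deleted and left for `git gc`.
fn keep_newest(mut snapshots: Vec<Snapshot>, keep: usize) -> (Vec<Snapshot>, Vec<Snapshot>) {
    // Newest first.
    snapshots.sort_by(|a, b| b.created_unix.cmp(&a.created_unix));
    // Guard against `keep` being larger than the list.
    let split_at = keep.min(snapshots.len());
    let abandoned = snapshots.split_off(split_at);
    (snapshots, abandoned)
}
```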
The main problem with this approach is deciding which backups are safe to lose. Meanwhile, people are choking on branches. There's probably a discrepancy between "how much data they're okay losing" and "how many branches they want to see". That is, many users would prefer to see only the most recent `dura` branch, but they also want the ability to roll back to hours ago even though there are 5 regular commits in between.
So how do we
- keep relevant branches easily accessible
- not lose data, except when it's really old
B-Tree approach
Disclaimer: this idea is terrible and I love it
Each commit can have multiple parents. I'm not sure what the limit is, but that's your base. Create commits that refer to other commits such that you build up a B-Tree with a single commit on top. That's your dura branch (#31). You now only have one branch.
The leaf nodes of this B-tree are the commits that are currently `dura/` branches (so it's only a B-tree if you ignore all the real commits). The non-leaf nodes would be fabricated commits. Every commit needs to point at a "tree" (current process); for non-leaf nodes we could use the tree of the right-most leaf commit. That would make sensible diffs in case you decide to check out a non-leaf commit.
Adding commits
B-trees are fast to append to. You always add to the right-most (newest) side of the tree. When the tree fills up, you make a new parent. New regular commits would create `log2(N)` commits (well, maybe better than `log2`).
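To make the shape concrete, here is a minimal in-memory sketch. It assumes a cap of `max_parents` per fabricated commit and, for simplicity, rebuilds the whole tree bottom-up instead of appending on the right-most side. `Node`, `build_tree`, `fabricated_count`, and `height` are hypothetical names, and nothing here creates real Git objects.

```rust
/// A node in the proposed structure.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    /// An existing dura snapshot commit, identified by some id.
    Leaf(String),
    /// A fabricated commit whose Git parents are the nodes beneath it.
    Fabricated(Vec<Node>),
}

/// Groups nodes left to right, `max_parents` at a time, until one commit is on top.
fn build_tree(leaves: &[&str], max_parents: usize) -> Option<Node> {
    assert!(max_parents >= 2, "need at least 2 parents per fabricated commit");
    if leaves.is_empty() {
        return None;
    }
    let mut level: Vec<Node> = leaves.iter().map(|id| Node::Leaf(id.to_string())).collect();
    loop {
        let next: Vec<Node> = level
            .chunks(max_parents)
            .map(|group| Node::Fabricated(group.to_vec()))
            .collect();
        if next.len() == 1 {
            return next.into_iter().next();
        }
        level = next;
    }
}

/// Number of fabricated commits in the tree.
fn fabricated_count(node: &Node) -> usize {
    match node {
        Node::Leaf(_) => 0,
        Node::Fabricated(parents) => 1 + parents.iter().map(fabricated_count).sum::<usize>(),
    }
}

/// Number of fabricated levels above the leaves.
fn height(node: &Node) -> usize {
    match node {
        Node::Leaf(_) => 0,
        Node::Fabricated(parents) => 1 + parents.iter().map(height).max().unwrap_or(0),
    }
}
```

With 9 leaves and at most 2 parents this builds 11 fabricated commits in 4 levels; allowing 3 parents drops that to 4 fabricated commits in 2 levels, which is why a larger parent count helps.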
Removing commits
When the history gets truly old and crusty (2 years seems adequate), removing from the left side is just as fast as inserting on the right.
Elephant
The elephant in the room is that this could explode the width of the git log in `tig` and other Git clients, such that you'd see so many lines that dura would make Git history unusable.
I wonder if there's a way to make this not matter. Maybe you can manipulate the timestamps so they appear later (I don't think this will work).
Maybe we can solve this through hybrid mode.
Hybrid mode
Mixing the approaches, we can use the B-tree for cold storage, and the ring buffer for more relevant commits (hot storage?).
One idea is to maintain all `dura` branches in a B-tree, but then also maintain a short ring buffer of dura branches. The user can use `tig --all --no dura` to ignore the B-tree (assuming the B-tree is called `dura`). Another variation on that is to have the B-tree maintain all but `N` branches and use the ring buffer for the hot branches.
from dura.
I'd agree it's clear the ring-buffer idea isn't sufficient by itself. I was trying to come up with answers to the ring-buffer questions and basically re-invented your idea of hot/cold storage, so the hybrid mode sounds good to me.
I must admit the B-Tree approach is mostly going over my head. It might be a good idea to visualize it somehow, if not for this then for end-user documentation.
from dura.
Alright, here's my best shot at whiteboarding it.
B-Tree
- black: the usual Git commits
- blue: today's `dura/{hash}` branches, the output of `snapshots::capture`
- red: the proposed B-tree with base 2 for effect. A Git history viewer would not draw it this way; I'm just drawing it horizontally to make it obvious why I'm calling it a B-tree.
There's no stated limit to the number of parents, so you could in theory have 1 red node with all blue branches parenting into it, like a spider with 200 legs (9 in this case). But there really are limits (they just aren't stated), so we have to cap the number of parents, which means you need an octopus or a B-tree.
Octopus Collective
Another variation is to only have a 1st level of the B-tree. This would vastly reduce the number of branches, but it becomes harder to ignore the octopus commits. With the B-tree you can do `git log --exclude=refs/heads/dura --all`, but with the octopus variation you'd need `git log --exclude='refs/heads/dura/octo/*' --all`.
I'm starting to see the value in the Octopus Collective. I had initially thought it would be hard to exclude patterns of branches until I wrote this.
from dura.
Thanks for taking the time to draw those out, that does help a lot. Sorry it's taken me a while to get back to it.
So let's say someone's been working on a huge refactor and not made any 'real' commits for several days. Assuming the hybrid ring buffer/octopus model was implemented, and you want to restore a commit from a few days ago that's no longer in the ring buffer:
- Does that mean the octopus commit's parents no longer exist?
- Could a specific parent be restored easily, or do all the parents of the octopus get merged into one single restore point?
from dura.