Comments (8)
I think I get it; I'll try it tomorrow to see whether it works. Thanks!
from git.
I think I get it; I'll try it tomorrow to see whether it works. Thanks!
To be honest, if it works, you will most likely want to use a better programming language to implement this ;-)
from git.
If I delete all the objects and spend an hour running `git reset --hard`, it only generates about 8GB of objects.
Unfortunately, there is no way (that I am aware of) to accomplish this with Git in any faster way.
A hacky way around that would be to perform a fresh partial clone from the current one (using `file://<path>`, so that really only the needed objects are fetched), and then move the Git objects from the fresh partial clone to the original clone (deleting the original Git objects first), trusting the authoritative remote repository to still have the now-missing objects.
This concern about the missing objects has merit, by the way. Imagine that you performed an interactive rebase in the worktree, and then relied on the reflog to refer to the otherwise-unreachable objects. With the "hacky way" I described, those would become missing objects and, since they were never pushed, irretrievably lost ones at that.
from git.
Why are the objects that were never pushed not stored in .git/objects?
from git.
Why are the objects that were never pushed not stored in .git/objects?
Oh, right, you want to trim the shared alternate object database, not `.git/objects`!
Typically, determining which objects in a shared alternate object database can be deleted is an unsolvable problem: you never know which repositories use it as an alternate object database.
To safely remove objects from such a shared alternate object database, it would have to be known which repositories use that alternate, and then you would have to determine all objects which correspond to checked-out files from all the worktrees of those repositories.
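A minimal sketch of the first half of that check, assuming the candidate repositories are already known (which is exactly the part Git cannot tell you). The function name `repos_using_alternate` is hypothetical; the only Git fact relied on is that a repository lists its alternates, one path per line, in `objects/info/alternates` inside its Git directory.

```shell
# Print every repository (given as the path to its Git directory) whose
# objects/info/alternates file lists the given alternate object directory.
# Assumption: the entries are absolute paths spelled exactly like
# "$alternate"; relative entries and symlinks are not resolved here.
repos_using_alternate () {
	alternate=$1
	shift
	for git_dir in "$@"
	do
		alternates_file="$git_dir/objects/info/alternates"
		# Repositories without an alternates file use no alternate at all.
		test -f "$alternates_file" || continue
		# -x: whole-line match, -F: literal string, -q: exit status only
		if grep -qxF -- "$alternate" "$alternates_file"
		then
			printf '%s\n' "$git_dir"
		fi
	done
}
```

The second half, collecting the objects needed by every worktree of every repository found this way, is the expensive part and is not covered by this sketch.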
However, in this instance, `.scalarCache/<key>/` is an alternate object database that is populated via the scheduled `prefetch` task. So maybe you could try moving the objects out of the way (not deleting them just yet!), then running `git maintenance run --task=prefetch` in the worktree to re-populate the Scalar cache, and take it from there? A couple of Git objects might still be needed, and they would be fetched (most likely individually). In the worst case, you would end up spending those 8h again. In the best case, it would answer the question of how to trim `.scalarCache`.
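The "move out of the way, do not delete" step could be sketched as follows. This is only an illustration under assumptions: the function name `set_objects_aside` and the backup location are hypothetical, and it assumes the alternate is an ordinary object directory that Git is happy to find empty (with `pack/` and `info/` subdirectories) afterwards.

```shell
# Move an object directory aside and leave an empty one in its place, so
# that the original objects can be restored if re-populating goes wrong.
# $1: the alternate object directory (e.g. the one under .scalarCache)
# $2: backup path that must not exist yet, ideally on the same filesystem
set_objects_aside () {
	objects_dir=$1
	backup_dir=$2
	test -d "$objects_dir" || return 1
	# Refuse to overwrite an earlier backup.
	test ! -e "$backup_dir" || return 1
	mv "$objects_dir" "$backup_dir" &&
	mkdir -p "$objects_dir/pack" "$objects_dir/info"
}
```

After calling it, `git maintenance run --task=prefetch` in the worktree would be the step that re-populates the cache; if the result is unsatisfactory, the backup directory can be moved back into place.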
from git.
Yes, I am using `git maintenance` now, and it is very slow.
from git.
The idea is that, for the most part, I only need the objects for my current checkout version, so I want to remove the unneeded objects after updating the code.
from git.
@miku1958 do you have a single clone with a single worktree? If so, you may still be able to get what you need by enumerating the object names (SHAs) corresponding to the checked-out files. `git ls-files --sparse --stage` will give you a start, but you have to exclude sparse directories (lines ending in `/`) and submodules (lines starting with `160000`). The idea is that you feed these object names to `git pack-objects trimmed-pack` via `stdin`, to obtain a pack corresponding to the objects you want to retain.
This is not enough, though: you will need the object name of the current `HEAD` commit as well as of the involved trees that correspond to (at least partially) checked-out folders.
Here is an attempt to do that:
    (
      # the blobs corresponding to the checked-out files
      # (cut -c 8-47 assumes 40-character SHA-1 object names)
      git ls-files --sparse --stage |
      grep -ve '^160000' -e '/$' |
      cut -c 8-47 &&
      # the commit and root tree
      git rev-parse HEAD HEAD^{tree} &&
      # the trees corresponding to the checked-out files; the HEAD: prefix
      # makes rev-parse resolve each directory to its tree object
      git ls-files --sparse |
      sed -n 's/^\(.*\)\/[^/][^/]*$/HEAD:\1/p' |
      uniq |
      sort |
      uniq |
      xargs -d '\n' git rev-parse
    ) |
    git pack-objects trimmed-pack
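The `cut -c 8-47` step depends on fixed column positions. A field-based variant of the same filter might look like the sketch below; the function name `checked_out_blob_ids` is made up for illustration, and it assumes the documented `--stage` line format of `<mode> <object> <stage>`, a tab, then the path.

```shell
# Read `git ls-files --sparse --stage` output on stdin and print the object
# names of the entries to keep. Submodules (mode 160000) and sparse
# directories (path ending in "/") are skipped. Splitting on fields avoids
# hard-coding the width of the object name.
checked_out_blob_ids () {
	awk -F '\t' '
		{
			# $1 is "<mode> <object> <stage>", $2 is the path
			split($1, meta, " ")
			if (meta[1] == "160000") next
			if ($2 ~ /\/$/) next
			print meta[2]
		}
	'
}
```

With that in place, `git ls-files --sparse --stage | checked_out_blob_ids` would stand in for the `grep` and `cut` stages of the pipeline.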
Note: This will not be enough, as even something as simple as `git show` will then want to fetch objects. Here is a call that enumerates the blob objects of the parent commit corresponding to the files modified in `HEAD`:

    git diff --no-abbrev --raw HEAD^! |
    sed -ne '/^:160000/d' -e 's/^[^ ]* [^ ]* \([^ ]*\).*$/\1/p'
But even that would not be enough: it would miss the parent commit's object name as well as all of the involved trees. And since trees can be deep, a shell scriptlet to enumerate those would have to look something like this:
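    # Prints HEAD^:<dir> for every directory level above each modified file;
    # these names still have to be resolved with git rev-parse.
    # Relies on GNU sed for \t and \?.
    git diff --no-abbrev --raw HEAD^! |
    sed -ne '/^:160000/d' -e '/\//{
      s/^[^ ]* [^ ]* [^ ]* [^ ]* .\t\(.*\/\)\?.*$/HEAD^:\1/
      :1
      s/\/[^/]*$//
      p
      /\//b1
    }' |
    uniq |
    sort |
    uniq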
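The directory-walking part of that scriptlet can also be written without `sed` labels. The following is a sketch of the same idea in plain POSIX shell; the function name `parent_dirs` is hypothetical, and it expects one path per line with no quoting applied.

```shell
# Print every ancestor directory of each path read from stdin, deepest
# first: "a/b/c.txt" yields "a/b" and then "a"; a top-level file yields
# nothing. De-duplication is left to the caller.
parent_dirs () {
	while IFS= read -r path
	do
		# ${path%/*} drops the last "/component"; it leaves the value
		# unchanged once no slash remains, which ends the loop.
		while test "${path%/*}" != "$path"
		do
			path=${path%/*}
			printf '%s\n' "$path"
		done
	done
}
```

Fed with something like `git diff --name-only HEAD^!`, then piped through `sort -u` and prefixed with `HEAD^:`, the output would be a list of names that `git rev-parse` can resolve to tree objects.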
At this stage, we're already on a journey to a ridiculously-involved shell script, and I am sure that I forgot something crucial that also needs to go into that trimmed packfile...
from git.