Giter Site home page Giter Site logo

Comments (12)

rubyist avatar rubyist commented on August 27, 2024

How much data are we talking? More than an in-memory map and writing to disk with gob?

from git-lfs.

technoweenie avatar technoweenie commented on August 27, 2024

We may be good with just a simple key/value map. The key should be based on the filename. If a filename is queued again, we can just update that key's value. The last 3 items just depend on the ability to figure out what is being pushed, so we can look around for git media blobs.

from git-lfs.

technoweenie avatar technoweenie commented on August 27, 2024

One other option: Change push so it requires the remote and branch (which should be provided by the pre-push hook. The pre-push hook gets this in STDIN:

{local ref}       {sha} {remote ref}       {remote sha}
refs/heads/master 67890 refs/heads/foreign 12345

We can get this easily with:

$ git ls-remote -h origin master
b336a0f59a945b53259856e94cfb2440bfd5ca4e    refs/heads/master

from git-lfs.

mhagger avatar mhagger commented on August 27, 2024

We can get this easily with:

$ git ls-remote -h origin master
b336a0f59a945b53259856e94cfb2440bfd5ca4e  refs/heads/master

Easy, yes, but don't forget that git ls-remote requires a round-trip to the upstream repo so it is a bit expensive.

from git-lfs.

mhagger avatar mhagger commented on August 27, 2024

It seems to me that instead of recording the filename, it would be more robust and easier to handle the SHA-1 of the blob representing the pointer file. If and only if that blob has to be pushed to the git server, then the corresponding git-media blob has to be pushed to the media server.

from git-lfs.

rubyist avatar rubyist commented on August 27, 2024

Ok, I'm leveling up my git game here, want to check in to make sure I'm on the right track. Here's a scenario:

I've got 3 commits I want to push and sync, efd7, da5e, and 60a0. When I run git push origin master, the pre-commit hook will get: origin https://github.com/rubyist/mediarepo on its command line, and refs/heads/master efd7 refs/head/master 9b77 on its stdin. Cool.

Now, I want to figure out what objects it's about to push so I know what to sync. I know that I could do git rev-list --objects efd7 ^9b77 and get a list of objects. Is this an accurate representation of what it's about to push? I'm thinking you just trawl through that list, ignore anything that isn't a blob, and 💥 you've got your list of objects to sync.

Does that sound legit? Is there a better way to get that list?

from git-lfs.

mhagger avatar mhagger commented on August 27, 2024

s/pre-commit/pre-push/, as you obviously know.

git rev-list --objects efd7 ^9b77 will include all of the objects being pushed, but it can also include far more objects, namely objects that already exist on the remote because they are reachable from a different branch. For example, if I create a new commit and push it:

git checkout -b mybranch master
git push origin mybranch

then the hook script will get

refs/heads/mybranch efd7 refs/heads/mybranch 0000

and your rev-list command will list all of the commits on mybranch even though those commits are on master, and many of them are likely already present at origin.

(Actually, your command would fail because ^0000000000000000000000000000000000000000 is not a valid argument to git rev-list. So in this case the analogous command would be

git rev-parse efd7

. But you know what I mean.)

AFAIK there is no way to get at the information you want (namely, the list of objects being newly pushed to the remote) without either hacking git or duplicating git's negotiation with the remote server 😦 Essentially you would need to know the value of all of the remote server's references before your push.

Maybe @peff knows a trick.

from git-lfs.

technoweenie avatar technoweenie commented on August 27, 2024

Essentially you would need to know the value of all of the remote server's references before your push.

This should work, right?

# Do we want pull request refs too?
$ git ls-remote  --heads --tags
From https://github.com/github/git-media
fe6bba2b69f7c18dbf274a9bb958b75f05a50468    refs/heads/ensureprefix
3dc9a647e8b6df76f4f3f5cf026262f09b78f819    refs/heads/master
d71891cb680f16246c236946f07fbe8e49f52365    refs/tags/v0.0.1
4843542e284497c1e3745eed63059f791604868e    refs/tags/v0.1.0
690c8da8487c76b3db7bb12a35d9bab84be484ba    refs/tags/v0.2.0
d5255d7be0d70996b8561a26998f2e0300a2710a    refs/tags/v0.2.1
26fe77e8f089272f44b6968a194b3e0ae4d9a578    refs/tags/v0.2.1-p1
a3f157876e6c30a4cb9ad385feebd33dad5e1b75    refs/tags/v0.2.2
3dc9a647e8b6df76f4f3f5cf026262f09b78f819    refs/tags/v0.2.3

If we're pushing a new branch, we'd want to find the nearest ref (probably master, but it'd depend on where the push originated from).

@rubyist: Having a "dry run" option would be useful here. Have git media push --dry-run list the media blobs it intends to push. Maybe a --debug option could show more detailed info about the git refs it used to come up with that list.

from git-lfs.

rubyist avatar rubyist commented on August 27, 2024

In the case of a new branch where git really is pushing all of the objects, we can only know whether or not we should sync a git media file by asking the git media server. The git media server should already prevent uploading a full blob it already knows about, so maybe it's OK to use the full list that rev-parse gives? We could perhaps add a remote-list like command of our own to avoid all the roundtrips that just 201 or whatever.

We'll definitely add a --dry-run option.

from git-lfs.

peff avatar peff commented on August 27, 2024

There's not really a trick here. The closest simple thing is probably git rev-list --objects $all_the_things_you_are_pushing --not --remotes=$remote_you_are_pushing_to. That is slightly different than what git generates because we are using what's in refs/remotes (i.e., the cache from the last time we fetched), whereas git push will actually ask the upstream for all of its ref tips. So it falls down in a few cases:

  1. We might not be pushing to a named ref with tracking branches at all (e.g., git push $url).
  2. The tracking branches might not be up to date. Usually this means we have a subset of what the other side has, and we'll err on the side of thinking we have to send more objects. But if the other side has rewound a ref, we could err in the other direction.
  3. The other side may have refs not included in what we've fetched (e.g., refs/pull, or even .have tips from alternates repositories).

You can get closer with:

# Find out what the other side has, but omit any objects that we don't
# have ourselves. Give them a `^`-negation, and then use the
# result to limit our object list.
git ls-remote $remote |
cut -f1 |
git cat-file --batch-check |
perl -lne 'print "^$1" if /^([0-9a-f]{40})/' |
git rev-list --objects $stuff_you_are_pushing --stdin

That's still not quite right. The server side of a push may advertise slightly more refs than ls-remote will report, as it will advertise tips for alternates repos it has access to. We don't currently do that ourselves, but other servers might.

Git doesn't actually do any other fancy negotiation to find common refs during a push, like it does for a fetch. The assumption is that you're typically building on what the other side has, so one of their refs should be a subset of what you already have. That could change in the future, but I don't think anybody has ever proposed to make it so.

from git-lfs.

mhagger avatar mhagger commented on August 27, 2024

@peff, does the server also report peeled versions of tags?

It sounds like a git ls-remote --haves $remote command or something like that might be useful here.

Though really, it is a shame to have to run git ls-remote separately. It costs another round-trip, and possibly opens up race conditions between the pre-push script trying to second-guess what will be sent vs. the git push itself which knows exactly what it will send. For this purpose it would be better to have a hook that gets, on stdin, a list of all objects that are about to be pushed.

from git-lfs.

peff avatar peff commented on August 27, 2024

@mhagger Yeah, I never noticed that, but we don't advertise peeled tags for pushes. Another issue: receive-pack does not necessarily advertise the same set of refs as upload-pack. Specifically, our servers do not advertise refs/pull for pushes (because it confuses --mirror pushers).

I agree that ls-remote --haves would be useful and would solve all of these problems, but that's going to mean a dependency on a newer version of git. If we're going that route, we might as well go with a separate sent-objects hook, which has all the advantages you noted.

from git-lfs.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.