Having a managed library to read .git repository is a plus for the .NET ecosystem.

Awesome work! 😍 about gitreader HOT 16 CLOSED

jairbubbles commented on May 20, 2024 1

Awesome work! 😍

from gitreader.

Comments (16)

jairbubbles commented on May 20, 2024 1

I quickly tested removing Commit / Tag resolving for each reference and, as expected, the Primitive and Structured interfaces have similar performance:

Method	Mean	Error	StdDev
GitReader	4.562 ms	0.0652 ms	0.0544 ms
GitReaderStructured	4.143 ms	0.0820 ms	0.0976 ms
LibGit2sharp	24.052 ms	0.4013 ms	0.3557 ms

kekyo commented on May 20, 2024

Thanks for your comment!

Interesting information. I wrote GitReader to remove libgit2sharp from RelaxVersioner, but of course it is also meant for general-purpose use. Based on your code, I've tried to map the functionality you want GitReader to have:

string RemoteUrl : GitReader has no remote fetch ability, so I do not plan to support retrieving URLs at this time. However, making URLs retrievable would not be difficult, and I will consider it if necessary.
IGitCommit HeadCommit : Can already get it.
IGitBranch CurrentBranch : Can already get it.
IReadOnlyList<GitBranch> Branches : Can already get it.
IGitTag[] Tags : Can already get it.
IGitRemote[] Remotes : GitReader has no remote fetch ability, so I do not plan to support it.
IGitStash[] Stashes : Cannot get it yet, but I will consider it positively.
DateTimeOffset LastCommitDate : Can already get it from the head commit.
int BehindDefaultBranch : (?) Requires manual calculation.
IGitBranch DefaultRemoteBranch : Can already get it from the remote branch list.
IGitOperation CurrentOperation : (?)
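
As a rough illustration of the manual calculation that BehindDefaultBranch would need: the count is the number of commits reachable from the default branch tip but not from HEAD. The sketch below assumes the commit graph has already been read into a dictionary mapping each commit hash to its parent hashes. BehindCounter and CountBehind are hypothetical names, not part of GitReader.

```csharp
using System.Collections.Generic;

public static class BehindCounter
{
    // Collects every commit reachable from 'tip' by following parent links.
    private static HashSet<string> Reachable(
        IReadOnlyDictionary<string, string[]> parents, string tip)
    {
        var seen = new HashSet<string>();
        var stack = new Stack<string>();
        stack.Push(tip);
        while (stack.Count > 0)
        {
            var hash = stack.Pop();
            if (!seen.Add(hash)) continue; // already visited
            if (parents.TryGetValue(hash, out var ps))
                foreach (var p in ps) stack.Push(p);
        }
        return seen;
    }

    // Number of commits on the default branch that HEAD does not contain.
    public static int CountBehind(
        IReadOnlyDictionary<string, string[]> parents, string head, string defaultTip)
    {
        var fromDefault = Reachable(parents, defaultTip);
        fromDefault.ExceptWith(Reachable(parents, head));
        return fromDefault.Count;
    }
}
```

This walks both histories in full, which is simple but not cheap on large repositories; a real implementation would likely stop at the merge base.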

jairbubbles commented on May 20, 2024

Happy to know you have great plans for your lib. We use libGit2Sharp for other things, and totally replacing it with GitReader is probably a long shot (our application is basically a full-featured git GUI). We use git.exe for commands (including fetch and status) and libGit2Sharp to introspect the repository, build the graph, compute diffs between commits, edit remotes...

kekyo commented on May 20, 2024

Well, I have often wanted to analyze Git commit graphs (I don't do it full time now, but in the past I have held both CI and progress analysis maintainer roles). One of my motivations is that a library like this would be useful as easy-to-use infrastructure for such purposes ;)

kekyo commented on May 20, 2024

@jairbubbles 0.10.0 released.

After the merge, I did some tweaking for consistency. If you have any problems, please report them here or create a separate issue if appropriate.

jairbubbles commented on May 20, 2024

@kekyo Cool! I have started some work to benchmark GitReader vs LibGit2Sharp. In a nutshell, what I see is that it's faster when using the "primitive" open (but we're not getting all the info that LibGit2Sharp provides), but it's a lot slower when using the "structured" open, which is too bad as those data structures are a lot more user-friendly.

jairbubbles commented on May 20, 2024

Method	Mean	Error	StdDev
GitReader	4.604 ms	0.0415 ms	0.0347 ms
GitReaderStructured	187.221 ms	17.3900 ms	51.0018 ms
LibGit2sharp	20.431 ms	0.4045 ms	0.4496 ms

kekyo commented on May 20, 2024

Primitive access has been able to reduce latency more than expected 😄

The difficulty is that when we open the repository through the Structures interface, it loads packed indexes, branches, tags, stashes, and many other things...
However, since these are asynchronous operations, they are difficult to perform on demand when accessing Repository.Branches.get(), because a property getter is not awaitable.

At first, I thought about designing a method like Repository.GetBranchesAsync(), but then there is the problem of what to do with Commit.Branches.get(). There is a way to make everything, including these members, awaitable, but that would hurt convenience, so I left it out of the basic design of the Structures interface.

jairbubbles commented on May 20, 2024

Yes, but in my benchmark I'm also getting the references in the "primitive" mode. My guess is that it's the commit resolving which is taking a lot of time. I'm wondering if a lazy evaluation approach like in LibGit2sharp wouldn't be better.
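
To make the lazy-evaluation idea concrete, here is a minimal sketch built on the standard Lazy<T> type: the expensive asynchronous resolve is deferred until the first caller asks for it, and later callers share the same Task. AsyncLazy is a hypothetical helper name, and this is not how GitReader or LibGit2Sharp implement it.

```csharp
using System;
using System.Threading.Tasks;

// Defers an asynchronous load until the first caller asks for the value,
// then hands the same Task to every later caller.
public sealed class AsyncLazy<T>
{
    private readonly Lazy<Task<T>> lazy;

    public AsyncLazy(Func<Task<T>> factory) =>
        lazy = new Lazy<Task<T>>(factory);

    // True once the load has been kicked off (not necessarily completed).
    public bool IsStarted => lazy.IsValueCreated;

    public Task<T> GetAsync() => lazy.Value;
}
```

A branch object could hold an AsyncLazy for its commit, so that opening the repository only pays for reading reference names and the commit lookup happens on first access. One caveat of this simple form is that a faulted Task stays cached.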

kekyo commented on May 20, 2024

For example, even in the Structures interface, we could stop reading Branches, Tags, etc. in bulk when the repository is opened, and instead have the caller invoke an asynchronous method that reads them explicitly. Suppose we could control what information to read with FillFlags like the following:

[Flags]
enum FillFlags
{
  None = 0x00,
  Branches = 0x01,
  RemoteBranches = 0x02,
  Tags = 0x04,
  Stashes = 0x08,
  All = 0x0f,
}

// (Default: FillFlags.All)
using var repository = await Repository.Factory.OpenStructureAsync(FillFlags.None);

// All references are NOT loaded.
Trace.Assert(repository.Branches.Count == 0);
Trace.Assert(repository.RemoteBranches.Count == 0);
Trace.Assert(repository.Tags.Count == 0);
Trace.Assert(repository.Stashes.Count == 0);

// The commit is not fixed up with any additional information.
var commit = await repository.GetCommitAsync("....");

Trace.Assert(commit.Branches.Count == 0);
Trace.Assert(commit.RemoteBranches.Count == 0);
Trace.Assert(commit.Tags.Count == 0);

// After a delayed but explicit read:
await repository.FillImmediateAsync(FillFlags.Branches | FillFlags.Tags);

Trace.Assert(repository.Branches.Count >= 1);
Trace.Assert(repository.Tags.Count >= 1);

// (this may require careful implementation of the process in Commit to make this possible)
Trace.Assert(commit.Branches.Count >= 1);
Trace.Assert(commit.Tags.Count >= 1);

By explicitly calling FillImmediateAsync(), users can control the timing of time-consuming tasks themselves. And by default, everything is read automatically, so the convenience of the current Structures interface is not lost.

jairbubbles commented on May 20, 2024

@kekyo I agree that we need control, but reading references is not really slow when they are packed. We also need to control commit / tag resolving; it would be some kind of prefetch option. Do you want to pay the price of resolving right away when you open the repository, or when you access objects later on?

For instance, if you have 490 packed branches and 10 branches in refs/heads/, the cost would be in commit resolving, as you'd have to resolve 500 commits.

Moreover, I feel like it's mostly useless to resolve all branches or tags; it's unlikely that you need that info for all of them, at least in the most common scenarios.

As for controlling reference retrieval, why not expose the methods directly on the repository?

// A method for each type?
public class Repository
{
  Task<IReadOnlyDictionary<string, Branch>> GetBranchesAsync(ResolvingFlags ...)
  Task<IReadOnlyDictionary<string, Branch>> GetRemoteBranchesAsync(ResolvingFlags ...)
  Task<IReadOnlyCollection<Stash>> GetStashesAsync(ResolvingFlags ...)
  ...
}
// Or more generic?
public class Repository
{
  Task<IReadOnlyDictionary<string, Branch>> GetReferencesAsync(ReferenceTypes...)
  ...
}

public enum ReferenceTypes
{
  Branch,
  RemoteBranch,
  Stash,
  Tag 
}

If we provide enough control through these methods, we wouldn't need structures vs primitives anymore, which would make the code a lot simpler and easier to consume, and it would cover a lot of different use cases.

jairbubbles commented on May 20, 2024

In my use case, we open the repository to get its info as soon as the file watcher detects a change, so we want this to be as fast as possible. I was thinking that it would be interesting to be able to keep the object cache across several repository openings.

// Persistent cache that would be kept in memory
static ObjectsCache cache = new ObjectsCache();

// When we refresh, we would pass the cache
var repository = Factory.OpenRepository(cache);
var branch = await repository.GetHeadBranchAsync();
// If the commit didn't change, we don't pay the price of looking for it on disk; it's already in the cache
var commit = await branch.GetCommitAsync();
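
One way such a cache could be shaped is sketched below. Git objects are content-addressed, so an entry keyed by object hash never goes stale and can safely outlive a single repository opening. The generic ObjectsCache<TObject> type and its GetOrLoadAsync method are hypothetical and only illustrate the idea; they are not GitReader API.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical cache shared across repository openings, keyed by object hash.
public sealed class ObjectsCache<TObject>
{
    private readonly ConcurrentDictionary<string, Lazy<Task<TObject>>> entries =
        new ConcurrentDictionary<string, Lazy<Task<TObject>>>(StringComparer.Ordinal);

    public int Count => entries.Count;

    // Returns the cached object for 'hash', invoking 'load' only on a miss.
    // Wrapping the task in Lazy keeps 'load' from running twice for one key
    // even when two callers race on the same hash.
    public Task<TObject> GetOrLoadAsync(string hash, Func<string, Task<TObject>> load) =>
        entries.GetOrAdd(hash, h => new Lazy<Task<TObject>>(() => load(h))).Value;
}
```

With this shape, a refresh triggered by the file watcher would only hit the disk for hashes it has not seen before. The sketch never evicts entries, which a long-running GUI would probably need to address.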

kekyo commented on May 20, 2024

I see, so you are saying that you would eliminate property accesses such as Repository.Branches and switch to an awaitable asynchronous method like Repository.GetBranchesAsync(). I thought about that too, but I figured that making it a method would be a bad debugging experience.

An immediate example is the test result of Verify(model) in GitReader.Tests: the model is property-accessible, so the test is easy to write. If access to branches and tags were only possible via asynchronous method calls, we would need to write code that retrieves this information through an asynchronous method call every time.

Since this example is test code for GitReader, it is fine to write labor-intensive asynchronous method call code, but it is easy to imagine that the same labor would be required in general use. Since the Structures interface is a high-level interface, I thought it would be desirable to make it easier to use, even at the cost of some performance.

(I think it would be better if there was something in between the Structures interface and the Primitive interface, but I also think that having too many options is a problem...)

jairbubbles commented on May 20, 2024

but I figured that making it a method would be a bad debugging experience.

How so? I mean, you have one method call for each thing you need; it's pretty straightforward.

so the test is easy to write

Well, for Verify, creating a wrapper class is probably the best approach; it gives you control over what you want to test:

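internal class RepositoryWrapper
{
  // Uses the hypothetical API proposed above: open the repository,
  // then fetch each reference set explicitly.
  public async Task<RepositoryWrapper> InitAsync(string gitPath)
  {
    var repository = Factory.OpenRepository(gitPath);
    Branches = await repository.GetBranchesAsync();
    RemoteBranches = await repository.GetRemoteBranchesAsync();
    return this;
  }

  public IReadOnlyDictionary<string, Branch> Branches { get; private set; }
  public IReadOnlyDictionary<string, Branch> RemoteBranches { get; private set; }
  ...
}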

I feel like the high-level interface should:

expose user-friendly classes like Branch, Tag and so on
be as performant as possible 😍

Having a low-level API is also super interesting for more advanced scenarios, but there I would expose things like:

ReadPackedRefs
ReadReferences
ReadGitConfig
....

That API wouldn't expose a Repository class; it could be only static methods that take a .git path.

jairbubbles commented on May 20, 2024

In #3 I'm no longer resolving the commits; it's still slower because of tag resolving (I have many tags in the repo I'm benchmarking). I see a possible optimisation in using the peeled-tag information in packed-refs, which is currently ignored.

Method	Mean	Error	StdDev
GitReader	3.833 ms	0.0397 ms	0.0310 ms
GitReaderStructured	97.224 ms	6.3351 ms	18.6791 ms
LibGit2sharp	19.009 ms	0.2665 ms	0.2492 ms
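
For context on the peeled-tag optimisation mentioned above: in a packed-refs file, a line starting with ^ gives the object that the annotated tag on the previous line ultimately points to, so the target commit can be known without opening the tag object. A minimal parsing sketch follows; PackedRef and PackedRefsParser are hypothetical names, not GitReader types.

```csharp
using System.Collections.Generic;

public sealed class PackedRef
{
    public PackedRef(string name, string hash) { Name = name; Hash = hash; }
    public string Name { get; }
    public string Hash { get; }             // object the ref points at
    public string PeeledHash { get; set; }  // peeled target of an annotated tag, or null
}

public static class PackedRefsParser
{
    public static List<PackedRef> Parse(IEnumerable<string> lines)
    {
        var refs = new List<PackedRef>();
        foreach (var raw in lines)
        {
            var line = raw.Trim();
            if (line.Length == 0 || line[0] == '#') continue; // header or blank line
            if (line[0] == '^')
            {
                // Peeled value belonging to the ref on the previous line.
                if (refs.Count > 0) refs[refs.Count - 1].PeeledHash = line.Substring(1);
                continue;
            }
            var space = line.IndexOf(' ');
            if (space <= 0) continue; // malformed line, skipped in this sketch
            refs.Add(new PackedRef(line.Substring(space + 1), line.Substring(0, space)));
        }
        return refs;
    }
}
```

When PeeledHash is set, the tag's commit lookup can start directly from that hash instead of first reading and parsing the tag object.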

kekyo commented on May 20, 2024

Thank you again for your suggestions. GitReader has reached 1.0.0!

This issue is closed; please open a new issue whenever you want to.
