Comments (16)
I quickly tested removing Commit / Tag resolving for each reference and, as expected, the Primitive and Structured interfaces have similar performance:
Method | Mean | Error | StdDev |
---|---|---|---|
GitReader | 4.562 ms | 0.0652 ms | 0.0544 ms |
GitReaderStructured | 4.143 ms | 0.0820 ms | 0.0976 ms |
LibGit2sharp | 24.052 ms | 0.4013 ms | 0.3557 ms |
from gitreader.
Thanks for your comment!
Interesting information. I wrote GitReader to remove libgit2sharp from RelaxVersioner, but of course it is meant for general-purpose use as well. From your code, I've tried to map the functionality you want GitReader to have:
- string RemoteUrl: GitReader does not have remote fetch ability, so I do not plan to make it possible to retrieve URLs at this time. However, it is not difficult to make URL acquisition possible, and I will consider it if necessary.
- IGitCommit HeadCommit: Already can get it.
- IGitBranch CurrentBranch: Already can get it.
- IReadOnlyList<GitBranch> Branches: Already can get it.
- IGitTag[] Tags: Already can get it.
- IGitRemote[] Remotes: GitReader does not have remote fetch ability, so I do not plan to make it.
- IGitStash[] Stashes: Cannot get it yet, but I will consider it positively.
- DateTimeOffset LastCommitDate: Already can get it from the head commit.
- int BehindDefaultBranch: (?) Requires manual calculation.
- IGitBranch DefaultRemoteBranch: Already can get it from the remote branch list.
- IGitOperation CurrentOperation: (?)
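The "manual calculation" for BehindDefaultBranch could be sketched as a set difference over the commit graph: count the commits reachable from the default branch tip that are not reachable from HEAD. This is only a sketch and does not use GitReader's API; the `CommitGraph` helper and the `parents` dictionary (commit hash to parent hashes) are hypothetical stand-ins for whatever the library returns.

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of GitReader.
public static class CommitGraph
{
    // Collects every commit reachable from `tip` by following parent links.
    public static HashSet<string> Reachable(
        IReadOnlyDictionary<string, string[]> parents, string tip)
    {
        var seen = new HashSet<string>();
        var pending = new Stack<string>();
        pending.Push(tip);
        while (pending.Count > 0)
        {
            var hash = pending.Pop();
            if (!seen.Add(hash)) continue;   // already visited
            if (parents.TryGetValue(hash, out var ps))
                foreach (var p in ps) pending.Push(p);
        }
        return seen;
    }

    // Number of commits on the default branch that HEAD does not contain yet.
    public static int BehindCount(
        IReadOnlyDictionary<string, string[]> parents,
        string head, string defaultBranchTip)
    {
        var fromHead = Reachable(parents, head);
        var behind = 0;
        foreach (var hash in Reachable(parents, defaultBranchTip))
            if (!fromHead.Contains(hash)) behind++;
        return behind;
    }
}
```

For a history a <- b <- c (default branch at c) with a feature commit d whose parent is b, `BehindCount(parents, "d", "c")` counts only c. Walking the full history is the simplest form; a real implementation would probably stop at the merge base.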
from gitreader.
Happy to know you have great plans for your lib. We use libGit2Sharp for other things and totally replacing it with GitReader is probably a long shot (our application is basically a full featured git GUI). We use git.exe for commands (including fetch and status) and libGit2Sharp to introspect the repository, build the graph, compute diff between commits, edit remotes...
from gitreader.
Well, I have often wanted to analyze Git commit graphs (I don't do it full time now, but in the past I have held both CI and progress analysis maintainer roles). One of my motivations is that it would be useful to have such a library as an infrastructure that can be easily handled for such purposes ;)
from gitreader.
@jairbubbles 0.10.0 released.
After the merge, I did some tweaking for consistency. If you have any problems, please post them here or create a separate issue if appropriate.
from gitreader.
@kekyo Cool! I have started some work to benchmark GitReader vs LibGit2Sharp. In a nutshell, what I see is that it's faster when using the "primitive" open (but we're not getting all the info that LibGit2Sharp provides), and a lot slower when using the "structured" open, which is too bad because those data structures are a lot more user friendly.
from gitreader.
Method | Mean | Error | StdDev |
---|---|---|---|
GitReader | 4.604 ms | 0.0415 ms | 0.0347 ms |
GitReaderStructured | 187.221 ms | 17.3900 ms | 51.0018 ms |
LibGit2sharp | 20.431 ms | 0.4045 ms | 0.4496 ms |
from gitreader.
Primitive access has been able to reduce latency more than expected 😄
The difficulty is that when we open the repository through the Structures interface, it loads packed indexes, branches, tags, stashes, and many other things...
However, since these are asynchronous operations, they cannot simply be done on demand when accessing Repository.Branches.get(), because a property getter is non-awaitable.
At first, I thought about designing a method like Repository.GetBranchesAsync(), but then there is the problem of what to do with Commit.Branches.get(). There is a way to make everything including these members awaitable, but that would hurt convenience, so I left it out of the basic design of the Structures interface.
from gitreader.
Yes, but in my benchmark I'm also getting the references in the "primitive" mode. My guess is that it's the commit resolving which takes most of the time. I'm wondering whether a lazy evaluation approach like the one in LibGit2Sharp wouldn't be better.
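One way a lazy approach could look in C# is a property backed by `Lazy<Task<T>>`, so that nothing is loaded at open time and the first access triggers the read. This is a minimal sketch under assumptions: `LazyRepository` and its loader delegate are hypothetical, and it shows neither how LibGit2Sharp implements laziness nor anything in GitReader.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical type for illustration; not part of GitReader or LibGit2Sharp.
public sealed class LazyRepository
{
    private readonly Lazy<Task<IReadOnlyList<string>>> branches;

    public LazyRepository(Func<Task<IReadOnlyList<string>>> loadBranches)
    {
        // The loader runs at most once, on first access, so opening stays cheap.
        this.branches = new Lazy<Task<IReadOnlyList<string>>>(
            loadBranches, LazyThreadSafetyMode.ExecutionAndPublication);
    }

    // Still a property, but awaitable: `await repository.Branches`.
    public Task<IReadOnlyList<string>> Branches => this.branches.Value;
}
```

The trade-off is that callers must await the property, and a faulted load stays cached in the `Lazy` unless the wrapper is rebuilt.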
from gitreader.
For example, even in the Structures interface, we may be able to stop reading Branches, Tags, etc. in bulk when the repository is opened, and instead have the user call an asynchronous method that explicitly reads them. Suppose we could control what information to read with FillFlags, like the following:
[Flags]
enum FillFlags
{
    None = 0x00,
    Branches = 0x01,
    RemoteBranches = 0x02,
    Tags = 0x04,
    Stashes = 0x08,
    All = 0x0f,
}

// (Defaulted: FillFlags.All)
using var repository = await Repository.Factory.OpenStructureAsync(FillFlags.None);

// No references are loaded.
Trace.Assert(repository.Branches.Count == 0);
Trace.Assert(repository.RemoteBranches.Count == 0);
Trace.Assert(repository.Tags.Count == 0);
Trace.Assert(repository.Stashes.Count == 0);

// The commit is not populated with any additional information.
var commit = await repository.GetCommitAsync("....");
Trace.Assert(commit.Branches.Count == 0);
Trace.Assert(commit.RemoteBranches.Count == 0);
Trace.Assert(commit.Tags.Count == 0);

// After the delayed but explicit read:
await repository.FillImmediateAsync(FillFlags.Branches | FillFlags.Tags);
Trace.Assert(repository.Branches.Count >= 1);
Trace.Assert(repository.Tags.Count >= 1);

// (making this work may require a careful implementation in Commit)
Trace.Assert(commit.Branches.Count >= 1);
Trace.Assert(commit.Tags.Count >= 1);
By explicitly calling FillImmediateAsync(), users can control the timing of time-consuming tasks themselves. And by default, everything is read automatically, so the convenience of the current Structures interface is not lost.
from gitreader.
@kekyo I agree that we need control, but reading references is not really slow when they are packed. We also need to control commit / tag resolving; it would be some kind of prefetch option. Do you want to pay the price of resolving right away when you open the repository, or when you access objects later on?
For instance, if you have 490 packed branches and 10 branches in refs/heads/, the cost is in commit resolving, since you'll have to resolve 500 commits.
Moreover, I feel like it's mostly useless to resolve all branches or tags; it's unlikely that you need that info for all of them, at least in the most common scenarios.
As for controlling reference retrieval, why not expose the methods directly on the repository?
// One method per reference type?
public class Repository
{
    Task<IReadOnlyDictionary<string, Branch>> GetBranchesAsync(ResolvingFlags ...)
    Task<IReadOnlyDictionary<string, Branch>> GetRemoteBranchesAsync(ResolvingFlags ...)
    Task<IReadOnlyCollection<Stash>> GetStashesAsync(ResolvingFlags ...)
    ...
}

// Or something more generic?
public class Repository
{
    Task<IReadOnlyDictionary<string, Branch>> GetReferencesAsync(ReferenceTypes...)
    ...
}

public enum ReferenceTypes
{
    Branch,
    RemoteBranch,
    Stash,
    Tag
}
If we provide enough control through these methods, we wouldn't need structures vs primitives anymore, which would make the code a lot simpler and easier to consume, and it would cover a lot of different use cases.
from gitreader.
In my use case, we open the repository to get its info as soon as the file watcher detects a change, so we want this to be as fast as possible. I was thinking that it would be interesting to be able to keep the object cache across several repository openings.
// Persistent cache that would be kept in memory
static ObjectsCache cache = new ObjectsCache();

// When we refresh, we would pass the cache
var repository = Factory.OpenRepository(cache);
var branch = await repository.GetHeadBranchAsync();
// If the commit didn't change, we don't pay the price of looking it up on disk: it's already in the cache
var commit = await branch.GetCommitAsync();
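Such a cache is safe to keep across openings because Git objects are content-addressed: the object behind a given hash never changes, so entries never need invalidation. A minimal sketch of the idea, assuming a generic `ObjectsCache<TObject>` with a caller-supplied disk reader (both hypothetical, not GitReader API):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical cache; `TObject` stands in for a parsed commit, tag, or tree.
public sealed class ObjectsCache<TObject>
{
    private readonly ConcurrentDictionary<string, Lazy<Task<TObject>>> entries =
        new ConcurrentDictionary<string, Lazy<Task<TObject>>>();

    // Returns the cached object for `hash`, reading it from disk only on a miss.
    public Task<TObject> GetOrReadAsync(
        string hash, Func<string, Task<TObject>> readFromDisk) =>
        this.entries.GetOrAdd(
            hash,
            key => new Lazy<Task<TObject>>(() => readFromDisk(key))).Value;
}
```

Wrapping the task in `Lazy` keeps concurrent callers from reading the same object twice. References (branch and tag names) can still move between refreshes, so only the objects, not the name-to-hash mapping, belong in a cache like this.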
from gitreader.
I see, so you are saying that you would eliminate property accesses such as Repository.Branches and switch to an awaitable asynchronous method like Repository.GetBranchesAsync(). I thought about that too, but I figured that making it a method would be a bad debugging experience.
An immediate example is the test result of Verify(model) in GitReader.Tests, which is property-accessible, so the test is easy to write. If access to branches and tags were only possible via asynchronous method calls, we would need to write code that retrieves this information through asynchronous calls every time.
Since this example is test code for GitReader, it is fine to write labor-intensive asynchronous method call code there, but it is easy to imagine that the same labor would be required in general use. Since the Structures interface is a high-level interface, I thought it would be desirable to make it easier to use, even at the cost of some performance.
(I think it would be better if there was something in between the Structures interface and the Primitive interface, but I also think that having too many options is a problem...)
from gitreader.
> but I figured that making it a method would be a bad debugging experience.
How so? I mean, you have one method call for each thing you need; it's pretty straightforward.
> so the test is easy to write
Well, for Verify, creating a wrapper class is probably the best approach; it gives you control over what you want to test:
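internal class RepositoryWrapper
{
    // Factory.OpenRepository and the Get*Async methods are the API proposed above.
    public static async Task<RepositoryWrapper> InitAsync(string gitPath)
    {
        var repository = Factory.OpenRepository(gitPath);
        return new RepositoryWrapper
        {
            Branches = await repository.GetBranchesAsync(),
            RemoteBranches = await repository.GetRemoteBranchesAsync(),
        };
    }

    public IReadOnlyDictionary<string, Branch> Branches { get; private set; }
    public IReadOnlyDictionary<string, Branch> RemoteBranches { get; private set; }
    ...
}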
I feel like the high-level interface should:
- expose user-friendly classes like Branch, Tag and so on
- be as performant as possible 😍
Having a low-level API is also super interesting for more advanced scenarios, but there I would expose things like:
- ReadPackedRefs
- ReadReferences
- ReadGitConfig
- ....
That API wouldn't expose a Repository class; it could be just static methods that take a .git path.
from gitreader.
In #3 I'm no longer resolving the commits; it's still slower because of tag resolving (I have many tags in the repo I'm benchmarking). I see a possible optimisation: using the peeled-tag info in packed-refs, which is currently ignored.
Method | Mean | Error | StdDev |
---|---|---|---|
GitReader | 3.833 ms | 0.0397 ms | 0.0310 ms |
GitReaderStructured | 97.224 ms | 6.3351 ms | 18.6791 ms |
LibGit2sharp | 19.009 ms | 0.2665 ms | 0.2492 ms |
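For context on the peeled-tag idea: in a packed-refs file, a line starting with `^` gives the object that the annotated tag on the preceding line peels to, so the target commit can be known without opening the tag object. A minimal parsing sketch, assuming the file has already been read into lines; `PackedRefs.Parse` is a hypothetical helper, not GitReader code:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical parser for illustration; not part of GitReader.
public static class PackedRefs
{
    // Maps each ref name to (object hash, peeled hash or "" when there is none).
    public static Dictionary<string, (string Hash, string Peeled)> Parse(
        IEnumerable<string> lines)
    {
        var refs = new Dictionary<string, (string Hash, string Peeled)>();
        var lastName = "";
        foreach (var raw in lines)
        {
            var line = raw.Trim();
            if (line.Length == 0 || line[0] == '#') continue;   // header or blank
            if (line[0] == '^')
            {
                // Peeled line: belongs to the ref on the line just above.
                if (lastName.Length > 0)
                    refs[lastName] = (refs[lastName].Hash, line.Substring(1));
                continue;
            }
            var space = line.IndexOf(' ');
            if (space < 0) continue;                            // malformed line
            lastName = line.Substring(space + 1);
            refs[lastName] = (line.Substring(0, space), "");
        }
        return refs;
    }
}
```

Tags that live as loose files under refs/tags/ have no peeled line, so they would still need to be resolved through the tag object.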
from gitreader.
Thank you again for your suggestions. GitReader has reached 1.0.0!
I am closing this issue; please open a new one whenever you want to.
from gitreader.