
aspiers / git-deps


git commit dependency analysis tool

License: GNU General Public License v2.0


git-deps's Introduction


git-deps

git-deps is a tool for performing automatic analysis of dependencies between commits in a git repository. Here's a screencast demonstration:

YouTube screencast

I have blogged about git-deps and related tools, and have also spoken publicly about the tool several times.


Background theory

It is fairly clear that two git commits within a single repo can be considered "independent" from each other in a certain sense, if they do not change the same files, or if they do not change overlapping parts of the same file(s).

In contrast, when a commit changes a line, it is "dependent" on not only the commit which last changed that line, but also any commits which were responsible for providing the surrounding lines of context, because without those previous versions of the line and its context, the commit's diff might not cleanly apply (depending on how it's being applied, of course). So all dependencies of a commit can be programmatically inferred by running git-blame on the lines the commit changes, plus however many lines of context make sense for the use case of this particular dependency analysis.

Therefore the dependency calculation is affected by a "fuzz" factor parameter (cf. patch(1)), i.e. the number of lines of context which are considered necessary for the commit's diff to cleanly apply.

As with many dependency relationships, these dependencies form edges in a DAG (directed acyclic graph) whose nodes correspond to commits. Note that a node can only depend on a subset of its ancestors.
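
The blame-with-context idea can be sketched on a toy model that needs no real repository. This is only an illustration under stated assumptions: `parent_blame` is a hypothetical mapping from line number to the commit which last changed that line in the parent revision (the information git-blame would report), and `blame_range` and `infer_dependencies` are made-up names, not functions from git-deps itself.

```python
def blame_range(hunk_start, hunk_len, context, file_len):
    """Return the 1-based inclusive (first, last) range of parent lines to blame.

    hunk_start and hunk_len describe the lines the commit changes in the
    parent file; context is the "fuzz" factor, i.e. extra lines on each side.
    """
    first = max(1, hunk_start - context)
    last = min(file_len, hunk_start + hunk_len - 1 + context)
    return first, last


def infer_dependencies(parent_blame, hunk_start, hunk_len, context=1):
    """Collect the commits which last touched the hunk and its context lines."""
    first, last = blame_range(hunk_start, hunk_len, context, len(parent_blame))
    return {parent_blame[n] for n in range(first, last + 1)}


# Toy blame data (assumed): line number -> commit which last changed that line.
parent_blame = {1: "aaa", 2: "aaa", 3: "bbb", 4: "ccc", 5: "aaa"}

print(sorted(infer_dependencies(parent_blame, hunk_start=3, hunk_len=1, context=1)))
print(sorted(infer_dependencies(parent_blame, hunk_start=3, hunk_len=1, context=0)))
```

With one line of context, a change to line 3 depends on all three commits; with no context it depends only on `bbb`, which shows how the fuzz factor widens or narrows the inferred dependency set.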

Caveat

It is important to be aware that any dependency graph inferred by git-deps may be semantically incomplete; for example it would not auto-detect dependencies between a commit A which changes code and another commit B which changes documentation or tests to reflect the code changes in commit A. Therefore git-deps should not be used with blind faith. For more details, see the section on Textual vs. semantic (in)dependence below.

Motivation

Sometimes it is useful to understand the structure of parts of this dependency graph, as it will affect the success or failure of operations such as merge, rebase, and cherry-pick. Please see the USE-CASES.md file for more details.

Installation

Please see the INSTALL.md file.

Usage

Please see the USAGE.md file.

Textual vs. semantic (in)dependence

Astute readers will note that textual independence as detected by git-deps is not the same as semantic / logical independence. Textual independence means that the changes can be applied in any order without incurring conflicts, but this is not a reliable indicator of logical independence.

For example a change to a function and corresponding changes to the tests and/or documentation for that function would typically exist in different files. So if those changes were in separate commits within a branch, running git-deps on the commits would not detect any dependency between them even though they are logically related, because changes in different files (or even in different areas of the same files) are textually independent.

So in this case, git-deps would not behave exactly how we might want. And for as long as AI is an unsolved problem, it is very unlikely that it will ever develop totally reliable behaviour. So does that mean git-deps is useless? Absolutely not!

Firstly, when best practices for commit structuring are adhered to, changes which are strongly logically related should be placed within the same commit anyway. So in the example above, a change to a function and corresponding changes to the tests and/or documentation for that function should all be within a single commit. (Although this is not the only valid approach; for a more advanced meta-history grouping mechanism, see git-dendrify.)

Secondly, whilst textual independence does not imply logical independence, the converse is expected to be more commonly true: logical independence often implies textual independence (or stated another way, textual dependence often implies logical dependence). So while it might not be too uncommon for git-deps to fail to detect the dependency between logically-related changes, it should be rarer that it incorrectly infers a dependency between logically unrelated changes. In other words, its false negatives are generally expected to be more common than its false positives. As a result it is likely to be more useful in determining a lower bound on dependencies than an upper bound. Having said that, more research is needed on this.

Thirdly, it is often unhelpful to allow the quest for the perfect to become the enemy of the good - a tool does not have to be perfect to be useful; it only has to be better than performing the same task without the tool.

Further discussion on some of these points can be found in an old thread from the git mailing list.

Ultimately though, "the proof is in the pudding", so try it out and see!

Development / support / feedback

Please see the CONTRIBUTING.md file.

History

Please see the HISTORY.md file.

Credits

Special thanks to SUSE for partially sponsoring the development of this software. Thanks also to everyone who has contributed code, bug reports, and other feedback.

License

Released under GPL version 2 in order to be consistent with git's license, but I'm open to the idea of dual-licensing if there's a convincing reason.

git-deps's People

Contributors

aspiers, bmwiedemann, dependabot[bot], emantor, hagai-helman, jeremysalwen, mcepl, mstefani, toabctl, valodim, wetneb


git-deps's Issues

graph: add hyperlinks with git:// URLs

The user can configure their desktop to handle the git:// URLs in whatever way they want, e.g. run a wrapper script which launches gitk on the local repository. This is sort of a poor man's solution to #23, but it could be reasonably useful nonetheless.

allow integration with other git web frontends

This tool would provide the most value when integrated with other web frontends to git. Since the webserving section of the code is so lightweight, this should not be hard to do.

One route might be to keep the existing webserver running on a different port to the other frontend and then embed the graph visualization via <iframe>. However that probably wouldn't allow the tight integration which would deliver most value, e.g. being able to click on a commit in the graph to visit that commit in the frontend. It would also suffer from the restriction of one graph server (and hence TCP port) per repository.

I'd love to hear ideas from web frontend maintainers on this.

graph: some links break on subsequent XHRs

After the first set of nodes/links are rendered, any subsequent query causes older links to get detached so they stay where they are. I've been poring over this for ages but no luck yet.

allow running from any directory

Currently git deps must be run from the root (top directory) of the git repo. It would be better if it could be run from any subdirectory, or even from outside the repo being examined. See also #22.
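
One possible way to lift this restriction is to locate the top of the working tree before doing anything else. The sketch below is an assumption about how that could look, not code from git-deps: `find_repo_root` is a hypothetical helper which walks up the directory tree looking for a `.git` entry. Running `git rev-parse --show-toplevel` would be an alternative which asks git itself.

```python
from pathlib import Path


def find_repo_root(start="."):
    """Walk upwards from start until a directory containing .git is found.

    Returns the top directory as a Path, or None if start is not inside a
    git working tree. Only existence is checked, because .git is a
    directory in a normal clone but a file in worktrees and submodules.
    """
    current = Path(start).resolve()
    for candidate in [current, *current.parents]:
        if (candidate / ".git").exists():
            return candidate
    return None


if __name__ == "__main__":
    print(find_repo_root())
```

Note that this simple walk does not handle bare repositories, which have no `.git` entry.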

license mismatch

git-deps script is GPL-3+, whereas COPYING is GPL-2+. The 3+ was a mistake since I wanted this to be consistent with the license of git itself.

server not working: 404 request to cola.v3.min.js

Setting up a git-deps server on a clean ubuntu 15.04 system (in a docker image) results in 404 requests trying to get /node_modules/webcola/WebCola/cola.v3.min.js, although /node_modules/webcola/WebCola/cola.min.js exists.

I worked around this issue in https://github.com/paulwellnerbou/git-deps-docker by creating a symlink when building the docker image:

RUN ln -s /git-deps/html/node_modules/webcola/WebCola/cola.min.js /git-deps/html/node_modules/webcola/WebCola/cola.v3.min.js

Maybe it's the node/npm version using different naming than expected?

graph: support recursion

Every query submitted, whether via the form or by clicking on a plus icon to expand (see #15) should have the option of recursing to a given level rather than just discovering immediate dependencies.
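
Recursing to a given level amounts to a depth-limited walk of the dependency graph. A minimal sketch, assuming the direct dependencies are already available as a plain dictionary (`direct_deps` and `dependencies_up_to` are illustrative names, not part of git-deps):

```python
from collections import deque


def dependencies_up_to(direct_deps, start, max_depth):
    """Breadth-first walk of the dependency graph, at most max_depth levels deep.

    direct_deps maps a commit to the commits it directly depends on.
    Returns a dict of commit -> level at which it was first discovered.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        commit = queue.popleft()
        depth = seen[commit]
        if depth == max_depth:
            continue  # do not expand beyond the requested level
        for dep in direct_deps.get(commit, ()):
            if dep not in seen:
                seen[dep] = depth + 1
                queue.append(dep)
    del seen[start]  # the starting commit is not its own dependency
    return seen


# Toy graph (assumed): d depends on c and b, c on b, b on a.
direct_deps = {"d": ["c", "b"], "c": ["b"], "b": ["a"]}
print(dependencies_up_to(direct_deps, "d", 1))
print(dependencies_up_to(direct_deps, "d", 2))
```

A level of 1 reproduces the current behaviour of discovering immediate dependencies only; larger values pull in dependencies of dependencies.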

fatal: -L invalid line number: 0

While trying to backport a commit for oslo.messaging (commit 49f9429911b13dc43bc93ea6fd4f5f38dfddb8ee to stable branch 1.4.1), I got the following error:

[snipped]
293a34c1560425bf963c9079a3f38e58fbef9423 8a8685b62ff3e17e3f3ff4042ac828ae88b0151c
95119398e29eae0fcb613e404697ee46a2a6f23a 48a1cfae8ab1cf873ecd2f4146d87c9f001c7e0d
fatal: -L invalid line number: 0
Traceback (most recent call last):
  File "/home/tom/bin/git-deps", line 787, in <module>
    main()
  File "/home/tom/bin/git-deps", line 781, in main
    cli(options, args)
  File "/home/tom/bin/git-deps", line 668, in cli
    detector.find_dependencies(dependent_rev)
  File "/home/tom/bin/git-deps", line 402, in find_dependencies
    self.find_dependencies_with_parent(dependent, parent)
  File "/home/tom/bin/git-deps", line 426, in find_dependencies_with_parent
    self.blame_hunk(dependent, parent, path, hunk)
  File "/home/tom/bin/git-deps", line 452, in blame_hunk
    blame = subprocess.check_output(cmd)
  File "/usr/lib64/python2.7/subprocess.py", line 573, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['git', 'blame', '--porcelain', '-L', '0,+0', '7ca7fed9ea7b85089158c2e026e6c3c0864552b3', '--', 'openstack/common/messaging/_executors/__init__.py']' returned non-zero exit status 128

Reproducible with:

git checkout -b 141 1.4.1
git deps -r 49f9429911b13dc43bc93ea6fd4f5f38dfddb8ee
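
The traceback shows git blame being invoked with `-L 0,+0`. One plausible reading, which is an assumption and not confirmed by the report, is that the hunk has no lines on the parent side, so there is nothing to blame. A possible guard is sketched below; `build_blame_cmd` is a hypothetical helper which mirrors the command shape seen in the traceback, and it is not necessarily the fix adopted in git-deps.

```python
def build_blame_cmd(rev, path, start, length):
    """Build the git blame command for one hunk, or None if nothing to blame.

    A hunk with no lines on the parent side yields start=0 and length=0,
    and git blame rejects "-L 0,+0" as seen in the traceback.
    """
    if start < 1 or length < 1:
        return None  # the caller should skip this hunk
    return ["git", "blame", "--porcelain",
            "-L", "%d,+%d" % (start, length),
            rev, "--", path]


print(build_blame_cmd("7ca7fed9", "openstack/common/messaging/_executors/__init__.py", 0, 0))
print(build_blame_cmd("7ca7fed9", "setup.py", 3, 2))
```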

graph: add spinner

When a commit has a large number of dependencies, it can take a while for the server to respond with the JSON. There should be a spinner displayed to show when work is in progress.

git-deps shows unrelated commits when dependencies of merge commits are checked

I'm using the git-deps tool for linux kernel backporting work.

When git-deps is run to find dependencies in the Linux kernel, it may encounter merge commits. When it checks the dependencies of a merge commit, the output is huge, and git-deps runs for so long that it has to be stopped manually.

Maybe '--no-merges' needs to be used to avoid merge commits?

KeyError in blame_hunk

I ran into a bug (on openSUSE-13.2 and Factory) using:

git clone git://github.com/openstack/python-keystoneclient.git
cd python-keystoneclient/
git-deps 3d6d749e6f0fef682a88758e1a2f6c9e8e7bd23c

this produced

7920899af119d1697c333d202ca3272f167c19b0
[...some more hashes...]
Traceback (most recent call last):
  File "/home/bernhard/bin/git-deps", line 790, in <module>
    main()
  File "/home/bernhard/bin/git-deps", line 784, in main
    cli(options, args)
  File "/home/bernhard/bin/git-deps", line 671, in cli
    detector.find_dependencies(dependent_rev)
  File "/home/bernhard/bin/git-deps", line 402, in find_dependencies
    self.find_dependencies_with_parent(dependent, parent)
  File "/home/bernhard/bin/git-deps", line 426, in find_dependencies_with_parent
    self.blame_hunk(dependent, parent, path, hunk)
  File "/home/bernhard/bin/git-deps", line 529, in blame_hunk
    rev = line_to_culprit[line_num]
KeyError: 2

graph: highlight commits/dependencies on mouseover

When hovering over a commit, all dependent and depending commits should be highlighted, and also the corresponding arrows between them. Similarly when hovering over a line, all dependent and depending commits should be highlighted.

Highlighting could be achieved simply by increasing the border / line thickness.

graph: allow collapsing nodes

It should be possible to reverse the effect of clicking the plus icon. For each dependency of the node to be collapsed, if no other node in the graph has a dependency on it, it should be removed from the graph.
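
The rule described above can be sketched on a plain adjacency mapping. This is a literal, single-level reading of the rule written in Python for illustration only (the real graph lives in the browser); `graph` and `collapse` are assumed names.

```python
def collapse(graph, node):
    """Remove node's dependencies which nothing else in the graph depends on.

    graph maps each commit to the set of commits it depends on, and is
    modified in place. Returns the set of commits which were removed.
    """
    removed = set()
    for dep in graph.get(node, set()):
        still_needed = any(
            dep in deps for other, deps in graph.items() if other != node
        )
        if not still_needed:
            removed.add(dep)
    for dep in removed:
        graph.pop(dep, None)
    # Keep edges to dependencies which other nodes still rely on.
    graph[node] = graph.get(node, set()) - removed
    return removed


# Toy graph (assumed): d depends on b and c, e depends on b.
graph = {"d": {"b", "c"}, "e": {"b"}, "b": set(), "c": set()}
print(collapse(graph, "d"))
print(graph)
```

Here only `c` disappears, because `e` still depends on `b`. The sketch does not go on to remove dependencies of removed nodes which become orphaned as a result; whether it should is left open by the description.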

graph: add expand icons

Each commit node on the graph should have a plus icon which can be clicked to find dependencies for that commit.

need proper installation mechanism

Currently it's expected that git-deps will be run from the top of the git repo (see #27 and #22), and this only works by following the installation instructions which say to do:

ln -s /path/to/git-deps/repo/git-deps ~/bin

However that's a really lame installation process, which also assumes that the files under html/ will be retrieved relative to the git-deps script. Instead it should be parametrized to facilitate clean packaging. @bmwiedemann has already written a Makefile in https://build.opensuse.org/package/show/home:bmwiedemann/git-deps which supports the usual $DESTDIR and $PREFIX variables, although maybe we should go the Pythonic route and write a setup.py instead - I'm not sure.

interactive graph visualization

Currently the output is text only, but I think it would be more useful to visualise the dependencies as an interactive graph where you could zoom in/out, pan around, hover over each node to see commit meta-data, click on leaf nodes to request further recursion, and so on. Nodes could be coloured according to commit author, and sized according to the diffstats.

Clearly this should be cross-platform and based on some modern rendering technology, so HTML/CSS/Javascript seems the obvious choice. Dependency inference is too expensive to generate the full graph as a static web page, so I plan to extend the tool so it can act as a lightweight web server, e.g.

$ git deps --web --port 8080

and then you could simply point your browser at http://localhost:8080 to interact with the graph. It might look a little like this, but with interactive zoom/pan/hover/click functionality like this.

Since a lot of the hard work in the above examples is already done by cola.js, most likely I will use that in conjunction with d3.js.

Since the tool is already written in Python, I am considering using a very lightweight web framework such as Flask. (I suspect Django would be overkill for this application which is essentially stateless.)

Another approach might be to integrate it into an existing git web frontend written in Python. However I trawled through

but couldn't find any Python-based frontend which looked like it was in active development. Perhaps the most promising I could find was:

Although it would be nicer if the visualization could be harnessed by any web front-end, and this is probably doable quite easily via <iframe> or perhaps something a bit more elegant.

look into patch theory

There is quite a lot of prior art in the area of patch theory (in particular from the darcs community), e.g.

We should look into these and see what can be applied to git-deps.

graph: fix logger bug

In debug mode, each time the Flask app reloads itself, a new logger handler(?) gets created. So over time, every line logged via logger gets duplicated an increasing number of times.
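
If the cause really is a handler being attached on every reload (the report itself is unsure), a common guard is to attach the handler only when the logger has none yet. A minimal sketch using the standard logging module; `get_logger` and the logger name are assumptions for illustration:

```python
import logging


def get_logger(name="git-deps"):
    """Return a logger with exactly one stream handler, however often called."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # only attach a handler the first time round
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return logger


first = get_logger()
second = get_logger()  # simulates the setup code running again
second.debug("this line is emitted once")
print(len(second.handlers))
```

Because `logging.getLogger` returns the same object for the same name, repeated setup calls see the existing handler and leave it alone.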

CLI: exclude-commits functionality broken

The -e/--exclude-commits functionality depends on branch_contains().

branch_contains unconditionally truncates sha1 to an 8 character prefix. This is potentially problematic by itself.

However, the more problematic issue in the function comes in the following test: "result = out == sha1". Here, 'out' comes from git-merge-base, which will give a complete sha1. This could be fixed by testing whether 'sha1' is a prefix of 'out', though given the potential issues inherent in chopping 'sha1' at 8 characters, it would probably be better to remove the "[:8]" from the start of the function.
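
The prefix comparison suggested above could look like the sketch below. `merge_base_matches` is a hypothetical helper, not the actual `branch_contains()` code, and the hash is simply one which appears elsewhere on this page.

```python
def merge_base_matches(out, sha1):
    """True if the merge base printed by git is the commit named by sha1.

    out is the complete hash from git merge-base (possibly with a trailing
    newline); sha1 may be a complete hash or an abbreviated prefix.
    """
    out = out.strip()
    return bool(sha1) and out.startswith(sha1)


full = "49f9429911b13dc43bc93ea6fd4f5f38dfddb8ee"
short = full[:8]

print(full == short)                            # the broken equality test
print(merge_base_matches(full + "\n", short))   # the prefix test
```

Comparing complete hashes on both sides, as the issue recommends, avoids any risk of two commits sharing the same 8 character prefix.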

idea: blame-deps (aka 3D-blame)

The idea is to do something like

git deps some-file.txt

This will open the git-deps browser with the contents of the file loaded into a pane in a typical blame view:

e5392d87  1) This is some file.
e5392d87  2) =====================================
e5392d87  3)
69d951f2  4) Look at all the nice contents.
e5392d87  5)

The magical thing is that each of the revisions are clickable - doing so will open the git-deps graph for the clicked commit.

git-deps -e or --exclude-commits still shows cherry-picked commits

git-deps -r -e test_branch dbde0abe, shows

dbde0abe 7a4c5de2
dbde0abe fa77dcfa
fa77dcfa fd034a84
fa77dcfa 813e5727

Even though 'test' branch has the commit 'fa77dcfa' cherry-picked from master,

commit d14423fe
Author: Test
Date: Mon Apr 11 19:11:04 2011 -0400

Test
(cherry picked from commit fa77dcfa)

git-deps still shows it in the list of dependencies.
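
One conceivable way to recognise such commits is to read the "(cherry picked from commit ...)" line which `git cherry-pick -x` appends to the commit message. The sketch below assumes the messages of the excluded branch are already available as strings; `cherry_pick_sources` and `is_excluded` are hypothetical names, and this is not how git-deps currently decides what to exclude.

```python
import re

# Matches the line written by "git cherry-pick -x".
CHERRY_PICK_RE = re.compile(r"\(cherry picked from commit ([0-9a-f]{4,40})\)")


def cherry_pick_sources(message):
    """Return the hashes named in "(cherry picked from commit ...)" lines."""
    return CHERRY_PICK_RE.findall(message)


def is_excluded(sha1, branch_messages):
    """True if any commit message on the branch records sha1 as its source."""
    for message in branch_messages:
        for source in cherry_pick_sources(message):
            # Either side may be abbreviated, so compare by prefix.
            if source.startswith(sha1) or sha1.startswith(source):
                return True
    return False


message = "Test\n(cherry picked from commit fa77dcfa)\n"
print(cherry_pick_sources(message))
print(is_excluded("fa77dcfa", [message]))
```

This only works for cherry-picks made with `-x`; commits picked without it carry no such line.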

graph: use dagre for initial positioning not constraints

Specifying constraints for row grouping and ordering seems a bit too inflexible to give webcola room to find a really good layout, so maybe it would be better to use dagre just to calculate the initial layout and then allow webcola to do the rest. This is probably needed anyway in order to take advantage of dagre's suggested ordering within each row.

package for openSUSE

Would be nice to provide this as an rpm with dependencies on Flask, pygit2, and the various node modules (uh-oh).

write tests

Embarrassing to admit I wrote this as a quick hack without following TDD methodology. In my defence, I was doing it in coffee breaks at an openSUSE conference ;-p

graph: stream commits and dependencies and render incrementally

If there are a lot of new nodes and dependencies, the web page will appear to hang whilst they are being calculated. Adding a spinner (see #28) will go some way towards helping with this, but it would be better if the nodes and dependencies are streamed incrementally to the browser and rendered as soon as they are received. oboe.js looks perfect for this!

CLI: allow arbitrary `git log` output format

Would be useful to be able to support arbitrary --format strings, so that for example you could obtain a list of commit authors in order to satisfy the use case "who should I give a heads up about my feature branch?"

don't fork git merge-base

It should be possible to replace invocations of git merge-base with calls to the pygit2 API. This should speed things up a bit.

disconnected subgraphs get tangled

The dagre layout doesn't work well with disconnected subgraphs - the subgraphs often get tangled producing bad results. It might be better to keep track of the subgraphs and then build constraints accordingly.
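
Keeping track of the subgraphs amounts to finding the connected components of the graph when edge direction is ignored. A minimal sketch of that step, written in Python for illustration (the layout code itself runs in the browser) with `connected_components` as an assumed name:

```python
def connected_components(nodes, edges):
    """Group nodes into disconnected subgraphs, ignoring edge direction."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    components, seen = [], set()
    for start in neighbours:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first walk of one subgraph
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components


print(connected_components(["a", "b", "c", "d"], [("a", "b"), ("c", "d")]))
```

Each resulting set could then be given its own group of layout constraints so that separate subgraphs are kept apart.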
