vaibhavsagar / duffer

A git-compatible content tracker in Haskell.

Home Page: http://vaibhavsagar.com/duffer

Haskell 70.16% Shell 0.97% Jupyter Notebook 27.42% Nix 1.45%

git haskell

duffer's People

porterjamesj amar47shah mcne65 jmpak

duffer's Issues

Merging/Diffing

This library doesn't currently support any merging/diffing operations. There are existing libraries that provide text diffs; maybe we could use those?

There should be a function that takes one of the following: a SHA1, a symref (HEAD, ORIG_HEAD, etc.), or a branch or lightweight tag name (master, test, etc.), and returns the corresponding object if the input unambiguously identifies one. This is what `git cat-file -p` does.
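A minimal sketch of the name-resolution half of such a function, assuming a hypothetical `Refs` record that already holds the repository's symrefs, branches and lightweight tags (`Refs`, `Sha` and `resolveName` are illustrative names, not duffer's API). It resolves the name to a SHA1 and reports ambiguity as an error, following this issue's "unambiguously" requirement rather than git's own precedence rules; reading the object for that SHA1 would be a separate step. Only full 40-character SHA1s are accepted, not abbreviations.

```haskell
import Data.Char (isHexDigit)
import Data.List (stripPrefix)
import qualified Data.Map.Strict as Map

-- Hypothetical: a full 40-character hexadecimal SHA1.
type Sha = String

-- Hypothetical snapshot of a repository's references.
data Refs = Refs
  { symrefs  :: Map.Map String String -- e.g. "HEAD" -> "refs/heads/master"
  , branches :: Map.Map String Sha    -- names under refs/heads/
  , tags     :: Map.Map String Sha    -- lightweight tags under refs/tags/
  }

-- Resolve a SHA1, symref, branch or tag name to a SHA1,
-- failing on unknown or ambiguous names.
resolveName :: Refs -> String -> Either String Sha
resolveName refs name
  | length name == 40 && all isHexDigit name = Right name
  | Just target <- Map.lookup name (symrefs refs) = resolvePath target
  | otherwise =
      case (Map.lookup name (branches refs), Map.lookup name (tags refs)) of
        (Just sha, Nothing) -> Right sha
        (Nothing, Just sha) -> Right sha
        (Just _, Just _)    -> Left ("ambiguous name: " ++ name)
        (Nothing, Nothing)  -> Left ("unknown name: " ++ name)
  where
    -- Follow a symref target such as "refs/heads/master".
    resolvePath path
      | Just b <- stripPrefix "refs/heads/" path = lookupIn b (branches refs)
      | Just t <- stripPrefix "refs/tags/" path = lookupIn t (tags refs)
      | otherwise = Left ("unsupported symref target: " ++ path)
    lookupIn key table =
      maybe (Left ("dangling symref: " ++ name)) Right (Map.lookup key table)
```

With a branch and a tag both named `test`, `resolveName refs "test"` returns a `Left`, whereas git itself would warn and prefer the tag.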

More unit tests.

We have decent integration tests that check that the system as a whole does the right thing, but very few unit tests that check that individual units work correctly. Unit tests would be useful if/when things break, since they would let us isolate the change that is causing the problem, something the current tests are very bad at.

Thin Packfiles

The packfile resolution strategies do not handle thin packfiles (packfiles whose deltas refer to base objects that are not in the pack itself). Maybe we should extend them to do so?

Refactor to use Maybe/Either

I have calls to `error` everywhere; these should be refactored to use Maybe/Either as appropriate.
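As an illustration, here is a hypothetical parser for a loose object header such as `blob 12` that reports each failure through `Either` instead of calling `error`. `ObjectType`, `HeaderError`, `parseHeader` and `parseHeaderOrDie` are invented for this sketch rather than taken from duffer. `Maybe` would suit functions whose only failure is absence, such as a lookup.

```haskell
import Text.Read (readMaybe)

data ObjectType = Blob | Tree | Commit | Tag
  deriving (Eq, Show)

-- Each way the header can be wrong gets its own constructor.
data HeaderError
  = Malformed String   -- not of the form "<type> <size>"
  | UnknownType String -- type is not blob/tree/commit/tag
  | BadSize String     -- size is not a non-negative integer
  deriving (Eq, Show)

-- Parse "<type> <size>" without ever calling `error`.
parseHeader :: String -> Either HeaderError (ObjectType, Int)
parseHeader header = case words header of
  [ty, sz] -> (,) <$> parseType ty <*> parseSize sz
  _        -> Left (Malformed header)
  where
    parseType "blob"   = Right Blob
    parseType "tree"   = Right Tree
    parseType "commit" = Right Commit
    parseType "tag"    = Right Tag
    parseType other    = Left (UnknownType other)

    parseSize sz = case readMaybe sz of
      Just n | n >= 0 -> Right n
      _               -> Left (BadSize sz)

-- Callers that genuinely cannot continue can still opt in to crashing,
-- but at the edge of the program rather than deep inside the library.
parseHeaderOrDie :: String -> (ObjectType, Int)
parseHeaderOrDie = either (error . show) id . parseHeader
```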

Unpack single object from packfile.

When generating a packfile, git's heuristics account for the fact that older objects are accessed infrequently: these objects are more likely to be stored later in the packfile as deltas against newer objects. Since most reads are for recent objects, we should provide a way of getting just a single object out of a packfile instead of unpacking all of them.
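A minimal sketch of resolving one object, assuming a hypothetical `readEntry` function that decodes the single pack entry at a given offset (for example, an offset looked up in the pack index). Only the requested entry and its chain of delta bases are touched. `PackEntry`, `objectAt` and `readEntry` are illustrative names, not duffer's API; `applyDelta` follows git's delta format (two size varints, then copy and insert instructions) but the sketch leaves out reference deltas and zlib decompression.

```haskell
import Control.Monad (guard)
import Data.Bits (shiftL, testBit, (.&.), (.|.))
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- Hypothetical, already-parsed pack entry. Real OFS_DELTA entries store a
-- relative offset; an absolute base offset is used here for brevity.
data PackEntry
  = Full B.ByteString         -- undeltified object content
  | OfsDelta Int B.ByteString -- base entry offset, delta instructions

-- Resolve only the entry at `off` and its chain of delta bases.
-- `readEntry` stands for "decode the single pack entry at this offset".
objectAt :: (Int -> Maybe PackEntry) -> Int -> Maybe B.ByteString
objectAt readEntry off = do
  entry <- readEntry off
  case entry of
    Full content -> Just content
    OfsDelta baseOff delta -> do
      base <- objectAt readEntry baseOff
      applyDelta base delta

-- Little-endian base-128 varint used for the two sizes in a delta header.
readVarint :: B.ByteString -> Maybe (Int, B.ByteString)
readVarint = go 0 0
  where
    go shift acc bs = do
      (b, rest) <- B.uncons bs
      let acc' = acc .|. (fromIntegral (b .&. 0x7f) `shiftL` shift)
      if testBit b 7 then go (shift + 7) acc' rest else Just (acc', rest)

-- Apply a git delta (source size, target size, then copy/insert
-- instructions) to its base object.
applyDelta :: B.ByteString -> B.ByteString -> Maybe B.ByteString
applyDelta base delta = do
  (srcLen, d1) <- readVarint delta
  (tgtLen, d2) <- readVarint d1
  guard (srcLen == B.length base)
  chunks <- run d2 []
  let result = B.concat (reverse chunks)
  guard (B.length result == tgtLen)
  Just result
  where
    run d acc = case B.uncons d of
      Nothing -> Just acc
      Just (op, rest)
        | testBit op 7 -> do
            -- Copy: op bits 0-3 flag offset bytes, bits 4-6 flag size bytes.
            (off, r1) <- gather op [0 .. 3] rest
            (sz, r2) <- gather op [4 .. 6] r1
            let size = if sz == 0 then 0x10000 else sz
            guard (off + size <= B.length base)
            run r2 (B.take size (B.drop off base) : acc)
        | op == 0 -> Nothing -- reserved opcode
        | otherwise -> do
            -- Insert: the next `op` bytes are copied literally.
            let n = fromIntegral op
            guard (B.length rest >= n)
            run (B.drop n rest) (B.take n rest : acc)

    -- Assemble a little-endian number from the bytes flagged in `op`.
    gather op bitIxs = go (zip [0 ..] bitIxs) 0
      where
        go [] acc r = Just (acc, r)
        go ((j, bitIx) : more) acc r
          | testBit op bitIx = do
              (b, r') <- B.uncons r
              go more (acc .|. (fromIntegral b `shiftL` (8 * j))) r'
          | otherwise = go more acc r

-- Usage data: base "hello world"; copy its first 5 bytes, then insert ", git".
exampleDelta :: B.ByteString
exampleDelta = B.pack [11, 10, 0x90, 5, 5] <> BC.pack ", git"
```

Applying `exampleDelta` to `"hello world"` yields `"hello, git"`. In a real packfile the relative OFS_DELTA offset would first be converted to an absolute one before calling `readEntry`.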

Correctly decode and encode git indexes v2, v3 and v4

A rudimentary implementation of v2 is available here.

Documentation

This thing has zero documentation. Why is that?

HTTP API

We should be able to query repositories over the network. This could be fun to implement. Maybe we should use servant?

Getting Started Guide

It would be useful to explain how to get started with this project, i.e. the basic build/test steps. Information on stack would help interested non-Haskellers.

Example App

Nobody knows how to use this library. I think we should have a sample app, such as a to-do list app that keeps track of all previous versions.

Pluggable backends

My vision for this library is to support pluggable backends like libgit2 does. I think this should be implemented as a Repository typeclass with the minimum methods to support using an arbitrary backend as a database (e.g. writeObject and readObject, perhaps a hasObject method?).
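A minimal sketch of such a typeclass with an in-memory backend. The method names come from the issue; their signatures, the `Ref` synonym, `MemoryRepo` and `copyObject` are assumptions made for illustration. In this sketch `writeObject` takes the object's SHA1 from the caller so that a backend never has to hash anything.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Data.Maybe (isJust)
import qualified Data.ByteString as B
import qualified Data.Map.Strict as Map

-- Hypothetical: the hex SHA1 naming an object, computed by the caller.
type Ref = String

class Repository r where
  readObject  :: r -> Ref -> IO (Maybe B.ByteString)
  writeObject :: r -> Ref -> B.ByteString -> IO ()

  -- Default in terms of readObject; a backend can override it with
  -- something cheaper, such as a file-existence check.
  hasObject :: r -> Ref -> IO Bool
  hasObject repo ref = isJust <$> readObject repo ref

-- An in-memory backend, e.g. for tests.
newtype MemoryRepo = MemoryRepo (IORef (Map.Map Ref B.ByteString))

newMemoryRepo :: IO MemoryRepo
newMemoryRepo = MemoryRepo <$> newIORef Map.empty

instance Repository MemoryRepo where
  readObject (MemoryRepo store) ref = Map.lookup ref <$> readIORef store
  writeObject (MemoryRepo store) ref content =
    modifyIORef' store (Map.insert ref content)

-- Code written against the class works with any pair of backends.
copyObject :: (Repository a, Repository b) => a -> b -> Ref -> IO Bool
copyObject src dst ref = do
  found <- readObject src ref
  case found of
    Nothing -> pure False
    Just content -> writeObject dst ref content >> pure True
```

A loose-object backend would implement the same two required methods on top of `.git/objects`, and everything written against `Repository` would work with either.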

Handling GPG-signed objects.

Commits and tags can be GPG-signed. These might currently parse because of a quirk in the parser implementation, but we should handle them explicitly (along with other commit extras) like hs-git does.

GraphQL Interface

Since git commits form a directed acyclic graph, we should be able to query a repository using a graph query language instead of git log's arcane format. I think this would be awesome to implement.

Refactor to use ByteString Builder.

Duffer.Loose.Objects uses concat and append everywhere. This probably isn't good for performance, and we should refactor it to use Data.ByteString.Builder. We can roughly measure the difference using stack test --profile.
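A sketch of the difference on a hypothetical `TreeEntry` type (mode, name, raw 20-byte SHA1), encoding a tree object's body without its header. Left-folded `B.append` copies the growing prefix for every entry, while the `Data.ByteString.Builder` version writes each piece into the output buffer once. `encodeLoose` shows the same style for the `<type> <size>\0<content>` loose-object header. These function names are illustrative, not duffer's; whether the change pays off in duffer is what the profiling run would show.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL

-- Hypothetical tree entry: mode (e.g. "100644"), file name, raw 20-byte SHA1.
data TreeEntry = TreeEntry B.ByteString B.ByteString B.ByteString

-- Append style: each step copies everything built so far (quadratic).
encodeTreeAppend :: [TreeEntry] -> B.ByteString
encodeTreeAppend = foldl (\acc e -> acc `B.append` entryBytes e) B.empty
  where
    entryBytes (TreeEntry mode name sha) =
      mode `B.append` BC.pack " " `B.append` name `B.append` B.singleton 0 `B.append` sha

-- Builder style: all entries are written out in a single pass.
encodeTree :: [TreeEntry] -> BL.ByteString
encodeTree = BB.toLazyByteString . foldMap entry
  where
    entry (TreeEntry mode name sha) =
      BB.byteString mode <> BB.char7 ' ' <> BB.byteString name
        <> BB.word8 0 <> BB.byteString sha

-- "<type> <size>\0<content>", built the same way.
encodeLoose :: B.ByteString -> B.ByteString -> BL.ByteString
encodeLoose ty content = BB.toLazyByteString $
  BB.byteString ty <> BB.char7 ' ' <> BB.intDec (B.length content)
    <> BB.word8 0 <> BB.byteString content
```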

Tests fail after `git repack -ad`

After `git repack -ad`, the tests fail because our logic encodes an offset delta representing highlight.js differently from git, so its CRC is different. Strangely, the original delta and our incorrect encoding both decode to the same representation.

Split bit-twiddling out of Duffer.Pack.Entries into own module

The packfile types and bit-twiddling helpers are currently intermingled; separating them would also make it easier to refactor to, e.g., a Vector of Word8s.
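One candidate for the new module is the variable-length pack entry header. Below is a sketch of a pure encoder/decoder pair, working on `[Word8]` as a stand-in for whatever byte container the refactor settles on. The function names are hypothetical, and so is the module they might live in (say, `Duffer.Pack.Bits`).

```haskell
import Data.Bits (shiftL, shiftR, testBit, (.&.), (.|.))
import Data.Word (Word8)

-- Pack entry header: the first byte holds a continuation bit (bit 7), the
-- object type (bits 4-6) and the low 4 bits of the size; each following
-- byte holds a continuation bit and 7 more size bits, least significant first.

-- Returns (type, uncompressed size, remaining input).
decodeEntryHeader :: [Word8] -> Maybe (Word8, Int, [Word8])
decodeEntryHeader [] = Nothing
decodeEntryHeader (b : rest) = go (fromIntegral (b .&. 0x0f)) 4 (testBit b 7) rest
  where
    ty = (b `shiftR` 4) .&. 0x07
    go size _ False bs = Just (ty, size, bs)
    go _ _ True [] = Nothing -- truncated header
    go size shift True (c : bs) =
      go (size .|. (fromIntegral (c .&. 0x7f) `shiftL` shift)) (shift + 7) (testBit c 7) bs

encodeEntryHeader :: Word8 -> Int -> [Word8]
encodeEntryHeader ty size = firstByte : moreBytes remaining
  where
    remaining = size `shiftR` 4
    firstByte = (if remaining > 0 then 0x80 else 0)
      .|. ((ty .&. 0x07) `shiftL` 4)
      .|. fromIntegral (size .&. 0x0f)
    -- Emit 7 size bits per byte, setting bit 7 while more bytes follow.
    moreBytes 0 = []
    moreBytes n =
      let low = fromIntegral (n .&. 0x7f)
          n' = n `shiftR` 7
      in (if n' > 0 then low .|. 0x80 else low) : moreBytes n'
```

For example, a blob (type 3) of 100 bytes encodes as `[0xb4, 0x06]`. Keeping these functions free of the packfile types makes them easy to property-test for round-tripping.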

This test does too many things.

In an attempt to reduce duplication, this one test checks that:

A decoded packfile can be encoded to be equal to the input.
A decoded re-encoded packfile is equal to a decoded packfile.
The CRCs of an encoded packfile match the CRCs provided in the pack index.
A list of objects resolved with reference to the index matches the list of objects resolved without reference to the index.
The hashes of the resolved objects match the ones found in the pack index.
The resolved objects can be written to disk (writeObject works correctly).
We can generate a list of pack index entries that matches our input.

This is absurd. It would be better to split this test into many smaller tests and test writeObject separately so that it is explicit that the rest of the tests depend on the presence of the correct loose objects.

The API needs to be better.

The library as currently implemented provides the absolute minimum API for writing applications using git as a database/storage layer. This needs to be improved. So far my best idea for this is an in-memory repository representation.

Doesn't handle packed references.

The library currently handles only loose references, but it needs to handle at least reading packed references.
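A sketch of reading packed references, assuming the file's contents are already loaded as a `String`; `PackedRef` and `parsePackedRefs` are hypothetical names. It handles the `.git/packed-refs` format: an optional `# pack-refs with:` header, one `<sha> <refname>` line per ref, and `^<sha>` lines giving the peeled target of the annotated tag on the line above.

```haskell
import qualified Data.Map.Strict as Map

-- A packed ref: the SHA1 it points at and, for annotated tags, the
-- "peeled" SHA1 of the object the tag ultimately refers to.
data PackedRef = PackedRef
  { refSha    :: String
  , peeledSha :: Maybe String
  } deriving (Eq, Show)

-- Parse the contents of .git/packed-refs into a map from ref name to PackedRef.
parsePackedRefs :: String -> Either String (Map.Map String PackedRef)
parsePackedRefs = go Map.empty Nothing . lines
  where
    go acc _ [] = Right acc
    go acc prev (line : rest) = case line of
      ""      -> go acc prev rest
      '#' : _ -> go acc prev rest -- header/comment line
      '^' : peeled -> case prev of
        -- A peel line applies to the ref on the line directly above it.
        Just name ->
          go (Map.adjust (\r -> r { peeledSha = Just peeled }) name acc) Nothing rest
        Nothing -> Left ("peeled line without a preceding ref: " ++ line)
      _ -> case words line of
        [sha, name] -> go (Map.insert name (PackedRef sha Nothing) acc) (Just name) rest
        _           -> Left ("malformed line: " ++ line)
```

Loose references would still take precedence over packed ones with the same name, so a combined lookup would consult the loose ref first.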

Can't generate a pack index from a packfile.

Objective

This library supports reading packfile contents only if the corresponding pack index is also present. In some situations (e.g. git clone) we will be streamed a packfile and expected to generate the index ourselves. This should be supported by the library.

Solution

Generate a map of offsets to bytestrings representing each entry.

Approach

git packfiles contain some content that is compressed and some content that is not. Parsing the uncompressed content is straightforward, but the length of each compressed section cannot be determined from the preceding input. Instead, only the length of the decompressed output is provided, which isn't helpful for our purposes because none of the libraries currently in use support streaming decompression.

The solution is to use a streaming IO library with zlib decompression support such as Pipes or Conduit to separate a packfile into entries and generate the offsets of each pack entry. This can then be processed using our existing functions to generate the necessary pack index.
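Once the stream has been split into entries, the remaining index-building step could look like the sketch below. It assumes each entry's offset, raw bytes and resolved SHA1 are already known (the triple input is an assumption, as are the names `IndexEntry` and `buildIndex`). It computes the per-entry CRC-32 that a version 2 pack index stores, sorts entries by SHA1 and derives the 256-entry fanout table. Serialising to the on-disk format (magic number, version, large-offset table and trailing checksums) is left out.

```haskell
import Data.Bits (complement, shiftR, xor, (.&.))
import qualified Data.ByteString as B
import Data.List (sortOn)
import Data.Word (Word32, Word8)

-- CRC-32 (zlib/IEEE polynomial, bit-reflected), computed bitwise for
-- brevity; a lookup table would be faster.
crc32 :: B.ByteString -> Word32
crc32 = complement . B.foldl' step 0xffffffff
  where
    step crc byte = iterate shift1 (crc `xor` fromIntegral byte) !! 8
    shift1 c
      | c .&. 1 == 1 = (c `shiftR` 1) `xor` 0xedb88320
      | otherwise    = c `shiftR` 1

data IndexEntry = IndexEntry
  { entrySha    :: B.ByteString -- raw 20-byte SHA1 of the resolved object
  , entryCrc    :: Word32       -- CRC-32 of the entry's raw bytes in the pack
  , entryOffset :: Int          -- where the entry starts in the pack
  } deriving (Eq, Show)

-- Input: one (offset, raw entry bytes, SHA1) triple per pack entry.
-- Output: the fanout table and the entries sorted by SHA1.
buildIndex :: [(Int, B.ByteString, B.ByteString)] -> ([Int], [IndexEntry])
buildIndex raw = (fanout, entries)
  where
    entries = sortOn entrySha [IndexEntry sha (crc32 bytes) off | (off, bytes, sha) <- raw]
    firstBytes = [b | Just (b, _) <- map (B.uncons . entrySha) entries]
    -- fanout !! i = number of objects whose SHA1 starts with a byte <= i
    fanout = [length (filter (<= i) firstBytes) | i <- [0 .. 255 :: Word8]]
```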

This issue has not been closed.

Packfile streaming doesn't handle (Left (Right value))

Cannot reproduce and fix locally: https://travis-ci.org/vaibhavsagar/duffer/builds/160462242