patch's Introduction

Patch - apply your unified diffs in pure OCaml

The loosely specified diff file format is widely used for transmitting differences between line-based files. The motivating example is opam, which can validate cryptographically signed updates (e.g. with conex) when those updates are delivered as a unified diff.

The specification implemented in this library was inferred from tests and is given by the following grammar.

decimal := [0-9]+
any := any character except newline

filename := "/dev/null" | any except tab character
file := filename "\t" any "\n"
mine := "--- " file
theirs := "+++ " file

no_newline := "\ No newline at end of file"
hunk_line_prefix := " " | "-" | "+"
hunk_line := hunk_line_prefix any | no_newline
range := decimal "," decimal | decimal
hunk_hdr := "@@ -" range " +" range " @@\n"
hunk := hunk_hdr hunk_line+

diff := mine theirs hunk+
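
As an illustration of the range and hunk_hdr rules, here is a minimal sketch that parses a hunk header using only the OCaml standard library. The names decimal, parse_range and parse_hunk_hdr are hypothetical and are not part of this library's API. The sketch assumes the line is passed without its trailing newline, and it tolerates extra text after the closing "@@", which the grammar above does not describe.

```ocaml
(* Hypothetical helpers for illustration; not the library's implementation. *)

(* decimal := [0-9]+ ; rejects signs, prefixes such as "0x", and empty input *)
let decimal s =
  let n = String.length s in
  let rec all_digits i =
    i >= n || (s.[i] >= '0' && s.[i] <= '9' && all_digits (i + 1))
  in
  if n > 0 && all_digits 0 then int_of_string_opt s else None

(* range := decimal "," decimal | decimal ; an omitted size is set to 1 *)
let parse_range s =
  match List.map decimal (String.split_on_char ',' s) with
  | [ Some start ] -> Some (start, 1)
  | [ Some start; Some size ] -> Some (start, size)
  | _ -> None

(* Parse "@@ -range +range @@" into ((start, size), (start, size)). *)
let parse_hunk_hdr line =
  let strip sign field =
    let n = String.length field in
    if n > 1 && field.[0] = sign then parse_range (String.sub field 1 (n - 1))
    else None
  in
  match String.split_on_char ' ' line with
  | "@@" :: mine :: theirs :: "@@" :: _ ->
    (match strip '-' mine, strip '+' theirs with
     | Some m, Some t -> Some (m, t)
     | _ -> None)
  | _ -> None
```

For example, parse_hunk_hdr "@@ -1 +0,0 @@" yields Some ((1, 1), (0, 0)): the omitted size on the "-" side becomes 1.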

In addition, some support for the git diff format is available. That format uses diff --git a/nn b/nn as a separator, prefixes filenames with a/ and b/, and may contain extra headers, notably for pure renames: rename from <path> followed by rename to <path>. The git diff documentation also mentions that a diff file should be an atomic operation, so all - files correspond to the files before applying the diff. Since patch only performs single diff operations and requires the old content as input, you have to ensure that you provide the correct data yourself.
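
The git-specific parts described above can be sketched as follows. This is only an illustration with hypothetical names (strip_git_prefix, rename_header); it is not how this library handles them internally.

```ocaml
(* Hypothetical helpers for illustration; not the library's implementation. *)

(* Drop a leading "a/" or "b/", the prefixes git adds to filenames. *)
let strip_git_prefix name =
  let n = String.length name in
  if n >= 2 && (name.[0] = 'a' || name.[0] = 'b') && name.[1] = '/'
  then String.sub name 2 (n - 2)
  else name

(* Recognise the extra headers "rename from <path>" and "rename to <path>". *)
let rename_header line =
  let after prefix =
    let p = String.length prefix and n = String.length line in
    if n >= p && String.sub line 0 p = prefix
    then Some (String.sub line p (n - p))
    else None
  in
  match after "rename from ", after "rename to " with
  | Some path, _ -> Some (`From path)
  | None, Some path -> Some (`To path)
  | None, None -> None
```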

A diff consists of a two-line header containing the filenames (or "/dev/null" for creation and deletion), followed by the actual changes in hunks. A complete diff file is represented by a list of diff elements. The OCaml types below, provided by this library, represent mine and theirs as an operation (edit, delete, create, or rename only). Since a diff is line-based, if a file does not end with a newline character, its last line in the diff still ends with a newline, and the special marker no_newline is added to the diff. The range information carries the start line and chunk size in the respective file, with two side conditions: if the chunk size is 0, the start line refers to the line after which the chunk should be added or deleted, and if the chunk size is omitted (including the comma), it is set to 1. NB: from practical experiments, only "+1" and "-1" are supported.

type operation =
  | Edit of string * string
  | Delete of string
  | Create of string
  | Rename_only of string * string

type hunk (* positions and contents *)

type t = {
  operation : operation ;
  hunks : hunk list ;
  mine_no_nl : bool ;
  their_no_nl : bool ;
}
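
To make the relation between the two header filenames and the operation type concrete, here is a minimal sketch. The function operation_of_names is hypothetical and not part of the library; the type is repeated so the snippet stands alone. It only looks at "/dev/null" and never produces Rename_only, which in the git format comes from the extra rename headers instead.

```ocaml
type operation =
  | Edit of string * string
  | Delete of string
  | Create of string
  | Rename_only of string * string

(* Hypothetical helper: derive the operation from the "---" (mine) and
   "+++" (theirs) filenames of a diff header. *)
let operation_of_names ~mine ~theirs =
  match mine, theirs with
  | "/dev/null", "/dev/null" -> None (* neither side names a file *)
  | "/dev/null", name -> Some (Create name)
  | name, "/dev/null" -> Some (Delete name)
  | a, b -> Some (Edit (a, b))
```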

In addition to parsing a diff and applying it, support for generating a diff from old and new file contents is also provided.

Shortcomings

The function patch assumes that the patch applies cleanly, and does not check this assumption. Exceptions may be raised if this assumption is violated. The git diff format allows further features, such as file permissions, and also a "copy from / to" header, which I was unable to spot in the wild.

Installation

opam install patch

Documentation

The API documentation can be browsed online.

patch's People

Contributors

hannesm, kit-ty-kate, gasche

patch's Issues

Robust filename parser

Issue extracted from a discussion in #9.

Different diff implementations output filenames containing spaces or special characters (such as backslashes, which are valid on Windows) in different formats. For example, with spaces:

  • GNU diff:
    --- "a b"	2024-04-02 13:32:43.427214939 +0100
    +++ "a c"	2024-04-02 13:32:34.520202398 +0100
    @@ -1 +0,0 @@
    -test
    
  • Git diff:
    diff --git a/a b b/a c
    index 039727e..e69de29 100644
    --- a/a b
    +++ b/a c
    @@ -1 +0,0 @@
    -test
    
  • Busybox diff:
    --- a b
    +++ a c
    @@ -1 +0,0 @@
    -test
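
One possible direction, shown only as a sketch covering the three outputs above: take the text that follows "--- " or "+++ " and extract the filename from it. The function filename_of_header is hypothetical. It assumes that a leading double quote starts a quoted name in which only \" and \\ are decoded (any other escape is rejected rather than guessed), and that an unquoted name ends at the first tab or at the end of the line.

```ocaml
(* Hypothetical sketch; [rest] is the header line without its "--- " or
   "+++ " prefix and without the trailing newline. *)
let filename_of_header rest =
  let n = String.length rest in
  if n > 0 && rest.[0] = '"' then begin
    (* Quoted name, as in the GNU diff output above. *)
    let buf = Buffer.create n in
    let rec go i =
      if i >= n then None (* unterminated quote *)
      else
        match rest.[i] with
        | '"' -> Some (Buffer.contents buf)
        | '\\' ->
          if i + 1 < n && (rest.[i + 1] = '"' || rest.[i + 1] = '\\')
          then (Buffer.add_char buf rest.[i + 1]; go (i + 2))
          else None (* escape not handled by this sketch *)
        | c -> Buffer.add_char buf c; go (i + 1)
    in
    go 1
  end else
    (* Unquoted name: stop at the tab that precedes a timestamp, if any. *)
    match String.index_opt rest '\t' with
    | Some i -> Some (String.sub rest 0 i)
    | None -> Some rest
```

With this sketch, the GNU line yields "a b", the git line yields "a/a b" (prefix stripping is a separate step), and the Busybox line yields "a b".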
    

Feature request: File creation from diff -ruN

The documentation of GNU patch states:

You can create a file by sending out a diff that compares /dev/null or an empty file dated the
Epoch (1970-01-01 00:00:00 UTC) to the file you want to create.

This is used, for example by (GNU) diff -ruN, where the -N option reads:

-N, --new-file
treat absent files as empty

Currently, this implementation of patch returns Edit on such an input instead of the expected Create.
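
Detecting such an input could look roughly like the sketch below. The function is_epoch is hypothetical. It assumes the header timestamp has the shape "YYYY-MM-DD HH:MM:SS[.fraction] +HHMM" (as in GNU diff output such as "2024-04-02 13:32:43.427214939 +0100"), converts it to UTC, and ignores the fractional seconds. A header whose timestamp satisfies it could then be treated like /dev/null on that side.

```ocaml
(* Hypothetical sketch: does a diff header timestamp denote the Epoch
   (1970-01-01 00:00:00 UTC), whatever time zone it is printed in? *)
let is_epoch ts =
  let num = int_of_string_opt in
  match String.split_on_char ' ' ts with
  | [ date; time; zone ] when String.length zone = 5 ->
    (match
       String.split_on_char '-' date,
       String.split_on_char ':' time,
       zone.[0],
       num (String.sub zone 1 2),
       num (String.sub zone 3 2)
     with
     | [ y; m; d ], [ h; mi; s ], ('+' | '-' as sign), Some zh, Some zm ->
       (* Fractional seconds, if present, are ignored. *)
       let s = List.hd (String.split_on_char '.' s) in
       (match num y, num m, num d, num h, num mi, num s with
        | Some y, Some m, Some d, Some h, Some mi, Some s ->
          (* Only the two calendar days that can contain the Epoch
             under some UTC offset are considered. *)
          let day =
            match y, m, d with
            | 1970, 1, 1 -> Some 0
            | 1969, 12, 31 -> Some (-1)
            | _ -> None
          in
          let offset = (if sign = '-' then -1 else 1) * (zh * 3600 + zm * 60) in
          (match day with
           | Some day -> (day * 86400) + (h * 3600) + (mi * 60) + s - offset = 0
           | None -> false)
        | _ -> false)
     | _ -> false)
  | _ -> false
```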

which diff format to support?

the overall question is: what is the input for this library? it seems like the term "unified diff" is not well-defined, and "extensible". the pragmatic solution is "whatever opam gives us".

opam itself uses gpatch -p1 -i p' to apply the patch (which has been processed by translate_patch: something with CR/LF, but unsure about the concrete semantics).

generating the diffs (for the repositories I mostly care about):

  • HTTP uses the repository backend, calling out to diff -ruaN (recursive, unified, -a is "treat all files as text", -N is "new file": if a file is found in new that doesn't exist in old, treat old as empty)
  • git uses git repo_root ~stdout:patch_file [ "-c" ; "diff.noprefix=false" ; "diff" ; "--no-ext-diff" ; "-R" ; "-p" ; rref; "--" ] (which means: -c sets a configuration value for this one invocation, here diff.noprefix=false, which keeps the a/ and b/ prefixes so that the result applies with -p1; --no-ext-diff uses the builtin diff; -R swaps the args (new/old); -p is "something to use with patch, a unified diff")
