mendsley / bsdiff

bsdiff and bspatch are libraries for building and applying patches to binary files.

License: Other

Shell 0.43% C 94.84% Makefile 1.00% M4 3.74%

bsdiff's Introduction

bsdiff/bspatch


The original algorithm and implementation were developed by Colin Percival. The algorithm is detailed in his paper, Naïve Differences of Executable Code. For more information, visit his website at http://www.daemonology.net/bsdiff/.

I maintain this project separately from Colin's work, with the goal of making the core functionality easily embeddable in existing projects.

Contact

@MatthewEndsley
https://github.com/mendsley/bsdiff

License

This project is governed by the BSD 2-clause license. For details see the file titled LICENSE in the project root folder.

Overview

There are two separate libraries in the project, bsdiff and bspatch. Each is self-contained in bsdiff.c and bspatch.c, respectively. The easiest way to integrate them is to simply copy the .c file into your source folder and build it.

The overarching goal was to modify Colin's original bsdiff/bspatch code to eliminate external dependencies and provide a simple interface to the core functionality.

I've exposed the relevant functions via the bsdiff_stream and bspatch_stream structures. The only external dependency not exposed is memcmp in bsdiff.

This library generates patches that are not compatible with the original bsdiff tool. The incompatibilities were motivated by the patching needs of the game AirMech (https://www.carbongames.com) and the following requirements:

Eliminate/minimize any seek operations when applying patches
Eliminate any required disk I/O and support embedded streams
Ability to easily embed the routines as a library instead of an external binary
Compile+run on all platforms we use to build the game (Windows, Linux, NaCl, OSX)

Compiling

The libraries should compile warning-free with any moderately recent version of gcc. The project uses <stdint.h>, which is technically a C99 header and is not available in older versions of Microsoft Visual Studio (before Visual Studio 2010). The easiest solution here is to use the msinttypes version of stdint.h from https://code.google.com/p/msinttypes/. The direct link for the lazy people is: https://msinttypes.googlecode.com/svn/trunk/stdint.h.

If your compiler does not provide an implementation of <stdint.h> you can remove the header from the bsdiff/bspatch files and provide your own typedefs for the following symbols: uint8_t, uint64_t and int64_t.
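A minimal sketch of such a fallback, assuming a hypothetical header named portable_stdint.h that replaces the `#include <stdint.h>` line in bsdiff.c and bspatch.c. It relies on Visual Studio's built-in `__int8`/`__int64` types; the `_MSC_VER < 1600` cutoff reflects that Visual Studio 2010 was the first release to ship <stdint.h>. Adjust the condition for whatever compiler you actually target.

```c
/* portable_stdint.h: hypothetical fallback for compilers without <stdint.h>. */
#if defined(_MSC_VER) && _MSC_VER < 1600
/* Visual Studio before 2010: use the compiler's built-in sized integer types. */
typedef unsigned __int8  uint8_t;
typedef unsigned __int64 uint64_t;
typedef __int64          int64_t;
#else
#include <stdint.h>
#endif

/* C89-compatible compile-time checks: a negative array size fails to compile,
   so a wrong typedef is caught at build time rather than at run time. */
typedef char check_uint8_size[sizeof(uint8_t) == 1 ? 1 : -1];
typedef char check_uint64_size[sizeof(uint64_t) == 8 ? 1 : -1];
typedef char check_int64_size[sizeof(int64_t) == 8 ? 1 : -1];
```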

Examples

Each project has an optional main function that serves as an example of using the library. Simply define BSDIFF_EXECUTABLE or BSPATCH_EXECUTABLE to enable building the standalone tools.

Reference

bsdiff

struct bsdiff_stream
{
	void* opaque;
	void* (*malloc)(size_t size);
	void  (*free)(void* ptr);
	int   (*write)(struct bsdiff_stream* stream,
				   const void* buffer, int size);
};

int bsdiff(const uint8_t* old, int64_t oldsize, const uint8_t* new,
           int64_t newsize, struct bsdiff_stream* stream);

In order to use bsdiff, you need to define functions for allocating memory and writing binary data. This behavior is controlled by the stream parameter passed to bsdiff(...).

The opaque field is never read or modified from within the bsdiff function. The caller can use this field to store custom state data needed for the callback functions.

The malloc and free members should point to functions that behave like the standard malloc and free C functions.

The write function is called by bsdiff to write a block of binary data to the stream. The return value for write should be 0 on success and non-zero if the callback failed to write all data. In the default example, bzip2 is used to compress output data.

bsdiff returns 0 on success and -1 on failure.
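As an illustration of the callback contract described above, here is a sketch of a bsdiff_stream that collects the patch in a growable memory buffer with no compression. The struct is copied from the reference so the snippet compiles on its own (in a real project you would include the project's header instead); `mem_sink`, `mem_write`, and `mem_stream_init` are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Copied from the reference above so this example stands alone. */
struct bsdiff_stream
{
	void* opaque;
	void* (*malloc)(size_t size);
	void  (*free)(void* ptr);
	int   (*write)(struct bsdiff_stream* stream, const void* buffer, int size);
};

/* Hypothetical growable buffer stored in the opaque field. */
struct mem_sink
{
	uint8_t* data;
	size_t   len;
	size_t   cap;
};

/* Appends size bytes to the sink: 0 on success, -1 on failure. */
static int mem_write(struct bsdiff_stream* stream, const void* buffer, int size)
{
	struct mem_sink* sink = (struct mem_sink*)stream->opaque;

	if (size < 0)
		return -1;
	if (size == 0)
		return 0;
	if (sink->len + (size_t)size > sink->cap) {
		size_t cap = sink->cap ? sink->cap : 256;
		uint8_t* grown;
		while (cap < sink->len + (size_t)size)
			cap *= 2; /* double until the new block fits */
		grown = realloc(sink->data, cap);
		if (grown == NULL)
			return -1;
		sink->data = grown;
		sink->cap = cap;
	}
	memcpy(sink->data + sink->len, buffer, (size_t)size);
	sink->len += (size_t)size;
	return 0;
}

/* Wires the stream to the sink and to the standard C allocator. */
static void mem_stream_init(struct bsdiff_stream* stream, struct mem_sink* sink)
{
	sink->data = NULL;
	sink->len = 0;
	sink->cap = 0;
	stream->opaque = sink;
	stream->malloc = malloc;
	stream->free = free;
	stream->write = mem_write;
}
```

After `mem_stream_init(&stream, &sink)`, passing `&stream` as the last argument to bsdiff(...) should leave the patch bytes in `sink.data` (release them with `free`). Unlike the bzip2-based default example, this stores the patch uncompressed.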

bspatch

struct bspatch_stream
{
	void* opaque;
	int (*read)(const struct bspatch_stream* stream,
	            void* buffer, int length);
};

int bspatch(const uint8_t* old, int64_t oldsize, uint8_t* new,
            int64_t newsize, struct bspatch_stream* stream);

The bspatch function transforms the data for a file using data generated by bsdiff. The caller takes care of loading the old file and allocating space for the new file data. The stream parameter controls the process for reading binary patch data.

The opaque field is never read or modified from within the bspatch function. The caller can use this field to store custom state data needed for the read function.

The read function is called by bspatch to read a block of binary data from the stream. The return value for read should be 0 on success and non-zero if the callback failed to read the requested amount of data. In the default example, bzip2 is used to decompress input data.

bspatch returns 0 on success and -1 on failure. On success, new contains the data for the patched file.
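The read side can be sketched the same way: a bspatch_stream whose opaque field points at a cursor over patch data already held in memory, assuming that data was written uncompressed by the matching bsdiff write callback. Following the contract above, it returns non-zero when fewer than `length` bytes remain instead of performing a short read. `mem_source` and `mem_read` are hypothetical names, and the struct is copied from the reference so the example stands alone.

```c
#include <stdint.h>
#include <string.h>

/* Copied from the reference above so this example stands alone. */
struct bspatch_stream
{
	void* opaque;
	int (*read)(const struct bspatch_stream* stream, void* buffer, int length);
};

/* Hypothetical read cursor over an in-memory, uncompressed patch. */
struct mem_source
{
	const uint8_t* data;
	size_t         len;
	size_t         pos;
};

/* Copies exactly length bytes: 0 on success, -1 if not enough data remains. */
static int mem_read(const struct bspatch_stream* stream, void* buffer, int length)
{
	struct mem_source* src = (struct mem_source*)stream->opaque;

	if (length < 0 || (size_t)length > src->len - src->pos)
		return -1; /* all-or-nothing: the cursor is left unchanged */
	memcpy(buffer, src->data + src->pos, (size_t)length);
	src->pos += (size_t)length;
	return 0;
}
```

A stream built as `struct bspatch_stream stream = { &src, mem_read };` can then be passed to bspatch(...).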


bsdiff's Issues

Recursive directory diff support

Recursive directory diff is a very important feature that RTPatch has and bsdiff lacks.

Current code only supports files up to 2GB in size

As it currently stands, the code only supports files up to 2GB in size, as it relies on read being able to return the entire file length in a single call. This also assumes that the underlying device is ready and capable of returning the entire file in a single shot. Both can be untrue under certain conditions.

Here is a quick patch that allows reading files larger than 2GB (currently only on the diff side):

index 628f1c1..481f00f 100644
--- a/bsdiff.c
+++ b/bsdiff.c
@@ -373,6 +373,19 @@ static int bz2_write(struct bsdiff_stream* stream, const void* buffer, int size)
        return 0;
 }
 
+static off_t readFileTo(int fd, off_t size, uint8_t* buf)
+{
+       off_t bytesRead = 0;
+       int inc = 0;
+       while (bytesRead < size)
+       {
+               inc = read(fd, buf + bytesRead, size - bytesRead);
+               if (inc > 0) bytesRead += inc;
+               else break;
+       }
+       return bytesRead;
+}
+
 int main(int argc,char *argv[])
 {
        int fd;
@@ -397,7 +410,7 @@ int main(int argc,char *argv[])
                ((oldsize=lseek(fd,0,SEEK_END))==-1) ||
                ((old=malloc(oldsize+1))==NULL) ||
                (lseek(fd,0,SEEK_SET)!=0) ||
-               (read(fd,old,oldsize)!=oldsize) ||
+               (readFileTo(fd,oldsize,old)!=oldsize) ||
                (close(fd)==-1)) err(1,"%s",argv[1]);
 
 
@@ -407,7 +420,7 @@ int main(int argc,char *argv[])
                ((newsize=lseek(fd,0,SEEK_END))==-1) ||
                ((new=malloc(newsize+1))==NULL) ||
                (lseek(fd,0,SEEK_SET)!=0) ||
-               (read(fd,new,newsize)!=newsize) ||
+               (readFileTo(fd,newsize,new)!=newsize) ||
                (close(fd)==-1)) err(1,"%s",argv[2]);
 
        /* Create the patch file */
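The same loop can be made a little more defensive. This is a minimal POSIX sketch, not part of the proposed patch: it keeps the result of read() in an `ssize_t`, caps each request at 1 GiB so the count stays well below `SSIZE_MAX`, retries when a signal interrupts the call (`EINTR`), and reports other errors as -1 instead of a short count. `read_full` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads up to size bytes into buf, looping over partial reads.
   Returns the number of bytes read (less than size only at EOF),
   or -1 if read() fails with anything other than EINTR. */
static off_t read_full(int fd, uint8_t* buf, off_t size)
{
	off_t total = 0;

	while (total < size) {
		size_t want = (size_t)(size - total);
		ssize_t n;

		if (want > ((size_t)1 << 30))
			want = (size_t)1 << 30; /* keep each request small */
		n = read(fd, buf + total, want);
		if (n > 0)
			total += n;
		else if (n == 0)
			break;          /* EOF: the caller sees total != size */
		else if (errno != EINTR)
			return -1;      /* genuine I/O error */
	}
	return total;
}
```

Because the callers in the patch already compare the result against the expected size, `read_full` could stand in for `readFileTo` there without other changes.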

VLE encoding and fix for #1

Hello @mendsley,

I'm not sure you'll be interested in the following changes, but if you are, just drop me a line and I will create one or two tidy pull requests for them :)

a) breaking change
I have replaced the fixed [8*3]-byte bucket used for each patch control entry (as seen in the offtin and offtout functions) with a variable-length encoding that takes 1 to 10 bytes per encoded 64-bit signed integer. Each entry therefore ranges from 3 bytes (best case) to 30 bytes (worst case) instead of a fixed 24 bytes.

Pros:

Smaller patches on the wire, which matters more to me.

Cons:

Patches are no longer backwards compatible (not even with the original bsdiff tool).
Somewhat slower (3 to 30 callback calls instead of a single 24-byte call).
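The issue does not include the encoding itself. One common scheme with exactly the 1-to-10-byte range described is a zigzag mapping (so small negative numbers also get short codes) followed by LEB128-style groups of 7 bits. The sketch below assumes that scheme, which may differ from the one in the proposed change; all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Zigzag: 0->0, -1->1, 1->2, -2->3, ... (no reliance on signed shifts). */
static uint64_t zigzag_encode(int64_t v)
{
	uint64_t u = (uint64_t)v << 1;
	return v < 0 ? ~u : u;
}

static int64_t zigzag_decode(uint64_t u)
{
	uint64_t x = (u >> 1) ^ (0 - (u & 1));
	return (int64_t)x;
}

/* Writes v as 7-bit groups, low bits first, high bit meaning "more follows".
   Returns the number of bytes written: 1..10 for any 64-bit value. */
static size_t varint_encode(int64_t v, uint8_t out[10])
{
	uint64_t u = zigzag_encode(v);
	size_t n = 0;

	while (u >= 0x80) {
		out[n++] = (uint8_t)(u | 0x80);
		u >>= 7;
	}
	out[n++] = (uint8_t)u;
	return n;
}

/* Decodes one value from buf (len bytes available). Returns the number of
   bytes consumed, or 0 if the input is truncated or longer than 10 bytes. */
static size_t varint_decode(const uint8_t* buf, size_t len, int64_t* v)
{
	uint64_t u = 0;
	size_t i;

	for (i = 0; i < len && i < 10; i++) {
		u |= (uint64_t)(buf[i] & 0x7F) << (7 * i);
		if ((buf[i] & 0x80) == 0) {
			*v = zigzag_decode(u);
			return i + 1;
		}
	}
	return 0;
}
```

With three such values per control entry, an entry costs 3 bytes when all fields are small and 30 bytes in the worst case, matching the range quoted above.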

b) fix for #1
I have applied the solution as seen on this thread

Pros:

It seems to work.

Cons:

It needs more extensive real-world testing and/or proper unit tests to confirm I did not break anything.
The license for SAIS is MIT instead of BSD-2 :/

Happy new year btw,

r-lyeh

bsdiff with zlib instead of bz2

Hello,

Is bsdiff compatible with other compression libraries?

I need a version with zlib, so I tried replacing bz2 write/read with zlib equivalent.

bsdiff seems to work; however, when running bspatch, the sanity check fails every time:

		/* Sanity-check */
		if (ctrl[0]<0 || ctrl[0]>INT_MAX ||
			ctrl[1]<0 || ctrl[1]>INT_MAX ||
			newpos+ctrl[0]>newsize)
			return -1;

I don't understand why, but the values in the ctrl array are either far too large or negative.

This is what I got by running a debug session:

(gdb) p ctrl
$1 = {2248591341461585215, -5979881847581223336, -7917639821140655946}

Opposed to the bz2 version which can give something like this:

(gdb) p ctrl
$4 = {0, 3000, 589}

Is there any advice to adapt bsdiff for a different compression method?

Thanks,

Benjamin

PS: here are the stream functions I adapted to zlib:

static int bz2_read(const struct bspatch_stream* stream, void* buffer, int length)
{
	int bytes;
	gzFile* gz;

	gz = (gzFile*) stream->opaque;
	bytes = gzread(*gz, buffer, length);
	if (0 == bytes) {
		return -1;
	}

	return 0;
}

static int bz2_write(struct bsdiff_stream* stream, const void* buffer, int size)
{
	int bytes;
	gzFile* gz;

	gz = (gzFile*) stream->opaque;
	bytes = gzwrite(*gz, buffer, size);
	if (0 == bytes) {
		return -1;
	}

	return 0;
}

questions about api

Hello
I am looking for bugs in bsdiff using a fuzzer, and I plan to report them to you.

However, I have a question: is there a list or document describing the bsdiff API?
I want to make sure that the bugs I find can be triggered from outside the library.
BR

Question about CPU load 100%

When I use bsdiff to diff files, I find that only one CPU core is at 100% load while the others are idle. Why is bsdiff not a parallel diff algorithm?

Can't build on Mac (Catalina)

Hello
When I try to build bsdiff or bspatch with make on Mac (Catalina), I get this error:

cc bspatch.c -o bspatch
Undefined symbols for architecture x86_64:
"_main", referenced from:
implicit entry/start for main executable
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [bspatch] Error 1

Undoing patches

Hi
I am looking to use bsdiff to keep a record of the changes I make to a file. As I see it, I keep the original file and can then apply all the diffs in turn to reach the file's current state. However, I have been wondering whether I can go the other way: if I have the current state of the file, can I "strip off" patches to get back to the original? I don't know if that makes sense at all!
Say I have versions of a file named v0.x, for example:
v0.1
v0.2
v0.3

bsdiff then gives me the diff between each pair of consecutive versions. If I have v0.1, I can patch it twice to get to v0.3.
However, if I have v0.3, is there a way to patch it back to v0.2 and from there to v0.1?
I have the diff from v0.2 to v0.3, but I guess that's not the same as having the diff from v0.3 to v0.2, is it?
I wondered whether there was a nifty XOR of the diff, or something similar, that could be used to go backwards.

I would appreciate any help you can give. Thanks

Alex

Question about memory usage

My mistake. Please close.

Reported crashes with 4.3

Hi Matthew,

There are reports about crashes with bsdiff 4.3
http://stackoverflow.com/questions/12751775/why-does-bsdiff-exe-have-trouble-with-this-smaller-file

I wonder if you investigated the problem?

missing ; in bspatch.c on line 161

This causes compilation to fail.

diff/patch is wrong

When I use this repository, diffing and patching locally work fine, but maybe it does not use the standard bsdiff algorithm. Android and our server (which uses https://github.com/malensek/jbsdiff) process the same input and can diff and patch correctly, but when the iOS platform posts a diff binary to the server, the result is wrong.

SPDX identifier to License

It would be nice if the license file actually included the official SPDX identifier.
I believe the SPDX short identifier would be BSD-2-Clause.

Can the file write in bspatch be changed to stream?

Could bspatch write out and flush the output as it goes, to avoid needing to allocate the entire file?
It isn't clear to me whether newpos is strictly increasing or not: are ctrl[0] and ctrl[1] always positive?

bsdiff _without_ compression

hi,
do you know of anyone who has produced a variation of bsdiff/bspatch that does not use bzlib or any other form of compression? my reason: i'm after creating binaries of bsdiff and bspatch that do not require any external libraries (such as bzlib or libc) that can therefore run on a wide range of linux distros without recompiling. no compression is needed as gzip is available anyway to compress/uncompress the difference file.

i'm a 'poor' C programmer, normally using pascal (FPC/lazarus), and this is for use with a GUI application (10,000 lines of source) that runs under linux. it is part of an attempt to work around the recent glibc symbol versioning changes that break binary backwards compatibility with lazarus GUI programs.
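Because the library only talks to its stream callbacks, compression lives entirely in those callbacks (bzip2 in the bundled example programs). Below is a sketch of uncompressed callbacks that pass bytes straight through stdio, assuming the patch file is compressed separately (for example with gzip, as suggested above). The structs are copied from the reference section; `file_write` and `file_read` are hypothetical names, and the bundled main functions would still need their bzip2 calls replaced. It still depends on the C standard library's stdio; whether that can be linked statically depends on your toolchain.

```c
#include <stdio.h>
#include <stdlib.h>

/* Copied from the reference section so this example stands alone. */
struct bsdiff_stream
{
	void* opaque;
	void* (*malloc)(size_t size);
	void  (*free)(void* ptr);
	int   (*write)(struct bsdiff_stream* stream, const void* buffer, int size);
};

struct bspatch_stream
{
	void* opaque;
	int (*read)(const struct bspatch_stream* stream, void* buffer, int length);
};

/* Writes the patch bytes unchanged: 0 on success, -1 on a short write. */
static int file_write(struct bsdiff_stream* stream, const void* buffer, int size)
{
	FILE* f = (FILE*)stream->opaque;
	if (size < 0)
		return -1;
	return fwrite(buffer, 1, (size_t)size, f) == (size_t)size ? 0 : -1;
}

/* Reads exactly length bytes: 0 on success, -1 on EOF or error. */
static int file_read(const struct bspatch_stream* stream, void* buffer, int length)
{
	FILE* f = (FILE*)stream->opaque;
	if (length < 0)
		return -1;
	return fread(buffer, 1, (size_t)length, f) == (size_t)length ? 0 : -1;
}
```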

cheers,
robert rozee

y=-y is spurious

Thanks for making the code of bspatch.c available on GitHub. I noticed offtin has an unneeded test:

if(buf[7]&0x80) y=-y;

I don't believe the high bit can ever be set because of y=buf[7]&0x7F executed earlier in offtin.

And negation on an unsigned is questionable. The sign will never change as is expected with negation.

static int64_t offtin(uint8_t *buf)
{
	int64_t y;

	y=buf[7]&0x7F;
	y=y*256;y+=buf[6];
	y=y*256;y+=buf[5];
	y=y*256;y+=buf[4];
	y=y*256;y+=buf[3];
	y=y*256;y+=buf[2];
	y=y*256;y+=buf[1];
	y=y*256;y+=buf[0];

	if(buf[7]&0x80) y=-y;

	return y;
}
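A self-contained round trip for reference. `offtout` below is a sketch of the matching encoder (a 63-bit magnitude stored little-endian, with the sign in the top bit of buf[7]), written as a loop rather than copied from bsdiff.c; `offtin` follows the logic quoted above. Note that the final test reads buf[7] directly, not y, so for values encoded this way the negation branch does run for negative offsets. Also, y is declared int64_t, so it is signed.

```c
#include <stdint.h>

/* Sketch of the encoder: magnitude in buf[0..7] little-endian, sign in bit 7 of buf[7]. */
static void offtout(int64_t x, uint8_t* buf)
{
	uint64_t y = (x < 0) ? 0 - (uint64_t)x : (uint64_t)x;
	int i;

	for (i = 0; i < 8; i++) {
		buf[i] = (uint8_t)(y & 0xFF);
		y >>= 8;
	}
	if (x < 0)
		buf[7] |= 0x80; /* sign-magnitude, not two's complement */
}

/* Same logic as the offtin quoted above, as a loop. */
static int64_t offtin(const uint8_t* buf)
{
	int64_t y = buf[7] & 0x7F; /* strip the sign bit from the magnitude */
	int i;

	for (i = 6; i >= 0; i--)
		y = y * 256 + buf[i];
	if (buf[7] & 0x80) /* tests the raw byte, which still carries the sign */
		y = -y;
	return y;
}
```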

problem: errx(1, "Corrupt patch\n");

I created a patch with the bsdiff command on my PC and want to apply it with bspatch on the iPhone, but when I run the code, it stops here:
/* Check for appropriate magic */
if (memcmp(header, "ENDSLEY/BSDIFF43", 16) != 0)
errx(1, "Corrupt patch\n");
Execution enters the if branch, so the code after it never runs.
I want to know why. Please help me!
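Since this library's patches are not compatible with the original bsdiff tool, one likely cause is that the patch was produced by a different bsdiff implementation. A quick check is to inspect the first bytes of the patch file: this library's bspatch expects the 16-byte magic `ENDSLEY/BSDIFF43`, while Colin Percival's original bsdiff writes `BSDIFF40`. `detect_patch_format` and its enum are hypothetical helpers, sketched under that assumption.

```c
#include <stddef.h>
#include <string.h>

enum patch_format { PATCH_UNKNOWN, PATCH_ENDSLEY43, PATCH_BSDIFF40 };

/* Classifies a patch by its leading magic bytes (len = bytes available). */
static enum patch_format detect_patch_format(const unsigned char* header, size_t len)
{
	if (len >= 16 && memcmp(header, "ENDSLEY/BSDIFF43", 16) == 0)
		return PATCH_ENDSLEY43;  /* format this library's bspatch accepts */
	if (len >= 8 && memcmp(header, "BSDIFF40", 8) == 0)
		return PATCH_BSDIFF40;   /* original bsdiff format */
	return PATCH_UNKNOWN;
}
```

If the PC-side patch reports `BSDIFF40`, regenerating it with this library's bsdiff (or using a bspatch that understands the original format) should be the way forward.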

Using keyword new as a variable name

The title says it all