jedisct1 / libpuzzle


A library to quickly find visually similar images

Home Page: http://libpuzzle.pureftpd.org

License: ISC License

Shell 0.39% PHP 12.08% C 78.56% Makefile 2.03% M4 4.63% C++ 2.32%

libpuzzle's Introduction

                               .:. LIBPUZZLE .:.
                               
                         http://libpuzzle.pureftpd.org


            ------------------------ BLURB ------------------------


The Puzzle library is designed to quickly find visually similar images (gif,
png, jpg), even if they have been resized, recompressed, recolored or slightly
modified.

The library is free, lightweight yet very fast, configurable, easy to use and
it has been designed with security in mind. This is a C library, but it also
comes with a command-line tool and PHP bindings.


          ------------------------ REFERENCE ------------------------
          
          
The Puzzle library is an implementation of "An image signature for any kind of
image", by H. Chi WONG, Marshall BERN and David GOLDBERG.


         ------------------------ COMPILATION ------------------------
         
         
In order to load images, the library relies on the GD2 library.
You need to install gdlib2 and its development headers before compiling
libpuzzle.
The GD2 library is available as a pre-built package for most operating systems.
Debian and Ubuntu users should install the "libgd2-dev" or the "libgd2-xpm-dev"
package.
Gentoo users should install "media-libs/gd".
OpenBSD, NetBSD and DragonflyBSD users should install the "gd" package.
MacPorts users should install the "gd2" package.
X11 support is not required for the Puzzle library.

Once GD2 has been installed, configure the Puzzle library as usual:

./configure

This is a standard autoconf script. If you're not familiar with it, please
have a look at the INSTALL file.

Compile the beast:

make

Try the built-in tests:

make check

If everything looks fine, install the software:

make install

If anything goes wrong, please submit a bug report to:
                       libpuzzle [at] pureftpd [dot] org


            ------------------------ USAGE ------------------------
         
         
The API is documented in the libpuzzle(3) and puzzle_set(3) man pages.
You can also play with the puzzle-diff test application.
See puzzle-diff(8) for more info about the puzzle-diff application.

In order to be thread-safe, every exported function of the library requires a
PuzzleContext object. That object stores various run-time tunables.

Out of a bitmap picture, the Puzzle library can fill a PuzzleCvec object:

  PuzzleContext context;
  PuzzleCvec cvec;
  
  /* Set up the context (run-time tunables) and an empty signature. */
  puzzle_init_context(&context);
  puzzle_init_cvec(&context, &cvec);
  /* Load the picture and compute its signature. */
  puzzle_fill_cvec_from_file(&context, &cvec, "directory/filename.jpg");

The PuzzleCvec structure holds two fields:
  signed char *vec:  a pointer to the first element of the vector
  size_t sizeof_vec: the number of elements
  
The size depends on the "lambdas" value (see puzzle_set(3)).
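
As a rough illustration of how the size can follow from "lambdas": the default
figure of 544 elements quoted in the INDEXING section is what you get if every
point of a lambdas x lambdas grid contributes one element per neighbor that
actually exists (8 for interior points, 5 on an edge, 3 in a corner). This is
an inference that happens to match the documented default, not a description
of the library internals, and the helper name below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts (point, neighbor) pairs in a
 * lambdas x lambdas grid, using the 8 surrounding positions
 * and skipping neighbors that fall outside the grid. */
size_t sketch_signature_length(unsigned int lambdas)
{
    size_t count = 0;

    assert(lambdas > 0); /* a grid needs at least one point */
    for (unsigned int y = 0; y < lambdas; y++) {
        for (unsigned int x = 0; x < lambdas; x++) {
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    long nx = (long) x + dx;
                    long ny = (long) y + dy;

                    if (dx == 0 && dy == 0) {
                        continue; /* a point is not its own neighbor */
                    }
                    if (nx >= 0 && ny >= 0 &&
                        nx < (long) lambdas && ny < (long) lambdas) {
                        count++;
                    }
                }
            }
        }
    }
    return count;
}
```

With lambdas = 9 this counts 49 interior points, 28 edge points and 4 corners.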

PuzzleCvec structures can be compared:

  d = puzzle_vector_normalized_distance(&context, &cvec1, &cvec2, 1);
  
d is the normalized distance between both vectors. If d is below 0.6, pictures
are probably similar.
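
For readers who want to see what a "normalized distance" looks like, here is a
minimal sketch of the definition used in the referenced paper:
||u - v|| / (||u|| + ||v||). It is not the library's code: the function names
are hypothetical, the fix_for_texts adjustment (the last argument of
puzzle_vector_normalized_distance) is not reproduced, and a small Newton
iteration stands in for sqrt() only so that the sketch builds without libm.

```c
#include <assert.h>
#include <stddef.h>

/* Square root by Newton iteration (hypothetical helper, avoids libm). */
static double sketch_sqrt(double x)
{
    double r;

    if (x <= 0.0) {
        return 0.0;
    }
    r = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 60; i++) {
        r = 0.5 * (r + x / r);
    }
    return r;
}

/* Euclidean distance between u and v, divided by the sum of their
 * Euclidean lengths. Returns 0 when both vectors are all zeros. */
double sketch_normalized_distance(const signed char *u,
                                  const signed char *v, size_t n)
{
    double diff2 = 0.0, u2 = 0.0, v2 = 0.0;
    double denom;

    assert(u != NULL && v != NULL);
    for (size_t i = 0; i < n; i++) {
        double d = (double) u[i] - (double) v[i];

        diff2 += d * d;
        u2 += (double) u[i] * (double) u[i];
        v2 += (double) v[i] * (double) v[i];
    }
    denom = sketch_sqrt(u2) + sketch_sqrt(v2);
    return denom == 0.0 ? 0.0 : sketch_sqrt(diff2) / denom;
}
```

Defined this way, the result always lies between 0 (identical vectors) and 1
(opposite vectors), which is why a fixed cut-off such as 0.6 can be used
whatever the signature size.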

If you need further help, feel free to subscribe to the mailing-list (see
below).


          ------------------------ INDEXING ------------------------
         

How do you quickly find similar pictures when there are millions of records?

The original paper has a simple, yet efficient answer.

Cut the vector into fixed-length words. For instance, let's consider the
following vector:

[ a b c d e f g h i j k l m n o p q r s t u v w x y z ]

With a word length (K) of 10, you can get the following words:

[ a b c d e f g h i j ] found at position 0
[ b c d e f g h i j k ] found at position 1
[ c d e f g h i j k l ] found at position 2
etc. until position N-1

Then, index your vector with a compound index of (word + position).

Even with millions of images, K = 10 and N = 100 should be enough to have very
few entries sharing the same index.
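
The word-cutting step above can be sketched as follows. This is only an
illustration: the SketchWord type, the function name and the one-byte position
field (which limits N to 256) are assumptions made for the example, not part
of the libpuzzle API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_K 10 /* word length, as in the example above */

/* One index entry: a word and the position it was found at. */
typedef struct {
    unsigned char position;
    signed char   word[SKETCH_K];
} SketchWord;

/* Copies up to max_words overlapping words of SKETCH_K elements from
 * vec into out, starting at positions 0, 1, 2, ...
 * Returns the number of words written. */
size_t sketch_extract_words(const signed char *vec, size_t vec_len,
                            SketchWord *out, size_t max_words)
{
    size_t available, n;

    assert(vec != NULL && out != NULL && max_words <= 256);
    if (vec_len < SKETCH_K) {
        return 0; /* vector too short to hold a single word */
    }
    available = vec_len - SKETCH_K + 1;
    n = available < max_words ? available : max_words;
    for (size_t pos = 0; pos < n; pos++) {
        out[pos].position = (unsigned char) pos;
        memcpy(out[pos].word, vec + pos, SKETCH_K);
    }
    return n;
}
```

Called on a real signature, vec and vec_len would be the vec and sizeof_vec
fields of a PuzzleCvec, and max_words would be N.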

Here's a very basic sample database schema:

+-----------------------------+
|          signatures         |
+-----------------------------+
| sig_id | signature | pic_id |
+--------+-----------+--------+

+--------------------------+
|           words          |
+--------------------------+
| pos_and_word | fk_sig_id |
+--------------+-----------+

I'd recommend splitting at least the "words" table into multiple tables and/or
servers.
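
To make the "pos_and_word" column more concrete, here is a sketch of one
possible key layout and of the comparison a database lookup would perform.
The layout (one byte of position followed by the K bytes of the word) and both
function names are assumptions chosen for the example; the schema above does
not prescribe them. In a real deployment the matching would be done by the
database index rather than in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_K 10                   /* word length */
#define SKETCH_KEY_LEN (1 + SKETCH_K) /* position byte + word bytes */

/* Builds the compound (position + word) key for the word starting at
 * pos. The caller must make sure pos + SKETCH_K fits inside vec. */
void sketch_pack_key(unsigned char key[SKETCH_KEY_LEN],
                     const signed char *vec, size_t pos)
{
    assert(key != NULL && vec != NULL && pos < 256);
    key[0] = (unsigned char) pos;
    memcpy(key + 1, vec + pos, SKETCH_K);
}

/* Counts the positions, among the first n_words, at which signatures
 * a and b (both vec_len elements long) produce the same key. */
size_t sketch_shared_keys(const signed char *a, const signed char *b,
                          size_t vec_len, size_t n_words)
{
    unsigned char ka[SKETCH_KEY_LEN], kb[SKETCH_KEY_LEN];
    size_t available, shared = 0;

    if (vec_len < SKETCH_K) {
        return 0;
    }
    available = vec_len - SKETCH_K + 1;
    if (n_words > available) {
        n_words = available;
    }
    if (n_words > 256) {
        n_words = 256; /* position must fit in one byte */
    }
    for (size_t pos = 0; pos < n_words; pos++) {
        sketch_pack_key(ka, a, pos);
        sketch_pack_key(kb, b, pos);
        if (memcmp(ka, kb, SKETCH_KEY_LEN) == 0) {
            shared++;
        }
    }
    return shared;
}
```

Signatures sharing at least one key are only candidates: the full signatures
still have to be compared with puzzle_vector_normalized_distance().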

By default (lambdas=9) signatures are 544 bytes long. In order to save storage
space, they can be compressed to one third of their original size through the
puzzle_compress_cvec() function. Before use, they must be uncompressed with
puzzle_uncompress_cvec().
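
Why one third? A plausible explanation, assuming every element of the vector
takes one of five values (-2 to 2, as in the quantization described by the
paper): three such elements have 5 * 5 * 5 = 125 combinations, which fit in a
single byte. The sketch below packs and unpacks elements that way. It only
illustrates the idea; it is not the format produced by puzzle_compress_cvec(),
and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Packs n elements, each assumed to be in -2..2, into out,
 * three elements per byte (base 5, least significant first).
 * Returns the number of bytes written, i.e. (n + 2) / 3. */
size_t sketch_pack(const signed char *vec, size_t n, unsigned char *out)
{
    size_t bytes = 0;

    for (size_t i = 0; i < n; i += 3) {
        unsigned int packed = 0, scale = 1;

        for (size_t j = i; j < i + 3 && j < n; j++) {
            assert(vec[j] >= -2 && vec[j] <= 2);
            packed += (unsigned int) (vec[j] + 2) * scale;
            scale *= 5;
        }
        out[bytes++] = (unsigned char) packed; /* at most 124 */
    }
    return bytes;
}

/* Inverse of sketch_pack(): restores n elements from in. */
void sketch_unpack(const unsigned char *in, size_t n, signed char *vec)
{
    for (size_t i = 0; i < n; i++) {
        unsigned int packed = in[i / 3];

        for (size_t j = 0; j < i % 3; j++) {
            packed /= 5; /* skip the lower base-5 digits */
        }
        vec[i] = (signed char) ((int) (packed % 5) - 2);
    }
}
```

Under that assumption, a 544-element signature packs into 182 bytes.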


         ------------------------ PUZZLE-DIFF ------------------------
         

A command-line tool is also available for scripting or testing.

It is installed as "puzzle-diff" and comes with a man page.

Sample usage:

- Output distance between two images:

$ puzzle-diff pic-a-0.jpg pic-a-1.jpg
0.102286

- Compare two images, exit with 10 if they look the same, exit with 20 if
they don't (may be useful for scripts):

$ puzzle-diff -e pic-a-0.jpg pic-a-1.jpg
$ echo $?
10

- Compute the distance without cropping, using the average intensity of whole
blocks:

$ puzzle-diff -p 1.0 -c pic-a-0.jpg pic-a-1.jpg
0.0523151


  ------------------------ COMPARING IMAGES WITH PHP ------------------------
  
  
A PHP extension is bundled with the Libpuzzle package, and it provides PHP
bindings to most functions of the library.

Documentation for the Libpuzzle PHP extension is available in the README-PHP
file.
         

    ------------------------ APPS USING LIBPUZZLE ------------------------


Here are third-party projects using libpuzzle:

* ftwin - http://jok.is-a-geek.net/ftwin.php
  ftwin is a tool for finding duplicate files on your file system according
to their content.

* Python bindings for libpuzzle: PyPuzzle
  https://github.com/ArchangelSDY/PyPuzzle


           ------------------------ STATUS ------------------------


This project is unfortunately not maintained any more. Pull requests are
always welcome, but I don't use this library any more and I don't have enough
spare time to actively work on it.

libpuzzle's People

Contributors

archangelsdy, benny-, fujidig, jedisct1, waldyrious

libpuzzle's Issues

Installed on the server, server says everything is OK, but no functions work

Basically, I don't know where else to turn. We requested our host master to install libpuzzle with PHP extension. They wrote:
root@node01 [~]# php -i | grep libpuzzle
libpuzzle
libpuzzle support => enabled

Which really doesn't tell me anything. But confirm_libpuzzle_compiled() is failing. Very simply, the functions don't work. We have tried to get the files from http://www.pureftpd.org/project/libpuzzle and from here, but the problem remains.

extension_loaded("libpuzzle") returns true.

But functions are empty, yet PUZZLE_CVEC_SIMILARITY_LOWER_THRESHOLD returns 0.2.

I just don't know where to go from here.

vector_normalized_distance varies depending on size of image

A user of my Perl module, Image::Libpuzzle, has created an issue with the same title in my repo.

357r4bd/Image-Libpuzzle#5

I can reproduce it using puzzle-diff, so it is not an issue with my module (itself a wrapper around libpuzzle).

I am using libpuzzle-0.11 on OSX. I am not sure what OS/lib version the reporter of the original issue (above) is using.

Attached are the images and a script to demonstrate the issue. The script used to generate the scaled versions of the original image is also included (generate.pl).
issue-files.zip

Below are two sets of output, one using my module (compare.pl) and one from a shell script using puzzle-diff (compare.sh):

Using Image::Libpuzzle:

$ ./compare.pl
100: 0.634182997760008
200: 0.61913918736689
300: 0.0698092831405639
400: 0.604140770698438
500: 0.62196043417488
600: 0.0420496681819964
700: 0.605117883422495
800: 0.622808505614176
900: 0.0419672983694311
1000: 0.606714082142257

Using puzzle-diff:

$ ./compare.sh
100: 0.634183
200: 0.619139
300: 0.0698093
400: 0.604141
500: 0.62196
600: 0.0420497
700: 0.605118
800: 0.622809
900: 0.0419673
1000: 0.606714

Please advise. Thank you.

previous_level never updated?

Just browsing the source for dvec.c, in puzzle_autocrop_axis() it seems that previous_level is initialized to zero but never updated. Shouldn't it be updated each time through the inner do loop and then reset afterward? The way it is now, it seems that the chunk_contrast calculation is terribly off...

Transparency is not taken into account

I use puzzle-diff -e -E 0 input.png output.png for all images here.

Some equal images are incorrectly marked unequal.
Some unequal images are incorrectly marked equal.

The following images are not equal according to puzzle-diff:
Applebloom and Babs Seed
Applebloom, Babs Seed and a hidden demon

The following images are equal according to puzzle-diff:
Applebloom and Babs Seed
Only Babs Seed
Only Applebloom

The following images are equal according to puzzle-diff:
disciplinarykill
revolverkill
disciplinarykill.alphaless
caberkill

Everything here has to do with transparency. Fully transparent pixels can still have colors, libpuzzle incorrectly takes these invisible colors into account.

You can see the colors by removing the alpha using image-magick:

convert "./input.png" -alpha off "./alphaless.png"

Failed REGRESS-1 Test

I pulled the .11 version to my Ubuntu system. Compiled the code and ran the tests. The first test did not pass. What information do you need from me to debug this issue?

Missing Configuration Script?

First, thanks for making this software available. I'm looking forward to working with it. However, when I did a 'git clone' of the project, I was unable to run './configure' in the resulting directory. I downloaded the project from http://download.pureftpd.org/pub/pure-ftpd/misc/libpuzzle/releases/libpuzzle-0.11.tar.gz and was able to configure without any issues. Can you update the compilation instructions so that they tell how to compile the github code? Or add the './configure' script and supporting files?

Thanks.

Public download of the compiled lib?

Hello,
I'd like to use this lib with PHP on a 12-factor app (on the Heroku platform).

And I'd like to know whether there is a way to just install (GET) it without recompiling it, providing my version of PHP etc.

Or could it be installed with composer/packagist?

Or is there any other PHP image detection lib that does that?

Thanks ! :)

Segfault on php 5.4.x with ZTS

Hi Frank,

I was trying to compile it to run on PHP with ZTS support, but it's causing a segmentation fault. Is it a known issue? Do you plan to support ZTS on PHP?

Kind regards,

puzzle_fill_cvec_from_file method returns false on a valid image

Hi,
I'm having a problem generating the signature from a JPEG file
using puzzle_fill_cvec_from_file('path/to/file.jpg');
I don't know why, but it returns false instead of a signature (unlike all the other images I am processing).

Has anybody had the same issue?

setting lambdas to anything other than 9 triggers an error

I've tried this with PHP and Python using the C binding, and whenever I use lambdas values other than 9 and do the compress/uncompress pattern, the code throws an error:

BUG File: [compress.c] Line: [59]
Aborted

Works fine in puzzle-diff and if I don't use compression

Maintainer

Is there someone who would like to step up as new maintainer for this project?

IndexError for Ruby FFI wrapper

I'm trying to write a Ruby FFI wrapper around libpuzzle, which you can find here: https://github.com/neezer/ffi-libpuzzle

I'm just trying to get a cvec vector (compressed or uncompressed) for an image, to persist in my database, using the wrapper above like this:

require 'ffi-libpuzzle'
file_path = File.expand_path(File.dirname(__FILE__), 'test/fixtures/test.jpg')
s = Puzzle.new file_path
s.cvec_sig

Which results in this error:

Memory access offset=8 size=1 is out of bounds (IndexError)

I think I have the C struct correctly mirrored in FFI, in lib/ffi-libpuzzle/libpuzzle.rb, but I'm really new to this and don't know C really well. Anyone have any insights as to what I'm doing wrong here?

NOTE: I know that this issue isn't about libpuzzle directly, but I figured that there would be folks more knowledgeable than I am who might know how to make this work, so it seemed worth posting here.
