vitali-fedulov / imagehash

Fast image similarity search with hash tables (Golang). Version 1

License: MIT License

Go 100.00%
image-hash image-hashing image-similarity near-duplicate near-duplicate-detection similar-images similarity similarity-detection similarity-search

imagehash's Introduction

Fast similar image search with Go ➔ LATEST version

This is a fast and RAM-friendly image comparison package based on hash tables, intended for large image collections (thousands of images and more). It finds resized and near-duplicate images.

Demo (image search and clustering)

Algorithm

Go doc

Major (semantic) versions have their own repositories and are mutually incompatible:

Major version | Repository       | Comment
1             | imagehash (this) | has a minor generalization defect, but still very good
2             | imagehash2       | recommended, with the generalization defect fixed

Parameters

When the numBuckets parameter is low (~4), the package serves as a rough pre-filtering first step. A second, precise step with images4 is then needed on the image set produced by the first step. This two-step sequence (imagehash, then images4) is necessary because direct one-to-all comparison with images4 might be slow for very large image collections. For small image sets it is easier to skip the first step altogether.

When numBuckets is very high (~200), run a few tests first: as in the example below, only 10 dimensions (pixels in the Y channel) are used out of the 11×11×3 = 363 pixel values in the icon. Because bucket width is very small at high numBuckets, this could under-represent some images. The under-representation issue is fixed in version 2 (imagehash2).
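The effect of numBuckets and epsPct on a single dimension can be sketched as follows. This is an illustrative model, not the package's code: the function name buckets, the 0..255 brightness range, and the rule that a value within epsPct of a bucket width from a border also counts for the neighbouring bucket are all assumptions made for the sketch.

```go
package main

import (
	"fmt"
	"math"
)

// buckets returns the bucket indices a value may fall into on one axis.
// Assumed model (hypothetical, not taken from the package source):
// [min, max] is split into numBuckets equal buckets, and a value closer
// than epsPct*bucketWidth to a border is also assigned to the
// neighbouring bucket. Intended for epsPct <= 0.5.
func buckets(value, min, max, epsPct float64, numBuckets int) []int {
	width := (max - min) / float64(numBuckets)
	eps := epsPct * width

	// clamp keeps indices inside [0, numBuckets-1] at the range edges.
	clamp := func(i int) int {
		if i < 0 {
			return 0
		}
		if i > numBuckets-1 {
			return numBuckets - 1
		}
		return i
	}

	lo := clamp(int(math.Floor((value - eps - min) / width)))
	hi := clamp(int(math.Floor((value + eps - min) / width)))
	if lo == hi {
		return []int{lo}
	}
	return []int{lo, hi}
}

func main() {
	// 0..255 is an assumed brightness range, used only for illustration.
	for _, v := range []float64{100, 120} {
		fmt.Println(v, buckets(v, 0, 255, 0.25, 4))
	}
}
```

With 4 buckets over 0..255 the bucket width is 63.75. A value of 100 stays in bucket 1, while 120 lies within 25% of a bucket width from the border at 127.5 and is assigned to both buckets 1 and 2. Under this model, every dimension that falls near a border doubles the number of hashes in a query, which is why larger epsPct values can make hash sets expensive.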

Example of comparing 2 photos using imagehash

The demo shows only the hash-based similarity test, without building an actual hash table. A full implementation would store the hashes in a hash table.

package main

import (
	"fmt"
	"github.com/vitali-fedulov/imagehash"
	"github.com/vitali-fedulov/images4"
)

const (
	// Recommended hyper-space parameters for initial trials.

	// Experiment by increasing numBuckets from 4 to 230 or higher.
	// It will make your searches faster, more precise, but maybe too strict.
	// It corresponds to the level of granularity of hyperspace quantization.
	// The higher the value, the more granular is N-space sub-division.
	// This example uses 10-dimensional vectors, splitting the 10-space into
	// 4^10 = 1048576 hypercubes: numBuckets = 4 splits the brightness range
	// of each pixel into 4 buckets. For numBuckets = 230, there will be
	// about 4×10²³ possible hypercubes.
	numBuckets = 4

	// I usually do not change epsPct parameter.
	// epsPct defines the range of uncertainty at hypercube borders,
	// when a nearest similar point may end up in the nearby hypercube,
	// thus having a different hash. The larger the value, the larger
	// the uncertainty range is. Larger values may produce larger hashSets,
	// which could be compute-expensive. 0.25 corresponds to 25% of bucket
	// width.
	epsPct = 0.25
)

func main() {

	// Open and decode photos (skipping error handling for clarity).
	img1, _ := images4.Open("1.jpg")
	img2, _ := images4.Open("2.jpg")

	// Icons are compact image representations needed for comparison.
	icon1 := images4.Icon(img1)
	icon2 := images4.Icon(img2)

	// Hash table values.

	// Value to save to the hash table as a key with corresponding
	// image ids. Table structure: map[centralHash][]imageId.
	// imageId is simply an image number in a directory tree.
	centralHash := imagehash.CentralHash(
		icon1, imagehash.HyperPoints10, epsPct, numBuckets)

	// Hash set to be used as a query to the hash table. Each hash from
	// the hashSet has to be checked against the hash table.
	// See more info in the package "hyper" README.
	hashSet := imagehash.HashSet(
		icon2, imagehash.HyperPoints10, epsPct, numBuckets)

	// Checking hash matches. In full implementation this will
	// be done on the hash table map[centralHash][]imageId.
	foundSimilarImage := false
	for _, hash := range hashSet {
		if centralHash == hash {
			foundSimilarImage = true
			break
		}
	}

	// Image comparison result.
	if foundSimilarImage {
		fmt.Println("Images are approximately similar.")
	} else {
		fmt.Println("Images are distinct.")
	}

	// Then use func Similar of package images4 for final
	// confirmation of image similarity. That is:
	// if images4.Similar(icon1, icon2) {
	//    fmt.Println("Images are definitely similar")
	// }
}

For advanced users

An alternative to using the images4 package is to generate multiple hash sets on different pixel sub-sets of the icon with package imagehash, so that the search results of one hash set can be joined with those of one or more others. Each join operation will improve the result. See the description of var HyperPoints10 to understand how to create such pixel sub-sets.
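One possible reading of the join operation is a set intersection of candidate image ids: an image is kept only if searches on both pixel sub-sets returned it. The sketch below follows that reading. The function name intersect and the example id lists are hypothetical, and ids are assumed to be ints.

```go
package main

import "fmt"

// intersect returns the ids present in both a and b, in the order
// they appear in a, without duplicates.
func intersect(a, b []int) []int {
	inB := make(map[int]bool, len(b))
	for _, id := range b {
		inB[id] = true
	}

	var out []int
	seen := make(map[int]bool)
	for _, id := range a {
		if inB[id] && !seen[id] {
			seen[id] = true
			out = append(out, id)
		}
	}
	return out
}

func main() {
	// Hypothetical candidate ids from two searches, each run with
	// hash sets built on a different pixel sub-set of the icon.
	fromSubsetA := []int{3, 7, 12, 40}
	fromSubsetB := []int{7, 9, 40, 51}

	fmt.Println(intersect(fromSubsetA, fromSubsetB)) // prints [7 40]
}
```

Each further intersection can only shrink the candidate list, which fits the statement that every join improves the result.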
