simhash-py's Introduction

Simhash Near-Duplicate Detection

Status: Production Team: Big Data Scope: External Open Source: MIT Critical: Yes

This library enables the efficient identification of near-duplicate documents using simhash, implemented as a C++ extension.

Usage

simhash differs from most hashes in that its goal is to have two similar documents produce similar hashes, whereas most hashes aim to produce very different outputs even for small changes to the input.

The input to simhash is a list of hashes representative of a document. The output is an unsigned 64-bit integer. The input list of hashes can be produced in several ways, but one common mechanism is to:

  1. tokenize the document
  2. consider overlapping shingles of these tokens (simhash.shingle)
  3. hash these overlapping shingles
  4. input these hashes into simhash.compute

This has the effect of considering phrases in a document, rather than just a bag of the words in it.
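The four steps above can be sketched in pure Python. This is only an illustration of the technique, not the library's implementation: the helper names (`shingles`, `hash64`, `simhash64`, `fingerprint`), the choice of MD5 as the 64-bit hash, and the shingle width of 4 are all assumptions made for the example.

```python
import hashlib
import re


def shingles(tokens, width=4):
    """Yield overlapping windows of `width` consecutive tokens."""
    for i in range(max(len(tokens) - width + 1, 1)):
        yield tokens[i:i + width]


def hash64(text):
    """Hash a string to an unsigned 64-bit integer (MD5 is an arbitrary choice here)."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash64(hashes):
    """Combine 64-bit hashes: each output bit is the majority vote of that bit."""
    counts = [0] * 64
    for h in hashes:
        for bit in range(64):
            counts[bit] += 1 if (h >> bit) & 1 else -1
    result = 0
    for bit, count in enumerate(counts):
        if count > 0:
            result |= 1 << bit
    return result


def fingerprint(text, width=4):
    """Tokenize, shingle, hash each shingle, then combine into one simhash."""
    tokens = re.findall(r"\w+", text.lower())
    return simhash64(hash64(" ".join(s)) for s in shingles(tokens, width))
```

Because each shingle covers several consecutive tokens, changing one word only disturbs the few shingles that contain it, so most of the per-bit votes stay the same.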

Once we've produced a simhash, we would like to compare it to the simhashes of other documents. For two documents to be considered near-duplicates, their simhashes must differ in only a few bits. We can compare two documents:

import simhash

a = simhash.compute(...)
b = simhash.compute(...)
simhash.num_differing_bits(a, b)
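The comparison itself is a Hamming distance between two 64-bit integers. A minimal pure-Python equivalent, assuming nothing about how the extension computes it (the name `differing_bits` is made up for this sketch):

```python
def differing_bits(a, b):
    """Count the bit positions at which two unsigned 64-bit integers differ."""
    return bin((a ^ b) & 0xFFFFFFFFFFFFFFFF).count("1")
```

XOR leaves a 1 exactly where the two inputs disagree, so counting the set bits gives the distance.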

One of the key advantages of simhash is that it does not require O(n^2) time to find all near-duplicate pairs from a set of hashes. Given a whole set of simhashes, we can find all pairs efficiently:

import simhash

# The `simhash`-es from our documents
hashes = []

# Number of blocks to use (more in the next section)
blocks = 4
# Number of bits that may differ in matching pairs
distance = 3
matches = simhash.find_all(hashes, blocks, distance)

The matches returned are guaranteed to be all pairs whose hashes differ by distance bits or fewer. The blocks parameter is less intuitive, but is best described in this article or in the paper. The best value depends on the distribution of the input simhashes, but it must always be at least one greater than the provided distance.
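For checking results on small inputs, a brute-force O(n^2) baseline can be handy. The sketch below is not how the library works; `brute_force_pairs` is a hypothetical helper, and collapsing duplicate values before pairing is a choice made for this example rather than documented behavior of find_all.

```python
from itertools import combinations


def brute_force_pairs(hashes, distance):
    """Return every pair of distinct hash values differing in at most `distance` bits."""
    unique = sorted(set(hashes))
    return [
        (a, b)
        for a, b in combinations(unique, 2)
        if bin(a ^ b).count("1") <= distance
    ]
```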

Internally, find_all takes C(blocks, distance) passes to complete, that is, "blocks choose distance". The idea is that as that value increases (for instance by increasing blocks), each pass completes faster. In terms of memory, find_all takes O(hashes + matches) memory.
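To get a feel for how the number of passes grows, the binomial coefficient can be computed directly. This is plain arithmetic, not a call into the library; `number_of_passes` is a name invented for the sketch, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def number_of_passes(blocks, distance):
    """Number of ways to choose which `distance` blocks are allowed to differ."""
    if blocks <= distance:
        raise ValueError("blocks must be at least distance + 1")
    return comb(blocks, distance)


for blocks in (4, 5, 6, 8):
    print(blocks, number_of_passes(blocks, 3))
```

With distance = 3, going from 4 to 6 blocks raises the pass count from 4 to 20, which is the trade-off against faster individual passes.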

Building

This is installable via pip:

pip install git+https://github.com/seomoz/simhash-py.git

It can also be built from git:

git submodule update --init --recursive
python setup.py install

or

pip install simhash-py

Under OS X, first set the deployment target to match your OS version (10.9, 10.10, ...):

export MACOSX_DEPLOYMENT_TARGET=10.x

Benchmark

This is a rough benchmark, but should help to give you an idea of the order of magnitude for the performance available. Running on a single core on a vagrant instance on a 2015 MacBook Pro:

$ ./bench.py --random 1000000 --blocks 5 --bits 3
Generating 1000000 hashes
Starting Find all
     Ran Find all in 1.595416s

Architecture

Each document gets associated with a 64-bit hash calculated using a rolling hash function and simhash. This hash can be thought of as a fingerprint for the content. Two documents are considered near-duplicates if their hashes differ by at most k bits, a parameter chosen by the user.

In this context, there is a large corpus of known fingerprints, and we would like to determine all the fingerprints that differ from our query by k or fewer bits. To accomplish this, we divide the 64 bits into m blocks, where m is greater than k. If hashes A and B differ by at most k bits, then at most k blocks can differ, so at least m - k blocks are the same.

Choosing all the unique combinations of m - k blocks, we perform a permutation on each of the hashes for the documents so that those blocks are first in the hash. Perhaps a picture would illustrate it better:

63------53|52------42|41-----32|31------21|20------10|09------0|
|    A    |     B    |    C    |     D    |     E    |    F    |

If m = 6, k = 3, we'll choose permutations:
- A B C D E F
- A B D C E F
- A B E C D F
...
- C D F A B E
- C E F A B D
- D E F A B C
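The list of permutations can be generated mechanically: choose every combination of m - k leading blocks and append the remaining blocks in their original order. A short sketch using only the standard library (`block_permutations` is a name made up for this example):

```python
from itertools import combinations


def block_permutations(names, k):
    """Yield block orderings with every choice of len(names) - k leading blocks."""
    lead_count = len(names) - k
    for lead in combinations(names, lead_count):
        rest = [name for name in names if name not in lead]
        yield list(lead) + rest


perms = list(block_permutations("ABCDEF", 3))
for perm in perms:
    print(" ".join(perm))
```

For m = 6 and k = 3 this yields 20 orderings, starting with A B C D E F and ending with D E F A B C.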

This generates a number of tables that can be put into sorted order, and then a small range of candidates can be found in each of those tables for a query, and then each candidate in that range can be compared to our query.
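The sorted-table lookup can be sketched in pure Python with `bisect`. This is an illustration of the idea under stated assumptions, not the C++ implementation: all function names are hypothetical, the tables are plain sorted lists, and `block_layout` splits the 64 bits as evenly as possible, which may not match the exact block boundaries the library uses.

```python
from bisect import bisect_left, bisect_right
from itertools import combinations

BITS = 64
MAX_HASH = (1 << BITS) - 1


def block_layout(m):
    """Split 64 bits into m contiguous (shift, width) blocks, most significant first."""
    base, extra = divmod(BITS, m)
    layout, top = [], BITS
    for i in range(m):
        width = base + (1 if i < extra else 0)
        top -= width
        layout.append((top, width))
    return layout


def permute(h, layout, order):
    """Rearrange the blocks of h so that they appear in the given order."""
    out = 0
    for index in order:
        shift, width = layout[index]
        out = (out << width) | ((h >> shift) & ((1 << width) - 1))
    return out


def build_tables(corpus, m, k):
    """Build one sorted table per choice of m - k leading blocks."""
    layout = block_layout(m)
    tables = []
    for lead in combinations(range(m), m - k):
        order = list(lead) + [i for i in range(m) if i not in lead]
        low_bits = BITS - sum(layout[i][1] for i in lead)
        rows = sorted((permute(h, layout, order), h) for h in corpus)
        tables.append((order, low_bits, rows))
    return layout, tables


def search(layout, tables, query, k):
    """Return corpus hashes within k bits of query, scanning a small range per table."""
    found = set()
    for order, low_bits, rows in tables:
        q = permute(query, layout, order)
        low = (q >> low_bits) << low_bits        # leading blocks kept, rest zeroed
        high = low | ((1 << low_bits) - 1)       # leading blocks kept, rest all ones
        start = bisect_left(rows, (low, 0))
        end = bisect_right(rows, (high, MAX_HASH))
        for _, h in rows[start:end]:
            if bin(h ^ query).count("1") <= k:   # verify each candidate exactly
                found.add(h)
    return found
```

Only rows whose leading m - k blocks equal the query's are compared bit by bit, which is what keeps each lookup to a small range of candidates.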

The corpus is represented by the union of these tables, and could conceivably be hosted on a separate machine. Each of these tables is also amenable to sharding, where each shard would comprise a contiguous range of numbers. For example, you might divide a table into 256 shards, one for each possible first byte.
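Sharding by first byte amounts to taking the top 8 bits of the permuted hash as the shard number. A minimal sketch, where `shard_for` and its `shard_bits` parameter are hypothetical names used only for illustration:

```python
def shard_for(permuted_hash, shard_bits=8):
    """Map a permuted 64-bit hash to a shard index using its most significant bits."""
    return permuted_hash >> (64 - shard_bits)
```

Because the shard index comes from the most significant bits, each shard holds a contiguous range of the sorted table.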

The best partitioning remains to be seen, likely from experimentation, but the basis of this is the table. The table tracks hashes inserted into it subject to a permutation associated with the table. This permutation is described as a vector of bitmasks of contiguous bit ranges, whose populations sum to 64.
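As an illustration of such a vector of bitmasks, the six blocks from the picture above can be written as masks over contiguous bit ranges. The names `BLOCK_RANGES`, `range_mask` and `MASKS` are invented for this sketch; the library's internal representation may differ.

```python
# (high bit, low bit) for blocks A through F, as drawn in the picture above
BLOCK_RANGES = [(63, 53), (52, 42), (41, 32), (31, 21), (20, 10), (9, 0)]


def range_mask(high, low):
    """Build a mask with ones in bit positions low..high inclusive."""
    return ((1 << (high - low + 1)) - 1) << low


MASKS = [range_mask(high, low) for high, low in BLOCK_RANGES]

# The populations (number of set bits) of the masks sum to 64
total_bits = sum(bin(mask).count("1") for mask in MASKS)
```

Reordering this list gives a different permutation over the same blocks.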

Example

Let's suppose that our corpus has a fingerprint:

0100101110111011001000101111101110111100001010011101100110110101

and we have a query:

0100101110111011011000101111101110011100001010011100100110110101

and they differ by only three bits which happen to fall in blocks B, D and E:

63------53|52------42|41-----32|31------21|20------10|09------0|
|    A    |     B    |    C    |     D    |     E    |    F    |
|         |          |         |          |          |         |
0000000000000000010000000000000000100000000000000001000000000000

Since any fingerprint matching the query differs by at most 3 bits, at most 3 blocks can differ, and at least 3 must match. Whatever table has the 3 blocks that do not differ as the leading blocks will match the query when doing a scan. In this case, the table that's permuted A C F B D E will match. It's important to note that it's possible for a query to match from more than one table. For example, if two of the non-matching bits are in the same block, or the query differs by fewer than 3 bits.
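The example can be checked with a few lines of Python. The block boundaries are taken from the picture above; the variable and function names are chosen only for this sketch.

```python
fingerprint = int("0100101110111011001000101111101110111100001010011101100110110101", 2)
query = int("0100101110111011011000101111101110011100001010011100100110110101", 2)

# (high bit, low bit) for each block, as drawn in the picture
BLOCKS = {"A": (63, 53), "B": (52, 42), "C": (41, 32),
          "D": (31, 21), "E": (20, 10), "F": (9, 0)}


def block_of(bit):
    """Return the name of the block containing the given bit position."""
    for name, (high, low) in BLOCKS.items():
        if low <= bit <= high:
            return name
    raise ValueError("bit out of range")


diff = fingerprint ^ query
differing_bits = [bit for bit in range(64) if (diff >> bit) & 1]
changed = sorted({block_of(bit) for bit in differing_bits})
unchanged = [name for name in BLOCKS if name not in changed]

print(differing_bits, changed, unchanged)
```

The unchanged blocks are the ones that must lead the table in which the query and the fingerprint end up adjacent.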

32-Bit Systems

The only requirement of simhash-py is that it has uint64_t.

simhash-py's People

Contributors

acteq, b4hand, oyiptong, rmax, rth, scriptedworld

simhash-py's Issues

simhash.find_all

I installed this package on a Windows system with a Python 3.5 environment. The function "simhash.num_differing_bits" can be called, but when I call "simhash.find_all", it produces errors and does not return a result.

Create PyPi release

As brought up in #32, we should have a public pypi release available.

However, there's already a pypi package called simhash, but historically, this project has used import simhash despite the name conflict.

I'm not sure how much of a problem it is to have the package name differ from the module name, but we should figure out some way to get this published. That may involve renaming the package, if necessary.

Results of simhash.find_all()

I have the following code where 'a' and 'b' have the same value. However, they are not considered as simhash pairs by the simhash.find_all() method.

a = 8550830854347186281
b = 8550830854347186281

print ("Inputs differ in "+str(simhash.num_differing_bits(a, b))+" bits.")

all_simhash_pairs = simhash.find_all([a,b], 2, 1)
print ("Simhash Pair counts = "+str(len(all_simhash_pairs)))

Shouldn't they be included in the results? Or, maybe I didn't understand simhashing properly.

find_all() returns wrong results

import simhash

corpus = simhash.Corpus(6, 3)
corpus.insert(1)
corpus.insert(2)
corpus.find_all(2)

Running this code returns:
[2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L]

test_get_all fails

For some reason, find/find_all works well, but calling hashes() on a corpus results in completely wrong hashes being returned.

Running OSX 10.10.3 with Python 2.7.6, just cloned from the repo and followed the instructions.

Could it have something to do with 64 vs 32bit when compiling, which I am getting errors about?

During compile I get a lot of these errors:

warning: implicit conversion loses integer precision: 'uint64_t' (aka 'unsigned long long') to 'Simhash::hash_t' (aka 'unsigned long') [-Wshorten-64-to-32]

trouble building

I'm trying to compile simhash-py on Mac OS X 10.8 and it doesn't seem to be working for me.

I've got:
Python 2.7.3 installed via pythonbrew
A virtualenv managed by pythonbrew
Cython installed via pip in the venv
Judy installed via brew

Running the setup.py from simhash-py just gives me sadness.

Here's the pip freeze output:

$ pip freeze
Cython==0.19.1
PyYAML==3.10
bottle==0.11.6
gevent==0.13.8
greenlet==0.4.1
msgpack-python==0.3.0
pyzmq==13.1.0
requests==1.2.3
smhcluster==0.1.0
wsgiref==0.1.2
zerorpc==0.4.3

$ brew info judy
judy: stable 1.0.5
http://judy.sourceforge.net/
/usr/local/Cellar/judy/1.0.5 (101 files, 1.6M) *
Built from source
https://github.com/mxcl/homebrew/commits/master/Library/Formula/judy.rb

The terminal output for setup.py build is at: https://gist.github.com/oyiptong/cd70b79fc5a3692828d9

Conda-forge package

As mentioned in issue #32, it would be useful to submit simhash-py to https://conda-forge.github.io/
so that it could be simply installed with

conda install -c conda-forge simhash-py

without the need to compile the C++ code.

Once a pull request with the setup is submitted at https://github.com/conda-forge/staged-recipes and accepted, it would create a repository https://github.com/conda-forge/simhash-py-feedstock that would have to be updated for every new release. This should make it possible to create a pre-built Python package for Linux and possibly Mac OS (a Windows install won't be available for now, cf. #31). The questions I was wondering about are:

  • who should make that PR submission (I can do it, if necessary)
  • who should be added to the maintainers of that conda-forge/simhash-py-feedstock setup ?
  • there is still the naming issue discussed in issue #32: the package would be called simhash-py but would be imported as simhash, since a simhash package already exists at https://github.com/conda-forge/simhash-feedstock . Not sure if there is a better way of doing it?

@b4hand What do you think?

seems like find_all return error

hashes= [193817277094257410, 193817277094257410, 10105197385570471215]
matches = simhash.find_all(hashes, 4, 3)
print(matches)

the result is an empty list [], seems like something wrong in the code

License?

@dlecocq Another similar question: what is the license for your fine code?

Installation instructions not working

The readme currently indicates that simhash-py can be installed with,

pip install simhash-py

however there is no such package at https://pypi.python.org/pypi and this command fails with "No matching distribution found for simhash-py".

Besides, the latest releases at https://github.com/seomoz/simhash-py/releases are from 2012.

I am currently trying to submit simhash-py to https://conda-forge.github.io/ so that it could be installed using,

conda install -c conda-forge simhash-py

without the need of a compiler.

I can't find any official URL from which a .tar.gz with simhash-py can be downloaded (except for cloning the github repo)...

Would it be possible to upload it to PyPI, or at least push a tag to create a new release on GitHub? Thank you!

simhash.Corpus constructor hangs the program on Windows 8

Hi,

I have managed to install simhash-py on Windows 8. However, as in #2 and #3, running simhash.Corpus(6,3) hangs the program. Here's the environment description:

  1. OS: Windows 8 Enterprise, x64,
  2. Python: 2.7.8, x64,
  3. Cython: 0.21.2;
  4. Judy - compiled with VS 12.0 C++ compiler targeting amd64. Then copied manually to Python directory;

As a side note, I had to edit Jenkins.h - removed importing "sys/param.h", then simplified determining big vs small endianness (Windows is little endian in all of my environments):

 # define HASH_LITTLE_ENDIAN 1
 # define HASH_BIG_ENDIAN 0

Is simhash-cpp 100x faster than simhash-py?

Hello:

Thanks very much for sharing this great code! Your work is wonderful.
I have a question about the efficiency of simhash-cpp and simhash-py.

I installed simhash-cpp (https://github.com/seomoz/simhash-cpp) and simhash-py (https://github.com/seomoz/simhash-py) and ran the benchmark. I got the following results:
(1) simhash-cpp:
../simhash-cpp/src$ ./bench 1000000
blocks=6, bits=3
Inserting 1000000 hashes...
Running 4000000 queries...
Queries complete with 0 errors
Running time: total=0.705171s, avg=0.17629275us
There are 9999999 items in the table

(2) simhash-py:
../simhash-py/bench.py --random 1000000 --blocks 6 --bits 3
Generating 1000000 hashes
Generating 1000000 queries
Starting Bulk Insertion
Ran Bulk Insertion in 7.518402s, avg: 7.518402us
Starting Bulk Find First
Ran Bulk Find First in 13.021438s, avg: 13.021438us
Starting Bulk Find All
Ran Bulk Find All in 14.687295s, avg: 14.687295us
Starting Bulk Removal
Ran Bulk Removal in 8.982185s, avg: 8.982185us

Based on the above results, the average times per query over 1000000 hashes are:
simhash-cpp is 0.17629275us and simhash-py is 13.021438us.
So simhash-cpp is about 100x faster than simhash-py. However, I checked the code of simhash-py and found that it is actually built on simhash-cpp. In my view, simhash-py is just a Python wrapper around simhash-cpp. So I think simhash-py should be slower than simhash-cpp, but the difference should not be as much as 100x.
My question is why simhash-cpp is about 100x faster than simhash-py.
I don't know if my understanding is right, or if I missed something. If I got something wrong, please correct me!

Thanks!

Explanation in README.md should be more descriptive

This project is awesome, but when I try to use it for my purpose, detecting near-duplicate documents (e.g. JSON), I can't find enough information on how to do that. The README only shows how to compute:

import simhash

a = simhash.compute(...) 
b = simhash.compute(...)
simhash.num_differing_bits(a, b)

OR how to find matches using

import simhash

hashes = []
blocks = 4
distance = 3
matches = simhash.find_all(hashes, blocks, distance)

but before that how can I make hashes of my documents? Can anyone update the README.md or post a full step by step example/tutorial to implement this simhash using python?

output of simhash.compute method

I printed the output of the simhash.compute() method, both its type and value. I noticed that the type is integer and the value is a 19-digit number (e.g. 8550830854347186281). Shouldn't it be a 64-digit fingerprint consisting of only 0s and 1s?

Install failure

Hi, I'm trying to install your awesome software.

Zichuans-MacBook-Pro:simhash-py zichuanwang$ sudo python setup.py install
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'dependencies'
warnings.warn(msg)
running install
running build
running build_py
running build_ext
skipping 'simhash/table.cpp' Cython extension (up-to-date)
building 'simhash.table' extension
cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c simhash/table.cpp -o build/temp.macosx-10.9-intel-2.7/simhash/table.o
cc1plus: warning: command line option "-Wstrict-prototypes" is valid for C/ObjC but not for C++
simhash/table.cpp:235:37: error: simhash-cpp/src/simhash.h: No such file or directory
simhash/table.cpp:238:36: error: simhash-cpp/src/hash.hpp: No such file or directory
simhash/table.cpp:438: error: ‘Simhash’ has not been declared
simhash/table.cpp:438: error: ISO C++ forbids declaration of ‘Table’ with no type
simhash/table.cpp:438: error: expected ‘;’ before ‘’ token
simhash/table.cpp:439: error: ‘Simhash’ has not been declared
simhash/table.cpp:439: error: ISO C++ forbids declaration of ‘hash_t’ with no type
simhash/table.cpp:439: error: expected ‘;’ before ‘search_mask’
simhash/table.cpp:470: error: ‘Simhash’ has not been declared
simhash/table.cpp:472: error: ‘Simhash’ has not been declared
simhash/table.cpp:473: error: ‘Simhash’ has not been declared
simhash/table.cpp:473: error: expected identifier before ‘
’ token
simhash/table.cpp:473: error: ‘Simhash’ has not been declared
simhash/table.cpp:473: error: ISO C++ forbids declaration of ‘hash_t’ with no type
simhash/table.cpp:473: error: ‘hash_t’ declared as function returning a function
simhash/table.cpp:475: error: ‘Simhash’ has not been declared
simhash/table.cpp:477: error: ‘Simhash’ has not been declared
simhash/table.cpp:478: error: ‘Simhash’ has not been declared
simhash/table.cpp:498: error: ‘Simhash’ has not been declared
simhash/table.cpp:500: error: ‘Simhash’ has not been declared
simhash/table.cpp:501: error: ‘Simhash’ has not been declared
simhash/table.cpp:501: error: ‘Simhash’ has not been declared
simhash/table.cpp:743: error: ‘Simhash’ has not been declared
simhash/table.cpp:743: error: expected ‘,’ or ‘...’ before ‘pyx_v_h’
simhash/table.cpp:745: error: ‘Simhash’ has not been declared
simhash/table.cpp:745: error: expected ‘,’ or ‘...’ before ‘__pyx_v_h’
simhash/table.cpp:746: error: ‘Simhash’ has not been declared
simhash/table.cpp:746: error: expected initializer before ‘__pyx_f_7simhash_5table_7PyTable_find_first’
simhash/table.cpp:748: error: ‘Simhash’ has not been declared
simhash/table.cpp:748: error: expected ‘,’ or ‘...’ before ‘__pyx_v_query’
simhash/table.cpp:750: error: ‘Simhash’ has not been declared
simhash/table.cpp:750: error: expected ‘,’ or ‘...’ before ‘__pyx_v_query’
simhash/table.cpp:751: error: ‘Simhash’ has not been declared
simhash/table.cpp:751: error: expected ‘,’ or ‘...’ before ‘__pyx_v_query’
simhash/table.cpp:758: error: ‘Simhash’ has not been declared
simhash/table.cpp:758: error: expected ‘,’ or ‘...’ before ‘__pyx_v_query’
simhash/table.cpp:760: error: ‘Simhash’ has not been declared
simhash/table.cpp:760: error: expected ‘,’ or ‘...’ before ‘__pyx_v_query’
simhash/table.cpp:761: error: ‘Simhash’ has not been declared
simhash/table.cpp:761: error: expected ‘,’ or ‘...’ before ‘__pyx_v_a’
simhash/table.cpp:789: error: ‘Simhash’ has not been declared
simhash/table.cpp:789: error: expected ‘,’ or ‘...’ before ‘__pyx_v_h’
simhash/table.cpp:791: error: ‘Simhash’ has not been declared
simhash/table.cpp:791: error: expected ‘,’ or ‘...’ before ‘__pyx_v_h’
simhash/table.cpp:792: error: ‘Simhash’ has not been declared
simhash/table.cpp:792: error: expected ‘,’ or ‘...’ before ‘__pyx_v_query’
simhash/table.cpp:794: error: ‘Simhash’ has not been declared
simhash/table.cpp:794: error: expected ‘,’ or ‘...’ before ‘__pyx_v_query’
simhash/table.cpp:796: error: ‘Simhash’ has not been declared
simhash/table.cpp:796: error: expected ‘,’ or ‘...’ before ‘__pyx_v_query’
simhash/table.cpp:797: error: ‘Simhash’ has not been declared
simhash/table.cpp:797: error: expected ‘,’ or ‘...’ before ‘__pyx_v_query’
simhash/table.cpp:806: error: ‘Simhash’ has not been declared
simhash/table.cpp:806: error: expected ‘,’ or ‘...’ before ‘__pyx_v_query’
simhash/table.cpp:808: error: ‘Simhash’ has not been declared
simhash/table.cpp:808: error: expected ‘,’ or ‘...’ before ‘__pyx_v_query’
simhash/table.cpp:809: error: ‘Simhash’ has not been declared
simhash/table.cpp:809: error: expected ‘,’ or ‘...’ before ‘__pyx_v_a’
simhash/table.cpp: In function ‘PyObject* _pyx_f_7simhash_5table_PyHash(PyObject, int)’:
simhash/table.cpp:901: error: ‘Simhash’ has not been declared
simhash/table.cpp:901: error: ‘Simhash’ has not been declared
simhash/table.cpp:901: error: ‘_pyx_v_hasher’ was not declared in this scope
simhash/table.cpp: In function ‘PyObject
__pyx_f_7simhash_5table_PyHashFp(PyObject
, int)’:
simhash/table.cpp:1312: error: ‘Simhash’ has not been declared
simhash/table.cpp:1312: error: ‘Simhash’ has not been declared
simhash/table.cpp:1312: error: ‘__pyx_v_hasher’ was not declared in this scope
simhash/table.cpp: In function ‘int __pyx_pf_7simhash_5table_7PyTable___cinit
*(pyx_obj_7simhash_5table_PyTable_, PyObject_, PyObject)’:
simhash/table.cpp:1682: error: ‘Simhash’ was not declared in this scope
simhash/table.cpp:1682: error: template argument 1 is invalid
simhash/table.cpp:1682: error: template argument 2 is invalid
simhash/table.cpp:1682: error: invalid type in declaration before ‘;’ token
simhash/table.cpp:1690: error: ‘Simhash’ is not a class or namespace
simhash/table.cpp:1690: error: expected ‘;’ before ‘__pyx_t_5’
simhash/table.cpp:1751: error: ‘__pyx_t_5’ was not declared in this scope
simhash/table.cpp:1751: error: ‘Simhash’ is not a class or namespace
simhash/table.cpp:1752: error: request for member ‘push_back’ in ‘__pyx_v_perms’, which is of non-class type ‘int’
simhash/table.cpp:1772: error: ‘struct __pyx_obj_7simhash_5table_PyTable’ has no member named ‘tbl’
simhash/table.cpp:1772: error: expected type-specifier before ‘Simhash’
simhash/table.cpp:1772: error: expected ‘;’ before ‘Simhash’
simhash/table.cpp:1781: error: ‘struct __pyx_obj_7simhash_5table_PyTable’ has no member named ‘search_mask’
simhash/table.cpp:1781: error: ‘struct __pyx_obj_7simhash_5table_PyTable’ has no member named ‘tbl’
simhash/table.cpp: In function ‘void __pyx_pf_7simhash_5table_7PyTable_2__dealloc
*(pyx_obj_7simhash_5table_PyTable_)’:
simhash/table.cpp:1835: error: ‘struct _pyx_obj_7simhash_5table_PyTable’ has no member named ‘tbl’
simhash/table.cpp: In function ‘PyObject
__pyx_f_7simhash_5table_7PyTable_hashes(_pyx_obj_7simhash_5table_PyTable, int)’:
simhash/table.cpp:1860: error: ‘Simhash’ has not been declared
simhash/table.cpp:1860: error: expected ‘;’ before ‘__pyx_v_it’
simhash/table.cpp:1927: error: ‘__pyx_v_it’ was not declared in this scope
simhash/table.cpp:1927: error: ‘struct __pyx_obj_7simhash_5table_PyTable’ has no member named ‘tbl’
simhash/table.cpp:1937: error: ‘struct __pyx_obj_7simhash_5table_PyTable’ has no member named ‘tbl’
simhash/table.cpp: In function ‘PyObject* __pyx_f_7simhash_5table_7PyTable_insert_bulk(__pyx_obj_7simhash_5table_PyTable*, PyObject*, int)’:
simhash/table.cpp:2056: error: ‘Simhash’ has not been declared
simhash/table.cpp:2056: error: expected ‘;’ before ‘__pyx_t_8’
simhash/table.cpp:2157: error: ‘__pyx_t_8’ was not declared in this scope
simhash/table.cpp:2157: error: ‘Simhash’ has not been declared
simhash/table.cpp:2158: error: ‘struct __pyx_obj_7simhash_5table_PyTable’ has no member named ‘tbl’
simhash/table.cpp: At global scope:
simhash/table.cpp:2245: error: ‘Simhash’ has not been declared
simhash/table.cpp:2245: error: expected ‘,’ or ‘...’ before ‘__pyx_v_h’
simhash/table.cpp: In function ‘PyObject
__pyx_f_7simhash_5table_7PyTable_insert(__pyx_obj_7simhash_5table_PyTable
, int)’:
simhash/table.cpp:2259: error: ‘__pyx_skip_dispatch’ was not declared in this scope
simhash/table.cpp:2266: error: ‘__pyx_v_h’ was not declared in this scope
simhash/table.cpp:2310: error: ‘struct __pyx_obj_7simhash_5table_PyTable’ has no member named ‘tbl’
simhash/table.cpp:2310: error: ‘__pyx_v_h’ was not declared in this scope
simhash/table.cpp: In function ‘PyObject* __pyx_pw_7simhash_5table_7PyTable_9insert(PyObject*, PyObject*)’:
simhash/table.cpp:2342: error: ‘Simhash’ has not been declared
simhash/table.cpp:2342: error: expected ‘;’ before ‘__pyx_v_h’
simhash/table.cpp:2350: error: ‘__pyx_v_h’ was not declared in this scope
simhash/table.cpp:2350: error: ‘Simhash’ has not been declared
simhash/table.cpp:2358: error: ‘Simhash’ has not been declared
simhash/table.cpp:2358: error: expected ‘)’ before ‘__pyx_v_h’
simhash/table.cpp: At global scope:
simhash/table.cpp:2365: error: ‘Simhash’ has not been declared
simhash/table.cpp:2365: error: expected ‘,’ or ‘...’ before ‘__pyx_v_h’
simhash/table.cpp: In function ‘PyObject* __pyx_pf_7simhash_5table_7PyTable_8insert(__pyx_obj_7simhash_5table_PyTable*, int)’:
simhash/table.cpp:2374: error: ‘__pyx_v_h’ was not declared in this scope
simhash/table.cpp: In function ‘PyObject* __pyx_f_7simhash_5table_7PyTable_remove_bulk(__pyx_obj_7simhash_5table_PyTable*, PyObject*, int)’:
simhash/table.cpp:2411: error: ‘Simhash’ has not been declared
simhash/table.cpp:2411: error: expected ‘;’ before ‘__pyx_t_8’
simhash/table.cpp:2512: error: ‘__pyx_t_8’ was not declared in this scope
simhash/table.cpp:2512: error: ‘Simhash’ has not been declared
simhash/table.cpp:2513: error: ‘struct __pyx_obj_7simhash_5table_PyTable’ has no member named ‘tbl’
simhash/table.cpp: At global scope:
simhash/table.cpp:2600: error: ‘Simhash’ has not been declared
simhash/table.cpp:2600: error: expected ‘,’ or ‘...’ before ‘__pyx_v_h’
simhash/table.cpp: In function ‘PyObject* __pyx_f_7simhash_5table_7PyTable_remove(__pyx_obj_7simhash_5table_PyTable*, int)’:
simhash/table.cpp:2614: error: ‘__pyx_skip_dispatch’ was not declared in this scope
simhash/table.cpp:2621: error: ‘__pyx_v_h’ was not declared in this scope
simhash/table.cpp:2665: error: ‘struct __pyx_obj_7simhash_5table_PyTable’ has no member named ‘tbl’
simhash/table.cpp:2665: error: ‘__pyx_v_h’ was not declared in this scope
simhash/table.cpp: In function ‘PyObject* __pyx_pw_7simhash_5table_7PyTable_13remove(PyObject*, PyObject*)’:
simhash/table.cpp:2697: error: ‘Simhash’ has not been declared
simhash/table.cpp:2697: error: expected ‘;’ before ‘__pyx_v_h’
simhash/table.cpp:2705: error: ‘__pyx_v_h’ was not declared in this scope
simhash/table.cpp:2705: error: ‘Simhash’ has not been declared
simhash/table.cpp:2713: error: ‘Simhash’ has not been declared
simhash/table.cpp:2713: error: expected)' before ‘__pyx_v_h’ simhash/table.cpp: At global scope: simhash/table.cpp:2720: error: ‘Simhash’ has not been declared simhash/table.cpp:2720: error: expected ‘,’ or ‘...’ before ‘__pyx_v_h’ simhash/table.cpp: In function ‘PyObject\* __pyx_pf_7simhash_5table_7PyTable_12remove(__pyx_obj_7simhash_5table_PyTable_, int)’: simhash/table.cpp:2729: error: ‘__pyx_v_h’ was not declared in this scope simhash/table.cpp: At global scope: simhash/table.cpp:2755: error: ‘Simhash’ has not been declared simhash/table.cpp:2755: error: expected initializer before ‘__pyx_f_7simhash_5table_7PyTable_find_first’ simhash/table.cpp:583: warning: inline function ‘PyObject_ __Pyx_GetModuleGlobalName(PyObject_)’ used but never defined simhash/table.cpp:586: warning: inline function ‘PyObject_ __Pyx_PyObject_Call(PyObject_, PyObject_, PyObject_)’ used but never defined simhash/table.cpp:317: warning: inline function ‘PyObject_ __Pyx_PyInt_FromSize_t(size_t)’ used but never defined simhash/table.cpp:725: warning: inline function ‘size_t __Pyx_PyInt_As_size_t(PyObject_)’ used but never defined simhash/table.cpp:278: warning: inline function ‘char_ __Pyx_PyObject_AsString(PyObject_)’ used but never defined simhash/table.cpp:316: warning: inline function ‘Py_ssize_t __Pyx_PyIndex_AsSsize_t(PyObject_)’ used but never defined simhash/table.cpp:596: warning: inline function ‘void __Pyx_ExceptionSwap(PyObject**, PyObject**, PyObject**)’ used but never defined simhash/table.cpp:592: warning: inline function ‘void __Pyx_ErrFetch(PyObject**, PyObject**, PyObject**)’ used but never defined simhash/table.cpp:591: warning: inline function ‘void __Pyx_ErrRestore(PyObject_, PyObject_, PyObject_)’ used but never defined simhash/table.cpp:727: warning: inline function ‘PyObject_ __Pyx_PyInt_From_unsigned_PY_LONG_LONG(long long unsigned int)’ used but never defined simhash/table.cpp:729: warning: inline function ‘int64_t __Pyx_PyInt_As_int64_t(PyObject_)’ used but never 
defined simhash/table.cpp:731: warning: inline function ‘int __Pyx_PyInt_As_int(PyObject_)’ used but never defined simhash/table.cpp:723: warning: inline function ‘uint64_t __Pyx_PyInt_As_uint64_t(PyObject_)’ used but never defined simhash/table.cpp:614: warning: inline function ‘PyObject_ __Pyx_PyObject_CallOneArg(PyObject_, PyObject_)’ used but never defined simhash/table.cpp:617: warning: inline function ‘PyObject\* __Pyx_PyObject_CallNoArg(PyObject_)’ used but never defined simhash/table.cpp:719: warning: inline function ‘PyObject_ __Pyx_PyInt_From_uint64_t(uint64_t)’ used but never defined simhash/table.cpp:640: warning: inline function ‘int __Pyx_PyObject_Append(PyObject_, PyObject_)’ used but never defined simhash/table.cpp:279: warning: ‘char\* __Pyx_PyObject_AsStringAndSize(PyObject_, Py_ssize_t_)’ declared ‘static’ but never defined simhash/table.cpp:284: warning: ‘PyObject\* __Pyx_PyUnicode_FromString(const char_)’ declared ‘static’ but never defined simhash/table.cpp:314: warning: ‘int __Pyx_PyObject_IsTrue(PyObject_)’ declared ‘static’ but never defined simhash/table.cpp:315: warning: ‘PyObject\* __Pyx_PyNumber_Int(PyObject_)’ declared ‘static’ but never defined simhash/table.cpp:409: warning: ‘__pyx_m’ defined but not used simhash/table.cpp:410: warning: ‘__pyx_d’ defined but not used simhash/table.cpp:411: warning: ‘__pyx_b’ defined but not used simhash/table.cpp:412: warning: ‘__pyx_empty_tuple’ defined but not used simhash/table.cpp:413: warning: ‘__pyx_empty_bytes’ defined but not used simhash/table.cpp:414: warning: ‘__pyx_lineno’ defined but not used simhash/table.cpp:415: warning: ‘__pyx_clineno’ defined but not used simhash/table.cpp:416: warning: ‘__pyx_cfilenm’ defined but not used simhash/table.cpp:417: warning: ‘__pyx_filename’ defined but not used simhash/table.cpp:480: warning: ‘__pyx_vtabptr_7simhash_5table_PyTable’ defined but not used simhash/table.cpp:503: warning: ‘__pyx_vtabptr_7simhash_5table_PyCorpus’ defined but not used 
simhash/table.cpp:581: warning: ‘PyObject* __Pyx_GetBuiltinName(PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:594: warning: ‘int __Pyx_GetException(PyObject**, PyObject**, PyObject**)’ declared ‘static’ but never defined
simhash/table.cpp:598: warning: ‘void __Pyx_ExceptionSave(PyObject**, PyObject**, PyObject**)’ declared ‘static’ but never defined
simhash/table.cpp:599: warning: ‘void __Pyx_ExceptionReset(PyObject*, PyObject*, PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:602: warning: ‘void __Pyx_RaiseArgtupleInvalid(const char*, int, Py_ssize_t, Py_ssize_t, Py_ssize_t)’ declared ‘static’ but never defined
simhash/table.cpp:604: warning: ‘void __Pyx_RaiseDoubleKeywordsError(const char*, PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:606: warning: ‘int __Pyx_ParseOptionalKeywords(PyObject*, PyObject***, PyObject*, PyObject**, Py_ssize_t, const char*)’ declared ‘static’ but never defined
simhash/table.cpp:611: warning: ‘PyObject* __Pyx_PyObject_CallMethO(PyObject*, PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:638: warning: ‘PyObject* __Pyx_PyObject_CallMethod1(PyObject*, PyObject*, PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:644: warning: ‘void __Pyx_WriteUnraisable(const char*, int, int, const char*, int)’ declared ‘static’ but never defined
simhash/table.cpp:677: warning: ‘PyObject* __Pyx_GetItemInt_List_Fast(PyObject*, Py_ssize_t, int, int)’ declared ‘static’ but never defined
simhash/table.cpp:683: warning: ‘PyObject* __Pyx_GetItemInt_Tuple_Fast(PyObject*, Py_ssize_t, int, int)’ declared ‘static’ but never defined
simhash/table.cpp:684: warning: ‘PyObject* __Pyx_GetItemInt_Generic(PyObject*, PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:686: warning: ‘PyObject* __Pyx_GetItemInt_Fast(PyObject*, Py_ssize_t, int, int, int)’ declared ‘static’ but never defined
simhash/table.cpp:700: warning: ‘int __Pyx_SetVtable(PyObject*, void*)’ declared ‘static’ but never defined
simhash/table.cpp:711: warning: ‘__pyx_code_cache’ defined but not used
simhash/table.cpp:712: warning: ‘int __pyx_bisect_code_objects(__Pyx_CodeObjectCacheEntry*, int, int)’ declared ‘static’ but never defined
simhash/table.cpp:713: warning: ‘PyCodeObject* __pyx_find_code_object(int)’ declared ‘static’ but never defined
simhash/table.cpp:714: warning: ‘void __pyx_insert_code_object(int, PyCodeObject*)’ declared ‘static’ but never defined
simhash/table.cpp:717: warning: ‘void __Pyx_AddTraceback(const char*, int, int, const char*)’ declared ‘static’ but never defined
simhash/table.cpp:721: warning: ‘PyObject* __Pyx_Import(PyObject*, PyObject*, int)’ declared ‘static’ but never defined
simhash/table.cpp:733: warning: ‘PyObject* __Pyx_PyInt_From_long(long int)’ declared ‘static’ but never defined
simhash/table.cpp:735: warning: ‘long int __Pyx_PyInt_As_long(PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:737: warning: ‘int __Pyx_check_binary_version()’ declared ‘static’ but never defined
simhash/table.cpp:739: warning: ‘int __Pyx_InitStrings(__Pyx_StringTabEntry*)’ declared ‘static’ but never defined
simhash/table.cpp:747: warning: ‘PyObject* __pyx_f_7simhash_5table_7PyTable_find_first_bulk(__pyx_obj_7simhash_5table_PyTable*, PyObject*, int)’ declared ‘static’ but never defined
simhash/table.cpp:748: warning: ‘PyObject* __pyx_f_7simhash_5table_7PyTable_find_all(__pyx_obj_7simhash_5table_PyTable*, int)’ declared ‘static’ but never defined
simhash/table.cpp:749: warning: ‘PyObject* __pyx_f_7simhash_5table_7PyTable_find_all_bulk(__pyx_obj_7simhash_5table_PyTable*, PyObject*, int)’ declared ‘static’ but never defined
simhash/table.cpp:750: warning: ‘PyObject* __pyx_f_7simhash_5table_7PyTable_permute(__pyx_obj_7simhash_5table_PyTable*, int)’ declared ‘static’ but never defined
simhash/table.cpp:751: warning: ‘PyObject* __pyx_f_7simhash_5table_7PyTable_unpermute(__pyx_obj_7simhash_5table_PyTable*, int)’ declared ‘static’ but never defined
simhash/table.cpp:752: warning: ‘PyObject* __pyx_f_7simhash_5table_8PyCorpus_hashes(__pyx_obj_7simhash_5table_PyCorpus*, int)’ declared ‘static’ but never defined
simhash/table.cpp:753: warning: ‘PyObject* __pyx_f_7simhash_5table_8PyCorpus_insert_bulk(__pyx_obj_7simhash_5table_PyCorpus*, PyObject*, int)’ declared ‘static’ but never defined
simhash/table.cpp:754: warning: ‘PyObject* __pyx_f_7simhash_5table_8PyCorpus_insert(__pyx_obj_7simhash_5table_PyCorpus*, PyObject*, int)’ declared ‘static’ but never defined
simhash/table.cpp:755: warning: ‘PyObject* __pyx_f_7simhash_5table_8PyCorpus_remove_bulk(__pyx_obj_7simhash_5table_PyCorpus*, PyObject*, int)’ declared ‘static’ but never defined
simhash/table.cpp:756: warning: ‘PyObject* __pyx_f_7simhash_5table_8PyCorpus_remove(__pyx_obj_7simhash_5table_PyCorpus*, PyObject*, int)’ declared ‘static’ but never defined
simhash/table.cpp:757: warning: ‘PyObject* __pyx_f_7simhash_5table_8PyCorpus_find_first_bulk(__pyx_obj_7simhash_5table_PyCorpus*, PyObject*, int)’ declared ‘static’ but never defined
simhash/table.cpp:758: warning: ‘PyObject* __pyx_f_7simhash_5table_8PyCorpus_find_first(__pyx_obj_7simhash_5table_PyCorpus*, int)’ declared ‘static’ but never defined
simhash/table.cpp:759: warning: ‘PyObject* __pyx_f_7simhash_5table_8PyCorpus_find_all_bulk(__pyx_obj_7simhash_5table_PyCorpus*, PyObject*, int)’ declared ‘static’ but never defined
simhash/table.cpp:760: warning: ‘PyObject* __pyx_f_7simhash_5table_8PyCorpus_find_all(__pyx_obj_7simhash_5table_PyCorpus*, int)’ declared ‘static’ but never defined
simhash/table.cpp:761: warning: ‘PyObject* __pyx_f_7simhash_5table_8PyCorpus_distance(__pyx_obj_7simhash_5table_PyCorpus*, int)’ declared ‘static’ but never defined
simhash/table.cpp:772: warning: ‘__pyx_ptype_7simhash_5table_PyTable’ defined but not used
simhash/table.cpp:773: warning: ‘__pyx_ptype_7simhash_5table_PyCorpus’ defined but not used
simhash/table.cpp:780: warning: ‘__pyx_builtin_MemoryError’ defined but not used
simhash/table.cpp:782: warning: ‘__pyx_builtin_range’ defined but not used
simhash/table.cpp:792: warning: ‘PyObject* __pyx_pf_7simhash_5table_7PyTable_14find_first(__pyx_obj_7simhash_5table_PyTable*, int)’ declared ‘static’ but never defined
simhash/table.cpp:793: warning: ‘PyObject* __pyx_pf_7simhash_5table_7PyTable_16find_first_bulk(__pyx_obj_7simhash_5table_PyTable*, PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:794: warning: ‘PyObject* __pyx_pf_7simhash_5table_7PyTable_18find_all(__pyx_obj_7simhash_5table_PyTable*, int)’ declared ‘static’ but never defined
simhash/table.cpp:795: warning: ‘PyObject* __pyx_pf_7simhash_5table_7PyTable_20find_all_bulk(__pyx_obj_7simhash_5table_PyTable*, PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:796: warning: ‘PyObject* __pyx_pf_7simhash_5table_7PyTable_22permute(__pyx_obj_7simhash_5table_PyTable*, int)’ declared ‘static’ but never defined
simhash/table.cpp:797: warning: ‘PyObject* __pyx_pf_7simhash_5table_7PyTable_24unpermute(__pyx_obj_7simhash_5table_PyTable*, int)’ declared ‘static’ but never defined
simhash/table.cpp:798: warning: ‘PyObject* __pyx_pf_7simhash_5table_7PyTable_11search_mask___get__(__pyx_obj_7simhash_5table_PyTable*)’ declared ‘static’ but never defined
simhash/table.cpp:799: warning: ‘int __pyx_pf_7simhash_5table_8PyCorpus___init__(__pyx_obj_7simhash_5table_PyCorpus*, PyObject*, PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:800: warning: ‘PyObject* __pyx_pf_7simhash_5table_8PyCorpus_2hashes(__pyx_obj_7simhash_5table_PyCorpus*)’ declared ‘static’ but never defined
simhash/table.cpp:801: warning: ‘PyObject* __pyx_pf_7simhash_5table_8PyCorpus_4insert_bulk(__pyx_obj_7simhash_5table_PyCorpus*, PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:802: warning: ‘PyObject* __pyx_pf_7simhash_5table_8PyCorpus_6insert(__pyx_obj_7simhash_5table_PyCorpus*, PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:803: warning: ‘PyObject* __pyx_pf_7simhash_5table_8PyCorpus_8remove_bulk(__pyx_obj_7simhash_5table_PyCorpus*, PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:804: warning: ‘PyObject* __pyx_pf_7simhash_5table_8PyCorpus_10remove(__pyx_obj_7simhash_5table_PyCorpus*, PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:805: warning: ‘PyObject* __pyx_pf_7simhash_5table_8PyCorpus_12find_first_bulk(__pyx_obj_7simhash_5table_PyCorpus*, PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:806: warning: ‘PyObject* __pyx_pf_7simhash_5table_8PyCorpus_14find_first(__pyx_obj_7simhash_5table_PyCorpus*, int)’ declared ‘static’ but never defined
simhash/table.cpp:807: warning: ‘PyObject* __pyx_pf_7simhash_5table_8PyCorpus_16find_all_bulk(__pyx_obj_7simhash_5table_PyCorpus*, PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:808: warning: ‘PyObject* __pyx_pf_7simhash_5table_8PyCorpus_18find_all(__pyx_obj_7simhash_5table_PyCorpus*, int)’ declared ‘static’ but never defined
simhash/table.cpp:809: warning: ‘PyObject* __pyx_pf_7simhash_5table_8PyCorpus_20distance(__pyx_obj_7simhash_5table_PyCorpus*, int)’ declared ‘static’ but never defined
simhash/table.cpp:810: warning: ‘PyObject* __pyx_pf_7simhash_5table_8PyCorpus_6tables___get__(__pyx_obj_7simhash_5table_PyCorpus*)’ declared ‘static’ but never defined
simhash/table.cpp:811: warning: ‘PyObject* __pyx_pf_7simhash_5table_8PyCorpus_14differing_bits___get__(__pyx_obj_7simhash_5table_PyCorpus*)’ declared ‘static’ but never defined
simhash/table.cpp:812: warning: ‘PyObject* __pyx_tp_new_7simhash_5table_PyTable(PyTypeObject*, PyObject*, PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:813: warning: ‘PyObject* __pyx_tp_new_7simhash_5table_PyCorpus(PyTypeObject*, PyObject*, PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:814: warning: ‘__pyx_k_W’ defined but not used
simhash/table.cpp:815: warning: ‘__pyx_k_a’ defined but not used
simhash/table.cpp:816: warning: ‘__pyx_k_b’ defined but not used
simhash/table.cpp:817: warning: ‘__pyx_k_d’ defined but not used
simhash/table.cpp:818: warning: ‘__pyx_k__2’ defined but not used
simhash/table.cpp:819: warning: ‘__pyx_k_re’ defined but not used
simhash/table.cpp:820: warning: ‘__pyx_k_main’ defined but not used
simhash/table.cpp:821: warning: ‘__pyx_k_test’ defined but not used
simhash/table.cpp:822: warning: ‘__pyx_k_flags’ defined but not used
simhash/table.cpp:823: warning: ‘__pyx_k_range’ defined but not used
simhash/table.cpp:824: warning: ‘__pyx_k_split’ defined but not used
simhash/table.cpp:825: warning: ‘__pyx_k_utf_8’ defined but not used
simhash/table.cpp:826: warning: ‘__pyx_k_append’ defined but not used
simhash/table.cpp:827: warning: ‘__pyx_k_encode’ defined but not used
simhash/table.cpp:828: warning: ‘__pyx_k_hashes’ defined but not used
simhash/table.cpp:829: warning: ‘__pyx_k_import’ defined but not used
simhash/table.cpp:830: warning: ‘__pyx_k_insert’ defined but not used
simhash/table.cpp:831: warning: ‘__pyx_k_remove’ defined but not used
simhash/table.cpp:832: warning: ‘__pyx_k_xrange’ defined but not used
simhash/table.cpp:833: warning: ‘__pyx_k_UNICODE’ defined but not used
simhash/table.cpp:834: warning: ‘__pyx_k_permute’ defined but not used
simhash/table.cpp:835: warning: ‘__pyx_k_distance’ defined but not used
simhash/table.cpp:836: warning: ‘__pyx_k_find_all’ defined but not used
simhash/table.cpp:837: warning: ‘__pyx_k_diff_bits’ defined but not used
simhash/table.cpp:838: warning: ‘__pyx_k_itertools’ defined but not used
simhash/table.cpp:839: warning: ‘__pyx_k_permutors’ defined but not used
simhash/table.cpp:840: warning: ‘__pyx_k_unpermute’ defined but not used
simhash/table.cpp:841: warning: ‘__pyx_k_find_first’ defined but not used
simhash/table.cpp:842: warning: ‘__pyx_k_num_blocks’ defined but not used
simhash/table.cpp:843: warning: ‘__pyx_k_pyx_vtable’ defined but not used
simhash/table.cpp:844: warning: ‘__pyx_k_MemoryError’ defined but not used
simhash/table.cpp:845: warning: ‘__pyx_k_insert_bulk’ defined but not used
simhash/table.cpp:846: warning: ‘__pyx_k_remove_bulk’ defined but not used
simhash/table.cpp:847: warning: ‘__pyx_k_combinations’ defined but not used
simhash/table.cpp:848: warning: ‘__pyx_k_find_all_bulk’ defined but not used
simhash/table.cpp:849: warning: ‘__pyx_k_find_first_bulk’ defined but not used
simhash/table.cpp:850: warning: ‘__pyx_n_s_MemoryError’ defined but not used
simhash/table.cpp:854: warning: ‘__pyx_n_s_a’ defined but not used
simhash/table.cpp:855: warning: ‘__pyx_n_s_append’ defined but not used
simhash/table.cpp:856: warning: ‘__pyx_n_s_b’ defined but not used
simhash/table.cpp:857: warning: ‘__pyx_n_s_combinations’ defined but not used
simhash/table.cpp:859: warning: ‘__pyx_n_s_diff_bits’ defined but not used
simhash/table.cpp:860: warning: ‘__pyx_n_s_distance’ defined but not used
simhash/table.cpp:862: warning: ‘__pyx_n_s_find_all’ defined but not used
simhash/table.cpp:863: warning: ‘__pyx_n_s_find_all_bulk’ defined but not used
simhash/table.cpp:864: warning: ‘__pyx_n_s_find_first’ defined but not used
simhash/table.cpp:865: warning: ‘__pyx_n_s_find_first_bulk’ defined but not used
simhash/table.cpp:868: warning: ‘__pyx_n_s_import’ defined but not used
simhash/table.cpp:871: warning: ‘__pyx_n_s_itertools’ defined but not used
simhash/table.cpp:872: warning: ‘__pyx_n_s_main’ defined but not used
simhash/table.cpp:873: warning: ‘__pyx_n_s_num_blocks’ defined but not used
simhash/table.cpp:874: warning: ‘__pyx_n_s_permute’ defined but not used
simhash/table.cpp:876: warning: ‘__pyx_n_s_pyx_vtable’ defined but not used
simhash/table.cpp:877: warning: ‘__pyx_n_s_range’ defined but not used
simhash/table.cpp:882: warning: ‘__pyx_n_s_test’ defined but not used
simhash/table.cpp:883: warning: ‘__pyx_n_s_unpermute’ defined but not used
simhash/table.cpp:884: warning: ‘__pyx_kp_s_utf_8’ defined but not used
simhash/table.cpp:885: warning: ‘__pyx_n_s_xrange’ defined but not used
simhash/table.cpp:886: warning: ‘__pyx_int_0’ defined but not used
simhash/table.cpp:888: warning: ‘__pyx_int_64’ defined but not used
simhash/table.cpp:1265: warning: ‘PyObject* __pyx_pw_7simhash_5table_1PyHash(PyObject*, PyObject*)’ defined but not used
simhash/table.cpp:1264: warning: ‘__pyx_doc_7simhash_5table_PyHash’ defined but not used
simhash/table.cpp:1575: warning: ‘PyObject* __pyx_pw_7simhash_5table_3PyHashFp(PyObject*, PyObject*)’ defined but not used
simhash/table.cpp:1574: warning: ‘__pyx_doc_7simhash_5table_2PyHashFp’ defined but not used
simhash/table.cpp:1622: warning: ‘int __pyx_pw_7simhash_5table_7PyTable_1__cinit__(PyObject*, PyObject*, PyObject*)’ defined but not used
simhash/table.cpp:1815: warning: ‘void __pyx_pw_7simhash_5table_7PyTable_3__dealloc__(PyObject*)’ defined but not used
simhash/table.cpp:2198: warning: ‘__pyx_doc_7simhash_5table_7PyTable_6insert_bulk’ defined but not used
simhash/table.cpp:2340: warning: ‘__pyx_doc_7simhash_5table_7PyTable_8insert’ defined but not used
simhash/table.cpp:2553: warning: ‘__pyx_doc_7simhash_5table_7PyTable_10remove_bulk’ defined but not used
simhash/table.cpp:2695: warning: ‘__pyx_doc_7simhash_5table_7PyTable_12remove’ defined but not used
simhash/table.cpp:2754: warning: ‘PyObject* __pyx_pw_7simhash_5table_7PyTable_15find_first(PyObject*, PyObject*)’ declared ‘static’ but never defined
cc1plus: warning: command line option "-Wstrict-prototypes" is valid for C/ObjC but not for C++
simhash/table.cpp:235:37: error: simhash-cpp/src/simhash.h: No such file or directory
simhash/table.cpp:238:36: error: simhash-cpp/src/hash.hpp: No such file or directory
simhash/table.cpp:438: error: ‘Simhash’ has not been declared
simhash/table.cpp:438: error: ISO C++ forbids declaration of ‘Table’ with no type
simhash/table.cpp:438: error: expected ‘;’ before ‘*’ token
simhash/table.cpp:439: error: ‘Simhash’ has not been declared
simhash/table.cpp:439: error: ISO C++ forbids declaration of ‘hash_t’ with no type
simhash/table.cpp:439: error: expected ‘;’ before ‘search_mask’
simhash/table.cpp:470: error: ‘Simhash’ has not been declared
simhash/table.cpp:472: error: ‘Simhash’ has not been declared
simhash/table.cpp:473: error: ‘Simhash’ has not been declared
simhash/table.cpp:473: error: expected identifier before ‘*’ token
simhash/table.cpp:473: error: ‘Simhash’ has not been declared
simhash/table.cpp:473: error: ISO C++ forbids declaration of ‘hash_t’ with no type
simhash/table.cpp:473: error: ‘hash_t’ declared as function returning a function
simhash/table.cpp:475: error: ‘Simhash’ has not been declared
simhash/table.cpp:477: error: ‘Simhash’ has not been declared
simhash/table.cpp:478: error: ‘Simhash’ has not been declared
simhash/table.cpp:498: error: ‘Simhash’ has not been declared
simhash/table.cpp:500: error: ‘Simhash’ has not been declared
simhash/table.cpp:501: error: ‘Simhash’ has not been declared
simhash/table.cpp:501: error: ‘Simhash’ has not been declared
simhash/table.cpp:743: error: ‘Simhash’ has not been declared
simhash/table.cpp:743: error: expected ‘,’ or ‘...’ before ‘__pyx_v_h’
simhash/table.cpp:745: error: ‘Simhash’ has not been declared
simhash/table.cpp:745: error: expected ‘,’ or ‘...’ before ‘__pyx_v_h’
simhash/table.cpp:746: error: ‘Simhash’ has not been declared
simhash/table.cpp:746: error: expected initializer before ‘__pyx_f_7simhash_5table_7PyTable_find_first’
simhash/table.cpp:748: error: ‘Simhash’ has not been declared
simhash/table.cpp:748: error: expected ‘,’ or ‘...’ before ‘__pyx_v_query’
simhash/table.cpp:750: error: ‘Simhash’ has not been declared
simhash/table.cpp:750: error: expected ‘,’ or ‘...’ before ‘__pyx_v_query’
simhash/table.cpp:751: error: ‘Simhash’ has not been declared
simhash/table.cpp:751: error: expected ‘,’ or ‘...’ before ‘__pyx_v_query’
simhash/table.cpp:758: error: ‘Simhash’ has not been declared
simhash/table.cpp:758: error: expected ‘,’ or ‘...’ before ‘__pyx_v_query’
simhash/table.cpp:760: error: ‘Simhash’ has not been declared
simhash/table.cpp:760: error: expected ‘,’ or ‘...’ before ‘__pyx_v_query’
simhash/table.cpp:761: error: ‘Simhash’ has not been declared
simhash/table.cpp:761: error: expected ‘,’ or ‘...’ before ‘__pyx_v_a’
simhash/table.cpp:789: error: ‘Simhash’ has not been declared
simhash/table.cpp:789: error: expected ‘,’ or ‘...’ before ‘__pyx_v_h’
simhash/table.cpp:791: error: ‘Simhash’ has not been declared
simhash/table.cpp:791: error: expected ‘,’ or ‘...’ before ‘__pyx_v_h’
simhash/table.cpp:792: error: ‘Simhash’ has not been declared
simhash/table.cpp:792: error: expected ‘,’ or ‘...’ before ‘__pyx_v_query’
simhash/table.cpp:794: error: ‘Simhash’ has not been declared
simhash/table.cpp:794: error: expected ‘,’ or ‘...’ before ‘__pyx_v_query’
simhash/table.cpp:796: error: ‘Simhash’ has not been declared
simhash/table.cpp:796: error: expected ‘,’ or ‘...’ before ‘__pyx_v_query’
simhash/table.cpp:797: error: ‘Simhash’ has not been declared
simhash/table.cpp:797: error: expected ‘,’ or ‘...’ before ‘__pyx_v_query’
simhash/table.cpp:806: error: ‘Simhash’ has not been declared
simhash/table.cpp:806: error: expected ‘,’ or ‘...’ before ‘__pyx_v_query’
simhash/table.cpp:808: error: ‘Simhash’ has not been declared
simhash/table.cpp:808: error: expected ‘,’ or ‘...’ before ‘__pyx_v_query’
simhash/table.cpp:809: error: ‘Simhash’ has not been declared
simhash/table.cpp:809: error: expected ‘,’ or ‘...’ before ‘__pyx_v_a’
simhash/table.cpp: In function ‘PyObject* __pyx_f_7simhash_5table_PyHash(PyObject*, int)’:
simhash/table.cpp:901: error: ‘Simhash’ has not been declared
simhash/table.cpp:901: error: ‘Simhash’ has not been declared
simhash/table.cpp:901: error: ‘__pyx_v_hasher’ was not declared in this scope
simhash/table.cpp: In function ‘PyObject* __pyx_f_7simhash_5table_PyHashFp(PyObject*, int)’:
simhash/table.cpp:1312: error: ‘Simhash’ has not been declared
simhash/table.cpp:1312: error: ‘Simhash’ has not been declared
simhash/table.cpp:1312: error: ‘__pyx_v_hasher’ was not declared in this scope
simhash/table.cpp: In function ‘int __pyx_pf_7simhash_5table_7PyTable___cinit__(__pyx_obj_7simhash_5table_PyTable*, PyObject*, PyObject*)’:
simhash/table.cpp:1682: error: ‘Simhash’ was not declared in this scope
simhash/table.cpp:1682: error: template argument 1 is invalid
simhash/table.cpp:1682: error: template argument 2 is invalid
simhash/table.cpp:1682: error: invalid type in declaration before ‘;’ token
simhash/table.cpp:1690: error: ‘Simhash’ is not a class or namespace
simhash/table.cpp:1690: error: expected `;' before ‘__pyx_t_5’
simhash/table.cpp:1751: error: ‘__pyx_t_5’ was not declared in this scope
simhash/table.cpp:1751: error: ‘Simhash’ is not a class or namespace
simhash/table.cpp:1752: error: request for member ‘push_back’ in ‘__pyx_v_perms’, which is of non-class type ‘int’
simhash/table.cpp:1772: error: ‘struct __pyx_obj_7simhash_5table_PyTable’ has no member named ‘tbl’
simhash/table.cpp:1772: error: expected type-specifier before ‘Simhash’
simhash/table.cpp:1772: error: expected `;' before ‘Simhash’
simhash/table.cpp:1781: error: ‘struct __pyx_obj_7simhash_5table_PyTable’ has no member named ‘search_mask’
simhash/table.cpp:1781: error: ‘struct __pyx_obj_7simhash_5table_PyTable’ has no member named ‘tbl’
simhash/table.cpp: In function ‘void __pyx_pf_7simhash_5table_7PyTable_2__dealloc__(__pyx_obj_7simhash_5table_PyTable*)’:
simhash/table.cpp:1835: error: ‘struct __pyx_obj_7simhash_5table_PyTable’ has no member named ‘tbl’
simhash/table.cpp: In function ‘PyObject* __pyx_f_7simhash_5table_7PyTable_hashes(__pyx_obj_7simhash_5table_PyTable*, int)’:
simhash/table.cpp:1860: error: ‘Simhash’ has not been declared
simhash/table.cpp:1860: error: expected `;' before ‘__pyx_v_it’
simhash/table.cpp:1927: error: ‘__pyx_v_it’ was not declared in this scope
simhash/table.cpp:1927: error: ‘struct __pyx_obj_7simhash_5table_PyTable’ has no member named ‘tbl’
simhash/table.cpp:1937: error: ‘struct __pyx_obj_7simhash_5table_PyTable’ has no member named ‘tbl’
simhash/table.cpp: In function ‘PyObject* __pyx_f_7simhash_5table_7PyTable_insert_bulk(__pyx_obj_7simhash_5table_PyTable*, PyObject*, int)’:
simhash/table.cpp:2056: error: ‘Simhash’ has not been declared
simhash/table.cpp:2056: error: expected `;' before ‘__pyx_t_8’
simhash/table.cpp:2157: error: ‘__pyx_t_8’ was not declared in this scope
simhash/table.cpp:2157: error: ‘Simhash’ has not been declared
simhash/table.cpp:2158: error: ‘struct __pyx_obj_7simhash_5table_PyTable’ has no member named ‘tbl’
simhash/table.cpp: At global scope:
simhash/table.cpp:2245: error: ‘Simhash’ has not been declared
simhash/table.cpp:2245: error: expected ‘,’ or ‘...’ before ‘__pyx_v_h’
simhash/table.cpp: In function ‘PyObject* __pyx_f_7simhash_5table_7PyTable_insert(__pyx_obj_7simhash_5table_PyTable*, int)’:
simhash/table.cpp:2259: error: ‘__pyx_skip_dispatch’ was not declared in this scope
simhash/table.cpp:2266: error: ‘__pyx_v_h’ was not declared in this scope
simhash/table.cpp:2310: error: ‘struct __pyx_obj_7simhash_5table_PyTable’ has no member named ‘tbl’
simhash/table.cpp:2310: error: ‘__pyx_v_h’ was not declared in this scope
simhash/table.cpp: In function ‘PyObject* __pyx_pw_7simhash_5table_7PyTable_9insert(PyObject*, PyObject*)’:
simhash/table.cpp:2342: error: ‘Simhash’ has not been declared
simhash/table.cpp:2342: error: expected `;' before ‘__pyx_v_h’
simhash/table.cpp:2350: error: ‘__pyx_v_h’ was not declared in this scope
simhash/table.cpp:2350: error: ‘Simhash’ has not been declared
simhash/table.cpp:2358: error: ‘Simhash’ has not been declared
simhash/table.cpp:2358: error: expected `)' before ‘__pyx_v_h’
simhash/table.cpp: At global scope:
simhash/table.cpp:2365: error: ‘Simhash’ has not been declared
simhash/table.cpp:2365: error: expected ‘,’ or ‘...’ before ‘__pyx_v_h’
simhash/table.cpp: In function ‘PyObject* __pyx_pf_7simhash_5table_7PyTable_8insert(__pyx_obj_7simhash_5table_PyTable*, int)’:
simhash/table.cpp:2374: error: ‘__pyx_v_h’ was not declared in this scope
simhash/table.cpp: In function ‘PyObject* __pyx_f_7simhash_5table_7PyTable_remove_bulk(__pyx_obj_7simhash_5table_PyTable*, PyObject*, int)’:
simhash/table.cpp:2411: error: ‘Simhash’ has not been declared
simhash/table.cpp:2411: error: expected `;' before ‘__pyx_t_8’
simhash/table.cpp:2512: error: ‘__pyx_t_8’ was not declared in this scope
simhash/table.cpp:2512: error: ‘Simhash’ has not been declared
simhash/table.cpp:2513: error: ‘struct __pyx_obj_7simhash_5table_PyTable’ has no member named ‘tbl’
simhash/table.cpp: At global scope:
simhash/table.cpp:2600: error: ‘Simhash’ has not been declared
simhash/table.cpp:2600: error: expected ‘,’ or ‘...’ before ‘__pyx_v_h’
simhash/table.cpp: In function ‘PyObject* __pyx_f_7simhash_5table_7PyTable_remove(__pyx_obj_7simhash_5table_PyTable*, int)’:
simhash/table.cpp:2614: error: ‘__pyx_skip_dispatch’ was not declared in this scope
simhash/table.cpp:2621: error: ‘__pyx_v_h’ was not declared in this scope
simhash/table.cpp:2665: error: ‘struct __pyx_obj_7simhash_5table_PyTable’ has no member named ‘tbl’
simhash/table.cpp:2665: error: ‘__pyx_v_h’ was not declared in this scope
simhash/table.cpp: In function ‘PyObject* __pyx_pw_7simhash_5table_7PyTable_13remove(PyObject*, PyObject*)’:
simhash/table.cpp:2697: error: ‘Simhash’ has not been declared
simhash/table.cpp:2697: error: expected `;' before ‘__pyx_v_h’
simhash/table.cpp:2705: error: ‘__pyx_v_h’ was not declared in this scope
simhash/table.cpp:2705: error: ‘Simhash’ has not been declared
simhash/table.cpp:2713: error: ‘Simhash’ has not been declared
simhash/table.cpp:2713: error: expected `)' before ‘__pyx_v_h’
simhash/table.cpp: At global scope:
simhash/table.cpp:2720: error: ‘Simhash’ has not been declared
simhash/table.cpp:2720: error: expected ‘,’ or ‘...’ before ‘__pyx_v_h’
simhash/table.cpp: In function ‘PyObject* __pyx_pf_7simhash_5table_7PyTable_12remove(__pyx_obj_7simhash_5table_PyTable*, int)’:
simhash/table.cpp:2729: error: ‘__pyx_v_h’ was not declared in this scope
simhash/table.cpp: At global scope:
simhash/table.cpp:2755: error: ‘Simhash’ has not been declared
simhash/table.cpp:2755: error: expected initializer before ‘__pyx_f_7simhash_5table_7PyTable_find_first’
simhash/table.cpp:583: warning: inline function ‘PyObject* __Pyx_GetModuleGlobalName(PyObject*)’ used but never defined
simhash/table.cpp:586: warning: inline function ‘PyObject* __Pyx_PyObject_Call(PyObject*, PyObject*, PyObject*)’ used but never defined
simhash/table.cpp:317: warning: inline function ‘PyObject* __Pyx_PyInt_FromSize_t(size_t)’ used but never defined
simhash/table.cpp:725: warning: inline function ‘size_t __Pyx_PyInt_As_size_t(PyObject*)’ used but never defined
simhash/table.cpp:278: warning: inline function ‘char* __Pyx_PyObject_AsString(PyObject*)’ used but never defined
simhash/table.cpp:316: warning: inline function ‘Py_ssize_t __Pyx_PyIndex_AsSsize_t(PyObject*)’ used but never defined
simhash/table.cpp:596: warning: inline function ‘void __Pyx_ExceptionSwap(PyObject**, PyObject**, PyObject**)’ used but never defined
simhash/table.cpp:592: warning: inline function ‘void __Pyx_ErrFetch(PyObject**, PyObject**, PyObject**)’ used but never defined
simhash/table.cpp:591: warning: inline function ‘void __Pyx_ErrRestore(PyObject*, PyObject*, PyObject*)’ used but never defined
simhash/table.cpp:727: warning: inline function ‘PyObject* __Pyx_PyInt_From_unsigned_PY_LONG_LONG(long long unsigned int)’ used but never defined
simhash/table.cpp:729: warning: inline function ‘int64_t __Pyx_PyInt_As_int64_t(PyObject*)’ used but never defined
simhash/table.cpp:731: warning: inline function ‘int __Pyx_PyInt_As_int(PyObject*)’ used but never defined
simhash/table.cpp:723: warning: inline function ‘uint64_t __Pyx_PyInt_As_uint64_t(PyObject*)’ used but never defined
simhash/table.cpp:614: warning: inline function ‘PyObject* __Pyx_PyObject_CallOneArg(PyObject*, PyObject*)’ used but never defined
simhash/table.cpp:617: warning: inline function ‘PyObject* __Pyx_PyObject_CallNoArg(PyObject*)’ used but never defined
simhash/table.cpp:719: warning: inline function ‘PyObject* __Pyx_PyInt_From_uint64_t(uint64_t)’ used but never defined
simhash/table.cpp:640: warning: inline function ‘int __Pyx_PyObject_Append(PyObject*, PyObject*)’ used but never defined
simhash/table.cpp:279: warning: ‘char* __Pyx_PyObject_AsStringAndSize(PyObject*, Py_ssize_t*)’ declared ‘static’ but never defined
simhash/table.cpp:284: warning: ‘PyObject* __Pyx_PyUnicode_FromString(const char*)’ declared ‘static’ but never defined
simhash/table.cpp:314: warning: ‘int __Pyx_PyObject_IsTrue(PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:315: warning: ‘PyObject* __Pyx_PyNumber_Int(PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:409: warning: ‘__pyx_m’ defined but not used
simhash/table.cpp:410: warning: ‘__pyx_d’ defined but not used
simhash/table.cpp:411: warning: ‘__pyx_b’ defined but not used
simhash/table.cpp:412: warning: ‘__pyx_empty_tuple’ defined but not used
simhash/table.cpp:413: warning: ‘__pyx_empty_bytes’ defined but not used
simhash/table.cpp:414: warning: ‘__pyx_lineno’ defined but not used
simhash/table.cpp:415: warning: ‘__pyx_clineno’ defined but not used
simhash/table.cpp:416: warning: ‘__pyx_cfilenm’ defined but not used
simhash/table.cpp:417: warning: ‘__pyx_filename’ defined but not used
simhash/table.cpp:480: warning: ‘__pyx_vtabptr_7simhash_5table_PyTable’ defined but not used
simhash/table.cpp:503: warning: ‘__pyx_vtabptr_7simhash_5table_PyCorpus’ defined but not used
simhash/table.cpp:581: warning: ‘PyObject* __Pyx_GetBuiltinName(PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:594: warning: ‘int __Pyx_GetException(PyObject**, PyObject**, PyObject**)’ declared ‘static’ but never defined
simhash/table.cpp:598: warning: ‘void __Pyx_ExceptionSave(PyObject**, PyObject**, PyObject**)’ declared ‘static’ but never defined
simhash/table.cpp:599: warning: ‘void __Pyx_ExceptionReset(PyObject*, PyObject*, PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:602: warning: ‘void __Pyx_RaiseArgtupleInvalid(const char*, int, Py_ssize_t, Py_ssize_t, Py_ssize_t)’ declared ‘static’ but never defined
simhash/table.cpp:604: warning: ‘void __Pyx_RaiseDoubleKeywordsError(const char*, PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:606: warning: ‘int __Pyx_ParseOptionalKeywords(PyObject*, PyObject***, PyObject*, PyObject**, Py_ssize_t, const char*)’ declared ‘static’ but never defined
simhash/table.cpp:611: warning: ‘PyObject* __Pyx_PyObject_CallMethO(PyObject*, PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:638: warning: ‘PyObject* __Pyx_PyObject_CallMethod1(PyObject*, PyObject*, PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:644: warning: ‘void __Pyx_WriteUnraisable(const char*, int, int, const char*, int)’ declared ‘static’ but never defined
simhash/table.cpp:677: warning: ‘PyObject* __Pyx_GetItemInt_List_Fast(PyObject*, Py_ssize_t, int, int)’ declared ‘static’ but never defined
simhash/table.cpp:683: warning: ‘PyObject* __Pyx_GetItemInt_Tuple_Fast(PyObject*, Py_ssize_t, int, int)’ declared ‘static’ but never defined
simhash/table.cpp:684: warning: ‘PyObject* __Pyx_GetItemInt_Generic(PyObject*, PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:686: warning: ‘PyObject* __Pyx_GetItemInt_Fast(PyObject*, Py_ssize_t, int, int, int)’ declared ‘static’ but never defined
simhash/table.cpp:700: warning: ‘int __Pyx_SetVtable(PyObject*, void*)’ declared ‘static’ but never defined
simhash/table.cpp:711: warning: ‘__pyx_code_cache’ defined but not used
simhash/table.cpp:712: warning: ‘int __pyx_bisect_code_objects(__Pyx_CodeObjectCacheEntry*, int, int)’ declared ‘static’ but never defined
simhash/table.cpp:713: warning: ‘PyCodeObject* __pyx_find_code_object(int)’ declared ‘static’ but never defined
simhash/table.cpp:714: warning: ‘void __pyx_insert_code_object(int, PyCodeObject*)’ declared ‘static’ but never defined
simhash/table.cpp:717: warning: ‘void __Pyx_AddTraceback(const char*, int, int, const char*)’ declared ‘static’ but never defined
simhash/table.cpp:721: warning: ‘PyObject* __Pyx_Import(PyObject*, PyObject*, int)’ declared ‘static’ but never defined
simhash/table.cpp:733: warning: ‘PyObject* __Pyx_PyInt_From_long(long int)’ declared ‘static’ but never defined
simhash/table.cpp:735: warning: ‘long int __Pyx_PyInt_As_long(PyObject*)’ declared ‘static’ but never defined
simhash/table.cpp:737: warning: ‘int __Pyx_check_binary_version()’ declared ‘static’ but never defined
simhash/table.cpp:739: warning: ‘int __Pyx_InitStrings(__Pyx_StringTabEntry*)’ declared ‘static’ but never defined
simhash/table.cpp:747: warning: ‘PyObject* __pyx_f_7simhash_5table_7PyTable_find_first_bulk(__pyx_obj_7simhash_5table_PyTable*, PyObject*, int)’ declared ‘static’ but never defined
simhash/table.cpp:748: warning: ‘PyObject* __pyx_f_7simhash_5table_7PyTable_find_all(__pyx_obj_7simhash_5table_PyTable*, int)’ declared ‘static’ but never defined
simhash/table.cpp:749: warning: ‘PyObject* __pyx_f_7simhash_5table_7PyTable_find_all_bulk(__pyx_obj_7simhash_5table_PyTable*, PyObject*, int)’ declared ‘static’ but never defined
simhash/table.cpp:750: warning: ‘PyObject* __pyx_f_7simhash_5table_7PyTable_permute(__pyx_obj_7simhash_5table_PyTable*, int)’ declared ‘static’ but never defined
simhash/table.cpp:751: warning: ‘PyObject* __pyx_f_7simhash_5table_7PyTable_unpermute(__pyx_obj_7simhash_5table_PyTable*, int)’ declared ‘static’ but never defined
simhash/table.cpp:752: warning: ‘PyObject* __pyx_f_7simhash_5table_8PyCorpus_hashes(__pyx_obj_7simhash_5table_PyCorpus*, int)’ declared ‘static’ but never defined
simhash/table.cpp:753: warning: ‘PyObject* __pyx_f_7simhash_5table_8PyCorpus_insert_bulk(__pyx_obj_7simhash_5table_PyCorpus*, PyObject*, int)’ declared ‘static’ but never defined
simhash/table.cpp:754: warning: ‘PyObject* __pyx_f_7simhash_5table_8PyCorpus_insert(__pyx_obj_7simhash_5table_PyCorpus*, PyObject*, int)’ declared ‘static’ but never defined
simhash/table.cpp:755: warning: ‘PyObject* __pyx_f_7simhash_5table_8PyCorpus_remove_bulk(__pyx_obj_7simhash_5table_PyCorpus*, PyObject*, int)’ declared ‘static’ but never defined
simhash/table.cpp:756: warning: ‘PyObject* __pyx_f_7simhash_5table_8PyCorpus_remove(__pyx_obj_7simhash_5table_PyCorpus*, PyObject*, int)’ declared ‘static’ but never defined
simhash/table.cpp:757: warning: ‘PyObject* __pyx_f_7simhash_5table_8PyCorpus_find_first_bulk(__pyx_obj_7simhash_5table_PyCorpus*, PyObject*, int)’ declared ‘static’ but never defined
simhash/table.cpp:758: warning: ‘PyObject* __pyx_f_7simhash_5table_8PyCorpus_find_first(__pyx_obj_7simhash_5table_PyCorpus*, int)’ declared ‘static’ but never defined
simhash/table.cpp:759: warning: ‘PyObject* __pyx_f_7simhash_5table_8PyCorpus_find_all_bulk(__pyx_obj_7simhash_5table_PyCorpus*, PyObject*, int)’ declared ‘static’ but never defined
simhash/table.cpp:760: warning: ‘PyObject* __pyx_f_7simhash_5table_8PyCorpus_find_all(__pyx_obj_7simhash_5table_PyCorpus*, int)’ declared ‘static’ but never defined
simhash/table.cpp:761: warning: ‘PyObject* __pyx_f_7simhash_5table_8PyCorpus_distance(__pyx_obj_7simhash_5table_PyCorpus*, int)’ declared ‘static’ but never defined
simhash/table.cpp:772: warning: ‘__pyx_ptype_7simhash_5table_PyTable’ defined but not used
simhash/table.cpp:773: warning: ‘__pyx_ptype_7simhash_5table_PyCorpus’ defined but not used
simhash/table.cpp:780: warning: ‘__pyx_builtin_MemoryError’ defined but not used
simhash/table.cpp:782: warning: ‘_pyx_builtin_range’ defined but not used
simhash/table.cpp:792: warning: ‘PyObject
__pyx_pf_7simhash_5table_7PyTable_14find_first(pyx_obj_7simhash_5table_PyTable, int)’ declared ‘static’ but never defined
simhash/table.cpp:793: warning: ‘PyObject
__pyx_pf_7simhash_5table_7PyTable_16find_first_bulk(pyx_obj_7simhash_5table_PyTable, PyObject)’ declared ‘static’ but never defined
simhash/table.cpp:794: warning: ‘PyObject* __pyx_pf_7simhash_5table_7PyTable_18find_all(pyx_obj_7simhash_5table_PyTable, int)’ declared ‘static’ but never defined
simhash/table.cpp:795: warning: ‘PyObject
__pyx_pf_7simhash_5table_7PyTable_20find_all_bulk(pyx_obj_7simhash_5table_PyTable, PyObject)’ declared ‘static’ but never defined
simhash/table.cpp:796: warning: ‘PyObject* __pyx_pf_7simhash_5table_7PyTable_22permute(pyx_obj_7simhash_5table_PyTable, int)’ declared ‘static’ but never defined
simhash/table.cpp:797: warning: ‘PyObject
pyx_pf_7simhash_5table_7PyTable_24unpermute(pyx_obj_7simhash_5table_PyTable, int)’ declared ‘static’ but never defined
simhash/table.cpp:798: warning: ‘PyObject
pyx_pf_7simhash_5table_7PyTable_11search_mask___get**(pyx_obj_7simhash_5table_PyTable)’ declared ‘static’ but never defined
simhash/table.cpp:799: warning: ‘int __pyx_pf_7simhash_5table_8PyCorpus___init
*(**pyx_obj_7simhash_5table_PyCorpus
, PyObject
, PyObject
)’ declared ‘static’ but never defined
simhash/table.cpp:800: warning: ‘PyObject
__pyx_pf_7simhash_5table_8PyCorpus_2hashes(pyx_obj_7simhash_5table_PyCorpus)’ declared ‘static’ but never defined
simhash/table.cpp:801: warning: ‘PyObject
__pyx_pf_7simhash_5table_8PyCorpus_4insert_bulk(pyx_obj_7simhash_5table_PyCorpus, PyObject)’ declared ‘static’ but never defined
simhash/table.cpp:802: warning: ‘PyObject* __pyx_pf_7simhash_5table_8PyCorpus_6insert(pyx_obj_7simhash_5table_PyCorpus, PyObject)’ declared ‘static’ but never defined
simhash/table.cpp:803: warning: ‘PyObject* __pyx_pf_7simhash_5table_8PyCorpus_8remove_bulk(pyx_obj_7simhash_5table_PyCorpus, PyObject)’ declared ‘static’ but never defined
simhash/table.cpp:804: warning: ‘PyObject* __pyx_pf_7simhash_5table_8PyCorpus_10remove(pyx_obj_7simhash_5table_PyCorpus, PyObject)’ declared ‘static’ but never defined
simhash/table.cpp:805: warning: ‘PyObject* __pyx_pf_7simhash_5table_8PyCorpus_12find_first_bulk(pyx_obj_7simhash_5table_PyCorpus, PyObject)’ declared ‘static’ but never defined
simhash/table.cpp:806: warning: ‘PyObject* __pyx_pf_7simhash_5table_8PyCorpus_14find_first(pyx_obj_7simhash_5table_PyCorpus, int)’ declared ‘static’ but never defined
simhash/table.cpp:807: warning: ‘PyObject
__pyx_pf_7simhash_5table_8PyCorpus_16find_all_bulk(pyx_obj_7simhash_5table_PyCorpus, PyObject)’ declared ‘static’ but never defined
simhash/table.cpp:808: warning: ‘PyObject* _pyx_pf_7simhash_5table_8PyCorpus_18find_all(pyx_obj_7simhash_5table_PyCorpus, int)’ declared ‘static’ but never defined
simhash/table.cpp:809: warning: ‘PyObject
pyx_pf_7simhash_5table_8PyCorpus_20distance(pyx_obj_7simhash_5table_PyCorpus, int)’ declared ‘static’ but never defined
simhash/table.cpp:810: warning: ‘PyObject
pyx_pf_7simhash_5table_8PyCorpus_6tables___get**(**pyx_obj_7simhash_5table_PyCorpus)’ declared ‘static’ but never defined
simhash/table.cpp:811: warning: ‘PyObject
pyx_pf_7simhash_5table_8PyCorpus_14differing_bits___get**(**pyx_obj_7simhash_5table_PyCorpus)’ declared ‘static’ but never defined
simhash/table.cpp:812: warning: ‘PyObject
pyx_tp_new_7simhash_5table_PyTable(PyTypeObject, PyObject, PyObject
)’ declared ‘static’ but never defined
simhash/table.cpp:813: warning: ‘PyObject
pyx_tp_new_7simhash_5table_PyCorpus(PyTypeObject, PyObject, PyObject
)’ declared ‘static’ but never defined
simhash/table.cpp:814: warning: ‘__pyx_k_W’ defined but not used
simhash/table.cpp:815: warning: ‘__pyx_k_a’ defined but not used
simhash/table.cpp:816: warning: ‘__pyx_k_b’ defined but not used
simhash/table.cpp:817: warning: ‘__pyx_k_d’ defined but not used
simhash/table.cpp:818: warning: ‘__pyx_k__2’ defined but not used
simhash/table.cpp:819: warning: ‘__pyx_k_re’ defined but not used
simhash/table.cpp:820: warning: ‘__pyx_k_main’ defined but not used
simhash/table.cpp:821: warning: ‘__pyx_k_test’ defined but not used
simhash/table.cpp:822: warning: ‘__pyx_k_flags’ defined but not used
simhash/table.cpp:823: warning: ‘__pyx_k_range’ defined but not used
simhash/table.cpp:824: warning: ‘__pyx_k_split’ defined but not used
simhash/table.cpp:825: warning: ‘__pyx_k_utf_8’ defined but not used
simhash/table.cpp:826: warning: ‘__pyx_k_append’ defined but not used
simhash/table.cpp:827: warning: ‘__pyx_k_encode’ defined but not used
simhash/table.cpp:828: warning: ‘__pyx_k_hashes’ defined but not used
simhash/table.cpp:829: warning: ‘__pyx_k_import’ defined but not used
simhash/table.cpp:830: warning: ‘__pyx_k_insert’ defined but not used
simhash/table.cpp:831: warning: ‘__pyx_k_remove’ defined but not used
simhash/table.cpp:832: warning: ‘__pyx_k_xrange’ defined but not used
simhash/table.cpp:833: warning: ‘__pyx_k_UNICODE’ defined but not used
simhash/table.cpp:834: warning: ‘__pyx_k_permute’ defined but not used
simhash/table.cpp:835: warning: ‘__pyx_k_distance’ defined but not used
simhash/table.cpp:836: warning: ‘__pyx_k_find_all’ defined but not used
simhash/table.cpp:837: warning: ‘__pyx_k_diff_bits’ defined but not used
simhash/table.cpp:838: warning: ‘__pyx_k_itertools’ defined but not used
simhash/table.cpp:839: warning: ‘__pyx_k_permutors’ defined but not used
simhash/table.cpp:840: warning: ‘__pyx_k_unpermute’ defined but not used
simhash/table.cpp:841: warning: ‘__pyx_k_find_first’ defined but not used
simhash/table.cpp:842: warning: ‘__pyx_k_num_blocks’ defined but not used
simhash/table.cpp:843: warning: ‘__pyx_k_pyx_vtable’ defined but not used
simhash/table.cpp:844: warning: ‘__pyx_k_MemoryError’ defined but not used
simhash/table.cpp:845: warning: ‘__pyx_k_insert_bulk’ defined but not used
simhash/table.cpp:846: warning: ‘__pyx_k_remove_bulk’ defined but not used
simhash/table.cpp:847: warning: ‘__pyx_k_combinations’ defined but not used
simhash/table.cpp:848: warning: ‘__pyx_k_find_all_bulk’ defined but not used
simhash/table.cpp:849: warning: ‘__pyx_k_find_first_bulk’ defined but not used
simhash/table.cpp:850: warning: ‘__pyx_n_s_MemoryError’ defined but not used
simhash/table.cpp:854: warning: ‘__pyx_n_s_a’ defined but not used
simhash/table.cpp:855: warning: ‘__pyx_n_s_append’ defined but not used
simhash/table.cpp:856: warning: ‘__pyx_n_s_b’ defined but not used
simhash/table.cpp:857: warning: ‘__pyx_n_s_combinations’ defined but not used
simhash/table.cpp:859: warning: ‘__pyx_n_s_diff_bits’ defined but not used
simhash/table.cpp:860: warning: ‘__pyx_n_s_distance’ defined but not used
simhash/table.cpp:862: warning: ‘__pyx_n_s_find_all’ defined but not used
simhash/table.cpp:863: warning: ‘__pyx_n_s_find_all_bulk’ defined but not used
simhash/table.cpp:864: warning: ‘__pyx_n_s_find_first’ defined but not used
simhash/table.cpp:865: warning: ‘__pyx_n_s_find_first_bulk’ defined but not used
simhash/table.cpp:868: warning: ‘__pyx_n_s_import’ defined but not used
simhash/table.cpp:871: warning: ‘__pyx_n_s_itertools’ defined but not used
simhash/table.cpp:872: warning: ‘__pyx_n_s_main’ defined but not used
simhash/table.cpp:873: warning: ‘__pyx_n_s_num_blocks’ defined but not used
simhash/table.cpp:874: warning: ‘__pyx_n_s_permute’ defined but not used
simhash/table.cpp:876: warning: ‘__pyx_n_s_pyx_vtable’ defined but not used
simhash/table.cpp:877: warning: ‘__pyx_n_s_range’ defined but not used
simhash/table.cpp:882: warning: ‘__pyx_n_s_test’ defined but not used
simhash/table.cpp:883: warning: ‘__pyx_n_s_unpermute’ defined but not used
simhash/table.cpp:884: warning: ‘__pyx_kp_s_utf_8’ defined but not used
simhash/table.cpp:885: warning: ‘__pyx_n_s_xrange’ defined but not used
simhash/table.cpp:886: warning: ‘__pyx_int_0’ defined but not used
simhash/table.cpp:888: warning: ‘_pyx_int_64’ defined but not used
simhash/table.cpp:1265: warning: ‘PyObject
pyx_pw_7simhash_5table_1PyHash(PyObject, PyObject)’ defined but not used
simhash/table.cpp:1264: warning: ‘__pyx_doc_7simhash_5table_PyHash’ defined but not used
simhash/table.cpp:1575: warning: ‘PyObject* pyx_pw_7simhash_5table_3PyHashFp(PyObject, PyObject)’ defined but not used
simhash/table.cpp:1574: warning: ‘pyx_doc_7simhash_5table_2PyHashFp’ defined but not used
simhash/table.cpp:1622: warning: ‘int pyx_pw_7simhash_5table_7PyTable_1__cinit**(PyObject, PyObject, PyObject
)’ defined but not used
simhash/table.cpp:1815: warning: ‘void pyx_pw_7simhash_5table_7PyTable_3__dealloc(PyObject
)’ defined but not used
simhash/table.cpp:2198: warning: ‘__pyx_doc_7simhash_5table_7PyTable_6insert_bulk’ defined but not used
simhash/table.cpp:2340: warning: ‘__pyx_doc_7simhash_5table_7PyTable_8insert’ defined but not used
simhash/table.cpp:2553: warning: ‘__pyx_doc_7simhash_5table_7PyTable_10remove_bulk’ defined but not used
simhash/table.cpp:2695: warning: ‘__pyx_doc_7simhash_5table_7PyTable_12remove’ defined but not used
simhash/table.cpp:2754: warning: ‘PyObject* pyx_pw_7simhash_5table_7PyTable_15find_first(PyObject, PyObject)’ declared ‘static’ but never defined
fatal error: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/lipo: can't open input file: /var/tmp//cciq1Gez.out (No such file or directory)
error: command 'cc' failed with exit status 1

Any ideas?

Infinite Loop When Querying (Fix Detailed)

We've found that for certain inputs and certain queries, some installations can hit an issue where querying a corpus never terminates. There's a gist containing some JSON files with data that can reproduce the bug.

Ultimately, the cause was traced to libJudy. It relies on undefined behavior, and when built with newer versions of gcc (4.8 has been confirmed to be unsafe) the J1N API call does not behave as documented. In particular, in certain cases this call does not advance the scanned index at all. We have not tracked down exactly what those cases are, nor do we plan to. We do, however, have a fix.
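To check whether a given installation is affected without hanging the calling process, the query can be run in a child interpreter with a timeout. This is only a sketch: `finishes_within` is a hypothetical helper, and the script passed to it (building the corpus from the gist data and querying it) is left to the reader.

```python
import subprocess
import sys


def finishes_within(code, timeout_s):
    """Run `code` in a fresh interpreter; return False if it exceeds timeout_s."""
    try:
        subprocess.run([sys.executable, "-c", code],
                       timeout=timeout_s, check=False)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising the timeout.
        return False
    return True


# A trivial script finishes; a busy loop stands in for the hanging query.
print(finishes_within("pass", 30))
print(finishes_within("while True: pass", 1))
```

A `False` result only says that the script did not finish in time. It does not identify libJudy as the cause.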

Installing libjudy-dev from apt

Some builds of libjudy-dev from apt (for Ubuntu 12.04, for instance) are known to work well, but others do not. Notably, Ubuntu 14.04's copy does not, and there libJudy must be built from source using gcc-4.6. The process is relatively straightforward:

# With libJudy-1.0.5 unpacked
apt-get install -y gcc-4.6
# These are the compiler flags used in the 12.04 build that works
export CFLAGS='-Wall -O2'
export CC=`which gcc-4.6`
# These are the configure flags used in the 12.04 build that works
./configure --prefix=/usr --mandir=/usr/share/man
make
make install

100% CPU load when instantiating simhash.Corpus

@dlecocq (following my twitter message).
Here are some more precise elements to diagnose the problem.

Install steps

Install package :
python setup.py install
/usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'dependencies'
warnings.warn(msg)
running install
running build
running build_py
running build_ext
skipping 'simhash/table.cpp' Cython extension (up-to-date)
running install_lib
running install_egg_info
Removing /home/jerry/temp/simhash/lib/python2.7/site-packages/simhash-0.1.0-py2.7.egg-info
Writing /home/jerry/temp/simhash/lib/python2.7/site-packages/simhash-0.1.0-py2.7.egg-info

(I tried an install with and without a virtualenv)

package list

pip freeze
Cython==0.18
argparse==1.2.1
-e git://github.com/seomoz/simhash-py.git@1e2039d#egg=simhash-dev
wsgiref==0.1.2

Judy installed without any error message.

Lots of differing bits if doc1 = prefix + doc2

I guess this is not desired:

import re
import simhash

a = 'a b c d e f g h i j'
b = 'x ' + a
c = a + ' x'

# Copied from tests.py
def compute(text):
    # Lowercase and split on non-word characters
    tokens = re.split(r'\W+', text.lower(), flags=re.UNICODE)
    # Overlapping 4-character shingles over the concatenated tokens
    shingles = [''.join(shingle) for shingle in
                simhash.shingle(''.join(tokens), 4)]
    hashes = [simhash.unsigned_hash(s.encode('utf8')) for s in shingles]
    return simhash.compute(hashes)

print(simhash.num_differing_bits(compute(a), compute(b)))

outputs 27 although

print(simhash.num_differing_bits(compute(a), compute(c)))

outputs 0.
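To check whether the asymmetry comes from the shingling step, the sketch below reproduces tokenizing and shingling in plain Python. It assumes that `simhash.shingle` yields a plain sliding window over its input; `char_shingles` is a hypothetical stand-in, not part of the library.

```python
import re


def char_shingles(text, window=4):
    """Sliding-window shingles over the concatenated, lowercased tokens."""
    tokens = re.split(r'\W+', text.lower(), flags=re.UNICODE)
    joined = ''.join(tokens)
    return [joined[i:i + window] for i in range(len(joined) - window + 1)]


a = 'a b c d e f g h i j'
b = 'x ' + a
c = a + ' x'

# Shingles that appear only in the prefixed / suffixed document.
only_in_b = set(char_shingles(b)) - set(char_shingles(a))
only_in_c = set(char_shingles(c)) - set(char_shingles(a))
print(sorted(only_in_b))  # ['xabc']
print(sorted(only_in_c))  # ['hijx']
```

Under that assumption, b and c each add exactly one shingle to the seven shingles of a, so the shingle sets alone do not explain a result of 27 in one case and 0 in the other.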

from .table import PyCorpus as Corpus

After running "sudo python setup.py install":

Python 2.7.3 (default, Aug 1 2012, 05:14:39)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import simhash
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "simhash/__init__.py", line 3, in <module>
    from .table import PyCorpus as Corpus
ImportError: No module named table
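The relative path to the package's init file in the traceback suggests that Python may have imported the `simhash/` source directory from the current working directory instead of the installed package that contains the compiled `table` extension. This is an assumption, not something confirmed in the report. A small hypothetical helper to look for that kind of shadowing:

```python
import os
import sys


def shadowing_candidates(package, search_path=None):
    """Return directories on the import path that contain `package/__init__.py`.

    The first hit wins at import time; if it is a source checkout without
    the compiled extension, it shadows the installed copy.
    """
    search_path = sys.path if search_path is None else search_path
    hits = []
    for entry in search_path:
        # An empty entry means "current working directory".
        base = entry or os.getcwd()
        if os.path.isfile(os.path.join(base, package, "__init__.py")):
            hits.append(os.path.abspath(base))
    return hits


print(shadowing_candidates("simhash"))
```

If the first directory listed is the source checkout, starting the interpreter from a different directory is a quick way to test the hypothesis.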

Building on Windows

I was wondering if anyone is using simhash-py on Windows?
As far as I understand, on Windows the 64-bit gcc (mingw64) is still experimental and it's better to use Microsoft compilers; the compiler used to build Python extensions also depends on the Python version.

I have done some tests for building simhash-py with conda in the rth:win-ci branch (it is based on top of PR #27). The output of the builds is available in Appveyor CI, and the situation is the following. For all Python versions, because of a bug in distutils on 64-bit Windows when using Visual Studio ([1], [2]), setuptools has to be used instead. Then:

  • Python 2.7: the build fails because the compiler does not find stdint.h, which is not included in Visual Studio 2008 (the compiler used to build extensions for Python 2.7-3.2).

  • Python 3.4: with Visual Studio 2010, there is a syntax error (probably some compiler flag is missing; I'm not very familiar with VS compiler flags):

    c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\Include\xlocale(323) : warning C4530: C++ exception handler used, but unwind semantics are not enabled. Specify /EHsc
    simhash\simhash-cpp\src\permutation.cpp(37) : warning C4267: 'initializing' : conversion from 'size_t' to 'int', possible loss of data
    simhash\simhash-cpp\src\permutation.cpp(101) : error C2143: syntax error : missing ',' before ':'
    simhash\simhash-cpp\src\permutation.cpp(101) : error C2530: 'choice' : references must be initialized
    simhash\simhash-cpp\src\permutation.cpp(102) : error C2143: syntax error : missing ';' before '{'
    simhash\simhash-cpp\src\permutation.cpp(104) : error C2143: syntax error : missing ',' before ':'
    simhash\simhash-cpp\src\permutation.cpp(105) : error C2143: syntax error : missing ';' before '{'
    simhash\simhash-cpp\src\permutation.cpp(147) : warning C4334: '<<' : result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)
    
  • Python 3.5: with Visual Studio 2015, the build passes, but there appears to be an infinite loop when running test_basic (test.TestFindAll). Because TestFindAll mostly calls the simhash-cpp code directly, I guess this is more a "how to build simhash-cpp with VS 2015" issue.

I'm mostly interested in making at least Python 3.5 work on Windows (and don't have much experience with building C++ code on Windows). @dlecocq, would you have any suggestions on how the above issue for Python 3.5 could be debugged? Thanks!

Explanation in README.md is somewhat hard to follow.

How can I actually use this library to detect duplicate documents?
The insert function only accepts integers, so I suppose I should implement MinHash myself (is that true?) and then use your simhash library to detect near-duplicates.
How can I connect this to the simhash-db you provided? Does that need something that users should implement themselves?
Sorry for being a noob in the near-duplicate detection business 😄 and thank you for sharing your work with us 👍
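For orientation, the README describes the pipeline as: tokenize, build overlapping shingles, hash each shingle, and feed those hashes to simhash; the resulting 64-bit integers are what gets compared or inserted. The sketch below walks through those steps in plain Python so it runs without the extension. All function names here are hypothetical stand-ins rather than the library's API, and the tie-breaking rule in `simhash64` is an assumption.

```python
import hashlib
import re
from itertools import combinations


def shingle_hashes(text, window=4):
    """Tokenize, build overlapping word shingles, hash each to 64 bits."""
    tokens = [t for t in re.split(r'\W+', text.lower()) if t]
    count = max(len(tokens) - window + 1, 1)
    shingles = [' '.join(tokens[i:i + window]) for i in range(count)]
    return [
        int.from_bytes(hashlib.sha1(s.encode('utf8')).digest()[:8], 'big')
        for s in shingles
    ]


def simhash64(hashes):
    """Bitwise majority vote over the input hashes (a tie clears the bit)."""
    result = 0
    for bit in range(64):
        votes = sum(1 if (h >> bit) & 1 else -1 for h in hashes)
        if votes > 0:
            result |= 1 << bit
    return result


def differing_bits(a, b):
    """Hamming distance between two 64-bit fingerprints."""
    return bin(a ^ b).count('1')


docs = {
    'doc1': 'the quick brown fox jumps over the lazy dog near the river bank',
    'doc2': 'the quick brown fox jumps over the lazy dog near the river bank',
    'doc3': 'completely unrelated text about compiling extensions on Windows',
}
fingerprints = {name: simhash64(shingle_hashes(text))
                for name, text in docs.items()}

# Brute-force comparison of every pair; fine for a handful of documents.
for (n1, f1), (n2, f2) in combinations(fingerprints.items(), 2):
    print(n1, n2, differing_bits(f1, f2))
```

With the library installed, the README's `simhash.shingle`, `simhash.compute`, `simhash.num_differing_bits` and `simhash.find_all` take the place of these stand-ins, so there is no need to implement MinHash separately.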

Choice of the hashing function for shingles

simhash-py currently uses the Python internal hash function to hash shingles. While this works well on Python 2.7, it is not ideal when extending this implementation to later Python versions, because:

  • As of Python 3.3, the internal hash function is salted with a random value by default:

    Note: By default, the hash() values of str, bytes and datetime objects are “salted” with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.

    This is intended to provide protection against a denial-of-service caused by carefully-chosen inputs that exploit the worst case performance of a dict insertion, O(n^2) complexity. See http://www.ocert.org/advisories/ocert-2011-003.html for details.
    Changing hash values affects the iteration order of dicts, sets and other mappings. Python has never made guarantees about this ordering (and it typically varies between 32-bit and 64-bit builds).

    See also PYTHONHASHSEED.

    This can, however, be disabled by setting the environment variable PYTHONHASHSEED=0.

  • There is actually no guarantee on the implementation of the builtin hash function, and it appears to have been changed in Python 3.4:

    • Python 2.7:

      >>> hash('test')
      2314058222102390712
      
    • Python 3.3 with PYTHONHASHSEED=0:

      >>> hash('test')
      2314058222102390712
      
    • Python 3.4 with PYTHONHASHSEED=0:

      >>> hash('test')
      4418353137104490830
      

      which leads to the TestFunctional.test_added_text test failing on Python 3.4-3.5 in PR #27, as the number of differing bits found changes from 3 to 5 with this different hashing mechanism.
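The effect of the seed can be observed from a single script by computing the hash in fresh interpreters. This is a minimal sketch; `hash_in_fresh_interpreter` is a hypothetical helper, and the actual values depend on the Python version and build.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(value, seed):
    """Return hash(value) from a new interpreter run with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash(%r))" % value], env=env)
    return int(out)


# Same seed: reproducible across runs of the same interpreter.
first = hash_in_fresh_interpreter('test', 0)
second = hash_in_fresh_interpreter('test', 0)
print(first == second)

# 'random' selects the default salted behaviour, so this normally changes
# from one run to the next.
print(hash_in_fresh_interpreter('test', 'random'))
```

Fixing the seed makes results repeatable for one interpreter, but as the values above show, it does not make them comparable across Python versions.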

As far as I understand, the actual algorithm used to hash shingles is not critical, as long as it is consistent, is it? How many bits should the hash be: 64? Is it possible to use fewer? For instance, could one use the 32-bit version of MurmurHash3 (which is used to hash character or word n-grams in scikit-learn and Spark)?

Edit: a subsidiary question: assuming that we have 32-bit hashed shingles, what would be the best values to provide to the 64-bit simhash?
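One way to get a hash that is consistent across interpreter runs and Python versions is to truncate a digest from `hashlib`. The sketch below is an illustration of that idea, not what simhash-py does; `stable_hash64` is a hypothetical name, and BLAKE2b with an 8-byte digest is just one possible choice (it requires Python 3.6+).

```python
import hashlib


def stable_hash64(shingle):
    """64-bit unsigned hash of a text shingle, independent of PYTHONHASHSEED."""
    digest = hashlib.blake2b(shingle.encode('utf8'), digest_size=8).digest()
    return int.from_bytes(digest, 'big')


hashes = [stable_hash64(s) for s in ('abcd', 'bcde', 'cdef')]
print(all(0 <= h < 2 ** 64 for h in hashes))  # True
```

A cryptographic digest is slower than MurmurHash, so this trades speed for having no extra dependency.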

  • 32bit of 0, then 32 bit of hash
  • 32bit of hash repeated twice
  • some function of this 32 bit hash returning 64 bit?
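The three options can be compared with a small pure-Python experiment. Everything here is a sketch under stated assumptions: `simhash64` is a plain majority-vote stand-in for the library's compute step, `zlib.crc32` stands in for a 32-bit MurmurHash3 (which is not in the standard library), and `mix64` uses the splitmix64 finalizer as one example of a 32-to-64-bit mixing function.

```python
import zlib

MASK64 = (1 << 64) - 1


def simhash64(hashes):
    """Bitwise majority vote over the input hashes (a tie clears the bit)."""
    result = 0
    for bit in range(64):
        votes = sum(1 if (h >> bit) & 1 else -1 for h in hashes)
        if votes > 0:
            result |= 1 << bit
    return result


def zero_extend(h32):
    """Option 1: 32 zero bits followed by the 32-bit hash."""
    return h32


def repeat_twice(h32):
    """Option 2: the 32-bit hash repeated in both halves."""
    return (h32 << 32) | h32


def mix64(h32):
    """Option 3: spread the 32-bit value over 64 bits (splitmix64 finalizer)."""
    z = (h32 + 0x9E3779B97F4A7C15) & MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


shingles = ['abcd', 'bcde', 'cdef', 'defg', 'efgh', 'fghi', 'ghij']
h32 = [zlib.crc32(s.encode('utf8')) & 0xFFFFFFFF for s in shingles]

zero = simhash64([zero_extend(h) for h in h32])
twice = simhash64([repeat_twice(h) for h in h32])
mixed = simhash64([mix64(h) for h in h32])

print(zero >> 32 == 0)                     # True: upper half is never set
print(twice >> 32 == twice & 0xFFFFFFFF)   # True: upper half mirrors the lower
print(hex(mixed))
```

With zero-extension the result is effectively a 32-bit simhash, and with repetition every distance is simply doubled, so neither adds information over 32 bits. A mixing function makes all 64 output bits vary, although the input still has at most 2^32 distinct values.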

Python 3 compatible PyPI release

Thanks for this project! Is it time for a PyPI release? The currently published version isn't compatible with Python 3, but the GitHub version is working for me (tested compute() running on Python 3.7 on Debian and macOS).
