robertgr991 / fastdameraulevenshtein
Cython implementation of true Damerau-Levenshtein algorithm.
License: MIT License
After failing to install via pip3, I tried to build directly from source, and I believe I hit the same problem. This is what python3 setup.py install produces:
running install
running bdist_egg
running egg_info
creating fastDamerauLevenshtein.egg-info
writing fastDamerauLevenshtein.egg-info/PKG-INFO
writing dependency_links to fastDamerauLevenshtein.egg-info/dependency_links.txt
writing top-level names to fastDamerauLevenshtein.egg-info/top_level.txt
writing manifest file 'fastDamerauLevenshtein.egg-info/SOURCES.txt'
reading manifest file 'fastDamerauLevenshtein.egg-info/SOURCES.txt'
writing manifest file 'fastDamerauLevenshtein.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.15-x86_64/egg
running install_lib
running build_ext
building 'fastDamerauLevenshtein' extension
creating build
creating build/temp.macosx-10.15-x86_64-3.8
creating build/temp.macosx-10.15-x86_64-3.8/fastDamerauLevenshtein
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk -I/Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/usr/include -I/Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/System/Library/Frameworks/Tk.framework/Versions/8.5/Headers -I/usr/local/include -I/usr/local/opt/[email protected]/include -I/usr/local/opt/sqlite/include -I/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/include/python3.8 -c fastDamerauLevenshtein/fastDamerauLevenshtein.c -o build/temp.macosx-10.15-x86_64-3.8/fastDamerauLevenshtein/fastDamerauLevenshtein.o
fastDamerauLevenshtein/fastDamerauLevenshtein.c:4246:9: error: too many arguments to function call, expected 15, have 16
__pyx_empty_bytes /*PyObject *lnotab*/
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fastDamerauLevenshtein/fastDamerauLevenshtein.c:331:82: note: expanded from macro '__Pyx_PyCode_New'
PyCode_New(a, 0, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos)
~~~~~~~~~~ ^~~~
/usr/local/Cellar/[email protected]/3.8.5/Frameworks/Python.framework/Versions/3.8/include/python3.8/code.h:122:12: note: 'PyCode_New' declared here
PyAPI_FUNC(PyCodeObject *) PyCode_New(
^
1 error generated.
error: command 'clang' failed with exit status 1
The macro at line 326 appears to assume that, starting with v. 3.8, PyCode_New takes 16 parameters rather than 15, but the Python documentation appears to me to state otherwise.
Running setup.py install for fastDamerauLevenshtein did not run successfully.
│ exit code: 1
╰─> [12 lines of output]
/home/ec2-user/emr_cluster/lib64/python3.8/site-packages/setuptools/dist.py:642: UserWarning: Usage of dash-separated 'description-file' will not be supported in future versions. Please use the underscore name 'description_file' instead
warnings.warn(
running install
running build
running build_ext
building 'fastDamerauLevenshtein' extension
creating build
creating build/temp.linux-x86_64-3.8
creating build/temp.linux-x86_64-3.8/fastDamerauLevenshtein
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/home/ec2-user/emr_cluster/include -I/usr/include/python3.8 -c fastDamerauLevenshtein/fastDamerauLevenshtein.c -o build/temp.linux-x86_64-3.8/fastDamerauLevenshtein/fastDamerauLevenshtein.o
unable to execute 'gcc': No such file or directory
error: command 'gcc' failed with exit status 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure
× Encountered error while trying to install package.
╰─> fastDamerauLevenshtein
note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
Great package, but I just noticed a bug with the score in certain situations. If I run
damerauLevenshtein('some string', 'another one but longer', deleteWeight=1, insertWeight=3, replaceWeight=6, swapWeight=6, similarity=True)
I get a score of 0.03636... but if I run
damerauLevenshtein('some string', 'another one but longer and longer', deleteWeight=1, insertWeight=3, replaceWeight=6, swapWeight=6, similarity=True)
I get a score of 1.0 implying the two strings are identical.
From what I could see, it looks like the issue stems from the line of code
maxDist = min(len1, len2) * min(replaceWeight, deleteWeight + insertWeight) + (max(len1, len2) - min(len1, len2)) * min(deleteWeight, insertWeight)
which is (assuming I've understood your code) supposed to calculate the maximum distance as the cost of swapping out the letters in the shorter word plus the cost of adding or removing any excess letters.
But for my example strings, I believe it should use the insertWeight at the end rather than min(deleteWeight, insertWeight) - there's no way to get from string1 to string2 by deletion, it definitely needs insertion. So I think basically the min() needs to be replaced with an if that checks whether insertions or deletions will be required to get from string1 to string2.
I'm running Python 3.7.3 and fastDamerauLevenshtein v1.0.7.
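To make the arithmetic concrete, here is a small sketch of the bound as quoted above next to the direction-aware variant proposed in this report. The function names are hypothetical and this is not the package's code; it only reproduces the quoted formula and assumes the first string is being transformed into the second.

```python
def max_dist_quoted(len1, len2, delete_w, insert_w, replace_w):
    """Upper bound exactly as in the quoted line of code."""
    shorter, longer = min(len1, len2), max(len1, len2)
    per_char = min(replace_w, delete_w + insert_w)
    return shorter * per_char + (longer - shorter) * min(delete_w, insert_w)


def max_dist_directional(len1, len2, delete_w, insert_w, replace_w):
    """Proposed variant (assumption): excess characters must be inserted
    when the target is longer, and deleted when it is shorter."""
    shorter, longer = min(len1, len2), max(len1, len2)
    per_char = min(replace_w, delete_w + insert_w)
    excess_w = insert_w if len2 > len1 else delete_w
    return shorter * per_char + (longer - shorter) * excess_w


s1 = 'some string'
s2 = 'another one but longer and longer'
weights = dict(delete_w=1, insert_w=3, replace_w=6)

# Only insertions can make a string longer, so at least
# (len(s2) - len(s1)) insertions are needed to reach s2 from s1.
min_insert_cost = (len(s2) - len(s1)) * weights['insert_w']

print(max_dist_quoted(len(s1), len(s2), **weights))       # 66
print(min_insert_cost)                                    # 66
print(max_dist_directional(len(s1), len(s2), **weights))  # 110
```

With these weights the quoted bound (66) is no larger than the cost of the unavoidable insertions alone (66), so the true distance, which must also account for the remaining 11 characters, can exceed it. The direction-aware bound (110) does not have that problem.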
Cannot install it after migrating to a Mac M2.
Similar issue as here
pip install fastdameraulevenshtein
Collecting fastdameraulevenshtein
Using cached fastDamerauLevenshtein-1.0.7.tar.gz (36 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: fastdameraulevenshtein
Building wheel for fastdameraulevenshtein (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for fastdameraulevenshtein (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [28 lines of output]
/tmp/pip-build-env-3upkms0f/overlay/lib/python3.11/site-packages/setuptools/dist.py:745: SetuptoolsDeprecationWarning: Invalid dash-separated options
!!
********************************************************************************
Usage of dash-separated 'description-file' will not be supported in future
versions. Please use the underscore name 'description_file' instead.
By 2023-Sep-26, you need to update your project and remove deprecated calls
or your builds will no longer be supported.
See https://setuptools.pypa.io/en/latest/userguide/declarative_config.html for details.
********************************************************************************
!!
opt = self.warn_dash_deprecation(opt, section)
running bdist_wheel
running build
running build_ext
building 'fastDamerauLevenshtein' extension
creating build
creating build/temp.linux-x86_64-cpython-311
creating build/temp.linux-x86_64-cpython-311/fastDamerauLevenshtein
gcc -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/opt/asl-env/include -I/usr/local/include/python3.11 -c fastDamerauLevenshtein/fastDamerauLevenshtein.c -o build/temp.linux-x86_64-cpython-311/fastDamerauLevenshtein/fastDamerauLevenshtein.o
fastDamerauLevenshtein/fastDamerauLevenshtein.c:209:12: fatal error: longintrepr.h: No such file or directory
209 | #include "longintrepr.h"
| ^~~~~~~~~~~~~~~
compilation terminated.
error: command '/usr/bin/gcc' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for fastdameraulevenshtein
Failed to build fastdameraulevenshtein
3.11.4
Hello, and thank you for your project. I have one question and hope you can help.
I have two very large arrays of integers (values from 0 to 100), and I want to find out how similar they are. I thought I could convert these lists to lists of strings and then use your string similarity metric. However, the arrays are very large (about 16,000 elements), and there is not enough memory.
Is there a way to calculate this metric approximately so that it uses less memory, or to reduce the arrays somehow?
Thank you very much for any help.
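One possible direction, sketched under assumptions: the restricted "optimal string alignment" (OSA) variant only needs the previous two rows of the dynamic-programming table, so memory grows with the length of the shorter array rather than with the product of both lengths. This is not this package's algorithm, the function name is hypothetical, it uses unit weights, and OSA can return a larger value than the true Damerau-Levenshtein distance for some inputs. In pure Python it will also be slow for 16,000 elements.

```python
from typing import Sequence


def osa_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Optimal string alignment distance with unit weights, three rolling rows."""
    # With unit weights the distance is symmetric, so keep the shorter
    # sequence along the row to minimise the row length.
    if len(a) < len(b):
        a, b = b, a
    n = len(b)
    prev2 = [0] * (n + 1)      # row i-2 (unused until i >= 2)
    prev = list(range(n + 1))  # row i-1: distance from the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # match or replacement
            # Transposition of two adjacent elements.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                cur[j] = min(cur[j], prev2[j - 2] + 1)
        prev2, prev = prev, cur
    return prev[n]


print(osa_distance([1, 2, 3], [1, 3, 2]))  # 1 (one adjacent swap)
print(osa_distance([3, 1], [1, 2, 3]))     # 3
```

Working on the integer lists directly also avoids converting them to strings first.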