Comments (3)
I just checked, and the test passes on my machine. I will investigate Fuzzy's support for Windows and Linux. Which version of Fuzzy is installed on your machine? Mine is 1.2.2.
from blocklib.
Same version for me (currently on "linux", i.e. Windows Subsystem for Linux). Another test failure gives a different clue:
________________________________________________ TestPSig.test_soundex _________________________________________________

self = <test_signature_generator.TestPSig testMethod=test_soundex>

    def test_soundex(self):
        """Test signatures generated by soundex."""
        dtuple = ('Joyce', 'Wang', 2134)
        signature_strategies = [[{'type': 'soundex', 'feature_idx': 0},
                                 {'type': 'soundex', 'feature_idx': 1},]]
>       signatures = generate_signatures(signature_strategies, dtuple)

tests/test_signature_generator.py:31:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
blocklib/signature_generator.py:141: in generate_signatures
    s = func(**config)
blocklib/signature_generator.py:79: in generate_by_soundex
    return soundex(feature)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>   ???
E   UnicodeDecodeError: 'ascii' codec can't decode byte 0xbb in position 1: ordinal not in range(128)

src/fuzzy.pyx:230: UnicodeDecodeError
So it looks like Fuzzy has some pretty serious unresolved issues. I suggest we either find an alternative implementation (perhaps https://pypi.org/project/Metaphone/#description) or drop support for soundex/metaphone, at least for now.
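For reference, one possible shape of an alternative for the soundex half is a small pure-Python function with no C extension involved. This is only a rough sketch of standard American Soundex; the name `soundex_sketch` is hypothetical, it is not Fuzzy's or blocklib's code, and it may differ from Fuzzy's output on edge cases. It assumes non-ASCII characters can simply be dropped before encoding.

```python
# Hypothetical pure-Python Soundex sketch (not Fuzzy's or blocklib's implementation).
_CODES = {}
for _letters, _digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex_sketch(name: str) -> str:
    """Return a 4-character American Soundex code, or '' if no ASCII letters."""
    # Assumption: anything outside A-Z is discarded rather than transliterated.
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""

    encoded = [letters[0]]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            # H and W do not separate letters that share a code.
            continue
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded.append(code)
        # Vowels (empty code) reset prev, so a repeated code after a vowel counts.
        prev = code
        if len(encoded) == 4:
            break

    # Pad short codes with zeros to the fixed length of 4.
    return "".join(encoded).ljust(4, "0")


if __name__ == "__main__":
    for value in ("Joyce", "Wang"):
        print(value, soundex_sketch(value))
```

If something like this were adopted, `generate_by_soundex` could call it in place of Fuzzy's `soundex`, though whether the resulting signatures match the existing ones would need checking first.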