Yes, this scenario is supported through configuration, and it follows a pattern similar to the one described in NGram token match.
The basic idea is to break each element into subsets of characters (grams) and match those. In the example you gave, the two strings would be broken down like this (assuming tri-grams):
STP375S-B60/Wnh_1500V_20V02_1756 -> ['STP', 'TP3', 'P37', '375', '75S', ...., 'Wnh', 'nh_', 'h_1' ..... '756']
STP 375S-B60/Wnh -> ['STP', 'TP3', 'P37', '375', '75S', ...., 'Wnh']
When the results are calculated, the matching grams raise the score for strings that share similar substrings.
You will most likely never get a 100% match, but you can set a lower threshold that works for your use case.
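To make the gram idea concrete, here is a minimal, self-contained Java sketch (not fuzzy-matcher code) that tri-grams both strings and scores them. It assumes whitespace is stripped before tokenizing (which matches the gram lists above), uses Jaccard similarity as a stand-in for whatever scoring rule the library actually applies, and uses a hypothetical threshold of 0.4.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Minimal sketch of tri-gram matching; not fuzzy-matcher's implementation.
class TriGramSketch {

    // Split a string into overlapping grams of length n.
    // Assumption: whitespace is stripped first, so "STP 375S" yields the
    // same grams as "STP375S" (as in the gram lists above).
    static Set<String> nGrams(String value, int n) {
        String s = value.replaceAll("\\s+", "");
        Set<String> grams = new LinkedHashSet<>();
        if (s.length() < n) {
            if (!s.isEmpty()) {
                grams.add(s); // too short to split: keep it as one gram
            }
            return grams;
        }
        for (int i = 0; i + n <= s.length(); i++) {
            grams.add(s.substring(i, i + n));
        }
        return grams;
    }

    // Jaccard similarity (shared grams / all distinct grams), used here as a
    // stand-in for the library's real scoring rule.
    static double score(Set<String> a, Set<String> b) {
        Set<String> union = new LinkedHashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> shared = new LinkedHashSet<>(a);
        shared.retainAll(b);
        return (double) shared.size() / union.size();
    }

    public static void main(String[] args) {
        Set<String> full = nGrams("STP375S-B60/Wnh_1500V_20V02_1756", 3);
        Set<String> partial = nGrams("STP 375S-B60/Wnh", 3);
        double s = score(full, partial);
        double threshold = 0.4; // hypothetical value; tune it for your data
        // prints "score=0.433 match=true": 13 shared grams out of 30 distinct
        System.out.println(String.format(Locale.ROOT, "score=%.3f match=%b", s, s >= threshold));
    }
}
```

The score stays well below 1.0 because the longer string contributes 17 grams the shorter one lacks, which is why a lowered threshold is needed rather than an exact match.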
Here are the configurable items I would recommend experimenting with:
- Use a Tokenizer function: each Element object lets you override its tokenizer with pre-defined functions. There are triGramTokenizer and decaGramTokenizer that you can experiment with. You can also define a custom one (see the unit tests for an example) and pass in a function that creates grams of a different length.
- Change the default threshold for the Element match, if needed.
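The effect of gram length can be sketched in plain Java, without the library. The `nGramTokenizer(int n)` factory below is hypothetical: it only illustrates the idea of passing in a function that produces a different gram length. fuzzy-matcher's own tokenizer functions operate on Element and Token objects, so the real signature differs. Whitespace stripping and Jaccard scoring are assumptions, as before.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

class GramLengthComparison {

    // Hypothetical factory: returns a tokenizer that splits a string into
    // overlapping grams of length n after stripping whitespace.
    // Strings shorter than n produce no grams.
    static Function<String, Set<String>> nGramTokenizer(int n) {
        return value -> {
            String s = value.replaceAll("\\s+", "");
            Set<String> grams = new HashSet<>();
            for (int i = 0; i + n <= s.length(); i++) {
                grams.add(s.substring(i, i + n));
            }
            return grams;
        };
    }

    // Jaccard similarity of the two gram sets produced by the tokenizer.
    static double similarity(Function<String, Set<String>> tokenizer, String a, String b) {
        Set<String> ga = tokenizer.apply(a);
        Set<String> gb = tokenizer.apply(b);
        Set<String> union = new HashSet<>(ga);
        union.addAll(gb);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> shared = new HashSet<>(ga);
        shared.retainAll(gb);
        return (double) shared.size() / union.size();
    }

    public static void main(String[] args) {
        String a = "STP375S-B60/Wnh_1500V_20V02_1756";
        String b = "STP 375S-B60/Wnh";
        // prints n=3 score=0.433, n=5 score=0.393, n=10 score=0.261
        for (int n : new int[] {3, 5, 10}) {
            double s = similarity(nGramTokenizer(n), a, b);
            System.out.println(String.format(Locale.ROOT, "n=%d score=%.3f", n, s));
        }
    }
}
```

For these two strings, longer grams give a lower score, so a threshold that works with tri-grams would probably need to be lowered when switching to deca-grams.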
Once you have a good general solution for matching product names, using either of these configurations or any additional code, I will be happy to include it as a new element type. Feel free to open a PR to get this going.
Hope this helps.
Thanks
from fuzzy-matcher.
Closing this issue. Feel free to open a new one if additional support is needed.
Related Issues
- Matching On Single Word
- Matching two strings
- comparing two string with different dimension
- Language Supported
- Fuzzy matching issue : only fetching the exact match
- Upgrade to Java 11
- Combine Tokenizers for better results
- Phone number assumed to be a US number
- Help
- Kotlin not support
- Name List matcher
- Is there any way to create my own matchers?
- SLF4J Failed to load
- upgrade commons-text to a non-vulnerable version
- Information on Library usage
- Though there is matching result but matcher is not returning.
- How to use getScore in Element class? what is the matchingCount?
- Questions
- Cross-Language Fuzzy Matching: Arabic Document Matching returns 0 matches