Comments (4)
Hi Nayan,
Your usage of the library looks correct.
- Regarding the 2 results being displayed: the intention is to show which documents match which others. If you prefer, you can use applyMatchByGroups, which collects all the matching elements into a single group and displays them together.
- Numbers as input should not be a problem for matching. I think your match score may have fallen below the default threshold (0.5). Each element is split on spaces into tokens, and those tokens are matched against the tokens of other elements. In your example, "Nayan J Bayan 123" has 4 tokens, so if another element shares more than 2 similar tokens with it, the two should match.
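To make the token arithmetic above concrete, here is a minimal sketch. It is not the library's implementation: the class and method names are hypothetical, tokens are compared by exact case-insensitive equality, and the strict greater-than comparison against the threshold is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; fuzzy-matcher's real scoring may differ.
class TokenOverlapSketch {

    // Fraction of the space-separated tokens in `value` that also occur in `other`.
    static double overlap(String value, String other) {
        Set<String> otherTokens = new HashSet<>(
                Arrays.asList(other.toLowerCase(Locale.ROOT).trim().split("\\s+")));
        String[] tokens = value.toLowerCase(Locale.ROOT).trim().split("\\s+");
        int shared = 0;
        for (String token : tokens) {
            if (otherTokens.contains(token)) {
                shared++;
            }
        }
        return (double) shared / tokens.length;
    }

    public static void main(String[] args) {
        double threshold = 0.5; // the default threshold mentioned in the reply

        // 3 of the 4 tokens are shared, so the score clears the threshold.
        double a = overlap("Nayan J Bayan 123", "Nayan J 123");
        System.out.println(a + " -> " + (a > threshold)); // 0.75 -> true

        // Only 2 of the 4 tokens are shared: 0.5 is not strictly above 0.5.
        double b = overlap("Nayan Jyoti Bayan Test", "Nayan Jyoti B T");
        System.out.println(b + " -> " + (b > threshold)); // 0.5 -> false
    }
}
```

Under these assumptions, a pair that shares exactly half of its tokens sits right on the threshold and is not reported.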
Here is your example modified to use applyMatchByGroups:
String[] input = new String[]{"Nayan J 123", "Nayan J Bayan 123"};
Document document = null;
List<Document> documentList = new ArrayList<>();
for (String str : input) {
    document = new Document.Builder(str)
            .addElement(new Element.Builder<String>().setValue(str).setType(ElementType.TEXT).createElement())
            .createDocument();
    documentList.add(document);
}
Set<Set<Match<Document>>> result = matchService.applyMatchByGroups(documentList);
result.forEach(entry -> {
    entry.forEach(match -> {
        System.out.println("Data: " + match.getData() + " Matched With: " + match.getMatchedWith() + " Score: "
                + match.getScore().getResult());
    });
});
Hope this helps
Thanks
from fuzzy-matcher.
Hi Manish,
Thanks for the prompt response. It was indeed helpful.
document = new Document.Builder(str).addElement(new Element.Builder<String>().setValue(str).setType(ElementType.TEXT).setThreshold(0.0).createElement()).createDocument();
I played with a few more string pairs after setting the threshold value to 0.0 as above:
- "Vrij Bhooshan" & "VRIJA BHOOSHAN"
- "Mohammad Ashfaque" & "MOHAMMED ASHFAQUE"
- "Nayan Bayan" & "Nayan Jyoti"
- "Nayan Jyoti Bayan Test" & "Nayan Jyoti B T"
However, none of them returned any result. Is this expected behaviour?
Edit:
- I tried setting the match type to Nearest Neighbour, but on running it I got the exception below:
com.intuit.fuzzymatcher.exception.MatchException: Data Type not supported
I tried the same with ElementType.NAME and got the same exception.
- I changed the element type to NAME and was able to get a very good score of 1.0 for pairs 1 and 2. However, no luck with the other two.
Thanks
Hi Nayan,
ElementType.NAME is a better choice for this kind of match. It uses Soundex to match names, so misspelled or closely spelled variants of a name are still treated as the same.
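As a rough illustration of why Soundex pairs up the spelling variants from the question, here is a textbook American Soundex sketch. It is not the library's code (the class name is hypothetical, and the library may preprocess names differently), but it shows the idea of reducing each name token to a short phonetic code.

```java
import java.util.Locale;

// Textbook American Soundex, written for illustration; not fuzzy-matcher's own code.
class SoundexSketch {

    // Digit for each letter A..Z; '0' marks letters that carry no code (vowels, H, W, Y).
    private static final String CODES = "01230120022455012623010202";

    static String soundex(String word) {
        String s = word.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder().append(s.charAt(0));
        // Remember the first letter's code so an immediate repeat of it is skipped.
        char last = CODES.charAt(s.charAt(0) - 'A');
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            if (c == 'H' || c == 'W') {
                continue; // H and W do not separate two letters with the same code
            }
            char code = CODES.charAt(c - 'A');
            if (code != '0' && code != last) {
                out.append(code);
            }
            last = code; // a vowel resets this to '0', so a repeated code after it counts again
        }
        while (out.length() < 4) {
            out.append('0'); // pad to one letter plus three digits
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(soundex("Mohammad") + " " + soundex("MOHAMMED")); // M530 M530
        System.out.println(soundex("Vrij") + " " + soundex("VRIJA"));        // V620 V620
        System.out.println(soundex("Bayan") + " " + soundex("Jyoti"));       // B500 J300
    }
}
```

With this encoding, "Mohammad"/"MOHAMMED" and "Vrij"/"VRIJA" collapse to the same code, while "Bayan" and "Jyoti" do not, which is consistent with pairs 1 and 2 scoring 1.0.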
Nearest Neighbour matching is a better choice for numeric and date elements, where values are close to each other but not identical.
I see you have reduced the threshold value. If you set it on the Document instead of the Element, you will see them match:
String[] input = new String[]{"Nayan Jyoti Bayan Test", "Nayan Jyoti B T"};
Document document = null;
List<Document> documentList = new ArrayList<>();
for (String str : input) {
    document = new Document.Builder(str)
            .addElement(new Element.Builder<String>().setValue(str).setType(ElementType.NAME).createElement())
            .setThreshold(0.49)
            .createDocument();
    documentList.add(document);
}
Set<Set<Match<Document>>> result = matchService.applyMatchByGroups(documentList);
result.forEach(entry -> {
    entry.forEach(match -> {
        System.out.println("Data: " + match.getData() + " Matched With: " + match.getMatchedWith() + " Score: "
                + match.getScore().getResult());
    });
});
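One way to picture why lowering only the element threshold did not help is a two-stage gate. The sketch below is an assumption, not something confirmed in this thread: it supposes that a pair must first clear the element threshold and then the document threshold, both with a strict greater-than comparison, and that with a single element the document score equals the element score. All names are hypothetical.

```java
// Hypothetical two-stage threshold gate; not fuzzy-matcher's implementation.
class ThresholdGateSketch {

    static boolean matches(double elementScore, double elementThreshold, double documentThreshold) {
        if (!(elementScore > elementThreshold)) {
            return false; // rejected at the element level
        }
        // Assumption: one element per document, so the document score is the element score.
        double documentScore = elementScore;
        return documentScore > documentThreshold;
    }

    public static void main(String[] args) {
        double score = 0.5; // e.g. 2 of 4 name tokens agree

        // Element threshold lowered to 0.0, document threshold left at 0.5.
        System.out.println(matches(score, 0.0, 0.5));  // false

        // Document threshold lowered to 0.49, as in the reply above.
        System.out.println(matches(score, 0.0, 0.49)); // true
    }
}
```

Under these assumptions, a score of exactly 0.5 passes any element threshold below 0.5 but is still filtered out by a document threshold of 0.5, which would explain why 0.49 on the Document makes the pair appear.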
Thanks for the explanation. I was able to get scores after setting the threshold at the document level.
Thank you again for your time.
Closing the issue as my queries are answered.