Comments (2)
Hi Abhi,
In these examples, since one of the two words in each name matches, you should get a 50% match.
For example, suppose these names are part of a list of names:
List<String> sourceString = Arrays.asList("A Mathur", "ABhishek Mathur", "Donald Trump", "D Trump");
We just need to feed the library one Document per name, each containing an Element of type NAME.
AtomicInteger idCount = new AtomicInteger();
List<Document> sourceDoc = sourceString.stream().map(name -> {
    return new Document.Builder(idCount.incrementAndGet() + "")
            .addElement(new Element.Builder().setType(NAME).setValue(name).createElement())
            .setThreshold(0.4)
            .createDocument();
}).collect(Collectors.toList());
Map<String, List<Match<Document>>> result = matchService.applyMatchByDocIdOld(sourceDoc);
Note that each document needs a key; you can supply your own unique key for these.
Also, we need to reduce the Document threshold a little, since by default a document is considered a match only when its score is greater than 0.5.
You should be able to see the match results by printing them to the console:
result.entrySet().forEach(entry -> {
    entry.getValue().forEach(match -> {
        System.out.println("Data: " + match.getData() + " Matched With: " + match.getMatchedWith() + " Score: " + match.getScore().getResult());
    });
});
Result
Data: {[{'A Mathur'}]} Matched With: {[{'ABhishek Mathur'}]} Score: 0.5
Data: {[{'ABhishek Mathur'}]} Matched With: {[{'A Mathur'}]} Score: 0.5
Data: {[{'Donald Trump'}]} Matched With: {[{'D Trump'}]} Score: 0.5
Data: {[{'D Trump'}]} Matched With: {[{'Donald Trump'}]} Score: 0.5
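The 0.5 scores above are consistent with one of two words matching. As a rough illustration of that arithmetic, and of why a threshold of 0.5 with a "greater than" comparison excludes these pairs while 0.4 includes them, here is a minimal plain-Java sketch. It is not the library's implementation: the class `TokenOverlapSketch` and its methods are hypothetical, and it uses exact word equality where the library may tokenize and compare differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not part of fuzzy-matcher.
public class TokenOverlapSketch {

    // Split a name into lower-cased word tokens.
    static Set<String> tokens(String name) {
        return new HashSet<>(Arrays.asList(name.toLowerCase(Locale.ROOT).trim().split("\\s+")));
    }

    // Fraction of word tokens shared by both names, relative to the longer name.
    static double score(String left, String right) {
        Set<String> a = tokens(left);
        Set<String> b = tokens(right);
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return (double) common.size() / Math.max(a.size(), b.size());
    }

    // Assumed comparison: a pair matches only when the score is strictly greater than the threshold.
    static boolean isMatch(String left, String right, double threshold) {
        return score(left, right) > threshold;
    }

    public static void main(String[] args) {
        System.out.println(score("A Mathur", "ABhishek Mathur"));         // 0.5
        System.out.println(isMatch("A Mathur", "ABhishek Mathur", 0.5));  // false
        System.out.println(isMatch("A Mathur", "ABhishek Mathur", 0.4));  // true
    }
}
```

Only "mathur" is shared, so the score is 1 / 2 = 0.5, which does not exceed 0.5 but does exceed 0.4.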
from fuzzy-matcher.
Thanks, Manish, for the detailed response. However, I fear that reducing the threshold to under 0. will start matching Michael to Mitchell, and D Trump to J Trump. I am looking at the Rosette APIs to see how they handle this, though their code is not open source.
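To make the concern concrete: under a plain word-overlap score, "D Trump" shares one of two words with both "Donald Trump" and "J Trump", so a lowered threshold cannot tell them apart. One possible way to separate the cases is to let a single-letter token match a longer word that starts with that letter. The sketch below is only an assumption about how this could be approached; it is not how fuzzy-matcher or Rosette works, and `InitialAwareSketch` and its methods are hypothetical names.

```java
import java.util.Locale;

// Hypothetical illustration only; not part of fuzzy-matcher or Rosette.
public class InitialAwareSketch {

    // Two tokens match if they are equal, or if one is a single-letter
    // initial and the other is a longer word starting with that letter.
    static boolean tokenMatches(String x, String y) {
        if (x.equals(y)) return true;
        if (x.length() == 1 && y.length() > 1) return y.startsWith(x);
        if (y.length() == 1 && x.length() > 1) return x.startsWith(y);
        return false;
    }

    // Greedily pair each token of the left name with an unused token of the
    // right name, then divide the number of pairs by the longer token count.
    static double score(String left, String right) {
        String[] a = left.toLowerCase(Locale.ROOT).trim().split("\\s+");
        String[] b = right.toLowerCase(Locale.ROOT).trim().split("\\s+");
        boolean[] used = new boolean[b.length];
        int matched = 0;
        for (String x : a) {
            for (int i = 0; i < b.length; i++) {
                if (!used[i] && tokenMatches(x, b[i])) {
                    used[i] = true;
                    matched++;
                    break;
                }
            }
        }
        return (double) matched / Math.max(a.length, b.length);
    }

    public static void main(String[] args) {
        System.out.println(score("D Trump", "Donald Trump")); // 1.0
        System.out.println(score("D Trump", "J Trump"));      // 0.5
        System.out.println(score("Michael", "Mitchell"));     // 0.0
    }
}
```

With this scoring, the default "greater than 0.5" cutoff would keep D Trump / Donald Trump and reject D Trump / J Trump, at the cost of treating any initial as compatible with any word that begins with it.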