Comments (4)

manishobhatia commented on August 16, 2024

Hi Nayan,

Your usage of the library looks correct.

  1. Regarding the 2 results being displayed: the intention is to show which documents match with which others. If you prefer, you can use applyMatchByGroups, which clubs all the matching elements into a single group and returns them together.
  2. Numbers as input should not be a problem to match. I think your match result may have fallen below the default threshold (0.5). Each element is split on spaces into tokens, and those tokens are matched against the tokens of other elements. In the example, "Nayan J Bayan 123" has 4 tokens, so a matching element with more than 2 similar tokens should match.
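
The token arithmetic in point 2 can be sketched in plain Java. This is only an illustration of the reasoning above, not the library's scoring code: the class and method names are hypothetical, and dividing by the larger token count and requiring the score to be strictly above the threshold are assumptions chosen to agree with the "more than 2 of 4 tokens" example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of fuzzy-matcher.
class TokenOverlapSketch {

    // Split on whitespace and lower-case so the comparison ignores case.
    static Set<String> tokens(String value) {
        return new HashSet<>(Arrays.asList(value.trim().toLowerCase(Locale.ROOT).split("\\s+")));
    }

    // Assumption: score = shared tokens / token count of the longer value.
    static double score(String left, String right) {
        Set<String> a = tokens(left);
        Set<String> b = tokens(right);
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b);
        return (double) shared.size() / Math.max(a.size(), b.size());
    }

    // Assumption: a pair matches only when the score is strictly above the threshold.
    static boolean matches(String left, String right, double threshold) {
        return score(left, right) > threshold;
    }

    public static void main(String[] args) {
        System.out.println(score("Nayan J 123", "Nayan J Bayan 123"));        // 0.75
        System.out.println(matches("Nayan J 123", "Nayan J Bayan 123", 0.5)); // true
        System.out.println(matches("Nayan Bayan", "Nayan Jyoti", 0.5));       // false
    }
}
```

Under this simplified model, 3 shared tokens out of 4 give 0.75 and pass the 0.5 threshold, while 1 shared token out of 2 gives exactly 0.5 and does not.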

Here is your example modified to use applyMatchByGroups:

        String[] input = new String[]{"Nayan J 123", "Nayan J Bayan 123"};

        // Build one Document per input string, each with a single TEXT element.
        List<Document> documentList = new ArrayList<>();
        for (String str : input) {
            Document document = new Document.Builder(str)
                    .addElement(new Element.Builder<String>().setValue(str).setType(ElementType.TEXT).createElement())
                    .createDocument();
            documentList.add(document);
        }

        // matchService is the MatchService instance from your original example.
        // applyMatchByGroups returns matching documents clubbed into groups.
        Set<Set<Match<Document>>> result = matchService.applyMatchByGroups(documentList);

        // Print every match in every group together with its score.
        result.forEach(entry -> {
            entry.forEach(match -> {
                System.out.println("Data: " + match.getData() + " Matched With: " + match.getMatchedWith() + " Score: "
                        + match.getScore().getResult());
            });
        });
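
To illustrate the difference between pairwise results and grouped results, here is a minimal sketch in plain Java that merges match pairs into groups. It does not use the library: GroupingSketch and group are hypothetical names, and treating groups as transitive (anything linked by a chain of matches lands in the same group) is an assumption about what "clubbing" means, not a statement about how applyMatchByGroups is implemented.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of fuzzy-matcher.
class GroupingSketch {

    // Merge pairwise matches into groups of mutually linked items.
    static Collection<Set<String>> group(List<String[]> pairs) {
        Map<String, Set<String>> groupOf = new LinkedHashMap<>();
        for (String[] pair : pairs) {
            // Start a new group and pull in whatever groups the two items already belong to.
            Set<String> merged = new TreeSet<>();
            for (String item : pair) {
                merged.add(item);
                Set<String> existing = groupOf.get(item);
                if (existing != null) {
                    merged.addAll(existing);
                }
            }
            // Point every member at the merged group.
            for (String item : merged) {
                groupOf.put(item, merged);
            }
        }
        // Equal groups collapse into one entry.
        return new LinkedHashSet<>(groupOf.values());
    }

    public static void main(String[] args) {
        List<String[]> pairs = new ArrayList<>();
        // A pairwise result reports the same match from both sides.
        pairs.add(new String[]{"Nayan J 123", "Nayan J Bayan 123"});
        pairs.add(new String[]{"Nayan J Bayan 123", "Nayan J 123"});
        // Prints a single group that contains both strings.
        System.out.println(group(pairs));
    }
}
```

Two pairwise entries for the same match collapse into one group here, which mirrors why the grouped call shows one result where the pairwise call showed two.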

Hope this helps

Thanks

from fuzzy-matcher.

nayan-jyoti commented on August 16, 2024

Hi Manish,

Thanks for the prompt response. It was indeed helpful.

        document = new Document.Builder(str)
                .addElement(new Element.Builder<String>().setValue(str).setType(ElementType.TEXT).setThreshold(0.0).createElement())
                .createDocument();

I tried a few more string pairs after setting the element threshold to 0.0, as above:

  1. "Vrij Bhooshan" & "VRIJA BHOOSHAN"
  2. "Mohammad Ashfaque" & "MOHAMMED ASHFAQUE"
  3. "Nayan Bayan" & "Nayan Jyoti"
  4. "Nayan Jyoti Bayan Test" & "Nayan Jyoti B T"

However, none of them returned any result. Is this expected behaviour?

Edit:

  1. I tried setting the match type to Nearest Neighbour. However, on running it I got the exception below:
    com.intuit.fuzzymatcher.exception.MatchException: Data Type not supported
    I tried the same with ElementType.NAME and got the same exception.
  2. I changed the element type to NAME and got a very good score of 1.0 for pairs 1 and 2.
    However, no luck with the other two.

Thanks

manishobhatia commented on August 16, 2024

Hi Nayan,

ElementType.NAME is a better choice for this kind of match. It uses Soundex to match names, so it tolerates misspellings and closely spelled variants.
Nearest Neighbors is a better choice for numeric and date elements, where values are near each other but not identical.
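
To show why Soundex helps with pairs like "Mohammad" and "MOHAMMED", here is a small hand-written encoder following the standard American Soundex rules. It is a sketch for illustration only: SoundexSketch is a hypothetical class, and the encoder the library actually uses may differ in details.

```java
import java.util.Locale;

// Hypothetical helper, not part of fuzzy-matcher.
class SoundexSketch {

    // Digit for a consonant group, or '0' for letters that are not coded.
    static char code(char c) {
        switch (c) {
            case 'B': case 'F': case 'P': case 'V':
                return '1';
            case 'C': case 'G': case 'J': case 'K':
            case 'Q': case 'S': case 'X': case 'Z':
                return '2';
            case 'D': case 'T':
                return '3';
            case 'L':
                return '4';
            case 'M': case 'N':
                return '5';
            case 'R':
                return '6';
            default:
                return '0';
        }
    }

    // First letter plus up to three digits, padded with zeros to length 4.
    static String encode(String name) {
        String s = name.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder().append(s.charAt(0));
        char previous = code(s.charAt(0));
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            if (c == 'H' || c == 'W') {
                continue; // H and W do not separate letters with the same code
            }
            char digit = code(c);
            if (digit != '0' && digit != previous) {
                out.append(digit);
            }
            previous = digit; // a vowel resets this, so a repeated code after it counts again
        }
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Mohammad") + " " + encode("MOHAMMED")); // M530 M530
        System.out.println(encode("Vrij") + " " + encode("VRIJA"));        // V620 V620
        System.out.println(encode("Bayan") + " " + encode("B"));           // B500 B000
    }
}
```

Differently spelled names collapse to the same code, whereas an initial such as "B" does not encode to the same value as the full name "Bayan".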

I see you have reduced the threshold value. If you set it on the document instead of the element, you will see them match:

        String[] input = new String[]{"Nayan Jyoti Bayan Test", "Nayan Jyoti B T"};

        // Build one Document per input string, each with a single NAME element.
        List<Document> documentList = new ArrayList<>();
        for (String str : input) {
            Document document = new Document.Builder(str)
                    .addElement(new Element.Builder<String>().setValue(str).setType(ElementType.NAME).createElement())
                    .setThreshold(0.49) // threshold set on the document, not on the element
                    .createDocument();
            documentList.add(document);
        }

        // matchService is the MatchService instance from your original example.
        Set<Set<Match<Document>>> result = matchService.applyMatchByGroups(documentList);

        // Print every match in every group together with its score.
        result.forEach(entry -> {
            entry.forEach(match -> {
                System.out.println("Data: " + match.getData() + " Matched With: " + match.getMatchedWith() + " Score: "
                        + match.getScore().getResult());
            });
        });

nayan-jyoti commented on August 16, 2024

Thanks for the explanation. I am able to get scores after setting the threshold at the document level.
Thank you again for your time.

Closing the issue as my queries are answered.
