
Comments (5)

manishobhatia commented on September 18, 2024

Hi Gabe,

The library is intended to be domain agnostic, with fuzzy matching driven entirely by "ElementType".

Currently there are quite a few domain-specific types defined, such as "Name", "Address", "Phone Number", and "Email",
and some generic types such as "Text", "Number", and "Date".
The idea is to expand these types as we get more requirements from the open source community, making the library useful in multiple domains.

That said, even without an enhancement, the library supports overriding its default behavior.
For example, the Element API allows you to override most of its matching capability:
https://github.com/intuit/fuzzy-matcher#element-configuration

Of these configuration options, "PreProcessingFunction" and "TokenizerFunction" let you inject user-defined code at run time (by means of Java Functions), which provides additional flexibility to match most types of data.
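
To make the "inject user-defined code by means of Java Functions" idea concrete, here is a minimal, library-independent sketch. It does not use fuzzy-matcher's API: the class `PluggableFunctionsSketch` and its members are hypothetical, and only `java.util.function.Function` and the other JDK classes are real. It shows a pre-processing function and a tokenizer function supplied by the caller and composed before any matching would happen.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration only: this is not fuzzy-matcher code.
class PluggableFunctionsSketch {

    // Pre-processing step: trim, lower-case, and drop everything except
    // lower-case letters, digits and spaces.
    static final Function<String, String> PRE_PROCESS =
            s -> s.trim().toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9 ]", "");

    // Tokenizer step: split the cleaned value on whitespace, skipping empty tokens.
    static final Function<String, List<String>> TOKENIZE =
            s -> Arrays.stream(s.split("\\s+"))
                       .filter(t -> !t.isEmpty())
                       .collect(Collectors.toList());

    // The caller composes both functions; a matcher would then compare the tokens.
    static List<String> tokens(String raw) {
        return PRE_PROCESS.andThen(TOKENIZE).apply(raw);
    }

    public static void main(String[] args) {
        // prints [senior, javadeveloper, remote]
        System.out.println(tokens("  Senior Java-Developer, Remote "));
    }
}
```

Because both steps are plain `Function` values, swapping in domain-specific behavior means passing a different lambda rather than changing the matching code.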

If you run into specific use cases, feel free to send some details and example data sets, and we can look at including them in our next release.

Hope this helps.

- Manish

from fuzzy-matcher.

gabe2001 commented on September 18, 2024

Hello Manish,

Thanks for your explanation! I'm perfectly happy with the current functionality. The ask was about making the domain (parameters, value types, etc.) exchangeable.
If I want to create a model that is not person-related, I'd have to "live" with these types, ignore them, and add new ones that are irrelevant to people-related properties.
Perhaps an enhancement idea/request: implement the domain as a pluggable feature (interfaces?).

cheers,
-gabe


manishobhatia commented on September 18, 2024

Hi Gabe,

I like the idea of having pluggable interfaces for various domains that can enable easy matching. We will take that into consideration in the next iteration of our release.

In the meantime, I want to assure you that having multiple ElementTypes present has little to no impact (in terms of either memory or CPU usage), even if they are not used.

The ElementTypes are simple, easy-to-use enums, each made up of a different combination of pre-processing function, tokenizer function, and match type.
This just makes it easy for the end user to implement matching without delving too deeply into the library.
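
As a rough sketch of that design (not the library's actual source), an enum constant can simply be a named bundle of a pre-processing function, a tokenizer function, and a match kind. Every name below (`PresetSketch`, `Preset`, `MatchKind`, and the two presets) is hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical illustration only: names and presets are not taken from fuzzy-matcher.
class PresetSketch {

    enum MatchKind { EXACT, NEAREST }

    // Each constant bundles (pre-processing, tokenizer, match kind) under one name.
    enum Preset {
        TEXT(s -> s.trim().toLowerCase(Locale.ROOT),
             s -> Arrays.asList(s.split("\\s+")),
             MatchKind.EXACT),
        NUMBER(s -> s.replaceAll("[^0-9.]", ""),
               s -> Collections.singletonList(s),
               MatchKind.NEAREST);

        final Function<String, String> preProcess;
        final Function<String, List<String>> tokenizer;
        final MatchKind matchKind;

        Preset(Function<String, String> preProcess,
               Function<String, List<String>> tokenizer,
               MatchKind matchKind) {
            this.preProcess = preProcess;
            this.tokenizer = tokenizer;
            this.matchKind = matchKind;
        }

        // Run the bundled pre-processing and tokenizer functions in order.
        List<String> tokens(String raw) {
            return preProcess.andThen(tokenizer).apply(raw);
        }
    }

    public static void main(String[] args) {
        System.out.println(Preset.TEXT.tokens("  Hello World "));  // prints [hello, world]
        System.out.println(Preset.NUMBER.tokens("$1,200.50"));     // prints [1200.50]
    }
}
```

In a layout like this, a preset that is never referenced is only a small constant object holding three references, which is consistent with the point above about unused types having little cost.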

There was an issue posted earlier that in fact suggested removing ElementTypes altogether, out of a similar concern about the library being domain specific. Personally, I like your suggestion better.
We will take both of these points of view into account as the library evolves.

We are interested in knowing which domains this library has been applied to, to help inform our direction. We have seen quite a few uses in the Person and Transaction matching domains. Feel free to comment on which domains you see it being useful in.

Thanks again for the suggestions and for helping this project move in the right direction.

- manish


gabe2001 commented on September 18, 2024

I believe the sky is the limit here.

Dating sites are an excellent example of a variable number of properties/attributes to match on.
Job searching: finding a good candidate based on skills matched against a particular job description.


gabe2001 commented on September 18, 2024

@manishobhatia, after some time I've finally been able to give your library a try! In hindsight, my question was completely irrelevant. I allowed myself to be misled by the addresses example. As you said, the library is completely domain agnostic. The naming of the ElementType function enumerations suggests otherwise.
With that out of the way, what I believe could be beneficial is the addition of an "in" match: does a value exist in a set of values, initially with an exact match? The current workaround is to duplicate the documents for all permutations of the list or lists of values.
cheers,
-gabe
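
For readers following the thread, the requested "in" match can be sketched outside the library as a plain exact-membership test. This is only an illustration of the request, not an existing fuzzy-matcher feature; `InMatchSketch`, `in`, and `anyIn` are hypothetical names, while `Set.contains` and `Collections.disjoint` are standard JDK calls.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only: fuzzy-matcher does not expose these methods.
class InMatchSketch {

    // Exact "in" match: is the single value one of the allowed values?
    static boolean in(String value, Set<String> allowed) {
        return allowed.contains(value);
    }

    // List-versus-list variant: do the two collections share at least one exact value?
    static boolean anyIn(Collection<String> values, Collection<String> allowed) {
        return !Collections.disjoint(values, allowed);
    }

    public static void main(String[] args) {
        Set<String> skills = new HashSet<>(Arrays.asList("java", "sql"));
        System.out.println(in("java", skills));                        // prints true
        System.out.println(anyIn(Arrays.asList("go", "rust"), skills)); // prints false
    }
}
```

Comparing against the set directly avoids the workaround of emitting one duplicated document per permutation of the value lists.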

