
or-match's People

Contributors

boshrin

or-match's Issues

Move Documentation to a Wiki

The one gigantic README.md is hard to navigate and maintain. Move the documentation to a wiki somewhere, probably depending on where the project ends up.

Also review second person vs third person style. (The README switches about halfway through from "you" to "one".)

Allow substr And token Searches in Canonical Matches

It might make sense to allow substr (#16) and token (#23) matches to be canonical. One way to do this would be to have a hierarchy of exact/token/substr, and whichever (if any) is first defined for an attribute would be used in determining canonical matches. Another way would be to allow canonical matches to specify search types (like potential matches), though it's not clear that it makes sense to have different canonical rules for the same attribute with these search types.
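
A minimal sketch of the first option (the exact/token/substr hierarchy). The names `SEARCH_PRECEDENCE` and `canonical_search_type` are hypothetical and not part of the match engine; the sketch only illustrates "whichever is first defined wins".

```python
# Hypothetical precedence order, strictest search type first.
SEARCH_PRECEDENCE = ("exact", "token", "substr")


def canonical_search_type(defined_types):
    """Return the first search type, in precedence order, that the
    attribute defines, or None if it defines none of them."""
    for search_type in SEARCH_PRECEDENCE:
        if search_type in defined_types:
            return search_type
    return None


# An attribute that defines only token and substr searches would use
# token for canonical matching under this scheme.
chosen = canonical_search_type({"substr", "token"})
```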

Require Configured SOR Only

Currently, the match engine will allow any value for SOR, assuming the client is authorized for it. So if a client is authorized for '*', it might send 'sis', 'SIS', or 'student', all meaning the same thing. Perhaps require SORs to be defined in the [sors] section of config.in. This could then allow 'sis' while rejecting the other two as undefined.
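
The check could look roughly like the sketch below. It assumes the [sors] section lists one SOR label per key (the actual layout of that section is not specified here), and the function names are hypothetical.

```python
import configparser


def load_configured_sors(config_text):
    """Collect the SOR labels defined as keys in a [sors] section.

    Assumption: each key in [sors] is an SOR label; the values are ignored.
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep keys case-sensitive ('sis' != 'SIS')
    parser.read_string(config_text)
    if not parser.has_section("sors"):
        return frozenset()
    return frozenset(parser.options("sors"))


def is_sor_allowed(sor, configured_sors):
    """Accept only SOR labels that were explicitly configured."""
    return sor in configured_sors


# Hypothetical configuration used only for illustration.
sample_config = "[sors]\nsis = Student Information System\nhr = Human Resources\n"
configured = load_configured_sors(sample_config)
```

With this configuration, 'sis' is accepted while 'SIS' and 'student' are rejected as undefined.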

Sequence For reference_id

Currently, reference_id is generated as a UUID. Offer a configurable option to generate it from a sequence (which can be cast to a varchar so the matchgrid column doesn't need to change).
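
A sketch of what the configurable option might look like. The in-memory counter stands in for a database sequence, and `make_reference_id_generator` and its `method` values are hypothetical names. Both methods return strings, so the column type would not need to change.

```python
import itertools
import uuid


def make_reference_id_generator(method="uuid", start=1):
    """Return a zero-argument function that produces reference IDs as strings.

    'uuid' mirrors the current behavior; 'sequence' is the proposed option.
    An in-memory counter is used here in place of a real database sequence.
    """
    if method == "uuid":
        return lambda: str(uuid.uuid4())
    if method == "sequence":
        counter = itertools.count(start)
        return lambda: str(next(counter))
    raise ValueError("unknown reference_id method: " + method)


next_reference_id = make_reference_id_generator("sequence", start=1000)
first = next_reference_id()
second = next_reference_id()
```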

Use SELECT FOR UPDATE?

It may make sense to use SELECT FOR UPDATE to handle a read/write request to avoid concurrency issues.

Review CIFER ID Match API

Review API documentation for missing features/requirements and implement (or create tickets) as necessary.

Token Search Type

Implement a tokenization search type that would (e.g.) treat "anna marie" and "anna" as the same by ignoring everything from the first space onwards. This might be doable with SQL's substring(string from pattern for escape).
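
The intended comparison, sketched in Python rather than SQL. The function names are hypothetical, and two details are assumptions: leading whitespace is skipped before the first token is taken, and the comparison is case-sensitive.

```python
def first_token(value):
    """Return the text before the first whitespace, or '' for a blank value."""
    parts = value.split(maxsplit=1)
    return parts[0] if parts else ""


def token_match(a, b):
    """Treat two values as equal when their first tokens are identical."""
    return first_token(a) == first_token(b)


same = token_match("anna marie", "anna")
different = token_match("anna marie", "anne")
```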

Check For Existing Reference ID

Especially when using a sequence, check for an existing record with a newly assigned reference ID, and if one exists try again (up until some reasonable max). This will simplify bulk loading existing data that may exist in a conflicting range.
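
One possible shape for the retry loop. `assign_reference_id`, its parameters, and the default of 10 attempts are all hypothetical; `exists` stands in for a lookup against the matchgrid.

```python
def assign_reference_id(next_id, exists, max_attempts=10):
    """Draw candidate IDs until one is unused, giving up after max_attempts.

    next_id: zero-argument callable returning a candidate ID.
    exists:  callable returning True if the candidate is already in use.
    """
    for _ in range(max_attempts):
        candidate = next_id()
        if not exists(candidate):
            return candidate
    raise RuntimeError("no free reference ID after %d attempts" % max_attempts)


# Simulate bulk-loaded data that already occupies IDs "1" and "2".
taken = {"1", "2"}
sequence = iter(range(1, 100))
assigned = assign_reference_id(lambda: str(next(sequence)), lambda c: c in taken)
```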

Soundex Search Type

This may or may not be a good/useful idea, but is supported by more databases.
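
For reference, a sketch of the standard American Soundex coding in Python, useful for seeing which names such a search type would treat as equal. This is an illustration only; a database's built-in soundex function may differ in edge cases. Non-ASCII letters are dropped here, which is an assumption of the sketch.

```python
# Letter groups and their Soundex digits; vowels, H, W, and Y have no code.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the four-character American Soundex code, or '' if no letters."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeated code counts again
    return (result + "000")[:4]


same_code = soundex("Robert") == soundex("Rupert")
```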

Auto-Define sor and sorid

These two attributes are mandatory, and so should always be defined. (Perhaps inserted into config during parsing?)

Notification on Fuzzy Match

Support notification to an email address (per SOR?) when a new potential match is recorded (202, not 300).

Substring Search Type

Like the exact search type, but only operates on a substring (e.g., the first 5 characters, the last 5, the second through fifth, etc.).
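
A sketch of the three examples using Python slice semantics. The `start`/`end` parameters are a hypothetical way to express the substring; how it would actually be configured is not decided here.

```python
def substring_match(a, b, start=0, end=None):
    """Compare only the [start:end] slice of each value (Python slice rules)."""
    return a[start:end] == b[start:end]


first_five = substring_match("123456789", "123450000", 0, 5)   # first 5 characters
last_five = substring_match("999956789", "123456789", -5)      # last 5 characters
second_to_fifth = substring_match("xabcdz", "yabcdw", 1, 5)    # second through fifth
```

Values shorter than the requested slice compare on whatever characters they have, which a real implementation may want to treat as a non-match.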

Skip Attributes Consisting of All Blanks/Zeroes

If an attribute consists of only blanks or zeroes (and possibly non-alphanumeric characters, according to the configuration), skip it for matching purposes, e.g., lastname="", dob="0000-00-00", ssn="000000000".

Perhaps have a configuration option to disable this on a per-attribute basis.
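
A minimal sketch of the check. `is_effectively_empty` and its `ignore_non_alphanumeric` flag are hypothetical; a per-attribute opt-out would simply bypass the call for that attribute.

```python
def is_effectively_empty(value, ignore_non_alphanumeric=True):
    """Return True if the value holds nothing but blanks and zeroes.

    When ignore_non_alphanumeric is set, punctuation such as the dashes in
    a date is discarded before the check.
    """
    if ignore_non_alphanumeric:
        value = "".join(ch for ch in value if ch.isalnum())
    return all(ch in " 0" for ch in value)


examples = {
    "lastname": "",
    "dob": "0000-00-00",
    "ssn": "000000000",
}
skipped = sorted(name for name, v in examples.items() if is_effectively_empty(v))
```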

Log Levels

Add support for configurable log levels: none / connections / logic / trace (SQL).
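
One way the levels could be modeled, assuming they are cumulative in the order listed (so "logic" also includes "connections"). That ordering, and the names `LogLevel` and `should_log`, are assumptions of this sketch.

```python
from enum import IntEnum


class LogLevel(IntEnum):
    NONE = 0         # log nothing
    CONNECTIONS = 1  # client connections
    LOGIC = 2        # matching decisions
    TRACE = 3        # full trace, including SQL


def should_log(configured, message_level):
    """Emit a message only if its level is at or below the configured level."""
    return message_level != LogLevel.NONE and message_level <= configured


logs_connections = should_log(LogLevel.LOGIC, LogLevel.CONNECTIONS)
logs_sql = should_log(LogLevel.LOGIC, LogLevel.TRACE)
```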

Link Attribute Columns to SOR+SORID

Add the ability for an attribute to match against SORID. Something like:

[attribute:employeeid]
; Match this attribute against SORID where SOR is "HR"
search['sor'] = 'HR'
