lib_fts's Issues

Using fts_fuzzy_match concept for searching through multiple documents

Hi @forrestthewoods, great work on the fuzzy match, I'm having a lot of fun with it! This is a question and not an issue but I'm putting it here for future reference.

I'm a junior dev and I've been thinking for some time about using your algorithm to search through multiple txt files for a particular word and then rank the files by a final score. I still don't know how to define the final score or where to start. Also, is this algorithm the best approach for my case, or should I consider something else?

Thank you!
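
One possible starting point for the question above, as a minimal sketch rather than a recommendation: score every word of a document against the query and use the best word score as the document's final score. Everything named here is an assumption for illustration. toy_fuzzy_match is a hypothetical stand-in for fts::fuzzy_match with invented weights, and Document, RankedDocument and rank_documents do not exist in lib_fts.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for fts::fuzzy_match so the sketch compiles on its own.
// Case-insensitive, greedy, left to right; the weights are invented.
bool toy_fuzzy_match(const std::string& pattern, const std::string& word, int& outScore) {
    std::size_t next = 0;      // index of the next pattern character to find
    bool prevMatched = false;  // did the previous character of `word` match?
    outScore = 0;
    for (char ch : word) {
        const bool hit = next < pattern.size() &&
            std::tolower(static_cast<unsigned char>(pattern[next])) ==
            std::tolower(static_cast<unsigned char>(ch));
        if (hit) {
            outScore += prevMatched ? 15 : 10;  // adjacent matches score higher
            ++next;
        } else {
            outScore -= 1;                      // unmatched letter penalty
        }
        prevMatched = hit;
    }
    return next == pattern.size();
}

struct Document { std::string name; std::string text; };
struct RankedDocument { std::string name; int score; };

// Final score of a document = best score of any whitespace-separated word.
// Documents with no matching word are left out of the result.
std::vector<RankedDocument> rank_documents(const std::string& query,
                                           const std::vector<Document>& docs) {
    assert(!query.empty());  // an empty query would match every word
    std::vector<RankedDocument> ranked;
    for (const Document& doc : docs) {
        std::istringstream words(doc.text);
        std::string word;
        bool matchedAny = false;
        int best = 0;
        while (words >> word) {
            int score = 0;
            if (toy_fuzzy_match(query, word, score) && (!matchedAny || score > best)) {
                best = score;
                matchedAny = true;
            }
        }
        if (matchedAny) ranked.push_back({doc.name, best});
    }
    // Highest score first; stable_sort keeps input order for ties.
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const RankedDocument& a, const RankedDocument& b) {
                         return a.score > b.score;
                     });
    return ranked;
}
```

Taking the best word is only one choice. Summing the top few word scores, or counting how many words match, would reward documents that mention the term often. If the goal is exact-word search over many files rather than typo-tolerant matching, an inverted index with a term-frequency ranking is the more common tool.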

Thank you :)

Not really an issue!

Just wanted to thank you for all the math magic on solving ballistic trajectories. If you don't mind, I ported your solution to AssemblyScript so that it can be used as a WASM binary; you can find it here. Your methods are abstracted away into a core binary that other implementations can build on. That way the core stays agnostic of the web 3D engine it is used with, and each engine can have its own tightly integrated implementation. So far I have added an implementation for PlayCanvas.

Cheers!

fts_fuzzy_match: Should consecutive matches and full word matches have a higher bonus?

I really enjoyed your article on Sublime's fuzzy search and appreciate your efforts to recreate what seems to be some kind of magic to me.

After playing around on your live demo, I noticed that consecutive matches were weighted less than first character matches. E.g:

Search for cold in Hearthstone cards

  • 23 - Cone of Cold
  • 20 - Cold Blood
  • 20 - Coldarra Drake
  • 15 - Coldlight Seer
  • 13 - Coldlight Oracle
  • 4 - Cobalt Guardian
  • -21 - Ancestral Knowledge

In comparison, searching in Sublime, full word or consecutive matches would rank higher than first letter matches. E.g:

Search for node in a node.js project in Sublime

[Image: screen shot 2017-01-06 at 02 36 11]

You can see the top results are full word matches, and the shortest paths seem to be weighted more. It takes something like 20 results for the first non-full match to appear (255).

[Image: screen shot 2017-01-06 at 02 37 23]

Perhaps this comes down to the scale of weighting you are using. From the screenshot, we can see Sublime's scores are in the 200+ region, which allows for a larger spread of scores.

Anyway, it's fun to think about nonetheless.

fts_fuzzy_match: update JavaScript implementation to version 0.2.0

Version 0.2.0 of fts_fuzzy_match makes a pretty big difference in result quality. It now performs an exhaustive search to find the highest scoring match.

The JavaScript implementation is still on Version 0.1.0. If there is any demand I'll update the port. Does anyone actually use this? If yes I'll take the time to do so. Consider this an opportunity for your voice to be heard!

Test examples?

My apologies if I missed this in the source/documentation.

I saw that you have test data for running a query against various wordlists. But do you have a set of test examples for checking the accuracy of the algorithm (i.e. a unit-testing or test-driven-development sort of thing)?

I implemented my own algorithm using your post as an example, and saw that you updated it 2 years ago. I never put my algorithm into production, but am now revisiting it and will be tweaking the scoring parameters and rules to try and improve results (and adding the exhaustive search approach is probably something I should do). While updating my code, I am trying to incorporate various examples into my unit testing to ensure that new improvements don't break anything, as well as to allow me to ensure accurate results.

If others have useful examples, I would like to incorporate them to help ensure my own algorithm behaves reasonably.
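
One way to make such examples robust against later weight tweaks is to assert orderings instead of exact scores: for a given pattern, one candidate must outscore another. The following is a minimal sketch of that idea. The names Scorer, OrderingCase, failing_cases and placeholder_scorer are all hypothetical, and the placeholder scorer is a deliberately naive substring test that only exists so the harness can run without the library.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Any matcher with this shape can be plugged in (hypothetical signature).
using Scorer = std::function<bool(const std::string& pattern,
                                  const std::string& candidate,
                                  int& outScore)>;

// Expectation: for `pattern`, `better` must match and outscore `worse`.
struct OrderingCase {
    std::string pattern;
    std::string better;
    std::string worse;
};

// Returns the cases that do not hold. A case fails if either candidate
// does not match or if `better` does not score strictly higher.
std::vector<OrderingCase> failing_cases(const Scorer& score,
                                        const std::vector<OrderingCase>& cases) {
    assert(static_cast<bool>(score));  // a scorer must be supplied
    std::vector<OrderingCase> failures;
    for (const OrderingCase& c : cases) {
        int betterScore = 0;
        int worseScore = 0;
        const bool holds = score(c.pattern, c.better, betterScore) &&
                           score(c.pattern, c.worse, worseScore) &&
                           betterScore > worseScore;
        if (!holds) failures.push_back(c);
    }
    return failures;
}

// Naive placeholder: case-sensitive substring match, shorter candidates
// score higher. Replace with the matcher under test.
bool placeholder_scorer(const std::string& pattern,
                        const std::string& candidate,
                        int& outScore) {
    outScore = -static_cast<int>(candidate.size());
    return candidate.find(pattern) != std::string::npos;
}
```

Because only relative order is checked, the table of cases can survive changes to bonus and penalty constants, and a failing case points directly at the pair of strings whose ranking flipped.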

Thanks, and thanks for a great post!!

Bogus “licence” statement

This software is dual-licensed to the public domain and under the following license: you are granted a perpetual, irrevocable license to copy, modify, publish, and distribute this file as you see fit.

This is not possible. A work that can fall under copyright law is always either in the Public Domain or under copyright protection, for any single given legislation. If it’s in the Public Domain, it is not possible to issue a licence for it, because a licence can only be given if it’s under copyright protection (therefore eroding your statement that it’s in the Public Domain… which is not possible for works with still-living authors in many parts of the world anyway).

Your best bet is to change this to a CC0 licence which triggers as soon as there is copyright protection in the residence of licensee (not licensor), which is important.

fts_fuzzy_match: Recursion count doesn't get decremented anywhere

The recursionCount variable can reach the recursionLimit very quickly as it is not being calculated correctly. It means in some cases not all of the possible matches get exhaustively searched.

I think recursionCount is meant to count at what level of recursion you're currently at, but because it doesn't get decremented once a recursive call has exited it's not correct.

I think decrementing recursionCount after the call to fuzzy_match_recursive() fixes the problem.
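
The difference between the two readings of the counter can be sketched with a simplified search. This is not the library's code: count_alignments is a hypothetical function that counts every in-order, case-sensitive way to lay the pattern over the string, and the decrementOnExit flag is added only to compare the two behaviours side by side.

```cpp
#include <cassert>

// Hypothetical simplification of the recursive search. `recursionCount`
// is shared by reference across all calls, as described in the issue.
int count_alignments(const char* pattern, const char* str,
                     int& recursionCount, int recursionLimit,
                     bool decrementOnExit) {
    assert(recursionLimit > 0);
    ++recursionCount;  // entering one more call
    int found = 0;
    if (recursionCount <= recursionLimit) {
        if (*pattern == '\0') {
            found = 1;  // whole pattern placed: one complete alignment
        } else {
            // Try every position where the next pattern character occurs.
            for (const char* s = str; *s != '\0'; ++s) {
                if (*s == *pattern) {
                    found += count_alignments(pattern + 1, s + 1,
                                              recursionCount, recursionLimit,
                                              decrementOnExit);
                }
            }
        }
    }
    // With the decrement the counter tracks the current depth;
    // without it the counter tracks the total number of calls so far.
    if (decrementOnExit) --recursionCount;
    return found;
}
```

For the pattern "ab" in "aabb" with a limit of 3, the depth-tracking variant finds all 4 alignments, while the variant without the decrement stops after finding 1. Note the trade-off: a depth limit no longer bounds the total amount of work, so an exhaustive search over long strings may need a separate cap on the number of calls.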

Thanks for sharing this library by the way - it's really useful.

Leading letter penalty maximum check is inverted?

Hi, unless I'm reading the source wrong it seems that max_leading_letter_penalty should be set as the penalty if the initial penalty is greater than the max.

The current implementation acts like a minimum penalty.

            // Apply leading letter penalty
            int penalty = leading_letter_penalty * matches[0];
            if (penalty < max_leading_letter_penalty)
                penalty = max_leading_letter_penalty;
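
Whether the quoted check is inverted depends on the sign of the constants, which this page does not show. A sketch under the assumption that both penalties are negative numbers: the values -5 and -15 below are illustrative assumptions, so check them against your copy of fts_fuzzy_match.h. With negative values, "penalty < max" means "more severe than the cap", and the check behaves as a maximum on the penalty's magnitude.

```cpp
#include <algorithm>
#include <cassert>

// Assumed, illustrative values; both are negative on purpose.
const int leading_letter_penalty = -5;       // per letter before the first match
const int max_leading_letter_penalty = -15;  // most severe penalty allowed

// Penalty for `firstMatchIndex` unmatched letters before the first match.
int leading_penalty(int firstMatchIndex) {
    assert(firstMatchIndex >= 0);
    const int penalty = leading_letter_penalty * firstMatchIndex;
    // Equivalent to: if (penalty < max_leading_letter_penalty)
    //                    penalty = max_leading_letter_penalty;
    return std::max(penalty, max_leading_letter_penalty);
}
```

Under these assumptions, leading_penalty(2) is -10 and leading_penalty(10) is capped at -15. If the constants were positive instead, the same comparison would indeed act as a minimum, which is the reading given in the issue.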

CAMEL_BONUS is mistakenly added in JavaScript version

neighbor === neighbor.toLowerCase() evaluates to true for separators ('_' and space), so an uppercase letter following '_' or ' ' gets CAMEL_BONUS as well as SEPARATOR_BONUS. This differs from the C++ version, where ::islower(neighbor) is false for non-alphabetic characters.
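
The two checks can be compared directly in C++. In this sketch the helper names are hypothetical: equals_own_lowercase mirrors the JavaScript comparison and is_lowercase_letter mirrors the ::islower call, both assuming the default "C" locale.

```cpp
#include <cassert>
#include <cctype>

// Mirrors the JavaScript test `neighbor === neighbor.toLowerCase()`:
// true for any character that lowercasing leaves unchanged.
bool equals_own_lowercase(char c) {
    const unsigned char u = static_cast<unsigned char>(c);
    return std::tolower(u) == u;
}

// Mirrors the C++ test `::islower(neighbor)`: true only for lowercase letters.
bool is_lowercase_letter(char c) {
    return std::islower(static_cast<unsigned char>(c)) != 0;
}

// Camel-case boundary as the C++ check would see it:
// a lowercase letter followed by an uppercase letter.
bool is_camel_boundary(char neighbor, char current) {
    return is_lowercase_letter(neighbor) &&
           std::isupper(static_cast<unsigned char>(current)) != 0;
}

void demo_separator_difference() {
    assert(equals_own_lowercase('_'));      // separator passes the JS-style test
    assert(!is_lowercase_letter('_'));      // but is not a lowercase letter
    assert(!is_camel_boundary('_', 'B'));   // so no camel bonus after '_'
    assert(is_camel_boundary('a', 'B'));    // a real camel-case boundary
}
```

A JavaScript port could get the same behaviour by also requiring that the neighbor differs from its uppercase form, which excludes separators and digits.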

Documentation License?

Hi Forrest,

first of all, thanks for this great and inspiring project and for the generous public domain attribution!

I wanted to ask whether the documentation (specifically, the fuzzy_match.md doc) is also public domain and can therefore be reused in other projects (with due credit), or whether the public domain license applies only to the code.

I'm working on an open source project on fuzzy search algorithms which will reuse the fts_fuzzy_match code, and I would also like to include your fuzzy_match.md document, but before doing so I wanted to make sure it's alright with you.

fuzzy_match.md outdated?

It appears the function names have changed. Is that because there is a 0.2.0 version?

Could the different versions be tagged so it's easier to locate the appropriate commits?

fts_fuzzy_match: Handle Transposed Characters

If you don't mind sacrificing a little speed, then it's also nice to handle transpositions, so that "sta" can still match against "SpawnActorTimer", just with a lower score - Sublime Text 3 does this.

This would be worth some experimentation.
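
As a starting point for that experimentation, here is a minimal sketch of one possible approach, not how Sublime Text or lib_fts does it: try the pattern as typed, and if it fails, retry with each pair of adjacent pattern characters swapped. Both function names are hypothetical, and the matcher is a plain case-insensitive subsequence test with no scoring.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>

// True if `pattern` occurs in `str` as an in-order, case-insensitive subsequence.
bool is_subsequence_ci(const std::string& pattern, const std::string& str) {
    std::size_t next = 0;  // index of the next pattern character to find
    for (char ch : str) {
        if (next < pattern.size() &&
            std::tolower(static_cast<unsigned char>(pattern[next])) ==
            std::tolower(static_cast<unsigned char>(ch))) {
            ++next;
        }
    }
    return next == pattern.size();
}

// Returns 0 if the pattern matches as typed, 1 if it matches after swapping
// one pair of adjacent pattern characters, and -1 if neither works.
int transpositions_needed(std::string pattern, const std::string& str) {
    if (is_subsequence_ci(pattern, str)) return 0;
    for (std::size_t i = 0; i + 1 < pattern.size(); ++i) {
        std::swap(pattern[i], pattern[i + 1]);
        const bool matches = is_subsequence_ci(pattern, str);
        std::swap(pattern[i], pattern[i + 1]);  // restore before the next try
        if (matches) return 1;
    }
    return -1;
}

void demo_transposition() {
    // "sta" fails as typed but matches as "sat" (Spawn, Actor, Timer).
    assert(transpositions_needed("sta", "SpawnActorTimer") == 1);
    assert(transpositions_needed("sat", "SpawnActorTimer") == 0);
}
```

A scorer built on this could subtract a fixed penalty whenever the result is 1, so transposed matches rank below in-order ones. The speed cost mentioned above shows up here as up to one extra matching pass per pattern character.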

fts_fuzzy_match: fts_fuzzy_match.h (C++): Ensure proper match arrays initialization

After compiling the C++ matcher using gcc (which comes with the latest version of Code::Blocks) and running the resulting program, I noticed that the final matches array contains (at some positions, different from the match locations) random uint8_t values.

I also noticed that some match arrays don't seem to be properly initialized: uint8_t matches[256], uint8_t bestRecursiveMatches[256] and uint8_t recursiveMatches[256]. Now, it's been a long while since I last touched C/C++, but I think the idea is that automatic (local) arrays get allocated but not automatically initialized to 0 or whatever (as opposed to arrays with static storage duration). I may, of course, be mistaken... :)

The code:

#include <cstdint>   // uint8_t
#include <cstdlib>   // calloc, free
#include <iomanip>
#include <iostream>

using namespace std;

#define FTS_FUZZY_MATCH_IMPLEMENTATION
#include "fts_fuzzy_match.h"

using namespace fts;

// Runs one match and dumps the whole matches buffer, 15 values per row.
void match(const char *pattern, const char *s)
{
    int score = 0, max_matches = 64;
    // calloc zero-fills the buffer, so any non-zero value outside the match
    // positions was written by fuzzy_match itself.
    uint8_t *matches = (uint8_t *) calloc(max_matches, sizeof(uint8_t));

    bool matched = fuzzy_match(pattern, s, score, matches, max_matches);
    cout << boolalpha << "Matches: " << matched << "; score: " << score << endl;
    for (int i = 0; i < max_matches; i++) {
        if (i % 15 == 0) {
            cout << endl;
        }
        cout << setw(5) << right << unsigned(matches[i]);
    }

    cout << endl;
    free(matches);   // release the buffer allocated above
}

int main()
{
    char const *s1 = "MockAI.h";
    char const *s2 = "MacroCallback.cpp";
    char const *s3 = "MockGameplayTasks.h";
    char const *s4 = "MovieSceneColorTrack.cpp";

    char const *pattern = "Mock";

    match(pattern, s1);
    match(pattern, s2);
    match(pattern, s3);
    match(pattern, s4);

    return 0;
}
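
On the initialization question: in C++, a local array declared without an initializer has indeterminate contents, while arrays with static storage duration are zero-initialized. Two standard ways to zero a local buffer are sketched below. The function names are hypothetical, and the sketch does not claim this is how the library should be patched.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <numeric>

// Sums a 256-entry buffer; a result of 0 means every entry is zero.
int sum_of(const std::uint8_t (&buffer)[256]) {
    return std::accumulate(buffer, buffer + 256, 0);
}

int sum_after_value_init() {
    std::uint8_t matches[256] = {};  // empty braces zero-initialize every element
    return sum_of(matches);
}

int sum_after_memset() {
    std::uint8_t matches[256];       // indeterminate until written
    std::memset(matches, 0, sizeof matches);
    return sum_of(matches);
}

void demo_initialization() {
    assert(sum_after_value_init() == 0);
    assert(sum_after_memset() == 0);
}
```

An alternative that avoids touching the buffers at all is for callers to read only as many entries as the pattern has characters, since positions past the matched characters carry no meaning.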
