forrestthewoods / lib_fts
single-file public domain libraries
Can you explain what purpose bestLetter serves?
Hi @forrestthewoods, great work on the fuzzy match, I'm having a lot of fun with it! This is a question and not an issue but I'm putting it here for future reference.
I'm a junior dev and I have been thinking for some time about using your algorithm to search through multiple txt files for a particular word and then rank the files by a final score. I still don't know how to define the final score or where to start. Also, is this algorithm the best approach for my case, or should I consider something else?
Thank you!
Hello, I'd like to say thank you for the amazing library. I wrote an Elixir port of the latest C++ version (Feb 18, 2017). Here is the URL:
https://gist.github.com/WolfDan/7f80583f1b1b3fe23c03294582ea9f96
If you like, add it to the README so others can use it :)
// Apply unmatched penalty
It is
int unmatched = (int)(str - strBegin) - nextMatch;
where nextMatch is a pointer into matches, so the expression above mixes apples with oranges. IMHO it should be
int unmatched = (int)(str - strBegin) - matches[nextMatch-1] + 1;
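The two readings contrasted above can be written out separately. This is a minimal sketch, assuming matches holds the index of each matched character in increasing order; the helper names count_unmatched and count_trailing_unmatched are hypothetical and are not part of fts_fuzzy_match. Which quantity the penalty is meant to use is a question for the header itself.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of fts_fuzzy_match): number of characters in
// `str` that no pattern character matched, i.e. length minus match count.
int count_unmatched(const std::string& str,
                    const std::vector<std::uint8_t>& matches) {
    return static_cast<int>(str.size()) - static_cast<int>(matches.size());
}

// Hypothetical helper: number of characters that follow the last match.
int count_trailing_unmatched(const std::string& str,
                             const std::vector<std::uint8_t>& matches) {
    if (matches.empty())
        return static_cast<int>(str.size());
    return static_cast<int>(str.size()) - static_cast<int>(matches.back()) - 1;
}
```

For "MacroCallback.cpp" with matches at indices 0, 4, 11 and 12, the first helper returns 13 and the second returns 4, so the two readings can differ considerably.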
Thanks for a great library, we are going to use it in
https://www.prusa3d.com/prusaslicer/
https://github.com/prusa3d/PrusaSlicer
Would be great to be able to install via NPM.
Thoughts?
Not really an issue!
Just wanted to thank you for all the math magic on solving ballistic trajectories. If you don't mind, I ported your solution to AssemblyScript so it can be used as a WASM binary - you can find it here. Your methods are abstracted away into a core binary that other implementations can build on. This keeps the core agnostic of the web 3D engine it is used with, so each engine can do its own implementation for a tight integration. So far I have added an implementation for PlayCanvas.
Cheers!
I really enjoyed your article on Sublime's fuzzy search and appreciate your efforts to recreate what seems to be some kind of magic to me.
After playing around on your live demo, I noticed that consecutive matches were weighted less than first character matches. E.g:
Search for cold in Hearthstone cards
In comparison, searching in Sublime, full word or consecutive matches would rank higher than first letter matches. E.g:
Search for node in a node.js project in Sublime
You can see the top results are full word matches, and the shortest paths seem to be weighted more. It takes something like 20 results for the first non-full match to appear (255).
Perhaps this comes down to the scale of weighting you are using. From the screenshot, we can see Sublime scores are in the 200+ region, which allows for a larger spread of scores.
Anyway, it's fun to think about nonetheless.
Version 0.2.0 of fts_fuzzy_match makes a pretty big difference in result quality. It now performs an exhaustive search to find the highest scoring match.
The JavaScript implementation is still on Version 0.1.0. If there is any demand I'll update the port. Does anyone actually use this? If yes I'll take the time to do so. Consider this an opportunity for your voice to be heard!
My apologies if I missed this in the source/documentation.
I saw that you have test data for running a query against various wordlists. But do you have a set of test examples for verifying the accuracy of the algorithm (e.g. for unit testing or test-driven development)?
I implemented my own algorithm using your post as an example, and saw that you updated it 2 years ago. I never put my algorithm into production, but am now revisiting it and will be tweaking the scoring parameters and rules to try and improve results (and adding the exhaustive search approach is probably something I should do). While updating my code, I am trying to incorporate various examples into my unit testing to ensure that new improvements don't break anything, as well as to allow me to ensure accurate results.
If others have useful examples, I would like to incorporate them to help ensure my own algorithm behaves reasonably.
Thanks, and thanks for a great post!!
This software is dual-licensed to the public domain and under the following license: you are granted a perpetual, irrevocable license to copy, modify, publish, and distribute this file as you see fit.
This is not possible. A work that can fall under copyright law is always either in the Public Domain or under copyright protection, for any single given legislation. If it's in the Public Domain, it is not possible to issue a licence for it, because a licence can only be given if it's under copyright protection (therefore eroding your statement that it's in the Public Domain... which is not possible for works with still-living authors in many parts of the world anyway).
Your best bet is to change this to a CC0 licence which triggers as soon as there is copyright protection in the residence of licensee (not licensor), which is important.
The recursionCount variable can reach the recursionLimit very quickly as it is not being calculated correctly. It means in some cases not all of the possible matches get exhaustively searched.
I think recursionCount is meant to count at what level of recursion you're currently at, but because it doesn't get decremented once a recursive call has exited it's not correct.
I think decrementing recursionCount after the call to fuzzy_match_recursive() fixes the problem.
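The difference between a counter that is only incremented and one that is also decremented on exit can be shown in isolation. The sketch below is a hypothetical illustration, not the library's code: explore and Counters are made-up names, and the two recursive calls stand in for the branches of an exhaustive search.

```cpp
#include <algorithm>

// Hypothetical instrumentation for a branching recursion.
struct Counters {
    int totalCalls = 0;  // incremented on entry, never decremented
    int depth = 0;       // incremented on entry, decremented on exit
    int maxDepth = 0;    // deepest level reached so far
};

void explore(int remaining, Counters& c) {
    ++c.totalCalls;
    ++c.depth;
    c.maxDepth = std::max(c.maxDepth, c.depth);
    if (remaining > 0) {
        explore(remaining - 1, c);  // first branch of the search
        explore(remaining - 1, c);  // second branch of the search
    }
    --c.depth;  // restore the level once this call has exited
}
```

Calling explore(3, c) on a fresh Counters makes 15 calls in total but never goes deeper than 4 levels. A limit compared against the first counter is exhausted far sooner than one compared against the second.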
Thanks for sharing this library by the way - it's really useful.
Hi, unless I'm reading the source wrong it seems that max_leading_letter_penalty should be set as the penalty if the initial penalty is greater than the max.
The current implementation acts like a minimum penalty.
// Apply leading letter penalty
int penalty = leading_letter_penalty * matches[0];
if (penalty < max_leading_letter_penalty)
    penalty = max_leading_letter_penalty;
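Whether this comparison acts as a cap or as a floor depends on the sign of the penalty constants. The following is a minimal sketch, assuming the penalties are stored as negative numbers; the values -3 and -9 and the wrapper function leading_penalty are assumptions for illustration and should be checked against the header.

```cpp
// Assumed values for illustration only; check fts_fuzzy_match.h for the real ones.
const int leading_letter_penalty = -3;      // assumed: per character before the first match
const int max_leading_letter_penalty = -9;  // assumed: most severe total allowed

// Same comparison as the quoted snippet. With negative penalties,
// "less than" means "more severe", so the branch stops the total at -9.
int leading_penalty(int firstMatchIndex) {
    int penalty = leading_letter_penalty * firstMatchIndex;
    if (penalty < max_leading_letter_penalty)
        penalty = max_leading_letter_penalty;
    return penalty;
}
```

Under these assumptions a first match at index 2 yields -6, and a first match at index 5 yields -9 rather than -15. If the constants were positive, the same comparison would behave as a minimum, which is the behaviour the report describes.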
solve_ballistic_arc: the resulting vector is not equal in length to the specified speed
https://i.imgur.com/OE4pf7G.png
and
https://i.imgur.com/uIVhWrO.png
neighbor === neighbor.toLowerCase() evaluates to true for separators ('_' and space), so an uppercase letter following '_' or ' ' will get CAMEL_BONUS as well as SEPARATOR_BONUS. This is different from the C++ version, where ::islower(neighbor) is false for non-alphabetic characters.
Love the project, but you probably want to add a proper license.
Slap the MIT License there and you're good to go.
Hi Forrest,
first of all, thanks for this great and inspiring project and for the generous public domain attribution!
I wanted to ask whether the documentation (specifically, the fuzzy_match.md doc) is also public domain and can therefore be reused in other projects (with due credit), or whether the public domain license only applies to the code.
I'm working on an open source project on fuzzy search algorithms which will reuse the fts_fuzzy_match code, and I would also like to include your fuzzy_match.md document, but before doing so I wanted to make sure it's alright with you.
It appears the function names have changed. Is that because there is a 0.2.0 version?
Could the different versions be tagged so it's easier to locate the appropriate commits?
If you don't mind sacrificing a little speed, then it's also nice to handle transpositions, so that "sta" can still match against "SpawnActorTimer", just with a lower score - Sublime Text 3 does this.
This would be worth some experimentation.
After compiling the C++ matcher using gcc (which comes with the latest version of Code::Blocks) and running the resulting program, I noticed that the final matches array contains (at some positions, different from the match locations) random uint8_t values.
I also noticed that some match arrays don't seem to be properly initialized: uint8_t matches[256], uint8_t bestRecursiveMatches[256] and uint8_t recursiveMatches[256]. Now, it's been a long while since I last touched C/C++ but I think the idea is that auto/static arrays get allocated but not automatically initialized to 0 or whatever (as opposed to class member arrays). I may, of course, be mistaken... :)
The code:
#include <iostream>
#include <iomanip>
#include <cstdint>
#include <cstdlib>
using namespace std;
#define FTS_FUZZY_MATCH_IMPLEMENTATION
#include "fts_fuzzy_match.h"
using namespace fts;

// Runs one match and prints the whole matches buffer, 15 values per row.
void match(const char *pattern, const char *s)
{
    int score = 0, max_matches = 64;
    // calloc zero-fills the buffer, so any nonzero value was written by fuzzy_match.
    uint8_t *matches = (uint8_t *) calloc(max_matches, sizeof(uint8_t));
    bool matched = fuzzy_match(pattern, s, score, matches, max_matches);
    cout << boolalpha << "Matches: " << matched << "; score: " << score << endl;
    for (int i = 0; i < max_matches; i++) {
        if (i % 15 == 0) {
            cout << endl;
        }
        cout << setw(5) << right << unsigned(matches[i]);
    }
    cout << endl;
    free(matches);
}

int main()
{
    char const *s1 = "MockAI.h";
    char const *s2 = "MacroCallback.cpp";
    char const *s3 = "MockGameplayTasks.h";
    char const *s4 = "MovieSceneColorTrack.cpp";
    char const *pattern = "Mock";
    match(pattern, s1);
    match(pattern, s2);
    match(pattern, s3);
    match(pattern, s4);
    return 0;
}
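On the initialization question raised in this report: in standard C++, a local (automatic) array of a built-in type has indeterminate contents unless it is given an initializer, while arrays with static storage duration are zero-initialized. A minimal sketch of an explicitly zeroed local buffer follows; the function names all_zero and zeroed_local_array are hypothetical, and the array name merely echoes the one mentioned above.

```cpp
#include <cstdint>

// Returns true if every element of the buffer is zero.
bool all_zero(const std::uint8_t* data, int count) {
    for (int i = 0; i < count; ++i)
        if (data[i] != 0)
            return false;
    return true;
}

bool zeroed_local_array() {
    // `= {}` value-initializes every element to 0. Without the initializer,
    // reading the elements before writing them would be undefined behavior.
    std::uint8_t matches[256] = {};
    return all_zero(matches, 256);
}
```

Whether adding such initializers inside the header removes the stray values described above would need to be confirmed against the header's recursion code, since values can also be copied between the buffers.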
Hi,
I just made a proposal for the Kate text editor to use this library for fuzzy filtering in quick-open. It's being considered for general use in KDE, and I would like to ask whether you would be okay with us publishing it under LGPL v2.
The merge request is here: https://invent.kde.org/utilities/kate/-/merge_requests/140