ijkilchenko / fuzbal

Chrome extension: Gives Ctrl+F like find results which include non-exact (fuzzy) matches using string edit-distance and GloVe/Word2Vec. Also searches by regular expressions.

Home Page: https://chrome.google.com/webstore/detail/fuzbal/lidjpicdkcgjdkgifmmpalkibjeppdof

License: MIT License

JavaScript 83.08% CSS 2.23% HTML 14.69%

fuzbal's People

Contributors

danielrw7, ijkilchenko

fuzbal's Issues

Order results from top to bottom

The results are ranked by their Levenshtein distance, but two results with exactly the same distance can currently appear out of order relative to where they occur on the page. This is a bug.
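One way to get the intended ordering is to sort by distance first and break ties by position on the page. The sketch below is an illustration only: the result shape (`text`, `position`) and the function names `levenshtein` and `rankMatches` are assumptions, not fuzbal's actual data model.

```javascript
// Classic dynamic-programming edit distance using two rolling rows.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical match shape: { text, position }, where `position` is the
// index of the match in document order (top of the page = 0).
function rankMatches(query, matches) {
  return matches
    .map((m) => ({ ...m, distance: levenshtein(query, m.text) }))
    // Primary key: edit distance. Tie-breaker: order of appearance on the page.
    .sort((x, y) => x.distance - y.distance || x.position - y.position);
}
```

With an explicit tie-breaker the result no longer depends on whether the engine's sort is stable or on the order in which matches were collected.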

Use jQuery better

Use $('#helpTips').hide() instead of $(document.getElementById("helpTips")).hide().

PDFs are not supported.

Unfortunately, I have no idea how to parse the text of PDFs. It would be good to support this, but it may become a wontfix issue.

Investigate how to fix it in any case and investigate what other formats are not supported.

JSON data not necessarily loaded

Initially this extension didn't work, and there were no error messages in the popup.
Later I checked the background page and found these errors:

_generated_background_page.html:1 Error in event handler for (unknown): TypeError: Cannot use 'in' operator to search for 'skip' in undefined
    at subset (chrome-extension://lidjpicdkcgjdkgifmmpalkibjeppdof/background.js:20:18)
    at chrome-extension://lidjpicdkcgjdkgifmmpalkibjeppdof/background.js:28:24
3_generated_background_page.html:1 Error in event handler for (unknown): TypeError: Cannot use 'in' operator to search for 'menu' in undefined
    at subset (chrome-extension://lidjpicdkcgjdkgifmmpalkibjeppdof/background.js:20:18)
    at chrome-extension://lidjpicdkcgjdkgifmmpalkibjeppdof/background.js:28:24

subset(msg.words, vectors) fails if vectors is not yet loaded.
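A possible fix is to make every lookup wait for the vectors to finish loading. The following is a minimal sketch, not the extension's code: `createVectorStore` and `loadVectors` are hypothetical names, and `loadVectors` stands in for however the background page actually reads its JSON file.

```javascript
// `loadVectors` is a hypothetical function that returns the word vectors,
// either directly or as a Promise (for example after fetching a JSON file).
function createVectorStore(loadVectors) {
  let ready = null; // cached promise so the file is loaded only once

  // Same idea as subset(words, vectors) in the report: keep only known words.
  function subset(words, vectors) {
    const result = {};
    for (const word of words) {
      // hasOwnProperty avoids false hits on inherited keys such as "constructor".
      if (Object.prototype.hasOwnProperty.call(vectors, word)) {
        result[word] = vectors[word];
      }
    }
    return result;
  }

  return {
    lookup(words) {
      if (!ready) ready = Promise.resolve(loadVectors());
      // `vectors` is always defined by the time subset() runs.
      return ready.then((vectors) => subset(words, vectors));
    },
  };
}
```

In a background page, a message listener could call `lookup(msg.words)` and reply once the promise resolves, returning `true` from the `chrome.runtime.onMessage` listener so that the response channel stays open for the asynchronous reply.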

Add a loading icon

I want to add a loading icon that will appear while the results are being processed.

Consider adding wildcard or full regular expressions support

We might consider letting a user use * which will match, say, at most 5 words.

We could also support full regular expressions and let the user type something like /foo..bar/ into the search box to search for a specific regular expression.
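Both ideas can be sketched in a few lines. The function names `parseRegexQuery` and `wildcardToRegex` are made up for illustration, and the limit of 5 words is just the figure suggested above.

```javascript
// Returns a RegExp if the query is written as /pattern/ or /pattern/flags,
// otherwise null so the caller can fall back to fuzzy search.
function parseRegexQuery(query) {
  const m = /^\/(.+)\/([a-z]*)$/.exec(query.trim());
  if (!m) return null;
  try {
    return new RegExp(m[1], m[2]);
  } catch (err) {
    return null; // invalid pattern or flags
  }
}

// Turns "foo * bar" into a pattern in which each * stands for at most
// `maxWords` whitespace-separated words. This sketch assumes * appears
// between words; a trailing * would need extra handling.
function wildcardToRegex(query, maxWords = 5) {
  const escapeRegex = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const tokens = query.trim().split(/\s+/);
  let pattern = "";
  tokens.forEach((tok, i) => {
    if (tok === "*") {
      pattern += `(?:\\S+\\s+){0,${maxWords}}`;
    } else {
      pattern += escapeRegex(tok);
      if (i < tokens.length - 1) pattern += "\\s+";
    }
  });
  return new RegExp(pattern, "i");
}
```

A search box handler could try `parseRegexQuery` first and only treat the input as a wildcard or fuzzy query when it returns `null`.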

Feature request: Keep search box open, and remember place on search results

Hi there, thanks for the great extension!

You open the search box, type a term, and jump to one of the results.  If you close the box to read the webpage, you have to retype in the search box again.  Just wondering if there's a way to keep the box on top, and keep your place in the search results.

Best,
Jeff

How do I index more text?

Some of the text on my webpage does not appear in the match results UI: fuzbal works on some of my page's text but not all of it. It shows matches for text in the header but not for the text in a particular <a> tag.

Troubleshooting performed: I refreshed the page after activating the Chrome extension, with no luck.
Is there an existing way to index more page text so that fuzbal considers more matches?

Here is the snippet of HTML on my page that fuzbal does NOT seem to handle. The text that I expect it to handle is literally this camel-cased string:

TextThatFuzbalDoesNotShowInItsMatchesUI

<a class="alps-anchor alps-grid-col-0" href="/a1234556256" tabindex="-1" id="123_anchor" title="TextThatFuzbalDoesNotShowInItsMatchesUI "><i class="alps-icon alps-checkbox" role="presentation"></i><i class="abc" role="presentation"></i>yeeee </a>
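In the snippet above, the expected string occurs only in the `title` attribute; the visible text of the link is "yeeee". If fuzbal indexes only visible text (an assumption, not confirmed here), that would explain the missing match. A hedged sketch of how attribute text could be collected alongside visible text follows; `collectSearchableText` and the attribute list are hypothetical, not part of the extension.

```javascript
// Gathers the visible text of an element plus selected attribute values.
// `el` only needs `textContent` and `getAttribute`, as DOM elements provide.
function collectSearchableText(el, attrs = ["title", "alt", "aria-label"]) {
  const pieces = [el.textContent || ""];
  for (const name of attrs) {
    const value = el.getAttribute(name);
    if (value) pieces.push(value);
  }
  // Drop surrounding whitespace and empty entries.
  return pieces.map((s) => s.trim()).filter(Boolean);
}
```

Whether attribute text should be searchable at all is a design choice, since it is not visible on the page and cannot be highlighted the way ordinary text can.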

Try using word2vec output trained over a large data set

The current version, v0.91, uses the output of word2vec trained on its default data set (a text file of about 100 MB). This produces a JSON file of about 150 MB (the extension is then compressed to about 55 MB in the Chrome Store). The JSON file takes up about 400 MB of RAM when the background page reads it into a JavaScript object (for instant vector lookup).

For comparison, 400 MB of RAM is 4x what Adblock uses and 8x what LastPass uses. So reducing the word2vec output file somehow would be great.

On the other hand, we need to experiment with training over larger data sets and see how large the output files become. The added accuracy of the word vectors might be very useful.

Try some of the following things while keeping in mind the size/accuracy trade-offs.

  • Train over several GB worth of training material
  • Cut the number of dimensions in half (should save about half the space?)
  • Round the floating-point values to fewer digits
  • Somehow keep only the most frequent words in the JSON file
  • Split the JSON file into two files: one with the 10% most frequent words and one with the other 90%
    • Give the user a setting to load the larger file into RAM as well
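Two of the items above, rounding the values and keeping only the most frequent words, can be sketched as a preprocessing step. This is an illustration under assumptions: `shrinkVectors` is a made-up name, and `counts` is a hypothetical word-frequency map that would have to come from the training corpus.

```javascript
// vectors: { word: [numbers...] }, counts: { word: frequency }.
// keep:   fraction of the vocabulary to retain (most frequent first).
// digits: number of decimal places to keep in each component.
function shrinkVectors(vectors, counts, { keep = 0.1, digits = 3 } = {}) {
  const words = Object.keys(vectors).sort(
    (a, b) => (counts[b] || 0) - (counts[a] || 0)
  );
  const limit = Math.max(1, Math.ceil(words.length * keep));
  const factor = 10 ** digits;
  const small = {};
  for (const word of words.slice(0, limit)) {
    // Rounding shortens the serialized JSON; it does not change the
    // in-memory size of each number once parsed.
    small[word] = vectors[word].map((x) => Math.round(x * factor) / factor);
  }
  return small;
}
```

Note the trade-off in the comment: rounding mainly reduces file and download size, while dropping rare words reduces both file size and RAM use.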

Suggestions for unit testing and integration testing

Right now there are no automated tests that can be run, unfortunately. I am not sure how best to test a Chrome extension (or a JavaScript project in general, since I am not yet familiar with JavaScript).

One thing that I would like to do is set up some JavaScript files which can be run with NodeJS. I could refactor some of the methods that I use in content_script.js into a library and then test those methods separately from various Chrome extension specific methods/variables.
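As a rough sketch of that refactoring, a pure function can live in a file that works both as a plain script in the extension and as a Node module. The `tokenize` function here is a hypothetical example, not something taken from content_script.js, and the checks use Node's built-in `assert` module.

```javascript
// Pure helper with no chrome.* or DOM dependencies, so Node can run it.
function tokenize(text) {
  // Lowercase words made of letters, digits, and apostrophes.
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Export for Node when available; in the extension the function is
// simply a global defined by the script.
if (typeof module !== "undefined" && module.exports) {
  module.exports = { tokenize };
}

// Checks like these would normally live in a separate test file.
const { deepStrictEqual } = require("assert");
deepStrictEqual(tokenize("Hello, World!"), ["hello", "world"]);
deepStrictEqual(tokenize(""), []);
```

Running the file with `node` exits silently when the checks pass and throws an AssertionError when one fails, which is enough to start before adopting a full test runner.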

Another thing that I would like to explore is mocking Chrome methods/variables with a library such as https://github.com/acvetkov/sinon-chrome, but its usefulness is unclear at this point. Any suggestions are welcome.

Better handle switching between fuzzy, regular expression, and exact queries.

Right now the logic behind what functions to use based on what the user typed in (whether it is a regular expression or if a part of a query is in double quotes) is funkalicious. I need to refactor some parts of that.

Improperly formatted queries like "prison "attack (useless space inside double quotes and no space to the right of a closing double quote) make the extension search for an empty regular expression or something like that.
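One way to make this more predictable is to tokenize the query in a single pass and discard empty or unterminated quotes. The sketch below is an assumption about how such a parser could look; `parseQuery` and its return shape are hypothetical.

```javascript
// Splits a query into exact phrases (text inside a complete pair of double
// quotes) and fuzzy terms (everything else). A stray, unmatched quote is
// ignored instead of producing an empty search.
function parseQuery(query) {
  const exact = [];
  const fuzzy = [];
  // Either a complete "quoted phrase" or a run of non-space, non-quote characters.
  const tokenPattern = /"([^"]*)"|([^\s"]+)/g;
  let m;
  while ((m = tokenPattern.exec(query)) !== null) {
    if (m[1] !== undefined) {
      const phrase = m[1].trim(); // drop useless spaces inside the quotes
      if (phrase) exact.push(phrase);
    } else {
      fuzzy.push(m[2]);
    }
  }
  return { exact, fuzzy };
}
```

For the malformed example above, `"prison "attack`, this yields the exact phrase `prison` and the fuzzy term `attack`, and an empty pair of quotes contributes nothing.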
