
Comments (20)

PackmanDude commented on May 22, 2024

just don't use ruzzian.

from the_silver_searcher.

gjtorikian commented on May 22, 2024

It's about percentage. If something like 10% of the bytes encountered are "suspicious," then it's flagged as binary.

I believe Perl's -B operates the exact same way, but with around 30% instead.
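The percentage idea can be sketched roughly as follows. This is not ag's actual `is_binary()` code: the function names, the definition of a "suspicious" byte (NUL, other control characters, and anything non-ASCII), and the 10% default are assumptions made for illustration only.

```python
def suspicious_ratio(data: bytes) -> float:
    """Fraction of bytes that are control characters or non-ASCII (assumed definition)."""
    if not data:
        return 0.0
    allowed_controls = {0x09, 0x0A, 0x0D}  # tab, newline, carriage return
    suspicious = sum(
        1 for b in data
        if (b < 0x20 and b not in allowed_controls) or b >= 0x7F
    )
    return suspicious / len(data)


def looks_binary(data: bytes, threshold: float = 0.10) -> bool:
    """Flag the buffer as binary when the suspicious share exceeds the threshold."""
    return suspicious_ratio(data) > threshold


# Mostly Latin text with a single Cyrillic letter (2 bytes in UTF-8).
mostly_latin = ("test\n" * 30 + "т\n").encode("utf-8")
# Russian-only text: every letter is 2 non-ASCII bytes in UTF-8.
all_russian = "тест\n".encode("utf-8") * 30

print(looks_binary(mostly_latin))
print(looks_binary(all_russian))
```

Under these assumptions, a heuristic that counts every non-ASCII byte as suspicious will flag Russian-only UTF-8 text as binary, because almost all of its bytes are above 0x7F.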

vsushkov commented on May 22, 2024

I'd like to search in Russian-only files too. Isn't it a good idea for ag?

gjtorikian commented on May 22, 2024

I'm not saying it's not a good idea--I'm not even a maintainer for the project!

Perhaps I am mistaken, though. ack is able to find the text, using test and тест. I always thought it stopped at a percentage threshold.

In any event, ag's problem is still about a percentage threshold. The following file, for example, does find matches:

test
т
test
test
testtest
test
testtest
test
testtest
test
testtest
test
testtest
test
testtest
test
testtest
test
testtest
test
testtest
test
testtest
test
testtest
test
testtest
test
test

(In other words, one Russian character amongst other "Latin" ones.)

vsushkov commented on May 22, 2024

I suggest checking whether a file is binary by its filename extension. As I understand it, that should be even faster than the current approach.

gjtorikian commented on May 22, 2024

Eh, that's not really efficient. What about filenames without extensions, like ag, find, or Makefile? Which of these are binary?

vsushkov commented on May 22, 2024

Yep, checking the filename extension is not a good idea. What about the Linux file program? It does a good job of determining file types. I think ag could use it or borrow some of its logic.

ggreer commented on May 22, 2024

I tweaked is_binary() to be a little more forgiving. Try it out now. If my change doesn't do the trick, there are other ideas we can try. For example, if the file is UTF-8 encoded, there shouldn't be very many bytes above 0b10111111.
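One way to act on the UTF-8 idea is to check whether the buffer decodes as valid UTF-8 before applying any byte-counting heuristic. The sketch below is a stricter check than the byte-range count mentioned above and is not what ag does; the function names are hypothetical, and treating a NUL byte as a sign of binary data is an assumption.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the whole buffer decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def looks_like_text(data: bytes) -> bool:
    """Treat a buffer as text if it has no NUL bytes and is valid UTF-8."""
    # NUL is technically valid UTF-8 but is rare in real text files (assumption).
    if b"\x00" in data:
        return False
    return is_valid_utf8(data)


russian_utf8 = "Привет, мир!".encode("utf-8")
russian_cp1251 = "Привет, мир!".encode("cp1251")

print(looks_like_text(russian_utf8))
print(looks_like_text(russian_cp1251))
```

A check like this accepts Russian-only UTF-8 files but rejects text in legacy encodings such as Windows-1251, so it would only be a first step. If only the start of a file is inspected, a multi-byte character cut off at the end of the buffer would also fail strict decoding and would need separate handling.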

skrattaren commented on May 22, 2024

Version compiled from git:master treats my LaTeX files (in Russian) as binary :(

 % ag usepackage Руководство_оператора.tex -a
Binary file Руководство_оператора.tex matches.

ggreer commented on May 22, 2024

@krigstask can you give me some files that fail?

skrattaren commented on May 22, 2024

Here you are: https://gist.github.com/4029832

It's a sample rST file; I can't publish those LaTeX files, sorry.

ggreer commented on May 22, 2024

A sample is fine. Thanks.

ggreer commented on May 22, 2024

Try master now.

skrattaren commented on May 22, 2024

Great, seems to work now!

ggreer commented on May 22, 2024


Victory!

vsushkov commented on May 22, 2024

@ggreer Could you please update the Homebrew formula?

ggreer commented on May 22, 2024

Oh yeah I should tag a release at some point in the near future.

ggreer commented on May 22, 2024

New release tagged. The homebrew pull request is here: Homebrew/legacy-homebrew#15911

vsushkov commented on May 22, 2024

Megacool. Thanks. 🍰

sergeevabc commented on May 22, 2024

Windows 7, ag 2.2.0 (downloaded here)
File hello.txt inside folder Test, it contains 'Привет, мир!'
ag мир -> nothing 👎
rg мир -> hello.txt 1:Привет, мир! 👍
