tmalsburg / guess-language.el

Emacs minor mode that detects the language you're typing in. Automatically switches spell checker. Supports multiple languages per document.

Emacs Lisp 100.00%
emacs emacs-lisp language-detection language language-statistics spellcheck

guess-language.el's People

Contributors

aglet, andersjohansson, dantlev01, djolereject, hendursaga, humitos, joostkremers, mihakam, peterwvj, smoeding, syohex, tmalsburg, wentasah

guess-language.el's Issues

problems with the set up, I need to restart flyspell mode for each paragraph!?

Hi

I am a long-time user of ispell and flyspell (plus my own language-dependent abbrev tables). Usually I have 4 functions for switching the ispell dictionary between the four languages I use most (plus their corresponding abbrev tables), bound to 4 different keys. The ispell dictionaries I have are:

american.hash -> /var/lib/ispell/american.hash
british.hash -> /var/lib/ispell/british.hash
british-insane.hash -> /var/lib/ispell/british-insane.hash
castellano.hash -> espa~nol.hash
odeutsch.hash -> ogerman.hash
francais.hash -> french.hash

So I changed guess-language-langcodes to use the names that ispell-change-dictionary expects, which results in the following (listing only the differences):


(
 (de "deutsch8" "German" "🇩🇪" "German")
 (en "british" "English" "🇬🇧" "English")
 (es "castellano8" nil "🇪🇸" "Spanish")
 (fr "francais" "French" "🇫🇷" "French"))

Now I open a new file, start guess-language-mode (flyspell-mode is on) and type


Damit sind die Voraussetzungen des lokalen Existenzsatzes für
Anfangsdaten $U(T)$ gegeben und man erhält die Existenz einer Lösung
auf dem Intervall $[T,T+\epsilon]$. Aber warum ist das wahr?


Now let us look whether this true.  But then what to we see

Jetzt geht es zurück nach Deutsch. Aber es passiert nichts.

The first paragraph is correctly identified as German, but the second is not. English is chosen only if I restart flyspell-mode. Is this the expected behaviour? If the switch does not happen automatically, I might as well press two keys myself: one to change the language manually and one to restart flyspell-mode.

What am I missing?
Regards
Uwe Brauer

Dutch ?

The table of languages that are currently supported by guess-language-mode mentions:
Dutch nl de

Should that not be Dutch nl nl_NL?

Advantage of using local variables?

Maybe add a note on the advantage of using file-local variables to the README?

E.g.:

;; Local Variables:
;; ispell-check-comments: exclusive
;; ispell-local-dictionary: "en"
;; End:

%%% Local Variables:
%%% ispell-local-dictionary: "british"
%%% End:

Esperanto in `guess-language-langcodes`

Hi, I just noticed that the entry for Esperanto in guess-language-langcodes reads:

(eo     . ("eo"         "English" "🟩"   "Esperanto"))

This should of course be:

(eo     . ("eo"         "Esperanto" "🟩"   "Esperanto"))

flyspell-large-region: Wrong type argument: stringp, nil

M-x flyspell-buffer fails with the error flyspell-large-region: Wrong type argument: stringp, nil when guess-language-mode is enabled. The major mode does not seem to matter for reproducing the error (it breaks in fundamental, text and org modes).

Steps to reproduce:

  • Download the latest guess-language version from git (currently bc6fe11) and place it in /tmp
$ cd /tmp/
$ git clone git@github.com:tmalsburg/guess-language.el.git
  • Start emacs with emacs -q
  • Run the following snippet to initialize guess-language (M-x eval-buffer in the scratch buffer):
(add-to-list 'load-path "/tmp/guess-language.el")
(load "guess-language")
     
(setq guess-language-languages '(en fr pt))
(setq guess-language-min-paragraph-length 35)
(setq guess-language-langcodes
      '((en . ("en_US" "English"))
        (fr . ("fr_FR" "French"))
        (pt . ("pt_BR" "Portuguese"))
        ))
  • Create a new buffer and paste the following text:
The following lines must be in a language different from the language of the first line to cause the bug.
The number of lines below seems to be the minimum necessary to make the bug manifest itself in my system.

Uma linha longa o suficiente para que o problema se manifeste.
Uma linha longa o suficiente para que o problema se manifeste.
Uma linha longa o suficiente para que o problema se manifeste.
Uma linha longa o suficiente para que o problema se manifeste.
Uma linha longa o suficiente para que o problema se manifeste.
Uma linha longa o suficiente para que o problema se manifeste.
Uma linha longa o suficiente para que o problema se manifeste.
Uma linha longa o suficiente para que o problema se manifeste.
Uma linha longa o suficiente para que o problema se manifeste.
Uma linha longa o suficiente para que o problema se manifeste.
Uma linha longa o suficiente para que o problema se manifeste.
Uma linha longa o suficiente para que o problema se manifeste.
Uma linha longa o suficiente para que o problema se manifeste.
Uma linha longa o suficiente para que o problema se manifeste.
Uma linha longa o suficiente para que o problema se manifeste.
Uma linha longa o suficiente para que o problema se manifeste.
  • Run M-x flyspell-buffer - Everything works as expected
  • Enable guess-language-mode M-x guess-language-mode
  • Run M-x flyspell-buffer - Flyspell terminates with the following error before the end of the spell checking: flyspell-large-region: Wrong type argument: stringp, nil

Full output of the messages buffer:

For information about GNU Emacs and the GNU system, type C-h C-a.
Mark set
Loading /tmp/guess-language.el/guess-language.el (source)...done
You can run the command ‘eval-buffer’ with M-x ev-b RET
Loading /tmp/guess-language.el/guess-language.el (source)...done
(New file)
Mark set
Starting new Ispell process /usr/bin/aspell with default dictionary...
Checking region...
Spell Checking...100% [manifeste]
Spell Checking completed.
You can run the command ‘flyspell-buffer’ with M-x fl-bu RET
Spell Checking completed.
Guess-Language mode enabled in current buffer
You can run the command ‘guess-language-mode’ with M-x gue-mo RET
Guess-Language mode enabled in current buffer
Checking region...
Ispell process killed
Local Ispell dictionary set to pt_BR
Starting new Ispell process /usr/bin/aspell with pt_BR dictionary...
Detected language: Portuguese
flyspell-large-region: Wrong type argument: stringp, nil

Debugger output:

Debugger entered--Lisp error: (wrong-type-argument stringp nil)
  flyspell-external-point-words()
  flyspell-large-region(1 1221)
  flyspell-region(1 1221)
  flyspell-buffer()
  funcall-interactively(flyspell-buffer)
  call-interactively(flyspell-buffer record nil)
  command-execute(flyspell-buffer record)
  execute-extended-command(nil "flyspell-buffer" "flyspell-bu")
  funcall-interactively(execute-extended-command nil "flyspell-buffer" "flyspell-bu")
  call-interactively(execute-extended-command nil nil)
  command-execute(execute-extended-command)

Emacs 26.1
guess-language from git (bc6fe11)

guess-language fails to start

When opening a text file, I get the following error:

Error in post-command-hook (flyspell-post-command-hook): (wrong-type-argument symbolp (quote typo-mode))

Looking in the code, I found a call (bound-and-true-p 'typo-mode) in the function guess-language-switch-typo-mode-function. Since bound-and-true-p is a macro that takes an unquoted symbol as its argument, this is obviously an error.

Removing the quote solves the problem for me.
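
The failure can be reproduced without guess-language. The following is a minimal sketch that assumes only stock Emacs; my-demo-mode is a hypothetical variable standing in for typo-mode:

```elisp
;; my-demo-mode is a hypothetical stand-in for typo-mode.
(defvar my-demo-mode t)

;; Correct: bound-and-true-p is a macro and takes the bare symbol.
(bound-and-true-p my-demo-mode)

;; Incorrect: the macro receives the form (quote my-demo-mode), which is
;; a list rather than a symbol, so evaluation signals wrong-type-argument.
(condition-case err
    (eval '(bound-and-true-p 'my-demo-mode))
  (wrong-type-argument (message "Caught: %S" err)))
```

The quoted form is wrapped in eval and condition-case only so that the snippet can be evaluated as a whole without entering the debugger.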

Cursor jumps when text is analysed

I've noticed that the cursor jumps around while text is being analysed by guess-language. Although this is barely noticeable for small paragraphs, it becomes easier to see for larger ones. I've managed to produce this issue using a rather small Emacs configuration, which I store in ~/Desktop/tmp/init.el:

;; ~/Desktop/tmp/init.el
(package-initialize)

(autoload 'flyspell-mode "flyspell" t)
(setq ispell-dictionary "british")
(setq ispell-current-dictionary "british")
(add-hook 'text-mode-hook 'flyspell-mode)

(require 'guess-language)
(setq guess-language-languages '(en da))
(setq guess-language-min-paragraph-length 35)
(add-hook 'text-mode-hook (lambda () (guess-language-mode 1)))

Now, open an empty file (e.g. ~/Desktop/tmp/Demo.txt) in Emacs:

emacs -Q -l ~/Desktop/tmp/init.el ~/Desktop/tmp/Demo.txt

Initially, Emacs uses the british dictionary.

First, I start typing a Danish sentence. Eventually the number of characters reaches 35, which triggers the guess-language analysis. While the analysis is being performed, the cursor jumps between the "misspelled" words. Based on this analysis, guess-language correctly switches the dictionary to danish.

Another way to make the cursor jump is by pasting a large chunk of text that contains several "misspelled" words (according to the current dictionary). For example by deleting the danish sentence, and pasting a large chunk of "Lorem Ipsum": since none of these words are included in the Danish dictionary this causes the cursor to jump a lot.

Below is a GIF that illustrates the scenario explained above:

[GIF: optimised]

I had to reduce the quality of the GIF; otherwise GitHub would not allow me to upload it. Let me know if the quality of the GIF is a problem.

System info:
GNU Emacs 26.0.50.2 (x86_64-pc-linux-gnu, GTK+ Version 3.18.9) of 2017-02-01
guess-language 20170204.311

Add guess-language-after-autoset-hook

I started using this package, and I'm very happy with it. The language detection mechanism works very well. Thanks!

One feature that I'm missing though is to perform additional actions after the language detection mechanism has finished executing. Usually when I switch from English to Danish dictionary I also set the input method to danish-postfix, which allows me to use the English (US) keyboard layout, while still being able to write Danish letters that are not available on the English keyboard layout. Setting the input method can be done like this:

M-x set-input-method RET danish-postfix RET

Now when I type "oe", Emacs changes it to "ø", "aa" becomes "å" and so on. The important thing, however, is that the keyboard layout remains the same.

Right now I have to change the input method manually every time guess-language switches the dictionary. It would be better if guess-language could help me change the input method too. I tried to look for a hook that would allow me to run additional code when the dictionary changes (some ispell hook), but I couldn't find a clean way to do this. Since other users of guess-language might want to perform a similar task, I thought that adding a guess-language-after-autoset-hook would be a good idea (perhaps you can think of a better name). The hook is intended to be run immediately before guess-language-autoset exits. What do you think?
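
To illustrate the request, here is a rough sketch of how such a hook might be used. It assumes the hook exists under the name proposed above (guess-language-after-autoset-hook), which is only a suggestion in this issue and not a confirmed part of the package. The names my-input-methods-by-dictionary and my-sync-input-method are hypothetical.

```elisp
(require 'ispell)

;; Hypothetical mapping from ispell dictionary names to input methods.
(defvar my-input-methods-by-dictionary
  '(("dansk" . "danish-postfix"))
  "Alist of (DICTIONARY . INPUT-METHOD) pairs.")

(defun my-sync-input-method ()
  "Activate the input method that matches the current ispell dictionary.
Deactivate the input method when no mapping is configured."
  (let ((method (cdr (assoc (or ispell-local-dictionary ispell-dictionary)
                            my-input-methods-by-dictionary))))
    (if method
        (set-input-method method)
      (deactivate-input-method))))

;; Assumption: the hook proposed in this issue exists under this name.
(add-hook 'guess-language-after-autoset-hook #'my-sync-input-method)
```

Keying the mapping on the dictionary name rather than on the language code keeps the function independent of how guess-language reports its result.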

Dictionary names in `guess-language-langcodes`

I noticed that the dictionary names in guess-language-langcodes sometimes use English names, sometimes native names, and sometimes just a two-letter language code. Cf.:

(cs     . ("czech"      "Czech"   "🇨🇿"   "Czech"))
(da     . ("dansk"      nil       "🇩🇰"   "Danish"))
(de     . ("de"         "German"  "🇩🇪"   "German"))

I ran into this when I noticed that Emacs couldn't spell-check Dutch. It turns out that aspell, which I use as the back-end, doesn't recognise nederlands; it only recognises dutch.

I changed the entry in guess-language-langcodes (it's a user option, after all) but it made me wonder if it might make sense to add multiple entries for some languages? If aspell knows the language as dutch, but ispell or hunspell uses nederlands, wouldn't it be better to have entries for both?

Or is this something that should be handled by Emacs' ispell module?
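
The workaround described above can be sketched as follows. This assumes that entries have the four-element shape quoted earlier and that the installed aspell knows a dictionary called dutch; the second element is left nil, as in the Danish entry.

```elisp
(with-eval-after-load 'guess-language
  ;; Replace the Dutch entry so that it names the dictionary aspell knows.
  ;; The remaining elements mirror the shape of the entries quoted above.
  (setf (alist-get 'nl guess-language-langcodes)
        '("dutch" nil "🇳🇱" "Dutch")))
```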

"advice" dependency not found

After some recent changes (d2a1330), guess-language depends on the advice 0.1 package, which is not on MELPA; consequently, guess-language fails to build.

run-at-time

I was wondering what the rationale is behind the run-at-time at https://github.com/tmalsburg/guess-language.el/blob/master/guess-language.el#L247 ? I'm getting a lot of "Blocking call to accept-process-output with quit inhibited" messages as a result of it. That in itself isn't so bad, but Emacs also occasionally hangs, apparently because two of those run-at-time timers firing at the same time cause problems (?)

Just calling the body of that function directly seems to work, but I haven't tested it extensively.

validate-setq on guess-language-langcodes

Hi,

just upgraded guess-language from MELPA. I have this snippet in my init file:

(use-package guess-language         ; Automatically detect language for Flyspell
  :ensure t
  :defer t
  :init (add-hook 'text-mode-hook #'guess-language-mode)
  :config
  (validate-setq guess-language-langcodes '((en . ("en_GB" "English"))
                                            (it . ("it_IT" "Italian")))
                 guess-language-languages '(en it)
                 guess-language-min-paragraph-length 45)
  :diminish guess-language-mode)

I get this warning upon restarting Emacs:

Error (use-package): guess-language :config: Looking for `(alist :key-type symbol :value-type ...)' in `((en "en_GB" "English") (it "it_IT" "Italian"))' failed because:
Looking for `(repeat (cons symbol list))' in `((en "en_GB" "English") (it "it_IT" "Italian"))' failed because:
Looking for `(cons symbol list)' in `(en "en_GB" "English")' failed because:
Looking for `list' in `("en_GB" "English")' failed because:
  wrong number of elements

I am using validate-setq from @Malabarba's validate to be sure I am setting the right values everywhere in my init file.

Everything's working fine with regular setq, so this might not be related to your package.

Missing trigrams after install of `gnu elpa` package

Context

I am using GNU Emacs 30.0.60 and installed guess-language.el version 0.0.1 from the GNU ELPA package archive. When I call guess-language from a text buffer, the following error occurs:

guess-language-compile-regexps: Opening input file: Aucun fichier ou dossier de ce type, /home/matthias/.config/emacs/elpa/guess-language-0.0.1/trigrams/en

Which is not surprising:

matthias@peitho:~$ tree .config/emacs/elpa/guess-language-0.0.1/
.config/emacs/elpa/guess-language-0.0.1/
├── guess-language-autoloads.el
├── guess-language.el
├── guess-language.elc
└── guess-language-pkg.el

1 directory, 4 files

No problem with the version from MELPA:

matthias@peitho:~$ ls -l .config/emacs/elpa/guess-language-20240528.1319/ | wc -l
73

Possibility to ignore certain lines

Some modes insert content into the buffer that is read-only and not related to the actual content that the user writes.

For example, in ERC/Circe this is the messages sent by other users, and in Mastodon the same plus the keybindings shown.

Very slow in Org buffers

Hi,

I've been experiencing a terrible slow-down in Org buffers recently, especially when inside tables, but also just moving the cursor around partially collapsed headings. A quick profiling showed that guess-language is the culprit, especially the call to how-many in guess-language-region:

- flyspell-post-command-hook                                     2958  89%
 - flyspell-word                                                 2958  89%
  - flyspell-highlight-duplicate-region                          2940  88%
   - run-hook-with-args-until-success                            2940  88%
    - guess-language-function                                    2940  88%
     - guess-language                                            2940  88%
      - guess-language-region                                    2883  87%
         how-many                                                2883  87%
      + backward-paragraph                                         49   1%
      + forward-paragraph                                           8   0%
  + org-mode-flyspell-verify                                       18   0%
+ command-execute                                                 323   9%
+ yas--post-command-handler                                        14   0%
+ redisplay_internal (C function)                                   9   0%
+ timer-event-handler                                               3   0%
+ ...                                                               0   0%

It seems that the longer the Org file, the bigger the slow down. I'm guessing that this may be caused by the fact that backward-paragraph in an Org buffer may travel very far back: in one particular Org file of mine, it moves almost all the way to the beginning of the buffer.

I tried the obvious thing, i.e., use org-backward-paragraph and org-forward-paragraph in guess-language-paragraph if major-mode is org-mode, but that didn't seem to have much of an effect. Perhaps you know of a better way to deal with the issue?

Updating version for GNU ELPA archive required

Hello!

Version 0.0.1 of the guess-language package in the GNU ELPA archive is very old and has a critical bug: the recipe does not include the files from the trigrams/ directory.

When guess-language-mode is activated, Emacs shows this error message:

Error in post-command-hook (flyspell-post-command-hook): (file-missing "Opening input file" "File or folder does not exists" "/home/dunaevsky/.emacs.d/elpa/guess-language-0.0.1/trigrams/en")

Please update the recipe for the GNU ELPA archive.

Comparing this package with similar ones

I was looking for a package that would enable me to automatically switch the dictionary of my spell-checker. In addition to this package, I also found auto-dictionary-mode.

I think it would be helpful to add a small section in the README to explain the most important differences between this package and similar ones. It seems to me like this package and auto-dictionary-mode are trying to achieve the same goal. I'm interested in knowing what I gain by using this package in particular (rather than auto-dictionary-mode). Although auto-dictionary-mode does not seem to be actively maintained, it is quite popular (based on its number of downloads). Also, can you say anything about the performance of these two packages?

Thanks in advance.

trigrams for japanese, chinese, korean?

hi, i'm interested in using this just for the guess-language part only (i.e. not the typo-mode setting or spellchecking) but using all possible languages.

is it possible that there's no japanese (ja), chinese (zh), and korean (ko) in the trigrams data? or am i confused about it somehow?

i did a few tests with chinese and japanese texts and guess-language-region returned zu, i.e. Zulu.

but i must be a little confused, as guess_language.py supports those languages, but it doesn't have ja, zh, or ko in its trigrams files.

perhaps the python package simply selects those languages (and greek) by their script, using the Blocks.txt file? would it be possible to support that also in guess-language.el?

i guess if that's the issue i'm encountering it would require a bit of work to support those languages in this package...
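
For what it's worth, here is a rough sketch of what script-based detection could look like in Emacs Lisp. It is not taken from guess-language.el or from guess_language.py. It uses Emacs's built-in char-script-table, and the function names and the script-to-language mapping are assumptions made for illustration.

```elisp
(defun my-script-counts (beg end)
  "Return an alist of (SCRIPT . COUNT) for characters between BEG and END."
  (let ((counts nil)
        (end (min end (point-max))))
    (save-excursion
      (goto-char beg)
      (while (< (point) end)
        ;; char-script-table maps each character to a script symbol.
        (let ((script (aref char-script-table (char-after))))
          (when script
            (setf (alist-get script counts)
                  (1+ (alist-get script counts 0)))))
        (forward-char 1)))
    counts))

(defun my-language-from-script (beg end)
  "Return a language symbol if the scripts between BEG and END decide it.
Return nil otherwise, so that trigram matching can take over."
  (let ((counts (my-script-counts beg end)))
    ;; Crude heuristic: kana is checked before han because Japanese
    ;; text normally mixes both scripts.
    (cond ((alist-get 'hangul counts) 'ko)
          ((alist-get 'kana counts) 'ja)
          ((alist-get 'han counts) 'zh)
          ((alist-get 'greek counts) 'el))))
```

A single character of a given script is enough to trigger a match here, so a real implementation would probably want a threshold on the counts.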

org-mode - "Wrong type argument: number-or-marker-p, nil"

Hi there,

I've been experiencing some problems using guess-language and org-mode.

M-x guess-language in an org-file anywhere on the second line of the example below causes the error:

Wrong type argument: number-or-marker-p, nil.

  - First item. I'm just writing something longer so that the minimum
    number of characters is reached.

This problem also shows up (not always) when fly-spelling the whole buffer (M-x flyspell-buffer) stopping the verification before it reaches the end of the file.

My guess is that this error may be related to the last part of issue #17 (comment) and to commit 65dccb1, which deals with paragraph navigation in org-mode files.

Using:
Emacs 26.1
Org mode 9.1.14
guess-language head from git repository.

Debugger output:

Debugger entered--Lisp error: (wrong-type-argument number-or-marker-p nil)
  org-list-struct()
  guess-language-forward-paragraph()
  guess-language()
  funcall-interactively(guess-language)
  call-interactively(guess-language record nil)
  command-execute(guess-language record)
  helm-M-x(nil "guess-language")
  funcall-interactively(helm-M-x nil "guess-language")
  call-interactively(helm-M-x nil nil)
  command-execute(helm-M-x)

Please let me know if you need further info to pinpoint the error.

Trigrams searched in wrong directory?

Hi, this looks like it's exactly the thing I've been looking for, but I'm having trouble setting it up. I installed the package and added the following to my init file:

(use-package guess-language
  :ensure t
  :init
  (add-hook 'text-mode-hook #'guess-language-mode)
  :diminish guess-language-mode)

But when I open a text file, flyspell doesn't work (even though I get a message about it starting up). The *Messages* buffer contains a message such as the following:

Error in post-command-hook (flyspell-post-command-hook): (file-error "Opening input file" "No such file or directory" "/home/joost/src/criticmarkup-emacs/trigrams/en")

The directory in the error message is the one containing the file I just opened.

Am I doing something wrong or is this a bug somewhere?

Adding a language

I wondered whether you could add a short explanation of how to add other languages. Right now it just says that it's "easy to add", but I have no idea how to do that.
I'm specifically interested in Serbian, but I believe such an explanation could be kept general and would be of value to users of many other languages.
Thanks for the great package!

Redundant aspell processes after magit-commit

Emacs -Q:

(progn
  (setq package-archives '(("gnu" . "https://elpa.gnu.org/packages/")
                           ("nongnu" . "https://elpa.nongnu.org/nongnu/")))
  (package-install 'magit)
  (package-install 'guess-language)
  (package-activate-all)

  (setq guess-language-languages '(en sl))
  (setq-default ispell-dictionary "slovenian")
  (add-hook 'text-mode-hook 'flyspell-mode)
  (add-hook 'flyspell-mode-hook 'guess-language-mode))

Each M-x magit-commit-create will leave an extra aspell process.
(A bit more info can be found on
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=48379 )

Display emoji symbols in mini buffer to indicate language

Country flags are not appropriate for some languages in which case we might want to show some other fancy unicode character. Motivation: The current language is difficult to spot in busy mode lines.

As far as I understand, Emacs currently lacks the necessary support for combined glyphs like 🇩🇪 (1F1E9 and 1F1EA unified), but in the future this will likely be possible.

Edit: Combined glyphs and native display of color emoji are supported as of version 28.1.
