ropensci / spelling
Tools for Spell Checking in R
Home Page: https://docs.ropensci.org/spelling
License: Other
Hi,
Running spelling::spell_check_test() fails on the crosstable package with the following error:
spelling::spell_check_package()
#>Error in read_xml.raw(charToRaw(enc2utf8(x)), "UTF-8", ..., as_html = as_html, :
#> Input is not proper UTF-8, indicate encoding !
#>Bytes: 0x93 0x63 0x79 0x94 [9]
I have no clue where this error can come from and the error message is unfortunately not very informative.
Would it be possible to terminate early from spelling instead of xml2 so that the path is in the error message?
Of course, if we can also have the line and the specific bad character, it would be even better!
Note that in this case, UTF-8 is the default encoding in the package's DESCRIPTION and in the RStudio settings. R CMD check completes without error, so I guess any encoding problem is not that severe, don't you think?
spell_check()
(I used devtools::spell_check().)
After more debugging, it seems to pertain to this line:
Line 24 in 008417f
In my case, it pointed to my README.md file, which indeed contained special characters. I have no idea how they ended up there though, and they are far too numerous for me to correct manually (a knitting problem from README.Rmd, I guess).
Since this confusing problem is not that rare (#52, #58, #62), a fix would be useful.
Here are some proposals:
1. Use tryCatch() on xml2::xml_ns_strip() so that we can add path in the error message.
2. Warn about the rows that are not valid UTF-8:
text <- readLines(path, warn = FALSE, encoding = "UTF-8")
invalid <- !validUTF8(text)
if (any(invalid)) {
  warning("The file ", path, " has non-UTF-8 characters on rows: ", paste(which(invalid), collapse = ", "))
}
3. Use the approach of xfun::read_utf8() to ignore the problem (spell_check_package() will then raise no error):
opts <- options(encoding = "native.enc")
on.exit(options(opts), add = TRUE)
text <- readLines(path, warn = FALSE, encoding = "UTF-8")
We can do all three at the same time. I can make a PR if needed.
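The UTF-8 check proposed above can be tried outside the package with a small standalone helper. This is only a sketch, assuming base R's readLines() and validUTF8(); the name find_invalid_utf8 is hypothetical and not part of spelling.

```r
# Hypothetical helper, not part of the spelling package:
# return the line numbers of `path` that are not valid UTF-8.
find_invalid_utf8 <- function(path) {
  text <- readLines(path, warn = FALSE, encoding = "UTF-8")
  which(!validUTF8(text))
}

# Build a small file whose second line holds the bytes from the error above
# (0x93 and 0x94 are Windows-1252 curly quotes, which are invalid in UTF-8).
tmp <- tempfile(fileext = ".md")
writeBin(c(charToRaw("a clean line\n"),
           as.raw(c(0x93, 0x63, 0x79, 0x94)),
           charToRaw("\n")), tmp)

bad_lines <- find_invalid_utf8(tmp)
if (length(bad_lines) > 0) {
  message("The file ", tmp, " has non-UTF-8 characters on rows: ",
          paste(bad_lines, collapse = ", "))
}
```

Running this on each Rmd and md file of a package is one way to locate the offending file before calling the spell checker.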
Changes needed
Use of Markdown syntax
Putting the package name in section names (each section = "spelling versionnumber").
I can make a PR if you're ok with that @jeroen.
Advantage = getting a changelog in the docs website like e.g. https://docs.ropensci.org/codemetar/news/index.html
Hi,
I have a list of files in different directories. When I call spell_check_files(files), the result returns in found only the basename of the file and the line number. Would it be possible to return the paths, too?
Sigbert
I have a package in which the documentation is written in English (functions) and a second language (vignettes). These languages are set in the Language field of the DESCRIPTION file. This is done following the 'Writing R Extensions' manual:
A ‘Language’ field can be used to indicate if the package documentation is not in English: this should be a comma-separated list of standard (not private use or grandfathered) IETF language tags as currently defined by RFC 5646 (https://tools.ietf.org/html/rfc5646, see also https://en.wikipedia.org/wiki/IETF_language_tag), i.e., use language subtags which in essence are 2-letter ISO 639-1 (https://en.wikipedia.org/wiki/ISO_639-1) or 3-letter ISO 639-3 (https://en.wikipedia.org/wiki/ISO_639-3) language codes.
When running devtools::spell_check(), the following error message is issued:
> devtools::spell_check()
Error: Dictionary file not found: pt_BR, EN_US.dic
I haven't found a way around this.
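The error suggests the whole comma-separated field is being treated as a single dictionary name. A minimal sketch of splitting such a field into one dictionary-style name per language is shown below; normalize_langs is a hypothetical helper, and how spelling actually resolves the field is not shown in this report.

```r
# Hypothetical helper, not part of the spelling package:
# split a comma-separated 'Language' field into dictionary-style names,
# e.g. "pt-BR" becomes "pt_BR".
normalize_langs <- function(field) {
  tags <- trimws(strsplit(field, ",", fixed = TRUE)[[1]])
  parts <- strsplit(gsub("-", "_", tags, fixed = TRUE), "_", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) > 1) {
      paste(tolower(p[1]), toupper(p[2]), sep = "_")
    } else {
      tolower(p[1])
    }
  }, character(1))
}

langs <- normalize_langs("pt-BR, en-US")
print(langs)
```

Each resulting name could then be checked against the installed dictionaries separately, rather than looking for one file named after the whole field.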
Similar to spell_check_package(), i.e. it would use the WORDLIST.
Language as argument, NULL by default. If not given, try to find language info in a DESCRIPTION file; otherwise assume US English.
Checks
Or maybe it could be simplified somehow into checking all Rmd, and all md without Rmd source
Hello!
When I trigger a CI check with GitHub Action, the R-devel configuration of the standard check (as defined here) fails because it can't find spelling...
* checking package dependencies ... ERROR
##[error]Package suggested but not available: ‘spelling’
See for example here but it happens with other packages.
This behavior is not observed with Travis.
It may be linked with #50
Thanks :)
I have this in an Rmd:
See ['Configuration'][pkg_config] for details.
which will have fancy quotes in the README.md, courtesy of pandoc I guess:
See [“Configuration”](TODO) for details.
and then spellcheck reports:
Configuration” README.md:177
I wonder if it would be easy to ignore the fancy quotes? TBH I am not sure why they are considered to be part of the word.
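Until the tokenizer handles this, one possible workaround is to convert typographic quotes to plain ASCII quotes before the text reaches the spell checker. The sketch below uses only base R; normalize_quotes is a hypothetical name, not a function of spelling or hunspell.

```r
# Hypothetical pre-processing step, not part of the spelling package:
# map curly single and double quotes to their ASCII equivalents.
normalize_quotes <- function(x) {
  for (q in c("\u2018", "\u2019")) x <- gsub(q, "'", x, fixed = TRUE)
  for (q in c("\u201C", "\u201D")) x <- gsub(q, "\"", x, fixed = TRUE)
  x
}

line <- "See [\u201CConfiguration\u201D](TODO) for details."
cleaned <- normalize_quotes(line)
print(cleaned)
```

Replacing rather than deleting the quotes keeps contractions such as "isn’t" intact as a single word with an ASCII apostrophe.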
A reasonable workflow is to open files one after another and fix misspellings. Thus it would be useful to have a parameter by_file (logical, default FALSE) to sort the output by files, not alphabetically by misspelled word.
(I can submit a patch if that seems useful.)
Hi,
is it possible to include checking of .rd files in spell_check_files? Currently, rd files are checked as plain text, so code objects are also checked for spelling.
I wonder if spell_check_file_one could have an additional condition:
if (grepl("\\.rd$", path, ignore.case = TRUE)) {
  spell_check_file_rd(...)
}
Per help("aspell-utils", package = "utils"), it's possible to add custom dictionaries controlled via .aspell/defaults.R, e.g.
Rd_files <- vignettes <- R_files <- description <-
list(encoding = "UTF-8",
language = "en",
dictionaries = c("en_stats", "WORDLIST"))
where WORDLIST refers to .aspell/WORDLIST.rds, which comprises an acceptance word list, e.g.
saveRDS(accepted_words, file = ".aspell/WORDLIST.rds", version = 2L)
This can be used to avoid R CMD check --as-cran
NOTEs (which often are reported by the win-builder or CRAN Incoming services), e.g. Possibly mis-spelled words in DESCRIPTION: ...
It would be great if spelling could provide "standards" and functions for setting this up.
Also, maybe spelling could fall back to .aspell/WORDLIST.rds if inst/WORDLIST is not found.
> spelling::spell_check_package(pkg = ".", vignettes = TRUE, lang = "en_US")
Error in hunspell::dictionary(lang, add_words = sort(ignore)) :
unused argument (add_words = sort(ignore))
spell_check_setup()
Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = as_html, :
PCDATA invalid Char value 8 [9]
This occurred within the R package: https://github.com/mobiodiv/mobsim
Hello! I use the lifecycle package to document functions, and spelling gives a verbose warning that the lifecycle macro is not defined. This was documented in #42 and fixed last October in pull request #44. Would it be possible to make a release so we don't see the warnings when using the CRAN version of spelling? The issue comes up for me when I work with staff inexperienced in package development: the warnings throw them off and they can't figure out what is going on.
Many Thanks!
I put MistakeA in fileA.txt and MistakeB in fileB.txt. When
path <- c("fileB.txt", "fileA.txt")
then spell_check_files(path) says MistakeA is found in fileB.txt and vice versa.
library(spelling)
fileA.txt and fileB.txt
fileA <- '
store
car
MistakeA
road
'
fileB <- '
store
MistakeB
desk
road
'
writeLines(fileA, con = "fileA.txt")
writeLines(fileB, con = "fileB.txt")
files <- c("fileB.txt", "fileA.txt")
FOUND IN is correct
l1 <- lapply(files, spell_check_files)
do.call("rbind", l1)
#> WORD FOUND IN
#> MistakeB fileB.txt:3
#> MistakeA fileA.txt:4
FOUND IN is mixed up
spell_check_files(files)
#> WORD FOUND IN
#> MistakeA fileB.txt:4
#> MistakeB fileA.txt:3
FOUND IN is correct
spell_check_files(sort(files))
#> WORD FOUND IN
#> MistakeA fileA.txt:4
#> MistakeB fileB.txt:3
Created on 2019-02-02 by the reprex package (v0.2.1)
If I try to spell check this file, I get the following error:
> spelling:::spell_check_file_md("README.md")
Error in read_xml.raw(charToRaw(enc2utf8(x)), "UTF-8", ..., as_html = as_html, :
PCDATA invalid Char value 27 [9]
I tried to debug this further, but couldn't manage to find the offending text. If it's of any help, this is the traceback I see:
> traceback()
7: read_xml.raw(charToRaw(enc2utf8(x)), "UTF-8", ..., as_html = as_html,
options = options)
6: read_xml.character(md)
5: xml2::read_xml(md)
4: xml_find_all(x, "//namespace::*[name()='']/parent::*")
3: xml2::xml_ns_strip(xml2::read_xml(md))
2: parse_text_md(path)
1: spelling:::spell_check_file_md("README.md")
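The value 27 in the message is the decimal code of the ESC control character, which XML 1.0 does not allow in text (of the C0 control characters, only tab, line feed and carriage return are permitted). A minimal base R sketch for locating such characters is shown below; find_control_chars is a hypothetical helper, not part of spelling or xml2.

```r
# Hypothetical helper, not part of the spelling package:
# return the line numbers of `path` that contain control characters
# other than tab, line feed and carriage return.
find_control_chars <- function(path) {
  lines <- readLines(path, warn = FALSE)
  pattern <- "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F]"
  which(grepl(pattern, lines, perl = TRUE, useBytes = TRUE))
}

# Example: the second line carries ANSI colour escapes (ESC = "\033").
tmp <- tempfile(fileext = ".md")
writeLines(c("# Title", "colored \033[31moutput\033[0m", "plain text"), tmp)

hits <- find_control_chars(tmp)
print(hits)
```

ANSI colour codes captured from console output while knitting are one plausible source of such characters in a README.md, though that is an assumption about this particular file.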
Also, here is my session information:
sessioninfo::session_info()
#> - Session info --------------------------------------------------------------
#> hash: women holding hands: dark skin tone, open hands: light skin tone, thermometer
#>
#> setting value
#> version R version 4.1.1 (2021-08-10)
#> os Windows 10 x64 (build 19043)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate English_United Kingdom.1252
#> ctype English_United Kingdom.1252
#> tz Europe/Berlin
#> date 2021-10-09
#> pandoc 2.14.2 @ C:/PROGRA~1/Pandoc/ (via rmarkdown)
#>
#> - Packages -------------------------------------------------------------------
#> package * version date (UTC) lib source
#> backports 1.2.1 2020-12-09 [1] CRAN (R 4.1.0)
#> cli 3.0.1 2021-07-17 [1] CRAN (R 4.1.0)
#> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.1.1)
#> digest 0.6.28 2021-09-23 [1] CRAN (R 4.1.1)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.1)
#> fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.1)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.1)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.1.1)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.1.1)
#> highr 0.9 2021-04-16 [1] CRAN (R 4.1.1)
#> htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.1)
#> knitr 1.36.3 2021-10-09 [1] Github (yihui/knitr@00469e0)
#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.1)
#> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.1)
#> pillar 1.6.3 2021-09-26 [1] CRAN (R 4.1.1)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.1)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.1)
#> R.cache 0.15.0 2021-04-30 [1] CRAN (R 4.1.1)
#> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.1.0)
#> R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.1.0)
#> R.utils 2.11.0 2021-09-26 [1] CRAN (R 4.1.1)
#> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.1.1)
#> rlang 0.4.11 2021-04-30 [1] CRAN (R 4.1.1)
#> rmarkdown 2.11.3 2021-10-09 [1] Github (rstudio/rmarkdown@5a3e941)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.1)
#> sessioninfo 1.1.1.9000 2021-10-09 [1] Github (r-lib/sessioninfo@1ff2194)
#> spelling * 2.2 2020-10-18 [1] CRAN (R 4.1.1)
#> stringi 1.7.5 2021-10-04 [1] CRAN (R 4.1.1)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.1)
#> styler 1.6.2.9000 2021-10-08 [1] Github (r-lib/styler@7c46e20)
#> tibble 3.1.5 2021-09-30 [1] CRAN (R 4.1.1)
#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.1)
#> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.1)
#> withr 2.4.2 2021-04-18 [1] CRAN (R 4.1.1)
#> xfun 0.26 2021-09-14 [1] CRAN (R 4.1.1)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.0)
#>
#> [1] C:/Users/IndrajeetPatil/Documents/R/win-library/4.1
#> [2] C:/Program Files/R/R-4.1.1/library
Supposed to have common R jargon:
Dictionaries: https://s3.amazonaws.com/rstudio-buildtools/dictionaries/core-dictionaries.zip
From the script here: https://github.com/rstudio/rstudio/blob/master/dependencies/common/install-dictionaries
FYI I have removed the unexported function file_ext from knitr: yihui/knitr@cd6bed6. This function has been moved to the xfun package.
I am using the following words in my package:
After inserting these words in inst/WORDLIST and running spelling::spell_check_package(), the function reports that the words seq, st, nd and EIF are misspelled.
Currently, my WORDLIST includes the words seq, st, nd and EIF to avoid triggering the spell checker, but I would prefer to include the full words. Thanks.
There is a growing interest in multilingual R packages, and this relates to recent rOpenSci work.
Would it be possible to add the ability to spellcheck a package with multiple languages? To address, e.g., the case when documentation or a README is provided in two languages (real-life example). I understand that this might then miss some typos because they are valid words in the other language.
From a quick glance, it looks like this might require a change to hunspell (this change? ropensci/hunspell#37), so feel free to move this there if you believe that's more appropriate.
Running spelling::spell_check_test()
fails on the report package with the following error:
Error in read_xml.raw(charToRaw(enc2utf8(x)), "UTF-8", ..., as_html = as_html, :
PCDATA invalid Char value 27 [9]
It seems like the error comes from the README.md, which indeed contains some special characters such as percent signs.
Is it possible either to skip spell checking of that file, or to skip the problematic characters? Thank you
I have a package where spelling has been running spell checks for a while with no problem.
I'm working on an update and I'm getting some strange spelling behaviour and I can't seem to find the origin of it.
It fails during builds:
> if(requireNamespace('spelling', quietly = TRUE))
+ spelling::spell_check_test(vignettes = TRUE,
+ error = FALSE,
+ skip_on_cran = TRUE)
Error in sub(dest, "", xml2::xml_text(node), fixed = TRUE) :
zero-length pattern
Calls: <Anonymous> ... lapply -> FUN -> <Anonymous> -> xml_text<-.xml_node -> sub
Execution halted
But it runs just fine interactively, returning NULL.
I thought maybe there was an issue in my wordlist, so I deleted it, but the same issue persists.
Lastly, I thought I'd re-initiate it in case there was some old-timey setup issue. But even setting it up causes the error.
I tried the same with the full path but got the same error.
> spelling::spell_check_setup(".")
Error in sub(dest, "", xml2::xml_text(node), fixed = TRUE) :
zero-length pattern
I believe this code is where it's erroring:
Line 31 in 008417f
Though, unsure how/why or why it suddenly started complaining when it did not before.
Suggestions welcome
Similar to the whitelisted WORDLIST file, but excluding an entire file, similar to a .gitignore or .lintr. The use case I have is a single foo.Rd file that has hundreds of failing words (gene sequence definitions) that are valid in that context, but I do not want to bloat the WORDLIST with this idiosyncratic, highly specific set of words that are unique to foo.Rd.
Function use cases:
spelling::spell_check_package()
spelling::spell_check_files()
Best case scenario is this is already possible and I've simply missed it. Thx!
I have some Rmarkdown templates in inst/templates in my package.
Is there a way to get spelling::spell_check_package() to also check these files?
In rmarkdown documents, ignore the rmarkdown::yaml_front_matter except for title/subtitle fields.
Currently, the word order in WORDLIST is locale-dependent, which can create large spurious diffs when multiple people contribute to the package but use different locales.
I see two solutions:
1. Use method = "radix" in sort(). It is, to my knowledge, the only locale-independent sorting method.
2. Temporarily set the collation locale:
orig_locale <- Sys.getlocale("LC_COLLATE")
on.exit(Sys.setlocale("LC_COLLATE", orig_locale))
Sys.setlocale("LC_COLLATE", "C")
The nice thing about the second option is that you can set the locale to the one specified in DESCRIPTION.
Please let me know if you'd like me to submit a PR for this.
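As a quick illustration of the first option, the sketch below sorts a small made-up word list with method = "radix", which base R documents as always sorting character vectors in the C locale. The words themselves are invented for the example.

```r
# Invented example words; any WORDLIST content would behave the same way.
words <- c("banana", "Apple", "apple", "Zebra")

# method = "radix" ignores LC_COLLATE and compares in the C locale,
# so uppercase letters sort before lowercase ones on every machine.
sorted <- sort(words, method = "radix")
print(sorted)
```

Because the result no longer depends on each contributor's LC_COLLATE setting, regenerating the WORDLIST should only produce diffs for words that were actually added or removed.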
For example:
cat(commonmark::markdown_xml('A link: https://crandb.r-pkg.org is good', extensions = TRUE, sourcepos = TRUE))
now gives:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">
<document sourcepos="1:1-1:40" xmlns="http://commonmark.org/xml/1.0">
<paragraph sourcepos="1:1-1:40">
<text sourcepos="1:1-1:13" xml:space="preserve">A link: </text>
<link sourcepos="1:8-1:32" destination="https://crandb.r-pkg.org" title="">
<text sourcepos="1:8-1:32" xml:space="preserve">https://crandb.r-pkg.org</text>
</link>
<text sourcepos="1:33-1:40" xml:space="preserve"> is good</text>
</paragraph>
</document>
Whereas in previous versions of commonmark it would give:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">
<document sourcepos="1:1-1:40" xmlns="http://commonmark.org/xml/1.0">
<paragraph sourcepos="1:1-1:40">
<text sourcepos="1:1-1:13" xml:space="preserve">A link: </text>
<link destination="https://crandb.r-pkg.org" title="">
<text xml:space="preserve">https://crandb.r-pkg.org</text>
</link>
<text sourcepos="1:33-1:40" xml:space="preserve"> is good</text>
</paragraph>
</document>
Note how <link> has gained a sourcepos attribute. I think this is what causes them to be spellchecked.
See current pkgdepends for an example:
❯ LANG=C R -q -e 'spelling::spell_check_package()'
> spelling::spell_check_package()
DESCRIPTION does not contain 'Language' field. Defaulting to 'en-US'.
WORD FOUND IN
NANA pkgdepends-package.Rd:218
Is there any way to spell check just the roxygen documentation comments (#') rather than the generated man files? This would make fixing the spelling errors much more convenient. Thanks!
spell_check_setup() should list and explain the changes it makes to DESCRIPTION in its documentation.
The lifecycle package does define the \lifecycle macro, but spelling warns that it is not defined.
git2r::clone(
url = "https://github.com/r-lib/lifecycle",
local_path = "lifecycle"
)
#> cloning into 'lifecycle'...
#> Receiving objects: 1% (9/880), 13 kb
#> Receiving objects: 11% (97/880), 30 kb
#> Receiving objects: 21% (185/880), 94 kb
#> Receiving objects: 31% (273/880), 111 kb
#> Receiving objects: 41% (361/880), 134 kb
#> Receiving objects: 51% (449/880), 150 kb
#> Receiving objects: 61% (537/880), 166 kb
#> Receiving objects: 71% (625/880), 166 kb
#> Receiving objects: 81% (713/880), 183 kb
#> Receiving objects: 91% (801/880), 191 kb
#> Receiving objects: 100% (880/880), 239 kb, done.
#> Local: master /tmp/RtmpRWoEuz/reprex85671d073af/lifecycle
#> Remote: master @ origin (https://github.com/r-lib/lifecycle)
#> Head: [445f7f6] 2019-08-09: Add `is_present()` (#15)
spelling::spell_check_package("lifecycle")
#> DESCRIPTION does not contain 'Language' field. Defaulting to 'en-US'.
#> Warning in parse_Rd(ifile, encoding = encoding, macros = macros): /tmp/
#> RtmpRWoEuz/reprex85671d073af/lifecycle/man/badge.Rd:47: unknown macro
#> '\lifecycle'
#> Warning in parse_Rd(ifile, encoding = encoding, macros = macros): /tmp/
#> RtmpRWoEuz/reprex85671d073af/lifecycle/man/badge.Rd:48: unknown macro
#> '\lifecycle'
#> Warning in parse_Rd(ifile, encoding = encoding, macros = macros): /tmp/
#> RtmpRWoEuz/reprex85671d073af/lifecycle/man/badge.Rd:49: unknown macro
#> '\lifecycle'
#> Warning in parse_Rd(ifile, encoding = encoding, macros = macros): /tmp/
#> RtmpRWoEuz/reprex85671d073af/lifecycle/man/badge.Rd:50: unknown macro
#> '\lifecycle'
#> Warning in parse_Rd(ifile, encoding = encoding, macros = macros): /tmp/
#> RtmpRWoEuz/reprex85671d073af/lifecycle/man/badge.Rd:51: unknown macro
#> '\lifecycle'
#> Warning in parse_Rd(ifile, encoding = encoding, macros = macros): /tmp/
#> RtmpRWoEuz/reprex85671d073af/lifecycle/man/badge.Rd:52: unknown macro
#> '\lifecycle'
#> Warning in parse_Rd(ifile, encoding = encoding, macros = macros): /tmp/
#> RtmpRWoEuz/reprex85671d073af/lifecycle/man/badge.Rd:53: unknown macro
#> '\lifecycle'
#> Warning in parse_Rd(ifile, encoding = encoding, macros = macros): /tmp/
#> RtmpRWoEuz/reprex85671d073af/lifecycle/man/badge.Rd:54: unknown macro
#> '\lifecycle'
#> WORD FOUND IN
#> backtrace last_warnings.Rd:14,22,23
#> NEWS.md:13
#> backtraces NEWS.md:13
#> lifecycle.Rmd:155,198
#> behaviour deprecate_soft.Rd:68
#> lifecycle.Rmd:38,40,42
#> Codecov README.md:6
#> conjuction lifecycle.Rmd:198
#> invokation lifecycle.Rmd:85
#> programmatically deprecate_soft.Rd:38
#> questining README.md:27
#> lifecycle.Rmd:23
#> rlang's NEWS.md:30
#> signalled lifecycle-package.Rd:16
#> description:8
#> signaller NEWS.md:18,24
#> summarised lifecycle.Rmd:32
#> testthat deprecate_soft.Rd:43
#> verbosity.Rd:11
#> lifecycle.Rmd:83
#> ther lifecycle.Rmd:83
Created on 2019-08-16 by the reprex package (v0.3.0)
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────
#> setting value
#> version R version 3.6.0 (2019-04-26)
#> os Ubuntu 18.04.2 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/New_York
#> date 2019-08-16
#>
#> ─ Packages ───────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
#> backports 1.1.4 2019-04-10 [1] CRAN (R 3.6.0)
#> callr 3.3.1 2019-07-18 [1] CRAN (R 3.6.0)
#> cli 1.1.0 2019-03-19 [1] CRAN (R 3.6.0)
#> commonmark 1.7 2018-12-01 [1] CRAN (R 3.6.0)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
#> devtools 2.1.0 2019-07-06 [1] CRAN (R 3.6.0)
#> digest 0.6.20 2019-07-04 [1] CRAN (R 3.6.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0)
#> fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0)
#> git2r 0.26.1 2019-06-29 [1] CRAN (R 3.6.0)
#> glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.0)
#> highr 0.8 2019-03-20 [1] CRAN (R 3.6.0)
#> htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.6.0)
#> hunspell 3.0 2018-12-15 [1] CRAN (R 3.6.0)
#> knitr 1.24 2019-08-08 [1] CRAN (R 3.6.0)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
#> pkgbuild 1.0.4 2019-08-05 [1] CRAN (R 3.6.0)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
#> prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.0)
#> processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.0)
#> ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0)
#> R6 2.4.0 2019-02-14 [1] CRAN (R 3.6.0)
#> Rcpp 1.0.2 2019-07-25 [1] CRAN (R 3.6.0)
#> remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.0)
#> rlang 0.4.0 2019-06-25 [1] CRAN (R 3.6.0)
#> rmarkdown 1.14 2019-07-12 [1] CRAN (R 3.6.0)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
#> spelling 2.1 2019-03-11 [1] CRAN (R 3.6.0)
#> stringi 1.4.3 2019-03-12 [1] CRAN (R 3.6.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
#> testthat 2.2.1 2019-07-25 [1] CRAN (R 3.6.0)
#> usethis 1.5.1.9000 2019-08-11 [1] Github (r-lib/usethis@b241420)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
#> xfun 0.8 2019-06-25 [1] CRAN (R 3.6.0)
#> xml2 1.2.2 2019-08-09 [1] CRAN (R 3.6.0)
#> yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.0)
#>
#> [1] /home/landau/R/R-3.6.0/library
This might be related to #6; I've been unable to successfully troubleshoot the issue on my end. The following spelling.R file is set up to spell check my package:
if(requireNamespace('spelling', quietly = TRUE))
spelling::spell_check_test(error = FALSE,
skip_on_cran = TRUE)
When I run devtools::check(args = c('--as-cran')):
Error in read_xml.raw(charToRaw(enc2utf8(x)), "UTF-8", ..., as_html = as_html, :
PCDATA invalid Char value 27 [9]
Calls: <Anonymous> ... xml_find_all -> <Anonymous> -> read_xml.character -> read_xml.raw
Execution halted
This isn't too informative. I don't know if spelling checks the html vignettes, but that was my initial thought. So I tried:
if(requireNamespace('spelling', quietly = TRUE))
spelling::spell_check_test(vignettes = FALSE, error = FALSE,
skip_on_cran = TRUE)
And the check is successful.
My final check was to run spelling::spell_check_files()
on the Rmd, html, and .r vignette files. These printed spelling errors but did not fail like above.
My big question is: how do I troubleshoot this message? For reference, the package is tbrf and the test failures are shown here: https://github.com/mps9506/tbrf/runs/686572191?check_suite_focus=true#step:10:152
A user has reported that parse_text_md() is documented but not exported.
similar to #29
Note to self https://twitter.com/mattdray/status/1099263514424868864
It would be great if the output could be parameterized so that, as well as alphabetical ordering, there was a 'by line number' option.
In my case, at least, this makes more sense when working through a document.
In the quanteda package we have some non-English language vignettes but only in the vignettes/pkgdown folder. Previously (last week??), spell_check_package() did not check those files, but now it does, so we get a very long list of Chinese, Japanese, Spanish etc. words that fail to match the English dictionary or WORDLIST.
Does not fail on Travis by the way, only locally.
Is there a way to exclude this subfolder from checking?
URLs specified in the Description field of the DESCRIPTION file should not be spell-checked.
URLs in Description are common / encouraged to provide references, and are enclosed in angle brackets, which should make them easy to detect and exclude from spell-checks.
URLs in markdown (md) documents are spell-checked in spell_check_files() and spell_check_package() as long as they have angle brackets around them. This is inconsistent with the behavior of links without angle brackets, which are not spellchecked.
Here is a reprex where I create a markdown file from an rmarkdown file using rmarkdown::github_document().
writeLines(con = "test.Rmd", text = "
---
output: github_document
---
https://github.com/ropensci/spelling/issues/21
")
rmarkdown::render("test.Rmd", quiet = TRUE)
cat(readLines("test.md"), sep = "\n")
#>
#> <https://github.com/ropensci/spelling/issues/21>
spelling::spell_check_files(c("test.md", "test.Rmd"))
#> WORD FOUND IN
#> github test.md:2
#> https test.md:2
#> ropensci test.md:2
As we can see, only the URLs in the .md file are spellchecked. If the angle brackets are removed, all spell checks pass.
library(magrittr)
readLines("test.md") %>% sub("<", "", .) %>%
sub(">", "", .) %>% writeLines("test.md")
spelling::spell_check_files(c("test.md", "test.Rmd"))
#> No spelling errors found.
It would be very handy if this could be fixed in the spelling package, since I am using rmarkdown::github_document() for most of my R packages and I don't see an elegant way to run spell_check_package() without getting spellcheck warnings because of this behavior.
> spelling::spell_check_text("this package doesn't like contractions like isn't, aren't")
word found
1 aren 1
2 doesn 1
3 isn 1
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin16.7.0 (64-bit)
Running under: macOS Sierra 10.12.5
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] corpus_0.9.1.9000
loaded via a namespace (and not attached):
[1] Rcpp_0.12.12 roxygen2_6.0.1 lattice_0.20-35 digest_0.6.12
[5] crayon_1.3.2 withr_2.0.0 commonmark_1.2 grid_3.4.1
[9] R6_2.2.2 magrittr_1.5 stringi_1.1.5 testthat_1.0.2
[13] xml2_1.1.1 Matrix_1.2-10 devtools_1.13.1 tools_3.4.1
[17] stringr_1.2.0 hunspell_2.5 compiler_3.4.1 spelling_1.0
[21] memoise_1.1.0
Hi! :)
I am using a GitHub action from r-lib to run covr on my code. Since I added a spell_check_test (using usethis::use_spell_check), I get a warning in the covr GitHub action log:
files differ in number of lines:
6,8c6
< Warning message:
< In spelling::spell_check_test(error = TRUE) :
< Failed to find package source directory
I get no warnings when I run covr locally. It also runs without a warning in the R CMD check GitHub action.
Example commit: https://github.com/and3k/dtutils/runs/564517180 (see "Test coverage" step)
Thanks!
Bela
If I have words with accented characters, e.g. moiré (as in "moiré vibrations"), then running
spelling::spell_check_files(path = list.files(pattern = ".Rmd"),
  lang = 'en-US')
marks this as a spelling error: moirΓ
.
Any possibility of avoiding this?
Occasionally we have LaTeX in md/Rmd files - I know we shouldn't but sometimes it just happens ;)
I think adding a format = "latex" to
Line 104 in fc619ee
should handle this case and not have any unwanted side effects.
When using references in RMarkdown, it looks like they count as spelling errors (also, keys and values in the YAML header show up as errors, but these are pretty easy to filter out and might even be necessary if the title field needs checking). I'd be happy to make a PR on this, but not sure what the approach should be. Maybe a user-defined filter function that can exclude certain regexes from being checked?
biblio_file <- tempfile(fileext = ".bib")
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(
paste(
"@article{dunnington16,",
" title = {A geochemical perspective on the impact of development at {Alta} {Lake}, {British} {Columbia}, {Canada}},",
" volume = {56},",
" doi = {10.1007/s10933-016-9919-x},",
" number = {4},",
" journal = {Journal of Paleolimnology},",
" author = {Dunnington, Dewey W. and Spooner, Ian S. and White, Chris E. and Cornett, R. Jack and Williamson, Dave and Nelson, Mike},",
" month = nov,",
" year = {2016},",
" pages = {315-330},",
"}",
sep = "\n"
),
biblio_file
)
writeLines(
paste(
"---",
"output: word_document",
sprintf("bibliography: %s", biblio_file),
"---",
"",
"Everything @dunnington16 says is obviously correct",
"",
"Lakes are fantastic [@dunnington16]",
"",
"This Dunnington fellow really has things figured out [-@dunnington16]",
sep = "\n"
),
rmd_file
)
cat(paste(readLines(rmd_file), collapse = "\n"))
#> ---
#> output: word_document
#> bibliography: /var/folders/bq/2rcjstv90nx1_wrt8d3gqw6m0000gn/T//Rtmp95OnIv/file3d521f8518fb.bib
#> ---
#>
#> Everything @dunnington16 says is obviously correct
#>
#> Lakes are fantastic [@dunnington16]
#>
#> This Dunnington fellow really has things figured out [-@dunnington16]
spelling::spell_check_files(rmd_file)
#> WORD FOUND IN
#> bq file3d525b2b16e7.Rmd:2
#> dunnington file3d525b2b16e7.Rmd:6,8,10
#> Dunnington file3d525b2b16e7.Rmd:10
#> fb file3d525b2b16e7.Rmd:2
#> gn file3d525b2b16e7.Rmd:2
#> gqw file3d525b2b16e7.Rmd:2
#> nx file3d525b2b16e7.Rmd:2
#> OnIv file3d525b2b16e7.Rmd:2
#> rcjstv file3d525b2b16e7.Rmd:2
#> Rtmp file3d525b2b16e7.Rmd:2
It would be great if there was a way to add a vector of additional (package root relative) file paths as a parameter to spell_check_package() and to spell_check_test(). It seems like this one general change would avoid a bunch of specific requests. It would also support rare use cases, including some specific issues I have :)
Line 33 in 7f5e3f6
will return the author@R term if author does not exist, due to partial matching of $. This should be changed to pkg[["author"]] and additional code added to parse the author@R separately.
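The partial matching described here is easy to reproduce with a plain list. The sketch below uses an invented pkg list with the field names from the report; it is not the actual object built inside spelling.

```r
# Invented stand-in for the parsed DESCRIPTION; only "author@R" is present.
pkg <- list("author@R" = "person('Jane', 'Doe')", title = "Demo")

# `$` matches partially on lists, so this silently returns the "author@R" entry.
partial <- pkg$author

# `[[` matches exactly by default, so a missing "author" field yields NULL.
exact <- pkg[["author"]]

print(partial)
print(is.null(exact))
```

Setting options(warnPartialMatchDollar = TRUE) during development makes R warn whenever `$` falls back to a partial match, which can help catch this class of bug.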
I suggest including the README in spell_check_package(). Another option would be to find all Rmd or md files in the package, including ones outside vignettes/, such as ones that may be in inst.
We created a test running:
spell_check_package(pkg, vignettes = TRUE, use_wordlist = TRUE)
When this runs in R CMD check, it fails on words which are obviously in the WORDLIST. This is the case for our NEWS.md and DESCRIPTION file, but not for .Rd files.
We noticed that upon creating inst/inst/WORDLIST this error disappears. Can you make the location of the wordlist flexible in the call of spell_check_package? We can insert an if clause to check the mode (test / R CMD check) on our own.
Thanks
I have the following which triggers the spelling check errors:
#' @references
#' \itemize{
#' \item Yan, Xin, and Xiao Gang Su. 2010. “Stratified Wilson and Newcombe Confidence Intervals for Multiple Binomial Proportions.” Statistics in Biopharmaceutical Research 2 (3): 329–35.
#' }
My solution at the moment is just to wrap the reference in backticks:
#' @references
#' \itemize{
#' \item `Yan, Xin, and Xiao Gang Su. 2010. “Stratified Wilson and Newcombe Confidence Intervals for Multiple Binomial Proportions.” Statistics in Biopharmaceutical Research 2 (3): 329–35.`
#' }
The only function to update the wordlist is update_wordlist and it only works on a package.
My use-case is a blog-post and there I only have one file. Would it be possible to add a similar function just for files?
spell_check_package() checks NEWS.md but not plain-text NEWS files. Please consider adding that. It's probably as simple as adding NEWS to the list of files recognized.
How can I download a new dictionary to use another language?
I'm happy to help writing the documentation for this (or even a helper function?).