jmbarbone commented on May 24, 2024

I have the same issue; I ran into it with ordinal indicators such as "1st". It looks like the problem is in the hunspell parser, which drops the digits and splits these words apart:

hunspell::hunspell_parse(c("1st", "RNA-seq", "EIF4G1"))
#> [[1]]
#> [1] "st"
#> 
#> [[2]]
#> [1] "RNA" "seq"
#> 
#> [[3]]
#> [1] "EIF" "G"

Created on 2021-02-06 by the reprex package (v0.3.0)

jmbarbone commented on May 24, 2024

Implementing a pre-filter right before the hunspell_parse() call here could work:

spelling/R/check-files.R, lines 118 to 123 in a2b5f29:

spell_check_file_plain <- function(path, format, dict){
  lines <- readLines(path, warn = FALSE, encoding = 'UTF-8')
  words <- hunspell::hunspell_parse(lines, format = format, dict = dict)
  text <- vapply(words, paste, character(1), collapse = " ")
  spell_check_plain(text, dict = dict)
}

It feels more like a quick fix, though, because it splits each line with strsplit() and then paste()s the words back together before the text is sent to the actual parser.
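
For comparison, here is a hypothetical alternative that avoids the split/paste round trip by removing ignored words in place with a single Perl-style regular expression. The name remove_ignored_words() is made up; it is not part of spelling or hunspell. It is only a sketch under the same assumption as the strsplit() approach, namely that words are delimited by whitespace, and it escapes regex metacharacters so entries such as "C++" are matched literally.

```r
# Hypothetical helper (not part of spelling or hunspell): drop whole-word
# occurrences of `ignore` from each line in place, without strsplit()/paste().
remove_ignored_words <- function(lines, ignore = character()) {
  if (length(ignore) == 0) return(lines)
  # Escape regex metacharacters so each ignore entry is matched literally
  escaped <- gsub("([.|()\\^{}+$*?\\[\\]\\\\])", "\\\\\\1", ignore, perl = TRUE)
  # A word is delimited by the start/end of the line or by whitespace;
  # the lookahead does not consume the trailing space, so the next word
  # can still match, and "\\1" puts back the leading space (if any)
  pattern <- paste0("(^|\\s)(?:", paste(escaped, collapse = "|"), ")(?=\\s|$)")
  gsub(pattern, "\\1", lines, perl = TRUE)
}

lines <- c(
  "This is the 1st line.",
  "RNA-seq is used, but not RNAseq",
  "EIF4G1 but not EIF4G1fdsadf is used"
)
remove_ignored_words(lines, c("1st", "RNA-seq", "EIF4G1"))
```

Removing a word leaves the neighbouring whitespace behind (the first sample line becomes "This is the  line." with two spaces), which should be harmless for tokenization. Like the strsplit() version, it only matches whitespace-delimited words, so "EIF4G1fdsadf" is left alone.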

ignore_words <- c("1st", "RNA-seq", "EIF4G1")

lines <- c(
  "This is the 1st line.  It has first written in it.",
  "The second has RNA-seq inside. But does not use RNAseq -- without the '-'",
  "EIF4G1 but not EIF4G1fdsadf is used",
  "This line's words are fine!"
)

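pre_filter_plain <- function(lines, ignore = character()) {
  # Split on any character that is neither alphanumeric nor punctuation
  # (in practice, whitespace), so "RNA-seq" and "EIF4G1" stay whole
  word_list <- strsplit(lines, "([^-[:alnum:][:punct:]])")
  
  # Drop tokens that exactly match an ignored word, then rejoin with spaces
  vapply(
    word_list,
    function(i) {
      paste(i[!i %in% ignore], collapse = " ")
    },
    character(1)
  )
}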

pre_filter_plain(lines, ignore_words)
#> [1] "This is the line.  It has first written in it."                   
#> [2] "The second has inside. But does not use RNAseq -- without the '-'"
#> [3] "but not EIF4G1fdsadf is used"                                     
#> [4] "This line's words are fine!"

Created on 2021-02-06 by the reprex package (v0.3.0)
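
One limitation of pre_filter_plain() as written: punctuation counts as part of a token, so a word followed by a comma or period (e.g. "RNA-seq," or "EIF4G1.") does not exactly match its ignore entry and is not filtered. A minimal sketch of a hypothetical variant, pre_filter_plain_punct() (the name is made up), trims leading and trailing punctuation before the comparison while leaving interior hyphens alone:

```r
# Hypothetical variant of pre_filter_plain(): compare each token against
# `ignore` after trimming leading/trailing punctuation, so "EIF4G1." and
# "RNA-seq," are still filtered. Interior hyphens are left untouched.
pre_filter_plain_punct <- function(lines, ignore = character()) {
  # Same split as pre_filter_plain(): break on anything that is neither
  # alphanumeric nor punctuation (in practice, whitespace)
  word_list <- strsplit(lines, "[^[:alnum:][:punct:]]")

  vapply(
    word_list,
    function(words) {
      bare <- gsub("^[[:punct:]]+|[[:punct:]]+$", "", words)
      # Drop the whole token (including its punctuation) when it matches
      paste(words[!bare %in% ignore], collapse = " ")
    },
    character(1)
  )
}

pre_filter_plain_punct(
  c("We used RNA-seq, not EIF4G1.", "Fine."),
  ignore = c("RNA-seq", "EIF4G1")
)
#> [1] "We used not" "Fine."
```

Dropping the whole token also drops its trailing comma or period; that should be fine for spell checking, since the filtered text is only passed on to the parser and never displayed.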