s-u / fastmatch
Fast hashing functions and replacement of match()
As the title says: fmatch('somevalue', c())
gives me:
Error in fmatch("somevalue", c()) :
unable to allocate 67108864.00Mb for a hash table
(I've also seen it with 33554432.00Mb; either way, it's trying to allocate a nearly infinite amount of memory.)
It does work fine with regular length-0 vectors (e.g. giving character(0) instead of c()).
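A minimal sketch of a possible guard for the case above, assuming the only goal is to mirror base::match, which returns nomatch for every element of x when table is NULL. The helper name safe_fmatch is hypothetical and not part of fastmatch:

```r
# Hypothetical helper (not part of fastmatch): treat a NULL table the way
# base::match does, and hand everything else to fastmatch::fmatch.
safe_fmatch <- function(x, table, nomatch = NA_integer_) {
  if (is.null(table)) {
    # base::match(x, NULL) yields `nomatch` once per element of x
    return(rep_len(as.integer(nomatch), length(x)))
  }
  fastmatch::fmatch(x, table, nomatch = nomatch)
}

# c() evaluates to NULL, so this never reaches the hash-table allocation
res <- safe_fmatch("somevalue", c())
print(res)  # NA
```

The guard only covers NULL; per the report, zero-length vectors such as character(0) already work when passed directly.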
Hello.
The reason I use base's %in% or match is that they make it easy to match values regardless of whether they are numbers, characters or even NAs.
match(c(1,0,NA),NA) gives NA, NA, 1
c(1,0,NA) %in% NA gives FALSE, FALSE, TRUE
I especially like the latter.
But
fmatch(c(1,0,NA),NA) gives NA, NA, NA (useless)
If fastmatch is supposed to improve on base::match, I think it should mimic its behaviour with NA.
A question on Stack Overflow asked about a faster version of %in%, analogous to fastmatch::fmatch for base::match. Since %in% is basically a simple wrapper around match, it would be an easy addition... unless I'm overlooking something.
For example:
`%fin%` <- function(x, table) {
  fmatch(x, table, nomatch = 0L) > 0L
}
I used fmatch.hash to attach a hash table to my objects, but this gets problematic when my table is of length 0 (but not NULL, see issue #8): the returned value is NA_integer_ (regardless of the value of x and the class of table).
This is a problem, both because the length of my table is now different and because it now matches NAs.
I think the following shows the problems:
subtable <- data.frame(value=fmatch.hash("myval", bigtable[somecondition]),
                       bigtablerows=which(somecondition))
This works fine if some value of somecondition is TRUE, but not when it's all FALSE.
And:
> match(NA, vector[c(F,F,F,F)])
[1] NA
> fmatch(NA, vector[c(F,F,F,F)])
[1] NA
> temp <- fmatch.hash('someval', vector[c(F,F,F,F)])
> fmatch(NA, temp)
[1] 1
I'd expect the final result to always be the same as when calling (f)match directly.
Details about my setup:
Reproduced with both fastmatch 1.1-0 and 1.1-1
R 3.5.3 and 3.6.1
Under Rstudio 1.1.463 and 1.2.1335, and command prompt with R --vanilla
Windows 10
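One possible stopgap for the zero-length case described above, sketched on the assumption (taken from the transcript in this report) that plain fmatch on an unhashed empty table behaves like match. The helper name hash_if_nonempty is hypothetical and not part of fastmatch:

```r
# Hypothetical helper (not part of fastmatch): only attach a hash table
# when there is something to hash; return empty tables untouched.
hash_if_nonempty <- function(x, table) {
  if (length(table) == 0L) {
    # Assumption from the report: fmatch() on an unhashed empty table
    # already returns the same result as match(), so leave it as-is.
    return(table)
  }
  fastmatch::fmatch.hash(x, table)
}

# An all-FALSE subset gives a zero-length table, which is passed through
empty_tbl <- hash_if_nonempty("someval", c("a", "b", "c", "d")[c(FALSE, FALSE, FALSE, FALSE)])
print(length(empty_tbl))  # 0
```

This keeps the length of the table unchanged at 0, so it can still be combined with which(somecondition) in a data.frame.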
Thank you very much for fastmatch. I'm using it in a package but I've been notified by CRAN of an rcnst issue that seems to be associated with fastmatch. I posted in the R-package-dev mailing list seeking advice but haven't received a reply there. The guidelines in the CRAN documentation for rcnst issues suggest getting in contact with the maintainer of an imported package if that package might be the cause; hence this issue. Apologies if I'm mistaken. Any help you could provide would be appreciated.
The package is TeXCheckR and the issue page for the rcnst issue is https://github.com/kalibera/cran-checks/tree/master/rcnst/results/TeXCheckR . The modified constant is c(".", "?"). TeXCheckR does not require compilation; it contains no C/C++ code.
The offending test has the following log:
Space inserted before \footnote
✖ 11: \footnote{\gls{HELP} lending, tuition funding, and most other higher education programs are special appropriations from consolidated government revenue.
ERROR: modification of compiler constant of type character, length 2
ERROR: the modified value of the constant is:
[1] "." "?"
attr(,".match.hash")
<hash table>
ERROR: the original value of the constant is:
[1] "." "?"
ERROR: the modified constant is at index 1
ERROR: the modified constant is in this function body:
c(".", "?")
Fatal error: compiler constants were modified!
I'm using the following line:
if (split_line_after_footnote[footnote_closes_at - 1] %notin% c(".", "?")){
where %notin% is imported from the hutils package and is defined as:
`%notin%` <- function(x, y){
  if (is.null(y)) {
    rep_len(TRUE, length(x))
  } else {
    is.na(fmatch(x, y))
  }
}
I note that in the documentation for fmatch, you remark:
"fmatch modifies the table by attaching an attribute to it. It is expected that the values will not change unless that attribute is dropped. Under normal circumstances this should not have any effect from user's point of view, but there is a theoretical chance of the cache being out of sync with the table in case the table is modified directly (e.g. by some C code) without removing attributes."
TeXCheckR alone does not modify table directly (or at least it has no C code with that intent), so I'm not sure if this part of the documentation is applicable.
Thank you again.
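For comparison, a sketch of the same operator built on base::match, which does not attach a hash attribute to its table. This is only an illustration of a possible workaround for literal tables such as c(".", "?"), not the hutils implementation; the name %notin_base% is hypothetical:

```r
# Hypothetical variant of %notin% using base::match instead of fmatch.
# base::match leaves `y` untouched, so a literal table that the byte-code
# compiler stored as a constant is never modified.
`%notin_base%` <- function(x, y) {
  if (is.null(y)) {
    rep_len(TRUE, length(x))
  } else {
    is.na(match(x, y))
  }
}

tbl <- c(".", "?")
res <- c("!", ".") %notin_base% tbl
print(res)              # TRUE FALSE
print(attributes(tbl))  # NULL: no ".match.hash" attribute was added
```

The trade-off is that base::match rebuilds its lookup on every call, which should matter little for a two-element table.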
Hi
I'm getting bitten by "bugfix: fix crash when a newly unserialized hash table is used (since the table hash is not stored during serialization)."
Would it be possible to release a version of fastmatch that doesn't suffer from this problem please?
Hi Simon
When hand-compiling R-4.2.1 (released just 6 days ago), I hit what could be an issue with R_xlen_t that makes the package fail to build, i.e.:
In file included from fasthash.c:19:
common.h:13:17: error: conflicting types for ‘R_xlen_t’; have ‘R_len_t’ {aka ‘int’}
13 | typedef R_len_t R_xlen_t;
| ^~~~~~~~
In file included from common.h:8,
from fasthash.c:19:
/opt/R-4.2.1/lib/R/include/Rinternals.h:72:23: note: previous declaration of ‘R_xlen_t’ with type ‘R_xlen_t’ {aka ‘long int’}
72 | typedef ptrdiff_t R_xlen_t;
xs <- ceiling(log(c(114.0916, 114.9999)/115)/log(1+1E-6))
match(xs, xs); fastmatch::fmatch(xs, xs)
ys <- as.integer(xs)
match(ys, ys); fastmatch::fmatch(ys, ys)
When FUN = list, ctapply() does not work as expected.
tapply(X = 1:10, INDEX = c(rep(1,5), rep(2,3), rep(3,2)), FUN = list)
$`1`
[1] 1 2 3 4 5

$`2`
[1] 6 7 8

$`3`
[1]  9 10
ctapply(X = 1:10, INDEX = c(rep(1,5), rep(2,3), rep(3,2)), FUN = list)
$`1`
$`1`[[1]]
[1]  9 10

$`2`
$`2`[[1]]
[1]  9 10

$`3`
$`3`[[1]]
[1]  9 10
Hey, I'm trying to use a custom-compiled version of phangorn in webR and I get the following error:
An error occurred: package or namespace load failed for ‘phangorn’ in dyn.load(file, DLLpath = DLLpath, ...):
webr.js?v=2d081c24:2588 unable to load shared object '/usr/lib/R/library/fastmatch/libs/fastmatch.so':
webr.js?v=2d081c24:2588 Could not load dynamic lib: /usr/lib/R/library/fastmatch/libs/fastmatch.so
webr.js?v=2d081c24:2588 LinkError: WebAssembly.Instance(): Import #28 "env" "R_registerRoutines": imported function does not match the expected type