Comments (17)
Version 0.0.16 is in incoming at CRAN and polishes all this. It is still on hold because the remaining UBSAN issue is addresses. Thanks again to everybody for pitching in, this was pretty fun and effective.
from rcppannoy.
It's a thing of beauty. I may have stared at that clean check page with a sense of well-being for longer than I care to admit.
from rcppannoy.
(This is now taken care of in master thanks to very nice flurry of activity by @erikbern @jlmelville @LTLA and myself. We should have 0.0.16 on CRAN by end of week; I will close it then. If anybody needs a tarball of 0.0.15.2 in a drat-able repo let me know, else just fetch master.)
from rcppannoy.
odd. do you know what line in annoy this is caused on
from rcppannoy.
FWIW, this is the same UBSAN pattern that was triggered by the last version's clang-UBSAN checks
(currently at https://www.stats.ox.ac.uk/pub/bdr/memtests/clang-UBSAN/RcppAnnoy/RcppAnnoy-Ex.Rout) and is triggered by:
a$getNNsByItemList(0, 5, -1, TRUE)
which in turn calls:
Rcpp::List getNNsByItemList(S item, size_t n,
size_t search_k, bool include_distances) {
// ... //
std::vector<S> result;
std::vector<T> distances;
ptr->get_nns_by_item(item, n, search_k, &result, &distances);
}
but from the look of the message it may not be complaining about anything in Annoy here: I think it might be that search_k = -1
, but is being cast to a size_t
(aka an unsigned long
on this platform)?
from rcppannoy.
ok doesn't look like anything in core annoy then
from rcppannoy.
it probably is a bug βΒ it sort of works because -1 ends up being interpreted as "search exhaustively", but i'm not sure if that's a bug or a feature honestly.
from rcppannoy.
runtime error: -1 is outside the range of representable values of type 'unsigned long'
Woah. Seriously? I thought assigning -1 to an unsigned int was a standard way of getting the maximum representable value. Maybe UBSAN wants us to be more explicit about it with a static_cast
?
from rcppannoy.
The code is in the examples section of the manual page. So we could avoid. So somewhat naive question for @erikbern: are there better ways to do this?
# Retrieve 5 nearest neighbors to item 0
# search_k = -1 will invoke default search_k value of n_trees * n
# Return results as list with an element for distance
a$getNNsByItemList(0, 5, -1, TRUE)
# Retrieve 5 nearest neighbors to item 0
# search_k = -1 will invoke default search_k value of n_trees * n
# Return results as list without an element for distance
a$getNNsByItemList(0, 5, -1, FALSE)
# [....]
# Retrieve 5 nearest neighbors to vector v
# search_k = -1 will invoke default search_k value of n_trees * n
# Return results as list with an element for distance
a$getNNsByVectorList(v, 5, -1, TRUE)
# Retrieve 5 nearest neighbors to vector v
# search_k = -1 will invoke default search_k value of n_trees * n
# Return results as list with an element for distance
a$getNNsByVectorList(v, 5, -1, TRUE)
Should we / could use NA
, then pick up the NA
and replace it with, say, std::numeric_limits<uint64_t>::max()
?
from rcppannoy.
What about adding an overload to the likes of getNNsByItemList
which doesn't have the search_k
parameter and which means "always do exhaustive search"? The overload can then forward to the original with a static_cast<size_t>(-1)
, and the examples can be updated to use the new method, e.g.:
a$getNNsByItemList(0, 5, FALSE)
# instead of
a$getNNsByItemList(0, 5, -1, FALSE)
The downsides I see are extra documentation and having to write out the exact method pointer types to disambiguate the overload in the RCPP_MODULE
, which leads to horrible stuff like:
Rcpp::List (AnnoyAngular::*getNNsByItemList_1)(int32_t, size_t, size_t, bool) = &AnnoyAngular::getNNsByItemList;
Rcpp::List (AnnoyAngular::*getNNsByItemList_2)(int32_t, size_t, bool) = &AnnoyAngular::getNNsByItemList;
This compiled and seemed to work when I tested it, so if there is interest in this approach I can make a PR. But I should note that I haven't actually tested that it fixes the UBSAN issues.
from rcppannoy.
Before embarking on that, perhaps the most expedient solution would be to replace size_t
with int
for the R-facing interface of the affected method:
Lines 77 to 78 in 68fe172
and then handling the coercion inside getNNsByItemList
. Either by a simple static_cast<size_t>(search_k)
or by setting it to the appropriate numeric_limits
if search_k
is negative.
I mean, it doesn't seem that the arguments to getNNsByItemList
need to be size_t
.
from rcppannoy.
Yep. And the could carry an NA as an int representation which we could the map to a special value (UINT_MAX?) and move on. The key really is not loose functionality, but to loose the SAN error.
from rcppannoy.
Still in Copenhagen at the conference, but yes, the key probably is to replace size_t
(aka uint64_r) with
int(or maybe
double` if we are concerned about domain range).
@erikbern Code below. Basically size_t n
etc in the signatures is what I would need to change (to something R support, ie int
or double
). Any reason not to just use int
?
std::vector<S> getNNsByItem(S item, size_t n) {
std::vector<S> result;
ptr->get_nns_by_item(item, n, -1, &result, NULL);
return result;
}
Rcpp::List getNNsByItemList(S item, size_t n,
size_t search_k, bool include_distances) {
if (include_distances) {
std::vector<S> result;
std::vector<T> distances;
ptr->get_nns_by_item(item, n, search_k, &result, &distances);
return Rcpp::List::create(Rcpp::Named("item") = result,
Rcpp::Named("distance") = distances);
} else {
std::vector<S> result;
ptr->get_nns_by_item(item, n, search_k, &result, NULL);
return Rcpp::List::create(Rcpp::Named("item") = result);
}
}
std::vector<S> getNNsByVector(std::vector<double> dv, size_t n) {
std::vector<T> fv(dv.size());
std::copy(dv.begin(), dv.end(), fv.begin());
std::vector<S> result;
ptr->get_nns_by_vector(&fv[0], n, -1, &result, NULL);
return result;
}
Rcpp::List getNNsByVectorList(std::vector<T> fv, size_t n,
size_t search_k, bool include_distances) {
if (fv.size() != vectorsz) {
Rcpp::stop("fv.size() != vector_size");
}
if (include_distances) {
std::vector<S> result;
std::vector<T> distances;
ptr->get_nns_by_vector(&fv[0], n, search_k, &result, &distances);
return Rcpp::List::create(
Rcpp::Named("item") = result,
Rcpp::Named("distance") = distances);
} else {
std::vector<S> result;
ptr->get_nns_by_vector(&fv[0], n, search_k, &result, NULL);
return Rcpp::List::create(Rcpp::Named("item") = result);
}
}
from rcppannoy.
i agree it should just be int if we're using -1 as a special value. i can fix
from rcppannoy.
Yeah, I probably just followed your interface and was not thinking hard enough about the limitations I would get from a size_t
in the interface.
Just tried the Rocker container for r-devel-ubsan-clang
but I could not yet reproduce the issue CRAN saw so the ubsan config I encoded in the Rocker (== Docker for R) container must be slightly different. But if we just switch to int
it should be better. But I can't really test it pre-upload.
from rcppannoy.
Took a few days for all builders to catch up but we do have the anticipated clean bill of health. (The remaining NOTE is something we cannot do anything, and that similarly nobody cares too much about.) Thanks again to everybody for pitching in.
from rcppannoy.
(There is also the foghorn
package that does that for all your packages. I find it a little heavy, so leaning on a few some hints and code by @brodieG I cooked up a helper function in my rag-tag bag of small functions. Credits for this really mostly to @brodieG though.)
R> dang::checkCRANStatus("[email protected]")
Package ERROR WARN NOTE OK
1 anytime 2 11
2 AsioHeaders 13
3 BH 8 5
4 binb 13
5 dang 13
6 digest 13
7 drat 13
8 gaussfacts 13
9 gcbd 3 6
10 gettz 2 11
11 gunsales 2 11
12 inline 13
13 linl 13
14 littler 6
15 nanotime 13
16 pinp 13
17 pkgKitten 13
18 prrd 13
19 random 13
20 RApiDatetime 13
21 RApiSerialize 13
22 Rblpapi 12
23 Rcpp 1 8 4
24 RcppAnnoy 3 10
25 RcppAPT 3 7
26 RcppArmadillo 8 5
27 RcppBDT 4 9
28 RcppCCTZ 13
29 RcppClassic 1 1 11
30 RcppClassicExamples 13
31 RcppCNPy 1 2 10
32 RcppDE 13
33 RcppEigen 8 5
34 RcppExamples 13
35 RcppGetconf 2 7
36 RcppGSL 13
37 RcppMsgPack 8 5
38 RcppNLoptExample 13
39 RcppQuantuccia 13
40 RcppRedis 12
41 RcppSimdJson 1 3 5
42 RcppSMC 11 2
43 RcppStreams 2 11
44 RcppTOML 2 11
45 RcppXts 13
46 RcppZiggurat 2 11
47 RDieHarder 6 6
48 rfoaas 13
49 RInside 2 11
50 rmsfact 13
51 RProtoBuf 1 6 5
52 RPushbullet 13
53 RQuantLib 6 4
54 RVowpalWabbit 3 5
55 sanitizers 1 2 10
56 tint 1 2 10
57 ttdo 13
58 x13binary 12 1
Errors/Warnings Present
https://cran.r-project.org/web/checks/check_results_edd_at_debian.org.html
R>
And yes, I have too many packages, and yes, CRAN still throws too many stooopid errors.
from rcppannoy.
Related Issues (20)
- Vignette build on windows errors HOT 17
- Warning during compilation with Ropen HOT 5
- Expose `set_seed` method HOT 4
- More granular documentation HOT 5
- error installing rcppannoy on Mac 10.9.5, R 3.4.4 HOT 3
- Error in loading RcppAnnoy HOT 2
- RcppAnnoy fails to compile HOT 4
- Error during compilation of RcppAnnoy HOT 9
- rcppannoy loading/installation error HOT 6
- Version 0.0.15 broke BiocNeighbors HOT 7
- 0.0.16 build from source issues on macOS HOT 5
- Vignette update HOT 9
- More content for RcppAnnoy.h HOT 4
- Unable to Save HOT 3
- Pass the whole matrix to addItem and getNNs function HOT 6
- Multithreading HOT 7
- Any plan for API like RcppHSNW or RANN? HOT 8
- R CMD build issue on Windows with R 4.2 HOT 9
- Run reverse depends checks HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
π Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. πππ
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google β€οΈ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from rcppannoy.