Giter Site home page Giter Site logo

Comments (17)

eddelbuettel avatar eddelbuettel commented on May 31, 2024 2

Version 0.0.16 is in incoming at CRAN and polishes all this. It is still on hold because the remaining UBSAN issue is addresses. Thanks again to everybody for pitching in, this was pretty fun and effective.

from rcppannoy.

jlmelville avatar jlmelville commented on May 31, 2024 2

It's a thing of beauty. I may have stared at that clean check page with a sense of well-being for longer than I care to admit.

from rcppannoy.

eddelbuettel avatar eddelbuettel commented on May 31, 2024 1

(This is now taken care of in master thanks to very nice flurry of activity by @erikbern @jlmelville @LTLA and myself. We should have 0.0.16 on CRAN by end of week; I will close it then. If anybody needs a tarball of 0.0.15.2 in a drat-able repo let me know, else just fetch master.)

from rcppannoy.

erikbern avatar erikbern commented on May 31, 2024

odd. do you know what line in annoy this is caused on

from rcppannoy.

jlmelville avatar jlmelville commented on May 31, 2024

FWIW, this is the same UBSAN pattern that was triggered by the last version's clang-UBSAN checks
(currently at https://www.stats.ox.ac.uk/pub/bdr/memtests/clang-UBSAN/RcppAnnoy/RcppAnnoy-Ex.Rout) and is triggered by:

a$getNNsByItemList(0, 5, -1, TRUE)

which in turn calls:

Rcpp::List getNNsByItemList(S item, size_t n,
                                size_t search_k, bool include_distances) {
// ... //
    std::vector<S> result;
    std::vector<T> distances;
    ptr->get_nns_by_item(item, n, search_k, &result, &distances);
}

but from the look of the message it may not be complaining about anything in Annoy here: I think it might be that search_k = -1, but is being cast to a size_t (aka an unsigned long on this platform)?

from rcppannoy.

erikbern avatar erikbern commented on May 31, 2024

ok doesn't look like anything in core annoy then

from rcppannoy.

erikbern avatar erikbern commented on May 31, 2024

it probably is a bug – it sort of works because -1 ends up being interpreted as "search exhaustively", but i'm not sure if that's a bug or a feature honestly.

from rcppannoy.

LTLA avatar LTLA commented on May 31, 2024

runtime error: -1 is outside the range of representable values of type 'unsigned long'

Woah. Seriously? I thought assigning -1 to an unsigned int was a standard way of getting the maximum representable value. Maybe UBSAN wants us to be more explicit about it with a static_cast?

from rcppannoy.

eddelbuettel avatar eddelbuettel commented on May 31, 2024

The code is in the examples section of the manual page. So we could avoid. So somewhat naive question for @erikbern: are there better ways to do this?

# Retrieve 5 nearest neighbors to item 0
# search_k = -1 will invoke default search_k value of n_trees * n
# Return results as list with an element for distance
a$getNNsByItemList(0, 5, -1, TRUE)

# Retrieve 5 nearest neighbors to item 0
# search_k = -1 will invoke default search_k value of n_trees * n
# Return results as list without an element for distance
a$getNNsByItemList(0, 5, -1, FALSE)

# [....]

# Retrieve 5 nearest neighbors to vector v
# search_k = -1 will invoke default search_k value of n_trees * n
# Return results as list with an element for distance
a$getNNsByVectorList(v, 5, -1, TRUE)

# Retrieve 5 nearest neighbors to vector v
# search_k = -1 will invoke default search_k value of n_trees * n
# Return results as list with an element for distance
a$getNNsByVectorList(v, 5, -1, TRUE)

Should we / could use NA, then pick up the NA and replace it with, say, std::numeric_limits<uint64_t>::max() ?

from rcppannoy.

jlmelville avatar jlmelville commented on May 31, 2024

What about adding an overload to the likes of getNNsByItemList which doesn't have the search_k parameter and which means "always do exhaustive search"? The overload can then forward to the original with a static_cast<size_t>(-1), and the examples can be updated to use the new method, e.g.:

a$getNNsByItemList(0, 5, FALSE)
# instead of
a$getNNsByItemList(0, 5, -1, FALSE)

The downsides I see are extra documentation and having to write out the exact method pointer types to disambiguate the overload in the RCPP_MODULE, which leads to horrible stuff like:

Rcpp::List (AnnoyAngular::*getNNsByItemList_1)(int32_t, size_t, size_t, bool) = &AnnoyAngular::getNNsByItemList;
Rcpp::List (AnnoyAngular::*getNNsByItemList_2)(int32_t, size_t, bool) = &AnnoyAngular::getNNsByItemList;

This compiled and seemed to work when I tested it, so if there is interest in this approach I can make a PR. But I should note that I haven't actually tested that it fixes the UBSAN issues.

from rcppannoy.

LTLA avatar LTLA commented on May 31, 2024

Before embarking on that, perhaps the most expedient solution would be to replace size_t with int for the R-facing interface of the affected method:

rcppannoy/src/annoy.cpp

Lines 77 to 78 in 68fe172

Rcpp::List getNNsByItemList(S item, size_t n,
size_t search_k, bool include_distances) {

and then handling the coercion inside getNNsByItemList. Either by a simple static_cast<size_t>(search_k) or by setting it to the appropriate numeric_limits if search_k is negative.

I mean, it doesn't seem that the arguments to getNNsByItemList need to be size_t.

from rcppannoy.

eddelbuettel avatar eddelbuettel commented on May 31, 2024

Yep. And the could carry an NA as an int representation which we could the map to a special value (UINT_MAX?) and move on. The key really is not loose functionality, but to loose the SAN error.

from rcppannoy.

eddelbuettel avatar eddelbuettel commented on May 31, 2024

Still in Copenhagen at the conference, but yes, the key probably is to replace size_t (aka uint64_r) with int(or maybedouble` if we are concerned about domain range).

@erikbern Code below. Basically size_t n etc in the signatures is what I would need to change (to something R support, ie int or double). Any reason not to just use int ?

    std::vector<S> getNNsByItem(S item, size_t n) {
        std::vector<S> result;
        ptr->get_nns_by_item(item, n, -1, &result, NULL);
        return result;
    }

    Rcpp::List getNNsByItemList(S item, size_t n,
                                size_t search_k, bool include_distances) {
        if (include_distances) {
            std::vector<S> result;
            std::vector<T> distances;
            ptr->get_nns_by_item(item, n, search_k, &result, &distances);
            return Rcpp::List::create(Rcpp::Named("item")     = result,
                                      Rcpp::Named("distance") = distances);
        } else {
            std::vector<S> result;
            ptr->get_nns_by_item(item, n, search_k, &result, NULL);
            return Rcpp::List::create(Rcpp::Named("item") = result);
        }
    }

    std::vector<S> getNNsByVector(std::vector<double> dv, size_t n) {
        std::vector<T> fv(dv.size());
        std::copy(dv.begin(), dv.end(), fv.begin());
        std::vector<S> result;
        ptr->get_nns_by_vector(&fv[0], n, -1, &result, NULL);
        return result;
    }

    Rcpp::List getNNsByVectorList(std::vector<T> fv, size_t n,
                                  size_t search_k, bool include_distances) {
        if (fv.size() != vectorsz) {
            Rcpp::stop("fv.size() != vector_size");
        }
        if (include_distances) {
            std::vector<S> result;
            std::vector<T> distances;
            ptr->get_nns_by_vector(&fv[0], n, search_k, &result, &distances);
            return Rcpp::List::create(
                Rcpp::Named("item") = result,
                Rcpp::Named("distance") = distances);
        } else {
            std::vector<S> result;
            ptr->get_nns_by_vector(&fv[0], n, search_k, &result, NULL);
            return Rcpp::List::create(Rcpp::Named("item") = result);
        }
    }

from rcppannoy.

erikbern avatar erikbern commented on May 31, 2024

i agree it should just be int if we're using -1 as a special value. i can fix

from rcppannoy.

eddelbuettel avatar eddelbuettel commented on May 31, 2024

Yeah, I probably just followed your interface and was not thinking hard enough about the limitations I would get from a size_t in the interface.

Just tried the Rocker container for r-devel-ubsan-clang but I could not yet reproduce the issue CRAN saw so the ubsan config I encoded in the Rocker (== Docker for R) container must be slightly different. But if we just switch to int it should be better. But I can't really test it pre-upload.

from rcppannoy.

eddelbuettel avatar eddelbuettel commented on May 31, 2024

Took a few days for all builders to catch up but we do have the anticipated clean bill of health. (The remaining NOTE is something we cannot do anything, and that similarly nobody cares too much about.) Thanks again to everybody for pitching in.

image

from rcppannoy.

eddelbuettel avatar eddelbuettel commented on May 31, 2024

(There is also the foghorn package that does that for all your packages. I find it a little heavy, so leaning on a few some hints and code by @brodieG I cooked up a helper function in my rag-tag bag of small functions. Credits for this really mostly to @brodieG though.)

R> dang::checkCRANStatus("[email protected]")
               Package ERROR WARN NOTE OK
1              anytime     2           11
2          AsioHeaders                 13
3                   BH               8  5
4                 binb                 13
5                 dang                 13
6               digest                 13
7                 drat                 13
8           gaussfacts                 13
9                 gcbd          3    6   
10               gettz               2 11
11            gunsales          2      11
12              inline                 13
13                linl                 13
14             littler                  6
15            nanotime                 13
16                pinp                 13
17           pkgKitten                 13
18                prrd                 13
19              random                 13
20        RApiDatetime                 13
21       RApiSerialize                 13
22             Rblpapi              12   
23                Rcpp     1         8  4
24           RcppAnnoy               3 10
25             RcppAPT     3            7
26       RcppArmadillo               8  5
27             RcppBDT               4  9
28            RcppCCTZ                 13
29         RcppClassic          1    1 11
30 RcppClassicExamples                 13
31            RcppCNPy          1    2 10
32              RcppDE                 13
33           RcppEigen               8  5
34        RcppExamples                 13
35         RcppGetconf               2  7
36             RcppGSL                 13
37         RcppMsgPack               8  5
38    RcppNLoptExample                 13
39      RcppQuantuccia                 13
40           RcppRedis                 12
41        RcppSimdJson     1         3  5
42             RcppSMC              11  2
43         RcppStreams               2 11
44            RcppTOML               2 11
45             RcppXts              13   
46        RcppZiggurat               2 11
47          RDieHarder          6       6
48              rfoaas                 13
49             RInside          2   11   
50             rmsfact                 13
51           RProtoBuf          1    6  5
52         RPushbullet                 13
53           RQuantLib               6  4
54       RVowpalWabbit     3         5   
55          sanitizers     1         2 10
56                tint          1    2 10
57                ttdo                 13
58           x13binary              12  1
Errors/Warnings Present
https://cran.r-project.org/web/checks/check_results_edd_at_debian.org.html

R> 

And yes, I have too many packages, and yes, CRAN still throws too many stooopid errors.

from rcppannoy.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.