largevis's Introduction

largeVis

This is an implementation of the largeVis algorithm described in (https://arxiv.org/abs/1602.00370). It also incorporates:

  • A very fast algorithm for estimating k-nearest neighbors, implemented in C++ with Rcpp and OpenMP. See the Benchmarks file for performance details.
  • Efficient implementations of the clustering algorithms:
    • HDBSCAN
    • OPTICS
    • DBSCAN
  • Functions for visualizing manifolds like this.

News Highlights

  • Version 0.1.10 re-adds clustering, and also adds momentum training to largeVis, as well as a host of other features and improvements.
  • Version 0.1.9.1 has been accepted by CRAN. Much gratitude to Uwe Ligges and Kurt Hornik for their assistance, advice, and patience.

Some Examples

MNIST

Wiki Words

Clustering With HDBSCAN

Visualize Embeddings

Building Notes

  • Note on R 3.4: Before R 3.4, the CRAN binaries were likely to have been compiled without OpenMP, and getting OpenMP to work on Mac OS X was somewhat tricky. This should all have changed (for the better) with R 3.4, which uses clang 4.0 by default. Since R 3.4 is new, I'm not able to provide advice, but I am interested in hearing about any issues you encounter and any workarounds you discover.

largevis's People

Contributors

bmschmidt, elbamos, jimhester, laurae2

largevis's Issues

Why is largeVis found to be slower than Rtsne?

Hi,

first of all, thank you very much for this implementation of largeVis.
I gave it a try and it worked fine. I had to follow the instructions at http://thecoatlessprofessor.com/programming/rcpp-rcpparmadillo-and-os-x-mavericks-lgfortran-and-lquadmath-error/ for Mac OS X El Capitan to install it first, though.

The dataset that I tested is comprised of ~15k points with 512D. Running it with default parameters and a single thread took about 1200 seconds while running Rtsne() took about 212 seconds.

The clusters looked much tighter and mostly better separated than those from Rtsne().

The longer runtime came as a bit of a surprise after I read

It has been benchmarked at more than 30x faster than Barnes-Hut on datasets of approximately 1-million rows, and scaled linearly as long as there is sufficient RAM.

in your README.

Hence, I am curious where this difference comes from and was wondering if you could maybe provide some clarifications here.

TIA.

Best,

Cedric

largeVis for R version 4.1.2

Dear elbamos.

Your largeVis algorithm and package have been of great help to us so far to reveal patterns in mass spectrometric data on environmental samples. Unfortunately, I am now forced to update from R version 3.3.3 to the latest version 4.1.2, and have thus tried to install_github("elbamos/largeVis"), using the devtools library, Rtools and its dependencies. On a 64bit Windows machine, this eventually leads to the following error (apart from a number of other warnings beforehand, e.g., on the RcppArmadillo package):

Error: package or namespace load failed for 'largeVis':
 .onAttach failed in attachNamespace() for 'largeVis', details:
  call: checkBits()
  error: function 'enterRNGScope' not provided by package 'Rcpp'

Do you have any idea how to solve this? Do you plan to update largeVis to later versions of R? Thanks & regards!

Error in newest R package version

Hi,
I get an error, when I load the package:

> library(largeVis)

Loading required package: Matrix
Error : object ‘opticsXi’ is not exported by 'namespace:dbscan'
Error: package or namespace load failed for ‘largeVis’

my sessionInfo() is below.
When I google it I find this, so somebody else has had this error too. They point to the CRAN page where this error appears too:
https://cran.r-project.org/web/checks/check_results_largeVis.html

Is there an issue with the newest version? Thank you!

R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] Matrix_1.2-6

loaded via a namespace (and not attached):
[1] tools_3.3.1 Rcpp_0.12.7 grid_3.3.1 dbscan_1.0-0 lattice_0.20-33

LargeVis vs tSNE clustering

I have this data frame of 20,000 options with 14 distinct parameters. A standard Rtsne generates relatively well distributed and balanced clusters as follows:
rplot

However, I have been struggling to generate a similar plot with largeVis. I've been playing with K and n_tree but have not quite gotten a plot that comes close to tSNE's.

e.g. v<-largeVis(test, K=200, n_tree=200, distance_method = "Eucledian", threads = 16)

rplot01

I'd appreciate it if you could give me some pointers to start (Stack Overflow was not helpful).

BuildWijMatrix fails: invalid row or column index

R 3.2.5; latest version of largeVis; Ubuntu 12.04

Running largeVis by itself and its step-by-step components doesn't work. I narrowed it down to the BuildWijMatrix step failing, but don't know how to proceed.

sessionInfo()
R version 3.2.5 (2016-04-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu precise (12.04.5 LTS)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] largeVis_0.2 Matrix_1.2-8 Rcpp_0.12.10

loaded via a namespace (and not attached):
[1] colorspace_1.3-2 scales_0.4.1 assertthat_0.1 lazyeval_0.2.0
[5] plyr_1.8.4 tools_3.2.5 gtable_0.2.0 tibble_1.2
[9] ggplot2_2.2.1 grid_3.2.5 munsell_0.4.3 lattice_0.20-34

dim(as.matrix(seurat_sw480@data))
[1] 15843 1691
neighbors <- randomProjectionTreeSearch(as.matrix(seurat_sw480@data), n_trees = 5, max_iter = 1, verbose=T)
Searching for neighbors.
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
**************************************************|
edges <- buildEdgeMatrix(data = as.matrix(seurat_sw480@data), neighbors = neighbors)
gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1124907 60.1 1770749 94.6 1770749 94.6
Vcells 118347552 903.0 196989268 1503.0 187146708 1427.9
rm(neighbors)
wij <- buildWijMatrix(edges)

error: SpMat::SpMat(): invalid row or column index
Error in referenceWij(is, x@i, x@x^2, as.integer(threads), perplexity) :
SpMat::SpMat(): invalid row or column index

randomProjectionTreeSearch fails on division by zero with SparseMatrix

Hello,

There's an error which I cannot trace back. First, I create a sparse data matrix:

data <- read_csv("my_data.csv", col_names = FALSE)
data <- as.matrix(data)
data <- t(data)
data <- Matrix(data, sparse = TRUE)

When I give the sparse matrix to randomProjectionTreeSearch, the R session ends with the following message:

terminate called after throwing an instance of 'std::logic_error'
  what():  element-wise division: division by zero

My data set has 50k samples and 5 dimensions, so I was still able to run it without using a sparse matrix following the guideline in the vignette, but this could bring a memory issue in future (my workstation has only 16GB memory).

Thank you for the excellent package!

Any plans for Python bindings?

Thank you for implementing LargeVis. I have been eager to try the algorithm since the paper was published.

Do you have any plans to add Python bindings for it? Thanks again.

Documentation/examples

Hi,
could you, please, point to some examples about how to make the visualisations shown?

It'd be great to also have installation instructions for non R users.

Thank you,

hdbscan-non-numeric argument to binary operator

library(largeVis)
set.seed(123)
ts_matrix_elec <- elect_data %>% scale() %>% t()
visObject <- largeVis(ts_matrix_elec, n_trees = 50,
K = 10)
plot(t(visObject$coords))

clusters <- hdbscan(visObject, verbose = FALSE) # failed
Error in stats::aggregate(probs, by = list(clusters), FUN = "max")$probs - :
non-numeric argument to binary operator

gplot(clusters, t(visObject$coords))

What happened? Is there any suggestion?

Error when installing largeVis on Ubuntu

Hi,
I have this error when I try to install your package like this install_github("elbamos/largeVis") :

** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
Error: package or namespace load failed for ‘largeVis’:
 .onAttach failed in attachNamespace() for 'largeVis', details:
  call: checkBits()
  error: function 'enterRNGScope' not provided by package 'Rcpp'
Error: loading failed
Execution halted
ERROR: loading failed
* removing ‘/usr/local/lib/R/site-library/largeVis’
Error: Failed to install 'largeVis' from GitHub:
  (converted from warning) installation of package ‘/tmp/RtmpptvoQp/file40d3768bc12b/largeVis_0.2.2.tar.gz’ had non-zero exit status

Gradient computation detail

In gradients.cpp, I am not sure whether the negative gradient computation of the AlphaOneGradient is correct.

For the positive gradient, we have:

L ~ log(1 / (1 + (yi - yj)**2)).

Deriving w.r.t. yi, we get:

- 2 (yi - yj) / (1 + (yi - yj)**2)

This matches your computation of the gradient and the subsequent multModify operation.

For the negative gradient, we have:

L ~ gamma * log((yi - yj)**2 / (1 + (yi - yj)**2)).

Deriving w.r.t. yi, we get:

2 gamma * 1 / [ (yi - yj) * (1 + (yi - yj)**2) ]

This matches your computation of the gradient and the subsequent multModify operation only if we omit the factor 2.
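
For reference, the two derivatives quoted in this issue can be worked through step by step. This is a sketch that follows the issue's own scalar notation, with d introduced here as shorthand for yi - yj; it makes no claim about what gradients.cpp actually computes.

```latex
\begin{align*}
d &= y_i - y_j \\[4pt]
L_{+} &\sim \log\frac{1}{1 + d^2} = -\log\left(1 + d^2\right) \\
\frac{\partial L_{+}}{\partial y_i} &= -\frac{2d}{1 + d^2} \\[4pt]
L_{-} &\sim \gamma \log\frac{d^2}{1 + d^2}
       = \gamma\left[\log d^2 - \log\left(1 + d^2\right)\right] \\
\frac{\partial L_{-}}{\partial y_i}
      &= \gamma\left[\frac{2}{d} - \frac{2d}{1 + d^2}\right]
       = \gamma\,\frac{2\left(1 + d^2\right) - 2d^2}{d\left(1 + d^2\right)}
       = \frac{2\gamma}{d\left(1 + d^2\right)}
\end{align*}
```

Both expressions carry the same factor of 2, so dropping it from only one of them would change the relative weight of the attractive and repulsive terms.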

as.dendrogram.hdbscan not found

When running this example and others, R can't find as.dendrogram.hdbscan:

data(iris)
vis <- largeVis(t(iris[,1:4]), K = 20, sgd_batches = 1)
hdbscanobj <- hdbscan(vis, minPts = 10, K = 5)
plot(as_dendrogram_hdbscan(hdbscanobj))

Error in plot(as_dendrogram_hdbscan(hdbscanobj)) :
could not find function "as_dendrogram_hdbscan"

Issue about the results of function vis()

Firstly, thanks for the nice implementation.
I tried this algorithm on the CIFAR-10 dataset. The problem is that I got different results from the same input. Is there any way to make the results more stable?

meaning of tree failure.

Thank you for this great work.
Some datasets lead to a "Tree failure" exception in the function "copyHeapToMatrix".
What does it mean, and how can I avoid it when preparing the dataset?

**********************************************terminate called after throwing an instance of 'Rcpp::exception'
  what():  Tree failure.
Aborted

randomProjectionTreeSearch gets stuck and never returns

Apologies in advance if this is not a "bug" but just something I am doing wrong.

I have a data set of 423K rows and 225 dimensions. I am running the different largeVis steps separately to debug ("randomProjectionTreeSearch", "buildEdgeMatrix", "buildWijMatrix", "projectKNNs"). The first step runs at full speed for a couple of seconds and then settles in a mono-thread load (15% on a 4 core machine) and never returns.

I have had the same behaviour with a similar dataset (423K rows) but with 500 different dimensions. In that case changing the "K" parameter prevented the issue. I have gone over the various hyper parameters but have not been able to find a setting that works for my set of 225 dimensions.

Is there any way that I can debug this so that I don't have to search the hyperparameter space at random? I have tried setting the option "getOption("verbose", TRUE)" but this does not output anything.

Any help would be appreciated. In any case, thanks for your wonderful package!

Spec:

  • Windows 10 pro
  • 16 Gb RAM, core i7 6700 HQ (4 core)
  • largeVis 0.1.10 x64 (compiled against github, though I have also tried CRAN 32-bit version)
  • R 3.3.2 x86_64-w64-mingw32

Overcoming p_{j|i} over flow

I have the following data:

str(sLF)
Classes ‘data.table’ and 'data.frame': 20000 obs. of 14 variables:
$ SSC-A : num 134 123 169 122 133 ...
$ CD43 : num 115.8 102.8 82.9 94.3 108 ...
$ DEX : num 31.2 50.2 36.8 40 26.7 ...
$ CD28 : num 106.6 112.7 71.5 73.3 93.4 ...
$ CD45RA : num 95.4 28.4 39.9 90.7 61.5 ...
$ CD27 : num 66.4 177.8 83.4 75.1 68.9 ...
$ CD279 : num 55.9 97.3 57.2 61.8 55.4 ...
$ CD14CD16CD19: num 163.8 50.6 99.4 35.7 167.1 ...
$ CD57 : num 89.2 103.4 50 174.8 41.3 ...
$ CD3 : num 39.5 120.5 53.6 49.8 51.7 ...
$ KLRG1 : num 29.7 100.2 147.8 46.8 30.3 ...
$ CD8 : num 65 150.4 57.7 63.1 122 ...
$ CD56 : num 174.4 63.1 72 134 172.5 ...
$ CD38 : num 138.3 138.8 130.9 99.6 161.4 ...
attr(*, ".internal.selfref")=

I ran largeVis and received a warning as follows:

v<-largeVis(sLF)
Warning message:
In largeVis(sLF) :
The Distances between some neighbors are large enough to cause the calculation of p_{j|i} to overflow. Scaling the distance vector.

I am not quite sure what "Scaling the distance vector." means. Could you suggest a workaround for this issue?

mnist.Rda not included

In Example 2, load("./mnist.Rda") does not work. At least the link of mnist.Rda should be given for downloading.

"largeVis is incompatible with gcc < 4.9"

I'm trying to run largeVis on R 3.4.1 when I encounter the above error. Do you happen to have an older version that is compatible with gcc < 4.9? I wish I could update the compiler, but it's from the company's network/grid-computer so I don't trust myself modifying that.

sessionInfo:
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.9 (Santiago)

EDIT: I don't mean to imply that this is an issue with largeVis itself. I just don't know how else to contact you which is why I decided to submit this as an issue.

Dimensionality reduction clumps all but one point together

Apologies for the cross-post, as this issue was reported here also:
lferry007/LargeVis#8

Hoping to get some additional visibility.

I have the same issue using Ubuntu 15.10. Has anyone solved this yet?

After running Largevis on my dataset, the first point is orders of magnitude larger than the remaining points after dimensionality reduction, resulting in a meaningless plot.

root@blah-VirtualBox:/home/blah/Desktop/LargeVis/20161012# ./LargeVis -input 1k_points.txt -output 1k_2d.txt
Reading input file 1k_points.txt ...... Done.
Total vertices : 1000 Dimension : 64
Normalizing ...... Done.
Running ANNOY ...... Done.
Running propagation 3/3
Test knn accuracy : 95.98%
Computing similarities ...... Done.
Fitting model Alpha: 0.000100 Progress: 99.993%
root@blah-VirtualBox:/home/blah/Desktop/LargeVis/20161012#

root@blah-VirtualBox:/home/blah/Desktop/LargeVis/20161012# head 1k_2d.txt
1000 2
-31.457289 -0.287726
12.466423 -0.287530
12.466411 -0.287530
12.466626 -0.287530
12.466501 -0.287530
12.466530 -0.287530
12.466509 -0.287530
12.466496 -0.287530
12.466705 -0.287530

Here is a link to the input data:
https://www.dropbox.com/s/bvup56przujg52d/1k_points.txt?dl=0

And a link to the Largevis output:
https://www.dropbox.com/s/jk2p0qof2sn7hr9/1k_2d.txt?dl=0

Any guidance would be greatly appreciated. Thank you!

how to use hdbscan?

I'm having some trouble getting results from hdbscan. A simple example on a small dataset would be helpful!

Thanks!

Installation error

I'm using Rtools to install the source files (through R in Windows 7). The R version is 3.3.0 and gcc version is 4.9.3. The command I use to install in R is
install.packages("largeVis", repos = NULL, type="source", verbose = T, quiet = F)

The error information is as follows.
(For convenience, I pick up the key line here: largeVis.cpp:58:6: error: cannot convert 'int*' to 'vertexidxtype* {aka long long int*}' in assignment)

d:/R/Rtools/mingw_32/bin/g++ -std=c++0x -I"D:/R/R-331.0/include" -DNDEBUG -I"D:/R/R-3.3.0/library/Rcpp/include" -I"D:/R/R-3.3.0/library/RcppProgress/include" -I"D:/R/R-3.3.0/library/RcppArmadillo/include" -I"D:/R/R-3.3.0/library/testthat/include" -I"d:/Compiler/gcc-4.9.3/local330/include" -fopenmp -DARMA_64BIT_WORD -O2 -Wall -mtune=core2 -c gradients.cpp -o gradients.o
d:/R/Rtools/mingw_32/bin/g++ -std=c++0x -I"D:/R/R-331.0/include" -DNDEBUG -I"D:/R/R-3.3.0/library/Rcpp/include" -I"D:/R/R-3.3.0/library/RcppProgress/include" -I"D:/R/R-3.3.0/library/RcppArmadillo/include" -I"D:/R/R-3.3.0/library/testthat/include" -I"d:/Compiler/gcc-4.9.3/local330/include" -fopenmp -DARMA_64BIT_WORD -O2 -Wall -mtune=core2 -c largeVis.cpp -o largeVis.o
largeVis.cpp: In member function 'void Visualizer::initAlias(arma::ivec&, const vec&, const ivec&, Rcpp::Nullable<Rcpp::Vector<14, Rcpp::PreserveStorage> >)':
largeVis.cpp:58:6: error: cannot convert 'int*' to 'vertexidxtype* {aka long long int*}' in assignment
ps = newps.memptr();
^
largeVis.cpp: In function 'arma::mat sgd(arma::mat, arma::ivec&, arma::ivec&, arma::ivec&, arma::vec&, double, double, long long int, int, double, Rcpp::Nullable<Rcpp::Vector<14, Rcpp::PreserveStorage> >, bool)':
largeVis.cpp:173:41: error: no matching function for call to 'Visualizer::Visualizer(int*, int*, const uword&, coordinatetype*, const int&, double, long long int)'
(iterationtype) n_samples);
^
largeVis.cpp:173:41: note: candidate is:
largeVis.cpp:34:3: note: Visualizer::Visualizer(vertexidxtype*, vertexidxtype*, dimidxtype, coordinatetype*, int, distancetype, iterationtype)
Visualizer(vertexidxtype * sourcePtr,
^
largeVis.cpp:34:3: note: no known conversion for argument 1 from 'int*' to 'vertexidxtype* {aka long long int*}'
make: *** [largeVis.o] Error 1
Warning: running command 'make -f "Makevars" -f "D:/R/R-331.0/etc/i386/Makeconf" -f "D:/R/R-331.0/share/make/winshlib.mk" CXX='$(CXX1X) $(CXX1XSTD)' CXXFLAGS='$(CXX1XFLAGS)' CXXPICFLAGS='$(CXX1XPICFLAGS)' SHLIB_LDFLAGS='$(SHLIB_CXX1XLDFLAGS)' SHLIB_LD='$(SHLIB_CXX1XLD)' SHLIB="largeVis.dll" OBJECTS="RcppExports.o dbscan.o denseneighbors.o distance.o edgeweights.o gradients.o largeVis.o sparse.o"' had status 2
ERROR: compilation failed for package 'largeVis'

* removing 'D:/R/R-3.3.0/library/largeVis'

Import K-nn feature (external K-nn)

Hey, the author of kmcuda here.

I've got a very fast K-nn implementation on GPU which scales to millions by hundreds. I guess this project would benefit from it. The easiest way to integrate it seems to be adding the ability to import K-nn assignments in the largeVis API (that is, an additional function argument).

I can try to make a PR myself (in the near future). Alternatively, it should be straightforward for you to add it. Which do you prefer?

ubuntu 64

Hello! I have tried to install from git and from the R-lib. In both cases got following error: "Error : object ‘opticsXi’ is not exported by 'namespace:dbscan'". System: ubuntu 16.04 (64bit).
What am I missing?

Implement hook for custom distance methods?

Hi,

from what I can read in the code, implementing a new distance metric for use in searchTrees is only a matter of

  1. providing a DenseAnnoySearch implementation
  2. providing a corresponding distance function.

(This at least if I do not care about working with sparse matrices. I would have to study sparse matrices better to understand that.)

I have an application where I need a custom, very specific distance metric that I can implement in (R)C(++). I can imagine others also could use this. I could just tack on the functionality, copy/paste some of the code and provide a new searchTreesMyDistance() function but it would be nicer to use a more generalizable way to plug in custom distance functions. Would you consider this?

E.g. with a
map<std::string&, distancetype (*)(arma::vec&, arma::vec&)>
type construct to register distance types?
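
A registry along these lines could be sketched as follows. This is a minimal illustration, not largeVis code: `Vec`, `DistanceFn`, `distanceRegistry`, `registerDistance`, `lookupDistance` and `euclidean` are all hypothetical names, and `std::vector<double>` stands in for `arma::vec` so that the example compiles without Armadillo. One detail differs from the construct proposed above: `std::map` cannot use a reference type as its key, so the key is a plain `std::string`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Vec = std::vector<double>;  // hypothetical stand-in for arma::vec
using DistanceFn = std::function<double(const Vec&, const Vec&)>;

// Single registry shared by the whole program, keyed by method name.
inline std::map<std::string, DistanceFn>& distanceRegistry() {
    static std::map<std::string, DistanceFn> registry;
    return registry;
}

// Register (or overwrite) a distance function under the given name.
inline void registerDistance(const std::string& name, DistanceFn fn) {
    assert(static_cast<bool>(fn));  // reject empty std::function objects
    distanceRegistry()[name] = std::move(fn);
}

// Look a distance function up by name; throws if nothing was registered.
inline const DistanceFn& lookupDistance(const std::string& name) {
    const auto it = distanceRegistry().find(name);
    if (it == distanceRegistry().end()) {
        throw std::invalid_argument("unknown distance method: " + name);
    }
    return it->second;
}

// Example metric: Euclidean distance over the shared prefix of a and b.
inline double euclidean(const Vec& a, const Vec& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        const double diff = a[i] - b[i];
        sum += diff * diff;
    }
    return std::sqrt(sum);
}
```

A caller would run `registerDistance("euclidean", euclidean)` once and then resolve the user-supplied method name with `lookupDistance(...)` before starting the search. Calling through `std::function` in an inner loop adds indirection, so a real implementation might instead resolve the name once and dispatch to a templated search.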

caught segfault : 'memory not mapped'

I got the following segfault with largeVis. As the 'bench' branch was not available, I recompiled from github/master without OpenMP, as suggested here.

For compiling without OpenMP I made the Makevars file as follows,

PKG_LIBS = $(FLIBS) $(LAPACK_LIBS) $(BLAS_LIBS)
PKG_CXXFLAGS = -DARMA_64BIT_WORD -DNDEBUG
CXX_STD=CXX11
LDFLAGS = $(LDFLAGS)

and compiled as

R-3.3.1 CMD INSTALL largeVis-master

The error message :

> library(largeVis)
Loading required package: Rcpp
Loading required package: Matrix

Attaching package: ‘Matrix’

The following object is masked from ‘package:tidyr’:

    expand

largeVis was compiled without OpenMP support.
> neig<-randomProjectionTreeSearch(t(dat.small.matrix), K=10, tree_threshold = 100, max_iter = 15, n_trees = 10)

 *** caught segfault ***
address 0x75a8, cause 'memory not mapped'

Traceback:
 1: .Call("largeVis_searchTrees", PACKAGE = "largeVis", threshold,     n_trees, K, maxIter, data, distMethod, seed, threads, verbose)
 2: searchTrees(threshold = as.integer(tree_threshold), n_trees = as.integer(n_trees),     K = as.integer(K), maxIter = as.integer(max_iter), data = x,     distMethod = as.character(distance_method), seed = seed,     threads = threads, verbose = as.logical(verbose))
 3: randomProjectionTreeSearch.matrix(t(dat.small.matrix), K = 10,     tree_threshold = 100, max_iter = 15, n_trees = 10)
 4: randomProjectionTreeSearch(t(dat.small.matrix), K = 10, tree_threshold = 100,     max_iter = 15, n_trees = 10)

Maybe I didn't compile it properly since the error still occurs in the 'multiprocessing step'.

Install largeVis from Github met errors

Hi, I was trying to install the largeVis package using the R function 'install_github()' and got the error message below.
Can anyone help me solve this?
`In file included from RcppExports.cpp:4:
In file included from /Library/Frameworks/R.framework/Versions/3.5/Resources/library/RcppArmadillo/include/RcppArmadillo.h:31:
In file included from /Library/Frameworks/R.framework/Versions/3.5/Resources/library/RcppArmadillo/include/RcppArmadilloForward.h:26:
In file included from /Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include/RcppCommon.h:29:
In file included from /Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include/Rcpp/r/headers.h:67:
In file included from /Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include/Rcpp/platform/compiler.h:100:
In file included from /usr/local/clang4/bin/../include/c++/v1/cmath:305:
/usr/local/clang4/bin/../include/c++/v1/math.h:301:15: fatal error: 'math.h' file not found
#include_next <math.h>
^~~~~~~~
1 error generated.
make: *** [RcppExports.o] Error 1
ERROR: compilation failed for package ‘largeVis’

* removing ‘/Library/Frameworks/R.framework/Versions/3.5/Resources/library/largeVis’
Error: Failed to install 'largeVis' from GitHub:
  (converted from warning) installation of package ‘/var/folders/9w/9grv0t81461bxp0r26p689b40000gn/T//Rtmpzoqfvg/file3e849b7844c/largeVis_0.2.tar.gz’ had non-zero exit status`

largeVis gives errors if the script is launched by Rscript

Environment:
OS: Ubuntu 14.04
R: 3.3.2

# test_iris.r
library(largeVis)
data <- iris[,1:4]
data_triplet <- matrix(nrow=dim(data)[1]*dim(data)[2],ncol=3)
triplet_idx <- 1
for (i in 1:dim(data)[1]) {
  for (j in 1:dim(data)[2]) {
    data_triplet[triplet_idx, 1] <- i
    data_triplet[triplet_idx, 2] <- j
    data_triplet[triplet_idx, 3] <- data[i, j]
    triplet_idx <- triplet_idx + 1 
  }
}

sparse_data <- sparseMatrix(i=data_triplet[,1], j=data_triplet[,2], x=data_triplet[,3])
largevis_obj <- largeVis(t(sparse_data))
$ Rscript --vanilla test_iris.r                                                                 
Loading required package: Matrix
Error in UseMethod("randomProjectionTreeSearch") : 
  no applicable method for 'randomProjectionTreeSearch' applied to an object of class "dgCMatrix"
Calls: largeVis -> randomProjectionTreeSearch
Execution halted
$ R CMD BATCH test_iris.r  # safe

Trying to run: "sourceCpp(sparseneighbors.cpp)"

I am trying to get the sparseneighbors.cpp to work from R by running sourceCpp. It works for largeVis.cpp:

sourceCpp("../src/largeVis.cpp")
sourceCpp("../src/sparseneighbors.cpp")
Error in dyn.load("/tmp/RtmpVGSEyh/sourcecpp_9ed162156b7/sourceCpp_82706.so") :
unable to load shared object '/tmp/RtmpVGSEyh/sourcecpp_9ed162156b7/sourceCpp_82706.so':
/tmp/RtmpVGSEyh/sourcecpp_9ed162156b7/sourceCpp_82706.so: undefined symbol: _Z13sparseCosDistRKN4arma5SpMatIdEES3_

I did update armadillo to the latest version:

install.packages("RcppArmadillo")
...
trying URL 'http://ftp.acc.umu.se/mirror/CRAN/src/contrib/RcppArmadillo_0.7.100.3.1.tar.gz'
...

I suspect that there is a reference to something like arma:mat:sparsecosdist() that is not present in my version of armadillo.

I am running on debian.

I am a beginner with R, all advice is welcome.
Thank you for putting the largeVis project on github.

terminate called after throwing an instance of 'std::logic_error'

Hello,

There's an error which I cannot trace back.
The input matrix is a matrix of 19752*16, and I want to reduce the dimensionality of 16 dimensions to 2 dimensions.
When I used the first 1000 examples, the code ran smoothly. But when I increase to 5000 examples, I get the following error.

> head(data)
          V1        V2        V3        V4        V5        V6        V7
1: -0.035533 -0.269505 -0.242356 -0.346642  0.215485 -0.270419 -0.084026
2:  0.365067 -0.013597 -0.198735 -0.202712 -0.172067  0.081563  0.406164
3: -0.116263 -0.763487 -0.766170 -1.050907  0.646879 -0.769130 -0.270211
4: -0.054875 -0.936998 -0.958534 -1.295586  0.733142 -0.917629 -0.134953
5: -0.077013 -0.059518 -0.105835  0.194416  0.055624 -0.000481  0.128617
6: -0.110590 -0.435528 -0.412089 -0.599555  0.373934 -0.477969 -0.207258
          V8        V9       V10       V11       V12       V13       V14
1:  0.037065 -0.101412  0.100584 -0.054974  0.050207  0.031419 -0.099805
2:  0.263552  0.138655 -0.276181  0.314461 -0.824125 -0.083640  0.190374
3:  0.087700 -0.333627  0.350103 -0.266656  0.181778  0.078066 -0.329959
4:  0.243693 -0.281359  0.230305 -0.141352 -0.024434  0.110799 -0.277720
5: -0.002416 -0.001282 -0.138842 -0.014105 -0.023678  0.066954  0.080230
6:  0.010906 -0.217827  0.231002 -0.210552  0.208879  0.036738 -0.257102
         V15       V16
1: -0.153969  0.175217
2: -0.014546 -0.137743
3: -0.532380  0.551592
4: -0.589061  0.559640
5:  0.060884 -0.101197
6: -0.307401  0.352396
> vis=largeVis(t(data), dim=2,max_iter = 2,threads = 1,verbose=TRUE)
Searching for neighbors.
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
terminate called after throwing an instance of 'std::logic_error'
  what():  median(): detected NaN
Aborted (core dumped)

My R environment is

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /software/biosoft/software/python/python2020/lib/libblas.so.3.6.0
LAPACK: /software/biosoft/software/python/python2020/lib/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.14.2 largeVis_0.2      Matrix_1.5-1      Rcpp_1.0.9

loaded via a namespace (and not attached):
 [1] lattice_0.20-44  fansi_1.0.3      assertthat_0.2.1 dplyr_1.0.10
 [5] utf8_1.2.2       grid_4.0.2       R6_2.5.1         DBI_1.1.3
 [9] lifecycle_1.0.3  gtable_0.3.0     magrittr_2.0.3   scales_1.2.1
[13] pillar_1.8.1     ggplot2_3.3.6    rlang_1.0.6      cli_3.4.1
[17] generics_0.1.3   vctrs_0.4.2      glue_1.6.2       munsell_0.5.0
[21] compiler_4.0.2   pkgconfig_2.0.3  colorspace_2.0-3 tidyselect_1.2.0
[25] tibble_3.1.8

Thanks & regards!

Installation error

Hi, thanks a lot for making this package.
I have trouble installing it, however. I'm running R 3.3.0 on a Debian 8, and all dependencies are satisfied. This is what I get :

$ R CMD INSTALL largeVis
* installing to library ‘/home/bart/R/x86_64-pc-linux-gnu-library/3.3’
* installing *source* package ‘largeVis’ ...
** libs
g++ -std=c++11 -I/usr/share/R/include -DNDEBUG -I"/home/bart/R/x86_64-pc-linux-gnu-library/3.3/Rcpp/include" -I"/home/bart/R/x86_64-pc-linux-gnu-library/3.3/RcppProgress/include" -I"/usr/lib/R/site-library/RcppArmadillo/include" -fopenmp -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c largeVis.cpp -o largeVis.o
largeVis.cpp: In function ‘void checkVector(const vec&, const string&)’:
largeVis.cpp:25:9: error: ‘const vec’ has no member named ‘has_nan’
if (x.has_nan() || x.has_inf())
^
largeVis.cpp:25:24: error: ‘const vec’ has no member named ‘has_inf’
if (x.has_nan() || x.has_inf())
^
/usr/lib/R/etc/Makeconf:141: recipe for target 'largeVis.o' failed
make: *** [largeVis.o] Error 1
ERROR: compilation failed for package ‘largeVis’
* removing ‘/home/bart/R/x86_64-pc-linux-gnu-library/3.3/largeVis’

installation error when compiling cpp code

In file included from denseneighbors.cpp:5:0:
neighbors.h: In instantiation of ‘AnnoySearch<M, V>::AnnoySearch(const M&, Progress&) [with M = arma::Mat<double>; V = arma::Col<double>]’:
denseneighbors.cpp:55:76:   required from here
neighbors.h:137:75: error: invalid initialization of non-const reference of type ‘Progress&’ from an rvalue of type ‘<brace-enclosed initializer list>’
  AnnoySearch(const M& data, Progress& p) : data{data}, N(data.n_cols), p{p} { }
                                                                           ^

Changing p{p} to p(p) fixes the compilation error, but the code then crashes when I run randomProjectionTreeSearch in R.
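
The compile-time half of this report can be reproduced in isolation. To the best of my understanding, the message is what compilers following the original C++11 wording emit when a reference member is brace-initialized from an lvalue, a rule that was later relaxed by a defect report; older GCC releases are commonly reported to behave this way, but treat that as an assumption rather than a diagnosis of this build. The sketch below uses hypothetical `Progress` and `Search` types (not the RcppProgress or largeVis classes) and shows the parenthesized form, which binds the reference directly on any C++11 compiler. It does not address the runtime crash.

```cpp
#include <cassert>

// Hypothetical stand-in for a progress reporter; not the RcppProgress class.
struct Progress {
    int ticks = 0;
    void increment() { ++ticks; }
};

class Search {
public:
    // Parentheses bind the member reference directly to the caller's object.
    // Writing p{progress} here is what some older compilers reject.
    explicit Search(Progress& progress) : p(progress) { }

    void step() { p.increment(); }

private:
    Progress& p;  // refers to the caller's object, never a copy
};

// Small driver: two steps through the reference must be visible to the owner.
inline int ticksAfterTwoSteps() {
    Progress progress;
    Search search(progress);
    search.step();
    search.step();
    assert(progress.ticks == 2);
    return progress.ticks;
}
```

Because `Search` keeps only a reference, the `Progress` object has to outlive it; a reference to an object that has already been destroyed is one generic way such code can crash at run time.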

caught segfault, cause 'memory not mapped'

I installed this package on OSX 10.11.2 using devtools.
Then I ran the below command and caught segfault.

> library(largeVis)
> dat <- as.data.frame(matrix(runif(100*100), 100, 100))
> coords <- vis(dat, check = FALSE, n_tree = 50, tree_th = 700, K = 50, alpha = 1, max.iter = 4)
Searching for neighbors.

 *** caught segfault ***
address 0xfffffffffffffff8, cause 'memory not mapped'

Traceback:
 1: .Call("largeVis_searchTrees", PACKAGE = "largeVis", threshold,     n_trees, K, max_recursion_degree, maxIter, data, verbose)
 2: searchTrees(threshold = tree_threshold, n_trees = n_trees, K = K,     max_recursion_degree = max_depth, maxIter = max_iter, data = x,     verbose = verbose)
 3: randomProjectionTreeSearch(x, n_trees = n_trees, tree_threshold = tree_threshold,     K = K, max_iter = max_iter, max_depth = max_depth, verbose = verbose)
 4: vis(dat, check = FALSE, n_tree = 50, tree_th = 700, K = 50, alpha = 1,     max.iter = 4)

Rcpp used g++ (Homebrew gcc 5.3.0).
I also tried gcc 4.9.2, but the same error occurred.

Compilation failed in Rstudio???

When I install with devtools in RStudio, compilation seems to fail:

neighbors.cpp: In function ‘arma::mat searchTrees(const int&, const int&, const int&, const int&, const int&, const mat&, bool)’:
neighbors.cpp:114:25: error: ‘regspace’ is not a member of ‘arma’
arma::vec indices = arma::regspace<arma::vec>(0, N - 1);
^
neighbors.cpp:114:49: error: expected primary-expression before ‘>’ token
arma::vec indices = arma::regspace<arma::vec>(0, N - 1);
^
make: *** [neighbors.o] Error 1
ERROR: compilation failed for package ‘largeVis’

* removing ‘/mnt/datascience/package/R/lib/R/library/largeVis’
Error: Command failed (1)
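
The message `‘regspace’ is not a member of ‘arma’` suggests that the installed Armadillo headers predate `regspace`, so updating them is the likely fix (an assumption, not confirmed in this thread). The failing line only builds the sequence 0, 1, ..., N - 1, which can also be produced with the standard library. A minimal sketch, with `std::vector<double>` standing in for `arma::vec` and `indexSequence` as a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical replacement for arma::regspace<arma::vec>(0, N - 1):
// returns the values 0, 1, ..., n - 1 as doubles.
std::vector<double> indexSequence(std::size_t n) {
  std::vector<double> indices(n);
  std::iota(indices.begin(), indices.end(), 0.0);  // fill with 0.0, 1.0, 2.0, ...
  return indices;
}
```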

searchReverse not correctly finding mutual nearest neighbors

Hi. First of all, thank you for your work on the package.

In edgeweights.cpp, I think the searchReverse routine is supposed to find mutual nearest neighbors found during the nearest neighbors routine. Later in the run method, non-mutual nearest neighbors are created with an edge_weight of 0 so that the wij matrix is symmetrized correctly.

I've found that in most cases, searchReverse doesn't identify already existing mutual nearest neighbors. In the getWIJ method, the locations matrix ends up with lots of duplicates, which is why wij has to be constructed with the add_values parameter set to true.

I think the problem is in the inner loop declaration of the searchReverse method:

  void searchReverse(vertexidxtype id) {
    edgeidxtype p, q;
    for (p = head[id]; p >= 0; p = next[p]) {
      for (q = head[id]; q >= 0; q = next[q]) {
        if (edge_to[q] == id) break;
      }
      reverse[p] = q;
    }
  }

I think the inner loop ought to be:

      for (q = head[edge_to[p]]; q >= 0; q = next[q]) {

i.e. we ought to be searching the other end of the edges incident to id. This doesn't seem to affect any of the unit test results.
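
To make the difference concrete, here is a minimal self-contained sketch of the same linked-list edge layout with the proposed inner loop. The `EdgeList` struct and `addEdge` are simplified stand-ins written for this illustration, not the package's actual class. With the original loop, `q` only walks the edges leaving `id`, so `edge_to[q] == id` can match only a self-loop.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for the edge storage: head[v] is the index of the
// most recently added edge leaving v, next[e] chains to the previous one,
// and -1 terminates the chain.
struct EdgeList {
  std::vector<long> head, next, edge_to, reverse;

  explicit EdgeList(long n_vertices) : head(n_vertices, -1) {}

  void addEdge(long from, long to) {
    edge_to.push_back(to);
    next.push_back(head[from]);
    reverse.push_back(-1);
    head[from] = static_cast<long>(edge_to.size()) - 1;
  }

  // For each edge p = (id -> j), look through the edges leaving j
  // for one that points back to id, as proposed in the issue.
  void searchReverse(long id) {
    for (long p = head[id]; p >= 0; p = next[p]) {
      long q;
      for (q = head[edge_to[p]]; q >= 0; q = next[q]) {
        if (edge_to[q] == id) break;
      }
      reverse[p] = q;  // stays -1 when no reverse edge exists
    }
  }
};
```

With edges 0→1, 1→0 and 0→2 added in that order, `searchReverse(0)` pairs edge 0 with edge 1 and leaves edge 2 without a reverse.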

issue on dealing with sparseMatrix?

Hi Amos,

Thanks for this nice package for largeVis integration. I tried to test whether the package can handle a sparse matrix, but ran into the following issue:

library(largeVis)
library(Matrix)
largeDataset <- spMatrix(1000,1000, i=sample(1:1000, 1000), j=sample(1:1000, 1000), x=sample(1:1000, 1000))
neighbors <- randomProjectionTreeSearch(log(largeDataset + 1), K = 10)
Error in UseMethod("randomProjectionTreeSearch") : 
  no applicable method for 'randomProjectionTreeSearch' applied to an object of class "c('dgeMatrix', 'ddenseMatrix', 'generalMatrix', 'geMatrix', 'dMatrix', 'denseMatrix', 'compMatrix', 'Matrix', 'xMatrix', 'mMatrix', 'Mnumeric', 'replValueSp')"

I tried both the newest CRAN version and the GitHub version; both have the same issue.
Any thoughts?

In addition, have you tested your package on the 3 million word vectors from the GoogleNews dataset (or any other dataset with 1M data points and hundreds of feature dimensions)? How does the package perform on it in terms of time and memory requirements?

error: SpMat::SpMat(): invalid row or column index

Hi there, I am trying to use largeVis to do clustering. I have about 200 datasets, each with roughly 1,000 to 100,000 samples and 2 features (the number of features is consistent). While the largeVis function works for almost all of my datasets, I still get this error message for one of them:


error: SpMat::SpMat(): invalid row or column index
Error in referenceWij(is, x@i, x@x^2, as.integer(threads), perplexity) : 
  SpMat::SpMat(): invalid row or column index
In addition: Warning message:
In largeVis(t(as.matrix(memberships[, c("X", "Y")])), dim = 2, K = K,  :
  The Distances between some neighbors are large enough to cause the calculation of p_{j|i} to overflow. Scaling the distance vector.

I realized that someone had this problem before, and the solution was to install the branch 'hotfix/twobugs'. I successfully installed that version as well, but no luck. Any ideas? Thanks!

The dataset is here: data.csv

The function I run is: largeVis(t(as.matrix(data[, c('X', 'Y')])), dim=2, K = K, tree_threshold = 100, max_iter = 5,sgd_batches = 1, threads = 1)

Hyperparameters' definition domain

Hello,

I've read the largeVis paper thoroughly since it became available on arXiv (more than 5 months ago now), and I am still wondering about the following:

  • How many hyperparameters are there in total, starting from scratch? From what I can see, there are 10 of them (k, n_trees, tree_threshold, max_iter, distance_method, perplexity, M, gamma, alpha, rho), assuming the output dimension is fixed by the user. There are 6 if the Stochastic Gradient Descent part is excluded.
  • What is the definition domain of each hyperparameter? The question does not arise for M/gamma/alpha/rho (they are obvious), but what about k / n_trees / tree_threshold / max_iter / distance_method?

I am currently assuming the following, for a matrix MAT before transposition:

  • k = [1, nrow(MAT)]
  • n_trees = [1, Inf]
  • tree_threshold = [1, nrow(MAT)], where the suggested value is ncol(MAT)
  • max_iter = [1, Inf]
  • distance_method = Euclidean or Cosine
  • perplexity = [1, nrow(MAT)] unlike t-SNE where it is [1, floor(nrow(MAT)/3)]

And on Windows, k*nrow(MAT) must be below ~4 billion (2^32); otherwise an error occurs during projectKNNs (the size exceeds the maximum capacity of an arma sparse matrix) or even earlier.

Are my assumptions correct or did I miss something?
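
The size constraint assumed above is simple arithmetic and can be checked before starting a long run. A minimal sketch, where the 2^32 bound is the poster's assumption about Windows builds (not a confirmed property of the package) and `fitsAssumedIndexLimit` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>

// Returns true when k neighbors for each of n_points points stays below
// the 2^32 entry limit assumed in the question above.
// 64-bit arithmetic is used so that the product itself cannot wrap
// for realistic inputs.
bool fitsAssumedIndexLimit(std::uint64_t k, std::uint64_t n_points) {
  const std::uint64_t limit = std::uint64_t{1} << 32;  // 4,294,967,296
  return k * n_points < limit;
}
```

For example, 3 million points with k = 150 give 450 million entries and pass the check, while k = 1500 gives 4.5 billion entries and does not.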

OpenMP

Just FYI: there's an issue with OpenMP that I'm working on fixing. If you get an error that the C stack limit was exceeded, this is why.

testcfunctions.cpp fails (Win)... what is it supposed to do?

On a github clone with R-3.4.1, the tests from testcfunctions.cpp fail:

testthat results ================================================================
OK: 147 SKIPPED: 1 FAILED: 1
1. Failure: Catch unit tests pass (@test-cpp.R#6) 

and above:

testcfunctions.cpp:9
...............................................................................

testcfunctions.cpp:17: FAILED:
  CATCH_CHECK( testAlias() == 71 )
with expansion:
  83 == 71

testcfunctions.cpp:18: FAILED:
  CATCH_CHECK( testAlias() == 74 )
with expansion:
  97 == 74

testcfunctions.cpp:19: FAILED:
  CATCH_CHECK( testAlias() == 70 )
with expansion:
  68 == 70

testcfunctions.cpp:20: FAILED:
  CATCH_CHECK( testAlias() == 90 )
with expansion:
  88 == 90

etc. But I don't really understand what these tests are supposed to test. It looks like you are setting up the RNG with a seed and expecting a certain output?

Is this possibly expected to fail on Windows?
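
One possible explanation, which nothing in this thread confirms, is that the test draws its numbers through a standard-library distribution. The C++ standard fixes the output sequence of engines such as `std::mt19937`, but leaves the algorithms of distributions such as `std::uniform_int_distribution` to each implementation, so the same seed can yield different values under different standard libraries. A minimal sketch of a draw that depends only on raw engine output (`nthMt19937` and `portableBounded` are hypothetical helpers, not part of largeVis):

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Raw engine output is specified by the standard: the 10000th value of a
// default-constructed std::mt19937 is required to be 4123659995.
// n must be at least 1.
std::uint32_t nthMt19937(unsigned long long n) {
  std::mt19937 gen;    // default seed
  gen.discard(n - 1);  // skip the first n - 1 outputs
  return static_cast<std::uint32_t>(gen());
}

// Bounded draw built only from raw engine output, so the result does not
// depend on how a library implements its distributions.
// Note: the modulo introduces a slight bias unless bound divides 2^32.
std::uint32_t portableBounded(std::mt19937& gen, std::uint32_t bound) {
  return static_cast<std::uint32_t>(gen()) % bound;
}
```

A test that hard-codes expected values is only portable if every step between the seed and the compared value is specified in this way.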

iris output is incorrect

Using the lines in Example 1, the output shows no pattern at all. The iris data should be clustered.
