
std::bad_alloc (fastpca issue #2, CLOSED)

000Justin000 avatar 000Justin000 commented on August 20, 2024
std::bad_alloc

from fastpca.

Comments (4)

lettis avatar lettis commented on August 20, 2024

Dear Junteng,
could you please give me a bit more information: is 10^5 the number of observables (i.e. the number of columns) or the number of samples (i.e. the number of rows)?
What is the shape of your input (# rows x # cols)?

Thanks,
Florian


000Justin000 avatar 000Justin000 commented on August 20, 2024

Dear Lettis,

100000 is the number of observables. The shape of the data is 500000 (rows) by 100000 (columns).

On top of that, is there a way to find only the k largest eigenvalues and their corresponding eigenvectors? Diagonalizing the whole covariance matrix is too costly.

I also found that the code cannot be compiled with the Intel compilers, or am I doing it wrong?

Thanks!

Best,
Junteng


lettis avatar lettis commented on August 20, 2024

Dear Junteng,
indeed, 10^5 observables means a 10^5 x 10^5 covariance matrix with 10^10 entries, i.e. > 8*10^10 bytes = 80 GB of space at 8 bytes per entry, just for the covariance matrix. Not included are the temporary memory needed by LAPACK and the memory for the construction of the covariance matrix.
I guess, this is where the bad_alloc exception comes from ...
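
As a quick check of that estimate, here is a minimal Python sketch of the arithmetic. It assumes a dense square matrix with 8-byte (double-precision) entries; the function name `dense_matrix_bytes` is made up for illustration and is not part of FastPCA.

```python
def dense_matrix_bytes(n_observables: int, bytes_per_entry: int = 8) -> int:
    """Bytes needed to store a dense n x n matrix (assumed 8-byte entries)."""
    return n_observables * n_observables * bytes_per_entry


n = 10**5  # number of observables (columns) from this thread
total = dense_matrix_bytes(n)
print(f"{total:.1e} bytes, about {total / 1e9:.0f} GB")
```

Storing only one triangle of the symmetric matrix would roughly halve this, which is still far more than most machines offer.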

To compute such a big data set, you probably need another approach.
I would suggest something like:

  1. compute the pairwise covariances of the observables
  2. define a cutoff and store only the covariances above that cutoff
  3. hope that your cov-matrix will be sparse
  4. use an appropriate algorithm for symmetric sparse matrices to get the eigendecomposition
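
Steps 1 to 3 could look roughly like the following pure-Python sketch on toy-sized data. Everything here is an assumption for illustration: the names `sparse_covariance` and `sym_matvec`, the dictionary storage of the upper triangle, and the choice to always keep the diagonal. For step 4, an iterative eigensolver (a Lanczos-type method, for instance) typically needs only the matrix-vector product, which is why `sym_matvec` is included.

```python
def sparse_covariance(rows, cutoff):
    """Pairwise sample covariances, thresholded.

    Keeps the diagonal and only those off-diagonal entries with
    |cov| > cutoff. Returns {(i, j): cov} with i <= j (upper triangle).
    """
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    # Centered columns; for real data sizes one would stream over rows instead.
    cols = [[r[j] - means[j] for r in rows] for j in range(d)]
    cov = {}
    for i in range(d):
        for j in range(i, d):
            c = sum(a * b for a, b in zip(cols[i], cols[j])) / (n - 1)
            if i == j or abs(c) > cutoff:
                cov[(i, j)] = c
    return cov


def sym_matvec(cov, v):
    """Multiply the symmetric sparse matrix (upper triangle stored) by v."""
    out = [0.0] * len(v)
    for (i, j), c in cov.items():
        out[i] += c * v[j]
        if i != j:  # mirror the entry into the lower triangle
            out[j] += c * v[i]
    return out


rows = [[1.0, 2.0, 0.0], [2.0, 4.0, 1.0], [3.0, 6.0, 0.0], [4.0, 8.0, 1.0]]
cov = sparse_covariance(rows, cutoff=0.5)
print(sorted(cov))                      # which (i, j) pairs survived the cutoff
print(sym_matvec(cov, [1.0, 0.0, 0.0]))  # first column of the sparse matrix
```

Note that the pairwise loop itself is still quadratic in the number of observables, so the saving is in storage, not in the cost of building the matrix.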

Unfortunately, the FastPCA code is not written for these sizes or sparse cov-matrices. Of course, you can try to adapt it, but I guess it will be easier to write your own code for that ...

On a side note:
Of course, I do not know what your application is.
But be reminded that for a large number of observables that may lie in very different spaces, it may be prudent to first normalize the input variables, effectively computing the correlations instead of covariances.
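
A minimal sketch of that normalization in plain Python, assuming "normalize" means z-scoring each column (subtract the mean, divide by the sample standard deviation). The helper names `standardize_columns` and `covariance` are hypothetical; the covariance of the standardized columns then equals the correlation of the original ones.

```python
import math


def standardize_columns(rows):
    """Z-score every column: zero mean, unit sample variance."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(d)
    ]
    if any(s == 0.0 for s in stds):
        raise ValueError("constant column cannot be standardized")
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def covariance(rows, i, j):
    """Sample covariance between columns i and j."""
    n = len(rows)
    mi = sum(r[i] for r in rows) / n
    mj = sum(r[j] for r in rows) / n
    return sum((r[i] - mi) * (r[j] - mj) for r in rows) / (n - 1)


# Two observables on very different scales.
rows = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]
z = standardize_columns(rows)
print(covariance(rows, 0, 1))  # covariance, dominated by the large-scale column
print(covariance(z, 0, 1))     # correlation, scale-free
```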

To answer your second question: I never used the Intel compiler, since the latest version did not fully support the C++11 standard (at least when I developed the code; I do not know its status today).
The code was developed with GCC.

Cheers,
Florian


000Justin000 avatar 000Justin000 commented on August 20, 2024

Dear Florian,

Thanks for your reply! I noticed that the ARPACK package, which is used in MATLAB, provides a function to find the k eigenvectors with the largest eigenvalues. That is very useful, since it is an O(N^2) algorithm. I will try to figure out how to link against that routine if possible.
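
To illustrate the idea of computing only the k largest eigenpairs, here is a small pure-Python sketch. It is not ARPACK and does not use ARPACK's implicitly restarted Lanczos method; it swaps in plain power iteration with deflation, which is much simpler and slower to converge. It assumes a symmetric positive semi-definite matrix (as a covariance matrix is), and the name `top_k_eigenpairs` and the fixed start vector are made up for illustration. Each iteration costs one matrix-vector product, i.e. O(N^2) for a dense N x N matrix.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def matvec(a, v):
    return [dot(row, v) for row in a]


def top_k_eigenpairs(a, k, iters=2000):
    """Largest k eigenpairs of a symmetric PSD matrix by power iteration.

    Already-found eigenvectors are projected out in every step (deflation).
    Returns a list of (eigenvalue, unit eigenvector), largest first.
    """
    n = len(a)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the matrix dimension")
    found = []
    for _pair in range(k):
        # Fixed, non-uniform start vector; assumed not orthogonal to the target.
        v = [1.0 / (i + 1) for i in range(n)]
        for _step in range(iters):
            for _lam, u in found:  # remove components along known eigenvectors
                c = dot(v, u)
                v = [x - c * y for x, y in zip(v, u)]
            w = matvec(a, v)
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:  # v lies in the null space; nothing left to amplify
                break
            v = [x / norm for x in w]
        length = math.sqrt(dot(v, v))
        v = [x / length for x in v]
        found.append((dot(v, matvec(a, v)), v))  # Rayleigh quotient
    return found


a = [[2.0, 1.0], [1.0, 2.0]]
for lam, vec in top_k_eigenpairs(a, 2):
    print(lam, vec)
```

For real problem sizes a library routine built on ARPACK would be the sensible choice; the sketch only shows why a handful of eigenpairs can be obtained from matrix-vector products alone, without a full diagonalization.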

Have a good week!

Best Regards,
Junteng

