Comments (7)

mtazzari commented on August 25, 2024

@fredRos what do you think? Feasible?

from galario.

fredRos commented on August 25, 2024

Yes, we should do that. I'm profiling the code right now and see a number of things we can improve. This one is high-level and makes perfect sense.

fredRos commented on August 25, 2024

It is more work and requires more decisions:

  • Should fft2d accept a real image? It could then no longer work in place.
  • chi2 and sample don't expose the Fourier-space image, so the change is easy there.
  • Could we fftshift the real image? That would save half of the memory transfer.

mtazzari commented on August 25, 2024

A realistic use case of galario employs only the sample() and/or chi2() functions.
The other functions are for those who want to play with galario in more detail (like us), and I think it's fine to keep the data complex for all of them.

For the sample() and chi2() functions the change is easy to implement, and I would start with them. Internally nothing should change, since we cast from dreal* to dcomplex* at the very beginning, before any other operation.
For the CUDA version, I would do the cast after the dreal* data has been copied to the GPU, writing it into a dcomplex* data_d array that has been initialized with zero imaginary part.
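
A minimal host-side sketch of that promotion step, assuming `dreal` is `double` and `dcomplex` is `std::complex<dreal>` (galario's actual typedefs may differ, e.g. for single precision). `real_to_complex` is a hypothetical helper name for illustration, not part of galario's API:

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Assumed aliases for illustration; galario's own definitions may differ.
using dreal = double;
using dcomplex = std::complex<dreal>;

// Hypothetical helper: copy a real nx*ny image into a new complex buffer
// whose imaginary parts are all zero. The input is left untouched.
std::vector<dcomplex> real_to_complex(const dreal* image,
                                      std::size_t nx, std::size_t ny) {
    const std::size_t n = nx * ny;
    assert(image != nullptr || n == 0);
    std::vector<dcomplex> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = dcomplex(image[i], 0.0);  // real part copied, imaginary part 0
    }
    return out;
}
```

With a step like this at the top of sample() and chi2(), everything downstream could keep operating on complex data unchanged.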

fredRos commented on August 25, 2024

What about shifting the real image? It seems to me like we could do that and only after the shift we'd add the imaginary part, perhaps on the device
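
One way to picture this, as a host-side sketch only: read each real pixel once, write it to its shifted position, and attach the zero imaginary part in the same pass. The shift convention (each axis rolled by half its length, as in the usual fftshift), the row-major layout, the type aliases, and the name `shift_then_promote` are all assumptions made for illustration, not galario's implementation:

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Assumed aliases for illustration; galario's own definitions may differ.
using dreal = double;
using dcomplex = std::complex<dreal>;

// Hypothetical helper: fftshift a row-major real image and promote it to
// complex in one pass. Only real values are read, so the shift touches
// half as many input bytes as shifting an already-complex image would.
std::vector<dcomplex> shift_then_promote(const std::vector<dreal>& image,
                                         std::size_t nrows, std::size_t ncols) {
    assert(image.size() == nrows * ncols);
    std::vector<dcomplex> out(image.size());
    for (std::size_t r = 0; r < nrows; ++r) {
        const std::size_t rs = (r + nrows / 2) % nrows;  // shifted row index
        for (std::size_t c = 0; c < ncols; ++c) {
            const std::size_t cs = (c + ncols / 2) % ncols;  // shifted column index
            out[rs * ncols + cs] = dcomplex(image[r * ncols + c], 0.0);
        }
    }
    return out;
}
```

Because it writes to a separate output buffer, this version is out of place; both the real and the complex image exist at the same time.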

fredRos commented on August 25, 2024

Example of a kernel casting unsigned int to float: https://stackoverflow.com/questions/9153861/typecasting-in-cuda-and-cublas

I'm following your suggestion now. I tried to do the conversion in place, but that would not allow multiple threads to operate concurrently. Doing it out of place means we need 50% extra memory on the GPU to hold both the real and the complex image until the complex image is fully constructed. This may be a problem for users with high-resolution images and small-memory GPUs.

Perhaps I can profile whether it's faster to construct the complex image on the CPU and then transfer it. On all systems I have seen so far, it is safe to assume that more memory is available on the host.
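
The 50% figure follows from simple bookkeeping: a complex image takes twice the bytes of the real one, so holding both at once costs three real-image sizes, or 1.5 times the complex image alone. A small sketch of that arithmetic, with the struct and function names being hypothetical:

```cpp
#include <cstddef>

// Hypothetical bookkeeping struct, for illustration only.
struct ImageFootprint {
    std::size_t real_bytes;     // real image alone
    std::size_t complex_bytes;  // complex image alone (2x real)
    std::size_t peak_bytes;     // both resident at once (1.5x complex)
};

// Byte counts for an nx*ny image with the given size of one real value.
constexpr ImageFootprint footprint(std::size_t nx, std::size_t ny,
                                   std::size_t bytes_per_real) {
    return ImageFootprint{nx * ny * bytes_per_real,
                          2 * nx * ny * bytes_per_real,
                          3 * nx * ny * bytes_per_real};
}
```

For a 4096 x 4096 image in double precision (8 bytes per value), this gives 128 MiB for the real image, 256 MiB for the complex image, and a 384 MiB peak while both are resident.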

fredRos commented on August 25, 2024

fixed by #45
