Comments (3)

swkrueger commented on June 30, 2024

Thank you for the bug report.

Does it work when you read data from a file instead of the RTL-SDR? For example:

$ rtl_sdr -g 5 -f 433.83M -s 2.4M data.bin  # capture data, hit Ctrl-C to stop
$ ./fastcard -i data.bin -b 4096 -h 4000

Did you try using gdb with a debug build? Also, I would recommend reading from a file instead of directly from the rtlsdr when using valgrind.

Does it work when you use volk_32fc_magnitude_squared_32f_u instead of volk_32fc_magnitude_squared_32f_a? It might be possible that the fftw alignment does not match the volk alignment for some reason.

from thrifty.

liamdiprose commented on June 30, 2024

Thanks for your reply,

Changing the volk function to volk_32fc_magnitude_squared_32f_u seems to have fixed it. Does that mean it was an alignment issue? I can't find any documentation on the difference between the two functions.

By the way, using a data.bin file solved the issue for the specific command I used. The segfaults seemed to be affected by how many files the process had open. I wish I had followed your second suggestion first 😆

What next? Is this a special case to be added to the CMakeLists, or an issue-close and a patch for my Nix package? Either suits me well.

swkrueger commented on June 30, 2024

Yeah, it is probably a memory alignment issue. volk_32fc_magnitude_squared_32f_a assumes that the memory was allocated with volk_malloc, which ensures that the memory is properly aligned for SIMD instructions (Neon in the case of ARM). volk_32fc_magnitude_squared_32f_u is for unaligned memory; it would be slower and would make use of the generic algorithm without Neon instructions.
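
To make the aligned/unaligned distinction concrete, here is a minimal sketch of an alignment check on a buffer pointer. The helper name is_aligned is hypothetical (it is not part of VOLK or fastcard), and the alignment value is an assumption: in a real program you would pass whatever volk_get_alignment() reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of VOLK or fastcard.
 * Returns true when ptr is a multiple of `alignment` bytes.
 * `alignment` must be a non-zero power of two. */
static bool is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* For a power-of-two alignment, the low bits of an aligned
     * address are all zero. */
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}
```

Calling something like is_aligned(buffer, volk_get_alignment()) on the FFT output buffer just before the magnitude-squared call would be one way to confirm whether the _a kernel's assumption actually holds on a given system.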

The issue is probably that the Nix package for either FFTW or libvolk is compiled without Neon support. I was in a rush when I implemented fastcard and cut corners. I assumed that FFTW's alignment would be the same as libvolk's alignment, which I think is the case for the Rpi configuration, library versions and architecture I used. It could be that FFTW is using a different alignment, or that the FFTW library you are using is compiled without Neon support and thus does not perform any special alignment when fftwf_malloc is called. My guess is that it is FFTW. I vaguely remember something about the official Raspbian package for FFTW including a patch to enable Neon. If I remember correctly, you can inspect the wisdom file generated by fastcard to see whether FFTW is using Neon or not -- it should contain something like fftwf_codelet_t2bv_16_neon.

You can probably fix the bug by replacing fftwf_malloc(num_bytes) with volk_malloc(num_bytes, alignment) in fastcard/fft.c. But then you'll have the same issue with the opposite configuration, where FFTW is compiled with Neon support and libvolk is not.
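
For illustration only, the sketch below shows what an allocator with an explicit alignment argument does, using C11 aligned_alloc as a stand-in. It is not the VOLK or FFTW implementation, and alloc_aligned is a hypothetical name; the point is just that the caller, rather than the library, decides the alignment of the returned block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an aligned allocator such as volk_malloc;
 * it uses C11 aligned_alloc and neither VOLK nor FFTW.
 * `alignment` must be a non-zero power of two. Free with free(). */
static void *alloc_aligned(size_t num_bytes, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* aligned_alloc expects the size to be a multiple of the
     * alignment, so round the request up. */
    size_t rounded = (num_bytes + alignment - 1) & ~(alignment - 1);

    void *p = aligned_alloc(alignment, rounded);

    /* On success the address must honour the requested alignment. */
    assert(p == NULL || ((uintptr_t)p & (alignment - 1)) == 0);
    return p;
}
```

With the real libraries the equivalent call would be the volk_malloc(num_bytes, alignment) mentioned above, with the alignment taken from volk_get_alignment().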

Assuming that my hunch is correct regarding Nix's FFTW not using Neon on the Rpi, you can basically choose any one of the following three solutions:

  1. Fix the Nix package for FFTW to compile it with Neon instructions on the Rpi
  2. Use volk_32fc_magnitude_squared_32f_u instead of volk_32fc_magnitude_squared_32f_a and take the performance hit of using both the volk kernel and FFTW without Neon instructions.
  3. Use volk_malloc instead of fftwf_malloc and take the performance hit of using FFTW without Neon instructions.

Oh, and the number 4096 actually makes sense. It is 4K, which is probably the size of a page. You can check the virtual memory page size using getconf PAGESIZE in a shell. What could be happening is that the volk operation is going out of bounds into the next page when it starts from a misaligned address. This would result in a segfault if the next page isn't allocated. There could be cases where more memory is allocated next to that page, e.g. potentially when you read from a file and the file is mapped into the virtual address space, in which case it will not result in a segfault (but probably lead to incorrect results and unexpected behaviour).
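
The page-boundary reasoning can be sketched with a little address arithmetic. The helper below is hypothetical (crosses_page is not from fastcard or VOLK) and assumes a power-of-two page size such as the 4096 that getconf PAGESIZE typically prints; it only reports whether an access of len bytes starting at addr touches more than one page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when the byte range [addr, addr + len)
 * spans more than one page. `page_size` must be a non-zero power of
 * two and `len` must be at least 1. */
static bool crosses_page(uintptr_t addr, size_t len, size_t page_size)
{
    assert(len > 0);
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    uintptr_t first_page = addr / page_size;
    uintptr_t last_page = (addr + len - 1) / page_size;
    return first_page != last_page;
}
```

For example, with 4096-byte pages a 16-byte access starting 16 bytes before the end of a page stays inside it, while the same access starting 8 bytes before the end spills into the next page, which faults if that page is not mapped. In a C program the page size can be queried with sysconf(_SC_PAGESIZE) instead of hard-coding it.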
