anthonix / ffts



The Fastest Fourier Transform in the South

Home Page: http://anthonix.com/ffts

License: Other

Shell 20.18% C 65.29% Java 0.35% Assembly 3.79% CMake 1.02% Makefile 5.97% M4 3.40%

ffts's People

oyturas simap matszpk spart-traps mattles datenwolf henrygouk rajapanidepu sannorozco robertmassaioli rtoy dennisss yazon syntheticpp lmdsp playlaughlovelearn moloned zivvlee wangyung jay2002 benvd linkotec ghisvail simudream jfsantos biotrump sankalpg robclouth dacao pierrebizouard fokkedj jefferson13 jwdai philburr victordion liuhuihz tjachmann musicalentropy armingjazi mcanthony csete jaykickliter sqjoker tphilippe ruiaraujo grevutiu-gabriel tengyifei jardar h6ah4i abbyssoul linkai19910508 yang123vc mfkiwl derkreature liyancas gdkar draekko ralphtandetzky chao01 dzungpv jgom pimpmypixel coryxie colinbuchanan wgfi110 akkaze papiguy marcusmueller rryan adglkh prepare lordalbior sinozope zhu-hui gson1703 elongame menzi11 invokecd jerome-pouiller spkorhonen keiviln myapiary cube3power codingforfun villains srafay joaorossi mpoullet cueaudio ranka47 guokr1991 celeryty anhvanthe cblomert ediloren mu-l liuguoyou lamchiminhmark mjendrusch davens

ffts's Issues

Angstrom: aclocal-1.14: command not found

On Angstrom Linux, it wouldn't build:

# make
CDPATH="${ZSH_VERSION+.}:" && cd . && aclocal-1.14 -Im4
/bin/sh: aclocal-1.14: command not found
make: *** [aclocal.m4] Error 127

To fix this I had to add these symlinks:

# ln /usr/bin/automake /usr/bin/automake-1.14
# ln /usr/bin/aclocal /usr/bin/aclocal-1.14

I'm not very familiar with autotools, so I don't know how to fix this properly.

Make FFTS work on all architectures

I would like to use FFTS in a piece of FOSS engineering software; however, that software needs to run on several architectures, including i386 and armel without NEON support. There should be an architecture-independent module (probably written in C) that acts as a fallback, so that FFTS works on architectures that don't support SSE, x86-64 registers, NEON, etc. I know there would be a significant performance penalty, but at least in my case, working in any form is better than having no FFT support at all on non-x86-64 architectures.

FFTS output difference with fft in Matlab

Hello, I am converting code from MATLAB to Java/C to run on an Android device. I can do this with pure-Java FFT libraries like JTransforms and JWave, but they are slow. I want to use FFTS, but the output is not the same, and I don't know why.

What would make good benchmarks for the Haskell binding?

Hi there, I am still in the process of writing the Haskell binding. It seems to be working now, and I have bound every function in include/ffts.h. That is only half the battle, though: I also want to know how fast it is, so it is time for benchmarking.

I have started writing the benchmarking and you can view the code that I have written or you can just review the interesting results I have so far:

Running 1 benchmarks...
Benchmark bench-ffts: RUNNING...
warming up
estimating clock resolution...
mean is 1.441820 us (640001 iterations)
found 3913 outliers among 639999 samples (0.6%)
  3507 (0.5%) high severe
estimating cost of a clock call...
mean is 38.08749 ns (12 iterations)

benchmarking 1d Real 16 Values
mean: 9.497489 us, lb 9.452236 us, ub 9.553633 us, ci 0.950
std dev: 256.6657 ns, lb 215.7990 ns, ub 298.1516 ns, ci 0.950
found 17 outliers among 100 samples (17.0%)
  11 (11.0%) high mild
  6 (6.0%) high severe
variance introduced by outliers: 20.965%
variance is moderately inflated by outliers

benchmarking 1d Real 512 Values
mean: 267.5601 us, lb 264.9554 us, ub 274.5509 us, ci 0.950
std dev: 19.89528 us, lb 4.578779 us, ub 40.04049 us, ci 0.950
found 4 outliers among 100 samples (4.0%)
  2 (2.0%) high mild
  2 (2.0%) high severe
variance introduced by outliers: 67.655%
variance is severely inflated by outliers

benchmarking 1d Real 1024 Values
mean: 546.5007 us, lb 540.0555 us, ub 562.0257 us, ci 0.950
std dev: 47.99116 us, lb 21.33818 us, ub 87.39147 us, ci 0.950
found 6 outliers among 100 samples (6.0%)
  3 (3.0%) high mild
  3 (3.0%) high severe
variance introduced by outliers: 74.830%
variance is severely inflated by outliers

benchmarking 1d Real 2048 Values
mean: 1.215410 ms, lb 1.196909 ms, ub 1.241009 ms, ci 0.950
std dev: 110.3181 us, lb 84.76925 us, ub 143.6377 us, ci 0.950
found 8 outliers among 100 samples (8.0%)
  3 (3.0%) high mild
  5 (5.0%) high severe
variance introduced by outliers: 75.876%
variance is severely inflated by outliers

benchmarking 1d Complex 16 Values
mean: 18.60266 us, lb 18.52926 us, ub 18.69820 us, ci 0.950
std dev: 428.8377 ns, lb 348.4019 ns, ub 515.2131 ns, ci 0.950
found 14 outliers among 100 samples (14.0%)
  10 (10.0%) high mild
  4 (4.0%) high severe
variance introduced by outliers: 16.150%
variance is moderately inflated by outliers

benchmarking 1d Complex 512 Values
mean: 611.0597 us, lb 607.6002 us, ub 614.6598 us, ci 0.950
std dev: 18.08215 us, lb 15.99014 us, ub 21.20516 us, ci 0.950
found 1 outliers among 100 samples (1.0%)
variance introduced by outliers: 24.794%
variance is moderately inflated by outliers

benchmarking 1d Complex 1024 Values
mean: 1.363070 ms, lb 1.344092 ms, ub 1.387594 ms, ci 0.950
std dev: 110.0677 us, lb 89.10057 us, ub 130.3623 us, ci 0.950
found 14 outliers among 100 samples (14.0%)
  2 (2.0%) high mild
  12 (12.0%) high severe
variance introduced by outliers: 70.758%
variance is severely inflated by outliers

benchmarking 1d Complex 2048 Values
mean: 3.202281 ms, lb 3.193803 ms, ub 3.213675 ms, ci 0.950
std dev: 50.09133 us, lb 37.46054 us, ub 72.28393 us, ci 0.950
found 4 outliers among 100 samples (4.0%)
  2 (2.0%) high severe
variance introduced by outliers: 8.483%
variance is slightly inflated by outliers

benchmarking 1d Reverse Real 16 Values
mean: 8.949454 us, lb 8.928603 us, ub 8.975940 us, ci 0.950
std dev: 120.6416 ns, lb 99.41294 ns, ub 155.4416 ns, ci 0.950
found 4 outliers among 100 samples (4.0%)
  3 (3.0%) high mild
  1 (1.0%) high severe
variance introduced by outliers: 6.579%
variance is slightly inflated by outliers

benchmarking 1d Reverse Real 512 Values
mean: 248.8768 us, lb 247.9895 us, ub 250.3459 us, ci 0.950
std dev: 5.715797 us, lb 3.902308 us, ub 9.094875 us, ci 0.950
found 7 outliers among 100 samples (7.0%)
  4 (4.0%) high mild
  3 (3.0%) high severe
variance introduced by outliers: 16.144%
variance is moderately inflated by outliers

benchmarking 1d Reverse Real 1024 Values
mean: 507.0067 us, lb 505.2487 us, ub 508.9387 us, ci 0.950
std dev: 9.465558 us, lb 8.227033 us, ub 11.19390 us, ci 0.950
found 2 outliers among 100 samples (2.0%)
  2 (2.0%) high mild
variance introduced by outliers: 11.354%
variance is moderately inflated by outliers

benchmarking 1d Reverse Real 2048 Values
mean: 1.142064 ms, lb 1.128539 ms, ub 1.159102 ms, ci 0.950
std dev: 77.62682 us, lb 65.57185 us, ub 92.82481 us, ci 0.950
found 22 outliers among 100 samples (22.0%)
  11 (11.0%) high mild
  11 (11.0%) high severe
variance introduced by outliers: 63.571%
variance is severely inflated by outliers
Benchmark bench-ffts: FINISH

That seems pretty good but I don't know how to compare it with the benchmarks that you already seem to have written. So my questions are:

How can I compare my benchmark results with the benchmarks that you have already written?
How does my first version of the Haskell binding compare to the raw C version?
Are there any benchmarks that you can suggest that I try and write so that I can really flex the library?

Thank you for your time.

Website down, releases not downloadable

Looks like the anthonix.com website this repo links to is down; server's not responding and hasn't been since at least the start of October. All the release downloads were hosted on that site.

Is there a new site where the FFTS releases can be downloaded? Or would it be possible to tag a Release here on GitHub so distribution tarballs could be downloaded from here, or at least a named version of the repo code downloaded?

build fails on x86

I'm trying to build ffts on my Atom. First, I had to manually enable SSE for gcc:
./configure --enable-sse --enable-single CFLAGS="-msse -msse2"
make

but then I got a bunch of errors from sse.s, all about unknown registers like
sse.s:56: Error: bad register name `%rdi)'

So make is trying to build 64-bit assembly on a 32-bit machine. Is this an issue with the build system, or is 32-bit just not supported?

edit: I used the 0.7 release
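To confirm which case applies on a given machine, the compiler's predefined target macros can be queried. This small sketch (hypothetical function name) reports whether the compiler is targeting 64-bit x86, 32-bit x86, or something else; an `i386` result would be consistent with the `%rdi` errors, since `%rdi` only exists in 64-bit mode.

```c
/* Reports the x86 flavour the compiler is targeting, using the
 * GCC/Clang (__x86_64__, __i386__) and MSVC (_M_X64, _M_IX86) macros. */
static const char *x86_target(void)
{
#if defined(__x86_64__) || defined(_M_X64)
    return "x86-64";
#elif defined(__i386__) || defined(_M_IX86)
    return "i386";
#else
    return "not x86";
#endif
}
```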

Optimize zero-vectors.

This is more of a question, as I don't immediately see it in the code (because it uses code generation).

Does FFTS optimise the transformation of zero vectors? The FFT of a zero vector is again a zero vector. This would be especially useful in the 2D (and higher) case, as I know up front that for some transforms I will have a matrix with data only in the upper-left corner and zeros everywhere else.

As FFTS first does row-by-row Fourier transforms, I would like to pass it the information that only rows 0 to k are non-zero, so that it can skip all calculations on the remaining (all-zero) rows.
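The source does not say whether FFTS can be told to skip zero rows. For a caller-side row-column implementation, a minimal sketch of the idea is to find how many leading rows can be non-zero and run the row pass only on those. The column pass still has to cover every column, because after the row pass the columns are generally no longer zero. The function name `active_rows` is hypothetical.

```c
#include <complex.h>
#include <stddef.h>

/* Number of leading rows that may contain non-zero data in a row-major
 * rows x cols matrix: every row at index >= the return value is all zero,
 * so its 1D transform is all zero too and the row pass can skip it. */
static size_t active_rows(const float complex *a, size_t rows, size_t cols)
{
    size_t r = rows;
    while (r > 0) {
        const float complex *row = a + (r - 1) * cols;
        size_t c = 0;
        while (c < cols && row[c] == 0.0f)
            c++;
        if (c < cols)   /* found a non-zero entry in row r-1 */
            break;
        r--;
    }
    return r;
}
```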

iOS Crash Bug

After building FFTS for the iPhone, I've noticed a bug. When ffts_init_1d_real(1024, -1) is called from an app that was not launched through Xcode, the app crashes; it does not crash when launched through Xcode. If 1024 is replaced with 4096 or 8192, no issues arise. For what I want to achieve, I need to be able to use sizes of 512 and 1024.

Here's a sample project that reproduces the error. Run it with and without Xcode, and you'll see that the app crashes only when it is not launched through Xcode: https://www.dropbox.com/s/x6fv55wkcw1sk62/FFTSTest.zip

Device: iPhone 5, OS: iOS 7.0.4, SDK Version: 7.0

EDIT: After checking the crash reports from my device, the most common description is:

Exception Type: EXC_CRASH (SIGKILL - CODESIGNING)
Exception Codes: 0x0000000000000000, 0x0000000000000000

Thread 0 Crashed:
0 libsystem_platform.dylib 0x3aaecca8 sys_icache_invalidate + 8
1 FFTSTest 0x000e9fa0 ffts_generate_func_code + 6348
2 FFTSTest 0x000e7004 ffts_init_1d + 2244
3 FFTSTest 0x000e7ca4 ffts_init_1d_real + 156

Thread 0 crashed with ARM Thread State (32-bit):
r0: 0x02c24000 r1: 0x00002000 r2: 0x00000005 r3: 0x00000000
r4: 0x17d79780 r5: 0x02c24000 r6: 0x02c24a00 r7: 0x27d65d98
r8: 0x00000020 r9: 0x00101000 r10: 0x00000200 r11: 0x00000f00
ip: 0x80000000 sp: 0x27d65d28 lr: 0x000c0fa4 pc: 0x3b2bfca8
cpsr: 0x60000010

P.S. Thanks for creating FFTS. Everywhere I've used it without hitting the above crash, it really has been the fastest FFT :)

Documenting FFTS

Using knowledge of how the DFT is supposed to work (from reading the DSP Guide) and by reading through the source code, I was able to work out how this library is supposed to be used. However, the code could do with some documentation. I have the following questions:

Roughly how long until you intend to update the "Coming Soon" on the Documentation page?
In bullet-point form, what content were you planning to write on the blank Documentation page?
If you have no plans to write that guide, please let me know: I was planning to write a blog post about it, and you can take anything from it that you like. (This is just a heads-up.)
Instead of "Coming Soon", why not link back to this issue from the documentation page?

At any rate, thank you for writing FFTS; it is awesome and permissively licensed. You rock.

Tips on processing vectors whose sizes are not a power of 2

Just wondering if anyone has tips on how to use ffts on vectors whose sizes are not a power of 2. Off the top of my head, I suppose I could pad the signal with zeros?
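Zero-padding does work, with one caveat: padding an N-sample signal to M samples yields an M-point DFT, i.e. the spectrum sampled at bin spacing fs/M instead of fs/N, not the N-point DFT of the original signal. If the exact N-point transform is needed, a chirp-z (Bluestein) transform is the usual alternative. A minimal sketch of the padding step (hypothetical helper names):

```c
#include <stddef.h>
#include <string.h>

/* Smallest power of two >= n (for n >= 1). */
static size_t next_pow2(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Copies n samples into dst (capacity >= next_pow2(n)), zero-fills the
 * tail, and returns the padded length to use as the FFT size. */
static size_t zero_pad_pow2(const float *src, size_t n, float *dst)
{
    size_t m = next_pow2(n);
    memcpy(dst, src, n * sizeof *src);
    memset(dst + n, 0, (m - n) * sizeof *dst);
    return m;
}
```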

Inverse real 2D transforms

I get a segfault when doing 2D inverse transforms. It looks like it's expecting more memory to be allocated by the output buffer than is actually needed for the final image, since the output buffer is used for storing the results of the complex-to-complex 1D transforms.

adding -llog to build android

Hi,

I'm trying to build ffts (btw, excellent work, thanks) for the different ARM architectures of Android. On the first try, build_android.sh complained that __android_log_print was not available.

I just added -llog to the configure directive:
./configure --enable-neon --build=${CONFBUILD} --host=${CONFTARG} --prefix=$INSTALL_DIR LIBS="-lc -lgcc -llog"

This way, it compiles

Is there a better way to achieve this?

Suggestions for improving performance?

Hi,
I've been using this FFT code:
http://www.apo33.org/pub/puredata/APO/librairies_PD/recup/paraloeuil_v1_pd/src/d_mayer_fft.c
But the performance wasn't what I expected, so I switched to FFTS. However, I'm finding that FFTS performs slightly worse than that code, which is surprising. I'm using it for real-time audio processing in a VST plugin, and ideally I would like up to 100 simultaneous 1024-point FFTs running. At the moment I can only run 30 before CPU usage reaches 40%, which is too much for an audio plugin.
I'm developing on Windows. I pre-built the lib with cmake. I'm allocating the aligned memory like this:

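    #include <malloc.h>  // _aligned_malloc (MSVC)
    #include <string.h>  // memset
    // Zeroed, 32-byte-aligned buffer; _aligned_malloc already aligns it. Free with _aligned_free().
    static void* fft_alloc(size_t size) {
        float *data = (float*)_aligned_malloc(size, 32);
        if (data != NULL)
            memset(data, 0, size);
        return data;
    }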

And this is the method that executes it:

    void FFT::forward_real(int size, const void* in, void* out) {
        init(size);  // re-plans only when the size changes
        ffts_execute(real_fftPlan, in, out);
    }

init(size) only does work when the size has changed, so the FFT is not re-initialized on every call.
Any idea what's slowing things down?

Thanks
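If the same allocation helper needs to build outside MSVC, C11's `aligned_alloc` is the portable counterpart to `_aligned_malloc`. A minimal sketch, assuming a C11 toolchain (MSVC does not provide `aligned_alloc`, so on Windows `_aligned_malloc`/`_aligned_free` remain the right pair); the name `fft_alloc_portable` is hypothetical. This only addresses allocation and is not a claim about where the reported slowdown comes from.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Zeroed allocation aligned to `align` bytes (a power of two).
 * C11 originally required the size to be a multiple of the alignment,
 * so round it up. Release the result with free(). */
static void *fft_alloc_portable(size_t size, size_t align)
{
    size_t rounded = (size + align - 1) / align * align;
    void *p = aligned_alloc(align, rounded);
    if (p != NULL)
        memset(p, 0, rounded);
    return p;
}
```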

RTEMS port

Before I put an intern on it for the summer: has anybody looked at porting FFTS to RTEMS?

https://www.rtems.org/
https://devel.rtems.org/wiki/Developer/VirtualMachines/VirtualBox

Correctness tests?

Is FFTS ever checked for correctness? I'm becoming highly suspicious: comparing results from FFTS and MATLAB leads me to believe that FFTS is broken for non-square 2D transforms.

fatal error: 'ffts.h' file not found

Hi, I'm trying to build for OS X, but I'm getting this error.
I cd'd into the download directory and ran ./configure --enable-sse --enable-single --prefix=/usr/local, but make then fails with this error. ffts.h is clearly there, which makes me think the Makefile is wrong, but I don't know enough about Makefiles to understand the problem.
The Makefile contains this:
includedir = ${prefix}/include
If I move ffts.h there (i.e. under the prefix I passed in), the error moves on to malloc.h. So something's up.

Cheers,

Rob

Comparison with FFTW

I've written a benchmark to compare FFTS with FFTW:
https://github.com/syntheticpp/ffts/blob/bench/tests/bench_fftw.c
https://github.com/syntheticpp/ffts/tree/bench

I tested FFTS on two machines: on CPU 1, FFTS is faster than FFTW at every size; on CPU 2 (a newer CPU with AVX), FFTW is faster from N = 64 upward.
Does anyone have an explanation for this?

CPU 1:

model name  : Intel(R) Core(TM) i7 CPU 870  @ 2.93GHz
cache size  : 8192 KB
flags       : ... sse sse2 ssse3 sse4_1 sse4_2 ...
bogomips    : 5862.75

N = 2^ 2 =       4: Speed  1.5 GigaFLOPS  FFTS/FFTW = 1.31
N = 2^ 3 =       8: Speed  3.8 GigaFLOPS  FFTS/FFTW = 1.38
N = 2^ 4 =      16: Speed  8.0 GigaFLOPS  FFTS/FFTW = 1.88
N = 2^ 5 =      32: Speed 11.9 GigaFLOPS  FFTS/FFTW = 1.28
N = 2^ 6 =      64: Speed 15.5 GigaFLOPS  FFTS/FFTW = 1.24
N = 2^ 7 =     128: Speed 17.8 GigaFLOPS  FFTS/FFTW = 1.16
N = 2^ 8 =     256: Speed 18.6 GigaFLOPS  FFTS/FFTW = 1.14
N = 2^ 9 =     512: Speed 19.2 GigaFLOPS  FFTS/FFTW = 1.11
N = 2^10 =    1024: Speed 19.2 GigaFLOPS  FFTS/FFTW = 1.12
N = 2^11 =    2048: Speed 19.0 GigaFLOPS  FFTS/FFTW = 1.19
N = 2^12 =    4096: Speed 18.5 GigaFLOPS  FFTS/FFTW = 1.23
N = 2^13 =    8192: Speed 18.3 GigaFLOPS  FFTS/FFTW = 1.31
N = 2^14 =   16384: Speed 16.7 GigaFLOPS  FFTS/FFTW = 1.25
N = 2^15 =   32768: Speed 15.5 GigaFLOPS  FFTS/FFTW = 1.29
N = 2^16 =   65536: Speed 15.3 GigaFLOPS  FFTS/FFTW = 1.31
N = 2^17 =  131072: Speed 15.1 GigaFLOPS  FFTS/FFTW = 1.28
N = 2^18 =  262144: Speed 13.5 GigaFLOPS  FFTS/FFTW = 1.20

CPU 2:

model name  : Intel(R) Core(TM) i5-3340M CPU @ 2.70GHz
cache size  : 3072 KB
flags       : ... sse sse2 ssse3 sse4_1 sse4_2 avx ...
bogomips    : 5387.49

N = 2^ 2 =       4: Speed  1.4 GigaFLOPS  FFTS/FFTW = 1.25
N = 2^ 3 =       8: Speed  4.0 GigaFLOPS  FFTS/FFTW = 1.50
N = 2^ 4 =      16: Speed  8.6 GigaFLOPS  FFTS/FFTW = 1.81
N = 2^ 5 =      32: Speed 12.3 GigaFLOPS  FFTS/FFTW = 1.17
N = 2^ 6 =      64: Speed 17.9 GigaFLOPS  FFTS/FFTW = 0.91
N = 2^ 7 =     128: Speed 20.9 GigaFLOPS  FFTS/FFTW = 0.78
N = 2^ 8 =     256: Speed 22.4 GigaFLOPS  FFTS/FFTW = 0.70
N = 2^ 9 =     512: Speed 23.0 GigaFLOPS  FFTS/FFTW = 0.67
N = 2^10 =    1024: Speed 23.5 GigaFLOPS  FFTS/FFTW = 0.64
N = 2^11 =    2048: Speed 23.3 GigaFLOPS  FFTS/FFTW = 0.72
N = 2^12 =    4096: Speed 22.3 GigaFLOPS  FFTS/FFTW = 0.81
N = 2^13 =    8192: Speed 22.0 GigaFLOPS  FFTS/FFTW = 0.88
N = 2^14 =   16384: Speed 21.0 GigaFLOPS  FFTS/FFTW = 0.85
N = 2^15 =   32768: Speed 20.2 GigaFLOPS  FFTS/FFTW = 0.88
N = 2^16 =   65536: Speed 17.7 GigaFLOPS  FFTS/FFTW = 0.87
N = 2^17 =  131072: Speed 15.1 GigaFLOPS  FFTS/FFTW = 0.88
N = 2^18 =  262144: Speed 12.0 GigaFLOPS  FFTS/FFTW = 0.85

Support in place processing

For low memory devices, support for in place processing would help reduce memory usage.

No ARM64 support

The code will not compile for 64-bit Apple devices, and 64-bit support is now a requirement.

64-bit support would be great.

FFT across a certain dimension in a multi-dimensional array

Just wondering if it's possible to perform the following MATLAB style fft operation:

X = fft(x,[],dim)

where dim specifies the dimension along which the FFT is performed; e.g. fft(x,[],2) transforms all rows at once, treating each row as a 1D signal.

Q: Is the algorithm suitable for a Java implementation on Android, without native code?

Would it still maintain performance advantage over other java based implementations, like https://sites.google.com/site/piotrwendykier/software/jtransforms (single threaded)?

ffts_real only works for single-precision

It should also support double precision.

Segfault on x86 (misalignment in generated code)

I ran into a segfault when using data in a c++ std::array as the input and output of a 1d real transform. Here's a test case:

#include "ffts/ffts.h"

#include <array>
#include <complex>

int main()
{
    constexpr int size = 128;

    std::array<float, size> in;
    std::array<std::complex<float>, size / 2 + 1> out;

    in.fill(0.0);

    auto plan = ffts_init_1d_real(size, NEGATIVE_SIGN);
    ffts_execute(plan, in.data(), out.data());
    ffts_free(plan);
}

After noticing that it only happens with clang, and does not happen with a regular array (e.g. float in[size]), I had a chat with Chandler Carruth on the llvm IRC channel, the conclusion being:

12:41 <+chandlerc> if the segfault is occurring on a 'movaps' instruction, then 
                   its a common difference between clang generated code and gcc 
                   generated code on x86: gcc generates code which is *much* 
                   more tolerant of misalignment than clang does. if this 
                   library is misaligning the stack for example when calling 
                   back into C++ code, it can very easily trigger this
12:42 <+chandlerc> generated functions are fine in gdb, just 'disass' to look at 
                   the assembly
12:42 < SaBer> chandlerc: seems you are right: movaps 0x0(%rsi,%rax,4),%xmm7
12:42 <+chandlerc> hah
12:42 <+chandlerc> sorry,
12:42 <+chandlerc> =/
12:43 <+chandlerc> we've seen this in the JVM, Python, Ruby, and every other 
                   code generator so far
12:43 <+chandlerc> its a bug in ffts -- it needs to ensure the stack and 
                   variables are properly aligned according to the ABI when 
                   caling back into C/C++ code
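On the caller side, a workaround worth trying is to give the buffers explicit 16-byte-or-better alignment instead of relying on std::array<float, N>, which only guarantees float's 4-byte alignment. This does not address the stack-alignment problem inside FFTS's generated code that the IRC discussion points at. In C++ that would be `alignas(32) std::array<float, size> in;`; the sketch below shows the same idea in C11, plus a hypothetical `is_aligned` check that could be asserted before calling ffts_execute.

```c
#include <stdalign.h>
#include <stdint.h>

/* Non-zero if p is aligned to `align` bytes (align must be a power of two). */
static int is_aligned(const void *p, uintptr_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Buffers for a 128-point real transform, over-aligned so that aligned SSE
 * loads such as movaps (which fault on non-16-byte-aligned addresses) are safe.
 * The output holds N/2 + 1 interleaved complex values. */
static alignas(32) float in_buf[128];
static alignas(32) float out_buf[2 * (128 / 2 + 1)];
```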

build shared lib

Is there any particular reason that building a shared lib is disabled (LT_INIT([disable-shared])) in configure.ac?
I'd like to use FFTS as a shared lib, and if I remove that restriction it builds fine (on OSX), but is there something I'm missing - am I gonna have a bad time if I try and dlopen libffts.dylib?

Segfault with 2D real transform

I'm getting a segfault with this code:

#include <ffts/ffts.h>
#include <xmmintrin.h>
#include <string.h>
#include <stdio.h>
int main(void)
{
        float *input = _mm_malloc(64 * 64 * sizeof(float), 32);
        float *fft_input = _mm_malloc(64 * (64 / 2 + 1) * 2 * sizeof(float), 32);
        memset(input, 0, 64 * 64 * sizeof(float));

        ffts_plan_t *fft = ffts_init_2d_real(64, 64, -1);

        if(!fft)
        {
                fprintf(stderr, "Not supported.");
                return 0;
        }

        ffts_execute(fft, input, fft_input);

        ffts_free(fft);

        _mm_free(input);
        _mm_free(fft_input);

        return 0;
}

Running this command:

valgrind --track-origins=yes ./ffts_test

Gives this output:

==18817== Invalid write of size 8
==18817== at 0x401E7E: ffts_execute_nd_real (ffts_real_nd.c:45)
==18817== by 0x400CBF: main (main.c:19)
==18817== Address 0x5509320 is 0 bytes after a block of size 16,896 alloc'd
==18817== at 0x4C2A896: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18817== by 0x4C2A987: posix_memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18817== by 0x400C13: main (main.c:7)
==18817==
==18817== Conditional jump or move depends on uninitialised value(s)
==18817== at 0x401EB0: ffts_execute_nd_real (ffts_real_nd.c:63)
==18817== by 0x400CBF: main (main.c:19)
==18817== Uninitialised value was created by a heap allocation
==18817== at 0x4C2A896: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18817== by 0x4022C7: ffts_init_nd_real (ffts_real_nd.c:112)
==18817== by 0x402559: ffts_init_2d_real (ffts_real_nd.c:153)
==18817== by 0x400C70: main (main.c:11)
==18817==
==18817== Use of uninitialised value of size 8
==18817== at 0x400DC0: ffts_free (ffts.c:60)
==18817== by 0x400CC8: main (main.c:21)
==18817== Uninitialised value was created by a heap allocation
==18817== at 0x4C2A896: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18817== by 0x4022C7: ffts_init_nd_real (ffts_real_nd.c:112)
==18817== by 0x402559: ffts_init_2d_real (ffts_real_nd.c:153)
==18817== by 0x400C70: main (main.c:11)
==18817==
==18817== Jump to the invalid address stated on the next line
==18817== at 0x0: ???
==18817== by 0x515AEA4: (below main) (libc-start.c:260)
==18817== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==18817==
==18817==
==18817== Process terminating with default action of signal 11 (SIGSEGV)
==18817== Bad permissions for mapped region at address 0x0
==18817== at 0x0: ???
==18817== by 0x515AEA4: (below main) (libc-start.c:260)
==18817==
==18817== HEAP SUMMARY:
==18817== in use at exit: 71,240 bytes in 23 blocks
==18817== total heap usage: 38 allocs, 15 frees, 71,992 bytes allocated

I'm using Ubuntu 13.04 64 bit.

How to use it in the Android APP?

Hi, I'm sorry to ask such a basic question here.
I am using Android Studio to build an application (mainly in Java). Can someone tell me how to import FFTS into the application and test it? Which part should I import?
Thank you.

Building failed on Xcode 5

Hi.
I tried building ffts on my Mac by running "build_iphone.sh", and it failed.
(OSX 10.8.5, Xcode 5.0.2, with iOS SDK 7.0 or 6.1(failed both). ffts is the newest release.)

The console message:
"""
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
/Users/ulyness/Downloads/ffts-master/missing: Unknown '--is-lightweight' option
Try '/Users/ulyness/Downloads/ffts-master/missing --help' for more information
configure: WARNING: 'missing' script is too old or missing
checking for arm-eabi-strip... no
checking for strip... strip
configure: WARNING: using cross tools not prefixed with host triplet
checking for a thread-safe mkdir -p... ./install-sh -c -d
checking for gawk... no
checking for mawk... no
checking for nawk... no
checking for awk... awk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking build system type... i386-apple-darwin12.5.0
checking host system type... arm-unknown-eabi
checking for arm-eabi-g++... no
checking for arm-eabi-c++... no
checking for arm-eabi-gpp... no
checking for arm-eabi-aCC... no
checking for arm-eabi-CC... no
checking for arm-eabi-cxx... no
checking for arm-eabi-cc++... no
checking for arm-eabi-cl.exe... no
checking for arm-eabi-FCC... no
checking for arm-eabi-KCC... no
checking for arm-eabi-RCC... no
checking for arm-eabi-xlC_r... no
checking for arm-eabi-xlC... no
checking for g++... g++
checking whether the C++ compiler works... no
configure: error: in `/Users/ulyness/Downloads/ffts-master':
configure: error: C++ compiler cannot create executables
See `config.log' for more details
make: *** No rule to make target `clean'. Stop.
make: *** No targets specified and no makefile found. Stop.
make: *** No rule to make target `install'. Stop.
"""

And in the config.log, I found warning messages like this:
"""
ld: warning: ignoring file /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS6.1.sdk/usr/lib/crt1.o, missing required architecture x86_64 in file /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS6.1.sdk/usr/lib/crt1.o (4 slices)
ld: warning: ignoring file /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS6.1.sdk/usr/lib/libstdc++.dylib, missing required architecture x86_64 in file /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS6.1.sdk/usr/lib/libstdc++.dylib (2 slices)
ld: warning: ignoring file /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS6.1.sdk/usr/lib/libSystem.dylib, missing required architecture x86_64 in file /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS6.1.sdk/usr/lib/libSystem.dylib (2 slices)
ld: dynamic main executables must link with libSystem.dylib for architecture x86_64
"""

I've googled it and found some details. It seems that problems occur when upgrading Xcode from an older version to 5.0, but I can't find a better solution than deleting Xcode 5.0 and going back to an older version.

Any ideas or solutions?

MSVC support

Hi,

first, thanks for making this great library available to all, the benchmarks look very promising.
I would like to use the library under Windows and MacOS, but it seems MSVC is not currently supported. Are there any plans to add Windows support in the future?

Best,
Lorcan

static version depends on neon

Hi,

having a static version that works on x86 (SSE2 and non-sse2) would be nice.

Right now I can't use the dynamic code because porting it to Visual C++ 2010 is a nightmare. So I was hoping to compile a static version, only to find that it calls into NEON-specific code.

Plans to implement inverse fft?

Is there any plan to implement an inverse FFT, or is there an implementation I've missed?

Thanks!

results are conjugate of expected

I sincerely hope this is a stupid question and that you can set me straight.

I've built FFTS on x86_64 and am finding the results of the 1D and 2D (perhaps others, haven't tried) to have negated imaginary components, at least compared to other trusted FFT implementations, like numpy. See below:

X Y    FFTS                      NUMPY
       Re          Im
size 2
0 1     1.000000    0.000000      1.+0.j
1 1    -1.000000    0.000000     -1.+0.j
size 4
0 1     6.000000    0.000000      6.+0.j
1 1    -2.000000   -2.000000     -2.+2.j
2 1    -2.000000    0.000000     -2.+0.j
3 1    -2.000000    2.000000     -2.-2.j
size 8
0 1    28.000000    0.000000     28.+0.j
1 1    -4.000000   -9.656855     -4.+9.65685425j
2 1    -4.000000   -4.000000     -4.+4.j
3 1    -4.000000   -1.656854     -4.+1.65685425j
4 1    -4.000000    0.000000     -4.+0.j
5 1    -4.000000    1.656854     -4.-1.65685425j
6 1    -4.000000    4.000000     -4.-4.j
7 1    -4.000000    9.656855     -4.-9.65685425j

FFTS results come from the tests/test program. I can't think of a reason why I'm getting the conjugate of the expected result, and I'm hoping someone else can!

Build fails on x86_64 without --enable-sse

Simply because ffts.c includes macros.h, which, without any of the accelerator headers, has no way to know what type V should be.

This is either a bug or should be specified in the README.
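To illustrate the failure mode, here is a sketch of the kind of compile-time selection involved, with a plain-C fallback so that the type is always defined. It tests the compilers' predefined macros (`__SSE__`, `__ARM_NEON`) rather than whatever FFTS's macros.h actually checks, and a scalar struct alone would not make SIMD-specific arithmetic macros work; it only shows where the missing branch sits.

```c
/* Compile-time choice of a 4 x float vector type V, with a scalar fallback. */
#if defined(__SSE__)
#include <xmmintrin.h>
typedef __m128 V;
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
typedef float32x4_t V;
#else
typedef struct { float f[4]; } V;   /* plain C fallback: no SIMD extension */
#endif
```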

VFP version doesnt work with sign=1

Firstly, I appreciate your efforts to create this amazing library. Unfortunately, support for VFP was not finished. Could you fix the VFP support by adding support for sign=1? Computations with sign=1 are wrong (they produce incorrect results). I would be very glad if this bug were fixed quickly.
Edit: Sorry, the bug concerns sign=1 (not -1). I simply forgot :(

Real version of FFT

Hi everyone.

Just a question: is there any specific/documented way to use a real 2D FFT/IFFT?

According to ffts.h file, a real fft of N elements expects a real input for a real-to-complex (r2c) transform, and a real ifft expects a complex input for a complex-to-real (c2r) transform. Does this mean that the initial allocation for the forward transform must be done as _mm_malloc(N * sizeof(float)) and the allocation for the inverse transform must be done as _mm_malloc(((N / 2) + 1) * sizeof(float))?

Experimenting with several forms of allocation crashes the library for me. Any help?
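A helper can make the size arithmetic explicit. The sketch below assumes the conventional real-to-complex layout: N/2 + 1 complex bins stored as interleaved re/im float pairs, and, for 2D, the last (contiguous) dimension halved. That matches the 64 * (64 / 2 + 1) * 2 floats allocated in the 2D segfault report on this page. Under that layout, the inverse transform's complex input needs 2 * ((N / 2) + 1) floats, not (N / 2) + 1. The helper names are hypothetical, and the segfault reports here suggest some FFTS versions touched memory beyond these sizes, so treat them as a lower bound.

```c
#include <stddef.h>

/* Floats needed for the complex side of an N-point real transform:
 * N/2 + 1 complex bins, each an interleaved (re, im) pair. */
static size_t r2c_1d_out_floats(size_t n)
{
    return 2 * (n / 2 + 1);
}

/* 2D rows x cols real transform: only the last dimension is halved. */
static size_t r2c_2d_out_floats(size_t rows, size_t cols)
{
    return rows * 2 * (cols / 2 + 1);
}
```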

Guidance setting up a 2d fft, reverse and forward?

Hey, I'm trying hard to use this great FFT library. Is there a short example of how to use ffts_init_2d_real() with ffts_execute()? The only example is 1D, and what I can glean from a couple of bug reports about ffts_init_2d_real() leaves me confused. Or should I ask on Stack Overflow?

Backwards transforms on ARM

Need to swap some add and sub instructions for backward transforms to work on ARM. Will probably delay doing this until I implement intermediate form, which will fix it anyway.

Strange ARM results with iOS device

Has anyone tried running the ARM NEON code on an iOS device? I am getting very incorrect values from an ffts_init_1d_real forward transform for any signal size of 16 and above. I am testing with a ramp input, x[i] = i + 1, i.e. 1, 2, 3, 4, ...

At this point I have tried the master branch and a few forks, but no matter what I try, something is wrong with the results on ARM. The exact same code works perfectly with SSE but gives completely wrong results on ARM.

icc compiler problem

As far as I know, the default compiler for ffts on Linux is gcc.
Does anyone know how to build it with icc instead of gcc?

I used the command "./configure CC=icc CFLAGS=icc --enable-sse --enable-single --prefix=/usr/local", but it didn't work.

Thanks!!

Single and double precision?

When running ./configure with --enable-single enabled, does that preclude the ability to perform an FFT on a double precision data? I'm sorry if this is a stupid question, but I can't find an explanation of the options anywhere.

2D R2C/C2R transform on NEON gives bus error

Hi,

I'm using ffts to process pictures, and was about to switch from naive C2C transforms to R2C/C2R, when I noticed that it was crashing on ARM, because of alignment issues.
The nd transform is calling the 1d transform on buffers like: a=0xb6dd3400 b=0xb6e59408.

I'm checking whether I can fix that...
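Checking the two reported addresses: a is 16-byte aligned, while b sits 8 bytes past a 16-byte boundary. One plausible cause, not confirmed by the report, is the row stride of a real-to-complex layout: with N/2 + 1 complex floats per row, each row is 8 x (N/2 + 1) bytes long, and since N/2 + 1 is odd for any power-of-two N >= 4, every other row starts 8 bytes off a 16-byte boundary. NEON loads that carry a 128-bit alignment hint fault on such addresses. A sketch with hypothetical helper names:

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes by which addr lies past the previous multiple of align (a power of two). */
static unsigned misalignment(uintptr_t addr, uintptr_t align)
{
    return (unsigned)(addr & (align - 1));
}

/* Start of row r in a buffer of interleaved complex floats with `bins`
 * complex values per row (e.g. bins = N/2 + 1 for an R2C output). */
static uintptr_t row_start(uintptr_t base, size_t r, size_t bins)
{
    return base + (uintptr_t)(r * bins * 2 * sizeof(float));
}
```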

Linux: Make broken

I tried compiling the latest release for this on Ubuntu 16.04.

this commit 92c8a59, broke the makefile.

There is still one double_t left in src/ffts_trig.c.

Here is the error while compiling:

ffts_trig.c: In function ‘ffts_generate_cosine_sine_pow2_32f’:
ffts_trig.c:884:5: error: unknown type name ‘constffts_’
     constffts_ double_t *FFTS_RESTRICT hs;
     ^
ffts_trig.c:884:25: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘*’ token
     constffts_ double_t *FFTS_RESTRICT hs;
                         ^
ffts_trig.c:914:5: error: ‘hs’ undeclared (first use in this function)
     hs = FFTS_ASSUME_ALIGNED_16(&half_secant[2 * offset]);

After fixing that line, I got this error while building:

make[2]: Entering directory '/tmp/ffts/tests'
gcc -DHAVE_CONFIG_H -I. -I..     -g -O2 -MT test.o -MD -MP -MF .deps/test.Tpo -c -o test.o test.c
mv -f .deps/test.Tpo .deps/test.Po
/bin/bash ../libtool  --tag=CC   --mode=link gcc  -g -O2   -o test test.o ../src/libffts.la -lm 
libtool: link: gcc -g -O2 -o test test.o  ../src/.libs/libffts.a -lm
../src/.libs/libffts.a(ffts.o): In function `ffts_init_1d':
/tmp/ffts/src/ffts.c:434: undefined reference to `ffts_chirp_z_init'
collect2: error: ld returned 1 exit status
Makefile:360: recipe for target 'test' failed
make[2]: *** [test] Error 1
make[2]: Leaving directory '/tmp/ffts/tests'
Makefile:476: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/tmp/ffts'
Makefile:385: recipe for target 'all' failed
make: *** [all] Error 2

How to support mips ?

It looks like only x86 and ARM are supported. How can I use it on a MIPS CPU? Thanks

Building on Debian Jessie?

x86_64.

The configure script throws a syntax error:

 $ ./configure --enable-sse --enable-single --prefix=/usr/local
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
/home/kyle/Projects/ffts/missing: Unknown '--is-lightweight' option
Try '/home/kyle/Projects/ffts/missing --help' for more information
configure: WARNING: 'missing' script is too old or missing
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking for g++... g++
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking for style of include used by make... GNU
checking dependency style of g++... gcc3
checking for gcc... gcc
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether gcc understands -c and -o together... yes
checking dependency style of gcc... gcc3
./configure: line 4600: syntax error near unexpected token `disable-shared'
./configure: line 4600: `LT_INIT(disable-shared)'

So I'm not too surprised when I see

$ make
 cd . && automake-1.14 --foreign
configure.ac:6: warning: AM_INIT_AUTOMAKE: two- and three-arguments forms are deprecated.  For more info, see:
configure.ac:6: http://www.gnu.org/software/automake/manual/automake.html#Modernize-AM_005fINIT_005fAUTOMAKE-invocation
configure.ac:17: error: required file './compile' not found
configure.ac:17:   'automake --add-missing' can install 'compile'
java/Makefile.am:18: error: Libtool library used but 'LIBTOOL' is undefined
java/Makefile.am:18:   The usual way to define 'LIBTOOL' is to add 'LT_INIT'
java/Makefile.am:18:   to 'configure.ac' and run 'aclocal' and 'autoconf' again.
java/Makefile.am:18:   If 'LT_INIT' is in 'configure.ac', make sure
java/Makefile.am:18:   its definition is in aclocal's search path.
java/Makefile.am:19: warning: source file 'jni/ffts_jni.c' is in a subdirectory,
java/Makefile.am:19: but option 'subdir-objects' is disabled
automake-1.14: warning: possible forward-incompatibility.
automake-1.14: At least a source file is in a subdirectory, but the 'subdir-objects'
automake-1.14: automake option hasn't been enabled.  For now, the corresponding output
automake-1.14: object file(s) will be placed in the top-level directory.  However,
automake-1.14: this behaviour will change in future Automake versions: they will
automake-1.14: unconditionally cause object files to be placed in the same subdirectory
automake-1.14: of the corresponding sources.
automake-1.14: You are advised to start using 'subdir-objects' option throughout your
automake-1.14: project, to avoid future incompatibilities.
src/Makefile.am:3: error: Libtool library used but 'LIBTOOL' is undefined
src/Makefile.am:3:   The usual way to define 'LIBTOOL' is to add 'LT_INIT'
src/Makefile.am:3:   to 'configure.ac' and run 'aclocal' and 'autoconf' again.
src/Makefile.am:3:   If 'LT_INIT' is in 'configure.ac', make sure
src/Makefile.am:3:   its definition is in aclocal's search path.

SIGILL in neon_transpose8 on ARM 32bit?

Attempting to run a 512x512 2D test on a 32-bit ARM7 (Allwinner H3). The 1D transforms appear to work correctly.

Sadly, I am no ARM assembler expert. Looking up the stack at https://github.com/anthonix/ffts/blob/master/src/ffts_transpose.c#L48, there is something that looks like it should be used for 32-bit code but is commented out (neon_transpose4(in, out, w, h);).

Any thoughts? Is anyone else working on ARM32?

The vanilla C implementation in ffts_transpose.c does work as a fallback.

0x000214a6 in neon_transpose8 () at src/neon.s:686
686       vpush   {q4-q7}
(gdb) where
#0  0x000214a6 in neon_transpose8 () at src/neon.s:686
#1  0x0001eaf6 in ffts_transpose (in=0xb6bed020, out=0xb6c6e020, w=256, h=256)
    at src/ffts_transpose.c:54
#2  0x000199c2 in ffts_execute_nd (p=0x35008, in=0xb6cef020, out=0xb6c6e020)
    at src/ffts_nd.c:96
#3  0x00018226 in ffts_execute (p=0x35008, in=0xb6cef020, out=0xb6c6e020)
    at src/ffts.c:221
#4  0x00011f2c in try_2d (dimensions=65536) at basicfft.cpp:417
#5  0x0001206c in main (argc=1, argv=0xbefff7c4) at basicfft.cpp:447
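For comparison, a plain-C out-of-place transpose of a matrix of interleaved complex floats can look like the sketch below. It is an illustration only, not the code in src/ffts_transpose.c; the name `transpose_complex` and the row-major layout are assumptions.

```c
#include <stddef.h>

/* Out-of-place transpose of a row-major matrix with h rows and w columns of
 * complex values, each stored as two consecutive floats (re, im).
 * After the call, out is a row-major matrix with w rows and h columns. */
void transpose_complex(const float *in, float *out, size_t w, size_t h)
{
    for (size_t r = 0; r < h; r++) {
        for (size_t c = 0; c < w; c++) {
            const float *src = in + 2 * (r * w + c);
            float *dst = out + 2 * (c * h + r);
            dst[0] = src[0]; /* real part */
            dst[1] = src[1]; /* imaginary part */
        }
    }
}
```

This is much slower than a blocked SIMD transpose on large matrices, but it makes no alignment assumptions and uses no architecture-specific instructions.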

error about "--disable-dynamic-code"

When I add the option "--disable-dynamic-code", compilation fails.
The functions "neon_static_o_f" and "neon_static_e_b" are reported as undefined, although I can find the functions in the code.

Compilation failure on Fedora 21

Compilation fails on Fedora 21 as follows.

./configure --enable-sse --enable-single --prefix=/usr/local

completes successfully, but the subsequent make fails:

> make
 cd . && automake-1.14 --foreign
configure.ac:6: warning: AM_INIT_AUTOMAKE: two- and three-arguments forms are deprecated.  For more info, see:
configure.ac:6: http://www.gnu.org/software/automake/manual/automake.html#Modernize-AM_005fINIT_005fAUTOMAKE-invocation
configure.ac:17: error: required file './compile' not found
configure.ac:17:   'automake --add-missing' can install 'compile'
java/Makefile.am:19: warning: source file 'jni/ffts_jni.c' is in a subdirectory,
java/Makefile.am:19: but option 'subdir-objects' is disabled
automake-1.14: warning: possible forward-incompatibility.
automake-1.14: At least a source file is in a subdirectory, but the 'subdir-objects'
automake-1.14: automake option hasn't been enabled.  For now, the corresponding output
automake-1.14: object file(s) will be placed in the top-level directory.  However,
automake-1.14: this behaviour will change in future Automake versions: they will
automake-1.14: unconditionally cause object files to be placed in the same subdirectory
automake-1.14: of the corresponding sources.
automake-1.14: You are advised to start using 'subdir-objects' option throughout your
automake-1.14: project, to avoid future incompatibilities.
Makefile:377: recipe for target 'Makefile.in' failed
make: *** [Makefile.in] Error 1

Questions on build_android.sh

Firstly, build_android.sh seems to build only the native part of the library, not the Java wrapper that's needed to integrate it into a (non-NDK) Android app, right? Is there an official .java file somewhere that I've missed?

Secondly, the script fails on a 64-bit (OS X) system with the appropriate 64-bit NDK, since it sets $HOSTPLAT to darwin-x86 when it should be darwin-x86_64; the same probably holds true for Linux.

Thirdly, and lastly, I haven't been able to find any examples of Android apps that use ffts. I'm new to the whole NDK thing and could use a little guidance. Can you share some, or point me towards some?

Either way, thanks for building ffts! :)

ffts_init_1d slow?

Hi,

I have a library that can use several FFT backends (FFTS, FFTW3, my own, OS X, etc.). I ran a benchmark to check that FFTS was at least as fast as FFTW3, and found that FFTW3 was 9 times faster.

My benchmark was badly designed: I ran the FFT computation in a for loop, but also initialized the FFT on every iteration.

Once corrected (i.e., with the FFT initialized before the for loop), FFTS is a bit faster than FFTW3.

So I guess that the ffts_init_1d step is really slow compared to its FFTW3 equivalent.

Is this a known issue, or is it expected behaviour?

Thanks for your answer.

ffts_execute: input buffer needs to be aligned to a 128bit boundary

My data is in:
std::vector< float, boost::alignment::aligned_allocator< float,128 > > data;

According to boost::alignment::is_aligned(128, &data[0]), it is aligned to 128.
(But is that 128 bits, or 128 bytes?)

However, the error message
ffts_execute: input buffer needs to be aligned to a 128bit boundary
persists.

How do I get a 128-bit boundary?

It would be helpful if the solution to this were included in some documentation, or if the library had its own std::vector-compatible container with the right alignment.
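Note that 128 bits is 16 bytes, while the alignment parameter of Boost's aligned_allocator is given in bytes, so a 128-byte-aligned buffer already satisfies a 16-byte requirement; it may be worth checking that the pointer actually passed to ffts_execute is the element storage (data.data()) rather than the address of the vector object itself. In plain C, a 16-byte-aligned float buffer can be obtained with C11 aligned_alloc, as in the sketch below (the helper name is hypothetical); posix_memalign is an alternative on POSIX systems.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate room for `count` floats on a 16-byte (128-bit) boundary.
 * C11 aligned_alloc requires the size to be a multiple of the alignment,
 * so the byte count is rounded up. Release the buffer with free(). */
float *alloc_floats_aligned16(size_t count)
{
    size_t bytes = count * sizeof(float);
    bytes = (bytes + 15u) & ~(size_t)15u;
    if (bytes == 0)
        bytes = 16; /* avoid a zero-size request */
    return aligned_alloc(16, bytes);
}
```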

Is FFTS thread-safe?

Can I use a single ffts_plan_t object multiple times simultaneously from different threads using ffts_execute()?