Comments (15)
Katja from http://www.katjaas.nl calls the unmangle/mangle operations just split spectrum and merge spectrum
from lib_dsp.
that's much better name. We'll go with that. Thanks. So the API will be:
void lib_dsp_fft_bit_reverse( lib_dsp_fft_complex_t pts[], unsigned N );
void lib_dsp_fft_inverse(lib_dsp_fft_complex_t pts[], unsigned N, const int32_t sine[]);
void lib_dsp_fft_forward(lib_dsp_fft_complex_t pts[], unsigned N, const int32_t sine[]);
void lib_dsp_fft_split_spectrum(lib_dsp_fft_complex_t pts[], unsigned N);
void lib_dsp_fft_merge_spectrum(lib_dsp_fft_complex_t pts[], unsigned N);
from lib_dsp.
Do we really need the 16 bit versions too? If we instead had an efficient 16 -> 32 and 32 -> 16 array converter functions then we could do away with the 16 bit implementations at the cost of some run time memory (one length N 32 bit array). The benefit, other than implementation, would be increased precision on the 16 bit version.
from lib_dsp.
void lib_dsp_fft_short_to_long(lib_dsp_fft_complex_short_t pts_in[], lib_dsp_fft_complex_t pts_out[], unsigned N);
void lib_dsp_fft_long_to_short(lib_dsp_fft_complex_t pts_in[], lib_dsp_fft_complex_short_t pts_out[], unsigned N);
from lib_dsp.
I love that guys graphics.
On 8 Mar 2016, at 12:10, Larry Snizek <[email protected]mailto:[email protected]> wrote:
Katja from http://www.katjaas.nl calls the unmangle/mangle operations just split spectrum and merge spectrum
—
Reply to this email directly or view it on GitHubhttps://github.com//issues/25#issuecomment-193757874.
from lib_dsp.
ASJ - you might have a good point about only providing the 32 bit versions and then converting to 16 bit. In addition to the increased precision (which has been a big problem for Thomas, with the 16 bit version), this might actually also be more efficient than the direct 16 bit versions because the 16 bit versions are a lot slower, from Thomas' benchmarks.
Maybe we should get some feedback from Thomas and customer ?
J
from lib_dsp.
Love the graphics too, Henk
from lib_dsp.
If someone could contact them then that would be good. I want to get
this done asap as I want to use it properly soon. Thanks
On 08/03/2016 14:05, John Edwards wrote:
ASJ - you might have a good point about only providing the 32 bit
versions and then converting to 16 bit. In addition to the increased
precision (which has been a big problem for Thomas, with the 16 bit
version), this might actually also be more efficient than the direct
16 bit versions because the 16 bit versions are a lot slower, from
Thomas' benchmarks.
Maybe we should get some feedback from Thomas and AISpeech ?
J—
Reply to this email directly or view it on GitHub
#25 (comment).
from lib_dsp.
I can do that, Andrew. I've reviewed Thomas' emails and think I've got a handle on the performance differences. I'll let you know
from lib_dsp.
The accuracy will be very limited for a 16 bit implementation with the
current algorithm. However, if we go to the 32 bit approach then this
issue will be gone. I've written automated tests to verify correctness
and accuracy of the current implementation(32 bit). Forward is good
backwards is bad!
On 08/03/2016 14:22, John Edwards wrote:
I can do that, Andrew. I've reviewed Thomas' emails and think I've got
a handle on the performance differences. I'll let you know—
Reply to this email directly or view it on GitHub
#25 (comment).
from lib_dsp.
Hi Guys
The main constraint at customer is memory.
They need to do a 4k FFT on each of the 8 input channels. Sample data is 16 bit.
32 bit lib_dsp_fft_forward_tworeals was prohibitive as that would need 512k just for buffers:
4096 * 8 (channels) * 4 (bytes per sample) * 2 (re+im) * 2 (double buffers).
That was the motivation to implement the 16-bit version. But that would still need 256k and thus require partitioning the processing on two tiles.
And not leave much memory for the additional temporary 32-bit buffer: 4k_4_2 = 32k. I'm assuming a double buffer would be needed for performance.
But:
Now that the method using the halved spectrum was proposed by Andrew and accepted by customer, the buffer requirements are relaxed:
4096 * 8 (channels) * 2 (bytes per sample) * 2 (double buffers) = 128 kB
That means the additional temporary buffer proposed to convert from short int to int before the FFT should fit.
An additional advantage would be that this temp buffer is only needed on the one tile.
Here's the test for the forward FFT using the new method:
https://github.com/ThomasGmeinder/lib_dsp/blob/master/AN00209_xCORE-200_DSP_Library/app_transforms/src/app_transforms.xc#L279
Regarding Performance:
The 16 bit FFT lib_dsp_fft_forward_complex_shortfor 4k points takes 796855 cycles (7.9ms)
The 32 bit version lib_dsp_fft_forward_complex takes 524504 cycles (5.25ms)
The overhead of 2.54ms is due to unpacking short int from memory into int (for the maccs, + and - of the butterfly)
and then back into into short int before packing and storing to memory.
The function Andrew proposes to copy a 16-bit short int array in a 32-bit int array should be faster than that.
Something like this:
copy_loop:
//load 16-bit im and re from lib_dsp_fft_complex_t
{ld16s r6, r4[r2]; sub r2, r2, 1} // r6: re; decrement offset
{ld16s r3, r4[r2]; sub r2, r2, 1} // r3: im; decrement offset
std r3, r6, r5[r6] // store 32-bit im and re to lib_dsp_fft_complex_t
{bt r2, copy_loop; sub r6, r6, 1}
would take: 16384 cycles (0.164 ms).
Probably an additional cycle due to FNOP but that would still be a lot faster (0.205 ms).
Bottom line:
Performance and accuracy would be better.
The expense would be an additional 32k temp buffer which seems a worthy tradeoff.
But because memory is so tight for DOA+Beamforming+AEC (ideally also Keyword Recognition) we should also get feedback from customer.
Cheers, /T
from lib_dsp.
Thanks very much, Thomas,
I'll add your comments to the case, for Knight to check.
Enjoy the snow :-)
John
from lib_dsp.
I implemented and tested the array conversion functions:
void lib_dsp_fft_short_to_long(lib_dsp_fft_complex_short_t pts_in[], lib_dsp_fft_complex_t pts_out[], unsigned N);
void lib_dsp_fft_long_to_short(lib_dsp_fft_complex_t pts_in[], lib_dsp_fft_complex_short_t pts_out[], unsigned N);
Performance (cycles per im/re data point):
5 cycles (incl 1 FNOP):
https://github.com/ThomasGmeinder/lib_dsp/blob/master/lib_dsp/src/lib_dsp_fft_short_to_long.S#L27
5 cycles:
https://github.com/ThomasGmeinder/lib_dsp/blob/master/lib_dsp/src/lib_dsp_fft_long_to_short.S#L28
from lib_dsp.
that's much better name. We'll go with that. Thanks. So the API will be
Should be merge_spectra
rather than merge_spectrum
from lib_dsp.
all done now.
from lib_dsp.
Related Issues (20)
- FFT testing
- dsp_bfp documentation missing from rst
- dsp_bfp.h causes error when included by a C file HOT 2
- dsp_vector_abs often outputs negative numbers HOT 3
- dsp_vector_{muls_addv, muls_subv, mulv_addv, mulv_subv} do not give correct outputs for vector lengths > 7.
- Double-word alignment of inputs HOT 3
- xmake fails for app_vector HOT 1
- dsp_adaptive_nlms error when the input source_sample is a very low value
- DCT overflows rather than saturates
- Standardise naming convention + add regex regression tests
- Remove `_bfp` from functions that aren't BFP specific
- Add missing docstrings to api/dsp_bfp.h
- Simulator time-out in app_math tests
- Convert legacy tests to run with axe
- Extend non-XS1 processing to XS3 HOT 2
- tests/test_biquad does not build for xcore.ai
- tests/test_fft_forward does not build for xcore.ai HOT 1
- Some tests require autogenerated files and do not build for xcore.ai HOT 1
- tests/dsp_unit_tests cannot run under xcore.ai
- examples/AN00209 Application Note fails to build HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from lib_dsp.