vcdevel / vc

1.4K 67.0 150.0 11.28 MB

SIMD Vector Classes for C++

License: BSD 3-Clause "New" or "Revised" License

Shell 0.61% C++ 93.26% C 0.54% CMake 5.35% Makefile 0.03% MATLAB 0.13% Gnuplot 0.04% M 0.03%

vectorization parallel simd-vector simd-instructions simd avx c-plus-plus avx512 sse neon

vc's People

vks respu chr-engwer sawenzel nextgenintelligence oliverdaniell lepeltsmok svalleco rrahn sailfish009 jcowgill amadio baslr hkaiser zhongxingzhi hosiet gladhorn dhrvg yuhangwang hanumathrao manfrenc2 ycheparallelstudio apprisi edanor stellar-group crycrane templeblock sunmeng007 faroo28 extr15 lancelot899 yunhuiguo bulat-ziganshin lipufei sinozope dreamplayerzhang bremerm31 dimula73 amallia panda-lewandowski cjones051073 eminsight kiicvicnd twesterhout ghscan ktf shridharkini6 tao-githup adra0 stefanbruens umiss filiatra frank-lesser jianlirong johnmcfarlane outbackman hyeongijeon dendisuhubdy babywade arifahmed1995 dennisklein optimalbucket audiobucket alisw keithpitsui fcccode pauljurczak milianw mazalet sheerluck khawatkom sbastrakov andrelrt gangbanlau lilolbet rainergrimm ngzhappy stjordanis project-kotinos skn123 hanlin-w 5l1v3r1 joe-nano light1707 donproc castigli linev bijoumd78 eef808a24ff raphaelk12 ziqichai royzon lgh0504 michoumichmich d-hoke anders-wind manish364824 little-gu89 sreekanth370 crazyguitar

vc's Issues

Compilation error in math.cpp with icc (ICC) 15.0.3 20150407

Hi,
See the following error. Is this a problem with ICC, or can the test be improved?

[ 15%] Building CXX object math/vc/tests/CMakeFiles/math_scalar.dir/math.cpp.o
/mnt/build/jenkins/workspace/root-nightly-master/BUILDTYPE/Release/COMPILER/icc15/LABEL/slc6/root/math/vc/include/Vc/scalar/math.h(204): error: the global scope has no "isfinite"
              ::isfinite(x.data())
                ^
          detected during instantiation of "void testInf<Vec>() [with Vec=ROOT::Vc::float_v]" at line 915 of "/mnt/build/jenkins/workspace/root-nightly-master/BUILDTYPE/Release/COMPILER/icc15/LABEL/slc6/root/math/vc/tests/math.cpp"

compilation aborted for /mnt/build/jenkins/workspace/root-nightly-master/BUILDTYPE/Release/COMPILER/icc15/LABEL/slc6/root/math/vc/tests/math.cpp (code 2)
make[2]: *** [math/vc/tests/CMakeFiles/math_scalar.dir/math.cpp.o] Error 2
make[1]: *** [math/vc/tests/CMakeFiles/math_scalar.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
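The error suggests the call `::isfinite` relies on the name being visible in the global namespace. `<cmath>` only guarantees `std::isfinite`; whether it is also declared globally is unspecified, so ICC is within its rights. A minimal sketch of a scalar helper that qualifies the call (the name `scalar_isfinite` is hypothetical, not the actual code in `Vc/scalar/math.h`):

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical scalar helper: call std::isfinite rather than ::isfinite,
// because <cmath> only guarantees the overloads in namespace std.
template <typename T>
bool scalar_isfinite(T x) {
  return std::isfinite(x);
}
```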

conversion from uint_v to float_v rounds incorrectly for input values greater than 0xC0000000u

This issue is present in the SSE and AVX implementations.
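The issue does not show the SSE/AVX conversion code, so the following is only a hypothetical scalar model of how such an error can arise. SSE and AVX provide only a signed int32 to float conversion. An emulation that converts the value as signed and then adds 2^32 rounds twice, and double rounding can pick the wrong neighbor. Splitting the input into two 16-bit halves, each exactly representable as a float, leaves a single rounding step in the final addition. Both function names are illustrative, not Vc's.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical "naive" emulation: convert as signed int32, then add 2^32 for
// inputs >= 2^31. Both steps round, so some inputs are rounded twice.
float u32_to_float_naive(std::uint32_t x) {
  std::int64_t s = x;
  if (x >= 0x80000000u) s -= 4294967296LL;                      // value as int32
  float f = static_cast<float>(static_cast<std::int32_t>(s));  // rounding #1
  if (x >= 0x80000000u) f += 4294967296.0f;                    // rounding #2
  return f;
}

// Split emulation: both 16-bit halves are exact in float, and multiplying by
// 2^16 is exact, so only the final addition rounds.
float u32_to_float_split(std::uint32_t x) {
  const float hi = static_cast<float>(static_cast<std::int32_t>(x >> 16));
  const float lo = static_cast<float>(static_cast<std::int32_t>(x & 0xFFFFu));
  return hi * 65536.0f + lo;
}
```

For example, `0xC0000081u` (3221225601) should become 3221225728.0f. The naive path first rounds the signed value -1073741695 to -1073741696.0f. Adding 2^32 then lands exactly halfway between 3221225472 and 3221225728, and round-to-nearest-even picks 3221225472.0f.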

Vc_DEFINITIONS variable exported by FindVc.cmake should be Vc_COMPILE_OPTIONS

Despite its name, the variable Vc_DEFINITIONS does not contain compile definitions but other compiler flags: -march=core2 -msse2 -msse2 -msse3 -msse3 -mssse3 etc.

This results in broken compiler calls if processed as definitions, like:
set_property(TARGET target APPEND PROPERTY COMPILE_DEFINITIONS ${Vc_DEFINITIONS})

testFrexp<double_v> fails on MIC

 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-master/tests/math.cpp:796 (0x48d751)):
 FAIL: │ exp (<-1 0 1 0 | 0 0 2 0 | 3 0 0 0 | 3 0 3 0>) == reference (<-1 1 0 2 | 3 0 3 3 | 4 4 4 4 | 4 4 4 4>) -> «1000 0100 0000 0000» 
 FAIL: │ input: [0.25, 1, 0, 3, 4, 0.5, 6, 7], fraction: [0.5, 0.5, 0, 0.75, 0.5, 0.5, 0.75, 0.875], i: 0
 FAIL: ┕ testFrexp<double_v>
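The first eight entries of the `reference` row are exactly what `std::frexp` returns lane by lane for the printed input. The failing `exp` vector appears to hold the same eight exponents, but in every second 32-bit slot. A scalar sketch of that reference computation (hypothetical helper, using `std::array` in place of `double_v`):

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Scalar reference for a lane-wise frexp: fraction in [0.5, 1) (or 0 for 0)
// and an exponent such that value == fraction * 2^exponent.
template <std::size_t N>
void frexp_reference(const std::array<double, N>& in,
                     std::array<double, N>& fraction,
                     std::array<int, N>& exponent) {
  for (std::size_t i = 0; i < N; ++i) {
    fraction[i] = std::frexp(in[i], &exponent[i]);
  }
}
```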

Change the underlying types of Vector<T, Cuda> to CUDA's builtin vector types

Vc's CUDA Vector type currently uses an array of standard data types internally. We should find out whether these underlying types can be changed to arrays of CUDA's built-in vector types (e.g. float4) efficiently and without changing the per-warp boundary of each Vc::Vector. If so, we could operate on more data with the same number of threads.

Vc does not compile

I tried to compile d257efb on Linux with gcc 5.2.0:

[...]
~/src/Vc/common/simdarray.h:501:9: error: non-constant condition for static assertion
         static_assert(init.size() == size(), "The initializer_list argument to "
         ^
[...]

See the full error message for details.
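`init` is a function parameter, so `init.size()` is not a constant expression, even though `std::initializer_list::size()` is `constexpr` since C++14. GCC 5 therefore rejects it inside `static_assert`. A minimal sketch of one workaround, checking the size at run time instead (the class `FixedArray` is hypothetical, not Vc's `SimdArray`):

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Hypothetical fixed-size container initialized from a braced list.
template <typename T, std::size_t N>
class FixedArray {
 public:
  FixedArray(std::initializer_list<T> init) {
    // static_assert(init.size() == N, ...) is ill-formed: init is a run-time
    // value here, so the check has to happen at run time.
    assert(init.size() == N && "initializer_list must provide exactly N values");
    std::size_t i = 0;
    for (const T& v : init) {
      if (i == N) break;  // never write past the end, even with NDEBUG
      data_[i++] = v;
    }
  }
  static constexpr std::size_t size() { return N; }
  const T& operator[](std::size_t i) const { return data_[i]; }

 private:
  T data_[N] = {};
};
```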

howto install on Fedora 21

First of all: I have ZERO knowledge of what "build" / "cmake" etc. are. Consider me a dumb user here. I'm trying to install Vc on Fedora 21 because one of the packages I want to use needs it.

I'm stuck at:
"Call cmake with the relevant options"

What are the relevant options?

drop internal StaticCastHelper in favor of convert<From, To> function

VcMacros.cmake needs to warn about AVX and clang 3.6

clang 3.6 miscompiles AVX. There's no possible workaround in the Vc implementation. Therefore the cmake code should identify the broken state and disable the AVX implementation.

fix naming of simdarray / simd_mask_array

Either both with underscores or neither. Or CamelCase, which would be consistent with how the Vc naming convention started.
So I guess SimdArray and SimdMaskArray are the way to go.

MIC compilation fails because AddCompilerFlag determines ICC can't do C++11/14

See, e.g. https://cdash.gsi.de/viewConfigure.php?buildid=52620:

-- MIC ICC Version: "15.0.3.187 Build 20150407"
-- Performing Test check_cxx_compiler_flag__std_c__14
-- Performing Test check_cxx_compiler_flag__std_c__14 - Success
-- Performing Test Check MIC C++ Compiler flag -std=c++14 - Failed
-- Performing Test Check MIC C++ Compiler flag -std=c++1y - Failed
-- Performing Test Check MIC C++ Compiler flag -std=c++11 - Failed
-- Performing Test Check MIC C++ Compiler flag -std=c++0x - Failed
CMake Error at CMakeLists.txt:50 (message):
  Vc 1.x requires C++11, better even C++14.  The MIC native compiler does not
  support any of the C++11 language flags.

sorted_mic fails on testSort<ushort_v>

 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-master/tests/sorted.cpp:84 (0x40c99e)):
 FAIL: │ test.sorted() ([19453, 32498, 50405, 21770, 16366, 14157, 57316, 20117, 10049, 31502, 27598, 51386, 26371, 54045, 27285, 35138]) == ref ([10049, 14157, 16366, 19453, 20117, 21770, 26371, 27285, 27598, 31502, 32498, 35138, 50405, 51386, 54045, 57316]) -> m[0000 0000 0100 0000] 
 FAIL: ┕ testSort<ushort_v>
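The expected `ref` is the input sorted in ascending unsigned order. A scalar sketch of that reference (hypothetical helper). Five of the sixteen values exceed 32767, so the comparison must be unsigned to reproduce it.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Scalar reference for sorting the 16 lanes of a ushort_v-sized vector.
std::array<std::uint16_t, 16> sorted_reference(std::array<std::uint16_t, 16> v) {
  std::sort(v.begin(), v.end());  // ascending, unsigned comparison
  return v;
}
```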

clean up SortHelper

The SSE sort implementation should also move (at least in part) into libVc.a. Also, the code duplication between the integral vector types of the AVX and SSE implementations needs to be folded. This will subsequently allow a clean implementation of AVX2 (#11).

Allow optional integration of Intel's SVML library

Finally review 71bf93a for integration.

drop all internal namespaces and move everything to Vc::Detail

Right now there are several internal namespaces hiding implementation details. All of these namespaces need to be folded to Vc::Detail.

The arithmetics unit test exits with FPE on MIC

testModulo fails for SimdArray<ushort, 31> with a floating point exception (typically a division by zero using integers)

Implementation of Vector<T> and Mask<T> that works in CUDA kernels

Prototyping
develop a simple Vector kernel spawn function
modify tests/unittest.h to execute tests as kernels on the GPU
implement Mask<T>
make the arithmetics test pass
implement math functions
make the polar coordinate example execute on the GPU

modern compilers optimize the call to _UnitTest_verify_vector_unit_supported_result away

With GCC 5.1 one gets the following warnings:

Building CXX object vc/tests/CMakeFiles/supportfunctions.dir/supportfunctions.cpp.o
In file included from vc/tests/mask.cpp:20:0:
vc/tests/unittest.h:81:13: warning: ‘_UnitTest_verify_vector_unit_supported_result’ defined but not used [-Wunused-variable]
 static bool _UnitTest_verify_vector_unit_supported_result = _UnitTest_verify_vector_unit_supported();

See for example: http://cdash.cern.ch/viewBuildError.php?type=1&buildid=128688

The warning only appears with optimized builds.
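The variable exists only for the side effect of its initializer, and the warning reports that it is never read. A minimal sketch that keeps the initializer call and marks the variable as intentionally unused. All names are hypothetical stand-ins. `[[maybe_unused]]` needs C++17; GCC and clang's `__attribute__((unused))` is the pre-C++17 equivalent.

```cpp
#include <cassert>

namespace {
int verification_runs = 0;  // observable side effect for the sketch

// Hypothetical stand-in for _UnitTest_verify_vector_unit_supported().
bool verify_vector_unit_supported() {
  ++verification_runs;
  return true;
}

// The initializer runs during static initialization; [[maybe_unused]]
// tells the compiler that never reading the variable is intentional.
[[maybe_unused]] const bool vector_unit_verified = verify_vector_unit_supported();
}  // namespace
```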

CPU not known on 0.7.4

When using Vc 0.7.4, running cmake yields:

[...]
CMake Warning at cmake/OptimizeForArchitecture.cmake:110 (message):
  Your CPU (family 6, model 60) is not known.  Auto-detection of optimization
  flags failed and will use the 65nm Core 2 CPU settings.
Call Stack (most recent call first):
  cmake/OptimizeForArchitecture.cmake:159 (AutodetectHostArchitecture)
  cmake/VcMacros.cmake:382 (OptimizeForArchitecture)
  CMakeLists.txt:105 (vc_set_preferred_compiler_flags)


-- Detected CPU: merom
[...]

This has been fixed on master. It should probably be backported.

$ cat /proc/cpuinfo
processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 60
model name  : Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz
stepping    : 3
microcode   : 0x1a
cpu MHz     : 2838.671
cache size  : 6144 KB
physical id : 0
siblings    : 8
core id     : 0
cpu cores   : 4
apicid      : 1
initial apicid  : 1
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs        :
bogomips    : 4990.20
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:
[...]
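Family 6, model 60 (0x3C) is a Haswell part. The 0.7.4 detection evidently did not know it and fell back to the Core 2 (`merom`) settings. A hypothetical C++ sketch of such a lookup with an explicit fallback. The model list is deliberately partial, and this is not Vc's CMake code.

```cpp
#include <cassert>
#include <string>

// Partial, hypothetical map of Intel family-6 model numbers to
// microarchitecture names; unknown models fall back to "merom" (Core 2),
// mirroring the fallback reported in the CMake warning.
std::string intel_family6_arch(int model) {
  switch (model) {
    case 0x0F: return "merom";
    case 0x17: return "penryn";
    case 0x1A: case 0x1E: return "nehalem";
    case 0x25: case 0x2C: return "westmere";
    case 0x2A: case 0x2D: return "sandy-bridge";
    case 0x3A: case 0x3E: return "ivy-bridge";
    case 0x3C: case 0x3F: case 0x45: case 0x46: return "haswell";
    default: return "merom";
  }
}
```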

Regression: memory and math tests are failing after recent changes

See, e.g., https://cdash.gsi.de/viewTest.php?onlyfailed&buildid=74951

template instantiation depth exceeded when trying to compile master

Trying to compile master (18e6ba4) with gcc 5.1.0 yields this error:

In file included from /home/one/src/vc/tests/stlcontainer.cpp:24:0:
/home/one/src/vc/tests/unittest.h: In instantiation of ‘_UnitTest_Compare::_UnitTest_Compare(const T1&, const T2&, const char*, const char*, const char*, int) [with T1 = long unsigned int; T2 = int]’:
/home/one/src/vc/tests/stlcontainer.cpp:68:9:   required from ‘void stdVectorAlignment() [with V = Vc::v0::Scalar::Vector<float>]’
/home/one/src/vc/tests/stlcontainer.cpp:98:5:   required from here
/home/one/src/vc/tests/unittest.h:310:41: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
                 print(") -> "); print(a == b);
                                         ^
/home/one/src/vc/tests/unittest.h: In instantiation of ‘bool unittest_compareHelper(const T1&, const T2&) [with T1 = long unsigned int; T2 = int]’:
/home/one/src/vc/tests/unittest.h:303:62:   required from ‘_UnitTest_Compare::_UnitTest_Compare(const T1&, const T2&, const char*, const char*, const char*, int) [with T1 = long unsigned int; T2 = int]’
/home/one/src/vc/tests/stlcontainer.cpp:68:9:   required from ‘void stdVectorAlignment() [with V = Vc::v0::Scalar::Vector<float>]’
/home/one/src/vc/tests/stlcontainer.cpp:98:5:   required from here
/home/one/src/vc/tests/unittest.h:240:117: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
 template<typename T1, typename T2> static inline bool unittest_compareHelper( const T1 &a, const T2 &b ) { return a == b; }
                                                                                                                     ^
In file included from /usr/include/c++/5.1.0/bits/move.h:57:0,
                 from /usr/include/c++/5.1.0/bits/stl_pair.h:59,
                 from /usr/include/c++/5.1.0/utility:70,
                 from /usr/include/c++/5.1.0/algorithm:60,
                 from /home/one/src/vc/scalar/vector.h:24,
                 from /home/one/src/vc/include/Vc/vector.h:26,
                 from /home/one/src/vc/include/Vc/Vc:22,
                 from /home/one/src/vc/tests/unittest.h:29,
                 from /home/one/src/vc/tests/stlcontainer.cpp:24:
/usr/include/c++/5.1.0/type_traits: In instantiation of ‘struct std::is_reference<std::_Deque_iterator<Vc::v0::Scalar::Vector<float>, Vc::v0::Scalar::Vector<float>&, Vc::v0::Scalar::Vector<float>*>&>’:
/usr/include/c++/5.1.0/type_traits:114:12:   required from ‘struct std::__or_<std::is_reference<std::_Deque_iterator<Vc::v0::Scalar::Vector<float>, Vc::v0::Scalar::Vector<float>&, Vc::v0::Scalar::Vector<float>*>&>, std::is_void<std::_Deque_iterator<Vc::v0::Scalar::Vector<float>, Vc::v0::Scalar::Vector<float>&, Vc::v0::Scalar::Vector<float>*>&> >’
/usr/include/c++/5.1.0/type_traits:119:12:   required from ‘struct std::__or_<std::is_function<std::_Deque_iterator<Vc::v0::Scalar::Vector<float>, Vc::v0::Scalar::Vector<float>&, Vc::v0::Scalar::Vector<float>*>&>, std::is_reference<std::_Deque_iterator<Vc::v0::Scalar::Vector<float>, Vc::v0::Scalar::Vector<float>&, Vc::v0::Scalar::Vector<float>*>&>, std::is_void<std::_Deque_iterator<Vc::v0::Scalar::Vector<float>, Vc::v0::Scalar::Vector<float>&, Vc::v0::Scalar::Vector<float>*>&> >’
/usr/include/c++/5.1.0/type_traits:148:38:   required from ‘struct std::__not_<std::__or_<std::is_function<std::_Deque_iterator<Vc::v0::Scalar::Vector<float>, Vc::v0::Scalar::Vector<float>&, Vc::v0::Scalar::Vector<float>*>&>, std::is_reference<std::_Deque_iterator<Vc::v0::Scalar::Vector<float>, Vc::v0::Scalar::Vector<float>&, Vc::v0::Scalar::Vector<float>*>&>, std::is_void<std::_Deque_iterator<Vc::v0::Scalar::Vector<float>, Vc::v0::Scalar::Vector<float>&, Vc::v0::Scalar::Vector<float>*>&> > >’
/usr/include/c++/5.1.0/type_traits:564:12:   required from ‘struct std::is_object<std::_Deque_iterator<Vc::v0::Scalar::Vector<float>, Vc::v0::Scalar::Vector<float>&, Vc::v0::Scalar::Vector<float>*>&>’
/usr/include/c++/5.1.0/type_traits:114:12:   required from ‘struct std::__or_<std::is_object<std::_Deque_iterator<Vc::v0::Scalar::Vector<float>, Vc::v0::Scalar::Vector<float>&, Vc::v0::Scalar::Vector<float>*>&>, std::is_reference<std::_Deque_iterator<Vc::v0::Scalar::Vector<float>, Vc::v0::Scalar::Vector<float>&, Vc::v0::Scalar::Vector<float>*>&> >’
/usr/include/c++/5.1.0/type_traits:601:12:   [ skipping 21 instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]
/usr/include/c++/5.1.0/bits/stl_deque.h:519:61:   required from ‘std::_Deque_base<_Tp, _Alloc>::_Deque_base(std::_Deque_base<_Tp, _Alloc>&&) [with _Tp = Vc::v0::Scalar::Vector<float>; _Alloc = std::allocator<Vc::v0::Scalar::Vector<float> >]’
/usr/include/c++/5.1.0/bits/stl_deque.h:956:29:   required from ‘std::deque<_Tp, _Alloc>::deque(std::deque<_Tp, _Alloc>&&) [with _Tp = Vc::v0::Scalar::Vector<float>; _Alloc = std::allocator<Vc::v0::Scalar::Vector<float> >]’
/home/one/src/vc/common/makeContainer.h:114:62:   required from ‘constexpr decltype (Vc::v0::Public::{anonymous}::make_container_helper<Container, T>::help(list)) Vc::v0::Public::makeContainer(std::initializer_list<T>) [with Container = std::deque<Vc::v0::Scalar::Vector<float>, std::allocator<Vc::v0::Scalar::Vector<float> > >; T = float; decltype (Vc::v0::Public::{anonymous}::make_container_helper<Container, T>::help(list)) = std::deque<Vc::v0::Scalar::Vector<float>, std::allocator<Vc::v0::Scalar::Vector<float> > >]’
/home/one/src/vc/tests/stlcontainer.cpp:76:51:   required from ‘void listInitialization() [with V = Vc::v0::Scalar::Vector<float>; Container = std::deque<Vc::v0::Scalar::Vector<float>, std::allocator<Vc::v0::Scalar::Vector<float> > >]’
/home/one/src/vc/tests/stlcontainer.cpp:88:41:   required from ‘void listInitialization() [with V = Vc::v0::Scalar::Vector<float>]’
/home/one/src/vc/tests/stlcontainer.cpp:99:5:   required from here
/usr/include/c++/5.1.0/type_traits:544:12: fatal error: template instantiation depth exceeds maximum of 32 (use -ftemplate-depth= to increase the maximum)
     struct is_reference
            ^
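The two `-Wsign-compare` warnings come from `unittest_compareHelper` comparing a `long unsigned int` with an `int`. A minimal C++17 sketch of a sign-aware integer comparison that avoids both the warning and the wrap-around. `sign_safe_equal` is a hypothetical name; C++20 provides `std::cmp_equal` for the same purpose. This addresses only the warnings, not the instantiation-depth error.

```cpp
#include <cassert>
#include <type_traits>

// Compare two integers by value, regardless of signedness.
template <typename T1, typename T2>
constexpr bool sign_safe_equal(T1 a, T2 b) noexcept {
  static_assert(std::is_integral_v<T1> && std::is_integral_v<T2>, "integers only");
  if constexpr (std::is_signed_v<T1> == std::is_signed_v<T2>) {
    return a == b;  // same signedness: usual conversions preserve values
  } else if constexpr (std::is_signed_v<T1>) {
    return a >= 0 && static_cast<std::make_unsigned_t<T1>>(a) == b;
  } else {
    return b >= 0 && a == static_cast<std::make_unsigned_t<T2>>(b);
  }
}
```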

AVX int_v should have 4 entries; only with AVX2 does the int_v size increase to 8.

The old guarantee int_v::size() == float_v::size() was convenient but not general enough. Therefore, for Vc 1.0, the int_v size should follow the natural size of the target.
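The rule amounts to deriving the lane count from the width of the registers the target can do arithmetic in for that type, rather than from `float_v`. AVX has 256-bit floating-point but only 128-bit integer arithmetic, while AVX2 widens integer arithmetic to 256 bits. A hypothetical sketch, assuming 4-byte `int` and `float`:

```cpp
#include <cstddef>

// Hypothetical helper: lanes = register width in bytes / element size.
template <typename T>
constexpr std::size_t natural_lanes(std::size_t register_bits) {
  return register_bits / 8 / sizeof(T);
}

constexpr std::size_t avx_float_lanes = natural_lanes<float>(256);  // 8
constexpr std::size_t avx_int_lanes = natural_lanes<int>(128);      // 4: no 256-bit integer ops
constexpr std::size_t avx2_int_lanes = natural_lanes<int>(256);     // 8
```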

shiftedIn (utils) fails on MIC

 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-master/tests/utils.cpp:290 (0x47d1ec)):
 FAIL: │ test ([1985841714, -1704435280, 1969738457, -66191025, 861418630, -1576818331, 1237622026, -1537451734, -89579436, 291880, 1073393844, -1532297990, -1667382658, -989806726, 1700013330, -1854592241]) == reference ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1985841714]) -> m[0000 0000 0000 0000] 
 FAIL: │ shift = -31
 FAIL: │ data = [1985841713, -1704435281, 1969738456, -66191026, 861418629, -1576818332, 1237622025, -1537451735, -89579437, 291879, 1073393843, -1532297991, -1667382659, -989806727, 1700013329, -1854592242]
 FAIL: ┕ shiftedIn<   int_v>
 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-master/tests/utils.cpp:290 (0x48eb6d)):
 FAIL: │ test ([45467, 63687, 40666, 30924, 18369, 17391, 56030, 5372, 13546, 59789, 9339, 7660, 17275, 39479, 2955, 44934]) == reference ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 45467]) -> m[0000 0000 0000 0000] 
 FAIL: │ shift = -31
 FAIL: │ data = [45466, 63686, 40665, 30923, 18368, 17390, 56029, 5371, 13545, 59788, 9338, 7659, 17274, 39478, 2954, 44933]
 FAIL: ┕ shiftedIn<ushort_v>
 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-master/tests/utils.cpp:290 (0x4a193f)):
 FAIL: │ test ([1.520929742, 1.145010183, 1.230016591, 1.699677535, 1.114922915, 1.340313761, 1.90494338, 1.07857613]) == reference ([0, 0, 0, 0, 0, 0, 0, 1.520929742]) -> m[0000 0000] 
 FAIL: │ shift = -15
 FAIL: │ data = [0.52093, 0.14501, 0.230017, 0.699678, 0.114923, 0.340314, 0.904943, 0.0785761]
 FAIL: ┕ shiftedIn<double_v>
 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-master/tests/utils.cpp:290 (0x4a9a5c)):
 FAIL: │ test ([4155937596, 3874290018, 3043032750, 4087464754, 1864872371, 3179054537, 614350690, 1248903516, 3634659762, 30717206, 1362159638, 1927301256, 4049910438, 1662717141, 1413110167, 1043233605]) == reference ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4155937596]) -> m[0000 0000 0000 0000] 
 FAIL: │ shift = -31
 FAIL: │ data = [4155937595, 3874290017, 3043032749, 4087464753, 1864872370, 3179054536, 614350689, 1248903515, 3634659761, 30717205, 1362159637, 1927301255, 4049910437, 1662717140, 1413110166, 1043233604]
 FAIL: ┕ shiftedIn<  uint_v>
 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-master/tests/utils.cpp:290 (0x4bb5a3)):
 FAIL: │ test ([25960, 15968, 11416, -32661, 10661, 21143, 20415, 10655, 6077, 1950, -21582, 22813, 28080, 11131, -3171, -10438]) == reference ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25960]) -> m[0000 0000 0000 0000] 
 FAIL: │ shift = -31
 FAIL: │ data = [25959, 15967, 11415, -32662, 10660, 21142, 20414, 10654, 6076, 1949, -21583, 22812, 28079, 11130, -3172, -10439]
 FAIL: ┕ shiftedIn< short_v>
 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-master/tests/utils.cpp:290 (0x4cdf9c)):
 FAIL: │ test ([1.87137115, 1.794731736, 1.54643929, 1.808159113, 1.443829536, 1.893012047, 1.537759304, 1.28550458, 1.48706305, 1.185253739, 1.287488341, 1.123407364, 1.940476298, 1.982578158, 1.727294564, 1.660931826]) == reference ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1.87137115]) -> m[0000 0000 0000 0000] 
 FAIL: │ shift = -31
 FAIL: │ data = [0.871371, 0.794732, 0.546439, 0.808159, 0.44383, 0.893012, 0.537759, 0.285505, 0.487063, 0.185254, 0.287488, 0.123407, 0.940476, 0.982578, 0.727295, 0.660932]
 FAIL: ┕ shiftedIn< float_v>

Implement gathers with AVX2 intrinsics

Reference: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=gather&techs=MMX,SSE,SSE2,SSE3,SSSE3,SSE4_1,SSE4_2,AVX,AVX2,FMA
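A minimal sketch of an eight-lane float gather with `_mm256_i32gather_ps`. It assumes GCC or clang on x86, relies on `__attribute__((target("avx2")))` and `__builtin_cpu_supports`, and falls back to a scalar loop elsewhere. The function names are hypothetical; this is not Vc's gather interface.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define EXAMPLE_HAVE_X86 1
#else
#define EXAMPLE_HAVE_X86 0
#endif

// Scalar reference: out[i] = base[idx[i]].
std::array<float, 8> gather_scalar(const float* base, const std::array<std::int32_t, 8>& idx) {
  std::array<float, 8> out{};
  for (std::size_t i = 0; i < 8; ++i) out[i] = base[idx[i]];
  return out;
}

#if EXAMPLE_HAVE_X86
// Compiled for AVX2 only; called after a run-time CPU check.
__attribute__((target("avx2")))
std::array<float, 8> gather_avx2(const float* base, const std::array<std::int32_t, 8>& idx) {
  const __m256i vindex = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(idx.data()));
  const __m256 v = _mm256_i32gather_ps(base, vindex, 4);  // scale = sizeof(float)
  std::array<float, 8> out;
  _mm256_storeu_ps(out.data(), v);
  return out;
}
#endif

std::array<float, 8> gather(const float* base, const std::array<std::int32_t, 8>& idx) {
#if EXAMPLE_HAVE_X86
  if (__builtin_cpu_supports("avx2")) return gather_avx2(base, idx);
#endif
  return gather_scalar(base, idx);
}
```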

simdarray_mic segfaults

 PASS: store<SimdArray< float, 32>>
Remote process returned: -1
Exit reason: Exit reason 11 - Segmentation fault

have nightlies on MIC able to access the reference data from the host

The math_mic unit test cannot access the reference data since it only looks on the MIC filesystem. Somehow the unit test framework needs to supply the data to the MIC.

clang 3.7 miscompiles _mm_palignr_epi8

The current clang 3.7 release branch miscompiles _mm_palignr_epi8 for N greater than 16. This is reported in 24187. If the bug is not fixed for clang 3.7.0, I will need to implement workarounds in Vc that replace the broken palignr uses.
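For reference, `_mm_alignr_epi8(a, b, N)` concatenates `a:b` into 32 bytes (with `a` in the upper half), shifts right by `N` bytes, and keeps the low 16 bytes. For 16 <= N < 32 only bytes of `a` remain, so the result equals `_mm_srli_si128(a, N - 16)`; that is one plausible replacement for the miscompiled cases. The scalar models below (hypothetical helper names) check that identity without depending on any compiler's handling of the intrinsic.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

// Scalar model of _mm_alignr_epi8(a, b, n): byte i of the result is byte
// (i + n) of the 32-byte concatenation b (low) : a (high), or 0 past the end.
Bytes16 alignr_model(const Bytes16& a, const Bytes16& b, unsigned n) {
  Bytes16 r{};
  for (unsigned i = 0; i < 16; ++i) {
    const unsigned k = i + n;
    r[i] = k < 16 ? b[k] : (k < 32 ? a[k - 16] : 0);
  }
  return r;
}

// Scalar model of _mm_srli_si128(a, n): shift right by n bytes, zero fill.
Bytes16 srli_model(const Bytes16& a, unsigned n) {
  Bytes16 r{};
  for (unsigned i = 0; i < 16; ++i) r[i] = (i + n < 16) ? a[i + n] : 0;
  return r;
}
```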

master does not compile with clang 3.7

abi_AVX2 fails on nightlies

The ctest invocation cannot find the abi_AVX2 binary. It seems like there's still a dependency bug somewhere.

Instead of Scalar::Vector use a partial MIC::Vector with SimdArray for MIC

Consider SimdArray<float, 31> on MIC. This would be one SIMD register and 15 scalar registers (actually SIMD registers, each using only 1 of its 16 lanes, i.e. about 6%). That's crazy, since the platform has great support for masking.

The VectorAbi type for MIC could carry a number that determines the number of active lanes in the vector. All operations would then need to apply the corresponding mask implicitly.

This would solve the following issue:
If the SimdArray has to be built from MIC and Scalar Vector objects, subscripting can break because the two use different VectorEntryType types (which is the case for (u)short).
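The arithmetic behind the proposal: with 16-lane registers, `SimdArray<float, 31>` fits in two registers if the second one carries a 15-lane mask, instead of one register plus 15 scalars. A hypothetical C++14 sketch of computing that layout (the struct and function are illustrative, not part of Vc's `VectorAbi`):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout of an N-element array (N >= 1) on W-lane registers
// (1 <= W <= 64) when the last register is masked instead of split into scalars.
struct MaskedLayout {
  std::size_t registers;    // number of W-lane registers
  std::size_t active_last;  // active lanes in the last register
  std::uint64_t last_mask;  // bit i set => lane i of the last register is active
};

constexpr MaskedLayout masked_layout(std::size_t n, std::size_t w) {
  const std::size_t active = (n % w == 0) ? w : n % w;
  const std::uint64_t mask =
      (active == 64) ? ~std::uint64_t{0} : (std::uint64_t{1} << active) - 1;
  return MaskedLayout{(n + w - 1) / w, active, mask};
}
```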

swizzles_avx fails with GCC 5.1

https://cdash.gsi.de/testDetails.php?test=272280&build=51127

 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-0.7/tests/swizzles.cpp:115 (0x40da5b):
 FAIL: │ test.badc() ([23591, -7970, 18436, -13100, 29690, 24409, 12135, 29166]) == scalarSwizzle(test, BADC) ([23591, -7970, 18436, -13100, 29690, -7970, 12135, 29166]) -> m[1111 1011] 
 FAIL: ┕ testSwizzle<short_v>
 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-0.7/tests/swizzles.cpp:115 (0x40f367):
 FAIL: │ test.badc() ([57770, 9820, 42369, 36174, 26448, 46473, 35535, 47990]) == scalarSwizzle(test, BADC) ([57770, 9820, 42369, 36174, 26448, 9820, 35535, 47990]) -> m[1111 1011] 
 FAIL: ┕ testSwizzle<ushort_v>

write user documentation for SimdArray<T, N>

Disable AVX (via cmake and #ifdef) for clang 3.6.x

Because clang 3.6.x miscompiles AVX code, AVX should be disabled for it. The user should not have to reproduce the error.
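On the `#ifdef` side, a compile-time guard could look like the following sketch. The macro and function names are hypothetical, not Vc's. `__clang_major__` and `__clang_minor__` are clang's predefined version macros, but Apple's clang uses its own version numbering, so the sketch excludes it.

```cpp
// Hypothetical guard: treat upstream clang 3.6.x as unable to compile AVX code.
#if defined(__clang__) && !defined(__apple_build_version__) && \
    __clang_major__ == 3 && __clang_minor__ == 6
#define EXAMPLE_COMPILER_BREAKS_AVX 1
#else
#define EXAMPLE_COMPILER_BREAKS_AVX 0
#endif

// Build the AVX implementation only when the target supports AVX and the
// compiler is not on the known-broken list.
#if defined(__AVX__) && !EXAMPLE_COMPILER_BREAKS_AVX
#define EXAMPLE_ENABLE_AVX_IMPL 1
#else
#define EXAMPLE_ENABLE_AVX_IMPL 0
#endif

constexpr bool compiler_breaks_avx() { return EXAMPLE_COMPILER_BREAKS_AVX != 0; }
constexpr bool avx_impl_enabled() { return EXAMPLE_ENABLE_AVX_IMPL != 0; }
```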

gather/scatter offset calculation for double on ia32 incorrect

On ia32, double has sizeof(double) == 8 but alignof(double) == 4. That's why structures can have a sizeof that is not a multiple of 8, even though they contain doubles.
The gather/scatter implementation for arrays of structures reduces the gather to a gather on the fundamental type of the structure member and an index vector scaled to an array of such fundamental types.

The fix probably needs another (internal) gather/scatter overload that scales the index vector by an additional sizeof(MT), i.e. the index vector would then signify a byte offset.
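A scalar sketch of the byte-offset approach. `#pragma pack(push, 4)` (supported by GCC, clang and MSVC) reproduces the ia32 layout on any host, so `Particle` is 12 bytes. Element `i`'s `x` therefore sits at byte `12 * i`, which for odd `i` is not a whole number of `double`s, so an index vector scaled in units of `double` cannot address it. The struct and function are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

#pragma pack(push, 4)
struct Particle {
  double x;  // member to gather
  int id;
};
#pragma pack(pop)
static_assert(sizeof(Particle) == 12, "packed to mimic the ia32 layout");

// Gather one double member from an array of structs using byte offsets:
// offset = index * sizeof(Particle) + offsetof(Particle, member).
template <std::size_t N>
std::array<double, N> gather_member(const Particle* base,
                                    const std::array<int, N>& indexes,
                                    std::size_t member_offset) {
  const unsigned char* bytes = reinterpret_cast<const unsigned char*>(base);
  std::array<double, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    const std::size_t offset =
        static_cast<std::size_t>(indexes[i]) * sizeof(Particle) + member_offset;
    std::memcpy(&out[i], bytes + offset, sizeof(double));  // no alignment assumption
  }
  return out;
}
```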

drop swizzle API in favor of a new permutation generalization

1.0 and .75 compatibility with Vc::tie()

Hello, I am attempting to build Krita with Vc 1.0 support and ran into a bit of confusion. This Vc .75 code interleaves some data. (The data types are struct Pixel {float r, g, b, a;}; and Vc::float_v src_c1, src_c2, ... )

const Vc::uint_v indexes(Vc::IndexesFromZero);
Vc::InterleavedMemoryWrapper<Pixel, Vc::float_v> data(const_cast<Pixel*>(sp));
(src_c1, src_c2, src_c3, src_alpha) = data[indexes];

This failed to compile on Vc 1.0. By inspecting the example code, I got it working by substituting the type in the first line and adding Vc::tie() in the last line.

const Vc::float_v::IndexType indexes(Vc::IndexesFromZero);
Vc::InterleavedMemoryWrapper<Pixel, Vc::float_v> data(const_cast<Pixel*>(sp));
tie(src_c1, src_c2, src_c3, src_alpha) = data[indexes];

I think the change in the first line is acceptable in both versions, but the fix in the last line is problematic, since Vc::tie() is not defined in the older version. We can't drop support for compiling against the older version, since we want to build on Windows with MSVC 2015. My idea was that perhaps dropping the tie() could work in Vc 1.0, but that results in this template error:

error: no match for ‘operator=’ (operand types are ‘Vc_0::float_v {aka Vc_0::Vector<float, Vc_0::VectorAbi::Sse>}’ and ‘Vc_0::enable_if<true, Vc_0::Common::InterleavedMemoryAccess<4ul, Vc_0::Vector<float, Vc_0::VectorAbi::Sse>, Vc_0::SimdArray<int, 4ul, Vc_0::Vector<int, Vc_0::VectorAbi::Sse>, 4ul> > > {aka Vc_0::Common::InterleavedMemoryAccess<4ul, Vc_0::Vector<float, Vc_0::VectorAbi::Sse>, Vc_0::SimdArray<int, 4ul, Vc_0::Vector<int, Vc_0::VectorAbi::Sse>, 4ul> >}’)

no matching function call to frexp

When trying to compile staging, I get the following error:

/home/one/src/vc/tests/math.cpp: In instantiation of ‘void testFrexp() [with V = Vc_0::AVX2::Vector<float>]’:
/home/one/src/vc/tests/math.cpp:843:5:   required from here
/home/one/src/vc/tests/math.cpp:772:33: error: no matching function for call to ‘frexp(const Vc_0::AVX2::Vector<float>&, ExpV*)’
         const V fraction = frexp(v, &exp);
                                 ^

See https://gist.github.com/vks/e806d6695cd01726c9b5 for the full error.

$ cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 60
model name  : Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz
stepping    : 3
microcode   : 0x1a
cpu MHz     : 3464.941
cache size  : 6144 KB
physical id : 0
siblings    : 8
core id     : 0
cpu cores   : 4
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs        :
bogomips    : 4990.20
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor   : 1
[...]

cannot compile for MIC

I tried to build Vc for MIC with icc 15.0.1, commit d257efb.
The following errors are raised for both the CPU side (target Vc) and the MIC side (target Vc_MIC).

In file included from /home/kehw/hlt/Vc/mic/../common/../traits/type_traits.h(38),
                 from /home/kehw/hlt/Vc/mic/../common/types.h(38),
                 from /home/kehw/hlt/Vc/mic/../common/loadstoreflags.h(32),
                 from /home/kehw/hlt/Vc/mic/intrinsics.h(41),
                 from /home/kehw/hlt/Vc/src/mic_sorthelper.cpp(29):
/home/kehw/hlt/Vc/mic/../common/../traits/is_functor_argument_immutable.h(41): error: type name is not allowed
      typedef decltype(&F::template operator()<A>) type;
                                               ^

In file included from /home/kehw/hlt/Vc/mic/../common/../traits/type_traits.h(38),
                 from /home/kehw/hlt/Vc/mic/../common/types.h(38),
                 from /home/kehw/hlt/Vc/mic/../common/loadstoreflags.h(32),
                 from /home/kehw/hlt/Vc/mic/intrinsics.h(41),
                 from /home/kehw/hlt/Vc/src/mic_sorthelper.cpp(29):
/home/kehw/hlt/Vc/mic/../common/../traits/is_functor_argument_immutable.h(41): error: expected an expression
      typedef decltype(&F::template operator()<A>) type;
                                                 ^

In addition, in the file cmake/FindMIC.cmake, you mentioned that

   # For now offload is not supported so skip it

Does this mean we can not build Vc in offload mode?
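For context, the construct ICC 15 rejects in the errors above takes the address of a member `operator()` template specialization, `&F::template operator()<A>`, and GCC and clang accept it. A hypothetical sketch of a trait built on it that reports whether a functor takes its argument by `const&`. The names are illustrative; this is not Vc's `is_functor_argument_immutable`.

```cpp
#include <type_traits>

// Match the pointer-to-member type obtained from &F::template operator()<A>.
template <typename MemFn>
struct first_param_is_const_ref : std::false_type {};
template <typename R, typename C, typename A>
struct first_param_is_const_ref<R (C::*)(const A&) const> : std::true_type {};
template <typename R, typename C, typename A>
struct first_param_is_const_ref<R (C::*)(const A&)> : std::true_type {};

template <typename F, typename A>
struct takes_const_ref
    : first_param_is_const_ref<decltype(&F::template operator()<A>)> {};

// Example functors: one only reads its argument, one modifies it.
struct Reader {
  template <typename T> void operator()(const T&) const {}
};
struct Writer {
  template <typename T> void operator()(T& x) const { x = T(); }
};
```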

use policy argument instead of namespaces for selecting the implementation

transform Vc::<Impl>::Vector<T> to Vc::Vector<T, <Impl>>.
provide alias templates for old namespace based policy.
all tests compile
tests pass (that passed before)
build Scalar, SSE, AVX with ICC
port the MIC implementation
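The first two items can be sketched as follows. The implementation becomes an ABI tag template argument, and alias templates keep the old namespace spellings working. The target spelling matches `Vc_0::Vector<float, Vc_0::VectorAbi::Sse>` as it appears in a Vc 1.0 error message elsewhere on this page. Everything else (the `example` namespace, the single `size()` member) is a hypothetical reduction.

```cpp
#include <cstddef>
#include <type_traits>

namespace example {
// ABI tags select the implementation.
namespace VectorAbi {
struct Scalar {};
struct Sse {};
}  // namespace VectorAbi

// One class template parameterized by the ABI tag (members reduced to size()).
template <typename T, typename Abi>
class Vector;

template <typename T>
class Vector<T, VectorAbi::Scalar> {
 public:
  static constexpr std::size_t size() { return 1; }
};

template <typename T>
class Vector<T, VectorAbi::Sse> {
 public:
  static constexpr std::size_t size() { return 16 / sizeof(T); }
};

// Alias templates keep the old namespace-based spelling working.
namespace Scalar {
template <typename T> using Vector = example::Vector<T, VectorAbi::Scalar>;
}
namespace SSE {
template <typename T> using Vector = example::Vector<T, VectorAbi::Sse>;
}
}  // namespace example
```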

consider implementing std::get and std::tuple_size for class templates implementing Vc_SIMDIZE_INTERFACE

The Vc_SIMDIZE_INTERFACE macro allows users to create class templates with minimal boilerplate code that can be introspected by the simdize code. It should be possible to add get and tuple_size implementations in the std namespace that specialize for these user-defined class-templates. This might make such types even more useful. But just doing it because I can is not enough justification.

I'm keeping this issue open so that someone can comment on or bump it when there's a real use case or issue to solve.
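If someone does want this, one detail matters. The standard permits specializing `std::tuple_size` and `std::tuple_element` for user types, but not adding `get` overloads to namespace `std`. A `get` in the user's own namespace, found by argument-dependent lookup, is enough for C++17 structured bindings. A hypothetical sketch, with a hand-written class template standing in for one declared via `Vc_SIMDIZE_INTERFACE`:

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>

namespace geom {
template <typename T>
struct Point3 {
  T x, y, z;
};

// get<I> lives in the type's own namespace and is found by ADL.
template <std::size_t I, typename T>
constexpr T& get(Point3<T>& p) noexcept {
  static_assert(I < 3, "Point3 has three members");
  if constexpr (I == 0) return p.x;
  else if constexpr (I == 1) return p.y;
  else return p.z;
}

template <std::size_t I, typename T>
constexpr const T& get(const Point3<T>& p) noexcept {
  static_assert(I < 3, "Point3 has three members");
  if constexpr (I == 0) return p.x;
  else if constexpr (I == 1) return p.y;
  else return p.z;
}
}  // namespace geom

// Specializing these two class templates for a user type is permitted.
namespace std {
template <typename T>
struct tuple_size<geom::Point3<T>> : integral_constant<size_t, 3> {};

template <size_t I, typename T>
struct tuple_element<I, geom::Point3<T>> {
  using type = T;
};
}  // namespace std
```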

write user documentation for simdize<T>

subscript_mic segfaults

 PASS: gathers<ushort_v>
Remote process returned: -1
Exit reason: Exit reason 11 - Segmentation fault

testInf and testNaN fail on MIC

 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-master/tests/math.cpp:572 (0x4639fb)):
 FAIL: │ none_of(Vc::isfinite(inf)) 
 FAIL: ┕ testInf< float_v>
 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-master/tests/math.cpp:572 (0x4646ae)):
 FAIL: │ none_of(Vc::isfinite(inf)) 
 FAIL: ┕ testInf<double_v>
 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-master/tests/math.cpp:591 (0x4653c8)):
 FAIL: │ all_of(Vc::isnan(Vec(inf * zero))) 
 FAIL: ┕ testNaN< float_v>
 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-master/tests/math.cpp:591 (0x46631f)):
 FAIL: │ all_of(Vc::isnan(Vec(inf * zero))) 
 FAIL: ┕ testNaN<double_v>
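The expected results follow from the IEEE 754 binary32 encoding. A value is finite unless its exponent bits are all ones, and it is a NaN when the exponent bits are all ones and the mantissa is non-zero; `inf * 0` is a NaN. A scalar sketch of these bit tests (hypothetical helper names), checked against `std::isfinite` and `std::isnan`. Note that GCC's `-ffinite-math-only` (part of `-ffast-math`) lets the compiler assume no infinities or NaNs occur, which can fold library-based checks.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Reinterpret the bits of a float (memcpy avoids aliasing issues).
std::uint32_t float_bits(float x) {
  std::uint32_t u;
  static_assert(sizeof u == sizeof x, "binary32 expected");
  std::memcpy(&u, &x, sizeof u);
  return u;
}

// Finite: the exponent field (bits 23..30) is not all ones.
bool bits_isfinite(float x) { return (float_bits(x) & 0x7F800000u) != 0x7F800000u; }

// NaN: exponent all ones and non-zero mantissa, i.e. |bits| > bits of +inf.
bool bits_isnan(float x) { return (float_bits(x) & 0x7FFFFFFFu) > 0x7F800000u; }
```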

unit test for VectorAlignedBase

The VectorAlignedBase class is a hack to help users get alignment on heap allocation right. C++14 still does not support new on over-aligned types. But without a unit test this feature might just not work as intended.

generalize AlignedBase
test them
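A minimal sketch of such a base class together with the kind of check a unit test could perform. It assumes C++17, where `::operator new(std::size_t, std::align_val_t)` exists. A C++14 version, as the issue describes, would have to over-allocate and align by hand instead. `AlignedHeapBase` and `Block` are hypothetical names, not Vc's `VectorAlignedBase`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical base: class-specific allocation functions that forward to
// the C++17 aligned global operator new/delete.
template <std::size_t Alignment>
struct AlignedHeapBase {
  static void* operator new(std::size_t size) {
    return ::operator new(size, std::align_val_t(Alignment));
  }
  static void* operator new[](std::size_t size) {
    return ::operator new[](size, std::align_val_t(Alignment));
  }
  static void operator delete(void* p) noexcept {
    ::operator delete(p, std::align_val_t(Alignment));
  }
  static void operator delete[](void* p) noexcept {
    ::operator delete[](p, std::align_val_t(Alignment));
  }
};

// Not over-aligned itself; the base guarantees 64-byte heap alignment.
struct Block : AlignedHeapBase<64> {
  float values[16];
};

template <typename T>
bool is_aligned(const T* p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```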

get rid of Internal::Helper

This is just one more variant for forwarding to the correct implementation. Having 4 different abstractions to do that is just frustrating.
I should first finish #12, though.

AVX2 support

refactor class design to avoid code duplication between SSE <-> AVX <-> AVX2
increase the int_v and short_v vector sizes
look into usefulness of BMI(2)

deinterleave_mic segfaults

 PASS: testDeinterleave<{ float_v,  float}>
 PASS: testDeinterleave<{ float_v, ushort}>
 PASS: testDeinterleave<{ float_v,  short}>
 PASS: testDeinterleave<{double_v, double}>
 PASS: testDeinterleave<{   int_v,    int}>
 PASS: testDeinterleave<{   int_v,  short}>
 PASS: testDeinterleave<{  uint_v,   uint}>
 PASS: testDeinterleave<{  uint_v, ushort}>
 PASS: testDeinterleave<{ short_v,  short}>
 PASS: testDeinterleave<{ushort_v, ushort}>
Remote process returned: -1
Exit reason: Exit reason 11 - Segmentation fault