vcdevel / vc

SIMD Vector Classes for C++

License: BSD 3-Clause "New" or "Revised" License

Languages: Shell 0.61%, C++ 93.26%, C 0.54%, CMake 5.35%, Makefile 0.03%, MATLAB 0.13%, Gnuplot 0.04%, M 0.03%
Topics: vectorization, parallel, simd-vector, simd-instructions, simd, avx, c-plus-plus, avx512, sse, neon

vc's People

Contributors

adra0, amadio, amyspark, axel-naumann, bernhardmgruber, bmanga, chr-engwer, corristo, dennisklein, ericators, ex-bart, gruenich, hahnjo, hkaiser, htfy96, j-stephan, jcowgill, kgnk, lduhem, linev, mattkretz, oshadura, pauljurczak, pikacic, pinotree, sawenzel, stefanbruens, stephanlachnit, themarix, vks

vc's Issues

Compilation error in math.cpp with icc (ICC) 15.0.3 20150407

Hi,
See the following error. Is this a problem with ICC, or can the test be improved?

[ 15%] Building CXX object math/vc/tests/CMakeFiles/math_scalar.dir/math.cpp.o
/mnt/build/jenkins/workspace/root-nightly-master/BUILDTYPE/Release/COMPILER/icc15/LABEL/slc6/root/math/vc/include/Vc/scalar/math.h(204): error: the global scope has no "isfinite"
              ::isfinite(x.data())
                ^
          detected during instantiation of "void testInf<Vec>() [with Vec=ROOT::Vc::float_v]" at line 915 of "/mnt/build/jenkins/workspace/root-nightly-master/BUILDTYPE/Release/COMPILER/icc15/LABEL/slc6/root/math/vc/tests/math.cpp"

compilation aborted for /mnt/build/jenkins/workspace/root-nightly-master/BUILDTYPE/Release/COMPILER/icc15/LABEL/slc6/root/math/vc/tests/math.cpp (code 2)
make[2]: *** [math/vc/tests/CMakeFiles/math_scalar.dir/math.cpp.o] Error 2
make[1]: *** [math/vc/tests/CMakeFiles/math_scalar.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
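
One possible direction, sketched here as an assumption rather than as the fix that went into Vc: the error comes from a call to `::isfinite`, which only works if the C library headers happen to declare `isfinite` in the global namespace. C++11 guarantees the function in namespace `std` after including `<cmath>`, so qualifying the call with `std::` removes that dependency. The name `scalar_isfinite` is hypothetical and only stands in for the code at `Vc/scalar/math.h(204)`.

```cpp
#include <cmath>
#include <limits>

// Hypothetical stand-in for a scalar isfinite wrapper; not Vc's actual code.
// std::isfinite is declared by <cmath> in C++11, whereas a global-scope
// ::isfinite is only available if the C library headers also provide it.
template <typename T>
inline bool scalar_isfinite(T x)
{
    return std::isfinite(x);
}
```

Calling `scalar_isfinite(std::numeric_limits<float>::infinity())` yields `false`, which is the property `testInf` checks.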

Vc_DEFINITIONS variable exported by FindVc.cmake should be Vc_COMPILE_OPTIONS

Despite its name, the variable Vc_DEFINITIONS does not contain compile definitions but other compiler flags: -march=core2 -msse2 -msse2 -msse3 -msse3 -mssse3 etc.

This results in broken compiler calls if processed as definitions, like:
set_property(TARGET target APPEND PROPERTY COMPILE_DEFINITIONS ${Vc_DEFINITIONS})

testFrexp<double_v> fails on MIC

 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-master/tests/math.cpp:796 (0x48d751)):
 FAIL: │ exp (<-1 0 1 0 | 0 0 2 0 | 3 0 0 0 | 3 0 3 0>) == reference (<-1 1 0 2 | 3 0 3 3 | 4 4 4 4 | 4 4 4 4>) -> «1000 0100 0000 0000» 
 FAIL: │ input: [0.25, 1, 0, 3, 4, 0.5, 6, 7], fraction: [0.5, 0.5, 0, 0.75, 0.5, 0.5, 0.75, 0.875], i: 0
 FAIL: ┕ testFrexp<double_v>
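
For reading the failure above, a scalar reference built on `std::frexp` may help. This is a minimal sketch, not the code in `tests/math.cpp`; `FrexpResult` and `reference_frexp` are hypothetical names. Applied to the eight input values from the log, it yields the fractions printed there and the first eight entries of the reference exponents.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper types for illustration; not part of Vc or its tests.
struct FrexpResult {
    std::vector<double> fraction;
    std::vector<int> exponent;
};

// Decompose each input x into fraction * 2^exponent using the scalar
// std::frexp, where the fraction has magnitude in [0.5, 1) or is zero.
inline FrexpResult reference_frexp(const std::vector<double> &input)
{
    FrexpResult r;
    for (double x : input) {
        int e = 0;
        r.fraction.push_back(std::frexp(x, &e));
        r.exponent.push_back(e);
    }
    return r;
}
```

For the input `[0.25, 1, 0, 3, 4, 0.5, 6, 7]` the exponents come out as `-1 1 0 2 3 0 3 3`, so the vector result in the log differs from the scalar reference in several lanes.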

Change the underlying types of Vector<T, Cuda> to CUDA's builtin vector types

Vc's CUDA Vector type currently uses an array of standard data types internally. We should find out if changing these underlying types to arrays of CUDA's builtin vector types (e.g. float4) can be done efficiently and without changing the per-warp boundary for each Vc::Vector. If this can be achieved we could operate on more data with the same number of threads.

Vc does not compile

I tried to compile d257efb on Linux with gcc 5.2.0:

[...]
~/src/Vc/common/simdarray.h:501:9: error: non-constant condition for static assertion
         static_assert(init.size() == size(), "The initializer_list argument to "
         ^
[...]

See the full error message for details.

howto install on Fedora 21

First of all - I have ZERO knowledge on what's "build" / "cmake" etc. Consider a dumb user here. I'm trying to install Vc on Fedora 21 as it is needed by one of the packages I wish to use.

I'm stuck at:
"Call cmake with the relevant options"

What are relevant options?

fix naming of simdarray / simd_mask_array

Either both names use underscores or neither does. Alternatively, use CamelCase, which would be consistent with how the Vc naming convention started.
So I guess SimdArray and SimdMaskArray are the way to go.

MIC compilation fails because AddCompilerFlag determines ICC can't do C++11/14

See, e.g. https://cdash.gsi.de/viewConfigure.php?buildid=52620:

-- MIC ICC Version: "15.0.3.187 Build 20150407"
-- Performing Test check_cxx_compiler_flag__std_c__14
-- Performing Test check_cxx_compiler_flag__std_c__14 - Success
-- Performing Test Check MIC C++ Compiler flag -std=c++14 - Failed
-- Performing Test Check MIC C++ Compiler flag -std=c++1y - Failed
-- Performing Test Check MIC C++ Compiler flag -std=c++11 - Failed
-- Performing Test Check MIC C++ Compiler flag -std=c++0x - Failed
CMake Error at CMakeLists.txt:50 (message):
  Vc 1.x requires C++11, better even C++14.  The MIC native compiler does not
  support any of the C++11 language flags.

sorted_mic fails on testSort<ushort_v>

 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-master/tests/sorted.cpp:84 (0x40c99e)):
 FAIL: │ test.sorted() ([19453, 32498, 50405, 21770, 16366, 14157, 57316, 20117, 10049, 31502, 27598, 51386, 26371, 54045, 27285, 35138]) == ref ([10049, 14157, 16366, 19453, 20117, 21770, 26371, 27285, 27598, 31502, 32498, 35138, 50405, 51386, 54045, 57316]) -> m[0000 0000 0100 0000] 
 FAIL: ┕ testSort<ushort_v>

clean up SortHelper

The SSE sort implementation should also move (at least in part) into libVc.a. Also, the code duplication between the integral vector types of the AVX and SSE implementations needs to be folded. This will subsequently allow a clean implementation of AVX2 (#11).

modern compilers optimize the call to _UnitTest_verify_vector_unit_supported_result away

With GCC 5.1 one gets the following warnings:

Building CXX object vc/tests/CMakeFiles/supportfunctions.dir/supportfunctions.cpp.o
In file included from vc/tests/mask.cpp:20:0:
vc/tests/unittest.h:81:13: warning: ‘_UnitTest_verify_vector_unit_supported_result’ defined but not used [-Wunused-variable]
 static bool _UnitTest_verify_vector_unit_supported_result = _UnitTest_verify_vector_unit_supported();

See for example: http://cdash.cern.ch/viewBuildError.php?type=1&buildid=128688

The warning only appears with optimized builds.

CPU not known on 0.7.4

When using Vc 0.7.4, running cmake yields:

[...]
CMake Warning at cmake/OptimizeForArchitecture.cmake:110 (message):
  Your CPU (family 6, model 60) is not known.  Auto-detection of optimization
  flags failed and will use the 65nm Core 2 CPU settings.
Call Stack (most recent call first):
  cmake/OptimizeForArchitecture.cmake:159 (AutodetectHostArchitecture)
  cmake/VcMacros.cmake:382 (OptimizeForArchitecture)
  CMakeLists.txt:105 (vc_set_preferred_compiler_flags)


-- Detected CPU: merom
[...]

This has been fixed on master. It should probably be backported.

$ cat /proc/cpuinfo
processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 60
model name  : Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz
stepping    : 3
microcode   : 0x1a
cpu MHz     : 2838.671
cache size  : 6144 KB
physical id : 0
siblings    : 8
core id     : 0
cpu cores   : 4
apicid      : 1
initial apicid  : 1
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs        :
bogomips    : 4990.20
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:
[...]

template instantiation depth exceeded when trying to compile master

Trying to compile master (18e6ba4) with gcc 5.1.0 yields this error:

In file included from /home/one/src/vc/tests/stlcontainer.cpp:24:0:
/home/one/src/vc/tests/unittest.h: In instantiation of ‘_UnitTest_Compare::_UnitTest_Compare(const T1&, const T2&, const char*, const char*, const char*, int) [with T1 = long unsigned int; T2 = int]’:
/home/one/src/vc/tests/stlcontainer.cpp:68:9:   required from ‘void stdVectorAlignment() [with V = Vc::v0::Scalar::Vector<float>]’
/home/one/src/vc/tests/stlcontainer.cpp:98:5:   required from here
/home/one/src/vc/tests/unittest.h:310:41: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
                 print(") -> "); print(a == b);
                                         ^
/home/one/src/vc/tests/unittest.h: In instantiation of ‘bool unittest_compareHelper(const T1&, const T2&) [with T1 = long unsigned int; T2 = int]’:
/home/one/src/vc/tests/unittest.h:303:62:   required from ‘_UnitTest_Compare::_UnitTest_Compare(const T1&, const T2&, const char*, const char*, const char*, int) [with T1 = long unsigned int; T2 = int]’
/home/one/src/vc/tests/stlcontainer.cpp:68:9:   required from ‘void stdVectorAlignment() [with V = Vc::v0::Scalar::Vector<float>]’
/home/one/src/vc/tests/stlcontainer.cpp:98:5:   required from here
/home/one/src/vc/tests/unittest.h:240:117: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
 template<typename T1, typename T2> static inline bool unittest_compareHelper( const T1 &a, const T2 &b ) { return a == b; }
                                                                                                                     ^
In file included from /usr/include/c++/5.1.0/bits/move.h:57:0,
                 from /usr/include/c++/5.1.0/bits/stl_pair.h:59,
                 from /usr/include/c++/5.1.0/utility:70,
                 from /usr/include/c++/5.1.0/algorithm:60,
                 from /home/one/src/vc/scalar/vector.h:24,
                 from /home/one/src/vc/include/Vc/vector.h:26,
                 from /home/one/src/vc/include/Vc/Vc:22,
                 from /home/one/src/vc/tests/unittest.h:29,
                 from /home/one/src/vc/tests/stlcontainer.cpp:24:
/usr/include/c++/5.1.0/type_traits: In instantiation of ‘struct std::is_reference<std::_Deque_iterator<Vc::v0::Scalar::Vector<float>, Vc::v0::Scalar::Vector<float>&, Vc::v0::Scalar::Vector<float>*>&>’:
/usr/include/c++/5.1.0/type_traits:114:12:   required from ‘struct std::__or_<std::is_reference<std::_Deque_iterator<Vc::v0::Scalar::Vector<float>, Vc::v0::Scalar::Vector<float>&, Vc::v0::Scalar::Vector<float>*>&>, std::is_void<std::_Deque_iterator<Vc::v0::Scalar::Vector<float>, Vc::v0::Scalar::Vector<float>&, Vc::v0::Scalar::Vector<float>*>&> >’
/usr/include/c++/5.1.0/type_traits:119:12:   required from ‘struct std::__or_<std::is_function<std::_Deque_iterator<Vc::v0::Scalar::Vector<float>, Vc::v0::Scalar::Vector<float>&, Vc::v0::Scalar::Vector<float>*>&>, std::is_reference<std::_Deque_iterator<Vc::v0::Scalar::Vector<float>, Vc::v0::Scalar::Vector<float>&, Vc::v0::Scalar::Vector<float>*>&>, std::is_void<std::_Deque_iterator<Vc::v0::Scalar::Vector<float>, Vc::v0::Scalar::Vector<float>&, Vc::v0::Scalar::Vector<float>*>&> >’
/usr/include/c++/5.1.0/type_traits:148:38:   required from ‘struct std::__not_<std::__or_<std::is_function<std::_Deque_iterator<Vc::v0::Scalar::Vector<float>, Vc::v0::Scalar::Vector<float>&, Vc::v0::Scalar::Vector<float>*>&>, std::is_reference<std::_Deque_iterator<Vc::v0::Scalar::Vector<float>, Vc::v0::Scalar::Vector<float>&, Vc::v0::Scalar::Vector<float>*>&>, std::is_void<std::_Deque_iterator<Vc::v0::Scalar::Vector<float>, Vc::v0::Scalar::Vector<float>&, Vc::v0::Scalar::Vector<float>*>&> > >’
/usr/include/c++/5.1.0/type_traits:564:12:   required from ‘struct std::is_object<std::_Deque_iterator<Vc::v0::Scalar::Vector<float>, Vc::v0::Scalar::Vector<float>&, Vc::v0::Scalar::Vector<float>*>&>’
/usr/include/c++/5.1.0/type_traits:114:12:   required from ‘struct std::__or_<std::is_object<std::_Deque_iterator<Vc::v0::Scalar::Vector<float>, Vc::v0::Scalar::Vector<float>&, Vc::v0::Scalar::Vector<float>*>&>, std::is_reference<std::_Deque_iterator<Vc::v0::Scalar::Vector<float>, Vc::v0::Scalar::Vector<float>&, Vc::v0::Scalar::Vector<float>*>&> >’
/usr/include/c++/5.1.0/type_traits:601:12:   [ skipping 21 instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]
/usr/include/c++/5.1.0/bits/stl_deque.h:519:61:   required from ‘std::_Deque_base<_Tp, _Alloc>::_Deque_base(std::_Deque_base<_Tp, _Alloc>&&) [with _Tp = Vc::v0::Scalar::Vector<float>; _Alloc = std::allocator<Vc::v0::Scalar::Vector<float> >]’
/usr/include/c++/5.1.0/bits/stl_deque.h:956:29:   required from ‘std::deque<_Tp, _Alloc>::deque(std::deque<_Tp, _Alloc>&&) [with _Tp = Vc::v0::Scalar::Vector<float>; _Alloc = std::allocator<Vc::v0::Scalar::Vector<float> >]’
/home/one/src/vc/common/makeContainer.h:114:62:   required from ‘constexpr decltype (Vc::v0::Public::{anonymous}::make_container_helper<Container, T>::help(list)) Vc::v0::Public::makeContainer(std::initializer_list<T>) [with Container = std::deque<Vc::v0::Scalar::Vector<float>, std::allocator<Vc::v0::Scalar::Vector<float> > >; T = float; decltype (Vc::v0::Public::{anonymous}::make_container_helper<Container, T>::help(list)) = std::deque<Vc::v0::Scalar::Vector<float>, std::allocator<Vc::v0::Scalar::Vector<float> > >]’
/home/one/src/vc/tests/stlcontainer.cpp:76:51:   required from ‘void listInitialization() [with V = Vc::v0::Scalar::Vector<float>; Container = std::deque<Vc::v0::Scalar::Vector<float>, std::allocator<Vc::v0::Scalar::Vector<float> > >]’
/home/one/src/vc/tests/stlcontainer.cpp:88:41:   required from ‘void listInitialization() [with V = Vc::v0::Scalar::Vector<float>]’
/home/one/src/vc/tests/stlcontainer.cpp:99:5:   required from here
/usr/include/c++/5.1.0/type_traits:544:12: fatal error: template instantiation depth exceeds maximum of 32 (use -ftemplate-depth= to increase the maximum)
     struct is_reference
            ^

shiftedIn (utils) fails on MIC

 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-master/tests/utils.cpp:290 (0x47d1ec)):
 FAIL: │ test ([1985841714, -1704435280, 1969738457, -66191025, 861418630, -1576818331, 1237622026, -1537451734, -89579436, 291880, 1073393844, -1532297990, -1667382658, -989806726, 1700013330, -1854592241]) == reference ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1985841714]) -> m[0000 0000 0000 0000] 
 FAIL: │ shift = -31
 FAIL: │ data = [1985841713, -1704435281, 1969738456, -66191026, 861418629, -1576818332, 1237622025, -1537451735, -89579437, 291879, 1073393843, -1532297991, -1667382659, -989806727, 1700013329, -1854592242]
 FAIL: ┕ shiftedIn<   int_v>
 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-master/tests/utils.cpp:290 (0x48eb6d)):
 FAIL: │ test ([45467, 63687, 40666, 30924, 18369, 17391, 56030, 5372, 13546, 59789, 9339, 7660, 17275, 39479, 2955, 44934]) == reference ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 45467]) -> m[0000 0000 0000 0000] 
 FAIL: │ shift = -31
 FAIL: │ data = [45466, 63686, 40665, 30923, 18368, 17390, 56029, 5371, 13545, 59788, 9338, 7659, 17274, 39478, 2954, 44933]
 FAIL: ┕ shiftedIn<ushort_v>
 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-master/tests/utils.cpp:290 (0x4a193f)):
 FAIL: │ test ([1.520929742, 1.145010183, 1.230016591, 1.699677535, 1.114922915, 1.340313761, 1.90494338, 1.07857613]) == reference ([0, 0, 0, 0, 0, 0, 0, 1.520929742]) -> m[0000 0000] 
 FAIL: │ shift = -15
 FAIL: │ data = [0.52093, 0.14501, 0.230017, 0.699678, 0.114923, 0.340314, 0.904943, 0.0785761]
 FAIL: ┕ shiftedIn<double_v>
 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-master/tests/utils.cpp:290 (0x4a9a5c)):
 FAIL: │ test ([4155937596, 3874290018, 3043032750, 4087464754, 1864872371, 3179054537, 614350690, 1248903516, 3634659762, 30717206, 1362159638, 1927301256, 4049910438, 1662717141, 1413110167, 1043233605]) == reference ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4155937596]) -> m[0000 0000 0000 0000] 
 FAIL: │ shift = -31
 FAIL: │ data = [4155937595, 3874290017, 3043032749, 4087464753, 1864872370, 3179054536, 614350689, 1248903515, 3634659761, 30717205, 1362159637, 1927301255, 4049910437, 1662717140, 1413110166, 1043233604]
 FAIL: ┕ shiftedIn<  uint_v>
 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-master/tests/utils.cpp:290 (0x4bb5a3)):
 FAIL: │ test ([25960, 15968, 11416, -32661, 10661, 21143, 20415, 10655, 6077, 1950, -21582, 22813, 28080, 11131, -3171, -10438]) == reference ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 25960]) -> m[0000 0000 0000 0000] 
 FAIL: │ shift = -31
 FAIL: │ data = [25959, 15967, 11415, -32662, 10660, 21142, 20414, 10654, 6076, 1949, -21583, 22812, 28079, 11130, -3172, -10439]
 FAIL: ┕ shiftedIn< short_v>
 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-master/tests/utils.cpp:290 (0x4cdf9c)):
 FAIL: │ test ([1.87137115, 1.794731736, 1.54643929, 1.808159113, 1.443829536, 1.893012047, 1.537759304, 1.28550458, 1.48706305, 1.185253739, 1.287488341, 1.123407364, 1.940476298, 1.982578158, 1.727294564, 1.660931826]) == reference ([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1.87137115]) -> m[0000 0000 0000 0000] 
 FAIL: │ shift = -31
 FAIL: │ data = [0.871371, 0.794732, 0.546439, 0.808159, 0.44383, 0.893012, 0.537759, 0.285505, 0.487063, 0.185254, 0.287488, 0.123407, 0.940476, 0.982578, 0.727295, 0.660932]
 FAIL: ┕ shiftedIn< float_v>

simdarray_mic segfaults

 PASS: store<SimdArray< float, 32>>
Remote process returned: -1
Exit reason: Exit reason 11 - Segmentation fault

clang 3.7 miscompiles _mm_palignr_epi8

The current clang 3.7 release branch miscompiles _mm_palignr_epi8 for N greater than 16. This is reported in 24187. If the bug is not fixed for clang 3.7.0 then I will need to implement workarounds in Vc for replacing the broken palignr uses.
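
To check a workaround against the intended behaviour, a scalar reference model can be useful. The sketch below follows the documented semantics of the PALIGNR instruction (exposed in Intel's headers as `_mm_alignr_epi8`): the two 16-byte operands are concatenated, shifted right by N bytes, and the low 16 bytes are returned, with zeros shifted in from the top. The names `Bytes16` and `alignr_reference` are hypothetical and not part of Vc.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

// Scalar model of PALIGNR: treat (a:b) as a 32-byte value with b in the low
// half, shift it right by n bytes, and keep the low 16 bytes.
// For n >= 16 the result starts inside a; for n >= 32 it is all zero.
inline Bytes16 alignr_reference(const Bytes16 &a, const Bytes16 &b, unsigned n)
{
    Bytes16 r{};  // value-initialized, so bytes shifted in from the top are 0
    for (std::size_t i = 0; i < 16; ++i) {
        const std::size_t src = i + n;
        if (src < 16) {
            r[i] = b[src];
        } else if (src < 32) {
            r[i] = a[src - 16];
        }
    }
    return r;
}
```

Comparing the intrinsic's output with this model for shift counts above 16 would show whether a given compiler build is affected.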

abi_AVX2 fails on nightlies

The ctest invocation cannot find the abi_AVX2 binary. It seems like there's still a dependency bug somewhere.

Instead of Scalar::Vector use a partial MIC::Vector with SimdArray for MIC

Consider SimdArray<float, 31> on MIC. This would be one SIMD register and 15 scalar registers (actually SIMD registers with only about 6% of their lanes in use). That's crazy since the platform has great support for masking.

The VectorAbi type for MIC could carry a number that determines the number of active lanes on the vector. Then all operations need to implicitly work with the mask.

This would solve the following issue:
If the SimdArray has to be built from MIC and Scalar Vector objects then subscripting can be broken through use of different VectorEntryType types (which is the case for (u)short).

swizzles_avx fails with GCC 5.1

https://cdash.gsi.de/testDetails.php?test=272280&build=51127

 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-0.7/tests/swizzles.cpp:115 (0x40da5b):
 FAIL: │ test.badc() ([23591, -7970, 18436, -13100, 29690, 24409, 12135, 29166]) == scalarSwizzle(test, BADC) ([23591, -7970, 18436, -13100, 29690, -7970, 12135, 29166]) -> m[1111 1011] 
 FAIL: ┕ testSwizzle<short_v>
 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-0.7/tests/swizzles.cpp:115 (0x40f367):
 FAIL: │ test.badc() ([57770, 9820, 42369, 36174, 26448, 46473, 35535, 47990]) == scalarSwizzle(test, BADC) ([57770, 9820, 42369, 36174, 26448, 9820, 35535, 47990]) -> m[1111 1011] 
 FAIL: ┕ testSwizzle<ushort_v>

gather/scatter offset calculation for double on ia32 incorrect

On ia32 double has sizeof(double) == 8 and alignof(double) == 4. That's why structures can have a sizeof that is not a multiple of 8, even though they contain doubles.
The gather/scatter implementation for arrays of structures reduces the gather to a gather on the fundamental type of the structure member and an index vector scaled to an array of such fundamental types.

The fix probably needs another (internal) gather/scatter overload which scales the index vector with an additional sizeof(MT), i.e. have the index vector signify a Byte offset.
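
A scalar sketch of the byte-offset idea, under the assumption that the index vector is turned into byte offsets of the form `index * sizeof(struct) + offsetof(struct, member)`. If the structure size is 12 bytes, as it can be on ia32, it is not a multiple of `sizeof(double)`, so rescaling the index to an array of doubles cannot be exact, while byte offsets stay correct. `Sample` and `gather_by_byte_offset` are hypothetical names, not Vc API.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical array-of-structures element. With alignof(double) == 4 its
// size can be 12 bytes; on most 64-bit ABIs it is padded to 16.
struct Sample {
    double value;
    int tag;
};

// Gather one double per index using byte offsets from the array base.
// stride is sizeof(element), member_offset is offsetof(element, member).
inline std::vector<double> gather_by_byte_offset(const void *base,
                                                 const std::vector<std::size_t> &indexes,
                                                 std::size_t stride,
                                                 std::size_t member_offset)
{
    const unsigned char *bytes = static_cast<const unsigned char *>(base);
    std::vector<double> out;
    out.reserve(indexes.size());
    for (std::size_t i : indexes) {
        double v;
        // memcpy avoids assuming the member is aligned for double
        std::memcpy(&v, bytes + i * stride + member_offset, sizeof v);
        out.push_back(v);
    }
    return out;
}
```

A call would pass `sizeof(Sample)` and `offsetof(Sample, value)`; the result does not depend on whether the structure size is a multiple of 8.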

1.0 and .75 compatibility with Vc::tie()

Hello, I am attempting to build Krita with Vc 1.0 support and ran into a bit of confusion. This Vc .75 code interleaves some data. (The data types are struct Pixel {float r, g, b, a;}; and Vc::float_v src_c1, src_c2, ... )

const Vc::uint_v indexes(Vc::IndexesFromZero);
Vc::InterleavedMemoryWrapper<Pixel, Vc::float_v> data(const_cast<Pixel*>(sp));
(src_c1, src_c2, src_c3, src_alpha) = data[indexes];

This failed to compile on Vc 1.0. Just inspecting example code I got it working by substituting the type in the first line, and adding Vc::tie() in the last line.

const Vc::float_v::IndexType indexes(Vc::IndexesFromZero);
Vc::InterleavedMemoryWrapper<Pixel, Vc::float_v> data(const_cast<Pixel*>(sp));
tie(src_c1, src_c2, src_c3, src_alpha) = data[indexes];

I think the change in the first line is acceptable in both versions, but the fix in the last line is problematic since Vc::tie() is not defined in the older version. We can't drop support for compiling the older version, since we want to build on Windows with MSVC 2015. My idea was that perhaps dropping the tie() could work in Vc 1.0, but that results in this template error.

error: no match for ‘operator=’ (operand types are ‘Vc_0::float_v {aka Vc_0::Vector<float, Vc_0::VectorAbi::Sse>}’ and ‘Vc_0::enable_if<true, Vc_0::Common::InterleavedMemoryAccess<4ul, Vc_0::Vector<float, Vc_0::VectorAbi::Sse>, Vc_0::SimdArray<int, 4ul, Vc_0::Vector<int, Vc_0::VectorAbi::Sse>, 4ul> > > {aka Vc_0::Common::InterleavedMemoryAccess<4ul, Vc_0::Vector<float, Vc_0::VectorAbi::Sse>, Vc_0::SimdArray<int, 4ul, Vc_0::Vector<int, Vc_0::VectorAbi::Sse>, 4ul> >}’) 

no matching function call to frexp

When trying to compile staging, I get the following error:

/home/one/src/vc/tests/math.cpp: In instantiation of ‘void testFrexp() [with V = Vc_0::AVX2::Vector<float>]’:
/home/one/src/vc/tests/math.cpp:843:5:   required from here
/home/one/src/vc/tests/math.cpp:772:33: error: no matching function for call to ‘frexp(const Vc_0::AVX2::Vector<float>&, ExpV*)’
         const V fraction = frexp(v, &exp);
                                 ^

See https://gist.github.com/vks/e806d6695cd01726c9b5 for the full error.

$ cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 60
model name  : Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz
stepping    : 3
microcode   : 0x1a
cpu MHz     : 3464.941
cache size  : 6144 KB
physical id : 0
siblings    : 8
core id     : 0
cpu cores   : 4
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs        :
bogomips    : 4990.20
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor   : 1
[...]

cannot compile for MIC

I tried to build Vc for MIC with icc 15.0.1, commit d257efb.
The following errors are raised for both the CPU side (target Vc) and the MIC side (target Vc_MIC).

In file included from /home/kehw/hlt/Vc/mic/../common/../traits/type_traits.h(38),
                 from /home/kehw/hlt/Vc/mic/../common/types.h(38),
                 from /home/kehw/hlt/Vc/mic/../common/loadstoreflags.h(32),
                 from /home/kehw/hlt/Vc/mic/intrinsics.h(41),
                 from /home/kehw/hlt/Vc/src/mic_sorthelper.cpp(29):
/home/kehw/hlt/Vc/mic/../common/../traits/is_functor_argument_immutable.h(41): error: type name is not allowed
      typedef decltype(&F::template operator()<A>) type;
                                               ^

In file included from /home/kehw/hlt/Vc/mic/../common/../traits/type_traits.h(38),
                 from /home/kehw/hlt/Vc/mic/../common/types.h(38),
                 from /home/kehw/hlt/Vc/mic/../common/loadstoreflags.h(32),
                 from /home/kehw/hlt/Vc/mic/intrinsics.h(41),
                 from /home/kehw/hlt/Vc/src/mic_sorthelper.cpp(29):
/home/kehw/hlt/Vc/mic/../common/../traits/is_functor_argument_immutable.h(41): error: expected an expression
      typedef decltype(&F::template operator()<A>) type;
                                                 ^

In addition, in the file cmake/FindMIC.cmake, you mentioned that

   # For now offload is not supported so skip it

Does this mean we cannot build Vc in offload mode?

consider implementing std::get and std::tuple_size for class templates implementing Vc_SIMDIZE_INTERFACE

The Vc_SIMDIZE_INTERFACE macro allows users to create class templates with minimal boilerplate code that can be introspected by the simdize code. It should be possible to add get and tuple_size implementations in the std namespace that specialize for these user-defined class-templates. This might make such types even more useful. But just doing it because I can is not enough justification.

I keep this issue here for someone to comment/bump it when there's a real use case/issue to solve.
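
For discussion, this is roughly what the tuple-like interface could look like for a user-defined class template. It is a sketch only: `demo::Point3` is a hypothetical type and does not use Vc_SIMDIZE_INTERFACE. Note one deviation from the wording above: the standard permits specializing `std::tuple_size` and `std::tuple_element` for user-defined types, but not adding function overloads to namespace `std`, so `get` lives in the type's own namespace here.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

namespace demo {
// Hypothetical user-defined class template with three members of one type.
template <typename T>
struct Point3 {
    T x, y, z;
};

// Member access by compile-time index, declared next to the type.
template <std::size_t I, typename T>
T &get(Point3<T> &p)
{
    static_assert(I < 3, "index out of range");
    return I == 0 ? p.x : (I == 1 ? p.y : p.z);
}
}  // namespace demo

namespace std {
// Specializations for a user-defined type are allowed in namespace std.
template <typename T>
struct tuple_size<demo::Point3<T>> : std::integral_constant<std::size_t, 3> {};

template <std::size_t I, typename T>
struct tuple_element<I, demo::Point3<T>> {
    using type = T;
};
}  // namespace std
```

Generic code can then query `std::tuple_size<demo::Point3<float>>::value` and access members with `demo::get<I>(p)`.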

subscript_mic segfaults

 PASS: gathers<ushort_v>
Remote process returned: -1
Exit reason: Exit reason 11 - Segmentation fault

testInf and testNaN fail on MIC

 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-master/tests/math.cpp:572 (0x4639fb)):
 FAIL: │ none_of(Vc::isfinite(inf)) 
 FAIL: ┕ testInf< float_v>
 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-master/tests/math.cpp:572 (0x4646ae)):
 FAIL: │ none_of(Vc::isfinite(inf)) 
 FAIL: ┕ testInf<double_v>
 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-master/tests/math.cpp:591 (0x4653c8)):
 FAIL: │ all_of(Vc::isnan(Vec(inf * zero))) 
 FAIL: ┕ testNaN< float_v>
 FAIL: ┍ at /home/mkretz/.Vc-Test/Vc-master/tests/math.cpp:591 (0x46631f)):
 FAIL: │ all_of(Vc::isnan(Vec(inf * zero))) 
 FAIL: ┕ testNaN<double_v>

unit test for VectorAlignedBase

The VectorAlignedBase class is a hack to help users get alignment on heap allocation right. C++14 still does not support new on over-aligned types. But without a unit test this feature might just not work as intended.

  • generalize AlignedBase
  • test them
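
As a starting point for such a test, here is a minimal sketch of a generalized aligned base and a type deriving from it. It is not Vc's VectorAlignedBase: `AlignedBaseSketch` and `AlignedPayload` are hypothetical, and the over-allocating `operator new` is only one way to get the alignment in C++11/14 (C++17 later added `std::align_val_t` overloads of `operator new`). Array forms are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical helper: classes deriving from it get heap allocations
// aligned to Alignment bytes.
template <std::size_t Alignment>
struct AlignedBaseSketch {
    static_assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0,
                  "Alignment must be a power of two");
    static_assert(Alignment >= alignof(void *),
                  "Alignment must be able to hold the bookkeeping pointer");

    static void *operator new(std::size_t size)
    {
        // Over-allocate: alignment slack plus room for the original pointer.
        void *raw = std::malloc(size + Alignment + sizeof(void *));
        if (!raw) {
            throw std::bad_alloc();
        }
        const std::uintptr_t start =
            reinterpret_cast<std::uintptr_t>(raw) + sizeof(void *);
        const std::uintptr_t aligned =
            (start + Alignment - 1) & ~(static_cast<std::uintptr_t>(Alignment) - 1);
        // Remember the malloc pointer directly in front of the aligned block.
        reinterpret_cast<void **>(aligned)[-1] = raw;
        return reinterpret_cast<void *>(aligned);
    }

    static void operator delete(void *p)
    {
        if (p) {
            std::free(static_cast<void **>(p)[-1]);
        }
    }
};

// Hypothetical payload type that wants 64-byte alignment on the heap.
struct AlignedPayload : AlignedBaseSketch<64> {
    float data[16];
};
```

A unit test could allocate `new AlignedPayload`, check that the address is a multiple of 64, and delete the object again.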

get rid of Internal::Helper

This is just one more variant for forwarding to the correct implementation. Having 4 different abstractions to do that is just frustrating.
I should first finish #12, though.

AVX2 support

  • refactor class design to avoid code duplication between SSE <-> AVX <-> AVX2
  • increase the int_v and short_v vector sizes
  • look into usefulness of BMI(2)

deinterleave_mic segfaults

 PASS: testDeinterleave<{ float_v,  float}>
 PASS: testDeinterleave<{ float_v, ushort}>
 PASS: testDeinterleave<{ float_v,  short}>
 PASS: testDeinterleave<{double_v, double}>
 PASS: testDeinterleave<{   int_v,    int}>
 PASS: testDeinterleave<{   int_v,  short}>
 PASS: testDeinterleave<{  uint_v,   uint}>
 PASS: testDeinterleave<{  uint_v, ushort}>
 PASS: testDeinterleave<{ short_v,  short}>
 PASS: testDeinterleave<{ushort_v, ushort}>
Remote process returned: -1
Exit reason: Exit reason 11 - Segmentation fault

simdize_avx doesn't compile with GCC 5.x

The compiler says it's because of incompatible mangling and that -fabi-version=0 would fix it. That's obviously not true because Vc compiles with -fabi-version=0 already.
