
sse2neon's Introduction

sse2neon

IMPORTANT NOTE

This project is now deprecated!

Other software engineers have been making continued improvements to this project in another GitHub repository.

This is NO LONGER considered the official repository for SSE2NEON!

Please use the following repository instead:

https://github.com/DLTcollab/sse2neon

A C/C++ header file that converts Intel SSE intrinsics to ARM NEON intrinsics.

Info

Intel's SIMD instruction set, known as SSE, is used in many applications to improve performance. ARM has also introduced a SIMD instruction set, called NEON, for its processors. Rewriting SSE code to work on NEON is very time-consuming. This header file automatically converts some of the SSE intrinsics into NEON intrinsics.

Usage

  • Put the SSE2NEON.h file into your source code directory.

  • Locate the following SSE header files included in the code:

    #include <xmmintrin.h>
    #include <emmintrin.h>

  • Replace them with:

    #include "SSE2NEON.h"

  • On Linux, compile your code with the following gcc/g++ flag:

    -mfpu=neon
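
The steps above can be sketched as follows. This is only an illustration, not part of the project: `HAVE_SSE_API` and `add4` are hypothetical names, the NEON branch assumes SSE2NEON.h sits next to the source file and provides `_mm_loadu_ps`, `_mm_add_ps`, and `_mm_storeu_ps`, and a plain loop is used when neither header is available.

```cpp
#include <cstddef>

// Select an intrinsics header for the current target.
// x86 with SSE: use the native header. ARM with NEON: use SSE2NEON.h,
// assuming it has been copied into the source directory.
#if defined(__SSE__)
#include <xmmintrin.h>
#define HAVE_SSE_API 1  // hypothetical helper macro
#elif (defined(__ARM_NEON) || defined(__ARM_NEON__)) && defined(__has_include)
#if __has_include("SSE2NEON.h")
#include "SSE2NEON.h"
#define HAVE_SSE_API 1
#endif
#endif

// Adds four floats lane by lane. The SSE-style calls are identical on both
// targets; only the header that supplies them differs.
inline void add4(const float* a, const float* b, float* out) {
#if defined(HAVE_SSE_API)
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
#else
    // Scalar fallback so the sketch still builds without either header.
    for (std::size_t i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

On a 32-bit ARM Linux target this would be built with the `-mfpu=neon` flag mentioned above.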

sse2neon's People

Contributors

hasindu2008, jratcliff63367

sse2neon's Issues

Consider merging efforts with SIMDe project

I've been working on an eerily similar project, SIMDe, which is also MIT licensed, and is also an attempt to allow code written for one set of SIMD instructions to run on machines without them.

We're both working on implementing x86/x86_64 ISA extensions right now, but SIMDe is using portable fallbacks (with hints to encourage the compiler to vectorize what it can) instead of NEON intrinsics. I have been planning to create a NEON backend for SIMDe eventually, but so far I've been focusing on getting the portable version in place. Eventually I also intend to go in the other direction with SIMDe: NEON (and others) to SSE (and everything else).

I'm wondering if you would be interested in merging the two projects.

Bug in SSE2NEON.h

Hello,

Thank you for sharing your source code.

There is a bug in file SSE2NEON.h, line 361 in function

template FORCE_INLINE __m128 _mm_shuffle_ps_function(__m128 a, __m128 b)

Current code:
default: _mm_shuffle_ps_default(a, b);

The correct code should be:
default: return _mm_shuffle_ps_default(a, b);

Thank you,
Tuan Dao.
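
To illustrate why the missing `return` matters, here is a minimal sketch with hypothetical stand-in functions (`shuffle_default`, `shuffle_special`, and `shuffle_function` are not the header's code, and plain `int` replaces `__m128`). Without `return` in the `default` branch, the call result is discarded and control flows off the end of a non-void function, which is undefined behavior in C++.

```cpp
// Hypothetical stand-ins for the generic and specialized shuffle paths.
inline int shuffle_default(int a, int b) { return a * 10 + b; }
inline int shuffle_special(int a, int b) { return a + b; }

// Dispatches on a compile-time selector, in the same shape as the
// function described in the report.
template <int i>
inline int shuffle_function(int a, int b) {
    switch (i) {
        case 0:
            return shuffle_special(a, b);
        default:
            // The `return` is required here; dropping it would discard the
            // result and leave the function without a return value.
            return shuffle_default(a, b);
    }
}
```

Compiling with `-Wall` (which enables `-Wreturn-type` in GCC and Clang) typically flags the broken variant.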

No License

There is no license file in the repository, and I didn't see any mentioned in the source. Is there a particular license this is supposed to be released under?

Use __GNUC__ instead of GCC

Instead of defining GCC manually and using it in #if GCC, replace all the #if GCC with #ifdef __GNUC__. The latter is defined for all the GCC-derived or GCC-compatible compilers (including Android NDK and Clang).
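
A minimal sketch of the suggested change, assuming `FORCE_INLINE` is the only macro that depends on the compiler check (the exact definitions in SSE2NEON.h may differ, and `add_one` is a hypothetical function used only to exercise the macro):

```cpp
// Detect GCC-compatible compilers through the predefined __GNUC__ macro
// instead of a hand-maintained GCC flag.
#ifdef __GNUC__
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline
#endif

// Hypothetical function showing the macro in use.
FORCE_INLINE int add_one(int x) { return x + 1; }
```

With this form, nothing needs to be defined on the command line or edited in the header when switching between GCC and Clang.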

Add new functions for librealsense

Hi All,

I want to use the Intel SR300 camera on an Odroid XU4, so I need to convert some functions from SSE to NEON. At least these four functions are needed:

  • _mm_setr_epi8
  • _mm_shuffle_epi8
  • _mm_storeu_si128
  • _mm_alignr_epi8

I would appreciate any help.
Librealsense code
Best

add some new functions and bug fix

new function:
// added by wangyongxin
// Shift packed 64-bit integers in a left by imm8 while shifting in zeros, and store the results in dst. https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_slli_epi64&expand=4961,5279
// FORCE_INLINE __m128i _mm_slli_epi64(__m128i a, int imm8)
#define _mm_slli_epi64(a, imm) \
({ \
    __m128i ret; \
    if ((imm) <= 0) { \
        ret = a; \
    } \
    else if ((imm) > 63) { \
        ret = _mm_setzero_si128(); \
    } \
    else { \
        ret = vreinterpretq_m128i_s64(vshlq_n_s64(vreinterpretq_s64_m128i(a), (imm))); \
    } \
    ret; \
})

//added by wangyongxin
//Shift packed 64-bit integers in a right by imm8 while shifting in zeros, and store the results in dst. https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_srli_epi64&expand=4961,5279,5491
//FORCE_INLINE __m128i _mm_srli_epi64 (__m128i a, int imm8)
#define _mm_srli_epi64(a, imm) \
({ \
    __m128i ret; \
    if ((imm) <= 0) { \
        ret = a; \
    } \
    else if ((imm) > 63) { \
        ret = _mm_setzero_si128(); \
    } \
    else { \
        ret = vreinterpretq_m128i_u64(vshrq_n_u64(vreinterpretq_u64_m128i(a), (imm))); \
    } \
    ret; \
})

//added by wangyongxin
//Compare packed 32-bit integers in a and b for equality, and store the results in dst. https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_cmpeq_epi32&expand=4961,5279,5491,767
FORCE_INLINE __m128i _mm_cmpeq_epi32(__m128i a, __m128i b)
{
    return vreinterpretq_m128i_u32(vceqq_s32(vreinterpretq_s32_m128i(a), vreinterpretq_s32_m128i(b)));
}
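
For checking the functions above, a scalar per-lane model can serve as a reference. The sketch below is an illustration with hypothetical names (`slli64_lane`, `srli64_lane`, `cmpeq32_lane`); counts of zero or less follow the macros above and return the input unchanged, which is an assumption carried over from that code rather than something verified against hardware.

```cpp
#include <cstdint>

// One 64-bit lane of _mm_slli_epi64: counts above 63 produce zero.
inline std::uint64_t slli64_lane(std::uint64_t x, int imm) {
    if (imm <= 0) return x;  // mirrors the macro above
    if (imm > 63) return 0;
    return x << imm;
}

// One 64-bit lane of _mm_srli_epi64: logical shift, zeros shifted in.
inline std::uint64_t srli64_lane(std::uint64_t x, int imm) {
    if (imm <= 0) return x;  // mirrors the macro above
    if (imm > 63) return 0;
    return x >> imm;
}

// One 32-bit lane of _mm_cmpeq_epi32: all ones when equal, else zero.
inline std::uint32_t cmpeq32_lane(std::int32_t a, std::int32_t b) {
    return a == b ? 0xFFFFFFFFu : 0u;
}
```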

bug:
#define _mm_srli_epi16(a, imm) \
({ \
    __m128i ret; \
    if ((imm) <= 0) { \
        ret = a; \
    } \
    else if ((imm)> 31) { \
        ret = _mm_setzero_si128(); \
    } \
    else { \
        ret = vreinterpretq_m128i_u16(vshrq_n_u16(vreinterpretq_u16_m128i(a), (imm))); \
    } \
    ret; \
})
According to the Intel documentation: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_srli_epi16&expand=5473
the line
else if ((imm)> 31) { \
should be changed to
else if ((imm)> 15) { \

The function _mm_slli_epi16 has the same bug.

thanks
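
The corrected threshold can be expressed as a scalar per-lane model. This is a sketch with hypothetical names (`srli16_lane`, `slli16_lane`), following the macro above for counts of zero or less. With the old limit of 31, counts from 17 to 31 would reach `vshrq_n_u16`, whose immediate must lie in the range 1 to 16.

```cpp
#include <cstdint>

// One 16-bit lane of _mm_srli_epi16 with the corrected limit:
// any count above 15 clears the lane.
inline std::uint16_t srli16_lane(std::uint16_t x, int imm) {
    if (imm <= 0) return x;  // mirrors the macro above
    if (imm > 15) return 0;
    return static_cast<std::uint16_t>(x >> imm);
}

// Same rule for the left shift, _mm_slli_epi16.
inline std::uint16_t slli16_lane(std::uint16_t x, int imm) {
    if (imm <= 0) return x;  // mirrors the macro above
    if (imm > 15) return 0;
    return static_cast<std::uint16_t>(x << imm);
}
```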
