
build from source failed about fbgemm HOT 4 CLOSED

pytorch avatar pytorch commented on May 22, 2024
build from source failed

from fbgemm.

Comments (4)

dskhudia avatar dskhudia commented on May 22, 2024 2

It works fine with gcc >= 5.4.

https://godbolt.org/z/2S9fRR


jiecaoyu avatar jiecaoyu commented on May 22, 2024

Hi @sunnymoon155, could you run make -d to print more debug information?


XiaobingSuper avatar XiaobingSuper commented on May 22, 2024

@jiecaoyu, GCC 5 does not support AVX512 well, so it is better to use a newer GCC version (6.3 or later). Here is a test case you can try:

 #include <immintrin.h>
 #include <cstdint>
 #include <cstdlib>

 #define float16 std::uint16_t

 inline void FloatToFloat16KernelAvx512WithClip(const float* src, float16* dst) {

   constexpr float FP16_MAX = 65504.f;
   __m512 neg_fp16_max_vector = _mm512_set1_ps(-FP16_MAX);
   __m512 pos_fp16_max_vector = _mm512_set1_ps(FP16_MAX);

   __m512 float_vector = _mm512_loadu_ps(src);

   // Do the clipping.
   float_vector = _mm512_max_ps(
       neg_fp16_max_vector, _mm512_min_ps(float_vector, pos_fp16_max_vector));

   __m256i half_vector = _mm512_cvtps_ph(
       float_vector, (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC));
   _mm256_storeu_si256((__m256i*)dst, half_vector);

 }

 void FloatToFloat16_avx512(
     const float* src,
     float16* dst,
     int size,
     bool do_clip) {
   if (do_clip) {
     int i = 0;
     for (i = 0; i + 16 <= size; i += 16) {
       FloatToFloat16KernelAvx512WithClip(src + i, dst + i);
     }
     //FloatToFloat16_avx2(src + i, dst + i, size - i, do_clip);
   } else {
     int i = 0;
     for (i = 0; i + 16 <= size; i += 16) {
       //FloatToFloat16KernelAvx512(src + i, dst + i);
     }
     //FloatToFloat16_avx2(src + i, dst + i, size - i);
   }
 }

 int main() {
     return 0;
 }

If we compile it with the -O1, -O2, or -O3 flag using GCC 5 (5.3.1 on my side), for example

g++ -O1  -mavx512f -mavx512bw -mavx512dq -mavx512vl -masm=intel -std=c++11 test.cpp

the build fails with this error:

/tmp/cchHLPDt.s: Assembler messages:
/tmp/cchHLPDt.s:14: Error: operand size mismatch for `vbroadcastss'
/tmp/cchHLPDt.s:15: Error: operand size mismatch for `vbroadcastss'

With GCC 6.3, however, all optimization levels work.
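For anyone reading the test case without an AVX512 toolchain at hand, the clipping step of FloatToFloat16KernelAvx512WithClip can be sketched per element in scalar C++. This is only an illustration: clip_to_fp16_range and kFp16Max are hypothetical names that do not come from FBGEMM, and NaN handling of the vector min/max instructions is not modeled.

```cpp
#include <algorithm>

// Largest finite half-precision value, same constant as FP16_MAX in the test case.
constexpr float kFp16Max = 65504.f;

// Hypothetical scalar counterpart of the vector clipping step:
// clamp x into [-kFp16Max, kFp16Max] before it is converted to half precision.
// NaN inputs are not handled here.
inline float clip_to_fp16_range(float x) {
  return std::max(-kFp16Max, std::min(x, kFp16Max));
}
```

The vector kernel applies the same min-then-max to 16 floats at once. The scalar version involves no AVX512 instructions, so it cannot trigger the assembler error shown above.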

@jianyuh, on the PyTorch side, fbgemm is always built with the -O3 optimization flag:

[1194/4048] Building CXX object third_party/fbgemm/CMakeFiles/fbgemm_avx512.dir/src/FbgemmFloat16ConvertAvx512.cc.o
FAILED: third_party/fbgemm/CMakeFiles/fbgemm_avx512.dir/src/FbgemmFloat16ConvertAvx512.cc.o
/opt/rh/devtoolset-4/root/usr/bin/c++  -DFBGEMM_STATIC -DTH_BLAS_MKL -I../third_party/cpuinfo/include -I../third_party/fbgemm/third_party/asmjit/src -I../third_party/fbgemm/include -I../third_party/fbgemm -I../cmake/../third_party/benchmark/include -isystem ../cmake/../third_party/googletest/googlemock/include -isystem ../cmake/../third_party/googletest/googletest/include -isystem ../third_party/protobuf/src -isystem /home/xiaobinz/anaconda3/envs/pytorch-3.7/include -isystem ../third_party/gemmlowp -isystem ../third_party/neon2sse -isystem ../third_party/XNNPACK/include -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -O3 -DNDEBUG -fPIC -fvisibility=hidden   -m64 -mavx2 -mfma -mavx512f -mavx512bw -mavx512dq -mavx512vl -masm=intel -std=c++14 -MD -MT third_party/fbgemm/CMakeFiles/fbgemm_avx512.dir/src/FbgemmFloat16ConvertAvx512.cc.o -MF third_party/fbgemm/CMakeFiles/fbgemm_avx512.dir/src/FbgemmFloat16ConvertAvx512.cc.o.d -o third_party/fbgemm/CMakeFiles/fbgemm_avx512.dir/src/FbgemmFloat16ConvertAvx512.cc.o -c ../third_party/fbgemm/src/FbgemmFloat16ConvertAvx512.cc
/tmp/ccAI34An.s: Assembler messages:
/tmp/ccAI34An.s:53: Error: operand size mismatch for `vbroadcastss'
/tmp/ccAI34An.s:55: Error: operand size mismatch for `vbroadcastss'

So should we lower the optimization level to -O0 when the user builds with GCC 5.x, or tell users that FBGEMM needs a newer GCC version to use the AVX512 path?
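One way to "tell the user" would be a compile-time guard in the AVX512 sources. The sketch below is hypothetical and not part of FBGEMM. The 6.3 threshold is an assumption taken from the report in this comment (an earlier comment says GCC 5.4 already works), and since the error comes from the assembler, the compiler version is only a proxy for the real requirement.

```cpp
// Hypothetical guard, not taken from FBGEMM.
// Assumed minimum GCC version for the AVX512 path (from the report above).
constexpr int kMinGccMajor = 6;
constexpr int kMinGccMinor = 3;

// Returns true when the given GCC version meets the assumed minimum.
constexpr bool gcc_ok_for_avx512(int major, int minor) {
  return major > kMinGccMajor ||
         (major == kMinGccMajor && minor >= kMinGccMinor);
}

// Only check real GCC; clang also defines __GNUC__, so exclude it.
#if defined(__GNUC__) && !defined(__clang__)
static_assert(gcc_ok_for_avx512(__GNUC__, __GNUC_MINOR__),
              "GCC is too old for the AVX512 path; please use GCC 6.3 or newer");
#endif
```

With a guard like this, a build on GCC 5.3.1 would stop with a readable message instead of the vbroadcastss assembler error. The same check could instead live in the CMake configuration, which would also allow skipping the AVX512 sources rather than failing.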


dskhudia avatar dskhudia commented on May 22, 2024

Hi @sunnymoon155, this is an issue with the GCC assembler. Please use a newer version of GCC.

For the underlying reason, see: https://stackoverflow.com/questions/35758644/gcc4-8-3-generating-invalid-asm-from-intrinsics-operand-size-mismatch

