Comments (4)
It works fine with gcc >= 5.4.
from fbgemm.
Hi @sunnymoon155, could you run make -d to print more debug information?
@jiecaoyu, GCC 5 does not support AVX-512 well; it is better to use a newer GCC version, at least 6.3. Here is a test case you can try:
#include <immintrin.h>
#include <cstdint>
#include <cstdlib>

#define float16 std::uint16_t

inline void FloatToFloat16KernelAvx512WithClip(const float* src, float16* dst) {
  constexpr float FP16_MAX = 65504.f;
  __m512 neg_fp16_max_vector = _mm512_set1_ps(-FP16_MAX);
  __m512 pos_fp16_max_vector = _mm512_set1_ps(FP16_MAX);
  __m512 float_vector = _mm512_loadu_ps(src);
  // Do the clipping.
  float_vector = _mm512_max_ps(
      neg_fp16_max_vector, _mm512_min_ps(float_vector, pos_fp16_max_vector));
  __m256i half_vector = _mm512_cvtps_ph(
      float_vector, (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC));
  _mm256_storeu_si256((__m256i*)dst, half_vector);
}

void FloatToFloat16_avx512(
    const float* src,
    float16* dst,
    int size,
    bool do_clip) {
  if (do_clip) {
    int i = 0;
    for (i = 0; i + 16 <= size; i += 16) {
      FloatToFloat16KernelAvx512WithClip(src + i, dst + i);
    }
    // FloatToFloat16_avx2(src + i, dst + i, size - i, do_clip);
  } else {
    int i = 0;
    for (i = 0; i + 16 <= size; i += 16) {
      // FloatToFloat16KernelAvx512(src + i, dst + i);
    }
    // FloatToFloat16_avx2(src + i, dst + i, size - i);
  }
}

int main() {
  return 0;
}
With GCC 5 (5.3.1 on my side), compiling with -O1, -O2, or -O3, e.g.
g++ -O1 -mavx512f -mavx512bw -mavx512dq -mavx512vl -masm=intel -std=c++11 test.cpp
fails with a build error:
/tmp/cchHLPDt.s: Assembler messages:
/tmp/cchHLPDt.s:14: Error: operand size mismatch for `vbroadcastss'
/tmp/cchHLPDt.s:15: Error: operand size mismatch for `vbroadcastss'
but with GCC 6.3, all optimization levels work.
@jianyuh, on the PyTorch side, the -O3 optimization flag is always used when building fbgemm:
[1194/4048] Building CXX object third_party/fbgemm/CMakeFiles/fbgemm_avx512.dir/src/FbgemmFloat16ConvertAvx512.cc.o
FAILED: third_party/fbgemm/CMakeFiles/fbgemm_avx512.dir/src/FbgemmFloat16ConvertAvx512.cc.o
/opt/rh/devtoolset-4/root/usr/bin/c++ -DFBGEMM_STATIC -DTH_BLAS_MKL -I../third_party/cpuinfo/include -I../third_party/fbgemm/third_party/asmjit/src -I../third_party/fbgemm/include -I../third_party/fbgemm -I../cmake/../third_party/benchmark/include -isystem ../cmake/../third_party/googletest/googlemock/include -isystem ../cmake/../third_party/googletest/googletest/include -isystem ../third_party/protobuf/src -isystem /home/xiaobinz/anaconda3/envs/pytorch-3.7/include -isystem ../third_party/gemmlowp -isystem ../third_party/neon2sse -isystem ../third_party/XNNPACK/include -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -O3 -DNDEBUG -fPIC -fvisibility=hidden -m64 -mavx2 -mfma -mavx512f -mavx512bw -mavx512dq -mavx512vl -masm=intel -std=c++14 -MD -MT third_party/fbgemm/CMakeFiles/fbgemm_avx512.dir/src/FbgemmFloat16ConvertAvx512.cc.o -MF third_party/fbgemm/CMakeFiles/fbgemm_avx512.dir/src/FbgemmFloat16ConvertAvx512.cc.o.d -o third_party/fbgemm/CMakeFiles/fbgemm_avx512.dir/src/FbgemmFloat16ConvertAvx512.cc.o -c ../third_party/fbgemm/src/FbgemmFloat16ConvertAvx512.cc
/tmp/ccAI34An.s: Assembler messages:
/tmp/ccAI34An.s:53: Error: operand size mismatch for `vbroadcastss'
/tmp/ccAI34An.s:55: Error: operand size mismatch for `vbroadcastss'
So should we set the optimization level to -O0 when the user has GCC 5.x, or tell users that fbgemm needs a newer GCC version to use the AVX-512 path?
Hi @sunnymoon155, it's an issue with the GCC assembler. Please use a newer version of GCC.
For the underlying reason, please check out: https://stackoverflow.com/questions/35758644/gcc4-8-3-generating-invalid-asm-from-intrinsics-operand-size-mismatch