The LLVM types used for the mask operand of the AVX maskload/maskstore intrinsics are incorrect.
In particular:
- The mask argument for __builtin_ia32_maskloadpd and __builtin_ia32_maskstorepd should be of type llvm_v2i64_ty, not llvm_v2f64_ty.
- The mask argument for __builtin_ia32_maskloadpd256 and __builtin_ia32_maskstorepd256 should be of type llvm_v4i64_ty, not llvm_v4f64_ty.
- The mask argument for __builtin_ia32_maskloadps and __builtin_ia32_maskstoreps should be of type llvm_v4i32_ty, not llvm_v4f32_ty.
- The mask argument for __builtin_ia32_maskloadps256 and __builtin_ia32_maskstoreps256 should be of type llvm_v8i32_ty, not llvm_v8f32_ty.
In short, the mask operand of the maskload/maskstore GCC builtins is always a vector of packed integers, never a vector of packed floats or doubles.
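To illustrate why an integer vector is the natural mask type, here is a minimal scalar sketch of what a 2-lane masked store does: a lane is written only when the most significant bit of the corresponding mask element is set. The function name maskstore_pd_model and its array-based signature are hypothetical and exist only for this illustration; this is a model of the semantics, not the actual lowering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a 2-lane masked store of doubles.
 * Lane i of `a` is written to p[i] only when the most significant
 * bit (bit 63) of mask[i] is set; other lanes of p are left untouched. */
static void maskstore_pd_model(double *p, const int64_t mask[2],
                               const double a[2])
{
    assert(p != NULL);
    for (size_t i = 0; i < 2; ++i) {
        /* The mask element is inspected as a raw 64-bit integer. */
        if (((uint64_t)mask[i] >> 63) != 0) {
            p[i] = a[i];
        }
    }
}
```

The mask is only ever inspected bitwise, which is why an integer element type describes it more accurately than a floating-point one.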
I also noticed that the Clang definitions of those builtins in BuiltinsX86.def are incorrect, and that the maskload/maskstore intrinsics in the Clang header avxintrin.h wrongly use packed floats/doubles instead of packed integers for the mask operand.
For example, _mm_maskstore_pd is currently defined in avxintrin.h as:
    static __inline void __DEFAULT_FN_ATTRS
    _mm_maskstore_pd(double *__p, __m128d __m, __m128d __a)
    {
      __builtin_ia32_maskstorepd((__v2df *)__p, (__v2df)__m, (__v2df)__a);
    }
According to the Intel documentation, the correct prototype for _mm_maskstore_pd should be:
    void _mm_maskstore_pd(double *p, __m128i m, __m128d a);
So, I think the definition should be something like:
    static __inline void __DEFAULT_FN_ATTRS
    _mm_maskstore_pd(double *__p, __m128i __m, __m128d __a)
    {
      __builtin_ia32_maskstorepd((__v2df *)__p, (__v2di)__m, (__v2df)__a);
    }
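From the caller's side, a sketch of how the intrinsic would be used once the header follows the Intel prototype. The helper name store_low_lane_only is hypothetical, and the example assumes an x86 compiler whose headers declare _mm_maskstore_pd with an __m128i mask and which supports the GCC/Clang target("avx") function attribute.

```c
#include <immintrin.h>

/* Hypothetical helper: writes `lo` to p[0] and leaves p[1] untouched.
 * Assumes _mm_maskstore_pd takes an __m128i mask (Intel prototype). */
__attribute__((target("avx")))
static void store_low_lane_only(double *p, double lo, double hi)
{
    /* Integer mask: a lane is stored when its top bit is set.
     * _mm_set_epi64x takes (high, low), so only the low lane is enabled. */
    __m128i mask = _mm_set_epi64x(0, -1);
    /* _mm_set_pd takes (high, low): lane 0 = lo, lane 1 = hi. */
    __m128d a = _mm_set_pd(hi, lo);
    _mm_maskstore_pd(p, mask, a);
}
```

With the integer mask type, the caller builds the mask with the usual integer set intrinsics and needs no cast from a floating-point vector.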
If you agree with this patch, I plan to send a follow-up Clang patch that fixes the intrinsic header avxintrin.h and the prototypes of the x86 maskload/maskstore builtins in BuiltinsX86.def.
Please let me know if this is okay to submit.
-Andrea