_mm_mmask_i64gather_epi32 and _mm_mmask_i64gather_ps operate on the lower 64 bits and zero the upper 64 bits of the return value. The old version steps by 64 bits in the do_intrin_ loop, so each 128-bit result overlaps the upper 64 bits of the previous iteration's result. This is incorrect usage of the intrinsics. In particular, when the compiler allocates dst128_f and mask128 at adjacent addresses, the test fails; this happens with LLVM HEAD. I changed the loop step to 128 bits and added a new check function for these two intrinsics (I could not find a similar check function in other tests).
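For illustration only, here is a scalar sketch of the overlap described above. It does not use the real intrinsics or the test's actual code; `store128_low64`, `fill_step64`, and `fill_step128` are hypothetical names, and the model simply assumes a 128-bit store whose upper 64 bits are zero. Under that assumption, stepping by 64 bits makes the final store write 64 bits past the intended range, which is one plausible way a neighbouring buffer could get clobbered.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of storing a 128-bit result whose lower 64 bits
   hold two 32-bit elements and whose upper 64 bits are zero. */
static void store128_low64(uint32_t *dst, const uint32_t *src2) {
    uint32_t v[4] = { src2[0], src2[1], 0, 0 };
    memcpy(dst, v, sizeof v);
}

/* Old pattern: advance by 2 elements (64 bits) per iteration. Every store
   still writes 4 elements, so the last store touches dst[n] and dst[n+1]. */
static void fill_step64(uint32_t *dst, const uint32_t *src, int n) {
    for (int i = 0; i < n; i += 2)
        store128_low64(dst + i, src + i);
}

/* Fixed pattern: advance by 4 elements (128 bits) per iteration, so stores
   neither overlap each other nor run past dst[n-1] (n a multiple of 4). */
static void fill_step128(uint32_t *dst, const uint32_t *src, int n) {
    for (int i = 0; i < n; i += 4)
        store128_low64(dst + i, src + i);
}
```

With an 8-element destination followed by guard words, `fill_step64` zeroes the first two guard words, while `fill_step128` leaves them untouched and produces two valid elements followed by two zeros in each 128-bit group.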
Diff Detail
Event Timeline
SingleSource/UnitTests/Vector/AVX512VL/i64gather_32.c | ||
---|---|---|
54 | I wonder if it would be better to change to a 64-bit store _mm_storel_epi64 instead? |
SingleSource/UnitTests/Vector/AVX512VL/i64gather_32.c | ||
---|---|---|
54 | I'm not sure about this either. A 64-bit store cannot verify the upper 64 bits of the return value (even though they should be zero). I think it depends on the original design purpose of this test suite. |
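A small scalar sketch of the trade-off discussed in this inline comment, under stated assumptions: `store_low64`, `store_full128`, and `upper64_is_zero` are hypothetical helpers, not functions from the test suite or from any intrinsic header. They only model how much of a 128-bit result reaches memory with each kind of store.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 64-bit store: only the low two 32-bit lanes
   of the 128-bit result are written to memory. */
static void store_low64(uint32_t *dst, const uint32_t lanes[4]) {
    memcpy(dst, lanes, 2 * sizeof(uint32_t));
}

/* Hypothetical model of a full 128-bit store: all four lanes are written. */
static void store_full128(uint32_t *dst, const uint32_t lanes[4]) {
    memcpy(dst, lanes, 4 * sizeof(uint32_t));
}

/* Returns 1 when the upper two lanes of the stored data are zero. */
static int upper64_is_zero(const uint32_t *stored) {
    return stored[2] == 0 && stored[3] == 0;
}
```

If a result wrongly carried nonzero upper lanes, a zero-initialized destination written with `store_low64` would still pass `upper64_is_zero`, whereas `store_full128` would expose the problem. That is the reason a 128-bit store with a 128-bit loop step keeps the zeroing behaviour under test.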
LGTM
SingleSource/UnitTests/Vector/AVX512VL/i64gather_32.c | ||
---|---|---|
54 | Ok, then what you've done here seems fine. |