Intrinsics *reduce_add/mul_ps/pd have assumption that the elements in
the vector are reassociable. So we need to always assign the reassoc
flag when we call _mm_reduce_* intrinsics.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
clang/lib/CodeGen/CGBuiltin.cpp | ||
---|---|---|
13829 | I haven't looked at this part of the compiler in a long time, so I was wondering how we handle FMF scope. It looks like there is already an FMFGuard object in place -- CodeGenFunction::CGFPOptionsRAII(). So setting FMF here will not affect anything but this CreateCall(). Does that match your understanding? Should we have an extra regression test to make sure that does not change? I am imagining something like: double test_mm512_reduce_add_pd(__m512d __W, double ExtraAddOp) { double S = _mm512_reduce_add_pd(__W) + ExtraAddOp; return S; } Then we could confirm that reassoc is not applied to the fadd that follows the reduction call. | |
13829 | Currently (and we could say that this is an LLVM codegen bug), we will not generate the optimal/expected reduction with reassoc alone. I think the x86 reduction definition is implicitly assuming that -0.0 is not meaningful here, so we should add nsz too. The backend is expecting an explicit nsz on this op. Ie, I see this x86 asm currently with only reassoc: vextractf64x4 $1, %zmm0, %ymm1 vaddpd %zmm1, %zmm0, %zmm0 vextractf128 $1, %ymm0, %xmm1 vaddpd %xmm1, %xmm0, %xmm0 vpermilpd $1, %xmm0, %xmm1 vaddsd %xmm1, %xmm0, %xmm0 vxorpd %xmm1, %xmm1, %xmm1 <--- create 0.0 vaddsd %xmm1, %xmm0, %xmm0 <--- add it to the reduction result Alternatively (and I'm not sure where it is specified), we could replace the default 0.0 argument with -0.0? | |
clang/lib/Headers/avx512fintrin.h | ||
9300 | This is an existing text bug, but if we are changing this text, we might as well fix it in this patch - I'm not sure what "off" refers to here. Should that be "order"? | |
9303 | Typo: "floating-point types" | |
9304 | Also mention that sign of zero is indeterminate. We might use the LangRef text as a model for what to say here: |
Address Sanjay's comments. Thanks for the thoroughly review!
clang/lib/CodeGen/CGBuiltin.cpp | ||
---|---|---|
13829 | Confirmed by new tests. | |
13829 | I think there's no such assumption for fadd/fmul instructions. We do have it for fmin/fmax. So I think we don't need to add nsz here. | |
clang/lib/Headers/avx512fintrin.h | ||
9304 | Got it. Thanks! | |
9352 | -0.0 can fix the problem. But we don't need to add 'nsz'. We can add it if we can find a corner case. |
I haven't looked at this part of the compiler in a long time, so I was wondering how we handle FMF scope. It looks like there is already an FMFGuard object in place -- CodeGenFunction::CGFPOptionsRAII(). So setting FMF here will not affect anything but this CreateCall().
Does that match your understanding? Should we have an extra regression test to make sure that does not change?
I am imagining something like:
Then we could confirm that reassoc is not applied to the fadd that follows the reduction call.