This patch is to support transform something like
_mm512_add_ph(acc, _mm512_fmadd_pch(a, b, _mm512_setzero_ph()))
to _mm512_fmadd_pch(a, b, acc).
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
47481 | Do we really need this output here? Simplify it a bit? Something like you wrote "Combine the FADD(A, FMA(B, C, 0)) to FMA(B, C, A)"? |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
47522 | I think we can use bool IsConj, SDValue MulOp0, MulOp0 instead of CFmul. Then you don't need to create a temp mul node. | |
47529–47530 | Better to add parentheses. |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
47522 | They are temp variables rather than nodes. And compiler may likly optimize them. |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
47517–47518 | Can these be MulOp0 = Op0->getOperand(1); MulOp1 = Op0->getOperand(2); | |
47522 | I think we can then use if ((Opcode == X86ISD::VFMULC || Opcode == X86ISD::VFCMULC)) { ... return true; } if ((Opcode == X86ISD::VFMADDC || Opcode == X86ISD::VFCMADDC) ... { ... return true; } return false; | |
47533–47534 | I think we can remove the assert now. |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
47514–47517 | Why we still need this? |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
47512 | We don't need else after return. See the Lint comment. |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
47514–47517 | We need transfer FMA(A, B 0) to MUL(A, B) firstly. |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
47514–47517 | My bad. I got what's your mean. |
Can we make the title of this patch more obvious that we're talking about the f16 packed complex operations? FMA and FADD make it look generic.
I'm not sure this transform is valid without no signed zeros. fma(x, y, -0.0) is equivalent to fmul.
I agree, we may need to check -0.0 and 0.0 only when 'hasNoSignedZeros' is true?
We need to mention complex FMA in the title.
There are special rules for adding or subtracting signed zero:
- x + (+0) = x (for x different than 0)
- x + (-0) = x (for x different than 0)
- (-0) + (-0) = (-0) - (+0) = -0
- (+0) + (+0) = (+0) - (-0) = +0
- x - x = x + (-x) = +0 for any finite x unless rounding towards negative infininity then the result is -0 instead
If A * B is -0, then FMA(A, B, +0) would be (-0 + +0) or (-0 - (-0)) which should produce +0 by the last rule above with x = -0. This is different than FMUL(A,B) which we said was -0.
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
47588–47597 | This seems been changed unconsciously. | |
47616–47617 | The indentation is wrong too. The same below. |
The format in this file was wrongly formatted.
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
47538 | Yeah, I prefer to checking both in line 47582. | |
47598 | Should this be AllowContract(Op0->getFlags()) && (ISD::isBuildVectorAllZeros(Op0->getOperand(0).getNode()) && Op0->getFlags().hasNoSignedZeros()) || IsVectorAllNegativeZero(Op0->getOperand(0).getNode())) I.e, check AllowContract together with IsVectorAllNegativeZero as well. |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
47588–47597 | Sorry for this. Looks like I accidentally do some change here. | |
47598 | AllowContract will check hasNoSignedZeros(). It seems that we can only do this combination when the fast-math flag is set, No matter if the third operand is +0.0 or 0.0. |
Thanks for you review. The order of operand is changed in final commit: The addend of FMA builtin is moved from the first to the third.
Do we really need this output here? Simplify it a bit? Something like you wrote "Combine the FADD(A, FMA(B, C, 0)) to FMA(B, C, A)"?