Add ‘llvm.experimental.constrained.fma‘ Intrinsic.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/CodeGen/SelectionDAG/SelectionDAG.cpp | ||
---|---|---|
6651 | This is a ternary operation. Code below assumes unary or binary. |
Sure, will do.
lib/CodeGen/SelectionDAG/SelectionDAG.cpp | ||
---|---|---|
6651 | In Intrinsics.td, we have defined fma is a ternary operator. Here it only mutates STRICT_FMA to FMA and IsUnary is false by default. So we may not need to specify whether is unary of binary here? |
lib/CodeGen/SelectionDAG/SelectionDAG.cpp | ||
---|---|---|
6651 | Please take a look at lines 6676 - 6680 below. Do you not need to pass a 3 element list to MorphNodeTo for the FMA case? |
lib/CodeGen/SelectionDAG/SelectionDAG.cpp | ||
---|---|---|
6651 | You definitely need to add code to handle the third argument. | |
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
6015 | This code also needs to be updated to handle the case of three value operands. | |
lib/IR/Verifier.cpp | ||
3985 | The implementation of this function assumes only 1 or 2 value operands. It will need to be updated. | |
test/Feature/fp-intrinsics.ll | ||
236 | If you checked the arguments here it should reveal the problems in the code. There's also a test at llvm/tests/CodeGen/X86/fp-intrinsics.ll that carries the constrained FP intrinsics all the way through code generation. Can you add a case there for this intrinsic? |
Could you also add a use of this new intrinsic to llvm/test/Verifier/fp-intrinsics.ll?
docs/LangRef.rst | ||
---|---|---|
13035 | I'm not sure it's clear what the comment "Note that the rounding happens only once here" means in this context. The rounding mode argument provides information to the optimizer and does not have any functional effect. I hope that this is straightforward enough with the other intrinsics that the terse comments there were sufficient. In the case of the constrained fma intrinsic, it is worth mentioning that any actions the optimizer performs on the intrinsic must be consistent with the rounding behavior of an fma instruction. For instance, the optimizer cannot perform constant folding where a rounded multiply is performed followed a rounded add -- the rounding must be atomic. Perhaps that is what you intended to say here. If so, I believe a more verbose statement is needed. | |
13037 | I think it would be a good idea to discuss here the circumstances under which this intrinsic can be formed. Specifically, what is the relationship between rounding mode control and the fp-contract setting. If strict rounding behavior is required within a scope, but fusing is enabled globally within the compilation unit does the rounding requirement override the fp-contract setting? I think it should. Also, what are the expected exception semantics? If a scope is governed by strict exception behavior, how will the FP status flags be handled if a multiply and an add are fused? I believe what is required is that if either operation would have set an FP status flag then the fused operation must also set that flag, and no flag should be set by the fused operation that would not have been set by either of the two operations separately. | |
lib/Target/X86/X86ISelDAGToDAG.cpp | ||
2015 ↗ | (On Diff #110823) | Can you explain why this was necessary? I would have expected there to have been handling already in place for ISD::FMA. |
lib/Target/X86/X86ISelDAGToDAG.cpp | ||
---|---|---|
2015 ↗ | (On Diff #110823) | No it doesn't, looks like X86 doesn't handle ISD:FMA automatically unless we there is -mattr=+fma option. Without this, CodeGen/X86/fp-intrinsics.ll will fail in instruction selection. |
lib/Target/X86/X86ISelDAGToDAG.cpp | ||
---|---|---|
2015 ↗ | (On Diff #110823) | I still don't understand. What happens when -mattr=+fma is used? The CodeGen/X86/fma.ll test uses that option. This case should work in the same way. |
lib/Target/X86/X86ISelDAGToDAG.cpp | ||
---|---|---|
2015 ↗ | (On Diff #110823) | I think I made a mistake when describing the problem in my early comments. Let me rephrase and explain it there.
In fma.ll, all fma tests are *not* constrained fp operations, during the during the X86ISelLowering phase, the FMA node has been lowered to X86ISD::FMADD. So there is no ISD::FMA at this phase since it has already been changed to X86ISD::FMADD before the instruction selection starts. Please refer to the following dump. (gdb) p CurDAG->dump() SelectionDAG has 12 nodes: t0: ch = EntryToken t2: f64,ch = CopyFromReg t0, Register:f64 %vreg0 t4: f64,ch = CopyFromReg t0, Register:f64 %vreg1 t6: f64,ch = CopyFromReg t0, Register:f64 %vreg2 t12: f64 = X86ISD::FMADD t2, t4, t6 t10: ch,glue = CopyToReg t0, Register:f64 %XMM0, t12 t11: ch = X86ISD::RET_FLAG t10, TargetConstant:i32<0>, Register:f64 %XMM0, t10:1 However, for the constrained fma, we use mutateStrictFPToFP( ) function to mutate constrained_fma to normal fma, namely ISD::FMA before the instruction selction starts. The X86 backend cannot recognize the ISD::FMA, so we have to add codes to convert ISD::FMA to X86ISD::FMADD during the instruction selection. |
Update LangRef.rst: put more accurate descriptions into the constrained.fma semantic section.
docs/LangRef.rst | ||
---|---|---|
13040 | Extra period |
lib/Target/X86/X86ISelDAGToDAG.cpp | ||
---|---|---|
2015 ↗ | (On Diff #110823) | I'm still not sure I understand this, but it sounds to me like this should be happening somewhere else. Are you saying that if -mattr=+fma is not used the ISD::STRICT_FMA will be expanded to a libcall before we reach mutateStrictFpToFP and so this code will never be reached in that case? And if so, are you further saying that when -mattr=+fma is used we will reach this code only after mutateStrictFpToFp() has converted ISD::STRICT_FMA to ISD::FMA? My concern is that this is adding a generic (not constrained-specific) handler to handle the constrained case. I would much rather figure out a way to get ISD::STRICT_FMA to follow the existing path. |
lib/Target/X86/X86ISelDAGToDAG.cpp | ||
---|---|---|
2015 ↗ | (On Diff #110823) | Are you saying that if -mattr=+fma is not used the ISD::STRICT_FMA will be expanded to a libcall before we reach mutateStrictFpToFP and so this code will never be reached in that case? And if so, are you further saying that when -mattr=+fma is used we will reach this code only after mutateStrictFpToFp() has converted ISD::STRICT_FMA to ISD::FMA? --> Yes My concern is that this is adding a generic (not constrained-specific) handler to handle the constrained case. I would much rather figure out a way to get ISD::STRICT_FMA to follow the existing path. ---> I once tried to move the "mutateStrictFPToFP( )" to the LegalizeDAG phase, like the following code shows and I found it works and there is no need to add codes into X86 backend instruction selector: switch (Action) { case TargetLowering::Legal: if (Node->isStrictFPOpcode()) Node = DAG.mutateStrictFPToFP(Node); return; So once those strict fp operator haven legalized to legal, we can directly mutate them to their corresponding normal fp operator. However, here comes a problem that non-default FP (or constrained fp operations) exception behaviors are target-specific, which means we have to leave it to each sub-target selectors to handle them. So I would not suggest mutating those instructions at somewhere. What do you think? |
lib/Target/X86/X86ISelDAGToDAG.cpp | ||
---|---|---|
2015 ↗ | (On Diff #110823) | I do think the mutate needs to be done as late as possible. I'm not even entirely certain that we won't need to figure out a way to communicate the FP constraints beyond instruction selection. Would it be possible to have the mutateStrictFPToFP call (in its current location) call a target-specific hook to get a target-specific mutated node, so that we could convert directly to X86ISD::FMADD there? Also, have you considered how non-X86 architectures need to handle this case? |
Do we want to give the target any chance to use FMSUB/FNMADD/FNSUB if any of the arguments are negated?
That's exactly the kind of thing I was afraid of missing by not channeling this through the normal path that ISD::FMA takes.
I'm wondering if X86 really needs X86ISD::FMADD opcode at all. We definitely need FNMADD, FMSUB, and FNMSUB. But I don't think there's any real difference between X86ISD::FMADD and ISD::FMA.
include/llvm/CodeGen/ISDOpcodes.h | ||
---|---|---|
266 | Please keep the blank line here. | |
lib/Target/X86/X86ISelDAGToDAG.cpp | ||
2016 ↗ | (On Diff #111428) | Add a comment that this is here because STRICT_FMA is turned into FMA after legalization and DAG combine. |
Can you put this off until the patch that Craig submitted in D36983 either lands or gets rejected? If that change goes through, you should be able to remove your modifications to X86ISelDAGToDAG.cpp.
Patch update after the patch [X86] Remove X86ISD::FMADD in favor ISD::FMA has been upstreamed.
test/CodeGen/X86/fp-intrinsics.ll | ||
---|---|---|
245 | These values could be constant folded without rounding, so even though this test case works now it's testing something that we don't necessarily want to be true. At some point, we're going to want to teach optimizations to recognize these intrinsics and fold cases like this. That's why I was using 42.1 in the other tests. It's just an arbitrary value that introduces rounding errors. |
Can you revert the white space changes in the places you aren't otherwise modifying? In general, you shouldn't make formatting changes outside of the parts of the file your patch is modifying. It complicates the version control blame process without adding a lot of benefit.
Also, your latest diffs seem not to have full file context (such as you get with the -x -U99999 switch with diff). This isn't important for the current review, but it is something to keep in mind going forward.
I really appreciate your work on this patch, and I hate to seem like I'm nit-picking a lot. I just want to make sure we do things correctly. Thanks!
Hi, Andrew, no problem at all. I will provide an updated full patch for this. Thanks a lot!
How about "...returns the result of a fused-multiply-add operation on its operands."?