This is an archive of the discontinued LLVM Phabricator instance.

Add ‘llvm.experimental.constrained.fma‘ Intrinsic
ClosedPublic

Authored by wdng on Aug 4 2017, 12:10 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

wdng created this revision.Aug 4 2017, 12:10 PM
arsenm edited edge metadata.
arsenm added a subscriber: llvm-commits.

Needs tests

b-sumner added inline comments.Aug 4 2017, 12:23 PM
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
6652

This is a ternary operation. Code below assumes unary or binary.

b-sumner edited edge metadata.Aug 4 2017, 12:28 PM

An update to docs/LangRef.rst is needed.

wdng updated this revision to Diff 109821.Aug 4 2017, 2:04 PM

Add missing lit tests.

wdng added a comment.Aug 4 2017, 2:13 PM

An update to docs/LangRef.rst is needed.

Sure, will do.

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
6652

In Intrinsics.td, we have defined fma is a ternary operator. Here it only mutates STRICT_FMA to FMA and IsUnary is false by default. So we may not need to specify whether is unary of binary here?

b-sumner added inline comments.Aug 4 2017, 2:29 PM
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
6652

Please take a look at lines 6676 - 6680 below. Do you not need to pass a 3 element list to MorphNodeTo for the FMA case?

andrew.w.kaylor requested changes to this revision.Aug 4 2017, 3:09 PM
andrew.w.kaylor added inline comments.
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
6652

You definitely need to add code to handle the third argument.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6015

This code also needs to be updated to handle the case of three value operands.

lib/IR/Verifier.cpp
3985

The implementation of this function assumes only 1 or 2 value operands. It will need to be updated.

test/Feature/fp-intrinsics.ll
236

If you checked the arguments here it should reveal the problems in the code.

There's also a test at llvm/tests/CodeGen/X86/fp-intrinsics.ll that carries the constrained FP intrinsics all the way through code generation. Can you add a case there for this intrinsic?

This revision now requires changes to proceed.Aug 4 2017, 3:09 PM

Could you also add a use of this new intrinsic to llvm/test/Verifier/fp-intrinsics.ll?

wdng updated this revision to Diff 110804.Aug 11 2017, 1:46 PM
wdng edited edge metadata.
wdng marked 3 inline comments as done.

Address code reviews.

wdng updated this revision to Diff 110808.Aug 11 2017, 1:53 PM

Upload correct diff.

b-sumner added inline comments.Aug 11 2017, 2:06 PM
docs/LangRef.rst
13035

Too much cut and paste from frem

13041

rounding only once

lib/Target/X86/X86ISelDAGToDAG.cpp
2017

Did you run clang-format?

wdng updated this revision to Diff 110823.Aug 11 2017, 3:30 PM
wdng marked 4 inline comments as done.

Code changes based on Brian's comments.

wdng marked 3 inline comments as done.Aug 11 2017, 3:31 PM
arsenm added inline comments.Aug 11 2017, 3:33 PM
test/CodeGen/X86/fp-intrinsics.ll
2

Missing -check-prefix=CHECK

248

You need a separate check-label for the FMAless run line

test/Feature/fp-intrinsics.ll
244

Should also test for the other FP types

docs/LangRef.rst
13035

I'm not sure it's clear what the comment "Note that the rounding happens only once here" means in this context. The rounding mode argument provides information to the optimizer and does not have any functional effect. I hope that this is straightforward enough with the other intrinsics that the terse comments there were sufficient.

In the case of the constrained fma intrinsic, it is worth mentioning that any actions the optimizer performs on the intrinsic must be consistent with the rounding behavior of an fma instruction. For instance, the optimizer cannot perform constant folding where a rounded multiply is performed followed a rounded add -- the rounding must be atomic. Perhaps that is what you intended to say here. If so, I believe a more verbose statement is needed.

13037

I think it would be a good idea to discuss here the circumstances under which this intrinsic can be formed. Specifically, what is the relationship between rounding mode control and the fp-contract setting. If strict rounding behavior is required within a scope, but fusing is enabled globally within the compilation unit does the rounding requirement override the fp-contract setting? I think it should.

Also, what are the expected exception semantics? If a scope is governed by strict exception behavior, how will the FP status flags be handled if a multiply and an add are fused? I believe what is required is that if either operation would have set an FP status flag then the fused operation must also set that flag, and no flag should be set by the fused operation that would not have been set by either of the two operations separately.

lib/Target/X86/X86ISelDAGToDAG.cpp
2015

Can you explain why this was necessary? I would have expected there to have been handling already in place for ISD::FMA.

wdng updated this revision to Diff 111268.Aug 15 2017, 3:36 PM
wdng marked 3 inline comments as done.

Address code reviews.

wdng added inline comments.Aug 15 2017, 3:42 PM
lib/Target/X86/X86ISelDAGToDAG.cpp
2015

No it doesn't, looks like X86 doesn't handle ISD:FMA automatically unless we there is -mattr=+fma option. Without this, CodeGen/X86/fp-intrinsics.ll will fail in instruction selection.

lib/Target/X86/X86ISelDAGToDAG.cpp
2015

I still don't understand. What happens when -mattr=+fma is used?

The CodeGen/X86/fma.ll test uses that option. This case should work in the same way.

b-sumner added inline comments.Aug 15 2017, 4:15 PM
docs/LangRef.rst
13023

How about "...returns the result of a fused-multiply-add operation on its operands."?

13041

How about "The result produced is the product of the first two operands added to the third operand computed with infinite precision, and then rounded to the target precision."

wdng added inline comments.Aug 16 2017, 9:07 AM
lib/Target/X86/X86ISelDAGToDAG.cpp
2015

I think I made a mistake when describing the problem in my early comments. Let me rephrase and explain it there.

  1. Without -mattr=+fma, a FMA libcall will be generated
  2. With -mattr=+fma, we are expecting the corresponding FMA instruction to be generated.

In fma.ll, all fma tests are *not* constrained fp operations, during the during the X86ISelLowering phase, the FMA node has been lowered to X86ISD::FMADD. So there is no ISD::FMA at this phase since it has already been changed to X86ISD::FMADD before the instruction selection starts. Please refer to the following dump.

(gdb) p CurDAG->dump()
SelectionDAG has 12 nodes:
  t0: ch = EntryToken
      t2: f64,ch = CopyFromReg t0, Register:f64 %vreg0
      t4: f64,ch = CopyFromReg t0, Register:f64 %vreg1
      t6: f64,ch = CopyFromReg t0, Register:f64 %vreg2
    t12: f64 = X86ISD::FMADD t2, t4, t6
  t10: ch,glue = CopyToReg t0, Register:f64 %XMM0, t12
  t11: ch = X86ISD::RET_FLAG t10, TargetConstant:i32<0>, Register:f64 %XMM0, t10:1

However, for the constrained fma, we use mutateStrictFPToFP( ) function to mutate constrained_fma to normal fma, namely ISD::FMA before the instruction selction starts. The X86 backend cannot recognize the ISD::FMA, so we have to add codes to convert ISD::FMA to X86ISD::FMADD during the instruction selection.

wdng updated this revision to Diff 111412.Aug 16 2017, 1:31 PM

Update LangRef.rst based on comments.

wdng updated this revision to Diff 111423.Aug 16 2017, 2:10 PM

Update LangRef.rst: put more accurate descriptions into the constrained.fma semantic section.

b-sumner added inline comments.Aug 16 2017, 2:22 PM
docs/LangRef.rst
13042

Extra period

wdng updated this revision to Diff 111428.Aug 16 2017, 3:24 PM
wdng marked 2 inline comments as done.

Remove extra period. Thanks!

lib/Target/X86/X86ISelDAGToDAG.cpp
2015

I'm still not sure I understand this, but it sounds to me like this should be happening somewhere else.

Are you saying that if -mattr=+fma is not used the ISD::STRICT_FMA will be expanded to a libcall before we reach mutateStrictFpToFP and so this code will never be reached in that case? And if so, are you further saying that when -mattr=+fma is used we will reach this code only after mutateStrictFpToFp() has converted ISD::STRICT_FMA to ISD::FMA?

My concern is that this is adding a generic (not constrained-specific) handler to handle the constrained case. I would much rather figure out a way to get ISD::STRICT_FMA to follow the existing path.

wdng added inline comments.Aug 18 2017, 11:55 AM
lib/Target/X86/X86ISelDAGToDAG.cpp
2015

Are you saying that if -mattr=+fma is not used the ISD::STRICT_FMA will be expanded to a libcall before we reach mutateStrictFpToFP and so this code will never be reached in that case? And if so, are you further saying that when -mattr=+fma is used we will reach this code only after mutateStrictFpToFp() has converted ISD::STRICT_FMA to ISD::FMA?

--> Yes

My concern is that this is adding a generic (not constrained-specific) handler to handle the constrained case. I would much rather figure out a way to get ISD::STRICT_FMA to follow the existing path.

---> I once tried to move the "mutateStrictFPToFP( )" to the LegalizeDAG phase, like the following code shows and I found it works and there is no need to add codes into X86 backend instruction selector:

    switch (Action) {
    case TargetLowering::Legal:
		if (Node->isStrictFPOpcode())
        Node = DAG.mutateStrictFPToFP(Node);
      return;

So once those strict fp operator haven legalized to legal, we can directly mutate them to their corresponding normal fp operator.

However, here comes a problem that non-default FP (or constrained fp operations) exception behaviors are target-specific, which means we have to leave it to each sub-target selectors to handle them. So I would not suggest mutating those instructions at somewhere. What do you think?

lib/Target/X86/X86ISelDAGToDAG.cpp
2015

I do think the mutate needs to be done as late as possible. I'm not even entirely certain that we won't need to figure out a way to communicate the FP constraints beyond instruction selection.

Would it be possible to have the mutateStrictFPToFP call (in its current location) call a target-specific hook to get a target-specific mutated node, so that we could convert directly to X86ISD::FMADD there?

Also, have you considered how non-X86 architectures need to handle this case?

Do we want to give the target any chance to use FMSUB/FNMADD/FNSUB if any of the arguments are negated?

Do we want to give the target any chance to use FMSUB/FNMADD/FNSUB if any of the arguments are negated?

That's exactly the kind of thing I was afraid of missing by not channeling this through the normal path that ISD::FMA takes.

I'm wondering if X86 really needs X86ISD::FMADD opcode at all. We definitely need FNMADD, FMSUB, and FNMSUB. But I don't think there's any real difference between X86ISD::FMADD and ISD::FMA.

include/llvm/CodeGen/ISDOpcodes.h
266

Please keep the blank line here.

lib/Target/X86/X86ISelDAGToDAG.cpp
2016

Add a comment that this is here because STRICT_FMA is turned into FMA after legalization and DAG combine.

wdng updated this revision to Diff 112215.Aug 22 2017, 1:13 PM
wdng marked 2 inline comments as done.

Address code reviews. Thanks a lot!

Can you put this off until the patch that Craig submitted in D36983 either lands or gets rejected? If that change goes through, you should be able to remove your modifications to X86ISelDAGToDAG.cpp.

wdng added a comment.Aug 22 2017, 9:10 PM

Can you put this off until the patch that Craig submitted in D36983 either lands or gets rejected? If that change goes through, you should be able to remove your modifications to X86ISelDAGToDAG.cpp.

Sure, thanks!

wdng updated this revision to Diff 112408.Aug 23 2017, 11:28 AM

Patch update after the patch [X86] Remove X86ISD::FMADD in favor ISD::FMA has been upstreamed.

wdng updated this revision to Diff 112411.Aug 23 2017, 11:45 AM

Fixed a format issue.

test/CodeGen/X86/fp-intrinsics.ll
245

These values could be constant folded without rounding, so even though this test case works now it's testing something that we don't necessarily want to be true. At some point, we're going to want to teach optimizations to recognize these intrinsics and fold cases like this. That's why I was using 42.1 in the other tests. It's just an arbitrary value that introduces rounding errors.

Can you revert the white space changes in the places you aren't otherwise modifying? In general, you shouldn't make formatting changes outside of the parts of the file your patch is modifying. It complicates the version control blame process without adding a lot of benefit.

Also, your latest diffs seem not to have full file context (such as you get with the -x -U99999 switch with diff). This isn't important for the current review, but it is something to keep in mind going forward.

I really appreciate your work on this patch, and I hate to seem like I'm nit-picking a lot. I just want to make sure we do things correctly. Thanks!

wdng added a comment.Aug 23 2017, 12:12 PM

Can you revert the white space changes in the places you aren't otherwise modifying? In general, you shouldn't make formatting changes outside of the parts of the file your patch is modifying. It complicates the version control blame process without adding a lot of benefit.

Also, your latest diffs seem not to have full file context (such as you get with the -x -U99999 switch with diff). This isn't important for the current review, but it is something to keep in mind going forward.

I really appreciate your work on this patch, and I hate to seem like I'm nit-picking a lot. I just want to make sure we do things correctly. Thanks!

Hi, Andrew, no problem at all. I will provide an updated full patch for this. Thanks a lot!

wdng updated this revision to Diff 112461.Aug 23 2017, 3:12 PM

Address code reviews. Thanks!

This revision is now accepted and ready to land.Aug 23 2017, 4:19 PM
This revision was automatically updated to reflect the committed changes.