This is an archive of the discontinued LLVM Phabricator instance.

[FPEnv] Rough out constrained FCmp intrinsics
AbandonedPublic

Authored by cameron.mcinally on Nov 16 2018, 1:36 PM.

Details

Summary

This is a continuation of the abandoned D54121...

Here's a big patch for the constrained FCmp intrinsics. There's a lot here, so I did not include the full set of constrained FCmp intrinsics or tests yet. I intend to add those in a separate patch.

These intrinsics will require some auxiliary constructs in TableGen, which make up the bulk of this code. Those are:

  • any_bool_ty
  • LLVMScalarOrVectorSameWidth

I'll also point out some things that I'm unsure about inline. Hopefully others are more familiar with that code.

Diff Detail

Repository
rL LLVM

Event Timeline

include/llvm/CodeGen/ValueTypes.td
153

I'm not sure if it's safe to reorder these values. I had to cluster these entries together since there's a Duff's device in utils/TableGen/IntrinsicEmitter.cpp that depends on it.

Unfortunately, the new bAny entry could not be added to the end of the *Any defs list, since the last value is 255 and the container is an 8-bit unsigned int.

include/llvm/IR/Intrinsics.td
167

This should probably be more strict. We could end up with something like <2 <2 x double>> as an argument. There's some C++ code later that should catch that case, but we could probably come up with a better class to extend here.

include/llvm/Support/MachineValueType.h
219

This is the same value reordering mentioned in the comment for include/llvm/CodeGen/ValueTypes.td.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
1226

Has anyone given any thought to how we will Custom lower Strict nodes? This is punting too soon, but I wanted to get the code out there to review before the changeset became too large.

nhaehnle added inline comments.
include/llvm/IR/Intrinsics.td
167

Either way, it needs a comment that explains what it does :)

nhaehnle added inline comments.Nov 19 2018, 3:57 AM
include/llvm/IR/Intrinsics.td
167

I should say that my first thought at the name was that it's about bit width, which it's not, so maybe the name could be improved as well? And the num parameter doesn't currently go anywhere, which means that this can't work. The class should instead derive from LLVMMatchType<num>.

In general, I think going forward the back-ends should be explicitly aware of the STRICT_ FP nodes. Anyway, that's the direction my proposed SystemZ patch takes: https://reviews.llvm.org/D45576

While the above patch currently only forwards Legal STRICT_ nodes to the back-end, it would only be logical to also forward Custom STRICT_ nodes, so that the target can handle them as necessary.

So I'm not sure why you're adding special code to handle Custom STRICT_ nodes in common code -- that seems to somewhat contradict the very notion of Custom. I believe if the target does mark any STRICT_ node as Custom, it expects to handle those for itself.

(A different issue may be where a STRICT_ node is marked as Expand and therefore translated to a non-strict node -- but that non-strict node was already marked Custom by the target. We may have to handle that case specially somehow, I'm not sure ...)

Update TableGen changes, as suggested by @nhaehnle.

Also, renamed LLVMScalarOrVectorSameWidth to LLVMVectorSameLengthOrScalar. Is that easier to parse? [Note that I previously copied "Width" from existing code, but it should really be "Length".]

cameron.mcinally updated this revision to Diff 175563.EditedNov 27 2018, 1:02 PM

@uweigand, what about something like this patch? The STRICT_FSETCC isn't handled all the way through the target, but it's stubbed out for now.

The bulk of this patch is getting any_bool_ty and LLVMVectorSameLengthOrScalar constructs in place. There's still a number of constrained FCMP intrinsics to add later, so I was thinking about adding the target support with those.

@uweigand, what about something like this patch? The STRICT_FSETCC isn't handled all the way through the target, but it's stubbed out for now.

See inline comments. Otherwise, I think this makes sense ...

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
81

Just to clarify: the reason why we need to handle STRICT_FSETCC here but not any of the other STRICT_ opcodes is that only FSETCC involves an integer type (as return type), right?

lib/Target/X86/X86ISelLowering.cpp
26106

Why do you have this here? It seems that this can never be called right now (at least without something like my D45576, right?)

cameron.mcinally marked 2 inline comments as done.Dec 3 2018, 8:08 AM
cameron.mcinally added inline comments.
lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
81

That's correct.

lib/Target/X86/X86ISelLowering.cpp
26106

It actually will be called. Some SETCCs are always custom lowered on x86, even for scalars.

The other constrained vector operations are all legal or expanded to legal scalar operations. SETCC is the first operation that needs the special lowering.

Just to be pedantic, I do realize this code is punting right now. The TableGen changes in this patch are quite involved. So I'd like for us to focus on those first. I will handle the STRICT_FSETCC peculiarities in a future patch.

uweigand added inline comments.Dec 3 2018, 8:34 AM
lib/Target/X86/X86ISelLowering.cpp
26106

But still, I don't see any place where common code would even call the back-end's getOperationAction on a STRICT_FSETCC.

E.g. when calling getStrictFPOperationAction with a STRICT_FSETCC, it will actually call the back-end with SETCC (not STRICT_FSETCC). So this should already return Custom for you ...

cameron.mcinally marked an inline comment as done.Dec 3 2018, 9:37 AM
cameron.mcinally added inline comments.
lib/Target/X86/X86ISelLowering.cpp
26106

But still, I don't see any place where common code would even call the back-end's getOperationAction on a STRICT_FSETCC.

I agree with this statement. We call getStrictFPOperationAction(...) which will return Custom from SETCC (for the STRICT_FSETCC node).

E.g. when calling getStrictFPOperationAction with a STRICT_FSETCC, it will actually call the back-end with SETCC (not STRICT_FSETCC). So this should already return Custom for you ...

I also agree with this, but when the backend returns Custom for SETCC, we have to then Custom lower the STRICT_FSETCC node. We do not mutate the STRICT_FSETCC node into SETCC node in LegalizeOp(), where we call getStrictFPOperationAction().

I'm not sure I fully understand your point though (although I may admittedly be making a fundamental error). If you comment out this line in the code, you'll see the backtrace leading up to the problem.

uweigand added inline comments.Dec 3 2018, 9:59 AM
lib/Target/X86/X86ISelLowering.cpp
26106

Ah OK, now I see it.

I guess what confused me most is that you added the setOperationAction statement for STRICT_FSETCC further up. That statement really is the one that doesn't seem to have any effect.

This statement does have effect, it's just a bit weird that the reason why LowerOperation is called with a STRICT_FSETCC node is that setOperationAction for SETCC (not STRICT_FSETCC!) is Custom.

Taking a step back, I think the original thought behind the getStrictFPOperationAction logic was that if the target doesn't do anything special, all strict FP operations will just transparently work as if they were regular FP operations. But that is not (any longer?) the case if the regular FP operation is marked as Custom: there will have to be target support, or else it doesn't work at all.

The question is, is it worth trying to preseve that original thought? I guess one way to do that would be for common code to handle STRICT_ nodes where the corresponding regular node is Custom by mutating the node before calling LowerOperation on the regular node. (Later, when we add support for the target to actually handle STRICT_ nodes directly, we'd first check whether the STRICT_ node itself is marked Custom, and then pass it directly to the back-end.)

cameron.mcinally marked an inline comment as done.Dec 3 2018, 10:56 AM
cameron.mcinally added inline comments.
lib/Target/X86/X86ISelLowering.cpp
26106

I guess what confused me most is that you added the setOperationAction statement for STRICT_FSETCC further up. That statement really is the one that doesn't seem to have any effect.

You're right. I'll correct it now. Sorry about that.

Taking a step back, I think the original thought behind the getStrictFPOperationAction logic was that if the target doesn't do anything special, all strict FP operations will just transparently work as if they were regular FP operations. But that is not (any longer?) the case if the regular FP operation is marked as Custom: there will have to be target support, or else it doesn't work at all.

Yes. This is the first StrictFP node that always needs to be Custom lowered. The others were general enough to not cause any problems.

The question is, is it worth trying to preseve that original thought? I guess one way to do that would be for common code to handle STRICT_ nodes where the corresponding regular node is Custom by mutating the node before calling LowerOperation on the regular node. (Later, when we add support for the target to actually handle STRICT_ nodes directly, we'd first check whether the STRICT_ node itself is marked Custom, and then pass it directly to the back-end.)

That was what I originally did with the patch, but then changed it to this. :D

I don't know if there's a huge difference between the two implementations. Both are weird (and really not right). They mutate the StrictFP node too soon.

In any case, this problem originated from not having backend support for strict nodes in place. So either solution is temporary. When we have backend support for strict nodes, this code will go away.

Remove setting of STRICT_FSETCC Custom lowering, as @uweigand suggested.

uweigand added inline comments.Dec 4 2018, 7:36 AM
lib/Target/X86/X86ISelLowering.cpp
26106

OK, thanks for the explanation. I'm not really sure if I'd prefer this current implementation over your original one, but as you say, they're really both wrong anyway, so I wouldn't want to hold things up any further over this.

lib/Target/X86/X86ISelLowering.cpp
26106

I still have the old patch, so can resurrect it if needed.

There is one drawback from that old patch though. It really kludged up the generic code for TargetLowering::Custom in LegalizeOp(). That code is built around a switch fall through and mutating the StrictFP node was awkward.

That said, the one obvious strength of that old patch is the Custom StrictFP node lowering is pretty general. If/when more Custom lowered StrictFP nodes are seen, they should just do the right thing without additional effort.

I'll post a snippet of that patch so you can see what I mean...

Old patch:

Index: lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
===================================================================
--- lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -1198,30 +1199,36 @@
     case TargetLowering::Legal:
       LLVM_DEBUG(dbgs() << "Legal node: nothing to do\n");
       return;
-    case TargetLowering::Custom:
+    case TargetLowering::Custom: {
       LLVM_DEBUG(dbgs() << "Trying custom legalization\n");
+      SDNode *N = Node;
+
+      if (N->isStrictFPOpcode())
+        N = DAG.mutateStrictFPToFP(Node); 
+
       // FIXME: The handling for custom lowering with multiple results is
       // a complete mess.
-      if (SDValue Res = TLI.LowerOperation(SDValue(Node, 0), DAG)) {
-        if (!(Res.getNode() != Node || Res.getResNo() != 0))
+      if (SDValue Res = TLI.LowerOperation(SDValue(N, 0), DAG)) {
+        if (!(Res.getNode() != N || Res.getResNo() != 0))
           return;
 
-        if (Node->getNumValues() == 1) {
+        if (N->getNumValues() == 1) {
           LLVM_DEBUG(dbgs() << "Successfully custom legalized node\n");
           // We can just directly replace this node with the lowered value.
-          ReplaceNode(SDValue(Node, 0), Res);
+          ReplaceNode(SDValue(N, 0), Res);
           return;
         }
 
         SmallVector<SDValue, 8> ResultVals;
-        for (unsigned i = 0, e = Node->getNumValues(); i != e; ++i)
+        for (unsigned i = 0, e = N->getNumValues(); i != e; ++i)
           ResultVals.push_back(Res.getValue(i));
         LLVM_DEBUG(dbgs() << "Successfully custom legalized node\n");
-        ReplaceNode(Node, ResultVals.data());
+        ReplaceNode(N, ResultVals.data());
         return;
       }

I think I prefer the backend solution since it hides the weirdness in the LowerSTRICT_FSETCC(...) function:

SDValue X86TargetLowering::LowerSTRICT_FSETCC(SDValue Op, SelectionDAG &DAG) const {
  // Mutate away for now...
  Op = SDValue(DAG.mutateStrictFPToFP(Op.getNode()), 0);
  return LowerSETCC(Op, DAG);
}

The one problem with your "new" code is that it now forces back-ends to implement something, or else code involving constrained intrinsics will trigger internal compiler errors. It might be preferable to avoid those ...

But I also agree that your "old" code is a bit awkward. Ideally, I'd have preferred to see the default handling of strict opcodes in the absence of target support to work just via regular legalization, i.e. remove all the special-cased getStrictFPOperationAction and mutateStrictFPToFP stuff. Instead, the strict opcodes would be treated like everything else, except they all have a default action of Expand, and the associated expand action (in the same place where all the other expand actions are) would simply replace the strict opcode by the corresponding normal opcode.

If that normal opcode then has an operation action of Custom, that should just automatically work I think (since the result of an Expand rule is just normally run through legalization again).

And any target that wants to actually properly handle strict opcodes would simply overwrite the default action by either Legal (if it can directly handle the strict opcode) or Custom (if it wants LowerOperation to be called with the strict opcode to do anything special).

But that's not where we are today, and I'm not sure if it is really worthwhile to rework that code at this point ...

The one problem with your "new" code is that it now forces back-ends to implement something, or else code involving constrained intrinsics will trigger internal compiler errors. It might be preferable to avoid those ...

Ah, I see. This is a problem.

But I also agree that your "old" code is a bit awkward. Ideally, I'd have preferred to see the default handling of strict opcodes in the absence of target support to work just via regular legalization, i.e. remove all the special-cased getStrictFPOperationAction and mutateStrictFPToFP stuff. Instead, the strict opcodes would be treated like everything else, except they all have a default action of Expand, and the associated expand action (in the same place where all the other expand actions are) would simply replace the strict opcode by the corresponding normal opcode.

If that normal opcode then has an operation action of Custom, that should just automatically work I think (since the result of an Expand rule is just normally run through legalization again).

And any target that wants to actually properly handle strict opcodes would simply overwrite the default action by either Legal (if it can directly handle the strict opcode) or Custom (if it wants LowerOperation to be called with the strict opcode to do anything special).

Looking closer, I may have a better solution for the SETCC case. It won't solve the general problem of nodes needing to be Custom lowered though. Will report back soon...

Revert back to mutating a temp before Custom lowering.

Looking closer, I may have a better solution for the SETCC case. It won't solve the general problem of nodes needing to be Custom lowered though. Will report back soon...

This was a dead end...

A quiet ping on this, since this is my last working day of the year...

This problem is awkward, so I understand if there is hesitation to move forward. There is a need to get this problem solved for targets that will be slow to adopt the STRICT_ nodes though.

I'm a bit out of my depth with the tablegen type handling stuff. I have some minor comments regarding naming consistency. Everything looks reasonable to me.

Perhaps it would be worth renaming this review to make clear that it is primarily an extension to the tablegen type handling and only implements part of the strict FCMP stuff in order to have a use for the new type.

include/llvm/IR/Intrinsics.h
104

This should probably be called SameVectorWidthOrScalarArgument for consistency.

include/llvm/IR/Intrinsics.td
608

You have a problem with line wrap here.

lib/IR/Function.cpp
693

IIT_SAME_VEC_WIDTH_OR_SCALAR_ARG

include/llvm/IR/Intrinsics.h
104

Ah, so we actually discussed this in one of the first comments. My view is that Vector Width describes the bit-width of the vector. And Vector Length describes the number of elements in the vector. If everyone agrees, then it would be correct as VecLength.

include/llvm/IR/Intrinsics.h
104

Does 'length' mean the same thing that 'width' means in SameVecWidthArgument on the previous line? I'm happy with either wording as long as they're consistent.

cameron.mcinally marked an inline comment as done.Dec 29 2018, 11:42 AM
cameron.mcinally added inline comments.
include/llvm/IR/Intrinsics.h
104

It does not. E.g. <2 x i32> has a length of 2 and width of 64.

andrew.w.kaylor added inline comments.Jan 2 2019, 4:24 PM
include/llvm/IR/Intrinsics.h
104

Is that what we want? For instance, would you want to be able to compare a <4 x f32> with a <2 x f64>?

What I was expecting was something that would require the vector to have the same element type and the same number of elements.

include/llvm/IR/Intrinsics.h
104

Let's look at a hard example:

def int_experimental_constrained_fcmpeq : Intrinsic<[ llvm_anybool_ty ],
  [ LLVMVectorSameLengthOrScalar<0, llvm_anyfloat_ty>,
    LLVMVectorSameLengthOrScalar<0, llvm_anyfloat_ty>,
    llvm_metadata_ty,
    llvm_metadata_ty ]>;

We want the vector length to match between the <N x i1> return type and the <N x fp> operands. Both input arguments have the same type.

Now that I look at it, the 2nd LLVMVectorSameLengthOrScalar could probably just be a:

LLVMMatchType<1>

andrew.w.kaylor added inline comments.Jan 8 2019, 5:24 PM
include/llvm/IR/Intrinsics.h
104

Oh, I'm sorry. I read your explanation backward. For some reason I had it in my head that the vector width (128/256 or whatever) was what you were introducing.

Now I understand, and your choice makes perfect sense.

include/llvm/IR/Intrinsics.h
104

No sweat.

I'll try to address your other comments in the near future. Some higher priority things have come up on my end. Stay tuned...

Bah, the following patch poked holes in this Diff:

------------------------------------------------------------------------
r350421 | spatel | 2019-01-04 11:48:13 -0600 (Fri, 04 Jan 2019) | 18 lines

[x86] lower extracted fadd/fsub to horizontal vector math; 2nd try

The 1st try for this was at rL350369, but it caused IR-level diffs because
our cost models differentiate custom vs. legal/promote lowering. So that was
reverted at rL350373. The cost models were fixed independently at rL350403,
so this is effectively the same patch as last time.

Original commit message:
This would show up if we fix horizontal reductions to narrow as they go along,
but it's an improvement for size and/or Jaguar (fast-hops) independent of that.

We need to do this late to not interfere with other pattern matching of larger
horizontal sequences.

We can extend this to integer ops in a follow-up patch.

Differential Revision: https://reviews.llvm.org/D56011

r350421 is now custom lowering FADD and FSUB, and it breaks the generic Custom lowering changes I made to handle STRICT nodes.

Perhaps we should wait until D55506 has landed, since that would allow us to handle STRICT_FSETCC directly in the backend? I'll dig into this more and post soon...

jsji added a subscriber: jsji.Feb 25 2019, 10:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 25 2019, 10:43 AM
Herald added a subscriber: jdoerfert. · View Herald Transcript
cameron.mcinally abandoned this revision.Jun 3 2019, 8:30 AM

This is just taking up space, so abandoning it. I'll resurrect the useful parts if needed.

r350421 is now custom lowering FADD and FSUB, and it breaks the generic Custom lowering changes I made to handle STRICT nodes.

Perhaps we should wait until D55506 has landed, since that would allow us to handle STRICT_FSETCC directly in the backend? I'll dig into this more and post soon...

Now that this has landed, are you planning to have another look at this?

Now that this has landed, are you planning to have another look at this?

Thanks, Ulrich. I will dig into this.

I'm just wrapping up (I think) some other work with unary FNeg. If someone would like to pick this Diff up in the meantime, I have no objections...