Page MenuHomePhabricator

[FPEnv][WIP] Constrained FCmp intrinsics
Needs ReviewPublic

Authored by uweigand on Mon, Oct 21, 2:10 PM.

Details

Summary

This is a continuation of https://reviews.llvm.org/D54649, which was abandoned by @cameron.mcinally

The major differences to Cameron's approach include:

  • I did not actually require any TableGen changes, but was able to represent the overloaded intrinsic types using existing mechanisms. This solution looks correct to me, but if I'm missing anything here, please let me know ...
  • This adds a full set of compare intrinsics for all comparison codes.
  • I've added (mostly) complete SystemZ back-end support to actually generate correct code for all of them -- this is also to verify that the use of Custom expansion of strict operations actually allows the back-end to do what it needs to do.

This patch is still not complete, but I wanted to show it now to ask for feedback.

Some areas that definitely need more work are:

  • Vector type legalization of invalid vector types involving STRICT_FSETCC doesn't work yet.
  • Signalling comparisons are not supported yet.
  • We may also need strict versions of SELECT_CC and BR_CC. I haven't implemented those yet since they aren't really necessary for SystemZ, but some other platforms may require them.
  • The X86 back-end changes are incomplete, they're right now the bare minimum to keep Cameron's two test cases working.
  • There are many SETCC-related transformations and optimizations in the SelectionDAG codegen. Many of these are also valid for the strict FP case. I haven't found a simple way yet to write those transformations so that can easily handle both cases without duplicating a lot of boilerplate code ... Right now the patch implements the minimum set of those to get the SystemZ test cases to work (at least generating the same code as the non-strict versions for extremely simple use cases).

Diff Detail

Event Timeline

uweigand created this revision.Mon, Oct 21, 2:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptMon, Oct 21, 2:11 PM

I did not actually require any TableGen changes, but was able to represent the overloaded intrinsic types using existing mechanisms. This solution looks correct to me, but if I'm missing anything here, please let me know ...

That sounds right. LLVMScalarOrSameVectorWidth did not exist when D54649 was proposed, IINM.

This adds a full set of compare intrinsics for all comparison codes.

There's really no reasonable alternative, correct? I believe that negating < and > would get messy wrt signals when a NaN is present. I don't think that savings is worth the effort (e.g. readability).

The X86 back-end changes are incomplete, they're right now the bare minimum to keep Cameron's two test cases working.

I'd be ok with leaving these changes to a later patch. Your call...

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7073 ↗(On Diff #225954)

These DAG.getNode changes could probably be broken out into a standalone Diff.

lib/Target/SystemZ/SystemZISelLowering.cpp
2171 ↗(On Diff #225954)

This is probably worthy of a comment.

2655 ↗(On Diff #225954)

I notice the static_cast<SystemZISD::NodeType> has been dropped. Is this ok?

lib/Target/SystemZ/SystemZISelLowering.h
248 ↗(On Diff #225954)

preoduce typo could use a fix.

uweigand updated this revision to Diff 227715.Mon, Nov 4, 8:55 AM
uweigand changed the repository for this revision from rL LLVM to rG LLVM Github Monorepo.

I did not actually require any TableGen changes, but was able to represent the overloaded intrinsic types using existing mechanisms. This solution looks correct to me, but if I'm missing anything here, please let me know ...

That sounds right. LLVMScalarOrSameVectorWidth did not exist when D54649 was proposed, IINM.

OK, thanks for your review!

This adds a full set of compare intrinsics for all comparison codes.

There's really no reasonable alternative, correct? I believe that negating < and > would get messy wrt signals when a NaN is present. I don't think that savings is worth the effort (e.g. readability).

Agreed.

The X86 back-end changes are incomplete, they're right now the bare minimum to keep Cameron's two test cases working.

I'd be ok with leaving these changes to a later patch. Your call...

OK, I'm dropping the X86-related changes for now.

Due to the switch to the github monorepo, it seems all the inline comments disappeared -- sorry for that.

Anyway, I believe I've anyway addressed them all:

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:7073
These DAG.getNode changes could probably be broken out into a standalone Diff.

Agreed. Since this is just a simple refactoring, I've checked this in as 664f84e246478db82be2871f36fd1a523d9f2731.

lib/Target/SystemZ/SystemZISelLowering.cpp:2171
This is probably worthy of a comment.

Agreed. Added comment.

lib/Target/SystemZ/SystemZISelLowering.cpp:2655
I notice the static_cast<SystemZISD::NodeType> has been dropped. Is this ok?

Since the return value of that function was just plain int anyway, that static_cast didn't really have any point as far as I can see ...

lib/Target/SystemZ/SystemZISelLowering.h:248
preoduce typo could use a fix.

Checked in separately as d4a7855b68d4d53f121209333d5f2796731ab1f5

simoll added a subscriber: simoll.Mon, Nov 4, 11:26 AM
pengfei added a subscriber: pengfei.Mon, Nov 4, 5:12 PM

This looks like a very good start. I've talked to @pengfei about the x86 backend support for this. As long as x86 doesn't fail horribly, I'd be OK with the x86 work being done in a separate patch that depends on this one.

Do you have a plan for handling quiet versus signaling predicates?

Hmm, something is weird (with my Diff, at least). It looks like changes are listed twice (e.g. lib/IR/Verifier.cpp and llvm/lib/IR/Verifier.cpp). Maybe this is an SVN->GIT side-effect and the patch needs a rebase?

Hmm, something is weird (with my Diff, at least). It looks like changes are listed twice (e.g. lib/IR/Verifier.cpp and llvm/lib/IR/Verifier.cpp). Maybe this is an SVN->GIT side-effect and the patch needs a rebase?

In the current diff I only see the new llvm/lib/... paths. It is indeed the case that the diff I submitted originally had the SVN-style lib/... paths, but I updated it with a new patch after the GIT transition.

Oh, it's gone now. Maybe it was a temporary hiccup. Will review this afternoon...

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
1420

This throws away strict-ness for the expanded scalar selects. Is that deliberate?

I would have expected this to produce scalar STRICT_FSETCC nodes instead.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
3832

This else can be removed.

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
2514

Missing space before ?.

uweigand marked an inline comment as done.Tue, Nov 12, 8:26 AM

Thanks for the review!

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
1420

I don't see how this throws away strict-ness. Note that we do produce scalar STRICT_FSETCC nodes (that's in the ScalarOp = DAG.getNode(Op->getOpcode() ...) line). The additional select just ensures the (integral) output has the same value that a vector SETCC would have, instead of the (difference) value that a scalar SETCC has. Since this is completely an integer operation, strict-ness doesn't apply.

cameron.mcinally marked an inline comment as done.Tue, Nov 12, 8:35 AM
cameron.mcinally added inline comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
1420

Oh, I see it now. I misread this change...

simoll added inline comments.Tue, Nov 12, 8:58 AM
llvm/include/llvm/IR/Intrinsics.td
750–791

Out of curiosity: what is your motivation to have one intrinsic per fcmp predicate instead of, say, an i8 immarg?

uweigand marked an inline comment as done.Tue, Nov 12, 9:14 AM
uweigand added inline comments.
llvm/include/llvm/IR/Intrinsics.td
750–791

I didn't really make the decision, just took it over from the original patch. But I agree it seems to make sense simply from a perspective of being explicit in the IR: if there's just a random numeric value, it would make the IR harder to read/write.

llvm/include/llvm/IR/Intrinsics.td
750–791

It would be hard to read -- unless we wrote an IR decoding routine somewhere. That seems weird though.

llvm/include/llvm/IR/Intrinsics.td
750–791

This is why we used metadata strings for the constraint arguments. I'm still not entirely satisfied with that. Would it be reasonable to make these arguments constants and add a custom printer and parser for these intrinsics? What about token arguments?

simoll added inline comments.Wed, Nov 13, 1:40 AM
llvm/include/llvm/IR/Intrinsics.td
750–791

This is why we used metadata strings for the constraint arguments. I'm still not entirely satisfied with that.

You could invent some fpenv operand bundle for fpexcept, fpround, as in:

%p = llvm.experimental.constrained.fcmp_ugt(%x, %y) [ "fpenv"(metadata !"fpround.tonearest", metadata !"fpexcept.strict") ]

Omission would then imply the default fp env (i'd like that for the VP fp intrinsics).

From https://llvm.org/docs/LangRef.html#operand-bundles

An operand bundle at a call site cannot change the implementation of the called function.

I am not sure what to make out of that statement. I guess it should be ok to also use operand bundles also for the comparison predicate? Otw, it might be possible to change to OpBound specification to allow different implementations.

Personally, i'd prefer we had a more flexible attribute mechanism, so we could tag-on all of these things:

%p = fcmp %x, %y cmp(ugt) fpround(tonearest) fpexcept(strict)
llvm/include/llvm/IR/Intrinsics.td
750–791

I like the idea of operand bundles for the constraint arguments. They seem naturally separate. I'm less comfortable with the comparison predicate being in the operand bundle because it is essential to the semantic meaning of the operation.

I started an experiment yesterday using token arguments instead of metadata to represent the rounding mode and exception behavior. I can easily shift the tokens to an operand bundle. I think we can use a token argument for the fcmp intrinsic predicate. I'll post an RFC to the mailing list soon to solicit opinions about using token arguments this way.

Regarding more flexible attributes, the problem we were trying to solve with the intrinsics was introducing new semantics in a way that wouldn't break existing optimizations that didn't know about the new semantics. Once these intrinsics are well established and well supported we might be able to merge their behavior back into the instructions.

llvm/include/llvm/IR/Intrinsics.td
750–791

I've just uploaded D70261, showing what I have in mind for implementing the FP constraints as tokens in an operand bundle. I think the fcmp predicates could be represented by a new token type in the same way, but as I said previously I'd prefer the predicate to be a proper argument to the constrained fcmp intrinsic.