This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Canonicalize clamp of float types to minmax in fast mode.
ClosedPublic

Authored by a.elovikov on May 15 2017, 4:25 AM.

Details

Summary

This commit allows matchSelectPattern to recognize clamp of float
arguments in the presence of FMF the same way as already done for
integers.

This case is a little different though. With integers, given the
min/max pattern is recognized, DAGBuilder starts selecting MIN/MAX
"automatically". That is not the case for float, because for them only
full FMINNAN/FMINNUM/FMAXNAN/FMAXNUM ISD nodes exist and they do care
about NaNs. On the other hand, some backends (e.g. X86) have only
FMIN/FMAX nodes that do not care about NaNS and the former NAN/NUM
nodes are illegal thus selection is not happening. So I decided to do
such kind of transformation in IR (InstCombiner) instead of
complicating the logic in the backend.

Diff Detail

Repository
rL LLVM

Event Timeline

a.elovikov created this revision.May 15 2017, 4:25 AM
jmolloy edited edge metadata.May 15 2017, 4:38 AM

Hi,

On the other hand, some backends (e.g. X86) have only

FMIN/FMAX nodes that do not care about NaNS and the former NAN/NUM
nodes are illegal thus selection is not happening.

For my own reference so I can more effectively review this, could you please explain this a bit more? What is the defined behaviour for such instructions when given a NaN?

a.elovikov added a comment.EditedMay 15 2017, 4:52 AM

Hi,

On the other hand, some backends (e.g. X86) have only

FMIN/FMAX nodes that do not care about NaNS and the former NAN/NUM
nodes are illegal thus selection is not happening.

For my own reference so I can more effectively review this, could you please explain this a bit more? What is the defined behaviour for such instructions when given a NaN?

I was referring to the following comment from the X86ISelLowering.cpp (combineFMinNumFMaxNum routine):

// There are 4 possibilities involving NaN inputs, and these are the required
// outputs:
//                   Op1
//               Num     NaN
//            ----------------
//       Num  |  Max  |  Op0 |
// Op0        ----------------
//       NaN  |  Op1  |  NaN |
//            ----------------
//
// The SSE FP max/min instructions were not designed for this case, but rather
// to implement:
//   Min = Op1 < Op0 ? Op1 : Op0
//   Max = Op1 > Op0 ? Op1 : Op0
//
// So they always return Op0 if either input is a NaN. However, we can still
// use those instructions for fmaxnum by selecting away a NaN input.

Right; that's X86's strange not-a-maxnum-not-a-maxnan behaviour where it just selects the zeroth operand.

You should be able to select that for either of FMAXNAN or FMAXNUM though when the "nnan" flag is set on the instruction, which should be the case in fast-math mode?

Right; that's X86's strange not-a-maxnum-not-a-maxnan behaviour where it just selects the zeroth operand.

You should be able to select that for either of FMAXNAN or FMAXNUM though when the "nnan" flag is set on the instruction, which should be the case in fast-math mode?

Yes, but they're not selected at all in SelectionDAGBuilder::visitSelect because ISD::FMINNAN/FMINNUM are not isOperationLegalOrCustom for X86 target.
And stating that they are would be incorrect. So the only way to get X86::FMIN/FMAX would be to fix combineSelect in X86ISelLowering.cpp in the same way that I've done for IR. I believe that such manipulation is better be done on IR than on SDNodes

Hi,

And stating that they are would be incorrect.

Why wouldn't you state that they are custom lowered, and copy the generic bailout code from SelectionDAGBuilder.cpp if "nnan" isn't set?

James

a.elovikov added a comment.EditedMay 22 2017, 3:32 AM

Why wouldn't you state that they are custom lowered, and copy the generic bailout code from SelectionDAGBuilder.cpp if "nnan" isn't set?

If I understand correctly, making that "custom" would have negative impact in the presence of nans because cmp/select sequence would be turned to min/max+cmp(for nans)+select. And I have not seen the cases where lowering is set to custom due to global fast math flags. Is that really an option?

One option can be to check for nnan flags in *both* SelectionDagBuilder and combineSelect but that would add an implicit dependency between them which does not look right.

spatel edited edge metadata.Jun 14 2017, 10:10 AM

If there are no objections, I would add the regression tests with the current output as a preliminary step, so we document the current behavior.

@jmolloy - do you plan to continue the review of this patch?

Adding more potential reviewers.

@a.elovikov - I didn't find your name in bugzilla; you might be interested in:
https://bugs.llvm.org/show_bug.cgi?id=33467

a.elovikov updated this revision to Diff 103044.EditedJun 19 2017, 8:05 AM
Re-base onto D34350 where the current status of the tests was captured.

That did not work as intended, sorry for the noise. :(

Now, properly re-base onto D34350 where the current status of the tests was captured.

efriedma added inline comments.Jun 23 2017, 5:09 PM
llvm/lib/Analysis/ValueTracking.cpp
3948 ↗(On Diff #103051)

Missing close-paren? Also, MaxMin and MinMax are a little confusing; just write out one, I think

3951 ↗(On Diff #103051)

"inversed" isn't a word.

3957 ↗(On Diff #103051)

"Assume success" is a little confusing... maybe something more like "Return the LHS and RHS early".

4206 ↗(On Diff #103051)

Need an explanation here describing why signed zeros matter. (Something about the sign of the result?)

llvm/test/Transforms/InstCombine/clamp-to-minmax.ll
146 ↗(On Diff #103051)

"dit"?

  • Update comments as requested in the review.
a.elovikov marked 4 inline comments as done.Jun 26 2017, 3:33 AM
a.elovikov added inline comments.
llvm/lib/Analysis/ValueTracking.cpp
3957 ↗(On Diff #103051)

This is copy-paste from the integer case in function matchMinMax below. Do you want me to change both places? Also, in the integer case the intent was pretty clear for me with no confusion.

Properly update keeping the dependence on D34350.

efriedma added inline comments.Jun 26 2017, 5:12 PM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
1329 ↗(On Diff #103921)

This change isn't guarded for floating-point operations in particular... does this have an impact on integer clamp? If not, why?

a.elovikov added inline comments.Jun 27 2017, 2:00 AM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
1329 ↗(On Diff #103921)

Yes, I missed that. This does have impact on integer clamp but I think it won't matter for end-to-end compilation because before this change integer cases were already handled during ISel. Anyway, there are two options here:

  • Limit the transformation for floating-point operations only, like this
  • Leave the code as is and modify title/summary + add tests to document the change in the behavior for integers.

I would prefer to go with the first approach and do the change for integers in a separate review, if necessary. Unless someone has a preference for the second approach, of course.

efriedma edited edge metadata.Jun 27 2017, 11:05 AM

I'm okay with limiting it to FP vectors for now.

I'm okay with limiting it to FP vectors for now.

I believe for scalars it can be beneficial too, similar to cmov vs. branch.

Err, sorry, typo. Yes, it's fine to limit it to FP scalars and vectors.

Limit clamp->min/max transformation to float types only.

a.elovikov marked an inline comment as done.Jun 27 2017, 2:04 PM
efriedma accepted this revision.Jun 27 2017, 2:15 PM

LGTM.

This revision is now accepted and ready to land.Jun 27 2017, 2:15 PM
This revision was automatically updated to reflect the committed changes.
a.elovikov reopened this revision.Jul 3 2017, 1:28 AM
This revision is now accepted and ready to land.Jul 3 2017, 1:28 AM
a.elovikov updated this revision to Diff 105035.Jul 3 2017, 1:29 AM
  • Fix UBSan error after D33186/r306525.

Hi @efriedma, original commit caused the failure under UBSan. The latest uploaded revision fixes it.
Can you please take a look and say if it's ok?

Thanks,
Andrei

a.elovikov requested review of this revision.Jul 3 2017, 1:32 AM
a.elovikov edited edge metadata.
efriedma added inline comments.Jul 3 2017, 11:28 AM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
1326 ↗(On Diff #105035)

This is very suspicious.

The way the implementation of matchSelectPattern is written, CastOp is uninitialized if the type of the compare matches the type of the select; otherwise, it's set to whatever cast we looked through. That cast might not be a cast which changes the size of the type; it could bit a BitCast/FPToUI/etc.

I'd like to see a few testcases which cover the situations where we insert casts.

a.elovikov added inline comments.Jul 3 2017, 12:55 PM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
1326 ↗(On Diff #105035)

Yes, this exactly the case causing UB - passing uninitialized object by value to createCast Corresponding argument is not even used in that function because of the early return when the types are equal.

I believe the case with cast should be already covered by earlier tests as I did not touch that part - will try to find them tomorrow, or add new ones if they're missing.

a.elovikov added inline comments.Jul 3 2017, 12:59 PM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
1326 ↗(On Diff #105035)

One more note - neither Aslan nor MSan catches this, only UBSan because the value of the uninitialized argument to createCast is not used.

efriedma added inline comments.Jul 3 2017, 1:05 PM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
1326 ↗(On Diff #105035)

I believe the case with cast should be already covered by earlier tests as I did not touch that part - will try to find them tomorrow, or add new ones if they're missing.

I'm specifically concerned about cases where transforming a floating-point clamp requires inserting a cast.

a.elovikov updated this revision to Diff 105234.Jul 5 2017, 2:23 AM

Compare types instead of their sizes to decide if CastInst is needed.

Note, that casts of "inner" min/max prohibit clamp pattern recognition
because the casts are being looked by one level of use only, and to
get the original value being clamped two levels of look through are
required.

I'd like to see a testcase where CastOp is fptosi.

a.elovikov updated this revision to Diff 108446.EditedJul 27 2017, 4:22 AM

Rebased to updated D35002 where fptosi tests were added. That also included re-base to the current master.
No changes in the instcombine results for these added tests though.

This revision is now accepted and ready to land.Jul 27 2017, 11:19 AM
This revision was automatically updated to reflect the committed changes.