This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] canonicalize minnum/maxnum with 'nnan' to fcmp+select
AbandonedPublic

Authored by spatel on May 20 2019, 1:07 PM.

Details

Summary

This is an atypical proposal because we are converting 1 instruction (intrinsic) into 2 instructions, but this allows better analysis via ValueTracking's matchSelectPattern().

Also, see this discussion for a larger potential optimization that would be made easier to match if we canonicalize to fcmp+select:
https://bugs.llvm.org/show_bug.cgi?id=37403#c6

There's a related codegen problem here:
https://bugs.llvm.org/show_bug.cgi?id=34149
We've already solved that for x86 at least, but this IR transform should help more generally.

Finally, I don't know of any current downside to converting this to fcmp+select, but it's another case where we should have FMF on the select:
D61917

Diff Detail

Event Timeline

spatel created this revision.May 20 2019, 1:07 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 20 2019, 1:07 PM

Typo 'nan' in the description

spatel retitled this revision from [InstCombine] canonicalize minnum/maxnum with 'nan' to fcmp+select to [InstCombine] canonicalize minnum/maxnum with 'nnan' to fcmp+select.May 20 2019, 1:37 PM

I'm not sure I see the ValueTracking advantage here. Can you add a test where this matters? I wouldn't expect matchSelectPattern to give any advantage here.

Also last I knew, the use of matchSelectPattern in SelectionDAGBuilder was somewhat buggy. It leaves behind a dead use, so in the initial combine, it breaks hasOneUse checks and prevents necessary early matching. There was also a second problem that I don't remember the details of

I'm not sure I see the ValueTracking advantage here. Can you add a test where this matters? I wouldn't expect matchSelectPattern to give any advantage here.

The follow-on transform that I'm thinking about is mentioned in PR37403 - if we have an offset or some other math op on both operands of a min/max, we should be able to optimize that away:

define double @foo(double, double, double) local_unnamed_addr #0 {
  %4 = fadd fast double %2, %0
  %5 = fadd fast double %2, %1
  %6 = tail call fast double @llvm.maxnum.f64(double %4, double %5)
  ret double %6
}
-->
define double @foo1(double, double, double) local_unnamed_addr #0 {
  %3 = tail call fast double @llvm.maxnum.f64(double %0, double %1)
  %4 = fadd fast double %2, %3
  ret double %4
}

We also miss this with integer types:

define i32 @via_icmp(i32 %z, i32 %x, i32 %y) {
  %xz = add nsw i32 %x, %z   ; the adds must be 'nsw'
  %yz = add nsw i32 %y, %z
  %c = icmp slt i32 %xz, %yz
  %r = select i1 %c, i32 %xz, i32 %yz
  ret i32 %r
}

So there's potentially some overlap in how we match those patterns if we use matchSelectPattern(). Now, when I commented on that problem, we had codegen problems too, but I think we've overcome them. So I'd be fine with canonicalizing in the other direction:
fcmp+select --> minnum/maxnum
...but I think that's gated on D61917 because we need 'nsz'.

Also last I knew, the use of matchSelectPattern in SelectionDAGBuilder was somewhat buggy. It leaves behind a dead use, so in the initial combine, it breaks hasOneUse checks and prevents necessary early matching. There was also a second problem that I don't remember the details of

Ok, I didn't know there was a problem there. Do we have a bug report with an example?

Also last I knew, the use of matchSelectPattern in SelectionDAGBuilder was somewhat buggy. It leaves behind a dead use, so in the initial combine, it breaks hasOneUse checks and prevents necessary early matching. There was also a second problem that I don't remember the details of

Ok, I didn't know there was a problem there. Do we have a bug report with an example?

No. IIRC I ran into this during r344914. The FIXME for the regression on reduction_fast_min_pattern_v4f16 in test/CodeGen/AMDGPU/reduction.ll

lebedev.ri resigned from this revision.Jun 21 2019, 10:58 AM

Not great with floating-point, don't think i can provide any feedback here, sorry.

cameron.mcinally added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
2310

FYI that a draft of IEEE-754 2018 defines the (+/-)0.0 case:

minimumNumber(x, y) is x if x < y, y if y < x, and the number if one operand is a number and the other is a NaN. For this operation, −0 compares less than +0. If x = y and signs are the same it is either x or y. If both operands are NaNs, a quiet NaN is returned, according to 6.2. If either operand is a signaling NaN, an invalid operation exception is signaled, but unless both operands
are NaNs, the signaling NaN is otherwise ignored and not converted to a quiet NaN as stated in 6.2 for other operations.
lebedev.ri marked an inline comment as done.Jun 21 2019, 11:27 AM
lebedev.ri added a subscriber: lebedev.ri.
lebedev.ri added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
2310

LLVM internals mostly follow LLVM LangRef, which currently does not state that;
i guess you'd first want to clarify LangRef.
https://llvm.org/docs/LangRef.html#llvm-maxnum-intrinsic

spatel abandoned this revision.Jun 21 2019, 11:35 AM

Sorry - I should have abandoned this. There's some patch spaghetti here.

I think we will go with the more conventional approach of 'less-instructions-is-more-canonical':
D62414 - so the opposite direction of this.

That patch could get committed now because it doesn't have a direct dependency on anything, but I was hoping for a re-ruling on D63214 which was itself dependent on the now abandoned D63294.