This is an archive of the discontinued LLVM Phabricator instance.

[ValueTracking][InstSimplify] Support min/max selects in computeConstantRange()
ClosedPublic

Authored by nikic on Mar 18 2019, 12:42 PM.

Details

Summary

Add support for min/max flavor selects in computeConstantRange(), which allows us to fold comparisons of a min/max against a constant in InstSimplify. This was suggested by @spatel as an alternative approach to D59378. I've also added the infinite looping test from that revision here.

My main concern here would be compile time, I don't really have a handle on how expensive matchSelectPattern() is and whether it's appropriate to use in code used by InstSimplify.

Diff Detail

Repository
rL LLVM

Event Timeline

nikic created this revision.Mar 18 2019, 12:42 PM
lebedev.ri added inline comments.Mar 18 2019, 12:58 PM
llvm/lib/Analysis/ValueTracking.cpp
5663 ↗(On Diff #191148)

matchSelectPattern() is the likely-costly bit.
(though it seems to be only a lot of pattern-matching, nothing more costly)

The rest of setLimitsForSelect() is rater cheap.

5686 ↗(On Diff #191148)

No SPF_ABS / SPF_[N]ABS ?

nikic marked an inline comment as done.Mar 18 2019, 1:13 PM
nikic added inline comments.
llvm/lib/Analysis/ValueTracking.cpp
5686 ↗(On Diff #191148)

Those wouldn't match because I'm bailing out early if I don't have a Constant operand.

However, I just found that InstructionSimplify already has handling for ABS/NABS in simplifyICmpWithAbsNabs(). I could just move that in here, save the duplicate matchSelectPattern() call, and also remove the performance concern (because we were already calling matchSelectPattern(), not it's just in a more general place...)

nikic updated this revision to Diff 191159.Mar 18 2019, 1:28 PM

Move simplifyICmpWithAbsNabs() logic into computeConstantRange(). This eliminates the duplicate matchSelectPattern() call, and is more general (e.g. we can also benefit from this for the constant range based overflow checks).

lebedev.ri added inline comments.Mar 18 2019, 1:28 PM
llvm/lib/Analysis/ValueTracking.cpp
5686 ↗(On Diff #191148)

Sounds good (maybe new-pm will magically make it easier to cache such things?),
but i'm not sure how that would look.
How about you reverse the order of patches slightly, first do "just move that in here",
and base this patch ontop of that new patch?

If we want to be conservative for compile-time, we could use the simple pattern matchers (m_SMax...) rather than the heavier ValueTracking call. But we don't have that option currently for abs/nabs.
It seems like we've accomplished the improvement for almost no extra cost though, so that's probably a moot point now.

llvm/test/Transforms/InstSimplify/cmp_of_min_max.ll
4 ↗(On Diff #191159)

The min and max names for these tests seem inverted (ugt -> umax)?

lebedev.ri accepted this revision.Mar 18 2019, 1:50 PM
lebedev.ri marked an inline comment as done.

Nice, no performance concerns anymore, looks reasonable to me.

In general, i really like that this isn't being solved by yet another monstrous
pattern-matching (well, excluding the internals of matchSelectPattern():)).

Is the test coverage sufficient?

llvm/lib/Analysis/ValueTracking.cpp
5686 ↗(On Diff #191148)

Posted before refreshing page, nvm.

5656 ↗(On Diff #191159)

Select isn't descriptive enough.
I'd go with SelectPattern, as this is not about the select inst, but about some specific pattern flavors.

5689–5692 ↗(On Diff #191159)

Might be best not to hope that the other APInt is set to zero already?

This revision is now accepted and ready to land.Mar 18 2019, 1:50 PM
nikic marked 2 inline comments as done.Mar 18 2019, 2:01 PM
nikic added inline comments.
llvm/lib/Analysis/ValueTracking.cpp
5689–5692 ↗(On Diff #191159)

This is the general convention in this code (the other two setLimitsFor functions): For unsigned ranges only one side is specified, because Lower/Upper are pre-initialized to a full unsigned range. For signed both have to be specified.

llvm/test/Transforms/InstSimplify/cmp_of_min_max.ll
4 ↗(On Diff #191159)

You're right, min/max should be swapped in these tests.

This revision was automatically updated to reflect the committed changes.
nikic reopened this revision.Mar 18 2019, 3:29 PM

Reverted this in rL356424 due to AMDGPU smed3.ll and umed3.ll test failures. Likely some constants will need to be adjusted to avoid always-true/false conditions.

This revision is now accepted and ready to land.Mar 18 2019, 3:29 PM
nikic planned changes to this revision.Mar 18 2019, 3:29 PM
nikic updated this revision to Diff 191358.Mar 19 2019, 11:19 AM

Update AMDGPU tests, update for more InstCombine test changes.

This revision is now accepted and ready to land.Mar 19 2019, 11:19 AM

I've updated the AMDGPU tests to check for min(max(x, 17), 12) being constant folded to 12. The reverse test for min(max(x, 12), 17) already exists, so it doesn't seem like adjusting those constants to prevent the constant folding would make sense.

Additionally there are two more InstCombine test changes, in sub.ll and fold-minmax.ll. This is because computeConstantRange() has an additional user since D59386, so we now also see this showing up in nuw inference.

lebedev.ri added inline comments.Mar 19 2019, 11:23 AM
llvm/test/CodeGen/AMDGPU/smed3.ll
47 ↗(On Diff #191358)

It's impossible to tell from these partial check-lines, but the diff looks like an improvement?

arsenm added a subscriber: arsenm.Mar 19 2019, 11:26 AM
arsenm added inline comments.
llvm/test/CodeGen/AMDGPU/smed3.ll
48 ↗(On Diff #191358)

The test should be fixed so this is still testing what it intends to

nikic marked an inline comment as done.Mar 19 2019, 12:02 PM
nikic added inline comments.
llvm/test/CodeGen/AMDGPU/smed3.ll
48 ↗(On Diff #191358)

From what I understand, the test is here to ensure that a "clamp" pattern with swapped constants is not incorrectly compiled to a med3 instruction. After this change this kind of pattern is always constant folded by InstSimplify. From the debug log, in this case the simplification is caused by an EarlyCSE pass added by the AMDGPU target. I haven't found a way to disable it.

Do you have any ideas/pointers on how this pattern can be preserved all the way down to isel?

lebedev.ri added inline comments.Mar 19 2019, 12:16 PM
llvm/test/CodeGen/AMDGPU/smed3.ll
48 ↗(On Diff #191358)

The *test* needs to be modified, supposedly.
I.e. just write another test that would still produce the "original output".

nikic marked an inline comment as done.Mar 19 2019, 1:03 PM
nikic added inline comments.
llvm/test/CodeGen/AMDGPU/smed3.ll
48 ↗(On Diff #191358)

Yes, I'm just not sure how to do that. What we need is to get something like

  t36: i32 = umax t44, Constant:i32<17>
t40: i32 = umin t36, Constant:i32<12>

during isel without anything optimizing it away before that. For the unsigned case I can use something like

define amdgpu_kernel void @v_test_umed3_r_i_i_constant_order_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %aptr) #1 {
  %tid = call i32 @llvm.amdgcn.workitem.id.x()
  %gep0 = getelementptr i32, i32 addrspace(1)* %aptr, i32 %tid
  %outgep = getelementptr i32, i32 addrspace(1)* %out, i32 %tid
  %a = load i32, i32 addrspace(1)* %gep0

  %icmp0 = icmp ugt i32 %a, 17
  %i0 = select i1 %icmp0, i32 %a, i32 17

  %sat = call i32 @llvm.uadd.sat(i32 %i0, i32 -13)

  store i32 %sat, i32 addrspace(1)* %outgep
  ret void
}
declare i32 @llvm.uadd.sat(i32, i32)

because uaddsat x, -13 will be expanded using a umax x, 12 as first instruction on AMDGPU. I don't have any ideas for the signed case though. And relying on expansion details seems like a bad idea anyway.

lebedev.ri added inline comments.Mar 19 2019, 1:17 PM
llvm/test/CodeGen/AMDGPU/smed3.ll
48 ↗(On Diff #191358)

Can this be replaced with MIR test perhaps? Then you shouldn't run the whole pipeline.

nikic marked an inline comment as done.Mar 20 2019, 11:48 AM
nikic added inline comments.
llvm/test/CodeGen/AMDGPU/smed3.ll
48 ↗(On Diff #191358)

Not really familiar with MIR, but I think it is only created after isel, so it would be too late in the pipeline for this test.

lebedev.ri added inline comments.Mar 21 2019, 5:26 AM
llvm/test/CodeGen/AMDGPU/smed3.ll
48 ↗(On Diff #191358)

Hmm, i'm out of the ideas presently then.
@arsenm any suggestions as to *how* to harden the test?

arsenm added inline comments.Mar 25 2019, 7:50 AM
llvm/test/CodeGen/AMDGPU/smed3.ll
48 ↗(On Diff #191358)

I've used intrinsics that aren't constant folded by instsimplify before for this purpose. llvm.amdgcn.groupstaticsize is expanded too late for this, and I'm not sure what other good candidates are there. Overall I find the fact that llc runs InstSimplify problematic for this reason

Maybe you could try adding an extra use of something that's dead which will be deleted during lowering? llvm.amdgcn.icmp with an out of bounds last operand might work

So something like
%use = call i64 @llvm.amdgcn.icmp.i64(i32 %select, i32 %0, i32 99999)
store i64 %use, ....
nikic updated this revision to Diff 192308.Mar 26 2019, 11:47 AM
nikic added a reviewer: arsenm.

Add -amdgpu-scalar-ir-passes flag and split off the problematic umed3/smed3 tests into a separate file with -amdgpu-scalar-ir-passes=false. It's not possible to disable this for the whole umed3/smed3 tests because the scalar IR passes are needed for the tests that involve inlining.

lebedev.ri added inline comments.Mar 29 2019, 8:46 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
161–165 ↗(On Diff #192308)

@arsenm are you ok with this workaround?

nikic requested review of this revision.Apr 4 2019, 8:11 AM
nikic added a comment.EditedApr 6 2019, 8:37 AM

Is it possible to land this with the AMDGPU tests marked as XFAIL, so that a maintainer can adjust them at their convenience?

arsenm accepted this revision.Apr 6 2019, 9:26 AM

LGTM, although if -O0 works that would be preferable to introducing a new pass control flag

This revision is now accepted and ready to land.Apr 6 2019, 9:26 AM
This revision was automatically updated to reflect the committed changes.
nikic added a comment.Apr 7 2019, 10:35 AM

LGTM, although if -O0 works that would be preferable to introducing a new pass control flag

Gave this a try, with -O0 I end up with both v_max/v_min operands being non-immediate.