This is an archive of the discontinued LLVM Phabricator instance.


Differential D59506

[ValueTracking][InstSimplify] Support min/max selects in computeConstantRange()
ClosedPublic

Authored by nikic on Mar 18 2019, 12:42 PM.


Details

Reviewers

spatel
tejohnson
lebedev.ri
arsenm

Commits

rG3db93ac5d6d0: Reapply [ValueTracking] Support min/max selects in computeConstantRange()
rL357870: Reapply [ValueTracking] Support min/max selects in computeConstantRange()
rG106f0cdefb02: [ValueTracking][InstSimplify] Support min/max selects in computeConstantRange()
rL356415: [ValueTracking][InstSimplify] Support min/max selects in computeConstantRange()

Summary

Add support for min/max flavor selects in computeConstantRange(), which allows us to fold comparisons of a min/max against a constant in InstSimplify. This was suggested by @spatel as an alternative approach to D59378. I've also added the infinite looping test from that revision here.

My main concern here would be compile time: I don't really have a handle on how expensive matchSelectPattern() is, or whether it's appropriate to use it in code that InstSimplify relies on.
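To make the idea concrete, here is a small self-contained sketch (not LLVM code) of what the summary describes. A min/max of a value against a constant C bounds the result on one side, and a comparison whose constant lies entirely outside that bound folds to a constant. The names `Flavor`, `rangeOfMinMax`, `foldGreaterThan` and `foldLessThan` are hypothetical; in the patch itself the bounds are set in `setLimitsForSelectPattern()` on `APInt`s and consumed through `ConstantRange`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical model: bounds of min/max(x, C) as a closed interval [Lo, Hi].
// Unsigned flavors live in [0, 2^32 - 1] and signed flavors in
// [INT32_MIN, INT32_MAX]; int64_t is wide enough to hold either domain.
enum class Flavor { UMin, UMax, SMin, SMax };

struct Interval {
  int64_t Lo, Hi;
};

Interval rangeOfMinMax(Flavor F, int64_t C) {
  constexpr int64_t UMaxVal = std::numeric_limits<uint32_t>::max();
  constexpr int64_t SMinVal = std::numeric_limits<int32_t>::min();
  constexpr int64_t SMaxVal = std::numeric_limits<int32_t>::max();
  switch (F) {
  case Flavor::UMin: return {0, C};       // umin(x, C) <= C
  case Flavor::UMax: return {C, UMaxVal}; // umax(x, C) >= C
  case Flavor::SMin: return {SMinVal, C}; // smin(x, C) <= C
  case Flavor::SMax: return {C, SMaxVal}; // smax(x, C) >= C
  }
  return {0, 0}; // not reached for valid flavors
}

// "icmp ugt/sgt V, K": true if every value in R is > K, false if none is,
// std::nullopt if R contains values on both sides of K.
std::optional<bool> foldGreaterThan(Interval R, int64_t K) {
  if (R.Lo > K) return true;
  if (R.Hi <= K) return false;
  return std::nullopt;
}

// "icmp ult/slt V, K", analogously.
std::optional<bool> foldLessThan(Interval R, int64_t K) {
  if (R.Hi < K) return true;
  if (R.Lo >= K) return false;
  return std::nullopt;
}
```

For example, `foldGreaterThan(rangeOfMinMax(Flavor::UMax, 10), 9)` yields `true`, mirroring `test_umax1` in cmp_of_min_max.ll, while comparing the same umax against 11 cannot be decided from the range alone.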

Diff Detail

Repository: rL LLVM

Event Timeline

nikic created this revision. Mar 18 2019, 12:42 PM

Herald added a project: Restricted Project. Mar 18 2019, 12:42 PM

Herald added subscribers: llvm-commits, jdoerfert, hiraditya.

nikic mentioned this in D59378: [InstCombine] Prevent icmp transform that can cause inf loop if part of min/max. Mar 18 2019, 12:45 PM

lebedev.ri added inline comments. Mar 18 2019, 12:58 PM

llvm/lib/Analysis/ValueTracking.cpp
5663 ↗	(On Diff #191148)	`matchSelectPattern()` is the likely-costly bit (though it seems to be only a lot of pattern-matching, nothing more costly). The rest of `setLimitsForSelect()` is rather cheap.
5686 ↗	(On Diff #191148)	No `SPF_ABS` / `SPF_[N]ABS` ?

nikic marked an inline comment as done. Mar 18 2019, 1:13 PM

nikic added inline comments.

llvm/lib/Analysis/ValueTracking.cpp
5686 ↗	(On Diff #191148)	Those wouldn't match because I'm bailing out early if I don't have a Constant operand. However, I just found that InstructionSimplify already has handling for ABS/NABS in simplifyICmpWithAbsNabs(). I could just move that in here, save the duplicate matchSelectPattern() call, and also remove the performance concern (because we were already calling matchSelectPattern(), now it's just in a more general place...)

Move simplifyICmpWithAbsNabs() logic into computeConstantRange(). This eliminates the duplicate matchSelectPattern() call, and is more general (e.g. we can also benefit from this for the constant range based overflow checks).
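For the abs/nabs part, the bound visible in the final diff is the one for SPF_NABS: Lower = SignedMin and Upper = 1, i.e. the result of -abs(X) is <= 0. A quick way to convince oneself of that bound is an exhaustive check on a hypothetical 8-bit model of the `x < 0 ? x : -x` select; `nabs8` and `nabsRangeHolds` are illustrative names, not LLVM functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit model of the NABS select flavor: "x < 0 ? x : -x".
// Negating a non-negative int8_t never overflows, so no wrapping occurs.
int8_t nabs8(int8_t X) {
  return X < 0 ? X : static_cast<int8_t>(-X);
}

// Exhaustively check that nabs8 stays inside [INT8_MIN, 0], i.e. the
// half-open range [SignedMin, 1) that the diff assigns to SPF_NABS.
// (The lower bound holds trivially for int8_t, so only the upper is tested.)
bool nabsRangeHolds() {
  for (int V = INT8_MIN; V <= INT8_MAX; ++V)
    if (nabs8(static_cast<int8_t>(V)) >= 1)
      return false;
  return true;
}
```

Note that INT8_MIN maps to itself, so the range really does extend all the way down to SignedMin.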

lebedev.ri added inline comments. Mar 18 2019, 1:28 PM

llvm/lib/Analysis/ValueTracking.cpp
5686 ↗	(On Diff #191148)	Sounds good (maybe new-pm will magically make it easier to cache such things?), but I'm not sure how that would look. How about you reverse the order of patches slightly: first do "just move that in here", and base this patch on top of that new patch?

If we want to be conservative for compile-time, we could use the simple pattern matchers (m_SMax...) rather than the heavier ValueTracking call. But we don't have that option currently for abs/nabs.
It seems like we've accomplished the improvement for almost no extra cost though, so that's probably a moot point now.

llvm/test/Transforms/InstSimplify/cmp_of_min_max.ll
4 ↗	(On Diff #191159)	The min and max names for these tests seem inverted (ugt -> umax)?
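To make "min/max flavor select" concrete, here is a toy classifier for the basic shape that both the simple matchers (m_SMax and friends) and matchSelectPattern() recognize: an icmp whose two operands are also the two arms of the select. This is a hypothetical sketch over integer IDs standing in for SSA values, not LLVM's matcher; the real matchSelectPattern() also recognizes further shapes, such as the abs/nabs flavors.

```cpp
#include <cassert>
#include <cstdint>

// Toy model (hypothetical, not LLVM's matchSelectPattern()): classify
//   %c = icmp Pred A, B
//   %s = select %c, TrueV, FalseV
// as a min/max "flavor" when the select arms are the compared operands.
enum class Pred { UGT, UGE, ULT, ULE, SGT, SGE, SLT, SLE };
enum class Flavor { Unknown, UMin, UMax, SMin, SMax };

// Operands are identified by small integer IDs standing in for SSA values.
Flavor classify(Pred P, int A, int B, int TrueV, int FalseV) {
  bool Straight = (TrueV == A && FalseV == B); // select (A pred B), A, B
  bool Swapped = (TrueV == B && FalseV == A);  // select (A pred B), B, A
  if (!Straight && !Swapped)
    return Flavor::Unknown;
  switch (P) {
  case Pred::UGT: case Pred::UGE:
    return Straight ? Flavor::UMax : Flavor::UMin;
  case Pred::ULT: case Pred::ULE:
    return Straight ? Flavor::UMin : Flavor::UMax;
  case Pred::SGT: case Pred::SGE:
    return Straight ? Flavor::SMax : Flavor::SMin;
  case Pred::SLT: case Pred::SLE:
    return Straight ? Flavor::SMin : Flavor::SMax;
  }
  return Flavor::Unknown;
}
```

With `%n` as ID 0 and the constant 10 as ID 1, `select (icmp ugt %n, 10), %n, 10` classifies as UMax, which is also why the tests using that shape should carry "umax" rather than "umin" in their names.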

Base on D59511.

nikic added a parent revision: D59511: [ValueTracking][InstSimplify] Move abs handling into computeConstantRange(); NFC. Mar 18 2019, 1:49 PM

Nice, no performance concerns anymore, looks reasonable to me.

In general, I really like that this isn't being solved by yet another monstrous pattern-matching (well, excluding the internals of matchSelectPattern() :)).

Is the test coverage sufficient?

llvm/lib/Analysis/ValueTracking.cpp
5686 ↗	(On Diff #191148)	Posted before refreshing page, nvm.
5656 ↗	(On Diff #191159)	`Select` isn't descriptive enough. I'd go with `SelectPattern`, as this is not about the `select` inst, but about some specific pattern flavors.
5689–5692 ↗	(On Diff #191159)	Might be best not to hope that the other APInt is set to zero already?

This revision is now accepted and ready to land. Mar 18 2019, 1:50 PM

nikic marked 2 inline comments as done. Mar 18 2019, 2:01 PM

nikic added inline comments.

llvm/lib/Analysis/ValueTracking.cpp
5689–5692 ↗	(On Diff #191159)	This is the general convention in this code (the other two setLimitsFor functions): For unsigned ranges only one side is specified, because Lower/Upper are pre-initialized to a full unsigned range. For signed both have to be specified.
llvm/test/Transforms/InstSimplify/cmp_of_min_max.ll
4 ↗	(On Diff #191159)	You're right, min/max should be swapped in these tests.
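The Lower/Upper convention nikic describes can be illustrated with a hypothetical 8-bit model of a wrapping half-open range [Lower, Upper). In this sketch both bounds start out equal and Lower == Upper is read as the full range, which is an assumption standing in for the pre-initialization mentioned above. An unsigned umin then only needs Upper and an unsigned umax only needs Lower (the range wraps past 255), whereas the signed flavors must set both bounds. `Range8`, `limitsFor`, `sminUpperOnly` and `applyMinMax` are illustrative names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit model of a wrapping half-open range [Lower, Upper).
// Both bounds start out equal, and Lower == Upper is read as "full range".
struct Range8 {
  uint8_t Lower = 0;
  uint8_t Upper = 0;

  bool contains(uint8_t X) const {
    if (Lower == Upper)
      return true;                    // full range
    if (Lower < Upper)
      return Lower <= X && X < Upper; // ordinary interval
    return X >= Lower || X < Upper;   // interval wraps around 255 -> 0
  }
};

enum class Flavor { UMin, UMax, SMin, SMax };

// Same shape as the switch in the diff: unsigned flavors set one bound,
// signed flavors set both. Signed constants are passed as their 8-bit
// two's-complement bit patterns (0x80 is SignedMin, 0x7F is SignedMax).
Range8 limitsFor(Flavor F, uint8_t C) {
  Range8 R; // full range
  switch (F) {
  case Flavor::UMin: R.Upper = uint8_t(C + 1); break;
  case Flavor::UMax: R.Lower = C; break;
  case Flavor::SMin: R.Lower = 0x80; R.Upper = uint8_t(C + 1); break;
  case Flavor::SMax: R.Lower = C; R.Upper = uint8_t(0x7F + 1); break;
  }
  return R;
}

// Illustrates why signed flavors need both bounds: SMin with only Upper
// set yields [0, C + 1) and wrongly excludes negative results.
Range8 sminUpperOnly(uint8_t C) {
  Range8 R;
  R.Upper = uint8_t(C + 1);
  return R;
}

uint8_t applyMinMax(Flavor F, uint8_t X, uint8_t C) {
  int8_t SX = int8_t(X), SC = int8_t(C);
  switch (F) {
  case Flavor::UMin: return X < C ? X : C;
  case Flavor::UMax: return X > C ? X : C;
  case Flavor::SMin: return uint8_t(SX < SC ? SX : SC);
  case Flavor::SMax: return uint8_t(SX > SC ? SX : SC);
  }
  return 0;
}

// Exhaustively check that every result of F(x, C) lies in limitsFor(F, C).
bool limitsAreSound(Flavor F) {
  for (int C = 0; C < 256; ++C)
    for (int X = 0; X < 256; ++X)
      if (!limitsFor(F, uint8_t(C)).contains(applyMinMax(F, uint8_t(X), uint8_t(C))))
        return false;
  return true;
}
```

One convenient consequence in this model: when C + 1 wraps (for example umin with 255), the bounds become equal again and the result is correctly treated as the full range.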

nikic mentioned this in rL356408: [InstSimplify] Add additional icmp of min/max tests; NFC. Mar 18 2019, 2:18 PM

nikic mentioned this in rL356409: [ValueTracking][InstSimplify] Move abs handling into computeConstantRange(); NFC.

nikic mentioned this in rG05baa9ee1aea: [InstSimplify] Add additional icmp of min/max tests; NFC.

nikic mentioned this in rGf89343bc47dc: [ValueTracking][InstSimplify] Move abs handling into computeConstantRange(); NFC.

Closed by commit rL356415: [ValueTracking][InstSimplify] Support min/max selects in computeConstantRange() (authored by nikic). Mar 18 2019, 2:34 PM

This revision was automatically updated to reflect the committed changes.

Reverted this in rL356424 due to AMDGPU smed3.ll and umed3.ll test failures. Likely some constants will need to be adjusted to avoid always-true/false conditions.

This revision is now accepted and ready to land. Mar 18 2019, 3:29 PM

nikic planned changes to this revision. Mar 18 2019, 3:29 PM

Update AMDGPU tests, update for more InstCombine test changes.

This revision is now accepted and ready to land. Mar 19 2019, 11:19 AM

Herald added subscribers: nhaehnle, jvesely. Mar 19 2019, 11:19 AM

I've updated the AMDGPU tests to check for min(max(x, 17), 12) being constant folded to 12. The reverse test for min(max(x, 12), 17) already exists, so it doesn't seem like adjusting those constants to prevent the constant folding would make sense.
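The folding is easy to confirm by brute force: max(x, 17) is always at least 17, so a following min with 12 always returns 12, whereas min(max(x, 12), 17) is a genuine clamp (the med3 pattern). A hypothetical 8-bit model of both constant orders, unsigned and signed (`clampU`, `clampS`, `isConstantU`, `isConstantS` are illustrative names):

```cpp
#include <cassert>
#include <cstdint>

// 8-bit stand-in for the i32 tests: y = max(x, Lo); r = min(y, Hi).
// Unsigned and signed variants model the umed3 / smed3 patterns.
uint8_t clampU(uint8_t X, uint8_t Lo, uint8_t Hi) {
  uint8_t Y = X > Lo ? X : Lo; // umax(x, Lo)
  return Y < Hi ? Y : Hi;      // umin(y, Hi)
}

int8_t clampS(int8_t X, int8_t Lo, int8_t Hi) {
  int8_t Y = X > Lo ? X : Lo;  // smax(x, Lo)
  return Y < Hi ? Y : Hi;      // smin(y, Hi)
}

// True if the clamp yields the same value for every possible input.
bool isConstantU(uint8_t Lo, uint8_t Hi) {
  for (int X = 0; X < 256; ++X)
    if (clampU(uint8_t(X), Lo, Hi) != clampU(0, Lo, Hi))
      return false;
  return true;
}

bool isConstantS(int8_t Lo, int8_t Hi) {
  for (int X = -128; X < 128; ++X)
    if (clampS(int8_t(X), Lo, Hi) != clampS(0, Lo, Hi))
      return false;
  return true;
}
```

With Lo = 17 and Hi = 12 both variants are constant (always 12), while Lo = 12 and Hi = 17 are not, which is the distinction the smed3/umed3 tests rely on.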

Additionally there are two more InstCombine test changes, in sub.ll and minmax-fold.ll. This is because computeConstantRange() has an additional user since D59386, so we now also see this showing up in nuw inference.
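In sub.ll's test66, the operand of the add is umin(x, -101). As an unsigned i32, -101 is 4294967195, so adding 1 can never wrap and the add can be marked nuw. The arithmetic behind that inference, as a minimal sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: if X is known to satisfy X <= UpperBound (unsigned),
// "add X, K" cannot wrap when UpperBound + K still fits in 32 bits.
// The sum is computed in 64 bits so the check itself cannot overflow.
bool addCannotWrapUnsigned(uint32_t UpperBound, uint32_t K) {
  return static_cast<uint64_t>(UpperBound) + K <= UINT32_MAX;
}

// umin(x, C) is always <= C, so C is the upper bound to use above.
uint32_t umin32(uint32_t X, uint32_t C) { return X < C ? X : C; }
```

`addCannotWrapUnsigned(uint32_t(-101), 1)` holds, whereas without the umin the only available bound is UINT32_MAX, for which adding 1 may wrap.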

lebedev.ri added inline comments. Mar 19 2019, 11:23 AM

llvm/test/CodeGen/AMDGPU/smed3.ll
47 ↗	(On Diff #191358)	It's impossible to tell from these partial check-lines, but the diff looks like an improvement?

arsenm added a subscriber: arsenm. Mar 19 2019, 11:26 AM

arsenm added inline comments.

llvm/test/CodeGen/AMDGPU/smed3.ll
48 ↗	(On Diff #191358)	The test should be fixed so this is still testing what it intends to

nikic marked an inline comment as done. Mar 19 2019, 12:02 PM

nikic added inline comments.

llvm/test/CodeGen/AMDGPU/smed3.ll
48 ↗	(On Diff #191358)	From what I understand, the test is here to ensure that a "clamp" pattern with swapped constants is not incorrectly compiled to a `med3` instruction. After this change this kind of pattern is always constant folded by InstSimplify. From the debug log, in this case the simplification is caused by an EarlyCSE pass added by the AMDGPU target. I haven't found a way to disable it. Do you have any ideas/pointers on how this pattern can be preserved all the way down to isel?

lebedev.ri added inline comments. Mar 19 2019, 12:16 PM

llvm/test/CodeGen/AMDGPU/smed3.ll
48 ↗	(On Diff #191358)	The test needs to be modified, supposedly. I.e. just write another test that would still produce the "original output".

nikic marked an inline comment as done. Mar 19 2019, 1:03 PM

nikic added inline comments.

llvm/test/CodeGen/AMDGPU/smed3.ll

48 ↗	(On Diff #191358)	Yes, I'm just not sure how to do that. What we need is to get something like

  t36: i32 = umax t44, Constant:i32<17>
  t40: i32 = umin t36, Constant:i32<12>

during isel without anything optimizing it away before that. For the unsigned case I can use something like

define amdgpu_kernel void @v_test_umed3_r_i_i_constant_order_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %aptr) #1 {
  %tid = call i32 @llvm.amdgcn.workitem.id.x()
  %gep0 = getelementptr i32, i32 addrspace(1)* %aptr, i32 %tid
  %outgep = getelementptr i32, i32 addrspace(1)* %out, i32 %tid
  %a = load i32, i32 addrspace(1)* %gep0

  %icmp0 = icmp ugt i32 %a, 17
  %i0 = select i1 %icmp0, i32 %a, i32 17

  %sat = call i32 @llvm.uadd.sat(i32 %i0, i32 -13)

  store i32 %sat, i32 addrspace(1)* %outgep
  ret void
}
declare i32 @llvm.uadd.sat(i32, i32)

because uaddsat x, -13 will be expanded using a umin x, 12 as the first instruction on AMDGPU. I don't have any ideas for the signed case though. And relying on expansion details seems like a bad idea anyway.
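The trick depends on how unsigned saturating add is expanded. A common expansion, and one consistent with the isel nodes quoted above, is uaddsat(a, b) = umin(a, ~b) + b: with b = -13 we have ~b = 12, which yields the desired umin with 12 after the umax with 17. The sketch below checks that identity exhaustively on 8-bit values; the function names are hypothetical, and this is not claimed to be the exact AMDGPU lowering.

```cpp
#include <cassert>
#include <cstdint>

// Reference unsigned saturating add on 8 bits.
uint8_t uaddSat(uint8_t A, uint8_t B) {
  unsigned Sum = unsigned(A) + unsigned(B);
  return Sum > 0xFF ? 0xFF : uint8_t(Sum);
}

// Expansion in terms of umin: uaddsat(a, b) == umin(a, ~b) + b (mod 2^8).
// If a <= ~b, then a + b <= 255 and nothing saturates; otherwise the umin
// picks ~b and ~b + b == 255, the saturated value.
uint8_t uaddSatViaUMin(uint8_t A, uint8_t B) {
  uint8_t NotB = uint8_t(~B);
  uint8_t Min = A < NotB ? A : NotB;
  return uint8_t(Min + B);
}

bool expansionHolds() {
  for (int A = 0; A < 256; ++A)
    for (int B = 0; B < 256; ++B)
      if (uaddSat(uint8_t(A), uint8_t(B)) != uaddSatViaUMin(uint8_t(A), uint8_t(B)))
        return false;
  return true;
}
```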

lebedev.ri added inline comments. Mar 19 2019, 1:17 PM

llvm/test/CodeGen/AMDGPU/smed3.ll
48 ↗	(On Diff #191358)	Can this be replaced with an MIR test perhaps? Then you wouldn't need to run the whole pipeline.

nikic marked an inline comment as done. Mar 20 2019, 11:48 AM

nikic added inline comments.

llvm/test/CodeGen/AMDGPU/smed3.ll
48 ↗	(On Diff #191358)	Not really familiar with MIR, but I think it is only created after isel, so it would be too late in the pipeline for this test.

lebedev.ri added inline comments. Mar 21 2019, 5:26 AM

llvm/test/CodeGen/AMDGPU/smed3.ll
48 ↗	(On Diff #191358)	Hmm, I'm out of ideas at the moment then. @arsenm any suggestions as to how to harden the test?

arsenm added inline comments. Mar 25 2019, 7:50 AM

llvm/test/CodeGen/AMDGPU/smed3.ll
48 ↗	(On Diff #191358)	I've used intrinsics that aren't constant folded by InstSimplify before for this purpose. llvm.amdgcn.groupstaticsize is expanded too late for this, and I'm not sure what other good candidates there are. Overall I find the fact that llc runs InstSimplify problematic for this reason. Maybe you could try adding an extra use of something that's dead, which will be deleted during lowering? llvm.amdgcn.icmp with an out-of-bounds last operand might work. So something like `%use = call i64 @llvm.amdgcn.icmp.i64(i32 %select, i32 %0, i32 99999)` and `store i64 %use, ...`.

Add -amdgpu-scalar-ir-passes flag and split off the problematic umed3/smed3 tests into a separate file with -amdgpu-scalar-ir-passes=false. It's not possible to disable this for the whole umed3/smed3 tests because the scalar IR passes are needed for the tests that involve inlining.

Herald added a subscriber: wdng. Mar 26 2019, 11:47 AM

lebedev.ri added inline comments. Mar 29 2019, 8:46 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
161–165 ↗	(On Diff #192308)	@arsenm are you ok with this workaround?

nikic requested review of this revision. Apr 4 2019, 8:11 AM

Is it possible to land this with the AMDGPU tests marked as XFAIL, so that a maintainer can adjust them at their convenience?

LGTM, although if -O0 works that would be preferable to introducing a new pass control flag

This revision is now accepted and ready to land. Apr 6 2019, 9:26 AM

Closed by commit rL357870: Reapply [ValueTracking] Support min/max selects in computeConstantRange() (authored by nikic). Apr 7 2019, 10:20 AM

This revision was automatically updated to reflect the committed changes.

In D59506#1457375, @arsenm wrote:

LGTM, although if -O0 works that would be preferable to introducing a new pass control flag

Gave this a try, with -O0 I end up with both v_max/v_min operands being non-immediate.

Revision Contents

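Path	Size
llvm/trunk/lib/Analysis/ValueTracking.cpp	23 lines
llvm/trunk/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp	11 lines
llvm/trunk/test/CodeGen/AMDGPU/med3-no-simplify.ll	48 lines
llvm/trunk/test/CodeGen/AMDGPU/smed3.ll	19 lines
llvm/trunk/test/CodeGen/AMDGPU/umed3.ll	19 lines
llvm/trunk/test/Transforms/InstCombine/minmax-fold.ll	31 lines
llvm/trunk/test/Transforms/InstCombine/sub.ll	2 lines
llvm/trunk/test/Transforms/InstSimplify/cmp_of_min_max.ll	20 lines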

Diff 194062

llvm/trunk/lib/Analysis/ValueTracking.cpp

[5,683 lines not shown; hunk context: static void setLimitsForSelectPattern(const SelectInst &SI, APInt &Lower, …]

if (R.Flavor == SelectPatternFlavor::SPF_NABS) {		if (R.Flavor == SelectPatternFlavor::SPF_NABS) {
// The result of -abs(X) is <= 0.		// The result of -abs(X) is <= 0.
Lower = APInt::getSignedMinValue(BitWidth);		Lower = APInt::getSignedMinValue(BitWidth);
Upper = APInt(BitWidth, 1);		Upper = APInt(BitWidth, 1);
return;		return;
}		}

// TODO Handle min/max flavors.		const APInt *C;
		if (!match(LHS, m_APInt(C)) && !match(RHS, m_APInt(C)))
		return;

		switch (R.Flavor) {
		case SPF_UMIN:
		Upper = *C + 1;
		break;
		case SPF_UMAX:
		Lower = *C;
		break;
		case SPF_SMIN:
		Lower = APInt::getSignedMinValue(BitWidth);
		Upper = *C + 1;
		break;
		case SPF_SMAX:
		Lower = *C;
		Upper = APInt::getSignedMaxValue(BitWidth) + 1;
		break;
		default:
		break;
		}
}		}

ConstantRange llvm::computeConstantRange(const Value *V, bool UseInstrInfo) {		ConstantRange llvm::computeConstantRange(const Value *V, bool UseInstrInfo) {
assert(V->getType()->isIntOrIntVectorTy() && "Expected integer instruction");		assert(V->getType()->isIntOrIntVectorTy() && "Expected integer instruction");

const APInt *C;		const APInt *C;
if (match(V, m_APInt(C)))		if (match(V, m_APInt(C)))
return ConstantRange(*C);		return ConstantRange(*C);
[21 lines not shown]

llvm/trunk/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

[163 lines not shown; hunk context: static cl::opt<bool> EnableSIModeRegisterPass( …]
cl::Hidden);		cl::Hidden);

// Option is used in lit tests to prevent deadcoding of patterns inspected.		// Option is used in lit tests to prevent deadcoding of patterns inspected.
static cl::opt<bool>		static cl::opt<bool>
EnableDCEInRA("amdgpu-dce-in-ra",		EnableDCEInRA("amdgpu-dce-in-ra",
cl::init(true), cl::Hidden,		cl::init(true), cl::Hidden,
cl::desc("Enable machine DCE inside regalloc"));		cl::desc("Enable machine DCE inside regalloc"));

		static cl::opt<bool> EnableScalarIRPasses(
		"amdgpu-scalar-ir-passes",
		cl::desc("Enable scalar IR passes"),
		cl::init(true),
		cl::Hidden);

extern "C" void LLVMInitializeAMDGPUTarget() {		extern "C" void LLVMInitializeAMDGPUTarget() {
// Register the target		// Register the target
RegisterTargetMachine<R600TargetMachine> X(getTheAMDGPUTarget());		RegisterTargetMachine<R600TargetMachine> X(getTheAMDGPUTarget());
RegisterTargetMachine<GCNTargetMachine> Y(getTheGCNTarget());		RegisterTargetMachine<GCNTargetMachine> Y(getTheGCNTarget());

PassRegistry *PR = PassRegistry::getPassRegistry();		PassRegistry *PR = PassRegistry::getPassRegistry();
initializeR600ClauseMergePassPass(*PR);		initializeR600ClauseMergePassPass(*PR);
initializeR600ControlFlowFinalizerPass(*PR);		initializeR600ControlFlowFinalizerPass(*PR);
[485 lines not shown; hunk context: void AMDGPUPassConfig::addIRPasses()]

if (TM.getOptLevel() > CodeGenOpt::None) {		if (TM.getOptLevel() > CodeGenOpt::None) {
addPass(createInferAddressSpacesPass());		addPass(createInferAddressSpacesPass());
addPass(createAMDGPUPromoteAlloca());		addPass(createAMDGPUPromoteAlloca());

if (EnableSROA)		if (EnableSROA)
addPass(createSROAPass());		addPass(createSROAPass());

		if (EnableScalarIRPasses)
addStraightLineScalarOptimizationPasses();		addStraightLineScalarOptimizationPasses();

if (EnableAMDGPUAliasAnalysis) {		if (EnableAMDGPUAliasAnalysis) {
addPass(createAMDGPUAAWrapperPass());		addPass(createAMDGPUAAWrapperPass());
addPass(createExternalAAWrapperPass([](Pass &P, Function &,		addPass(createExternalAAWrapperPass([](Pass &P, Function &,
AAResults &AAR) {		AAResults &AAR) {
if (auto *WrapperPass = P.getAnalysisIfAvailable<AMDGPUAAWrapperPass>())		if (auto *WrapperPass = P.getAnalysisIfAvailable<AMDGPUAAWrapperPass>())
AAR.addAAResult(WrapperPass->getResult());		AAR.addAAResult(WrapperPass->getResult());
}));		}));
[9 lines not shown; hunk context: void AMDGPUPassConfig::addIRPasses()]
// %1 = add %b, %a		// %1 = add %b, %a
//		//
// and		// and
//		//
// %0 = shl nsw %a, 2		// %0 = shl nsw %a, 2
// %1 = shl %a, 2		// %1 = shl %a, 2
//		//
// but EarlyCSE can do neither of them.		// but EarlyCSE can do neither of them.
if (getOptLevel() != CodeGenOpt::None)		if (getOptLevel() != CodeGenOpt::None && EnableScalarIRPasses)
addEarlyCSEOrGVNPass();		addEarlyCSEOrGVNPass();
}		}

void AMDGPUPassConfig::addCodeGenPrepare() {		void AMDGPUPassConfig::addCodeGenPrepare() {
if (TM->getTargetTriple().getArch() == Triple::amdgcn)		if (TM->getTargetTriple().getArch() == Triple::amdgcn)
addPass(createAMDGPUAnnotateKernelFeaturesPass());		addPass(createAMDGPUAnnotateKernelFeaturesPass());

if (TM->getTargetTriple().getArch() == Triple::amdgcn &&		if (TM->getTargetTriple().getArch() == Triple::amdgcn &&
[315 lines not shown]

llvm/trunk/test/CodeGen/AMDGPU/med3-no-simplify.ll

				; RUN: llc -march=amdgcn -verify-machineinstrs -amdgpu-scalar-ir-passes=false < %s | FileCheck -check-prefix=GCN -check-prefix=SICIVI -check-prefix=SI %s
				; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs -amdgpu-scalar-ir-passes=false < %s | FileCheck -check-prefix=GCN -check-prefix=SICIVI -check-prefix=VI %s
				; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-flat-for-global -verify-machineinstrs -amdgpu-scalar-ir-passes=false < %s | FileCheck -check-prefix=GCN -check-prefix=GFX9 %s

				; These tests are split out from umed3.ll and smed3.ll and use the
				; -amdgpu-scalar-ir-passes=false flag, because InstSimplify would constant
				; fold these functions otherwise.

				declare i32 @llvm.amdgcn.workitem.id.x() nounwind readnone

				; GCN-LABEL: {{^}}v_test_umed3_r_i_i_constant_order_i32:
				; GCN: v_max_u32_e32 v{{[0-9]+}}, 17, v{{[0-9]+}}
				; GCN: v_min_u32_e32 v{{[0-9]+}}, 12, v{{[0-9]+}}
				define amdgpu_kernel void @v_test_umed3_r_i_i_constant_order_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %aptr) #1 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%gep0 = getelementptr i32, i32 addrspace(1)* %aptr, i32 %tid
				%outgep = getelementptr i32, i32 addrspace(1)* %out, i32 %tid
				%a = load i32, i32 addrspace(1)* %gep0

				%icmp0 = icmp ugt i32 %a, 17
				%i0 = select i1 %icmp0, i32 %a, i32 17

				%icmp1 = icmp ult i32 %i0, 12
				%i1 = select i1 %icmp1, i32 %i0, i32 12

				store i32 %i1, i32 addrspace(1)* %outgep
				ret void
				}

				; GCN-LABEL: {{^}}v_test_smed3_r_i_i_constant_order_i32:
				; GCN: v_max_i32_e32 v{{[0-9]+}}, 17, v{{[0-9]+}}
				; GCN: v_min_i32_e32 v{{[0-9]+}}, 12, v{{[0-9]+}}
				define amdgpu_kernel void @v_test_smed3_r_i_i_constant_order_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %aptr) #1 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%gep0 = getelementptr i32, i32 addrspace(1)* %aptr, i32 %tid
				%outgep = getelementptr i32, i32 addrspace(1)* %out, i32 %tid
				%a = load i32, i32 addrspace(1)* %gep0

				%icmp0 = icmp sgt i32 %a, 17
				%i0 = select i1 %icmp0, i32 %a, i32 17

				%icmp1 = icmp slt i32 %i0, 12
				%i1 = select i1 %icmp1, i32 %i0, i32 12

				store i32 %i1, i32 addrspace(1)* %outgep
				ret void
				}

llvm/trunk/test/CodeGen/AMDGPU/smed3.ll

[36 lines not shown; hunk context: define amdgpu_kernel void @v_test_smed3_multi_use_r_i_i_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %aptr) #1]
%icmp1 = icmp slt i32 %i0, 17		%icmp1 = icmp slt i32 %i0, 17
%i1 = select i1 %icmp1, i32 %i0, i32 17		%i1 = select i1 %icmp1, i32 %i0, i32 17

store volatile i32 %i0, i32 addrspace(1)* %outgep		store volatile i32 %i0, i32 addrspace(1)* %outgep
store volatile i32 %i1, i32 addrspace(1)* %outgep		store volatile i32 %i1, i32 addrspace(1)* %outgep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_test_smed3_r_i_i_constant_order_i32:
; GCN: v_max_i32_e32 v{{[0-9]+}}, 17, v{{[0-9]+}}
; GCN: v_min_i32_e32 v{{[0-9]+}}, 12, v{{[0-9]+}}
define amdgpu_kernel void @v_test_smed3_r_i_i_constant_order_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %aptr) #1 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr i32, i32 addrspace(1)* %aptr, i32 %tid
%outgep = getelementptr i32, i32 addrspace(1)* %out, i32 %tid
%a = load i32, i32 addrspace(1)* %gep0

%icmp0 = icmp sgt i32 %a, 17
%i0 = select i1 %icmp0, i32 %a, i32 17

%icmp1 = icmp slt i32 %i0, 12
%i1 = select i1 %icmp1, i32 %i0, i32 12

store i32 %i1, i32 addrspace(1)* %outgep
ret void
}

; GCN-LABEL: {{^}}v_test_smed3_r_i_i_sign_mismatch_i32:		; GCN-LABEL: {{^}}v_test_smed3_r_i_i_sign_mismatch_i32:
; GCN: v_max_u32_e32 v{{[0-9]+}}, 12, v{{[0-9]+}}		; GCN: v_max_u32_e32 v{{[0-9]+}}, 12, v{{[0-9]+}}
; GCN: v_min_i32_e32 v{{[0-9]+}}, 17, v{{[0-9]+}}		; GCN: v_min_i32_e32 v{{[0-9]+}}, 17, v{{[0-9]+}}
define amdgpu_kernel void @v_test_smed3_r_i_i_sign_mismatch_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %aptr) #1 {		define amdgpu_kernel void @v_test_smed3_r_i_i_sign_mismatch_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %aptr) #1 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr i32, i32 addrspace(1)* %aptr, i32 %tid		%gep0 = getelementptr i32, i32 addrspace(1)* %aptr, i32 %tid
%outgep = getelementptr i32, i32 addrspace(1)* %out, i32 %tid		%outgep = getelementptr i32, i32 addrspace(1)* %out, i32 %tid
%a = load i32, i32 addrspace(1)* %gep0		%a = load i32, i32 addrspace(1)* %gep0
[637 lines not shown]

llvm/trunk/test/CodeGen/AMDGPU/umed3.ll

[36 lines not shown; hunk context: define amdgpu_kernel void @v_test_umed3_multi_use_r_i_i_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %aptr) #1]
%icmp1 = icmp ult i32 %i0, 17		%icmp1 = icmp ult i32 %i0, 17
%i1 = select i1 %icmp1, i32 %i0, i32 17		%i1 = select i1 %icmp1, i32 %i0, i32 17

store volatile i32 %i0, i32 addrspace(1)* %outgep		store volatile i32 %i0, i32 addrspace(1)* %outgep
store volatile i32 %i1, i32 addrspace(1)* %outgep		store volatile i32 %i1, i32 addrspace(1)* %outgep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_test_umed3_r_i_i_constant_order_i32:
; GCN: v_max_u32_e32 v{{[0-9]+}}, 17, v{{[0-9]+}}
; GCN: v_min_u32_e32 v{{[0-9]+}}, 12, v{{[0-9]+}}
define amdgpu_kernel void @v_test_umed3_r_i_i_constant_order_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %aptr) #1 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr i32, i32 addrspace(1)* %aptr, i32 %tid
%outgep = getelementptr i32, i32 addrspace(1)* %out, i32 %tid
%a = load i32, i32 addrspace(1)* %gep0

%icmp0 = icmp ugt i32 %a, 17
%i0 = select i1 %icmp0, i32 %a, i32 17

%icmp1 = icmp ult i32 %i0, 12
%i1 = select i1 %icmp1, i32 %i0, i32 12

store i32 %i1, i32 addrspace(1)* %outgep
ret void
}

; GCN-LABEL: {{^}}v_test_umed3_r_i_i_sign_mismatch_i32:		; GCN-LABEL: {{^}}v_test_umed3_r_i_i_sign_mismatch_i32:
; GCN: v_max_i32_e32 v{{[0-9]+}}, 12, v{{[0-9]+}}		; GCN: v_max_i32_e32 v{{[0-9]+}}, 12, v{{[0-9]+}}
; GCN: v_min_u32_e32 v{{[0-9]+}}, 17, v{{[0-9]+}}		; GCN: v_min_u32_e32 v{{[0-9]+}}, 17, v{{[0-9]+}}
define amdgpu_kernel void @v_test_umed3_r_i_i_sign_mismatch_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %aptr) #1 {		define amdgpu_kernel void @v_test_umed3_r_i_i_sign_mismatch_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %aptr) #1 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr i32, i32 addrspace(1)* %aptr, i32 %tid		%gep0 = getelementptr i32, i32 addrspace(1)* %aptr, i32 %tid
%outgep = getelementptr i32, i32 addrspace(1)* %out, i32 %tid		%outgep = getelementptr i32, i32 addrspace(1)* %out, i32 %tid
%a = load i32, i32 addrspace(1)* %gep0		%a = load i32, i32 addrspace(1)* %gep0
[671 lines not shown]

llvm/trunk/test/Transforms/InstCombine/minmax-fold.ll

[527 lines not shown]
	;			;
	%cmp1 = icmp sgt i32 %i, -255			%cmp1 = icmp sgt i32 %i, -255
	%sel1 = select i1 %cmp1, i32 %i, i32 -255			%sel1 = select i1 %cmp1, i32 %i, i32 -255
	%cmp2 = icmp slt i32 %i, 0			%cmp2 = icmp slt i32 %i, 0
	%res = select i1 %cmp2, i32 %sel1, i32 0			%res = select i1 %cmp2, i32 %sel1, i32 0
	ret i32 %res			ret i32 %res
	}			}

				; Check that there is no infinite loop because of reverse cmp transformation:
				; (icmp slt smax(PositiveA, B) 2) -> (icmp eq B 1)
				define i32 @clamp_check_for_no_infinite_loop3(i32 %i) {
				; CHECK-LABEL: @clamp_check_for_no_infinite_loop3(
				; CHECK-NEXT: [[I2:%.*]] = icmp sgt i32 [[I:%.*]], 1
				; CHECK-NEXT: [[I3:%.*]] = select i1 [[I2]], i32 [[I]], i32 1
				; CHECK-NEXT: br i1 true, label [[TRUELABEL:%.*]], label [[FALSELABEL:%.*]]
				; CHECK: truelabel:
				; CHECK-NEXT: [[I5:%.*]] = icmp slt i32 [[I3]], 2
				; CHECK-NEXT: [[I6:%.*]] = select i1 [[I5]], i32 [[I3]], i32 2
				; CHECK-NEXT: [[I7:%.*]] = shl nuw nsw i32 [[I6]], 2
				; CHECK-NEXT: ret i32 [[I7]]
				; CHECK: falselabel:
				; CHECK-NEXT: ret i32 0
				;

				%i2 = icmp sgt i32 %i, 1
				%i3 = select i1 %i2, i32 %i, i32 1
				%i4 = icmp sgt i32 %i3, 0
				br i1 %i4, label %truelabel, label %falselabel

				truelabel: ; %i<=1, %i3>0
				%i5 = icmp slt i32 %i3, 2
				%i6 = select i1 %i5, i32 %i3, i32 2
				%i7 = shl nuw nsw i32 %i6, 2
				ret i32 %i7

				falselabel:
				ret i32 0
				}

	; The next 3 min tests should canonicalize to the same form...and not infinite loop.			; The next 3 min tests should canonicalize to the same form...and not infinite loop.

	define double @PR31751_umin1(i32 %x) {			define double @PR31751_umin1(i32 %x) {
	; CHECK-LABEL: @PR31751_umin1(			; CHECK-LABEL: @PR31751_umin1(
	; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i32 [[X:%.*]], 2147483647			; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i32 [[X:%.*]], 2147483647
	; CHECK-NEXT: [[SEL:%.*]] = select i1 [[TMP1]], i32 [[X]], i32 2147483647			; CHECK-NEXT: [[SEL:%.*]] = select i1 [[TMP1]], i32 [[X]], i32 2147483647
	; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[SEL]] to double			; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[SEL]] to double
	; CHECK-NEXT: ret double [[CONV]]			; CHECK-NEXT: ret double [[CONV]]
[876 lines not shown]

llvm/trunk/test/Transforms/InstCombine/sub.ll

[1,207 lines not shown]
%res = sub i32 0, %3		%res = sub i32 0, %3
ret i32 %res		ret i32 %res
}		}

define i32 @test66(i32 %x) {		define i32 @test66(i32 %x) {
; CHECK-LABEL: @test66(		; CHECK-LABEL: @test66(
; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i32 [[X:%.*]], -101		; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i32 [[X:%.*]], -101
; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 [[X]], i32 -101		; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 [[X]], i32 -101
; CHECK-NEXT: [[RES:%.*]] = add i32 [[TMP2]], 1		; CHECK-NEXT: [[RES:%.*]] = add nuw i32 [[TMP2]], 1
; CHECK-NEXT: ret i32 [[RES]]		; CHECK-NEXT: ret i32 [[RES]]
;		;
%1 = xor i32 %x, -1		%1 = xor i32 %x, -1
%2 = icmp ugt i32 %1, 100		%2 = icmp ugt i32 %1, 100
%3 = select i1 %2, i32 %1, i32 100		%3 = select i1 %2, i32 %1, i32 100
%res = sub i32 0, %3		%res = sub i32 0, %3
ret i32 %res		ret i32 %res
}		}
[70 lines not shown]

llvm/trunk/test/Transforms/InstSimplify/cmp_of_min_max.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -instsimplify -S | FileCheck %s		; RUN: opt < %s -instsimplify -S | FileCheck %s

define i1 @test_umax1(i32 %n) {		define i1 @test_umax1(i32 %n) {
; CHECK-LABEL: @test_umax1(		; CHECK-LABEL: @test_umax1(
; CHECK-NEXT: [[C1:%.*]] = icmp ugt i32 [[N:%.*]], 10		; CHECK-NEXT: ret i1 true
; CHECK-NEXT: [[S:%.*]] = select i1 [[C1]], i32 [[N]], i32 10
; CHECK-NEXT: [[C2:%.*]] = icmp ugt i32 [[S]], 9
; CHECK-NEXT: ret i1 [[C2]]
;		;
%c1 = icmp ugt i32 %n, 10		%c1 = icmp ugt i32 %n, 10
%s = select i1 %c1, i32 %n, i32 10		%s = select i1 %c1, i32 %n, i32 10
%c2 = icmp ugt i32 %s, 9		%c2 = icmp ugt i32 %s, 9
ret i1 %c2		ret i1 %c2
}		}

define i1 @test_umax2(i32 %n) {		define i1 @test_umax2(i32 %n) {
[17 lines not shown]
%c1 = icmp ugt i32 %n, 10		%c1 = icmp ugt i32 %n, 10
%s = select i1 %c1, i32 %n, i32 10		%s = select i1 %c1, i32 %n, i32 10
%c2 = icmp ugt i32 %s, 11		%c2 = icmp ugt i32 %s, 11
ret i1 %c2		ret i1 %c2
}		}

define i1 @test_umin1(i32 %n) {		define i1 @test_umin1(i32 %n) {
; CHECK-LABEL: @test_umin1(		; CHECK-LABEL: @test_umin1(
; CHECK-NEXT: [[C1:%.*]] = icmp ult i32 [[N:%.*]], 10		; CHECK-NEXT: ret i1 true
; CHECK-NEXT: [[S:%.*]] = select i1 [[C1]], i32 [[N]], i32 10
; CHECK-NEXT: [[C2:%.*]] = icmp ult i32 [[S]], 11
; CHECK-NEXT: ret i1 [[C2]]
;		;
%c1 = icmp ult i32 %n, 10		%c1 = icmp ult i32 %n, 10
%s = select i1 %c1, i32 %n, i32 10		%s = select i1 %c1, i32 %n, i32 10
%c2 = icmp ult i32 %s, 11		%c2 = icmp ult i32 %s, 11
ret i1 %c2		ret i1 %c2
}		}

define i1 @test_umin2(i32 %n) {		define i1 @test_umin2(i32 %n) {
[17 lines not shown]
%c1 = icmp ult i32 %n, 10		%c1 = icmp ult i32 %n, 10
%s = select i1 %c1, i32 %n, i32 10		%s = select i1 %c1, i32 %n, i32 10
%c2 = icmp ult i32 %s, 9		%c2 = icmp ult i32 %s, 9
ret i1 %c2		ret i1 %c2
}		}

define i1 @test_smax1(i32 %n) {		define i1 @test_smax1(i32 %n) {
; CHECK-LABEL: @test_smax1(		; CHECK-LABEL: @test_smax1(
; CHECK-NEXT: [[C1:%.*]] = icmp sgt i32 [[N:%.*]], -10		; CHECK-NEXT: ret i1 true
; CHECK-NEXT: [[S:%.*]] = select i1 [[C1]], i32 [[N]], i32 -10
; CHECK-NEXT: [[C2:%.*]] = icmp sgt i32 [[S]], -11
; CHECK-NEXT: ret i1 [[C2]]
;		;
%c1 = icmp sgt i32 %n, -10		%c1 = icmp sgt i32 %n, -10
%s = select i1 %c1, i32 %n, i32 -10		%s = select i1 %c1, i32 %n, i32 -10
%c2 = icmp sgt i32 %s, -11		%c2 = icmp sgt i32 %s, -11
ret i1 %c2		ret i1 %c2
}		}

define i1 @test_smax2(i32 %n) {		define i1 @test_smax2(i32 %n) {
[17 lines not shown]
%c1 = icmp sgt i32 %n, -10		%c1 = icmp sgt i32 %n, -10
%s = select i1 %c1, i32 %n, i32 -10		%s = select i1 %c1, i32 %n, i32 -10
%c2 = icmp sgt i32 %s, -9		%c2 = icmp sgt i32 %s, -9
ret i1 %c2		ret i1 %c2
}		}

define i1 @test_smin1(i32 %n) {		define i1 @test_smin1(i32 %n) {
; CHECK-LABEL: @test_smin1(		; CHECK-LABEL: @test_smin1(
; CHECK-NEXT: [[C1:%.*]] = icmp slt i32 [[N:%.*]], 10		; CHECK-NEXT: ret i1 true
; CHECK-NEXT: [[S:%.*]] = select i1 [[C1]], i32 [[N]], i32 10
; CHECK-NEXT: [[C2:%.*]] = icmp slt i32 [[S]], 11
; CHECK-NEXT: ret i1 [[C2]]
;		;
%c1 = icmp slt i32 %n, 10		%c1 = icmp slt i32 %n, 10
%s = select i1 %c1, i32 %n, i32 10		%s = select i1 %c1, i32 %n, i32 10
%c2 = icmp slt i32 %s, 11		%c2 = icmp slt i32 %s, 11
ret i1 %c2		ret i1 %c2
}		}

define i1 @test_smin2(i32 %n) {		define i1 @test_smin2(i32 %n) {
[22 lines not shown]
