
Details

Reviewers

spatel
lebedev.ri
DaniilSuchkov
reames

Commits

rG3097b76e8c2c: [InstCombine] Delay foldICmpUsingKnownBits until simple transforms are done
rL336172: [InstCombine] Delay foldICmpUsingKnownBits until simple transforms are done

Summary

This patch changes the order of transforms in InstCombineCompares so that
range-based transforms, which can produce complex bit arithmetic, run only
after simpler transforms (such as folding with constants) have been tried.
See PR37636 for the motivating example.
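
The effect of reordering can be illustrated with a toy model. This is only a sketch: `ToyCmp`, `simpleFold`, `knownBitsFold`, and `firstFold` are hypothetical names invented for illustration and are not LLVM APIs. It assumes a combiner that tries folds in sequence and stops at the first one that fires, so an analysis-based fold placed early can preempt a simpler one.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a compare instruction; not an LLVM type.
struct ToyCmp {
  bool HasConstantOperand;
};

using ToyFold = std::function<std::optional<std::string>(const ToyCmp &)>;

// Cheap, pattern-based fold: fires only when an operand is a constant.
std::optional<std::string> simpleFold(const ToyCmp &C) {
  if (C.HasConstantOperand)
    return "simple fold";
  return std::nullopt;
}

// Stand-in for an expensive analysis-based fold; in this toy it always fires.
std::optional<std::string> knownBitsFold(const ToyCmp &) {
  return "known-bits fold";
}

// Try the folds in order and report which one fired first.
std::string firstFold(const std::vector<ToyFold> &Folds, const ToyCmp &C) {
  for (const ToyFold &F : Folds)
    if (std::optional<std::string> R = F(C))
      return *R;
  return "no fold";
}

// Ordering with the analysis-based fold ahead of the simple one.
std::string earlyKnownBits(const ToyCmp &C) {
  return firstFold(std::vector<ToyFold>{knownBitsFold, simpleFold}, C);
}

// Ordering with the analysis-based fold as the last resort.
std::string lateKnownBits(const ToyCmp &C) {
  return firstFold(std::vector<ToyFold>{simpleFold, knownBitsFold}, C);
}
```

In this toy, `earlyKnownBits` never reaches the simple fold, while `lateKnownBits` falls back to the analysis-based fold only when nothing simpler applies.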

Diff Detail

Repository: rL LLVM

Event Timeline

mkazantsev created this revision. Jun 25 2018, 11:00 PM

lebedev.ri added inline comments. Jun 25 2018, 11:20 PM

test/Transforms/InstCombine/icmp_sdiv_with_and_without_range.ll
4–5 ↗	(On Diff #152835)	Could you add a comment on what this tests, and commit as-is [and rebase this]? Also, I'm not sure these two lines are needed.

mkazantsev updated this revision to Diff 152844. Jun 25 2018, 11:49 PM

mkazantsev marked an inline comment as done.

lebedev.ri added inline comments. Jun 25 2018, 11:57 PM

test/Transforms/InstCombine/icmp-shl-nsw.ll
72 ↗	(On Diff #152844)	Stale comments. (and elsewhere in this file)

This only moves the foldICmpUsingKnownBits() call later on,
but in PR37636 @spatel suggests it *might* expose more problems.
Does this expose any problems?

In D48584#1143240, @lebedev.ri wrote:

This only moves the foldICmpUsingKnownBits() call later on,
but in PR37636 @spatel suggests it *might* expose more problems.
Does this expose any problems?

From my point of view, no. We use different predicates in some cases, I am not convinced that either of them is better (at least less/greater predicates are more useful for SCEV analysis than ne/eq).
I'll fix the comments.

mkazantsev updated this revision to Diff 152846. Jun 26 2018, 12:22 AM

mkazantsev marked an inline comment as done.

In D48584#1143257, @mkazantsev wrote:

From my point of view, no. We use different predicates in some cases, I am not convinced that either of them is better (at least less/greater predicates are more useful for SCEV analysis than ne/eq).
I'll fix the comments.

The test diffs here show what I was concerned about. It's not clear to me if/when the predicate/constant changes in these compares are wins or not. It's probably in the noise, but we should check for regressions on test-suite or some other benchmark?

In D48584#1143518, @spatel wrote:

The test diffs here show what I was concerned about. It's not clear to me if/when the predicate/constant changes in these compares are wins or not. It's probably in the noise, but we should check for regressions on test-suite or some other benchmark?

Also, are there more diffs if you sink foldICmpUsingKnownBits even further down? If there aren't, we should sink it all the way to the end because it's expensive in compile-time. Either way, please add a code comment, so we know why we positioned it wherever it is.

In D48584#1143518, @spatel wrote:

The test diffs here show what I was concerned about. It's not clear to me if/when the predicate/constant changes in these compares are wins or not. It's probably in the noise, but we should check for regressions on test-suite or some other benchmark?

I think that these changes are practically either unimportant or positive. SCEV has reasons to prefer lt/gt comparisons over ne/eq because in certain cases it can derive more facts from it, however I'm not sure whether it has real profit anywhere or not. I can test it on our Java benchmarks, but don't have environment to run C++ clang tests properly, so I'd appreciate if you could help with it.
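
The predicate changes under discussion can be checked exhaustively for small types. The sketch below models the `icmp_sgt1` case from icmp-shl-nsw.ll in this revision with plain integers; the function names are hypothetical, and treating `shl nsw` as defined only when the doubled value fits in i8 is an assumption made for this illustration. Under that assumption, the `ne` form and the `sgt` form give the same answer on every valid input, which is why neither is obviously "more correct" than the other.

```cpp
// True when `shl nsw i8 x, 1` does not overflow, i.e. 2*x fits in i8.
bool shlNswIsDefined(int X) { return 2 * X >= -128 && 2 * X <= 127; }

// Checks every i8 input for which the shift is defined:
//   icmp sgt (shl nsw x, 1), -128
// must agree with both `icmp ne x, -64` and `icmp sgt x, -64`.
bool predicatesAgree() {
  for (int X = -128; X <= 127; ++X) {
    if (!shlNswIsDefined(X))
      continue;
    bool Original = (2 * X) > -128;
    bool AsNe = X != -64;
    bool AsSgt = X > -64;
    if (Original != AsNe || Original != AsSgt)
      return false;
  }
  return true;
}

// The two folded forms differ only on inputs where the shift would overflow.
bool formsDifferOutsideValidInputs() {
  int X = -100; // 2 * X does not fit in i8
  return !shlNswIsDefined(X) && ((X != -64) != (X > -64));
}
```

The two forms diverge only on inputs that the `nsw` flag already rules out, so the choice between them is about which form later analyses handle better.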

In D48584#1143529, @spatel wrote:

Also, are there more diffs if you sink foldICmpUsingKnownBits even further down? If there aren't, we should sink it all the way to the end because it's expensive in compile-time. Either way, please add a code comment, so we know why we positioned it wherever it is.

Interesting idea, I need to check it.

Sinking it to the end of the method did not introduce any new changes. I also added a comment.

lebedev.ri added inline comments. Jun 27 2018, 2:00 AM

lib/Transforms/InstCombine/InstCombineCompares.cpp
4710–4711 ↗	(On Diff #153000)	Please check that there is some test that relies on this, and does not get folded by previous cases.

mkazantsev added inline comments. Jun 27 2018, 7:55 PM

lib/Transforms/InstCombine/InstCombineCompares.cpp
4710–4711 ↗	(On Diff #153000)	If we completely comment out this code, more tests fail. So yes, it actually does some useful work.

In D48584#1144405, @mkazantsev wrote:

I think that these changes are practically either unimportant or positive. SCEV has reasons to prefer lt/gt comparisons over ne/eq because in certain cases it can derive more facts from it, however I'm not sure whether it has real profit anywhere or not. I can test it on our Java benchmarks, but don't have environment to run C++ clang tests properly, so I'd appreciate if you could help with it.

Ok - I need to get a test-suite machine set up again. I'll try to have some data by tomorrow.

In D48584#1146822, @spatel wrote:

Ok - I need to get a test-suite machine set up again. I'll try to have some data by tomorrow.

I don't see anything above the noise with this patch applied, so we should be ok here.

lib/Transforms/InstCombine/InstCombineCompares.cpp
4707–4709 ↗	(On Diff #153000)	I'd state this more like: "This may be expensive in compile-time, and transforms based on known bits can make further analysis more difficult, so we use it as the last resort..."
test/Transforms/InstCombine/icmp_sdiv_with_and_without_range.ll
4–5 ↗	(On Diff #152835)	Please commit the test file to trunk with the baseline CHECK lines, and then rebase this patch, so we have a record of the bug (also add a comment referencing/linking PR37636).

Rebased on top of pre-merged test, fixed comments.

In D48584#1148383, @spatel wrote:

I don't see anything above the noise with this patch applied, so we should be ok here.

LGTM then.

This revision is now accepted and ready to land. Jul 2 2018, 4:44 AM

spatel added inline comments. Jul 2 2018, 5:38 AM

test/Transforms/InstCombine/icmp_sdiv_with_and_without_range.ll
5 ↗	(On Diff #153675)	This comment isn't correct - as the test shows. Instcombine should do better with the range metadata, and I think that's true now (at least in this case).

LGTM too, other than the nit about the test comment.

Closed by commit rL336172: [InstCombine] Delay foldICmpUsingKnownBits until simple transforms are done (authored by mkazantsev). Jul 2 2018, 11:28 PM

This revision was automatically updated to reflect the committed changes.

spatel mentioned this in rL336293: [InstCombine] allow narrowing of min/max/abs. Jul 4 2018, 10:49 AM

Diff 153859

llvm/trunk/lib/Transforms/InstCombine/InstCombineCompares.cpp

[... 4,479 lines not shown ...]
   if (Instruction *Res = canonicalizeICmpBool(I, Builder))
     return Res;

   if (ICmpInst *NewICmp = canonicalizeCmpWithConstant(I))
     return NewICmp;

   if (Instruction *Res = foldICmpWithConstant(I))
     return Res;

-  if (Instruction *Res = foldICmpUsingKnownBits(I))
-    return Res;
-
   // Test if the ICmpInst instruction is used exclusively by a select as
   // part of a minimum or maximum operation. If so, refrain from doing
   // any other folding. This helps out other analyses which understand
   // non-obfuscated minimum and maximum idioms, such as ScalarEvolution
   // and CodeGen. And in this case, at least one of the comparison
   // operands has at least one user besides the compare (the select),
   // which would often largely negate the benefit of folding anyway.
   //
[... 202 lines not shown ...]
   if (I.getPredicate() == ICmpInst::ICMP_EQ)
     // icmp X+Cst, X
     if (match(Op0, m_Add(m_Value(X), m_ConstantInt(Cst))) && Op1 == X)
       return foldICmpAddOpConst(X, Cst, I.getPredicate());

     // icmp X, X+Cst
     if (match(Op1, m_Add(m_Value(X), m_ConstantInt(Cst))) && Op0 == X)
       return foldICmpAddOpConst(X, Cst, I.getSwappedPredicate());
   }

+  // This may be expensive in compile-time, and transforms based on known bits
+  // can make further analysis more difficult, so we use it as the last resort
+  // if we cannot do anything better.
+  if (Instruction *Res = foldICmpUsingKnownBits(I))
+    return Res;
+
   return Changed ? &I : nullptr;
 }

 /// Fold fcmp ([us]itofp x, cst) if possible.
 Instruction *InstCombiner::foldFCmpIntToFPConst(FCmpInst &I, Instruction *LHSI,
                                                 Constant *RHSC) {
   if (!isa<ConstantFP>(RHSC)) return nullptr;
   const APFloat &RHS = cast<ConstantFP>(RHSC)->getValueAPF();
[... 428 lines not shown ...]

llvm/trunk/test/Analysis/ValueTracking/non-negative-phi-bits.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -instcombine < %s -S | FileCheck %s

 define void @test() #0 {
 ; CHECK-LABEL: @test(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
 ; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
 ; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
-; CHECK-NEXT: [[EXITCOND:%.*]] = icmp ult i64 [[INDVARS_IV_NEXT]], 40
+; CHECK-NEXT: [[EXITCOND:%.*]] = icmp ult i64 [[INDVARS_IV]], 39
 ; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
 ; CHECK: for.end:
 ; CHECK-NEXT: ret void
 ;
 entry:
   br label %for.body

 for.body: ; preds = %for.body, %entry
   %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
   %indvars.iv.next = add nsw i64 %indvars.iv, 1
   %exitcond = icmp slt i64 %indvars.iv.next, 40
   br i1 %exitcond, label %for.end, label %for.body

 for.end: ; preds = %for.body
   ret void
 }
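
The exit-condition change in this test can be sanity-checked with ordinary integer arithmetic. The following is a sketch only: `exitConditionsAgree` is a hypothetical helper, and it assumes the induction variable starts at 0 and increases by 1 without overflow, as the `add nuw nsw` in the CHECK lines indicates. It compares the source-level signed compare against the unsigned compare on the incremented value (old output) and the unsigned compare on the original value (new output).

```cpp
#include <cstdint>

// For iv = 0, 1, ..., Limit-1 compare three forms of the exit condition:
//   icmp slt i64 (iv + 1), 40   (input IR)
//   icmp ult i64 (iv + 1), 40   (CHECK line before this patch)
//   icmp ult i64 iv, 39         (CHECK line after this patch)
bool exitConditionsAgree(uint64_t Limit) {
  for (uint64_t IV = 0; IV < Limit; ++IV) {
    int64_t Next = static_cast<int64_t>(IV) + 1;
    bool Input = Next < 40;
    bool Before = static_cast<uint64_t>(Next) < 40;
    bool After = IV < 39;
    if (Input != Before || Input != After)
      return false;
  }
  return true;
}
```

Both outputs are equivalent here; the newer one compares the phi directly instead of the incremented value.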

llvm/trunk/test/Transforms/InstCombine/icmp-shl-nsw.ll

[... 63 lines not shown ...]
 ;
   %mul = shl nsw <2 x i32> %x, <i32 5, i32 5>
   %cmp = icmp eq <2 x i32> %mul, zeroinitializer
   ret <2 x i1> %cmp
 }

 ; icmp sgt with shl nsw with a constant compare operand and constant
 ; shift amount can always be reduced to icmp sgt alone.

-; Known bits analysis turns this into an equality predicate.
-
 define i1 @icmp_sgt1(i8 %x) {
 ; CHECK-LABEL: @icmp_sgt1(
-; CHECK-NEXT: [[CMP:%.*]] = icmp ne i8 %x, -64
+; CHECK-NEXT: [[CMP:%.*]] = icmp sgt i8 %x, -64
 ; CHECK-NEXT: ret i1 [[CMP]]
 ;
   %shl = shl nsw i8 %x, 1
   %cmp = icmp sgt i8 %shl, -128
   ret i1 %cmp
 }

 define i1 @icmp_sgt2(i8 %x) {
[... 54 lines not shown ...]
 ; CHECK-NEXT: [[CMP:%.*]] = icmp sgt i8 %x, 62
 ; CHECK-NEXT: ret i1 [[CMP]]
 ;
   %shl = shl nsw i8 %x, 1
   %cmp = icmp sgt i8 %shl, 124
   ret i1 %cmp
 }

-; Known bits analysis turns this into an equality predicate.
-
 define i1 @icmp_sgt8(i8 %x) {
 ; CHECK-LABEL: @icmp_sgt8(
-; CHECK-NEXT: [[CMP:%.*]] = icmp eq i8 %x, 63
+; CHECK-NEXT: [[CMP:%.*]] = icmp sgt i8 %x, 62
 ; CHECK-NEXT: ret i1 [[CMP]]
 ;
   %shl = shl nsw i8 %x, 1
   %cmp = icmp sgt i8 %shl, 125
   ret i1 %cmp
 }

 ; Compares with 126 and 127 are recognized as always false.

-; Known bits analysis turns this into an equality predicate.
-
 define i1 @icmp_sgt9(i8 %x) {
 ; CHECK-LABEL: @icmp_sgt9(
-; CHECK-NEXT: [[CMP:%.*]] = icmp ne i8 %x, -1
+; CHECK-NEXT: [[CMP:%.*]] = icmp sgt i8 %x, -1
 ; CHECK-NEXT: ret i1 [[CMP]]
 ;
   %shl = shl nsw i8 %x, 7
   %cmp = icmp sgt i8 %shl, -128
   ret i1 %cmp
 }

 define i1 @icmp_sgt10(i8 %x) {
[... 31 lines not shown ...]
 ; Known bits analysis returns false for compares with >=0.

 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;
 ; Repeat the shl nsw + sgt tests with predicate changed to 'sle'.
 ;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

-; Known bits analysis turns this into an equality predicate.
-
 define i1 @icmp_sle1(i8 %x) {
 ; CHECK-LABEL: @icmp_sle1(
-; CHECK-NEXT: [[CMP:%.*]] = icmp eq i8 %x, -64
+; CHECK-NEXT: [[CMP:%.*]] = icmp slt i8 %x, -63
 ; CHECK-NEXT: ret i1 [[CMP]]
 ;
   %shl = shl nsw i8 %x, 1
   %cmp = icmp sle i8 %shl, -128
   ret i1 %cmp
 }

 define i1 @icmp_sle2(i8 %x) {
[... 54 lines not shown ...]
 ; CHECK-NEXT: [[CMP:%.*]] = icmp slt i8 %x, 63
 ; CHECK-NEXT: ret i1 [[CMP]]
 ;
   %shl = shl nsw i8 %x, 1
   %cmp = icmp sle i8 %shl, 124
   ret i1 %cmp
 }

-; Known bits analysis turns this into an equality predicate.
-
 define i1 @icmp_sle8(i8 %x) {
 ; CHECK-LABEL: @icmp_sle8(
-; CHECK-NEXT: [[CMP:%.*]] = icmp ne i8 %x, 63
+; CHECK-NEXT: [[CMP:%.*]] = icmp slt i8 %x, 63
 ; CHECK-NEXT: ret i1 [[CMP]]
 ;
   %shl = shl nsw i8 %x, 1
   %cmp = icmp sle i8 %shl, 125
   ret i1 %cmp
 }

 ; Compares with 126 and 127 are recognized as always true.

-; Known bits analysis turns this into an equality predicate.
-
 define i1 @icmp_sle9(i8 %x) {
 ; CHECK-LABEL: @icmp_sle9(
-; CHECK-NEXT: [[CMP:%.*]] = icmp eq i8 %x, -1
+; CHECK-NEXT: [[CMP:%.*]] = icmp slt i8 %x, 0
 ; CHECK-NEXT: ret i1 [[CMP]]
 ;
   %shl = shl nsw i8 %x, 7
   %cmp = icmp sle i8 %shl, -128
   ret i1 %cmp
 }

 define i1 @icmp_sle10(i8 %x) {
[... 33 lines not shown ...]
 ; CHECK-LABEL: @icmp_ne1(
 ; CHECK-NEXT: [[CMP:%.*]] = icmp ne i8 %x, -2
 ; CHECK-NEXT: ret i1 [[CMP]]
 ;
   %shl = shl nsw i8 %x, 6
   %cmp = icmp ne i8 %shl, -128
   ret i1 %cmp
 }

llvm/trunk/test/Transforms/InstCombine/icmp-shr-lt-gt.ll

+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt < %s -instcombine -S | FileCheck %s

 define i1 @lshrugt_01_00(i4 %x) {
 ; CHECK-LABEL: @lshrugt_01_00(
 ; CHECK-NEXT: [[C:%.*]] = icmp ugt i4 %x, 1
 ; CHECK-NEXT: ret i1 [[C]]
 ;
   %s = lshr i4 %x, 1
[... 1,820 lines not shown ...]
 ;
   %s = lshr exact i4 %x, 1
   %c = icmp ugt i4 %s, 5
   ret i1 %c
 }

 define i1 @lshrugt_01_06_exact(i4 %x) {
 ; CHECK-LABEL: @lshrugt_01_06_exact(
-; CHECK-NEXT: [[C:%.*]] = icmp eq i4 %x, -2
+; CHECK-NEXT: [[C:%.*]] = icmp ugt i4 %x, -4
 ; CHECK-NEXT: ret i1 [[C]]
 ;
   %s = lshr exact i4 %x, 1
   %c = icmp ugt i4 %s, 6
   ret i1 %c
 }

 define i1 @lshrugt_01_07_exact(i4 %x) {
[... 94 lines not shown ...]
 ;
   %s = lshr exact i4 %x, 2
   %c = icmp ugt i4 %s, 1
   ret i1 %c
 }

 define i1 @lshrugt_02_02_exact(i4 %x) {
 ; CHECK-LABEL: @lshrugt_02_02_exact(
-; CHECK-NEXT: [[C:%.*]] = icmp eq i4 %x, -4
+; CHECK-NEXT: [[C:%.*]] = icmp ugt i4 %x, -8
 ; CHECK-NEXT: ret i1 [[C]]
 ;
   %s = lshr exact i4 %x, 2
   %c = icmp ugt i4 %s, 2
   ret i1 %c
 }

 define i1 @lshrugt_02_03_exact(i4 %x) {
[... 264 lines not shown ...]
 ;
   %s = lshr exact i4 %x, 1
   %c = icmp ult i4 %s, 0
   ret i1 %c
 }

 define i1 @lshrult_01_01_exact(i4 %x) {
 ; CHECK-LABEL: @lshrult_01_01_exact(
-; CHECK-NEXT: [[C:%.*]] = icmp eq i4 %x, 0
+; CHECK-NEXT: [[C:%.*]] = icmp ult i4 %x, 2
 ; CHECK-NEXT: ret i1 [[C]]
 ;
   %s = lshr exact i4 %x, 1
   %c = icmp ult i4 %s, 1
   ret i1 %c
 }

 define i1 @lshrult_01_02_exact(i4 %x) {
[... 43 lines not shown ...]
 ;
   %s = lshr exact i4 %x, 1
   %c = icmp ult i4 %s, 6
   ret i1 %c
 }

 define i1 @lshrult_01_07_exact(i4 %x) {
 ; CHECK-LABEL: @lshrult_01_07_exact(
-; CHECK-NEXT: [[C:%.*]] = icmp ne i4 %x, -2
+; CHECK-NEXT: [[C:%.*]] = icmp ult i4 %x, -2
 ; CHECK-NEXT: ret i1 [[C]]
 ;
   %s = lshr exact i4 %x, 1
   %c = icmp ult i4 %s, 7
   ret i1 %c
 }

 define i1 @lshrult_01_08_exact(i4 %x) {
[... 74 lines not shown ...]
 ;
   %s = lshr exact i4 %x, 2
   %c = icmp ult i4 %s, 0
   ret i1 %c
 }

 define i1 @lshrult_02_01_exact(i4 %x) {
 ; CHECK-LABEL: @lshrult_02_01_exact(
-; CHECK-NEXT: [[C:%.*]] = icmp eq i4 %x, 0
+; CHECK-NEXT: [[C:%.*]] = icmp ult i4 %x, 4
 ; CHECK-NEXT: ret i1 [[C]]
 ;
   %s = lshr exact i4 %x, 2
   %c = icmp ult i4 %s, 1
   ret i1 %c
 }

 define i1 @lshrult_02_02_exact(i4 %x) {
 ; CHECK-LABEL: @lshrult_02_02_exact(
 ; CHECK-NEXT: [[C:%.*]] = icmp sgt i4 %x, -1
 ; CHECK-NEXT: ret i1 [[C]]
 ;
   %s = lshr exact i4 %x, 2
   %c = icmp ult i4 %s, 2
   ret i1 %c
 }

 define i1 @lshrult_02_03_exact(i4 %x) {
 ; CHECK-LABEL: @lshrult_02_03_exact(
-; CHECK-NEXT: [[C:%.*]] = icmp ne i4 %x, -4
+; CHECK-NEXT: [[C:%.*]] = icmp ult i4 %x, -4
 ; CHECK-NEXT: ret i1 [[C]]
 ;
   %s = lshr exact i4 %x, 2
   %c = icmp ult i4 %s, 3
   ret i1 %c
 }

 define i1 @lshrult_02_04_exact(i4 %x) {
[... 110 lines not shown ...]
 ;
   %s = lshr exact i4 %x, 3
   %c = icmp ult i4 %s, 0
   ret i1 %c
 }

 define i1 @lshrult_03_01_exact(i4 %x) {
 ; CHECK-LABEL: @lshrult_03_01_exact(
-; CHECK-NEXT: [[C:%.*]] = icmp ne i4 %x, -8
+; CHECK-NEXT: [[C:%.*]] = icmp sgt i4 %x, -1
 ; CHECK-NEXT: ret i1 [[C]]
 ;
   %s = lshr exact i4 %x, 3
   %c = icmp ult i4 %s, 1
   ret i1 %c
 }

 define i1 @lshrult_03_02_exact(i4 %x) {
[... 260 lines not shown ...]
 ;
   %s = ashr exact i4 %x, 1
   %c = icmp sgt i4 %s, 14
   ret i1 %c
 }

 define i1 @ashrsgt_01_15_exact(i4 %x) {
 ; CHECK-LABEL: @ashrsgt_01_15_exact(
-; CHECK-NEXT: [[C:%.*]] = icmp sgt i4 %x, -1
+; CHECK-NEXT: [[C:%.*]] = icmp sgt i4 %x, -2
 ; CHECK-NEXT: ret i1 [[C]]
 ;
   %s = ashr exact i4 %x, 1
   %c = icmp sgt i4 %s, 15
   ret i1 %c
 }

 define i1 @ashrsgt_02_00_exact(i4 %x) {
[... 130 lines not shown ...]
 ;
   %s = ashr exact i4 %x, 2
   %c = icmp sgt i4 %s, 14
   ret i1 %c
 }

 define i1 @ashrsgt_02_15_exact(i4 %x) {
 ; CHECK-LABEL: @ashrsgt_02_15_exact(
-; CHECK-NEXT: [[C:%.*]] = icmp sgt i4 %x, -1
+; CHECK-NEXT: [[C:%.*]] = icmp sgt i4 %x, -4
 ; CHECK-NEXT: ret i1 [[C]]
 ;
   %s = ashr exact i4 %x, 2
   %c = icmp sgt i4 %s, 15
   ret i1 %c
 }

 define i1 @ashrsgt_03_00_exact(i4 %x) {
[... 128 lines not shown ...]
 ;
   %s = ashr exact i4 %x, 3
   %c = icmp sgt i4 %s, 14
   ret i1 %c
 }

 define i1 @ashrsgt_03_15_exact(i4 %x) {
 ; CHECK-LABEL: @ashrsgt_03_15_exact(
-; CHECK-NEXT: [[C:%.*]] = icmp sgt i4 %x, -1
+; CHECK-NEXT: [[C:%.*]] = icmp ne i4 %x, -8
 ; CHECK-NEXT: ret i1 [[C]]
 ;
   %s = ashr exact i4 %x, 3
   %c = icmp sgt i4 %s, 15
   ret i1 %c
 }

 define i1 @ashrslt_01_00_exact(i4 %x) {
[... 442 lines not shown ...]

llvm/trunk/test/Transforms/InstCombine/icmp_sdiv_with_and_without_range.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -instcombine -S < %s | FileCheck %s

 ; Test that presence of range does not cause unprofitable transforms with bit
-; arithmetics, and instcombine behaves exactly the same as without the range.
+; arithmetics. InstCombine needs to be smart about dealing with range-annotated
+; values.

 define i1 @without_range(i32* %A) {
 ; CHECK-LABEL: @without_range(
 ; CHECK-NEXT: [[A_VAL:%.*]] = load i32, i32* [[A:%.*]], align 8
 ; CHECK-NEXT: [[C:%.*]] = icmp slt i32 [[A_VAL]], 2
 ; CHECK-NEXT: ret i1 [[C]]
 ;
   %A.val = load i32, i32* %A, align 8
   %B = sdiv i32 %A.val, 2
   %C = icmp sge i32 0, %B
   ret i1 %C
 }

 define i1 @with_range(i32* %A) {
 ; CHECK-LABEL: @with_range(
 ; CHECK-NEXT: [[A_VAL:%.*]] = load i32, i32* [[A:%.*]], align 8, !range !0
-; CHECK-NEXT: [[B_MASK:%.*]] = and i32 [[A_VAL]], 2147483646
-; CHECK-NEXT: [[C:%.*]] = icmp eq i32 [[B_MASK]], 0
+; CHECK-NEXT: [[C:%.*]] = icmp ult i32 [[A_VAL]], 2
 ; CHECK-NEXT: ret i1 [[C]]
 ;
   %A.val = load i32, i32* %A, align 8, !range !0
   %B = sdiv i32 %A.val, 2
   %C = icmp sge i32 0, %B
   ret i1 %C
 }

 !0 = !{i32 0, i32 2147483647}
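
The relationship between the three folded forms in this test can be checked with plain integers. This is a sketch under stated assumptions: the function names are hypothetical, C++ integer division is used as a model of `sdiv` (both truncate toward zero), and the `!range !0` metadata is read as the half-open interval [0, 2147483647).

```cpp
#include <cstdint>

// icmp sge i32 0, (sdiv i32 A, 2): the compare as written in the test input.
bool originalCompare(int32_t A) { return 0 >= A / 2; }

// icmp slt i32 A, 2: the fold produced without range metadata.
bool foldWithoutRange(int32_t A) { return A < 2; }

// and i32 A, 2147483646 + icmp eq 0: the known-bits output before this patch.
bool maskedFoldWithRange(int32_t A) { return (A & 2147483646) == 0; }

// icmp ult i32 A, 2: the output with range metadata after this patch.
bool unsignedFoldWithRange(int32_t A) { return static_cast<uint32_t>(A) < 2u; }

// Assumed reading of !{i32 0, i32 2147483647}: 0 <= A < 2147483647.
bool inRange(int32_t A) { return A >= 0 && A < 2147483647; }

// The range-free fold must always match; the other two only inside the range.
bool foldsAgreeOn(int32_t A) {
  if (originalCompare(A) != foldWithoutRange(A))
    return false;
  if (!inRange(A))
    return true;
  return originalCompare(A) == maskedFoldWithRange(A) &&
         originalCompare(A) == unsignedFoldWithRange(A);
}
```

Inside the annotated range all forms agree; the patch simply replaces the mask-and-compare pair with a single unsigned compare.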

llvm/trunk/test/Transforms/LoopVectorize/X86/masked_load_store.ll

[... 2,046 lines not shown ...]
 ; AVX512-NEXT: store double [[ADD_PROL]], double* [[ARRAYIDX7_PROL]], align 8
 ; AVX512-NEXT: br label [[FOR_INC_PROL]]
 ; AVX512: for.inc.prol:
 ; AVX512-NEXT: [[INDVARS_IV_NEXT_PROL]] = add nuw nsw i64 [[INDVARS_IV_PROL]], 16
 ; AVX512-NEXT: [[PROL_ITER_SUB]] = add i64 [[PROL_ITER]], -1
 ; AVX512-NEXT: [[PROL_ITER_CMP:%.*]] = icmp eq i64 [[PROL_ITER_SUB]], 0
 ; AVX512-NEXT: br i1 [[PROL_ITER_CMP]], label [[FOR_BODY_PROL_LOOPEXIT:%.*]], label [[FOR_BODY_PROL]], !llvm.loop !50
 ; AVX512: for.body.prol.loopexit:
-; AVX512-NEXT: [[DOTMASK:%.*]] = and i64 [[TMP24]], 9984
-; AVX512-NEXT: [[TMP28:%.*]] = icmp eq i64 [[DOTMASK]], 0
+; AVX512-NEXT: [[TMP28:%.*]] = icmp ult i64 [[TMP24]], 48
 ; AVX512-NEXT: br i1 [[TMP28]], label [[FOR_END:%.*]], label [[FOR_BODY:%.*]]
 ; AVX512: for.body:
 ; AVX512-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_3:%.*]], [[FOR_INC_3:%.*]] ], [ [[INDVARS_IV_NEXT_PROL]], [[FOR_BODY_PROL_LOOPEXIT]] ]
 ; AVX512-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[INDVARS_IV]]
 ; AVX512-NEXT: [[TMP29:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
 ; AVX512-NEXT: [[CMP1:%.*]] = icmp slt i32 [[TMP29]], 100
 ; AVX512-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[FOR_INC:%.*]]
 ; AVX512: if.then:
[... 1,326 lines not shown ...]

This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Delay foldICmpUsingKnownBits until simple transforms are done
ClosedPublic
