This is an archive of the discontinued LLVM Phabricator instance.


Differential D117365

[InstCombine] optimize icmp-ugt-ashr
Accepted · Public

Authored by nadav on Jan 14 2022, 4:05 PM.

Details

Reviewers

lebedev.ri
craig.topper
spatel

Summary

This diff optimizes the sequence icmp-ugt(ashr,C_1) C_2. InstCombine already implements this optimization for sgt, and this patch adds support for ugt.

@craig.topper came up with the idea and proof.

This patch adds the check for UGT, and also simplifies the check for SGT because Craig's proof shows that the comparison to min_int is not necessary.

define i1 @src(i8 %x, i8 %y, i8 %c) {
  %cp1 = add i8 %c, 1
  %i = shl i8 %cp1, %y
  %i.2 = ashr i8 %i, %y
  %cmp = icmp eq i8 %cp1, %i.2
  ;Assume: C + 1 == (((C + 1) << y) >> y)
  call void @llvm.assume(i1 %cmp)

  ; uncomment for the sgt case
  %j = shl i8 %cp1, %y 
  %j.2 = sub i8 %j, 1
  %cmp2 = icmp ne i8 %j.2, 127
  ;Assume (((c + 1 ) << y) - 1) != 127
  call void @llvm.assume(i1 %cmp2)

  %s = ashr i8 %x, %y
  %r = icmp sgt i8 %s, %c
  ret i1 %r
}

define i1 @tgt(i8 %x, i8 %y, i8 %c) {
  %cp1 = add i8 %c, 1
  %j = shl i8 %cp1, %y
  %j.2 = sub i8 %j, 1

  %r = icmp sgt i8 %x, %j.2
  ret i1 %r
}

declare void @llvm.assume(i1)

This change is related to the optimizations in D117252.
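
For readers without access to Alive2, the ugt fold can also be checked exhaustively for i8. The following is a minimal C++ sketch, not part of the patch: `ashr8` and `countUgtMismatches` are hypothetical helper names, and plain `uint8_t` arithmetic stands in for `APInt`. It enumerates every constant C, shift amount, and input x, and counts the cases where the round-trip precondition from the summary holds but the folded compare disagrees with the original.

```cpp
#include <cstdint>

// 8-bit arithmetic shift right on a bit pattern (sh must be in [0, 7]).
// Done by hand so the sketch does not depend on how the compiler shifts
// negative signed values.
static uint8_t ashr8(uint8_t v, unsigned sh) {
  uint8_t r = static_cast<uint8_t>(v >> sh);
  if (v & 0x80)
    r |= static_cast<uint8_t>(0xFF << (8 - sh)); // replicate the sign bit
  return r;
}

// Hypothetical helper: counts (C, sh, x) triples where
//   C + 1 == ashr(shl(C + 1, sh), sh)
// holds but "icmp ugt (ashr x, sh), C" differs from
// "icmp ugt x, ((C + 1) << sh) - 1".
int countUgtMismatches() {
  int bad = 0;
  for (unsigned sh = 0; sh < 8; ++sh) {
    for (unsigned c = 0; c < 256; ++c) {
      uint8_t C = static_cast<uint8_t>(c);
      uint8_t CP1 = static_cast<uint8_t>(C + 1);
      uint8_t Shl = static_cast<uint8_t>(CP1 << sh);
      if (ashr8(Shl, sh) != CP1)
        continue; // precondition not met, the fold would not fire
      uint8_t ShiftedC = static_cast<uint8_t>(Shl - 1);
      for (unsigned x = 0; x < 256; ++x) {
        uint8_t X = static_cast<uint8_t>(x);
        bool src = ashr8(X, sh) > C; // unsigned compare on the shifted value
        bool tgt = X > ShiftedC;     // unsigned compare on x directly
        if (src != tgt)
          ++bad;
      }
    }
  }
  return bad;
}
```

Under these assumptions `countUgtMismatches()` returns 0, which is consistent with the claim that the unsigned case needs only the round-trip precondition.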

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nadav requested review of this revision. Jan 14 2022, 4:05 PM

nadav created this revision.

Herald added a project: Restricted Project. Jan 14 2022, 4:05 PM

Herald added a subscriber: llvm-commits.

nadav updated this revision to Diff 400177. Jan 14 2022, 4:08 PM

Harbormaster completed remote builds in B143520: Diff 400177. Jan 14 2022, 4:13 PM

Could you post the general proof, not for a single constant?

lib/Transforms/InstCombine/InstCombineCompares.cpp
2242 (On Diff #400177): As I have said in D117252, this should be `if (IsExact || Pred == CmpInst::ICMP_SLT || Pred == CmpInst::ICMP_ULT) {`

In D117365#3245910, @lebedev.ri wrote:

Could you post the general proof, not for a single constant?

I think this is the general proof. It requires fewer preconditions than the sgt case.

https://alive2.llvm.org/ce/z/tFAcZt

lib/Transforms/InstCombine/InstCombineCompares.cpp
2253 (On Diff #400177): I believe the `!C.isMaxSignedValue()` is an unneeded condition for the signed case. If `C.isMaxSignedValue()` is true, then C+1 is 0x80 for i8, and `(ShiftedC + 1).ashr(ShAmtVal) == (C + 1)` can only be true when ShAmtVal is 0: 0x80 has only 1 sign bit, and any other ShAmtVal would require (C+1) to have more than 1 sign bit. I'm not suggesting we remove it in this patch. Just pointing it out for alive proof purposes.
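
The sign-bit argument above can be illustrated for i8 with a small C++ sketch. It is only an illustration under stated assumptions: `ashr8` and `roundTripShifts` are hypothetical names, and `uint8_t` bit patterns stand in for `APInt`. The function lists every shift amount for which `(C + 1)` survives the shl/ashr round trip.

```cpp
#include <cstdint>
#include <vector>

// 8-bit arithmetic shift right on a bit pattern (sh must be in [0, 7]).
static uint8_t ashr8(uint8_t v, unsigned sh) {
  uint8_t r = static_cast<uint8_t>(v >> sh);
  if (v & 0x80)
    r |= static_cast<uint8_t>(0xFF << (8 - sh)); // replicate the sign bit
  return r;
}

// Hypothetical helper: shift amounts sh in [0, 7] for which
// ashr(shl(c + 1, sh), sh) == c + 1 in 8-bit arithmetic.
std::vector<unsigned> roundTripShifts(uint8_t c) {
  std::vector<unsigned> result;
  uint8_t cp1 = static_cast<uint8_t>(c + 1);
  for (unsigned sh = 0; sh < 8; ++sh) {
    uint8_t shl = static_cast<uint8_t>(cp1 << sh);
    if (ashr8(shl, sh) == cp1)
      result.push_back(sh);
  }
  return result;
}
```

For C = 127 (so C+1 is 0x80) only a shift amount of 0 passes, while for C = 10 (C+1 is 0b00001011, four sign bits) shift amounts 0 through 3 pass.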

nadav updated this revision to Diff 400942. Jan 18 2022, 12:10 PM

nadav edited the summary of this revision.

Herald added a subscriber: hiraditya. Jan 18 2022, 12:10 PM

Harbormaster completed remote builds in B144084: Diff 400942. Jan 18 2022, 1:54 PM

craig.topper added inline comments. Jan 18 2022, 3:34 PM

llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
2253: I think the `!(C + 1).shl(ShAmtVal).isMinSignedValue()` term is required for the signed case, isn't it? It's the commented-out code in my alive proof.

@craig.topper Thank you for the review, and for writing the proof, of course. This is my understanding of the code and of your proof.

Your proof has two checks, which I added as comments.

;Assume: C + 1 == ((C+1 << y) >> y)
;Assume ((c +1 ) << y) != 127      -- only needed for the sgt case

The first check is implemented with the code:

APInt ShiftedC = (C + 1).shl(ShAmtVal) - 1;
if ((ShiftedC + 1).ashr(ShAmtVal) == (C + 1))

And the second one with:

!C.isMaxSignedValue() && (ShiftedC + 1).ashr(ShAmtVal) == (C + 1)

The code also has this check, which I removed, because the proof does not have this check:

!(C + 1).shl(ShAmtVal).isMinSignedValue()

Is this also your understanding, or did I miss something?

Ah, I did miss something. I'll rewrite the second check: "Assume ((c +1 ) << y) != 127"
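
To see why the signed case needs the extra term, here is a minimal i8 sketch in C++. It is not code from the patch: `asSigned`, `ashr8`, `roundTrips`, and `sgtFoldAgrees` are hypothetical names, and `uint8_t` bit patterns stand in for `APInt`. It checks one (C, shift) pair at a time against all 256 inputs.

```cpp
#include <cstdint>

// Interpret an 8-bit pattern as a signed value.
static int asSigned(uint8_t v) { return v < 128 ? v : static_cast<int>(v) - 256; }

// 8-bit arithmetic shift right on a bit pattern (sh must be in [0, 7]).
static uint8_t ashr8(uint8_t v, unsigned sh) {
  uint8_t r = static_cast<uint8_t>(v >> sh);
  if (v & 0x80)
    r |= static_cast<uint8_t>(0xFF << (8 - sh)); // replicate the sign bit
  return r;
}

// First check from the proof: C + 1 == ashr(shl(C + 1, sh), sh).
bool roundTrips(uint8_t c, unsigned sh) {
  uint8_t cp1 = static_cast<uint8_t>(c + 1);
  uint8_t shl = static_cast<uint8_t>(cp1 << sh);
  return ashr8(shl, sh) == cp1;
}

// Does "icmp sgt (ashr x, sh), c" agree with
// "icmp sgt x, ((c + 1) << sh) - 1" for every 8-bit x?
bool sgtFoldAgrees(uint8_t c, unsigned sh) {
  uint8_t cp1 = static_cast<uint8_t>(c + 1);
  uint8_t shiftedC = static_cast<uint8_t>(static_cast<uint8_t>(cp1 << sh) - 1);
  for (unsigned x = 0; x < 256; ++x) {
    uint8_t X = static_cast<uint8_t>(x);
    bool src = asSigned(ashr8(X, sh)) > asSigned(c);
    bool tgt = asSigned(X) > asSigned(shiftedC);
    if (src != tgt)
      return false;
  }
  return true;
}
```

With C = -65 and a shift of 1, `(C + 1) << 1` is 0x80, the round-trip check passes, and C is not the maximum signed value, yet `sgtFoldAgrees` returns false: `ashr x, 1` is always greater than -65, while `x sgt 127` is never true. That is the case the `isMinSignedValue()` term excludes.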

nadav updated this revision to Diff 401265. Jan 19 2022, 8:56 AM

nadav edited the summary of this revision.

Harbormaster completed remote builds in B144322: Diff 401265. Jan 19 2022, 11:11 AM

LGTM

This revision is now accepted and ready to land. Jan 19 2022, 2:42 PM

spatel mentioned this in D127188: [InstCombine] improve fold for icmp-ugt-ashr. Jun 7 2022, 12:17 PM

Revision Contents

Path                                                       Size
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp    8 lines
llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll         15 lines
llvm/test/Transforms/InstCombine/icmp-shr.ll               20 lines

Diff 401265

llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp

	Show First 20 Lines • Show All 992 Lines • ▼ Show 20 Lines
   bool IsAShr = Shr->getOpcode() == Instruction::AShr;
   bool IsExact = Shr->isExact();
   Type *ShrTy = Shr->getType();
   // TODO: If we could guarantee that InstSimplify would handle all of the
   // constant-value-based preconditions in the folds below, then we could assert
   // those conditions rather than checking them. This is difficult because of
   // undef/poison (PR34838).
   if (IsAShr) {
-    if (Pred == CmpInst::ICMP_SLT || Pred == CmpInst::ICMP_ULT || IsExact) {
+    if (IsExact || Pred == CmpInst::ICMP_SLT || Pred == CmpInst::ICMP_ULT) {
       // When ShAmtC can be shifted losslessly:
       // icmp PRED (ashr exact X, ShAmtC), C --> icmp PRED X, (C << ShAmtC)
       // icmp slt/ult (ashr X, ShAmtC), C --> icmp slt/ult X, (C << ShAmtC)
       APInt ShiftedC = C.shl(ShAmtVal);
       if (ShiftedC.ashr(ShAmtVal) == C)
         return new ICmpInst(Pred, X, ConstantInt::get(ShrTy, ShiftedC));
     }
     if (Pred == CmpInst::ICMP_SGT) {
       // icmp sgt (ashr X, ShAmtC), C --> icmp sgt X, ((C + 1) << ShAmtC) - 1
       APInt ShiftedC = (C + 1).shl(ShAmtVal) - 1;
       if (!C.isMaxSignedValue() && !(C + 1).shl(ShAmtVal).isMinSignedValue() &&
           (ShiftedC + 1).ashr(ShAmtVal) == (C + 1))
         return new ICmpInst(Pred, X, ConstantInt::get(ShrTy, ShiftedC));
     }
+    if (Pred == CmpInst::ICMP_UGT) {
+      // icmp ugt (ashr X, ShAmtC), C --> icmp ugt X, ((C + 1) << ShAmtC) - 1
+      APInt ShiftedC = (C + 1).shl(ShAmtVal) - 1;
+      if ((ShiftedC + 1).ashr(ShAmtVal) == (C + 1))
+        return new ICmpInst(Pred, X, ConstantInt::get(ShrTy, ShiftedC));
+    }

     // If the compare constant has significant bits above the lowest sign-bit,
     // then convert an unsigned cmp to a test of the sign-bit:
     // (ashr X, ShiftC) u> C --> X s< 0
     // (ashr X, ShiftC) u< C --> X s> -1
     if (C.getBitWidth() > 2 && C.getNumSignBits() <= ShAmtVal) {
       if (Pred == CmpInst::ICMP_UGT) {
         return new ICmpInst(CmpInst::ICMP_SLT, X,
	▲ Show 20 Lines • Show All 992 Lines • Show Last 20 Lines

llvm/test/Transforms/InstCombine/icmp-shr-lt-gt.ll

	Show First 20 Lines • Show All 992 Lines • ▼ Show 20 Lines
 ;
   %s = ashr i8 %x, 3
   %c = icmp ne i8 %s, 10
   ret i1 %c
 }

 define i1 @ashr_ugt_noexact(i8 %x) {
 ; CHECK-LABEL: @ashr_ugt_noexact(
-; CHECK-NEXT: [[S:%.*]] = ashr i8 [[X:%.*]], 3
-; CHECK-NEXT: [[C:%.*]] = icmp ugt i8 [[S]], 10
+; CHECK-NEXT: [[C:%.*]] = icmp ugt i8 [[X:%.*]], 87
 ; CHECK-NEXT: ret i1 [[C]]
 ;
   %s = ashr i8 %x, 3
   %c = icmp ugt i8 %s, 10
   ret i1 %c
 }


 define i1 @ashr_uge_noexact(i8 %x) {
 ; CHECK-LABEL: @ashr_uge_noexact(
-; CHECK-NEXT: [[S:%.*]] = ashr i8 [[X:%.*]], 3
-; CHECK-NEXT: [[C:%.*]] = icmp ugt i8 [[S]], 9
+; CHECK-NEXT: [[C:%.*]] = icmp ugt i8 [[X:%.*]], 79
 ; CHECK-NEXT: ret i1 [[C]]
 ;
   %s = ashr i8 %x, 3
   %c = icmp uge i8 %s, 10
   ret i1 %c
 }

 define i1 @ashr_ult_noexact(i8 %x) {
	▲ Show 20 Lines • Show All 77 Lines • ▼ Show 20 Lines
 ; CHECK-NEXT: [[C:%.*]] = icmp ult <4 x i8> [[X:%.*]], <i8 88, i8 88, i8 88, i8 88>
 ; CHECK-NEXT: ret <4 x i1> [[C]]
 ;
   %s = ashr exact <4 x i8> %x, <i8 3,i8 3, i8 3, i8 3>
   %c = icmp ule <4 x i8> %s, <i8 10,i8 10,i8 10,i8 10>
   ret <4 x i1> %c
 }

+define i1 @ashr_sgt_overflow(i8 %x) {
+; CHECK-LABEL: @ashr_sgt_overflow(
+; CHECK-NEXT: ret i1 false
+;
+  %s = ashr i8 %x, 1
+  %c = icmp sgt i8 %s, 63
+  ret i1 %c
+}
+
 define i1 @lshrult_01_00_exact(i4 %x) {
 ; CHECK-LABEL: @lshrult_01_00_exact(
 ; CHECK-NEXT: ret i1 false
 ;
   %s = lshr exact i4 %x, 1
   %c = icmp ult i4 %s, 0
   ret i1 %c
 }
	▲ Show 20 Lines • Show All 992 Lines • Show Last 20 Lines

llvm/test/Transforms/InstCombine/icmp-shr.ll

	Show First 20 Lines • Show All 568 Lines • ▼ Show 20 Lines
 ; CHECK-NEXT: [[R:%.*]] = icmp ugt i4 [[X:%.*]], 1
 ; CHECK-NEXT: ret i1 [[R]]
 ;
   %s = ashr i4 %x, 1
   %r = icmp ugt i4 %s, 0 ; 0b0000
   ret i1 %r
 }

-; negative test
-
 define i1 @ashr_ugt_1(i4 %x) {
 ; CHECK-LABEL: @ashr_ugt_1(
-; CHECK-NEXT: [[S:%.*]] = ashr i4 [[X:%.*]], 1
-; CHECK-NEXT: [[R:%.*]] = icmp ugt i4 [[S]], 1
+; CHECK-NEXT: [[R:%.*]] = icmp ugt i4 [[X:%.*]], 3
 ; CHECK-NEXT: ret i1 [[R]]
 ;
   %s = ashr i4 %x, 1
   %r = icmp ugt i4 %s, 1 ; 0b0001
   ret i1 %r
 }

-; negative test
-
 define i1 @ashr_ugt_2(i4 %x) {
 ; CHECK-LABEL: @ashr_ugt_2(
-; CHECK-NEXT: [[S:%.*]] = ashr i4 [[X:%.*]], 1
-; CHECK-NEXT: [[R:%.*]] = icmp ugt i4 [[S]], 2
+; CHECK-NEXT: [[R:%.*]] = icmp ugt i4 [[X:%.*]], 5
 ; CHECK-NEXT: ret i1 [[R]]
 ;
   %s = ashr i4 %x, 1
   %r = icmp ugt i4 %s, 2 ; 0b0010
   ret i1 %r
 }

 ; negative test
	▲ Show 20 Lines • Show All 85 Lines • ▼ Show 20 Lines
 ; CHECK-NEXT: [[R:%.*]] = icmp slt i4 [[X:%.*]], 0
 ; CHECK-NEXT: ret i1 [[R]]
 ;
   %s = ashr i4 %x, 1
   %r = icmp ugt i4 %s, 11 ; 0b1011
   ret i1 %r
 }

-; negative test
-
 define i1 @ashr_ugt_12(i4 %x) {
 ; CHECK-LABEL: @ashr_ugt_12(
-; CHECK-NEXT: [[S:%.*]] = ashr i4 [[X:%.*]], 1
-; CHECK-NEXT: [[R:%.*]] = icmp ugt i4 [[S]], -4
+; CHECK-NEXT: [[R:%.*]] = icmp ugt i4 [[X:%.*]], -7
 ; CHECK-NEXT: ret i1 [[R]]
 ;
   %s = ashr i4 %x, 1
   %r = icmp ugt i4 %s, 12 ; 0b1100
   ret i1 %r
 }

-; negative test
-
 define i1 @ashr_ugt_13(i4 %x) {
 ; CHECK-LABEL: @ashr_ugt_13(
-; CHECK-NEXT: [[S:%.*]] = ashr i4 [[X:%.*]], 1
-; CHECK-NEXT: [[R:%.*]] = icmp ugt i4 [[S]], -3
+; CHECK-NEXT: [[R:%.*]] = icmp ugt i4 [[X:%.*]], -5
 ; CHECK-NEXT: ret i1 [[R]]
 ;
   %s = ashr i4 %x, 1
   %r = icmp ugt i4 %s, 13 ; 0b1101
   ret i1 %r
 }

 ; negative test, but different transform possible
	▲ Show 20 Lines • Show All 303 Lines • Show Last 20 Lines
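
The new constants in the updated CHECK lines follow from the `((C + 1) << ShAmtC) - 1` formula in the patch. As a rough cross-check, here is a small C++ sketch that recomputes them for an arbitrary bit width below 32. The names `foldedUgtConstant` and `asSignedN` are hypothetical, and plain `unsigned` arithmetic with a mask stands in for `APInt`.

```cpp
#include <cstdint>

// Hypothetical helper: ((c + 1) << sh) - 1, truncated to `bits` bits.
// Assumes 0 < bits < 32 and sh < bits.
unsigned foldedUgtConstant(unsigned c, unsigned sh, unsigned bits) {
  unsigned mask = (1u << bits) - 1u;
  unsigned shifted = ((c + 1u) << sh) & mask; // (C + 1) << ShAmtC, wrapped
  return (shifted - 1u) & mask;               // subtract 1, wrapped
}

// Interpret a `bits`-wide pattern as signed, the way LLVM IR prints
// small-integer constants such as i4 -7.
int asSignedN(unsigned v, unsigned bits) {
  bool negative = ((v >> (bits - 1)) & 1u) != 0;
  return negative ? static_cast<int>(v) - static_cast<int>(1u << bits)
                  : static_cast<int>(v);
}
```

For example, `foldedUgtConstant(10, 3, 8)` gives 87 (the `@ashr_ugt_noexact` case), `foldedUgtConstant(9, 3, 8)` gives 79 (`@ashr_uge_noexact`, after `uge 10` is canonicalized to `ugt 9`), and `asSignedN(foldedUgtConstant(12, 1, 4), 4)` gives -7 (`@ashr_ugt_12`).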