This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Fix for PR39060
Closed · Public

Authored by samparker on Sep 25 2018, 6:29 AM.


Details

Reviewers

hans
SjoerdMeijer
javed.absar

Commits

rG75aca9409356: [ARM] Fix for PR39060
rL343092: [ARM] Fix for PR39060

Summary

When calculating whether a value can safely overflow for use by an icmp, we weren't checking that the value couldn't wrap around. Doing this check requires both the icmp and the incoming add or sub to use a constant operand.

Bugzilla report: https://bugs.llvm.org/show_bug.cgi?id=39060
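The rule described in the summary can be sketched outside LLVM with plain integers. This is a minimal sketch under stated assumptions, not the patch's code: the names `Op` and `isSafeDecreasingCompare` are hypothetical, `uint64_t` stands in for `APInt`, and the icmp is assumed to be an unsigned, non-equality compare whose constant is passed as its unsigned value in the narrow type. It follows the same steps as the new `isSafeOverflow` in the diff: require a decreasing add or sub, add the icmp constant to the magnitude of the add/sub constant, and accept only if that total still fits in the narrow type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the opcode of the add/sub that feeds the icmp.
enum class Op { Add, Sub };

// Sketch of the rule, using plain integers instead of llvm::APInt.
//   imm      - the add/sub constant (C1), read as a signed widthBits value
//   cmpConst - the icmp constant (C2), read as an unsigned widthBits value
// The icmp is assumed to be an unsigned, non-equality compare.
bool isSafeDecreasingCompare(Op op, int64_t imm, uint64_t cmpConst,
                             unsigned widthBits) {
  assert(widthBits > 0 && widthBits < 32 && "expected a narrow type");

  // Only decreasing values are handled: sub of a non-negative constant or
  // add of a negative constant.
  bool negImm = imm < 0;
  bool isDecreasing = (op == Op::Sub && !negImm) || (op == Op::Add && negImm);
  if (!isDecreasing)
    return false;

  // Total = C2 + |C1|, computed in a wider type so it cannot itself wrap.
  uint64_t magnitude = static_cast<uint64_t>(negImm ? -imm : imm);
  uint64_t total = cmpConst + magnitude;

  // Max = the all-ones value of the narrow type, e.g. 255 for i8.
  uint64_t max = (uint64_t{1} << widthBits) - 1;
  return total <= max;
}
```

For the example used in the patch's comment, `isSafeDecreasingCompare(Op::Sub, 2, 254, 8)` returns false (254 + 2 = 256 does not fit in i8), while `isSafeDecreasingCompare(Op::Sub, 1, 254, 8)` returns true (254 + 1 = 255).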

Diff Detail

Repository: rL LLVM

Event Timeline

samparker created this revision. · Sep 25 2018, 6:29 AM

Herald added a reviewer: javed.absar. · Sep 25 2018, 6:29 AM

Herald added subscribers: chrib, kristof.beyls, srhines.

SjoerdMeijer added inline comments. · Sep 25 2018, 8:29 AM

lib/Target/ARM/ARMCodeGenPrepare.cpp
Line 249 (on Diff #166868): Thanks for commenting on this, this is very useful! A few nits below.
Line 253 (on Diff #166868): Can you specify what exactly the "overflowing value" in the pattern is?
Line 254 (on Diff #166868): And similarly, what the "underflowing instruction" is?
Line 255 (on Diff #166868): Thinking out loud... I think we could be a bit more precise. Something along the lines of: We are pattern matching a sequence of 2 instructions: Instruction I is an add or sub, with a LHS, and a RHS that should be a constant C1, and the user of I is a compare instruction CI: the LHS is instruction I, the RHS a constant C2. Instruction I needs to be decreasing, which is the case when instruction I is an add and C1 is negative or when I is a sub and C1 is positive, because <insert reason here>. To determine whether wrapping occurs, we calculate a new constant C3: C3 = abs(C1) + C2, where C1 and C2 are zero-extended to 32 bits if necessary. We define a new constant: Max = .... And overflow in C3 is okay if: ... ...

Thanks for the suggestions, Sjoerd. I've evidently had a difficult time wrapping (pun intended) my head around this and really should have put some comments up before. Hopefully I've now also illustrated when and how we can use this.

Thanks for clarifying! Looks good, with only a nit.

lib/Target/ARM/ARMCodeGenPrepare.cpp
Line 274 (on Diff #166952): This example plugs in numbers to show that evaluating this in different types gives different results. Perhaps add a quick example too of how we calculate that overflow might change the result:

%a - 2 <= 254
%a <= 254 + 2
%a <= 256

We find that 256 does not fit in i8, and therefore we say this is not safe to promote. And perhaps add a note somewhere that this could all be generalised to accept the general case

%a - %b <= %c

where a, b, or c are variables, if we have (accurate) value range analysis available. I have never looked at value ranges in LLVM though, and of course this is definitely something for a follow-up.
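The reviewer's rearrangement (%a - 2 <= 254, so %a <= 256) generalises to the check the patch implements. As an illustrative derivation, not part of the review, assume the operand a is zero-extended from N bits, the add or sub decreases it by a constant c = |C1|, and the icmp compares against a constant k = C2. Inputs with a >= c do not wrap and give the same value in both widths, so only 0 <= a < c matters:

$$
\begin{aligned}
(a - c) \bmod 2^{N} &= 2^{N} - c + a \in [\,2^{N} - c,\ 2^{N} - 1\,] \\
(a - c) \bmod 2^{32} &\ge 2^{32} - c > k
\end{aligned}
$$

The promoted value always exceeds k, so `ugt` and `ule` give the same answer in both widths exactly when every wrapped narrow value also exceeds k:

$$
2^{N} - c > k \iff k + c \le 2^{N} - 1
$$

For `ult` and `uge`, agreement needs only 2^N - c >= k, which the same bound implies. With N = 8 and k = 254, c = 2 gives 256 > 255 (not safe) and c = 1 gives 255 <= 255 (safe).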

This revision is now accepted and ready to land. · Sep 26 2018, 2:47 AM

Closed by commit rL343092: [ARM] Fix for PR39060 (authored by sam_parker). · Sep 26 2018, 3:57 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                   Size
llvm/trunk/lib/Target/ARM/ARMCodeGenPrepare.cpp        131 lines
llvm/trunk/test/CodeGen/ARM/arm-cgp-overflow.ll        133 lines
llvm/trunk/test/CodeGen/ARM/arm-cgp-signed-icmps.ll    25 lines
llvm/trunk/test/CodeGen/ARM/pr39060.ll                 33 lines

Diff 167097

llvm/trunk/lib/Target/ARM/ARMCodeGenPrepare.cpp

[241 unchanged lines collapsed]

 /// Return whether the instruction can be promoted within any modifications to
 /// it's operands or result.
 static bool isSafeOverflow(Instruction *I) {
   // FIXME Do we need NSW too?
   if (isa<OverflowingBinaryOperator>(I) && I->hasNoUnsignedWrap())
     return true;
 
+  // We can support a, potentially, overflowing instruction (I) if:
+  // - It is only used by an unsigned icmp.
+  // - The icmp uses a constant.
+  // - The overflowing value (I) is decreasing, i.e would underflow - wrapping
+  // around zero to become a larger number than before.
+  // - The underflowing instruction (I) also uses a constant.
+  //
+  // We can then use the two constants to calculate whether the result would
+  // wrap in respect to itself in the original bitwidth. If it doesn't wrap,
+  // just underflows the range, the icmp would give the same result whether the
+  // result has been truncated or not. We calculate this by:
+  // - Zero extending both constants, if needed, to 32-bits.
+  // - Take the absolute value of I's constant, adding this to the icmp const.
+  // - Check that this value is not out of range for small type. If it is, it
+  // means that it has underflowed enough to wrap around the icmp constant.
+  //
+  // For example:
+  //
+  // %sub = sub i8 %a, 2
+  // %cmp = icmp ule i8 %sub, 254
+  //
+  // If %a = 0, %sub = -2 == FE == 254
+  // But if this is evalulated as a i32
+  // %sub = -2 == FF FF FF FE == 4294967294
+  // So the unsigned compares (i8 and i32) would not yield the same result.
+  //
+  // Another way to look at it is:
+  // %a - 2 <= 254
+  // %a + 2 <= 254 + 2
+  // %a <= 256
+  // And we can't represent 256 in the i8 format, so we don't support it.
+  //
+  // Whereas:
+  //
+  // %sub i8 %a, 1
+  // %cmp = icmp ule i8 %sub, 254
+  //
+  // If %a = 0, %sub = -1 == FF == 255
+  // As i32:
+  // %sub = -1 == FF FF FF FF == 4294967295
+  //
+  // In this case, the unsigned compare results would be the same and this
+  // would also be true for ult, uge and ugt:
+  // - (255 < 254) == (0xFFFFFFFF < 254) == false
+  // - (255 <= 254) == (0xFFFFFFFF <= 254) == false
+  // - (255 > 254) == (0xFFFFFFFF > 254) == true
+  // - (255 >= 254) == (0xFFFFFFFF >= 254) == true
+  //
+  // To demonstrate why we can't handle increasing values:
+  //
+  // %add = add i8 %a, 2
+  // %cmp = icmp ult i8 %add, 127
+  //
+  // If %a = 254, %add = 256 == (i8 1)
+  // As i32:
+  // %add = 256
+  //
+  // (1 < 127) != (256 < 127)
+
   unsigned Opc = I->getOpcode();
-  if (Opc == Instruction::Add || Opc == Instruction::Sub) {
-    // We don't care if the add or sub could wrap if the value is decreasing
-    // and is only being used by an unsigned compare.
+  if (Opc != Instruction::Add && Opc != Instruction::Sub)
+    return false;
+
   if (!I->hasOneUse() ||
       !isa<ICmpInst>(*I->user_begin()) ||
       !isa<ConstantInt>(I->getOperand(1)))
     return false;
 
-    auto CI = cast<ICmpInst>(I->user_begin());
+  ConstantInt *OverflowConst = cast<ConstantInt>(I->getOperand(1));
+  bool NegImm = OverflowConst->isNegative();
+  bool IsDecreasing = ((Opc == Instruction::Sub) && !NegImm) ||
+                      ((Opc == Instruction::Add) && NegImm);
+  if (!IsDecreasing)
+    return false;
 
-    // Don't support an icmp that deals with sign bits, including negative
-    // immediates
-    if (CI->isSigned())
+  // Don't support an icmp that deals with sign bits.
+  auto CI = cast<ICmpInst>(I->user_begin());
+  if (CI->isSigned() || CI->isEquality())
     return false;
 
+  ConstantInt *ICmpConst = nullptr;
   if (auto *Const = dyn_cast<ConstantInt>(CI->getOperand(0)))
-      if (Const->isNegative())
+    ICmpConst = Const;
+  else if (auto *Const = dyn_cast<ConstantInt>(CI->getOperand(1)))
+    ICmpConst = Const;
+  else
     return false;
 
-    if (auto *Const = dyn_cast<ConstantInt>(CI->getOperand(1)))
-      if (Const->isNegative())
-        return false;
+  // Now check that the result can't wrap on itself.
+  APInt Total = ICmpConst->getValue().getBitWidth() < 32 ?
+    ICmpConst->getValue().zext(32) : ICmpConst->getValue();
 
-    bool NegImm = cast<ConstantInt>(I->getOperand(1))->isNegative();
-    bool IsDecreasing = ((Opc == Instruction::Sub) && !NegImm) ||
-                        ((Opc == Instruction::Add) && NegImm);
-    if (!IsDecreasing)
+  Total += OverflowConst->getValue().getBitWidth() < 32 ?
+    OverflowConst->getValue().abs().zext(32) : OverflowConst->getValue().abs();
+
+  APInt Max = APInt::getAllOnesValue(ARMCodeGenPrepare::TypeSize);
+
+  if (Total.getBitWidth() > Max.getBitWidth()) {
+    if (Total.ugt(Max.zext(Total.getBitWidth())))
+      return false;
+  } else if (Max.getBitWidth() > Total.getBitWidth()) {
+    if (Total.zext(Max.getBitWidth()).ugt(Max))
+      return false;
+  } else if (Total.ugt(Max))
     return false;
 
   LLVM_DEBUG(dbgs() << "ARM CGP: Allowing safe overflow for " << *I << "\n");
   return true;
 }
-
-  return false;
-}
 
 static bool shouldPromote(Value *V) {
   if (!isa<IntegerType>(V->getType()) || isSink(V))
     return false;
 
   if (isSource(V))
     return true;
 
   auto *I = dyn_cast<Instruction>(V);
[159 unchanged lines collapsed] void IRPromoter::Mutate(Type *OrigTy,
   // as insert any intrinsics.
   for (auto *V : Visited) {
     if (Sources.count(V))
       continue;
 
     if (!shouldPromote(V) || isPromotedResultSafe(V))
       continue;
 
+    assert(EnableDSP && "DSP intrinisc insertion not enabled!");
+
     // Replace unsafe instructions with appropriate intrinsic calls.
     InsertDSPIntrinsic(cast<Instruction>(V));
   }
 
   auto InsertTrunc = [&](Value *V) -> Instruction* {
     if (!isa<Instruction>(V) || !isa<IntegerType>(V->getType()))
       return nullptr;
 
[311 unchanged lines collapsed]
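The examples in the new comment block of `isSafeOverflow` can be checked by brute force. The sketch below is an illustration and not part of the patch; `Pred`, `evalPred` and `promotionChangesResult` are hypothetical names. It enumerates every zero-extended i8 input, evaluates the add/sub and the unsigned compare once with i8 wraparound and once with i32 wraparound (i.e. promoted without a truncating `uxtb`), and reports whether any input gives different answers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the unsigned icmp predicates used in the comment.
enum class Pred { ULT, ULE, UGT, UGE };

static bool evalPred(Pred p, uint32_t lhs, uint32_t rhs) {
  switch (p) {
  case Pred::ULT: return lhs < rhs;
  case Pred::ULE: return lhs <= rhs;
  case Pred::UGT: return lhs > rhs;
  case Pred::UGE: return lhs >= rhs;
  }
  return false;
}

// Returns true if some zero-extended i8 input %a makes
//   icmp <p> (%a + delta), k
// give a different answer when evaluated in i8 than in i32 (no truncation).
// `delta` models the add/sub constant, e.g. `sub i8 %a, 2` is delta = -2.
bool promotionChangesResult(int delta, Pred p, uint32_t k) {
  assert(k <= 0xFFu && "k models an i8 icmp constant");
  for (int a = 0; a <= 0xFF; ++a) {
    uint32_t wide = static_cast<uint32_t>(a + delta); // wraps modulo 2^32
    uint32_t narrow = wide & 0xFFu;                   // wraps modulo 2^8
    if (evalPred(p, narrow, k) != evalPred(p, wide, k))
      return true;
  }
  return false;
}
```

With this, `promotionChangesResult(-2, Pred::ULE, 254)` is true and `promotionChangesResult(-1, Pred::ULE, 254)` is false, as the comment states; the `sub 1` case also agrees for `ULT`, `UGT` and `UGE`. For the increasing example, `promotionChangesResult(2, Pred::ULT, 127)` is true. Note that with %a = 254 the i8 result of `add i8 %a, 2` is 0 rather than 1 (1 comes from %a = 255); either input shows the mismatch.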

llvm/trunk/test/CodeGen/ARM/arm-cgp-overflow.ll

-; RUN: llc -mtriple=thumbv8.main -mcpu=cortex-m33 %s -arm-disable-cgp=false -o - | FileCheck %s
+; RUN: llc -mtriple=thumbv8m.main -mcpu=cortex-m33 %s -arm-disable-cgp=false -o - | FileCheck %s
 
 ; CHECK: overflow_add
 ; CHECK: add
 ; CHECK: uxth
 ; CHECK: cmp
 define zeroext i16 @overflow_add(i16 zeroext %a, i16 zeroext %b) {
   %add = add i16 %a, %b
   %or = or i16 %add, 1
[32 unchanged lines collapsed]
 ; CHECK-COMMON: cmp
 define zeroext i16 @overflow_shl(i16 zeroext %a, i16 zeroext %b) {
   %add = shl i16 %a, %b
   %or = or i16 %add, 1
   %cmp = icmp ugt i16 %or, 1024
   %res = select i1 %cmp, i16 2, i16 5
   ret i16 %res
 }
+
+; CHECK-LABEL: overflow_add_no_consts:
+; CHECK: add r0, r1
+; CHECK: uxtb [[EXT:r[0-9]+]], r0
+; CHECK: cmp [[EXT]], r2
+; CHECK: movhi r0, #8
+define i32 @overflow_add_no_consts(i8 zeroext %a, i8 zeroext %b, i8 zeroext %limit) {
+  %add = add i8 %a, %b
+  %cmp = icmp ugt i8 %add, %limit
+  %res = select i1 %cmp, i32 8, i32 16
+  ret i32 %res
+}
+
+; CHECK-LABEL: overflow_add_const_limit:
+; CHECK: add r0, r1
+; CHECK: uxtb [[EXT:r[0-9]+]], r0
+; CHECK: cmp [[EXT]], #128
+; CHECK: movhi r0, #8
+define i32 @overflow_add_const_limit(i8 zeroext %a, i8 zeroext %b) {
+  %add = add i8 %a, %b
+  %cmp = icmp ugt i8 %add, 128
+  %res = select i1 %cmp, i32 8, i32 16
+  ret i32 %res
+}
+
+; CHECK-LABEL: overflow_add_positive_const_limit:
+; CHECK: adds r0, #1
+; CHECK: uxtb [[EXT:r[0-9]+]], r0
+; CHECK: cmp [[EXT]], #128
+; CHECK: movhi r0, #8
+define i32 @overflow_add_positive_const_limit(i8 zeroext %a) {
+  %add = add i8 %a, 1
+  %cmp = icmp ugt i8 %add, 128
+  %res = select i1 %cmp, i32 8, i32 16
+  ret i32 %res
+}
+
+; CHECK-LABEL: unsafe_add_underflow:
+; CHECK: subs r0, #2
+; CHECK: uxtb [[EXT:r[0-9]+]], r0
+; CHECK: cmp [[EXT]], #255
+; CHECK: moveq r0, #8
+define i32 @unsafe_add_underflow(i8 zeroext %a) {
+  %add = add i8 %a, -2
+  %cmp = icmp ugt i8 %add, 254
+  %res = select i1 %cmp, i32 8, i32 16
+  ret i32 %res
+}
+
+; CHECK-LABEL: safe_add_underflow:
+; CHECK: subs [[MINUS_1:r[0-9]+]], r0, #1
+; CHECK-NOT: uxtb
+; CHECK: cmp [[MINUS_1]], #254
+; CHECK: movhi r0, #8
+define i32 @safe_add_underflow(i8 zeroext %a) {
+  %add = add i8 %a, -1
+  %cmp = icmp ugt i8 %add, 254
+  %res = select i1 %cmp, i32 8, i32 16
+  ret i32 %res
+}
+
+; CHECK-LABEL: safe_add_underflow_neg:
+; CHECK: subs [[MINUS_1:r[0-9]+]], r0, #2
+; CHECK-NOT: uxtb
+; CHECK: cmp [[MINUS_1]], #251
+; CHECK: movlo r0, #8
+define i32 @safe_add_underflow_neg(i8 zeroext %a) {
+  %add = add i8 %a, -2
+  %cmp = icmp ule i8 %add, -6
+  %res = select i1 %cmp, i32 8, i32 16
+  ret i32 %res
+}
+
+; CHECK-LABEL: overflow_sub_negative_const_limit:
+; CHECK: adds r0, #1
+; CHECK: uxtb [[EXT:r[0-9]+]], r0
+; CHECK: cmp [[EXT]], #128
+; CHECK: movhi r0, #8
+define i32 @overflow_sub_negative_const_limit(i8 zeroext %a) {
+  %sub = sub i8 %a, -1
+  %cmp = icmp ugt i8 %sub, 128
+  %res = select i1 %cmp, i32 8, i32 16
+  ret i32 %res
+}
+
+; CHECK-LABEL: unsafe_sub_underflow:
+; CHECK: subs r0, #6
+; CHECK: uxtb [[EXT:r[0-9]+]], r0
+; CHECK: cmp [[EXT]], #250
+; CHECK: movhi r0, #8
+define i32 @unsafe_sub_underflow(i8 zeroext %a) {
+  %sub = sub i8 %a, 6
+  %cmp = icmp ugt i8 %sub, 250
+  %res = select i1 %cmp, i32 8, i32 16
+  ret i32 %res
+}
+
+; CHECK-LABEL: safe_sub_underflow:
+; CHECK: subs [[MINUS_1:r[0-9]+]], r0, #1
+; CHECK-NOT: uxtb
+; CHECK: cmp [[MINUS_1]], #255
+; CHECK: movlo r0, #8
+define i32 @safe_sub_underflow(i8 zeroext %a) {
+  %sub = sub i8 %a, 1
+  %cmp = icmp ule i8 %sub, 254
+  %res = select i1 %cmp, i32 8, i32 16
+  ret i32 %res
+}
+
+; CHECK-LABEL: safe_sub_underflow_neg
+; CHECK: subs [[MINUS_1:r[0-9]+]], r0, #4
+; CHECK-NOT: uxtb
+; CHECK: cmp [[MINUS_1]], #250
+; CHECK: movhi r0, #8
+define i32 @safe_sub_underflow_neg(i8 zeroext %a) {
+  %sub = sub i8 %a, 4
+  %cmp = icmp uge i8 %sub, -5
+  %res = select i1 %cmp, i32 8, i32 16
+  ret i32 %res
+}
+
+; CHECK: subs r0, #4
+; CHECK: uxtb [[EXT:r[0-9]+]], r0
+; CHECK: cmp [[EXT]], #253
+; CHECK: movlo r0, #8
+define i32 @unsafe_sub_underflow_neg(i8 zeroext %a) {
+  %sub = sub i8 %a, 4
+  %cmp = icmp ult i8 %sub, -3
+  %res = select i1 %cmp, i32 8, i32 16
+  ret i32 %res
+}
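Each new test probes one point on either side of the boundary. As an illustration that is not part of the patch, the sketch below sweeps every i8 combination instead. It models the patch's rule as `k + c <= 255`, where `c` is the magnitude of the decreasing add/sub constant and `k` is the icmp constant on the right-hand side (as in these tests), and counts combinations the rule accepts even though i8 and i32 evaluation disagree. `Pred`, `evalPred`, `ruleSaysSafe`, `mismatchExists` and `countUnsoundAccepts` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the unsigned, non-equality icmp predicates.
enum class Pred { ULT, ULE, UGT, UGE };

static bool evalPred(Pred p, uint32_t lhs, uint32_t rhs) {
  switch (p) {
  case Pred::ULT: return lhs < rhs;
  case Pred::ULE: return lhs <= rhs;
  case Pred::UGT: return lhs > rhs;
  case Pred::UGE: return lhs >= rhs;
  }
  return false;
}

// The rule for i8: Total = k + |C1| must not exceed Max = 255.
static bool ruleSaysSafe(uint32_t c, uint32_t k) { return k + c <= 0xFFu; }

// Ground truth by enumeration: does `icmp p (%a - c), k` ever give a
// different answer in i8 than in i32 for a zero-extended i8 %a?
static bool mismatchExists(uint32_t c, Pred p, uint32_t k) {
  assert(c <= 0x80u && k <= 0xFFu && "c and k model i8 constants");
  for (uint32_t a = 0; a <= 0xFFu; ++a) {
    uint32_t wide = a - c;          // wraps modulo 2^32 when a < c
    uint32_t narrow = wide & 0xFFu; // wraps modulo 2^8
    if (evalPred(p, narrow, k) != evalPred(p, wide, k))
      return true;
  }
  return false;
}

// Counts (c, p, k) combinations the rule accepts although promotion changes
// the result. |C1| of an i8 constant is at most 128.
int countUnsoundAccepts() {
  const Pred preds[] = {Pred::ULT, Pred::ULE, Pred::UGT, Pred::UGE};
  int unsound = 0;
  for (uint32_t c = 0; c <= 0x80u; ++c)
    for (uint32_t k = 0; k <= 0xFFu; ++k)
      for (Pred p : preds)
        if (ruleSaysSafe(c, k) && mismatchExists(c, p, k))
          ++unsound;
  return unsound;
}
```

Because wrapped i8 values always land in [256 - c, 255], `countUnsoundAccepts()` should return 0, i.e. the rule is conservative. It is not exact: for `uge` with c = 1 and k = 255 the rule rejects a compare that would in fact give the same answer in both widths.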

llvm/trunk/test/CodeGen/ARM/arm-cgp-signed-icmps.ll

-; RUN: llc -mtriple=thumbv8.main -mcpu=cortex-m33 -arm-disable-cgp=false -mattr=-use-misched %s -o - | FileCheck %s --check-prefix=CHECK-COMMON --check-prefix=CHECK-NODSP
+; RUN: llc -mtriple=thumbv8m.main -mcpu=cortex-m33 -arm-disable-cgp=false -mattr=-use-misched %s -o - | FileCheck %s --check-prefix=CHECK-COMMON --check-prefix=CHECK-NODSP
 ; RUN: llc -mtriple=thumbv7em %s -arm-disable-cgp=false -arm-enable-scalar-dsp=true -o - | FileCheck %s --check-prefix=CHECK-COMMON --check-prefix=CHECK-DSP
 ; RUN: llc -mtriple=thumbv8 %s -arm-disable-cgp=false -arm-enable-scalar-dsp=true -arm-enable-scalar-dsp-imms=true -o - | FileCheck %s --check-prefix=CHECK-COMMON --check-prefix=CHECK-DSP-IMM
 
 ; CHECK-COMMON-LABEL: eq_sgt
 ; CHECK-NODSP: add
 ; CHECK-NODSP: uxtb
 ; CHECK-NODSP: sxtb
 ; CHECK-NODSP: cmp
[32 unchanged lines collapsed]
 ; CHECK-NODSP: sub
 ; CHECK-NODSP: sxth
 ; CHECK-NODSP: uxth
 ; CHECK-NODSP: add
 ; CHECK-NODSP: sxth
 ; CHECK-NODSP: cmp
 ; CHECK-NODSP: cmp
 
-; CHECK-DSP: sxth [[ARG:r[0-9]+]], r2
-; CHECK-DSP: subs [[SUB:r[0-9]+]],
-; CHECK-DSP: uadd16 [[ADD:r[0-9]+]],
-; CHECK-DSP: sxth.w [[SEXT:r[0-9]+]], [[ADD]]
-; CHECK-DSP: cmp [[SEXT]], [[ARG]]
-; CHECK-DSP-NOT: uxt
-; CHECK-DSP: cmp [[SUB]], r2
+; CHECK-DSP: sub
+; CHECK-DSP: sxth
+; CHECK-DSP: add
+; CHECK-DSP: uxth
+; CHECK-DSP: sxth
+; CHECK-DSP: cmp
+; CHECK-DSP: cmp
 
+; CHECK-DSP-IMM: sxth [[ARG:r[0-9]+]], r2
+; CHECK-DSP-IMM: uadd16 [[ADD:r[0-9]+]],
+; CHECK-DSP-IMM: sxth.w [[SEXT:r[0-9]+]], [[ADD]]
+; CHECK-DSP-IMM: cmp [[SEXT]], [[ARG]]
+; CHECK-DSP-IMM-NOT: uxt
+; CHECK-DSP-IMM: movs [[ONE:r[0-9]+]], #1
+; CHECK-DSP-IMM: usub16 [[SUB:r[0-9]+]], r1, [[ONE]]
+; CHECK-DSP-IMM: cmp [[SUB]], r2
 define i16 @ugt_slt(i16 *%x, i16 zeroext %y, i16 zeroext %z) {
 entry:
   %load0 = load i16, i16* %x, align 1
   %add = add i16 %load0, %z
   %sub = sub i16 %y, 1
   %cmp = icmp slt i16 %add, %z
   %cmp1 = icmp ugt i16 %sub, %z
   %res0 = select i1 %cmp, i16 35, i16 -1
[36 unchanged lines collapsed]

llvm/trunk/test/CodeGen/ARM/pr39060.ll

+; RUN: llc -mtriple=armv7a-linux-androideabi %s -o - | FileCheck %s
+
+@a = local_unnamed_addr global i16 -1, align 2
+@b = local_unnamed_addr global i16 0, align 2
+
+; CHECK-LABEL: pr39060:
+; CHECK: ldrh
+; CHECK: ldrh
+; CHECK: sub
+; CHECK: uxth
+define void @pr39060() local_unnamed_addr #0 {
+entry:
+  %0 = load i16, i16* @a, align 2
+  %1 = load i16, i16* @b, align 2
+  %sub = add i16 %1, -1
+  %cmp = icmp eq i16 %0, %sub
+  br i1 %cmp, label %if.else, label %if.then
+
+if.then:
+  tail call void bitcast (void (...)* @f to void ()*)() #2
+  br label %if.end
+
+if.else:
+  tail call void bitcast (void (...)* @g to void ()*)() #2
+  br label %if.end
+
+if.end:
+  ret void
+}
+
+declare void @f(...) local_unnamed_addr #1
+
+declare void @g(...) local_unnamed_addr #1
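The pr39060.ll test covers a pattern that the pre-patch checks accepted: the `add i16 %1, -1` is decreasing and the `icmp eq` is not signed, even though neither icmp operand is a constant. The new code rejects it, since equality compares are now excluded and the icmp must use a constant. The sketch below reproduces the arithmetic with plain integers; it models only the values, not the ARMCodeGenPrepare pass, and `Pr39060Result` and `evaluatePr39060` are hypothetical names. With @a = -1 and @b = 0, the i16 compare is true, but if %sub were computed in i32 without truncation it would be 0xFFFFFFFF and the compare against the zero-extended 0xFFFF would be false. Masking back to 16 bits, which corresponds to the `uxth` the test expects after the `sub`, restores the original result.

```cpp
#include <cassert>
#include <cstdint>

// Results of `icmp eq i16 %0, %sub` under three evaluation strategies.
struct Pr39060Result {
  bool narrowEq;    // evaluated in i16, as the IR specifies
  bool promotedEq;  // evaluated in i32 with %sub left untruncated
  bool truncatedEq; // evaluated in i32 after masking %sub back to 16 bits
};

// Models @a and @b from pr39060.ll; the loads are zero-extended (ldrh).
Pr39060Result evaluatePr39060(uint16_t a, uint16_t b) {
  // %sub = add i16 %1, -1, computed with 16-bit wraparound.
  uint16_t sub16 = static_cast<uint16_t>(b - 1);
  // The same add performed on zero-extended 32-bit values.
  uint32_t a32 = a;
  uint32_t sub32 = static_cast<uint32_t>(b) - 1u; // wraps modulo 2^32
  return {a == sub16, a32 == sub32, a32 == (sub32 & 0xFFFFu)};
}
```

For `evaluatePr39060(0xFFFF, 0)`, only `promotedEq` is false, so an untruncated promotion would send control to %if.then instead of %if.else.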