This is an archive of the discontinued LLVM Phabricator instance.

Differential D57789

[CGP] form usub with overflow from sub+icmp
Closed, Public

Authored by spatel on Feb 5 2019, 2:27 PM.


Details

Reviewers

nikic
fhahn
hfinkel
nemanjai
RKSimon
efriedma
arsenm
nhaehnle

Commits

rGd8b4efcb6b4a: [CGP] form usub with overflow from sub+icmp
rL354298: [CGP] form usub with overflow from sub+icmp

Summary

The motivating x86 cases for forming the intrinsic are shown in PR31754 and PR40487:
https://bugs.llvm.org/show_bug.cgi?id=31754
https://bugs.llvm.org/show_bug.cgi?id=40487
...and those are shown in the IR test file and x86 codegen file.

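Matching the usubo pattern is harder than matching uaddo. In usubo, the sub and the icmp are two independent instructions that only share operands. In uaddo there is a def-use chain: the compare uses the result of the add.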
I replicated the codegen tests for AArch64 to show that forming usubo should be generally good (and that lines up with the existing uaddo sibling transform).
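The following minimal C++ sketch makes the difference concrete. It models each intrinsic's {result, overflow} pair with plain unsigned arithmetic. The names (`MathWithOverflow`, `usubo`, `uaddo`) are illustrative, not LLVM APIs. The IR in the comments only shows the rough shape of the patterns being matched.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in for the {result, overflow-bit} pair that the
// llvm.usub.with.overflow / llvm.uadd.with.overflow intrinsics return.
struct MathWithOverflow {
  uint32_t Math;  // wrapped result, modulo 2^32
  bool Overflow;  // true if the operation wrapped
};

// usubo shape: the sub and the icmp are independent instructions that only
// share their operands, roughly:
//   %s  = sub i32 %a, %b
//   %ov = icmp ult i32 %a, %b
MathWithOverflow usubo(uint32_t A, uint32_t B) {
  return {A - B, A < B};
}

// uaddo shape: the icmp consumes the result of the add (a def-use chain),
// roughly:
//   %s  = add i32 %a, %b
//   %ov = icmp ult i32 %s, %a
MathWithOverflow uaddo(uint32_t A, uint32_t B) {
  uint32_t Sum = A + B;
  return {Sum, Sum < A};
}
```

In the usubo form, nothing links the compare to the subtract except the shared operands. A matcher therefore has to search the operands' users for a matching sub.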

There's a potential regression seen in the AMDGPU test file when trying to form a SAD op though, so I added a hack to try to avoid that.
If I'm seeing the PPC changes correctly, this is a small improvement on those tests.

Diff Detail

Repository: rL LLVM

Event Timeline

spatel created this revision. Feb 5 2019, 2:27 PM

Herald added subscribers: jsji, kristof.beyls, tpr and 4 others. Feb 5 2019, 2:27 PM

Patch updated:
No code changes, but the previous rev had a glitch in the IR test file.

dmgreen added a subscriber: dmgreen. Feb 6 2019, 4:49 AM

I'll apply this and try it to see what is changed in the test cases. I'll report back when that's done.

test/CodeGen/PowerPC/bdzlr.ll, line 57 (on Diff #185419):
I need to look at the entire codegen here more closely. This requires deeper investigation because it changes a branch from the CTR-decrementing `bdzlr` to a `bnelr` that doesn't touch the CTR. I want to make sure this change does not prevent the formation of CTR loops.

spatel marked an inline comment as done. Feb 8 2019, 6:36 AM

spatel added inline comments.

test/CodeGen/PowerPC/bdzlr.ll, line 57 (on Diff #185419):
Thanks. If this looks like a regression, we're probably better off adding a TLI hook to guard this transform (and the existing UADDO transform). That would let me remove the SAD hack and allow targets to enable this at their convenience. D57833 also suggests that might be the better (more conservative) way forward. @dmgreen @arsenm @nhaehnle - any preference/thoughts about that?

Hello. I ran some benchmarks and the results were kind of all over the place, but on average a little down. The Arm backend seems to be quite opinionated about uaddo, using its own nodes for a lot of things, so there might be some missing parts. There were enough improvements to make us think that this can be good (and it can show areas for improvement, as in D57833), but we may need a few fixes first. Putting it behind a target hook in the meantime, whilst we take a look, does sound sensible.

I haven't run any AArch64 tests, just Arm (we had some infrastructure failures that needed emergency fixes). It may be better there already.

Patch updated:
Side-step controversy by adding a TLI hook. This should preserve existing behavior for uaddo and usubo for all targets, except x86 which now allows usubo formation. It seems likely that other targets like AArch64, AMDGPU, and PPC should allow usubo too, but they can do that selectively as follow-ups.
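Here is a rough model of how the hook is layered. It uses hypothetical stand-in types (`OverflowOpcode`, `SimpleVT`, `TargetHooks`, `X86LikeHooks`) instead of LLVM's `ISD` opcodes, `EVT`, and `TargetLowering`. The method bodies follow two functions from the diff:

- the default `shouldFormOverflowOp` in TargetLowering.h (uaddo only);
- the X86 override (any non-vector type that is simple or not expanded).

The `isOperationExpand` stub is an assumption. It stands in for a target's legality tables.

```cpp
#include <cassert>

// Hypothetical stand-ins, NOT the real LLVM types.
enum class OverflowOpcode { UADDO, USUBO };

struct SimpleVT {
  bool IsVector;
  bool IsSimple;
};

// Base class: the default keeps the pre-existing behavior (only uaddo).
class TargetHooks {
public:
  virtual ~TargetHooks() = default;

  virtual bool shouldFormOverflowOp(OverflowOpcode Opc, SimpleVT VT) const {
    if (Opc != OverflowOpcode::UADDO)
      return false;
    if (VT.IsVector)
      return false;
    return VT.IsSimple || !isOperationExpand(Opc, VT);
  }

protected:
  // Assumed legality query; a real target would consult its operation
  // action tables. Here every op on a non-simple type counts as expanded.
  virtual bool isOperationExpand(OverflowOpcode, SimpleVT) const {
    return true;
  }
};

// x86-style override: allow both uaddo and usubo for scalar types.
class X86LikeHooks : public TargetHooks {
public:
  bool shouldFormOverflowOp(OverflowOpcode Opc, SimpleVT VT) const override {
    if (VT.IsVector)
      return false;
    return VT.IsSimple || !isOperationExpand(Opc, VT);
  }
};
```

With this layering, another target could opt in to usubo later by overriding the single hook, without touching CodeGenPrepare.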

Herald added a project: Restricted Project. Feb 11 2019, 10:28 AM

Herald added a subscriber: hiraditya.

Thanks. We will try and figure out some of the changes, see if we can get this enabled.

I'll otherwise leave this review to people more familiar with x86 :)

Herald added a subscriber: jdoerfert. Feb 12 2019, 4:15 AM

Independent of this patch, but just so everyone's aware - there's currently no consistency in the way we transform to the overflow intrinsics. We may transform to sadd.with.overflow as an IR canonicalization (no target checks):
https://godbolt.org/z/2ajU23

I don't think there's any place in the IR optimizer or backend that creates ssub.with.overflow from raw ops at the moment.

Ping.

LGTM - cheers

This revision is now accepted and ready to land. Feb 18 2019, 9:25 AM

Closed by commit rL354298: [CGP] form usub with overflow from sub+icmp (authored by spatel). Feb 18 2019, 3:35 PM

This revision was automatically updated to reflect the committed changes.

spatel mentioned this in rL354746: [CGP] add special-cases to form unsigned add with overflow (PR40486). Feb 24 2019, 7:30 AM

spatel mentioned this in rGcb04ba032f57: [CGP] add special-cases to form unsigned add with overflow (PR40486).

spatel mentioned this in D58872: [InstCombine] Start canonicalizing to uadd.sat and usub.sat. Mar 3 2019, 7:28 AM

This patch causes a 5% regression in one of our Eigen benchmarks on Haswell.

The problem is that it combines a CMP in a hot block with a SUB in a cold block into a single SUB in the hot block. On a two-address architecture like x86, if the CMP operand has other uses, an extra COPY is needed before the original CMP. The hot block therefore ends up with one more instruction.

Another patch, r355823, papered over the problem in our code, but it didn't fix the root cause.

The regression is only observed on Haswell; it doesn't impact Skylake.

In D57789#1429714, @Carrot wrote:

This patch causes 5% regression of one of our eigen benchmarks on Haswell.

Thanks for letting me know. We could limit this transform based on profile metadata (or is there some other heuristic to determine that one block is hot and the other is cold?). In the absence of that information, I think this is the right theoretical optimization at this layer, as shown by the improvements in the tests with this patch. If you have a test that shows the problem, I can take a look.

Bug https://bugs.llvm.org/show_bug.cgi?id=41129 has been filed for the regression.
Thanks a lot for the investigation.
Please let me know if more information is required.

spatel mentioned this in D59602: [CodeGenPrepare] limit formation of overflow intrinsics (PR41129). Mar 20 2019, 10:07 AM

spatel mentioned this in rL356665: [CodeGenPrepare] limit formation of overflow intrinsics (PR41129). Mar 21 2019, 6:58 AM

spatel mentioned this in rGd47eac59efb1: [CodeGenPrepare] limit formation of overflow intrinsics (PR41129).

Revision Contents

- llvm/trunk/include/llvm/CodeGen/TargetLowering.h: 17 lines
- llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp: 91 lines
- llvm/trunk/lib/Target/X86/X86ISelLowering.h: 5 lines
- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp: 7 lines
- llvm/trunk/test/CodeGen/X86/cgp-usubo.ll: 45 lines
- llvm/trunk/test/CodeGen/X86/lsr-loop-exit-cond.ll: 42 lines
- llvm/trunk/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll: 81 lines

Diff 187278

llvm/trunk/include/llvm/CodeGen/TargetLowering.h

@@ ... @@ public:
   }

   /// Try to convert an extract element of a vector binary operation into an
   /// extract element followed by a scalar operation.
   virtual bool shouldScalarizeBinop(SDValue VecOp) const {
     return false;
   }

+  /// Try to convert math with an overflow comparison into the corresponding DAG
+  /// node operation. Targets may want to override this independently of whether
+  /// the operation is legal/custom for the given type because it may obscure
+  /// matching of other patterns.
+  virtual bool shouldFormOverflowOp(unsigned Opcode, EVT VT) const {
+    // TODO: The default logic is inherited from code in CodeGenPrepare.
+    // The opcode should not make a difference by default?
+    if (Opcode != ISD::UADDO)
+      return false;
+
+    // Allow the transform as long as we have an integer type that is not
+    // obviously illegal and unsupported.
+    if (VT.isVector())
+      return false;
+    return VT.isSimple() || !isOperationExpand(Opcode, VT);
+  }
+
   // Return true if it is profitable to use a scalar input to a BUILD_VECTOR
   // even if the vector itself has multiple uses.
   virtual bool aggressivelyPreferBuildVectorSources(EVT VecVT) const {
     return false;
   }

   // Return true if CodeGenPrepare should consider splitting large offset of a
   // GEP to make the GEP fit into the addressing mode and can be sunk into the
...

llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp


@@ ... @@ if (SrcVT != DstVT)
     return false;

   return SinkCast(CI);
 }

 static void replaceMathCmpWithIntrinsic(BinaryOperator *BO, CmpInst *Cmp,
                                         Instruction *InsertPt,
                                         Intrinsic::ID IID) {
+  Value *Arg0 = BO->getOperand(0);
+  Value *Arg1 = BO->getOperand(1);
+
+  // We allow matching the canonical IR (add X, C) back to (usubo X, -C).
+  if (BO->getOpcode() == Instruction::Add &&
+      IID == Intrinsic::usub_with_overflow) {
+    assert(isa<Constant>(Arg1) && "Unexpected input for usubo");
+    Arg1 = ConstantExpr::getNeg(cast<Constant>(Arg1));
+  }
+
   IRBuilder<> Builder(InsertPt);
-  Value *MathOV = Builder.CreateBinaryIntrinsic(IID, BO->getOperand(0),
-                                                BO->getOperand(1));
+  Value *MathOV = Builder.CreateBinaryIntrinsic(IID, Arg0, Arg1);
   Value *Math = Builder.CreateExtractValue(MathOV, 0, "math");
   Value *OV = Builder.CreateExtractValue(MathOV, 1, "ov");
   BO->replaceAllUsesWith(Math);
   Cmp->replaceAllUsesWith(OV);
   BO->eraseFromParent();
   Cmp->eraseFromParent();
 }

 /// Try to combine the compare into a call to the llvm.uadd.with.overflow
 /// intrinsic. Return true if any changes were made.
 static bool combineToUAddWithOverflow(CmpInst *Cmp, const TargetLowering &TLI,
                                       const DataLayout &DL) {
   Value *A, *B;
   BinaryOperator *Add;
   if (!match(Cmp, m_UAddWithOverflow(m_Value(A), m_Value(B), m_BinOp(Add))))
     return false;

-  // Allow the transform as long as we have an integer type that is not
-  // obviously illegal and unsupported.
-  Type *Ty = Add->getType();
-  if (!isa<IntegerType>(Ty))
-    return false;
-  EVT CodegenVT = TLI.getValueType(DL, Ty);
-  if (!CodegenVT.isSimple() && TLI.isOperationExpand(ISD::UADDO, CodegenVT))
+  if (!TLI.shouldFormOverflowOp(ISD::UADDO,
+                                TLI.getValueType(DL, Add->getType())))
     return false;

   // We don't want to move around uses of condition values this late, so we
   // check if it is legal to create the call to the intrinsic in the basic
   // block containing the icmp.
   if (Add->getParent() != Cmp->getParent() && !Add->hasOneUse())
     return false;

 #ifndef NDEBUG
   // Someday m_UAddWithOverflow may get smarter, but this is a safe assumption
   // for now:
   if (Add->hasOneUse())
     assert(*Add->user_begin() == Cmp && "expected!");
 #endif

   Instruction *InPt = Add->hasOneUse() ? cast<Instruction>(Cmp)
                                        : cast<Instruction>(Add);
   replaceMathCmpWithIntrinsic(Add, Cmp, InPt, Intrinsic::uadd_with_overflow);
   return true;
 }

+static bool combineToUSubWithOverflow(CmpInst *Cmp, const TargetLowering &TLI,
+                                      const DataLayout &DL, bool &ModifiedDT) {
+  // Convert (A u> B) to (A u< B) to simplify pattern matching.
+  Value *A = Cmp->getOperand(0), *B = Cmp->getOperand(1);
+  ICmpInst::Predicate Pred = Cmp->getPredicate();
+  if (Pred == ICmpInst::ICMP_UGT) {
+    std::swap(A, B);
+    Pred = ICmpInst::ICMP_ULT;
+  }
+  // Convert special-case: (A == 0) is the same as (A u< 1).
+  if (Pred == ICmpInst::ICMP_EQ && match(B, m_ZeroInt())) {
+    B = ConstantInt::get(B->getType(), 1);
+    Pred = ICmpInst::ICMP_ULT;
+  }
+  if (Pred != ICmpInst::ICMP_ULT)
+    return false;
+
+  // Walk the users of a variable operand of a compare looking for a subtract or
+  // add with that same operand. Also match the 2nd operand of the compare to
+  // the add/sub, but that may be a negated constant operand of an add.
+  Value *CmpVariableOperand = isa<Constant>(A) ? B : A;
+  BinaryOperator *Sub = nullptr;
+  for (User *U : CmpVariableOperand->users()) {
+    // A - B, A u< B --> usubo(A, B)
+    if (match(U, m_Sub(m_Specific(A), m_Specific(B)))) {
+      Sub = cast<BinaryOperator>(U);
+      break;
+    }
+
+    // A + (-C), A u< C (canonicalized form of (sub A, C))
+    const APInt *CmpC, *AddC;
+    if (match(U, m_Add(m_Specific(A), m_APInt(AddC))) &&
+        match(B, m_APInt(CmpC)) && *AddC == -(*CmpC)) {
+      Sub = cast<BinaryOperator>(U);
+      break;
+    }
+  }
+  if (!Sub)
+    return false;
+
+  if (!TLI.shouldFormOverflowOp(ISD::USUBO,
+                                TLI.getValueType(DL, Sub->getType())))
+    return false;
+
+  // Pattern matched and profitability checked. Check dominance to determine the
+  // insertion point for an intrinsic that replaces the subtract and compare.
+  DominatorTree DT(*Sub->getFunction());
+  bool SubDominates = DT.dominates(Sub, Cmp);
+  if (!SubDominates && !DT.dominates(Cmp, Sub))
+    return false;
+  Instruction *InPt = SubDominates ? cast<Instruction>(Sub)
+                                   : cast<Instruction>(Cmp);
+  replaceMathCmpWithIntrinsic(Sub, Cmp, InPt, Intrinsic::usub_with_overflow);
+  // Reset callers - do not crash by iterating over a dead instruction.
+  ModifiedDT = true;
+  return true;
+}
+
 /// Sink the given CmpInst into user blocks to reduce the number of virtual
 /// registers that must be created and coalesced. This is a clear win except on
 /// targets with multiple condition code registers (PowerPC), where it might
 /// lose; some adjustment may be wanted there.
 ///
 /// Return true if any changes are made.
 static bool sinkCmpExpression(CmpInst *Cmp, const TargetLowering &TLI) {
   if (TLI.hasMultipleConditionRegisters())
@@ ... @@ static bool sinkCmpExpression(CmpInst *Cmp, const TargetLowering &TLI) {
   if (Cmp->use_empty()) {
     Cmp->eraseFromParent();
     MadeChange = true;
   }

   return MadeChange;
 }

-static bool optimizeCmpExpression(CmpInst *Cmp, const TargetLowering &TLI,
-                                  const DataLayout &DL) {
+static bool optimizeCmp(CmpInst *Cmp, const TargetLowering &TLI,
+                        const DataLayout &DL, bool &ModifiedDT) {
   if (sinkCmpExpression(Cmp, TLI))
     return true;

   if (combineToUAddWithOverflow(Cmp, TLI, DL))
     return true;

+  if (combineToUSubWithOverflow(Cmp, TLI, DL, ModifiedDT))
+    return true;
+
   return false;
 }

 /// Duplicate and sink the given 'and' instruction into user blocks where it is
 /// used in a compare to allow isel to generate better code for targets where
 /// this operation can be combined.
 ///
 /// Return true if any changes are made.
@@ ... @@ if (isa<ZExtInst>(I) || isa<SExtInst>(I)) {
       } else {
         bool MadeChange = optimizeExt(I);
         return MadeChange | optimizeExtUses(I);
       }
     }
     return false;
   }

-  if (CmpInst *CI = dyn_cast<CmpInst>(I))
-    if (TLI && optimizeCmpExpression(CI, *TLI, *DL))
+  if (auto *Cmp = dyn_cast<CmpInst>(I))
+    if (TLI && optimizeCmp(Cmp, *TLI, *DL, ModifiedDT))
       return true;

   if (LoadInst *LI = dyn_cast<LoadInst>(I)) {
     LI->setMetadata(LLVMContext::MD_invariant_group, nullptr);
     if (TLI) {
       bool Modified = optimizeLoadExt(LI);
       unsigned AS = LI->getPointerAddressSpace();
       Modified |= optimizeMemoryInst(I, I->getOperand(0), LI->getType(), AS);
...
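The compare canonicalization in `combineToUSubWithOverflow` can be sanity-checked exhaustively on 8-bit values. This is a hypothetical standalone model, not LLVM code: it evaluates plain integers rather than matching IR. It does exercise the same rewrites:

- (A u> B) becomes (B u< A);
- (A == 0) becomes (A u< 1);
- (add A, -C) wraps to the same value as (sub A, C).

The last rewrite is why `replaceMathCmpWithIntrinsic` can negate the add's constant and still form usubo.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical model, not LLVM code: the predicates this sketch handles.
enum class Pred { ULT, UGT, EQ };

// A compare whose operands have already been evaluated to i8 values.
struct Compare {
  Pred P;
  uint8_t A;
  uint8_t B;
};

// Rewrites the compare into (A u< B) form, mirroring the two predicate
// rewrites in combineToUSubWithOverflow. Returns false if neither applies.
bool canonicalizeToULT(Compare &C) {
  if (C.P == Pred::UGT) { // (A u> B) --> (B u< A)
    uint8_t Tmp = C.A;
    C.A = C.B;
    C.B = Tmp;
    C.P = Pred::ULT;
  }
  if (C.P == Pred::EQ && C.B == 0) { // (A == 0) --> (A u< 1)
    C.B = 1;
    C.P = Pred::ULT;
  }
  return C.P == Pred::ULT;
}

// Evaluates a compare with its original predicate.
bool evaluate(Pred P, uint8_t A, uint8_t B) {
  switch (P) {
  case Pred::ULT:
    return A < B;
  case Pred::UGT:
    return A > B;
  case Pred::EQ:
    return A == B;
  }
  return false;
}

// Checks every pair of i8 operands.
bool checkAllI8() {
  for (unsigned X = 0; X < 256; ++X) {
    for (unsigned Y = 0; Y < 256; ++Y) {
      const uint8_t A = static_cast<uint8_t>(X);
      const uint8_t B = static_cast<uint8_t>(Y);
      for (Pred P : {Pred::ULT, Pred::UGT, Pred::EQ}) {
        if (P == Pred::EQ && B != 0)
          continue; // only the (A == 0) special case is rewritten
        Compare C{P, A, B};
        if (!canonicalizeToULT(C) || evaluate(P, A, B) != (C.A < C.B))
          return false;
      }
      // (add A, -B) wraps to the same i8 value as (sub A, B).
      const uint8_t NegB = static_cast<uint8_t>(-B);
      if (static_cast<uint8_t>(A + NegB) != static_cast<uint8_t>(A - B))
        return false;
    }
  }
  return true;
}
```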

llvm/trunk/lib/Target/X86/X86ISelLowering.h

@@ ... @@ public:
     bool isExtractSubvectorCheap(EVT ResVT, EVT SrcVT,
                                  unsigned Index) const override;

     /// Scalar ops always have equal or better analysis/performance/power than
     /// the vector equivalent, so this always makes sense if the scalar op is
     /// supported.
     bool shouldScalarizeBinop(SDValue) const override;

+    /// Overflow nodes should get combined/lowered to optimal instructions
+    /// (they should allow eliminating explicit compares by getting flags from
+    /// math ops).
+    bool shouldFormOverflowOp(unsigned Opcode, EVT VT) const override;
+
     bool storeOfVectorConstantIsCheap(EVT MemVT, unsigned NumElem,
                                       unsigned AddrSpace) const override {
       // If we can replace more than 2 scalar stores, there will be a reduction
       // in instructions even after we add a vector constant load.
       return NumElem > 2;
     }

     bool isLoadBitCastBeneficial(EVT LoadVT, EVT BitcastVT) const override;
...

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp


@@ ... @@ if (!isOperationLegalOrCustomOrPromote(VecOp.getOpcode(), VecVT))
     return true;

   // If the vector op is supported, but the scalar op is not, the transform may
   // not be worthwhile.
   EVT ScalarVT = VecVT.getScalarType();
   return isOperationLegalOrCustomOrPromote(VecOp.getOpcode(), ScalarVT);
 }

+bool X86TargetLowering::shouldFormOverflowOp(unsigned Opcode, EVT VT) const {
+  // TODO: Allow vectors?
+  if (VT.isVector())
+    return false;
+  return VT.isSimple() || !isOperationExpand(Opcode, VT);
+}
+
 bool X86TargetLowering::isCheapToSpeculateCttz() const {
   // Speculate cttz only if we can directly use TZCNT.
   return Subtarget.hasBMI();
 }

 bool X86TargetLowering::isCheapToSpeculateCtlz() const {
   // Speculate ctlz only if we can directly use LZCNT.
   return Subtarget.hasLZCNT();
...

llvm/trunk/test/CodeGen/X86/cgp-usubo.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=x86_64-- | FileCheck %s

 ; CodeGenPrepare is expected to form overflow intrinsics to improve DAG/isel.

 define i1 @usubo_ult_i64(i64 %x, i64 %y, i64* %p) nounwind {
 ; CHECK-LABEL: usubo_ult_i64:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: subq %rsi, %rdi
-; CHECK-NEXT: movq %rdi, (%rdx)
 ; CHECK-NEXT: setb %al
+; CHECK-NEXT: movq %rdi, (%rdx)
 ; CHECK-NEXT: retq
   %s = sub i64 %x, %y
   store i64 %s, i64* %p
   %ov = icmp ult i64 %x, %y
   ret i1 %ov
 }

 ; Verify insertion point for single-BB. Toggle predicate.

 define i1 @usubo_ugt_i32(i32 %x, i32 %y, i32* %p) nounwind {
 ; CHECK-LABEL: usubo_ugt_i32:
 ; CHECK: # %bb.0:
-; CHECK-NEXT: cmpl %edi, %esi
-; CHECK-NEXT: seta %al
 ; CHECK-NEXT: subl %esi, %edi
+; CHECK-NEXT: setb %al
 ; CHECK-NEXT: movl %edi, (%rdx)
 ; CHECK-NEXT: retq
   %ov = icmp ugt i32 %y, %x
   %s = sub i32 %x, %y
   store i32 %s, i32* %p
   ret i1 %ov
 }

 ; Constant operand should match.

 define i1 @usubo_ugt_constant_op0_i8(i8 %x, i8* %p) nounwind {
 ; CHECK-LABEL: usubo_ugt_constant_op0_i8:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: movb $42, %cl
 ; CHECK-NEXT: subb %dil, %cl
-; CHECK-NEXT: cmpb $42, %dil
-; CHECK-NEXT: seta %al
+; CHECK-NEXT: setb %al
 ; CHECK-NEXT: movb %cl, (%rsi)
 ; CHECK-NEXT: retq
   %s = sub i8 42, %x
   %ov = icmp ugt i8 %x, 42
   store i8 %s, i8* %p
   ret i1 %ov
 }

 ; Compare with constant operand 0 is canonicalized by commuting, but verify match for non-canonical form.

 define i1 @usubo_ult_constant_op0_i16(i16 %x, i16* %p) nounwind {
 ; CHECK-LABEL: usubo_ult_constant_op0_i16:
 ; CHECK: # %bb.0:
-; CHECK-NEXT: movl $43, %ecx
-; CHECK-NEXT: subl %edi, %ecx
-; CHECK-NEXT: cmpw $43, %di
-; CHECK-NEXT: seta %al
+; CHECK-NEXT: movw $43, %cx
+; CHECK-NEXT: subw %di, %cx
+; CHECK-NEXT: setb %al
 ; CHECK-NEXT: movw %cx, (%rsi)
 ; CHECK-NEXT: retq
   %s = sub i16 43, %x
   %ov = icmp ult i16 43, %x
   store i16 %s, i16* %p
   ret i1 %ov
 }

 ; Subtract with constant operand 1 is canonicalized to add.

 define i1 @usubo_ult_constant_op1_i16(i16 %x, i16* %p) nounwind {
 ; CHECK-LABEL: usubo_ult_constant_op1_i16:
 ; CHECK: # %bb.0:
-; CHECK-NEXT: movl %edi, %ecx
-; CHECK-NEXT: addl $-44, %ecx
-; CHECK-NEXT: cmpw $44, %di
+; CHECK-NEXT: subw $44, %di
 ; CHECK-NEXT: setb %al
-; CHECK-NEXT: movw %cx, (%rsi)
+; CHECK-NEXT: movw %di, (%rsi)
 ; CHECK-NEXT: retq
   %s = add i16 %x, -44
   %ov = icmp ult i16 %x, 44
   store i16 %s, i16* %p
   ret i1 %ov
 }

 define i1 @usubo_ugt_constant_op1_i8(i8 %x, i8* %p) nounwind {
 ; CHECK-LABEL: usubo_ugt_constant_op1_i8:
 ; CHECK: # %bb.0:
-; CHECK-NEXT: cmpb $45, %dil
+; CHECK-NEXT: subb $45, %dil
 ; CHECK-NEXT: setb %al
-; CHECK-NEXT: addb $-45, %dil
 ; CHECK-NEXT: movb %dil, (%rsi)
 ; CHECK-NEXT: retq
   %ov = icmp ugt i8 45, %x
   %s = add i8 %x, -45
   store i8 %s, i8* %p
   ret i1 %ov
 }

 ; Special-case: subtract 1 changes the compare predicate and constant.

 define i1 @usubo_eq_constant1_op1_i32(i32 %x, i32* %p) nounwind {
 ; CHECK-LABEL: usubo_eq_constant1_op1_i32:
 ; CHECK: # %bb.0:
-; CHECK-NEXT: # kill: def $edi killed $edi def $rdi
-; CHECK-NEXT: leal -1(%rdi), %ecx
-; CHECK-NEXT: testl %edi, %edi
-; CHECK-NEXT: sete %al
-; CHECK-NEXT: movl %ecx, (%rsi)
+; CHECK-NEXT: subl $1, %edi
+; CHECK-NEXT: setb %al
+; CHECK-NEXT: movl %edi, (%rsi)
 ; CHECK-NEXT: retq
   %s = add i32 %x, -1
   %ov = icmp eq i32 %x, 0
   store i32 %s, i32* %p
   ret i1 %ov
 }

 ; Verify insertion point for multi-BB.

 declare void @call(i1)

 define i1 @usubo_ult_sub_dominates_i64(i64 %x, i64 %y, i64* %p, i1 %cond) nounwind {
 ; CHECK-LABEL: usubo_ult_sub_dominates_i64:
 ; CHECK: # %bb.0: # %entry
 ; CHECK-NEXT: testb $1, %cl
 ; CHECK-NEXT: je .LBB7_2
 ; CHECK-NEXT: # %bb.1: # %t
-; CHECK-NEXT: movq %rdi, %rax
-; CHECK-NEXT: subq %rsi, %rax
-; CHECK-NEXT: movq %rax, (%rdx)
-; CHECK-NEXT: testb $1, %cl
-; CHECK-NEXT: je .LBB7_2
-; CHECK-NEXT: # %bb.3: # %end
-; CHECK-NEXT: cmpq %rsi, %rdi
+; CHECK-NEXT: subq %rsi, %rdi
 ; CHECK-NEXT: setb %al
-; CHECK-NEXT: retq
+; CHECK-NEXT: movq %rdi, (%rdx)
+; CHECK-NEXT: testb $1, %cl
+; CHECK-NEXT: jne .LBB7_3
 ; CHECK-NEXT: .LBB7_2: # %f
 ; CHECK-NEXT: movl %ecx, %eax
+; CHECK-NEXT: .LBB7_3: # %end
 ; CHECK-NEXT: retq
 entry:
   br i1 %cond, label %t, label %f

 t:
   %s = sub i64 %x, %y
   store i64 %s, i64* %p
   br i1 %cond, label %end, label %f
...
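For orientation, here are C++ functions with roughly the same shape as three of the IR tests in cgp-usubo.ll: `@usubo_ult_i64`, `@usubo_ult_constant_op1_i16`, and `@usubo_eq_constant1_op1_i32`. They are an assumption about equivalent source, not part of the test file, and no claim is made that a given front end emits exactly this IR. Per the updated CHECK lines, x86-64 codegen for each IR version is now a single `sub` whose borrow feeds `setb`. Before the patch, the constant-operand variants also needed a separate compare or test.

```cpp
#include <cassert>
#include <cstdint>

// Roughly @usubo_ult_i64: store the difference, return whether it borrowed.
bool usuboUltI64(uint64_t X, uint64_t Y, uint64_t *P) {
  *P = X - Y;    // %s  = sub i64 %x, %y ; store i64 %s, i64* %p
  return X < Y;  // %ov = icmp ult i64 %x, %y
}

// Roughly @usubo_ult_constant_op1_i16: in IR, "x - 44" is canonicalized to
// "add i16 %x, -44", which the patch still matches as usubo(x, 44).
bool usuboUltConstantOp1I16(uint16_t X, uint16_t *P) {
  *P = static_cast<uint16_t>(X - 44);
  return X < 44;
}

// Roughly @usubo_eq_constant1_op1_i32: (x == 0) is the borrow of (x - 1).
bool usuboEqConstant1Op1I32(uint32_t X, uint32_t *P) {
  *P = X - 1;
  return X == 0;
}
```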

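In the lsr-loop-exit-cond.ll diff, the loop counter's `decq`/`testq`/`jne` sequence becomes `subq $1` plus `jae`. Presumably the loop's (N - 1, N == 0) pair is now formed into usubo(N, 1) via the (A == 0) to (A u< 1) special case. One subtract then produces both the new counter and the borrow that ends the loop. The C++ below is a hypothetical model of the two loop shapes, not the test's source, and only checks that they run the same number of iterations.

```cpp
#include <cassert>
#include <cstdint>

// Old shape (roughly): the zero test and the decrement are separate
// operations, like "testq reg, reg; jne" plus a "decq reg" elsewhere.
unsigned countWithCompare(uint64_t N) {
  unsigned Iters = 0;
  for (;;) {
    ++Iters; // loop body
    if (N == 0)
      break;
    N = N - 1;
  }
  return Iters;
}

// New shape (roughly): usubo(N, 1) yields both the decremented counter and
// the borrow, which x86 can test with "subq $1, reg; jae".
unsigned countWithBorrow(uint64_t N) {
  unsigned Iters = 0;
  for (;;) {
    ++Iters;             // loop body
    bool Borrow = N < 1; // overflow bit of usubo(N, 1), i.e. N == 0
    N = N - 1;           // math result of usubo(N, 1)
    if (Borrow)
      break;
  }
  return Iters;
}
```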
llvm/trunk/test/CodeGen/X86/lsr-loop-exit-cond.ll

@@ ... @@
 ; GENERIC: ## %bb.0: ## %entry
 ; GENERIC-NEXT: pushq %rbp
 ; GENERIC-NEXT: pushq %r14
 ; GENERIC-NEXT: pushq %rbx
 ; GENERIC-NEXT: ## kill: def $ecx killed $ecx def $rcx
 ; GENERIC-NEXT: movl (%rdx), %eax
 ; GENERIC-NEXT: movl 4(%rdx), %ebx
 ; GENERIC-NEXT: decl %ecx
-; GENERIC-NEXT: leaq 20(%rdx), %r14
+; GENERIC-NEXT: leaq 20(%rdx), %r11
 ; GENERIC-NEXT: movq _Te0@{{.*}}(%rip), %r9
 ; GENERIC-NEXT: movq _Te1@{{.*}}(%rip), %r8
 ; GENERIC-NEXT: movq _Te3@{{.*}}(%rip), %r10
-; GENERIC-NEXT: movq %rcx, %r11
+; GENERIC-NEXT: movq %rcx, %r14
 ; GENERIC-NEXT: jmp LBB0_1
 ; GENERIC-NEXT: .p2align 4, 0x90
 ; GENERIC-NEXT: LBB0_2: ## %bb1
 ; GENERIC-NEXT: ## in Loop: Header=BB0_1 Depth=1
 ; GENERIC-NEXT: movl %edi, %ebx
 ; GENERIC-NEXT: shrl $16, %ebx
 ; GENERIC-NEXT: movzbl %bl, %ebx
 ; GENERIC-NEXT: xorl (%r8,%rbx,4), %eax
-; GENERIC-NEXT: xorl -4(%r14), %eax
+; GENERIC-NEXT: xorl -4(%r11), %eax
 ; GENERIC-NEXT: shrl $24, %edi
 ; GENERIC-NEXT: movzbl %bpl, %ebx
 ; GENERIC-NEXT: movl (%r10,%rbx,4), %ebx
 ; GENERIC-NEXT: xorl (%r9,%rdi,4), %ebx
-; GENERIC-NEXT: xorl (%r14), %ebx
-; GENERIC-NEXT: decq %r11
-; GENERIC-NEXT: addq $16, %r14
+; GENERIC-NEXT: xorl (%r11), %ebx
+; GENERIC-NEXT: addq $16, %r11
 ; GENERIC-NEXT: LBB0_1: ## %bb
 ; GENERIC-NEXT: ## =>This Inner Loop Header: Depth=1
 ; GENERIC-NEXT: movzbl %al, %edi
 ; GENERIC-NEXT: ## kill: def $eax killed $eax def $rax
 ; GENERIC-NEXT: shrl $24, %eax
 ; GENERIC-NEXT: movl %ebx, %ebp
 ; GENERIC-NEXT: shrl $16, %ebp
 ; GENERIC-NEXT: movzbl %bpl, %ebp
 ; GENERIC-NEXT: movl (%r8,%rbp,4), %ebp
 ; GENERIC-NEXT: xorl (%r9,%rax,4), %ebp
-; GENERIC-NEXT: xorl -12(%r14), %ebp
+; GENERIC-NEXT: xorl -12(%r11), %ebp
 ; GENERIC-NEXT: shrl $24, %ebx
 ; GENERIC-NEXT: movl (%r10,%rdi,4), %edi
 ; GENERIC-NEXT: xorl (%r9,%rbx,4), %edi
-; GENERIC-NEXT: xorl -8(%r14), %edi
+; GENERIC-NEXT: xorl -8(%r11), %edi
 ; GENERIC-NEXT: movl %ebp, %eax
 ; GENERIC-NEXT: shrl $24, %eax
 ; GENERIC-NEXT: movl (%r9,%rax,4), %eax
-; GENERIC-NEXT: testq %r11, %r11
-; GENERIC-NEXT: jne LBB0_2
+; GENERIC-NEXT: subq $1, %r14
+; GENERIC-NEXT: jae LBB0_2
 ; GENERIC-NEXT: ## %bb.3: ## %bb2
 ; GENERIC-NEXT: shlq $4, %rcx
 ; GENERIC-NEXT: andl $-16777216, %eax ## imm = 0xFF000000
 ; GENERIC-NEXT: movl %edi, %ebx
 ; GENERIC-NEXT: shrl $16, %ebx
 ; GENERIC-NEXT: movzbl %bl, %ebx
 ; GENERIC-NEXT: movzbl 2(%r8,%rbx,4), %ebx
	; GENERIC-NEXT: shll $16, %ebx			; GENERIC-NEXT: shll $16, %ebx
	(26 lines not shown)
	; ATOM: ## %bb.0: ## %entry			; ATOM: ## %bb.0: ## %entry
	; ATOM-NEXT: pushq %rbp			; ATOM-NEXT: pushq %rbp
	; ATOM-NEXT: pushq %r15			; ATOM-NEXT: pushq %r15
	; ATOM-NEXT: pushq %r14			; ATOM-NEXT: pushq %r14
	; ATOM-NEXT: pushq %rbx			; ATOM-NEXT: pushq %rbx
	; ATOM-NEXT: ## kill: def $ecx killed $ecx def $rcx			; ATOM-NEXT: ## kill: def $ecx killed $ecx def $rcx
	; ATOM-NEXT: movl (%rdx), %r15d			; ATOM-NEXT: movl (%rdx), %r15d
	; ATOM-NEXT: movl 4(%rdx), %eax			; ATOM-NEXT: movl 4(%rdx), %eax
	; ATOM-NEXT: leaq 20(%rdx), %r14			; ATOM-NEXT: leaq 20(%rdx), %r11
	; ATOM-NEXT: movq _Te0@{{.*}}(%rip), %r9			; ATOM-NEXT: movq _Te0@{{.*}}(%rip), %r9
	; ATOM-NEXT: movq _Te1@{{.*}}(%rip), %r8			; ATOM-NEXT: movq _Te1@{{.*}}(%rip), %r8
	; ATOM-NEXT: movq _Te3@{{.*}}(%rip), %r10			; ATOM-NEXT: movq _Te3@{{.*}}(%rip), %r10
	; ATOM-NEXT: decl %ecx			; ATOM-NEXT: decl %ecx
	; ATOM-NEXT: movq %rcx, %r11			; ATOM-NEXT: movq %rcx, %r14
	; ATOM-NEXT: jmp LBB0_1			; ATOM-NEXT: jmp LBB0_1
	; ATOM-NEXT: .p2align 4, 0x90			; ATOM-NEXT: .p2align 4, 0x90
	; ATOM-NEXT: LBB0_2: ## %bb1			; ATOM-NEXT: LBB0_2: ## %bb1
	; ATOM-NEXT: ## in Loop: Header=BB0_1 Depth=1			; ATOM-NEXT: ## in Loop: Header=BB0_1 Depth=1
	; ATOM-NEXT: shrl $16, %eax			; ATOM-NEXT: shrl $16, %eax
	; ATOM-NEXT: shrl $24, %edi			; ATOM-NEXT: shrl $24, %edi
	; ATOM-NEXT: decq %r11			; ATOM-NEXT: movzbl %al, %eax
	; ATOM-NEXT: movzbl %al, %ebp			; ATOM-NEXT: xorl (%r8,%rax,4), %r15d
	; ATOM-NEXT: movzbl %bl, %eax			; ATOM-NEXT: movzbl %bl, %eax
	; ATOM-NEXT: movl (%r10,%rax,4), %eax			; ATOM-NEXT: movl (%r10,%rax,4), %eax
	; ATOM-NEXT: xorl (%r8,%rbp,4), %r15d			; ATOM-NEXT: xorl -4(%r11), %r15d
	; ATOM-NEXT: xorl (%r9,%rdi,4), %eax			; ATOM-NEXT: xorl (%r9,%rdi,4), %eax
	; ATOM-NEXT: xorl -4(%r14), %r15d			; ATOM-NEXT: xorl (%r11), %eax
	; ATOM-NEXT: xorl (%r14), %eax			; ATOM-NEXT: addq $16, %r11
	; ATOM-NEXT: addq $16, %r14
	; ATOM-NEXT: LBB0_1: ## %bb			; ATOM-NEXT: LBB0_1: ## %bb
	; ATOM-NEXT: ## =>This Inner Loop Header: Depth=1			; ATOM-NEXT: ## =>This Inner Loop Header: Depth=1
	; ATOM-NEXT: movl %eax, %edi			; ATOM-NEXT: movl %eax, %edi
	; ATOM-NEXT: movl %r15d, %ebp			; ATOM-NEXT: movl %r15d, %ebp
	; ATOM-NEXT: shrl $24, %eax			; ATOM-NEXT: shrl $24, %eax
	; ATOM-NEXT: shrl $16, %edi			; ATOM-NEXT: shrl $16, %edi
	; ATOM-NEXT: shrl $24, %ebp			; ATOM-NEXT: shrl $24, %ebp
	; ATOM-NEXT: movzbl %dil, %edi			; ATOM-NEXT: movzbl %dil, %edi
	; ATOM-NEXT: movl (%r8,%rdi,4), %ebx			; ATOM-NEXT: movl (%r8,%rdi,4), %ebx
	; ATOM-NEXT: movzbl %r15b, %edi			; ATOM-NEXT: movzbl %r15b, %edi
	; ATOM-NEXT: xorl (%r9,%rbp,4), %ebx			; ATOM-NEXT: xorl (%r9,%rbp,4), %ebx
	; ATOM-NEXT: movl (%r10,%rdi,4), %edi			; ATOM-NEXT: movl (%r10,%rdi,4), %edi
	; ATOM-NEXT: xorl -12(%r14), %ebx			; ATOM-NEXT: xorl -12(%r11), %ebx
	; ATOM-NEXT: xorl (%r9,%rax,4), %edi			; ATOM-NEXT: xorl (%r9,%rax,4), %edi
	; ATOM-NEXT: movl %ebx, %eax			; ATOM-NEXT: movl %ebx, %eax
	; ATOM-NEXT: xorl -8(%r14), %edi			; ATOM-NEXT: xorl -8(%r11), %edi
	; ATOM-NEXT: shrl $24, %eax			; ATOM-NEXT: shrl $24, %eax
	; ATOM-NEXT: movl (%r9,%rax,4), %r15d			; ATOM-NEXT: movl (%r9,%rax,4), %r15d
	; ATOM-NEXT: testq %r11, %r11			; ATOM-NEXT: subq $1, %r14
	; ATOM-NEXT: movl %edi, %eax			; ATOM-NEXT: movl %edi, %eax
	; ATOM-NEXT: jne LBB0_2			; ATOM-NEXT: jae LBB0_2
	; ATOM-NEXT: ## %bb.3: ## %bb2			; ATOM-NEXT: ## %bb.3: ## %bb2
	; ATOM-NEXT: shrl $16, %eax			; ATOM-NEXT: shrl $16, %eax
	; ATOM-NEXT: shrl $8, %edi			; ATOM-NEXT: shrl $8, %edi
	; ATOM-NEXT: movzbl %bl, %ebp			; ATOM-NEXT: movzbl %bl, %ebp
	; ATOM-NEXT: andl $-16777216, %r15d ## imm = 0xFF000000			; ATOM-NEXT: andl $-16777216, %r15d ## imm = 0xFF000000
	; ATOM-NEXT: shlq $4, %rcx			; ATOM-NEXT: shlq $4, %rcx
	; ATOM-NEXT: movzbl %al, %eax			; ATOM-NEXT: movzbl %al, %eax
	; ATOM-NEXT: movzbl 3(%r9,%rdi,4), %edi			; ATOM-NEXT: movzbl 3(%r9,%rdi,4), %edi
	(233 lines not shown)
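In both the GENERIC and ATOM checks above, the loop exit changes. The old code runs a separate `decq` in the latch (LBB0_2) and ends the header with `testq`/`jne`. The new code ends the header with a single `subq $1`, and `jae` branches on its carry flag. The C model below is a hedged illustration, not the test's IR. It counts how many times the loop header runs under each exit sequence. `header_trips_old` and `header_trips_new` are hypothetical names, and the GCC/Clang extension `__builtin_sub_overflow` models the borrow that `jae` tests. For a starting counter `n`, both shapes should run the header `n + 1` times.

```c
#include <assert.h>
#include <stdint.h>

/* Old shape: the header (LBB0_1) ends with testq %r11, %r11 / jne, and the
 * latch (LBB0_2) does decq %r11 before falling back into the header. */
static uint64_t header_trips_old(uint64_t n) {
    uint64_t counter = n, trips = 0;
    for (;;) {
        ++trips;              /* one execution of the loop header */
        if (counter == 0)     /* testq + jne: leave the loop on zero */
            break;
        --counter;            /* decq in the latch */
    }
    return trips;
}

/* New shape: the header ends with subq $1, %r14 / jae, so the loop continues
 * while the subtraction does not borrow (carry flag clear). */
static uint64_t header_trips_new(uint64_t n) {
    uint64_t counter = n, trips = 0;
    for (;;) {
        ++trips;              /* one execution of the loop header */
        if (__builtin_sub_overflow(counter, (uint64_t)1, &counter))
            break;            /* borrow (CF set): jae not taken, exit */
    }
    return trips;
}
```

The two functions return the same trip count for every `n`. Folding the decrement and the zero test into one flag-setting subtract does not change how many times the loop runs.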

llvm/trunk/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll

(169 lines not shown)
;
%a = add i42 %x, 1		%a = add i42 %x, 1
%ov = icmp eq i42 %a, 0		%ov = icmp eq i42 %a, 0
store i42 %a, i42* %p		store i42 %a, i42* %p
ret i1 %ov		ret i1 %ov
}		}

define i1 @usubo_ult_i64(i64 %x, i64 %y, i64* %p) {		define i1 @usubo_ult_i64(i64 %x, i64 %y, i64* %p) {
; CHECK-LABEL: @usubo_ult_i64(		; CHECK-LABEL: @usubo_ult_i64(
; CHECK-NEXT: [[S:%.*]] = sub i64 [[X:%.*]], [[Y:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 [[X:%.*]], i64 [[Y:%.*]])
; CHECK-NEXT: store i64 [[S]], i64* [[P:%.*]]		; CHECK-NEXT: [[MATH:%.*]] = extractvalue { i64, i1 } [[TMP1]], 0
; CHECK-NEXT: [[OV:%.*]] = icmp ult i64 [[X]], [[Y]]		; CHECK-NEXT: [[OV1:%.*]] = extractvalue { i64, i1 } [[TMP1]], 1
; CHECK-NEXT: ret i1 [[OV]]		; CHECK-NEXT: store i64 [[MATH]], i64* [[P:%.*]]
		; CHECK-NEXT: ret i1 [[OV1]]
;		;
%s = sub i64 %x, %y		%s = sub i64 %x, %y
store i64 %s, i64* %p		store i64 %s, i64* %p
%ov = icmp ult i64 %x, %y		%ov = icmp ult i64 %x, %y
ret i1 %ov		ret i1 %ov
}		}

; Verify insertion point for single-BB. Toggle predicate.		; Verify insertion point for single-BB. Toggle predicate.

define i1 @usubo_ugt_i32(i32 %x, i32 %y, i32* %p) {		define i1 @usubo_ugt_i32(i32 %x, i32 %y, i32* %p) {
; CHECK-LABEL: @usubo_ugt_i32(		; CHECK-LABEL: @usubo_ugt_i32(
; CHECK-NEXT: [[OV:%.*]] = icmp ugt i32 [[Y:%.*]], [[X:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 [[X:%.*]], i32 [[Y:%.*]])
; CHECK-NEXT: [[S:%.*]] = sub i32 [[X]], [[Y]]		; CHECK-NEXT: [[MATH:%.*]] = extractvalue { i32, i1 } [[TMP1]], 0
; CHECK-NEXT: store i32 [[S]], i32* [[P:%.*]]		; CHECK-NEXT: [[OV1:%.*]] = extractvalue { i32, i1 } [[TMP1]], 1
; CHECK-NEXT: ret i1 [[OV]]		; CHECK-NEXT: store i32 [[MATH]], i32* [[P:%.*]]
		; CHECK-NEXT: ret i1 [[OV1]]
;		;
%ov = icmp ugt i32 %y, %x		%ov = icmp ugt i32 %y, %x
%s = sub i32 %x, %y		%s = sub i32 %x, %y
store i32 %s, i32* %p		store i32 %s, i32* %p
ret i1 %ov		ret i1 %ov
}		}
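As a rough C-level sanity check of what the two tests above rely on (not code from this patch): the borrow bit of an unsigned subtract is exactly `icmp ult %x, %y`, and the toggled `icmp ugt %y, %x` is the same predicate. The sketch uses the GCC/Clang extension `__builtin_sub_overflow` as a stand-in for `llvm.usub.with.overflow`. It checks every 8-bit pair, on the assumption that the identity does not depend on bit width. `usubo_u8` and `check_usubo_u8` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns the overflow (borrow) bit of x - y and
 * stores the wrapped difference, mirroring extractvalue 1 and 0 of
 * llvm.usub.with.overflow. __builtin_sub_overflow is a GCC/Clang extension. */
static bool usubo_u8(uint8_t x, uint8_t y, uint8_t *math) {
    return __builtin_sub_overflow(x, y, math);
}

/* Compares the intrinsic-style result against the original sub + icmp pair
 * for every 8-bit input; returns the number of pairs checked. */
static unsigned check_usubo_u8(void) {
    unsigned checked = 0;
    for (unsigned x = 0; x <= UINT8_MAX; ++x) {
        for (unsigned y = 0; y <= UINT8_MAX; ++y) {
            uint8_t math;
            bool ov = usubo_u8((uint8_t)x, (uint8_t)y, &math);
            assert(math == (uint8_t)(x - y)); /* %s = sub %x, %y          */
            assert(ov == (x < y));            /* %ov = icmp ult %x, %y     */
            assert(ov == (y > x));            /* toggled: icmp ugt %y, %x  */
            ++checked;
        }
    }
    return checked;
}
```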

; Constant operand should match.		; Constant operand should match.

define i1 @usubo_ugt_constant_op0_i8(i8 %x, i8* %p) {		define i1 @usubo_ugt_constant_op0_i8(i8 %x, i8* %p) {
; CHECK-LABEL: @usubo_ugt_constant_op0_i8(		; CHECK-LABEL: @usubo_ugt_constant_op0_i8(
; CHECK-NEXT: [[S:%.*]] = sub i8 42, [[X:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = call { i8, i1 } @llvm.usub.with.overflow.i8(i8 42, i8 [[X:%.*]])
; CHECK-NEXT: [[OV:%.*]] = icmp ugt i8 [[X]], 42		; CHECK-NEXT: [[MATH:%.*]] = extractvalue { i8, i1 } [[TMP1]], 0
; CHECK-NEXT: store i8 [[S]], i8* [[P:%.*]]		; CHECK-NEXT: [[OV1:%.*]] = extractvalue { i8, i1 } [[TMP1]], 1
; CHECK-NEXT: ret i1 [[OV]]		; CHECK-NEXT: store i8 [[MATH]], i8* [[P:%.*]]
		; CHECK-NEXT: ret i1 [[OV1]]
;		;
%s = sub i8 42, %x		%s = sub i8 42, %x
%ov = icmp ugt i8 %x, 42		%ov = icmp ugt i8 %x, 42
store i8 %s, i8* %p		store i8 %s, i8* %p
ret i1 %ov		ret i1 %ov
}		}

; Compare with constant operand 0 is canonicalized by commuting, but verify match for non-canonical form.		; Compare with constant operand 0 is canonicalized by commuting, but verify match for non-canonical form.

define i1 @usubo_ult_constant_op0_i16(i16 %x, i16* %p) {		define i1 @usubo_ult_constant_op0_i16(i16 %x, i16* %p) {
; CHECK-LABEL: @usubo_ult_constant_op0_i16(		; CHECK-LABEL: @usubo_ult_constant_op0_i16(
; CHECK-NEXT: [[S:%.*]] = sub i16 43, [[X:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = call { i16, i1 } @llvm.usub.with.overflow.i16(i16 43, i16 [[X:%.*]])
; CHECK-NEXT: [[OV:%.*]] = icmp ult i16 43, [[X]]		; CHECK-NEXT: [[MATH:%.*]] = extractvalue { i16, i1 } [[TMP1]], 0
; CHECK-NEXT: store i16 [[S]], i16* [[P:%.*]]		; CHECK-NEXT: [[OV1:%.*]] = extractvalue { i16, i1 } [[TMP1]], 1
; CHECK-NEXT: ret i1 [[OV]]		; CHECK-NEXT: store i16 [[MATH]], i16* [[P:%.*]]
		; CHECK-NEXT: ret i1 [[OV1]]
;		;
%s = sub i16 43, %x		%s = sub i16 43, %x
%ov = icmp ult i16 43, %x		%ov = icmp ult i16 43, %x
store i16 %s, i16* %p		store i16 %s, i16* %p
ret i1 %ov		ret i1 %ov
}		}
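The two constant-operand-0 tests above rest on one fact: `C - x` borrows exactly when `x > C`. This holds whether the compare is written canonically (`icmp ugt %x, 42`) or commuted (`icmp ult 43, %x`). The sketch below follows the same approach, again using `__builtin_sub_overflow` as a stand-in for the intrinsic. The function name is hypothetical. The i8 case covers all 256 inputs and the i16 case covers all 65536.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check for the constant-operand-0 tests; returns the number
 * of inputs checked (256 for i8 plus 65536 for i16). */
static unsigned check_const_op0(void) {
    unsigned checked = 0;
    for (unsigned x = 0; x <= UINT8_MAX; ++x) {
        uint8_t math;
        bool ov = __builtin_sub_overflow((uint8_t)42, (uint8_t)x, &math);
        assert(math == (uint8_t)(42u - x));  /* %s = sub i8 42, %x   */
        assert(ov == (x > 42u));             /* icmp ugt i8 %x, 42   */
        ++checked;
    }
    for (unsigned x = 0; x <= UINT16_MAX; ++x) {
        uint16_t math;
        bool ov = __builtin_sub_overflow((uint16_t)43, (uint16_t)x, &math);
        assert(math == (uint16_t)(43u - x)); /* %s = sub i16 43, %x  */
        assert(ov == (43u < x));             /* icmp ult i16 43, %x  */
        ++checked;
    }
    return checked;
}
```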

; Subtract with constant operand 1 is canonicalized to add.		; Subtract with constant operand 1 is canonicalized to add.

define i1 @usubo_ult_constant_op1_i16(i16 %x, i16* %p) {		define i1 @usubo_ult_constant_op1_i16(i16 %x, i16* %p) {
; CHECK-LABEL: @usubo_ult_constant_op1_i16(		; CHECK-LABEL: @usubo_ult_constant_op1_i16(
; CHECK-NEXT: [[S:%.*]] = add i16 [[X:%.*]], -44		; CHECK-NEXT: [[TMP1:%.*]] = call { i16, i1 } @llvm.usub.with.overflow.i16(i16 [[X:%.*]], i16 44)
; CHECK-NEXT: [[OV:%.*]] = icmp ult i16 [[X]], 44		; CHECK-NEXT: [[MATH:%.*]] = extractvalue { i16, i1 } [[TMP1]], 0
; CHECK-NEXT: store i16 [[S]], i16* [[P:%.*]]		; CHECK-NEXT: [[OV1:%.*]] = extractvalue { i16, i1 } [[TMP1]], 1
; CHECK-NEXT: ret i1 [[OV]]		; CHECK-NEXT: store i16 [[MATH]], i16* [[P:%.*]]
		; CHECK-NEXT: ret i1 [[OV1]]
;		;
%s = add i16 %x, -44		%s = add i16 %x, -44
%ov = icmp ult i16 %x, 44		%ov = icmp ult i16 %x, 44
store i16 %s, i16* %p		store i16 %s, i16* %p
ret i1 %ov		ret i1 %ov
}		}

define i1 @usubo_ugt_constant_op1_i8(i8 %x, i8* %p) {		define i1 @usubo_ugt_constant_op1_i8(i8 %x, i8* %p) {
; CHECK-LABEL: @usubo_ugt_constant_op1_i8(		; CHECK-LABEL: @usubo_ugt_constant_op1_i8(
; CHECK-NEXT: [[OV:%.*]] = icmp ugt i8 45, [[X:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = call { i8, i1 } @llvm.usub.with.overflow.i8(i8 [[X:%.*]], i8 45)
; CHECK-NEXT: [[S:%.*]] = add i8 [[X]], -45		; CHECK-NEXT: [[MATH:%.*]] = extractvalue { i8, i1 } [[TMP1]], 0
; CHECK-NEXT: store i8 [[S]], i8* [[P:%.*]]		; CHECK-NEXT: [[OV1:%.*]] = extractvalue { i8, i1 } [[TMP1]], 1
; CHECK-NEXT: ret i1 [[OV]]		; CHECK-NEXT: store i8 [[MATH]], i8* [[P:%.*]]
		; CHECK-NEXT: ret i1 [[OV1]]
;		;
%ov = icmp ugt i8 45, %x		%ov = icmp ugt i8 45, %x
%s = add i8 %x, -45		%s = add i8 %x, -45
store i8 %s, i8* %p		store i8 %s, i8* %p
ret i1 %ov		ret i1 %ov
}		}
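In the constant-operand-1 tests, the subtract has already been canonicalized to an add of the negated constant, so the matcher must recognize `add %x, -C` as `sub %x, C`. The sketch checks two things. First, the wrapped add equals the intrinsic's math result. Second, the borrow bit matches both compare spellings used above (`icmp ult %x, 44` and `icmp ugt 45, %x`). The assumptions are the same as before: `__builtin_sub_overflow` stands in for the intrinsic, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check for the constant-operand-1 tests; returns the number
 * of inputs checked (65536 for i16 plus 256 for i8). */
static unsigned check_const_op1(void) {
    unsigned checked = 0;
    for (unsigned x = 0; x <= UINT16_MAX; ++x) {
        uint16_t math;
        bool ov = __builtin_sub_overflow((uint16_t)x, (uint16_t)44, &math);
        assert(math == (uint16_t)(x + (uint16_t)-44)); /* add i16 %x, -44     */
        assert(ov == (x < 44u));                       /* icmp ult i16 %x, 44 */
        ++checked;
    }
    for (unsigned x = 0; x <= UINT8_MAX; ++x) {
        uint8_t math;
        bool ov = __builtin_sub_overflow((uint8_t)x, (uint8_t)45, &math);
        assert(math == (uint8_t)(x + (uint8_t)-45));   /* add i8 %x, -45      */
        assert(ov == (45u > x));                       /* icmp ugt i8 45, %x  */
        ++checked;
    }
    return checked;
}
```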

; Special-case: subtract 1 changes the compare predicate and constant.		; Special-case: subtract 1 changes the compare predicate and constant.

define i1 @usubo_eq_constant1_op1_i32(i32 %x, i32* %p) {		define i1 @usubo_eq_constant1_op1_i32(i32 %x, i32* %p) {
; CHECK-LABEL: @usubo_eq_constant1_op1_i32(		; CHECK-LABEL: @usubo_eq_constant1_op1_i32(
; CHECK-NEXT: [[S:%.*]] = add i32 [[X:%.*]], -1		; CHECK-NEXT: [[TMP1:%.*]] = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 [[X:%.*]], i32 1)
; CHECK-NEXT: [[OV:%.*]] = icmp eq i32 [[X]], 0		; CHECK-NEXT: [[MATH:%.*]] = extractvalue { i32, i1 } [[TMP1]], 0
; CHECK-NEXT: store i32 [[S]], i32* [[P:%.*]]		; CHECK-NEXT: [[OV1:%.*]] = extractvalue { i32, i1 } [[TMP1]], 1
; CHECK-NEXT: ret i1 [[OV]]		; CHECK-NEXT: store i32 [[MATH]], i32* [[P:%.*]]
		; CHECK-NEXT: ret i1 [[OV1]]
;		;
%s = add i32 %x, -1		%s = add i32 %x, -1
%ov = icmp eq i32 %x, 0		%ov = icmp eq i32 %x, 0
store i32 %s, i32* %p		store i32 %s, i32* %p
ret i1 %ov		ret i1 %ov
}		}
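The special case above works because subtracting 1 borrows only when the input is 0. So `icmp eq %x, 0` is the overflow bit of `usub.with.overflow(%x, 1)`, even though neither the predicate nor the constant matches the `ult` form. An exhaustive i32 sweep would be slow, so this sketch checks a handful of boundary values instead. The helper and the sample set are hypothetical choices, and `__builtin_sub_overflow` is again a GCC/Clang stand-in for the intrinsic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: overflow bit and wrapped result of x - 1. */
static bool usubo_by_one_u32(uint32_t x, uint32_t *math) {
    return __builtin_sub_overflow(x, (uint32_t)1, math);
}

/* Checks the identity on boundary values; returns how many were checked. */
static unsigned check_sub_one(void) {
    static const uint32_t samples[] = {
        0u, 1u, 2u, 42u, 0x7FFFFFFFu, 0x80000000u, UINT32_MAX
    };
    unsigned checked = 0;
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        uint32_t x = samples[i];
        uint32_t math;
        bool ov = usubo_by_one_u32(x, &math);
        assert(math == (uint32_t)(x + (uint32_t)-1)); /* add i32 %x, -1    */
        assert(ov == (x == 0u));                      /* icmp eq i32 %x, 0 */
        ++checked;
    }
    return checked;
}
```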

; Verify insertion point for multi-BB.		; Verify insertion point for multi-BB.

declare void @call(i1)		declare void @call(i1)

define i1 @usubo_ult_sub_dominates_i64(i64 %x, i64 %y, i64* %p, i1 %cond) {		define i1 @usubo_ult_sub_dominates_i64(i64 %x, i64 %y, i64* %p, i1 %cond) {
; CHECK-LABEL: @usubo_ult_sub_dominates_i64(		; CHECK-LABEL: @usubo_ult_sub_dominates_i64(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: br i1 [[COND:%.*]], label [[T:%.*]], label [[F:%.*]]		; CHECK-NEXT: br i1 [[COND:%.*]], label [[T:%.*]], label [[F:%.*]]
; CHECK: t:		; CHECK: t:
; CHECK-NEXT: [[S:%.*]] = sub i64 [[X:%.*]], [[Y:%.*]]		; CHECK-NEXT: [[TMP0:%.*]] = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 [[X:%.*]], i64 [[Y:%.*]])
; CHECK-NEXT: store i64 [[S]], i64* [[P:%.*]]		; CHECK-NEXT: [[MATH:%.*]] = extractvalue { i64, i1 } [[TMP0]], 0
		; CHECK-NEXT: [[OV1:%.*]] = extractvalue { i64, i1 } [[TMP0]], 1
		; CHECK-NEXT: store i64 [[MATH]], i64* [[P:%.*]]
; CHECK-NEXT: br i1 [[COND]], label [[END:%.*]], label [[F]]		; CHECK-NEXT: br i1 [[COND]], label [[END:%.*]], label [[F]]
; CHECK: f:		; CHECK: f:
; CHECK-NEXT: ret i1 [[COND]]		; CHECK-NEXT: ret i1 [[COND]]
; CHECK: end:		; CHECK: end:
; CHECK-NEXT: [[OV:%.*]] = icmp ult i64 [[X]], [[Y]]		; CHECK-NEXT: ret i1 [[OV1]]
; CHECK-NEXT: ret i1 [[OV]]
;		;
entry:		entry:
br i1 %cond, label %t, label %f		br i1 %cond, label %t, label %f

t:		t:
%s = sub i64 %x, %y		%s = sub i64 %x, %y
store i64 %s, i64* %p		store i64 %s, i64* %p
br i1 %cond, label %end, label %f		br i1 %cond, label %end, label %f
(12 lines not shown)
; CHECK-NEXT: br i1 [[COND:%.*]], label [[T:%.*]], label [[F:%.*]]		; CHECK-NEXT: br i1 [[COND:%.*]], label [[T:%.*]], label [[F:%.*]]
; CHECK: t:		; CHECK: t:
; CHECK-NEXT: [[OV:%.*]] = icmp ult i64 [[X:%.*]], [[Y:%.*]]		; CHECK-NEXT: [[OV:%.*]] = icmp ult i64 [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT: call void @call(i1 [[OV]])		; CHECK-NEXT: call void @call(i1 [[OV]])
; CHECK-NEXT: br i1 [[OV]], label [[END:%.*]], label [[F]]		; CHECK-NEXT: br i1 [[OV]], label [[END:%.*]], label [[F]]
; CHECK: f:		; CHECK: f:
; CHECK-NEXT: ret i1 [[COND]]		; CHECK-NEXT: ret i1 [[COND]]
; CHECK: end:		; CHECK: end:
; CHECK-NEXT: [[TMP0:%.*]] = icmp ult i64 [[X]], [[Y]]		; CHECK-NEXT: [[TMP0:%.*]] = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 [[X]], i64 [[Y]])
; CHECK-NEXT: [[S:%.*]] = sub i64 [[X]], [[Y]]		; CHECK-NEXT: [[MATH:%.*]] = extractvalue { i64, i1 } [[TMP0]], 0
; CHECK-NEXT: store i64 [[S]], i64* [[P:%.*]]		; CHECK-NEXT: [[OV1:%.*]] = extractvalue { i64, i1 } [[TMP0]], 1
; CHECK-NEXT: ret i1 [[TMP0]]		; CHECK-NEXT: store i64 [[MATH]], i64* [[P:%.*]]
		; CHECK-NEXT: ret i1 [[OV1]]
;		;
entry:		entry:
br i1 %cond, label %t, label %f		br i1 %cond, label %t, label %f

t:		t:
%ov = icmp ult i64 %x, %y		%ov = icmp ult i64 %x, %y
call void @call(i1 %ov)		call void @call(i1 %ov)
br i1 %ov, label %end, label %f		br i1 %ov, label %end, label %f
(13 lines not shown)

Revision Contents

Diff 187278

llvm/trunk/include/llvm/CodeGen/TargetLowering.h

llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp

llvm/trunk/lib/Target/X86/X86ISelLowering.h

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

llvm/trunk/test/CodeGen/X86/cgp-usubo.ll

llvm/trunk/test/CodeGen/X86/lsr-loop-exit-cond.ll

llvm/trunk/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll