This is an archive of the discontinued LLVM Phabricator instance.

Differential D96399

[X86][CodeGenPrepare] Try to reuse IV's incremented value instead of adding the offset
ClosedPublic

Authored by mkazantsev on Feb 10 2021, 12:30 AM.

Details

Reviewers

spatel
reames
aqjune
greened

Commits

rG9d5af555891d: [X86][CodeGenPrepare] Try to reuse IV's incremented value instead of adding the…
rGd9e93e8e57fe: [X86][CodeGenPrepare] Try to reuse IV's incremented value instead of adding the…

Summary

While optimizing a memory instruction, we sometimes need to add an
offset to the value of the IV. We could avoid doing so if IV.next is
already defined at the point of interest. In that case, there are two
possible advantages:

1. If the IV step happens to match the offset, we don't need to add the offset at all;
2. We reduce the overlap of the live ranges of IV and IV.next. They may stop overlapping, which leads to better register allocation. Even if the overlap remains, we are not introducing a new overlap, so it should be a neutral transform.
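
To make the first advantage concrete, here is a minimal sketch, not the patch itself. It assumes a simplified addressing mode of the form Base + Scale * Reg + BaseOffs with wrapping 64-bit arithmetic; `SimpleAddrMode`, `computeAddress`, and `rewriteOverIVNext` are hypothetical names invented for illustration and do not exist in LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for an addressing mode:
//   address = Base + Scale * Reg + BaseOffs
struct SimpleAddrMode {
  uint64_t Base;
  int64_t Scale;
  int64_t BaseOffs;
};

// Address produced when the scaled register holds Reg.
// All arithmetic is done in uint64_t so that it wraps instead of overflowing.
uint64_t computeAddress(const SimpleAddrMode &AM, uint64_t Reg) {
  return AM.Base + static_cast<uint64_t>(AM.Scale) * Reg +
         static_cast<uint64_t>(AM.BaseOffs);
}

// Re-express AM over IV.next = IV + Step instead of IV: the scaled register
// now contributes Scale * Step more, so that amount is removed from BaseOffs.
SimpleAddrMode rewriteOverIVNext(SimpleAddrMode AM, int64_t Step) {
  uint64_t Delta =
      static_cast<uint64_t>(Step) * static_cast<uint64_t>(AM.Scale);
  AM.BaseOffs =
      static_cast<int64_t>(static_cast<uint64_t>(AM.BaseOffs) - Delta);
  return AM;
}
```

With Scale = 4, BaseOffs = -4, and Step = -1, rewriting over IV.next leaves BaseOffs at 0 while the computed address is unchanged, which is the case where the offset disappears entirely.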

Currently I've only added support for IVs that get decremented using the usub
intrinsic. We could also support AddInstr; however, there is some weird
interaction with another transform that may lead to infinite compilation
in this case (it seems the same transform is done and undone over and over).
I need to investigate why this happens, but in general we could do that too.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

mkazantsev created this revision. Feb 10 2021, 12:30 AM

Herald added subscribers: pengfei, hiraditya. Feb 10 2021, 12:30 AM

mkazantsev requested review of this revision. Feb 10 2021, 12:30 AM

Herald added a project: Restricted Project. Feb 10 2021, 12:30 AM

Herald added a subscriber: llvm-commits.

mkazantsev planned changes to this revision. Feb 10 2021, 10:40 PM

Found one cause of the TODO: it may only happen with AddInstr, so I will add that in a follow-up patch. It seems there is one more hang with AddInstr, so I suggest landing usub first and then following up with add support.

Rebased

Ping.

The change from the previous output to the new output looks valid to me.

$ ./build/alive-tv D96399.srctgt.ll -disable-undef-input

----------------------------------------
define * @src(* %p, i64 %idx) {
%0:
  %addr1 = mul i64 %idx, 4
  %ptr = gep * %p, 1 x i64 %addr1
  %ptr2 = gep * %ptr, 1 x i64 -4
  ret * %ptr2
}
=>
define * @tgt(* %p, i64 %idx) {
%0:
  %res = usub_overflow {i64, i1, i24} %idx, 1
  %sub = extractvalue {i64, i1, i24} %res, 0
  %addr2 = mul i64 %sub, 4
  %ptr2 = gep * %p, 1 x i64 %addr2
  ret * %ptr2
}
Transformation seems to be correct!

(The -disable-undef-input flag was added to resolve a timeout, but I think the transformation is still okay with undef inputs.)

Max,

A couple of high level comments here.

1. Your code is specific to generating the address as GEPs. I think we want to handle this for the non-GEP path as well.
2. I think this makes more sense to phrase as an optimization step for the AddrMode we're about to sink. Doing it that way should also address (1) above.
3. I'm definitely not okay with the initial patch being so narrowly focused. If you were handling generic add and sub, but not the overflow intrinsics, I'd be less concerned, but the very narrow focus on one of the overflow intrinsics, along with your TODO, makes me strongly suspect there is a lurking correctness issue which the usub tests simply happen not to expose.
4. I think I might see the correctness issue, or at least a hint of it. Consider the case where addressing does overflow. The wrapping semantics of a GEP are not the same as the usubo. That difference means that if overflow occurs, your optimized AddrMode is incorrect. I believe you need to restrict this transform to when you can prove overflow causes the memory inst not to be reached.

This revision now requires changes to proceed. Feb 22 2021, 11:35 AM

Your code is specific to generating the address as GEPs. I think we want to handle this for the non-GEP path as well.

I think this makes more sense to phrase as an optimization step for the AddrMode we're about to sink. Doing it that way should also address (1) above.

Agreed. I tried doing this during AddrMode computation, but that's impossible because it lacks knowledge about the user. But I think it can be hoisted out of the condition.

I think I might see the correctness issue, or at least a hint of it. Consider the case where addressing does overflow. The wrapping semantics of a GEP are not the same as the usubo. That difference means that if overflow occurs, your optimized AddrMode is incorrect. I believe you need to restrict this transform to when you can prove overflow causes the memory inst not to be reached.

I don't quite get the point here. If overflow occurs in the optimized case, it also occurs in the non-optimized case. My transform does not change the actual offset; it just simplifies the way it's computed.
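
A rough way to check this claim is sketched below. It assumes non-inbounds address arithmetic behaves as wrapping 64-bit unsigned arithmetic, which is an assumption of this sketch (it ignores poison and inbounds semantics). `srcAddress` and `tgtAddress` are hypothetical helpers that mirror the src and tgt functions from the alive-tv run above.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors src: %addr1 = mul %idx, 4; %ptr = gep %p, %addr1; %ptr2 = gep %ptr, -4
uint64_t srcAddress(uint64_t P, uint64_t Idx) {
  uint64_t Addr1 = Idx * 4; // wraps modulo 2^64
  return P + Addr1 - 4;
}

// Mirrors tgt: %sub = idx - 1 (value part of usub.with.overflow);
//              %addr2 = mul %sub, 4; %ptr2 = gep %p, %addr2
uint64_t tgtAddress(uint64_t P, uint64_t Idx) {
  uint64_t Sub = Idx - 1; // wraps to UINT64_MAX when Idx == 0
  return P + Sub * 4;
}
```

Under that assumption both forms agree even when the subtraction wraps (Idx == 0), because 4 * (Idx - 1) and 4 * Idx - 4 are congruent modulo 2^64.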

mkazantsev planned changes to this revision. Feb 24 2021, 1:08 AM

Moved the logic into AddrMode optimization code, so that GEP and non-GEP modes can use it separately. So it should resolve concerns 1 and 2.

As for potential functional bugs and TODO, need to think more. Placing (WIP) until resolved.

Harbormaster completed remote builds in B90575: Diff 326035. Feb 24 2021, 3:53 AM

mkazantsev planned changes to this revision. Feb 24 2021, 4:05 AM

Generalized to add/sub/uadd with overflow cases.

Harbormaster completed remote builds in B90762: Diff 326307. Feb 25 2021, 1:03 AM

Need to address correctness issue with overflow.

Added tests & safety checks against add/sub with nuw/nsw flags to avoid potential misuse of poisoned values. So far we conservatively refrain from optimizing such dubious cases. Follow-up planned: try to prove the flags if possible; if not, drop them (?).

mkazantsev added a child revision: D97537: [Codegenprepare] Use IV increment instead of IV if we can prove it is not a poisoned value. Feb 26 2021, 3:02 AM

Harbormaster completed remote builds in B90989: Diff 326626. Feb 26 2021, 3:19 AM

reames requested changes to this revision. Feb 26 2021, 10:02 AM

reames added inline comments.

llvm/lib/CodeGen/CodeGenPrepare.cpp
1279	Can you precommit the extraction of the lambda to a static function? The diff is confusing to read.
1318–1319	Same with the rename here. (Or at least, this appears to be only a rename?)
3838	Please remove whitespace change, feel free to commit separately.
3868	You really should be able to factor out some common code here with the isIVIncrement function above. Maybe time for a getIVIncrement function? Hm, though there appears to be a bug in this copy. The "incr" you identify doesn't appear to be tied to the PN.
5107–5108	The need for domtree here introduces a potential compile time problem given the way CGP manages domtree invalidation. Just noting it for now.

This revision now requires changes to proceed. Feb 26 2021, 10:02 AM

I suggest taking a look at the matchSimpleRecurrence routine I just added to ValueTracking. I strongly suspect you can simplify some of this code a lot.

mkazantsev added inline comments. Feb 28 2021, 8:08 PM

llvm/lib/CodeGen/CodeGenPrepare.cpp
3868	`IVInc` is the incoming value of the Phi node from the backedge which has the same phi node as argument thru matcher. That makes it tied, no?

mkazantsev marked 2 inline comments as done. Feb 28 2021, 9:15 PM

mkazantsev marked an inline comment as done. Feb 28 2021, 11:29 PM

mkazantsev added inline comments.

llvm/lib/CodeGen/CodeGenPrepare.cpp
5107–5108	Let's see if it will have visible measurable impact, and if so, think how to tackle it.

Addressed comments: all independent changes committed separately.

mkazantsev planned changes to this revision. Feb 28 2021, 11:42 PM

mkazantsev updated this revision to Diff 327035. Mar 1 2021, 12:04 AM

Harbormaster completed remote builds in B91283: Diff 327035. Mar 1 2021, 12:44 AM

Harbormaster completed remote builds in B91281: Diff 327033. Mar 1 2021, 5:43 AM

LGTM w/required comments.

llvm/lib/CodeGen/CodeGenPrepare.cpp
3890	I'm not really convinced of your second point above (e.g. the generic offset one). In particular, it's not clear to me that an arbitrary offset is always better than an overlapped live interval. I'm also not convinced you're wrong. I'm not going to require you stage this, but if you need to revert for any reason I'll strongly suggest staging first the case where the existing base offset cancels, and then doing the general offset case in a second patch.
3897	As written, there's a potential overflow here for SINT_MAX and Scale = 8. Please use APInt.smul_ov here.
3900	If I'm reading the code correctly, you want this to be adding IVInc to the instruction list, not the phi node.
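
The overflow concern raised at line 3897 can be illustrated with a checked multiply. This is a sketch only: it uses the GCC/Clang builtin `__builtin_mul_overflow` on `int64_t` as a stand-in for the suggested `APInt::smul_ov`, and `scaledStep` is a hypothetical helper, not code from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Returns Step * Scale, or std::nullopt if the signed 64-bit product would
// overflow. Relies on a GCC/Clang builtin (an assumption of this sketch).
std::optional<int64_t> scaledStep(int64_t Step, int64_t Scale) {
  int64_t Product;
  if (__builtin_mul_overflow(Step, Scale, &Product))
    return std::nullopt; // caller should give up on the transform
  return Product;
}
```

A step of -1 with Scale = 4 yields -4, while a step of INT64_MAX with Scale = 8 is reported as overflow instead of silently wrapping.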

This revision is now accepted and ready to land. Mar 3 2021, 3:56 PM

This revision was landed with ongoing or failed builds. Mar 4 2021, 12:23 AM

Closed by commit rGd9e93e8e57fe: [X86][CodeGenPrepare] Try to reuse IV's incremented value instead of adding the… (authored by mkazantsev).

This revision was automatically updated to reflect the committed changes.

mkazantsev added a commit: rGd9e93e8e57fe: [X86][CodeGenPrepare] Try to reuse IV's incremented value instead of adding the….

http://llvm-compile-time-tracker.com/compare.php?from=b15ce2f344ac7845729d2be0a8316b20a32c6292&to=d9e93e8e57fe63babc319cbaf84f1afeccb83696&stat=instructions

Can this be less expensive for compile time?

@nikic

From a cursory look, the problem is likely that you're doing an unconditional call to getDT(), which will lead to unnecessary DomTree calculations. The way this function is supposed to be used is directly when checking dominance, after other checks have already been done.
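
The ordering described here (cheap checks first, expensive analysis only on demand) can be sketched as follows. This is an illustration under stated assumptions, not the CodeGenPrepare code: `FakeDomTree`, `LazyDomTree`, and `canReuseIncrement` are hypothetical stand-ins, and "dominance" is modelled as a simple position comparison.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for an expensive analysis result.
struct FakeDomTree {
  // Toy model: a definition dominates a use if it appears earlier.
  bool dominates(int DefPos, int UsePos) const { return DefPos < UsePos; }
};

// Builds the analysis on first request and counts how often that happens.
class LazyDomTree {
  std::optional<FakeDomTree> Cached;

public:
  int NumComputations = 0;

  const FakeDomTree &get() {
    if (!Cached) {
      ++NumComputations; // the expensive step would happen here
      Cached.emplace();
    }
    return *Cached;
  }
};

// The cheap check runs first; the tree is requested only if it passes.
bool canReuseIncrement(bool OffsetMatchesStep, int IncPos, int MemPos,
                       LazyDomTree &DT) {
  if (!OffsetMatchesStep)
    return false;
  return DT.get().dominates(IncPos, MemPos);
}
```

When the cheap check fails, `NumComputations` stays at zero; the tree is built at most once, on the first query that actually needs a dominance answer.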

mkazantsev added a commit: rG9d5af555891d: [X86][CodeGenPrepare] Try to reuse IV's incremented value instead of adding the…. Mar 4 2021, 1:49 AM

reames mentioned this in rGe0cfd451718e: [CGP] Lazily compute domtree only when needed during address matching. Mar 4 2021, 9:33 AM

I went ahead and pushed a change (e0cfd451) to lazily compute domtree as per @nikic's guess. @xbolva00 do you still see the compile time regression with this change?

In D96399#2604000, @reames wrote:

I went ahead and pushed a change (e0cfd451) to lazily compute domtree as per @nikic's guess. @xbolva00 do you still see the compile time regression with this change?

Thanks! Here are the results for the range with both commits (it does not look like any other relevant changes landed in the meantime): http://llvm-compile-time-tracker.com/compare.php?from=b15ce2f344ac7845729d2be0a8316b20a32c6292&to=e0cfd451718e2524cc5b8f98ecd72a75d37146cc&stat=instructions

For most programs the regression is resolved, only mafft still has a noteworthy regression, as well as the ReleaseLTO-g configurations (don't know if related to the LTO part or the -g part though).

In D96399#2604070, @nikic wrote:

For most programs the regression is resolved, only mafft still has a noteworthy regression, as well as the ReleaseLTO-g configurations (don't know if related to the LTO part or the -g part though).

How concerned by this are you? To me this looks relatively isolated, and likely directly related to code changes in the respective benchmarks. (Though, is there an easy way to check that without building it all myself?) I can see one more small tweak I can make to defer domtree usage a bit further (will do so), but at some point it's pretty fundamental to this change (which I do think is worthwhile). Thoughts?

Pushed 6af94d22 which defers the domtree compute slightly further.

As an aside, the analysis invalidation in this code is a mess. We appear to be leaving loopinfo (which is derived from DT) stale. It just happens that the particular variety of stale doesn't expose problems in practice (joy). It's also not clear why we don't just update DT in the various transforms which modify it. None of them appear particularly hard to do.

In D96399#2604109, @reames wrote:

In D96399#2604070, @nikic wrote:

For most programs the regression is resolved, only mafft still has a noteworthy regression, as well as the ReleaseLTO-g configurations (don't know if related to the LTO part or the -g part though).

How concerned by this are you? To me this looks relatively isolated, and likely directly related to code changes in the respective benchmarks. (Though, is there an easy way to check that without building it all myself?) I can see one more small tweak I can make to defer domtree usage a bit further (will do so), but at some point it's pretty fundamental to this change (which I do think is worthwhile). Thoughts?

I'm not very concerned about that particular regression, but rather the potential implications. If this is indeed still caused by domtree calculations (rather than second order effects -- that also sounds plausible), then it is quite likely that there are also pathological cases (this is a typical issue for domtree invalidation + use in the same pass).

As an aside, the analysis invalidation in this code is a mess. We appear to be leaving loopinfo (which is derived from DT) stale. It just happens that the particular variety of stale doesn't expose problems in practice (joy). It's also not clear why we don't just update DT in the various transforms which modify it. None of them appear particularly hard to do.

Yeah, it's a real mess. What I find particularly concerning is that a lot of transforms set a ModifiedDT flag, despite not doing any CFG changes! For example combineToUAddWithOverflow() seems to use this flag to avoid instruction iterator invalidation -- flushing the DT and reprocessing the whole function is a pretty big hammer for that problem.

I think we're at the point where if we do see problematic cases due to dom tree handling, we should just fix CGP to not use the current invalidation scheme. Doing that is not a huge amount of work, and we're spending more time avoiding it than is warranted.

If necessary, I'll even volunteer to do it. I will put that off until we have a motivating test case though. :)

Revision Contents

Path                                                     Size
llvm/lib/CodeGen/CodeGenPrepare.cpp                      100 lines
llvm/test/CodeGen/X86/2020_12_02_decrementing_loop.ll    19 lines
llvm/test/CodeGen/X86/uadd_inc_iv.ll                     7 lines
llvm/test/CodeGen/X86/usub_inc_iv.ll                     20 lines

Diff 328049

llvm/lib/CodeGen/CodeGenPrepare.cpp

(1,270 lines not shown) static bool OptimizeNoopCopyExpression(CastInst *CI, const TargetLowering &TLI,

   // If, after promotion, these are the same types, this is a noop copy.
   if (SrcVT != DstVT)
     return false;

   return SinkCast(CI);
 }

 /// If given \p PN is an inductive variable with value IVInc coming from the
 /// backedge, and on each iteration it gets increased by Step, return pair
 /// <IVInc, Step>. Otherwise, return None.
 static Optional<std::pair<Instruction *, Constant *> >
 getIVIncrement(const PHINode *PN, const LoopInfo *LI) {
   const Loop *L = LI->getLoopFor(PN->getParent());
   if (!L || L->getHeader() != PN->getParent() || !L->getLoopLatch())
     return None;
   auto *IVInc =
(22 lines not shown) if (auto IVInc = getIVIncrement(PN, LI))
     return IVInc->first == BO;
   return false;
 }

 bool CodeGenPrepare::replaceMathCmpWithIntrinsic(BinaryOperator *BO,
                                                  Value *Arg0, Value *Arg1,
                                                  CmpInst *Cmp,
                                                  Intrinsic::ID IID) {
   auto IsReplacableIVIncrement = [this, &Cmp](BinaryOperator *BO) {
     if (!isIVIncrement(BO, LI))
       return false;
     const Loop *L = LI->getLoopFor(BO->getParent());
     // IV increment may have other users than the IV. We do not want to make
     // dominance queries to analyze the legality of moving it towards the cmp,
     // so just check that there is no other users.
     if (!BO->hasOneUse())
       return false;
     // Do not risk on moving increment into a child loop.
(1,731 lines not shown)
 /// A helper class for matching addressing modes.
 ///
 /// This encapsulates the logic for matching the target-legal addressing modes.
 class AddressingModeMatcher {
   SmallVectorImpl<Instruction*> &AddrModeInsts;
   const TargetLowering &TLI;
   const TargetRegisterInfo &TRI;
   const DataLayout &DL;
+  const LoopInfo &LI;
+  const DominatorTree &DT;

   /// AccessTy/MemoryInst - This is the type for the access (e.g. double) and
   /// the memory instruction that we're computing this address for.
   Type *AccessTy;
   unsigned AddrSpace;
   Instruction *MemoryInst;

   /// This is the addressing mode that we're building up. This is
(19 lines not shown) class AddressingModeMatcher {
   /// True if we are optimizing for size.
   bool OptSize;

   ProfileSummaryInfo *PSI;
   BlockFrequencyInfo *BFI;

   AddressingModeMatcher(
       SmallVectorImpl<Instruction *> &AMI, const TargetLowering &TLI,
-      const TargetRegisterInfo &TRI, Type *AT, unsigned AS, Instruction *MI,
+      const TargetRegisterInfo &TRI, const LoopInfo &LI,
+      const DominatorTree &DT, Type *AT, unsigned AS, Instruction *MI,
       ExtAddrMode &AM, const SetOfInstrs &InsertedInsts,
       InstrToOrigTy &PromotedInsts, TypePromotionTransaction &TPT,
       std::pair<AssertingVH<GetElementPtrInst>, int64_t> &LargeOffsetGEP,
       bool OptSize, ProfileSummaryInfo *PSI, BlockFrequencyInfo *BFI)
       : AddrModeInsts(AMI), TLI(TLI), TRI(TRI),
-        DL(MI->getModule()->getDataLayout()), AccessTy(AT), AddrSpace(AS),
-        MemoryInst(MI), AddrMode(AM), InsertedInsts(InsertedInsts),
-        PromotedInsts(PromotedInsts), TPT(TPT), LargeOffsetGEP(LargeOffsetGEP),
-        OptSize(OptSize), PSI(PSI), BFI(BFI) {
+        DL(MI->getModule()->getDataLayout()), LI(LI), DT(DT), AccessTy(AT),
+        AddrSpace(AS), MemoryInst(MI), AddrMode(AM),
+        InsertedInsts(InsertedInsts), PromotedInsts(PromotedInsts), TPT(TPT),
+        LargeOffsetGEP(LargeOffsetGEP), OptSize(OptSize), PSI(PSI), BFI(BFI) {
     IgnoreProfitability = false;
   }

 public:
   /// Find the maximal addressing mode that a load/store of V can fold,
   /// give an access type of AccessTy. This returns a list of involved
   /// instructions in AddrModeInsts.
   /// \p InsertedInsts The instructions inserted by other CodeGenPrepare
   /// optimizations.
   /// \p PromotedInsts maps the instructions to their type before promotion.
   /// \p The ongoing transaction where every action should be registered.
   static ExtAddrMode
   Match(Value *V, Type *AccessTy, unsigned AS, Instruction *MemoryInst,
         SmallVectorImpl<Instruction *> &AddrModeInsts,
-        const TargetLowering &TLI, const TargetRegisterInfo &TRI,
-        const SetOfInstrs &InsertedInsts, InstrToOrigTy &PromotedInsts,
-        TypePromotionTransaction &TPT,
+        const TargetLowering &TLI, const LoopInfo &LI, const DominatorTree &DT,
+        const TargetRegisterInfo &TRI, const SetOfInstrs &InsertedInsts,
+        InstrToOrigTy &PromotedInsts, TypePromotionTransaction &TPT,
         std::pair<AssertingVH<GetElementPtrInst>, int64_t> &LargeOffsetGEP,
         bool OptSize, ProfileSummaryInfo *PSI, BlockFrequencyInfo *BFI) {
     ExtAddrMode Result;

-    bool Success = AddressingModeMatcher(AddrModeInsts, TLI, TRI, AccessTy, AS,
-                                         MemoryInst, Result, InsertedInsts,
-                                         PromotedInsts, TPT, LargeOffsetGEP,
-                                         OptSize, PSI, BFI)
-                       .matchAddr(V, 0);
+    bool Success = AddressingModeMatcher(
+        AddrModeInsts, TLI, TRI, LI, DT, AccessTy, AS, MemoryInst, Result,
+        InsertedInsts, PromotedInsts, TPT, LargeOffsetGEP, OptSize, PSI,
+        BFI).matchAddr(V, 0);
     (void)Success; assert(Success && "Couldn't select anything?");
     return Result;
   }

 private:
   bool matchScaledValue(Value *ScaleReg, int64_t Scale, unsigned Depth);
   bool matchAddr(Value *Addr, unsigned Depth);
   bool matchOperationAddr(User *AddrInst, unsigned Opcode, unsigned Depth,
(679 lines not shown) bool AddressingModeMatcher::matchScaledValue(Value *ScaleReg, int64_t Scale,
   if (!TLI.isLegalAddressingMode(DL, TestAddrMode, AccessTy, AddrSpace))
     return false;

   // It was legal, so commit it.
   AddrMode = TestAddrMode;

   // Okay, we decided that we can add ScaleReg+Scale to AddrMode. Check now
   // to see if ScaleReg is actually X+C. If so, we can turn this into adding
-  // X*Scale + C*Scale to addr mode.
+  // X*Scale + C*Scale to addr mode. If we found available IV increment, do not
+  // go any further: we can reuse it and cannot eliminate it.
   ConstantInt *CI = nullptr; Value *AddLHS = nullptr;
   if (isa<Instruction>(ScaleReg) && // not a constant expr.
       match(ScaleReg, m_Add(m_Value(AddLHS), m_ConstantInt(CI))) &&
+      !isIVIncrement(cast<BinaryOperator>(ScaleReg), &LI) &&
       CI->getValue().isSignedIntN(64)) {
     TestAddrMode.InBounds = false;
     TestAddrMode.ScaledReg = AddLHS;
     TestAddrMode.BaseOffs += CI->getSExtValue() * TestAddrMode.Scale;

     // If this addressing mode is legal, commit it and remember that we folded
     // this instruction.
     if (TLI.isLegalAddressingMode(DL, TestAddrMode, AccessTy, AddrSpace)) {
       AddrModeInsts.push_back(cast<Instruction>(ScaleReg));
       AddrMode = TestAddrMode;
       return true;
     }
+    // Restore status quo.
+    TestAddrMode = AddrMode;
+  }
+
+  auto GetConstantStep = [this](const Value * V)
+      ->Optional<std::pair<Instruction *, APInt> > {
+    auto *PN = dyn_cast<PHINode>(V);
+    if (!PN)
+      return None;
+    auto IVInc = getIVIncrement(PN, &LI);
+    if (!IVInc)
+      return None;
+    // TODO: The result of the intrinsics above is two-compliment. However when
+    // IV inc is expressed as add or sub, iv.next is potentially a poison value.
+    // If it has nuw or nsw flags, we need to make sure that these flags are
+    // inferrable at the point of memory instruction. Otherwise we are replacing
+    // well-defined two-compliment computation with poison. Currently, to avoid
+    // potentially complex analysis needed to prove this, we reject such cases.
+    if (auto *OIVInc = dyn_cast<OverflowingBinaryOperator>(IVInc->first))
+      if (OIVInc->hasNoSignedWrap() || OIVInc->hasNoUnsignedWrap())
+        return None;
+    if (auto *ConstantStep = dyn_cast<ConstantInt>(IVInc->second))
+      return std::make_pair(IVInc->first, ConstantStep->getValue());
+    return None;
+  };
+
+  // Try to account for the following special case:
+  // 1. ScaleReg is an inductive variable;
+  // 2. We use it with non-zero offset;
+  // 3. IV's increment is available at the point of memory instruction.
+  //
+  // In this case, we may reuse the IV increment instead of the IV Phi to
+  // achieve the following advantages:
+  // 1. If IV step matches the offset, we will have no need in the offset;
+  if (AddrMode.BaseOffs) {
+    if (auto IVStep = GetConstantStep(ScaleReg)) {
+      Instruction *IVInc = IVStep->first;
+      APInt Step = IVStep->second;
+      APInt Offset = Step * AddrMode.Scale;
+      if (Offset.isSignedIntN(64) && TestAddrMode.BaseOffs == Offset &&
+          DT.dominates(IVInc, MemoryInst)) {
+        TestAddrMode.InBounds = false;
+        TestAddrMode.ScaledReg = IVInc;
+        TestAddrMode.BaseOffs -= Offset.getLimitedValue();
+        // If this addressing mode is legal, commit it..
+        if (TLI.isLegalAddressingMode(DL, TestAddrMode, AccessTy, AddrSpace)) {
+          AddrModeInsts.push_back(cast<Instruction>(IVInc));
+          AddrMode = TestAddrMode;
+          return true;
+        }
+        // Restore status quo.
+        TestAddrMode = AddrMode;
+      }
+    }
   }

-  // Otherwise, not (x+c)*scale, just return what we have.
+  // Otherwise, just return what we have.
   return true;
 }

 /// This is a little filter, which returns true if an addressing computation
 /// involving I might be folded into a load/store accessing it.
 /// This doesn't need to be perfect, but needs to accept at least
 /// the set of instructions that MatchOperationAddr can.
 static bool MightBeFoldableInst(Instruction *I) {
(1,073 lines not shown) for (unsigned i = 0, e = MemoryUses.size(); i != e; ++i) {
     // Do a match against the root of this address, ignoring profitability. This
     // will tell us if the addressing mode for the memory operation will
     // actually cover the shared instruction.
     ExtAddrMode Result;
     std::pair<AssertingVH<GetElementPtrInst>, int64_t> LargeOffsetGEP(nullptr,
                                                                       0);
     TypePromotionTransaction::ConstRestorationPt LastKnownGood =
         TPT.getRestorationPoint();
-    AddressingModeMatcher Matcher(
-        MatchedAddrModeInsts, TLI, TRI, AddressAccessTy, AS, MemoryInst, Result,
-        InsertedInsts, PromotedInsts, TPT, LargeOffsetGEP, OptSize, PSI, BFI);
+    AddressingModeMatcher Matcher(MatchedAddrModeInsts, TLI, TRI, LI, DT,
+                                  AddressAccessTy, AS, MemoryInst, Result,
+                                  InsertedInsts, PromotedInsts, TPT,
+                                  LargeOffsetGEP, OptSize, PSI, BFI);
     Matcher.IgnoreProfitability = true;
     bool Success = Matcher.matchAddr(Address, 0);
     (void)Success; assert(Success && "Couldn't select anything?");

     // The match was to check the profitability, the changes made are not
     // part of the original matcher. Therefore, they should be dropped
     // otherwise the original matcher will not present the right state.
     TPT.rollback(LastKnownGood);
[86 lines not shown]	while (!worklist.empty()) {
}		}

// For non-PHIs, determine the addressing mode being computed. Note that		// For non-PHIs, determine the addressing mode being computed. Note that
// the result may differ depending on what other uses our candidate		// the result may differ depending on what other uses our candidate
// addressing instructions might have.		// addressing instructions might have.
AddrModeInsts.clear();		AddrModeInsts.clear();
std::pair<AssertingVH<GetElementPtrInst>, int64_t> LargeOffsetGEP(nullptr,		std::pair<AssertingVH<GetElementPtrInst>, int64_t> LargeOffsetGEP(nullptr,
0);		0);
		Function *F = MemoryInst->getParent()->getParent();
ExtAddrMode NewAddrMode = AddressingModeMatcher::Match(		ExtAddrMode NewAddrMode = AddressingModeMatcher::Match(
V, AccessTy, AddrSpace, MemoryInst, AddrModeInsts, TLI, TRI,		V, AccessTy, AddrSpace, MemoryInst, AddrModeInsts, TLI, LI, getDT(*F),
InsertedInsts, PromotedInsts, TPT, LargeOffsetGEP, OptSize, PSI,		*TRI, InsertedInsts, PromotedInsts, TPT, LargeOffsetGEP, OptSize, PSI,
		[Inline comment] reames: The need for domtree here introduces a potential compile time problem given the way CGP manages domtree invalidation. Just noting it for now.
		[Inline comment] mkazantsev (author): Let's see if it has a visible, measurable impact, and if so, think about how to tackle it.
BFI.get());		BFI.get());

GetElementPtrInst *GEP = LargeOffsetGEP.first;		GetElementPtrInst *GEP = LargeOffsetGEP.first;
if (GEP && !NewGEPBases.count(GEP)) {		if (GEP && !NewGEPBases.count(GEP)) {
// If splitting the underlying data structure can reduce the offset of a		// If splitting the underlying data structure can reduce the offset of a
// GEP, collect the GEP. Skip the GEPs that are the new bases of		// GEP, collect the GEP. Skip the GEPs that are the new bases of
// previously split data structures.		// previously split data structures.
LargeOffsetGEPMap[GEP->getPointerOperand()].push_back(LargeOffsetGEP);		LargeOffsetGEPMap[GEP->getPointerOperand()].push_back(LargeOffsetGEP);
[2,994 lines not shown]

llvm/test/CodeGen/X86/2020_12_02_decrementing_loop.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-apple-macosx | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-apple-macosx | FileCheck %s

	; TODO: We can get rid of movq here by using different offset and %rax.
	define i32 @test_01(i32* %p, i64 %len, i32 %x) {			define i32 @test_01(i32* %p, i64 %len, i32 %x) {
	; CHECK-LABEL: test_01:			; CHECK-LABEL: test_01:
	; CHECK: ## %bb.0: ## %entry			; CHECK: ## %bb.0: ## %entry
	; CHECK-NEXT: movq %rsi, %rax
	; CHECK-NEXT: .p2align 4, 0x90			; CHECK-NEXT: .p2align 4, 0x90
	; CHECK-NEXT: LBB0_1: ## %loop			; CHECK-NEXT: LBB0_1: ## %loop
	; CHECK-NEXT: ## =>This Inner Loop Header: Depth=1			; CHECK-NEXT: ## =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: subq $1, %rax			; CHECK-NEXT: subq $1, %rsi
	; CHECK-NEXT: jb LBB0_4			; CHECK-NEXT: jb LBB0_4
	; CHECK-NEXT: ## %bb.2: ## %backedge			; CHECK-NEXT: ## %bb.2: ## %backedge
	; CHECK-NEXT: ## in Loop: Header=BB0_1 Depth=1			; CHECK-NEXT: ## in Loop: Header=BB0_1 Depth=1
	; CHECK-NEXT: cmpl %edx, -4(%rdi,%rsi,4)			; CHECK-NEXT: cmpl %edx, (%rdi,%rsi,4)
	; CHECK-NEXT: movq %rax, %rsi
	; CHECK-NEXT: jne LBB0_1			; CHECK-NEXT: jne LBB0_1
	; CHECK-NEXT: ## %bb.3: ## %failure			; CHECK-NEXT: ## %bb.3: ## %failure
	; CHECK-NEXT: ud2			; CHECK-NEXT: ud2
	; CHECK-NEXT: LBB0_4: ## %exit			; CHECK-NEXT: LBB0_4: ## %exit
	; CHECK-NEXT: movl $-1, %eax			; CHECK-NEXT: movl $-1, %eax
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	br label %loop			br label %loop
	[58 lines not shown]

	failure: ; preds = %backedge			failure: ; preds = %backedge
	unreachable			unreachable
	}			}

	define i32 @test_02(i32* %p, i64 %len, i32 %x) {			define i32 @test_02(i32* %p, i64 %len, i32 %x) {
	; CHECK-LABEL: test_02:			; CHECK-LABEL: test_02:
	; CHECK: ## %bb.0: ## %entry			; CHECK: ## %bb.0: ## %entry
	; CHECK-NEXT: movq %rsi, %rax
	; CHECK-NEXT: .p2align 4, 0x90			; CHECK-NEXT: .p2align 4, 0x90
	; CHECK-NEXT: LBB2_1: ## %loop			; CHECK-NEXT: LBB2_1: ## %loop
	; CHECK-NEXT: ## =>This Inner Loop Header: Depth=1			; CHECK-NEXT: ## =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: subq $1, %rax			; CHECK-NEXT: subq $1, %rsi
	; CHECK-NEXT: jb LBB2_4			; CHECK-NEXT: jb LBB2_4
	; CHECK-NEXT: ## %bb.2: ## %backedge			; CHECK-NEXT: ## %bb.2: ## %backedge
	; CHECK-NEXT: ## in Loop: Header=BB2_1 Depth=1			; CHECK-NEXT: ## in Loop: Header=BB2_1 Depth=1
	; CHECK-NEXT: cmpl %edx, -4(%rdi,%rsi,4)			; CHECK-NEXT: cmpl %edx, (%rdi,%rsi,4)
	; CHECK-NEXT: movq %rax, %rsi
	; CHECK-NEXT: jne LBB2_1			; CHECK-NEXT: jne LBB2_1
	; CHECK-NEXT: ## %bb.3: ## %failure			; CHECK-NEXT: ## %bb.3: ## %failure
	; CHECK-NEXT: ud2			; CHECK-NEXT: ud2
	; CHECK-NEXT: LBB2_4: ## %exit			; CHECK-NEXT: LBB2_4: ## %exit
	; CHECK-NEXT: movl $-1, %eax			; CHECK-NEXT: movl $-1, %eax
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%start = add i64 %len, -1			%start = add i64 %len, -1
	[18 lines not shown]

	failure: ; preds = %backedge			failure: ; preds = %backedge
	unreachable			unreachable
	}			}

	define i32 @test_03(i32* %p, i64 %len, i32 %x) {			define i32 @test_03(i32* %p, i64 %len, i32 %x) {
	; CHECK-LABEL: test_03:			; CHECK-LABEL: test_03:
	; CHECK: ## %bb.0: ## %entry			; CHECK: ## %bb.0: ## %entry
	; CHECK-NEXT: movq %rsi, %rax
	; CHECK-NEXT: .p2align 4, 0x90			; CHECK-NEXT: .p2align 4, 0x90
	; CHECK-NEXT: LBB3_1: ## %loop			; CHECK-NEXT: LBB3_1: ## %loop
	; CHECK-NEXT: ## =>This Inner Loop Header: Depth=1			; CHECK-NEXT: ## =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: subq $1, %rax			; CHECK-NEXT: subq $1, %rsi
	; CHECK-NEXT: jb LBB3_4			; CHECK-NEXT: jb LBB3_4
	; CHECK-NEXT: ## %bb.2: ## %backedge			; CHECK-NEXT: ## %bb.2: ## %backedge
	; CHECK-NEXT: ## in Loop: Header=BB3_1 Depth=1			; CHECK-NEXT: ## in Loop: Header=BB3_1 Depth=1
	; CHECK-NEXT: cmpl %edx, -4(%rdi,%rsi,4)			; CHECK-NEXT: cmpl %edx, (%rdi,%rsi,4)
	; CHECK-NEXT: movq %rax, %rsi
	; CHECK-NEXT: jne LBB3_1			; CHECK-NEXT: jne LBB3_1
	; CHECK-NEXT: ## %bb.3: ## %failure			; CHECK-NEXT: ## %bb.3: ## %failure
	; CHECK-NEXT: ud2			; CHECK-NEXT: ud2
	; CHECK-NEXT: LBB3_4: ## %exit			; CHECK-NEXT: LBB3_4: ## %exit
	; CHECK-NEXT: movl $-1, %eax			; CHECK-NEXT: movl $-1, %eax
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%start = add i64 %len, -100			%start = add i64 %len, -100
	[75 lines not shown]

llvm/test/CodeGen/X86/uadd_inc_iv.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -mtriple=x86_64-linux -codegenprepare -S < %s | FileCheck %s			; RUN: opt -mtriple=x86_64-linux -codegenprepare -S < %s | FileCheck %s

	declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64)			declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64)

	define i32 @test_01(i32* %p, i64 %len, i32 %x) {			define i32 @test_01(i32* %p, i64 %len, i32 %x) {
	; CHECK-LABEL: @test_01(			; CHECK-LABEL: @test_01(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[MATH:%.*]], [[BACKEDGE:%.*]] ], [ [[LEN:%.*]], [[ENTRY:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[MATH:%.*]], [[BACKEDGE:%.*]] ], [ [[LEN:%.*]], [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 [[IV]], i64 1)			; CHECK-NEXT: [[TMP0:%.*]] = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 [[IV]], i64 1)
	; CHECK-NEXT: [[MATH]] = extractvalue { i64, i1 } [[TMP0]], 0			; CHECK-NEXT: [[MATH]] = extractvalue { i64, i1 } [[TMP0]], 0
	; CHECK-NEXT: [[OV:%.*]] = extractvalue { i64, i1 } [[TMP0]], 1			; CHECK-NEXT: [[OV:%.*]] = extractvalue { i64, i1 } [[TMP0]], 1
	; CHECK-NEXT: br i1 [[OV]], label [[EXIT:%.*]], label [[BACKEDGE]]			; CHECK-NEXT: br i1 [[OV]], label [[EXIT:%.*]], label [[BACKEDGE]]
	; CHECK: backedge:			; CHECK: backedge:
	; CHECK-NEXT: [[SUNKADDR:%.*]] = mul i64 [[IV]], 4			; CHECK-NEXT: [[SUNKADDR3:%.*]] = mul i64 [[MATH]], 4
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[P:%.*]] to i8*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[P:%.*]] to i8*
	; CHECK-NEXT: [[SUNKADDR1:%.*]] = getelementptr i8, i8* [[TMP1]], i64 [[SUNKADDR]]			; CHECK-NEXT: [[SUNKADDR4:%.*]] = getelementptr i8, i8* [[TMP1]], i64 [[SUNKADDR3]]
	; CHECK-NEXT: [[SUNKADDR2:%.*]] = getelementptr i8, i8* [[SUNKADDR1]], i64 4			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[SUNKADDR4]] to i32*
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[SUNKADDR2]] to i32*
	; CHECK-NEXT: [[LOADED:%.*]] = load atomic i32, i32* [[TMP2]] unordered, align 4			; CHECK-NEXT: [[LOADED:%.*]] = load atomic i32, i32* [[TMP2]] unordered, align 4
	; CHECK-NEXT: [[COND_2:%.*]] = icmp eq i32 [[LOADED]], [[X:%.*]]			; CHECK-NEXT: [[COND_2:%.*]] = icmp eq i32 [[LOADED]], [[X:%.*]]
	; CHECK-NEXT: br i1 [[COND_2]], label [[FAILURE:%.*]], label [[LOOP]]			; CHECK-NEXT: br i1 [[COND_2]], label [[FAILURE:%.*]], label [[LOOP]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret i32 -1			; CHECK-NEXT: ret i32 -1
	; CHECK: failure:			; CHECK: failure:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;
	[26 lines not shown]

llvm/test/CodeGen/X86/usub_inc_iv.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -mtriple=x86_64-linux -codegenprepare -S < %s | FileCheck %s			; RUN: opt -mtriple=x86_64-linux -codegenprepare -S < %s | FileCheck %s

	define i32 @test_01(i32* %p, i64 %len, i32 %x) {			define i32 @test_01(i32* %p, i64 %len, i32 %x) {
	; CHECK-LABEL: @test_01(			; CHECK-LABEL: @test_01(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[MATH:%.*]], [[BACKEDGE:%.*]] ], [ [[LEN:%.*]], [[ENTRY:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[MATH:%.*]], [[BACKEDGE:%.*]] ], [ [[LEN:%.*]], [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 [[IV]], i64 1)			; CHECK-NEXT: [[TMP0:%.*]] = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 [[IV]], i64 1)
	; CHECK-NEXT: [[MATH]] = extractvalue { i64, i1 } [[TMP0]], 0			; CHECK-NEXT: [[MATH]] = extractvalue { i64, i1 } [[TMP0]], 0
	; CHECK-NEXT: [[OV:%.*]] = extractvalue { i64, i1 } [[TMP0]], 1			; CHECK-NEXT: [[OV:%.*]] = extractvalue { i64, i1 } [[TMP0]], 1
	; CHECK-NEXT: br i1 [[OV]], label [[EXIT:%.*]], label [[BACKEDGE]]			; CHECK-NEXT: br i1 [[OV]], label [[EXIT:%.*]], label [[BACKEDGE]]
	; CHECK: backedge:			; CHECK: backedge:
	; CHECK-NEXT: [[SUNKADDR:%.*]] = mul i64 [[IV]], 4			; CHECK-NEXT: [[SUNKADDR:%.*]] = mul i64 [[MATH]], 4
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[P:%.*]] to i8*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[P:%.*]] to i8*
	; CHECK-NEXT: [[SUNKADDR1:%.*]] = getelementptr i8, i8* [[TMP1]], i64 [[SUNKADDR]]			; CHECK-NEXT: [[SUNKADDR1:%.*]] = getelementptr i8, i8* [[TMP1]], i64 [[SUNKADDR]]
	; CHECK-NEXT: [[SUNKADDR2:%.*]] = getelementptr i8, i8* [[SUNKADDR1]], i64 -4			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[SUNKADDR1]] to i32*
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[SUNKADDR2]] to i32*
	; CHECK-NEXT: [[LOADED:%.*]] = load atomic i32, i32* [[TMP2]] unordered, align 4			; CHECK-NEXT: [[LOADED:%.*]] = load atomic i32, i32* [[TMP2]] unordered, align 4
	; CHECK-NEXT: [[COND_2:%.*]] = icmp eq i32 [[LOADED]], [[X:%.*]]			; CHECK-NEXT: [[COND_2:%.*]] = icmp eq i32 [[LOADED]], [[X:%.*]]
	; CHECK-NEXT: br i1 [[COND_2]], label [[FAILURE:%.*]], label [[LOOP]]			; CHECK-NEXT: br i1 [[COND_2]], label [[FAILURE:%.*]], label [[LOOP]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret i32 -1			; CHECK-NEXT: ret i32 -1
	; CHECK: failure:			; CHECK: failure:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;
	[75 lines not shown]
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[MATH:%.*]], [[BACKEDGE:%.*]] ], [ [[LEN:%.*]], [[ENTRY:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[MATH:%.*]], [[BACKEDGE:%.*]] ], [ [[LEN:%.*]], [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 [[IV]], i64 1)			; CHECK-NEXT: [[TMP0:%.*]] = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 [[IV]], i64 1)
	; CHECK-NEXT: [[MATH]] = extractvalue { i64, i1 } [[TMP0]], 0			; CHECK-NEXT: [[MATH]] = extractvalue { i64, i1 } [[TMP0]], 0
	; CHECK-NEXT: [[OV:%.*]] = extractvalue { i64, i1 } [[TMP0]], 1			; CHECK-NEXT: [[OV:%.*]] = extractvalue { i64, i1 } [[TMP0]], 1
	; CHECK-NEXT: br i1 [[OV]], label [[EXIT:%.*]], label [[BACKEDGE]]			; CHECK-NEXT: br i1 [[OV]], label [[EXIT:%.*]], label [[BACKEDGE]]
	; CHECK: backedge:			; CHECK: backedge:
	; CHECK-NEXT: [[SUNKADDR:%.*]] = mul i64 [[IV]], 4			; CHECK-NEXT: [[SUNKADDR:%.*]] = mul i64 [[MATH]], 4
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[P:%.*]] to i8*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[P:%.*]] to i8*
	; CHECK-NEXT: [[SUNKADDR1:%.*]] = getelementptr i8, i8* [[TMP1]], i64 [[SUNKADDR]]			; CHECK-NEXT: [[SUNKADDR1:%.*]] = getelementptr i8, i8* [[TMP1]], i64 [[SUNKADDR]]
	; CHECK-NEXT: [[SUNKADDR2:%.*]] = getelementptr i8, i8* [[SUNKADDR1]], i64 -4			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[SUNKADDR1]] to i32*
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[SUNKADDR2]] to i32*
	; CHECK-NEXT: [[LOADED:%.*]] = load atomic i32, i32* [[TMP2]] unordered, align 4			; CHECK-NEXT: [[LOADED:%.*]] = load atomic i32, i32* [[TMP2]] unordered, align 4
	; CHECK-NEXT: [[COND_2:%.*]] = icmp eq i32 [[LOADED]], [[X:%.*]]			; CHECK-NEXT: [[COND_2:%.*]] = icmp eq i32 [[LOADED]], [[X:%.*]]
	; CHECK-NEXT: br i1 [[COND_2]], label [[FAILURE:%.*]], label [[LOOP]]			; CHECK-NEXT: br i1 [[COND_2]], label [[FAILURE:%.*]], label [[LOOP]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret i32 -1			; CHECK-NEXT: ret i32 -1
	; CHECK: failure:			; CHECK: failure:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;
	[32 lines not shown]
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ], [ [[LEN:%.*]], [[ENTRY:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ], [ [[LEN:%.*]], [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], -1			; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], -1
	; CHECK-NEXT: [[COND_0:%.*]] = call i1 @use(i64 [[IV_NEXT]])			; CHECK-NEXT: [[COND_0:%.*]] = call i1 @use(i64 [[IV_NEXT]])
	; CHECK-NEXT: br i1 [[COND_0]], label [[MIDDLE:%.*]], label [[FAILURE:%.*]]			; CHECK-NEXT: br i1 [[COND_0]], label [[MIDDLE:%.*]], label [[FAILURE:%.*]]
	; CHECK: middle:			; CHECK: middle:
	; CHECK-NEXT: [[COND_1:%.*]] = icmp eq i64 [[IV]], 0			; CHECK-NEXT: [[COND_1:%.*]] = icmp eq i64 [[IV]], 0
	; CHECK-NEXT: br i1 [[COND_1]], label [[EXIT:%.*]], label [[BACKEDGE]]			; CHECK-NEXT: br i1 [[COND_1]], label [[EXIT:%.*]], label [[BACKEDGE]]
	; CHECK: backedge:			; CHECK: backedge:
	; CHECK-NEXT: [[SUNKADDR:%.*]] = mul i64 [[IV]], 4			; CHECK-NEXT: [[SUNKADDR:%.*]] = mul i64 [[IV_NEXT]], 4
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P:%.*]] to i8*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P:%.*]] to i8*
	; CHECK-NEXT: [[SUNKADDR1:%.*]] = getelementptr i8, i8* [[TMP0]], i64 [[SUNKADDR]]			; CHECK-NEXT: [[SUNKADDR1:%.*]] = getelementptr i8, i8* [[TMP0]], i64 [[SUNKADDR]]
	; CHECK-NEXT: [[SUNKADDR2:%.*]] = getelementptr i8, i8* [[SUNKADDR1]], i64 -4			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[SUNKADDR1]] to i32*
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[SUNKADDR2]] to i32*
	; CHECK-NEXT: [[LOADED:%.*]] = load atomic i32, i32* [[TMP1]] unordered, align 4			; CHECK-NEXT: [[LOADED:%.*]] = load atomic i32, i32* [[TMP1]] unordered, align 4
	; CHECK-NEXT: [[COND_2:%.*]] = icmp eq i32 [[LOADED]], [[X:%.*]]			; CHECK-NEXT: [[COND_2:%.*]] = icmp eq i32 [[LOADED]], [[X:%.*]]
	; CHECK-NEXT: br i1 [[COND_2]], label [[FAILURE]], label [[LOOP]]			; CHECK-NEXT: br i1 [[COND_2]], label [[FAILURE]], label [[LOOP]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret i32 -1			; CHECK-NEXT: ret i32 -1
	; CHECK: failure:			; CHECK: failure:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;
	[90 lines not shown]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ], [ [[LEN:%.*]], [[ENTRY:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[BACKEDGE:%.*]] ], [ [[LEN:%.*]], [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], -1			; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], -1
	; CHECK-NEXT: br i1 [[COND:%.*]], label [[IF_TRUE:%.*]], label [[BACKEDGE]]			; CHECK-NEXT: br i1 [[COND:%.*]], label [[IF_TRUE:%.*]], label [[BACKEDGE]]
	; CHECK: if.true:			; CHECK: if.true:
	; CHECK-NEXT: [[COND_1:%.*]] = icmp eq i64 [[IV]], 0			; CHECK-NEXT: [[COND_1:%.*]] = icmp eq i64 [[IV]], 0
	; CHECK-NEXT: br i1 [[COND_1]], label [[EXIT:%.*]], label [[BACKEDGE]]			; CHECK-NEXT: br i1 [[COND_1]], label [[EXIT:%.*]], label [[BACKEDGE]]
	; CHECK: backedge:			; CHECK: backedge:
	; CHECK-NEXT: [[SUNKADDR:%.*]] = mul i64 [[IV]], 4			; CHECK-NEXT: [[SUNKADDR:%.*]] = mul i64 [[IV_NEXT]], 4
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P:%.*]] to i8*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P:%.*]] to i8*
	; CHECK-NEXT: [[SUNKADDR1:%.*]] = getelementptr i8, i8* [[TMP0]], i64 [[SUNKADDR]]			; CHECK-NEXT: [[SUNKADDR1:%.*]] = getelementptr i8, i8* [[TMP0]], i64 [[SUNKADDR]]
	; CHECK-NEXT: [[SUNKADDR2:%.*]] = getelementptr i8, i8* [[SUNKADDR1]], i64 -4			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[SUNKADDR1]] to i32*
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[SUNKADDR2]] to i32*
	; CHECK-NEXT: [[LOADED:%.*]] = load atomic i32, i32* [[TMP1]] unordered, align 4			; CHECK-NEXT: [[LOADED:%.*]] = load atomic i32, i32* [[TMP1]] unordered, align 4
	; CHECK-NEXT: [[COND_2:%.*]] = icmp eq i32 [[LOADED]], [[X:%.*]]			; CHECK-NEXT: [[COND_2:%.*]] = icmp eq i32 [[LOADED]], [[X:%.*]]
	; CHECK-NEXT: br i1 [[COND_2]], label [[FAILURE:%.*]], label [[LOOP]]			; CHECK-NEXT: br i1 [[COND_2]], label [[FAILURE:%.*]], label [[LOOP]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret i32 -1			; CHECK-NEXT: ret i32 -1
	; CHECK: failure:			; CHECK: failure:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;
	[25 lines not shown]
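
The address arithmetic that the updated CHECK lines exercise can be sketched in plain C++. This is only an illustration of why the extra `getelementptr ..., i64 -4` (or `i64 4` in the uadd test) can be dropped when the scaled index switches from the IV to its stepped value; it is not the CodeGenPrepare implementation. The helper names `addrFromIV`, `addrFromIVNext` and `offsetFoldsIntoStep` are hypothetical and do not exist in LLVM.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; none of these exist in LLVM.

// Address as formed on the left-hand (old) side of the tests: scale the
// pre-step IV, then apply a separate constant byte offset.
constexpr std::uint64_t addrFromIV(std::uint64_t Base, std::uint64_t IV,
                                   std::uint64_t Scale,
                                   std::int64_t ByteOffset) {
  // Unsigned arithmetic wraps modulo 2^64, so a negative offset is fine here.
  return Base + IV * Scale + static_cast<std::uint64_t>(ByteOffset);
}

// Address as formed on the right-hand (new) side: scale the already-stepped
// value (MATH / IV_NEXT in the tests), with no separate offset.
constexpr std::uint64_t addrFromIVNext(std::uint64_t Base,
                                       std::uint64_t IVNext,
                                       std::uint64_t Scale) {
  return Base + IVNext * Scale;
}

// The offset disappears entirely only when Step * Scale equals the byte
// offset, e.g. step -1, scale 4, offset -4 in usub_inc_iv.ll.
constexpr bool offsetFoldsIntoStep(std::int64_t Step, std::int64_t Scale,
                                   std::int64_t ByteOffset) {
  return Step * Scale == ByteOffset;
}

// Decrementing IV (usub case): IV = 10, IV.next = 9, scale 4, offset -4.
static_assert(addrFromIV(0x1000, 10, 4, -4) == addrFromIVNext(0x1000, 9, 4),
              "IV*4 - 4 must equal (IV-1)*4");
// Incrementing IV (uadd case): IV = 10, IV.next = 11, scale 4, offset +4.
static_assert(addrFromIV(0x1000, 10, 4, 4) == addrFromIVNext(0x1000, 11, 4),
              "IV*4 + 4 must equal (IV+1)*4");
```

When the step and offset do not match, the summary's second point still applies: the address can be based on the stepped value with an adjusted offset, which avoids extending the live range of the original IV.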