This is an archive of the discontinued LLVM Phabricator instance.


Differential D93694

[MachineLICM][MachineSink] Move SinkIntoLoop from MachineLICM to MachineSink
Closed · Public

Authored by SjoerdMeijer on Dec 22 2020, 5:49 AM.


Details

Reviewers

fhahn
dmgreen
djasper
shchenz
qcolombet
samparker

Commits

rG48ecba350ed6: [MachineLICM][MachineSink] Move SinkIntoLoop to MachineSink.

Summary

This moves SinkIntoLoop from MachineLICM to MachineSink.

Not only is this arguably a better home for it, but more importantly, MachineSink has infrastructure that is required for making SinkIntoLoop more generic. For example, SinkIntoLoop could query the existing hasStoreBetween() helper to become less conservative. This would be the next step of this work. At the moment, this is a non-functional change.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

SjoerdMeijer created this revision · Dec 22 2020, 5:49 AM

Herald added subscribers: asbirlea, hiraditya · Dec 22 2020, 5:49 AM

SjoerdMeijer requested review of this revision · Dec 22 2020, 5:49 AM

Herald added a project: Restricted Project · Dec 22 2020, 5:49 AM

fhahn added inline comments · Dec 22 2020, 5:57 AM

llvm/test/CodeGen/AArch64/machine-licm-sink-instr.ll
8 ↗	(On Diff #313310)	Can this be a MIR test? That would make it slightly easier to relate the test to the code changes.

SjoerdMeijer added inline comments · Dec 22 2020, 6:04 AM

llvm/test/CodeGen/AArch64/machine-licm-sink-instr.ll
8 ↗	(On Diff #313310)	Yeah, I definitely thought about that. But since it is also convenient to look at the final codegen, I decided to create an llc test. If you prefer a MIR test, I am happy with that too.

Now with a MIR test.

fhahn added inline comments · Dec 22 2020, 8:01 AM

llvm/lib/CodeGen/MachineLICM.cpp
780	This only checks for stores/calls in the preheader. What about the case where there are stores/clobbering calls in the loop body, e.g. something like the following in the test case?
    %20:gpr64common = ADRP target-flags(aarch64-page) @A
    STRWui %3, %20, target-flags(aarch64-pageoff, aarch64-nc) @A :: (store 4 into `i32* getelementptr inbounds ([100 x i32], [100 x i32]* @A, i64 0, i64 0)`)
llvm/test/CodeGen/AArch64/machine-licm-sink-instr.mir
159 ↗	(On Diff #313341)	I don't think there's anything preventing `_Z3usei` from modifying `@A` in the loop, so if we move the load into the loop, we may read different values on each iteration?
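The legality concern above can be illustrated with a toy model. The following is a minimal, hypothetical C++ sketch, not LLVM code: `ToyInst`, `ToyBlock` and `canSinkLoadIntoLoop` are invented names, and it assumes the most conservative rule, namely that a load sunk into the loop re-executes on every iteration and is therefore only safe if nothing in the loop may write memory.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a machine instruction.
struct ToyInst {
  std::string Name;
  bool MayStore = false; // e.g. a STRWui to @A
  bool IsCall = false;   // e.g. a call to _Z3usei, which may write any global
};

using ToyBlock = std::vector<ToyInst>;

// Conservative legality check for sinking a load from the preheader into a
// loop: after sinking, the load executes on every iteration, so refuse if any
// instruction in any loop block may write memory. A less conservative version
// would ask alias analysis whether a given store or call can touch the loaded
// location.
bool canSinkLoadIntoLoop(const std::vector<ToyBlock> &LoopBlocks) {
  assert(!LoopBlocks.empty() && "a loop has at least one block");
  for (const ToyBlock &BB : LoopBlocks)
    for (const ToyInst &I : BB)
      if (I.MayStore || I.IsCall)
        return false;
  return true;
}
```

With a loop body containing the store to `@A` or the call to `_Z3usei`, the check refuses to sink; a body with only arithmetic lets the load through. Checking only the preheader, as the original code did, misses both cases.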

Hmmmm, good points, will check. I was assuming that the legality checks were performed, but yeah, this is impressively broken if not!

Yeah, so thanks again for looking at this. After looking more into this, I don't think this change at the moment makes much sense.

Sinking instructions is a bit of a mess, as we have LoopSink, MachineSink, and MachineLICM all doing this (in different ways and/or for different use cases). This change and the function HasSuccessiveStoreInst() that I added are very similar to hasStoreBetween() in MachineSink. hasStoreBetween() is actually better, so I would like to reuse it. As an alternative strategy, I considered hoisting it out into a new MachineLoopUtils helper, but it keeps state (caching) that makes this inconvenient. I think the better alternative is to move SinkIntoLoop() from MachineLICM to MachineSink, because that is a more natural place for it, and MachineSink seems to have more and better infrastructure for this (like the hasStoreBetween() that I mentioned earlier).

A first prep step for this is D94082, which creates a new helper isLoopInvariant() that we can then also use in MachineSink. After this, I would like to repurpose this ticket to move SinkIntoLoop() from MachineLICM to MachineSink, if we agree that this makes sense. That should be an NFC, and then as a third step I would like to make the change that I wanted to make.

See updated Title and Summary.

Fixed the formatting and description of the command line option. Related to this, and in addition to my previous message, this option is off by default and is currently covered by exactly one test:
test/CodeGen/X86/sink-cheap-instructions.ll. After this NFC change, I will look into extending this.

lkail added a subscriber: lkail · Jan 7 2021, 7:04 AM

SjoerdMeijer added a child revision: D94308: [MachineSink] SinkIntoLoop: analyse stores and aliases in between · Jan 8 2021, 7:31 AM

I have uploaded WIP patch D94308 to show the direction of, and motivation for, this change.

Little friendly ping: how do we like this little reshuffle?

samparker added reviewers: shchenz, qcolombet · Jan 15 2021, 6:03 AM

samparker added a subscriber: samparker · Jan 15 2021, 6:51 AM

samparker added inline comments.

llvm/lib/CodeGen/MachineSink.cpp
86	Is there a plan to add an option for the maximum number of instructions sunk, or something like that?
374	SmallVectorImpl<MachineInstr*>&
398	What are we doing these checks for?
1216	So how do we know that MI is in a loop?

SjoerdMeijer added inline comments · Jan 15 2021, 7:20 AM

llvm/lib/CodeGen/MachineSink.cpp
86	Thanks for looking at this! Getting loop-sinking to do something useful is going to be at least a 3-step approach:
1) This is just moving the code, not changing it, so it is an NFC and just a prep step to use the infrastructure in MachineSink. Note that loop-sink is disabled by default, so it needs an option to be triggered.
2) A first functional change to make loop-sink a little less restrictive, which it really is at the moment, is the change in D94308. This lets it do a bit more analysis using functions in MachineSink, making it a bit more powerful. Not much changes: it is still off by default, but it is a good demonstrator for the reshuffle.
3) This is going to be the interesting step: deciding when to sink and how many instructions. This will be driven by register pressure, and by deciding whether reducing live ranges through loop sinking helps performance. I guess this is also the place where a maximum, and an option to control it, can be introduced, although I agree there is in principle nothing stopping us from introducing this earlier.
Once we are happy with 3), this should be enabled by default; that is the end goal of this exercise.
398	I think this is being conservative: when an instruction has multiple defs, it is trickier to analyse, or we don't want to sink such instructions, so we skip over them. But I haven't looked much into this (I just moved the code), so I will need to do that.
1216	Yeah, good point. I think the only simple answer here is that at this point we don't. Will look into this.

Is it possible to implement this new logic in the original MachineSink infrastructure? If so, could we avoid limiting this to sinking instructions from the loop preheader? I guess the MachineSink pass prevents this kind of sinking because MachineSinking::isProfitableToSinkTo treats it as unprofitable?
And we already have a simple register estimation model in MachineSinking::isProfitableToSinkTo; should we do this kind of sinking based on register pressure, for example, only sinking into a loop if the register pressure of the loop is high?

llvm/lib/CodeGen/MachineSink.cpp
1223	Is this necessary? Assuming MI is a loop user, how does a loop COPY user impact the cost of sinking instruction `I` to the loop? `I` could still be any instruction like a load instruction.

Thank you very much for looking at this @shchenz !

In D93694#2504069, @shchenz wrote:

Is it possible to implement this new logic in the original MachineSink infrastructure? If so, could we avoid limiting this to sinking instructions from the loop preheader? I guess the MachineSink pass prevents this kind of sinking because MachineSinking::isProfitableToSinkTo treats it as unprofitable?

It's good you mention this, because this was actually my first approach! :-) I started integrating this logic into the existing code, but ran into problems. The main problem was that the existing logic and the new loop sink started interacting, which meant I got infinite loops of new blocks being created and instructions being moved. This could probably be fixed, which is what I started doing, but then I found that this didn't make things clearer. Since the existing logic and loop sink are slightly different algorithms, I saw the benefit of keeping them separate, for simplicity and clarity. Thus, in this draft, the original sink logic runs first, followed by loop sink. There might also be a benefit to running them separately, one after the other, as opposed to one integrated solution (that's something I can imagine, but don't have proof for yet). Long story short, my preference is to keep them separate for now, but that shouldn't exclude reconsidering this if we find a need for it later.

And we already have a simple register estimation model in MachineSinking::isProfitableToSinkTo; should we do this kind of sinking based on register pressure, for example, only sinking into a loop if the register pressure of the loop is high?

Exactly right. That is another example of, and motivation for, moving this code here: this is exactly the kind of thing I want to reuse. Another example is the function hasStoreBetween, which is what I am using in D94308. That patch is functionally complete, but I still need to add all the different MIR tests; again, it is the justification for moving this code to MachineSink.
Please also see my earlier reply to @samparker, i.e. the four steps that I outlined that are necessary to make loop sinking useful. I am pointing this out because using isProfitableToSinkTo is exactly one of these next steps, letting loop sink make an informed decision. In other words, this is exactly what I will be doing once we have agreed on this NFC change; this change is the basis for these next steps.

llvm/lib/CodeGen/MachineSink.cpp
1223	You're absolutely right again, this could be any instruction, and is thus not necessary. But in absence of any cost-modeling or other decision making (based on register pressure), the original authors of this code found that it is always profitable to move the instruction if the user is a copy. So yes, this is exactly the first thing that we will be changing when we integrate `MachineSinking::isProfitableToSinkTo` into this work in a next step.

Added statistic NumLoopSunk, the number of instructions sunk into a loop,
Checked that a use of a candidate is in a loop,
Added a bunch of negative tests.

Ported the only positive test, test/CodeGen/X86/sink-cheap-instructions.ll, over to a MIR test: this is sink_add in the new MIR test file.
Walked candidates in reverse order, to start at the bottom of def-use chains, so that uses are tried first before attempting to sink a def.

The plan sounds good to me. Thanks for doing this @SjoerdMeijer

llvm/lib/CodeGen/MachineSink.cpp
376	Should we call `TII->shouldSink()` at the very beginning? Some targets may not want to sink some instructions.
384	Maybe I am wrong, but when sinking, we must sink an instruction to a block that is dominated by the instruction's parent block. So if the destination block is executed, BB must also have been executed. Do we need the check for `IsGuaranteedToExecute`? I think this is not the same as hoisting. For hoisting, we must make sure the load is executed anyway, as it will be hoisted to the loop preheader, which will surely be executed.
394	This may be unnecessary, as all the instructions come from the loop preheader?
1219	This seems strange: MI should be a user inside the loop, so why must its parent block be equal to the loop preheader?

SjoerdMeijer added inline comments · Jan 22 2021, 5:44 AM

llvm/lib/CodeGen/MachineSink.cpp
376	Thanks, agreed.
384	yeah, good catch. We are sinking from `preheader` to a loop block `BB`, and `preheader` dominates `BB`. I agree that we don't need this check, this is indeed different for hoisting and sinking. I will remove it.
394	I agreed and changed this into an assert, which seemed like a good precondition to assert. But this new assert triggered for one of the test cases where the preheader is the entry block, containing COPY instructions of the function arguments, which use physical registers. Physical registers are marked as not loop invariant, which is why this assert triggered. So I think I will just keep this check for now.
1219	I thought I had a reason, but it doesn't seem to make sense, so will remove that part of the condition.

Thanks @shchenz, comments addressed.

samparker added inline comments · Jan 22 2021, 6:55 AM

llvm/lib/CodeGen/MachineSink.cpp
473	Do we need to break here as soon as SinkIntoLoop is false? Otherwise, couldn't we sink an 'earlier' instruction that is an input to an instruction that hasn't been sunk?

SjoerdMeijer added inline comments · Jan 22 2021, 7:21 AM

llvm/lib/CodeGen/MachineSink.cpp
473	Unless I am missing something, I don't think so; I think that is too conservative. For example, if we have a number of sink candidates and stop after e.g. the very first one because it can't be sunk for some reason, then we miss a lot of opportunities if that instruction is completely independent of the other candidates. Correct me if I am wrong, but I think what you describe is simply a case of not "IsSafeToMove": there is a dependency that we must respect and can't break. That's also why I walk the candidates in reverse order. Take a little def-use chain in a preheader with these 2 candidate instructions:
    %12:gpr64sp = nuw ADDXri %9, 12, 0
    %2:gpr64all = COPY %12
If we start by trying to move the ADD, it has a use in the same block, and we can't move the def past that. If we first move the COPY, then the ADD, or at least try that, we can move the partial or whole chain.
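The reverse-order argument can be checked on a toy model. This is a hypothetical standalone C++ sketch, not LLVM code: `ToyInst`, `hasLocalUse` and `sinkFromPreheader` are invented names, and the only legality rule modelled is that a def cannot be moved past a user that stays behind in the preheader.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy instruction: defines one virtual register, reads others.
struct ToyInst {
  std::string Name;
  int Def;
  std::vector<int> Uses;
};

// True if an instruction still in Block reads Reg.
static bool hasLocalUse(const std::vector<ToyInst> &Block, int Reg) {
  for (const ToyInst &I : Block)
    if (std::find(I.Uses.begin(), I.Uses.end(), Reg) != I.Uses.end())
      return true;
  return false;
}

// Try to move each candidate out of Preheader, either top-down or bottom-up.
// A candidate is skipped while its def is still read in the preheader.
// Returns the names of the sunk instructions, in the order they were sunk.
std::vector<std::string> sinkFromPreheader(std::vector<ToyInst> Preheader,
                                           bool Reverse) {
  std::vector<ToyInst> Order = Preheader;
  if (Reverse)
    std::reverse(Order.begin(), Order.end());

  std::vector<std::string> Sunk;
  for (const ToyInst &Cand : Order) {
    if (hasLocalUse(Preheader, Cand.Def))
      continue; // a user stays behind: can't move the def past it
    auto It = std::find_if(Preheader.begin(), Preheader.end(),
                           [&](const ToyInst &I) { return I.Name == Cand.Name; });
    assert(It != Preheader.end() && "candidate must still be in the preheader");
    Preheader.erase(It); // "sink" it into the loop
    Sunk.push_back(Cand.Name);
  }
  return Sunk;
}
```

For the ADDXri/COPY chain above, a top-down walk sinks only the COPY (the ADD is blocked by its still-present user), while a bottom-up walk sinks the COPY first and then the ADD, moving the whole chain.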

samparker added inline comments · Jan 25 2021, 4:14 AM

llvm/lib/CodeGen/MachineSink.cpp
473	I thought IsSafeToMove is just about side-effects and memory operations? I was thinking purely about the data dependency chain. In your copy example, what's stopping us from skipping the COPY but sinking the ADD? (I understand we'll always sink the COPY really, so let's pretend it's just another general consumer of ADD)

SjoerdMeijer added inline comments · Jan 25 2021, 5:27 AM

llvm/lib/CodeGen/MachineSink.cpp
473	Yeah, okay. In my examples and added tests, I am sinking only load instructions (I probably need to add some more tests). The extra argument `DontMoveAcrossStore` set to true makes this conservative. But you're right that for non-memory ops, this isn't sufficient. In D94308, I added a local `MachineSinking::IsSafeToMove` that adds alias checks. So, long story short, this NFC refactoring is bug-compatible with the original code. I thought separating this refactoring from any new changes would be best, to keep this change manageable and not too big. (Please note that this is off by default.) This still looks easiest to me, but if you think it is best, I will start merging D94308 into this; let me know.

samparker added inline comments · Jan 25 2021, 6:53 AM

llvm/lib/CodeGen/MachineSink.cpp
473	Well how about just adding the break then? I think it's too simple to ignore and would be best to start with a base change that might be correct.

SjoerdMeijer added inline comments · Jan 25 2021, 6:58 AM

llvm/lib/CodeGen/MachineSink.cpp
473	Yeah, okay, nice one, let's do that.

Thanks, this now:

breaks if we can't sink an instruction,
and I have removed test/CodeGen/X86/sink-cheap-instructions.ll as that has no value anymore; this has been integrated as a MIR test and keeping this IR test around doesn't add anything at this point. When this loop sinking starts doing something useful and correct, better target tests need to be added.
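Why the break matters can be sketched in the same toy style. In this hypothetical model (`ToyCand`, `sinkCandidates` and the `StopAtFirstFailure` flag are invented for illustration), `SafeToMove` stands for a check that, as discussed in the thread, looks at side effects and memory but not at data dependencies.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy candidate. SafeToMove stands in for a check that only
// looks at side effects and memory, not at data dependencies.
struct ToyCand {
  std::string Name;
  bool SafeToMove;
};

// Walk the preheader candidates bottom-up (last instruction first). With
// StopAtFirstFailure set, the walk ends at the first candidate that cannot be
// sunk, so no earlier instruction, which may feed that candidate, is moved
// below it into the loop.
std::vector<std::string> sinkCandidates(const std::vector<ToyCand> &Preheader,
                                        bool StopAtFirstFailure) {
  std::vector<std::string> Sunk;
  for (auto It = Preheader.rbegin(); It != Preheader.rend(); ++It) {
    if (!It->SafeToMove) {
      if (StopAtFirstFailure)
        break;
      continue;
    }
    Sunk.push_back(It->Name);
  }
  assert(Sunk.size() <= Preheader.size());
  return Sunk;
}
```

If the bottom candidate is a general consumer of an ADD and cannot be sunk, the walk without the break still sinks the ADD below its consumer; with the break nothing is moved, which is conservative but never breaks the def-use order.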

Herald added a subscriber: pengfei · Jan 25 2021, 8:09 AM

Just added a few more tests, trying to sink an add instruction: sink_add, store_after_add, and aliased_store_after_add.

The last 2 are negative tests and shouldn't sink anything past stores. The first, sink_add, is interesting because it shows the original sink algorithm and loop-sink running one after the other and working "together", which is the right thing to do here. This means that a load instruction from the entry block:

Sink instr %12:gpr32common = LDRWui %9:gpr64common, 0 :: (load 4 from %ir.read, !tbaa !0)
      into block bb.1.for.body.preheader:

is sunk into the preheader so that the whole chain ends up in the preheader:

bb.1.for.body.preheader:
    successors: %bb.3(0x80000000)
    %12:gpr32common = LDRWui %9:gpr64common, 0 :: (load 4 from %ir.read, !tbaa !0)
    %14:gpr32sp = ADDWri %12, 42, 0
    %1:gpr32all = COPY %14
    B %bb.3

These are then all considered for loop sinking (and will be sunk with follow-up patch D94308).

LGTM

This revision is now accepted and ready to land · Jan 27 2021, 12:46 AM

Many thanks @shchenz, @fhahn and @samparker for your help and reviews!

Closed by commit rG48ecba350ed6: [MachineLICM][MachineSink] Move SinkIntoLoop to MachineSink. (authored by SjoerdMeijer) · Jan 27 2021, 2:53 AM

This revision was automatically updated to reflect the committed changes.

SjoerdMeijer added a commit: rG48ecba350ed6: [MachineLICM][MachineSink] Move SinkIntoLoop to MachineSink.

Revision Contents

Path	Size
llvm/lib/CodeGen/MachineLICM.cpp	92 lines
llvm/lib/CodeGen/MachineSink.cpp	156 lines
llvm/test/CodeGen/AArch64/loop-sink.mir	1399 lines
llvm/test/CodeGen/X86/sink-cheap-instructions.ll
llvm/test/DebugInfo/MIR/X86/mlicm-sink.mir

Diff 319511

llvm/lib/CodeGen/MachineLICM.cpp

[63 unchanged lines not shown]	AvoidSpeculation("avoid-speculation",
cl::init(true), cl::Hidden);		cl::init(true), cl::Hidden);

static cl::opt<bool>		static cl::opt<bool>
HoistCheapInsts("hoist-cheap-insts",		HoistCheapInsts("hoist-cheap-insts",
cl::desc("MachineLICM should hoist even cheap instructions"),		cl::desc("MachineLICM should hoist even cheap instructions"),
cl::init(false), cl::Hidden);		cl::init(false), cl::Hidden);

static cl::opt<bool>		static cl::opt<bool>
SinkInstsToAvoidSpills("sink-insts-to-avoid-spills",
cl::desc("MachineLICM should sink instructions into "
"loops to avoid register spills"),
cl::init(false), cl::Hidden);
static cl::opt<bool>
HoistConstStores("hoist-const-stores",		HoistConstStores("hoist-const-stores",
cl::desc("Hoist invariant stores"),		cl::desc("Hoist invariant stores"),
cl::init(true), cl::Hidden);		cl::init(true), cl::Hidden);
// The default threshold of 100 (i.e. if target block is 100 times hotter)		// The default threshold of 100 (i.e. if target block is 100 times hotter)
// is based on empirical data on a single target and is subject to tuning.		// is based on empirical data on a single target and is subject to tuning.
static cl::opt<unsigned>		static cl::opt<unsigned>
BlockFrequencyRatioThreshold("block-freq-ratio-threshold",		BlockFrequencyRatioThreshold("block-freq-ratio-threshold",
cl::desc("Do not hoist instructions if target"		cl::desc("Do not hoist instructions if target"
[156 unchanged lines not shown]	private:

void ExitScopeIfDone(		void ExitScopeIfDone(
MachineDomTreeNode *Node,		MachineDomTreeNode *Node,
DenseMap<MachineDomTreeNode *, unsigned> &OpenChildren,		DenseMap<MachineDomTreeNode *, unsigned> &OpenChildren,
DenseMap<MachineDomTreeNode *, MachineDomTreeNode *> &ParentMap);		DenseMap<MachineDomTreeNode *, MachineDomTreeNode *> &ParentMap);

void HoistOutOfLoop(MachineDomTreeNode *HeaderN);		void HoistOutOfLoop(MachineDomTreeNode *HeaderN);

void SinkIntoLoop();

void InitRegPressure(MachineBasicBlock *BB);		void InitRegPressure(MachineBasicBlock *BB);

DenseMap<unsigned, int> calcRegisterCost(const MachineInstr *MI,		DenseMap<unsigned, int> calcRegisterCost(const MachineInstr *MI,
bool ConsiderSeen,		bool ConsiderSeen,
bool ConsiderUnseenAsDef);		bool ConsiderUnseenAsDef);

void UpdateRegPressure(const MachineInstr *MI,		void UpdateRegPressure(const MachineInstr *MI,
bool ConsiderUnseenAsDef = false);		bool ConsiderUnseenAsDef = false);
[131 unchanged lines not shown]	if (!PreRegAlloc)
HoistRegionPostRA();		HoistRegionPostRA();
else {		else {
// CSEMap is initialized for loop header when the first instruction is		// CSEMap is initialized for loop header when the first instruction is
// being hoisted.		// being hoisted.
MachineDomTreeNode *N = DT->getNode(CurLoop->getHeader());		MachineDomTreeNode *N = DT->getNode(CurLoop->getHeader());
FirstInLoop = true;		FirstInLoop = true;
HoistOutOfLoop(N);		HoistOutOfLoop(N);
CSEMap.clear();		CSEMap.clear();

if (SinkInstsToAvoidSpills)
SinkIntoLoop();
}		}
}		}

return Changed;		return Changed;
}		}

/// Return true if instruction stores to the specified frame.		/// Return true if instruction stores to the specified frame.
static bool InstructionStoresToFI(const MachineInstr *MI, int FI) {		static bool InstructionStoresToFI(const MachineInstr *MI, int FI) {
[373 unchanged lines not shown]	for (MachineBasicBlock::iterator
MII = NextMII;		MII = NextMII;
}		}

// If it's a leaf node, it's done. Traverse upwards to pop ancestors.		// If it's a leaf node, it's done. Traverse upwards to pop ancestors.
ExitScopeIfDone(Node, OpenChildren, ParentMap);		ExitScopeIfDone(Node, OpenChildren, ParentMap);
}		}
}		}

/// Sink instructions into loops if profitable. This especially tries to prevent
/// register spills caused by register pressure if there is little to no
/// overhead moving instructions into loops.
void MachineLICMBase::SinkIntoLoop() {
MachineBasicBlock *Preheader = getCurPreheader();
if (!Preheader)
return;

SmallVector<MachineInstr *, 8> Candidates;
for (MachineBasicBlock::instr_iterator I = Preheader->instr_begin();
I != Preheader->instr_end(); ++I) {
// We need to ensure that we can safely move this instruction into the loop.
// As such, it must not have side-effects, e.g. such as a call has.
LLVM_DEBUG(dbgs() << "LICM: Analysing sink candidate: " << *I);
if (IsLoopInvariantInst(I) && !HasLoopPHIUse(&I)) {
LLVM_DEBUG(dbgs() << "LICM: Added as sink candidate.\n");
Candidates.push_back(&*I);
continue;
}
LLVM_DEBUG(dbgs() << "LICM: Not added as sink candidate.\n");
}

for (MachineInstr *I : Candidates) {
const MachineOperand &MO = I->getOperand(0);
if (!MO.isDef() || !MO.isReg() || !MO.getReg())
continue;
if (!MRI->hasOneDef(MO.getReg()))
continue;
bool CanSink = true;
MachineBasicBlock *SinkBlock = nullptr;
LLVM_DEBUG(dbgs() << "LICM: Try sinking: " << *I);

for (MachineInstr &MI : MRI->use_instructions(MO.getReg())) {
LLVM_DEBUG(dbgs() << "LICM: Analysing use: "; MI.dump());
// FIXME: Come up with a proper cost model that estimates whether sinking
// the instruction (and thus possibly executing it on every loop
// iteration) is more expensive than a register.
// For now assumes that copies are cheap and thus almost always worth it.
if (!MI.isCopy()) {
CanSink = false;
break;
}
if (!SinkBlock) {
SinkBlock = MI.getParent();
LLVM_DEBUG(dbgs() << "LICM: Setting sink block to: "
<< printMBBReference(*SinkBlock) << "\n");
continue;
}
SinkBlock = DT->findNearestCommonDominator(SinkBlock, MI.getParent());
if (!SinkBlock) {
LLVM_DEBUG(dbgs() << "LICM: Can't find nearest dominator\n");
CanSink = false;
break;
}
LLVM_DEBUG(dbgs() << "LICM: Setting nearest common dom block: " <<
printMBBReference(*SinkBlock) << "\n");
}
if (!CanSink) {
LLVM_DEBUG(dbgs() << "LICM: Can't sink instruction.\n");
continue;
}
if (!SinkBlock) {
LLVM_DEBUG(dbgs() << "LICM: Not sinking, can't find sink block.\n");
continue;
}
if (SinkBlock == Preheader) {
LLVM_DEBUG(dbgs() << "LICM: Not sinking, sink block is the preheader\n");
continue;
}

LLVM_DEBUG(dbgs() << "LICM: Sinking to " << printMBBReference(*SinkBlock)
<< " from " << printMBBReference(*I->getParent())
<< ": " << *I);
SinkBlock->splice(SinkBlock->getFirstNonPHI(), Preheader, I);

// The instruction is moved from its basic block, so do not retain the
// debug information.
assert(!I->isDebugInstr() && "Should not sink debug inst");
I->setDebugLoc(DebugLoc());
}
}
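The sink-block selection in the removed SinkIntoLoop above takes the nearest common dominator of all use blocks and gives up if that block is the preheader. The following standalone sketch models this with a toy dominator tree given as an immediate-dominator array; `ToyDomTree`, `chooseSinkBlock` and the block numbering are hypothetical and are not LLVM's MachineDominatorTree API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy dominator tree over blocks numbered 0..N-1.
// IDom[B] is the immediate dominator of block B (the entry block is its own
// immediate dominator) and Depth[B] is B's depth in the dominator tree.
struct ToyDomTree {
  std::vector<int> IDom;
  std::vector<int> Depth;

  // Repeatedly move the deeper of the two blocks to its immediate dominator;
  // the first block where both walks meet dominates A and B and is the
  // nearest such block.
  int nearestCommonDominator(int A, int B) const {
    assert(A >= 0 && B >= 0 && "expected valid block numbers");
    while (A != B) {
      if (Depth[A] >= Depth[B])
        A = IDom[A];
      else
        B = IDom[B];
    }
    return A;
  }
};

// Mirrors the shape of the sink-block selection: combine the blocks of all
// uses via their nearest common dominator, and give up (-1) if there are no
// uses or if the result is the preheader itself.
int chooseSinkBlock(const ToyDomTree &DT, const std::vector<int> &UseBlocks,
                    int Preheader) {
  int SinkBlock = -1;
  for (int UB : UseBlocks)
    SinkBlock = SinkBlock < 0 ? UB : DT.nearestCommonDominator(SinkBlock, UB);
  if (SinkBlock == Preheader)
    return -1;
  return SinkBlock;
}
```

For a CFG where block 1 is the preheader, block 2 the loop header, and blocks 3 and 4 two arms of the loop body immediately dominated by the header, uses in blocks 3 and 4 give sink block 2. A use in the preheader itself pulls the result back to the preheader, so nothing is sunk.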

static bool isOperandKill(const MachineOperand &MO, MachineRegisterInfo *MRI) {		static bool isOperandKill(const MachineOperand &MO, MachineRegisterInfo *MRI) {
return MO.isKill() || MRI->hasOneNonDBGUse(MO.getReg());		return MO.isKill() || MRI->hasOneNonDBGUse(MO.getReg());
}		}

/// Find all virtual register references that are liveout of the preheader to		/// Find all virtual register references that are liveout of the preheader to
/// initialize the starting "register pressure". Note this does not count live		/// initialize the starting "register pressure". Note this does not count live
/// through (livein but not used) registers.		/// through (livein but not used) registers.
void MachineLICMBase::InitRegPressure(MachineBasicBlock *BB) {		void MachineLICMBase::InitRegPressure(MachineBasicBlock *BB) {
std::fill(RegPressure.begin(), RegPressure.end(), 0);		std::fill(RegPressure.begin(), RegPressure.end(), 0);
[720 unchanged lines not shown]

llvm/lib/CodeGen/MachineSink.cpp

[77 unchanged lines not shown]	cl::desc(
"speculative execution of up to 1 instruction to avoid branching to "		"speculative execution of up to 1 instruction to avoid branching to "
"splitted critical edge"),		"splitted critical edge"),
cl::init(40), cl::Hidden);		cl::init(40), cl::Hidden);

static cl::opt<unsigned> SinkLoadInstsPerBlockThreshold(		static cl::opt<unsigned> SinkLoadInstsPerBlockThreshold(
"machine-sink-load-instrs-threshold",		"machine-sink-load-instrs-threshold",
cl::desc("Do not try to find alias store for a load if there is a in-path "		cl::desc("Do not try to find alias store for a load if there is a in-path "
"block whose instruction number is higher than this threshold."),		"block whose instruction number is higher than this threshold."),
cl::init(2000), cl::Hidden);		cl::init(2000), cl::Hidden);

static cl::opt<unsigned> SinkLoadBlocksThreshold(		static cl::opt<unsigned> SinkLoadBlocksThreshold(
"machine-sink-load-blocks-threshold",		"machine-sink-load-blocks-threshold",
cl::desc("Do not try to find alias store for a load if the block number in "		cl::desc("Do not try to find alias store for a load if the block number in "
"the straight line is higher than this threshold."),		"the straight line is higher than this threshold."),
cl::init(20), cl::Hidden);		cl::init(20), cl::Hidden);

		static cl::opt<bool>
		SinkInstsIntoLoop("sink-insts-to-avoid-spills",
		cl::desc("Sink instructions into loops to avoid "
		"register spills"),
		cl::init(false), cl::Hidden);

STATISTIC(NumSunk, "Number of machine instructions sunk");		STATISTIC(NumSunk, "Number of machine instructions sunk");
		STATISTIC(NumLoopSunk, "Number of machine instructions sunk into a loop");
STATISTIC(NumSplit, "Number of critical edges split");		STATISTIC(NumSplit, "Number of critical edges split");
STATISTIC(NumCoalesces, "Number of copies coalesced");		STATISTIC(NumCoalesces, "Number of copies coalesced");
STATISTIC(NumPostRACopySink, "Number of copies sunk after RA");		STATISTIC(NumPostRACopySink, "Number of copies sunk after RA");

namespace {		namespace {

class MachineSinking : public MachineFunctionPass {		class MachineSinking : public MachineFunctionPass {
const TargetInstrInfo *TII;		const TargetInstrInfo *TII;
[108 unchanged lines not shown]	private:
/// to the copy source.		/// to the copy source.
void SalvageUnsunkDebugUsersOfCopy(MachineInstr &,		void SalvageUnsunkDebugUsersOfCopy(MachineInstr &,
MachineBasicBlock *TargetBlock);		MachineBasicBlock *TargetBlock);
bool AllUsesDominatedByBlock(Register Reg, MachineBasicBlock *MBB,		bool AllUsesDominatedByBlock(Register Reg, MachineBasicBlock *MBB,
MachineBasicBlock *DefMBB, bool &BreakPHIEdge,		MachineBasicBlock *DefMBB, bool &BreakPHIEdge,
bool &LocalUse) const;		bool &LocalUse) const;
MachineBasicBlock *FindSuccToSinkTo(MachineInstr &MI, MachineBasicBlock *MBB,		MachineBasicBlock *FindSuccToSinkTo(MachineInstr &MI, MachineBasicBlock *MBB,
bool &BreakPHIEdge, AllSuccsCache &AllSuccessors);		bool &BreakPHIEdge, AllSuccsCache &AllSuccessors);

		void FindLoopSinkCandidates(MachineLoop *L, MachineBasicBlock *BB,
		SmallVectorImpl<MachineInstr *> &Candidates);
		bool SinkIntoLoop(MachineLoop *L, MachineInstr &I);

bool isProfitableToSinkTo(Register Reg, MachineInstr &MI,		bool isProfitableToSinkTo(Register Reg, MachineInstr &MI,
MachineBasicBlock *MBB,		MachineBasicBlock *MBB,
MachineBasicBlock *SuccToSinkTo,		MachineBasicBlock *SuccToSinkTo,
AllSuccsCache &AllSuccessors);		AllSuccsCache &AllSuccessors);

bool PerformTrivialForwardCoalescing(MachineInstr &MI,		bool PerformTrivialForwardCoalescing(MachineInstr &MI,
MachineBasicBlock *MBB);		MachineBasicBlock *MBB);

[108 unchanged lines not shown]	for (MachineOperand &MO : MRI->use_nodbg_operands(Reg)) {
// Check that it dominates.		// Check that it dominates.
if (!DT->dominates(MBB, UseBlock))		if (!DT->dominates(MBB, UseBlock))
return false;		return false;
}		}

return true;		return true;
}		}

		/// Return true if this machine instruction loads from global offset table or
		/// constant pool.
		static bool mayLoadFromGOTOrConstantPool(MachineInstr &MI) {
		assert(MI.mayLoad() && "Expected MI that loads!");

		// If we lost memory operands, conservatively assume that the instruction
		// reads from everything..
		if (MI.memoperands_empty())
		return true;

		for (MachineMemOperand *MemOp : MI.memoperands())
		if (const PseudoSourceValue *PSV = MemOp->getPseudoValue())
		if (PSV->isGOT() || PSV->isConstantPool())
		return true;

		return false;
		}

		void MachineSinking::FindLoopSinkCandidates(MachineLoop *L, MachineBasicBlock *BB,
		SmallVectorImpl<MachineInstr *> &Candidates) {
		samparker: SmallVectorImpl<MachineInstr*>&
		for (auto &MI : *BB) {
		LLVM_DEBUG(dbgs() << "LoopSink: Analysing candidate: " << MI);
		shchenz: Should we call `TII->shouldSink()` at the very beginning? Some targets may not want to sink some instructions.
		SjoerdMeijer (author): Thanks, agreed.
		if (!TII->shouldSink(MI)) {
		LLVM_DEBUG(dbgs() << "LoopSink: Instruction not a candidate for this "
		"target\n");
		continue;
		}
		if (!L->isLoopInvariant(MI)) {
		LLVM_DEBUG(dbgs() << "LoopSink: Instruction is not loop invariant\n");
		continue;
		shchenz: Maybe I am wrong, but for sinking, we must sink an instruction to a block that is dominated by the instruction's parent block. So if the destination block is executed, BB must also be executed. Do we need the check for `IsGuaranteedToExecute`? I think this is not the same as hoisting. For hoisting, we must make sure the load is guaranteed to execute, because it will be hoisted to the loop preheader, which always executes.
		SjoerdMeijer (author): Yeah, good catch. We are sinking from `preheader` to a loop block `BB`, and `preheader` dominates `BB`. I agree that we don't need this check; this is indeed different for hoisting and sinking. I will remove it.
		}
		bool DontMoveAcrossStore = true;
		if (!MI.isSafeToMove(AA, DontMoveAcrossStore)) {
		LLVM_DEBUG(dbgs() << "LoopSink: Instruction not safe to move.\n");
		continue;
		}
		if (MI.mayLoad() && !mayLoadFromGOTOrConstantPool(MI)) {
		LLVM_DEBUG(dbgs() << "LoopSink: Dont sink GOT or constant pool loads\n");
		continue;
		}
		shchenz: This may be unnecessary, as all the instructions come from the loop preheader?
		SjoerdMeijer (author): I agreed and changed this into an assert, which seemed like a good precondition to assert. But this new assert triggered for one of the test cases where the preheader is the entry block, which contains COPY instructions of the function arguments that use physical registers. Physical registers are marked as not loop invariant, which is why the assert triggered. So I think I will just keep this for now.
		if (MI.isConvergent())
		continue;

		const MachineOperand &MO = MI.getOperand(0);
		samparker: What are we doing these checks for?
		SjoerdMeijer (author): I think this is being conservative: an instruction with multiple defs is trickier to analyse, so we skip it rather than sink it. But I haven't looked much into this (I just moved the code), so I will need to do that.
		if (!MO.isReg() || !MO.getReg() || !MO.isDef())
		continue;
		if (!MRI->hasOneDef(MO.getReg()))
		continue;

		LLVM_DEBUG(dbgs() << "LoopSink: Instruction added as candidate.\n");
		Candidates.push_back(&MI);
		}
		}
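To make the order of the filters in `FindLoopSinkCandidates` easier to follow, here is a minimal standalone C++17 sketch. `InstModel`, `isLoopSinkCandidate` and `findLoopSinkCandidates` are hypothetical stand-ins, not LLVM API: each boolean field replaces one of the `MachineInstr`/`MachineRegisterInfo` queries made by the patch, and the checks are applied in the same order. Note the load filter: a load stays a candidate only if `mayLoadFromGOTOrConstantPool` holds for it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the MachineInstr properties queried by
// FindLoopSinkCandidates. The defaults describe an instruction that passes
// every check.
struct InstModel {
  std::string Name;
  bool TargetShouldSink = true;      // TII->shouldSink(MI)
  bool LoopInvariant = true;         // L->isLoopInvariant(MI)
  bool SafeToMove = true;            // MI.isSafeToMove(AA, DontMoveAcrossStore)
  bool MayLoad = false;              // MI.mayLoad()
  bool LoadsGOTOrConstPool = false;  // mayLoadFromGOTOrConstantPool(MI)
  bool Convergent = false;           // MI.isConvergent()
  bool Op0IsRegDef = true;           // operand 0 is a def of a non-zero register
  bool RegHasOneDef = true;          // MRI->hasOneDef(Reg)
};

// Applies the filters in the order used by the patch; the first failing
// check rejects the instruction.
bool isLoopSinkCandidate(const InstModel &MI) {
  if (!MI.TargetShouldSink || !MI.LoopInvariant || !MI.SafeToMove)
    return false;
  // Loads are only kept if they read from the GOT or the constant pool.
  if (MI.MayLoad && !MI.LoadsGOTOrConstPool)
    return false;
  if (MI.Convergent)
    return false;
  return MI.Op0IsRegDef && MI.RegHasOneDef;
}

// Collects candidate names in preheader order, mirroring how
// FindLoopSinkCandidates appends to Candidates.
std::vector<std::string>
findLoopSinkCandidates(const std::vector<InstModel> &Preheader) {
  std::vector<std::string> Candidates;
  for (const InstModel &MI : Preheader)
    if (isLoopSinkCandidate(MI))
      Candidates.push_back(MI.Name);
  return Candidates;
}
```

For example, a COPY of a physical argument register in an entry-block preheader (the case that tripped the assert discussed above) is modeled with `LoopInvariant = false` and is filtered out before any of the later checks run.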

bool MachineSinking::runOnMachineFunction(MachineFunction &MF) {
if (skipFunction(MF.getFunction()))
return false;

LLVM_DEBUG(dbgs() << "****** Machine Sinking ******\n");

TII = MF.getSubtarget().getInstrInfo();
TRI = MF.getSubtarget().getRegisterInfo();
[33 lines not shown; context: for (auto &Pair : ToSplit) {]
} else
LLVM_DEBUG(dbgs() << " *** Not legal to break critical edge\n");
}
// If this iteration over the code changed anything, keep iterating.
if (!MadeChange) break;
EverMadeChange = true;
}

		if (SinkInstsIntoLoop) {
		SmallVector<MachineLoop *, 8> Loops(LI->begin(), LI->end());
		for (auto *L : Loops) {
		MachineBasicBlock *Preheader = LI->findLoopPreheader(L);
		if (!Preheader) {
		LLVM_DEBUG(dbgs() << "LoopSink: Can't find preheader\n");
		continue;
		}
		SmallVector<MachineInstr *, 8> Candidates;
		FindLoopSinkCandidates(L, Preheader, Candidates);

		// Walk the candidates in reverse order so that we start with the use
		// of a def-use chain, if there is any.
		for (auto It = Candidates.rbegin(); It != Candidates.rend(); ++It) {
		MachineInstr *I = *It;
		if (!SinkIntoLoop(L, *I))
		samparker: Do we need to break here as soon as SinkIntoLoop returns false? Otherwise, couldn't we sink an 'earlier' instruction that is an input to an instruction that hasn't been sunk?
		SjoerdMeijer (author): Unless I am missing something, I don't think so; I think that is too conservative. For example, if we have a number of sink candidates and stop after the very first one because it can't be sunk for some reason, we miss a lot of opportunities when that instruction is completely independent of the other candidates. Correct me if I am wrong, but I think what you describe is simply a case of not "IsSafeToMove": there is a dependency that we must respect and can't break. That is also why I walk the candidates in reverse order. For a little def-use chain in a preheader, such as these two candidate instructions: `%12:gpr64sp = nuw ADDXri %9, 12, 0` and `%2:gpr64all = COPY %12`, if we start by trying to move the ADD, it has a use in the same block, and we can't move the def past that. If we first move the COPY and then the ADD, or at least try to, we can move part or all of the chain.
		samparker: I thought IsSafeToMove is just about side effects and memory operations? I was thinking purely about the data-dependency chain. In your copy example, what stops us from skipping the COPY but sinking the ADD? (I understand we'll always sink the COPY in practice, so let's pretend it's just another general consumer of the ADD.)
		SjoerdMeijer (author): Yeah, okay. In my examples and added tests, I am sinking only load instructions (I probably need to add some more tests). The extra argument `DontMoveAcrossStore` set to true makes this conservative. But you're right that for non-memory ops this isn't sufficient. In D94308, I added a local `MachineSinking::IsSafeToMove` that adds alias checks. So, long story short, this NFC refactoring is bug-compatible with the original code. I thought separating this refactoring from any new changes would be best, to keep this change manageable and not too big. (Please note that this is off by default.) This still looks easiest to me, but if you think it is best, I will start merging D94308 into this; let me know.
		samparker: Well, how about just adding the break then? I think it's too simple to ignore, and it would be best to start with a base change that might be correct.
		SjoerdMeijer (author): Yeah, okay, nice one, let's do that.
		break;
		EverMadeChange = true;
		++NumLoopSunk;
		}
		}
		}

HasStoreCache.clear();
StoreInstrCache.clear();

// Now clear any kill flags for recorded registers.
for (auto I : RegsToClearKillFlags)
MRI->clearKillFlags(I);
RegsToClearKillFlags.clear();

[693 lines not shown; context: for (MachineBasicBlock *BB : depth_first(From)) {]
}
}
// If there is no store at all, cache the result.
if (!SawStore)
HasStoreCache[BlockPair] = false;
return HasAliasedStore;
}
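The reverse walk with the early `break` agreed on in the review thread in `runOnMachineFunction` can be modeled in a few lines. This is a sketch under simplifying assumptions: `Candidate` and `sinkInReverse` are hypothetical names, and `Sinkable` stands in for the result of `SinkIntoLoop`. It replays the ADD/COPY chain from the discussion: the COPY is tried first, and if it cannot be sunk, the `break` keeps the ADD in the preheader as well.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sink candidate: its name and whether SinkIntoLoop would
// succeed for it.
struct Candidate {
  std::string Name;
  bool Sinkable;
};

// Walks the candidates from last to first, so the use in a def-use chain is
// tried before its def, and stops at the first failure so that no earlier
// def is sunk while a later candidate stays in the preheader.
std::vector<std::string> sinkInReverse(const std::vector<Candidate> &Cands) {
  std::vector<std::string> Sunk;
  for (auto It = Cands.rbegin(); It != Cands.rend(); ++It) {
    if (!It->Sinkable)
      break;
    Sunk.push_back(It->Name);
  }
  return Sunk;
}
```

The trade-off is the one raised in the thread: a failing candidate also stops earlier, possibly independent, candidates from being tried, which is conservative but avoids sinking a def above a user that stays behind.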

		/// Sink instructions into loops if profitable. This especially tries to prevent
		/// register spills caused by register pressure if there is little to no
		/// overhead moving instructions into loops.
		bool MachineSinking::SinkIntoLoop(MachineLoop *L, MachineInstr &I) {
		LLVM_DEBUG(dbgs() << "LoopSink: Finding sink block for: " << I);
		MachineBasicBlock *Preheader = L->getLoopPreheader();
		assert(Preheader && "Loop sink needs a preheader block");
		MachineBasicBlock *SinkBlock = nullptr;
		bool CanSink = true;
		const MachineOperand &MO = I.getOperand(0);

		for (MachineInstr &MI : MRI->use_instructions(MO.getReg())) {
		LLVM_DEBUG(dbgs() << "LoopSink: Analysing use: " << MI);
		if (!L->contains(&MI)) {
		LLVM_DEBUG(dbgs() << "LoopSink: Use not in loop, can't sink.\n");
		CanSink = false;
		break;
		}

		// FIXME: Come up with a proper cost model that estimates whether sinking
		// the instruction (and thus possibly executing it on every loop
		// iteration) is more expensive than a register.
		// For now assumes that copies are cheap and thus almost always worth it.
		if (!MI.isCopy()) {
		LLVM_DEBUG(dbgs() << "LoopSink: Use is not a copy\n");
		CanSink = false;
		break;
		samparker: So how do we know that MI is in a loop?
		SjoerdMeijer (author): Yeah, good point. I think the only simple answer here is that at this point we don't. Will look into this.
		}
		if (!SinkBlock) {
		SinkBlock = MI.getParent();
		shchenz: This seems strange: MI should be a user inside the loop, so why must its parent block be equal to the loop preheader?
		SjoerdMeijer (author): I thought I had a reason, but it doesn't seem to make sense, so I will remove that part of the condition.
		LLVM_DEBUG(dbgs() << "LoopSink: Setting sink block to: "
		<< printMBBReference(*SinkBlock) << "\n");
		continue;
		}
		shchenz: Is this necessary? Assuming MI is a loop user, how does a loop COPY user affect the cost of sinking instruction `I` into the loop? `I` could still be any instruction, such as a load.
		SjoerdMeijer (author): You're absolutely right again: this could be any instruction, so the check is not necessary. But in the absence of any cost modeling or other decision making (based on register pressure), the original authors of this code found it always profitable to move the instruction if the user is a copy. So yes, this is exactly the first thing we will change when we integrate `MachineSinking::isProfitableToSinkTo` into this work in a next step.
		SinkBlock = DT->findNearestCommonDominator(SinkBlock, MI.getParent());
		if (!SinkBlock) {
		LLVM_DEBUG(dbgs() << "LoopSink: Can't find nearest dominator\n");
		CanSink = false;
		break;
		}
		LLVM_DEBUG(dbgs() << "LoopSink: Setting nearest common dom block: " <<
		printMBBReference(*SinkBlock) << "\n");
		}

		if (!CanSink) {
		LLVM_DEBUG(dbgs() << "LoopSink: Can't sink instruction.\n");
		return false;
		}
		if (!SinkBlock) {
		LLVM_DEBUG(dbgs() << "LoopSink: Not sinking, can't find sink block.\n");
		return false;
		}
		if (SinkBlock == Preheader) {
		LLVM_DEBUG(dbgs() << "LoopSink: Not sinking, sink block is the preheader\n");
		return false;
		}

		LLVM_DEBUG(dbgs() << "LoopSink: Sinking instruction!\n");
		SinkBlock->splice(SinkBlock->getFirstNonPHI(), Preheader, I);

		// The instruction is moved from its basic block, so do not retain the
		// debug information.
		assert(!I.isDebugInstr() && "Should not sink debug inst");
		I.setDebugLoc(DebugLoc());
		return true;
		}
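The sink-block selection in `SinkIntoLoop` reduces to three rules: every use must be a COPY inside the loop, the sink block is the nearest common dominator of the use blocks, and the result must differ from the preheader. Below is a minimal C++17 sketch, assuming a hypothetical dominator tree stored as immediate-dominator and depth arrays; `DomTreeModel`, `UseModel` and `findSinkBlock` are illustrative names, not LLVM API. In the patch itself the common-dominator query is `DT->findNearestCommonDominator`.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical dominator tree: IDom[B] is the immediate dominator of block B
// (the entry block is its own IDom) and Depth[B] is B's depth in the tree.
struct DomTreeModel {
  std::vector<int> IDom;
  std::vector<int> Depth;

  // Lift the deeper block until both are at the same depth, then lift both
  // until they meet.
  int nearestCommonDominator(int A, int B) const {
    while (Depth[A] > Depth[B])
      A = IDom[A];
    while (Depth[B] > Depth[A])
      B = IDom[B];
    while (A != B) {
      A = IDom[A];
      B = IDom[B];
    }
    return A;
  }
};

// Hypothetical use of the candidate's def.
struct UseModel {
  int Block;   // block containing the use
  bool InLoop; // models L->contains(&MI)
  bool IsCopy; // models MI.isCopy()
};

// Returns the block to sink into, or std::nullopt where SinkIntoLoop would
// return false.
std::optional<int> findSinkBlock(const DomTreeModel &DT,
                                 const std::vector<UseModel> &Uses,
                                 int Preheader) {
  std::optional<int> SinkBlock;
  for (const UseModel &U : Uses) {
    if (!U.InLoop || !U.IsCopy)
      return std::nullopt;
    SinkBlock = SinkBlock ? DT.nearestCommonDominator(*SinkBlock, U.Block)
                          : U.Block;
  }
  if (!SinkBlock || *SinkBlock == Preheader)
    return std::nullopt;
  return SinkBlock;
}
```

Unlike `findNearestCommonDominator`, which can return null (the patch checks for that), this model always reaches the entry block, so that failure path is not represented.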

/// SinkInstruction - Determine whether it is safe to sink the specified machine
/// instruction out of its current block into a successor.
bool MachineSinking::SinkInstruction(MachineInstr &MI, bool &SawStore,
AllSuccsCache &AllSuccessors) {
// Don't sink instructions that the target prefers not to sink.
if (!TII->shouldSink(MI))
return false;

[522 lines not shown]

llvm/test/CodeGen/AArch64/loop-sink.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -mtriple aarch64 -run-pass=machine-sink -sink-insts-to-avoid-spills %s -o - 2>&1 | FileCheck %s
				--- |
				target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64"

				@A = external dso_local global [100 x i32], align 4
				%struct.A = type { i32, i32, i32, i32, i32, i32 }

				define void @cant_sink_adds_call_in_block(i8* nocapture readonly %input, %struct.A* %a) {
				%1 = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 1
				%2 = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 2
				%3 = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 3
				%4 = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 4
				%5 = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 5
				%scevgep = getelementptr i8, i8* %input, i64 1
				br label %.backedge

				.backedge: ; preds = %.backedge.backedge, %0
				%lsr.iv = phi i8* [ %scevgep1, %.backedge.backedge ], [ %scevgep, %0 ]
				%6 = load i8, i8* %lsr.iv, align 1
				%7 = zext i8 %6 to i32
				switch i32 %7, label %.backedge.backedge [
				i32 0, label %8
				i32 10, label %10
				i32 20, label %11
				i32 30, label %12
				i32 40, label %13
				i32 50, label %14
				]

				8: ; preds = %.backedge
				%9 = bitcast %struct.A* %a to i32*
				tail call void @_Z6assignPj(i32* %9)
				br label %.backedge.backedge

				10: ; preds = %.backedge
				tail call void @_Z6assignPj(i32* %1)
				br label %.backedge.backedge

				11: ; preds = %.backedge
				tail call void @_Z6assignPj(i32* %2)
				br label %.backedge.backedge

				12: ; preds = %.backedge
				tail call void @_Z6assignPj(i32* %3)
				br label %.backedge.backedge

				13: ; preds = %.backedge
				tail call void @_Z6assignPj(i32* %4)
				br label %.backedge.backedge

				14: ; preds = %.backedge
				tail call void @_Z6assignPj(i32* %5)
				br label %.backedge.backedge

				.backedge.backedge: ; preds = %14, %13, %12, %11, %10, %8, %.backedge
				%scevgep1 = getelementptr i8, i8* %lsr.iv, i64 1
				br label %.backedge
				}

				define i32 @load_not_safe_to_move_consecutive_call(i32 %n) {
				entry:
				%cmp63 = icmp sgt i32 %n, 0
				br i1 %cmp63, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader: ; preds = %entry
				%0 = load i32, i32* getelementptr inbounds ([100 x i32], [100 x i32]* @A, i64 0, i64 0), align 4
				%call0 = tail call i32 @use(i32 %n)
				br label %for.body

				for.cond.cleanup: ; preds = %for.body, %entry
				%sum.0.lcssa = phi i32 [ %n, %entry ], [ %div, %for.body ]
				ret i32 %sum.0.lcssa

				for.body: ; preds = %for.body, %for.body.preheader
				%lsr.iv = phi i32 [ %n, %for.body.preheader ], [ %lsr.iv.next, %for.body ]
				%sum.065 = phi i32 [ %div, %for.body ], [ %n, %for.body.preheader ]
				%div = sdiv i32 %sum.065, %0
				%lsr.iv.next = add i32 %lsr.iv, -1
				%exitcond.not = icmp eq i32 %lsr.iv.next, 0
				br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
				}

				define i32 @load_not_safe_to_move_consecutive_call_use(i32 %n) {
				entry:
				%cmp63 = icmp sgt i32 %n, 0
				br i1 %cmp63, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader: ; preds = %entry
				%0 = load i32, i32* getelementptr inbounds ([100 x i32], [100 x i32]* @A, i64 0, i64 0), align 4
				%call0 = tail call i32 @use(i32 %0)
				br label %for.body

				for.cond.cleanup: ; preds = %for.body, %entry
				%sum.0.lcssa = phi i32 [ %n, %entry ], [ %div, %for.body ]
				ret i32 %sum.0.lcssa

				for.body: ; preds = %for.body, %for.body.preheader
				%lsr.iv = phi i32 [ %n, %for.body.preheader ], [ %lsr.iv.next, %for.body ]
				%sum.065 = phi i32 [ %div, %for.body ], [ %n, %for.body.preheader ]
				%div = sdiv i32 %sum.065, %0
				%lsr.iv.next = add i32 %lsr.iv, -1
				%exitcond.not = icmp eq i32 %lsr.iv.next, 0
				br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
				}

				define i32 @cant_sink_use_outside_loop(i32 %n) {
				entry:
				%cmp63 = icmp sgt i32 %n, 0
				br i1 %cmp63, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader: ; preds = %entry
				%0 = load i32, i32* getelementptr inbounds ([100 x i32], [100 x i32]* @A, i64 0, i64 0), align 4
				br label %for.body

				for.cond.cleanup: ; preds = %for.body, %entry
				%sum.0.lcssa = phi i32 [ %n, %entry ], [ %div, %for.body ]
				%use.outside.loop = phi i32 [ 0, %entry ], [ %0, %for.body ]
				%call = tail call i32 @use(i32 %use.outside.loop)
				ret i32 %sum.0.lcssa

				for.body: ; preds = %for.body, %for.body.preheader
				%lsr.iv = phi i32 [ %n, %for.body.preheader ], [ %lsr.iv.next, %for.body ]
				%sum.065 = phi i32 [ %div, %for.body ], [ %n, %for.body.preheader ]
				%div = sdiv i32 %sum.065, %sum.065
				%lsr.iv.next = add i32 %lsr.iv, -1
				%exitcond.not = icmp eq i32 %lsr.iv.next, 0
				br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
				}

				define i32 @use_is_not_a_copy(i32 %n) {
				entry:
				%cmp63 = icmp sgt i32 %n, 0
				br i1 %cmp63, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader: ; preds = %entry
				%0 = load i32, i32* getelementptr inbounds ([100 x i32], [100 x i32]* @A, i64 0, i64 0), align 4
				br label %for.body

				for.cond.cleanup: ; preds = %for.body, %entry
				%sum.0.lcssa = phi i32 [ %n, %entry ], [ %div, %for.body ]
				ret i32 %sum.0.lcssa

				for.body: ; preds = %for.body, %for.body.preheader
				%lsr.iv = phi i32 [ %n, %for.body.preheader ], [ %lsr.iv.next, %for.body ]
				%sum.065 = phi i32 [ %div, %for.body ], [ %n, %for.body.preheader ]
				%div = sdiv i32 %sum.065, %0
				%lsr.iv.next = add i32 %lsr.iv, -1
				%exitcond.not = icmp eq i32 %lsr.iv.next, 0
				br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
				}

				define dso_local void @sink_add(i32* noalias nocapture readonly %read, i32* noalias nocapture %write, i32 %n) local_unnamed_addr #0 {
				entry:
				%0 = load i32, i32* %read, align 4, !tbaa !6
				%cmp10 = icmp sgt i32 %n, 0
				br i1 %cmp10, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader: ; preds = %entry
				%1 = add i32 %0, 42
				br label %for.body

				for.cond.cleanup: ; preds = %for.body, %entry
				%sum.0.lcssa = phi i32 [ %n, %entry ], [ %div, %for.body ]
				store i32 %sum.0.lcssa, i32* %write, align 4, !tbaa !6
				ret void

				for.body: ; preds = %for.body.preheader, %for.body
				%lsr.iv1 = phi i32 [ %1, %for.body.preheader ], [ %lsr.iv.next2, %for.body ]
				%lsr.iv = phi i32 [ %n, %for.body.preheader ], [ %lsr.iv.next, %for.body ]
				%sum.011 = phi i32 [ %div, %for.body ], [ %n, %for.body.preheader ]
				%div = sdiv i32 %sum.011, %lsr.iv1
				%lsr.iv.next = add i32 %lsr.iv, -1
				%lsr.iv.next2 = add i32 %lsr.iv1, 1
				%exitcond.not = icmp eq i32 %lsr.iv.next, 0
				br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
				}

				define dso_local void @store_after_add(i32* noalias nocapture readonly %read, i32* noalias nocapture %write, i32* nocapture %store, i32 %n) local_unnamed_addr #0 {
				entry:
				%0 = load i32, i32* %read, align 4, !tbaa !6
				%cmp10 = icmp sgt i32 %n, 0
				br i1 %cmp10, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader: ; preds = %entry
				%1 = add i32 %0, 42
				store i32 43, i32* %store, align 4, !tbaa !6
				br label %for.body

				for.cond.cleanup: ; preds = %for.body, %entry
				%sum.0.lcssa = phi i32 [ %n, %entry ], [ %div, %for.body ]
				store i32 %sum.0.lcssa, i32* %write, align 4, !tbaa !6
				ret void

				for.body: ; preds = %for.body.preheader, %for.body
				%lsr.iv1 = phi i32 [ %1, %for.body.preheader ], [ %lsr.iv.next2, %for.body ]
				%lsr.iv = phi i32 [ %n, %for.body.preheader ], [ %lsr.iv.next, %for.body ]
				%sum.011 = phi i32 [ %div, %for.body ], [ %n, %for.body.preheader ]
				%div = sdiv i32 %sum.011, %lsr.iv1
				%lsr.iv.next = add i32 %lsr.iv, -1
				%lsr.iv.next2 = add i32 %lsr.iv1, 1
				%exitcond.not = icmp eq i32 %lsr.iv.next, 0
				br i1 %exitcond.not, label %for.cond.cleanup, label %for.body, !llvm.loop !10
				}

				define dso_local void @aliased_store_after_add(i32* noalias nocapture readonly %read, i32* noalias nocapture %write, i32* nocapture %store, i32 %n) local_unnamed_addr #0 {
				entry:
				%0 = load i32, i32* %read, align 4, !tbaa !6
				%cmp10 = icmp sgt i32 %n, 0
				br i1 %cmp10, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader: ; preds = %entry
				%1 = add i32 %0, 42
				store i32 43, i32* %read, align 4, !tbaa !6
				br label %for.body

				for.cond.cleanup: ; preds = %for.body, %entry
				%sum.0.lcssa = phi i32 [ %n, %entry ], [ %div, %for.body ]
				store i32 %sum.0.lcssa, i32* %write, align 4, !tbaa !6
				ret void

				for.body: ; preds = %for.body.preheader, %for.body
				%lsr.iv1 = phi i32 [ %1, %for.body.preheader ], [ %lsr.iv.next2, %for.body ]
				%lsr.iv = phi i32 [ %n, %for.body.preheader ], [ %lsr.iv.next, %for.body ]
				%sum.011 = phi i32 [ %div, %for.body ], [ %n, %for.body.preheader ]
				%div = sdiv i32 %sum.011, %lsr.iv1
				%lsr.iv.next = add i32 %lsr.iv, -1
				%lsr.iv.next2 = add i32 %lsr.iv1, 1
				%exitcond.not = icmp eq i32 %lsr.iv.next, 0
				br i1 %exitcond.not, label %for.cond.cleanup, label %for.body, !llvm.loop !10
				}


				declare i32 @use(i32)
				declare void @_Z6assignPj(i32*)

				!6 = !{!7, !7, i64 0}
				!7 = !{!"int", !8, i64 0}
				!8 = !{!"omnipotent char", !9, i64 0}
				!9 = !{!"Simple C/C++ TBAA"}
				!10 = distinct !{!10, !11}
				!11 = !{!"llvm.loop.mustprogress"}

				...
				---
				name: cant_sink_adds_call_in_block
				alignment: 4
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				failedISel: false
				tracksRegLiveness: true
				hasWinCFI: false
				registers:
				- { id: 0, class: gpr64all, preferred-register: '' }
				- { id: 1, class: gpr64all, preferred-register: '' }
				- { id: 2, class: gpr64all, preferred-register: '' }
				- { id: 3, class: gpr64all, preferred-register: '' }
				- { id: 4, class: gpr64all, preferred-register: '' }
				- { id: 5, class: gpr64all, preferred-register: '' }
				- { id: 6, class: gpr64sp, preferred-register: '' }
				- { id: 7, class: gpr64all, preferred-register: '' }
				- { id: 8, class: gpr64common, preferred-register: '' }
				- { id: 9, class: gpr64common, preferred-register: '' }
				- { id: 10, class: gpr64sp, preferred-register: '' }
				- { id: 11, class: gpr64sp, preferred-register: '' }
				- { id: 12, class: gpr64sp, preferred-register: '' }
				- { id: 13, class: gpr64sp, preferred-register: '' }
				- { id: 14, class: gpr64sp, preferred-register: '' }
				- { id: 15, class: gpr64sp, preferred-register: '' }
				- { id: 16, class: gpr64, preferred-register: '' }
				- { id: 17, class: gpr32, preferred-register: '' }
				- { id: 18, class: gpr32sp, preferred-register: '' }
				- { id: 19, class: gpr32, preferred-register: '' }
				- { id: 20, class: gpr64, preferred-register: '' }
				- { id: 21, class: gpr64, preferred-register: '' }
				- { id: 22, class: gpr64sp, preferred-register: '' }
				- { id: 23, class: gpr64sp, preferred-register: '' }
				liveins:
				- { reg: '$x0', virtual-reg: '%8' }
				- { reg: '$x1', virtual-reg: '%9' }
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 0
				offsetAdjustment: 0
				maxAlignment: 1
				adjustsStack: true
				hasCalls: true
				stackProtector: ''
				maxCallFrameSize: 0
				cvBytesOfCalleeSavedRegisters: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				localFrameSize: 0
				savePoint: ''
				restorePoint: ''
				fixedStack: []
				stack: []
				callSites: []
				debugValueSubstitutions: []
				constants: []
				machineFunctionInfo: {}
				jumpTable:
				kind: block-address
				entries:
				- id: 0
				blocks: [ '%bb.2', '%bb.8', '%bb.8', '%bb.8', '%bb.8', '%bb.8',
				'%bb.8', '%bb.8', '%bb.8', '%bb.8', '%bb.3', '%bb.8',
				'%bb.8', '%bb.8', '%bb.8', '%bb.8', '%bb.8', '%bb.8',
				'%bb.8', '%bb.8', '%bb.4', '%bb.8', '%bb.8', '%bb.8',
				'%bb.8', '%bb.8', '%bb.8', '%bb.8', '%bb.8', '%bb.8',
				'%bb.5', '%bb.8', '%bb.8', '%bb.8', '%bb.8', '%bb.8',
				'%bb.8', '%bb.8', '%bb.8', '%bb.8', '%bb.6', '%bb.8',
				'%bb.8', '%bb.8', '%bb.8', '%bb.8', '%bb.8', '%bb.8',
				'%bb.8', '%bb.8', '%bb.7' ]
				body: |
				; CHECK-LABEL: name: cant_sink_adds_call_in_block
				; CHECK: bb.0 (%ir-block.0):
				; CHECK: successors: %bb.1(0x80000000)
				; CHECK: liveins: $x0, $x1
				; CHECK: [[COPY:%[0-9]+]]:gpr64common = COPY $x1
				; CHECK: [[COPY1:%[0-9]+]]:gpr64common = COPY $x0
				; CHECK: [[ADDXri:%[0-9]+]]:gpr64sp = nuw ADDXri [[COPY]], 4, 0
				; CHECK: [[COPY2:%[0-9]+]]:gpr64all = COPY [[ADDXri]]
				; CHECK: [[ADDXri1:%[0-9]+]]:gpr64sp = nuw ADDXri [[COPY]], 8, 0
				; CHECK: [[COPY3:%[0-9]+]]:gpr64all = COPY [[ADDXri1]]
				; CHECK: [[ADDXri2:%[0-9]+]]:gpr64sp = nuw ADDXri [[COPY]], 12, 0
				; CHECK: [[COPY4:%[0-9]+]]:gpr64all = COPY [[ADDXri2]]
				; CHECK: [[ADDXri3:%[0-9]+]]:gpr64sp = nuw ADDXri [[COPY]], 16, 0
				; CHECK: [[COPY5:%[0-9]+]]:gpr64all = COPY [[ADDXri3]]
				; CHECK: [[ADDXri4:%[0-9]+]]:gpr64sp = nuw ADDXri [[COPY]], 20, 0
				; CHECK: [[COPY6:%[0-9]+]]:gpr64all = COPY [[ADDXri4]]
				; CHECK: [[ADDXri5:%[0-9]+]]:gpr64sp = ADDXri [[COPY1]], 1, 0
				; CHECK: [[COPY7:%[0-9]+]]:gpr64all = COPY [[ADDXri5]]
				; CHECK: [[MOVaddrJT:%[0-9]+]]:gpr64 = MOVaddrJT target-flags(aarch64-page) %jump-table.0, target-flags(aarch64-pageoff, aarch64-nc) %jump-table.0
				; CHECK: bb.1..backedge:
				; CHECK: successors: %bb.9(0x09249249), %bb.2(0x76db6db7)
				; CHECK: [[PHI:%[0-9]+]]:gpr64sp = PHI [[COPY7]], %bb.0, %7, %bb.9
				; CHECK: [[LDRBBui:%[0-9]+]]:gpr32 = LDRBBui [[PHI]], 0 :: (load 1 from %ir.lsr.iv)
				; CHECK: [[SUBREG_TO_REG:%[0-9]+]]:gpr64 = SUBREG_TO_REG 0, killed [[LDRBBui]], %subreg.sub_32
				; CHECK: [[COPY8:%[0-9]+]]:gpr32sp = COPY [[SUBREG_TO_REG]].sub_32
				; CHECK: [[SUBSWri:%[0-9]+]]:gpr32 = SUBSWri killed [[COPY8]], 50, 0, implicit-def $nzcv
				; CHECK: Bcc 8, %bb.9, implicit $nzcv
				; CHECK: bb.2..backedge:
				; CHECK: successors: %bb.3(0x13b13b14), %bb.9(0x09d89d8a), %bb.4(0x13b13b14), %bb.5(0x13b13b14), %bb.6(0x13b13b14), %bb.7(0x13b13b14), %bb.8(0x13b13b14)
				; CHECK: early-clobber %21:gpr64, early-clobber %22:gpr64sp = JumpTableDest32 [[MOVaddrJT]], [[SUBREG_TO_REG]], %jump-table.0
				; CHECK: BR killed %21
				; CHECK: bb.3 (%ir-block.8):
				; CHECK: successors: %bb.9(0x80000000)
				; CHECK: ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
				; CHECK: $x0 = COPY [[COPY]]
				; CHECK: BL @_Z6assignPj, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit-def $sp
				; CHECK: ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
				; CHECK: B %bb.9
				; CHECK: bb.4 (%ir-block.10):
				; CHECK: successors: %bb.9(0x80000000)
				; CHECK: ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
				; CHECK: $x0 = COPY [[COPY2]]
				; CHECK: BL @_Z6assignPj, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit-def $sp
				; CHECK: ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
				; CHECK: B %bb.9
				; CHECK: bb.5 (%ir-block.11):
				; CHECK: successors: %bb.9(0x80000000)
				; CHECK: ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
				; CHECK: $x0 = COPY [[COPY3]]
				; CHECK: BL @_Z6assignPj, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit-def $sp
				; CHECK: ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
				; CHECK: B %bb.9
				; CHECK: bb.6 (%ir-block.12):
				; CHECK: successors: %bb.9(0x80000000)
				; CHECK: ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
				; CHECK: $x0 = COPY [[COPY4]]
				; CHECK: BL @_Z6assignPj, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit-def $sp
				; CHECK: ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
				; CHECK: B %bb.9
				; CHECK: bb.7 (%ir-block.13):
				; CHECK: successors: %bb.9(0x80000000)
				; CHECK: ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
				; CHECK: $x0 = COPY [[COPY5]]
				; CHECK: BL @_Z6assignPj, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit-def $sp
				; CHECK: ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
				; CHECK: B %bb.9
				; CHECK: bb.8 (%ir-block.14):
				; CHECK: successors: %bb.9(0x80000000)
				; CHECK: ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
				; CHECK: $x0 = COPY [[COPY6]]
				; CHECK: BL @_Z6assignPj, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit-def $sp
				; CHECK: ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
				; CHECK: bb.9..backedge.backedge:
				; CHECK: successors: %bb.1(0x80000000)
				; CHECK: [[ADDXri6:%[0-9]+]]:gpr64sp = ADDXri [[PHI]], 1, 0
				; CHECK: [[COPY9:%[0-9]+]]:gpr64all = COPY [[ADDXri6]]
				; CHECK: B %bb.1
				bb.0 (%ir-block.0):
				successors: %bb.1(0x80000000)
				liveins: $x0, $x1

				%9:gpr64common = COPY $x1
				%8:gpr64common = COPY $x0
				%10:gpr64sp = nuw ADDXri %9, 4, 0
				%0:gpr64all = COPY %10
				%11:gpr64sp = nuw ADDXri %9, 8, 0
				%1:gpr64all = COPY %11
				%12:gpr64sp = nuw ADDXri %9, 12, 0
				%2:gpr64all = COPY %12
				%13:gpr64sp = nuw ADDXri %9, 16, 0
				%3:gpr64all = COPY %13
				%14:gpr64sp = nuw ADDXri %9, 20, 0
				%4:gpr64all = COPY %14
				%15:gpr64sp = ADDXri %8, 1, 0
				%5:gpr64all = COPY %15
				%20:gpr64 = MOVaddrJT target-flags(aarch64-page) %jump-table.0, target-flags(aarch64-pageoff, aarch64-nc) %jump-table.0

				bb.1..backedge:
				successors: %bb.8(0x09249249), %bb.9(0x76db6db7)

				%6:gpr64sp = PHI %5, %bb.0, %7, %bb.8
				%17:gpr32 = LDRBBui %6, 0 :: (load 1 from %ir.lsr.iv)
				%16:gpr64 = SUBREG_TO_REG 0, killed %17, %subreg.sub_32
				%18:gpr32sp = COPY %16.sub_32
				%19:gpr32 = SUBSWri killed %18, 50, 0, implicit-def $nzcv
				Bcc 8, %bb.8, implicit $nzcv

				bb.9..backedge:
				successors: %bb.2(0x13b13b14), %bb.8(0x09d89d8a), %bb.3(0x13b13b14), %bb.4(0x13b13b14), %bb.5(0x13b13b14), %bb.6(0x13b13b14), %bb.7(0x13b13b14)

				early-clobber %21:gpr64, early-clobber %22:gpr64sp = JumpTableDest32 %20, %16, %jump-table.0
				BR killed %21

				bb.2 (%ir-block.8):
				successors: %bb.8(0x80000000)

				ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
				$x0 = COPY %9
				BL @_Z6assignPj, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit-def $sp
				ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
				B %bb.8

				bb.3 (%ir-block.10):
				successors: %bb.8(0x80000000)

				ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
				$x0 = COPY %0
				BL @_Z6assignPj, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit-def $sp
				ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
				B %bb.8

				bb.4 (%ir-block.11):
				successors: %bb.8(0x80000000)

				ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
				$x0 = COPY %1
				BL @_Z6assignPj, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit-def $sp
				ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
				B %bb.8

				bb.5 (%ir-block.12):
				successors: %bb.8(0x80000000)

				ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
				$x0 = COPY %2
				BL @_Z6assignPj, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit-def $sp
				ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
				B %bb.8

				bb.6 (%ir-block.13):
				successors: %bb.8(0x80000000)

				ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
				$x0 = COPY %3
				BL @_Z6assignPj, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit-def $sp
				ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
				B %bb.8

				bb.7 (%ir-block.14):
				successors: %bb.8(0x80000000)

				ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
				$x0 = COPY %4
				BL @_Z6assignPj, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit-def $sp
				ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp

				bb.8..backedge.backedge:
				successors: %bb.1(0x80000000)

				%23:gpr64sp = ADDXri %6, 1, 0
				%7:gpr64all = COPY %23
				B %bb.1

				...
				---
				name: load_not_safe_to_move_consecutive_call
				alignment: 4
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				failedISel: false
				tracksRegLiveness: true
				hasWinCFI: false
				registers:
				- { id: 0, class: gpr32, preferred-register: '' }
				- { id: 1, class: gpr32all, preferred-register: '' }
				- { id: 2, class: gpr32sp, preferred-register: '' }
				- { id: 3, class: gpr32, preferred-register: '' }
				- { id: 4, class: gpr32all, preferred-register: '' }
				- { id: 5, class: gpr32all, preferred-register: '' }
				- { id: 6, class: gpr32common, preferred-register: '' }
				- { id: 7, class: gpr32, preferred-register: '' }
				- { id: 8, class: gpr64common, preferred-register: '' }
				- { id: 9, class: gpr32, preferred-register: '' }
				- { id: 10, class: gpr32all, preferred-register: '' }
				- { id: 11, class: gpr32, preferred-register: '' }
				- { id: 12, class: gpr32, preferred-register: '' }
				liveins:
				- { reg: '$w0', virtual-reg: '%6' }
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 0
				offsetAdjustment: 0
				maxAlignment: 1
				adjustsStack: true
				hasCalls: true
				stackProtector: ''
				maxCallFrameSize: 0
				cvBytesOfCalleeSavedRegisters: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				localFrameSize: 0
				savePoint: ''
				restorePoint: ''
				fixedStack: []
				stack: []
				callSites: []
				debugValueSubstitutions: []
				constants: []
				machineFunctionInfo: {}
				body: |
				; CHECK-LABEL: name: load_not_safe_to_move_consecutive_call
				; CHECK: bb.0.entry:
				; CHECK: successors: %bb.1(0x50000000), %bb.2(0x30000000)
				; CHECK: liveins: $w0
				; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY $w0
				; CHECK: [[SUBSWri:%[0-9]+]]:gpr32 = SUBSWri [[COPY]], 1, 0, implicit-def $nzcv
				; CHECK: Bcc 11, %bb.2, implicit $nzcv
				; CHECK: B %bb.1
				; CHECK: bb.1.for.body.preheader:
				; CHECK: successors: %bb.3(0x80000000)
				; CHECK: [[ADRP:%[0-9]+]]:gpr64common = ADRP target-flags(aarch64-page) @A
				; CHECK: [[LDRWui:%[0-9]+]]:gpr32 = LDRWui killed [[ADRP]], target-flags(aarch64-pageoff, aarch64-nc) @A :: (dereferenceable load 4 from `i32* getelementptr inbounds ([100 x i32], [100 x i32]* @A, i64 0, i64 0)`)
				; CHECK: ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
				; CHECK: $w0 = COPY [[COPY]]
				; CHECK: BL @use, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
				; CHECK: ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
				; CHECK: B %bb.3
				; CHECK: bb.2.for.cond.cleanup:
				; CHECK: [[PHI:%[0-9]+]]:gpr32all = PHI [[COPY]], %bb.0, %4, %bb.3
				; CHECK: $w0 = COPY [[PHI]]
				; CHECK: RET_ReallyLR implicit $w0
				; CHECK: bb.3.for.body:
				; CHECK: successors: %bb.2(0x04000000), %bb.3(0x7c000000)
				; CHECK: [[PHI1:%[0-9]+]]:gpr32sp = PHI [[COPY]], %bb.1, %5, %bb.3
				; CHECK: [[PHI2:%[0-9]+]]:gpr32 = PHI [[COPY]], %bb.1, %4, %bb.3
				; CHECK: [[SDIVWr:%[0-9]+]]:gpr32 = SDIVWr [[PHI2]], [[LDRWui]]
				; CHECK: [[COPY1:%[0-9]+]]:gpr32all = COPY [[SDIVWr]]
				; CHECK: [[SUBSWri1:%[0-9]+]]:gpr32 = SUBSWri [[PHI1]], 1, 0, implicit-def $nzcv
				; CHECK: [[COPY2:%[0-9]+]]:gpr32all = COPY [[SUBSWri1]]
				; CHECK: Bcc 0, %bb.2, implicit $nzcv
				; CHECK: B %bb.3
				bb.0.entry:
				successors: %bb.1(0x50000000), %bb.2(0x30000000)
				liveins: $w0

				%6:gpr32common = COPY $w0
				%7:gpr32 = SUBSWri %6, 1, 0, implicit-def $nzcv
				Bcc 11, %bb.2, implicit $nzcv
				B %bb.1

				bb.1.for.body.preheader:
				successors: %bb.3(0x80000000)

				%8:gpr64common = ADRP target-flags(aarch64-page) @A
				%9:gpr32 = LDRWui killed %8, target-flags(aarch64-pageoff, aarch64-nc) @A :: (dereferenceable load 4 from `i32* getelementptr inbounds ([100 x i32], [100 x i32]* @A, i64 0, i64 0)`)
				ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
				$w0 = COPY %6
				BL @use, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
				ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
				B %bb.3

				bb.2.for.cond.cleanup:
				%1:gpr32all = PHI %6, %bb.0, %4, %bb.3
				$w0 = COPY %1
				RET_ReallyLR implicit $w0

				bb.3.for.body:
				successors: %bb.2(0x04000000), %bb.3(0x7c000000)

				%2:gpr32sp = PHI %6, %bb.1, %5, %bb.3
				%3:gpr32 = PHI %6, %bb.1, %4, %bb.3
				%11:gpr32 = SDIVWr %3, %9
				%4:gpr32all = COPY %11
				%12:gpr32 = SUBSWri %2, 1, 0, implicit-def $nzcv
				%5:gpr32all = COPY %12
				Bcc 0, %bb.2, implicit $nzcv
				B %bb.3

				...
				---
				name: load_not_safe_to_move_consecutive_call_use
				alignment: 4
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				failedISel: false
				tracksRegLiveness: true
				hasWinCFI: false
				registers:
				- { id: 0, class: gpr32, preferred-register: '' }
				- { id: 1, class: gpr32all, preferred-register: '' }
				- { id: 2, class: gpr32sp, preferred-register: '' }
				- { id: 3, class: gpr32, preferred-register: '' }
				- { id: 4, class: gpr32all, preferred-register: '' }
				- { id: 5, class: gpr32all, preferred-register: '' }
				- { id: 6, class: gpr32common, preferred-register: '' }
				- { id: 7, class: gpr32, preferred-register: '' }
				- { id: 8, class: gpr64common, preferred-register: '' }
				- { id: 9, class: gpr32, preferred-register: '' }
				- { id: 10, class: gpr32all, preferred-register: '' }
				- { id: 11, class: gpr32, preferred-register: '' }
				- { id: 12, class: gpr32, preferred-register: '' }
				liveins:
				- { reg: '$w0', virtual-reg: '%6' }
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 0
				offsetAdjustment: 0
				maxAlignment: 1
				adjustsStack: true
				hasCalls: true
				stackProtector: ''
				maxCallFrameSize: 0
				cvBytesOfCalleeSavedRegisters: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				localFrameSize: 0
				savePoint: ''
				restorePoint: ''
				fixedStack: []
				stack: []
				callSites: []
				debugValueSubstitutions: []
				constants: []
				machineFunctionInfo: {}
				body: |
				; CHECK-LABEL: name: load_not_safe_to_move_consecutive_call_use
				; CHECK: bb.0.entry:
				; CHECK: successors: %bb.1(0x50000000), %bb.2(0x30000000)
				; CHECK: liveins: $w0
				; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY $w0
				; CHECK: [[SUBSWri:%[0-9]+]]:gpr32 = SUBSWri [[COPY]], 1, 0, implicit-def $nzcv
				; CHECK: Bcc 11, %bb.2, implicit $nzcv
				; CHECK: B %bb.1
				; CHECK: bb.1.for.body.preheader:
				; CHECK: successors: %bb.3(0x80000000)
				; CHECK: [[ADRP:%[0-9]+]]:gpr64common = ADRP target-flags(aarch64-page) @A
				; CHECK: [[LDRWui:%[0-9]+]]:gpr32 = LDRWui killed [[ADRP]], target-flags(aarch64-pageoff, aarch64-nc) @A :: (dereferenceable load 4 from `i32* getelementptr inbounds ([100 x i32], [100 x i32]* @A, i64 0, i64 0)`)
				; CHECK: ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
				; CHECK: $w0 = COPY [[LDRWui]]
				; CHECK: BL @use, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
				; CHECK: ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
				; CHECK: B %bb.3
				; CHECK: bb.2.for.cond.cleanup:
				; CHECK: [[PHI:%[0-9]+]]:gpr32all = PHI [[COPY]], %bb.0, %4, %bb.3
				; CHECK: $w0 = COPY [[PHI]]
				; CHECK: RET_ReallyLR implicit $w0
				; CHECK: bb.3.for.body:
				; CHECK: successors: %bb.2(0x04000000), %bb.3(0x7c000000)
				; CHECK: [[PHI1:%[0-9]+]]:gpr32sp = PHI [[COPY]], %bb.1, %5, %bb.3
				; CHECK: [[PHI2:%[0-9]+]]:gpr32 = PHI [[COPY]], %bb.1, %4, %bb.3
				; CHECK: [[SDIVWr:%[0-9]+]]:gpr32 = SDIVWr [[PHI2]], [[LDRWui]]
				; CHECK: [[COPY1:%[0-9]+]]:gpr32all = COPY [[SDIVWr]]
				; CHECK: [[SUBSWri1:%[0-9]+]]:gpr32 = SUBSWri [[PHI1]], 1, 0, implicit-def $nzcv
				; CHECK: [[COPY2:%[0-9]+]]:gpr32all = COPY [[SUBSWri1]]
				; CHECK: Bcc 0, %bb.2, implicit $nzcv
				; CHECK: B %bb.3
				bb.0.entry:
				successors: %bb.1(0x50000000), %bb.2(0x30000000)
				liveins: $w0

				%6:gpr32common = COPY $w0
				%7:gpr32 = SUBSWri %6, 1, 0, implicit-def $nzcv
				Bcc 11, %bb.2, implicit $nzcv
				B %bb.1

				bb.1.for.body.preheader:
				successors: %bb.3(0x80000000)

				%8:gpr64common = ADRP target-flags(aarch64-page) @A
				%9:gpr32 = LDRWui killed %8, target-flags(aarch64-pageoff, aarch64-nc) @A :: (dereferenceable load 4 from `i32* getelementptr inbounds ([100 x i32], [100 x i32]* @A, i64 0, i64 0)`)
				ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
				$w0 = COPY %9
				BL @use, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
				ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
				B %bb.3

				bb.2.for.cond.cleanup:
				%1:gpr32all = PHI %6, %bb.0, %4, %bb.3
				$w0 = COPY %1
				RET_ReallyLR implicit $w0

				bb.3.for.body:
				successors: %bb.2(0x04000000), %bb.3(0x7c000000)

				%2:gpr32sp = PHI %6, %bb.1, %5, %bb.3
				%3:gpr32 = PHI %6, %bb.1, %4, %bb.3
				%11:gpr32 = SDIVWr %3, %9
				%4:gpr32all = COPY %11
				%12:gpr32 = SUBSWri %2, 1, 0, implicit-def $nzcv
				%5:gpr32all = COPY %12
				Bcc 0, %bb.2, implicit $nzcv
				B %bb.3

				...
				---
				name: cant_sink_use_outside_loop
				alignment: 4
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				failedISel: false
				tracksRegLiveness: true
				hasWinCFI: false
				registers:
				- { id: 0, class: gpr32all, preferred-register: '' }
				- { id: 1, class: gpr32all, preferred-register: '' }
				- { id: 2, class: gpr32all, preferred-register: '' }
				- { id: 3, class: gpr32sp, preferred-register: '' }
				- { id: 4, class: gpr32all, preferred-register: '' }
				- { id: 5, class: gpr32all, preferred-register: '' }
				- { id: 6, class: gpr32all, preferred-register: '' }
				- { id: 7, class: gpr32common, preferred-register: '' }
				- { id: 8, class: gpr32all, preferred-register: '' }
				- { id: 9, class: gpr32all, preferred-register: '' }
				- { id: 10, class: gpr32, preferred-register: '' }
				- { id: 11, class: gpr64common, preferred-register: '' }
				- { id: 12, class: gpr32, preferred-register: '' }
				- { id: 13, class: gpr32, preferred-register: '' }
				- { id: 14, class: gpr32, preferred-register: '' }
				- { id: 15, class: gpr32all, preferred-register: '' }
				liveins:
				- { reg: '$w0', virtual-reg: '%7' }
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 0
				offsetAdjustment: 0
				maxAlignment: 1
				adjustsStack: true
				hasCalls: true
				stackProtector: ''
				maxCallFrameSize: 0
				cvBytesOfCalleeSavedRegisters: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				localFrameSize: 0
				savePoint: ''
				restorePoint: ''
				fixedStack: []
				stack: []
				callSites: []
				debugValueSubstitutions: []
				constants: []
				machineFunctionInfo: {}
				body: |
				; CHECK-LABEL: name: cant_sink_use_outside_loop
				; CHECK: bb.0.entry:
				; CHECK: successors: %bb.1(0x50000000), %bb.4(0x30000000)
				; CHECK: liveins: $w0
				; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY $w0
				; CHECK: [[SUBSWri:%[0-9]+]]:gpr32 = SUBSWri [[COPY]], 1, 0, implicit-def $nzcv
				; CHECK: Bcc 10, %bb.1, implicit $nzcv
				; CHECK: bb.4:
				; CHECK: successors: %bb.2(0x80000000)
				; CHECK: [[COPY1:%[0-9]+]]:gpr32all = COPY $wzr
				; CHECK: [[COPY2:%[0-9]+]]:gpr32all = COPY [[COPY1]]
				; CHECK: B %bb.2
				; CHECK: bb.1.for.body.preheader:
				; CHECK: successors: %bb.3(0x80000000)
				; CHECK: [[ADRP:%[0-9]+]]:gpr64common = ADRP target-flags(aarch64-page) @A
				; CHECK: [[LDRWui:%[0-9]+]]:gpr32 = LDRWui killed [[ADRP]], target-flags(aarch64-pageoff, aarch64-nc) @A :: (dereferenceable load 4 from `i32* getelementptr inbounds ([100 x i32], [100 x i32]* @A, i64 0, i64 0)`)
				; CHECK: [[COPY3:%[0-9]+]]:gpr32all = COPY [[LDRWui]]
				; CHECK: B %bb.3
				; CHECK: bb.2.for.cond.cleanup:
				; CHECK: [[PHI:%[0-9]+]]:gpr32all = PHI [[COPY]], %bb.4, %5, %bb.5
				; CHECK: [[PHI1:%[0-9]+]]:gpr32all = PHI [[COPY2]], %bb.4, [[COPY3]], %bb.5
				; CHECK: ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
				; CHECK: $w0 = COPY [[PHI1]]
				; CHECK: BL @use, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
				; CHECK: ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
				; CHECK: $w0 = COPY [[PHI]]
				; CHECK: RET_ReallyLR implicit $w0
				; CHECK: bb.3.for.body:
				; CHECK: successors: %bb.5(0x04000000), %bb.3(0x7c000000)
				; CHECK: [[PHI2:%[0-9]+]]:gpr32sp = PHI [[COPY]], %bb.1, %6, %bb.3
				; CHECK: [[SUBSWri1:%[0-9]+]]:gpr32 = SUBSWri [[PHI2]], 1, 0, implicit-def $nzcv
				; CHECK: [[COPY4:%[0-9]+]]:gpr32all = COPY [[SUBSWri1]]
				; CHECK: Bcc 1, %bb.3, implicit $nzcv
				; CHECK: bb.5:
				; CHECK: successors: %bb.2(0x80000000)
				; CHECK: [[MOVi32imm:%[0-9]+]]:gpr32 = MOVi32imm 1
				; CHECK: [[COPY5:%[0-9]+]]:gpr32all = COPY [[MOVi32imm]]
				; CHECK: B %bb.2
				bb.0.entry:
				successors: %bb.1(0x50000000), %bb.2(0x30000000)
				liveins: $w0

				%7:gpr32common = COPY $w0
				%9:gpr32all = COPY $wzr
				%8:gpr32all = COPY %9
				%10:gpr32 = SUBSWri %7, 1, 0, implicit-def $nzcv
				Bcc 11, %bb.2, implicit $nzcv
				B %bb.1

				bb.1.for.body.preheader:
				successors: %bb.3(0x80000000)

				%11:gpr64common = ADRP target-flags(aarch64-page) @A
				%12:gpr32 = LDRWui killed %11, target-flags(aarch64-pageoff, aarch64-nc) @A :: (dereferenceable load 4 from `i32* getelementptr inbounds ([100 x i32], [100 x i32]* @A, i64 0, i64 0)`)
				%0:gpr32all = COPY %12
				B %bb.3

				bb.2.for.cond.cleanup:
				%1:gpr32all = PHI %7, %bb.0, %5, %bb.3
				%2:gpr32all = PHI %8, %bb.0, %0, %bb.3
				ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
				$w0 = COPY %2
				BL @use, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
				ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
				$w0 = COPY %1
				RET_ReallyLR implicit $w0

				bb.3.for.body:
				successors: %bb.2(0x04000000), %bb.3(0x7c000000)

				%3:gpr32sp = PHI %7, %bb.1, %6, %bb.3
				%13:gpr32 = MOVi32imm 1
				%5:gpr32all = COPY %13
				%14:gpr32 = SUBSWri %3, 1, 0, implicit-def $nzcv
				%6:gpr32all = COPY %14
				Bcc 0, %bb.2, implicit $nzcv
				B %bb.3

				...
				---
				name: use_is_not_a_copy
				alignment: 4
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				failedISel: false
				tracksRegLiveness: true
				hasWinCFI: false
				registers:
				- { id: 0, class: gpr32, preferred-register: '' }
				- { id: 1, class: gpr32all, preferred-register: '' }
				- { id: 2, class: gpr32sp, preferred-register: '' }
				- { id: 3, class: gpr32, preferred-register: '' }
				- { id: 4, class: gpr32all, preferred-register: '' }
				- { id: 5, class: gpr32all, preferred-register: '' }
				- { id: 6, class: gpr32common, preferred-register: '' }
				- { id: 7, class: gpr32, preferred-register: '' }
				- { id: 8, class: gpr64common, preferred-register: '' }
				- { id: 9, class: gpr32, preferred-register: '' }
				- { id: 10, class: gpr32, preferred-register: '' }
				- { id: 11, class: gpr32, preferred-register: '' }
				liveins:
				- { reg: '$w0', virtual-reg: '%6' }
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 0
				offsetAdjustment: 0
				maxAlignment: 1
				adjustsStack: false
				hasCalls: false
				stackProtector: ''
				maxCallFrameSize: 0
				cvBytesOfCalleeSavedRegisters: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				localFrameSize: 0
				savePoint: ''
				restorePoint: ''
				fixedStack: []
				stack: []
				callSites: []
				debugValueSubstitutions: []
				constants: []
				machineFunctionInfo: {}
				body: |
				; CHECK-LABEL: name: use_is_not_a_copy
				; CHECK: bb.0.entry:
				; CHECK: successors: %bb.1(0x50000000), %bb.2(0x30000000)
				; CHECK: liveins: $w0
				; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY $w0
				; CHECK: [[SUBSWri:%[0-9]+]]:gpr32 = SUBSWri [[COPY]], 1, 0, implicit-def $nzcv
				; CHECK: Bcc 11, %bb.2, implicit $nzcv
				; CHECK: B %bb.1
				; CHECK: bb.1.for.body.preheader:
				; CHECK: successors: %bb.3(0x80000000)
				; CHECK: [[ADRP:%[0-9]+]]:gpr64common = ADRP target-flags(aarch64-page) @A
				; CHECK: [[LDRWui:%[0-9]+]]:gpr32 = LDRWui killed [[ADRP]], target-flags(aarch64-pageoff, aarch64-nc) @A :: (dereferenceable load 4 from `i32* getelementptr inbounds ([100 x i32], [100 x i32]* @A, i64 0, i64 0)`)
				; CHECK: B %bb.3
				; CHECK: bb.2.for.cond.cleanup:
				; CHECK: [[PHI:%[0-9]+]]:gpr32all = PHI [[COPY]], %bb.0, %4, %bb.3
				; CHECK: $w0 = COPY [[PHI]]
				; CHECK: RET_ReallyLR implicit $w0
				; CHECK: bb.3.for.body:
				; CHECK: successors: %bb.2(0x04000000), %bb.3(0x7c000000)
				; CHECK: [[PHI1:%[0-9]+]]:gpr32sp = PHI [[COPY]], %bb.1, %5, %bb.3
				; CHECK: [[PHI2:%[0-9]+]]:gpr32 = PHI [[COPY]], %bb.1, %4, %bb.3
				; CHECK: [[SDIVWr:%[0-9]+]]:gpr32 = SDIVWr [[PHI2]], [[LDRWui]]
				; CHECK: [[COPY1:%[0-9]+]]:gpr32all = COPY [[SDIVWr]]
				; CHECK: [[SUBSWri1:%[0-9]+]]:gpr32 = SUBSWri [[PHI1]], 1, 0, implicit-def $nzcv
				; CHECK: [[COPY2:%[0-9]+]]:gpr32all = COPY [[SUBSWri1]]
				; CHECK: Bcc 0, %bb.2, implicit $nzcv
				; CHECK: B %bb.3
				bb.0.entry:
				successors: %bb.1(0x50000000), %bb.2(0x30000000)
				liveins: $w0

				%6:gpr32common = COPY $w0
				%7:gpr32 = SUBSWri %6, 1, 0, implicit-def $nzcv
				Bcc 11, %bb.2, implicit $nzcv
				B %bb.1

				bb.1.for.body.preheader:
				successors: %bb.3(0x80000000)

				%8:gpr64common = ADRP target-flags(aarch64-page) @A
				%9:gpr32 = LDRWui killed %8, target-flags(aarch64-pageoff, aarch64-nc) @A :: (dereferenceable load 4 from `i32* getelementptr inbounds ([100 x i32], [100 x i32]* @A, i64 0, i64 0)`)
				B %bb.3

				bb.2.for.cond.cleanup:
				%1:gpr32all = PHI %6, %bb.0, %4, %bb.3
				$w0 = COPY %1
				RET_ReallyLR implicit $w0

				bb.3.for.body:
				successors: %bb.2(0x04000000), %bb.3(0x7c000000)

				%2:gpr32sp = PHI %6, %bb.1, %5, %bb.3
				%3:gpr32 = PHI %6, %bb.1, %4, %bb.3
				%10:gpr32 = SDIVWr %3, %9
				%4:gpr32all = COPY %10
				%11:gpr32 = SUBSWri %2, 1, 0, implicit-def $nzcv
				%5:gpr32all = COPY %11
				Bcc 0, %bb.2, implicit $nzcv
				B %bb.3

				...
				---
				name: sink_add
				alignment: 16
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				failedISel: false
				tracksRegLiveness: true
				hasWinCFI: false
				registers:
				- { id: 0, class: gpr32sp, preferred-register: '' }
				- { id: 1, class: gpr32all, preferred-register: '' }
				- { id: 2, class: gpr32, preferred-register: '' }
				- { id: 3, class: gpr32common, preferred-register: '' }
				- { id: 4, class: gpr32sp, preferred-register: '' }
				- { id: 5, class: gpr32, preferred-register: '' }
				- { id: 6, class: gpr32all, preferred-register: '' }
				- { id: 7, class: gpr32all, preferred-register: '' }
				- { id: 8, class: gpr32all, preferred-register: '' }
				- { id: 9, class: gpr64common, preferred-register: '' }
				- { id: 10, class: gpr64common, preferred-register: '' }
				- { id: 11, class: gpr32common, preferred-register: '' }
				- { id: 12, class: gpr32common, preferred-register: '' }
				- { id: 13, class: gpr32, preferred-register: '' }
				- { id: 14, class: gpr32sp, preferred-register: '' }
				- { id: 15, class: gpr32, preferred-register: '' }
				- { id: 16, class: gpr32, preferred-register: '' }
				- { id: 17, class: gpr32sp, preferred-register: '' }
				liveins:
				- { reg: '$x0', virtual-reg: '%9' }
				- { reg: '$x1', virtual-reg: '%10' }
				- { reg: '$w2', virtual-reg: '%11' }
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 0
				offsetAdjustment: 0
				maxAlignment: 1
				adjustsStack: false
				hasCalls: false
				stackProtector: ''
				maxCallFrameSize: 0
				cvBytesOfCalleeSavedRegisters: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				localFrameSize: 0
				savePoint: ''
				restorePoint: ''
				fixedStack: []
				stack: []
				callSites: []
				debugValueSubstitutions: []
				constants: []
				machineFunctionInfo: {}
				body: |
				; CHECK-LABEL: name: sink_add
				; CHECK: bb.0.entry:
				; CHECK: successors: %bb.1(0x50000000), %bb.2(0x30000000)
				; CHECK: liveins: $x0, $x1, $w2
				; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY $w2
				; CHECK: [[COPY1:%[0-9]+]]:gpr64common = COPY $x1
				; CHECK: [[COPY2:%[0-9]+]]:gpr64common = COPY $x0
				; CHECK: [[SUBSWri:%[0-9]+]]:gpr32 = SUBSWri [[COPY]], 1, 0, implicit-def $nzcv
				; CHECK: Bcc 11, %bb.2, implicit $nzcv
				; CHECK: B %bb.1
				; CHECK: bb.1.for.body.preheader:
				; CHECK: successors: %bb.3(0x80000000)
				; CHECK: [[LDRWui:%[0-9]+]]:gpr32common = LDRWui [[COPY2]], 0 :: (load 4 from %ir.read, !tbaa !0)
				; CHECK: [[ADDWri:%[0-9]+]]:gpr32sp = ADDWri [[LDRWui]], 42, 0
				; CHECK: [[COPY3:%[0-9]+]]:gpr32all = COPY [[ADDWri]]
				; CHECK: B %bb.3
				; CHECK: bb.2.for.cond.cleanup:
				; CHECK: [[PHI:%[0-9]+]]:gpr32 = PHI [[COPY]], %bb.0, %6, %bb.3
				; CHECK: STRWui [[PHI]], [[COPY1]], 0 :: (store 4 into %ir.write, !tbaa !0)
				; CHECK: RET_ReallyLR
				; CHECK: bb.3.for.body:
				; CHECK: successors: %bb.2(0x04000000), %bb.3(0x7c000000)
				; CHECK: [[PHI1:%[0-9]+]]:gpr32common = PHI [[COPY3]], %bb.1, %8, %bb.3
				; CHECK: [[PHI2:%[0-9]+]]:gpr32sp = PHI [[COPY]], %bb.1, %7, %bb.3
				; CHECK: [[PHI3:%[0-9]+]]:gpr32 = PHI [[COPY]], %bb.1, %6, %bb.3
				; CHECK: [[SDIVWr:%[0-9]+]]:gpr32 = SDIVWr [[PHI3]], [[PHI1]]
				; CHECK: [[COPY4:%[0-9]+]]:gpr32all = COPY [[SDIVWr]]
				; CHECK: [[SUBSWri1:%[0-9]+]]:gpr32 = SUBSWri [[PHI2]], 1, 0, implicit-def $nzcv
				; CHECK: [[COPY5:%[0-9]+]]:gpr32all = COPY [[SUBSWri1]]
				; CHECK: [[ADDWri1:%[0-9]+]]:gpr32sp = ADDWri [[PHI1]], 1, 0
				; CHECK: [[COPY6:%[0-9]+]]:gpr32all = COPY [[ADDWri1]]
				; CHECK: Bcc 0, %bb.2, implicit $nzcv
				; CHECK: B %bb.3
				bb.0.entry:
				successors: %bb.1(0x50000000), %bb.2(0x30000000)
				liveins: $x0, $x1, $w2

				%11:gpr32common = COPY $w2
				%10:gpr64common = COPY $x1
				%9:gpr64common = COPY $x0
				%12:gpr32common = LDRWui %9, 0 :: (load 4 from %ir.read, !tbaa !6)
				%13:gpr32 = SUBSWri %11, 1, 0, implicit-def $nzcv
				Bcc 11, %bb.2, implicit $nzcv
				B %bb.1

				bb.1.for.body.preheader:
				successors: %bb.3(0x80000000)

				%14:gpr32sp = ADDWri %12, 42, 0
				%1:gpr32all = COPY %14
				B %bb.3

				bb.2.for.cond.cleanup:
				%2:gpr32 = PHI %11, %bb.0, %6, %bb.3
				STRWui %2, %10, 0 :: (store 4 into %ir.write, !tbaa !6)
				RET_ReallyLR

				bb.3.for.body:
				successors: %bb.2(0x04000000), %bb.3(0x7c000000)

				%3:gpr32common = PHI %1, %bb.1, %8, %bb.3
				%4:gpr32sp = PHI %11, %bb.1, %7, %bb.3
				%5:gpr32 = PHI %11, %bb.1, %6, %bb.3
				%15:gpr32 = SDIVWr %5, %3
				%6:gpr32all = COPY %15
				%16:gpr32 = SUBSWri %4, 1, 0, implicit-def $nzcv
				%7:gpr32all = COPY %16
				%17:gpr32sp = ADDWri %3, 1, 0
				%8:gpr32all = COPY %17
				Bcc 0, %bb.2, implicit $nzcv
				B %bb.3

				...
				---
				name: store_after_add
				alignment: 16
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				failedISel: false
				tracksRegLiveness: true
				hasWinCFI: false
				registers:
				- { id: 0, class: gpr32sp, preferred-register: '' }
				- { id: 1, class: gpr32all, preferred-register: '' }
				- { id: 2, class: gpr32, preferred-register: '' }
				- { id: 3, class: gpr32common, preferred-register: '' }
				- { id: 4, class: gpr32sp, preferred-register: '' }
				- { id: 5, class: gpr32, preferred-register: '' }
				- { id: 6, class: gpr32all, preferred-register: '' }
				- { id: 7, class: gpr32all, preferred-register: '' }
				- { id: 8, class: gpr32all, preferred-register: '' }
				- { id: 9, class: gpr64common, preferred-register: '' }
				- { id: 10, class: gpr64common, preferred-register: '' }
				- { id: 11, class: gpr64common, preferred-register: '' }
				- { id: 12, class: gpr32common, preferred-register: '' }
				- { id: 13, class: gpr32common, preferred-register: '' }
				- { id: 14, class: gpr32, preferred-register: '' }
				- { id: 15, class: gpr32, preferred-register: '' }
				- { id: 16, class: gpr32sp, preferred-register: '' }
				- { id: 17, class: gpr32, preferred-register: '' }
				- { id: 18, class: gpr32, preferred-register: '' }
				- { id: 19, class: gpr32sp, preferred-register: '' }
				liveins:
				- { reg: '$x0', virtual-reg: '%9' }
				- { reg: '$x1', virtual-reg: '%10' }
				- { reg: '$x2', virtual-reg: '%11' }
				- { reg: '$w3', virtual-reg: '%12' }
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 0
				offsetAdjustment: 0
				maxAlignment: 1
				adjustsStack: false
				hasCalls: false
				stackProtector: ''
				maxCallFrameSize: 0
				cvBytesOfCalleeSavedRegisters: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				localFrameSize: 0
				savePoint: ''
				restorePoint: ''
				fixedStack: []
				stack: []
				callSites: []
				debugValueSubstitutions: []
				constants: []
				machineFunctionInfo: {}
				body: |
				; CHECK-LABEL: name: store_after_add
				; CHECK: bb.0.entry:
				; CHECK: successors: %bb.1(0x50000000), %bb.2(0x30000000)
				; CHECK: liveins: $x0, $x1, $x2, $w3
				; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY $w3
				; CHECK: [[COPY1:%[0-9]+]]:gpr64common = COPY $x2
				; CHECK: [[COPY2:%[0-9]+]]:gpr64common = COPY $x1
				; CHECK: [[COPY3:%[0-9]+]]:gpr64common = COPY $x0
				; CHECK: [[SUBSWri:%[0-9]+]]:gpr32 = SUBSWri [[COPY]], 1, 0, implicit-def $nzcv
				; CHECK: Bcc 11, %bb.2, implicit $nzcv
				; CHECK: B %bb.1
				; CHECK: bb.1.for.body.preheader:
				; CHECK: successors: %bb.3(0x80000000)
				; CHECK: [[LDRWui:%[0-9]+]]:gpr32common = LDRWui [[COPY3]], 0 :: (load 4 from %ir.read, !tbaa !0)
				; CHECK: [[ADDWri:%[0-9]+]]:gpr32sp = ADDWri [[LDRWui]], 42, 0
				; CHECK: [[COPY4:%[0-9]+]]:gpr32all = COPY [[ADDWri]]
				; CHECK: [[MOVi32imm:%[0-9]+]]:gpr32 = MOVi32imm 43
				; CHECK: STRWui killed [[MOVi32imm]], [[COPY1]], 0 :: (store 4 into %ir.store, !tbaa !0)
				; CHECK: B %bb.3
				; CHECK: bb.2.for.cond.cleanup:
				; CHECK: [[PHI:%[0-9]+]]:gpr32 = PHI [[COPY]], %bb.0, %6, %bb.3
				; CHECK: STRWui [[PHI]], [[COPY2]], 0 :: (store 4 into %ir.write, !tbaa !0)
				; CHECK: RET_ReallyLR
				; CHECK: bb.3.for.body:
				; CHECK: successors: %bb.2(0x04000000), %bb.3(0x7c000000)
				; CHECK: [[PHI1:%[0-9]+]]:gpr32common = PHI [[COPY4]], %bb.1, %8, %bb.3
				; CHECK: [[PHI2:%[0-9]+]]:gpr32sp = PHI [[COPY]], %bb.1, %7, %bb.3
				; CHECK: [[PHI3:%[0-9]+]]:gpr32 = PHI [[COPY]], %bb.1, %6, %bb.3
				; CHECK: [[SDIVWr:%[0-9]+]]:gpr32 = SDIVWr [[PHI3]], [[PHI1]]
				; CHECK: [[COPY5:%[0-9]+]]:gpr32all = COPY [[SDIVWr]]
				; CHECK: [[SUBSWri1:%[0-9]+]]:gpr32 = SUBSWri [[PHI2]], 1, 0, implicit-def $nzcv
				; CHECK: [[COPY6:%[0-9]+]]:gpr32all = COPY [[SUBSWri1]]
				; CHECK: [[ADDWri1:%[0-9]+]]:gpr32sp = ADDWri [[PHI1]], 1, 0
				; CHECK: [[COPY7:%[0-9]+]]:gpr32all = COPY [[ADDWri1]]
				; CHECK: Bcc 0, %bb.2, implicit $nzcv
				; CHECK: B %bb.3
				bb.0.entry:
				successors: %bb.1(0x50000000), %bb.2(0x30000000)
				liveins: $x0, $x1, $x2, $w3

				%12:gpr32common = COPY $w3
				%11:gpr64common = COPY $x2
				%10:gpr64common = COPY $x1
				%9:gpr64common = COPY $x0
				%13:gpr32common = LDRWui %9, 0 :: (load 4 from %ir.read, !tbaa !6)
				%15:gpr32 = SUBSWri %12, 1, 0, implicit-def $nzcv
				Bcc 11, %bb.2, implicit $nzcv
				B %bb.1

				bb.1.for.body.preheader:
				successors: %bb.3(0x80000000)

				%16:gpr32sp = ADDWri %13, 42, 0
				%1:gpr32all = COPY %16
				%14:gpr32 = MOVi32imm 43
				STRWui killed %14, %11, 0 :: (store 4 into %ir.store, !tbaa !6)
				B %bb.3

				bb.2.for.cond.cleanup:
				%2:gpr32 = PHI %12, %bb.0, %6, %bb.3
				STRWui %2, %10, 0 :: (store 4 into %ir.write, !tbaa !6)
				RET_ReallyLR

				bb.3.for.body:
				successors: %bb.2(0x04000000), %bb.3(0x7c000000)

				%3:gpr32common = PHI %1, %bb.1, %8, %bb.3
				%4:gpr32sp = PHI %12, %bb.1, %7, %bb.3
				%5:gpr32 = PHI %12, %bb.1, %6, %bb.3
				%17:gpr32 = SDIVWr %5, %3
				%6:gpr32all = COPY %17
				%18:gpr32 = SUBSWri %4, 1, 0, implicit-def $nzcv
				%7:gpr32all = COPY %18
				%19:gpr32sp = ADDWri %3, 1, 0
				%8:gpr32all = COPY %19
				Bcc 0, %bb.2, implicit $nzcv
				B %bb.3

				...
				---
				name: aliased_store_after_add
				alignment: 16
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				failedISel: false
				tracksRegLiveness: true
				hasWinCFI: false
				registers:
				- { id: 0, class: gpr32sp, preferred-register: '' }
				- { id: 1, class: gpr32all, preferred-register: '' }
				- { id: 2, class: gpr32, preferred-register: '' }
				- { id: 3, class: gpr32common, preferred-register: '' }
				- { id: 4, class: gpr32sp, preferred-register: '' }
				- { id: 5, class: gpr32, preferred-register: '' }
				- { id: 6, class: gpr32all, preferred-register: '' }
				- { id: 7, class: gpr32all, preferred-register: '' }
				- { id: 8, class: gpr32all, preferred-register: '' }
				- { id: 9, class: gpr64common, preferred-register: '' }
				- { id: 10, class: gpr64common, preferred-register: '' }
				- { id: 11, class: gpr64common, preferred-register: '' }
				- { id: 12, class: gpr32common, preferred-register: '' }
				- { id: 13, class: gpr32common, preferred-register: '' }
				- { id: 14, class: gpr32, preferred-register: '' }
				- { id: 15, class: gpr32, preferred-register: '' }
				- { id: 16, class: gpr32sp, preferred-register: '' }
				- { id: 17, class: gpr32, preferred-register: '' }
				- { id: 18, class: gpr32, preferred-register: '' }
				- { id: 19, class: gpr32sp, preferred-register: '' }
				liveins:
				- { reg: '$x0', virtual-reg: '%9' }
				- { reg: '$x1', virtual-reg: '%10' }
				- { reg: '$x2', virtual-reg: '%11' }
				- { reg: '$w3', virtual-reg: '%12' }
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 0
				offsetAdjustment: 0
				maxAlignment: 1
				adjustsStack: false
				hasCalls: false
				stackProtector: ''
				maxCallFrameSize: 0
				cvBytesOfCalleeSavedRegisters: 0
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				localFrameSize: 0
				savePoint: ''
				restorePoint: ''
				fixedStack: []
				stack: []
				callSites: []
				debugValueSubstitutions: []
				constants: []
				machineFunctionInfo: {}
				body: |
				; CHECK-LABEL: name: aliased_store_after_add
				; CHECK: bb.0.entry:
				; CHECK: successors: %bb.1(0x50000000), %bb.2(0x30000000)
				; CHECK: liveins: $x0, $x1, $x2, $w3
				; CHECK: [[COPY:%[0-9]+]]:gpr32common = COPY $w3
				; CHECK: [[COPY1:%[0-9]+]]:gpr64common = COPY $x2
				; CHECK: [[COPY2:%[0-9]+]]:gpr64common = COPY $x1
				; CHECK: [[COPY3:%[0-9]+]]:gpr64common = COPY $x0
				; CHECK: [[SUBSWri:%[0-9]+]]:gpr32 = SUBSWri [[COPY]], 1, 0, implicit-def $nzcv
				; CHECK: Bcc 11, %bb.2, implicit $nzcv
				; CHECK: B %bb.1
				; CHECK: bb.1.for.body.preheader:
				; CHECK: successors: %bb.3(0x80000000)
				; CHECK: [[LDRWui:%[0-9]+]]:gpr32common = LDRWui [[COPY3]], 0 :: (load 4 from %ir.read, !tbaa !0)
				; CHECK: [[ADDWri:%[0-9]+]]:gpr32sp = ADDWri [[LDRWui]], 42, 0
				; CHECK: [[COPY4:%[0-9]+]]:gpr32all = COPY [[ADDWri]]
				; CHECK: [[MOVi32imm:%[0-9]+]]:gpr32 = MOVi32imm 43
				; CHECK: STRWui killed [[MOVi32imm]], [[COPY3]], 0 :: (store 4 into %ir.read, !tbaa !0)
				; CHECK: B %bb.3
				; CHECK: bb.2.for.cond.cleanup:
				; CHECK: [[PHI:%[0-9]+]]:gpr32 = PHI [[COPY]], %bb.0, %6, %bb.3
				; CHECK: STRWui [[PHI]], [[COPY2]], 0 :: (store 4 into %ir.write, !tbaa !0)
				; CHECK: RET_ReallyLR
				; CHECK: bb.3.for.body:
				; CHECK: successors: %bb.2(0x04000000), %bb.3(0x7c000000)
				; CHECK: [[PHI1:%[0-9]+]]:gpr32common = PHI [[COPY4]], %bb.1, %8, %bb.3
				; CHECK: [[PHI2:%[0-9]+]]:gpr32sp = PHI [[COPY]], %bb.1, %7, %bb.3
				; CHECK: [[PHI3:%[0-9]+]]:gpr32 = PHI [[COPY]], %bb.1, %6, %bb.3
				; CHECK: [[SDIVWr:%[0-9]+]]:gpr32 = SDIVWr [[PHI3]], [[PHI1]]
				; CHECK: [[COPY5:%[0-9]+]]:gpr32all = COPY [[SDIVWr]]
				; CHECK: [[SUBSWri1:%[0-9]+]]:gpr32 = SUBSWri [[PHI2]], 1, 0, implicit-def $nzcv
				; CHECK: [[COPY6:%[0-9]+]]:gpr32all = COPY [[SUBSWri1]]
				; CHECK: [[ADDWri1:%[0-9]+]]:gpr32sp = ADDWri [[PHI1]], 1, 0
				; CHECK: [[COPY7:%[0-9]+]]:gpr32all = COPY [[ADDWri1]]
				; CHECK: Bcc 0, %bb.2, implicit $nzcv
				; CHECK: B %bb.3
				bb.0.entry:
				successors: %bb.1(0x50000000), %bb.2(0x30000000)
				liveins: $x0, $x1, $x2, $w3

				%12:gpr32common = COPY $w3
				%11:gpr64common = COPY $x2
				%10:gpr64common = COPY $x1
				%9:gpr64common = COPY $x0
				%13:gpr32common = LDRWui %9, 0 :: (load 4 from %ir.read, !tbaa !6)
				%15:gpr32 = SUBSWri %12, 1, 0, implicit-def $nzcv
				Bcc 11, %bb.2, implicit $nzcv
				B %bb.1

				bb.1.for.body.preheader:
				successors: %bb.3(0x80000000)

				%16:gpr32sp = ADDWri %13, 42, 0
				%1:gpr32all = COPY %16
				%14:gpr32 = MOVi32imm 43
				STRWui killed %14, %9, 0 :: (store 4 into %ir.read, !tbaa !6)
				B %bb.3

				bb.2.for.cond.cleanup:
				%2:gpr32 = PHI %12, %bb.0, %6, %bb.3
				STRWui %2, %10, 0 :: (store 4 into %ir.write, !tbaa !6)
				RET_ReallyLR

				bb.3.for.body:
				successors: %bb.2(0x04000000), %bb.3(0x7c000000)

				%3:gpr32common = PHI %1, %bb.1, %8, %bb.3
				%4:gpr32sp = PHI %12, %bb.1, %7, %bb.3
				%5:gpr32 = PHI %12, %bb.1, %6, %bb.3
				%17:gpr32 = SDIVWr %5, %3
				%6:gpr32all = COPY %17
				%18:gpr32 = SUBSWri %4, 1, 0, implicit-def $nzcv
				%7:gpr32all = COPY %18
				%19:gpr32sp = ADDWri %3, 1, 0
				%8:gpr32all = COPY %19
				Bcc 0, %bb.2, implicit $nzcv
				B %bb.3

				...
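In `aliased_store_after_add`, the preheader stores to `%ir.read`, the same location the entry block loads from, so the CHECK lines keep the `LDRWui` above that `STRWui` instead of sinking it into `bb.3.for.body`. The C++ sketch below illustrates that kind of legality check in isolation, in the spirit of the `hasStoreBetween` query mentioned in the summary. It is a hypothetical toy model, not LLVM's API: `Inst`, `Kind`, `hasAliasingStoreBetween` and `canSinkLoad` are invented names, instructions form a straight-line list, and memory locations are abstract integer IDs where equal IDs alias and `-1` means "unknown". Real code would query alias analysis rather than compare IDs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction model (illustration only).
enum class Kind { Load, Store, Other };

struct Inst {
  Kind kind;
  int loc; // abstract memory location id; -1 = unknown (may alias anything)
};

// True if some store that may alias `loc` lies strictly between positions
// `from` and `to` of the straight-line sequence `path`.
bool hasAliasingStoreBetween(const std::vector<Inst> &path, std::size_t from,
                             std::size_t to, int loc) {
  for (std::size_t i = from + 1; i < to && i < path.size(); ++i) {
    const Inst &I = path[i];
    if (I.kind != Kind::Store)
      continue;
    // Unknown locations are treated conservatively as aliasing.
    if (I.loc == -1 || loc == -1 || I.loc == loc)
      return true;
  }
  return false;
}

// A load at `from` may only be moved down to just before `to` if no
// intervening store can overwrite the location it reads.
bool canSinkLoad(const std::vector<Inst> &path, std::size_t from,
                 std::size_t to) {
  const Inst &L = path[from];
  assert(L.kind == Kind::Load && from < to);
  return !hasAliasingStoreBetween(path, from, to, L.loc);
}
```

Modelling the test with location IDs `read = 0`, `write = 1`, `store = 2`, the path "load `read`, add, mov, store `read`, loop body" rejects sinking the load into the loop, while the same path with the store aimed at `store` (ID 2) would allow it under this toy's "distinct IDs never alias" assumption.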

llvm/test/CodeGen/X86/sink-cheap-instructions.ll

This file was deleted.

	; RUN: llc < %s -mtriple=x86_64-linux | FileCheck %s
	; RUN: llc < %s -mtriple=x86_64-linux -sink-insts-to-avoid-spills | FileCheck %s -check-prefix=SINK

	; Ensure that we sink copy-like instructions into loops to avoid register
	; spills.

	; CHECK: Spill
	; SINK-NOT: Spill

	%struct.A = type { i32, i32, i32, i32, i32, i32 }

	define void @_Z1fPhP1A(i8* nocapture readonly %input, %struct.A* %a) {
	%1 = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0
	%2 = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 1
	%3 = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 2
	%4 = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 3
	%5 = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 4
	%6 = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 5
	br label %.backedge

	.backedge:
	%.0 = phi i8* [ %input, %0 ], [ %7, %.backedge.backedge ]
	%7 = getelementptr inbounds i8, i8* %.0, i64 1
	%8 = load i8, i8* %7, align 1
	switch i8 %8, label %.backedge.backedge [
	i8 0, label %9
	i8 10, label %10
	i8 20, label %11
	i8 30, label %12
	i8 40, label %13
	i8 50, label %14
	]

	; <label>:9
	tail call void @_Z6assignPj(i32* %1)
	br label %.backedge.backedge

	; <label>:10
	tail call void @_Z6assignPj(i32* %2)
	br label %.backedge.backedge

	.backedge.backedge:
	br label %.backedge

	; <label>:11
	tail call void @_Z6assignPj(i32* %3)
	br label %.backedge.backedge

	; <label>:12
	tail call void @_Z6assignPj(i32* %4)
	br label %.backedge.backedge

	; <label>:13
	tail call void @_Z6assignPj(i32* %5)
	br label %.backedge.backedge

	; <label>:14
	tail call void @_Z6assignPj(i32* %6)
	br label %.backedge.backedge
	}

	declare void @_Z6assignPj(i32*)

llvm/test/DebugInfo/MIR/X86/mlicm-sink.mir

This file was deleted.

	--- |
	; RUN: llc --run-pass=machinelicm -sink-insts-to-avoid-spills %s -o - | FileCheck %s --match-full-lines
	; CHECK-LABEL: bb.4 (%ir-block.9):
	; CHECK: %0:gr64 = nuw ADD64ri8 %9, 4, implicit-def dead $eflags
	;
	; When instructions are sunk to prevent register spills, line numbers should not be retained.
	target triple = "x86_64-unknown-linux-gnu"

	%struct.A = type { i32, i32, i32, i32, i32, i32 }

	define void @p(i8* nocapture readonly %input, %struct.A* %a) !dbg !10 {
	%1 = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 1, !dbg !18
	%2 = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 2
	%3 = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 3
	%4 = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 4
	%5 = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 5
	%scevgep = getelementptr i8, i8* %input, i64 1
	br label %.backedge

	.backedge: ; preds = %.backedge.backedge, %0
	%lsr.iv = phi i8* [ %scevgep1, %.backedge.backedge ], [ %scevgep, %0 ]
	%6 = load i8, i8* %lsr.iv, align 1
	switch i8 %6, label %.backedge.backedge [
	i8 0, label %7
	i8 10, label %9
	i8 20, label %10
	i8 30, label %11
	i8 40, label %12
	i8 50, label %13
	]

	7: ; preds = %.backedge
	%8 = bitcast %struct.A* %a to i32*
	tail call void @f(i32* %8)
	br label %.backedge.backedge

	9: ; preds = %.backedge
	tail call void @f(i32* %1)
	br label %.backedge.backedge

	.backedge.backedge: ; preds = %13, %12, %11, %10, %9, %7, %.backedge
	%scevgep1 = getelementptr i8, i8* %lsr.iv, i64 1
	br label %.backedge

	10: ; preds = %.backedge
	tail call void @f(i32* %2)
	br label %.backedge.backedge

	11: ; preds = %.backedge
	tail call void @f(i32* %3)
	br label %.backedge.backedge

	12: ; preds = %.backedge
	tail call void @f(i32* %4)
	br label %.backedge.backedge

	13: ; preds = %.backedge
	tail call void @f(i32* %5)
	br label %.backedge.backedge
	}

	declare void @f(i32*)

	; Function Attrs: nounwind
	declare void @llvm.stackprotector(i8, i8*)


	!llvm.dbg.cu = !{!0}
	!llvm.module.flags = !{!7, !8}
	!llvm.ident = !{!9}

	!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 10.0.0 ", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, globals: !3)
	!1 = !DIFile(filename: "t.ll", directory: "tmp/X86")
	!2 = !{}
	!3 = !{!4}
	!4 = !DIGlobalVariableExpression(var: !5, expr: !DIExpression())
	!5 = !DIGlobalVariable(name: "x", scope: !0, file: !1, line: 1, type: !6, isLocal: false, isDefinition: true)
	!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
	!7 = !{i32 2, !"Dwarf Version", i32 4}
	!8 = !{i32 2, !"Debug Info Version", i32 3}
	!9 = !{!"clang version 10.0.0 "}
	!10 = distinct !DISubprogram(name: "p", scope: !1, file: !1, line: 2, type: !11, scopeLine: 3, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !16)
	!11 = !DISubroutineType(types: !12)
	!12 = !{null, !13}
	!13 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !14, size: 64)
	!14 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !15)
	!15 = !DIBasicType(name: "unsigned int", size: 32, encoding: DW_ATE_unsigned)
	!16 = !{!17}
	!17 = !DILocalVariable(name: "a", arg: 1, scope: !10, file: !1, line: 2, type: !15)
	!18 = !DILocation(line: 4, column: 3, scope: !10)


	...
	---
	name: p
	tracksRegLiveness: true
	registers:
	- { id: 0, class: gr64, preferred-register: '' }
	- { id: 1, class: gr64, preferred-register: '' }
	- { id: 2, class: gr64, preferred-register: '' }
	- { id: 3, class: gr64, preferred-register: '' }
	- { id: 4, class: gr64, preferred-register: '' }
	- { id: 5, class: gr64, preferred-register: '' }
	- { id: 6, class: gr64, preferred-register: '' }
	- { id: 7, class: gr64, preferred-register: '' }
	- { id: 8, class: gr64, preferred-register: '' }
	- { id: 9, class: gr64, preferred-register: '' }
	- { id: 10, class: gr64_nosp, preferred-register: '' }
	- { id: 11, class: gr32, preferred-register: '' }
	- { id: 12, class: gr64, preferred-register: '' }
	- { id: 13, class: gr64, preferred-register: '' }
	- { id: 14, class: gr64, preferred-register: '' }
	- { id: 15, class: gr64, preferred-register: '' }
	jumpTable:
	kind: label-difference32
	entries:
	- id: 0
	blocks: [ '%bb.2', '%bb.4', '%bb.4', '%bb.4', '%bb.4', '%bb.4',
	'%bb.4', '%bb.4', '%bb.4', '%bb.4', '%bb.3', '%bb.4',
	'%bb.4', '%bb.4', '%bb.4', '%bb.4', '%bb.4', '%bb.4',
	'%bb.4', '%bb.4', '%bb.5', '%bb.4', '%bb.4', '%bb.4',
	'%bb.4', '%bb.4', '%bb.4', '%bb.4', '%bb.4', '%bb.4',
	'%bb.6', '%bb.4', '%bb.4', '%bb.4', '%bb.4', '%bb.4',
	'%bb.4', '%bb.4', '%bb.4', '%bb.4', '%bb.7', '%bb.4',
	'%bb.4', '%bb.4', '%bb.4', '%bb.4', '%bb.4', '%bb.4',
	'%bb.4', '%bb.4', '%bb.8' ]
	body: |
	bb.0 (%ir-block.0):
	successors: %bb.1(0x80000000)
	liveins: $rdi, $rsi

	%9:gr64 = COPY $rsi
	%8:gr64 = COPY $rdi
	%0:gr64 = nuw ADD64ri8 %9, 4, implicit-def dead $eflags, debug-location !18
	%1:gr64 = nuw ADD64ri8 %9, 8, implicit-def dead $eflags
	%2:gr64 = nuw ADD64ri8 %9, 12, implicit-def dead $eflags
	%3:gr64 = nuw ADD64ri8 %9, 16, implicit-def dead $eflags
	%4:gr64 = nuw ADD64ri8 %9, 20, implicit-def dead $eflags
	%5:gr64 = INC64r %8, implicit-def dead $eflags

	bb.1..backedge:
	successors: %bb.4(0x09249249), %bb.9(0x76db6db7)

	%6:gr64 = PHI %5, %bb.0, %7, %bb.4
	%11:gr32 = MOVZX32rm8 %6, 1, $noreg, 0, $noreg :: (load 1 from %ir.lsr.iv)
	%10:gr64_nosp = SUBREG_TO_REG 0, killed %11, %subreg.sub_32bit
	%12:gr64 = SUB64ri8 %10, 50, implicit-def $eflags
	JCC_1 %bb.4, 7, implicit $eflags

	bb.9..backedge:
	successors: %bb.2(0x13b13b14), %bb.4(0x09d89d8a), %bb.3(0x13b13b14), %bb.5(0x13b13b14), %bb.6(0x13b13b14), %bb.7(0x13b13b14), %bb.8(0x13b13b14)

	%13:gr64 = LEA64r $rip, 1, $noreg, %jump-table.0, $noreg
	%14:gr64 = MOVSX64rm32 %13, 4, %10, 0, $noreg :: (load 4 from jump-table)
	%15:gr64 = ADD64rr %14, %13, implicit-def dead $eflags
	JMP64r killed %15

	bb.2 (%ir-block.7):
	successors: %bb.4(0x80000000)

	ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
	$rdi = COPY %9
	CALL64pcrel32 target-flags(x86-plt) @f, csr_64, implicit $rsp, implicit $ssp, implicit $rdi, implicit-def $rsp, implicit-def $ssp
	ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
	JMP_1 %bb.4

	bb.3 (%ir-block.9):
	successors: %bb.4(0x80000000)

	ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
	$rdi = COPY %0
	CALL64pcrel32 target-flags(x86-plt) @f, csr_64, implicit $rsp, implicit $ssp, implicit $rdi, implicit-def $rsp, implicit-def $ssp
	ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp

	bb.4..backedge.backedge:
	successors: %bb.1(0x80000000)

	%7:gr64 = INC64r %6, implicit-def dead $eflags
	JMP_1 %bb.1

	bb.5 (%ir-block.10):
	successors: %bb.4(0x80000000)

	ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
	$rdi = COPY %1
	CALL64pcrel32 target-flags(x86-plt) @f, csr_64, implicit $rsp, implicit $ssp, implicit $rdi, implicit-def $rsp, implicit-def $ssp
	ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
	JMP_1 %bb.4

	bb.6 (%ir-block.11):
	successors: %bb.4(0x80000000)

	ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
	$rdi = COPY %2
	CALL64pcrel32 target-flags(x86-plt) @f, csr_64, implicit $rsp, implicit $ssp, implicit $rdi, implicit-def $rsp, implicit-def $ssp
	ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
	JMP_1 %bb.4

	bb.7 (%ir-block.12):
	successors: %bb.4(0x80000000)

	ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
	$rdi = COPY %3
	CALL64pcrel32 target-flags(x86-plt) @f, csr_64, implicit $rsp, implicit $ssp, implicit $rdi, implicit-def $rsp, implicit-def $ssp
	ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
	JMP_1 %bb.4

	bb.8 (%ir-block.13):
	successors: %bb.4(0x80000000)

	ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
	$rdi = COPY %4
	CALL64pcrel32 target-flags(x86-plt) @f, csr_64, implicit $rsp, implicit $ssp, implicit $rdi, implicit-def $rsp, implicit-def $ssp
	ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
	JMP_1 %bb.4

	...
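In `mlicm-sink.mir`, `%0` is defined in the entry block `bb.0` but used only by the call block for `%ir-block.9`, and the CHECK lines expect it to end up there without its `debug-location !18`. When a value has several users inside the loop, one natural destination is the nearest common dominator of the user blocks, giving up if that turns out to be the preheader itself. The sketch below models only that block-selection step on a toy dominator tree. `idom`, `domDepth`, `nearestCommonDominator` and `chooseSinkBlock` are hypothetical names, and the sketch ignores the cost, side-effect and PHI checks a real pass must also make. Block indices follow the input MIR: `bb.1` is the loop header, `bb.9` the jump-table dispatch block, `bb.2`, `bb.3` and `bb.5` to `bb.8` the call blocks, and `bb.4` the latch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy CFG: blocks are numbered 0..N-1 and idom[b] is the
// immediate dominator of block b, with idom[entry] == entry.

// Depth of block b in the dominator tree (the entry block has depth 0).
int domDepth(const std::vector<int> &idom, int b) {
  int d = 0;
  while (idom[b] != b) {
    b = idom[b];
    ++d;
  }
  return d;
}

// Nearest common dominator of a and b: lift the deeper block until both are
// at the same depth, then walk both up until they meet.
int nearestCommonDominator(const std::vector<int> &idom, int a, int b) {
  int da = domDepth(idom, a), db = domDepth(idom, b);
  while (da > db) { a = idom[a]; --da; }
  while (db > da) { b = idom[b]; --db; }
  while (a != b) { a = idom[a]; b = idom[b]; }
  return a;
}

// Block to sink a preheader definition into, or -1 if there are no uses or
// the common dominator of all uses is the preheader (nothing to gain).
int chooseSinkBlock(const std::vector<int> &idom,
                    const std::vector<int> &useBlocks, int preheader) {
  if (useBlocks.empty())
    return -1;
  int target = useBlocks[0];
  for (std::size_t i = 1; i < useBlocks.size(); ++i)
    target = nearestCommonDominator(idom, target, useBlocks[i]);
  return target == preheader ? -1 : target;
}
```

With the dominator tree of `p` (`idom = {0, 0, 9, 9, 1, 9, 9, 9, 9, 1}`), a definition used only in `bb.3` is placed in `bb.3`, one used in `bb.3` and `bb.5` goes to their common dominator `bb.9`, one used in `bb.3` and the latch `bb.4` goes to the header `bb.1`, and one used in `bb.0` stays where it is.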