This is an archive of the discontinued LLVM Phabricator instance.

[CGP] Relax a bit restriction for optimizeMemoryInst to extend scope
ClosedPublic

Authored by skatkov on Jun 29 2017, 9:40 PM.

Details

Reviewers

loladiro
spatel
efriedma

Commits

rG0b7b59ada33a: [CGP] Relax a bit restriction for optimizeMemoryInst to extend scope
rL307628: [CGP] Relax a bit restriction for optimizeMemoryInst to extend scope

Summary

CodeGenPrepare::optimizeMemoryInst contains a check that makes it do nothing
if all instructions that compute the address for a memory instruction are in the same
block as the memory instruction itself.

However, if any of these instructions is placed after the memory instruction, the
address calculation will not be folded into the memory instruction.

The added test case shows an example.

Diff Detail

Repository: rL LLVM

Event Timeline

skatkov created this revision. Jun 29 2017, 9:40 PM

efriedma added inline comments. Jun 30 2017, 12:24 PM

lib/CodeGen/CodeGenPrepare.cpp
Line 4241 (On Diff #104814): This is quadratic in the number of instructions in the BB.

This should be better, please take a look.

efriedma added inline comments. Jul 6 2017, 3:23 PM

lib/CodeGen/CodeGenPrepare.cpp
Line 4361 (On Diff #105261): This is still essentially quadratic: optimizeMemoryInst itself is called in a loop which iterates over the basic block.
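
The cost concern above can be illustrated with a small standalone model. This is only a sketch: it assumes the per-call check walks the block linearly from the top (the intermediate diff is not shown on this page), and `comesBefore` and `totalSteps` are hypothetical names that do not exist in LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical linear ordering check: does A appear before B in Block?
// Steps accumulates how many block entries were visited.
bool comesBefore(const std::vector<int> &Block, int A, int B,
                 std::size_t &Steps) {
  for (int I : Block) {
    ++Steps;
    if (I == A)
      return true;
    if (I == B)
      return false;
  }
  return false;
}

// Models a caller that iterates over the whole block and, for each
// instruction, asks whether its predecessor comes before it.
std::size_t totalSteps(std::size_t N) {
  std::vector<int> Block(N);
  std::iota(Block.begin(), Block.end(), 0); // instructions 0, 1, ..., N-1
  std::size_t Steps = 0;
  for (std::size_t I = 1; I < N; ++I)
    (void)comesBefore(Block, static_cast<int>(I - 1), static_cast<int>(I),
                      Steps);
  return Steps;
}
```

In this model each call scans up to the queried instruction, so the total is N(N-1)/2 visits: `totalSteps(4)` is 6 and `totalSteps(1000)` is 499500. A linear check inside a loop over the block grows quadratically with block size.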

skatkov added inline comments. Jul 6 2017, 8:53 PM

lib/CodeGen/CodeGenPrepare.cpp
Line 4361 (On Diff #105261): Do you see a better way to detect that all instructions in AddrModeInsts are defined earlier than MemoryInst, or do you just propose dropping this improvement due to its complexity?

OK, I will think about whether it can be improved further...

Any hints are appreciated :)

Err, wait a sec... you don't need to check dominance at all.

The point of the check is to make sure we don't end up in an infinite loop: if we already sank the addressing mode, we don't want to sink it again.

If we didn't look through a PHI node to come up with an instruction in AddrModeInsts, it must dominate the memory instruction because the IR wouldn't be valid otherwise, so checking dominance is useless. If we did look through a PHI node to come up with an instruction in AddrModeInsts, it isn't really local anyway; we're using the PHI node, not the instruction itself, so the dominance relationship is irrelevant.

So I think the right solution here is to actually track how we came up with each instruction in AddrModeInsts, and skip the IsNonLocalValue check for the ones that came from PHI nodes.
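
The rule described above can be sketched without any LLVM dependency. This is a minimal model, not the actual implementation: it collapses the tracking to a single flag for the whole walk, similar to the `PhiSeen` flag in the diff further down, and `ToyInst`, `isNonLocal` and `shouldSinkAddress` are hypothetical names introduced only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an IR instruction: only its parent block matters here.
struct ToyInst {
  std::string Name;
  std::string Block;
};

// Hypothetical analogue of IsNonLocalValue: true if the instruction lives
// outside the block that holds the memory instruction.
bool isNonLocal(const ToyInst &I, const std::string &MemBlock) {
  return I.Block != MemBlock;
}

// Returns true when the address computation should be sunk next to the
// memory instruction. Without a PHI, an all-local address is left alone.
// Once the walk went through a PHI, the address is treated as non-local.
bool shouldSinkAddress(bool PhiSeen, const std::vector<ToyInst> &AddrModeInsts,
                       const std::string &MemBlock) {
  bool AllLocal = std::none_of(
      AddrModeInsts.begin(), AddrModeInsts.end(),
      [&](const ToyInst &I) { return isNonLocal(I, MemBlock); });
  return PhiSeen || !AllLocal;
}
```

With two address instructions that both sit in the memory instruction's block, `shouldSinkAddress(false, ...)` returns false and `shouldSinkAddress(true, ...)` returns true. That difference is the case this revision is meant to enable.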

Good point. If we traversed a Phi node, then even when the address computation is in the same block, it comes after the memory instruction; otherwise we would not have had to traverse a Phi node to reach it. I will look into this more deeply.

Side note: I actually think the second reason for this check is to let SelectionDAG use address folding. It traverses the instructions of a basic block from the last one to the first. So if the address computation is in another basic block, or after the current memory instruction, it will not be able to fold the address computation into the memory instruction...
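
The ordering argument in this side note can be modelled with a toy block, taking the note's description of the bottom-up walk at face value. `ToyBlock` and `canFoldInBlock` are hypothetical names, and the instruction names loosely mirror the `header` block of the test case below.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy model: a basic block is an ordered list of instruction names.
using ToyBlock = std::vector<std::string>;

// Returns true if AddrDef is defined in Block strictly before MemInst, that
// is, a bottom-up walk starting at MemInst would meet it inside the block.
bool canFoldInBlock(const ToyBlock &Block, const std::string &AddrDef,
                    const std::string &MemInst) {
  auto Mem = std::find(Block.begin(), Block.end(), MemInst);
  assert(Mem != Block.end() && "memory instruction must be in the block");
  // Search only the instructions that precede the memory instruction.
  return std::find(Block.begin(), Mem, AddrDef) != Mem;
}
```

For a block ordered `load, fence, gep, bitcast`, the `gep` comes after the `load`, so the function returns false. Once a `sunkaddr` computation is placed ahead of the `load`, the same query on `sunkaddr` returns true.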

Thank you Eli for the idea.

Please take a look.

LGTM

This revision is now accepted and ready to land. Jul 10 2017, 4:11 PM

Closed by commit rL307628: [CGP] Relax a bit restriction for optimizeMemoryInst to extend scope (authored by skatkov). Jul 10 2017, 11:25 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                      Size
llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp                 7 lines
llvm/trunk/test/CodeGen/X86/sink-gep-before-mem-inst.ll   25 lines

Diff 105966

llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp

@@ bool CodeGenPrepare::optimizeMemoryInst(Instruction *MemoryInst, Value *Addr,
   worklist.push_back(Addr);

   // Use a worklist to iteratively look through PHI nodes, and ensure that
   // the addressing mode obtained from the non-PHI roots of the graph
   // are equivalent.
   Value *Consensus = nullptr;
   unsigned NumUsesConsensus = 0;
   bool IsNumUsesConsensusValid = false;
+  bool PhiSeen = false;
   SmallVector<Instruction*, 16> AddrModeInsts;
   ExtAddrMode AddrMode;
   TypePromotionTransaction TPT(RemovedInsts);
   TypePromotionTransaction::ConstRestorationPt LastKnownGood =
       TPT.getRestorationPoint();
   while (!worklist.empty()) {
     Value *V = worklist.back();
     worklist.pop_back();

     // Break use-def graph loops.
     if (!Visited.insert(V).second) {
       Consensus = nullptr;
       break;
     }

     // For a PHI node, push all of its incoming values.
     if (PHINode *P = dyn_cast<PHINode>(V)) {
       for (Value *IncValue : P->incoming_values())
         worklist.push_back(IncValue);
+      PhiSeen = true;
       continue;
     }

     // For non-PHIs, determine the addressing mode being computed. Note that
     // the result may differ depending on what other uses our candidate
     // addressing instructions might have.
     SmallVector<Instruction*, 16> NewAddrModeInsts;
     ExtAddrMode NewAddrMode = AddressingModeMatcher::Match(
[... 37 lines not shown ...]
   // ones were determined, bail out now.
   if (!Consensus) {
     TPT.rollback(LastKnownGood);
     return false;
   }
   TPT.commit();

   // If all the instructions matched are already in this BB, don't do anything.
-  if (none_of(AddrModeInsts, [&](Value *V) {
+  // If we saw Phi node then it is not local definitely.
+  if (!PhiSeen && none_of(AddrModeInsts, [&](Value *V) {
         return IsNonLocalValue(V, MemoryInst->getParent());
       })) {
     DEBUG(dbgs() << "CGP: Found local addrmode: " << AddrMode << "\n");
     return false;
   }

   // Insert this computation right after this user. Since our caller is
   // scanning from the top of the BB to the bottom, reuse of the expr are
   // guaranteed to happen later.
   IRBuilder<> Builder(MemoryInst);

llvm/trunk/test/CodeGen/X86/sink-gep-before-mem-inst.ll

; RUN: opt < %s -S -codegenprepare -mtriple=x86_64-unknown-linux-gnu | FileCheck %s

define i64 @test.after(i8 addrspace(1)* readonly align 8) {
; CHECK-LABEL: test.after
; CHECK: sunkaddr
entry:
  %.0 = getelementptr inbounds i8, i8 addrspace(1)* %0, i64 8
  %addr = bitcast i8 addrspace(1)* %.0 to i32 addrspace(1)*
  br label %header

header:
  %addr.in.loop = phi i32 addrspace(1)* [ %addr, %entry ], [ %addr.after, %header ]
  %local_2_ = phi i64 [ 0, %entry ], [ %.9, %header ]
  %.7 = load i32, i32 addrspace(1)* %addr.in.loop, align 8
  fence acquire
  %.1 = getelementptr inbounds i8, i8 addrspace(1)* %0, i64 8
  %addr.after = bitcast i8 addrspace(1)* %.1 to i32 addrspace(1)*
  %.8 = sext i32 %.7 to i64
  %.9 = add i64 %local_2_, %.8
  %not. = icmp sgt i64 %.9, 999
  br i1 %not., label %exit, label %header

exit:
  ret i64 %.9
}