Details

Reviewers

qcolombet
chandlerc
echristo
rnk

Summary

This is not meant as an actual code review to get this patch submitted, but as a basis for further discussion.

It is meant as an experiment towards a solution for http://llvm.org/PR22230

With this patch, LLVM generates pretty nice code for the example from the bug report (see below). Obviously, the patch is far from complete or correct.

    .section __TEXT,__text,regular,pure_instructions
    .macosx_version_min 10, 10
    .globl __Z1fPhP1A
    .align 4, 0x90
__Z1fPhP1A: ## @_Z1fPhP1A
    .cfi_startproc
## BB#0: ## %entry
    pushq %rbp
Ltmp0:
    .cfi_def_cfa_offset 16
Ltmp1:
    .cfi_offset %rbp, -16
    movq %rsp, %rbp
Ltmp2:
    .cfi_def_cfa_register %rbp
    pushq %r15
    pushq %r14
    pushq %rbx
    pushq %rax
Ltmp3:
    .cfi_offset %rbx, -40
Ltmp4:
    .cfi_offset %r14, -32
Ltmp5:
    .cfi_offset %r15, -24
    movq %rsi, %r14
    movq %rdi, %rbx
    incq %rbx
    leaq LJTI0_0(%rip), %r15
    jmp LBB0_1
    .align 4, 0x90
LBB0_6: ## %for.cond.backedge
        ## in Loop: Header=BB0_1 Depth=1
    incq %rbx
LBB0_1: ## %for.cond
        ## =>This Inner Loop Header: Depth=1
    movzbl (%rbx), %eax
    cmpq $3, %rax
    ja LBB0_6
## BB#2: ## %for.cond
         ## in Loop: Header=BB0_1 Depth=1
    movslq (%r15,%rax,4), %rax
    addq %r15, %rax
    jmpq *%rax
LBB0_3: ## %if.then
        ## in Loop: Header=BB0_1 Depth=1
    movq %r14, %rdi
    jmp LBB0_5
LBB0_4: ## %if.then4
        ## in Loop: Header=BB0_1 Depth=1
    leaq 4(%r14), %rdi
    jmp LBB0_5
LBB0_7: ## %if.then8
        ## in Loop: Header=BB0_1 Depth=1
    leaq 8(%r14), %rdi
    jmp LBB0_5
LBB0_8: ## %if.then12
        ## in Loop: Header=BB0_1 Depth=1
    leaq 12(%r14), %rdi
LBB0_5: ## %for.cond.backedge
        ## in Loop: Header=BB0_1 Depth=1
    callq __Z6assignPj
    jmp LBB0_6
    .cfi_endproc
    .align 2, 0x90
L0_0_set_3 = LBB0_3-LJTI0_0
L0_0_set_4 = LBB0_4-LJTI0_0
L0_0_set_7 = LBB0_7-LJTI0_0
L0_0_set_8 = LBB0_8-LJTI0_0
LJTI0_0:
    .long L0_0_set_3
    .long L0_0_set_4
    .long L0_0_set_7
    .long L0_0_set_8

.subsections_via_symbols
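
For orientation, here is a rough sketch of the source shape that this listing implies. It is reconstructed from the listing, not copied from PR22230: `_Z1fPhP1A` demangles to `f(unsigned char*, A*)`, `_Z6assignPj` demangles to `assign(unsigned int*)`, and the four call sites pass the addresses at offsets 0, 4, 8 and 12 from the second argument. The field names, the body of `assign`, and the `0xFF` terminator are assumptions added so that the sketch can run; the loop in the listing has no exit.

```cpp
#include <cassert>

// Hypothetical struct: four unsigned fields, which land at offsets
// 0, 4, 8 and 12 when unsigned is 32 bits wide (as in the listing).
struct A {
  unsigned a, b, c, d;
};

// Stand-in for the external callee; here it only counts visits.
inline void assign(unsigned *field) { ++*field; }

// Reconstructed shape of f(unsigned char*, A*).
inline void f(unsigned char *p, A *s) {
  for (;;) {
    ++p;                     // the listing advances the pointer before each load
    if (*p == 0xFF) return;  // assumed terminator, not present in the listing
    if (*p == 0) assign(&s->a);
    else if (*p == 1) assign(&s->b);
    else if (*p == 2) assign(&s->c);
    else if (*p == 3) assign(&s->d);
  }
}
```

Each branch needs only `base + constant` as its call argument, which is why the listing can form the argument with a single `leaq` next to the call instead of keeping four precomputed addresses live across the loop.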

Event Timeline

djasper updated this revision to Diff 18977. Jan 29 2015, 10:27 AM

djasper retitled this revision to Experiment with keeping GEPs near calls.

djasper updated this object.

djasper edited the test plan for this revision.

djasper added reviewers: chandlerc, rnk, qcolombet.

djasper added a subscriber: Unknown Object (MLST).

echristo added a reviewer: echristo. Jan 29 2015, 10:29 AM

Hi Daniel,

I do not know if you still expect feedback on this post, as we have already discussed the matter on IRC.
I am adding a couple of comments here for the record.

Note: Please move the commit list to llvm instead of CFE :).

Thanks,
Q.

lib/CodeGen/RegisterCoalescer.cpp
847	This test is used to avoid rematerializing an expensive operation.
849	This check is used to ensure that we won’t introduce correctness issues. The following code only supports trivial rematerialization. For more general rematerialization, you have to do more interference checks.
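
The two checks commented on above act as separate gates: one for cost, one for correctness. A minimal toy model of that gating might look like the sketch below. `DefProps` and `mayRematerialize` are hypothetical names for illustration, not LLVM types or functions; the fields only stand in for the queries that appear in the RegisterCoalescer.cpp hunk of this diff.

```cpp
#include <cassert>

// Toy stand-in for the properties queried on the defining instruction.
// All names here are hypothetical; nothing below is LLVM API.
struct DefProps {
  bool isCopyLike;
  bool cheapAsMove;     // models TII->isAsCheapAsAMove(DefMI)
  bool triviallyRemat;  // models TII->isTriviallyReMaterializable(DefMI, AA)
  bool safeToMove;      // models DefMI->isSafeToMove(...)
  unsigned numDefs;     // models MCID.getNumDefs()
};

inline bool mayRematerialize(const DefProps &d) {
  if (d.isCopyLike) return false;
  if (!d.cheapAsMove) return false;     // cost gate: skip expensive operations
  if (!d.triviallyRemat) return false;  // correctness gate: trivial remat only
  if (!d.safeToMove) return false;
  return d.numDefs == 1;
}
```

Commenting out the second gate, as the experimental diff does, removes a correctness check rather than a heuristic, which is the point of the inline comment on line 849.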

And for the rest of us... Am I right in gleaning from this that we need to do a better job of taking advantage of addressing modes used to compute the targets of indirect calls?

djasper edited edge metadata. Feb 23 2015, 1:21 PM

djasper edited subscribers, added: Unknown Object (MLST); removed: Unknown Object (MLST).

Current work-in-progress.

To summarize what was discussed through other channels (IRC mostly):

We eventually want a lazy code motion pass that sinks GEPs into loops when they can be folded into the addressing modes of instructions (or, more generally, when sinking does not significantly increase the cost of the operation inside the loop), depending on register pressure.
For a first attempt, it seems easier to reuse MachineLICM. Conceptually, that pass moves loop-invariant code, so it isn't the "wrong" place, and it already calculates much of the required information.
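
The decision described above can be sketched as a small predicate. This is only an illustration of the stated conditions, not the logic of the patch: `Candidate`, `Policy`, `shouldSinkIntoLoop`, and both thresholds are hypothetical, and real values would have to come from benchmarking.

```cpp
#include <cassert>

// Hypothetical description of a loop-invariant address computation that
// currently sits in the loop preheader.
struct Candidate {
  bool foldsIntoAddressingMode;  // e.g. base + constant offset
  unsigned costInLoop;           // assumed per-iteration cost if not folded
};

// Hypothetical tuning knobs.
struct Policy {
  unsigned availableRegs;  // registers usable across the loop
  unsigned maxCostInLoop;  // largest extra cost accepted inside the loop
};

inline bool shouldSinkIntoLoop(const Candidate &c, unsigned liveAcrossLoop,
                               const Policy &p) {
  // Without register pressure, keeping the hoisted value live is free.
  if (liveAcrossLoop <= p.availableRegs) return false;
  // Folded into an addressing mode, the computation costs nothing extra.
  if (c.foldsIntoAddressingMode) return true;
  // Otherwise sink only if the added in-loop cost stays small.
  return c.costInLoop <= p.maxCostInLoop;
}
```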

Still incomplete, but feedback is appreciated.

djasper updated this revision to Diff 20540. Feb 23 2015, 1:29 PM

Hi Daniel,

Could you upload a patch with the full context?

Thanks,
-Quentin

ab added a subscriber: ab. Mar 5 2015, 3:07 PM

Slightly updated patch with full context.

Uploaded patch with full context. We probably want to hide this behavior behind a flag, too, and do some benchmarks/experiments.

Hi Daniel,

Thanks for your patience.

First of all, yes, this new logic needs to be hidden behind a flag until it is properly tuned.
In particular, we are missing a profitability model, i.e., a check that sinking actually reduces excess register pressure, before doing the sinking.

Now, regarding the code itself: the sinking logic, modulo the missing profitability model, looks right to me, but it does not seem correct to have it in HoistOutOfLoop :).

Cheers,
-Quentin

lib/CodeGen/MachineLICM.cpp
776	(On Diff #21320)	A comment here would be welcome. I suspect you are just interested in the fact that the instruction shouldn't use or define physical registers or have memory operands (which is part of what this function does). Without a comment, at first glance, this is confusing because instructions in the preheader should be loop invariant!
782	(On Diff #21320)	The idiom is usually !MO.getReg()
789	(On Diff #21320)	I believe this check shouldn't be needed for the general approach.

djasper added inline comments. Mar 13 2015, 2:26 AM

lib/CodeGen/MachineLICM.cpp
776	(On Diff #21320)	Added comment and removed HasLoopPHIUse, which I am no longer sure is necessary.
782	(On Diff #21320)	Done.
789	(On Diff #21320)	I generally agree, but we need some sort of cost model. And for now, assuming that we'd always want to sink copies seems like a reasonable first one. Added a comment to this effect.

Addressed review comments
Added test.

Hi Daniel,

LGTM with minor nitpicks.

Thanks,
-Quentin

lib/CodeGen/MachineLICM.cpp
797	(On Diff #21906)	-> it must not have...
798	(On Diff #21906)	The test for HasLoopPHIUse is not strictly necessary, but I believe it acts as an optimization for the next check. Indeed, if the candidate is used within a phi for the loop, it won’t be sunk.
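
Putting the inline comments from this thread together, the candidate filter under discussion can be pictured roughly as follows. This is a toy sketch with hypothetical names (`PreheaderInstr`, `isSinkCandidate`); it is not the code that was committed, and the final field merely stands in for whatever placement check follows the PHI test.

```cpp
#include <cassert>

// Hypothetical summary of a preheader instruction; not an LLVM type.
struct PreheaderInstr {
  bool isCopy;              // placeholder cost model: only sink copies
  bool touchesPhysReg;      // uses or defines a physical register
  bool hasMemOperand;       // reads or writes memory
  bool hasLoopPHIUse;       // used by a PHI of the loop
  bool placementCheckPasses;  // assumed result of the later, costlier check
};

inline bool isSinkCandidate(const PreheaderInstr &i) {
  if (i.touchesPhysReg || i.hasMemOperand) return false;
  // Not strictly required: a PHI use would fail the later check anyway,
  // so this test only serves as an early exit.
  if (i.hasLoopPHIUse) return false;
  if (!i.isCopy) return false;
  return i.placementCheckPasses;
}
```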

This revision is now accepted and ready to land. Mar 13 2015, 10:14 AM

Addressed comments and submitted as r232262.

lib/CodeGen/MachineLICM.cpp
797	(On Diff #21906)	Fixed.
798	(On Diff #21906)	Re-instated.

Diff 18977

lib/CodeGen/CodeGenPrepare.cpp

[... 1,200 lines not shown ...] for (unsigned Idx = 0; Idx < VectorWidth; ++Idx) {
    Instruction *OldBr = IfBlock->getTerminator();
    BranchInst::Create(CondBlock, NewIfBlock, Cmp, OldBr);
    OldBr->eraseFromParent();
    IfBlock = NewIfBlock;
  }
  CI->eraseFromParent();
}

bool CodeGenPrepare::OptimizeCallInst(CallInst *CI, bool& ModifiedDT) {
BasicBlock *BB = CI->getParent();

// Lower inline assembly if we can.
// If we found an inline asm expession, and if the target knows how to
// lower it to normal LLVM code, do so now.
if (TLI && isa<InlineAsm>(CI->getCalledValue())) {
if (TLI->ExpandInlineAsm(CI)) {
// Avoid invalidating the iterator.
CurInstIterator = BB->begin();
// Avoid processing instructions out of order, which could cause
// reuse before a value is defined.
SunkAddrs.clear();
return true;
}
// Sink address computing for memory operands into the block.
if (OptimizeInlineAsmInst(CI))
return true;
}

IntrinsicInst *II = dyn_cast<IntrinsicInst>(CI);
if (II) {
switch (II->getIntrinsicID()) {
default: break;
case Intrinsic::objectsize: {
// Lower all uses of llvm.objectsize.*
bool Min = (cast<ConstantInt>(II->getArgOperand(1))->getZExtValue() == 1);
Type *ReturnTy = CI->getType();
Constant *RetVal = ConstantInt::get(ReturnTy, Min ? 0 : -1ULL);

// Substituting this can cause recursive simplifications, which can
// invalidate our iterator. Use a WeakVH to hold onto it in case this
// happens.
WeakVH IterHandle(CurInstIterator);

replaceAndRecursivelySimplify(CI, RetVal,
TLI ? TLI->getDataLayout() : nullptr,
TLInfo, ModifiedDT ? nullptr : DT);

// If the iterator instruction was recursively deleted, start over at the
// start of the block.
if (IterHandle != CurInstIterator) {
CurInstIterator = BB->begin();
SunkAddrs.clear();
}
return true;
}
case Intrinsic::masked_load: {
// Scalarize unsupported vector masked load
if (!TTI->isLegalMaskedLoad(CI->getType(), 1)) {
ScalarizeMaskedLoad(CI);
ModifiedDT = true;
return true;
}
return false;
}
case Intrinsic::masked_store: {
if (!TTI->isLegalMaskedStore(CI->getArgOperand(0)->getType(), 1)) {
ScalarizeMaskedStore(CI);
ModifiedDT = true;
return true;
}
return false;
}
}

if (TLI) {
SmallVector<Value*, 2> PtrOps;
Type *AccessTy;
if (TLI->GetAddrModeArguments(II, PtrOps, AccessTy))
while (!PtrOps.empty())
if (OptimizeMemoryInst(II, PtrOps.pop_back_val(), AccessTy))
return true;
}
}

// From here on out we're working with named functions.
if (!CI->getCalledFunction()) return false;

// We'll need DataLayout from here on out.
const DataLayout *TD = TLI ? TLI->getDataLayout() : nullptr;
if (!TD) return false;

// Lower all default uses of _chk calls. This is very similar
// to what InstCombineCalls does, but here we are only lowering calls
// to fortified library functions (e.g. __memcpy_chk) that have the default
// "don't know" as the objectsize. Anything else should be left alone.
FortifiedLibCallSimplifier Simplifier(TD, TLInfo, true);
if (Value *V = Simplifier.optimizeCall(CI)) {
CI->replaceAllUsesWith(V);
CI->eraseFromParent();
return true;
}
return false;
}

/// DupRetToEnableTailCallOpts - Look for opportunities to duplicate return
/// instructions to the predecessor to enable tail call optimizations. The
/// case it is currently looking for is:
/// @code
/// bb0:
/// %tmp0 = tail call i32 @f0()
/// br label %return
/// bb1:
[... 2,045 lines not shown ...] if (IterHandle != CurInstIterator) {
      CurInstIterator = BB->begin();
      SunkAddrs.clear();
    }
  }
  ++NumMemoryInsts;
  return true;
}

bool CodeGenPrepare::OptimizeCallInst(CallInst *CI, bool& ModifiedDT) {
  BasicBlock *BB = CI->getParent();
  Value *Addr = CI->getArgOperand(0);
  if (IsNonLocalValue(Addr, CI->getParent())) {
    Value *&SunkAddr = SunkAddrs[Addr];
    if (SunkAddr) {
      llvm::errs() << "AAAA\n";
    } else {
      Type *IntPtrTy = TLI->getDataLayout()->getIntPtrType(Addr->getType());
      Value *V = CI->getArgOperand(0);
      SmallVector<Instruction *, 16> NewAddrModeInsts;
      TypePromotionTransaction TPT;
      ExtAddrMode Mode = AddressingModeMatcher::Match(
          V, V->getType(), CI, NewAddrModeInsts, *TLI, InsertedTruncsSet,
          PromotedInsts, TPT);
      // llvm::errs() << Mode.BaseReg << " " << Mode.Scale << " " << Mode.BaseGV
      // << " " << Mode.BaseOffs << "\n";
      IRBuilder<> Builder(CI);
      Value *Result = nullptr;
      if (Mode.BaseReg) {
        Value *V = Mode.BaseReg;
        if (V->getType()->isPointerTy())
          V = Builder.CreatePtrToInt(V, IntPtrTy, "sunkaddr");
        if (V->getType() != IntPtrTy)
          V = Builder.CreateIntCast(V, IntPtrTy, /*isSigned=*/true, "sunkaddr");
        Result = V;
      }

      if (Mode.BaseOffs) {
        Value *V = ConstantInt::get(IntPtrTy, Mode.BaseOffs);
        if (Result)
          Result = Builder.CreateAdd(Result, V, "sunkaddr");
        else
          Result = V;
      }
      if (!Result)
        SunkAddr = Constant::getNullValue(Addr->getType());
      else
        SunkAddr = Builder.CreateIntToPtr(Result, Addr->getType(), "sunkaddr");
      CI->replaceUsesOfWith(Addr, SunkAddr);
    }
    RecursivelyDeleteTriviallyDeadInstructions(Addr, TLInfo);
    return true;
  }

		// Lower inline assembly if we can.
		// If we found an inline asm expession, and if the target knows how to
		// lower it to normal LLVM code, do so now.
		if (TLI && isa<InlineAsm>(CI->getCalledValue())) {
		if (TLI->ExpandInlineAsm(CI)) {
		// Avoid invalidating the iterator.
		CurInstIterator = BB->begin();
		// Avoid processing instructions out of order, which could cause
		// reuse before a value is defined.
		SunkAddrs.clear();
		return true;
		}
		// Sink address computing for memory operands into the block.
		if (OptimizeInlineAsmInst(CI))
		return true;
		}

		IntrinsicInst *II = dyn_cast<IntrinsicInst>(CI);
		if (II) {
		switch (II->getIntrinsicID()) {
		default: break;
		case Intrinsic::objectsize: {
		// Lower all uses of llvm.objectsize.*
		bool Min = (cast<ConstantInt>(II->getArgOperand(1))->getZExtValue() == 1);
		Type *ReturnTy = CI->getType();
		Constant *RetVal = ConstantInt::get(ReturnTy, Min ? 0 : -1ULL);

		// Substituting this can cause recursive simplifications, which can
		// invalidate our iterator. Use a WeakVH to hold onto it in case this
		// happens.
		WeakVH IterHandle(CurInstIterator);

		replaceAndRecursivelySimplify(CI, RetVal,
		TLI ? TLI->getDataLayout() : nullptr,
		TLInfo, ModifiedDT ? nullptr : DT);

		// If the iterator instruction was recursively deleted, start over at the
		// start of the block.
		if (IterHandle != CurInstIterator) {
		CurInstIterator = BB->begin();
		SunkAddrs.clear();
		}
		return true;
		}
		case Intrinsic::masked_load: {
		// Scalarize unsupported vector masked load
		if (!TTI->isLegalMaskedLoad(CI->getType(), 1)) {
		ScalarizeMaskedLoad(CI);
		ModifiedDT = true;
		return true;
		}
		return false;
		}
		case Intrinsic::masked_store: {
		if (!TTI->isLegalMaskedStore(CI->getArgOperand(0)->getType(), 1)) {
		ScalarizeMaskedStore(CI);
		ModifiedDT = true;
		return true;
		}
		return false;
		}
		}

		if (TLI) {
		SmallVector<Value*, 2> PtrOps;
		Type *AccessTy;
		if (TLI->GetAddrModeArguments(II, PtrOps, AccessTy))
		while (!PtrOps.empty())
		if (OptimizeMemoryInst(II, PtrOps.pop_back_val(), AccessTy))
		return true;
		}
		}

		// From here on out we're working with named functions.
		if (!CI->getCalledFunction()) return false;

		// We'll need DataLayout from here on out.
		const DataLayout *TD = TLI ? TLI->getDataLayout() : nullptr;
		if (!TD) return false;

		// Lower all default uses of _chk calls. This is very similar
		// to what InstCombineCalls does, but here we are only lowering calls
		// to fortified library functions (e.g. __memcpy_chk) that have the default
		// "don't know" as the objectsize. Anything else should be left alone.
		FortifiedLibCallSimplifier Simplifier(TD, TLInfo, true);
		if (Value *V = Simplifier.optimizeCall(CI)) {
		CI->replaceAllUsesWith(V);
		CI->eraseFromParent();
		return true;
		}
		return false;
		}

/// OptimizeInlineAsmInst - If there are any memory operands, use
/// OptimizeMemoryInst to sink their address computing into the block when
/// possible / profitable.
bool CodeGenPrepare::OptimizeInlineAsmInst(CallInst *CS) {
  bool MadeChange = false;

  TargetLowering::AsmOperandInfoVector
    TargetConstraints = TLI->ParseConstraints(CS);
[... 1,429 lines not shown ...]

lib/CodeGen/RegisterCoalescer.cpp

[... 834 lines not shown ...]
   if (ValNo->isPHIDef() || ValNo->isUnused())
     return false;
   MachineInstr *DefMI = LIS->getInstructionFromIndex(ValNo->def);
   if (!DefMI)
     return false;
   if (DefMI->isCopyLike()) {
     IsDefCopy = true;
     return false;
   }
-  if (!TII->isAsCheapAsAMove(DefMI))
-    return false;
-  if (!TII->isTriviallyReMaterializable(DefMI, AA))
-    return false;
+  llvm::errs() << "AAA\n---\n";
+  DefMI->dump();
+  llvm::errs() << TII->isAsCheapAsAMove(DefMI) << " "
+               << TII->isTriviallyReMaterializable(DefMI, AA) << "\n";
+  //if (!TII->isAsCheapAsAMove(DefMI))
       qcolombet (inline comment): This test is used to avoid rematerializing an expensive operation.
+  //  return false;
+  //if (!TII->isTriviallyReMaterializable(DefMI, AA))
       qcolombet (inline comment): This check is used to ensure that we won’t introduce correctness issues. The following code only supports trivial rematerialization. For more general rematerialization, you have to do more interference checks.
+  //  return false;
   bool SawStore = false;
   if (!DefMI->isSafeToMove(TII, AA, SawStore))
     return false;
   const MCInstrDesc &MCID = DefMI->getDesc();
   if (MCID.getNumDefs() != 1)
     return false;
   // Only support subregister destinations when the def is read-undef.
   MachineOperand &DstOperand = CopyMI->getOperand(0);
[... 1,934 lines not shown ...]

This is an archive of the discontinued LLVM Phabricator instance.

Experiment with keeping GEPs near calls
Closed, Public
