Details

Reviewers

ABataev
RKSimon
spatel
anton-afanasyev

Summary

Calculating the spill cost is expensive: getSpillCost() looks at all instructions among the scalars of VectorizableTree in the region to find a CallInst instance. We can avoid that by looking for call instructions while extending the scheduler region instead. As a measurement, while building SPEC FP 2006 (the C and C++ buildable benchmarks), a CallInst was present in only ~2% of all cost estimation calculations for vectorizable kernels. With this change, getSpillCost() is invoked only when required.

This is part of the https://reviews.llvm.org/D57779 change, [SLP] Add support for throttling.
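The idea in the summary can be sketched outside of LLVM as follows. This is a minimal illustration only, not the patch itself: `Kind`, `Region`, `expensiveSpillCost`, and the per-call cost of 4 are hypothetical stand-ins, while `NoCalls` and `NoCallInst` borrow the names used in the diff. The region records whether it has seen a real call while it is being extended, and the expensive walk is skipped when no region has seen one.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical instruction categories; the real code inspects llvm::Instruction.
enum class Kind { Arith, Load, Store, Call, AssumeLikeIntrinsic };

// Hypothetical stand-in for BlockScheduling: one scheduling region per block.
struct Region {
  bool NoCalls = true; // same meaning as the NoCalls flag added in the diff
  std::vector<Kind> Insts;

  // Called for each instruction pulled in while the region is extended.
  void extend(Kind K) {
    Insts.push_back(K);
    if (K == Kind::Call) // assume-like intrinsics do not clear the flag
      NoCalls = false;
  }

  // Starting a new region resets the flag, as clear() does in the diff.
  void clear() {
    Insts.clear();
    NoCalls = true;
  }
};

// Stand-in for the expensive walk; Walks counts how often it runs.
int expensiveSpillCost(const Region &R, int &Walks) {
  ++Walks;
  int NumCalls = 0;
  for (Kind K : R.Insts)
    if (K == Kind::Call)
      ++NumCalls;
  return NumCalls * 4; // arbitrary per-call cost, for illustration only
}

// Returns 0 without walking anything when no region recorded a call.
int spillCost(const std::vector<Region> &Regions, int &Walks) {
  bool NoCallInst = std::all_of(Regions.begin(), Regions.end(),
                                [](const Region &R) { return R.NoCalls; });
  if (NoCallInst)
    return 0;
  int Cost = 0;
  for (const Region &R : Regions)
    Cost += expensiveSpillCost(R, Walks);
  // In this toy model a cleared flag implies at least one counted call.
  assert(Cost > 0 && "a region recorded a call, so the cost must be nonzero");
  return Cost;
}
```

In this sketch, two regions that contain only loads, arithmetic, and an assume-like intrinsic yield a cost of 0 with no walk at all; adding a single `Kind::Call` to either region makes `spillCost` walk both.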

Diff Detail

Unit Tests: Failed

	Time	Test
	60,040 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vloxseg.c
	60,070 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vluxseg.c
	60,070 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-overloaded::vluxseg.c
	60,030 ms	x64 debian > libFuzzer.libFuzzer::large.test

Event Timeline

dtemirbulatov created this revision. Sep 28 2021, 6:04 AM

Herald added a subscriber: hiraditya. Sep 28 2021, 6:04 AM

dtemirbulatov requested review of this revision. Sep 28 2021, 6:04 AM

Herald added a project: Restricted Project. Sep 28 2021, 6:04 AM

Herald added a subscriber: llvm-commits.

dtemirbulatov edited the summary of this revision. Sep 28 2021, 6:16 AM

dtemirbulatov added reviewers: ABataev, RKSimon, spatel, anton-afanasyev.

dtemirbulatov edited the summary of this revision. Sep 28 2021, 6:18 AM

Harbormaster completed remote builds in B126089: Diff 375547. Sep 28 2021, 6:22 AM

dtemirbulatov edited the summary of this revision. Sep 28 2021, 6:27 AM

ABataev added inline comments. Sep 28 2021, 6:35 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2621–2622	Not sure it is a good idea to add tree here. Maybe just add an internal flag and update `R->NoCallInst` after scheduling the region, reading the flag from BlockScheduling? or return as one of the results of `BS.tryScheduleBundle`.

ABataev added inline comments. Sep 28 2021, 6:38 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2621–2622	Also, maybe it is worth trying to change `BoUpSLP::getSpillCost` too? For example, we can try to calculate `NumCalls` in the scheduling function and avoid doing this in the function `BoUpSLP::getSpillCost`.

dtemirbulatov edited the summary of this revision. Sep 28 2021, 2:19 PM

Rebased, fixed remark.

dtemirbulatov marked an inline comment as done. Dec 6 2021, 4:56 AM

dtemirbulatov added inline comments.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2621–2622	I think the implementation is too complex, here in getSpillCost we are calculating the number of calls between two given instructions and as we might change a tree entry the state to gather this might become even more complicated.

Harbormaster completed remote builds in B137630: Diff 392024. Dec 6 2021, 5:36 AM

ABataev added inline comments. Dec 6 2021, 8:46 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5977–5984	Hmm, you're using `NoCalls` here but it is always `false`. Currently `NoCallInst` is always false.
5987	Better to make it `(!NoCallInst || getSpillCost() == 0) && ...`.

Rebased, fixed remarks.

dtemirbulatov marked 2 inline comments as done. Dec 8 2021, 11:16 PM

Harbormaster completed remote builds in B138369: Diff 393041. Dec 9 2021, 1:17 AM

ABataev added inline comments. Dec 9 2021, 5:41 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5977–5984	`bool NoCallInst = all_of(BlocksSchedules, [](const decltype(BlocksSchedules)::value_type &BS) { return BS.second->NoCalls; });`
7763	lifetime intrinsics are also kind of `DbgInfoIntrinsic`? Or `llvm.assume`? Better to check for all these kinds of intrinsics.

Addressed remarks.

Harbormaster completed remote builds in B138792: Diff 393666. Dec 10 2021, 11:53 PM

Missed updating the if statement at 5370, fixed.

Harbormaster completed remote builds in B138795: Diff 393669. Dec 11 2021, 3:12 AM

ABataev added inline comments. Dec 13 2021, 5:27 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5811–5812	`auto *CI`
5812–5813	There is special function `IntrinsicInst::isAssumeLikeIntrinsic()`, use it here
7763	`auto *CI`
7764	There is special function `IntrinsicInst::isAssumeLikeIntrinsic()`, use it here

Rebased, fixed remarks.

dtemirbulatov marked 2 inline comments as done. Dec 24 2021, 2:47 PM

dtemirbulatov added inline comments.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5812–5813	We already dynamically cast to CallInst and instanceof AssumeInst is way cheaper.
7764	We already dynamically cast to CallInst and instanceof to AssumeInst is way cheaper.

ABataev added inline comments. Dec 24 2021, 2:52 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5812–5813	`IntrinsicInst::isAssumeLikeIntrinsic()` is way more universal and easier to maintain.

nikic added a subscriber: nikic. Dec 24 2021, 3:10 PM

nikic added inline comments.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5811–5812	I think it would be best to simply skip all intrinsics here. Intrinsics being expanded inline rather than producing libcalls is a good default assumption if you don't want to model this in detail.

Rebased. I decided to stick with isAssumeLikeIntrinsic() rather than skipping every intrinsic, since there might be special embedded targets where an intrinsic ends up as an actual call to a function.
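The three options discussed in this thread can be compared with a small standalone sketch. Everything here is hypothetical scaffolding (`Policy`, `Inst`, `countsAsCall`); the real code uses `isa<CallInst>`, `dyn_cast<IntrinsicInst>`, and `IntrinsicInst::isAssumeLikeIntrinsic()`. The `SkipAssumeLike` policy corresponds to what the author settled on, and `SkipAllIntrinsics` to nikic's suggestion.

```cpp
#include <cassert>

// Hypothetical policies for deciding which instructions count as calls.
enum class Policy { CountAllCalls, SkipAssumeLike, SkipAllIntrinsics };

// Hypothetical flattened view of an instruction's classification.
struct Inst {
  bool IsCall;       // would be isa<CallInst>(I)
  bool IsIntrinsic;  // would be a successful dyn_cast<IntrinsicInst>(I)
  bool IsAssumeLike; // would be II->isAssumeLikeIntrinsic()
};

// Returns true if the instruction should contribute to the call count.
bool countsAsCall(const Inst &I, Policy P) {
  // Model invariants: intrinsics are calls, assume-like ones are intrinsics.
  assert((!I.IsIntrinsic || I.IsCall) && "an intrinsic is a call");
  assert((!I.IsAssumeLike || I.IsIntrinsic) && "assume-like implies intrinsic");
  if (!I.IsCall)
    return false;
  switch (P) {
  case Policy::CountAllCalls:
    return true;
  case Policy::SkipAssumeLike:
    return !I.IsAssumeLike;
  case Policy::SkipAllIntrinsics:
    return !I.IsIntrinsic;
  }
  return true; // unreachable for valid Policy values
}
```

Under `SkipAssumeLike`, an intrinsic that is not assume-like still counts as a call, which is the conservative behavior the author wanted for targets where an intrinsic may lower to a real function call.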

Harbormaster completed remote builds in B144025: Diff 400851. Jan 18 2022, 9:03 AM

Rebased, fixed formatting. Ping.

Harbormaster completed remote builds in B150027: Diff 409341. Feb 16 2022, 12:50 PM

ABataev added inline comments. Feb 16 2022, 1:15 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5814–5816	I believe `isAssumeLikeIntrinsic()` covers all the checks here, no?

dtemirbulatov marked an inline comment as done. Feb 17 2022, 10:20 AM

dtemirbulatov added inline comments.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5814–5816	Ah, Correct.

Addressed remark.

Can we have tests? It shall reduce the cost for some of the assume-like intrinsics with this patch.

Harbormaster completed remote builds in B150313: Diff 409759. Feb 17 2022, 2:06 PM

Diff 409759

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

[2,612 lines not shown]
#endif		#endif

friend struct GraphTraits<BoUpSLP *>;		friend struct GraphTraits<BoUpSLP *>;
friend struct DOTGraphTraits<BoUpSLP *>;		friend struct DOTGraphTraits<BoUpSLP *>;

/// Contains all scheduling data for a basic block.		/// Contains all scheduling data for a basic block.
struct BlockScheduling {		struct BlockScheduling {
BlockScheduling(BasicBlock *BB)		BlockScheduling(BasicBlock *BB)
: BB(BB), ChunkSize(BB->size()), ChunkPos(ChunkSize) {}		: BB(BB), ChunkSize(BB->size()), ChunkPos(ChunkSize) {}

void clear() {		void clear() {
ReadyInsts.clear();		ReadyInsts.clear();
ScheduleStart = nullptr;		ScheduleStart = nullptr;
ScheduleEnd = nullptr;		ScheduleEnd = nullptr;
FirstLoadStoreInRegion = nullptr;		FirstLoadStoreInRegion = nullptr;
LastLoadStoreInRegion = nullptr;		LastLoadStoreInRegion = nullptr;

// Reduce the maximum schedule region size by the size of the		// Reduce the maximum schedule region size by the size of the
// previous scheduling run.		// previous scheduling run.
ScheduleRegionSizeLimit -= ScheduleRegionSize;		ScheduleRegionSizeLimit -= ScheduleRegionSize;
if (ScheduleRegionSizeLimit < MinScheduleRegionSize)		if (ScheduleRegionSizeLimit < MinScheduleRegionSize)
ScheduleRegionSizeLimit = MinScheduleRegionSize;		ScheduleRegionSizeLimit = MinScheduleRegionSize;
ScheduleRegionSize = 0;		ScheduleRegionSize = 0;

// Make a new scheduling region, i.e. all existing ScheduleData is not		// Make a new scheduling region, i.e. all existing ScheduleData is not
// in the new region yet.		// in the new region yet.
++SchedulingRegionID;		++SchedulingRegionID;
		NoCalls = true;
}		}

ScheduleData *getScheduleData(Value *V) {		ScheduleData *getScheduleData(Value *V) {
ScheduleData *SD = ScheduleDataMap[V];		ScheduleData *SD = ScheduleDataMap[V];
if (SD && SD->SchedulingRegionID == SchedulingRegionID)		if (SD && SD->SchedulingRegionID == SchedulingRegionID)
return SD;		return SD;
return nullptr;		return nullptr;
}		}
[224 lines not shown]	struct BlockScheduling {
/// The maximum size allowed for the scheduling region.		/// The maximum size allowed for the scheduling region.
int ScheduleRegionSizeLimit = ScheduleRegionSizeBudget;		int ScheduleRegionSizeLimit = ScheduleRegionSizeBudget;

/// The ID of the scheduling region. For a new vectorization iteration this		/// The ID of the scheduling region. For a new vectorization iteration this
/// is incremented which "removes" all ScheduleData from the region.		/// is incremented which "removes" all ScheduleData from the region.
// Make sure that the initial SchedulingRegionID is greater than the		// Make sure that the initial SchedulingRegionID is greater than the
// initial SchedulingRegionID in ScheduleData (which is 0).		// initial SchedulingRegionID in ScheduleData (which is 0).
int SchedulingRegionID = 1;		int SchedulingRegionID = 1;

		/// Indicates that no CallInst found in the tree and we don't need to
		/// calculate spill cost.
		bool NoCalls = true;
};		};

/// Attaches the BlockScheduling structures to basic blocks.		/// Attaches the BlockScheduling structures to basic blocks.
MapVector<BasicBlock *, std::unique_ptr<BlockScheduling>> BlocksSchedules;		MapVector<BasicBlock *, std::unique_ptr<BlockScheduling>> BlocksSchedules;

/// Performs the "real" scheduling. Done before vectorization is actually		/// Performs the "real" scheduling. Done before vectorization is actually
/// performed in a basic block.		/// performed in a basic block.
void scheduleBlock(BlockScheduling *BS);		void scheduleBlock(BlockScheduling *BS);
[2,910 lines not shown]	BasicBlock::reverse_iterator InstIt = ++Inst->getIterator().getReverse(),
PrevInstIt =		PrevInstIt =
PrevInst->getIterator().getReverse();		PrevInst->getIterator().getReverse();
while (InstIt != PrevInstIt) {		while (InstIt != PrevInstIt) {
if (PrevInstIt == PrevInst->getParent()->rend()) {		if (PrevInstIt == PrevInst->getParent()->rend()) {
PrevInstIt = Inst->getParent()->rbegin();		PrevInstIt = Inst->getParent()->rbegin();
continue;		continue;
}		}

// Debug information does not impact spill cost.		// Debug information does not impact spill cost.
if ((isa<CallInst>(&*PrevInstIt) &&		if (isa<CallInst>(&*PrevInstIt)) {
!isa<DbgInfoIntrinsic>(&*PrevInstIt)) &&		auto *II = dyn_cast<IntrinsicInst>(&*PrevInstIt);
&*PrevInstIt != PrevInst)		if (!II || !II->isAssumeLikeIntrinsic())
NumCalls++;		NumCalls++;
		}

++PrevInstIt;		++PrevInstIt;
}		}

if (NumCalls) {		if (NumCalls) {
SmallVector<Type*, 4> V;		SmallVector<Type*, 4> V;
for (auto *II : LiveValues) {		for (auto *II : LiveValues) {
auto *ScalarTy = II->getType();		auto *ScalarTy = II->getType();
[144 lines not shown]	if (MinBWs.count(ScalarRoot)) {
ExtractCost += TTI->getExtractWithExtendCost(Extend, EU.Scalar->getType(),		ExtractCost += TTI->getExtractWithExtendCost(Extend, EU.Scalar->getType(),
VecTy, EU.Lane);		VecTy, EU.Lane);
} else {		} else {
ExtractCost +=		ExtractCost +=
TTI->getVectorInstrCost(Instruction::ExtractElement, VecTy, EU.Lane);		TTI->getVectorInstrCost(Instruction::ExtractElement, VecTy, EU.Lane);
}		}
}		}

InstructionCost SpillCost = getSpillCost();		bool NoCallInst = all_of(BlocksSchedules,
		[](const decltype(BlocksSchedules)::value_type &BS) {
		return BS.second->NoCalls;
		});

		InstructionCost SpillCost = NoCallInst ? 0 : getSpillCost();
		assert((!NoCallInst || getSpillCost() == 0) && "Incorrect spill cost");
Cost += SpillCost + ExtractCost;		Cost += SpillCost + ExtractCost;
if (FirstUsers.size() == 1) {		if (FirstUsers.size() == 1) {
int Limit = ShuffleMask.front().size() * 2;		int Limit = ShuffleMask.front().size() * 2;
if (all_of(ShuffleMask.front(), [Limit](int Idx) { return Idx < Limit; }) &&		if (all_of(ShuffleMask.front(), [Limit](int Idx) { return Idx < Limit; }) &&
!ShuffleVectorInst::isIdentityMask(ShuffleMask.front())) {		!ShuffleVectorInst::isIdentityMask(ShuffleMask.front())) {
InstructionCost C = TTI->getShuffleCost(		InstructionCost C = TTI->getShuffleCost(
TTI::SK_PermuteSingleSrc,		TTI::SK_PermuteSingleSrc,
cast<FixedVectorType>(FirstUsers.front()->getType()),		cast<FixedVectorType>(FirstUsers.front()->getType()),
ShuffleMask.front());		ShuffleMask.front());
LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C		LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C
<< " for final shuffle of insertelement external users "		<< " for final shuffle of insertelement external users "
<< *VectorizableTree.front()->Scalars.front() << ".\n"		<< *VectorizableTree.front()->Scalars.front() << ".\n"
[1,759 lines not shown]	if (I->mayReadOrWriteMemory() &&
// Update the linked list of memory accessing instructions.		// Update the linked list of memory accessing instructions.
if (CurrentLoadStore) {		if (CurrentLoadStore) {
CurrentLoadStore->NextLoadStore = SD;		CurrentLoadStore->NextLoadStore = SD;
} else {		} else {
FirstLoadStoreInRegion = SD;		FirstLoadStoreInRegion = SD;
}		}
CurrentLoadStore = SD;		CurrentLoadStore = SD;
}		}
		if (isa<CallInst>(I)) {
		auto *II = dyn_cast<IntrinsicInst>(I);
		if (!II || !II->isAssumeLikeIntrinsic())
		NoCalls = false;
		}
}		}
if (NextLoadStore) {		if (NextLoadStore) {
if (CurrentLoadStore)		if (CurrentLoadStore)
CurrentLoadStore->NextLoadStore = NextLoadStore;		CurrentLoadStore->NextLoadStore = NextLoadStore;
} else {		} else {
LastLoadStoreInRegion = CurrentLoadStore;		LastLoadStoreInRegion = CurrentLoadStore;
}		}
}		}
[2,782 lines not shown]

This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Avoid calculating expensive spill cost when it is not required
Needs Review · Public