This is an archive of the discontinued LLVM Phabricator instance.

[RS4GC] Extract rematerilazable candidate search. NFC.
ClosedPublic

Authored by skatkov on Jan 31 2022, 9:29 PM.

Details

Reviewers

reames
dantrushin

Commits

rG66f1c6fc7136: [RS4GC] Extract rematerilazable candidate search. NFC.

Summary

Finding the re-materialization chain for a derived pointer does not depend on
the call site. To avoid repeating this search for each call site, it can be
extracted into a separate routine.
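
The split described in the summary can be illustrated with a minimal standalone sketch. This is not the LLVM code: pointers are modeled as plain integer ids, and every name here (`Candidate`, `findCandidates`, `selectForCallSite`, the threshold parameters) is a hypothetical stand-in chosen for illustration. It only shows the shape of the change: the chain search runs once, and each call site then performs a lookup plus its own cost adjustment.

```cpp
#include <map>
#include <set>
#include <vector>

// Hypothetical per-pointer record: chain length and total recompute cost.
struct Candidate {
  unsigned ChainLength;
  unsigned Cost;
};

// Call-site independent step, run once. ChainCosts maps a derived pointer id
// to the cost of each instruction in its chain back to the base.
std::map<int, Candidate>
findCandidates(const std::map<int, std::vector<unsigned>> &ChainCosts,
               unsigned ChainLengthThreshold) {
  std::map<int, Candidate> Result;
  for (const auto &Entry : ChainCosts) {
    const std::vector<unsigned> &Chain = Entry.second;
    // Nothing to do, or the chain is too long.
    if (Chain.empty() || Chain.size() > ChainLengthThreshold)
      continue;
    unsigned Cost = 0;
    for (unsigned C : Chain)
      Cost += C;
    Result[Entry.first] = Candidate{static_cast<unsigned>(Chain.size()), Cost};
  }
  return Result;
}

// Call-site dependent step: a lookup plus the per-site cost adjustment.
std::vector<int> selectForCallSite(const std::set<int> &LiveSet,
                                   const std::map<int, Candidate> &Candidates,
                                   bool IsInvoke, unsigned CostThreshold) {
  std::vector<int> Selected;
  for (int Live : LiveSet) {
    auto It = Candidates.find(Live);
    if (It == Candidates.end())
      continue;
    unsigned Cost = It->second.Cost;
    if (IsInvoke)
      Cost *= 2; // the chain is cloned on both the normal and unwind paths
    if (Cost < CostThreshold)
      Selected.push_back(Live);
  }
  return Selected;
}
```

With this shape, a function with N statepoints walks each defining chain once instead of N times, while the invoke-specific doubling stays in the per-site step.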

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

skatkov created this revision. Jan 31 2022, 9:29 PM

Herald added a subscriber: hiraditya. Jan 31 2022, 9:29 PM

skatkov requested review of this revision. Jan 31 2022, 9:29 PM

Herald added a project: Restricted Project. Jan 31 2022, 9:29 PM

Harbormaster completed remote builds in B146818: Diff 404809. Jan 31 2022, 10:17 PM

General direction seems reasonable, a couple inline comments, but likely to converge to an LGTM quickly.

However, I'd suggest taking a step back and asking whether this code makes sense at all. I don't want to make this blocking, but I think it does make sense to reexamine before going much further.

Given an arbitrary derived pointer P and its base B, we can always find the offset via sub(ptrtoint(P), ptrtoint(B)). Materializing the ptrtoint is maybe undesirable, but we already do this in a couple of places in the existing code. (Remember, this is a lowering pass.) See gc.get.pointer.offset.

I'd argue this means we should either be remating everything, or nothing. The current heuristic - which amounts to "can we find the gep chain?" - really doesn't seem defensible.

llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
287	Shouldn't this always be the base pointer of the last element in the chain? If so, might be good to either not store explicitly, or at least assert.
2275	This really looks like something we should fix in the findBasePointer code, but we can leave it for the moment. We're adding complexity to avoid fixing another problem, and that seems non ideal.
2330	You're doing a copy here, but I don't see either the copy or the source modified. Did you mean this to be a const reference?

I agree that this re-materialization is not great (especially the cost model). I will explain my understanding in a separate comment.

llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
287	According to the comment below (special handling of the findBasePointer inefficiency), RootOfChain might differ from the found base pointer, and we need to keep it in order to replace RootOfChain during re-materialization. According to findRematerializationCandidates, RootOfChain can only be the base or the alternative phi; this holds by construction. What assert do you propose to add? One that just repeats the logic of findRematerializationCandidates? If we solve the findBasePointer inefficiency, we will not need this field at all.
2275	Agreed. I wanted to keep it as NFC.
2330	The next line reverses the chain, so the copy is modified.

Hi Philip, I've tried to look at the profitability of re-materializing derived pointers from the live-range point of view.
Let dp = base + Idx * scale + Off.
Let disp = dp - base = Idx * scale + Off (this relates to your potential proposal to remat everything, as I understood it).

Suppose we have the following points in a program:
0 Def of Idx and Off
1 Def of dp, and possibly disp if we consider that case.
2 Call site with statepoint (dp and base should be alive here)
2.2 Remat of dp close to the call site (if we do re-materialization)
2.8 Remat of dp close to the use (if we do re-materialization)
3 Use of dp.

Now consider 5 scenarios of re-materialization:
A. No re-materialization at all.
B. Remat at call site.
C. Remat at use.
D. Remat at call site with disp pre-computed.
E. Remat at use with disp pre-computed.

What is live in each range?
[0,1) - Idx and Off for A, B, C, D, E.
[1,2) -
A: base, dp
B: Off, Idx, base
C: Off, Idx, base
D: base, disp
E: base, disp
[2, 2.2) -
A: dp
B: Off, Idx, base
C: Off, Idx, base
D: base, disp
E: base, disp
[2.2, 2.8) -
A: dp
B: dp
C: Off, Idx, base
D: dp
E: base, disp
[2.8, 3) - only dp for A, B, C, D, E.

So in the general case, in terms of liveness, no re-materialization is better than any re-materialization.
And the disp variants are better than the Idx + Off variants when Idx and Off are not constants.
But if Idx and Off are constants, and disp is constant as a result, we get:
[0,1) - none
[1,2) -
A: base, dp
B: base
C: base
D: base
E: base
[2, 2.2) -
A: dp
B: base
C: base
D: base
E: base
[2.2, 2.8) -
A: dp
B: dp
C: base
D: dp
E: base
[2.8, 3) - only dp for A, B, C, D, E.
So any kind of re-materialization is better than no re-materialization.

So in terms of liveness, re-materialization is profitable if Idx and Off are constants.
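
The comparison above can be checked mechanically with a small sketch. The arrays below are transcribed by hand from the two tables (number of live values per range, per scenario); the helper names `total` and `neverWorse` are hypothetical, and counting values without weighting ranges by their length is a simplifying assumption of this sketch, not something the comment claims.

```cpp
#include <array>
#include <cstddef>
#include <numeric>

// Live-value counts per range, in the order
// [0,1), [1,2), [2,2.2), [2.2,2.8), [2.8,3).
using Pressure = std::array<int, 5>;

struct Scenarios {
  Pressure A, B, C, D, E;
};

// Idx and Off are ordinary (non-constant) values.
inline Scenarios generalCase() {
  return Scenarios{
      Pressure{2, 2, 1, 1, 1}, // A: no re-materialization
      Pressure{2, 3, 3, 1, 1}, // B: remat at call site
      Pressure{2, 3, 3, 3, 1}, // C: remat at use
      Pressure{2, 2, 2, 1, 1}, // D: remat at call site, disp pre-computed
      Pressure{2, 2, 2, 2, 1}, // E: remat at use, disp pre-computed
  };
}

// Idx, Off and therefore disp are constants and occupy no live range.
inline Scenarios constantCase() {
  return Scenarios{
      Pressure{0, 2, 1, 1, 1}, // A
      Pressure{0, 1, 1, 1, 1}, // B
      Pressure{0, 1, 1, 1, 1}, // C
      Pressure{0, 1, 1, 1, 1}, // D
      Pressure{0, 1, 1, 1, 1}, // E
  };
}

// Sum of live values over all ranges (unweighted).
inline int total(const Pressure &P) {
  return std::accumulate(P.begin(), P.end(), 0);
}

// True if X has no more live values than Y in every range.
inline bool neverWorse(const Pressure &X, const Pressure &Y) {
  for (std::size_t I = 0; I < X.size(); ++I)
    if (X[I] > Y[I])
      return false;
  return true;
}
```

Under these counts, A is never worse than B through E in the general case, and each of B through E is never worse than A in the constant case, which matches the two conclusions drawn from the tables.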

However, re-materialization has some additional effects.
First of all, the dp computation may be folded into its use, in which case re-materialization may be a net win despite the extended live ranges, because we eliminate the derived pointer computation.

Additionally, even if we do not fold the derived pointer into all users, we may move the derived pointer computation from a hot block to a cold one (positive impact) or from a cold block to a hot one (negative impact).

Taking all of the above into account (and possibly missing some corner cases), I would say that re-materialization is profitable if

We can fold the address computation into all users and Idx and Off are constants.
The re-materialization insert point is colder than the original derived pointer computation and Idx and Off are constants.
If Idx and Off are not constants, think three times before moving to a colder block.
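
The three rules above can be written down as a small predicate. This is only a sketch of the proposal as stated in this comment, not existing RS4GC code: `RematFacts`, `RematVerdict` and `classify` are hypothetical names, and treating every combination the list does not mention as `NotProfitable` is a conservative assumption of the sketch.

```cpp
// Hypothetical inputs a cost model would have to provide.
struct RematFacts {
  bool IdxAndOffAreConstants; // both Idx and Off are compile-time constants
  bool FoldsIntoAllUsers;     // address computation folds into every user
  bool InsertPointIsColder;   // remat point is colder than the original def
};

enum class RematVerdict { Profitable, NeedsCare, NotProfitable };

inline RematVerdict classify(const RematFacts &F) {
  if (F.IdxAndOffAreConstants) {
    // Rules 1 and 2: constants plus either full folding or a colder block.
    if (F.FoldsIntoAllUsers || F.InsertPointIsColder)
      return RematVerdict::Profitable;
    return RematVerdict::NotProfitable; // assumption: unlisted case
  }
  // Rule 3: non-constant operands extend live ranges, so moving to a colder
  // block is not an automatic win.
  if (F.InsertPointIsColder)
    return RematVerdict::NeedsCare;
  return RematVerdict::NotProfitable; // assumption: unlisted case
}
```

For example, `classify({true, false, true})` yields `Profitable`, while `classify({false, false, true})` yields `NeedsCare`.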

Any thoughts? I would be very glad to consider this from other angles and continue the discussion.
In the meantime we can land this refactoring, since it changes nothing (NFC).

Sorry for the long comment.

LGTM to this patch. All of my style/code concerns had good justifications.

On the general topic, I was thinking of remat slightly differently. You broke down the indexing expression into the component pieces (Scale, Idx, Disp). I was thinking about simply computing (Ptr-Base) or RawOffset. RawOffset might be factorable into component pieces, but I wasn't thinking in terms of extending the live ranges of those pieces.

Instead, I was thinking of rewriting this code:
base = ...
p = some_offset_function(base)
...
use (p)

Into
base = ...
rawoffset = some_offset_function(base) - base
...
p = base + rawoffset
use (p)

With this form, base+rawoffset should reliably fold into the use (for memory uses at least), and the live range of rawoffset is exactly the same as that of the original p. We do have an extra variable in the sub-expression, but that's very localized register pressure, as the variable for some_offset_function extends one instruction (the sub) past the original definition of p.
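
A minimal sketch of the raw-offset rewrite, using plain integers as addresses so that it can be run outside the compiler. `Address`, `someOffsetFunction`, `rawOffset` and `rematerialize` are hypothetical names for illustration, and the particular offset formula is made up. The point is only that the non-pointer rawoffset survives a relocation of base unchanged, so p can be rebuilt from the relocated base.

```cpp
#include <cstdint>

using Address = std::uint64_t;

// Stand-in for an arbitrary derived-pointer computation (made-up formula).
inline Address someOffsetFunction(Address Base, std::uint64_t Idx) {
  return Base + Idx * 8 + 16;
}

// Before the safepoint: sub(ptrtoint(P), ptrtoint(B)).
inline std::int64_t rawOffset(Address Derived, Address Base) {
  return static_cast<std::int64_t>(Derived - Base);
}

// After the safepoint: rebuild the derived pointer from the relocated base.
inline Address rematerialize(Address RelocatedBase, std::int64_t RawOffset) {
  return RelocatedBase + static_cast<Address>(RawOffset);
}
```

If base is 0x1000 and the collector moves the object to 0x8000, `rematerialize(0x8000, rawOffset(someOffsetFunction(0x1000, 3), 0x1000))` gives the same address as `someOffsetFunction(0x8000, 3)`, without knowing how the offset was originally computed.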

This revision is now accepted and ready to land. Feb 2 2022, 4:12 PM

This revision was landed with ongoing or failed builds. Feb 3 2022, 7:11 PM

Closed by commit rG66f1c6fc7136: [RS4GC] Extract rematerilazable candidate search. NFC. (authored by skatkov).

This revision was automatically updated to reflect the committed changes.

skatkov added a commit: rG66f1c6fc7136: [RS4GC] Extract rematerilazable candidate search. NFC..

Revision Contents

Path: llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
Size: 105 lines

Diff 405855

llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp

[... 275 lines not shown ...]	struct PartiallyConstructedSafepointRecord {
Instruction *UnwindToken;		Instruction *UnwindToken;

/// Record live values we are rematerialized instead of relocating.		/// Record live values we are rematerialized instead of relocating.
/// They are not included into 'LiveSet' field.		/// They are not included into 'LiveSet' field.
/// Maps rematerialized copy to it's original value.		/// Maps rematerialized copy to it's original value.
RematerializedValueMapTy RematerializedValues;		RematerializedValueMapTy RematerializedValues;
};		};

		struct RematerizlizationCandidateRecord {
		// Chain from derived pointer to base.
		SmallVector<Instruction *, 3> ChainToBase;
		// Original base.
		Value *RootOfChain;
		// Cost of chain.
		InstructionCost Cost;
		};
		using RematCandTy = MapVector<Value *, RematerizlizationCandidateRecord>;

} // end anonymous namespace		} // end anonymous namespace

static ArrayRef<Use> GetDeoptBundleOperands(const CallBase *Call) {		static ArrayRef<Use> GetDeoptBundleOperands(const CallBase *Call) {
Optional<OperandBundleUse> DeoptBundle =		Optional<OperandBundleUse> DeoptBundle =
Call->getOperandBundle(LLVMContext::OB_deopt);		Call->getOperandBundle(LLVMContext::OB_deopt);

if (!DeoptBundle.hasValue()) {		if (!DeoptBundle.hasValue()) {
assert(AllowStatepointWithNoDeoptInfo &&		assert(AllowStatepointWithNoDeoptInfo &&
[... 1,924 lines not shown ...]	if (CIVI == CurrentIncomingValues.end())
return false;		return false;
BasicBlock *CurrentIncomingBB = CIVI->second;		BasicBlock *CurrentIncomingBB = CIVI->second;
if (CurrentIncomingBB != AlternateRootPhi.getIncomingBlock(i))		if (CurrentIncomingBB != AlternateRootPhi.getIncomingBlock(i))
return false;		return false;
}		}
return true;		return true;
}		}

// From the statepoint live set pick values that are cheaper to recompute then		// Find derived pointers that can be recomputed cheap enough and fill
// to relocate. Remove this values from the live set, rematerialize them after		// RematerizationCandidates with such candidates.
// statepoint and record them in "Info" structure. Note that similar to		static void
// relocated values we don't do any user adjustments here.		findRematerializationCandidates(PointerToBaseTy PointerToBase,
static void rematerializeLiveValues(CallBase *Call,		RematCandTy &RematerizationCandidates,
PartiallyConstructedSafepointRecord &Info,
PointerToBaseTy &PointerToBase,
TargetTransformInfo &TTI) {		TargetTransformInfo &TTI) {
const unsigned int ChainLengthThreshold = 10;		const unsigned int ChainLengthThreshold = 10;

// Record values we are going to delete from this statepoint live set.		for (auto P2B : PointerToBase) {
// We can not di this in following loop due to iterator invalidation.		auto *Derived = P2B.first;
SmallVector<Value *, 32> LiveValuesToBeDeleted;		auto *Base = P2B.second;
		// Consider only derived pointers.
		if (Derived == Base)
		continue;

for (Value *LiveValue: Info.LiveSet) {		// For each live pointer find its defining chain.
// For each live pointer find its defining chain
SmallVector<Instruction *, 3> ChainToBase;		SmallVector<Instruction *, 3> ChainToBase;
assert(PointerToBase.count(LiveValue));
Value *RootOfChain =		Value *RootOfChain =
findRematerializableChainToBasePointer(ChainToBase,		findRematerializableChainToBasePointer(ChainToBase, Derived);
LiveValue);

// Nothing to do, or chain is too long		// Nothing to do, or chain is too long
if ( ChainToBase.size() == 0 ||		if ( ChainToBase.size() == 0 ||
ChainToBase.size() > ChainLengthThreshold)		ChainToBase.size() > ChainLengthThreshold)
continue;		continue;

// Handle the scenario where the RootOfChain is not equal to the		// Handle the scenario where the RootOfChain is not equal to the
// Base Value, but they are essentially the same phi values.		// Base Value, but they are essentially the same phi values.
if (RootOfChain != PointerToBase[LiveValue]) {		if (RootOfChain != PointerToBase[Derived]) {
PHINode *OrigRootPhi = dyn_cast<PHINode>(RootOfChain);		PHINode *OrigRootPhi = dyn_cast<PHINode>(RootOfChain);
PHINode *AlternateRootPhi = dyn_cast<PHINode>(PointerToBase[LiveValue]);		PHINode *AlternateRootPhi = dyn_cast<PHINode>(PointerToBase[Derived]);
if (!OrigRootPhi || !AlternateRootPhi)		if (!OrigRootPhi || !AlternateRootPhi)
continue;		continue;
// PHI nodes that have the same incoming values, and belonging to the same		// PHI nodes that have the same incoming values, and belonging to the same
// basic blocks are essentially the same SSA value. When the original phi		// basic blocks are essentially the same SSA value. When the original phi
// has incoming values with different base pointers, the original phi is		// has incoming values with different base pointers, the original phi is
// marked as conflict, and an additional `AlternateRootPhi` with the same		// marked as conflict, and an additional `AlternateRootPhi` with the same
// incoming values get generated by the findBasePointer function. We need		// incoming values get generated by the findBasePointer function. We need
// to identify the newly generated AlternateRootPhi (.base version of phi)		// to identify the newly generated AlternateRootPhi (.base version of phi)
// and RootOfChain (the original phi node itself) are the same, so that we		// and RootOfChain (the original phi node itself) are the same, so that we
// can rematerialize the gep and casts. This is a workaround for the		// can rematerialize the gep and casts. This is a workaround for the
// deficiency in the findBasePointer algorithm.		// deficiency in the findBasePointer algorithm.
if (!AreEquivalentPhiNodes(OrigRootPhi, AlternateRootPhi))		if (!AreEquivalentPhiNodes(OrigRootPhi, AlternateRootPhi))
continue;		continue;
// Now that the phi nodes are proved to be the same, assert that
// findBasePointer's newly generated AlternateRootPhi is present in the
// liveset of the call.
assert(Info.LiveSet.count(AlternateRootPhi));
}		}
// Compute cost of this chain		// Compute cost of this chain.
InstructionCost Cost = chainToBasePointerCost(ChainToBase, TTI);		InstructionCost Cost = chainToBasePointerCost(ChainToBase, TTI);
// TODO: We can also account for cases when we will be able to remove some		// TODO: We can also account for cases when we will be able to remove some
// of the rematerialized values by later optimization passes. I.e if		// of the rematerialized values by later optimization passes. I.e if
// we rematerialized several intersecting chains. Or if original values		// we rematerialized several intersecting chains. Or if original values
// don't have any uses besides this statepoint.		// don't have any uses besides this statepoint.

		// Ok, there is a candidate.
		RematerizlizationCandidateRecord Record;
		Record.ChainToBase = ChainToBase;
		Record.RootOfChain = RootOfChain;
		Record.Cost = Cost;
		RematerizationCandidates.insert({ Derived, Record });
		}
		}

		// From the statepoint live set pick values that are cheaper to recompute then
		// to relocate. Remove this values from the live set, rematerialize them after
		// statepoint and record them in "Info" structure. Note that similar to
		// relocated values we don't do any user adjustments here.
		static void rematerializeLiveValues(CallBase *Call,
		PartiallyConstructedSafepointRecord &Info,
		PointerToBaseTy &PointerToBase,
		RematCandTy &RematerizationCandidates,
		TargetTransformInfo &TTI) {
		// Record values we are going to delete from this statepoint live set.
		// We can not di this in following loop due to iterator invalidation.
		SmallVector<Value *, 32> LiveValuesToBeDeleted;

		for (Value *LiveValue : Info.LiveSet) {
		auto It = RematerizationCandidates.find(LiveValue);
		if (It == RematerizationCandidates.end())
		continue;

		RematerizlizationCandidateRecord &Record = It->second;

		InstructionCost Cost = Record.Cost;
// For invokes we need to rematerialize each chain twice - for normal and		// For invokes we need to rematerialize each chain twice - for normal and
// for unwind basic blocks. Model this by multiplying cost by two.		// for unwind basic blocks. Model this by multiplying cost by two.
if (isa<InvokeInst>(Call)) {		if (isa<InvokeInst>(Call))
Cost *= 2;		Cost *= 2;
}
// If it's too expensive - skip it		// If it's too expensive - skip it.
if (Cost >= RematerializationThreshold)		if (Cost >= RematerializationThreshold)
continue;		continue;

// Remove value from the live set		// Remove value from the live set
LiveValuesToBeDeleted.push_back(LiveValue);		LiveValuesToBeDeleted.push_back(LiveValue);

// Clone instructions and record them inside "Info" structure		// Clone instructions and record them inside "Info" structure.

// Walk backwards to visit top-most instructions first		// For each live pointer find get its defining chain.
		SmallVector<Instruction *, 3> ChainToBase = Record.ChainToBase;
		// Walk backwards to visit top-most instructions first.
std::reverse(ChainToBase.begin(), ChainToBase.end());		std::reverse(ChainToBase.begin(), ChainToBase.end());

// Utility function which clones all instructions from "ChainToBase"		// Utility function which clones all instructions from "ChainToBase"
// and inserts them before "InsertBefore". Returns rematerialized value		// and inserts them before "InsertBefore". Returns rematerialized value
// which should be used after statepoint.		// which should be used after statepoint.
auto rematerializeChain = [&ChainToBase](		auto rematerializeChain = [&ChainToBase](
Instruction *InsertBefore, Value *RootOfChain, Value *AlternateLiveBase) {		Instruction *InsertBefore, Value *RootOfChain, Value *AlternateLiveBase) {
Instruction *LastClonedValue = nullptr;		Instruction *LastClonedValue = nullptr;
[... 43 lines not shown ...]	#endif
};		};

// Different cases for calls and invokes. For invokes we need to clone		// Different cases for calls and invokes. For invokes we need to clone
// instructions both on normal and unwind path.		// instructions both on normal and unwind path.
if (isa<CallInst>(Call)) {		if (isa<CallInst>(Call)) {
Instruction *InsertBefore = Call->getNextNode();		Instruction *InsertBefore = Call->getNextNode();
assert(InsertBefore);		assert(InsertBefore);
Instruction *RematerializedValue = rematerializeChain(		Instruction *RematerializedValue = rematerializeChain(
InsertBefore, RootOfChain, PointerToBase[LiveValue]);		InsertBefore, Record.RootOfChain, PointerToBase[LiveValue]);
Info.RematerializedValues[RematerializedValue] = LiveValue;		Info.RematerializedValues[RematerializedValue] = LiveValue;
} else {		} else {
auto *Invoke = cast<InvokeInst>(Call);		auto *Invoke = cast<InvokeInst>(Call);

Instruction *NormalInsertBefore =		Instruction *NormalInsertBefore =
&*Invoke->getNormalDest()->getFirstInsertionPt();		&*Invoke->getNormalDest()->getFirstInsertionPt();
Instruction *UnwindInsertBefore =		Instruction *UnwindInsertBefore =
&*Invoke->getUnwindDest()->getFirstInsertionPt();		&*Invoke->getUnwindDest()->getFirstInsertionPt();

Instruction *NormalRematerializedValue = rematerializeChain(		Instruction *NormalRematerializedValue = rematerializeChain(
NormalInsertBefore, RootOfChain, PointerToBase[LiveValue]);		NormalInsertBefore, Record.RootOfChain, PointerToBase[LiveValue]);
Instruction *UnwindRematerializedValue = rematerializeChain(		Instruction *UnwindRematerializedValue = rematerializeChain(
UnwindInsertBefore, RootOfChain, PointerToBase[LiveValue]);		UnwindInsertBefore, Record.RootOfChain, PointerToBase[LiveValue]);

Info.RematerializedValues[NormalRematerializedValue] = LiveValue;		Info.RematerializedValues[NormalRematerializedValue] = LiveValue;
Info.RematerializedValues[UnwindRematerializedValue] = LiveValue;		Info.RematerializedValues[UnwindRematerializedValue] = LiveValue;
}		}
}		}

// Remove rematerializaed values from the live set		// Remove rematerializaed values from the live set
for (auto LiveValue: LiveValuesToBeDeleted) {		for (auto LiveValue: LiveValuesToBeDeleted) {
[... 181 lines not shown ...]	for (auto &Info : Records) {
});		});
}		}

for (CallInst *CI : Holders)		for (CallInst *CI : Holders)
CI->eraseFromParent();		CI->eraseFromParent();

Holders.clear();		Holders.clear();

		// Compute the cost of possible re-materialization of derived pointers.
		RematCandTy RematerizationCandidates;
		findRematerializationCandidates(PointerToBase, RematerizationCandidates, TTI);

// In order to reduce live set of statepoint we might choose to rematerialize		// In order to reduce live set of statepoint we might choose to rematerialize
// some values instead of relocating them. This is purely an optimization and		// some values instead of relocating them. This is purely an optimization and
// does not influence correctness.		// does not influence correctness.
for (size_t i = 0; i < Records.size(); i++)		for (size_t i = 0; i < Records.size(); i++)
rematerializeLiveValues(ToUpdate[i], Records[i], PointerToBase, TTI);		rematerializeLiveValues(ToUpdate[i], Records[i], PointerToBase,
		RematerizationCandidates, TTI);

// We need this to safely RAUW and delete call or invoke return values that		// We need this to safely RAUW and delete call or invoke return values that
// may themselves be live over a statepoint. For details, please see usage in		// may themselves be live over a statepoint. For details, please see usage in
// makeStatepointExplicitImpl.		// makeStatepointExplicitImpl.
std::vector<DeferredReplacement> Replacements;		std::vector<DeferredReplacement> Replacements;

// Now run through and replace the existing statepoints with new ones with		// Now run through and replace the existing statepoints with new ones with
// the live variables listed. We do not yet update uses of the values being		// the live variables listed. We do not yet update uses of the values being
[... 558 lines not shown ...]