This is an archive of the discontinued LLVM Phabricator instance.

Differential D27696

[ThinLTO] Thin link efficiency: skip candidate added later with higher threshold (NFC)
ClosedPublic

Authored by tejohnson on Dec 12 2016, 8:18 PM.

Details

Reviewers

mehdi_amini

Commits

rG475b51a70004: [ThinLTO] Thin link efficiency: skip candidate added later with higher…
rL289867: [ThinLTO] Thin link efficiency: skip candidate added later with higher…

Summary

Thin link efficiency improvement. After adding an importing candidate to
the worklist we might have later added it again with a higher threshold.
Skip it when popped from the worklist if we recorded a higher threshold
than the current worklist entry; it will get processed again at the
higher threshold when that entry is popped.

This required adding the summary's GUID to the worklist, so that it can
be used to query the recorded highest threshold for it when we pop from the
worklist.
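
As a rough illustration of the pop-time check described in the summary, here is a minimal standalone sketch. It is not the patch itself: `Entry`, `drainWorklist`, and the `Highest` map are hypothetical stand-ins for the worklist tuple and the per-GUID thresholds recorded in the import list, and the summary pointer carried by the real worklist entries is omitted.

```cpp
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

using GUID = std::uint64_t;
// (GUID, threshold at which the entry was queued). Hypothetical, simplified.
using Entry = std::tuple<GUID, unsigned>;

// Pops entries from the back of the worklist and returns the ones that would
// actually be processed, in pop order. An entry is skipped when a strictly
// higher threshold has been recorded for its GUID, because a later entry
// queued at that higher threshold covers it.
std::vector<Entry> drainWorklist(std::vector<Entry> Worklist,
                                 const std::map<GUID, unsigned> &Highest) {
  std::vector<Entry> Processed;
  while (!Worklist.empty()) {
    Entry E = Worklist.back();
    Worklist.pop_back();
    GUID Id = std::get<0>(E);
    unsigned Threshold = std::get<1>(E);
    auto It = Highest.find(Id);
    if (It != Highest.end() && It->second > Threshold)
      continue; // stale entry: the same GUID was queued later at a higher threshold
    Processed.push_back(E);
  }
  return Processed;
}
```

Carrying the GUID in each entry is what makes the lookup possible at pop time, which is why the worklist element grows from a pair to a tuple in this change.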

Diff Detail

Repository: rL LLVM

Event Timeline

tejohnson updated this revision to Diff 81175.Dec 12 2016, 8:18 PM

tejohnson retitled this revision from to [ThinLTO] Thin link efficiency: skip candidate added later with higher threshold (NFC).

tejohnson updated this object.

tejohnson added a reviewer: mehdi_amini.

tejohnson added a subscriber: llvm-commits.

I'm surprised you're working on optimizing this, is this showing up on any profile?
Did you measure the actual benefit here?

In D27696#620610, @mehdi_amini wrote:

I'm surprised you're working on optimizing this, is this showing up on any profile?
Did you measure the actual benefit here?

Yes, see also D27687. I was running with a large app and also with the thresholds cranked up, and it was slowing down too much. I've shaved off at least 35% in that particular case from these improvements.

mehdi_amini added inline comments.Dec 12 2016, 8:50 PM

lib/Transforms/IPO/FunctionImport.cpp
330 ↗	(On Diff #81175)	I'm fairly lost in the logic here between `GetAdjustedThreshold` and `GetBonusMultiplier`, still trying to make sense of what we're currently doing here.

mehdi_amini added inline comments.Dec 12 2016, 9:49 PM

lib/Transforms/IPO/FunctionImport.cpp
330 ↗	(On Diff #81175)	OK I figured what's going on, it is quite obscure to infer from the code though, and I believe the comment above is misleading.
330 ↗	(On Diff #81175)	(I take back what I wrote for the comment, was looking at the wrong place... So confusing!)

lib/Transforms/IPO/FunctionImport.cpp
330 ↗	(On Diff #81175)	This code motion to use `AdjThreshold` instead of `Threshold` below seems like a standalone fix?
343 ↗	(On Diff #81175)	All this logic to skip inserting in the worklist seems redundant with the new logic you added below while iterating over the work list (the latter seems to be a superset).

tejohnson added inline comments.Dec 14 2016, 6:54 AM

lib/Transforms/IPO/FunctionImport.cpp
330 ↗	(On Diff #81175)	> I'm fairly lost in the logic here between GetAdjustedThreshold and GetBonusMultiplier, still trying to make sense of what we're currently doing here.

The difference is that GetBonusMultiplier is only applied to the current callsite, when the call is hot. It isn't recorded or passed down to the next level of calls (otherwise it would be compounded). GetAdjustedThreshold applies the decay factor for the next level of calls.

> This code motion to use AdjThreshold instead of Threshold below seems like a standalone fix?

This isn't a fix per se. By itself this part of the change should be a no op. E.g. before we would record the (non-decayed) Threshold in the ImportLists, which would then be pulled out into ProcessedThreshold and compared again to the (non-decayed) Threshold here the next time we saw this function. With this change, we instead record the decayed AdjThreshold in the ImportLists, and compare against the new decayed AdjThreshold here the next time we see this function.

The reason for this change is so that we can compare the decayed threshold recorded in the Worklist to the threshold recorded in the ImportLists further down when we iterate through the Worklist. I.e. we need to compare apples to apples there.
343 ↗	(On Diff #81175)	> All this logic to skip inserting in the worklist seems redundant with the new logic you added below while iterating over the work list (the latter seems to be a superset).

It's true that with the change I am adding below where we iterate over the work list this is no longer strictly necessary. However, there's no good reason to insert into the work list again with a lower threshold if we know we have already added this function with a higher threshold; it will just make the work list longer for no benefit, and it will require a redundant add of this GUID to the ExportList below. All of that is easily avoidable since we have to access the current threshold anyway to update it above.

The handling I added below to skip work list items when we pull them off the work list is to handle the case where we later added the function at a higher threshold (and it was already in the work list at the earlier lower threshold).
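
To make the GetBonusMultiplier / GetAdjustedThreshold distinction from the first reply above more concrete, here is a small sketch under stated assumptions. The names `Limits` and `computeLimits` and both constants are hypothetical and chosen only for illustration; in particular, a single decay factor is assumed for hot and cold callsites, which may not match the real code.

```cpp
// Hypothetical values for illustration only; not taken from the patch.
constexpr unsigned HotBonusMultiplier = 3; // applied at a hot callsite only
constexpr unsigned DecayPercent = 70;      // applied once per call level

struct Limits {
  unsigned Callsite;  // limit used to decide on importing this callee
  unsigned NextLevel; // limit passed down to the callee's own calls
};

// Threshold is the limit inherited from the caller. The hot bonus affects
// only the callsite limit; the next-level limit is derived from the
// un-boosted Threshold, so the bonus is never compounded across levels.
Limits computeLimits(unsigned Threshold, bool IsHotCallsite) {
  unsigned Bonus = IsHotCallsite ? HotBonusMultiplier : 1;
  return {Threshold * Bonus, Threshold * DecayPercent / 100};
}
```

With these made-up numbers, a hot callsite reached at a limit of 100 is judged against 300 while its callees inherit 70. Had the bonus been carried down, they would inherit 210 and the bonus would stack at every hot level.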

mehdi_amini added inline comments.Dec 14 2016, 8:45 AM

lib/Transforms/IPO/FunctionImport.cpp
330 ↗	(On Diff #81175)	> The difference is that GetBonusMultiplier is only applied to the current callsite, when the call is hot. It isn't recorded or passed down to the next level of calls (otherwise it would be compounded). GetAdjustedThreshold applies the decay factor for the next level of calls.

Yeah I got it in the end, still not straightforward from the code. I thought about improving it but can't figure out how :) A high-level description may be helpful, I'll think about it.

> This isn't a fix per se. By itself this part of the change should be a no op. E.g. before we would record the (non-decayed) Threshold in the ImportLists, which would then be pulled out into ProcessedThreshold and compared again to the (non-decayed) Threshold here the next time we saw this function. With this change, we instead record the decayed AdjThreshold in the ImportLists, and compare against the new decayed AdjThreshold here the next time we see this function.

I was considering the case where a function can be reached from a hot path or from a cold path, and it didn't seem like an NFC change. For instance:

- first visit with a cold edge and a threshold of 100 -> set ProcessedThreshold to 100 and push the decayed threshold, let's say 70, on the stack.
- visit with a hot edge and a threshold of 99 -> compare to the previous ProcessedThreshold and decide not to push on the stack.

With your change, I believe we would:

- first visit with the cold edge with a threshold of 100 -> set ProcessedThreshold to the decayed threshold of 70 and push it on the stack.
- visit with the hot edge and a threshold of 99 -> compare to the previous ProcessedThreshold and decide to set ProcessedThreshold to the decayed threshold of 99 and push it on the stack.
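
The two scenarios above can be replayed with a small sketch. Everything here is an assumption made to reproduce the numbers in the example (100 decaying to 70 on a cold edge, 99 staying at 99 on a hot edge): the `Policy` enum, `visit`, `decayed`, and both percentages are hypothetical and are not the code under review.

```cpp
// Illustrative decay percentages, picked to match the example's numbers.
constexpr unsigned ColdDecayPercent = 70;
constexpr unsigned HotDecayPercent = 100;

unsigned decayed(unsigned Threshold, bool IsHot) {
  return Threshold * (IsHot ? HotDecayPercent : ColdDecayPercent) / 100;
}

enum class Policy {
  RecordCallsiteThreshold, // behaviour before the change: record the non-decayed value
  RecordDecayedThreshold   // behaviour after the change: record the decayed value
};

// Simulates one visit of a callee. Recorded is the threshold remembered from
// earlier visits (0 means "not seen yet"). Returns true if the callee would
// be pushed on the worklist again.
bool visit(Policy P, unsigned Threshold, bool IsHot, unsigned &Recorded) {
  unsigned Candidate = P == Policy::RecordCallsiteThreshold
                           ? Threshold
                           : decayed(Threshold, IsHot);
  if (Recorded && Recorded >= Candidate)
    return false; // already seen at an equal or higher recorded threshold
  Recorded = Candidate;
  return true;
}
```

Under the first policy, the cold visit records 100 and the later hot visit at 99 is rejected. Under the second, the cold visit records 70 and the hot visit at 99 is pushed again, which is the behavioural difference pointed out in the comment.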

tejohnson added inline comments.Dec 14 2016, 8:59 AM

lib/Transforms/IPO/FunctionImport.cpp
330 ↗	(On Diff #81175)	Gah - you're right! This change indeed fixes a bug in the logic, where we could have been missing imports along some hot paths. I will come up with a test case that is affected by this change. Do you want me to split this into a different patch on Phab, or is it enough to commit separately with a test case? Regarding the confusing handling of different bonuses, let me add a comment where we compute these two different adjustments.

LGTM.

(I'm fine with you committing separately without a new revision on Phab)

This revision is now accepted and ready to land.Dec 14 2016, 9:13 AM

tejohnson mentioned this in rL289843: [ThinLTO] Ensure callees get hot threshold when first seen on cold path.Dec 15 2016, 10:31 AM

Remove non-NFC change split out and committed in r289843.

tejohnson updated this object.Dec 15 2016, 11:14 AM

Closed by commit rL289867: [ThinLTO] Thin link efficiency: skip candidate added later with higher… (authored by tejohnson).Dec 15 2016, 12:58 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Transforms/IPO/FunctionImport.cpp	17 lines

Diff 81641

llvm/trunk/lib/Transforms/IPO/FunctionImport.cpp

[260 lines not shown; enclosing context: static void exportGlobalInModule(const ModuleSummaryIndex &Index,]

   auto *Summary = FindGlobalSummaryInModule(GUID);
   if (!Summary)
     return;
   // We found it in the current module, mark as exported
   ExportList.insert(GUID);
 }

-using EdgeInfo = std::pair<const FunctionSummary *, unsigned /* Threshold */>;
+using EdgeInfo = std::tuple<const FunctionSummary *, unsigned /* Threshold */,
+                            GlobalValue::GUID>;

 /// Compute the list of functions to import for a given caller. Mark these
 /// imported functions and the symbols they reference in their source module as
 /// exported from their source module.
 static void computeImportForFunction(
     const FunctionSummary &Summary, const ModuleSummaryIndex &Index,
     const unsigned Threshold, const GVSummaryMapTy &DefinedGVSummaries,
     SmallVectorImpl<EdgeInfo> &Worklist,

[79 lines not shown; enclosing context: if (ExportLists) {]

         for (auto &Ref : ResolvedCalleeSummary->refs()) {
           auto GUID = Ref.getGUID();
           exportGlobalInModule(Index, ExportModulePath, GUID, ExportList);
         }
       }
     }

     // Insert the newly imported function to the worklist.
-    Worklist.emplace_back(ResolvedCalleeSummary, AdjThreshold);
+    Worklist.emplace_back(ResolvedCalleeSummary, AdjThreshold, GUID);
   }
 }

 /// Given the list of globals defined in a module, compute the list of imports
 /// as well as the list of "exports", i.e. the list of symbols referenced from
 /// another module (that may require promotion).
 static void ComputeImportForModule(
     const GVSummaryMapTy &DefinedGVSummaries, const ModuleSummaryIndex &Index,

[17 lines not shown; enclosing context: for (auto &GVSummary : DefinedGVSummaries) {]

     computeImportForFunction(*FuncSummary, Index, ImportInstrLimit,
                              DefinedGVSummaries, Worklist, ImportList,
                              ExportLists);
   }

   // Process the newly imported functions and add callees to the worklist.
   while (!Worklist.empty()) {
     auto FuncInfo = Worklist.pop_back_val();
-    auto *Summary = FuncInfo.first;
-    auto Threshold = FuncInfo.second;
+    auto *Summary = std::get<0>(FuncInfo);
+    auto Threshold = std::get<1>(FuncInfo);
+    auto GUID = std::get<2>(FuncInfo);
+
+    // Check if we later added this summary with a higher threshold.
+    // If so, skip this entry.
+    auto ExportModulePath = Summary->modulePath();
+    auto &LatestProcessedThreshold = ImportList[ExportModulePath][GUID];
+    if (LatestProcessedThreshold > Threshold)
+      continue;

     computeImportForFunction(*Summary, Index, Threshold, DefinedGVSummaries,
                              Worklist, ImportList, ExportLists);
   }
 }

 } // anonymous namespace

[431 lines not shown]