This is an archive of the discontinued LLVM Phabricator instance.

[LoopUnrolling] Re-prioritize Peeling and Partial unrolling
Closed · Public

Authored by mkazantsev on Feb 21 2017, 10:05 PM.

Details

Reviewers

reames
igor-laevsky
anna
mkuper
sanjoy
apilipenko

Commits

rGeed71b9e1ccd: [LoopUnrolling] Re-prioritize Peeling and Partial unrolling
rL296897: [LoopUnrolling] Re-prioritize Peeling and Partial unrolling

Summary

In the current implementation, loop peeling happens after trip-count-based partial unrolling and may
sometimes not happen at all because of it (for example, if the trip count is known but UP.Partial = false). This
is generally bad, all the more so because there are situations where peeling is profitable even if partial
unrolling is disabled.

This patch is an NFC change that reorders the application of peeling and partial unrolling and prepares the
code for the implementation of the said optimizations.
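For readers less familiar with the two transformations being reordered, here is a minimal source-level sketch of each. It is a hand-written analogy only: the actual pass works on LLVM IR, and the function names below are hypothetical and do not appear in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Baseline loop: sums every element.
int sumOriginal(const std::vector<int> &V) {
  int Sum = 0;
  for (std::size_t I = 0; I < V.size(); ++I)
    Sum += V[I];
  return Sum;
}

// Peeled by one: the first iteration runs ahead of the loop, guarded by the
// same exit test, and the remaining loop continues from the second element.
// No knowledge of the trip count is needed.
int sumPeeledByOne(const std::vector<int> &V) {
  int Sum = 0;
  std::size_t I = 0;
  if (I < V.size()) {
    Sum += V[I];
    ++I;
  }
  for (; I < V.size(); ++I)
    Sum += V[I];
  return Sum;
}

// Partially unrolled by two: this form is only this simple because the trip
// count (8) is a compile-time constant divisible by the unroll factor.
int sumOfEightUnrolledByTwo(const int (&A)[8]) {
  int Sum = 0;
  for (int I = 0; I < 8; I += 2) {
    Sum += A[I];
    Sum += A[I + 1];
  }
  return Sum;
}
```

All three variants compute the same result; the difference is that the peeled form makes no assumption about the trip count, while the partially unrolled form relies on a statically known one.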

Event Timeline

mkazantsev created this revision. · Feb 21 2017, 10:05 PM

Herald added a subscriber: mzolotukhin. · Feb 21 2017, 10:05 PM

mkazantsev added a child revision: D30161: [LoopUnrolling] Peel loops with invariant backedge Phi input. · Feb 21 2017, 10:06 PM

mkuper added a subscriber: mkuper. · Feb 21 2017, 10:42 PM

mkuper added inline comments.

lib/Transforms/Scalar/LoopUnrollPass.cpp
789	The current heuristic is supposed to be that we only peel when we don't know the trip count, but have an approximate one due to profile information. Partial unrolling requires a known trip count. So in the old code, they don't actually compete for priority, and this change looks like it should be NFC. I'm not sure this is completely NFC, though - I don't remember if computePeelCount actually enforces an unknown trip count, or just assumes it's unknown, because otherwise we'd quit at line 838. So you probably want this to be "if (!TripCount && UP.PeelCount)", or change computePeelCount() to bail on a known trip count. Otherwise, what you may in effect be doing is enabling peeling for loops with a constant low trip count that we chose not to unroll. Which is probably not something you want to do.

mkazantsev added inline comments. · Feb 22 2017, 3:01 AM

lib/Transforms/Scalar/LoopUnrollPass.cpp
789	In fact, even if the trip count is known, peeling can still be profitable. One such case is shown at https://reviews.llvm.org/D30161, where the Phi's back edge input is a constant. Partial unrolling does not give any benefit here, while peeling does. My understanding is the following: peeling should not set the peel count unless it has proved that peeling IS profitable. In this case nothing bad happens with small loops that we don't unroll: they will only be peeled if it is beneficial. Maybe it is reasonable to add the following restriction to peeling: TripCount should be either unknown or above some threshold (for example, the partial unrolling threshold). Does that make sense to you?
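To illustrate the case mentioned above (a Phi whose back edge input does not change between iterations), here is a source-level sketch. It is an assumption-laden analogy rather than code from D30161: the variable Prev plays the role of the Phi, and both function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Prev models the Phi: it holds Init on loop entry and the loop-invariant
// value K on every back edge.
int weightedSum(const std::vector<int> &V, int Init, int K) {
  int Prev = Init;
  int Acc = 0;
  for (int X : V) {
    Acc += X * Prev;
    Prev = K; // back edge input never changes
  }
  return Acc;
}

// After peeling one iteration, the value carried into the remaining loop is
// always K, so the Phi-like variable disappears from the loop body.
int weightedSumPeeled(const std::vector<int> &V, int Init, int K) {
  int Acc = 0;
  std::vector<int>::const_iterator It = V.begin();
  if (It != V.end()) {
    Acc += *It * Init; // peeled first iteration uses the entry value
    ++It;
  }
  for (; It != V.end(); ++It)
    Acc += *It * K; // remaining iterations use the invariant value
  return Acc;
}
```

Note that the benefit here does not depend on whether the trip count is known, which is the point being made in the comment.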

mkuper added inline comments. · Feb 22 2017, 10:22 AM

lib/Transforms/Scalar/LoopUnrollPass.cpp
789	Sorry, I wasn't clear. I understand that there are cases where peeling is profitable even when the trip count is known - D30161 is a really nice example. But right now, when we have profile information, we try to peel by the estimated number of iterations. That specific kind of peeling, combined with this patch, has two issues:
1. getLoopEstimatedTripCount() returns the estimated (based on the profile) trip count. This may be imprecise, and at this point, somewhat different from the actual trip count, especially for sampling-based FDO. So you may end up peeling by the "estimated" trip count even though you know the actual trip count. That sounds wrong.
2. Even if we fix getLoopEstimatedTripCount() to return the real trip count, when available, you still basically get "let's peel a loop by its real trip count", which is equivalent to full unrolling. In theory, it shouldn't happen, because full unrolling and peeling should be using the same thresholds, so if we decided not to fully unroll, we'll decide not to "peel" either. But I'm not sure this actually holds.
I'm not saying we should disable any kind of peeling when the real trip count is known. Only that we should disable the "peel by the estimated trip count" heuristic.
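The second issue can be seen in a small source-level sketch: peeling a loop as many times as its real trip count leaves a loop that never executes, which has the same effect as full unrolling. The function names are hypothetical, and the sequential guards are a simplification of what the pass would emit on IR.

```cpp
#include <cassert>

// Original loop with a known trip count of 3.
int sumOfThree(const int (&A)[3]) {
  int Sum = 0;
  for (int I = 0; I < 3; ++I)
    Sum += A[I];
  return Sum;
}

// The same loop peeled three times: every iteration now lives in a peeled
// copy, and the residual loop body is dead.
int sumOfThreePeeledByThree(const int (&A)[3]) {
  int Sum = 0;
  int I = 0;
  if (I < 3) { Sum += A[I]; ++I; } // peeled iteration 1
  if (I < 3) { Sum += A[I]; ++I; } // peeled iteration 2
  if (I < 3) { Sum += A[I]; ++I; } // peeled iteration 3
  for (; I < 3; ++I)               // never runs: I is already 3
    Sum += A[I];
  return Sum;
}
```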

mkazantsev updated this revision to Diff 89848. · Feb 27 2017, 1:45 AM

mkazantsev edited the summary of this revision.

mkazantsev added a reviewer: mkuper.

mkazantsev marked 3 inline comments as done.

mkuper added inline comments. · Feb 28 2017, 10:42 AM

lib/Transforms/Scalar/LoopUnrollPass.cpp
789	This patch doesn't really make sense as a stand-alone, and I'm not sure it's the right way to go as preparation to D30161 either. I would suggest something like:
1. In this patch, invert the priorities of peeling and partial unrolling, but bail out of computePeelCount when the real trip count is known. This should be NFC.
2. In D30161, change the logic of computePeelCount() to be something like:
   if conditions are right: peel by 1
   else if trip count is known: bail out
   else: check if we should peel by estimated trip count.
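The decision order suggested above could be sketched roughly as follows. This is not the real computePeelCount(): the struct, its fields, the function name, and the cap on the peel count are all hypothetical stand-ins chosen for illustration.

```cpp
#include <cassert>

// Hypothetical inputs; none of these names come from LLVM.
struct PeelQuery {
  bool PeelByOneIsProfitable;  // e.g. the invariant back edge Phi case
  unsigned KnownTripCount;     // 0 means the trip count is not known
  unsigned EstimatedTripCount; // 0 means no profile-based estimate
  unsigned MaxPeelCount;       // assumed cap on how far we may peel
};

// Returns the number of iterations to peel, or 0 to bail out.
unsigned choosePeelCount(const PeelQuery &Q) {
  // 1. If conditions are right, peel by 1 regardless of the trip count.
  if (Q.PeelByOneIsProfitable)
    return 1;
  // 2. A known trip count disables the profile-based heuristic.
  if (Q.KnownTripCount != 0)
    return 0;
  // 3. Otherwise consider peeling by the estimated trip count.
  if (Q.EstimatedTripCount != 0 && Q.EstimatedTripCount <= Q.MaxPeelCount)
    return Q.EstimatedTripCount;
  return 0;
}
```

With this ordering, the estimated trip count is only consulted when the real one is unavailable, which is the property the reviewer asks for.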
lib/Transforms/Utils/LoopUnrollPeel.cpp
75 ↗	(On Diff #89848)	If you do decide to call this twice - I'd expect this to also get hit only the second time around.

mkazantsev updated this revision to Diff 90121. · Feb 28 2017, 9:29 PM

mkazantsev marked an inline comment as done.

mkazantsev edited the summary of this revision.

mkazantsev marked an inline comment as done. · Mar 1 2017, 4:07 AM

mkazantsev added inline comments.

lib/Transforms/Utils/LoopUnrollPeel.cpp
75 ↗	(On Diff #89848)	No longer calling twice.

LGTM

This revision is now accepted and ready to land. · Mar 1 2017, 2:17 PM

Closed by commit rL296897: [LoopUnrolling] Re-prioritize Peeling and Partial unrolling (authored by sanjoy). · Mar 3 2017, 10:31 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                        Size
lib/Transforms/Scalar/LoopUnrollPass.cpp    18 lines

Diff 89322

lib/Transforms/Scalar/LoopUnrollPass.cpp

[778 lines not shown; context: if (getUnrolledLoopSize(LoopSize, UP) < UP.Threshold) {]
           TripCount = FullUnrollTripCount;
           TripMultiple = UP.UpperBound ? 1 : TripMultiple;
           return ExplicitUnroll;
         }
       }
     }
   }
 
-  // 4rd priority is partial unrolling.
+  // 4th priority is loop peeling
+  computePeelCount(L, LoopSize, UP);
+  if (UP.PeelCount) {
+    UP.Runtime = false;
+    UP.Count = 1;
+    return ExplicitUnroll;
+  }
+
+  // 5th priority is partial unrolling.
   // Try partial unroll only when TripCount could be staticaly calculated.
   if (TripCount) {
     UP.Partial |= ExplicitUnroll;
     if (!UP.Partial) {
       DEBUG(dbgs() << " will not try to unroll partially because "
                    << "-unroll-allow-partial not given\n");
       UP.Count = 0;
       return false;
[46 lines not shown; context: static bool computeUnrollCount(]
   if (PragmaFullUnroll)
     ORE->emit(
         OptimizationRemarkMissed(DEBUG_TYPE,
                                  "CantFullUnrollAsDirectedRuntimeTripCount",
                                  L->getStartLoc(), L->getHeader())
         << "Unable to fully unroll loop as directed by unroll(full) pragma "
            "because loop has a runtime trip count.");
 
-  // 5th priority is loop peeling
-  computePeelCount(L, LoopSize, UP);
-  if (UP.PeelCount) {
-    UP.Runtime = false;
-    UP.Count = 1;
-    return ExplicitUnroll;
-  }
-
   // 6th priority is runtime unrolling.
   // Don't unroll a runtime trip count loop when it is disabled.
   if (HasRuntimeUnrollDisablePragma(L)) {
     UP.Count = 0;
     return false;
   }
 
   // Check if the runtime trip count is too small when profile is available.
[358 lines not shown]