This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/lib/Transforms/Scalar/
-
lib/
-
Transforms/
-
Scalar/
-
LoopDeletion.cpp

Differential D108848

[LoopDeletion] Separate logic in breakBackedgeIfNotTaken using symboic max trip count [nfc]
ClosedPublic

Authored by reames on Aug 27 2021, 2:40 PM.

Download Raw Diff

Details

Reviewers

nikic
fhahn
mkazantsev
lebedev.ri

Commits

rG3af8a11bc66b: [LoopDeletion] Separate logic in breakBackedgeIfNotTaken using symboic max trip…

Summary

As mentioned in D108833, the logic for figuring out if a backedge is dead was somewhat interwoven with the SCEV based logic and the symbolic eval logic. This is my attempt at making the code easier to follow.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

reames created this revision.Aug 27 2021, 2:40 PM

Herald added subscribers: javed.absar, bollu, hiraditya, mcrosier. · View Herald TranscriptAug 27 2021, 2:40 PM

reames requested review of this revision.Aug 27 2021, 2:40 PM

Herald added a project: Restricted Project. · View Herald TranscriptAug 27 2021, 2:40 PM

Harbormaster completed remote builds in B121554: Diff 369191.Aug 27 2021, 4:22 PM

This looks reasonable to me, but doesn't seem to be NFC in practice. Your previous change in D108833 resulted in a minor code size decrease in CMakeFiles/7zip-benchmark.dir/CPP/7zip/UI/Common/ArchiveCommandLine.cpp.o and CMakeFiles/7zip-benchmark.dir/CPP/7zip/UI/Console/Main.cpp.o at O3, and it looks like this change restores the previous size. I strongly suspect that this is just another SCEV invalidation artifact though.

LGTM, but I'm also not sure if it's an NFC.

This revision is now accepted and ready to land.Aug 29 2021, 9:55 PM

In D108848#2970549, @nikic wrote:

This looks reasonable to me, but doesn't seem to be NFC in practice. Your previous change in D108833 resulted in a minor code size decrease in CMakeFiles/7zip-benchmark.dir/CPP/7zip/UI/Common/ArchiveCommandLine.cpp.o and CMakeFiles/7zip-benchmark.dir/CPP/7zip/UI/Console/Main.cpp.o at O3, and it looks like this change restores the previous size. I strongly suspect that this is just another SCEV invalidation artifact though.

My best guess here is that we have a case where the trip count logic is able to compute a constant maximum of zero, but for which the exact symbolic trip count is not zero. I am uncomfortable landing this with the regression noted because I don't know how broadly this triggers.

Any chance you're setup to easily capture the IR (even just the whole module) for one of these? I'd prefer not to have to get the test-suite setup again, it tends to be pretty time consuming.

I think I found the difference, see D108921.

In D108848#2972327, @reames wrote:

In D108848#2970549, @nikic wrote:

This looks reasonable to me, but doesn't seem to be NFC in practice. Your previous change in D108833 resulted in a minor code size decrease in CMakeFiles/7zip-benchmark.dir/CPP/7zip/UI/Common/ArchiveCommandLine.cpp.o and CMakeFiles/7zip-benchmark.dir/CPP/7zip/UI/Console/Main.cpp.o at O3, and it looks like this change restores the previous size. I strongly suspect that this is just another SCEV invalidation artifact though.

My best guess here is that we have a case where the trip count logic is able to compute a constant maximum of zero, but for which the exact symbolic trip count is not zero. I am uncomfortable landing this with the regression noted because I don't know how broadly this triggers.

Any chance you're setup to easily capture the IR (even just the whole module) for one of these? I'd prefer not to have to get the test-suite setup again, it tends to be pretty time consuming.

Here's the unoptimized IR for both files: https://gist.github.com/nikic/3df8cd975e6a59d287dc00687be1ce27 opt -O3 reproduces the difference.

In D108848#2972459, @nikic wrote:

Here's the unoptimized IR for both files: https://gist.github.com/nikic/3df8cd975e6a59d287dc00687be1ce27 opt -O3 reproduces the difference.

Thanks. It turns out my guess didn't cover the cases seen in practice, bugpointing and reducing now.

Digging into the ArchiveCommandLine.ll case, the symbolic expression we compute (instead of the constant max zero) is (-1 + (zext i32 (1 umax (1 smin %29)) to i64))<nsw>. This comes from the howFarToZero codepath. I tried bugpointing this, and the reduced example seems to depend on overflow logic again, but the reduced example is different enough that I'm not 100% sure the unreduced case is the same. It could simply be a missing combine. Either way, we end up with another case where max == 0, and exact = <some symbolic possibly zero value>.

Given this seems to be a broader pattern, I tried tweaking the constructor of ExitLimit to set the exact trip count to zero when the max trip count is zero. Oddly, that causes regression in some Thumb2 code placement tests. I have no idea why just yet.

My proposal is to land D108921 (once reviewed), then understand the Thumb regression, fix that, and then sink the zero check into ExitLimit, then land this patch.

reames mentioned this in rG6600e1759be1: [SCEV] If max BTC is zero, then so is the exact BTC [1 of N].Aug 31 2021, 8:50 AM

reames mentioned this in rGc49503a76db5: [SCEV] Add a testcase for zero max btc with non-constant exact btc.Aug 31 2021, 11:00 AM

D109015 should address the code size differences found by Nikita. Once that has landed, I will also land this one as it's already been LGTMed.

Closed by commit rG3af8a11bc66b: [LoopDeletion] Separate logic in breakBackedgeIfNotTaken using symboic max trip… (authored by reames). · Explain WhySep 1 2021, 1:31 PM

This revision was automatically updated to reflect the committed changes.

reames added a commit: rG3af8a11bc66b: [LoopDeletion] Separate logic in breakBackedgeIfNotTaken using symboic max trip….

Not NFC?

https://bugs.llvm.org/show_bug.cgi?id=52535

In D108848#3138354, @xbolva00 wrote:

Not NFC?

https://bugs.llvm.org/show_bug.cgi?id=52535

Took a quick look at this. No obvious relation, I'd guess some type of pass ordering and/or analysis invalidation. If you can reduce me something which changes behavior with this pass, I'll take another look.

In D108848#3147643, @reames wrote:

In D108848#3138354, @xbolva00 wrote:

Not NFC?

https://bugs.llvm.org/show_bug.cgi?id=52535

Took a quick look at this. No obvious relation, I'd guess some type of pass ordering and/or analysis invalidation. If you can reduce me something which changes behavior with this pass, I'll take another look.

Looks like another potential code change has been tracked down to this commit.

For the reproducer below, loop-deletion replaces the branch in for.inc8 with an unconditional one with the patch reverted:

> for.inc8:                                         ; preds = %for.inc8.loopexit, %for.cond4thread-pre-split
31c40
<   br i1 %exitcond.not, label %cleanup, label %for.body, !llvm.loop !11
---
>   br label %cleanup

@a = internal unnamed_addr global i1 false, align 4
@b = internal unnamed_addr global i32 0, align 4

define i32 @main() {
entry:
  br label %for.body

for.body:                                         ; preds = %entry, %for.inc8
  %d.015 = phi i32 [ 0, %entry ], [ %inc9, %for.inc8 ]
  %.b = load i1, i1* @a, align 4
  %conv1 = select i1 %.b, i32 -128, i32 1
  %cmp2.not = icmp ugt i32 %conv1, %d.015
  br i1 %cmp2.not, label %cleanup, label %for.cond4thread-pre-split

for.cond4thread-pre-split:                        ; preds = %for.body
  %.pr = load i32, i32* @b, align 4
  %cmp514 = icmp slt i32 %.pr, 1
  br i1 %cmp514, label %for.body7, label %for.inc8

for.body7:                                        ; preds = %for.cond4thread-pre-split, %for.body7
  tail call void @foo() #2
  %0 = load i32, i32* @b, align 4
  %inc = add nsw i32 %0, 1
  store i32 %inc, i32* @b, align 4
  %cmp5 = icmp slt i32 %0, 0
  br i1 %cmp5, label %for.body7, label %for.inc8, !llvm.loop !9

for.inc8:                                         ; preds = %for.body7, %for.cond4thread-pre-split
  %inc9 = add nuw nsw i32 %d.015, 1
  %exitcond.not = icmp eq i32 %inc9, 19
  br i1 %exitcond.not, label %cleanup, label %for.body, !llvm.loop !11

cleanup:                                          ; preds = %for.body, %for.inc8
  store i1 true, i1* @a, align 4
  ret i32 0
}

declare void @foo() local_unnamed_addr #1

!9 = distinct !{!9, !10}
!10 = !{!"llvm.loop.mustprogress"}
!11 = distinct !{!11, !10}

I still don't see anything obvious, but am going to take a deeper look. If I don't find a cause for the just-reported regression, I'll revert the original change.

Well, it turns out the answer is that I'm a bloody idiot. The code is definitely wrong, not sure why I didn't notice this before.

The issue is that we use isKnownNonZero to filter the canProveOnFirstIteration call. This filtering depends on a *lower bound* of the actual trip count. My change passed an *upper bound*. That results in us failing to call canProveOnFirstIteration for cases we can prove a loop exit is taken.

Working on a fix now. May end up just reverting depending on whether the fix is reasonable clean on the newer code or not.

reames added a reverting change: rG523573e90ddb: [LoopDeletion] Revert 3af8a11 and add test coverage for breakage.Jan 17 2022, 11:44 AM

Change reverted. I added a test case to demonstrate the issue with this patch so that we don't make the same mistake in the future.

In retrospect, I clearly should have reverted this earlier. My apologies.

Revision Contents

Path

Size

llvm/

lib/

Transforms/

Scalar/

LoopDeletion.cpp

26 lines

Diff 370042

llvm/lib/Transforms/Scalar/LoopDeletion.cpp

	Show First 20 Lines • Show All 387 Lines • ▼ Show 20 Lines
	breakBackedgeIfNotTaken(Loop *L, DominatorTree &DT, ScalarEvolution &SE,			breakBackedgeIfNotTaken(Loop *L, DominatorTree &DT, ScalarEvolution &SE,
	LoopInfo &LI, MemorySSA *MSSA,			LoopInfo &LI, MemorySSA *MSSA,
	OptimizationRemarkEmitter &ORE) {			OptimizationRemarkEmitter &ORE) {
	assert(L->isLCSSAForm(DT) && "Expected LCSSA!");			assert(L->isLCSSAForm(DT) && "Expected LCSSA!");

	if (!L->getLoopLatch())			if (!L->getLoopLatch())
	return LoopDeletionResult::Unmodified;			return LoopDeletionResult::Unmodified;

	auto *BTCMax = SE.getConstantMaxBackedgeTakenCount(L);			auto *BTC = SE.getSymbolicMaxBackedgeTakenCount(L);
	if (!BTCMax->isZero()) {			if (BTC->isZero()) {
	auto *BTC = SE.getBackedgeTakenCount(L);			// SCEV knows this backedge isn't taken!
	if (!BTC->isZero()) {			breakLoopBackedge(L, DT, SE, LI, MSSA);
	if (!isa<SCEVCouldNotCompute>(BTC) && SE.isKnownNonZero(BTC))			return LoopDeletionResult::Deleted;
	return LoopDeletionResult::Unmodified;
	if (!canProveExitOnFirstIteration(L, DT, LI))
	return LoopDeletionResult::Unmodified;
	}
	}			}

				// If SCEV leaves open the possibility of a zero trip count, see if
				// symbolically evaluating the first iteration lets us prove the backedge
				// unreachable.
				if (isa<SCEVCouldNotCompute>(BTC) \|\| !SE.isKnownNonZero(BTC))
				if (canProveExitOnFirstIteration(L, DT, LI)) {
	breakLoopBackedge(L, DT, SE, LI, MSSA);			breakLoopBackedge(L, DT, SE, LI, MSSA);
	return LoopDeletionResult::Deleted;			return LoopDeletionResult::Deleted;
	}			}

				return LoopDeletionResult::Unmodified;
				}

	/// Remove a loop if it is dead.			/// Remove a loop if it is dead.
	///			///
	/// A loop is considered dead either if it does not impact the observable			/// A loop is considered dead either if it does not impact the observable
	/// behavior of the program other than finite running time, or if it is			/// behavior of the program other than finite running time, or if it is
	/// required to make progress by an attribute such as 'mustprogress' or			/// required to make progress by an attribute such as 'mustprogress' or
	/// 'llvm.loop.mustprogress' and does not make any. This may remove			/// 'llvm.loop.mustprogress' and does not make any. This may remove
	/// infinite loops that have been required to make progress.			/// infinite loops that have been required to make progress.
	///			///
	▲ Show 20 Lines • Show All 169 Lines • Show Last 20 Lines