This is an archive of the discontinued LLVM Phabricator instance.

Differential D91562

[LSR] Drop poison flags for post-inc IVs (PR46943)
Needs Review · Public

Authored by pnkfelix on Nov 16 2020, 1:09 PM.

This revision needs review, but there are no reviewers specified.

Details

Reviewers: None

Summary

Includes a generalized regression test that puts the icmp and the conditional br in distinct blocks, and an update to an existing test to reflect the corrected LSR transformation.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	440 ms	linux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp
	60 ms	linux > SanitizerCommon-Unit._/Sanitizer-x86_64-Test::SanitizerCommon.StackDepotPrint

Event Timeline

pnkfelix created this revision. Nov 16 2020, 1:09 PM

Herald added a project: Restricted Project. Nov 16 2020, 1:09 PM

Herald added subscribers: llvm-commits, hiraditya.

pnkfelix requested review of this revision. Nov 16 2020, 1:09 PM

jrmuizel added a subscriber: jrmuizel. Nov 16 2020, 1:24 PM

Harbormaster completed remote builds in B78998: Diff 305580. Nov 16 2020, 1:42 PM

pnkfelix added inline comments. Nov 16 2020, 1:51 PM

llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp, line 2401:
thanks lint! (old habits die hard.) will fix.

nikic retitled this revision from "Fix for LLVM bug 46943." to "[LSR] Drop poison flags for post-inc IVs (PR46943)". Nov 17 2020, 11:48 AM

nikic added a comment.

The approach here doesn't look right to me. Poison flags have no relation to the position where a certain instruction is executed. What's relevant is where the instruction is used.

This transform adds a new poison-is-UB user to the post-inc IV. Wherever that code is, that is the place that should drop the poison flags. The code is pretty convoluted, though, so I didn't immediately spot where that happens.

pnkfelix added a comment.

In D91562#2400598, @nikic wrote:

> The approach here doesn't look right to me. Poison flags have no relation to the position where a certain instruction is executed. What's relevant is where the instruction is used.
>
> This transform adds a new poison-is-UB user to the post-inc IV. Wherever that code is, should be the place that drops poison flags. The code is pretty convoluted though, so I didn't immediately spot where that happens.

The main new poison-is-UB user I know of is the TermBr itself, which will immediately follow Cond after the transformation. Based on what the transformation is doing, I guess *all* users of the Cond are potentially poison-is-UB users.

That is, for the specific bug that I'm recreating: (1.) the addition now happens before the Cond instead of after (and still overflows; it always overflowed), (2.) the poison value flows into the Cond, yielding poison, and (3.) the poison from the Cond flows into the TermBr, yielding UB.
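Steps (1) to (3) can be made concrete with a small C++ model of the test case. This is a sketch under loud assumptions: `std::optional` with `std::nullopt` stands in for a poison value, `%x.5 = add i8 %x.4, 0` is folded away as the identity, and the helper names (`addI8`, `icmpEq`, `runLoop`) are hypothetical, not LLVM APIs. It shows the same overflowing `add nuw` being harmless while only the pre-increment value feeds the exit test, becoming UB once the test reads the post-increment value, and being fine again once `nuw` is dropped.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy model, not LLVM's semantics: std::nullopt stands for a poison value.
using MaybeI8 = std::optional<uint8_t>;
using MaybeBool = std::optional<bool>;

// `add [nuw] i8 a, b`: with nuw, unsigned overflow yields poison;
// without nuw, the result simply wraps modulo 256.
MaybeI8 addI8(MaybeI8 a, uint8_t b, bool nuw) {
  if (!a) return std::nullopt;  // poison operand -> poison result
  unsigned sum = unsigned(*a) + unsigned(b);
  if (nuw && sum > 0xFF) return std::nullopt;
  return uint8_t(sum);
}

// `icmp eq i8 a, b`: a poison operand makes the i1 result poison.
MaybeBool icmpEq(MaybeI8 a, uint8_t b) {
  if (!a) return std::nullopt;
  return *a == b;
}

enum class Outcome { Exited, UndefinedBehavior };

// Runs the loop of dont_move_nowrap_op_before_test.ll.
//   testUsesPostInc == false: exit test is `icmp eq %x.2, -1` (before LSR)
//   testUsesPostInc == true:  exit test reads the post-inc value (after LSR)
Outcome runLoop(bool testUsesPostInc, bool keepNuw) {
  uint8_t x2 = 0;  // %x.2 starts at 0
  while (true) {
    MaybeI8 x4 = addI8(x2, 1, keepNuw);  // %x.4 = add [nuw] i8 %x.2, 1
    MaybeBool exitTaken =
        testUsesPostInc ? icmpEq(x4, 0) : icmpEq(x2, 0xFF);
    if (!exitTaken) return Outcome::UndefinedBehavior;  // br i1 poison is UB
    if (*exitTaken) return Outcome::Exited;
    assert(x4.has_value());  // only reached while %x.4 is not poison
    x2 = *x4;
  }
}
```

With these definitions, `runLoop(false, true)` exits normally, `runLoop(true, true)` reports undefined behavior on the final iteration, and `runLoop(true, false)` exits normally again.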

I considered starting from the TermBr, and tracing backwards to find every operation that flows into its computation. But I was concerned that would be overly conservative and potentially introduce regressions. I considered the change I posted to be the simplest change I could make that would still fix the bug.

I suppose I could try to trace the value flow backwards from the TermBr, but only remove the nowrap flags from the instructions that are *dominated* by the Cond. Would that be preferable to you, @nikic?
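A minimal sketch of that alternative, on a hypothetical single-block toy IR rather than LLVM's `Instruction` or `DominatorTree` APIs. Assumption: within one straight-line block, "dominated by the Cond" is approximated as "placed after the Cond's original position". `ToyInst` and `dropNowrapOnFlowAfterCond` are illustrative names only.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical single-block toy IR (not LLVM's API). Operands are indices of
// the defining instructions within the same block.
struct ToyInst {
  std::string name;
  std::vector<int> operands;
  bool nowrap;
};

// Walk the value flow backwards from the value the moved compare will read
// (rootIdx), and drop nowrap only on instructions placed after the compare's
// original position (condIdx), standing in for "dominated by the Cond".
void dropNowrapOnFlowAfterCond(std::vector<ToyInst> &block, int condIdx,
                               int rootIdx) {
  std::vector<int> worklist{rootIdx};
  std::set<int> visited;
  while (!worklist.empty()) {
    int idx = worklist.back();
    worklist.pop_back();
    if (!visited.insert(idx).second)
      continue;  // each instruction is processed once
    if (idx > condIdx)
      block[idx].nowrap = false;
    for (int op : block[idx].operands)
      worklist.push_back(op);  // continue backwards through the operands
  }
}
```

For a block `a; cond(a); b = add nuw a, 1; c = add nuw b, 0; d = mul nuw a, 2` where the moved compare reads `c`, the sketch clears the flags on `c` and `b` but keeps them on `a` (placed before the compare) and `d` (not in the value flow). Whether keeping `a`'s flag is sound is exactly what the review is debating; the sketch only shows the mechanics of the proposal.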

aqjune added a subscriber: aqjune. Nov 27 2020, 2:14 PM

> That is, for the specific bug that I'm recreating: (1.) the addition now happens before the Cond instead of after (and still overflows; it always overflowed), (2.) the poison value flows into the Cond, yielding poison, and (3.) the poison from the Cond flows into the TermBr, yielding UB.

To be explicit here: the issue is not whether the addition happens before or after the comparison. In fact, you can freely interchange these instructions, as they have no sequencing constraint. I expect that if you swap x.3 and x.4 in your test case, your fix will no longer work, even though dropping nuw is equally necessary in that case. The problematic part is the new user of the postinc IV.
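The swap argument can be checked on a toy model. The sketch below is hypothetical and a single-block simplification (a `std::vector` of named instructions, not LLVM IR); it compares the position-based rule from this revision with a use-based rule that clears the flags on whichever post-inc value gains the new user, regardless of where it sits.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy block (not LLVM's API): instructions in program order.
struct Inst {
  std::string name;
  bool nowrap;
};

// Position-based rule, as in this revision: clear nowrap on everything
// strictly between the compare's old position and the terminator.
void dropByPosition(std::vector<Inst> &bb, std::size_t condIdx,
                    std::size_t termIdx) {
  for (std::size_t i = condIdx + 1; i < termIdx; ++i)
    bb[i].nowrap = false;
}

// Use-based rule, as suggested in review: clear nowrap on the post-inc value
// that gains a new (potentially UB-on-poison) user, wherever it is placed.
void dropByUse(std::vector<Inst> &bb, const std::string &postIncName) {
  for (Inst &inst : bb)
    if (inst.name == postIncName)
      inst.nowrap = false;
}

bool hasNowrap(const std::vector<Inst> &bb, const std::string &name) {
  for (const Inst &inst : bb)
    if (inst.name == name)
      return inst.nowrap;
  return false;
}
```

With the test's order (`x.3` before `x.4`), both rules clear `nuw` on `x.4`. With `x.4` moved above `x.3`, nothing lies between the compare and the terminator, so the position-based rule leaves `nuw` in place while the use-based rule still clears it.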

I would expect the patch for this to look closer to something like this:
https://gist.github.com/nikic/7e1301a81542d95953a44243c8ee5b5e

That is, drop the poison flags when SCEV expansion reuses a postinc value in some form, on the assumption that the new user might have UB-on-poison semantics.

The programUndefinedIfPoison() check there is supposed to avoid dropping poison flags if there is an existing UB-on-poison user, in particular if the loop is already in postinc form. Unfortunately this doesn't actually work right now, because getGuaranteedNonPoisonOps() doesn't take into account branch instructions :/
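The suggested shape can be sketched as a toy model. Assumptions, stated plainly: `PostIncValue`, `UseKind`, `alreadyUBIfPoison`, and `reuseForNewUser` are hypothetical names. `alreadyUBIfPoison` only stands in for the role `programUndefinedIfPoison()` plays in the gist, and it ignores whether the existing user actually executes whenever the new one does. Unlike LLVM at the time of this comment, the toy treats a branch-condition user as UB-on-poison, so it models the intended behavior rather than the behavior described above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical classification of a user of the post-inc value.
enum class UseKind {
  PropagatesOnly,  // e.g. another add: poison in, poison out, no UB
  UBOnPoison,      // e.g. feeds a conditional branch (directly or via icmp)
};

// Hypothetical stand-in for a post-inc IV instruction and its users.
struct PostIncValue {
  bool nuw = true;
  bool nsw = true;
  std::vector<UseKind> users;
};

// Toy stand-in for the programUndefinedIfPoison() guard: if an existing user
// already makes poison UB, the flags were already relied upon.
bool alreadyUBIfPoison(const PostIncValue &v) {
  for (UseKind k : v.users)
    if (k == UseKind::UBOnPoison)
      return true;
  return false;
}

// When expansion reuses the post-inc value for a new user, assume the new
// user might be UB-on-poison and drop the flags unless that was already so.
void reuseForNewUser(PostIncValue &v) {
  if (!alreadyUBIfPoison(v)) {
    v.nuw = false;
    v.nsw = false;
  }
  v.users.push_back(UseKind::UBOnPoison);
}
```

A value whose only user propagates poison loses its flags on reuse; a value that already feeds a branch (the "loop already in postinc form" case) keeps them.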

Revision Contents

- llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp: 81 lines
- llvm/test/Transforms/LoopStrengthReduce/X86/nested-loop.ll: 2 lines
- llvm/test/Transforms/LoopStrengthReduce/dont_move_nowrap_op_before_test.ll: 41 lines

Diff 305580

llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp

@@ (2,344 lines collapsed) @@ ICmpInst *LSRInstance::OptimizeMax(ICmpInst *Cond, IVStrideUse* &CondUse) {
   Instruction *Cmp = cast<Instruction>(Sel->getOperand(0));
   Cond->eraseFromParent();
   Sel->eraseFromParent();
   if (Cmp->use_empty())
     Cmp->eraseFromParent();
   return NewCond;
 }

+// Remove any nowrap flags that occur on instructions in the control flow
+// between current position of Cond and its planned new position.
+//
+// This is to account for situations where an increment that occurs after the
+// Cond was allowed to wrap when its (poison) value could not possibly flow into
+// the Cond, but with the Cond's new position after LSR, the poison does flow
+// into the Cond, yielding undefined behavior.
+void removeNoWrapBetweenCondAndTermBr(ICmpInst *Cond, BranchInst *TermBr) {
+  // In the common case, the Cond and the TermBr will be in the same block...
+  auto FinalIP = BasicBlock::iterator(TermBr);
+  bool SameBlock = true;
+  // ... but this does not hold in general
+  if (Cond->getParent() != TermBr->getParent()) {
+    // initial processing instead stops with the Cond block's end instruction,
+    // and we'll deal with the rest of the instructions below.
+    FinalIP = Cond->getParent()->end();
+    SameBlock = false;
+  }
+
+  auto InterimIP = BasicBlock::iterator(Cond);
+  ++InterimIP;
+  while (InterimIP != FinalIP) {
+    Instruction *InterimInst = &*InterimIP;
+    LLVM_DEBUG(dbgs() << " Will be moving Cond: "; Cond->print(dbgs());
+               dbgs() << " past InterimInst: "; InterimInst->print(dbgs());
+               dbgs() << " so, drop its poison generating flags.\n");
+    InterimInst->dropPoisonGeneratingFlags();
+    ++InterimIP;
+  }
+
+  // Easy and common case: We're done.
+  if (SameBlock)
+    return;
+
+  // Otherwise: traverse successors to find remaining instructions between Cond
+  // and TermBr.
+  SmallVector<BasicBlock *, 4> Worklist;
+  SmallPtrSet<BasicBlock *, 4> Visited;
+  for (BasicBlock *Succ : successors(Cond->getParent())) {
+    Visited.insert(Succ);
+    Worklist.push_back(Succ);
+  }
+
+  while (!Worklist.empty()) {
+    BasicBlock *BB = Worklist.pop_back_val();
+
+    // For every BB that doesn't have TermBr, we'll need to keep processing its
+    // successors.
+    bool add_successors = true;
    [Inline comment] Lint (pre-merge checks, clang-tidy): warning: invalid case style for variable 'add_successors' [readability-identifier-naming]
    [Reply] pnkfelix (Author): thanks lint! (old habits die hard.) will fix.
+
+    for (Instruction &InterimInst : *BB) {
+      if (&InterimInst != TermBr) {
+        LLVM_DEBUG(dbgs() << " Will be moving Cond: "; Cond->print(dbgs());
+                   dbgs() << " past InterimInst: "; InterimInst.print(dbgs());
+                   dbgs() << " so, drop its poison generating flags.\n");
+        InterimInst.dropPoisonGeneratingFlags();
+      } else {
+        // found the terminating condition; no need to process remainder of
+        // this block (nor its successors). But we may have other blocks that
+        // do still need processing in our worklist, so cannot just return here.
+        add_successors = false;
+        break;
+      }
+    }
+
+    if (add_successors) {
+      for (BasicBlock *Succ : successors(BB)) {
+        // Don't process the same BB twice
+        if (!Visited.insert(Succ).second)
+          continue;
+        Worklist.push_back(Succ);
+      }
+    }
+  }
+}
+
 /// Change loop terminating condition to use the postinc iv when possible.
 void
 LSRInstance::OptimizeLoopTermCond() {
   SmallPtrSet<Instruction *, 4> PostIncs;

   // We need a different set of heuristics for rotated and non-rotated loops.
   // If a loop is rotated then the latch is also the backedge, so inserting
   // post-inc expressions just before the latch is ideal. To reduce live ranges
@@ (104 lines collapsed) @@ for (BasicBlock *ExitingBlock : ExitingBlocks) {

     LLVM_DEBUG(dbgs() << " Change loop exiting icmp to use postinc iv: "
                       << *Cond << '\n');

     // It's possible for the setcc instruction to be anywhere in the loop, and
     // possible for it to have multiple users. If it is not immediately before
     // the exiting block branch, move it.
     if (&*++BasicBlock::iterator(Cond) != TermBr) {
+
+      // Remove any nowrap flags that occur on instructions in the control flow
+      // between current position of Cond and its planned new position.
+      removeNoWrapBetweenCondAndTermBr(Cond, TermBr);
+
       if (Cond->hasOneUse()) {
         Cond->moveBefore(TermBr);
       } else {
         // Clone the terminating condition and insert into the loopend.
         ICmpInst *OldCond = Cond;
         Cond = cast<ICmpInst>(Cond->clone());
         Cond->setName(L->getHeader()->getName() + ".termcond");
         ExitingBlock->getInstList().insert(TermBr->getIterator(), Cond);
@@ (3,437 lines collapsed) @@
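The control-flow walk in `removeNoWrapBetweenCondAndTermBr` can be traced by hand on a toy CFG. The sketch below is a hypothetical re-expression with block and instruction names as strings (not LLVM's `BasicBlock` API); instead of mutating anything, it returns the names of the instructions the patch would call `dropPoisonGeneratingFlags()` on.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy CFG (not LLVM's API): each block lists its instruction
// names in order and its successor block names.
struct ToyBlock {
  std::vector<std::string> insts;
  std::vector<std::string> succs;
};
using ToyCFG = std::map<std::string, ToyBlock>;

// Mirrors the patch's traversal and returns the instructions it would visit.
std::vector<std::string> instsBetween(const ToyCFG &cfg,
                                      const std::string &condBB,
                                      const std::string &cond,
                                      const std::string &termBr) {
  std::vector<std::string> dropped;
  const ToyBlock &start = cfg.at(condBB);
  // 1. Instructions after Cond in its own block, up to TermBr if it is there,
  //    otherwise through the end of the block (including its terminator).
  bool afterCond = false;
  for (const std::string &i : start.insts) {
    if (i == termBr)
      return dropped;  // same-block case: done
    if (afterCond)
      dropped.push_back(i);
    if (i == cond)
      afterCond = true;
  }
  // 2. Worklist over successors; stop each path once TermBr is reached.
  std::vector<std::string> worklist(start.succs.begin(), start.succs.end());
  std::set<std::string> visited(start.succs.begin(), start.succs.end());
  while (!worklist.empty()) {
    std::string bb = worklist.back();
    worklist.pop_back();
    bool addSuccessors = true;
    for (const std::string &i : cfg.at(bb).insts) {
      if (i == termBr) {
        addSuccessors = false;
        break;
      }
      dropped.push_back(i);
    }
    if (!addSuccessors)
      continue;
    for (const std::string &s : cfg.at(bb).succs)
      if (visited.insert(s).second)
        worklist.push_back(s);
  }
  return dropped;
}
```

On the CFG of the new test (`loop.1` branching to `loop.2`, which branches to `done` or back to `loop.1`, with Cond `x.3` in `loop.1` and TermBr in `loop.2`), the walk visits `x.4`, the unconditional branch ending `loop.1`, and `x.5`. That matches the test expecting `%x.4 = add i8 %x.2, 1` without `nuw`.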

llvm/test/Transforms/LoopStrengthReduce/X86/nested-loop.ll

@@ (34 lines collapsed) @@
 ; CHECK-NEXT: [[LSR_IV_NEXT]] = add i64 [[LSR_IV]], -1
 ; CHECK-NEXT: [[SCEVGEP]] = getelementptr i8, i8* [[LSR_IV3]], i64 1
 ; CHECK-NEXT: [[EXITCOND:%.*]] = icmp ne i64 [[LSR_IV_NEXT]], 0
 ; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_BODY2]], label [[FOR_INC_LOOPEXIT:%.*]]
 ; CHECK: for.inc.loopexit:
 ; CHECK-NEXT: br label [[FOR_INC]]
 ; CHECK: for.inc:
 ; CHECK-NEXT: [[INDVARS_IV_NEXT3]] = add nuw nsw i64 [[INDVARS_IV2]], 1
-; CHECK-NEXT: [[LSR_IV_NEXT2]] = add nuw nsw i64 [[LSR_IV1]], [[T0]]
+; CHECK-NEXT: [[LSR_IV_NEXT2]] = add i64 [[LSR_IV1]], [[T0]]
 ; CHECK-NEXT: [[CMP:%.*]] = icmp slt i64 [[INDVARS_IV_NEXT3]], [[T1]]
 ; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END_LOOPEXIT:%.*]]
 ; CHECK: for.end.loopexit:
 ; CHECK-NEXT: ret void
 ;
 entry:
   %cmp215 = icmp sgt i32 %size, 1
   %t0 = zext i32 %size to i64
@@ (42 lines collapsed) @@

llvm/test/Transforms/LoopStrengthReduce/dont_move_nowrap_op_before_test.ll

This file was added.

; RUN: opt < %s -loop-reduce -S | FileCheck %s

; If the output of a no-wrapping operation flows into the loop test expression,
; then we cannot move the loop test expression after that operation without also
; removing the no-wrap flag.

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

declare void @dummy(i8 zeroext)

define i8 @test() {
  br label %loop.1

; If loop test is moved, then no-wrap must be removed.
; (also double-checks loop test actually moved, to avoid false negative).

; CHECK: %x.2 = phi i8 [ 0, %0 ], [ %x.5, %loop.2 ]
; CHECK-NEXT: call void @dummy(i8 %x.2)
; CHECK-NEXT: %x.4 = add i8 %x.2, 1
; CHECK: %x.5 = add i8 %x.4, 0
; CHECK-NEXT: %x.3 = icmp eq i8 %x.5, 0

loop.1:
  %x.2 = phi i8 [ 0, %0 ], [ %x.5, %loop.2 ]
  call void @dummy(i8 %x.2)

  %x.3 = icmp eq i8 %x.2, -1
  ; after final loop test, this produces (unused) poison ...
  %x.4 = add nuw i8 %x.2, 1
  ; (in general, the icmp and cond br may be in distinct blocks)
  br label %loop.2

loop.2:
  %x.5 = add i8 %x.4, 0
  ; ... but when test is moved here by LSR, poison becomes used.
  br i1 %x.3, label %done, label %loop.1

done:
  ret i8 %x.2
}
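The header comment of the test can be checked exhaustively for i8. The brute-force sketch below (plain C++, no LLVM dependency; the function names are illustrative) confirms two facts. First, `icmp eq i8 %x.2, -1` and `icmp eq i8 (%x.2 + 1), 0` agree for every input when the increment wraps. Second, the increment overflows, so `add nuw` would produce poison, for exactly the one input on which the loop exits. The rewritten test is therefore only sound if the increment is allowed to wrap.

```cpp
#include <cassert>
#include <cstdint>

// For every i8 value (viewed as unsigned), compare the original exit test
// `icmp eq i8 %x.2, -1` with the LSR form `icmp eq i8 (%x.2 + 1), 0`, where
// the increment wraps modulo 256 (i.e. the add carries no nuw flag).
bool exitTestsAgree() {
  for (unsigned v = 0; v <= 0xFF; ++v) {
    uint8_t x = uint8_t(v);
    bool original = (x == uint8_t(0xFF));    // icmp eq %x.2, -1
    bool rewritten = (uint8_t(x + 1) == 0);  // icmp eq %x.5, 0
    if (original != rewritten)
      return false;
  }
  return true;
}

// Check that `add nuw i8 %x.2, 1` overflows (would be poison) exactly on the
// input for which the original test leaves the loop.
bool overflowsExactlyAtExit() {
  for (unsigned v = 0; v <= 0xFF; ++v) {
    bool overflows = v + 1 > 0xFF;  // unsigned overflow of an i8 add
    bool exits = (v == 0xFF);       // original test: icmp eq %x.2, -1
    if (overflows != exits)
      return false;
  }
  return true;
}
```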