This is an archive of the discontinued LLVM Phabricator instance.

During PHI elimination, split critical edges that move copies out of loops
Closed, Public

Authored by djasper on Mar 2 2015, 2:10 PM.

Details

Reviewers

qcolombet
chandlerc

Summary

This prevents the behavior observed in llvm.org/PR22369. I am not sure whether I am reading the code correctly, but the early exit based on isLiveOutPastPHIs() seems like a bug as it prevents the nice loop-splitting logic below.

This is still work in progress as it currently breaks a few tests. Mainly, it seems to influence which copies are local and which are non-local, which influences the order in which the RegisterCoalescer visits copies. Coalescing the wrong registers first has unwanted side-effects.

Event Timeline

djasper updated this revision to Diff 21037. Mar 2 2015, 2:10 PM

djasper retitled this revision to During PHI elimination, split critical edges that move copies out of loops.

djasper updated this object.

djasper edited the test plan for this revision.

djasper added reviewers: chandlerc, qcolombet.

djasper added a subscriber: Unknown Object (MLST).

Hi Daniel,

This is still work in progress as it currently breaks a few tests. Mainly, it seems to influence which copies are local and which are non-local, which influences the order in which the RegisterCoalescer visits copies. Coalescing the wrong registers first has unwanted side-effects.

What does this mean for the livability of this patch?

Thanks,
-Quentin

Not sure what that means. I think the changes here are good by themselves, but they lead to worse behavior in later passes, specifically register coalescing. I'll take a look at whether the register coalescer heuristic can be tweaked in a simple way to avoid this regression.

My concern was this:

it currently breaks a few tests

What do you mean by "break"?

From your last comment, I believe we generate worse code.
If that is the case, then just add a flag to preserve the previous behavior and file PRs for the regressions you find so that we can work on fixing them.
Other than that, the change itself looks good to me.

Thanks,
-Quentin

Hide new behavior behind flag.
Add test.

I have hidden the new behavior behind a flag and added a test.

With the flag enabled by default, these four tests break:

LLVM :: CodeGen/Hexagon/hwloop-cleanup.ll

triggers an assertion.

LLVM :: CodeGen/X86/coalescer-commute4.ll
LLVM :: CodeGen/X86/phys_subreg_coalesce-2.ll
LLVM :: CodeGen/X86/zlib-longest-match.ll

generate worse code. Likely these need to be solved by improving the register allocator.

The change itself LGTM (but nitpick below).

As you mentioned on IRC, we make some unlucky choices in the coalescing order in test/CodeGen/X86/coalescer-commute4.ll with this patch. Are there more? If it's just some tests like this failing but benchmarks generally improving, then it's okay to XFAIL the tests, or to try rewriting them in a way that we are lucky with the heuristic. If benchmarks generally regress with the changes, then we need further research on how to avoid that...

lib/CodeGen/PHIElimination.cpp
583–588	No need to keep the old code in a comment, it can always be found in the subversion log.

In D8016#133116, @MatzeB wrote:

The change itself LGTM (but nitpick below).

As you mentioned on IRC, we make some unlucky choices in the coalescing order in test/CodeGen/X86/coalescer-commute4.ll with this patch. Are there more? If it's just some tests like this failing but benchmarks generally improving, then it's okay to XFAIL the tests, or to try rewriting them in a way that we are lucky with the heuristic. If benchmarks generally regress with the changes, then we need further research on how to avoid that...

You answered most of my remarks while I was writing this :)

The old code is already removed in the latest version of the patch.

I think the "unlucky" ordering happens in three tests as per my previous comment and it also regresses benchmarks in our internal suite, although I haven't looked closely at which ones. Thus, I have guarded the new behavior by a flag now as Quentin suggested. I'll then try to fix the unlucky behavior and see whether that also improves benchmarks.

Thanks Daniel for the flag.

We could switch the default value as soon as we fix the assertion. In the meantime, please file PRs for the regressions so that we can help resolve those.

LGTM.

Cheers,
-Quentin

This revision is now accepted and ready to land. Mar 2 2015, 4:49 PM

Submitted as r231064.

Also filed:

llvm.org/PR22767
llvm.org/PR22768

Revision Contents

Path                                 Size
lib/CodeGen/PHIElimination.cpp       20 lines
test/CodeGen/X86/phielim-split.ll    39 lines

Diff 21064

lib/CodeGen/PHIElimination.cpp

@@ 40 lines hidden @@ DisableEdgeSplitting("disable-phi-elim-edge-splitting", cl::init(false),
                      cl::Hidden, cl::desc("Disable critical edge splitting "
                                           "during PHI elimination"));

 static cl::opt<bool>
 SplitAllCriticalEdges("phi-elim-split-all-critical-edges", cl::init(false),
                       cl::Hidden, cl::desc("Split all critical edges during "
                                            "PHI elimination"));

+static cl::opt<bool> NoPhiElimLiveOutEarlyExit(
+    "no-phi-elim-live-out-early-exit", cl::init(false), cl::Hidden,
+    cl::desc("Do not use an early exit if isLiveOutPastPHIs returns true."));
+
 namespace {
   class PHIElimination : public MachineFunctionPass {
     MachineRegisterInfo *MRI; // Machine register information
     LiveVariables *LV;
     LiveIntervals *LIS;

   public:
     static char ID; // Pass identification, replacement for typeid
@@ 511 lines hidden @@ for (unsigned i = 1, e = BBI->getNumOperands(); i != e; i += 2) {
     continue;

   // LV doesn't consider a phi use live-out, so isLiveOut only returns true
   // when the source register is live-out for some other reason than a phi
   // use. That means the copy we will insert in PreMBB won't be a kill, and
   // there is a risk it may not be coalesced away.
   //
   // If the copy would be a kill, there is no need to split the edge.
-  if (!isLiveOutPastPHIs(Reg, PreMBB) && !SplitAllCriticalEdges)
+  bool ShouldSplit = isLiveOutPastPHIs(Reg, PreMBB);
+  if (!ShouldSplit && !NoPhiElimLiveOutEarlyExit)
     continue;
+  if (ShouldSplit) {
     DEBUG(dbgs() << PrintReg(Reg) << " live-out before critical edge BB#"
                  << PreMBB->getNumber() << " -> BB#" << MBB.getNumber()
                  << ": " << *BBI);
+  }

   // If Reg is not live-in to MBB, it means it must be live-in to some
   // other PreMBB successor, and we can avoid the interference by splitting
   // the edge.
   //
   // If Reg is live-in to MBB, the interference is inevitable and a copy
   // is likely to be left after coalescing. If we are looking at a loop
   // exiting edge, split it so we won't insert code in the loop, otherwise
   // don't bother.
-  bool ShouldSplit = !isLiveIn(Reg, &MBB) || SplitAllCriticalEdges;
+  ShouldSplit = ShouldSplit && !isLiveIn(Reg, &MBB);

   // Check for a loop exiting edge.
   if (!ShouldSplit && CurLoop != PreLoop) {
     DEBUG({
       dbgs() << "Split wouldn't help, maybe avoid loop copies?\n";
       if (PreLoop) dbgs() << "PreLoop: " << *PreLoop;
       if (CurLoop) dbgs() << "CurLoop: " << *CurLoop;
     });
     // This edge could be entering a loop, exiting a loop, or it could be
     // both: Jumping directly form one loop to the header of a sibling
     // loop.
     // Split unless this edge is entering CurLoop from an outer loop.
     ShouldSplit = PreLoop && !PreLoop->contains(CurLoop);
   }
-  if (!ShouldSplit)
+  if (!ShouldSplit && !SplitAllCriticalEdges)
     continue;
   if (!PreMBB->SplitCriticalEdge(&MBB, this)) {
     DEBUG(dbgs() << "Failed to split critical edge.\n");
     continue;
   }
   Changed = true;
   ++NumCriticalEdgesSplit;
 }
@@ 33 lines hidden @@
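
To make the behavioral difference in the hunk above easier to see, the split decision can be modeled as two pure functions. This is only a sketch of my reading of the diff, not code from the patch: `EdgeFacts`, `shouldSplitOld` and `shouldSplitNew` are hypothetical names, the four booleans stand in for the queries the pass makes (`isLiveOutPastPHIs`, `isLiveIn`, `CurLoop != PreLoop`, `PreLoop && !PreLoop->contains(CurLoop)`), and the model ignores whether `SplitCriticalEdge` itself succeeds.

```cpp
#include <cassert>

// Hypothetical bundle of the facts the pass queries for one PHI operand.
struct EdgeFacts {
  bool LiveOutPastPHIs; // stands in for isLiveOutPastPHIs(Reg, PreMBB)
  bool LiveInToSucc;    // stands in for isLiveIn(Reg, &MBB)
  bool DifferentLoops;  // stands in for CurLoop != PreLoop
  bool LeavesPreLoop;   // stands in for PreLoop && !PreLoop->contains(CurLoop)
};

// Decision as on the left-hand (old) side of the diff.
bool shouldSplitOld(const EdgeFacts &F, bool SplitAll) {
  if (!F.LiveOutPastPHIs && !SplitAll)
    return false; // early exit: the loop check below is never reached
  bool ShouldSplit = !F.LiveInToSucc || SplitAll;
  if (!ShouldSplit && F.DifferentLoops)
    ShouldSplit = F.LeavesPreLoop;
  return ShouldSplit;
}

// Decision as on the right-hand (new) side of the diff.
// NoEarlyExit models the -no-phi-elim-live-out-early-exit flag.
bool shouldSplitNew(const EdgeFacts &F, bool SplitAll, bool NoEarlyExit) {
  bool ShouldSplit = F.LiveOutPastPHIs;
  if (!ShouldSplit && !NoEarlyExit)
    return false; // early exit, kept unless the flag is set
  ShouldSplit = ShouldSplit && !F.LiveInToSucc;
  if (!ShouldSplit && F.DifferentLoops)
    ShouldSplit = F.LeavesPreLoop;
  return ShouldSplit || SplitAll;
}
```

Under this model, an edge that leaves the predecessor's loop while the register is not live-out past the PHIs is skipped by the old code and by the new code with the flag off, and is split only when the flag is on. That appears to be the case the revision title describes.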

test/CodeGen/X86/phielim-split.ll

-; RUN: llc < %s -verify-machineinstrs | FileCheck %s
+; RUN: llc < %s -verify-machineinstrs -no-phi-elim-live-out-early-exit | FileCheck %s
 target triple = "x86_64-apple-macosx10.8.0"

 ; The critical edge from for.cond to if.end2 should be split to avoid injecting
 ; copies into the loop. The use of %b after the loop causes interference that
 ; makes a copy necessary.
 ; <rdar://problem/11561842>
 ;
 ; CHECK: split_loop_exit
@@ 13 lines hidden @@ for.cond: ; preds = %entry, %for.cond
   %tobool = icmp eq i8 %0, 0
   br i1 %tobool, label %for.cond, label %if.end2

 if.end2: ; preds = %for.cond, %entry
   %r.0 = phi i32 [ %a, %entry ], [ %b, %for.cond ]
   %add = add nsw i32 %r.0, %b
   ret i32 %add
 }
+
+; CHECK: split_live_out
+; CHECK: %while.body
+; CHECK: cmp
+; CHECK-NEXT: ja
+define i8* @split_live_out(i32 %value, i8* %target) nounwind uwtable readonly ssp {
+entry:
+  %cmp10 = icmp ugt i32 %value, 127
+  br i1 %cmp10, label %while.body.preheader, label %while.end
+
+while.body.preheader: ; preds = %entry
+  br label %while.body
+
+while.body: ; preds = %while.body.preheader, %while.body
+  %target.addr.012 = phi i8* [ %incdec.ptr, %while.body ], [ %target, %while.body.preheader ]
+  %value.addr.011 = phi i32 [ %shr, %while.body ], [ %value, %while.body.preheader ]
+  %or = or i32 %value.addr.011, 128
+  %conv = trunc i32 %or to i8
+  store i8 %conv, i8* %target.addr.012, align 1
+  %shr = lshr i32 %value.addr.011, 7
+  %incdec.ptr = getelementptr inbounds i8, i8* %target.addr.012, i64 1
+  %cmp = icmp ugt i32 %value.addr.011, 16383
+  br i1 %cmp, label %while.body, label %while.end.loopexit
+
+while.end.loopexit: ; preds = %while.body
+  %incdec.ptr.lcssa = phi i8* [ %incdec.ptr, %while.body ]
+  %shr.lcssa = phi i32 [ %shr, %while.body ]
+  br label %while.end
+
+while.end: ; preds = %while.end.loopexit, %entry
+  %target.addr.0.lcssa = phi i8* [ %target, %entry ], [ %incdec.ptr.lcssa, %while.end.loopexit ]
+  %value.addr.0.lcssa = phi i32 [ %value, %entry ], [ %shr.lcssa, %while.end.loopexit ]
+  %conv1 = trunc i32 %value.addr.0.lcssa to i8
+  store i8 %conv1, i8* %target.addr.0.lcssa, align 1
+  %incdec.ptr3 = getelementptr inbounds i8, i8* %target.addr.0.lcssa, i64 1
+  ret i8* %incdec.ptr3
+}
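
For readers unfamiliar with the term, a critical edge is an edge whose source block has more than one successor and whose destination block has more than one predecessor. The sketch below applies that definition to the IR-level CFG of `split_live_out` as written in the test. The helper names `findCriticalEdges` and `splitLiveOutEdges` are hypothetical, and the edge list is transcribed by hand from the `br` instructions. PHI elimination runs on machine basic blocks, whose CFG may differ from the IR after earlier passes, so treat this as an illustration of the definition rather than of what the pass actually sees.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>; // (source, destination)

// Returns the critical edges, in input order. Assumes the list contains
// no duplicate edges.
std::vector<Edge> findCriticalEdges(const std::vector<Edge> &Edges) {
  std::map<std::string, int> NumSuccs, NumPreds;
  for (const Edge &E : Edges) {
    ++NumSuccs[E.first];
    ++NumPreds[E.second];
  }
  std::vector<Edge> Critical;
  for (const Edge &E : Edges)
    if (NumSuccs[E.first] > 1 && NumPreds[E.second] > 1)
      Critical.push_back(E);
  return Critical;
}

// IR-level edges of @split_live_out, transcribed from the test above.
std::vector<Edge> splitLiveOutEdges() {
  return {
      {"entry", "while.body.preheader"},
      {"entry", "while.end"},
      {"while.body.preheader", "while.body"},
      {"while.body", "while.body"},
      {"while.body", "while.end.loopexit"},
      {"while.end.loopexit", "while.end"},
  };
}
```

On this edge list, the definition flags `entry -> while.end` and the back edge `while.body -> while.body`. The edge into `while.end.loopexit` is not critical at the IR level because that block has a single predecessor.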