This is an archive of the discontinued LLVM Phabricator instance.


Differential D27783

[MachineBlockPlacement] Don't make blocks "uneditable"
ClosedPublic

Authored by sanjoy on Dec 14 2016, 4:01 PM.


Details

Reviewers

chandlerc
gberry
MatzeB
iteratee

Commits

rGd7389d626109: [MachineBlockPlacement] Don't make blocks "uneditable"
rL289764: [MachineBlockPlacement] Don't make blocks "uneditable"

Summary

NB: I'm new to this area, so please review with care and distrust. :)

This fixes an issue with MachineBlockPlacement due to a badly timed call
to analyzeBranch with AllowModify set to true. The timeline is as
follows:

1. MachineBlockPlacement::maybeTailDuplicateBlock calls TailDup.shouldTailDuplicate on its argument, which in turn calls analyzeBranch with AllowModify set to true.
2. This analyzeBranch call edits the terminator sequence of the block based on the physical layout of the machine function, turning an unanalyzable non-fallthrough block into an unanalyzable fallthrough block. Normally MBP bails out of rearranging such blocks, but this block was unanalyzable non-fallthrough (and thus rearrangeable) the first time MBP looked at it, and so it goes ahead and decides where it should be placed in the function.
3. When placing this block MBP fails to analyze and thus update the block in keeping with the new physical layout.

Concretely, before (1) we have something like:

LBL0:
  < unknown terminator op that may branch to LBL2 >
  jmp LBL1

LBL1:
  ... A

LBL2:
  ... B

In (2), analyzeBranch simplifies this to:

LBL0:
  < unknown terminator op that may branch to LBL2 >
  ;; jmp LBL1 <- redundant jump removed

LBL1:
  ... A

LBL2:
  ... B

In (3), MachineBlockPlacement goes ahead with its plan of putting LBL2
after the first block since that is profitable.

LBL0:
  < unknown terminator op that may branch to LBL2 >
  ;; jmp LBL1 <- redundant jump

LBL2:
  ... B

LBL1:
  ... A

and the program now behaves incorrectly (we no longer fall through
from LBL0 to LBL1) because MBP can no longer edit LBL0.
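
The three steps above can be replayed on a toy model. This is only a sketch: `Block`, `normalSuccessor`, `removeRedundantJump` and `successorAfterReplay` are hypothetical names invented for illustration and are not LLVM APIs, the unknown terminator op is not modelled at all, and C++17 is assumed for `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy model, not LLVM's data structures: a block either ends in an explicit
// jump to a named block, or falls through to whatever follows it in layout.
struct Block {
  std::string Name;
  std::optional<std::string> ExplicitJump; // std::nullopt means fall through
};

// Where control goes when the (unmodelled) unknown terminator does not branch
// away. Returns an empty string if the block would run off the end.
std::string normalSuccessor(const std::vector<Block> &Layout, std::size_t Idx) {
  if (Layout[Idx].ExplicitJump)
    return *Layout[Idx].ExplicitJump;
  return Idx + 1 < Layout.size() ? Layout[Idx + 1].Name : "";
}

// Step (2): drop a jump whose target is already the layout successor.
void removeRedundantJump(std::vector<Block> &Layout, std::size_t Idx) {
  Block &B = Layout[Idx];
  if (B.ExplicitJump && Idx + 1 < Layout.size() &&
      *B.ExplicitJump == Layout[Idx + 1].Name)
    B.ExplicitJump.reset();
}

// Replays the timeline and reports where LBL0 ends up going.
std::string successorAfterReplay(bool SimplifyFirst) {
  std::vector<Block> Layout = {
      {"LBL0", std::string("LBL1")}, // step (1): explicit jmp LBL1
      {"LBL1", std::nullopt},
      {"LBL2", std::nullopt}};
  if (SimplifyFirst)
    removeRedundantJump(Layout, 0);
  std::swap(Layout[1], Layout[2]); // step (3): place LBL2 right after LBL0
  return normalSuccessor(Layout, 0);
}
```

In this model, `successorAfterReplay(false)` still reaches LBL1 because the explicit jump survives the reordering, while `successorAfterReplay(true)` silently retargets LBL0 to LBL2, which is the miscompile described above.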

There are several possible solutions, but I went with removing the teeth
from the analyzeBranch calls in TailDuplicator, so they no longer pass
AllowModify set to true. That makes thinking about the result of these
calls easier, and nothing in the lit test suite broke when I did it.

I've also added some bookkeeping to the MachineBlockPlacement pass and
used that to write an assert that would have caught this issue.

Diff Detail

Repository: rL LLVM

Event Timeline

sanjoy updated this revision to Diff 81491. Dec 14 2016, 4:01 PM

sanjoy retitled this revision to [MachineBlockPlacement] Don't make blocks "uneditable".

sanjoy updated this object.

sanjoy added reviewers: iteratee, chandlerc, gberry, MatzeB.

sanjoy added a subscriber: llvm-commits.

Herald added a subscriber: mcrosier. Dec 14 2016, 4:01 PM

I'm OK with the debugging, but I think it's a little over-engineered. Basically you're keeping a list of all the blocks that you think have unanalyzable fallthrough and then you want to assert if during updateTerminators you encounter a block with unanalyzable fallthrough that is not on that list. Is that correct? If so simply keeping that set and testing for membership is sufficient, and simpler.

I'm also OK removing the teeth in TailDuplicator. It bothers me that analyzeBranch has that flag at all.

My real surprise moment is that analyzeBranch would choose to modify a branch that it claims it can't analyze. Really! I'm not convinced that's correct, but I'm fine with some defensive programming in response to it.

lib/CodeGen/MachineBlockPlacement.cpp
Line 179 (on Diff #81491): I think this should just be a list global to the pass. Each unanalyzable fallthrough gets merged exactly once.
Line 217 (on Diff #81491): I don't really like this function. This is basically all blocks without unanalyzable fallthrough. I'd rather just test for membership at the point where we need it.
Line 1632 (on Diff #81491): Rather than polluting the merge api, BB is available right here, just mark it.
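
The simplification suggested in these comments (keep a single pass-wide set and test membership where the check is needed) could be sketched roughly as follows. `PlacementChecker` and its methods are hypothetical names, blocks are identified by plain strings, and the two boolean arguments stand in for the results of `analyzeBranch` and `canFallThrough`; none of this is the code from the patch.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for pass-level debug bookkeeping.
class PlacementChecker {
  // Blocks recorded as having unanalyzable fallthrough during pre-merging.
  std::set<std::string> KnownUnanalyzable;

public:
  void noteUnanalyzableFallthrough(const std::string &Block) {
    KnownUnanalyzable.insert(Block);
  }

  // True when a block has unanalyzable fallthrough at terminator-update time
  // but was never recorded, i.e. something made it "uneditable" mid-pass.
  bool isUnexpected(const std::string &Block, bool Unanalyzable,
                    bool CanFallThrough) const {
    return Unanalyzable && CanFallThrough &&
           KnownUnanalyzable.count(Block) == 0;
  }
};
```

A caller would record each block once while pre-merging and assert `!isUnexpected(...)` for every other block when updating terminators.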

In D27783#623014, @iteratee wrote:

> I'm OK with the debugging, but I think it's a little over-engineered. Basically you're keeping a list of all the blocks that you think have unanalyzable fallthrough and then you want to assert if during updateTerminators you encounter a block with unanalyzable fallthrough that is not on that list. Is that correct? If so simply keeping that set and testing for membership is sufficient, and simpler.

Sounds good -- I'll move the set to live on the MachineFunctionPass object then.

> I'm also OK removing the teeth in TailDuplicator.
>
> It bothers me that analyzeBranch has that flag at all.

Me too. :)

> My real surprise moment is that analyzeBranch would choose to modify a branch that it claims it can't analyze. Really! I'm not convinced that's correct, but I'm fine with some defensive programming in response to it.

I agree that's hard to defend, and I was initially considering fixing analyzeBranch to edit a branch only if it would return false. However, it is also reasonable to say that the transform it did here is a pretty obvious peephole transform which it _should_ be able to do without fully understanding the branch.

I think an overall better solution is to have two TII hooks, analyzeBranch and simplifyBranch. simplifyBranch can then be specified to not indicate anything about the understandability of the terminator sequence -- its job would be to do locally obvious peephole simplification without any global understanding.
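
As a rough illustration of that split, the sketch below separates a read-only query from a peephole hook. Everything here is hypothetical: `ToyTerminators` is an invented type, `simplifyBranch` does not exist in LLVM, and these free functions only borrow the names discussed above. The one convention taken from the diff is that `analyzeBranch` returns true when the branch cannot be analyzed.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Toy terminator sequence, not an LLVM type.
struct ToyTerminators {
  bool HasUnknownOp;                       // an op the target cannot model
  std::optional<std::string> TrailingJump; // final unconditional jump, if any
};

// Hypothetical read-only query: returns true when the sequence can NOT be
// analyzed, and never modifies it.
bool analyzeBranch(const ToyTerminators &T) { return T.HasUnknownOp; }

// Hypothetical peephole hook: removes a jump to the layout successor.
// Returns true if it changed T; it makes no claim about analyzability.
bool simplifyBranch(ToyTerminators &T, const std::string &LayoutSuccessor) {
  if (T.TrailingJump && *T.TrailingJump == LayoutSuccessor) {
    T.TrailingJump.reset();
    return true;
  }
  return false;
}
```

With this shape a caller that only wants an answer cannot edit the block by accident, and a caller that wants the cleanup has to ask for it explicitly.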


One minor fix, but otherwise LGTM

lib/CodeGen/MachineBlockPlacement.cpp
Line 1680 (on Diff #81500): Can you make the assert comment here more descriptive, something like: "Unexpected block with un-analyzable fallthrough detected."

This revision is now accepted and ready to land. Dec 14 2016, 6:01 PM

Closed by commit rL289764: [MachineBlockPlacement] Don't make blocks "uneditable" (authored by sanjoy). Dec 14 2016, 9:19 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                Size
llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp    22 lines
llvm/trunk/lib/CodeGen/TailDuplicator.cpp           14 lines
llvm/trunk/test/CodeGen/X86/block-placement.mir     87 lines

Diff 81530

llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp

@@ class MachineBlockPlacement : public MachineFunctionPass {
   /// \brief Function wide BasicBlock to BlockChain mapping.
   ///
   /// This mapping allows efficiently moving from any given basic block to the
   /// BlockChain it participates in, if any. We use it to, among other things,
   /// allow implicitly defining edges between chains as the existing edges
   /// between basic blocks.
   DenseMap<MachineBasicBlock *, BlockChain *> BlockToChain;

+#ifndef NDEBUG
+  /// The set of basic blocks that have terminators that cannot be fully
+  /// analyzed. These basic blocks cannot be re-ordered safely by
+  /// MachineBlockPlacement, and we must preserve physical layout of these
+  /// blocks and their successors through the pass.
+  SmallPtrSet<MachineBasicBlock *, 4> BlocksWithUnanalyzableExits;
+#endif
+
   /// Decrease the UnscheduledPredecessors count for all blocks in chain, and
   /// if the count goes to 0, add them to the appropriate work list.
   void markChainSuccessors(BlockChain &Chain, MachineBasicBlock *LoopHeaderBB,
                            const BlockFilterSet *BlockFilter = nullptr);

   /// Decrease the UnscheduledPredecessors count for a single block, and
   /// if the count goes to 0, add them to the appropriate work list.
   void markBlockSuccessors(
@@ for (;;) {
       MachineBasicBlock *NextBB = &*NextFI;
       // Ensure that the layout successor is a viable block, as we know that
       // fallthrough is a possibility.
       assert(NextFI != FE && "Can't fallthrough past the last block.");
       DEBUG(dbgs() << "Pre-merging due to unanalyzable fallthrough: "
                    << getBlockName(BB) << " -> " << getBlockName(NextBB)
                    << "\n");
       Chain->merge(NextBB, nullptr);
+      BlocksWithUnanalyzableExits.insert(&*BB);
       FI = NextFI;
       BB = NextBB;
     }
   }

   // Turned on with OutlineOptionalBranches option
   collectMustExecuteBBs();

@@ for (MachineBasicBlock *ChainBB : FunctionChain) {
     MachineBasicBlock *PrevBB = &*std::prev(MachineFunction::iterator(ChainBB));

     // FIXME: It would be awesome of updateTerminator would just return rather
     // than assert when the branch cannot be analyzed in order to remove this
     // boiler plate.
     Cond.clear();
     MachineBasicBlock *TBB = nullptr, *FBB = nullptr; // For AnalyzeBranch.

+#ifndef NDEBUG
+    if (!BlocksWithUnanalyzableExits.count(PrevBB)) {
+      // Given the exact block placement we chose, we may actually not _need_ to
+      // be able to edit PrevBB's terminator sequence, but not being _able_ to
+      // do that at this point is a bug.
+      assert((!TII->analyzeBranch(*PrevBB, TBB, FBB, Cond) ||
+              !PrevBB->canFallThrough()) &&
+             "Unexpected block with un-analyzable fallthrough!");
+      Cond.clear();
+      TBB = FBB = nullptr;
+    }
+#endif
+
     // The "PrevBB" is not yet updated to reflect current code layout, so,
     //   o. it may fall-through to a block without explicit "goto" instruction
     //      before layout, and no longer fall-through it after layout; or
     //   o. just opposite.
     //
     // analyzeBranch() may return erroneous value for FBB when these two
     // situations take place. For the first scenario FBB is mistakenly set NULL;
     // for the 2nd scenario, the FBB, which is expected to be NULL, is

llvm/trunk/lib/CodeGen/TailDuplicator.cpp

@@ else
     MaxDuplicateCount = TailDupSize;

   // If the block to be duplicated ends in an unanalyzable fallthrough, don't
   // duplicate it.
   // A similar check is necessary in MachineBlockPlacement to make sure pairs of
   // blocks with unanalyzable fallthrough get layed out contiguously.
   MachineBasicBlock *PredTBB = nullptr, *PredFBB = nullptr;
   SmallVector<MachineOperand, 4> PredCond;
-  if (TII->analyzeBranch(TailBB, PredTBB, PredFBB, PredCond, true)
-      && TailBB.canFallThrough())
+  if (TII->analyzeBranch(TailBB, PredTBB, PredFBB, PredCond) &&
+      TailBB.canFallThrough())
     return false;

   // If the target has hardware branch prediction that can handle indirect
   // branches, duplicating them can often make them predictable when there
   // are common paths through the code. The limit needs to be high enough
   // to allow undoing the effects of tail merging and other optimizations
   // that rearrange the predecessors of the indirect branch.

 bool TailDuplicator::canCompletelyDuplicateBB(MachineBasicBlock &BB) {
   for (MachineBasicBlock *PredBB : BB.predecessors()) {
     if (PredBB->succ_size() > 1)
       return false;

     MachineBasicBlock *PredTBB = nullptr, *PredFBB = nullptr;
     SmallVector<MachineOperand, 4> PredCond;
-    if (TII->analyzeBranch(*PredBB, PredTBB, PredFBB, PredCond, true))
+    if (TII->analyzeBranch(*PredBB, PredTBB, PredFBB, PredCond))
       return false;

     if (!PredCond.empty())
       return false;
   }
   return true;
 }

@@ for (MachineBasicBlock *PredBB : Preds) {
     if (PredBB->hasEHPadSuccessor())
       continue;

     if (bothUsedInPHI(*PredBB, Succs))
       continue;

     MachineBasicBlock *PredTBB = nullptr, *PredFBB = nullptr;
     SmallVector<MachineOperand, 4> PredCond;
-    if (TII->analyzeBranch(*PredBB, PredTBB, PredFBB, PredCond, true))
+    if (TII->analyzeBranch(*PredBB, PredTBB, PredFBB, PredCond))
       continue;

     Changed = true;
     DEBUG(dbgs() << "\nTail-duplicating into PredBB: " << *PredBB
                  << "From simple Succ: " << *TailBB);

     MachineBasicBlock *NewTarget = *TailBB->succ_begin();
     MachineBasicBlock *NextBB = PredBB->getNextNode();
 bool TailDuplicator::canTailDuplicate(MachineBasicBlock *TailBB,
                                       MachineBasicBlock *PredBB) {
   // EH edges are ignored by analyzeBranch.
   if (PredBB->succ_size() > 1)
     return false;

   MachineBasicBlock *PredTBB, *PredFBB;
   SmallVector<MachineOperand, 4> PredCond;
-  if (TII->analyzeBranch(*PredBB, PredTBB, PredFBB, PredCond, true))
+  if (TII->analyzeBranch(*PredBB, PredTBB, PredFBB, PredCond))
     return false;
   if (!PredCond.empty())
     return false;
   return true;
 }

 /// If it is profitable, duplicate TailBB's contents in each
 /// of its predecessors.
@@ while (I != TailBB->instr_end()) {
         duplicateInstruction(MI, TailBB, PredBB, LocalVRMap, UsedByPhi);
       }
     }
     appendCopies(PredBB, CopyInfos, Copies);

     // Simplify
     MachineBasicBlock *PredTBB, *PredFBB;
     SmallVector<MachineOperand, 4> PredCond;
-    TII->analyzeBranch(*PredBB, PredTBB, PredFBB, PredCond, true);
+    TII->analyzeBranch(*PredBB, PredTBB, PredFBB, PredCond);

     NumTailDupAdded += TailBB->size() - 1; // subtract one for removed branch

     // Update the CFG.
     PredBB->removeSuccessor(PredBB->succ_begin());
     assert(PredBB->succ_empty() &&
            "TailDuplicate called on block with multiple successors!");
     for (MachineBasicBlock *Succ : TailBB->successors())
@@ if (!PrevBB)
     PrevBB = &*std::prev(TailBB->getIterator());
   MachineBasicBlock *PriorTBB = nullptr, *PriorFBB = nullptr;
   SmallVector<MachineOperand, 4> PriorCond;
   // This has to check PrevBB->succ_size() because EH edges are ignored by
   // analyzeBranch.
   if (PrevBB->succ_size() == 1 &&
       // Layout preds are not always CFG preds. Check.
       *PrevBB->succ_begin() == TailBB &&
-      !TII->analyzeBranch(*PrevBB, PriorTBB, PriorFBB, PriorCond, true) &&
+      !TII->analyzeBranch(*PrevBB, PriorTBB, PriorFBB, PriorCond) &&
       PriorCond.empty() &&
       (!PriorTBB || PriorTBB == TailBB) &&
       TailBB->pred_size() == 1 &&
       !TailBB->hasAddressTaken()) {
     DEBUG(dbgs() << "\nMerging into block: " << *PrevBB
                  << "From MBB: " << *TailBB);
     // There may be a branch to the layout successor. This is unlikely but it
     // happens. The correct thing to do is to remove the branch before

llvm/trunk/test/CodeGen/X86/block-placement.mir

+# RUN: llc -O3 -run-pass=block-placement -o - %s | FileCheck %s
+
+--- |
+  ; ModuleID = 'test.ll'
+  source_filename = "test.ll"
+  target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+
+  declare void @stub(i32*)
+
+  define i32 @f(i32* %ptr, i1 %cond) {
+  entry:
+    br i1 %cond, label %left, label %right
+
+  left: ; preds = %entry
+    %is_null = icmp eq i32* %ptr, null
+    br i1 %is_null, label %null, label %not_null, !prof !0, !make.implicit !1
+
+  not_null: ; preds = %left
+    %val = load i32, i32* %ptr
+    ret i32 %val
+
+  null: ; preds = %left
+    call void @stub(i32* %ptr)
+    unreachable
+
+  right: ; preds = %entry
+    call void @stub(i32* null)
+    unreachable
+  }
+
+  ; Function Attrs: nounwind
+  declare void @llvm.stackprotector(i8*, i8**) #0
+
+  attributes #0 = { nounwind }
+
+  !0 = !{!"branch_weights", i32 1048575, i32 1}
+  !1 = !{}
+
+...
+---
+# CHECK: name: f
+name: f
+alignment: 4
+tracksRegLiveness: true
+liveins:
+  - { reg: '%rdi' }
+  - { reg: '%esi' }
+
+# CHECK: %eax = FAULTING_LOAD_OP %bb.3.null, 1684, killed %rdi, 1, _, 0, _ :: (load 4 from %ir.ptr)
+# CHECK-NEXT: JMP_1 %bb.2.not_null
+# CHECK: bb.3.null:
+# CHECK: bb.4.right:
+# CHECK: bb.2.not_null:
+
+body: |
+  bb.0.entry:
+    successors: %bb.1.left(0x7ffff800), %bb.3.right(0x00000800)
+    liveins: %esi, %rdi
+
+    frame-setup PUSH64r undef %rax, implicit-def %rsp, implicit %rsp
+    CFI_INSTRUCTION def_cfa_offset 16
+    TEST8ri %sil, 1, implicit-def %eflags, implicit killed %esi
+    JE_1 %bb.3.right, implicit killed %eflags
+
+  bb.1.left:
+    successors: %bb.2.null(0x7ffff800), %bb.4.not_null(0x00000800)
+    liveins: %rdi
+
+    %eax = FAULTING_LOAD_OP %bb.2.null, 1684, killed %rdi, 1, _, 0, _ :: (load 4 from %ir.ptr)
+    JMP_1 %bb.4.not_null
+
+  bb.4.not_null:
+    liveins: %rdi, %eax
+
+    %rcx = POP64r implicit-def %rsp, implicit %rsp
+    RETQ %eax
+
+  bb.2.null:
+    liveins: %rdi
+
+    CALL64pcrel32 @stub, csr_64, implicit %rsp, implicit %rdi, implicit-def %rsp
+
+  bb.3.right:
+    dead %edi = XOR32rr undef %edi, undef %edi, implicit-def dead %eflags, implicit-def %rdi
+    CALL64pcrel32 @stub, csr_64, implicit %rsp, implicit %rdi, implicit-def %rsp
+
+...