This is an archive of the discontinued LLVM Phabricator instance.

Differential D30226

[BranchFolding] Merge debug locations from common tail instead of removing
ClosedPublic

Authored by twoh on Feb 21 2017, 2:39 PM.

Details

Reviewers

MatzeB
rob.lougher
probinson
aprantl

Commits

rGfb1833efeb14: [BranchFolding] Merge debug locations from common tail instead of removing
rL297805: [BranchFolding] Merge debug locations from common tail instead of removing

Summary

D25742 improved the precision of debug locations for PGO by removing debug locations from the common tail when tail-merging. However, if identical instructions that are merged into a common tail have the same debug locations, there's no need to remove them. This patch creates a merged debug location from the identical instructions across SameTails and assigns it to the instruction in the common tail, so that debug locations are preserved when they are the same across identical instructions.
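
The rule described in the summary can be sketched with a toy model. The `Loc` struct and the `mergeLoc` and `mergeAcrossTails` helpers are hypothetical names used for illustration only; they are not LLVM APIs, and the real `DILocation::getMergedLocation` may apply different rules. The sketch assumes the simplest policy: a location survives only if every copy of the instruction carries the same one.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a debug location (hypothetical, not llvm::DebugLoc).
// Line == 0 models "no debug location".
struct Loc {
  unsigned Line;
  unsigned Col;
  bool isValid() const { return Line != 0; }
  bool operator==(const Loc &O) const { return Line == O.Line && Col == O.Col; }
};

// Assumed merge policy: keep the location only when both sides agree.
inline Loc mergeLoc(const Loc &A, const Loc &B) {
  if (A.isValid() && B.isValid() && A == B)
    return A;
  return Loc{0, 0};
}

// Fold the pairwise merge over the copies of one instruction, one per tail.
inline Loc mergeAcrossTails(const std::vector<Loc> &Copies) {
  if (Copies.empty())
    return Loc{0, 0};
  Loc Result = Copies.front();
  for (std::size_t I = 1; I < Copies.size(); ++I)
    Result = mergeLoc(Result, Copies[I]);
  return Result;
}
```

Under this assumed policy, three copies that all sit at line 83, column 9 keep that location, while a single disagreeing copy makes the merged result empty, which matches the pre-patch behavior for that instruction.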

Diff Detail

Repository: rL LLVM

Event Timeline

twoh created this revision. Feb 21 2017, 2:39 PM

However, if identical instructions that are merged into a common tail have the same debug locations, there's no need to remove them.

Out of curiosity, did you see this happening in real code?

In general, I am not against the idea of using getMergedLocation. However, as far as I understand, in practice common code can only end up with the same debug location as a result of macro expansion, and only if the AddDiscriminators pass did not run.
When building for PGO, we would run the AddDiscriminators pass, and instructions from different basic blocks would end up with different discriminators.
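
The point about discriminators can be illustrated with a toy model. `DiscLoc`, `sameLoc`, and `sameSourcePos` are hypothetical names, and storing the discriminator as a plain field is a simplification (in LLVM the discriminator is carried by the location's scope). The sketch only shows why two instructions on the same source line stop comparing equal once they receive different discriminators.

```cpp
// Toy location with a discriminator field (hypothetical simplification).
struct DiscLoc {
  unsigned Line;
  unsigned Col;
  unsigned Discriminator;
};

// Full comparison: every field, including the discriminator, must match.
inline bool sameLoc(const DiscLoc &A, const DiscLoc &B) {
  return A.Line == B.Line && A.Col == B.Col &&
         A.Discriminator == B.Discriminator;
}

// Comparison that ignores the discriminator and looks at line/column only.
inline bool sameSourcePos(const DiscLoc &A, const DiscLoc &B) {
  return A.Line == B.Line && A.Col == B.Col;
}
```

If two blocks hold copies of an instruction at the same line and column but with discriminators 0 and 1, `sameSourcePos` reports a match while `sameLoc` does not, so a merge that requires identical locations would have nothing to preserve.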

Yes, I observed this case in the refresh_potential function of SPEC2006 429.mcf. The original C code snippet is below:

 81     while( node != root )
 82     {
 83         while( node )
 84         {
 85             if( node->orientation == UP )
 86                 node->potential = node->basic_arc->cost + node->pred->potential;
 87             else /* == DOWN */
 88             {
 89                 node->potential = node->pred->potential - node->basic_arc->cost;
 90                 checksum++;
 91             }
...

Before the Control Flow Optimizer pass (which runs the BranchFolder pass), the instructions for line 83 are located in two places, one in the loop preheader and the other in the loop body:

BB#4: derived from LLVM BB %while.cond3.preheader
  TEST64rr %RDX, %RDX, %EFLAGS<imp-def>; dbg:mcfutil.c:83:9
  JE_1 <BB#5>, %EFLAGS<imp-use>; dbg:mcfutil.c:83:9
  JMP_1 <BB#6>; dbg:mcfutil.c:83:9

...

BB#6: derived from LLVM BB %while.body4
...
  TEST64rr %RDX, %RDX, %EFLAGS<imp-def>; dbg:mcfutil.c:83:9
  JE_1 <BB#5>, %EFLAGS<imp-use>; dbg:mcfutil.c:83:9
  JMP_1 <BB#6>; dbg:mcfutil.c:83:9

These two blocks are tail-merged by the BranchFolder pass, but without this patch the debug locations are gone from the common tail.

I confirmed that the AddDiscriminators pass was executed with '-Xclang -fdebug-info-for-profiling'. (By the way @andreadb, do we need to explicitly add the flag to have discriminators? As far as I remember I could have discriminators for expressions in the same source line even without the flag. Thanks!)

Thanks for the example!
I guess part of the reason we end up in that situation is that the loop is rotated, so the original loop check now exists in two places (and the debugloc for those instructions is the same).
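
A hand-written illustration of what loop rotation does to the loop check is sketched here. This is not compiler output, and the function names `countWhile` and `countRotated` are hypothetical; the point is only that the single source-level condition ends up duplicated, once as a guard ahead of the loop and once at the bottom of the body.

```cpp
// Original form: the condition is tested in one place.
inline int countWhile(int N) {
  int Steps = 0;
  while (N != 0) { // single copy of the loop check
    N /= 2;
    ++Steps;
  }
  return Steps;
}

// Rotated form: a guard before the loop plus a check at the end of the body.
inline int countRotated(int N) {
  int Steps = 0;
  if (N != 0) { // copy 1: guard, corresponds to the preheader check
    do {
      N /= 2;
      ++Steps;
    } while (N != 0); // copy 2: check at the bottom of the loop body
  }
  return Steps;
}
```

Both functions compute the same result for every input. Since both copies of the check come from the same source expression, it is plausible for them to carry the same line and column, which is the situation in the mcf example.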

I confirmed that the AddDiscriminators pass was executed with '-Xclang -fdebug-info-for-profiling'. (By the way @andreadb, do we need to explicitly add the flag to have discriminators? As far as I remember I could have discriminators for expressions in the same source line even without the flag. Thanks!)

You remember it right. I think that there is a hidden option to disable discriminators (but just for debugging purposes).

Friendly Ping. Thanks!

aprantl added inline comments. Mar 3 2017, 3:37 PM

lib/CodeGen/BranchFolding.cpp
758 ↗	(On Diff #89283)	doxygen comments should be in the header file and not repeat the method name.
763 ↗	(On Diff #89283)	I think local variables should be capitalized.
948 ↗	(On Diff #89283)	`.`

Seems plausible. (inline comments pending, ...)

This revision is now accepted and ready to land. Mar 3 2017, 3:38 PM

@aprantl Thanks for the review! I added inline questions asking for your opinion about consistency vs guideline.

lib/CodeGen/BranchFolding.cpp
758 ↗	(On Diff #89283)	All other functions in the BranchFolding.cpp file have doxygen comments inside the .cpp file that include the method name. I thought it was better to stay consistent, and I wonder what your opinion is on this.
763 ↗	(On Diff #89283)	I kept 'i' lowercase here for consistency as well.

@aprantl Could you please give your opinion on the inlined questions? Thanks!

Replied.

lib/CodeGen/BranchFolding.cpp
762 ↗	(On Diff #89283)	I think the right way forward is to commit it like this and then follow up with an NFC commit that reformats the comments according to the coding guidelines. (No need for a pre-commit review IMO)
763 ↗	(On Diff #89283)	see above.

Closed by commit rL297805: [BranchFolding] Merge debug locations from common tail instead of removing (authored by twoh). Mar 14 2017, 10:57 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/CodeGen/BranchFolding.h	1 line
llvm/trunk/lib/CodeGen/BranchFolding.cpp	46 lines
llvm/trunk/test/CodeGen/X86/tail-merge-debugloc.ll	42 lines

Diff 91822

llvm/trunk/lib/CodeGen/BranchFolding.h

 ...
   unsigned ComputeSameTails(unsigned CurHash, unsigned minCommonTailLength,
                             MachineBasicBlock *SuccBB,
                             MachineBasicBlock *PredBB);
   void RemoveBlocksWithHash(unsigned CurHash, MachineBasicBlock* SuccBB,
                                               MachineBasicBlock* PredBB);
   bool CreateCommonTailOnlyBlock(MachineBasicBlock *&PredBB,
                                  MachineBasicBlock *SuccBB,
                                  unsigned maxCommonTailLength,
                                  unsigned &commonTailIndex);
+  void MergeCommonTailDebugLocs(unsigned commonTailIndex);

   bool OptimizeBranches(MachineFunction &MF);
   bool OptimizeBlock(MachineBasicBlock *MBB);
   void RemoveDeadBlock(MachineBasicBlock *MBB);

   bool HoistCommonCode(MachineFunction &MF);
   bool HoistCommonCodeInSuccs(MachineBasicBlock *MBB);
 };
 }

 #endif /* LLVM_CODEGEN_BRANCHFOLDING_HPP */

llvm/trunk/lib/CodeGen/BranchFolding.cpp

 ...
 #include "llvm/CodeGen/MachineFunctionPass.h"
 #include "llvm/CodeGen/MachineJumpTableInfo.h"
 #include "llvm/CodeGen/MachineMemOperand.h"
 #include "llvm/CodeGen/MachineLoopInfo.h"
 #include "llvm/CodeGen/MachineModuleInfo.h"
 #include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/Passes.h"
 #include "llvm/CodeGen/TargetPassConfig.h"
+#include "llvm/IR/DebugInfoMetadata.h"
 #include "llvm/IR/Function.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/Target/TargetInstrInfo.h"
 #include "llvm/Target/TargetRegisterInfo.h"
 #include "llvm/Target/TargetSubtargetInfo.h"
 ... (inside bool BranchFolder::CreateCommonTailOnlyBlock(MachineBasicBlock *&PredBB, ...)

   // If we split PredBB, newMBB is the new predecessor.
   if (PredBB == MBB)
     PredBB = newMBB;

   return true;
 }

+/// MergeCommonTailDebugLocs - Create merged DebugLocs of identical instructions
+/// across SameTails and assign it to the instruction in common tail.
+void BranchFolder::MergeCommonTailDebugLocs(unsigned commonTailIndex) {
+  MachineBasicBlock *MBB = SameTails[commonTailIndex].getBlock();
+
+  std::vector<MachineBasicBlock::iterator> NextCommonInsts(SameTails.size());
+  for (unsigned int i = 0 ; i != SameTails.size() ; ++i) {
+    if (i != commonTailIndex)
+      NextCommonInsts[i] = SameTails[i].getTailStartPos();
+    else {
+      assert(SameTails[i].getTailStartPos() == MBB->begin() &&
+          "MBB is not a common tail only block");
+    }
+  }
+
+  for (auto &MI : *MBB) {
+    if (MI.isDebugValue())
+      continue;
+    DebugLoc DL = MI.getDebugLoc();
+    for (unsigned int i = 0 ; i < NextCommonInsts.size() ; i++) {
+      if (i == commonTailIndex)
+        continue;
+
+      auto &Pos = NextCommonInsts[i];
+      assert(Pos != SameTails[i].getBlock()->end() &&
+          "Reached BB end within common tail");
+      while (Pos->isDebugValue()) {
+        ++Pos;
+        assert(Pos != SameTails[i].getBlock()->end() &&
+            "Reached BB end within common tail");
+      }
+      assert(MI.isIdenticalTo(*Pos) && "Expected matching MIIs!");
+      DL = DILocation::getMergedLocation(DL, Pos->getDebugLoc());
+      NextCommonInsts[i] = ++Pos;
+    }
+    MI.setDebugLoc(DL);
+  }
+}
+
 static void
 mergeOperations(MachineBasicBlock::iterator MBBIStartPos,
                 MachineBasicBlock &MBBCommon) {
   MachineBasicBlock *MBB = MBBIStartPos->getParent();
   // Note CommonTailLen does not necessarily matches the size of
   // the common BB nor all its instructions because of debug
   // instructions differences.
   unsigned CommonTailLen = 0;
 ... (near: if (commonTailIndex == SameTails.size() ||)
     }
   }

   MachineBasicBlock *MBB = SameTails[commonTailIndex].getBlock();

   // Recompute common tail MBB's edge weights and block frequency.
   setCommonTailEdgeWeights(*MBB);

-  // Remove the original debug location from the common tail.
-  for (auto &MI : *MBB)
-    if (!MI.isDebugValue())
-      MI.setDebugLoc(DebugLoc());
+  // Merge debug locations across identical instructions for common tail
+  MergeCommonTailDebugLocs(commonTailIndex);

   // MBB is common tail. Adjust all other BB's to jump to this one.
   // Traversal must be forwards so erases work.
   DEBUG(dbgs() << "\nUsing common tail in BB#" << MBB->getNumber()
                << " for ");
   for (unsigned int i=0, e = SameTails.size(); i != e; ++i) {
     if (commonTailIndex == i)
       continue;
 ...
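
The lockstep walk in MergeCommonTailDebugLocs can be sketched on toy data. `ToyInst` and `mergeTailLines` are hypothetical names, the model tracks only a line number instead of a full debug location, and it compares the common tail against a single other tail rather than all of SameTails. It also assumes a "keep only if equal" merge policy, which may differ from what `DILocation::getMergedLocation` actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction (hypothetical; stands in for MachineInstr).
struct ToyInst {
  int Opcode;        // identifies the operation
  unsigned Line;     // 0 models "no debug location"
  bool IsDebugValue; // models a DBG_VALUE pseudo-instruction
};

// Walk the common tail and one other tail in lockstep, skipping debug values
// on both sides, and clear any line on which the two tails disagree.
inline void mergeTailLines(std::vector<ToyInst> &Common,
                           const std::vector<ToyInst> &Other) {
  std::size_t Pos = 0;
  for (ToyInst &MI : Common) {
    if (MI.IsDebugValue)
      continue;
    // Debug values may be interleaved differently in each tail.
    while (Pos < Other.size() && Other[Pos].IsDebugValue)
      ++Pos;
    assert(Pos < Other.size() && "reached end of the other tail");
    assert(Other[Pos].Opcode == MI.Opcode && "tails must match");
    if (Other[Pos].Line != MI.Line)
      MI.Line = 0; // assumed policy: drop the location on disagreement
    ++Pos;
  }
}
```

The separate cursor for the other tail matters because debug values are not counted as part of the common tail, so the two blocks may hold different numbers of them between matching instructions.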

llvm/trunk/test/CodeGen/X86/tail-merge-debugloc.ll

; RUN: llc -stop-after=branch-folder < %s | FileCheck %s
;
; bb2 and bb3 in the IR below will be tail-merged into a single basic block.
; As br instructions in bb2 and bb3 have the same debug location, make sure that
; the branch instruction in the merged basic block still maintains the debug
; location info.
;
; CHECK:      [[DLOC:![0-9]+]] = !DILocation(line: 2, column: 2, scope: !{{[0-9]+}})
; CHECK:      TEST64rr{{.*}}%rsi, %rsi, implicit-def %eflags
; CHECK-NEXT: JNE_1{{.*}}, debug-location [[DLOC]]

target triple = "x86_64-unknown-linux-gnu"

define i32 @foo(i1 %b, i8* %p) {
bb1:
  br i1 %b, label %bb2, label %bb3

bb2:
  %a1 = icmp eq i8* %p, null
  br i1 %a1, label %bb4, label %bb5, !dbg !6

bb3:
  %a2 = icmp eq i8* %p, null
  br i1 %a2, label %bb4, label %bb5, !dbg !6

bb4:
  ret i32 1

bb5:
  ret i32 0
}

!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!2, !3}

!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1)
!1 = !DIFile(filename: "foo.c", directory: "b/")
!2 = !{i32 2, !"Dwarf Version", i32 4}
!3 = !{i32 2, !"Debug Info Version", i32 3}
!4 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 3, isLocal: false, isDefinition: true, scopeLine: 3, flags: DIFlagPrototyped, isOptimized: true, unit: !0)
!5 = distinct !DILexicalBlock(scope: !4, file: !1, line: 1, column: 1)
!6 = !DILocation(line: 2, column: 2, scope: !5)