This is an archive of the discontinued LLVM Phabricator instance.


Differential D74809

[MBP][X86] Include static prof data when collecting loop BBs
ClosedPublic

Authored by void on Feb 18 2020, 5:18 PM.

Details

Reviewers

hjyamauchi
nickdesaulniers
skatkov
iteratee

Commits

rG129c911efaa4: Include static prof data when collecting loop BBs

Summary

If the programmer adds static profile data to a branch (i.e. uses
"__builtin_expect()" or similar), then we should honor it. Otherwise,
"__builtin_expect()" is ignored in crucial situations. So we trust that
the programmer knows what they're doing until proven wrong.
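
For readers unfamiliar with the annotation the summary refers to, here is a minimal sketch of how `__builtin_expect()` is commonly used with GCC or Clang. The macro and function names (`UNLIKELY`, `sum_nonnegative`) are illustrative assumptions and do not come from the patch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper macro. __builtin_expect is a GCC/Clang builtin that
// returns its first argument and hints that it usually equals the second.
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

// Sums the values, bailing out on the (assumed rare) negative entry.
// The hint marks the early return as the cold path.
long sum_nonnegative(const int *values, std::size_t count) {
  long total = 0;
  for (std::size_t i = 0; i < count; ++i) {
    if (UNLIKELY(values[i] < 0))
      return -1; // cold path: error exit
    total += values[i];
  }
  return total; // hot path
}
```

Clang turns a hint like this into `!prof` branch-weight metadata on the branch, which is the kind of `MD_prof` data this revision checks for.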

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

void created this revision. Feb 18 2020, 5:18 PM

Herald added a project: Restricted Project. Feb 18 2020, 5:18 PM

Herald added subscribers: llvm-commits, JDevlieghere, hiraditya.

void added reviewers: hjyamauchi, nickdesaulniers. Feb 18 2020, 5:20 PM

Harbormaster completed remote builds in B46775: Diff 245307. Feb 18 2020, 5:27 PM

nickdesaulniers added reviewers: skatkov, iteratee. Feb 18 2020, 6:00 PM

I'm in no way qualified to be reviewing changes to something with as important implications as machine block placement, but from the test modifications, it doesn't seem too invasive a change. Posting mostly notes for other reviewers to check.

llvm/lib/CodeGen/MachineBlockPlacement.cpp
2517	I really wish this was a method on `MachineBlockPlacement` or `MachineLoop` rather than a lambda in an `if` statement.
llvm/test/CodeGen/Hexagon/prof-early-if.ll
5	LGTM; b5 is only branched to from b4, but b4 has a higher branch weight to b2. Though it would be nice to just list all the blocks in order; otherwise I'm just guessing where the rest land.
llvm/test/CodeGen/X86/block-placement-2.ll
14	probably should specify where `if.end.i.i` gets placed, too, since it also has profile metadata.
llvm/test/CodeGen/X86/block-placement.ll
1507	LGTM; backedge is very likely to go to header. middle is unlikely to go to slow
llvm/test/CodeGen/X86/move_latch_to_loop_top.ll
177	LGTM; %latch has a 50-50 chance of branching either to %exit or %header. `!3` seems unused.
llvm/test/CodeGen/X86/ragreedy-bug.ll
35	this change is curious; I'd have expected `je`'s to flip to `jne`'s or vice versa, but not unconditional jumps.

void marked 2 inline comments as done. Feb 18 2020, 9:10 PM

void added inline comments.

llvm/test/CodeGen/X86/ragreedy-bug.ll
35	This change is very interesting actually. The two mem-move blocks are placed at the end of the function. They then branch back up into what was a tail duplicated block. So the testl/je statements are actually at the original place, and the jmp here is just a simple branch to them.

lebedev.ri retitled this revision from Include static prof data when collecting loop BBs to [MBP][X86] Include static prof data when collecting loop BBs. Feb 19 2020, 1:27 AM

I think if we add more checks as I've commented above and see if there's a way to make the lambda-in-if nicer, this is ready to land.

Move lambda logic into a method. Improve some tests with additional checks.

llvm/lib/CodeGen/MachineBlockPlacement.cpp
2517	Is it that it's cleaner to do it that way? The logic is going to be basically the same.

Harbormaster completed remote builds in B46827: Diff 245459. Feb 19 2020, 10:38 AM

LGTM; thanks Bill!

llvm/lib/CodeGen/MachineBlockPlacement.cpp
2517	Yes, it's much nicer IMO to have short concise well named functions. It helps make this code more readable. Thank you.
llvm/test/CodeGen/Hexagon/prof-early-if.ll
5	Looks like the intent of the test is `that "if.then" was not predicated.` not necessarily anything to do with the block placement.

This revision is now accepted and ready to land. Feb 19 2020, 10:59 AM

Closed by commit rG129c911efaa4: Include static prof data when collecting loop BBs (authored by void). Feb 19 2020, 11:36 AM

This revision was automatically updated to reflect the committed changes.

Is it expected that this patch could increase binary size? I'm seeing about a 9kB total increase in fuchsia ZBIs between clang with this patch and the one before it.

davidxl added a subscriber: davidxl. Mar 20 2020, 2:15 PM

davidxl added inline comments.

llvm/lib/CodeGen/MachineBlockPlacement.cpp
2516	The check here seems too weak. That one branch in the loop has a user annotation does not mean the other branch probability data can be trusted. More sophisticated analysis is needed.
llvm/lib/CodeGen/MachineLoopInfo.cpp
114	Note that when PGO is on, the branches are also annotated with MD_prof metadata. In other words, you will also need to check that the function entry does not have a profile count.
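
One possible reading of the suggestion above, as a sketch only: treat branch metadata as "static" (user-written) only when the function carries no profile entry count. The types below are simplified stand-ins, not LLVM classes, and `hasOnlyStaticProfInfo` is a hypothetical name that does not appear in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified stand-ins for this sketch; these are not LLVM types.
struct BlockModel {
  bool terminatorHasProfMetadata; // models hasMetadata(LLVMContext::MD_prof)
};

struct FunctionModel {
  bool hasEntryCount;             // true when PGO data was read for the function
  std::vector<BlockModel> loopBlocks;
};

// Hypothetical stricter check: branch metadata only counts as static
// annotation when the function has no profile entry count.
bool hasOnlyStaticProfInfo(const FunctionModel &fn) {
  if (fn.hasEntryCount)
    return false; // metadata may have come from the profile reader
  return std::any_of(fn.loopBlocks.begin(), fn.loopBlocks.end(),
                     [](const BlockModel &b) {
                       return b.terminatorHasProfMetadata;
                     });
}
```

Whether this guard is sufficient is a separate question; the first comment above argues that a single annotated branch says little about the other branches in the loop.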

void marked an inline comment as done. Mar 20 2020, 3:40 PM

void added inline comments.

llvm/lib/CodeGen/MachineBlockPlacement.cpp
2516	Do you mean that a builtin_expect shouldn't be considered when using profile data?

davidxl added inline comments. Mar 20 2020, 5:12 PM

llvm/lib/CodeGen/MachineBlockPlacement.cpp
2516	No, that is not what I meant. When profile data is available, builtin_expect will be ignored: the prof metadata is overridden by the profile data reader. What I meant is that builtin_expect has a very vague meaning. Some users use it to indicate a branch bias that can be weak or strong. LLVM's implementation treats it as a very strong bias (2000:1 ratio), so it will likely exclude too many blocks from the block set when used in this context, leading to code size increase or performance regression. In other words, PGO is the way to go :)
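
To make the 2000:1 figure concrete, the following sketch converts a pair of branch weights into the probability of the unlikely edge and into a "how many times hotter" factor. The function names are hypothetical and the 4:1 case is a made-up example of a weak hint; the 2000:1 weights match the `branch_weights` metadata in the new test file.

```cpp
#include <cassert>
#include <cstdint>

// Probability (as a fraction of 1.0) that the unlikely edge is taken,
// given the two branch weights attached to a conditional branch.
constexpr double coldEdgeProbability(std::uint32_t likelyWeight,
                                     std::uint32_t unlikelyWeight) {
  return static_cast<double>(unlikelyWeight) /
         (static_cast<double>(likelyWeight) + unlikelyWeight);
}

// How many times hotter the branching block is than the cold successor,
// assuming the successor is reached only through this edge.
constexpr std::uint64_t hotToColdRatio(std::uint32_t likelyWeight,
                                       std::uint32_t unlikelyWeight) {
  return (static_cast<std::uint64_t>(likelyWeight) + unlikelyWeight) /
         unlikelyWeight;
}
```

Under these assumptions a block behind a 2000:1 hint looks about 2001 times colder than its predecessor, while a 4:1 hint gives a factor of 5. That is the gap between a "strong" and a "weak" bias that the comment describes.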

Revision Contents

Path                                               Size
llvm/include/llvm/CodeGen/MachineLoopInfo.h        4 lines
llvm/lib/CodeGen/MachineBlockPlacement.cpp         11 lines
llvm/lib/CodeGen/MachineLoopInfo.cpp               7 lines
llvm/test/CodeGen/Hexagon/prof-early-if.ll         2 lines
llvm/test/CodeGen/X86/block-placement-2.ll         162 lines
llvm/test/CodeGen/X86/block-placement.ll           2 lines
llvm/test/CodeGen/X86/move_latch_to_loop_top.ll    2 lines
llvm/test/CodeGen/X86/ragreedy-bug.ll              10 lines

Diff 245480

llvm/include/llvm/CodeGen/MachineLoopInfo.h

(61 lines not shown; context: public:)

    /// Return the debug location of the start of this loop.
    /// This looks for a BB terminating instruction with a known debug
    /// location by looking at the preheader and header blocks. If it
    /// cannot find a terminating instruction with location information,
    /// it returns an unknown location.
    DebugLoc getStartLoc() const;
  
+   /// Returns true if a machine loop has blocks that have static profiling
+   /// information---e.g. from '__builtin_expect()'.
+   bool hasStaticProfInfo() const;
+ 
    void dump() const;
  
  private:
    friend class LoopInfoBase<MachineBasicBlock, MachineLoop>;
  
    explicit MachineLoop(MachineBasicBlock *MBB)
      : LoopBase<MachineBasicBlock, MachineLoop>(MBB) {}

(119 lines not shown)

llvm/lib/CodeGen/MachineBlockPlacement.cpp

(2,500 lines not shown; context: MachineBlockPlacement::collectLoopBlockSet(const MachineLoop &L) {)

    // Filter cold blocks off from LoopBlockSet when profile data is available.
    // Collect the sum of frequencies of incoming edges to the loop header from
    // outside. If we treat the loop as a super block, this is the frequency of
    // the loop. Then for each block in the loop, we calculate the ratio between
    // its frequency and the frequency of the loop block. When it is too small,
    // don't add it to the loop chain. If there are outer loops, then this block
    // will be merged into the first outer loop chain for which this block is not
-   // cold anymore. This needs precise profile data and we only do this when
-   // profile data is available.
-   if (F->getFunction().hasProfileData() || ForceLoopColdBlock) {
+   // cold anymore.
+   //
+   // If a block uses static profiling data (e.g. from '__builtin_expect()'),
+   // then the programmer is explicitly telling us which paths are hot and cold.
+   // There's no reason for the compiler to believe otherwise, unless
+   // '-fprofile-use' is specified.
+   if (F->getFunction().hasProfileData() || ForceLoopColdBlock ||
+       L.hasStaticProfInfo()) {
      BlockFrequency LoopFreq(0);
      for (auto LoopPred : L.getHeader()->predecessors())
        if (!L.contains(LoopPred))
          LoopFreq += MBFI->getBlockFreq(LoopPred) *
                      MBPI->getEdgeProbability(LoopPred, L.getHeader());
  
      for (MachineBasicBlock *LoopBB : L.getBlocks()) {
        auto Freq = MBFI->getBlockFreq(LoopBB).getFrequency();
        if (Freq == 0 || LoopFreq.getFrequency() / Freq > LoopToColdBlockRatio)

(919 lines not shown)
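
The filtering described in the code comment above can be sketched outside LLVM with plain integers. Everything here is a simplified model: `EntryEdge`, `Block`, `loopFrequency`, `keepHotBlocks` and the ratio passed in are hypothetical, and edge probabilities are written as numerator/denominator pairs rather than LLVM's own probability type.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct EntryEdge {          // an edge from outside the loop into the header
  std::uint64_t predFreq;   // frequency of the predecessor block
  std::uint32_t probNum;    // edge probability numerator
  std::uint32_t probDen;    // edge probability denominator (non-zero)
};

struct Block {
  std::string name;
  std::uint64_t freq;
};

// Step 1: loop frequency = sum over entry edges of predFreq * probability.
std::uint64_t loopFrequency(const std::vector<EntryEdge> &edges) {
  std::uint64_t total = 0;
  for (const EntryEdge &e : edges)
    total += e.predFreq * e.probNum / e.probDen;
  return total;
}

// Step 2: drop blocks whose frequency is zero or too small relative to the
// loop frequency (integer division, mirroring the condition in the diff).
std::vector<std::string> keepHotBlocks(const std::vector<Block> &blocks,
                                       std::uint64_t loopFreq,
                                       std::uint64_t coldRatio) {
  std::vector<std::string> kept;
  for (const Block &b : blocks) {
    if (b.freq == 0 || loopFreq / b.freq > coldRatio)
      continue; // treated as cold: left out of the loop chain
    kept.push_back(b.name);
  }
  return kept;
}
```

For example, with entry edges contributing 500 and 200 (loop frequency 700) and an illustrative ratio of 5, blocks with frequencies 7000 and 6900 are kept, while a block with frequency 100 (700 / 100 = 7) and a block with frequency 0 are filtered out.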

llvm/lib/CodeGen/MachineLoopInfo.cpp

(105 lines not shown; context: DebugLoc MachineLoop::getStartLoc() const {)

    // info in it, try the header.
    if (MachineBasicBlock *HeadMBB = getHeader())
      if (const BasicBlock *HeadBB = HeadMBB->getBasicBlock())
        return HeadBB->getTerminator()->getDebugLoc();
  
    return DebugLoc();
  }
  
+ bool MachineLoop::hasStaticProfInfo() const {
+   return llvm::any_of(blocks(), [](const MachineBasicBlock *MBB){
+     const BasicBlock *BB = MBB->getBasicBlock();
+     return BB && BB->getTerminator()->hasMetadata(LLVMContext::MD_prof);
+   });
+ }
+ 
  MachineBasicBlock *
  MachineLoopInfo::findLoopPreheader(MachineLoop *L,
                                     bool SpeculativePreheader) const {
    if (MachineBasicBlock *PB = L->getLoopPreheader())
      return PB;
  
    if (!SpeculativePreheader)
      return nullptr;

(32 lines not shown)

llvm/test/CodeGen/Hexagon/prof-early-if.ll

  ; RUN: llc -O2 -march=hexagon < %s | FileCheck %s
  ; Rely on the comments generated by llc. Check that "if.then" was not predicated.
- ; CHECK: b5
  ; CHECK: b2
  ; CHECK-NOT: if{{.*}}memd
+ ; CHECK: b5
  
  %s.0 = type { [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [24 x i32], [3 x i32], [24 x i32], [8 x %s.1], [5 x i32] }
  %s.1 = type { i32, i32 }
  
  @g0 = global i64 0
  @g1 = global i32 0
  @g2 = global i32 0
  @g3 = global i8 0

(67 lines not shown)

llvm/test/CodeGen/X86/block-placement-2.ll

This file was added.


				; RUN: llc -mtriple=i686-linux -pre-RA-sched=source < %s | FileCheck %s
				; RUN: opt -disable-output -debugify < %s

				; This was derived from the Linux kernel. The __builtin_expect was ignored
				; which pushed the hot block "if.else" out of the critical path choosing
				; instead the cold block "if.then23". The cold block should be moved towards
				; the bottom.

				; CHECK-LABEL: test1:
				; CHECK: %for.inc
				; CHECK: %if.end18
				; CHECK: %if.else
				; CHECK: %if.end.i.i
				; CHECK: %if.end8.i.i
				; CHECK: %if.then23
				; CHECK: ret

				%struct.hlist_bl_node = type { %struct.hlist_bl_node, %struct.hlist_bl_node* }
				%struct.dentry = type { i32, %struct.inode, %struct.hlist_bl_node, %struct.dentry, %struct.inode, %struct.inode, [32 x i8], %struct.inode, %struct.dentry_operations* }
				%struct.inode = type { i32 }
				%struct.dentry_operations = type { i32 (%struct.dentry, i32), i32 (%struct.dentry, i32), i32 (%struct.dentry, %struct.inode), i32 (%struct.dentry, i32, i8) }
				%struct.anon.2 = type { i32, i32 }

				define %struct.dentry* @test1(%struct.dentry* readonly %parent, i8* %name, i32* nocapture %seqp, i64 %param1) {
				entry:
				%tobool135 = icmp eq i64 %param1, 0
				br i1 %tobool135, label %cleanup63, label %do.body4.lr.ph

				do.body4.lr.ph: ; preds = %entry
				%d_op = getelementptr inbounds %struct.dentry, %struct.dentry* %parent, i64 0, i32 8
				%shr = lshr i64 %param1, 32
				%conv49 = trunc i64 %shr to i32
				br label %do.body4

				do.body4: ; preds = %for.inc, %do.body4.lr.ph
				%node.0.in136 = phi i64 [ %param1, %do.body4.lr.ph ], [ %tmp35, %for.inc ]
				%node.0 = inttoptr i64 %node.0.in136 to %struct.hlist_bl_node*
				%add.ptr = getelementptr %struct.hlist_bl_node, %struct.hlist_bl_node* %node.0, i64 -1, i32 1
				%tmp6 = bitcast %struct.hlist_bl_node*** %add.ptr to %struct.dentry*
				%tmp7 = getelementptr inbounds %struct.dentry, %struct.dentry* %tmp6, i64 0, i32 1, i32 0
				%tmp8 = load volatile i32, i32* %tmp7, align 4
				call void asm sideeffect "", "~{memory},~{dirflag},~{fpsr},~{flags}"()
				%d_parent = getelementptr inbounds %struct.hlist_bl_node, %struct.hlist_bl_node* %add.ptr, i64 3
				%tmp9 = bitcast %struct.hlist_bl_node* %d_parent to %struct.dentry
				%tmp10 = load %struct.dentry, %struct.dentry* %tmp9, align 8
				%cmp133 = icmp eq %struct.dentry* %tmp10, %parent
				br i1 %cmp133, label %if.end14.lr.ph, label %for.inc

				if.end14.lr.ph: ; preds = %do.body4
				%tmp11 = getelementptr inbounds %struct.hlist_bl_node, %struct.hlist_bl_node* %add.ptr, i64 2
				%d_name43 = getelementptr inbounds %struct.hlist_bl_node, %struct.hlist_bl_node* %add.ptr, i64 4
				%hash = bitcast %struct.hlist_bl_node*** %d_name43 to i32*
				%tmp12 = bitcast %struct.hlist_bl_node*** %d_name43 to %struct.anon.2*
				%len = getelementptr inbounds %struct.anon.2, %struct.anon.2* %tmp12, i64 0, i32 1
				%name31 = getelementptr inbounds %struct.hlist_bl_node, %struct.hlist_bl_node* %add.ptr, i64 5
				%tmp13 = bitcast %struct.hlist_bl_node* %name31 to i8
				br label %if.end14

				if.end14: ; preds = %cleanup, %if.end14.lr.ph
				%and.i100134.in = phi i32 [ %tmp8, %if.end14.lr.ph ], [ undef, %cleanup ]
				%and.i100134 = and i32 %and.i100134.in, -2
				%tmp14 = load %struct.hlist_bl_node, %struct.hlist_bl_node* %tmp11, align 8
				%tobool.i.i = icmp eq %struct.hlist_bl_node** %tmp14, null
				br i1 %tobool.i.i, label %for.inc, label %if.end18

				if.end18: ; preds = %if.end14
				%tmp15 = load i32, i32* %seqp, align 8
				%tmp16 = and i32 %tmp15, 2
				%tobool22 = icmp eq i32 %tmp16, 0
				br i1 %tobool22, label %if.else, label %if.then23, !prof !0, !misexpect !1

				if.then23: ; preds = %if.end18
				%tmp17 = load i32, i32* %hash, align 8
				%cmp25 = icmp eq i32 %tmp17, 42
				br i1 %cmp25, label %if.end28, label %for.inc

				if.end28: ; preds = %if.then23
				%tmp18 = load i32, i32* %len, align 4
				%tmp19 = load i8, i8* %tmp13, align 8
				call void asm sideeffect "", "~{memory},~{dirflag},~{fpsr},~{flags}"()
				%tmp20 = load i32, i32* %tmp7, align 4
				%cmp.i.i101 = icmp eq i32 %tmp20, %and.i100134
				br i1 %cmp.i.i101, label %if.end36, label %cleanup

				if.end36: ; preds = %if.end28
				%tmp21 = load %struct.dentry_operations, %struct.dentry_operations* %d_op, align 8
				%d_compare = getelementptr inbounds %struct.dentry_operations, %struct.dentry_operations* %tmp21, i64 0, i32 3
				%tmp22 = load i32 (%struct.dentry, i32, i8), i32 (%struct.dentry, i32, i8)* %d_compare, align 8
				%call37 = call i32 %tmp22(%struct.dentry* %tmp6, i32 %tmp18, i8* %name)
				%cmp38 = icmp eq i32 %call37, 0
				br i1 %cmp38, label %cleanup56, label %for.inc

				cleanup: ; preds = %if.end28
				%tmp24 = load %struct.dentry, %struct.dentry* %tmp9, align 8
				%cmp = icmp eq %struct.dentry* null, %parent
				br i1 %cmp, label %if.end14, label %for.inc

				if.else: ; preds = %if.end18
				%hash_len44 = bitcast %struct.hlist_bl_node*** %d_name43 to i64*
				%tmp25 = load i64, i64* %hash_len44, align 8
				%cmp45 = icmp eq i64 %tmp25, %param1
				br i1 %cmp45, label %if.end48, label %for.inc

				if.end48: ; preds = %if.else
				%tmp26 = bitcast %struct.hlist_bl_node*** %name31 to i64*
				%tmp27 = load volatile i64, i64* %tmp26, align 8
				%tmp28 = inttoptr i64 %tmp27 to i8*
				br label %for.cond.i.i

				for.cond.i.i: ; preds = %if.end8.i.i, %if.end48
				%tcount.addr.0.i.i = phi i32 [ %conv49, %if.end48 ], [ %sub.i.i, %if.end8.i.i ]
				%ct.addr.0.i.i = phi i8* [ %name, %if.end48 ], [ %add.ptr9.i.i, %if.end8.i.i ]
				%cs.addr.0.i.i = phi i8* [ %tmp28, %if.end48 ], [ %add.ptr.i.i, %if.end8.i.i ]
				%tmp29 = bitcast i8* %cs.addr.0.i.i to i64*
				%tmp30 = load i64, i64* %tmp29, align 8
				%tmp31 = bitcast i8* %ct.addr.0.i.i to i64*
				%tmp32 = call { i64, i64 } asm "1:\09mov $2,$0\0A2:\0A.section .fixup,\22ax\22\0A3:\09lea $2,$1\0A\09and $3,$1\0A\09mov ($1),$0\0A\09leal $2,%ecx\0A\09andl $4,%ecx\0A\09shll $$3,%ecx\0A\09shr %cl,$0\0A\09jmp 2b\0A.previous\0A .pushsection \22__ex_table\22,\22a\22\0A .balign 4\0A .long (1b) - .\0A .long (3b) - .\0A .long (ex_handler_default) - .\0A .popsection\0A", "=&r,=&{cx},m,i,i,~{dirflag},~{fpsr},~{flags}"(i64 %tmp31, i64 -8, i64 7)
				%cmp.i.i = icmp ult i32 %tcount.addr.0.i.i, 8
				%asmresult.i.le.i.le.i.le = extractvalue { i64, i64 } %tmp32, 0
				br i1 %cmp.i.i, label %dentry_cmp.exit, label %if.end.i.i

				if.end.i.i: ; preds = %for.cond.i.i
				%cmp3.i.i = icmp eq i64 %tmp30, %asmresult.i.le.i.le.i.le
				br i1 %cmp3.i.i, label %if.end8.i.i, label %for.inc, !prof !0, !misexpect !1

				if.end8.i.i: ; preds = %if.end.i.i
				%add.ptr.i.i = getelementptr i8, i8* %cs.addr.0.i.i, i64 8
				%add.ptr9.i.i = getelementptr i8, i8* %ct.addr.0.i.i, i64 8
				%sub.i.i = add i32 %tcount.addr.0.i.i, -8
				%tobool12.i.i = icmp eq i32 %sub.i.i, 0
				br i1 %tobool12.i.i, label %cleanup56, label %for.cond.i.i

				dentry_cmp.exit: ; preds = %for.cond.i.i
				%asmresult.i.le.i.le.i.le.le = extractvalue { i64, i64 } %tmp32, 0
				%mul.i.i = shl nuw nsw i32 %tcount.addr.0.i.i, 3
				%sh_prom.i.i = zext i32 %mul.i.i to i64
				%shl.i.i = shl nsw i64 -1, %sh_prom.i.i
				%neg.i.i = xor i64 %shl.i.i, -1
				%xor.i.i = xor i64 %asmresult.i.le.i.le.i.le.le, %tmp30
				%and.i.i = and i64 %xor.i.i, %neg.i.i
				%tobool15.i.i = icmp eq i64 %and.i.i, 0
				br i1 %tobool15.i.i, label %cleanup56, label %for.inc

				cleanup56: ; preds = %dentry_cmp.exit, %if.end8.i.i, %if.end36
				%tmp33 = bitcast %struct.hlist_bl_node*** %add.ptr to %struct.dentry*
				store i32 %and.i100134, i32* %seqp, align 4
				br label %cleanup63

				for.inc: ; preds = %dentry_cmp.exit, %if.end.i.i, %if.else, %cleanup, %if.end36, %if.then23, %if.end14, %do.body4
				%tmp34 = inttoptr i64 %node.0.in136 to i64*
				%tmp35 = load volatile i64, i64* %tmp34, align 8
				%tobool = icmp eq i64 %tmp35, 0
				br i1 %tobool, label %cleanup63, label %do.body4

				cleanup63: ; preds = %for.inc, %cleanup56, %entry
				%retval.2 = phi %struct.dentry* [ %tmp33, %cleanup56 ], [ null, %entry ], [ null, %for.inc ]
				ret %struct.dentry* %retval.2
				}

				!0 = !{!"branch_weights", i32 2000, i32 1}
				!1 = !{!"misexpect", i64 1, i64 2000, i64 1}

llvm/test/CodeGen/X86/block-placement.ll

(1,496 lines not shown)

  ; Specifically in this case because best exit is .header
  ; but it has fallthrough to .middle block and last block in
  ; loop chain .slow does not have afallthrough to .header.
  ; CHECK-LABEL: not_rotate_if_extra_branch
  ; CHECK: %.entry
  ; CHECK: %.header
  ; CHECK: %.middle
  ; CHECK: %.backedge
- ; CHECK: %.slow
  ; CHECK: %.bailout
  ; CHECK: %.stop
+ ; CHECK: %.slow
  .entry:
    %sum.0 = shl nsw i32 %count, 1
    br label %.header
  
  .header:
    %i = phi i32 [ %i.1, %.backedge ], [ 0, %.entry ]
    %sum = phi i32 [ %sum.1, %.backedge ], [ %sum.0, %.entry ]
    %is_exc = icmp sgt i32 %i, 9000000

(87 lines not shown)

llvm/test/CodeGen/X86/move_latch_to_loop_top.ll

(167 lines not shown)

  }
  
  ; The exit block has higher frequency than false block, so latch block
  ; should not moved before header.
  ;CHECK-LABEL: test4:
  ;CHECK: %header
  ;CHECK: %true
  ;CHECK: %latch
- ;CHECK: %false
  ;CHECK: %exit
+ ;CHECK: %false
  define i32 @test4(i32 %t, i32* %p) {
  entry:
    br label %header
  
  header:
    %x1 = phi i64 [0, %entry], [%x2, %latch]
    %count1 = phi i32 [0, %entry], [%count4, %latch]
    %0 = ptrtoint i32* %p to i64

(54 lines not shown)

llvm/test/CodeGen/X86/ragreedy-bug.ll

  ; RUN: llc < %s -mtriple=x86_64-apple-macosx -regalloc=greedy | FileCheck %s
  
  ; This testing case is reduced from 197.parser prune_match function.
  ; We make sure register copies are not generated on isupper.exit blocks.
  
  ; isupper.exit and isupper.exit223 get tail-duplicated into all their
  ; predecessors.
  ; CHECK: cond.true.i.i
  ; CHECK-NEXT: in Loop
  ; Mem-move
  ; CHECK-NEXT: movl
  ; CHECK-NEXT: andl
+ ; CHECK-NEXT: LBB0
+ ; CHECK-NEXT: in Loop
  ; CHECK-NEXT: testl
  ; CHECK-NEXT: jne
  ; CHECK: cond.true.i.i217
  ; CHECK-NEXT: in Loop
  ; Mem-move
  ; CHECK-NEXT: movl
  ; CHECK-NEXT: andl
+ ; CHECK-NEXT: LBB0
+ ; CHECK-NEXT: in Loop
  ; CHECK-NEXT: testl
  ; CHECK-NEXT: je
  ; CHECK: cond.false.i.i
  ; CHECK: maskrune
  ; CHECK-NEXT: movzbl
  ; CHECK-NEXT: movzbl
- ; CHECK-NEXT: testl
- ; CHECK-NEXT: je
+ ; CHECK-NEXT: jmp
  ; CHECK: cond.false.i.i219
  ; CHECK: maskrune
  ; CHECK-NEXT: movzbl
  ; CHECK-NEXT: movzbl
- ; CHECK-NEXT: testl
- ; CHECK-NEXT: jne
+ ; CHECK-NEXT: jmp
  
  %struct.List_o_links_struct = type { i32, i32, i32, %struct.List_o_links_struct* }
  %struct.Connector_struct = type { i16, i16, i8, i8, %struct.Connector_struct, i8 }
  %struct._RuneLocale = type { [8 x i8], [32 x i8], i32 (i8, i64, i8), i32 (i32, i8, i64, i8), i32, [256 x i32], [256 x i32], [256 x i32], %struct._RuneRange, %struct._RuneRange, %struct._RuneRange, i8, i32, i32, %struct._RuneCharClass }
  %struct._RuneRange = type { i32, %struct._RuneEntry* }
  %struct._RuneEntry = type { i32, i32, i32, i32* }
  %struct._RuneCharClass = type { [14 x i8], i32 }
  %struct.Exp_struct = type { i8, i8, i8, i8, %union.anon }

(269 lines not shown)
