This is an archive of the discontinued LLVM Phabricator instance.

[MBP][X86] Include static prof data when collecting loop BBs
ClosedPublic

Authored by void on Feb 18 2020, 5:18 PM.

Details

Summary

If the programmer adds static profile data to a branch---i.e. uses
"builtin_expect()" or similar---then we should honor it. Otherwise,
"
builtin_expect()" is ignored in crucial situations. So we trust that
the programmer knows what they're doing until proven wrong.

Event Timeline

void created this revision.Feb 18 2020, 5:18 PM

I'm in no way qualified to be reviewing changes to something with as important implications as machine block placement, but from the test modifications, it doesn't seem too invasive a change. Posting mostly notes for other reviewers to check.

llvm/lib/CodeGen/MachineBlockPlacement.cpp
2517

I really wish this was a method on MachineBlockPlacement or MachineLoop rather than a lambda in an if statement.

llvm/test/CodeGen/Hexagon/prof-early-if.ll
5

LGTM; b5 is only branched to from b4, but b4's has higher branch weight to b2. Though it would be nice to just list all the blocks in order; otherwise I'm just guessing where the rest land.

llvm/test/CodeGen/X86/block-placement-2.ll
13

probably should specify where if.end.i.i gets placed, too, since it also has profile metadata.

llvm/test/CodeGen/X86/block-placement.ll
1507

LGTM; backedge is very likely to go to header. middle is unlikely to go to slow

llvm/test/CodeGen/X86/move_latch_to_loop_top.ll
177

LGTM; %latch has a 50-50 chance of branching either to %exit or %header. !3 seems unused.

llvm/test/CodeGen/X86/ragreedy-bug.ll
35

this change is curious; I'd have expected je's to flip to jne's or vice versa, but not unconditional jumps.

void marked 2 inline comments as done.Feb 18 2020, 9:10 PM
void added inline comments.
llvm/test/CodeGen/X86/ragreedy-bug.ll
35

This change is very interesting actually. The two mem-move blocks are placed at the end of the function. They then branch back up into what was a tail duplicated block. So the testl/je statements are actually at the original place, and the jmp here is just a simple branch to them.

lebedev.ri retitled this revision from Include static prof data when collecting loop BBs to [MBP][X86] Include static prof data when collecting loop BBs.Feb 19 2020, 1:27 AM

I think if we add more checks as I've commented above and see if there's a way to make the lamba-in-if nicer, this is ready to land.

void updated this revision to Diff 245459.Feb 19 2020, 10:34 AM
void marked 4 inline comments as done.

Move lambda logic into a method. Improve some tests with additional checks.

llvm/lib/CodeGen/MachineBlockPlacement.cpp
2517

Is it that it's cleaner to do it that way? The logic is going to be basically the same.

nickdesaulniers accepted this revision.Feb 19 2020, 10:59 AM
nickdesaulniers marked 2 inline comments as done.

LGTM; thanks Bill!

llvm/lib/CodeGen/MachineBlockPlacement.cpp
2517

Yes, it's much nicer IMO to have short concise well named functions. It helps make this code more readable. Thank you.

llvm/test/CodeGen/Hexagon/prof-early-if.ll
5

Looks like the intent of the test is that "if.then" was not predicated. not necessarily anything to do with the block placement.

This revision is now accepted and ready to land.Feb 19 2020, 10:59 AM
This revision was automatically updated to reflect the committed changes.
leonardchan added a subscriber: leonardchan.EditedMar 18 2020, 12:27 PM

Is it expected that this patch could increase binary size? I'm seeing about a 9kB total increase in fuchsia ZBIs between clang with this patch and the one before it.

davidxl added inline comments.
llvm/lib/CodeGen/MachineBlockPlacement.cpp
2516

The check here seems too weak. One branch in the loop has user annotation does not mean other branch probablity data can be trusted. More sophisticated analysis is needed.

llvm/lib/CodeGen/MachineLoopInfo.cpp
114 ↗(On Diff #245480)

Note that with PGO is on, the branches are also annotated with MD_prof meta data. In other words, you will also need to check fhat the function entry does not have profile count.

void marked an inline comment as done.Mar 20 2020, 3:40 PM
void added inline comments.
llvm/lib/CodeGen/MachineBlockPlacement.cpp
2516

Do you mean that a builtin_expect shouldn't be considered when using profile data?

davidxl added inline comments.Mar 20 2020, 5:12 PM
llvm/lib/CodeGen/MachineBlockPlacement.cpp
2516

No, that is what I meant.

When profile data is available, builtin_expect will be ignored -- the prof meta data is overridden by profile data reader.

What I meant is that builtin_expect has very vague meaning -- some user uses it to indicate branch bias which can be weak, or strong. LLVM's implementation treat it as a very strong bias (2000:1 ratio) -- so it will likely to exclude too many blocks from the block set when used in this context -- leading to code size increase or performance regression.

In other words, PGO is the way to go :)