If the programmer adds static profile data to a branch---i.e. uses
"builtin_expect()" or similar---then we should honor it. Otherwise,
"builtin_expect()" is ignored in crucial situations. So we trust that
the programmer knows what they're doing until proven wrong.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I'm in no way qualified to be reviewing changes to something with as important implications as machine block placement, but from the test modifications, it doesn't seem too invasive a change. Posting mostly notes for other reviewers to check.
llvm/lib/CodeGen/MachineBlockPlacement.cpp | ||
---|---|---|
2517 | I really wish this was a method on MachineBlockPlacement or MachineLoop rather than a lambda in an if statement. | |
llvm/test/CodeGen/Hexagon/prof-early-if.ll | ||
5 | LGTM; b5 is only branched to from b4, but b4's has higher branch weight to b2. Though it would be nice to just list all the blocks in order; otherwise I'm just guessing where the rest land. | |
llvm/test/CodeGen/X86/block-placement-2.ll | ||
14 | probably should specify where if.end.i.i gets placed, too, since it also has profile metadata. | |
llvm/test/CodeGen/X86/block-placement.ll | ||
1507 | LGTM; backedge is very likely to go to header. middle is unlikely to go to slow | |
llvm/test/CodeGen/X86/move_latch_to_loop_top.ll | ||
177 | LGTM; %latch has a 50-50 chance of branching either to %exit or %header. !3 seems unused. | |
llvm/test/CodeGen/X86/ragreedy-bug.ll | ||
35 | this change is curious; I'd have expected je's to flip to jne's or vice versa, but not unconditional jumps. |
llvm/test/CodeGen/X86/ragreedy-bug.ll | ||
---|---|---|
35 | This change is very interesting actually. The two mem-move blocks are placed at the end of the function. They then branch back up into what was a tail duplicated block. So the testl/je statements are actually at the original place, and the jmp here is just a simple branch to them. |
I think if we add more checks as I've commented above and see if there's a way to make the lamba-in-if nicer, this is ready to land.
Move lambda logic into a method. Improve some tests with additional checks.
llvm/lib/CodeGen/MachineBlockPlacement.cpp | ||
---|---|---|
2517 | Is it that it's cleaner to do it that way? The logic is going to be basically the same. |
LGTM; thanks Bill!
llvm/lib/CodeGen/MachineBlockPlacement.cpp | ||
---|---|---|
2517 | Yes, it's much nicer IMO to have short concise well named functions. It helps make this code more readable. Thank you. | |
llvm/test/CodeGen/Hexagon/prof-early-if.ll | ||
5 | Looks like the intent of the test is that "if.then" was not predicated. not necessarily anything to do with the block placement. |
Is it expected that this patch could increase binary size? I'm seeing about a 9kB total increase in fuchsia ZBIs between clang with this patch and the one before it.
llvm/lib/CodeGen/MachineBlockPlacement.cpp | ||
---|---|---|
2516 | The check here seems too weak. One branch in the loop has user annotation does not mean other branch probablity data can be trusted. More sophisticated analysis is needed. | |
llvm/lib/CodeGen/MachineLoopInfo.cpp | ||
114 | Note that with PGO is on, the branches are also annotated with MD_prof meta data. In other words, you will also need to check fhat the function entry does not have profile count. |
llvm/lib/CodeGen/MachineBlockPlacement.cpp | ||
---|---|---|
2516 | Do you mean that a builtin_expect shouldn't be considered when using profile data? |
llvm/lib/CodeGen/MachineBlockPlacement.cpp | ||
---|---|---|
2516 | No, that is what I meant. When profile data is available, builtin_expect will be ignored -- the prof meta data is overridden by profile data reader. What I meant is that builtin_expect has very vague meaning -- some user uses it to indicate branch bias which can be weak, or strong. LLVM's implementation treat it as a very strong bias (2000:1 ratio) -- so it will likely to exclude too many blocks from the block set when used in this context -- leading to code size increase or performance regression. In other words, PGO is the way to go :) |
The check here seems too weak. One branch in the loop has user annotation does not mean other branch probablity data can be trusted. More sophisticated analysis is needed.