getLoopEstimatedTripCount should really be called getLoopEstimatedBackedgeTakeCount.
Diff Detail
- Build Status
Buildable 2868 Build 2868: arc lint + arc unit
Event Timeline
@danielcdh I can change the FlatLoopTripCountThreshold as this does change the behavior of unroller right now.
How do we represent a trip-count-zero estimate?
include/llvm/Transforms/Utils/LoopUtils.h | ||
---|---|---|
464 ↗ | (On Diff #83571) | BE taken count -> backedge-taken count |
The getLoopEstimatedTripCount function is really returning the backedge taken count. Trip count == backedge taken count + 1. So I rename it in this patch. The place I am not sure is whether you were looking for backedge count or trip count in LoopUnrollPass.cpp. It seems that you were assigning a profiled backedge count to ProfileTripCount variable.
I don't think this is the change we want to make here, for a few reasons:
- On the API level, I think it makes sense to have the estimated loop trip-count. So, if we want to add 1 to the BE count, that should probably just happen inside the function, instead of renaming the function but keeping functionality as is.
- I'm not sure there's an actual off-by-1 here. Or, rather, there's definitely an off-by-1, but only in some cases - I guess it depends on whether the loop is bottom-tested or not?
- I've actually verified that this gives the right results for instrumentation-based FDO for loops with a known trip count. It's possible that I simply made a mistake - or perhaps there's an error somewhere else that this cancels out with. If you want to change this, could you please check that again?
cat b.cc
int bar(int);
void foo(int a) {
while (a) { a = bar(a); }
}
#cat c.cc
void foo(int);
int bar(int x) {
return x - 1;
}
int main() {
foo(5);
return 0;
}
#clang -O2 -fprofile-generate=prof.raw b.cc c.cc -o instr_bin
#llvm-profdata merge prof.raw -o prof
#clang b.cc -c -O2 -fprofile-use=prof -emit-llvm
#opt -S b.bc |grep branch_weights
!29 = !{!"branch_weights", i32 1, i32 5}
Looks like (5 + 1/2) / 1 = 5 is the correct trip count?
I fully agree.
- I'm not sure there's an actual off-by-1 here. Or, rather, there's definitely an off-by-1, but only in some cases - I guess it depends on whether the loop is bottom-tested or not?
In case the loop is bottom tested (rotated), there would be an off-by-1, as the loop body is executed backedge-count + 1 times.
If its not bottom tested, we would not have this profile data on the latch, right ? I would assume the latch simply branches back to the header and the test is in the header.
- I've actually verified that this gives the right results for instrumentation-based FDO for loops with a known trip count. It's possible that I simply made a mistake - or perhaps there's an error somewhere else that this cancels out with. If you want to change this, could you please check that again?
In case the loop is bottom tested (rotated), there would be an off-by-1, as the loop body is executed backedge-count + 1 times.
If its not bottom tested, we would not have this profile data on the latch, right ? I would assume the latch simply branches back to the header and the test is in the header.
Hm, yes, that sounds right.
I'm still not sure why we get seemingly correct results with the current code, though. :-)
One option is that we don't. But if we do - maybe we don't update profile metadata correctly in loop rotation?
I am pretty sure its a problem with loop rotation, basically when we duplicate and move the terminator in the header, we did not update the prof metadata correctly. With the example @danielcdh gives me, right after rotation I am getting this. The !prof in guard block is definitely incorrect, the one in the latch block is not correct either I think. I am working on fixing this.
define void @_Z3fooi(i32 %a) local_unnamed_addr #0 !prof !29 {
entry:
%tobool2 = icmp eq i32 %a, 0 br i1 %tobool2, label %while.end, label %while.body, !prof !30
while.body: ; preds = %entry, %while.body
%a.addr.03 = phi i32 [ %call, %while.body ], [ %a, %entry ] %call = tail call i32 @_Z3bari(i32 %a.addr.03) %tobool = icmp eq i32 %call, 0 br i1 %tobool, label %while.end, label %while.body, !prof !30
while.end: ; preds = %while.body, %entry
ret void
}
!30 = !{!"branch_weights", i32 2, i32 10}
I think this ought to be merged into D28593 - committing (or reverting) them separately would mean we get the wrong trip counts, so they should go in atomically.