Update loop branch_weight metadata after loop rotation.
If the unrotated loop header carries branch_weight metadata, update it
after rotation; specifically, update the branch in the guard block and
the branch in the rotated latch block.
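A minimal source-level sketch of the transformation the summary describes (illustrative C++, not the patch's code): rotation turns a while-loop, whose header branch carries the !prof weights, into a do-while loop behind a guard, leaving two copies of the weighted branch that both need fresh weights.

```cpp
#include <cassert>

// Before rotation: a single weighted branch in the loop header.
int sumUnrotated(const int *A, int N) {
  int S = 0;
  int I = 0;
  while (I < N) { // header: this branch carries the !prof branch_weights
    S += A[I];
    ++I;
  }
  return S;
}

// After rotation: the comparison is duplicated into a guard block and
// the original branch becomes the rotated latch.
int sumRotated(const int *A, int N) {
  int S = 0;
  int I = 0;
  if (I < N) {       // guard block: duplicated copy of the header comparison
    do {
      S += A[I];
      ++I;
    } while (I < N); // rotated latch: second copy of the weighted branch
  }
  return S;
}
```

Both forms compute the same result; only the branch structure (and hence where the weights must live) changes.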
Differential D28593
Authored by trentxintong on Jan 11 2017, 7:24 PM.
Event Timeline
Comment Actions

@danielcdh It is true that the condition is not changed, but the inputs to the condition have changed. Before, you were testing the induction variable in the loop header; now there are two conditions, and the SSA values used in them have changed, so the probability needs to be adjusted. I will explain more when I get to the office.

Comment Actions

After we rotate the loop, we duplicate the comparison from the old header into the guard block and move the header to the end of the loop. So we have two branches that carry branch_weight metadata, and both need to be adjusted. With this patch, we only do the adjustment for loops with a single exiting block; if the loop has early exits, it is harder to adjust the branch weights properly. For example, with an early exit in the loop we cannot tell whether the latch block will ever be reached after rotation, let alone adjust its branch weight (we would need to look at the branch weights of the early exits to do this properly).

Two quantities are invariant under rotation: (1) the number of times the loop body is executed, and (2) the number of times the exit block is executed. These are the values we can extract from the !prof metadata on the header branch before rotation. The backedge weight should simply be the number of times the loop body is executed minus the number of times the loop exit is executed. With the loop exit count, the loop body weight, and the backedge weight, we can compute the branch weights for the guard block. This holds iff the loop executes at least once on every entry. If the loop body seldom executes, we cannot do this (the subtraction would yield a negative backedge weight); in that case we cap the backedge weight at 1 (it should really be 0). From this information we can compute the loop exit weight and all the other weights. That is roughly what the patch does and how the numbers in the test case are computed.
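The counting argument above can be sketched as a small helper. This is a hypothetical illustration, not LLVM's actual API: the struct and function names are invented, and it bakes in the comment's assumptions (single-exiting loop, body runs at least once per entry, so the number of loop entries equals the exit count).

```cpp
#include <cassert>
#include <cstdint>

// Weights for the two branches that exist after rotation.
// (Illustrative names; not the patch's real data structures.)
struct RotatedWeights {
  uint64_t GuardEnter; // guard branch: fall into the loop body
  uint64_t GuardSkip;  // guard branch: bypass the loop entirely
  uint64_t Backedge;   // latch branch: take the backedge
  uint64_t LatchExit;  // latch branch: leave the loop
};

// Inputs are the branch_weights read off the unrotated header branch:
//   BodyWeight - times the branch entered the loop body
//   ExitWeight - times the branch went to the exit block
RotatedWeights computeRotatedWeights(uint64_t BodyWeight, uint64_t ExitWeight) {
  RotatedWeights W;
  // Backedge count = body executions minus the one guard-fed execution
  // per loop entry.  If the body seldom runs, the subtraction would go
  // negative, so cap it at 1 (ideally 0, as the discussion notes).
  W.Backedge = BodyWeight > ExitWeight ? BodyWeight - ExitWeight : 1;
  W.LatchExit = ExitWeight;  // single exit: every entry leaves via the latch
  W.GuardEnter = ExitWeight; // each entry passes the guard into the body once
  W.GuardSkip = 0;           // "executes at least once per entry" assumption
  return W;
}
```

For example, a header branch with weights 100 (body) / 10 (exit) yields a backedge weight of 90 and a latch exit weight of 10 under these assumptions.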
Comment Actions

I want to merge https://reviews.llvm.org/D28460, but I do not know how to do

Comment Actions

Looks like this patch will make the "always call" case worse.

Without this patch:

```
pushq   %rbx
movq    %rdi, %rbx
cmpl    $0, (%rbx)
jne     .LBB1_3
.LBB1_1:                      # =>This Inner Loop Header: Depth=1
movq    %rbx, %rdi
callq   call_me
cmpl    $0, (%rbx)
je      .LBB1_1
.LBB1_3:
popq    %rbx
retq
```

With this patch:

```
pushq   %rbx
movq    %rdi, %rbx
cmpl    $0, (%rbx)
je      .LBB1_1
.LBB1_3:
popq    %rbx
retq
.LBB1_1:                      # =>This Inner Loop Header: Depth=1
movq    %rbx, %rdi
callq   call_me
cmpl    $0, (%rbx)
jne     .LBB1_3
jmp     .LBB1_1
```

As the trip count of this loop is always 1, the first version has no taken branches, while with this patch it has two taken branches.

Comment Actions

Talked with Xin offline:
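A plausible source-level reconstruction of the "always call" case discussed in the assembly above. The names `State`, `call_me`, and the stub body are assumptions, not the review's actual test source; the stub flips the flag after one call so the loop's profiled trip count is 1, matching the scenario described.

```cpp
#include <cassert>

struct State {
  int Flag;
  int Calls;
};

// Stub standing in for the external call_me: flip the flag so the
// loop runs exactly once, modeling the trip-count-1 profile.
void call_me(State *P) {
  ++P->Calls;
  P->Flag = 1;
}

void run(State *P) {
  // Rotation duplicates the `P->Flag == 0` comparison into a guard
  // and a latch; which branches are taken on the trip-count-1 profile
  // then depends entirely on the chosen block layout.
  while (P->Flag == 0)
    call_me(P);
}
```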
Comment Actions

I am still in the middle of getting a machine on which I can do performance runs. The machine I have is not very stable, i.e. the SPEC run numbers fluctuate from run to run.
Comment Actions

I am very sorry I have not put up the numbers for SPEC CPU2006. I am stuck in the middle of a few things and will put them up as soon as I have them. I will also address the comments.

Comment Actions

Address davidxl's comments. I reworked how the !prof metadata is computed after loop rotation. There is one test case change in peel-loop-pgo.ll; that is because I corrected how

Comment Actions

I ran the C/C++ benchmarks in CPU2006 with the current state of the patch (baseline is without the metadata adjustment; a negative percentage means the benchmark runs slower, or the code size shrinks, after the patch). It seems the regressions in 429 and 444 are real. We also get a code size reduction of -2.41% in 401. Overall, we see more performance regressions after adjusting the metadata this way.

Benchmark   Perf     CodeSize
400        -0.22%    0.00%
401         0.76%   -2.41%
403        -0.51%    0.07%
429        -2.09%    0.32%
445         0.64%   -0.08%
456        -0.29%    0.01%
458         0.00%   -0.04%
462        -0.15%    0.00%
464         0.16%    0.00%
471        -0.38%   -0.02%
473         0.00%   -0.21%
483        -0.40%   -0.04%
433         0.73%    0.03%
444        -2.15%   -0.26%
447        -0.17%   -0.06%
450         0.00%   -0.05%
453         0.38%   -0.01%
470         0.12%    0.00%
482         0.18%    0.14%
Comment Actions

Rework the reworked metadata update in loop rotation. I still have some hardcoded numbers. At the least, we should try to agree on the mechanism itself and start collecting some numbers.
Name the parameters.