Consider the following example:
define i64 @foo(i32* %ptr, i32 %cond) !dbg !5 {
entry:
  %0 = load i32, i32* %ptr, align 4, !dbg !7
  switch i32 %cond, label %sw.epilog [
    i32 3, label %sw.bb
    i32 4, label %sw.bb1
  ], !dbg !8

sw.bb:
  %conv = zext i32 %0 to i64, !dbg !9
  br label %sw.epilog, !dbg !10

sw.bb1:
  br label %sw.epilog, !dbg !11

sw.epilog:
  %result.0 = phi i64 [ 3, %entry ], [ 5, %sw.bb1 ], [ %conv, %sw.bb ]
  %conv2 = zext i32 %0 to i64, !dbg !12
  %add3 = add nuw nsw i64 %result.0, %conv2, !dbg !13
  ret i64 %add3, !dbg !14
}
Where debug line information is described by the following metadata nodes:
!7 = !DILocation(line: 3, column: 21, scope: !5)
!8 = !DILocation(line: 4, column: 3, scope: !5)
!9 = !DILocation(line: 6, column: 14, scope: !5)
!10 = !DILocation(line: 7, column: 5, scope: !5)
!11 = !DILocation(line: 10, column: 3, scope: !5)
!12 = !DILocation(line: 12, column: 19, scope: !5)
!13 = !DILocation(line: 12, column: 17, scope: !5)
!14 = !DILocation(line: 12, column: 3, scope: !5)
%conv is obtained from zero extending %0, which is a load that "lives" in a different basic block (i.e. %entry).
CGP (CodeGenPrepare) would move %conv from basic block %sw.bb to basic block %entry. The goal is to help ISel match a single zero-extending load instruction instead of two separate instructions.
However, when %conv is moved, it should not retain its original debug location. Instead, %conv should reuse the debug location associated with the load. That is because the zext will become part of the load; the code generator will attempt to fuse the two instructions into a zextload.
In our example, %conv is speculatively executed in the entry basic block. That is because the computation is considered to be cheap for the target. Before this patch, the debug location for the zext still referred to line 6.
This negatively affected the debug stepping experience in the optimized code.
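To make the intended change concrete, here is a sketch of the entry block after CGP has moved the zext. It is hand-written from the example above rather than actual opt output, it reuses the metadata nodes !7, !8 and !9 listed earlier, and it deliberately leaves out the other blocks (what CGP does to %conv2 is not shown here).

```llvm
; Fragment only: entry block of @foo after CodeGenPrepare moves %conv.
; Hand-written sketch, not tool output.
entry:
  %0 = load i32, i32* %ptr, align 4, !dbg !7
  ; Before the patch the moved zext kept !dbg !9 (line 6, from %sw.bb).
  ; With the patch it takes the load's location, !dbg !7 (line 3).
  %conv = zext i32 %0 to i64, !dbg !7
  switch i32 %cond, label %sw.epilog [
    i32 3, label %sw.bb
    i32 4, label %sw.bb1
  ], !dbg !8
```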
This can also have a negative impact on sample PGO. In this particular example, block %entry dominates every other block in the function. However, %sw.bb is not a post-dominator of %entry! By moving %conv to the %entry block we are artificially bumping up the importance of basic block %sw.bb.
The reproducible test case has been extracted and simplified from a large game codebase. In the original code, the switch statement is in a very hot loop, and basic block 'sw.bb' is related to a "case" which is never taken at runtime.
Please let me know if okay to commit.
Thanks,
Andrea