This is an archive of the discontinued LLVM Phabricator instance.

When moving a zext near to its associated load, do not retain the origial debug location.
ClosedPublic

Authored by andreadb on Oct 14 2016, 5:23 AM.

Details

Summary

Consider the following example:

define i64 @foo(i32* %ptr, i32 %cond) !dbg !5 {
entry:
  %0 = load i32, i32* %ptr, align 4, !dbg !7
  switch i32 %cond, label %sw.epilog [
    i32 3, label %sw.bb
    i32 4, label %sw.bb1
  ], !dbg !8

sw.bb:
  %conv = zext i32 %0 to i64, !dbg !9
  br label %sw.epilog, !dbg !10

sw.bb1:
  br label %sw.epilog, !dbg !11

sw.epilog:
  %result.0 = phi i64 [ 3, %entry ], [ 5, %sw.bb1 ], [ %conv, %sw.bb ]
  %conv2 = zext i32 %0 to i64, !dbg !12
  %add3 = add nuw nsw i64 %result.0, %conv2, !dbg !13
  ret i64 %add3, !dbg !14
}

Where debug line information is described by the following metadata nodes:

!7 = !DILocation(line: 3, column: 21, scope: !5)
!8 = !DILocation(line: 4, column: 3, scope: !5)
!9 = !DILocation(line: 6, column: 14, scope: !5)
!10 = !DILocation(line: 7, column: 5, scope: !5)
!11 = !DILocation(line: 10, column: 3, scope: !5)
!12 = !DILocation(line: 12, column: 19, scope: !5)
!13 = !DILocation(line: 12, column: 17, scope: !5)
!14 = !DILocation(line: 12, column: 3, scope: !5)

%conv is obtained from zero extending %0, which is a load that "lives" in a different basic block (i.e. %entry).

CGP (CodeGenPrepare) would move %conv from basic block %sw.bb to basic block %entry. The goal is to help ISel matching a zero-extending load instruction instead of two separated instructions.

However, when %conv is moved, it should not retain its original debug location. Instead, %conv should reuse the deug location associated with the load. That is because the zext will become part of the load; the code generator will attempt to fuse the two instructions into a zextload.

In our example, %conv is speculatively executed in the entry basic block. That is because the computation is considered to be cheap for the target. Before this patch, the debug location for the zext was still referring to line 6.
This was negatively affecting the debug stepping experience in the optimized code.

This can also have a negative impact on sample PGO. In this particular example, block %entry
dominates every other block in the function. However, %sw.bb is not a post-dominator of %entry! By moving %conv to the %entry block we are artificially bumping up the importance of basic block %sw.bb.

The reproducible test case has been extracted and simplified from a large game codebase. In the original code, the switch statement is in a very hot loop, and basic block 'sw.bb' is related to a "case" which is never taken at runtime.

Please let me know if okay to commit.

Thanks,
Andrea

Diff Detail

Repository
rL LLVM

Event Timeline

andreadb updated this revision to Diff 74658.Oct 14 2016, 5:23 AM
andreadb retitled this revision from to When moving a zext near to its associated load, do not retain the origial debug location..
andreadb updated this object.
andreadb added subscribers: llvm-commits, gbedwell.
danielcdh edited edge metadata.Oct 14 2016, 8:29 AM

The change in CGP LGTM, but as I am not debug info maintainer, I'll let someone else to sign off the patch.

But to fix the debug info problem in the given test case, it's better to fix it the place closest to the problem, which is ISel. (I do not mean that the CGP fix in this patch is wrong, it's worth fixing too)

In the generated assembly, we did not see the zext instruction, there is only one movl instruction.

Looking at the Isel, the following IR:

%0 = load i32, i32* %ptr, align 4, !dbg !7
%conv = zext i32 %0 to i64, !dbg !8

are translated to:

%vreg5<def> = MOV32rm %vreg2, 1, %noreg, 0, %noreg; mem:LD4[%ptr] GR32:%vreg5 GR64:%vreg2 dbg:test.c:6:14
%vreg0<def> = SUBREG_TO_REG 0, %vreg5<kill>, 4; GR64:%vreg0 GR32:%vreg5 dbg:test.c:6:14

Both MOV32rm and SUBREG_TO_REG are attributed to !dbg !8, while MOV32rm should be attributed to !dbg !7 instead. In this way we will not see line 6 attributed to the movl instruction.

The change in CGP LGTM, but as I am not debug info maintainer, I'll let someone else to sign off the patch.

Thanks Dehao!

But to fix the debug info problem in the given test case, it's better to fix it the place closest to the problem, which is ISel. (I do not mean that the CGP fix in this patch is wrong, it's worth fixing too)

In the generated assembly, we did not see the zext instruction, there is only one movl instruction.

<snip>

Both MOV32rm and SUBREG_TO_REG are attributed to !dbg !8, while MOV32rm should be attributed to !dbg !7 instead. In this way we will not see line 6 attributed to the movl instruction.

That's very interesting. I agree that the problem could have been avoided if only the MOV32rm had the correct line associated. On x86 that SUBREG_TO_REG is effectively a noop copy from GR32 to GR64. I will definitely investigate further to figure out what is going on here.

In the meantime, as you said, I think it is worthy to apply this CGP fix if someone more familiar with debug info can review it.

Thanks,
Andrea

aprantl edited edge metadata.Oct 14 2016, 9:51 AM

This feels a bit like a hack; but I'm not sure I have a better solution.

Would it be possible to post source code for this function? If the frontend has assigned an explicit location to the zext instruction, I would probably want that to be preserved (unless the load and the zext are combined I guess). If the zext doesn't appear in the source code, wouldn't it be more appropriate for the frontend to assign an artificial (line 0) location to the instruction?

But to fix the debug info problem in the given test case, it's better to fix it the place closest to the problem, which is ISel. (I do not mean that the CGP fix in this patch is wrong, it's worth fixing too)

In the generated assembly, we did not see the zext instruction, there is only one movl instruction.

<snip>

Both MOV32rm and SUBREG_TO_REG are attributed to !dbg !8, while MOV32rm should be attributed to !dbg !7 instead. In this way we will not see line 6 attributed to the movl instruction.

That's very interesting. I agree that the problem could have been avoided if only the MOV32rm had the correct line associated. On x86 that SUBREG_TO_REG is effectively a noop copy from GR32 to GR64. I will definitely investigate further to figure out what is going on here.

So, I further investigated on this.
It turns out that the reason why we get the same line for the MOV32rm and the SUBREG_TO_REG is because the DAGCombiner triggers the following rule:

(zext (load x))  --> zextload x

The new zextload node obtains the debug location from the zext. So, we end up with a zextload associated to line 6. Later on, ISel matches the zextload as the sequence MOV32rm + SUBREG_TO_REG.
However, at this point we have lost the information that the original load was from line 3. The debug location used for the two nodes after ISel is the same as the debug location from the zextload.
That is why we end up with the two line 6 instructions.

probinson edited edge metadata.Oct 14 2016, 10:32 AM

This feels a bit like a hack; but I'm not sure I have a better solution.

Would it be possible to post source code for this function? If the frontend has assigned an explicit location to the zext instruction, I would probably want that to be preserved (unless the load and the zext are combined I guess). If the zext doesn't appear in the source code, wouldn't it be more appropriate for the frontend to assign an artificial (line 0) location to the instruction?

The source is in comments in the test file.

I think the zext is not the "important" part of the statement. But currently we put is_stmt=1 on every .loc that has a new source line, and if we start moving individual instructions around, that seriously messes up single-stepping for these unimportant cases. So yes it's a hack, to workaround the is_stmt hack.

I don't want to be too eager to introduce line 0, as that takes up space in the line table and for something like this it would have no real benefit.

aprantl accepted this revision.Oct 14 2016, 11:01 AM
aprantl edited edge metadata.

Ok. I'm fine with doing this as long as we document clearly why we are doing it, so we can remove safely remove it later should it become unnecessary. (see inline comment)

lib/CodeGen/CodeGenPrepare.cpp
4273 ↗(On Diff #74658)

Please insert the longer explanation of why we are doing this from the testcase 19-22 here.

This revision is now accepted and ready to land.Oct 14 2016, 11:01 AM

Hi Adrian,
Thanks for the review.

Ok. I'm fine with doing this as long as we document clearly why we are doing it, so we can remove safely remove it later should it become unnecessary. (see inline comment)

I'd like to point out that CodeGenPrepare does not check if the zext would be speculatively executed when moved to the same basic block as the load. So, preserving its location would pessimize the debugging experience, other than negatively impacting the quality of sample pgo.

I can definitely add that explanation as a comment in the code if you like.

Thanks!

bjope added a subscriber: bjope.Oct 17 2016, 1:25 AM

I'm not an expert on debug info either, but to me the solution sound weird.
Isn't it a fact then when having optimisations turned on debugging might get tricky.

What if for example the target does not combine the zext and the load into a zextload. Then there would be a zext with apparently incorrect debug location. I'm not fully convinced that incorrect debug locations will aid the cause of making debugging easier.

Isn't the tricky part optimisations that combines instructions with different debug locations. Only one can be kept, so all such optimisations should choose wisely (if possible). I'm not sure about the strategy used by DAGCombiner here. Is it always keeping the "outer" instructions debug locations per design, or could it be that when combining a zext and and load it should keep the debug location of the load?

My feeling is that the choice should be made when doing the optimisations that needs to pick one debug location and discard the other. In this scenario CodeGenPrepare only moves instructions.

If something should be done when moving the zext, then I guess it should be dropping the debug information. Isn't inheriting it from the load simply wrong? That would just yield incorrect debug location for the zext in case it isn't combined with the load later.

I'm not an expert on debug info either, but to me the solution sound weird.
Isn't it a fact then when having optimisations turned on debugging might get tricky.

What if for example the target does not combine the zext and the load into a zextload. Then there would be a zext with apparently incorrect debug location. I'm not fully convinced that incorrect debug locations will aid the cause of making debugging easier.

Isn't the tricky part optimisations that combines instructions with different debug locations. Only one can be kept, so all such optimisations should choose wisely (if possible). I'm not sure about the strategy used by DAGCombiner here. Is it always keeping the "outer" instructions debug locations per design, or could it be that when combining a zext and and load it should keep the debug location of the load?

My feeling is that the choice should be made when doing the optimisations that needs to pick one debug location and discard the other. In this scenario CodeGenPrepare only moves instructions.

The problem is that instructions hoisted/sunk by CodeGenPrepare may be speculated. So, preserving the debug location would hurt the debugging experience other than causing problems to sample pgo. Also, the intent of the code in CGP is to subordinate the zext to the load. So, logically the the zero-extend is effectively "part of" the load.

If something should be done when moving the zext, then I guess it should be dropping the debug information. Isn't inheriting it from the load simply wrong? That would just yield incorrect debug location for the zext in case it isn't combined with the load later.

I don't think that it would yield to incorrect debug location if the zext is not combined. I think Paul already explained why zeroing the debug location might not be the best way to go here. We could zero the debug location, however (as he wrote) the zext is not the "important" part of the statement, and (more important) we don't want to over-use "line 0" as that has a size cost in the line-table section.

This revision was automatically updated to reflect the committed changes.