This is an archive of the discontinued LLVM Phabricator instance.

[GlobalISel][DebugInfo] Propagate debug location for localized constants
ClosedPublic

Authored by dzhidzhoev on Jun 20 2022, 5:55 AM.

Details

Summary

After IRTranslator pass, constants are deduplicated and translated into instructions at entry block, having debug locations lost.
Localization of constants may cause emission of extra zero lines in debug_line section, like here https://godbolt.org/z/ecvsxxfKn. In this example, constant gets placed as a first instruction in entry block, and despite it has no debug location, AsmPrinter emits zero line for it (line 45)

If a localized constant has the only user, we can assume that it has the same debug location as its user, since they are placed consequently.

Diff Detail

Event Timeline

dzhidzhoev created this revision.Jun 20 2022, 5:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 20 2022, 5:55 AM
dzhidzhoev requested review of this revision.Jun 20 2022, 5:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 20 2022, 5:55 AM
dzhidzhoev edited the summary of this revision. (Show Details)Jun 20 2022, 5:56 AM
dzhidzhoev edited the summary of this revision. (Show Details)Jun 20 2022, 5:56 AM
arsenm added inline comments.Jul 5 2022, 8:42 AM
llvm/lib/CodeGen/GlobalISel/Localizer.cpp
182

Isn't NewMI the same as MI?

191

demorgan this?

llvm/test/CodeGen/AArch64/GlobalISel/localizer-propagate-debug-loc.mir
14–37

Don't really need the function body if you remove the block names

70

Don't need registers section

dzhidzhoev added inline comments.Jul 8 2022, 8:30 AM
llvm/lib/CodeGen/GlobalISel/Localizer.cpp
182

It seems that MI is copied on insert call.

arsenm added inline comments.Jul 8 2022, 9:02 AM
llvm/lib/CodeGen/GlobalISel/Localizer.cpp
182

Insert should just insert, the instruction pointer should still be the same

dzhidzhoev updated this revision to Diff 443365.Jul 8 2022, 3:03 PM
dzhidzhoev marked 2 inline comments as done.

Get rid of redundant variable

dzhidzhoev marked an inline comment as done.Jul 8 2022, 3:05 PM
dzhidzhoev added inline comments.
llvm/test/CodeGen/AArch64/GlobalISel/localizer-propagate-debug-loc.mir
14–37

volatile seems to be ignored without it

dzhidzhoev marked an inline comment as not done.Jul 8 2022, 3:06 PM

@aprantl @probinson I'm really not sure, but should we consider backpropagating the first non-zero location to cover zero-location instructions at the start of a basic block more generally than this patch is proposing?

@aprantl @probinson I'm really not sure, but should we consider backpropagating the first non-zero location to cover zero-location instructions at the start of a basic block more generally than this patch is proposing?

BTW discussion about backpropagation for zero-location-instructions at the start of a basic block took in place here https://lists.llvm.org/pipermail/lldb-dev/2018-October/014263.html . There was a point against backpropagation of location for arbitrary instructions.

dzhidzhoev added a comment.EditedJul 25 2022, 3:12 PM

@aprantl @probinson I'm really not sure, but should we consider backpropagating the first non-zero location to cover zero-location instructions at the start of a basic block more generally than this patch is proposing?

BTW discussion about backpropagation for zero-location-instructions at the start of a basic block took in place here https://lists.llvm.org/pipermail/lldb-dev/2018-October/014263.html . There was a point against backpropagation of location for arbitrary instructions.

The objection was that it is incorrect to backpropagate debug locations of the nearest instruction below on arbitrary instructions, since there is no knowledge about their semantics on AsmPrinter stage. In contrast to that, in this commit we know that instructions being marked are generated from constants and are used by the nearest following instruction.

@aprantl @probinson I'm really not sure, but should we consider backpropagating the first non-zero location to cover zero-location instructions at the start of a basic block more generally than this patch is proposing?

BTW discussion about backpropagation for zero-location-instructions at the start of a basic block took in place here https://lists.llvm.org/pipermail/lldb-dev/2018-October/014263.html . There was a point against backpropagation of location for arbitrary instructions.

The objection was that it is incorrect to backpropagate debug locations of the nearest instruction below on arbitrary instructions, since there is no knowledge about their semantics on AsmPrinter stage. In contrast to that, in this commit we know that instructions being marked are generated from constants and are used by the nearest following instruction.

We forward propagate such locations, though, right? So I'm not sure back propagating is especially worse/more problematic.

@aprantl @probinson I'm really not sure, but should we consider backpropagating the first non-zero location to cover zero-location instructions at the start of a basic block more generally than this patch is proposing?

BTW discussion about backpropagation for zero-location-instructions at the start of a basic block took in place here https://lists.llvm.org/pipermail/lldb-dev/2018-October/014263.html . There was a point against backpropagation of location for arbitrary instructions.

The objection was that it is incorrect to backpropagate debug locations of the nearest instruction below on arbitrary instructions, since there is no knowledge about their semantics on AsmPrinter stage. In contrast to that, in this commit we know that instructions being marked are generated from constants and are used by the nearest following instruction.

We forward propagate such locations, though, right? So I'm not sure back propagating is especially worse/more problematic.

I'm not sure I have enough experience to answer that question.

But "backpropagating the first non-zero location to cover zero-location instructions at the start of a basic block" is not generalization of this commit, since here debug locations are propagated not only for instructions at the block beginning. During localization, instructions may be put not only at the block beginning.

@aprantl @probinson I'm really not sure, but should we consider backpropagating the first non-zero location to cover zero-location instructions at the start of a basic block more generally than this patch is proposing?

BTW discussion about backpropagation for zero-location-instructions at the start of a basic block took in place here https://lists.llvm.org/pipermail/lldb-dev/2018-October/014263.html . There was a point against backpropagation of location for arbitrary instructions.

The objection was that it is incorrect to backpropagate debug locations of the nearest instruction below on arbitrary instructions, since there is no knowledge about their semantics on AsmPrinter stage. In contrast to that, in this commit we know that instructions being marked are generated from constants and are used by the nearest following instruction.

We forward propagate such locations, though, right? So I'm not sure back propagating is especially worse/more problematic.

I'm not sure I have enough experience to answer that question.

But "backpropagating the first non-zero location to cover zero-location instructions at the start of a basic block" is not generalization of this commit, since here debug locations are propagated not only for instructions at the block beginning. During localization, instructions may be put not only at the block beginning.

We could potentially do it at other places too. It wouldn't hurt profile accuracy if we're propagating a location in the same basic block. It /might/ confuse interactive/human users if we ever did this over a call or possibly over a store instruction... eh, mixed feelings. I think if the instruction has no location (as opposed to zero location) and so we're willing/already letting previous locations cover the non-location instruction, then we should be willing to do the same thing backwards too.

In this case we're talking about dropping the location entirely (that's the current behavior, yes?) - so the only time an instruction like that causes more zero locations, is when it appears at the start of a block - so I think backpropagating at the start of a block would address the issue being discussed in this patch. Other zeros would not appear if the constant is put not at the beginning of a block - those would already be getting flow-on locations from the previous locations in a block.

It's not clear that singular constants getting the location of their original constant will address the location issues described here - they might still move up over some other location and cause just the same line table size problems, but without zero locations. So that's partly why I'm not sure about this path as a way to address the problem description.

In this case we're talking about dropping the location entirely (that's the current behavior, yes?) - so the only time an instruction like that causes more zero locations, is when it appears at the start of a block

I'm not sure, but it seems to be true.

so I think backpropagating at the start of a block would address the issue being discussed in this patch.

I can implement that, collect some metrics, but I can't make a decision on this assumption yet. On which stage/pass should such backpropagation happen?

Other zeros would not appear if the constant is put not at the beginning of a block - those would already be getting flow-on locations from the previous locations in a block.

We can consider this patch as changing a propagation direction for constants. From forward propagation, the default debugger behavior, to backward propagation. I think it affects not only instructions at the beginning of the basic block, since changing the propagation direction may make boundaries between source code lines more precise, and thus, improve stepping behavior (https://github.com/llvm/llvm-project/issues/56370 addressing such cases too).

It would be great to find out others opinions.

On which stage/pass should such backpropagation happen?

The reason I have chosen Localizer pass is that we know the context. What instructions are getting propagated locations.

so I think backpropagating at the start of a block would address the issue being discussed in this patch.

I can implement that, collect some metrics, but I can't make a decision on this assumption yet. On which stage/pass should such backpropagation happen?

I'm not sure - wherever @probinson implemented the "use zero rather than unspecified location at the start of basic blocks" is probably the place to revisit that and consider backpropagating from the first specified location and see what that looks like.

Other zeros would not appear if the constant is put not at the beginning of a block - those would already be getting flow-on locations from the previous locations in a block.

We can consider this patch as changing a propagation direction for constants. From forward propagation, the default debugger behavior, to backward propagation.

Slight pedantry: This isn't a debugger behavior thing, it's a DWARF format thing (so a bit more robust/explicitly guaranteed/required than something some debuggers choose to do) - a source line in the line table is valid from the address it's specified at until the next point in the line table that changes the line/address.

I think it affects not only instructions at the beginning of the basic block, since changing the propagation direction may make boundaries between source code lines more precise, and thus, improve stepping behavior (https://github.com/llvm/llvm-project/issues/56370 addressing such cases too).

Yeah, I'm not entirely understanding that issue and/or the connection with this one - some complications/lots of nuanced details in this area for sure.

The case I was having a flashback to was FastISel, which _used to_ put all the constants at the top of the current block, with no debug info. This was reworked multiple times over the years, and at this point it no longer behaves that way--constants are materialized per IR instruction.

This patch (now that I look at it in detail) is doing something else entirely. The pass was already moving constant-materializing instructions to be adjacent to the user; the change here is, having done that motion, if there's only one use, set the debug info of the moved instruction. That seems entirely reasonable.

It just happens that, as a special case, sometimes the moved instruction would end up at the top of a block, and IIRC AsmPrinter will deliberately emit line 0 for a first-in-block instruction that has no source location. (This protects the first-in-block instructions from incorrectly inheriting the source location of the physically preceding block.) With this patch, that instruction will have the same source location as its user, which seems like a clear improvement.

I apologize for being triggered on all this stuff, and not recognizing what was going on sooner. FWIW, LGTM.

aemerson accepted this revision.Sep 6 2022, 4:18 AM
This revision is now accepted and ready to land.Sep 6 2022, 4:18 AM
This revision was landed with ongoing or failed builds.Dec 5 2022, 5:39 AM
This revision was automatically updated to reflect the committed changes.