This is an archive of the discontinued LLVM Phabricator instance.

[GlobalISel][DebugInfo] Remove debug info with zero line from constants inserted at entry block
ClosedPublic

Authored by dzhidzhoev on Jun 10 2022, 4:30 AM.

Details

Summary

Emission of constants having DebugLoc with line 0 causes significant increase of debug_line section size for some source files.

To illustrate, we can compare section sizes of several files from llvm test-suite, built with SelectionDAG vs GlobalISel, on Aarch64 (macOS), using -O0 optimization level:

Source pathSDAG text szGISel text szSDAG debug_line szGISel debug_line sz
SingleSource/Regression/C/gcc-c-torture/execute/strlen-2.c15320660148726340
SingleSource/Regression/C/gcc-c-torture/execute/20040629-1.c336402630028126693
SingleSource/Benchmarks/Misc/flops-4.c142811965941008
MultiSource/Benchmarks/MiBench/consumer-typeset/z31.c2716964809903
MultiSource/Benchmarks/Prolangs-C/gnugo/showinst.c25342502189573

For instance, here is a fragment of flops-4.c.o debug line section dump

Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x0000000000000000    174      0      1   0             0  is_stmt
0x0000000000000010      0      0      1   0             0
0x0000000000000018    185      4      1   0             0  is_stmt prologue_end
0x000000000000001c      0      0      1   0             0
0x0000000000000024    186      4      1   0             0  is_stmt
0x000000000000002c    189     10      1   0             0  is_stmt
0x0000000000000030      0      0      1   0             0
0x0000000000000038    207     11      1   0             0  is_stmt
0x0000000000000044    208     11      1   0             0  is_stmt
0x0000000000000048      0      0      1   0             0
0x0000000000000058    210     10      1   0             0  is_stmt
0x000000000000005c      0      0      1   0             0
0x0000000000000060    211     10      1   0             0  is_stmt
0x0000000000000064      0      0      1   0             0
0x000000000000006c    212     10      1   0             0  is_stmt
0x0000000000000070      0      0      1   0             0
0x000000000000007c    213     10      1   0             0  is_stmt
0x0000000000000080      0      0      1   0             0
0x0000000000000088    214     10      1   0             0  is_stmt
0x000000000000008c      0      0      1   0             0
0x0000000000000094    215     10      1   0             0  is_stmt

Lot of zero lines are produced by constants (global values) having DebugLoc with line 0.
It seems that they're not significant for debugging experience.

With the commit applied, total size of debug_line sections of llvm shared libraries has reduced by 2.5%.
Change of debug line section size of files listed above:

Source pathGISel debug_line szPatch debug_line sz
SingleSource/Regression/C/gcc-c-torture/execute/strlen-2.c63401465
SingleSource/Regression/C/gcc-c-torture/execute/20040629-1.c66933782
SingleSource/Benchmarks/Misc/flops-4.c1008609
MultiSource/Benchmarks/MiBench/consumer-typeset/z31.c903841
MultiSource/Benchmarks/Prolangs-C/gnugo/showinst.c573190

Diff Detail

Event Timeline

dzhidzhoev created this revision.Jun 10 2022, 4:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 10 2022, 4:30 AM
dzhidzhoev requested review of this revision.Jun 10 2022, 4:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 10 2022, 4:30 AM
dzhidzhoev retitled this revision from Remove debug info with zero line from constants inserted at entry block to [GlobalISel][DebugInfo] Remove debug info with zero line from constants inserted at entry block.Jun 10 2022, 4:33 AM
arsenm added inline comments.Jun 10 2022, 6:56 AM
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
157

Should also specifically check for constants?

dzhidzhoev added inline comments.Jun 10 2022, 11:13 AM
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
157

It would be nice, but various of machine instructions can be emitted from llvm::ConstantExpr.

To split instructions originated from constants and non-constant ones in this check, DILocationVerifier should be extended with state variable indicating that we're emitting instructions within translate(Constant, Register) call.
Since IRTranslator::translate(Constant, Register) call could be recursive, we need to add nesting counter + member functions to increase and decrease it.
Initially I considered this as an excessive complication of DILocationVerifier, but I can add it if it is needed.

I'm on board with the general idea, FWIW.

Added const check to DILocationVerifier

dzhidzhoev marked an inline comment as done.Jun 13 2022, 2:39 PM

I'd have thought AsmPrinter would deal with this, I remember adding code to look at the sequence of emitted .loc directives and handle line 0 specially. But if this is helping, it seems reasonable. It would be nice to experiment with actually debugging the generated code and see if the quality of the debugging experience is better/worse/unchanged. Maybe @jmorse your team can take a quick look?

krisb added a subscriber: krisb.Jun 14 2022, 12:47 PM

See also https://github.com/llvm/llvm-project/issues/56370 which wants to introduce more line-0 entries. There is clearly some tension going one between two goals, and I'd like to understand that better.

2.5% in .debug_line, which isn't a large part of total debug info size seems kind of small (it'd be good to have % of total .debug_* section sizes, and of the objects/binary too, to put this in perspective).

Presumably the zero line on these was added for a reason to begin with (though maybe with hindsight we decide that reason is insufficient/doesn't apply here/etc) - could you check version history/see if there's some back story to this? (perhaps it was added incidentally while addressing some other issue, perhaps not)

One thing that is important is that line zero is used at the start of basic blocks if the instructions don't otherwise have a location - because otherwise sample profiles can be mislead (because the line location from the previous basic block will cover the unassigned locations and then a sample could incorrectly conclude that a source line was executed when it was not). I don't know if that's related to this case - doesn't /look/ like it, but figured I'd mention it jus tin case.

I originally made constants use line 0 because on D63286 Adrian said that if we don't emit 0, the constants end up being interpreted as being part of the prologue, which they're technically not.

I originally made constants use line 0 because on D63286 Adrian said that if we don't emit 0, the constants end up being interpreted as being part of the prologue, which they're technically not.

Though it doesn't look like @aprantl thought this would be bad - just that it would happen. Do you/we/anyone have any particular reason to believe it would be bad/problematic if thees instructions became part of the prologue?

I originally made constants use line 0 because on D63286 Adrian said that if we don't emit 0, the constants end up being interpreted as being part of the prologue, which they're technically not.

Though it doesn't look like @aprantl thought this would be bad - just that it would happen. Do you/we/anyone have any particular reason to believe it would be bad/problematic if thees instructions became part of the prologue?

I don't know of any reason why it would be bad, but admittedly I'm not an expert in this area. The original intention of using zero was to avoid unnecessary unintended effects. If we do have an issue here with the debug info sizes then I'm ok with removing them.

Presumably the zero line on these was added for a reason to begin with (though maybe with hindsight we decide that reason is insufficient/doesn't apply here/etc) - could you check version history/see if there's some back story to this? (perhaps it was added incidentally while addressing some other issue, perhaps not)

Previously constants generated at entry block had taken the same debug location as instruction being translated (https://reviews.llvm.org/D63286). It might cause jumpy debugger behavior.

One thing that is important is that line zero is used at the start of basic blocks if the instructions don't otherwise have a location - because otherwise sample profiles can be mislead (because the line location from the previous basic block will cover the unassigned locations and then a sample could incorrectly conclude that a source line was executed when it was not). I don't know if that's related to this case - doesn't /look/ like it, but figured I'd mention it jus tin case.

If an instruction at the start of basic block has no debug location, AsmPrinter emits zero line location for it.

See also https://github.com/llvm/llvm-project/issues/56370 which wants to introduce more line-0 entries. There is clearly some tension going one between two goals, and I'd like to understand that better.

Since this patch affects constant generation, such problem could appear after Localizer pass (for example, if constant gets placed after call instruction). Here is the commit intended to fix that https://reviews.llvm.org/D128192.

2.5% in .debug_line, which isn't a large part of total debug info size seems kind of small (it'd be good to have % of total .debug_* section sizes, and of the objects/binary too, to put this in perspective).

Total size of all debug sections of llvm shared libraries has reduced by 0.3%. Total size of debug_line sections over llvm object files has reduced by 2%.
Total size of all debug sections of llvm object files has reduced by 0,1%.
Total size of all debug sections of test-suite object files has reduced by 1.5%. Total size of debug_line sections over test-suite object files has reduced by 7.5%.

Does seem like a relatively small win - is there a particular situation where you came across this/where it was important for your usage? (it'd be interesting to know of novel use cases that might motivate changes like this)

Why are the other changes needed here - the previous change set the location to 0, the new change sets it to nothing, so why are the other changes (the scope devices, etc) needed? Are they representative of bugs/issues in the previous (line 0) solution?

Does seem like a relatively small win - is there a particular situation where you came across this/where it was important for your usage? (it'd be interesting to know of novel use cases that might motivate changes like this)

It was noticed that almost each instruction accessing global value is marked with zero line in debug_line (even in simple case like that https://godbolt.org/z/W6shEoh95 at lines 23, 27).

Then it was discovered that for some source files we had significant increase of debug_* section size compared to SelectionDAG because of that, even considering difference of output code size of different backends. For these files we can reduce debug_* sections size by reducing debug_line size - like for files listed in review message:

Source pathSDag debug_* szGISel debug_* szGISel+patch debug_* sz
SingleSource/Regression/C/gcc-c-torture/execute/strlen-2.c311687963926
SingleSource/Regression/C/gcc-c-torture/execute/20040629-1.c162732015417248
SingleSource/Benchmarks/Misc/flops-4.c457449894595
MultiSource/Benchmarks/MiBench/consumer-typeset/z31.c319851614647
MultiSource/Benchmarks/Prolangs-C/gnugo/showinst.c147218561478

Why are the other changes needed here - the previous change set the location to 0, the new change sets it to nothing, so why are the other changes (the scope devices, etc) needed? Are they representative of bugs/issues in the previous (line 0) solution?

DILocationVerifier basically doesn't allow to build instruction without debug location, if the instruction being translated has the location. We change DILocationVerifier, allowing to translate instructions with debug location into instructions without it, if instructions being emitted are inserted into "constant block". In IRTranslator::translate(Constant*, ...), DILocationVerifier is notified that instructions being translated into entry block belong to "constant block". Since IRTranslator::translate can be called recursively, DILocationVerifier has to track nested constant blocks by using a counter. Scope device is used to make it implicitly.

Why are the other changes needed here - the previous change set the location to 0, the new change sets it to nothing, so why are the other changes (the scope devices, etc) needed? Are they representative of bugs/issues in the previous (line 0) solution?

DILocationVerifier basically doesn't allow to build instruction without debug location, if the instruction being translated has the location. We change DILocationVerifier, allowing to translate instructions with debug location into instructions without it, if instructions being emitted are inserted into "constant block". In IRTranslator::translate(Constant*, ...), DILocationVerifier is notified that instructions being translated into entry block belong to "constant block". Since IRTranslator::translate can be called recursively, DILocationVerifier has to track nested constant blocks by using a counter. Scope device is used to make it implicitly.

Hmm, not sure I'm following why the solution to this is more involved than the solution to allowing line 0? (could the check for line 0 be changed to allow no-line & be done at that point?)

Why are the other changes needed here - the previous change set the location to 0, the new change sets it to nothing, so why are the other changes (the scope devices, etc) needed? Are they representative of bugs/issues in the previous (line 0) solution?

DILocationVerifier basically doesn't allow to build instruction without debug location, if the instruction being translated has the location. We change DILocationVerifier, allowing to translate instructions with debug location into instructions without it, if instructions being emitted are inserted into "constant block". In IRTranslator::translate(Constant*, ...), DILocationVerifier is notified that instructions being translated into entry block belong to "constant block". Since IRTranslator::translate can be called recursively, DILocationVerifier has to track nested constant blocks by using a counter. Scope device is used to make it implicitly.

Hmm, not sure I'm following why the solution to this is more involved than the solution to allowing line 0? (could the check for line 0 be changed to allow no-line & be done at that point?)

Initially it was like that. Addressing this comment https://reviews.llvm.org/D127488#3573315, I have added more sophisticated check. Should I revert it to previous version https://reviews.llvm.org/D127488?id=435872 ?

Why are the other changes needed here - the previous change set the location to 0, the new change sets it to nothing, so why are the other changes (the scope devices, etc) needed? Are they representative of bugs/issues in the previous (line 0) solution?

DILocationVerifier basically doesn't allow to build instruction without debug location, if the instruction being translated has the location. We change DILocationVerifier, allowing to translate instructions with debug location into instructions without it, if instructions being emitted are inserted into "constant block". In IRTranslator::translate(Constant*, ...), DILocationVerifier is notified that instructions being translated into entry block belong to "constant block". Since IRTranslator::translate can be called recursively, DILocationVerifier has to track nested constant blocks by using a counter. Scope device is used to make it implicitly.

Hmm, not sure I'm following why the solution to this is more involved than the solution to allowing line 0? (could the check for line 0 be changed to allow no-line & be done at that point?)

Initially it was like that. Addressing this comment https://reviews.llvm.org/D127488#3573315, I have added more sophisticated check. Should I revert it to previous version https://reviews.llvm.org/D127488?id=435872 ?

@arsenm what're your thoughts here? I don't feel very strongly either way, but I didn't understand the added complexity.

Initially it was like that. Addressing this comment https://reviews.llvm.org/D127488#3573315, I have added more sophisticated check. Should I revert it to previous version https://reviews.llvm.org/D127488?id=435872 ?

@arsenm what're your thoughts here? I don't feel very strongly either way, but I didn't understand the added complexity.

I don't have a particularly strong opinion. I don't think it's worth adding a lot of extra complexity just for the benefit of an assert

Reverted to previous diff, addressing @dblaikie and @arsenm comments

dzhidzhoev added inline comments.Jul 24 2022, 2:00 PM
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
157

Could you advise how to check for constants in less complicated way? Currently I tend towards that we could accept this assert to be less exact for entry block instructions.

dzhidzhoev marked 2 inline comments as not done.Jul 24 2022, 2:00 PM
dblaikie accepted this revision.Jul 25 2022, 12:02 AM

Initially it was like that. Addressing this comment https://reviews.llvm.org/D127488#3573315, I have added more sophisticated check. Should I revert it to previous version https://reviews.llvm.org/D127488?id=435872 ?

@arsenm what're your thoughts here? I don't feel very strongly either way, but I didn't understand the added complexity.

I don't have a particularly strong opinion. I don't think it's worth adding a lot of extra complexity just for the benefit of an assert

OK, sounds good then.

This revision is now accepted and ready to land.Jul 25 2022, 12:02 AM
dzhidzhoev updated this revision to Diff 447231.EditedJul 25 2022, 1:50 AM

Rebased (on accepted version)

This revision was landed with ongoing or failed builds.Jul 25 2022, 10:19 AM
This revision was automatically updated to reflect the committed changes.