Page MenuHomePhabricator

Fix debug_loc offset difference with basic block sections
Needs ReviewPublic

Authored by tmsriram on Aug 1 2020, 9:42 PM.

Details

Summary

debug_loc for params not used in entry block need to be computed with the section start as the base label.

The attached test will not assemble without the patch as the assembler will complain that we are trying to compute a label difference from 2 different sections.

If the basic block containing the first mention of the param is not in the same section as the entry basic block, we cannot use the CurrentFnBegin label. To fix it, we use the Section begin label instead. Using the Section Label will result in loss of debug loc fidelity as we do not capture the basic blocks before the start of this section.

Right now, LabelsBeforeInsn just stores one label for each insn. We can fix the fidelity problem by storing all the basic block section labels that appear before the insn and generating a difference for each of them. Hence, I have made this as a FIXME.

Is this alright? I am not very familiar with debug_loc, other better suggestions are much appreciated here.

Diff Detail

Event Timeline

tmsriram created this revision.Aug 1 2020, 9:42 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 1 2020, 9:42 PM
tmsriram requested review of this revision.Aug 1 2020, 9:42 PM

Trying to reproduce this (by taking the .ll test case and compiling it with clang -c) I don't get any debug_loc section or DW_AT_locations. Is there something I'm missing about how to reproduce this?

(trying to see what the non-bb-sections location description looks like - if it depends on basic block ordering, it's probably incorrect anyway, and might be worth fixing so it doesn't depend on that non-guaranteed property & then there'll be less divergence between bb-sections and not)

Trying to reproduce this (by taking the .ll test case and compiling it with clang -c) I don't get any debug_loc section or DW_AT_locations. Is there something I'm missing about how to reproduce this?

(trying to see what the non-bb-sections location description looks like - if it depends on basic block ordering, it's probably incorrect anyway, and might be worth fixing so it doesn't depend on that non-guaranteed property & then there'll be less divergence between bb-sections and not)

Yes, you are right. The debug_loc section is not generated without basic block sections, I will dig more.

Trying to reproduce this (by taking the .ll test case and compiling it with clang -c) I don't get any debug_loc section or DW_AT_locations. Is there something I'm missing about how to reproduce this?

(trying to see what the non-bb-sections location description looks like - if it depends on basic block ordering, it's probably incorrect anyway, and might be worth fixing so it doesn't depend on that non-guaranteed property & then there'll be less divergence between bb-sections and not)

I see why debug_loc is not produced in the default case. This is what happens:

+ For the default case (no bbsections), DebugLoc in https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L1744 , it has exactly one value that contains the whole function scope as the start label is lfunc_begin and the end label is the end of the function.
+ Such location lists are handled specially here when the scope is the whole function : https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L1841 and the debug_loc is not generated. Diffing the asm by forcing a debug_loc entry, what happens here is that a DW_AT_location is replaced with a DW_AT_const_value. I don't understand this too well but this seems like an optimization to me.

Does this make sense?

Trying to reproduce this (by taking the .ll test case and compiling it with clang -c) I don't get any debug_loc section or DW_AT_locations. Is there something I'm missing about how to reproduce this?

(trying to see what the non-bb-sections location description looks like - if it depends on basic block ordering, it's probably incorrect anyway, and might be worth fixing so it doesn't depend on that non-guaranteed property & then there'll be less divergence between bb-sections and not)

I see why debug_loc is not produced in the default case. This is what happens:

+ For the default case (no bbsections), DebugLoc in https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L1744 , it has exactly one value that contains the whole function scope as the start label is lfunc_begin and the end label is the end of the function.
+ Such location lists are handled specially here when the scope is the whole function : https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L1841 and the debug_loc is not generated.

OK ^ and with bb sections do we see the same behavior? If not, why not?

Diffing the asm by forcing a debug_loc entry,

Not sure what you mean by this ^ could you explain further?

what happens here is that a DW_AT_location is replaced with a DW_AT_const_value. I don't understand this too well but this seems like an optimization to me.

Does this make sense?

I'm still a bit confused. I think if non-bb-sections doesn't need a location list, then bb-sections shouldn't either. Perhaps that issue should be fixed first, if it's an issue?

In any case/any order - perhaps this patch is still reflective of a bug that exists even in this ^ issue is fixed, but maybe it's that the test case isn't sufficient to demonstrate the remaining bug - perhaps the test case could be complicated a bit further to ensure it's not testing a behavior that shouldn't be happening for other reasons anyway?

Trying to reproduce this (by taking the .ll test case and compiling it with clang -c) I don't get any debug_loc section or DW_AT_locations. Is there something I'm missing about how to reproduce this?

(trying to see what the non-bb-sections location description looks like - if it depends on basic block ordering, it's probably incorrect anyway, and might be worth fixing so it doesn't depend on that non-guaranteed property & then there'll be less divergence between bb-sections and not)

I see why debug_loc is not produced in the default case. This is what happens:

+ For the default case (no bbsections), DebugLoc in https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L1744 , it has exactly one value that contains the whole function scope as the start label is lfunc_begin and the end label is the end of the function.
+ Such location lists are handled specially here when the scope is the whole function : https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L1841 and the debug_loc is not generated.

OK ^ and with bb sections do we see the same behavior? If not, why not?

Please bear with my longish response here:

First, let me explain what happens with the test by default. There is no .debug_loc section by default and here is why:

Without bbsections, the ASM looks like this:

	.text
	.file	"locdspnm.cpp"
...
	.type	_ZL4ncatPcjz,@function
_ZL4ncatPcjz:                           # @_ZL4ncatPcjz
.Lfunc_begin0:
	.file	3 "/proc/self/cwd/third_party/icu/source/common/locdspnm.cpp"
	.loc	3 37 0                          # third_party/icu/source/common/locdspnm.cpp:37:0
	.cfi_startproc
# %bb.0:                                # %.critedge3
	subq	$56, %rsp
...
	je	.LBB0_2
# %bb.1:                                # %.critedge3
...
	movaps	%xmm7, 32(%rsp)
.LBB0_2:                                # %.critedge3
...
	leaq	-128(%rsp), %rax
	#DEBUG_VALUE: ncat:buflen <- 157
.Ltmp0:
	.loc	3 47 3 prologue_end             # third_party/icu/source/common/locdspnm.cpp:47:3
...
	movl	$16, (%rax)
.Ltmp1:
	.p2align	4, 0x90
.LBB0_3:                                # =>This Inner Loop Header: Depth=1
	#DEBUG_VALUE: ncat:buflen <- 157
	.loc	3 0 3 is_stmt 0                 # third_party/icu/source/common/locdspnm.cpp:0:3
	jmp	.LBB0_3
.Lfunc_end0:
	.size	_ZL4ncatPcjz, .Lfunc_end0-_ZL4ncatPcjz
	.cfi_endproc

The function buildLocationList : https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L1638 is called and it adds two entries to DebugLoc with StartLabel and Label pairs as follows:

  1. .Lfunc_begin0, .Ltmp1
  2. .Ltmp1, .Lfunc_end0

The size of DebugLoc is 2 but these two ranges are merged into one range with MergeRanges here: https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L1733

There is now one entry in DebugLoc with full function scope .Lfunc_begin0, .Lfunc_end0

whose StartLabel is the start of the function and the EndLabel is the function end.

There is exactly one entry in the Location List DebugLoc in function buildLocationList. Before returning, there is a check to see if this one entry is valid throughout the scope of the function: https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L1744 This returns true in this case.

This leads to an optimization to use DW_AT_const_value instead of DW_AT_location and the dwarf standard here seems to suggest this : http://dwarfstd.org/ShowIssue.php?issue=161109.2

This optimization is triggered right here: https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L1841

At this point isValidSingleLocation is the return value of buildLocationList and when true does not create a .debug_loc section but instead creates a DW_AT_const_value.

Now, I tried to force a .debug_loc for this by simply making the value of isValidSingleLocation false always and the .debug_loc list shows up as expected. Diffing the asm without and with .debug_loc, the asm differs exactly like this :

Without debug_locWith debug_loc
.section .debug_loc
...
......
.byte 28 # DW_AT_const_value.byte 2 # DW_AT_location
.byte 15 # DW_FORM_udata.byte 23 # DW_FORM_sec_offset

To summarize, it seems like DW_AT_const_value can be used instead of DW_AT_location under certain conditions and that's what seems to be happening here. Please correct me if this is a wrong assumption.

This is for the default case.

Now, let's look at what happens with bbsections, first without this patch. With bb sections, buildLocationList is called again. Here is the ASM with basic block sections:

	.text
	.file	"locdspnm.cpp"
	.type	_ZL4ncatPcjz,@function
_ZL4ncatPcjz:                           # @_ZL4ncatPcjz
.Lfunc_begin0:
	.loc	3 37 0                          # third_party/icu/source/common/locdspnm.cpp:37:0
	.cfi_startproc
# %bb.0:                                # %.critedge3
...
	je	_ZL4ncatPcjz.2
	jmp	_ZL4ncatPcjz.1
	.cfi_endproc
	.section	.text,"ax",@progbits,unique,1
_ZL4ncatPcjz.1:                         # %.critedge3
	.cfi_startproc
	.cfi_def_cfa %rsp, 64
...
	jmp	_ZL4ncatPcjz.2
.LBB_END0_1:
	.size	_ZL4ncatPcjz.1, .LBB_END0_1-_ZL4ncatPcjz.1
	.cfi_endproc
	.section	.text,"ax",@progbits,unique,2
_ZL4ncatPcjz.2:                         # %.critedge3
	.cfi_startproc
	.cfi_def_cfa %rsp, 64
...
	leaq	-128(%rsp), %rax
	#DEBUG_VALUE: ncat:buflen <- 157
.Ltmp0:
	.loc	3 47 3 prologue_end             # third_party/icu/source/common/locdspnm.cpp:47:3
...
	jmp	_ZL4ncatPcjz.3
.Ltmp1:
.LBB_END0_2:
	.size	_ZL4ncatPcjz.2, .LBB_END0_2-_ZL4ncatPcjz.2
	.cfi_endproc
	.p2align	4, 0x90
	.section	.text,"ax",@progbits,unique,3
_ZL4ncatPcjz.3:                         # =>This Inner Loop Header: Depth=1
	.cfi_startproc
	.cfi_def_cfa %rsp, 64
	#DEBUG_VALUE: ncat:buflen <- 157
	.loc	3 0 3 is_stmt 0                 # third_party/icu/source/common/locdspnm.cpp:0:3
	jmp	_ZL4ncatPcjz.3
.LBB_END0_3:
	.size	_ZL4ncatPcjz.3, .LBB_END0_3-_ZL4ncatPcjz.3
	.cfi_endproc
	.text
.Lfunc_end0:
	.size	_ZL4ncatPcjz, .Lfunc_end0-_ZL4ncatPcjz

Again, buildLocationList at https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L1638 adds two entries but their start and end label pairs are now:

  1. .Lfunc_begin0, .Ltmp1 // This is the same as the default case.
  2. _ZL4ncatPcjz.3, .LBB_END0_3 // Note that the start label is the start of the next section and the end label is the end of the next section

Now, merge ranges will not merge these two into a single range as .Ltmp1 is not equal to _ZL4ncatPcjz3.

So, buildLocationList returns false at : https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L1741

That is, the optimization to not make a .debug_loc is not applicable in this case as that only holds if there is a valid single location here : https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L1841

So, a loc list is constructed but without this patch this triggers a problem at : https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/AsmPrinter/DebugLocStream.h#L99

This is because the startSym and the endSym have to be in the same section. .Ltmp1 and .Lfunc_begin0 are in different sections. The assembler will complain if we try to assemble this.

So, what triggered this problem? LabelsBeforeInsn[insn] cannot be loosely assigned to be the start of the function with basic block sections if the instruction and the start of the function are in different sections.

Now, this patch fixes it by making LabelsBeforeInsn to be the start of the section. So, the two entries in DebugLoc would become:

  1. _ZL4ncatPcjz.2, .Ltmp1
  2. _ZL4ncatPcjz.3, .Lfunc_end0

This is good, but we are missing the entries corresponding to (.Lfunc_begin0 ... _ZL4ncatPcjz.2) which denote the 2 basic blocks before. We could handle this if we extend LabelsBeforeInsn to have multiple values in the case of basic block sections.

The way this patch exists, it loses the debug_loc fidelity as it does not capture the other blocks.

Thoughts?

Apologies for the very long reply!

Diffing the asm by forcing a debug_loc entry,

Not sure what you mean by this ^ could you explain further?

what happens here is that a DW_AT_location is replaced with a DW_AT_const_value. I don't understand this too well but this seems like an optimization to me.

Does this make sense?

I'm still a bit confused. I think if non-bb-sections doesn't need a location list, then bb-sections shouldn't either. Perhaps that issue should be fixed first, if it's an issue?

In any case/any order - perhaps this patch is still reflective of a bug that exists even in this ^ issue is fixed, but maybe it's that the test case isn't sufficient to demonstrate the remaining bug - perhaps the test case could be complicated a bit further to ensure it's not testing a behavior that shouldn't be happening for other reasons anyway?

tmsriram updated this revision to Diff 286474.Aug 18 2020, 9:28 PM

@amharc

When basic block sections are on and the debug_loc entries span multiple sections, emit entries for each section. This keeps the fidelity and there is no loss of debug information.

To summarize, by default there is only one entry which is collapsed into a DW_AT_const_value.

WIth basic block sections, this one entry spans multiple sections and must be split into an entry for each section.

@amharc

When basic block sections are on and the debug_loc entries span multiple sections, emit entries for each section. This keeps the fidelity and there is no loss of debug information.

To summarize, by default there is only one entry which is collapsed into a DW_AT_const_value.

WIth basic block sections, this one entry spans multiple sections and must be split into an entry for each section.

if the location is valid for the entire range of the enclosing scope, it could still be described with a single DW_AT_const_value, I think?

DWARFv5 spec: "[single (non-loclist) location descriptions] are sufficient for describing the location of any object as long as its lifetime is either static or the same as the lexical block that owns it, and it does not move during its lifetime"

@amharc

When basic block sections are on and the debug_loc entries span multiple sections, emit entries for each section. This keeps the fidelity and there is no loss of debug information.

To summarize, by default there is only one entry which is collapsed into a DW_AT_const_value.

WIth basic block sections, this one entry spans multiple sections and must be split into an entry for each section.

if the location is valid for the entire range of the enclosing scope, it could still be described with a single DW_AT_const_value, I think?

DWARFv5 spec: "[single (non-loclist) location descriptions] are sufficient for describing the location of any object as long as its lifetime is either static or the same as the lexical block that owns it, and it does not move during its lifetime"

I am not sure I fully understood what you meant here, could you please elaborate? With sections, since the individual sections could be separated and can become disjoint you need one per section right? Maybe you meant something else.

@amharc

When basic block sections are on and the debug_loc entries span multiple sections, emit entries for each section. This keeps the fidelity and there is no loss of debug information.

To summarize, by default there is only one entry which is collapsed into a DW_AT_const_value.

WIth basic block sections, this one entry spans multiple sections and must be split into an entry for each section.

if the location is valid for the entire range of the enclosing scope, it could still be described with a single DW_AT_const_value, I think?

DWARFv5 spec: "[single (non-loclist) location descriptions] are sufficient for describing the location of any object as long as its lifetime is either static or the same as the lexical block that owns it, and it does not move during its lifetime"

I am not sure I fully understood what you meant here, could you please elaborate? With sections, since the individual sections could be separated and can become disjoint you need one per section right? Maybe you meant something else.

Ah, sure, sorry - if a variable only has one location, and that location is valid for the entire range of its enclosing scope (even if that enclosing scope is fragmented/uses ranges (rather than high/low) to describe its valid set of addresses) then the variable can use a simple/single location description.

Take for example this code:

void f1();
void f2() {
  {
    int i = 7;
    f1();
    f1();
  }
  f1();
}

compiled to LLVM IR with optimizations, then manually modified to reorder the 2nd and 3rd calls to f1 (to cause the valid set of addresses for the explicit lexical scope that contains i to become fragmented, and need a DW_AT_ranges to describe its valid addresses (the first f1 call, then the interleaved (originally 3rd, now second) f1 call which isn't part of tnhe scope, then more scope with the last call to f1)), this produces the following DWARF:

DW_TAG_subprogram
  DW_AT_low_pc    (0x0000000000000000)
  DW_AT_high_pc   (0x0000000000000011)
  DW_AT_name      ("f2")
  ...

  DW_TAG_lexical_block
    DW_AT_ranges  (0x00000000
      [0x0000000000000001, 0x0000000000000006)
      [0x000000000000000b, 0x0000000000000011))

    DW_TAG_variable
      DW_AT_const_value   (7)
      DW_AT_name  ("i")
      ...
    NULL

Despite 'i' having a disjoint valid range (it's only valid in the [1, 6)+[b, 11) instructions) - it doesn't need to use a loclist to describe its location, because its location is the same for the entire life of its enclosing scope, which is the implicit range that the location is valid over if no location list is used.

tmsriram updated this revision to Diff 287715.Aug 25 2020, 11:02 AM

Redo the patch according to @dblaikie 's observation that loc list need not be used even with sections.

This change checks if debug loc entries can be merged to form a larger range even if they are split across sections. The test checks that a single range is collapsed into DW_AT_const_value both with and without sections.

dblaikie added inline comments.Aug 25 2020, 3:00 PM
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1732–1735

This description sounds like it might merge across BB sections even if the location doesn't span the whole outer range.

eg: given a scope like [x, x+5) [y, y+3) and a location range of [x+3, x+5) [y, y+2) it sounds like this code might produce a single [x+3, y+2) range? That would be incorrect - since there's no ordering between the x and y ranges - so there's no way to cross the gap like that.

Could you include some test cases with interesting ranges like this to check that the DWARF is correct there?

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1732–1735

Should the location range span the entire function? Could it be fully covering a subset of sections? Test cases for this seem very hard, even the one attached is reduced from a larger failure. Any pointers here? Thanks.

dblaikie added inline comments.Aug 26 2020, 1:43 PM
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1732–1735

llvm-dwarfdump some examples with non-trivial locations (I'll provide some suggestions on how to create non-trivial locations shortly) - if the location doesn't use a location list (ie: it uses a single location that spans the entire enclosing scope) then it should continue to use a single location with BB sections - because the implied range is "whatever addresses the enclosing scope says" - which will use ranges or high/low depending on whether it's disjoint or crosses BB-sections boundaries, etc)

If the location isn't a single location without BB-sections, then it can't be a single location with BB-sections. And any ranges that that location uses should be fragmented in the same way that DW_AT_ranges/high/low would be fragmented.

Would it be possible to use the same logic that handles ranges to handle these location ranges too? Presumably there's some code that transforms a begin/end pair into fragments when that pair crosses BB-section boundaries? So perhaps all the merging can be done here, then it can be checked to see if it's a valid single location (there's got to be some code somewhere that checks whether the final ranges cover all the enclosing scope ranges and uses the same location -> go to single location) and then fragmented if it crosses BB-section boundaries?

Some examples to consider:

void f1();
int f2();
extern bool b;
extern int x;
void test() {
  {
    // constant value with a single location description
    // Shouldn't change with BB-sections
    int i = 7;
    f1();
  } 
  int tmp = f2();
  { 
    // non-constant value with a single location description
    // Shouldn't change with BB-sections
    int i = tmp;
    f1();
    x = i;
  }
  { 
    // constant value through partial scope, but crossing a BB boundary
    // Should change with BB sections, I think - the address range for the
    // constant value will be fragmented.
    int i = 7;
    f1();
    if (b)
      f1();
    i = f2();
    f1();
  } 
  { 
    // constant value through partial scope, no BB boundary
    // Shouldn't change with BB-sections
    int i = 7;
    f1();
    i = f2();
    f1();
  } 
  { 
    // constant value through whole scope, including BB crossing - but LLVM puts
    // this in a location list unnecessarily.
    // You could fix the underlying LLVM bug (would be much appreciated) & make
    // this use a single location description - then with BB sections it should
    // still use a single location description.
    // If this bug remains, then the BB-sections code would need to fragment the
    // location list. (this'll make BB-sections DWARF size overhead higher than
    // it would otherwise be if the bug was fixed).
    int i = 7;
    f1();
    if (b)
      f1();
  } 
}
dblaikie added inline comments.Aug 28 2020, 1:28 PM
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1732–1735

Handily enough, this last case got fixed to use a constant location rather than a loclist recently, so this:

{
  int i = 7;
  f1();
  if (b)
    f1();
}

produces this DWARF (without bb sections - and should continue to produce the same DWARF with bb-sections)

DW_TAG_lexical_block
  DW_AT_low_pc  (0x0000000000000044)
  DW_AT_high_pc (0x0000000000000058)

  DW_TAG_variable
    DW_AT_const_value   (7)
    DW_AT_name  ("i")
    DW_AT_decl_file     ("/usr/local/google/home/blaikie/dev/scratch/loc.cpp")
    DW_AT_decl_line     (48)
    DW_AT_type  (0x00000192 "int")
MaskRay added inline comments.
llvm/test/DebugInfo/X86/basic-block-sections-debug-loc.ll
32

The attributes can be cleaned up a bit. For example "use-soft-float" defaults to false so it is not needed.