
[DWARF] Change ambiguity resolution from smallest CUOffset to largest (LowPC, CUOffset)
Needs Review · Public

Authored by MaskRay on Apr 9 2019, 9:41 AM.

Details

Summary

When constructing debug aranges, if ranges overlap, we currently pick
the CU with the smallest CUOffset. This patch changes the rule to the
largest (LowPC, CUOffset).

--gc-sections and comdat may discard sections. The associated compile
units may be kept and have a zero DW_AT_low_pc. If the DW_AT_high_pc is
sufficiently large, the resulting range may overlap with that of a good CU.

llvm-symbolizer may then resolve an address that belongs to the good CU to
the discarded CU. By giving priority to the largest (LowPC, CUOffset), the
good one will win, fixing some incorrect resolutions.

This patch also drops the Aranges.back().setHighPC(E.Address);
optimization because it is not very useful: N CUs can produce at most
2*N-1 ranges.

In practice, overlap only happens with such [0, high_pc) ranges, so the
optimization saves no more than M ranges, where M is the number of
garbage-collected CUs that overlap a good CU.

Diff Detail

Event Timeline

MaskRay created this revision. · Apr 9 2019, 9:41 AM
Herald added a project: Restricted Project. · Apr 9 2019, 9:41 AM

What does addr2line (or any other symbolizers, not that I know of others) do in these cases?

How's this compare/contrast with the lld change that's being made to achieve similar results? (does this remove the need for the lld change?)

MaskRay added a comment (edited). · Apr 9 2019, 7:10 PM

> What does addr2line (or any other symbolizers, not that I know of others) do in these cases?

addr2line doesn't use binary search. It iterates over all ELF sections, finds the first section containing the address, then iterates over DWARF CUs to find the associated line table. It has the same problem and is computationally more expensive:

% addr2line -e res.out 0x2010f0
/tmp/c/not_used.cpp:2

> How's this compare/contrast with the lld change that's being made to achieve similar results? (does this remove the need for the lld change?)

This patch drops one minor space optimization and changes the arbitrary *ValidCUs.begin() (prefer the smallest CUOffset) to LiveCUs.rbegin() (prefer the largest LowPC). It thus fixes an issue similar to D59553. After this patch, an address in a non-discarded CU will not be incorrectly attributed to a discarded CU even if the discarded CU also covers that address.

The D59553 lld change resolves a relocation to a discarded DW_AT_low_pc to -2. It makes the DW_AT_low_pc=0 scenario impossible for lld-linked executables/DSOs (note: ld.bfd and gold still produce a zero DW_AT_low_pc). It won't affect other overlapping-CU scenarios. We still need a defensive rule, and I think (largest LowPC) is probably better than (smallest CUOffset).

>> What does addr2line (or any other symbolizers, not that I know of others) do in these cases?

> addr2line doesn't use binary search. It iterates all ELF sections, finds the first section containing the address, then iterates DWARF CUs and finds the associated line table. It has the same problem

Any idea why this problem hasn't been much of an issue until now/recently (both your change here, and the LLD change being reviewed)?


MaskRay added a comment (edited). · Apr 10 2019, 7:16 PM

> Any idea why this problem hasn't been much of an issue until now/recently (both your change here, and the LLD change being reviewed)?

The .debug_info sections associated with garbage-collected sections have low_pc/high_pc pairs of the form [0, high_pc) (their DW_AT_low_pc relocations resolve to zero).

ld.bfd sets the default image base to 0x400000 for x86_64, so the first good CU has a DW_AT_low_pc slightly above 0x400000. A bad CU needs a DW_AT_high_pc larger than 0x400000 (4MiB) to overlap with a good CU, and it is not common to have a >4MiB text section with -ffunction-sections -fdata-sections -Wl,--gc-sections.

lld has a different section layout. Its default image base is 0x200000 (2MiB), but it places potentially large sections (.dynsym .rodata .eh_frame) before .text. If an application is complex enough to have >2MiB of .text.*, I'm fairly confident it has a sufficiently large .rodata to move .text far away from the bad CU.

[12] .rodata           PROGBITS        000000000020b950 00b950 006118 00 AMS  0   0 16
...
[15] .eh_frame         PROGBITS        0000000000222330 022330 01a494 00   A  0   0  8
[16] .text             PROGBITS        000000000023d000 03d000 0f6339 00  AX  0   0 16

So the llvm-symbolizer fix is probably most useful for ld.bfd/gold linked executables.

>> What does addr2line (or any other symbolizers, not that I know of others) do in these cases?

>> addr2line doesn't use binary search. It iterates all ELF sections, finds the first section containing the address, then iterates DWARF CUs and finds the associated line table. It has the same problem

> Any idea why this problem hasn't been much of an issue until now/recently (both your change here, and the LLD change being reviewed)?

It's not even a new issue. We've had this problem for years in the PlayStation proprietary linker, because ASLR-enabled executables have a 0 base address. We have special handling of discarded functions to work around this in the linker.

> It's not even a new issue. We've had this problem for years in the PlayStation proprietary linker, because ASLR-enabled executables have a 0 base address.

Ah, right! I forgot about the PIE case.

if (Config->ImageBase) // --image-base= option
  return *Config->ImageBase;
return Config->Pic ? 0 : DefaultImageBase; // otherwise, image base is 0 if -shared or -pie

-static makes it easier to run into this problem, as there are no huge .dynsym/.dynstr sections. lld's different section layout alleviates the problem because it places .rodata and .eh_frame before .text.

What happens if a program is linked from some objects with debug info and some objects without it?

It sounds like this solution might not address that case (in this case, there might be no ambiguity - only one CU would have an address range covering the instruction address - but that instruction address would have no real debug info).

Should we use a different solution? Is GNU addr2line's solution robust, if slow? Or does it still fail in these sort of cases? & if it isn't complete, should we settle for fixing the linker & not try to partially support this in the symbolizer if it necessarily can't be complete in general?

Have you run make check on this patch?

> Have you run make check on this patch?

If you mean the llvm-objdump -exports-trie test failure: that was my fault, and I apologize for it. Or https://reviews.llvm.org/D60376, where I didn't notice that BSD sed's escape-character handling for the y command differs from GNU sed on Linux ("The meaning of a <backslash> followed by any character that is not 'n', a <backslash>, or the delimiter character is undefined."). Thank you for pointing out that there is ninja check besides check-all, check-llvm, etc., but those discussions are unrelated to this revision.

> Thank you for pointing out that there is ninja check besides check-all, check-llvm, etc.

check is just an alias for check-llvm.

> What happens if a program is linked from some objects with debug info and some objects without it?

This revision doesn't change the behavior. For the address ranges with no debug info, the results remain ??:0:0 (unknown).

> It sounds like this solution might not address that case (in this case, there might be no ambiguity - only one CU would have an address range covering the instruction address - but that instruction address would have no real debug info).

The motivation was to mitigate some issues with lld-linked modules. My approach is general and doesn't hard-code anything specific such as zero addresses, and this patch seems somewhat useful for ld.bfd/gold linked modules as well (reasons at https://reviews.llvm.org/D60470#1462246).

> Should we use a different solution? Is GNU addr2line's solution robust, if slow? Or does it still fail in these sort of cases? &

I mentioned before (https://reviews.llvm.org/D60470#1460670) that addr2line's linear search can be faulty, as demonstrated by the following command:

% addr2line -e res.out 0x2010f0
/tmp/c/not_used.cpp:2

Linear search doesn't look more robust, but it is assuredly slow.

> if it isn't complete, should we settle for fixing the linker & not try to partially support this in the symbolizer if it necessarily can't be complete in general?

This revision doesn't add any involved logic. It changes the rule for which range prevails when several ranges overlap. I even deleted an unhelpful "optimization" (a trade-off) that (I think) cluttered the code.

>> What happens if a program is linked from some objects with debug info and some objects without it?

> This revision doesn't change the behavior. For the address ranges with no debug info, the results remain ??:0:0 (unknown).

Ah, sorry, that's not the case I was interested in. I meant a case where two object files are linked together - one has debug info, one does not. They both have a comdat function. The non-debug info version is chosen (creating the [0, X) address range in the debug info from the other object).

If I understand it, this change is to the prioritization/resolution of ambiguities - but similar situations/bugs can arise when there is no ambiguity (because some code has no debug info, so it's not there to create an ambiguity), if I'm understanding/explaining correctly.

> Ah, sorry, that's not the case I was interested in. I meant a case where two object files are linked together - one has debug info, one does not. They both have a comdat function. The non-debug info version is chosen (creating the [0, X) address range in the debug info from the other object).

If the linker makes the comdat function without debug info prevail (assuming the debug info is not in a comdat), the one with debug info is discarded while its debug info is kept. This is essentially the same as the --gc-sections case.

The compile unit with 0 DW_AT_low_pc is actually invalid because the addresses are incorrect. It should be ignored if it overlaps with a valid range when constructing aranges. This patch will do so.

> If I understand it, this change is to the prioritization/resolution of ambiguities - but similar situations/bugs can arise when there is no ambiguity (because some code has no debug info - so it's not there to create an ambiguity). if I'm understanding/explaining correctly.

Yes, it is related to the resolution of ambiguities. The old rule is to pick the CU with the smallest E.CUOffset; the new rule is to pick the CU with the largest (E.LowPC, E.CUOffset) pair.

MaskRay updated this revision to Diff 195506. · Apr 16 2019, 7:16 PM
MaskRay retitled this revision from [DWARF] Prefer larger DW_AT_low_pc when constructing aranges to [DWARF] Change ambiguity resolution from smallest CUOffset to largest (LowPC, CUOffset).
MaskRay edited the summary of this revision.
MaskRay removed subscribers: smeenai, t.p.northover, jhenderson.

Update description


What I'm trying to ask is - it sounds like this is about resolution of an ambiguity between the wrong answer (the zero-based entry due to a comdat function not being chosen, or a gc-section being GC'd - but where the address range, even zero based, overlaps with executable code) and the right one (the one with a non-zero low pc).

So what I'm asking is: what happens if the right one isn't there (because that code wasn't built with debug info) - would we still get the wrong answer?

> So what I'm asking is: what happens if the right one isn't there (because that code wasn't built with debug info) - would we still get the wrong answer?

We still get the wrong answer. Say there is an invalid range [0, 0x201000), and [0x200100, 0x220000) is part of the .text section but isn't associated with any compile unit. Resolving any address in [0x200100, 0x201000) then picks the invalid [0, 0x201000). This is the status quo, and this patch doesn't change it.

This is because the code doesn't know if a compile unit should be ignored or not. This patch just changes priority.

Ping :) Whatever the lld resolution of D59553 is, this patch is at least useful for ld.bfd/gold linked modules, and it simplifies the logic.