This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Limit 8b259fe573e1 to dSYMs
ClosedPublic

Authored by JDevlieghere on Jan 9 2023, 3:10 PM.

Details

Summary

Limit trusting the aranges accelerator table to dSYMs only, rather than any object file containing debug info.
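As a rough illustration of the shape of such a change (a base-class hook on ObjectFile plus a Mach-O override), here is a minimal sketch with simplified stand-in types; the method name CanTrustAddressRanges and the member layout are assumptions made for this sketch, not necessarily what the committed diff uses:

```cpp
#include <cstdint>

// Value of MH_DSYM from llvm/BinaryFormat/MachO.h, reproduced here only so
// the sketch stays self-contained.
constexpr uint32_t kMH_DSYM = 0xAu;

// Simplified stand-ins for the LLDB classes the diff touches.
struct MachOHeader {
  uint32_t filetype = 0;
};

class ObjectFile {
public:
  virtual ~ObjectFile() = default;
  // Whether the debug_aranges table in this object file can be trusted to be
  // complete, allowing the DWARF reader to skip its per-CU verification.
  virtual bool CanTrustAddressRanges() const { return false; }
};

class ObjectFileMachO : public ObjectFile {
public:
  explicit ObjectFileMachO(MachOHeader header) : m_header(header) {}

  bool CanTrustAddressRanges() const override {
    // dsymutil regenerates a complete debug_aranges when it links the dSYM,
    // so only MH_DSYM companion files report the table as trustworthy.
    return m_header.filetype == kMH_DSYM;
  }

private:
  MachOHeader m_header;
};
```

Defaulting the base class to "not trusted" keeps every other object file format on the existing verification path until someone deliberately opts it in.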

Diff Detail

Event Timeline

JDevlieghere created this revision.Jan 9 2023, 3:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 9 2023, 3:10 PM
JDevlieghere requested review of this revision.Jan 9 2023, 3:10 PM
JDevlieghere updated this revision to Diff 487577.
JDevlieghere updated this revision to Diff 487578.

typo

Should this be true for a fully linked ELF executable, too?

lldb/include/lldb/Symbol/ObjectFile.h
685

Doxygen comment?

Should this be true for a fully linked ELF executable, too?

Sounds reasonable, but adding @labath and @DavidSpickett to confirm. This is trivial to extend later.

Address @aprantl's feedback

aprantl accepted this revision.Jan 9 2023, 3:20 PM
aprantl added inline comments.
lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
6098

Can you also add a comment here explaining why we only trust a dSYM?

This revision is now accepted and ready to land.Jan 9 2023, 3:20 PM

Yeah, I'd add a comment explaining that this returns true if the DWARF debug_aranges accelerator table can be trusted by lldb to be complete, which it is for dsymutil-created dSYM files. Otherwise lldb will confirm that every CU mentioned in debug_info is included in debug_aranges, and for any that are not, it will scan that compile unit's debug_info to make sure there aren't any subprograms that were omitted. Anyway, LGTM.
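To make that fallback concrete, here is a small sketch of the verification path described above, using invented stand-in types rather than LLDB's real DWARF classes (BuildCUAranges, the DWARFUnit fields, etc. are hypothetical):

```cpp
#include <cstdint>
#include <vector>

// Invented, simplified stand-ins; not LLDB's real SymbolFileDWARF interfaces.
struct AddressRange {
  uint64_t lo = 0, hi = 0;
};

struct DWARFUnit {
  bool covered_by_aranges = false;           // mentioned in debug_aranges?
  std::vector<AddressRange> function_ranges; // what a DIE scan would recover
};

// When the table cannot be trusted, confirm every CU in debug_info appears in
// debug_aranges, and for any that does not, fall back to scanning that CU for
// subprograms and adding their ranges by hand.
void BuildCUAranges(const std::vector<DWARFUnit> &units,
                    std::vector<AddressRange> &table, bool trust_aranges) {
  if (trust_aranges)
    return; // e.g. a dsymutil-produced dSYM: the table is already complete.

  for (const DWARFUnit &cu : units) {
    if (cu.covered_by_aranges)
      continue;
    // Expensive fallback; the real code would parse the CU's DIEs here.
    table.insert(table.end(), cu.function_ranges.begin(),
                 cu.function_ranges.end());
  }
}
```

Trusting the table, as this patch does for dSYMs, skips the loop entirely, which is where the savings come from.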

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJan 9 2023, 3:38 PM

Should this be true for a fully linked ELF executable, too?

Sounds reasonable, but adding @labath and @DavidSpickett to confirm. This is trivial to extend later.

It sounds like this depends on whoever produced the file, not on the file format.

Should this be true for a fully linked ELF executable, too?

Sounds reasonable, but adding @labath and @DavidSpickett to confirm. This is trivial to extend later.

It sounds like this depends on whoever produced the file, not on the file format.

It's not even the producer (compiler/assembler) - it's the linker (or whatever links the DWARF) that needs to generate the table, and we can't detect that from the debug info.

We need (1) a guarantee that every compile unit which generated code in the debug_info has an entry in debug_aranges, (2) a guarantee that all of the address ranges that come from the CU are described, and (3) a guarantee that the ranges in debug_aranges are unique, i.e. that no address will map to multiple compile units. The description in the DWARF v5 spec includes:

By scanning the table, a debugger can quickly decide which compilation unit to look in to find the debugging information for an object that has a given address.

If the range of addresses covered by the text and/or data of a compilation unit is not contiguous, then there may be multiple address range descriptors for that compilation unit.

(I dislike it when the standard says "there MAY be multiple address range descriptors" -- does this mean that if I have noncontiguous CUs A and B interleaved in the final binary, their debug_aranges entries can overlap?)

There's no guarantee of (1) just because a debug_aranges table is present, but maybe we can simply assume that any producer emitting debug_aranges has done so comprehensively. I can't imagine why it wouldn't be that way.

The point of this patch, of course, is to skip the verification of (1) - that every CU that generated functions in the final binary has a debug_aranges entry. It turns out we have compile units in debug_info that don't emit any functions, and lldb would see those as missing from debug_aranges and iterate over the DIEs looking for subprograms.

Maybe lldb should simply trust that if debug_aranges exists, all of 1-3 are true, and revisit it when/if we get bug reports of some random toolchain in the world violating that assumption.
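As an aside, guarantee (3) from the comment above is cheap to spot-check once the table has been parsed; a minimal sketch, with types invented for illustration:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical parsed debug_aranges entry: a half-open [lo, hi) range owned
// by the compile unit at cu_offset.
struct ArangeEntry {
  uint64_t lo = 0, hi = 0;
  uint64_t cu_offset = 0;
};

// Returns true if no address falls into more than one entry, i.e. an address
// lookup always resolves to a single compile unit.
bool RangesAreUnique(std::vector<ArangeEntry> entries) {
  std::sort(entries.begin(), entries.end(),
            [](const ArangeEntry &a, const ArangeEntry &b) {
              return a.lo < b.lo;
            });
  for (size_t i = 1; i < entries.size(); ++i)
    if (entries[i].lo < entries[i - 1].hi)
      return false; // overlapping ranges would make the lookup ambiguous
  return true;
}
```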

Just to clarify: I was responding to the question about the ELF file usage, not questioning the overall approach. Given that MachO has a special code for dSYM files, I think it makes sense to use it.

By scanning the table, a debugger can quickly decide which compilation unit to look in to find the debugging information for an object that has a given address.

If the range of addresses covered by the text and/or data of a compilation unit is not contiguous, then there may be multiple address range descriptors for that compilation unit.

(I dislike it when the standard says "there MAY be multiple address range descriptors" -- does this mean that if I have noncontiguous CUs A and B interleaved in the final binary, their debug_aranges entries can overlap?)

I see what you mean, but I wouldn't really read it that way, and I hope nobody produces files like that.

There's no guarantee of (1) just because a debug_aranges table is present, but maybe we can simply assume that any producer emitting debug_aranges has done so comprehensively. I can't imagine why it wouldn't be that way.

We can't do that because, as I alluded to in the other comment, the aranges section can be produced by the compiler, and there's no guarantee that all CUs will be built with the same flags (so some may be missing that section). In the ELF world a linker will just concatenate those sections without trying to fill in the blanks.

A normal binary includes some code from the system runtime (which is built by the OS vendor not the user). Typically, that code will not have debug info, but sometimes it does, and in that case it's pretty much guaranteed that it will be built using different compiler/flags than the user code.