This is an archive of the discontinued LLVM Phabricator instance.

[ELF] .gdb_index: fix CuOff when a .debug_info section contains more than 1 CU
ClosedPublic

Authored by MaskRay on Nov 9 2018, 4:03 PM.

Details

Summary

Idx passed to readPubNamesAndTypes was an index into Chunks, not an
index into the CU list. This would be incorrect if some .debug_info
section contained more than 1 CU.

In real world, glibc Scrt1.o is a partial link of start.os abi-note.o init.o and contains 2 CUs in debug builds.
Without this patch, any application linking such Scrt1.o would have invalid .gdb_index
The issue could be demonstrated by:

(gdb) py print(gdb.lookup_global_symbol('main'))
None

Event Timeline

MaskRay created this revision.Nov 9 2018, 4:03 PM
MaskRay updated this revision to Diff 173463.Nov 9 2018, 4:04 PM
MaskRay retitled this revision from [ELF] .gdb_index: fix CuOff when a .debug_info section more than 1 CU This fixes a .gdb_index issue linking Scrt1.o with debug information, as Scrt1.o has two DW_tag_compile_unit. to [ELF] .gdb_index: fix CuOff when a .debug_info section more than 1 CU.
MaskRay edited the summary of this revision. (Show Details)
MaskRay added a reviewer: ruiu.

.

I think this would fail at least in the following example:

  • Compile one file with gnu-pubnames and one withou
  • ld -r the two together (non-pubnames first, then pubnames one)
  • link those

I believe that'll produce a different (& incorrect) gdb-index compared to skipping the -r step and linking the two objects directly, maybe? (I haven't looked at the code carefully/completely enough to understand the CU counting aspect - but suffice to say, the linker shouldn't assume there are as many gnu_pubnames contributions/sets as there are CUs - or that they're in the same order, for that matter)

Probably the right solution is to store the Offset (DWARFDebugPubTable::Set::Offset) that describes the CU from the pubnames - then lookup that offset to determine the CU index - possibly by a search within a sorted list of CU offsets, built by a walk of the CUs in the input)

MaskRay updated this revision to Diff 173491.Nov 9 2018, 11:46 PM

Use debug_info_offset to index into CUs

MaskRay updated this revision to Diff 173492.Nov 9 2018, 11:46 PM
MaskRay retitled this revision from [ELF] .gdb_index: fix CuOff when a .debug_info section more than 1 CU to [ELF] .gdb_index: fix CuOff when a .debug_info section contains more than 1 CU.
MaskRay edited the summary of this revision. (Show Details)
MaskRay removed a subscriber: dblaikie.

Description

MaskRay updated this revision to Diff 173521.Nov 10 2018, 11:28 AM

Add test gdb-index-multiple-cu.s

ruiu added inline comments.Nov 11 2018, 10:21 PM
ELF/SyntheticSections.cpp
2425–2426

Not sure if I understand this comment, because this comment explains only the tricky part of the code but doesn't explain the normal condition. I'd just explain what I represents and don't focus too much on the detail.

2428

Can you use a regular for loop instead of std::lower_bound? I think std::lower_bound with a lambda is not very easy to read because because a lambda of the third argument takes two arguments, even though Off is actually just Set.Offset in this case.

MaskRay updated this revision to Diff 173617.Nov 11 2018, 10:34 PM

Address comments

MaskRay marked 2 inline comments as done.Nov 11 2018, 10:36 PM

The thing we've done in readPubNamesAndTypes (try getting the correct CuIndexInObject) is a plus. The mechanism is not clearly specified and I think gold only handles the first .debug_gnu_pubnames set and ignores the rest.

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;hb=160fc977b6f33e5ef51f6c1f3cdcb57965c522c8;f=gold/gdb-index.cc#l864

ruiu accepted this revision.Nov 13 2018, 12:34 AM

LGTM

This revision is now accepted and ready to land.Nov 13 2018, 12:34 AM
This revision was automatically updated to reflect the committed changes.

(ah, I see I hadn't posted my first feedback earlier (written but unsubmitted) - oops, sorry about that)

ELF/SyntheticSections.cpp
2431

CUs[*].CuOff is necessarily increasing, because it's the offset as the CUs are read from the input, but there's no reason to believe that Set.Offset is necessarily increasing - pubnames could be in any order relative to the CUs.

lld/trunk/ELF/SyntheticSections.cpp
2432 ↗(On Diff #173818)

Could you fix this? This assumption seems pretty subtle/likely to break valid DWARF in surprising/hard to diagnose ways.

Moving the declaration of I into the for(Set) loop (ie: putting it just before this while loop) should suffice - and a test case with 2 pubnames in the reverse order to CUs would demonstrate the fix?

(would be nice to do a binary search rather than a linear one on the CU offsets, since they are naturally sorted by offset)

MaskRay added inline comments.Nov 13 2018, 10:06 AM
lld/trunk/ELF/SyntheticSections.cpp
2432 ↗(On Diff #173818)

I used std::lower_bound(CUs.begin(), CUs.end(), ...) - CUs.begin()) in a former version but @ruiu thought it was too complicated...

dblaikie added inline comments.Nov 13 2018, 10:11 AM
lld/trunk/ELF/SyntheticSections.cpp
2432 ↗(On Diff #173818)

*nod* std::lower_bound is a somewhat awkward API, but the general goal of using a binary search still seems achievable with readable/not-so-awkward code.

Having a convenient binary search API that takes a lambda/functor without also taking the comparison object, instead relying on the functor to have that parameter hardcoded/captured, seems useful. Not sure what to name it (naming it llvm::binary_search would be likely to get mixed up with the interface to std::binary_search, which has the similar API quirks to std::lower_bound, unfortunately) though.