This is an archive of the discontinued LLVM Phabricator instance.

[ELF] --gdb-index: skip SHF_GROUP .debug_info
ClosedPublic

Authored by MaskRay on Aug 7 2020, 7:09 PM.

Details

Summary

-gdwarf-5 -fdebug-types-section may produce multiple .debug_info sections. All
except one are type units (.debug_types before DWARF v5). When constructing
.gdb_index, we should ignore these type units. We use a simple heuristic: the
compile unit does not have the SHF_GROUP flag. (This needs to be revisited if
people place compile unit .debug_info in COMDAT groups.)

This issue manifests as a data race: because an object file may have multiple
.debug_info sections, we may concurrently construct LLDDwarfObj for the same
file in multiple threads. The threads may access InputSectionBase::data()
concurrently on the same input section. InputSectionBase::data() does a lazy
uncompress() and rewrites the member variable rawData. A thread running zlib
inflate() (transitively called by uncompress()) on a buffer with rawData
tampered by another thread may fail with uncompress failed: zlib error: Z_DATA_ERROR.

Even if no data race occurred in an optimistic run, if there are N .debug_info,
one CU entry and its address ranges will be replicated N times. The result
.gdb_index can be much larger than a correct one.

The new test gdb-index-dwarf5-type-unit.s actually has two compile units. This
cannot be produced with regular approaches (it can be produced with -r
--unique). This is used to demonstrate that the .gdb_index construction code
only considers the last non-SHF_GROUP .debug_info

Diff Detail

Event Timeline

MaskRay created this revision.Aug 7 2020, 7:09 PM
MaskRay requested review of this revision.Aug 7 2020, 7:09 PM
MaskRay edited the summary of this revision. (Show Details)Aug 7 2020, 7:13 PM
MaskRay updated this revision to Diff 284114.Aug 7 2020, 9:36 PM
MaskRay retitled this revision from [WIP][ELF] --gdb-index: skip SHF_GROUP .debug_info to [ELF] --gdb-index: skip SHF_GROUP .debug_info.
MaskRay edited the summary of this revision. (Show Details)

Ready

MaskRay edited the summary of this revision. (Show Details)Aug 7 2020, 10:29 PM
MaskRay updated this revision to Diff 284115.Aug 7 2020, 10:36 PM

Add REQUIRES: zlib

Reword a comment.

MaskRay edited the summary of this revision. (Show Details)Aug 8 2020, 5:21 PM
grimar added a comment.EditedAug 10 2020, 2:56 AM

I can't say about the heuristic used and how well it works,
but have a few comments about the code.

lld/ELF/DWARF.cpp
35

So, you need this to read the flags from the sh_flags field of a section header in the initial object.
And you can't just use sec->flags, because we drop the SHF_GROUP flag much earlier.

I do not know a simpler way to access initial flags. Though probably it worth adding an assert to check that the number
of sections in obj->getObj().sections() == the number of them in obj->getSections() and a comment
about SHT_GROUP flag? That way we will not forget to simplify this loop back later when will stop relying on SHF_GROUP flag.

lld/ELF/SyntheticSections.cpp
2850–2851

It doesn't seem the getDebugInfoSections helper and its name makes much sence now with this change?

  1. It is not clear what it returns anymore: it might return either CU or TU .debug_infos.
  2. You are not really using the result, since dobj.getInfoSection() is called instead to get debug info sections.
  3. It is just needed now to find the number of files with .debug_info sections.

So I think it should probably just be inlined and/or rewritten to collect a file set mentioned probably?

MaskRay updated this revision to Diff 284510.Aug 10 2020, 2:53 PM
MaskRay marked 2 inline comments as done.

address comments

MaskRay added inline comments.Aug 10 2020, 7:45 PM
lld/test/ELF/gdb-index-dwarf5-type-unit.s
52

@dblaikie Does this test achieve a good minimality/robustness/usefulness in your opinion?

dblaikie added inline comments.Aug 11 2020, 2:16 PM
lld/test/ELF/gdb-index-dwarf5-type-unit.s
52

I think so?

If I'm reading it correctly it's a type unit in a comdat, a compile unit, another compile unit in a distinct fragment of debug_info, and then another type unit in a comdat?

The existence of two compilation units in separate sections with the same name isn't a use case I know of any producer producing today - but if it helps test out the generality of this functionality, it sounds OK to me.

MaskRay added inline comments.Aug 11 2020, 2:40 PM
lld/test/ELF/gdb-index-dwarf5-type-unit.s
52

Thanks for confirmation.

The existence of two compilation units in separate sections with the same name isn't a use case I know of any producer producing today - but if it helps test out the generality of this functionality, it sounds OK to me.

This is super rare. ld -r --unique can create the case. I just want to make the behavior clearer/concise without adding another test.

dblaikie added inline comments.Aug 11 2020, 2:41 PM
lld/test/ELF/gdb-index-dwarf5-type-unit.s
52

Ah, sounds good!

grimar added inline comments.Aug 12 2020, 3:28 AM
lld/ELF/SyntheticSections.cpp
2885

Previously, parallelForEachN loop assumed that there is a one .debug_info section in an object.
Now it is not true and parallelizing for sections by itself (instead of doing it for files) doesn't look optimal.

Also, you do not need to build a list sections, because you are not using them (you call dobj.getInfoSection(); anyways), so
you only need a list of files to work with.

With that we can have a simpler code:

  SetVector<ObjFile<ELFT> *> files;
  for (InputSectionBase *s : inputSections) {
    InputSection *isec = dyn_cast<InputSection>(s);
    if (!isec)
      continue;
    // .debug_gnu_pub{names,types} are useless in executables.
    // They are present in input object files solely for creating
    // a .gdb_index. So we can remove them from the output.
    if (s->name == ".debug_gnu_pubnames" || s->name == ".debug_gnu_pubtypes")
      s->markDead();
    else if (isec->name == ".debug_info")
      files.insert(isec->getFile<ELFT>());
  }

  std::vector<GdbChunk> chunks(files.size());
  std::vector<std::vector<NameAttrEntry>> nameAttrs(files.size());

  parallelForEachN(0, files.size(), [&](size_t i) {
    // To keep memory usage low, we don't want to keep cached DWARFContext, so
    // avoid getDwarf() here.
    DWARFContext dwarf(std::make_unique<LLDDwarfObj<ELFT>>(files[i]));
    auto &dobj = static_cast<const LLDDwarfObj<ELFT> &>(dwarf.getDWARFObj());
...
MaskRay updated this revision to Diff 285252.Aug 12 2020, 10:03 PM
MaskRay marked an inline comment as done.

address comments

grimar accepted this revision.Aug 13 2020, 2:02 AM

LGTM I think.

This revision is now accepted and ready to land.Aug 13 2020, 2:02 AM
This revision was automatically updated to reflect the committed changes.
MaskRay added a subscriber: hans.Aug 25 2020, 2:47 PM

@hans This patch can be cleanly cherry-picked to release/11.x It addresses a pain for DWARF v5 -fdebug-types-section users.

lld/ELF/SyntheticSections.cpp
2885

Forgot to push this comment:

Thanks! I'll make a small change: SetVector<InputFile *>, just so that we don't need to instantiate nearly identical code for 4 instantiations of ELFT.

hans added a comment.Aug 26 2020, 6:36 AM

@hans This patch can be cleanly cherry-picked to release/11.x It addresses a pain for DWARF v5 -fdebug-types-section users.

Pushed as 21d01a67c9613932053dd89c9957782f86e0c93f. Please let me know if there are any follow-ups.