Page MenuHomePhabricator

[WIP][LLD][ELF][DebugInfo] Skeleton implementation of removing obsolete debug info.
Needs ReviewPublic

Authored by avl on Thu, Feb 6, 2:17 PM.

Details

Summary

There was a thread at llvm-dev about removing obsolete debug info:

http://lists.llvm.org/pipermail/llvm-dev/2019-September/135473.html

This patch is the first skeleton implementation for this.
It does not produce any results. It includes:

  • DWARFLinker library(extracted from dsymutil tool)
  • --gc-debuginfo option.
  • LLDDwarfLinker.h/.cpp

Other functionality assumed to be added by the set of small
incremental patches.

Diff Detail

Event Timeline

avl created this revision.Thu, Feb 6, 2:17 PM

test/ELF/gc-debuginfo.s works without the code change (after deleting --gc-debuginfo). How many changes do you expect will be needed to make the feature work?

lld/ELF/CMakeLists.txt
57

Is AsmPrinter used?

lld/ELF/LLDDwarfLinker.cpp
71

Capitalize.

Append full stop.

71

// for each object file does not convey more information than the code itself.

Say what this loop is supposed to do.

74

Delete empty line

82

Doesn't make_unique work?

lld/test/ELF/gc-debuginfo.s
3

Delete (invisibile) whitespace

7

- -> --

11

Append # CHECK-NOT: {{.}}

11

Delete trailing space

grimar added a comment.Fri, Feb 7, 2:01 AM

test/ELF/gc-debuginfo.s works without the code change (after deleting --gc-debuginfo). How many changes do you expect will be needed to make the feature work?

(Disclaimer: As a person who is interested in this feature implicitly (Alexey and I do work for the same company))

I wonder if that is really important to split patches that much (this patch seems have no functionality at all?). It would be good to see at least some benefit, but perhaps
it is not that easy to do... Anyways:
Personally I'd probably prefer to see a full set of patches for the feature. A reviewer should be able to apply them and debug (if he/she wants to) probably.

lld/ELF/CMakeLists.txt
38

I'd not add "LLD" prefix. All if sources here belong to "LLD" actually.

lld/ELF/Driver.cpp
2004

Missing a full stop.

lld/ELF/LLDDwarfLinker.cpp
13

Is it a final version of the comment? I'd expect to habe more information here probably about what we exactly do.

lld/test/ELF/gc-debuginfo.s
2

I'd ask to add a comment:

## This is a test for blah blah..
avl added a comment.Fri, Feb 7, 5:23 AM
How many changes do you expect will be needed to make the feature work?

Hi, following is the approximate list of changes, which assumed to match with corresponding patches :

  1. Do not create DWARFContext inplace. Use ObjFile<ELFT>::dwarf and "llvm::call_once(initDwarfLine, [this]() { initializeDwarf(); });" instead.
  1. Patch DebugInfoDWARF library to have a possibility to replace the error handler. Someone could already pass own error handler for DWARFContext. But there are places where WithColor::error() is used directly:
void DWARFUnit::extractDIEsIfNeeded(bool CUDieOnly) {
if (Error e = tryExtractDIEsIfNeeded(CUDieOnly))
  WithColor::error() << toString(std::move(e));
}

Thus, it is necessary to have a possibility to replace ErrorHandler here.

  1. Replace LLDDwarfObj with DWARFObjInMemory implementation while creating DWARFContext:
for (InputFile *file : objectFiles) {
  std::unique_ptr<ObjectFile> ObjectFile = createELFObjectFile(file.mb);

  std::unique_ptr<DWARFContext> DWARFContext::create (ObjectFile, nullptr, ErrorHandler));
}
  1. Create an implementation for lld/ELF/LLDDwarfLinker.cpp::ObjFileAddressMap.
  1. Create special kind of section - "DebugInputSection : public InputSection". To keep the result of debug info linking and to handle relocations.
  1. add LLD`s implementation for DWARFLinker/DWARFLinker.h::DwarfEmitter, which would write the output of DWARFLinker into DebugInputSection.
avl marked 13 inline comments as done.Fri, Feb 7, 5:43 AM
avl added inline comments.
lld/ELF/CMakeLists.txt
38

The idea is that the file name matches the class name. The class name has LLD to be different from DWARFLinker from DWARFLinker library. If it looks redundant, then I would delete this prefix.

57

It is used because LLDDwarfLinker.cpp includes DWARFLinker.h, which includes "llvm/CodeGen/Die.h", which needs "llvm::DIEAbbrev::Profile(llvm::FoldingSetNodeID&)" which is in AsmPrinter.

probably it could be refactored out later.

lld/ELF/LLDDwarfLinker.cpp
13

Ok, I would put more detailed comment.

avl added a comment.Fri, Feb 7, 6:03 AM

I wonder if that is really important to split patches that much (this patch seems have no functionality at all?). It would be good to see at least some benefit, but perhaps
it is not that easy to do... Anyways:
Personally I'd probably prefer to see a full set of patches for the feature. A reviewer should be able to apply them and debug (if he/she wants to) probably.

Hi George,

I follow the idea is that small patches are easier to review and easier to integrate. That concrete patch does not have a functionality but it includes DWARLinker library, creates option and creates placeholders for the functionality. Which looks like a small amount of things that could be reviewed.

Creating full set of patches which could be any time applied to the upstream and to the each other is quite time consuming thing because of time for synchronization/merging/rebasing. As an alternative we could discuss main points of future patches[0]. And iteratively apply patches step by step. If that plan is not good - then OK, I would start to work on preparing full set of patches for that feature.

[0] following is the approximate list of changes, which assumed to match with corresponding patches :

  1. Do not create DWARFContext inplace. Use ObjFile<ELFT>::dwarf and "llvm::call_once(initDwarfLine, [this]() { initializeDwarf(); });" instead.
  1. Patch DebugInfoDWARF library to have a possibility to replace the error handler. Someone could already pass own error handler for DWARFContext. But there are places where WithColor::error() is used directly:
void DWARFUnit::extractDIEsIfNeeded(bool CUDieOnly) {
if (Error e = tryExtractDIEsIfNeeded(CUDieOnly))
  WithColor::error() << toString(std::move(e));
}

Thus, it is necessary to have a possibility to replace ErrorHandler here.

  1. Replace LLDDwarfObj with DWARFObjInMemory implementation while creating DWARFContext:
for (InputFile *file : objectFiles) {
  std::unique_ptr<ObjectFile> ObjectFile = createELFObjectFile(file.mb);

  std::unique_ptr<DWARFContext> DWARFContext::create (ObjectFile, nullptr, ErrorHandler));
}
  1. Create an implementation for lld/ELF/LLDDwarfLinker.cpp::ObjFileAddressMap.
  1. Create special kind of section - "DebugInputSection : public InputSection". To keep the result of debug info linking and to handle relocations.
  1. add LLD`s implementation for DWARFLinker/DWARFLinker.h::DwarfEmitter, which would write the output of DWARFLinker into DebugInputSection.
avl updated this revision to Diff 243205.Fri, Feb 7, 9:52 AM

addressed comments.

avl retitled this revision from [LLD][ELF][DebugInfo] Skeleton implementation of removing obsolete debug info. to [WIP][LLD][ELF][DebugInfo] Skeleton implementation of removing obsolete debug info..Fri, Feb 14, 11:48 PM

marked the patch as WIP while preparing full set of patches for the feature.

A few comments from me.

lld/ELF/Config.h
158

Unsorted (shold be placed before gcSections + should be gcDebugInfo I think.

lld/ELF/Driver.cpp
2006

Looking at this piece I think that 915, 916 lines should just be:

config->gcDebuginfo = config->gcSections &&
      args.hasFlag(OPT_gc_debuginfo, OPT_no_gc_debuginfo, false);

Perhaps worth to report an error when we have --gc-debug info, but not --gc-sections, but I am not sure here.

lld/ELF/LLDDwarfLinker.cpp
66

We usually do not do such things in LLD. I.e. do not assert on options when the code is trivial like in your case:

if (config->gcSections && config->gcDebuginfo)
  linkDebugInfo<ELFT>();
83

Please do not use new. Is should be something like:

dwarfContexts.push_back(std::make_unique...

88

It would probably be a bit more natural to:

std::unique_ptr<DwarfLinkerObjFile> Obj = std::make_unique<DwarfLinkerObjFile>(
              file->getName(), dwarfObjFile, emptyWarnings);
Obj->Addresses = ;
debugInfoLinker.addObjectFile(*Obj);
objectsForLinking.push_back(std::move(Obj));
93

Addresses = std::make_unique<ObjFileAddressMap>();

lld/ELF/LLDDwarfLinker.h
17

I am not an expert in English, though would rewrite to something like:

// This function is used to remove an unused parts of debug data
// which belongs to garbage collected sections.

(I guess othes might have a better variant though).

lld/test/ELF/gc-debuginfo.s
7

No need to use that much asm:

llvm-mc /dev/null -o %t.o -filetype=obj -triple=aarch64-pc-linux

15

I'd add a test for -r --gc-debuginfo error too since you have this code.

grimar added a comment.EditedSat, Feb 15, 6:27 AM
In D74169#1863848, @avl wrote:

Creating full set of patches which could be any time applied to the upstream and to the each other is quite time consuming thing because of time for synchronization/merging/rebasing.

I do not think them needed to be in state "any time applied to the upstream".
"any time" is indeed too much.

Perhaps we could do something like the following: create a single large patch with all of features which can be applied. And upload it somewhere here as [not for commit].
Then reviewers should be able to review this single patch, but in case of questions they should be able to apply the huge patch to see how the whole feature works.
In this case you'll need to update one huge patch from time to time, e.g. rebase it from time to time after commits of smaller ones. And it should be much simpler.
How does it sound? (the same question for others).

avl added a comment.Sun, Feb 16, 1:42 PM

That is OK. The only thing is that I planned to integrate small fixes as soon as they are ready. So I do not have a full implementation at the moment. I will create a single large patch with all of the features sometime later as soon as it would be fully ready.

MaskRay added a comment.EditedSun, Feb 16, 2:22 PM

I agree with

if the additional complexity is not too much, and the new code is nicely isolated from existing code. I think I agree with you that linker is perhaps the best place to drop dead DWARF info.

(One thing I plan to work on is .debug_names accelerator table, which is similar to but more power than .gdb_index.)

On one side, I've heard thoughts that: (1) if we use -gsplit-dwarf, is .debug* garbage collection still valuable? (2) DWARFLinker can significantly increase link time. Can we move it to a post-link tool (dsymutil but for ELF)?
On the other side, I think this approach may be better than D54747 (reverted by rL360955) and may address the problem it tried to mitigate. Given the savings, we may bend our mind a bit and give more tolerance even if the additional complexity is more than not too much.

I am reluctant to let individual patches in before we can see the benefits, because they could easily turn out to be insufficient and cause churns. There is still possibility that we find --gc-debuginfo not suitable on the linker side and may want to do it in a post-link tool.