This is an archive of the discontinued LLVM Phabricator instance.

[GC] Don't crash while processing Discarded sections
ClosedPublic

Authored by davide on Sep 19 2016, 5:13 PM.

Details

Summary

When we mark all the reachable sections, we might end up with a InputSection<ELFT>::Discarded section in the queue. This section has no successor, we can't get the object associated, so pretty much every operation we want to run on it it's pointless and cause a crash. Detect this case and bail out early.

Diff Detail

Repository
rL LLVM

Event Timeline

davide updated this revision to Diff 71894.Sep 19 2016, 5:13 PM
davide retitled this revision from to [GC] Don't crash while processing Discarded sections.
davide updated this object.
davide added a reviewer: ruiu.
davide added a subscriber: llvm-commits.
ruiu edited edge metadata.Sep 19 2016, 5:53 PM

Sections are discarded mainly because they are comdat sections and deduplicated. Comdat sections are discarded as a group, so I believe there's no reference pointing to discarded sections from outside of the group. That means I think we shouldn't see discarded sections in the mark-sweep collector here. What am I missing?

In D24750#547093, @ruiu wrote:

Sections are discarded mainly because they are comdat sections and deduplicated. Comdat sections are discarded as a group, so I believe there's no reference pointing to discarded sections from outside of the group. That means I think we shouldn't see discarded sections in the mark-sweep collector here. What am I missing?

I'm not very familiar with the algorihtm, so I may miss something (or at least, not as familiar as the person who wrote it :)), but, it's not entirely clear to me what prevents &InputSection<ELFT>::Discarded to reach the mark&sweep algorithm.
In initializeSections() we set Sections[I] == &InputSection<ELFT>::Discarded and never change it. In the Enqueue function we might end up enqueuing a DiscardedSection if ResolvedReloc refers to it.

To convince myself this was actually possible in the current code, I put the following assertion in Enqueue:

if (InputSection<ELFT> *S = dyn_cast<InputSection<ELFT>>(R.Sec)) {
  if (S == &InputSection<ELFT>::Discarded)
    llvm_unreachable("foo");
  Q.push_back(S);
}

So, are you saying that a ResolvedReloc which points to &InputSection<ELFT>::Discarded is not possible? If so, the current code violates this invariant and needs to be fixed.

ruiu added a comment.Sep 20 2016, 3:32 PM

Quote from https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter7-26.html:

References to the sections comprising a group from sections outside of the group must be made through symbol table entries with STB_GLOBAL or STB_WEAK binding and section index SHN_UNDEF.

comdat.s contains relocations directly referring a section comprising a comdat group from outside of the group, so it is an invalid object file as per the ELF spec. Even with this change, comdat.s wouldn't be linked properly because it has a dangling relocation that points to a dedup'ed (and removed) section.

So even though we might want to add an error check here (I'm not sure if we really want it though), I think ignoring the error here is not a right thing.

So, lld is currently crashing on this test. I personally consider it a bug. I'd be happy to turn this into an error, if you don't mind.

ruiu added a comment.Sep 20 2016, 3:43 PM

Yeah, I think we should check this error. We want to report the error even without --gc-sections.

This is going to be really fun. I was very puzzled because I wasn't able to find out why this was failing. So, I tried to debug with VS on Windows and realized I wasn't able to reproduce the crash. I then tried with gcc 6.1 on Linux and also noticed it doesn't crash there. So this only happens with clang from svn (1 month ago or such). I'll update clang to ToT to see if I can reproduce, then probably open a bug if it still happens. Meh.

The reason why gcc doesn't crash (and presumably also MSVC) is that this code is hitting UB.
getFile() == nullptr, but gcc decides to not crash, instead it returns a bogus pointer for the object pointed by getFile()->getObj().

diff --git a/ELF/MarkLive.cpp b/ELF/MarkLive.cpp
index 4965377..0c6d510 100644
--- a/ELF/MarkLive.cpp
+++ b/ELF/MarkLive.cpp
@@ -81,6 +81,14 @@ static ResolvedReloc<ELFT> resolveReloc(InputSectionBase<ELFT> &Sec,
 template <class ELFT>
 static void forEachSuccessor(InputSection<ELFT> &Sec,
                              std::function<void(ResolvedReloc<ELFT>)> Fn) {
+
+  if (&Sec == &InputSection<ELFT>::Discarded) {
+    if (Sec.getFile() == nullptr) {
+      ELFFile<ELFT> &Obj = Sec.getFile()->getObj();
+      llvm::errs() << "Still alive: " << &Obj << "\n";
+    }
+  }
+
   ELFFile<ELFT> &Obj = Sec.getFile()->getObj();
   for (const typename ELFT::Shdr *RelSec : Sec.RelocSections) {
     if (RelSec->sh_type == SHT_RELA) {
$ /home/davide/work/llvm/build-gcc/./bin/ld.lld -shared /home/davide/work/llvm/build-gcc/tools/lld/test/ELF/Output/comdat.s.tmp.o /home/davide/work/llvm/build-gcc/tools/lld/test/ELF/Output/comdat.s.tmp.o /home/davide/work/llvm/build-gcc/tools/lld/test/ELF/Output/comdat.s.tmp2.o -o /home/davide/work/llvm/build-gcc/tools/lld/test/ELF/Output/comdat.s.tmp --gc-sections
Still alive: 0x48

Circling back to this. I implemented a diagnostic in https://reviews.llvm.org/D24837

I think we should still skip sections here. They're not compliant to the ELF spec but the comment in Symbols.cpp suggests we can't do otherwise.

// According to the ELF spec reference to a local symbol from outside
// the group are not allowed. Unfortunately .eh_frame breaks that rule
// and must be treated specially. For now we just replace the symbol with
// 0.
if (SC == &InputSection<ELFT>::Discarded)
  return 0;

Already other parts of this algorithm skip the section if IS == &InputSection<ELFT>::Discarded and the reasons why we didn't notice in the non --gc-sections case I think is that markLive() runs earlier than scanRelocs() , does resolve relocation itself and there's no special handling in there for ::Discarded.

That said, I think that a better place to put this check would be the Enqueue lambda. That goes without saying, I might be mistaken, so, comments welcome.

ruiu accepted this revision.Sep 22 2016, 12:46 PM
ruiu edited edge metadata.

LGTM with the following changes.

ELF/MarkLive.cpp
84 ↗(On Diff #71894)

Instead of describing what we are doing here, I'd describe why we need this. Please mention that ideally this shouldn't happen but unfortunately we need this because a relocation may directly point to a dedup'ed comdat section even though the ELF spec doesn't allow such relocations.

test/ELF/comdat.s
17 ↗(On Diff #71894)

Remove one extra blank line.

This revision is now accepted and ready to land.Sep 22 2016, 12:46 PM
This revision was automatically updated to reflect the committed changes.