When we mark all the reachable sections, we might end up with a InputSection<ELFT>::Discarded section in the queue. This section has no successor, we can't get the object associated, so pretty much every operation we want to run on it it's pointless and cause a crash. Detect this case and bail out early.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Sections are discarded mainly because they are comdat sections and deduplicated. Comdat sections are discarded as a group, so I believe there's no reference pointing to discarded sections from outside of the group. That means I think we shouldn't see discarded sections in the mark-sweep collector here. What am I missing?
I'm not very familiar with the algorihtm, so I may miss something (or at least, not as familiar as the person who wrote it :)), but, it's not entirely clear to me what prevents &InputSection<ELFT>::Discarded to reach the mark&sweep algorithm.
In initializeSections() we set Sections[I] == &InputSection<ELFT>::Discarded and never change it. In the Enqueue function we might end up enqueuing a DiscardedSection if ResolvedReloc refers to it.
To convince myself this was actually possible in the current code, I put the following assertion in Enqueue:
if (InputSection<ELFT> *S = dyn_cast<InputSection<ELFT>>(R.Sec)) { if (S == &InputSection<ELFT>::Discarded) llvm_unreachable("foo"); Q.push_back(S); }
So, are you saying that a ResolvedReloc which points to &InputSection<ELFT>::Discarded is not possible? If so, the current code violates this invariant and needs to be fixed.
Quote from https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter7-26.html:
References to the sections comprising a group from sections outside of the group must be made through symbol table entries with STB_GLOBAL or STB_WEAK binding and section index SHN_UNDEF.
comdat.s contains relocations directly referring a section comprising a comdat group from outside of the group, so it is an invalid object file as per the ELF spec. Even with this change, comdat.s wouldn't be linked properly because it has a dangling relocation that points to a dedup'ed (and removed) section.
So even though we might want to add an error check here (I'm not sure if we really want it though), I think ignoring the error here is not a right thing.
So, lld is currently crashing on this test. I personally consider it a bug. I'd be happy to turn this into an error, if you don't mind.
Yeah, I think we should check this error. We want to report the error even without --gc-sections.
This is going to be really fun. I was very puzzled because I wasn't able to find out why this was failing. So, I tried to debug with VS on Windows and realized I wasn't able to reproduce the crash. I then tried with gcc 6.1 on Linux and also noticed it doesn't crash there. So this only happens with clang from svn (1 month ago or such). I'll update clang to ToT to see if I can reproduce, then probably open a bug if it still happens. Meh.
The reason why gcc doesn't crash (and presumably also MSVC) is that this code is hitting UB.
getFile() == nullptr, but gcc decides to not crash, instead it returns a bogus pointer for the object pointed by getFile()->getObj().
diff --git a/ELF/MarkLive.cpp b/ELF/MarkLive.cpp index 4965377..0c6d510 100644 --- a/ELF/MarkLive.cpp +++ b/ELF/MarkLive.cpp @@ -81,6 +81,14 @@ static ResolvedReloc<ELFT> resolveReloc(InputSectionBase<ELFT> &Sec, template <class ELFT> static void forEachSuccessor(InputSection<ELFT> &Sec, std::function<void(ResolvedReloc<ELFT>)> Fn) { + + if (&Sec == &InputSection<ELFT>::Discarded) { + if (Sec.getFile() == nullptr) { + ELFFile<ELFT> &Obj = Sec.getFile()->getObj(); + llvm::errs() << "Still alive: " << &Obj << "\n"; + } + } + ELFFile<ELFT> &Obj = Sec.getFile()->getObj(); for (const typename ELFT::Shdr *RelSec : Sec.RelocSections) { if (RelSec->sh_type == SHT_RELA) {
$ /home/davide/work/llvm/build-gcc/./bin/ld.lld -shared /home/davide/work/llvm/build-gcc/tools/lld/test/ELF/Output/comdat.s.tmp.o /home/davide/work/llvm/build-gcc/tools/lld/test/ELF/Output/comdat.s.tmp.o /home/davide/work/llvm/build-gcc/tools/lld/test/ELF/Output/comdat.s.tmp2.o -o /home/davide/work/llvm/build-gcc/tools/lld/test/ELF/Output/comdat.s.tmp --gc-sections Still alive: 0x48
Circling back to this. I implemented a diagnostic in https://reviews.llvm.org/D24837
I think we should still skip sections here. They're not compliant to the ELF spec but the comment in Symbols.cpp suggests we can't do otherwise.
// According to the ELF spec reference to a local symbol from outside // the group are not allowed. Unfortunately .eh_frame breaks that rule // and must be treated specially. For now we just replace the symbol with // 0. if (SC == &InputSection<ELFT>::Discarded) return 0;
Already other parts of this algorithm skip the section if IS == &InputSection<ELFT>::Discarded and the reasons why we didn't notice in the non --gc-sections case I think is that markLive() runs earlier than scanRelocs() , does resolve relocation itself and there's no special handling in there for ::Discarded.
That said, I think that a better place to put this check would be the Enqueue lambda. That goes without saying, I might be mistaken, so, comments welcome.
LGTM with the following changes.
ELF/MarkLive.cpp | ||
---|---|---|
84 ↗ | (On Diff #71894) | Instead of describing what we are doing here, I'd describe why we need this. Please mention that ideally this shouldn't happen but unfortunately we need this because a relocation may directly point to a dedup'ed comdat section even though the ELF spec doesn't allow such relocations. |
test/ELF/comdat.s | ||
17 ↗ | (On Diff #71894) | Remove one extra blank line. |