This is an archive of the discontinued LLVM Phabricator instance.

Respect section groups in GC
AbandonedPublic

Authored by eugenis on Jan 9 2017, 2:14 PM.

Details

Reviewers
ruiu
pcc
rnk
Summary

Treat sections that are part of a group as a unit in GC. This effectively makes ELF section groups behave as associative comdats in COFF.

The current implementation uses section groups for selection (i.e. discard duplicate definitions), but then ignores them in GC, potentially discarding parts of a section group.

  • The new implementation matches the BFD linker behavior, while the current implementation matches Gold.

This is also necessary for proper GC of globals in AddressSanitizer. To give some context, ASan emits a "metadata global" for each application global, and needs to iterate over them at runtime. Metadata globals should not affect the liveness of original globals, and they need to be preserved if and only if the corresponding global is preserved.

Diff Detail

Repository
rL LLVM

Event Timeline

eugenis updated this revision to Diff 83695.Jan 9 2017, 2:14 PM
eugenis retitled this revision from to Respect section groups in GC.
eugenis updated this object.
eugenis added reviewers: pcc, rnk, ruiu.
eugenis set the repository for this revision to rL LLVM.
eugenis added a subscriber: llvm-commits.
rnk edited edge metadata.Jan 9 2017, 2:19 PM

Great! I'm glad we can do this without extending ELF.

What happens when gold users link asan objects with non-comdat section groups? Does gold simply fail to GC globals in the group, or does it do something bad? If we can treat this as a gold quality-of-implementation issue, we can always emit the section groups this way in ASan.

Yes, Gold simply fails to GC asan globals. My next change would unconditionally enable the new codegen in ASan.

ruiu edited edge metadata.Jan 9 2017, 4:56 PM

At first this seems like a good change, but after thinking for a while, it started feeling that this is not a good one, as this is effectively extending the ELF format by adding a new meaning to SHT_GROUP. As per the ELF spec, SHT_GROUP is a mechanism to group and uniquify sections, and the spec doesn't say anything about GC. Even GNU linkers don't agree with each other. You could argue that it implies that sections in a SHT_GROUP need to be handled as one unit by GC, but you could also argue that it has no business with GC as it doesn't violate any specifications. So it's a dark corner, and I don't think we should rely on such thing.

I wonder if there's a different way to keep them alive. One thing I can think of is to add a dummy relocation (something like R_X86_64_NONE) to a section so that if the section is alive, the other sections will become alive, without actually doing any relocations. Does it work?

pcc edited edge metadata.Jan 9 2017, 5:01 PM
In D28481#640798, @ruiu wrote:

At first this seems like a good change, but after thinking for a while, it started feeling that this is not a good one, as this is effectively extending the ELF format by adding a new meaning to SHT_GROUP. As per the ELF spec, SHT_GROUP is a mechanism to group and uniquify sections, and the spec doesn't say anything about GC. Even GNU linkers don't agree with each other. You could argue that it implies that sections in a SHT_GROUP need to be handled as one unit by GC, but you could also argue that it has no business with GC as it doesn't violate any specifications. So it's a dark corner, and I don't think we should rely on such thing.

I wouldn't necessarily draw that conclusion, it may simply mean that the gABI requires clarification.

I wonder if there's a different way to keep them alive. One thing I can think of is to add a dummy relocation (something like R_X86_64_NONE) to a section so that if the section is alive, the other sections will become alive, without actually doing any relocations. Does it work?

This alone is not sufficient, because the linker treats sections whose names are C identifiers as GC roots.

rnk added a comment.Jan 9 2017, 5:06 PM
In D28481#640798, @ruiu wrote:

I wonder if there's a different way to keep them alive. One thing I can think of is to add a dummy relocation (something like R_X86_64_NONE) to a section so that if the section is alive, the other sections will become alive, without actually doing any relocations. Does it work?

Currently, linker GC retains all data in sections referenced by start_$section stop_$section. We need to suppress that retention in addition to establishing this inverse reference. How do you propose to do that?

This behavior should also already be suppressed for normal comdat section groups. Consider this assembly: https://ghostbin.com/paste/v5qrp The .init_array global is in a comdat section, and should not be retained by the linker if 'x' is unreferenced.

ruiu added a comment.Jan 9 2017, 5:16 PM

We just cut a corner when writing the code to handle section names valid as C identifiers. Because the number of such sections are usually small, we just made all such sections alive. However, if you need more precise GC, I think you could do that.

Can I ask why ASAN uses start_ and end_ symbols?

pcc added a comment.Jan 9 2017, 5:53 PM
In D28481#640854, @ruiu wrote:

We just cut a corner when writing the code to handle section names valid as C identifiers. Because the number of such sections are usually small, we just made all such sections alive. However, if you need more precise GC, I think you could do that.

How would you propose to change the GC?

Can I ask why ASAN uses start_ and end_ symbols?

The __start_ and __stop_ symbols define the address range in which the metadata globals are stored. See Evgeniy's other change, D28498, which sets up a call to __asan_register_globals passing in that address range.

ruiu added a comment.Jan 9 2017, 6:38 PM

Hm, now I recall why we did that for C identifier sections. Sections with C identifier names can be garbage collected if no one is referring them, but that cannot be determined until we create output sections, and when we create output sections, it is usually too late to do GC. We could do something tricky to handle that, but it's probably not worth it.

So, maybe I'm fine with this. But I think the implementation can be simplified. It feels it is doing too much ahead of time. Also the new condition Sec->GroupSections.size() == 0 && isValidCIdentifier(S) definitely needs a comment.

In D28481#640940, @ruiu wrote:

Hm, now I recall why we did that for C identifier sections. Sections with C identifier names can be garbage collected if no one is referring them, but that cannot be determined until we create output sections, and when we create output sections, it is usually too late to do GC. We could do something tricky to handle that, but it's probably not worth it.

So, maybe I'm fine with this. But I think the implementation can be simplified. It feels it is doing too much ahead of time.

Could you elaborate? All this code does is store two words per each section that participates in a group. We do need to store the inverse dependency somewhere...

Also the new condition Sec->GroupSections.size() == 0 && isValidCIdentifier(S) definitely needs a comment.

Will do. Also, it sounds like we want to extend this logic to all special sections (like .ctors below). Basically,
if (Sec->GroupSections.size() != 0) return false;
at the beginning of isReserved().

ruiu added a comment.Jan 10 2017, 3:24 PM
In D28481#640940, @ruiu wrote:

Hm, now I recall why we did that for C identifier sections. Sections with C identifier names can be garbage collected if no one is referring them, but that cannot be determined until we create output sections, and when we create output sections, it is usually too late to do GC. We could do something tricky to handle that, but it's probably not worth it.

So, maybe I'm fine with this. But I think the implementation can be simplified. It feels it is doing too much ahead of time.

Could you elaborate? All this code does is store two words per each section that participates in a group. We do need to store the inverse dependency somewhere...

Please avoid instantiating sections that are to be nullified with Discarded section. Previously, we didn't instantiate sections if they are discarded, but with this patch, all sections are instantiated whether they are going to be discarded or not.

eugenis updated this revision to Diff 83886.Jan 10 2017, 3:41 PM
eugenis edited edge metadata.
In D28481#642004, @ruiu wrote:
In D28481#640940, @ruiu wrote:

Hm, now I recall why we did that for C identifier sections. Sections with C identifier names can be garbage collected if no one is referring them, but that cannot be determined until we create output sections, and when we create output sections, it is usually too late to do GC. We could do something tricky to handle that, but it's probably not worth it.

So, maybe I'm fine with this. But I think the implementation can be simplified. It feels it is doing too much ahead of time.

Could you elaborate? All this code does is store two words per each section that participates in a group. We do need to store the inverse dependency somewhere...

Please avoid instantiating sections that are to be nullified with Discarded section. Previously, we didn't instantiate sections if they are discarded, but with this patch, all sections are instantiated whether they are going to be discarded or not.

Of course. Good point. Fixed.

ruiu added inline comments.Jan 10 2017, 3:52 PM
ELF/InputFiles.cpp
296–303

Revert these cosmetic changes.

339

Use ArrayRef<typename ELFT::Word> instead of auto.

341–343

It shouldn't fail this time because we have the same error check above, so you don't need this check. Then this can be written as

for (uint32_t SecIndex : GroupSections)
  if (Sections[SecIndex] != &InputSection<ELFT>::Discarded)
    Sections[SecIndex]->GroupSections = GroupSections;
344–345

It cannot be nullptr, no? I think this can only be Discarded or a regular section.

ELF/InputSection.h
95

This needs a comment. Please describe that if sections are grouped by SHT_GROUP, we handle the sections as a unit in garbage collection, because that is needed for at least ASAN although it is not clearly documented in the standard document.

ELF/MarkLive.cpp
83–84

// Visit relocations.

92

// Visit a reverse link of SHF_LINK_ORDER if exists.

94

// Visit SHT_GROUP members if this section is in a group.

174

Please explain why you want that behavior, in addition to how this code behaves.

eugenis updated this revision to Diff 83897.Jan 10 2017, 4:58 PM
eugenis marked 7 inline comments as done.
eugenis added inline comments.
ELF/InputFiles.cpp
296–303

Done, but this file is not clang-format clean.

341–343

The check above runs for discarded groups only.

344–345

These cases above don't seem to set the section pointer:

case SHT_SYMTAB:
 case SHT_SYMTAB_SHNDX:
 case SHT_STRTAB:
 case SHT_NULL:
ruiu accepted this revision.Jan 10 2017, 5:32 PM
ruiu edited edge metadata.

LGTM with nits.

ELF/MarkLive.cpp
83–84

This belongs to this entire if { ~ } else { ~ } and not to this if clause, so it should be written before if.

95

auto -> uint32_t

187

nit: use empty() if ArrayRef has that function.

This revision is now accepted and ready to land.Jan 10 2017, 5:32 PM
pcc added inline comments.Jan 10 2017, 7:05 PM
ELF/MarkLive.cpp
178

Can it, though? I think we still need to run the global initializer since it may have observable side effects other than initializing the global.

rnk added inline comments.Jan 10 2017, 7:17 PM
ELF/MarkLive.cpp
178

This is a valid concern. MSVC actually puts '/include:globalsym' in the .drective section to ensure that such globals are not garbage collected.

eugenis abandoned this revision.Jun 12 2017, 11:26 AM