This is an archive of the discontinued LLVM Phabricator instance.

[lld][ELF] __start_/__stop_ refs don't retain C-ident named group sections
ClosedPublic

Authored by phosek on Feb 15 2021, 11:54 PM.

Details

Summary

The special root semantics for identifier-named sections is meant
specifically for the metadata sections. In the context of group
semantics, where group members are always retained or discarded as a
unit, it's natural not to have this semantics apply to a section in a
group, otherwise we would never discard the group defeating the purpose
of using the group in the first place.

This change modifies the GC behavior so that start_/stop_ references
don't retain C identifier named sections in section groups which allows
for these groups to be collected. This matches the behavior of BFD ld.

The only kind of existing case that might break is interdependent
metadata sections that are all in a group together, but that group
doesn't contain any other sections referenced by anything except
implicit inclusion in a __start_ and/or __stop_-referenced
identifier-named section, but such cases should be unlikely.

Diff Detail

Event Timeline

phosek created this revision.Feb 15 2021, 11:54 PM
phosek requested review of this revision.Feb 15 2021, 11:54 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 15 2021, 11:54 PM
MaskRay added a comment.EditedFeb 16 2021, 12:51 AM

I actually want to get rid of the "C identifier name sections have GC root semantics" https://maskray.me/blog/2021-01-31-metadata-sections-comdat-and-shf-link-order#c-identifier-name-sections
See also https://sourceware.org/bugzilla/show_bug.cgi?id=27259#c9 . I started a generic-abi thread as Alan Modra requested.
I would sent a patch like this as well but I am waiting for some feedback.

If we have had SHF_GNU_RETAIN earlier, we would not have needed the special case. Do you know what sections we may break by deleting the isValidCIdentifier(sec->name) branch?

mcgrathr accepted this revision.Feb 16 2021, 1:14 PM

I think this clearly the correct semantics. The whole purpose of groups really is for GC and de-dup behavior. The "automatic" anchor semantics for identifier-named sections clearly only makes sense for sections outside groups where there is no other way to tell if they are meant to be referenced. When such a section is in a group, it clearly is meant to go with the symbol-referenced entities defined in the group (i.e. the function or data object whose symbol is the group's signature symbol in the COMDAT case). Group semantics require that the whole group be kept or discarded together, so if an identifier-named section within a group were always kept, then that whole group must always be kept--keeping the identifier named sections while discarding others in the same group would clearly be a bug by the spec. Keeping the whole group because of the identifier-named section inside it is clearly not correct for GC or COMDAT cases.

This revision is now accepted and ready to land.Feb 16 2021, 1:14 PM

Code LG. I hope in the future we can entirely drop the code.

else if (isValidCIdentifier(sec->name) && !sec->nextInSectionGroup) {
  cNamedSections[saver.save("__start_" + sec->name)].push_back(sec);
  cNamedSections[saver.save("__stop_" + sec->name)].push_back(sec);
}

I'll check how to make __attribute__((used)) set SHF_GNU_RETAIN.


GC semantics are unfortunate. We have a special rule for SHT_NOTE, which has confused FreeBSD folks a bit:
(See the code touched by D70146)

+  case SHT_NOTE:
+    // SHT_NOTE sections in a group are subject to garbage collection.
+    return !sec->nextInSectionGroup;
lld/test/ELF/gc-sections-group-startstop.s
1 ↗(On Diff #323894)

The test can be written in gc-sections-metadata-startstop.s

32 ↗(On Diff #323894)

bbb and ccc are not needed.

You can rename startstop-gccollect.s to gc-sections-* and add a section group case.

This seems entirely reasonable to me. Actually, I wouldn't be entirely surprised if some users don't expect the current preservation behaviour, even for those that aren't group members.

MaskRay accepted this revision.Feb 19 2021, 1:39 PM

Don't consider C identifier-named sections as GC roots

The description needs to be fixed. It is "__start_/__stop_ references retain C identifier name sections". When __start_/__stop_ references are guaranteed to be present in live sections, C identifier-named sections _behave like_ GC roots.

phosek updated this revision to Diff 325268.Feb 20 2021, 6:17 PM
phosek marked 2 inline comments as done.
phosek retitled this revision from [lld][ELF] Don't consider C identifier-named sections as GC roots to [lld][ELF] __start_/__stop_ refs don't retain C-ident named group sections.
phosek edited the summary of this revision. (Show Details)

Updated test and change description.

@MaskRay let me know if the updated test and change description looks good to you.

MaskRay accepted this revision.Feb 20 2021, 7:29 PM

Looks great!

lld/test/ELF/gc-sections-startstop.s