Page MenuHomePhabricator

[ELF] Add -z start-stop-gc to let __start_/__stop_ not retain C identifier name sections
ClosedPublic

Authored by MaskRay on Wed, Feb 17, 3:59 PM.

Details

Summary

For one metadata section usage, each text section references a metadata section.
The metadata sections have a C identifier name to allow the runtime to collect them via __start_/__stop_ symbols.

Since __start_/__stop_ references are always present from live sections, the
C identifier name sections appear like GC roots, which means they cannot be
discarded by ld --gc-sections.

To make such sections GCable, either SHF_LINK_ORDER or a section group is needed.

SHF_LINK_ORDER is not suitable for the references can be inlined into other functions
(See D97430:
Function A (in the section .text.A) references its __sancov_guard section.
Function B inlines A (so now .text.B references __sancov_guard - this is invalid with the semantics of SHF_LINK_ORDER).

In the linking stage,
if .text.A gets discarded, and __sancov_guard is retained via the reference from .text.B,
the output will be invalid because __sancov_guard references the discarded .text.A.
LLD errors "sh_link points to discarded section".
)

A section group have size overhead, and is cumbersome when there is just one metadata section.

Add -z start-stop-gc to drop the "start_/stop_ references retain
non-SHF_LINK_ORDER non-SHF_GROUP C identifier name sections" rule.
We reserve the rights to switch the default in the future.

Diff Detail

Event Timeline

MaskRay created this revision.Wed, Feb 17, 3:59 PM
MaskRay requested review of this revision.Wed, Feb 17, 3:59 PM
Herald added a project: Restricted Project. · View Herald TranscriptWed, Feb 17, 3:59 PM
MaskRay updated this revision to Diff 324460.Wed, Feb 17, 4:11 PM

add a test

Personally, I'm really not a fan of hardcoding/special casing names of internal symbols from libclang_rt.profile and I'd prefer a more generic solution like SHF_GNU_RETAIN.

I understand that there would be an issue when you try to link libclang_rt.profile that was compiled by an older GCC/Clang (which doesn't support SHF_GNU_RETAIN) using the latest lld, but I'm not sure if that's a use case we want to support. This was discussed several times in the past, and the conclusion has always been that you should either use the in-tree Clang to build runtimes, or that you should build runtimes separately from the compiler in which case you need to configure the build appropriately.

We could introduce an additional option in the compiler-rt profile build for that purpose which would enable/disable the use of the .data.PERSIST_SIG trick (used by AFL) as one potential solution for older compilers.

MaskRay updated this revision to Diff 324539.Wed, Feb 17, 11:41 PM
MaskRay edited the summary of this revision. (Show Details)

Drop __llvm_prf_* workaround

MaskRay added inline comments.Wed, Feb 17, 11:44 PM
lld/test/ELF/gc-sections-metadata-startstop.s
22

Will delete this stray comment.

phosek added inline comments.Wed, Feb 17, 11:44 PM
lld/test/ELF/gc-sections-metadata-startstop.s
22–30

Leftover from previous version?

phosek accepted this revision.Thu, Feb 18, 11:41 AM

LGTM (% stale comment), thanks for figuring out the solution for __llvm_prf_*.

This revision is now accepted and ready to land.Thu, Feb 18, 11:41 AM
MaskRay edited the summary of this revision. (Show Details)Fri, Feb 19, 3:32 PM
jrtc27 requested changes to this revision.Fri, Feb 19, 5:30 PM
jrtc27 added a subscriber: jrtc27.

FreeBSD uses linker sets extensively. Do not remove this, you will break FreeBSD, both the kernel and userspace.

This revision now requires changes to proceed.Fri, Feb 19, 5:30 PM

Please can you ensure that this is tested with some Objective-C code compiled with -fobjc-runtime=gnustep-2.0? If I am reading the intention correctly, it may result in all of the Objective-C code being dropped from the final link.

These types of reference can be difficult as it is difficult to know what the users intention with respect to gc_root is. I think that the majority of programs don't depend on all sections being marked live but there are definitely programs out there that do. It may not be easy to migrate all these programs to use alternative means such as SHF_RETAIN. Would it be useful to make this a selectable option for users with lots of existing code that depends on these sections being marked live?

As an aside, in Arm's proprietary linker we have had some experience with trying to come up with a more complex way of handling these types of reference as
we found that keeping everything was too restrictive for the use case we had for .ARM.exidx sections. We wanted to always generate unwind tables (.ARM.exidx), but to be able to remove all the .ARM.exidx sections and all the exception handling code from the library that process them if no call to throw was found in the program; in effect a semi-automatic -fno-exceptions. A key part of that was only referencing the .ARM.exidx sections via the equivalent of _start and _stop sections. Rather than special case this we wanted to come up with a general mechansim.

I wish I could say we came up with a good automatic solution but we couldn't find anything particularly justifiable. Our thought was to mark sections that were reachable only via a chain of dependencies (relocations or link-order) from _start and _stop symbols as weakly used. Sections reachable from a GC root were strongly used. A weakly used section could be turned into strongly used if one of the weakly used sections had a dependency on a strongly used section defined in the same object file as the weakly used section (the last bit was in effect a hack as we found the gc ineffective without it). This worked reasonably well for .ARM.exidx sections. However while it was intended to generalise to other use cases, such as throwing away unused C++ static constructors, we found this broke too many programs so the whole mechanism only got used for exceptions.

lld/test/ELF/startstop-gccollect.s
8 ↗(On Diff #324539)

Presumably you'd need to update the comment if the change went through.

MaskRay updated this revision to Diff 325701.Tue, Feb 23, 12:46 AM
MaskRay marked an inline comment as done.
MaskRay retitled this revision from [ELF] Don't let __start_/__stop_ retain C identifier name sections to [ELF] Add -z start-stop-gc to let __start_/__stop_ not retain C identifier name sections.
MaskRay edited the summary of this revision. (Show Details)

Add -z start-stop-gc

It does not link to a usage requiring the rule __start_/__stop_ references retain non-SHF_LINK_ORDER non-SHF_GROUP C identifier name sections.

FreeBSD uses linker sets extensively. Do not remove this, you will break FreeBSD, both the kernel and userspace.

I think I'd like bjk's word: "'that breaks linker sets entirely' seems like something that would benefit from a paragraph or two of additional exposition". Perhaps @dim can help on discussing this on a FreeBSD mailing list (I don't subscribe them).

Please can you ensure that this is tested with some Objective-C code compiled with -fobjc-runtime=gnustep-2.0? If I am reading the intention correctly, it may result in all of the Objective-C code being dropped from the final link.

I know almost nothing about Objective-C. Can you name the section which could be problematic?

% clang -ffunction-sections -fobjc-runtime=gnustep-2.0 -isystem /usr/include/GNUstep a.m
In file included from a.m:1:
/usr/include/GNUstep/Foundation/Foundation.h:31:9: fatal error: 'objc/objc.h' file not found
#import <objc/objc.h>
        ^~~~~~~~~~~~~
1 error generated.

Thanks for the update. I'm happy for this to be an option that defaults to off. That won't break any existing code that is depending on the behavior.

jrtc27 accepted this revision.Wed, Feb 24, 3:07 PM

Thanks, having it opt-in for the times it's needed is fine by me (although anyone opting in should of course be careful that it really is safe). Possibly worth a note in the manpage that it's not always safe?

This revision is now accepted and ready to land.Wed, Feb 24, 3:07 PM
MaskRay edited the summary of this revision. (Show Details)Thu, Feb 25, 3:44 PM
MaskRay edited the summary of this revision. (Show Details)Thu, Feb 25, 3:47 PM

Please can you ensure that this is tested with some Objective-C code compiled with -fobjc-runtime=gnustep-2.0? If I am reading the intention correctly, it may result in all of the Objective-C code being dropped from the final link.

I know almost nothing about Objective-C. Can you name the section which could be problematic?

% clang -ffunction-sections -fobjc-runtime=gnustep-2.0 -isystem /usr/include/GNUstep a.m
In file included from a.m:1:
/usr/include/GNUstep/Foundation/Foundation.h:31:9: fatal error: 'objc/objc.h' file not found
#import <objc/objc.h>
        ^~~~~~~~~~~~~
1 error generated.

Answering this: there are some llvm.used usage for ObjC (D97446). D97448 will handle them.