This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Don't emit SHF_GNU_RETAIN on Solaris
ClosedPublic

Authored by ro on Aug 9 2021, 1:10 AM.

Details

Summary

The introduction of SHF_GNU_RETAIN has caused massive problems on Solaris.

Initially, as reported in Bug 49437, it caused dozens of testsuite failures on both sparc and
x86. The objects were marked as ELFOSABI_NONE, but SHF_GNU_RETAIN is a GNU
extension. In the native Solaris ABI, that flag (in the range for OS specific values) is
SHF_SUNW_ABSENT with a completely different semantics, which confuses Solaris ld
very much.

Later, the objects became (correctly) marked ELFOSABI_GNU, which Solaris ld
doesn't support, causing it to SEGV and break the build. The linker is currently being
hardened to not accept non-native OS ABIs to avoid this.

The need for linker support is already documented in clang/include/clang/Basic/AttrDocs.td,
but not currently checked.

This patch avoids all this by not emitting SHF_GNU_RETAIN on Solaris at all.

Tested on amd64-pc-solaris2.11, sparcv9-sun-solaris2.11, and x86_64-pc-linux-gnu.

The current patch is a minimal solution. For full generality, any code emitting extensions
beyond the ELF gABI should check that all of the assembler, linker, and runtime linker
used support it. While in some cases assembler support is enough, in others all three
components will be needed.

If support for SHF_GNU_RETAIN is crucial, Solaris has an equivalent flag SHF_SUNW_NODISCARD
that could be used instead.

Diff Detail

Event Timeline

ro created this revision.Aug 9 2021, 1:10 AM
ro requested review of this revision.Aug 9 2021, 1:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 9 2021, 1:10 AM

This is critical for programs which use llvm.used.

int foo();
template <class T>
inline T x = foo();

int var = x<int>;

The dynamic initializer of x<int> is in a COMDAT section group.
If Solaris ld can GC SHT_INIT_ARRAY in a section group, a SHF_GNU_RETAIN counterpart is needed to make the program work.


Instrumentation based PGO has names variables referenced by llvm.used. Without a SHF_GNU_RETAIN counterpart indirect call profiling cannot work.

According to D95749, the rationale for allowing ELFOSABI_NONE with SHF_GNU_RETAIN is to keep consistent behaviour with STT_GNU_IFUNC and STB_GNU_UNIQUE, which are permitted under ELFOSABI_NONE. @MaskRay Should we perhaps revisit that decision?

More details at https://sourceware.org/bugzilla/show_bug.cgi?id=27282

This doesn't seem like the right fix. I would suggest either:

  1. Add SHF_SUNW_NODISCARD to flags instead of SHF_GNU_RETAIN if Retain is set and the target is Solaris (if they do in fact have the same semantics)
  2. Retain is set based on llvm.compiler.used (D97448). If this behaviour is not wanted then set Retain = Used.count(...) && !Solaris.
  3. Add a support check function as you describe here:

For full generality, any code emitting extensions beyond the ELF gABI should check that all of the assembler, linker, and runtime linker used support it.

According to D95749, the rationale for allowing ELFOSABI_NONE with SHF_GNU_RETAIN is to keep consistent behaviour with STT_GNU_IFUNC and STB_GNU_UNIQUE, which are permitted under ELFOSABI_NONE. @MaskRay Should we perhaps revisit that decision?

More details at https://sourceware.org/bugzilla/show_bug.cgi?id=27282

The Solaris Linkers and Loader guide doesn't say the section type value 10 is used.
We can mark STT_GNU_IFUNC as ELFOSABI_GNU, it just feels to me not very necessary. (It can be used by multiple other OSABI values. All these OSABI need to something similar to D107748 even if NONE works)

I think STB_GNU_UNIQUE is a mistake.
Clang doesn't emit it.
So marking it as ELFOSABI_GNU isn't too necessary.

ro added a comment.Aug 10 2021, 5:31 AM

Instrumentation based PGO has names variables referenced by llvm.used. Without a SHF_GNU_RETAIN counterpart indirect call profiling cannot work.

But isn't this the same with pre-binutils 2.36 GNU ld and still there are no precautions about this case?

ro added a comment.Aug 10 2021, 5:35 AM

According to D95749, the rationale for allowing ELFOSABI_NONE with SHF_GNU_RETAIN is to keep consistent behaviour with STT_GNU_IFUNC and STB_GNU_UNIQUE, which are permitted under ELFOSABI_NONE. @MaskRay Should we perhaps revisit that decision?

More details at https://sourceware.org/bugzilla/show_bug.cgi?id=27282

The Solaris Linkers and Loader guide doesn't say the section type value 10 is used.
We can mark STT_GNU_IFUNC as ELFOSABI_GNU, it just feels to me not very necessary. (It can be used by multiple other OSABI values. All these OSABI need to something similar to D107748 even if NONE works)

Solaris <sys/elf.h> has

/*
 * GNU/Linux specific symbol type not used by Solaris
 */
#define	STT_GNU_IFUNC	10

so in this particular case, it's aware about the possible conflict with different OS ABIs.

I think STB_GNU_UNIQUE is a mistake.
Clang doesn't emit it.
So marking it as ELFOSABI_GNU isn't too necessary.

Same here:

/*
 * GNU/Linux specific binding not used by Solaris
 */
#define	STB_GNU_UNIQUE	10

However, as at least the SHF_GNU_RETAIN vs. SHF_SUNW_ABSENT case shows, the possibility of conflict in the OS-specific ranges
is quite real, so ignoring it is dangerous.

ro added a comment.Aug 10 2021, 5:40 AM

According to D95749, the rationale for allowing ELFOSABI_NONE with SHF_GNU_RETAIN is to keep consistent behaviour with STT_GNU_IFUNC and STB_GNU_UNIQUE, which are permitted under ELFOSABI_NONE. @MaskRay Should we perhaps revisit that decision?

More details at https://sourceware.org/bugzilla/show_bug.cgi?id=27282

Thanks for the background.

This doesn't seem like the right fix. I would suggest either:

  1. Add SHF_SUNW_NODISCARD to flags instead of SHF_GNU_RETAIN if Retain is set and the target is Solaris (if they do in fact have the same semantics)
  2. Retain is set based on llvm.compiler.used (D97448). If this behaviour is not wanted then set Retain = Used.count(...) && !Solaris.
  3. Add a support check function as you describe here:

For full generality, any code emitting extensions beyond the ELF gABI should check that all of the assembler, linker, and runtime linker used support it.

I'm quite wary to try 3), in particular at this point in time: the release/13.x branch is affected by the same issue, and I'd like to unbreak the release in time. The larger/more complex a fix is, the less likely it's allowed to get in. So my goal would be a minimal fix right now.

However, I believe either 2) (which is semantically equivalent to my current patch) or 1) (which may require a bit more legwork to make the new section flag known) are doable and acceptable for 13.x.

MaskRay accepted this revision.Aug 10 2021, 11:59 AM
In D107747#2936740, @ro wrote:

Instrumentation based PGO has names variables referenced by llvm.used. Without a SHF_GNU_RETAIN counterpart indirect call profiling cannot work.

But isn't this the same with pre-binutils 2.36 GNU ld and still there are no precautions about this case?

The issue can be mitigated by a hack in GNU ld and gold:
https://maskray.me/blog/2021-01-31-metadata-sections-comdat-and-shf-link-order#reflection
(In the Solaris Linkers and Libraries guide, such __start_ and __stop_ symbols are called encapsulation symbols.)

This revision is now accepted and ready to land.Aug 10 2021, 11:59 AM
In D107747#2936744, @ro wrote:

According to D95749, the rationale for allowing ELFOSABI_NONE with SHF_GNU_RETAIN is to keep consistent behaviour with STT_GNU_IFUNC and STB_GNU_UNIQUE, which are permitted under ELFOSABI_NONE. @MaskRay Should we perhaps revisit that decision?

More details at https://sourceware.org/bugzilla/show_bug.cgi?id=27282

The Solaris Linkers and Loader guide doesn't say the section type value 10 is used.
We can mark STT_GNU_IFUNC as ELFOSABI_GNU, it just feels to me not very necessary. (It can be used by multiple other OSABI values. All these OSABI need to something similar to D107748 even if NONE works)

Solaris <sys/elf.h> has

/*
 * GNU/Linux specific symbol type not used by Solaris
 */
#define	STT_GNU_IFUNC	10

so in this particular case, it's aware about the possible conflict with different OS ABIs.

There are quite a few OS ABIs which have adopted IFUNC.
Currently I am unsure whether the ELFOSABI_GNU marker can make them unhappy.

I think STB_GNU_UNIQUE is a mistake.
Clang doesn't emit it.
So marking it as ELFOSABI_GNU isn't too necessary.

MC supports @gnu_unique_object. I can change it to use ELFOSABI_GNU.

Clang doesn't emit it anyway. (An example complaint: https://news.ycombinator.com/item?id=21555752)

This revision was automatically updated to reflect the committed changes.