[clang] Support clang -fpic -fno-semantic-interposition for AArch64
Closed, Public

Authored by MaskRay on May 4 2021, 4:44 PM.

Details

Summary

-fno-semantic-interposition can optimize accesses to default-visibility,
external-linkage variables and calls to such functions (excluding ifuncs and
COMDAT members) by using local aliases, which avoids the GOT/PLT, e.g.

int var;
__attribute__((optnone)) int fun(int x) { return x * x; }
int test() { return fun(var); }

-fpic (var and fun are dso_preemptable)

test:                                   // @test
        adrp    x8, :got:var
        ldr     x8, [x8, :got_lo12:var]
        ldr     w0, [x8]
// fun is preemptible by default in ld -shared mode. ld will create a PLT.
        b       fun

vs -fpic -fno-semantic-interposition (var and fun are dso_local)

test:                                   // @test
.Ltest$local:
        adrp    x8, .Lvar$local
        ldr     w0, [x8, :lo12:.Lvar$local]
// The assembler either resolves .Lfun$local at assembly time, or produces a
// relocation referencing a non-preemptible section symbol (which can avoid PLT).
        b       .Lfun$local

Note: Clang's default -fpic is more aggressive than GCC -fpic: interprocedural
optimizations (including inlining) are available but local aliases are not used.
-fpic -fsemantic-interposition can disable interprocedural optimizations.
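
The local-alias idea can be approximated at the source level. The sketch below is an illustration only, not what the patch emits: it assumes an ELF target and the GCC/Clang alias and visibility attributes, and var_local, fun_local and check_alias_shares_storage are hypothetical names (the first two stand in for the compiler-generated .Lvar$local and .Lfun$local).

```c
#include <assert.h>

/* Default-visibility, external-linkage definitions, as in the summary.
   var gets an initializer here only so the example has a known result. */
int var = 3;
int fun(int x) { return x * x; }

/* Hypothetical hand-written stand-ins for .Lvar$local and .Lfun$local.
   A hidden alias names the same object or function but cannot be preempted
   by another module, so the compiler can reference it without the GOT/PLT. */
extern __typeof__(var) var_local
    __attribute__((alias("var"), visibility("hidden")));
extern __typeof__(fun) fun_local
    __attribute__((alias("fun"), visibility("hidden")));

/* Same computation as test() in the summary, routed through the aliases. */
int test(void) { return fun_local(var_local); }

/* The alias and the public name share storage: a store through one is
   visible through the other. */
void check_alias_shares_storage(void) {
    int saved = var;
    var = saved + 1;
    assert(var_local == saved + 1);
    var = saved;
}
```

When built with -fpic -shared, references through the hidden aliases should bind within the shared object, while var and fun stay exported with default visibility.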

Depends on D101872

Event Timeline

MaskRay created this revision. May 4 2021, 4:44 PM
MaskRay requested review of this revision. May 4 2021, 4:44 PM
Herald added a project: Restricted Project. May 4 2021, 4:44 PM
Herald added a subscriber: cfe-commits.
MaskRay retitled this revision from "[clang] Support clang -fpic -fno-semantic-interposition for AArch64 Depends on D101872" to "[clang] Support clang -fpic -fno-semantic-interposition for AArch64". May 4 2021, 4:52 PM
MaskRay edited the summary of this revision.
MaskRay edited the summary of this revision.

I've no comments on the code in D101872 and D10873; they look reasonable to me. I guess it comes down to whether this is the right thing to do or not.

Just to check my understanding:

  • Clang defaults to -fno-semantic-interposition (GCC, I believe, has the opposite default, -fsemantic-interposition).
  • With this change, code in the same translation unit that defines the global will use a local alias for the global rather than accessing it via the GOT, but the global will still be defined with default visibility.
  • Symbol interposition is still permitted at link time, so the global can be interposed, but as the code is using a local alias it will still use the original value.
  • Without this change, Clang would use -fhalf-no-semantic-interposition, which I believe permits some assumptions about symbol interposition, such as resolving some short-range assembly pc-relative references to a local alias. These would be out of range if the symbol were interposed anyway.

If I've got this right, particularly the default, then this makes me nervous about the default behaviour, as it could silently break some existing code. If a user had to opt in explicitly with -fno-semantic-interposition then fair enough.

Can you let me know if I'm being overly cautious here? For example, are programs that would be affected by this already broken by the half-no-semantic-interposition behaviour anyway? Is symbol interposition so rare that the X86 version of this didn't break anything? Have I got the default of -fno-semantic-interposition wrong?

MaskRay added a comment. Edited May 5 2021, 9:47 AM

> I've no comments on the code in D101872 and D10873; they look reasonable to me. I guess it comes down to whether this is the right thing to do or not.
>
> Just to check my understanding:
>
>   • Clang defaults to -fno-semantic-interposition (GCC, I believe, has the opposite default, -fsemantic-interposition).

Clang ELF defaults to a state between -fno-semantic-interposition and -fsemantic-interposition (cc1 -fhalf-no-semantic-interposition): interprocedural optimizations (e.g. inlining, IPSCCP) are enabled for default-visibility external-linkage definitions, but dso_local is not added.
This has been the traditional behavior for years.

-fno-semantic-interposition adds dso_local to default-visibility external-linkage definitions and leads to .Lfoo$local, which can break a small number of applications, so I don't want to change the default.

>   • With this change, code in the same translation unit that defines the global will use a local alias for the global rather than accessing it via the GOT, but the global will still be defined with default visibility.

Only when -fno-semantic-interposition is explicitly specified, i.e. -fpic -fno-semantic-interposition.

>   • Symbol interposition is still permitted at link time, so the global can be interposed, but as the code is using a local alias it will still use the original value.
>   • Without this change, Clang would use -fhalf-no-semantic-interposition, which I believe permits some assumptions about symbol interposition, such as resolving some short-range assembly pc-relative references to a local alias. These would be out of range if the symbol were interposed anyway.
>
> If I've got this right, particularly the default, then this makes me nervous about the default behaviour, as it could silently break some existing code. If a user had to opt in explicitly with -fno-semantic-interposition then fair enough.
>
> Can you let me know if I'm being overly cautious here? For example, are programs that would be affected by this already broken by the half-no-semantic-interposition behaviour anyway? Is symbol interposition so rare that the X86 version of this didn't break anything? Have I got the default of -fno-semantic-interposition wrong?

Yes, the first point :) -fno-semantic-interposition is not the default.

I use the program at https://gist.github.com/MaskRay/c03a90922003df666551589f1629df22 to check that (1) there is no behavior change for the default and (2) explicit -fno-semantic-interposition enables the optimization.
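
The three states described in this reply and in the summary note can be tabulated in a small sketch. This is only a model of the behaviour as stated in this thread, assuming a target where dso_local inference is implemented; the enum, struct and function names are hypothetical and do not correspond to Clang internals.

```c
#include <assert.h>
#include <stdbool.h>

enum interposition_mode {
    MODE_SEMANTIC_INTERPOSITION,    /* -fpic -fsemantic-interposition */
    MODE_HALF_NO_INTERPOSITION,     /* -fpic alone (cc1 -fhalf-no-semantic-interposition) */
    MODE_NO_SEMANTIC_INTERPOSITION  /* -fpic -fno-semantic-interposition */
};

/* How a default-visibility, external-linkage definition is treated. */
struct definition_treatment {
    bool interprocedural_opts; /* inlining, IPSCCP, ... may look into the definition */
    bool dso_local;            /* definition is dso_local, so .Lfoo$local is used */
};

struct definition_treatment treatment_for(enum interposition_mode mode) {
    struct definition_treatment t = { false, false };
    switch (mode) {
    case MODE_SEMANTIC_INTERPOSITION:
        break;                          /* fully preemptible: neither applies */
    case MODE_HALF_NO_INTERPOSITION:
        t.interprocedural_opts = true;  /* the traditional Clang default */
        break;
    case MODE_NO_SEMANTIC_INTERPOSITION:
        t.interprocedural_opts = true;
        t.dso_local = true;             /* explicit opt-in only */
        break;
    default:
        assert(0 && "unknown mode");
    }
    return t;
}
```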

MaskRay added a comment. Edited May 6 2021, 12:03 AM

https://gist.github.com/MaskRay/2d4dfcfc897341163f734afb59f689c6 has more information about -fno-semantic-interposition.

Can Clang default to -fno-semantic-interposition?

I think we can probably make non-x86 default to -fno-semantic-interposition (dso_local inference, given D72197). x86 may find default -fno-semantic-interposition too aggressive.

Thanks for the link, and the explanation that -fno-semantic-interposition is not the default.

I'm not (yet) convinced that we could make -fno-semantic-interposition the default, primarily because of data rather than functions; I agree that interposing functions is rare. For data, the classic example of symbol interposition was errno: a shared library can't know whether any other library or executable will define it, so it must define it itself, yet only one definition must end up being used. I'm not sure if that still holds in today's environment, with shared C libraries used by practically everything, but I think the principle still applies.

Looking at the gist, I've got one concern for AArch64 and Arm. The ABI relies on thunks, which are only defined for symbols of type STT_FUNC. Changing branch targets from STT_FUNC to STT_SECTION symbols will break long-range thunks on AArch64 and interworking for Arm (there is a possibility that the bottom bit for STT_FUNC may get used in the future for AArch64 as well). This is solvable by keeping the local label and setting STT_FUNC on it.

> Looking at the gist, I've got one concern for AArch64 and Arm. The ABI relies on thunks, which are only defined for symbols of type STT_FUNC. Changing branch targets from STT_FUNC to STT_SECTION symbols will break long-range thunks on AArch64 and interworking for Arm (there is a possibility that the bottom bit for STT_FUNC may get used in the future for AArch64 as well). This is solvable by keeping the local label and setting STT_FUNC on it.

Ooh, I missed this. Yes, we need the symbol attributes. On 32-bit Arm, that includes a .thumb_func directive (MCStreamer::emitThumbFunc) in addition to the STT_FUNC attribute.

> I'm not (yet) convinced that we could make -fno-semantic-interposition the default, primarily because of data rather than functions; I agree that interposing functions is rare. For data, the classic example of symbol interposition was errno: a shared library can't know whether any other library or executable will define it, so it must define it itself, yet only one definition must end up being used. I'm not sure if that still holds in today's environment, with shared C libraries used by practically everything, but I think the principle still applies.

errno needs to be thread-local. C11 7.5.2 says "and errno which expands to a modifiable lvalue that has type int and thread local storage duration, the value of which is set to a positive error number by several library functions."
Do you mean that in some environment it may be defined in more than one shared object?

> Looking at the gist, I've got one concern for AArch64 and Arm. The ABI relies on thunks, which are only defined for symbols of type STT_FUNC. Changing branch targets from STT_FUNC to STT_SECTION symbols will break long-range thunks on AArch64 and interworking for Arm (there is a possibility that the bottom bit for STT_FUNC may get used in the future for AArch64 as well). This is solvable by keeping the local label and setting STT_FUNC on it.

I'm unlikely to touch 32-bit Arm.

For AArch64, aaelf64/aaelf64.rst says:

A linker may use a veneer (a sequence of instructions) to implement a relocated branch if the relocation is either

``R_<CLS>_CALL26``, ``R_<CLS>_JUMP26`` or ``R_<CLS>_PLT32`` and:

- The target symbol has type ``STT_FUNC``.

- Or, the target symbol and relocated place are in separate sections input to the linker.

- Or, the target symbol is undefined (external to the link unit).

If bl .Lhigh_target$local and .Lhigh_target$local are in the same section, the fixup is resolved at assembly time;
otherwise, they are in separate sections and satisfy the ABI requirement.

I just changed bl high_target in test/lld/ELF/aarch64-thunk-script.s and noticed that both GNU ld and ld.lld can produce a thunk, regardless of the symbol type.
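
The quoted ABI rule can be written out as a predicate, which makes it easier to see why a branch to a local alias in another section still qualifies. This is a sketch of the rule as quoted above, not linker code; the enum, struct and function names are hypothetical, and the enumerators are symbolic rather than real ELF constant values.

```c
#include <assert.h>
#include <stdbool.h>

/* Symbolic stand-ins for R_<CLS>_CALL26, R_<CLS>_JUMP26, R_<CLS>_PLT32. */
enum branch_reloc { RELOC_CALL26, RELOC_JUMP26, RELOC_PLT32, RELOC_OTHER };

/* Symbolic stand-ins for STT_NOTYPE, STT_FUNC, STT_SECTION. */
enum target_type { TARGET_NOTYPE, TARGET_FUNC, TARGET_SECTION };

struct branch_site {
    enum branch_reloc reloc;
    enum target_type target_type;
    bool target_undefined;    /* target is external to the link unit */
    bool same_input_section;  /* target and relocated place share an input section */
};

/* True if the quoted rule permits the linker to insert a veneer. */
bool linker_may_use_veneer(const struct branch_site *site) {
    assert(site != 0);
    bool is_branch = site->reloc == RELOC_CALL26 ||
                     site->reloc == RELOC_JUMP26 ||
                     site->reloc == RELOC_PLT32;
    if (!is_branch)
        return false;
    return site->target_type == TARGET_FUNC ||
           !site->same_input_section ||
           site->target_undefined;
}
```

In this model, a CALL26 to an untyped local label in a different input section passes through the second condition, while the same-section case never reaches the linker because the assembler has already resolved the fixup.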

Thanks for the update.

With the clarification that this isn't breaking AArch64 long-range thunks now, and given that we are not considering Arm, I'm happy for this to happen if the user opts in with -fno-semantic-interposition. I think the longer-term question, outside the scope of this review, of whether -fno-semantic-interposition should be the default is probably one for llvm-dev.

> errno needs to be thread-local. C11 7.5.2 says "and errno which expands to a modifiable lvalue that has type int and thread local storage duration, the value of which is set to a positive error number by several library functions."
> Do you mean that in some environment it may be defined in more than one shared object?

In the general case, multiple shared libraries include the same static library, which has a global variable. Under the normal rules only one of these globals will be used, whereas with -fno-semantic-interposition each library will use its own copy. I don't think this is common, as it is not considered good design; it is just an example of how some programs could be broken in subtle ways if the default were changed.
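
The difference can be seen in a toy model. The sketch below is not a real dynamic loader: it only mimics "bind to the first definition in load order" versus "bind to my own definition", and every name in it (shared_object, bind_with_interposition, and so on) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One entry per loaded shared library that embeds the same static library. */
struct shared_object {
    const char *name;
    int counter;  /* this object's copy of the static library's global */
};

/* Normal ELF rules: every reference binds to the first definition in load order. */
int *bind_with_interposition(struct shared_object *objs, size_t count, size_t self) {
    assert(count > 0 && self < count);
    return &objs[0].counter;
}

/* Local-alias behaviour: each object binds to its own definition. */
int *bind_without_interposition(struct shared_object *objs, size_t count, size_t self) {
    assert(self < count);
    return &objs[self].counter;
}

typedef int *(*binder)(struct shared_object *, size_t, size_t);

/* Each object increments "the" global once through its own reference. */
void bump_from_every_object(struct shared_object *objs, size_t count, binder bind) {
    for (size_t i = 0; i < count; ++i)
        ++*bind(objs, count, i);
}
```

With two objects, the first binder leaves one shared counter at 2, while the second leaves two independent counters at 1 each, which is the kind of subtle divergence described above.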

OK, so it looks like the "Or, the target symbol and relocated place are in separate sections input to the linker." clause can cover AArch64.

An area I didn't want to mention earlier, as there is no guarantee it will be part of the architecture or the ABI, is Morello. This introduces capabilities into AArch64 (https://github.com/ARM-software/abi-aa/blob/main/aaelf64-morello/aaelf64-morello.rst#414symbol-values) with an eye to the future, where this might be significant. I realise that we can't be hostage to a future that might not come to pass, and there can always be "turn -fno-semantic-interposition off when Morello is selected", but my instinct is to be cautious as I don't want to make Morello even more difficult than it already is.

> In the general case, multiple shared libraries include the same static library, which has a global variable. Under the normal rules only one of these globals will be used, whereas with -fno-semantic-interposition each library will use its own copy. I don't think this is common, as it is not considered good design; it is just an example of how some programs could be broken in subtle ways if the default were changed.

I had been aware that there could be data preemption, though I could not find an example.
Being aware that such programs may exist, I don't intend to flip the default for any target without a wider discussion.

Does this patch look good since no default is flipped?

> An area I didn't want to mention earlier, as there is no guarantee it will be part of the architecture or the ABI, is Morello. This introduces capabilities into AArch64 (https://github.com/ARM-software/abi-aa/blob/main/aaelf64-morello/aaelf64-morello.rst#414symbol-values) with an eye to the future, where this might be significant. I realise that we can't be hostage to a future that might not come to pass, and there can always be "turn -fno-semantic-interposition off when Morello is selected", but my instinct is to be cautious as I don't want to make Morello even more difficult than it already is.

The Morello code is not upstream. I have no idea how its thunks may behave when the symbol type is STT_NOTYPE or STT_FUNC, but we can either treat "-fno-semantic-interposition" as a no-op, as on other non-x86, non-normal-AArch64 targets, or refine the dso_local logic to make it workable.

peter.smith accepted this revision. May 10 2021, 1:26 AM

LGTM as this is opt in with a command line option.

This revision is now accepted and ready to land. May 10 2021, 1:26 AM
MaskRay edited the summary of this revision. May 10 2021, 9:32 AM
MaskRay edited the summary of this revision.
MaskRay edited the summary of this revision. May 10 2021, 9:37 AM
MaskRay edited the summary of this revision. May 10 2021, 9:41 AM
MaskRay edited the summary of this revision.
lanza added a subscriber: lanza. Mon, Jun 7, 11:43 PM

Hey Fangrui, is there any reason this couldn't extend to armv7?

MaskRay added a comment. Edited Tue, Jun 8, 12:21 AM

> Hey Fangrui, is there any reason this couldn't extend to armv7?

@lanza Always happy when more folks are interested in this kind of thing :)
This needs backend work. See D101872. I don't have the bandwidth to work on 32-bit Arm :)

Now that we don't optimize variables, the value of clang -fno-semantic-interposition is small.
-fno-semantic-interposition can save a PLT entry (and the associated R_*_JUMP_SLOT dynamic relocation) if a default-visibility STB_GLOBAL function is only called in its defining TU, not by other TUs linked into the shared object.
Its benefit is subsumed by ld -Bsymbolic-non-weak-functions (binutils doesn't seem enthusiastic: https://sourceware.org/pipermail/binutils/2021-May/116753.html).

I asked whether GCC could provide a configure option to default to -fno-semantic-interposition (https://gcc.gnu.org/PR100937), and I even sent a patch.
Oops, it was immediately closed as wontfix.
It is unfortunate that so few people pay attention to performance.
If I still want to try something, my angle has to be security hardening: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572103.html

Doing this on 32-bit Arm would make me nervous, as STT_FUNC symbols encode the Arm/Thumb state in the bottom bit but STT_NOTYPE symbols do not. In principle it could be done, but extra care would have to be taken to make sure no state changes were required. For example, caller and callee would need to be in the same state. I'm not entirely sure that LLD's current range-extension thunks would work with STT_NOTYPE symbols, as they use BX IP, which would always change state to Arm because the STT_NOTYPE symbol has the bottom bit clear. This is fixable, but is the additional complexity and possible fragility of older tools worth it?

https://github.com/ARM-software/abi-aa/blob/main/aaelf32/aaelf32.rst#554symbol-names
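
The bottom-bit concern can be made concrete with a small model. This sketch only encodes what the comment above states (bit 0 of an STT_FUNC symbol value marks Thumb code, and BX selects the state from bit 0 of its target); the names symbol_value and bx_enters_thumb and the symbolic enum are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Symbolic stand-ins for STT_NOTYPE and STT_FUNC. */
enum label_type { LABEL_NOTYPE, LABEL_FUNC };

/* Value a symbol carries when it labels code at the (even) address addr.
   Only function-typed symbols record the Thumb state in bit 0. */
uint32_t symbol_value(uint32_t addr, enum label_type type, bool thumb_code) {
    assert((addr & 1u) == 0);
    if (type == LABEL_FUNC && thumb_code)
        return addr | 1u;
    return addr;
}

/* BX-style interworking branch: bit 0 of the target selects Thumb state. */
bool bx_enters_thumb(uint32_t target) {
    return (target & 1u) != 0;
}
```

In this model a BX to an untyped label on Thumb code lands in Arm state, which is the failure mode described for thunks that target STT_NOTYPE symbols.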