This is an archive of the discontinued LLVM Phabricator instance.

[LTO] Don't override relocation model stored in the module
AbandonedPublic

Authored by phosek on Jul 14 2019, 5:43 PM.

Details

Summary

Currently, lld LTO configuration explicitly sets relocation model
based on the linker flags, effectively overriding whatever model is
stored inside the module. This behavior differs from the behavior
one would get during regular linking or even ThinLTO with explicit
codegen where the linker flags such as -r, -pie or -shared don't
affect codegen. This breaks the case where user wants to deliberately
select different relocation model for codegen and for linking, which
is can be the case in special environments like kernels.

Concretely, we stumbled upon this issue while trying to build Zircon
kernel with ThinLTO but the resulting kernel wouldn't boot. Upon
investigation, we've noticed that the kernel code is non-relocatable
despite it being compiled with -fpie. The problem was that our build
passes --no-pie to the linker. This is on purpose, while we want pie
codegen, we want to link the final binary to be linked as static one
(because there's no dynamic linker in our environment).

This change alters the behavior to use the relocation model stored
in the module which is the default behavior of the LTO backend, and
allows overriding the model via -mllvm flag. This matches the behavior
already used for code model. With this change, our kernel builds and
boots even when built with (Thin)LTO.

Diff Detail

Repository
rLLD LLVM Linker

Event Timeline

phosek created this revision.Jul 14 2019, 5:43 PM
pcc added a reviewer: pcc.Jul 14 2019, 9:36 PM
pcc added a subscriber: pcc.

I'd like to understand the motivation behind this change a little better because the relaxation of the code model from PIC to static is technically an "optimization" that the linker should have license to perform and in fact is already performing in some ways (e.g. emitting addresses into the binary instead of RELATIVE/RELR relocations for statically known addresses).

In D64711#1584916, @pcc wrote:

I'd like to understand the motivation behind this change a little better because the relaxation of the code model from PIC to static is technically an "optimization" that the linker should have license to perform and in fact is already performing in some ways (e.g. emitting addresses into the binary instead of RELATIVE/RELR relocations for statically known addresses).

In the case of our kernel, we need the generated code to be position independent for our KASLR scheme, but we're not doing traditional PIE link. When compiling individual translation units, we use -fpie, but when linking we use --no-pie. Changing the code model from PIC to static in this case results in an invalid kernel image because the generated code is not relocatable. I think it's a matter of principle that it shouldn't be the linker's responsibility to decide how to combine our code-generation choices with our linking choices. I'm open to suggestions for other ways to solve this issue though as we'd really like to be able to use ThinLTO for our kernel.

pcc added a comment.Jul 15 2019, 4:54 PM
In D64711#1584916, @pcc wrote:

I'd like to understand the motivation behind this change a little better because the relaxation of the code model from PIC to static is technically an "optimization" that the linker should have license to perform and in fact is already performing in some ways (e.g. emitting addresses into the binary instead of RELATIVE/RELR relocations for statically known addresses).

In the case of our kernel, we need the generated code to be position independent for our KASLR scheme, but we're not doing traditional PIE link. When compiling individual translation units, we use -fpie, but when linking we use --no-pie. Changing the code model from PIC to static in this case results in an invalid kernel image because the generated code is not relocatable. I think it's a matter of principle that it shouldn't be the linker's responsibility to decide how to combine our code-generation choices with our linking choices. I'm open to suggestions for other ways to solve this issue though as we'd really like to be able to use ThinLTO for our kernel.

Have you considered linking the kernel as a PIE image when KASLR is enabled? This will fix not only this problem but also others (e.g. initializers containing global pointers, which might not just be explicit in the source code but could also be implicitly generated by the compiler, are broken with this scheme).

This is what the Linux kernel does (it uses -shared -Bsymbolic instead of -pie in KASLR mode, but they are almost equivalent):
https://github.com/torvalds/linux/blob/fec88ab0af9706b2201e5daf377c5031c62d11f7/arch/arm64/Makefile#L21

It uses assembly to self relocate to avoid depending on relative relocations during early startup:
https://github.com/torvalds/linux/blob/fec88ab0af9706b2201e5daf377c5031c62d11f7/arch/arm64/kernel/head.S#L822

In D64711#1586738, @pcc wrote:
In D64711#1584916, @pcc wrote:

I'd like to understand the motivation behind this change a little better because the relaxation of the code model from PIC to static is technically an "optimization" that the linker should have license to perform and in fact is already performing in some ways (e.g. emitting addresses into the binary instead of RELATIVE/RELR relocations for statically known addresses).

In the case of our kernel, we need the generated code to be position independent for our KASLR scheme, but we're not doing traditional PIE link. When compiling individual translation units, we use -fpie, but when linking we use --no-pie. Changing the code model from PIC to static in this case results in an invalid kernel image because the generated code is not relocatable. I think it's a matter of principle that it shouldn't be the linker's responsibility to decide how to combine our code-generation choices with our linking choices. I'm open to suggestions for other ways to solve this issue though as we'd really like to be able to use ThinLTO for our kernel.

Have you considered linking the kernel as a PIE image when KASLR is enabled? This will fix not only this problem but also others (e.g. initializers containing global pointers, which might not just be explicit in the source code but could also be implicitly generated by the compiler, are broken with this scheme).

This is what the Linux kernel does (it uses -shared -Bsymbolic instead of -pie in KASLR mode, but they are almost equivalent):
https://github.com/torvalds/linux/blob/fec88ab0af9706b2201e5daf377c5031c62d11f7/arch/arm64/Makefile#L21

It uses assembly to self relocate to avoid depending on relative relocations during early startup:
https://github.com/torvalds/linux/blob/fec88ab0af9706b2201e5daf377c5031c62d11f7/arch/arm64/kernel/head.S#L822

We're considering it but it requires some non-trivial changes and so won't be ready anytime soon.

pcc added a comment.Jul 15 2019, 7:18 PM
In D64711#1586738, @pcc wrote:
In D64711#1584916, @pcc wrote:

I'd like to understand the motivation behind this change a little better because the relaxation of the code model from PIC to static is technically an "optimization" that the linker should have license to perform and in fact is already performing in some ways (e.g. emitting addresses into the binary instead of RELATIVE/RELR relocations for statically known addresses).

In the case of our kernel, we need the generated code to be position independent for our KASLR scheme, but we're not doing traditional PIE link. When compiling individual translation units, we use -fpie, but when linking we use --no-pie. Changing the code model from PIC to static in this case results in an invalid kernel image because the generated code is not relocatable. I think it's a matter of principle that it shouldn't be the linker's responsibility to decide how to combine our code-generation choices with our linking choices. I'm open to suggestions for other ways to solve this issue though as we'd really like to be able to use ThinLTO for our kernel.

Have you considered linking the kernel as a PIE image when KASLR is enabled? This will fix not only this problem but also others (e.g. initializers containing global pointers, which might not just be explicit in the source code but could also be implicitly generated by the compiler, are broken with this scheme).

This is what the Linux kernel does (it uses -shared -Bsymbolic instead of -pie in KASLR mode, but they are almost equivalent):
https://github.com/torvalds/linux/blob/fec88ab0af9706b2201e5daf377c5031c62d11f7/arch/arm64/Makefile#L21

It uses assembly to self relocate to avoid depending on relative relocations during early startup:
https://github.com/torvalds/linux/blob/fec88ab0af9706b2201e5daf377c5031c62d11f7/arch/arm64/kernel/head.S#L822

We're considering it but it requires some non-trivial changes and so won't be ready anytime soon.

I guess I'd be fine with making it so that passing -relocation-model explicitly would override lld's decision (see e.g. how we handle function sections/data sections in the gold plugin). Then you can use -mllvm -relocation-model=pic for now.

ormris added a subscriber: ormris.Jul 16 2019, 10:47 AM
In D64711#1586864, @pcc wrote:

I guess I'd be fine with making it so that passing -relocation-model explicitly would override lld's decision (see e.g. how we handle function sections/data sections in the gold plugin). Then you can use -mllvm -relocation-model=pic for now.

I'd like to better understand the rationale behind the current implementation: eliding relocations feels qualitatively different because that's just an optimization of the linking work, whereas what's being done today in lld is effectively ignoring/overriding the (per-TU) compile-time decision which can lead to surprising results as in our case where we explicitly rely on the distinction between the compilation and linking stages. The LTO code generator should be allowed to relax when instructed to target non-PIC, but that's orthogonal to the linker's role as a linker.

Ultimately what really matters to us is that there's some combination of switches that does what we say, so if there's a strong desire to preserve the current semantics, I'm happy to implement the option to override the relocation model via -mllvm -relocation-model=pic as suggested.

pcc added a comment.Jul 16 2019, 8:38 PM

As I see it it's not too different from e.g. internalizing or hiding symbols or propagating DSO-local, which is, in the end, also overriding a compile-time decision. I think that taking this sort of stance in general makes it hard to say where to draw the line on what kind of optimizations are permitted. If we do switch to the compile-time relocation model, should the backend then be prevented from generating jump tables with absolute addresses when linking without -pie or -shared because it could cause problems if the binary happens to end up being loaded at the wrong address? That doesn't seem like a good idea to me. It seems more practical to say, at least in this case, that the binary that you get from a non-PIC link is, in fact, position-dependent, and it is an error to use it position-independently.

I am with Peter. I don't know much about LTO but just wanted to say it is fairly common to compile with the less efficient -fPIC, and then link twice: -no-pie/-pie (executable) and -shared. A lot of linker optimizations (got->pcrel, TLS GD/LD->IE/LE, internalizing, etc) happen when you link -fPIC objects in the -no-pie/-pie way. Passing -mllvm -relocation-model explicitly to achieve your intended effect seems practical. It just should not be the default.

phosek abandoned this revision.Aug 1 2019, 6:47 PM

This was addressed in rL366644.