Page MenuHomePhabricator

[LLD][ELF] - A fix for "linker script assignment loses relative nature of section" bug.
ClosedPublic

Authored by grimar on Dec 7 2018, 3:20 AM.

Details

Summary

This is https://bugs.llvm.org//show_bug.cgi?id=39857.
I added the comment with much more details to the bug page,
the short version is below.

The following script and code demonstrates the issue:

aliasto__text = __text;
 SECTIONS {
  .text 0x1000 : { __text = . ; *(.text) }
 }
call aliasto__text@PLT

LLD fails with "cannot refer to absolute symbol: aliasto__text" error.
It happens because at the moment of scanning the relocations
we do not yet assign the correct/final/any section value for the symbol aliasto__text.
I made a change to Relocations.cpp to workaround that.

I had to remove the symbol-location.s test case completely, because now it does not
trigger any error. Since now all linker scripts symbols are resolved to constants, no errors can be triggered
at all it seems. I checked that it is consistent with the behavior of bfd and gold (they do not trigger errors
for the case from symbol-location.s), so it should be OK. I.e. at least it is probably not the best possible, but
natural behavior.

Diff Detail

Repository
rL LLVM

Event Timeline

grimar created this revision.Dec 7 2018, 3:20 AM
grimar edited the summary of this revision. (Show Details)Dec 7 2018, 3:26 AM
grimar updated this revision to Diff 177169.Dec 7 2018, 3:43 AM
  • Updated the comment.
grimar added a comment.Dec 7 2018, 5:16 AM

Note that for some targets we already call the assignAddresses at least twice.
That happens because maybeAddThunks calls it.

At first, I thought to reuse maybeAddThunks, it could always run if Script->HasSectionsCommand set,
but it is perhaps not the right place, so I added `assignAddresses to Writer<ELFT>::run().

Thanks for the fix. I kind of thought it might be possible to lazily postpone symbol assignments to other symbols till after linker scripts but this requires quite a bit of contextual information to know when we need the value of a symbol expression. It looks like ld.bfd has a similar post-script fixup for symbol sections and values although it just evaluates all linker defined symbols again rather than all the addresses again. That could be another option if there were performance concerns about redoing the whole of assignAddresses.

One thing I haven't checked through yet is whether there is a more specific check for a relocation to an absolute address. Ideally we wouldn't want to allow a relative relocation to symbol = 0x1000 but we would symbol = some_other_symbol. I don't know whether that is possible though given the context available at the time. Letting it through is probably the best option (rely on the user knowing what they are doing).

grimar added a comment.Dec 7 2018, 6:10 AM

One thing I haven't checked through yet is whether there is a more specific check for a relocation to an absolute address. Ideally we wouldn't want to allow a relative relocation to symbol = 0x1000 but we would symbol = some_other_symbol. I don't know whether that is possible though given the context available at the time. Letting it through is probably the best option (rely on the user knowing what they are doing).

I agree. In theory, we could do some additional fixup of the symbols values before scanning the relocations, it would allow distinguishing between non-absolute and absolute symbols,
that is what I also tried when experimented with the code for this patch, it was:

void LinkerScript::assignSymbolSections() {
  for (BaseCommand *Base : Script->SectionCommands) {
    auto *Cmd = dyn_cast<SymbolAssignment>(Base);
    if (!Cmd || !Cmd->Sym)
      continue;

    auto Deleter = make_unique<AddressState>();
    Ctx = Deleter.get();
    Ctx->OutSec = Aether;

    ExprValue V = Cmd->Expression();
    if (!V.isAbsolute())
      Cmd->Sym->Section = V.Sec;
  }
}

But it is too complicated and this patch is much simpler without such things.

ruiu added a comment.Dec 7 2018, 4:52 PM

This seems tricky to me. I wonder if you can just fix the problem by reordering symbol assignments in your linker scripts. Also, by allowing forward reference in an assignment, you could theoretically write circular symbol assignments, e.g. foo=bar and bar=foo. That is logically a bit too complicated to me. Maybe we should just keep the linearity of the symbol assignment, instead of supporting this corner case?

This seems tricky to me. I wonder if you can just fix the problem by reordering symbol assignments in your linker scripts.

Reordering can help to set the correct symbols values. I.e. the following would work for PR:

SECTIONS { .text 0x1000 : { __text = . ; *(.text) } }
aliasto__text = __text;

But it does not look like a final solution. Without reordering we still produce the broken output which might be hard to diagnose.
And I think the initial pattern is used in the wild.

As an example: we have a addr-zero.test test case already (implemented in rL298083) which is broken right now after recent rL332038.
The problem has the same roots, just reordering the sections broke the correctness of symbols evaluation. Doing the one more pass would
solve this once and forever it seems.

Also, by allowing forward reference in an assignment, you could theoretically write circular symbol assignments, e.g. foo=bar and bar=foo.

LLD already accepts this. For script containing foo = boo; boo = foo; we create 2 absolute zero symbols.
For just foo = boo we report a "symbol not found: boo" as expected.

It is not ideal, It happens because we declare the symbols early anyways as a Defined to let
they exist before starting LTO and versioning.

But I think that is just one more example that linker scripts should be used carefully.
I do not think we really should care about a fix for this, because gold has equal behavior it seems (though bfd - not).

Maybe we should just keep the linearity of the symbol assignment, instead of supporting this corner case?

That sure would be the simplest and desired way, if we could drop the existent scripts I think and write LS logic from scratch.
But we have to support non-linearity behavior which is assumed to work I think. Or report an error.

I see one more argument that I find important for the LLD design. See our code in maybeAddThunks:
https://github.com/llvm-mirror/lld/blob/master/ELF/Writer.cpp#L1500

It was implemented to create thunks, but now it does not only that. It is called not only when Target->NeedsThunks,
but also when packed relocations are used. As a result, for some targets, we always call assignAddresses at least once again,
and for others - never do.

This makes the behavior for those platforms to be different from the common case (assuming it is x86-x64).
It is not a problem to call assignAddresses few times, it was designed for that.
And in this patch, I had to add assignAddresses to make other targets, that do not use maybeAddThunks to be happy.
I would prefer to have the consistent behavior of the linker script for all targets. I think we could rename maybeAddThunks
to something else, update its comment and make assignAddresses be always called at least once for each target.
That would remove 2 of 4 code lines this patch has. It is very short for the problem it solves.

I quite like the idea of renaming maybeNeedsThunks(), for example addAddressDependentContent()? We could make sure that there is always at least one pass of assignAddresses done there.

The test case was derived from the linux kernel linker script for AArch64 which would have assignAddresses() called more than once due to AArch64 needing thunks. It is tempting to say just rearrange your linker script but this is not always easy as complex linker scripts are often machine generated. For example the kernel linker script is actually generated from https://github.com/torvalds/linux/blob/master/arch/arm64/kernel/vmlinux.lds.S using macros from https://github.com/torvalds/linux/blob/master/arch/arm64/kernel/image.h .

My understanding is that AArch64 (Arm64 in kernel parlance) is the priority target for using LLD to link the kernel, so I guess the Relocations.cpp would be sufficient on AArch64 and Arm. I don't know if the X86 kernel linker script would suffer from this problem, but it is certainly a possibility.

My understanding is that AArch64 (Arm64 in kernel parlance) is the priority target for using LLD to link the kernel, so I guess the Relocations.cpp would be sufficient on AArch64 and Arm. I don't know if the X86 kernel linker script would suffer from this problem, but it is certainly a possibility.

Yes, it worked for AArch64 without changes to Writer.cpp, but I think it is important from many sides to have common logic behavior to be consistent,
so I had to fix this for x64 and hence did the second change.

I quite like the idea of renaming maybeNeedsThunks(), for example addAddressDependentContent()? We could make sure that there is always at least one pass of assignAddresses done there.

I decided to name it finalizeAddressDependentContent in D55550 because finalize seems to be consistent prefix with the functions called around.

ruiu added a comment.Dec 17 2018, 4:59 PM

I think I'm not still convinced that we should support forward declaration. How often do you really need that? Also, how is it hard/easy to detect if something is a forward reference? How much easy/hard to fix your linker scripts if yours is already using that pattern?

I believe we should at least ban circular definitions such as foo=bar; bar=foo; because it doesn't make any sense.

I think I'm not still convinced that we should support forward declaration. How often do you really need that? Also, how is it hard/easy to detect if something is a forward reference? How much easy/hard to fix your linker scripts if yours is already using that pattern?

I believe we should at least ban circular definitions such as foo=bar; bar=foo; because it doesn't make any sense.

I am not trying to support the forward declaration. We already do that (example: addr-zero.test).
I think it is a bit hard to detect that given the approach we have (define all linker script symbols early) and I am not sure we should care.
Some of the scripts in the wild (including apps in the BSD world) were using it already and that is why we supported that initially, I think.
Scripts require careful use anyways.

Please take a look at the https://reviews.llvm.org/D55550 first. I think D55550 is a very reasonable change.
With the D55550, this patch becomes really trivial/natural.

nickdesaulniers added inline comments.Apr 2 2019, 3:36 AM
ELF/Relocations.cpp
387–388 ↗(On Diff #177169)

@peter.smith suggested that just this part of this patch is what I need in order to link an arm64 Linux kernel with KASLR enabled with LLD (https://bugs.llvm.org/show_bug.cgi?id=39810#c11).

I was able to confirm that this change was all that I needed to link a bootable kernel image.

Can I please have just this change split out of the patch? Surely this part is less controversial?

grimar marked an inline comment as done.Apr 2 2019, 7:28 AM
grimar added inline comments.
ELF/Relocations.cpp
387–388 ↗(On Diff #177169)

If nobody will so that earlier, I'll take a look at how to extract this part with an appropriate test case tomorrow.

(Though I still think we just need to land D55550, and then this part. Having a different behavior of linker script on different architectures is just weird).

peter.smith added inline comments.Apr 2 2019, 8:09 AM
ELF/Relocations.cpp
387–388 ↗(On Diff #177169)

The pr contains an AArch64 version of the test. It should fail with a link error with -pie. FWIW I agree with George that D55550 is worth landing as well. Linker scripts are often generated from a large set of macros and it isn't always easy to hand edit the files to avoid forward references.

grimar marked an inline comment as done.Apr 3 2019, 5:34 AM
grimar added inline comments.
ELF/Relocations.cpp
387–388 ↗(On Diff #177169)

So I ended up with the following test case:

# REQUIRES: aarch64
# RUN: llvm-mc -filetype=obj -triple=aarch64-linux-gnu %s -o %t.o
# RUN: echo "aliasto__text = __text; SECTIONS { .text 0x1000 : { __text = . ; *(.text) } }" > %t.script
# RUN: ld.lld -pie -o %t --script %t.script %t.o
# RUN: llvm-readobj -symbols %t | FileCheck %s

## Check that alias 'aliasto__text' has the correct value.
## (It should belong to the section .text and point to it's start).

## TODO: This could be a test for x86 platform, but currently, LLD
## has an inconsistent behavior of the linker script between x86 and aarch64.
## Under x86, `aliasto__text` would not get the correct value and test would fail.

# CHECK:      Symbol {
# CHECK:        Name: __text
# CHECK-NEXT:   Value: 0x1000
# CHECK-NEXT:   Size: 0
# CHECK-NEXT:   Binding: Global
# CHECK-NEXT:   Type: None
# CHECK-NEXT:   Other: 0
# CHECK-NEXT:   Section: .text
# CHECK-NEXT: }

# CHECK:      Symbol {
# CHECK:        Name: aliasto__text
# CHECK-NEXT:   Value: 0x1000
# CHECK-NEXT:   Size: 0
# CHECK-NEXT:   Binding: Global
# CHECK-NEXT:   Type: None
# CHECK-NEXT:   Other: 0
# CHECK-NEXT:   Section: .text
# CHECK-NEXT: }

.text
.globl _start
.type _start, %function
_start:
.globl aliasto__text
  bl __text
  bl aliasto__text

I.e. it still uses forward declaration for aliasto__text. I found no way to make this piece of the code be useful without
doing that.

Rui's comment above says:
"I think I'm not still convinced that we should support forward declaration. How often do you really need that? Also, how is it hard/easy to detect if something is a forward reference? How much easy/hard to fix your linker scripts if yours is already using that pattern?
I believe we should at least ban circular definitions such as foo=bar; bar=foo; because it doesn't make any sense."

I.e. he was against doing it that time. My conclusion remains the same: if you want to support such forward declarations we want to land D55550 + rebased version of this patch to avoid mess about different behavior between aarch64 and x86 and fix it all at once. I do not see for what another case this piece can be useful. (If it is useful for some other case, please let me know, then it would be probably possible to justify such change).

Thanks for the test suggestion, that looks good to me. My understanding is that this would unblock the Linux Kernel KASLR build on AArch64. The upstream linux maintainers are not willing to change the linker script for what seems like a linker limitation/bug.

My understanding is that only the Arm and AArch64 Linux kernel's are actively using LLD right now and the Linker script is specific to those builds so this half-fix in practice would be a step forward.

E5ten added a subscriber: E5ten.Apr 4 2019, 4:23 PM
grimar updated this revision to Diff 195686.Apr 18 2019, 1:55 AM
grimar edited the summary of this revision. (Show Details)
  • Simplified after landing the D55550.
  • Had to remove symbol-location.s test completely. I updated the patch description to explain why.
grimar edited the summary of this revision. (Show Details)Apr 18 2019, 1:58 AM
grimar updated this revision to Diff 195688.Apr 18 2019, 2:09 AM
  • Remove @PLT suffixes from the calls in the test. I think I put them by mistake in the initial version of the patch.
ruiu accepted this revision.Apr 18 2019, 2:12 AM

LGTM

This revision is now accepted and ready to land.Apr 18 2019, 2:12 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 18 2019, 2:58 AM

@grimar - this patch causes a regression in behaviour for references to script symbols. I think https://reviews.llvm.org/D61298 is a reasonable fix. Could you take a look?