Page MenuHomePhabricator

[LLD][ELF] - A fix for "linker script assignment loses relative nature of section" bug.
Needs ReviewPublic

Authored by grimar on Dec 7 2018, 3:20 AM.

Details

Reviewers
ruiu
espindola
Summary

This is https://bugs.llvm.org//show_bug.cgi?id=39857.
I added the comment with much more details to the bug page,
the short version is below.

The following script and code demonstrates the issue:

aliasto__text = __text;
 SECTIONS {
  .text 0x1000 : { __text = . ; *(.text) }
 }
call aliasto__text@PLT

LLD fails with "cannot refer to absolute symbol: aliasto__text" error.
It happens because at the moment of scanning the relocations
we do not yet assign the correct/final/any section value for the symbol aliasto__text.
I made a change to Relocations.cpp to workaround that, but it is not enough to fix the PR.

Another change in Writer.cpp is used to set the correct symbol section value.
Without that change, aliasto__text would be assigned to the first section
(.dynamic in our case). Seems a fine way is to do an additional iteration,
and that is what this patch does.

This also fixed one more bug we had (see the change in addr-zero.test, it was broken).

Diff Detail

Event Timeline

grimar created this revision.Dec 7 2018, 3:20 AM
grimar edited the summary of this revision. (Show Details)Dec 7 2018, 3:26 AM
grimar updated this revision to Diff 177169.Dec 7 2018, 3:43 AM
  • Updated the comment.
grimar added a comment.Dec 7 2018, 5:16 AM

Note that for some targets we already call the assignAddresses at least twice.
That happens because maybeAddThunks calls it.

At first, I thought to reuse maybeAddThunks, it could always run if Script->HasSectionsCommand set,
but it is perhaps not the right place, so I added `assignAddresses to Writer<ELFT>::run().

Thanks for the fix. I kind of thought it might be possible to lazily postpone symbol assignments to other symbols till after linker scripts but this requires quite a bit of contextual information to know when we need the value of a symbol expression. It looks like ld.bfd has a similar post-script fixup for symbol sections and values although it just evaluates all linker defined symbols again rather than all the addresses again. That could be another option if there were performance concerns about redoing the whole of assignAddresses.

One thing I haven't checked through yet is whether there is a more specific check for a relocation to an absolute address. Ideally we wouldn't want to allow a relative relocation to symbol = 0x1000 but we would symbol = some_other_symbol. I don't know whether that is possible though given the context available at the time. Letting it through is probably the best option (rely on the user knowing what they are doing).

grimar added a comment.Dec 7 2018, 6:10 AM

One thing I haven't checked through yet is whether there is a more specific check for a relocation to an absolute address. Ideally we wouldn't want to allow a relative relocation to symbol = 0x1000 but we would symbol = some_other_symbol. I don't know whether that is possible though given the context available at the time. Letting it through is probably the best option (rely on the user knowing what they are doing).

I agree. In theory, we could do some additional fixup of the symbols values before scanning the relocations, it would allow distinguishing between non-absolute and absolute symbols,
that is what I also tried when experimented with the code for this patch, it was:

void LinkerScript::assignSymbolSections() {
  for (BaseCommand *Base : Script->SectionCommands) {
    auto *Cmd = dyn_cast<SymbolAssignment>(Base);
    if (!Cmd || !Cmd->Sym)
      continue;

    auto Deleter = make_unique<AddressState>();
    Ctx = Deleter.get();
    Ctx->OutSec = Aether;

    ExprValue V = Cmd->Expression();
    if (!V.isAbsolute())
      Cmd->Sym->Section = V.Sec;
  }
}

But it is too complicated and this patch is much simpler without such things.

ruiu added a comment.Dec 7 2018, 4:52 PM

This seems tricky to me. I wonder if you can just fix the problem by reordering symbol assignments in your linker scripts. Also, by allowing forward reference in an assignment, you could theoretically write circular symbol assignments, e.g. foo=bar and bar=foo. That is logically a bit too complicated to me. Maybe we should just keep the linearity of the symbol assignment, instead of supporting this corner case?

This seems tricky to me. I wonder if you can just fix the problem by reordering symbol assignments in your linker scripts.

Reordering can help to set the correct symbols values. I.e. the following would work for PR:

SECTIONS { .text 0x1000 : { __text = . ; *(.text) } }
aliasto__text = __text;

But it does not look like a final solution. Without reordering we still produce the broken output which might be hard to diagnose.
And I think the initial pattern is used in the wild.

As an example: we have a addr-zero.test test case already (implemented in rL298083) which is broken right now after recent rL332038.
The problem has the same roots, just reordering the sections broke the correctness of symbols evaluation. Doing the one more pass would
solve this once and forever it seems.

Also, by allowing forward reference in an assignment, you could theoretically write circular symbol assignments, e.g. foo=bar and bar=foo.

LLD already accepts this. For script containing foo = boo; boo = foo; we create 2 absolute zero symbols.
For just foo = boo we report a "symbol not found: boo" as expected.

It is not ideal, It happens because we declare the symbols early anyways as a Defined to let
they exist before starting LTO and versioning.

But I think that is just one more example that linker scripts should be used carefully.
I do not think we really should care about a fix for this, because gold has equal behavior it seems (though bfd - not).

Maybe we should just keep the linearity of the symbol assignment, instead of supporting this corner case?

That sure would be the simplest and desired way, if we could drop the existent scripts I think and write LS logic from scratch.
But we have to support non-linearity behavior which is assumed to work I think. Or report an error.

I see one more argument that I find important for the LLD design. See our code in maybeAddThunks:
https://github.com/llvm-mirror/lld/blob/master/ELF/Writer.cpp#L1500

It was implemented to create thunks, but now it does not only that. It is called not only when Target->NeedsThunks,
but also when packed relocations are used. As a result, for some targets, we always call assignAddresses at least once again,
and for others - never do.

This makes the behavior for those platforms to be different from the common case (assuming it is x86-x64).
It is not a problem to call assignAddresses few times, it was designed for that.
And in this patch, I had to add assignAddresses to make other targets, that do not use maybeAddThunks to be happy.
I would prefer to have the consistent behavior of the linker script for all targets. I think we could rename maybeAddThunks
to something else, update its comment and make assignAddresses be always called at least once for each target.
That would remove 2 of 4 code lines this patch has. It is very short for the problem it solves.

I quite like the idea of renaming maybeNeedsThunks(), for example addAddressDependentContent()? We could make sure that there is always at least one pass of assignAddresses done there.

The test case was derived from the linux kernel linker script for AArch64 which would have assignAddresses() called more than once due to AArch64 needing thunks. It is tempting to say just rearrange your linker script but this is not always easy as complex linker scripts are often machine generated. For example the kernel linker script is actually generated from https://github.com/torvalds/linux/blob/master/arch/arm64/kernel/vmlinux.lds.S using macros from https://github.com/torvalds/linux/blob/master/arch/arm64/kernel/image.h .

My understanding is that AArch64 (Arm64 in kernel parlance) is the priority target for using LLD to link the kernel, so I guess the Relocations.cpp would be sufficient on AArch64 and Arm. I don't know if the X86 kernel linker script would suffer from this problem, but it is certainly a possibility.

My understanding is that AArch64 (Arm64 in kernel parlance) is the priority target for using LLD to link the kernel, so I guess the Relocations.cpp would be sufficient on AArch64 and Arm. I don't know if the X86 kernel linker script would suffer from this problem, but it is certainly a possibility.

Yes, it worked for AArch64 without changes to Writer.cpp, but I think it is important from many sides to have common logic behavior to be consistent,
so I had to fix this for x64 and hence did the second change.

I quite like the idea of renaming maybeNeedsThunks(), for example addAddressDependentContent()? We could make sure that there is always at least one pass of assignAddresses done there.

I decided to name it finalizeAddressDependentContent in D55550 because finalize seems to be consistent prefix with the functions called around.

ruiu added a comment.Dec 17 2018, 4:59 PM

I think I'm not still convinced that we should support forward declaration. How often do you really need that? Also, how is it hard/easy to detect if something is a forward reference? How much easy/hard to fix your linker scripts if yours is already using that pattern?

I believe we should at least ban circular definitions such as foo=bar; bar=foo; because it doesn't make any sense.

I think I'm not still convinced that we should support forward declaration. How often do you really need that? Also, how is it hard/easy to detect if something is a forward reference? How much easy/hard to fix your linker scripts if yours is already using that pattern?

I believe we should at least ban circular definitions such as foo=bar; bar=foo; because it doesn't make any sense.

I am not trying to support the forward declaration. We already do that (example: addr-zero.test).
I think it is a bit hard to detect that given the approach we have (define all linker script symbols early) and I am not sure we should care.
Some of the scripts in the wild (including apps in the BSD world) were using it already and that is why we supported that initially, I think.
Scripts require careful use anyways.

Please take a look at the https://reviews.llvm.org/D55550 first. I think D55550 is a very reasonable change.
With the D55550, this patch becomes really trivial/natural.