This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Disable relocation validation when targeting weak undefined symbols
AbandonedPublic

Authored by jhenderson on Jul 27 2017, 8:09 AM.

Details

Reviewers
ruiu
rafael
Summary

When referencing weak undefined symbols, relocations are patched as though they target address zero. However, in the case of relative relocations, if the address being patched is large, the relocation will fail with an out of range error. This is unhelpful to the user, and they can only work around it by changing the base address of their program or explicitly defining the symbol, if this validation fails, which may not be desired.

This patch disables the validation of relocations when they target weak undefined symbols, in order to work around this issue. This is safe, because all such calls should be guarded anyway to ensure that the symbol really is defined.

Diff Detail

Event Timeline

jhenderson created this revision.Jul 27 2017, 8:09 AM
ruiu added inline comments.Jul 27 2017, 9:54 AM
ELF/InputSection.cpp
731–733

I wonder if you really have to change this many places to handle weak undefined symbols. If you don't call relocateOne if UndefinedWeak is true, doesn't it work?

davidb added a subscriber: davidb.Jul 27 2017, 10:02 AM
jhenderson added inline comments.Jul 28 2017, 1:10 AM
ELF/InputSection.cpp
731–733

I'm happy to do that. The only reason I did it this way was to match ld.bfd and gold's behaviour in this situation, but I don't think it really matters (the call should never be executed anyway).

jhenderson added inline comments.Jul 28 2017, 7:24 AM
ELF/InputSection.cpp
731–733

I've looked at making a version of this change that does what you suggested. However, this alternative behaviour breaks the guarantee provided by the ELF specification, that undefined weak symbols are treated as having a value of zero (ignoring the ARM special case). I'm concerned that there might be some issue with a corner case that relies on this behaviour.

The end result of this alternative change would be that either zero or the addend would be left in place in the instruction (depending on whether RELA or REL relocations are used), as opposed to zero or a value relative to zero (depending on whether we are dealing with an absolute or relative relocation).

davidb added inline comments.Jul 28 2017, 7:38 AM
ELF/InputSection.cpp
731–733

The change as it currently stands doesn't necessarily guarantee the result of the relocation patch is zero?

Can't you just enforce TargetVA to 0 when invoking relocateOne?

davidb added inline comments.Jul 28 2017, 7:40 AM
ELF/InputSection.cpp
731–733

With the ARM special case, are you referring to PC-biasing?

ruiu edited edge metadata.Jul 28 2017, 10:33 AM

Yeah. If the ELF specification says that unresolved weak symbols must be handled as address zero, and if address zero is not representable for some relocation, then I think the logical consequence is we should report it as error.

jhenderson abandoned this revision.Jul 31 2017, 9:08 AM

Having considered this, and looked at the latest behaviour of ld.bfd and gold (they both emit an error in this case), I think this change is not necessary after all.

ELF/InputSection.cpp
731–733

The change as it currently stands doesn't necessarily guarantee the result of the relocation patch is zero?

This isn't what is required by the standard, and isn't what is being attempted here. The idea is that the relocation is performed as if the virtual address of the target is zero, so a non-zero value will usually end up being patched in for relative relocations. An exception is that on ARM/AARCH64, PC-relative references are patched such that the branch etc ends up jumping to the next instruction (see getARMUndefinedRelativeWeakVA in LLD's code base).

Unfortunately treating a symbol as having a value of zero when the rest of the program is at a very high address can lead to out-of-range errors, hence the purpose of this change.

bd1976llvm added a comment.EditedJul 31 2017, 2:08 PM
In D35944#824267, @ruiu wrote:

Yeah. If the ELF specification says that unresolved weak symbols must be handled as address zero, and if address zero is not representable for some relocation, then I think the logical consequence is we should report it as error.

Hi Rui,

Although this is being abandoned I wanted to point out that the behaviour of either ignoring the overflowing relocation or not performing the validation is correct and lld's current behavior is broken.
To see why consider the following example:

#pragma weak foo
void foo();
void main() {if (foo) foo();}

Here I only want to call foo if there is a definition. To do this I am using the "pragma weak" feature. The documentation for the feature states:

#pragma weak symbol
This pragma declares symbol to be weak, as if the declaration had the attribute of the same name. The pragma may appear before or after the declaration of symbol. It is not an error for symbol to never be defined at all.

However, with the current behavior of lld if I link this program at high address (e.g: lld main.o -Ttext=0xBADBADBAD) then lld errors with a relocation overflow error - lld, currently, breaks the "pragma weak" feature which guarantees that "it is not an error for the symbol to never be defined at all".

ruiu added a comment.Jul 31 2017, 5:10 PM

Hi Ben,

Is that a real scenario? I mean, if function foo is defined in the same compilation unit as main, it is resolved locally, and if function foo is not defined in the same compilation unit as main, then the compiler emits a call instruction that can jump to any address (because the compiler doesn't know where foo will be after linking), no?

bd1976llvm added a comment.EditedAug 1 2017, 6:03 AM

Hi Rui,

Is that a real scenario?

This absolutely is a real example. In fact that is the canonical code sequence for using weak symbols.

I mean, if function foo is defined in the same compilation unit as main, it is resolved locally, and if function foo is not defined in the same compilation unit as main, then the compiler emits a call instruction that can jump to any address (because the compiler doesn't know where foo will be after linking), no?

I'm afraid not. I am skipping some platform specific details here but in general the compiler must codegen so that the "if (foo)" test uses an absolute relocation however, it is free to use any valid code sequence for the call to foo. For example, when using the small code model on the x86_64 platform the compiler would be expected to use codegen that requires a 32 bit relative relocation for the call. For power-pc the call would be expected to require a 24 bit relative relocation etc...

Your next question might be: Why can't we require that the codegen always uses references with a large enough range? Well, we can't do that as it would penalize the codegen for weak references.

In some cases the linker could change the code or generate trampolines to make the call reach address 0 but this is extra work that the linker should not do as these references are invalid anyway.

Rafael via mailing list:

But isn't it invalid to ask for small code model and use high addresses?

Clang's behavior is inconsistent among architectures. For example, given


extern int foo1 attribute((weak));
int *bar1() { return &foo1; }
extern int foo2;

int *bar2() { return &foo2; }

And the command line

clang -target aarch64-pc-linux -S -Os test.c -fPIE -mpie-copy-relocations

We get

adrp    x0, :got:foo1
ldr     x0, [x0, :got_lo12:foo1]

...

adrp    x0, foo2
add     x0, x0, :lo12:foo2

but for x86_64 clang fails to take into consideration the fact that 0
might be out of range:

leaq    foo1(%rip), %rax
retq

leaq    foo2(%rip), %rax
retq

I think that is a bug in llvm on x86_64. I will take a quick look.

Cheers,
Rafael

Hi Rafael,

The behaviour for weak references is different across different platforms.

For example on arm and power you can write.

void f1() attribute((weak));
void main() {f1();}

Which can be expected to work correctly. However, this may fault on x86_64.

I think that is a bug in llvm on x86_64. I will take a quick look.

That does look like a bug. However, in general the codegen should not account for the possibility that the reference is unresolved - because we want to generate identical code as for strong references for the case that the weak reference is resolved. i.e:

"void f1() attribute((weak)); void f2() {f1();}" and "void f1(); void f2() {f1();}"

should codegen identically.

Another way to see what the right behaviour should be here is to draw a comparison with the lld feature "--unresolved-symbols=ignore-all".

If I link this example:

void f1(); void main() {f1();}

At a high address, e.g. "lld main.o -Ttext=0xbadbadbad" - I get an undefined symbol error. If I link the same example with: "lld main.o -Ttext=0xbadbadbad --unresolved-symbols=ignore-all" the link should succeed. It would clearly be a bug here to warn about an overflowing relocation - the same logic should apply to weak references.