Page MenuHomePhabricator

[lld][RISCV] Fixup PC-relative relocations to undefined weak symbols.
Needs ReviewPublic

Authored by bsdjhb on Dec 5 2019, 4:51 PM.

Details

Summary

Load an explicit NULL address for non-PIC linkage when possible.
Report an error for conditional branches that target undefined weak
symbols.

Event Timeline

bsdjhb created this revision.Dec 5 2019, 4:51 PM
bsdjhb marked 2 inline comments as done.Dec 5 2019, 4:57 PM

This is needed to link a FreeBSD/riscv64 kernel which uses undefined weak symbols in the uart(4) driver.

lld/ELF/InputSection.cpp
999

These cases do not handle compressed jump and compressed branch relocations. I tried to use 'c.j' to in the new test case but the assembler turned it into a 4-byte 'j' instruction instead (even when forced to riscv32 and using '-mattr=+c'. Right now those relocations would at least exit with an error, but if I can fix the test case to exercise them I'm happy to fix it.

1019

This can result in duplicate warnings from getRISCVPCRelHi20(). One option @jrtc27 suggested was that perhaps we could lift the logic to compute the targetVA from hiRel out of getRelocTargetVA() to here so that we only call getRISCVPCRelHi20() once. Another option might be to make the nested call in getRelocTargetVA() for 'hiRel' to instead return a targetVA of 0 by changing the R_PC for undefined weak symbols in RISC-V to return 0 instead of dest - p. We kind of do this now for EM_PPC it seems by setting 'dest' to 'p' so that it effectively returns 0.

MaskRay added inline comments.Dec 5 2019, 5:23 PM
lld/ELF/InputSection.cpp
982

Adding the code here is not correct. The most appropriate place is getRelocTargetVA. However, I don't recommend adding any logic at all because weak undefined symbols do not have defined semantics. This means multiple things:

  1. their behaviors may be different in different toolchains
  2. their behavior may change with various configuration (-no-pie, -pie, -shared) even in the same toolchain

The binutils behaviors may sometime not easy to mimic in lld because their internals are so different.

I did add the following two lines for EM_PPC, but that was because I had no choice but to support that glibc use case.

else if (config->emachine == EM_PPC)
  dest = p;

For anything newly developed, we should try our best to avoid weak undefined symbols.

Can you share with me some instructions to build FreeBSD RISC-V?

lld/test/ELF/riscv-undefined-weak-branch.s
2

Delete -none-linux as this should be generic. Prefer # and ## for comments. See precedents in this directory.

lld/test/ELF/riscv-undefined-weak.s
31–33

If you can use CHECK-NEXT for all lines, you can delete addresses. They are not significant for the test.

jrtc27 added inline comments.Dec 5 2019, 5:36 PM
lld/ELF/InputSection.cpp
982

*Calling* weak undefined symbols does not necessarily have well-defined semantics, depending on the architecture, but referencing them absolutely does (namely that the result of the code sequence is NULL) and is required for many systems, not just FreeBSD. The issue here is that, with -mcmodel=medany (LLVM's medium) in position-dependent code generation, the addressing mode is PC-relative, which breaks if linked at an address beyond 2 GiB as the 32-bit immediate pair is too small to negate PC, but the whole point of medany is to allow linking code at any address, including above the 2 GiB mark (otherwise you'd just use medlow).

jrtc27 added inline comments.Dec 5 2019, 5:39 PM
lld/ELF/InputSection.cpp
982

The alternative is to generate code using the GOT for extern weak symbols, as is done on AArch64 (which also otherwise uses PC-relative addressing), but that would require changes for both GCC and LLVM.

jrtc27 added inline comments.Dec 5 2019, 5:46 PM
lld/ELF/InputSection.cpp
982

Moreover, even if calling undefined weak functions is not well-defined in the ABI, it still needs to link without error to *something*, otherwise you can't write the perfectly-legitimate common construct that is:

extern void foo(void) __attribute__((weak));
if (&foo)
  foo();

The call can be turned into whatever you like at link time if the symbol is undefined, because it should be guarded by an if (and it's undefined behaviour otherwise, I would assume), but the linker needs to do something that's not failing with an error.

MaskRay added inline comments.Dec 5 2019, 6:03 PM
lld/ELF/InputSection.cpp
982

No. http://www.sco.com/developers/gabi/latest/ch4.symtab.html

The behavior of weak symbols in areas not specified by this document is implementation defined. Weak symbols are intended primarily for use in system software. Applications using weak symbols are unreliable since changes in the runtime environment might cause the execution to fail.

You may argue that we are targeting system software, but I still want to say that the GNU ld behavior may vary according to (1) instruction types, or (2) executable vs shared objects. lld's treatment is still different from GNU ld and gold in some cases on other architectures.

extern void foo(void) __attribute__((weak));
if (&foo)
  foo();

I hope we can make some efforts to get rid of such constructs in FreeBSD first, before trying a linker workaround.

I haven't checked the logic below clearly. It appears to be more of a hack.

hiRel && hiRel->type == R_RISCV_PCREL_HI20
            && hiRel->sym->isUndefWeak() && hiRel->addend == 0
982

The alternative is to generate code using the GOT for extern weak symbols, as is done on AArch64 (which also otherwise uses PC-relative addressing), but that would require changes for both GCC and LLVM.

Fixing GCC/llvm sounds a good idea to me. I can check how to fix it in llvm. Apparently PPC64 also decided to use GOT when they migrated from PPC32. (I am now just very concerned that some binutils/gcc decisions were made before people think hard about the implications... from the binutils bugs I reported, I at least can confirm that many bugs were copied from elf32-arm.c and elf*-mips.c ...)

bsdjhb marked an inline comment as done.Dec 17 2019, 5:13 PM
bsdjhb added inline comments.
lld/ELF/InputSection.cpp
982

We can't really remove all weak symbols from FreeBSD's kernel. FreeBSD's kernel uses linker sets extensively to build link-time generated arrays of pointers to objects. If a linker set is empty, the start and stop symbols for that linker set are weak-undefined symbols that FreeBSD requires to resolve to NULL. Removing linker sets from FreeBSD and/or mandating that all linker sets are non-empty are not feasible.

I've opened an issue on the RISC-V psabi doc to track the ABI question:

https://github.com/riscv/riscv-elf-psabi-doc/issues/126

bsdjhb marked an inline comment as done.Dec 17 2019, 5:15 PM
bsdjhb added inline comments.
lld/ELF/InputSection.cpp
982

(Note that originally I had assumed the only weak symbols were in the uart(4) driver, but after testing a patch to remove those, I ran into an empty linker set and noticed that those also use weak symbols.)

Created D72046. As per my interpretation at https://github.com/riscv/riscv-elf-psabi-doc/issues/126#issuecomment-568160758 , we have to assume

Compilers should not emit PC-relative relocations in -fpic/-fpie mode.

GCC/clang emit either absolute or GOT relocations, so the assumption is true. (If not in edge cases, we have to fix the compilers.) I don't think errors on branch relocations are useful. So, D72046 just does what I did to PPC32 (the patch changes p to p+a for PPC32 as well. It is a minor issue anyway).

Created D72046. As per my interpretation at https://github.com/riscv/riscv-elf-psabi-doc/issues/126#issuecomment-568160758 , we have to assume

Compilers should not emit PC-relative relocations in -fpic/-fpie mode.

GCC/clang emit either absolute or GOT relocations, so the assumption is true. (If not in edge cases, we have to fix the compilers.) I don't think errors on branch relocations are useful. So, D72046 just does what I did to PPC32 (the patch changes p to p+a for PPC32 as well. It is a minor issue anyway).

The problem is and always has been -mcmodel=medany. Neither GCC nor LLVM use the GOT in this case, as it’s not -fpic/-fpie. This is the case we care about for the kernel. Would I have made GCC do this? No. But it does, and we have to deal with that in a sensible way. Making it an error and changing GCC and LLVM is one such way. But that hasn’t happened.