This is an archive of the discontinued LLVM Phabricator instance.

[ELF][RISCV] Add test for absolute/relative/branch relocations to undefined weak symbols
Needs ReviewPublic

Authored by MaskRay on Dec 31 2019, 6:10 PM.

Details

Summary

Relocations to undefined weak symbols are unspecified. In practice, on
most targets, linkers resolve absolute relocations to 0. Such
relocations can be generated by -fno-pic code to take addresses.

The behaviors of PC-relative relocations have varying behaviors among
targets. We simply assume that compilers cannot generate PC-relative
relocations to undefined symbols and resolve the relocations arbitrarily
to p+a. AArch64, ARM and PPC32 do similar things so that the relocation cannot overflow.

R_RISCV_CALL (which should be obsoleted) and R_RISCV_CALL_PLT to
undefined weak symbols, if executed, are considered undefined behaviors.
So we can resolve them arbitrarily. Such relocations resolve to a PLT
entry (preemptible) or p+a like R_PC (non-preemptible).

Event Timeline

MaskRay created this revision.Dec 31 2019, 6:10 PM
MaskRay edited the summary of this revision. (Show Details)Dec 31 2019, 6:13 PM
MaskRay updated this revision to Diff 235742.Dec 31 2019, 6:16 PM

Fix REQUIRES: line

For calls, whilst it differs from BFD, it can work just fine as it’s what AArch64 does. For data references, however, this is highly inappropriate. They *need* to resolve to 0 as *real world code* relies on that, and it’s what happens on every architecture out there. Code generated by both GCC and LLVM uses PC-relative relocations to refer to external weak symbols. We can argue all we like about whether that’s good or not, but at the end of the day *that is what happens* and we need to not break that. Non-CALL(_PLT) PC-relative relocations *must not* be mislinked like this. Either do the same rewriting as in John’s patch that BFD does, or produce a suitable error, but it is not acceptable to silently mis-link this code. And I would really like LLD to support linking GCC-produced code; we should not be going off and doing our own incompatible thing. Maybe you don’t want to support gcc -fuse-ld=lld, but what about when using Clang and LLD and you’re statically-linking against a library compiled by GCC? That means coordination; either support what GCC produces (which a patch already exists for), or convince GCC to change.

@ruiu do you agree with this assessment?

MaskRay added a comment.EditedDec 31 2019, 6:33 PM

For calls, whilst it differs from BFD, it can work just fine as it’s what AArch64 does.

For data references, however, this is highly inappropriate. They *need* to resolve to 0 as *real world code* relies on that, and it’s what happens on every architecture out there. Code generated by both GCC and LLVM uses PC-relative relocations to refer to external weak symbols. We can argue all we like about whether that’s good or not, but at the end of the day *that is what happens* and we need to not break that.

No. GCC/clang emit either absolute or GOT-generating relocations (R_RISCV_HI20+R_RISCV_LO12_I; R_RISCV_GOT_HI20+R_RISCV_PCREL_LO12_I). Such relocations to weak undefined symbols can be relied on by real world code. The linker only has to resolve absolute relocations to 0 and fill GOT entries with 0.

PC-relative relocations to undefined weak symbols should not be emitted by compilers. Such hand-written assembly is just broken.

For calls, whilst it differs from BFD, it can work just fine as it’s what AArch64 does.

For data references, however, this is highly inappropriate. They *need* to resolve to 0 as *real world code* relies on that, and it’s what happens on every architecture out there. Code generated by both GCC and LLVM uses PC-relative relocations to refer to external weak symbols. We can argue all we like about whether that’s good or not, but at the end of the day *that is what happens* and we need to not break that.

No. GCC/clang emit either absolute or GOT-generating relocations (R_RISCV_HI20+R_RISCV_LO12_I; R_RISCV_GOT_HI20+R_RISCV_PCREL_LO12_I). Such relocations to weak undefined symbols can be relied on by real world code. The linker only has to resolve absolute relocations to 0 and fill GOT entries with 0.

PC-relative relocations to undefined weak symbols should not be emitted by compilers. Such hand-written assembly is just broken.

No, you’re missing the whole point behind John’s patch, which is to fix LLD so it can link code compiled by either GCC or LLVM. A copy of my comment on his patch so it’s in this discussion chain:

The problem is and always has been -mcmodel=medany. Neither GCC nor LLVM use the GOT in this case, as it’s not -fpic/-fpie. This is the case we care about for the kernel. Would I have made GCC do this? No. But it does, and we have to deal with that in a sensible way. Making it an error and changing GCC and LLVM is one such way. But that hasn’t happened.

MaskRay updated this revision to Diff 235744.Dec 31 2019, 9:02 PM
MaskRay retitled this revision from [ELF][RISCV] Resolve undefined weak R_PC relocations to p+a to [ELF][RISCV] Add test for absolute/relative/branch relocations to undefined weak symbols.

Delete code change. Keep the test.

For calls, whilst it differs from BFD, it can work just fine as it’s what AArch64 does.

For data references, however, this is highly inappropriate. They *need* to resolve to 0 as *real world code* relies on that, and it’s what happens on every architecture out there. Code generated by both GCC and LLVM uses PC-relative relocations to refer to external weak symbols. We can argue all we like about whether that’s good or not, but at the end of the day *that is what happens* and we need to not break that.

No. GCC/clang emit either absolute or GOT-generating relocations (R_RISCV_HI20+R_RISCV_LO12_I; R_RISCV_GOT_HI20+R_RISCV_PCREL_LO12_I). Such relocations to weak undefined symbols can be relied on by real world code. The linker only has to resolve absolute relocations to 0 and fill GOT entries with 0.

PC-relative relocations to undefined weak symbols should not be emitted by compilers. Such hand-written assembly is just broken.

No, you’re missing the whole point behind John’s patch, which is to fix LLD so it can link code compiled by either GCC or LLVM. A copy of my comment on his patch so it’s in this discussion chain:

The problem is and always has been -mcmodel=medany. Neither GCC nor LLVM use the GOT in this case, as it’s not -fpic/-fpie. This is the case we care about for the kernel. Would I have made GCC do this? No. But it does, and we have to deal with that in a sensible way. Making it an error and changing GCC and LLVM is one such way. But that hasn’t happened.

I checked GCC/clang -fno-pic behaviors on small and large code models. (Tiny/medium is similar to either).

extern long v __attribute__((weak));
long foo() { return v; }

gcc -fno-pic

aarch64 -mcmodel={small,large}: GOT variant (.rodata.cst8)
ppc64 -mcmodel={small,large}: GOT variant (.toc)
x86_64 -mcmodel=small: PC relative
x86_64 -mcmodel=large: absolute
riscv -mcmodel=medlow: absolute
riscv -mcmodel=medany: PC relative

clang -fno-pic

aarch64 -mcmodel=small: GOT variant (.rodata.cst8)
aarch64 -mcmodel=large: absolute (R_AARCH64_MOVW_UABS_G0_NC+R_AARCH64_MOVW_UABS_G1_NC+R_AARCH64_MOVW_UABS_G2_NC+R_AARCH64_MOVW_UABS_G3)
ppc64 -mcmodel={small,large}: GOT variant (.toc)
riscv -mcmodel=medlow: absolute (R_RISCV_HI20+R_RISCV_LO12_I)
riscv -mcmodel=medany: PC relative (R_RISCV_PCREL_HI20+R_RISCV_PCREL_LO12_I)

Note that aarch64 and ppc64 use either absolute or GOT variant, but not PC relative. So they can make R_PC to undefined weak resolved to an arbitrary value. x86_64 -mcmodel=small uses PC relative, so it has to use sym.getVA(a).

I thought RISC-V was in the aarch64/ppc64 camp but it turns out that its -mcmodel=medany isn't. If it behaves like aarch64/ppc64,

else if (config->emachine == EM_PPC || config->emachine == EM_RISCV)
  dest = p + a;

will be a very nice solution.

-mcmodel=medany -fno-pic codegen is indeed strange. The external variable access code sequence lla a5, v expands to auipc a5, ...; addi a5,a5,.... riscv64-linux-musl-ld -Ttext=0x200000000 a.o can rewrite the sequence to lui a5,0; mv a5,a5 (set a5 to 0).

@bsdjhb I need to understand more why the FreeBSD kernel uses -mcmodel=medany -fno-pic, but not -mcmodel=medany -fpie.

I want to make a proposal: unify -mcmodel=medany -fno-pic and -mcmodel=medany -fpic. If the variable turns out to be defined locally, relax ld a5,...(a5) to addi a5,a5,... (think of x86 GOTPCRELX). @jrtc27 the code sequences are of the same length, so this should work in theory. Am I right? If so, I'll investigate how to make llvm codegen do this. medany is largely underdocumented (am i missing something?) I will make a comment at the psABI side. If there is better place discussing this issue, please tell me.