This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Make R_I386_GOTPC and R_X86_64_GOTPC32/64 resolve to .got.plt.
AbandonedPublic

Authored by sivachandra on Mar 19 2019, 1:47 PM.

Details

Summary

These relocations are typically used to refer to the symbol
_GLOBAL_OFFSET_TABLE_. Since this symbol is at the beginning of
.got.plt, these relocations should resolve to .got.plt afterall.
This matches the behavior of ld.bfd and ld.gold.

Event Timeline

sivachandra created this revision.Mar 19 2019, 1:47 PM

I believe the patch in the current form will break some _GLOBAL_OFFSET_TABLE_ use cases (see https://reviews.llvm.org/D48095) so I volunteer to be a reviewer :)

_GLOBAL_OFFSET_TABLE_ : specifies the offset to the base of the GOT from the current code location.

The issue has been discussed a lot in https://bugs.llvm.org/show_bug.cgi?id=36555 . Some aspects in that thread no longer apply nowadays. I believe the remaining issue is that we have _GLOBAL_OFFSET_TABLE_ = end(.got) (lld's generalized R_*_FROM_END relocation types require this) but end(.got) != start(.got.plt) (we don't have a section rank rule for this).

As I noted in https://reviews.llvm.org/D56828 , after that patch we can make end(.got) = start(.got.plt) if we want and thus address the issue this patch intends to address. AFAIK no user application needs that but some ld.so implementations may want to have this property.

If people think that making end(.got) = start(.got.plt) is desirable I volunteer to do that, but it requires some more rules in Writer.cpp:getSectionRank?

I believe the patch in the current form will break some _GLOBAL_OFFSET_TABLE_ use cases (see https://reviews.llvm.org/D48095) so I volunteer to be a reviewer :)

Thanks, I did not know who the correct reviewer is, so I just added ruiu.

About "will break some _GLOBAL_OFFSET_TABLE_ use cases", can you elaborate which ones?

_GLOBAL_OFFSET_TABLE_ : specifies the offset to the base of the GOT from the current code location.

AFAIU, this convention you are quoting (from http://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf I believe) pertains to the assembly language usage. See Figure 3.12 for example.

However, _GLOBAL_OFFSET_TABLE_ is also an ELF symbol (which is unrelated to the offset convention above.) So, AFAIU, this patch does not break anything wrt the symbol _GLOBAL_OFFSET_TABLE_. But, you can point me to examples if that is the case.

The issue has been discussed a lot in https://bugs.llvm.org/show_bug.cgi?id=36555 . Some aspects in that thread no longer apply nowadays. I believe the remaining issue is that we have _GLOBAL_OFFSET_TABLE_ = end(.got) (lld's generalized R_*_FROM_END relocation types require this) but end(.got) != start(.got.plt) (we don't have a section rank rule for this).

Again AFAIU, where you place the symbol _GLOBAL_OFFSET_TABLE_ does not matter as long as it points to the zeroth location of the GOT (which is to say, the GOT itself.)

As I noted in https://reviews.llvm.org/D56828 , after that patch we can make end(.got) = start(.got.plt) if we want and thus address the issue this patch intends to address. AFAIK no user application needs that but some ld.so implementations may want to have this property.

I think, that we should/can have end(.got) = start(.got.plt) is a very desirable thing because, again according to the same ABI doc, one should allow negative indices into GOT. Since the zeroth location of GOT is start(.got.plt) by convention, negative indices into GOT are meaningful only if end(.got) = start(.got.plt).

Note that, this change also does not do anything wrt where the symbol _GLOBAL_OFFSET_TABLE_ is placed. What this change is essentially doing is to follow the ABI docs which say that R_I386_GOTPC, R_X86_64_GOTPC32/64 are all this:

GOT + A - P

In ld.lld, GOT is at start(.got.plt). So, this change is essentially making the above relocations resolve as start(.got.plt) + A - P. If ld.lld decides to place GOT at a different location, then we should update how these relocations resolve accordingly.

FWIW, an unrelated glibc use case is what drove me to this change.

If people think that making end(.got) = start(.got.plt) is desirable I volunteer to do that, but it requires some more rules in Writer.cpp:getSectionRank?

I do not have any practical examples at hand to prove/disprove if it is desirable or not. But, end(.got) = start(.got.plt) makes GOT contiguous and allows negative indices into GOT as required by the ABI doc. So, it is probably desirable.

MaskRay added a comment.EditedMar 19 2019, 10:05 PM

About "will break some _GLOBAL_OFFSET_TABLE_ use cases", can you elaborate which ones?

See the example x86-64-reloc-gotoff64.s

With the patch, the sum doesn't compute addr(_DYNAMIC)

leaq _GLOBAL_OFFSET_TABLE_(%rip), %rdx ; R_X86_64_GOTPC32 (i.e R_GOTPLTONLY_PC): start(.got.plt)
movabsq $_DYNAMIC@GOTOFF, %rax         ; R_X86_64_GOTOFF64 (i.e R_GOTREL_FROM_END): addr(_DYNAMIC) - end(.got)
addq %rdx, %rax                        ; the sum is supposed to compute addr(_DYNAMIC)

So, AFAIU, this patch does not break anything wrt the symbol _GLOBAL_OFFSET_TABLE_. But, you can point me to examples if that is the case.

The current implementation in lld can have problems when there is a relocation type using _GLOBAL_OFFSET_TABLE_ as S, e.g. a R_X86_64_PC32 with S=_GLOBAL_OFFSET_TABLE_. This is different from R_X86_64_GOTPC32.

For the example above, suppose leaq _GLOBAL_OFFSET_TABLE_(%rip), %rdx emits R_X86_64_PC32 instead of R_X86_64_GOTPC32:

leaq _GLOBAL_OFFSET_TABLE_(%rip), %rdx ; R_X86_64_PC32: addr(_GLOBAL_OFFSET_TABLE_)
movabsq $_DYNAMIC@GOTOFF, %rax         ; R_X86_64_GOTOFF64 (i.e R_GOTREL_FROM_END): addr(_DYNAMIC) - end(.got)
addq %rdx, %rax                        ; addr(_GLOBAL_OFFSET_TABLE) = start(.got.plt) != end(.got), thus the sum is incorrect

I think, that we should/can have end(.got) = start(.got.plt) is a very desirable thing because, again according to the same ABI doc, one should allow negative indices into GOT. Since the zeroth location of GOT is start(.got.plt) by convention, negative indices into GOT are meaningful only if end(.got) = start(.got.plt).

Can you quote the exact sentences about negative indices into GOT in AMD64 psABI?

Regarding this part:

if (oneof<R_GOTONLY_PC, R_GOTPLTONLY_PC, R_GOTREL, R_GOTREL_FROM_END,
          R_PPC_TOC>(Expr))
  In.Got->HasGotOffRel = true;

After this patch, R_GOTPLTONLY_PC is unrelated to .got so it may look weird to suppress omission of .got when a R_GOTPLTONLY_PC is seen. However, we probably have to preserve that as _GLOBAL_OFFSET_TABLE_ is the base of GOT and it looks weird if .got disappears...

I've thought about the problem before and I think having end(.got) = start(.got.plt) is probably the best we can do to make everything compatible.

About "will break some _GLOBAL_OFFSET_TABLE_ use cases", can you elaborate which ones?

See the example x86-64-reloc-gotoff64.s

With the patch, the sum doesn't compute addr(_DYNAMIC)

leaq _GLOBAL_OFFSET_TABLE_(%rip), %rdx ; R_X86_64_GOTPC32 (i.e R_GOTPLTONLY_PC): start(.got.plt) + A - P
movabsq $_DYNAMIC@GOTOFF, %rax         ; R_X86_64_GOTOFF64 (i.e R_GOTREL_FROM_END): addr(_DYNAMIC) - end(.got)
addq %rdx, %rax                        ; the sum is supposed to compute addr(_DYNAMIC)

So, AFAIU, this patch does not break anything wrt the symbol _GLOBAL_OFFSET_TABLE_. But, you can point me to examples if that is the case.

The current implementation in lld can have problems when there is a relocation type using _GLOBAL_OFFSET_TABLE_ as S, e.g. a R_X86_64_PC32 with S=_GLOBAL_OFFSET_TABLE_. This is different from R_X86_64_GOTPC32.

For the example above, suppose leaq _GLOBAL_OFFSET_TABLE_(%rip), %rdx emits R_X86_64_PC32 instead of R_X86_64_GOTPC32:

leaq _GLOBAL_OFFSET_TABLE_(%rip), %rdx ; R_X86_64_PC32: addr(_GLOBAL_OFFSET_TABLE_) + A - P
movabsq $_DYNAMIC@GOTOFF, %rax         ; R_X86_64_GOTOFF64 (i.e R_GOTREL_FROM_END): addr(_DYNAMIC) - end(.got)
addq %rdx, %rax                        ; addr(_GLOBAL_OFFSET_TABLE) = start(.got.plt) != end(.got), thus the sum is incorrect

Yes, I have noticed the problem with R_X86_64_GOTOFF64 and I have another change for that which I have not completed because of other reasons [1]. But I did not bother in this change because this test, and the glibc example I have been using, do not fail after this change [2].

So, about R_X86_64_GOTOFF64, IMHO it is already broken in ld.lld. But, the test is passing and working as intended because the current implementations of GOTPC64 and GOTOFF64 are cancelling each other out [3]. The reason I say R_X86_64_GOTOFF64 is broken is because of this:

In ld.lld currently, R_X86_64_GOTOFF64 resolves as this:

Sym.getVA(A) - In.Got->getVA() - In.Got->getSize()

I think, it should actually be:

Sym.getVA(A) - In.GotPlt->getVA() // [4]

The reason why I think it should be expression [4] is because the ABI doc specifies R_X86_64_GOTOFF64 to resolve as:

S + A - GOT

In ld.lld, GOT is start(.got.plt). Hence, we should be resolving R_X86_64_GOTOFF64 as:

S + A - start(.got.plt)

Which is nothing but [4].

[1] - To fix R_X86_64_GOTOFF64 the right way, I really think we need .got and .got.plt to be adjacent as we need negative indices into GOT. May be this is _the_ reason why we should make end(.got) == start(.got.plt).
[2] - Glibc's X86_64 code uses extern long _DYNAMIC[] __attribute__((visibility("hidden"))); just like the test in question. However, it refers to only &_DYNAMIC, for which the compiler emits R_X86_64_PC32. Hence, we do not hit the problem with R_X86_64_GOTOFF64. What is broken for glibc without this change is a direct reference to _GLOBAL_OFFSET_TABLE_[0], for which the comiler produces GOTPC32.
[3] - R_X86_64_GOTPC64 is GOT + A - P; R_X86_64_GOTOFF64 is S + A - GOT. The assembly code in the test in question is effectively doing a sum of these two expressions:

GOT + A - P + S + A - GOT = 2A + S - P

The resulting expressing in the RHS is independent of GOT and hence the test works as expected currently.

PS: If we make end(.got) == start(.got.plt), then we should in principle not require this change. However, to avoid surprises while reading code, and to match what the ABI says in word, this change along with a change to R_X86_64_GOTOFF64 would still be nice.

BTW, I am not sure if makeing end(.got) = start(.got.plt) is a good idea because of this: https://fedoraproject.org/wiki/Changes/.got.plt_Isolation. It claims to have no ABI impact, but I am not sure how one can separate out .got.plt without ABI impact.

MaskRay added a comment.EditedMar 19 2019, 11:47 PM

[2] - Glibc's X86_64 code uses extern long _DYNAMIC[] attribute((visibility("hidden"))); just like the test in question. However, it refers to only &_DYNAMIC, for which the compiler emits R_X86_64_PC32. Hence, we do not hit the problem with

glibc emits leaq _DYNAMIC(%rip), %rax (R_X86_64_PC32) because it assumes small code model. I agree that this patch won't break that. However, internally we have medium code model instruction sequences:

leaq _GLOBAL_OFFSET_TABLE_(%rip), %rdx
movabsq $_DYNAMIC@GOTOFF, %rax
addq %rdx, %rax

But, the test is passing and working as intended because the current implementations of GOTPC64 and GOTOFF64 are cancelling each other out [3]

Yes. Currently the example above relies on the cancellation effect.

In ld.lld currently, R_X86_64_GOTOFF64 resolves as this:

Sym.getVA(A) - In.Got->getVA() - In.Got->getSize()

Sym.getVA(A) - In.GotPlt->getVA() // [4]

Agree. Most In.Got->getVA() + In.Got->getSize() should be changed to In.GotPlt->getVA() to be compatible with the address of _GLOBAL_OFFSET_TABLE_.

If we replace all at once, we wouldn't need end(.got) == start(.got.plt).

Two relocation types can not have such simple transformation:

case R_TLSGD_GOT_FROM_END:
  return In.Got->getGlobalDynOffset(Sym) + A - In.Got->getSize();
case R_TLSLD_GOT_FROM_END:
  return In.Got->getTlsIndexOff() + A - In.Got->getSize();

[1] - To fix R_X86_64_GOTOFF64 the right way, I really think we need .got and .got.plt to be adjacent as we need negative indices into GOT. May be this is _the_ reason why we should make end(.got) == start(.got.plt).

Can you quote the paragraph where negative indices into GOT are used in i386/amd64 psABI?

PS: If we make end(.got) == start(.got.plt), then we should in principle not require this change. However, to avoid surprises while reading code, and to match what the ABI says in word, this change along with a change to R_X86_64_GOTOFF64 would still be nice.

Agree. See above, R_X86_64_PC32 and R_X86_64_GOTOFF64 should be fixed at the same time.

If there isn't a convincing reason .got and .got.plt should be adjacent, I think we probably don't need end(.got) == start(.got.plt) (I've changed my mind about that after investigating this more).

Can you quote the paragraph where negative indices into GOT are used in i386/amd64 psABI?

See the paragraph below Figure 5.1 on page 77 of http://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf. Quoting here for convenience:

The symbol _GLOBAL_OFFSET_TABLE_ may reside in the middle of the
.got section, allowing both negative and non-negative offsets into the array of
addresses.

Thinking about it, I think I am mixing up things wrt R_X86_64_GOTOFF64. Implementing it as Sym.getVA(A) - In.GotPlt->getVA() over this change should just work IMHO. Let me try it tomorrow and get back to you.

If there isn't a convincing reason .got and .got.plt should be adjacent, I think we probably don't need end(.got) == start(.got.plt) (I've changed my mind about that after investigating this more).

Yes, I am likely mixing up things from other linkers vs how it is implemented in ld.lld. In ld.lld, for the reloc expression like G + GOT + A - P, we conveniently have Sym.getGotVA() as an equivalent of G + GOT eliminating a need for negative indices. So, I think end(.got) == start(.got.plt) is really not required.

If there isn't a convincing reason .got and .got.plt should be adjacent, I think we probably don't need end(.got) == start(.got.plt) (I've changed my mind about that after investigating this more).

Yes, I am likely mixing up things from other linkers vs how it is implemented in ld.lld. In ld.lld, for the reloc expression like G + GOT + A - P, we conveniently have Sym.getGotVA() as an equivalent of G + GOT eliminating a need for negative indices. So, I think end(.got) == start(.got.plt) is really not required.

I actually had a partial implementation on the stuff but haven't finished it yet (after I did the TLS stuff; I didn't know you are also working on glibc!).. Would you mind me finishing that and sending it for review :)

sivachandra abandoned this revision.Mar 26 2019, 10:36 AM

Not required anymore as https://reviews.llvm.org/D59594 has landed.