This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objdump][X86] Add @plt symbols for .plt.got
Closed · Public

Authored by MaskRay on May 3 2023, 10:40 PM.

Details

Summary

If a symbol needs both JUMP_SLOT and GLOB_DAT relocations, there is a
minor linker optimization to keep just GLOB_DAT. This optimization
is only implemented by GNU ld's x86 port and mold.
https://maskray.me/blog/2021-08-29-all-about-global-offset-table#combining-.got-and-.got.plt

With the optimization, the PLT entry is placed in .plt.got and the
associated GOTPLT entry is placed in .got (ld.bfd -z now) or .got.plt (ld.bfd -z lazy).
The relocation is in .rel[a].dyn.

This patch synthesizes symbol@plt labels for these .plt.got entries.

Example:

cat > a.s <<e
.globl _start; _start:
mov combined0@gotpcrel(%rip), %rax; mov combined1@gotpcrel(%rip), %rax
call combined0@plt; call combined1@plt
call foo0@plt; call foo1@plt
e
cat > b.s <<e
.globl foo0, foo1, combined0, combined1
foo0: foo1: combined0: combined1:
e
gcc -fuse-ld=bfd -shared b.s -o b.so
gcc -fuse-ld=bfd -pie -nostdlib a.s b.so -o a
Disassembly of section .plt:

0000000000001000 <.plt>:
    1000: ff 35 ea 1f 00 00             pushq   0x1fea(%rip)            # 0x2ff0 <_GLOBAL_OFFSET_TABLE_+0x8>
    1006: ff 25 ec 1f 00 00             jmpq    *0x1fec(%rip)           # 0x2ff8 <_GLOBAL_OFFSET_TABLE_+0x10>
    100c: 0f 1f 40 00                   nopl    (%rax)

0000000000001010 <foo1@plt>:
    1010: ff 25 ea 1f 00 00             jmpq    *0x1fea(%rip)           # 0x3000 <_GLOBAL_OFFSET_TABLE_+0x18>
    1016: 68 00 00 00 00                pushq   $0x0
    101b: e9 e0 ff ff ff                jmp     0x1000 <.plt>

0000000000001020 <foo0@plt>:
    1020: ff 25 e2 1f 00 00             jmpq    *0x1fe2(%rip)           # 0x3008 <_GLOBAL_OFFSET_TABLE_+0x20>
    1026: 68 01 00 00 00                pushq   $0x1
    102b: e9 d0 ff ff ff                jmp     0x1000 <.plt>

Disassembly of section .plt.got:

0000000000001030 <combined0@plt>:
    1030: ff 25 a2 1f 00 00             jmpq    *0x1fa2(%rip)           # 0x2fd8 <foo1+0x2fd8>
    1036: 66 90                         nop

0000000000001038 <combined1@plt>:
    1038: ff 25 a2 1f 00 00             jmpq    *0x1fa2(%rip)           # 0x2fe0 <foo1+0x2fe0>
    103e: 66 90                         nop
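
The "# 0x2fd8" style annotations in the listing follow from the instruction encoding: ff 25 is jmp *disp32(%rip), so the GOT slot is the entry address plus 6 (the instruction length) plus the little-endian displacement. A minimal Python sketch of that arithmetic, and of matching the slot against dynamic relocations to choose a label, might look like this. The helper name and the relocation table are hypothetical, with offsets inferred from the listing above; this is an illustration, not the code in the patch.

```python
import struct
from typing import Optional

def got_slot_for_rip_jmp(entry_addr: int, code: bytes) -> Optional[int]:
    """Return the GOT slot referenced by `jmp *disp32(%rip)`, else None."""
    if len(code) < 6 or code[:2] != b"\xff\x25":
        return None
    (disp,) = struct.unpack_from("<i", code, 2)  # signed 32-bit displacement
    return entry_addr + 6 + disp  # RIP points just past the 6-byte jmp

# Hypothetical GLOB_DAT relocations from .rela.dyn: r_offset -> symbol name.
rela_dyn = {0x2FD8: "combined0", 0x2FE0: "combined1"}

# .plt.got entries copied from the listing above: (address, raw bytes).
plt_got = [
    (0x1030, bytes.fromhex("ff25a21f00006690")),
    (0x1038, bytes.fromhex("ff25a21f00006690")),
]

labels = {}
for addr, code in plt_got:
    slot = got_slot_for_rip_jmp(addr, code)
    if slot in rela_dyn:
        labels[addr] = rela_dyn[slot] + "@plt"

for addr, name in sorted(labels.items()):
    print(f"{addr:016x} <{name}>")
```

The same arithmetic applies to the regular .plt entries; only the section holding the relocation differs (.rel[a].plt instead of .rel[a].dyn).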

For x86-32, with -z now, if we remove foo0 and foo1, the absence of a regular
PLT causes GNU ld to omit .got.plt, and our code cannot synthesize @plt
labels. This is an extreme corner case that almost never happens in practice (to
trigger it, every PLT symbol must have its address taken). To fix it, we
could get the _GLOBAL_OFFSET_TABLE_ symbol value, but the complexity is not
worth it.
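
To illustrate why a missing .got.plt matters on x86-32: a PIC PLT entry there is jmp *disp32(%ebx) (bytes ff a3), and %ebx holds the GOT base, so the slot address can only be computed relative to that base. Below is a rough Python sketch, assuming the base is taken from the .got.plt section address (which is where GNU ld usually places _GLOBAL_OFFSET_TABLE_ on x86). The function name and the sample bytes and addresses are made up for illustration.

```python
import struct
from typing import Optional

def i386_pic_got_slot(code: bytes, got_plt_addr: Optional[int]) -> Optional[int]:
    """Return the GOT slot used by `jmp *disp32(%ebx)`, or None if unknown."""
    if len(code) < 6 or code[:2] != b"\xff\xa3":
        return None  # not a PIC-style PLT jump
    if got_plt_addr is None:
        return None  # .got.plt was omitted, so the %ebx base is unknown
    (disp,) = struct.unpack_from("<i", code, 2)  # signed 32-bit displacement
    return (got_plt_addr + disp) & 0xFFFFFFFF

# Made-up entry: jmp *-0x8(%ebx); nop
entry = bytes.fromhex("ffa3f8ffffff6690")

print(i386_pic_got_slot(entry, 0x3000))  # base known: slot can be resolved
print(i386_pic_got_slot(entry, None))    # base unknown: no label possible
```

Recovering the base from the _GLOBAL_OFFSET_TABLE_ symbol instead would replace the got_plt_addr argument, which is the extra complexity the summary declines to take on.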

Close https://github.com/llvm/llvm-project/issues/62537

Event Timeline

MaskRay created this revision. May 3 2023, 10:40 PM
Herald added a project: Restricted Project. May 3 2023, 10:40 PM
MaskRay requested review of this revision. May 3 2023, 10:40 PM
bd1976llvm accepted this revision. May 15 2023, 7:28 AM

Thanks for working on this and sorry for the lack of response.

Generally looks good, LGTM from me.

I'm happy with the missing corner case for x86-32, given that you mentioned that it is extremely unlikely to occur. Do the GNU tools support that corner case? If they have gone to the trouble to support it I'm wondering if they have a use-case that hasn't occurred to us?
In the description, I think we should have a link to the optimisation in the PS ABI documents in addition to your blog entry.
I wonder if the LLDB folks are aware of this? Might be worth informing them?
I have mentioned a few improvements that seemed worthwhile to me. Feel free to ignore these if they are not helpful.

llvm/lib/Object/ELFObjectFile.cpp
678

R instead of Relocation to match the RelaDyn code?

Also the RelaPlt code here looks very similar to the RelaDyn code below. I would be tempted to factor out the common parts?

llvm/test/tools/llvm-objdump/X86/plt-got.test
3

Would be nice to expand this comment to also mention more details of the optimisation e.g. dropping the JUMP_SLOT entry.

40

The check for the x86-64 case includes the registers?

62

don't need this empty line

This revision is now accepted and ready to land. May 15 2023, 7:28 AM
MaskRay updated this revision to Diff 522447. May 15 2023, 11:06 PM
MaskRay marked 4 inline comments as done.
MaskRay edited the summary of this revision.

address comments

MaskRay added a comment. (Edited) May 15 2023, 11:12 PM

> Thanks for working on this and sorry for the lack of response.
>
> Generally looks good, LGTM from me.
>
> I'm happy with the missing corner case for x86-32, given that you mentioned that it is extremely unlikely to occur. Do the GNU tools support that corner case? If they have gone to the trouble to support it I'm wondering if they have a use-case that hasn't occurred to us?
> In the description, I think we should have a link to the optimisation in the PS ABI documents in addition to your blog entry.
> I wonder if the LLDB folks are aware of this? Might be worth informing them?
> I have mentioned a few improvements that seemed worthwhile to me. Feel free to ignore these if they are not helpful.

Thanks for the review! Sorry for the shameless plug. The official psABI mentions combining JUMP_SLOT and GLOB_DAT, but doesn't say anything about .plt.got, or the fact that .got.plt can be omitted...

With int main() {} compiled via gcc x.c -o x -z now -fuse-ld=bfd -m32, we don't get .got.plt. If we add a libc function call, a regular PLT is needed. To omit .got.plt again, we need to take the address of that libc function. Every PLT symbol needs an address-taken operation in the code.
This is why I said that the corner case almost never happens in the wild.
It seems that objdump does symbolize .plt.got in this case. I believe it takes quite a bit of code to retrieve the value of _GLOBAL_OFFSET_TABLE_.

Thanks for mentioning lldb. This gives me motivation to study how it performs symbolization...

> Thanks for the review! Sorry for the shameless plug. The official psABI mentions combining JUMP_SLOT and GLOB_DAT, but doesn't say anything about .plt.got, or the fact that .got.plt can be omitted...

No problem. I have been known to read your blog entries myself :)

> With int main() {} gcc x.c -o x -z now -fuse-ld=bfd -m32, we don't get .got.plt. If we add a libc function call, a regular PLT will be needed. To omit .got.plt again, we need to take the address of the libc function call. For every PLT we need an address-taken operation in the code.
> This is why I said that the corner case nearly never happens in the wild.
> It seems that objdump does symbolize .plt.got in this case. I believe it takes quite some code to retrieve the value _GLOBAL_OFFSET_TABLE_.
>
> Thanks for mentioning lldb. This gives me a motivation to study how it performs symbolization...

Thanks for the extra information, makes sense to me.

LGTM.

This revision was landed with ongoing or failed builds. May 16 2023, 9:22 AM
This revision was automatically updated to reflect the committed changes.