This is for PR27972 and PR32938.
I am posting it to show the problems I faced when trying
to support this optimization and to start a discussion
about it. The main question is: do we want to support this optimization?
And if yes, how should we deal with the 2nd and 3rd items in the list below?
The x86_64 ABI (section B.1) says that when there are both
GOT and PLT references to the same symbol, the linker normally
creates both a GOTPLT entry and a GOT entry. Two dynamic relocations,
JUMP_SLOT and GLOB_DAT, are used to handle them. That is
what LLD already implements.
As an optimization, the linker may skip creating the GOTPLT entry and create
a special PLT entry that uses the GOT entry instead. That allows a single
GLOB_DAT dynamic relocation to be used. Also, since the PLT entry is special,
it can be only 8 bytes long, and the ABI suggests using a separate section for such entries.
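To make the 8-byte entry concrete, here is a minimal sketch of how such an entry could be encoded. The helper name makePltGotEntry and its parameters are hypothetical and are not LLD's API. The byte pattern (an indirect RIP-relative jmp followed by a 2-byte nop) is my understanding of what bfd emits for .plt.got, so treat it as an assumption rather than a reference.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of LLD): builds one 8-byte .plt.got entry.
//   ff 25 <disp32>   jmp *disp32(%rip)   ; jump through the GOT slot
//   66 90            xchg %ax, %ax       ; 2-byte nop used as padding
std::array<uint8_t, 8> makePltGotEntry(uint64_t gotEntryVA,
                                       uint64_t pltEntryVA) {
  std::array<uint8_t, 8> buf = {{0xff, 0x25, 0, 0, 0, 0, 0x66, 0x90}};

  // The displacement is relative to the end of the 6-byte jmp instruction.
  int64_t disp = static_cast<int64_t>(gotEntryVA - (pltEntryVA + 6));
  assert(disp >= INT32_MIN && disp <= INT32_MAX && "GOT slot out of range");

  // Store the displacement as a little-endian 32-bit value.
  uint32_t enc = static_cast<uint32_t>(disp);
  for (int i = 0; i < 4; ++i)
    buf[2 + i] = static_cast<uint8_t>(enc >> (8 * i));
  return buf;
}
```

Calling makePltGotEntry with the virtual address of the symbol's GOT slot and the address of the entry itself gives the bytes to write into the section. No .got.plt slot and no JUMP_SLOT relocation are involved.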
The patch does the following:
- It introduces the .plt.got section. The name is consistent with bfd.
The section holds special 8-byte PLT entries, each with a jump instruction that
uses an address from the GOT as its destination. It would be possible to use the regular
.plt section (16 bytes per entry for x86_64), but that would be suboptimal and probably not clean
from a code POV.
- When scanning relocations, the new logic does not create a .got.plt entry if it is known
that the symbol already has a GOT entry. The reverse order is not handled. So currently it will optimize the following code
correctly:
    movq foo@GOTPCREL(%rip), %rax
    callq foo@PLT
but not:
    callq foo@PLT
    movq foo@GOTPCREL(%rip), %rax
I thought about how to handle both orders.
I think it is possible if we delay creating PLT entries until all relocations have been
scanned. Then we will know whether a symbol uses the GOT and so can avoid creating a .got.plt entry for it.
It should not be hard, but to keep the patch cleaner, smaller and simpler I did not do that in this draft.
I am not sure what the best way to do that is.
- The ABI says that the optimization must be avoided if pointer equality is needed.
It looks possible to support this if we scan relocations to check for that somehow.
That is what bfd does, I believe. I am not sure what the correct/best way to check whether pointer equality is needed is.
It may bring additional complexity. Or should we disable this relaxation by default?
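The delayed approach mentioned in the second item could look roughly like the sketch below: first record what every symbol needs, and only then decide which kind of PLT entry to create, so the order of relocations no longer matters. All of the types and names here (Reloc, RelKind, PltKind, classifyPltEntries) are hypothetical simplifications for discussion and do not correspond to LLD's real data structures. The pointer equality check from the third item is deliberately left out.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified model of a relocation: which symbol it refers to
// and whether it is a GOT-style or a PLT-style reference.
enum class RelKind { Got, Plt };
struct Reloc {
  std::string sym;
  RelKind kind;
};

// What kind of PLT entry a symbol ends up with.
enum class PltKind { None, Regular, ViaGot };

std::unordered_map<std::string, PltKind>
classifyPltEntries(const std::vector<Reloc> &relocs) {
  struct Needs {
    bool got = false;
    bool plt = false;
  };

  // Pass 1: only collect requirements, create nothing yet.
  std::unordered_map<std::string, Needs> needs;
  for (const Reloc &r : relocs) {
    Needs &n = needs[r.sym];
    if (r.kind == RelKind::Got)
      n.got = true;
    else
      n.plt = true;
  }

  // Pass 2: all relocations are known, so the decision is order-independent.
  std::unordered_map<std::string, PltKind> result;
  for (const auto &entry : needs) {
    const Needs &n = entry.second;
    if (!n.plt)
      result[entry.first] = PltKind::None;
    else
      result[entry.first] = n.got ? PltKind::ViaGot : PltKind::Regular;
  }
  return result;
}
```

With this shape, a symbol referenced as callq foo@PLT followed by movq foo@GOTPCREL(%rip), %rax is classified the same way as the opposite order. The cost is that PLT entry creation has to move out of the relocation scanning loop.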