Local symbol which requires GOT entry initialized by "page" address. This address is high 16 bits of sum of the symbol value and the relocation addend. In the relocation scanning phase final values of symbols are unknown so to reduce number of allocated GOT entries do the following trick. Save all output sections referenced by GOT relocations during the relocation scanning phase. Then later in the GotSection::finalize method calculate number of "pages" required to cover all saved output sections and allocate appropriate number of GOT entries. We assume the worst case - each 64kb page of the output section has at least one GOT relocation against it.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
That is still more than needed, no?
A single symbol pointing to a 8k page would create 2 got entries but
only needs one. Is that correct?
Cheers,
Rafael
Yes. If symbols point only to the thirst 64kb page of large output section we allocate "page" entries covers the section completely.
A single symbol pointing to a 8k page would create 2 got entries but
only needs one. Is that correct?
I do not consider local symbols individually at all. If a single symbol pointing to a 8k page we get one redundant entry. It is not good but acceptable. If we have more than one symbol pointing to the adjacent addresses, exact number of required GOT entries depends on the final values of these addresses. In the best case they all belong to the same final 64kb page and have the same high 16 bits. In the worst case we will need two page addresses. But to get optimal result we will have to calculate number of required GOT entries after all output section will be finalized and aligned.
There are two problems in this approach (though it allows to get ideal result):
- We need to create a local symbol body for each unique combination of local symbol referenced by GOT relocation and addend. Number of such combinations might be very large. And we need to read relocation addend in relocation scanning phase.
- For some end-users like embedded developers it might be essential to be able to place sections in specific order.
BTW, what result do you get with gold/bfd?
Almost all test cases from LLVM Test suite show similar results. For example (size of .got section):
7zip-benchmark: BFD: 0x17a4 LLD: 0x177c
tramp3d-v4: BFD: 0x11a4 LLD: 0x11b8
There are couple exceptions mafft/pairlocalalign and NPB-serial/is/is. These executables have large .bss section and the patch produces the GOT bigger than necessary. But anyway it is better than current trunk implementation.
There are two problems in this approach (though it allows to get ideal result):
- We need to create a local symbol body for each unique combination of local symbol referenced by GOT relocation and addend. Number of such combinations might be very large. And we need to read relocation addend in relocation scanning phase.
Well, it is one SymbolBody,Addend pair per relocation.
- For some end-users like embedded developers it might be essential to be able to place sections in specific order.
BTW, what result do you get with gold/bfd?
Almost all test cases from LLVM Test suite show similar results. For example (size of .got section):
7zip-benchmark: BFD: 0x17a4 LLD: 0x177c
tramp3d-v4: BFD: 0x11a4 LLD: 0x11b8There are couple exceptions mafft/pairlocalalign and NPB-serial/is/is. These executables have large .bss section and the patch produces the GOT bigger than necessary. But anyway it is better than current trunk implementation.
I guess the question is more: do you know in which cases they produce
a better result?
Sending a few more comments via Phab.
Cheers,
Rafael
OK, I think this is probably fine.
If we want to compute a better result in the future and still support got pointing forward in the file I think we can do:
- Delay the scan just a bit so we know the position of each symbol in its output section.
- Compute the offset in the output section that each got entry will have
- Use that to compute an upper bound on the number of entries.
Can you just upload a new version with the inline comments fixed?
ELF/OutputSections.cpp | ||
---|---|---|
209 | The description is not exactly true. You are computing an upper bound, not the exact number. Given that, you are not assuming that the "relocations are spread" uniformly. It is just that the worst case is for every page in the section to have a relocation pointing at it. | |
ELF/Writer.cpp | ||
1096 ↗ | (On Diff #51392) | Do you have a testcase for this? If not, could you leave it for a follow up patch? |
Suppose that you have a large output section. All relocations against this section target only a small range of addresses in this section. In that case bfd and gold generates only a few GOT entries. This patch in contrast generates GOT entry for each 64kb of the output section.
- Fixed the comment describes how we count the number of required GOT entries.
- Removed changes from Writer.cpp. I reviewed this code once again and realized that the changes can be rolled back. The only section (except .dynamic, .got.plt etc) which can modify its sh_size in the finalize method is the MergeOutputSection. But such sections always go before the .got section in the OutputSections container because we support only read-only merge sections and read-only sections go before writable ones. So it is does not necessary to postpone finalizing of the .got section. It would be nice to have an assert to check that fact but I cannot find how to do that without significant modification of the code.
The description is not exactly true.
You are computing an upper bound, not the exact number. Given that, you are not assuming that the "relocations are spread" uniformly. It is just that the worst case is for every page in the section to have a relocation pointing at it.