This is an archive of the discontinued LLVM Phabricator instance.

[ELF] x86-64: place .lrodata, .lbss, and .ldata away from code sections
Closed · Public

Authored by MaskRay on May 13 2023, 11:00 AM.

Details

Summary

The x86-64 medium code model utilizes large data sections, namely .lrodata,
.lbss, and .ldata (along with some variants of .ldata). There is a proposal to
extend the use of large data sections to the large code model as well[1].

This patch aims to place large data sections away from code sections in order to
alleviate relocation overflow pressure caused by code sections referencing
regular data sections.

.lrodata
.rodata
.text     # if --ro-segment, MAXPAGESIZE alignment
RELRO     # MAXPAGESIZE alignment
.data     # MAXPAGESIZE alignment
.bss
.ldata    # MAXPAGESIZE alignment
.lbss

In comparison to GNU ld, which places .lbss, .lrodata, and .ldata after .bss, we
place .lrodata above .rodata to minimize the number of permission transitions in
the memory image.
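To illustrate the permission-transition argument, here is a minimal Python sketch that counts how often adjacent sections change permissions. The tuple model, the names `count_permission_transitions`, `LLD_LAYOUT`, and `GNU_LD_TAIL`, and the simplification of RELRO to RW are assumptions for illustration, not lld code. The GNU ld order is taken from the order listed in the summary.

```python
# Hypothetical model: each section is (name, permissions); not lld's data structures.
def count_permission_transitions(layout):
    """Count adjacent section pairs whose permissions differ."""
    perms = [perm for _, perm in layout]
    return sum(1 for a, b in zip(perms, perms[1:]) if a != b)

# Layout proposed in this patch (RELRO is treated as RW for simplicity).
LLD_LAYOUT = [
    (".lrodata", "R"), (".rodata", "R"), (".text", "RX"),
    ("RELRO", "RW"), (".data", "RW"), (".bss", "RW"),
    (".ldata", "RW"), (".lbss", "RW"),
]

# Tail of the GNU ld layout as described in the summary:
# the large sections all follow .bss (assumed order).
GNU_LD_TAIL = [
    (".data", "RW"), (".bss", "RW"),
    (".lbss", "RW"), (".lrodata", "R"), (".ldata", "RW"),
]

# The same region (from .data onward) in the proposed layout.
LLD_TAIL = LLD_LAYOUT[4:]

print(count_permission_transitions(LLD_TAIL))
print(count_permission_transitions(GNU_LD_TAIL))
```

In this model, placing .lrodata next to .rodata keeps the region from .data onward uniformly RW, whereas a read-only section between RW sections adds transitions.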

While GNU ld places .lbss after .bss, the subsequent sections don't reuse the
file offset bytes of BSS.

Our approach is to place .ldata and .lbss after .bss and, in the absence of a
SECTIONS command, start a new PT_LOAD segment at the transition from .bss to the
large data sections. assignFileOffsets ensures we insert an alignment instead of allocating
space for BSS, and therefore we don't waste more than MAXPAGESIZE bytes. We have
a missing optimization to prevent all waste, but implementing it would introduce
complexity and likely be error-prone.
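The bound of MAXPAGESIZE bytes follows from the ELF requirement that a loadable segment's file offset be congruent to its virtual address modulo the page size. A minimal sketch of that computation, assuming a hypothetical helper `align_offset_to_address` (this is not the code in assignFileOffsets):

```python
def align_offset_to_address(offset, page_size, address):
    """Return the smallest o >= offset with o % page_size == address % page_size."""
    # Python's % is non-negative for a positive modulus, so this never moves backwards.
    return offset + (address - offset) % page_size

# Example values are made up for illustration.
end_of_previous_data = 0x1234   # file offset after the last byte written so far
ldata_address = 0xA02010        # virtual address of the first large data section
new_offset = align_offset_to_address(end_of_previous_data, 0x1000, ldata_address)
padding = new_offset - end_of_previous_data
print(hex(new_offset), hex(padding))
```

The padding is always smaller than the page size, which is where the "no more than MAXPAGESIZE bytes" bound comes from.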

GNU ld's layout introduces 2 more MAXPAGESIZE alignments while ours
introduces just one.

[1]: https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU "Large data sections for the large code model"

With help from Arthur Eubanks.

Co-authored-by: James Y Knight <jyknight@google.com>

Event Timeline

MaskRay created this revision. May 13 2023, 11:00 AM
Herald added a project: Restricted Project. May 13 2023, 11:00 AM
MaskRay requested review of this revision. May 13 2023, 11:00 AM

The approach and the code look good to me. As an Arm person I would prefer for someone involved in x86_64 to comment on the proposed changes from GNU ld. If there are none forthcoming then give me a ping. From the GNU ld code-base there look to be a few other architectures that have .ldata sections but none of them look to be supported by upstream LLD.

lld/ELF/Writer.cpp
899

Do you mean .lrodata in the comment above?

So, I also had a version I was going to send for review, but I'm too slow, and it's now conflicted with intermediary cleanups.

So I've pushed it here instead for now: https://github.com/llvm/llvm-project/compare/main...jyknight:llvm-project:lld-largefile-2 (starting with a revert of your cleanup, just in order to not need to deal with merge conflicts at this point...).

The main points I think are important are:

  • Also supports code sections marked large (they are placed at the end, after RW data).
  • Splits segments at NOBITS->BITS boundaries, so that a new segment is started between .bss and .ldata (assuming there is .bss).

Thanks!

I have thought about this, but after a close look at GNU ld's layout, it shares the missing optimization with us... So I think not implementing this is fine.
Then, we can just not start a new RW PT_LOAD...

While GNU ld places .lbss after .bss, the subsequent sections don't reuse the file offset bytes of BSS. We have a similar missing optimization (implementing it would introduce complexity and likely be error-prone).

(I think objcopy/strip traditionally has some issues with bss including a bug which only happens with lld's layout.)

GCC/GNU ld don't create .ltext, so I omit it... Perhaps you may kindly send a patch to place .ltext, after Clang gets support for .ltext and .ltext.*...

lld/ELF/Writer.cpp
899

I mean .rodata is closer to .text. We have .dynsym ... .lrodata .rodata, but the comment can probably be rewritten in a clearer way.

Splits segments at NOBITS->BITS boundaries, so that a new segment is started between .bss and .ldata (assuming there is .bss).

why is this necessary?

lld/ELF/Writer.cpp
899

Yeah, I think this comment is confusing; something like "Place large sections further from .text" is clearer.

lld/test/ELF/x86-64-section-layout.s
25

shouldn't this be merged with .ldata according to the change in LinkerScript.cpp?

MaskRay updated this revision to Diff 524475. May 22 2023, 1:52 PM
MaskRay marked 3 inline comments as done.

improve a comment

lld/test/ELF/x86-64-section-layout.s
25

No. .ldata2 is picked to test that .ldata2 is not combined with the .ldata and .ldata.* sections.

Splits segments at NOBITS->BITS boundaries, so that a new segment is started between .bss and .ldata (assuming there is .bss).

why is this necessary?

As I noted in the description:

While GNU ld places .lbss after .bss, the subsequent sections don't reuse the file offset bytes of BSS. We have a similar missing optimization (implementing it would introduce complexity and likely be error-prone).

If we start a new PT_LOAD at NOBITS->BITS boundaries, we can in theory implement such a file size optimization.
fixSectionAlignments and assignFileOffsets would need some involved code.
I think this is error-prone and may not be a worthwhile change.

As a reminder, the section layout we're talking about is: .data (PROGBITS RW), .bss (NOBITS RW), .ldata (PROGBITS RW LARGE). The question is whether to cover all three sections with a single LOAD, or whether to use two LOADs, one for .data and .bss, and the second for .ldata.

In the former case, you cannot use the ability of an ELF LOAD to specify a smaller file size than memory size (where the remainder of the memory size gets zero-filled by the loader), because that can only be at the end of the LOAD. Thus, in that model, if you have 8MB of .bss, you will waste 8MB of zeros in the binary.

My patch does the latter, instead, and that saves space, already, without additional changes to other functions. Because of the way the alignment code works, you'll still waste some bytes in the file, but only up to max-page-size bytes (which defaults to 4K for x86-64; users can change it with -z,max-page-size= if they like).
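The trade-off between one LOAD and two can be sketched numerically. The following Python model is an illustration only: the function names, the base address of 0, the 4K page size, and the section sizes are all assumptions, and it ignores everything in the file other than these three sections.

```python
PAGE = 0x1000  # assumed max-page-size

def align_up(value, alignment):
    return (value + alignment - 1) // alignment * alignment

def file_bytes_single_load(data, bss, ldata):
    # .bss sits in the middle of the LOAD, so its zeros must be stored in the file.
    return data + bss + ldata

def file_bytes_two_loads(data, bss, ldata, page=PAGE):
    # .bss ends the first LOAD, so it occupies memory but no file bytes.
    # The second LOAD starts at a page-aligned address past .bss, and its file
    # offset must be congruent to that address modulo the page size.
    ldata_address = align_up(data + bss, page)
    padding = (ldata_address - data) % page
    return data + padding + ldata

data, bss, ldata = 0x1800, 8 * 1024 * 1024, 0x1000
single = file_bytes_single_load(data, bss, ldata)
split = file_bytes_two_loads(data, bss, ldata)
print(hex(single), hex(split), hex(single - split))
```

With these made-up sizes the split layout stores only a sub-page amount of padding in place of the 8MB of zeros.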

aeubanks accepted this revision. May 24 2023, 9:57 AM

The size of .bss is likely non-negligible if we're implementing -mlarge-data-threshold, so this does seem important.

If we start a new PT_LOAD at NOBITS->BITS boundaries, we can in theory implement such a file size optimization.
fixSectionAlignments and assignFileOffsets would need some involved code.
I think this is error-prone and may not be a worthwhile change.

James is saying that this already works?

anyway, this patch lgtm, I can take over James's https://github.com/llvm/llvm-project/commit/1ef68439208701146384d04f58286cdb94623452 for the new LOAD

lld/test/ELF/x86-64-section-layout.s
25

ah I missed that isSectionPrefix also checks for a . after the prefix
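The matching rule described in this comment can be sketched as follows. This is a Python illustration based only on the comment above, with a hypothetical function name; lld's actual helper is written in C++ and may differ in detail.

```python
def is_section_prefix(prefix, name):
    """True if name equals prefix or continues it with a '.'-separated suffix."""
    return name == prefix or name.startswith(prefix + ".")

for candidate in (".ldata", ".ldata.foo", ".ldata2"):
    print(candidate, is_section_prefix(".ldata", candidate))
```

Under this rule .ldata2 does not match the .ldata prefix, which is why the test expects it to stay a separate output section.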

This revision is now accepted and ready to land. May 24 2023, 9:57 AM
MaskRay updated this revision to Diff 525319. May 24 2023, 1:24 PM
MaskRay edited the summary of this revision.

Add Co-authored-by and thanks to Arthur Eubanks

Add a SECTIONS test. Compared with https://github.com/llvm/llvm-project/commit/1ef68439208701146384d04f58286cdb94623452 , this version intentionally doesn't special case SECTIONS commands.

MaskRay marked an inline comment as done. May 24 2023, 1:27 PM

James is saying that this already works?

anyway, this patch lgtm, I can take over James's https://github.com/llvm/llvm-project/commit/1ef68439208701146384d04f58286cdb94623452 for the new LOAD

Yes, it already works. Seems good to incorporate the change in this patch. We waste up to MAXPAGESIZE bytes, but the current behavior should be good enough.

MaskRay updated this revision to Diff 525323. May 24 2023, 1:34 PM
MaskRay edited the summary of this revision.

Use jyknight's comment

respect hasSectionsCommand

tkoeppe accepted this revision. May 24 2023, 2:21 PM
aeubanks accepted this revision. May 24 2023, 2:34 PM

lgtm, but should we explicitly test the segments with llvm-readelf --program-headers?

MaskRay updated this revision to Diff 525369. May 24 2023, 4:48 PM
MaskRay edited the summary of this revision.

Test program headers