This is an archive of the discontinued LLVM Phabricator instance.

[BOLT] Set cold sections alignment explicitly
ClosedPublic

Authored by yota9 on Mar 10 2022, 10:51 AM.

Details

Summary

The cold text section alignment is set using the maximum alignment value
passed to the emitCodeAlignment. In order to calculate tentetive layout
right we will set the alignment of such sections to the maximum possible
function alignment explicitly.

Diff Detail

Event Timeline

yota9 created this revision.Mar 10 2022, 10:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 10 2022, 10:51 AM
Herald added a subscriber: ayermolo. · View Herald Transcript
yota9 requested review of this revision.Mar 10 2022, 10:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 10 2022, 10:51 AM
yota9 updated this revision to Diff 414730.Mar 11 2022, 12:25 PM

Set the min section alignment to AlignFunctions

rafauler added inline comments.Mar 11 2022, 5:47 PM
bolt/lib/Core/BinaryEmitter.cpp
294–296

Why is this necessary? Is it possible to write a testcase that shows where we are getting the layout wrong?

yota9 added inline comments.Mar 12 2022, 7:30 AM
bolt/lib/Core/BinaryEmitter.cpp
294–296

As I said cold text section alignment is set using the maximum alignment value
passed to the emitCodeAlignment (see just under of these new lines). The thing is that when we calculate tentative layout in LongJmp currently we don't take into account that there will be the alignment gap between text and text.cold (see new line I've added there). In order to take cold text alignment in to the account we need to know the biggest alignment value we will pass to the emitCodeAlignment. So the simple idea here is to align the cold section ( which we didn't explicitly aligned before ) with the biggest possible value (opts::AlignFunctions). This way we are guaranteed that it will be the maximum possible value that might be passed to the emitCodeAlignment and now we can easily calculate tentative layout more precise.
I will try to create the simple test soon, I hope it won't be difficult.

yota9 marked an inline comment as done.Mar 12 2022, 10:12 AM
yota9 added inline comments.
bolt/lib/Core/BinaryEmitter.cpp
294–296

UPD: I've tried to think of the test, but as for aarch64 we now can trigger only the bl case. To do this we should write something, that will have between the call and callee ~128mb, to trigger the error after cold section will be created. We can use "rept" in asm, but it will be very slow (both codegen and bolt) and quite a big binary will be created. So I just don't see the reasonable and easy way to create such test currently..

rafauler added inline comments.Mar 14 2022, 4:34 AM
bolt/lib/Passes/LongJmp.cpp
301

From what I understand, the real problem though is here, right?

We align with the minimum value, but we could be aligning more aggressively depending on "max align bytes", correct?

Also, this is done for every function. So if there is an alignment difference created from MinAlign vs. real alignment, I imagine this difference will increase as we emit more and more functions? What confused me then is why are we aligning only the very first function (via section alignment, since all functions are emitted to the same section). Wouldn't be better to replicate the logic to compute the correct alignment here? (Use a formula that retrieves how many bytes need to align and check it against max align bytes). The goal of LongJmp is to replicate exactly what binary emitter is doing (I know this is unfortunate and bug prone, but I don't think there is a better way to have access to the exact offsets of the future layout of functions without calling the assembler and emitting everything)

yota9 added inline comments.Mar 14 2022, 1:29 PM
bolt/lib/Passes/LongJmp.cpp
301

The thing is that we don't align the first function here, we align the section.
On function emittion we call

Streamer.emitCodeAlignment

for each function.

That function calls emitValueToAlignment located in MCObjectStreamer.cpp that contains the following lines:

if (ByteAlignment > CurSec->getAlignment())
  CurSec->setAlignment(Align(ByteAlignment));

So depending on what we passed in emitValueToAlignment we might change the section alignment and (of course) the alignment of the very first function. I don't want to overcomplicate the code trying to first to calculate the section alignment that the functions alignment & etc (this should also be done for golang for example, so for every pass that will calculate the layout), so the easiest and the most straightforward way I see is to align the cold section using the maximum possible value explicitly. As for the space-wasting this is very minor, since the ByteAlignment is limited by uint16_t value (64kb) and most of the time we are speaking about just a ~64 bytes. But such a solution will significantly simplify the calculations for such passes :)

rafauler accepted this revision.Mar 15 2022, 6:34 AM

Oh, I see what you mean now and what's being fixed. Thanks for explaining!
I noticed that what I said about considering "max align bytes" is already being done, so the only thing missing was the section alignment that happens implicitly via a call to "emitCodeAlignment".

This revision is now accepted and ready to land.Mar 15 2022, 6:34 AM
This revision was automatically updated to reflect the committed changes.