The cold text section alignment is set using the maximum alignment value
passed to the emitCodeAlignment. In order to calculate tentetive layout
right we will set the alignment of such sections to the maximum possible
function alignment explicitly.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
bolt/lib/Core/BinaryEmitter.cpp | ||
---|---|---|
294–296 | Why is this necessary? Is it possible to write a testcase that shows where we are getting the layout wrong? |
bolt/lib/Core/BinaryEmitter.cpp | ||
---|---|---|
294–296 | As I said cold text section alignment is set using the maximum alignment value |
bolt/lib/Core/BinaryEmitter.cpp | ||
---|---|---|
294–296 | UPD: I've tried to think of the test, but as for aarch64 we now can trigger only the bl case. To do this we should write something, that will have between the call and callee ~128mb, to trigger the error after cold section will be created. We can use "rept" in asm, but it will be very slow (both codegen and bolt) and quite a big binary will be created. So I just don't see the reasonable and easy way to create such test currently.. |
bolt/lib/Passes/LongJmp.cpp | ||
---|---|---|
301 | From what I understand, the real problem though is here, right? We align with the minimum value, but we could be aligning more aggressively depending on "max align bytes", correct? Also, this is done for every function. So if there is an alignment difference created from MinAlign vs. real alignment, I imagine this difference will increase as we emit more and more functions? What confused me then is why are we aligning only the very first function (via section alignment, since all functions are emitted to the same section). Wouldn't be better to replicate the logic to compute the correct alignment here? (Use a formula that retrieves how many bytes need to align and check it against max align bytes). The goal of LongJmp is to replicate exactly what binary emitter is doing (I know this is unfortunate and bug prone, but I don't think there is a better way to have access to the exact offsets of the future layout of functions without calling the assembler and emitting everything) |
bolt/lib/Passes/LongJmp.cpp | ||
---|---|---|
301 | The thing is that we don't align the first function here, we align the section. Streamer.emitCodeAlignment for each function. That function calls emitValueToAlignment located in MCObjectStreamer.cpp that contains the following lines: if (ByteAlignment > CurSec->getAlignment()) CurSec->setAlignment(Align(ByteAlignment)); So depending on what we passed in emitValueToAlignment we might change the section alignment and (of course) the alignment of the very first function. I don't want to overcomplicate the code trying to first to calculate the section alignment that the functions alignment & etc (this should also be done for golang for example, so for every pass that will calculate the layout), so the easiest and the most straightforward way I see is to align the cold section using the maximum possible value explicitly. As for the space-wasting this is very minor, since the ByteAlignment is limited by uint16_t value (64kb) and most of the time we are speaking about just a ~64 bytes. But such a solution will significantly simplify the calculations for such passes :) |
Oh, I see what you mean now and what's being fixed. Thanks for explaining!
I noticed that what I said about considering "max align bytes" is already being done, so the only thing missing was the section alignment that happens implicitly via a call to "emitCodeAlignment".
Why is this necessary? Is it possible to write a testcase that shows where we are getting the layout wrong?