This is an archive of the discontinued LLVM Phabricator instance.

[BOLT][AArch64] Preserve in text object alignment
AbandonedPublic

Authored by yota9 on Jun 7 2022, 8:29 AM.

Details

Summary

Some of the rare cases like openssl's KeccakF1600_int has a cycle that
breaks when the address stored in register is aligned on the object
size. To support such cases we need to preserve initial object alignment
and align the CI in case the object was inlined into function.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Diff Detail

Event Timeline

yota9 created this revision.Jun 7 2022, 8:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 7 2022, 8:29 AM
yota9 requested review of this revision.Jun 7 2022, 8:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 7 2022, 8:29 AM
yota9 added a comment.Jun 8 2022, 11:39 AM

Gentle ping

yota9 added a comment.Jun 8 2022, 11:48 AM

@tschuett Thank you for your comment. Yes, I know, but currently bolt uses uint16_t for alignment in its sources, this is out of scope of this patch to refactor and change this.

Thanks @yota9 for working on this! I have some questions below.

bolt/include/bolt/Core/BinaryFunction.h
1104

In general in BOLT we have no way of recovering the correct original alignment. I'm afraid having a function that returns this information might be misleading to users of this API, I'm more comfortable with a function named "guessInputAlignment" than one that says "getInputAlignment".

bolt/lib/Passes/Aligner.cpp
178–188

I don't completely follow this part and perhaps we can clarify: why are we aligning a constant island against the presumed alignment of its parent function, and not against the alignment of the constant island itself in the input?

Another thing that could be an issue here is if the function happens to be at an arbitrary round address (such as 0x100000) and now we have to emit the island with MBs worth of alignment because we have no clue what is the correct alignment. That's bound to happen if we are processing the very first function in the .text section and if it has a constant island, since that first function will probably be aligned to a page boundary, which may push the constant island beyond the reach of some instructions. That's why in general I think that's a fragile strategy, but if we have to do it and if we are resorting to guessing the correct alignment, maybe we should put a cap on it? But before proceeding with this diff, I have one alternative suggestion below.

I read the offending code in openssl. Can we try -skip-funcs=problematic_function_name and just skip supporting it (I'm not sure if that works with AArch64, though). I think BOLT is not in a position of supporting arbitrary assembly code, and code that makes assumptions on the layout of functions is bound to break. This hits AArch64 more strongly because it is okay to have data in code for AArch64, and some programmers like making assumptions on the layout of the data. That's why it's easier for us to provide support for languages such as C++, which has a standard that says that doing pointer arithmetic with function/object pointers is undefined behavior.

yota9 marked an inline comment as done.Jun 9 2022, 8:00 AM
yota9 added inline comments.
bolt/include/bolt/Core/BinaryFunction.h
1104

I agree, thanks

bolt/lib/Passes/Aligner.cpp
178–188

why are we aligning a constant island against the presumed alignment of its parent function

This is under !BF.size() if, another words if it is object in code handled as empty function by BOLT.

Another thing that could be an issue here is if the function happens to be at an arbitrary round address (such as 0x100000) and now we have to emit the island with MBs worth of alignment because we have no clue what is the correct alignment.

I agree, that it might be the problem. How about to add smth like AlignCIMaxBytes option, equal to 512 by default? If the alignment of CI is higher the warning would be displayed.

Can we try -skip-funcs=problematic_function_name and just skip supporting it

It might work, but ideally we would like to process whole objects from the original text. E.g. we've got beta option to remove old text section from the binary, so skip funcs is not an option in that case.

I agree that these things are kind of hacky, but it looks like in this case we can handle the majority of the cases, I assume it would be nice to have such functionality.. For now I will try to add the option above and re upload the review.

Thank you for your comments!

yota9 updated this revision to Diff 435567.Jun 9 2022, 8:36 AM
yota9 marked an inline comment as done.

Add AlignCIMaxBytes option, address comments

treapster added subscribers: rafaelauler, treapster.EditedJun 10 2022, 1:43 PM

I want to note that the case of openSSL is not as simple as "the object is aligned on 512 and has the size of 512 bytes." The size of iotas function which @rafaelauler referenced above is 192 bytes, and the assumption made by the code is not that it begins at 0x100-aligned address, but that it ends there. Thus, the patch in its current form does not fix the issue. To handle this case we have to estimate island size and then emit it at such address that the end of it has the same alignment as in the input binary.
And the test also has to check that the end of CI has the same alignment in the input and output

yota9 added a comment.Jun 10 2022, 1:53 PM

Yes, you're right, I didn't notice that it is not iotas object aligned to .align 8, but the aligning zeroes.

The way we support openSSL users for x86 (or users of any assembly-written libs that have layout assumptions, for that matter) is usually via -skip-funcs.

If we need to fully understand a binary, we might be more aggressive and perhaps even work with the source code of the binary to remove offending code.

TBH it's not like code that makes weird layout assumptions is really important/buying us much performance anyway... so we might just replace it if we can., at the source, or just skip it / preserve it untouched in the original section.

Another example comes from x86 library that puts data in code, for example. It will break BOLT for x86, it is not buying any performance, likely hurting performance by polluting icache. So why even bother writing it in assembly language. The worst part is that assembly-language writers are rarely aware of how to properly encode all the necessary metadata to comply with the ABI (e.g. CFI data completely wrong), so they will often just break it, put wrong symbol sizes in the symbol table, etc.

Once I found code that did a call to a function, but the callee was physically put right after the CALL, in a form of "inlining" but not really removing the CALL instruction. Then I had to write a pass to detect "internal calls" just because of that. That's why it is often easier to just -skip-funcs our way out of clowntown code because I'm afraid there are no limits to assembly-language writer's creativity.

yota9 abandoned this revision.Jun 13 2022, 5:47 AM

Abandon the diff for now