This is an archive of the discontinued LLVM Phabricator instance.

[BOLT][AArch64] Preserve in text object alignment
AbandonedPublic

Authored by yota9 on Jun 7 2022, 8:29 AM.

Details

Reviewers

maksfb
rafauler
Amir

Summary

Some rare cases, such as OpenSSL's KeccakF1600_int, have a loop that
terminates when the address stored in a register is aligned to the object
size. To support such cases we need to preserve the initial object alignment
and align the CI (constant island) when the object was inlined into a function.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

yota9 created this revision. Jun 7 2022, 8:29 AM

Herald added a project: Restricted Project. Jun 7 2022, 8:29 AM

Herald added subscribers: ayermolo, kristof.beyls.

yota9 requested review of this revision. Jun 7 2022, 8:29 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B168312: Diff 434829. Jun 7 2022, 8:41 AM

Gentle ping

LLVM has its own type for alignment:
https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Support/Alignment.h

@tschuett Thank you for your comment. Yes, I know, but BOLT currently uses uint16_t for alignment throughout its sources; refactoring that is out of scope for this patch.

No worries.

Thanks @yota9 for working on this! I have some questions below.

bolt/include/bolt/Core/BinaryFunction.h
1104	In general, BOLT has no way of recovering the correct original alignment. I'm afraid a function that returns this information might mislead users of this API. I'm more comfortable with a function named "guessInputAlignment" than one named "getInputAlignment".
bolt/lib/Passes/Aligner.cpp
178–188	I don't completely follow this part and perhaps we can clarify: why are we aligning a constant island against the presumed alignment of its parent function, and not against the alignment of the constant island itself in the input?
Another thing that could be an issue here is if the function happens to be at an arbitrary round address (such as 0x100000) and now we have to emit the island with MBs worth of alignment because we have no clue what the correct alignment is. That's bound to happen if we are processing the very first function in the .text section and it has a constant island, since that first function will probably be aligned to a page boundary, which may push the constant island beyond the reach of some instructions. That's why in general I think this is a fragile strategy, but if we have to do it and we are resorting to guessing the correct alignment, maybe we should put a cap on it?
But before proceeding with this diff, I have one alternative suggestion. I read the offending code in openssl. Can we try -skip-funcs=problematic_function_name and just skip supporting it? (I'm not sure if that works with AArch64, though.)
I think BOLT is not in a position to support arbitrary assembly code, and code that makes assumptions about the layout of functions is bound to break. This hits AArch64 harder because it is okay to have data in code on AArch64, and some programmers like making assumptions about the layout of the data. That's why it's easier for us to support languages such as C++, whose standard says that doing pointer arithmetic with function/object pointers is undefined behavior.

For reference in this discussion, offending code is here: https://github.com/openssl/openssl/blob/master/crypto/sha/asm/keccak1600-armv8.pl#L87
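
To make the round-address concern above concrete, here is a minimal standalone sketch (not BOLT code) of guessing an alignment from the lowest set bit of an address. The name guessAlignmentFromAddress is hypothetical. It returns a 64-bit value so that a very large guess stays visible instead of being truncated, and it treats address 0 as carrying no information; both choices are assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical helper, not part of BOLT: returns the largest power of two
// that divides Address. Address 0 tells us nothing, so report alignment 1.
uint64_t guessAlignmentFromAddress(uint64_t Address) {
  if (Address == 0)
    return 1;
  // In unsigned arithmetic, ~Address + 1 is the two's complement negation;
  // AND-ing it with Address isolates the lowest set bit.
  return Address & (~Address + 1);
}
```

With this guess, an object that merely happens to sit at 0x100000 is reported as 1 MiB aligned, which is the over-alignment the comment warns about.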

yota9 marked an inline comment as done. Jun 9 2022, 8:00 AM

yota9 added inline comments.

bolt/include/bolt/Core/BinaryFunction.h
1104	I agree, thanks
bolt/lib/Passes/Aligner.cpp
178–188	> why are we aligning a constant island against the presumed alignment of its parent function
This is under the !BF.size() check; in other words, it applies when the object in code is handled by BOLT as an empty function.
> Another thing that could be an issue here is if the function happens to be at an arbitrary round address (such as 0x100000) and now we have to emit the island with MBs worth of alignment because we have no clue what is the correct alignment.
I agree that it might be a problem. How about adding something like an AlignCIMaxBytes option, equal to 512 by default? If the alignment of the CI is higher, a warning would be displayed.
> Can we try -skip-funcs=problematic_function_name and just skip supporting it
It might work, but ideally we would like to process whole objects from the original text. E.g. we have a beta option to remove the old text section from the binary, so -skip-funcs is not an option in that case. I agree that these things are kind of hacky, but it looks like we can handle the majority of cases this way, and I assume it would be nice to have such functionality. For now I will try to add the option above and re-upload the review. Thank you for your comments!
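
The proposed cap can be sketched in isolation as follows. This is only an illustration of the idea, not the patch itself: capAlignment and DefaultMaxCIAlignment are hypothetical names standing in for the AlignCIMaxBytes option, and the warning goes to std::cerr rather than through BOLT's own output stream.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <string>

// Hypothetical stand-in for the proposed AlignCIMaxBytes default.
constexpr uint64_t DefaultMaxCIAlignment = 512;

// Returns Guessed unchanged if it does not exceed Max; otherwise prints a
// warning and returns Max. Max is assumed to be a power of two.
uint64_t capAlignment(uint64_t Guessed, const std::string &Name,
                      uint64_t Max = DefaultMaxCIAlignment) {
  assert(Max != 0 && (Max & (Max - 1)) == 0 && "cap must be a power of two");
  if (Guessed <= Max)
    return Guessed;
  std::cerr << "warning: guessed alignment of " << Name << " (" << Guessed
            << " bytes) exceeds the cap; using " << Max << " bytes\n";
  return Max;
}
```

A cap bounds the padding cost of a wrong guess, but it also means an object that really did require more than the cap loses that alignment, so the warning is the only signal the user gets.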

Add AlignCIMaxBytes option, address comments

Harbormaster completed remote builds in B168838: Diff 435567. Jun 9 2022, 8:46 AM

I want to note that the OpenSSL case is not as simple as "the object is aligned on 512 and has the size of 512 bytes." The size of the iotas function that @rafaelauler referenced above is 192 bytes, and the assumption made by the code is not that it begins at a 0x100-aligned address, but that it ends there. Thus, the patch in its current form does not fix the issue. To handle this case we have to estimate the island size and then emit the island at an address such that its end has the same alignment as in the input binary.
The test also has to check that the end of the CI has the same alignment in the input and in the output.
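
The placement described above can be sketched as simple address arithmetic. This is a standalone illustration under stated assumptions, not BOLT code: placeIslandByEnd and alignUp are hypothetical helpers, the island size is assumed to be known exactly, and EndAlign is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstdint>

// Round Value up to the next multiple of Align (a power of two).
uint64_t alignUp(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "power of two expected");
  return (Value + Align - 1) & ~(Align - 1);
}

// Hypothetical helper: the lowest start address >= Cursor at which an island
// of Size bytes ends exactly on an EndAlign boundary.
uint64_t placeIslandByEnd(uint64_t Cursor, uint64_t Size, uint64_t EndAlign) {
  // Align the earliest possible end address, then step back by the size.
  return alignUp(Cursor + Size, EndAlign) - Size;
}
```

For a 192-byte island and a 0x100 end alignment, an output cursor at 0x1000 yields a start of 0x1040, so the island ends at 0x1100. Note that the resulting start address is generally not itself aligned to EndAlign, so a plain alignment directive on the start cannot express this; explicit padding before the island would be needed.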

Yes, you're right. I didn't notice that .align 8 applies not to the iotas object itself but to the padding zeroes in front of it.

The way we support openSSL users for x86 (or users of any assembly-written libs that have layout assumptions, for that matter) is usually via -skip-funcs.

If we need to fully understand a binary, we might be more aggressive and perhaps even work with the source code of the binary to remove offending code.

TBH it's not like code that makes weird layout assumptions is really important or buying us much performance anyway, so we might just replace it at the source if we can, or just skip it and preserve it untouched in the original section.

Another example is an x86 library that puts data in code. It will break BOLT for x86, and it is not buying any performance; it is likely hurting performance by polluting the icache. So why even bother writing it in assembly language? The worst part is that assembly-language writers are rarely aware of how to properly encode all the metadata needed to comply with the ABI (e.g. the CFI data is completely wrong), so they will often just break it, put wrong symbol sizes in the symbol table, etc.

Once I found code that did a call to a function, but the callee was physically placed right after the CALL, as a form of "inlining" that did not actually remove the CALL instruction. I then had to write a pass to detect "internal calls" just because of that. That's why it is often easier to just -skip-funcs our way out of clowntown code, because I'm afraid there are no limits to assembly-language writers' creativity.

Abandon the diff for now

Revision Contents

Path	Size
bolt/include/bolt/Core/BinaryFunction.h	20 lines
bolt/lib/Core/BinaryEmitter.cpp	4 lines
bolt/lib/Passes/Aligner.cpp	21 lines
bolt/test/AArch64/object-in-code-alignment.s	42 lines

Diff 435567

bolt/include/bolt/Core/BinaryFunction.h

#include "llvm/MC/MCContext.h"		#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCDwarf.h"		#include "llvm/MC/MCDwarf.h"
#include "llvm/MC/MCInst.h"		#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCSymbol.h"		#include "llvm/MC/MCSymbol.h"
#include "llvm/Object/ObjectFile.h"		#include "llvm/Object/ObjectFile.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include <algorithm>		#include <algorithm>
#include <limits>		#include <limits>
		#include <string>
#include <unordered_map>		#include <unordered_map>
#include <unordered_set>		#include <unordered_set>
#include <vector>		#include <vector>

using namespace llvm::object;		using namespace llvm::object;

namespace llvm {		namespace llvm {

struct IslandInfo {
/// Keeps track of other functions we depend on because there is a reference		/// Keeps track of other functions we depend on because there is a reference
/// to the constant islands in them.		/// to the constant islands in them.
IslandProxiesType Proxies, ColdProxies;		IslandProxiesType Proxies, ColdProxies;
SmallPtrSet<BinaryFunction *, 1> Dependency; // The other way around		SmallPtrSet<BinaryFunction *, 1> Dependency; // The other way around

mutable MCSymbol *FunctionConstantIslandLabel{nullptr};		mutable MCSymbol *FunctionConstantIslandLabel{nullptr};
mutable MCSymbol *FunctionColdConstantIslandLabel{nullptr};		mutable MCSymbol *FunctionColdConstantIslandLabel{nullptr};

		// Constant island alignment value
		uint16_t Alignment{0};
// Returns constant island alignment		// Returns constant island alignment
uint16_t getAlignment() const { return sizeof(uint64_t); }		// The minimum required alignment is 8 bytes
		uint16_t getAlignment() const {
		return std::max(Alignment, (uint16_t)sizeof(uint64_t));
		}
		// Set constant island alignment value
		void setAlignment(uint16_t Value) { Alignment = Value; }
};		};

static constexpr uint64_t COUNT_NO_PROFILE =		static constexpr uint64_t COUNT_NO_PROFILE =
BinaryBasicBlock::COUNT_NO_PROFILE;		BinaryBasicBlock::COUNT_NO_PROFILE;

/// We have to use at least 2-byte alignment for functions because of C++ ABI.		/// We have to use at least 2-byte alignment for functions because of C++ ABI.
static constexpr unsigned MinAlign = 2;		static constexpr unsigned MinAlign = 2;

public:

/// Return original address of the function (or offset from base for PIC).		/// Return original address of the function (or offset from base for PIC).
uint64_t getAddress() const { return Address; }		uint64_t getAddress() const { return Address; }

uint64_t getOutputAddress() const { return OutputAddress; }		uint64_t getOutputAddress() const { return OutputAddress; }

uint64_t getOutputSize() const { return OutputSize; }		uint64_t getOutputSize() const { return OutputSize; }

		// Return original alignment value of the function based on its address.
		uint16_t guessInputAlignment() const {
		return 1 << (ffsll(getAddress()) - 1);
		}

/// Does this function have a valid streaming order index?		/// Does this function have a valid streaming order index?
bool hasValidIndex() const { return Index != -1U; }		bool hasValidIndex() const { return Index != -1U; }

/// Get the streaming order index for this function.		/// Get the streaming order index for this function.
uint32_t getIndex() const { return Index; }		uint32_t getIndex() const { return Index; }

/// Set the streaming order index for this function.		/// Set the streaming order index for this function.
void setIndex(uint32_t Idx) {		void setIndex(uint32_t Idx) {
bool isInConstantIsland(uint64_t Address) const {

return std::prev(CodeIter) <= DataIter;		return std::prev(CodeIter) <= DataIter;
}		}

uint16_t getConstantIslandAlignment() const {		uint16_t getConstantIslandAlignment() const {
return Islands ? Islands->getAlignment() : 1;		return Islands ? Islands->getAlignment() : 1;
}		}

		void setConstantIslandAlignment(uint16_t Alignment) {
		assert(Islands && "function expected to have constant islands");
		Islands->setAlignment(Alignment);
		}

uint64_t		uint64_t
estimateConstantIslandSize(const BinaryFunction *OnBehalfOf = nullptr) const {		estimateConstantIslandSize(const BinaryFunction *OnBehalfOf = nullptr) const {
if (!Islands)		if (!Islands)
return 0;		return 0;

uint64_t Size = 0;		uint64_t Size = 0;
for (auto DataIter = Islands->DataOffsets.begin();		for (auto DataIter = Islands->DataOffsets.begin();
DataIter != Islands->DataOffsets.end(); ++DataIter) {		DataIter != Islands->DataOffsets.end(); ++DataIter) {

bolt/lib/Core/BinaryEmitter.cpp

if (!BF.hasIslandsInfo())
return;		return;

BinaryFunction::IslandInfo &Islands = BF.getIslandInfo();		BinaryFunction::IslandInfo &Islands = BF.getIslandInfo();
if (Islands.DataOffsets.empty() && Islands.Dependency.empty())		if (Islands.DataOffsets.empty() && Islands.Dependency.empty())
return;		return;

// AArch64 requires CI to be aligned to 8 bytes due to access instructions		// AArch64 requires CI to be aligned to 8 bytes due to access instructions
// restrictions. E.g. the ldr with imm, where imm must be aligned to 8 bytes.		// restrictions. E.g. the ldr with imm, where imm must be aligned to 8 bytes.
const uint16_t Alignment = OnBehalfOf		const uint16_t Alignment = BF.getConstantIslandAlignment();
? OnBehalfOf->getConstantIslandAlignment()
: BF.getConstantIslandAlignment();
Streamer.emitCodeAlignment(Alignment, &*BC.STI);		Streamer.emitCodeAlignment(Alignment, &*BC.STI);

if (!OnBehalfOf) {		if (!OnBehalfOf) {
if (!EmitColdPart)		if (!EmitColdPart)
Streamer.emitLabel(BF.getFunctionConstantIslandLabel());		Streamer.emitLabel(BF.getFunctionConstantIslandLabel());
else		else
Streamer.emitLabel(BF.getFunctionColdConstantIslandLabel());		Streamer.emitLabel(BF.getFunctionColdConstantIslandLabel());
}		}

bolt/lib/Passes/Aligner.cpp

cl::desc(
"containing function."),		"containing function."),
cl::init(800), cl::Hidden, cl::cat(BoltOptCategory));		cl::init(800), cl::Hidden, cl::cat(BoltOptCategory));

cl::opt<unsigned> AlignFunctionsMaxBytes(		cl::opt<unsigned> AlignFunctionsMaxBytes(
"align-functions-max-bytes",		"align-functions-max-bytes",
cl::desc("maximum number of bytes to use to align functions"), cl::init(32),		cl::desc("maximum number of bytes to use to align functions"), cl::init(32),
cl::cat(BoltOptCategory));		cl::cat(BoltOptCategory));

		cl::opt<unsigned> AlignCIMaxBytes(
		"align-ci-max-bytes",
		cl::desc("maximum number of bytes to use to align constant islands or "
		"in text objects"),
		cl::init(512), cl::cat(BoltOptCategory));

cl::opt<unsigned>		cl::opt<unsigned>
BlockAlignment("block-alignment",		BlockAlignment("block-alignment",
cl::desc("boundary to use for alignment of basic blocks"),		cl::desc("boundary to use for alignment of basic blocks"),
cl::init(16),		cl::init(16),
cl::ZeroOrMore,		cl::ZeroOrMore,
cl::cat(BoltOptCategory));		cl::cat(BoltOptCategory));

cl::opt<bool>		cl::opt<bool>
ParallelUtilities::WorkFuncTy WorkFun = [&](BinaryFunction &BF) {
BinaryContext::IndependentCodeEmitter Emitter =		BinaryContext::IndependentCodeEmitter Emitter =
BC.createIndependentMCCodeEmitter();		BC.createIndependentMCCodeEmitter();

if (opts::UseCompactAligner)		if (opts::UseCompactAligner)
alignCompact(BF, Emitter.MCE.get());		alignCompact(BF, Emitter.MCE.get());
else		else
alignMaxBytes(BF);		alignMaxBytes(BF);

// Align objects that contains constant islands and no code		// Preserve initial alignment of the object that contains islands
// to at least 8 bytes.		// and no code.
if (!BF.size() && BF.hasIslandsInfo()) {		if (!BF.size() && BF.hasIslandsInfo()) {
const uint16_t Alignment = BF.getConstantIslandAlignment();		uint16_t Alignment = BF.guessInputAlignment();
		if (Alignment > opts::AlignCIMaxBytes) {
		outs() << "BOLT-WARNING: input alignment of text object " << BF << " ("
		<< Alignment << " bytes) "
		<< "is more then AlignCIMaxBytes. Setting alignment to "
		<< opts::AlignCIMaxBytes << " bytes.\n";
		Alignment = opts::AlignCIMaxBytes;
		}

		BF.setConstantIslandAlignment(Alignment);
if (BF.getAlignment() < Alignment)		if (BF.getAlignment() < Alignment)
BF.setAlignment(Alignment);		BF.setAlignment(Alignment);

if (BF.getMaxAlignmentBytes() < Alignment)		if (BF.getMaxAlignmentBytes() < Alignment)
BF.setMaxAlignmentBytes(Alignment);		BF.setMaxAlignmentBytes(Alignment);

if (BF.getMaxColdAlignmentBytes() < Alignment)		if (BF.getMaxColdAlignmentBytes() < Alignment)
BF.setMaxColdAlignmentBytes(Alignment);		BF.setMaxColdAlignmentBytes(Alignment);
}		}

bolt/test/AArch64/object-in-code-alignment.s

This file was added.

				// This test checks that the initial alignment of an object in text is preserved.
				// This is needed for cases like KeccakF1600_int in openssl, where the loop
				// is broken when the address of the object entry is aligned on 512 bytes, i.e.
				// the object is aligned on 512 and has the size of 512 bytes.

				# RUN: llvm-mc -filetype=obj -triple aarch64-unknown-unknown %s -o %t.o
				# RUN: %clang %cflags %t.o -o %t.exe -fPIC -pie -Wl,-q -nostdlib
				# RUN: llvm-bolt %t.exe -o %t.bolt -use-old-text=0 -lite=0
				# RUN: llvm-objdump -d -j .text %t.bolt | FileCheck %s
				# RUN: llvm-bolt %t.exe -o /dev/null -use-old-text=0 -lite=0 --no-threads \
				# RUN: -align-ci-max-bytes=64 | FileCheck -check-prefix=CHECKWARN %s

				# CHECKWARN: input alignment of text object table (128 bytes) is more
				# CHECKWARN-SAME: then AlignCIMaxBytes. Setting alignment to 64 bytes.

				.text
				.align 8
				.global dummy
				.type dummy, %object
				dummy:
				.word 255
				.size dummy, .-dummy

				# CHECK-DAG: {{.*}}{{0|8}}0 <table>:
				.align 7
				.global table
				.type table, %object
				table:
				.xword 0xdeadbeef
				.size table, .-table

				.align 2
				.global _start
				.type _start, %function
				_start:
				ldr x2, table
				mov x0, #0
				ret
				.Lci:
				.word 0
				.size _start, .-_start
				# CHECK-DAG: {{.*}}{{0|8}}0: {{.*}} 0xdeadbeef