This is an archive of the discontinued LLVM Phabricator instance.

[BOLT] Set cold sections alignment explicitly
Closed Public

Authored by yota9 on Mar 10 2022, 10:51 AM.


Details

Reviewers

maksfb
rafauler
Amir

Commits

rG8ab69baad51a: [BOLT] Set cold sections alignment explicitly

Summary

The cold text section alignment is set using the maximum alignment value
passed to emitCodeAlignment. In order to calculate the tentative layout
correctly, we set the alignment of such sections to the maximum possible
function alignment explicitly.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

yota9 created this revision. Mar 10 2022, 10:51 AM

Herald added a project: Restricted Project. Mar 10 2022, 10:51 AM

Herald added a subscriber: ayermolo.

yota9 requested review of this revision. Mar 10 2022, 10:51 AM

Herald added a project: Restricted Project. Mar 10 2022, 10:51 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B153601: Diff 414429. Mar 10 2022, 10:57 AM

Set the min section alignment to AlignFunctions

nit

@rafaelauler Gentle ping :)

Harbormaster completed remote builds in B153827: Diff 414731. Mar 11 2022, 12:42 PM

rafauler added inline comments. Mar 11 2022, 5:47 PM

bolt/lib/Core/BinaryEmitter.cpp
294–296	Why is this necessary? Is it possible to write a testcase that shows where we are getting the layout wrong?

yota9 added inline comments. Mar 12 2022, 7:30 AM

bolt/lib/Core/BinaryEmitter.cpp
294–296	As I said, the cold text section alignment is set using the maximum alignment value passed to emitCodeAlignment (see just below these new lines). The thing is that when we calculate the tentative layout in LongJmp, we currently don't take into account that there will be an alignment gap between text and text.cold (see the new line I've added there). In order to take the cold text alignment into account, we need to know the biggest alignment value we will pass to emitCodeAlignment. So the simple idea here is to align the cold section (which we didn't explicitly align before) to the biggest possible value (opts::AlignFunctions). This way we are guaranteed that it will be the maximum possible value that might be passed to emitCodeAlignment, and we can now calculate the tentative layout more precisely. I will try to create a simple test soon; I hope it won't be difficult.
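The alignment gap described above can be illustrated with a minimal standalone sketch. It does not use LLVM headers: alignUp is a local stand-in for llvm::alignTo, and coldSectionStart is a hypothetical helper name chosen for illustration, not a BOLT function.

```cpp
#include <cassert>
#include <cstdint>

// Round Addr up to the next multiple of Alignment (a power of two).
// Local stand-in for llvm::alignTo so the sketch is self-contained.
constexpr uint64_t alignUp(uint64_t Addr, uint64_t Alignment) {
  return (Addr + Alignment - 1) & ~(Alignment - 1);
}

// Hypothetical helper: address where the cold section starts when the hot
// text ends at HotEnd and the cold section is aligned to SectionAlign.
uint64_t coldSectionStart(uint64_t HotEnd, uint64_t SectionAlign) {
  assert(SectionAlign != 0 && (SectionAlign & (SectionAlign - 1)) == 0 &&
         "alignment must be a power of two");
  return alignUp(HotEnd, SectionAlign);
}
```

For example, with hot text ending at 0x1004 and a section alignment of 64, the cold section starts at 0x1040, so a tentative layout that assumed 0x1004 would be off by 60 bytes for every cold address.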

yota9 marked an inline comment as done. Mar 12 2022, 10:12 AM

yota9 added inline comments.

bolt/lib/Core/BinaryEmitter.cpp
294–296	UPD: I've tried to think of a test, but on aarch64 we can currently trigger only the bl case. To do this we would need to write something that has ~128 MB between the call and the callee, so that the error is triggered after the cold section is created. We can use "rept" in asm, but it will be very slow (both codegen and BOLT) and will create quite a big binary. So I just don't see a reasonable and easy way to create such a test currently.
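To make the ~128 MB figure concrete: the AArch64 BL instruction encodes a signed 26-bit offset in units of 4 bytes, which gives a reach of -2^27 to 2^27 - 4 bytes. The sketch below is a standalone range check under that assumption; fitsInBL and the two constants are hypothetical names for illustration and are not taken from BOLT.

```cpp
#include <cstdint>

// Reach of an AArch64 BL: signed 26-bit immediate scaled by 4 bytes.
constexpr int64_t BLMinOffset = -(int64_t(1) << 27);
constexpr int64_t BLMaxOffset = (int64_t(1) << 27) - 4;

// Hypothetical helper: true if a BL at CallSite can reach Target directly.
bool fitsInBL(uint64_t CallSite, uint64_t Target) {
  int64_t Offset = static_cast<int64_t>(Target - CallSite);
  return Offset >= BLMinOffset && Offset <= BLMaxOffset && Offset % 4 == 0;
}
```

A callee that sits exactly at the edge of the range in the tentative layout stops being reachable once an unaccounted 64-byte alignment gap shifts it: fitsInBL(0x1000, 0x1000 + 0x7FFFFFC) holds, while fitsInBL(0x1000, 0x1000 + 0x7FFFFFC + 64) does not.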

rafauler added inline comments. Mar 14 2022, 4:34 AM

bolt/lib/Passes/LongJmp.cpp
301	From what I understand, the real problem is here, right? We align with the minimum value, but we could be aligning more aggressively depending on "max align bytes", correct? Also, this is done for every function. So if there is an alignment difference between MinAlign and the real alignment, I imagine this difference will grow as we emit more and more functions? What confused me then is why we are aligning only the very first function (via section alignment, since all functions are emitted to the same section). Wouldn't it be better to replicate the logic that computes the correct alignment here? (Use a formula that retrieves how many bytes are needed to align and check it against max align bytes.) The goal of LongJmp is to replicate exactly what the binary emitter is doing (I know this is unfortunate and bug-prone, but I don't think there is a better way to get the exact offsets of the future function layout without calling the assembler and emitting everything).
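The "formula checked against max align bytes" mentioned here corresponds to the per-function step visible in tentativeLayoutRelocColdPart in the diff. A minimal standalone sketch of that step follows; alignUp and offsetToAlign are local stand-ins for llvm::alignTo and llvm::offsetToAlignment, and placeFunction is a hypothetical name used only for illustration.

```cpp
#include <cstdint>

// Round Addr up to the next multiple of Alignment (a power of two).
constexpr uint64_t alignUp(uint64_t Addr, uint64_t Alignment) {
  return (Addr + Alignment - 1) & ~(Alignment - 1);
}

// Number of padding bytes needed to bring Addr up to Alignment.
constexpr uint64_t offsetToAlign(uint64_t Addr, uint64_t Alignment) {
  return alignUp(Addr, Alignment) - Addr;
}

// Hypothetical helper: tentative start address of one function.
// Always align to MinAlign; apply the larger FuncAlign only when the
// padding it requires does not exceed MaxAlignBytes.
uint64_t placeFunction(uint64_t Dot, uint64_t MinAlign, uint64_t FuncAlign,
                       uint64_t MaxAlignBytes) {
  Dot = alignUp(Dot, MinAlign);
  uint64_t Pad = offsetToAlign(Dot, FuncAlign);
  if (Pad <= MaxAlignBytes)
    Dot += Pad;
  return Dot;
}
```

With MinAlign 4, FuncAlign 64 and MaxAlignBytes 32, a function at 0x1021 needs 28 bytes of padding after the minimum alignment and is moved to 0x1040, while a function at 0x1001 would need 60 bytes and stays at 0x1004.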

yota9 added inline comments. Mar 14 2022, 1:29 PM

bolt/lib/Passes/LongJmp.cpp
301	The thing is that we don't align the first function here, we align the section. On function emission we call Streamer.emitCodeAlignment for each function. That function calls emitValueToAlignment, located in MCObjectStreamer.cpp, which contains the following lines: if (ByteAlignment > CurSec->getAlignment()) CurSec->setAlignment(Align(ByteAlignment)); So depending on what we pass to emitValueToAlignment, we might change the section alignment and (of course) the alignment of the very first function. I don't want to overcomplicate the code by trying to first calculate the section alignment, then the function alignments, etc. (this would also have to be done for golang, for example, so for every pass that calculates the layout), so the easiest and most straightforward way I see is to align the cold section to the maximum possible value explicitly. As for wasted space, it is very minor, since ByteAlignment is limited by a uint16_t value (64 KB) and most of the time we are talking about just ~64 bytes. But such a solution significantly simplifies the calculations for such passes :)
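The implicit behavior described above, where each alignment request can only raise the section alignment, can be modeled with a small standalone sketch. SectionModel, requestAlignment and finalSectionAlignment are hypothetical names for illustration; this is a simplified model of the quoted lines, not the MC layer itself.

```cpp
#include <cstdint>
#include <vector>

// Simplified model of a section that only tracks its alignment.
struct SectionModel {
  uint64_t Alignment = 1;
};

// Mirrors the quoted check: a request raises the section alignment only
// if it is larger than the current value.
void requestAlignment(SectionModel &Sec, uint64_t ByteAlignment) {
  if (ByteAlignment > Sec.Alignment)
    Sec.Alignment = ByteAlignment;
}

// Hypothetical helper: section alignment after a sequence of requests,
// starting from an initial (possibly explicitly preset) alignment.
uint64_t finalSectionAlignment(uint64_t Initial,
                               const std::vector<uint64_t> &Requests) {
  SectionModel Sec;
  Sec.Alignment = Initial;
  for (uint64_t R : Requests)
    requestAlignment(Sec, R);
  return Sec.Alignment;
}
```

Starting from 1, the result depends on which functions happen to be emitted: requests {4, 16} give 16, while {4, 64} give 64. Starting from a preset 64, both sequences give 64, which is why a layout pass can assume that value without replaying every request, as long as no request exceeds the preset.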

Oh, I see what you mean now and what's being fixed. Thanks for explaining!
I noticed that what I said about considering "max align bytes" is already being done, so the only thing missing was the section alignment that happens implicitly via a call to "emitCodeAlignment".

This revision is now accepted and ready to land. Mar 15 2022, 6:34 AM

Closed by commit rG8ab69baad51a: [BOLT] Set cold sections alignment explicitly (authored by yota9). Mar 15 2022, 12:13 PM

This revision was automatically updated to reflect the committed changes.

yota9 added a commit: rG8ab69baad51a: [BOLT] Set cold sections alignment explicitly.

yota9 mentioned this in D121728: [BOLT] LongJmp: Fix hot text section alignment. Mar 15 2022, 12:24 PM

yota9 mentioned this in rG62a289d85c9f: [BOLT] LongJmp: Fix hot text section alignment. Mar 16 2022, 5:58 AM

Revision Contents

Path	Size
bolt/include/bolt/Utils/CommandLineOpts.h	1 line
bolt/lib/Core/BinaryEmitter.cpp	6 lines
bolt/lib/Passes/Aligner.cpp	8 lines
bolt/lib/Passes/LongJmp.cpp	4 lines
bolt/lib/Utils/CommandLineOpts.cpp	5 lines

Diff 414429

bolt/include/bolt/Utils/CommandLineOpts.h

	extern llvm::cl::OptionCategory BoltOptCategory;			extern llvm::cl::OptionCategory BoltOptCategory;
	extern llvm::cl::OptionCategory BoltRelocCategory;			extern llvm::cl::OptionCategory BoltRelocCategory;
	extern llvm::cl::OptionCategory BoltOutputCategory;			extern llvm::cl::OptionCategory BoltOutputCategory;
	extern llvm::cl::OptionCategory AggregatorCategory;			extern llvm::cl::OptionCategory AggregatorCategory;
	extern llvm::cl::OptionCategory BoltInstrCategory;			extern llvm::cl::OptionCategory BoltInstrCategory;
	extern llvm::cl::OptionCategory HeatmapCategory;			extern llvm::cl::OptionCategory HeatmapCategory;

	extern llvm::cl::opt<unsigned> AlignText;			extern llvm::cl::opt<unsigned> AlignText;
				extern llvm::cl::opt<unsigned> AlignFunctions;
	extern llvm::cl::opt<bool> AggregateOnly;			extern llvm::cl::opt<bool> AggregateOnly;
	extern llvm::cl::opt<unsigned> BucketsPerLine;			extern llvm::cl::opt<unsigned> BucketsPerLine;
	extern llvm::cl::opt<bool> DiffOnly;			extern llvm::cl::opt<bool> DiffOnly;
	extern llvm::cl::opt<bool> EnableBAT;			extern llvm::cl::opt<bool> EnableBAT;
	extern llvm::cl::opt<bool> RemoveSymtab;			extern llvm::cl::opt<bool> RemoveSymtab;
	extern llvm::cl::opt<unsigned> ExecutionCountThreshold;			extern llvm::cl::opt<unsigned> ExecutionCountThreshold;
	extern llvm::cl::opt<unsigned> HeatmapBlock;			extern llvm::cl::opt<unsigned> HeatmapBlock;
	extern llvm::cl::opt<unsigned long long> HeatmapMaxAddress;			extern llvm::cl::opt<unsigned long long> HeatmapMaxAddress;

bolt/lib/Core/BinaryEmitter.cpp

bool BinaryEmitter::emitFunction(BinaryFunction &Function, bool EmitColdPart) {
MCSection *Section =		MCSection *Section =
BC.getCodeSection(EmitColdPart ? Function.getColdCodeSectionName()		BC.getCodeSection(EmitColdPart ? Function.getColdCodeSectionName()
: Function.getCodeSectionName());		: Function.getCodeSectionName());
Streamer.SwitchSection(Section);		Streamer.SwitchSection(Section);
Section->setHasInstructions(true);		Section->setHasInstructions(true);
BC.Ctx->addGenDwarfSection(Section);		BC.Ctx->addGenDwarfSection(Section);

if (BC.HasRelocations) {		if (BC.HasRelocations) {
		// Set section alignment to the maximum possible object alignment.
		// We need this to support LongJmp and other passes that calculates
		// tentative layout.
		if (Section->getAlignment() == 1)
		Section->setAlignment(Align(opts::AlignFunctions));

Streamer.emitCodeAlignment(BinaryFunction::MinAlign, &*BC.STI);		Streamer.emitCodeAlignment(BinaryFunction::MinAlign, &*BC.STI);
uint16_t MaxAlignBytes = EmitColdPart ? Function.getMaxColdAlignmentBytes()		uint16_t MaxAlignBytes = EmitColdPart ? Function.getMaxColdAlignmentBytes()
: Function.getMaxAlignmentBytes();		: Function.getMaxAlignmentBytes();
if (MaxAlignBytes > 0)		if (MaxAlignBytes > 0)
Streamer.emitCodeAlignment(Function.getAlignment(), &*BC.STI,		Streamer.emitCodeAlignment(Function.getAlignment(), &*BC.STI,
MaxAlignBytes);		MaxAlignBytes);
} else {		} else {
Streamer.emitCodeAlignment(Function.getAlignment(), &*BC.STI);		Streamer.emitCodeAlignment(Function.getAlignment(), &*BC.STI);

bolt/lib/Passes/Aligner.cpp

	using namespace llvm;			using namespace llvm;

	namespace opts {			namespace opts {

	extern cl::OptionCategory BoltOptCategory;			extern cl::OptionCategory BoltOptCategory;

	extern cl::opt<bool> AlignBlocks;			extern cl::opt<bool> AlignBlocks;
	extern cl::opt<bool> PreserveBlocksAlignment;			extern cl::opt<bool> PreserveBlocksAlignment;
				extern cl::opt<unsigned> AlignFunctions;

	cl::opt<unsigned>			cl::opt<unsigned>
	AlignBlocksMinSize("align-blocks-min-size",			AlignBlocksMinSize("align-blocks-min-size",
	cl::desc("minimal size of the basic block that should be aligned"),			cl::desc("minimal size of the basic block that should be aligned"),
	cl::init(0),			cl::init(0),
	cl::ZeroOrMore,			cl::ZeroOrMore,
	cl::Hidden,			cl::Hidden,
	cl::cat(BoltOptCategory));			cl::cat(BoltOptCategory));

	cl::opt<unsigned>			cl::opt<unsigned>
	AlignBlocksThreshold("align-blocks-threshold",			AlignBlocksThreshold("align-blocks-threshold",
	cl::desc("align only blocks with frequency larger than containing function "			cl::desc("align only blocks with frequency larger than containing function "
	"execution frequency specified in percent. E.g. 1000 means aligning "			"execution frequency specified in percent. E.g. 1000 means aligning "
	"blocks that are 10 times more frequently executed than the "			"blocks that are 10 times more frequently executed than the "
	"containing function."),			"containing function."),
	cl::init(800),			cl::init(800),
	cl::ZeroOrMore,			cl::ZeroOrMore,
	cl::Hidden,			cl::Hidden,
	cl::cat(BoltOptCategory));			cl::cat(BoltOptCategory));

	cl::opt<unsigned>			cl::opt<unsigned>
	AlignFunctions("align-functions",
	cl::desc("align functions at a given value (relocation mode)"),
	cl::init(64),
	cl::ZeroOrMore,
	cl::cat(BoltOptCategory));

	cl::opt<unsigned>
	AlignFunctionsMaxBytes("align-functions-max-bytes",			AlignFunctionsMaxBytes("align-functions-max-bytes",
	cl::desc("maximum number of bytes to use to align functions"),			cl::desc("maximum number of bytes to use to align functions"),
	cl::init(32),			cl::init(32),
	cl::ZeroOrMore,			cl::ZeroOrMore,
	cl::cat(BoltOptCategory));			cl::cat(BoltOptCategory));

	cl::opt<unsigned>			cl::opt<unsigned>
	BlockAlignment("block-alignment",			BlockAlignment("block-alignment",

bolt/lib/Passes/LongJmp.cpp

//===- bolt/Passes/LongJmp.cpp --------------------------------------------===//		//===- bolt/Passes/LongJmp.cpp --------------------------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file implements the LongJmpPass class.		// This file implements the LongJmpPass class.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "bolt/Passes/LongJmp.h"		#include "bolt/Passes/LongJmp.h"
#include "llvm/Support/Alignment.h"

#define DEBUG_TYPE "longjmp"		#define DEBUG_TYPE "longjmp"

using namespace llvm;		using namespace llvm;

namespace opts {		namespace opts {
extern cl::OptionCategory BoltOptCategory;		extern cl::OptionCategory BoltOptCategory;
		extern cl::opt<unsigned> AlignFunctions;
extern cl::opt<bool> UseOldText;		extern cl::opt<bool> UseOldText;
extern cl::opt<bool> HotFunctionsAtEnd;		extern cl::opt<bool> HotFunctionsAtEnd;

static cl::opt<bool>		static cl::opt<bool>
GroupStubs("group-stubs",		GroupStubs("group-stubs",
cl::desc("share stubs across functions"),		cl::desc("share stubs across functions"),
cl::init(true),		cl::init(true),
cl::ZeroOrMore,		cl::ZeroOrMore,
if (Cold || BB->isCold()) {
HotDot += BC.computeCodeSize(BB->begin(), BB->end());		HotDot += BC.computeCodeSize(BB->begin(), BB->end());
}		}
}		}
}		}

uint64_t LongJmpPass::tentativeLayoutRelocColdPart(		uint64_t LongJmpPass::tentativeLayoutRelocColdPart(
const BinaryContext &BC, std::vector<BinaryFunction *> &SortedFunctions,		const BinaryContext &BC, std::vector<BinaryFunction *> &SortedFunctions,
uint64_t DotAddress) {		uint64_t DotAddress) {
		DotAddress = alignTo(DotAddress, llvm::Align(opts::AlignFunctions));
for (BinaryFunction *Func : SortedFunctions) {		for (BinaryFunction *Func : SortedFunctions) {
if (!Func->isSplit())		if (!Func->isSplit())
continue;		continue;
DotAddress = alignTo(DotAddress, BinaryFunction::MinAlign);		DotAddress = alignTo(DotAddress, BinaryFunction::MinAlign);
uint64_t Pad =		uint64_t Pad =
offsetToAlignment(DotAddress, llvm::Align(Func->getAlignment()));		offsetToAlignment(DotAddress, llvm::Align(Func->getAlignment()));
if (Pad <= Func->getMaxColdAlignmentBytes())		if (Pad <= Func->getMaxColdAlignmentBytes())
DotAddress += Pad;		DotAddress += Pad;
ColdAddresses[Func] = DotAddress;		ColdAddresses[Func] = DotAddress;
LLVM_DEBUG(dbgs() << Func->getPrintName() << " cold tentative: "		LLVM_DEBUG(dbgs() << Func->getPrintName() << " cold tentative: "
<< Twine::utohexstr(DotAddress) << "\n");		<< Twine::utohexstr(DotAddress) << "\n");
DotAddress += Func->estimateColdSize();		DotAddress += Func->estimateColdSize();

bolt/lib/Utils/CommandLineOpts.cpp


	cl::opt<unsigned>			cl::opt<unsigned>
	AlignText("align-text",			AlignText("align-text",
	cl::desc("alignment of .text section"),			cl::desc("alignment of .text section"),
	cl::ZeroOrMore,			cl::ZeroOrMore,
	cl::Hidden,			cl::Hidden,
	cl::cat(BoltCategory));			cl::cat(BoltCategory));

				cl::opt<unsigned> AlignFunctions(
				"align-functions",
				cl::desc("align functions at a given value (relocation mode)"),
				cl::init(64), cl::ZeroOrMore, cl::cat(BoltOptCategory));

	cl::opt<bool>			cl::opt<bool>
	AggregateOnly("aggregate-only",			AggregateOnly("aggregate-only",
	cl::desc("exit after writing aggregated data file"),			cl::desc("exit after writing aggregated data file"),
	cl::Hidden,			cl::Hidden,
	cl::cat(AggregatorCategory));			cl::cat(AggregatorCategory));

	cl::opt<unsigned>			cl::opt<unsigned>
	BucketsPerLine("line-size",			BucketsPerLine("line-size",