This is an archive of the discontinued LLVM Phabricator instance.

Differential D126177

[BOLT] [AArch64] Handle constant islands spanning multiple functions
ClosedPublic

Authored by treapster on May 22 2022, 1:49 PM.

Details

Reviewers

yota9
rafauler
Amir
maksfb

Commits

rG8579db96e8c3: [BOLT] [AArch64] Handle constant islands spanning multiple functions

Summary

Fix BOLT's constant island mapping when a constant island marked by $d spans multiple functions. Currently, because BOLT only marks the constant island in the first function where $d is located, if the next function contains data at its start, BOLT will miss the data and try to disassemble it. This patch adds code to explicitly go through all symbols between $d and $x markers and mark their respective offsets as data, which stops BOLT from trying to disassemble data. It also adds MarkerType enum and refactors related functions.
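
The approach described in the summary can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the patch: the `Sym` struct and the `expandMarkers` function are hypothetical stand-ins, and each symbol is assumed to be already classified as a code marker ($x), a data marker ($d), or an ordinary symbol. The sketch walks symbols sorted by address and emits a data marker for every symbol that falls between a $d and the next $x.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

enum class MarkerSymType : char { NONE = 0, CODE, DATA };

// Hypothetical stand-in for a symbol-table entry that has already been
// classified: CODE for $x, DATA for $d, NONE for any other symbol.
struct Sym {
  uint64_t Address;
  MarkerSymType Type;
};

struct MarkerSym {
  uint64_t Address;
  MarkerSymType Type;
};

// Assumes Sorted is ordered by address, with real markers placed before
// other symbols that share their address.
std::vector<MarkerSym> expandMarkers(const std::vector<Sym> &Sorted) {
  assert(std::is_sorted(Sorted.begin(), Sorted.end(),
                        [](const Sym &A, const Sym &B) {
                          return A.Address < B.Address;
                        }) &&
         "symbols must be sorted by address");

  std::vector<MarkerSym> Out;
  bool InData = false; // true while we are between a $d and the next $x
  for (const Sym &S : Sorted) {
    // Emit at most one marker per address.
    if (!Out.empty() && Out.back().Address == S.Address)
      continue;

    if (S.Type != MarkerSymType::NONE) {
      // A real marker: record it and update the current state.
      Out.push_back({S.Address, S.Type});
      InData = (S.Type == MarkerSymType::DATA);
    } else if (InData) {
      // An ordinary symbol inside a data range gets its own data marker.
      Out.push_back({S.Address, MarkerSymType::DATA});
    }
  }
  return Out;
}
```

With a layout like the one in the test input (a single $d.0 at the address of first, then second, then $x.1), this yields data markers at the addresses of both first and second, followed by one code marker.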

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

treapster created this revision. May 22 2022, 1:49 PM

Herald added a reviewer: rafauler. May 22 2022, 1:49 PM

Herald added a reviewer: Amir.

Herald added a reviewer: maksfb.

Herald added a project: Restricted Project.

Herald added subscribers: ayermolo, kristof.beyls.

treapster requested review of this revision. May 22 2022, 1:49 PM

Herald added a project: Restricted Project. May 22 2022, 1:49 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B165760: Diff 431260. May 22 2022, 1:49 PM

treapster retitled this revision from [BOLT] Mark every symbol in $d-$x range as data to [BOLT] Fix disassembly of data in text. May 22 2022, 2:01 PM

treapster edited the summary of this revision.

treapster changed the edit policy from "All Users" to "Administrators".

probably we should also fix the following

bolt/lib/Core/Exceptions.cpp +498

- uint64_t Offset = 0;

+ uint64_t Offset = Function.getFirstInstructionOffset();

test case:

int extra_space() {
  asm volatile (".space 256, 0xff\n");
  return 0;
}

int main(int argc, char **argv) {
  int (*fn)(void);
  fn = extra_space + 256;
  fn();
  return 0;
}

CFLAGS=-O2 -fPIE -Wl,-q -Wl,-pie

llvm-bolt main -instrument -o main.instr -instrumentation-file=main.fdata -instrumentation-sleep-time=180 -instrumentation-no-counters-clear -v 2

Removed getFirstInstructionOffset to commit it separately, rebased diff

Remove empty line

Harbormaster completed remote builds in B166061: Diff 431686. May 24 2022, 8:43 AM

"treapster changed the edit policy from "All Users" to "Administrators"."

can you change this back? I can't review this diff without edit access.

I'm logged in another account because phabricator won't let me submit the review I wrote in the original reviewer account. I'm going to paste my full review as a subscriber instead of reviewer to try to bypass the "you can't edit this diff" permission thing. Here it goes:

Hi @treapster, thanks for working on this and for submitting this patch!

It looks like the pre-merge check is failing:

Failed Tests (1):

BOLT :: AArch64/unmarked-data.s

https://buildkite.com/llvm-project/premerge-checks/builds/94328#fa09bc94-6223-4f63-93d0-e5c7aae21e66

The check is also failing for me locally. See my comments below.

BinaryContext.h Line 85:

Add a comment

// AArch64-specific symbol markers used to delimit code/data in .text.

Lines 675-676:
We should avoid exporting these in the public interface of BinaryContext if they are not (currently) used outside of BinaryContext. We should leave them as internal helper functions inside BinaryContext.cpp

RewriteInstance.cpp lines 893-895:

If both A and B are markers at the same address, then we cannot establish order according to this function. This comparison function returns:

A < B ? False
B < A ? False

Which breaks the requirement on std::sort to have the comparison function define a total order among elements.

Line 918:
I don't think we need to save memory in discoverFileObjects() (this is not a step in BOLT that achieves the peak of memory consumption), so feel free to drop attribute packed. This attribute is not used in the LLVM codebase as far as I can tell.

Line 1272:

In theory, you wouldn't need to call markDataAtOffset for consecutive objects. For example, if you have:

0x10 : code
0x14 : data
0x18: data
0x20: code

You would only need to call markDataAtOffset(0x14).

But after this patch, we will be calling markDataAtOffset(0x18) if we have any symbol at 0x18. So why was the change necessary? Am I right in my interpretation that this is to cover the case when 0x18 would be located in the next BinaryFunction, for some reason? In that case, there is a side effect that AddressToConstantIslandMap will possibly be bloated with many more entries. But I guess that's OK? I would like @yota9's approval here if possible.
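
To illustrate the cross-function case asked about above, here is a small sketch using the addresses from the example. It assumes, purely for illustration, that data marks are recorded per function as offsets from the function start. `Func`, `markDataAt`, and `startsWithData` are hypothetical names, not BOLT APIs. If one function covers 0x10-0x18 and the next one starts at 0x18, a single mark at 0x14 leaves the second function with no record that its first bytes are data.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for a function's address range and the offsets
// (relative to Start) at which data begins inside it.
struct Func {
  uint64_t Start;
  uint64_t End; // one past the last byte
  std::set<uint64_t> DataOffsets;
};

// Record a data mark in the function containing Address.
// Returns false for a stray marker that belongs to no function.
bool markDataAt(std::vector<Func> &Funcs, uint64_t Address) {
  for (Func &F : Funcs) {
    assert(F.Start < F.End && "empty function range");
    if (Address >= F.Start && Address < F.End) {
      F.DataOffsets.insert(Address - F.Start);
      return true;
    }
  }
  return false;
}

// True if the function has a data mark at its very first byte.
bool startsWithData(const Func &F) { return F.DataOffsets.count(0) != 0; }
```

In this model, marking only 0x14 records offset 4 in the first function and nothing in the second. Marking 0x18 as well records offset 0 in the second function, which is the situation the patch is meant to handle.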

In this case, I would rewrite the description of this patch as:
[BOLT][AArch64] Handle data markers spanning multiple functions
"Fix BOLT's constant island mapping when a data marker ($d) spans multiple functions. Currently, because BOLT only marks the constant island in the first function where $d is located, if the next function contains data at its start, BOLT will miss the data marker and try to disassemble it. This patch adds code to explicitly go through all symbols between $d and $x markers and mark their respective offsets as data, which stops BOLT from trying to disassemble data. It also adds MarkerType enum and refactors related functions."

unmarked-data.s line 3:

I think I know why this test is failing in pre-merge. If you call the driver (clang) to link your code, it will break in X86 because we don't have an AArch64 toolchain in X86. You would need to move this test to runtime/.

But in an X86 machine, we do have an AArch64 assembler and linker, we just don't have AArch64 libs to link in your testcase. Because we're not running the executable, there's no need to call the driver (clang) to link in libs. So we can rewrite this by calling the assembler (llvm-mc) and the linker (ld.lld). As an example on how to do this, see https://github.com/llvm/llvm-project/blob/main/bolt/test/X86/gotpcrelx.s#L15

treapster changed the edit policy from "Administrators" to "All Users". May 25 2022, 12:16 AM

treapster retitled this revision from [BOLT] Fix disassembly of data in text to [BOLT] [AArch64] Handle data markers spanning multiple functions. May 25 2022, 1:38 AM

treapster edited the summary of this revision.

Thank you for the feedback, @rafaelauler ! I fixed the test and removed packed attribute as you said.

As for the markerType-related functions, before this patch they depended on BC to check whether arch is AArch64, so I preserved this check and moved them to BC itself. However, I don't think this check is essential - because theoretically nothing stops you from putting data in text on other architectures and marking it with the same $x/$d symbols. If we agree on it, we can remove this dependency and make getMarkerType and related functions either methods of SymbolRef or freestanding functions that only get passed symbolRef.

Regarding comparison function, I think it would be weird for a binary to have more than one marker at the same offset in the first place, but even if it had, they just would be considered equal by the comparison function - same as with two functions on the same offset or anything else. I don't see where it breaks requirements of sort. The basic requirement is that comparator cannot return true both ways, and mine does not.
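
The point about the comparator can be checked with a small sketch. `Sym` and `compareSymbols` here are a hypothetical simplification of the lambda in the patch: only the address and marker rules are kept, and the FUNC/SECTION tie-breaks are omitted. Two markers at the same address compare false in both directions, which a strict weak ordering treats as equivalence, and std::stable_sort then keeps them in their input order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified symbol: only what the comparator needs.
struct Sym {
  uint64_t Address;
  bool IsMarker;
  int Id; // original position, used only to observe stability
};

// Lower address first; at the same address, markers before non-markers.
bool compareSymbols(const Sym &A, const Sym &B) {
  if (A.Address != B.Address)
    return A.Address < B.Address;
  if (A.IsMarker || B.IsMarker)
    return A.IsMarker && !B.IsMarker;
  return false;
}

// Two elements are equivalent under a strict weak ordering when neither
// one is ordered before the other.
bool equivalent(const Sym &A, const Sym &B) {
  return !compareSymbols(A, B) && !compareSymbols(B, A);
}

std::vector<Sym> sortSymbols(std::vector<Sym> Syms) {
  std::stable_sort(Syms.begin(), Syms.end(), compareSymbols);
  assert(std::is_sorted(Syms.begin(), Syms.end(), compareSymbols));
  return Syms;
}
```

What the sort requirements rule out is the comparator returning true in both directions (or for an element compared with itself). Returning false in both directions is allowed.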

And speaking of constantIsland map, I don't think marking every island will impact performance or memory usage in any significant way, but as a result data in text will be handled properly.

Harbormaster completed remote builds in B166220: Diff 431908. May 25 2022, 2:24 AM

Thanks, @treapster!

Sorry, got confused, you're absolutely right on the comparison function requirement.

Regarding the check, I would prefer to preserve the check on AArch64 to reduce the amount of work we do for X86 symbols. I think we can keep these functions in BinaryContext. We wouldn't be able to push this to SymbolRef anyway because this lives in another llvm lib, and we would need to better justify its usage outside just BOLT. I just felt that you put "isDataMarker" and "isCodeMarker" there in case you would need these functions (as we had them before), but then you wrote the solution in a way that doesn't use them, so they look like dead code to me and should be removed (leaving only getMarkerType and isMarker).

I gave another thought on this and I agree with you on the other topics (efficiency, etc), so this is good to go from my perspective, pending fixing the testcase and removing isCodeMarker/isDataMarker. Thanks for working on this and submitting the patch!

bolt/test/AArch64/unmarked-data.s
3 ↗	(On Diff #431908)	I guess you have to include -triple aarch64-unknown-linux as an argument to llvm-mc, otherwise it will fail in non-aarch64 machines
5–9 ↗	(On Diff #431908)	While I do appreciate unix style command line processing, I feel we should be careful with this in case we run this in an environment without shell (e.g. Windows). I'm not an expert in Windows support and we currently do not have a Windows buildbot for BOLT, but I feel this would probably fail in Visual Studio. We should probably remove these lines too as, from a test perspective, this is enforcing that llvm-mc and ld.lld always behave in that specific way regarding emission of "$d" symbols. If somebody tries to change that, this test will break, even though they are not touching BOLT. Here we should limit ourselves to test BOLT.

I tested on Visual Studio and the test does fail. We also have 20 other tests failing on VS. I should probably take a look at these.

treapster added inline comments. May 26 2022, 12:59 AM

bolt/test/AArch64/unmarked-data.s
5–9 ↗	(On Diff #431908)	I totally agree with you that we should avoid depending on the shell or on specific behavior of the assembler/linker, but the only way I see to remove this dependency is to use a fixed prebuilt binary with omitted markers. As far as I know, such tests are discouraged in LLVM, but we can probably use YAML with yaml2obj? I'll try to recreate it in YAML.

Removed isDataMarker/isCodeMarker, fixed test to be independent of shell and linker.

Harbormaster completed remote builds in B166432: Diff 432220. May 26 2022, 2:11 AM

@rafauler, even though isMarker and getMarkerType fit well in BinaryContext, I wouldn't say we will do unnecessary work on x86 if we extract them, because we check the arch at RewriteInstance.cpp:951 anyway. isMarker is also used once in BinaryFunction::isSymbolValidInScope, but it's called there anyway, and if we don't want to call it on X86 we can add that check to the if. This is largely irrelevant and makes no practical difference, but I feel it's more natural when being or not being a marker is an independent property of a symbol. But these are just my thoughts.

Speaking of tests, when I run check-bolt on AArch64 EulerOS, most x86 tests fail, and so does AArch64/asm-func-debug.test. Since they fail even without my patch, I figured it's not a bug but a feature. But buildkite seems to run fine after changing the test to YAML, so I guess I didn't break anything.

tschuett added a subscriber: tschuett. May 26 2022, 2:45 AM

tschuett added inline comments.

bolt/lib/Core/BinaryContext.cpp
1646	do you want to make the AARCH64 check an assert?

treapster added inline comments. May 26 2022, 3:24 AM

bolt/lib/Core/BinaryContext.cpp
1646	No, I'm saying the opposite: if we remove the call to isAArch64 from here, we remove the dependency on BinaryContext and can use the function without it, checking the arch separately when we need to.

Thanks @treapster. I appreciate the robustness of YAML for this test here, but I do like to have the source code so it's easier to read. Do you think you could put the original assembly code in the test just as a reference, but don't assemble it, just add a comment that the YAML was generated out of that code?

I agree you can move the check to callers of getMarkerType. But if the intent in doing so is to move getMarkerType to SymbolRef, then I think it's best not to go in that direction. SymbolRef is defined by another library, LLVM's Object lib, which is not BOLT-specific. I have two concerns regarding that: (1) I'm skeptical this would be useful outside of BOLT, and (2) talking specifically about the SymbolRef class (ObjectFile.h), this is a very general class that is limited to only the very basic operations you can do on a symbol, and it is not ELF-specific -- it's actually the general interface for all symbols in all object formats in LLVM. In contrast, getMarkerType is an ELF-specific, AArch64 ABI-specific method to identify data. Because of that I think it would be best if we leave this in BinaryContext, which is BOLT's IR module class, containing general information we need to disassemble and process a binary.

This revision is now accepted and ready to land. May 26 2022, 12:14 PM

LGTM Thanks @treapster

@rafauler, added assembly for the test. Regarding getMarkerType, the intent of removing the architecture check is not to move the function to SymbolRef, but primarily to decouple it from the BinaryContext instance - probably leaving it in BinaryContext.cpp. If you think it fits there as a method, well, then we can leave it as it is.
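
As a rough sketch of what such a decoupled helper could look like: the function below depends only on the symbol, and the caller decides whether markers matter for the target. `SymbolInfo`, `isNamed`, and `getMarkerTypeFor` are hypothetical names introduced for illustration. `SymbolInfo` stands in for the three properties the patch reads from the symbol (name, size, type) and is not llvm::object::SymbolRef.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

enum class MarkerSymType : char { NONE = 0, CODE, DATA };

// Hypothetical, simplified view of a symbol.
struct SymbolInfo {
  std::string Name;
  uint64_t Size;
  bool HasKnownType; // false plays the role of SymbolRef::ST_Unknown
};

// Matches "Base" exactly or "Base.<suffix>", e.g. "$d" or "$d.0".
static bool isNamed(const std::string &Name, const std::string &Base) {
  assert(!Base.empty() && "marker base name must not be empty");
  if (Name.compare(0, Base.size(), Base) != 0)
    return false;
  return Name.size() == Base.size() || Name[Base.size()] == '.';
}

// Free function: no context object and no architecture check.
MarkerSymType getMarkerType(const SymbolInfo &Sym) {
  if (Sym.Size != 0 || Sym.HasKnownType)
    return MarkerSymType::NONE;
  if (isNamed(Sym.Name, "$x"))
    return MarkerSymType::CODE;
  if (isNamed(Sym.Name, "$d"))
    return MarkerSymType::DATA;
  return MarkerSymType::NONE;
}

// The architecture check lives with the caller instead.
MarkerSymType getMarkerTypeFor(bool IsAArch64, const SymbolInfo &Sym) {
  return IsAArch64 ? getMarkerType(Sym) : MarkerSymType::NONE;
}
```

The trade-off discussed in the thread is visible here: the free function is easier to reuse, but every caller has to remember the architecture check that the BinaryContext method performs internally.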

Harbormaster completed remote builds in B166528: Diff 432361. May 26 2022, 1:06 PM

I'm OK with moving that out of BinaryContext class, I don't have any strong opinions on it. If you want to do that, I'll approve the diff as well, but I also think it is good as is too.

By the way, if you don't have commit access and want me to commit this, let me know.

Amir added inline comments. May 26 2022, 11:28 PM

bolt/lib/Core/BinaryContext.cpp
1649	nit: please drop `llvm::` here and below.

Removed unnecessary llvm:: qualification, reworded title and summary.
@rafauler, I don't have commit access, so I'll be glad if you commit.

Harbormaster completed remote builds in B166701: Diff 432630. May 27 2022, 2:04 PM

Removed llvm:: qualifications I missed last time.

treapster marked an inline comment as done and an inline comment as not done. May 27 2022, 2:07 PM

Harbormaster completed remote builds in B166703: Diff 432633. May 27 2022, 2:12 PM

@treapster When I run arc patch, phabricator won't pull your author name. It writes the commit under my name. Would you care to provide me your author string so I can amend the commit with the correct author before pushing it to the repo?

e.g. my author string is "Rafael Auler <rafaelauler@fb.com>"

@rafauler, my author string is Denis Revunov <revunov.denis@huawei-partners.com>. Thanks for your time!

Closed by commit rG8579db96e8c3: [BOLT] [AArch64] Handle constant islands spanning multiple functions (authored by treapster, committed by rafauler). May 31 2022, 1:52 PM

This revision was automatically updated to reflect the committed changes.

rafauler added a commit: rG8579db96e8c3: [BOLT] [AArch64] Handle constant islands spanning multiple functions.

Revision Contents

Path                                           Size
bolt/include/bolt/Core/BinaryContext.h         12 lines
bolt/include/bolt/Core/BinaryFunction.h        5 lines
bolt/lib/Core/BinaryContext.cpp                29 lines
bolt/lib/Core/BinaryFunction.cpp               29 lines
bolt/lib/Rewrite/RewriteInstance.cpp           116 lines
bolt/test/AArch64/Inputs/unmarked-data.yaml    90 lines
bolt/test/AArch64/unmarked-data.test           34 lines

Diff 433177

bolt/include/bolt/Core/BinaryContext.h

[...]	struct SegmentInfo {
};		};
};		};

inline raw_ostream &operator<<(raw_ostream &OS, const SegmentInfo &SegInfo) {		inline raw_ostream &operator<<(raw_ostream &OS, const SegmentInfo &SegInfo) {
SegInfo.print(OS);		SegInfo.print(OS);
return OS;		return OS;
}		}

		// AArch64-specific symbol markers used to delimit code/data in .text.
		enum class MarkerSymType : char {
		NONE = 0,
		CODE,
		DATA,
		};

enum class MemoryContentsType : char {		enum class MemoryContentsType : char {
UNKNOWN = 0, /// Unknown contents.		UNKNOWN = 0, /// Unknown contents.
POSSIBLE_JUMP_TABLE, /// Possibly a non-PIC jump table.		POSSIBLE_JUMP_TABLE, /// Possibly a non-PIC jump table.
POSSIBLE_PIC_JUMP_TABLE, /// Possibly a PIC jump table.		POSSIBLE_PIC_JUMP_TABLE, /// Possibly a PIC jump table.
};		};

/// Helper function to truncate a \p Value to given size in \p Bytes.		/// Helper function to truncate a \p Value to given size in \p Bytes.
inline int64_t truncateToSize(int64_t Value, unsigned Bytes) {		inline int64_t truncateToSize(int64_t Value, unsigned Bytes) {
[...]	bool isAArch64() const {
return TheTriple->getArch() == llvm::Triple::aarch64;		return TheTriple->getArch() == llvm::Triple::aarch64;
}		}

bool isX86() const {		bool isX86() const {
return TheTriple->getArch() == llvm::Triple::x86 ||		return TheTriple->getArch() == llvm::Triple::x86 ||
TheTriple->getArch() == llvm::Triple::x86_64;		TheTriple->getArch() == llvm::Triple::x86_64;
}		}

		// AArch64-specific functions to check if symbol is used to delimit
		// code/data in .text. Code is marked by $x, data by $d.
		MarkerSymType getMarkerType(const SymbolRef &Symbol) const;
		bool isMarker(const SymbolRef &Symbol) const;

/// Iterate over all BinaryData.		/// Iterate over all BinaryData.
iterator_range<binary_data_const_iterator> getBinaryData() const {		iterator_range<binary_data_const_iterator> getBinaryData() const {
return make_range(BinaryDataMap.begin(), BinaryDataMap.end());		return make_range(BinaryDataMap.begin(), BinaryDataMap.end());
}		}

/// Iterate over all BinaryData.		/// Iterate over all BinaryData.
iterator_range<binary_data_iterator> getBinaryData() {		iterator_range<binary_data_iterator> getBinaryData() {
return make_range(BinaryDataMap.begin(), BinaryDataMap.end());		return make_range(BinaryDataMap.begin(), BinaryDataMap.end());

bolt/include/bolt/Core/BinaryFunction.h

[...]	if (ColdCallSites.empty())
return nullptr;		return nullptr;

ColdLSDASymbol = BC.Ctx->getOrCreateSymbol(		ColdLSDASymbol = BC.Ctx->getOrCreateSymbol(
Twine("GCC_cold_except_table") + Twine::utohexstr(getFunctionNumber()));		Twine("GCC_cold_except_table") + Twine::utohexstr(getFunctionNumber()));

return ColdLSDASymbol;		return ColdLSDASymbol;
}		}

/// True if the symbol is a mapping symbol used in AArch64 to delimit
/// data inside code section.
bool isDataMarker(const SymbolRef &Symbol, uint64_t SymbolSize) const;
bool isCodeMarker(const SymbolRef &Symbol, uint64_t SymbolSize) const;

void setOutputDataAddress(uint64_t Address) { OutputDataOffset = Address; }		void setOutputDataAddress(uint64_t Address) { OutputDataOffset = Address; }

uint64_t getOutputDataAddress() const { return OutputDataOffset; }		uint64_t getOutputDataAddress() const { return OutputDataOffset; }

void setOutputColdDataAddress(uint64_t Address) {		void setOutputColdDataAddress(uint64_t Address) {
OutputColdDataOffset = Address;		OutputColdDataOffset = Address;
}		}


bolt/lib/Core/BinaryContext.cpp

[...]	case MCCFIInstruction::OpGnuArgsSize:
OS << "OpGnuArgsSize";		OS << "OpGnuArgsSize";
break;		break;
default:		default:
OS << "Op#" << Operation;		OS << "Op#" << Operation;
break;		break;
}		}
}		}

		MarkerSymType BinaryContext::getMarkerType(const SymbolRef &Symbol) const {
		// For aarch64, the ABI defines mapping symbols so we identify data in the
		// code section (see IHI0056B). $x identifies a symbol starting code or the
		// end of a data chunk inside code, $d indentifies start of data.
		if (!isAArch64() || ELFSymbolRef(Symbol).getSize())
		return MarkerSymType::NONE;

		Expected<StringRef> NameOrError = Symbol.getName();
		Expected<object::SymbolRef::Type> TypeOrError = Symbol.getType();

		if (!TypeOrError || !NameOrError)
		return MarkerSymType::NONE;

		if (*TypeOrError != SymbolRef::ST_Unknown)
		return MarkerSymType::NONE;

		if (*NameOrError == "$x" || NameOrError->startswith("$x."))
		return MarkerSymType::CODE;

		if (*NameOrError == "$d" || NameOrError->startswith("$d."))
		return MarkerSymType::DATA;

		return MarkerSymType::NONE;
		}

		bool BinaryContext::isMarker(const SymbolRef &Symbol) const {
		return getMarkerType(Symbol) != MarkerSymType::NONE;
		}

void BinaryContext::printInstruction(raw_ostream &OS, const MCInst &Instruction,		void BinaryContext::printInstruction(raw_ostream &OS, const MCInst &Instruction,
uint64_t Offset,		uint64_t Offset,
const BinaryFunction *Function,		const BinaryFunction *Function,
bool PrintMCInst, bool PrintMemData,		bool PrintMCInst, bool PrintMemData,
bool PrintRelocations,		bool PrintRelocations,
StringRef Endl) const {		StringRef Endl) const {
if (MIB->isEHLabel(Instruction)) {		if (MIB->isEHLabel(Instruction)) {
OS << " EH_LABEL: " << *MIB->getTargetSymbol(Instruction) << Endl;		OS << " EH_LABEL: " << *MIB->getTargetSymbol(Instruction) << Endl;

bolt/lib/Core/BinaryFunction.cpp

[...]	if (CalleeName != "__cxa_throw@PLT" && CalleeName != "_Unwind_Resume@PLT" &&
CalleeName != "__cxa_rethrow@PLT" && CalleeName != "exit@PLT" &&		CalleeName != "__cxa_rethrow@PLT" && CalleeName != "exit@PLT" &&
CalleeName != "abort@PLT")		CalleeName != "abort@PLT")
continue;		continue;

BB->removeAllSuccessors();		BB->removeAllSuccessors();
}		}
}		}

bool BinaryFunction::isDataMarker(const SymbolRef &Symbol,
uint64_t SymbolSize) const {
// For aarch64, the ABI defines mapping symbols so we identify data in the
// code section (see IHI0056B). $d identifies a symbol starting data contents.
if (BC.isAArch64() && Symbol.getType() &&
cantFail(Symbol.getType()) == SymbolRef::ST_Unknown && SymbolSize == 0 &&
Symbol.getName() &&
(cantFail(Symbol.getName()) == "$d" ||
cantFail(Symbol.getName()).startswith("$d.")))
return true;
return false;
}

bool BinaryFunction::isCodeMarker(const SymbolRef &Symbol,
uint64_t SymbolSize) const {
// For aarch64, the ABI defines mapping symbols so we identify data in the
// code section (see IHI0056B). $x identifies a symbol starting code or the
// end of a data chunk inside code.
if (BC.isAArch64() && Symbol.getType() &&
cantFail(Symbol.getType()) == SymbolRef::ST_Unknown && SymbolSize == 0 &&
Symbol.getName() &&
(cantFail(Symbol.getName()) == "$x" ||
cantFail(Symbol.getName()).startswith("$x.")))
return true;
return false;
}

bool BinaryFunction::isSymbolValidInScope(const SymbolRef &Symbol,		bool BinaryFunction::isSymbolValidInScope(const SymbolRef &Symbol,
uint64_t SymbolSize) const {		uint64_t SymbolSize) const {
// If this symbol is in a different section from the one where the		// If this symbol is in a different section from the one where the
// function symbol is, don't consider it as valid.		// function symbol is, don't consider it as valid.
if (!getOriginSection()->containsAddress(		if (!getOriginSection()->containsAddress(
cantFail(Symbol.getAddress(), "cannot get symbol address")))		cantFail(Symbol.getAddress(), "cannot get symbol address")))
return false;		return false;

// Some symbols are tolerated inside function bodies, others are not.		// Some symbols are tolerated inside function bodies, others are not.
// The real function boundaries may not be known at this point.		// The real function boundaries may not be known at this point.
if (isDataMarker(Symbol, SymbolSize) || isCodeMarker(Symbol, SymbolSize))		if (BC.isMarker(Symbol))
return true;		return true;

// It's okay to have a zero-sized symbol in the middle of non-zero-sized		// It's okay to have a zero-sized symbol in the middle of non-zero-sized
// function.		// function.
if (SymbolSize == 0 && containsAddress(cantFail(Symbol.getAddress())))		if (SymbolSize == 0 && containsAddress(cantFail(Symbol.getAddress())))
return true;		return true;

if (cantFail(Symbol.getType()) != SymbolRef::ST_Unknown)		if (cantFail(Symbol.getType()) != SymbolRef::ST_Unknown)

bolt/lib/Rewrite/RewriteInstance.cpp

[...]	auto isSymbolInMemory = [this](const SymbolRef &Sym) {
if (cantFail(Sym.getFlags()) & SymbolRef::SF_Undefined)		if (cantFail(Sym.getFlags()) & SymbolRef::SF_Undefined)
return false;		return false;
BinarySection Section(BC, cantFail(Sym.getSection()));		BinarySection Section(BC, cantFail(Sym.getSection()));
return Section.isAllocatable();		return Section.isAllocatable();
};		};
std::vector<SymbolRef> SortedFileSymbols;		std::vector<SymbolRef> SortedFileSymbols;
std::copy_if(InputFile->symbol_begin(), InputFile->symbol_end(),		std::copy_if(InputFile->symbol_begin(), InputFile->symbol_end(),
std::back_inserter(SortedFileSymbols), isSymbolInMemory);		std::back_inserter(SortedFileSymbols), isSymbolInMemory);
		auto CompareSymbols = [this](const SymbolRef &A, const SymbolRef &B) {
std::stable_sort(		// Marker symbols have the highest precedence, while
SortedFileSymbols.begin(), SortedFileSymbols.end(),		// SECTIONs have the lowest.
[](const SymbolRef &A, const SymbolRef &B) {		auto AddressA = cantFail(A.getAddress());
// FUNC symbols have the highest precedence, while SECTIONs		auto AddressB = cantFail(B.getAddress());
// have the lowest.
uint64_t AddressA = cantFail(A.getAddress());
uint64_t AddressB = cantFail(B.getAddress());
if (AddressA != AddressB)		if (AddressA != AddressB)
return AddressA < AddressB;		return AddressA < AddressB;

SymbolRef::Type AType = cantFail(A.getType());		bool AMarker = BC->isMarker(A);
SymbolRef::Type BType = cantFail(B.getType());		bool BMarker = BC->isMarker(B);
		if (AMarker || BMarker) {
		return AMarker && !BMarker;
		}

		auto AType = cantFail(A.getType());
		auto BType = cantFail(B.getType());
if (AType == SymbolRef::ST_Function && BType != SymbolRef::ST_Function)		if (AType == SymbolRef::ST_Function && BType != SymbolRef::ST_Function)
return true;		return true;
if (BType == SymbolRef::ST_Debug && AType != SymbolRef::ST_Debug)		if (BType == SymbolRef::ST_Debug && AType != SymbolRef::ST_Debug)
return true;		return true;

return false;		return false;
});		};

		std::stable_sort(SortedFileSymbols.begin(), SortedFileSymbols.end(),
		CompareSymbols);

		auto LastSymbol = SortedFileSymbols.end() - 1;

// For aarch64, the ABI defines mapping symbols so we identify data in the		// For aarch64, the ABI defines mapping symbols so we identify data in the
// code section (see IHI0056B). $d identifies data contents.		// code section (see IHI0056B). $d identifies data contents.
auto LastSymbol = SortedFileSymbols.end() - 1;		// Compilers usually merge multiple data objects in a single $d-$x interval,
		// but we need every data object to be marked with $d. Because of that we
		// create a vector of MarkerSyms with all locations of data objects.

		struct MarkerSym {
		uint64_t Address;
		MarkerSymType Type;
		};

		std::vector<MarkerSym> SortedMarkerSymbols;
		auto addExtraDataMarkerPerSymbol =
		[this](const std::vector<SymbolRef> &SortedFileSymbols,
		std::vector<MarkerSym> &SortedMarkerSymbols) {
		bool IsData = false;
		uint64_t LastAddr = 0;
		for (auto Sym = SortedFileSymbols.begin();
		Sym < SortedFileSymbols.end(); ++Sym) {
		uint64_t Address = cantFail(Sym->getAddress());
		if (LastAddr == Address) // don't repeat markers
		continue;

		MarkerSymType MarkerType = BC->getMarkerType(*Sym);
		if (MarkerType != MarkerSymType::NONE) {
		SortedMarkerSymbols.push_back(MarkerSym{Address, MarkerType});
		LastAddr = Address;
		IsData = MarkerType == MarkerSymType::DATA;
		continue;
		}

		if (IsData) {
		SortedMarkerSymbols.push_back(
		MarkerSym{cantFail(Sym->getAddress()), MarkerSymType::DATA});
		LastAddr = Address;
		}
		}
		};

if (BC->isAArch64()) {		if (BC->isAArch64()) {
		addExtraDataMarkerPerSymbol(SortedFileSymbols, SortedMarkerSymbols);
LastSymbol = std::stable_partition(		LastSymbol = std::stable_partition(
SortedFileSymbols.begin(), SortedFileSymbols.end(),		SortedFileSymbols.begin(), SortedFileSymbols.end(),
[](const SymbolRef &Symbol) {		[this](const SymbolRef &Symbol) { return !BC->isMarker(Symbol); });
StringRef Name = cantFail(Symbol.getName());
return !(cantFail(Symbol.getType()) == SymbolRef::ST_Unknown &&
(Name == "$d" || Name.startswith("$d.") || Name == "$x" ||
Name.startswith("$x.")));
});
--LastSymbol;		--LastSymbol;
}		}

BinaryFunction *PreviousFunction = nullptr;		BinaryFunction *PreviousFunction = nullptr;
unsigned AnonymousId = 0;		unsigned AnonymousId = 0;

const auto MarkersBegin = std::next(LastSymbol);		const auto SortedSymbolsEnd = std::next(LastSymbol);
for (auto ISym = SortedFileSymbols.begin(); ISym != MarkersBegin; ++ISym) {		for (auto ISym = SortedFileSymbols.begin(); ISym != SortedSymbolsEnd;
		++ISym) {
const SymbolRef &Symbol = *ISym;		const SymbolRef &Symbol = *ISym;
// Keep undefined symbols for pretty printing?		// Keep undefined symbols for pretty printing?
if (cantFail(Symbol.getFlags()) & SymbolRef::SF_Undefined)		if (cantFail(Symbol.getFlags()) & SymbolRef::SF_Undefined)
continue;		continue;

const SymbolRef::Type SymbolType = cantFail(Symbol.getType());		const SymbolRef::Type SymbolType = cantFail(Symbol.getType());

if (SymbolType == SymbolRef::ST_File)		if (SymbolType == SymbolRef::ST_File)
[...]	void RewriteInstance::discoverFileObjects() {
}		}

BC->setHasSymbolsWithFileName(SeenFileName);		BC->setHasSymbolsWithFileName(SeenFileName);

// Now that all the functions were created - adjust their boundaries.		// Now that all the functions were created - adjust their boundaries.
adjustFunctionBoundaries();		adjustFunctionBoundaries();

// Annotate functions with code/data markers in AArch64		// Annotate functions with code/data markers in AArch64
for (auto ISym = MarkersBegin; ISym != SortedFileSymbols.end(); ++ISym) {		for (auto ISym = SortedMarkerSymbols.begin();
const SymbolRef &Symbol = *ISym;		ISym != SortedMarkerSymbols.end(); ++ISym) {
uint64_t Address =
cantFail(Symbol.getAddress(), "cannot get symbol address");		auto *BF =
uint64_t SymbolSize = ELFSymbolRef(Symbol).getSize();		BC->getBinaryFunctionContainingAddress(ISym->Address, true, true);
BinaryFunction *BF =
BC->getBinaryFunctionContainingAddress(Address, true, true);
if (!BF) {		if (!BF) {
// Stray marker		// Stray marker
continue;		continue;
}		}
const uint64_t EntryOffset = Address - BF->getAddress();		const auto EntryOffset = ISym->Address - BF->getAddress();
if (BF->isCodeMarker(Symbol, SymbolSize)) {		if (ISym->Type == MarkerSymType::CODE) {
BF->markCodeAtOffset(EntryOffset);		BF->markCodeAtOffset(EntryOffset);
continue;		continue;
}		}
if (BF->isDataMarker(Symbol, SymbolSize)) {		if (ISym->Type == MarkerSymType::DATA) {
BF->markDataAtOffset(EntryOffset);		BF->markDataAtOffset(EntryOffset);
BC->AddressToConstantIslandMap[Address] = BF;		BC->AddressToConstantIslandMap[ISym->Address] = BF;
continue;		continue;
}		}
llvm_unreachable("Unknown marker");		llvm_unreachable("Unknown marker");
}		}

if (opts::LinuxKernelMode) {		if (opts::LinuxKernelMode) {
// Read all special linux kernel sections and their relocations		// Read all special linux kernel sections and their relocations
processLKSections();		processLKSections();

bolt/test/AArch64/Inputs/unmarked-data.yaml

This file was added.

				--- !ELF
				FileHeader:
				Class: ELFCLASS64
				Data: ELFDATA2LSB
				Type: ET_EXEC
				Machine: EM_AARCH64
				Entry: 0x210134
				ProgramHeaders:
				- Type: PT_PHDR
				Flags: [ PF_R ]
				VAddr: 0x200040
				Align: 0x8
				FileSize: 0x0000e0
				MemSize: 0x0000e0
				Offset: 0x000040
				- Type: PT_LOAD
				Flags: [ PF_R ]
				VAddr: 0x200000
				Align: 0x10000
				FileSize: 0x000120
				MemSize: 0x000120
				Offset: 0x000000
				- Type: PT_LOAD
				Flags: [ PF_X, PF_R ]
				FirstSec: .text
				LastSec: .text
				VAddr: 0x210120
				Align: 0x10000
				- Type: PT_GNU_STACK
				Flags: [ PF_W, PF_R ]
				Align: 0x0
				Sections:
				- Name: .text
				Type: SHT_PROGBITS
				Flags: [ SHF_ALLOC, SHF_EXECINSTR ]
				Address: 0x210120
				AddressAlign: 0x4
				Content: 030F0B0700000000030F0B0700000000C0035FD6FFFFFF97000080D2A80B8052010000D4
				- Name: .rela.text
				Type: SHT_RELA
				Flags: [ SHF_INFO_LINK ]
				Link: .symtab
				AddressAlign: 0x8
				Info: .text
				Relocations:
				- Offset: 0x210134
				Symbol: dummy
				Type: R_AARCH64_CALL26
				- Name: .comment
				Type: SHT_PROGBITS
				Flags: [ SHF_MERGE, SHF_STRINGS ]
				AddressAlign: 0x1
				EntSize: 0x1
				Content: 4C696E6B65723A204C4C442031352E302E3000
				Symbols:
				- Name: val
				Index: SHN_ABS
				Value: 0x70B0F03
				- Name: first
				Section: .text
				Value: 0x210120
				Size: 0x8
				- Name: '$d.0'
				Section: .text
				Value: 0x210120
				- Name: second
				Section: .text
				Value: 0x210128
				Size: 0x8
				- Name: '$x.1'
				Section: .text
				Value: 0x210130
				- Name: .text
				Type: STT_SECTION
				Section: .text
				Value: 0x210120
				- Name: .comment
				Type: STT_SECTION
				Section: .comment
				- Name: dummy
				Type: STT_FUNC
				Section: .text
				Binding: STB_GLOBAL
				Value: 0x210130
				- Name: _start
				Type: STT_FUNC
				Section: .text
				Binding: STB_GLOBAL
				Value: 0x210134
				...

bolt/test/AArch64/unmarked-data.test

This file was added.

				// This test checks that multiple data objects in text of which only first is marked get disassembled properly

				// RUN: yaml2obj %S/Inputs/unmarked-data.yaml -o %t.exe
				// RUN: llvm-bolt %t.exe -o %t.bolt -lite=0 -use-old-text=0 2>&1 | FileCheck %s
				// CHECK-NOT: BOLT-WARNING
				// RUN: llvm-objdump -j .text -d --disassemble-symbols=first,second %t.bolt | FileCheck %s -check-prefix=CHECK-SYMBOL
				// CHECK-SYMBOL: <first>:
				// CHECK-SYMBOL: <second>:

				// YAML is based in the following assembly:

				.equ val, 0x070b0f03 // we use constant that is not a valid instruction so that it can't be silently dissassembled
				.text

				first:
				.xword val
				.size first, .-first

				second:
				.xword val
				.size second, .-second

				.globl dummy
				.type dummy, %function
				dummy: // dummy function to force relocations
				ret

				.globl _start
				.type _start, %function
				_start:
				bl dummy
				mov x0, #0
				mov w8, #93
				svc #0