Fix BOLT's constant island mapping when a constant island marked by $d spans multiple functions. Currently, because BOLT only marks the constant island in the first function where $d is located, if the next function contains data at its start, BOLT will miss the data and try to disassemble it. This patch adds code to explicitly go through all symbols between $d and $x markers and mark their respective offsets as data, which stops BOLT from trying to disassemble data. It also adds MarkerType enum and refactors related functions.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
probably we should also fix the following
bolt/lib/Core/Exceptions.cppbolt/lib/Core/Exceptions.cpp +498
- uint64_t Offset = 0; + uint64_t Offset = Function.getFirstInstructionOffset();
test case:
int extra_space() {
asm volatile (".space 256, 0xff\n"); return 0;
}
int main(int argc, char **argv) {
int (*fn)(void); fn = extra_space + 256; fn(); return 0;
}
CFLAGS=-O2 -fPIE -Wl,-q -Wl,-pie
llvm-bolt main -instrument -o main.instr -instrumentation-file=main.fdata -instrumentation-sleep-time=180 -instrumentation-no-counters-clear -v 2
"treapster changed the edit policy from "All Users" to "Administrators"."
can you change this back? I can't review this diff without edit access.
I'm logged in another account because phabricator won't let me submit the review I wrote in the original reviewer account. I'm going to paste my full review as a subscriber instead of reviewer to try to bypass the "you can't edit this diff" permission thing. Here it goes:
Hi @treapster, thanks for working on this and for submitting this patch!
It looks like the per-merge check is failing
Failed Tests (1):
BOLT :: AArch64/unmarked-data.s
https://buildkite.com/llvm-project/premerge-checks/builds/94328#fa09bc94-6223-4f63-93d0-e5c7aae21e66
The check is also failing for me locally. See my comments below.
BinaryContext.h Line 85:
Add a comment
// AArch64-specific symbol markers used to delimit code/data in .text.
Lines 675-676:
We should avoid exporting these in the public interface of BinaryContext if they are not (currently) used outside of BinaryContext. We should leave them as internal helper functions inside BinaryContext.cpp
RewriteInstance.cpp lines 893-895:
If both A and B are markers at the same address, then we cannot establish order according to this function. This comparison function returns:
A < B ? False
B < A ? False
Which breaks the requirement on std::sort to have the comparison function define a total order among elements.
Line 918:
I don't think we need to save memory in discoverFileObjects() (this is not a step in BOLT that achieves the peak of memory consumption), so feel free to drop attribute packed. This attribute is not used in the LLVM codebase as far as I can tell.
Line 1272:
In theory, you wouldn't need to call markAtDataAtOffset for consecutive objects. For example, if you have:
0x10 : code
0x14 : data
0x18: data
0x20: code
You would only need to call markDAtaAtOffset(0x14).
But after this patch, we will be calling markDataAtOffset(0x18) if we have any symbol at 0x18. So why was the change necessary? Am I right in my interpretation that this to cover the case when 0x18 would be located in the next BinaryFunction, for some reason? In that case, there is a side effect that AddressToConstantIslandMap will possibly be bloated with so many more entries. But I guess that's OK? I would like @yota9 approval here if possible.
In this case, I would rewrite the description of this patch as:
[BOLT][AArch64] Handle data markers spanning multiple functions
"Fix BOLT's constant island mapping when a data marker ($d) spans multiple functions. Currently, because BOLT only marks the constant island in the first function where $d is located, if the next function contains data at its start, BOLT will miss the data marker and try to disassemble it. This patch adds code to explicitly go through all symbols between $d and $x markers and mark their respective offsets as data, which stops BOLT from trying to disassemble data. It also adds MarkerType enum and refactors related functions.
unmarked data.s line 3:
I think I know why this test is failing in pre-merge. If you call the driver (clang) to link your code, it will break in X86 because we don't have an AArch64 toolchain in X86. You would need to move this test to runtime/.
But in an X86 machine, we do have an AArch64 assembler and linker, we just don't have AArch64 libs to link in your testcase. Because we're not running the executable, there's no need to call the driver (clang) to link in libs. So we can rewrite this by calling the assembler (llvm-mc) and the linker (ld.lld). As an example on how to do this, see https://github.com/llvm/llvm-project/blob/main/bolt/test/X86/gotpcrelx.s#L15
Thank you for the feedback, @rafaelauler ! I fixed the test and removed packed attribute as you said.
As for the markerType-related functions, before this patch they depended on BC to check whether arch is AArch64, so I preserved this check and moved them to BC itself. However, I don't think this check is essential - because theoretically nothing stops you from putting data in text on other architectures and marking it with the same $x/$d symbols. If we agree on it, we can remove this dependency and make getMarkerType and related functions either methods of SymbolRef or freestanding functions that only get passed symbolRef.
Regarding comparison function, I think it would be weird for a binary to have more than one marker at the same offset in the first place, but even if it had, they just would be considered equal by the comparison function - same as with two functions on the same offset or anything else. I don't see where it breaks requirements of sort. The basic requirement is that comparator cannot return true both ways, and mine does not.
And speaking of constantIsland map, I don't think marking every island will impact performance or memory usage in any significant way, but as a result data in text will be handled properly.
Thanks, @treapster!
Sorry, got confused, you're absolutely right on the comparison function requirement.
Regarding the check, I would prefer to preserve the check on AArch64 to reduce the amount of work we do for X86 symbols. I think we can keep these functions in BinaryContext. We wouldn't be able to push this to SymbolRef anyway because this lives in another llvm lib, and we would need to better justify its usage outside just BOLT. I just felt that you put "isDataMarker" and "isCodeMarker" there in case you would need these functions (as we had them before), but then you wrote the solution in a way that doesn't use them, so they look like dead code to me and should be removed (leaving only getMarkerType and isMarker).
I gave another thought on this and I agree with you on the other topics (efficiency, etc), so this is good to go from my perspective, pending fixing the testcase and removing isCodeMarker/isDataMarker. Thanks for working on this and submitting the patch!
bolt/test/AArch64/unmarked-data.s | ||
---|---|---|
3 ↗ | (On Diff #431908) | I guess you have to include -triple aarch64-unknown-linux as an argument to llvm-mc, otherwise it will fail in non-aarch64 machines |
5–9 ↗ | (On Diff #431908) | While I do appreciate unix style command line processing, I feel we should be careful with this in case we run this in an environment without shell (e.g. Windows). I'm not an expert in Windows support and we currently do not have a Windows buildbot for BOLT, but I feel this would probably fail in Visual Studio. We should probably remove these lines too as, from a test perspective, this is enforcing that llvm-mc and ld.lld always behave in that specific way regarding emission of "$d" symbols. If somebody tries to change that, this test will break, even though they are not touching BOLT. Here we should limit ourselves to test BOLT. |
I tested on Visual Studio and the test does fail. We also have another 20 other tests failing on VS. I should probably take a look at these.
bolt/test/AArch64/unmarked-data.s | ||
---|---|---|
5–9 ↗ | (On Diff #431908) | I totally agree with you that we should avoid depending on shell or specific behavior of assembler/linker, but the only way i see to remove this dependency is to use fixed prebuilt binary with omitted markers. As far as i know, such tests are discouraged in llvm, but we can probably use yaml with yaml2obj? I'll try to recreate it in yaml. |
@rafauler, even though isMarker and getMarkerType fit well to binary context, i wouldn't say we will do unnecessary work on x86 if we extract them, because we check arch at RewriteInstance.cpp:951 anyway. IsMarker is also used one time at BinaryFunction::isSymbolValidInScope, but it's called there anyway, and if we don't want to call it on X86 we can add that check to if. This is largely irrelevant and doesn't make practical difference, but i feel like it's more natural when being or not being a marker is an independent property of a symbol. But these are just my thoughts.
Speaking of tests, when i check-bolt on AArch64 EulerOS, most x86 tests fail and also AArch64/asm-func-debug.test. Since they fail even without my patch, i figured it's not a bug but a feature. But buildkite seems to run fine after changing test to YAML, so i guess i didn't break anything.
bolt/lib/Core/BinaryContext.cpp | ||
---|---|---|
1646 | do you want to make the AARCH64 check an assert? |
bolt/lib/Core/BinaryContext.cpp | ||
---|---|---|
1646 | No, i'm saying the opposite: if we remove a call to isAArch64 from here, we remove dependency on BinaryContext and use the function without it, and check arch separately when we need. |
Thanks @treapster. I appreciate the robustness of YAML for this test here, but I do like to have the source code so it's easier to read. Do you think you could put the original assembly code in the test just as a reference, but don't assemble it, just add a comment that the YAML was generated out of that code?
I agree you can move the check to callers of getMarkerType. But if the intent in doing so is to move getMarkerType to SymbolRef, then I think it's best to do not go in that direction. SymbolRef is defined by another library, LLVM's Object lib, which is not BOLT-specific. I have two concerns regarding that, (1) I'm skeptical this would be useful outside of BOLT, (2) talking specifically about the SymbolRef class (ObjectFile.h), this is a very general class that is limited to only the very basic operations you can do on a symbol, and it is not ELF specific -- it's actually the general interface for all symbols in all object formats in LLVM. In contrast, getMarkerType is ELF-specific, AArch64 ABI-specific method to identify data. Because of that I think it would be best if we leave this in BinaryContext, which is BOLT's IR module class, containing general information we need to disassemble and process a binary.
@rafauler, added assembly for test. Regarding getMarkerType, the intent of removing architecture check is not to move the function to SymbolRef, but primarily to decouple it from BinaryContext instance - probably leaving it in BinaryContext.cpp. If you think it fits there as a method, well, then we can leave it as it is.
I'm OK with moving that out of BinaryContext class, I don't have any strong opinions on it. If you want to do that, I'll approve the diff as well, but I also think it is good as is too.
bolt/lib/Core/BinaryContext.cpp | ||
---|---|---|
1649 | nit: please drop llvm:: here and below. |
Removed unnecessary llvm:: qualification, reworded title and summary.
@rafauler, i don't have commit access so i'll be glad if you commit.
@treapster When I run arc patch, phabricator won't pull your author name. It writes the commit under my name. Would you care to provide me your author string so I can amend the commit with the correct author before pushing it to the repo?
e.g. my author string is "Rafael Auler <rafaelauler@fb.com>"
@rafauler, my author string is Denis Revunov <revunov.denis@huawei-partners.com>. Thanks for your time!
do you want to make the AARCH64 check an assert?