For a b/bl instruction that branches a temporary symbol (private assembly label), an IMAGE_REL_ARM64_BRANCH26 relocation to another symbol + a non-zero offset could be emitted but the linkers don't support this type of relocation and cause incorrect relocations and crashes. Avoid emitting this type of relocations.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
https://learn.microsoft.com/en-us/windows/win32/debug/pe-format indicates that there does exist such a relocation:
IMAGE_REL_ARM64_BRANCH26 0x0003 The 26-bit relative displacement to the target, for B and BL instructions.
If this relocation is unsupported, we should just disable that relocation from being formed or we should figure out what the linker expectations are.
Yeah. To be clearer, what I observed was that the relocation type works as long as the offset that's embedded in the instruction is zero (I suspect that the linker doesn't use the offset in relocation or assumes it's zero.) I also tried a small local patch in lld to take this non-zero offset into account in the relocation and it also avoided the crashes I'm seeing.
That sounds like the correct thing to do; if a relocation with an offset isn’t supported by lld (but it is by link.exe), we should fix the linker instead of trying to avoid generating it.
I had expected those cases to be working correctly though, for many years now, but perhaps there is some case where the original offset indeed happens to get ignored.
That sounds like the correct thing to do; if a relocation with an offset isn’t supported by lld (but it is by link.exe), we should fix the linker instead of trying to avoid generating it.
The description sounds like it's also not supported by link.exe? Not sure.
We should teach adjustFixupValue() in AArch64AsmBackend.cpp to print an error if an unresolved branch with an offset gets there somehow.
There's a bunch of code for A.isTemporary() a few lines earlier; would it make sense to integrate with that? This is sort of similar to what we do with OffsetSymbols, except the maximum offset is zero.
You can't create symbols at arbitrary offsets from the start of a section; you need to check that the symbol actually points somewhere within the section.
llvm/lib/MC/WinCOFFObjectWriter.cpp | ||
---|---|---|
963 | We shouldn't end up in recordRelocation for branches where the source and destination are in the same section, I think. |
Yes. As far as I can tell, link.exe doesn't support this relocation type with an offset and that matches what lld does.
I would agree, but it seems that link also does not support the relocation with addend, and I would rather that we retain compatibility with the reference implementation. It sounds like it is only the addend form that we would need to avoid on Windows, not the general relocation.
I had expected those cases to be working correctly though, for many years now, but perhaps there is some case where the original offset indeed happens to get ignored.
The attach text file contains a test (*) and the command line trace that shows the relocation behaviors (without this patch) in the first half, and the same sequence of commands with this patch in the second half.
The first part shows that both link.exe and lld relocate the branches to the beginning of the text and the other section, as opposed to the ret instructions, respectively, because the offsets embedded in the instruction are ignored. The second part shows that, with this patch, the linkers relocate them as expected.
(*) A tweaked version of the test in this patch (because some important details were unexpectedly lost along the test reduction and nops were added to avoid accidental offset correctness as small offsets from a symbol can happen to be correct).
We should teach adjustFixupValue() in AArch64AsmBackend.cpp to print an error if an unresolved branch with an offset gets there somehow.
If you mean like the following, that would unintentionally trigger when we drop the relocation in the newly-added code (in the if (MCSec == &A.getSection()) case) because IsResolved is false.
case AArch64::fixup_aarch64_pcrel_branch26: case AArch64::fixup_aarch64_pcrel_call26: if (TheTriple.isOSBinFormatCOFF() && !IsResolved && SignedValue != 0) { // MSVC link.exe and lld do not support this relocation type with a non-zero offset Ctx.reportError(Fixup.getLoc(), "fixup value not supported by the linkers"); }
There's a bunch of code for A.isTemporary() a few lines earlier; would it make sense to integrate with that? This is sort of similar to what we do with OffsetSymbols, except the maximum offset is zero.
That may make sense as long as it's okay to check if the relocation type is fixup_aarch64_pcrel_branch26 or fixup_aarch64_pcrel_call26 before the getRelocType() call.
You can't create symbols at arbitrary offsets from the start of a section; you need to check that the symbol actually points somewhere within the section.
Can you clarify? Does it mean that we need to check A.isInSection() && Layout.getSymbolOffset(A) < getSectionAddressSize(&A.getSection()) before we create a new symbol at where symbol A is? What should happen if that's not the case?
Thanks for the test transcript! It does indeed look like link.exe ignores the existing offset. I also independently tested this, and it actually looks like link.exe blindly ors the target offset bits on top of the existing opcode, just like lld does: https://github.com/llvm/llvm-project/blob/llvmorg-16.0.6/lld/COFF/Chunks.cpp#L306-L322 Depending on the contents of the original offset and the intended symbol difference, the offset may end up being partially visible or not. In any case - the linker doesn't indeed seem to be able to handle immediate offsets here.
Strictly speaking I would consider that a bug in both linkers, but I totally agree we should avoid producing such relocations. I didn't have time to look at the implementation strategy itself yet (but trust others' judgement on it).
Thanks for the test transcript! It does indeed look like link.exe ignores the existing offset. I also independently tested this, and it actually looks like link.exe blindly ors the target offset bits on top of the existing opcode, just like lld does: https://github.com/llvm/llvm-project/blob/llvmorg-16.0.6/lld/COFF/Chunks.cpp#L306-L322 Depending on the contents of the original offset and the intended symbol difference, the offset may end up being partially visible or not. In any case - the linker doesn't indeed seem to be able to handle immediate offsets here.
My recollection is that for most relocations, link does not support (in GNU parlance) partial in-place relocations, that is, any addend is generally (though not always) ignored. It was a pretty big divergence between GNU and MSVC semantics.
Strictly speaking I would consider that a bug in both linkers, but I totally agree we should avoid producing such relocations. I didn't have time to look at the implementation strategy itself yet (but trust others' judgement on it).
I'm not sure I consider it a bug since that is the normal handling of relocations, but certainly a lacking useful feature. At least, it sounds like everyone is now in agreement on avoiding producing the relocation for Windows then.
The lld code linked here is exactly what I locally modified to experiment and saw the crash go away with.
My recollection is that for most relocations, link does not support (in GNU parlance) partial in-place relocations, that is, any addend is generally (though not always) ignored. It was a pretty big divergence between GNU and MSVC semantics.
Strictly speaking I would consider that a bug in both linkers, but I totally agree we should avoid producing such relocations. I didn't have time to look at the implementation strategy itself yet (but trust others' judgement on it).
I'm not sure I consider it a bug since that is the normal handling of relocations, but certainly a lacking useful feature. At least, it sounds like everyone is now in agreement on avoiding producing the relocation for Windows then.
Yeah, as I implied by creating this revision, I also vote for avoiding this relocation type for compatibility.
I still don't understand why the MCSec == &A.getSection() case is relevant; if the offset is constant, it should be resolved, since "resolved" means "doesn't need a relocation".
Something like that, yes. If it fails, we need to print an error, because we don't have any way to encode it. At least, I don't think we have a way to encode it? Not sure off the top of my head what offsets the linker would tolerate.
Good point. Removed the MCSec == &A.getSection() case. There seem to be edges cases where relocations are kept even if the offset is constant such as a) relocations between functions are kept for the MSVC linker features as in WinCOFFObjectWriter::isSymbolRefDifferenceFullyResolvedImpl and b) the ADRP instruction / fixup_aarch64_pcrel_adrp_imm21 type as in AArch64AsmBackend::shouldForceRelocation. The crash case and the test we have here is due to a). So we cannot drop relocations here. Added the suggested check in adjustFixupValue.
Can you clarify? Does it mean that we need to check A.isInSection() && Layout.getSymbolOffset(A) < getSectionAddressSize(&A.getSection()) before we create a new symbol at where symbol A is? What should happen if that's not the case?
Something like that, yes. If it fails, we need to print an error, because we don't have any way to encode it. At least, I don't think we have a way to encode it? Not sure off the top of my head what offsets the linker would tolerate.
Done.
Thinking about this a bit further, I think that it might be better to conform to the Microsoft behaviour here. If we look at what MASM does, there is no concept of an "assembler temporary" hidden symbols. We should instead always emit all symbols into the symbol table. The linker is responsible for filtering the symbols that are emitted into the symbol table (only symbols marked for DLL export should be emitted there). This is one of the fundamental differences between PE and COFF. COFF will record all symbols (e.g. consider what happens when you build with debug information, all the temporary symbols for lines, etc. are emitted into the object file), PE will emit only the public symbols (technically, it does not have a symbol table at all, only the content needed for the EAT). This would bring us more in line with the expectations of the platform and the object file format and also avoid the need for any special casing here - we would form the relocation as per usual, the linker will have the necessary context, and then we would simply drop the "temporary" symbol which is not marked for DLL Export.
Thinking about this a bit further, I think that it might be better to conform to the Microsoft behaviour here. If we look at what MASM does, there is no concept of an "assembler temporary" hidden symbols. We should instead always emit all symbols into the symbol table.
I don't think that we can directly compare what we do to masm; we support an assembler dialect which is not masm, and that includes branching where the destination is an offset from a symbol. Given that, I think we need this logic in WinCOFFObjectWriter, whether or not we emit all temporary symbols. (The change in AArch64AsmBackend.cpp is necessary either way.)
Also, emitting all temporary symbols significantly bloats object files. (I don't have numbers at hand, but I recall it being significant.)
llvm/lib/MC/WinCOFFObjectWriter.cpp | ||
---|---|---|
962 | This still looks wrong in the case where SymbolMap[&A] is non-null; even if the initial symbol lookup succeeds, we still need to preserve the offset. |
I don't think that we can directly compare what we do to masm; we support an assembler dialect which is not masm, and that includes branching where the destination is an offset from a symbol. Given that, I think we need this logic in WinCOFFObjectWriter, whether or not we emit all temporary symbols. (The change in AArch64AsmBackend.cpp is necessary either way.)
I guess the piece that I'm wondering is why we would need that if the local symbol is available because the label is preserved, the linker should be able to resolve the symbol relative offset and then discard the temporary symbol I thought. What condition am I overlooking?
Also, emitting all temporary symbols significantly bloats object files. (I don't have numbers at hand, but I recall it being significant.)
Oh, I completely agree with you on that point. I wouldn't be surprised at all at it being significant. I think that matching semantics is something we should do any way.
I was thinking of the case of branching to "x+4", or something like that, where there isn't any existing symbol at the offset in question. We could forbid that, I guess, but it seems like that just shifts complexity elsewhere.
Right, I was thinking that we would have a(n extra) label at that point so the offset is always 0. We would generate a synthetic symbol for any case where the user is writing out an offset from a symbol.
e.g.
.def sym .scl 2 .type 32 .endef sym: nop nop $LN$reloc: .long $LN$reloc ret
would be equivalent to:
.def sym .scl 2 .type 32 .endef sym: nop nop .long sym + 4 ret
Edit: oh, I see what you mean - we do not know what the target was lowered as and we might be external to the symbol, which means that we cannot guarantee the label at that point ...
I agree that we can't emit all labels with the .L prefix into the object file symbol table. That would be prohibitively expensive.
We can, however, probably treat all private linkage symbols as internal if we don't already, so we get symbols for them in the symbol table, if this issue is still reachable from the IR level. We can probably also "upgrade" private symbols that the assembler reads to add them to the symbol table if they are used with this kind of fixup. Either way, I think we need the error handling added in the existing patch.
llvm/lib/MC/WinCOFFObjectWriter.cpp | ||
---|---|---|
953 | Please add a test exercising this error case as well. | |
llvm/lib/Target/AArch64/MCTargetDesc/AArch64AsmBackend.cpp | ||
320 | Please add a test exercising the error case, presumably using something like the b .Lsym + 4 pattern. Also, I think you can make the diagnostic message clearer. Something like, "cannot perform a PC-relative fixup with a non-zero symbol offset" (emphasize that the offset from a symbol must be zero). |
Updated to treat all private linkage symbols as internal and add them to the symbol table, as Reid suggested. This fixes the issue with private linkage and the swift issue.
This also fixes the symbol class for private linkage for Windows/AArch64 from external to static to be in align with X86, and adds a test exercising the error case for the check in AArch64AsmBackend.
Please take another look.
I think that I generally prefer this approach. I would however like to see a test for the storage class changes (and ideally split from this change - it fixes a problem needed to fix here, but is a completely standalone improvement).
llvm/test/MC/AArch64/coff-relocations-branch26-bad.s | ||
---|---|---|
1 ↗ | (On Diff #550086) | In many occasions placing negative tests in the same file as the positive tests improves readability. You can use the --defsym ERR=1 pattern |
18 ↗ | (On Diff #550086) | New tests are recommended to add line/column information to diagnostics, similar to test/MC/ELF/cfi-scope-errors.s |
Rebased over https://reviews.llvm.org/D158122. Addressed the comments on the test code.
llvm/test/MC/AArch64/coff-relocations-branch26-bad.s | ||
---|---|---|
1 ↗ | (On Diff #550086) | Done. |
18 ↗ | (On Diff #550086) | Done. |
Asm.getContext().reportError(), like code earlier in this function.