For this relocation, which applies to two consecutive instructions, it's possible that the second instruction might not actually be the right one.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
This seems fine, but we generally trust input and don't do too many error checks, so I wonder what is your motivation to add this error check. Was this something you can hit without doing assembly programming?
COFF/Chunks.cpp | ||
---|---|---|
137 ↗ | (On Diff #161565) | Error messages should start with a lowercase letter. |
I ran into a bug when LLVM can produce code where the movw+movt instructions are separated. (I have a fix for the bug, just missing a good testcase before I'll submit it.)
This relocation is tricky since it covers both instructions, but the way it is handled within LLVM is that it only is attached to the movw (and the relocation for the movt is dropped). Ideally maybe LLVM should have a check at some point to make sure that wherever we have this relocation, it actually points to both these instructions - but I don't really know where such a check would be placed.
COFF/Chunks.cpp | ||
---|---|---|
137 ↗ | (On Diff #161565) | Will fix. |
For what it's worth I've checked that the bit-patterns used in the MOVT/MOVW instructions are correct. In assembler it seems like it is possible to create an error that isn't detectable at link time.
.global main .global variable .global variable2 .text .thumb main: movw r0, :lower16:variable movt r0, :upper16:variable2 ldr r0, [r0] bx lr .data variable: .long 42
The assembler doesn't detect the different relocation targets and just emits a 0x0 IMAGE_REL_ARM_MOV32T variable relocation. It would be much better if this were caught at assembly time. I don't have any great ideas of how to solve it in MC though as that typically only presents Targets with one fixup at a time.
Indeed, it's possible to craft other malicious cases that this won't catch - but it would at least catch most other bugs where the ARM target unintentionally produces something broken.
On the LLVM level, this is handled as the movw fixup translating into IMAGE_REL_ARM_MOV32T while the movt fixup just gets dropped; see lib/Target/ARM/MCTargetDesc/ARMWinCOFFObjectWriter.cpp:
unsigned ARMWinCOFFObjectWriter::getRelocType(MCContext &Ctx, const MCValue &Target, const MCFixup &Fixup, bool IsCrossSection, const MCAsmBackend &MAB) const { switch (static_cast<unsigned>(Fixup.getKind())) { [snip] case ARM::fixup_t2_movw_lo16: case ARM::fixup_t2_movt_hi16: return COFF::IMAGE_REL_ARM_MOV32T; } } bool ARMWinCOFFObjectWriter::recordRelocation(const MCFixup &Fixup) const { return static_cast<unsigned>(Fixup.getKind()) != ARM::fixup_t2_movt_hi16; }
So past this stage, the information about the movt instruction's intent is lost.
As for the actual bug that I was able to catch with this error, I'll post the patch now.
Does @efriedma have any opinion about this one? It's slightly unconventional to check the instruction contents in the linker, but OTOH it's at least a simple and safe place to check that nothing really did break these pairs. Unless there's some easy way to add a check in LLVM (and I guess the correct solution there is to keep them together as a pseudo instruction longer, as suggested in the other review), do you think this would be worthwhile?
Not @efreidma, but that doesn't stop me from opining... I think it's a reasonable safety check. I don't think we have any relaxations in the COFF linker, but they are important elsewhere (turn GOT load into leaq sym(%rip) for ELF), and that requires looking at instructions.
This clearly isn't a substitute for catching this issue in the asm parser; LLVM shouldn't generate invalid object files. That said, it seems reasonable for the linker to check the input is sane. (Probably the testcase should use yaml2obj so it doesn't break when we fix LLVM.)
LGTM
It looks like these guards are reasonable given that these relocations are tricky. Our policy is to not protect lld from all possible input corruptions, but that doesn't mean we shouldn't do a reasonable error check.