This is an archive of the discontinued LLVM Phabricator instance.

[LLD] [COFF] Check the instructions in ARM MOV32T relocations
ClosedPublic

Authored by mstorsjo on Aug 20 2018, 2:38 PM.

Details

Summary

For this relocation, which applies to two consecutive instructions, it's possible that the second instruction might not actually be the right one.

Diff Detail

Repository
rL LLVM

Event Timeline

mstorsjo created this revision.Aug 20 2018, 2:38 PM
ruiu added a comment.Aug 21 2018, 12:18 AM

This seems fine, but we generally trust input and don't do too many error checks, so I wonder what is your motivation to add this error check. Was this something you can hit without doing assembly programming?

COFF/Chunks.cpp
137 ↗(On Diff #161565)

Error messages should start with a lowercase letter.

This seems fine, but we generally trust input and don't do too many error checks, so I wonder what is your motivation to add this error check. Was this something you can hit without doing assembly programming?

I ran into a bug when LLVM can produce code where the movw+movt instructions are separated. (I have a fix for the bug, just missing a good testcase before I'll submit it.)

This relocation is tricky since it covers both instructions, but the way it is handled within LLVM is that it only is attached to the movw (and the relocation for the movt is dropped). Ideally maybe LLVM should have a check at some point to make sure that wherever we have this relocation, it actually points to both these instructions - but I don't really know where such a check would be placed.

COFF/Chunks.cpp
137 ↗(On Diff #161565)

Will fix.

mstorsjo updated this revision to Diff 161650.Aug 21 2018, 12:26 AM

Lowercased the error message.

For what it's worth I've checked that the bit-patterns used in the MOVT/MOVW instructions are correct. In assembler it seems like it is possible to create an error that isn't detectable at link time.

    .global main
    .global variable
    .global variable2
    .text
    .thumb
main:
    movw r0, :lower16:variable
    movt r0, :upper16:variable2
    ldr  r0, [r0]
    bx   lr
    .data
variable:
 .long 42

The assembler doesn't detect the different relocation targets and just emits a 0x0 IMAGE_REL_ARM_MOV32T variable relocation. It would be much better if this were caught at assembly time. I don't have any great ideas of how to solve it in MC though as that typically only presents Targets with one fixup at a time.

For what it's worth I've checked that the bit-patterns used in the MOVT/MOVW instructions are correct. In assembler it seems like it is possible to create an error that isn't detectable at link time.

    .global main
    .global variable
    .global variable2
    .text
    .thumb
main:
    movw r0, :lower16:variable
    movt r0, :upper16:variable2
    ldr  r0, [r0]
    bx   lr
    .data
variable:
 .long 42

The assembler doesn't detect the different relocation targets and just emits a 0x0 IMAGE_REL_ARM_MOV32T variable relocation. It would be much better if this were caught at assembly time. I don't have any great ideas of how to solve it in MC though as that typically only presents Targets with one fixup at a time.

Indeed, it's possible to craft other malicious cases that this won't catch - but it would at least catch most other bugs where the ARM target unintentionally produces something broken.

On the LLVM level, this is handled as the movw fixup translating into IMAGE_REL_ARM_MOV32T while the movt fixup just gets dropped; see lib/Target/ARM/MCTargetDesc/ARMWinCOFFObjectWriter.cpp:

unsigned ARMWinCOFFObjectWriter::getRelocType(MCContext &Ctx,
                                              const MCValue &Target,
                                              const MCFixup &Fixup,
                                              bool IsCrossSection,
                                              const MCAsmBackend &MAB) const {
  switch (static_cast<unsigned>(Fixup.getKind())) {
  [snip]
  case ARM::fixup_t2_movw_lo16:
  case ARM::fixup_t2_movt_hi16:
    return COFF::IMAGE_REL_ARM_MOV32T;
  }
}

bool ARMWinCOFFObjectWriter::recordRelocation(const MCFixup &Fixup) const {
  return static_cast<unsigned>(Fixup.getKind()) != ARM::fixup_t2_movt_hi16;
}

So past this stage, the information about the movt instruction's intent is lost.

As for the actual bug that I was able to catch with this error, I'll post the patch now.

Does @efriedma have any opinion about this one? It's slightly unconventional to check the instruction contents in the linker, but OTOH it's at least a simple and safe place to check that nothing really did break these pairs. Unless there's some easy way to add a check in LLVM (and I guess the correct solution there is to keep them together as a pseudo instruction longer, as suggested in the other review), do you think this would be worthwhile?

rnk added a comment.Aug 24 2018, 1:34 PM

Does @efriedma have any opinion about this one? It's slightly unconventional to check the instruction contents in the linker, but OTOH it's at least a simple and safe place to check that nothing really did break these pairs. Unless there's some easy way to add a check in LLVM (and I guess the correct solution there is to keep them together as a pseudo instruction longer, as suggested in the other review), do you think this would be worthwhile?

Not @efreidma, but that doesn't stop me from opining... I think it's a reasonable safety check. I don't think we have any relaxations in the COFF linker, but they are important elsewhere (turn GOT load into leaq sym(%rip) for ELF), and that requires looking at instructions.

This clearly isn't a substitute for catching this issue in the asm parser; LLVM shouldn't generate invalid object files. That said, it seems reasonable for the linker to check the input is sane. (Probably the testcase should use yaml2obj so it doesn't break when we fix LLVM.)

mstorsjo updated this revision to Diff 162472.Aug 24 2018, 1:46 PM

Changed the test to use yaml2obj instead of assembling it with llvm-mc.

ruiu accepted this revision.Aug 26 2018, 7:16 PM

LGTM

It looks like these guards are reasonable given that these relocations are tricky. Our policy is to not protect lld from all possible input corruptions, but that doesn't mean we shouldn't do a reasonable error check.

This revision is now accepted and ready to land.Aug 26 2018, 7:16 PM
This revision was automatically updated to reflect the committed changes.