This is an archive of the discontinued LLVM Phabricator instance.

[llvm-mc] fix 64-bit mode call disassembly to ignore opcode size prefix
ClosedPublic

Authored by m4b on May 7 2015, 1:48 PM.

Details

Summary

This is a potential fix for disassembling unusual instruction sequences in 64-bit mode w.r.t the CALL instruction. It might be desirable to move the check somewhere else, etc., but it essentially mimics the special case handling with JCXZ in 16-bit mode.

The current behavior accepts the opcode size prefix and causes the call's immediate to stop disassembling after 2 bytes. When debugging sequences of instructions with this pattern, the disassembler output becomes extremely unreliable and essentially useless (if you jump midway into what lldb thinks is a unified instruction, you'll lose %rip). Thus the choice of ignoring the prefix and consuming all 4 bytes when disassembling a 64-bit mode binary.

Note: in Vol. 2A 3-99 the Intel spec states that CALL rel16 is N.S. N.S. is defined as:

Indicates an instruction syntax that requires an address override prefix in 64-bit mode
and is not supported. Using an address override prefix in 64-bit mode
may result in model-specific execution behavior. (Vol. 2A 3-7)

Since 0x66 is an operand override prefix we should be OK (although we may want to warn about 0x67 prefixes to 0xe8). On the CPUs I have tested with, they all ignore the 0x66 prefix in 64-bit mode.

Diff Detail

Repository
rL LLVM

Event Timeline

m4b updated this revision to Diff 25224.May 7 2015, 1:48 PM
m4b retitled this revision from to [llvm-mc] fix 64-bit mode call disassembly to ignore opcode size prefix .
m4b updated this object.
m4b edited the test plan for this revision. (Show Details)
m4b added a reviewer: dougk.
m4b set the repository for this revision to rL LLVM.
m4b added a subscriber: Unknown Object (MLST).
dougk accepted this revision.May 8 2015, 7:12 AM
dougk edited edge metadata.

code LGTM but "ignore opcode size" -> "ignore operand size" and even though Intel defines "N.S.", I'd spell out "Not Supported" in the comment so that it can reasonably stand alone.

This revision is now accepted and ready to land.May 8 2015, 7:12 AM
dougk added a comment.May 8 2015, 7:18 AM

JMP has the same issue. No reason not to try and fix both please.

m4b updated this revision to Diff 25347.May 8 2015, 10:57 AM
m4b retitled this revision from [llvm-mc] fix 64-bit mode call disassembly to ignore opcode size prefix to [llvm-mc] fix 64-bit mode call disassembly to ignore opcode size prefix.
m4b edited edge metadata.

fix for ignoring opcode prefixes for jmp 0xe9

Unfortunately all of the jcc instructions which use rel16 are broken as well; it doesn't seem very pretty to add a bunch of cases, but not sure what else to do without doing some refactoring or digging into other potential places to correctly disassemble these kinds of edge cases.

what do you think?

  • added n.s. -> not supported;
  • quick fix for jmp e9; using switch now, and e9 requires displacement and immediate explicitly set, xoring opsize alone will not suffice for correct disassembly;
  • jmp 0xe9 x86-64 unit tests added;
m4b updated this revision to Diff 25397.May 8 2015, 7:11 PM

Slightly more robust version, switching on all the relevant opcodes I could find for call/jmp/jcc.

Should probably do an audit of some sort to determine the extent of bad f64 subscript opcode table disassembly.

I'm still uncomfortable reassigning insn->immediateSize and insn->displacementSize in the getID() method, but not sure where else to do this. Removing the ATTR_OPSIZE does not affect jmp and jcc instructions unfortunately, as it does for call.

May need to fix this in the releavnt .td files or perhaps in the readImmediate() method at some point.

Added more unit tests, specifically for jzz cases with 0x66 prefix.

  • current revision switches on relevant call/jmp/jcc instructions, and avoids breaking lea, et. al.
  • added unit tests for jzz 0x66 cases and removed whitespace;
craig.topper added inline comments.
lib/Target/X86/Disassembler/X86DisassemblerDecoder.cpp
994 ↗(On Diff #25397)

Why no opcodeType check here? Can't this alias?

1013 ↗(On Diff #25397)

The indentation on this line is showing up funny in phabricator

m4b updated this revision to Diff 25399.May 8 2015, 9:02 PM

Now checking for ONEBYTE opcodeType for call and jump, as per comment; added some more unit tests and fixed indentation.

m4b added a comment.May 11 2015, 11:41 AM

FYI, I don't have write privileges, so I assume someone else needs to run arc land if this is ready to be committed and pushed?

Similarly for diff D9514

lib/Target/X86/Disassembler/X86DisassemblerDecoder.cpp
994 ↗(On Diff #25397)

Nice catch! The instructions they would have messed up, like psubsb/vpsubsb, don't have any unit tests, so it wasn't showing up in testing. I'll check for a ONEBYTE opcodeType here, add the unit tests, and fix the identation problems in the next diff.

*ping* this has been sitting quite a while. As I said, I don't have write perms. So either some feedback, a rejection, or landing this commit is probably in order.

vsk added a subscriber: vsk.Aug 22 2015, 11:43 PM
vsk added a comment.Aug 22 2015, 11:52 PM

Thanks for your work on this!

This passes check-all on x86, but there are a few changes I'd like to see. (1) Please shorten and clean up the comments you added, (2) clang-format -style=llvm your code. I'm happy to commit this for you once that's done.

m4b updated this revision to Diff 32931.Aug 23 2015, 7:38 PM

updated according to vsk suggestions

  • shortened comments
  • ran clang-format-3.4 -i -style=llvm as per vsk's suggestions

*NOTE*: I noticed there were pretty massive source code formatting changes after clang-format; I'm not sure if this was/is intended?

To prevent clang-format from adjusting whitespace in the entire file, you want to use clang-format-diff.py -

git diff | tools/clang/tools/clang-format/clang-format-diff.py -p1 -style=llvm

m4b updated this revision to Diff 32956.Aug 24 2015, 7:41 AM

Shorter comments and only formatting code changes

  • shorter comments
  • reformat with clang-format-diff.py as per dougk
This revision was automatically updated to reflect the committed changes.