This is an archive of the discontinued LLVM Phabricator instance.

[X86] Emit two-byte NOP when possible
ClosedPublic

Authored by aganea on Jun 5 2020, 1:30 PM.

Details

Summary

In order to support hot-patching, we need to make sure the first emitted instruction in a function a two-byte+ op. This is already the case on x86_64, which seems to always emit two-byte+ ops. However on 32-bit targets this wasn't the case.

Whenever using the "patchable-function" attribute, a PATCHABLE_OP now lowers to a XCHG AX, AX, (66 90) like MSVC does. However when targetting pentium3 or i386 targets with /arch:IA32 or /arch:SSE, we generate MOV EDI,EDI (8B FF) like MSVC does. This is for compatiblity reasons with older tools that rely on this byte pattern.

The goal of this patch, along with D43002 and D80833, is to support the clang-cl flag /hotpatch: https://docs.microsoft.com/sv-se/cpp/build/reference/hotpatch-create-hotpatchable-image?view=vs-2019

Diff Detail

Event Timeline

aganea created this revision.Jun 5 2020, 1:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 5 2020, 1:30 PM
MaskRay added inline comments.Jun 5 2020, 2:19 PM
llvm/lib/Target/X86/X86MCInstLower.cpp
99–100

Since you are changing the signature, you can rename it to emitNops as well to conform to the coding standards.

1106

Can the x86-64 code path below be reused?

llvm/test/CodeGen/X86/patchable-function-entry.ll
66

32-NEXT

reames requested changes to this revision.Jun 5 2020, 2:26 PM

Are two byte nops legal on all 32 bit x86? The comment seems to imply no.

Once this question is answered and we're happy with the general design, I will ask you to split out the NFC signature change and land it separately, but let's *not* do that yet.

This revision now requires changes to proceed.Jun 5 2020, 2:26 PM
efriedma added inline comments.
llvm/lib/Target/X86/X86MCInstLower.cpp
1106

I see three issues with using the 64-bit codepath on 32-bit:

  1. We can't use the patterns based on REX prefixes.
  2. The X86::NOOPL opcode requires FeatureNOPL.
  3. If Mode16Bit is enabled, XCHG16ar is actually a one-byte instruction.

So we could sort of reuse the code below, but it would require a lot of refactoring.

Are two byte nops legal on all 32 bit x86? The comment seems to imply no.

Once this question is answered and we're happy with the general design, I will ask you to split out the NFC signature change and land it separately, but let's *not* do that yet.

The encoding for xchg %ax, %ax should be legal all the way back to the instruction of 32-bit mode. It wasn't given special treatment in hardware until 486. Prior to that it really did read ax and swap it with itself.

llvm/lib/Target/X86/X86MCInstLower.cpp
1106

Can Mode16Bit be set here? We're coming from MachineInstr and we don't compile to 16 bit do we?

The only part in the reset of the code that is truly 64-bit specific is the IndexReg being RAX. There's nothing in there that requiers a REX prefix.

So it seems like we just need to replace the Is64Bit check we had before with FeatureNOPL and then pick the index register as RAX or EAX depending on 64 bit or 32 bit.

craig.topper added inline comments.Jun 5 2020, 3:01 PM
llvm/lib/Target/X86/X86MCInstLower.cpp
1106

I guess we'd still need to handle the case where the CPU was set to a 386/486/pentium where FeatureNOPL isn't set.

efriedma added inline comments.Jun 5 2020, 3:27 PM
llvm/lib/Target/X86/X86MCInstLower.cpp
1106

Can Mode16Bit be set here? We're coming from MachineInstr and we don't compile to 16 bit do we?

We support generating "16-bit" code, yes, if you specify -m16 to clang. (Semantically, it's actually 32-bit code with a bunch of size prefixes to allow running it in 16-bit mode, but that doesn't really matter from the assembler's perspective.)

then pick the index register as RAX or EAX depending on 64 bit or 32 bit.

We can't just replace RAX with EAX; that would reduce the length of the opcode by one. We'd actually need to pick different encodings in the cases where that's relevant.

craig.topper added inline comments.Jun 5 2020, 3:49 PM
llvm/lib/Target/X86/X86MCInstLower.cpp
1106

We can't just replace RAX with EAX; that would reduce the length of the opcode by one. We'd actually need to pick different encodings in the cases where that's relevant.

It's the index register, it only requires REX prefix if its R8-R15. The jump from 4 bytes to 5 bytes where index was added would have been the SIB byte.

efriedma added inline comments.Jun 5 2020, 5:29 PM
llvm/lib/Target/X86/X86MCInstLower.cpp
1106

It's the index register, it only requires REX prefix if its R8-R15. The jump from 4 bytes to 5 bytes where index was added would have been the SIB byte.

Right; I misread the code.

Whenever using the "patchable-function" attribute, a PATCHABLE_OP now lowers to a xchg ax, ax, like MSVC does.

Is this true? Last I checked--and admittedly, this was a long time ago--MSVC emitted mov edi, edi--more specifically, opcode bytes 8b ff. GCC also emits those same bytes.

There's also a compatibility angle here. The reason this was added to GCC in the first place, and the reason I wrote the original change, is that Wine needed this to make certain Windows programs run correctly. I think they specifically check for that 8b ff signature before installing their hotpatches--which I admit they really shouldn't be doing, and may be a factor in why MS started using xchg ax, ax (66 90), but they do it, and we need specifically that sequence to make them work correctly.

aganea added a comment.EditedJun 6 2020, 6:56 AM

Are two byte nops legal on all 32 bit x86? The comment seems to imply no.

Once this question is answered and we're happy with the general design, I will ask you to split out the NFC signature change and land it separately, but let's *not* do that yet.

My understanding is that the story goes like this:

  • 90 NOP - introduced with 8086.
  • 66 90 XCHG AX, AX - supported starting with 80386, with the introduction of instruction prefixes [1]
  • 0F 1F 00 .. NOP DWORD ptr - long NOP variants were introduced with SSE, in Pentium III Katmai [2]

llvm/lib/Target/X86/X86.td specifies FeatureNOPL being supported in Pentium Pro [3] and Pentium 2 [4], but looking at the respective manuals that seems incorrect. There's no trace of long NOP instructions for those CPUs. @craig.topper could you please confirm?

[1] http://css.csail.mit.edu/6.858/2014/readings/i386/s02_04.htm
[2] https://en.wikipedia.org/wiki/X86_instruction_listings
[3] http://www.mathcs.emory.edu/~cheung/Courses/355/Syllabus/6-io/Docs/Intel-instructions.pdf
[4] http://www.archive.ece.cmu.edu/~ece548/localcpy/24281603.pdf

Whenever using the "patchable-function" attribute, a PATCHABLE_OP now lowers to a xchg ax, ax, like MSVC does.

Is this true? Last I checked--and admittedly, this was a long time ago--MSVC emitted mov edi, edi--more specifically, opcode bytes 8b ff. GCC also emits those same bytes.

VS2012 and VS2013 are emitting 8B FF MOV EDI, EDI.
After VS2015, 66 90 XCHG AX, AX is generated, as recommended by Intel, see [5] Vol. 2B page 4-167.

[5] https://software.intel.com/content/www/us/en/develop/articles/intel-sdm.html#combined

There's also a compatibility angle here. The reason this was added to GCC in the first place, and the reason I wrote the original change, is that Wine needed this to make certain Windows programs run correctly. I think they specifically check for that 8b ff signature before installing their hotpatches--which I admit they really shouldn't be doing, and may be a factor in why MS started using xchg ax, ax (66 90), but they do it, and we need specifically that sequence to make them work correctly.

I'm a bit confused. Wine lists support up to Windows 10, which means recently-compiled applications should be supported. Meanning they (Wine) should see 66 90, and support that. I asked the question anyway on the wine-devel mailing list.

+ @tentzen @BillyONeal in case they know more about why MSVC moved from 8B FF MOV EDI, EDI to 66 90 XCHG AX, AX.

aganea edited the summary of this revision. (Show Details)Jun 6 2020, 9:26 AM

Are two byte nops legal on all 32 bit x86? The comment seems to imply no.

Once this question is answered and we're happy with the general design, I will ask you to split out the NFC signature change and land it separately, but let's *not* do that yet.

My understanding is that the story goes like this:

  • 90 NOP - introduced with 8086.
  • 66 90 XCHG AX, AX - supported starting with 80386, with the introduction of instruction prefixes [1]
  • 0F 1F 00 .. NOP DWORD ptr - long NOP variants were introduced with SSE, in Pentium III Katmai [2]

llvm/lib/Target/X86/X86.td specifies FeatureNOPL being supported in Pentium Pro [3] and Pentium 2 [4], but looking at the respective manuals that seems incorrect. There's no trace of long NOP instructions for those CPUs. @craig.topper could you please confirm?

Other sites on internet show them a being added in P6(Pentium Pro). For example the row for group 16 here https://sandpile.org/x86/opc_grp.htm I think they were undocumented for a long time since they were originally intended for future expansion. So they didn't want compilers to start depending on them as NOPs. Sometime later they made 0F 1F an official multibyte NOP. 0F 18 eventually became a prefetch instruction, but you can still execute it on a Pentium Pro, it just won't do anything.

I'm a bit confused. Wine lists support up to Windows 10, which means recently-compiled applications should be supported. Meanning they (Wine) should see 66 90, and support that. I asked the question anyway on the wine-devel mailing list.

No, I meant "they" as in "Windows programs that run on Wine." Older programs compiled before VS 2015 in particular are going to check for 8b ff in functions from the system's DLLs. That's why Wine needed this from GCC and needs this from LLVM: so that functions in its DLLs have the signature so programs will know it's OK to hotpatch them.

jacek added a subscriber: jacek.Jun 7 2020, 4:18 AM
aganea added a comment.Jun 7 2020, 9:13 AM

A ReactOS developer pointed out that the default for x86 cl.exe is /arch:SSE2 which generates 66 90 when specifying /hotpatch. However with /arch:[IA32|SSE] /hotpatch the previous sequence 8B FF is emitted instead. I will replicate the behavior.

aganea updated this revision to Diff 269681.Jun 9 2020, 2:40 PM
aganea marked 2 inline comments as done.

Addressed comments.

  • Fold the codepath for 32-bit and 64-bit.
  • Emit MOV EDI,EDI when using /arch:IA32 and /arch:SSE.
aganea edited the summary of this revision. (Show Details)Jun 9 2020, 2:43 PM
aganea marked 8 inline comments as done.
aganea added inline comments.
llvm/lib/Target/X86/X86MCInstLower.cpp
1106

@efriedma Is there a way to generate a 16-bit triple from llc?

craig.topper added inline comments.Jun 9 2020, 3:04 PM
llvm/lib/Target/X86/X86MCInstLower.cpp
1106

I think i386-unknown-linux-code16 might be the 16-bit triple

llvm/test/CodeGen/X86/patchable-prologue.ll
3

Should we add -show-mc-encoding so we make sure we're really emitting the right bytes?

aganea updated this revision to Diff 269716.Jun 9 2020, 6:28 PM

As suggested, added -show-mc-encoding to ensure we emit the right opcodes. Added coverage for 16-bit targets as well.

aganea marked 2 inline comments as done.Jun 9 2020, 6:28 PM

Ping! Any further comments?

craig.topper added inline comments.Jun 15 2020, 12:53 PM
llvm/lib/Target/X86/X86MCInstLower.cpp
1364

How much code would it be to just emit the legacy nop directly here and call emitNop for the other cases? Rather than pushing this legacy nop concept into emitNop?

aganea updated this revision to Diff 270887.Jun 15 2020, 3:49 PM
aganea marked 2 inline comments as done.

Rebase. Remove LegacyNOP param.

llvm/lib/Target/X86/X86MCInstLower.cpp
1364

Changed as suggested, looks better now, thanks!

This revision was not accepted when it landed; it landed in state Needs Review.Jun 17 2020, 10:46 AM
This revision was automatically updated to reflect the committed changes.