This is an archive of the discontinued LLVM Phabricator instance.

[X86] Fix handling of maskmovdqu in X32
ClosedPublic

Authored by hvdijk on May 31 2021, 2:42 PM.

Details

Summary

The maskmovdqu instruction has a 32-bit and a 64-bit variant, the former using EDI, the latter RDI, but the use of the register is implicit. In 64-bit mode, a 0x67 prefix can be used to get the version using EDI, but there is no way to express this in assembly in a single instruction, the only way is with an explicit addr32.

This change adds support for the instruction. When generating assembly text, that explicit addr32 will be added. When not generating assembly text, it will be kept as a single instruction and will be emitted with that 0x67 prefix. When parsing assembly text, it will be re-parsed as ADDR32 followed by MASKMOVDQU64, which still results in the correct bytes when converted to machine code.

The same applies to vmaskmovdqu as well.

Diff Detail

Event Timeline

hvdijk created this revision.May 31 2021, 2:42 PM
hvdijk requested review of this revision.May 31 2021, 2:42 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 31 2021, 2:42 PM
craig.topper added inline comments.May 31 2021, 9:18 PM
llvm/lib/Target/X86/X86InstrSSE.td
4015

Can we make this CodeGenOnly=1 so the disassembler tables don't need to be updated? Especially since there are no dissambler tests in this patch.

hvdijk added inline comments.May 31 2021, 11:29 PM
llvm/lib/Target/X86/X86InstrSSE.td
4015

We do need that for correct disassembly. It seems like we have rather limited testing of the disassembler in general; I could add a new test specifically for this that only tests these instructions, but if there's some approach that lets us test the disassembler for all (most) instructions in one go that might be better.

hvdijk updated this revision to Diff 350081.Jun 5 2021, 3:38 PM

After adding a new test for the disassembly of these instructions, I found that I had not fully tested my previous version's attempt at disassembling: it only correctly handled the disassembly of addr32 maskmovdqu, not that of addr32 vmaskmovdqu. This version fixes it and adds tests for it. The instruction classes confuse me, so I may have missed a simpler way of achieving the same result.

ping, re-tested today, still applies and passes tests.

ping, re-tested today, still applies and passes tests.

I don't see a disassembler test. Did I miss it?

I don't see a disassembler test. Did I miss it?

The disassembly is tested in maskmovdqu64.s.

This revision is now accepted and ready to land.Jul 15 2021, 2:50 PM
This revision was landed with ongoing or failed builds.Jul 15 2021, 2:56 PM
This revision was automatically updated to reflect the committed changes.

I'm sorry to say that this patch introduced a serious regression for the disassembler. Almost all the VEX instructions w/ address-size prefix can not be decoded due to this change. The context IC_64BIT_VEX_OPSIZE_ADSIZE should never be added b/c we can always add a addr32 prefix on any VEX instruction w/ a memory operand. I think the best way to support maskmovdqu in X32 is support something like ExplicitVEXPrefix in X86InstrFormats.td.

Herald added a project: Restricted Project. · View Herald TranscriptMar 24 2022, 6:27 PM
Herald added a subscriber: StephenFan. · View Herald Transcript
skan added inline comments.Mar 24 2022, 7:31 PM
llvm/lib/Target/X86/X86InstrSSE.td
4032–4039

BTW, why do we need add an instruction like MASKMOVDQUX32 when we already have MASKMOVDQU? They have the same encoding/decoding except a address-size prefix. We can definitly encode a 0x67 during encoding and and print a "addr32" during decoding according to the mode w/o adding any new intrustion. If removing Not64BitMode of MASKMOVDQU may cause a ISEL issue, we should fix it in ISEL.

craig.topper added inline comments.Mar 24 2022, 7:55 PM
llvm/lib/Target/X86/X86InstrSSE.td
4032–4039

The address register is implicit in the instruction. Our addr32 emission is normally based on the register class of the address registers. But we don't have that here. Are you suggesting to hardcode a special case for MASKMOVDQU in the encoder?

skan added inline comments.Mar 24 2022, 8:10 PM
llvm/lib/Target/X86/X86InstrSSE.td
4032–4039

Harcode is one of the solution. Another solution is that we can add X86::IP_HAS_AD_SIZE to Flags of MCInst when translating a MachineInstr to a MCInst, so that a 0x67 will be emitted. MASKMOVDQU accesses memory and has implicit use EDI, and we can get such information from a MachineInstr.

craig.topper added inline comments.Mar 24 2022, 9:22 PM
llvm/lib/Target/X86/X86InstrSSE.td
4032–4039

The implicit def will always be RDI because it's part of the tablegen Uses. I think. So SelectionDAG will always create the MachineInstr with it.

skan added inline comments.Mar 24 2022, 9:37 PM
llvm/lib/Target/X86/X86InstrSSE.td
4032–4039

Do you mean MASKMOVDQU use RDI rather than EDI in MachineInstr?

craig.topper added inline comments.Mar 24 2022, 9:41 PM
llvm/lib/Target/X86/X86InstrSSE.td
4032–4039

Oh nevermind, I should have looked more carefully. I didn't realize we had 3 instructions and you want to reduce to 2.

hvdijk added inline comments.Mar 26 2022, 6:52 PM
llvm/lib/Target/X86/X86InstrSSE.td
4032–4039

We can definitly encode a 0x67 during encoding and and print a "addr32" during decoding according to the mode w/o adding any new intrustion.

The 0x67 during encoding is handled automatically if we mark the instruction AdSize32, but we also need to print the addr32 when printing in text form, that's what I had trouble with originally. However, trying again to reuse the existing MASKMOVDQU and VMASKMOVDQU, we can actually get that working with something along these lines instead:

--- a/llvm/lib/Target/X86/MCTargetDesc/X86ATTInstPrinter.cpp
+++ b/llvm/lib/Target/X86/MCTargetDesc/X86ATTInstPrinter.cpp
@@ -69,8 +69,12 @@ void X86ATTInstPrinter::printInst(const MCInst *MI, uint64_t Address,
     OS << "\tdata32";
   }
   // Try to print any aliases first.
-  else if (!printAliasInstr(MI, Address, OS) && !printVecCompareInstr(MI, OS))
+  else if (!printAliasInstr(MI, Address, OS) && !printVecCompareInstr(MI, OS)) {
+    if ((MI->getOpcode() == X86::MASKMOVDQU || MI->getOpcode() == X86::VMASKMOVDQU) &&
+        STI.getFeatureBits()[X86::Is64Bit])
+      OS << "\taddr32\n";
     printInstruction(MI, Address, OS);
+  }
 
   // Next always print the annotation.
   printAnnotation(OS, Annot);

Is that about what you had in mind too? I'll continue with this approach and see if it passes your and my tests.

hvdijk marked an inline comment as not done.Mar 26 2022, 7:22 PM
hvdijk added inline comments.
llvm/lib/Target/X86/X86InstrSSE.td
4032–4039

That should be handled in printInstFlags instead, which already checks whether addr32 needs to be printed but does not handle this case. The important part of the question, however, is whether this should be a special hardcoded exception for (V)MASKMOVDQU, or whether there should be something to automatically detect the need for the prefix.

skan added inline comments.Mar 26 2022, 8:19 PM
llvm/lib/Target/X86/X86InstrSSE.td
4032–4039

Exactly not... Let me propose a patch to illustrate that.

skan added a comment.Mar 26 2022, 8:44 PM

@hvdijk D122448 + D122537 can fix both bugs.

hvdijk marked an inline comment as not done.Mar 26 2022, 8:46 PM
hvdijk added inline comments.
llvm/lib/Target/X86/X86InstrSSE.td
4032–4039

Looking at your D122537: so they *do* need to be special cased, it's just that you moved the special casing into X86MCInstLower::Lower. I had already come up with that independently as well while cleaning up my own special casing. I have a little bit more than what you put in D122537 though, let me check whether that is still needed.

skan added inline comments.Mar 26 2022, 8:57 PM
llvm/lib/Target/X86/X86InstrSSE.td
4032–4039

No. The two functions are in different stages. The decoding for maskmovdqu/vmaskmovdqu is already correct if we revert this change, no more fix is needed. printInstFlags or printInst is used mostly for disassembler, so we do need to touch it. X86MCInstLower::Lower is used to tranlate MachineInstr to MCInst, which is used to add address-size prefix when we lowering from MIR.

hvdijk added inline comments.Mar 26 2022, 9:00 PM
llvm/lib/Target/X86/X86InstrSSE.td
4032–4039

So, yes. You're saying exactly what I'm saying. I don't have a change in printInstFlags or printInst any longer either because like I said, I moved the special casing to X86MCInstLower::Lower exactly like you did.

skan added inline comments.Mar 26 2022, 9:02 PM
llvm/lib/Target/X86/X86InstrSSE.td
4032–4039

No. The two functions are in different stages. The decoding for maskmovdqu/vmaskmovdqu is already correct if we revert this change, no more fix is needed. printInstFlags or printInst is used mostly for disassembler, so we do need to touch it. X86MCInstLower::Lower is used to tranlate MachineInstr to MCInst, which is used to add address-size prefix when we lowering from MIR.

so we do need to touch it -> so we don't need to touch it

4032–4039

Looking at your D122537: so they *do* need to be special cased, it's just that you moved the special casing into X86MCInstLower::Lower. I had already come up with that independently as well while cleaning up my own special casing. I have a little bit more than what you put in D122537 though, let me check whether that is still needed.

And I commented before, it's not necessary to add a special case for maskmovdqu b/c we can check the implicit operand of the instruction. However, there are so many existing speical cases in X86MCInstLower::Lower, I think it's okay to hardcode it.

hvdijk added inline comments.Mar 26 2022, 9:13 PM
llvm/lib/Target/X86/X86InstrSSE.td
4032–4039

I went back to read your previous comments to see whether I missed it and am not seeing where you said so. Regardless, the discussion here has well ceased to be constructive so I propose we drop that.

skan added inline comments.Mar 26 2022, 9:21 PM
llvm/lib/Target/X86/X86InstrSSE.td
4032–4039

Harcode is one of the solution. Another solution is that we can add X86::IP_HAS_AD_SIZE to Flags of MCInst when translating a MachineInstr to a MCInst, so that a 0x67 will be emitted. MASKMOVDQU accesses memory and has implicit use EDI, and we can get such information from a MachineInstr.

I commented here.

4032–4039

Agree