This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add In64BitMode for MOVSX64/MOVZX64 instructions
AbandonedPublic

Authored by HaohaiWen on Nov 24 2022, 12:02 AM.

Details

Reviewers
skan
Summary

REX_W prefix requires In64BitMode

Diff Detail

Event Timeline

HaohaiWen created this revision.Nov 24 2022, 12:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 24 2022, 12:02 AM
HaohaiWen requested review of this revision.Nov 24 2022, 12:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 24 2022, 12:02 AM

I might be wrong but I think we are often missing In64BitMode on instructions that have a GR64 operand.

Adding them might increase the size of the table in X86GenDAGISel.inc

Are you seeing a functional issue that this fixes?

Are you seeing a functional issue that this fixes?

I'm fixing a AlderlakeP schedmodel issue: https://github.com/llvm/llvm-project/issues/58792.
To fix it, I need to use AsmMatcherEmitter to match CodeGenOnly but encodable instructions like CVTSD2SI64rm_Int.
This trigged assertion in X86AsmParser::validateInstruction:

if (UsesRex && HReg != X86::NoRegister) {
  StringRef RegName = X86IntelInstPrinter::getRegisterName(HReg);
  return Error(Ops[0]->getStartLoc(),
               "can't encode '" + RegName + "' in an instruction requiring "
               "REX prefix");
}

By validating hasREX_W and predicates in tblgen, I found there're 245 instructions has REX_W prefix but without In64BitMode predicate.

skan added a comment.Nov 24 2022, 12:55 AM

Hi @craig.topper , why do you think this may increase the size of the table in X86GenDAGISel.inc?
@HaohaiWen Could you check how much the table grows with this patch?

Hi @craig.topper , why do you think this may increase the size of the table in X86GenDAGISel.inc?
@HaohaiWen Could you check how much the table grows with this patch?

I think it will increase 2 bytes for each pattern:

/*Scope*/ 13, /*->133335*/
 OPC_CheckChild0Type, MVT::i32,
 OPC_CheckType, MVT::i64,
 OPC_CheckPatternPredicate, 8, // (Subtarget->is64Bit())     <-- increase here.
 OPC_MorphNodeTo1, TARGET_VAL(X86::MOVSX64rr32), 0,
     MVT::i64, 1/*#Ops*/, 0,
 // Src: (sext:{ *:[i64] } GR32:{ *:[i32] }:$src) - Complexity = 3
 // Dst: (MOVSX64rr32:{ *:[i64] } GR32:{ *:[i32] }:$src)

Are you seeing a functional issue that this fixes?

I'm fixing a AlderlakeP schedmodel issue: https://github.com/llvm/llvm-project/issues/58792.
To fix it, I need to use AsmMatcherEmitter to match CodeGenOnly but encodable instructions like CVTSD2SI64rm_Int.

I'm not following. It's already not a CodeGenOnly instruction. I see this in the asm matcher table

{ 2010 /* cvtsd2si */, X86::CVTSD2SI64rm_Int, Convert__Reg1_1__Mem645_0, AMFBS_None, { MCK_Mem64, MCK_GR64 }, }

Isn't X86::CVTSD2SI64rm the CodeGenOnly instruction?

Isn't X86::CVTSD2SI64rm the CodeGenOnly instruction?

Yes, CVTSD2SI64rm isCodeGenOnly and CVTSD2SI64rm_Int is not.
I'm trying to auto gen asm enumeration for each encodable instructions. This relies on predicates to indicate the mode {16bit, 32bit, 64bit}.

Isn't X86::CVTSD2SI64rm the CodeGenOnly instruction?

Yes, CVTSD2SI64rm isCodeGenOnly and CVTSD2SI64rm_Int is not.
I'm trying to auto gen asm enumeration for each encodable instructions. This relies on predicates to indicate the mode {16bit, 32bit, 64bit}.

We’ve been assuming the assembler would never parse a GR64 register name outside of 64-bit mode which is why the predicates were omitted.

Are you seeing a functional issue that this fixes?

I'm fixing a AlderlakeP schedmodel issue: https://github.com/llvm/llvm-project/issues/58792.
To fix it, I need to use AsmMatcherEmitter to match CodeGenOnly but encodable instructions like CVTSD2SI64rm_Int.
This trigged assertion in X86AsmParser::validateInstruction:

if (UsesRex && HReg != X86::NoRegister) {
  StringRef RegName = X86IntelInstPrinter::getRegisterName(HReg);
  return Error(Ops[0]->getStartLoc(),
               "can't encode '" + RegName + "' in an instruction requiring "
               "REX prefix");
}

How does adding In64BitMode avoid this error? This error occurs when AH/BH/CH/DH are passed to an instruction that uses a REX prefix. Are you using In64BitMode to avoid passing AH/BH/CH/DH registers?

How does adding In64BitMode avoid this error? This error occurs when AH/BH/CH/DH are passed to an instruction that uses a REX prefix. Are you using In64BitMode to avoid passing AH/BH/CH/DH registers?

In fact, the issue is, I relied on predicates to auto gen asm. Let's take CVTSD2SI64rm as an example.
Since it have no predicates, I made assumption its encodable in all modes and then try to gen 16,32,64 bit asm string (by adding {.code16 .code32 . code64) enumeration and encode it with llvm-mc. Then fed the encoding into llvm-mc to decode in order to find all matchable llvm opcodes.
That's how I found this predicates error.

How does adding In64BitMode avoid this error? This error occurs when AH/BH/CH/DH are passed to an instruction that uses a REX prefix. Are you using In64BitMode to avoid passing AH/BH/CH/DH registers?

In fact, the issue is, I relied on predicates to auto gen asm. Let's take CVTSD2SI64rm as an example.
Since it have no predicates, I made assumption its encodable in all modes and then try to gen 16,32,64 bit asm string (by adding {.code16 .code32 . code64) enumeration and encode it with llvm-mc. Then fed the encoding into llvm-mc to decode in order to find all matchable llvm opcodes.
That's how I found this predicates error.

Can we ignore those with isCodeGenOnly. I think they are just duplications of the non codegen only ones from the perspective of encoding.

Can we ignore those with isCodeGenOnly. I think they are just duplications of the non codegen only ones from the perspective of encoding.

Of course we can ignore it in almost all cases because they'll never be generated to asm printer.
However we should describe them correctly in schedule model. In fact, current schedtool D130897 only emit scheduling info for not CodeGenOnly instruction. That means scheduling info for CodeGenOnly instructions like CVTSD2SI64rm may be not correct, although it should be same with CVTSD2SI64rm_Int.
I'm working on fixing that, this requires correct mode predicates for CodeGenOnly and encodable instructions.

Can we ignore those with isCodeGenOnly. I think they are just duplications of the non codegen only ones from the perspective of encoding.

Of course we can ignore it in almost all cases because they'll never be generated to asm printer.
However we should describe them correctly in schedule model. In fact, current schedtool D130897 only emit scheduling info for not CodeGenOnly instruction. That means scheduling info for CodeGenOnly instructions like CVTSD2SI64rm may be not correct, although it should be same with CVTSD2SI64rm_Int.
I'm working on fixing that, this requires correct mode predicates for CodeGenOnly and encodable instructions.

Is it not possible to use the encoding information in TSFlags rather than going through the assembly parser? Your patches for schedtool seem very coupled to the names of operand classes and other things. It looks like it will require updates often.

Is it not possible to use the encoding information in TSFlags rather than going through the assembly parser? Your patches for schedtool seem very coupled to the names of operand classes and other things. It looks like it will require updates often.

The asm enumeration code and asm matcher patch as well as xed patch are used to build map between llvm opcode <-> Xed info <-> uops.info data / other scheduling info data source. Do you have any suggestion to build this map?
IsaSet in Xed info can also be used to identify whether a llvm opcode is supported by specific target. LLVM predicates can't determine that precisely.

Apart from that, I think we'd better fix wrong predicates.

Is it not possible to use the encoding information in TSFlags rather than going through the assembly parser? Your patches for schedtool seem very coupled to the names of operand classes and other things. It looks like it will require updates often.

The asm enumeration code and asm matcher patch as well as xed patch are used to build map between llvm opcode <-> Xed info <-> uops.info data / other scheduling info data source. Do you have any suggestion to build this map?
IsaSet in Xed info can also be used to identify whether a llvm opcode is supported by specific target. LLVM predicates can't determine that precisely.

I don’t know anything about xed. What does it require?

Apart from that, I think we'd better fix wrong predicates.

Be careful using the word “wrong”. The current implementation was intentional as it saves space in some generated tables.

skan added a comment.Nov 24 2022, 8:53 PM

Is it not possible to use the encoding information in TSFlags rather than going through the assembly parser? Your patches for schedtool seem very coupled to the names of operand classes and other things. It looks like it will require updates often.

The asm enumeration code and asm matcher patch as well as xed patch are used to build map between llvm opcode <-> Xed info <-> uops.info data / other scheduling info data source. Do you have any suggestion to build this map?
IsaSet in Xed info can also be used to identify whether a llvm opcode is supported by specific target. LLVM predicates can't determine that precisely.

Apart from that, I think we'd better fix wrong predicates.

I think Craig means you can use the REX_W bit in TSFlags to assist you in Predicate check, e.g Condition = In64BitMode || HasREX_W.

I don’t know anything about xed. What does it require?

AFAIK, Input to it is normally encoding or asm string. That's why I need to enumerate asm string and encode it for each llvm opcodes.

skan added a comment.Nov 24 2022, 9:09 PM

Is it not possible to use the encoding information in TSFlags rather than going through the assembly parser? Your patches for schedtool seem very coupled to the names of operand classes and other things. It looks like it will require updates often.

The asm enumeration code and asm matcher patch as well as xed patch are used to build map between llvm opcode <-> Xed info <-> uops.info data / other scheduling info data source. Do you have any suggestion to build this map?
IsaSet in Xed info can also be used to identify whether a llvm opcode is supported by specific target. LLVM predicates can't determine that precisely.

I don’t know anything about xed. What does it require?

Apart from that, I think we'd better fix wrong predicates.

Be careful using the word “wrong”. The current implementation was intentional as it saves space in some generated tables.

@craig.topper I'm not following. I also see that mayLoad, mayStore is omitted for x86 instructions with memory operands. And when the first operand is memory, the instructions is assumed to load/store, otherwise it's assumed to load only. Is it intentional or just for a slack off?

Is it not possible to use the encoding information in TSFlags rather than going through the assembly parser? Your patches for schedtool seem very coupled to the names of operand classes and other things. It looks like it will require updates often.

The asm enumeration code and asm matcher patch as well as xed patch are used to build map between llvm opcode <-> Xed info <-> uops.info data / other scheduling info data source. Do you have any suggestion to build this map?
IsaSet in Xed info can also be used to identify whether a llvm opcode is supported by specific target. LLVM predicates can't determine that precisely.

I don’t know anything about xed. What does it require?

Apart from that, I think we'd better fix wrong predicates.

Be careful using the word “wrong”. The current implementation was intentional as it saves space in some generated tables.

@craig.topper I'm not following. I also see that mayLoad, mayStore is omitted for x86 instructions with memory operands. And when the first operand is memory, the instructions is assumed to load/store, otherwise it's assumed to load only. Is it intentional or just for a slack off?

Assumed where? In tablegen?

skan added a comment.EditedNov 24 2022, 9:51 PM

Is it not possible to use the encoding information in TSFlags rather than going through the assembly parser? Your patches for schedtool seem very coupled to the names of operand classes and other things. It looks like it will require updates often.

The asm enumeration code and asm matcher patch as well as xed patch are used to build map between llvm opcode <-> Xed info <-> uops.info data / other scheduling info data source. Do you have any suggestion to build this map?
IsaSet in Xed info can also be used to identify whether a llvm opcode is supported by specific target. LLVM predicates can't determine that precisely.

I don’t know anything about xed. What does it require?

Apart from that, I think we'd better fix wrong predicates.

Be careful using the word “wrong”. The current implementation was intentional as it saves space in some generated tables.

@craig.topper I'm not following. I also see that mayLoad, mayStore is omitted for x86 instructions with memory operands. And when the first operand is memory, the instructions is assumed to load/store, otherwise it's assumed to load only. Is it intentional or just for a slack off?

Assumed where? In tablegen?

llvm/lib/CodeGen/TargetInstrInfo.cpp

  auto Flags = MachineMemOperand::MONone;
  for (unsigned OpIdx : Ops)
    Flags |= MI.getOperand(OpIdx).isDef() ? MachineMemOperand::MOStore
                                          : MachineMemOperand::MOLoad;

...

    // Add a memory operand, foldMemoryOperandImpl doesn't do that.
    assert((!(Flags & MachineMemOperand::MOStore) ||
            NewMI->mayStore()) &&
           "Folded a def to a non-store!");
    assert((!(Flags & MachineMemOperand::MOLoad) ||
            NewMI->mayLoad()) &&
           "Folded a use to a non-load!");

llvm/lib/Target/X86/X86InstrInfo.cpp

if (I != nullptr) {
  unsigned Opcode = I->DstOp;
  bool FoldedLoad =
      isTwoAddrFold || (OpNum == 0 && I->Flags & TB_FOLDED_LOAD) || OpNum > 0;
  bool FoldedStore =
      isTwoAddrFold || (OpNum == 0 && I->Flags & TB_FOLDED_STORE);

Is it not possible to use the encoding information in TSFlags rather than going through the assembly parser? Your patches for schedtool seem very coupled to the names of operand classes and other things. It looks like it will require updates often.

The asm enumeration code and asm matcher patch as well as xed patch are used to build map between llvm opcode <-> Xed info <-> uops.info data / other scheduling info data source. Do you have any suggestion to build this map?
IsaSet in Xed info can also be used to identify whether a llvm opcode is supported by specific target. LLVM predicates can't determine that precisely.

I don’t know anything about xed. What does it require?

Apart from that, I think we'd better fix wrong predicates.

Be careful using the word “wrong”. The current implementation was intentional as it saves space in some generated tables.

@craig.topper I'm not following. I also see that mayLoad, mayStore is omitted for x86 instructions with memory operands. And when the first operand is memory, the instructions is assumed to load/store, otherwise it's assumed to load only. Is it intentional or just for a slack off?

Assumed where? In tablegen?

llvm/lib/CodeGen/TargetInstrInfo.cpp

  auto Flags = MachineMemOperand::MONone;
  for (unsigned OpIdx : Ops)
    Flags |= MI.getOperand(OpIdx).isDef() ? MachineMemOperand::MOStore
                                          : MachineMemOperand::MOLoad;

...

    // Add a memory operand, foldMemoryOperandImpl doesn't do that.
    assert((!(Flags & MachineMemOperand::MOStore) ||
            NewMI->mayStore()) &&
           "Folded a def to a non-store!");
    assert((!(Flags & MachineMemOperand::MOLoad) ||
            NewMI->mayLoad()) &&
           "Folded a use to a non-load!");

llvm/lib/Target/X86/X86InstrInfo.cpp

if (I != nullptr) {
  unsigned Opcode = I->DstOp;
  bool FoldedLoad =
      isTwoAddrFold || (OpNum == 0 && I->Flags & TB_FOLDED_LOAD) || OpNum > 0;
  bool FoldedStore =
      isTwoAddrFold || (OpNum == 0 && I->Flags & TB_FOLDED_STORE);

The first does that because we haven't folded the memory operand yet so the mayLoad/mayStore flag wouldn't be set. MI is a register only instruction at that point. I believe that function is used for folding reloads and spills for register allocation, so using isDef is accurate.

The second is taking a shortcut for the isTwoAddrFold case and the OpNum > 0 case. We could add the TB_FOLDED_LOAD and TB_FOLDED_STORE flags into the X86InstrFoldTables.cpp tables. I think I added that code to avoid updating all of the tables while fixing a bug in D89656. Thought similar assumptions exist here

X86MemUnfoldTable() {
  for (const X86MemoryFoldTableEntry &Entry : MemoryFoldTable2Addr)
    // Index 0, folded load and store, no alignment requirement.
    addTableEntry(Entry, TB_INDEX_0 | TB_FOLDED_LOAD | TB_FOLDED_STORE);

  for (const X86MemoryFoldTableEntry &Entry : MemoryFoldTable0)
    // Index 0, mix of loads and stores.
    addTableEntry(Entry, TB_INDEX_0);

  for (const X86MemoryFoldTableEntry &Entry : MemoryFoldTable1)
    // Index 1, folded load
    addTableEntry(Entry, TB_INDEX_1 | TB_FOLDED_LOAD);

  for (const X86MemoryFoldTableEntry &Entry : MemoryFoldTable2)
    // Index 2, folded load
    addTableEntry(Entry, TB_INDEX_2 | TB_FOLDED_LOAD);

  for (const X86MemoryFoldTableEntry &Entry : MemoryFoldTable3)
    // Index 3, folded load
    addTableEntry(Entry, TB_INDEX_3 | TB_FOLDED_LOAD);

  for (const X86MemoryFoldTableEntry &Entry : MemoryFoldTable4)
    // Index 4, folded load
    addTableEntry(Entry, TB_INDEX_4 | TB_FOLDED_LOAD);

  // Broadcast tables.
  for (const X86MemoryFoldTableEntry &Entry : BroadcastFoldTable2)
    // Index 2, folded broadcast
    addTableEntry(Entry, TB_INDEX_2 | TB_FOLDED_LOAD | TB_FOLDED_BCAST);

  for (const X86MemoryFoldTableEntry &Entry : BroadcastFoldTable3)
    // Index 3, folded broadcast
    addTableEntry(Entry, TB_INDEX_3 | TB_FOLDED_LOAD | TB_FOLDED_BCAST);

All of the flags could be moved into the tables and the second argument to addTableEntry could be removed.

My only concern is that it might increase the size and maybe compile time of the file.

skan added a comment.Nov 24 2022, 11:11 PM
This comment was removed by skan.
HaohaiWen abandoned this revision.Nov 24 2022, 11:19 PM