This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add scheduling info of CodeGenOnly but encodable instructions for AlderlakeP model
ClosedPublic

Authored by HaohaiWen on Dec 5 2022, 1:52 AM.

Details

Summary

Automatically generated by D130897. This fixed issue #58792.

Diff Detail

Event Timeline

HaohaiWen created this revision.Dec 5 2022, 1:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 5 2022, 1:52 AM
HaohaiWen requested review of this revision.Dec 5 2022, 1:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 5 2022, 1:52 AM
HaohaiWen added inline comments.Dec 5 2022, 2:02 AM
llvm/lib/Target/X86/X86SchedAlderlakeP.td
798–801

CVTSI642SSrr has different schedreadwrite with CVTSI642SSrr_Int?
I think they should be same.

RKSimon added inline comments.Dec 5 2022, 3:22 AM
llvm/lib/Target/X86/X86SchedAlderlakeP.td
530

why ((_NT)?) and not (_NT)?

798–801

Difference between sse12_cvt_s and sse12_vcvt_avx that are getting interpreted differently for some reason?

HaohaiWen added inline comments.
llvm/lib/Target/X86/X86SchedAlderlakeP.td
530

It's consistent with pattern like (V?), (Y?).
Actually, all ? is surrounded by ().

798–801

Hi @andreadb, could you please answer RKSimon's question? I noticed you edited it in D57148.

andreadb added inline comments.Dec 5 2022, 4:43 AM
llvm/lib/Target/X86/X86SchedAlderlakeP.td
798–801

It has been a while since I looked at that .td file. However, I think that the main difference is that some variants accept two operands, while other variants use three-operands forms.

More specifically, _Int variants tend to use three operands. CVTSI642SS is a codegen only instance of multiclass sse12_cvt_s, which assumes two input operands:

def rr : SI<opc, MRMSrcReg, (outs DstRC:$dst), (ins SrcRC:$src),
            !strconcat(asm,"\t{$src, $dst|$dst, $src}"),
            [(set DstRC:$dst, (OpNode SrcRC:$src))]>,
            Sched<[sched, Int2Fpu]>;

That's why we don't need to skip the first use by introducing an "extra" ReadDefault.

If you look at the tablegen'd definitions, you see this:

{ 909,        2,      1,      0,      613,    0|(1ULL<<MCID::MayRaiseFPException), 0x1508023029ULL, ImplicitList18, nullptr, OperandInfo205 },  // Inst #909 = CVTSI642SSrr
{ 910,        3,      1,      0,      614,    0|(1ULL<<MCID::MayRaiseFPException), 0x1508023029ULL, ImplicitList18, nullptr, OperandInfo204 },  // Inst #910 = CVTSI642SSrr_Int

Note how CVTSI642SSrr takes 2 operands (second field), but the _Int variant takes 3 operands.

HaohaiWen marked 2 inline comments as done.Dec 5 2022, 5:09 AM
HaohaiWen added inline comments.
llvm/lib/Target/X86/X86SchedAlderlakeP.td
798–801
multiclass sse12_cvt_sint_3addr<bits<8> opc, RegisterClass SrcRC,
                    RegisterClass DstRC, X86MemOperand x86memop,
                    string asm, string mem, X86FoldableSchedWrite sched,
                    Domain d, bit Is2Addr = 1> {
let hasSideEffects = 0, ExeDomain = d in {
  def rr_Int : SI<opc, MRMSrcReg, (outs DstRC:$dst), (ins DstRC:$src1, SrcRC:$src2),
                  !if(Is2Addr,
                      !strconcat(asm, "\t{$src2, $dst|$dst, $src2}"),
                      !strconcat(asm, "\t{$src2, $src1, $dst|$dst, $src1, $src2}")),
                  []>, Sched<[sched, ReadDefault, ReadInt2Fpu]>;
  let mayLoad = 1 in
  def rm_Int : SI<opc, MRMSrcMem, (outs DstRC:$dst),
                  (ins DstRC:$src1, x86memop:$src2),
                  !if(Is2Addr,
                      asm#"{"#mem#"}\t{$src2, $dst|$dst, $src2}",
                      asm#"{"#mem#"}\t{$src2, $src1, $dst|$dst, $src1, $src2}"),
                  []>, Sched<[sched.Folded, sched.ReadAfterFold]>;
}
}

Should we use Sched<[sched, ReadInt2Fpu] if Is2Addr == 1?

pengfei added inline comments.Dec 5 2022, 5:19 AM
llvm/lib/Target/X86/X86SchedAlderlakeP.td
798–801

I'd think there's some problem when sharing sse12_cvt_sint_3addr between SSE and AVX, though we used Is2Addr to select the assemble. We mistakenly assigned 3 operands to the SSE instructions. This is unexpected since SSE instructions cannot use 3 operands.

andreadb added inline comments.Dec 5 2022, 5:28 AM
llvm/lib/Target/X86/X86SchedAlderlakeP.td
798–801

Changing it would be incorrect.

The fact that some of variants are generated using two-address forms, doesn't change the fact that the machine instruction has three operands.
From a codegen perspective, the only difference is:

let Constraints = "$src1 = $dst"

LLVM would still model those instructions as having three operands, with the difference that the first input register must match the destination register. The assembly string would not print the first input operand.
The last operand is what we want to be marked ReadInt2Fpu. That means, we still need to skip the middle one.

Example:

{ 902,        3,      1,      0,      1150,   0|(1ULL<<MCID::MayRaiseFPException), 0x1508003029ULL, ImplicitList18, nullptr, OperandInfo201 },  // Inst #902 = CVTSI2SSrr_Int

It has three operands, even though flag Is2Addr is set.
We need to assign ReadInt2Fpu to the last operand.

I hope it helps
-Andrea

Matt added a subscriber: Matt.Dec 7 2022, 6:23 PM

Any comments?

HaohaiWen updated this revision to Diff 486469.Jan 4 2023, 9:51 PM

Ignore CodeGenOnly instructions with lock prefix

pengfei accepted this revision.Jan 4 2023, 10:37 PM

I had an offline communication with Haohai. The change to the CodeGenOnly MIs LGTM. Please wait one or two days for other reviewers.

This revision is now accepted and ready to land.Jan 4 2023, 10:37 PM
This revision was landed with ongoing or failed builds.Jan 11 2023, 2:44 AM
This revision was automatically updated to reflect the committed changes.