Use this to handle a new transform: {v}unpck{l|h}pd -> {v}shufps. We
need the scheduling information here because {v}shufps is one byte
larger in code size, so we only want to make this transformation if
{v}shufps is actually faster.
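To make the equivalence behind this transform concrete, here is a minimal lane-level sketch. It models a 128-bit register as four float lanes and treats each double as a pair of adjacent lanes. The type `V4F` and the helper functions are illustrative names, not code from the patch, and the immediates 0x44 and 0xEE follow from the documented SHUFPS selector encoding rather than from the diff itself.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model: a 128-bit register viewed as four 32-bit lanes.
using V4F = std::array<float, 4>;

// SHUFPS: the two low result lanes are picked from A and the two high
// result lanes from B, each by a 2-bit field of the immediate.
V4F shufps(const V4F &A, const V4F &B, std::uint8_t Imm) {
  return {A[Imm & 3], A[(Imm >> 2) & 3], B[(Imm >> 4) & 3], B[(Imm >> 6) & 3]};
}

// UNPCKLPD: low double of A followed by low double of B.
// A double occupies two adjacent float lanes in this model.
V4F unpcklpd(const V4F &A, const V4F &B) { return {A[0], A[1], B[0], B[1]}; }

// UNPCKHPD: high double of A followed by high double of B.
V4F unpckhpd(const V4F &A, const V4F &B) { return {A[2], A[3], B[2], B[3]}; }

// Checks that the shuffles move the same bits for one sample input.
void checkEquivalence() {
  V4F A{1.0f, 2.0f, 3.0f, 4.0f};
  V4F B{5.0f, 6.0f, 7.0f, 8.0f};
  assert(shufps(A, B, 0x44) == unpcklpd(A, B)); // lanes 0,1 of each source
  assert(shufps(A, B, 0xEE) == unpckhpd(A, B)); // lanes 2,3 of each source
}
```

Because both forms move exactly the same bits, the choice between them comes down to encoding size and scheduling data, which is what the summary above describes.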
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
| llvm/test/CodeGen/X86/tuning-shuffle-unpckpd.ll | ||
|---|---|---|
| 7–16 | This seems like a bad check. Same below. | |
| llvm/test/CodeGen/X86/tuning-shuffle-unpckpd.ll | ||
|---|---|---|
| 7–16 | Fixed, sorry about that. | |
| llvm/lib/Target/X86/X86FixupInstTuning.cpp | ||
|---|---|---|
| 73 | I don't see much value in using optional here; wouldn't it be better to return bool directly? template <typename T>
static bool CmpOptionals(T NewVal, T CurVal) {
  if (NewVal && CurVal)
    return *NewVal < *CurVal;
  return false;
} | |
| 89 | Would it be better to hoist the check into NewOpcPreferable or ProcessVPERMILPSmi, etc.? | |
| 92–93 | Would it be better to sink them into GetInstTput and GetInstLat? | |
| 178 | In which case will vmovlhps be transformed into vshufps r, r, 0xee? | |
| llvm/lib/Target/X86/X86FixupInstTuning.cpp | ||
|---|---|---|
| 73 |
The thing is we need three states:
<true> -> make change | |
| 178 |
see the define <4 x float> @transform_VUNPCKLPDrr test case. | |
Maybe use has_value() instead of nullopt comparisons?
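One way to picture the tri-state result discussed above, written with has_value() as suggested, is the sketch below. Only the "true means make the change" state comes from the thread. The meanings given here to `false` and to the empty optional, the helper names `isCheaper` and `shouldReplace`, and the code-size fallback are assumptions made for illustration, not the patch's actual logic.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper: compares two optional costs (lower is assumed better).
// Returns true or false when both costs are known, and an empty optional
// when either one is missing, so the caller can tell "no data" from "worse".
std::optional<bool> isCheaper(std::optional<double> NewCost,
                              std::optional<double> CurCost) {
  if (!NewCost.has_value() || !CurCost.has_value())
    return std::nullopt;
  return *NewCost < *CurCost;
}

// Hypothetical caller: a known comparison decides directly; with no data it
// falls back to an assumed tie-breaker (smaller encoding wins).
bool shouldReplace(std::optional<double> NewCost, std::optional<double> CurCost,
                   unsigned NewSize, unsigned CurSize) {
  std::optional<bool> Res = isCheaper(NewCost, CurCost);
  if (Res.has_value())
    return *Res;
  return NewSize < CurSize;
}
```

A plain bool return could not distinguish "the new opcode is known to be no better" from "there is no scheduling data", which appears to be the motivation for keeping the optional.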
| llvm/lib/Target/X86/X86FixupInstTuning.cpp | ||
|---|---|---|
| 77 | Maybe? | |
Add ICX test runs to tuning-shuffle-unpckpd.ll so we have test coverage?
| llvm/lib/Target/X86/X86FixupInstTuning.cpp | ||
|---|---|---|
| 101 | if (unsigned Size = TII->get(Opcode).getSize()) return Size; | |
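The suggested form relies on a declaration in the `if` condition: the variable is initialized, tested for non-zero, and scoped to the statement. A minimal standalone sketch of the idiom follows. `FakeInstrDesc`, `instSizeOrFallback`, and the fallback value are hypothetical stand-ins, since the surrounding code in the patch is not shown here.

```cpp
#include <cassert>

// Hypothetical stand-in for an instruction descriptor whose getSize()
// returns 0 when the encoded size is not known.
struct FakeInstrDesc {
  unsigned Size;
  unsigned getSize() const { return Size; }
};

// Returns the descriptor's size when it is non-zero, otherwise the
// caller-supplied fallback. Size is only visible inside the if statement.
unsigned instSizeOrFallback(const FakeInstrDesc &Desc, unsigned Fallback) {
  if (unsigned Size = Desc.getSize())
    return Size;
  return Fallback;
}
```

This keeps the lookup to a single call and avoids a separate local that outlives the check.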
LGTM with one minor comment.
| llvm/test/CodeGen/X86/tuning-shuffle-unpckpd.ll | ||
|---|---|---|
| 2–3 | SKX/v3 can probably share a CHECK-AVX2 common prefix: --check-prefixes=CHECK,CHECK-AVX2,CHECK-SKL and --check-prefixes=CHECK,CHECK-AVX2,CHECK-V3 | |