This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add tests for replacing `{v}permilps` -> `{v}shufps/{v}pshufd`; NFC
ClosedPublic

Authored by goldstein.w.n on Feb 24 2023, 11:28 PM.

Diff Detail

Event Timeline

goldstein.w.n created this revision.Feb 24 2023, 11:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 24 2023, 11:28 PM
Herald added a subscriber: pengfei. · View Herald Transcript
goldstein.w.n requested review of this revision.Feb 24 2023, 11:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 24 2023, 11:28 PM

You might be better off splitting this into tuning-shuffle-permilps.ll (although keep AVX512 test coverage here as well) and tuning-shuffle-permilps-avx512.ll

llvm/test/CodeGen/X86/tuning-shuffle-permilps.ll
3

CHECK-AV? Maybe CHECK-AVX and CHECK-AVX1 ?

goldstein.w.n added a comment.EditedFeb 25 2023, 4:34 PM

You might be better off splitting this into tuning-shuffle-permilps.ll (although keep AVX512 test coverage here as well) and tuning-shuffle-permilps-avx512.ll

What do you mean? As in move the masked-predicate versions to a seperate test file and keep avx512 targets? Or something else?

Also would you prefer the same for the unpckpd tests (added masked versions for them as well).

CHECK-AV -> CHECK-AVX

You might be better off splitting this into tuning-shuffle-permilps.ll (although keep AVX512 test coverage here as well) and tuning-shuffle-permilps-avx512.ll

What do you mean? As in move the masked-predicate versions to a seperate test file and keep avx512 targets? Or something else?

The 512-bit and masked predicate cases - I'm also concerned whether the Z128/Z256 cases are being correctly tested or the evex-vec pass has already converted them to the regular xmm/ymm variants

You might be better off splitting this into tuning-shuffle-permilps.ll (although keep AVX512 test coverage here as well) and tuning-shuffle-permilps-avx512.ll

What do you mean? As in move the masked-predicate versions to a seperate test file and keep avx512 targets? Or something else?

The 512-bit and masked predicate cases - I'm also concerned whether the Z128/Z256 cases are being correctly tested or the evex-vec pass has already converted them to the regular xmm/ymm variants

The tuning pass is before evex-vex so we should be OK - it might be worth adding to the avx512 file a single additional RUN for "-mcpu=x86-86-v4 --show-mc-encoding" that greps that we have no "EVEX TO VEX Compression" strings

goldstein.w.n added a comment.EditedFeb 26 2023, 11:00 AM

You might be better off splitting this into tuning-shuffle-permilps.ll (although keep AVX512 test coverage here as well) and tuning-shuffle-permilps-avx512.ll

What do you mean? As in move the masked-predicate versions to a seperate test file and keep avx512 targets? Or something else?

The 512-bit and masked predicate cases - I'm also concerned whether the Z128/Z256 cases are being correctly tested or the evex-vec pass has already converted them to the regular xmm/ymm variants

The tuning pass is before evex-vex so we should be OK - it might be worth adding to the avx512 file a single additional RUN for "-mcpu=x86-86-v4 --show-mc-encoding" that greps that we have no "EVEX TO VEX Compression" strings

How do I add check-not for a grep? Any example test you can point me at?
Figures it out although it's in output.

You might be better off splitting this into tuning-shuffle-permilps.ll (although keep AVX512 test coverage here as well) and tuning-shuffle-permilps-avx512.ll

What do you mean? As in move the masked-predicate versions to a seperate test file and keep avx512 targets? Or something else?

The 512-bit and masked predicate cases - I'm also concerned whether the Z128/Z256 cases are being correctly tested or the evex-vec pass has already converted them to the regular xmm/ymm variants

The tuning pass is before evex-vex so we should be OK - it might be worth adding to the avx512 file a single additional RUN for "-mcpu=x86-86-v4 --show-mc-encoding" that greps that we have no "EVEX TO VEX Compression" strings

I don't think this works. Even if x86fixupinsttuning runs first, even to vex compression will eventually run and be in the output.

You might be better off splitting this into tuning-shuffle-permilps.ll (although keep AVX512 test coverage here as well) and tuning-shuffle-permilps-avx512.ll

What do you mean? As in move the masked-predicate versions to a seperate test file and keep avx512 targets? Or something else?

The 512-bit and masked predicate cases - I'm also concerned whether the Z128/Z256 cases are being correctly tested or the evex-vec pass has already converted them to the regular xmm/ymm variants

The tuning pass is before evex-vex so we should be OK - it might be worth adding to the avx512 file a single additional RUN for "-mcpu=x86-86-v4 --show-mc-encoding" that greps that we have no "EVEX TO VEX Compression" strings

I don't think this works. Even if x86fixupinsttuning runs first, even to vex compression will eventually run and be in the output.

What about -stop-before=x86-evex-to-vex-compress ? (Sorry I haven't used this much before)

You might be better off splitting this into tuning-shuffle-permilps.ll (although keep AVX512 test coverage here as well) and tuning-shuffle-permilps-avx512.ll

What do you mean? As in move the masked-predicate versions to a seperate test file and keep avx512 targets? Or something else?

The 512-bit and masked predicate cases - I'm also concerned whether the Z128/Z256 cases are being correctly tested or the evex-vec pass has already converted them to the regular xmm/ymm variants

The tuning pass is before evex-vex so we should be OK - it might be worth adding to the avx512 file a single additional RUN for "-mcpu=x86-86-v4 --show-mc-encoding" that greps that we have no "EVEX TO VEX Compression" strings

I don't think this works. Even if x86fixupinsttuning runs first, even to vex compression will eventually run and be in the output.

What about -stop-before=x86-evex-to-vex-compress ? (Sorry I haven't used this much before)

In D143787 we assert the order of passes in llvm/test/CodeGen/X86/opt-pipeline.ll and we have:

...
; CHECK-NEXT:       X86 Fixup Inst Tuning
; CHECK-NEXT:       Compressing EVEX instrs to VEX encoding when possible
...

So that should be sufficient for ordering no?

Split avx/avx512 tests

RKSimon added inline comments.Feb 26 2023, 1:53 PM
llvm/test/CodeGen/X86/tuning-shuffle-permilps.ll
4

Add a CHECK-AVX1-DELAY variant as well

14

Remove the VPERMILPSZ 512-bit cases - they are handled in the -avx512 file

Add AVX1 + DELAY and remove avx512 tests for avx file

goldstein.w.n marked 2 inline comments as done.Feb 26 2023, 2:54 PM
RKSimon accepted this revision.Feb 27 2023, 7:52 AM

LGTM

This revision is now accepted and ready to land.Feb 27 2023, 7:52 AM

Rebase (after D144832, no dep on D143786)

This revision was landed with ongoing or failed builds.Feb 27 2023, 4:53 PM
This revision was automatically updated to reflect the committed changes.