This is an archive of the discontinued LLVM Phabricator instance.

[AVX-512] Don't use unmasked VMOVDQU8/16 for 8-bit or 16-bit element stores even when BWI instructions are supported. Always use VMOVDQA32/VMOVDQU32.
ClosedPublic

Authored by craig.topper on Jul 27 2017, 11:45 PM.

Details

Summary

We were already using the 32-bit element opcode if BWI isn't enabled, but there's no reason to change the opcode if we have BWI. We will still use the 8/16 opcodes for masked stores, though.

This allows us to use the aligned opcode when we can, which makes our test output more consistent between different modes. It also reduces the number of isel patterns we need.

This is a slight inconsistency with loads, which default to 64-bit element opcodes. I'll probably rectify that in a future patch.
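
As a rough illustration of the selection rule described in this summary, here is a minimal sketch. It is not code from the patch: `pickStoreMnemonic` and its parameters are hypothetical names, and the real logic lives in isel patterns rather than in a function like this. It only covers vectors with 8-bit or 16-bit elements, and the masked case assumes BWI is available.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of LLVM): returns the store mnemonic for a
// vector whose elements are 8 or 16 bits wide, following the rule in the
// summary above.
std::string pickStoreMnemonic(unsigned elementBits, bool masked, bool aligned) {
  assert((elementBits == 8 || elementBits == 16) &&
         "sketch only covers byte and word element types");

  if (masked) {
    // A masked store needs per-element masking, so the opcode must match the
    // element width. Assumption: BWI is available here. There is no aligned
    // VMOVDQA8/VMOVDQA16 form, so the unaligned opcode is the only choice.
    return elementBits == 8 ? "VMOVDQU8" : "VMOVDQU16";
  }

  // An unmasked store writes the same bytes regardless of element width, so
  // the 32-bit element opcode is used, and the aligned form can be chosen
  // when the address is known to be aligned.
  return aligned ? "VMOVDQA32" : "VMOVDQU32";
}
```

For example, under these assumptions an unmasked, aligned store of 8-bit elements yields `VMOVDQA32`, while a masked store of 16-bit elements yields `VMOVDQU16`.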

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon added inline comments. Jul 31 2017, 10:22 AM
test/CodeGen/X86/avx512-insert-extract.ll
5 ↗(On Diff #108587)

Maybe add a --check-prefix=CHECK first option?

test/CodeGen/X86/subvector-broadcast.ll
993 ↗(On Diff #108587)

Is this a missed execution domain opportunity? Same for the others below.

test/CodeGen/X86/x86-interleaved-access.ll
4 ↗(On Diff #108587)

Just noticed this is called AVX3?! Is that a good idea?

Fixed the execution domain issue in r309632.

Changed AVX3 to AVX512 in r309625.

Added common prefix to avx512-insert-extract.ll in r309629.

RKSimon accepted this revision. Aug 1 2017, 5:38 AM

LGTM

This revision is now accepted and ready to land. Aug 1 2017, 5:38 AM
This revision was automatically updated to reflect the committed changes.