
[X86][Costmodel] `getReplicationShuffleCost()`: promote 1 bit-wide elements to 32 bit when have AVX512DQ

Authored by lebedev.ri on Sat, Nov 20, 3:12 AM.



I believe this effectively completes X86TTIImpl::getReplicationShuffleCost()
for AVX512, other than the question of handling plain AVX512F,
where we end up with some really ugly "shuffles".
But then, are there any CPUs that support AVX512F but not AVX512DQ/AVX512BW?
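
For context, a replication shuffle repeats each source element a fixed number of times in order, e.g. VF=4 with a replication factor of 2 gives the mask <0,0,1,1,2,2,3,3>. A minimal sketch of that mask follows; the helper name `replicationMask` is hypothetical and is not the function this patch touches.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, for illustration only: builds the shuffle mask that
// repeats each of the `vf` source elements `replicationFactor` times.
std::vector<int> replicationMask(std::size_t replicationFactor, std::size_t vf) {
  std::vector<int> mask;
  mask.reserve(replicationFactor * vf);
  for (std::size_t elt = 0; elt < vf; ++elt)
    for (std::size_t copy = 0; copy < replicationFactor; ++copy)
      mask.push_back(static_cast<int>(elt)); // index of the source element
  return mask;
}
```

The cost model question in this patch is how expensive such a shuffle is when the elements are 1 bit wide (mask vectors).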


Event Timeline

lebedev.ri created this revision. Sat, Nov 20, 3:12 AM
lebedev.ri requested review of this revision. Sat, Nov 20, 3:12 AM
lebedev.ri updated this revision to Diff 388693.

Add forgotten +vl runline, NFC.

RKSimon added inline comments. Sat, Nov 20, 8:30 AM

AVX512F or AVX512DQ?

lebedev.ri added inline comments. Sat, Nov 20, 8:32 AM

"<we can> promote to i32, AVX512F <then provides support for shuffling in that type>."

NFC, precommitted some more tests.

RKSimon added inline comments. Mon, Nov 22, 3:43 AM

So why not promote for AVX512F-only targets?

lebedev.ri added inline comments. Mon, Nov 22, 3:46 AM

I'm not sure I understand.

AVX512DQ is the instruction set that provides VPMOVM2[DQ] / VPMOV[DQ]2M instructions.
Plain AVX512F does not provide any VPMOVM2. / VPMOV.2M instructions.
So we need AVX512DQ for this.
If we had AVX512BW/AVX512BW+VBMI, then we would have already chosen to promote to i16/i8.
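
The selection order described above can be sketched as follows. This is only an illustration under stated assumptions: the `Avx512Features` struct and `promotedMaskElementBits` function are hypothetical names rather than the actual X86TTIImpl code, and a return value of 0 is used here to mean "no promotion chosen".

```cpp
// Hypothetical feature flags, for illustration only.
struct Avx512Features {
  bool hasDQ;
  bool hasBW;
  bool hasVBMI;
};

// Returns the element width (in bits) that 1-bit elements would be promoted
// to before shuffling, or 0 if no promotion is chosen (plain AVX512F).
unsigned promotedMaskElementBits(const Avx512Features &f) {
  if (f.hasBW && f.hasVBMI)
    return 8;  // AVX512BW+VBMI: promote to i8
  if (f.hasBW)
    return 16; // AVX512BW: promote to i16
  if (f.hasDQ)
    return 32; // AVX512DQ provides VPMOVM2D / VPMOVD2M: promote to i32
  return 0;    // plain AVX512F: no VPMOVM2* / VPMOV*2M instructions
}
```

The AVX512BW checks come first, so the i32 promotion only applies to targets that have AVX512DQ without AVX512BW.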

Rebased over committed costmodel fixes.

lebedev.ri added inline comments. Mon, Nov 22, 4:57 AM

In other words, are you saying that we should always promote i1 to i32 as a fallback,
even if we don't necessarily have instructions to do so?

Poke, I think these two patches (this and D114316) are the last bits needed for D111460.

RKSimon accepted this revision. Wed, Nov 24, 3:38 AM


This revision is now accepted and ready to land. Wed, Nov 24, 3:38 AM
This revision was landed with ongoing or failed builds. Wed, Nov 24, 6:41 AM
This revision was automatically updated to reflect the committed changes.