
[X86][Costmodel] getMaskedMemoryOpCost(): don't scalarize non-power-of-two vectors with legal element type

Authored by lebedev.ri on May 23 2021, 12:00 PM.



This follows in the footsteps of similar getMemoryOpCost() changes, D100099/D100684.

Intel SDM, VPMASKMOV — Conditional SIMD Integer Packed Loads and Stores:

Faults occur only due to mask-bit required memory accesses that caused the faults. Faults will not occur due to
referencing any memory location if the corresponding mask bit for that memory location is 0. For example, no
faults will be detected if the mask bits are all zero.

I.e., if the mask is all-zeros, any address is fine.
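The quoted fault semantics can be modeled with a short C++ sketch. It is a simplification, not Intel's pseudocode: the helper `maskedLoad` is hypothetical, memory is a `std::vector<int>`, and `std::vector::at()` throwing `std::out_of_range` stands in for a memory fault. Masked-off lanes return 0 and are never dereferenced, so an all-zero mask is safe even with an out-of-range base.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified scalar model of VPMASKMOV-style masked-load semantics
// (illustration only). A lane reads memory only when its mask bit is set;
// masked-off lanes yield 0 and never touch memory, so they cannot fault.
std::vector<int> maskedLoad(const std::vector<int>& memory, std::size_t base,
                            const std::vector<bool>& mask) {
  std::vector<int> result(mask.size(), 0);  // masked-off lanes stay zero
  for (std::size_t lane = 0; lane < mask.size(); ++lane) {
    if (mask[lane]) {
      // vector::at() throws std::out_of_range, standing in for a fault.
      result[lane] = memory.at(base + lane);
    }
  }
  return result;
}
```

For example, with `memory = {10, 20, 30}`, `maskedLoad(memory, 1000, {false, false, false, false})` returns all zeros without throwing, while `maskedLoad(memory, 0, {true, true, true, false})` returns `{10, 20, 30, 0}`.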

The prime use case for masked loads/stores is, e.g., tail-masking the loop remainder,
where on the last iteration only the first few elements of a vector exist.
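As an illustration of tail masking (a sketch; `tailMask` is a hypothetical helper, not an LLVM API), lane `j` of the vector iteration starting at element `i` is enabled only when `i + j < n`:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mask for the vector iteration that starts at element `i` of a loop over `n`
// elements with vectorization factor `vf`: lane j is active iff i + j < n.
// On the final (remainder) iteration only the first n - i lanes are active.
std::vector<bool> tailMask(std::size_t i, std::size_t n, std::size_t vf) {
  assert(vf > 0 && i < n);
  std::vector<bool> mask(vf);
  for (std::size_t j = 0; j < vf; ++j)
    mask[j] = i + j < n;
  return mask;
}
```

For `n = 7` and a vectorization factor of 4, the first iteration gets an all-ones mask and the second (remainder) iteration gets `{true, true, true, false}`.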

So, similarly, I don't see why we must scalarize non-power-of-two vectors
when the element type is one we can do a masked store/load on.
We simply need to legalize the vector, widen the mask, and be done with it.
And we already count the cost of widening the mask.
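The trade-off can be sketched with a toy cost model. Everything here is an assumption for illustration: the names `ToyCosts`, `widenedCost` and `scalarizedCost` are hypothetical, the constants are made up, and legalization is simplified to "widen to the next power-of-two element count, then split into registers". It is not the actual getMaskedMemoryOpCost() implementation.

```cpp
#include <cassert>

// Toy per-operation costs; the values passed in are made up for
// illustration and are NOT LLVM's or any CPU's actual costs.
struct ToyCosts {
  unsigned maskedOpPerReg;  // one legal masked load/store on a full register
  unsigned maskWiden;       // widening/extending the mask to register width
  unsigned scalarPerElt;    // extract mask bit + branch + scalar load/store
};

// Smallest power of two >= n (n >= 1).
unsigned nextPowerOf2(unsigned n) {
  unsigned p = 1;
  while (p < n) p <<= 1;
  return p;
}

// Simplified legalization: widen <numElts x T> to a power-of-two element
// count, split it into registers of `lanesPerReg` lanes, and pay once for
// widening the mask.
unsigned widenedCost(unsigned numElts, unsigned lanesPerReg, const ToyCosts& c) {
  assert(numElts >= 1 && lanesPerReg >= 1);
  unsigned widened = nextPowerOf2(numElts);
  unsigned regs = (widened + lanesPerReg - 1) / lanesPerReg;
  return regs * c.maskedOpPerReg + c.maskWiden;
}

// Fully scalarized: every element pays for its own conditional access.
unsigned scalarizedCost(unsigned numElts, const ToyCosts& c) {
  return numElts * c.scalarPerElt;
}
```

With `ToyCosts{1, 1, 4}`, a `<3 x float>` masked op on 4-lane registers costs 2 when widened versus 12 when scalarized, and the widened figure already includes the mask-widening term.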


Event Timeline

lebedev.ri created this revision. May 23 2021, 12:00 PM
lebedev.ri requested review of this revision. May 23 2021, 12:00 PM

Does codegen support this kind of lowering?


It already has to, because e.g. <2 x float> is a power of two,
but is only half of an XMM register; non-powers-of-two aren't any different.
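A minimal sketch of that argument, assuming 128-bit XMM registers holding four 32-bit floats: the mask for a sub-register vector is simply padded with disabled lanes, and the padding step is identical whether the source vector has 2 (power of two) or 3 (non-power-of-two) elements. `widenMask` is a hypothetical helper for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pad a mask to the full register width with false lanes. The padded lanes
// are masked off, so per the SDM text quoted above they never access memory.
// The same padding serves <2 x float> and <3 x float> alike.
std::vector<bool> widenMask(std::vector<bool> mask, std::size_t width) {
  assert(mask.size() <= width);
  mask.resize(width, false);
  return mask;
}
```

`widenMask({true, true}, 4)` yields `{true, true, false, false}` and `widenMask({true, true, true}, 4)` yields `{true, true, true, false}`.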

ABataev accepted this revision. May 24 2021, 5:09 AM

LG if there are no other opinions

This revision is now accepted and ready to land. May 24 2021, 5:09 AM

Fixed an obvious bug: we weren't actually counting the cost of mask extension for some non-power-of-two vectors.


Thank you for the review.
I believe this to be a rather straightforward change, so I'm going to land it.

This revision was landed with ongoing or failed builds. May 24 2021, 10:10 AM
This revision was automatically updated to reflect the committed changes.