Avid readers of this saga may recall from previous installments
that a replication mask replicates (lol) each of the VF elements
in a vector ReplicationFactor times. For example, the mask for
ReplicationFactor=3 and VF=4 is: <0,0,0,1,1,1,2,2,2,3,3,3>.
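To make the definition concrete, here is a minimal C++ sketch that builds such a mask.
buildReplicationMask() is a hypothetical helper written for illustration, not an existing LLVM API;
element i of the result is simply i / ReplicationFactor.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not an LLVM API): repeat each of the VF source lanes
// ReplicationFactor times, e.g. ReplicationFactor=3, VF=4 gives
// <0,0,0,1,1,1,2,2,2,3,3,3>.
std::vector<int> buildReplicationMask(int ReplicationFactor, int VF) {
  std::vector<int> Mask;
  for (int Lane = 0; Lane < VF; ++Lane)
    for (int Rep = 0; Rep < ReplicationFactor; ++Rep)
      Mask.push_back(Lane); // same as Mask[i] = i / ReplicationFactor
  return Mask;
}
```

Note that ReplicationFactor=1 degenerates to the identity mask <0,1,...,VF-1>.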
More importantly, the replication mask is used by the LoopVectorizer
for masked interleaved memory operations.
As discussed in previous installments, while it is used by LV,
and we seem to support masked interleaved memory operations on X86,
its support in the cost model leaves a lot to be desired:
until basically yesterday even for AVX512 we had no cost model for it.
As witnessed in the recent AVX2 X86TTIImpl::getInterleavedMemoryOpCost()
cost model patches, not only is it hard enough to query the cost
of a particular assembly sequence [from llvm-mca], but afterwards
the check lines in the LV cost model tests must also be updated manually.
This is, at the very least, boring.
Okay, now we have decent cost model coverage for interleaving shuffles,
but basically the same mind-killing sequence now has to be performed
for the replication mask. I think we can improve at least the second half
of the problem by teaching TargetTransformInfoImplCRTPBase::getUserCost()
to recognize Instruction::ShuffleVector instructions whose masks are replication masks,
and by adding exhaustive test coverage using -cost-model -analyze + utils/update_analyze_test_checks.py.
This way we can have good, exhaustive coverage for the cost model itself,
and only need basic coverage in the LV cost model tests.
This patch adds a precise, undef-aware isReplicationMask(), with exhaustive test coverage.
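For illustration only, an undef-aware check could look roughly like the following standalone C++ sketch.
The names matchesReplicationParams() and isReplicationMaskSketch() are hypothetical, -1 stands for an
undef mask element, and trying the largest ReplicationFactor first is an assumption of this sketch,
not a description of the actual isReplicationMask() implementation. It only inspects the mask itself,
not the width of the shuffle's source vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Marker for an undef mask element in this sketch.
constexpr int UndefElt = -1;

// Hypothetical: does Mask replicate each of VF lanes ReplicationFactor times,
// treating undef elements as wildcards that match any lane?
bool matchesReplicationParams(const std::vector<int> &Mask,
                              int ReplicationFactor, int VF) {
  if (Mask.size() != static_cast<std::size_t>(ReplicationFactor) * VF)
    return false;
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    if (Mask[I] == UndefElt)
      continue; // undef matches anything
    if (Mask[I] != static_cast<int>(I) / ReplicationFactor)
      return false;
  }
  return true;
}

// Hypothetical: try every factor that evenly divides the mask length,
// largest first (an assumption), and report the first (RF, VF) pair that fits.
bool isReplicationMaskSketch(const std::vector<int> &Mask,
                             int &ReplicationFactor, int &VF) {
  const int N = static_cast<int>(Mask.size());
  for (int RF = N; RF >= 1; --RF) {
    if (N % RF != 0)
      continue;
    if (matchesReplicationParams(Mask, RF, N / RF)) {
      ReplicationFactor = RF;
      VF = N / RF;
      return true;
    }
  }
  return false; // also rejects the empty mask
}
```

With this sketch, <0,-1,0,1,-1,1> is reported as ReplicationFactor=3, VF=2 because the undef
elements are free to match, while <1,0> is rejected.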
InstructionsTest.ShuffleMaskIsReplicationMask shows that it correctly detects all the known masks.
InstructionsTest.ShuffleMaskIsReplicationMask_Exhaustive_Correctness shows that whenever
we detect a replication mask with given params, the true replication mask generated
with those same params matches it element-wise, ignoring undef mask elements.
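The exhaustive correctness property can be sketched the same way: enumerate every small mask, and
whenever the detector reports a match, rebuild the true replication mask from the reported params
and compare. Everything below (the helper names makeReferenceMask(), detectReplicationParams() and
countReplicationMismatches(), the largest-factor-first search, -1 as undef, and element values drawn
from -1..Len-1) is an assumption of this standalone C++ sketch, not the actual unit test.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference generator: lane I / RF, for I in [0, RF * VF).
std::vector<int> makeReferenceMask(int RF, int VF) {
  std::vector<int> Mask;
  for (int I = 0; I < RF * VF; ++I)
    Mask.push_back(I / RF);
  return Mask;
}

// Hypothetical detector under test: largest factor first, -1 is undef.
bool detectReplicationParams(const std::vector<int> &Mask, int &RF, int &VF) {
  const int N = static_cast<int>(Mask.size());
  for (int Cand = N; Cand >= 1; --Cand) {
    if (N % Cand != 0)
      continue;
    bool Ok = true;
    for (int I = 0; I < N && Ok; ++I)
      Ok = Mask[I] == -1 || Mask[I] == I / Cand;
    if (Ok) {
      RF = Cand;
      VF = N / Cand;
      return true;
    }
  }
  return false;
}

// Enumerate every mask of length 1..MaxLen over {-1, 0, ..., Len-1}. For each
// detected mask, compare against the reference mask, skipping undef positions.
// Returns the number of mismatches; NumDetected counts positive answers.
int countReplicationMismatches(int MaxLen, int &NumDetected) {
  int Mismatches = 0;
  NumDetected = 0;
  for (int Len = 1; Len <= MaxLen; ++Len) {
    std::vector<int> Mask(Len, -1);
    while (true) {
      int RF = 0, VF = 0;
      if (detectReplicationParams(Mask, RF, VF)) {
        ++NumDetected;
        std::vector<int> Ref = makeReferenceMask(RF, VF);
        bool Same = Ref.size() == Mask.size();
        for (std::size_t I = 0; Same && I < Mask.size(); ++I)
          Same = Mask[I] == -1 || Mask[I] == Ref[I];
        if (!Same)
          ++Mismatches;
      }
      // Advance Mask like an odometer over the digits -1..Len-1.
      int Pos = 0;
      while (Pos < Len && Mask[Pos] == Len - 1)
        Mask[Pos++] = -1;
      if (Pos == Len)
        break; // every mask of this length has been visited
      ++Mask[Pos];
    }
  }
  return Mismatches;
}
```

Like the property it mirrors, this only checks that positive answers are sound; it does not check
that every genuine replication mask is detected.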