On X86, the gather/scatter story is unfortunate. Native support appeared only in AVX2,
and even then the performance is only reasonable on Skylake and newer.
Even on Zen 3 it is rather bad. So X86 reports masked gather/scatter
as not legal (except with +avx512 or +fast-gather),
and the ScalarizeMaskedMemIntrin pass expands them.
At the same time, we can model the cost of the expanded form
of gather/scatter via X86TTIImpl::getGatherScatterOpCost(),
and most often it is lower than LV's "scalarization" cost.
But since we report the gather as illegal, LV does not even query its cost.
I think this is suboptimal. I propose adding a new TTI hook,
shouldUseMaskedGatherForVectorization(), which defaults to isLegalMaskedGather(),
but is overridden on X86 to return true iff no variable mask is needed
(i.e. the expanded gather/scatter sequence will not require branching).
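To make the proposed semantics concrete, here is a minimal self-contained sketch of the hook's decision logic. The class and member names below are illustrative only, not the actual LLVM TTI API; the point is just that the base implementation falls back to legality, while the X86 override allows the (to-be-expanded) gather whenever the mask is not variable, so the expansion stays branch-free.

```cpp
#include <cassert>

// Hypothetical sketch of the proposed TTI hook (names are illustrative).
struct TTIBase {
  virtual ~TTIBase() = default;
  virtual bool isLegalMaskedGather() const { return false; }
  // Default behavior: same answer as legality.
  virtual bool shouldUseMaskedGatherForVectorization(bool NeedsVariableMask) const {
    (void)NeedsVariableMask;
    return isLegalMaskedGather();
  }
};

struct X86TTISketch : TTIBase {
  bool HasAVX512 = false;
  bool HasFastGather = false;
  bool isLegalMaskedGather() const override {
    // Mirrors the "+avx512 || +fast-gather" exception above.
    return HasAVX512 || HasFastGather;
  }
  // X86 override: even when the gather is "illegal" (and will be expanded
  // by ScalarizeMaskedMemIntrin), still let LV cost it, as long as the
  // mask is not variable, i.e. the expansion needs no branching.
  bool shouldUseMaskedGatherForVectorization(bool NeedsVariableMask) const override {
    return !NeedsVariableMask;
  }
};
```

With this, the vectorizer would query getGatherScatterOpCost() for constant-mask gathers on all X86 targets, instead of falling back to its own scalarization cost only because legality returned false.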
If this makes sense, I can follow up with an SLP patch.
Should I pre-commit the test regeneration?