lowerShuffleAsLanePermuteAndRepeatedMask expands a shuffle from shuffle(x,y,mask) to shuffle(shuffle(x,y,lanemask1),shuffle(x,y,lanemask2),repeatedinlanemask).
However, we weren't making use of the fact that elements of the original mask might be undef. Instead of applying the entire repeatedinlanemask to every lane, we can simplify the final mask wherever the original mask never demanded that element.
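As a rough illustration only (this is not the code from the patch), the idea can be sketched as a standalone function. The name buildFinalMask, the use of std::vector, and the index layout of the two-operand final shuffle are all assumptions made for this sketch; -1 stands for an undef mask element.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, NOT the LLVM implementation: builds the mask of the
// final two-operand shuffle from the original mask and the per-lane
// repeated mask. -1 denotes an undef element.
//
// Assumed layout: repeated-mask values in [0, NumLaneElts) pick from the
// first lane-permuted operand, values in [NumLaneElts, 2 * NumLaneElts)
// pick from the second one.
std::vector<int> buildFinalMask(const std::vector<int> &OrigMask,
                                const std::vector<int> &RepeatedMask) {
  const int NumElts = static_cast<int>(OrigMask.size());
  const int NumLaneElts = static_cast<int>(RepeatedMask.size());
  assert(NumLaneElts > 0 && NumElts % NumLaneElts == 0 &&
         "mask must cover a whole number of lanes");

  std::vector<int> NewMask(NumElts, -1);
  for (int I = 0; I != NumElts; ++I) {
    // The original shuffle never demanded this element: keep it undef
    // instead of blindly applying the repeated mask to it.
    if (OrigMask[I] < 0)
      continue;

    int M = RepeatedMask[I % NumLaneElts];
    if (M < 0)
      continue;

    // Rebase the in-lane index onto the lane that element I lives in.
    int LaneBase = (I / NumLaneElts) * NumLaneElts;
    NewMask[I] = M < NumLaneElts ? LaneBase + M
                                 : NumElts + LaneBase + (M - NumLaneElts);
  }
  return NewMask;
}
```

With two lanes of four elements and a repeated mask of {0,5,2,7}, an original mask whose elements 1, 4 and 5 are undef yields {0,-1,2,11,-1,-1,6,15} in this sketch, rather than the fully populated {0,9,2,11,4,13,6,15}.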
This is yet another improvement addressing regressions from D127115.