Currently BasicTTIImpl::getInterleavedMemoryOpCost computes the cost of
the shuffles to extract/pack sub-vectors for interleave groups as the
cost of inserting/extracting all elements individually. This
over-estimates the cost in cases where the actual shuffle is cheaper.

This patch also computes the cost for the shuffle using getShuffleCost
and picks the minimum. The extract/insert combination can be cheaper
than shuffles, e.g. if there are many gaps in the group.
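
As a rough illustration of the selection described above, here is a minimal
standalone sketch. It is not the code from this patch: ToyCost,
elementWiseCost, shuffleBasedCost and cheaperRearrangementCost are
hypothetical names, the per-operation costs are made-up inputs rather than
values queried from TTI, and the "one shuffle per used member" model is an
assumption made for the example.

```cpp
#include <algorithm>
#include <cstdint>

// Toy stand-in for a cost value (hypothetical; not LLVM's InstructionCost).
using ToyCost = std::uint64_t;

// Element-wise estimate: every lane of every used member is extracted from
// the wide vector and inserted into its sub-vector.
ToyCost elementWiseCost(unsigned NumUsedMembers, unsigned LanesPerMember,
                        ToyCost ExtractCost, ToyCost InsertCost) {
  return ToyCost(NumUsedMembers) * LanesPerMember * (ExtractCost + InsertCost);
}

// Shuffle-based estimate, assuming one shuffle of the wide vector per used
// member. CostPerShuffle is supplied by the caller in this sketch.
ToyCost shuffleBasedCost(unsigned NumUsedMembers, ToyCost CostPerShuffle) {
  return ToyCost(NumUsedMembers) * CostPerShuffle;
}

// Pick whichever of the two estimates is lower.
ToyCost cheaperRearrangementCost(ToyCost ElementWise, ToyCost ShuffleBased) {
  return std::min(ElementWise, ShuffleBased);
}
```

In this toy model, a group with few used members and an expensive shuffle of
the wide vector ends up with the element-wise estimate as the minimum, which
mirrors the "many gaps" case mentioned above.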