Currently BasicTTIImpl::getInterleavedMemoryOpCost computes the cost of
the shuffles to extract/pack sub-vectors for interleave groups as the
cost of inserting/extracting all elements individually. This
over-estimates the cost in cases where the cost of the actual shuffle is
lower.
This patch also computes the cost for the shuffle using getShuffleCost
and picks the minimum. The extract/insert combination can be cheaper
than shuffles, e.g. if there are many gaps in the group.
clang-tidy: warning: invalid case style for variable 'i' [readability-identifier-naming]
not useful