This patch implements vectorization of tensor.extract for n-D tensors
(n >= 2) using contiguous load operations, i.e. vector.transfer_read. This
is a follow-up of https://reviews.llvm.org/D137660 in which gather loads
were used, i.e. vector.gather.
It is always safe to use gather load operations when the underlying
memory pattern is contiguous, but not vice versa. At the moment, the
following conditions have to be met for contiguous loads to be
generated:
- The _output tensor_ must be a 1-D vector with the trailing dim > 1, e.g. tensor<1x1x4xi32>,
- The trailing dim in the _input tensor_ must be > 1, e.g. tensor<1x1x4xi32> would be fine, but not tensor<1x4x1xi32>.
If these conditions are not satisfied, gather loads are generated instead.
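For illustration only, the two shape conditions above can be sketched as a
standalone C++ predicate. This is not the code from the patch:
meetsShapeConditions is a hypothetical name, shapes are modelled as plain
vectors of static sizes, and "1-D vector" is assumed to mean that every dim
except the trailing one equals 1. The sketch deliberately says nothing about
the analysis of the tensor.extract indices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the patch). Returns true when the shapes
// satisfy both conditions listed above, i.e. when a contiguous load could be
// considered at all. Returning false corresponds to the gather fallback.
bool meetsShapeConditions(const std::vector<int64_t> &outputShape,
                          const std::vector<int64_t> &inputShape) {
  assert(!outputShape.empty() && !inputShape.empty() &&
         "expected shapes of rank >= 1");

  // Condition 1: the output is effectively 1-D. Assumption: all leading
  // dims are 1 and the trailing dim is > 1, e.g. 1x1x4.
  bool outputOk = outputShape.back() > 1;
  for (std::size_t i = 0; i + 1 < outputShape.size(); ++i)
    outputOk = outputOk && (outputShape[i] == 1);

  // Condition 2: the trailing dim of the input is > 1, e.g. 1x1x4 but
  // not 1x4x1.
  bool inputOk = inputShape.back() > 1;

  return outputOk && inputOk;
}
```

With the example shapes from this message, an output of 1x1x4 with an input
of 1x1x4 passes the check, while an input of 1x4x1 does not.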
Condition 1 guarantees that the iteration space of the corresponding
linalg.generic Op is relatively simple. That makes analysing the
indices for tensor.extract rather straightforward.
Condition 2 is mostly there to avoid weird vectorisation patterns, e.g.
vector<1x1x1xi32>. In practice, tensors like tensor<1x4x1xi32>
should first be collapsed to tensor<1x4xi32> before vectorisation, but
that should happen somewhere else.
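As a rough sketch of the collapse mentioned above, the shape change from
1x4x1 to 1x4 amounts to dropping trailing unit dims. The helper below is
hypothetical and operates on plain static shapes only. It is not an MLIR
transformation and not part of this patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the patch). Drops trailing dims of size 1
// while keeping at least one dim, so 1x4x1 becomes 1x4.
std::vector<int64_t> dropTrailingUnitDims(std::vector<int64_t> shape) {
  assert(!shape.empty() && "expected a shape of rank >= 1");
  while (shape.size() > 1 && shape.back() == 1)
    shape.pop_back();
  return shape;
}
```

After such a collapse the trailing dim is 4, so the second condition on the
input tensor would hold.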
If needed, both conditions can be relaxed. I've not been able to find a
good motivating example for these, hence skipping. For reference,
tosa.resize (lowered to Linalg) was the driving example used here.
Co-authored-by: Diego Caballero <firstname.lastname@example.org>