This patch implements vectorization of tensor.extract for n-D tensors
(n >= 2) using contiguous load operations, i.e. `vector.transfer_read`. This
is a follow-up of https://reviews.llvm.org/D137660, in which gather loads
were used, i.e. `vector.gather`.

It is always safe to use gather load operations when the underlying
memory pattern is contiguous, but not vice versa. At the moment, the
following conditions have to be met for contiguous loads to be
generated:

1. The _output tensor_ must be a 1-D vector with the trailing dim > 1, e.g.
   `tensor<1x1x4xi32>`.
2. The trailing dim in the _input tensor_ must be > 1, e.g.
   `tensor<1x1x4xi32>` would be fine, but not `tensor<1x4x1xi32>`.
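As an illustration (the shapes, names and maps below are mine, not taken
from the patch's test suite), a `linalg.generic` that satisfies both
conditions might look like this; the extracted index varies with the
innermost parallel dimension, so consecutive vector lanes read
consecutive memory locations and a `vector.transfer_read` applies:

```mlir
#map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
func.func @contiguous_extract(%src: tensor<8x128xf32>,
                              %init: tensor<1x1x4xf32>) -> tensor<1x1x4xf32> {
  %c0 = arith.constant 0 : index
  %res = linalg.generic {
      indexing_maps = [#map],
      iterator_types = ["parallel", "parallel", "parallel"]}
      outs(%init : tensor<1x1x4xf32>) {
    ^bb0(%out: f32):
      // The trailing index is the innermost induction variable, so the
      // access pattern over the trailing dim of %src is contiguous.
      %idx = linalg.index 2 : index
      %val = tensor.extract %src[%c0, %idx] : tensor<8x128xf32>
      linalg.yield %val : f32
  } -> tensor<1x1x4xf32>
  return %res : tensor<1x1x4xf32>
}
```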

If these conditions are not satisfied, gather loads are generated
instead.

Condition 1 guarantees that the iteration space of the corresponding
`linalg.generic` Op is relatively simple. That makes analysing the
indices for `tensor.extract` rather straightforward.

Condition 2 is mostly there to avoid weird vectorisation patterns, e.g.
`vector<1x1x1xi32>`. In practice, tensors like `tensor<1x4x1xi32>`
should first be collapsed to `tensor<1x4xi32>` before vectorisation, but
that should happen somewhere else.
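That collapse would be a plain `tensor.collapse_shape`, sketched here
for reference (the value name is arbitrary):

```mlir
// Fold the unit trailing dim into the middle dim so that the
// trailing dim of the result is > 1.
%collapsed = tensor.collapse_shape %t [[0], [1, 2]]
    : tensor<1x4x1xi32> into tensor<1x4xi32>
```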

If needed, both conditions can be relaxed. I've not been able to find a
good motivating example for these, hence skipping. For reference,
`tosa.resize` (lowered to Linalg) was the driving example used here.

Co-authored-by: Diego Caballero <diegocaballero@google.com>