This patch updates the vectorization of the `tensor.extract` Op so that the
permutation map for the generated `vector.transfer_read` Op is defined
explicitly by the vectorizer.
This change is needed for cases where the rank of the source tensor is
lower than the rank of the output vector generated by the vectorizer:
```mlir
%17 = vector.transfer_read %arg1[%14, %16], %cst_4 {in_bounds = [true, true]} : tensor<257x24xf32>, vector<1x1x4xf32>
```
In cases like this, the vectorizer will now create the following permutation map:
```
(d0, d1) -> (0, d0, d1)
```
In other cases the behavior remains unchanged.
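The mapping rule described above can be illustrated with a small Python sketch. The helper `explicit_permutation_map` is hypothetical and is not part of MLIR or of this patch (the real change lives in the C++ vectorizer). It assumes that when the vector rank exceeds the source rank, the extra leading vector dimensions are broadcast (`0`) and the remaining ones follow the source dimensions in order; otherwise it falls back to the minor identity map, which is what `vector.transfer_read` uses when no permutation map is given.

```python
def explicit_permutation_map(src_rank: int, dst_rank: int) -> str:
    """Return an affine-map string (MLIR syntax) for a transfer_read that
    reads a rank-`src_rank` tensor into a rank-`dst_rank` vector.

    Hypothetical illustration only; not MLIR's actual implementation.
    """
    dims = [f"d{i}" for i in range(src_rank)]
    if src_rank < dst_rank:
        # Leading vector dims have no source counterpart: broadcast them (0).
        results = ["0"] * (dst_rank - src_rank) + dims
    else:
        # Default minor identity: the vector maps onto the trailing source dims.
        results = dims[src_rank - dst_rank:]
    return f"({', '.join(dims)}) -> ({', '.join(results)})"


# The case from the issue: tensor<257x24xf32> read into vector<1x1x4xf32>.
print(explicit_permutation_map(2, 3))  # (d0, d1) -> (0, d0, d1)
# Equal ranks keep the identity map.
print(explicit_permutation_map(2, 2))  # (d0, d1) -> (d0, d1)
```

Only the first call corresponds to the situation this patch addresses; the second shows a case where the map matches the implicit default.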
Fixes https://github.com/openxla/iree/issues/13036. That's also where
the test case was extracted from.