This patch extends the Linalg vectoriser so that scalar loads are
correctly identified as scalar loads rather than as gather loads. Below is
an example of a scalar load (both extract indices are effectively constant
across the iteration space, so every iteration reads the same element):
func.func @example(%arg0: tensor<80x16xf32>, %arg2: tensor<1x4xf32>) -> tensor<1x4xf32> {
  %c8 = arith.constant 8 : index
  %1 = linalg.generic {
    indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>],
    iterator_types = ["parallel", "parallel"]
  } outs(%arg2 : tensor<1x4xf32>) {
  ^bb0(%out: f32):
    %2 = linalg.index 0 : index
    %extracted = tensor.extract %arg0[%2, %c8] : tensor<80x16xf32>
    linalg.yield %extracted : f32
  } -> tensor<1x4xf32>
  return %1 : tensor<1x4xf32>
}

This patch also makes sure that such loads are indeed lowered as
single-element loads (currently still emitted as a 1-element vector.gather)
followed by a broadcast:
%8 = vector.gather %arg0[%c0, %c0] [%7], %cst_1, %cst_2 : tensor<80x16xf32>, vector<1xindex>, vector<1xi1>, vector<1xf32> into vector<1xf32>
%9 = vector.broadcast %8 : vector<1xf32> to vector<1x4xf32>
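For comparison, when the extract index genuinely varies across the
iteration space, the vectoriser still classifies the load as a gather. The
function below is only a hypothetical sketch of such a case (not taken from
this patch's tests):

func.func @gather_example(%arg0: tensor<80x16xf32>, %arg2: tensor<1x4xf32>) -> tensor<1x4xf32> {
  %c0 = arith.constant 0 : index
  %1 = linalg.generic {
    indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>],
    iterator_types = ["parallel", "parallel"]
  } outs(%arg2 : tensor<1x4xf32>) {
  ^bb0(%out: f32):
    // The column index varies with d1 (extent 4), so each lane reads a
    // different element and the extract cannot be treated as a scalar load.
    %2 = linalg.index 1 : index
    %extracted = tensor.extract %arg0[%c0, %2] : tensor<80x16xf32>
    linalg.yield %extracted : f32
  } -> tensor<1x4xf32>
  return %1 : tensor<1x4xf32>
}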
We still need to check what backends do with these 1-element gathers and
whether the vectoriser could generate genuinely scalar code instead.
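A genuinely scalar lowering could, for instance, be a plain tensor.extract
followed by a vector.broadcast. The snippet below is purely illustrative
(assuming %c0 and %c8 are index constants in scope); it is not what the
patch currently emits:

// Hypothetical scalar lowering: read one element, then broadcast it.
%extracted = tensor.extract %arg0[%c0, %c8] : tensor<80x16xf32>
%bcast = vector.broadcast %extracted : f32 to vector<1x4xf32>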