This patch extends the Linalg vectoriser so that scalar loads are
correctly identified as scalar rather than gather loads. A load is scalar
when none of its indices vary across the vectorised iteration space, so
every lane reads the same element. Below is an example of a scalar load
(the first index comes from linalg.index 0, whose dimension has a trip
count of 1, and the second index is a constant):
  func.func @example(%arg0: tensor<80x16xf32>, %arg2: tensor<1x4xf32>) -> tensor<1x4xf32> {
    %c8 = arith.constant 8 : index
    %c16 = arith.constant 16 : index
    %1 = linalg.generic {
      indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>],
      iterator_types = ["parallel", "parallel"]
    } outs(%arg2 : tensor<1x4xf32>) {
    ^bb0(%out: f32):
      %2 = linalg.index 0 : index
      %extracted = tensor.extract %arg0[%2, %c16] : tensor<80x16xf32>
      linalg.yield %extracted : f32
    } -> tensor<1x4xf32>
    return %1 : tensor<1x4xf32>
  }
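For contrast, here is a hypothetical access pattern (not taken from this
patch) that would still be treated as a gather load: the row index comes
from linalg.index 1, so each lane of the d1 dimension reads a different,
non-contiguous element of %arg0.

  // Hypothetical contrast, not from this patch: the row index varies with
  // linalg.index 1, so the lanes read strided elements and the load
  // remains a gather.
  func.func @gather_example(%arg0: tensor<80x16xf32>, %arg2: tensor<1x4xf32>) -> tensor<1x4xf32> {
    %c1 = arith.constant 1 : index
    %1 = linalg.generic {
      indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>],
      iterator_types = ["parallel", "parallel"]
    } outs(%arg2 : tensor<1x4xf32>) {
    ^bb0(%out: f32):
      %2 = linalg.index 1 : index
      %extracted = tensor.extract %arg0[%2, %c1] : tensor<80x16xf32>
      linalg.yield %extracted : f32
    } -> tensor<1x4xf32>
    return %1 : tensor<1x4xf32>
  }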
This patch also makes sure that these scalar loads are indeed lowered to a
single-element load (still implemented as a vector.gather of one element)
followed by a vector.broadcast:
  %8 = vector.gather %arg0[%c0, %c0] [%7], %cst_1, %cst_2
    : tensor<80x16xf32>, vector<1xindex>, vector<1xi1>, vector<1xf32> into vector<1xf32>
  %9 = vector.broadcast %8 : vector<1xf32> to vector<1x4xf32>
We still need to check what backends do in these cases and whether the
vectoriser could generate genuinely scalar code instead.
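For reference, a minimal sketch of what genuinely scalar code could look
like for the example above (this is not what the vectoriser currently
emits): a plain tensor.extract of the single element, followed by a
broadcast.

  // Hypothetical output, assuming the load were materialised as a plain
  // tensor.extract instead of a single-element vector.gather. The indices
  // mirror the original example above.
  %c0 = arith.constant 0 : index
  %c16 = arith.constant 16 : index
  %extracted = tensor.extract %arg0[%c0, %c16] : tensor<80x16xf32>
  %bcast = vector.broadcast %extracted : f32 to vector<1x4xf32>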