This patch extends the Linalg vectoriser so that scalar loads are correctly identified as scalar rather than gather loads. Below is an example of a scalar load:

```mlir
func.func @example(%arg0: tensor<80x16xf32>,
                   %arg2: tensor<1x4xf32>) -> tensor<1x4xf32> {
  %c8 = arith.constant 8 : index
  %c16 = arith.constant 16 : index
  %1 = linalg.generic {
    indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>],
    iterator_types = ["parallel", "parallel"]
  } outs(%arg2 : tensor<1x4xf32>) {
  ^bb0(%out: f32):
    %2 = linalg.index 0 : index
    %extracted = tensor.extract %arg0[%2, %c16] : tensor<80x16xf32>
    linalg.yield %extracted : f32
  } -> tensor<1x4xf32>
  return %1 : tensor<1x4xf32>
}
```
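For contrast, here is a hypothetical sketch (the function name and operand order are illustrative, not taken from the patch) of an extract that must remain a gather: the row index varies along the vectorised `d1` dimension, so consecutive lanes touch non-contiguous memory.

```mlir
// Hypothetical contrast case: the extracted element depends on
// linalg.index 1, so each lane reads a different row and the
// access cannot be classified as a scalar (loop-invariant) load.
func.func @gather_example(%arg0: tensor<80x16xf32>,
                          %arg2: tensor<1x4xf32>) -> tensor<1x4xf32> {
  %c0 = arith.constant 0 : index
  %1 = linalg.generic {
    indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>],
    iterator_types = ["parallel", "parallel"]
  } outs(%arg2 : tensor<1x4xf32>) {
  ^bb0(%out: f32):
    %3 = linalg.index 1 : index
    %extracted = tensor.extract %arg0[%3, %c0] : tensor<80x16xf32>
    linalg.yield %extracted : f32
  } -> tensor<1x4xf32>
  return %1 : tensor<1x4xf32>
}
```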

This patch also makes sure that such scalar loads are indeed lowered as a single scalar load (currently still implemented as a one-element `vector.gather`) followed by a broadcast:

```mlir
%8 = vector.gather %arg0[%c0, %c0] [%7], %cst_1, %cst_2
    : tensor<80x16xf32>, vector<1xindex>, vector<1xi1>, vector<1xf32>
      into vector<1xf32>
%9 = vector.broadcast %8 : vector<1xf32> to vector<1x4xf32>
```

We still need to check what backends do in these cases and whether the vectoriser could generate genuinely scalar code instead.
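As a sketch of what such genuinely scalar code might look like, the one-element gather above could in principle be replaced by a plain element read plus broadcast. This is a hypothetical alternative lowering, not something the vectoriser currently emits:

```mlir
// Hypothetical scalar lowering: read the single (loop-invariant)
// element directly, then splat it across the result vector.
%e = tensor.extract %arg0[%c0, %c16] : tensor<80x16xf32>
%v = vector.broadcast %e : f32 to vector<1x4xf32>
```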