Vector shape casts between single-element vectors (e.g. vector<1xf32>) and zero-dim vectors (e.g. vector<f32>) are created when dropping unit dims on transfer reads and writes.
For example:
%13 = vector.transfer_read %subview_14[], %c0_i32 : memref<i32, strided<[], offset: ?>>, vector<i32>
%14 = vector.shape_cast %13 : vector<i32> to vector<1xi32>
%15 = arith.muli %8, %cst_1 : vector<1xi32>
%16 = arith.subi %12, %15 : vector<1xi32>
%17 = arith.addi %10, %16 : vector<1xi32>
%18 = "tosa.apply_scale"(%17, %14, %cst_2) <{double_round = true}> : (vector<1xi32>, vector<1xi32>, vector<1xi8>) -> vector<1xi32>
%19 = arith.addi %18, %cst_3 : vector<1xi32>
%20 = arith.cmpi slt, %19, %cst_1 : vector<1xi32>
%21 = arith.select %20, %cst_1, %19 : vector<1xi1>, vector<1xi32>
%22 = arith.cmpi sgt, %19, %cst_4 : vector<1xi32>
%23 = arith.select %22, %cst_4, %21 : vector<1xi1>, vector<1xi32>
%24 = arith.trunci %23 : vector<1xi32> to vector<1xi8>
%25 = arith.sitofp %24 : vector<1xi8> to vector<1xf32>
%26 = arith.subf %25, %cst_5 : vector<1xf32>
%27 = arith.mulf %26, %cst_6 : vector<1xf32>
%subview_15 = memref.subview %subview[0] [1] [1] : memref<1xf32, strided<[1], offset: ?>, #hal.descriptor_type<storage_buffer>> to memref<f32, strided<[], offset: ?>, #hal.descriptor_type<storage_buffer>>
%28 = vector.shape_cast %27 : vector<1xf32> to vector<f32>
vector.transfer_write %28, %subview_15[] : vector<f32>, memref<f32, strided<[], offset: ?>, #hal.descriptor_type<storage_buffer>>
Add a pattern to distribute those trivial shape casts.
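As a rough illustration of what distributing such a cast could look like, here is a hand-written before/after sketch. It assumes the cast is sunk out of the warp region so that the region yields the cast's source instead. The warp size of 32, the %laneid value, and the "some_def" op are placeholders chosen for illustration and are not taken from the patch.

```mlir
// Before: the trivial shape cast sits inside the warp region.
%r = vector.warp_execute_on_lane_0(%laneid)[32] -> (vector<1xi32>) {
  // "some_def" is a hypothetical producer of the zero-dim vector.
  %0 = "some_def"() : () -> vector<i32>
  %1 = vector.shape_cast %0 : vector<i32> to vector<1xi32>
  vector.yield %1 : vector<1xi32>
}

// After (assumed result): the region yields the cast's source and the
// cast is applied to the warp op result outside the region.
%w = vector.warp_execute_on_lane_0(%laneid)[32] -> (vector<i32>) {
  %0 = "some_def"() : () -> vector<i32>
  vector.yield %0 : vector<i32>
}
%r = vector.shape_cast %w : vector<i32> to vector<1xi32>
```

Because both types hold exactly one element, the same value is seen by every lane and no per-lane slicing is involved.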
Multi-element shape casts can also arise when dropping unit dims (e.g. vector<8x1> to vector<8>). Those can be handled later, once we have a concrete example.
Related issue: https://github.com/openxla/iree/issues/14191
Isn't this actually just a broadcast? Have you considered adding a canonicalization pattern that turns this into a broadcast? Then we could leverage the existing WarpOpBroadcast pattern. It could be more widely applicable and help clean up other places too.
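For reference, a minimal sketch of the rewrite this suggestion seems to have in mind, using %13 and %14 from the example above. This is an assumption about the intended canonical form, not an existing pattern.

```mlir
// Original: rank-increasing cast from a zero-dim vector.
%14 = vector.shape_cast %13 : vector<i32> to vector<1xi32>

// Possible canonical form: the same value expressed as a broadcast.
%14 = vector.broadcast %13 : vector<i32> to vector<1xi32>
```

Note that this only covers the vector<i32> to vector<1xi32> direction. The reverse cast (vector<1xf32> to vector<f32>) lowers the rank, which vector.broadcast does not allow, so that direction would need a different rewrite.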