This revision adds a transformation and a pattern that rewrites a "maybe masked" vector.transfer_read %view[...], %pad into IR resembling:
  %1:3 = scf.if (%inBounds) {
    scf.yield %view : memref<A...>, index, index
  } else {
    %2 = vector.transfer_read %view[...], %pad : memref<A...>, vector<...>
    %3 = vector.type_cast %extra_alloc : memref<...> to memref<vector<...>>
    store %2, %3[] : memref<vector<...>>
    %4 = memref_cast %extra_alloc: memref<B...> to memref<A...>
    scf.yield %4 : memref<A...>, index, index
  }
  %res = vector.transfer_read %1#0[%1#1, %1#2] {masked = [false ... false]}
where extra_alloc is a buffer of one vector, alloca'ed at the top of the function.
This rewrite makes it possible to realize the "always full tile" abstraction where vector.transfer_read operations are guaranteed to read from a padded full buffer.
The extra work only occurs on the boundary tiles.
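For intuition only, the same full/partial split can be modeled in plain C++. This is a rough sketch, not the code added by this revision: fullTileView, sumTile and kTile are hypothetical names, scratch stands in for extra_alloc, and a single returned pointer stands in for the (memref, index, index) triple yielded by the scf.if.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kTile = 4;  // hypothetical vector width

// Returns a pointer to kTile readable elements that represent
// data[offset, offset + kTile), padded with `pad` past the end of `data`.
// `scratch` plays the role of the one-vector buffer (extra_alloc).
const float *fullTileView(const std::vector<float> &data, std::size_t offset,
                          float pad, std::array<float, kTile> &scratch) {
  const bool inBounds = offset + kTile <= data.size();
  if (inBounds)
    return data.data() + offset;  // fast path: read the original buffer, no copy

  // Slow path (boundary tile): do the padded read once, into the scratch buffer.
  scratch.fill(pad);
  if (offset < data.size()) {
    assert(data.size() - offset < kTile);  // the valid remainder fits in scratch
    std::copy(data.begin() + static_cast<std::ptrdiff_t>(offset), data.end(),
              scratch.begin());
  }
  return scratch.data();
}

// Consumer that always reads a full tile and never checks bounds,
// analogous to the final transfer_read with masked = [false ... false].
float sumTile(const float *tile) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < kTile; ++i)
    sum += tile[i];
  return sum;
}
```

In this model, interior tiles go straight to the original storage; only a tile that crosses the end of the buffer pays for the fill and copy, and the consumer is identical in both cases.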