In the vector distribution patterns, we used to move
vector.broadcast ops out of vector.warp_execute_on_lane_0 ops
irrespective of how they were defined.
This could create broadcast operations with invalid semantics.
E.g.,

    %r = warop ...[32] ... -> vector<1x2xf32> {
      %val = broadcast %in : vector<64xf32> to vector<1x64xf32>
      vector.yield %val : vector<1x64xf32>
    }

=>

    %r = warop ...[32] ... -> vector<64xf32> {
      vector.yield %in : vector<64xf32>
    }
    // Broadcasting to a narrower type!
    broadcast %r : vector<64xf32> to vector<1x2xf32>
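The root issue is that we are trying to broadcast a value that is not the
same for each thread: each lane's 1x2 slice of the result holds a different
pair of elements of %in, so it cannot be rebuilt by a broadcast after the
warp op. There is therefore nothing to propagate here.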
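The fix checks that the broadcast we want to create is actually valid,
i.e., that the yielded source can be broadcast to the distributed result
type, and leaves the warp op untouched otherwise.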
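The legality condition can be illustrated with a minimal sketch. It is written in Python rather than MLIR C++ and models only the shape rule of vector.broadcast (element types and scalable dimensions are ignored). The helper name is_broadcastable_to is hypothetical and is not the MLIR API. The idea is that hoisting is only sound when the source yielded from the warp op can be broadcast to the per-lane result type.

```python
from typing import Sequence


def is_broadcastable_to(src_shape: Sequence[int], dst_shape: Sequence[int]) -> bool:
    """Hypothetical helper modelling the vector.broadcast shape rule.

    The source shape is aligned with the trailing dimensions of the
    destination. Each aligned source dimension must either equal the
    destination dimension or be 1. Missing leading dimensions are
    duplicated. An empty src_shape models a scalar source.
    """
    # A broadcast can never drop dimensions.
    if len(src_shape) > len(dst_shape):
        return False
    # Compare dimensions pairwise, starting from the innermost (trailing) one.
    for s, d in zip(reversed(src_shape), reversed(dst_shape)):
        if s != d and s != 1:
            return False
    return True


# The broadcast inside the warp op from the example is valid...
print(is_broadcastable_to((64,), (1, 64)))  # True
# ...but the hoisted broadcast to the per-lane type vector<1x2xf32> is not.
print(is_broadcastable_to((64,), (1, 2)))   # False
```

Under this model, a pattern would bail out instead of hoisting whenever the check returns False, which is the case for the example above.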