Ops such as %1 = vector.extractelement %0[%pos : index] : vector<96xf32>.
When extracting from a 1D vector, the source vector is distributed across the lanes. The lane that holds the requested position extracts the element and shuffles it to all other lanes.
I would remove the if/else. Since the code guarded by the if is just an extract, it is better to do it on all lanes, even if the result is only useful on laneIdx.
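To make the suggestion concrete, here is a rough lane-level model of the branch-free version in plain Python (not MLIR). It assumes a warp of 32 lanes and a contiguous layout in which lane i holds elements [3i, 3i+3) of the vector<96xf32>. The helper names (distribute, shuffle_idx, distributed_extract) are hypothetical and exist only for this sketch.

```python
WARP_SIZE = 32  # assumed warp width, not stated in the patch


def distribute(vec, warp_size=WARP_SIZE):
    """Split a 1D vector into contiguous per-lane slices (assumed layout)."""
    assert len(vec) % warp_size == 0
    per_lane = len(vec) // warp_size
    return [vec[lane * per_lane:(lane + 1) * per_lane] for lane in range(warp_size)]


def shuffle_idx(values, src_lane):
    """Model of an index shuffle: every lane reads the value held by src_lane."""
    return [values[src_lane] for _ in values]


def distributed_extract(vec, pos, warp_size=WARP_SIZE):
    """Return the value each lane ends up with after the distributed extract."""
    slices = distribute(vec, warp_size)
    per_lane = len(vec) // warp_size
    src_lane = pos // per_lane     # lane that owns the requested position
    pos_in_lane = pos % per_lane   # offset inside that lane's slice
    # No if/else: every lane extracts at pos_in_lane unconditionally.
    # Only src_lane's value is meaningful; the shuffle discards the rest.
    local = [lane_slice[pos_in_lane] for lane_slice in slices]
    return shuffle_idx(local, src_lane)


vec = [float(i) for i in range(96)]
result = distributed_extract(vec, 50)
```

With pos = 50, the owning lane is 50 // 3 = 16 and the in-lane offset is 50 % 3 = 2, so every lane ends up holding 50.0. The extracts on the other 31 lanes are wasted work, but they are cheap and avoid divergent control flow.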