Compared to permlane16, permlane64 has no BC input because it has no
boundary conditions, no fi input because the instruction acts as if FI
were always enabled, and no OLD input because it always writes to every
active lane.
Also use the new intrinsic in the atomic optimizer pass.
A small nit: maybe VOP_MOVRELS needs a new name now.