Since byte-swapping loads and stores are supported, each load -> bswap or bswap -> store pair in a loop can be folded into a single instruction, so the loop's cost should be reduced by 1 for each such pair.
Since getMemoryOpCost() has access to the Instruction pointer, it is where the search that detects these cases is done. Arguably the zero cost should belong to the bswap intrinsic instead, but getIntrinsicInstrCost() cannot handle both cases because only the arguments are available there: they show whether the operand is a load, but not whether the result is stored.
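A minimal C++ sketch of the detection idea follows. It uses a toy IR, not LLVM's classes: `Op`, `Inst`, `addUse()`, `memoryOpCost()` and `scalarLoopCost()` are all hypothetical names. It also assumes that every instruction costs 1, and that a load or store paired with a single-use bswap costs 0. Under those assumptions, a load -> bswap -> store chain drops from 3 to 1, which is 1 less per pair.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy IR; NOT LLVM's Instruction/User classes.
enum class Op { Load, Store, BSwap, Add };

struct Inst {
  Op op;
  std::vector<Inst *> operands;  // for Store, operands[0] is the stored value
  std::vector<Inst *> users;     // instructions that consume this result
};

// Record that `user` consumes the result of `def`.
void addUse(Inst &user, Inst &def) {
  user.operands.push_back(&def);
  def.users.push_back(&user);
}

// Analogue of a memory-op cost hook that is handed the instruction itself,
// so it can look at both users (loads) and operands (stores).
unsigned memoryOpCost(const Inst &I) {
  assert(I.op == Op::Load || I.op == Op::Store);
  // load -> bswap: assumed to become one byte-swapping load.
  if (I.op == Op::Load && I.users.size() == 1 && I.users[0]->op == Op::BSwap)
    return 0;
  // bswap -> store: assumed to become one byte-swapping store.
  if (I.op == Op::Store && !I.operands.empty() &&
      I.operands[0]->op == Op::BSwap && I.operands[0]->users.size() == 1)
    return 0;
  return 1;
}

// Every non-memory instruction, including the bswap itself, costs 1 here.
unsigned scalarLoopCost(const std::vector<const Inst *> &body) {
  unsigned cost = 0;
  for (const Inst *I : body)
    cost += (I->op == Op::Load || I->op == Op::Store) ? memoryOpCost(*I) : 1;
  return cost;
}
```

The store check has to inspect the stored operand, and the load check has to inspect the load's users. A hook that saw only the bswap's arguments could recognize the load side but not the store side.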
This is NFC on SPEC: about 20 loops get their scalar costs corrected, but no vectorizer decisions change.