This carves out an exception for a pair of consecutive loads whose order is reversed relative to the consecutive order of a pair of stores. All of the existing profitability/legality checks for the memops still apply; they sit between the two modified hunks of code.
This should give us the same base-case x86 asm that GCC produces for PR41098 and PR44895:
https://bugs.llvm.org/show_bug.cgi?id=41098
https://bugs.llvm.org/show_bug.cgi?id=44895
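As a rough illustration of the source-level shape being targeted (a sketch under assumptions, not the DAGCombiner code itself), the simplest case is two adjacent byte loads stored back to two adjacent bytes in swapped order. Swapping the two halves of a wide value is the same as rotating it by half its width, so the pair can in principle become one 16-bit load, one rotate by 8, and one 16-bit store. The function names `swap_pair_naive`, `swap_pair_merged`, and `rotl16` are hypothetical, chosen only for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical source pattern: two consecutive loads feeding two
   consecutive stores in the reverse order. */
void swap_pair_naive(uint8_t *dst, const uint8_t *src) {
    uint8_t lo = src[0];
    uint8_t hi = src[1];
    dst[0] = hi;
    dst[1] = lo;
}

/* Rotate a 16-bit value left by r bits; assumes 0 < r < 16. */
static inline uint16_t rotl16(uint16_t x, unsigned r) {
    return (uint16_t)((x << r) | (x >> (16 - r)));
}

/* Equivalent merged form: one wide load, a rotate by half the width,
   and one wide store. Swapping the two halves is symmetric, so this
   gives the same bytes on little- and big-endian targets. */
void swap_pair_merged(uint8_t *dst, const uint8_t *src) {
    uint16_t v;
    memcpy(&v, src, sizeof v);   /* single 16-bit load (alignment-safe) */
    v = rotl16(v, 8);            /* swap the two bytes */
    memcpy(dst, &v, sizeof v);   /* single 16-bit store */
}
```

The same reasoning extends to wider element pairs (for example, two `i16` values merged into an `i32` and rotated by 16), as long as the existing legality and profitability checks allow the wide access.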
I think we are still missing a possible follow-up conversion to "movbe" on targets that support it. That might be analogous to what AArch64 would need in order to form "rev16".
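One way to see why "movbe" and "rev16" look relevant here, sketched with hypothetical helpers `byte_reverse` and `rotl_width`: for a 16-bit value built from two bytes, rotating by half the width is exactly a byte reversal, which is the operation those instructions perform. For wider pairs (two 16-bit halves of a 32-bit value), the half-width rotate is generally not a full byte reversal, so this assumed follow-up would likely only apply to the byte-pair case.

```c
#include <stdint.h>

/* Reverse the low nbytes bytes of x (nbytes <= 8). */
static uint64_t byte_reverse(uint64_t x, unsigned nbytes) {
    uint64_t r = 0;
    for (unsigned i = 0; i < nbytes; i++) {
        r = (r << 8) | (x & 0xFFu);  /* append the current low byte */
        x >>= 8;
    }
    return r;
}

/* Rotate the low `width` bits of x left by r; assumes 0 < r < width <= 64. */
static uint64_t rotl_width(uint64_t x, unsigned r, unsigned width) {
    uint64_t mask = (width == 64) ? ~0ull : ((1ull << width) - 1);
    x &= mask;
    return ((x << r) | (x >> (width - r))) & mask;
}
```

With these, `rotl_width(0x1234, 8, 16)` and `byte_reverse(0x1234, 2)` both give `0x3412`, while `rotl_width(0x11223344, 16, 32)` gives `0x33441122` and `byte_reverse(0x11223344, 4)` gives `0x44332211`.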