When targeting CPUs that don't have LDBRX, we end up producing very inefficient and large code for this common idiom (a byte-swapped 64-bit load). This patch optimizes it to two 32-bit LWBRX instructions along with a merge.
This fixes https://bugs.llvm.org/show_bug.cgi?id=49610
clang-tidy: warning: invalid case style for variable 'is64BitBswapOn64BitTgt' [readability-identifier-naming]
not useful