[PowerPC] Combine 64-bit bswap(load) without LDBRX
When targeting CPUs that don't have LDBRX, we end up producing code that is
very inefficient and large for this common idiom. This patch just
optimizes it two 32-bit LWBRX instructions along with a merge.
This fixes https://bugs.llvm.org/show_bug.cgi?id=49610
Differential revision: https://reviews.llvm.org/D104836