When targeting CPUs that don't have LDBRX, we end up producing code that is very inefficient and large for this common idiom. This patch just optimizes it two 32-bit LWBRX instructions along with a merge.
This fixes https://bugs.llvm.org/show_bug.cgi?id=49610
If we honor the clang-tidy (no else after return), that makes it more obvious that we are repeating conditions in both clauses. It would be cleaner to hoist those into a common 'if' clause, and probably better still to make it an early exit for the inverted clause (possibly add a helper function too):