The motivating x86 cases for forming the intrinsic are shown in PR31754 and PR40487:
https://bugs.llvm.org/show_bug.cgi?id=31754
https://bugs.llvm.org/show_bug.cgi?id=40487
...and those are shown in the IR test file and x86 codegen file.
Matching the usubo pattern is harder than uaddo because we have 2 independent values rather than a def-use.
I replicated the codegen tests for AArch64 to show that forming usubo should be generally good (and that lines up with the existing uaddo sibling transform).
There's a potential regression seen in the AMDGPU test file when trying to form a SAD op though, so I added a hack to try to avoid that.
If I'm seeing the PPC changes correctly, this is a small improvement on those tests.
I have to look at the entire codegen here a little more closely. The reason I think this requires deeper investigation is that it is changing a branch from the CTR-decrementing bdzlr to the bnelr that doesn't touch the CTR. So I want to make sure we are not preventing the formation of CTR loops by this change.