[DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold
The main motivation is shown by the neg instructions that are now created;
see in particular the @reg32_lshr_by_negated_unfolded_sub_b test.
AArch64 test changes all look good (neg created), or neutral.
X86 changes look neutral (vectors), or good (neg / xor eax, eax created).
I'm not sure about X86/ragreedy-hoist-spill.ll: it looks like the spill
is now hoisted into the preheader (which should still be good?), and
two 4-byte reloads become one 8-byte reload located elsewhere,
but I'm not sure how that affects the loop.
I'm unable to interpret the AMDGPU change; it looks neutral-ish?
This is hopefully a step towards solving PR41952.
https://rise4fun.com/Alive/pkdq (we are missing more patterns; I'll submit them later)
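
As a rough sanity check of the identity behind the fold, the sketch below
compares (x + C) - y against (x - y) + C in wrapping 32-bit arithmetic.
It is only an illustration, not the DAGCombiner code: the function names
(unfolded, folded, forms_agree_on_samples) and the sample values are
hypothetical and were picked for this example.

```cpp
#include <cassert>
#include <cstdint>

// Unfolded form: (x + C) - y. Unsigned arithmetic wraps modulo 2^32.
constexpr std::uint32_t unfolded(std::uint32_t x, std::uint32_t c,
                                 std::uint32_t y) {
    return (x + c) - y;
}

// Folded form: (x - y) + C.
constexpr std::uint32_t folded(std::uint32_t x, std::uint32_t c,
                               std::uint32_t y) {
    return (x - y) + c;
}

// Returns true if both forms agree for every combination of the
// (hypothetical) sample values below, including the wraparound edges.
bool forms_agree_on_samples() {
    constexpr std::uint32_t samples[] = {0u,          1u,          2u,
                                         31u,         32u,         0x7FFFFFFFu,
                                         0x80000000u, 0xFFFFFFFFu};
    for (std::uint32_t x : samples)
        for (std::uint32_t c : samples)
            for (std::uint32_t y : samples)
                if (unfolded(x, c, y) != folded(x, c, y))
                    return false;
    return true;
}

// With x == 0 the folded form is (0 - y) + C, i.e. a negation of y plus C.
static_assert(unfolded(0u, 32u, 5u) == 27u, "(0 + 32) - 5 == 27");
static_assert(folded(0u, 32u, 5u) == 27u, "(0 - 5) + 32 == 27");
```

Calling forms_agree_on_samples() from a test driver should return true.
This only samples a few values; the Alive link above is the actual proof.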
Reviewers: craig.topper, RKSimon, spatel, arsenm
Reviewed By: RKSimon
Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D62223