Lots of diff. The entire check-llvm-codegen passes,
so only X86 had a conflicting transform (D62327).
We want this transform because currently every single DAGCombine add %x, C
vector pattern needs to be written twice - for add and for sub.
Not good.
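
For reference, a minimal sketch of the arithmetic identity such a canonicalization relies on, assuming the fold rewrites a vector `sub %x, C` as `add %x, -C` (that direction is a reading of the paragraph above, not something spelled out in it). The `V4` type and the helper names are hypothetical and model four wrapping 32-bit lanes in plain C++; this is not DAGCombiner code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a <4 x i32> vector; unsigned lanes wrap modulo 2^32.
using V4 = std::array<std::uint32_t, 4>;

// Lane-wise "sub %x, C".
V4 subConst(const V4 &x, const V4 &c) {
  V4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = x[i] - c[i];
  return r;
}

// Lane-wise "add %x, (sub 0, C)": negate the constant first, then add.
V4 addNegatedConst(const V4 &x, const V4 &c) {
  V4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = x[i] + (0u - c[i]);
  return r;
}

// Asserts that both forms agree for the given operands.
void checkFold(const V4 &x, const V4 &c) {
  assert(subConst(x, c) == addNegatedConst(x, c));
}
```

Because the two forms agree in every lane, including when the subtraction wraps, a pattern written once for the add form would also cover inputs that originally used sub.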
- AArch64 changes look neutral-positive. I'm not good with that asm, but I think movi v1.2d encodes the entire all-ones as an imm0_255:$imm8, so there should not be a code size penalty?
- AMDGPU changes look neutral-positive.
- MIPS changes are neutral, regressions are being addressed by D66805.
- PowerPC - not great, some regressions; the same fold as for MIPS seems to be missing.
- X86 - on average looks like an improvement :) There are more deletions than additions. We delete 137 unfolded constant-pool loads, but add 56; delete 233 folded constant-pool loads, but add 350. Can't tell yet whether some combines are missing.
Do we need a (!VT.isVector() || N1.hasOneUse()) limit?