Convert vector increment or decrement to sub/add with an all-ones constant:
add X, <1, 1...> --> sub X, <-1, -1...>
sub X, <1, 1...> --> add X, <-1, -1...>
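To sanity-check that the rewrite preserves semantics, here is a minimal C sketch that models a 4 x i32 vector as plain uint32_t lanes. The vec4 type and the helper names are hypothetical, not LLVM or intrinsic APIs. Unsigned arithmetic wraps mod 2^32 the same way paddd/psubd do, and -1 is the all-ones bit pattern, so "add 1" and "subtract all-ones" agree in every lane, including at the wraparound edge.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a 4 x i32 vector register. */
typedef struct { uint32_t lane[4]; } vec4;

/* Lane-wise add; uint32_t wraps mod 2^32, like paddd. */
static vec4 vec_add(vec4 a, vec4 b) {
    vec4 r;
    for (size_t i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

/* Lane-wise subtract; uint32_t wraps mod 2^32, like psubd. */
static vec4 vec_sub(vec4 a, vec4 b) {
    vec4 r;
    for (size_t i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] - b.lane[i];
    return r;
}

/* Broadcast one value into every lane (a "splat" constant). */
static vec4 vec_splat(uint32_t v) {
    vec4 r = {{v, v, v, v}};
    return r;
}

static int vec_equal(vec4 a, vec4 b) {
    for (size_t i = 0; i < 4; i++)
        if (a.lane[i] != b.lane[i])
            return 0;
    return 1;
}

/* add X, <1, 1...>  vs  sub X, <-1, -1...> (UINT32_MAX is -1 / all-ones). */
static int inc_rewrite_matches(vec4 x) {
    return vec_equal(vec_add(x, vec_splat(1u)), vec_sub(x, vec_splat(UINT32_MAX)));
}

/* sub X, <1, 1...>  vs  add X, <-1, -1...> */
static int dec_rewrite_matches(vec4 x) {
    return vec_equal(vec_sub(x, vec_splat(1u)), vec_add(x, vec_splat(UINT32_MAX)));
}
```

For example, a lane holding UINT32_MAX becomes 0 under both increment forms, and a lane holding 0 becomes UINT32_MAX under both decrement forms.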
The all-ones vector constant can be materialized with a pcmpeq instruction that compares a register with itself. That is commonly recognized as an idiom (it has no register dependency and/or no latency), so it's better than loading a splat-1 constant from memory.
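Why the self-compare needs neither a constant nor the register's old value can be seen lane by lane. The sketch below models pcmpeqd on 4 x i32 (vec4, vec_cmpeq, all_ones_idiom and is_all_ones are hypothetical names for illustration): comparing any vector with itself yields all-ones in every lane, whatever the input was.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a 4 x i32 vector register. */
typedef struct { uint32_t lane[4]; } vec4;

/* Lane-wise model of pcmpeqd: all-ones where lanes are equal, zero otherwise. */
static vec4 vec_cmpeq(vec4 a, vec4 b) {
    vec4 r;
    for (size_t i = 0; i < 4; i++)
        r.lane[i] = (a.lane[i] == b.lane[i]) ? UINT32_MAX : 0u;
    return r;
}

/* Models "pcmpeqd xmm0, xmm0": every lane equals itself, so the result is
   all-ones no matter what xmm0 held before. */
static vec4 all_ones_idiom(vec4 reg) {
    return vec_cmpeq(reg, reg);
}

static int is_all_ones(vec4 v) {
    for (size_t i = 0; i < 4; i++)
        if (v.lane[i] != UINT32_MAX)
            return 0;
    return 1;
}
```

The result is a pure function of "compare with itself", not of the register contents, which is what lets hardware treat it as independent of the previous value.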
The SSE and AVX1/2 diffs look as I expected: we prefer 'pcmpeq' even over a folded load. AVX512 uses 'vpternlogd' for 512-bit vectors. Is that optimal?
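For the AVX512 case, vpternlogd computes each result bit by using the three corresponding input bits as a 3-bit index into an 8-bit immediate truth table. The per-lane model below is a sketch: the ternlog32 name is hypothetical, and the bit-index order (first operand as the high bit) follows our reading of the Intel SDM. Assuming the all-ones form uses an immediate of 0xFF, every table entry is 1, so the result is all-ones regardless of the inputs and regardless of the index order.

```c
#include <stdint.h>

/* Bitwise model of one 32-bit lane of vpternlogd. For each bit j, the bits
   (a_j, b_j, c_j) form an index 0..7 into the immediate; the selected
   immediate bit becomes result bit j. */
static uint32_t ternlog32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm) {
    uint32_t r = 0;
    for (unsigned j = 0; j < 32; j++) {
        unsigned idx = (((a >> j) & 1u) << 2) | (((b >> j) & 1u) << 1) | ((c >> j) & 1u);
        r |= (uint32_t)((imm >> idx) & 1u) << j;
    }
    return r;
}
```

Other immediates select or combine operands (for instance, 0xF0 returns a, 0xCC returns b, 0xAA returns c), which is a quick way to check that the model's indexing is self-consistent.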
This should fix:
https://bugs.llvm.org/show_bug.cgi?id=33483