This patch converts 0 - abs(x) to Y = sra (X, size(X)-1); sub (Y, xor (X, Y)) for better codegen.
|410 ms||linux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp|
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -m64 -gline-tables-only -fsanitize=hwaddress -fuse-ld=lld -mcmodel=large -mllvm -hwasan-globals -mllvm -hwasan-use-short-granules -mllvm -hwasan-instrument-landing-pads=0 -mllvm -hwasan-instrument-personality-functions /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/hwasan/TestCases/sizes.cpp -nostdlib++ -lstdc++ -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/hwasan/X86_64/TestCases/Output/sizes.cpp.tmp
Should we fold it into sub (0, ABS()) if ABS is legal ? And how about the VSELECT ? And it also surprise me that, there isn't pattern here for the ABS(). If it has, your code might merge with that one.
That is good news. But we still need to combine the sub(0, abs) to sradi/xor/sub as it saves one instruction.
What we have now with D87188:
0: 76 fe 64 7c sradi r4,r3,63 4: 14 22 63 7c add r3,r3,r4 8: 78 22 63 7c xor r3,r3,r4 c: d0 00 63 7c neg r3,r3
And it can be improved to:
0: 76 fe 60 7c sradi r0,r3,63 4: 78 02 63 7c xor r3,r3,r0 8: 50 00 63 7c subf r3,r3,r0
It seems strange that we would need to add a TLI hook to avoid regressing a target that has abs/nabs support in the ISA.
If I'm seeing it correctly, SystemZ relies on ISD::ABS being expanded and then pattern-matching the expansion back to lpgr for example. Could we change that target to declare ISD::ABS as legal and adjust the tablegen matching for it?
Hmm, it looks strange to me that we have to add these two ops into worklist manual. Isn't it added into worklist automatically as far as the parent node (SUB) returned ? Please correct me if I misunderstand this.
Good catch. I think the combine loop is supposed to add direct child nodes to the worklist. Since AddToWorklist uniques nodes added, I think it's also ok to call it manually. Another question arises, can we always rely on the behavior of the combine loop adding direct child node to the worklist automatically.