This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] fold bit-hack form of usubsat
ClosedPublic

Authored by spatel on Oct 19 2021, 10:23 AM.

Details

Summary

(i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128

I haven't found a generalization of this identity:
https://alive2.llvm.org/ce/z/_sriEQ

Note: I was actually looking at the first form of the pattern in that link, but that's part of a long chain of potential missed transforms in codegen and IR....that I hope ends here!

The predicates for when this is profitable are a bit tricky. This version of the patch excludes multi-use but includes custom lowering (as opposed to legal only).

On x86 for example, we have custom lowering for some vector types and that uses umax and sub. So to enable that fold, we need add use checks to avoid regressions. Even with legal-only lowering, we could see code with extra reg move instructions for extra uses, so that constraint would have to be eased very carefully to avoid penalties.

Diff Detail

Event Timeline

spatel created this revision.Oct 19 2021, 10:23 AM
spatel requested review of this revision.Oct 19 2021, 10:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 19 2021, 10:23 AM
Herald added a subscriber: wdng. · View Herald Transcript
RKSimon added inline comments.Oct 19 2021, 10:39 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
6021

If we could use hasOperation then AVX1 should be able to match this for 256-bit integers as well.

spatel added inline comments.Oct 19 2021, 11:13 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
6021

Yes - that's how I had drafted this initially, but I noticed potential regressions with extra uses.
But extra-use patterns on something like this might be too rare to worry about.
I'll add some more tests and update.

spatel updated this revision to Diff 380746.Oct 19 2021, 11:40 AM
spatel marked an inline comment as done.

Patch updated:
Allow transform for custom-lowered ops and restrict so there are no extra uses.
This exposes potential x86 improvements for custom lowering for very old SSE and AVX512 targets (or the default expansion should be changed).

This exposes potential x86 improvements for custom lowering for very old SSE and AVX512 targets (or the default expansion should be changed).

We already have custom usubsat expansion in LowerADDSAT_SUBSAT - adding the special case shouldn't be a problem.

spatel updated this revision to Diff 380778.Oct 19 2021, 2:23 PM

Patch updated:
Rebased after D112095 / 92a0389b0425 - so we don't have the SSE2/3 regression.
There's still an extra instruction for AVX512 on that test. If that's a regression, I can mark that as TODO or try to avoid that too.

RKSimon accepted this revision.Oct 19 2021, 2:32 PM

LGTM with one minor

llvm/test/CodeGen/X86/psubus.ll
115

We have a couple of 'UseVPTERNLOG' style checks in code already - you can add that to the D112095 special case to fix this.

This revision is now accepted and ready to land.Oct 19 2021, 2:32 PM
spatel updated this revision to Diff 381234.Oct 21 2021, 6:01 AM

Rebased after D112138 / 40163f1df8c60f987e8adc0 - known x86 regressions are now avoided with enhanced custom lowering.

spatel edited the summary of this revision. (Show Details)Oct 21 2021, 6:06 AM
This revision was landed with ongoing or failed builds.Oct 21 2021, 6:47 AM
This revision was automatically updated to reflect the committed changes.
foad added a subscriber: foad.Oct 22 2021, 7:25 AM
foad added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5653

What about the same thing with X +/- 128 instead of X ^ 128? Do they already get canonicalized to the XOR version?

RKSimon added inline comments.Oct 22 2021, 7:31 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5653

I think instcombine will have canonicalized them: https://alive2.llvm.org/ce/z/n2TfRo

spatel added inline comments.Oct 22 2021, 8:11 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5653

Yes, instcombine will convert the math op to logic op. But this is an interesting question for codegen because one form may be better than the other depending on target.

For example on x86 we get these variants:

leal	-2147483648(%rdi), %eax

vs.

movl	%edi, %eax
xorl	$-2147483648, %eax              ## imm = 0x80000000

...while on aarch64:

mov	w8, #-2147483648
add	w0, w0, w8

vs.

eor	w0, w0, #0x80000000

Also, this patch is part of a chain that eventually leads back to a question about whether instcombine is canonicalizing or even can (in case of extra uses) canonicalize to a form that is consistent (we do form usubsat in IR sometimes, but we're missing patterns as shown in this patch).

So I think it's not a stretch to make the match more flexible in this fold...gives us some protection in case code outside of here decides to do things differently in the future.