This is an archive of the discontinued LLVM Phabricator instance.

[X86] Create PSUBUS from (add (umax X, C), -C)
ClosedPublic

Authored by craig.topper on Dec 17 2018, 11:20 AM.

Details

Summary

InstCombine seems to canonicalize or PSUB patter into a max with the cosntant and an add with an inverse of the constant.

This patch recognizes this pattern and turns it into PSUBUS.

Fixes some of PR40053

Diff Detail

Event Timeline

craig.topper created this revision.Dec 17 2018, 11:20 AM
  1. Preliminary: add the tests with baseline trunk checks.
  2. We should have test coverage for v8i16, v16i16, and v32i8.
  3. This doesn't solve the addusb part of PR40053, right? That needs some enhancement around MatchADDUS I think.
craig.topper edited the summary of this revision. (Show Details)

Rebase after committing baseline tests for all types.

nikic added a subscriber: nikic.Dec 17 2018, 3:57 PM

Hi Craig,

Your patch addresses the issue with the subs example from PR40053. However, as soon as we change the code to something like this, then your rule would not trigger:

unsigned long long test_sub_2(__m128i x) {
    __m128i c = _mm_set1_epi8(70);
    return _mm_subs_epu8(x, c)[0];
}

This is similar to the example from PR40053, with the difference that only element zero is effectively used.
If I build this with clang, the optimizer performs a simplify demanded elts-like optimization that propagates undefs to the constant vector.
That undef propagation later on breaks the UMAX pattern. So, it won't appear in the "Initial Selection DAG".
As a consequence, your new pattern would not work, and we end up with this codegen (avx):

vpmaxub .LCPI1_0(%rip), %xmm0, %xmm1
vpcmpeqb        %xmm1, %xmm0, %xmm1
vpaddb  .LCPI1_1(%rip), %xmm0, %xmm0
vpand   %xmm0, %xmm1, %xmm0
vmovq   %xmm0, %rax

Your patch improves the matching of SUBUS, but more work would need to be done in this area.

There is also another problem caused by undef being aggressively propagated by the logic that simplifies demanded vector elements. See my comment below.

lib/Target/X86/X86ISelLowering.cpp
40522

My understanding is that matchBinaryPredicate() bails out if not all elements from the input vectors are ConstantSDNode. In the presence of constant build vectors with some undefs, that logic would not be able to match a SUBUS. However, the presence of undef values shouldn't really matter in this particular case. It should be safe to ignore them.

Recently simplify demanded vector elts has become more effective (and a bit more aggressive). Being able to propagate undef to unused lanes of an input vector is great. However, in one particular case (D55600) it caused a regression.
The root cause of that regression was the the inability of matchBinaryPredicate() to handle undefs.

I am worried that, the more we improve that simplification logic, the more likely it is to end up propagating undef values to constant vectors.

I think that we need a version of matchBinaryPredicate that knows how to skip undefs. Alternatively, `matchBinaryPredicate could accept an extra (optional) bool argument named IgnoreUndefs.

RKSimon added inline comments.Dec 18 2018, 6:06 AM
lib/Target/X86/X86ISelLowering.cpp
40522

I'm looking at this now - D55819 does the same for matchUnaryPredicate and I should have the equivalent matchBinaryPredicate patch available soon

spatel accepted this revision.Dec 18 2018, 6:15 AM

This change handles psubus cases with no undefs, so LGTM.
Depending on how the patches land, we could add the undef capability and tests for it directly here or make that an enhancement.
But as I mentioned, I don't think this patch alone is enough to close PR40053 because that's at least 2 independent bugs in 1 report.

This revision is now accepted and ready to land.Dec 18 2018, 6:15 AM
RKSimon added inline comments.Dec 18 2018, 6:53 AM
lib/Target/X86/X86ISelLowering.cpp
40522

D55822 is for matchBinaryPredicate undef handling - I have no objections for this patch to go in first and I can add support for it to D55822

This change handles psubus cases with no undefs, so LGTM.
Depending on how the patches land, we could add the undef capability and tests for it directly here or make that an enhancement.
But as I mentioned, I don't think this patch alone is enough to close PR40053 because that's at least 2 independent bugs in 1 report.

Don't get me wrong: I am not suggesting that we shouldn't commit this patch.

I just wanted to point out that it is not true that all the psubus cases are fixed as you wrote. To fully fix psubus we need to make sure that we match UMAX even in the presence of undefs.
As I wrote in my previous comment, if you slightly change the example from the bugzilla, then we no longer get a UMAX in the DAG, and we lose this optimization (see below):

unsigned long long test_sub_2(__m128i x) {
    __m128i c = _mm_set1_epi8(70);
    return _mm_subs_epu8(x, c)[0];
}

This change handles psubus cases with no undefs, so LGTM.
Depending on how the patches land, we could add the undef capability and tests for it directly here or make that an enhancement.
But as I mentioned, I don't think this patch alone is enough to close PR40053 because that's at least 2 independent bugs in 1 report.

Don't get me wrong: I am not suggesting that we shouldn't commit this patch.

I just wanted to point out that it is not true that all the psubus cases are fixed as you wrote. To fully fix psubus we need to make sure that we match UMAX even in the presence of undefs.
As I wrote in my previous comment, if you slightly change the example from the bugzilla, then we no longer get a UMAX in the DAG, and we lose this optimization (see below):

unsigned long long test_sub_2(__m128i x) {
    __m128i c = _mm_set1_epi8(70);
    return _mm_subs_epu8(x, c)[0];
}

Ah, sorry I forgot to address that example. I filed it here:
https://bugs.llvm.org/show_bug.cgi?id=40083

And I agree, we're missing many saturating math patterns. I started looking into that with:
https://bugs.llvm.org/show_bug.cgi?id=14613
...but now that we have IR intrinsics and DAG nodes for saturating math, I think it needs to be revisited so we use those ops.

This change handles psubus cases with no undefs, so LGTM.
Depending on how the patches land, we could add the undef capability and tests for it directly here or make that an enhancement.
But as I mentioned, I don't think this patch alone is enough to close PR40053 because that's at least 2 independent bugs in 1 report.

Don't get me wrong: I am not suggesting that we shouldn't commit this patch.

I just wanted to point out that it is not true that all the psubus cases are fixed as you wrote. To fully fix psubus we need to make sure that we match UMAX even in the presence of undefs.
As I wrote in my previous comment, if you slightly change the example from the bugzilla, then we no longer get a UMAX in the DAG, and we lose this optimization (see below):

unsigned long long test_sub_2(__m128i x) {
    __m128i c = _mm_set1_epi8(70);
    return _mm_subs_epu8(x, c)[0];
}

Ah, sorry I forgot to address that example. I filed it here:
https://bugs.llvm.org/show_bug.cgi?id=40083

No problem. Thanks for raising that bug!

And I agree, we're missing many saturating math patterns. I started looking into that with:
https://bugs.llvm.org/show_bug.cgi?id=14613
...but now that we have IR intrinsics and DAG nodes for saturating math, I think it needs to be revisited so we use those ops.

As Simon wrote, we could wait for D55822, so that we get support for undefs too. That being said, I am okay even if this patch is committed first.

-Andrea

This revision was automatically updated to reflect the committed changes.