InstCombine seems to canonicalize or PSUB patter into a max with the cosntant and an add with an inverse of the constant.
This patch recognizes this pattern and turns it into PSUBUS.
Fixes some of PR40053
Differential D55780
[X86] Create PSUBUS from (add (umax X, C), -C) craig.topper on Dec 17 2018, 11:20 AM. Authored by
Details InstCombine seems to canonicalize or PSUB patter into a max with the cosntant and an add with an inverse of the constant. This patch recognizes this pattern and turns it into PSUBUS. Fixes some of PR40053
Diff Detail Event TimelineComment Actions
Comment Actions Hi Craig, Your patch addresses the issue with the subs example from PR40053. However, as soon as we change the code to something like this, then your rule would not trigger: unsigned long long test_sub_2(__m128i x) { __m128i c = _mm_set1_epi8(70); return _mm_subs_epu8(x, c)[0]; } This is similar to the example from PR40053, with the difference that only element zero is effectively used. vpmaxub .LCPI1_0(%rip), %xmm0, %xmm1 vpcmpeqb %xmm1, %xmm0, %xmm1 vpaddb .LCPI1_1(%rip), %xmm0, %xmm0 vpand %xmm0, %xmm1, %xmm0 vmovq %xmm0, %rax Your patch improves the matching of SUBUS, but more work would need to be done in this area. There is also another problem caused by undef being aggressively propagated by the logic that simplifies demanded vector elements. See my comment below.
Comment Actions This change handles psubus cases with no undefs, so LGTM. Comment Actions Don't get me wrong: I am not suggesting that we shouldn't commit this patch. I just wanted to point out that it is not true that all the psubus cases are fixed as you wrote. To fully fix psubus we need to make sure that we match UMAX even in the presence of undefs. unsigned long long test_sub_2(__m128i x) { __m128i c = _mm_set1_epi8(70); return _mm_subs_epu8(x, c)[0]; } Comment Actions Ah, sorry I forgot to address that example. I filed it here: And I agree, we're missing many saturating math patterns. I started looking into that with: Comment Actions No problem. Thanks for raising that bug!
As Simon wrote, we could wait for D55822, so that we get support for undefs too. That being said, I am okay even if this patch is committed first. -Andrea |
My understanding is that matchBinaryPredicate() bails out if not all elements from the input vectors are ConstantSDNode. In the presence of constant build vectors with some undefs, that logic would not be able to match a SUBUS. However, the presence of undef values shouldn't really matter in this particular case. It should be safe to ignore them.
Recently simplify demanded vector elts has become more effective (and a bit more aggressive). Being able to propagate undef to unused lanes of an input vector is great. However, in one particular case (D55600) it caused a regression.
The root cause of that regression was the the inability of matchBinaryPredicate() to handle undefs.
I am worried that, the more we improve that simplification logic, the more likely it is to end up propagating undef values to constant vectors.
I think that we need a version of matchBinaryPredicate that knows how to skip undefs. Alternatively, `matchBinaryPredicate could accept an extra (optional) bool argument named IgnoreUndefs.