InstCombine seems to canonicalize or PSUB patter into a max with the cosntant and an add with an inverse of the constant.
This patch recognizes this pattern and turns it into PSUBUS.
Fixes some of PR40053
Paths
| Differential D55780
[X86] Create PSUBUS from (add (umax X, C), -C) ClosedPublic Authored by craig.topper on Dec 17 2018, 11:20 AM.
Details Summary InstCombine seems to canonicalize or PSUB patter into a max with the cosntant and an add with an inverse of the constant. This patch recognizes this pattern and turns it into PSUBUS. Fixes some of PR40053
Diff Detail
Event TimelineComment Actions
Comment Actions Hi Craig, Your patch addresses the issue with the subs example from PR40053. However, as soon as we change the code to something like this, then your rule would not trigger: unsigned long long test_sub_2(__m128i x) { __m128i c = _mm_set1_epi8(70); return _mm_subs_epu8(x, c)[0]; } This is similar to the example from PR40053, with the difference that only element zero is effectively used. vpmaxub .LCPI1_0(%rip), %xmm0, %xmm1 vpcmpeqb %xmm1, %xmm0, %xmm1 vpaddb .LCPI1_1(%rip), %xmm0, %xmm0 vpand %xmm0, %xmm1, %xmm0 vmovq %xmm0, %rax Your patch improves the matching of SUBUS, but more work would need to be done in this area. There is also another problem caused by undef being aggressively propagated by the logic that simplifies demanded vector elements. See my comment below.
Comment Actions This change handles psubus cases with no undefs, so LGTM. This revision is now accepted and ready to land.Dec 18 2018, 6:15 AM Comment Actions
Don't get me wrong: I am not suggesting that we shouldn't commit this patch. I just wanted to point out that it is not true that all the psubus cases are fixed as you wrote. To fully fix psubus we need to make sure that we match UMAX even in the presence of undefs. unsigned long long test_sub_2(__m128i x) { __m128i c = _mm_set1_epi8(70); return _mm_subs_epu8(x, c)[0]; } Comment Actions
Ah, sorry I forgot to address that example. I filed it here: And I agree, we're missing many saturating math patterns. I started looking into that with: Comment Actions
No problem. Thanks for raising that bug!
As Simon wrote, we could wait for D55822, so that we get support for undefs too. That being said, I am okay even if this patch is committed first. -Andrea Closed by commit rL349519: [X86] Create PSUBUS from (add (umax X, C), -C) (authored by ctopper). · Explain WhyDec 18 2018, 10:29 AM This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 178729 llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
llvm/trunk/test/CodeGen/X86/psubus.ll
|