This is an archive of the discontinued LLVM Phabricator instance.

[X86] Canonicalize andnp for bitmask arithmetic
Abandoned · Public

Authored by loladiro on Aug 6 2020, 9:46 PM.

Details

Summary

We have a DAG combine that tries to fold (vselect cond, 0000..., X) -> (andnp cond, X).
However, it does so by attempting to create an i64 vector whose element
count is the condition's bit width divided by 64 (truncating). This is
bad for mask vectors like v8i1, since that division is simply zero, and
we don't want i64 vectors for masks anyway. The easy change is just to
avoid changing the VT, but that is slightly problematic because the
canonical pattern for kandn is (and (vnot a) b) rather than
(x86andnp a b), so the result fails to select. Rather than playing games
with giving the mask vectors a different canonical representation, the
bulk of this commit switches the canonical ISD representation for kandn
to (x86andnp a b), so that all vector types can be handled uniformly
here. To avoid regressing other tests, a few other folds need to be
extended to handle x86andnp in addition to plain and. That should be a
general improvement anyway, since x86andnp is already canonical for
non-i1 vectors prior to this commit and those folds were simply missing it.

When all is said and done, this fixes the issue reported in
https://github.com/JuliaLang/julia/issues/36955.
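
For illustration, here is a rough sketch of the shape of the existing fold (not the actual source; Cond, RHS, CondVT, VT and DL are placeholder names):

```cpp
// Hypothetical sketch of the (vselect Cond, 0000..., X) -> (andnp Cond, X)
// combine as it stood: it forces the operands into an i64 vector type.
unsigned NumElts = CondVT.getSizeInBits() / 64;    // for v8i1: 8 / 64 == 0
MVT AndNVT = MVT::getVectorVT(MVT::i64, NumElts);  // zero-element VT for masks
SDValue CastCond = DAG.getBitcast(AndNVT, Cond);
SDValue CastRHS  = DAG.getBitcast(AndNVT, RHS);
SDValue AndN = DAG.getNode(X86ISD::ANDNP, DL, AndNVT, CastCond, CastRHS);
return DAG.getBitcast(VT, AndN);
```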

Diff Detail

Event Timeline

loladiro created this revision. · Aug 6 2020, 9:46 PM
Herald added a project: Restricted Project. · Aug 6 2020, 9:46 PM
loladiro requested review of this revision. · Aug 6 2020, 9:46 PM
loladiro added a subscriber: Restricted Project. · Aug 6 2020, 9:47 PM

I kind of think we shouldn't change the canonical form for masks; instead we should handle the mask case of vselect in generic DAG combine, like we do in InstCombine. Maybe we could handle the non-mask case there as well, but that depends on whether we can form ANDNP afterwards.

I think that would be fine, but in that case, we should probably get rid of x86andnp entirely and just make (and (not x) y) canonical everywhere. The supported patterns for these are much the same, so if we have both canonical forms, we're just signing up for double the work.
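
For reference, the two forms under discussion, sketched with placeholder values (Mask, Val); since X86ISD::ANDNP computes (~a) & b, the two are equivalent:

```cpp
// Generic canonical form, which the existing mask patterns (e.g. kandn)
// rooted on ISD::AND would match:
SDValue NotMask = DAG.getNOT(DL, Mask, VT);
SDValue GenericForm = DAG.getNode(ISD::AND, DL, VT, NotMask, Val);

// X86-specific form, currently canonical only for xmm/ymm/zmm vectors:
SDValue TargetForm = DAG.getNode(X86ISD::ANDNP, DL, VT, Mask, Val);
```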

I'm all for removing ANDNP. I think it largely exists for load folding and/or shuffle lowering/combining. Neither of those applies to masks, which is why I'm hesitant to change masks to match it. There are also other mask patterns in tablegen rooted on AND that aren't covered by tryVPTESTM.

Are you saying you think we can get away with removing X86ISD::ANDNP entirely? We don't have it for the BMI scalar variant, and it /usually/ works.

llvm/lib/Target/X86/X86ISelLowering.cpp
39637

This looks like a separate NFC-ish change - a legacy from when the bitops were legal for just vXi64 types.

> Are you saying you think we can get away with removing X86ISD::ANDNP entirely? We don't have it for the BMI scalar variant, and it /usually/ works.

I'm not saying that we can definitely get away with removing it; it has reasons for existing. I'm just saying I don't want to spread it further. We usually like to avoid target-specific nodes when possible, so increasing the scope of a target-specific node seems like the wrong direction. Masks and xmm/ymm/zmm vectors are already treated differently in many places; I don't see that having two canonical representations makes their differences that much larger than they already are.

llvm/lib/Target/X86/X86ISelLowering.cpp
39637

This is the buggy code that is creating ANDNP for vXi1. The bitcast part is NFCish.

Abandon this now that D85553 has landed?

loladiro abandoned this revision. · Aug 10 2020, 1:47 PM