This is an archive of the discontinued LLVM Phabricator instance.

[x86] avoid 256-bit andnp that requires insert/extract with AVX1 (PR37449)
ClosedPublic

Authored by spatel on Sep 20 2018, 11:47 AM.

Details

Summary

This is the final (I hope!) problem pattern mentioned in PR37749:
https://bugs.llvm.org/show_bug.cgi?id=37749

We are trying to avoid an AVX1 sinkhole that arises because the bitwise logic ops are the only supported 256-bit integer ops. We've already solved the simple logic ops, but 'andn' is an x86 special. I looked at alternative solutions like extending the generic DAG combine or trying to wait until the ANDNP node is created, but those are bigger patches that can over-reach. Ie, splitting to 128-bit does not look like a win in most cases with >1 256-bit op.

The pattern matching is cluttered with bitcasts because of our i64 element canonicalization. For the affected test, we have this vector-type-legalized sequence:

        t29: v8i32 = concat_vectors t27, t28
      t30: v4i64 = bitcast t29
        t18: v8i32 = BUILD_VECTOR Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>, Constant:i32<-1>
      t31: v4i64 = bitcast t18
    t32: v4i64 = xor t30, t31
      t9: v8i32 = BUILD_VECTOR Constant:i32<255>, Constant:i32<255>, Constant:i32<255>, Constant:i32<255>, Constant:i32<255>, Constant:i32<255>, Constant:i32<255>, Constant:i32<255>
    t34: v4i64 = bitcast t9
  t35: v4i64 = and t32, t34
t36: v8i32 = bitcast t35
      t37: v4i32 = extract_subvector t36, Constant:i64<0>
      t38: v4i32 = extract_subvector t36, Constant:i64<4>

Diff Detail

Event Timeline

spatel created this revision.Sep 20 2018, 11:47 AM

Hi Sanjay,

You should add a test where the mask vector is not a constant.

I verified that on Jaguar, this change improves cases where:

  • the mask is a constant
  • users access the lo/hi part of the defined YMM.

In one particular case, I saw a quite nice improvement in IPC.

Unfortunately, I also found this regression:

define <8 x i32> @bar(<8 x i32> %A, <8 x i32> %B, <8 x i32> %Mask) {
  %1 = and <8 x i32> %A, %Mask
  %2 = xor <8 x i32> %1, %Mask
  %3 = add <8 x i32> %2, %B
  ret <8 x i32> %3
}

Before this patch (-mcpu=btver2):

vandnps %ymm2, %ymm0, %ymm0
vextractf128    $1, %ymm1, %xmm3
vextractf128    $1, %ymm0, %xmm2
vpaddd  %xmm1, %xmm0, %xmm0
vpaddd  %xmm3, %xmm2, %xmm2
vinsertf128     $1, %xmm2, %ymm0, %ymm0
retq

After your patch:

vxorps  %xmm3, %xmm3, %xmm3
vextractf128    $1, %ymm1, %xmm4
vcmptrueps      %ymm3, %ymm3, %ymm3
vxorps  %ymm3, %ymm0, %ymm0
vandps  %xmm2, %xmm0, %xmm3
vextractf128    $1, %ymm0, %xmm0
vextractf128    $1, %ymm2, %xmm2
vpand   %xmm2, %xmm0, %xmm0
vpaddd  %xmm1, %xmm3, %xmm1
vpaddd  %xmm4, %xmm0, %xmm0
vinsertf128     $1, %xmm0, %ymm1, %ymm0
retq

Could you please have a look at it?

Thanks,
Andrea

Unfortunately, I also found this regression:

define <8 x i32> @bar(<8 x i32> %A, <8 x i32> %B, <8 x i32> %Mask) {
  %1 = and <8 x i32> %A, %Mask
  %2 = xor <8 x i32> %1, %Mask
  %3 = add <8 x i32> %2, %B
  ret <8 x i32> %3
}

https://gcc.godbolt.org/z/Byx2OR

spatel updated this revision to Diff 166562.Sep 21 2018, 2:40 PM

Patch updated:
The previous rev of the patch hinted at a constraint that I assumed, but wasn't actually checked: we should only do this transform when the input to the 'not' is the result of a vector concatenation. Ie, there must be some leading vector integer op that got split up itself. Without that, we're going to end up with more instructions than we started with.

andreadb accepted this revision.Sep 25 2018, 10:24 AM

Thanks Sanjay.

LGTM.

This revision is now accepted and ready to land.Sep 25 2018, 10:24 AM

For reference in case this comes up later: now that we're checking for the leading concat op, I think we could loosen the AVX1 constraint (do AVX512 flavors have this problem too?).

I tried to solve the problem generally in IR (if it occurs pre-legalization) with this instcombine patch:
https://reviews.llvm.org/rL342988

This revision was automatically updated to reflect the committed changes.