This is an archive of the discontinued LLVM Phabricator instance.

AVX512: shuff62x2 DAG lowering
ClosedPublic

Authored by igorb on Oct 12 2015, 6:46 AM.

Details

Summary

AVX512: Implemented DAG lowering for shuff62x2/shufi62x2 instructions ( shuffle packed values at 128-bit granularity )
example

  shufflevector <8 x double> %x, <8 x double> %x1, <8 x i32> <i32 0, i32 1, i32 4, i32 5, i32 0, i32 1,  i32 4, i32 5>
Before optimization the follow instructions was generated:
  vmovdqa64	LCPI0_0(%rip), %zmm1
  vpermpd	%zmm0, %zmm1, %zmm0

After optimization:

vshuff64x2 $136, %zmm0, %zmm0, %zmm0

Diff Detail

Repository
rL LLVM

Event Timeline

igorb updated this revision to Diff 37101.Oct 12 2015, 6:46 AM
igorb retitled this revision from to AVX512: shuff62x2 DAG lowering.
igorb updated this object.
igorb added reviewers: delena, RKSimon, chandlerc.
igorb set the repository for this revision to rL LLVM.
igorb added a subscriber: llvm-commits.
RKSimon added inline comments.Oct 12 2015, 10:53 AM
lib/Target/X86/Utils/X86ShuffleDecode.h
91

Shouldn't this be called DecodeVSHUF64x2Mask?

lib/Target/X86/X86ISelLowering.cpp
10727

Is this actually necessary? I'd expect it to have been dealt with by canonicalization in lowerVectorShuffle.

test/CodeGen/X86/avx512-intrinsics.ll
4190

Why isn't this CHECK-NEXT any more?

4191

Why isn't this CHECK-NEXT any more?

4232

Why isn't this CHECK-NEXT any more?

test/CodeGen/X86/vector-shuffle-512-v8.ll
1237

Nothing to do with this patch but its a shame that this is ANDing with a 512-bit constant mask instead of pre-ANDing with a 128-bit constant mask and then calling vpmovzxwq.

igorb updated this revision to Diff 37226.Oct 13 2015, 3:23 AM
igorb marked 4 inline comments as done.

Simon, Thank you very much for your review! I have updated the patch according to your comments.

igorb added inline comments.Oct 13 2015, 3:33 AM
lib/Target/X86/Utils/X86ShuffleDecode.h
91

This function decode mask for SHUFF32x4/SHUFF64x2/SHUFI32x4/SHUFI64x2 instructions.

lib/Target/X86/X86ISelLowering.cpp
10727

I can't do canonicalization for 256bit vector, lowerV2X128VectorShuffle can't be moved to lowerVectorShuffle, a lot of test failed.

test/CodeGen/X86/vector-shuffle-512-v8.ll
1237

This is AND with a 512-bit vector broadcasted from a 64-bit memory location.

vpandq	LCPI0_0(%rip){1to8}, %zmm1, %zmm1
delena added inline comments.Oct 13 2015, 5:52 AM
lib/Target/X86/InstPrinter/X86InstComments.cpp
729

You can use a macro to hide the bunch of "cases" and handle 256-bit form of these instructions.
You also can distinguish between memory and register forms in one boolean variable.

RKSimon added inline comments.Oct 13 2015, 6:38 AM
lib/Target/X86/Utils/X86ShuffleDecode.h
91

What I meant was the Decode functions in this file tend to be named after the x86 instruction itself and not the shuffle / ISD type. AFAICT all of these use the VSHUFF64x2 / VSHUFI64x2 instruction - that's why I suggested calling it DecodeVSHUF64x2Mask.

lib/Target/X86/X86ISelLowering.cpp
10727

OK.

igorb updated this revision to Diff 37332.Oct 14 2015, 4:04 AM
igorb marked 3 inline comments as done.

Thank you very much for your review! I have updated the patch according to your comments.

delena added inline comments.Oct 14 2015, 6:28 AM
lib/Target/X86/InstPrinter/X86InstComments.cpp
729–786

case_VSHUF(64x2)
case_VSHUF(32x4)

lib/Target/X86/Utils/X86ShuffleDecode.cpp
273

you can use VT.getScalarSizeInBits()

lib/Target/X86/X86ISelLowering.cpp
10721

VT.getScalarSizeInBits()

10743

mask

igorb updated this revision to Diff 37474.Oct 15 2015, 4:54 AM
igorb marked 4 inline comments as done.

Thank for your review, I have updated the patch according to your comments.
Please take a look.

delena edited edge metadata.Oct 15 2015, 5:03 AM

LGTM (from my side).

RKSimon accepted this revision.Oct 15 2015, 5:50 AM
RKSimon edited edge metadata.

LGTM

This revision is now accepted and ready to land.Oct 15 2015, 5:50 AM
This revision was automatically updated to reflect the committed changes.