This is an archive of the discontinued LLVM Phabricator instance.

[X86][AVX512] Adding missing shuffle lowering to blend mask instructions (VPBLENDMB/VPBLENDMW/VPBLENDMD/VPBLENDMQ) .
ClosedPublic

Authored by m_zuckerman on Jan 9 2017, 2:18 PM.

Details

Summary

Some shuffles can be lowered to blend mask instruction (VPBLENDMB/VPBLENDMW/VPBLENDMD/VPBLENDMQ) .
In this patch, I added new pattern match for this case.

This pattern only catches zmm, since we are using a more efficient blend instruction (without a mask) in the other cases.

Diff Detail

Event Timeline

m_zuckerman updated this revision to Diff 83689.Jan 9 2017, 2:18 PM
m_zuckerman retitled this revision from to [X86][AVX512] Adding missing shuffle lowering to blend mask instructions (VPBLENDMB/VPBLENDMW/VPBLENDMD/VPBLENDMQ) . .
m_zuckerman updated this object.
RKSimon added inline comments.Jan 9 2017, 2:53 PM
lib/Target/X86/X86ISelLowering.cpp
12427

Unnecessary as AVX512BW is a requirement for v32i16 - see the assert at the top of function.

12472

Unnecessary as AVX512BW is a requirement for v64i8 - see the assert at the top of function.

RKSimon added inline comments.Jan 9 2017, 3:04 PM
test/CodeGen/X86/vector-shuffle-512-v32.ll
116

Any idea why this isn't using a blend with zero: _mm512_maskz_mov_epi16 ?

igorb edited edge metadata.Jan 10 2017, 12:25 AM

Hi,
for v16i8 and v32i8 we should use vblendmb if possible.

define <16 x i8> @test_mask_blend_epi8(<16 x i8> %A, <16 x i8> %W){
entry:

%0 = shufflevector <16 x i8> %A, <16 x i8> %W, <16 x i32>  <i32 16, i32 1, i32 18, i32 3, i32 20, i32 5, i32 22, i32 7, i32 24, i32 9, i32 26, i32 11, i32 28, i32 13, i32 30, i32 15>
ret <16 x i8> %0

}

vmovdqu .LCPI0_0(%rip), %xmm2 # xmm2 = [255,0,255,0,255,0,255,0,255,0,255,0,255,0,255,0]
vpblendvb %xmm2, %xmm1, %xmm0, %xmm0
retq

lib/Target/X86/X86ISelLowering.cpp
7858

you can use getVectorMaskingNode to simplify the code, all logic already implemented.

7860

not in use.

test/CodeGen/X86/vector-shuffle-to-blend-avx512.ll.ll
4

Could you please add tests for the all cases i8/i16/f64

m_zuckerman edited edge metadata.
m_zuckerman marked 5 inline comments as done.
m_zuckerman marked an inline comment as done.Jan 11 2017, 9:45 AM
m_zuckerman added inline comments.
test/CodeGen/X86/vector-shuffle-512-v32.ll
116

Patterns were missing, This was changed in commit 291368

RKSimon accepted this revision.Jan 11 2017, 10:21 AM
RKSimon edited edge metadata.

LGTM - possibly move the new tests into the existing vector-shuffle-avx512.ll file for tidiness but that's very minor.

test/CodeGen/X86/vector-shuffle-to-blend-avx512.ll
4 ↗(On Diff #83991)

Move these into vector-shuffle-avx512.ll ?

This revision is now accepted and ready to land.Jan 11 2017, 10:21 AM
m_zuckerman edited edge metadata.
m_zuckerman marked an inline comment as done.
igorb accepted this revision.Jan 12 2017, 6:09 AM
igorb edited edge metadata.
This revision was automatically updated to reflect the committed changes.
dlj added a subscriber: dlj.Jan 13 2017, 7:56 PM
dlj added inline comments.
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
8334 ↗(On Diff #84256)

This is problematic with v64i8, since the mask can now be 64 bits wide.

With UndefinedBehaviorSanitizer, I see this failure from test/CodeGen/X86/vector-shuffle-avx512.ll:

  • TEST 'LLVM :: CodeGen/X86/vector-shuffle-avx512.ll' FAILED ****

Script:

/build/build-ubsan/./bin/llc < /src/test/CodeGen/X86/vector-shuffle-avx512.ll -mtriple=x86_64-pc-linux-gnu -mcpu=skx | /build/build-ubsan/./bin/FileCheck /src/test/CodeGen/X86/vector-shuffle-avx512.ll --check-prefix=SKX

/build/build-ubsan/./bin/llc < /src/test/CodeGen/X86/vector-shuffle-avx512.ll -mtriple=x86_64-pc-linux-gnu -mcpu=knl | /build/build-ubsan/./bin/FileCheck /src/test/CodeGen/X86/vector-shuffle-avx512.ll --check-prefix=KNL

Exit Code: 1

Command Output (stderr):

/src/lib/Target/X86/X86ISelLowering.cpp:8342:23: runtime error: shift exponent 33 is too large for 32-bit type 'unsigned int'
/src/test/CodeGen/X86/vector-shuffle-avx512.ll:248:14: error: expected string not found in input
; SKX-LABEL: expand12:

^

<stdin>:168:11: note: scanning from here
expand11: # @expand11

^

<stdin>:168:14: note: possible intended match here
expand11: # @expand11

^
craig.topper edited edge metadata.Jan 13 2017, 8:11 PM

The test case for 64 x i8 is also showing only a 32-bit immediate being used.