This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Combine v16i8 SHL by constants to multiplies
ClosedPublic

Authored by RKSimon on Jul 5 2018, 4:35 AM.

Details

Summary

On targets before AVX512 (which can perform a quick extend/shift/truncate), extending to two v8i16 vectors for PMULLW and then truncating is faster than relying on the generic PBLENDVB vXi8 shift path, and it uses a similar amount of mask constant-pool data.
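As a rough illustration of why the transform is legal, here is a scalar Python model of the idea: each byte lane is widened to 16 bits, multiplied by 2**shift (the PMULLW step, which keeps the low 16 bits of the product), and then truncated back to 8 bits. The function names are hypothetical and this is not the code from the patch; it is only a sketch of the arithmetic identity the combine relies on.

```python
def shl_v16i8_via_mul(values, shift_amounts):
    """Model a per-lane i8 shift-left as widen -> 16-bit multiply -> truncate.

    Hypothetical helper for illustration only; not taken from the patch.
    """
    assert len(values) == len(shift_amounts) == 16
    result = []
    for v, s in zip(values, shift_amounts):
        wide = v & 0xFF                         # byte placed in a 16-bit lane
        multiplier = 1 << s                     # shl by constant s == multiply by 2**s
        product = (wide * multiplier) & 0xFFFF  # PMULLW keeps the low 16 bits
        result.append(product & 0xFF)           # truncate the lane back to i8
    return result


def shl_v16i8_reference(values, shift_amounts):
    """Plain per-lane i8 shift-left, used as the reference result."""
    return [((v & 0xFF) << s) & 0xFF for v, s in zip(values, shift_amounts)]
```

Because only the low 8 bits of each product survive the truncation, the upper bits produced by the extension never affect the result, so the choice between sign and zero extension does not matter here.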

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision. Jul 5 2018, 4:35 AM

Should v32i8 do the same or is there something different about it?

In some cases AVX2 gives smaller code with the generic PBLENDVB shift pattern, but I didn't spend much time looking at it. I'll take another look.

RKSimon added inline comments. Jul 7 2018, 12:03 PM
test/CodeGen/X86/vector-shift-shl-256.ll
1027 ↗(On Diff #154200)

I think we're best off doing this just for v16i8; enabling v32i8 increases the AVX2 codegen to:

; AVX2-LABEL: constant_shift_v32i8:
; AVX2:       # %bb.0:
; AVX2-NEXT:    vextracti128 $1, %ymm0, %xmm1
; AVX2-NEXT:    vpmovsxbw %xmm1, %ymm1
; AVX2-NEXT:    vmovdqa {{.*#+}} ymm2 = [1,2,4,8,16,32,64,65408,65408,64,32,16,8,4,2,1]
; AVX2-NEXT:    vpmullw %ymm2, %ymm1, %ymm1
; AVX2-NEXT:    vextracti128 $1, %ymm1, %xmm3
; AVX2-NEXT:    vmovdqa {{.*#+}} xmm4 = <0,2,4,6,8,10,12,14,u,u,u,u,u,u,u,u>
; AVX2-NEXT:    vpshufb %xmm4, %xmm3, %xmm3
; AVX2-NEXT:    vpshufb %xmm4, %xmm1, %xmm1
; AVX2-NEXT:    vpunpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm3[0]
; AVX2-NEXT:    vpmovsxbw %xmm0, %ymm0
; AVX2-NEXT:    vpmullw %ymm2, %ymm0, %ymm0
; AVX2-NEXT:    vextracti128 $1, %ymm0, %xmm2
; AVX2-NEXT:    vpshufb %xmm4, %xmm2, %xmm2
; AVX2-NEXT:    vpshufb %xmm4, %xmm0, %xmm0
; AVX2-NEXT:    vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm2[0]
; AVX2-NEXT:    vinserti128 $1, %xmm1, %ymm0, %ymm0
; AVX2-NEXT:    retq
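
A note on the multiplier constant in the listing above: 65408 is 0xFF80, which is the 16-bit two's-complement encoding of -128, so it looks like the i8 multiplier 1 << 7 printed as a sign-extended 16-bit value (that reading is an assumption, not something stated in the review). The Python sketch below, with hypothetical helper names, checks that after truncation to the low byte it behaves exactly like multiplying by 128, whether the input byte was sign-extended (as VPMOVSXBW does) or zero-extended.

```python
SIGN_EXTENDED_128 = (-128) & 0xFFFF  # 0xFF80, printed as 65408 in the listing


def sign_extend_i8_to_i16(byte_value):
    """Sign-extend an 8-bit value into a 16-bit lane (as VPMOVSXBW does)."""
    byte_value &= 0xFF
    signed = byte_value - 0x100 if byte_value & 0x80 else byte_value
    return signed & 0xFFFF


def low_byte_of_product(lane_i16, multiplier_i16):
    """Low byte of a 16-bit lane multiply, i.e. what survives the truncate."""
    return ((lane_i16 & 0xFFFF) * (multiplier_i16 & 0xFFFF)) & 0xFF


def check_all_bytes():
    """Return True if 65408 and 128 give the same truncated result for every byte."""
    return all(
        low_byte_of_product(sign_extend_i8_to_i16(b), SIGN_EXTENDED_128)
        == low_byte_of_product(b, 128)
        == (b << 7) & 0xFF
        for b in range(256)
    )
```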
This revision is now accepted and ready to land. Jul 7 2018, 1:15 PM
This revision was automatically updated to reflect the committed changes.