This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] Add BITCAST handling to ComputeNumSignBits for splatted sign bits.
ClosedPublic

Authored by RKSimon on Sep 14 2017, 6:18 AM.

Details

Summary

For cases where we are BITCASTing to vectors of smaller elements, then if the entire source was a splatted sign (src's NumSignBits == SrcBitWidth) we can say that the dst's NumSignBit == DstBitWidth, as we're just splitting those sign bits across multiple elements.

We could generalize this but at the moment the only use case I have is to peek through bitcasts to vector comparison results.

psubus.ll - @spatel I think you've encountered the PAND -> PBLENDVB+ZERO issue before - where is the best place to fix it? Do we just need to improve VSELECT/SHRUNKBLEND handling?

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision.Sep 14 2017, 6:18 AM
spatel edited edge metadata.Sep 14 2017, 8:10 AM
spatel added subscribers: zvi, yulia_koval, DavidKreitzer.

I haven't looked at either patch yet, but let me cc @yulia_koval @zvi @DavidKreitzer from D37534 ; they may already know what the interaction of these 2 patches will eventually produce.

delena added inline comments.Sep 16 2017, 11:04 PM
test/CodeGen/X86/bitcast-and-setcc-256.ll
380 ↗(On Diff #115206)

This code fits in 4 instructions:
cmp
cmp
and
pmovmskpd

What happens without "and", just cmp + bitcast ?

RKSimon added inline comments.Sep 17 2017, 4:21 AM
test/CodeGen/X86/bitcast-and-setcc-256.ll
380 ↗(On Diff #115206)

The codegen below is from the bitcast-setcc-*.ll sibling test files, there's still plenty of improvements possible with bool vectors, it's just taking a long time. D35320 should help as well of course.

; AVX1-LABEL: v8f32:
; AVX1:       # BB#0:
; AVX1-NEXT:    vcmpltps %ymm0, %ymm1, %ymm0
; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm1
; AVX1-NEXT:    vpacksswb %xmm1, %xmm0, %xmm0
; AVX1-NEXT:    vpshufb {{.*#+}} xmm0 = xmm0[0,2,4,6,8,10,12,14,u,u,u,u,u,u,u,u]
; AVX1-NEXT:    vpmovmskb %xmm0, %eax
; AVX1-NEXT:    # kill: %AL<def> %AL<kill> %EAX<kill>
; AVX1-NEXT:    vzeroupper
; AVX1-NEXT:    retq
;
; AVX2-LABEL: v8f32:
; AVX2:       # BB#0:
; AVX2-NEXT:    vcmpltps %ymm0, %ymm1, %ymm0
; AVX2-NEXT:    vmovmskps %ymm0, %eax
; AVX2-NEXT:    # kill: %AL<def> %AL<kill> %EAX<kill>
; AVX2-NEXT:    vzeroupper
; AVX2-NEXT:    retq
;
; AVX512-LABEL: v8f32:
; AVX512:       # BB#0:
; AVX512-NEXT:    vcmpltps %ymm0, %ymm1, %k0
; AVX512-NEXT:    kmovd %k0, %eax
; AVX512-NEXT:    # kill: %AL<def> %AL<kill> %EAX<kill>
; AVX512-NEXT:    vzeroupper
; AVX512-NEXT:    retq
  %x = fcmp ogt <8 x float> %a, %b
  %res = bitcast <8 x i1> %x to i8
  ret i8 %res
}
delena added inline comments.Sep 17 2017, 5:52 AM
test/CodeGen/X86/bitcast-and-setcc-256.ll
380 ↗(On Diff #115206)

Agree.

test/CodeGen/X86/psubus.ll
521 ↗(On Diff #115206)

Theoretically, your patch does the right thing. Did you check why an additional instruction here?

RKSimon added inline comments.Sep 17 2017, 9:25 AM
test/CodeGen/X86/psubus.ll
521 ↗(On Diff #115206)

I haven't confirmed yet, but it might be the same issue as PR28925

RKSimon updated this revision to Diff 115645.Sep 18 2017, 7:32 AM

Updated after D37975/rL313532 has landed, fixing the psubus issue

delena accepted this revision.Sep 18 2017, 9:26 AM
This revision is now accepted and ready to land.Sep 18 2017, 9:26 AM
This revision was automatically updated to reflect the committed changes.