This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Improve combineLogicBlendIntoPBLENDV to use general masks.
ClosedPublic

Authored by RKSimon on May 7 2017, 11:41 AM.

Details

Summary

Currently combineLogicBlendIntoPBLENDV can only match ASHR to detect sign splatting of a bit mask. This patch generalises it to use computeNumSignBits instead.

This is a first step in several things we can do to improve PBLENDV support:

  • Better matching of X86ISD::ANDNP patterns.
  • Handle floating point cases.
  • Better vector and bitcast support in computeNumSignBits.
  • Recognise that PBLENDV only uses the sign bit of the mask, so we should be able to strip away sign splats (ASHR, PCMPGT isNeg tests etc.).
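
As a rough illustration of why a sign-bit count is enough to justify the fold, the sketch below models one 32-bit element in plain C++. It is not code from the patch: numSignBits, logicBlend and signSelect are hypothetical helper names, and the real computeNumSignBits works on SelectionDAG nodes rather than concrete values. The point is only that when every bit of the mask equals its sign bit, the logic blend (Mask & X) | (~Mask & Y) and a sign-bit select agree, and that they can disagree otherwise.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar stand-in for a per-element sign-bit count: the number
// of leading bits equal to the sign bit, including the sign bit itself.
inline unsigned numSignBits(int32_t V) {
  uint32_t U = static_cast<uint32_t>(V);
  uint32_t Sign = U >> 31;
  unsigned N = 1;
  for (int Bit = 30; Bit >= 0 && ((U >> Bit) & 1u) == Sign; --Bit)
    ++N;
  return N;
}

// The OR(AND(Mask, X), AND(NOT(Mask), Y)) logic blend on one element.
inline int32_t logicBlend(int32_t Mask, int32_t X, int32_t Y) {
  return (Mask & X) | (~Mask & Y);
}

// A select driven only by the sign bit of the mask, modelled here at
// element granularity.
inline int32_t signSelect(int32_t Mask, int32_t X, int32_t Y) {
  return Mask < 0 ? X : Y;
}

// The fold is only safe when the mask is a full sign splat.
inline bool isSignSplat(int32_t Mask) { return numSignBits(Mask) == 32; }
```

A mask of -1 or 0 passes isSignSplat and both functions return the same value; a mask with only the top bit set has a single sign bit, and the two results differ.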

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision. May 7 2017, 11:41 AM
delena added inline comments. May 7 2017, 11:42 PM
lib/Target/X86/X86ISelLowering.cpp
31566 ↗(On Diff #98104)

Why should VT be v2i64 or v4i64? Isn't the same transformation profitable for other integer types?

RKSimon added inline comments. May 8 2017, 4:05 AM
lib/Target/X86/X86ISelLowering.cpp
31566 ↗(On Diff #98104)

It's due to us promoting to v2i64/v4i64 for X86ISD::ANDNP - it's not actually necessary but seems to have been used as an early out.

I'll remove it - it won't make any difference to the codegen now but will make it easier to perform the more thorough ANDNP matching mentioned in the TODO.

RKSimon updated this revision to Diff 98152. May 8 2017, 4:24 AM

Addressed Elena's comments.

delena added inline comments. May 8 2017, 4:51 AM
test/CodeGen/X86/pr32907.ll
26 ↗(On Diff #98152)

I assume that ComputeNumSignBits() does not recognize the ASHR sequence here, otherwise it would be able to generate VPBLEND, right?

RKSimon added inline comments. May 8 2017, 5:32 AM
test/CodeGen/X86/pr32907.ll
26 ↗(On Diff #98152)

Exactly - this is an example of one of the TODOs - the ANDNP is being generated at the same time that the ASHR_v2i64 is being lowered into the PSRAD+PSHUFD. If we could recognise the AND(XOR(-1,M), X) pattern earlier it would combine.

Note it wouldn't generate a VPBLEND, it would generate the SUB(XOR(X, M), M) pattern similar to the AVX512 codegen.
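
For readers unfamiliar with the SUB(XOR(X, M), M) form, the following sketch checks the identity on a single 64-bit element. It assumes M is either all ones or zero (a full sign splat); that precondition and the helper names xorSub and blendNegate are illustrative assumptions, not taken from the patch. Under that assumption the XOR/SUB form yields the same result as a logic blend between -X and X.

```cpp
#include <cassert>
#include <cstdint>

// All-ones mask value, used to model a fully splatted sign bit.
constexpr uint64_t AllOnes = ~uint64_t{0};

// SUB(XOR(X, M), M): with M == all ones this is (~X + 1) == -X,
// with M == 0 it is X unchanged. Unsigned arithmetic avoids overflow UB.
inline int64_t xorSub(int64_t X, uint64_t M) {
  uint64_t U = static_cast<uint64_t>(X);
  return static_cast<int64_t>((U ^ M) - M);
}

// The equivalent logic blend: pick -X where M is set, X elsewhere.
inline int64_t blendNegate(int64_t X, uint64_t M) {
  uint64_t U = static_cast<uint64_t>(X);
  uint64_t Neg = uint64_t{0} - U;
  return static_cast<int64_t>((M & Neg) | (~M & U));
}
```

The two helpers only agree for masks that are all ones or zero; for a partially set mask the XOR/SUB form is not a blend at all, which is why the sign-splat precondition matters.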

delena accepted this revision. May 8 2017, 6:29 AM
This revision is now accepted and ready to land. May 8 2017, 6:29 AM
spatel added inline comments. May 8 2017, 7:11 AM
test/CodeGen/X86/vselect-pcmp.ll
144 ↗(On Diff #98152)

You can remove those snarky 16-bit comments. 😃

154–156 ↗(On Diff #98152)

Looks like we have to go to extremes to get the AVX1 case, although this may improve with a patch that I'm working on for PR32790:
https://bugs.llvm.org/show_bug.cgi?id=32790

This revision was automatically updated to reflect the committed changes.