This is an archive of the discontinued LLVM Phabricator instance.

[X86] Improve `matchBinaryShuffle()`'s `BLEND` lowering with per-element all-zero/all-ones knowledge
ClosedPublic

Authored by lebedev.ri on Sep 13 2021, 3:09 PM.

Details

Summary

We can use OR instead of BLEND if either the element we are not picking is zero (or masked away);
or the element we are picking overwhelms (e.g. it's all-ones) whatever the element we are not picking:
https://alive2.llvm.org/ce/z/RKejao

Diff Detail

Event Timeline

lebedev.ri created this revision.Sep 13 2021, 3:09 PM

Why couldn't this be put in more generic ISD::OR combines somewhere? Possibly TargetLowering::SimplifyDemandedVectorElts?

Why couldn't this be put in more generic ISD::OR combines somewhere? Possibly TargetLowering::SimplifyDemandedVectorElts?

What do you mean by 'it'? blend->or lowering?

Why couldn't this be put in more generic ISD::OR combines somewhere? Possibly TargetLowering::SimplifyDemandedVectorElts?

What do you mean by 'it'? blend->or lowering?

The insert tests look like we're just failing to recognize that we're OR'ing all-ones elements, which should allow the SimplifyDemanded call to realise that we're going to completely overwrite those elements, so we don't need to mask them. Its most of the logic you've written but it feels like it should be somewhere more generic than inside x86 shuffle combining.

Why couldn't this be put in more generic ISD::OR combines somewhere? Possibly TargetLowering::SimplifyDemandedVectorElts?

What do you mean by 'it'? blend->or lowering?

The insert tests look like we're just failing to recognize that we're OR'ing all-ones elements, which should allow the SimplifyDemanded call to realise that we're going to completely overwrite those elements, so we don't need to mask them. Its most of the logic you've written but it feels like it should be somewhere more generic than inside x86 shuffle combining.

https://reviews.llvm.org/D109927 - doesn't seem to work.

RKSimon accepted this revision.Sep 17 2021, 5:48 AM

Why couldn't this be put in more generic ISD::OR combines somewhere? Possibly TargetLowering::SimplifyDemandedVectorElts?

What do you mean by 'it'? blend->or lowering?

The insert tests look like we're just failing to recognize that we're OR'ing all-ones elements, which should allow the SimplifyDemanded call to realise that we're going to completely overwrite those elements, so we don't need to mask them. Its most of the logic you've written but it feels like it should be somewhere more generic than inside x86 shuffle combining.

https://reviews.llvm.org/D109927 - doesn't seem to work.

That's a shame - D109927 seems to be useful, but not useful enough for our regression :(

I was kind of hoping we could avoid this - adding so much more shuffle code for so few regressions - but it seems to be the most straightforward fix atm, and will unblock the patch dependency tree you currently have - please can you add a TODO comment suggesting we review this in the future?

The only other possibility might be to perform OR/AND more aggressively for INSERT_VECTOR_ELT lowering for -1/0 elements?

llvm/lib/Target/X86/X86ISelLowering.cpp
36240

APInt Mask = APInt::getOneBitSet(NumV1Elts, V1EltIdx) ?

This revision is now accepted and ready to land.Sep 17 2021, 5:48 AM
lebedev.ri edited the summary of this revision. (Show Details)

@RKSimon thank you for taking a look!
Rebased, refactored.

I do think it would be nice to handle this in general demandedelts,
but i'm currently just not seeing how we can do that.

RKSimon accepted this revision.Sep 17 2021, 9:07 AM
RKSimon added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
36235

break the if-else chain?

This revision was landed with ongoing or failed builds.Sep 17 2021, 9:15 AM
This revision was automatically updated to reflect the committed changes.