This is an archive of the discontinued LLVM Phabricator instance.

[ScalarizeMaskedMemIntrin] Bitcast the mask to the scalar domain and use scalar bit tests for the branches.
ClosedPublic

Authored by craig.topper on Jul 25 2019, 11:52 PM.

Details

Summary

X86 at least is able to use movmsk or kmov to move the mask to the scalar
domain. Then we can just use test instructions to test individual bits.

This is more efficient than extracting each mask element
individually.
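
As a rough illustration of the idea (a sketch only, not the code from this patch), the difference can be modelled in plain C++. The names `packMask` and `maskedLoadScalarized` and the lane count `kLanes` are hypothetical, chosen for this example. The mask is packed into one integer up front, standing in for the bitcast of <N x i1> to iN, and each lane is then guarded by a single bit test instead of a separate element extract.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical lane count for the illustration.
constexpr std::size_t kLanes = 4;

// Stand-in for bitcasting <4 x i1> to i4: lane i of the mask becomes bit i.
// On X86 this step would correspond to a single movmsk or kmov.
unsigned packMask(const std::array<bool, kLanes> &mask) {
  unsigned bits = 0;
  for (std::size_t i = 0; i < kLanes; ++i)
    bits |= static_cast<unsigned>(mask[i]) << i;
  return bits;
}

// Scalarized masked load: one bit test and one conditional scalar load per
// lane. Lanes whose mask bit is clear keep the pass-through value.
std::array<int, kLanes> maskedLoadScalarized(
    const int *ptr, unsigned maskBits,
    const std::array<int, kLanes> &passthru) {
  assert(ptr != nullptr);
  std::array<int, kLanes> result = passthru;
  for (std::size_t i = 0; i < kLanes; ++i) {
    if (maskBits & (1u << i)) // scalar bit test guarding the branch
      result[i] = ptr[i];
  }
  return result;
}
```

In this model the mask crosses into the scalar domain once, and every later check is an ordinary integer test on the same value.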

I special cased v1i1 to use the previous behavior. This avoids
poor type legalization of bitcast of v1i1 to i1.

I've skipped expandload/compressstore as I think we need to
handle constant masks for those better first.

Many tests end up with duplicate test instructions due to tail
duplication in the branch folding pass. But the same thing
happens when constructing similar code in C. So it's not unique
to the scalarization.

Not sure if this lowering code will also be good for other targets,
but we're only testing X86 today.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision. Jul 25 2019, 11:52 PM
Herald added a project: Restricted Project. Jul 25 2019, 11:52 PM
Herald added a subscriber: hiraditya.

I think this covers all of https://reviews.llvm.org/D65319 ?

Wrong link? That’s just a link to this review

I don't like Mondays - https://bugs.llvm.org/show_bug.cgi?id=39665 - which goes beyond masked memory ops so probably isn't covered by this patch alone.

RKSimon accepted this revision. Jul 31 2019, 3:05 PM

LGTM - I think this is the way forward for the scalarization of a lot of masked ops and I've already raised some similar bugs for arm/aarch64/ppc (PR41634, PR41635, PR41636) to better handle bitcasts of <X x i1> to iX for reductions - this patch should be able to make use of those upcoming improvements.

Please can you raise bugs for the expandload/compressstore improvements you mentioned and the duplication of test+branch instructions.

This revision is now accepted and ready to land. Jul 31 2019, 3:05 PM
This revision was automatically updated to reflect the committed changes.