This is an archive of the discontinued LLVM Phabricator instance.

[ScalarizeMaskedMemIntrin] Scalarize constant mask expandload as shuffle(build_vector,pass_through)
ClosedPublic

Authored by RKSimon on Aug 6 2020, 3:04 AM.

Details

Summary

As noticed on D66004, scalarizing an expandload with a constant mask into a chain of irregular loads+inserts is hard to optimize before lowering; for example, it makes merging consecutive loads difficult.

This patch instead scalarizes the expansion to a build_vector(load0, load1, undef, load2, ...) style pattern and then performs a blend shuffle with the pass through vector. This lets us more easily use the existing build_vector combines, merging of consecutive loads, etc.
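The pattern described above can be modeled outside LLVM. The following is only a sketch of the semantics, not the patch's code: `Lane`, `Expanded`, `scalarizeExpandLoad`, and `blend` are hypothetical names, `std::nullopt` stands in for an undef lane, and the shuffle-mask convention (index i picks the build vector, index i + width picks the pass through vector) is an assumption modeled on how two-operand shuffles are usually indexed.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model, not LLVM API: a lane holds a loaded value or is undef.
using Lane = std::optional<int>;

struct Expanded {
  std::vector<Lane> buildVector; // loads in enabled lanes, undef elsewhere
  std::vector<int> shuffleMask;  // i -> buildVector[i], i + width -> passThru[i]
};

// Enabled lanes read consecutive memory elements (expandload semantics);
// disabled lanes stay undef and are redirected to the pass through vector.
Expanded scalarizeExpandLoad(const int *mem, const std::vector<bool> &mask) {
  const std::size_t width = mask.size();
  Expanded out;
  out.buildVector.resize(width);
  out.shuffleMask.resize(width);
  std::size_t memIndex = 0; // advances only for enabled lanes
  for (std::size_t i = 0; i < width; ++i) {
    if (mask[i]) {
      out.buildVector[i] = mem[memIndex++];
      out.shuffleMask[i] = static_cast<int>(i);
    } else {
      out.buildVector[i] = std::nullopt;
      out.shuffleMask[i] = static_cast<int>(i + width);
    }
  }
  return out;
}

// Blend shuffle: each result lane comes from the build vector or passThru.
std::vector<int> blend(const Expanded &e, const std::vector<int> &passThru) {
  const std::size_t width = passThru.size();
  assert(e.shuffleMask.size() == width && "vector widths must match");
  std::vector<int> result(width);
  for (std::size_t i = 0; i < width; ++i) {
    const std::size_t idx = static_cast<std::size_t>(e.shuffleMask[i]);
    result[i] = idx < width ? *e.buildVector[idx] : passThru[idx - width];
  }
  return result;
}
```

With mask {1, 1, 0, 1} and memory {10, 20, 30}, the build vector in this model is (10, 20, undef, 30) and the blend takes lane 2 from the pass through vector. Because the loads sit in a single build_vector rather than an insert chain, adjacent enabled lanes are visibly consecutive memory elements.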

Event Timeline

RKSimon created this revision. Aug 6 2020, 3:04 AM
Herald added a project: Restricted Project. Aug 6 2020, 3:04 AM
Herald added a subscriber: hiraditya.
RKSimon requested review of this revision. Aug 6 2020, 3:04 AM
spatel added inline comments. Aug 6 2020, 7:18 AM
llvm/lib/CodeGen/ScalarizeMaskedMemIntrin.cpp
166

Would it make sense to do the same thing here? I.e., make a shared helper function.
Later matching might always catch this case, but wouldn't it still be an efficiency win to produce the blend shuffle here too?

630

-1 -> "UndefMaskElem"

RKSimon added inline comments. Aug 6 2020, 8:01 AM
llvm/lib/CodeGen/ScalarizeMaskedMemIntrin.cpp
166

Yes, I'm happy to do that as a followup if everyone agrees - I'd prefer to just get this one in first as it's blocking D66004 and I'm keen to get that patch done (finally!).

I've also been wondering how compress stores with constant masks should be expressed.

spatel accepted this revision. Aug 7 2020, 8:14 AM

I'm ok with this as-is (modulo the "-1" code nit), followed by cleanup, so LGTM.
Wait a bit to see if anyone else wants to comment though.

This revision is now accepted and ready to land. Aug 7 2020, 8:14 AM
This revision was landed with ongoing or failed builds. Aug 10 2020, 3:06 AM
This revision was automatically updated to reflect the committed changes.