This is an archive of the discontinued LLVM Phabricator instance.

[ScalarizeMaskedMemIntrin] Scalarize constant mask load as shuffle(build_vector,pass_through)
Changes PlannedPublic

Authored by RKSimon on Sep 2 2020, 3:47 AM.

Details

Summary

As noticed on D66004, scalarizing a load with a constant mask into a chain of irregular loads+inserts makes the pattern difficult to optimize before lowering, e.g. preventing consecutive loads from being merged.

This patch instead scalarizes the expansion to a build_vector(load0, load1, undef, load2, ...) style pattern and then performs a blend shuffle with the pass-through vector. This allows us to more easily make use of all the existing build_vector combines, merging of consecutive loads, etc.
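As a rough illustration of the two-step expansion described above, here is a minimal Python sketch (not LLVM code; all names are hypothetical) that models the constant mask, the build_vector of scalar loads with undef in the masked-off lanes, and the final blend with the pass-through vector:

```python
# Hypothetical model of scalarizing a constant-mask masked load as
# build_vector(loads, undef) followed by a blend with the pass-through.
UNDEF = object()  # stand-in for an undef vector element

def scalarize_masked_load(memory, base, mask, pass_through):
    """mask is a compile-time-constant list of booleans, one per lane."""
    # Step 1: build_vector of scalar loads; masked-off lanes stay undef.
    built = [memory[base + i] if m else UNDEF for i, m in enumerate(mask)]
    # Step 2: blend shuffle with the pass-through vector; active lanes
    # keep the loaded value, inactive lanes take the pass-through element.
    return [b if m else p for b, m, p in zip(built, mask, pass_through)]

mem = {100: 10, 101: 11, 102: 12, 103: 13}
result = scalarize_masked_load(mem, 100, [True, False, True, True], [0, 0, 0, 0])
print(result)  # → [10, 0, 12, 13]
```

Because the loads in step 1 form a regular build_vector pattern, later combines can e.g. merge the consecutive loads at lanes 2 and 3, which the previous irregular load+insert chain made hard to recognize.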

There are a couple of regressions that I'm still looking at, where we could better combine an element insertion with the final blend, and also a few places where shuffle combining forgets which elements are already zero.

Followup to D85416

Diff Detail

Event Timeline

RKSimon created this revision. Sep 2 2020, 3:47 AM
Herald added a project: Restricted Project. · View Herald Transcript Sep 2 2020, 3:47 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
RKSimon requested review of this revision. Sep 2 2020, 3:47 AM
craig.topper added inline comments. Sep 2 2020, 5:15 PM
llvm/lib/CodeGen/ScalarizeMaskedMemIntrin.cpp
175

Is explicitly inserting the undefs required for optimal build vector creation? I figured just inserting all the elements into an initial undef vector would have been enough.

RKSimon added inline comments. Sep 3 2020, 1:35 AM
llvm/lib/CodeGen/ScalarizeMaskedMemIntrin.cpp
175

Unfortunately not - the build vector builder code isn't very clever, tbh.

RKSimon planned changes to this revision. Sep 3 2020, 1:35 AM