This is an archive of the discontinued LLVM Phabricator instance.

[x86, AVX] optimize masked loads with constant masks
ClosedPublic

Authored by spatel on Mar 4 2016, 3:56 PM.

Details

Summary

Instead of a variable-blend instruction, form a blend with immediate because those are always cheaper.

Note the FIXME for AVX512: I saw that masked loads were followed by blends after this change but there is currently no blend at all. I assume that's because the blend is part of the AVX512 masked load itself? I don't know enough about AVX512 to be sure how to solve it, so I've just enabled this for AVX1/AVX2 for now.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 49859.Mar 4 2016, 3:56 PM
spatel retitled this revision from to [x86, AVX] optimize masked loads with constant masks.
spatel updated this object.
spatel added reviewers: RKSimon, delena, ab.
spatel added a subscriber: llvm-commits.
RKSimon added inline comments.Mar 8 2016, 3:29 AM
lib/Target/X86/X86ISelLowering.cpp
27256 ↗(On Diff #49859)

Why is it important that we don't blend with an undefined pass-through?

mcrosier removed a subscriber: mcrosier.Mar 8 2016, 6:52 AM
spatel added inline comments.Mar 8 2016, 7:23 AM
lib/Target/X86/X86ISelLowering.cpp
27256 ↗(On Diff #49859)

Ah, I'll make that explicit in the comment: this transform is *creating* a masked load with an undef pass-through. If we don't guard against that case, we'll infinite loop.

spatel updated this revision to Diff 50043.Mar 8 2016, 7:59 AM

Patch updated (just code comments, nothing functional):

  1. Changed documentation comment to make the benefit of this transform clearer.
  2. Added comment to explain why we filter out the undef pass-through case.
  3. Changed AVX512 comment from FIXME to TODO. It's not broken; it just could be better based on what I see in the regression tests. And that should be a separate patch.
RKSimon accepted this revision.Mar 9 2016, 11:01 AM
RKSimon edited edge metadata.

LGTM

Would it be possible to add some additional tests with the undef pass through? The ones I can see are already special cases (i.e. an all true mask), for non-legal types or for types that don't naturally have a masked load op.

This revision is now accepted and ready to land.Mar 9 2016, 11:01 AM

LGTM

Would it be possible to add some additional tests with the undef pass through? The ones I can see are already special cases (i.e. an all true mask), for non-legal types or for types that don't naturally have a masked load op.

Sure - I'll add something before committing. Thanks!

This revision was automatically updated to reflect the committed changes.