This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] simplify masked load intrinsics with constant masks
ClosedPublic

Authored by spatel on Jan 28 2016, 12:45 PM.

Details

Summary

A masked load with a zero mask means there's no load.
A masked load with an allOnes mask means it's a normal vector load.

I think something similar may be happening in CodeGenPrepare with D13855, but it doesn't trigger for a target that actually supports these ops (an x86 AVX target for example). We may be able to remove some of that logic. Doing these transforms in InstCombine is a better solution because it will trigger sooner and allow more optimizations from other passes.

Eventually, I think we should be able to replace the x86 intrinsics with the llvm IR intrinsics.

Diff Detail

Event Timeline

spatel updated this revision to Diff 46305.Jan 28 2016, 12:45 PM
spatel retitled this revision from to [InstCombine] simplify masked load intrinsics with constant masks.
spatel updated this object.
spatel added reviewers: delena, igorb, RKSimon.
spatel added a subscriber: llvm-commits.
RKSimon edited edge metadata.Jan 29 2016, 4:14 AM

Some minor thoughts - I'd like to see what others have to say as well though.

lib/Transforms/InstCombine/InstCombineCalls.cpp
690

Minor: You can relax the type requirements by using Constant::isNullValue() and Constant::isAllOnesValue()

But this is true for a lot of the code in InstCombineCalls.cpp .....

708

I'm really not sure whether we should try to do this here or leaving it until lowering. Its making a big assumption that the target is good at scalar loads + insertions.

spatel added inline comments.Jan 29 2016, 1:48 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
690

Yes, that makes the code simpler.

I think we still want all of the asserts for sanity checking the intrinsic signature, but let me know if I'm missing anything.

708

I wasn't sure about that when I wrote it, so I threw it in to see what the consensus might be.

But I agree with you now. Although I would only expect this intrinsic to be formed from C a intrinsic or the vectorizers, and therefore, implying that the masked op is supported by the target, we shouldn't assume that also means that other vector ops are supported equally. It could also be wrong if someone's trying to minimize size (-Oz). Let's handle other constant values in the DAG.

spatel updated this revision to Diff 46427.Jan 29 2016, 1:51 PM
spatel edited edge metadata.

Patch updated based on feedback from Simon:

  1. Cast the mask to plain 'Constant' for simpler predicates.
  2. Remove the 'TODO' and associated test cases for constant masks with a single set bit.
delena added inline comments.Jan 30 2016, 11:38 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
678

Verifier checks intrinsic signature. You don't need these checks at all.

843

Not sure that you can handle masked gather/scatter this way.

spatel updated this revision to Diff 46493.Jan 31 2016, 8:45 AM

Patch updated: remove assertions because these are or should be handled by the IR verifier.

spatel added inline comments.Jan 31 2016, 8:51 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
677–712

Thanks. I think the verifier has a hole - it doesn't confirm that the alignment operand is constant. But I've removed the assertions here because it does confirm the other things.

843

Please correct me if I've misunderstood, but I think scatter and gather each have one degenerate folding opportunity that we can handle here:

  1. A scatter with zero mask is a nop.
  2. A gather with zero mask will return the passthru arg.
delena accepted this revision.Feb 1 2016, 1:56 AM
delena edited edge metadata.

LGTM

lib/Transforms/InstCombine/InstCombineCalls.cpp
843

yes, for zero mask it can be optimized.

This revision is now accepted and ready to land.Feb 1 2016, 1:56 AM
This revision was automatically updated to reflect the committed changes.