This is an archive of the discontinued LLVM Phabricator instance.

[x86] convert masked store of one element to scalar store
ClosedPublic

Authored by spatel on Feb 2 2016, 3:28 PM.

Details

Summary

Another opportunity to reduce masked stores: in D16691, we decided not to attempt the 'one mask element is set' transform in InstCombine, but I think this should be a win for any AVX machine.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 46713.Feb 2 2016, 3:28 PM
spatel retitled this revision from to [x86] convert masked store of one element to scalar store.
spatel updated this object.
spatel added reviewers: delena, igorb, RKSimon.
spatel added a subscriber: llvm-commits.
RKSimon edited edge metadata.Feb 3 2016, 2:02 AM

Please can you add tests for i64 or f64 vectors as well? Also cases where the element is not from the lower 128-bits of a ymm/zmm.

Some minor comments. I wonder if this could be moved to DAGCombiner - possibly with a test for scalar store legality and TLI.isExtractVectorElementCheap()?

lib/Target/X86/X86ISelLowering.cpp
26727 ↗(On Diff #46713)

This can probably be brought inside reduceMaskedStoreToScalarStore as a helper predicate

26730 ↗(On Diff #46713)

Don't we have cases where the build vector input operands are implicitly truncated to i1? For instance vector constant folding is likely to have created legal types that are larger.

Some minor comments. I wonder if this could be moved to DAGCombiner - possibly with a test for scalar store legality and TLI.isExtractVectorElementCheap()?

I know that on AVX, VMASKMOV does not imply the non-temporal hint. However, what about targets that implement masked stores as non-temporal stores of selected bytes? For those targets, not preserving the non-temporal hint when converting the mask store into a scalar store would affect the hardware write combining logic.
That said, my knowledge is only limited to x86, so I cannot say that those targets exist.. What I am trying to say is that we probably have to be careful if we decide to move this transformation into the DAGCombiner as different targets may expand those nodes in a slightly different way (not sure if this makes sense..).
Basically, I recommend leaving it for now as a target specific combine (just my opinion).

spatel added a comment.Feb 3 2016, 9:35 AM

Some minor comments. I wonder if this could be moved to DAGCombiner - possibly with a test for scalar store legality and TLI.isExtractVectorElementCheap()?

I know that on AVX, VMASKMOV does not imply the non-temporal hint. However, what about targets that implement masked stores as non-temporal stores of selected bytes? For those targets, not preserving the non-temporal hint when converting the mask store into a scalar store would affect the hardware write combining logic.

I have preserved the non-temporal hint of the original masked store in the scalar store with this patch, but...

That said, my knowledge is only limited to x86, so I cannot say that those targets exist.. What I am trying to say is that we probably have to be careful if we decide to move this transformation into the DAGCombiner as different targets may expand those nodes in a slightly different way (not sure if this makes sense..).
Basically, I recommend leaving it for now as a target specific combine (just my opinion).

That was my thinking too. Let's make sure this is sane across x86 first. There's a lot of potential variation in how a masked op is implemented. I'll add a 'TODO' comment to remind us that this could be lifted to DAGCombiner once we have more confidence.

spatel added inline comments.Feb 3 2016, 12:17 PM
lib/Target/X86/X86ISelLowering.cpp
26727 ↗(On Diff #46713)

Yes - good idea. My lambda skills are non-existent.
Please double-check in the updated patch.

26730 ↗(On Diff #46713)

I'm not sure about the interaction between legalization and the IR intrinsic. I was expecting that we always get here with the IR-defined form. How about I add another 'TODO' for now? I think supporting different mask types may require careful logic because ISD::MSTORE doesn't actually define the mask format AFAICT.

spatel updated this revision to Diff 46812.Feb 3 2016, 12:22 PM
spatel edited edge metadata.

Patch updated:

  1. Converted getOneTrueElt() into a lambda.
  2. Added TODO comment about hoisting this to DAGCombiner.
  3. Added TODO comment about supporting other mask types (but if we do, what is the format for those masks?)
  4. Added test cases to demonstrate extraction from different types, high elements, and different sizes of vectors.

Patch updated:

  1. Converted getOneTrueElt() into a lambda.

On 2nd thought, we're going to need that exact function for the sibling patch for masked loads. I suppose it's good form to keep it this way for now, but I think it's going to get pulled back out soon enough.

RKSimon accepted this revision.Feb 8 2016, 9:40 AM
RKSimon edited edge metadata.

LGTM - feel free to pull getOneTrueElt back out again before committing if you wish.

This revision is now accepted and ready to land.Feb 8 2016, 9:40 AM
This revision was automatically updated to reflect the committed changes.