This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fold for masked scatters to a uniform address
ClosedPublic

Authored by CarolineConcatto on Dec 14 2021, 5:49 AM.

Details

Summary

When masked scatter intrinsic does a uniform store to a destination
address from a source vector, and in this case, the mask is all one value.
This patch replaces the masked scatter with an extracted element of the
last lane of the source vector and stores it in the destination vector.

Diff Detail

Event Timeline

CarolineConcatto requested review of this revision.Dec 14 2021, 5:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 14 2021, 5:49 AM
sdesmalen added inline comments.Dec 14 2021, 6:29 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
378

I think you can distinguish two cases here:

  1. The pointer is a splat(), the value is not a splat, the mask is an all-active mask.
  2. The pointer is a splat(), the value is a splat(), the mask is not all-inactive (but not necessarily all active).

For 2, you can extract any lane instead of the last lane.

379

What is operand 1? Can you change Splat to something more descriptive? e.g. SplatPtr ?

381

nit: WideLoadTy ?

382

this must be cast<VectorType>

386

This is only valid if the the type is a ScalableVectorType

david-arm added inline comments.Dec 14 2021, 8:21 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
378

For 2, I don't think you can extract any lane - it has to be the last active lane otherwise it's not functionally correct since that's the only thing that matches the scalar equivalent? The trouble with extracting the last active lane is that the resulting code is potentially going to be worse than a scatter I think? I don't think there is a trivial way in LLVM IR to ask for the last active lane.

sdesmalen added inline comments.Dec 14 2021, 8:31 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
378

If all lanes contain the same value (the value is a splat()), and at least one lane is active, why couldn't you extract any lane?

david-arm added inline comments.Dec 14 2021, 8:33 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
378

Ah sorry, I misunderstood your original comment. I see what you mean now - if the values are all the same yeah you're right.

CarolineConcatto marked 6 inline comments as done.
  • Add fold for the value when it is a splat.
  • Add tests for fixed vector

Thank you @sdesmalen and @david-arm.
I believe both folds:

  1. The pointer is a splat(), the value is not a splat, the mask is an all-active mask.
  2. The pointer is a splat(), the value is a splat(), the mask is not all-inactive (but not necessarily all active). are now implemented.

I noticed that for option 2 (when the value is a splat) in the extract element the value is the same as the original value that was splatted.
So in case 2, the fold is to a scalar store of value in the destination pointer. I don't need to extract the element from any line.
So I only have:
StoreInst *S = new StoreInst(SplatValue, SplatPtr, false, Alignment);
I also added the same tests for fixed and scalable vector.

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
386

I believe this fold should also work for fixed vector.
So I've update it to work for fixed vector too.
And I added tests for it as well.

sdesmalen added inline comments.Dec 16 2021, 5:44 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
379

This case still needs a check to ensure that !ConstMask->isNullValue(), because at least one lane must be active for this to be valid. (see my comment on your test to explain why this is currently untested)

385

nit: Vector splat address with all-active mask -> scalar store of last lane.

386–388

These lines of comments can be dropped if you take the previous suggestion.

396

nit: remove comment, the line of code below says as much.

llvm/test/Transforms/InstCombine/masked_intrinsics.ll
285

This is still an all-active mask though (contrary to the comment on this test), so this is not testing the code you're aiming to test.

Also, fixed-width vectors can be written like this: <4 x i1> <i32 1, i32 1, i32 1, i32 0> (for a mask that is not all-active, but also not all-inactive, which I think is what you're after)

llvm/test/Transforms/InstCombine/vscale_masked_intrinsics.ll
16 ↗(On Diff #394832)

instead of passing an all true mask, you could simply insert 1 i1 1 element into a zeroinitializer

CarolineConcatto marked 4 inline comments as done.
  • Address comments in the test
  • Remove redundant comment in the function
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
379

As discussed internally this is done in line 371:

// If the mask is all zeros, a scatter does nothing.
if (ConstMask->isNullValue())
  return eraseInstFromFunction(II);
llvm/test/Transforms/InstCombine/masked_intrinsics.ll
285

I have changed all the fixed vector to be as you said.
<i32 X, i32 X, i32 X, i32 X>, where X can be 0 or 1 depending on what is being tested.

llvm/test/Transforms/InstCombine/vscale_masked_intrinsics.ll
16 ↗(On Diff #394832)

I have updated the all zero mask to be:
<vscale x 4 x i1> zeroinitializer,
I don't know if I follow what you want here.

sdesmalen added inline comments.Dec 16 2021, 1:28 PM
llvm/test/Transforms/InstCombine/vscale_masked_intrinsics.ll
16 ↗(On Diff #394832)

The comment says:

;; Value splat and  mask is not not all active/one

Yet in the test it uses an all-poison mask:

<vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i32 1), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer

this inserts i1 1 into element 1 (not zero) of a poison vector, so you get:

<poison, 1, poison, poison, ... >

and then does a splat of element 0, leading to

<poison, poison, poison, poison, ...>

If you want to pass a mask that is not all active *and* not all ones *and* not all poison, I think you want something like this:

<vscale x 4 x i1> insertelement (<vscale x 4 x i1> zeroinitializer, i1 true, i32 1)
<=>
<0, 1, 0, 0, ...>

-Changed the mask to be an input for vscale test when it is
not all active or inactive

CarolineConcatto marked an inline comment as done and an inline comment as not done.Dec 17 2021, 5:57 AM
CarolineConcatto added inline comments.
llvm/test/Transforms/InstCombine/vscale_masked_intrinsics.ll
16 ↗(On Diff #394832)

I have done something different, suggested by David.
Mask is an input, so we don't know if it is all active or inactive.

sdesmalen added inline comments.Jan 4 2022, 5:48 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
377–378

nit: // scatter(splat(value), splat(ptr), non-zero-mask) -> store value, ptr

381

nit: /*IsVolatile=*/ false

385

nit: // scatter(vector, splat(ptr), splat(true)) -> store extract(vector, lastlane), ptr

396

nit: /*IsVolatile=*/ false

llvm/test/Transforms/InstCombine/masked_intrinsics.ll
274

nit: The scatters in the tests below take a splatted pointer

276

can you stick with only 1 terminology? i.e. just 'all active'.

277

Like mentioned on D115726, I'd avoid using 'valid' and 'invalid', but instead do:

@scatter_v4i16_uniform_vals_uniform_ptrs_no_all_active_mask
304

This case isn't affected by your patch, so I would just remove it.

309

The mask is <0, 0, 1, 1> which is not all active.

Maybe place these under 'negative tests', since they cannot be optimised.

350–351

Just pass a <4 x i16*> as argument, it's a bit opaque that this is not a splat. (i.e. you're splatting element zero, which is not where %dst is inserted, but that's not an obvious pattern to spot)

llvm/test/Transforms/InstCombine/vscale_masked_intrinsics.ll
8 ↗(On Diff #395106)

I think the tests for fixed-width vectors are largely sufficient, only for the two positive tests/cases I think you should add a scalable-vector test.

CarolineConcatto marked 9 inline comments as done.
  • Move positive tests in vscale_masked_intrinsics.ll to masked_intrinsics.ll
  • Rename tests as suggested in D115726
  • Fix test scatter_nxv4i16_uniform_vals_uniform_ptrs_all_active_mask
CarolineConcatto marked an inline comment as done.
  • Remove sve attribute from masked_intrinsics.ll
sdesmalen added inline comments.Jan 6 2022, 2:50 AM
llvm/test/Transforms/InstCombine/masked_intrinsics.ll
274

nit: ; Test scatters that can be simplified to scalar stores.

292

this test seems is redundant, because it's covered by the test above.

307

nit: remove one of the newlines here

346

nit: can you add a comment here that these are negative tests?
(and change the name of the functions to @negative_scatter_...

CarolineConcatto marked 3 inline comments as done.
  • Improve comments in the test file
llvm/test/Transforms/InstCombine/masked_intrinsics.ll
292

Yes, it is, but this one is with vscale. It is the same on gather.
This is a positive test, and not negative!

This revision is now accepted and ready to land.Jan 7 2022, 6:36 AM
Matt added a subscriber: Matt.Jan 13 2022, 10:49 AM
This revision was landed with ongoing or failed builds.Jan 14 2022, 1:56 AM
This revision was automatically updated to reflect the committed changes.