When masked scatter intrinsic does a uniform store to a destination
address from a source vector, and in this case, the mask is all one value.
This patch replaces the masked scatter with an extracted element of the
last lane of the source vector and stores it in the destination vector.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
| llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
|---|---|---|
| 378 | I think you can distinguish two cases here: 
 For 2, you can extract any lane instead of the last lane. | |
| 379 | What is operand 1? Can you change Splat to something more descriptive? e.g. SplatPtr ? | |
| 381 | nit: WideLoadTy ? | |
| 382 | this must be cast<VectorType> | |
| 386 | This is only valid if the the type is a ScalableVectorType | |
| llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
|---|---|---|
| 378 | For 2, I don't think you can extract any lane - it has to be the last active lane otherwise it's not functionally correct since that's the only thing that matches the scalar equivalent? The trouble with extracting the last active lane is that the resulting code is potentially going to be worse than a scatter I think? I don't think there is a trivial way in LLVM IR to ask for the last active lane. | |
| llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
|---|---|---|
| 378 | If all lanes contain the same value (the value is a splat()), and at least one lane is active, why couldn't you extract any lane? | |
| llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
|---|---|---|
| 378 | Ah sorry, I misunderstood your original comment. I see what you mean now - if the values are all the same yeah you're right. | |
Thank you @sdesmalen and @david-arm.
I believe both folds:
- The pointer is a splat(), the value is not a splat, the mask is an all-active mask.
- The pointer is a splat(), the value is a splat(), the mask is not all-inactive (but not necessarily all active). are now implemented.
I noticed that for option 2 (when the value is a splat) in the extract element the value is the same as the original value that was splatted.
So in case 2, the fold is to a scalar store of value in the destination pointer.  I don't need to extract the element from any line.
So I only have:
StoreInst *S = new StoreInst(SplatValue, SplatPtr, false, Alignment);
I also added the same tests for fixed  and scalable vector.
| llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
|---|---|---|
| 386 | I believe this fold should also work for fixed vector.   | |
| llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
|---|---|---|
| 379 | This case still needs a check to ensure that !ConstMask->isNullValue(), because at least one lane must be active for this to be valid. (see my comment on your test to explain why this is currently untested) | |
| 385 | nit: Vector splat address with all-active mask -> scalar store of last lane. | |
| 386–388 | These lines of comments can be dropped if you take the previous suggestion. | |
| 396 | nit: remove comment, the line of code below says as much. | |
| llvm/test/Transforms/InstCombine/masked_intrinsics.ll | ||
| 285 | This is still an all-active mask though (contrary to the comment on this test), so this is not testing the code you're aiming to test. Also, fixed-width vectors can be written like this: <4 x i1> <i32 1, i32 1, i32 1, i32 0> (for a mask that is not all-active, but also not all-inactive, which I think is what you're after) | |
| llvm/test/Transforms/InstCombine/vscale_masked_intrinsics.ll | ||
| 16 ↗ | (On Diff #394832) | instead of passing an all true mask, you could simply insert 1 i1 1 element into a zeroinitializer | 
- Address comments in the test
- Remove redundant comment in the function
| llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
|---|---|---|
| 379 | As discussed internally this is done in line 371: // If the mask is all zeros, a scatter does nothing. if (ConstMask->isNullValue()) return eraseInstFromFunction(II); | |
| llvm/test/Transforms/InstCombine/masked_intrinsics.ll | ||
| 285 | I have changed all the fixed vector to be as you said. | |
| llvm/test/Transforms/InstCombine/vscale_masked_intrinsics.ll | ||
| 16 ↗ | (On Diff #394832) | I have updated the all zero mask to be: | 
| llvm/test/Transforms/InstCombine/vscale_masked_intrinsics.ll | ||
|---|---|---|
| 16 ↗ | (On Diff #394832) | The comment says: ;; Value splat and mask is not not all active/one Yet in the test it uses an all-poison mask: <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i32 1), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer this inserts i1 1 into element 1 (not zero) of a poison vector, so you get: <poison, 1, poison, poison, ... > and then does a splat of element 0, leading to <poison, poison, poison, poison, ...> If you want to pass a mask that is not all active *and* not all ones *and* not all poison, I think you want something like this: <vscale x 4 x i1> insertelement (<vscale x 4 x i1> zeroinitializer, i1 true, i32 1) <=> <0, 1, 0, 0, ...> | 
-Changed the mask to be an input for vscale test when it is 
not all active or inactive
| llvm/test/Transforms/InstCombine/vscale_masked_intrinsics.ll | ||
|---|---|---|
| 16 ↗ | (On Diff #394832) | I have done something different, suggested by David.   | 
| llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
|---|---|---|
| 377–378 | nit: // scatter(splat(value), splat(ptr), non-zero-mask) -> store value, ptr | |
| 381 | nit: /*IsVolatile=*/ false | |
| 385 | nit: // scatter(vector, splat(ptr), splat(true)) -> store extract(vector, lastlane), ptr | |
| 396 | nit: /*IsVolatile=*/ false | |
| llvm/test/Transforms/InstCombine/masked_intrinsics.ll | ||
| 274 | nit: The scatters in the tests below take a splatted pointer | |
| 276 | can you stick with only 1 terminology? i.e. just 'all active'. | |
| 277 | Like mentioned on D115726, I'd avoid using 'valid' and 'invalid', but instead do: @scatter_v4i16_uniform_vals_uniform_ptrs_no_all_active_mask | |
| 304 | This case isn't affected by your patch, so I would just remove it. | |
| 309 | The mask is <0, 0, 1, 1> which is not all active. Maybe place these under 'negative tests', since they cannot be optimised. | |
| 350–351 | Just pass a <4 x i16*> as argument, it's a bit opaque that this is not a splat. (i.e. you're splatting element zero, which is not where %dst is inserted, but that's not an obvious pattern to spot) | |
| llvm/test/Transforms/InstCombine/vscale_masked_intrinsics.ll | ||
| 8 ↗ | (On Diff #395106) | I think the tests for fixed-width vectors are largely sufficient, only for the two positive tests/cases I think you should add a scalable-vector test. | 
- Move positive tests in vscale_masked_intrinsics.ll to masked_intrinsics.ll
- Rename tests as suggested in D115726
- Fix test scatter_nxv4i16_uniform_vals_uniform_ptrs_all_active_mask
| llvm/test/Transforms/InstCombine/masked_intrinsics.ll | ||
|---|---|---|
| 274 | nit: ; Test scatters that can be simplified to scalar stores. | |
| 292 | this test seems is redundant, because it's covered by the test above. | |
| 307 | nit: remove one of the newlines here | |
| 346 | nit: can you add a comment here that these are negative tests? | |
- Improve comments in the test file
| llvm/test/Transforms/InstCombine/masked_intrinsics.ll | ||
|---|---|---|
| 292 | Yes, it is, but this one is with vscale. It is the same on gather. | |
I think you can distinguish two cases here:
For 2, you can extract any lane instead of the last lane.