This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][LoopVectorize] Add truncated store values to list of types for widening
Needs ReviewPublic

Authored by Rin on Aug 24 2023, 4:16 AM.

Details

Summary

When dealing with truncated stores, the Min/Max width would not consider source values of truncated stores. Instead it would only look at the type value after performing the truncate instruction. This patch adds the type value of truncated stores to the list of types for widening.

Diff Detail

Event Timeline

Rin created this revision.Aug 24 2023, 4:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 24 2023, 4:16 AM
Rin requested review of this revision.Aug 24 2023, 4:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 24 2023, 4:16 AM

Thanks @Rin - this looks like a useful improvement! In truncate-type-widening.ll we're clearly now choosing a more natural VF that we can generate code for more efficiently, in particular the step vector intrinsic call in the preheader.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5732

I wonder if we should also be asking if the store operand is loop invariant too? This would avoid tests changing such as lvm/test/Transforms/LoopVectorize/vplan-stress-test-no-explict-vf.ll. If the input is loop invariant then it's not really participating in the vector loop.

5734

I think you can write this more simply as

if (auto *Trunc = dyn_cast<TruncInst>(ST->getOperand(0)))
  T = Trunc->getSrcTy();
5735

nit: Normally variables in LLVM start with a capital, i.e. CastTrunc

llvm/test/Transforms/LoopVectorize/AArch64/truncate-type-widening.ll
3

It looks like we're still using tail-folding despite passing in -sve-tail-folding=disabled, which I think is because the vectoriser knows the trip count is low. Perhaps you can just remove the flag?

97

I think you can delete this block and just let everything jump directly to %for.cond.cleanup

103

Could you rewrite the blocks in a more natural order, i.e. entry, for.body.preheader, for.body, for.cond.cleanup?

david-arm added inline comments.Aug 24 2023, 8:37 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5734

nit: Sorry @Rin, just one more thing. Perhaps for consistency it makes sense to also use ST->getValueOperand() even though it's the same thing?

Rin added inline comments.Aug 24 2023, 8:45 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5732

Discussed this and there is a trunc store in the loop

%0 = trunc i64 %indvars.iv21 to i32
 store i32 %0, ptr %arrayidx, align 4
5734

I'll change it, no problem.

5735

Ah, my bad, I'll change that.

llvm/test/Transforms/LoopVectorize/AArch64/truncate-type-widening.ll
3

You're right, I'll take that flag out.

97

Makes sense I'll do that and rewrite the blocks.

Matt added a subscriber: Matt.Aug 25 2023, 12:36 PM
Rin updated this revision to Diff 554366.Aug 29 2023, 8:33 AM

Simplify if in LoopVectorize.cpp and reorder blocks in test to resolve comments.

Rin marked 6 inline comments as done.Aug 29 2023, 8:35 AM
Rin updated this revision to Diff 554376.Aug 29 2023, 8:54 AM

Use different function for consistency.

Rin marked an inline comment as done.Aug 29 2023, 8:57 AM
fhahn added a subscriber: fhahn.Aug 30 2023, 7:56 AM

It looks like there are number of Analysis test failures e.g. LLVM.Analysis/CostModel/X86::interleaved-load-i16-stride-3.ll in the precommit checks. Do the test sources need changing to preserve the original intention of those tests?

Rin added a comment.Sep 1 2023, 3:07 AM

It looks like there are number of Analysis test failures e.g. LLVM.Analysis/CostModel/X86::interleaved-load-i16-stride-3.ll in the precommit checks. Do the test sources need changing to preserve the original intention of those tests?

I'm working on updating the patch which will result in these tests no longer failing.

Rin added a comment.Sep 1 2023, 3:09 AM

It looks like there are number of Analysis test failures e.g. LLVM.Analysis/CostModel/X86::interleaved-load-i16-stride-3.ll in the precommit checks. Do the test sources need changing to preserve the original intention of those tests?

And by the tests no longer failing I mean these specific X86 tests will not be affected.

Rin updated this revision to Diff 555367.Sep 1 2023, 7:21 AM

Check for loads in loop before adding source type of truncating stores to list of types for widening.

Rin added a comment.Sep 1 2023, 7:26 AM

I will revert the changes in test/Transforms/LoopVectorize/AArch64/type-shrinkage-zext-costs.ll

david-arm added inline comments.Sep 1 2023, 7:51 AM
llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization.ll
746

Hmm, it looks like we've decided not to vectorise at all now. Perhaps because the maximum register width is 32 bits, and since the largest type in the loop is now 32 bits the max VF we can choose is 1? In order to still demonstrate some vectorisation you might have to change the loop IR to be something like this:

%conv = trunc i32 %0 to i16
store i16 %conv, ptr %arrayidx, align 1
llvm/test/Transforms/LoopVectorize/vplan-stress-test-no-explict-vf.ll
18–19

Similar to the test above you may need to change the test so you still get VF=1. You could try choosing to use a 32-bit phi and truncate that to i16?

Rin updated this revision to Diff 555757.Sep 4 2023, 8:17 AM

Make tests vectorize.

Rin marked 2 inline comments as done.Sep 4 2023, 8:18 AM
Rin updated this revision to Diff 555866.Sep 5 2023, 7:41 AM

Add both source type of truncating store and store type to ElementTypesInLoop list.