When dealing with truncated stores, the Min/Max width would not consider source values of truncated stores. Instead it would only look at the type value after performing the truncate instruction. This patch adds the type value of truncated stores to the list of types for widening.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
Thanks @Rin - this looks like a useful improvement! In truncate-type-widening.ll we're clearly now choosing a more natural VF that we can generate code for more efficiently, in particular the step vector intrinsic call in the preheader.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5727 | I wonder if we should also be asking if the store operand is loop invariant too? This would avoid tests changing such as lvm/test/Transforms/LoopVectorize/vplan-stress-test-no-explict-vf.ll. If the input is loop invariant then it's not really participating in the vector loop. | |
5729 | I think you can write this more simply as if (auto *Trunc = dyn_cast<TruncInst>(ST->getOperand(0))) T = Trunc->getSrcTy(); | |
5730 | nit: Normally variables in LLVM start with a capital, i.e. CastTrunc | |
llvm/test/Transforms/LoopVectorize/AArch64/truncate-type-widening.ll | ||
3 | It looks like we're still using tail-folding despite passing in -sve-tail-folding=disabled, which I think is because the vectoriser knows the trip count is low. Perhaps you can just remove the flag? | |
97 | I think you can delete this block and just let everything jump directly to %for.cond.cleanup | |
103 | Could you rewrite the blocks in a more natural order, i.e. entry, for.body.preheader, for.body, for.cond.cleanup? |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5727 | Discussed this and there is a trunc store in the loop %0 = trunc i64 %indvars.iv21 to i32 store i32 %0, ptr %arrayidx, align 4 | |
5729 | I'll change it, no problem. | |
5730 | Ah, my bad, I'll change that. | |
llvm/test/Transforms/LoopVectorize/AArch64/truncate-type-widening.ll | ||
3 | You're right, I'll take that flag out. | |
97 | Makes sense I'll do that and rewrite the blocks. |
It looks like there are number of Analysis test failures e.g. LLVM.Analysis/CostModel/X86::interleaved-load-i16-stride-3.ll in the precommit checks. Do the test sources need changing to preserve the original intention of those tests?
I'm working on updating the patch which will result in these tests no longer failing.
And by the tests no longer failing I mean these specific X86 tests will not be affected.
Check for loads in loop before adding source type of truncating stores to list of types for widening.
I will revert the changes in test/Transforms/LoopVectorize/AArch64/type-shrinkage-zext-costs.ll
llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization.ll | ||
---|---|---|
743 | Hmm, it looks like we've decided not to vectorise at all now. Perhaps because the maximum register width is 32 bits, and since the largest type in the loop is now 32 bits the max VF we can choose is 1? In order to still demonstrate some vectorisation you might have to change the loop IR to be something like this: %conv = trunc i32 %0 to i16 store i16 %conv, ptr %arrayidx, align 1 | |
llvm/test/Transforms/LoopVectorize/vplan-stress-test-no-explict-vf.ll | ||
21 | Similar to the test above you may need to change the test so you still get VF=1. You could try choosing to use a 32-bit phi and truncate that to i16? |
I wonder if we should also be asking if the store operand is loop invariant too? This would avoid tests changing such as lvm/test/Transforms/LoopVectorize/vplan-stress-test-no-explict-vf.ll. If the input is loop invariant then it's not really participating in the vector loop.