After D121595 was commited, I noticed regressions assosicated with small trip count numbers vectorization by tail folding with scalable vectors. As a solution for those issues I propose to introduce the minimal trip count threshold value.
Details
Diff Detail
Unit Tests
Time | Test | |
---|---|---|
1,410 ms | x64 debian > libFuzzer.libFuzzer::fuzzer-finalstats.test |
Event Timeline
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
192 | Typo here, should be TailFolding? |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
192 | Done. thanks Paul. |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
192–195 | We're late into the LLVM 15 release cycle so here we're trying to fix an LLVM 14 regression without resorting to reversing the decision to use tail folding for all low trip count loops. I think it'll be better to now do this in a way with the most minimal impact to other targets and so rather than a global command line option can you instead add a TTI hook. That way we can restrict the change to just AArch64/SVE and if it turns out we can rely on a more accurate cost model in the future and there are no other uses at that time, we can remove the TTI hook. | |
5088–5092 | This doesn't look like the correct place to decide this. Within LoopVectorizePass::processLoop the code exists to choose CM_ScalarEpilogueNotAllowedLowTripLoop for low trip count looks. I think that's the code we want to change to also consider the new tail folding specific watermark. |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
10148–10161 | Can this be if (*ExpectedTC > TTI->getMinTripCountTailFoldingThreshold()) SEL = CM_ScalarEpilogueNotAllowedLowTripLoop and still do what you need? |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
10148–10161 | I have to return false in order to prevent vectorization. |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
10148–10161 | Thanks @dtemirbulatov. In which case it looks like you need to restructure the code a little so we emit the necessary remark and debug information. e.g. if (Hints.getForce() == LoopVectorizeHints::FK_Enabled) ... else if (*ExpectedTC > TTI->getMinTripCountTailFoldingThreshold()) { LLVM_DEBUG(dbgs() << "\n"); SEL = CM_ScalarEpilogueNotAllowedLowTripLoop; } else { LLVM_DEBUG(dbgs() << " But the target considers the trip count too small to consider vectorizing.\n"); reportVectorizationFailure(....); Hints.emitRemarkWithHints(); return false; } |
A minor pre-push change but otherwise looks good. Once landed can you please do the necessary to request an LLVM 15 backport and then we can turn our attention to potentially fixing up the cost model so this rather brute force fix can be removed.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
10152 | To make the output consistent please add a <space> before the 'But' because otherwise you'll end up printing overheads are incurred.But the target considers... |
this was committed with cab6cd68340255be241b7cf169c67a1899ced115
but I made extra spaces before "Differential Revision", that is why it didn't close automatically.
Typo here, should be TailFolding?