- User Since
- Aug 30 2018, 9:33 AM (125 w, 5 d)
Nov 25 2020
It sounds like the throttling patch should resolve this issue, as cutting out a ScatterVectorize entry with high cost will effectively restore the previous behavior.
Nov 23 2020
The current SLP vectorizer has a significant drawback in its cost modeling, and this patch highlights it.
Consider four scalar loads of type i8. With the prior approach (vectorization overhead), the cost of such an entry was 4 (x86 target).
With the new approach we have two entries instead of one: ScatterVectorize loads + NeedToGather GEPs. The costs of these entries are 6 and 10 respectively, so the cost increases from 4 to 16.
The problem is that once this pattern is put into the tree, it pulls up the cost of the entire tree. If the tree contains multiple such patterns, their effect is magnified. These entries end up outweighing the possible profit of vectorizing the remaining portion of the tree, and we do not vectorize it at all (even if downstream optimizations could probably turn it into optimal code). If SLP could choose between vectorization overhead and the gather intrinsic based on their costs while building the vectorizable tree, the outcome could be different.
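The arithmetic above can be made concrete with a small toy model. This is only a sketch, not LLVM code: the per-entry costs (4, 6, 10) are the x86 figures quoted above, while the function names, the cost of the rest of the tree (-30), and the "profitable when total cost is below zero" rule are assumptions made for illustration.

```python
# Toy model of the cost accounting described above; NOT LLVM code.
# Per-entry costs are the x86 figures quoted in the comment.
# All names and the rest-of-tree cost are hypothetical.

GATHER_OVERHEAD_COST = 4       # prior approach: one entry for four i8 loads
SCATTER_VECTORIZE_COST = 6     # new approach: ScatterVectorize loads entry
NEED_TO_GATHER_GEPS_COST = 10  # new approach: NeedToGather GEPs entry

# Cost of one load pattern under the new approach: 6 + 10 = 16.
SCATTER_PATTERN_COST = SCATTER_VECTORIZE_COST + NEED_TO_GATHER_GEPS_COST


def tree_cost(rest_of_tree_cost, pattern_costs):
    """Sum the cost of the rest of the tree and of every load pattern in it."""
    return rest_of_tree_cost + sum(pattern_costs)


def is_profitable(cost, threshold=0):
    """Assumed rule: vectorize only when the total cost is below the threshold."""
    return cost < threshold


def cheapest_pattern_cost():
    """Hypothetical per-pattern choice between the two modeling options."""
    return min(GATHER_OVERHEAD_COST, SCATTER_PATTERN_COST)


# Hypothetical tree: the remaining entries save 30, and it has two load patterns.
REST_OF_TREE = -30
old_cost = tree_cost(REST_OF_TREE, [GATHER_OVERHEAD_COST] * 2)
new_cost = tree_cost(REST_OF_TREE, [SCATTER_PATTERN_COST] * 2)
chosen_cost = tree_cost(REST_OF_TREE, [cheapest_pattern_cost()] * 2)

print(old_cost, is_profitable(old_cost))
print(new_cost, is_profitable(new_cost))
print(chosen_cost, is_profitable(chosen_cost))
```

With these assumed numbers, two such patterns are enough to flip the tree from profitable to unprofitable, while a per-pattern minimum would keep the earlier decision.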
Nov 13 2020
Jul 14 2020
Jul 10 2020
Jun 24 2020
Jun 20 2020
Vector combine fails an assertion on this test case.
May 29 2020
Replaced \param with \p.
Thank you, Alexey. I addressed your comments in the new upload.
May 28 2020
Thank you for the prompt review.
Updated the patch to address your comments.
May 27 2020
Apr 28 2020
Apr 27 2020
Apr 15 2020
Apr 9 2020
Apr 8 2020
Apr 7 2020
Mar 31 2020
Mar 30 2020
Updated getScalarizationOverhead to use the new TTI API for chained inserts/extracts.
In many cases this reduces the cost overestimates introduced by https://reviews.llvm.org/D74976.
Cost model tests were adjusted to reflect that.
Tests Transforms/SLPVectorizer/X86/resched.ll and Transforms/LoopVectorize/X86/strided_load_cost.ll were reverted to their state prior to D74976.
Mar 27 2020
Fixed formatting and test failures.
Feb 28 2020
Feb 27 2020
So, we have a correct implementation, and this is just a check to prevent possible bugs in the future?
Jan 17 2020
Jan 15 2020
Jan 13 2020
Dec 26 2019
Dec 12 2019
Thanks for the reproducer. Does this one produce an incorrect result, crash the compiler, or something else? What exactly?
There is still a problem with extracts.
This case shows it:
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
Dec 6 2019
Nov 4 2019
Update: I spent some time investigating the unit test infrastructure and came to the conclusion that it is not really possible to create a unit test that calls BoUpSLP::dumpVectorizableTree(), because we would need to construct a BoUpSLP object. We cannot do that with only a forward declaration of the class.
Do you have any ideas about how a test might look?
Jun 24 2019
May 30 2019
Dec 21 2018
Abandoned, as the issue was fixed by D51748.
Sep 7 2018
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="X86" -DCMAKE_INSTALL_PREFIX="/your/install/dir" "/local/ll.org/source/root/llvm"