- User Since
- Sep 30 2018, 2:53 PM (132 w, 4 h)
Fri, Apr 9
Wed, Apr 7
Refactored code, removed new TreeEntry::State. Make InsertElementInst passing through ordinary scheduling, allowing inner deps for bundle of insertelements.
Wed, Mar 17
Closed by commit: https://reviews.llvm.org/rG9abe50047330
(modified summary a bit to comply with changes)
LGTM except for the comment. I think MaxVecRegSize % EltSize != 0 check could be removed (together with comment).
Closed by commit https://reviews.llvm.org/rGdd90c36d601e
Sorry, prematurely closed this revision by mistake.
Tue, Mar 16
The same work is resumed here: D98714, so abandoning this.
Here is another fix approach: https://reviews.llvm.org/D98714
Mon, Mar 15
Sun, Mar 14
Sat, Mar 13
I'm trying another approach here, so abandoned this for a while.
Mar 11 2021
Also could you please precommit test or just make diff against it (to see test changes before/after patch)?
Mar 3 2021
Feb 26 2021
Feb 25 2021
LG after addressing all comments.
Feb 22 2021
Actually, it is not reducing. This is how test-suite python script works. So, here lhs - number of instructions after this patch, rhs - before. And the less relative number, the more vector instructions we actually generate.
Btw, how could it be explained NumVectorInstructions stat reducing after this patch?
Feb 18 2021
Added fp2int scalar and vector test cases
I can add fp2int sample, but it is not touched by this patch. Actually, its tree built is not vectorized by marking as "tiny" (R.isTreeTinyAndNotFullyVectorizable() function), its size is 2 (since fp2int is unary operation, whereas add is binary one). Though it could be fixed by checking that stores are using its result and therefore tree size could be increased. But this looks too hacky, isn't it?
Feb 17 2021
Feb 16 2021
Feb 5 2021
Feb 2 2021
Due to what said above, I'm to abandon this change. It looks like over-optimization, breaking llvm IR middle-end abstraction.
Jan 27 2021
At Dinar's request, I've measured compile time regression: http://llvm-compile-time-tracker.com/compare.php?from=f3449ed6073cac58efd9b62d0eb285affa650238&to=39362e11add238c45a7a7d55c1e002005f396fb7&stat=instructions. The regression is visible, but it is acceptable for such change imho. The largest regression comes from CMakeFiles/clamscan.dir/libclamav_uuencode.c.o (+11.28%), so one can investigate this particular file.
Clean up tests
Small test fix
Jan 19 2021
Fixed comment containing this D94974 revision number
Jan 18 2021
You are planning to revert this patch after ConstantData use-list removing, am I right?
Jan 13 2021
Jan 12 2021
Looks good, but could you please precommit test (and rebase) to see actual output difference?
Jan 4 2021
Jan 1 2021
Dec 18 2020
In-short: I'm still do sure that current patch is bugfixing. But this bug had induced some of boundary vectorization cases before fixing. Below is a typical case.
Dec 14 2020
Dec 13 2020
Dec 9 2020
Dec 8 2020
Dec 7 2020
Looks good to me, just one style remark: we have newlines between function members for all classes throughout this module, so I'd prefer to see the same for ShuffleInstructionBuilder class.
Dec 5 2020
I'm agree with @fhahn -- tuning cost model is more right way. Also it could be the case of arch where gathers are missing but it's beneficial to use them for vectorization tree building. They are lowered to scalarized instrs further.
Dec 4 2020
The same work will be done by instcombine, so we can just zero redundant cost here to gain the same effect globally. But it makes sense to make shuffle merging at the vectorization stage as well, of course.
Dec 1 2020
Btw, I've observed significant compile-time regression with this patch: http://llvm-compile-time-tracker.com/compare.php?from=99d82412f822190a6caa3e3a5b9f87b71f56de47&to=81b636bae72c967f526bcd18de45a6f4a76daa41&stat=instructions (thanks to @nikic for awesome service). This could be justified in case of comparable performance improvements but have you done any benchmarking?
Nov 25 2020
Nov 23 2020
I believe this issue is related to the default cost for getGatherScatterOpCost(). For the arch not having gather/scatter instrs we use TargetTransformInfoImplBase::getGatherScatterOpCost() which returns 1 unconditionally: https://github.com/llvm/llvm-project/blob/release/11.x/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h#L480
I'd fix it by setting something like 1024 for the default cost.
Align fix. Style fix.
Nov 21 2020
Analogue of https://reviews.llvm.org/D90445 for stores.
Nov 20 2020
Nov 18 2020
Nov 17 2020
Added another assert
Nov 16 2020
Fixes and assert
Add overloaded newTreeEntry() for common case