User Details
- User Since
- Apr 12 2016, 8:44 PM (319 w, 3 d)
Yesterday
Rebased.
Replaced shuffle mask argument with a bitvector and moved all the opcode checking code to TTI->isLegalAltInstr().
Thu, May 26
Moved the cost calculation code from getShuffleCost() to a separate function: getAltInstrCost().
Added shufflemask argument to TTI function.
I understand your concerns, but I feel that this pattern is not as tricky as folded loads. We can accurately detect the add-sub pattern within the SLP pass itself, since it matches the AltShuffle TreeEntry. So we can even make the tree cost more accurate (this is in a follow up patch). The only issue that I see is perhaps with cost modeling of this pattern from within a different pass, other than SLP, since that would require special logic to skip the cost of the vector add and vector sub. But given the current infrastructure I don't think we can do anything about this, but that shouldn't stop us from making getShuffleCost() more accurate, even if it will only benefit SLP.
Wed, May 25
LG
Tue, May 24
ping
Removed AArch64/loadorder.ll changes.
It looks strange, let me double check.
Changed visiting order as suggested by @ABataev.
Mon, May 23
Fixed condition (line 4037) which was somehow flipped in the last version of the patch.
Fri, May 20
Removed single-user assertion for getting the UserTE. Instead we iterate over all user entries and get the one in the Users map.
Fixed matching of AddSub pattern.
Wed, May 18
I am not sure if there is any other pattern, but this one showed up in a regression in the eigen benchmark.
Hmm yeah fixing it in the back-end may be an option but what if we end up not vectorizing the code because of this? The alternative is a blend + add + sub which should increase the cost quite a bit. So it is probably best to teach reordering not to mess up this pattern in the first place.
Mon, May 16
Fixed stale comment and added DisableScheduling flag to BlockScheduling.
Thu, May 12
Wed, May 11
This was an issue that was correctly implemented initially and was then changed during the review process in an attempt to optimize the code, exposing the corner case that caused the crash.
An assertion in the code is probably more useful than an actual test, but I guess it is probably best to add both.
Disabled the scheduler for the fast buildtree.
Tue, May 10
Thanks for the comments. Updated sorting according to @dmgreen's comments.
Mon, May 9
Updated checks in tests.
This fixes a regression in SingleSource/Benchmarks/Misc/flops-5.c. Increasing the RootLookaheadMaxDepth doesn't fix the issue either. Building small trees instead of calling the lookahead heuristic seems to be more accurate in this case.
Fixed stability issue and also removed user tree indices out of the TreeEntry.
Thanks for the reproducer @vdmitrie , think I fixed the issue.
Fri, May 6
@vdmitrie please let me know if you find a stability issue, I will do more testing on my side too.
Addressed most of @vdmitrie's comments.
The regressions would still appear unless final solution for non-power-of-2 is landed. Even after this there might be the issues with the cost model. What is the actual cause of the regression? The code was vectorized before but then it did not? There are couole regressions in reductiins, caused bya bit dufferent processing order, they should go away with the final code for reductions and non-power-of2.
This is not related to the non-power-of-2. The cause is shown in the lit test: SLP vectorizes trees in isolation so it may generate extra shuffles. It was triggered by the load broadcast cost change.
This fixes a regression in eigen BM_Dot_ComplexComplex_Naive. I doubt it will have any large impact anywhere else, but we should know soon enough.
Thanks @vdmitrie for your comments. I will update the code.
Tue, May 3
Added more comments and added checks for the count and type of the load's users.
Mon, May 2
Updated condition for LoadCanBeCombined.
Somewhat related to this patch: For cost modeling changes it would probably be helpful to have tests with cost values in them. Because some of these changes may not directly affect the generated IR, but would show up in the cost diff. For example something like: https://reviews.llvm.org/D124802 . Any thoughts?
Added hasOneUse check and removed the BB check.
Yes, I think it makes sense to add the hasOneUse check.
I am not really sure about the BBs though, it was reported as a potential issue in SLP, but it turned out that it was a different issue after all. So I guess I will drop this check and keep.
Thu, Apr 28
Fixed condition.
We now only check if the load and its uses are in the same BB.
Apr 28 2022
Apr 27 2022
Added SSE, AVX, AVX2, AVX512 checks.
Apr 26 2022
Rebased and addressed comment.
Apr 22 2022
Looks great, @ABataev any comments?