User Details
- User Since
- Sep 19 2014, 10:34 AM (331 w, 2 h)
Feb 2 2018
D42260 seems a better fix to this regression.
The idea to remove loop headers from destination list when selecting a threadable destination looks reasonable to me. This also fixes the regression we saw in spec2017/perlbench due to D36404.
Oct 27 2017
Oct 26 2017
Thanks for the reviews.
Oct 25 2017
Hi everyone,
Thanks for the reviews.
Oct 17 2017
Reduce unittest.
Oct 16 2017
Add unittest and a comment.
Oct 10 2017
Oct 5 2017
Minor nit, use is_contained. NFCI.
I modified the test case per your suggestions before commiting. Thanks for taking a look.
Oct 4 2017
fix the sequence of deleting an edge from DT.
Address review comments.
Oct 3 2017
Added a comment in the test case to explain that the preheader is simplified during latesimplifycfg.
Oct 2 2017
Kindly ping.
Sep 29 2017
Thanks Daniel/Eli, I learnt something new today and am glad I asked the question. :)
Address review changes.
Fix make-check asserts.
Sep 28 2017
Use batch updater to inform DT about CFG updates.
Rebased.
Precompute DT as suggested by Hal and Eli. Please take a look.
Added a testcase. Please take a look.
Sep 27 2017
abandon this revision since it is a duplicate of D38263
Sep 26 2017
Address Chad's testcase.
Sep 25 2017
Address review comments:
Sep 22 2017
Sep 20 2017
Sep 14 2017
Address Krzysztof's comment.
We need to figure out if LSR is doing something reasonable. It's possible that LSR is actually causing the regression, and old jumpthreading is just serving to obscure the loop structure (thus preventing LSR from doing whatever bad thing it's doing).
Address Eli's comment.
Sep 13 2017
I tried right after latesimplifycfg and it almost recovers the regression. However, running it right before CGP made it even worse for performance of spec2017/perlbench. It seems to be due to some kind of bad interaction with LSR, I haven't digged deeper. Would running right before LSR be reasonable?
kindly ping.
Sep 9 2017
Subsumed by D37343
- Added some more checks to allow merging empty cases when we can estimate extra moves are not added.
- Added test cases.
- spec2017/perlbench improves by 4.5%. No other improvements or regressions observed in Spec.
Aug 31 2017
Is the actual value of the PHI operand relevant here? It looks like some of the testcases on r289988 involve constants, which are materialized during isel (and can be substantially more expensive).
I took a stab at this and created D37343. Please take a look.
Aug 30 2017
Thanks for the review.
Aug 29 2017
Move assert closer to AA->alias call.
rebase and fix x86 lit test.
Aug 25 2017
ContainFolderableUsersInLoop -> ContainFoldableUsersInLoop?
Aug 24 2017
Aug 21 2017
Aug 20 2017
Trim the patch and avoid handling stack references for non-value MMOs to prevent LNT test failures.
Aug 18 2017
Aug 16 2017
Aug 14 2017
I have looked into the phi operands closely and I am not convinced if phi operands involving constants have any influence on the profitability of merging empty blocks. I identified empty exit blocks if when merged with their successors improved the performance a bit, yet if another similar empty exit block was merged with the destination block, it sinks the performance. The only difference between them is that the successor block is also empty when the performance regressed.
Thanks for taking a look. Rebased and committed in r310825.
Aug 8 2017
Correct.
Is the actual value of the PHI operand relevant here? It looks like some of the testcases on r289988 involve constants, which are materialized during isel (and can be substantially more expensive).
Thanks Eli, this looks interesting. Another observation is that if we increase cgp-freq-ratio-to-skip-merge to 1000 it will recover the full 3% regression. I am trying to reduce the exact basic block which when skipped merging in CGP removes placing copies in hot path and improves performance due to reduced dynamic instruction count and the basic block(s) which when merged eliminate the unnecessary branches and improve the performance due to better branching/i-cache utilization. This might provide an answer to your question about the relevance of PHI operands.
Aug 7 2017
Aug 2 2017
Update to address Eli's comments.
A gentle ping.
Jul 31 2017
Jul 28 2017
Updated patch to address the underlying issue.
Jul 20 2017
The underlying issue is that after loopsimplify creates empty exit blocks, CGP cannot clean up the empty blocks if it happens to be coming from a switch case. I am working on an alternative approach to address this issue.
Jul 19 2017
Jul 18 2017
The CGP changes are here: D35584
Split CGP changes into a separate follow on patch, per Eli's request.