- User Since
- Jan 23 2018, 4:17 PM (86 w, 2 d)
Wed, Sep 18
Thanks, Hideki and Ayal, for the quick response!
If you are satisfied with the current version, please go ahead and commit it, since I'm not permitted to do that myself.
Fixed as requested by Ayal.
Tue, Sep 17
Fri, Sep 13
Thu, Sep 12
I haven't seen anything in the literature which quite matches this. Given that, I'm not entirely sure that keeping the name "loop predication" is helpful. Anyone have suggestions for a better name? This is analogous to partial redundancy elimination - since we remove the condition flowing around the backedge - and has some parallels to our existing transforms which try to make conditions invariant in loops.
Wed, Sep 4
If you like the current version, please go ahead and commit it, since I don't have committer rights yet.
Corrections requested by Philip.
Wed, Aug 21
Aug 11 2019
Aug 6 2019
Splitting the existing patch into two parts. The first is the actual fix for the problem. The second is an enhancement to evaluateAtIteration, which will be reviewed and committed separately.
Aug 4 2019
Please be aware of the build bot failure (http://lab.llvm.org:8011/builders/clang-cmake-thumbv7-full-sh/builds/2185), most likely caused by this change.
Original miscompile is expected to be fixed by:
Aug 2 2019
Hi Philip (@reames ), please take a look when possible. Thanks!
Jul 31 2019
Thanks for the quick response, @hsaito! May I ask you to commit it as well, since I don't have committer rights yet?
It would be nice if you could find time to take a look.
Jul 30 2019
Jul 29 2019
Jul 26 2019
I've never looked at this piece of code before. For that reason I would like someone more familiar with this part to review it.
Jul 25 2019
I checked your patch on the original test case and it does resolve the issue. Unfortunately, we have 4 other test cases that regressed by 1% to 5%. The bad news is that we don't have sources for these cases. At a glance they don't involve floating-point operations, so it's no surprise they are not affected by the fix. I can continue the investigation and try to provide a short reproducer if needed. My fear is that the original change touches a very wide class of cases and we won't be able to fix all of them one by one (or it will take too much time). Can we think of a more realistic approach to dealing with the regressions?
Jul 23 2019
Here is a C++ equivalent of my original code (which is actually a Java application) for you to reproduce.
Jul 19 2019
- Implemented the check for the "best" possible result, as suggested by Johannes.
- Merged test cases into one file and applied update_test_checks.py
No, it's not PGO.
Also, do you have branch misprediction perf data? (Large slowdowns like this are usually triggered by side effects like this.)
I do. There are no branch/data/instruction mispredictions. The LSD works 100% in both cases as well. Even after the regression, CPI is 0.3, which means we are execution bound. If I run the benchmark under perf, the scores change a little and there is a 19% difference instead of the original 35%. About half of the slowdown (8.3%) comes from increased path length due to an extra jump instruction; the rest comes from a 10% increase in CPI. I'm checking different hypotheses about what could cause that, but no success so far.
Jul 18 2019
This change causes a 35% regression on a very simple loop which is the hot part of our internal micro benchmark.
This loop takes 99% of the total execution time and has a reasonably large number of iterations.
All measurements were performed on Intel(R) Xeon(R) CPU E3-1240 v5 @ 3.50GHz.
Jul 17 2019
I was independently looking into this part and came up with essentially the same change. The only difference is that I chose MachineFrameInfo as the source of "truth". Not really sure which one is better, though...
Jul 16 2019
Just found a small issue with the patch. Will update once testing passes.
Jul 15 2019
Updating the patch to use the implementation proposed by Ayal.
I don't have commit rights. I need someone's assistance with putting the fix in. Thanks!
One more test case added.
Updating comment as requested.
Jul 12 2019
Looks like we are totally aligned here. Honestly, I don't understand why we can't track this thread down to the end. I agree that we could potentially reuse the result of a non-invariant load for an invariant load if we know it's the best. But supporting that case would be about the same complexity as extending the current caching to cover non-invariant as well as invariant loads. If I simply drop the first guard around the code which does the cache lookup, we will end up with the invariant load not being removed on the following example.
Jul 11 2019
Jul 10 2019
Jul 9 2019
Adding unit test.
Remove accidental changes.
Jul 8 2019
Jul 7 2019
There is https://bugs.llvm.org/show_bug.cgi?id=42384 failing due to the same problem.