Use AA when scanning to find an available load value.
Diff Detail
- Repository
- rL LLVM
Event Timeline
Impressive amount of plumbing for such a tiny change. :)
The change is obvious enough. LGTM. Thanks!
In the future, can you please analyze why this is happening?
This is another example of a test case that is unrelated to the pattern you are trying to optimize. That means if we stopped optimizing this pattern tomorrow, we'd have to run SPEC again and analyze it to find out.
We also really need to understand which optimizations need to happen, rather than randomly implementing things in various passes just because it increases the SPEC score.
Hi Daniel,
I didn't see this as a "new optimisation that needs doing", just as an "omitted analysis that finally got added": obviously, if AA is available, alias analysis will be used, and the cases where that matters will improve.
IIUC, the test is just to make sure it doesn't get removed or changed in the future.
The comment about the SPEC improvement was, from my POV, irrelevant, given that this is an obvious win.
Unless I'm missing something obvious here, of course... :)
--renato
Hey Renato,
This is not the first patch in this line of extending the PRE in jump threading so I get that you may be missing context. :)
So far we've extended this optimization and made it more expensive without a single real test we care about, and every single test case added is already caught by other passes. Rather than simply making every pass do everything because it seems to improve the SPEC score, I would really like to see us actually analyze and understand why it improves SPEC, and whether extending this optimization and making it more expensive is really the right plan. I agree that otherwise this particular change is innocuous, but I asked for analysis and real cases for the last patches and haven't gotten them yet.
Obviously, if analysis shows this pass is the right thing to improve, we should do it. But I would like to see that happen before we keep going.
My original intention for this change was to handle a missed opportunity that should be handled here, so that it can open up more threading, which I believe matches the purpose of SimplifyPartiallyRedundantLoad() in jump threading. Regarding why such a pattern is exposed in this pass, I can add another test case reduced from a real case, like D29571 for my previous patch. Would that be acceptable to you?
Right, it seems I jumped the gun on what I thought was just an obvious patch. Apologies.
Obviously, if analysis shows this pass is the right thing to improve, we should do it. But I would like to see that happen before we keep going.
I totally agree. We're already slower than GCC to compile, usually for less performance. Better analysis, at least of execution time, is worth doing.
Though, I wonder how this one patch would fare (and be unfairly picked on) compared with all the previous patches. Not that this should stop further investigation before commit, but maybe we should look at the bigger picture (if this has been happening consistently) and plot relative performance vs. compile time for a particular set of patches...
Makes sense?
--renato
Added another test, reduced from spec2000/crafty. This test shows the case caught by this change at O3. I didn't see why other passes run before jump threading didn't handle this; please let me know if you believe this needs to be handled by another pass before jump threading.
As I mentioned before, the main purpose of SimplifyPartiallyRedundantLoad() is to expose jump threading opportunities, and it runs interleaved with the other jump threading tasks. Given that, improving SimplifyPartiallyRedundantLoad() would be profitable even if another pass could do the same (or an even better) job elsewhere. Hopefully the added test shows the case caught by this change properly and also exposes opportunities for other passes.
Thanks, with this test case, I think this change is fine.
This is something we are unlikely to want to do elsewhere ATM.