User Details
- User Since
- Nov 12 2013, 7:42 AM (489 w, 4 d)
Nov 1 2018
Jun 18 2018
May 24 2018
May 23 2018
May 21 2018
LGTM.
Okay, now I understand. Seems like a reasonable extension.
May 16 2018
Can you please add a test case?
Apr 27 2018
Apr 12 2018
This make sense to me. LGTM.
Apr 11 2018
Thanks, Eric!
Apr 10 2018
LGTM. Thanks, Geoff. It would be nice to get this into the 6.0.1 release, if at all possible.
Update based on Eric's feedback.
Apr 6 2018
Thanks everyone for the quick review. I'll commit shortly.
Apr 5 2018
Apr 4 2018
Apr 3 2018
Mar 30 2018
Mar 29 2018
Still LGTM. Thanks for the fix, Haicheng.
Mar 28 2018
Hi Peter/Rafael,
After this commit, I noticed a ~3.3% regression in SPEC2006/libquantum when compiling with -O3 -flto and the gold-plugin. In short, this change disabled tail duplication during machine block placement as that optimization is only performed when the optlevel is >= Aggressive (and we now default to -O2/Default). In turn, I've started looking into adding function attributes to control the code generation optimization level, per the suggestion in the commit message. Before I get too far into the implementation, I was wondering if there had been some discussion on adding these function attributes? If so, can you point me toward these discussions?
Mar 23 2018
Mar 22 2018
Hi Florian,
We identified a 2.15% regression in SPEC2006/h264ref due to this commit. After this change, the FullPelBlockMotionBiPred() function is no longer inlined into the hottest function, BlockMotionSearch(). Previous to this change, the function was inlined because there was a single callsite in the entire program (known only when compiling in LTO) and the original definition could be removed after inlining. However, after loop peeling the callsite of FullPelBlockMotionBiPred() is replicated, which prevents inlining.
Mar 9 2018
Mar 5 2018
Feb 13 2018
LGTM, but formal approval should probably come from someone outside of our group (ping! :).
Feb 12 2018
A few quick comments. Will follow up with a more complete review later this week.
Feb 4 2018
Feb 2 2018
Hi Digeo,
Matt will be out of the office for a few weeks and he asked me to follow up on this patch. Hopefully, the new version addresses all of your concerns.
Address Diego's comments.
Jan 31 2018
ping.
Jan 29 2018
Thanks, Matthias/Jun!
Jan 28 2018
Would it be possible to revert r322917 while we investigate the regressions? We also identified a 3.61% regression in SPEC2006/bzip2, so here's to complete list of regressions we are currently seeing due to this change:
Jan 23 2018
Jan 18 2018
I think we've identified a work around internally, so I'm just going to abandon this patch.
Jan 15 2018
Jan 8 2018
Jan 5 2018
Dec 26 2017
I'd like to suggest that this implementation be included in MachineSink.cpp, but continue to live as a separate pass. The two passes have the same intent, but must be independent passes due to the previously mentioned constrains; I'd almost want to refer to this as the PostRAMachineSink pass... The point being I'd like to have one place where I can see what's sunk pre-RA and what's sunk post-RA.
Dec 11 2017
All of the review feedback has been addressed and I have no additional comments. LGTM. Thanks, Haicheng.
Nov 29 2017
Nov 27 2017
Am I correct in assuming this is going to be a problem for Falkor and Saphira as well? If so, can you add solutions for those as well? Cortex-a57 should be good enough for those targets as well.
Nov 21 2017
Nov 16 2017
Nov 15 2017
Please run clang-format, but otherwise LGTM.
Nov 14 2017
The load/store opt pass is already pretty expensive in terms of compile-time. Did you see any compile-time regressions in your testing? Also, what performance results have you collected?