As discussed offline, I think this should go through an RFC process.
Wed, Apr 17
Thu, Apr 11
There is still some test fixing to do for Power, but before I do that I'd like to get your opinion on the approach, in particular regarding the pass placement (I pretty much placed it randomly here).
[ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline.
Wed, Apr 10
Alternatively, what do you think about making ExpandMemCmp a late IR optimization pass like the vectorizer passes?
Fri, Apr 5
Wed, Apr 3
Are you sure?
Mon, Apr 1
Thanks Roman. I only have a comment regarding documentation.
I'll let gchatelet@ review that one as he started looking at doing that some time ago.
OK, I succeeded in profiling this using pprof instead of callgrind.
Could that be the reason for the regression?
This made pdfium use 6.8% more CPU: https://bugs.chromium.org/p/chromium/issues/detail?id=947611
Since the intent here was to make things faster, any ideas how this could happen?
Fri, Mar 29
Submitted as 357239.
Thanks, just one nit.
Thu, Mar 28
What about CPUs that specify "let PostRAScheduler = 1;"?
This could use test coverage, I guess?
I don't think this is a blocking issue, but in the future we should revisit the logic in X86MacroFusion.
Only cosmetic comments, this looks good!
Thanks for the review.
Address review comments, rebase.
Wed, Mar 27
Update comment, swap features.
OK cool thanks for having a look.
Tue, Mar 26
Add unit test.
The issue is that such a test would have failed in a non-deterministic manner, which is why I've only added a unit test. But now that this should no longer fail, let's add one.
To reword: because if I do simple clustering by opcode, I will then need to add yet another
"stabilization" step - for each cluster, check that every measurement is a neighbor of all
the other points in that cluster, and if they are not, mark the cluster as noise.
(well, not every vs. every, just the lower/upper triangle excluding the diagonal)
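A minimal sketch of the clustering-plus-stabilization idea described above, not the actual patch. It assumes each measurement is a single scalar and that two points are "neighbors" when they differ by at most a threshold; the names cluster_by_opcode and epsilon are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cluster_by_opcode(points, epsilon):
    """Group (opcode, value) measurements by opcode, then stabilize.

    Assumption: two measurements are neighbors when |a - b| <= epsilon.
    Returns (stable, noise), each a dict mapping opcode -> list of values.
    """
    by_opcode = defaultdict(list)
    for opcode, value in points:
        by_opcode[opcode].append(value)

    stable, noise = {}, {}
    for opcode, values in by_opcode.items():
        # combinations() visits each unordered pair exactly once, i.e. the
        # upper triangle of the pairwise matrix, excluding the diagonal.
        if all(abs(a - b) <= epsilon for a, b in combinations(values, 2)):
            stable[opcode] = values
        else:
            # At least one pair is not within epsilon: the whole cluster
            # is marked as noise.
            noise[opcode] = values
    return stable, noise
```

With this sketch, an opcode whose repeated measurements all agree within epsilon stays a cluster, and an opcode with even one outlying measurement is moved to noise as a whole.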
Mon, Mar 25
Mar 22 2019
Ignore template contexts and add a test.
I plan to run some experiments today using your patch.
To make my suggestion clearer: if you just want to compare measured instruction data to its checked-in data, why not cluster by instruction opcode (to merge the several measurements for a given instruction) and run the analysis on that?