- User Since
- Apr 13 2015, 9:48 AM (371 w, 5 d)
Tue, May 24
Can we restore review thread https://reviews.llvm.org/D125845 instead?
At the risk of abandoning more patches, I suggest reviving https://reviews.llvm.org/D124889 and updating it first instead of using this one, to make sure all prior review history is properly kept.
LGTM (Next time, please reuse the old patch https://reviews.llvm.org/D126018 to update.)
Mon, May 16
Fri, May 13
This one looks fine to me. Let's wait to see if other reviewers have further comments.
Thu, May 12
It might be worth annotating the early inliner for PGO as well.
Tue, May 10
Mon, May 9
Perhaps split out the part of the patch that handles the following, and then work on a separate patch to do the reassociation (nikic's comment).
Should we have a test case to cover the post-inlining scaling? Otherwise looks good.
Does the data need to be scaled first?
lgtm with nikic's comment addressed.
Thu, May 5
Wed, May 4
Tue, May 3
Mon, May 2
From the feedback, the action items are
- check in baseline tests;
- split recurrence handling out of the main patch, then have the main patch reviewed and committed;
- send the recurrence handling patch separately and get it reviewed and committed (the depth may not be needed in this case)
Sat, Apr 30
Apr 25 2022
This patch slightly changes the behavior (e.g. considering alreadyPeeled when comparing against the size threshold), but looks more correct. LGTM.
Apr 22 2022
Apr 20 2022
I suspect there is something else going on.
Apr 19 2022
A high-level comment: I see that ScalingUpFactor is used in many places. It may be better to use the Scaled64 data type directly instead of uint64 to avoid that.
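To illustrate the concern (a hypothetical sketch, not LLVM's actual ScaledNumber API; the factor value and function names below are made up for illustration): multiplying already-large frequencies by a fixed ScalingUpFactor in plain uint64 arithmetic can overflow, whereas a mantissa/exponent representation like Scaled64 just adjusts the exponent.

```python
# Hypothetical illustration of why a (mantissa, exponent) "scaled" number
# avoids the overflow that plain uint64-with-scaling-factor arithmetic hits.

SCALING_UP_FACTOR = 1 << 32          # made-up fixed scaling factor
UINT64_MAX = (1 << 64) - 1

def scale_uint64(freq):
    """Scale a frequency the uint64 way; returns None on overflow."""
    product = freq * SCALING_UP_FACTOR
    return product if product <= UINT64_MAX else None

def scale_mantissa_exp(freq):
    """Scale by bumping the exponent only, Scaled64-style: value == freq * 2**32."""
    return (freq, 32)

big_freq = 1 << 40
assert scale_uint64(big_freq) is None            # uint64 path overflows
mant, exp = scale_mantissa_exp(big_freq)
assert mant * (1 << exp) == big_freq * SCALING_UP_FACTOR  # value preserved
```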
Apr 18 2022
The patch looks reasonable. Please wait for the other reviewers (Sanjay) to comment as well.
Apr 15 2022
Have you analyzed the root cause of the performance regression (with performance counters)? Is it due to other missed optimizations without the peeling or is it due to side effects such as loop alignment?
For loops with a trip count between maxPeelCount and 2*maxPeelCount, this patch enables partial peeling of the loop for maxPeelCount iterations. This feels very aggressive and can be detrimental to performance: 1) increased icache footprint; 2) more pressure on the BPU; and 3) the back branch of the remaining loop may become less predictable.
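To make the concern concrete, here is a hypothetical back-of-the-envelope model (the names below are made up for illustration; this is not LLVM's LoopPeel logic): peeling copies the loop body once per peeled iteration, and a trip count just above maxPeelCount leaves only a short residual loop whose back branch is taken few times.

```python
def peel(trip_count, max_peel_count):
    """Hypothetical model: peel up to max_peel_count iterations off the
    front of a loop, leaving a residual loop for the remaining iterations.
    Code size grows roughly linearly with the number of peeled bodies."""
    peeled = min(trip_count, max_peel_count)
    residual = trip_count - peeled
    return peeled, residual

# Trip count between maxPeelCount and 2*maxPeelCount: maxPeelCount bodies
# are duplicated, and a short 4-iteration residual loop remains.
peeled, residual = peel(trip_count=12, max_peel_count=8)
assert (peeled, residual) == (8, 4)
```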
Apr 14 2022
Thanks for the cleanup. LGTM.
Apr 13 2022
Can you add back the original list of reviewers and mark the places where the prior comments are addressed?
Is it possible to generate the covmapping file instead of checking a binary blob?
Apr 11 2022
Apr 5 2022
Apr 4 2022
Apr 1 2022
Mar 31 2022
Mar 25 2022
Perhaps modify the test case to include a debug statement for better coverage. Otherwise LGTM.
Mar 24 2022
Should this be fixed in Clang (by not generating the coverage record)? Adding the logic here is a little confusing.
Mar 22 2022
Mar 21 2022
When the option was introduced in https://reviews.llvm.org/D120344, I think the intention was to use it to disable sample loader inlining in both prelink and postlink, so perhaps we can just tighten the comment for the option and document that the profiles are merged back.
Mar 17 2022
Mar 14 2022
On that internal workload, we got 6% fewer cmovs with this pass turned on for IRPGO (it works, no correctness issue :-) ). But perf-wise it's neutral (we can measure 0.2% perf movement on that workload with high confidence).
Mar 13 2022
FSAFDO allows late profile loading, so the matching of branches should be less of an issue.
Mar 11 2022
Right. I mean pass placement.
Mar 10 2022
The patch looks good to me. Adding Teresa to look at the pass ordering change.
Mar 7 2022
As I mentioned, the cost of investigating performance regressions is high. However, if y'all think it is better to shift the cost, I won't stand in the way.
If this is doable, it feels like a winning strategy to me.
Mar 6 2022
The LLVM user base is large; the only way to get an answer to that is to turn it on and let users report issues. But in my experience, that is a very disruptive process, and users may spend a long time figuring out the source of regressions. Given the nature of the change, the chances of that happening are quite high.
I would not object to turning it on by default once the more elaborate solution is there.
Adding an option is a way to go forward, but I think it should be off by default. If it is on by default, it is very likely to cause some performance regressions.
Mar 4 2022
Mar 3 2022
Mar 1 2022
What is the size overhead on the binary with probes? What is the impact on the tolerance to source drift?
Feb 27 2022
Feb 23 2022
Feb 22 2022
Perhaps add a test case in test/Transforms/SampleProfile/profile-context-tracker.ll.
Is it possible to check this in the caller context?
Feb 18 2022
Feb 17 2022
Feb 9 2022
Feb 4 2022
Jan 28 2022
Jan 27 2022
The refactoring looks good. However, in what situation do we need to modify the main header? There might be other ways to achieve the functionality.
Jan 24 2022
Please add a test case. The documentation for the llvm-cov tool also needs an update.
Jan 19 2022