With profile data, non-trivial LoopUnswitch will only apply on non-cold loops, as unswitching cold loops may not gain much benefit but significantly increase the code size.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
the title should be something like [SimpleLoopUnswitch] Skip non-trivial unswitching of cold loops
otherwise this looks like the right approach
llvm/lib/Passes/PassBuilder.cpp | ||
---|---|---|
1398 ↗ | (On Diff #444089) | run git clang-format? |
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | ||
3122 | as we found offline, this may return nullptr in the case of opt -aa-pipeline=, we should bail out in that case I'd add a RUN: opt -passes=simple-loop-unswitch -aa-pipeline= to the test to make sure it doesn't crash (no need to check IR) | |
3238 | just directly pass nullptr below | |
3287 | these are specific to the legacy pass and don't need to be here since we're not actulaly using BFI in the legacy pass | |
llvm/test/Transforms/SimpleLoopUnswitch/PGO-nontrivial-unswitch.ll | ||
2 ↗ | (On Diff #444089) | I'd just use llvm/utils/update_test_checks.py rather than manually checking loops |
[llvm] Implement PGO for SLU
Using PGO for SLU to avoid non-trivial unswitch code on code loops, which will increase the binary code size without gain much benefit.
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | ||
---|---|---|
3121 | still looks like you need to run git clang-format HEAD~ | |
llvm/test/Transforms/SimpleLoopUnswitch/PGO-nontrivial-unswitch.ll | ||
2 ↗ | (On Diff #444089) | the autogenerated CHECKs aren't in here, it should be |
commit looks good, but the message/title should be updated as I commented before
can you check in just the test now, making sure to regenerate the CHECK lines to work with ToT LLVM (which will unswitch both loops), then rebase this on top of that? so we can see the diff this patch causes. the test commit can be called something like [test][SimpleLoopUnswitch] Precommit test for D129599
llvm/test/Transforms/SimpleLoopUnswitch/PGO-nontrivial-unswitch.ll | ||
---|---|---|
2 ↗ | (On Diff #444089) | sorry, typo, yeah we don't need print<loops> |
lg with the commit title/description change
(one thing you can do is change it in phab, then run arc amend which will pull the commit title/description changes and also add a Reviewed-by: line)
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | ||
---|---|---|
3231–3245 | unintentional change |
Thank you for adding this!
Changes look good. Please address the comment about the title change before committing.
I noticed significant performance degradation (~30%) on a spec benchmark due to this commit. isColdBlock doesn't seem to work as expected, because it considered cold a loop that was in a hot function through the profile.
For example, it could be the case that the loop in question is nested and is vectorizable after being unswitched, so not unswitching it can result in performance degradation if the surrounding function is hot.
void hotFunction(int M, int N, int * A, int *B, int *C){ for (j = 0; j < M; j++) for (i=0; i < N; i++) { A[i] = B[i] + C[i] if (cond) do_something(); } }
Could you please revert this commit while you work on replacing the isColdBlock check with something stronger?
@alexgatea Thanks for the feedback.
isColdBlock should be global. One example is in (https://llvm.org/doxygen/ProfileSummaryInfo_8cpp_source.html#l00142), which uses isColdBlock to check whether a function is cold or not. So I assume the issue is the loop is non-cold in real case, but was concerned as cold in the profile data. I kindly suggest updating the profile data to see if there is still any degradation.
Please correct me if I am wrong. Thanks
I appreciate your prompt response. A couple of points ...
The profile data is updated every time I run the spec benchmark, so it's accurate. And the 30% degradation is significant.
Thank you for pointing out that example; I agree that isColdBlock should be global.
I actually called isFunctionColdInCallGraph on the function in question and it returns false, so this function is hot. I also checked the ColdCount of the loop in question and it is 1 (in particular non-zero); another hot block in this same function has a ColdCount of 2000. My guess is that the block in question is cold relative to other blocks in the function (by a factor of ~2000 I guess) but is still itself significant. And of course, this cold block analysis doesn't take into consideration how much the loop itself is optimizable if unswitched so it could still be beneficial to unswitch even though it is a cold block.
Please let me know your thoughts.
As suggested by other users, we need to make the skip check more restrict to avoid performance degradation.
Avoid applying non-trivial unswitching in cold Function. Compared with previous PGO solution, this version will apply non-trivial unswitching on cold loops in hot functions.
every patch should be its own phabricator patch, it's not good practice to overwrite one that's been submitted (excluding if the patch was reverted) because people tend to use D### to refer to a patch. can you create a new one?
should have a test with a hot and cold function now, rather than two loops in one function
Sorry for the mistake, I will keep this revision open and make it related to the update revision for the regression.
still looks like you need to run git clang-format HEAD~