Page MenuHomePhabricator

[SimpleLoopUnswitch] Skip non-trivial unswitching of cold loops
AcceptedPublic

Authored by drcut on Jul 12 2022, 3:05 PM.

Details

Summary

With profile data, non-trivial LoopUnswitch will only apply on non-cold loops, as unswitching cold loops may not gain much benefit but significantly increase the code size.

Diff Detail

Unit TestsFailed

TimeTest
60,040 msx64 debian > MLIR.Examples/standalone::test.toy
Script: -- : 'RUN: at line 1'; /usr/bin/cmake /var/lib/buildkite-agent/builds/llvm-project/mlir/examples/standalone -G "Ninja" -DCMAKE_CXX_COMPILER=/usr/bin/clang++ -DCMAKE_C_COMPILER=/usr/bin/clang -DLLVM_ENABLE_LIBCXX=OFF -DMLIR_DIR=/var/lib/buildkite-agent/builds/llvm-project/build/lib/cmake/mlir -DLLVM_USE_LINKER=lld

Event Timeline

drcut created this revision.Jul 12 2022, 3:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 12 2022, 3:05 PM
drcut requested review of this revision.Jul 12 2022, 3:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 12 2022, 3:05 PM

the title should be something like [SimpleLoopUnswitch] Skip non-trivial unswitching of cold loops

otherwise this looks like the right approach

llvm/lib/Passes/PassBuilder.cpp
1402–1405

run git clang-format?

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
3121

as we found offline, this may return nullptr in the case of opt -aa-pipeline=, we should bail out in that case

I'd add a RUN: opt -passes=simple-loop-unswitch -aa-pipeline= to the test to make sure it doesn't crash (no need to check IR)

3237

just directly pass nullptr below

3286

these are specific to the legacy pass and don't need to be here since we're not actulaly using BFI in the legacy pass

llvm/test/Transforms/SimpleLoopUnswitch/PGO-nontrivial-unswitch.ll
2

I'd just use llvm/utils/update_test_checks.py rather than manually checking loops
once the test looks good it should be precommitted so we can see the delta this patch gives

drcut updated this revision to Diff 450083.Aug 4 2022, 11:28 AM

[llvm] Implement PGO for SLU
Using PGO for SLU to avoid non-trivial unswitch code on code loops, which will increase the binary code size without gain much benefit.

aeubanks added inline comments.Aug 4 2022, 1:56 PM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
3120

still looks like you need to run git clang-format HEAD~

llvm/test/Transforms/SimpleLoopUnswitch/PGO-nontrivial-unswitch.ll
2

the autogenerated CHECKs aren't in here, it should be
; RUN: opt < %s -passes='require<profile-summary>,function(loop-mssa(simple-loop-unswitch<nontrivial>)),print<loops>' -S | FileCheck %, then delete the existing CHECKs and run update_test_checks.py

drcut updated this revision to Diff 450132.Aug 4 2022, 2:34 PM
drcut marked 3 inline comments as done.

[llvm] Change test file for PGO on SLU

commit looks good, but the message/title should be updated as I commented before

can you check in just the test now, making sure to regenerate the CHECK lines to work with ToT LLVM (which will unswitch both loops), then rebase this on top of that? so we can see the diff this patch causes. the test commit can be called something like [test][SimpleLoopUnswitch] Precommit test for D129599

fhahn added a subscriber: fhahn.Aug 5 2022, 4:12 AM
fhahn added inline comments.
llvm/test/Transforms/SimpleLoopUnswitch/PGO-nontrivial-unswitch.ll
2

,print<loops>' is this needed? It looks like you are not checking the output (which is emitted to stderr)

5

if you are not checking the output, this should probably use -disable-output.

aeubanks added inline comments.Aug 5 2022, 8:55 AM
llvm/test/Transforms/SimpleLoopUnswitch/PGO-nontrivial-unswitch.ll
2

sorry, typo, yeah we don't need print<loops>

drcut updated this revision to Diff 450326.Aug 5 2022, 10:47 AM
drcut marked 4 inline comments as done.

[test][SimpleLoopUnswitch] Precommit test for D129599

drcut updated this revision to Diff 450405.Aug 5 2022, 2:23 PM
drcut marked 3 inline comments as done.

[test] Update PGO SLU test

aeubanks accepted this revision.Aug 6 2022, 11:37 AM

lg with the commit title/description change
(one thing you can do is change it in phab, then run arc amend which will pull the commit title/description changes and also add a Reviewed-by: line)

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
3217–3231

unintentional change

This revision is now accepted and ready to land.Aug 6 2022, 11:37 AM
asbirlea accepted this revision.Aug 8 2022, 10:35 AM

Thank you for adding this!

Changes look good. Please address the comment about the title change before committing.

drcut retitled this revision from Apply PGO on SimpleLoopUnswitch to [SimpleLoopUnswitch] Skip non-trivial unswitching of cold loops.Aug 8 2022, 11:08 AM
drcut edited the summary of this revision. (Show Details)
This revision was landed with ongoing or failed builds.Aug 8 2022, 11:13 AM
This revision was automatically updated to reflect the committed changes.
alexgatea reopened this revision.Aug 25 2022, 6:29 AM
alexgatea added a subscriber: alexgatea.

I noticed significant performance degradation (~30%) on a spec benchmark due to this commit. isColdBlock doesn't seem to work as expected, because it considered cold a loop that was in a hot function through the profile.

For example, it could be the case that the loop in question is nested and is vectorizable after being unswitched, so not unswitching it can result in performance degradation if the surrounding function is hot.

void hotFunction(int M, int N, int * A, int *B, int *C){
for (j = 0; j < M; j++)
   for (i=0; i < N; i++) {
      A[i] = B[i] + C[i]
      if (cond) do_something();
   }
}

Could you please revert this commit while you work on replacing the isColdBlock check with something stronger?

This revision is now accepted and ready to land.Aug 25 2022, 6:29 AM
drcut added a comment.Aug 25 2022, 7:40 AM

@alexgatea Thanks for the feedback.
isColdBlock should be global. One example is in (https://llvm.org/doxygen/ProfileSummaryInfo_8cpp_source.html#l00142), which uses isColdBlock to check whether a function is cold or not. So I assume the issue is the loop is non-cold in real case, but was concerned as cold in the profile data. I kindly suggest updating the profile data to see if there is still any degradation.
Please correct me if I am wrong. Thanks

I noticed significant performance degradation (~30%) on a spec benchmark due to this commit. isColdBlock doesn't seem to work as expected, because it considered cold a loop that was in a hot function through the profile.

For example, it could be the case that the loop in question is nested and is vectorizable after being unswitched, so not unswitching it can result in performance degradation if the surrounding function is hot.

void hotFunction(int M, int N, int * A, int *B, int *C){
for (j = 0; j < M; j++)
   for (i=0; i < N; i++) {
      A[i] = B[i] + C[i]
      if (cond) do_something();
   }
}

Could you please revert this commit while you work on replacing the isColdBlock check with something stronger?

@alexgatea Thanks for the feedback.
isColdBlock should be global. One example is in (https://llvm.org/doxygen/ProfileSummaryInfo_8cpp_source.html#l00142), which uses isColdBlock to check whether a function is cold or not. So I assume the issue is the loop is non-cold in real case, but was concerned as cold in the profile data. I kindly suggest updating the profile data to see if there is still any degradation.
Please correct me if I am wrong. Thanks

I appreciate your prompt response. A couple of points ...
The profile data is updated every time I run the spec benchmark, so it's accurate. And the 30% degradation is significant.
Thank you for pointing out that example; I agree that isColdBlock should be global.
I actually called isFunctionColdInCallGraph on the function in question and it returns false, so this function is hot. I also checked the ColdCount of the loop in question and it is 1 (in particular non-zero); another hot block in this same function has a ColdCount of 2000. My guess is that the block in question is cold relative to other blocks in the function (by a factor of ~2000 I guess) but is still itself significant. And of course, this cold block analysis doesn't take into consideration how much the loop itself is optimizable if unswitched so it could still be beneficial to unswitch even though it is a cold block.
Please let me know your thoughts.

we can change the condition to if the function is cold

we can change the condition to if the function is cold

That seems reasonable to me

drcut updated this revision to Diff 456138.Aug 27 2022, 11:51 AM

As suggested by other users, we need to make the skip check more restrict to avoid performance degradation.

@alexgatea please try whether the new revision could solve your problem. Thanks

@alexgatea please try whether the new revision could solve your problem. Thanks

I have verified that the new revision solves my problem. Thank you!

drcut updated this revision to Diff 456331.Aug 29 2022, 7:15 AM

Avoid applying non-trivial unswitching in cold Function. Compared with previous PGO solution, this version will apply non-trivial unswitching on cold loops in hot functions.

drcut added a comment.Aug 29 2022, 7:16 AM
This comment was removed by drcut.

every patch should be its own phabricator patch, it's not good practice to overwrite one that's been submitted (excluding if the patch was reverted) because people tend to use D### to refer to a patch. can you create a new one?

should have a test with a hot and cold function now, rather than two loops in one function

drcut abandoned this revision.Sep 4 2022, 7:31 AM

I will create a new revision for this patch

drcut reclaimed this revision.Sep 4 2022, 7:37 AM

Sorry for the mistake, I will keep this revision open and make it related to the update revision for the regression.

This revision is now accepted and ready to land.Sep 4 2022, 7:37 AM
drcut updated this revision to Diff 457861.Sep 4 2022, 7:57 AM

revert to previous revision that has been merged into main branch