This is an archive of the discontinued LLVM Phabricator instance.

[PM/CC1] Add -f[no-]split-cold-code CC1 options to toggle splitting
ClosedPublic

Authored by compnerd on Jan 25 2019, 4:28 PM.

Details

Summary

This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
the splitting pass to be toggled on/off. The current method of passing
-mllvm -hot-cold-split=true to clang isn't ideal as it may not compose
correctly (say, with -O0 or -Oz).

To implement the -fsplit-cold-code option, an attribute is applied to
functions to indicate that they may be considered for splitting. This
removes some complexity from the old/new PM pipeline builders, and
behaves as expected when LTO is enabled.

Diff Detail

Event Timeline

vsk created this revision.Jan 25 2019, 4:28 PM
vsk updated this revision to Diff 183661.Jan 25 2019, 4:41 PM
  • Fix a think-o in a comment.
This revision is now accepted and ready to land.Feb 6 2019, 11:08 AM

Thanks @hiraditya. I'd also like to get a +1 from someone who's worked on the NewPM infrastructure extensively, just to make sure those changes look ok.

vsk updated this revision to Diff 185642.Feb 6 2019, 2:16 PM

Rebase.

Overall it looks ok to me, but I'd like Chandler to comment regarding the preferred way to do this with the new PM, since we don't tend to use booleans there in the PassBuilder to control passes. Is it preferable to instead use a new function attribute instead of boolean flags on the PMs (e.g. the way -fno-inline is handled)?

clang/lib/Frontend/CompilerInvocation.cpp
1340

would it be appropriate to give a warning when being ignored?

Herald added a project: Restricted Project. · View Herald TranscriptFeb 8 2019, 9:34 AM

I'd prefer not adding this kind of state to PassBuilder. SplitColdCode is soemthing that refers to the construction of one particular pipeline, not to pipeline-building in general. It should be an argument passed down through the build*Pipeline calls.

vsk marked an inline comment as done.Feb 11 2019, 11:39 AM

I'd prefer not adding this kind of state to PassBuilder. SplitColdCode is soemthing that refers to the construction of one particular pipeline, not to pipeline-building in general. It should be an argument passed down through the build*Pipeline calls.

I'm not sure I understand the distinction being made here -- could you elaborate? Isn't enabling a pass related to pipeline-building in general? If not, I don't see how arguments to build*Pipeline do become related to pipeline-building in general.

For context, I've modeled the addition of the SplitColdCode member to PassBuilder on PGOOpt (c.f. PGOOptions::RunProfileGen). One thing I like about this approach is that it doesn't require changing IR, or changing very many different APIs. But if it's really not reasonable, I'm happy to take another shot at it.

clang/lib/Frontend/CompilerInvocation.cpp
1340

I think so, I'll add this in an update.

In D57265#1393453, @vsk wrote:

I'd prefer not adding this kind of state to PassBuilder. SplitColdCode is soemthing that refers to the construction of one particular pipeline, not to pipeline-building in general. It should be an argument passed down through the build*Pipeline calls.

I'm not sure I understand the distinction being made here -- could you elaborate? Isn't enabling a pass related to pipeline-building in general? If not, I don't see how arguments to build*Pipeline do become related to pipeline-building in general.

For context, I've modeled the addition of the SplitColdCode member to PassBuilder on PGOOpt (c.f. PGOOptions::RunProfileGen). One thing I like about this approach is that it doesn't require changing IR, or changing very many different APIs. But if it's really not reasonable, I'm happy to take another shot at it.

I cant say for PGOOpt in particular, but overall idea of PassBuilder is that, well, it is not "builder" as per "builder pattern".
You do not fill it with parts of a complex pipeline object and then press a magical "build" button.
Instead you ask it to build various "named" pipelines, or just parse it from text.
If you compare with PassManagerBuilder, which contains a dozen of various data members that seem to drive the pipeline construction,
PassBuilder contains only a few. And the purpose of PassBuilder members is to be used during individual *pass*es creation.

Say, you wont find OptLevel there. Instead it is being passed as an argument to build*Pipeline functions.

As for PGOOpts - it seems to be the only member that stands out.
And to me it looks like we should remove it from PassBuilder either, and start passing it to build*Pipeline functions as well.
But honestly, this is the first time I really looked through the PGOOpts code, so I might be well mistaken.

vsk added a comment.Mar 22 2019, 3:38 PM
In D57265#1393453, @vsk wrote:

I'd prefer not adding this kind of state to PassBuilder. SplitColdCode is soemthing that refers to the construction of one particular pipeline, not to pipeline-building in general. It should be an argument passed down through the build*Pipeline calls.

I'm not sure I understand the distinction being made here -- could you elaborate? Isn't enabling a pass related to pipeline-building in general? If not, I don't see how arguments to build*Pipeline do become related to pipeline-building in general.

For context, I've modeled the addition of the SplitColdCode member to PassBuilder on PGOOpt (c.f. PGOOptions::RunProfileGen). One thing I like about this approach is that it doesn't require changing IR, or changing very many different APIs. But if it's really not reasonable, I'm happy to take another shot at it.

I cant say for PGOOpt in particular, but overall idea of PassBuilder is that, well, it is not "builder" as per "builder pattern".
You do not fill it with parts of a complex pipeline object and then press a magical "build" button.
Instead you ask it to build various "named" pipelines, or just parse it from text.
If you compare with PassManagerBuilder, which contains a dozen of various data members that seem to drive the pipeline construction,
PassBuilder contains only a few. And the purpose of PassBuilder members is to be used during individual *pass*es creation.

Say, you wont find OptLevel there. Instead it is being passed as an argument to build*Pipeline functions.

As for PGOOpts - it seems to be the only member that stands out.
And to me it looks like we should remove it from PassBuilder either, and start passing it to build*Pipeline functions as well.
But honestly, this is the first time I really looked through the PGOOpts code, so I might be well mistaken.

Thanks everyone for your patient feedback. I think I understand the advantages of keeping state out of the pipeline builder now.

Teresa's earlier suggestion to use an IR attribute seems like a promising approach. In particular, it makes it simple to correctly enable/disable splitting during LTO. I'll try to post an updated patch soon.

vsk updated this revision to Diff 192210.Mar 25 2019, 2:42 PM
vsk edited the summary of this revision. (Show Details)

Use a function attribute to implement the -fsplit-cold-code option. This removes some complexity from the old/new PMs and ensures consistent behavior when LTO is enabled.

it will be great to merge this patch.

fhahn added a subscriber: fhahn.Sep 29 2019, 8:06 AM

This seems fine to me.

IIUC the only potential drawback with the old pass manager is that we potentially have to run the required passes unconditionally, even if we do not use them. Vedant, did you have a chance to check the impact on overall compile time?

clang/test/CodeGen/split-cold-code.c
69

Could you extend the scope of the check to include a bit more context, i.e. make sure we emit a function attribute attached to the correct function?

llvm/test/Other/opt-Os-pipeline.ll
249

Are we missing an implementation of getPassName in HotColdSplitting?

plotfi added a subscriber: plotfi.Jan 27 2020, 11:18 PM

Why hasn't this patch landed in llvm.org? I see it on the github apple/master but not on llvm.org master.

vsk updated this revision to Diff 241019.Jan 28 2020, 4:20 PM
vsk marked 2 inline comments as done.
vsk edited the summary of this revision. (Show Details)

Rebase, address Florian's feedback.

Apologies for the delay here. I haven't had time to push this forward, but I have also been waiting to get an additional lgtm specifically for the PassManager changes. My compile-time measurements are a year old (I'll work on getting fresh numbers), but as I remember the change was in the noise.

rjf added a subscriber: rjf.Aug 9 2020, 11:24 PM

Would also really like to see this patch landed. Also, as of today (8/10/2020), the patch cannot be cleanly applied into trunk without inducing merge conflicts anymore.

In D57265#2205797, @rjf wrote:

Would also really like to see this patch landed. Also, as of today (8/10/2020), the patch cannot be cleanly applied into trunk without inducing merge conflicts anymore.

You might have some luck merging from github.com/apple/llvm-project. You may also want to look at some of the upstreaming work some of the Apple folks are doing at https://github.com/llvm/llvm-project-staging

plotfi added a comment.Oct 7 2020, 2:08 PM

@vsk any update on this? Is there anything we can do to help in landing this patch in llvm-project/main?

vsk added a comment.Oct 12 2020, 5:37 PM

@plotfi Sorry this work has stalled, unfortunately I haven't had any bandwidth to drive it forward.

At this point I don't think there are any outstanding concerns with this patch. If anyone is willing to rebase and land it, I would be really grateful. It looks like part of the motivation behind this is removing some downstream diff with the apple/llvm-project fork (as in D89078). I think extra work might be needed to do that, since the HotColdSplitting pass is on-by-default in the Apple fork. Turning the pass on by default on llvm.org could result in a fair amount of fallout, as the pass might interact badly with the machine function splitter (https://lists.llvm.org/pipermail/llvm-dev/2020-August/144012.html). It also hasn't had as much testing on non-Darwin targets (afaik). One solution might be to enable -fsplit-cold-code under -O only for Darwin targets.

compnerd commandeered this revision.Oct 13 2020, 3:50 PM
compnerd updated this revision to Diff 297983.
compnerd added a reviewer: vsk.

Rebase; there are a 4 more test failures to go through

Herald added a project: Restricted Project. · View Herald TranscriptOct 13 2020, 3:52 PM
compnerd updated this revision to Diff 298425.Oct 15 2020, 11:21 AM

Passes locally now

vsk accepted this revision.Oct 15 2020, 11:25 AM

Thank you! LGTM.

This revision was landed with ongoing or failed builds.Oct 15 2020, 4:15 PM
This revision was automatically updated to reflect the committed changes.
hans added a subscriber: hans.Oct 19 2020, 3:31 AM

This broke Chromium's build, causing it to fail with
"Cannot pop empty stack!" from one of the backend passes.

That specific error is probably a perhaps a previously existing problem, but I think it was triggered by hot-cold-splitting getting enabled by this pass. This was a surprise to us, because we're not passing any flags to enable hot-cold-splitting, instead it seems to get enabled by this patch anyway in our PGO-enabled builds.

See https://bugs.chromium.org/p/chromium/issues/detail?id=1139384#c5 for a repro.

I've reverted in 0628bea5137047232f37c94b74bf26aa9b55f605 until this can be investigated.

vsk added a comment.Oct 19 2020, 10:45 AM

Thanks Hans. I've merged the revert into the apple fork. While the commit was in-tree, we noticed that this enabled HCS during LTO. There was a surprising ~17% regression of 483.xalancbmk with LTO (+ PGO) enabled: 2020-10-16 CINT2006/483.xalancbmk 17.87%. We don't see the regression on the non-LTO performance bots. This is something we'll need to investigate before relanding.

It looks like this caused breakage for us for x86 Linux kernels, too: https://github.com/ClangBuiltLinux/linux/issues/1177. The Linux kernel does not and will not link against libgcc_s/compiler-rt, so optimizations at -Oz tend to both produce libcalls that aren't implemented by the kernel's runtime, and larger code than -Os (see https://bugs.llvm.org/show_bug.cgi?id=47897 which was also filed in response to this patch's resulting bug). It appears with this patch applied, that any function marked __attribute__((cold)) (explicitly or due to AutoFDO profiles) would be optimized at -Oz, regardless of optimization level. (I think that approach is ok, but I would prefer -Os if possible to avoid the excessive code bloat that -Oz produces and avoid the libcalls; we have found -Oz to never be smaller than -Os for kernel builds).