This is an archive of the discontinued LLVM Phabricator instance.

Differential D85628

[HotColdSplitting] Add command line options for supplying cold function names via user input.
Needs Review (Public)

Authored by rjf on Aug 10 2020, 12:22 AM.

Details

Reviewers

hiraditya
rcorcs

Summary

In some cases, the user might want to specify which functions are
explicitly cold. This commit adds two options, cold-functions-list and
cold-functions-file, that enable the user to supply lists of cold
function names to hot/cold splitting. The optimization pass will then
mark any function encountered with the same name as cold.
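A minimal standalone sketch of the behavior the summary describes, under stated assumptions: the command-line value is comma-separated, and the file holds one function name per line with `#` comments. `FunctionStub`, `parseCommaList`, `parseNameFile`, and `markColdFunctions` are hypothetical names used only for illustration. Per the header diff, the patch itself keeps command-line names in an `llvm::StringSet<>` and file contents in an `llvm::SpecialCaseList`, whose file syntax differs from the simplified format assumed here.

```cpp
#include <cassert>
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR function: just a name and a cold flag.
struct FunctionStub {
  std::string Name;
  bool Cold = false;
};

// Split a comma-separated option value such as "foo,bar" into names.
std::set<std::string> parseCommaList(const std::string &Value) {
  std::set<std::string> Names;
  std::stringstream SS(Value);
  std::string Item;
  while (std::getline(SS, Item, ','))
    if (!Item.empty())
      Names.insert(Item);
  return Names;
}

// Read one function name per line; blank lines and lines starting with '#'
// are skipped. This is an assumed format, not SpecialCaseList syntax.
std::set<std::string> parseNameFile(std::istream &In) {
  std::set<std::string> Names;
  std::string Line;
  while (std::getline(In, Line))
    if (!Line.empty() && Line[0] != '#')
      Names.insert(Line);
  return Names;
}

// Mark every function whose name appears in either list as cold.
// Returns how many functions were newly marked.
unsigned markColdFunctions(std::vector<FunctionStub> &Module,
                           const std::set<std::string> &CmdNames,
                           const std::set<std::string> &FileNames) {
  unsigned Marked = 0;
  for (FunctionStub &F : Module) {
    if (F.Cold)
      continue;
    if (CmdNames.count(F.Name) || FileNames.count(F.Name)) {
      F.Cold = true;
      ++Marked;
    }
  }
  return Marked;
}
```

For example, with functions `main`, `foo`, `bar`, and `baz`, a command-line value of `foo,missing`, and a file containing `baz`, exactly `foo` and `baz` end up cold; names with no matching function are simply ignored.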

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rjf created this revision. Aug 10 2020, 12:22 AM

Herald added a project: Restricted Project. Aug 10 2020, 12:22 AM

Herald added a subscriber: llvm-commits.

rjf requested review of this revision. Aug 10 2020, 12:22 AM

Harbormaster completed remote builds in B67671: Diff 284268. Aug 10 2020, 12:53 AM

hiraditya added inline comments. Aug 10 2020, 7:17 AM

llvm/lib/Transforms/IPO/HotColdSplitting.cpp
Line 107: Let's move this inside the HotColdSplitting class.
Line 712: Let's not read from stdin, only from a regular file.

Based on discussion with @hiraditya and @rcorcs this morning, we think doing this in the ProfileSummaryInfo pass is better (have the PSI pass take user-specified cold functions as input and mark them with the cold attribute), since its benefit can be more wide-ranging than just hot/cold splitting. Will prepare a separate patch for this.

Any reason not to mark up the relevant functions with __attribute__((cold)) directly?

llvm/lib/Transforms/IPO/HotColdSplitting.cpp
Line 700: Might be better to use SpecialCaseList.h rather than hand-rolling something new.

In D85628#2212047, @vsk wrote:

Any reason not to mark up the relevant functions with __attribute__((cold)) directly?

In many instances it may not be possible to annotate the functions, e.g., standard library functions. A recent example we found was __cxa_guard_acquire, which could be cold in many instances.

Address review comments from @vsk, @hiraditya

rjf marked 3 inline comments as done. Aug 12 2020, 7:39 AM

This comment was removed by rjf.

llvm/lib/Transforms/IPO/HotColdSplitting.cpp
Line 700: It seems like the public API of `SpecialCaseList.h` only supports looking up whether a particular string is contained in the list, rather than retrieving all strings in the list. If we do this then we'll either have to a) traverse the entire module to match which function names in the current module are in the list, and mark them as cold, or b)

rjf marked an inline comment as done. Aug 12 2020, 7:41 AM

rjf added inline comments.

llvm/lib/Transforms/IPO/HotColdSplitting.cpp
Line 700: Sorry, the above comment was written (and published) by mistake.

Add debug messages.

Harbormaster completed remote builds in B68114: Diff 285087. Aug 12 2020, 8:27 AM

Harbormaster completed remote builds in B68118: Diff 285089. Aug 12 2020, 9:02 AM

Support marking declarations, e.g. in @hiraditya's example of __cxa_guard_acquire/abort, as cold.

Remove unused includes.

Harbormaster completed remote builds in B68119: Diff 285097. Aug 12 2020, 9:37 AM

Harbormaster completed remote builds in B68122: Diff 285103. Aug 12 2020, 9:55 AM

Add a test case for command line-supplied cold functions list.

Herald added a subscriber: jfb. Aug 12 2020, 10:30 AM

Add the test command for reading from a file.

Harbormaster completed remote builds in B68143: Diff 285140. Aug 12 2020, 11:58 AM

hiraditya added inline comments. Aug 12 2020, 12:14 PM

llvm/test/Transforms/HotColdSplit/custom-cold-cmd.ll
Line 17: Remove trailing .*

Harbormaster completed remote builds in B68141: Diff 285136. Aug 12 2020, 12:24 PM

hiraditya added inline comments. Aug 12 2020, 12:26 PM

llvm/lib/Transforms/IPO/HotColdSplitting.cpp
Line 731: Marking cold functions should be in a separate loop before this loop.

hiraditya added inline comments. Aug 12 2020, 12:28 PM

llvm/lib/Transforms/IPO/HotColdSplitting.cpp
Line 731: Ignore me.

Mark cold functions in advance before outlining them.

Previously, as the test case illustrates, the go() function should have been cold but was not marked as such, because it precedes two functions that are cold but had not yet been marked as cold.

Thanks to @hiraditya for spotting this.
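The ordering problem this revision fixes can be reproduced in a small model. The sketch assumes, purely for illustration, that a function which unconditionally calls an already-cold function is itself treated as cold; the `Fn` type, the callee names, and the helpers `singlePass`/`twoPass` are hypothetical and not taken from the patch. With one loop, `go()` is examined before its listed callees are marked, so it is missed; marking listed functions in a separate, earlier loop avoids that.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: each function lists the callees it calls unconditionally.
struct Fn {
  std::string Name;
  std::vector<std::string> UnconditionalCallees;
  bool Cold = false;
};

// Assumed rule for this sketch: calling an already-cold function
// unconditionally makes the caller cold as well.
static bool callsColdFunction(const Fn &F, const std::vector<Fn> &Module) {
  for (const std::string &Callee : F.UnconditionalCallees)
    for (const Fn &G : Module)
      if (G.Name == Callee && G.Cold)
        return true;
  return false;
}

// One loop: list-marking and the coldness query are interleaved, so the
// result depends on the order in which functions are visited.
void singlePass(std::vector<Fn> &Module, const std::set<std::string> &List) {
  for (Fn &F : Module) {
    if (List.count(F.Name))
      F.Cold = true;
    if (callsColdFunction(F, Module))
      F.Cold = true;
  }
}

// Two loops: every listed function is marked before any caller is examined,
// so visit order no longer matters for list-marked callees.
void twoPass(std::vector<Fn> &Module, const std::set<std::string> &List) {
  for (Fn &F : Module)
    if (List.count(F.Name))
      F.Cold = true;
  for (Fn &F : Module)
    if (callsColdFunction(F, Module))
      F.Cold = true;
}
```

With the module ordered as `go`, `cold1`, `cold2` and both callees listed, `singlePass` leaves `go` unmarked while `twoPass` marks all three.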

rjf marked 2 inline comments as done. Aug 12 2020, 12:40 PM

I’m not convinced this is a good idea. In what use case is it not possible to mark up relevant functions? It doesn’t make sense to me to make alterations to standard library functions within the compiler. It seems better to simply patch the standard library. In some cases LLVM does infer function attributes for library functions, but these are generally lower-level attributes that can’t be specified at the source level, and the attribute is made available to other passes in the pipeline.

In D85628#2213919, @vsk wrote:

I’m not convinced this is a good idea. In what use case is it not possible to mark up relevant functions? It doesn’t make sense to me to make alterations to standard library functions within the compiler. It seems better to simply patch the standard library. In some cases LLVM does infer function attributes for library functions, but these are generally lower-level attributes that can’t be specified at the source level, and the attribute is made available to other passes in the pipeline.

Do you mean this patch isn't a good idea in general, or that the recent revision isn't a good idea? If the latter, I'm not sure whether you mean we should not mark declarations as cold, or that we should not split the original loop into two (i.e., marking functions as cold before outlining). IMO splitting the loop into two simply serves the original intent of what we're doing, which is to mark certain functions as cold before outlining. On the other hand, if we don't mark declarations as cold via user-provided input, @hiraditya's proposed testcase becomes useless. Alternatively, we don't have to make the testcase involve standard library functions if that's what you want :).

Harbormaster completed remote builds in B68156: Diff 285163. Aug 12 2020, 1:58 PM

Update test case to reflect newest changes

rjf marked an inline comment as done. Aug 12 2020, 3:01 PM

In D85628#2213940, @rjf wrote:

In D85628#2213919, @vsk wrote:

I’m not convinced this is a good idea. In what use case is it not possible to mark up relevant functions? It doesn’t make sense to me to make alterations to standard library functions within the compiler. It seems better to simply patch the standard library. In some cases LLVM does infer function attributes for library functions, but these are generally lower-level attributes that can’t be specified at the source level, and the attribute is made available to other passes in the pipeline.

Do you mean this patch isn't a good idea in general, or that the recent revision isn't a good idea? If the latter, I'm not sure whether you mean we should not mark declarations as cold, or that we should not split the original loop into two (i.e., marking functions as cold before outlining). IMO splitting the loop into two simply serves the original intent of what we're doing, which is to mark certain functions as cold before outlining. On the other hand, if we don't mark declarations as cold via user-provided input, @hiraditya's proposed testcase becomes useless. Alternatively, we don't have to make the testcase involve standard library functions if that's what you want :).

My understanding is that today code can be considered "cold" based on the following:

- Attribute on the function
- Likely / unlikely annotations
- Profile information
- Other compiler heuristics

This adds another way to do it, but it's kind of a side-injection and it doesn't seem particularly principled. Presumably the list you're feeding through the command-line comes from a profile? Why isn't it provided as profile information?

It doesn’t make sense to me to make alterations to standard library functions within the compiler. It seems better to simply patch the standard library.

The standard library function is only an example. There could be cases where the same function is hot in one instance but cold in another. Annotating function declarations as hot or cold could introduce regressions in such cases.

This adds another way to do it, but it's kind of a side-injection and it doesn't seem particularly principled. Presumably the list you're feeding through the command-line comes from a profile? Why isn't it provided as profile information?

For example, in https://godbolt.org/z/ThaGEW the constructor of the static object is always cold; how can we outline it if, say, we don't have profile information? A workload can have a set of cold functions that the programmer knows about but doesn't necessarily have profile information for. If there's a better way to make the compiler aware of this, I'll be more than happy to work on that.

Harbormaster completed remote builds in B68170: Diff 285186. Aug 12 2020, 3:47 PM

In D85628#2214138, @hiraditya wrote:

For example, in https://godbolt.org/z/ThaGEW the constructor of the static object is always cold; how can we outline it if, say, we don't have profile information? A workload can have a set of cold functions that the programmer knows about but doesn't necessarily have profile information for. If there's a better way to make the compiler aware of this, I'll be more than happy to work on that.

I don't think we need to mark the object constructor as cold. We want to mark the inlined slow paths of local static variables (the call to __cxa_guard_acquire, the call to the object constructor, etc.) as cold.

As one possibility, whose consequences I have certainly not thought through but which may already exist to some degree, we can rely on a bit of inference: if __cxa_guard_acquire is marked as cold, then all code which is reachable only by passing the call to __cxa_guard_acquire is also cold. Intuition: blocks which *contain* unconditional cold code are themselves cold by transitivity. Intuition: don't call cold functions where you're not cold.

[[gnu::cold]] void cold_func();
void has_cold_branch(bool b) {
  // nothing here is cold
  if (b) {
    // everything here is cold
    cold_func();
    // everything here is cold
  } else {
    // nothing here is cold
  }
  // nothing here is cold
}
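The inference above can be sketched as a dominator computation over a small hand-built CFG that models `has_cold_branch`: a block that calls a cold function seeds coldness, and every block it dominates is treated as cold. This is a simplified illustration, not the pass's actual implementation; the `CFG` type, the block names, and the split of the `then` branch into `then.call`/`then.rest` are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A tiny CFG: block 0 is the entry; Succs[i] lists the successors of block i.
struct CFG {
  std::vector<std::string> Names;
  std::vector<std::vector<int>> Succs;
};

// Iterative dataflow: Dom(n) = {n} U (intersection of Dom(p) over preds p).
// Dom[n][d] is true when block d dominates block n. Assumes every block is
// reachable from the entry.
std::vector<std::vector<bool>> computeDominators(const CFG &G) {
  const std::size_t N = G.Names.size();
  std::vector<std::vector<int>> Preds(N);
  for (std::size_t B = 0; B < N; ++B)
    for (int S : G.Succs[B])
      Preds[S].push_back(static_cast<int>(B));

  // Start from "everything dominates everything", except for the entry.
  std::vector<std::vector<bool>> Dom(N, std::vector<bool>(N, true));
  Dom[0].assign(N, false);
  Dom[0][0] = true;

  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t B = 1; B < N; ++B) {
      std::vector<bool> New(N, true);
      for (int P : Preds[B])
        for (std::size_t D = 0; D < N; ++D)
          New[D] = New[D] && Dom[P][D];
      New[B] = true;
      if (New != Dom[B]) {
        Dom[B] = New;
        Changed = true;
      }
    }
  }
  return Dom;
}

// A block is cold if it is dominated by (or is) a block that calls a cold
// function.
std::vector<bool> propagateColdness(const CFG &G,
                                    const std::vector<bool> &CallsColdFn) {
  const std::size_t N = G.Names.size();
  assert(CallsColdFn.size() == N && "one seed flag per block");
  std::vector<std::vector<bool>> Dom = computeDominators(G);
  std::vector<bool> Cold(N, false);
  for (std::size_t B = 0; B < N; ++B)
    for (std::size_t D = 0; D < N; ++D)
      if (CallsColdFn[D] && Dom[B][D])
        Cold[B] = true;
  return Cold;
}
```

For `entry -> {then.call, else}`, `then.call -> then.rest -> exit`, and `else -> exit`, seeding only `then.call` makes `then.call` and `then.rest` cold, while `exit` stays hot because it is also reachable through `else`, matching the comments in the example. With loops, a block dominated by a cold block can still run many times per execution of that block, so a real pass needs more care than this sketch.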

In particular, this would hopefully mean that the object constructor call which appears past the __cxa_guard_acquire would not get inlined into the call-site; rather, a call to the constructor would be emitted. (Unless inlining the constructor into the call-site is shorter than calling the constructor.) In this case, we would not compile the object constructor optimized for coldness - we would compile the object constructor as normal. In fact, since the same constructor is used in hot paths, we must not compile it as cold. But we would compile the call-site as though it were cold, since __cxa_guard_acquire is marked cold and this call site unconditionally passes through __cxa_guard_acquire.

I don't think we need to mark the object constructor as cold. We want to mark the inlined slow paths of local static variables (the call to __cxa_guard_acquire, the call to the object constructor, etc.) as cold.

If there were a way to provide a handwritten profile/coverage file, maybe that would work in the absence of profile information?

As one possibility the consequences of which I have certainly not thought through, but which maybe already exists to some degree, we can depend on a bit of inference

if __cxa_guard_acquire is marked as cold, then all code which is accessible only by passing the call to __cxa_guard_acquire is also cold.

This is taken care of by the dominance relation; HCS uses it to infer coldness/hotness.

In D85628#2214314, @hiraditya wrote:

If there were a way to provide a handwritten profile/coverage file, maybe that would work in the absence of profile information?

I am not sure I see a need for profiles here.

I don't think we need profile information. We just need __cxa_guard_acquire to be marked cold, and for the compiler to infer coldness of code in the same block as a call to something marked as cold. (Apparently HCS does this?)

The compiler can also infer coldness of hidden functions which are only called by cold code.

But let's take std::string as an example. Let's say we have a function with a local static variable of type std::string. The goal is to have the inlined slow path outlined to the cold section, and for the slow path to be minimal in size. So we just emit a call to the std::string ctor, which is compiled normally since it is inline and not hidden, instead of inlining the ctor into the slow path. We can make the assumption that the std::string ctor is ODR-used *somewhere* in the resulting binary, so we can make the assumption that forcing a reference to this function will not increase overall code size.

Whether the compiler emits a perf-optimized or a size-optimized definition for the std::string ctor may be influenced by profiles. But that seems to me like a separate question that doesn't pertain to the specific topic of handling local static variables.

But for another type, say hidden_foo, which is defined in an anonymous namespace and is constructed only once, at a site inferred to be cold, we can take a different approach and possibly inline the ctor. Here, if there is only one site, minimal code size would (likely) imply inlining the ctor; but if there are two sites, minimal code size would (likely) imply emitting a ctor definition and then calling it twice.
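The one-site versus two-site reasoning, and the earlier point that a call to an already-emitted `std::string` ctor adds little size, can be written as a simple code-size comparison. This is a sketch in hypothetical, abstract size units: inlining a body of size `BodySize` at `Sites` call sites costs `Sites * BodySize`, while calling an out-of-line copy costs `Sites * CallSize` plus `BodySize` if the definition would not be emitted anyway. Real inliner cost models are considerably more involved; all function and parameter names here are illustrative.

```cpp
#include <cassert>

// Sizes are in hypothetical, abstract units.
// Sites:    number of call sites that construct the object
// BodySize: size of the constructor body when inlined
// CallSize: size of the call sequence at one site
unsigned inlinedSize(unsigned Sites, unsigned BodySize) {
  return Sites * BodySize;
}

// DefinitionAlreadyEmitted: the out-of-line definition is in the binary
// anyway (e.g. because it is ODR-used elsewhere), so it adds no size here.
unsigned outlinedSize(unsigned Sites, unsigned BodySize, unsigned CallSize,
                      bool DefinitionAlreadyEmitted) {
  return (DefinitionAlreadyEmitted ? 0 : BodySize) + Sites * CallSize;
}

// Inline only when that is strictly smaller than calling an out-of-line copy.
bool shouldInlineForSize(unsigned Sites, unsigned BodySize, unsigned CallSize,
                         bool DefinitionAlreadyEmitted) {
  assert(Sites > 0 && "expected at least one call site");
  return inlinedSize(Sites, BodySize) <
         outlinedSize(Sites, BodySize, CallSize, DefinitionAlreadyEmitted);
}
```

With one site and no pre-existing definition, inlining wins whenever a call has nonzero cost; with two sites it wins only for very small bodies. When the definition is emitted anyway, emitting a call is the smaller choice whenever the body is at least as large as the call sequence.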

In D85628#2214429, @yfeldblum wrote:

In D85628#2214314, @hiraditya wrote:

If there were a way to provide a handwritten profile/coverage file, maybe that would work in the absence of profile information?

I am not sure I see a need for profiles here.

I don't think we need profile information. We just need __cxa_guard_acquire to be marked cold, and for the compiler to infer coldness of code in the same block as a call to something marked as cold. (Apparently HCS does this?)

I think you missed Aditya's point here: he's saying that certain things like outlining __cxa* functions might be unwise to mark as cold in general, hence leaving a command line option to supply user-defined cold func names might be a good idea for ad-hoc workloads; and alternatively to JF's point, he's saying if the command line option sounds too unprincipled we can also take in a more principled file format like code coverage or something like that, should that be available.

And also, yes, HCS propagates coldness information through the CFG. You can look at the implementation in OutliningRegion for more details.

Apparently there is already support for an extended binary profile format (https://reviews.llvm.org/D66766), and we may be able to use it. Thanks to @wenlei for sharing this information.

In D85628#2214536, @rjf wrote:

I think you missed Aditya's point here: he's saying that certain things like outlining __cxa* functions might be unwise to mark as cold in general

Maybe I missed that, or maybe I am missing some context. I didn't think there was a top-line goal of injecting cold attributes to arbitrary functions.

My understanding is that a key motivating case is minimizing native codegen for local static variables by understanding that the slow path is definitionally cold. If coldness propagates, then as long as __cxa_guard_acquire is attributed as cold, coldness propagation should extend to everywhere of interest. And since __cxa_guard_acquire is used for only one purpose, namely guarding local static variable initialization, and since that purpose is definitionally cold, it should be attributed as cold. I didn't think there was a need to inject cold attributes to other functions.

When it comes to library functions which are cold, presumably we can just patch libraries with cold-attributed function definitions but unattributed declarations, and attribute the declarations. It seems like adding compiler support for lists of cold functions is a workaround for not patching libraries?

I have two questions for this patch:

1. Why do we need another way to explicitly tell the compiler that some functions are cold, rather than using existing mechanisms?
   - If this is for a random cold function from user code, then using a profile or in-source annotation should be the way to go, which is more scalable and more sustainable.
   - If this is for special (generated) functions, e.g. __cxa_guard_acquire, then we shouldn't even need to tell the compiler. Just as we don't have to tell the compiler that EH pads and noreturn functions are cold, the compiler should figure this out by itself. __cxa_guard_acquire is a generated function with very specific semantics, so I think it's similar to noreturn functions and EH pads in that regard.
2. If we have to tell the compiler extra hotness/coldness info, why do we do it just for HCS? Hotness is very general and could benefit many optimizations. Introducing a channel to tell a specific optimization about hotness is not good design in general (imagine 10 passes, each with its own way of getting hotness through a bunch of switches). If we really have to do this, I'd say we should do it in the framework, e.g. the static analysis part of BFI.

For my second question, this patch shows how I think this should be done: https://reviews.llvm.org/D79485

In D85628#2214614, @yfeldblum wrote:

In D85628#2214536, @rjf wrote:

I think you missed Aditya's point here: he's saying that certain things like outlining __cxa* functions might be unwise to mark as cold in general

Maybe I missed that, or maybe I am missing some context. I didn't think there was a top-line goal of injecting cold attributes to arbitrary functions.

My understanding is that a key motivating case is minimizing native codegen for local static variables by understanding that the slow path is definitionally cold. If coldness propagates, then as long as __cxa_guard_acquire is attributed as cold, coldness propagation should extend to everywhere of interest. And since __cxa_guard_acquire is used for only one purpose, namely guarding local static variable initialization, and since that purpose is definitionally cold, it should be attributed as cold. I didn't think there was a need to inject cold attributes to other functions.

When it comes to library functions which are cold, presumably we can just patch libraries with cold-attributed function definitions but unattributed declarations, and attribute the declarations.

Added a patch to libcxxabi: https://reviews.llvm.org/D85873

It seems like adding compiler support for lists of cold functions is a workaround for not patching libraries?

I think, in general, it would be useful to have a way to tell the compiler which functions are cold. This would avoid annotations, since a function can be hot in one context and cold in another, e.g., the inlining of std::string::string() which Laxman and I added in https://reviews.llvm.org/D22782. Although that patch improved some workloads quite significantly, it is also a constant source of noise for many other workloads.

In D85628#2214751, @wenlei wrote:

For my second question, this patch shows how I think this should be done: https://reviews.llvm.org/D79485

Thanks for sharing, this looks very promising.

In D85628#2214775, @hiraditya wrote:

Added a patch to libcxxabi: https://reviews.llvm.org/D85873

The patch looks legit. Wonder if such a patch should be sent to libstdc++ too.

But will the compiler see these declarations though when compiling a source file containing a local static variable, in a way that will let it propagate coldness from the __attribute__((cold))? Or does the compiler have built-in knowledge of these functions - presumably it does since it emits calls to them - and will the compiler's built-in knowledge need to be extended in addition to this patch?

The patch looks legit. Wonder if such a patch should be sent to libstdc++ too.

Sending it shortly.

But will the compiler see these declarations though when compiling a source file containing a local static variable, in a way that will let it propagate coldness from the __attribute__((cold))? Or does the compiler have built-in knowledge of these functions - presumably it does since it emits calls to them - and will the compiler's built-in knowledge need to be extended in addition to this patch?

The compiler will see the declaration while analyzing a callsite.

In D85628#2214119, @jfb wrote:

In D85628#2213940, @rjf wrote:

In D85628#2213919, @vsk wrote:

I’m not convinced this is a good idea. In what use case is it not possible to mark up relevant functions? It doesn’t make sense to me to make alterations to standard library functions within the compiler. It seems better to simply patch the standard library. In some cases LLVM does infer function attributes for library functions, but these are generally lower-level attributes that can’t be specified at the source level, and the attribute is made available to other passes in the pipeline.

Do you mean this patch isn't a good idea in general, or that the recent revision isn't a good idea? If the latter, I'm not sure whether you mean we should not mark declarations as cold, or that we should not split the original loop into two (i.e., marking functions as cold before outlining). IMO splitting the loop into two simply serves the original intent of what we're doing, which is to mark certain functions as cold before outlining. On the other hand, if we don't mark declarations as cold via user-provided input, @hiraditya's proposed testcase becomes useless. Alternatively, we don't have to make the testcase involve standard library functions if that's what you want :).

My understanding is that today code can be considered "cold" based on the following:

- Attribute on the function
- Likely / unlikely annotations
- Profile information
- Other compiler heuristics

This adds another way to do it, but it's kind of a side-injection and it doesn't seem particularly principled. Presumably the list you're feeding through the command-line comes from a profile? Why isn't it provided as profile information?

Let me try to formulate the problem statement to motivate this work. I'm happy to work on a better approach.
Let's consider a repository which builds multiple applications. For App1 we have a set of cold callsites (CS1) and cold function declarations (FD1); similarly, for App2 we have CS2 and FD2. These sets have the following properties:

- CS1 and CS2 may intersect, but neither is necessarily a subset of the other. A non-intersecting example: calling std::lower_bound in a loop vs. calling it in isolation; std::lower_bound could be cold in the latter case.
- FD1 and FD2 may intersect, but neither is necessarily a subset of the other. A non-intersecting example: constructing a std::unordered_map<string, string> vs. constructing a std::string; std::string::string() could be cold in the latter case.

It may not be possible to collect sample or instrumentation profiles (e.g., for mobile phone apps); however, product developers would know the hotness/coldness of many call-sites based on domain knowledge.
In order to optimize these call-sites, how do we tell the compiler about these FDs and CSs? Adding annotations like __attribute__((cold)) to FD1 could regress App2 and vice versa.
Supplying a human readable/editable file would be ideal.
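The regression risk in this problem statement can be illustrated with plain set operations. In this sketch every name other than `std::string::string()` is hypothetical, and `misannotatedFor` is an illustrative helper: if each declaration in App1's cold set (FD1) were annotated cold in shared source, the declarations that App2 knows to be hot are exactly the ones mis-annotated for App2.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>

using NameSet = std::set<std::string>;

// Declarations that a shared source-level annotation would get wrong for
// another application: cold for the annotating app, hot for the other one.
NameSet misannotatedFor(const NameSet &ColdForAnnotatingApp,
                        const NameSet &HotForOtherApp) {
  NameSet Result;
  std::set_intersection(ColdForAnnotatingApp.begin(),
                        ColdForAnnotatingApp.end(), HotForOtherApp.begin(),
                        HotForOtherApp.end(),
                        std::inserter(Result, Result.begin()));
  return Result;
}
```

For example, if FD1 is `{std::string::string(), load_settings}` and App2's hot set is `{std::string::string(), handle_request}`, annotating FD1 in shared source would wrongly mark `std::string::string()` cold for App2. A per-application list avoids sharing the annotation across builds, though it can still regress a single application that later acquires a hot use of a listed function.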

In D85628#2214733, @wenlei wrote:

I have two questions for this patch:

1. Why do we need another way to explicitly tell the compiler that some functions are cold, rather than using existing mechanisms?
   - If this is for a random cold function from user code, then using a profile or in-source annotation should be the way to go, which is more scalable and more sustainable.
   - If this is for special (generated) functions, e.g. __cxa_guard_acquire, then we shouldn't even need to tell the compiler. Just as we don't have to tell the compiler that EH pads and noreturn functions are cold, the compiler should figure this out by itself. __cxa_guard_acquire is a generated function with very specific semantics, so I think it's similar to noreturn functions and EH pads in that regard.
2. If we have to tell the compiler extra hotness/coldness info, why do we do it just for HCS? Hotness is very general and could benefit many optimizations. Introducing a channel to tell a specific optimization about hotness is not good design in general (imagine 10 passes, each with its own way of getting hotness through a bunch of switches). If we really have to do this, I'd say we should do it in the framework, e.g. the static analysis part of BFI.

+1. Thanks for writing this up, this reflects my thinking about the patch.

In order to optimize these call-sites, how do we tell the compiler about these FDs and CSs? Adding annotations like __attribute__((cold)) to FD1 could regress App2 and vice versa. Supplying a human readable/editable file would be ideal.

I don't think supplying a list of cold functions to a compiler invocation is a scalable way to approach this problem. This looks like an attempt to hand-annotate individual call sites via a side channel, and it's likely too big (& imprecise) of a hammer to apply. E.g., after adding a cold function list to a program's cflags, performance may regress if the program picks up a hot use of a function in the list. It seems like an actual profile would be a better fit for your use case (if instrumentation/sampling is not possible, then it may be possible to construct a synthetic/fake profile as a pre-processing step, but that has the same issues with source drift).

then it may be possible to construct a synthetic/fake profile as a pre-processing step

Seems like the most viable step so far. Thanks for the suggestion.

Revision Contents

- llvm/include/llvm/Transforms/IPO/HotColdSplitting.h (9 lines)
- llvm/lib/Transforms/IPO/HotColdSplitting.cpp (88 lines)
- llvm/test/Transforms/HotColdSplit/custom-cold-cmd.ll (122 lines)

Diff 285186

llvm/include/llvm/Transforms/IPO/HotColdSplitting.h

 //===- HotColdSplitting.h ---- Outline Cold Regions -------------- C++ --===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //===----------------------------------------------------------------------===//
 //
 // This pass outlines cold regions to a separate function.
 //
 //===----------------------------------------------------------------------===//

 #ifndef LLVM_TRANSFORMS_IPO_HOTCOLDSPLITTING_H
 #define LLVM_TRANSFORMS_IPO_HOTCOLDSPLITTING_H

+#include "llvm/ADT/StringSet.h"
 #include "llvm/IR/PassManager.h"
+#include "llvm/Support/SpecialCaseList.h"

 namespace llvm {

 class Module;
 class ProfileSummaryInfo;
 class BlockFrequencyInfo;
 class TargetTransformInfo;
 class OptimizationRemarkEmitter;
 class AssumptionCache;
 class DominatorTree;
 class CodeExtractorAnalysisCache;

 /// A sequence of basic blocks.
 ///
 /// A 0-sized SmallVector is slightly cheaper to move than a std::vector.
 using BlockSequence = SmallVector<BasicBlock *, 0>;

 class HotColdSplitting {
 public:
   HotColdSplitting(ProfileSummaryInfo *ProfSI,
                    function_ref<BlockFrequencyInfo *(Function &)> GBFI,
                    function_ref<TargetTransformInfo &(Function &)> GTTI,
                    std::function<OptimizationRemarkEmitter &(Function &)> *GORE,
                    function_ref<AssumptionCache *(Function &)> LAC)
-      : PSI(ProfSI), GetBFI(GBFI), GetTTI(GTTI), GetORE(GORE), LookupAC(LAC) {}
+      : PSI(ProfSI), GetBFI(GBFI), GetTTI(GTTI), GetORE(GORE), LookupAC(LAC),
+        FileMarkedColdFunctions(nullptr) {}
   bool run(Module &M);

 private:
   bool isFunctionCold(const Function &F) const;
+  bool isFunctionInColdList(const Function &F) const;
   bool shouldOutlineFrom(const Function &F) const;
   bool outlineColdRegions(Function &F, bool HasProfileSummary);
   Function *extractColdRegion(const BlockSequence &Region,
                               const CodeExtractorAnalysisCache &CEAC,
                               DominatorTree &DT, BlockFrequencyInfo *BFI,
                               TargetTransformInfo &TTI,
                               OptimizationRemarkEmitter &ORE,
                               AssumptionCache *AC, unsigned Count);
   ProfileSummaryInfo *PSI;
   function_ref<BlockFrequencyInfo *(Function &)> GetBFI;
   function_ref<TargetTransformInfo &(Function &)> GetTTI;
   std::function<OptimizationRemarkEmitter &(Function &)> *GetORE;
   function_ref<AssumptionCache *(Function &)> LookupAC;
+  StringSet<> CmdMarkedColdFunctions;
+  std::unique_ptr<SpecialCaseList> FileMarkedColdFunctions;
 };

 /// Pass to outline cold regions.
 class HotColdSplittingPass : public PassInfoMixin<HotColdSplittingPass> {
 public:
   PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
 };

 } // end namespace llvm

 #endif // LLVM_TRANSFORMS_IPO_HOTCOLDSPLITTING_H

llvm/lib/Transforms/IPO/HotColdSplitting.cpp

Show All 23 Lines
/// TODO: Reorder outlined functions.		/// TODO: Reorder outlined functions.
///		///
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Transforms/IPO/HotColdSplitting.h"		#include "llvm/Transforms/IPO/HotColdSplitting.h"
#include "llvm/ADT/PostOrderIterator.h"		#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
		#include "llvm/ADT/StringSet.h"
#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/BlockFrequencyInfo.h"		#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/BranchProbabilityInfo.h"		#include "llvm/Analysis/BranchProbabilityInfo.h"
#include "llvm/Analysis/CFG.h"		#include "llvm/Analysis/CFG.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"		#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/Analysis/PostDominators.h"		#include "llvm/Analysis/PostDominators.h"
#include "llvm/Analysis/ProfileSummaryInfo.h"		#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
(14 unchanged lines not shown)
#include "llvm/IR/User.h"		#include "llvm/IR/User.h"
#include "llvm/IR/Value.h"		#include "llvm/IR/Value.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/BlockFrequency.h"		#include "llvm/Support/BlockFrequency.h"
#include "llvm/Support/BranchProbability.h"		#include "llvm/Support/BranchProbability.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
		#include "llvm/Support/SpecialCaseList.h"
		#include "llvm/Support/VirtualFileSystem.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/IPO.h"		#include "llvm/Transforms/IPO.h"
#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"		#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/Cloning.h"		#include "llvm/Transforms/Utils/Cloning.h"
#include "llvm/Transforms/Utils/CodeExtractor.h"		#include "llvm/Transforms/Utils/CodeExtractor.h"
#include "llvm/Transforms/Utils/Local.h"		#include "llvm/Transforms/Utils/Local.h"
#include "llvm/Transforms/Utils/ValueMapper.h"		#include "llvm/Transforms/Utils/ValueMapper.h"
#include <algorithm>		#include <algorithm>
#include <cassert>		#include <cassert>
		#include <fstream>
		#include <sstream>
		#include <string>

#define DEBUG_TYPE "hotcoldsplit"		#define DEBUG_TYPE "hotcoldsplit"

STATISTIC(NumColdRegionsFound, "Number of cold regions found.");		STATISTIC(NumColdRegionsFound, "Number of cold regions found.");
STATISTIC(NumColdRegionsOutlined, "Number of cold regions outlined.");		STATISTIC(NumColdRegionsOutlined, "Number of cold regions outlined.");

using namespace llvm;		using namespace llvm;

static cl::opt<bool> EnableStaticAnalyis("hot-cold-static-analysis",		static cl::opt<bool> EnableStaticAnalyis("hot-cold-static-analysis",
cl::init(true), cl::Hidden);		cl::init(true), cl::Hidden);

static cl::opt<int>		static cl::opt<int>
SplittingThreshold("hotcoldsplit-threshold", cl::init(2), cl::Hidden,		SplittingThreshold("hotcoldsplit-threshold", cl::init(2), cl::Hidden,
cl::desc("Base penalty for splitting cold code (as a "		cl::desc("Base penalty for splitting cold code (as a "
"multiple of TCC_Basic)"));		"multiple of TCC_Basic)"));

		static cl::opt<std::string>
		ColdFunctionsList("cold-functions-list", cl::init(""), cl::Hidden,
		cl::desc("Comma-separated list of functions to mark"
		" as cold during hot/cold splitting."));

		static cl::opt<std::string>
		ColdFunctionsFile("cold-functions-file", cl::init(""), cl::Hidden,
		cl::desc("File name containing a newline-separated list"
		" of function names to mark as cold during"
		" hot/cold splitting."));

namespace {		namespace {
// Same as blockEndsInUnreachable in CodeGen/BranchFolding.cpp. Do not modify		// Same as blockEndsInUnreachable in CodeGen/BranchFolding.cpp. Do not modify
// this function unless you modify the MBB version as well.		// this function unless you modify the MBB version as well.
		hiraditya: Let's move this inside the HotColdSplitting class
//		//
/// A no successor, non-return block probably ends in unreachable and is cold.		/// A no successor, non-return block probably ends in unreachable and is cold.
/// Also consider a block that ends in an indirect branch to be a return block,		/// Also consider a block that ends in an indirect branch to be a return block,
/// since many targets use plain indirect branches to return.		/// since many targets use plain indirect branches to return.
bool blockEndsInUnreachable(const BasicBlock &BB) {		bool blockEndsInUnreachable(const BasicBlock &BB) {
if (!succ_empty(&BB))		if (!succ_empty(&BB))
return false;		return false;
if (BB.empty())		if (BB.empty())
(87 unchanged lines not shown; the next hunk is inside `void getAnalysisUsage(AnalysisUsage &AU) const override {`)
AU.addUsedIfAvailable<AssumptionCacheTracker>();		AU.addUsedIfAvailable<AssumptionCacheTracker>();
}		}

bool runOnModule(Module &M) override;		bool runOnModule(Module &M) override;
};		};

} // end anonymous namespace		} // end anonymous namespace

/// Check whether \p F is inherently cold.		// Check whether \p F is in list of user-supplied cold functions.
		bool HotColdSplitting::isFunctionInColdList(const Function &F) const {
		// If user supplies any extra information
		// on cold functions via command-line or file input,
		// use them to determine if function is cold or not.
		if (CmdMarkedColdFunctions.find(F.getName()) !=
		CmdMarkedColdFunctions.end()) {
		LLVM_DEBUG(dbgs() << "isFunctionInColdList: " << F.getName()
		<< " is cold via command line info.\n");
		return true;
		}

		if (FileMarkedColdFunctions &&
		FileMarkedColdFunctions->inSection("", "", F.getName())) {
		LLVM_DEBUG(dbgs() << "isFunctionInColdList: " << F.getName()
		<< " is cold via file info.\n");
		return true;
		}

		return false;
		}
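`isFunctionInColdList()` consults two optional sources and short-circuits on the first match: the command-line set, then the file-backed list if one was loaded. A minimal, self-contained sketch of that lookup order follows, under simplifying assumptions: `ColdNameSources` is a hypothetical stand-in for the pass's two members, `std::unordered_set` replaces `StringSet<>`, and a `std::function` predicate stands in for `SpecialCaseList::inSection()`.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for HotColdSplitting's CmdMarkedColdFunctions and
// FileMarkedColdFunctions members (these are not LLVM types).
struct ColdNameSources {
  // Names from -cold-functions-list.
  std::unordered_set<std::string> CmdNames;
  // Predicate standing in for the file-backed SpecialCaseList; left empty
  // when no file was supplied, mirroring a null unique_ptr.
  std::function<bool(const std::string &)> FileMatcher;

  // Same order as the patch: command-line names first, then the file list.
  bool isInColdList(const std::string &Name) const {
    if (CmdNames.count(Name) != 0)
      return true;
    if (FileMatcher && FileMatcher(Name))
      return true;
    return false;
  }
};
```

For example, with `__cxa_guard_acquire` in `CmdNames` and a matcher that accepts only `__cxa_guard_abort`, both names are reported as cold and `main` is not.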

		// Check whether \p F is inherently cold.
bool HotColdSplitting::isFunctionCold(const Function &F) const {		bool HotColdSplitting::isFunctionCold(const Function &F) const {
if (F.hasFnAttribute(Attribute::Cold))		if (F.hasFnAttribute(Attribute::Cold))
return true;		return true;

if (F.getCallingConv() == CallingConv::Cold)		if (F.getCallingConv() == CallingConv::Cold)
return true;		return true;

if (PSI->isFunctionEntryCold(&F))		if (PSI->isFunctionEntryCold(&F))
(448 unchanged lines not shown; the next hunk is inside `bool HotColdSplitting::outlineColdRegions(Function &F, bool HasProfileSummary) {`)
} while (!OutliningWorklist.empty());		} while (!OutliningWorklist.empty());

return Changed;		return Changed;
}		}

bool HotColdSplitting::run(Module &M) {		bool HotColdSplitting::run(Module &M) {
bool Changed = false;		bool Changed = false;
bool HasProfileSummary = (M.getProfileSummary(/* IsCS */ false) != nullptr);		bool HasProfileSummary = (M.getProfileSummary(/* IsCS */ false) != nullptr);

		// Read in user-defined cold function names, if any.
		if (ColdFunctionsList != "") {
		vsk: Might be better to use SpecialCaseList.h rather than hand-rolling something new.
		rjf: It seems like the public API of `SpecialCaseList.h` only supports looking up whether a particular string is contained in the list, rather than retrieving all strings in the list. If we do this then we'll either have to a) traverse the entire module to match which function names in the current module are in the list, and mark them as cold, or b)
		rjf: Sorry, the above comment was written (and published) by mistake.
		LLVM_DEBUG(dbgs() << "Reading in cold functions from command line.\n");
		std::stringstream CFStream(ColdFunctionsList);
		std::string CFName;
		// Stop when getline() fails; skip empty entries (e.g. a trailing comma).
		while (std::getline(CFStream, CFName, ',')) {
		if (CFName.empty())
		continue;
		LLVM_DEBUG(dbgs() << " Function " << CFName
		<< " listed as cold from command line.\n");
		CmdMarkedColdFunctions.insert(CFName);
		}
		}
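The command-line reader above splits the `-cold-functions-list` value on commas. A hedged, standalone sketch of the same split follows; `parseColdFunctionsList` is a hypothetical helper, not part of the patch. Looping on `std::getline` itself ends cleanly once the input is exhausted, and skipping empty entries means a value such as `a,,b,` yields only `a` and `b` rather than also inserting the empty string, which is the name `F.getName()` returns for unnamed functions.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <unordered_set>

// Hypothetical helper (not part of the patch): split a -cold-functions-list
// value such as "a,b,c" into a set of function names.
std::unordered_set<std::string>
parseColdFunctionsList(const std::string &List) {
  std::unordered_set<std::string> Names;
  std::istringstream In(List);
  std::string Name;
  // getline() fails once the input is exhausted, which ends the loop.
  while (std::getline(In, Name, ',')) {
    // Skip empty entries produced by ",," or a trailing comma.
    if (!Name.empty())
      Names.insert(Name);
  }
  return Names;
}
```

With the test's value `__cxa_guard_acquire,__cxa_guard_release,__cxa_guard_abort` this yields three names, and an empty option value yields an empty set.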

		// Read in user-defined cold function names supplied
		hiraditya: Let's not read from stdin, only from a regular file.
		// by a file.
		if (ColdFunctionsFile != "") {
		// Use the SpecialCaseList helper to read in the
		// cold functions file.
		LLVM_DEBUG(dbgs() << "Reading in cold functions from file "
		<< ColdFunctionsFile << "\n");
		std::unique_ptr<vfs::FileSystem> FS = vfs::createPhysicalFileSystem();
		FileMarkedColdFunctions =
		SpecialCaseList::createOrDie({ColdFunctionsFile}, *FS);
		}
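The file read here uses the `SpecialCaseList` format. The test's second RUN line writes one entry per line with an empty prefix, such as `:__cxa_guard_acquire`, and the patch then queries it with an empty prefix via `inSection("", "", F.getName())`. To illustrate just that subset, here is a toy reader. It is not the `SpecialCaseList` API, and `parseColdFunctionsFile` is a hypothetical name; the real class additionally handles `[section]` headers, wildcard patterns, and `=category` suffixes, none of which this sketch models.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_set>

// Toy reader (hypothetical; NOT the SpecialCaseList API) for files whose
// entries look like ":name", as written by the test's second RUN line.
// Only entries with an empty prefix before ':' are kept, for exact matching.
// Sections, wildcards and "=category" suffixes are deliberately not modeled.
std::unordered_set<std::string> parseColdFunctionsFile(std::istream &In) {
  std::unordered_set<std::string> Names;
  std::string Line;
  while (std::getline(In, Line)) {
    // Skip blank lines and '#' comments.
    if (Line.empty() || Line[0] == '#')
      continue;
    // Skip lines without ':' (find returns npos) and lines whose prefix is
    // non-empty, such as "fun:" or "src:".
    if (Line.find(':') != 0)
      continue;
    std::string Name = Line.substr(1);
    if (!Name.empty())
      Names.insert(Name);
  }
  return Names;
}
```

Fed the three `:__cxa_guard_*` lines from the test, plus a blank line, a comment, and a `fun:other` entry, this keeps exactly the three guard functions.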

for (auto It = M.begin(), End = M.end(); It != End; ++It) {		for (auto It = M.begin(), End = M.end(); It != End; ++It) {
Function &F = *It;		Function &F = *It;

		// Mark functions in the user-supplied list of cold functions
		// (if any) as cold. This has to be done separately from
		// isFunctionCold() because listed declarations must be marked
		// too, and declarations are skipped below.
		hiraditya: Marking cold functions should be in a separate loop before this loop.
		hiraditya: ignore me.
		if (isFunctionInColdList(F)) {
		Changed |= markFunctionCold(F);
		}

// Do not touch declarations.		// Do not touch declarations.
if (F.isDeclaration())		if (F.isDeclaration())
continue;		continue;

// Do not modify `optnone` functions.		// Do not modify `optnone` functions.
if (F.hasOptNone())		if (F.hasOptNone())
continue;		continue;

// Detect inherently cold functions and mark them as such.		// Detect inherently cold functions and mark them as such.
if (isFunctionCold(F)) {		if (isFunctionCold(F)) {
Changed |= markFunctionCold(F);		Changed |= markFunctionCold(F);
continue;		continue;
}		}
		}
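The patch splits the original single loop in two: the first pass marks functions cold (user-listed ones, declarations included, plus those `isFunctionCold()` detects) and the second pass outlines. One plausible motivation is ordering: the pass treats a block that calls a `cold` function as unlikely to execute, and in the test `@__cxa_guard_acquire` is declared after `@_Z2gov`, so a single loop in module order would visit `_Z2gov` before the declaration is marked. The following toy model illustrates that ordering effect only; `ModelFunction`, `runSingleLoop`, and `runTwoLoops` are hypothetical stand-ins, and "contains a call to a cold function" is a simplification of the pass's real cold-region analysis.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for llvm::Function: a name, whether it is only a
// declaration, the functions it calls, and whether it is marked cold.
struct ModelFunction {
  std::string Name;
  bool IsDeclaration;
  std::vector<std::string> Callees;
  bool Cold;
};

// Reads the callee's current coldness, like checking the callee's `cold`
// attribute at a call site.
static bool calleeIsCold(const std::vector<ModelFunction> &Module,
                         const std::string &Callee) {
  for (const ModelFunction &F : Module)
    if (F.Name == Callee)
      return F.Cold;
  return false;
}

// True if F calls at least one function currently marked cold.
static bool hasColdCall(const std::vector<ModelFunction> &Module,
                        const ModelFunction &F) {
  for (const std::string &Callee : F.Callees)
    if (calleeIsCold(Module, Callee))
      return true;
  return false;
}

// One loop: marking and "outlining" interleaved in module order.
// Returns the definitions found to contain a cold call.
std::vector<std::string>
runSingleLoop(std::vector<ModelFunction> &Module,
              const std::unordered_set<std::string> &ColdList) {
  std::vector<std::string> Outlined;
  for (ModelFunction &F : Module) {
    if (ColdList.count(F.Name) != 0)
      F.Cold = true;
    if (F.IsDeclaration)
      continue;
    if (hasColdCall(Module, F))
      Outlined.push_back(F.Name);
  }
  return Outlined;
}

// Two loops, as in the patch: finish all marking before any outlining.
std::vector<std::string>
runTwoLoops(std::vector<ModelFunction> &Module,
            const std::unordered_set<std::string> &ColdList) {
  for (ModelFunction &F : Module)
    if (ColdList.count(F.Name) != 0)
      F.Cold = true; // declarations included
  std::vector<std::string> Outlined;
  for (const ModelFunction &F : Module) {
    if (F.IsDeclaration)
      continue;
    if (hasColdCall(Module, F))
      Outlined.push_back(F.Name);
  }
  return Outlined;
}
```

With `_Z2gov` (calling `__cxa_guard_acquire`) placed before the `__cxa_guard_acquire` declaration, the single loop finds no cold call in `_Z2gov`, while the two-loop version does.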

		for (auto It = M.begin(), End = M.end(); It != End; ++It) {
		Function &F = *It;

		// Do not touch declarations.
		if (F.isDeclaration())
		continue;

		// Do not modify `optnone` functions.
		if (F.hasOptNone())
		continue;

if (!shouldOutlineFrom(F)) {		if (!shouldOutlineFrom(F)) {
LLVM_DEBUG(llvm::dbgs() << "Skipping " << F.getName() << "\n");		LLVM_DEBUG(llvm::dbgs() << "Skipping " << F.getName() << "\n");
continue;		continue;
}		}

LLVM_DEBUG(llvm::dbgs() << "Outlining in " << F.getName() << "\n");		LLVM_DEBUG(llvm::dbgs() << "Outlining in " << F.getName() << "\n");
Changed |= outlineColdRegions(F, HasProfileSummary);		Changed |= outlineColdRegions(F, HasProfileSummary);
(remaining 72 unchanged lines not shown)

llvm/test/Transforms/HotColdSplit/custom-cold-cmd.ll

This file was added.

				; RUN: opt -S -hotcoldsplit -cold-functions-list=__cxa_guard_acquire,__cxa_guard_release,__cxa_guard_abort < %s 2>&1 | FileCheck %s
				; RUN: echo -e ":__cxa_guard_acquire\n:__cxa_guard_release\n:__cxa_guard_abort\n" > ./coldfuncs && opt -S -hotcoldsplit -cold-functions-file=./coldfuncs < %s 2>&1 | FileCheck %s
				%struct.foo = type { i8 }

				@_ZZ2govE1f = internal global %struct.foo zeroinitializer, align 1
				@_ZGVZ2govE1f = internal global i64 0, align 8
				@__dso_handle = external hidden global i8
				@_ZZ8go_leakyvE1f = internal global %struct.foo* null, align 8
				@_ZGVZ8go_leakyvE1f = internal global i64 0, align 8

				; Since __cxa_guard_acquire/release/abort are marked cold via user input
				; (command-line list or file), they should all share the same attribute group.
				; CHECK: declare {{.*}}@__cxa_guard_acquire{{.*}} [[cold_attr:#[0-9]+]]
				; CHECK: declare {{.*}}@__cxa_guard_release{{.*}} [[cold_attr]]
				; CHECK: declare {{.*}}@__cxa_guard_abort{{.*}} [[cold_attr]]
				; CHECK: define internal void @_Z2gov.cold.1
				; CHECK: define internal void @_Z8go_leakyv.cold.1(i8* %call)
				hiraditya: Remove trailing .*
				; CHECK: attributes [[cold_attr]] = { {{.*}}cold{{.*}} }

				define dso_local void @_Z2gov() #0 {
				entry:
				%0 = load atomic i8, i8* bitcast (i64* @_ZGVZ2govE1f to i8*) acquire, align 8
				%guard.uninitialized = icmp eq i8 %0, 0
				br i1 %guard.uninitialized, label %init.check, label %init.end, !prof !1

				init.check: ; preds = %entry
				%1 = call i32 @__cxa_guard_acquire(i64* @_ZGVZ2govE1f) #1
				%tobool = icmp ne i32 %1, 0
				br i1 %tobool, label %init, label %init.end

				init: ; preds = %init.check
				call void @_ZN3fooC1Ev(%struct.foo* @_ZZ2govE1f) #1
				%2 = call i32 @__cxa_atexit(void (i8*)* bitcast (void (%struct.foo*)* @_ZN3fooD1Ev to void (i8*)*), i8* getelementptr inbounds (%struct.foo, %struct.foo* @_ZZ2govE1f, i32 0, i32 0), i8* @__dso_handle) #1
				call void @__cxa_guard_release(i64* @_ZGVZ2govE1f) #1
				br label %init.end

				init.end: ; preds = %init, %init.check, %entry
				ret void
				}

				declare dso_local i32 @__cxa_guard_acquire(i64*) #1

				declare extern_weak dso_local void @_ZN3fooC1Ev(%struct.foo*) unnamed_addr #2

				declare extern_weak dso_local void @_ZN3fooD1Ev(%struct.foo*) unnamed_addr #2

				declare dso_local i32 @__cxa_atexit(void (i8*)*, i8*, i8*) #1

				declare dso_local void @__cxa_guard_release(i64*) #1

				define dso_local void @_Z8go_leakyv() #3 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
				entry:
				%exn.slot = alloca i8*, align 8
				%ehselector.slot = alloca i32, align 4
				%0 = load atomic i8, i8* bitcast (i64* @_ZGVZ8go_leakyvE1f to i8*) acquire, align 8
				%guard.uninitialized = icmp eq i8 %0, 0
				br i1 %guard.uninitialized, label %init.check, label %init.end, !prof !1

				init.check: ; preds = %entry
				%1 = call i32 @__cxa_guard_acquire(i64* @_ZGVZ8go_leakyvE1f) #1
				%tobool = icmp ne i32 %1, 0
				br i1 %tobool, label %init, label %init.end

				init: ; preds = %init.check
				%call = invoke noalias nonnull i8* @_Znwm(i64 1) #6
				to label %invoke.cont unwind label %lpad

				invoke.cont: ; preds = %init
				%2 = bitcast i8* %call to %struct.foo*
				call void @_ZN3fooC1Ev(%struct.foo* %2) #1
				store %struct.foo* %2, %struct.foo** @_ZZ8go_leakyvE1f, align 8
				call void @__cxa_guard_release(i64* @_ZGVZ8go_leakyvE1f) #1
				br label %init.end

				init.end: ; preds = %invoke.cont, %init.check, %entry
				ret void

				lpad: ; preds = %init
				%3 = landingpad { i8*, i32 }
				cleanup
				%4 = extractvalue { i8*, i32 } %3, 0
				store i8* %4, i8** %exn.slot, align 8
				%5 = extractvalue { i8*, i32 } %3, 1
				store i32 %5, i32* %ehselector.slot, align 4
				call void @__cxa_guard_abort(i64* @_ZGVZ8go_leakyvE1f) #1
				br label %eh.resume

				eh.resume: ; preds = %lpad
				%exn = load i8*, i8** %exn.slot, align 8
				%sel = load i32, i32* %ehselector.slot, align 4
				%lpad.val = insertvalue { i8*, i32 } undef, i8* %exn, 0
				%lpad.val1 = insertvalue { i8*, i32 } %lpad.val, i32 %sel, 1
				resume { i8*, i32 } %lpad.val1
				}

				; Function Attrs: nobuiltin allocsize(0)
				declare dso_local nonnull i8* @_Znwm(i64) #4

				declare dso_local i32 @__gxx_personality_v0(...)

				; Function Attrs: nounwind
				declare dso_local void @__cxa_guard_abort(i64*) #1

				; Function Attrs: norecurse nounwind uwtable
				define dso_local i32 @main() #5 {
				entry:
				ret i32 0
				}

				attributes #0 = { nounwind uwtable }
				attributes #1 = { nounwind }
				attributes #2 = { nounwind }
				attributes #3 = { uwtable }
				attributes #4 = { nobuiltin allocsize(0) }
				attributes #5 = { norecurse nounwind uwtable }
				attributes #6 = { builtin allocsize(0) }

				!llvm.module.flags = !{!0}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{!"branch_weights", i32 1, i32 1048575}

Revision Contents

Diff 285186

llvm/include/llvm/Transforms/IPO/HotColdSplitting.h

llvm/lib/Transforms/IPO/HotColdSplitting.cpp

llvm/test/Transforms/HotColdSplit/custom-cold-cmd.ll