This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
include/llvm/Transforms/Utils/
-
llvm/
-
Transforms/
-
Utils/
1/3
SampleProfileInference.h
5/5
SampleProfileLoaderBaseImpl.h
-
lib/Transforms/
-
Transforms/
-
IPO/
2/2
SampleProfile.cpp
-
Utils/
-
CMakeLists.txt
2/2
SampleProfileInference.cpp
1
SampleProfileLoaderBaseUtil.cpp
-
test/Transforms/SampleProfile/
-
Transforms/
-
SampleProfile/
-
Inputs/
-
profile-inference.prof
-
profile-inference.ll

Differential D109860

profi - a flow-based profile inference algorithm: Part I (out of 3)
ClosedPublic

Authored by spupyrev on Sep 15 2021, 4:01 PM.

Download Raw Diff

Details

Reviewers

wenlei
hoy
wlei
davidxl
wmi
xur
rajeshwarv

Commits

rG7cc2493daaf5: profi - a flow-based profile inference algorithm: Part I (out of 3)
rG884b6dd31142: profi - a flow-based profile inference algorithm: Part I (out of 3)
rGb00fc198224e: profi - a flow-based profile inference algorithm: Part I (out of 3)

Summary

The benefits of sampling-based PGO crucially depends on the quality of profile
data. This diff implements a flow-based algorithm, called profi, that helps to
overcome the inaccuracies in a profile after it is collected.

Profi is an extended and significantly re-engineered classic MCMF (min-cost
max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing
missing and inaccurate profiling using a minimum cost circulation algorithm]. It
models profile inference as an optimization problem on a control-flow graph with
the objectives and constraints capturing the desired properties of profile data.
Three important challenges that are being solved by profi:

"fixing" errors in profiles caused by sampling;
converting basic block counts to edge frequencies (branch probabilities);
dealing with "dangling" blocks having no samples in the profile.

The main implementation (and required docs) are in SampleProfileInference.cpp.
The worst-time complexity is quadratic in the number of blocks in a function,
O(|V|^2). However a careful engineering and extensive evaluation shows that
the running time is (slightly) super-linear. In particular, instances with
1000 blocks are solved within 0.1 second.

The algorithm has been extensively tested internally on prod workloads,
significantly improving the quality of generated profile data and providing
speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it
generally improves the performance (with a few outliers) but extra work in
the compiler might be needed to re-tune existing optimization passes relying on
profile counts.

UPD Dec 1st 2021:

synced the declaration and definition of the option SampleProfileUseProfi to use type cl::opt<bool;
added inline for SampleProfileInference<BT>::findUnlikelyJumps and SampleProfileInference<BT>::isExit to avoid linking problems on windows.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

spupyrev created this revision.Sep 15 2021, 4:01 PM

Herald added subscribers: ormris, wenlei, hiraditya, mgorny. · View Herald TranscriptSep 15 2021, 4:01 PM

spupyrev requested review of this revision.Sep 15 2021, 4:01 PM

Herald added a project: Restricted Project. · View Herald TranscriptSep 15 2021, 4:01 PM

Herald added a subscriber: llvm-commits. · View Herald Transcript

Harbormaster completed remote builds in B124107: Diff 372825.Sep 15 2021, 4:19 PM

spupyrev edited the summary of this revision. (Show Details)Sep 15 2021, 4:25 PM

spupyrev added reviewers: wenlei, hoy, wlei.

hoy added reviewers: davidxl, wmi.Sep 15 2021, 4:29 PM

lint

Harbormaster completed remote builds in B124115: Diff 372835.Sep 15 2021, 4:56 PM

davidxl added a reviewer: xur.Sep 15 2021, 7:37 PM

extended tests with 'block-freq' analysis

Harbormaster completed remote builds in B124225: Diff 372986.Sep 16 2021, 10:48 AM

spupyrev retitled this revision from profi - a flow-based profile inference algorithm: Part I to profi - a flow-based profile inference algorithm: Part I (out of 3).Sep 16 2021, 11:17 AM

spupyrev edited the summary of this revision. (Show Details)

davidxl added a reviewer: rajeshwarv.Sep 16 2021, 11:20 AM

spupyrev mentioned this in D109903: profi - a flow-based profile inference algorithm: Part II (out of 3).Sep 16 2021, 11:26 AM

spupyrev mentioned this in D109980: profi - a flow-based profile inference algorithm: Part III (out of 3).Sep 17 2021, 10:06 AM

A few high level questions before the review:

Is there a high level description of the 3 sub-patches and their relationship?
does it work without using pseudoprob? Is there a test case?
the test case seems to disable new pm, why is that?

tmsriram added a subscriber: tmsriram.Sep 17 2021, 11:09 AM

Is there a high level description of the 3 sub-patches and their relationship?

The split into 3 patches is pretty ad hoc; this is done just to simplify reviewing. In order to get good results, we really need all three pieces.
On a high level, the first part (D109860) implements a basic minimum-cost maximum flow algorithm and applies it to sampling-based profile. Two other diffs implement adjustments for the computed flow.
The second part (D109903) makes the computed flow "connected" -- without it hot loops might be ignored by BFI.
The third part (D109980) applies a post-processing for "dangling" basic blocks (having no sample counts in the profile).

does it work without using pseudoprob? Is there a test case?

Yes, just added such a test.

the test case seems to disable new pm, why is that?

This is needed for opt -analyze. Otherwise I get the following exception: Cannot specify -analyze under new pass manager, either specify '-enable-new-pm=0', or use the corresponding new pass manager pass, e.g. '-passes=print<scalar-evolution>'

Thanks Sergey. That looks like a good work to improve the consistency of the profile. Have you checked whether the new algorithm can infer the missing parts based on equivalence relationship on the CFG? If it can, could you add a test for it?

I will collect some performance number w/wo sample-profile-use-profi.

davidxl added inline comments.Sep 21 2021, 10:59 AM

llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h
898	It is probably better to restructure the code a little more. Add a helper routine called "PrepareForPropagation(..)' or "InitForPropagation'. In this helper buildEdges; compute equiv class for non-profi or profi specific initialization (basically move some code from propagateWeights' to here.
909	Move it into a Finalize helper method?
llvm/lib/Transforms/IPO/SampleProfile.cpp
1655	Is this a limitation? Can it be made to handle multi-graph?

rajeshwarv added inline comments.Sep 21 2021, 1:36 PM

llvm/lib/Transforms/Utils/SampleProfileInference.cpp
76	Could you please why the time complexity is not O(V*E^2) per http://go/wikip/Edmonds%E2%80%93Karp_algorithm and it is actually O(...) as described here? Also, what is n?

added helper functions following David's suggestions
expanded a comment on time complexity of the MCMF algorithm

In D109860#3009910, @wmi wrote:

Thanks Sergey. That looks like a good work to improve the consistency of the profile. Have you checked whether the new algorithm can infer the missing parts based on equivalence relationship on the CFG? If it can, could you add a test for it?

Do you mind to elaborate more on this? What kind of a test would you like to see? (i may not be familiar with the terminology)

In general, the algorithm builds a valid "flow" comprised of the block/jump counts. That is, the sum of incoming jump counts always equals the sum of outgoing jump counts (except for the source and sinks). Hence, if two blocks are guaranteed to have equal counts, the algorithm will always return equal counts. A few of the tests are (implicitly) verifying that.

I will collect some performance number w/wo sample-profile-use-profi.

That would be awesome, thanks!
Tbh, I've seen some weird problems while using the algorithm with AutoFDO. There are a few issues that make profi to work not as expected:

there are no "dangling" blocks in AutoFDO; the corresponding blocks seem to be reported as blocks with 0 counts;
some blocks/counts seem to have been duplicated and there is no concept of "distribution factor"; profi has difficulties with such incorrect blocks.

Overall we see benefits of using the new inference with CSSPGO (where the above issues are resolved); however, AutoFDO weights/counts need to be polished before the inference.

spupyrev marked 3 inline comments as done.Sep 22 2021, 11:41 AM

spupyrev added inline comments.

llvm/lib/Transforms/IPO/SampleProfile.cpp
1655	The inference algorithm can process multi-graphs -- the questions is how do we store/process such edges outside of inference. The current implementation (see `SampleProfileLoaderBaseImpl::buildEdges`) merges all multiway branches into a single edge. We could probably modify `buildEdges` and avoid merging, but that will change the existing non-profi functionality. An alternative is to apply this post-processing and keep the existing implementation as is.
llvm/lib/Transforms/Utils/SampleProfileInference.cpp
76	Unfortunately, it is not trivial to derive a tight worst-case estimation of our implementation, since we modify the classical algorithm (see in particular lines 206-209). For sure we're not doing more than v(f) augmentation iterations and each of them takes at most n*m steps. But I don't know if this worst-case bound can be achieved on some instance. Furthermore, the worst-case complexity is a misleading concept to look into, since control-flow graphs have a "nice" structure. Typically the observed runtime is slightly super-linear. I've updated the comment, feel free to suggest more explanations.

Harbormaster completed remote builds in B125179: Diff 374304.Sep 22 2021, 11:55 AM

In D109860#3016016, @spupyrev wrote:

In D109860#3009910, @wmi wrote:

Thanks Sergey. That looks like a good work to improve the consistency of the profile. Have you checked whether the new algorithm can infer the missing parts based on equivalence relationship on the CFG? If it can, could you add a test for it?

Do you mind to elaborate more on this? What kind of a test would you like to see? (i may not be familiar with the terminology)

Like block A dominates block B and block B post dominates block A, then the two blocks must have equal execution frequency. That is what SampleProfileLoaderBaseImpl<BT>::findEquivalenceClasses does. I just want to know whether profi could cover it since profi could be used to replace the existing weight propagation logic.

In general, the algorithm builds a valid "flow" comprised of the block/jump counts. That is, the sum of incoming jump counts always equals the sum of outgoing jump counts (except for the source and sinks). Hence, if two blocks are guaranteed to have equal counts, the algorithm will always return equal counts. A few of the tests are (implicitly) verifying that.

Ok, sounds like profi already cover the case described above.

I will collect some performance number w/wo sample-profile-use-profi.

That would be awesome, thanks!
Tbh, I've seen some weird problems while using the algorithm with AutoFDO. There are a few issues that make profi to work not as expected:

there are no "dangling" blocks in AutoFDO; the corresponding blocks seem to be reported as blocks with 0 counts;

some blocks/counts seem to have been duplicated and there is no concept of "distribution factor"; profi has difficulties with such incorrect blocks.

Overall we see benefits of using the new inference with CSSPGO (where the above issues are resolved); however, AutoFDO weights/counts need to be polished before the inference.

I get the test result for our search benchmark. Enabling sample-profile-use-profi has a small regression - 0.3%. That may be related with the issues you described above.

I get the test result for our search benchmark. Enabling sample-profile-use-profi has a small regression - 0.3%. That may be related with the issues you described above.

Thanks for running those experiments, the results align with what I've seen in my benchmarks.
At this point I think we should think of the inference algorithm as a part of CSSPGO, where it provides benefits. Making the inference work with AutoFDO is an important but a separate (and potentially time-consuming) task. I'd try to address the regression in future diffs.

MTC added a subscriber: MTC.Oct 6 2021, 4:56 PM

I just left a couple comments about minor issues. @davidxl @xur @rajeshwarv We have done an around of code review internally. The current patch looks good to me in general. Please let the author know if you have more comments. Thanks.

llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
11	nit: comment the `profi` abbreviation and its full name here?
llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h
148	nit: no need to use the `inline` keyword.
898	We still need this for MIR even if `SampleProfileUseProfi` is true.

In D109860#3028266, @spupyrev wrote:

I get the test result for our search benchmark. Enabling sample-profile-use-profi has a small regression - 0.3%. That may be related with the issues you described above.

Thanks for running those experiments, the results align with what I've seen in my benchmarks.
At this point I think we should think of the inference algorithm as a part of CSSPGO, where it provides benefits. Making the inference work with AutoFDO is an important but a separate (and potentially time-consuming) task. I'd try to address the regression in future diffs.

This sounds reasonable to me. We have turned on profi by default for CSSPGO internally, and we can leave it off for AutoFDO for now.

And as Hongtao mentioned, we have reviewed the implementation internally, so this is mostly for other reviewers to take a look.

-rebase

hoy's comments

LGTM, thanks.

This revision is now accepted and ready to land.Oct 25 2021, 1:24 PM

Harbormaster completed remote builds in B130537: Diff 382093.Oct 25 2021, 1:28 PM

wenlei added inline comments.Nov 1 2021, 9:40 PM

llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h
757	Performance and count quality aside, can we make it work with MIR here just to make sure functionality is there. i.e. we can probably have a test case for MIR+Profi using FS-AFDO. Making sure it generates better count for MIR can be dealt with separately later.

addressing wenlei's comment: making profi to work MIR

modified comments

lgtm except some comment nits, thanks

llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
109	same here.
274	nit, fix comment? same for line 278

Harbormaster completed remote builds in B135138: Diff 388521.Nov 19 2021, 9:48 AM

This revision was landed with ongoing or failed builds.Nov 23 2021, 9:09 AM

Closed by commit rGb00fc198224e: profi - a flow-based profile inference algorithm: Part I (out of 3) (authored by spupyrev, committed by hoy). · Explain Why

This revision was automatically updated to reflect the committed changes.

hoy added a commit: rGb00fc198224e: profi - a flow-based profile inference algorithm: Part I (out of 3).

reames added a reverting change: rG065f777d2740: Revert "profi - a flow-based profile inference algorithm: Part I (out of 3)".Nov 23 2021, 9:19 AM

hoy added a commit: rG884b6dd31142: profi - a flow-based profile inference algorithm: Part I (out of 3).Nov 23 2021, 11:05 AM

Our windows buildbot failed the link with:

cmd.exe /C "cd . && "C:\Program Files\CMake\bin\cmake.exe" -E vs_link_exe --intdir=tools\mlir\unittests\ExecutionEngine\CMakeFiles\MLIRExecutionEngineTests.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100177~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100177~1.0\x64\mt.exe --manifests  -- C:\PROGRA~2\MICROS~3\2017\COMMUN~1\VC\Tools\MSVC\1416~1.270\bin\Hostx64\x64\link.exe /nologo @CMakeFiles\MLIRExecutionEngineTests.rsp  /out:tools\mlir\unittests\ExecutionEngine\MLIRExecutionEngineTests.exe /implib:tools\mlir\unittests\ExecutionEngine\MLIRExecutionEngineTests.lib /pdb:tools\mlir\unittests\ExecutionEngine\MLIRExecutionEngineTests.pdb /version:0.0 /machine:x64 /STACK:10000000 /INCREMENTAL:NO /subsystem:console  && cd ."
LINK: command "C:\PROGRA~2\MICROS~3\2017\COMMUN~1\VC\Tools\MSVC\1416~1.270\bin\Hostx64\x64\link.exe /nologo @CMakeFiles\MLIRExecutionEngineTests.rsp /out:tools\mlir\unittests\ExecutionEngine\MLIRExecutionEngineTests.exe /implib:tools\mlir\unittests\ExecutionEngine\MLIRExecutionEngineTests.lib /pdb:tools\mlir\unittests\ExecutionEngine\MLIRExecutionEngineTests.pdb /version:0.0 /machine:x64 /STACK:10000000 /INCREMENTAL:NO /subsystem:console /MANIFEST /MANIFESTFILE:tools\mlir\unittests\ExecutionEngine\MLIRExecutionEngineTests.exe.manifest" failed (exit code 1120) with the following output:
LLVMCodeGen.lib(MIRSampleProfile.cpp.obj) : error LNK2019: unresolved external symbol "class llvm::cl::opt<unsigned int,0,class llvm::cl::parser<unsigned int> > llvm::SampleProfileUseProfi" (?SampleProfileUseProfi@llvm@@3V?$opt@I$0A@V?$parser@I@cl@llvm@@@cl@1@A) referenced in function "protected: bool __cdecl llvm::SampleProfileLoaderBaseImpl<class llvm::MachineBasicBlock>::computeAndPropagateWeights(class llvm::MachineFunction &,class llvm::DenseSet<unsigned __int64,struct llvm::DenseMapInfo<unsigned __int64,void> > const &)" (?computeAndPropagateWeights@?$SampleProfileLoaderBaseImpl@VMachineBasicBlock@llvm@@@llvm@@IEAA_NAEAVMachineFunction@2@AEBV?$DenseSet@_KU?$DenseMapInfo@_KX@llvm@@@2@@Z)
tools\mlir\unittests\ExecutionEngine\MLIRExecutionEngineTests.exe : fatal error LNK1120: 1 unresolved externals

FYI

mehdi_amini added a reverting change: rG1392b654ff65: Revert "profi - a flow-based profile inference algorithm: Part I (out of 3)".Nov 23 2021, 12:11 PM

In D109860#3149501, @mehdi_amini wrote:

Our windows buildbot failed the link with:

cmd.exe /C "cd . && "C:\Program Files\CMake\bin\cmake.exe" -E vs_link_exe --intdir=tools\mlir\unittests\ExecutionEngine\CMakeFiles\MLIRExecutionEngineTests.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100177~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100177~1.0\x64\mt.exe --manifests  -- C:\PROGRA~2\MICROS~3\2017\COMMUN~1\VC\Tools\MSVC\1416~1.270\bin\Hostx64\x64\link.exe /nologo @CMakeFiles\MLIRExecutionEngineTests.rsp  /out:tools\mlir\unittests\ExecutionEngine\MLIRExecutionEngineTests.exe /implib:tools\mlir\unittests\ExecutionEngine\MLIRExecutionEngineTests.lib /pdb:tools\mlir\unittests\ExecutionEngine\MLIRExecutionEngineTests.pdb /version:0.0 /machine:x64 /STACK:10000000 /INCREMENTAL:NO /subsystem:console  && cd ."
LINK: command "C:\PROGRA~2\MICROS~3\2017\COMMUN~1\VC\Tools\MSVC\1416~1.270\bin\Hostx64\x64\link.exe /nologo @CMakeFiles\MLIRExecutionEngineTests.rsp /out:tools\mlir\unittests\ExecutionEngine\MLIRExecutionEngineTests.exe /implib:tools\mlir\unittests\ExecutionEngine\MLIRExecutionEngineTests.lib /pdb:tools\mlir\unittests\ExecutionEngine\MLIRExecutionEngineTests.pdb /version:0.0 /machine:x64 /STACK:10000000 /INCREMENTAL:NO /subsystem:console /MANIFEST /MANIFESTFILE:tools\mlir\unittests\ExecutionEngine\MLIRExecutionEngineTests.exe.manifest" failed (exit code 1120) with the following output:
LLVMCodeGen.lib(MIRSampleProfile.cpp.obj) : error LNK2019: unresolved external symbol "class llvm::cl::opt<unsigned int,0,class llvm::cl::parser<unsigned int> > llvm::SampleProfileUseProfi" (?SampleProfileUseProfi@llvm@@3V?$opt@I$0A@V?$parser@I@cl@llvm@@@cl@1@A) referenced in function "protected: bool __cdecl llvm::SampleProfileLoaderBaseImpl<class llvm::MachineBasicBlock>::computeAndPropagateWeights(class llvm::MachineFunction &,class llvm::DenseSet<unsigned __int64,struct llvm::DenseMapInfo<unsigned __int64,void> > const &)" (?computeAndPropagateWeights@?$SampleProfileLoaderBaseImpl@VMachineBasicBlock@llvm@@@llvm@@IEAA_NAEAVMachineFunction@2@AEBV?$DenseSet@_KU?$DenseMapInfo@_KX@llvm@@@2@@Z)
tools\mlir\unittests\ExecutionEngine\MLIRExecutionEngineTests.exe : fatal error LNK1120: 1 unresolved externals

FYI

Thanks for reporting this issue. It looks like somehow MLIRExecutionEngineTests references LLVMCodeGen.lib(MIRSampleProfile.cpp.obj) , which in turn references llvm::SampleProfileUseProfi that is defined in LLVMTransformUtils. Could it be a possible missing dependency on the MLIRExecutionEngineTests side?

I see the definition being cl::opt<bool> SampleProfileUseProfi( while the declaration is extern cl::opt<unsigned> SampleProfileUseProfi;

llvm/lib/Transforms/Utils/SampleProfileLoaderBaseUtil.cpp
37	This is defined as cl::opt<bool>

In D109860#3149536, @mehdi_amini wrote:

I see the definition being cl::opt<bool> SampleProfileUseProfi( while the declaration is extern cl::opt<unsigned> SampleProfileUseProfi;

Good catch! That should be the culprit. The declaration should be cl::opt<bool> as well. Could you kindly test it for us since we don't have a Windows environment? Thanks!

I don't have a Windows environment either.

You could try to push at a time when you can afford to monitor a builder like https://lab.llvm.org/buildbot/#/builders/13 until your commit is processed.

That said it seems that previous pre-merge checks run on this revision were failing on Windows, can you try to re-open this revision, update the patch, and see if the pre-merge checks are passing?

spupyrev reopened this revision.Dec 1 2021, 10:29 AM

This revision is now accepted and ready to land.Dec 1 2021, 10:29 AM

fixed an incorrect option type

Harbormaster completed remote builds in B136960: Diff 391077.Dec 1 2021, 12:04 PM

spupyrev edited the summary of this revision. (Show Details)Dec 1 2021, 12:19 PM

In D109860#3150557, @mehdi_amini wrote:

That said it seems that previous pre-merge checks run on this revision were failing on Windows, can you try to re-open this revision, update the patch, and see if the pre-merge checks are passing?

Thanks for the suggestion. The failing tests on Windows seem unrelated. Will land this.

This revision was landed with ongoing or failed builds.Dec 1 2021, 3:31 PM

Closed by commit rG7cc2493daaf5: profi - a flow-based profile inference algorithm: Part I (out of 3) (authored by spupyrev, committed by hoy). · Explain Why

This revision was automatically updated to reflect the committed changes.

hoy added a commit: rG7cc2493daaf5: profi - a flow-based profile inference algorithm: Part I (out of 3).

hoy mentioned this in rG98dd2f9ed3dd: profi - a flow-based profile inference algorithm: Part II (out of 3).Dec 2 2021, 11:04 AM

hoy mentioned this in rG93a2c2919f73: profi - a flow-based profile inference algorithm: Part III (out of 3).Dec 2 2021, 12:01 PM

Revision Contents

Path

Size

llvm/

include/

llvm/

Transforms/

Utils/

SampleProfileInference.h

284 lines

SampleProfileLoaderBaseImpl.h

162 lines

lib/

Transforms/

IPO/

SampleProfile.cpp

30 lines

Utils/

CMakeLists.txt

1 line

SampleProfileInference.cpp

461 lines

SampleProfileLoaderBaseUtil.cpp

4 lines

test/

Transforms/

SampleProfile/

Inputs/

profile-inference.prof

23 lines

profile-inference.ll

245 lines

Diff 391077

llvm/include/llvm/Transforms/Utils/SampleProfileInference.h

This file was added.

				//===- Transforms/Utils/SampleProfileInference.h ----------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				/// \file
				/// This file provides the interface for the profile inference algorithm, profi.
				//
				hoyUnsubmitted Done Reply Inline Actions nit: comment the `profi` abbreviation and its full name here? hoy: nit: comment the `profi` abbreviation and its full name here?
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TRANSFORMS_UTILS_SAMPLEPROFILEINFERENCE_H
				#define LLVM_TRANSFORMS_UTILS_SAMPLEPROFILEINFERENCE_H

				#include "llvm/ADT/DenseMap.h"
				#include "llvm/ADT/DepthFirstIterator.h"
				#include "llvm/ADT/SmallVector.h"

				#include "llvm/IR/BasicBlock.h"
				#include "llvm/IR/Instruction.h"
				#include "llvm/IR/Instructions.h"

				namespace llvm {

				class BasicBlock;
				class Function;
				class MachineBasicBlock;
				class MachineFunction;

				namespace afdo_detail {

				template <class BlockT> struct TypeMap {};
				template <> struct TypeMap<BasicBlock> {
				using BasicBlockT = BasicBlock;
				using FunctionT = Function;
				};
				template <> struct TypeMap<MachineBasicBlock> {
				using BasicBlockT = MachineBasicBlock;
				using FunctionT = MachineFunction;
				};

				} // end namespace afdo_detail

				struct FlowJump;

				/// A wrapper of a binary basic block.
				struct FlowBlock {
				uint64_t Index;
				uint64_t Weight{0};
				bool UnknownWeight{false};
				uint64_t Flow{0};
				bool HasSelfEdge{false};
				std::vector<FlowJump *> SuccJumps;
				std::vector<FlowJump *> PredJumps;

				/// Check if it is the entry block in the function.
				bool isEntry() const { return PredJumps.empty(); }

				/// Check if it is an exit block in the function.
				bool isExit() const { return SuccJumps.empty(); }
				};

				/// A wrapper of a jump between two basic blocks.
				struct FlowJump {
				uint64_t Source;
				uint64_t Target;
				uint64_t Flow{0};
				bool IsUnlikely{false};
				};

				/// A wrapper of binary function with basic blocks and jumps.
				struct FlowFunction {
				std::vector<FlowBlock> Blocks;
				std::vector<FlowJump> Jumps;
				/// The index of the entry block.
				uint64_t Entry;
				};

				void applyFlowInference(FlowFunction &Func);

				/// Sample profile inference pass.
				template <typename BT> class SampleProfileInference {
				public:
				using BasicBlockT = typename afdo_detail::TypeMap<BT>::BasicBlockT;
				using FunctionT = typename afdo_detail::TypeMap<BT>::FunctionT;
				using Edge = std::pair<const BasicBlockT , const BasicBlockT >;
				using BlockWeightMap = DenseMap<const BasicBlockT *, uint64_t>;
				using EdgeWeightMap = DenseMap<Edge, uint64_t>;
				using BlockEdgeMap =
				DenseMap<const BasicBlockT , SmallVector<const BasicBlockT , 8>>;

				SampleProfileInference(FunctionT &F, BlockEdgeMap &Successors,
				BlockWeightMap &SampleBlockWeights)
				: F(F), Successors(Successors), SampleBlockWeights(SampleBlockWeights) {}

				/// Apply the profile inference algorithm for a given function
				void apply(BlockWeightMap &BlockWeights, EdgeWeightMap &EdgeWeights);

				private:
				/// Try to infer branch probabilities mimicking implementation of
				/// BranchProbabilityInfo. Unlikely taken branches are marked so that the
				/// inference algorithm can avoid sending flow along corresponding edges.
				void findUnlikelyJumps(const std::vector<const BasicBlockT *> &BasicBlocks,
				BlockEdgeMap &Successors, FlowFunction &Func);

				/// Determine whether the block is an exit in the CFG.
				bool isExit(const BasicBlockT *BB);
				wenleiUnsubmitted Not Done Reply Inline Actions same here. wenlei: same here.

				/// Function.
				const FunctionT &F;

				/// Successors for each basic block in the CFG.
				BlockEdgeMap &Successors;

				/// Map basic blocks to their sampled weights.
				BlockWeightMap &SampleBlockWeights;
				};

				template <typename BT>
				void SampleProfileInference<BT>::apply(BlockWeightMap &BlockWeights,
				EdgeWeightMap &EdgeWeights) {
				// Find all forwards reachable blocks which the inference algorithm will be
				// applied on.
				df_iterator_default_set<const BasicBlockT *> Reachable;
				for (auto *BB : depth_first_ext(&F, Reachable))
				(void)BB /* Mark all reachable blocks */;

				// Find all backwards reachable blocks which the inference algorithm will be
				// applied on.
				df_iterator_default_set<const BasicBlockT *> InverseReachable;
				for (const auto &BB : F) {
				// An exit block is a block without any successors.
				if (isExit(&BB)) {
				for (auto *RBB : inverse_depth_first_ext(&BB, InverseReachable))
				(void)RBB;
				}
				}

				// Keep a stable order for reachable blocks
				DenseMap<const BasicBlockT *, uint64_t> BlockIndex;
				std::vector<const BasicBlockT *> BasicBlocks;
				BlockIndex.reserve(Reachable.size());
				BasicBlocks.reserve(Reachable.size());
				for (const auto &BB : F) {
				if (Reachable.count(&BB) && InverseReachable.count(&BB)) {
				BlockIndex[&BB] = BasicBlocks.size();
				BasicBlocks.push_back(&BB);
				}
				}

				BlockWeights.clear();
				EdgeWeights.clear();
				bool HasSamples = false;
				for (const auto *BB : BasicBlocks) {
				auto It = SampleBlockWeights.find(BB);
				if (It != SampleBlockWeights.end() && It->second > 0) {
				HasSamples = true;
				BlockWeights[BB] = It->second;
				}
				}
				// Quit early for functions with a single block or ones w/o samples
				if (BasicBlocks.size() <= 1 \|\| !HasSamples) {
				return;
				}

				// Create necessary objects
				FlowFunction Func;
				Func.Blocks.reserve(BasicBlocks.size());
				// Create FlowBlocks
				for (const auto *BB : BasicBlocks) {
				FlowBlock Block;
				if (SampleBlockWeights.find(BB) != SampleBlockWeights.end()) {
				Block.UnknownWeight = false;
				Block.Weight = SampleBlockWeights[BB];
				} else {
				Block.UnknownWeight = true;
				Block.Weight = 0;
				}
				Block.Index = Func.Blocks.size();
				Func.Blocks.push_back(Block);
				}
				// Create FlowEdges
				for (const auto *BB : BasicBlocks) {
				for (auto *Succ : Successors[BB]) {
				if (!BlockIndex.count(Succ))
				continue;
				FlowJump Jump;
				Jump.Source = BlockIndex[BB];
				Jump.Target = BlockIndex[Succ];
				Func.Jumps.push_back(Jump);
				if (BB == Succ) {
				Func.Blocks[BlockIndex[BB]].HasSelfEdge = true;
				}
				}
				}
				for (auto &Jump : Func.Jumps) {
				Func.Blocks[Jump.Source].SuccJumps.push_back(&Jump);
				Func.Blocks[Jump.Target].PredJumps.push_back(&Jump);
				}

				// Try to infer probabilities of jumps based on the content of basic block
				findUnlikelyJumps(BasicBlocks, Successors, Func);

				// Find the entry block
				for (size_t I = 0; I < Func.Blocks.size(); I++) {
				if (Func.Blocks[I].isEntry()) {
				Func.Entry = I;
				break;
				}
				}

				// Create and apply the inference network model.
				applyFlowInference(Func);

				// Extract the resulting weights from the control flow
				// All weights are increased by one to avoid propagation errors introduced by
				// zero weights.
				for (const auto *BB : BasicBlocks) {
				BlockWeights[BB] = Func.Blocks[BlockIndex[BB]].Flow;
				}
				for (auto &Jump : Func.Jumps) {
				Edge E = std::make_pair(BasicBlocks[Jump.Source], BasicBlocks[Jump.Target]);
				EdgeWeights[E] = Jump.Flow;
				}

				#ifndef NDEBUG
				// Unreachable blocks and edges should not have a weight.
				for (auto &I : BlockWeights) {
				assert(Reachable.contains(I.first));
				assert(InverseReachable.contains(I.first));
				}
				for (auto &I : EdgeWeights) {
				assert(Reachable.contains(I.first.first) &&
				Reachable.contains(I.first.second));
				assert(InverseReachable.contains(I.first.first) &&
				InverseReachable.contains(I.first.second));
				}
				#endif
				}

				template <typename BT>
				inline void SampleProfileInference<BT>::findUnlikelyJumps(
				const std::vector<const BasicBlockT *> &BasicBlocks,
				BlockEdgeMap &Successors, FlowFunction &Func) {}

				template <>
				inline void SampleProfileInference<BasicBlock>::findUnlikelyJumps(
				const std::vector<const BasicBlockT *> &BasicBlocks,
				BlockEdgeMap &Successors, FlowFunction &Func) {
				for (auto &Jump : Func.Jumps) {
				const auto *BB = BasicBlocks[Jump.Source];
				const auto *Succ = BasicBlocks[Jump.Target];
				const Instruction *TI = BB->getTerminator();
				// Check if a block ends with InvokeInst and mark non-taken branch unlikely.
				// In that case block Succ should be a landing pad
				if (Successors[BB].size() == 2 && Successors[BB].back() == Succ) {
				if (isa<InvokeInst>(TI)) {
				Jump.IsUnlikely = true;
				}
				}
				const Instruction *SuccTI = Succ->getTerminator();
				// Check if the target block contains UnreachableInst and mark it unlikely
				if (SuccTI->getNumSuccessors() == 0) {
				if (isa<UnreachableInst>(SuccTI)) {
				Jump.IsUnlikely = true;
				}
				}
				}
				}

				template <typename BT>
				inline bool SampleProfileInference<BT>::isExit(const BasicBlockT *BB) {
				wenleiUnsubmitted Not Done Reply Inline Actions nit, fix comment? same for line 278 wenlei: nit, fix comment? same for line 278
				return BB->succ_empty();
				}

				template <>
				inline bool SampleProfileInference<BasicBlock>::isExit(const BasicBlock *BB) {
				return succ_empty(BB);
				}

				} // end namespace llvm
				#endif // LLVM_TRANSFORMS_UTILS_SAMPLEPROFILEINFERENCE_H

llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h

Show All 32 Lines
#include "llvm/IR/Instruction.h"		#include "llvm/IR/Instruction.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/ProfileData/SampleProf.h"		#include "llvm/ProfileData/SampleProf.h"
#include "llvm/ProfileData/SampleProfReader.h"		#include "llvm/ProfileData/SampleProfReader.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/GenericDomTree.h"		#include "llvm/Support/GenericDomTree.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
		#include "llvm/Transforms/Utils/SampleProfileInference.h"
#include "llvm/Transforms/Utils/SampleProfileLoaderBaseUtil.h"		#include "llvm/Transforms/Utils/SampleProfileLoaderBaseUtil.h"

namespace llvm {		namespace llvm {
using namespace sampleprof;		using namespace sampleprof;
using namespace sampleprofutil;		using namespace sampleprofutil;
using ProfileCount = Function::ProfileCount;		using ProfileCount = Function::ProfileCount;

#define DEBUG_TYPE "sample-profile-impl"		#define DEBUG_TYPE "sample-profile-impl"
Show All 20 Lines	static const BasicBlock getEntryBB(const Function F) {
return &F->getEntryBlock();		return &F->getEntryBlock();
}		}
static pred_range getPredecessors(BasicBlock *BB) { return predecessors(BB); }		static pred_range getPredecessors(BasicBlock *BB) { return predecessors(BB); }
static succ_range getSuccessors(BasicBlock *BB) { return successors(BB); }		static succ_range getSuccessors(BasicBlock *BB) { return successors(BB); }
};		};

} // end namespace afdo_detail		} // end namespace afdo_detail

		extern cl::opt<bool> SampleProfileUseProfi;

template <typename BT> class SampleProfileLoaderBaseImpl {		template <typename BT> class SampleProfileLoaderBaseImpl {
public:		public:
SampleProfileLoaderBaseImpl(std::string Name, std::string RemapName)		SampleProfileLoaderBaseImpl(std::string Name, std::string RemapName)
: Filename(Name), RemappingFilename(RemapName) {}		: Filename(Name), RemappingFilename(RemapName) {}
void dump() { Reader->dump(); }		void dump() { Reader->dump(); }

using InstructionT = typename afdo_detail::IRTraits<BT>::InstructionT;		using InstructionT = typename afdo_detail::IRTraits<BT>::InstructionT;
using BasicBlockT = typename afdo_detail::IRTraits<BT>::BasicBlockT;		using BasicBlockT = typename afdo_detail::IRTraits<BT>::BasicBlockT;
▲ Show 20 Lines • Show All 52 Lines • ▼ Show 20 Lines	protected:
void printBlockWeight(raw_ostream &OS, const BasicBlockT *BB) const;		void printBlockWeight(raw_ostream &OS, const BasicBlockT *BB) const;
void printBlockEquivalence(raw_ostream &OS, const BasicBlockT *BB);		void printBlockEquivalence(raw_ostream &OS, const BasicBlockT *BB);
bool computeBlockWeights(FunctionT &F);		bool computeBlockWeights(FunctionT &F);
void findEquivalenceClasses(FunctionT &F);		void findEquivalenceClasses(FunctionT &F);
void findEquivalencesFor(BasicBlockT *BB1,		void findEquivalencesFor(BasicBlockT *BB1,
ArrayRef<BasicBlockT *> Descendants,		ArrayRef<BasicBlockT *> Descendants,
PostDominatorTreeT *DomTree);		PostDominatorTreeT *DomTree);
void propagateWeights(FunctionT &F);		void propagateWeights(FunctionT &F);
		void applyProfi(FunctionT &F, BlockEdgeMap &Successors,
		hoyUnsubmitted Done Reply Inline Actions nit: no need to use the `inline` keyword. hoy: nit: no need to use the `inline` keyword.
		BlockWeightMap &SampleBlockWeights,
		BlockWeightMap &BlockWeights, EdgeWeightMap &EdgeWeights);
uint64_t visitEdge(Edge E, unsigned NumUnknownEdges, Edge UnknownEdge);		uint64_t visitEdge(Edge E, unsigned NumUnknownEdges, Edge UnknownEdge);
void buildEdges(FunctionT &F);		void buildEdges(FunctionT &F);
bool propagateThroughEdges(FunctionT &F, bool UpdateBlockCount);		bool propagateThroughEdges(FunctionT &F, bool UpdateBlockCount);
void clearFunctionData(bool ResetDT = true);		void clearFunctionData(bool ResetDT = true);
void computeDominanceAndLoopInfo(FunctionT &F);		void computeDominanceAndLoopInfo(FunctionT &F);
bool		bool
computeAndPropagateWeights(FunctionT &F,		computeAndPropagateWeights(FunctionT &F,
const DenseSet<GlobalValue::GUID> &InlinedGUIDs);		const DenseSet<GlobalValue::GUID> &InlinedGUIDs);
		void initWeightPropagation(FunctionT &F,
		const DenseSet<GlobalValue::GUID> &InlinedGUIDs);
		void
		finalizeWeightPropagation(FunctionT &F,
		const DenseSet<GlobalValue::GUID> &InlinedGUIDs);
void emitCoverageRemarks(FunctionT &F);		void emitCoverageRemarks(FunctionT &F);

/// Map basic blocks to their computed weights.		/// Map basic blocks to their computed weights.
///		///
/// The weight of a basic block is defined to be the maximum		/// The weight of a basic block is defined to be the maximum
/// of all the instruction weights in that block.		/// of all the instruction weights in that block.
BlockWeightMap BlockWeights;		BlockWeightMap BlockWeights;

▲ Show 20 Lines • Show All 575 Lines • ▼ Show 20 Lines
/// we set the unknown edge weight to zero.		/// we set the unknown edge weight to zero.
///		///
/// - If there is a self-referential edge, and the weight of the block is		/// - If there is a self-referential edge, and the weight of the block is
/// known, the weight for that edge is set to the weight of the block		/// known, the weight for that edge is set to the weight of the block
/// minus the weight of the other incoming edges to that block (if		/// minus the weight of the other incoming edges to that block (if
/// known).		/// known).
template <typename BT>		template <typename BT>
void SampleProfileLoaderBaseImpl<BT>::propagateWeights(FunctionT &F) {		void SampleProfileLoaderBaseImpl<BT>::propagateWeights(FunctionT &F) {
		// Flow-based profile inference is only usable with BasicBlock instantiation
		// of SampleProfileLoaderBaseImpl.
		if (SampleProfileUseProfi) {
		wenleiUnsubmitted Done Reply Inline Actions Performance and count quality aside, can we make it work with MIR here just to make sure functionality is there. i.e. we can probably have a test case for MIR+Profi using FS-AFDO. Making sure it generates better count for MIR can be dealt with separately later. wenlei: Performance and count quality aside, can we make it work with MIR here just to make sure…
		// Prepare block sample counts for inference.
		BlockWeightMap SampleBlockWeights;
		for (const auto &BI : F) {
		ErrorOr<uint64_t> Weight = getBlockWeight(&BI);
		if (Weight)
		SampleBlockWeights[&BI] = Weight.get();
		}
		// Fill in BlockWeights and EdgeWeights using an inference algorithm.
		applyProfi(F, Successors, SampleBlockWeights, BlockWeights, EdgeWeights);
		} else {
bool Changed = true;		bool Changed = true;
unsigned I = 0;		unsigned I = 0;

// If BB weight is larger than its corresponding loop's header BB weight,		// If BB weight is larger than its corresponding loop's header BB weight,
// use the BB weight to replace the loop header BB weight.		// use the BB weight to replace the loop header BB weight.
for (auto &BI : F) {		for (auto &BI : F) {
BasicBlockT *BB = &BI;		BasicBlockT *BB = &BI;
LoopT *L = LI->getLoopFor(BB);		LoopT *L = LI->getLoopFor(BB);
if (!L) {		if (!L) {
continue;		continue;
}		}
BasicBlockT *Header = L->getHeader();		BasicBlockT *Header = L->getHeader();
if (Header && BlockWeights[BB] > BlockWeights[Header]) {		if (Header && BlockWeights[BB] > BlockWeights[Header]) {
BlockWeights[Header] = BlockWeights[BB];		BlockWeights[Header] = BlockWeights[BB];
}		}
}		}

// Before propagation starts, build, for each block, a list of
// unique predecessors and successors. This is necessary to handle
// identical edges in multiway branches. Since we visit all blocks and all
// edges of the CFG, it is cleaner to build these lists once at the start
// of the pass.
buildEdges(F);

// Propagate until we converge or we go past the iteration limit.		// Propagate until we converge or we go past the iteration limit.
while (Changed && I++ < SampleProfileMaxPropagateIterations) {		while (Changed && I++ < SampleProfileMaxPropagateIterations) {
Changed = propagateThroughEdges(F, false);		Changed = propagateThroughEdges(F, false);
}		}

// The first propagation propagates BB counts from annotated BBs to unknown		// The first propagation propagates BB counts from annotated BBs to unknown
// BBs. The 2nd propagation pass resets edges weights, and use all BB weights		// BBs. The 2nd propagation pass resets edges weights, and use all BB
// to propagate edge weights.		// weights to propagate edge weights.
VisitedEdges.clear();		VisitedEdges.clear();
Changed = true;		Changed = true;
while (Changed && I++ < SampleProfileMaxPropagateIterations) {		while (Changed && I++ < SampleProfileMaxPropagateIterations) {
Changed = propagateThroughEdges(F, false);		Changed = propagateThroughEdges(F, false);
}		}

// The 3rd propagation pass allows adjust annotated BB weights that are		// The 3rd propagation pass allows adjust annotated BB weights that are
// obviously wrong.		// obviously wrong.
Changed = true;		Changed = true;
while (Changed && I++ < SampleProfileMaxPropagateIterations) {		while (Changed && I++ < SampleProfileMaxPropagateIterations) {
Changed = propagateThroughEdges(F, true);		Changed = propagateThroughEdges(F, true);
}		}
}		}
		}

		template <typename BT>
		void SampleProfileLoaderBaseImpl<BT>::applyProfi(
		FunctionT &F, BlockEdgeMap &Successors, BlockWeightMap &SampleBlockWeights,
		BlockWeightMap &BlockWeights, EdgeWeightMap &EdgeWeights) {
		auto Infer = SampleProfileInference<BT>(F, Successors, SampleBlockWeights);
		Infer.apply(BlockWeights, EdgeWeights);
		}

/// Generate branch weight metadata for all branches in \p F.		/// Generate branch weight metadata for all branches in \p F.
///		///
/// Branch weights are computed out of instruction samples using a		/// Branch weights are computed out of instruction samples using a
/// propagation heuristic. Propagation proceeds in 3 phases:		/// propagation heuristic. Propagation proceeds in 3 phases:
///		///
/// 1- Assignment of block weights. All the basic blocks in the function		/// 1- Assignment of block weights. All the basic blocks in the function
/// are initial assigned the same weight as their most frequently		/// are initial assigned the same weight as their most frequently
Show All 40 Lines
bool SampleProfileLoaderBaseImpl<BT>::computeAndPropagateWeights(		bool SampleProfileLoaderBaseImpl<BT>::computeAndPropagateWeights(
FunctionT &F, const DenseSet<GlobalValue::GUID> &InlinedGUIDs) {		FunctionT &F, const DenseSet<GlobalValue::GUID> &InlinedGUIDs) {
bool Changed = (InlinedGUIDs.size() != 0);		bool Changed = (InlinedGUIDs.size() != 0);

// Compute basic block weights.		// Compute basic block weights.
Changed \|= computeBlockWeights(F);		Changed \|= computeBlockWeights(F);

if (Changed) {		if (Changed) {
		// Initialize propagation.
		initWeightPropagation(F, InlinedGUIDs);

		// Propagate weights to all edges.
		propagateWeights(F);

		// Post-process propagated weights.
		finalizeWeightPropagation(F, InlinedGUIDs);
		}

		return Changed;
		}

		template <typename BT>
		void SampleProfileLoaderBaseImpl<BT>::initWeightPropagation(
		FunctionT &F, const DenseSet<GlobalValue::GUID> &InlinedGUIDs) {
// Add an entry count to the function using the samples gathered at the		// Add an entry count to the function using the samples gathered at the
// function entry.		// function entry.
// Sets the GUIDs that are inlined in the profiled binary. This is used		// Sets the GUIDs that are inlined in the profiled binary. This is used
// for ThinLink to make correct liveness analysis, and also make the IR		// for ThinLink to make correct liveness analysis, and also make the IR
// match the profiled binary before annotation.		// match the profiled binary before annotation.
getFunction(F).setEntryCount(		getFunction(F).setEntryCount(
ProfileCount(Samples->getHeadSamples() + 1, Function::PCT_Real),		ProfileCount(Samples->getHeadSamples() + 1, Function::PCT_Real),
&InlinedGUIDs);		&InlinedGUIDs);

		if (!SampleProfileUseProfi) {
// Compute dominance and loop info needed for propagation.		// Compute dominance and loop info needed for propagation.
computeDominanceAndLoopInfo(F);		computeDominanceAndLoopInfo(F);
		davidxlUnsubmitted Done Reply Inline Actions It is probably better to restructure the code a little more. Add a helper routine called "PrepareForPropagation(..)' or "InitForPropagation'. In this helper buildEdges; compute equiv class for non-profi or profi specific initialization (basically move some code from propagateWeights' to here. davidxl: It is probably better to restructure the code a little more. Add a helper routine called…
		hoyUnsubmitted Done Reply Inline Actions We still need this for MIR even if `SampleProfileUseProfi` is true. hoy: We still need this for MIR even if `SampleProfileUseProfi` is true.

// Find equivalence classes.		// Find equivalence classes.
findEquivalenceClasses(F);		findEquivalenceClasses(F);
		}

// Propagate weights to all edges.		// Before propagation starts, build, for each block, a list of
propagateWeights(F);		// unique predecessors and successors. This is necessary to handle
		// identical edges in multiway branches. Since we visit all blocks and all
		// edges of the CFG, it is cleaner to build these lists once at the start
		// of the pass.
		buildEdges(F);
		davidxlUnsubmitted Done Reply Inline Actions Move it into a Finalize helper method? davidxl: Move it into a Finalize helper method?
}		}

return Changed;		template <typename BT>
		void SampleProfileLoaderBaseImpl<BT>::finalizeWeightPropagation(
		FunctionT &F, const DenseSet<GlobalValue::GUID> &InlinedGUIDs) {
		// If we utilize a flow-based count inference, then we trust the computed
		// counts and set the entry count as computed by the algorithm. This is
		// primarily done to sync the counts produced by profi and BFI inference,
		// which uses the entry count for mass propagation.
		// If profi produces a zero-value for the entry count, we fallback to
		// Samples->getHeadSamples() + 1 to avoid functions with zero count.
		if (SampleProfileUseProfi) {
		const BasicBlockT *EntryBB = getEntryBB(&F);
		if (BlockWeights[EntryBB] > 0) {
		getFunction(F).setEntryCount(
		ProfileCount(BlockWeights[EntryBB], Function::PCT_Real),
		&InlinedGUIDs);
		}
		}
}		}

template <typename BT>		template <typename BT>
void SampleProfileLoaderBaseImpl<BT>::emitCoverageRemarks(FunctionT &F) {		void SampleProfileLoaderBaseImpl<BT>::emitCoverageRemarks(FunctionT &F) {
// If coverage checking was requested, compute it now.		// If coverage checking was requested, compute it now.
const Function &Func = getFunction(F);		const Function &Func = getFunction(F);
if (SampleProfileRecordCoverage) {		if (SampleProfileRecordCoverage) {
unsigned Used = CoverageTracker.countUsedRecords(Samples, PSI);		unsigned Used = CoverageTracker.countUsedRecords(Samples, PSI);
▲ Show 20 Lines • Show All 70 Lines • Show Last 20 Lines

llvm/lib/Transforms/IPO/SampleProfile.cpp

Show First 20 Lines • Show All 78 Lines • ▼ Show 20 Lines
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/IPO.h"		#include "llvm/Transforms/IPO.h"
#include "llvm/Transforms/IPO/ProfiledCallGraph.h"		#include "llvm/Transforms/IPO/ProfiledCallGraph.h"
#include "llvm/Transforms/IPO/SampleContextTracker.h"		#include "llvm/Transforms/IPO/SampleContextTracker.h"
#include "llvm/Transforms/IPO/SampleProfileProbe.h"		#include "llvm/Transforms/IPO/SampleProfileProbe.h"
#include "llvm/Transforms/Instrumentation.h"		#include "llvm/Transforms/Instrumentation.h"
#include "llvm/Transforms/Utils/CallPromotionUtils.h"		#include "llvm/Transforms/Utils/CallPromotionUtils.h"
#include "llvm/Transforms/Utils/Cloning.h"		#include "llvm/Transforms/Utils/Cloning.h"
		#include "llvm/Transforms/Utils/SampleProfileInference.h"
#include "llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h"		#include "llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h"
#include "llvm/Transforms/Utils/SampleProfileLoaderBaseUtil.h"		#include "llvm/Transforms/Utils/SampleProfileLoaderBaseUtil.h"
#include <algorithm>		#include <algorithm>
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>
#include <functional>		#include <functional>
#include <limits>		#include <limits>
#include <map>		#include <map>
▲ Show 20 Lines • Show All 1,551 Lines • ▼ Show 20 Lines	for (auto &BI : F) {
DebugLoc BranchLoc = TI->getDebugLoc();		DebugLoc BranchLoc = TI->getDebugLoc();
LLVM_DEBUG(dbgs() << "\nGetting weights for branch at line "		LLVM_DEBUG(dbgs() << "\nGetting weights for branch at line "
<< ((BranchLoc) ? Twine(BranchLoc.getLine())		<< ((BranchLoc) ? Twine(BranchLoc.getLine())
: Twine("<UNKNOWN LOCATION>"))		: Twine("<UNKNOWN LOCATION>"))
<< ".\n");		<< ".\n");
SmallVector<uint32_t, 4> Weights;		SmallVector<uint32_t, 4> Weights;
uint32_t MaxWeight = 0;		uint32_t MaxWeight = 0;
Instruction *MaxDestInst;		Instruction *MaxDestInst;
		// Since profi treats multiple edges (multiway branches) as a single edge,
		davidxlUnsubmitted Done Reply Inline Actions Is this a limitation? Can it be made to handle multi-graph? davidxl: Is this a limitation? Can it be made to handle multi-graph?
		spupyrevAuthorUnsubmitted Done Reply Inline Actions The inference algorithm can process multi-graphs -- the questions is how do we store/process such edges outside of inference. The current implementation (see `SampleProfileLoaderBaseImpl::buildEdges`) merges all multiway branches into a single edge. We could probably modify `buildEdges` and avoid merging, but that will change the existing non-profi functionality. An alternative is to apply this post-processing and keep the existing implementation as is. spupyrev: The inference algorithm can process multi-graphs -- the questions is how do we store/process…
		// we need to distribute the computed weight among the branches. We do
		// this by evenly splitting the edge weight among destinations.
		DenseMap<const BasicBlock *, uint64_t> EdgeMultiplicity;
		std::vector<uint64_t> EdgeIndex;
		if (SampleProfileUseProfi) {
		EdgeIndex.resize(TI->getNumSuccessors());
		for (unsigned I = 0; I < TI->getNumSuccessors(); ++I) {
		const BasicBlock *Succ = TI->getSuccessor(I);
		EdgeIndex[I] = EdgeMultiplicity[Succ];
		EdgeMultiplicity[Succ]++;
		}
		}
for (unsigned I = 0; I < TI->getNumSuccessors(); ++I) {		for (unsigned I = 0; I < TI->getNumSuccessors(); ++I) {
BasicBlock *Succ = TI->getSuccessor(I);		BasicBlock *Succ = TI->getSuccessor(I);
Edge E = std::make_pair(BB, Succ);		Edge E = std::make_pair(BB, Succ);
uint64_t Weight = EdgeWeights[E];		uint64_t Weight = EdgeWeights[E];
LLVM_DEBUG(dbgs() << "\t"; printEdgeWeight(dbgs(), E));		LLVM_DEBUG(dbgs() << "\t"; printEdgeWeight(dbgs(), E));
// Use uint32_t saturated arithmetic to adjust the incoming weights,		// Use uint32_t saturated arithmetic to adjust the incoming weights,
// if needed. Sample counts in profiles are 64-bit unsigned values,		// if needed. Sample counts in profiles are 64-bit unsigned values,
// but internally branch weights are expressed as 32-bit values.		// but internally branch weights are expressed as 32-bit values.
if (Weight > std::numeric_limits<uint32_t>::max()) {		if (Weight > std::numeric_limits<uint32_t>::max()) {
LLVM_DEBUG(dbgs() << " (saturated due to uint32_t overflow)");		LLVM_DEBUG(dbgs() << " (saturated due to uint32_t overflow)");
Weight = std::numeric_limits<uint32_t>::max();		Weight = std::numeric_limits<uint32_t>::max();
}		}
		if (!SampleProfileUseProfi) {
// Weight is added by one to avoid propagation errors introduced by		// Weight is added by one to avoid propagation errors introduced by
// 0 weights.		// 0 weights.
Weights.push_back(static_cast<uint32_t>(Weight + 1));		Weights.push_back(static_cast<uint32_t>(Weight + 1));
		} else {
		// Profi creates proper weights that do not require "+1" adjustments but
		// we evenly split the weight among branches with the same destination.
		uint64_t W = Weight / EdgeMultiplicity[Succ];
		// Rounding up, if needed, so that first branches are hotter.
		if (EdgeIndex[I] < Weight % EdgeMultiplicity[Succ])
		W++;
		Weights.push_back(static_cast<uint32_t>(W));
		}
if (Weight != 0) {		if (Weight != 0) {
if (Weight > MaxWeight) {		if (Weight > MaxWeight) {
MaxWeight = Weight;		MaxWeight = Weight;
MaxDestInst = Succ->getFirstNonPHIOrDbgOrLifetime();		MaxDestInst = Succ->getFirstNonPHIOrDbgOrLifetime();
}		}
}		}
}		}

▲ Show 20 Lines • Show All 481 Lines • Show Last 20 Lines

llvm/lib/Transforms/Utils/CMakeLists.txt

Show First 20 Lines • Show All 54 Lines • ▼ Show 20 Lines	add_llvm_component_library(LLVMTransformUtils
PredicateInfo.cpp		PredicateInfo.cpp
PromoteMemoryToRegister.cpp		PromoteMemoryToRegister.cpp
RelLookupTableConverter.cpp		RelLookupTableConverter.cpp
ScalarEvolutionExpander.cpp		ScalarEvolutionExpander.cpp
SCCPSolver.cpp		SCCPSolver.cpp
StripGCRelocates.cpp		StripGCRelocates.cpp
SSAUpdater.cpp		SSAUpdater.cpp
SSAUpdaterBulk.cpp		SSAUpdaterBulk.cpp
		SampleProfileInference.cpp
SampleProfileLoaderBaseUtil.cpp		SampleProfileLoaderBaseUtil.cpp
SanitizerStats.cpp		SanitizerStats.cpp
SimplifyCFG.cpp		SimplifyCFG.cpp
SimplifyIndVar.cpp		SimplifyIndVar.cpp
SimplifyLibCalls.cpp		SimplifyLibCalls.cpp
SizeOpts.cpp		SizeOpts.cpp
SplitModule.cpp		SplitModule.cpp
StripNonLineTableDebugInfo.cpp		StripNonLineTableDebugInfo.cpp
Show All 19 Lines

llvm/lib/Transforms/Utils/SampleProfileInference.cpp

This file was added.

				//===- SampleProfileInference.cpp - Adjust sample profiles in the IR ------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements a profile inference algorithm. Given an incomplete and
				// possibly imprecise block counts, the algorithm reconstructs realistic block
				// and edge counts that satisfy flow conservation rules, while minimally modify
				// input block counts.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/Transforms/Utils/SampleProfileInference.h"
				#include "llvm/Support/Debug.h"
				#include <queue>
				#include <set>

				using namespace llvm;
				#define DEBUG_TYPE "sample-profile-inference"

				namespace {

				/// A value indicating an infinite flow/capacity/weight of a block/edge.
				/// Not using numeric_limits<int64_t>::max(), as the values can be summed up
				/// during the execution.
				static constexpr int64_t INF = ((int64_t)1) << 50;

				/// The minimum-cost maximum flow algorithm.
				///
				/// The algorithm finds the maximum flow of minimum cost on a given (directed)
				/// network using a modified version of the classical Moore-Bellman-Ford
				/// approach. The algorithm applies a number of augmentation iterations in which
				/// flow is sent along paths of positive capacity from the source to the sink.
				/// The worst-case time complexity of the implementation is O(v(f)mn), where
				/// where m is the number of edges, n is the number of vertices, and v(f) is the
				/// value of the maximum flow. However, the observed running time on typical
				/// instances is sub-quadratic, that is, o(n^2).
				///
				/// The input is a set of edges with specified costs and capacities, and a pair
				/// of nodes (source and sink). The output is the flow along each edge of the
				/// minimum total cost respecting the given edge capacities.
				class MinCostMaxFlow {
				public:
				// Initialize algorithm's data structures for a network of a given size.
				void initialize(uint64_t NodeCount, uint64_t SourceNode, uint64_t SinkNode) {
				Source = SourceNode;
				Target = SinkNode;

				Nodes = std::vector<Node>(NodeCount);
				Edges = std::vector<std::vector<Edge>>(NodeCount, std::vector<Edge>());
				}

				// Run the algorithm.
				int64_t run() {
				// Find an augmenting path and update the flow along the path
				size_t AugmentationIters = 0;
				while (findAugmentingPath()) {
				augmentFlowAlongPath();
				AugmentationIters++;
				}

				// Compute the total flow and its cost
				int64_t TotalCost = 0;
				int64_t TotalFlow = 0;
				for (uint64_t Src = 0; Src < Nodes.size(); Src++) {
				for (auto &Edge : Edges[Src]) {
				if (Edge.Flow > 0) {
				TotalCost += Edge.Cost * Edge.Flow;
				if (Src == Source)
				TotalFlow += Edge.Flow;
				}
				}
				}
				rajeshwarvUnsubmitted Done Reply Inline Actions Could you please why the time complexity is not O(VE^2) per http://go/wikip/Edmonds%E2%80%93Karp_algorithm and it is actually O(...) as described here? Also, what is n? rajeshwarv:* Could you please why the time complexity is not O(V*E^2) per http…
				spupyrevAuthorUnsubmitted Done Reply Inline Actions Unfortunately, it is not trivial to derive a tight worst-case estimation of our implementation, since we modify the classical algorithm (see in particular lines 206-209). For sure we're not doing more than v(f) augmentation iterations and each of them takes at most nm steps. But I don't know if this worst-case bound can be achieved on some instance. Furthermore, the worst-case complexity is a misleading concept to look into, since control-flow graphs have a "nice" structure. Typically the observed runtime is slightly super-linear. I've updated the comment, feel free to suggest more explanations. spupyrev:* Unfortunately, it is not trivial to derive a tight worst-case estimation of our implementation…
				LLVM_DEBUG(dbgs() << "Completed profi after " << AugmentationIters
				<< " iterations with " << TotalFlow << " total flow"
				<< " of " << TotalCost << " cost\n");
				return TotalCost;
				}

				/// Adding an edge to the network with a specified capacity and a cost.
				/// Multiple edges between a pair of nodes are allowed but self-edges
				/// are not supported.
				void addEdge(uint64_t Src, uint64_t Dst, int64_t Capacity, int64_t Cost) {
				assert(Capacity > 0 && "adding an edge of zero capacity");
				assert(Src != Dst && "loop edge are not supported");

				Edge SrcEdge;
				SrcEdge.Dst = Dst;
				SrcEdge.Cost = Cost;
				SrcEdge.Capacity = Capacity;
				SrcEdge.Flow = 0;
				SrcEdge.RevEdgeIndex = Edges[Dst].size();

				Edge DstEdge;
				DstEdge.Dst = Src;
				DstEdge.Cost = -Cost;
				DstEdge.Capacity = 0;
				DstEdge.Flow = 0;
				DstEdge.RevEdgeIndex = Edges[Src].size();

				Edges[Src].push_back(SrcEdge);
				Edges[Dst].push_back(DstEdge);
				}

				/// Adding an edge to the network of infinite capacity and a given cost.
				void addEdge(uint64_t Src, uint64_t Dst, int64_t Cost) {
				addEdge(Src, Dst, INF, Cost);
				}

				/// Get the total flow from a given source node.
				/// Returns a list of pairs (target node, amount of flow to the target).
				const std::vector<std::pair<uint64_t, int64_t>> getFlow(uint64_t Src) const {
				std::vector<std::pair<uint64_t, int64_t>> Flow;
				for (auto &Edge : Edges[Src]) {
				if (Edge.Flow > 0)
				Flow.push_back(std::make_pair(Edge.Dst, Edge.Flow));
				}
				return Flow;
				}

				/// Get the total flow between a pair of nodes.
				int64_t getFlow(uint64_t Src, uint64_t Dst) const {
				int64_t Flow = 0;
				for (auto &Edge : Edges[Src]) {
				if (Edge.Dst == Dst) {
				Flow += Edge.Flow;
				}
				}
				return Flow;
				}

				/// A cost of increasing a block's count by one.
				static constexpr int64_t AuxCostInc = 10;
				/// A cost of decreasing a block's count by one.
				static constexpr int64_t AuxCostDec = 20;
				/// A cost of increasing a count of zero-weight block by one.
				static constexpr int64_t AuxCostIncZero = 11;
				/// A cost of increasing the entry block's count by one.
				static constexpr int64_t AuxCostIncEntry = 40;
				/// A cost of decreasing the entry block's count by one.
				static constexpr int64_t AuxCostDecEntry = 10;
				/// A cost of taking an unlikely jump.
				static constexpr int64_t AuxCostUnlikely = ((int64_t)1) << 20;

				private:
				/// Check for existence of an augmenting path with a positive capacity.
				bool findAugmentingPath() {
				// Initialize data structures
				for (auto &Node : Nodes) {
				Node.Distance = INF;
				Node.ParentNode = uint64_t(-1);
				Node.ParentEdgeIndex = uint64_t(-1);
				Node.Taken = false;
				}

				std::queue<uint64_t> Queue;
				Queue.push(Source);
				Nodes[Source].Distance = 0;
				Nodes[Source].Taken = true;
				while (!Queue.empty()) {
				uint64_t Src = Queue.front();
				Queue.pop();
				Nodes[Src].Taken = false;
				// Although the residual network contains edges with negative costs
				// (in particular, backward edges), it can be shown that there are no
				// negative-weight cycles and the following two invariants are maintained:
				// (i) Dist[Source, V] >= 0 and (ii) Dist[V, Target] >= 0 for all nodes V,
				// where Dist is the length of the shortest path between two nodes. This
				// allows to prune the search-space of the path-finding algorithm using
				// the following early-stop criteria:
				// -- If we find a path with zero-distance from Source to Target, stop the
				// search, as the path is the shortest since Dist[Source, Target] >= 0;
				// -- If we have Dist[Source, V] > Dist[Source, Target], then do not
				// process node V, as it is guaranteed _not_ to be on a shortest path
				// from Source to Target; it follows from inequalities
				// Dist[Source, Target] >= Dist[Source, V] + Dist[V, Target]
				// >= Dist[Source, V]
				if (Nodes[Target].Distance == 0)
				break;
				if (Nodes[Src].Distance > Nodes[Target].Distance)
				continue;

				// Process adjacent edges
				for (uint64_t EdgeIdx = 0; EdgeIdx < Edges[Src].size(); EdgeIdx++) {
				auto &Edge = Edges[Src][EdgeIdx];
				if (Edge.Flow < Edge.Capacity) {
				uint64_t Dst = Edge.Dst;
				int64_t NewDistance = Nodes[Src].Distance + Edge.Cost;
				if (Nodes[Dst].Distance > NewDistance) {
				// Update the distance and the parent node/edge
				Nodes[Dst].Distance = NewDistance;
				Nodes[Dst].ParentNode = Src;
				Nodes[Dst].ParentEdgeIndex = EdgeIdx;
				// Add the node to the queue, if it is not there yet
				if (!Nodes[Dst].Taken) {
				Queue.push(Dst);
				Nodes[Dst].Taken = true;
				}
				}
				}
				}
				}

				return Nodes[Target].Distance != INF;
				}

				/// Update the current flow along the augmenting path.
				void augmentFlowAlongPath() {
				// Find path capacity
				int64_t PathCapacity = INF;
				uint64_t Now = Target;
				while (Now != Source) {
				uint64_t Pred = Nodes[Now].ParentNode;
				auto &Edge = Edges[Pred][Nodes[Now].ParentEdgeIndex];
				PathCapacity = std::min(PathCapacity, Edge.Capacity - Edge.Flow);
				Now = Pred;
				}

				assert(PathCapacity > 0 && "found incorrect augmenting path");

				// Update the flow along the path
				Now = Target;
				while (Now != Source) {
				uint64_t Pred = Nodes[Now].ParentNode;
				auto &Edge = Edges[Pred][Nodes[Now].ParentEdgeIndex];
				auto &RevEdge = Edges[Now][Edge.RevEdgeIndex];

				Edge.Flow += PathCapacity;
				RevEdge.Flow -= PathCapacity;

				Now = Pred;
				}
				}

				/// An node in a flow network.
				struct Node {
				/// The cost of the cheapest path from the source to the current node.
				int64_t Distance;
				/// The node preceding the current one in the path.
				uint64_t ParentNode;
				/// The index of the edge between ParentNode and the current node.
				uint64_t ParentEdgeIndex;
				/// An indicator of whether the current node is in a queue.
				bool Taken;
				};
				/// An edge in a flow network.
				struct Edge {
				/// The cost of the edge.
				int64_t Cost;
				/// The capacity of the edge.
				int64_t Capacity;
				/// The current flow on the edge.
				int64_t Flow;
				/// The destination node of the edge.
				uint64_t Dst;
				/// The index of the reverse edge between Dst and the current node.
				uint64_t RevEdgeIndex;
				};

				/// The set of network nodes.
				std::vector<Node> Nodes;
				/// The set of network edges.
				std::vector<std::vector<Edge>> Edges;
				/// Source node of the flow.
				uint64_t Source;
				/// Target (sink) node of the flow.
				uint64_t Target;
				};

				/// Initializing flow network for a given function.
				///
				/// Every block is split into three nodes that are responsible for (i) an
				/// incoming flow, (ii) an outgoing flow, and (iii) penalizing an increase or
				/// reduction of the block weight.
				void initializeNetwork(MinCostMaxFlow &Network, FlowFunction &Func) {
				uint64_t NumBlocks = Func.Blocks.size();
				assert(NumBlocks > 1 && "Too few blocks in a function");
				LLVM_DEBUG(dbgs() << "Initializing profi for " << NumBlocks << " blocks\n");

				// Pre-process data: make sure the entry weight is at least 1
				if (Func.Blocks[Func.Entry].Weight == 0) {
				Func.Blocks[Func.Entry].Weight = 1;
				}
				// Introducing dummy source/sink pairs to allow flow circulation.
				// The nodes corresponding to blocks of Func have indicies in the range
				// [0..3 * NumBlocks); the dummy nodes are indexed by the next four values.
				uint64_t S = 3 * NumBlocks;
				uint64_t T = S + 1;
				uint64_t S1 = S + 2;
				uint64_t T1 = S + 3;

				Network.initialize(3 * NumBlocks + 4, S1, T1);

				// Create three nodes for every block of the function
				for (uint64_t B = 0; B < NumBlocks; B++) {
				auto &Block = Func.Blocks[B];
				assert((!Block.UnknownWeight \|\| Block.Weight == 0 \|\| Block.isEntry()) &&
				"non-zero weight of a block w/o weight except for an entry");

				// Split every block into two nodes
				uint64_t Bin = 3 * B;
				uint64_t Bout = 3 * B + 1;
				uint64_t Baux = 3 * B + 2;
				if (Block.Weight > 0) {
				Network.addEdge(S1, Bout, Block.Weight, 0);
				Network.addEdge(Bin, T1, Block.Weight, 0);
				}

				// Edges from S and to T
				assert((!Block.isEntry() \|\| !Block.isExit()) &&
				"a block cannot be an entry and an exit");
				if (Block.isEntry()) {
				Network.addEdge(S, Bin, 0);
				} else if (Block.isExit()) {
				Network.addEdge(Bout, T, 0);
				}

				// An auxiliary node to allow increase/reduction of block counts:
				// We assume that decreasing block counts is more expensive than increasing,
				// and thus, setting separate costs here. In the future we may want to tune
				// the relative costs so as to maximize the quality of generated profiles.
				int64_t AuxCostInc = MinCostMaxFlow::AuxCostInc;
				int64_t AuxCostDec = MinCostMaxFlow::AuxCostDec;
				if (Block.UnknownWeight) {
				// Do not penalize changing weights of blocks w/o known profile count
				AuxCostInc = 0;
				AuxCostDec = 0;
				} else {
				// Increasing the count for "cold" blocks with zero initial count is more
				// expensive than for "hot" ones
				if (Block.Weight == 0) {
				AuxCostInc = MinCostMaxFlow::AuxCostIncZero;
				}
				// Modifying the count of the entry block is expensive
				if (Block.isEntry()) {
				AuxCostInc = MinCostMaxFlow::AuxCostIncEntry;
				AuxCostDec = MinCostMaxFlow::AuxCostDecEntry;
				}
				}
				// For blocks with self-edges, do not penalize a reduction of the count,
				// as all of the increase can be attributed to the self-edge
				if (Block.HasSelfEdge) {
				AuxCostDec = 0;
				}

				Network.addEdge(Bin, Baux, AuxCostInc);
				Network.addEdge(Baux, Bout, AuxCostInc);
				if (Block.Weight > 0) {
				Network.addEdge(Bout, Baux, AuxCostDec);
				Network.addEdge(Baux, Bin, AuxCostDec);
				}
				}

				// Creating edges for every jump
				for (auto &Jump : Func.Jumps) {
				uint64_t Src = Jump.Source;
				uint64_t Dst = Jump.Target;
				if (Src != Dst) {
				uint64_t SrcOut = 3 * Src + 1;
				uint64_t DstIn = 3 * Dst;
				uint64_t Cost = Jump.IsUnlikely ? MinCostMaxFlow::AuxCostUnlikely : 0;
				Network.addEdge(SrcOut, DstIn, Cost);
				}
				}

				// Make sure we have a valid flow circulation
				Network.addEdge(T, S, 0);
				}

				/// Extract resulting block and edge counts from the flow network.
				void extractWeights(MinCostMaxFlow &Network, FlowFunction &Func) {
				uint64_t NumBlocks = Func.Blocks.size();

				// Extract resulting block counts
				for (uint64_t Src = 0; Src < NumBlocks; Src++) {
				auto &Block = Func.Blocks[Src];
				uint64_t SrcOut = 3 * Src + 1;
				int64_t Flow = 0;
				for (auto &Adj : Network.getFlow(SrcOut)) {
				uint64_t DstIn = Adj.first;
				int64_t DstFlow = Adj.second;
				bool IsAuxNode = (DstIn < 3 * NumBlocks && DstIn % 3 == 2);
				if (!IsAuxNode \|\| Block.HasSelfEdge) {
				Flow += DstFlow;
				}
				}
				Block.Flow = Flow;
				assert(Flow >= 0 && "negative block flow");
				}

				// Extract resulting jump counts
				for (auto &Jump : Func.Jumps) {
				uint64_t Src = Jump.Source;
				uint64_t Dst = Jump.Target;
				int64_t Flow = 0;
				if (Src != Dst) {
				uint64_t SrcOut = 3 * Src + 1;
				uint64_t DstIn = 3 * Dst;
				Flow = Network.getFlow(SrcOut, DstIn);
				} else {
				uint64_t SrcOut = 3 * Src + 1;
				uint64_t SrcAux = 3 * Src + 2;
				int64_t AuxFlow = Network.getFlow(SrcOut, SrcAux);
				if (AuxFlow > 0)
				Flow = AuxFlow;
				}
				Jump.Flow = Flow;
				assert(Flow >= 0 && "negative jump flow");
				}
				}

				#ifndef NDEBUG
				/// Verify that the computed flow values satisfy flow conservation rules
				void verifyWeights(const FlowFunction &Func) {
				const uint64_t NumBlocks = Func.Blocks.size();
				auto InFlow = std::vector<uint64_t>(NumBlocks, 0);
				auto OutFlow = std::vector<uint64_t>(NumBlocks, 0);
				for (auto &Jump : Func.Jumps) {
				InFlow[Jump.Target] += Jump.Flow;
				OutFlow[Jump.Source] += Jump.Flow;
				}

				uint64_t TotalInFlow = 0;
				uint64_t TotalOutFlow = 0;
				for (uint64_t I = 0; I < NumBlocks; I++) {
				auto &Block = Func.Blocks[I];
				if (Block.isEntry()) {
				TotalInFlow += Block.Flow;
				assert(Block.Flow == OutFlow[I] && "incorrectly computed control flow");
				} else if (Block.isExit()) {
				TotalOutFlow += Block.Flow;
				assert(Block.Flow == InFlow[I] && "incorrectly computed control flow");
				} else {
				assert(Block.Flow == OutFlow[I] && "incorrectly computed control flow");
				assert(Block.Flow == InFlow[I] && "incorrectly computed control flow");
				}
				}
				assert(TotalInFlow == TotalOutFlow && "incorrectly computed control flow");
				}
				#endif

				} // end of anonymous namespace

				/// Apply the profile inference algorithm for a given flow function
				void llvm::applyFlowInference(FlowFunction &Func) {
				// Create and apply an inference network model
				auto InferenceNetwork = MinCostMaxFlow();
				initializeNetwork(InferenceNetwork, Func);
				InferenceNetwork.run();

				// Extract flow values for every block and every edge
				extractWeights(InferenceNetwork, Func);

				#ifndef NDEBUG
				// Verify the result
				verifyWeights(Func);
				#endif
				}

llvm/lib/Transforms/Utils/SampleProfileLoaderBaseUtil.cpp

Show All 28 Lines	cl::opt<unsigned> SampleProfileSampleCoverage(
cl::desc("Emit a warning if less than N% of samples in the input profile "		cl::desc("Emit a warning if less than N% of samples in the input profile "
"are matched to the IR."));		"are matched to the IR."));

cl::opt<bool> NoWarnSampleUnused(		cl::opt<bool> NoWarnSampleUnused(
"no-warn-sample-unused", cl::init(false), cl::Hidden,		"no-warn-sample-unused", cl::init(false), cl::Hidden,
cl::desc("Use this option to turn off/on warnings about function with "		cl::desc("Use this option to turn off/on warnings about function with "
"samples but without debug information to use those samples. "));		"samples but without debug information to use those samples. "));

		cl::opt<bool> SampleProfileUseProfi(
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions This is defined as cl::opt<bool> mehdi_amini: This is defined as cl::opt<bool>
		"sample-profile-use-profi", cl::init(false), cl::Hidden, cl::ZeroOrMore,
		cl::desc("Use profi to infer block and edge counts."));

namespace sampleprofutil {		namespace sampleprofutil {

/// Return true if the given callsite is hot wrt to hot cutoff threshold.		/// Return true if the given callsite is hot wrt to hot cutoff threshold.
///		///
/// Functions that were inlined in the original binary will be represented		/// Functions that were inlined in the original binary will be represented
/// in the inline stack in the sample profile. If the profile shows that		/// in the inline stack in the sample profile. If the profile shows that
/// the original inline decision was "good" (i.e., the callsite is executed		/// the original inline decision was "good" (i.e., the callsite is executed
/// frequently), then we will recreate the inline decision and apply the		/// frequently), then we will recreate the inline decision and apply the
▲ Show 20 Lines • Show All 133 Lines • Show Last 20 Lines

llvm/test/Transforms/SampleProfile/Inputs/profile-inference.prof

This file was added.

				test_1:23968:0
				1: 100
				2: 60
				3: 40
				!CFGChecksum: 4294967295

				test_2:23968:0
				1: 100
				3: 10
				!CFGChecksum: 37753817093

				test_3:10000:0
				3: 13
				5: 89
				!CFGChecksum: 69502983527

				sum_of_squares:23968:0
				2: 5993
				3: 1
				4: 5992
				5: 5992
				8: 5992
				!CFGChecksum: 175862120757

llvm/test/Transforms/SampleProfile/profile-inference.ll

This file was added.

				; RUN: opt < %s -passes=pseudo-probe,sample-profile -sample-profile-use-profi -sample-profile-file=%S/Inputs/profile-inference.prof \| opt -analyze -branch-prob -enable-new-pm=0 \| FileCheck %s
				; RUN: opt < %s -passes=pseudo-probe,sample-profile -sample-profile-use-profi -sample-profile-file=%S/Inputs/profile-inference.prof \| opt -analyze -block-freq -enable-new-pm=0 \| FileCheck %s --check-prefix=CHECK2

				; The test verifies that profile inference correctly builds branch probabilities
				; from sampling-based block counts.
				;
				; +---------+ +----------+
				; \| b3 [40] \| <-- \| b1 [100] \|
				; +---------+ +----------+
				; \|
				; \|
				; v
				; +----------+
				; \| b2 [60] \|
				; +----------+

				@yydebug = dso_local global i32 0, align 4

				; Function Attrs: nounwind uwtable
				define dso_local i32 @test_1() #0 {
				b1:
				call void @llvm.pseudoprobe(i64 7964825052912775246, i64 1, i32 0, i64 -1)
				%0 = load i32, i32* @yydebug, align 4
				%cmp = icmp ne i32 %0, 0
				br i1 %cmp, label %b2, label %b3
				; CHECK: edge b1 -> b2 probability is 0x4ccccccd / 0x80000000 = 60.00%
				; CHECK: edge b1 -> b3 probability is 0x33333333 / 0x80000000 = 40.00%
				; CHECK2: - b1: float = {{.}}, int = {{.}}, count = 100

				b2:
				call void @llvm.pseudoprobe(i64 7964825052912775246, i64 2, i32 0, i64 -1)
				ret i32 %0
				; CHECK2: - b2: float = {{.}}, int = {{.}}, count = 60

				b3:
				call void @llvm.pseudoprobe(i64 7964825052912775246, i64 3, i32 0, i64 -1)
				ret i32 %0
				; CHECK2: - b3: float = {{.}}, int = {{.}}, count = 40
				}


				; The test verifies that profile inference correctly builds branch probabilities
				; from sampling-based block counts in the presence of "dangling" probes (whose
				; block counts are missing).
				;
				; +---------+ +----------+
				; \| b3 [10] \| <-- \| b1 [100] \|
				; +---------+ +----------+
				; \|
				; \|
				; v
				; +----------+
				; \| b2 [?] \|
				; +----------+

				; Function Attrs: nounwind uwtable
				define dso_local i32 @test_2() #0 {
				b1:
				call void @llvm.pseudoprobe(i64 -6216829535442445639, i64 1, i32 0, i64 -1)
				%0 = load i32, i32* @yydebug, align 4
				%cmp = icmp ne i32 %0, 0
				br i1 %cmp, label %b2, label %b3
				; CHECK: edge b1 -> b2 probability is 0x73333333 / 0x80000000 = 90.00%
				; CHECK: edge b1 -> b3 probability is 0x0ccccccd / 0x80000000 = 10.00%
				; CHECK2: - b1: float = {{.}}, int = {{.}}, count = 100

				b2:
				call void @llvm.pseudoprobe(i64 -6216829535442445639, i64 2, i32 0, i64 -1)
				ret i32 %0
				; CHECK2: - b2: float = {{.}}, int = {{.}}, count = 90

				b3:
				call void @llvm.pseudoprobe(i64 -6216829535442445639, i64 3, i32 0, i64 -1)
				ret i32 %0
				}
				; CHECK2: - b3: float = {{.}}, int = {{.}}, count = 10


				; The test verifies that profi is able to infer block counts from hot subgraphs.
				;
				; +---------+ +---------+
				; \| b4 [?] \| <-- \| b1 [?] \|
				; +---------+ +---------+
				; \| \|
				; \| \|
				; v v
				; +---------+ +---------+
				; \| b5 [89] \| \| b2 [?] \|
				; +---------+ +---------+
				; \|
				; \|
				; v
				; +---------+
				; \| b3 [13] \|
				; +---------+

				; Function Attrs: nounwind uwtable
				define dso_local i32 @test_3() #0 {
				b1:
				call void @llvm.pseudoprobe(i64 1649282507922421973, i64 1, i32 0, i64 -1)
				%0 = load i32, i32* @yydebug, align 4
				%cmp = icmp ne i32 %0, 0
				br i1 %cmp, label %b2, label %b4
				; CHECK: edge b1 -> b2 probability is 0x10505050 / 0x80000000 = 12.75%
				; CHECK: edge b1 -> b4 probability is 0x6fafafb0 / 0x80000000 = 87.25%
				; CHECK2: - b1: float = {{.}}, int = {{.}}, count = 102

				b2:
				call void @llvm.pseudoprobe(i64 1649282507922421973, i64 2, i32 0, i64 -1)
				br label %b3
				; CHECK: edge b2 -> b3 probability is 0x80000000 / 0x80000000 = 100.00%
				; CHECK2: - b2: float = {{.}}, int = {{.}}, count = 13

				b3:
				call void @llvm.pseudoprobe(i64 1649282507922421973, i64 3, i32 0, i64 -1)
				ret i32 %0
				; CHECK2: - b3: float = {{.}}, int = {{.}}, count = 13

				b4:
				call void @llvm.pseudoprobe(i64 1649282507922421973, i64 4, i32 0, i64 -1)
				br label %b5
				; CHECK: edge b4 -> b5 probability is 0x80000000 / 0x80000000 = 100.00%
				; CHECK2: - b4: float = {{.}}, int = {{.}}, count = 89

				b5:
				call void @llvm.pseudoprobe(i64 1649282507922421973, i64 5, i32 0, i64 -1)
				ret i32 %0
				; CHECK2: - b5: float = {{.}}, int = {{.}}, count = 89
				}


				; A larger test to verify that profile inference correctly identifies hot parts
				; of the control-flow graph.
				;
				; +-----------+
				; \| b1 [?] \|
				; +-----------+
				; \|
				; \|
				; v
				; +--------+ +-----------+
				; \| b3 [1] \| <-- \| b2 [5993] \|
				; +--------+ +-----------+
				; \| \|
				; \| \|
				; \| v
				; \| +-----------+ +--------+
				; \| \| b4 [5992] \| --> \| b6 [?] \|
				; \| +-----------+ +--------+
				; \| \| \|
				; \| \| \|
				; \| v \|
				; \| +-----------+ \|
				; \| \| b5 [5992] \| \|
				; \| +-----------+ \|
				; \| \| \|
				; \| \| \|
				; \| v \|
				; \| +-----------+ \|
				; \| \| b7 [?] \| \|
				; \| +-----------+ \|
				; \| \| \|
				; \| \| \|
				; \| v \|
				; \| +-----------+ \|
				; \| \| b8 [5992] \| <-----+
				; \| +-----------+
				; \| \|
				; \| \|
				; \| v
				; \| +-----------+
				; +----------> \| b9 [?] \|
				; +-----------+

				; Function Attrs: nounwind uwtable
				define dso_local i32 @sum_of_squares() #0 {
				b1:
				call void @llvm.pseudoprobe(i64 -907520326213521421, i64 1, i32 0, i64 -1)
				%0 = load i32, i32* @yydebug, align 4
				%cmp = icmp ne i32 %0, 0
				br label %b2
				; CHECK: edge b1 -> b2 probability is 0x80000000 / 0x80000000 = 100.00%
				; CHECK2: - b1: float = {{.}}, int = {{.}}, count = 5993

				b2:
				call void @llvm.pseudoprobe(i64 -907520326213521421, i64 2, i32 0, i64 -1)
				br i1 %cmp, label %b4, label %b3
				; CHECK: edge b2 -> b4 probability is 0x7ffa8844 / 0x80000000 = 99.98%
				; CHECK: edge b2 -> b3 probability is 0x000577bc / 0x80000000 = 0.02%
				; CHECK2: - b2: float = {{.}}, int = {{.}}, count = 5993

				b3:
				call void @llvm.pseudoprobe(i64 -907520326213521421, i64 3, i32 0, i64 -1)
				br label %b9
				; CHECK: edge b3 -> b9 probability is 0x80000000 / 0x80000000 = 100.00%
				; CHECK2: - b3: float = {{.}}, int = {{.}}, count = 1

				b4:
				call void @llvm.pseudoprobe(i64 -907520326213521421, i64 4, i32 0, i64 -1)
				br i1 %cmp, label %b5, label %b6
				; CHECK: edge b4 -> b5 probability is 0x80000000 / 0x80000000 = 100.00%
				; CHECK: edge b4 -> b6 probability is 0x00000000 / 0x80000000 = 0.00%
				; CHECK2: - b4: float = {{.}}, int = {{.}}, count = 5992

				b5:
				call void @llvm.pseudoprobe(i64 -907520326213521421, i64 5, i32 0, i64 -1)
				br label %b7
				; CHECK: edge b5 -> b7 probability is 0x80000000 / 0x80000000 = 100.00%
				; CHECK2: - b5: float = {{.}}, int = {{.}}, count = 5992

				b6:
				call void @llvm.pseudoprobe(i64 -907520326213521421, i64 6, i32 0, i64 -1)
				br label %b8
				; CHECK: edge b6 -> b8 probability is 0x80000000 / 0x80000000 = 100.00%
				; CHECK2: - b6: float = {{.}}, int = {{.}}, count = 0

				b7:
				call void @llvm.pseudoprobe(i64 -907520326213521421, i64 7, i32 0, i64 -1)
				br label %b8
				; CHECK: edge b7 -> b8 probability is 0x80000000 / 0x80000000 = 100.00%
				; CHECK2: - b7: float = {{.}}, int = {{.}}, count = 5992

				b8:
				call void @llvm.pseudoprobe(i64 -907520326213521421, i64 8, i32 0, i64 -1)
				br label %b9
				; CHECK: edge b8 -> b9 probability is 0x80000000 / 0x80000000 = 100.00%
				; CHECK2: - b8: float = {{.}}, int = {{.}}, count = 5992

				b9:
				call void @llvm.pseudoprobe(i64 -907520326213521421, i64 9, i32 0, i64 -1)
				ret i32 %0
				}
				; CHECK2: - b9: float = {{.}}, int = {{.}}, count = 5993

				declare void @llvm.pseudoprobe(i64, i64, i32, i64) #1

				attributes #0 = { noinline nounwind uwtable "use-sample-profile"}
				attributes #1 = { nounwind }

				!llvm.pseudo_probe_desc = !{!6, !7, !8, !9}

				!6 = !{i64 7964825052912775246, i64 4294967295, !"test_1", null}
				!7 = !{i64 -6216829535442445639, i64 37753817093, !"test_2", null}
				!8 = !{i64 1649282507922421973, i64 69502983527, !"test_3", null}
				!9 = !{i64 -907520326213521421, i64 175862120757, !"sum_of_squares", null}

This is an archive of the discontinued LLVM Phabricator instance.

profi - a flow-based profile inference algorithm: Part I (out of 3)ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 391077

llvm/include/llvm/Transforms/Utils/SampleProfileInference.h

llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h

llvm/lib/Transforms/IPO/SampleProfile.cpp

llvm/lib/Transforms/Utils/CMakeLists.txt

llvm/lib/Transforms/Utils/SampleProfileInference.cpp

llvm/lib/Transforms/Utils/SampleProfileLoaderBaseUtil.cpp

llvm/test/Transforms/SampleProfile/Inputs/profile-inference.prof

llvm/test/Transforms/SampleProfile/profile-inference.ll

profi - a flow-based profile inference algorithm: Part I (out of 3)
ClosedPublic