This is an archive of the discontinued LLVM Phabricator instance.

Differential D147812

[InstrProf] Use BalancedPartitioning to order temporal profiling trace data
Closed · Public

Authored by ellis on Apr 7 2023, 2:02 PM.

Details

Reviewers

snehasish
davidxl
spupyrev
wenlei
MaskRay

Commits

rGa4845eaf2e9a: [InstrProf] Skip Balanced Partitioning tests on ARM
rG1117b9a284aa: [InstrProf] Use BalancedPartitioning to order temporal profiling trace data

Summary

In [0] we described an algorithm called BalancedPartitioning (bp) to consume function traces [1] and compute a function order that reduces the number of page faults during startup.

This patch adds the order command to the llvm-profdata tool which uses bp to output a function order that can be passed to the linker via --symbol-ordering-file=.

Special thanks to Sergey Pupyrev and Julian Mestre for designing this balanced partitioning algorithm.

[0] https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
[1] https://reviews.llvm.org/D147287

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ellis created this revision. Apr 7 2023, 2:02 PM

Herald added a project: Restricted Project. Apr 7 2023, 2:02 PM

Herald added subscribers: wlei, wenlei, mgrang, hiraditya.

Harbormaster completed remote builds in B224286: Diff 511792. Apr 7 2023, 2:09 PM

Update

Harbormaster completed remote builds in B224322: Diff 511837. Apr 7 2023, 7:57 PM

ellis retitled this revision from "[InstrProf] Use BalancedPartitioning to order trace data" to "[InstrProf] Use BalancedPartitioning to order temporal profiling trace data". Apr 11 2023, 8:36 AM

ellis added reviewers: snehasish, davidxl, spupyrev, wenlei, MaskRay.

ellis published this revision for review. Apr 11 2023, 9:17 AM

Herald added a project: Restricted Project. Apr 11 2023, 9:17 AM

Herald added a subscriber: llvm-commits.

A few high level questions:

Different types of traces don't occur with the same frequency, so it might be useful to support weighting. The frequency with which a certain trace pattern appears in the profile data does not necessarily match its frequency in real-world usage. To support this, some kind of symbolic id may be needed to annotate the trace data.

The cost metric is entropy like. Have you tried other metrics such as gini impurity?

llvm/lib/Support/BalancedPartitioning.cpp
202	Why 4x larger? Is the number of utility nodes the same as "the number of traces * number of cutoffs"?
llvm/tools/llvm-profdata/llvm-profdata.cpp
3042	Need to document it in commandline guide.
3050	How are the default values selected?

In D147812#4259190, @davidxl wrote:

A few high level questions:

Different types of traces don't occur with the same frequency, so it might be useful to support weighting. The frequency with which a certain trace pattern appears in the profile data does not necessarily match its frequency in real-world usage. To support this, some kind of symbolic id may be needed to annotate the trace data.

I see what you are saying. We could have a set of raw profiles collected under type "A" conditions and another set under type "B" conditions. Maybe type "A" is common while type "B" is more rare so we'd like to weight "A"s traces more than "B"s.

This could be implemented with an extra "weight" field in the trace data for each trace. It hasn't been long since I landed https://reviews.llvm.org/D147287. Do you think it makes sense to land a patch to add this field without updating the version and supporting two trace formats?

The other option is to have the llvm-profdata merge command duplicate traces with extra weight. Of course this reduces the data density so we couldn't store as many profiles.
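As a rough illustration of the duplication option, the sketch below expands each trace into as many copies as its weight. `Trace` and `expandByWeight` are hypothetical names made up for this example; they are not part of llvm-profdata or this patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a temporal profiling trace: an ordered list of
// function ids plus a relative weight.
struct Trace {
  std::vector<uint64_t> FunctionNameRefs;
  uint64_t Weight = 1;
};

// Emulate weighting by duplication: a trace with weight W appears W times in
// the result, and every copy carries weight 1.
std::vector<Trace> expandByWeight(const std::vector<Trace> &Traces) {
  std::vector<Trace> Expanded;
  for (const Trace &T : Traces) {
    assert(T.Weight >= 1 && "a trace must have a positive weight");
    for (uint64_t I = 0; I < T.Weight; ++I) {
      Trace Copy;
      Copy.FunctionNameRefs = T.FunctionNameRefs;
      Copy.Weight = 1;
      Expanded.push_back(Copy);
    }
  }
  return Expanded;
}
```

The output grows linearly with the weights, which is the data-density cost mentioned above.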

The cost metric is entropy like. Have you tried other metrics such as gini impurity?

@spupyrev @jmestre Do you have any thoughts on this?

llvm/lib/Support/BalancedPartitioning.cpp
202	The `Cutoffs` variable is only used with bp when ordering functions from traces. We can also use bp to order functions to improve compression by placing similar functions close together. For that case we have another way to assign utility nodes to function nodes, so the number of signatures will be different.
llvm/tools/llvm-profdata/llvm-profdata.cpp
3050	These values were also empirically found by optimizing a large binary. I was debating leaving out a default value since these values are pretty specific to the binary we tested. Now that I think about it, we can try to derive a similar list by looking at the total number of functions and assign cutoffs linearly.

In D147812#4259461, @ellis wrote:

In D147812#4259190, @davidxl wrote:

A few high level questions:

Different types of traces don't occur with the same frequency, so it might be useful to support weighting. The frequency with which a certain trace pattern appears in the profile data does not necessarily match its frequency in real-world usage. To support this, some kind of symbolic id may be needed to annotate the trace data.

I see what you are saying. We could have a set of raw profiles collected under type "A" conditions and another set under type "B" conditions. Maybe type "A" is common while type "B" is more rare so we'd like to weight "A"s traces more than "B"s.

This could be implemented with an extra "weight" field in the trace data for each trace. It hasn't been long since I landed https://reviews.llvm.org/D147287. Do you think it makes sense to land a patch to add this field without updating the version and supporting two trace formats?

It should be fine, since there is no ordering tool or support (like what this patch adds) that enables folks to use the feature yet.

The other option is to have the llvm-profdata merge command duplicate traces with extra weight. Of course this reduces the data density so we couldn't store as many profiles.

The cost metric is entropy like. Have you tried other metrics such as gini impurity?

@spupyrev @jmestre Do you have any thoughts on this?

llvm/tools/llvm-profdata/llvm-profdata.cpp
3050	or add some empirical guidance in the description.

Add commandline docs

Harbormaster completed remote builds in B224882: Diff 512598. Apr 11 2023, 3:43 PM

ellis mentioned this in D148150: [InstrProf][Temporal] Add weight field to traces. Apr 12 2023, 11:00 AM

ellis mentioned this in rG4bddef411740: [InstrProf][Temporal] Add weight field to traces. Apr 13 2023, 10:37 AM

Rebase and remove utility nodes if they have edges to only one function or all the functions.
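A minimal sketch of that pruning step, assuming each function carries a duplicate-free list of utility nodes. `FunctionNode` and `pruneUtilityNodes` are hypothetical names for illustration, and the stated rationale (such nodes contribute the same amount for any balanced split) is our reading rather than something spelled out in the review.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using UtilityNode = uint32_t;

// Hypothetical stand-in for a function node: an id plus its utility nodes.
struct FunctionNode {
  uint64_t Id;
  std::vector<UtilityNode> UtilityNodes; // assumed to contain no duplicates
};

// Drop utility nodes attached to exactly one function or to every function.
// Assumption: such nodes cannot favor one balanced split over another.
void pruneUtilityNodes(std::vector<FunctionNode> &Nodes) {
  // Count how many functions each utility node is attached to.
  std::unordered_map<UtilityNode, std::size_t> Degree;
  for (const FunctionNode &N : Nodes)
    for (UtilityNode UN : N.UtilityNodes)
      ++Degree[UN];

  const std::size_t NumNodes = Nodes.size();
  for (FunctionNode &N : Nodes) {
    std::vector<UtilityNode> &UNs = N.UtilityNodes;
    UNs.erase(std::remove_if(UNs.begin(), UNs.end(),
                             [&](UtilityNode UN) {
                               std::size_t D = Degree[UN];
                               assert(D >= 1 && D <= NumNodes);
                               return D == 1 || D == NumNodes;
                             }),
              UNs.end());
  }
}
```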

In D147812#4259507, @davidxl wrote:

In D147812#4259461, @ellis wrote:

In D147812#4259190, @davidxl wrote:

A few high level questions:

Different types of traces don't occur with the same frequency, so it might be useful to support weighting. The frequency with which a certain trace pattern appears in the profile data does not necessarily match its frequency in real-world usage. To support this, some kind of symbolic id may be needed to annotate the trace data.

I see what you are saying. We could have a set of raw profiles collected under type "A" conditions and another set under type "B" conditions. Maybe type "A" is common while type "B" is more rare so we'd like to weight "A"s traces more than "B"s.

This could be implemented with an extra "weight" field in the trace data for each trace. It hasn't been long since I landed https://reviews.llvm.org/D147287. Do you think it makes sense to land a patch to add this field without updating the version and supporting two trace formats?

It should be fine, since there is no ordering tool or support (like what this patch adds) that enables folks to use the feature yet.

The other option is to have the llvm-profdata merge command duplicate traces with extra weight. Of course this reduces the data density so we couldn't store as many profiles.

The cost metric is entropy like. Have you tried other metrics such as gini impurity?

@spupyrev @jmestre Do you have any thoughts on this?

I've added https://reviews.llvm.org/rG4bddef4117403a305727d145a9abf6bda700f8ff but I'm not using the weights yet. I haven't decided the best way to use them yet, and I haven't decided if I'll do that here or in a followup diff.

Harbormaster completed remote builds in B225454: Diff 513382. Apr 13 2023, 5:32 PM

The cost metric is entropy like. Have you tried other metrics such as gini impurity?

For context, we've been using (a variant of) the same reordering optimization across several applications internally for several years now. We extensively tested and engineered the implementation, and I am so happy to see it finally open sourced! There are a couple of research papers describing the technique with details and evaluation, e.g., (1) or (2), including the evaluation of various objectives.

We did try several alternatives for the objective and the log-gap (the "entropy") seems to be the best choice across the majority of use cases, including the start-up function layout. For the dataset I have at hand, the Gini impurity behaves slightly worse than the current log-gap one. I'd like also to clarify that we're speaking about second order effects here. That is, the difference between "no-reordered layout" vs "log-gap reordered one" is an order (or two) of magnitude larger than the difference between "log-gap" and "gini".
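To make the comparison concrete, here is a sketch of how the two objectives could score a single utility node whose functions are split between a left and a right bucket. The log-gap form follows our reading of the linked paper; the function names and the exact normalization are assumptions, not code taken from the patch.

```cpp
#include <cassert>
#include <cmath>

// Log-gap style cost of one utility node with L functions in the left bucket
// and R in the right one. The value is non-positive and lower is better; for a
// fixed L + R it is lowest when all occurrences sit in one bucket.
double logGapCost(unsigned L, unsigned R) {
  return -(L * std::log2(L + 1.0) + R * std::log2(R + 1.0));
}

// Gini impurity of the same split: 0 when all occurrences share a bucket,
// 0.5 when they are spread evenly across the two buckets.
double giniImpurity(unsigned L, unsigned R) {
  assert(L + R > 0 && "utility node must touch at least one function");
  double Total = static_cast<double>(L) + static_cast<double>(R);
  double PL = L / Total;
  double PR = R / Total;
  return 1.0 - PL * PL - PR * PR;
}
```

Both functions prefer concentrating a utility node's functions in one bucket; they differ in how strongly they penalize intermediate splits.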

spupyrev added inline comments. Apr 14 2023, 12:22 PM

llvm/include/llvm/Support/BalancedPartitioning.h
89	I thought we use `18` by default?
llvm/lib/Support/BalancedPartitioning.cpp
86	Would be great to unify the implementation with the existing ThreadPool, but I don't have a suggestion on how exactly to implement that. I'm fine with keeping it as is, if there are no alternative suggestions.
215	Do we want to get rid of the map? (and the weird 4x initialization)
367	I think we've got rid of this method
llvm/tools/llvm-profdata/llvm-profdata.cpp
3050	The assumption here is that most of the traces are of size ~32K. Perhaps we could also add a comment and a clarification to re-consider the values, if the traces are of significantly different sizes?

Remove cutoff values and refactor unittest

ellis planned changes to this revision. Apr 15 2023, 9:08 AM

ellis marked 3 inline comments as done.

Harbormaster completed remote builds in B225857: Diff 513921. Apr 15 2023, 10:02 AM

Make moveGain() much faster

I've run some performance analysis by running the BalancedPartitioningTest.Large test with different sizes.

GTEST_FILTER="BalancedPartitioningTest.Large" perf record --call-graph lbr -- build/unittests/Support/SupportTests

I found that most of the time is spent in moveGain(), so I made several changes.

Switch from double to float.
Use std::accumulate() instead of a for loop to compute gain.
Pull out constant FromLeftToRight from loop.
Use a vector instead of a map for Signatures.
Separate CachedCost into two floats and a bool since moveGain() assumes the cache is valid.

llvm/lib/Support/BalancedPartitioning.cpp
215	I've removed the map because I found that it is more efficient to renumber these utility nodes so these signatures can be an array indexed by the utility node.
367	I've deleted `computeGoal()`.

Harbormaster completed remote builds in B226246: Diff 514438. Apr 17 2023, 4:57 PM

spupyrev added inline comments. May 2 2023, 7:17 AM

llvm/lib/Support/BalancedPartitioning.cpp
148
201	How much overhead does `DenseMap` have in comparison with `std::vector`? I assume one could first find the maximum utility index and then use a vector instead of a map. Of course, that should only be done if there is visible speedup.
213	maybe add `UtilityNodeIndex.reserve(NumUtilities)` (and use the variable for signature initialization too)
258–265
363	should it be `Gain + Signatures[UN].CachedCostLR` instead? Assuming I'm correct and this is a bug, can we add a test to catch the problem?
367	same here

Herald added a subscriber: hoy. May 2 2023, 7:17 AM

Fix bug in moveGain

llvm/lib/Support/BalancedPartitioning.cpp
201	From looking at `perf record` it seems like very little time is spent with `UtilityNodeDegree`. And since I'm renumbering the utility nodes below, we can't index by utilities at this point, so I think it's best to leave it as it is.
213	We don't actually know how many utilities there are at this point because there could be duplicates.
363	Thanks for the catch! I've added a simple unittest which would have caught this. I guess this explains why I saw a perf improvement. We are still spending the most time (~30%) in `moveGain()`, but I've switched back to a simpler implementation because I'm not seeing any gains from using `std::accumulate()`.
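The cached-gain idea behind `moveGain()` can be sketched as follows. This is an illustration under assumptions, not the patch's code: it uses the later name `CachedGainLR` (called `CachedCostLR` at this point in the review), and `LeftCount`, `RightCount`, `CachedGainRL`, `updateCachedGains`, and the body of `logCost` are guesses at a plausible shape. Our inference from the comment on line 363 is that the bug was a running `Gain` that was not being accumulated.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Assumed log-gap style cost of a utility node with X functions on the left
// and Y on the right; lower is better.
static float logCost(unsigned X, unsigned Y) {
  return -(X * std::log2(X + 1.0f) + Y * std::log2(Y + 1.0f));
}

// Per-utility-node bookkeeping, indexed by (renumbered) utility node.
struct UtilitySignature {
  unsigned LeftCount = 0;
  unsigned RightCount = 0;
  float CachedGainLR = 0; // cost reduction if one function moves left -> right
  float CachedGainRL = 0; // cost reduction if one function moves right -> left
};

// Refresh the cached gains once so that moveGain() becomes a plain sum.
void updateCachedGains(std::vector<UtilitySignature> &Signatures) {
  for (UtilitySignature &S : Signatures) {
    float Cost = logCost(S.LeftCount, S.RightCount);
    S.CachedGainLR = S.LeftCount > 0
                         ? Cost - logCost(S.LeftCount - 1, S.RightCount + 1)
                         : 0.0f;
    S.CachedGainRL = S.RightCount > 0
                         ? Cost - logCost(S.LeftCount + 1, S.RightCount - 1)
                         : 0.0f;
  }
}

// Total gain of moving one function across the split: the sum of the cached
// gains over all of its utility nodes (note the `+=` accumulation).
float moveGain(const std::vector<unsigned> &UtilityNodes, bool FromLeftToRight,
               const std::vector<UtilitySignature> &Signatures) {
  float Gain = 0;
  for (unsigned UN : UtilityNodes) {
    assert(UN < Signatures.size() && "utility nodes must be dense indices");
    Gain += FromLeftToRight ? Signatures[UN].CachedGainLR
                            : Signatures[UN].CachedGainRL;
  }
  return Gain;
}
```

A positive result means the move lowers the total cost, which fits the suggestion elsewhere in this review to treat the cached value as a gain to maximize.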

Looks good to me, thanks! I'd wait a bit to let others provide their feedback

snehasish added inline comments. May 2 2023, 2:53 PM

llvm/include/llvm/Support/BalancedPartitioning.h
43	[optional] It seems a bit odd to have an include from llvm/ProfileData -> llvm/Support. Can we forward decl TemporalTraceTy and keep this Support header independent? I think that's the only thing we need from InstrProf in this header but it might require changing the fromTemporalProfTraces API a little bit. Just a flyby comment, I'm not planning on reviewing the patch in detail since others already spent time looking into it.

Harbormaster completed remote builds in B229533: Diff 518861. May 2 2023, 4:16 PM

Move BPFunctionNode::fromTemporalProfTraces() -> TemporalProfTraceTy::createBPFunctionNodes()

llvm/include/llvm/Support/BalancedPartitioning.h
43	Thanks for the feedback! I've moved `BPFunctionNode::fromTemporalProfTraces()` -> `TemporalProfTraceTy::createBPFunctionNodes()` and I think that fits better since the trace data is now responsible for knowing how to construct utility nodes. Now `Support/BalancedPartitioning` is independent of InstrProf, but `BalancedPartitioningTest.cpp` does now need to link `ProfileData`. I also changed the API to use `ArrayRef` since it doesn't need to modify `Traces`.

Harbormaster completed remote builds in B230032: Diff 519560. May 4 2023, 11:11 AM

Does anyone have any more concerns?

kyulee added a subscriber: kyulee. May 25 2023, 10:28 PM

kyulee added inline comments.

llvm/include/llvm/Support/BalancedPartitioning.h
27	Not sure where you got this bound, but it seems slightly different from the one in the linked paper.
36	Not blocking, but I wonder if you want to add a link to LCTES 2023 when the proceeding is available soon?
llvm/lib/Support/BalancedPartitioning.cpp
179	I'm not sure what this code does. I might miss something.
317	Given `logCost` is a negative value which we want to minimize for an objective, my read on `CachedCostLR` is actually a cost saving (or gain) when `N` is moved from Left to Right. So, wouldn't it make sense naming it `CachedCostSavingLR` or `CachedGainLR` so that accumulating them to `Gain` sounds natural?

Rename CachedCost -> CachedGain and refactor runIteration() a bit

llvm/include/llvm/Support/BalancedPartitioning.h
27	Yes, the paper reports a runtime of O(m log n + n log^2 n). However, O(m log^2 n) should be slightly worse and much easier to understand, so I think this is good enough 🙂
36	Since the paper we submitted to LCTES 2023 is slightly shorter I think it makes sense to keep this arxiv link which contains our whole paper.
llvm/lib/Support/BalancedPartitioning.cpp
179	The contents of `BPFunctionNode::UtilityNodes` could originally contain arbitrary values. To make matters worse, each bisect step could shuffle these nodes into different sections. Here we are computing `UtilityNodeIndex` to be a map from `UtilityNodeT` to an int in range [0,N) where N is the number of unique utility nodes. Then we can update each utility node so they are in range [0,N). Then we can make `SignaturesT` a normal vector indexed by utility nodes for performance.
317	I've renamed it to `CachedGainLR` which makes sense because we want to maximize it. Thanks for the suggestion!
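The renumbering described in the reply for line 179 can be sketched like this. The name `UtilityNodeIndex` comes from the thread; `renumberUtilityNodes` and the nested-vector input are hypothetical simplifications, and a standard `std::unordered_map` stands in for whatever map type the patch uses.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

using UtilityNodeT = uint32_t;

// Rewrite arbitrary utility node values to dense indices in [0, N), where N
// is the number of distinct values seen. Returns N.
unsigned renumberUtilityNodes(
    std::vector<std::vector<UtilityNodeT>> &NodesPerFunction) {
  std::unordered_map<UtilityNodeT, UtilityNodeT> UtilityNodeIndex;
  for (auto &UtilityNodes : NodesPerFunction) {
    for (UtilityNodeT &UN : UtilityNodes) {
      // The first sighting of a value gets the next free index; later
      // sightings reuse the index already stored in the map.
      auto NextIndex = static_cast<UtilityNodeT>(UtilityNodeIndex.size());
      auto It = UtilityNodeIndex.emplace(UN, NextIndex).first;
      UN = It->second;
      assert(UN < UtilityNodeIndex.size() && "indices must stay dense");
    }
  }
  return static_cast<unsigned>(UtilityNodeIndex.size());
}
```

With dense indices, per-utility data such as the signatures can live in a plain vector of size N that is indexed directly by utility node.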

Harbormaster completed remote builds in B234900: Diff 526113. May 26 2023, 11:40 AM

LGTM

This revision is now accepted and ready to land. Jun 5 2023, 8:36 AM

Closed by commit rG1117b9a284aa: [InstrProf] Use BalancedPartitioning to order temporal profiling trace data (authored by ellis). Jun 6 2023, 12:00 PM

This revision was automatically updated to reflect the committed changes.

ellis added a commit: rG1117b9a284aa: [InstrProf] Use BalancedPartitioning to order temporal profiling trace data.

ellis mentioned this in rG266ffd7afff9: [InstrProf] Fix warning about converting double to float. Jun 6 2023, 12:38 PM

thakis added a subscriber: thakis. Jun 6 2023, 4:59 PM

thakis added inline comments.

llvm/unittests/Support/CMakeLists.txt
2	This increases the number of files needed to compile SupportTests by 200% (from ~200 to ~600). Is there a way to prevent this dep with some mocking or similar?

ellis mentioned this in D152325: [InstrProf] Move BPFunctionNode test to ProfileDataTests. Jun 6 2023, 6:08 PM

ellis added inline comments. Jun 6 2023, 6:08 PM

llvm/unittests/Support/CMakeLists.txt
2	Thanks for flagging! In https://reviews.llvm.org/D152325 I've moved one of these tests to `unittests/ProfileData` so we can remove the dependency.

ellis mentioned this in rG1794532bb942: [InstrProf] Move BPFunctionNode test to ProfileDataTests. Jun 6 2023, 7:13 PM

The show-order.proftext test never finishes on armhf, causing a timeout in ninja check-all and a failure in this build bot: https://lab.llvm.org/buildbot/#/builders/178

In D147812#4403228, @luporl wrote:

The show-order.proftext test never finishes on armhf, causing a timeout in ninja check-all and a failure in this build bot: https://lab.llvm.org/buildbot/#/builders/178

Looking now. I suspect this is related to the fact that threads are disabled (-DLLVM_ENABLE_THREADS=OFF). I don't see any mention of the show-order.proftext test in the job. Have you reproduced locally?

luporl added a commit: rGa4845eaf2e9a: [InstrProf] Skip Balanced Partitioning tests on ARM. Jun 7 2023, 10:43 AM

In D147812#4403792, @ellis wrote:

In D147812#4403228, @luporl wrote:

The show-order.proftext test never finishes on armhf, causing a timeout in ninja check-all and a failure in this build bot: https://lab.llvm.org/buildbot/#/builders/178

Looking now. I suspect this is related to the fact that threads are disabled (-DLLVM_ENABLE_THREADS=OFF). I don't see any mention of the show-order.proftext test in the job. Have you reproduced locally?

Sorry, I saw your message after having just disabled the tests on ARM, to get the build bot working again. But we can revert it as soon as they are fixed.

I have been able to reproduce it locally, both show-order.proftext and BalancedPartitioningTest just hang and never finish. They don't consume any significant amount of CPU or memory. It seems like they are waiting for something.

I guess the tests weren't printed in the bot logs because they haven't passed or failed, they were still being executed when the timeout killed them.

In D147812#4403842, @luporl wrote:

In D147812#4403792, @ellis wrote:

In D147812#4403228, @luporl wrote:

The show-order.proftext test never finishes on armhf, causing a timeout in ninja check-all and a failure in this build bot: https://lab.llvm.org/buildbot/#/builders/178

Looking now. I suspect this is related to the fact that threads are disabled (-DLLVM_ENABLE_THREADS=OFF). I don't see any mention of the show-order.proftext test in the job. Have you reproduced locally?

Sorry, I saw your message after having just disabled the tests on ARM, to get the build bot working again. But we can revert it as soon as they are fixed.

I have been able to reproduce it locally, both show-order.proftext and BalancedPartitioningTest just hang and never finish. They don't consume any significant amount of CPU or memory. It seems like they are waiting for something.

I guess the tests weren't printed in the bot logs because they haven't passed or failed, they were still being executed when the timeout killed them.

No problem at all! Thanks for fixing the bots for me. I think I just reproduced locally (simply by adding -DLLVM_ENABLE_THREADS=OFF to the cmake command) so I should be able to post a fix soon.

ellis mentioned this in D152390: [InstrProf] Fix BalancedPartitioning when threads are disabled. Jun 7 2023, 11:16 AM

ellis mentioned this in rGc1d935ece346: [InstrProf] Fix BalancedPartitioning when threads are disabled. Jun 7 2023, 12:04 PM

Revision Contents

Path                                                   Size
llvm/docs/CommandGuide/llvm-profdata.rst               35 lines
llvm/include/llvm/ProfileData/InstrProf.h              12 lines
llvm/include/llvm/Support/BalancedPartitioning.h       197 lines
llvm/lib/ProfileData/InstrProf.cpp                     43 lines
llvm/lib/Support/BalancedPartitioning.cpp              325 lines
llvm/lib/Support/CMakeLists.txt                        1 line
llvm/test/tools/llvm-profdata/show-order.proftext      46 lines
llvm/tools/llvm-profdata/llvm-profdata.cpp             50 lines
llvm/unittests/Support/BalancedPartitioningTest.cpp    121 lines
llvm/unittests/Support/CMakeLists.txt                  2 lines

Diff 528975

llvm/docs/CommandGuide/llvm-profdata.rst

(14 unchanged lines hidden)
  data files.

  COMMANDS
  --------

  * :ref:`merge <profdata-merge>`
  * :ref:`show <profdata-show>`
  * :ref:`overlap <profdata-overlap>`
+ * :ref:`order <profdata-order>`

  .. program:: llvm-profdata merge

  .. _profdata-merge:

  MERGE
  -----

(382 unchanged lines hidden)
  Show only those functions whose max count values are greater or equal to ``n``.
  By default, the value-cutoff is set to max of unsigned long long.

  .. option:: --cs

  Only show overlap for the context sensitive profile counts. The default is to show
  non-context sensitive profile counts.

+ .. program:: llvm-profdata order
+
+ .. _profdata-order:
+
+ ORDER
+ -------
+
+ SYNOPSIS
+ ^^^^^^^^
+
+ :program:`llvm-profdata order` [options] [filename]
+
+ DESCRIPTION
+ ^^^^^^^^^^^
+
+ :program:`llvm-profdata order` uses temporal profiling traces from a profile and
+ finds a function order that reduces the number of page faults for those traces.
+ This output can be directly passed to ``lld`` via ``--symbol-ordering-file=``
+ for ELF or ``-order-file`` for Mach-O. If the traces found in the profile are
+ representative of the real world, then this order should improve startup
+ performance.
+
+ OPTIONS
+ ^^^^^^^
+
+ .. option:: --help
+
+ Print a summary of command line options.
+
+ .. option:: --output=<output>, -o
+
+ Specify the output file name. If output is ``-`` or it isn't specified,
+ then the output is sent to standard output.
+
  EXIT STATUS
  -----------

  :program:`llvm-profdata` returns 1 if the command is omitted or is invalid,
  if it cannot read input files, or if there is a mismatch between their data.

llvm/include/llvm/ProfileData/InstrProf.h

(17 unchanged lines hidden)
  #include "llvm/ADT/ArrayRef.h"
  #include "llvm/ADT/BitmaskEnum.h"
  #include "llvm/ADT/STLExtras.h"
  #include "llvm/ADT/StringRef.h"
  #include "llvm/ADT/StringSet.h"
  #include "llvm/IR/GlobalValue.h"
  #include "llvm/IR/ProfileSummary.h"
  #include "llvm/ProfileData/InstrProfData.inc"
+ #include "llvm/Support/BalancedPartitioning.h"
  #include "llvm/Support/CommandLine.h"
  #include "llvm/Support/Compiler.h"
  #include "llvm/Support/Endian.h"
  #include "llvm/Support/Error.h"
  #include "llvm/Support/ErrorHandling.h"
  #include "llvm/Support/MD5.h"
  #include "llvm/Support/MathExtras.h"
  #include "llvm/Support/raw_ostream.h"
(298 unchanged lines hidden; context: enum class instrprof_error {)
  empty_raw_profile,
  zlib_unavailable,
  raw_profile_version_mismatch
  };

  /// An ordered list of functions identified by their NameRef found in
  /// INSTR_PROF_DATA
  struct TemporalProfTraceTy {
- uint64_t Weight = 1;
  std::vector<uint64_t> FunctionNameRefs;
+ uint64_t Weight;
+ TemporalProfTraceTy(std::initializer_list<uint64_t> Trace = {},
+ uint64_t Weight = 1)
+ : FunctionNameRefs(Trace), Weight(Weight) {}
+
+ /// Use a set of temporal profile traces to create a list of balanced
+ /// partitioning function nodes used by BalancedPartitioning to generate a
+ /// function order that reduces page faults during startup
+ static std::vector<BPFunctionNode>
+ createBPFunctionNodes(ArrayRef<TemporalProfTraceTy> Traces);
  };

  inline std::error_code make_error_code(instrprof_error E) {
  return std::error_code(static_cast<int>(E), instrprof_category());
  }

  class InstrProfError : public ErrorInfo<InstrProfError> {
  public:
(899 unchanged lines hidden)

llvm/include/llvm/Support/BalancedPartitioning.h

This file was added.

				//===- BalancedPartitioning.h ---------------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements BalancedPartitioning, a recursive balanced graph
				// partitioning algorithm.
				//
				// The algorithm is used to find an ordering of FunctionNodes while optimizing
				// a specified objective. The algorithm uses recursive bisection; it starts
				// with a collection of unordered FunctionNodes and tries to split them into
				// two sets (buckets) of equal cardinality. Each bisection step is comprised of
				// iterations that greedily swap the FunctionNodes between the two buckets while
				// there is an improvement of the objective. Once the process converges, the
				// problem is divided into two sub-problems of half the size, which are
				// recursively applied for the two buckets. The final ordering of the
				// FunctionNodes is obtained by concatenating the two (recursively computed)
				// orderings.
				//
				// In order to speed up the computation, we limit the depth of the recursive
				// tree by a specified constant (SplitDepth) and apply at most a constant
				// number of greedy iterations per split (IterationsPerSplit). The worst-case
				// time complexity of the implementation is bounded by O(M*log^2 N), where
				// N is the number of FunctionNodes and M is the number of
				// FunctionNode-UtilityNode edges; (assuming that any collection of D
				// FunctionNodes contains O(D) UtilityNodes). Notice that the two different
				// recursive sub-problems are independent and thus can be efficiently processed
				// in parallel.
				//
				// Reference:
				// * Optimizing Function Layout for Mobile Applications,
				// https://arxiv.org/abs/2211.09285
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_SUPPORT_BALANCED_PARTITIONING_H
				#define LLVM_SUPPORT_BALANCED_PARTITIONING_H

				#include "raw_ostream.h"
				#include "llvm/Support/ThreadPool.h"

				#include <random>
				#include <vector>

				namespace llvm {

				/// A function with a set of utility nodes where it is beneficial to order two
				/// functions close together if they have similar utility nodes
				class BPFunctionNode {
				friend class BalancedPartitioning;

				public:
				using IDT = uint64_t;
				using UtilityNodeT = uint32_t;

				/// \param UtilityNodes the set of utility nodes (must be unique'd)
				BPFunctionNode(IDT Id, ArrayRef<UtilityNodeT> UtilityNodes)
				: Id(Id), UtilityNodes(UtilityNodes) {}

				/// The ID of this node
				IDT Id;

				void dump(raw_ostream &OS) const;

				protected:
				/// The list of utility nodes associated with this node
				SmallVector<UtilityNodeT, 4> UtilityNodes;
				/// The bucket assigned by balanced partitioning
				std::optional<unsigned> Bucket;
				/// The index of the input order of the FunctionNodes
				uint64_t InputOrderIndex = 0;

				friend class BPFunctionNodeTest_Basic_Test;
				friend class BalancedPartitioningTest_Basic_Test;
				friend class BalancedPartitioningTest_Large_Test;
				};

				/// Algorithm parameters; default values are tuned on real-world binaries
				struct BalancedPartitioningConfig {
				/// The depth of the recursive bisection
				unsigned SplitDepth = 18;
				/// The maximum number of bp iterations per split
				unsigned IterationsPerSplit = 40;
				/// The probability for a vertex to skip a move from its current bucket to
				/// another bucket; it often helps to escape from a local optimum
				float SkipProbability = 0.1;
				spupyrev (Done): I thought we use `18` by default?
				/// Recursive subtasks up to the given depth are added to the queue and
				/// distributed among threads by ThreadPool; all subsequent calls are executed
				/// on the same thread
				unsigned TaskSplitDepth = 9;
				};

				class BalancedPartitioning {
				public:
				BalancedPartitioning(const BalancedPartitioningConfig &Config);

				/// Run recursive graph partitioning that optimizes a given objective.
				void run(std::vector<BPFunctionNode> &Nodes) const;

				private:
				struct UtilitySignature;
				using SignaturesT = SmallVector<UtilitySignature, 4>;
				using FunctionNodeRange =
				iterator_range<std::vector<BPFunctionNode>::iterator>;

				/// A special ThreadPool that allows for spawning new tasks after blocking on
				/// wait(). BalancedPartitioning recursively spawns new threads inside other
				/// threads, so we need to track how many active threads could spawn more
				/// threads.
				struct BPThreadPool {
				ThreadPool TheThreadPool;
				std::mutex mtx;
				std::condition_variable cv;
				/// The number of threads that could spawn more threads
				std::atomic<int> NumActiveThreads = 0;
				/// Only true when all threads are done spawning new threads
				bool IsFinishedSpawning = false;
				/// Asynchronous submission of the task to the pool
				template <typename Func> void async(Func &&F);
				/// Blocking wait for all threads to complete. Unlike ThreadPool, it is
				/// acceptable for other threads to add more tasks while blocking on this
				/// call.
				void wait();
				};

				/// Run a recursive bisection of a given list of FunctionNodes
				/// \param RecDepth the current depth of recursion
				/// \param RootBucket the initial bucket of the dataVertices
				/// \param Offset the assigned buckets are the range [Offset, Offset +
				/// Nodes.size()]
				void bisect(const FunctionNodeRange Nodes, unsigned RecDepth,
				unsigned RootBucket, unsigned Offset,
				std::optional<BPThreadPool> &TP) const;

				/// Run bisection iterations
				void runIterations(const FunctionNodeRange Nodes, unsigned RecDepth,
				unsigned LeftBucket, unsigned RightBucket,
				std::mt19937 &RNG) const;

				/// Run a bisection iteration to improve the optimization goal
				/// \returns the total number of moved FunctionNodes
				unsigned runIteration(const FunctionNodeRange Nodes, unsigned LeftBucket,
				unsigned RightBucket, SignaturesT &Signatures,
				std::mt19937 &RNG) const;

				/// Try to move \p N from one bucket to another
				/// \returns true iff \p N is moved
				bool moveFunctionNode(BPFunctionNode &N, unsigned LeftBucket,
				unsigned RightBucket, SignaturesT &Signatures,
				std::mt19937 &RNG) const;

				/// Split all the FunctionNodes into 2 buckets, StartBucket and StartBucket +
				/// 1. The method is used for an initial assignment before a bisection step
				void split(const FunctionNodeRange Nodes, unsigned StartBucket) const;

				/// The cost of the uniform log-gap cost, assuming a utility node has \p X
				/// FunctionNodes in the left bucket and \p Y FunctionNodes in the right one.
				float logCost(unsigned X, unsigned Y) const;

				float log2Cached(unsigned i) const;

				const BalancedPartitioningConfig &Config;

				/// Precomputed values of log2(x). Table size is small enough to fit in cache.
				static constexpr unsigned LOG_CACHE_SIZE = 16384;
				float Log2Cache[LOG_CACHE_SIZE];

				/// The signature of a particular utility node used for the bisection step,
				/// i.e., the number of \p FunctionNodes in each of the two buckets
				struct UtilitySignature {
				/// The number of \p FunctionNodes in the left bucket
				unsigned LeftCount = 0;
				/// The number of \p FunctionNodes in the right bucket
				unsigned RightCount = 0;
				/// The cached gain of moving a \p FunctionNode from the left bucket to the
				/// right bucket
				float CachedGainLR;
				/// The cached gain of moving a \p FunctionNode from the right bucket to the
				/// left bucket
				float CachedGainRL;
				/// Whether \p CachedGainLR and \p CachedGainRL are valid
				bool CachedGainIsValid = false;
				};

				protected:
				/// Compute the move gain for uniform log-gap cost
				static float moveGain(const BPFunctionNode &N, bool FromLeftToRight,
				const SignaturesT &Signatures);
				friend class BalancedPartitioningTest_MoveGain_Test;
				};

				} // end namespace llvm

				#endif // LLVM_SUPPORT_BALANCED_PARTITIONING_H

llvm/lib/ProfileData/InstrProf.cpp

//===- InstrProf.cpp - Instrumented profiling format support --------------===//		//===- InstrProf.cpp - Instrumented profiling format support --------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file contains support for clang's instrumentation based PGO and		// This file contains support for clang's instrumentation based PGO and
// coverage.		// coverage.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/ProfileData/InstrProf.h"		#include "llvm/ProfileData/InstrProf.h"
#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
		#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallString.h"		#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringExtras.h"		#include "llvm/ADT/StringExtras.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/Config/config.h"		#include "llvm/Config/config.h"
#include "llvm/IR/Constant.h"		#include "llvm/IR/Constant.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
[774 lines not shown; enclosing context: void InstrProfRecord::addValueData(uint32_t ValueKind, uint32_t Site,]
std::vector<InstrProfValueSiteRecord> &ValueSites =		std::vector<InstrProfValueSiteRecord> &ValueSites =
getOrCreateValueSitesForKind(ValueKind);		getOrCreateValueSitesForKind(ValueKind);
if (N == 0)		if (N == 0)
ValueSites.emplace_back();		ValueSites.emplace_back();
else		else
ValueSites.emplace_back(VData, VData + N);		ValueSites.emplace_back(VData, VData + N);
}		}

		std::vector<BPFunctionNode> TemporalProfTraceTy::createBPFunctionNodes(
		ArrayRef<TemporalProfTraceTy> Traces) {
		using IDT = BPFunctionNode::IDT;
		using UtilityNodeT = BPFunctionNode::UtilityNodeT;
		// Collect all function IDs ordered by their smallest timestamp. This will be
		// used as the initial FunctionNode order.
		SetVector<IDT> FunctionIds;
		size_t LargestTraceSize = 0;
		for (auto &Trace : Traces)
		LargestTraceSize =
		std::max(LargestTraceSize, Trace.FunctionNameRefs.size());
		for (size_t Timestamp = 0; Timestamp < LargestTraceSize; Timestamp++)
		for (auto &Trace : Traces)
		if (Timestamp < Trace.FunctionNameRefs.size())
		FunctionIds.insert(Trace.FunctionNameRefs[Timestamp]);

		int N = std::ceil(std::log2(LargestTraceSize));

		// TODO: We need to use the Trace.Weight field to give more weight to more
		// important utilities
		DenseMap<IDT, SmallVector<UtilityNodeT, 4>> FuncGroups;
		for (size_t TraceIdx = 0; TraceIdx < Traces.size(); TraceIdx++) {
		auto &Trace = Traces[TraceIdx].FunctionNameRefs;
		for (size_t Timestamp = 0; Timestamp < Trace.size(); Timestamp++) {
		for (int I = std::floor(std::log2(Timestamp + 1)); I < N; I++) {
		auto &FunctionId = Trace[Timestamp];
		UtilityNodeT GroupId = TraceIdx * N + I;
		FuncGroups[FunctionId].push_back(GroupId);
		}
		}
		}

		std::vector<BPFunctionNode> Nodes;
		for (auto &Id : FunctionIds) {
		auto &UNs = FuncGroups[Id];
		llvm::sort(UNs);
		UNs.erase(std::unique(UNs.begin(), UNs.end()), UNs.end());
		Nodes.emplace_back(Id, UNs);
		}
		return Nodes;
		}
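
The grouping rule used by createBPFunctionNodes() above can be illustrated with a standalone sketch. It uses plain standard-library containers in place of ArrayRef, SetVector, and DenseMap, and it leaves out the initial ordering by smallest timestamp. The names `Trace`, `assignUtilityNodes`, `ceilLog2`, and `floorLog2` are hypothetical and exist only for this illustration; they are not part of the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for a trace: function IDs in the order they were
// first called.
using Trace = std::vector<uint64_t>;

// ceil(log2(X)) for X >= 1, computed with integer arithmetic.
static unsigned ceilLog2(size_t X) {
  assert(X >= 1 && "log2 is undefined for 0");
  unsigned N = 0;
  while ((size_t(1) << N) < X)
    ++N;
  return N;
}

// floor(log2(X)) for X >= 1, computed with integer arithmetic.
static unsigned floorLog2(size_t X) {
  assert(X >= 1 && "log2 is undefined for 0");
  unsigned N = 0;
  while (X >>= 1)
    ++N;
  return N;
}

// Map each function ID to its sorted, deduplicated utility nodes. A function
// at position Timestamp in trace TraceIdx joins the groups
// TraceIdx * N + I for every I in [floor(log2(Timestamp + 1)), N).
std::map<uint64_t, std::vector<uint32_t>>
assignUtilityNodes(const std::vector<Trace> &Traces) {
  std::map<uint64_t, std::vector<uint32_t>> Groups;
  size_t LargestTraceSize = 0;
  for (const Trace &T : Traces)
    LargestTraceSize = std::max(LargestTraceSize, T.size());
  if (LargestTraceSize == 0)
    return Groups;

  const unsigned N = ceilLog2(LargestTraceSize);
  for (size_t TraceIdx = 0; TraceIdx < Traces.size(); ++TraceIdx) {
    const Trace &T = Traces[TraceIdx];
    for (size_t Timestamp = 0; Timestamp < T.size(); ++Timestamp)
      for (unsigned I = floorLog2(Timestamp + 1); I < N; ++I)
        Groups[T[Timestamp]].push_back(
            static_cast<uint32_t>(TraceIdx * N + I));
  }

  for (auto &Entry : Groups) {
    std::vector<uint32_t> &UNs = Entry.second;
    std::sort(UNs.begin(), UNs.end());
    UNs.erase(std::unique(UNs.begin(), UNs.end()), UNs.end());
  }
  return Groups;
}
```

For a single trace {10, 20, 30}, N is 2, so function 10 receives the groups {0, 1} while functions 20 and 30 each receive {1}. Functions that appear earlier in a trace belong to more groups than functions that appear later.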

#define INSTR_PROF_COMMON_API_IMPL		#define INSTR_PROF_COMMON_API_IMPL
#include "llvm/ProfileData/InstrProfData.inc"		#include "llvm/ProfileData/InstrProfData.inc"

/*!		/*!
* ValueProfRecordClosure Interface implementation for InstrProfRecord		* ValueProfRecordClosure Interface implementation for InstrProfRecord
* class. These C wrappers are used as adaptors so that C++ code can be		* class. These C wrappers are used as adaptors so that C++ code can be
* invoked as callbacks.		* invoked as callbacks.
*/		*/
[613 lines not shown]

llvm/lib/Support/BalancedPartitioning.cpp

This file was added.

//===- BalancedPartitioning.cpp -------------------------------------------===//

// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

// See https://llvm.org/LICENSE.txt for license information.

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

//===----------------------------------------------------------------------===//

// This file implements BalancedPartitioning, a recursive balanced graph

// partitioning algorithm.

//===----------------------------------------------------------------------===//

#include "llvm/Support/BalancedPartitioning.h"

#include "llvm/ADT/SetVector.h"

#include "llvm/Support/Debug.h"

#include "llvm/Support/Format.h"

#include "llvm/Support/FormatVariadic.h"

using namespace llvm;

#define DEBUG_TYPE "balanced-partitioning"

void BPFunctionNode::dump(raw_ostream &OS) const {

OS << formatv("{{ID={0} Utilities={{{1:$[,]}} Bucket={2}}", Id,

make_range(UtilityNodes.begin(), UtilityNodes.end()), Bucket);

}

template <typename Func>

void BalancedPartitioning::BPThreadPool::async(Func &&F) {

// This new thread could spawn more threads, so mark it as active

++NumActiveThreads;

TheThreadPool.async([=]() {

// Run the task

F();

// This thread will no longer spawn new threads, so mark it as inactive

if (--NumActiveThreads == 0) {

// There are no more active threads, so mark as finished and notify

{

std::unique_lock<std::mutex> lock(mtx);

assert(!IsFinishedSpawning);

IsFinishedSpawning = true;

}

cv.notify_one();

}

});

}

void BalancedPartitioning::BPThreadPool::wait() {

// TODO: We could remove the mutex and condition variable and use

// std::atomic::wait() instead, but that isn't available until C++20

{

std::unique_lock<std::mutex> lock(mtx);

cv.wait(lock, [&]() { return IsFinishedSpawning; });

assert(IsFinishedSpawning && NumActiveThreads == 0);

}

// Now we can call ThreadPool::wait() since all tasks have been submitted

TheThreadPool.wait();

}

BalancedPartitioning::BalancedPartitioning(

const BalancedPartitioningConfig &Config)

: Config(Config) {

// Pre-computing log2 values

Log2Cache[0] = 0.0;

for (unsigned I = 1; I < LOG_CACHE_SIZE; I++)

Log2Cache[I] = std::log2(I);

}

void BalancedPartitioning::run(std::vector<BPFunctionNode> &Nodes) const {

LLVM_DEBUG(

dbgs() << format(

"Partitioning %d nodes using depth %d and %d iterations per split\n",

Nodes.size(), Config.SplitDepth, Config.IterationsPerSplit));

std::optional<BPThreadPool> TP;

if (Config.TaskSplitDepth > 1)

TP.emplace();

// Record the input order

for (unsigned I = 0; I < Nodes.size(); I++)

Nodes[I].InputOrderIndex = I;

auto NodesRange = llvm::make_range(Nodes.begin(), Nodes.end());

auto BisectTask = [=, &TP]() {

bisect(NodesRange, /*RecDepth=*/0, /*RootBucket=*/1, /*Offset=*/0, TP);

};

spupyrev (Not Done): Would be great to unify the implementation with the existing ThreadPool, but I don't have a suggestion on how exactly to implement that. I'm fine with keeping it as is, if there are no alternative suggestions.

if (TP) {

TP->async(std::move(BisectTask));

TP->wait();

} else {

BisectTask();

}

llvm::stable_sort(NodesRange, [](const auto &L, const auto &R) {

return L.Bucket < R.Bucket;

});

LLVM_DEBUG(dbgs() << "Balanced partitioning completed\n");

}

void BalancedPartitioning::bisect(const FunctionNodeRange Nodes,

unsigned RecDepth, unsigned RootBucket,

unsigned Offset,

std::optional<BPThreadPool> &TP) const {

unsigned NumNodes = std::distance(Nodes.begin(), Nodes.end());

if (NumNodes <= 1 || RecDepth >= Config.SplitDepth) {

// We've reached the lowest level of the recursion tree. Fall back to the

// original order and assign to buckets.

llvm::stable_sort(Nodes, [](const auto &L, const auto &R) {

return L.InputOrderIndex < R.InputOrderIndex;

});

for (auto &N : Nodes)

N.Bucket = Offset++;

return;

}

LLVM_DEBUG(dbgs() << format("Bisect with %d nodes and root bucket %d\n",

NumNodes, RootBucket));

std::mt19937 RNG(RootBucket);

unsigned LeftBucket = 2 * RootBucket;

unsigned RightBucket = 2 * RootBucket + 1;

// Split into two and assign to the left and right buckets

split(Nodes, LeftBucket);

runIterations(Nodes, RecDepth, LeftBucket, RightBucket, RNG);

// Split nodes wrt the resulting buckets

auto NodesMid =

llvm::partition(Nodes, [&](auto &N) { return N.Bucket == LeftBucket; });

unsigned MidOffset = Offset + std::distance(Nodes.begin(), NodesMid);

auto LeftNodes = llvm::make_range(Nodes.begin(), NodesMid);

auto RightNodes = llvm::make_range(NodesMid, Nodes.end());

auto LeftRecTask = [=, &TP]() {

bisect(LeftNodes, RecDepth + 1, LeftBucket, Offset, TP);

};

auto RightRecTask = [=, &TP]() {

bisect(RightNodes, RecDepth + 1, RightBucket, MidOffset, TP);

};

if (TP && RecDepth < Config.TaskSplitDepth && NumNodes >= 4) {

TP->async(std::move(LeftRecTask));

TP->async(std::move(RightRecTask));

} else {

spupyrev (Done), suggested change:

unsigned NumNodes = std::distance(Nodes.begin(), Nodes.end());

- if (NumNodes < 1 || RecDepth >= Config.SplitDepth) {

+ if (NumNodes <= 1 || RecDepth >= Config.SplitDepth) {

// We've reached the lowest level of the recursion tree. Fall back to the

LeftRecTask();

RightRecTask();

}

}

void BalancedPartitioning::runIterations(const FunctionNodeRange Nodes,

unsigned RecDepth, unsigned LeftBucket,

unsigned RightBucket,

std::mt19937 &RNG) const {

unsigned NumNodes = std::distance(Nodes.begin(), Nodes.end());

DenseMap<BPFunctionNode::UtilityNodeT, unsigned> UtilityNodeDegree;

for (auto &N : Nodes)

for (auto &UN : N.UtilityNodes)

++UtilityNodeDegree[UN];

// Remove utility nodes if they have just one edge or are connected to all

// functions

for (auto &N : Nodes)

llvm::erase_if(N.UtilityNodes, [&](auto &UN) {

return UtilityNodeDegree[UN] <= 1 || UtilityNodeDegree[UN] >= NumNodes;

});

// Renumber utility nodes so they can be used to index into Signatures

DenseMap<BPFunctionNode::UtilityNodeT, unsigned> UtilityNodeIndex;

for (auto &N : Nodes)

for (auto &UN : N.UtilityNodes)

if (!UtilityNodeIndex.count(UN))

UtilityNodeIndex[UN] = UtilityNodeIndex.size();

for (auto &N : Nodes)

for (auto &UN : N.UtilityNodes)

UN = UtilityNodeIndex[UN];

kyulee (Not Done): I'm not sure what this code does. I might miss something.

ellis (author, Done): The contents of `BPFunctionNode::UtilityNodes` could originally contain arbitrary values. To make matters worse, each bisect step could shuffle these nodes into different sections.

Here we are computing `UtilityNodeIndex` to be a map from `UtilityNodeT` to an int in the range [0,N), where N is the number of unique utility nodes. Then we can update each utility node so that it is in the range [0,N). Then we can make `SignaturesT` a normal vector indexed by utility nodes for performance.

// Initialize signatures

SignaturesT Signatures(/*Size=*/UtilityNodeIndex.size());

for (auto &N : Nodes) {

for (auto &UN : N.UtilityNodes) {

assert(UN < Signatures.size());

if (N.Bucket == LeftBucket) {

Signatures[UN].LeftCount++;

} else {

Signatures[UN].RightCount++;

}

}

}

for (unsigned I = 0; I < Config.IterationsPerSplit; I++) {

unsigned NumMovedNodes =

runIteration(Nodes, LeftBucket, RightBucket, Signatures, RNG);

if (NumMovedNodes == 0)

break;

}

}
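
The renumbering step that ellis describes in the thread above (mapping arbitrary utility node values onto a dense range so that signatures can live in a plain vector) can be sketched on its own. This is a minimal illustration that uses std::unordered_map instead of DenseMap; `renumberUtilityNodes` is a hypothetical name, not a function from the patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Rewrite every utility node in place so that the values form the dense range
// [0, N), where N is the number of distinct utility nodes. Returns N, which is
// the size a vector indexed by utility node would need.
size_t renumberUtilityNodes(std::vector<std::vector<uint32_t>> &UtilityNodes) {
  std::unordered_map<uint32_t, uint32_t> Index;
  for (std::vector<uint32_t> &UNs : UtilityNodes) {
    for (uint32_t &UN : UNs) {
      // The first time a value is seen it gets the next free index; later
      // occurrences reuse the index that was already assigned.
      auto It =
          Index.try_emplace(UN, static_cast<uint32_t>(Index.size())).first;
      UN = It->second;
    }
  }
  return Index.size();
}
```

After the call, each utility node can be used directly as an index into a vector of size N, which avoids a hash lookup on every access.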

unsigned BalancedPartitioning::runIteration(const FunctionNodeRange Nodes,

spupyrev (Not Done): How much overhead does `DenseMap` have in comparison with `std::vector`? I assume one could first find the maximum utility index and then use a vector instead of a map. Of course, that should only be done if there is visible speedup.

ellis (author, Done): From looking at `perf record` it seems like very little time is spent with `UtilityNodeDegree`. And since I'm renumbering the utility nodes below, we can't index by utilities at this point, so I think it's best to leave it as it is.

unsigned LeftBucket,

davidxl (Done): Why 4x larger? Is the number of utility nodes the same as "the number of traces * number of cutoffs"?

ellis (author, Done): The `Cutoffs` variable is only used with bp when ordering functions from traces. We can also use bp to order functions to improve compression by placing similar functions close together. For that case we have another way to assign utility nodes to function nodes, so the number of signatures will be different.

unsigned RightBucket,

SignaturesT &Signatures,

std::mt19937 &RNG) const {

// Init signature cost caches

for (auto &Signature : Signatures) {

if (Signature.CachedGainIsValid)

continue;

unsigned L = Signature.LeftCount;

unsigned R = Signature.RightCount;

assert((L > 0 || R > 0) && "incorrect signature");

float Cost = logCost(L, R);

spupyrev (Not Done): maybe add `UtilityNodeIndex.reserve(NumUtilities)` (and use the variable for signature initialization too)

ellis (author, Done): We don't actually know how many utilities there are at this point because there could be duplicates.

Signature.CachedGainLR = 0;

Signature.CachedGainRL = 0;

spupyrev (Done): Do we want to get rid of the map? (and the weird 4x initialization)

ellis (author, Done): I've removed the map because I found that it is more efficient to renumber these utility nodes so these signatures can be an array indexed by the utility node.

if (L > 0)

Signature.CachedGainLR = Cost - logCost(L - 1, R + 1);

if (R > 0)

Signature.CachedGainRL = Cost - logCost(L + 1, R - 1);

Signature.CachedGainIsValid = true;

}

// Compute move gains

typedef std::pair<float, BPFunctionNode *> GainPair;

std::vector<GainPair> Gains;

for (auto &N : Nodes) {

bool FromLeftToRight = (N.Bucket == LeftBucket);

float Gain = moveGain(N, FromLeftToRight, Signatures);

Gains.push_back(std::make_pair(Gain, &N));

}

// Collect left and right gains

auto LeftEnd = llvm::partition(

Gains, [&](const auto &GP) { return GP.second->Bucket == LeftBucket; });

auto LeftRange = llvm::make_range(Gains.begin(), LeftEnd);

auto RightRange = llvm::make_range(LeftEnd, Gains.end());

// Sort gains in descending order

auto LargerGain = [](const auto &L, const auto &R) {

return L.first > R.first;

};

llvm::stable_sort(LeftRange, LargerGain);

llvm::stable_sort(RightRange, LargerGain);

unsigned NumMovedDataVertices = 0;

for (auto [LeftPair, RightPair] : llvm::zip(LeftRange, RightRange)) {

auto &[LeftGain, LeftNode] = LeftPair;

auto &[RightGain, RightNode] = RightPair;

// Stop when the gain is no longer beneficial

if (LeftGain + RightGain <= 0.0)

break;

// Try to exchange the nodes between buckets

if (moveFunctionNode(*LeftNode, LeftBucket, RightBucket, Signatures, RNG))

++NumMovedDataVertices;

if (moveFunctionNode(*RightNode, LeftBucket, RightBucket, Signatures, RNG))

++NumMovedDataVertices;

}

return NumMovedDataVertices;

}
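
The pairing logic in runIteration() above (sort each side by gain in descending order, walk the two lists in lockstep, and stop once a pair's combined gain is no longer positive) can be illustrated in isolation. The sketch below only counts how many pairs would be considered for exchange; it deliberately omits the random skip performed by moveFunctionNode() and does not update any signatures. `countBeneficialSwaps` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Given the move gains of the nodes currently in the left and right buckets,
// return the number of (left, right) pairs whose combined gain is positive
// when both lists are visited from largest to smallest gain.
size_t countBeneficialSwaps(std::vector<float> LeftGains,
                            std::vector<float> RightGains) {
  std::sort(LeftGains.begin(), LeftGains.end(), std::greater<float>());
  std::sort(RightGains.begin(), RightGains.end(), std::greater<float>());

  // Like llvm::zip, stop at the end of the shorter list.
  const size_t NumPairs = std::min(LeftGains.size(), RightGains.size());
  size_t NumSwaps = 0;
  for (size_t I = 0; I < NumPairs; ++I) {
    // Both lists are sorted in descending order, so once a pair is not
    // beneficial no later pair can be.
    if (LeftGains[I] + RightGains[I] <= 0.0f)
      break;
    ++NumSwaps;
  }
  return NumSwaps;
}
```

Exchanging nodes in pairs keeps the two buckets the same size, which is what makes the partitioning balanced.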

bool BalancedPartitioning::moveFunctionNode(BPFunctionNode &N,

unsigned LeftBucket,

unsigned RightBucket,

SignaturesT &Signatures,

std::mt19937 &RNG) const {

spupyrev (Done), suggested change:

// = U * log(U) - (x * log(x+1) + y * log(y+1))

- float cost = logCost(L, R);

- float CostLR = 0, CostRL = 0;

+ float Cost = logCost(L, R);

+ Signature.CachedCostLR = 0;

+ Signature.CachedCostRL = 0;

if (L > 0)

- CostLR = cost - logCost(L - 1, R + 1);

+ Signature.CachedCostLR = Cost - logCost(L - 1, R + 1);

if (R > 0)

- CostRL = cost - logCost(L + 1, R - 1);

- Signature.CachedCostLR = CostLR;

- Signature.CachedCostRL = CostRL;

- Signature.CachedCostIsValid = true;

+ Signature.CachedCostRL = Cost - logCost(L + 1, R - 1); Signature.CachedCostIsValid = true;

// Sometimes we skip the move. This helps to escape local optima

if (std::uniform_real_distribution<float>(0.0, 1.0)(RNG) <=

Config.SkipProbability)

return false;

bool FromLeftToRight = (N.Bucket == LeftBucket);

// Update the current bucket

N.Bucket = (FromLeftToRight ? RightBucket : LeftBucket);

// Update signatures and invalidate gain cache

if (FromLeftToRight) {

for (auto &UN : N.UtilityNodes) {

auto &Signature = Signatures[UN];

Signature.LeftCount--;

Signature.RightCount++;

Signature.CachedGainIsValid = false;

}

} else {

for (auto &UN : N.UtilityNodes) {

auto &Signature = Signatures[UN];

Signature.LeftCount++;

Signature.RightCount--;

Signature.CachedGainIsValid = false;

}

}

return true;

}

void BalancedPartitioning::split(const FunctionNodeRange Nodes,

unsigned StartBucket) const {

unsigned NumNodes = std::distance(Nodes.begin(), Nodes.end());

auto NodesMid = Nodes.begin() + (NumNodes + 1) / 2;

std::nth_element(Nodes.begin(), NodesMid, Nodes.end(), [](auto &L, auto &R) {

return L.InputOrderIndex < R.InputOrderIndex;

});

for (auto &N : llvm::make_range(Nodes.begin(), NodesMid))

N.Bucket = StartBucket;

for (auto &N : llvm::make_range(NodesMid, Nodes.end()))

N.Bucket = StartBucket + 1;

}
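
The initial assignment performed by split() above can be sketched with standard-library types only. `Node` and `splitInHalf` are hypothetical names for this illustration; the sketch works on a whole std::vector rather than an iterator range.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, reduced node: just the input position and the bucket.
struct Node {
  unsigned InputOrderIndex;
  unsigned Bucket;
};

// Put the first half of the nodes (by input order) into StartBucket and the
// rest into StartBucket + 1. With an odd count the left bucket gets the extra
// node. std::nth_element only partitions around the midpoint, so neither half
// ends up fully sorted.
void splitInHalf(std::vector<Node> &Nodes, unsigned StartBucket) {
  auto Mid = Nodes.begin() + (Nodes.size() + 1) / 2;
  std::nth_element(Nodes.begin(), Mid, Nodes.end(),
                   [](const Node &L, const Node &R) {
                     return L.InputOrderIndex < R.InputOrderIndex;
                   });
  for (auto It = Nodes.begin(); It != Mid; ++It)
    It->Bucket = StartBucket;
  for (auto It = Mid; It != Nodes.end(); ++It)
    It->Bucket = StartBucket + 1;
}
```

A partial partition is enough here because the iterations that follow only depend on which bucket a node is in, not on the order of nodes within a bucket.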

float BalancedPartitioning::moveGain(const BPFunctionNode &N,

bool FromLeftToRight,

const SignaturesT &Signatures) {

float Gain = 0;

for (auto &UN : N.UtilityNodes)

Gain += (FromLeftToRight ? Signatures[UN].CachedGainLR

: Signatures[UN].CachedGainRL);

return Gain;

}

kyulee (Not Done): Given `logCost` is a negative value which we want to minimize for an objective, my read on CachedCostLR is actually a cost saving (or gain) when N is moved from Left to Right. So, wouldn't it make sense naming it CachedCostSavingLR or CachedGainLR so that accumulating them to Gain sounds natural?

ellis (author, Done): I've renamed it to `CachedGainLR` which makes sense because we want to maximize it. Thanks for the suggestion!

float BalancedPartitioning::logCost(unsigned X, unsigned Y) const {

return -(X * log2Cached(X + 1) + Y * log2Cached(Y + 1));

}

float BalancedPartitioning::log2Cached(unsigned i) const {

return (i < LOG_CACHE_SIZE) ? Log2Cache[i] : std::log2(i);

}
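
To make the cost and gain definitions above concrete, here is a small standalone sketch of the same formulas without the log2 cache. `Signature` and `gainLeftToRight` are hypothetical, reduced names for this illustration; the patch stores the equivalent value in `UtilitySignature::CachedGainLR`.

```cpp
#include <cassert>
#include <cmath>

// Uniform log-gap cost of one utility node that has X functions in the left
// bucket and Y functions in the right bucket. Lower (more negative) is better.
float logCost(unsigned X, unsigned Y) {
  return -(X * std::log2(static_cast<float>(X + 1)) +
           Y * std::log2(static_cast<float>(Y + 1)));
}

// Hypothetical, reduced signature: just the two counts.
struct Signature {
  unsigned LeftCount;
  unsigned RightCount;
};

// Gain (cost reduction) for this utility node if one of its functions moves
// from the left bucket to the right bucket.
float gainLeftToRight(const Signature &S) {
  assert(S.LeftCount > 0 && "nothing to move out of the left bucket");
  return logCost(S.LeftCount, S.RightCount) -
         logCost(S.LeftCount - 1, S.RightCount + 1);
}
```

Under this cost, a utility node split 1/1 has a positive gain for moving to 0/2, while a node split 3/1 has a negative gain for moving to 2/2. Moves that concentrate a utility node's functions in one bucket are therefore rewarded.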

spupyrev (Done): I think we've got rid of this method

ellis (author, Done): I've deleted `computeGoal()`.

spupyrev (Done): should it be `Gain + Signatures[UN].CachedCostLR` instead?

Assuming I'm correct and this is a bug, can we add a test to catch the problem?

ellis (author, Done): Thanks for the catch! I've added a simple unittest which would have caught this. I guess this explains why I saw a perf improvement. We are still spending the most time (~30%) in moveGain(), but I've switched back to a simpler implementation because I'm not seeing any gains from using std::accumulate().

spupyrev (Done): same here

llvm/lib/Support/CMakeLists.txt

[123 lines not shown; enclosing context: add_llvm_component_library(LLVMSupport]
APInt.cpp		APInt.cpp
APSInt.cpp		APSInt.cpp
ARMBuildAttrs.cpp		ARMBuildAttrs.cpp
ARMAttributeParser.cpp		ARMAttributeParser.cpp
ARMWinEH.cpp		ARMWinEH.cpp
Allocator.cpp		Allocator.cpp
AutoConvert.cpp		AutoConvert.cpp
Base64.cpp		Base64.cpp
		BalancedPartitioning.cpp
BinaryStreamError.cpp		BinaryStreamError.cpp
BinaryStreamReader.cpp		BinaryStreamReader.cpp
BinaryStreamRef.cpp		BinaryStreamRef.cpp
BinaryStreamWriter.cpp		BinaryStreamWriter.cpp
BlockFrequency.cpp		BlockFrequency.cpp
BranchProbability.cpp		BranchProbability.cpp
BuryPointer.cpp		BuryPointer.cpp
CachePruning.cpp		CachePruning.cpp
[199 lines not shown]

llvm/test/tools/llvm-profdata/show-order.proftext

This file was added.

				# RUN: llvm-profdata order %s | FileCheck %s

				# CHECK: a
				# CHECK: b
				# CHECK: c
				# CHECK: x

				# Header
				:ir
				:temporal_prof_traces
				# Num Traces
				3
				# Trace Stream Size:
				3
				# Weight
				1
				a, main.c:b, c
				# Weight
				1
				a, x, main.c:b, c
				# Weight
				1
				a, main.c:b, c

				a
				# Func Hash:
				0x1234
				# Num Counters:
				1
				# Counter Values:
				101

				main.c:b
				0x5678
				1
				202

				c
				0xabcd
				1
				303

				x
				0xefff
				1
				404

llvm/tools/llvm-profdata/llvm-profdata.cpp

[17 lines not shown]
#include "llvm/ProfileData/InstrProfCorrelator.h"		#include "llvm/ProfileData/InstrProfCorrelator.h"
#include "llvm/ProfileData/InstrProfReader.h"		#include "llvm/ProfileData/InstrProfReader.h"
#include "llvm/ProfileData/InstrProfWriter.h"		#include "llvm/ProfileData/InstrProfWriter.h"
#include "llvm/ProfileData/MemProf.h"		#include "llvm/ProfileData/MemProf.h"
#include "llvm/ProfileData/ProfileCommon.h"		#include "llvm/ProfileData/ProfileCommon.h"
#include "llvm/ProfileData/RawMemProfReader.h"		#include "llvm/ProfileData/RawMemProfReader.h"
#include "llvm/ProfileData/SampleProfReader.h"		#include "llvm/ProfileData/SampleProfReader.h"
#include "llvm/ProfileData/SampleProfWriter.h"		#include "llvm/ProfileData/SampleProfWriter.h"
		#include "llvm/Support/BalancedPartitioning.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Discriminator.h"		#include "llvm/Support/Discriminator.h"
#include "llvm/Support/Errc.h"		#include "llvm/Support/Errc.h"
#include "llvm/Support/FileSystem.h"		#include "llvm/Support/FileSystem.h"
#include "llvm/Support/Format.h"		#include "llvm/Support/Format.h"
#include "llvm/Support/FormattedStream.h"		#include "llvm/Support/FormattedStream.h"
#include "llvm/Support/InitLLVM.h"		#include "llvm/Support/InitLLVM.h"
#include "llvm/Support/LLVMDriver.h"		#include "llvm/Support/LLVMDriver.h"
[2,999 lines not shown; enclosing context: static int show_main(int argc, const char *argv[]) {]
if (ProfileKind == sample)		if (ProfileKind == sample)
return showSampleProfile(Filename, ShowCounts, TopNFunctions,		return showSampleProfile(Filename, ShowCounts, TopNFunctions,
ShowAllFunctions, ShowDetailedSummary,		ShowAllFunctions, ShowDetailedSummary,
ShowFunction, ShowProfileSymbolList,		ShowFunction, ShowProfileSymbolList,
ShowSectionInfoOnly, ShowHotFuncList, SFormat, OS);		ShowSectionInfoOnly, ShowHotFuncList, SFormat, OS);
return showMemProfProfile(Filename, ProfiledBinary, SFormat, OS);		return showMemProfProfile(Filename, ProfiledBinary, SFormat, OS);
}		}

		static int order_main(int argc, const char *argv[]) {
		davidxl (Done): Need to document it in commandline guide.
		cl::opt<std::string> Filename(cl::Positional, cl::desc("<profdata-file>"));
		cl::opt<std::string> OutputFilename("output", cl::value_desc("output"),
		cl::init("-"), cl::desc("Output file"));
		cl::alias OutputFilenameA("o", cl::desc("Alias for --output"),
		cl::aliasopt(OutputFilename));
		cl::ParseCommandLineOptions(argc, argv, "LLVM profile data order\n");

		std::error_code EC;
		davidxl (Done): How are the default values selected?
		ellis (author, Done): These values were also empirically found by optimizing a large binary. I was debating leaving out a default value since these values are pretty specific to the binary we tested. Now that I think about it, we can try to derive a similar list by looking at the total number of functions and assign cutoffs linearly.
		davidxl (Done): or add some empirical guidance in the description.
		spupyrev (Done): The assumption here is that most of the traces are of size ~32K. Perhaps we could also add a comment and a clarification to re-consider the values, if the traces are of significantly different sizes?
		raw_fd_ostream OS(OutputFilename.data(), EC, sys::fs::OF_TextWithCRLF);
		if (EC)
		exitWithErrorCode(EC, OutputFilename);
		auto FS = vfs::getRealFileSystem();
		auto ReaderOrErr = InstrProfReader::create(Filename, *FS);
		if (Error E = ReaderOrErr.takeError())
		exitWithError(std::move(E), Filename);

		auto Reader = std::move(ReaderOrErr.get());
		for (auto &I : *Reader) {
		// Read all entries
		(void)I;
		}
		auto &Traces = Reader->getTemporalProfTraces();
		auto Nodes = TemporalProfTraceTy::createBPFunctionNodes(Traces);
		BalancedPartitioningConfig Config;
		BalancedPartitioning BP(Config);
		BP.run(Nodes);

		WithColor::note() << "# Ordered " << Nodes.size() << " functions\n";
		for (auto &N : Nodes) {
		auto FuncName = Reader->getSymtab().getFuncName(N.Id);
		if (FuncName.contains(':')) {
		// GlobalValue::getGlobalIdentifier() prefixes the filename if the symbol
		// is local. This logic will break if there is a colon in the filename,
		// but we cannot use rsplit() because ObjC symbols can have colons.
		auto [Filename, ParsedFuncName] = FuncName.split(':');
		// Emit a comment describing where this symbol came from
		OS << "# " << Filename << "\n";
		FuncName = ParsedFuncName;
		}
		OS << FuncName << "\n";
		}
		return 0;
		}
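
Note on the name parsing above: the comment in `order_main` explains that local symbols arrive as `filename:symbol` and that the split has to happen at the first colon because Objective-C selectors contain colons of their own. A minimal standalone sketch of that rule follows. It uses `std::string_view` rather than `StringRef`, and `splitLocalSymbol` and `checkSplitLocalSymbol` are hypothetical names for illustration, not functions from the patch.

```cpp
#include <cassert>
#include <string_view>
#include <utility>

// Hypothetical helper (not part of the patch): split "file:symbol" at the
// FIRST colon. The file part is empty when the name contains no colon.
std::pair<std::string_view, std::string_view>
splitLocalSymbol(std::string_view Name) {
  auto Pos = Name.find(':');
  if (Pos == std::string_view::npos)
    return {std::string_view(), Name};
  return {Name.substr(0, Pos), Name.substr(Pos + 1)};
}

// Usage check: the Objective-C selector keeps its own colons intact.
inline void checkSplitLocalSymbol() {
  auto [File, Sym] = splitLocalSymbol("main.m:-[Foo bar:baz:]");
  assert(File == "main.m" && Sym == "-[Foo bar:baz:]");
  (void)File;
  (void)Sym;
}
```

Splitting at the last colon instead would cut the selector in half. The trade-off, as the original comment already says, is that a colon inside the filename itself is misparsed.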

int llvm_profdata_main(int argc, char **argvNonConst,		int llvm_profdata_main(int argc, char **argvNonConst,
const llvm::ToolContext &) {		const llvm::ToolContext &) {
const char **argv = const_cast<const char **>(argvNonConst);		const char **argv = const_cast<const char **>(argvNonConst);
InitLLVM X(argc, argv);		InitLLVM X(argc, argv);

StringRef ProgName(sys::path::filename(argv[0]));		StringRef ProgName(sys::path::filename(argv[0]));
if (argc > 1) {		if (argc > 1) {
int (*func)(int, const char *[]) = nullptr;		int (*func)(int, const char *[]) = nullptr;

if (strcmp(argv[1], "merge") == 0)		if (strcmp(argv[1], "merge") == 0)
func = merge_main;		func = merge_main;
else if (strcmp(argv[1], "show") == 0)		else if (strcmp(argv[1], "show") == 0)
func = show_main;		func = show_main;
else if (strcmp(argv[1], "overlap") == 0)		else if (strcmp(argv[1], "overlap") == 0)
func = overlap_main;		func = overlap_main;
		else if (strcmp(argv[1], "order") == 0)
		func = order_main;

if (func) {		if (func) {
std::string Invocation(ProgName.str() + " " + argv[1]);		std::string Invocation(ProgName.str() + " " + argv[1]);
argv[1] = Invocation.c_str();		argv[1] = Invocation.c_str();
return func(argc - 1, argv + 1);		return func(argc - 1, argv + 1);
}		}

if (strcmp(argv[1], "-h") == 0 || strcmp(argv[1], "-help") == 0 ||		if (strcmp(argv[1], "-h") == 0 || strcmp(argv[1], "-help") == 0 ||
Show All 14 Lines	if (argc > 1) {
}		}
}		}

if (argc < 2)		if (argc < 2)
errs() << ProgName << ": No command specified!\n";		errs() << ProgName << ": No command specified!\n";
else		else
errs() << ProgName << ": Unknown command!\n";		errs() << ProgName << ": Unknown command!\n";

errs() << "USAGE: " << ProgName << " <merge|show|overlap> [args...]\n";		errs() << "USAGE: " << ProgName << " <merge|show|overlap|order> [args...]\n";
return 1;		return 1;
}		}

llvm/unittests/Support/BalancedPartitioningTest.cpp

This file was added.

				//===- BalancedPartitioningTest.cpp - BalancedPartitioning tests ----------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/Support/BalancedPartitioning.h"
				#include "llvm/ProfileData/InstrProf.h"
				#include "llvm/Testing/Support/SupportHelpers.h"
				#include "gmock/gmock.h"
				#include "gtest/gtest.h"

				using namespace llvm;
				using testing::Each;
				using testing::Field;
				using testing::Not;
				using testing::UnorderedElementsAre;
				using testing::UnorderedElementsAreArray;

				namespace llvm {

				void PrintTo(const BPFunctionNode &Node, std::ostream *OS) {
				raw_os_ostream ROS(*OS);
				Node.dump(ROS);
				}

				TEST(BPFunctionNodeTest, Basic) {
				auto Nodes = TemporalProfTraceTy::createBPFunctionNodes({
				TemporalProfTraceTy({0, 1, 2, 3, 4}),
				TemporalProfTraceTy({4, 2}),
				});

				auto NodeIs = [](BPFunctionNode::IDT Id,
				ArrayRef<BPFunctionNode::UtilityNodeT> UNs) {
				return AllOf(Field("Id", &BPFunctionNode::Id, Id),
				Field("UtilityNodes", &BPFunctionNode::UtilityNodes,
				UnorderedElementsAreArray(UNs)));
				};

				EXPECT_THAT(Nodes,
				UnorderedElementsAre(NodeIs(0, {0, 1, 2}), NodeIs(1, {1, 2}),
				NodeIs(2, {1, 2, 4, 5}), NodeIs(3, {2}),
				NodeIs(4, {2, 3, 4, 5})));
				}

				class BalancedPartitioningTest : public ::testing::Test {
				protected:
				BalancedPartitioningConfig Config;
				BalancedPartitioning Bp;
				BalancedPartitioningTest() : Bp(Config) {}

				static std::vector<BPFunctionNode::IDT>
				getIds(std::vector<BPFunctionNode> Nodes) {
				std::vector<BPFunctionNode::IDT> Ids;
				for (auto &N : Nodes)
				Ids.push_back(N.Id);
				return Ids;
				}
				};

				TEST_F(BalancedPartitioningTest, Basic) {
				std::vector<BPFunctionNode> Nodes = {
				BPFunctionNode(0, {1, 2}), BPFunctionNode(2, {3, 4}),
				BPFunctionNode(1, {1, 2}), BPFunctionNode(3, {3, 4}),
				BPFunctionNode(4, {4}),
				};

				Bp.run(Nodes);

				auto NodeIs = [](BPFunctionNode::IDT Id, std::optional<uint32_t> Bucket) {
				return AllOf(Field("Id", &BPFunctionNode::Id, Id),
				Field("Bucket", &BPFunctionNode::Bucket, Bucket));
				};

				EXPECT_THAT(Nodes,
				UnorderedElementsAre(NodeIs(0, 0), NodeIs(1, 1), NodeIs(2, 2),
				NodeIs(3, 3), NodeIs(4, 4)));
				}

				TEST_F(BalancedPartitioningTest, Large) {
				const int ProblemSize = 1000;
				std::vector<BPFunctionNode::UtilityNodeT> AllUNs;
				for (int i = 0; i < ProblemSize; i++)
				AllUNs.emplace_back(i);

				std::mt19937 RNG;
				std::vector<BPFunctionNode> Nodes;
				for (int i = 0; i < ProblemSize; i++) {
				std::vector<BPFunctionNode::UtilityNodeT> UNs;
				int SampleSize =
				std::uniform_int_distribution<int>(0, AllUNs.size() - 1)(RNG);
				std::sample(AllUNs.begin(), AllUNs.end(), std::back_inserter(UNs),
				SampleSize, RNG);
				Nodes.emplace_back(i, UNs);
				}

				auto OrigIds = getIds(Nodes);

				Bp.run(Nodes);

				EXPECT_THAT(
				Nodes, Each(Not(Field("Bucket", &BPFunctionNode::Bucket, std::nullopt))));
				EXPECT_THAT(getIds(Nodes), UnorderedElementsAreArray(OrigIds));
				}

				TEST_F(BalancedPartitioningTest, MoveGain) {
				BalancedPartitioning::SignaturesT Signatures = {
				{10, 10, 10.f, 0.f, true}, // 0
				{10, 10, 0.f, 10.f, true}, // 1
				{10, 10, 0.f, 20.f, true}, // 2
				};
				EXPECT_FLOAT_EQ(Bp.moveGain(BPFunctionNode(0, {}), true, Signatures), 0.f);
				EXPECT_FLOAT_EQ(Bp.moveGain(BPFunctionNode(0, {0, 1}), true, Signatures),
				10.f);
				EXPECT_FLOAT_EQ(Bp.moveGain(BPFunctionNode(0, {1, 2}), false, Signatures),
				30.f);
				}

				} // end namespace llvm
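
Note on `BPFunctionNodeTest.Basic`: the expected utility nodes can be reproduced with a simple logarithmic grouping rule. The sketch below is a reconstruction inferred from the test's expected values, not a copy of `createBPFunctionNodes`; the rule itself and the names `groupByTimestamp`, `floorLog2`, `ceilLog2`, `FunctionId` and `UtilityNode` are assumptions. With N = ceil(log2(largest trace length)), the function at 0-based position T of trace K joins groups K*N + I for every I from floor(log2(T + 1)) up to N - 1, so functions that run early in a trace share more groups.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-ins for the patch's ID types.
using FunctionId = std::uint64_t;
using UtilityNode = std::uint32_t;

// floor(log2(X)) for X >= 1, using integer arithmetic only.
static unsigned floorLog2(std::size_t X) {
  assert(X >= 1 && "log2 is undefined for 0");
  unsigned R = 0;
  while (X >>= 1)
    ++R;
  return R;
}

// ceil(log2(X)) for X >= 1.
static unsigned ceilLog2(std::size_t X) {
  return X <= 1 ? 0 : floorLog2(X - 1) + 1;
}

// Assumed grouping rule (see the note above): map each function to the
// utility nodes it belongs to, visiting traces in order.
std::map<FunctionId, std::vector<UtilityNode>>
groupByTimestamp(const std::vector<std::vector<FunctionId>> &Traces) {
  std::size_t Largest = 0;
  for (const auto &Trace : Traces)
    Largest = std::max(Largest, Trace.size());

  std::map<FunctionId, std::vector<UtilityNode>> Groups;
  if (Largest == 0)
    return Groups; // No traces or only empty traces.

  const unsigned N = ceilLog2(Largest); // Groups reserved per trace.
  for (std::size_t K = 0; K < Traces.size(); ++K)
    for (std::size_t T = 0; T < Traces[K].size(); ++T)
      for (unsigned I = floorLog2(T + 1); I < N; ++I)
        Groups[Traces[K][T]].push_back(static_cast<UtilityNode>(K * N + I));
  return Groups;
}
```

For the two traces in the test, `{0, 1, 2, 3, 4}` and `{4, 2}`, N is 3, and this rule yields `{0, 1, 2}` for function 0, `{1, 2}` for function 1, `{1, 2, 4, 5}` for function 2, `{2}` for function 3 and `{2, 3, 4, 5}` for function 4, which matches the `EXPECT_THAT` above.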

llvm/unittests/Support/CMakeLists.txt

	set(LLVM_LINK_COMPONENTS			set(LLVM_LINK_COMPONENTS
				ProfileData
				thakis: This increases the number of files needed to compile SupportTests by 200% (from ~200 to ~600). Is there a way to prevent this dep with some mocking or similar?
				ellis (author): Thanks for flagging! In https://reviews.llvm.org/D152325 I've moved one of these tests to `unittests/ProfileData` so we can remove the dependency.
	Support			Support
	TargetParser			TargetParser
	)			)

	add_llvm_unittest(SupportTests			add_llvm_unittest(SupportTests
	AddressRangeTest.cpp			AddressRangeTest.cpp
	AlignmentTest.cpp			AlignmentTest.cpp
	AlignOfTest.cpp			AlignOfTest.cpp
	AllocatorTest.cpp			AllocatorTest.cpp
	ARMAttributeParser.cpp			ARMAttributeParser.cpp
	ArrayRecyclerTest.cpp			ArrayRecyclerTest.cpp
	Base64Test.cpp			Base64Test.cpp
	BinaryStreamTest.cpp			BinaryStreamTest.cpp
	BLAKE3Test.cpp			BLAKE3Test.cpp
	BlockFrequencyTest.cpp			BlockFrequencyTest.cpp
				BalancedPartitioningTest.cpp
	BranchProbabilityTest.cpp			BranchProbabilityTest.cpp
	CachePruningTest.cpp			CachePruningTest.cpp
	CrashRecoveryTest.cpp			CrashRecoveryTest.cpp
	Casting.cpp			Casting.cpp
	CheckedArithmeticTest.cpp			CheckedArithmeticTest.cpp
	Chrono.cpp			Chrono.cpp
	CommandLineTest.cpp			CommandLineTest.cpp
	CompressionTest.cpp			CompressionTest.cpp


Revision Contents

Diff 528975

llvm/docs/CommandGuide/llvm-profdata.rst

llvm/include/llvm/ProfileData/InstrProf.h

llvm/include/llvm/Support/BalancedPartitioning.h

llvm/lib/ProfileData/InstrProf.cpp

llvm/lib/Support/BalancedPartitioning.cpp

llvm/lib/Support/CMakeLists.txt

llvm/test/tools/llvm-profdata/show-order.proftext

llvm/tools/llvm-profdata/llvm-profdata.cpp

llvm/unittests/Support/BalancedPartitioningTest.cpp

llvm/unittests/Support/CMakeLists.txt
