This is an archive of the discontinued LLVM Phabricator instance.

Differential D93264

[CSSPGO] Introducing distribution factor for pseudo probe.
ClosedPublic

Authored by hoy on Dec 14 2020, 6:39 PM.

Details

Reviewers

wmi
davidxl
wenlei
wlei

Commits

rG3d89b3cbec23: [CSSPGO] Introducing distribution factor for pseudo probe.

Summary

Sample re-annotation is required at LTO time to achieve reasonable post-inline profile quality. However, we have seen that such LTO-time re-annotation degrades profile quality. This is mainly caused by pre-LTO code duplication done by passes such as loop unrolling, jump threading, and indirect call promotion, where samples corresponding to a source location are aggregated multiple times because of the duplicates. This change introduces the concept of a distribution factor for pseudo probes, so that samples can be distributed across duplicated probes, scaled by a factor. We hope that optimizations that duplicate code maintain the block frequency information (BFI) well, since probe distribution factors are calculated from it. Distribution factors are updated at the end of the pre-LTO pipeline to reflect an estimated portion of the real execution count.
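As an illustration of the idea in the summary, the following is a minimal standalone sketch, not the code in this patch. It assumes each duplicated copy of a probe gets a factor equal to its block frequency divided by the summed frequency of all copies of that probe. `ProbeCopy`, `updateDistributionFactors`, and `distributedSamples` are hypothetical names, and the fallback to 1.0 when no frequency is known is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// One copy of a probe after code duplication. BlockFreq stands in for the
// BFI frequency of the block holding this copy (hypothetical type).
struct ProbeCopy {
  uint32_t ProbeId;
  uint64_t BlockFreq;
  double Factor;
};

// Give each copy the share of its probe's total block frequency.
inline void updateDistributionFactors(std::vector<ProbeCopy> &Copies) {
  std::unordered_map<uint32_t, uint64_t> FreqSum;
  for (const ProbeCopy &C : Copies)
    FreqSum[C.ProbeId] += C.BlockFreq;
  for (ProbeCopy &C : Copies) {
    uint64_t Sum = FreqSum[C.ProbeId];
    // Assumption: with no frequency information, keep the full factor.
    C.Factor = Sum == 0 ? 1.0
                        : static_cast<double>(C.BlockFreq) /
                              static_cast<double>(Sum);
  }
}

// Samples attributed to one copy: the probe's samples scaled by its factor.
inline double distributedSamples(uint64_t Samples, const ProbeCopy &C) {
  return static_cast<double>(Samples) * C.Factor;
}
```

With two copies of probe 1 at frequencies 30 and 10, the copies would receive factors 0.75 and 0.25, so 1000 samples for that probe are split 750/250 instead of being counted in full for both copies.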

This change also introduces a pseudo probe verifier that can be run after each IR pass to detect duplicated pseudo probes.

A saturated distribution factor stands for 1.0. A pseudo probe carries a factor with a value in the range 0.0 to 1.0. A 64-bit integral distribution factor field that represents [0.0, 1.0] is associated with each block probe. Unfortunately, this cannot be done for callsite probes because of the size limit of the 32-bit DWARF discriminator, so a 7-bit distribution factor is used for them instead.
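The two encodings can be sketched as follows. The saturated values (the maximum `uint64_t` for block probes, 100 for callsites) come from the `PseudoProbe.h` changes in this revision; the helper names and the rounding behaviour are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Saturated values, as defined in this revision's PseudoProbe.h.
constexpr uint64_t kFullBlockFactor = std::numeric_limits<uint64_t>::max();
constexpr uint32_t kFullCallsiteFactor = 100;

// Hypothetical helper: map [0.0, 1.0] onto the 64-bit block-probe field.
inline uint64_t encodeBlockFactor(double Factor) {
  assert(Factor >= 0.0 && Factor <= 1.0 && "factor out of range");
  // 1.0 must saturate explicitly: 2^64 does not fit in a uint64_t.
  if (Factor >= 1.0)
    return kFullBlockFactor;
  return static_cast<uint64_t>(Factor * static_cast<double>(kFullBlockFactor));
}

inline double decodeBlockFactor(uint64_t Encoded) {
  return static_cast<double>(Encoded) / static_cast<double>(kFullBlockFactor);
}

// Hypothetical helper: callsite factors are whole percentages in [0, 100],
// which fit in the 7 bits available in the discriminator.
inline uint32_t encodeCallsiteFactor(double Factor) {
  assert(Factor >= 0.0 && Factor <= 1.0 && "factor out of range");
  return static_cast<uint32_t>(Factor * kFullCallsiteFactor + 0.5);
}

inline double decodeCallsiteFactor(uint32_t Encoded) {
  return static_cast<double>(Encoded) / kFullCallsiteFactor;
}
```

The callsite form loses precision (one percent granularity), which is the trade-off discussed in the inline comments on `PseudoProbe.h`.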

Changes are also needed in the sample profile inliner to deal with prorated callsite counts. Call sites duplicated by pre-LTO passes, when later inlined at LTO time, should have the callee's probes prorated based on the prelink-computed distribution factors. The distribution factors should also be taken into account when computing hotness for inline candidates. In addition, indirect call promotion results in multiple callsites, and the original samples should be distributed across them. This is handled by adjusting the callsites' distribution factors.
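For the indirect call promotion case, one way to picture the adjustment is the sketch below. It assumes the promoted direct call takes the share of the original factor that matches the target's count, and the remaining indirect call keeps the rest. `SplitFactors` and `splitFactorOnPromotion` are hypothetical names, and the exact formula used in `SampleProfile.cpp` may differ.

```cpp
#include <cassert>
#include <cstdint>

// Factors for the two callsites that exist after promoting one target.
struct SplitFactors {
  double Direct;   // the new direct call to the promoted target
  double Indirect; // the remaining indirect call
};

// Hypothetical helper: split OldFactor by the promoted target's share of
// the callsite's total target count.
inline SplitFactors splitFactorOnPromotion(double OldFactor,
                                           uint64_t TargetCount,
                                           uint64_t TotalCount) {
  assert(TotalCount > 0 && TargetCount <= TotalCount && "bad target counts");
  double Share =
      static_cast<double>(TargetCount) / static_cast<double>(TotalCount);
  return {OldFactor * Share, OldFactor * (1.0 - Share)};
}
```

The two resulting factors add up to the original one, so the samples recorded for the original callsite are not counted twice.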

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

hoy created this revision. Dec 14 2020, 6:39 PM

Herald added subscribers: dexonsmith, wenlei, hiraditya. Dec 14 2020, 6:39 PM

hoy requested review of this revision. Dec 14 2020, 6:39 PM

Herald added projects: Restricted Project, Restricted Project. Dec 14 2020, 6:39 PM

Herald added subscribers: llvm-commits, cfe-commits, jdoerfert.

hoy added a parent revision: D92347: [CSSPGO] Consume pseudo-probe-based AutoFDO profile. Dec 14 2020, 6:40 PM

Harbormaster completed remote builds in B82373: Diff 311763. Dec 14 2020, 6:40 PM

hoy retitled this revision from "[CSSPGO] Introducing distribution factor for pseudo probe" to "[CSSPGO] Introducing distribution factor for pseudo probe.". Dec 14 2020, 6:46 PM

hoy edited the summary of this revision.

hoy added reviewers: wmi, davidxl, wenlei, wlei. Dec 15 2020, 9:47 AM

Rebasing.

Harbormaster completed remote builds in B82707: Diff 312339. Dec 16 2020, 5:59 PM

wmi added inline comments. Jan 12 2021, 11:07 AM

llvm/include/llvm/IR/PseudoProbe.h
41	The bits in the discriminator are a scarce resource. Have you considered using fewer bits to represent the probe distribution factor? I guess it is possible that a slightly coarser-grained distribution factor won't affect performance.
llvm/include/llvm/Passes/StandardInstrumentations.h
273	Before the PseudoProbeUpdate pass, there is no need to verify, because PseudoProbeUpdate will make the distribution factors consistent. PseudoProbeUpdate runs at a late stage in the LTO/ThinLTO prelink pipeline, and not many passes need the verification, so what is the major use of PseudoProbeVerifier?
llvm/lib/Transforms/IPO/SampleProfileProbe.cpp
133–134	Why not issue a warning/error message when verification fails? That would make it possible to enable the verification in a release compiler.

hoy added inline comments. Jan 12 2021, 12:25 PM

llvm/include/llvm/IR/PseudoProbe.h
41	That's a good point. We are using seven bits to represent [0, 100] so that integral numbers can be distinguished. Yes, we could use fewer bits, say 4 bits to represent only even numbers. We could also use no bits here and instead use the distribution factor of the outer block probes when competition for those bits is high. I can do an experiment to see how well that works.
llvm/include/llvm/Passes/StandardInstrumentations.h
273	Yeah, there's no need to verify intermediate passes. The verifier pass is just a handy debugging utility that tracks the passes that duplicate code. Perhaps I should give it a better name like PseudoCloningTracker?
llvm/lib/Transforms/IPO/SampleProfileProbe.cpp
133–134	The verifier is for debugging only. It doesn't really do any verification. It just helps to track code duplication. Sorry for the naming confusion.

hoy added inline comments. Jan 13 2021, 10:12 PM

llvm/include/llvm/IR/PseudoProbe.h
41	On second thought, using the distribution factor of block probes for a call probe may not work well, since a callsite may be surrounded by more than one block probe. We could also use fewer bits, like 6 bits to encode even numbers in the range [0, 100], or 5 bits to encode multiples of 3 in [0, 100]. I did a profile quality measurement with the even-number encoding. It's OK overall except for two SPEC benchmarks. I guess it's a trade-off we'll have to take when there's competition for those bits.
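The even-number encoding mentioned in this comment could look roughly like the sketch below. There are 51 even values in [0, 100], so the code fits in 6 bits. The function names and the choice to round odd percentages up are assumptions for illustration; the revision as posted keeps the 7-bit encoding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 6-bit encoding: store Percent / 2, rounding odd values up.
inline uint32_t encodeEvenPercent(uint32_t Percent) {
  assert(Percent <= 100 && "percentage out of range");
  uint32_t Code = (Percent + 1) / 2; // 0..50
  assert(Code < (1u << 6) && "code must fit in 6 bits");
  return Code;
}

// Recover the (even) percentage from the 6-bit code.
inline uint32_t decodeEvenPercent(uint32_t Code) {
  assert(Code <= 50 && "code out of range");
  return Code * 2;
}
```

A factor of 37% would be stored as 38%, which is the kind of precision loss the profile quality measurement in the comment was checking.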

Adding support in the priority-based inliner.

Harbormaster completed remote builds in B85120: Diff 316567. Jan 13 2021, 10:14 PM

hoy edited the summary of this revision. Jan 13 2021, 10:16 PM

wmi added inline comments. Jan 13 2021, 10:40 PM

llvm/include/llvm/IR/PseudoProbe.h
41	Could you elaborate a little on the case where a callsite is surrounded by more than one block probe? Is it because of basic block merging, as in CFG simplification?

hoy added inline comments. Jan 13 2021, 10:58 PM

llvm/include/llvm/IR/PseudoProbe.h
41	Yes, block merging in CFG simplification is a good example. Inlining can also end up with callee code and caller code in one block. Jump threading or other CFG optimizations that convert a conditional jump into an unconditional jump can result in block merging too. So far, our way to track block weight for blocks with multiple probes is to take the maximum count of those probes. When it comes to tracking callsite count, it is handy and accurate to attach a dedicated distribution factor to each individual call. For example, when a call is inlined, the inlinee's probes will be cloned into the caller, and they will be prorated based on the callsite's dedicated distribution factor.
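The two rules in this reply can be sketched in a few lines. This is a standalone illustration with hypothetical helper names, not the code in `SampleProfile.cpp`: a block holding several probes takes the maximum probe count as its weight, and a probe cloned from an inlinee is scaled by the factor of the callsite it was inlined through.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Weight of a block that holds several probes: the maximum probe count.
inline uint64_t blockWeight(const std::vector<uint64_t> &ProbeCounts) {
  uint64_t Weight = 0;
  for (uint64_t Count : ProbeCounts)
    Weight = std::max(Weight, Count);
  return Weight;
}

// Factor of an inlinee probe after it is cloned into the caller: its own
// factor scaled by the callsite's dedicated factor.
inline double prorateInlinedFactor(double InlineeFactor,
                                   double CallsiteFactor) {
  return InlineeFactor * CallsiteFactor;
}
```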

wmi added inline comments. Jan 14 2021, 11:11 AM

llvm/include/llvm/IR/PseudoProbe.h
41	Actually, I think we may be able to extend Discriminator and PseudoProbeDwarfDiscriminator. To emit the Discriminator into DWARF, we need to follow the DWARF standard on how many bits the Discriminator occupies. But inside the compiler, the Discriminator is represented as metadata, so it doesn't have to be 32 bits. For example, we can extend the Discriminator metadata to 64 bits or even larger and specify that only the lower 32 bits will actually be emitted into the DWARF section. Intermediate information like distribution factors can go into the higher bits.
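The proposal in this comment was not implemented in this revision. Purely as an illustration of the idea, a widened in-compiler discriminator might keep the DWARF-visible value in the low 32 bits and intermediate data in the high bits. All names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit in-compiler discriminator:
//   [31:0]  - the value that would be emitted into DWARF
//   [63:32] - intermediate data such as a distribution factor
inline uint64_t makeExtendedDiscriminator(uint32_t DwarfBits,
                                          uint32_t FactorBits) {
  return (static_cast<uint64_t>(FactorBits) << 32) | DwarfBits;
}

// Only the lower 32 bits would reach the DWARF section.
inline uint32_t dwarfDiscriminator(uint64_t Extended) {
  return static_cast<uint32_t>(Extended & 0xFFFFFFFFu);
}

// The higher bits stay inside the compiler.
inline uint32_t intermediateFactor(uint64_t Extended) {
  return static_cast<uint32_t>(Extended >> 32);
}
```

Since the high bits never leave the compiler, this layout would not change the binary or the profile format, which matches the backward-compatibility point raised later in the thread.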

hoy added inline comments. Jan 14 2021, 11:49 AM

llvm/include/llvm/IR/PseudoProbe.h
41	That's a good idea, I like that. Actually, we thought about that in the past, and our concern was memory cost, since the discriminator field in `DILexicalBlockFile` metadata is not optional. It is probably OK for pseudo probes, since discriminators are only used for callsites. It might be a problem with -fdebug-info-for-profiling, where discriminators can be used more often. It sounds to me like extending the size of the discriminator is desirable for pseudo probes and potentially FS-AFDO. It might be worth evaluating the cost at some point. What do you think?

wmi added inline comments. Jan 14 2021, 10:58 PM

llvm/include/llvm/IR/PseudoProbe.h
41	Yes, it is worth evaluating the cost. It only concerns intermediate data in the compiler and won't affect the binary or the profile output, so it won't introduce a backward compatibility issue. I think it is up to you whether to evaluate it now or later.
llvm/lib/IR/PseudoProbe.cpp
65	Add assertion message.
llvm/lib/Transforms/IPO/SampleProfile.cpp
363	Will CallsiteCount be the count before or after being prorated if CallsiteDistribution is not 1.0?

hoy added inline comments. Jan 14 2021, 11:15 PM

llvm/include/llvm/IR/PseudoProbe.h
41	Sounds good. Will do a measurement for both -fpseudo-probe-for-profiling and -fdebug-info-for-profiling later.
llvm/lib/IR/PseudoProbe.cpp
65	Will do.
llvm/lib/Transforms/IPO/SampleProfile.cpp
363	It is the count after being prorated. The prorated count will be used to guide inlining. For example, if a callsite is duplicated in LTO prelink, then in LTO postlink the two copies will get their own distribution factors, and their prorated counts are used to decide independently whether each should be inlined.
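A small sketch of how a prorated count could drive the inlining decision described here. The helper names, the rounding, and the threshold comparison are assumptions for illustration; they are not the inliner's actual interface.

```cpp
#include <cassert>
#include <cstdint>

// Scale the raw callsite count by the callsite's distribution factor.
inline uint64_t proratedCallsiteCount(uint64_t RawCount, double Factor) {
  assert(Factor >= 0.0 && Factor <= 1.0 && "factor out of range");
  return static_cast<uint64_t>(static_cast<double>(RawCount) * Factor + 0.5);
}

// Hypothetical hotness test: compare the prorated count with a threshold.
inline bool isHotCallsite(uint64_t RawCount, double Factor,
                          uint64_t HotThreshold) {
  return proratedCallsiteCount(RawCount, Factor) >= HotThreshold;
}
```

With a raw count of 1000 and a threshold of 500, a copy carrying factor 0.9 would be treated as hot while its sibling carrying factor 0.1 would not, so the two copies are judged independently.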

LGTM. Thanks.

llvm/lib/Transforms/IPO/SampleProfile.cpp
363	Ok, better comment it.

This revision is now accepted and ready to land. Jan 15 2021, 9:53 AM

Addressing Wei's feedback.

Also prorating indirect callsite target count annotation.

Harbormaster completed remote builds in B85796: Diff 317726. Jan 19 2021, 4:53 PM

hoy added a child revision: D95056: [CSSPGO] LTO option for pseudo probe. Jan 20 2021, 9:21 AM

hoy removed a child revision: D95056: [CSSPGO] LTO option for pseudo probe. Jan 20 2021, 9:24 AM

Rebasing.

Harbormaster completed remote builds in B87545: Diff 320822. Feb 2 2021, 9:33 AM

Rebasing.

Harbormaster completed remote builds in B87553: Diff 320844. Feb 2 2021, 11:03 AM

This revision was landed with ongoing or failed builds. Feb 2 2021, 11:55 AM

Closed by commit rG3d89b3cbec23: [CSSPGO] Introducing distribution factor for pseudo probe. (authored by hoy).

This revision was automatically updated to reflect the committed changes.

hoy added a commit: rG3d89b3cbec23: [CSSPGO] Introducing distribution factor for pseudo probe.

probinson mentioned this in D96354: Avoid conflicts between debug-info and pseudo-probe profiling. Feb 9 2021, 9:50 AM

probinson mentioned this in rG5ea2d4fa4811: Avoid conflicts between debug-info and pseudo-probe profiling. Feb 10 2021, 7:35 AM

Revision Contents

Path	Size
clang/test/CodeGen/pseudo-probe-emit.c	8 lines
llvm/include/llvm/IR/IntrinsicInst.h	8 lines
llvm/include/llvm/IR/Intrinsics.td	2 lines
llvm/include/llvm/IR/PseudoProbe.h	27 lines
llvm/include/llvm/Passes/StandardInstrumentations.h	2 lines
llvm/include/llvm/Transforms/IPO/SampleProfileProbe.h	41 lines
llvm/lib/IR/PseudoProbe.cpp	40 lines
llvm/lib/Passes/PassBuilder.cpp	6 lines
llvm/lib/Passes/PassRegistry.def	1 line
llvm/lib/Passes/StandardInstrumentations.cpp	1 line
llvm/lib/Transforms/IPO/SampleProfile.cpp	62 lines
llvm/lib/Transforms/IPO/SampleProfileProbe.cpp	162 lines
llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-update.prof	8 lines
llvm/test/Transforms/SampleProfile/pseudo-probe-emit-inline.ll	20 lines
llvm/test/Transforms/SampleProfile/pseudo-probe-emit.ll	22 lines
llvm/test/Transforms/SampleProfile/pseudo-probe-profile.ll	42 lines
llvm/test/Transforms/SampleProfile/pseudo-probe-update.ll	45 lines
llvm/test/Transforms/SampleProfile/pseudo-probe-verify.ll	77 lines

Diff 316567

clang/test/CodeGen/pseudo-probe-emit.c

	// RUN: %clang -O2 -fexperimental-new-pass-manager -fpseudo-probe-for-profiling -g -emit-llvm -S -o - %s | FileCheck %s			// RUN: %clang -O2 -fexperimental-new-pass-manager -fpseudo-probe-for-profiling -g -emit-llvm -S -o - %s | FileCheck %s

	// Check the generation of pseudoprobe intrinsic call			// Check the generation of pseudoprobe intrinsic call

	void bar();			void bar();
	void go();			void go();

	void foo(int x) {			void foo(int x) {
	// CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 1, i32 0)			// CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 1, i32 0, i64 -1)
	if (x == 0)			if (x == 0)
	// CHECK: call void @llvm.pseudoprobe(i64 [[#GUID]], i64 2, i32 0)			// CHECK: call void @llvm.pseudoprobe(i64 [[#GUID]], i64 2, i32 0, i64 -1)
	bar();			bar();
	else			else
	// CHECK: call void @llvm.pseudoprobe(i64 [[#GUID]], i64 3, i32 0)			// CHECK: call void @llvm.pseudoprobe(i64 [[#GUID]], i64 3, i32 0, i64 -1)
	go();			go();
	// CHECK: call void @llvm.pseudoprobe(i64 [[#GUID]], i64 4, i32 0)			// CHECK: call void @llvm.pseudoprobe(i64 [[#GUID]], i64 4, i32 0, i64 -1)
	}			}

llvm/include/llvm/IR/IntrinsicInst.h

Show First 20 Lines • Show All 975 Lines • ▼ Show 20 Lines	public:
static bool classof(const Value *V) {		static bool classof(const Value *V) {
return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));		return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
}		}

ConstantInt *getFuncGuid() const {		ConstantInt *getFuncGuid() const {
return cast<ConstantInt>(const_cast<Value *>(getArgOperand(0)));		return cast<ConstantInt>(const_cast<Value *>(getArgOperand(0)));
}		}

		ConstantInt *getIndex() const {
		return cast<ConstantInt>(const_cast<Value *>(getArgOperand(1)));
		}

ConstantInt *getAttributes() const {		ConstantInt *getAttributes() const {
return cast<ConstantInt>(const_cast<Value *>(getArgOperand(2)));		return cast<ConstantInt>(const_cast<Value *>(getArgOperand(2)));
}		}

ConstantInt *getIndex() const {		ConstantInt *getFactor() const {
return cast<ConstantInt>(const_cast<Value *>(getArgOperand(1)));		return cast<ConstantInt>(const_cast<Value *>(getArgOperand(3)));
}		}
};		};
} // end namespace llvm		} // end namespace llvm

#endif // LLVM_IR_INTRINSICINST_H		#endif // LLVM_IR_INTRINSICINST_H

llvm/include/llvm/IR/Intrinsics.td

	Show First 20 Lines • Show All 1,282 Lines • ▼ Show 20 Lines
	// that they are not removed even if they turn out to be empty, for languages			// that they are not removed even if they turn out to be empty, for languages
	// which specify that infinite loops must be preserved.			// which specify that infinite loops must be preserved.
	def int_sideeffect : DefaultAttrsIntrinsic<[], [], [IntrInaccessibleMemOnly, IntrWillReturn]>;			def int_sideeffect : DefaultAttrsIntrinsic<[], [], [IntrInaccessibleMemOnly, IntrWillReturn]>;

	// The pseudoprobe intrinsic works as a place holder to the block it probes.			// The pseudoprobe intrinsic works as a place holder to the block it probes.
	// Like the sideeffect intrinsic defined above, this intrinsic is treated by the			// Like the sideeffect intrinsic defined above, this intrinsic is treated by the
	// optimizer as having opaque side effects so that it won't be get rid of or moved			// optimizer as having opaque side effects so that it won't be get rid of or moved
	// out of the block it probes.			// out of the block it probes.
	def int_pseudoprobe : Intrinsic<[], [llvm_i64_ty, llvm_i64_ty, llvm_i32_ty],			def int_pseudoprobe : Intrinsic<[], [llvm_i64_ty, llvm_i64_ty, llvm_i32_ty, llvm_i64_ty],
	[IntrInaccessibleMemOnly, IntrWillReturn]>;			[IntrInaccessibleMemOnly, IntrWillReturn]>;

	// Intrinsics to support half precision floating point format			// Intrinsics to support half precision floating point format
	let IntrProperties = [IntrNoMem, IntrWillReturn] in {			let IntrProperties = [IntrNoMem, IntrWillReturn] in {
	def int_convert_to_fp16 : DefaultAttrsIntrinsic<[llvm_i16_ty], [llvm_anyfloat_ty]>;			def int_convert_to_fp16 : DefaultAttrsIntrinsic<[llvm_i16_ty], [llvm_anyfloat_ty]>;
	def int_convert_from_fp16 : DefaultAttrsIntrinsic<[llvm_anyfloat_ty], [llvm_i16_ty]>;			def int_convert_from_fp16 : DefaultAttrsIntrinsic<[llvm_anyfloat_ty], [llvm_i16_ty]>;
	}			}

	▲ Show 20 Lines • Show All 357 Lines • Show Last 20 Lines

llvm/include/llvm/IR/PseudoProbe.h

	Show All 10 Lines
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_IR_PSEUDOPROBE_H			#ifndef LLVM_IR_PSEUDOPROBE_H
	#define LLVM_IR_PSEUDOPROBE_H			#define LLVM_IR_PSEUDOPROBE_H

	#include "llvm/ADT/Optional.h"			#include "llvm/ADT/Optional.h"
	#include <cassert>			#include <cassert>
	#include <cstdint>			#include <cstdint>
				#include <limits>

	namespace llvm {			namespace llvm {

	class Instruction;			class Instruction;
				class BasicBlock;

	constexpr const char *PseudoProbeDescMetadataName = "llvm.pseudo_probe_desc";			constexpr const char *PseudoProbeDescMetadataName = "llvm.pseudo_probe_desc";

	enum class PseudoProbeType { Block = 0, IndirectCall, DirectCall };			enum class PseudoProbeType { Block = 0, IndirectCall, DirectCall };

				// The saturated distrution factor representing 100% for block probes.
				constexpr static uint64_t PseudoProbeFullDistributionFactor =
				std::numeric_limits<uint64_t>::max();

	struct PseudoProbeDwarfDiscriminator {			struct PseudoProbeDwarfDiscriminator {
				public:
	// The following APIs encodes/decodes per-probe information to/from a			// The following APIs encodes/decodes per-probe information to/from a
	// 32-bit integer which is organized as:			// 32-bit integer which is organized as:
	// [2:0] - 0x7, this is reserved for regular discriminator,			// [2:0] - 0x7, this is reserved for regular discriminator,
	// see DWARF discriminator encoding rule			// see DWARF discriminator encoding rule
	// [18:3] - probe id			// [18:3] - probe id
	// [25:19] - reserved			// [25:19] - probe distribution factor
	// [28:26] - probe type, see PseudoProbeType			// [28:26] - probe type, see PseudoProbeType
	// [31:29] - reserved for probe attributes			// [31:29] - reserved for probe attributes
	static uint32_t packProbeData(uint32_t Index, uint32_t Type) {			static uint32_t packProbeData(uint32_t Index, uint32_t Type, uint32_t Flags,
				uint32_t Factor) {
	assert(Index <= 0xFFFF && "Probe index too big to encode, exceeding 2^16");			assert(Index <= 0xFFFF && "Probe index too big to encode, exceeding 2^16");
	assert(Type <= 0x7 && "Probe type too big to encode, exceeding 7");			assert(Type <= 0x7 && "Probe type too big to encode, exceeding 7");
	return (Index << 3) | (Type << 26) | 0x7;			assert(Flags <= 0x7);
				assert(Factor <= 100 &&
				"Probe distribution factor too big to encode, exceeding 100");
				return (Index << 3) | (Factor << 19) | (Type << 26) | 0x7;
	}			}

	static uint32_t extractProbeIndex(uint32_t Value) {			static uint32_t extractProbeIndex(uint32_t Value) {
	return (Value >> 3) & 0xFFFF;			return (Value >> 3) & 0xFFFF;
	}			}

	static uint32_t extractProbeType(uint32_t Value) {			static uint32_t extractProbeType(uint32_t Value) {
	return (Value >> 26) & 0x7;			return (Value >> 26) & 0x7;
	}			}

	static uint32_t extractProbeAttributes(uint32_t Value) {			static uint32_t extractProbeAttributes(uint32_t Value) {
	return (Value >> 29) & 0x7;			return (Value >> 29) & 0x7;
	}			}

				static uint32_t extractProbeFactor(uint32_t Value) {
				return (Value >> 19) & 0x7F;
				}

				// The saturated distrution factor representing 100% for callsites.
				constexpr static uint8_t FullDistributionFactor = 100;
	};			};

	struct PseudoProbe {			struct PseudoProbe {
	uint32_t Id;			uint32_t Id;
	uint32_t Type;			uint32_t Type;
	uint32_t Attr;			uint32_t Attr;
				float Factor;
	};			};

	Optional<PseudoProbe> extractProbe(const Instruction &Inst);			Optional<PseudoProbe> extractProbe(const Instruction &Inst);

				void setProbeDistributionFactor(Instruction &Inst, float Factor);

	} // end namespace llvm			} // end namespace llvm

	#endif // LLVM_IR_PSEUDOPROBE_H			#endif // LLVM_IR_PSEUDOPROBE_H
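
The discriminator layout in the header above can be exercised with a standalone round trip. The sketch below copies the bit positions from the comment in `PseudoProbeDwarfDiscriminator` ([2:0] fixed to 0x7, [18:3] probe id, [25:19] distribution factor, [28:26] probe type); the function names are shortened stand-ins, not the LLVM API, and the attribute bits are left out for brevity.

```cpp
#include <cassert>
#include <cstdint>

// Pack index, type, and factor using the bit positions from the header.
inline uint32_t packProbe(uint32_t Index, uint32_t Type, uint32_t Factor) {
  assert(Index <= 0xFFFF && "probe index needs more than 16 bits");
  assert(Type <= 0x7 && "probe type needs more than 3 bits");
  assert(Factor <= 100 && "callsite factor is a percentage");
  return (Index << 3) | (Factor << 19) | (Type << 26) | 0x7;
}

inline uint32_t probeIndex(uint32_t Value) { return (Value >> 3) & 0xFFFF; }
inline uint32_t probeFactor(uint32_t Value) { return (Value >> 19) & 0x7F; }
inline uint32_t probeType(uint32_t Value) { return (Value >> 26) & 0x7; }
```

Because the factor is capped at 100, it always fits in the 7-bit field, and the low three bits stay at 0x7 so the value is still recognizable as a pseudo probe discriminator.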

llvm/include/llvm/Passes/StandardInstrumentations.h

	Show All 16 Lines

	#include "llvm/ADT/SmallVector.h"			#include "llvm/ADT/SmallVector.h"
	#include "llvm/ADT/StringRef.h"			#include "llvm/ADT/StringRef.h"
	#include "llvm/IR/BasicBlock.h"			#include "llvm/IR/BasicBlock.h"
	#include "llvm/IR/OptBisect.h"			#include "llvm/IR/OptBisect.h"
	#include "llvm/IR/PassTimingInfo.h"			#include "llvm/IR/PassTimingInfo.h"
	#include "llvm/IR/ValueHandle.h"			#include "llvm/IR/ValueHandle.h"
	#include "llvm/Support/CommandLine.h"			#include "llvm/Support/CommandLine.h"
				#include "llvm/Transforms/IPO/SampleProfileProbe.h"

	#include <string>			#include <string>
	#include <utility>			#include <utility>

	namespace llvm {			namespace llvm {

	class Module;			class Module;
	class Function;			class Function;
	▲ Show 20 Lines • Show All 231 Lines • ▼ Show 20 Lines
	class StandardInstrumentations {			class StandardInstrumentations {
	PrintIRInstrumentation PrintIR;			PrintIRInstrumentation PrintIR;
	PrintPassInstrumentation PrintPass;			PrintPassInstrumentation PrintPass;
	TimePassesHandler TimePasses;			TimePassesHandler TimePasses;
	OptNoneInstrumentation OptNone;			OptNoneInstrumentation OptNone;
	OptBisectInstrumentation OptBisect;			OptBisectInstrumentation OptBisect;
	PreservedCFGCheckerInstrumentation PreservedCFGChecker;			PreservedCFGCheckerInstrumentation PreservedCFGChecker;
	IRChangedPrinter PrintChangedIR;			IRChangedPrinter PrintChangedIR;
				PseudoProbeVerifier PseudoProbeVerification;
	VerifyInstrumentation Verify;			VerifyInstrumentation Verify;

	bool VerifyEach;			bool VerifyEach;

	public:			public:
	StandardInstrumentations(bool DebugLogging, bool VerifyEach = false)			StandardInstrumentations(bool DebugLogging, bool VerifyEach = false)
	: PrintPass(DebugLogging), OptNone(DebugLogging), Verify(DebugLogging),			: PrintPass(DebugLogging), OptNone(DebugLogging), Verify(DebugLogging),
	VerifyEach(VerifyEach) {}			VerifyEach(VerifyEach) {}
	Show All 12 Lines

llvm/include/llvm/Transforms/IPO/SampleProfileProbe.h

	Show All 10 Lines
	/// AutoFDO.			/// AutoFDO.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_TRANSFORMS_IPO_SAMPLEPROFILEPROBE_H			#ifndef LLVM_TRANSFORMS_IPO_SAMPLEPROFILEPROBE_H
	#define LLVM_TRANSFORMS_IPO_SAMPLEPROFILEPROBE_H			#define LLVM_TRANSFORMS_IPO_SAMPLEPROFILEPROBE_H

	#include "llvm/ADT/DenseMap.h"			#include "llvm/ADT/DenseMap.h"
				#include "llvm/Analysis/CallGraphSCCPass.h"
				#include "llvm/Analysis/LazyCallGraph.h"
				#include "llvm/Analysis/LoopInfo.h"
				#include "llvm/IR/PassInstrumentation.h"
	#include "llvm/IR/PassManager.h"			#include "llvm/IR/PassManager.h"
	#include "llvm/IR/PseudoProbe.h"			#include "llvm/IR/PseudoProbe.h"
	#include "llvm/ProfileData/SampleProf.h"			#include "llvm/ProfileData/SampleProf.h"
	#include "llvm/Target/TargetMachine.h"			#include "llvm/Target/TargetMachine.h"
	#include <unordered_map>			#include <unordered_map>

	namespace llvm {			namespace llvm {

	class Module;			class Module;

	using namespace sampleprof;			using namespace sampleprof;
	using BlockIdMap = std::unordered_map<BasicBlock *, uint32_t>;			using BlockIdMap = std::unordered_map<BasicBlock *, uint32_t>;
	using InstructionIdMap = std::unordered_map<Instruction *, uint32_t>;			using InstructionIdMap = std::unordered_map<Instruction *, uint32_t>;
				using ProbeFactorMap = std::unordered_map<uint64_t, float>;
				using FuncProbeFactorMap = StringMap<ProbeFactorMap>;

	enum class PseudoProbeReservedId { Invalid = 0, Last = Invalid };			enum class PseudoProbeReservedId { Invalid = 0, Last = Invalid };

	class PseudoProbeDescriptor {			class PseudoProbeDescriptor {
	uint64_t FunctionGUID;			uint64_t FunctionGUID;
	uint64_t FunctionHash;			uint64_t FunctionHash;

	public:			public:
	PseudoProbeDescriptor(uint64_t GUID, uint64_t Hash)			PseudoProbeDescriptor(uint64_t GUID, uint64_t Hash)
	: FunctionGUID(GUID), FunctionHash(Hash) {}			: FunctionGUID(GUID), FunctionHash(Hash) {}
	uint64_t getFunctionGUID() const { return FunctionGUID; }			uint64_t getFunctionGUID() const { return FunctionGUID; }
	uint64_t getFunctionHash() const { return FunctionHash; }			uint64_t getFunctionHash() const { return FunctionHash; }
	};			};

				// A pseudo probe verifier that can be run after each IR passes to detect the
				// violation of updating probe factors. In principle, the sum of distribution
				// factor for a probe should be identical before and after a pass. For a
				// function pass, the factor sum for a probe would be typically 100%.
				class PseudoProbeVerifier {
				public:
				void registerCallbacks(PassInstrumentationCallbacks &PIC);

				// Implementation of pass instrumentation callbacks for new pass manager.
				void runAfterPass(StringRef PassID, Any IR);

				private:
				// Allow a little bias due the rounding to integral factors.
				constexpr static float DistributionFactorVariance = 0.02;
				// Distribution factors from last pass.
				FuncProbeFactorMap FunctionProbeFactors;

				void collectProbeFactors(const BasicBlock *BB, ProbeFactorMap &ProbeFactors);
				void runAfterPass(const Module *M);
				void runAfterPass(const LazyCallGraph::SCC *C);
				void runAfterPass(const Function *F);
				void runAfterPass(const Loop *L);
				bool shouldVerifyFunction(const Function *F);
				void verifyProbeFactors(const Function *F,
				const ProbeFactorMap &ProbeFactors);
				};

	// This class serves sample counts correlation for SampleProfileLoader by			// This class serves sample counts correlation for SampleProfileLoader by
	// analyzing pseudo probes and their function descriptors injected by			// analyzing pseudo probes and their function descriptors injected by
	// SampleProfileProber.			// SampleProfileProber.
	class PseudoProbeManager {			class PseudoProbeManager {
	DenseMap<uint64_t, PseudoProbeDescriptor> GUIDToProbeDescMap;			DenseMap<uint64_t, PseudoProbeDescriptor> GUIDToProbeDescMap;

	const PseudoProbeDescriptor *getDesc(const Function &F) const;			const PseudoProbeDescriptor *getDesc(const Function &F) const;

	[... 43 lines not shown ...]
	class SampleProfileProbePass : public PassInfoMixin<SampleProfileProbePass> {			class SampleProfileProbePass : public PassInfoMixin<SampleProfileProbePass> {
	TargetMachine *TM;			TargetMachine *TM;

	public:			public:
	SampleProfileProbePass(TargetMachine *TM) : TM(TM) {}			SampleProfileProbePass(TargetMachine *TM) : TM(TM) {}
	PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);			PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
	};			};

				class PseudoProbeUpdatePass : public PassInfoMixin<PseudoProbeUpdatePass> {
				void runOnFunction(Function &F, FunctionAnalysisManager &FAM);

				public:
				PseudoProbeUpdatePass() {}
				PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
				};

	} // end namespace llvm			} // end namespace llvm
	#endif // LLVM_TRANSFORMS_IPO_SAMPLEPROFILEPROBE_H			#endif // LLVM_TRANSFORMS_IPO_SAMPLEPROFILEPROBE_H

llvm/lib/IR/PseudoProbe.cpp

Show All 29 Lines	if (const DebugLoc &DLoc = Inst.getDebugLoc()) {
if (DILocation::isPseudoProbeDiscriminator(Discriminator)) {		if (DILocation::isPseudoProbeDiscriminator(Discriminator)) {
PseudoProbe Probe;		PseudoProbe Probe;
Probe.Id =		Probe.Id =
PseudoProbeDwarfDiscriminator::extractProbeIndex(Discriminator);		PseudoProbeDwarfDiscriminator::extractProbeIndex(Discriminator);
Probe.Type =		Probe.Type =
PseudoProbeDwarfDiscriminator::extractProbeType(Discriminator);		PseudoProbeDwarfDiscriminator::extractProbeType(Discriminator);
Probe.Attr =		Probe.Attr =
PseudoProbeDwarfDiscriminator::extractProbeAttributes(Discriminator);		PseudoProbeDwarfDiscriminator::extractProbeAttributes(Discriminator);
		Probe.Factor =
		PseudoProbeDwarfDiscriminator::extractProbeFactor(Discriminator) /
		(float)PseudoProbeDwarfDiscriminator::FullDistributionFactor;
return Probe;		return Probe;
}		}
}		}
return None;		return None;
}		}

Optional<PseudoProbe> extractProbe(const Instruction &Inst) {		Optional<PseudoProbe> extractProbe(const Instruction &Inst) {
if (const auto *II = dyn_cast<PseudoProbeInst>(&Inst)) {		if (const auto *II = dyn_cast<PseudoProbeInst>(&Inst)) {
PseudoProbe Probe;		PseudoProbe Probe;
Probe.Id = II->getIndex()->getZExtValue();		Probe.Id = II->getIndex()->getZExtValue();
Probe.Type = (uint32_t)PseudoProbeType::Block;		Probe.Type = (uint32_t)PseudoProbeType::Block;
Probe.Attr = II->getAttributes()->getZExtValue();		Probe.Attr = II->getAttributes()->getZExtValue();
		Probe.Factor = II->getFactor()->getZExtValue() /
		(float)PseudoProbeFullDistributionFactor;
return Probe;		return Probe;
}		}

if (isa<CallBase>(&Inst) && !isa<IntrinsicInst>(&Inst))		if (isa<CallBase>(&Inst) && !isa<IntrinsicInst>(&Inst))
return extractProbeFromDiscriminator(Inst);		return extractProbeFromDiscriminator(Inst);

return None;		return None;
}		}

		void setProbeDistributionFactor(Instruction &Inst, float Factor) {
		assert(Factor <= 1);
		wmi: Add assertion message.
		hoy: Will do.
		if (auto *II = dyn_cast<PseudoProbeInst>(&Inst)) {
		IRBuilder<> Builder(&Inst);
		uint64_t IntFactor = PseudoProbeFullDistributionFactor;
		if (Factor < 1)
		IntFactor *= Factor;
		auto OrigFactor = II->getFactor()->getZExtValue();
		if (IntFactor != OrigFactor)
		II->replaceUsesOfWith(II->getFactor(), Builder.getInt64(IntFactor));
		} else if (isa<CallBase>(&Inst) && !isa<IntrinsicInst>(&Inst)) {
		if (const DebugLoc &DLoc = Inst.getDebugLoc()) {
		const DILocation *DIL = DLoc;
		auto Discriminator = DIL->getDiscriminator();
		if (DILocation::isPseudoProbeDiscriminator(Discriminator)) {
		auto Index =
		PseudoProbeDwarfDiscriminator::extractProbeIndex(Discriminator);
		auto Type =
		PseudoProbeDwarfDiscriminator::extractProbeType(Discriminator);
		auto Attr = PseudoProbeDwarfDiscriminator::extractProbeAttributes(
		Discriminator);
		// Round small factors to 0 to avoid over-counting.
		uint32_t IntFactor =
		PseudoProbeDwarfDiscriminator::FullDistributionFactor;
		if (Factor < 1)
		IntFactor *= Factor;
		uint32_t V = PseudoProbeDwarfDiscriminator::packProbeData(
		Index, Type, Attr, IntFactor);
		DIL = DIL->cloneWithDiscriminator(V);
		Inst.setDebugLoc(DIL);
		}
		}
		}
		}
} // namespace llvm		} // namespace llvm
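
The float-to-integer conversion in setProbeDistributionFactor can be illustrated in isolation. The following is a minimal standalone sketch, not the patch's code: kFullFactor is a hypothetical stand-in for the full-distribution constant (its real value is not shown in this diff), and encodeFactor and decodeFactor are illustrative names. It shows that scaling truncates toward zero, which is why very small factors end up as 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the "100%" integer factor; the real constants
// used by the patch are not shown in this diff.
constexpr uint32_t kFullFactor = 100;

// Convert a fractional factor in [0, 1] to its integer encoding.
// The float-to-integer conversion truncates, so tiny factors become 0.
uint32_t encodeFactor(float Factor) {
  assert(Factor >= 0 && Factor <= 1 && "factor must be within [0, 1]");
  uint32_t IntFactor = kFullFactor;
  if (Factor < 1)
    IntFactor = static_cast<uint32_t>(IntFactor * Factor);
  return IntFactor;
}

// Recover the fractional factor from its integer encoding.
float decodeFactor(uint32_t IntFactor) {
  return IntFactor / static_cast<float>(kFullFactor);
}
```

With these assumptions, a factor of 0.5 encodes to 50 and decodes back to 0.5, while a factor of 0.004 encodes to 0.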

llvm/lib/Passes/PassBuilder.cpp

Show First 20 Lines • Show All 1,393 Lines • ▼ Show 20 Lines	if (PGOOpt && PGOOpt->DebugInfoForProfiling)
MPM.addPass(createModuleToFunctionPassAdaptor(AddDiscriminatorsPass()));		MPM.addPass(createModuleToFunctionPassAdaptor(AddDiscriminatorsPass()));

// Add the core simplification pipeline.		// Add the core simplification pipeline.
MPM.addPass(buildModuleSimplificationPipeline(Level, ThinLTOPhase::None));		MPM.addPass(buildModuleSimplificationPipeline(Level, ThinLTOPhase::None));

// Now add the optimization pipeline.		// Now add the optimization pipeline.
MPM.addPass(buildModuleOptimizationPipeline(Level, LTOPreLink));		MPM.addPass(buildModuleOptimizationPipeline(Level, LTOPreLink));

		if (PGOOpt && PGOOpt->PseudoProbeForProfiling)
		MPM.addPass(PseudoProbeUpdatePass());

// Emit annotation remarks.		// Emit annotation remarks.
addAnnotationRemarksPass(MPM);		addAnnotationRemarksPass(MPM);

if (LTOPreLink)		if (LTOPreLink)
addRequiredLTOPreLinkPasses(MPM);		addRequiredLTOPreLinkPasses(MPM);

return MPM;		return MPM;
}		}
Show All 37 Lines	PassBuilder::buildThinLTOPreLinkDefaultPipeline(OptimizationLevel Level) {
MPM.addPass(GlobalOptPass());		MPM.addPass(GlobalOptPass());

// Module simplification splits coroutines, but does not fully clean up		// Module simplification splits coroutines, but does not fully clean up
// coroutine intrinsics. To ensure ThinLTO optimization passes don't trip up		// coroutine intrinsics. To ensure ThinLTO optimization passes don't trip up
// on these, we schedule the cleanup here.		// on these, we schedule the cleanup here.
if (PTO.Coroutines)		if (PTO.Coroutines)
MPM.addPass(createModuleToFunctionPassAdaptor(CoroCleanupPass()));		MPM.addPass(createModuleToFunctionPassAdaptor(CoroCleanupPass()));

		if (PGOOpt && PGOOpt->PseudoProbeForProfiling)
		MPM.addPass(PseudoProbeUpdatePass());

// Emit annotation remarks.		// Emit annotation remarks.
addAnnotationRemarksPass(MPM);		addAnnotationRemarksPass(MPM);

addRequiredLTOPreLinkPasses(MPM);		addRequiredLTOPreLinkPasses(MPM);

return MPM;		return MPM;
}		}

[... 1,566 lines not shown ...]

llvm/lib/Passes/PassRegistry.def

	[... 118 lines not shown ...]
	MODULE_PASS("dfsan", DataFlowSanitizerPass())			MODULE_PASS("dfsan", DataFlowSanitizerPass())
	MODULE_PASS("asan-module", ModuleAddressSanitizerPass(/*CompileKernel=*/false, false, true, false))			MODULE_PASS("asan-module", ModuleAddressSanitizerPass(/*CompileKernel=*/false, false, true, false))
	MODULE_PASS("msan-module", MemorySanitizerPass({}))			MODULE_PASS("msan-module", MemorySanitizerPass({}))
	MODULE_PASS("tsan-module", ThreadSanitizerPass())			MODULE_PASS("tsan-module", ThreadSanitizerPass())
	MODULE_PASS("kasan-module", ModuleAddressSanitizerPass(/*CompileKernel=*/true, false, true, false))			MODULE_PASS("kasan-module", ModuleAddressSanitizerPass(/*CompileKernel=*/true, false, true, false))
	MODULE_PASS("sancov-module", ModuleSanitizerCoveragePass())			MODULE_PASS("sancov-module", ModuleSanitizerCoveragePass())
	MODULE_PASS("memprof-module", ModuleMemProfilerPass())			MODULE_PASS("memprof-module", ModuleMemProfilerPass())
	MODULE_PASS("poison-checking", PoisonCheckingPass())			MODULE_PASS("poison-checking", PoisonCheckingPass())
				MODULE_PASS("pseudo-probe-update", PseudoProbeUpdatePass())
	#undef MODULE_PASS			#undef MODULE_PASS

	#ifndef CGSCC_ANALYSIS			#ifndef CGSCC_ANALYSIS
	#define CGSCC_ANALYSIS(NAME, CREATE_PASS)			#define CGSCC_ANALYSIS(NAME, CREATE_PASS)
	#endif			#endif
	CGSCC_ANALYSIS("no-op-cgscc", NoOpCGSCCAnalysis())			CGSCC_ANALYSIS("no-op-cgscc", NoOpCGSCCAnalysis())
	CGSCC_ANALYSIS("fam-proxy", FunctionAnalysisManagerCGSCCProxy())			CGSCC_ANALYSIS("fam-proxy", FunctionAnalysisManagerCGSCCProxy())
	CGSCC_ANALYSIS("pass-instrumentation", PassInstrumentationAnalysis(PIC))			CGSCC_ANALYSIS("pass-instrumentation", PassInstrumentationAnalysis(PIC))
	[... 289 lines not shown ...]

llvm/lib/Passes/StandardInstrumentations.cpp

Show First 20 Lines • Show All 859 Lines • ▼ Show 20 Lines	void StandardInstrumentations::registerCallbacks(
PassInstrumentationCallbacks &PIC) {		PassInstrumentationCallbacks &PIC) {
PrintIR.registerCallbacks(PIC);		PrintIR.registerCallbacks(PIC);
PrintPass.registerCallbacks(PIC);		PrintPass.registerCallbacks(PIC);
TimePasses.registerCallbacks(PIC);		TimePasses.registerCallbacks(PIC);
OptNone.registerCallbacks(PIC);		OptNone.registerCallbacks(PIC);
OptBisect.registerCallbacks(PIC);		OptBisect.registerCallbacks(PIC);
PreservedCFGChecker.registerCallbacks(PIC);		PreservedCFGChecker.registerCallbacks(PIC);
PrintChangedIR.registerCallbacks(PIC);		PrintChangedIR.registerCallbacks(PIC);
		PseudoProbeVerification.registerCallbacks(PIC);
if (VerifyEach)		if (VerifyEach)
Verify.registerCallbacks(PIC);		Verify.registerCallbacks(PIC);
}		}

namespace llvm {		namespace llvm {

template class ChangeReporter<std::string>;		template class ChangeReporter<std::string>;
template class TextChangeReporter<std::string>;		template class TextChangeReporter<std::string>;

} // namespace llvm		} // namespace llvm

llvm/lib/Transforms/IPO/SampleProfile.cpp

[... 102 lines not shown ...]

STATISTIC(NumCSInlined,		STATISTIC(NumCSInlined,
"Number of functions inlined with context sensitive profile");		"Number of functions inlined with context sensitive profile");
STATISTIC(NumCSNotInlined,		STATISTIC(NumCSNotInlined,
"Number of functions not inlined with context sensitive profile");		"Number of functions not inlined with context sensitive profile");
STATISTIC(NumMismatchedProfile,		STATISTIC(NumMismatchedProfile,
"Number of functions with CFG mismatched profile");		"Number of functions with CFG mismatched profile");
STATISTIC(NumMatchedProfile, "Number of functions with CFG matched profile");		STATISTIC(NumMatchedProfile, "Number of functions with CFG matched profile");
		STATISTIC(NumDuplicatedInlinesite,
		"Number of inlined callsites with a partial distribution factor");

STATISTIC(NumCSInlinedHitMinLimit,		STATISTIC(NumCSInlinedHitMinLimit,
"Number of functions with FDO inline stopped due to min size limit");		"Number of functions with FDO inline stopped due to min size limit");
STATISTIC(NumCSInlinedHitMaxLimit,		STATISTIC(NumCSInlinedHitMaxLimit,
"Number of functions with FDO inline stopped due to max size limit");		"Number of functions with FDO inline stopped due to max size limit");
STATISTIC(		STATISTIC(
NumCSInlinedHitGrowthLimit,		NumCSInlinedHitGrowthLimit,
"Number of functions with FDO inline stopped due to growth size limit");		"Number of functions with FDO inline stopped due to growth size limit");
▲ Show 20 Lines • Show All 234 Lines • ▼ Show 20 Lines	private:
Module &CurrentModule;		Module &CurrentModule;
DenseMap<uint64_t, StringRef> &CurrentGUIDToFuncNameMap;		DenseMap<uint64_t, StringRef> &CurrentGUIDToFuncNameMap;
};		};

// Inline candidate used by iterative callsite prioritized inliner		// Inline candidate used by iterative callsite prioritized inliner
struct InlineCandidate {		struct InlineCandidate {
CallBase *CallInstr;		CallBase *CallInstr;
const FunctionSamples *CalleeSamples;		const FunctionSamples *CalleeSamples;
uint64_t CallsiteCount;		uint64_t CallsiteCount;
		wmi: Will CallsiteCount be the count before or after being prorated if CallsiteDistribution is not 1.0?
		hoy: It is the count after being prorated. The prorated count will be used to guide inlining. For example, if a callsite is duplicated in LTO prelink, then in LTO postlink the two copies will get their own distribution factors, and their prorated counts are used to decide, independently for each copy, whether it should be inlined.
		wmi: Ok, better comment it.
		// Call site distribution factor to prorate the profile samples for a
		// duplicated callsite. Default value is 1.0.
		float CallsiteDistribution;
Function *ICPCallee;		Function *ICPCallee;
};		};

// Inline candidate comparer using call site weight		// Inline candidate comparer using call site weight
struct CandidateComparer {		struct CandidateComparer {
bool operator()(const InlineCandidate &LHS, const InlineCandidate &RHS) {		bool operator()(const InlineCandidate &LHS, const InlineCandidate &RHS) {
return LHS.CallsiteCount < RHS.CallsiteCount;		return LHS.CallsiteCount < RHS.CallsiteCount;
}		}
▲ Show 20 Lines • Show All 506 Lines • ▼ Show 20 Lines	ErrorOr<uint64_t> SampleProfileLoader::getProbeWeight(const Instruction &Inst) {
// it means that the inlined callsite has no sample, thus the call		// it means that the inlined callsite has no sample, thus the call
// instruction should have 0 count.		// instruction should have 0 count.
if (const auto *CB = dyn_cast<CallBase>(&Inst))		if (const auto *CB = dyn_cast<CallBase>(&Inst))
if (!CB->isIndirectCall() && findCalleeFunctionSamples(*CB))		if (!CB->isIndirectCall() && findCalleeFunctionSamples(*CB))
return 0;		return 0;

const ErrorOr<uint64_t> &R = FS->findSamplesAt(Probe->Id, 0);		const ErrorOr<uint64_t> &R = FS->findSamplesAt(Probe->Id, 0);
if (R) {		if (R) {
uint64_t Samples = R.get();		uint64_t Samples = R.get() * Probe->Factor;
bool FirstMark = CoverageTracker.markSamplesUsed(FS, Probe->Id, 0, Samples);		bool FirstMark = CoverageTracker.markSamplesUsed(FS, Probe->Id, 0, Samples);
if (FirstMark) {		if (FirstMark) {
ORE->emit([&]() {		ORE->emit([&]() {
OptimizationRemarkAnalysis Remark(DEBUG_TYPE, "AppliedSamples", &Inst);		OptimizationRemarkAnalysis Remark(DEBUG_TYPE, "AppliedSamples", &Inst);
Remark << "Applied " << ore::NV("NumSamples", Samples);		Remark << "Applied " << ore::NV("NumSamples", Samples);
Remark << " samples from profile (ProbeId=";		Remark << " samples from profile (ProbeId=";
Remark << ore::NV("ProbeId", Probe->Id);		Remark << ore::NV("ProbeId", Probe->Id);
		Remark << ", Factor=";
		Remark << ore::NV("Factor", Probe->Factor);
		Remark << ", OriginalSamples=";
		Remark << ore::NV("OriginalSamples", R.get());
Remark << ")";		Remark << ")";
return Remark;		return Remark;
});		});
}		}

LLVM_DEBUG(dbgs() << " " << Probe->Id << ":" << Inst		LLVM_DEBUG(dbgs() << " " << Probe->Id << ":" << Inst
<< " - weight: " << R.get() << ")\n");		<< " - weight: " << R.get() << " - factor: "
		<< format("%0.2f", Probe->Factor) << ")\n");
return Samples;		return Samples;
}		}
return R;		return R;
}		}
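
The effect of multiplying the raw sample count by Probe->Factor can be sketched on its own. This is a minimal illustration rather than the patch's code: prorateSamples is a hypothetical helper name, and the counts and factors in the usage note are made-up numbers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: scale the raw sample count recorded for a probe by
// the distribution factor of one particular copy of that probe.
uint64_t prorateSamples(uint64_t RawSamples, float Factor) {
  assert(Factor >= 0 && Factor <= 1 && "factor must be within [0, 1]");
  // Mirrors the shape of `R.get() * Probe->Factor`: the product is computed
  // in floating point and truncated back to an integer count.
  return static_cast<uint64_t>(RawSamples * Factor);
}
```

For instance, if a probe with 1000 samples has been duplicated into two copies with factors 0.75 and 0.25, the copies receive 750 and 250 samples, so the location is no longer counted twice.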

/// Compute the weight of a basic block.		/// Compute the weight of a basic block.
///		///
/// The weight of basic block \p BB is the maximum weight of all the		/// The weight of basic block \p BB is the maximum weight of all the
▲ Show 20 Lines • Show All 452 Lines • ▼ Show 20 Lines	if (InlineFunction(CB, IFI).isSuccess()) {
// Now populate the list of newly exposed call sites.		// Now populate the list of newly exposed call sites.
InlinedCallSites.clear();		InlinedCallSites.clear();
for (auto &I : IFI.InlinedCallSites)		for (auto &I : IFI.InlinedCallSites)
InlinedCallSites.push_back(I);		InlinedCallSites.push_back(I);

if (ProfileIsCS)		if (ProfileIsCS)
ContextTracker->markContextSamplesInlined(Candidate.CalleeSamples);		ContextTracker->markContextSamplesInlined(Candidate.CalleeSamples);
++NumCSInlined;		++NumCSInlined;

		// Prorate inlined probes for a duplicated inlining callsite which probably
		// has a distribution less than 100%. Samples for an inlinee should be
		// distributed among the copies of the original callsite based on each
		// callsite's distribution factor for counts accuracy. Note that an inlined
		// probe may come with its own distribution factor if it has been duplicated
		// in the inlinee body. The two factors are multiplied to reflect the
		// aggregation of duplication.
		if (Candidate.CallsiteDistribution < 1) {
		for (auto &I : InlinedCallSites) {
		if (Optional<PseudoProbe> Probe = extractProbe(*I))
		setProbeDistributionFactor(*I, Probe->Factor *
		Candidate.CallsiteDistribution);
		}
		NumDuplicatedInlinesite++;
		}

return true;		return true;
}		}
return false;		return false;
}		}
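
The multiplication of the two factors described in the comment above can be shown with a small standalone sketch. combineFactors is a hypothetical name used only for illustration; it is not part of the patch.

```cpp
#include <cassert>

// Hypothetical helper: combine the factor an inlined probe already carries
// (from duplication inside the inlinee) with the distribution factor of the
// duplicated callsite it was inlined through.
float combineFactors(float ProbeFactor, float CallsiteDistribution) {
  assert(ProbeFactor >= 0 && ProbeFactor <= 1 && "probe factor out of range");
  assert(CallsiteDistribution >= 0 && CallsiteDistribution <= 1 &&
         "callsite distribution out of range");
  // Both inputs are at most 1, so the product is also at most 1.
  return ProbeFactor * CallsiteDistribution;
}
```

Under these assumptions, a probe that was already split in half inside the inlinee (0.5) and is inlined through a callsite copy that owns half of the original callsite's samples (0.5) ends up with a factor of 0.25.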

bool SampleProfileLoader::getInlineCandidate(InlineCandidate *NewCandidate,		bool SampleProfileLoader::getInlineCandidate(InlineCandidate *NewCandidate,
CallBase *CB) {		CallBase *CB) {
assert(CB);		assert(CB);

if (isa<IntrinsicInst>(CB))		if (isa<IntrinsicInst>(CB))
return false;		return false;

// Find the callee's profile. For indirect call, find hottest target profile.		// Find the callee's profile. For indirect call, find hottest target profile.
const FunctionSamples *CalleeSamples = findCalleeFunctionSamples(*CB);		const FunctionSamples *CalleeSamples = findCalleeFunctionSamples(*CB);
if (!CalleeSamples)		if (!CalleeSamples)
return false;		return false;

		float Factor = 1.0;
		if (Optional<PseudoProbe> Probe = extractProbe(*CB))
		Factor = Probe->Factor;

uint64_t CallsiteCount = 0;		uint64_t CallsiteCount = 0;
ErrorOr<uint64_t> Weight = getBlockWeight(CB->getParent());		ErrorOr<uint64_t> Weight = getBlockWeight(CB->getParent());
if (Weight)		if (Weight)
CallsiteCount = Weight.get();		CallsiteCount = Weight.get();
else if (CalleeSamples)		else if (CalleeSamples)
CallsiteCount = std::max(CallsiteCount, CalleeSamples->getEntrySamples());		CallsiteCount = std::max(
		CallsiteCount, uint64_t(CalleeSamples->getEntrySamples() * Factor));

*NewCandidate = {CB, CalleeSamples, CallsiteCount, nullptr};		*NewCandidate = {CB, CalleeSamples, CallsiteCount, Factor, nullptr};
return true;		return true;
}		}

InlineCost		InlineCost
SampleProfileLoader::shouldInlineCandidate(InlineCandidate &Candidate) {		SampleProfileLoader::shouldInlineCandidate(InlineCandidate &Candidate) {
assert(ProfileIsCS);		assert(ProfileIsCS);

std::unique_ptr<InlineAdvice> Advice = nullptr;		std::unique_ptr<InlineAdvice> Advice = nullptr;
▲ Show 20 Lines • Show All 93 Lines • ▼ Show 20 Lines	while (!CQueue.empty() && F.getInstructionCount() < SizeLimit) {
if (CalledFunction == &F)		if (CalledFunction == &F)
continue;		continue;
if (I->isIndirectCall()) {		if (I->isIndirectCall()) {
if (PromotedInsns.count(I))		if (PromotedInsns.count(I))
continue;		continue;
uint64_t Sum;		uint64_t Sum;
auto CalleeSamples = findIndirectCallFunctionSamples(*I, Sum);		auto CalleeSamples = findIndirectCallFunctionSamples(*I, Sum);
uint64_t SumOrigin = Sum;		uint64_t SumOrigin = Sum;
		Sum *= Candidate.CallsiteDistribution;
for (const auto *FS : CalleeSamples) {		for (const auto *FS : CalleeSamples) {
// TODO: Consider disabling pre-LTO ICP for MonoLTO as well		// TODO: Consider disabling pre-LTO ICP for MonoLTO as well
if (IsThinLTOPreLink) {		if (IsThinLTOPreLink) {
FS->findInlinedFunctions(InlinedGUIDs, F.getParent(),		FS->findInlinedFunctions(InlinedGUIDs, F.getParent(),
PSI->getOrCompHotCountThreshold());		PSI->getOrCompHotCountThreshold());
continue;		continue;
}		}
uint64_t EntryCountDistributed = FS->getEntrySamples();		uint64_t EntryCountDistributed =
		FS->getEntrySamples() * Candidate.CallsiteDistribution;
// In addition to regular inline cost check, we also need to make sure		// In addition to regular inline cost check, we also need to make sure
// ICP isn't introducing excessive speculative checks even if individual		// ICP isn't introducing excessive speculative checks even if individual
// target looks beneficial to promote and inline. That means we should		// target looks beneficial to promote and inline. That means we should
// only do ICP where there's a small number of dominant targets.		// only do ICP where there's a small number of dominant targets.
if (EntryCountDistributed < SumOrigin / ProfileICPThreshold)		if (EntryCountDistributed < SumOrigin / ProfileICPThreshold)
break;		break;
// For indirect call, we don't run CallAnalyzer through InlineCost		// For indirect call, we don't run CallAnalyzer through InlineCost
// before actual inlining to work around PR18962. However, that means we		// before actual inlining to work around PR18962. However, that means we
// may do ICP first and later decided not to inline, which is mostly ok		// may do ICP first and later decided not to inline, which is mostly ok
// for perf.		// for perf.
if (!PSI->isHotCount(EntryCountDistributed))		if (!PSI->isHotCount(EntryCountDistributed))
break;		break;
const char *Reason = nullptr;		const char *Reason = nullptr;
auto CalleeFunctionName = FS->getFuncName();		auto CalleeFunctionName = FS->getFuncName();
if (CallBase *DI = tryPromoteIndirectCall(		if (CallBase *DI = tryPromoteIndirectCall(
F, CalleeFunctionName, Sum, EntryCountDistributed, I, Reason)) {		F, CalleeFunctionName, Sum, EntryCountDistributed, I, Reason)) {
// Attach function profile for selected indirect callee, and update		// Attach function profile for selected indirect callee, and update
// call site count for the selected target too. Speculatively check		// call site count for the selected target too. Speculatively check
// if it's beneficial to inline the callee to decide whether to ICP.		// if it's beneficial to inline the callee to decide whether to ICP.
Candidate = {DI, FS, EntryCountDistributed, DI->getCalledFunction()};		Candidate = {DI, FS, EntryCountDistributed,
		Candidate.CallsiteDistribution, DI->getCalledFunction()};
		// Prorate the indirect callsite distribution.
		// Do not update the promoted direct callsite distribution at this
		// point since the original distribution combined with the callee
		// profile will be used to prorate callsites from the callee if
		// inlined. Once not inlined, the direct callsite distribution should
		// be prorated so that it will reflect the real callsite counts.
		setProbeDistributionFactor(*I, Candidate.CallsiteDistribution * Sum /
		SumOrigin);
PromotedInsns.insert(I);		PromotedInsns.insert(I);
SmallVector<CallBase *, 8> InlinedCallSites;		SmallVector<CallBase *, 8> InlinedCallSites;
// If profile mismatches, we should not attempt to inline DI.		// If profile mismatches, we should not attempt to inline DI.
if ((isa<CallInst>(DI) || isa<InvokeInst>(DI)) &&		if ((isa<CallInst>(DI) || isa<InvokeInst>(DI)) &&
tryInlineCandidate(Candidate, InlinedCallSites)) {		tryInlineCandidate(Candidate, InlinedCallSites)) {
for (auto *CB : InlinedCallSites) {		for (auto *CB : InlinedCallSites) {
if (getInlineCandidate(&NewCandidate, CB))		if (getInlineCandidate(&NewCandidate, CB))
CQueue.emplace(NewCandidate);		CQueue.emplace(NewCandidate);
}		}
Changed = true;		Changed = true;
		} else {
		// Prorate the direct callsite distribution so that it reflects real
		// callsite counts.
		setProbeDistributionFactor(*DI, Candidate.CallsiteDistribution *
		EntryCountDistributed /
		SumOrigin);
}		}
} else {		} else {
LLVM_DEBUG(dbgs()		LLVM_DEBUG(dbgs()
<< "\nFailed to promote indirect call to "		<< "\nFailed to promote indirect call to "
<< CalleeFunctionName << " because " << Reason << "\n");		<< CalleeFunctionName << " because " << Reason << "\n");
}		}
}		}
} else if (CalledFunction && CalledFunction->getSubprogram() &&		} else if (CalledFunction && CalledFunction->getSubprogram() &&
[... 965 lines not shown ...]

llvm/lib/Transforms/IPO/SampleProfileProbe.cpp

//===- SampleProfileProbe.cpp - Pseudo probe Instrumentation -------------===//		//===- SampleProfileProbe.cpp - Pseudo probe Instrumentation -------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file implements the SampleProfileProber transformation.		// This file implements the SampleProfileProber transformation.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Transforms/IPO/SampleProfileProbe.h"		#include "llvm/Transforms/IPO/SampleProfileProbe.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
		#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
#include "llvm/IR/Constant.h"		#include "llvm/IR/Constant.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DebugInfoMetadata.h"		#include "llvm/IR/DebugInfoMetadata.h"
#include "llvm/IR/GlobalValue.h"		#include "llvm/IR/GlobalValue.h"
#include "llvm/IR/GlobalVariable.h"		#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instruction.h"		#include "llvm/IR/Instruction.h"
#include "llvm/IR/MDBuilder.h"		#include "llvm/IR/MDBuilder.h"
#include "llvm/ProfileData/SampleProf.h"		#include "llvm/ProfileData/SampleProf.h"
#include "llvm/Support/CRC.h"		#include "llvm/Support/CRC.h"
		#include "llvm/Support/CommandLine.h"
#include "llvm/Transforms/Instrumentation.h"		#include "llvm/Transforms/Instrumentation.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"		#include "llvm/Transforms/Utils/ModuleUtils.h"
		#include <unordered_set>
#include <vector>		#include <vector>

using namespace llvm;		using namespace llvm;
#define DEBUG_TYPE "sample-profile-probe"		#define DEBUG_TYPE "sample-profile-probe"

STATISTIC(ArtificialDbgLine,		STATISTIC(ArtificialDbgLine,
"Number of probes that have an artificial debug line");		"Number of probes that have an artificial debug line");

		static cl::opt<bool>
		VerifyPseudoProbe("verify-pseudo-probe", cl::init(false), cl::Hidden,
		cl::desc("Do pseudo probe verification"));

		static cl::list<std::string> VerifyPseudoProbeFuncList(
		"verify-pseudo-probe-funcs", cl::Hidden,
		cl::desc("The option to specify the name of the functions to verify."));

		static cl::opt<bool>
		UpdatePseudoProbe("update-pseudo-probe", cl::init(true), cl::Hidden,
		cl::desc("Update pseudo probe distribution factor"));

		bool PseudoProbeVerifier::shouldVerifyFunction(const Function *F) {
		// Skip function declaration.
		if (F->isDeclaration())
		return false;
		// Skip function that will not be emitted into object file. The prevailing
		// definition will be verified instead.
		if (F->hasAvailableExternallyLinkage())
		return false;
		// Do a name matching.
		static std::unordered_set<std::string> VerifyFuncNames(
		VerifyPseudoProbeFuncList.begin(), VerifyPseudoProbeFuncList.end());
		return VerifyFuncNames.empty() || VerifyFuncNames.count(F->getName().str());
		}

		void PseudoProbeVerifier::registerCallbacks(PassInstrumentationCallbacks &PIC) {
		if (VerifyPseudoProbe) {
		PIC.registerAfterPassCallback(
		[this](StringRef P, Any IR, const PreservedAnalyses &) {
		this->runAfterPass(P, IR);
		});
		}
		}

		// Callback to run after each transformation for the new pass manager.
		void PseudoProbeVerifier::runAfterPass(StringRef PassID, Any IR) {
		std::string Banner =
		"\n* Pseudo Probe Verification After " + PassID.str() + " *\n";
		dbgs() << Banner;
		if (any_isa<const Module *>(IR))
		runAfterPass(any_cast<const Module *>(IR));
		else if (any_isa<const Function *>(IR))
		runAfterPass(any_cast<const Function *>(IR));
		else if (any_isa<const LazyCallGraph::SCC *>(IR))
		runAfterPass(any_cast<const LazyCallGraph::SCC *>(IR));
		else if (any_isa<const Loop *>(IR))
		runAfterPass(any_cast<const Loop *>(IR));
		else
		llvm_unreachable("Unknown IR unit");
		}

		void PseudoProbeVerifier::runAfterPass(const Module *M) {
		for (const Function &F : *M)
		runAfterPass(&F);
		}

		void PseudoProbeVerifier::runAfterPass(const LazyCallGraph::SCC *C) {
		for (const LazyCallGraph::Node &N : *C)
		runAfterPass(&N.getFunction());
		}

		void PseudoProbeVerifier::runAfterPass(const Function *F) {
		if (!shouldVerifyFunction(F))
		return;
		ProbeFactorMap ProbeFactors;
		for (const auto &BB : *F)
		collectProbeFactors(&BB, ProbeFactors);
		verifyProbeFactors(F, ProbeFactors);
		}

		void PseudoProbeVerifier::runAfterPass(const Loop *L) {
		const Function *F = L->getHeader()->getParent();
		runAfterPass(F);
		}

		void PseudoProbeVerifier::collectProbeFactors(const BasicBlock *Block,
		ProbeFactorMap &ProbeFactors) {
		for (const auto &I : *Block) {
		if (Optional<PseudoProbe> Probe = extractProbe(I))
		ProbeFactors[Probe->Id] += Probe->Factor;
		}
		}

		void PseudoProbeVerifier::verifyProbeFactors(
		const Function *F, const ProbeFactorMap &ProbeFactors) {
		bool BannerPrinted = false;
		auto &PrevProbeFactors = FunctionProbeFactors[F->getName()];
		for (const auto &I : ProbeFactors) {
		float CurProbeFactor = I.second;
		if (PrevProbeFactors.count(I.first)) {
		float PrevProbeFactor = PrevProbeFactors[I.first];
		if (std::abs(CurProbeFactor - PrevProbeFactor) >
		DistributionFactorVariance) {
		wmi: Why not issue warning/error message when verification fails? That will make enabling the verification in release compiler possible.
		hoy: The verifier is for debugging only. It doesn't really do any verification. It just helps to track code duplication. Sorry for the naming confusion.
		if (!BannerPrinted) {
		dbgs() << "Function " << F->getName() << ":\n";
		BannerPrinted = true;
		}
		dbgs() << "Probe " << I.first << "\tprevious factor "
		<< format("%0.2f", PrevProbeFactor) << "\tcurrent factor "
		<< format("%0.2f", CurProbeFactor) << "\n";
		}
		}

		// Update
		PrevProbeFactors[I.first] = I.second;
		}
		}
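
The comparison performed by verifyProbeFactors can be sketched without any LLVM types. The following is a simplified standalone illustration, not the patch's implementation: it uses std::map in place of the patch's map types, returns the changed probe IDs instead of printing them, and diffProbeFactors and kVariance are hypothetical names (the tolerance value 0.02 is taken from DistributionFactorVariance in the header).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <map>
#include <vector>

// Per-function map from probe ID to the sum of factors over all copies.
using ProbeFactorMap = std::map<uint64_t, float>;

// Tolerance for rounding to integral factors (0.02 in the patch's header).
constexpr float kVariance = 0.02f;

// Returns the IDs of probes whose factor sum moved by more than the
// tolerance since the previous pass, then records the current sums in Prev.
std::vector<uint64_t> diffProbeFactors(ProbeFactorMap &Prev,
                                       const ProbeFactorMap &Cur) {
  std::vector<uint64_t> Changed;
  for (const auto &Entry : Cur) {
    auto It = Prev.find(Entry.first);
    // Probes seen for the first time have nothing to compare against.
    if (It != Prev.end() && std::abs(Entry.second - It->second) > kVariance)
      Changed.push_back(Entry.first);
    Prev[Entry.first] = Entry.second;
  }
  return Changed;
}
```

In this sketch, a probe whose factor sum goes from 1.0 to 1.5 after a pass is reported, which corresponds to a pass that duplicated the probe without rescaling its copies.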

PseudoProbeManager::PseudoProbeManager(const Module &M) {		PseudoProbeManager::PseudoProbeManager(const Module &M) {
if (NamedMDNode *FuncInfo = M.getNamedMetadata(PseudoProbeDescMetadataName)) {		if (NamedMDNode *FuncInfo = M.getNamedMetadata(PseudoProbeDescMetadataName)) {
for (const auto *Operand : FuncInfo->operands()) {		for (const auto *Operand : FuncInfo->operands()) {
const auto *MD = cast<MDNode>(Operand);		const auto *MD = cast<MDNode>(Operand);
auto GUID =		auto GUID =
mdconst::dyn_extract<ConstantInt>(MD->getOperand(0))->getZExtValue();		mdconst::dyn_extract<ConstantInt>(MD->getOperand(0))->getZExtValue();
auto Hash =		auto Hash =
mdconst::dyn_extract<ConstantInt>(MD->getOperand(1))->getZExtValue();		mdconst::dyn_extract<ConstantInt>(MD->getOperand(1))->getZExtValue();
▲ Show 20 Lines • Show All 150 Lines • ▼ Show 20 Lines	for (auto &I : BlockProbeIds) {
}		}

IRBuilder<> Builder(J);		IRBuilder<> Builder(J);
assert(Builder.GetInsertPoint() != BB->end() &&		assert(Builder.GetInsertPoint() != BB->end() &&
"Cannot get the probing point");		"Cannot get the probing point");
Function *ProbeFn =		Function *ProbeFn =
llvm::Intrinsic::getDeclaration(M, Intrinsic::pseudoprobe);		llvm::Intrinsic::getDeclaration(M, Intrinsic::pseudoprobe);
Value *Args[] = {Builder.getInt64(Guid), Builder.getInt64(Index),		Value *Args[] = {Builder.getInt64(Guid), Builder.getInt64(Index),
Builder.getInt32(0)};		Builder.getInt32(0),
		Builder.getInt64(PseudoProbeFullDistributionFactor)};
auto *Probe = Builder.CreateCall(ProbeFn, Args);		auto *Probe = Builder.CreateCall(ProbeFn, Args);
AssignDebugLoc(Probe);		AssignDebugLoc(Probe);
}		}

// Probe both direct calls and indirect calls. Direct calls are probed so that		// Probe both direct calls and indirect calls. Direct calls are probed so that
// their probe ID can be used as a call site identifier to represent a		// their probe ID can be used as a call site identifier to represent a
// calling context.		// calling context.
for (auto &I : CallProbeIds) {		for (auto &I : CallProbeIds) {
auto *Call = I.first;		auto *Call = I.first;
uint32_t Index = I.second;		uint32_t Index = I.second;
uint32_t Type = cast<CallBase>(Call)->getCalledFunction()		uint32_t Type = cast<CallBase>(Call)->getCalledFunction()
? (uint32_t)PseudoProbeType::DirectCall		? (uint32_t)PseudoProbeType::DirectCall
: (uint32_t)PseudoProbeType::IndirectCall;		: (uint32_t)PseudoProbeType::IndirectCall;
AssignDebugLoc(Call);		AssignDebugLoc(Call);
// Leverage the 32-bit discriminator field of debug data to store the ID and		// Leverage the 32-bit discriminator field of debug data to store the ID and
// type of a callsite probe. This gets rid of the dependency on plumbing a		// type of a callsite probe. This gets rid of the dependency on plumbing a
// customized metadata through the codegen pipeline.		// customized metadata through the codegen pipeline.
uint32_t V = PseudoProbeDwarfDiscriminator::packProbeData(Index, Type);		uint32_t V = PseudoProbeDwarfDiscriminator::packProbeData(
		Index, Type, 0, PseudoProbeDwarfDiscriminator::FullDistributionFactor);
if (auto DIL = Call->getDebugLoc()) {		if (auto DIL = Call->getDebugLoc()) {
DIL = DIL->cloneWithDiscriminator(V);		DIL = DIL->cloneWithDiscriminator(V);
Call->setDebugLoc(DIL);		Call->setDebugLoc(DIL);
}		}
}		}

// Create module-level metadata that contains function info necessary to		// Create module-level metadata that contains function info necessary to
// synthesize probe-based sample counts, which are		// synthesize probe-based sample counts, which are
Show All 38 Lines	for (auto &F : M) {
if (F.isDeclaration())		if (F.isDeclaration())
continue;		continue;
SampleProfileProber ProbeManager(F, ModuleId);		SampleProfileProber ProbeManager(F, ModuleId);
ProbeManager.instrumentOneFunc(F, TM);		ProbeManager.instrumentOneFunc(F, TM);
}		}

return PreservedAnalyses::none();		return PreservedAnalyses::none();
}		}

		void PseudoProbeUpdatePass::runOnFunction(Function &F,
		FunctionAnalysisManager &FAM) {
		BlockFrequencyInfo &BFI = FAM.getResult<BlockFrequencyAnalysis>(F);
		auto BBProfileCount = [&BFI](BasicBlock *BB) {
		return BFI.getBlockProfileCount(BB)
		? BFI.getBlockProfileCount(BB).getValue()
		: 0;
		};

		// Collect the sum of execution weight for each probe.
		ProbeFactorMap ProbeFactors;
		for (auto &Block : F) {
		for (auto &I : Block) {
		if (Optional<PseudoProbe> Probe = extractProbe(I))
		ProbeFactors[Probe->Id] += BBProfileCount(&Block);
		}
		}

		// Fix up over-counted probes.
		for (auto &Block : F) {
		for (auto &I : Block) {
		if (Optional<PseudoProbe> Probe = extractProbe(I)) {
		float Sum = ProbeFactors[Probe->Id];
		if (Sum != 0)
		setProbeDistributionFactor(I, BBProfileCount(&Block) / Sum);
		}
		}
		}
		}

		PreservedAnalyses PseudoProbeUpdatePass::run(Module &M,
		ModuleAnalysisManager &AM) {
		if (UpdatePseudoProbe) {
		for (auto &F : M) {
		if (F.isDeclaration())
		continue;
		FunctionAnalysisManager &FAM =
		AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
		runOnFunction(F, FAM);
		}
		}
		return PreservedAnalyses::none();
		}
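
The two loops in `PseudoProbeUpdatePass::runOnFunction` can be illustrated without any LLVM types. The following is a sketch under stated assumptions: `ProbeCopy` and `updateFactors` are hypothetical names, the block counts stand in for what `BlockFrequencyInfo` would report, and the factor is kept as a plain `float` rather than being encoded into the intrinsic operand.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for one copy of a pseudo probe in some basic block.
struct ProbeCopy {
  uint32_t Id;         // probe ID shared by all duplicates of the probe
  uint64_t BlockCount; // profile count of the enclosing block
  float Factor;        // distribution factor; 1.0 means the full count
};

// First pass: sum the block counts of all copies of each probe ID.
// Second pass: give each copy its share of that sum as its factor.
inline void updateFactors(std::vector<ProbeCopy> &Probes) {
  std::map<uint32_t, float> Sums;
  for (const ProbeCopy &P : Probes)
    Sums[P.Id] += static_cast<float>(P.BlockCount);

  for (ProbeCopy &P : Probes) {
    float Sum = Sums[P.Id];
    if (Sum != 0) // keep the old factor when no counts are available
      P.Factor = static_cast<float>(P.BlockCount) / Sum;
  }
}
```

For a probe duplicated into two blocks with counts 7 and 6, the copies end up with factors 7/13 and 6/13, so the samples later attributed to the two copies add up to the original count instead of being counted twice. A probe that has a single copy keeps a factor of 1.0.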

llvm/test/Transforms/SampleProfile/Inputs/pseudo-probe-update.prof

This file was added.

				foo:3200:13
				1: 13
				2: 7
				3: 6
				4: 13
				5: 7
				6: 6
				!CFGChecksum: 844530426352218

llvm/test/Transforms/SampleProfile/pseudo-probe-emit-inline.ll

	; REQUIRES: x86_64-linux			; REQUIRES: x86_64-linux
	; RUN: opt < %s -passes='pseudo-probe,cgscc(inline)' -function-sections -mtriple=x86_64-unknown-linux-gnu -S -o %t			; RUN: opt < %s -passes='pseudo-probe,cgscc(inline)' -function-sections -mtriple=x86_64-unknown-linux-gnu -S -o %t
	; RUN: FileCheck %s < %t --check-prefix=CHECK-IL			; RUN: FileCheck %s < %t --check-prefix=CHECK-IL
	; RUN: llc -pseudo-probe-for-profiling -function-sections <%t -filetype=asm -o %t1			; RUN: llc -pseudo-probe-for-profiling -function-sections <%t -filetype=asm -o %t1
	; RUN: FileCheck %s < %t1 --check-prefix=CHECK-ASM			; RUN: FileCheck %s < %t1 --check-prefix=CHECK-ASM
	; RUN: llc -pseudo-probe-for-profiling -function-sections <%t -filetype=obj -o %t2			; RUN: llc -pseudo-probe-for-profiling -function-sections <%t -filetype=obj -o %t2
	; RUN: llvm-objdump --section-headers %t2 \| FileCheck %s --check-prefix=CHECK-OBJ			; RUN: llvm-objdump --section-headers %t2 \| FileCheck %s --check-prefix=CHECK-OBJ
	; RUN: llvm-mc -filetype=asm <%t1 -o %t3			; RUN: llvm-mc -filetype=asm <%t1 -o %t3
	; RUN: FileCheck %s < %t3 --check-prefix=CHECK-ASM			; RUN: FileCheck %s < %t3 --check-prefix=CHECK-ASM
	; RUN: llvm-mc -filetype=obj <%t1 -o %t4			; RUN: llvm-mc -filetype=obj <%t1 -o %t4
	; RUN: llvm-objdump --section-headers %t4 \| FileCheck %s --check-prefix=CHECK-OBJ			; RUN: llvm-objdump --section-headers %t4 \| FileCheck %s --check-prefix=CHECK-OBJ

	define dso_local void @foo2() !dbg !7 {			define dso_local void @foo2() !dbg !7 {
	; CHECK-IL: call void @llvm.pseudoprobe(i64 [[#GUID1:]], i64 1, i32 0), !dbg ![[#]]			; CHECK-IL: call void @llvm.pseudoprobe(i64 [[#GUID1:]], i64 1, i32 0, i64 -1), !dbg ![[#]]
	; CHECK-ASM: .pseudoprobe [[#GUID1:]] 1 0 0			; CHECK-ASM: .pseudoprobe [[#GUID1:]] 1 0 0
	ret void, !dbg !10			ret void, !dbg !10
	}			}

	define dso_local void @foo() #0 !dbg !11 {			define dso_local void @foo() #0 !dbg !11 {
	; CHECK-IL: call void @llvm.pseudoprobe(i64 [[#GUID2:]], i64 1, i32 0), !dbg ![[#]]			; CHECK-IL: call void @llvm.pseudoprobe(i64 [[#GUID2:]], i64 1, i32 0, i64 -1), !dbg ![[#]]
	; CHECK-IL: call void @llvm.pseudoprobe(i64 [[#GUID1]], i64 1, i32 0), !dbg ![[#DL1:]]			; CHECK-IL: call void @llvm.pseudoprobe(i64 [[#GUID1]], i64 1, i32 0, i64 -1), !dbg ![[#DL1:]]
	; CHECK-ASM: .pseudoprobe [[#GUID2:]] 1 0 0			; CHECK-ASM: .pseudoprobe [[#GUID2:]] 1 0 0
	; CHECK-ASM: .pseudoprobe [[#GUID1]] 1 0 0 @ [[#GUID2]]:2			; CHECK-ASM: .pseudoprobe [[#GUID1]] 1 0 0 @ [[#GUID2]]:2
	call void @foo2(), !dbg !12			call void @foo2(), !dbg !12
	ret void, !dbg !13			ret void, !dbg !13
	}			}

	define dso_local i32 @entry() !dbg !14 {			define dso_local i32 @entry() !dbg !14 {
	; CHECK-IL: call void @llvm.pseudoprobe(i64 [[#GUID3:]], i64 1, i32 0), !dbg ![[#]]			; CHECK-IL: call void @llvm.pseudoprobe(i64 [[#GUID3:]], i64 1, i32 0, i64 -1), !dbg ![[#]]
	; CHECK-IL: call void @llvm.pseudoprobe(i64 [[#GUID2]], i64 1, i32 0), !dbg ![[#DL2:]]			; CHECK-IL: call void @llvm.pseudoprobe(i64 [[#GUID2]], i64 1, i32 0, i64 -1), !dbg ![[#DL2:]]
	; CHECK-IL: call void @llvm.pseudoprobe(i64 [[#GUID1]], i64 1, i32 0), !dbg ![[#DL3:]]			; CHECK-IL: call void @llvm.pseudoprobe(i64 [[#GUID1]], i64 1, i32 0, i64 -1), !dbg ![[#DL3:]]
	; CHECK-ASM: .pseudoprobe [[#GUID3:]] 1 0 0			; CHECK-ASM: .pseudoprobe [[#GUID3:]] 1 0 0
	; CHECK-ASM: .pseudoprobe [[#GUID2]] 1 0 0 @ [[#GUID3]]:2			; CHECK-ASM: .pseudoprobe [[#GUID2]] 1 0 0 @ [[#GUID3]]:2
	; CHECK-ASM: .pseudoprobe [[#GUID1]] 1 0 0 @ [[#GUID3]]:2 @ [[#GUID2]]:2			; CHECK-ASM: .pseudoprobe [[#GUID1]] 1 0 0 @ [[#GUID3]]:2 @ [[#GUID2]]:2
	call void @foo(), !dbg !18			call void @foo(), !dbg !18
	ret i32 0, !dbg !19			ret i32 0, !dbg !19
	}			}


	; CHECK-IL: ![[#SCOPE1:]] = distinct !DISubprogram(name: "foo2"			; CHECK-IL: ![[#SCOPE1:]] = distinct !DISubprogram(name: "foo2"
	; CHECK-IL: ![[#SCOPE2:]] = distinct !DISubprogram(name: "foo"			; CHECK-IL: ![[#SCOPE2:]] = distinct !DISubprogram(name: "foo"
	; CHECK-IL: ![[#DL1]] = !DILocation(line: 3, column: 1, scope: ![[#SCOPE1]], inlinedAt: ![[#INL1:]])			; CHECK-IL: ![[#DL1]] = !DILocation(line: 3, column: 1, scope: ![[#SCOPE1]], inlinedAt: ![[#INL1:]])
	; CHECK-IL: ![[#INL1]] = distinct !DILocation(line: 7, column: 3, scope: ![[#BL1:]])			; CHECK-IL: ![[#INL1]] = distinct !DILocation(line: 7, column: 3, scope: ![[#BL1:]])
	;; A discriminator of 134217751 which is 0x8000017 in hexadecimal, stands for a direct call probe			;; A discriminator of 186646551 which is 0xb200017 in hexadecimal, stands for a direct call probe
	;; with an index of 2.			;; with an index of 2 and a scale of 100%.
	; CHECK-IL: ![[#BL1]] = !DILexicalBlockFile(scope: ![[#SCOPE2]], file: !1, discriminator: 134217751)			; CHECK-IL: ![[#BL1]] = !DILexicalBlockFile(scope: ![[#SCOPE2]], file: !1, discriminator: 186646551)
	; CHECK-IL: ![[#SCOPE3:]] = distinct !DISubprogram(name: "entry"			; CHECK-IL: ![[#SCOPE3:]] = distinct !DISubprogram(name: "entry"
	; CHECK-IL: ![[#DL2]] = !DILocation(line: 7, column: 3, scope: ![[#SCOPE2]], inlinedAt: ![[#INL2:]])			; CHECK-IL: ![[#DL2]] = !DILocation(line: 7, column: 3, scope: ![[#SCOPE2]], inlinedAt: ![[#INL2:]])
	; CHECK-IL: ![[#INL2]] = distinct !DILocation(line: 11, column: 3, scope: ![[#BL2:]])			; CHECK-IL: ![[#INL2]] = distinct !DILocation(line: 11, column: 3, scope: ![[#BL2:]])
	; CHECK-IL: ![[#BL2]] = !DILexicalBlockFile(scope: ![[#SCOPE3]], file: !1, discriminator: 134217751)			; CHECK-IL: ![[#BL2]] = !DILexicalBlockFile(scope: ![[#SCOPE3]], file: !1, discriminator: 186646551)
	; CHECK-IL: ![[#DL3]] = !DILocation(line: 3, column: 1, scope: ![[#SCOPE1]], inlinedAt: ![[#INL3:]])			; CHECK-IL: ![[#DL3]] = !DILocation(line: 3, column: 1, scope: ![[#SCOPE1]], inlinedAt: ![[#INL3:]])
	; CHECK-IL: ![[#INL3]] = distinct !DILocation(line: 7, column: 3, scope: ![[#BL1]], inlinedAt: ![[#INL2]])			; CHECK-IL: ![[#INL3]] = distinct !DILocation(line: 7, column: 3, scope: ![[#BL1]], inlinedAt: ![[#INL2]])


	; Check the generation of .pseudo_probe_desc section			; Check the generation of .pseudo_probe_desc section
	; CHECK-ASM: .section .pseudo_probe_desc,"G",@progbits,.pseudo_probe_desc_foo2,comdat			; CHECK-ASM: .section .pseudo_probe_desc,"G",@progbits,.pseudo_probe_desc_foo2,comdat
	; CHECK-ASM-NEXT: .quad [[#GUID1]]			; CHECK-ASM-NEXT: .quad [[#GUID1]]
	; CHECK-ASM-NEXT: .quad [[#HASH1:]]			; CHECK-ASM-NEXT: .quad [[#HASH1:]]
	Show All 37 Lines

llvm/test/Transforms/SampleProfile/pseudo-probe-emit.ll

	; REQUIRES: x86_64-linux			; REQUIRES: x86_64-linux
	; RUN: opt < %s -passes=pseudo-probe -function-sections -S -o %t			; RUN: opt < %s -passes=pseudo-probe -function-sections -S -o %t
	; RUN: FileCheck %s < %t --check-prefix=CHECK-IL			; RUN: FileCheck %s < %t --check-prefix=CHECK-IL
	; RUN: llc %t -pseudo-probe-for-profiling -stop-after=pseudo-probe-inserter -o - \| FileCheck %s --check-prefix=CHECK-MIR			; RUN: llc %t -pseudo-probe-for-profiling -stop-after=pseudo-probe-inserter -o - \| FileCheck %s --check-prefix=CHECK-MIR
	; RUN: llc %t -pseudo-probe-for-profiling -function-sections -filetype=asm -o %t1			; RUN: llc %t -pseudo-probe-for-profiling -function-sections -filetype=asm -o %t1
	; RUN: FileCheck %s < %t1 --check-prefix=CHECK-ASM			; RUN: FileCheck %s < %t1 --check-prefix=CHECK-ASM
	; RUN: llc %t -pseudo-probe-for-profiling -function-sections -filetype=obj -o %t2			; RUN: llc %t -pseudo-probe-for-profiling -function-sections -filetype=obj -o %t2
	; RUN: llvm-objdump --section-headers %t2 \| FileCheck %s --check-prefix=CHECK-OBJ			; RUN: llvm-objdump --section-headers %t2 \| FileCheck %s --check-prefix=CHECK-OBJ
	; RUN: llvm-mc %t1 -filetype=obj -o %t3			; RUN: llvm-mc %t1 -filetype=obj -o %t3
	; RUN: llvm-objdump --section-headers %t3 \| FileCheck %s --check-prefix=CHECK-OBJ			; RUN: llvm-objdump --section-headers %t3 \| FileCheck %s --check-prefix=CHECK-OBJ

	;; Check the generation of pseudoprobe intrinsic call.			;; Check the generation of pseudoprobe intrinsic call.

				@a = dso_local global i32 0, align 4

	define void @foo(i32 %x) !dbg !3 {			define void @foo(i32 %x) !dbg !3 {
	bb0:			bb0:
	%cmp = icmp eq i32 %x, 0			%cmp = icmp eq i32 %x, 0
	; CHECK-IL: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 1, i32 0), !dbg ![[#FAKELINE:]]			; CHECK-IL: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 1, i32 0, i64 -1), !dbg ![[#FAKELINE:]]
	; CHECK-MIR: PSEUDO_PROBE [[#GUID:]], 1, 0, 0			; CHECK-MIR: PSEUDO_PROBE [[#GUID:]], 1, 0, 0
	; CHECK-ASM: .pseudoprobe [[#GUID:]] 1 0 0			; CHECK-ASM: .pseudoprobe [[#GUID:]] 1 0 0
	br i1 %cmp, label %bb1, label %bb2			br i1 %cmp, label %bb1, label %bb2

	bb1:			bb1:
	; CHECK-IL: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 2, i32 0), !dbg ![[#FAKELINE]]			; CHECK-IL: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 2, i32 0, i64 -1), !dbg ![[#FAKELINE]]
	; CHECK-MIR: PSEUDO_PROBE [[#GUID]], 3, 0, 0			; CHECK-MIR: PSEUDO_PROBE [[#GUID]], 3, 0, 0
	; CHECK-MIR: PSEUDO_PROBE [[#GUID]], 4, 0, 0			; CHECK-MIR: PSEUDO_PROBE [[#GUID]], 4, 0, 0
	; CHECK-ASM: .pseudoprobe [[#GUID]] 3 0 0			; CHECK-ASM: .pseudoprobe [[#GUID]] 3 0 0
	; CHECK-ASM: .pseudoprobe [[#GUID]] 4 0 0			; CHECK-ASM: .pseudoprobe [[#GUID]] 4 0 0
				store i32 6, i32* @a, align 4
	br label %bb3			br label %bb3

	bb2:			bb2:
	; CHECK-IL: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 3, i32 0), !dbg ![[#FAKELINE]]			; CHECK-IL: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 3, i32 0, i64 -1), !dbg ![[#FAKELINE]]
	; CHECK-MIR: PSEUDO_PROBE [[#GUID]], 2, 0, 0			; CHECK-MIR: PSEUDO_PROBE [[#GUID]], 2, 0, 0
	; CHECK-MIR: PSEUDO_PROBE [[#GUID]], 4, 0, 0			; CHECK-MIR: PSEUDO_PROBE [[#GUID]], 4, 0, 0
	; CHECK-ASM: .pseudoprobe [[#GUID]] 2 0 0			; CHECK-ASM: .pseudoprobe [[#GUID]] 2 0 0
	; CHECK-ASM: .pseudoprobe [[#GUID]] 4 0 0			; CHECK-ASM: .pseudoprobe [[#GUID]] 4 0 0
				store i32 8, i32* @a, align 4
	br label %bb3			br label %bb3

	bb3:			bb3:
	; CHECK-IL: call void @llvm.pseudoprobe(i64 [[#GUID]], i64 4, i32 0), !dbg ![[#REALLINE:]]			; CHECK-IL: call void @llvm.pseudoprobe(i64 [[#GUID]], i64 4, i32 0, i64 -1), !dbg ![[#REALLINE:]]
	ret void, !dbg !12			ret void, !dbg !12
	}			}

	declare void @bar(i32 %x)			declare void @bar(i32 %x)

	define internal void @foo2(void (i32)* %f) !dbg !4 {			define internal void @foo2(void (i32)* %f) !dbg !4 {
	entry:			entry:
	; CHECK-IL: call void @llvm.pseudoprobe(i64 [[#GUID2:]], i64 1, i32 0)			; CHECK-IL: call void @llvm.pseudoprobe(i64 [[#GUID2:]], i64 1, i32 0, i64 -1)
	; CHECK-MIR: PSEUDO_PROBE [[#GUID2:]], 1, 0, 0			; CHECK-MIR: PSEUDO_PROBE [[#GUID2:]], 1, 0, 0
	; CHECK-ASM: .pseudoprobe [[#GUID2:]] 1 0 0			; CHECK-ASM: .pseudoprobe [[#GUID2:]] 1 0 0
	; Check pseudo_probe metadata attached to the indirect call instruction.			; Check pseudo_probe metadata attached to the indirect call instruction.
	; CHECK-IL: call void %f(i32 1), !dbg ![[#PROBE0:]]			; CHECK-IL: call void %f(i32 1), !dbg ![[#PROBE0:]]
	; CHECK-MIR: PSEUDO_PROBE [[#GUID2]], 2, 1, 0			; CHECK-MIR: PSEUDO_PROBE [[#GUID2]], 2, 1, 0
	; CHECK-ASM: .pseudoprobe [[#GUID2]] 2 1 0			; CHECK-ASM: .pseudoprobe [[#GUID2]] 2 1 0
	call void %f(i32 1), !dbg !13			call void %f(i32 1), !dbg !13
	; Check pseudo_probe metadata attached to the direct call instruction.			; Check pseudo_probe metadata attached to the direct call instruction.
	; CHECK-IL: call void @bar(i32 1), !dbg ![[#PROBE1:]]			; CHECK-IL: call void @bar(i32 1), !dbg ![[#PROBE1:]]
	; CHECK-MIR: PSEUDO_PROBE [[#GUID2]], 3, 2, 0			; CHECK-MIR: PSEUDO_PROBE [[#GUID2]], 3, 2, 0
	; CHECK-ASM: .pseudoprobe [[#GUID2]] 3 2 0			; CHECK-ASM: .pseudoprobe [[#GUID2]] 3 2 0
	call void @bar(i32 1)			call void @bar(i32 1)
	ret void			ret void
	}			}

	; CHECK-IL: ![[#FOO:]] = distinct !DISubprogram(name: "foo"			; CHECK-IL: ![[#FOO:]] = distinct !DISubprogram(name: "foo"
	; CHECK-IL: ![[#FAKELINE]] = !DILocation(line: 0, scope: ![[#FOO]])			; CHECK-IL: ![[#FAKELINE]] = !DILocation(line: 0, scope: ![[#FOO]])
	; CHECK-IL: ![[#REALLINE]] = !DILocation(line: 2, scope: ![[#FOO]])			; CHECK-IL: ![[#REALLINE]] = !DILocation(line: 2, scope: ![[#FOO]])
	; CHECK-IL: ![[#PROBE0]] = !DILocation(line: 2, column: 20, scope: ![[#SCOPE0:]])			; CHECK-IL: ![[#PROBE0]] = !DILocation(line: 2, column: 20, scope: ![[#SCOPE0:]])
	;; A discriminator of 67108887 which is 0x4000017 in hexadecimal, stands for an indirect call probe			;; A discriminator of 119537687 which is 0x7200017 in hexadecimal, stands for an indirect call probe
	;; with an index of 2.			;; with an index of 2.
	; CHECK-IL: ![[#SCOPE0]] = !DILexicalBlockFile(scope: ![[#]], file: ![[#]], discriminator: 67108887)			; CHECK-IL: ![[#SCOPE0]] = !DILexicalBlockFile(scope: ![[#]], file: ![[#]], discriminator: 119537687)
	; CHECK-IL: ![[#PROBE1]] = !DILocation(line: 0, scope: ![[#SCOPE1:]])			; CHECK-IL: ![[#PROBE1]] = !DILocation(line: 0, scope: ![[#SCOPE1:]])
	;; A discriminator of 134217759 which is 0x800001f in hexadecimal, stands for a direct call probe			;; A discriminator of 186646559 which is 0xb20001f in hexadecimal, stands for a direct call probe
	;; with an index of 3.			;; with an index of 3.
	; CHECK-IL: ![[#SCOPE1]] = !DILexicalBlockFile(scope: ![[#]], file: ![[#]], discriminator: 134217759)			; CHECK-IL: ![[#SCOPE1]] = !DILexicalBlockFile(scope: ![[#]], file: ![[#]], discriminator: 186646559)

	; Check the generation of .pseudo_probe_desc section			; Check the generation of .pseudo_probe_desc section
	; CHECK-ASM: .section .pseudo_probe_desc,"G",@progbits,.pseudo_probe_desc_foo,comdat			; CHECK-ASM: .section .pseudo_probe_desc,"G",@progbits,.pseudo_probe_desc_foo,comdat
	; CHECK-ASM-NEXT: .quad [[#GUID]]			; CHECK-ASM-NEXT: .quad [[#GUID]]
	; CHECK-ASM-NEXT: .quad [[#HASH:]]			; CHECK-ASM-NEXT: .quad [[#HASH:]]
	; CHECK-ASM-NEXT: .byte 3			; CHECK-ASM-NEXT: .byte 3
	; CHECK-ASM-NEXT: .ascii "foo"			; CHECK-ASM-NEXT: .ascii "foo"
	; CHECK-ASM-NEXT: .section .pseudo_probe_desc,"G",@progbits,.pseudo_probe_desc_foo2,comdat			; CHECK-ASM-NEXT: .section .pseudo_probe_desc,"G",@progbits,.pseudo_probe_desc_foo2,comdat
	Show All 24 Lines
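
The discriminator values checked in these tests can be reproduced by hand. The sketch below assumes a bit layout inferred from those values: the low three bits are 0x7, the probe index starts at bit 3, the distribution factor (a percentage from 0 to 100) starts at bit 19, and the probe type starts at bit 26 (1 for an indirect call, 2 for a direct call). The names `packProbe`, `unpackProbe` and `ProbeFields` are hypothetical and are not the helpers added by the patch.

```cpp
#include <cassert>
#include <cstdint>

// Decoded view of a callsite probe discriminator (assumed layout):
//   [2:0]   0x7 marker
//   [18:3]  probe index
//   [25:19] distribution factor as a percentage (0..100)
//   [28:26] probe type (1 = indirect call, 2 = direct call)
struct ProbeFields {
  uint32_t Index;
  uint32_t Type;
  uint32_t Factor;
};

// Pack index, type and percentage factor into a 32-bit discriminator.
inline uint32_t packProbe(uint32_t Index, uint32_t Type, uint32_t Factor) {
  return (Index << 3) | (Factor << 19) | (Type << 26) | 0x7;
}

// Recover the three fields from a packed discriminator.
inline ProbeFields unpackProbe(uint32_t V) {
  return {(V >> 3) & 0xFFFF, (V >> 26) & 0x7, (V >> 19) & 0x7F};
}
```

Under this layout, `packProbe(2, 2, 100)` gives 186646551 (0xb200017), `packProbe(2, 1, 100)` gives 119537687 (0x7200017), and `packProbe(2, 2, 0)` gives the pre-patch value 134217751 (0x8000017). The difference between old and new values is always 100 << 19 = 0x3200000, which is the full distribution factor.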

llvm/test/Transforms/SampleProfile/pseudo-probe-profile.ll

	; RUN: opt < %s -passes=pseudo-probe,sample-profile -sample-profile-file=%S/Inputs/pseudo-probe-profile.prof -pass-remarks=sample-profile -pass-remarks-output=%t.opt.yaml -S \| FileCheck %s			; RUN: opt < %s -passes=pseudo-probe,sample-profile -sample-profile-file=%S/Inputs/pseudo-probe-profile.prof -pass-remarks=sample-profile -pass-remarks-output=%t.opt.yaml -S \| FileCheck %s
	; RUN: FileCheck %s -check-prefix=YAML < %t.opt.yaml			; RUN: FileCheck %s -check-prefix=YAML < %t.opt.yaml

	define dso_local i32 @foo(i32 %x, void (i32)* %f) #0 !dbg !4 {			define dso_local i32 @foo(i32 %x, void (i32)* %f) #0 !dbg !4 {
	entry:			entry:
	%retval = alloca i32, align 4			%retval = alloca i32, align 4
	%x.addr = alloca i32, align 4			%x.addr = alloca i32, align 4
	store i32 %x, i32* %x.addr, align 4			store i32 %x, i32* %x.addr, align 4
	%0 = load i32, i32* %x.addr, align 4			%0 = load i32, i32* %x.addr, align 4
	%cmp = icmp eq i32 %0, 0			%cmp = icmp eq i32 %0, 0
	; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 1, i32 0)			; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 1, i32 0, i64 -1)
	br i1 %cmp, label %if.then, label %if.else			br i1 %cmp, label %if.then, label %if.else
	; CHECK: br i1 %cmp, label %if.then, label %if.else, !prof ![[PD1:[0-9]+]]			; CHECK: br i1 %cmp, label %if.then, label %if.else, !prof ![[PD1:[0-9]+]]

	if.then:			if.then:
	; CHECK: call {{.*}}, !dbg ![[#PROBE1:]], !prof ![[PROF1:[0-9]+]]			; CHECK: call {{.*}}, !dbg ![[#PROBE1:]], !prof ![[PROF1:[0-9]+]]
	call void %f(i32 1)			call void %f(i32 1)
	; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 2, i32 0)			; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 2, i32 0, i64 -1)
	store i32 1, i32* %retval, align 4			store i32 1, i32* %retval, align 4
	br label %return			br label %return

	if.else:			if.else:
	; CHECK: call {{.*}}, !dbg ![[#PROBE2:]], !prof ![[PROF2:[0-9]+]]			; CHECK: call {{.*}}, !dbg ![[#PROBE2:]], !prof ![[PROF2:[0-9]+]]
	call void %f(i32 2)			call void %f(i32 2)
	; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 3, i32 0)			; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 3, i32 0, i64 -1)
	store i32 2, i32* %retval, align 4			store i32 2, i32* %retval, align 4
	br label %return			br label %return

	return:			return:
	; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0)			; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -1)
	%1 = load i32, i32* %retval, align 4			%1 = load i32, i32* %retval, align 4
	ret i32 %1			ret i32 %1
	}			}

	attributes #0 = {"use-sample-profile"}			attributes #0 = {"use-sample-profile"}

	; CHECK: ![[PD1]] = !{!"branch_weights", i32 8, i32 7}			; CHECK: ![[PD1]] = !{!"branch_weights", i32 8, i32 7}
	; CHECK: ![[#PROBE1]] = !DILocation(line: 0, scope: ![[#SCOPE1:]])			; CHECK: ![[#PROBE1]] = !DILocation(line: 0, scope: ![[#SCOPE1:]])
	;; A discriminator of 67108911 which is 0x400002f in hexadecimal, stands for an indirect call probe			;; A discriminator of 119537711 which is 0x720002f in hexadecimal, stands for an indirect call probe
	;; with an index of 5.			;; with an index of 5.
	; CHECK: ![[#SCOPE1]] = !DILexicalBlockFile(scope: ![[#]], file: ![[#]], discriminator: 67108911)			; CHECK: ![[#SCOPE1]] = !DILexicalBlockFile(scope: ![[#]], file: ![[#]], discriminator: 119537711)
	; CHECK: ![[PROF1]] = !{!"VP", i32 0, i64 7, i64 9191153033785521275, i64 5, i64 -1069303473483922844, i64 2}			; CHECK: ![[PROF1]] = !{!"VP", i32 0, i64 7, i64 9191153033785521275, i64 5, i64 -1069303473483922844, i64 2}
	; CHECK: ![[#PROBE2]] = !DILocation(line: 0, scope: ![[#SCOPE2:]])			;; A discriminator of 119537719 which is 0x7200037 in hexadecimal, stands for an indirect call probe
	;; A discriminator of 67108919 which is 0x4000037 in hexadecimal, stands for an indirect call probe
	;; with an index of 6.			;; with an index of 6.
	; CHECK: ![[#SCOPE2]] = !DILexicalBlockFile(scope: ![[#]], file: ![[#]], discriminator: 67108919)			; CHECK: ![[#PROBE2]] = !DILocation(line: 0, scope: ![[#SCOPE2:]])
				; CHECK: ![[#SCOPE2]] = !DILexicalBlockFile(scope: ![[#]], file: ![[#]], discriminator: 119537719)
	; CHECK: ![[PROF2]] = !{!"VP", i32 0, i64 6, i64 -1069303473483922844, i64 4, i64 9191153033785521275, i64 2}			; CHECK: ![[PROF2]] = !{!"VP", i32 0, i64 6, i64 -1069303473483922844, i64 4, i64 9191153033785521275, i64 2}

	!llvm.module.flags = !{!9, !10}			!llvm.module.flags = !{!9, !10}

	!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1)			!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1)
	!1 = !DIFile(filename: "test.c", directory: "")			!1 = !DIFile(filename: "test.c", directory: "")
	!2 = !{}			!2 = !{}
	!4 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !5, unit: !0, retainedNodes: !2)			!4 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !5, unit: !0, retainedNodes: !2)
	Show All 9 Lines
	;YAML-NEXT: Name: AppliedSamples			;YAML-NEXT: Name: AppliedSamples
	;YAML-NEXT: DebugLoc: { File: test.c, Line: 0, Column: 0 }			;YAML-NEXT: DebugLoc: { File: test.c, Line: 0, Column: 0 }
	;YAML-NEXT: Function: foo			;YAML-NEXT: Function: foo
	;YAML-NEXT: Args:			;YAML-NEXT: Args:
	;YAML-NEXT: - String: 'Applied '			;YAML-NEXT: - String: 'Applied '
	;YAML-NEXT: - NumSamples: '13'			;YAML-NEXT: - NumSamples: '13'
	;YAML-NEXT: - String: ' samples from profile (ProbeId='			;YAML-NEXT: - String: ' samples from profile (ProbeId='
	;YAML-NEXT: - ProbeId: '1'			;YAML-NEXT: - ProbeId: '1'
				;YAML-NEXT: - String: ', Factor='
				;YAML-NEXT: - Factor: '1.000000e+00'
				;YAML-NEXT: - String: ', OriginalSamples='
				;YAML-NEXT: - OriginalSamples: '13'
	;YAML-NEXT: - String: ')'			;YAML-NEXT: - String: ')'
	;YAML: --- !Analysis			;YAML: --- !Analysis
	;YAML-NEXT: Pass: sample-profile			;YAML-NEXT: Pass: sample-profile
	;YAML-NEXT: Name: AppliedSamples			;YAML-NEXT: Name: AppliedSamples
	;YAML-NEXT: DebugLoc: { File: test.c, Line: 0, Column: 0 }			;YAML-NEXT: DebugLoc: { File: test.c, Line: 0, Column: 0 }
	;YAML-NEXT: Function: foo			;YAML-NEXT: Function: foo
	;YAML-NEXT: Args:			;YAML-NEXT: Args:
	;YAML-NEXT: - String: 'Applied '			;YAML-NEXT: - String: 'Applied '
	;YAML-NEXT: - NumSamples: '7'			;YAML-NEXT: - NumSamples: '7'
	;YAML-NEXT: - String: ' samples from profile (ProbeId='			;YAML-NEXT: - String: ' samples from profile (ProbeId='
	;YAML-NEXT: - ProbeId: '5'			;YAML-NEXT: - ProbeId: '5'
				;YAML-NEXT: - String: ', Factor='
				;YAML-NEXT: - Factor: '1.000000e+00'
				;YAML-NEXT: - String: ', OriginalSamples='
				;YAML-NEXT: - OriginalSamples: '7'
	;YAML-NEXT: - String: ')'			;YAML-NEXT: - String: ')'
	;YAML: --- !Analysis			;YAML: --- !Analysis
	;YAML-NEXT: Pass: sample-profile			;YAML-NEXT: Pass: sample-profile
	;YAML-NEXT: Name: AppliedSamples			;YAML-NEXT: Name: AppliedSamples
	;YAML-NEXT: DebugLoc: { File: test.c, Line: 0, Column: 0 }			;YAML-NEXT: DebugLoc: { File: test.c, Line: 0, Column: 0 }
	;YAML-NEXT: Function: foo			;YAML-NEXT: Function: foo
	;YAML-NEXT: Args:			;YAML-NEXT: Args:
	;YAML-NEXT: - String: 'Applied '			;YAML-NEXT: - String: 'Applied '
	;YAML-NEXT: - NumSamples: '7'			;YAML-NEXT: - NumSamples: '7'
	;YAML-NEXT: - String: ' samples from profile (ProbeId='			;YAML-NEXT: - String: ' samples from profile (ProbeId='
	;YAML-NEXT: - ProbeId: '2'			;YAML-NEXT: - ProbeId: '2'
				;YAML-NEXT: - String: ', Factor='
				;YAML-NEXT: - Factor: '1.000000e+00'
				;YAML-NEXT: - String: ', OriginalSamples='
				;YAML-NEXT: - OriginalSamples: '7'
	;YAML-NEXT: - String: ')'			;YAML-NEXT: - String: ')'
	;YAML: --- !Analysis			;YAML: --- !Analysis
	;YAML-NEXT: Pass: sample-profile			;YAML-NEXT: Pass: sample-profile
	;YAML-NEXT: Name: AppliedSamples			;YAML-NEXT: Name: AppliedSamples
	;YAML-NEXT: DebugLoc: { File: test.c, Line: 0, Column: 0 }			;YAML-NEXT: DebugLoc: { File: test.c, Line: 0, Column: 0 }
	;YAML-NEXT: Function: foo			;YAML-NEXT: Function: foo
	;YAML-NEXT: Args:			;YAML-NEXT: Args:
	;YAML-NEXT: - String: 'Applied '			;YAML-NEXT: - String: 'Applied '
	;YAML-NEXT: - NumSamples: '6'			;YAML-NEXT: - NumSamples: '6'
	;YAML-NEXT: - String: ' samples from profile (ProbeId='			;YAML-NEXT: - String: ' samples from profile (ProbeId='
	;YAML-NEXT: - ProbeId: '6'			;YAML-NEXT: - ProbeId: '6'
				;YAML-NEXT: - String: ', Factor='
				;YAML-NEXT: - Factor: '1.000000e+00'
				;YAML-NEXT: - String: ', OriginalSamples='
				;YAML-NEXT: - OriginalSamples: '6'
	;YAML-NEXT: - String: ')'			;YAML-NEXT: - String: ')'
	;YAML: --- !Analysis			;YAML: --- !Analysis
	;YAML-NEXT: Pass: sample-profile			;YAML-NEXT: Pass: sample-profile
	;YAML-NEXT: Name: AppliedSamples			;YAML-NEXT: Name: AppliedSamples
	;YAML-NEXT: DebugLoc: { File: test.c, Line: 0, Column: 0 }			;YAML-NEXT: DebugLoc: { File: test.c, Line: 0, Column: 0 }
	;YAML-NEXT: Function: foo			;YAML-NEXT: Function: foo
	;YAML-NEXT: Args:			;YAML-NEXT: Args:
	;YAML-NEXT: - String: 'Applied '			;YAML-NEXT: - String: 'Applied '
	;YAML-NEXT: - NumSamples: '6'			;YAML-NEXT: - NumSamples: '6'
	;YAML-NEXT: - String: ' samples from profile (ProbeId='			;YAML-NEXT: - String: ' samples from profile (ProbeId='
	;YAML-NEXT: - ProbeId: '3'			;YAML-NEXT: - ProbeId: '3'
				;YAML-NEXT: - String: ', Factor='
				;YAML-NEXT: - Factor: '1.000000e+00'
				;YAML-NEXT: - String: ', OriginalSamples='
				;YAML-NEXT: - OriginalSamples: '6'
	;YAML-NEXT: - String: ')'			;YAML-NEXT: - String: ')'
	;YAML: --- !Analysis			;YAML: --- !Analysis
	;YAML-NEXT: Pass: sample-profile			;YAML-NEXT: Pass: sample-profile
	;YAML-NEXT: Name: AppliedSamples			;YAML-NEXT: Name: AppliedSamples
	;YAML-NEXT: DebugLoc: { File: test.c, Line: 0, Column: 0 }			;YAML-NEXT: DebugLoc: { File: test.c, Line: 0, Column: 0 }
	;YAML-NEXT: Function: foo			;YAML-NEXT: Function: foo
	;YAML-NEXT: Args:			;YAML-NEXT: Args:
	;YAML-NEXT: - String: 'Applied '			;YAML-NEXT: - String: 'Applied '
	;YAML-NEXT: - NumSamples: '13'			;YAML-NEXT: - NumSamples: '13'
	;YAML-NEXT: - String: ' samples from profile (ProbeId='			;YAML-NEXT: - String: ' samples from profile (ProbeId='
	;YAML-NEXT: - ProbeId: '4'			;YAML-NEXT: - ProbeId: '4'
				;YAML-NEXT: - String: ', Factor='
				;YAML-NEXT: - Factor: '1.000000e+00'
				;YAML-NEXT: - String: ', OriginalSamples='
				;YAML-NEXT: - OriginalSamples: '13'
	;YAML-NEXT: - String: ')'			;YAML-NEXT: - String: ')'

llvm/test/Transforms/SampleProfile/pseudo-probe-update.ll

This file was added.

				; RUN: opt < %s -passes='pseudo-probe,sample-profile,jump-threading,pseudo-probe-update' -sample-profile-file=%S/Inputs/pseudo-probe-update.prof -S \| FileCheck %s

				declare i32 @f1()
				declare i32 @f2()
				declare void @f3()


				;; This tests that the branch in 'merge' can be cloned up into T1.
				define i32 @foo(i1 %cond, i1 %cond2) #0 {
				; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 1, i32 0, i64 -1)
				br i1 %cond, label %T1, label %F1
				T1:
				; CHECK: %v1 = call i32 @f1(), !prof ![[#PROF1:]]
				%v1 = call i32 @f1()
				; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 2, i32 0, i64 -1)
				;; The distribution factor -8513881372706734080 stands for 53.85%, which is from 7/(6+7).
				; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -8513881372706734080)
				%cond3 = icmp eq i32 %v1, 412
				br label %Merge
				F1:
				; CHECK: %v2 = call i32 @f2(), !prof ![[#PROF2:]]
				%v2 = call i32 @f2()
				; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 3, i32 0, i64 -1)
				;; The distribution factor 8513881922462547968 stands for 46.15%, which is from 6/(6+7).
				; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 8513881922462547968)
				br label %Merge
				Merge:

				%A = phi i1 [%cond3, %T1], [%cond2, %F1]
				%B = phi i32 [%v1, %T1], [%v2, %F1]
				br i1 %A, label %T2, label %F2
				T2:
				; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 5, i32 0, i64 -1)
				call void @f3()
				ret i32 %B
				F2:
				; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 6, i32 0, i64 -1)
				ret i32 %B
				}

				; CHECK: ![[#PROF1]] = !{!"branch_weights", i32 7}
				; CHECK: ![[#PROF2]] = !{!"branch_weights", i32 6}

				attributes #0 = {"use-sample-profile"}
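
The i64 factors in the CHECK lines above are easier to read as percentages. A minimal sketch, assuming the operand is an unsigned 64-bit fixed-point fraction in which all ones (printed by the IR printer as `i64 -1`) means 100%; `FullFactor` and `factorToPercent` are names invented for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumption: a factor of all ones (printed as i64 -1) represents 100%.
constexpr uint64_t FullFactor = std::numeric_limits<uint64_t>::max();

// Convert a factor, as printed in signed form in the IR, to a percentage.
inline double factorToPercent(int64_t Printed) {
  uint64_t Raw = static_cast<uint64_t>(Printed); // reinterpret as unsigned
  return 100.0 * static_cast<double>(Raw) / static_cast<double>(FullFactor);
}
```

Applied to the values in this test, `factorToPercent(-8513881372706734080)` is about 53.85 and `factorToPercent(8513881922462547968)` is about 46.15, matching the 7:6 split of the samples in pseudo-probe-update.prof between the two copies of probe 4 created by jump threading.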

llvm/test/Transforms/SampleProfile/pseudo-probe-verify.ll

This file was added.

				; REQUIRES: x86_64-linux
				; RUN: opt < %s -passes='pseudo-probe,loop-unroll-full' -verify-pseudo-probe -S -o %t 2>&1 \| FileCheck %s --check-prefix=VERIFY
				; RUN: FileCheck %s < %t

				; VERIFY: * Pseudo Probe Verification After LoopFullUnrollPass *
				; VERIFY: Function foo:
				; VERIFY: Probe 6 previous factor 1.00 current factor 5.00
				; VERIFY: Probe 4 previous factor 1.00 current factor 5.00

				declare void @foo2() nounwind

				define void @foo(i32 %x) {
				bb:
				; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 1, i32 0, i64 -1)
				%tmp = alloca [5 x i32*], align 16
				br label %bb7.preheader

				bb3.loopexit:
				%spec.select.lcssa = phi i32 [ %spec.select, %bb10 ]
				%tmp5.not = icmp eq i32 %spec.select.lcssa, 0
				br i1 %tmp5.not, label %bb24, label %bb7.preheader

				bb7.preheader:
				; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 3, i32 0, i64 -1)
				%tmp1.06 = phi i32 [ 5, %bb ], [ %spec.select.lcssa, %bb3.loopexit ]
				br label %bb10

				bb10:
				; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -1)
				; CHECK: call void @foo2(), !dbg ![[#PROBE6:]]
				; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -1)
				; CHECK: call void @foo2(), !dbg ![[#PROBE6:]]
				; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -1)
				; CHECK: call void @foo2(), !dbg ![[#PROBE6:]]
				; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -1)
				; CHECK: call void @foo2(), !dbg ![[#PROBE6:]]
				; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 4, i32 0, i64 -1)
				; CHECK: call void @foo2(), !dbg ![[#PROBE6:]]
				; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 2, i32 0, i64 -1)
				%indvars.iv = phi i64 [ 0, %bb7.preheader ], [ %indvars.iv.next, %bb10 ]
				%tmp1.14 = phi i32 [ %tmp1.06, %bb7.preheader ], [ %spec.select, %bb10 ]
				%tmp13 = getelementptr inbounds [5 x i32*], [5 x i32*]* %tmp, i64 0, i64 %indvars.iv
				%tmp14 = load i32*, i32** %tmp13, align 8
				%tmp15.not = icmp ne i32* %tmp14, null
				%tmp18 = sext i1 %tmp15.not to i32
				%spec.select = add nsw i32 %tmp1.14, %tmp18
				call void @foo2(), !dbg !12
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond.not = icmp eq i64 %indvars.iv.next, 5
				br i1 %exitcond.not, label %bb3.loopexit, label %bb10, !llvm.loop !13

				bb24:
				; CHECK: call void @llvm.pseudoprobe(i64 [[#GUID:]], i64 5, i32 0, i64 -1)
				ret void
				}

				;; A discriminator of 186646583, which is 0xb200037 in hexadecimal, stands for a direct call probe
				;; with an index of 6 and a scale of 100%.
				; CHECK: ![[#PROBE6]] = !DILocation(line: 2, column: 20, scope: ![[#SCOPE:]])
				; CHECK: ![[#SCOPE]] = !DILexicalBlockFile(scope: ![[#]], file: ![[#]], discriminator: 186646583)

				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!9, !10}

				!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 3.9.0", isOptimized: false, runtimeVersion: 0, emissionKind: 1, enums: !2)
				!1 = !DIFile(filename: "test.c", directory: "")
				!2 = !{}
				!4 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 2, type: !5, isLocal: false, isDefinition: true, scopeLine: 2, isOptimized: false, unit: !0, retainedNodes: !2)
				!5 = !DISubroutineType(types: !6)
				!6 = !{!7}
				!7 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
				!9 = !{i32 2, !"Dwarf Version", i32 4}
				!10 = !{i32 2, !"Debug Info Version", i32 3}
				!11 = !{!"clang version 3.9.0"}
				!12 = !DILocation(line: 2, column: 20, scope: !4)
				!13 = distinct !{!13, !14}
				!14 = !{!"llvm.loop.unroll.full"}
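The discriminator checked in this test packs several probe fields into one 32-bit value. The following is a minimal decoding sketch, assuming this bit layout: bits [2:0] hold the marker 0x7, bits [18:3] the probe index, bits [25:19] the distribution factor as a percentage, and bits [28:26] the probe type, with 2 meaning a direct call. `ProbeInfo` and `decodeProbeDiscriminator` are hypothetical names. The layout should be checked against `llvm/include/llvm/IR/PseudoProbe.h` in this revision before relying on it.

```cpp
#include <cstdint>

// Hypothetical container for the decoded fields (not an LLVM type).
struct ProbeInfo {
  uint32_t index;         // bits [18:3]
  uint32_t type;          // bits [28:26]; assumed 0 = block, 1 = indirect call, 2 = direct call
  uint32_t factorPercent; // bits [25:19]
  bool isProbe;           // bits [2:0] equal 0x7
};

// Decode a DWARF discriminator under the assumed pseudo probe layout.
constexpr ProbeInfo decodeProbeDiscriminator(uint32_t d) {
  return ProbeInfo{(d >> 3) & 0xFFFFu, (d >> 26) & 0x7u, (d >> 19) & 0x7Fu,
                   (d & 0x7u) == 0x7u};
}

// 186646583 == 0xb200037, the value matched by the CHECK line in the test.
static_assert(decodeProbeDiscriminator(186646583u).index == 6, "probe index");
static_assert(decodeProbeDiscriminator(186646583u).type == 2, "direct call");
```

With this layout, the factor field of 186646583 decodes to 100.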

Diff 316567
