This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/include/llvm/IR/DebugInfoMetadata.h
llvm/lib/CodeGen/AsmPrinter/PseudoProbePrinter.cpp
llvm/lib/CodeGen/MIRFSDiscriminator.cpp
llvm/lib/CodeGen/MIRSampleProfile.cpp
llvm/lib/CodeGen/PseudoProbeInserter.cpp
llvm/lib/Transforms/IPO/SampleProfileProbe.cpp
llvm/test/CodeGen/X86/Inputs/fsloader_v1.afdo
llvm/test/CodeGen/X86/fsafdo_test1.ll
llvm/test/CodeGen/X86/fsafdo_test2.ll
llvm/test/CodeGen/X86/fsafdo_test3.ll
llvm/test/CodeGen/X86/fsafdo_test4.ll

Differential D145171

[FSAFDO] Improve FS discriminator encoding
ClosedPublic

Authored by xur on Mar 2 2023, 10:11 AM.


Details

Reviewers

hoy
wenlei
shenhan

Commits

rGebe09e2a9556: [FSAFDO] Improve FS discriminator encoding

Summary

This change improves FS discriminators in the following ways:
(1) Use call-stack debug information in the discriminator hash:
the same (source/line) DILs can now have different hashes if they
come from different call stacks. This reduces the hash conflicts.
(2) Don't generate FS discriminators for meta instructions
(i.e., instructions that are not emitted). This reduces the number
of discriminator conflicts (for the case where we run out of
discriminator bits for that pass).
(3) Use the less expensive xxHash64 hash.
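A minimal sketch of the idea behind (1), assuming a simplified location type: `SimpleLoc`, `CallSite`, `callStackHash`, and `locationHash` are hypothetical names, not LLVM APIs, and FNV-1a stands in for xxHash64 only so that the example is self-contained. It shows why folding the inline call stack into the hash separates two locations that share the same function and line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one inlined-at frame (not an LLVM type).
struct CallSite {
  std::string CallerName; // linkage name of the caller
  unsigned Line;          // line of the call
};

// Hypothetical stand-in for a DILocation: function, line, and the
// inlined-at chain (empty when the location is not inlined).
struct SimpleLoc {
  std::string FuncName;
  unsigned Line;
  std::vector<CallSite> InlinedAt;
};

constexpr uint64_t kFnvOffset = 0xcbf29ce484222325ULL;
constexpr uint64_t kFnvPrime = 0x100000001b3ULL;

// FNV-1a over raw bytes; used here only as a stand-in for xxHash64.
inline uint64_t fnv1a(const void *Data, std::size_t Len,
                      uint64_t H = kFnvOffset) {
  const auto *P = static_cast<const unsigned char *>(Data);
  for (std::size_t I = 0; I < Len; ++I) {
    H ^= P[I];
    H *= kFnvPrime;
  }
  return H;
}

// Hash of the inline call stack; 0 when the location is not inlined.
inline uint64_t callStackHash(const SimpleLoc &L) {
  if (L.InlinedAt.empty())
    return 0;
  uint64_t H = kFnvOffset;
  for (const CallSite &CS : L.InlinedAt) {
    H = fnv1a(CS.CallerName.data(), CS.CallerName.size(), H);
    H = fnv1a(&CS.Line, sizeof(CS.Line), H);
  }
  return H;
}

// Location hash that also folds in the call-stack hash, so the same
// (function, line) reached through different inline chains differs.
inline uint64_t locationHash(const SimpleLoc &L) {
  uint64_t H = fnv1a(L.FuncName.data(), L.FuncName.size());
  H = fnv1a(&L.Line, sizeof(L.Line), H);
  const uint64_t CS = callStackHash(L);
  return fnv1a(&CS, sizeof(CS), H);
}
```

With this scheme, `foo` line 10 inlined into two different callers produces two distinct hashes, while a non-inlined location contributes a call-stack hash of 0.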

These improvements should bring better performance for FSAFDO,
and they should be used by default. However, this change creates
incompatible FS discriminators. Iterative profile users might
see a performance drop in the first release with this change
(because the profiles have the old discriminators while the
compiler uses the new ones). We have measured this drop to be
no more than 1.5% on several benchmarks. Note that the degradation
should be gone in the second release, and one should expect a
performance gain over the binary without this change.

One possible solution to the iterative profile issue would be to
separate the discriminators used for profile-use from the ones
emitted to the binary. This would require a mechanism to maintain
two sets of discriminators and then phase out the first
approach. This is too much churn in the compiler, and the
performance implications do not seem worth the effort.

Instead, we put the changes under an option so that iterative
profile users can roll out this change gradually. I will change
the option's default value to true in a later patch and eventually
purge this option from the code base.

Diff Detail

Unit Tests: Failed

	Time	Test
	60,080 ms	x64 debian > libFuzzer.libFuzzer::fuzzer-leak.test
	60,040 ms	x64 debian > libFuzzer.libFuzzer::value-profile-load.test

Event Timeline

xur created this revision. (Mar 2 2023, 10:11 AM)

Herald added a project: Restricted Project. (Mar 2 2023, 10:11 AM)

Herald added subscribers: pengfei, hiraditya.

xur requested review of this revision. (Mar 2 2023, 10:11 AM)

Herald added a project: Restricted Project. (Mar 2 2023, 10:11 AM)

Harbormaster completed remote builds in B217003: Diff 501911. (Mar 2 2023, 11:07 AM)

Thanks for the change! I'm going to start the integration with pseudo probe from here.

llvm/lib/CodeGen/MIRFSDiscriminator.cpp
64	A BB could have different names in the release and debug builds of the compiler, or depending on whether `-fdiscard-value-names` is used. This may lead to non-determinism in computing FS discriminators. Maybe consider using an integer ID for the BB? It doesn't exist today, though.
150	Should zero line numbers also get a discriminator? This may result in invalid line offsets, with which all such instructions will end up sharing the same sample.
156	Wondering if `DIL->getFilename()` is still needed, since `CallStackHashVal` includes the caller linkage names, which should help avoid hash conflicts.
167	Wondering if `BBSizeHash` is stable enough run-to-run. If consecutive builds have a slight change in block size, their discriminators may not match? Including the callsite hash `LocationDiscriminator` sounds like a great improvement. I'm wondering how helpful it is to include `BBSizeHash` in the discriminator encoding. Have you evaluated the two changes separately?

xur added inline comments. (Mar 2 2023, 4:12 PM)

llvm/lib/CodeGen/MIRFSDiscriminator.cpp
64	You are exactly right! Note this is the old version of the hash, which will be deprecated. The new version will not use the name anymore.
150	Zero line numbers do not get a discriminator. The number of instructions with a zero line number is actually quite large, and they can easily overflow the discriminators. We have other changes to deal with zero line numbers and instructions with an empty DIL; they are still ongoing.
156	CallStackHashValue is only for DILs with getInlinedAt(). If this is not an inlined instruction, CallStackHashValue returns 0.
167	We have tried a few other ways to compute the hash. It seems better to be strict on the match: if there is a mismatch, we will not use the pass-specific counters, but we still use the average count (branch probability) from previous rounds. This seems to be better than applying wrong BB weights. I don't have the data for with and without BBSizeHash. I can do a quick run to see if it has a performance impact.

hoy added inline comments. (Mar 2 2023, 4:46 PM)

llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h
310 ↗	(On Diff #501911)	How about making this function virtual and overriding it in the MIR sample loader? Something like:

    ErrorOr<uint64_t> getInstWeightImpl(const MachineInstr &Inst) {
      if (ImprovedFSDiscriminator && Inst.isMetaInstruction())
        return std::error_code();
      return SampleProfileLoaderBaseImpl<MachineBasicBlock>::getInstWeightImpl(Inst);
    }
llvm/lib/CodeGen/MIRFSDiscriminator.cpp
64	I see. It's only used in the old version.
82	`getLinkageName` may return an empty string for C functions. We often use a trick to work around it:

    // Use linkage name for C++ if possible.
    auto Name = SP->getLinkageName();
    if (Name.empty())
      Name = SP->getName();

Actually, there is a similar function, `getCallStackHash`, in `SampleProfileProbe.cpp`. Do you think it's possible to unify them and place the result in a common file like `DebugInfo.h`?
113	Is this fixing a bug in the old encoding? BTW, should we assert that the current pass is never the base pass, or that LowBit is never zero?
150	Good to know you are working on a fix.
156	Thanks for pointing it out.
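The linkage-name workaround above (later turned into `getSubprogramLinkageName`) can be sketched standalone. `Subprogram` and `subprogramLinkageName` are hypothetical stand-ins for `DISubprogram` and the proposed helper, and `std::string` replaces `StringRef` so that the example compiles without LLVM. Returning an empty string when there is no subprogram mirrors the `getName()` draft shown in the DebugInfoMetadata.h diff.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for DISubprogram (not the LLVM class).
struct Subprogram {
  std::string LinkageName; // may be empty, e.g. for C functions
  std::string Name;        // source-level name
};

// Prefer the linkage name; fall back to the plain name when it is empty.
// Returns "" when there is no subprogram at all.
inline std::string subprogramLinkageName(const Subprogram *SP) {
  if (!SP)
    return "";
  if (!SP->LinkageName.empty())
    return SP->LinkageName;
  return SP->Name;
}
```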

hoy added inline comments. (Mar 2 2023, 5:14 PM)

llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h
310 ↗	(On Diff #501911)	Or maybe just override the existing virtual function `getInstWeight`, just like `SampleProfileLoader::getInstWeight`

Thanks Hongtao for the reviews and comments. I'll update the patch shortly.

llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h
310 ↗	(On Diff #501911)	I can do that. This way we don't need the shouldIgnoreInst() function. I added shouldIgnoreInst() because there are other types of instructions that SampleProfile tries to ignore too, and I wanted a unified interface for that. But this seems to be overkill.
llvm/lib/CodeGen/MIRFSDiscriminator.cpp
82	Good to know. Have you seen cases where an empty string leads to hash conflicts? Will getName() also return an empty string? We could add another interface in DISubprogram, in DebugInfoMetadata.h?
113	This is a bug fix. I noticed that issue a while ago but did not fix it because it changes some discriminators. I will add an assert here.
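The override-instead-of-predicate approach agreed on here can be illustrated with a self-contained sketch. All names are hypothetical: `Inst`, `Profile`, `LoaderBase`, and `MIRLoader` are not LLVM types, `std::optional` stands in for LLVM's `ErrorOr<uint64_t>`, and the `ImprovedFS` flag stands in for the `ImprovedFSDiscriminator` option. The derived loader returns "no weight" for meta instructions and otherwise defers to the base lookup.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <utility>

// Hypothetical instruction record; in LLVM this would be a MachineInstr
// and the check would be isMetaInstruction().
struct Inst {
  unsigned Line;
  unsigned Discriminator;
  bool IsMeta; // not emitted into the binary
};

// Sample counts keyed by (line, discriminator).
using Profile = std::map<std::pair<unsigned, unsigned>, uint64_t>;

class LoaderBase {
public:
  explicit LoaderBase(Profile P) : Samples(std::move(P)) {}
  virtual ~LoaderBase() = default;

  // Base lookup: the sample count for this location, if any.
  virtual std::optional<uint64_t> getInstWeight(const Inst &I) const {
    auto It = Samples.find({I.Line, I.Discriminator});
    if (It == Samples.end())
      return std::nullopt;
    return It->second;
  }

private:
  Profile Samples;
};

// MIR-level loader: overrides the virtual hook rather than adding a
// separate shouldIgnoreInst() predicate.
class MIRLoader : public LoaderBase {
public:
  MIRLoader(Profile P, bool Improved)
      : LoaderBase(std::move(P)), ImprovedFS(Improved) {}

  std::optional<uint64_t> getInstWeight(const Inst &I) const override {
    if (ImprovedFS && I.IsMeta)
      return std::nullopt; // meta instructions carry no samples
    return LoaderBase::getInstWeight(I);
  }

private:
  bool ImprovedFS;
};
```

Keeping the check inside the override means the option gate and the instruction filter live in one place, which is the simplification xur notes.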

hoy added inline comments. (Mar 6 2023, 11:28 AM)

llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h
310 ↗	(On Diff #501911)	Thanks.
llvm/lib/CodeGen/MIRFSDiscriminator.cpp
82	Actually, I haven't looked deeply into whether empty strings can lead to hash conflicts in this case. `getName()` always returns the original demangled function name, and for C it is just the linkage name. A new interface in DebugInfoMetadata.h sounds good. Perhaps a new member function for `DILocation`?
167	"This seems to be better than applying wrong BB weights." This makes sense. "I don't have the data for with and without BBSizeHash. I can do a quick run to see if it has a performance impact." Yeah, I'm curious to see how impactful it is. Thanks.
llvm/test/CodeGen/X86/fsafdo_test1.ll
5	I have a question about keeping the original discriminator, i.e., 2 here. IIUC, the MIR sample loader will skip loading samples for the instruction. Do you think it should get a new discriminator so that it can use pass-specific counters too? Let me know if I'm missing anything.

xur added inline comments. (Mar 6 2023, 1:55 PM)

llvm/test/CodeGen/X86/fsafdo_test1.ll
5	No. It will be loaded by the MIR sample profile loader. The 2 will be bit-masked, and its count will be attributed to version 0 (i.e., discriminator value 0).
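To make the bit masking concrete, here is a standalone sketch of the semantics described by the `getMaskedDiscriminator` comment in the DebugInfoMetadata.h diff: keep bits 0 through B (B is 0-based) and zero out everything above. `lowBitsMask` and `maskedDiscriminator` are hypothetical names and the code is not copied from LLVM; the `(0x1FF, 7)` case is the example given in that comment.

```cpp
#include <cassert>

// Mask with the B + 1 low bits set (bits 0..B, B is 0-based).
inline unsigned lowBitsMask(unsigned B) {
  assert(B < 32 && "bit index out of range");
  if (B == 31)
    return 0xFFFFFFFFu; // avoid the undefined shift 1u << 32
  return (1u << (B + 1)) - 1;
}

// Zero out bit (B + 1) and above in the discriminator D.
inline unsigned maskedDiscriminator(unsigned D, unsigned B) {
  return D & lowBitsMask(B);
}
```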

Integrated Hongtao's review suggestions.

Herald added a subscriber: ormris. (Mar 6 2023, 5:14 PM)

Harbormaster completed remote builds in B217758: Diff 502867. (Mar 6 2023, 9:47 PM)

This reduces the hash conflicts.

Curious how did you check/detect conflicts/collisions?

one should expect a performance gain over the binary without this change.

How big is the gain you saw?

llvm/lib/CodeGen/MIRFSDiscriminator.cpp
67	Same here: converting to a string first seems unnecessary.
77	MD5, as a cryptographic hash, is expensive; `xxHash64` should be good enough in terms of distribution and collision avoidance. Now that we're changing the discriminator algorithm, I wonder if we should take the opportunity to move to xxHash for the new version. (For compilation with a compact-binary profile, where MD5 is used a lot, MD5 shows up quite hot in the perf profile.)
81	If collision is a concern and we're working to reduce collisions, `^=` is a weak blend/combine function (symmetric, among other problems). Something like the function below is a safer combine function. We have a number of similar `hashCombine` functions in the LLVM code base.

    inline int64_t hashCombine(const int64_t Seed, const int64_t Val) {
      std::hash<int64_t> Hasher;
      return Seed ^ (Hasher(Val) + 0x9e3779b9 + (Seed << 6) + (Seed >> 2));
    }

139	Converting an int to a string just to get a hash seems like overkill. `xxHash64(ArrayRef<uint8_t> Data)` should be fast and good enough.
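The weakness of `^=` that wenlei points out can be checked with a small sketch: XOR-combining is order-insensitive, while the boost-style combine suggested in the comment is not. `xorCombine` and this `hashCombine` are illustrative only. Unlike the snippet in the comment, this version uses `uint64_t` throughout so that left-shifting the seed is always well-defined.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// XOR combine: symmetric, so combine(A, B) == combine(B, A), and
// combining a value with itself cancels to 0.
inline uint64_t xorCombine(uint64_t Seed, uint64_t Val) { return Seed ^ Val; }

// Order-sensitive combine in the boost::hash_combine style from the
// review. All arithmetic is unsigned, so it wraps modulo 2^64.
inline uint64_t hashCombine(uint64_t Seed, uint64_t Val) {
  std::hash<uint64_t> Hasher;
  const uint64_t H = static_cast<uint64_t>(Hasher(Val));
  return Seed ^ (H + 0x9e3779b9u + (Seed << 6) + (Seed >> 2));
}
```

For instance, if frame hashes along an inline chain were XOR-combined, a chain that passes through the same frame twice would cancel that frame out entirely; the order-sensitive combine is not prone to this cancellation.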

hoy added inline comments. (Mar 7 2023, 9:10 AM)

llvm/include/llvm/IR/DebugInfoMetadata.h
1924	nit: name it `getSubprogramLinkageName`?
llvm/lib/CodeGen/AsmPrinter/PseudoProbePrinter.cpp
35	Thanks for fixing this!

In D145171#4174356, @wenlei wrote:

This reduces the hash conflicts.

Curious how did you check/detect conflicts/collisions?

Mostly from eyeballing the change: I manually looked at some functions.
The other indicator was the number of discriminators created for some lines in template headers. For some lines, we overflowed the bits.

one should expect a performance gain over the binary without this change.

How big is the gain you saw?

We measured this on a variety of programs and are seeing improvements ranging from 0.7% to 2.0% on top of the current FSAFDO.
The gain is usually higher for iterative profiles. (This is with BBSize in the hash.)

llvm/include/llvm/IR/DebugInfoMetadata.h
1924	This is a better name. I will change to this.
llvm/lib/CodeGen/MIRFSDiscriminator.cpp
67	Will switch to xxHash64.
77	This is a good idea. I think this is a good opportunity to switch to less expensive hash algorithm.
139	Got it. I will probably remove BBSize from the hash.

The performance data for removing BBSize from the discriminator hash is out: it shows a slight performance gain, about 0.2%, compared with having BBSize in the hash. I will remove BBSize from the hash.

Integrated review comments/suggestions from Wenlei and Hongtao:
(1) remove BBSize from hash
(2) use less expensive hash function
(3) better names for getSubprogramLinkageName()

In D145171#4179269, @xur wrote:

The performance data for removing BBSize from the discriminator hash is out: it shows a slight performance gain, about 0.2%, compared with having BBSize in the hash. I will remove BBSize from the hash.

Good to know removing it helps perf. Can you please update the summary as well? LGTM otherwise.

llvm/lib/CodeGen/MIRFSDiscriminator.cpp
167	The callsite hash is being removed from the discriminator for V1. I guess there will be far fewer discriminator conflicts than in V0?

This revision is now accepted and ready to land. (Mar 8 2023, 3:58 PM)

Harbormaster completed remote builds in B218220: Diff 503527. (Mar 8 2023, 4:35 PM)

In D145171#4179258, @xur wrote:

In D145171#4174356, @wenlei wrote:

This reduces the hash conflicts.

Curious how did you check/detect conflicts/collisions?

Mostly from eyeballing the change: I manually looked at some functions.
The other indicator was the number of discriminators created for some lines in template headers. For some lines, we overflowed the bits.

I'm wondering if we can have a way to systematically detect and report such cases; that would make us aware when this is happening, since it can throttle perf.

one should expect a performance gain over the binary without this change.

How big is the gain you saw?

We measured this on a variety of programs and are seeing improvements ranging from 0.7% to 2.0% on top of the current FSAFDO.
The gain is usually higher for iterative profiles. (This is with BBSize in the hash.)

That's quite promising. Just to double-check, 0.7%-2.0% was the additional improvement from this patch alone, right? What total perf improvement from FSAFDO did you see after this change?
We will measure this internally too.

The change looks good, thanks!

In D145171#4179636, @hoy wrote:

In D145171#4179269, @xur wrote:

The performance data for removing BBSize from the discriminator hash is out: it shows a slight performance gain, about 0.2%, compared with having BBSize in the hash. I will remove BBSize from the hash.

Good to know removing it helps perf. Can you please update the summary as well? LGTM otherwise.

Definitely. I will update the summary and the check-in message.

In D145171#4179924, @wenlei wrote:

In D145171#4179258, @xur wrote:

In D145171#4174356, @wenlei wrote:

This reduces the hash conflicts.

Curious how did you check/detect conflicts/collisions?

Mostly from eyeballing the change: I manually looked at some functions.
The other indicator was the number of discriminators created for some lines in template headers. For some lines, we overflowed the bits.

I'm wondering if we can have a way to systematically detect and report such cases; that would make us aware when this is happening, since it can throttle perf.

I have another follow-up patch that will track and report matches. It does not track conflicts, but let me think about adding this support.

one should expect a performance gain over the binary without this change.

How big is the gain you saw?

We measured this on a variety of programs and are seeing improvements ranging from 0.7% to 2.0% on top of the current FSAFDO.
The gain is usually higher for iterative profiles. (This is with BBSize in the hash.)

That's quite promising. Just to double-check, 0.7%-2.0% was the additional improvement from this patch alone, right? What total perf improvement from FSAFDO did you see after this change?
We will measure this internally too.

The total improvement really depends on the program. On average, I would say 1.5% to 2.0%. For some programs, like clang itself, FSAFDO improves ~3% over AFDO.
There is also a small change on the create_llvm_prof tool side. We will update the tool soon.

I'm also looking forward to hearing the performance number on your tests.

The change looks good, thanks!

llvm/lib/CodeGen/MIRFSDiscriminator.cpp
167	Yes, the callsite hash is not in the discriminator hash for V1; it is now part of the key for the map. We can now have discriminators with the same value that do not conflict with each other, because they belong to different maps. This indirectly increases the range of the discriminators and thus results in fewer conflicts.
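The map-key arrangement xur describes can be sketched as follows. This is a hypothetical simplification, not the MIRFSDiscriminator implementation: `MInst`, `assignDiscriminators`, the per-block numbering, and the rule that the first block for a key keeps value 0 are all assumptions for illustration. Each (line, callsite hash) key owns its own counter, so equal discriminator values in different buckets never collide. Meta instructions and line-0 locations are skipped, as the summary and the zero-line-number discussion describe, so they do not consume values.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical, simplified instruction record (not MachineInstr).
struct MInst {
  unsigned Line;
  uint64_t CallSiteHash; // 0 when the instruction is not inlined
  bool IsMeta;           // not emitted into the binary
  unsigned Block;        // id of the containing basic block
  unsigned Discriminator = 0;
};

// Give each distinct block a new value within its (line, callsite hash)
// bucket; instructions of the same block and key share that value.
inline void assignDiscriminators(std::vector<MInst> &Insts,
                                 unsigned MaxValue) {
  using Key = std::pair<unsigned, uint64_t>; // (line, callsite hash)
  std::map<Key, std::map<unsigned, unsigned>> BlockValue; // key -> block -> value
  for (MInst &I : Insts) {
    if (I.IsMeta || I.Line == 0)
      continue; // do not spend discriminator values on these
    std::map<unsigned, unsigned> &PerBlock =
        BlockValue[Key(I.Line, I.CallSiteHash)];
    auto It = PerBlock.find(I.Block);
    if (It == PerBlock.end()) {
      const unsigned Value = static_cast<unsigned>(PerBlock.size());
      if (Value > MaxValue)
        continue; // out of bits for this pass: leave it unchanged
      It = PerBlock.emplace(I.Block, Value).first;
    }
    I.Discriminator = It->second;
  }
}
```

In this sketch, two blocks with the same line but different callsite hashes can both end up with value 0 without conflicting, and a meta instruction between them does not push later blocks to higher values.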

xur edited the summary of this revision. (Mar 9 2023, 10:08 AM)

In D145171#4182014, @xur wrote:

The total improvement really depends on the program. On average, I would say 1.5% to 2.0%. For some programs, like clang itself, FSAFDO improves ~3% over AFDO.
There is also a small change on the create_llvm_prof tool side. We will update the tool soon.

The numbers are exciting! I'll let you know our numbers once ready.

BTW, I've started the integration with CSSPGO based off this. Pseudo probes have the potential to allow more bits for the FS discriminator and additional FS passes. Implementation-wise, the potential comes from placing the FS discriminator field in a separate int64 operand of the pseudo probe intrinsic, instead of using the existing dbg metadata. I would like your thoughts on whether it's worth the diversion. Thanks.

Closed by commit rGebe09e2a9556: [FSAFDO] Improve FS discriminator encoding (authored by xur). (Mar 9 2023, 11:19 PM)

This revision was automatically updated to reflect the committed changes.

xur added a commit: rGebe09e2a9556: [FSAFDO] Improve FS discriminator encoding.

There is also a small change on the create_llvm_prof tool side. We will update the tool soon.

What kind of change is that? We'd appreciate it if you could drop us a note when the change is up, since we don't closely follow create_llvm_prof updates.

BTW, I've started the integration with CSSPGO based off this. Pseudo probes have the potential to allow more bits for the FS discriminator and additional FS passes. Implementation-wise, the potential comes from placing the FS discriminator field in a separate int64 operand of the pseudo probe intrinsic, instead of using the existing dbg metadata. I would like your thoughts on whether it's worth the diversion. Thanks.

All things being equal, I'd favor consistency. But there's a potential benefit in allowing more FS profile loading, though whether that is material depends on whether we will actually leverage the ability to load more than 4 profiles. My intuition is that loading the base profile + pre-RA profile + pre-layout profile, as is done today, is probably good enough.

shenhan mentioned this in D152399: [CodeGen] Fine tune MachineFunctionSplitPass (MFS) for FSAFDO. (Jun 7 2023, 2:44 PM)

shenhan mentioned this in rG8df75969ae70: [CodeGen] Fine tune MachineFunctionSplitPass (MFS) for FSAFDO. (Jul 10 2023, 4:02 PM)

Revision Contents

Path	Size
llvm/include/llvm/IR/DebugInfoMetadata.h	508 lines
llvm/lib/CodeGen/AsmPrinter/PseudoProbePrinter.cpp	6 lines
llvm/lib/CodeGen/MIRFSDiscriminator.cpp	71 lines
llvm/lib/CodeGen/MIRSampleProfile.cpp	6 lines
llvm/lib/CodeGen/PseudoProbeInserter.cpp	5 lines
llvm/lib/Transforms/IPO/SampleProfileProbe.cpp	6 lines
llvm/test/CodeGen/X86/ (Inputs/ and test files)	35, 9, 45, 14, and 7 lines

Diff 502867

llvm/include/llvm/IR/DebugInfoMetadata.h

(1,593 unchanged lines not shown)	public:

static bool classof(const Metadata *MD) {
return MD->getMetadataID() == DISubprogramKind ||
MD->getMetadataID() == DILexicalBlockKind ||
MD->getMetadataID() == DILexicalBlockFileKind;
}
};

/// Debug location.
///
/// A debug location in source code, used for debug info and otherwise.
class DILocation : public MDNode {
friend class LLVMContextImpl;
friend class MDNode;

DILocation(LLVMContext &C, StorageType Storage, unsigned Line,
unsigned Column, ArrayRef<Metadata *> MDs, bool ImplicitCode);
~DILocation() { dropAllReferences(); }

static DILocation *getImpl(LLVMContext &Context, unsigned Line,
unsigned Column, Metadata *Scope,
Metadata *InlinedAt, bool ImplicitCode,
StorageType Storage, bool ShouldCreate = true);
static DILocation *getImpl(LLVMContext &Context, unsigned Line,
unsigned Column, DILocalScope *Scope,
DILocation *InlinedAt, bool ImplicitCode,
StorageType Storage, bool ShouldCreate = true) {
return getImpl(Context, Line, Column, static_cast<Metadata *>(Scope),
static_cast<Metadata *>(InlinedAt), ImplicitCode, Storage,
ShouldCreate);
}

TempDILocation cloneImpl() const {
// Get the raw scope/inlinedAt since it is possible to invoke this on
// a DILocation containing temporary metadata.
return getTemporary(getContext(), getLine(), getColumn(), getRawScope(),
getRawInlinedAt(), isImplicitCode());
}

public:
// Disallow replacing operands.
void replaceOperandWith(unsigned I, Metadata *New) = delete;

DEFINE_MDNODE_GET(DILocation,
(unsigned Line, unsigned Column, Metadata *Scope,
Metadata *InlinedAt = nullptr, bool ImplicitCode = false),
(Line, Column, Scope, InlinedAt, ImplicitCode))
DEFINE_MDNODE_GET(DILocation,
(unsigned Line, unsigned Column, DILocalScope *Scope,
DILocation *InlinedAt = nullptr,
bool ImplicitCode = false),
(Line, Column, Scope, InlinedAt, ImplicitCode))

/// Return a (temporary) clone of this.
TempDILocation clone() const { return cloneImpl(); }

unsigned getLine() const { return SubclassData32; }
unsigned getColumn() const { return SubclassData16; }
DILocalScope *getScope() const { return cast<DILocalScope>(getRawScope()); }

DILocation *getInlinedAt() const {
return cast_or_null<DILocation>(getRawInlinedAt());
}

/// Check if the location corresponds to an implicit code.
/// When the ImplicitCode flag is true, it means that the Instruction
/// with this DILocation has been added by the front-end but it hasn't been
/// written explicitly by the user (e.g. cleanup stuff in C++ put on a closing
/// bracket). It's useful for code coverage to not show a counter on "empty"
/// lines.
bool isImplicitCode() const { return SubclassData1; }
void setImplicitCode(bool ImplicitCode) { SubclassData1 = ImplicitCode; }

DIFile *getFile() const { return getScope()->getFile(); }
StringRef getFilename() const { return getScope()->getFilename(); }
StringRef getDirectory() const { return getScope()->getDirectory(); }
std::optional<StringRef> getSource() const { return getScope()->getSource(); }

/// Get the scope where this is inlined.
///
/// Walk through \a getInlinedAt() and return \a getScope() from the deepest
/// location.
DILocalScope *getInlinedAtScope() const {
if (auto *IA = getInlinedAt())
return IA->getInlinedAtScope();
return getScope();
}

/// Get the DWARF discriminator.
///
/// DWARF discriminators distinguish identical file locations between
/// instructions that are on different basic blocks.
///
/// There are 3 components stored in discriminator, from lower bits:
///
/// Base discriminator: assigned by AddDiscriminators pass to identify IRs
/// that are defined by the same source line, but
/// different basic blocks.
/// Duplication factor: assigned by optimizations that will scale down
/// the execution frequency of the original IR.
/// Copy Identifier: assigned by optimizations that clones the IR.
/// Each copy of the IR will be assigned an identifier.
///
/// Encoding:
///
/// The above 3 components are encoded into a 32bit unsigned integer in
/// order. If the lowest bit is 1, the current component is empty, and the
/// next component will start in the next bit. Otherwise, the current
/// component is non-empty, and its content starts in the next bit. The
/// value of each components is either 5 bit or 12 bit: if the 7th bit
/// is 0, the bit 2~6 (5 bits) are used to represent the component; if the
/// 7th bit is 1, the bit 2~6 (5 bits) and 8~14 (7 bits) are combined to
/// represent the component. Thus, the number of bits used for a component
/// is either 0 (if it and all the next components are empty); 1 - if it is
/// empty; 7 - if its value is up to and including 0x1f (lsb and msb are both
/// 0); or 14, if its value is up to and including 0x1ff. Note that the last
/// component is also capped at 0x1ff, even in the case when both first
/// components are 0, and we'd technically have 29 bits available.
///
/// For precise control over the data being encoded in the discriminator,
/// use encodeDiscriminator/decodeDiscriminator.

inline unsigned getDiscriminator() const;

// For the regular discriminator, it stands for all empty components if all
// the lowest 3 bits are non-zero and all higher 29 bits are unused(zero by
// default). Here we fully leverage the higher 29 bits for pseudo probe use.
// This is the format:
// [2:0] - 0x7
// [31:3] - pseudo probe fields guaranteed to be non-zero as a whole
// So if the lower 3 bits is non-zero and the others has at least one
// non-zero bit, it guarantees to be a pseudo probe discriminator
inline static bool isPseudoProbeDiscriminator(unsigned Discriminator) {
return ((Discriminator & 0x7) == 0x7) && (Discriminator & 0xFFFFFFF8);
}

/// Returns a new DILocation with updated \p Discriminator.
inline const DILocation *cloneWithDiscriminator(unsigned Discriminator) const;

/// Returns a new DILocation with updated base discriminator \p BD. Only the
/// base discriminator is set in the new DILocation, the other encoded values
/// are elided.
/// If the discriminator cannot be encoded, the function returns std::nullopt.
inline std::optional<const DILocation *>
cloneWithBaseDiscriminator(unsigned BD) const;

/// Returns the duplication factor stored in the discriminator, or 1 if no
/// duplication factor (or 0) is encoded.
inline unsigned getDuplicationFactor() const;

/// Returns the copy identifier stored in the discriminator.
inline unsigned getCopyIdentifier() const;

/// Returns the base discriminator stored in the discriminator.
inline unsigned getBaseDiscriminator() const;

/// Returns a new DILocation with duplication factor \p DF * current
/// duplication factor encoded in the discriminator. The current duplication
/// factor is as defined by getDuplicationFactor().
/// Returns std::nullopt if encoding failed.
inline std::optional<const DILocation *>
cloneByMultiplyingDuplicationFactor(unsigned DF) const;

/// When two instructions are combined into a single instruction we also
/// need to combine the original locations into a single location.
/// When the locations are the same we can use either location.
/// When they differ, we need a third location which is distinct from either.
/// If they share a common scope, use this scope and compare the line/column
/// pair of the locations with the common scope:
/// * if both match, keep the line and column;
/// * if only the line number matches, keep the line and set the column as 0;
/// * otherwise set line and column as 0.
/// If they do not share a common scope the location is ambiguous and can't be
/// represented in a line entry. In this case, set line and column as 0 and
/// use the scope of any location.
///
/// \p LocA \p LocB: The locations to be merged.
static const DILocation *getMergedLocation(const DILocation *LocA,
const DILocation *LocB);

/// Try to combine the vector of locations passed as input in a single one.
/// This function applies getMergedLocation() repeatedly left-to-right.
///
/// \p Locs: The locations to be merged.
static const DILocation *
getMergedLocations(ArrayRef<const DILocation *> Locs);

/// Return the masked discriminator value for an input discrimnator value D
/// (i.e. zero out the (B+1)-th and above bits for D (B is 0-base).
// Example: an input of (0x1FF, 7) returns 0xFF.
static unsigned getMaskedDiscriminator(unsigned D, unsigned B) {
return (D & getN1Bits(B));
}

/// Return the bits used for base discriminators.
static unsigned getBaseDiscriminatorBits() { return getBaseFSBitEnd(); }

/// Returns the base discriminator for a given encoded discriminator \p D.
static unsigned
getBaseDiscriminatorFromDiscriminator(unsigned D,
bool IsFSDiscriminator = false) {
if (IsFSDiscriminator)
return getMaskedDiscriminator(D, getBaseDiscriminatorBits());
return getUnsignedFromPrefixEncoding(D);
}

/// Raw encoding of the discriminator. APIs such as cloneWithDuplicationFactor
/// have certain special case behavior (e.g. treating empty duplication factor
/// as the value '1').
/// This API, in conjunction with cloneWithDiscriminator, may be used to
/// encode the raw values provided.
///
/// \p BD: base discriminator
/// \p DF: duplication factor
/// \p CI: copy index
///
/// The return is std::nullopt if the values cannot be encoded in 32 bits -
/// for example, values for BD or DF larger than 12 bits. Otherwise, the
/// return is the encoded value.
static std::optional<unsigned> encodeDiscriminator(unsigned BD, unsigned DF,
unsigned CI);

/// Raw decoder for values in an encoded discriminator D.
static void decodeDiscriminator(unsigned D, unsigned &BD, unsigned &DF,
unsigned &CI);

/// Returns the duplication factor for a given encoded discriminator \p D, or
/// 1 if no value or 0 is encoded.
static unsigned getDuplicationFactorFromDiscriminator(unsigned D) {
if (EnableFSDiscriminator)
return 1;
D = getNextComponentInDiscriminator(D);
unsigned Ret = getUnsignedFromPrefixEncoding(D);
if (Ret == 0)
return 1;
return Ret;
}

/// Returns the copy identifier for a given encoded discriminator \p D.
static unsigned getCopyIdentifierFromDiscriminator(unsigned D) {
return getUnsignedFromPrefixEncoding(
getNextComponentInDiscriminator(getNextComponentInDiscriminator(D)));
}

Metadata *getRawScope() const { return getOperand(0); }
Metadata *getRawInlinedAt() const {
if (getNumOperands() == 2)
return getOperand(1);
return nullptr;
}

static bool classof(const Metadata *MD) {
return MD->getMetadataID() == DILocationKind;
}
};

/// Subprogram description.
class DISubprogram : public DILocalScope {
friend class LLVMContextImpl;
friend class MDNode;

unsigned Line;
unsigned ScopeLine;
unsigned VirtualIndex;
(252 unchanged lines not shown)	public:
/// FIXME: Should this be looking through bitcasts?
bool describes(const Function *F) const;

static bool classof(const Metadata *MD) {
return MD->getMetadataID() == DISubprogramKind;
}
};

		/// Debug location.
		///
		/// A debug location in source code, used for debug info and otherwise.
		class DILocation : public MDNode {
		friend class LLVMContextImpl;
		friend class MDNode;

		DILocation(LLVMContext &C, StorageType Storage, unsigned Line,
		unsigned Column, ArrayRef<Metadata *> MDs, bool ImplicitCode);
		~DILocation() { dropAllReferences(); }

		static DILocation *getImpl(LLVMContext &Context, unsigned Line,
		unsigned Column, Metadata *Scope,
		Metadata *InlinedAt, bool ImplicitCode,
		StorageType Storage, bool ShouldCreate = true);
		static DILocation *getImpl(LLVMContext &Context, unsigned Line,
		unsigned Column, DILocalScope *Scope,
		DILocation *InlinedAt, bool ImplicitCode,
		StorageType Storage, bool ShouldCreate = true) {
		return getImpl(Context, Line, Column, static_cast<Metadata *>(Scope),
		static_cast<Metadata *>(InlinedAt), ImplicitCode, Storage,
		ShouldCreate);
		}

		TempDILocation cloneImpl() const {
		// Get the raw scope/inlinedAt since it is possible to invoke this on
		// a DILocation containing temporary metadata.
		return getTemporary(getContext(), getLine(), getColumn(), getRawScope(),
		getRawInlinedAt(), isImplicitCode());
		}

		public:
		// Disallow replacing operands.
		void replaceOperandWith(unsigned I, Metadata *New) = delete;

		DEFINE_MDNODE_GET(DILocation,
		(unsigned Line, unsigned Column, Metadata *Scope,
		Metadata *InlinedAt = nullptr, bool ImplicitCode = false),
		(Line, Column, Scope, InlinedAt, ImplicitCode))
		DEFINE_MDNODE_GET(DILocation,
		(unsigned Line, unsigned Column, DILocalScope *Scope,
		DILocation *InlinedAt = nullptr,
		bool ImplicitCode = false),
		(Line, Column, Scope, InlinedAt, ImplicitCode))

		/// Return a (temporary) clone of this.
		TempDILocation clone() const { return cloneImpl(); }

		unsigned getLine() const { return SubclassData32; }
		unsigned getColumn() const { return SubclassData16; }
		DILocalScope *getScope() const { return cast<DILocalScope>(getRawScope()); }

		/// Return the linkage name of Subprogram. If the linkage name is empty,
		/// return scope name (the demangled name).
		const StringRef getName() const {
		hoy: nit: name it `getSubprogramLinkageName`?
		xur: This is a better name. I will change to this.
		DISubprogram *SP = getScope()->getSubprogram();
		if (!SP)
		return "";
		auto Name = SP->getLinkageName();
		if (!Name.empty())
		return Name;
		return SP->getName();
		}

		DILocation *getInlinedAt() const {
		return cast_or_null<DILocation>(getRawInlinedAt());
		}

		/// Check if the location corresponds to an implicit code.
		/// When the ImplicitCode flag is true, it means that the Instruction
		/// with this DILocation has been added by the front-end but it hasn't been
		/// written explicitly by the user (e.g. cleanup stuff in C++ put on a closing
		/// bracket). It's useful for code coverage to not show a counter on "empty"
		/// lines.
		bool isImplicitCode() const { return SubclassData1; }
		void setImplicitCode(bool ImplicitCode) { SubclassData1 = ImplicitCode; }

		DIFile *getFile() const { return getScope()->getFile(); }
		StringRef getFilename() const { return getScope()->getFilename(); }
		StringRef getDirectory() const { return getScope()->getDirectory(); }
		std::optional<StringRef> getSource() const { return getScope()->getSource(); }

		/// Get the scope where this is inlined.
		///
		/// Walk through \a getInlinedAt() and return \a getScope() from the deepest
		/// location.
		DILocalScope *getInlinedAtScope() const {
		if (auto *IA = getInlinedAt())
		return IA->getInlinedAtScope();
		return getScope();
		}

		/// Get the DWARF discriminator.
		///
		/// DWARF discriminators distinguish identical file locations between
		/// instructions that are on different basic blocks.
		///
		/// There are 3 components stored in discriminator, from lower bits:
		///
		/// Base discriminator: assigned by AddDiscriminators pass to identify IRs
		/// that are defined by the same source line, but
		/// different basic blocks.
		/// Duplication factor: assigned by optimizations that will scale down
		/// the execution frequency of the original IR.
		/// Copy Identifier: assigned by optimizations that clone the IR.
		/// Each copy of the IR will be assigned an identifier.
		///
		/// Encoding:
		///
		/// The above 3 components are encoded into a 32bit unsigned integer in
		/// order. If the lowest bit is 1, the current component is empty, and the
		/// next component will start in the next bit. Otherwise, the current
		/// component is non-empty, and its content starts in the next bit. The
		/// value of each component is either 5 or 12 bits: if the 7th bit
		/// is 0, bits 2~6 (5 bits) are used to represent the component; if the
		/// 7th bit is 1, bits 2~6 (5 bits) and 8~14 (7 bits) are combined to
		/// represent the component. Thus, the number of bits used for a component
		/// is either 0 (if it and all the next components are empty); 1 - if it is
		/// empty; 7 - if its value is up to and including 0x1f (lsb and msb are both
		/// 0); or 14, if its value is up to and including 0xfff. Note that the last
		/// component is also capped at 0xfff, even in the case when both first
		/// components are 0, and we'd technically have 29 bits available.
		///
		/// For precise control over the data being encoded in the discriminator,
		/// use encodeDiscriminator/decodeDiscriminator.

		inline unsigned getDiscriminator() const;
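A minimal sketch of the prefix encoding described in the comment above, written as standalone helpers. `prefixEncode`, `componentWidth`, `encode3`, `decodeComponent`, `nextComponent` and `decode3` are hypothetical names, not LLVM APIs. The sketch packs base discriminator, duplication factor and copy identifier from the low bits up and reads them back. Three empty components encode to `0x7`, which is the pattern the pseudo-probe comment relies on.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Payload bits of a non-empty component, before the leading "empty" marker
// bit is prepended. Values must fit in 12 bits.
static uint32_t prefixEncode(uint32_t V) {
  assert(V != 0 && V <= 0xfff && "payload must be 1..0xfff");
  if (V <= 0x1f)
    return V;                                    // 5 value bits, flag clear
  return ((V & 0xfe0) << 1) | 0x20 | (V & 0x1f); // 5 low bits, flag, 7 high
}

// Bits occupied by one component: 1 (empty), 7 (<= 0x1f) or 14 (<= 0xfff).
static unsigned componentWidth(uint32_t V) {
  if (V == 0)
    return 1;
  return V <= 0x1f ? 7 : 14;
}

// Encodes (BD, DF, CI) from the low bits up. Returns std::nullopt if a value
// needs more than 12 bits or the total exceeds 32 bits. The comment allows
// trailing empty components to take 0 bits; this sketch still writes one
// marker bit for each.
static std::optional<uint32_t> encode3(uint32_t BD, uint32_t DF, uint32_t CI) {
  const uint32_t Parts[] = {BD, DF, CI};
  uint64_t Result = 0;
  unsigned Shift = 0;
  for (uint32_t V : Parts) {
    if (V > 0xfff)
      return std::nullopt;
    uint64_t Enc = V == 0 ? 1 : uint64_t(prefixEncode(V)) << 1;
    Result |= Enc << Shift;
    Shift += componentWidth(V);
  }
  if (Shift > 32)
    return std::nullopt;
  return static_cast<uint32_t>(Result);
}

// Value of the component that starts at bit 0 of D.
static uint32_t decodeComponent(uint32_t D) {
  if (D & 1)
    return 0; // empty component
  D >>= 1;
  return (D & 0x20) ? (((D >> 1) & 0xfe0) | (D & 0x1f)) : (D & 0x1f);
}

// Drops the component that starts at bit 0 of D.
static uint32_t nextComponent(uint32_t D) {
  if (D & 1)
    return D >> 1;
  return D >> ((D & 0x40) ? 14 : 7);
}

static void decode3(uint32_t D, uint32_t &BD, uint32_t &DF, uint32_t &CI) {
  BD = decodeComponent(D);
  D = nextComponent(D);
  DF = decodeComponent(D);
  D = nextComponent(D);
  CI = decodeComponent(D);
}
```

The decoder reads the marker bit first, then 5 value bits and a flag bit, then 7 more value bits only when the flag is set, which is why a component is 1, 7 or 14 bits wide.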

		// For the regular discriminator, it stands for all empty components if all
		// the lowest 3 bits are non-zero and all higher 29 bits are unused (zero by
		// default). Here we fully leverage the higher 29 bits for pseudo probe use.
		// This is the format:
		// [2:0] - 0x7
		// [31:3] - pseudo probe fields guaranteed to be non-zero as a whole
		// So if the lower 3 bits are all set and the other bits have at least one
		// non-zero bit, it is guaranteed to be a pseudo probe discriminator.
		inline static bool isPseudoProbeDiscriminator(unsigned Discriminator) {
		return ((Discriminator & 0x7) == 0x7) && (Discriminator & 0xFFFFFFF8);
		}
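To make the predicate concrete, this sketch copies it into a standalone function and contrasts a regular discriminator whose three components are all empty (`0x7`) with a value that also carries a non-zero payload in bits [31:3]. `makeProbeLikeDiscriminator` is a hypothetical helper for illustration only; it does not reproduce the real pseudo-probe field layout.

```cpp
#include <cassert>
#include <cstdint>

// Same predicate as isPseudoProbeDiscriminator.
static bool isPseudoProbeDiscriminator(uint32_t D) {
  return ((D & 0x7) == 0x7) && (D & 0xFFFFFFF8);
}

// Hypothetical packing: a non-zero payload in bits [31:3] on top of the 0x7
// marker. The real probe fields are laid out by PseudoProbeDwarfDiscriminator,
// which this sketch does not model.
static uint32_t makeProbeLikeDiscriminator(uint32_t Payload) {
  assert(Payload != 0 && Payload < (1u << 29) && "payload must fit in 29 bits");
  return (Payload << 3) | 0x7;
}
```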

		/// Returns a new DILocation with updated \p Discriminator.
		inline const DILocation *cloneWithDiscriminator(unsigned Discriminator) const;

		/// Returns a new DILocation with updated base discriminator \p BD. Only the
		/// base discriminator is set in the new DILocation, the other encoded values
		/// are elided.
		/// If the discriminator cannot be encoded, the function returns std::nullopt.
		inline std::optional<const DILocation *>
		cloneWithBaseDiscriminator(unsigned BD) const;

		/// Returns the duplication factor stored in the discriminator, or 1 if no
		/// duplication factor (or 0) is encoded.
		inline unsigned getDuplicationFactor() const;

		/// Returns the copy identifier stored in the discriminator.
		inline unsigned getCopyIdentifier() const;

		/// Returns the base discriminator stored in the discriminator.
		inline unsigned getBaseDiscriminator() const;

		/// Returns a new DILocation with duplication factor \p DF * current
		/// duplication factor encoded in the discriminator. The current duplication
		/// factor is as defined by getDuplicationFactor().
		/// Returns std::nullopt if encoding failed.
		inline std::optional<const DILocation *>
		cloneByMultiplyingDuplicationFactor(unsigned DF) const;

		/// When two instructions are combined into a single instruction we also
		/// need to combine the original locations into a single location.
		/// When the locations are the same we can use either location.
		/// When they differ, we need a third location which is distinct from either.
		/// If they share a common scope, use this scope and compare the line/column
		/// pair of the locations with the common scope:
		/// * if both match, keep the line and column;
		/// * if only the line number matches, keep the line and set the column as 0;
		/// * otherwise set line and column as 0.
		/// If they do not share a common scope the location is ambiguous and can't be
		/// represented in a line entry. In this case, set line and column as 0 and
		/// use the scope of any location.
		///
		/// \p LocA \p LocB: The locations to be merged.
		static const DILocation *getMergedLocation(const DILocation *LocA,
		const DILocation *LocB);
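The common-scope rules listed above reduce to a comparison of line/column pairs. This sketch assumes both locations have already been brought to their common scope, which it does not model; `LineCol` and `mergeSameScope` are hypothetical names.

```cpp
#include <cassert>

// Line/column of two locations that already share a common scope.
struct LineCol {
  unsigned Line;
  unsigned Column;
};

// Applies the documented rules for the common-scope case.
static LineCol mergeSameScope(LineCol A, LineCol B) {
  if (A.Line == B.Line && A.Column == B.Column)
    return A;           // both match: keep line and column
  if (A.Line == B.Line)
    return {A.Line, 0}; // only the line matches: column becomes 0
  return {0, 0};        // otherwise line and column are both 0
}
```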

		/// Try to combine the vector of locations passed as input in a single one.
		/// This function applies getMergedLocation() repeatedly left-to-right.
		///
		/// \p Locs: The locations to be merged.
		static const DILocation *
		getMergedLocations(ArrayRef<const DILocation *> Locs);

		/// Return the masked discriminator value for an input discriminator value D
		/// (i.e. zero out the (B+1)-th and above bits for D; B is 0-based).
		/// Example: an input of (0x1FF, 7) returns 0xFF.
		static unsigned getMaskedDiscriminator(unsigned D, unsigned B) {
		return (D & getN1Bits(B));
		}
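A small sketch of the masking, assuming (as the example in the comment implies) that `getN1Bits(B)` returns a mask with bits 0 through B set. `n1Bits` and `maskedDiscriminator` are hypothetical stand-ins for the LLVM helpers.

```cpp
#include <cassert>
#include <cstdint>

// Assumed semantics of getN1Bits: bits 0..B set.
static uint32_t n1Bits(unsigned B) {
  assert(B < 32 && "bit index out of range");
  return B == 31 ? 0xFFFFFFFFu : (1u << (B + 1)) - 1;
}

// Keeps bits 0..B of D and clears everything above.
static uint32_t maskedDiscriminator(uint32_t D, unsigned B) {
  return D & n1Bits(B);
}
```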

		/// Return the bits used for base discriminators.
		static unsigned getBaseDiscriminatorBits() { return getBaseFSBitEnd(); }

		/// Returns the base discriminator for a given encoded discriminator \p D.
		static unsigned
		getBaseDiscriminatorFromDiscriminator(unsigned D,
		bool IsFSDiscriminator = false) {
		if (IsFSDiscriminator)
		return getMaskedDiscriminator(D, getBaseDiscriminatorBits());
		return getUnsignedFromPrefixEncoding(D);
		}

		/// Raw encoding of the discriminator. APIs such as cloneWithDuplicationFactor
		/// have certain special case behavior (e.g. treating empty duplication factor
		/// as the value '1').
		/// This API, in conjunction with cloneWithDiscriminator, may be used to
		/// encode the raw values provided.
		///
		/// \p BD: base discriminator
		/// \p DF: duplication factor
		/// \p CI: copy index
		///
		/// The return is std::nullopt if the values cannot be encoded in 32 bits -
		/// for example, values for BD or DF larger than 12 bits. Otherwise, the
		/// return is the encoded value.
		static std::optional<unsigned> encodeDiscriminator(unsigned BD, unsigned DF,
		unsigned CI);

		/// Raw decoder for values in an encoded discriminator D.
		static void decodeDiscriminator(unsigned D, unsigned &BD, unsigned &DF,
		unsigned &CI);

		/// Returns the duplication factor for a given encoded discriminator \p D, or
		/// 1 if no value or 0 is encoded.
		static unsigned getDuplicationFactorFromDiscriminator(unsigned D) {
		if (EnableFSDiscriminator)
		return 1;
		D = getNextComponentInDiscriminator(D);
		unsigned Ret = getUnsignedFromPrefixEncoding(D);
		if (Ret == 0)
		return 1;
		return Ret;
		}

		/// Returns the copy identifier for a given encoded discriminator \p D.
		static unsigned getCopyIdentifierFromDiscriminator(unsigned D) {
		return getUnsignedFromPrefixEncoding(
		getNextComponentInDiscriminator(getNextComponentInDiscriminator(D)));
		}

		Metadata *getRawScope() const { return getOperand(0); }
		Metadata *getRawInlinedAt() const {
		if (getNumOperands() == 2)
		return getOperand(1);
		return nullptr;
		}

		static bool classof(const Metadata *MD) {
		return MD->getMetadataID() == DILocationKind;
		}
		};

class DILexicalBlockBase : public DILocalScope {		class DILexicalBlockBase : public DILocalScope {
protected:		protected:
DILexicalBlockBase(LLVMContext &C, unsigned ID, StorageType Storage,		DILexicalBlockBase(LLVMContext &C, unsigned ID, StorageType Storage,
ArrayRef<Metadata *> Ops);		ArrayRef<Metadata *> Ops);
~DILexicalBlockBase() = default;		~DILexicalBlockBase() = default;

public:		public:
DILocalScope *getScope() const { return cast<DILocalScope>(getRawScope()); }		DILocalScope *getScope() const { return cast<DILocalScope>(getRawScope()); }

llvm/lib/CodeGen/AsmPrinter/PseudoProbePrinter.cpp

void PseudoProbeHandler::emitPseudoProbe(uint64_t Guid, uint64_t Index,
const DILocation *DebugLoc) {		const DILocation *DebugLoc) {
// Gather all the inlined-at nodes.		// Gather all the inlined-at nodes.
// When it's done ReversedInlineStack looks like ([66, B], [88, A])		// When it's done ReversedInlineStack looks like ([66, B], [88, A])
// which means, Function A inlines function B at callsite with a probe id 88,		// which means, Function A inlines function B at callsite with a probe id 88,
// and B inlines C at probe 66 where C is represented by Guid.		// and B inlines C at probe 66 where C is represented by Guid.
SmallVector<InlineSite, 8> ReversedInlineStack;		SmallVector<InlineSite, 8> ReversedInlineStack;
auto *InlinedAt = DebugLoc ? DebugLoc->getInlinedAt() : nullptr;		auto *InlinedAt = DebugLoc ? DebugLoc->getInlinedAt() : nullptr;
while (InlinedAt) {		while (InlinedAt) {
const DISubprogram *SP = InlinedAt->getScope()->getSubprogram();		auto Name = InlinedAt->getName();
		hoy: Thanks for fixing this!
// Use linkage name for C++ if possible.
auto Name = SP->getLinkageName();
if (Name.empty())
Name = SP->getName();
// Use caching to avoid redundant md5 computation for build speed.		// Use caching to avoid redundant md5 computation for build speed.
uint64_t &CallerGuid = NameGuidMap[Name];		uint64_t &CallerGuid = NameGuidMap[Name];
if (!CallerGuid)		if (!CallerGuid)
CallerGuid = Function::getGUID(Name);		CallerGuid = Function::getGUID(Name);
uint64_t CallerProbeId = PseudoProbeDwarfDiscriminator::extractProbeIndex(		uint64_t CallerProbeId = PseudoProbeDwarfDiscriminator::extractProbeIndex(
InlinedAt->getDiscriminator());		InlinedAt->getDiscriminator());
ReversedInlineStack.emplace_back(CallerGuid, CallerProbeId);		ReversedInlineStack.emplace_back(CallerGuid, CallerProbeId);
InlinedAt = InlinedAt->getInlinedAt();		InlinedAt = InlinedAt->getInlinedAt();
}		}

SmallVector<InlineSite, 8> InlineStack(llvm::reverse(ReversedInlineStack));		SmallVector<InlineSite, 8> InlineStack(llvm::reverse(ReversedInlineStack));
Asm->OutStreamer->emitPseudoProbe(Guid, Index, Type, Attr, InlineStack,		Asm->OutStreamer->emitPseudoProbe(Guid, Index, Type, Attr, InlineStack,
Asm->CurrentFnSym);		Asm->CurrentFnSym);
}		}
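The inline-stack walk above can be sketched with a simplified location type. `Loc`, `InlineSite` and `inlineStack` are hypothetical stand-ins: they use function names instead of GUIDs and store the probe id directly instead of extracting it from the discriminator. The test builds the chain from the comment, where A inlines B at probe 88 and B inlines C at probe 66.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Simplified DILocation: the containing function, the probe id recorded at
// this location, and the location it was inlined at (nullptr if none).
struct Loc {
  std::string Function;
  unsigned ProbeId;
  const Loc *InlinedAt;
};

using InlineSite = std::pair<std::string, unsigned>;

// Collects inlined-at frames innermost caller first, then reverses them so
// the outermost caller comes first.
static std::vector<InlineSite> inlineStack(const Loc *DebugLoc) {
  std::vector<InlineSite> Reversed;
  for (const Loc *IA = DebugLoc ? DebugLoc->InlinedAt : nullptr; IA;
       IA = IA->InlinedAt)
    Reversed.emplace_back(IA->Function, IA->ProbeId);
  return std::vector<InlineSite>(Reversed.rbegin(), Reversed.rend());
}
```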

llvm/lib/CodeGen/MIRFSDiscriminator.cpp

	#include "llvm/Transforms/Utils/SampleProfileLoaderBaseUtil.h"			#include "llvm/Transforms/Utils/SampleProfileLoaderBaseUtil.h"

	using namespace llvm;			using namespace llvm;
	using namespace sampleprof;			using namespace sampleprof;
	using namespace sampleprofutil;			using namespace sampleprofutil;

	#define DEBUG_TYPE "mirfs-discriminators"			#define DEBUG_TYPE "mirfs-discriminators"

				// TODO(xur): Remove this option and related code once true becomes the
				// default.
				cl::opt<bool> ImprovedFSDiscriminator(
				"improved-fs-discriminator", cl::Hidden, cl::init(false),
				cl::desc("New FS discriminators encoding (incompatible with the original "
				"encoding)"));

	char MIRAddFSDiscriminators::ID = 0;			char MIRAddFSDiscriminators::ID = 0;

	INITIALIZE_PASS(MIRAddFSDiscriminators, DEBUG_TYPE,			INITIALIZE_PASS(MIRAddFSDiscriminators, DEBUG_TYPE,
	"Add MIR Flow Sensitive Discriminators",			"Add MIR Flow Sensitive Discriminators",
	/* cfg = */ false, /* is_analysis = */ false)			/* cfg = */ false, /* is_analysis = */ false)

	char &llvm::MIRAddFSDiscriminatorsID = MIRAddFSDiscriminators::ID;			char &llvm::MIRAddFSDiscriminatorsID = MIRAddFSDiscriminators::ID;

	FunctionPass *llvm::createMIRAddFSDiscriminatorsPass(FSDiscriminatorPass P) {			FunctionPass *llvm::createMIRAddFSDiscriminatorsPass(FSDiscriminatorPass P) {
	return new MIRAddFSDiscriminators(P);			return new MIRAddFSDiscriminators(P);
	}			}

				// TODO(xur): Remove this once we switch to ImprovedFSDiscriminator.
	// Compute a hash value using debug line number, and the line numbers from the			// Compute a hash value using debug line number, and the line numbers from the
	// inline stack.			// inline stack.
	static uint64_t getCallStackHash(const MachineBasicBlock &BB,			static uint64_t getCallStackHashV0(const MachineBasicBlock &BB,
	const MachineInstr &MI,			const MachineInstr &MI,
	const DILocation *DIL) {			const DILocation *DIL) {
	auto updateHash = [](const StringRef &Str) -> uint64_t {			auto updateHash = [](const StringRef &Str) -> uint64_t {
	if (Str.empty())			if (Str.empty())
	return 0;			return 0;
	return MD5Hash(Str);			return MD5Hash(Str);
	};			};
	uint64_t Ret = updateHash(std::to_string(DIL->getLine()));			uint64_t Ret = updateHash(std::to_string(DIL->getLine()));
	Ret ^= updateHash(BB.getName());			Ret ^= updateHash(BB.getName());
				hoy: BB could have different names between release and debug builds of the compiler, or with and without `-fdiscard-value-names`. This may lead to non-determinism in computing FS discriminators. Maybe consider using an integer id for BB? It doesn't exist today though.
				xur: You are exactly right! Note this is the old version of the hash, which will be deprecated. The new version will not use the name anymore.
				hoy: I see. It's only used in the old version.
	Ret ^= updateHash(DIL->getScope()->getSubprogram()->getLinkageName());			Ret ^= updateHash(DIL->getScope()->getSubprogram()->getLinkageName());
	for (DIL = DIL->getInlinedAt(); DIL; DIL = DIL->getInlinedAt()) {			for (DIL = DIL->getInlinedAt(); DIL; DIL = DIL->getInlinedAt()) {
	Ret ^= updateHash(std::to_string(DIL->getLine()));			Ret ^= updateHash(std::to_string(DIL->getLine()));
				wenlei: Same here, converting to a string first seems unnecessary.
				xur: Will switch to xxHash64.
	Ret ^= updateHash(DIL->getScope()->getSubprogram()->getLinkageName());			Ret ^= updateHash(DIL->getScope()->getSubprogram()->getLinkageName());
	}			}
	return Ret;			return Ret;
	}			}

				static uint64_t getCallStackHash(const DILocation *DIL) {
				auto updateHash = [](const StringRef &Str) -> uint64_t {
				if (Str.empty())
				return 0;
				return MD5Hash(Str);
				wenlei: MD5 as a cryptographic hash is expensive; `xxHash64` should be good enough in terms of distribution and collision avoidance. Now that we're changing the discriminator algorithm, I wonder if we should take the opportunity to move to xxHash for the new version. (For compilation with a compact-binary profile, where MD5 is used a lot, MD5 shows up quite hot in the perf profile.)
				xur: This is a good idea. I think this is a good opportunity to switch to a less expensive hash algorithm.
				};
				uint64_t Ret = 0;
				for (DIL = DIL->getInlinedAt(); DIL; DIL = DIL->getInlinedAt()) {
				Ret ^= updateHash(std::to_string(DIL->getLine()));
				wenlei: If collision is a concern and we're working to reduce collisions, `^=` is a weak blend/combine function (symmetric, among other problems). Something like the following is a safer combine function; we have a number of similar `hashCombine` functions in the LLVM code base: `inline int64_t hashCombine(const int64_t Seed, const int64_t Val) { std::hash<int64_t> Hasher; return Seed ^ (Hasher(Val) + 0x9e3779b9 + (Seed << 6) + (Seed >> 2)); }`
				Ret ^= updateHash(DIL->getName());
				hoy: `getLinkageName` may return an empty string for C functions. We often use a trick to work around it: `// Use linkage name for C++ if possible. auto Name = SP->getLinkageName(); if (Name.empty()) Name = SP->getName();` Actually there is a similar function `getCallStackHash` in `SampleProfileProbe.cpp`. Do you think it's possible to unify them and place them in a common file like `DebugInfo.h`?
				xur: Good to know this. Have you seen cases where an empty string leads to hash conflicts? Will getName() also return an empty string? We can add another interface in DISubprogram, in DebugInfoMetadata.h?
				hoy: I haven't taken a deep look into whether empty strings can lead to hash conflicts in this case, actually. `getName()` will always return the original demangled function name, and for C it is just the linkage name. A new interface in DebugInfoMetadata.h sounds good. Perhaps a new member function for `DILocation`?
				}
				return Ret;
				}
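wenlei's point about `^=` can be checked directly: XOR is order-insensitive, and a frame hash that appears twice in the chain cancels out. The sketch contrasts it with the Boost-style combine formula from the suggestion, changed to `uint64_t` so the additions and shifts wrap instead of risking signed-overflow undefined behaviour, and applied to inputs that are assumed to already be hashes (so `std::hash` is dropped). `xorCombine` and `mixCombine` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// XOR combine, as getCallStackHash uses for each frame.
static uint64_t xorCombine(uint64_t Seed, uint64_t FrameHash) {
  return Seed ^ FrameHash;
}

// Boost-style combine following the suggested formula, on uint64_t so the
// arithmetic wraps modulo 2^64. FrameHash is assumed to already be a hash of
// one frame, so no std::hash call is applied here.
static uint64_t mixCombine(uint64_t Seed, uint64_t FrameHash) {
  return Seed ^ (FrameHash + 0x9e3779b9 + (Seed << 6) + (Seed >> 2));
}
```

Because `mixCombine` feeds the running seed back through shifts, swapping two frames or repeating one changes the result, which XOR cannot detect.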

	// Traverse the CFG and assign FS discriminators. If two instructions			// Traverse the CFG and assign FS discriminators. If two instructions
	// have the same lineno and discriminator, but residing in different BBs,			// have the same lineno and discriminator, but residing in different BBs,
	// the latter instruction will get a new discriminator value. The new			// the latter instruction will get a new discriminator value. The new
	// discriminator keeps the existing discriminator value but sets new bits			// discriminator keeps the existing discriminator value but sets new bits
	// between LowBit and HighBit.			// between LowBit and HighBit.
	bool MIRAddFSDiscriminators::runOnMachineFunction(MachineFunction &MF) {			bool MIRAddFSDiscriminators::runOnMachineFunction(MachineFunction &MF) {
	if (!EnableFSDiscriminator)			if (!EnableFSDiscriminator)
	return false;			return false;
	if (!MF.getFunction().shouldEmitDebugInfoForProfiling())			if (!MF.getFunction().shouldEmitDebugInfoForProfiling())
	return false;			return false;

	bool Changed = false;			bool Changed = false;
	using LocationDiscriminator = std::tuple<StringRef, unsigned, unsigned>;			using LocationDiscriminator =
				std::tuple<StringRef, unsigned, unsigned, uint64_t>;
	using BBSet = DenseSet<const MachineBasicBlock *>;			using BBSet = DenseSet<const MachineBasicBlock *>;
	using LocationDiscriminatorBBMap = DenseMap<LocationDiscriminator, BBSet>;			using LocationDiscriminatorBBMap = DenseMap<LocationDiscriminator, BBSet>;
	using LocationDiscriminatorCurrPassMap =			using LocationDiscriminatorCurrPassMap =
	DenseMap<LocationDiscriminator, unsigned>;			DenseMap<LocationDiscriminator, unsigned>;

	LocationDiscriminatorBBMap LDBM;			LocationDiscriminatorBBMap LDBM;
	LocationDiscriminatorCurrPassMap LDCM;			LocationDiscriminatorCurrPassMap LDCM;

	// Mask of discriminators before this pass.			// Mask of discriminators before this pass.
	unsigned BitMaskBefore = getN1Bits(LowBit);			// TODO(xur): simplify this once we switch to ImprovedFSDiscriminator.
				unsigned LowBitTemp = LowBit;
				assert(LowBit > 0 && "LowBit in FSDiscriminator cannot be 0");
				if (ImprovedFSDiscriminator)
				hoy: Is this fixing a bug of the old encoding? BTW, should we assert that the current pass is never the base pass, or that LowBit is never zero?
				xur: This is a bug fix. I noticed that issue a while ago but didn't fix it because it changes some discriminators. I will add an assert here.
				LowBitTemp -= 1;
				unsigned BitMaskBefore = getN1Bits(LowBitTemp);
	// Mask of discriminators including this pass.			// Mask of discriminators including this pass.
	unsigned BitMaskNow = getN1Bits(HighBit);			unsigned BitMaskNow = getN1Bits(HighBit);
	// Mask of discriminators for bits specific to this pass.			// Mask of discriminators for bits specific to this pass.
	unsigned BitMaskThisPass = BitMaskNow ^ BitMaskBefore;			unsigned BitMaskThisPass = BitMaskNow ^ BitMaskBefore;
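A worked example of these masks, using hypothetical bit positions LowBit = 8 and HighBit = 13 (chosen for illustration, not taken from LLVM's pass table) and assuming `getN1Bits(B)` sets bits 0 through B. With the old `getN1Bits(LowBit)`, bit LowBit is excluded from `BitMaskThisPass`, so the lowest bit of the per-pass counter shifted to `LowBit` is always cleared (ignoring the hash term). The `LowBit - 1` form, which xur describes as a bug fix, keeps it.

```cpp
#include <cassert>
#include <cstdint>

// Assumed semantics of getN1Bits: a mask with bits 0..B set.
static uint32_t n1Bits(unsigned B) {
  assert(B < 32 && "bit index out of range");
  return B == 31 ? 0xFFFFFFFFu : (1u << (B + 1)) - 1;
}

// Mirrors BitMaskThisPass = BitMaskNow ^ BitMaskBefore. UseFix selects the
// ImprovedFSDiscriminator path, which builds BitMaskBefore from LowBit - 1.
static uint32_t passMask(unsigned LowBit, unsigned HighBit, bool UseFix) {
  assert(LowBit > 0 && LowBit <= HighBit);
  uint32_t BitMaskBefore = n1Bits(UseFix ? LowBit - 1 : LowBit);
  uint32_t BitMaskNow = n1Bits(HighBit);
  return BitMaskNow ^ BitMaskBefore;
}
```

With LowBit = 8, a counter value of 1 becomes `0x100`, which the old mask (`0x3E00`) clears entirely, while counter values 2 and 3 collapse to the same bits.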
	unsigned NumNewD = 0;			unsigned NumNewD = 0;

	LLVM_DEBUG(dbgs() << "MIRAddFSDiscriminators working on Func: "			LLVM_DEBUG(dbgs() << "MIRAddFSDiscriminators working on Func: "
	<< MF.getFunction().getName() << "\n");			<< MF.getFunction().getName() << " Highbit=" << HighBit
				<< "\n");

				auto BBSize = [](const MachineBasicBlock &BB) {
				int Size = 0;
				for (const MachineInstr &I : BB) {
				if (ImprovedFSDiscriminator && I.isMetaInstruction())
				continue;
				Size++;
				}
				return Size;
				};
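A sketch of why skipping meta instructions helps: instructions that emit no code (for example `DBG_VALUE`-style instructions, whose presence depends on debug-info settings) no longer change the block size fed into `BBSizeHash`. `Instr` and `bbSize` are hypothetical stand-ins for `MachineInstr` and the lambda above.

```cpp
#include <cassert>
#include <vector>

// Simplified MachineInstr: only the "meta" property matters here.
struct Instr {
  bool IsMeta; // true for instructions that emit no code
};

// Mirrors the BBSize lambda with ImprovedFSDiscriminator enabled.
static int bbSize(const std::vector<Instr> &BB) {
  int Size = 0;
  for (const Instr &I : BB)
    if (!I.IsMeta)
      ++Size;
  return Size;
}
```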

	for (MachineBasicBlock &BB : MF) {			for (MachineBasicBlock &BB : MF) {
				uint64_t BBSizeHash = 0;
				if (ImprovedFSDiscriminator)
				BBSizeHash = MD5Hash(std::to_string(BBSize(BB)));
				wenlei: Converting an int to a string just to get a hash seems like overkill. `xxHash64(ArrayRef<uint8_t> Data)` should be fast and good enough.
				xur: Got it. I will probably remove BBSize from the hash.

	for (MachineInstr &I : BB) {			for (MachineInstr &I : BB) {
				if (ImprovedFSDiscriminator && I.isMetaInstruction()) {
				continue;
				}
	const DILocation *DIL = I.getDebugLoc().get();			const DILocation *DIL = I.getDebugLoc().get();
	if (!DIL)			if (!DIL)
	continue;			continue;
	unsigned LineNo = DIL->getLine();			unsigned LineNo = DIL->getLine();
	if (LineNo == 0)			if (LineNo == 0)
	continue;			continue;
				hoy: Should zero line numbers also get a discriminator? Skipping them may result in invalid line offsets, with which all such instructions end up sharing the same sample.
				xur: Zero line numbers do not get a discriminator. The number of zero-line-number instructions is actually pretty big; they can easily overflow the discriminators. We have other changes to deal with zero line numbers and instructions with an empty DIL. They are still ongoing.
				hoy: Good to know you are working on a fix.
	unsigned Discriminator = DIL->getDiscriminator();			unsigned Discriminator = DIL->getDiscriminator();
	LocationDiscriminator LD{DIL->getFilename(), LineNo, Discriminator};			uint64_t CallStackHashVal = 0;
				if (ImprovedFSDiscriminator)
				CallStackHashVal = getCallStackHash(DIL);

				LocationDiscriminator LD{DIL->getFilename(), LineNo, Discriminator,
				hoy: Wondering if `DIL->getFilename()` is still needed, since `CallStackHashVal` includes the caller linkage names, which should help avoid hash conflicts.
				xur: CallStackHashVal only covers a DIL with getInlinedAt(). If this is not an inlined instruction, CallStackHashVal is 0.
				hoy: Thanks for pointing it out.
				CallStackHashVal};
	auto &BBMap = LDBM[LD];			auto &BBMap = LDBM[LD];
	auto R = BBMap.insert(&BB);			auto R = BBMap.insert(&BB);
	if (BBMap.size() == 1)			if (BBMap.size() == 1)
	continue;			continue;
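The effect of adding the call-stack hash to `LocationDiscriminator`, as xur explains in the thread, can be sketched with standard containers. `Key` and `PassState` are hypothetical mirrors of `LocationDiscriminator`, `LDBM` and `LDCM`. `assign` returns the per-pass counter before it is shifted by `LowBit` and masked; 0 means the instruction keeps its discriminator.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <tuple>

// (file, line, discriminator, call-stack hash); the old encoding has no
// call-stack field, which the sketch models as a hash of 0.
using Key = std::tuple<std::string, unsigned, unsigned, uint64_t>;

struct PassState {
  std::map<Key, std::set<int>> BlocksSeen; // like LDBM
  std::map<Key, unsigned> Counter;         // like LDCM

  // Per-pass value for an instruction with key K in block BB. Blocks are
  // expected to be visited in order, as in the MachineFunction traversal.
  unsigned assign(const Key &K, int BB) {
    std::set<int> &Blocks = BlocksSeen[K];
    bool Inserted = Blocks.insert(BB).second;
    if (Blocks.size() == 1)
      return 0; // key seen in only one block so far: no new bits
    return Inserted ? ++Counter[K] : Counter[K];
  }
};
```

With a shared key, the same source line in a second and third block gets counters 1 and 2; with different call-stack hashes, each key stays in one block and no pass-specific bits are spent.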

	unsigned DiscriminatorCurrPass;			unsigned DiscriminatorCurrPass;
	DiscriminatorCurrPass = R.second ? ++LDCM[LD] : LDCM[LD];			DiscriminatorCurrPass = R.second ? ++LDCM[LD] : LDCM[LD];
	DiscriminatorCurrPass = DiscriminatorCurrPass << LowBit;			DiscriminatorCurrPass = DiscriminatorCurrPass << LowBit;
	DiscriminatorCurrPass += getCallStackHash(BB, I, DIL);			if (ImprovedFSDiscriminator)
				DiscriminatorCurrPass += BBSizeHash;
				hoy: Wondering if `BBSizeHash` is stable enough run-to-run. If consecutive builds have slight changes in block size, their discriminators may not match. Including the callsite hash in `LocationDiscriminator` sounds like a great improvement. I'm wondering how helpful it is to include `BBSizeHash` in the discriminator encoding. Have you evaluated the two changes separately?
				xur: We have tried a few other ways for the hash. It seems better to be stricter on the match: if it's a mismatch, we will not use the pass-specific counters, but we still use the average count (branch probability) from previous rounds. This seems to be better than applying wrong BB weights. I don't have the data with and without BBSizeHash. I can do a quick run to see if it has a performance impact.
				hoy: > This seems to be better than applying wrong BB weights.
				This makes sense.
				> I don't have the data with and without BBSizeHash. I can do a quick run to see if it has a performance impact.
				Yeah, curious to see how impactful it is. Thanks.
				hoy: The callsite hash is being removed from the discriminator for V1. I guess the discriminator conflicts will be far fewer than with V0?
				xur: Yes, the callsite hash is not in the discriminator hash for V1; the callsite hash is now part of the key for the map. We can now have discriminators with the same value, but they will not conflict with each other as they belong to different maps. This indirectly increases the range of the discriminators and thus results in fewer conflicts.
				else
				DiscriminatorCurrPass += getCallStackHashV0(BB, I, DIL);
	DiscriminatorCurrPass &= BitMaskThisPass;			DiscriminatorCurrPass &= BitMaskThisPass;
	unsigned NewD = Discriminator | DiscriminatorCurrPass;			unsigned NewD = Discriminator | DiscriminatorCurrPass;
	const auto *const NewDIL = DIL->cloneWithDiscriminator(NewD);			const auto *const NewDIL = DIL->cloneWithDiscriminator(NewD);
	if (!NewDIL) {			if (!NewDIL) {
	LLVM_DEBUG(dbgs() << "Could not encode discriminator: "			LLVM_DEBUG(dbgs() << "Could not encode discriminator: "
	<< DIL->getFilename() << ":" << DIL->getLine() << ":"			<< DIL->getFilename() << ":" << DIL->getLine() << ":"
	<< DIL->getColumn() << ":" << Discriminator << " "			<< DIL->getColumn() << ":" << Discriminator << " "
	<< I << "\n");			<< I << "\n");

llvm/lib/CodeGen/MIRSampleProfile.cpp


static cl::opt<bool> ViewBFIBefore("fs-viewbfi-before", cl::Hidden,		static cl::opt<bool> ViewBFIBefore("fs-viewbfi-before", cl::Hidden,
cl::init(false),		cl::init(false),
cl::desc("View BFI before MIR loader"));		cl::desc("View BFI before MIR loader"));
static cl::opt<bool> ViewBFIAfter("fs-viewbfi-after", cl::Hidden,		static cl::opt<bool> ViewBFIAfter("fs-viewbfi-after", cl::Hidden,
cl::init(false),		cl::init(false),
cl::desc("View BFI after MIR loader"));		cl::desc("View BFI after MIR loader"));

		extern cl::opt<bool> ImprovedFSDiscriminator;
char MIRProfileLoaderPass::ID = 0;		char MIRProfileLoaderPass::ID = 0;

INITIALIZE_PASS_BEGIN(MIRProfileLoaderPass, DEBUG_TYPE,		INITIALIZE_PASS_BEGIN(MIRProfileLoaderPass, DEBUG_TYPE,
"Load MIR Sample Profile",		"Load MIR Sample Profile",
/* cfg = */ false, /* is_analysis = */ false)		/* cfg = */ false, /* is_analysis = */ false)
INITIALIZE_PASS_DEPENDENCY(MachineBlockFrequencyInfo)		INITIALIZE_PASS_DEPENDENCY(MachineBlockFrequencyInfo)
INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)		INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)
INITIALIZE_PASS_DEPENDENCY(MachinePostDominatorTree)		INITIALIZE_PASS_DEPENDENCY(MachinePostDominatorTree)
protected:
// LowBit in the FS discriminator used by this instance. Note the number is		// LowBit in the FS discriminator used by this instance. Note the number is
// 0-based. Base discriminators use bits 0 to 11.		// 0-based. Base discriminators use bits 0 to 11.
unsigned LowBit;		unsigned LowBit;
// HighBit in the FS discriminator used by this instance. Note the number		// HighBit in the FS discriminator used by this instance. Note the number
// is 0-based.		// is 0-based.
unsigned HighBit;		unsigned HighBit;

bool ProfileIsValid = true;		bool ProfileIsValid = true;
		ErrorOr<uint64_t> getInstWeight(const MachineInstr &MI) override {
		if (ImprovedFSDiscriminator && MI.isMetaInstruction())
		return std::error_code();
		return getInstWeightImpl(MI);
		}
};		};

template <>		template <>
void SampleProfileLoaderBaseImpl<		void SampleProfileLoaderBaseImpl<
MachineBasicBlock>::computeDominanceAndLoopInfo(MachineFunction &F) {}		MachineBasicBlock>::computeDominanceAndLoopInfo(MachineFunction &F) {}

void MIRProfileLoader::setBranchProbs(MachineFunction &F) {		void MIRProfileLoader::setBranchProbs(MachineFunction &F) {
LLVM_DEBUG(dbgs() << "\nPropagation complete. Setting branch probs\n");		LLVM_DEBUG(dbgs() << "\nPropagation complete. Setting branch probs\n");
[183 unchanged lines not shown]
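
The MIR loader change is small: with -improved-fs-discriminator on, an instruction for which MI.isMetaInstruction() is true (for example a DBG_VALUE or KILL, which is never emitted) reports no weight. A minimal sketch of the effect, using a hypothetical FakeInstr in place of MachineInstr and assuming, for illustration, that a block's weight is the maximum over the instructions that report one:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for MachineInstr: only what this sketch needs.
struct FakeInstr {
  bool IsMeta;          // e.g. DBG_VALUE or KILL: not emitted to the binary
  uint64_t SampleCount; // samples recorded for this instruction's location
};

// Same shape as the override: with the improved encoding enabled, meta
// instructions report "no weight" instead of a sample count.
std::optional<uint64_t> getInstWeight(const FakeInstr &MI, bool Improved) {
  if (Improved && MI.IsMeta)
    return std::nullopt;
  return MI.SampleCount;
}

// Assumed block-weight rule for this sketch: the maximum weight over the
// instructions that report one.
uint64_t getBlockWeight(const std::vector<FakeInstr> &BB, bool Improved) {
  uint64_t Max = 0;
  for (const FakeInstr &MI : BB)
    if (std::optional<uint64_t> W = getInstWeight(MI, Improved))
      Max = std::max(Max, *W);
  return Max;
}
```

With a block such as {meta with 500 samples, real instruction with 120 samples}, the legacy behaviour lets the meta instruction's count dominate (500), while the improved behaviour yields 120.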

llvm/lib/CodeGen/PseudoProbeInserter.cpp

[122 unchanged lines not shown]	for (MachineBasicBlock &MBB : MF) {
}		}
}		}

return Changed;		return Changed;
}		}

private:		private:
uint64_t getFuncGUID(Module *M, DILocation *DL) {		uint64_t getFuncGUID(Module *M, DILocation *DL) {
auto *SP = DL->getScope()->getSubprogram();		auto Name = DL->getName();
auto Name = SP->getLinkageName();
if (Name.empty())
Name = SP->getName();
return Function::getGUID(Name);		return Function::getGUID(Name);
}		}

bool ShouldRun = false;		bool ShouldRun = false;
};		};
} // namespace		} // namespace

char PseudoProbeInserter::ID = 0;		char PseudoProbeInserter::ID = 0;
[11 unchanged lines not shown]

llvm/lib/Transforms/IPO/SampleProfileProbe.cpp

[50 unchanged lines not shown]	UpdatePseudoProbe("update-pseudo-probe", cl::init(true), cl::Hidden,
cl::desc("Update pseudo probe distribution factor"));		cl::desc("Update pseudo probe distribution factor"));

static uint64_t getCallStackHash(const DILocation *DIL) {		static uint64_t getCallStackHash(const DILocation *DIL) {
uint64_t Hash = 0;		uint64_t Hash = 0;
const DILocation *InlinedAt = DIL ? DIL->getInlinedAt() : nullptr;		const DILocation *InlinedAt = DIL ? DIL->getInlinedAt() : nullptr;
while (InlinedAt) {		while (InlinedAt) {
Hash ^= MD5Hash(std::to_string(InlinedAt->getLine()));		Hash ^= MD5Hash(std::to_string(InlinedAt->getLine()));
Hash ^= MD5Hash(std::to_string(InlinedAt->getColumn()));		Hash ^= MD5Hash(std::to_string(InlinedAt->getColumn()));
const DISubprogram *SP = InlinedAt->getScope()->getSubprogram();		auto Name = InlinedAt->getName();
// Use linkage name for C++ if possible.
auto Name = SP->getLinkageName();
if (Name.empty())
Name = SP->getName();
Hash ^= MD5Hash(Name);		Hash ^= MD5Hash(Name);
InlinedAt = InlinedAt->getInlinedAt();		InlinedAt = InlinedAt->getInlinedAt();
}		}
return Hash;		return Hash;
}		}

static uint64_t computeCallStackHash(const Instruction &Inst) {		static uint64_t computeCallStackHash(const Instruction &Inst) {
return getCallStackHash(Inst.getDebugLoc());		return getCallStackHash(Inst.getDebugLoc());
[392 unchanged lines not shown]
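
getCallStackHash walks the inlined-at chain of a debug location and XORs together hashes of each call site's line, column and function name, so the same source line reached through different inlining paths gets a different hash. A self-contained sketch of that shape follows. InlineFrame is a hypothetical stand-in for the DILocation chain, and FNV-1a stands in for MD5Hash purely to avoid a dependency; it is not the hash LLVM uses.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical stand-in for one level of a DILocation inlined-at chain.
struct InlineFrame {
  unsigned Line;
  unsigned Column;
  std::string LinkageName;      // may be empty, e.g. for C functions
  std::string Name;             // fallback when there is no linkage name
  const InlineFrame *InlinedAt; // next outer call site, or nullptr
};

// Stand-in 64-bit string hash (FNV-1a), NOT the hash LLVM uses.
uint64_t hashString(std::string_view S) {
  uint64_t H = 0xcbf29ce484222325ULL; // FNV offset basis
  for (unsigned char C : S) {
    H ^= C;
    H *= 0x100000001b3ULL; // FNV prime
  }
  return H;
}

// Mirrors getCallStackHash: start at the instruction's inlined-at frame and
// XOR in the line, column and function name of every call site outward.
uint64_t callStackHash(const InlineFrame *InlinedAt) {
  uint64_t Hash = 0;
  for (const InlineFrame *F = InlinedAt; F; F = F->InlinedAt) {
    Hash ^= hashString(std::to_string(F->Line));
    Hash ^= hashString(std::to_string(F->Column));
    // Prefer the linkage (mangled) name; fall back to the plain name.
    Hash ^= hashString(F->LinkageName.empty() ? F->Name : F->LinkageName);
  }
  return Hash;
}
```

Because the per-field hashes are combined with XOR, the result does not depend on frame order, and a frame whose line equals its column contributes only its name hash, since the two identical field hashes cancel.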

llvm/test/CodeGen/X86/Inputs/fsloader_v1.afdo

This file was added.

work:42380966:1346190
 1: 1246499
 5: 1246499
foo:28798256:4267
 0: 4267
 2.1: 255999
 4: 264627 bar:250018
 4.1792: 269485 bar:278102
 4.6656: 280297 bar:280933
 4.6912: 278916 bar:267752
 5: 264627
 5.1792: 269485
 5.6656: 260670
 5.6912: 278916
 6: 11541
 6.6912: 278916 work:284547
 6.7168: 260670 work:249428
 6.7424: 11541
 7: 272442
 7.6912: 283590
 7.7168: 234082
 7.7424: 279149
 8: 11541
 8.14848: 283590 work:305061
 8.15104: 279149 work:281368
 8.15360: 234082 work:225786
 10: 4050
bar:9504180:1076805
 2: 1056020
 3: 1056020
main:20360:0
 0: 0
 2.1: 4045
 3: 4156 foo:4267
 5: 0
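
The profile above uses the text sample-profile layout: a name:total_samples:head_samples header per function, followed by indented body lines of the form offset[.discriminator]: samples [callee:count ...]. A minimal sketch that parses one body line and splits its discriminator into bit fields. The split used here (base discriminator in bits 0-7, first FS pass in bits 8-13) is an assumption for illustration; the MIR loader comment in this diff describes a different base range, so check the actual bit ranges in LLVM's llvm/Support/Discriminator.h before relying on them.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// One parsed body line, e.g. " 4.1792: 269485 bar:278102".
struct BodyLine {
  unsigned LineOffset = 0;    // line offset from the function's start line
  uint32_t Discriminator = 0; // 0 when there is no ".N" suffix
  uint64_t Samples = 0;
  std::map<std::string, uint64_t> CallTargets; // callee -> call count
};

// Minimal parser for "offset[.discriminator]: samples [callee:count ...]".
// Assumes well-formed input; no error handling.
BodyLine parseBodyLine(const std::string &Text) {
  BodyLine L;
  std::istringstream In(Text);
  std::string Loc;
  In >> Loc;      // e.g. "4.1792:" (leading indentation is skipped)
  Loc.pop_back(); // drop the trailing ':'
  size_t Dot = Loc.find('.');
  L.LineOffset = std::stoul(Loc.substr(0, Dot));
  if (Dot != std::string::npos)
    L.Discriminator = std::stoul(Loc.substr(Dot + 1));
  In >> L.Samples;
  for (std::string Target; In >> Target;) {
    size_t Colon = Target.rfind(':');
    L.CallTargets[Target.substr(0, Colon)] = std::stoull(Target.substr(Colon + 1));
  }
  return L;
}

// Extracts bits [Lo, Hi] (0-based, inclusive) of a discriminator.
uint32_t bitField(uint32_t D, unsigned Lo, unsigned Hi) {
  assert(Lo <= Hi && Hi < 32);
  uint64_t Mask = (uint64_t(1) << (Hi - Lo + 1)) - 1;
  return static_cast<uint32_t>((uint64_t(D) >> Lo) & Mask);
}

// Assumed, illustrative layout: base in bits 0-7, first FS pass in bits 8-13.
uint32_t baseDiscriminator(uint32_t D) { return bitField(D, 0, 7); }
uint32_t firstFSPassBits(uint32_t D) { return bitField(D, 8, 13); }
```

Under that assumed layout, the fsafdo_test1.ll values 11266 (old encoding) and 2818 (new encoding) both keep the base discriminator 2 in the low byte; only the FS bits above it differ (44 versus 11).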

llvm/test/CodeGen/X86/fsafdo_test1.ll

	; RUN: llc -enable-fs-discriminator < %s | FileCheck %s			; RUN: llc -enable-fs-discriminator -improved-fs-discriminator=false < %s | FileCheck %s --check-prefix=V0
				; RUN: llc -enable-fs-discriminator -improved-fs-discriminator=true < %s | FileCheck %s --check-prefix=V1
	;			;
	; Check that fs-afdo discriminators are generated.			; Check that fs-afdo discriminators are generated.
	; CHECK: .loc 1 7 3 is_stmt 0 discriminator 2 # foo.c:7:3			; CHECK: .loc 1 7 3 is_stmt 0 discriminator 2 # foo.c:7:3
				hoy: I have a question about keeping the original discriminator, i.e., 2 here. IIUC, the MIR sample loader will skip loading samples for the instruction. Do you think it should get a new discriminator so that it can use pass-specific counters too? Let me know if I missed anything.
				xur (author): No. It will be loaded in the MIR sample profile. The 2 will be bit-masked, and the counter will be contributed to version 0 (i.e., discriminator value 0).
	; ChECK: .loc 1 9 5 is_stmt 1 discriminator 2 # foo.c:9:5			; ChECK: .loc 1 9 5 is_stmt 1 discriminator 2 # foo.c:9:5
	; CHECK: .loc 1 9 5 is_stmt 0 discriminator 11266 # foo.c:9:5			; V0: .loc 1 9 5 is_stmt 0 discriminator 11266 # foo.c:9:5
	; CHECK: .loc 1 7 3 is_stmt 1 discriminator 11266 # foo.c:7:3			; V0: .loc 1 7 3 is_stmt 1 discriminator 11266 # foo.c:7:3
				; V1: .loc 1 9 5 is_stmt 0 discriminator 2818 # foo.c:9:5
				; V1: .loc 1 7 3 is_stmt 1 discriminator 2818 # foo.c:7:3
	; Check that variable __llvm_fs_discriminator__ is generated.			; Check that variable __llvm_fs_discriminator__ is generated.
	; CHECK: .type __llvm_fs_discriminator__,@object # @__llvm_fs_discriminator__			; CHECK: .type __llvm_fs_discriminator__,@object # @__llvm_fs_discriminator__
	; CHECK: .section .rodata,"a",@progbits			; CHECK: .section .rodata,"a",@progbits
	; CHECK: .weak __llvm_fs_discriminator__			; CHECK: .weak __llvm_fs_discriminator__
	; CHECK: __llvm_fs_discriminator__:			; CHECK: __llvm_fs_discriminator__:
	; CHECK: .byte 1			; CHECK: .byte 1
	; CHECK: .size __llvm_fs_discriminator__, 1			; CHECK: .size __llvm_fs_discriminator__, 1

	[45 unchanged lines not shown]

llvm/test/CodeGen/X86/fsafdo_test2.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: llc -enable-fs-discriminator < %s | FileCheck %s			; RUN: llc -enable-fs-discriminator -improved-fs-discriminator=false < %s | FileCheck %s --check-prefixes=V0,V01
	; RUN: llvm-profdata merge --sample -profile-isfs -o %t.afdo %S/Inputs/fsloader.afdo			; RUN: llvm-profdata merge --sample -profile-isfs -o %t0.afdo %S/Inputs/fsloader.afdo
	; RUN: llc -enable-fs-discriminator -fs-profile-file=%t.afdo -show-fs-branchprob -disable-ra-fsprofile-loader=false -disable-layout-fsprofile-loader=false < %s 2>&1 | FileCheck %s --check-prefix=LOADER			; RUN: llc -enable-fs-discriminator -improved-fs-discriminator=false -fs-profile-file=%t0.afdo -show-fs-branchprob -disable-ra-fsprofile-loader=false -disable-layout-fsprofile-loader=false < %s 2>&1 | FileCheck %s --check-prefixes=LOADERV0,LOADER
				; RUN: llc -enable-fs-discriminator -improved-fs-discriminator=true < %s | FileCheck %s --check-prefixes=V1,V01
				; RUN: llvm-profdata merge --sample -profile-isfs -o %t1.afdo %S/Inputs/fsloader_v1.afdo
				; RUN: llc -enable-fs-discriminator -improved-fs-discriminator=true -fs-profile-file=%t1.afdo -show-fs-branchprob -disable-ra-fsprofile-loader=false -disable-layout-fsprofile-loader=false < %s 2>&1 | FileCheck %s --check-prefixes=LOADERV1,LOADER
	;			;
	;;			;;
	;; C source code for the test (compiled at -O3):			;; C source code for the test (compiled at -O3):
	;; // A test case for loop unroll.			;; // A test case for loop unroll.
	;;			;;
	;; __attribute__((noinline)) int bar(int i){			;; __attribute__((noinline)) int bar(int i){
	;; volatile int j;			;; volatile int j;
	;; j = i;			;; j = i;
	[23 unchanged lines not shown]
	;; int main() {			;; int main() {
	;; int i;			;; int i;
	;; for (i = 0; i < 10000000; i++) {			;; for (i = 0; i < 10000000; i++) {
	;; foo();			;; foo();
	;; }			;; }
	;; }			;; }
	;;			;;
	;; Check that fs-afdo discriminators are generated.			;; Check that fs-afdo discriminators are generated.
	; CHECK: .loc 1 23 9 is_stmt 0 discriminator 1 # unroll.c:23:9			; V01: .loc 1 23 9 is_stmt 0 discriminator 1 # unroll.c:23:9
	; CHECK: .loc 1 23 9 is_stmt 0 discriminator 3585 # unroll.c:23:9			; V0: .loc 1 23 9 is_stmt 0 discriminator 3585 # unroll.c:23:9
	; CHECK: .loc 1 23 9 is_stmt 0 discriminator 8705 # unroll.c:23:9			; V0: .loc 1 23 9 is_stmt 0 discriminator 8705 # unroll.c:23:9
	; CHECK: .loc 1 23 9 is_stmt 0 discriminator 4097 # unroll.c:23:9			; V0: .loc 1 23 9 is_stmt 0 discriminator 4097 # unroll.c:23:9
				; V1: .loc 1 23 9 is_stmt 0 discriminator 6913 # unroll.c:23:9
				; V1: .loc 1 23 9 is_stmt 0 discriminator 7169 # unroll.c:23:9
				; V1: .loc 1 23 9 is_stmt 0 discriminator 7425 # unroll.c:23:9
	;;			;;
	;; Check that variable __llvm_fs_discriminator__ is generated.			;; Check that variable __llvm_fs_discriminator__ is generated.
	; CHECK: .type __llvm_fs_discriminator__,@object # @__llvm_fs_discriminator__			; V01: .type __llvm_fs_discriminator__,@object # @__llvm_fs_discriminator__
	; CHECK: .section .rodata,"a",@progbits			; V01: .section .rodata,"a",@progbits
	; CHECK: .weak __llvm_fs_discriminator__			; V01: .weak __llvm_fs_discriminator__
	; CHECK: __llvm_fs_discriminator__:			; V01: __llvm_fs_discriminator__:
	; CHECK: .byte 1			; V01: .byte 1
	; CHECK: .size __llvm_fs_discriminator__, 1			; V01: .size __llvm_fs_discriminator__, 1

	;; Check that new branch probs are generated.			;; Check that new branch probs are generated.
	; LOADER: Set branch fs prob: MBB (1 -> 3): unroll.c:22:11-->unroll.c:24:11 W=283590 0x40000000 / 0x80000000 = 50.00% --> 0x7aca7894 / 0x80000000 = 95.93%			; LOADER: Set branch fs prob: MBB (1 -> 3): unroll.c:22:11-->unroll.c:24:11 W=283590 0x40000000 / 0x80000000 = 50.00% --> 0x7aca7894 / 0x80000000 = 95.93%
	; LOADER: Set branch fs prob: MBB (1 -> 2): unroll.c:22:11 W=283590 0x40000000 / 0x80000000 = 50.00% --> 0x0535876c / 0x80000000 = 4.07%			; LOADER: Set branch fs prob: MBB (1 -> 2): unroll.c:22:11 W=283590 0x40000000 / 0x80000000 = 50.00% --> 0x0535876c / 0x80000000 = 4.07%
	; LOADER: Set branch fs prob: MBB (3 -> 5): unroll.c:24:11-->unroll.c:22:11 W=283590 0x30000000 / 0x80000000 = 37.50% --> 0x7aca7894 / 0x80000000 = 95.93%			; LOADER: Set branch fs prob: MBB (3 -> 5): unroll.c:24:11-->unroll.c:22:11 W=283590 0x30000000 / 0x80000000 = 37.50% --> 0x7aca7894 / 0x80000000 = 95.93%
	; LOADER: Set branch fs prob: MBB (3 -> 4): unroll.c:24:11 W=283590 0x50000000 / 0x80000000 = 62.50% --> 0x0535876c / 0x80000000 = 4.07%			; LOADER: Set branch fs prob: MBB (3 -> 4): unroll.c:24:11 W=283590 0x50000000 / 0x80000000 = 62.50% --> 0x0535876c / 0x80000000 = 4.07%
	; LOADER: Set branch fs prob: MBB (5 -> 8): unroll.c:22:11-->unroll.c:24:11 W=283590 0x40000000 / 0x80000000 = 50.00% --> 0x021c112e / 0x80000000 = 1.65%			; LOADER: Set branch fs prob: MBB (5 -> 8): unroll.c:22:11-->unroll.c:24:11 W=283590 0x40000000 / 0x80000000 = 50.00% --> 0x021c112e / 0x80000000 = 1.65%
	; LOADER: Set branch fs prob: MBB (5 -> 7): unroll.c:22:11 W=283590 0x40000000 / 0x80000000 = 50.00% --> 0x7de3eed2 / 0x80000000 = 98.35%			; LOADER: Set branch fs prob: MBB (5 -> 7): unroll.c:22:11 W=283590 0x40000000 / 0x80000000 = 50.00% --> 0x7de3eed2 / 0x80000000 = 98.35%
	; LOADER: Set branch fs prob: MBB (8 -> 10): unroll.c:24:11-->unroll.c:22:11 W=283590 0x30000000 / 0x80000000 = 37.50% --> 0x00000000 / 0x80000000 = 0.00%			; LOADER: Set branch fs prob: MBB (8 -> 10): unroll.c:24:11-->unroll.c:22:11 W=283590 0x30000000 / 0x80000000 = 37.50% --> 0x00000000 / 0x80000000 = 0.00%
	; LOADER: Set branch fs prob: MBB (8 -> 9): unroll.c:24:11 W=283590 0x50000000 / 0x80000000 = 62.50% --> 0x80000000 / 0x80000000 = 100.00%			; LOADER: Set branch fs prob: MBB (8 -> 9): unroll.c:24:11 W=283590 0x50000000 / 0x80000000 = 62.50% --> 0x80000000 / 0x80000000 = 100.00%
	; LOADER: Set branch fs prob: MBB (10 -> 12): unroll.c:22:11-->unroll.c:24:11 W=283590 0x40000000 / 0x80000000 = 50.00% --> 0x7aca7894 / 0x80000000 = 95.93%			; LOADERV0: Set branch fs prob: MBB (10 -> 12): unroll.c:22:11-->unroll.c:24:11 W=283590 0x40000000 / 0x80000000 = 50.00% --> 0x7aca7894 / 0x80000000 = 95.93%
	; LOADER: Set branch fs prob: MBB (10 -> 11): unroll.c:22:11 W=283590 0x40000000 / 0x80000000 = 50.00% --> 0x0535876c / 0x80000000 = 4.07%			; LOADERV1: Set branch fs prob: MBB (10 -> 12): unroll.c:22:11-->unroll.c:24:11 W=283590 0x40000000 / 0x80000000 = 50.00% --> 0x0a5856e1 / 0x80000000 = 8.08%
				; LOADERV0: Set branch fs prob: MBB (10 -> 11): unroll.c:22:11 W=283590 0x40000000 / 0x80000000 = 50.00% --> 0x0535876c / 0x80000000 = 4.07%
				; LOADERV1: Set branch fs prob: MBB (10 -> 11): unroll.c:22:11 W=283590 0x40000000 / 0x80000000 = 50.00% --> 0x75a7a91f / 0x80000000 = 91.92%
	; LOADER: Set branch fs prob: MBB (12 -> 14): unroll.c:24:11-->unroll.c:22:11 W=283590 0x30000000 / 0x80000000 = 37.50% --> 0x02012507 / 0x80000000 = 1.57%			; LOADER: Set branch fs prob: MBB (12 -> 14): unroll.c:24:11-->unroll.c:22:11 W=283590 0x30000000 / 0x80000000 = 37.50% --> 0x02012507 / 0x80000000 = 1.57%
	; LOADER: Set branch fs prob: MBB (12 -> 13): unroll.c:24:11 W=283590 0x50000000 / 0x80000000 = 62.50% --> 0x7dfedaf9 / 0x80000000 = 98.43%			; LOADER: Set branch fs prob: MBB (12 -> 13): unroll.c:24:11 W=283590 0x50000000 / 0x80000000 = 62.50% --> 0x7dfedaf9 / 0x80000000 = 98.43%
	; LOADER: Set branch fs prob: MBB (14 -> 16): unroll.c:22:11-->unroll.c:24:11 W=283590 0x40000000 / 0x80000000 = 50.00% --> 0x0a5856e1 / 0x80000000 = 8.08%			; LOADERV0: Set branch fs prob: MBB (14 -> 16): unroll.c:22:11-->unroll.c:24:11 W=283590 0x40000000 / 0x80000000 = 50.00% --> 0x0a5856e1 / 0x80000000 = 8.08%
	; LOADER: Set branch fs prob: MBB (14 -> 15): unroll.c:22:11 W=283590 0x40000000 / 0x80000000 = 50.00% --> 0x75a7a91f / 0x80000000 = 91.92%			; LOADERV1: Set branch fs prob: MBB (14 -> 16): unroll.c:22:11-->unroll.c:24:11 W=283590 0x40000000 / 0x80000000 = 50.00% --> 0x7aca7894 / 0x80000000 = 95.93%
				; LOADERV0: Set branch fs prob: MBB (14 -> 15): unroll.c:22:11 W=283590 0x40000000 / 0x80000000 = 50.00% --> 0x75a7a91f / 0x80000000 = 91.92%
				; LOADERV1: Set branch fs prob: MBB (14 -> 15): unroll.c:22:11 W=283590 0x40000000 / 0x80000000 = 50.00% --> 0x0535876c / 0x80000000 = 4.07%
	; LOADER: Set branch fs prob: MBB (16 -> 18): unroll.c:24:11-->unroll.c:19:3 W=283590 0x30000000 / 0x80000000 = 37.50% --> 0x16588166 / 0x80000000 = 17.46%			; LOADER: Set branch fs prob: MBB (16 -> 18): unroll.c:24:11-->unroll.c:19:3 W=283590 0x30000000 / 0x80000000 = 37.50% --> 0x16588166 / 0x80000000 = 17.46%
	; LOADER: Set branch fs prob: MBB (16 -> 17): unroll.c:24:11 W=283590 0x50000000 / 0x80000000 = 62.50% --> 0x69a77e9a / 0x80000000 = 82.54%			; LOADER: Set branch fs prob: MBB (16 -> 17): unroll.c:24:11 W=283590 0x50000000 / 0x80000000 = 62.50% --> 0x69a77e9a / 0x80000000 = 82.54%
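
Each LOADER line prints an edge probability before and after loading as a numerator over 0x80000000, i.e. a fixed-point fraction with denominator 2^31, which is the denominator LLVM's BranchProbability uses. A small sketch (helper names are hypothetical) that converts those numerators to the printed percentages and checks that the two successors of a two-way branch account for the whole denominator:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Denominator used in the dumps above: 0x80000000 == 2^31.
constexpr uint32_t ProbDenominator = 1u << 31;

// Converts a probability numerator to a percentage.
double toPercent(uint32_t Numerator) {
  return 100.0 * Numerator / ProbDenominator;
}

// Rounds to two decimals, matching the "95.93%" style in the dumps.
double round2(double X) { return std::round(X * 100.0) / 100.0; }

// True if the two outgoing probabilities of a two-way branch sum to 1.
bool isComplementary(uint32_t A, uint32_t B) {
  return uint64_t(A) + B == ProbDenominator;
}
```

For example, the (10 -> 12) and (14 -> 16) edges trade the 95.93% (0x7aca7894) and 8.08% (0x0a5856e1) values between the LOADERV0 and LOADERV1 runs, and in every case the edge and its sibling still sum to 0x80000000.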


	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	@sum = dso_local local_unnamed_addr global i32 0, align 4			@sum = dso_local local_unnamed_addr global i32 0, align 4

	declare i32 @bar(i32 %i) #0			declare i32 @bar(i32 %i) #0
	declare void @work(i32 %i) #2			declare void @work(i32 %i) #2

	define dso_local void @foo() #0 !dbg !29 {			define dso_local void @foo() #0 !dbg !29 {
	[173 unchanged lines not shown]

llvm/test/CodeGen/X86/fsafdo_test3.ll

	; RUN: llvm-profdata merge --sample -profile-isfs -o %t.afdo %S/Inputs/fsloader.afdo			; RUN: llvm-profdata merge --sample -profile-isfs -o %t0.afdo %S/Inputs/fsloader.afdo
	; RUN: llc -enable-fs-discriminator -fs-profile-file=%t.afdo -disable-ra-fsprofile-loader=false -disable-layout-fsprofile-loader=false -print-machine-bfi -print-bfi-func-name=foo -print-before=fs-profile-loader -stop-after=fs-profile-loader < %s 2>&1 | FileCheck %s --check-prefix=BFI			; RUN: llc -enable-fs-discriminator -improved-fs-discriminator=false -fs-profile-file=%t0.afdo -disable-ra-fsprofile-loader=false -disable-layout-fsprofile-loader=false -print-machine-bfi -print-bfi-func-name=foo -print-before=fs-profile-loader -stop-after=fs-profile-loader < %s 2>&1 | FileCheck %s --check-prefixes=BFI,BFIV0
				; RUN: llvm-profdata merge --sample -profile-isfs -o %t1.afdo %S/Inputs/fsloader_v1.afdo
				; RUN: llc -enable-fs-discriminator -improved-fs-discriminator=true -fs-profile-file=%t1.afdo -disable-ra-fsprofile-loader=false -disable-layout-fsprofile-loader=false -print-machine-bfi -print-bfi-func-name=foo -print-before=fs-profile-loader -stop-after=fs-profile-loader < %s 2>&1 | FileCheck %s --check-prefixes=BFI,BFIV1
	;			;
	;;			;;
	;; C source code for the test (compiled at -O3):			;; C source code for the test (compiled at -O3):
	;; // A test case for loop unroll.			;; // A test case for loop unroll.
	;;			;;
	;; __attribute__((noinline)) int bar(int i){			;; __attribute__((noinline)) int bar(int i){
	;; volatile int j;			;; volatile int j;
	;; j = i;			;; j = i;
	[47 unchanged lines not shown]
	; BFI: - BB14[if.then.3]: float = 2.5405, int = 20, count = 10670			; BFI: - BB14[if.then.3]: float = 2.5405, int = 20, count = 10670
	; BFI: - BB15[if.end.3]: float = 59.967, int = 479, count = 255547			; BFI: - BB15[if.end.3]: float = 59.967, int = 479, count = 255547
	; BFI: - BB16[if.then7.3]: float = 2.5405, int = 20, count = 10670			; BFI: - BB16[if.then7.3]: float = 2.5405, int = 20, count = 10670
	; BFI: - BB17[if.end9.3]: float = 59.967, int = 479, count = 255547			; BFI: - BB17[if.end9.3]: float = 59.967, int = 479, count = 255547
	; BFI: - BB18[for.end12]: float = 1.0, int = 8, count = 4268			; BFI: - BB18[for.end12]: float = 1.0, int = 8, count = 4268
	;			;
	; BFI: # * IR Dump Before SampleFDO loader in MIR (fs-profile-loader) *:			; BFI: # * IR Dump Before SampleFDO loader in MIR (fs-profile-loader) *:
	; BFI: # End machine code for function foo.			; BFI: # End machine code for function foo.
	;			; BFI-EMPTY:
	; BFI: block-frequency-info: foo			; BFI: block-frequency-info: foo
	; BFI: - BB0[entry]: float = 1.0, int = 8, count = 4268			; BFI: - BB0[entry]: float = 1.0, int = 8, count = 4268
	; BFI: - BB1[for.cond1.preheader]: float = 66.446, int = 531, count = 283289			; BFI: - BB1[for.cond1.preheader]: float = 66.446, int = 531, count = 283289
	; BFI: - BB2[if.then]: float = 2.7041, int = 21, count = 11204			; BFI: - BB2[if.then]: float = 2.7041, int = 21, count = 11204
	; BFI: - BB3[if.end]: float = 66.446, int = 531, count = 283289			; BFI: - BB3[if.end]: float = 66.446, int = 531, count = 283289
	; BFI: - BB4[if.then7]: float = 2.7041, int = 21, count = 11204			; BFI: - BB4[if.then7]: float = 2.7041, int = 21, count = 11204
	; BFI: - BB5[if.end9]: float = 66.446, int = 531, count = 283289			; BFI: - BB5[if.end9]: float = 66.446, int = 531, count = 283289
	; BFI: - BB6[if.then.1]: float = 65.351, int = 522, count = 278487			; BFI: - BB6[if.then.1]: float = 65.351, int = 522, count = 278487
	; BFI: - BB7[if.end.1]: float = 66.446, int = 531, count = 283289			; BFI: - BB7[if.end.1]: float = 66.446, int = 531, count = 283289
	; BFI: - BB8[if.then7.1]: float = 66.446, int = 531, count = 283289			; BFI: - BB8[if.then7.1]: float = 66.446, int = 531, count = 283289
	; BFI: - BB9[if.end9.1]: float = 66.446, int = 531, count = 283289			; BFI: - BB9[if.end9.1]: float = 66.446, int = 531, count = 283289
	; BFI: - BB10[if.then.2]: float = 2.7041, int = 21, count = 11204			; BFIV0: - BB10[if.then.2]: float = 2.7041, int = 21, count = 11204
				; BFIV1: - BB10[if.then.2]: float = 61.075, int = 488, count = 260348
	; BFI: - BB11[if.end.2]: float = 66.446, int = 531, count = 283289			; BFI: - BB11[if.end.2]: float = 66.446, int = 531, count = 283289
	; BFI: - BB12[if.then7.2]: float = 65.405, int = 523, count = 279021			; BFI: - BB12[if.then7.2]: float = 65.405, int = 523, count = 279021
	; BFI: - BB13[if.end9.2]: float = 66.446, int = 531, count = 283289			; BFI: - BB13[if.end9.2]: float = 66.446, int = 531, count = 283289
	; BFI: - BB14[if.then.3]: float = 61.075, int = 488, count = 260348			; BFIV0: - BB14[if.then.3]: float = 61.075, int = 488, count = 260348
				; BFIV1: - BB14[if.then.3]: float = 2.7041, int = 21, count = 11204
	; BFI: - BB15[if.end.3]: float = 66.446, int = 531, count = 283289			; BFI: - BB15[if.end.3]: float = 66.446, int = 531, count = 283289
	; BFI: - BB16[if.then7.3]: float = 54.846, int = 438, count = 233673			; BFI: - BB16[if.then7.3]: float = 54.846, int = 438, count = 233673
	; BFI: - BB17[if.end9.3]: float = 66.446, int = 531, count = 283289			; BFI: - BB17[if.end9.3]: float = 66.446, int = 531, count = 283289
	; BFI: - BB18[for.end12]: float = 1.0, int = 8, count = 4268			; BFI: - BB18[for.end12]: float = 1.0, int = 8, count = 4268

	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	@sum = dso_local local_unnamed_addr global i32 0, align 4			@sum = dso_local local_unnamed_addr global i32 0, align 4
	[217 unchanged lines not shown]

llvm/test/CodeGen/X86/fsafdo_test4.ll

	; RUN: llc -enable-fs-discriminator < %s | FileCheck %s			; RUN: llc -enable-fs-discriminator -improved-fs-discriminator=false < %s | FileCheck %s
				; RUN: llc -enable-fs-discriminator -improved-fs-discriminator=true < %s | FileCheck %s
	;			;
	; Check that fs-afdo discriminators are NOT generated, as debugInfoForProfiling is false (not set).			; Check that fs-afdo discriminators are NOT generated, as debugInfoForProfiling is false (not set).
	; CHECK: .loc 1 7 3 is_stmt 0 discriminator 2 # foo.c:7:3			; CHECK: .loc 1 7 3 is_stmt 0 discriminator 2 # foo.c:7:3
	; CHECK: .loc 1 9 5 is_stmt 1 discriminator 2 # foo.c:9:5			; CHECK: .loc 1 9 5 is_stmt 1 discriminator 2 # foo.c:9:5
	; CHECK-NOT: .loc 1 9 5 is_stmt 0 discriminator 11266 # foo.c:9:5			; CHECK-NOT: .loc 1 9 5 is_stmt 0 discriminator
	; CHECK-NOT: .loc 1 7 3 is_stmt 1 discriminator 11266 # foo.c:7:3			; CHECK-NOT: .loc 1 7 3 is_stmt 1 discriminator
	; Check that variable __llvm_fs_discriminator__ is NOT generated.			; Check that variable __llvm_fs_discriminator__ is NOT generated.
	; CHECK-NOT: __llvm_fs_discriminator__:			; CHECK-NOT: __llvm_fs_discriminator__:

	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	%struct.Node = type { ptr }			%struct.Node = type { ptr }

	define i32 @foo(ptr readonly %node, ptr readnone %root) !dbg !6 {			define i32 @foo(ptr readonly %node, ptr readnone %root) !dbg !6 {
	[40 unchanged lines not shown]

Diff Detail

Unit Tests: Failed

Revision Contents

Diff 502867

llvm/include/llvm/IR/DebugInfoMetadata.h

llvm/lib/CodeGen/AsmPrinter/PseudoProbePrinter.cpp

llvm/lib/CodeGen/MIRFSDiscriminator.cpp

llvm/lib/CodeGen/MIRSampleProfile.cpp

llvm/lib/CodeGen/PseudoProbeInserter.cpp

llvm/lib/Transforms/IPO/SampleProfileProbe.cpp

llvm/test/CodeGen/X86/Inputs/fsloader_v1.afdo

llvm/test/CodeGen/X86/fsafdo_test1.ll

llvm/test/CodeGen/X86/fsafdo_test2.ll

llvm/test/CodeGen/X86/fsafdo_test3.ll

llvm/test/CodeGen/X86/fsafdo_test4.ll
