This is an archive of the discontinued LLVM Phabricator instance.


Differential D158889

[AsmPrinter][PGO] Adds optional dumping of branch probabilities for PGO metrics.
Needs Review · Public

Authored by red1bluelost on Aug 25 2023, 2:10 PM.


Details

Reviewers

wenlei
davidxl
mtrofin
MaskRay

Summary

[1/2] This is the first of two patches for branch accuracy metrics. The second is D158890.

This patch adds optional dumping of branch probabilities that can be used for collecting branch
accuracy metrics. Our metrics compare branch probabilities against execution traces. This patch is
for dumping those branch probabilities.
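A minimal C++ sketch of what comparing dumped branch probabilities against an execution trace can look like. Everything here is an assumption for illustration: the hypothetical `BranchSample` record (a dumped probability already joined with the taken and not-taken counts a tracer observed for that branch) and the error formula (mean absolute difference between predicted and observed taken rates, weighted by execution count). The metrics actually used are defined by the scripts in D158890 and may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical record: one conditional branch, already matched by address
// between the compiler's dump and the execution trace.
struct BranchSample {
  double PredictedTakenProb; // probability dumped by the compiler
  uint64_t TakenCount;       // times the branch was taken in the trace
  uint64_t NotTakenCount;    // times it fell through in the trace
};

// Execution-weighted mean absolute error between the predicted taken
// probability and the taken rate observed in the trace.
// Returns 0.0 when no branch executed at all.
double weightedProbabilityError(const std::vector<BranchSample> &Samples) {
  double ErrorSum = 0.0;
  uint64_t TotalExecs = 0;
  for (const BranchSample &S : Samples) {
    assert(S.PredictedTakenProb >= 0.0 && S.PredictedTakenProb <= 1.0);
    const uint64_t Execs = S.TakenCount + S.NotTakenCount;
    if (Execs == 0)
      continue; // never executed: nothing to compare against
    const double Observed = double(S.TakenCount) / double(Execs);
    ErrorSum += std::fabs(S.PredictedTakenProb - Observed) * double(Execs);
    TotalExecs += Execs;
  }
  return TotalExecs == 0 ? 0.0 : ErrorSum / double(TotalExecs);
}
```

Weighting by execution count keeps a badly mispredicted but rarely executed branch from dominating the score; an unweighted mean is the obvious alternative if cold-code accuracy matters more.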

Testing:
New llvm test to verify branch probabilities are dumped as expected on x86_64.
Ran check-llvm with x86 enabled.
Probability dumping was also tested using the scripts in the second patch of the series.

Diff Detail

Event Timeline

red1bluelost created this revision. Aug 25 2023, 2:10 PM

Herald added a project: Restricted Project. Aug 25 2023, 2:10 PM

Herald added subscribers: wlei, pengfei, hiraditya.

red1bluelost requested review of this revision. Aug 25 2023, 2:10 PM

Herald added a project: Restricted Project. Aug 25 2023, 2:10 PM

Herald added a subscriber: llvm-commits.

red1bluelost added a child revision: D158890: [PGO] Adds branch accuracy metric script and x86 branch tracing tool. Aug 25 2023, 2:10 PM

Harbormaster completed remote builds in B254993: Diff 553604. Aug 25 2023, 2:11 PM

red1bluelost edited the summary of this revision. Aug 25 2023, 2:11 PM

We have something similar internally, but didn't upstream because we are not sure if the use case is too narrow to justify burdening the code base with all the related complexity. cc @hoy

Rebased patch to fix buildkite.

davidxl added a subscriber: mtrofin. Aug 25 2023, 2:43 PM

This is great! Can you share what tracing methodology you use?

LGTM with some nits.

llvm/include/llvm/CodeGen/AsmPrinter.h
390	nit: init at declaration (both fields)
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1657	why not just use `EnableBranchProbabilityDumping`?
llvm/test/Transforms/PGOProfile/asm_emit_branch_prob.ll
65	nit: line up the CHECK

This revision is now accepted and ready to land. Aug 25 2023, 3:14 PM

mtrofin added inline comments. Aug 25 2023, 3:15 PM

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1760	what about virtual calls?

BTW, have you considered using -mbb-profile-dump (defined in AsmPrinter.cpp) - same intention, just MBB frequencies, and dumping to a csv; also assumes compilation happened with -basic-block-sections=labels

Harbormaster completed remote builds in B255002: Diff 553631. Aug 25 2023, 7:04 PM

In D158889#4618718, @mtrofin wrote:

This is great! Can you share what tracing methodology you use?

We used Pin Tool to collect branch traces and a basic python script to process the data and generate CSVs
and stats. Those are in the second patch D158890. We've been able to use these metrics on x86 and
internally with a slightly modified script but it should be able to support other targets that can collect
branch traces.

In D158889#4619073, @mtrofin wrote:

BTW, have you considered using -mbb-profile-dump (defined in AsmPrinter.cpp) - same intention, just MBB frequencies, and dumping to a csv; also assumes compilation happened with -basic-block-sections=labels

We had not considered that. We mainly focused on branch weights since they are a primary result of
SamplePGO. We'll definitely look into whether similar metrics can be done :)

In D158889#4618519, @wenlei wrote:

We have something similar internally, but didn't upstream because we are not sure if the use case is too narrow to justify burdening the code base with all the related complexity. cc @hoy

Yeah, we internally serialize machine block execution counts to the binary. Would it be helpful to write those counts into some bb section similar to the llvm_bb_addr_map section?
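To make the suggestion concrete, a hypothetical sketch of serializing per-block execution counts into a section payload. It borrows the ULEB128 variable-length integer encoding that `.llvm_bb_addr_map` uses for most of its per-block fields, but the record layout here (a block count, then block ID and execution count pairs) is an assumption, not a format LLVM defines. The helpers are stand-alone versions written for this sketch, not LLVM's `llvm/Support/LEB128.h` functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Append Value as ULEB128: 7 payload bits per byte, high bit set on every
// byte except the last.
void encodeULEB128(uint64_t Value, std::vector<uint8_t> &Out) {
  do {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80; // more bytes follow
    Out.push_back(Byte);
  } while (Value != 0);
}

// Decode one ULEB128 value starting at Pos and advance Pos past it.
uint64_t decodeULEB128(const std::vector<uint8_t> &In, size_t &Pos) {
  uint64_t Result = 0;
  unsigned Shift = 0;
  while (true) {
    assert(Pos < In.size() && "truncated ULEB128");
    assert(Shift < 64 && "ULEB128 value too long");
    const uint8_t Byte = In[Pos++];
    Result |= uint64_t(Byte & 0x7f) << Shift;
    if ((Byte & 0x80) == 0)
      return Result;
    Shift += 7;
  }
}

// Hypothetical layout: number of blocks, then (block ID, count) pairs.
std::vector<uint8_t>
encodeBlockCounts(const std::vector<std::pair<uint32_t, uint64_t>> &Counts) {
  std::vector<uint8_t> Out;
  encodeULEB128(Counts.size(), Out);
  for (const auto &[ID, Count] : Counts) {
    encodeULEB128(ID, Out);
    encodeULEB128(Count, Out);
  }
  return Out;
}
```

Small values take a single byte, which keeps such a section compact when most blocks are cold.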

Addresses nits.

Harbormaster completed remote builds in B255222: Diff 553931. Aug 28 2023, 7:49 AM

red1bluelost marked 2 inline comments as done. Aug 28 2023, 7:51 AM

red1bluelost added inline comments.

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1657	Since we are reading from a global variable and the following loop is probably hot code, I thought it would be better to cache it outside the loop rather than keep referencing the global.
1760	I don't think MachineInstr has a notion of virtual calls. Unless you mean something else or I'm missing something.
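The caching rationale in the line 1657 reply amounts to hoisting a loop-invariant load out of a hot loop by hand. A stand-alone sketch: the plain global `EnableDumping` stands in for the `cl::opt<bool>`, and `emitInstruction` and `countDumped` are hypothetical. When the loop body calls a function the compiler cannot see into, it generally has to assume the callee might modify the global and reload it on every iteration; copying the flag into a local `const` makes the invariance explicit.

```cpp
#include <cassert>
#include <vector>

// Stand-in for a global option such as a cl::opt<bool> (hypothetical name).
bool EnableDumping = false;

// Hypothetical per-instruction hook. In real code this would be an
// out-of-line call; if the compiler cannot see into it, it must assume the
// call may modify EnableDumping and re-load the flag on every iteration.
void emitInstruction(int /*Inst*/) {}

// Counts how many instructions would be dumped, reading the flag once.
int countDumped(const std::vector<int> &Insts) {
  const bool DumpEnabled = EnableDumping; // loop-invariant, hoisted by hand
  int Dumped = 0;
  for (int Inst : Insts) {
    emitInstruction(Inst);
    if (DumpEnabled)
      ++Dumped; // e.g. emit a label and record the branch probability here
  }
  return Dumped;
}
```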

In D158889#4619737, @hoy wrote:

In D158889#4618519, @wenlei wrote:

We have something similar internally, but didn't upstream because we are not sure if the use case is too narrow to justify burdening the code base with all the related complexity. cc @hoy

Yeah, we internally serialize machine block execution counts to the binary. Would it be helpful to write those counts into some bb section similar to the llvm_bb_addr_map section?

This is interesting and similar but our metrics are only focusing on branch weights at the moment. We've focused on branch weights so far since they are a direct result of Profile Loading.

At least for the current state of the metrics here and consumed in D158890, there is not a use for block counts but it could be a future extension of the metrics.

mtrofin added inline comments. Aug 28 2023, 8:58 AM

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1760	Correct, but (I mean this as brainstorming, not blocking this patch - which still lgtm) you could for instance record "there's an indirect call here, with <profile>" and correlate with your trace. I.e. "are we missing out on indirect call promotion opportunities".

In D158889#4621393, @red1bluelost wrote:

[...]

This is interesting and similar but our metrics are only focusing on branch weights at the moment. We've focused on branch weights so far since they are a direct result of Profile Loading.

At least for the current state of the metrics here and consumed in D158890, there is not a use for block counts but it could be a future extension of the metrics.

Thanks for the information. Maybe we can look into encoding block count metrics somewhere.

A bit more context about our use case: we use block counts to check if the compiler does a good job maintaining the input profile quality throughout its pipeline. Using branch weights would require an offline BFI computation to recover the block counts based on the branch weights, which may not give as accurate information as the profiler (such as the LBR profiler) gives.
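The offline computation mentioned here can be sketched for the simplest case. Assuming an acyclic CFG whose blocks are numbered in topological order, give the entry block frequency 1.0 and push each block's frequency to its successors in proportion to the edge probabilities. This is a hypothetical simplification: LLVM's BlockFrequencyInfo also has to handle loops, which this sketch rejects, and the result is only a relative frequency, so an entry count is still needed to get absolute counts. Each of those steps is a place where error can creep in relative to counts measured directly by a profiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Edge {
  size_t Succ; // index of the successor block
  double Prob; // probability of taking this edge out of its source block
};

// Blocks are assumed to be numbered in topological order (entry = 0), i.e.
// every edge goes from a lower to a higher index, so there are no loops.
std::vector<double>
blockFrequencies(const std::vector<std::vector<Edge>> &Succs) {
  std::vector<double> Freq(Succs.size(), 0.0);
  if (Freq.empty())
    return Freq;
  Freq[0] = 1.0; // entry block runs once per function invocation
  for (size_t B = 0; B < Succs.size(); ++B)
    for (const Edge &E : Succs[B]) {
      assert(E.Succ > B && "sketch only handles acyclic CFGs");
      Freq[E.Succ] += Freq[B] * E.Prob;
    }
  return Freq;
}
```

For a diamond where the entry branches 25%/75% and both arms rejoin, the join block gets frequency 1.0 again, as expected.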

red1bluelost added inline comments. Aug 29 2023, 1:46 PM

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1760	That definitely sounds like a good idea. Our encoding reserves space for exactly this type of extra info. Agreed it'd fit in a later patch :)

@mtrofin Would you be able to commit on my behalf? (Name: Micah Weston, Email: micahsweston@gmail.com)

MaskRay added a subscriber: MaskRay. Aug 30 2023, 2:34 PM

MaskRay added inline comments.

llvm/lib/MC/MCObjectFileInfo.cpp
1225	This logic hasn't been thoroughly tested. We need comdat and `!associated`. Please see the log of the code you copied from.

MaskRay requested changes to this revision. Aug 30 2023, 2:39 PM

MaskRay added inline comments.

llvm/test/Transforms/PGOProfile/asm_emit_branch_prob.ll
2	We likely need an explicit `-mtriple=x86_64` for llc commands and use `REQUIRES: x86-registered-target`.
69	Please don't test the string before `warning:`. It's not reliable in some environments when the tool is a symlink to a real executable.

This revision now requires changes to proceed. Aug 30 2023, 2:39 PM

Addresses two of the suggestions.

red1bluelost marked 2 inline comments as done. Aug 30 2023, 3:39 PM

red1bluelost added inline comments.

llvm/lib/MC/MCObjectFileInfo.cpp
1225	Could you elaborate on this? I'm very confused as to what you are referring to or how to go about fixing it.

Harbormaster completed remote builds in B255887: Diff 554846. Aug 30 2023, 4:22 PM

Fixes formatting issue with git clang-format.

Harbormaster completed remote builds in B255909: Diff 554874. Aug 30 2023, 6:37 PM

In D158889#4621393, @red1bluelost wrote:

[...]

This is interesting and similar but our metrics are only focusing on branch weights at the moment. We've focused on branch weights so far since they are a direct result of Profile Loading.

At least for the current state of the metrics here and consumed in D158890, there is not a use for block counts but it could be a future extension of the metrics.

We're probably among the few people that will actually benefit from something like this, but honestly I'm still a bit unsure whether the use case is common enough to justify built-in support like this.

However, if we make something like this part of llvm, I suggest we at least make it as general purpose as possible. A few comments related to that:

Try to incorporate block counts/frequencies as well. Most research on profile quality uses a block overlap metric, which relies on block counts rather than branch probabilities. Our internal version also uses block counts, as branch weights cannot represent the profile for branchless code.

Instead of coupling this with a specific consumer (Pin Tool in your case, in the next patch), I suggest we build general support for decoding such a metadata section, so that tools like llvm-objdump can be used to inspect its payload.
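For reference, one common formulation of the block overlap metric mentioned above normalizes each profile's block counts so they sum to 1 and then sums the per-block minimum: 1.0 means the two profiles distribute execution identically, and 0.0 means they share no executed blocks. The sketch assumes both profiles are indexed by the same block IDs; it illustrates the general metric and is not the internal implementation referred to here. It also shows why block counts cover branchless code: a straight-line block still contributes its count even though it has no branch probability to dump.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Overlap = sum over blocks of min(A[i] / sum(A), B[i] / sum(B)).
double blockOverlap(const std::vector<uint64_t> &A,
                    const std::vector<uint64_t> &B) {
  assert(A.size() == B.size() && "profiles must cover the same blocks");
  double SumA = 0.0, SumB = 0.0;
  for (size_t I = 0; I < A.size(); ++I) {
    SumA += double(A[I]);
    SumB += double(B[I]);
  }
  if (SumA == 0.0 || SumB == 0.0)
    return 0.0; // an empty profile shares no weight with anything
  double Overlap = 0.0;
  for (size_t I = 0; I < A.size(); ++I)
    Overlap += std::min(double(A[I]) / SumA, double(B[I]) / SumB);
  return Overlap;
}
```

Because both sides are normalized, a profile that is uniformly scaled (for example, a longer run of the same workload) still scores 1.0.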

In D158889#4633333, @wenlei wrote:

[...]

We're probably among the few people that will actually benefit from something like this, but honestly I'm still a bit unsure whether the use case is common enough to justify built-in support like this.

Even if relatively few groups would build up on such information, arguably PGO (as a technique) is probably the most impactful tool we have for improving performance. Like I think we're all observing here, profiles aren't necessarily well maintained throughout passes - it's actually easy to write a pass that accidentally and silently drops it, or that corrupts it somehow. I think having primitives in llvm that can help anyone interested build validation tooling and detect and fix such bugs would end up helping the community way more than their cost.

Maybe we can even, eventually, have a layer of defense doing this on a build bot with e.g. llvm-test-suite benchmarks. (I have an RFC for doing some even simpler validation transparently as part of opt/llc; I'll send it after the long weekend.)
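One example of the kind of validation primitive described above, sketched in C++ under stated assumptions: given per-block and per-edge execution counts (however they were obtained), a block's count should equal the sum of its outgoing edge counts. A pass that updates a block's count but not its edges, or vice versa, breaks this flow conservation. The `CountedEdge` type, `findFlowViolations`, and the relative tolerance are hypothetical. A check like this cannot catch a profile that was dropped and replaced with self-consistent defaults; that needs a comparison against a trace or an earlier snapshot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct CountedEdge {
  size_t From, To;
  uint64_t Count;
};

// Returns the blocks whose count differs from the sum of their outgoing edge
// counts by more than Tolerance (a fraction of the block's own count).
// Blocks without successors (e.g. returns) are skipped.
std::vector<size_t> findFlowViolations(const std::vector<uint64_t> &BlockCounts,
                                       const std::vector<CountedEdge> &Edges,
                                       double Tolerance) {
  std::vector<uint64_t> OutSum(BlockCounts.size(), 0);
  std::vector<bool> HasSucc(BlockCounts.size(), false);
  for (const CountedEdge &E : Edges) {
    assert(E.From < BlockCounts.size() && E.To < BlockCounts.size());
    OutSum[E.From] += E.Count;
    HasSucc[E.From] = true;
  }
  std::vector<size_t> Bad;
  for (size_t B = 0; B < BlockCounts.size(); ++B) {
    if (!HasSucc[B])
      continue;
    const double Diff = double(BlockCounts[B]) - double(OutSum[B]);
    const double Allowed = Tolerance * double(BlockCounts[B]);
    if (Diff > Allowed || -Diff > Allowed)
      Bad.push_back(B);
  }
  return Bad;
}
```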

However, if we make something like this part of llvm, I suggest we at least make it as general purpose as possible. A few comments related to that:

Try to incorporate block counts/frequencies as well. Most research on profile quality uses a block overlap metric, which relies on block counts rather than branch probabilities. Our internal version also uses block counts, as branch weights cannot represent the profile for branchless code.

Instead of coupling this with a specific consumer (Pin Tool in your case, in the next patch), I suggest we build general support for decoding such a metadata section, so that tools like llvm-objdump can be used to inspect its payload.

+1!

Actually, @red1bluelost, sorry if this expands the scope of your work, but since we're talking design choices, would summarizing this in an RFC be a good step? Then we can discuss the motivation and design choices in the community, and since there is similar work (@wenlei's earlier remark), their experience will surely help!

In D158889#4633333, @wenlei wrote:

We're probably among the few people that will actually benefit from something like this, but honestly I'm still a bit unsure whether the use case is common enough to justify built-in support like this.

Mircea makes a good point about it being easy for a pass to corrupt PGO information the same way it can corrupt debug information. Having a tool to track or debug PGO info like branch_weights, even if it's something different from this patch, seems beneficial for upstream.

In D158889#4633333, @wenlei wrote:

Try to incorporate block counts/frequencies as well. Most research on profile quality uses a block overlap metric, which relies on block counts rather than branch probabilities. Our internal version also uses block counts, as branch weights cannot represent the profile for branchless code.

I'll try to look into this.

In D158889#4633333, @wenlei wrote:

Instead of coupling this with a specific consumer (Pin Tool in your case, in the next patch), I suggest we build general support for decoding such a metadata section, so that tools like llvm-objdump can be used to inspect its payload.

For the metrics, we need an execution branch trace for comparison. We've found Pin Tool works for x86, and we have a tool for one of our internal targets. We hope to keep the tracing part minimally coupled to the compiler metadata.

Makes sense to add general support for extracting the section info. I can try to look into it.
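A sketch of the generic decoding support discussed here, assuming a hypothetical payload layout: fixed-size 16-byte little-endian records, each an 8-byte branch address followed by an 8-byte IEEE-754 double probability. This layout is made up for illustration (the body of `emitBranchProbabilitySection` is not shown in this excerpt), and `decodeBranchProbSection` is a hypothetical name. A decoder along these lines could be shared by llvm-objdump and by trace-comparison scripts, so the consumer is not tied to one tracer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

static_assert(sizeof(double) == 8, "sketch assumes 64-bit doubles");

struct BranchProbRecord {
  uint64_t Address;
  double Probability;
};

// Read a little-endian uint64 regardless of host byte order.
static uint64_t readLE64(const uint8_t *P) {
  uint64_t V = 0;
  for (int I = 7; I >= 0; --I)
    V = (V << 8) | P[I];
  return V;
}

// Decode the hypothetical 16-byte records. Returns std::nullopt if the
// payload size is not a multiple of the record size.
std::optional<std::vector<BranchProbRecord>>
decodeBranchProbSection(const std::vector<uint8_t> &Data) {
  constexpr size_t RecordSize = 16;
  if (Data.size() % RecordSize != 0)
    return std::nullopt;
  std::vector<BranchProbRecord> Records;
  for (size_t Off = 0; Off < Data.size(); Off += RecordSize) {
    BranchProbRecord R;
    R.Address = readLE64(&Data[Off]);
    const uint64_t Bits = readLE64(&Data[Off + 8]);
    std::memcpy(&R.Probability, &Bits, sizeof(double)); // reinterpret bits
    Records.push_back(R);
  }
  return Records;
}
```

In a real object file the address field would typically be filled in by a relocation, so a decoder working on unlinked objects would also need to apply or report those relocations.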

In D158889#4633353, @mtrofin wrote:

Even if relatively few groups would build up on such information, arguably PGO (as a technique) is probably the most impactful tool we have for improving performance. Like I think we're all observing here, profiles aren't necessarily well maintained throughout passes - it's actually easy to write a pass that accidentally and silently drops it, or that corrupts it somehow. I think having primitives in llvm that can help anyone interested build validation tooling and detect and fix such bugs would end up helping the community way more than their cost.

Agreed that it is easy to corrupt. The team I'm with at MediaTek has found cases in LLVM passes and our backend where branch weights are mishandled or not updated which was a driving factor to developing metrics for it.

In D158889#4633353, @mtrofin wrote:

Maybe we can even, eventually, have a layer of defense doing this on a build bot with e.g. llvm-test-suite benchmarks. (I have an RFC for doing some even simpler validation transparently as part of opt/llc; I'll send it after the long weekend.)

Internally, we are planning to track some of our benchmarks using these metrics, since we've found them helpful for tracking changes and regressions in our PGO data.

In D158889#4633353, @mtrofin wrote:

Actually, @red1bluelost, sorry if this expands the scope of your work, but since we're talking design choices, would summarizing this in an RFC be a good step? Then we can discuss the motivation and design choices in the community, and since there is similar work (@wenlei's earlier remark), their experience will surely help!

That makes sense. I can see about getting stuff written up. It will take some time. I'll also be giving a talk at the developer meeting in October on how we've been using the metrics and where we think it can help others.

[...]

In D158889#4633353, @mtrofin wrote:

Actually, @red1bluelost, sorry if this expands the scope of your work, but since we're talking design choices, would summarizing this in an RFC be a good step? Then we can discuss the motivation and design choices in the community, and since there is similar work (@wenlei's earlier remark), their experience will surely help!

That makes sense. I can see about getting stuff written up. It will take some time. I'll also be giving a talk at the developer meeting in October on how we've been using the metrics and where we think it can help others.

BTW, the "Practical Compiler Optimizations for Warehouse-Scale Applications" workshop at the conference would also be good to attend; the topic is very much in scope!

In D158889#4635143, @red1bluelost wrote:

[...]

In D158889#4633353, @mtrofin wrote:

Actually, @red1bluelost, sorry if this expands the scope of your work, but since we're talking design choices, would summarizing this in an RFC be a good step? Then we can discuss the motivation and design choices in the community, and since there is similar work (@wenlei's earlier remark), their experience will surely help!

That makes sense. I can see about getting stuff written up. It will take some time. I'll also be giving a talk at the developer meeting in October on how we've been using the metrics and where we think it can help others.

Thanks. If we make this general enough, I agree that it'd be useful. Looking forward to RFC and related discussions.

LLVM generally doesn't do a good job at profile maintenance (updating branch_weights throughout the optimization pipeline), and a good step towards improving that would be having a way to quantify profile quality, which is something like this would help with.

Matt added a subscriber: Matt. Sep 5 2023, 3:00 PM

MatzeB added a subscriber: MatzeB. Sep 11 2023, 9:58 AM

Closing this review and the second (D158890) as I work on the RFC. By the time a second review iteration is up, it will be via GitHub PR. Hoping to get the RFC out in the next 2-3 weeks.

red1bluelost mentioned this in D158890: [PGO] Adds branch accuracy metric script and x86 branch tracing tool. Sep 15 2023, 10:29 AM

Re-opening for RFC: https://discourse.llvm.org/t/rfc-pgo-accuracy-metrics-emitting-and-evaluating-branch-and-block-analysis/73902

Revision Contents

Path                                                      Size
llvm/include/llvm/CodeGen/AsmPrinter.h                    16 lines
llvm/include/llvm/MC/MCObjectFileInfo.h                   5 lines
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp                76 lines
llvm/lib/MC/MCObjectFileInfo.cpp                          22 lines
llvm/lib/Target/X86/X86InstrInfo.h                        2 lines
llvm/lib/Target/X86/X86InstrInfo.cpp                      8 lines
llvm/test/Transforms/PGOProfile/asm_emit_branch_prob.ll   69 lines

Diff 554874

llvm/include/llvm/CodeGen/AsmPrinter.h

[375 unchanged lines not shown]
   void recordSled(MCSymbol *Sled, const MachineInstr &MI, SledKind Kind,
                   uint8_t Version = 0);

   /// Emit a table with all XRay instrumentation points.
   void emitXRayTable();

   void emitPatchableFunctionEntries();

   //===------------------------------------------------------------------===//
+  // Branch Probability Dumping Implementation
+  //===------------------------------------------------------------------===//
+
+  struct BranchProbEntry {
+    const MCSymbol *Sym = nullptr;
+    double Probability = 0.0;
+  };
+
+  // All the branch probabilities to be emitted.
+  std::vector<BranchProbEntry> BranchProbs;
+
+  void emitLabelAndRecordBranchProb(double Probability);
+
+  void emitBranchProbabilitySection();
+
+  //===------------------------------------------------------------------===//
   // MachineFunctionPass Implementation.
   //===------------------------------------------------------------------===//

   /// Record analysis usage.
   void getAnalysisUsage(AnalysisUsage &AU) const override;

   /// Set up the AsmPrinter when we are working on a new module. If your pass
   /// overrides this, it must make sure to explicitly call this implementation.
[495 unchanged lines not shown]

llvm/include/llvm/MC/MCObjectFileInfo.h

[170 unchanged lines not shown; enclosing scope: protected:]

   /// Section containing metadata on function stack sizes.
   MCSection *StackSizesSection = nullptr;

   /// Section for pseudo probe information used by AutoFDO
   MCSection *PseudoProbeSection = nullptr;
   MCSection *PseudoProbeDescSection = nullptr;

+  /// Section containing dumped branch probabilities
+  MCSection *BranchProbabilitySection = nullptr;
+
   // Section for metadata of llvm statistics.
   MCSection *LLVMStatsSection = nullptr;

   // ELF specific sections.
   MCSection *DataRelROSection = nullptr;
   MCSection *MergeableConst4Section = nullptr;
   MCSection *MergeableConst8Section = nullptr;
   MCSection *MergeableConst16Section = nullptr;
[170 unchanged lines not shown; enclosing scope: public:]
   MCSection *getBBAddrMapSection(const MCSection &TextSec) const;

   MCSection *getKCFITrapSection(const MCSection &TextSec) const;

   MCSection *getPseudoProbeSection(const MCSection &TextSec) const;

   MCSection *getPseudoProbeDescSection(StringRef FuncName) const;

+  MCSection *getBranchProbabilitySection(const MCSection &TextSec) const;
+
   MCSection *getLLVMStatsSection() const;

   MCSection *getPCSection(StringRef Name, const MCSection *TextSec) const;

   // ELF specific sections.
   MCSection *getDataRelROSection() const { return DataRelROSection; }
   const MCSection *getMergeableConst4Section() const {
     return MergeableConst4Section;
[123 unchanged lines not shown]

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp

Show All 16 Lines
#include "PseudoProbePrinter.h"		#include "PseudoProbePrinter.h"
#include "WasmException.h"		#include "WasmException.h"
#include "WinCFGuard.h"		#include "WinCFGuard.h"
#include "WinException.h"		#include "WinException.h"
#include "llvm/ADT/APFloat.h"		#include "llvm/ADT/APFloat.h"
#include "llvm/ADT/APInt.h"		#include "llvm/ADT/APInt.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
		#include "llvm/ADT/ScopeExit.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallString.h"		#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/StringExtras.h"		#include "llvm/ADT/StringExtras.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/TinyPtrVector.h"		#include "llvm/ADT/TinyPtrVector.h"
#include "llvm/ADT/Twine.h"		#include "llvm/ADT/Twine.h"
#include "llvm/Analysis/ConstantFolding.h"		#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/Analysis/MemoryLocation.h"		#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"		#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/BinaryFormat/COFF.h"		#include "llvm/BinaryFormat/COFF.h"
#include "llvm/BinaryFormat/Dwarf.h"		#include "llvm/BinaryFormat/Dwarf.h"
#include "llvm/BinaryFormat/ELF.h"		#include "llvm/BinaryFormat/ELF.h"
#include "llvm/CodeGen/GCMetadata.h"		#include "llvm/CodeGen/GCMetadata.h"
#include "llvm/CodeGen/GCMetadataPrinter.h"		#include "llvm/CodeGen/GCMetadataPrinter.h"
#include "llvm/CodeGen/LazyMachineBlockFrequencyInfo.h"		#include "llvm/CodeGen/LazyMachineBlockFrequencyInfo.h"
#include "llvm/CodeGen/MachineBasicBlock.h"		#include "llvm/CodeGen/MachineBasicBlock.h"
		#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
#include "llvm/CodeGen/MachineConstantPool.h"		#include "llvm/CodeGen/MachineConstantPool.h"
#include "llvm/CodeGen/MachineDominators.h"		#include "llvm/CodeGen/MachineDominators.h"
#include "llvm/CodeGen/MachineFrameInfo.h"		#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunction.h"		#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFunctionPass.h"		#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstr.h"		#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineInstrBundle.h"		#include "llvm/CodeGen/MachineInstrBundle.h"
#include "llvm/CodeGen/MachineJumpTableInfo.h"		#include "llvm/CodeGen/MachineJumpTableInfo.h"
▲ Show 20 Lines • Show All 63 Lines • ▼ Show 20 Lines
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetLoweringObjectFile.h"		#include "llvm/Target/TargetLoweringObjectFile.h"
#include "llvm/Target/TargetMachine.h"		#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetOptions.h"		#include "llvm/Target/TargetOptions.h"
#include "llvm/TargetParser/Triple.h"		#include "llvm/TargetParser/Triple.h"
#include <algorithm>		#include <algorithm>
#include <cassert>		#include <cassert>
#include <cinttypes>		#include <cinttypes>
		#include <cmath>
#include <cstdint>		#include <cstdint>
#include <iterator>		#include <iterator>
#include <memory>		#include <memory>
#include <optional>		#include <optional>
#include <string>		#include <string>
#include <utility>		#include <utility>
#include <vector>		#include <vector>

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "asm-printer"		#define DEBUG_TYPE "asm-printer"

static cl::opt<std::string> BasicBlockProfileDump(		static cl::opt<std::string> BasicBlockProfileDump(
"mbb-profile-dump", cl::Hidden,		"mbb-profile-dump", cl::Hidden,
cl::desc("Basic block profile dump for external cost modelling. If "		cl::desc("Basic block profile dump for external cost modelling. If "
"matching up BBs with afterwards, the compilation must be "		"matching up BBs with afterwards, the compilation must be "
"performed with -basic-block-sections=labels. Enabling this "		"performed with -basic-block-sections=labels. Enabling this "
"flag during in-process ThinLTO is not supported."));		"flag during in-process ThinLTO is not supported."));

		static cl::opt<bool> EnableBranchProbabilityDumping(
		"emit-asm-branch-probabilities", cl::init(false), cl::Hidden,
		cl::desc("Dump branch probabilities during asm printing to debug section"));

const char DWARFGroupName[] = "dwarf";
const char DWARFGroupDescription[] = "DWARF Emission";
const char DbgTimerName[] = "emit";
const char DbgTimerDescription[] = "Debug Info Emission";
const char EHTimerName[] = "write_exception";
const char EHTimerDescription[] = "DWARF Exception Writer";
const char CFGuardName[] = "Control Flow Guard";
const char CFGuardDescription[] = "Control Flow Guard";
[1,443 unchanged lines hidden; context: static bool needFuncLabels(const MachineFunction &MF) {]
// We might emit an EH table that uses function begin and end labels even if
// we don't have any landingpads.
if (!MF.getFunction().hasPersonalityFn())
return false;
return !isNoOpWithoutInvoke(
classifyEHPersonality(MF.getFunction().getPersonalityFn()));
}

		/// Calculates the probability of taking the branch target of conditional
		/// branch MI. MBPI must be non-null; the caller checks that the analysis is
		/// available.
		static std::optional<double>
		getBranchProbForCondBr(const MachineInstr &MI,
		MachineBranchProbabilityInfo *MBPI) {
		assert(MI.isConditionalBranch() && !MI.isCall());

		const MachineBasicBlock *Parent = MI.getParent();
		const TargetInstrInfo *TII =
		Parent->getParent()->getSubtarget().getInstrInfo();
		assert(TII);

		const MachineBasicBlock *Target = TII->getBranchDestBlock(MI);
		const auto TargetProb = MBPI->getEdgeProbability(Parent, Target);
		const double Probability =
		double(TargetProb.getNumerator()) / double(TargetProb.getDenominator());
		return Probability;
		}

/// EmitFunctionBody - This method emits the body and trailer for a
/// function.
void AsmPrinter::emitFunctionBody() {
emitFunctionHeader();

// Emit target-specific gunk before the function body.
emitFunctionBodyStart();

[16 unchanged lines hidden; context: void AsmPrinter::emitFunctionBody() {]
}

// Print out code for the function.
bool HasAnyRealCode = false;
int NumInstsInFunction = 0;
bool IsEHa = MMI->getModule()->getModuleFlag("eh-asynch");

bool CanDoExtraAnalysis = ORE->allowExtraAnalysis(DEBUG_TYPE);
		const bool CanDumpBranchProbs = EnableBranchProbabilityDumping;
		mtrofin (inline comment, Not Done): why not just use `EnableBranchProbabilityDumping`?
		red1bluelost (author, Done): Since we are reading from a global variable and the following loop is probably hot code, I thought it would be better to cache it outside the loop rather than keep referencing the global.
for (auto &MBB : *MF) {
// Print a label for the basic block.
emitBasicBlockStart(MBB);
DenseMap<StringRef, unsigned> MnemonicCounts;
for (auto &MI : MBB) {
// Print the assembly for the instruction.
if (!MI.isPosition() && !MI.isImplicitDef() && !MI.isKill() &&
!MI.isDebugInstr()) {
[82 unchanged lines hidden; context: for (auto &MI : MBB) {]
case TargetOpcode::ARITH_FENCE:
if (isVerbose())
OutStreamer->emitRawComment("ARITH_FENCE");
break;
case TargetOpcode::MEMBARRIER:
OutStreamer->emitRawComment("MEMBARRIER");
break;
default:
		if (CanDumpBranchProbs && MI.isConditionalBranch() && !MI.isCall()) {
		if (auto *MBPI =
		getAnalysisIfAvailable<MachineBranchProbabilityInfo>()) {
		if (auto Prob = getBranchProbForCondBr(MI, MBPI))
		emitLabelAndRecordBranchProb(*Prob);
		mtrofin (inline comment, Not Done): what about virtual calls?
		red1bluelost (author, Not Done): I don't think MachineInstr has a notion of virtual calls, unless you mean something else or I'm missing something.
		mtrofin (inline comment, Not Done): Correct, but (I mean this as brainstorming, not blocking this patch - which still lgtm) you could for instance record "there's an indirect call here, with <profile>" and correlate with your trace. I.e. "are we missing out on indirect call promotion opportunities".
		red1bluelost (author, Done): That definitely sounds like a good idea. Our encoding reserves space for exactly this kind of extra info. Agreed it'd fit in a later patch :)
		}
		}
emitInstruction(&MI);
if (CanDoExtraAnalysis) {
MCInst MCI;
MCI.setOpcode(MI.getOpcode());
auto Name = OutStreamer->getMnemonic(MCI);
auto I = MnemonicCounts.insert({Name, 0u});
I.first->second++;
}
[175 unchanged lines hidden; context: void AsmPrinter::emitFunctionBody() {]
// Emit section containing stack size metadata.
emitStackSizeSection(*MF);

// Emit .su file containing function stack size information.
emitStackUsage(*MF);

emitPatchableFunctionEntries();

		emitBranchProbabilitySection();

if (isVerbose())
OutStreamer->getCommentOS() << "-- End function\n";

OutStreamer->addBlankLine();

// Output MBB ids, function names, and frequencies if the flag to dump
// MBB profile information has been set
if (MBBProfileDumpFileOutput) {
[2,205 unchanged lines hidden; context: if (TM.getTargetTriple().isOSBinFormatELF()) {]
OutStreamer->switchSection(OutContext.getELFSection(
"__patchable_function_entries", ELF::SHT_PROGBITS, Flags, 0, GroupName,
F.hasComdat(), MCSection::NonUniqueID, LinkedToSym));
emitAlignment(Align(PointerSize));
OutStreamer->emitSymbolValue(CurrentPatchableFunctionEntrySym, PointerSize);
}
}

		void AsmPrinter::emitLabelAndRecordBranchProb(const double Probability) {
		assert(Probability >= 0.0 && Probability <= 1.0 &&
		"branch probability should be in range [0.0,1.0]");

		MCSymbol *Label = OutStreamer->getContext().createTempSymbol("branch_prob");
		Label->setUsedInReloc();
		OutStreamer->emitLabel(Label);

		BranchProbs.push_back({Label, Probability});
		}

		void AsmPrinter::emitBranchProbabilitySection() {
		if (BranchProbs.empty())
		return;

		MCSection *BranchProbSec =
		getObjFileLowering().getBranchProbabilitySection(*getCurrentSection());
		if (!BranchProbSec)
		return;

		unsigned WordSizeBytes = MAI->getCodePointerSize();

		OutStreamer->pushSection();
		auto OnExit = make_scope_exit([&] { OutStreamer->popSection(); });
		OutStreamer->switchSection(BranchProbSec);

		for (auto [Label, Prob] : BranchProbs) {
		OutStreamer->emitSymbolValue(Label, WordSizeBytes);

		// Encode all data within 4 bytes to support 32 and 64 bit targets.
		// Convert branch probability to value 0 - 10000 using 14 bits.
		const long Val = lround(Prob * 10000.0);
		assert(Val >= 0 && Val <= 10000);
		const uint32_t Encoded = static_cast<uint32_t>(Val & 0x3fff);
		OutStreamer->emitInt32(Encoded);

		// Pad the record to a full word; the padding is reserved for future use.
		OutStreamer->emitZeros(WordSizeBytes - sizeof(Encoded));
		}
		}

uint16_t AsmPrinter::getDwarfVersion() const {
return OutStreamer->getContext().getDwarfVersion();
}

void AsmPrinter::setDwarfVersion(uint16_t Version) {
OutStreamer->getContext().setDwarfVersion(Version);
}

[19 unchanged lines hidden]

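For reference, here is a minimal standalone sketch of the bytes that one record in `.branch_probabilities` should contain, based on emitLabelAndRecordBranchProb and emitBranchProbabilitySection above. It assumes a little-endian target and derives the probability directly from IR `branch_weights`; the patch instead reads the edge probability from MachineBranchProbabilityInfo, which may round slightly differently. The helper names (probabilityFromWeights, encodeProbability, encodeRecord) are hypothetical and not part of the patch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper: probability of the first successor given raw
// branch_weights (assumption: weights are used directly, no rescaling).
double probabilityFromWeights(uint32_t Taken, uint32_t NotTaken) {
  return static_cast<double>(Taken) /
         (static_cast<double>(Taken) + static_cast<double>(NotTaken));
}

// Hypothetical helper: scales a probability in [0.0, 1.0] to an integer in
// [0, 10000] and keeps the low 14 bits, mirroring the patch's encoding.
uint32_t encodeProbability(double Prob) {
  assert(Prob >= 0.0 && Prob <= 1.0 && "probability out of range");
  const long Val = std::lround(Prob * 10000.0);
  return static_cast<uint32_t>(Val & 0x3fff);
}

// Hypothetical helper: one record as raw bytes on a little-endian target:
// a WordSize-byte address, the 32-bit encoded probability, then zero
// padding so the record spans exactly two words.
std::vector<uint8_t> encodeRecord(uint64_t Address, double Prob,
                                  unsigned WordSize) {
  assert((WordSize == 4 || WordSize == 8) && "unsupported word size");
  std::vector<uint8_t> Bytes;
  for (unsigned I = 0; I < WordSize; ++I)
    Bytes.push_back(static_cast<uint8_t>(Address >> (8 * I)));
  const uint32_t Encoded = encodeProbability(Prob);
  for (unsigned I = 0; I < 4; ++I)
    Bytes.push_back(static_cast<uint8_t>(Encoded >> (8 * I)));
  Bytes.resize(2 * WordSize, 0); // padding, reserved for future use
  return Bytes;
}
```

With an 8-byte word each record is 16 bytes, which matches the 16-byte rows in the test's hex dump; with a 4-byte word the padding disappears and each record is 8 bytes.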
llvm/lib/MC/MCObjectFileInfo.cpp

[530 unchanged lines hidden; context: EHFrameSection =]
Ctx->getELFSection(".eh_frame", EHSectionType, EHSectionFlags);

StackSizesSection = Ctx->getELFSection(".stack_sizes", ELF::SHT_PROGBITS, 0);

PseudoProbeSection = Ctx->getELFSection(".pseudo_probe", DebugSecType, 0);
PseudoProbeDescSection =
Ctx->getELFSection(".pseudo_probe_desc", DebugSecType, 0);

		BranchProbabilitySection =
		Ctx->getELFSection(".branch_probabilities", DebugSecType, 0);

LLVMStatsSection = Ctx->getELFSection(".llvm_stats", ELF::SHT_PROGBITS, 0);
}

void MCObjectFileInfo::initGOFFMCObjectFileInfo(const Triple &T) {
TextSection =
Ctx->getGOFFSection(".text", SectionKind::getText(), nullptr, nullptr);
BSSSection =
Ctx->getGOFFSection(".bss", SectionKind::getBSS(), nullptr, nullptr);
[660 unchanged lines hidden; context: if (Ctx->getTargetTriple().supportsCOMDAT() && !FuncName.empty()) {]
S->getEntrySize(),
S->getName() + "_" + FuncName,
/*IsComdat=*/true);
}
}
return PseudoProbeDescSection;
}

		MCSection *
		MCObjectFileInfo::getBranchProbabilitySection(const MCSection &TextSec) const {
		if (Ctx->getObjectFileType() != MCContext::IsELF)
		return BranchProbabilitySection;

		const auto &ElfSec = static_cast<const MCSectionELF &>(TextSec);
		unsigned Flags = ELF::SHF_LINK_ORDER;
		StringRef GroupName;
		MaskRay (inline comment, Not Done): This logic hasn't been thoroughly tested. We need comdat and `!associated`. Please see the log of the code you copied from.
		red1bluelost (author, Done): Could you elaborate on this? I'm very confused as to what you are referring to or how to go about fixing it.
		if (const MCSymbol *Group = ElfSec.getGroup()) {
		GroupName = Group->getName();
		Flags |= ELF::SHF_GROUP;
		}

		return Ctx->getELFSection(BranchProbabilitySection->getName(),
		ELF::SHT_PROGBITS, Flags, 0, GroupName, true,
		ElfSec.getUniqueID(),
		cast<MCSymbolELF>(TextSec.getBeginSymbol()));
		}

MCSection *MCObjectFileInfo::getLLVMStatsSection() const {
return LLVMStatsSection;
}

MCSection *MCObjectFileInfo::getPCSection(StringRef Name,
const MCSection *TextSec) const {
if (Ctx->getObjectFileType() != MCContext::IsELF)
return nullptr;
[17 unchanged lines hidden]

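The flag logic in getBranchProbabilitySection can be summarized with plain ELF constants. This is a sketch: `SHF_LINK_ORDER` (0x80) and `SHF_GROUP` (0x200) are the standard ELF gABI flag values, and the helper name is hypothetical. The presumable intent is that `SHF_LINK_ORDER` ties each record section to the function's text section (passed as the linked-to symbol), while `SHF_GROUP` places it in the same COMDAT group when the function has one, so linkers that honour these flags can discard the records together with their function.

```cpp
#include <cassert>
#include <cstdint>

// Standard ELF section flag values (ELF gABI).
constexpr uint64_t SHF_LINK_ORDER = 0x80;
constexpr uint64_t SHF_GROUP = 0x200;

// Hypothetical helper mirroring getBranchProbabilitySection: the section is
// always SHF_LINK_ORDER and additionally SHF_GROUP when the associated text
// section belongs to a group.
uint64_t branchProbSectionFlags(bool TextSectionInGroup) {
  uint64_t Flags = SHF_LINK_ORDER;
  if (TextSectionInGroup)
    Flags |= SHF_GROUP;
  return Flags;
}
```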
llvm/lib/Target/X86/X86InstrInfo.h

[321 unchanged lines hidden; context: void replaceBranchWithTailCall(MachineBasicBlock &MBB,]
SmallVectorImpl<MachineOperand> &Cond,
const MachineInstr &TailCall) const override;

bool analyzeBranch(MachineBasicBlock &MBB, MachineBasicBlock *&TBB,
MachineBasicBlock *&FBB,
SmallVectorImpl<MachineOperand> &Cond,
bool AllowModify) const override;

		MachineBasicBlock *getBranchDestBlock(const MachineInstr &MI) const override;

int getJumpTableIndex(const MachineInstr &MI) const override;

std::optional<ExtAddrMode>
getAddrModeFromMemoryOp(const MachineInstr &MemI,
const TargetRegisterInfo *TRI) const override;

bool getConstValDefinedInReg(const MachineInstr &MI, const Register Reg,
int64_t &ImmVal) const override;
[364 unchanged lines hidden]

llvm/lib/Target/X86/X86InstrInfo.cpp

[3,208 unchanged lines hidden; context: bool X86InstrInfo::analyzeBranch(MachineBasicBlock &MBB,]
MachineBasicBlock *&TBB,
MachineBasicBlock *&FBB,
SmallVectorImpl<MachineOperand> &Cond,
bool AllowModify) const {
SmallVector<MachineInstr *, 4> CondBranches;
return AnalyzeBranchImpl(MBB, TBB, FBB, Cond, CondBranches, AllowModify);
}

		MachineBasicBlock *
		X86InstrInfo::getBranchDestBlock(const MachineInstr &MI) const {
		assert(MI.getDesc().isBranch() && !MI.getDesc().isIndirectBranch() &&
		!MI.getDesc().isCall() && "Unexpected opcode!");
		// Direct jumps and direct conditional jumps have their target in the first
		// operand.
		return MI.getOperand(0).getMBB();
		}

static int getJumpTableIndexFromAddr(const MachineInstr &MI) {
const MCInstrDesc &Desc = MI.getDesc();
int MemRefBegin = X86II::getMemoryOperandNo(Desc.TSFlags);
assert(MemRefBegin >= 0 && "instr should have memory operand");
MemRefBegin += X86II::getOperandBias(Desc);

const MachineOperand &MO = MI.getOperand(MemRefBegin + X86::AddrDisp);
if (!MO.isJTI())
[6,711 unchanged lines hidden]

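To illustrate what the new getBranchDestBlock hook supplies to getBranchProbForCondBr, here is a toy model. ToyBlock, ToyInstr, toyBranchDestBlock and toyTakenProbability are hypothetical simplifications, not LLVM types or APIs; the per-block successor map stands in for MachineBranchProbabilityInfo::getEdgeProbability, and block ids stand in for MachineBasicBlock pointers.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in for a machine basic block.
struct ToyBlock {
  int Id;
  // Successor block id -> edge probability (the role played by
  // MachineBranchProbabilityInfo in the patch).
  std::map<int, double> SuccProbs;
};

// Hypothetical stand-in for a machine instruction.
struct ToyInstr {
  bool IsBranch = false;
  bool IsIndirect = false;
  bool IsCall = false;
  std::vector<int> Operands; // operand 0 holds the target block id
};

// Same shape as X86InstrInfo::getBranchDestBlock: for a direct
// (conditional or unconditional) branch, the target is operand 0.
int toyBranchDestBlock(const ToyInstr &MI) {
  assert(MI.IsBranch && !MI.IsIndirect && !MI.IsCall && "unexpected opcode");
  return MI.Operands.at(0);
}

// Same shape as getBranchProbForCondBr: probability of the edge from the
// parent block to the branch target.
double toyTakenProbability(const ToyBlock &Parent, const ToyInstr &MI) {
  return Parent.SuccProbs.at(toyBranchDestBlock(MI));
}
```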
llvm/test/Transforms/PGOProfile/asm_emit_branch_prob.ll

This file was added.

				; REQUIRES: x86_64-linux
				; REQUIRES: x86-registered-target
				MaskRay (inline comment, Done): We likely need an explicit `-mtriple=x86_64` for llc commands and use `REQUIRES: x86-registered-target`.

				; RUN: llc < %s -O1 -mtriple=x86_64 -filetype=obj -o %t -emit-asm-branch-probabilities
				; RUN: llvm-readelf --hex-dump=.branch_probabilities %t \
				; RUN: | FileCheck %s --check-prefix=CHECK-BP-SEC

				; RUN: llc < %s -O1 -mtriple=x86_64 -filetype=obj -o %t
				; RUN: llvm-readelf --hex-dump=.branch_probabilities %t 2>&1 \
				; RUN: | FileCheck %s --check-prefix=CHECK-NO-BP-SEC

				; When enabled, each conditional branch gets a record holding the jump's
				; address followed by its probability scaled to an integer from 0 to 10000
				; (stored in the low 14 bits of a 32-bit word).

				; // Test original source code
				; void sink(int &);
				; int test(int a, int b, int c) {
				; if (a > 100)
				; sink(a);
				; sink(c);
				; if (b > 100)
				; sink(b);
				; return a + b + c;
				; }

				define dso_local noundef i32 @_Z4testiii(i32 noundef %0, i32 noundef %1, i32 noundef %2) local_unnamed_addr {
				%4 = alloca i32, align 4
				%5 = alloca i32, align 4
				%6 = alloca i32, align 4
				store i32 %0, ptr %4, align 4
				store i32 %1, ptr %5, align 4
				store i32 %2, ptr %6, align 4
				%7 = icmp sgt i32 %0, 100
				br i1 %7, label %8, label %9, !prof !0

				8: ; preds = %3
				call void @_Z4sinkRi(ptr noundef nonnull align 4 dereferenceable(4) %4)
				br label %9

				9: ; preds = %8, %3
				call void @_Z4sinkRi(ptr noundef nonnull align 4 dereferenceable(4) %6)
				%10 = icmp sgt i32 %1, 100
				br i1 %10, label %11, label %13, !prof !1

				11: ; preds = %9
				call void @_Z4sinkRi(ptr noundef nonnull align 4 dereferenceable(4) %5)
				%12 = load i32, ptr %5, align 4
				br label %13

				13: ; preds = %11, %9
				%14 = phi i32 [ %12, %11 ], [ %1, %9 ]
				%15 = load i32, ptr %4, align 4
				%16 = add nsw i32 %14, %15
				%17 = load i32, ptr %6, align 4
				%18 = add nsw i32 %16, %17
				ret i32 %18
				}

				declare void @_Z4sinkRi(ptr noundef nonnull align 4 dereferenceable(4)) local_unnamed_addr #1


				!0 = !{!"branch_weights", i32 1000, i32 3000}
				!1 = !{!"branch_weights", i32 3600, i32 400}

				; CHECK-BP-SEC: Hex dump of section '.branch_probabilities':
				mtrofin (inline comment, Done): nit: line up the CHECK
				; CHECK-BP-SEC-NEXT: 0x00000000 {{([[:xdigit:]]{8}) ([[:xdigit:]]{8})}} {{4c1d|c409}}0000 00000000 {{(.{16})}}
				; CHECK-BP-SEC-NEXT: 0x00000010 {{([[:xdigit:]]{8}) ([[:xdigit:]]{8})}} {{2823|e803}}0000 00000000 {{(.{16})}}

				; CHECK-NO-BP-SEC: warning: '{{.*}}': could not find section '.branch_probabilities'
				MaskRay (inline comment, Done): Please don't test the string before `warning:`. It's not reliable in some environments when the tool is a symlink to a real executable.
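A sketch for turning the CHECK-BP-SEC rows back into probabilities. Assumptions: llvm-readelf prints each 4-byte group in file byte order (as the expected `4c1d0000`-style groups suggest), the target is little-endian, and the helper names wordFromHexGroup and decodeProbability are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts one 8-hex-digit group from
// `llvm-readelf --hex-dump` (bytes in file order) into a little-endian
// uint32_t, e.g. "4c1d0000" -> 0x00001d4c.
uint32_t wordFromHexGroup(const std::string &Group) {
  if (Group.size() != 8)
    throw std::invalid_argument("expected 8 hex digits");
  uint32_t Value = 0;
  for (int Byte = 0; Byte < 4; ++Byte) {
    const uint32_t B = static_cast<uint32_t>(
        std::stoul(Group.substr(2 * Byte, 2), nullptr, 16));
    Value |= B << (8 * Byte);
  }
  return Value;
}

// Hypothetical helper: recovers the probability from the 14-bit field,
// which holds an integer in [0, 10000].
double decodeProbability(uint32_t Word) {
  return static_cast<double>(Word & 0x3fff) / 10000.0;
}
```

Decoding gives 0x1d4c = 7500 (0.75) or 0x09c4 = 2500 (0.25) for the first branch, matching the 1000:3000 split in `!0`, and 0x2328 = 9000 (0.9) or 0x03e8 = 1000 (0.1) for the second, matching the 3600:400 split in `!1`. That each CHECK line accepts either value presumably reflects that, after block placement, the emitted conditional jump may target either successor.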