This is an archive of the discontinued LLVM Phabricator instance.

Differential D123878

[AMDGPU] Add remarks to output some resource usage
ClosedPublic

Authored by vangthao on Apr 15 2022, 3:19 PM.

Details

Reviewers

arsenm
rampitec
yaxunl
t-tye
scott.linder
b-sumner

Commits

rG67357739c6d3: [AMDGPU] Add remarks to output some resource usage

Summary

Add analysis remarks to output kernel name, register usage, occupancy,
scratch usage, spills, and LDS information.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
60,030 ms	x64 debian > MLIR.Examples/standalone::test.toy
60,070 ms	x64 debian > libFuzzer.libFuzzer::large.test

Event Timeline

vangthao created this revision. Apr 15 2022, 3:19 PM

Herald added a project: Restricted Project. Apr 15 2022, 3:19 PM

Herald added subscribers: hsmhsm, foad, kerbowa and 9 others.

vangthao requested review of this revision. Apr 15 2022, 3:19 PM

Herald added subscribers: llvm-commits, wdng.

vangthao added reviewers: arsenm, rampitec, yaxunl, t-tye, scott.linder, b-sumner. Apr 15 2022, 3:21 PM

Herald added a subscriber: ormris. Apr 15 2022, 3:21 PM

This feature is based on https://reviews.llvm.org/D95063

arsenm added inline comments. Apr 15 2022, 4:01 PM

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
511	Needs to skip this whole block if !ORE? Also should move all of this to a separate function
513	Define the string name somewhere to avoid repeating it everywhere
llvm/lib/Target/AMDGPU/SIProgramInfo.h
52	This isn't a spill size

Harbormaster completed remote builds in B159886: Diff 423175. Apr 15 2022, 4:04 PM

jtramm added a subscriber: jtramm. Apr 18 2022, 7:52 AM

scott.linder added inline comments. Apr 18 2022, 3:48 PM

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
596	This seems like the wrong place to segment the output; could this be made a part of the emitter itself? Or maybe better yet, could all of these values be collected into a single remark? That seems to be how e.g. `llvm/lib/CodeGen/RegAllocGreedy.cpp` does it:

Pass:            regalloc
Name:            SpillReloadCopies
Function:        f
Args:
  - NumSpills:       '1'
  - String:          ' spills '
  - TotalSpillsCost: '1.000000e+00'
  - String:          ' total spills cost '
  - NumReloads:      '1'
  - String:          ' reloads '
  - TotalReloadsCost: '1.000000e+00'
  - String:          ' total reloads cost '
  - String:          generated in function

Move remarks into its own function. Skip if !ORE. Add clang frontend test. Remove LDSSpillSize.

Herald added a project: Restricted Project. Apr 21 2022, 5:33 PM

Herald added a subscriber: cfe-commits.

vangthao added inline comments. Apr 21 2022, 5:40 PM

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
596	We also want to output this in a readable format for the frontend. Collecting it all into a single remark seems to break the output format since clang seems to ignore all newlines from a diagnostic remark.

Harbormaster completed remote builds in B160759: Diff 424344. Apr 21 2022, 8:20 PM

scott.linder added inline comments. Apr 26 2022, 10:45 AM

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp

596

Rather than use newlines, RegAllocGreedy uses spaces; we can debate aesthetics, but I feel like we should have a compelling reason before we choose our own format.

For example, with RegAllocGreedy you can get output along the lines of:

foo/bar/baz.cpp:42:1: remark: 10 spills 100 total spills cost 7 folded spills 70 total folded spills cost 22 reloads 44 total reloads cost 77 folded reloads 120 total folded reloads cost 1 zero cost folded reloads 78 virtual registers copies 111 total copies cost
void foo()
^

RegAllocFast appears to be the original use-case that caused the machine remarks to be invented, and it has been updated recently (https://reviews.llvm.org/D100020) so I don't suspect this is just some legacy cruft.

If we follow the same approach we get something like:

void AMDGPUAsmPrinter::emitResourceUsageRemarks(
    const MachineFunction &MF, const SIProgramInfo &CurrentProgramInfo) {
  if (!ORE)
    return;

  ORE->emit([&]() {
    return MachineOptimizationRemarkAnalysis(
               "kernel-resource-usage", "ResourceUsage",
               MF.getFunction().getSubprogram(), &MF.front())
           << ore::NV("NumSGPR", CurrentProgramInfo.NumSGPR) << " SGPRs "
           << ore::NV("NumVGPR", CurrentProgramInfo.NumArchVGPR) << " VGPRs "
           << ore::NV("NumAGPR", CurrentProgramInfo.NumAccVGPR) << " AGPRs "
           << ore::NV("ScratchSize", CurrentProgramInfo.ScratchSize)
           << " scratch bytes/thread "
           << ore::NV("Occupancy", CurrentProgramInfo.Occupancy)
           << " occupancy waves/SIMD "
           << ore::NV("SGPRSpill", CurrentProgramInfo.SGPRSpill)
           << " SGPR spills "
           << ore::NV("VGPRSpill", CurrentProgramInfo.VGPRSpill)
           << " VGPR spills " << ore::NV("BytesLDS", CurrentProgramInfo.LDSSize)
           << " LDS size bytes/block ";
  });
}

which produces:

clang/test/Frontend/amdgcn-machine-analysis-remarks.cl:14:1: remark: 9 SGPRs 10 VGPRs 12 AGPRs 0 scratch bytes/thread 10 occupancy waves/SIMD 0 SGPR spills 0 VGPR spills 0 LDS size bytes/block  [-Rpass-analysis=kernel-resource-usage]
__kernel void foo() {
^

That seems reasonable to me, and avoids bloating the other formats like YAML with extra remarks, including some that have no actual content (i.e. the "KernelName" and "KernelEnd" remarks are meaningless).

Of course I'm just one opinion, and if others prefer the several-remark approach I'm fine with it.
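To make the single-remark layout concrete, here is a minimal standalone sketch that models it without any LLVM dependency. `ResourceUsage` and `formatSingleRemark` are hypothetical names introduced only for illustration; the field order and labels follow the example in the comment above, and plain string streaming stands in for `ore::NV`.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical stand-in for the fields read from SIProgramInfo above.
struct ResourceUsage {
  uint32_t NumSGPR, NumVGPR, NumAGPR;
  uint64_t ScratchSize;
  uint32_t Occupancy, SGPRSpill, VGPRSpill, LDSSize;
};

// Joins every value and its label with spaces, in the style of the
// RegAllocGreedy remark, so the whole report fits in one remark line.
std::string formatSingleRemark(const ResourceUsage &RU) {
  std::ostringstream OS;
  OS << RU.NumSGPR << " SGPRs " << RU.NumVGPR << " VGPRs " << RU.NumAGPR
     << " AGPRs " << RU.ScratchSize << " scratch bytes/thread "
     << RU.Occupancy << " occupancy waves/SIMD " << RU.SGPRSpill
     << " SGPR spills " << RU.VGPRSpill << " VGPR spills " << RU.LDSSize
     << " LDS size bytes/block ";
  return OS.str();
}
```

Feeding it the values from the test kernel (9, 10, 12, 0, 10, 0, 0, 0) should reproduce the message text shown above, minus the location prefix and the `[-Rpass-analysis=...]` suffix that clang adds.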

1178–1266

arsenm added inline comments. Apr 26 2022, 2:36 PM

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
1178–1266	I'm sort of surprised the all in one form doesn't come out in something more parsable? It might be worth looking at IR level remarks, since there's probably more usage of them. Are there any existing uses in clang that do something meaningful by parsing out different parts?

arsenm mentioned this in D95063: AMDGPU: Use optimization remarks for register usage. Apr 27 2022, 1:30 PM

vangthao added inline comments. Apr 29 2022, 2:00 PM

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
1178–1266	We could check if the specific remark is enabled to avoid cluttering YAML output:

const char *Name = "kernel-resource-usage";
LLVMContext &Ctx = MF.getFunction().getContext();
if (!Ctx.getDiagHandlerPtr()->isAnalysisRemarkEnabled(Name))
  return;

I do not think using spaces to format the output will work for us. Most of the IR-level remarks seem to be related specifically to their pass and/or optimization, and thus are quite short, so readability is not as impacted. We are outputting a decent chunk of information and need some readability for the user.

Parsing out different parts like this is mostly a workaround for clang ignoring newlines. I am not aware of any other uses doing it like this, since many of them are short and use spaces for formatting.
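The idea of gating emission on whether the remark is enabled can be modelled in isolation. The sketch below is not the LLVM implementation: `isRemarkEnabled` is a hypothetical helper, and it assumes the user-supplied filter is a regular expression that may match anywhere in the pass name, which is how options such as `-Rpass-analysis=` are commonly described.

```cpp
#include <regex>
#include <string>

// Hypothetical model of a remark filter: the user supplies a regular
// expression, and a remark is emitted only if its pass name matches.
bool isRemarkEnabled(const std::string &Pattern, const std::string &PassName) {
  if (Pattern.empty())
    return false; // no filter given: analysis remarks stay off
  // Assumption: the pattern may match any part of the pass name.
  return std::regex_search(PassName, std::regex(Pattern));
}
```

With this model, a filter of `kernel-resource-usage` or `.*` enables the resource usage remarks, while a filter of `regalloc` leaves them out, so unrelated YAML output is not cluttered.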

arsenm added inline comments. Apr 29 2022, 2:45 PM

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
1222	Probably should not use the word thread. lane?
1243	Unrelated but we should reimplement VGPRSpill. It's now reporting number of lowered spill instructions rather than number of spilled values

Even with newlines forced via extra remarks, I'm not a big fan of the "-----------------------" remark; it doesn't interact well with other random remarks in the output, for example when I enable all remarks using the pattern '.*' I see:

remark: foo.cl:27:0: AMDGPU DAG->DAG Pattern Instruction Selection: Function: test_kernel: MI Instruction count changed from 0 to 6; Delta: 6
remark: foo.cl:27:0: 0 stack bytes in function
remark: foo.cl:42:0: AMDGPU DAG->DAG Pattern Instruction Selection: Function: test_func: MI Instruction count changed from 0 to 4; Delta: 4
remark: foo.cl:42:0: 0 stack bytes in function
remark: foo.cl:42:0: SI insert wait instructions: Function: test_func: MI Instruction count changed from 4 to 5; Delta: 1
remark: foo.cl:8:0: AMDGPU DAG->DAG Pattern Instruction Selection: Function: empty_kernel: MI Instruction count changed from 0 to 1; Delta: 1
remark: foo.cl:8:0: 0 stack bytes in function
remark: foo.cl:52:0: AMDGPU DAG->DAG Pattern Instruction Selection: Function: empty_func: MI Instruction count changed from 0 to 1; Delta: 1
remark: foo.cl:52:0: 0 stack bytes in function
remark: foo.cl:52:0: SI insert wait instructions: Function: empty_func: MI Instruction count changed from 1 to 2; Delta: 1
remark: <unknown>:0:0: BasicBlock:
: 2

remark: foo.cl:27:0: 6 instructions in function
remark: foo.cl:27:0: Kernel Name: test_kernel
remark: foo.cl:27:0: SGPRs: 24
remark: foo.cl:27:0: VGPRs: 9
remark: foo.cl:27:0: AGPRs: 43
remark: foo.cl:27:0: ScratchSize [bytes/thread]: 0
remark: foo.cl:27:0: Occupancy [waves/SIMD]: 5
remark: foo.cl:27:0: SGPRs Spill: 0
remark: foo.cl:27:0: VGPRs Spill: 0
remark: foo.cl:27:0: LDS Size [bytes/block]: 512
remark: foo.cl:27:0: ------------------------------
remark: <unknown>:0:0: BasicBlock:
: 2

remark: foo.cl:42:0: 5 instructions in function
remark: foo.cl:42:0: Kernel Name: test_func
remark: foo.cl:42:0: SGPRs: 0
remark: foo.cl:42:0: VGPRs: 0
remark: foo.cl:42:0: AGPRs: 0
remark: foo.cl:42:0: ScratchSize [bytes/thread]: 0
remark: foo.cl:42:0: Occupancy [waves/SIMD]: 0
remark: foo.cl:42:0: SGPRs Spill: 0
remark: foo.cl:42:0: VGPRs Spill: 0
remark: foo.cl:42:0: ------------------------------
remark: <unknown>:0:0: BasicBlock:
: 1

If we do keep the delimiter remarks, can we have one at the beginning as well? At least then other remarks don't appear to "bleed" into the new block of remarks.

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
1178–1266	"Are there any existing uses in clang that do something meaningful by parsing out different parts?"

I would expect this sort of thing to happen before it gets serialized to an unstructured string, or for it to happen on a structured output like YAML.

"Parsing out different parts like this is mostly a workaround for clang ignoring newlines."

Can we fix clang to respect newlines?

arsenm added inline comments. May 2 2022, 10:56 AM

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
1264	Get rid of the ——. It’s not a remark and only kind of makes sense if you are printing these with others

If possible, I would like to keep some kind of delimiter. I like the idea of having it at the beginning and at the end of the section. The best option would be to convince clang to print new lines.

Herald added subscribers: kosarev, jsilvanus. May 11 2022, 8:58 AM

In D123878#3506500, @afanfa wrote:

If possible, I would like to keep some kind of delimiter. I like the idea of having it at the beginning and at the end of the section. The best option would be to convince clang to print new lines.

It seems like the stripping of non-printable characters is intentional, but only for diagnostics that don't use the TableGen based diagnostic formatting scheme in clang. I'm not sure exactly why, but regardless there is precedent for newlines in other messages.

To get the newlines to hit the terminal for our case the following patch is enough:

--- a/clang/lib/Basic/Diagnostic.cpp
+++ b/clang/lib/Basic/Diagnostic.cpp
@@ -812,7 +812,7 @@ FormatDiagnostic(const char *DiagStr, const char *DiagEnd,
       getArgKind(0) == DiagnosticsEngine::ak_std_string) {
     const std::string &S = getArgStdStr(0);
     for (char c : S) {
-      if (llvm::sys::locale::isPrint(c) || c == '\t') {
+      if (llvm::sys::locale::isPrint(c) || c == '\t' || c == '\n') {
         OutStr.push_back(c);
       }
     }

To get the right indentation inserted for the extra lines the TextDiagnostic consumer needs to also be updated, but that should be a small change. Only one test breaks, and it seems useful for it to pick up the new behavior anyway.
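The effect of that one-line change can be shown with a small standalone model of the filtering loop. `filterDiagnosticText` is a hypothetical name, and `std::isprint` (in the default C locale) is used here as a stand-in for `llvm::sys::locale::isPrint`, so this is a sketch of the behavior rather than clang's code.

```cpp
#include <cctype>
#include <string>

// Copies a diagnostic message, keeping printable characters, tabs and,
// optionally, newlines. Every other character is dropped.
std::string filterDiagnosticText(const std::string &S, bool KeepNewlines) {
  std::string Out;
  for (char C : S) {
    bool Printable = std::isprint(static_cast<unsigned char>(C)) != 0;
    if (Printable || C == '\t' || (KeepNewlines && C == '\n'))
      Out.push_back(C);
  }
  return Out;
}
```

With `KeepNewlines` set to false (the behavior before the patch), a two-line message is fused into one line; with it set to true, the line break survives.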

The only other potential conflict is that in TextDiagnostic specifically there is also the Clang option -fmessage-length=N which will forcibly wrap diagnostic messages on word-boundaries (although never breaking up a word across lines). It seems to only apply to text preceding the first newline in the message, so it is likely just a non-issue?

I am not sure if allowing clang to accept newlines is a good idea. It seems like clang wants to know what type of message is being outputted. For example whether this is a remark, warning, etc. but allowing for a diagnostic to output their own newline makes it ambiguous where exactly that output is coming from.

In D123878#3507378, @vangthao wrote:

I am not sure if allowing clang to accept newlines is a good idea. It seems like clang wants to know what type of message is being outputted. For example whether this is a remark, warning, etc. but allowing for a diagnostic to output their own newline makes it ambiguous where exactly that output is coming from.

It already supports newlines in any diagnostic which doesn't use the trivial format string "%0", and at least clang/test/Misc/diag-line-wrapping.cpp explicitly tests this behavior.

It seems reasonable that clang could add a prefix or indentation scheme while emitting multi-line diagnostics to the terminal, to help with the ambiguity issue. For example instead of the current output:

clang/test/Misc/diag-line-wrapping.cpp:8:14: error: non-static member 'f' found in multiple base-class subobjects of type 'B':
    struct DD -> struct D1 -> struct B
    struct DD -> struct D2 -> struct B

Maybe we could have something like:

clang/test/Misc/diag-line-wrapping.cpp:8:14: error: ...
...: non-static member 'f' found in multiple base-class subobjects of type 'B':
...:     struct DD -> struct D1 -> struct B
...:     struct DD -> struct D2 -> struct B

I don't know if changing this kind of output is a breaking change, though? I do know some tooling parses this output.
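A per-line prefix scheme like the one proposed above is simple to express. The following sketch assumes a hypothetical helper, `prefixLines`, that is not part of clang; it only illustrates how each line of a multi-line message could be tagged so its origin stays unambiguous.

```cpp
#include <sstream>
#include <string>

// Prefixes every line of a multi-line message with Prefix, so each
// output line can be traced back to the diagnostic that produced it.
std::string prefixLines(const std::string &Message, const std::string &Prefix) {
  std::istringstream In(Message);
  std::string Line, Out;
  while (std::getline(In, Line)) {
    Out += Prefix;
    Out += Line;
    Out += '\n';
  }
  return Out;
}
```

Calling it with the prefix `"...: "` on the `diag-line-wrapping.cpp` message would yield continuation lines in the shape suggested above. Whether tools that parse clang output would tolerate such a change remains the open question raised in the comment.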

In D123878#3506500, @afanfa wrote:

If possible, I would like to keep some kind of delimiter. I like the idea of having it at the beginning and at the end of the section. The best option would be to convince clang to print new lines.

But it’s not a section and no actual grouping concept here. You just happen to see this printed in order. Any delimiter should be introduced as a display function, not emitted as part of the remarks themselves

ormris removed a subscriber: ormris. May 16 2022, 10:55 AM

vangthao mentioned this in D127923: [Diagnostics] Accept newline and format diag opts on first line. Jun 16 2022, 10:20 AM

Remove "--------" delimiter. Change ScratchSize [bytes/thread] to ScratchSize [bytes/lane]. Use lambda expression to emit remarks. Do not output YAML if the specific remark is not enabled. Add indentation to make it easier to tell which resource usage remarks belong to which kernel.
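The indentation approach can be sketched without LLVM as follows. This is a hypothetical model, not the committed code: `buildResourceRemarks` and the four-space indent are assumptions, the header label uses the "Function Name" wording adopted later in this review, and the lambda takes its value as `auto Argument`, which matches the by-value form the reviewers settle on.

```cpp
#include <string>
#include <vector>

// Hypothetical model: builds one remark string per resource, indenting
// everything after the function name so the grouping is visible.
std::vector<std::string> buildResourceRemarks(const std::string &FuncName,
                                              unsigned NumSGPR,
                                              unsigned NumVGPR) {
  std::vector<std::string> Remarks;
  // Generic lambda that receives the value by copy (`auto Argument`).
  auto Emit = [&Remarks](const std::string &Label, auto Argument) {
    Remarks.push_back("    " + Label + ": " + std::to_string(Argument));
  };
  Remarks.push_back("Function Name: " + FuncName);
  Emit("SGPRs", NumSGPR);
  Emit("VGPRs", NumVGPR);
  return Remarks;
}
```

Because each resource line is indented under its function line, a reader can tell where one function's block ends and the next begins even when other passes' remarks are interleaved, without a separate delimiter remark.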

Harbormaster completed remote builds in B172366: Diff 440447. Jun 27 2022, 7:07 PM

I will let others comment, but I think this is a perfectly reasonable alternative, and I much prefer using indentation over the delimiter-remark approach. LGTM, assuming nobody else objects

ping

arsenm added inline comments. Jul 14 2022, 10:01 AM

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
1215	Why is this kernel name? Do we not emit these for other functions?

vangthao added inline comments. Jul 14 2022, 10:15 AM

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
1215	We do emit these for other functions. Would this be better off as "Function Name" instead of kernel?

arsenm added inline comments. Jul 14 2022, 10:22 AM

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
1215	Yes

Change "Kernel Name" to "Function Name" and rebased patch.

arsenm added inline comments. Jul 14 2022, 12:38 PM

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
1194	Why &&?

Harbormaster completed remote builds in B175476: Diff 444759. Jul 14 2022, 3:16 PM

scott.linder added inline comments. Jul 14 2022, 3:48 PM

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
1194	Looking at the `ore::NV` constructors I would vote that this just be by value, i.e. `auto Argument`. Everything that can be an `Argument` is a small "prefer passing by value" type.

Change auto &&Argument to auto Argument.

arsenm accepted this revision. Jul 14 2022, 5:03 PM

This revision is now accepted and ready to land. Jul 14 2022, 5:03 PM

Harbormaster completed remote builds in B175540: Diff 444844. Jul 14 2022, 6:54 PM

scott.linder accepted this revision. Jul 15 2022, 10:57 AM

This revision was landed with ongoing or failed builds. Jul 15 2022, 11:02 AM

Closed by commit rG67357739c6d3: [AMDGPU] Add remarks to output some resource usage (authored by vangthao).

This revision was automatically updated to reflect the committed changes.

vangthao added a commit: rG67357739c6d3: [AMDGPU] Add remarks to output some resource usage.

Revision Contents

Path	Size
clang/test/Frontend/amdgcn-machine-analysis-remarks.cl	18 lines
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h	3 lines
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp	99 lines
llvm/lib/Target/AMDGPU/SIProgramInfo.h	2 lines
llvm/test/CodeGen/AMDGPU/resource-optimization-remarks.ll	169 lines

Diff 424344

clang/test/Frontend/amdgcn-machine-analysis-remarks.cl

This file was added.

// REQUIRES: amdgpu-registered-target
// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx908 -Rpass-analysis=kernel-resource-usage -S -O0 -verify %s -o /dev/null

// expected-remark@+10 {{Kernel Name: foo}}
// expected-remark@+9 {{SGPRs: 9}}
// expected-remark@+8 {{VGPRs: 10}}
// expected-remark@+7 {{AGPRs: 12}}
// expected-remark@+6 {{ScratchSize [bytes/thread]: 0}}
// expected-remark@+5 {{Occupancy [waves/SIMD]: 10}}
// expected-remark@+4 {{SGPRs Spill: 0}}
// expected-remark@+3 {{VGPRs Spill: 0}}
// expected-remark@+2 {{LDS Size [bytes/block]: 0}}
// expected-remark@+1 {{------------------------------}}
__kernel void foo() {
  __asm volatile ("; clobber s8" :::"s8");
  __asm volatile ("; clobber v9" :::"v9");
  __asm volatile ("; clobber a11" :::"a11");
}

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h

 private:
   void emitPALFunctionMetadata(const MachineFunction &MF);
   void emitCommonFunctionComments(uint32_t NumVGPR,
                                   Optional<uint32_t> NumAGPR,
                                   uint32_t TotalNumVGPR,
                                   uint32_t NumSGPR,
                                   uint64_t ScratchSize,
                                   uint64_t CodeSize,
                                   const AMDGPUMachineFunction* MFI);
+  void emitResourceUsageRemarks(const MachineFunction &MF,
+                                const SIProgramInfo &CurrentProgramInfo,
+                                bool isModuleEntryFunction, bool hasMAIInsts);

   uint16_t getAmdhsaKernelCodeProperties(
       const MachineFunction &MF) const;

   amdhsa::kernel_descriptor_t getAmdhsaKernelDescriptor(
       const MachineFunction &MF,
       const SIProgramInfo &PI) const;

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp

 #include "AMDKernelCodeT.h"
 #include "GCNSubtarget.h"
 #include "MCTargetDesc/AMDGPUInstPrinter.h"
 #include "MCTargetDesc/AMDGPUTargetStreamer.h"
 #include "R600AsmPrinter.h"
 #include "SIMachineFunctionInfo.h"
 #include "TargetInfo/AMDGPUTargetInfo.h"
 #include "Utils/AMDGPUBaseInfo.h"
+#include "llvm/Analysis/OptimizationRemarkEmitter.h"
 #include "llvm/BinaryFormat/ELF.h"
 #include "llvm/CodeGen/MachineFrameInfo.h"
+#include "llvm/CodeGen/MachineOptimizationRemarkEmitter.h"
 #include "llvm/IR/DiagnosticInfo.h"
 #include "llvm/MC/MCAssembler.h"
 #include "llvm/MC/MCContext.h"
 #include "llvm/MC/MCSectionELF.h"
 #include "llvm/MC/MCStreamer.h"
 #include "llvm/MC/TargetRegistry.h"
 #include "llvm/Support/AMDHSAKernelDescriptor.h"
 #include "llvm/Support/TargetParser.h"

@@ ... @@ bool AMDGPUAsmPrinter::runOnMachineFunction(MachineFunction &MF) {
   }
   DisasmLines.clear();
   HexLines.clear();
   DisasmLineMaxLen = 0;
   emitFunctionBody();
+  emitResourceUsageRemarks(MF, CurrentProgramInfo, MFI->isModuleEntryFunction(),
+                           STM.hasMAIInsts());
   if (isVerbose()) {
     MCSectionELF *CommentSection =
         Context.getELFSection(".AMDGPU.csdata", ELF::SHT_PROGBITS, 0);
     OutStreamer->SwitchSection(CommentSection);
     if (!MFI->isEntryFunction()) {
       OutStreamer->emitRawComment(" Function info:", false);
       const AMDGPUResourceUsageAnalysis::SIFunctionResourceInfo &Info =
@@ ... @@ OutStreamer->emitRawComment(
       " COMPUTE_PGM_RSRC2:TGID_Y_EN: " +
       Twine(G_00B84C_TGID_Y_EN(CurrentProgramInfo.ComputePGMRSrc2)), false);
     OutStreamer->emitRawComment(
       " COMPUTE_PGM_RSRC2:TGID_Z_EN: " +
       Twine(G_00B84C_TGID_Z_EN(CurrentProgramInfo.ComputePGMRSrc2)), false);
     OutStreamer->emitRawComment(
       " COMPUTE_PGM_RSRC2:TIDIG_COMP_CNT: " +
       Twine(G_00B84C_TIDIG_COMP_CNT(CurrentProgramInfo.ComputePGMRSrc2)),
       false);


     assert(STM.hasGFX90AInsts() ||
            CurrentProgramInfo.ComputePGMRSrc3GFX90A == 0);
     if (STM.hasGFX90AInsts()) {
       OutStreamer->emitRawComment(
         " COMPUTE_PGM_RSRC3_GFX90A:ACCUM_OFFSET: " +
         Twine((AMDHSA_BITS_GET(CurrentProgramInfo.ComputePGMRSrc3GFX90A,
                                amdhsa::COMPUTE_PGM_RSRC3_GFX90A_ACCUM_OFFSET))),

@@ ... @@ if (STM.getGeneration() < AMDGPUSubtarget::SEA_ISLANDS) {
     // LDS is allocated in 64 dword blocks.
     LDSAlignShift = 8;
   } else {
     // LDS is allocated in 128 dword blocks.
     LDSAlignShift = 9;
   }
   unsigned LDSSpillSize =
     MFI->getLDSWaveSpillSize() * MFI->getMaxFlatWorkGroupSize();
+  ProgInfo.SGPRSpill = MFI->getNumSpilledSGPRs();
+  ProgInfo.VGPRSpill = MFI->getNumSpilledVGPRs();
   ProgInfo.LDSSize = MFI->getLDSSize() + LDSSpillSize;
   ProgInfo.LDSBlocks =
       alignTo(ProgInfo.LDSSize, 1ULL << LDSAlignShift) >> LDSAlignShift;
   // Scratch is allocated in 256 dword blocks.
   unsigned ScratchAlignShift = 10;
   // We need to program the hardware with the amount of scratch memory that
   // is used by the entire wave. ProgInfo.ScratchSize is the amount of

▲ Show 20 Lines • Show All 273 Lines • ▼ Show 20 Lines bool AMDGPUAsmPrinter::PrintAsmOperand(const MachineInstr *MI, unsigned OpNo,

return true; return true;

} }

void AMDGPUAsmPrinter::getAnalysisUsage(AnalysisUsage &AU) const { void AMDGPUAsmPrinter::getAnalysisUsage(AnalysisUsage &AU) const {

AU.addRequired<AMDGPUResourceUsageAnalysis>(); AU.addRequired<AMDGPUResourceUsageAnalysis>();

AU.addPreserved<AMDGPUResourceUsageAnalysis>(); AU.addPreserved<AMDGPUResourceUsageAnalysis>();

AsmPrinter::getAnalysisUsage(AU); AsmPrinter::getAnalysisUsage(AU);

} }

void AMDGPUAsmPrinter::emitResourceUsageRemarks(

const MachineFunction &MF, const SIProgramInfo &CurrentProgramInfo,

bool isModuleEntryFunction, bool hasMAIInsts) {

if (!ORE)

return;

const char *Name = "kernel-resource-usage";

ORE->emit([&]() {

return MachineOptimizationRemarkAnalysis(Name, "KernelName",

MF.getFunction().getSubprogram(),

&MF.front())

<< "Kernel Name: "

<< ore::NV("KernelName", MF.getFunction().getName());

});

arsenmUnsubmitted

Not Done

Why &&?

arsenm: Why &&?

scott.linderUnsubmitted

Not Done

Looking at the ore::NV constructors I would vote that this just be by value, i.e. auto Argument. Everything that can be an Argument is a small "prefer passing by value" type.

scott.linder: Looking at the `ore::NV` constructors I would vote that this just be by value, i.e. `auto…

ORE->emit([&]() {

return MachineOptimizationRemarkAnalysis(Name, "NumSGPR",

MF.getFunction().getSubprogram(),

&MF.front())

<< "SGPRs: " << ore::NV("NumSGPR", CurrentProgramInfo.NumSGPR);

});

ORE->emit([&]() {

return MachineOptimizationRemarkAnalysis(Name, "NumVGPR",

MF.getFunction().getSubprogram(),

&MF.front())

<< "VGPRs: " << ore::NV("NumVGPR", CurrentProgramInfo.NumArchVGPR);

});

if (hasMAIInsts) {

ORE->emit([&]() {

return MachineOptimizationRemarkAnalysis(Name, "NumAGPR",

MF.getFunction().getSubprogram(),

&MF.front())

<< "AGPRs: " << ore::NV("NumAGPR", CurrentProgramInfo.NumAccVGPR);

});

arsenm [Not Done]: Why is this kernel name? Do we not emit these for other functions?

vangthao [Done]: We do emit these for other functions. Would this be better off as "Function Name" instead of kernel?

arsenm [Not Done]: Yes

}

ORE->emit([&]() {

return MachineOptimizationRemarkAnalysis(Name, "ScratchSize",

MF.getFunction().getSubprogram(),

&MF.front())

<< "ScratchSize [bytes/thread]: "

arsenm [Done]: Probably should not use the word thread. lane?

<< ore::NV("ScratchSize", CurrentProgramInfo.ScratchSize);

});

ORE->emit([&]() {

return MachineOptimizationRemarkAnalysis(Name, "Occupancy",

MF.getFunction().getSubprogram(),

&MF.front())

<< "Occupancy [waves/SIMD]: "

<< ore::NV("Occupancy", CurrentProgramInfo.Occupancy);

});

ORE->emit([&]() {

return MachineOptimizationRemarkAnalysis(Name, "SGPRSpill",

MF.getFunction().getSubprogram(),

&MF.front())

<< "SGPRs Spill: "

<< ore::NV("SGPRSpill", CurrentProgramInfo.SGPRSpill);

});

ORE->emit([&]() {

return MachineOptimizationRemarkAnalysis(Name, "VGPRSpill",

arsenm [Not Done]: Unrelated but we should reimplement VGPRSpill. It's now reporting number of lowered spill instructions rather than number of spilled values

MF.getFunction().getSubprogram(),

&MF.front())

<< "VGPRs Spill: "

<< ore::NV("VGPRSpill", CurrentProgramInfo.VGPRSpill);

});

if (isModuleEntryFunction) {

ORE->emit([&]() {

return MachineOptimizationRemarkAnalysis(Name, "BytesLDS",

MF.getFunction().getSubprogram(),

&MF.front())

<< "LDS Size [bytes/block]: "

<< ore::NV("BytesLDS", CurrentProgramInfo.LDSSize);

});

}

ORE->emit([&]() {

return MachineOptimizationRemarkAnalysis(Name, "KernelEnd",

MF.getFunction().getSubprogram(),

&MF.front())

<< "------------------------------";

arsenm [Done]: Get rid of the ——. It’s not a remark and only kind of makes sense if you are printing these with others

});

}

scott.linder [Not Done]:

AsmPrinter::getAnalysisUsage(AU);

}

void AMDGPUAsmPrinter::emitResourceUsageRemarks(

const MachineFunction &MF, const SIProgramInfo &CurrentProgramInfo,

bool isModuleEntryFunction, bool hasMAIInsts) {

if (!ORE)

return;

- const char *Name = "kernel-resource-usage";

- ORE->emit([&]() {

- return MachineOptimizationRemarkAnalysis(Name, "KernelName",

- MF.getFunction().getSubprogram(),

- &MF.front())

- << "Kernel Name: "

- << ore::NV("KernelName", MF.getFunction().getName());

- });

- ORE->emit([&]() {

- return MachineOptimizationRemarkAnalysis(Name, "NumSGPR",

- MF.getFunction().getSubprogram(),

- &MF.front())

- << "SGPRs: " << ore::NV("NumSGPR", CurrentProgramInfo.NumSGPR);

- });

- ORE->emit([&]() {

- return MachineOptimizationRemarkAnalysis(Name, "NumVGPR",

- MF.getFunction().getSubprogram(),

- &MF.front())

- << "VGPRs: " << ore::NV("NumVGPR", CurrentProgramInfo.NumArchVGPR);

- });

- if (hasMAIInsts) {

- ORE->emit([&]() {

- return MachineOptimizationRemarkAnalysis(Name, "NumAGPR",

- MF.getFunction().getSubprogram(),

- &MF.front())

- << "AGPRs: " << ore::NV("NumAGPR", CurrentProgramInfo.NumAccVGPR);

- });

- }

- ORE->emit([&]() {

- return MachineOptimizationRemarkAnalysis(Name, "ScratchSize",

- MF.getFunction().getSubprogram(),

- &MF.front())

- << "ScratchSize [bytes/thread]: "

- << ore::NV("ScratchSize", CurrentProgramInfo.ScratchSize);

- });

+ auto EmitResourceUsageRemark = [&](StringRef RemarkName, StringRef RemarkLabel, auto &&Argument) {

+ ORE->emit([&]() {

+ return MachineOptimizationRemarkAnalysis("kernel-resource-usage", RemarkName,

+ MF.getFunction().getSubprogram(),

+ &MF.front())

+ << RemarkLabel

+ << ore::NV(RemarkName, Argument);

+ });

+ };

- ORE->emit([&]() {

- return MachineOptimizationRemarkAnalysis(Name, "Occupancy",

- MF.getFunction().getSubprogram(),

- &MF.front())

- << "Occupancy [waves/SIMD]: "

- << ore::NV("Occupancy", CurrentProgramInfo.Occupancy);

- });

- ORE->emit([&]() {

- return MachineOptimizationRemarkAnalysis(Name, "SGPRSpill",

- MF.getFunction().getSubprogram(),

- &MF.front())

- << "SGPRs Spill: "

- << ore::NV("SGPRSpill", CurrentProgramInfo.SGPRSpill);

- });

- ORE->emit([&]() {

- return MachineOptimizationRemarkAnalysis(Name, "VGPRSpill",

- MF.getFunction().getSubprogram(),

- &MF.front())

- << "VGPRs Spill: "

- << ore::NV("VGPRSpill", CurrentProgramInfo.VGPRSpill);

- });

- if (isModuleEntryFunction) {

- ORE->emit([&]() {

- return MachineOptimizationRemarkAnalysis(Name, "BytesLDS",

- MF.getFunction().getSubprogram(),

- &MF.front())

- << "LDS Size [bytes/block]: "

- << ore::NV("BytesLDS", CurrentProgramInfo.LDSSize);

- });

- }

+ EmitResourceUsageRemark("KernelName", "Kernel Name:", MF.getFunction().getName());

+ EmitResourceUsageRemark("NumSGPR", "SGPRs:", CurrentProgramInfo.NumSGPR);

+ ...

+ EmitResourceUsageRemark("KernelEnd", "

ORE->emit([&]() {

return MachineOptimizationRemarkAnalysis(Name, "KernelEnd",

MF.getFunction().getSubprogram(),

&MF.front())

<< "------------------------------";

});

}
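The helper lambda suggested above, combined with the by-value `auto Argument` parameter proposed in the inline comments, can be sketched outside LLVM as follows. This is only an illustration under stated assumptions: `RemarkSink` and `ResourceInfo` are hypothetical stand-ins (not LLVM classes) for the remark emitter and `SIProgramInfo`, and the "Function Name" label follows the renaming agreed in the thread rather than the code in this diff.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the remark emitter; not an LLVM class.
struct RemarkSink {
  std::vector<std::string> Lines;
  void emit(const std::string &Pass, const std::string &Name,
            const std::string &Text) {
    Lines.push_back(Pass + "/" + Name + ": " + Text);
  }
};

// Hypothetical subset of the counters reported by the patch.
struct ResourceInfo {
  std::string FunctionName;
  uint32_t NumSGPR = 0;
  uint32_t NumVGPR = 0;
};

inline void emitResourceUsage(RemarkSink &Sink, const ResourceInfo &Info) {
  // Generic lambda: Argument is taken by value because every payload
  // passed here (an integer or a pointer) is cheap to copy.
  auto EmitRemark = [&](const std::string &RemarkName,
                        const std::string &RemarkLabel, auto Argument) {
    std::ostringstream OS;
    OS << RemarkLabel << Argument;
    Sink.emit("kernel-resource-usage", RemarkName, OS.str());
  };

  EmitRemark("FunctionName", "Function Name: ", Info.FunctionName.c_str());
  EmitRemark("NumSGPR", "SGPRs: ", Info.NumSGPR);
  EmitRemark("NumVGPR", "VGPRs: ", Info.NumVGPR);
}
```

Calling `emitResourceUsage` with a filled-in `ResourceInfo` appends one line per counter to `Sink.Lines`. Keeping the pass name in a single place inside the lambda also addresses the earlier request to avoid repeating the string everywhere.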


arsenm [Not Done]: I'm sort of surprised the all in one form doesn't come out in something more parsable? It might be worth looking at IR level remarks, since there's probably more usage of them. Are there any existing uses in clang that do something meaningful by parsing out different parts?

vangthao [Done]: We could check if the specific remark is enabled to avoid cluttering YAML output:

const char *Name = "kernel-resource-usage";
LLVMContext &Ctx = MF.getFunction().getContext();
if (!Ctx.getDiagHandlerPtr()->isAnalysisRemarkEnabled(Name))
  return;

I do not think using spaces to format the output will work for us. Most of the IR level remarks seems to be related specifically to their pass and/or optimization thus are quite short so readability is not as impacted. We are outputting a decent chunk of information and need some readability for the user.

Parsing out different parts like this is mostly a workaround for clang ignoring newlines. I am not aware of any other uses doing it like this since many of them are short and uses spaces for formatting.


scott.linder [Not Done]:

> Are there any existing uses in clang that do something meaningful by parsing out different parts?

I would expect this sort of thing to happen before it gets serialized to an unstructured string, or for it to happen on a structured output like YAML.

> Parsing out different parts like this is mostly a workaround for clang ignoring newlines.

Can we fix clang to respect newlines?
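To make the parsability question concrete, here is a minimal sketch of what a consumer would have to do to recover label and value from the one-remark-per-line stderr form checked in the test below. It assumes the line shape `remark: <location>: <label>: <value>`; `ParsedRemark` and `parseResourceRemark` are hypothetical names introduced for illustration and are not part of LLVM or clang.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for one parsed stderr remark line.
struct ParsedRemark {
  std::string Location; // e.g. "foo.cl:27:0"
  std::string Label;    // e.g. "SGPRs"
  std::string Value;    // e.g. "24"
};

// Returns true and fills Out when Line has the assumed form
// "remark: <location>: <label>: <value>"; returns false otherwise,
// for example for the dashed separator remark, which has no value.
inline bool parseResourceRemark(const std::string &Line, ParsedRemark &Out) {
  const std::string Prefix = "remark: ";
  const std::string Sep = ": ";
  if (Line.compare(0, Prefix.size(), Prefix) != 0)
    return false;

  // The location contains ':' but never ": ", so the first ": " ends it.
  std::size_t LocEnd = Line.find(Sep, Prefix.size());
  if (LocEnd == std::string::npos)
    return false;

  std::size_t LabelStart = LocEnd + Sep.size();
  std::size_t LabelEnd = Line.find(Sep, LabelStart);
  if (LabelEnd == std::string::npos)
    return false;

  Out.Location = Line.substr(Prefix.size(), LocEnd - Prefix.size());
  Out.Label = Line.substr(LabelStart, LabelEnd - LabelStart);
  Out.Value = Line.substr(LabelEnd + Sep.size());
  return true;
}
```

String splitting like this is fragile, which is the point raised in the thread: the YAML output already carries the value under a named key (`NumSGPR: '24'`), so structured consumers would normally read that instead.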

llvm/lib/Target/AMDGPU/SIProgramInfo.h

(43 lines not shown)
struct SIProgramInfo {
  uint64_t ComputePGMRSrc3GFX90A = 0;

  uint32_t NumVGPR = 0;
  uint32_t NumArchVGPR = 0;
  uint32_t NumAccVGPR = 0;
  uint32_t AccumOffset = 0;
  uint32_t TgSplit = 0;
  uint32_t NumSGPR = 0;
+ unsigned SGPRSpill = 0;
arsenm [Done]: This isn't a spill size
+ unsigned VGPRSpill = 0;
  uint32_t LDSSize = 0;
  bool FlatUsed = false;

  // Number of SGPRs that meets number of waves per execution unit request.
  uint32_t NumSGPRsForWavesPerEU = 0;

  // Number of VGPRs that meets number of waves per execution unit request.
  uint32_t NumVGPRsForWavesPerEU = 0;
(21 lines not shown)

llvm/test/CodeGen/AMDGPU/resource-optimization-remarks.ll

This file was added.

				; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -pass-remarks-output=%t -pass-remarks-analysis=kernel-resource-usage -filetype=obj -o /dev/null %s 2>&1 | FileCheck -check-prefix=STDERR %s
				; RUN: FileCheck -check-prefix=REMARK %s < %t

				; STDERR: remark: foo.cl:27:0: Kernel Name: test_kernel
				; STDERR-NEXT: remark: foo.cl:27:0: SGPRs: 24
				; STDERR-NEXT: remark: foo.cl:27:0: VGPRs: 9
				; STDERR-NEXT: remark: foo.cl:27:0: AGPRs: 43
				; STDERR-NEXT: remark: foo.cl:27:0: ScratchSize [bytes/thread]: 0
				; STDERR-NEXT: remark: foo.cl:27:0: Occupancy [waves/SIMD]: 5
				; STDERR-NEXT: remark: foo.cl:27:0: SGPRs Spill: 0
				; STDERR-NEXT: remark: foo.cl:27:0: VGPRs Spill: 0
				; STDERR-NEXT: remark: foo.cl:27:0: LDS Size [bytes/block]: 512
				; STDERR-NEXT: remark: foo.cl:27:0: ------------------------------

				; REMARK-LABEL: --- !Analysis
				; REMARK: Pass: kernel-resource-usage
				; REMARK-NEXT: Name: KernelName
				; REMARK-NEXT: DebugLoc: { File: foo.cl, Line: 27, Column: 0 }
				; REMARK-NEXT: Function: test_kernel
				; REMARK-NEXT: Args:
				; REMARK-NEXT: - String: 'Kernel Name: '
				; REMARK-NEXT: - KernelName: test_kernel
				; REMARK-NEXT: ...
				; REMARK-NEXT: --- !Analysis
				; REMARK-NEXT: Pass: kernel-resource-usage
				; REMARK-NEXT: Name: NumSGPR
				; REMARK-NEXT: DebugLoc: { File: foo.cl, Line: 27, Column: 0 }
				; REMARK-NEXT: Function: test_kernel
				; REMARK-NEXT: Args:
				; REMARK-NEXT: - String: 'SGPRs: '
				; REMARK-NEXT: - NumSGPR: '24'
				; REMARK-NEXT: ...
				; REMARK-NEXT: --- !Analysis
				; REMARK-NEXT: Pass: kernel-resource-usage
				; REMARK-NEXT: Name: NumVGPR
				; REMARK-NEXT: DebugLoc: { File: foo.cl, Line: 27, Column: 0 }
				; REMARK-NEXT: Function: test_kernel
				; REMARK-NEXT: Args:
				; REMARK-NEXT: - String: 'VGPRs: '
				; REMARK-NEXT: - NumVGPR: '9'
				; REMARK-NEXT: ...
				; REMARK-NEXT: --- !Analysis
				; REMARK-NEXT: Pass: kernel-resource-usage
				; REMARK-NEXT: Name: NumAGPR
				; REMARK-NEXT: DebugLoc: { File: foo.cl, Line: 27, Column: 0 }
				; REMARK-NEXT: Function: test_kernel
				; REMARK-NEXT: Args:
				; REMARK-NEXT: - String: 'AGPRs: '
				; REMARK-NEXT: - NumAGPR: '43'
				; REMARK-NEXT: ...
				; REMARK-NEXT: --- !Analysis
				; REMARK-NEXT: Pass: kernel-resource-usage
				; REMARK-NEXT: Name: ScratchSize
				; REMARK-NEXT: DebugLoc: { File: foo.cl, Line: 27, Column: 0 }
				; REMARK-NEXT: Function: test_kernel
				; REMARK-NEXT: Args:
				; REMARK-NEXT: - String: 'ScratchSize [bytes/thread]: '
				; REMARK-NEXT: - ScratchSize: '0'
				; REMARK-NEXT: ...
				; REMARK-NEXT: --- !Analysis
				; REMARK-NEXT: Pass: kernel-resource-usage
				; REMARK-NEXT: Name: Occupancy
				; REMARK-NEXT: DebugLoc: { File: foo.cl, Line: 27, Column: 0 }
				; REMARK-NEXT: Function: test_kernel
				; REMARK-NEXT: Args:
				; REMARK-NEXT: - String: 'Occupancy [waves/SIMD]: '
				; REMARK-NEXT: - Occupancy: '5'
				; REMARK-NEXT: ...
				; REMARK-NEXT: --- !Analysis
				; REMARK-NEXT: Pass: kernel-resource-usage
				; REMARK-NEXT: Name: SGPRSpill
				; REMARK-NEXT: DebugLoc: { File: foo.cl, Line: 27, Column: 0 }
				; REMARK-NEXT: Function: test_kernel
				; REMARK-NEXT: Args:
				; REMARK-NEXT: - String: 'SGPRs Spill: '
				; REMARK-NEXT: - SGPRSpill: '0'
				; REMARK-NEXT: ...
				; REMARK-NEXT: --- !Analysis
				; REMARK-NEXT: Pass: kernel-resource-usage
				; REMARK-NEXT: Name: VGPRSpill
				; REMARK-NEXT: DebugLoc: { File: foo.cl, Line: 27, Column: 0 }
				; REMARK-NEXT: Function: test_kernel
				; REMARK-NEXT: Args:
				; REMARK-NEXT: - String: 'VGPRs Spill: '
				; REMARK-NEXT: - VGPRSpill: '0'
				; REMARK-NEXT: ...
				; REMARK-NEXT: --- !Analysis
				; REMARK-NEXT: Pass: kernel-resource-usage
				; REMARK-NEXT: Name: BytesLDS
				; REMARK-NEXT: DebugLoc: { File: foo.cl, Line: 27, Column: 0 }
				; REMARK-NEXT: Function: test_kernel
				; REMARK-NEXT: Args:
				; REMARK-NEXT: - String: 'LDS Size [bytes/block]: '
				; REMARK-NEXT: - BytesLDS: '512'
				; REMARK-NEXT: ...
				; REMARK-NEXT: --- !Analysis
				; REMARK-NEXT: Pass: kernel-resource-usage
				; REMARK-NEXT: Name: KernelEnd
				; REMARK-NEXT: DebugLoc: { File: foo.cl, Line: 27, Column: 0 }
				; REMARK-NEXT: Function: test_kernel
				; REMARK-NEXT: Args:
				; REMARK-NEXT: - String: '------------------------------'
				; REMARK-NEXT: ...
				@lds = internal unnamed_addr addrspace(3) global [128 x i32] undef, align 4

				define amdgpu_kernel void @test_kernel() !dbg !3 {
				call void asm sideeffect "; clobber v8", "~{v8}"()
				call void asm sideeffect "; clobber s23", "~{s23}"()
				call void asm sideeffect "; clobber a42", "~{a42}"()
				call void asm sideeffect "; use $0", "v"([128 x i32] addrspace(3)* @lds)
				ret void
				}

				; STDERR: remark: foo.cl:42:0: Kernel Name: test_func
				; STDERR-NEXT: remark: foo.cl:42:0: SGPRs: 0
				; STDERR-NEXT: remark: foo.cl:42:0: VGPRs: 0
				; STDERR-NEXT: remark: foo.cl:42:0: AGPRs: 0
				; STDERR-NEXT: remark: foo.cl:42:0: ScratchSize [bytes/thread]: 0
				; STDERR-NEXT: remark: foo.cl:42:0: Occupancy [waves/SIMD]: 0
				; STDERR-NEXT: remark: foo.cl:42:0: SGPRs Spill: 0
				; STDERR-NEXT: remark: foo.cl:42:0: VGPRs Spill: 0
				; STDERR-NOT: LDS Size
				; STDERR-NEXT: remark: foo.cl:42:0: ------------------------------
				define void @test_func() !dbg !6 {
				call void asm sideeffect "; clobber v17", "~{v17}"()
				call void asm sideeffect "; clobber s11", "~{s11}"()
				call void asm sideeffect "; clobber a9", "~{a9}"()
				ret void
				}

				; STDERR: remark: foo.cl:8:0: Kernel Name: empty_kernel
				; STDERR-NEXT: remark: foo.cl:8:0: SGPRs: 0
				; STDERR-NEXT: remark: foo.cl:8:0: VGPRs: 0
				; STDERR-NEXT: remark: foo.cl:8:0: AGPRs: 0
				; STDERR-NEXT: remark: foo.cl:8:0: ScratchSize [bytes/thread]: 0
				; STDERR-NEXT: remark: foo.cl:8:0: Occupancy [waves/SIMD]: 10
				; STDERR-NEXT: remark: foo.cl:8:0: SGPRs Spill: 0
				; STDERR-NEXT: remark: foo.cl:8:0: VGPRs Spill: 0
				; STDERR-NEXT: remark: foo.cl:8:0: LDS Size [bytes/block]: 0
				; STDERR-NEXT: remark: foo.cl:8:0: ------------------------------
				define amdgpu_kernel void @empty_kernel() !dbg !7 {
				ret void
				}

				; STDERR: remark: foo.cl:52:0: Kernel Name: empty_func
				; STDERR-NEXT: remark: foo.cl:52:0: SGPRs: 0
				; STDERR-NEXT: remark: foo.cl:52:0: VGPRs: 0
				; STDERR-NEXT: remark: foo.cl:52:0: AGPRs: 0
				; STDERR-NEXT: remark: foo.cl:52:0: ScratchSize [bytes/thread]: 0
				; STDERR-NEXT: remark: foo.cl:52:0: Occupancy [waves/SIMD]: 0
				; STDERR-NEXT: remark: foo.cl:52:0: SGPRs Spill: 0
				; STDERR-NEXT: remark: foo.cl:52:0: VGPRs Spill: 0
				; STDERR-NEXT: remark: foo.cl:52:0: ------------------------------
				define void @empty_func() !dbg !8 {
				ret void
				}

				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!2}

				!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug)
				!1 = !DIFile(filename: "foo.cl", directory: "/tmp")
				!2 = !{i32 2, !"Debug Info Version", i32 3}
				!3 = distinct !DISubprogram(name: "test_kernel", scope: !1, file: !1, type: !4, scopeLine: 27, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0)
				!4 = !DISubroutineType(types: !5)
				!5 = !{null}
				!6 = distinct !DISubprogram(name: "test_func", scope: !1, file: !1, type: !4, scopeLine: 42, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0)
				!7 = distinct !DISubprogram(name: "empty_kernel", scope: !1, file: !1, type: !4, scopeLine: 8, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0)
				!8 = distinct !DISubprogram(name: "empty_func", scope: !1, file: !1, type: !4, scopeLine: 52, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0)
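The expected values for `test_kernel` in the test above are consistent with a simple reading of its inline asm clobbers and its LDS global: clobbering `v8`, `s23` and `a42` yields counts of 9, 24 and 43, and the `[128 x i32]` array yields 512 bytes. The sketch below only reproduces that arithmetic. It assumes registers are counted as highest clobbered index plus one with no reserved extras, which matches the checks in this revision but is not a statement about how the resource analysis works in general; `registersUsed` and `ldsBytes` are hypothetical helper names.

```cpp
#include <cstdint>

// Assumption: registers are numbered densely from 0, so touching
// register N implies N + 1 registers in use.
constexpr uint32_t registersUsed(uint32_t HighestIndex) {
  return HighestIndex + 1;
}

// Size in bytes of an array of NumElements 32-bit integers.
constexpr uint32_t ldsBytes(uint32_t NumElements) {
  return NumElements * 4;
}

// Values checked for test_kernel in this revision of the test.
static_assert(registersUsed(8) == 9, "clobber v8 -> VGPRs: 9");
static_assert(registersUsed(23) == 24, "clobber s23 -> SGPRs: 24");
static_assert(registersUsed(42) == 43, "clobber a42 -> AGPRs: 43");
static_assert(ldsBytes(128) == 512, "[128 x i32] -> LDS Size: 512");
```

Note that the same reading does not apply to `test_func`: its checks expect zero for all register counts despite the clobbers, so the non-kernel path evidently reports different numbers in this revision.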

Revision Contents

Diff 424344

clang/test/Frontend/amdgcn-machine-analysis-remarks.cl

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp

llvm/lib/Target/AMDGPU/SIProgramInfo.h

llvm/test/CodeGen/AMDGPU/resource-optimization-remarks.ll
