Add analyis remarks to output kernel name, register usage, occupancy,
scratch usage, spills, and LDS information.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | ||
---|---|---|
596 | This seems like the wrong place to segment the output; could this be made a part of the emitter itself? Or maybe better yet, could all of these values be collected into a single remark? That seems to be how e.g. llvm/lib/CodeGen/RegAllocGreedy.cpp does it: Pass: regalloc Name: SpillReloadCopies Function: f Args: - NumSpills: '1' - String: ' spills ' - TotalSpillsCost: '1.000000e+00' - String: ' total spills cost ' - NumReloads: '1' - String: ' reloads ' - TotalReloadsCost: '1.000000e+00' - String: ' total reloads cost ' - String: generated in function |
Move remarks into its own function. Skip if !ORE. Add clang frontend test. Remove LDSSpillSize.
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | ||
---|---|---|
596 | We also want to output this in a readable format for the frontend. Collecting it all into a single remark seems to break the output format since clang seems to ignore all newlines from a diagnostic remark. |
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | ||
---|---|---|
596 | Rather than use newlines, RegAllocGreedy uses spaces; we can debate aesthetics, but I feel like we should have a compelling reason before we choose our own format. For example, with RegAllocGreedy you can get output along the lines of: foo/bar/baz.cpp:42:1: remark: 10 spills 100 total spills cost 7 folded spills 70 total folded spills cost 22 reloads 44 total reloads cost 77 folded reloads 120 total folded reloads cost 1 zero cost folded reloads 78 virtual registers copies 111 total copies cost void foo() ^ RegAllocFast appears to be the original use-case that caused the machine remarks to be invented, and it has been updated recently (https://reviews.llvm.org/D100020) so I don't suspect this is just some legacy cruft. If we follow the same approach we get something like: void AMDGPUAsmPrinter::emitResourceUsageRemarks( const MachineFunction &MF, const SIProgramInfo &CurrentProgramInfo) { if (!ORE) return; ORE->emit([&]() { return MachineOptimizationRemarkAnalysis( "kernel-resource-usage", "ResourceUsage", MF.getFunction().getSubprogram(), &MF.front()) << ore::NV("NumSGPR", CurrentProgramInfo.NumSGPR) << " SGPRs " << ore::NV("NumVGPR", CurrentProgramInfo.NumArchVGPR) << " VGPRs " << ore::NV("NumAGPR", CurrentProgramInfo.NumAccVGPR) << " AGPRs " << ore::NV("ScratchSize", CurrentProgramInfo.ScratchSize) << " scratch bytes/thread " << ore::NV("Occupancy", CurrentProgramInfo.Occupancy) << " occupancy waves/SIMD " << ore::NV("SGPRSpill", CurrentProgramInfo.SGPRSpill) << " SGPR spills " << ore::NV("VGPRSpill", CurrentProgramInfo.VGPRSpill) << " VGPR spills " << ore::NV("BytesLDS", CurrentProgramInfo.LDSSize) << " LDS size bytes/block "; }); } which produces: clang/test/Frontend/amdgcn-machine-analysis-remarks.cl:14:1: remark: 9 SGPRs 10 VGPRs 12 AGPRs 0 scratch bytes/thread 10 occupancy waves/SIMD 0 SGPR spills 0 VGPR spills 0 LDS size bytes/block [-Rpass-analysis=kernel-resource-usage] __kernel void foo() { ^ That seems reasonable to me, and avoids bloating the other formats like YAML with extra remarks, including some that have no actual content (i.e. the "KernelName" and "KernelEnd" remarks are meaningless). Of course I'm just one opinion, and if others prefer the several-remark approach I'm fine with it. | |
1191–1279 |
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | ||
---|---|---|
1191–1279 | I'm sort of surprised the all in one form doesn't come out in something more parsable? It might be worth looking at IR level remarks, since there's probably more usage of them. Are there any existing uses in clang that do something meaningful by parsing out different parts? |
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | ||
---|---|---|
1191–1279 | We could check if the specific remark is enabled to avoid cluttering YAML output: const char *Name = "kernel-resource-usage"; LLVMContext &Ctx = MF.getFunction().getContext(); if (!Ctx.getDiagHandlerPtr()->isAnalysisRemarkEnabled(Name)) return; I do not think using spaces to format the output will work for us. Most of the IR level remarks seems to be related specifically to their pass and/or optimization thus are quite short so readability is not as impacted. We are outputting a decent chunk of information and need some readability for the user. Parsing out different parts like this is mostly a workaround for clang ignoring newlines. I am not aware of any other uses doing it like this since many of them are short and uses spaces for formatting. |
Even with newlines forced via extra remarks, I'm not a big fan of the "-----------------------" remark; it doesn't interact well with other random remarks in the output, for example when I enable all remarks using the pattern '.*' I see:
remark: foo.cl:27:0: AMDGPU DAG->DAG Pattern Instruction Selection: Function: test_kernel: MI Instruction count changed from 0 to 6; Delta: 6 remark: foo.cl:27:0: 0 stack bytes in function remark: foo.cl:42:0: AMDGPU DAG->DAG Pattern Instruction Selection: Function: test_func: MI Instruction count changed from 0 to 4; Delta: 4 remark: foo.cl:42:0: 0 stack bytes in function remark: foo.cl:42:0: SI insert wait instructions: Function: test_func: MI Instruction count changed from 4 to 5; Delta: 1 remark: foo.cl:8:0: AMDGPU DAG->DAG Pattern Instruction Selection: Function: empty_kernel: MI Instruction count changed from 0 to 1; Delta: 1 remark: foo.cl:8:0: 0 stack bytes in function remark: foo.cl:52:0: AMDGPU DAG->DAG Pattern Instruction Selection: Function: empty_func: MI Instruction count changed from 0 to 1; Delta: 1 remark: foo.cl:52:0: 0 stack bytes in function remark: foo.cl:52:0: SI insert wait instructions: Function: empty_func: MI Instruction count changed from 1 to 2; Delta: 1 remark: <unknown>:0:0: BasicBlock: : 2 remark: foo.cl:27:0: 6 instructions in function remark: foo.cl:27:0: Kernel Name: test_kernel remark: foo.cl:27:0: SGPRs: 24 remark: foo.cl:27:0: VGPRs: 9 remark: foo.cl:27:0: AGPRs: 43 remark: foo.cl:27:0: ScratchSize [bytes/thread]: 0 remark: foo.cl:27:0: Occupancy [waves/SIMD]: 5 remark: foo.cl:27:0: SGPRs Spill: 0 remark: foo.cl:27:0: VGPRs Spill: 0 remark: foo.cl:27:0: LDS Size [bytes/block]: 512 remark: foo.cl:27:0: ------------------------------ remark: <unknown>:0:0: BasicBlock: : 2 remark: foo.cl:42:0: 5 instructions in function remark: foo.cl:42:0: Kernel Name: test_func remark: foo.cl:42:0: SGPRs: 0 remark: foo.cl:42:0: VGPRs: 0 remark: foo.cl:42:0: AGPRs: 0 remark: foo.cl:42:0: ScratchSize [bytes/thread]: 0 remark: foo.cl:42:0: Occupancy [waves/SIMD]: 0 remark: foo.cl:42:0: SGPRs Spill: 0 remark: foo.cl:42:0: VGPRs Spill: 0 remark: foo.cl:42:0: ------------------------------ remark: <unknown>:0:0: BasicBlock: : 1
If we do keep the delimiter remarks, can we have one at the beginning as well? At least then other remarks don't appear to "bleed" into the new block of remarks.
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | ||
---|---|---|
1191–1279 |
I would expect this sort of thing to happen before it gets serialized to an unstructured string, or for it to happen on a structured output like YAML.
Can we fix clang to respect newlines? |
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | ||
---|---|---|
1277 | Get rid of the ——. It’s not a remark and only kind of makes sense if you are printing these with others |
If possible, I would like to keep some kind of delimiter. I like the idea of having it at the beginning and at the end of the section. The best option would be to convince clang to print new lines.
It seems like the stripping of non-printable characters is intentional, but only for diagnostics that don't use the TableGen based diagnostic formatting scheme in clang. I'm not sure exactly why, but regardless there is precedent for newlines in other messages.
To get the newlines to hit the terminal for our case the following patch is enough:
--- a/clang/lib/Basic/Diagnostic.cpp +++ b/clang/lib/Basic/Diagnostic.cpp @@ -812,7 +812,7 @@ FormatDiagnostic(const char *DiagStr, const char *DiagEnd, getArgKind(0) == DiagnosticsEngine::ak_std_string) { const std::string &S = getArgStdStr(0); for (char c : S) { - if (llvm::sys::locale::isPrint(c) || c == '\t') { + if (llvm::sys::locale::isPrint(c) || c == '\t' || c == '\n') { OutStr.push_back(c); } }
To get the right indentation inserted for the extra lines the TextDiagnostic consumer needs to also be updated, but that should be a small change. Only one test breaks, and it seems useful for it to pick up the new behavior anyway.
The only other potential conflict is that in TextDiagnostic specifically there is also the Clang option -fmessage-length=N which will forcibly wrap diagnostic messages on word-boundaries (although never breaking up a word across lines). It seems to only apply to text preceding the first newline in the message, so it is likely just a non-issue?
I am not sure if allowing clang to accept newlines is a good idea. It seems like clang wants to know what type of message is being outputted. For example whether this is a remark, warning, etc. but allowing for a diagnostic to output their own newline makes it ambiguous where exactly that output is coming from.
It already supports newlines in any diagnostic which doesn't use the trivial format string "%0", and at least clang/test/Misc/diag-line-wrapping.cpp explicitly tests this behavior.
It seems reasonable that clang could add a prefix or indentation scheme while emitting multi-line diagnostics to the terminal, to help with the ambiguity issue. For example instead of the current output:
clang/test/Misc/diag-line-wrapping.cpp:8:14: error: non-static member 'f' found in multiple base-class subobjects of type 'B': struct DD -> struct D1 -> struct B struct DD -> struct D2 -> struct B
Maybe we could have something like:
clang/test/Misc/diag-line-wrapping.cpp:8:14: error: ... ...: non-static member 'f' found in multiple base-class subobjects of type 'B': ...: struct DD -> struct D1 -> struct B ...: struct DD -> struct D2 -> struct B
I don't know if changing this kind of output is a breaking change, though? I do know some tooling parses this output.
But it’s not a section and no actual grouping concept here. You just happen to see this printed in order. Any delimiter should be introduced as a display function, not emitted as part of the remarks themselves
Remove "--------" delimiter. Change ScratchSize [bytes/thread] to ScratchSize [bytes/lane]. Use lambda expression to emit remarks. Do not output yaml if specific remark is not enabled. Add indentation to make it easier to tell which resource usage remark belong to which kernel.
I will let others comment, but I think this is a perfectly reasonable alternative, and I much prefer using indentation over the delimiter-remark approach. LGTM, assuming nobody else objects
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | ||
---|---|---|
1228 | Why is this kernel name? Do we not emit these for other functions? |
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | ||
---|---|---|
1228 | We do emit these for other functions. Would this be better off as "Function Name" instead of kernel? |
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | ||
---|---|---|
1228 | Yes |
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | ||
---|---|---|
1207 | Why &&? |
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | ||
---|---|---|
1207 | Looking at the ore::NV constructors I would vote that this just be by value, i.e. auto Argument. Everything that can be an Argument is a small "prefer passing by value" type. |
Needs to skip this whole block if !ORE? Also should move all of this to a separate function