This is an archive of the discontinued LLVM Phabricator instance.

[codegen] Add StackFrameLayoutAnalysisPass
ClosedPublic

Authored by paulkirth on Oct 7 2022, 2:26 PM.

Details

Summary

Issue #58168 describes the difficulty diagnosing stack size issues
identified by -Wframe-larger-than. For simple code, its easy to
understand the stack layout and where space is being allocated, but in
more complex programs, where code may be heavily inlined, unrolled, and
have duplicated code paths, it is no longer easy to manually inspect the
source program and understand where stack space can be attributed.

This patch implements a machine function pass that emits remarks with a
textual representation of stack slots, and also outputs any available
debug information to map source variables to those slots.

The new behavior can be used by adding -Rpass-analysis=stack-frame-layout
to the compiler invocation. Like other remarks the diagnostic
information can be saved to a file in a machine readable format by
adding -fsave-optimzation-record.

Fixes: #58168

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
myhsu added inline comments.Oct 26 2022, 9:34 PM
llvm/lib/CodeGen/StackFramePrinterPass.cpp
79 ↗(On Diff #470989)

const SlotData &? If this is true please also update the call site (e.g. line 143)

138 ↗(On Diff #470989)

nit: use llvm::sort so that we can simply write llvm::sort(SlotInfo, <comparer>)

llvm/test/CodeGen/ARM/stack-frame-printer.ll
221 ↗(On Diff #470989)

nit: is it possible to clean up some of the irrelevant strings like the producer and directory fields?

Thanks for the feedback. Those are good catches. I'll send out a patch cleaning those up.

Address comments

  • fix brancing style
  • update parameters to use const
  • use llvm::sort in place of std::sort
  • remove path strings from debug info in test files
paulkirth marked 9 inline comments as done.Oct 31 2022, 6:50 PM

Can we add a Note diagnostic when we emit a -Wframe-larger-than= that alludes to re-running with -mllvm -print-stack-frame to get additional info?

We should update the release notes of clang to mention this feature, too.

arsenm added a subscriber: arsenm.Nov 15 2022, 12:51 PM

Can we add a Note diagnostic when we emit a -Wframe-larger-than= that alludes to re-running with -mllvm -print-stack-frame to get additional info?

We should update the release notes of clang to mention this feature, too.

I don't think we should be pointing users to -mllvm flags. Plus, I don't really think random dbgs() printing is going to interact correctly with other diagnostics

llvm/lib/CodeGen/StackFramePrinterPass.cpp
120–122 ↗(On Diff #471235)

Check MFI.isDead instead?

Can we add a Note diagnostic when we emit a -Wframe-larger-than= that alludes to re-running with -mllvm -print-stack-frame to get additional info?

We should update the release notes of clang to mention this feature, too.

I don't think we should be pointing users to -mllvm flags. Plus, I don't really think random dbgs() printing is going to interact correctly with other diagnostics

Then how will users know about super cool things? We could create a new front end flag for this -mllvm flag, a la https://reviews.llvm.org/D131986.

Can we add a Note diagnostic when we emit a -Wframe-larger-than= that alludes to re-running with -mllvm -print-stack-frame to get additional info?

We should update the release notes of clang to mention this feature, too.

I don't think we should be pointing users to -mllvm flags. Plus, I don't really think random dbgs() printing is going to interact correctly with other diagnostics

Then how will users know about super cool things? We could create a new front end flag for this -mllvm flag, a la https://reviews.llvm.org/D131986.

Sorry, bad example; that example is adding IR Function attributes, not setting -mllvm flags (which tend to get dropped during LTO unless re-passed to the linker), but there is a -mllvm flag for that. https://reviews.llvm.org/D127988

Regardless, we _could_ wire up a frontend flag for more info, I think. Not sure we have precedence for that, but it would be really really helpful to folks presented with -Wframe-larger-than= warnings.

Can we add a Note diagnostic when we emit a -Wframe-larger-than= that alludes to re-running with -mllvm -print-stack-frame to get additional info?

We should update the release notes of clang to mention this feature, too.

I don't think we should be pointing users to -mllvm flags. Plus, I don't really think random dbgs() printing is going to interact correctly with other diagnostics

Then how will users know about super cool things? We could create a new front end flag for this -mllvm flag, a la https://reviews.llvm.org/D131986.

I'd be a bit more comfortable routing this through the backend remarks infrastructure, although it's a lot bigger than everything else currently reported there

Thanks for the feedback! I have a few questions I'm hoping you can answer.

I don't think we should be pointing users to -mllvm flags. Plus, I don't really think random dbgs() printing is going to interact correctly with other diagnostics

I modeled this pass off of the MachineFunctionPrinterPass, does that mean it it should also use a different stream? This seems to be a fairly common pattern, so should we be filing bugs and tracking work in this area?

I'd be a bit more comfortable routing this through the backend remarks infrastructure, although it's a lot bigger than everything else currently reported there

I'm not sure that this diagnostic belongs in optimization remarks, though. It isn't describing any of the decision making that went into the stack layout, which is what I think most remarks typically describe. I'm basing that on https://clang.llvm.org/docs/UsersManual.html#options-to-emit-optimization-reports. Is my interpretation of that too narrow?

Also, is there something special about the remarks output that makes it better? Is the setup/initialization more careful than for the other streams? I'd like to understand the trade-off a bit better. Our documentation makes it seem as though its geared towards compiler engineers, where I view this as a more general diagnostic output, like the other printing passes.

llvm/lib/CodeGen/StackFramePrinterPass.cpp
120–122 ↗(On Diff #471235)

Oh, good suggestion. That check bothered me, but I missed that API. I'll update this patch to reflect your suggestion.

Any chance we could squirrel the info away (I assume there's a reason we can't compute the info where the warn-stack-size LLVM feature is implemented in PrologEpilogInserter.cpp) somewhere, and emit it as part of the frame-larger-than/warn-stack-size diagnostic?

(also, we do already have an opt remark for stack frame size in general (in PrologEpilogInserter, very close to where warn-stack-size is implemented), so it seems OK to use the remark infrastructure for a more detailed stack report - but ideally if the point is to make frame-larger-than better, it'd be good to include the info in that diagnostic)

as a more general diagnostic output, like the other printing passes.

As an aside: I don't think any "printing pass" is designed to be used beyond LLVM compiler engineers - they're implementation details of the compiler, even/much moreso than the optimization remarks infrastructure, which is user-surfaced/clang-flag-supported/passed through suitable APIs (rather than emitted raw to streams from the middle/backend). Optimization remarks are plumbed through the diagnostic infrastructure, can be suppressed/enabled, have file/line info at least some of the time, get all the clang diagnostic formatting infrastructure (eg: current work to have a SARIF output mode would be done up in clang, etc - and raw/direct output from LLVM wouldn't be captured/handled by that work, for instance), colouring, etc.

Any chance we could squirrel the info away (I assume there's a reason we can't compute the info where the warn-stack-size LLVM feature is implemented in PrologEpilogInserter.cpp) somewhere, and emit it as part of the frame-larger-than/warn-stack-size diagnostic?

(also, we do already have an opt remark for stack frame size in general (in PrologEpilogInserter, very close to where warn-stack-size is implemented), so it seems OK to use the remark infrastructure for a more detailed stack report - but ideally if the point is to make frame-larger-than better, it'd be good to include the info in that diagnostic)

Originally, I had prototyped this to run when emitting -Wframe-larger-than diagnostics, however being able to dump the stack layout easily seems valuable on its own. The biggest advantage to delaying the pass is that we can print better diagnostics after the LiveDebugValues pass has a chance to run. The layout isn't affected, but we can print out more variable mappings by delaying the printing pass.

as a more general diagnostic output, like the other printing passes.

As an aside: I don't think any "printing pass" is designed to be used beyond LLVM compiler engineers - they're implementation details of the compiler, even/much moreso than the optimization remarks infrastructure, which is user-surfaced/clang-flag-supported/passed through suitable APIs (rather than emitted raw to streams from the middle/backend). Optimization remarks are plumbed through the diagnostic infrastructure, can be suppressed/enabled, have file/line info at least some of the time, get all the clang diagnostic formatting infrastructure (eg: current work to have a SARIF output mode would be done up in clang, etc - and raw/direct output from LLVM wouldn't be captured/handled by that work, for instance), colouring, etc.

Thanks for the clarification. Those are good points, so thank you for the detailed answer.

Any chance we could squirrel the info away (I assume there's a reason we can't compute the info where the warn-stack-size LLVM feature is implemented in PrologEpilogInserter.cpp) somewhere, and emit it as part of the frame-larger-than/warn-stack-size diagnostic?

(also, we do already have an opt remark for stack frame size in general (in PrologEpilogInserter, very close to where warn-stack-size is implemented), so it seems OK to use the remark infrastructure for a more detailed stack report - but ideally if the point is to make frame-larger-than better, it'd be good to include the info in that diagnostic)

Originally, I had prototyped this to run when emitting -Wframe-larger-than diagnostics, however being able to dump the stack layout easily seems valuable on its own. The biggest advantage to delaying the pass is that we can print better diagnostics after the LiveDebugValues pass has a chance to run. The layout isn't affected, but we can print out more variable mappings by delaying the printing pass.

Fair enough - could the warn-stack-size warning be moved to there, then, and then the information included in the warning? It could have both a warning and remark form, so folks could use the remark form when they just want all the reports or don't want the reports phrased as a problem, but as an informational message? (though this may or may not be worth it - I guess people can turn on the warning, lower the threshold, and specifically make this warning a non-error, which amounts to roughly the same thing as a remark)

paulkirth updated this revision to Diff 475602.Nov 15 2022, 3:18 PM

Rebase and address comments

  • Replace magic comparison with MFI.isDeadIbjectIndex()
  • Small code improvement by using a constructor w/ emplace_back

Fair enough - could the warn-stack-size warning be moved to there, then, and then the information included in the warning? It could have both a warning and remark form, so folks could use the remark form when they just want all the reports or don't want the reports phrased as a problem, but as an informational message? (though this may or may not be worth it - I guess people can turn on the warning, lower the threshold, and specifically make this warning a non-error, which amounts to roughly the same thing as a remark)

I guess it could be moved, but I'm not sure it makes the most sense. PrologEpilogueInserter is also already emitting the optimization remarks for stack size, so IMO it makes sense to keep them together, since that's the place where that information is determined. But there is no technical reason why we couldn't move it later, since all the information to do the check is readily available.

Another point to consider is that we already have several ways to expose stack sizes to users. -Wframe-larger-than gives warnings when a threshold is exceeded, but we also provide -fstack-size-section, and -fstack-usage which output information about every function in the module. And, as mentioned earlier, there is an optimization remark for stack sizes too.

Regardless, I'll take a look and see how easy it will be to expose this through the remarks infrastructure, since that seems to be a generally good idea here.

Any chance we could ... emit it as part of the frame-larger-than/warn-stack-size diagnostic?

This pass prints a TON of (helpful) information...we have a lot of -Wframe-larger-than= instances triggered in our codebase...I think having this on by default would blow our logs significantly. That's why it might be nice to have a Note suggest a flag (default off) for more info on a case by case basis.

As a quick test, I hacked the printer pass to generate an output string, and passed that into the remarks emitter. From opt or llc things look as expected. There's some additional output, but its limited.

I see a more serious issue when using it from Clang, as the output is truncated , as in it only printed up to the first stack slot in my test. Its also all bold, which isn't great. I have a feeling that my shortcut is the root cause of the truncation, but I haven't tracked down the issue exactly.

Do any other remarks output complex data like this? From what I can see they tend to be fairly short…

I also thought about printing each line as a remark, but that seems to get noisy pretty fast, since each line would have the remark <file location> tag plus an [-Rpass-analysis=stackframe-printer] at the end.

Example truncated output (each function should have several lines w/ offset from SP, alignment, and size):

$ clang -O1 -Rpass-analysis=stackframe-printer llvm/test/CodeGen/X86/stack-frame-printer.ll -c -o /dev/null -mllvm -print-stack-frame                        

remark: <unknown>:0:0: 
# Stack Layout: stackSizeWarning
 [-Rpass-analysis=stackframe-printer]
remark: <unknown>:0:0: 
# Stack Layout: cleanup_array
Offset            Align     Size      
[SP-8]      Spill 16        8         

 [-Rpass-analysis=stackframe-printer]
remark: <unknown>:0:0: 
# Stack Layout: cleanup_result
Offset            Align     Size      
[SP-8]      Spill 16        8         

 [-Rpass-analysis=stackframe-printer]
remark: <unknown>:0:0: 
# Stack Layout: do_work
Offset            Align     Size      
[SP-8]      Spill 16        8         

 [-Rpass-analysis=stackframe-printer]
remark: <unknown>:0:0: 
# Stack Layout: gen_array
Offset            Align     Size      
[SP-8]      Spill 16        8         

 [-Rpass-analysis=stackframe-printer]
remark: <unknown>:0:0: 
# Stack Layout: caller
Offset            Align     Size      
[SP-8]      Spill 16        8         

 [-Rpass-analysis=stackframe-printer]

Output from llc (which looks more or less as expected):

$ llc -mcpu=corei7 -O1 -print-stack-frame -pass-remarks-analysis=stackframe-printer < llvm/test/CodeGen/X86/stack-frame-printer.ll 2>&1 >/dev/null

remark: <unknown>:0:0: 
# Stack Layout: stackSizeWarning
Offset            Align     Size      
[SP-88]           16        80        
    buffer @ frame-diags.c:30
[SP-168]          16        80        
    buffer2 @ frame-diags.c:33


remark: <unknown>:0:0: 
# Stack Layout: cleanup_array
Offset            Align     Size      
[SP-8]      Spill 16        8         
[SP-16]           8         8         
    a @ dot.c:13


remark: <unknown>:0:0: 
# Stack Layout: cleanup_result
Offset            Align     Size      
[SP-8]      Spill 16        8         
[SP-16]           8         8         
    res @ dot.c:21


remark: <unknown>:0:0: 
# Stack Layout: do_work
Offset            Align     Size      
[SP-8]      Spill 16        8         
[SP-12]           4         4         
    i @ dot.c:55
[SP-24]           8         8         
    AB @ dot.c:38
[SP-28]           4         4         
    len @ dot.c:37
[SP-32]           4         4         
[SP-40]           8         8         
    out @ dot.c:32
[SP-48]           8         8         
    B @ dot.c:32
[SP-56]           8         8         
    A @ dot.c:32
[SP-60]           4         4         
    sum @ dot.c:54


remark: <unknown>:0:0: 
# Stack Layout: gen_array
Offset            Align     Size      
[SP-8]      Spill 16        8         
[SP-12]           4         4         
    i @ dot.c:69
[SP-16]           4         4         
    size @ dot.c:62
[SP-24]           8         8         
    res @ dot.c:65
[SP-32]           8         8         


remark: <unknown>:0:0: 
# Stack Layout: caller
Offset            Align     Size      
[SP-8]      Spill 16        8         
[SP-12]           4         4         
    ret @ dot.c:81
[SP-16]           4         4         
[SP-24]           8         8         
    res @ dot.c:80
[SP-32]           8         8         
    B @ dot.c:79
[SP-40]           8         8         
    A @ dot.c:78
[SP-44]           4         4         
    err @ dot.c:83
[SP-48]           4         4         
    size @ dot.c:77

Are there any thoughts about how to make this work more nicely w/ optimization remarks from Clang?

The kernel resource remarks added in 67357739c6d36a61972c1fc0e829e35cb5375279 are probably the current heaviest remarks. I believe there were some changes to newline handling for it

This pass prints a TON of (helpful) information...we have a lot of -Wframe-larger-than= instances triggered in our codebase...I think having this on by default would blow our logs significantly. That's why it might be nice to have a Note suggest a flag (default off) for more info on a case by case basis.

I'm not sure we'd need/want to optimize a diagnostic experience for a codebase that's got a lot of latent warnings. If people have lots of existing warnings they should probably turn the warning off (in those instances, at least) & I think saying "this function has a frame that's too big" without any info is pretty hard to act on, so it doesn't seem totally outside the realm of good diagnostics for this particular diagnostic to print a lot of info to help a user act on it/fix the issue when they get it. If most of the time you see this warning and want to fix it, you had to run your build again with an extra flag - I'd say that was a bad diagnostic experience, we should give them enough info the first time around.

If most of the time you don't need this info - yeah, that's something we should figure out, how to provide the right amount of info to be actionable, but in this case I suspect more-often-than-not you want some kind of report/breakdown. If it's a case of not having a way to make it more targeted/actionable and we just have two options ("too terse" and "too verbose") fairly evenly split (it's not clear that most of the time one or the other is the right answer) - I guess two different warning flags or some kind of modifier flag could be suitable. I guess we have that for template recursion things, maybe? Where you can ask if you want the full expansion, but by default we give you a summarized one, skipping expansions we don't think are relevant (I might be misremembering).

The kernel resource remarks added in 67357739c6d36a61972c1fc0e829e35cb5375279 are probably the current heaviest remarks. I believe there were some changes to newline handling for it

D127923 is the patch for the line handling

Sorry, it took me a while to circle back to this.

I'm not sure we'd need/want to optimize a diagnostic experience for a codebase that's got a lot of latent warnings. If people have lots of existing warnings they should probably turn the warning off (in those instances, at least) & I think saying "this function has a frame that's too big" without any info is pretty hard to act on, so it doesn't seem totally outside the realm of good diagnostics for this particular diagnostic to print a lot of info to help a user act on it/fix the issue when they get it. If most of the time you see this warning and want to fix it, you had to run your build again with an extra flag - I'd say that was a bad diagnostic experience, we should give them enough info the first time around.

On the topic of how frequent warnings should be, it's important to keep in mind that a function with a large stack frame will generate a diagnostic in every caller who has inlined that function. So even if you were warning free in the last build a single function using too much stack could generate a lot of diagnostics.

If most of the time you don't need this info - yeah, that's something we should figure out, how to provide the right amount of info to be actionable, but in this case I suspect more-often-than-not you want some kind of report/breakdown. If it's a case of not having a way to make it more targeted/actionable and we just have two options ("too terse" and "too verbose") fairly evenly split (it's not clear that most of the time one or the other is the right answer) - I guess two different warning flags or some kind of modifier flag could be suitable. I guess we have that for template recursion things, maybe? Where you can ask if you want the full expansion, but by default we give you a summarized one, skipping expansions we don't think are relevant (I might be misremembering).

While I agree that running the build again isn't ideal, I don't' see a good way to balance the need for clang to report concise errors with the need for more information in this case. Reading D127923, I think it's clear that many of clang's maintainers would like to keep diagnostics concise if possible. I should also note that we already give some minimal context to -Wframe-larger-than diagnostics by printing the breakdown of the stack usage between spills, program variables, and the UnSafeStack.

I think the right approach in this case is to allow developers to opt into the behavior when needed, so your suggestion of gating it behind a flag seems like a good way to express that. I also think the diagnostic is useful enough on its own to warrant use outside of -Wframe-larger-than, despite being very useful when triaging those type of warnings. I guess that means I should take a harder look at plumbing this through the remarks infrastructure, even if I don't love the output.

paulkirth updated this revision to Diff 484694.Dec 21 2022, 3:10 PM
paulkirth retitled this revision from [codegen] Display stack layouts in console to [codegen] Add a remarks based Stack Layout Analysis pass.
paulkirth edited the summary of this revision. (Show Details)

Refactor implementation to use remarks infrastructure

  • Simplify interfaces
  • Add tests in Clang
  • Update pipeline tests
  • Rename pass and test files
paulkirth updated this revision to Diff 486030.Jan 3 2023, 10:54 AM

Add missing pipline test updates for PowerPC and AMDGPU

This is great! Any chance we can use MachineFrameInfo::StackProtectorIdx to annotate the slot that is reserved for the stack protector?

llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp
85

I don't think this should ever be null.

paulkirth updated this revision to Diff 486050.Jan 3 2023, 12:29 PM

Avoid problems with path separators on windows, and ignore path prefix in diagnostic

@thegameg Maybe? It seems straightforward, but I'll need take a look. If it's easy(which I think it willbe), I'll try to update this patch, if it's more complex, I'll probably do a separate patch to add that feature.

llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp
85

Ah, good point. I always default to nullptr checks, but in this case that should be impossible. thanks for pointing that out.

paulkirth updated this revision to Diff 486094.Jan 3 2023, 2:37 PM

Remove unnecesary null pointer check.

paulkirth updated this revision to Diff 486112.Jan 3 2023, 4:02 PM

Identify the stack protector in output

thegameg added inline comments.Jan 3 2023, 4:53 PM
llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp
93

Why are we emitting the function name? In the serialized remarks (-fsave-optimization-record) it comes in the Function field, and in the diagnostics (-Rpass*) it uses debug info to show the source around it.

if it's for testing only, you can test using the serialized remarks with YAML.

125

We usually use identifiers for remark names, so here StackLayout instead of Stack Layout.

179

From what I can see, you've focused on the -Rpass output using diagnostics and tried to emit a pretty-printed version for that on the command line.

We use remarks through their serialized version as well, through -fsave-optimization-record which will emit a YAML file that can be used in scripts and other post-processing tools.

I think this should be something in between where it looks user-friendly on the command-line but also easy to post-process.

One way would be to do something similar to the memory op remarks, which are used here: llvm/test/Transforms/Util/trivial-auto-var-init-call.ll.

I could see something where you emit a remark for each slot (+ location), with ore::NV used for each piece of information that is useful, something like:

ORE << MachineOptimizationRemarkAnalysis(...) << "Stack slot: offset: " << ore::NV("Offset", D.offset)
                                                                                    << "type: " << ore::NV("Type", type)
[...]

and could generate something like:

--- !Analysis
Pass:            stack-frame-layout
Name:            StackSlot
Function:        stackSizeWarning
Args:
  - String:          'Stack slot: offset: '
  - Offset:         '[SP-8]'
  - String:          ', type: '
  - Type:            'spill'
  - String:          ', align: '
  - Align:           '16'
  - String:          ', size: '
  - Align:           '8'
...

which would look like this on the command line:

remark: Stack slot: offset: [SP-8], type: spill, align: 16, size 8

@arsenm @thegameg @nickdesaulniers @dblaikie @phosek Can we reach a consensus here on how to output this kind of information? I feel like I've been told to move towards remarks as the output method, but that the current diagnostic that tries to lay out the stack visually isn't a good fit since remarks are also serialized ... I'm not all that convinced that providing output other than a visual layout for this information is all that useful in this particular case, but I don't have an issue with supporting it either. I think this is especially true, since memory layouts are tricky to reason about.

For that reason, I'm pretty sure we want to actually show the user the layout directly in the diagnostic. My concern is that if we change the output to better fit within the remarks infrastructure, we lose an effective way to show users what's happening. If we take away the visual representation, then we'll end up needing to run a separate tool and post-process the serialized output to have a user make any real sense of how things are layed out. That seems like a pretty bad user experience, so I'd much rather find a way to have the compiler emit this information directly.

Does anyone have thoughts here on how to move forward?

llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp
93

I followed the example from since it was brought up earlier. https://github.com/llvm/llvm-project/blob/f40d25dd8d3ad7bcfa8f5e8f74f245ab1a7675df/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp#L1223

Also, there is not guarantee you have debug info when you run this, right? Also you won't get a function name in the console if you run this over IR, even when debug information is included w/in the IR.

I see remark: <unknown>:0:0: ... when running any of the IR tests.

125

hmm, I was following the example in https://github.com/llvm/llvm-project/blob/f40d25dd8d3ad7bcfa8f5e8f74f245ab1a7675df/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp#L1223, but it looks like I may have swapped them. I'll take a closer look at the output and fix accordingly (here and elsewhere).

179

Thanks for the suggestion. While I understand the desire to make the output more machine readable, I don't think this is a good place to do so. Layouts are hard to reason about and there's actually a fairly decent way we can display this to users and convey exactly where things are. The entire point of this patch was to give a somewhat visual representation to how the stack is layed out, and help debug stack layout issues. It's one of the reasons I didn't originally do this with remarks, but there's been a fair amount of discussion to this point already w/in this patch.

If this isn't a good fit for remarks with the current format, then I'm kind of stuck on how to satisfy the various requirements on how to output and display this kind of information...

@arsenm @thegameg @nickdesaulniers @dblaikie @phosek Can we reach a consensus here on how to output this kind of information? I feel like I've been told to move towards remarks as the output method, but that the current diagnostic that tries to lay out the stack visually isn't a good fit since remarks are also serialized ... I'm not all that convinced that providing output other than a visual layout for this information is all that useful in this particular case, but I don't have an issue with supporting it either. I think this is especially true, since memory layouts are tricky to reason about.

For that reason, I'm pretty sure we want to actually show the user the layout directly in the diagnostic. My concern is that if we change the output to better fit within the remarks infrastructure, we lose an effective way to show users what's happening. If we take away the visual representation, then we'll end up needing to run a separate tool and post-process the serialized output to have a user make any real sense of how things are layed out. That seems like a pretty bad user experience, so I'd much rather find a way to have the compiler emit this information directly.

Does anyone have thoughts here on how to move forward?

I think remarks are the right way to go with this. They provide a pretty flexible way to emit both strings (for formatting and visual representations) and machine-readable data through ore::NV entries. We just need to find a consensus on how it looks like in both the command-line and the serialized mode.

llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp
179

I don't see a huge difference between:

remark: Offset            Align     Size
remark: [SP-8]      Spill 16        8
remark: [SP-16]     Spill 8         8
remark: [SP-24]     Spill 16        8

and

remark: Stack slot: offset: [SP-8], type: spill, align: 16, size 8
remark: Stack slot: offset: [SP-16], type: spill, align: 8, size 8
remark: Stack slot: offset: [SP-24], type: spill, align: 16, size 8

If you think this is what makes it really useful, we also support multi-line remarks (see LowerMatrixIntrinsics.cpp, and you can still provide precise ore::NV-like entries. In that case you should probably emit one big MachineOptimizationRemarkAnalysis with [SP-8], spill, 16, and 8 as ore::NV entries, and the rest as a strings.

I think ideally I'd like to surface something like this:

A:

or

B:

The first image uses stdout, and the second uses remarks, but prints everything from the function as a single string. This provides some output that is pretty easy for a human to consume, and wouldn't be too hard to parse for an external tool. It isn't as nice from a machine readable perspective as something like JSON or YAML.

The big issue that I'm seeing is that printing multiple remarks ends up printing the path to the source file given as provided on the commandline. If that's from a build system, it will likely be a full path and a user is likely going to have a similar issue. The following examples are, IMO, much harder to make sense of. I see similar issues, even if I make my terminal full screen width.

C:

or (if I format things as suggested)

D:

I'm a bit stumped on how to make this work nicely w/ remarks and provide good support for both the CLI and YAML...

I don't think I understand why we can't achieve B with remarks? In C and D you generate one remark for each line, can't we generate a single multi-line remark instead?

paulkirth updated this revision to Diff 487999.Jan 10 2023, 2:46 PM

Rebase.

  • Switch to multi-line remarks
  • Update tests
paulkirth updated this revision to Diff 488023.Jan 10 2023, 4:10 PM

Add test for YAML output

This looks great, thanks for updating this! A few more comments inline.

llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp
56
66
111

Is this still worth being a separate function?

129

Unused?

180

Can you add a comment on what ValOffset is?

@thegameg I think I finally understood what you meant re: multi-line remarks. Sorry for the back/forth on that, it just didn't click for me until you commented on the screenshot.

BTW, is there a way to nest some of the items? Ideally we'd be able to have a Slot in the YAML that contains all the various data, similar to how DebugLoc is a more complex object with fields for File, Line, and Column. That way we could group all the data for each slot including variable locations.

I know how that would look in YAML, but I'm unaware of how we'd do that with the existing remarks interfaces... or if doing so would massively change the CLI output. any pointers here?

llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp
56

Good catch! TY

111

yeah, probably not

129

Yeah, looks like I forgot to remove this. Thanks

BTW, is there a way to nest some of the items? Ideally we'd be able to have a Slot in the YAML that contains all the various data, similar to how DebugLoc is a more complex object with fields for File, Line, and Column. That way we could group all the data for each slot including variable locations.

I know how that would look in YAML, but I'm unaware of how we'd do that with the existing remarks interfaces... or if doing so would massively change the CLI output. any pointers here?

Unfortunately no, there is no easy way to do that right now, but I agree it would be nice. The Args seem easy enough to process that I wouldn't bother trying to group them. optrecord.py (and libRemarks) will handle it in the right order so the users can easily work with it.

Address comments.

  • document what ValOffset is used for
  • remove dead code
  • fix typo
thegameg accepted this revision.Jan 11 2023, 12:05 PM

Looks great with the leftover minor changes, feel free to land this, thanks! I'll give this a try internally and provide feedback if any.

llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp
56

I also suggested Invalid instead of Error but it's up to you.

111

Looks like this stayed around unused

This revision is now accepted and ready to land.Jan 11 2023, 12:05 PM
  • Actually remove dead code
  • Update pass description to be more accurate
  • fix typo
  • update enum member name
paulkirth updated this revision to Diff 488427.Jan 11 2023, 4:56 PM

Update clang test for windows file separators.

Add target triple to all RUN lines.

Seems like the layout is the same on 64-bit windows, but for some reason
clang.exe chooses i386 unless the triple is set. So just set the triple
uniformly, and avoid any potential problems.

Also add a REQUIRES line to the test, since these need an x86_64 target

Make YAML tests less brittle.

It would be really nice if we could limit this to a specific function somehow.

Quick feedback from giving Diff 488727 a spin on the Linux kernel:

via ARCH=arm make LLVM=1 -j128 -s allyesconfig all I found:

  CC      drivers/net/ethernet/mellanox/mlx5/core/en_main.o
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3597:12: error: stack frame size (1256) exceeds limit (1024) in 'mlx5e_setup_tc' [-Werror,-Wframe-larger-than]
static int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type type,
           ^

When I rebuild with:

ARCH=arm make LLVM=1 -j128 drivers/net/ethernet/mellanox/mlx5/core/en_main.o KCFLAGS=-Rpass-analysis=stack-frame-layout

I get a printout of the stack usage of every function in this TU. That's 1929 lines of text output...I only care about mlx5e_setup_tc. Is there a way to limit that?

Filtering through the logs though, I do see:

drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3599:1: remark: 
Function: mlx5e_setup_tc
...
Offset: [SP-400], Type: Variable, Align: 8, Size: 352
Offset: [SP-752], Type: Variable, Align: 8, Size: 352
Offset: [SP-1088], Type: Variable, Align: 8, Size: 336

which is good! If I flip on debug info and rebuild, this looks like:

drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3599:1: remark: 
Function: mlx5e_setup_tc
...
Offset: [SP-400], Type: Variable, Align: 8, Size: 352
    old_params @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2934
    old_chs @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2958
    new_params @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3539
Offset: [SP-752], Type: Variable, Align: 8, Size: 352
    new_chs @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3000
Offset: [SP-1088], Type: Variable, Align: 8, Size: 336
    new_params @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3424

Which is pretty awesome!
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en_main.c#n3424 for the last one, which shows we should probably be heap allocating instances of struct mlx5e_params rather than stack allocating them!

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en_main.c#n3000 shows the same for struct mlx5e_channels. Cool stuff! Now we can finally debug -Wframe-larger-than=!!!

clang/test/Frontend/stack-layout-remark.c
9

Please update:

  1. the patch description/commit message
  2. clang/docs/ReleaseNotes.rst

    to mention this new flag. I kind of wish that -Wstack-frame-larger-than= alluded to this somehow.
llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp
66

@paulkirth make sure to mark these code review comments as done when implemented, please!

192

do we need EndIdx, or can we simply use Idx != ObjEnd as the loop terminating condition?

Doesn't look like we use ObjBeg afterwards either; is that necessary? Seems like the call to MFI.getObjectIndexBegin(); could happen in the initialization list of this for loop?

201–203

Consider implementing operator< on SlotData; then I think you can simply call std::sort on SlotInfo and drop this lamda.

207–209
217

If you sink this def into the for loop, then you don't need to clear it.

221–223
228–229

I guess this can be removed since the for loop below wont do anything if there's 0 memoperands?

llvm/test/CodeGen/AArch64/O0-pipeline.ll
76–77

Dang, this adds a bunch of passes to O0 pipelines...any creative ideas on how to not do that?

llvm/test/CodeGen/AArch64/arm64-opt-remarks-lazy-bfi.ll
43–46

what's going on in this test? Looks like the pass is being run twice or something?

llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp
45

Consider replacing uses of PassName with DEBUG_TYPE since they have the same value.

99–100

consider sinking this closer to use. If you only call one method on it, and it could fit in one line, consider not even creating a variable. i.e.
getAnalysis<MachineOptimizationRemarkEmitterPass>().getORE().emit(ReM)

107

can you sink the call to genSlotDbgMapping() into this arg list? SlotMap seems unreferenced otherwise.

clang/test/Frontend/stack-layout-remark.c
9

Perhaps in the documentation for -Wframe-larger-than=? i.e. adding a code Documentation = [{}] block to BackendFrameLargerThan record in clang/include/clang/Basic/DiagnosticGroups.td or something.

paulkirth updated this revision to Diff 488784.Jan 12 2023, 3:00 PM
paulkirth marked 21 inline comments as done.

Address comments

llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp
45

ugh, I forgot to update that after adding DEBUG_TYPE. Thanks for pointing that out.

107

surprisingly this resulted in a compiler error:

StackFrameLayoutAnalysisPass.cpp:104:37: error: non-const lvalue reference to type 'SmallDenseMap<...>' cannot bind to a temporary of type 'SmallDenseMap<...>'
    emitStackFrameLayoutRemarks(MF, genSlotDbgMapping(MF), Rem);

So I've just moved it into emitStackFrameLayoutRemarks() and dropped the parameter, which avoids the issue entirely. Since SlotMap is only used there now, it makes more sense to structure the code like this anyway.

217

for some reason I was under the impression that our style guidelines prefered using clear in situations like this, but I can't find it so I think my brain tricked me. Thanks for pointing that out.

It would be really nice if we could limit this to a specific function somehow.

I think you can do that, right ?
see:
https://llvm.org/docs/Remarks.html#cmdoption-pass-remarks-filter
https://llvm.org/docs/Remarks.html#cmdoption-pass-remarks-filter

Quick feedback from giving Diff 488727 a spin on the Linux kernel:

via ARCH=arm make LLVM=1 -j128 -s allyesconfig all I found:

  CC      drivers/net/ethernet/mellanox/mlx5/core/en_main.o
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3597:12: error: stack frame size (1256) exceeds limit (1024) in 'mlx5e_setup_tc' [-Werror,-Wframe-larger-than]
static int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type type,
           ^

When I rebuild with:

ARCH=arm make LLVM=1 -j128 drivers/net/ethernet/mellanox/mlx5/core/en_main.o KCFLAGS=-Rpass-analysis=stack-frame-layout

I get a printout of the stack usage of every function in this TU. That's 1929 lines of text output...I only care about mlx5e_setup_tc. Is there a way to limit that?

see: https://llvm.org/docs/Remarks.html#cmdoption-pass-remarks-filter

If that doesn't sort you out, we can probably do something about this. In the worst case we'd need to make this a bigger pass, and also emit diagnostics for stack size, etc instead of in prologue/epilogue inserter. That was mentioned earlier, but I think that should be a separate change.

Filtering through the logs though, I do see:

drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3599:1: remark: 
Function: mlx5e_setup_tc
...
Offset: [SP-400], Type: Variable, Align: 8, Size: 352
Offset: [SP-752], Type: Variable, Align: 8, Size: 352
Offset: [SP-1088], Type: Variable, Align: 8, Size: 336

which is good! If I flip on debug info and rebuild, this looks like:

drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3599:1: remark: 
Function: mlx5e_setup_tc
...
Offset: [SP-400], Type: Variable, Align: 8, Size: 352
    old_params @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2934
    old_chs @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2958
    new_params @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3539
Offset: [SP-752], Type: Variable, Align: 8, Size: 352
    new_chs @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3000
Offset: [SP-1088], Type: Variable, Align: 8, Size: 336
    new_params @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3424

Which is pretty awesome!
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en_main.c#n3424 for the last one, which shows we should probably be heap allocating instances of struct mlx5e_params rather than stack allocating them!

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en_main.c#n3000 shows the same for struct mlx5e_channels. Cool stuff! Now we can finally debug -Wframe-larger-than=!!!

Glad this helps! If you think of more issues, let me know.

paulkirth added inline comments.Jan 12 2023, 3:14 PM
clang/test/Frontend/stack-layout-remark.c
9

Will do on the ReleaseNotes, but I'm a bit unsure what you want in the summary. It mentions the motivation and describe what this pass is for/does using remarks. Is there something else it should say?

9

Perhaps in the documentation for -Wframe-larger-than=? i.e. adding a code Documentation = [{}] block to BackendFrameLargerThan record in clang/include/clang/Basic/DiagnosticGroups.td or something.

Hmm, I'll take a look there, but I'm not 100% sure I follow what you mean.

are you after somethng like this when frame-larger-than diagnostics happen:

you can debug this by adding -Rpass-analysis=stack-frame-layout -mllvm -pass-remarks-filter=<functionname>

or something like that?

It would be really nice if we could limit this to a specific function somehow.

I think you can do that, right ?
see:
https://llvm.org/docs/Remarks.html#cmdoption-pass-remarks-filter

This filters on the Name of the remark, so here it would be stack-frame-layout.

I think a -Rpass-func-filter=<regex> would be a great addition.

paulkirth added inline comments.Jan 12 2023, 3:37 PM
llvm/test/CodeGen/AArch64/O0-pipeline.ll
76–77

This seems to be the norm w/ other Analysis passes, or anything that uses remarks, really.

I'm not really sure what we can do about that other than to move this into an existing pass that already uses remarks. I didn't see any good candidates at the point in the pipeline that we'd like to run this though.

llvm/test/CodeGen/AArch64/arm64-opt-remarks-lazy-bfi.ll
43–46

not sure I follow. The block here is executing the pass then freeing the pass. I tried to follow the pattern used around this, but we could change it to

HOTNESS:      Executing Pass 'Stack Frame Layout Analysis'
HOTNESS:      Freeing Pass 'Stack Frame Layout Analysis'

and skip the rest

It would be really nice if we could limit this to a specific function somehow.

I think you can do that, right ?
see:
https://llvm.org/docs/Remarks.html#cmdoption-pass-remarks-filter

This filters on the Name of the remark, so here it would be stack-frame-layout.

I think a -Rpass-func-filter=<regex> would be a great addition.

That's right. I misremembered. I think when this was a normal printing pass you could filter w/ -filter-print-funcs=foo but I don't think that works now

Actually if we add

if (!isFunctionInPrintList(MF.getName()))
     return false;

we can filter by name

nickdesaulniers accepted this revision.Jan 12 2023, 4:21 PM

Actually if we add

if (!isFunctionInPrintList(MF.getName()))
     return false;

we can filter by name

Does name mangling complicate that? Perhaps a C++ user would give an unmangled name, but MF would be looking at mangled names?

Anyways, it's not a pressing issue. I won't block this patch on that. I just redirect all the output to a file then scan that.

Same thing about adding passes to -O0. Someone might care about that, but I don't.

Nice work @paulkirth . I'm excited to use this to help us better understand and reduce our stack usage in the Linux kernel!

llvm/test/CodeGen/AArch64/arm64-opt-remarks-lazy-bfi.ll
43–46

Oops, I missed the first instance is Executing then the second is Freeing. NVM!

Sorry, mind adding the documentation, too?

clang/test/Frontend/stack-layout-remark.c
9

Hmm, I'll take a look there, but I'm not 100% sure I follow what you mean.

are you after somethng like this when frame-larger-than diagnostics happen:

you can debug this by adding -Rpass-analysis=stack-frame-layout -mllvm -pass-remarks-filter=<functionname>

or something like that?

Basically, my concern is "how will other developers not cc'ed on this phab review ever find this nifty new flag?" If -Wframe-larger-than= doesn't print info about it, then we should at least have it in our docs.

The last sentence you suggested is exactly what I had in mind.

Actually if we add

if (!isFunctionInPrintList(MF.getName()))
     return false;

we can filter by name

I would rather have a more generic mechanism for remarks or diagnostics in general. Even if it uses isFunctionInPrintList, I'd rather have a real flag that doesn't require -mllvm.

Does name mangling complicate that? Perhaps a C++ user would give an unmangled name, but MF would be looking at mangled names?

Agreed. A regex would help a little there.

Anyways, it's not a pressing issue. I won't block this patch on that. I just redirect all the output to a file then scan that.

You can easily process the yaml output from -fsave-optimization-record with the optrecord module:

import optrecord
import re

all_remarks, file_remarks, _ = optrecord.gather_results(optrecord.find_opt_files(<INPUT_FILE>), 1, False)
for r in optrecord.itervalues(all_remarks):
  if re.match(<REGEX>, r.Function):
    print(r)

You can easily...

I'll just note that v1 of this patch IIRC was a note on the single individual instance of the -Wframe-larger-than= diagnostic. No additional flags for optimization remarks, no (large) dump of stack frame info for non-interesting functions, no python module writing necessary.

I recognize that this approach is a compromise, and it's not worth going back and forth.

But the point of this is to simplify the developer experience when -Wframe-larger-than= is encountered. Let's not lose sight of that.

@paulkirth please don't forget to update the commit description with that flag so that I don't have to read the contents of the commit to find this flag again. If you're using arc diff to upload the patch, the --verbatim flag will update the phab description, IIRC.

paulkirth updated this revision to Diff 488815.Jan 12 2023, 5:12 PM

Address comments

  • Take a stab a ReleaseNotes
  • Update -Wframe-larger-than documentation to reference this pass
  • Enable filtering output from this pass by function name
paulkirth updated this revision to Diff 488834.Jan 12 2023, 6:02 PM

Fix documentation string, since it was invalid formatting for the rst file.

Also since code blocks don't render correctly in the html, write the documentation so that its usable without the code examples

paulkirth updated this revision to Diff 488842.Jan 12 2023, 6:28 PM
paulkirth marked 5 inline comments as done.
paulkirth retitled this revision from [codegen] Add a remarks based Stack Layout Analysis pass to [codegen] Add StackFrameLayoutAnalysisPass.
paulkirth edited the summary of this revision. (Show Details)

Update summary

I would rather have a more generic mechanism for remarks or diagnostics in general. Even if it uses isFunctionInPrintList, I'd rather have a real flag that doesn't require -mllvm.

Agreed. I'm going to opt into the isFunctionInPrintList for now. If you feel strongly, I can remove it, but it seems like a useful compromise.

Does name mangling complicate that? Perhaps a C++ user would give an unmangled name, but MF would be looking at mangled names?

It will absolutely expect the mangled name. We may want to look at the other filtering facilities we have available too. Like we have those sanitizer files, and some other matching that can use regex. XRay had some logic for that stuff too, in addition to the things already in sanitizer common, IIRC.

clang/test/Frontend/stack-layout-remark.c
9

The last sentence you suggested is exactly what I had in mind.

Maybe we should follow this patch up w/ a change to the frame larger than diagnostic?

llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp
221–223

BTW, didn't clang format used to fix these? I definitely remember it doing that. I don't have any special config I use, so it's just picking up project defaults. AFAIK it's still an option, but doesn't seem to be set for the LLVM config anymore. Did we change the behavior here for a reason?

I get caught by these a lot when a loop gets simplified.

llvm/test/CodeGen/AArch64/arm64-opt-remarks-lazy-bfi.ll
43–46

It's 100% non-obvious. It took me a bit to figure out that was how these worked when I added these.

paulkirth marked an inline comment as done.Jan 12 2023, 6:35 PM
paulkirth updated this revision to Diff 489036.Jan 13 2023, 9:13 AM

Fix test for Windows. I had missed one of the file paths in YAML.

@nickdesaulniers @thegameg Are we happy w/ the ReleaseNotes and the documentation changes? If so, I'll land this, but I wasn't sure either of you saw those changes ... or if using isFunctionInPrintList for now is a good choice until we can implement filtering for remarks.

nickdesaulniers accepted this revision.Jan 13 2023, 12:29 PM
thegameg accepted this revision.Jan 13 2023, 12:52 PM
This revision was landed with ongoing or failed builds.Jan 13 2023, 12:52 PM
This revision was automatically updated to reflect the committed changes.
paulkirth reopened this revision.Jan 13 2023, 3:01 PM
This revision is now accepted and ready to land.Jan 13 2023, 3:01 PM

This failed on some environments. See https://lab.llvm.org/buildbot/#/builders/109/builds/55534

clang/test/Frontend/stack-layout-remark.c
25

Line: might be fed to the next line. I suggest;

// YAML: DebugLoc:        { File: '{{.*}}stack-layout-remark.c',{{[:space:]}}Line: [[# @LINE + 24]],
225

ditto.

chapuni added inline comments.Jan 13 2023, 3:09 PM
clang/test/Frontend/stack-layout-remark.c
25

typo. {{[:space:]}}*

paulkirth updated this revision to Diff 489147.Jan 13 2023, 3:56 PM

Use more flexible whitespace matching

paulkirth marked 3 inline comments as done.Jan 13 2023, 3:58 PM

@chapuni Thanks for the suggestion. It didn't occur to me that it could break the lines like that.

Thanks. I have noticed I suggested re-typo. (I didn't amend it since I thought it was obvious)

This revision was landed with ongoing or failed builds.Jan 18 2023, 5:51 PM
This revision was automatically updated to reflect the committed changes.