Page MenuHomePhabricator

[codegen] Add StackFrameLayoutAnalysisPass
ClosedPublic

Authored by paulkirth on Oct 7 2022, 2:26 PM.

Details

Summary

Issue #58168 describes the difficulty diagnosing stack size issues
identified by -Wframe-larger-than. For simple code, its easy to
understand the stack layout and where space is being allocated, but in
more complex programs, where code may be heavily inlined, unrolled, and
have duplicated code paths, it is no longer easy to manually inspect the
source program and understand where stack space can be attributed.

This patch implements a machine function pass that emits remarks with a
textual representation of stack slots, and also outputs any available
debug information to map source variables to those slots.

The new behavior can be used by adding -Rpass-analysis=stack-frame-layout
to the compiler invocation. Like other remarks the diagnostic
information can be saved to a file in a machine readable format by
adding -fsave-optimzation-record.

Fixes: #58168

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
paulkirth updated this revision to Diff 488427.Wed, Jan 11, 4:56 PM

Update clang test for windows file separators.

Add target triple to all RUN lines.

Seems like the layout is the same on 64-bit windows, but for some reason
clang.exe chooses i386 unless the triple is set. So just set the triple
uniformly, and avoid any potential problems.

Also add a REQUIRES line to the test, since these need an x86_64 target

Make YAML tests less brittle.

It would be really nice if we could limit this to a specific function somehow.

Quick feedback from giving Diff 488727 a spin on the Linux kernel:

via ARCH=arm make LLVM=1 -j128 -s allyesconfig all I found:

  CC      drivers/net/ethernet/mellanox/mlx5/core/en_main.o
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3597:12: error: stack frame size (1256) exceeds limit (1024) in 'mlx5e_setup_tc' [-Werror,-Wframe-larger-than]
static int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type type,
           ^

When I rebuild with:

ARCH=arm make LLVM=1 -j128 drivers/net/ethernet/mellanox/mlx5/core/en_main.o KCFLAGS=-Rpass-analysis=stack-frame-layout

I get a printout of the stack usage of every function in this TU. That's 1929 lines of text output...I only care about mlx5e_setup_tc. Is there a way to limit that?

Filtering through the logs though, I do see:

drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3599:1: remark: 
Function: mlx5e_setup_tc
...
Offset: [SP-400], Type: Variable, Align: 8, Size: 352
Offset: [SP-752], Type: Variable, Align: 8, Size: 352
Offset: [SP-1088], Type: Variable, Align: 8, Size: 336

which is good! If I flip on debug info and rebuild, this looks like:

drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3599:1: remark: 
Function: mlx5e_setup_tc
...
Offset: [SP-400], Type: Variable, Align: 8, Size: 352
    old_params @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2934
    old_chs @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2958
    new_params @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3539
Offset: [SP-752], Type: Variable, Align: 8, Size: 352
    new_chs @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3000
Offset: [SP-1088], Type: Variable, Align: 8, Size: 336
    new_params @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3424

Which is pretty awesome!
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en_main.c#n3424 for the last one, which shows we should probably be heap allocating instances of struct mlx5e_params rather than stack allocating them!

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en_main.c#n3000 shows the same for struct mlx5e_channels. Cool stuff! Now we can finally debug -Wframe-larger-than=!!!

clang/test/Frontend/stack-layout-remark.c
9

Please update:

  1. the patch description/commit message
  2. clang/docs/ReleaseNotes.rst

    to mention this new flag. I kind of wish that -Wstack-frame-larger-than= alluded to this somehow.
llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp
66

@paulkirth make sure to mark these code review comments as done when implemented, please!

192

do we need EndIdx, or can we simply use Idx != ObjEnd as the loop terminating condition?

Doesn't look like we use ObjBeg afterwards either; is that necessary? Seems like the call to MFI.getObjectIndexBegin(); could happen in the initialization list of this for loop?

201–203

Consider implementing operator< on SlotData; then I think you can simply call std::sort on SlotInfo and drop this lamda.

207–209
217

If you sink this def into the for loop, then you don't need to clear it.

221–223
228–229

I guess this can be removed since the for loop below wont do anything if there's 0 memoperands?

llvm/test/CodeGen/AArch64/O0-pipeline.ll
76–77

Dang, this adds a bunch of passes to O0 pipelines...any creative ideas on how to not do that?

llvm/test/CodeGen/AArch64/arm64-opt-remarks-lazy-bfi.ll
43–46

what's going on in this test? Looks like the pass is being run twice or something?

llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp
45

Consider replacing uses of PassName with DEBUG_TYPE since they have the same value.

99–100

consider sinking this closer to use. If you only call one method on it, and it could fit in one line, consider not even creating a variable. i.e.
getAnalysis<MachineOptimizationRemarkEmitterPass>().getORE().emit(ReM)

107

can you sink the call to genSlotDbgMapping() into this arg list? SlotMap seems unreferenced otherwise.

clang/test/Frontend/stack-layout-remark.c
9

Perhaps in the documentation for -Wframe-larger-than=? i.e. adding a code Documentation = [{}] block to BackendFrameLargerThan record in clang/include/clang/Basic/DiagnosticGroups.td or something.

paulkirth updated this revision to Diff 488784.Thu, Jan 12, 3:00 PM
paulkirth marked 21 inline comments as done.

Address comments

llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp
45

ugh, I forgot to update that after adding DEBUG_TYPE. Thanks for pointing that out.

107

surprisingly this resulted in a compiler error:

StackFrameLayoutAnalysisPass.cpp:104:37: error: non-const lvalue reference to type 'SmallDenseMap<...>' cannot bind to a temporary of type 'SmallDenseMap<...>'
    emitStackFrameLayoutRemarks(MF, genSlotDbgMapping(MF), Rem);

So I've just moved it into emitStackFrameLayoutRemarks() and dropped the parameter, which avoids the issue entirely. Since SlotMap is only used there now, it makes more sense to structure the code like this anyway.

217

for some reason I was under the impression that our style guidelines prefered using clear in situations like this, but I can't find it so I think my brain tricked me. Thanks for pointing that out.

It would be really nice if we could limit this to a specific function somehow.

I think you can do that, right ?
see:
https://llvm.org/docs/Remarks.html#cmdoption-pass-remarks-filter
https://llvm.org/docs/Remarks.html#cmdoption-pass-remarks-filter

Quick feedback from giving Diff 488727 a spin on the Linux kernel:

via ARCH=arm make LLVM=1 -j128 -s allyesconfig all I found:

  CC      drivers/net/ethernet/mellanox/mlx5/core/en_main.o
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3597:12: error: stack frame size (1256) exceeds limit (1024) in 'mlx5e_setup_tc' [-Werror,-Wframe-larger-than]
static int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type type,
           ^

When I rebuild with:

ARCH=arm make LLVM=1 -j128 drivers/net/ethernet/mellanox/mlx5/core/en_main.o KCFLAGS=-Rpass-analysis=stack-frame-layout

I get a printout of the stack usage of every function in this TU. That's 1929 lines of text output...I only care about mlx5e_setup_tc. Is there a way to limit that?

see: https://llvm.org/docs/Remarks.html#cmdoption-pass-remarks-filter

If that doesn't sort you out, we can probably do something about this. In the worst case we'd need to make this a bigger pass, and also emit diagnostics for stack size, etc instead of in prologue/epilogue inserter. That was mentioned earlier, but I think that should be a separate change.

Filtering through the logs though, I do see:

drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3599:1: remark: 
Function: mlx5e_setup_tc
...
Offset: [SP-400], Type: Variable, Align: 8, Size: 352
Offset: [SP-752], Type: Variable, Align: 8, Size: 352
Offset: [SP-1088], Type: Variable, Align: 8, Size: 336

which is good! If I flip on debug info and rebuild, this looks like:

drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3599:1: remark: 
Function: mlx5e_setup_tc
...
Offset: [SP-400], Type: Variable, Align: 8, Size: 352
    old_params @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2934
    old_chs @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2958
    new_params @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3539
Offset: [SP-752], Type: Variable, Align: 8, Size: 352
    new_chs @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3000
Offset: [SP-1088], Type: Variable, Align: 8, Size: 336
    new_params @ drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3424

Which is pretty awesome!
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en_main.c#n3424 for the last one, which shows we should probably be heap allocating instances of struct mlx5e_params rather than stack allocating them!

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/en_main.c#n3000 shows the same for struct mlx5e_channels. Cool stuff! Now we can finally debug -Wframe-larger-than=!!!

Glad this helps! If you think of more issues, let me know.

paulkirth added inline comments.Thu, Jan 12, 3:14 PM
clang/test/Frontend/stack-layout-remark.c
9

Will do on the ReleaseNotes, but I'm a bit unsure what you want in the summary. It mentions the motivation and describe what this pass is for/does using remarks. Is there something else it should say?

9

Perhaps in the documentation for -Wframe-larger-than=? i.e. adding a code Documentation = [{}] block to BackendFrameLargerThan record in clang/include/clang/Basic/DiagnosticGroups.td or something.

Hmm, I'll take a look there, but I'm not 100% sure I follow what you mean.

are you after somethng like this when frame-larger-than diagnostics happen:

you can debug this by adding -Rpass-analysis=stack-frame-layout -mllvm -pass-remarks-filter=<functionname>

or something like that?

It would be really nice if we could limit this to a specific function somehow.

I think you can do that, right ?
see:
https://llvm.org/docs/Remarks.html#cmdoption-pass-remarks-filter

This filters on the Name of the remark, so here it would be stack-frame-layout.

I think a -Rpass-func-filter=<regex> would be a great addition.

paulkirth added inline comments.Thu, Jan 12, 3:37 PM
llvm/test/CodeGen/AArch64/O0-pipeline.ll
76–77

This seems to be the norm w/ other Analysis passes, or anything that uses remarks, really.

I'm not really sure what we can do about that other than to move this into an existing pass that already uses remarks. I didn't see any good candidates at the point in the pipeline that we'd like to run this though.

llvm/test/CodeGen/AArch64/arm64-opt-remarks-lazy-bfi.ll
43–46

not sure I follow. The block here is executing the pass then freeing the pass. I tried to follow the pattern used around this, but we could change it to

HOTNESS:      Executing Pass 'Stack Frame Layout Analysis'
HOTNESS:      Freeing Pass 'Stack Frame Layout Analysis'

and skip the rest

It would be really nice if we could limit this to a specific function somehow.

I think you can do that, right ?
see:
https://llvm.org/docs/Remarks.html#cmdoption-pass-remarks-filter

This filters on the Name of the remark, so here it would be stack-frame-layout.

I think a -Rpass-func-filter=<regex> would be a great addition.

That's right. I misremembered. I think when this was a normal printing pass you could filter w/ -filter-print-funcs=foo but I don't think that works now

Actually if we add

if (!isFunctionInPrintList(MF.getName()))
     return false;

we can filter by name

nickdesaulniers accepted this revision.Thu, Jan 12, 4:21 PM

Actually if we add

if (!isFunctionInPrintList(MF.getName()))
     return false;

we can filter by name

Does name mangling complicate that? Perhaps a C++ user would give an unmangled name, but MF would be looking at mangled names?

Anyways, it's not a pressing issue. I won't block this patch on that. I just redirect all the output to a file then scan that.

Same thing about adding passes to -O0. Someone might care about that, but I don't.

Nice work @paulkirth . I'm excited to use this to help us better understand and reduce our stack usage in the Linux kernel!

llvm/test/CodeGen/AArch64/arm64-opt-remarks-lazy-bfi.ll
43–46

Oops, I missed the first instance is Executing then the second is Freeing. NVM!

Sorry, mind adding the documentation, too?

clang/test/Frontend/stack-layout-remark.c
9

Hmm, I'll take a look there, but I'm not 100% sure I follow what you mean.

are you after somethng like this when frame-larger-than diagnostics happen:

you can debug this by adding -Rpass-analysis=stack-frame-layout -mllvm -pass-remarks-filter=<functionname>

or something like that?

Basically, my concern is "how will other developers not cc'ed on this phab review ever find this nifty new flag?" If -Wframe-larger-than= doesn't print info about it, then we should at least have it in our docs.

The last sentence you suggested is exactly what I had in mind.

Actually if we add

if (!isFunctionInPrintList(MF.getName()))
     return false;

we can filter by name

I would rather have a more generic mechanism for remarks or diagnostics in general. Even if it uses isFunctionInPrintList, I'd rather have a real flag that doesn't require -mllvm.

Does name mangling complicate that? Perhaps a C++ user would give an unmangled name, but MF would be looking at mangled names?

Agreed. A regex would help a little there.

Anyways, it's not a pressing issue. I won't block this patch on that. I just redirect all the output to a file then scan that.

You can easily process the yaml output from -fsave-optimization-record with the optrecord module:

import optrecord
import re

all_remarks, file_remarks, _ = optrecord.gather_results(optrecord.find_opt_files(<INPUT_FILE>), 1, False)
for r in optrecord.itervalues(all_remarks):
  if re.match(<REGEX>, r.Function):
    print(r)

You can easily...

I'll just note that v1 of this patch IIRC was a note on the single individual instance of the -Wframe-larger-than= diagnostic. No additional flags for optimization remarks, no (large) dump of stack frame info for non-interesting functions, no python module writing necessary.

I recognize that this approach is a compromise, and it's not worth going back and forth.

But the point of this is to simplify the developer experience when -Wframe-larger-than= is encountered. Let's not lose sight of that.

@paulkirth please don't forget to update the commit description with that flag so that I don't have to read the contents of the commit to find this flag again. If you're using arc diff to upload the patch, the --verbatim flag will update the phab description, IIRC.

paulkirth updated this revision to Diff 488815.Thu, Jan 12, 5:12 PM

Address comments

  • Take a stab a ReleaseNotes
  • Update -Wframe-larger-than documentation to reference this pass
  • Enable filtering output from this pass by function name
paulkirth updated this revision to Diff 488834.Thu, Jan 12, 6:02 PM

Fix documentation string, since it was invalid formatting for the rst file.

Also since code blocks don't render correctly in the html, write the documentation so that its usable without the code examples

paulkirth updated this revision to Diff 488842.Thu, Jan 12, 6:28 PM
paulkirth marked 5 inline comments as done.
paulkirth retitled this revision from [codegen] Add a remarks based Stack Layout Analysis pass to [codegen] Add StackFrameLayoutAnalysisPass.
paulkirth edited the summary of this revision. (Show Details)

Update summary

I would rather have a more generic mechanism for remarks or diagnostics in general. Even if it uses isFunctionInPrintList, I'd rather have a real flag that doesn't require -mllvm.

Agreed. I'm going to opt into the isFunctionInPrintList for now. If you feel strongly, I can remove it, but it seems like a useful compromise.

Does name mangling complicate that? Perhaps a C++ user would give an unmangled name, but MF would be looking at mangled names?

It will absolutely expect the mangled name. We may want to look at the other filtering facilities we have available too. Like we have those sanitizer files, and some other matching that can use regex. XRay had some logic for that stuff too, in addition to the things already in sanitizer common, IIRC.

clang/test/Frontend/stack-layout-remark.c
9

The last sentence you suggested is exactly what I had in mind.

Maybe we should follow this patch up w/ a change to the frame larger than diagnostic?

llvm/lib/CodeGen/StackFrameLayoutAnalysisPass.cpp
221–223

BTW, didn't clang format used to fix these? I definitely remember it doing that. I don't have any special config I use, so it's just picking up project defaults. AFAIK it's still an option, but doesn't seem to be set for the LLVM config anymore. Did we change the behavior here for a reason?

I get caught by these a lot when a loop gets simplified.

llvm/test/CodeGen/AArch64/arm64-opt-remarks-lazy-bfi.ll
43–46

It's 100% non-obvious. It took me a bit to figure out that was how these worked when I added these.

paulkirth marked an inline comment as done.Thu, Jan 12, 6:35 PM
paulkirth updated this revision to Diff 489036.Fri, Jan 13, 9:13 AM

Fix test for Windows. I had missed one of the file paths in YAML.

@nickdesaulniers @thegameg Are we happy w/ the ReleaseNotes and the documentation changes? If so, I'll land this, but I wasn't sure either of you saw those changes ... or if using isFunctionInPrintList for now is a good choice until we can implement filtering for remarks.

nickdesaulniers accepted this revision.Fri, Jan 13, 12:29 PM
thegameg accepted this revision.Fri, Jan 13, 12:52 PM
This revision was landed with ongoing or failed builds.Fri, Jan 13, 12:52 PM
This revision was automatically updated to reflect the committed changes.
paulkirth reopened this revision.Fri, Jan 13, 3:01 PM
This revision is now accepted and ready to land.Fri, Jan 13, 3:01 PM

This failed on some environments. See https://lab.llvm.org/buildbot/#/builders/109/builds/55534

clang/test/Frontend/stack-layout-remark.c
25

Line: might be fed to the next line. I suggest;

// YAML: DebugLoc:        { File: '{{.*}}stack-layout-remark.c',{{[:space:]}}Line: [[# @LINE + 24]],
225

ditto.

chapuni added inline comments.Fri, Jan 13, 3:09 PM
clang/test/Frontend/stack-layout-remark.c
25

typo. {{[:space:]}}*

paulkirth updated this revision to Diff 489147.Fri, Jan 13, 3:56 PM

Use more flexible whitespace matching

paulkirth marked 3 inline comments as done.Fri, Jan 13, 3:58 PM

@chapuni Thanks for the suggestion. It didn't occur to me that it could break the lines like that.

Thanks. I have noticed I suggested re-typo. (I didn't amend it since I thought it was obvious)

This revision was landed with ongoing or failed builds.Wed, Jan 18, 5:51 PM
This revision was automatically updated to reflect the committed changes.