This is an archive of the discontinued LLVM Phabricator instance.

Do not spill CSR to stack on entry to noreturn functions
ClosedPublic

Authored by myeisha on Feb 4 2018, 5:39 PM.

Details

Summary

A noreturn nounwind function can be expected to never return in any way, and by never returning it will also never have to restore any callee-saved registers for its caller. This makes it possible to skip spills of those registers during function entry, saving some stack space and time in the process. This is rather useful for embedded targets with limited stack space.

Initially reported as PR#9970.

Diff Detail

Event Timeline

myeisha created this revision.Feb 4 2018, 5:39 PM

Thanks for working on this!

lib/CodeGen/TargetFrameLoweringImpl.cpp
96

You probably have to check for nounwind as well (check out @MatzeB's comment here: https://reviews.llvm.org/D42509#988553).

myeisha updated this revision to Diff 132806.Feb 5 2018, 3:53 AM
myeisha edited the summary of this revision. (Show Details)

Restricated CSR save skip to noreturn nounwind functions from previously noreturn. Exception handlers might need CSRs restored.

myeisha marked an inline comment as done.Feb 5 2018, 8:10 AM

One last comment and it should be good. Thanks!

Also, next time don't forget to add reviewers to your patch (I don't know if it was intended or not).

test/CodeGen/ARM/noreturn-csr-skip.ll
18 ↗(On Diff #132806)

You could write MIR tests instead of using inline asm. This would allow you to test one specific pass at a time, here the PrologueEpilogueInserter.

Here's how it would look like for the function ret:

# RUN: llc -mtriple thumbv7m-none-eabi -run-pass prologepilog %s -o - | FileCheck %s

--- |
  define void @ret() nounwind { ret void }
...
---
# This function may return. Check that $r4 is saved and restored.
# CHECK-LABEL: name: ret
# CHECK: killed $r4
# CHECK: def $r4
name: ret
body: |
  bb.0:
    $r4 = IMPLICIT_DEF
    tBX_RET 14, $noreg
---

For more info on MIR: http://llvm.org/docs/MIRLangRef.html.

One last comment and it should be good. Thanks!

Nice :)

Also, next time don't forget to add reviewers to your patch (I don't know if it was intended or not).

That was actually intentional. I had not found the code owner for that part of codegen and the blame list seemed unhelpful (looked like mostly NFC or code motion).

It looks like this transform destroys stack traces on non-x86 platforms. That technically isn't wrong, I guess, but it seems unfriendly (for example, LLVM produces a stack trace on an assertion failure).

It looks like this transform destroys stack traces on non-x86 platforms. That technically isn't wrong, I guess, but it seems unfriendly (for example, LLVM produces a stack trace on an assertion failure).

I don't quite follow. If I force the frame pointer on (eg by storing into a frame pointer returned by llvm.frameaddress) I see that the old frame pointer is stacked even for noreturn nounwind functions. I'd hate to break stack traces, am I missing something?

myeisha updated this revision to Diff 132899.Feb 5 2018, 3:50 PM

It looks like this transform destroys stack traces on non-x86 platforms. That technically isn't wrong, I guess, but it seems unfriendly (for example, LLVM produces a stack trace on an assertion failure).

I might be wrong but I think at least the ARM and AArch64 backend would still save FP+LR. After calling TargetFrameLowering::determineCalleeSaves, FP and LR are usually added to the list of registers to be saved by the target in the target-specific determineCalleeSaves. From what I can see after a quick glance all the backends (except ARC and Sparc? not sure) special-case FP and LR (or whatever equivalent).

I do agree that if this breaks stack traces it should be under some flag or something.

I do agree that if this breaks stack traces it should be under some flag or something.

We could also tie it into frame pointer elimination and skip CSR saves only when fp elimination is active, maybe add a per-target option to skip anyway if the target has special handling for fp/lr.

This isn't correct in general, only in cases when the non-returning functions terminate the program.

Can you elaborate on that? I'm primarily interested in programs that do not terminate at all, with nested nonreturning functions. How would not saving CSRs in the outer nonreturning function prolog affect the inner nonreturning function? The only affected thing that comes to my mind right now is a debugger.

I was thinking of the case when the non-returning function simply does not return to the caller, but terminates its own execution by going elsewhere (something like continuation-passing). If the function has an infinite loop or something like that, and is not intended to ever leave its own context, then not saving CSR shouldn't make a difference.

If I understand the IR correctly then this would either involve some nasty inline assembly or some form of tail call. If the target were not itself nonreturning, then the caller could also not be, so the transform would not apply.

I guess this is just way over my head.

It could be a call to some library function (from a user-provided library). I agree that this is a very unlikely scenario, but we've had a customer report a bug related to a similar situation.

If you want to proceed with this, could you make it opt-in for targets that want it?

I would rather make it generic, not break debug, and habe the user decide whether or not to apply this. If libraries are the concern it probably can't be the target's decision either. Would enabling the transform only with fp-elim and a cl option be just as good?

You can make it generic, but turned off by default, or dependent on a target hook.

myeisha updated this revision to Diff 133065.Feb 6 2018, 1:09 PM

Added a target hook to enable CSR spill skipping, default off and enabled currently only for ARM because that's what I know best.

It could be a call to some library function (from a user-provided library). I agree that this is a very unlikely scenario, but we've had a customer report a bug related to a similar situation.

Just out of curiosity, can you please share some more information about that?

From what I understand,

A (nounwind) calls B (nounwind, noreturn)
B (nounwind, noreturn) calls library function C (nounwind?)
C (nounwind?) returns to A (nounwind) ?

In that case, B not saving the CSRs will destroy A's CSRs, but I can't really understand how we can end up in this case.

My assumptions are:

  • can't (tail-)call throwing functions from nounwind functions
  • can't (tail-)call returning functions from noreturn functions

but I might be wrong on that.

junbuml added a subscriber: junbuml.Feb 7 2018, 6:26 AM

By library function I simply meant a function that is provided by the user, which the compiler doesn't see the definition of. The problem I remember seeing was that he didn't restore a callee-saved register, but I don't remember exactly where this was in relation to a noreturn function. The code that we were compiling was written in C, not C++. I suspect there was something similar to longjmp (but probably not a longjmp itself) somewhere in the user code that aborted some processing routines and "returned" to a top-level function. It's just a guess though (I couldn't find that bug with a quick search).

My main concern here is that this should not be a default behavior for all targets. As long as we have an option to opt out of this, I don't have objections, especially because a problematic case is not very common. The added callback should take care of it.

thegameg resigned from this revision.Feb 8 2018, 10:18 AM

I might be missing a few details of what this change might affect.

Unfortunately there's still a hole in the safety analysis. Specifically, "nounwind" does not mean what you want it to mean. Consider the following testcase:

#include <setjmp.h>
void foo(void* x, int y) { longjmp(x,y); }

This compiles to a "noreturn nounwind" function, but we can't throw away callee-save register values. Granted, we probably *should* have some attribute which indicates we won't longjmp() out of a function, but that doesn't exist at the moment.

I did think about that, but came to the conclusion that longjmp would have to restore all callee-saves to the state they had at setjmp, and that setjmp would have to store them into the jmp_buf. If longjmp uses CSR spills the transform is indeed unsound, but I'm not sure how it would access those without being an llvm builtin. I've checked with what musl does on ARM (because that's what I'm familiar with) to confirm that idea, so at least on ARM the transform should be safe?

I did think about that, but came to the conclusion that longjmp would have to restore all callee-saves to the state they had at setjmp, and that setjmp would have to store them into the jmp_buf.

Huh. I guess that works? But you probably want to explicitly note that in a comment in the code.

myeisha updated this revision to Diff 133479.Feb 8 2018, 1:26 PM

Added a comment about setjmp/longjmp and CSR spills.

Gentle ping.

lebedev.ri set the repository for this revision to rL LLVM.
lebedev.ri added a subscriber: lebedev.ri.

Hopefully added some more reviewers (as per git shortlog -sne), that might be interested/qualified in this.

Another week, another ping.

Other than an inline comment (of no consequence) this looks like it's probably ok here. The naming of the function isn't great (welcome to fabulous bikesheds in programming), but I don't have a better idea off the top of my head so let's go with that.

Since Eli had already started a review let's give him a bit of time (Or Matthias) after this email to take a last look, otherwise I'll approve next week.

Thanks!

-eric

lib/Target/ARM/ARMFrameLowering.cpp
95

I'd probably like to see an assert here (and in the general case) that we've got a noreturn and nounwind function.

myeisha updated this revision to Diff 139380.Mar 21 2018, 3:12 PM

added assertions for noreturn nounwind to the generic and ARM csr spill skip predicates.

myeisha marked an inline comment as done.Mar 21 2018, 3:13 PM
This revision is now accepted and ready to land.Mar 23 2018, 2:43 PM

awesome, thank you :) i do not have commit access, so i won't be able to merge it myself.

Ping--do I still have to do something to get this committed?

t.p.northover closed this revision.Apr 5 2018, 7:29 AM

Sorry about the delay, I've committed this for you as r329287.

vitalybuka reopened this revision.Apr 6 2018, 10:41 PM
vitalybuka added a subscriber: vitalybuka.

I had to revert it in r329486
It breaks UBSAN test http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-android/builds/9418/steps/run%20lit%20tests%20%5Barm%2Faosp_marlin-userdebug%2FOPR4.170623.016%5D/logs/stdio
Looks like UBSAN crashes unrolling the stack for such functions.

This revision is now accepted and ready to land.Apr 6 2018, 10:41 PM

I managed to track down the issue, and I think the patch needs to respect the "uwtable" attribute too (otherwise the tables produced are useless). So I've recommitted with that change applied; hopefully it'll stick this time.

t.p.northover closed this revision.Apr 7 2018, 4:01 AM