[X86] Fix more -Os + EH issues
ClosedPublic

Authored by mkuper on Oct 15 2015, 2:03 AM.

Details

Summary

This does two things:

  1. Disables push generation on Darwin whenever EH is used and FP is not present.
  2. Generates .cfa_adjust_cfa_offset directives to make the CFA offset correct at each call site.

Generating more precise offsets adjustments for debug info will be a separate patch.

Diff Detail

Repository
rL LLVM
mkuper retitled this revision from to [X86] Fix more -Os + EH issues.Oct 15 2015, 2:03 AM
mkuper updated this object.
mkuper added reviewers: friss, DavidKreitzer, rnk, joerg, kbsmith1.
mkuper added a subscriber: llvm-commits.

Please see inline comments.

lib/Target/X86/X86FrameLowering.cpp
2108 ↗(On Diff #37460)

I think this code needs to go after line 2128 in order to create the CFI adjust after the stack adjustment instruction has happened rather than before.

test/CodeGen/X86/push-cfi-obj.ll
27 ↗(On Diff #37460)

I don't understand what the specific differences in these values in Section Data coma about from. Do you know?

test/CodeGen/X86/push-cfi.ll
14 ↗(On Diff #37460)

I don't think this is really the correct spot for the cfi_adjust. I think it should occur after all the pushes have happenned, so right before the call instruction. Once you get to having a different cfi_adjust per push, each of these should follow immediately after the instruction, not before. Now I understand that you are not yet getting to the point of emitting a cfi_adjust per instruction, but it seems like even if you are accumulating them so that they are only correct at the call, then they should be after all the pushes.

Thanks, Kevin!

lib/Target/X86/X86FrameLowering.cpp
2108 ↗(On Diff #37460)

Since to handle EH we only care about it being correct at the call site, I don't think it matters whether it's before or after the adjustment.

I could do it at line 2128, but that would be slightly less cumbersome, since then I'd need to keep track of both the original Amount and Amount - InternalAmt.

test/CodeGen/X86/push-cfi-obj.ll
27 ↗(On Diff #37460)

More or less. :-)

I mean, the part that *didn't* change encodes the GNU_ARGS_SIZE, and this is what this test is supposed to check (it was introduced by a previous commit that made some changes to the code that outputs the dwarf binary encoding).
The newly added part encodes the def_cfa_offset. I didn't actually verify it's correct because I didn't change the encoding code for that. I'll parse it manually and see it really is correct.

test/CodeGen/X86/push-cfi.ll
14 ↗(On Diff #37460)

I agree with you that it's not the correct spot, in theory - but in terms of being correct at the call site, it doesn't matter if it's before or after the pushes.

The problem is that the way things work right now, eliminateCallFramePseudoInstr() doesn't actually get a pointer to the call, it gets the ADJDOWN/ADJUP pseudos. In theory, there doesn't even *have* to be a call between them.
So, the easiest way is to just put the CFA adjust immediately after those pseudos. I could have the eliminate() code actively look for a call, but I'm not sure that's the best idea. What I would prefer for the debug-info version would be to put the CFA adjust for the sub where the current adjust is (well, after the sub - in line 2128), and the adjust for each push after the push. So that case wouldn't have to search for a call either.

rnk added inline comments.Oct 16 2015, 10:45 AM
lib/CodeGen/AsmPrinter/AsmPrinterDwarf.cpp
231–233 ↗(On Diff #37460)

Hah, I guess we have MC support for .cfi_adjust_offset, but never used if from the compiler. :)

lib/Target/X86/X86CallFrameOptimization.cpp
140 ↗(On Diff #37460)

Isn't STI.getFrameLowering() just TFL?

lib/Target/X86/X86FrameLowering.cpp
2097 ↗(On Diff #37460)

Can we just assert up front that we don't need to produce UNWIND_INFO .xdata for Windows? This call frame optimization stuff only fires for 32-bit x86, which doesn't use UNWIND_INFO.

2110–2111 ↗(On Diff #37460)

I think we should address this now. We know exactly where we're inserting the pushes in X86CallFrameOptimization::adjustCallSequence, so all we have to do is:

  • Insert .cfi_adjust_cfa_offset 4 after each push
  • Insert .cfi_adjust_cfa_offset InternalAmt for frame destruction here
  • Insert .cfi_adjust_cfa_offset Amount after BuildStackAdjustment below

Thanks, Reid!

lib/CodeGen/AsmPrinter/AsmPrinterDwarf.cpp
231–233 ↗(On Diff #37460)

Yup.
It's used by the assembler, though. :-)

lib/Target/X86/X86CallFrameOptimization.cpp
140 ↗(On Diff #37460)

Right, thanks!

lib/Target/X86/X86FrameLowering.cpp
2097 ↗(On Diff #37460)

Sure. Should I just assert on usesWindowsCFI()?

2110–2111 ↗(On Diff #37460)

The question is whether we want to always have the precise adjusts or have two separate code paths for the precise version and the call-site-only version. This depends on whether we care about the eh_frame size for -Os, or not.

If we want it to always be precise, then what you're suggesting is the right way to go. If we don't, then there should be two separate code paths, the one here, and the one you suggest, and I wanted them to be two separate patches.

mkuper added inline comments.Oct 19 2015, 1:50 AM
lib/Target/X86/X86FrameLowering.cpp
2097 ↗(On Diff #37460)

On second thought, no, we can't. It's possible not to have a reserved call frame even without pushes (currently happens e.g. if there are variable-length arrays on the stack). In this case we'll also have an FP, so outgoing arguments will use ebp-based movs.

mkuper updated this revision to Diff 37742.Oct 19 2015, 5:36 AM

The patch now has both the precise and the precise-at-call-site code-paths.

mkuper added inline comments.Oct 19 2015, 5:49 AM
lib/Target/X86/X86CallFrameOptimization.cpp
503 ↗(On Diff #37742)

Argh, this should have been a cast<>, not a static_cast<>.
Also, I can avoid this completely and copy the code from the BuildCFI() helper, but I think it's better this way.

friss added inline comments.Oct 19 2015, 12:52 PM
test/CodeGen/X86/push-cfi-obj.ll
27 ↗(On Diff #37742)

I'm sorry I didn't look at the patch in more details, but does the above mean that you emit GNU_args_size and def_fa_offset in the same function? If that's the case, it seems wrong: if the offset is reflected by def_cfa_offsets, you don't need the GNU_args_size anymore.

rnk added inline comments.Oct 19 2015, 3:56 PM
lib/Target/X86/X86CallFrameOptimization.cpp
503 ↗(On Diff #37742)

You could also change TFL to hold an X86FrameLowering. If you ask X86Subtarget for the framelowering, it actually uses covariant return types to return you an X86FrameLowering instead of a TargetFrameLowering.

test/CodeGen/X86/push-cfi-debug.ll
17–21 ↗(On Diff #37742)

Can you add a test for a callee-cleanup convention like stdcall? I think with the code as written we'll get something like this on Linux, where the stack is 16-byte aligned:

subl $4, %esp
.cfi_adjust_cfa_offset 4
pushl $3
.cfi_adjust_cfa_offset 4
pushl $2
.cfi_adjust_cfa_offset 4
pushl $1
.cfi_adjust_cfa_offset 4
calll _bar
addl $4, %esp
.cfi_adjust_cfa_offset -16

Which isn't *strictly* speaking correct, it should be:

calll bar
.cfi_adjust_cfa_offset -12
addl $4, %esp
.cfi_adjust_cfa_offset -4

We don't need to fix the issue in this change, I just like having test cases with FIXMEs.

mkuper updated this revision to Diff 38000.Oct 21 2015, 6:19 AM

Thanks Reid, Frederic.

This addresses the comments from the last revision.
In particular, gnu_args_size is now only generated for fp-based CFA. Frederic, does that sounds right to you?

rnk added a comment.Oct 21 2015, 11:23 AM

In particular, gnu_args_size is now only generated for fp-based CFA. Frederic, does that sounds right to you?

Hm, I think there might be an issue here. The cfa_adjust directives don't instruct the unwinder to adjust ESP before transferring to the landingpad, right? They just help indicate where the return address lives in memory.

The test case for this situation is something like:

int main() {
  // force stack realignment, FP usage, and use of SP to address local variables
  int __attribute__((align(32))) x = 42;
  try {
    throw 1;
  } catch (int) {
    return x;
  }
}

If ESP is wrong in the landingpad, we'll return the wrong value of x.

I would still have preferred that in the "only correct for call-site" case that the cfi_adjust occurred after the pushes rather than before. But I don't feel strongly about that, and so, am good with the changes now.

So, LGTM from me.

friss added a comment.Oct 21 2015, 8:52 PM
In D13767#272344, @rnk wrote:

In particular, gnu_args_size is now only generated for fp-based CFA. Frederic, does that sounds right to you?

Hm, I think there might be an issue here. The cfa_adjust directives don't instruct the unwinder to adjust ESP before transferring to the landingpad, right? They just help indicate where the return address lives in memory.

I think you are right, sorry for the confusion. What's the exact semantic of GNU_args_size? Is it only to be used when branching to a landing pad and ignored when just traversing the function?

In D13767#272344, @rnk wrote:

In particular, gnu_args_size is now only generated for fp-based CFA. Frederic, does that sounds right to you?

Hm, I think there might be an issue here. The cfa_adjust directives don't instruct the unwinder to adjust ESP before transferring to the landingpad, right? They just help indicate where the return address lives in memory.

I think you are right, sorry for the confusion. What's the exact semantic of GNU_args_size? Is it only to be used when branching to a landing pad and ignored when just traversing the function?

Yes, exactly. GNU_args_size is the delta between the stack pointer value immediately before the invoke that triggered the exception and the stack pointer value on entry to the landing pad. It is ignored for unwinding purposes.

The value that the unwinder substitutes for %esp/%rsp in CFA expressions & callee-save register location expressions is the value of %esp/%rsp immediately prior to the call/invoke.

lib/Target/X86/X86FrameLowering.cpp
2105 ↗(On Diff #38000)

Why are you suppressing the GNU_ARGS_SIZE directives for !hasFP? I think they are still needed. If you agree, some corresponding changes will be needed in the tests.

2120 ↗(On Diff #38000)

It would be nice to encapsulate the hasDebugInfo checks into a separate method, e.g. usePreciseUnwindInfo. That serves a useful documentation purpose and paves the way for supporting asynchronous EH, which also requires precise unwind info.

Thanks, everyone. Looks like I got confused about the semantics, will upload a new patch.
Sorry again for the amount of noise this is causing.

lib/Target/X86/X86FrameLowering.cpp
2105 ↗(On Diff #38000)

Yes, this is a mistake, I misinterpreted Frederic's comment. Will make the appropriate changes, thanks.

2120 ↗(On Diff #38000)

Right, will do.

mkuper updated this revision to Diff 38343.Oct 25 2015, 2:44 AM

Do *not* suppress gnu_args_size regardless of FP.
Also added the encapsulation Dave suggested.

rnk accepted this revision.Nov 2 2015, 1:40 PM

lgtm

This revision is now accepted and ready to land.Nov 2 2015, 1:40 PM
This revision was automatically updated to reflect the committed changes.