Page MenuHomePhabricator

[COFF, ARM64] Do not emit x86_seh_recoverfp intrinsic
AbandonedPublic

Authored by mgrang on Oct 22 2018, 5:40 PM.

Diff Detail

Event Timeline

mgrang created this revision.Oct 22 2018, 5:40 PM
dmajor added a subscriber: dmajor.Oct 22 2018, 7:11 PM
mgrang updated this revision to Diff 171994.Oct 31 2018, 12:44 PM
mgrang retitled this revision from [COFF, ARM64] Add aarch64_seh_recoverfp intrinsic to [COFF, ARM64] Support SEH for ARM64 Windows.
mgrang edited the summary of this revision. (Show Details)

Updated the patch with the following changes:

  1. Emit llvm.x86.seh.recoverfp only for non-aarch64 targets. For aarch64 windows, the parent fp is always passed in x1. So we don't need a separate instrinsic to recover the fp.
  1. Enable SEH exceptions by default for ARM64 windows. If exceptions are not enabled then function which may raise an exception are by default marked as "nounwind".
  1. Removed ARM64 specific tests from exceptions-seh.c. I will push more comprehensive tests for ARM64 SEH soon.
efriedma added inline comments.Oct 31 2018, 1:09 PM
lib/Driver/ToolChains/Clang.cpp
471 ↗(On Diff #171994)

I'm suspicious about adding a new default for AArch64 specifically; why do we need this if x86-64 doesn't?

rnk added a comment.Oct 31 2018, 2:27 PM

Updated the patch with the following changes:

  1. Emit llvm.x86.seh.recoverfp only for non-aarch64 targets. For aarch64 windows, the parent fp is always passed in x1. So we don't need a separate instrinsic to recover the fp.

What about the three stack pointer case of stack realignment plus a dynamic alloca? Typically this is the case where recoverfp is necessary.

lib/Driver/ToolChains/Clang.cpp
471 ↗(On Diff #171994)

Yeah, -cc1 -fexceptions really means "should I emit destructor cleanups", and SEH doesn't require that at all.

mgrang updated this revision to Diff 181106.Jan 10 2019, 10:55 AM
mgrang retitled this revision from [COFF, ARM64] Support SEH for ARM64 Windows to [COFF, ARM64] Do not emit x86_seh_recoverfp intrinsic.
In D53541#1282900, @rnk wrote:

Updated the patch with the following changes:

  1. Emit llvm.x86.seh.recoverfp only for non-aarch64 targets. For aarch64 windows, the parent fp is always passed in x1. So we don't need a separate instrinsic to recover the fp.

What about the three stack pointer case of stack realignment plus a dynamic alloca? Typically this is the case where recoverfp is necessary.

@rnk Sorry, I missed your comment earlier. I am not sure what scenario the three stack pointer case is. Could you please give me a test case for it?

mgrang updated this revision to Diff 181333.Jan 11 2019, 11:19 AM
rnk added a comment.Jan 14 2019, 2:18 PM

What about the three stack pointer case of stack realignment plus a dynamic alloca? Typically this is the case where recoverfp is necessary.

@rnk Sorry, I missed your comment earlier. I am not sure what scenario the three stack pointer case is. Could you please give me a test case for it?

I think you just need a local variable with an alignment greater than the ABI provides, something like:

struct Foo {
  Foo();
  int x, y, z;
};
int filter(int);
void may_throw();
void seh_byval(int n) {
  Foo __declspec(align(32)) o;
  __try {
    may_throw();
  } __except(filter(o.x)) {
  }
}
In D53541#1356904, @rnk wrote:

What about the three stack pointer case of stack realignment plus a dynamic alloca? Typically this is the case where recoverfp is necessary.

@rnk Sorry, I missed your comment earlier. I am not sure what scenario the three stack pointer case is. Could you please give me a test case for it?

I think you just need a local variable with an alignment greater than the ABI provides, something like:

struct Foo {
  Foo();
  int x, y, z;
};
int filter(int);
void may_throw();
void seh_byval(int n) {
  Foo __declspec(align(32)) o;
  __try {
    may_throw();
  } __except(filter(o.x)) {
  }
}

We seem to be generating the same code as MSVC. Here's my test case:

struct Foo {
  void (*ptr)();
  int x, y, z;
};

int filter(int);
void may_throw();
void seh_byval(int n) {
  struct Foo __declspec(align(32)) o;
  __try {
    may_throw();
  } __except(filter(o.x)) {
  }
}

MSVC:

|seh_byval$filt$0| PROC
    stp         fp,lr,[sp,#-0x10]!
    mov         fp,x1
|$LN5@seh_byval$|
    ldr         x8,[fp,#0x10]
    ldr         w0,[x8,#8]
    bl          filter
    nop
|$LN7@seh_byval$|
    ldp         fp,lr,[sp],#0x10
    ret

LLVM:

.seh_proc "?filt$0@0@seh_byval@@"
; %bb.0:                                ; %entry
        movz    x8, #:abs_g1_s:.Lseh_byval$frame_escape_0
        movk    x8, #:abs_g0_nc:.Lseh_byval$frame_escape_0
        add     x8, x1, x8
        ldr     w0, [x8, #8]
        b       filter
rnk added a comment.Jan 15 2019, 11:01 AM

I can't compile the example you gave yet because I haven't applied your patches locally, but this is the "three stack pointer" case that I have in mind:

struct Foo {
  void (*ptr)();
  int x, y, z;
};

void escape(void *);
void large_align(int a0, int a1, int a2, int a3, int a4, int a5, int a6, int a7,
                 int stackarg) {
  struct Foo __declspec(align(32)) alignedlocal;
  alignedlocal.x = 42;
  int vla[a0];
  escape(&alignedlocal);
  vla[0] = stackarg;
  escape(&vla[0]);
}

This is LLVM's generated code:

"?large_align@@YAXHHHHHHHHH@Z":         ; @"?large_align@@YAXHHHHHHHHH@Z"
.seh_proc "?large_align@@YAXHHHHHHHHH@Z"
; %bb.0:                                ; %entry
        stp     x21, x22, [sp, #-48]!   ; 16-byte Folded Spill
        stp     x19, x20, [sp, #16]     ; 16-byte Folded Spill
        stp     x29, x30, [sp, #32]     ; 16-byte Folded Spill
        add     x29, sp, #32            ; =32
        sub     x9, sp, #48             ; =48
        and     sp, x9, #0xffffffffffffffe0
        mov     x19, sp
        mov     w8, #42
        str     w8, [x19, #8]
        mov     w8, w0
        ldr     w22, [x29, #16]
        lsl     x8, x8, #2
        add     x8, x8, #15             ; =15
        lsr     x15, x8, #4
        mov     x21, sp
        bl      __chkstk
        mov     x8, sp
        sub     x20, x8, x15, lsl #4
        mov     sp, x20
        add     x0, x19, #0             ; =0
        bl      "?escape@@YAXPEAX@Z"
        mov     x0, x20
        str     w22, [x20]
        bl      "?escape@@YAXPEAX@Z"
        mov     sp, x21
        sub     sp, x29, #32            ; =32
        ldp     x29, x30, [sp, #32]     ; 16-byte Folded Reload
        ldp     x19, x20, [sp, #16]     ; 16-byte Folded Reload
        ldp     x21, x22, [sp], #48     ; 16-byte Folded Reload
        ret

I see three pointers used to address the stack:

  • sp: to address vla
  • x19: to address locals, the so-called "base" pointer
  • x29: to address parameters on the stack, looks like the traditional FP, points to FP+LR pair as well

At least for x86, the unwind info doesn't describe X19, it just describes X29, since that's what you need to restore CSRs and find the parent stack frame. We saw that the Windows EH runtime passes in some value based on the unwind info. For x86, it was just whatever EBP held, so recoverfp simply aligns that value forward to recover the base pointer (ESI) and then uses that to recover locals. For x64, the runtime passes in the value of RSP after the prologue ends, so we adjust it by the "parent frame offset" to get back the value we put in RBP. It looks like for x64 we never handled the case I'm asking you about, because this program doesn't print the right value:

#include <stdio.h>
struct Foo {
  void (*ptr)();
  int x, y, z;
};
int filter(int n) {
  printf("o.x: %d\n", n);
  return 1;
}
void may_throw() {
  __builtin_trap();
}
int main() {
  struct Foo __declspec(align(32)) o;
  o.x = 42;
  __try {
    may_throw();
  } __except(filter(o.x)) {
  }
}

I get "o.x: 0" instead of 42. I bet we can find something about that in bugzilla somewhere. =/

So, hopefully that explains the intended purpose of llvm.x86.seh.recoverfp, and why we might need to generalize it into something non-x86 specific. Maybe llvm.eh.recoverfp. Let me know if I can clarify anything else.

In D53541#1358210, @rnk wrote:

I can't compile the example you gave yet because I haven't applied your patches locally, but this is the "three stack pointer" case that I have in mind:

struct Foo {
  void (*ptr)();
  int x, y, z;
};

void escape(void *);
void large_align(int a0, int a1, int a2, int a3, int a4, int a5, int a6, int a7,
                 int stackarg) {
  struct Foo __declspec(align(32)) alignedlocal;
  alignedlocal.x = 42;
  int vla[a0];
  escape(&alignedlocal);
  vla[0] = stackarg;
  escape(&vla[0]);
}

This is LLVM's generated code:

"?large_align@@YAXHHHHHHHHH@Z":         ; @"?large_align@@YAXHHHHHHHHH@Z"
.seh_proc "?large_align@@YAXHHHHHHHHH@Z"
; %bb.0:                                ; %entry
        stp     x21, x22, [sp, #-48]!   ; 16-byte Folded Spill
        stp     x19, x20, [sp, #16]     ; 16-byte Folded Spill
        stp     x29, x30, [sp, #32]     ; 16-byte Folded Spill
        add     x29, sp, #32            ; =32
        sub     x9, sp, #48             ; =48
        and     sp, x9, #0xffffffffffffffe0
        mov     x19, sp
        mov     w8, #42
        str     w8, [x19, #8]
        mov     w8, w0
        ldr     w22, [x29, #16]
        lsl     x8, x8, #2
        add     x8, x8, #15             ; =15
        lsr     x15, x8, #4
        mov     x21, sp
        bl      __chkstk
        mov     x8, sp
        sub     x20, x8, x15, lsl #4
        mov     sp, x20
        add     x0, x19, #0             ; =0
        bl      "?escape@@YAXPEAX@Z"
        mov     x0, x20
        str     w22, [x20]
        bl      "?escape@@YAXPEAX@Z"
        mov     sp, x21
        sub     sp, x29, #32            ; =32
        ldp     x29, x30, [sp, #32]     ; 16-byte Folded Reload
        ldp     x19, x20, [sp, #16]     ; 16-byte Folded Reload
        ldp     x21, x22, [sp], #48     ; 16-byte Folded Reload
        ret

I see three pointers used to address the stack:

  • sp: to address vla
  • x19: to address locals, the so-called "base" pointer
  • x29: to address parameters on the stack, looks like the traditional FP, points to FP+LR pair as well

At least for x86, the unwind info doesn't describe X19, it just describes X29, since that's what you need to restore CSRs and find the parent stack frame. We saw that the Windows EH runtime passes in some value based on the unwind info. For x86, it was just whatever EBP held, so recoverfp simply aligns that value forward to recover the base pointer (ESI) and then uses that to recover locals. For x64, the runtime passes in the value of RSP after the prologue ends, so we adjust it by the "parent frame offset" to get back the value we put in RBP. It looks like for x64 we never handled the case I'm asking you about, because this program doesn't print the right value:

#include <stdio.h>
struct Foo {
  void (*ptr)();
  int x, y, z;
};
int filter(int n) {
  printf("o.x: %d\n", n);
  return 1;
}
void may_throw() {
  __builtin_trap();
}
int main() {
  struct Foo __declspec(align(32)) o;
  o.x = 42;
  __try {
    may_throw();
  } __except(filter(o.x)) {
  }
}

I get "o.x: 0" instead of 42. I bet we can find something about that in bugzilla somewhere. =/

So, hopefully that explains the intended purpose of llvm.x86.seh.recoverfp, and why we might need to generalize it into something non-x86 specific. Maybe llvm.eh.recoverfp. Let me know if I can clarify anything else.

@rnk Thanks a lot for the clarification. Yes, I see o.x: 0 instead of 42. Supporting this case would mean implementing recoverfp as well as support generating the correct parent frame offset for arm64 windows. Do you think this can be done in a follow-up patch? So this patch and D53540 would add the basic support for SEH and we can go fix corner/more complex cases in follow-up patches. There are also several more comprehensive test cases in https://github.com/Microsoft/windows_seh_tests which we plan to address next.

rnk added a comment.Jan 15 2019, 11:46 AM

@rnk Thanks a lot for the clarification. Yes, I see o.x: 0 instead of 42. Supporting this case would mean implementing recoverfp as well as support generating the correct parent frame offset for arm64 windows. Do you think this can be done in a follow-up patch? So this patch and D53540 would add the basic support for SEH and we can go fix corner/more complex cases in follow-up patches. There are also several more comprehensive test cases in https://github.com/Microsoft/windows_seh_tests which we plan to address next.

Sure, there's no need to handle the case of highly aligned stack objects in the initial version. But I would recommend making incremental progress in a way that brings us closer to the desired final end state.

I'd recommend renaming llvm.x86.seh.recoverfp to something non-architecture specific, and then implementing it as a no-op in the AArch64 backend. It will simply return the FP it is given, with FIXME comments about handling highly-aligned locals in the future. That'll save us this target check in clang, which we don't need in the long run.

I have renamed llvm.x86.seh.recoverfp intrinsic to llvm.eh.recoverfp in D56747 and D56748.

mgrang abandoned this revision.Jan 16 2019, 11:46 AM

I have added a default lowering for llvm.eh.recoverfp in D53540. So this patch is no longer needed.