This is an archive of the discontinued LLVM Phabricator instance.

[ARM64] [Windows] Handle funclets
ClosedPublic

Authored by efriedma on Aug 30 2018, 6:10 PM.

Details

Summary

This patch adds support for funclets in frame lowering and ISel lowering. Together with D50288 and D50166, it enables C++ exception handling.

Diff Detail

Repository
rL LLVM

Event Timeline

ssijaric created this revision.Aug 30 2018, 6:10 PM
mgrang added inline comments.Aug 30 2018, 6:22 PM
lib/Target/AArch64/AArch64FrameLowering.cpp
910 ↗(On Diff #163457)

default should be the first case.

lib/Target/AArch64/AArch64InstrInfo.cpp
1606 ↗(On Diff #163457)

Can you move "MBB.getParent()-" to the next line so that it's more readable?

test/CodeGen/AArch64/wineh-funclets.ll
1 ↗(On Diff #163457)

Check 80 char limits here.

ssijaric added inline comments.Aug 30 2018, 10:14 PM
lib/Target/AArch64/AArch64FrameLowering.cpp
910 ↗(On Diff #163457)

Will change it in the updated patch.

lib/Target/AArch64/AArch64InstrInfo.cpp
1606 ↗(On Diff #163457)

Will do in the updated patch.

test/CodeGen/AArch64/wineh-funclets.ll
1 ↗(On Diff #163457)

Is the 80 char limit on the RUN line a rule? If so, many test cases don't follow this. A single RUN is not split into multiple lines from what I've seen.

ssijaric updated this revision to Diff 163556.Aug 31 2018, 10:27 AM

A few formatting fixes.

mgrang added inline comments.Aug 31 2018, 10:43 AM
test/CodeGen/AArch64/wineh-funclets.ll
1 ↗(On Diff #163457)

< many test cases don't follow this.
They should! 80 chars is what the LLVM coding standards recommend:
https://llvm.org/docs/CodingStandards.html#source-code-formatting

Write your code to fit within 80 columns of text. This helps those of us who like to print out code and look at your code in an xterm without resizing it.

Plus, horizontal scrolling is no fun :)

rnk added a comment.Sep 24 2018, 4:29 PM

Neat. :)

lib/Target/AArch64/AArch64FrameLowering.cpp
645 ↗(On Diff #163556)

"containing"

majnemer added inline comments.
lib/Target/AArch64/AArch64FrameLowering.cpp
1638 ↗(On Diff #163556)

UnwidHelp -> UnwindHelp?

1639–1641 ↗(On Diff #163556)

Should this be moved after the terminator check?

1675–1679 ↗(On Diff #163556)

This is used to populate dispFrame in _s_HandlerType IIRC. IIUC, you will have issues with certain catch handlers without this properly populated.

I don't really know how exception handling on windows works or what a funclet is. So here's just a couple style comments and questions

lib/Target/AArch64/AArch64FrameLowering.cpp
909 ↗(On Diff #163556)

Can be const MachineInstr&

934–935 ↗(On Diff #163556)

Out of interest: How is getWinEHFuncletFrameSize() different from MachineFrameIndo::getStackSize()?

1182 ↗(On Diff #163556)

Add a comment explaining why you need a frame pointer for funclet?
Knowing nothing about funclets myself I wonder why the normal rules for determining whether we need FP are not enough here...

1648 ↗(On Diff #163556)

In case the previous loop ends with MBBI == MBB.end() all the following tests will be undefined behavior.

1652 ↗(On Diff #163556)

My understanding is that all of this stuff is just compiler generated extra instructions? In that case I recommend using an empty debug location instead (I know other places of the code get this wrong too...)

1662 ↗(On Diff #163556)

Use doxygen /// comments.

1668–1672 ↗(On Diff #163556)

doesn't this change the behavior for non-funclets too? If yes this needs to go into a separate patch and needs some comments why.

lib/Target/AArch64/AArch64MCInstLower.cpp
256–268 ↗(On Diff #163556)

Why is this in MCInstLower and not AsmPrinter.cpp? Maybe you can even specify it in tablegen (see for example ARMs LDMIA_RET and similar)

test/CodeGen/AArch64/wineh-funclets.ll
60 ↗(On Diff #163556)

unused basic block?

76 ↗(On Diff #163556)

You should be able to drop most of these attributes.

79–83 ↗(On Diff #163556)

I'm pretty sure you can drop these lines from the test

majnemer added inline comments.Sep 24 2018, 8:04 PM
lib/Target/AArch64/AArch64FrameLowering.cpp
1675–1679 ↗(On Diff #163556)

Ah, it could be that dispFrame is not needed for ARM64. In that case, WinException.cpp should not call getWinEHParentFrameOffset and the ParentFrameOffset should not be emitted as it will add padding between the HandlerType elements.

majnemer added inline comments.Sep 24 2018, 11:50 PM
lib/Target/AArch64/AArch64FrameLowering.cpp
1675–1679 ↗(On Diff #163556)

Hmmm... So after a bunch of poking around with ARM64 CL.exe, I can confirm that they are allocating the dispFrame field and always fill it with zero!

dmajor added a subscriber: dmajor.Oct 22 2018, 2:48 PM
ssijaric added inline comments.Oct 30 2018, 3:07 PM
lib/Target/AArch64/AArch64FrameLowering.cpp
645 ↗(On Diff #163556)

Will fix in the updated test case.

909 ↗(On Diff #163556)

Will change in the updated patch.

934–935 ↗(On Diff #163556)

The stack size (getStackSize()) includes all the locals in its size calculation. We don't include these locals when computing the stack size of a funclet, as they are allocated in the parent's stack frame and accessed via the frame pointer from the funclet. Otherwise, we would be double allocating stack space in the funclet that we don't need. We only save the callee saved registers in the funclet, which are really the callee saved registers of the parent function, including the funclet. Perhaps, only the non-volatile registers that the funclet modifies should be saved/restored in the funclet prologue/epilogue, but we don't keep track of these separately.

I will document this.

1182 ↗(On Diff #163556)

Will do in the updated patch.

1638 ↗(On Diff #163556)

Will fix in the updated patch.

1639–1641 ↗(On Diff #163556)

Yes, it's definitely a better place for it.

1652 ↗(On Diff #163556)

Will fix in the updated patch.

1662 ↗(On Diff #163556)

Will fix in the updated patch.

1675–1679 ↗(On Diff #163556)

Thanks for double checking!! Will leave it as is.

lib/Target/AArch64/AArch64MCInstLower.cpp
256–268 ↗(On Diff #163556)

X86 did the expansion in X86MCInstLower.cpp. I would keep this way in the current patch, and look into tablegen later. I'll add a comment.

test/CodeGen/AArch64/wineh-funclets.ll
60 ↗(On Diff #163556)

The block is generated by clang, and SImplifyCFG removes it. I'll come up with a better test case.

76 ↗(On Diff #163556)

Will update the test case.

79–83 ↗(On Diff #163556)

Will update the test case.

ssijaric added inline comments.Oct 30 2018, 5:30 PM
lib/Target/AArch64/AArch64FrameLowering.cpp
1668–1672 ↗(On Diff #163556)

This is used to calculate displacement of objects from the stack pointer, such as the displacement of the unwind object when filling up the exception handling table on Windows. This is the only place where it is used for AArch64. All the frame indices belong to the parent function (e.g. never a funclet). I put it in this patch, because the unwind object above is created only if the function has funclets.

It may be hard to write a test case that checks the .s file, as it will crash with an assert in the WinException sanity check (since the displacement will be that from the frame pointer). I would like to keep in this patch, if at all possible. If there is a really strong objection, I can move it to a separate patch.

ssijaric updated this revision to Diff 172089.Nov 1 2018, 12:01 AM
ssijaric edited the summary of this revision. (Show Details)

Address comments: spelling mistakes, better comments, a better test case.

ssijaric added inline comments.Nov 1 2018, 12:44 PM
lib/Target/AArch64/AArch64ISelLowering.cpp
1278 ↗(On Diff #172089)

Move this into the assert; it will cause the release only build to fail.

ssijaric updated this revision to Diff 172224.Nov 1 2018, 1:56 PM

Remove an unused variable. Fix a nasty bug that Eli just pointed out, where I didn't insert a PrologEnd in funclets in this patch. Previously, I relied on an RAII object to this for me when we would go out of scope. I removed that logic, but forgot to insert a prolog end marker for the funclet case.

With this patch, the following should now pass:

#include <stdio.h>
int
main() {
  int B[100];
  int x = 4;
  B[2] = 2;
  B[4] = 0;
  try {
    throw B[2];
  } catch (int i) {
    printf("Caught %d\n", i);
    B[4] = B[i] + i;
    try {
      throw 'c';
    } catch (char c) {
      printf("Caught char %c\n", c);
    }
  }
  return x + B[4];
}

with the correct print outs and a return value of 8.

ssijaric updated this revision to Diff 172305.Nov 1 2018, 10:06 PM

Fix the check for IsFunclet in emitEpilogue.

I'm not really an expert on funclets, but the code looks fine to me.

The test coverage here is a little thin... in particular, there isn't any coverage of the emitted unwind opcodes. I'll amend the testcase to check the llvm-readobj output. (Sanjin isn't available for the next few weeks, so I'm going to handle any further revisions to the patch.)

Reid, do you have an opinion on what test coverage is necessary here?

rnk added a comment.Nov 2 2018, 2:15 PM

Reid, do you have an opinion on what test coverage is necessary here?

It's probably important to handle each and every kind of prologue. I don't know if there are any special corners cases for arm64 prologues, but for x64, this ended up being:

  • normal prologue
  • normal with -fno-omit-frame-pointer
  • stack realignment caused by a highly aligned alloca
  • normal with dynamic alloca or VLA
  • dynamic alloca with highly aligned static alloca

SEH and C++ EH have major differences, so test both. Alternatively, you can focus on one in one patch and defer work on the other.

SEH __except blocks in particular had to do some extra work to restore SP and register state when recovering from an exception, especially when the frame has dynamic SP updates.

I still need to go back and review, but hopefully that helps.

rnk added inline comments.Nov 2 2018, 2:23 PM
lib/Target/AArch64/AArch64FrameLowering.cpp
829–831 ↗(On Diff #172305)

We had bugs here with XMM registers, which I don't think we fixed: https://llvm.org/pr32507

How does the AArch64 prologue/epilogue save and restore registers? If you load/store to a fixed frame index stack object, it's not going to work well. If it does something analogous to x86 "PUSH" instructions, it'll work fine.

This patch forces all functions that contain funclets to emit a frame pointer. (The existing testcase covers that.)

I'll do some testing with realignment and dynamic allocas. I don't think AArch64 prologue lowering has any other special cases.

lib/Target/AArch64/AArch64FrameLowering.cpp
829–831 ↗(On Diff #172305)

Spills of callee-saved registers should always use sp-relative loads/stores, as far as I know.

efriedma commandeered this revision.Nov 2 2018, 6:09 PM
efriedma added a reviewer: ssijaric.

Oh, I guess there is one special case for the prologue I didn't mention: varargs functions. I think the patch handles that correctly already, but I'll add a test.

For both dynamic allocas and stack realignment, the initial implementation of unwinding missed a significant rule: "r29 must point to the bottom of stack". That's probably straightforward to resolve, but I don't really want to do it in this patch (and I'm not sure what the practical effect is). Beyond that, there's probably more handling necessary to correctly handle a dynamic alloca inside a funclet (since we need to reset the frame pointer, but we still need access to the old frame pointer). And I don't really want to block this patch on solving that; it's obscure enough that it won't affect most code.

I have a couple small fixes for stack realignment, a fix for a MachineVerifier failure, and a testcase for realignment; I'll upload a new version.

efriedma updated this revision to Diff 172475.Nov 2 2018, 6:11 PM

Misc fixes.

The following testcase is also broken, for reasons I don't understand. I don't think it's a problem with any of the code in this patch, though; the throw out of the "catch" block somehow picks the wrong catch block, but this patch doesn't touch that code.

void a() { try { try { throw 1; } catch (...) { throw 1; } } catch (...) {}}
efriedma updated this revision to Diff 172668.Nov 5 2018, 3:13 PM

Add a bit more test coverage, to cover cases with and without base pointers. Tweak base pointer emission so we emit and use a base pointer where appropriate (including some cases where it isn't strictly necessary, but allows more efficient addressing modes).

I think this covers all the interesting cases for C++ exceptions. (There will be a separate followup patch for SEH exceptions.)

rnk added a comment.Nov 5 2018, 3:53 PM

The following testcase is also broken, for reasons I don't understand. I don't think it's a problem with any of the code in this patch, though; the throw out of the "catch" block somehow picks the wrong catch block, but this patch doesn't touch that code.

void a() { try { try { throw 1; } catch (...) { throw 1; } } catch (...) {}}

It works for x64 and x86, so I guess it's arm-specific. The state table emission code is portable, so maybe the bug is in the arm ip2state mapping logic?

It works for x64 and x86, so I guess it's arm-specific. The state table emission code is portable, so maybe the bug is in the arm ip2state mapping logic?

Looked a bit more; it looks like the problem is actually that the "throw" is the last instruction in the function, so unwinding gets confused. There's some code in X86TargetMachine::X86TargetMachine which turns on TrapUnreachable, which avoids the problem. It looks like MSVC does something similar (it inserts a nop instead of a trap, but same idea). I'll post a patch separately.

ssijaric added inline comments.Nov 6 2018, 3:23 AM
lib/Target/AArch64/AArch64FrameLowering.cpp
829–831 ↗(On Diff #172305)

They should always be saved off the funclet's SP pointer. This is what cl does, if I remember correctly.

Oh, I guess there is one special case for the prologue I didn't mention: varargs functions. I think the patch handles that correctly already, but I'll add a test.

For both dynamic allocas and stack realignment, the initial implementation of unwinding missed a significant rule: "r29 must point to the bottom of stack". That's probably straightforward to resolve, but I don't really want to do it in this patch (and I'm not sure what the practical effect is). Beyond that, there's probably more handling necessary to correctly handle a dynamic alloca inside a funclet (since we need to reset the frame pointer, but we still need access to the old frame pointer). And I don't really want to block this patch on solving that; it's obscure enough that it won't affect most code.

I have a couple small fixes for stack realignment, a fix for a MachineVerifier failure, and a testcase for realignment; I'll upload a new version.

Hi Eli,

Yin and I looked at the rule that says fp must be at the bottom if alloca is present. We tried a few (simple) test cases to see if this would be a problem. We didn't encounter any issues with our very limited testing, so decided not to implement it at this time. From my recollection, the fp (or the fp/lr pair) has to be separated from the code that saves the callee-saved registers, and handled on its own so that it ends up pointing to the bottom.

Thank you for taking over this!!! I may have some time time between flights to look at this a bit more, depending on the internet connection.

I am afraid I don't know much about exceptions on Windows in general. A few comments below.

lib/Target/AArch64/AArch64FrameLowering.cpp
1497 ↗(On Diff #172668)

Small typo: in the in -> in the

2059 ↗(On Diff #172668)

Why is this needed?

2114 ↗(On Diff #172668)

typo paren't -> parent's

lib/Target/AArch64/AArch64RegisterInfo.cpp
260 ↗(On Diff #172668)

Isn't this already caught by the next nested ifs?

test/CodeGen/AArch64/wineh-try-catch-realign.ll
10 ↗(On Diff #172668)

Is there a chance that we can test the PEI/FrameLowering part with more specific MIR tests? Especially the checks for what the base register should be. The fact that we don't serialize AArch64FunctionInfo may make this impossible for now.

rnk added a comment.Nov 6 2018, 10:12 AM

It works for x64 and x86, so I guess it's arm-specific. The state table emission code is portable, so maybe the bug is in the arm ip2state mapping logic?

Looked a bit more; it looks like the problem is actually that the "throw" is the last instruction in the function, so unwinding gets confused. There's some code in X86TargetMachine::X86TargetMachine which turns on TrapUnreachable, which avoids the problem. It looks like MSVC does something similar (it inserts a nop instead of a trap, but same idea). I'll post a patch separately.

We came up with a different solution for this called SEH_Epilogue. It basically inserts a nop if the last real instruction was a call. It's slightly complicated because it looks backwards across empty basic blocks and things. You can see it here:
https://github.com/llvm-mirror/llvm/blob/master/lib/Target/X86/X86MCInstLower.cpp#L1800

I've been meaning to see if we can remove the code that enables TrapUnreachable on Windows.

efriedma added inline comments.Nov 7 2018, 2:50 PM
lib/Target/AArch64/AArch64FrameLowering.cpp
2059 ↗(On Diff #172668)

It's not, as far as I can tell; I'll get rid of it.

lib/Target/AArch64/AArch64RegisterInfo.cpp
260 ↗(On Diff #172668)

Yes, I refactored it and didn't realize it. Will fix.

test/CodeGen/AArch64/wineh-try-catch-realign.ll
10 ↗(On Diff #172668)

Yes, I could write an MIR test using -run-pass=prologepilog, but I don't see the point. The CHECK lines would be essentially identical, just checking for MIR opcodes instead of assembly instructions. The MIR doesn't directly record whether we have a base register, or what it is; it's only used as part of frame lowering.

efriedma updated this revision to Diff 173041.Nov 7 2018, 2:57 PM

Address review comments (no functional change from the previous version).

Reid, are you okay with merging this? I'd prefer to get this merged sooner rather than later. (If you don't have to review carefully right now, I can address any minor issues post-commit.)

rnk accepted this revision.Nov 9 2018, 12:58 PM

lgtm

Sorry, I lost track of this.

lib/Target/AArch64/AArch64InstrInfo.cpp
1681 ↗(On Diff #173041)

Yep, makes sense, put the address of the target MBB in X0. Do you have any assurance that X0 isn't used by normal epilogue code? At least on x86, sometimes RAX gets used for large SP updates.

test/CodeGen/AArch64/wineh-try-catch.ll
13–15 ↗(On Diff #173041)

Tests with comments about the correctness of numbers that we filecheck for below are always worrying, but I have no better suggestions. :)

This revision is now accepted and ready to land.Nov 9 2018, 12:58 PM
efriedma added inline comments.Nov 9 2018, 2:25 PM
lib/Target/AArch64/AArch64InstrInfo.cpp
1681 ↗(On Diff #173041)

The current AArch64 emitFrameOffset never does that. And even if it did in some cases, a funclet's stack can't be more than a few hundred bytes anyway.

This revision was automatically updated to reflect the committed changes.