This is an archive of the discontinued LLVM Phabricator instance.

[WinEH] Allocate space in funclets stack to save XMM CSRs
AbandonedPublic

Authored by pengfei on Jun 17 2019, 12:45 AM.

Details

Summary

This is an alternate approach to D57970.

Currently funclets reuse the same stack slots that are used in the parent function for saving callee-saved xmm registers. If the parent function modifies a callee-saved xmm register before an excpetion is thrown, the catch handler will overwrite the original saved value.

This patch allocates space in funclets stack for saving callee-saved xmm registers and uses RSP instead RBP to access memory.

Event Timeline

pengfei created this revision.Jun 17 2019, 12:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 17 2019, 12:45 AM
pengfei edited the summary of this revision. (Show Details)Jun 17 2019, 12:59 AM
pengfei added reviewers: LuoYuanke, annita.zhang.
rnk added inline comments.Jun 18 2019, 2:54 PM
llvm/lib/Target/X86/X86FrameLowering.cpp
1428

This math relies on too many fragile undocumented invariants:

  • frame indices are consecutive
  • frame indices are fixed (or non-fixed), one is negative, the other not

Is it possible to adjust from the assigned fixed offset in the parent frame to the funclet frame using some other MachineFrameInfo values? This MOVAPS instruction replacement seems like the correct fix, I just want to make it more robust.

1439

This code should execute for funclets as well, I believe, so we get .seh_savexmm directives for funclets.

2259

This is not the right place to do this, I would expect to see it in emitepilog.

pengfei updated this revision to Diff 205751.Jun 20 2019, 1:54 AM

Refine code as Reid's proposal.
Separate the XMM slot caculating so that we can get a consecutive layout in stack.
Also fix an alignment bug due to it affecting this patch.

pengfei marked 3 inline comments as done.Jun 20 2019, 2:20 AM
pengfei added inline comments.
llvm/lib/Target/X86/X86FrameLowering.cpp
1428

Thanks for the advice.

  • I separate the slot allocation of XMM, so it should be consecutive now
  • XMM frame indices are always fixed for now
rnk edited reviewers, added: craig.topper, RKSimon; removed: rnk.Jun 28 2019, 3:00 PM
rnk added a subscriber: rnk.

I haven't found time to review this, so I added @craig.topper and @RKSimon. They work on the x86 backend and should be able to provide guidance on structuring the frame lowering code.

craig.topper added inline comments.Jun 28 2019, 4:19 PM
llvm/lib/Target/X86/X86FrameLowering.cpp
1411

Should we use a VEX instruction here under AVX?

1417

Put else on the line above

2238

Any reason we're using FR64 here instead of VR128?

2240

VEX instruction under AVX?

pengfei updated this revision to Diff 207193.Jun 29 2019, 4:30 AM

Refine code as Craig's proposal

pengfei marked 4 inline comments as done.Jun 29 2019, 4:35 AM

Thanks for the review

llvm/lib/Target/X86/X86FrameLowering.cpp
2238

VR128 should be more accurate. Thanks!

craig.topper added inline comments.Jul 15 2019, 5:36 PM
llvm/lib/Target/X86/X86FrameLowering.cpp
3272

If its always going to be XMM0-15. Which I think it is since you only checked VR128, not VR128X. Then you can ignore VLX and AVX512 I think.

craig.topper added inline comments.Jul 15 2019, 5:39 PM
llvm/test/CodeGen/X86/x86-interrupt_cc.ll
1

Why is a test case that isn't testing Windows affected by this change?

pengfei updated this revision to Diff 210011.Jul 15 2019, 7:12 PM

We don't need to consider the upper registers [X,Y,Z]MM16~[X,Y,Z]MM31. So remove the VLX and AVX512 check.

pengfei marked 2 inline comments as done.Jul 15 2019, 7:19 PM
pengfei added inline comments.
llvm/test/CodeGen/X86/x86-interrupt_cc.ll
1

There's a bug in the former slot reservation function. Because I changed the behavior on reserving XMM, I fixed it in the same patch, too.

majnemer added inline comments.
llvm/lib/Target/X86/X86FrameLowering.cpp
1410

Seems easier to read as alignDown(NumBytes, Align)

2043

Missing a period at the end of this sentence.

2244–2245

These should be on the same line.

pengfei updated this revision to Diff 210033.Jul 16 2019, 12:39 AM

Refine code as David's proposal

pengfei marked 3 inline comments as done.Jul 16 2019, 12:41 AM

Thanks for the review.

This revision is now accepted and ready to land.Jul 25 2019, 8:50 PM
This revision was automatically updated to reflect the committed changes.
rnk added a comment.Aug 20 2019, 3:04 PM

This broke this C++ code:

int getcsr();
void usecsrs(double, int, int, int, int, int, int, int, int);
double getxmm();
struct HasDtor {
  int x = 42;
  ~HasDtor();
};
int foo() {
  int v1 = getcsr();
  int v2 = getcsr();
  int v3 = getcsr();
  int v4 = getcsr();
  int v5 = getcsr();
  int v6 = getcsr();
  int v7 = getcsr();
  int v8 = getcsr();
  double f1 = getxmm();
  HasDtor o;
  usecsrs(f1, v1, v2, v3, v4, v5, v6, v7, v8);
  usecsrs(f1, v1, v2, v3, v4, v5, v6, v7, v8);
  usecsrs(f1, v1, v2, v3, v4, v5, v6, v7, v8);
  usecsrs(f1, v1, v2, v3, v4, v5, v6, v7, v8);
  return v1 + v2 + v3 + v4 + v5 + v6 + v7 + v8;
}

The movaps instruction in the funclet epilogue does not match the prologue:

$ ninja -j900 clang && clang -S t.cpp  -o - -O1 | grep 'movaps.*rsp\|dtor'
ninja: no work to do.
        .def     "?dtor$5@?0??foo@@YAHXZ@4HA";
"?dtor$5@?0??foo@@YAHXZ@4HA":
.seh_proc "?dtor$5@?0??foo@@YAHXZ@4HA"
        movaps  %xmm6, 64(%rsp)
        movaps  72(%rsp), %xmm6
        .long   "?dtor$5@?0??foo@@YAHXZ@4HA"@IMGREL # Action

The code crashes in the epilogue.

I fixed a different issue in rL368631, but I think this time I will revert this.

This comment was removed by pengfei.
pengfei reopened this revision.Aug 20 2019, 5:32 PM
This revision is now accepted and ready to land.Aug 20 2019, 5:32 PM
pengfei updated this revision to Diff 216308.Aug 20 2019, 6:19 PM

Add downward alignment for XMM storing in epilogue. This fix the crush when MaxCallFrameSize is not 16 bytes aligned.

pengfei added a reviewer: rnk.Aug 20 2019, 6:27 PM
pengfei updated this revision to Diff 216374.Aug 21 2019, 4:52 AM

Update test case to verify XMM registers correctly when MaxCallFrameSize is not 16 bytes aligned.

rnk requested changes to this revision.Aug 21 2019, 2:26 PM

My suggestion for drastically simplifying this is to modify X86RegisterInfo::eliminateFrameIndex to check if it's lowering an XMM CSR slot. In that case, it would call a new method, X86FrameLowering::getSaveXMMFrameIndexRef, which would contain the logic to check if the current block is a funclet prologue or epilogue. If the parent BB of the current instruction is in a funclet, then XMM slots would be resolved relative to RSP using the funclet frame size. If not, it would delegate to the standard getFrameIndexReference logic, yielding the correct register and offset for XMM slots in non-funclet prologues and epilogues. The prologue code that currently inserts savexmm markers would need to call this new helper as well instead of the standard getFrameIndexReference method. Does that plan seem reasonable?

llvm/lib/Target/X86/X86FrameLowering.cpp
1186–1187

I think you should include this offset in the return value from getWinEHFuncletFrameSize. With your change, getWinEHFuncletFrameSize no longer returns the true funclet frame size, and it would be a bug to have a call site that forgets to include the XMM size offset. In fact, I found and fixed this exact bug in the catch funclet establishing frame offset code. We should fix it by design before relanding.

2237–2246

For a multi-bb funclet, won't this be false? You should reuse the isFuncletReturnInstr check from above to know if this is a funclet epilogue.

2239

I don't like this repeated math that has to be kept in sync between the prologue and epilogue. If we could factor it out, I would find it easier to read and have more confidence in the correctness.

This revision now requires changes to proceed.Aug 21 2019, 2:26 PM
In D63396#1640095, @rnk wrote:

My suggestion for drastically simplifying this is to modify X86RegisterInfo::eliminateFrameIndex to check if it's lowering an XMM CSR slot. In that case, it would call a new method, X86FrameLowering::getSaveXMMFrameIndexRef, which would contain the logic to check if the current block is a funclet prologue or epilogue. If the parent BB of the current instruction is in a funclet, then XMM slots would be resolved relative to RSP using the funclet frame size. If not, it would delegate to the standard getFrameIndexReference logic, yielding the correct register and offset for XMM slots in non-funclet prologues and epilogues. The prologue code that currently inserts savexmm markers would need to call this new helper as well instead of the standard getFrameIndexReference method. Does that plan seem reasonable?

It's a great idea that simplifies the implementation. I commit a new patch on D66596 that follows your suggestions. Thanks a lot for the review and suggestions.

pengfei abandoned this revision.Aug 26 2019, 9:34 PM