This is an archive of the discontinued LLVM Phabricator instance.

[ARM SEH 6] [ARM] Add SEH opcodes in frame lowering
ClosedPublic

Authored by mstorsjo on May 15 2022, 2:38 PM.

Details

Summary

Skip inserting regular CFI instructions if using WinCFI.

This is based a fair amount on the corresponding ARM64 implementation,
but instead of trying to insert the SEH opcodes one by one where
we generate other prolog/epilog instructions, we try to walk over the
whole prolog/epilog range and insert them. This is done because in
many cases, the exact number of instructions inserted is abstracted
away deeper.

For some cases, we manually insert specific SEH opcodes directly where
instructions are generated, where the automatic mapping of instructions
to SEH opcodes doesn't hold up (e.g. for __chkstk stack probes).

Diff Detail

Event Timeline

mstorsjo created this revision.May 15 2022, 2:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 15 2022, 2:38 PM
mstorsjo requested review of this revision.May 15 2022, 2:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 15 2022, 2:38 PM

FYI regarding this patchset; this set of patches allows generating SEH unwind info for plain unwind tables, and it works fine for itanium exception handling in mingw mode.

For the MSVC mode SEH __try, and MSVC mode C++ exceptions, more code generation changes are needed. Hopefully those changes are kinda straightforward (I presume it should be possible to just borrow bits from the AArch64 target), but I haven't implemented them - hopefully someone else can pick that up if the rest of these changes end up mergeable at some point.

efriedma added inline comments.May 16 2022, 12:40 PM
llvm/lib/Target/ARM/ARMFrameLowering.cpp
1126

We shouldn't be encoding stack realignment into the unwind data. It's basically a dynamic allocation: we have to emit a frame pointer before we realign the stack, and we should cut off the unwind prologue immediately after the frame pointer is set up.

mstorsjo added inline comments.May 16 2022, 1:17 PM
llvm/lib/Target/ARM/ARMFrameLowering.cpp
1126

Right, that'd be even cleaner. (This part gets tested only in the next patch which tweaks the frame pointers, and all the later opcodes are nops in that case. So cutting off the prologue at that point sounds like a good strategy.)

mstorsjo updated this revision to Diff 431271.May 22 2022, 2:42 PM

Updated to cut the prologue short, omitting parts that realign the stack. Updated to use separate .seh_nop directives for epilogues.

Since the beginning, I've had to disable PostRAScheduler, because it would reshuffle instructions independently of their associated SEH_* machineinstructions. I tried to compare it to the AArch64 implementation, but there, PostRAScheduler doesn't seem to be executed at all, so that didn't give any extra info about how to avoid it. So for this case, I tweaked ARMSubtarget::enablePostRAScheduler to disable the pass when producing SEH.

Compared to the AArch64 case, the ends of epilogues are slightly more problematic here. On AArch64, the epilogue end is before the ret instruction. On ARM, e.g. an bx lr (ARM::tBX_RET) is now followed by a ARM::SEH_Nop and ARM::SEH_EpilogEnd. As the ARM::tBX_RET is a terminator, the SEH_Nop and SEH_EpilogEnd that follows it also must be terminators, otherwise the machine instruction verifier bails out. Due to this, I've added a separate ARM::SEH_Nop_Ret which is marked a terminator.

After updating to use separate nop instructions, the end of an epilogue tBX_RET, SEH_EpilogEnd (nop=1) was changed into tBX_RET, SEH_Nop_Ret, SEH_EpilogEnd. This causes the machine block placement pass to do tail merging of multiple such epilogues in one function (where it previously didn't), which loses the SEH_Nop_ Ret and SEH_EpilogEnd for all but one of those epilogues.

What's the correct way of making sure that tail merging doesn't try to touch the SEH_* instructions? This doesn't seem to be happening on AArch64.

I looked into making a bundle of the rBX_RET, SEH_Nop_Ret, SEH_EpilogEnd (with an finalizeBundle over those three instructions), but that later breaks assembly output with a failed assert "Cannot print this instruction.". I presume that would require something to unbundle them later?

When looking into the tail merging pass, I noticed that if MachineInstr::isCFIInstruction() would return true for the SEH_* instructions, it could be handled differently and maybe the pass wouldn't break them. But it doesn't seem trivial to test out making that return true for the SEH_* instructions, so I don't know if that would fix any of these issues or not.

When looking into the tail merging pass, I noticed that if MachineInstr::isCFIInstruction() would return true for the SEH_* instructions, it could be handled differently and maybe the pass wouldn't break them. But it doesn't seem trivial to test out making that return true for the SEH_* instructions, so I don't know if that would fix any of these issues or not.

I managed to make a PoC of that, where the core of the changes were this:

diff --git a/llvm/include/llvm/CodeGen/MachineInstr.h b/llvm/include/llvm/CodeGen/MachineInstr.h
index cb6698c12d8e..15bdfdb5d7eb 100644
--- a/llvm/include/llvm/CodeGen/MachineInstr.h
+++ b/llvm/include/llvm/CodeGen/MachineInstr.h 
@@ -112,6 +112,7 @@ public:
     NoMerge      = 1 << 15,             // Passes that drop source location info
                                         // (e.g. branch folding) should skip
                                         // this instruction.
+    CFILike      = 1 << 16,
   };
 
 private:
@@ -1207,7 +1208,7 @@ public:
   }
 
   // True if the instruction represents a position in the function.
-  bool isPosition() const { return isLabel() || isCFIInstruction(); }
+  bool isPosition() const { return isLabel() || isCFIInstruction() || getFlag(C
FILike); }
 
   bool isNonListDebugValue() const {
     return getOpcode() == TargetOpcode::DBG_VALUE;

diff --git a/llvm/lib/CodeGen/BranchFolding.cpp b/llvm/lib/CodeGen/BranchFolding
.cpp
index 76f6a00b718e..de81ca874800 100644
--- a/llvm/lib/CodeGen/BranchFolding.cpp
+++ b/llvm/lib/CodeGen/BranchFolding.cpp
@@ -294,7 +294,7 @@ static unsigned HashEndOfMBB(const MachineBasicBlock &MBB) {
 
 /// Whether MI should be counted as an instruction when calculating common tail
.
 static bool countsAsInstruction(const MachineInstr &MI) {
-  return !(MI.isDebugInstr() || MI.isCFIInstruction());
+  return !(MI.isDebugInstr() || MI.isCFIInstruction() || MI.getFlag(MachineInst
r::CFILike));
 }
 
 /// Iterate backwards from the given iterator \p I, towards the beginning of th
e

This fixes the issue I was seeing (by setting that flag on all the SEH_* instructions). However, I'm not sure if that actually avoids the real issue (of splitting off the SEH_* instructions from the regular instructions) or if it just makes the tail merging no longer seem worthwhile doing. It doesn't fix the need to disable the PostRAScheduler in any case.

(Also, adding this new MachineInstr flag is problematic, as it requires widening MachineInstr::Flags from uint16_t to uint32_t.)

For scheduling, the AArch64 backend overrides "isSchedulingBoundary()" for SEH instructions.

I don't think we ever ran into issues with tail merge... maybe setting MachineInstr:::NoMerge would work?

For scheduling, the AArch64 backend overrides "isSchedulingBoundary()" for SEH instructions.

Oh, thanks, that does fix the issue with PostRAScheduling - thanks!

I don't think we ever ran into issues with tail merge... maybe setting MachineInstr:::NoMerge would work?

Yup, that does seem to do the trick too. Thanks!

mstorsjo updated this revision to Diff 431472.May 23 2022, 1:43 PM

No longer need to disable PostRAScheduler, and fixed tail merging by setting the MachineInst::NoMerge flag.

mstorsjo updated this revision to Diff 432237.May 26 2022, 3:38 AM

Plain rebase, no functional change.

efriedma added inline comments.May 31 2022, 5:45 PM
llvm/lib/Target/ARM/ARMFrameLowering.cpp
300

report_fatal_error (here and other places you abort()).

304

Maybe add t2MOVi16/t2MOVTi16 here?

mstorsjo updated this revision to Diff 433335.Jun 1 2022, 2:56 AM

Moved all SEH_Nop insertion for __chkstk into insertSEH (with only one SEH instruction being manually added there, for the SEH_StackAlloc after the __chkstk call).

Switched to report_fatal_error instead of a manual printout and std::abort().

efriedma added inline comments.Jun 1 2022, 11:43 AM
llvm/lib/Target/ARM/ARMFrameLowering.cpp
315

Looking at this again, this is actually sort of scary. In particular, this is dependent on looking into the future: trying to predict what Thumb2SizeReduction will do with a given instruction. Which is at best fragile, at worst broken if Thumb2SizeReduction doesn't run, or decides to do something different.

I guess you can sort of predict what will happen for t2MOVi16 and t2LDMIA_RET/t2LDMIA_UPD/t2STMDB_UPD. But it's less clear in other cases; we currently don't optimize t2SUBspImm, but we could. Or for TCRETURNdi, we don't actually decide the size until we hit the assembler.

I'm thinking we might want to disable Thumb2SizeReduction on instructions with SEH opcodes. (Or equivalently, on FrameSetup instructions if SEH unwind is enabled.)

mstorsjo added inline comments.Jun 1 2022, 12:48 PM
llvm/lib/Target/ARM/ARMFrameLowering.cpp
315

Thanks - this was indeed one of my fears initially. In practice, these guesses for what it will end up like have worked for all the code I've tested this on so far. But it's indeed brittle.

Skipping Thumb2SizeReduction for FrameSetup/FrameDestroy when SEH unwind is enabled seems to work fine though, so that alleviates most of the issue. (As a future TODO, one could maybe consider rewriting the MI to a narrow form already at this point, for the few opcodes where it matters?)

For TCRETURNdi, I also feared that it would be an issue, but it hasn't cropped up. (Or maybe the nondeterminate length of the instruction makes it unable to calculate the length of the epilogue at that point? And thus just skips the check...)

But it seems like the pseudo expansion of TCRETURNdi already has got such a case; MachO also requires strictly Thumb2 wide branches for tail calls, so we can opt in to that logic for SEH too.

mstorsjo updated this revision to Diff 433506.Jun 1 2022, 1:04 PM

Skip Thumb2SizeReduction for SEH prologs/epilogs, and force tail calls to wide instructions (just like on MachO), to make sure that the unwind info actually matches the width of the final instructions without heuristics about what later passes will do.

efriedma accepted this revision.Jun 1 2022, 1:25 PM

LGTM

llvm/lib/Target/ARM/ARMFrameLowering.cpp
315

I think you might need to implement narrowing for "push" and "pop", but probably not anything else. But in any case, it doesn't need to be in this patch.

This revision is now accepted and ready to land.Jun 1 2022, 1:25 PM
This revision was landed with ongoing or failed builds.Jun 2 2022, 2:29 AM
This revision was automatically updated to reflect the committed changes.