This is an archive of the discontinued LLVM Phabricator instance.

[Windows SEH]: Do not fold branches for MSVC TableSEH function
Needs ReviewPublic

Authored by pengfei on Jul 1 2023, 8:35 PM.

Details

Summary

We have calculated SEH states based on old CFG. Folding them may result in invalid labels when emitting entries for MSVC C specific handler.

Diff Detail

Event Timeline

pengfei created this revision.Jul 1 2023, 8:35 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 1 2023, 8:35 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
pengfei requested review of this revision.Jul 1 2023, 8:35 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 1 2023, 8:35 PM

Disabling the entire BranchFolding pass doesn't seem like an effective way to deal with this. If there are specific branches/basic blocks that can't be folded, we need to mark those specific constructs somehow, and make all relevant transforms check for those constructs. Presumably some of the transforms BranchFolding does are legal, and there are other passes that do transforms on branches that need to know what transforms are legal.

rnk added a comment.Jul 12 2023, 4:40 PM

I took the test case and I can observe the ".LBB0_-1"@IMGREL labels in the output, which look like bugs, but can you point at the MBB references that become stale with branchfolding? I think we mainly hold references to EHPad MBBs, which branch folding should ignore, but let me know if I'm wrong.

Thanks @efriedma and @rnk for the comments. And sorry just get time to look at back.

I haven't fully understand the algorithm in BranchFolding.cpp, but guess the problem might be related to the jumping edge issue @efriedma mentioned on https://reviews.llvm.org/D102817#3487499

can you point at the MBB references that become stale with branchfolding? I think we mainly hold references to EHPad MBBs, which branch folding should ignore, but let me know if I'm wrong.

The EHPad BB (bb.4) is removed by https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/BranchFolding.cpp#L1218

At the beginning, bb.4 has a predecessor bb.3. However, since bb.1, bb.2, bb.3 are just empty BB with fallthrough (and probably due to we don't model their jumping edge bb.7, bb.6, bb.4 respectively), they will be merged to bb.5 finally. Then bb.4 will become an orphan BB and be removed by the code above.

I tried to modify some code in this pass or set BlockAddressTaken for the EHPad BB. Neither solve my problem.

Here are the MIR before the pass. Suggestions are welcome. But I'd prefer to land this patch to solve the problem first if the root cause is hard to fix.

# Machine code for function main: NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten, TracksDebugUserValues
Frame Objects:
  fi#-1: size=8, align=16, fixed, at location [SP-8]

bb.0.entry:
  successors: %bb.1(0x40000000), %bb.7(0x40000000); %bb.1(50.00%), %bb.7(50.00%)

  frame-setup PUSH64r killed $rbp, implicit-def $rsp, implicit $rsp
  frame-setup SEH_PushReg 52
  $rbp = MOV64rr $rsp
  frame-setup SEH_SetFrame 52, 0
  frame-setup SEH_EndPrologue
  JMP_1 %bb.1

bb.1.invoke.cont:
; predecessors: %bb.0
  successors: %bb.2(0x40000000), %bb.7(0x40000000); %bb.2(50.00%), %bb.7(50.00%)

  JMP_1 %bb.2

bb.2.invoke.cont1:
; predecessors: %bb.1
  successors: %bb.3(0x2aaaaaab), %bb.6(0x2aaaaaab), %bb.7(0x2aaaaaab); %bb.3(33.33%), %bb.6(33.33%), %bb.7(33.33%)

  JMP_1 %bb.3

bb.3.invoke.cont2:
; predecessors: %bb.2
  successors: %bb.5(0x00000000), %bb.4(0x2aaaaaab), %bb.6(0x2aaaaaab), %bb.7(0x2aaaaaab); %bb.5(0.00%), %bb.4(33.33%), %bb.6(33.33%), %bb.7(33.33%)

  JMP_1 %bb.5

bb.4.__except (landing-pad):
; predecessors: %bb.3


bb.5.invoke.cont4:
; predecessors: %bb.3

  EH_LABEL <mcsymbol .Ltmp0>
  MOV32mi undef renamable $rax, 1, $noreg, 0, $noreg, 1 :: (volatile store (s32) into `ptr poison`)
  EH_LABEL <mcsymbol .Ltmp1>

bb.6.__except11 (landing-pad):
; predecessors: %bb.2, %bb.3


bb.7.ehcleanup (machine-block-address-taken, landing-pad, ehfunclet-entry):
; predecessors: %bb.0, %bb.1, %bb.2, %bb.3
  liveins: $rdx
  frame-setup MOV64mr killed $rsp, 1, $noreg, 16, $noreg, $rdx
  frame-setup PUSH64r killed $rbp, implicit-def $rsp, implicit $rsp
  frame-setup SEH_PushReg 52
  $rbp = MOV64rr $rdx
  frame-setup SEH_EndPrologue
  SEH_Epilogue
  $rbp = frame-destroy POP64r implicit-def $rsp, implicit $rsp
  CLEANUPRET

# End machine code for function main.

the problem might be related to the jumping edge issue @efriedma mentioned on https://reviews.llvm.org/D102817#3487499

Right, that's makes sense.

I'm concerned that if we ignore the bad CFG now, it's going to come back to bite us again. In particular, for register allocation... although, maybe if we setExposesReturnsTwice(), it would mitigate that particular issue. Or maybe we'll detect "loops" that aren't really loops, and hoist/sink incorrectly. Or a number of other similar things. Basically, I'm concerned we're going to end up spreading classifyEHPersonality() calls all over the backend for various issues.

Can you look into adding EH edges to the relevant blocks?

rnk added a comment.Jul 14 2023, 2:55 PM

Can you look into adding EH edges to the relevant blocks?

I think it's less of a matter of missing edges, and more like a UAF bug. We are storing MBB pointers in a WinEHFuncInfo side table, and BranchFolding replaces the blocks in ways that I hope are valid for landingpads, but invalidate our references. Storing the MBB references on the side is safe if you believe that no backend pass would delete and replace an EHPad block, but it looks like BranchFolding is actually brave enough to do that.

I'm guessing a better fix would be to block this transform in the right cases.

Oh, right, I see what you mean.

BranchFolding has an explicit check for MBB.isMachineBlockAddressTaken(), though; that's supposed to prevent deleting blocks that are mentioned in side-tables.

I'm concerned that if we ignore the bad CFG now, it's going to come back to bite us again.

It doesn't look to me there's a bad CFG if we never do BranchFolding here. You can find the predecessors/successors chain EH BBs correctly in the CFG. The problem looks to me is BranchFolding ignores EH edges by itself.

Can you look into adding EH edges to the relevant blocks?

Yeah, a possible solution is the fallthrough BB to inherit the successors of the BB that merging to it. But I'm worrying it affects non SEH scenarios as well. We have already used EHScopeMembership/SameEHScope to calculate for them. I think the best solution is to take SEH state number into account. But I haven't figure out the mechanism of EHScopeMembership and how to do it for SEH. So in this patch, I just skip this transform for MSVC_TableSEH.

I'm guessing a better fix would be to block this transform in the right cases.

Thanks @rnk, so you are agreeing with this approach, right?

BranchFolding has an explicit check for MBB.isMachineBlockAddressTaken(), though; that's supposed to prevent deleting blocks that are mentioned in side-tables.

I have tried to set AddressTaken for the BB before. The problem is the CFG has already broken by this pass, so preserve the single BB doesn't solve the real problem.

diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index d70de26655b9..8cf7bbfd5e9c 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -1802,7 +1802,9 @@ void SelectionDAGBuilder::visitCatchPad(const CatchPadInst &I) {
   bool IsCoreCLR = Pers == EHPersonality::CoreCLR;
   bool IsSEH = isAsynchronousEHPersonality(Pers);
   MachineBasicBlock *CatchPadMBB = FuncInfo.MBB;
-  if (!IsSEH)
+  if (IsSEH)
+    CatchPadMBB->setMachineBlockAddressTaken();
+  else
     CatchPadMBB->setIsEHScopeEntry();
   // In MSVC C++ and CoreCLR, catchblocks are funclets and need prologues.
   if (IsMSVCCXX || IsCoreCLR)

Can you look into adding EH edges to the relevant blocks?

Yeah, a possible solution is the fallthrough BB to inherit the successors of the BB that merging to it. But I'm worrying it affects non SEH scenarios as well. We have already used EHScopeMembership/SameEHScope to calculate for them. I think the best solution is to take SEH state number into account. But I haven't figure out the mechanism of EHScopeMembership and how to do it for SEH. So in this patch, I just skip this transform for MSVC_TableSEH.

I was thinking you'd add the edges computed by SEH numbering during isel. I agree it's not a good idea to try to inherit the edges.

I'm concerned that if we ignore the bad CFG now, it's going to come back to bite us again.

It doesn't look to me there's a bad CFG if we never do BranchFolding here. You can find the predecessors/successors chain EH BBs correctly in the CFG. The problem looks to me is BranchFolding ignores EH edges by itself.

If you don't do BranchFolding, the CFG is... sort-of right? I mean, disabling BranchFolding and any similar pass (I'm drawing a blank off the top of my head, but I'm pretty sure BranchFolding is not the only pass that does relevant CFG manipulation), the CFG continues to resemble the IR CFG; you can use that to compute the correct EH edges, and stuff like MachineDominatorTree will work correctly within the EH blocks.

Even with that, though, without the complete set of proper edges, vreg liveness is going to be wrong. Probably not relevant in a lot of cases because clang uses "volatile" heavily, and I think the SEH landing pad clobbers every physical register. But I suspect it's not completely irrelevant.

Also, this patch isn't implementing "disable BranchFolding" in a good way. Checking "getPersonalityFn" explicitly instead of marking up the MIR means that we potentially need to copy-paste the getPersonalityFn check into any pass that does CFG modifications. And people making changes to those passes will, inevitably, not be aware they need to copy-paste it.

I'm concerned that if we ignore the bad CFG now, it's going to come back to bite us again.

It doesn't look to me there's a bad CFG if we never do BranchFolding here. You can find the predecessors/successors chain EH BBs correctly in the CFG. The problem looks to me is BranchFolding ignores EH edges by itself.

If you don't do BranchFolding, the CFG is... sort-of right? I mean, disabling BranchFolding and any similar pass (I'm drawing a blank off the top of my head, but I'm pretty sure BranchFolding is not the only pass that does relevant CFG manipulation), the CFG continues to resemble the IR CFG; you can use that to compute the correct EH edges, and stuff like MachineDominatorTree will work correctly within the EH blocks.

Even with that, though, without the complete set of proper edges, vreg liveness is going to be wrong. Probably not relevant in a lot of cases because clang uses "volatile" heavily, and I think the SEH landing pad clobbers every physical register. But I suspect it's not completely irrelevant.

Also, this patch isn't implementing "disable BranchFolding" in a good way. Checking "getPersonalityFn" explicitly instead of marking up the MIR means that we potentially need to copy-paste the getPersonalityFn check into any pass that does CFG modifications. And people making changes to those passes will, inevitably, not be aware they need to copy-paste it.

I'm not sure of that, but I got some evidence that BranchFolding is special. You can see EHScopeMembership is designed for BranchFolding use only. I also found two related patches rG161935520d5a9 and rG4600c064346 by @majnemer. Though they are targeting to funclets, I think the problem is similar, i.e., there're no edges to funclets so we have to make CFG irreducible manually.

I also found two related patches rG161935520d5a9 and rG4600c064346 by @majnemer. Though they are targeting to funclets, I think the problem is similar, i.e., there're no edges to funclets so we have to make CFG irreducible manually.

Those patches enforce an invariant; "branches between funclets/into funclets/out of funclets are not legal". Most passes won't do that because there isn't any control flow between the blocks in question; basically only BranchFolding and SimplifyCFG do that sort of fold, so it's not something people usually need to be aware of. And for anything that does need to be aware, there's a MachineFunction method getEHScopeMembership() to generically query the property in question.

You could argue there's an invariant here you need to enforce, but you haven't even attempted to describe what the invariants in question are, so it's hard to assess whether other passes are likely to break them. And you haven't provided a method to query for the invariant in question; instead, the personality function is hardcoded into BranchFolding.

I expect figuring all this out and documenting it properly is harder than just adding the relevant edges.

rnk added a comment.Aug 30 2023, 1:58 PM

I wanted to follow up on this, it was hiding at the bottom of my inbox.

To recap the issue, what's going on is that with asynch EH enabled, it is now possible for machine CFG cleanups in branch folding to delete basic blocks and make some funclets unreachable. This causes problems because we store references to funclet entry MBBs in WinEHFuncInfo. The ability to do this relies on the assumption that Machine passes are just not powerful enough to delete branches to funclets, usually because they are reachable from an invoke, and that's not something a machine pass can normally delete.

However, with asynch EH, we have these no-op @llvm.seh.try.begin/end intrinsics. They don't become real call instructions, so they don't create the same kind of optimization barrier that synchronous EH would have. If you look at the machine CFG posted earlier, you see a sequence of MBBs containing nothing but JMPs, and BranchFolding::OptimizeBranches is trying to clean that up into a straightline fallthrough and delete the intermediate basic blocks. Doing that deletes the successor edges to the catch and cleanup funclets, and then the blocks themselves once they become unreachable.

This raises the question of if we should try to make WinEHFuncInfo resilient to these kinds of CFG changes, and I think the answer is "definitely not", because we calculate the state numbers on the LLLVM IR much earlier during WinEHPrepare.

I think the next step is to dig into OptimizeBlock and teach it not to remove blocks for which hasEHPadSuccessor is true.

hasEHPadSuccessor() is actually false for the block that throws the exception. I think that's a bug... but everybody else who's looked at this code apparently disagrees with me.

rnk added a comment.Aug 30 2023, 3:55 PM

hasEHPadSuccessor() is actually false for the block that throws the exception. I think that's a bug... but everybody else who's looked at this code apparently disagrees with me.

Now I see what you mean, sorry it took me a while to work out what you meant about the bad CFG. :) We have no seh.try.end() blocks in this example. They've all been deleted in the mid-level because of the unreachable.

We've arrived once again at my favorite core struggle with LLVM IR: the lack of regions.

I can think of two ways to proceed:

  • We can continue with the existing IR by trying to identify try regions and adding extra unwind edges to MBBs in the try state
  • We can continue with a fictional, imperfect Machine CFG, and attempt to retain the seh.try.end intrinsics by adding a fictional edge from begin to end, so unreachable won't destroy the try scope information.

Thanks @rnk and @efriedma for the follow up. FYI, I have disabled BranchFolding and MachineBlockPlacement for asynch EH internally to workaround this problems. We can improve them, but it's not my top priority at the present.

However, with asynch EH, we have these no-op @llvm.seh.try.begin/end intrinsics. They don't become real call instructions, so they don't create the same kind of optimization barrier that synchronous EH would have.

I was trying to define pseudo opcodes for these intrinsics, but it cannot solve MachineBlockPlacement issue because it's behind the pseudo expansion pass.

We can continue with the existing IR by trying to identify try regions and adding extra unwind edges to MBBs in the try state

In the previous analysis, I thought the edges are preserved until we are merging them (e.g., bb.1 has two successors although it has absolute jump). And I thought the way to solve it is to inherit these successors when merging BBs. However, it may not solve all the problems, since BBs may have different EH states. The point is, I think such edges don't solve the problem here. Or we may have different understanding about adding edges.

We can continue with a fictional, imperfect Machine CFG, and attempt to retain the seh.try.end intrinsics by adding a fictional edge from begin to end, so unreachable won't destroy the try scope information

The design doc says "If some exits flow to unreachable, propagation on those paths terminate, not affecting remaining blocks.". I think there might be some performance affects if we intended to retain these BBs. If we can accept some performance lose, I think disbling the pass is a better choice.

We can continue with the existing IR by trying to identify try regions and adding extra unwind edges to MBBs in the try state

In the previous analysis, I thought the edges are preserved until we are merging them (e.g., bb.1 has two successors although it has absolute jump). And I thought the way to solve it is to inherit these successors when merging BBs. However, it may not solve all the problems, since BBs may have different EH states. The point is, I think such edges don't solve the problem here. Or we may have different understanding about adding edges.

My understanding of "adding edges" is that in the example https://reviews.llvm.org/D154294#4501026, we would add an edge from 'bb.5.invoke.cont4' to the landing pad in the initial MIR CFG. (Not adding the edge when we merge blocks in BranchFolding, but adding the edge when SelectionDAG constructs the MIR.) The edge doesn't explicitly exist in the LLVM IR, but we can infer it from the preceding llvm.seh.try.begin() call.

We can continue with the existing IR by trying to identify try regions and adding extra unwind edges to MBBs in the try state

In the previous analysis, I thought the edges are preserved until we are merging them (e.g., bb.1 has two successors although it has absolute jump). And I thought the way to solve it is to inherit these successors when merging BBs. However, it may not solve all the problems, since BBs may have different EH states. The point is, I think such edges don't solve the problem here. Or we may have different understanding about adding edges.

My understanding of "adding edges" is that in the example https://reviews.llvm.org/D154294#4501026, we would add an edge from 'bb.5.invoke.cont4' to the landing pad in the initial MIR CFG. (Not adding the edge when we merge blocks in BranchFolding, but adding the edge when SelectionDAG constructs the MIR.) The edge doesn't explicitly exist in the LLVM IR, but we can infer it from the preceding llvm.seh.try.begin() call.

bb.5.invoke.cont4 to landing pad (I assume it's bb.7.ehcleanup) doesn't matter to me here. I analyzed the problem was the edge from bb.3.invoke.cont2 to bb.4.__except was removed in BranchFolding. This is the direct problem I wanted to solve here. It seems we are talking about independent problems. I didn't research the other but I think it doesn't solve my problem.

rnk added a comment.Aug 31 2023, 9:36 AM

bb.5.invoke.cont4 to landing pad (I assume it's bb.7.ehcleanup) doesn't matter to me here. I analyzed the problem was the edge from bb.3.invoke.cont2 to bb.4.__except was removed in BranchFolding. This is the direct problem I wanted to solve here. It seems we are talking about independent problems. I didn't research the other but I think it doesn't solve my problem.

I agree, that is what is happening. bb3 is a simple JMP to bb5. BranchFolding sees it as an empty block and nobody else jumps to bb5, so it wants to delete it and RAUW all uses of bb3 with bb5. BranchFolding doesn't transfer the unwind edges from bb3 to bb5, so they are all dropped and the EH pads become dead. If transferred the EH edges to bb5, it would keep the EH pads alive in the CFG. So, maybe transferring those EH edges is an OK solution?

Still, having them present in the first place as Eli says would solve issues in other passes. From reading the state numbering code in winehprepare, it seems like we already assign all these blocks to the try state number, which means any exception in them will unwind to the catchpad. Can we use the state numbers during machine CFG construction to add the missing edges? To say it another way, state numbering is the dataflow algorithm we need to apply the missing edges. It's a matter of transferring that from IR to MIR.

bb.5.invoke.cont4 to landing pad (I assume it's bb.7.ehcleanup) doesn't matter to me here. I analyzed the problem was the edge from bb.3.invoke.cont2 to bb.4.__except was removed in BranchFolding. This is the direct problem I wanted to solve here. It seems we are talking about independent problems. I didn't research the other but I think it doesn't solve my problem.

I agree, that is what is happening. bb3 is a simple JMP to bb5. BranchFolding sees it as an empty block and nobody else jumps to bb5, so it wants to delete it and RAUW all uses of bb3 with bb5. BranchFolding doesn't transfer the unwind edges from bb3 to bb5, so they are all dropped and the EH pads become dead. If transferred the EH edges to bb5, it would keep the EH pads alive in the CFG. So, maybe transferring those EH edges is an OK solution?

I think so, this can solve the specific problem here.

Still, having them present in the first place as Eli says would solve issues in other passes. From reading the state numbering code in winehprepare, it seems like we already assign all these blocks to the try state number, which means any exception in them will unwind to the catchpad. Can we use the state numbers during machine CFG construction to add the missing edges? To say it another way, state numbering is the dataflow algorithm we need to apply the missing edges. It's a matter of transferring that from IR to MIR.

That would be feasible. However, if we built the edges from all BBs in try region to catchpad, the CFG may turn into a too complex graph to understand for developers who don't know much about SEH. Instead, I think it would be much clear if we taking state numbers into account just when doing CFG optimization.

That would be feasible. However, if we built the edges from all BBs in try region to catchpad, the CFG may turn into a too complex graph to understand for developers who don't know much about SEH. Instead, I think it would be much clear if we taking state numbers into account just when doing CFG optimization.

The CFG becomes more complicated in the sense that there are more edges... but it becomes simpler in the sense that code which generically supports EH pads should just work without additional SEH-specific knowledge. The latter property seems more valuable because most people working on the backend know nothing about SEH.

rnk added a comment.Sep 1 2023, 3:14 PM

The CFG becomes more complicated in the sense that there are more edges... but it becomes simpler in the sense that code which generically supports EH pads should just work without additional SEH-specific knowledge. The latter property seems more valuable because most people working on the backend know nothing about SEH.

+1