This is an archive of the discontinued LLVM Phabricator instance.

[WinEH] Fix PreISel intrinsics in funclet catchret.dest blocks
AbandonedPublic

Authored by sgraenitz on Sep 29 2022, 3:00 AM.

Details

Reviewers
theraven
rnk
fhahn
Summary

Extending the ObjC++ test-case from D128190 to a nested try-catch scenario, Clang emits an ObjC
ARC runtime-call in the funclet's catchret.dest block. This causes another binary truncation.

One of my changes from the original review prevents this successfully:
https://reviews.llvm.org/D124762#diff-change-MYHXIwRDUe3r

In the last round I discarded it as seemingly unnecessary, because back then my test didn't reproduce
it. Revisiting the affected test in libobjc2, however, revealed the missing piece:
https://github.com/gnustep/libobjc2/issues/222#issuecomment-1262047638

Diff Detail

Event Timeline

sgraenitz created this revision.Sep 29 2022, 3:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 29 2022, 3:01 AM
Herald added a subscriber: pengfei. · View Herald Transcript
sgraenitz requested review of this revision.Sep 29 2022, 3:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 29 2022, 3:01 AM

Right now this test fails, because it emits an int3 instead of the runtime-call in the catchret.dest block:

Input was:
<<<<<<
          33:  addq $64, %rsp
          34:  popq %rbp
          35:  retq
          36: .LBB0_3: # Block address taken
          37:  # %catchret.dest
          38: $ehgcr_0_3:
next:72'0           X~~~~~ error: no match found
          39:  int3
next:72'0     ~~~~~~
          40:  .seh_handlerdata
next:72'0     ~~~~~~~~~~~~~~~~~~
sgraenitz updated this revision to Diff 463820.Sep 29 2022, 3:09 AM

TL;DR: Appling this extra funclet pad condition fixes the issue

Adding extra debug output showed that the respective block 0x719c170 is referencing the funclet pad 0x719bed0 from another group, while the one in its own group is Null. That's the one we compare with in FuncletBundleOperand == FuncletPad. This looks like invalid state to me:

FuncletPadBB: 0x0000000007198c10
  FuncletPad: 0x0000000000000000
          BB: 0x0000000007198c10 -> OB_funclet: 0x0000000000000000
          BB: 0x000000000719c170 -> OB_funclet: 0x000000000719bed0 ---+
FuncletPadBB: 0x000000000719b9b0                                      |
  FuncletPad: 0x0000000000000000                                      |
FuncletPadBB: 0x000000000719bca0                                      |
  FuncletPad: 0x000000000719bed0 <------------------------------------+
          BB: 0x000000000719bca0 -> OB_funclet: 0x000000000719bed0

It appears that EH funclet coloring is responsible for this. Output from --debug-only=winehprepare-coloring was:

Coloring funclets for test_catch_with_objc_intrinsic
Visiting entry, entry
  Assigned color 'entry' to block 'entry'.
Visiting catch.dispatch, entry
  Assigned color 'catch.dispatch' to block 'catch.dispatch'.
Visiting catch, catch.dispatch
  Assigned color 'catch' to block 'catch'.
Visiting catchret.dest, entry
  Assigned color 'entry' to block 'catchret.dest'. <--- Shouldn't this have color 'catch'?
Visiting eh.cont, entry
  Assigned color 'entry' to block 'eh.cont'.
Visiting eh.cont, entry

I managed to hack it so that catchret.dest ends up in the correct group, but it breaks numerous tests: https://github.com/weliveindetail/llvm-project/commit/d678c7aafc75435147017bf999d54b1be5262db4

Not sure if adding the funclet pad condition as proposed in this patch is the best solution, but it's not breaking any test and it's effective in avoiding the truncation.

rnk requested changes to this revision.Sep 29 2022, 12:15 PM
rnk added subscribers: majnemer, lebedev.ri.

+@majnemer

I'm still concerned about this change. If you make this change, we might want to reconsider the entire funclet bundle design, basically.

The way we designed funclets, we basically assumed the frontend was responsible for generating all the call instructions. This is obviously a problem for middle end passes creating function calls, like any instrumentation tool or this ObjC ARC stuff, and that's why this keeps coming up again.

The risk of not discarding implausible blocks is that valid IR transforms, such as simplifycfg, will create IR that we cannot compile because we can't assign unique colors and EH state numbers to all blocks. The most plausible example of this tail merging, which I think @lebedev.ri was working on, but I don't recall the status.

I think, in the end, this whole funclet bundle design was a way of coming up with an invariant that we could enforce on IR producers that would prevent valid IR transforms from creating IR that we couldn't compile. The motivation of all this is to improve reliability. It was meant to prevent users from hitting problems, filing bugs, finding us, and asking us to make difficult changes to WinEH code. It was always about shifting the problem of preserving the EH structure onto the IR producer, to make it easier to reconstruct the structured nesting in the backend. Which is a valid motivation: we got burned from trying to do heroic backend structure reconstruction the first time we tried to implement this. However, now we have this invariant about how funclet bundles should be placed, and the IR producer needs to obey the invariant, and the consequences of breaking that invariant are that the user sees random crashes.

I think we can say that this plan really hasn't worked out. Nobody knows about these WinEH funclet IR invariants, every instrumentation tool breaks them or has to carry code to preserve them, and people keep hitting bugs. Moving the problem from WinEH lowering to IR production has not ultimately made the compiler more reliable form the perspective of the user, and people root cause the problem to this WinEH code that removes implausible blocks.


So what should we do about this? I think it would be a better user experience if we accepted that EH state numbering was a fallible algorithm. If we have valid IR that we try to lower to use a structured Windows EH personality but we can't do it because coloring fails, we should just report a compiler error. That would be way better than the current experience of crashing at runtime because the user happened to use a feature (ARC) that wasn't tested with WinEH. If we did this, we'd be able to delete quite a lot of code and simplify things across the codebase.

For extra bonus points, it would be great if we reported errors via the LLVMContext rather than aborting the program with report_fatal_error.

What do people think about this? I can't ask @sgraenitz to accept the huge increase in scope, but if we could at least document this as the intended direction in the WinEH docs and as a github issue, that would be enough for me to say we should land this as is.

This revision now requires changes to proceed.Sep 29 2022, 12:15 PM

Whatever rule WinEHPrepare uses to determine if the code is valid, the IR verifier should use the same rule, I think. I don't think it makes sense to let IR pass the verifier if it's uncompilable due to coloring rules; that just makes bugs harder to understand.

Rewriting calls to "unreachable" does seem unfriendly. It's possible funclet operand bundles are not the right choice. I'm not sure what the "right" rule is here; probably any design has downsides. Maybe we want to bias the design in favor of making "simple" transforms, like inserting instrumentation or lowering intrinsics, easy.

We could just write the rules for implicit coloring directly into LangRef, I guess. catchpad and cleanuppad tokens significantly restrict the number of transforms that can actually mess up the IR without breaking some other rule.


If we're going to relax the funclet bundle rules, I think we need to at least ensure that coloring fails reliably with a report_fatal_error if it's presented with uncolorable code. Does that happen currently?

sgraenitz added a comment.EditedOct 4 2022, 8:23 AM

Thanks for your feedback and the historic context. I am not familiar with all the details, but let me try and digest the pieces I understand. Meanwhile, I don't want to interrupt the conceptual conversation kicked-off by @rnk and @efriedma -- so @majnemer and @lebedev.ri please don't hesitate to follow up on it.

The way we designed funclets, we basically assumed the frontend was responsible for generating all the call instructions. This is obviously a problem for middle end passes creating function calls, like any instrumentation tool or this ObjC ARC stuff, and that's why this keeps coming up again.

All such ObjC ARC calls go to the runtime and won't throw. Might that even help the overall approach? What if transformations/instrumentation could add arbitrary calls later on as long as they are nounwind? What prevents WinEH from considering them as regular instructions? If I understand correctly, the stack remains intact during unwinding in the WinEH model, so all regular program state should still be accessible, right?

It was always about shifting the problem of preserving the EH structure onto the IR producer.

How could we teach codegen to emit correct IR? Looking at the specific test case in this patch, it appears that emitting the @llvm.objc.storeStrong call inside the catch block would solve the issue. I guess we'd have to do that for each catch handler then? From the view of WinEH would that be equivalent semantically?

catch:                                            ; preds = %catch.dispatch
  %1 = catchpad within %0 [ptr null, i32 64, ptr %exn.slot]
  %exn = load ptr, ptr %exn.slot, align 8
  %2 = call ptr @llvm.objc.retain(ptr %exn) [ "funclet"(token %1) ]
  store ptr %2, ptr %ex2, align 8
  call void @llvm.objc.storeStrong(ptr %ex2, ptr null) [ "funclet"(token %1) ] <-----+
  catchret from %1 to label %catchret.dest                                           |
                                                                                     |
catchret.dest:                                    ; preds = %catch                   |
  ; call void @llvm.objc.storeStrong(ptr %ex2, ptr null) [ "funclet"(token %1) ] ----+
  br label %eh.cont

If we have valid IR that we try to lower to use a structured Windows EH personality but we can't do it because coloring fails, we should just report a compiler error.

Sounds reasonable, these silent truncations are very bad. However, people started working with the current state and it won't help to prevent them from compiling their projects with an upcoming release. Could we add a compiler flag to disregard the error during a transition period? (And instead emit a warning?)

If we're going to relax the funclet bundle rules, I think we need to at least ensure that coloring fails reliably with a report_fatal_error if it's presented with uncolorable code. Does that happen currently?

If you are talking about funclet coloring, then no it doesn't look like funclet bundles are validated here right now: https://github.com/llvm/llvm-project/blob/release/15.x/llvm/lib/Analysis/EHPersonalities.cpp#L85

sgraenitz abandoned this revision.Nov 16 2022, 5:44 AM

I still think the silent code removal here in WinEHPrepare is very dangerous for all clients of funclet-based EH in LLVM. Thanks @rnk for initiating a discussion here.

Sporadic runtime crashes on exception paths are hard to diagnose and tracing the problem back here was a lot of effort. Understanding and mitigating the issue took long, especially because the documentation doesn't say anything at all about implausible calls or the role of bundle operands:
https://releases.llvm.org/15.0.0/docs/ExceptionHandling.html#exception-handling-using-the-windows-runtime

While the current approach should really be revisited in the long term, I understand that designing and implementing a new one is not an option in the short term. So in the meantime, I'm proposing to adopt the following patches instead (plus a few more preparations linked from there):

  • D137939 teaches the GNUstep-specific CodeGen to emit correct IR for the situation in this review
  • D138123 allows the IR Verifier to check funclet tokens for all intrinsic calls that may lower to function calls in IR transforms (thanks @efriedma for bringing it up!)
rnk added a comment.Nov 18 2022, 11:46 AM

Yeah, let's work towards verifier enforcement.

rnk added a comment.Nov 18 2022, 11:46 AM

Hit send really quickly... Sorry for any delay, thanks for working on this, I think this will result in a much better Windows EH experience. I appreciate the help, and apologize that I can't do more to help.