This is an archive of the discontinued LLVM Phabricator instance.

[StackColoring] Conservatively merge point pV = &(&Variable) in catch(Variable)
ClosedPublic

Authored by xiangzhangllvm on Aug 26 2020, 8:24 PM.

Details

Summary

Conservatively merge pV = &(&Variable) for catch(Variable), EH libs may write the catch value and return the Point (Type**) pV back.
This Point may be dangerously over written due to some work of objects' destructor in try block. (The destructor may work after EH written)

In fact, for the catch point pV, there is usually a very long life range guarded with TIME_START and TIME_END (usually almost through the whole program),
but there is an potion "-stackcoloring-lifetime-start-on-first-use" which will cut it shorter.

This patch try let pV conservatively use its TIME_START and TIME_END as its live-range, not affected by the option "-stackcoloring-lifetime-start-on-first-use"

We find this bug in a big win32 project, which hard to reproduce, now the following case can reproduce.
We also analysis it from front/mid-end, it is hard for front/mid-end to fix this bug, So I continue push this patch go ahead.

Compile with:
clang-cl -m32 -O2 -EHs test.cpp

__attribute__((noinline,nothrow,weak)) void escape(int *p) { }
struct object {
  int i;
  object() {
    i = 1;
  }
  ~object() {
    // if "object" and "exp" are assigned to the same slot,
    // this assign will corrupt "exp".
    i = 9999;
    escape(&i);
  }
};
inline void throwit() { throw 999; }

volatile int v;
inline void func() {
  try {
    object o;
    throwit();
  }
  // "exp" is written by the OS when the "throw" occurs.
  // Then the destructor is called, and the store-assign
  // clobbers the value of "exp".
  // The dereference of "exp" (with value 9999) causes a crash.
  catch (int &exp) {
    v = exp;
  }
}

int main() {
  func();
  return 0;
}

Diff Detail

Event Timeline

xiangzhangllvm created this revision.Aug 26 2020, 8:24 PM
Herald added a project: Restricted Project. · View Herald Transcript
xiangzhangllvm requested review of this revision.Aug 26 2020, 8:24 PM
efriedma added a subscriber: efriedma.

Special-casing a specific load instruction isn't the right approach; the affected instruction is, at best, unpredictable.

Fundamentally, the issue is that the code can't really figure out what the lifetime of the alloca is supposed to be. On the non-exception path, there's an llvm.lifetime.end marker, but on the exception path there is no such marker.

I see two possible approaches here:

  1. Teach stackcoloring to detect that there's a "missing" llvm.lifetime.end marker for the alloca in question, and handle that somehow.
  2. Teach clang to emit an appropriate llvm.lifetime.end marker in the catch block, to precisely express the lifetime to stackcoloring.
xiangzhangllvm edited the summary of this revision. (Show Details)Aug 26 2020, 10:48 PM
xiangzhangllvm added a comment.EditedAug 26 2020, 11:03 PM

Hello efriedma, first thanks for your suggestion.
I agree with"specific load instruction isn't good approach", I just don't know how to easily identify the catch variable in Machine IR.
So I choose the first Load in landing-pad. (I think this can be checked in catch's lowering)

I think there isn't problem of llvm.lifetime.start/end, please see the test in the patch, line 240, 241, 250, 276 all marked with lifetime.start/end.
Without this patch, The catch variable will still be merged even it guard with " lifetime.start/end" and has life-interval with other variable's life range.

lebedev.ri retitled this revision from Conservatively merge &&Variable for catch(Variable) to [StackColoring] Conservatively merge &&Variable for catch(Variable).Aug 26 2020, 11:52 PM
xiangzhangllvm edited the summary of this revision. (Show Details)Aug 27 2020, 1:28 AM

I think there isn't problem of llvm.lifetime.start/end, please see the test in the patch, line 240, 241, 250, 276 all marked with lifetime.start/end.

Those markings aren't really complete. In particular, if the call throws an exception, the LIFETIME_END of the allocation %stack.1.ex2 never runs. So it's not clear when the lifetime is supposed to end.

thanm added a comment.Aug 27 2020, 5:01 AM

I agree with Eli, this does not seem like a good fix, it would be better to figure out why/how the lifetime markers wind up malformed and work on the problem from that angle.

llvm/test/CodeGen/X86/StackColoring-first-use-in-catch.mir
251

Here really I manually add the LIFETIME_END to reproduce merge action.
I'll re-analysis the problem project, thank you for your advise!

xiangzhangllvm retitled this revision from [StackColoring] Conservatively merge &&Variable for catch(Variable) to [StackColoring] Conservatively merge Variable in catch(Variable).
xiangzhangllvm edited the summary of this revision. (Show Details)
xiangzhangllvm added a reviewer: clin1.
xiangzhangllvm edited the summary of this revision. (Show Details)
clin1 added inline comments.Oct 26 2020, 3:03 PM
llvm/lib/CodeGen/StackColoring.cpp
711

It might be worth explaining here that the runtime writes to X outside of the user code, which is why we don't know the exact start of the lifetime.
Question: Why does a memory read start a stack value lifetime---shouldn't it be a write? And if we don't find a write, then the coloring should be conservative?

clin1 added inline comments.Oct 26 2020, 3:39 PM
llvm/test/CodeGen/X86/StackColoring-first-use-in-catch.mir
8

The "printf" calls and string values can be removed from the test case. The optimizer passes must be prevented from removing the "i = 9999" in the destructor, and "exp" must be read in the catch block. This can be done in simpler ways (volatile store, call to dummy non-removable function, etc.)

xiangzhangllvm added inline comments.Oct 26 2020, 5:54 PM
llvm/lib/CodeGen/StackColoring.cpp
711

Yes, maybe we can check the "write", but I am afraid the "undef" stack-value, it may just has read at first use.

Current stack-coloring code didn't distinguish the read/write status for the stack-mem operand in MI,
please refer line 611, "InterestingSlots.test(Slot)" mainly check if the stack-slot guard with lifetime.start/end, and
the applyFirstUse(Slot) is mainly check the LifetimeStartOnFirstUse option.

I'll update the test, thank you a lot!

xiangzhangllvm marked an inline comment as done.Oct 26 2020, 11:24 PM
xiangzhangllvm added inline comments.
llvm/lib/CodeGen/StackColoring.cpp
711

@clin1 Thx for your hint!
Now this patch first record the "write Solt" into StoreSlots, and collect the "load Slot" in catchpad.
After doing that, check "write Solt" and "catchpad load Slot", then put it into ConservativeSlots or not.

xiangzhangllvm edited the summary of this revision. (Show Details)Oct 26 2020, 11:26 PM
xiangzhangllvm edited the summary of this revision. (Show Details)
LuoYuanke added inline comments.Oct 26 2020, 11:40 PM
llvm/test/CodeGen/X86/StackColoring-first-use-in-catch.mir
7

Since you have an .cpp or .ll to duplicate the bug, do you still need .mir test case?

llvm/test/CodeGen/X86/StackColoring-first-use-in-catch.mir
7

Yes, .mir test is better, I used "-run-pass=stack-coloring" here.
it let us just test the "stack-coloring" changing, not affected by other optimizations.

thanm added a comment.Oct 27 2020, 5:26 AM

In your C++ test case, you write this:

// "exp" is written by the OS when the "throw" occurs.

however when I compile the testcase as instructed I see the following entry block:

define dso_local i32 @main() local_unnamed_addr #1 personality i32 (...)* @__CxxFrameHandler3 {
entry:
  %tmp.i.i = alloca i32, align 4
  %o.i = alloca %struct.object, align 4
  %exp.i = alloca i32*, align 4
  %0 = bitcast i32** %exp.i to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0)
  %1 = bitcast %struct.object* %o.i to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %1) #3
  %i.i.i = getelementptr inbounds %struct.object, %struct.object* %o.i, i32 0, i32 0
  store i32 1, i32* %i.i.i, align 4
  %2 = bitcast i32* %tmp.i.i to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %2)
  store i32 999, i32* %tmp.i.i, align 4
  invoke void @_CxxThrowException(i8* nonnull %2, %eh.ThrowInfo* nonnull @_TI1H) #4
          to label %.noexc.i unwind label %ehcleanup.i

If as you say the OS writes to "%exp.i", then this should be reflected in the IR in the entry block -- the call/invoke should include a reference to the thing that potentially gets written. Otherwise it's hard to see how optimization passes can safely work on the IR.

Suppose this was a C program and the code looked like

int myfunction() {
  char b[128];
  somefunction();
  ... 
  b[0] = ...
}

You are effectively saying "When the call to 'somefunction' happens, the OS may write to 'b'", even though the address of 'b' is not used until much later in the function.

This doesn't make sense -- what would be better is to have the IR reflect realitty, e.g. that the address in question has effectively escaped, and thus it can be written at calls, etc.

Unlike linux, usually by @__cxa_begin_catch ,windows Exception didn't has a suitable port to reflect how it change the catch value, it is more like a "ABI" action, where the throw impliedly write and where the catch block get it.
I think the early designer do this mainly because the catch value (e.g. type** exp.i), is only used in adjacent catch block.

First, I thanks for your careful reviewing.
The windows Exception model has work well for many years, I tend not to change it. If we carefully watch the catch values (e.g. Type * *exp.i), we will find that they usually have a very long life range guarded by lifetime.start and lifetime.end, (usually almost equal with the whole function life range).
This seems that the front/mid ends don't want to do much optimization on this special mechanism, because in their eyes, the back-end optimization should based on the lifetime.start and lifetime.end they generated.
And if we reflected it in the IR, it will bring much complexity into current clear mid end IR, because every function in try block (may contain branch) need to reflected that it may change the catch values (e.g. Type * *exp.i).
I also find that, current "-stackcoloring-lifetime-start-on-first-use" didn't handle the multi-lifetime.start/end case, catch value may has multi-first use. it is make sense to handle the corner case of catch variable in this option.
What's more, current patch is simple and clear, I think it is the most probably easiest way to fix this bug.

Hello @thanm, I think you care about this patch/bug, do you have additional comments? If you feel my idea is reasonable, could you help accept this patch, thank you again!

thanm added a comment.Nov 2 2020, 8:09 AM

If updating the front end is out of the question, then I suppose your approach makes sense.

I'll send some additional comments.

thanm added a comment.Nov 2 2020, 8:21 AM

In your commit message. you refer to "Conservatively merge &(&Variable) for catch(Variable)" -- this seems like a problematic choice of words/terminology. StackColoring operates on slots, not on addresses of slots.

Also "there is a potion" => "there is an option"

llvm/lib/CodeGen/StackColoring.cpp
437

"for which may write memory" is awkward wording. How about "Record the FI slots referenced by a 'may write to memory' instruction?

696

I think it would be better to append this comment to the large block at the start of the file (from lines 103-375), since that's currently the place for discussions about how the algorithm works and the various limitations. You can then refer back up to it here.

708

This doesn't make sense to me. ConservativeSlots contains slots, not addresses of slots -- I think it would be clearer just to say "we put X into ConservativeSlots".

xiangzhangllvm retitled this revision from [StackColoring] Conservatively merge Variable in catch(Variable) to [StackColoring] Conservatively merge point pV = &(&Variable) in catch(Variable).
xiangzhangllvm edited the summary of this revision. (Show Details)
xiangzhangllvm marked 5 inline comments as done.

Thanks for your review !

thanm accepted this revision.Nov 3 2020, 12:40 PM

LGTM. Is there a bug number associated with this?

This revision is now accepted and ready to land.Nov 3 2020, 12:40 PM

LGTM. Is there a bug number associated with this?

Thank you! There is no bug number, did you mean I need to put it on Bugzilla ?

I created a bug ID (48064) for it https://bugs.llvm.org/show_bug.cgi?id=48064

Then you can change your test case name to pr48064.mir

Then you can change your test case name to pr48064.mir

OK, good, I'll change it in committing.

Good to see the problem is solved. Thank you! 😊