This is an archive of the discontinued LLVM Phabricator instance.

Preliminary: enable "spill on exception path" and "spill on normal path" RS4GC options
AbandonedPublic

Authored by JosephTremoulet on Jan 15 2016, 11:58 AM.

Details

Reviewers
reames
chenli
Summary

Option -rs4gc-spill-on-exceptional-path should use spills/fills rather than gc.relocates on exceptional paths out of a statepoint
Option -rs4gc-spill-on-normal-path should do the same for the normal path

Also included are some changes to deal with joins and PHIs that we'll need to support catchpad-style EH

It's not functional yet, but I think it makes sense to share early so we can have contextualized discussions about where it should go and who should implement what.

Diff Detail

Event Timeline

JosephTremoulet retitled this revision from to Preliminary: enable "spill on exception path" and "spill on normal path" RS4GC options.
JosephTremoulet updated this object.
JosephTremoulet added reviewers: reames, chenli.

You can probably ignore the changes in Instructions.h -- I'll split that out and submit upstream before the other pieces land. It just gives us a way to write loops in RS4GC that operate on all an invoke's potential destinations rather than assuming it has a single landingpad.

I've added several inline comments to explain my thinking. One thing that I haven't given any real thought to is whether this has implications for how we compute what the live sets are. I know at least for the PHI case that if a value is only live because of a use in a phi in the catchpad, we don't actually need to spill that value for its own sake because we'll be separately spilling it for the PHI. Not sure if there are other issues along those lines for the non-PHI cases.

include/llvm/IR/Statepoint.h
334

I have the code structured to use a flag to decide if it should spill on exceptional paths or not, which is independent of anything else. Of course it's not actually valid to use relocates on catchpad EH because we can't split those predecessors, so I've put assertions in the places that would fall over if the invalid combo is attempted.

lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
217

I needed some maps that persist as we process each statepoint, but should still be opaque details from the caller's perspective, so I've added a RecordSet type that the caller can allocate and it contains the record vector and the persistent maps.

1571

We want the allocas that we add to hold the spilled pointers to appear to have their contents modified at the statepoints across which they carry a value, and also to be reported as live at those statepoints. I was thinking that simply adding the allocas to the statepoints gc args would accomplish that, but the assertion failures I'm currently running into make me think I may have been wrong about that. This loop is building up the set of allocas and stuffing them into the gc args.

1585

The following two loops insert stores to the spill slots we generate. As you can see, I used the naïve before-each-invoke store placement strategy for both SSA values live across the statepoint and for the stores to the slots that are used to eliminate PHIs. I'm using the naïve placement of those stores (before each invoke) because doing any better for the PHI case requires checking for interferences and I wanted a base working implementation first. However, it's occurred to me that for the non-PHI cases (the loop over LiveVariables here), each slot corresponds exactly to one SSA value, so we wouldn't need any interference analysis to know it's legal to put a single store to that slot immediately after the value is defined, as opposed to putting a store before every invoke it is live across. We'd need another persistent set in the RecordSet to keep track of this so we don't insert multiple stores for the same value. Since you guys don't need to worry about the PHI case, that might be an interesting option for you to pursue.

1648

This is just another place where processing would fail if we tried to use the relocate mechanism on catchpad EH where the pads may be joins.

1700

PHI loads intentionally omitted here because I'm still splitting critical "normal dest" edges, so there can't be PHIs.

2405

Here I was thinking that if you're spilling for the exceptional path anyway, you may as well leave the CFG alone and allow PHIs at the invokes. Which of course contradicts what I said above about how you'll never have PHIs and so can place your stores immediately after their defs. So maybe this should have a separate control bit dictating it, or just always normalize invokes that target landingpads, or something along those lines.

2578

Obviously it was totally bogus of me to disable this assert and maybe if I stop to understand what's broken here I'll fix whatever is also causing the next assert that I'm running into. I was just trying to see if I could see what other issues were lurking behind this.

Ok, I migrated my work-so-far over to master and pushed up a diff to Phabricator. Happy to hear your thoughts when you've had a chance to look at it, regarding both direction and coordination of work going forward. As for the latter, my only thoughts are

  1. There's the whole business with the PHIs and the invoke dest iterator that's interesting for me and not you. I definitely intend to introduce the iterator as a separate and prior change (though I was thinking to post it at the same time as posting this stuff so that in its review I could point to a use case). I suppose the PHI stuff could be split out and pushed up as a later change.
  2. As I describe in some of the inline comments below, if you don't have PHIs at your landingpads you can spill at defs instead of spilling at invokes. That's probably more interesting to you than it is to me since your project is mature enough to care about code quality already (I care enough to do it eventually, but would probably hold off and do it as a subsequent improvement), so if you're eager to do it knock yourself out.

My immediate plans are to (deal with the 3.8-blocker bug that just came my way and) keep trying to get it running correctly on a simple unit test.

Thanks
-Joseph

reames added inline comments.Jan 15 2016, 6:20 PM
include/llvm/IR/Statepoint.h
334

This seems reasonable as a migration step. I suspect if we go down this path, we should delete all the landing pad special casing here and in the lowering code.

lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
182

The use of these typedefs might not be a good idea any more. It might be worth looking at whether appropriate use of auto makes them redundant. I introduced them when changing the underlying data structures and never removed them.

211

This may be better as a DenseMap<BasicBlock*, SmallVector<Value*>>

217

I think this is probably over abstraction, but we'll see. Haven't even finished reading through the code yet. :)

1578

I don't think you need ArgBacking containing the LiveVars iff SpillOnNormalPath is true.

p.s. ArgBacking is a confusing name..

1587

Any reason to not insert all the allocas upfront given we know all the values live at any safepoint in the entire function?

1596

I'm confused: Why are you manually updating PHIs at all? Inserting the naive stores/loads on for the relocation path and reusing the same strategy as relocaViaAlloc would seem to get you all of this for free. In fact, we could even share most of the code in question.

1636

I'm really not clear why you're needing to do any transative walks here. I think this is confusing due to the same question as just above.

2405

This should just be removed. If we don't need a distinct landing pad any more because we have a single spill slot used by all the invokes reaching it, this code becomes pointless.

2580

I was really really expecting to see changes in what relocationViaAlloca did and expected. The fact the API stayed roughly the same seems surprising.

What I was expecting was:

  1. We assign spill slots globally for a single SSA value.
  2. We insert explicit spills before each statepoint if we're spilling in either path.
  3. We insert both gc.relocates and fills (depending on options).
  4. We use the PromoteMemToReg hack to convert all uses of the original value (including the new ones we introduced), into SSA. AH! The problem is the Alloca's introduced are no longer fully promoteable. I missed that detail originally.

Hm, if we can't rely on PMToReg to solve the general SSA construction problem for us, this becomes a lot more annoying than I'd realized. Still probably the right approach, but the complexity in your patch suddenly makes a lot more sense.

lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
1578

Funny, I'd written it that way originally, then later convinced myself it should be this way. Is the idea that the statepoint doesn't need to bother with the original values when we're spilling on all paths because those values aren't live over the statepoint anymore (but instead have uses at the spills)? I'd buy that...

And yes, it was a struggle to name "ArgBacking". The idea was that it's "thing that GCArgs might need to be a ref to if it can't just be a ref to LiveVariables". I'm happy for better suggestions.

1587

No reason, that sounds better, it just hadn't occurred to me.

1596

What makes you think I'm updating PHIs? This is the code that's inserting the naïve stores/loads.

1636

If I've got something like

try {
  code;
  try {
    code;
    try {
      invoke();
    } catch (A) { ...}
      catch (B) {...}
  code;
  } catch (X) { ...}
} finally { ... }

Then the invoke's unwind dest is a "catchswitch", which starts a block and is also a terminator, having successors for catch(A) and catch(B) and yet another "catchswitch". The second catchswitch's successors are catch(X) and the finally. This code needs to visit catch(A) and catch(B) and catch(X) and the finally, which are found by transitively following unwind edges in the CFG.

1918

btw, this bit is the tie-in with the mem2reg thing -- this goes and inserts a store to a new alloca after each reload I inserted above, and the subsequent mem2reg picks up those stores along with any reloc/remat stores for the same ssa value when it rewrites it in SSA.

2405

Supposing that we get to the point that we have intelligent spill placement, I agree with you entirely.

In the meantime, there's a tradeoff to consider:

on the one hand, you could remove this code and use the naïve spill placement and be happy that you've got less codepaths here and that you're more in line with the end goal

on the other hand, you could keep this code, and your naïve spill placement could be "for each SSA value that's live across any statepoints, insert a single store immediately after the def" instead of what I'm stuck with which is "for each SSA value that's live across any statepoints, insert a store for it immediately after the def *and also another store immediately before each statepoint whose landingpad uses it in a PHI*

I'm stuck with the second, so have no stake in which path you want to go down here. I just wanted to make sure you're considering the tradeoff. FWIW, if I were in your shoes, my instinct would be to keep the edge splitting so that I could have less horrible naïve spill placement.

2580

By the end of your comment it sounds like we're on the same page, but just to make sure:

The code here is doing (or at least intending to do) all of 1-3 (my superfluous use of memoization for #1 where a pre-pass would work aside).

We don't want MemToReg to do anything with the new allocas because we want them to stay memory. Running it on the new allocas would exactly put back the SSA values and PHIs that we're trying to spill.

But you've reminded me that we do in fact have the inverse utility in the Reg2Mem pass...

OK, I've just spent the last 10 minutes convincing myself that we just want a bunch of calls to DemoteRegToStack and DemotePHIToStack, only to subsequently unconvinced myself (we want loads right after each landingpad, not at all uses). Hmm, maybe DemotePHIToStack does the right (naïve) thing for the PHIs we may be wanting to eliminate (though I'd have to extend it to handle PHIs on catchpads), so we want that for PHIs and we want our own load/store placement for other things we're spilling...
Curious what your take is.

reames edited edge metadata.Jan 22 2016, 10:36 AM

Given this is into brainstorming, would a skype call be better/higher bandwidth?

lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
1578

Something as simple as "GCArgs" would be a bit more clear. Another option might to track spill slots and explicit relocations separately, then only combine them when actually inserting into the statepoint.

1596

Two bits: one, you're looking through more the immediate successors, and two, you're tracking incoming values from phis. I'm not clear why you need to do that. I thought the entire idea was that you wanted all invokes leading to a shared unwind to share a spill slot for those values. Once that's true, you should need an unconditional reload, and *maybe* to replace a few phis with the new reload. Or am I missing something?

1636

Ah, okay. This is about dealing with MSVC exception handling, not the shared alloca bit. :) That makes a lot more sense now. When you're ready for actual review, I'm definitely going to have you separate the MSVC specific bits first, then follow with the spilling change.

1918

Just to make sure I'm clear, we now have *two* sets of a allocas? One used purely for rewriting, the other the "real" ones that get left? That makes more sense, though I didn't get that from the code on first read through.

Minor: You should change the name of the function if you're going to reuse it in a different way.

2405

I'm not sure I'm following what you're saying at all. The code we're commenting on normalizes invokes to ensure both normal and return paths has a single predecessor.

I thought we were running with the idea that values along the exception edge were always going to be spilled in rs4gc? If so, then we're going to have a single reload in the unwind path for all incoming invokes and one in the normal path. Spill wise, we'll have one store inserted for the exceptional path (at the def you said?), and one store inserted by the current lowering for the normal path (in the incoming block). I don't see how normalizing the exception path or not will matter.

2580

I don't think using DemoteRegToStack is going to be the right approach. For one thing, the current implementation appears to assume it can split the critical edge of the invoke to the landingpad which is exactly what you don't want. DemotePHIToStack might be a useful building block.

Interestingly, if I'd know about that utility originally, the existing relocViaAlloca code probable could have been expressed as "demote value to stack, insert additional relocation stores, promote to reg",

I'm more and more thinking we're trying too hard to solve this within the existing Mem2Reg framework. The entire point of that was to save effort, and it doesn't appear to be doing so.

Might it make sense to switch the existing code over the SSAUpdater as we discussed in the email this morning? Once we'd done that, we can get rid of one set of allocas entirely.

Thanks for the feedback. I'm comfortable iterating with this via Phab and supplementing that with using some of our bi-weekly meeting time to talk, but if you'd prefer to discuss over Skype I'm happy to schedule something.

lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
1596

I think this is another catchpad-ism surprising you. I need to walk through an arbitrary number of blocks which are unsplittable in the sense that I can't put any code in them and they can't have anything other than PHIs (and catchswitches) in them. Yes, I'm putting a spill just before each invoke for each PHI in the EH dispatch code, and an unconditional load from the spill at the top of each splittable successor.

1636

It'll have to be the other order because there's no way to represent the MSVC stuff with gc.relocate, but sure I can separate into two patches.

1918

Right. The code I'm adding builds up a set of "real" allocas that we want to stay and through which we spill whatever we need to not be enregistered on whichever paths. Subsequently we have code where we'd like to point SSAUpdater at each relocate/remat/load-from-"real"-alloca, but instead today we create throwaway allocas and put a store at each relocate/remat/load-from-"real"-alloca. I wasn't trying to coalesce the one type of spill slot with the other at all, and in fact in my head I have to pretend we're not using allocas for that second part and just think of it as SsaUpdater, else I get terribly confused...

So e.g. whenever I insert one of those unconditional loads from a "real" alloca at the top of an EH pad, it's immediately followed by a store to one of the "throwaway" allocas (until Mem2Reg removes the store).

2405

we'll have one store inserted for the exceptional path (at the def you said?)

Whether it's at the def or not is the key point here. If we split critical edges, then yes you could always just put it at the def. But if we don't split critical edges and therefore allow PHIs at landingpads that we erase by spilling, then since all the values feeding any one PHI have to share the same spill slot, we're doing coalescing and have to check for interferences. To make it concrete with an example:

start:
  %1 = _
  %2 = _
  br i1 _, label %left, label %right
left:
  invoke @_
    to label _ unwind label %pad
right:
  invoke @_
    to label _ unwind label %pad
pad:
  %phi = PHI ty [ %1, %left ], [ %2, %right ]
  _ = foopad ...

then you have to allocate a spill slot for %phi, and you have to spill both %1 and %2 to that spill slot, and if you put those stores at the defs of %1 and %2, the store of %2 will overwrite the store of %1, so you'll get incorrect behavior on the path through %left.

So, ok, what I'm doing to make things correct in the face of PHIs is putting stores at the tails of the predecessors -- i.e. right before the invokes. In the example above, that's a store of %1 to the slot for %phi in %left, and a store of %2 to the slot for %phi in %right, which is what you want.

But if we switch to a more typical example of what EH code looks like (at least in my experience):

  %1 = _
  invoke @callee1, ...
    to label %cont1 unwind label %pad
cont1:
  ...
  invoke @callee2, ...
    to label %cont2 unwind label %pad
cont2:
  invoke @callee3, ...
    to label %cont3 unwind label %pad
...
cont_n:
  %2 = _
  invoke @callee_n+1, ...
    to label %cont_n+1 unwind label %pad
cont_n+1:
  invoke @callee_n+2, ...
    to label %cont_n+2 unwind label %pad
...
pad:
  %phi = PHI ty [ %1, %start ], [ %1, %cont1 ], [ %1, %cont2], ..., [ %1, %cont_n-1], [ %2, %cont_n], [ %2, %cont_n+1], ...
  _ = foopad
  ...

then we're putting redundant stores before a lot of invokes. Smarter spill placement can of course figure this out, and we'll want smart spill placement at the end of the day one way or another, but it's not exactly trivial and not going to be part of the code on day 1.

So the question is just if you want to avoid all those redundant stores, in the meantime before we have smart spill placement, by "cheating" and leaving this normalization code here, in which case we'd split all the exception edges and being naïve would mean you'd get a ton of loads, but at least the loads would be on the exception path, and maybe we already have a backend tail merge optimization that would clean them up for you?

2580

I don't think using DemoteRegToStack is going to be the right approach ... DemotePHIToStack might be a useful building block

Yeah, I've talked myself back out of it since posting. DemotePHIToStack might match where we'll be putting "real" stores and loads for landingpad PHIs, but it's not something that has a really nice extension to WinEH so I don't think it makes sense to use there, and neither do I think it makes sense for this code to have two different spilling mechanisms depending what kind of EH pad it sees.

Might it make sense to switch the existing code over the SSAUpdater as we discussed in the email this morning?

Yes, I think that would definitely make it easier to follow what's going on (though from my point of view it would make the current ToT code easier to read too, and is orthogonal to this). Since I'm still in a "bring up basic correctness" phase, I'm more interested with the bits here and would be inclined to defer switching to SSAUpdater until later, but if you think it's important to switch to SSAUpdater first you wouldn't have to twist my arm too hard, and of course if one of you wants to switch to SSAUpdater I'm more than happy to rebase these changes on top of that.

JosephTremoulet edited edge metadata.
  • rebase
  • drop temp use holder insertion, pushing liveness adjustments into liveness calculations
JosephTremoulet marked an inline comment as done.Jan 29 2016, 7:35 PM

I updated the diff just now because I just now got the changes to a point where they do something other than crash. Now they generate reasonable-looking spills/fills on a basic .ll test with catchpad EH. I have yet to verify whether the right thing happens w.r.t. reporting the spill slots as live when we subsequently generate the stackmaps.

The main thing that the update addresses is that the "insert temp placeholders after statepoints" strategy for updating liveness w.r.t. depot arguments and extended base pointer lifetimes is that it conflicts with allowing unsplit critical exception edges. Consider:

  br i1 _, label %left, label %right
left:
  invoke @_
    to label %left.cont unwind label %pad
left.cont:
  _ = <op> ty addrspace(1)* %x
right:
  invoke @_
    to label %right.cont unwind label %pad
right.cont:
  _ = <op> ty addrspace(1)* %y
pad:
 <code that doesn't use %x or %y>

We'd compute that %x (resp. %y) is live across the statepoint in %left (resp. %right), and go to put uses of their base pointers at the head of each successor, which puts uses of both %x's base and %y's base in %pad. Then we'd re-run liveness, and the uses of %x's base and %y's base would propagate not just back to the statepoints they are live across, but also out the other predecessor edge from the pad, making them live across each other's statepoints. There's an assertion that noticed and rejected this because the expectation is that the only lifetime extension that happens is that bases' lifetimes are extended down into regions where their derived pointers are live, but in this case the statepoint in %left (resp. %right) didn't have any uses derived from %y's base (resp. %x's base).

After much deliberation, I arrived at the conclusion that the best thing to do would be to push the two special liveness-processing rules down into the liveness-calculating code itself, which on the one hand is a shame to lose that separation of concerns but on the other hand I think worked out reasonably well. The two rules are:

  1. depot arguments need to be seen as live *across* the statepoints. Conveniently we can pick depot arguments out of their bundles and so explicitly add them in when computing what's live immediately after a statepoint
  2. base pointers need to be seen as live across any statepoint that one of their derived pointers is live across. For this I plumbed the DerivedToBase maps all the way down into the base liveness transfer function (opaquely through the intermediate functions) where it simply looks up the bases whenever it encounters a statepoint and adds them to the gens. I also have to explicitly add the bases in the routine that gets the live set after a statepoint to make them live through (and as I'm typing this I realize I forgot to add that part in the current code).

I'm thinking I'll need to split this out into several changes for actual review. I think conceptually stages along these lines would make reviewing easier:
1 - iterators for funclet EH successors (depends on nothing)
2 - allow spilling across statepoints on normal and/or exception paths (depends on nothing)
3 - replace the temp use holder insertion with explicit liveness computation adjustments (depends on nothing)
4 - stop splitting critical unwind edges out of invokes, allow PHIs in EH pads (depends on 2&3)
5 - support funclet EH and walking through catchswitches (depends on 1&4)

I haven't started yet sorting through how painful it would be to separate out those pieces, but fwiw my current thinking is that those are the logical pieces.

lib/Transforms/Scalar/RewriteStatepointsForGC.cpp
182

I just added a couple because I use them as parameters, which of course can't be auto. But yes your comment still applies to the others.

1224–1234

To get the right liveness gens, the walk needs to realize when it hits a statepoint and add in the bases. So here we build a map whose keys are the statepoint instructions. The values conceptually are really the sets of bases, but I didn't want to create and destroy a bunch of intermediate sets so the keys are actually pointers to the DerivedToBase maps.

2996

This is where I also need to explicitly add in the bases of the parameter statepoint.

  • add missing base pointer insertion
  • remove stale comment
JosephTremoulet marked an inline comment as done.Jan 29 2016, 8:10 PM

I notice that the spills manually inserted by RS4GC with these changes show up as "Direct" slots in the stackmaps, and that those spilled by lowering show up as "indirect". Looking at http://llvm.org/docs/StackMaps.html#stack-map-format, I can imagine either:

  1. that this is a problem and I need the new slots to be "Indirect" (since the GC pointers are in the slots, not pointing to the slots), or
  2. that this is fine because the runtime wants the slots reported to it (not merely the contents of the slots) so that it can update them

Are either of my conjectures correct? If they need to be reported as "indirect", any suggestions on how to implement that? Seems like annotating the allocas is a non-starter since metadata is droppable, which means we'd need to change the signature of the statepoint intrinsic?

(as far as LLILC is concerned, we only expect one sort of thing, so I could just as easily let the reporting fall out like it's doing and then just have the LLILC code ignore the direct/indirect distinction)

Thanks

I went to create the spill slots up-front. Looking at the code after doing that, I realized it makes it more readable to pull all of the spill/fill insertion into the new pass, so that its workings are akin to the rematerialization pass. So I did that.

I also took a stab at incorporating the feedback that I should change the name of insertRematerializationStores if I'm calling it in a new context. The best idea I had was to start using the term reconstitute to mean rematerialize or spill. So for any gc pointer live across a statepoint, we can "relocate" it via gc.relocate, or "rematerialize it" by copying its defining expression to after the statepoint, or "spill" it by storing it to a new alloca before the statepoint and loading it after, and the two options that aren't "relocate" are jointly "reconstitute". I updated some type/method names accordingly.

I also put in some TODOs for some things that we'll almost certainly want to do for CQ but that I think make sense to implement as follow-on changes:

  • smarter spill placement
  • distinguishing which edge(s) a live pointer is live out of and whether it's live just to be used in a successor phi, to avoid some superfluous spilling

At this point I think the code looks pretty much like I'd like it to look for check-in (barring whatever I discover needs changing as I write/run more tests and get feedback), so I'm going to focus on splitting this into smaller constituent changes that are easier to review, adding appropriate tests with each of them, and putting them up for "real" review, unless somebody objects.

cosmetic tidying

FYI, I've split this out into constituent changes (locally). I think at this point it makes sense to hold off uploading them individually to Phabricator until they're ready for real review (for fear I'd lose track of things otherwise), but I do have them pushed up to my GitHub fork for my own sake, so if someone happens to want to look at the changes that way, they are:

1 - rename a few "rematerialize"s to "reconstitute"s (split out to minimize noise) [no deps]
2 - enable spilling on normal and/or exception path [depends on 1]
3 - stop using temporary use holders [no deps]
4 - allow critical unwind edges when spilling on exception path [depends on 2 and 3]
5 - add iterators for "transitive" unwind destinations [no deps]
6 - support statepoints over funclet EH [depends on 4 and 5]

I'm currently working through tests (checking for / fixing regressions, adding new tests). The one main open question I still have is whether the spill slots need to be reported as "indirect" in the stack map, in which case I'll need to add a change that updates gc.statepoint's signature to accommodate that and rebase #2 on it.

JosephTremoulet abandoned this revision.Jun 14 2018, 1:18 PM