The transform is somewhat involved, but the basic idea is simple: find
derived pointers that have been offset from the base pointer using gep
and replace the relocate of the derived pointer with a gep to the
relocated base pointer (with the same offset).
Details
Diff Detail
Event Timeline
Small nits inline.
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
571 | This seems limiting, but correct; and a strict improvement over what we have. We could easily have a DenseMap<CallInst *, CallInst *> that we build in a first pass that maps derived gc relocates to base gc relocates; and write the rest of the algorithm in terms of that. | |
577 | Don't use dyn_cast<> unless you want to check if the cast succeeded or not. I think a cast<> is more appropriate here. Same applies to some later calls to dyn_cast. | |
600 | You've assumed that the gep index is a constant int here. Also, how about geps with multiple index operands? And vector geps? | |
603 | This can be IRBuilder<> Builder(II) I think. |
Lots of cleanup (both style and correctness) needed, but I like the direction. Please revise.
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
557 | First, thank you for working on this! Second, I'm not entirely sure this is the right place for this. It's possible that having this in InstCombine (alternatively or possible also) might open up IR level optimizations based on comparisons of relocated pointers. Thoughts? (Also, this is not a reason to hold your current patch.) | |
563 | You can combine these two tests into one. | |
568 | Add a comment about why 2 is the magic number. | |
571 | Er, maybe I'm misreading Sanjoy's comment, but I think his first sentence is wrong. There is no guarantee that the first gc.relocate encountered is a base relocation, | |
577 | Take a look at Statepoint.h for useful utility routines here. | |
586 | Organizing this as visiting each of the uses of a statepoint, and then visiting each statepoint individually would make this strictly stronger. Also, vastly preferred from a readability perspective. | |
595 | Not yet. But if you wanted to write an InstCombine, I'd be happy to review it... |
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
571 |
I agree. What I meant was that if the first gc.relocate is not a base relocation, then the worst case scenario is that we will miss the optimization, because that fact is checked later. |
Thanks for the quick reviews, sanjoy and reames!
This is the second cut. It doesn't address all the problems, but I
thought it's enough progress to show. I'll defer issues like where to
put this functionality until the patch is in good shape.
Let me know if this is not moving in the right direction.
Oh, oops: I just realized that a User of a statepoint may be a gc.result: I'll adjust that in the next iteration.
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
571 | Is DenseMap<InstrinsicInst *, SmallVector<User *, 2>> more like what we want? |
Revision 3. I implemented a generalized algorithm that doesn't
restrict base relocation being the first one. However, insertion is
now a challenge: the out-of-order test fails. Hopefully, I should have
revision 4 by afternoon tomorrow with everything working; we can then
discuss more high-level questions.
(Hopefully) final revision. Feedback necessary to proceed.
The .insertAfter was a bit tricky, but I figured it out in the
end. Any number of gep arguments should work now, but is there a way
to call gc.relocate such that it returns a vector or struct?
Not that's been tried or tested yet. Patches welcome and it looks like you already started working on that?
Much improved and almost there. Lots of comments, one bug.
FYI, readability wise I had some trouble following the code. Once you break it up, I might spot more things.
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
310 | You've got a whitespace problem here. Please fix indentation. Also, you may be invalidating the iterator you're using here. I'd strongly recommend collecting the statepoints into a SmallVector than processing them. (i.e. two pass) | |
313 | These changes should change the dom tree? Why is this needed? | |
566 | I'd suggest using whitespace (blank lines) to break up the logic junks. You might also want to split out a few static helpers. | |
590 | Your comment here confuses me. I think you mean that you'd have to insert the base relocate and then gep off that? | |
596 | Can't this become: | |
610 | Shouldn't the first part of this be impossible and thus an assert? | |
617 | Indentation! Are you using tabs? | |
617 | Early continue please, or if (auto Derived = ...) at least. You need to break up this function for readability. If you want specific suggestions, ask, but any reasonable division should do. | |
620 | I'm not sure doing this for *any* gep is a good idea. I'd suggest starting with a small constant offsets only. | |
624 | InBounds is wrong. out of bounds geps for relocates are explicitly legal. | |
626 | p.s. By this point in the code, I've completely lost track of what Target and Repl refer to. You need better variable names and to break this function up. |
Thanks for the detailed review! Expect the next iteration shortly.
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
310 | Whitespace: Nobody is maintaining the Emacs modes in-tree, so I'm using my own. I forgot about indent-tabs-mode. SmallVector done. Why will the iterator get invalidated though? I'm not modifying the instruction in SimplifyOffsetableRelocate, am I? | |
313 | Removed. I thought I needed this because the dominator tree is changing (when we reorder the base relocate to come before the derived relocate)? | |
590 | That's a TODO then: spitting out a base relocate and gep'ing off that when there are enough derived relocate calls. | |
596 | Um, yeah. | |
610 | Cruft left from my previous iterations. Converted into assert now. | |
617 | Not sure what you mean: you want a if (!Derived) continue; so the user doesn't have to read till the end of the block to figure out that it's continuing? | |
620 | I've defined "small" as 20: so if there ConstantInt offsets, all less than 20, then gep. | |
624 | Okay, fixed. | |
626 | If my variable names are too long, I hit the 80-column limit; If they're too short, they aren't descriptive enough :| I'll figure something out. |
In response to reames's review. Looks much better with body split up
into functions. Everything seems to work as expected.
Response to comments, not yet looking at new version.
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
310 | I did say *may*. After reading through your code, it was actually fine, but for clarity, I might do the two pass anyways. Your choice. | |
313 | Replacing instructions does not modify the cached dominator tree analysis. | |
617 | Less "read till the end" and more "not have to keep track of another nesting level" | |
626 | That is the constant tension. :) |
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
310 | Whitespace again. Do us both a favor and run this diff through clang-format. | |
543 | llvm style is computeBase... Make the param SmallVectorImpl so that it's size independent. | |
555 | This should be a cast since it's never expected to fail. | |
596 | I think GEPBase might be more clearly named "RelocatedBase". Also, "affect" is a slightly odd verb. Possibly: SimplifyOneRelocate? | |
598 | Please make MadeChange a normal return value rather than return by reference. | |
616 | I might write this as: continue; But this is minor. I'll defer to your preference. | |
618 | To more clearly seperate the legality and profitability, I might write this as: Your current version might be slightly faster, but it's IMO less clear. Minor. I'll defer to your preference. | |
625 | I'd suggest casting once on the previous line. | |
626 | I'm confused by the insertion placement games. Why isn't inserting the new GEP right after the relocation you're replacing okay? | |
676 | Giving names to Item.first and Item.second would make this code more clear. Temporary variables could be used. Creating a helper struct rather than a pair would also be fine. |
Expect a new iteration immediately.
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
543 | I see static bool SinkCast, static bool OptimizeNoopCopyExpression, static bool OptimizeCmpExpression, but I suppose it's good to follow the documented convention when writing new functions? simplifyOffsetableRelocate it is. Note: I can't do SmallVectorImpl inside the DenseMap. | |
596 | SimplifyOneRelocate is misleading because we're simplifying multiple derived-pointer relocates corresponding to the same base-pointer relocate. Perhaps SimplifyRelocatesOffABase? | |
618 | Get - GEP - Small - ConstantInteger - OffsetVector. I thought the name was clever and clear; my preference is not to change it. Also, I found a bug: it shouldn't return if it couldn't find a small constant-integer offset-vector; it should continue, so that other derived pointer relocations get a chance. | |
626 | Ha, this is fun. Consider: %ptr = getelementptr i32* %base, i32 15 %tok = call i32 (i1 ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_i1f(i1 ()* @return_i1, i32 0, i32 0, i32 0, i32* %base, i32* %ptr) %ptr-new = call i32* @llvm.experimental.gc.relocate.p0i32(i32 %tok, i32 4, i32 5) %base-new = call i32* @llvm.experimental.gc.relocate.p0i32(i32 %tok, i32 4, i32 4) %ret = load i32* %ptr-new Here, if I replace the %ptr-new line with a gep, against whom am I gep'ing? %base-new ofcourse, but that's on the next line! The problem is therefore to reorder instructions so that the new instruction comes after %base-new, no matter what the case. Makes sense? | |
676 | Hm? It's a pair because that's what you get when iterating over a DenseMap. I don't want to create new variables, so I've written comments about what Item.first and Item.second are, as a compromise. |
Address reames review. Most visible changes:
- Start function names with a lowercase letter.
- Use SmallVectorImpl in arguments, where possible.
- Run the thing through git-clang-format!
- Change some function and variable names.
LGTM. I assume you want me to commit on your behalf?
lib/CodeGen/CodeGenPrepare.cpp | ||
---|---|---|
617 | What's here is functional, and LGTM for submission. As a *separate* change, you could clarify the insertion logic slightly if you wanted to. I'd suggest either: OR find the last relocate in the statepoint sequence once, then pass that in as the insert point for all relocates you visit. Either makes it lot more clear what's going on. | |
test/Transforms/CodeGenPrepare/statepoint-relocate.ll | ||
24 | You might want to add a CHECK: gc.statepoint in between to make the grouping of GEPs more clear. Entirely optional. |
Thanks; I pushed it immediately because I had branches dependent on this. I'll get back to improving this soon'ish.
You've got a whitespace problem here. Please fix indentation.
Also, you may be invalidating the iterator you're using here. I'd strongly recommend collecting the statepoints into a SmallVector than processing them. (i.e. two pass)