If the only user of Instr is in a return or unreachable block, we can
sink Instr to the`User` safely (unless it reads/writes memory).
Return or unreachable blocks are guaranteed to execute zero
or one time, and Instr always dominates User, so they either will
be executed together (execution of User always implies execution
of Instr) or not executed at all.
Details
Diff Detail
Event Timeline
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp | ||
---|---|---|
3453 | This is not correct for things that throw, things that synchronize, e.g., warp shuffles on GPUs, (and things that loop forever once we have a way to disable forward propagation guarantee). Please add a readnone call test to expose the above. You can probably ask I->isSafeToRemove() && !I->mayReadFromMemory() to determine if it can be moved. Effectively, we might actually remove it from a path that does not end in the user instruction. If the path ends in the user instruction, the fact that we could remove it does mean the value is not interfering with something else. It also doesn't read memory, so we are good. We still have to fix isSafeToRemove wrt. the syncs and endless loops I mentioned above but that is a separate issue. Nit: Dominance relation broken? |
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp | ||
---|---|---|
3453 |
These things are checked inside TryToSinkInstruction. It only sinks non-side-effecting things. |
Fixed dominance check message. As for side-effecting, non-finishing etc. instructions, all comment applies to existing sinking as well, and this check is being done inside of TryToSinkInstruction.
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp | ||
---|---|---|
3448 | This can actually be reduced to mayReadMemory, but writes will be rejected in TryToSinkInstruction anyways. |
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp | ||
---|---|---|
3448 | I see. I think we should adjust the comment. Maybe we should not test the instruction at all and instead extend the TryToSink logic. | |
3453 | Agreed on the side-effects (which include throwing). The shuffles are a separate mess that is not part of this patch. looping forever is not yet a well defined semantic but UB so we are also good on that front. After reading the comment I was not looking into the TryToSink. |
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp | ||
---|---|---|
3448 | You are right, this check should be in TryToSinkInstruction. I just moved it there. |
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp | ||
---|---|---|
3453 | Actually I'm not sure about "looping forever", it is UB in C++ but I don't remember if LLVM has any conclusive answer whether it is UB in it or not. Anyways, this problem with infinite looping exists in current code as well. |
Thanks for the changes and explanations so far! I added another comment.
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp | ||
---|---|---|
3309 | I first thought: Then I realized this is also the test outside. Though, I still think the above is what we want here. | |
3453 | The "de-facto" answer for now is we assume forward progress guarantees (in various places). The full story is more complicated but hopefully we'll clean it up with an attribute soon. Anyway, not relevant here ;) You can also do: Terminator->getNumSuccessors() == 0; |
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp | ||
---|---|---|
3309 | This is not true. We don't want to sink from A to C in case: A B \ / C because BC may be a hot path and A is "nearly never executed" block. |
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp | ||
---|---|---|
3309 | Maybe the example above will be more clear like this: A | C<-| | | B--| Though, we want to sink from A to either B or C in case A / \ B C |
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp | ||
---|---|---|
3453 |
I think it's true, but I'd rather keep it as is to make it absolutely obvious what we are doing. |
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp | ||
---|---|---|
3309 | (I guess I should not look at code tired. Since I'm still doing it, please continue to be patient with me.) The first version of this was not checking the relation, right? Now, could we check the UserInst to be not a PHI to avoid the problematic cases and allow [I] / \ [] [] \ / [ = .. I] ` Do we want that? |
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp | ||
---|---|---|
3309 | UserInst being a Phi is a special case. General rule for InstCombine is that it does not increase number of instructions (unless it gives some clear benefits). If user in the block below is a phi, we cannot sink I without duplication. We are not doing this.
For loads - we don't, and the only reason of this is that [] is a potentially big piece of code that needs a full-scale alias analysis. InstCombine is supposed to be lightweight and it's not doing that. For not loads - my patch will handle this (supposed that user block ends with return). |
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp | ||
---|---|---|
3309 | Actually, if I is a Phi, and user block does not have other inputs except for those that use I - maybe we could sink, but it's going FAAAR beyond the scope of this patch. :) |
If the two tests I mentioned are added and work as expected, I think this is OK. I'm a little worried we move in one case where we shouldn't without the PHI user logic I describe below.
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp | ||
---|---|---|
3309 | The idea was that if the unique user of I (in the return block) is *not* a PHI, I must dominate the user and therefore it is executed at least as often as the user. If that is true, and I does not read memory (among other things) it is beneficial to move it regardless of the unique predecessor thing. If you wanted to prevent that, wouldn't you need the predecessor check for all instructions, not only the ones that may read, in order to *prevent moving* what I describe above. Asked differently, is %a below moved? (It is not right now: https://godbolt.org/z/Y5BxGQ) bb0: %a = add i32 %arg0, 1 br i1 %c, label %bb1, label %bb2 bb1: br label %bb3 bb2: br label %bb3 bb3: %p = phi i32 [0, %bb1], [1, %bb2] %r = add i32 %p, %a ret i32 %r I'd say we want the movement but we need a test. We should also have this as a test: https://godbolt.org/z/U8KrtS |
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp | ||
---|---|---|
3309 | Ok, I will add these tests. |
FWIW, the added tests look good.
I don't really see how the case of not sinking from A when A and B blocks merge in C can happen, if this applies for non-memory instructions only. But if it can, having a test for this would be good too.
The only case where this may theoretically happen is shown in test_05_neg. The Phi in return block bb3 initiates sinking from bb0 to bb2, but it gets rejected because bb2 has 2 preds. Another example of that (with non-phi instruction in ret block) is just not possible in SSA.
Right! I could not think of a case without a loop. Thank you for confirming!
This lgtm. Leaving the final review to @jdoerfert.
Hi @mkazantsev ,
Linaro benchmarking CI flagged this patch as increases code-size of SPEC2k6's 401.bzip2 by 3% on ARM (Thumb2 mode) and by 5% on AArch64. This happens at -Os -flto.
Would you please check if this triggers a corner-case in your patch or something else that we can easily fix?
Thanks!
I first thought:
This is more restrictive than you want it to be.
DestBlock != I->getParent()->getUniqueSuccessor is what u want.
(There is even a less strict restriction but that is more complicated.)
That allows DestBlock to have multiple predecessors (maybe add at test).
Then I realized this is also the test outside. Though, I still think the above is what we want here.