This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Sink pure instructions down to return and unreachable blocks
ClosedPublic

Authored by mkazantsev on May 18 2020, 6:23 AM.

Details

Summary

If the only user of Instr is in a return or unreachable block, we can
sink Instr to the`User` safely (unless it reads/writes memory).
Return or unreachable blocks are guaranteed to execute zero
or one time, and Instr always dominates User, so they either will
be executed together (execution of User always implies execution
of Instr) or not executed at all.

Diff Detail

Event Timeline

mkazantsev created this revision.May 18 2020, 6:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 18 2020, 6:23 AM
jdoerfert requested changes to this revision.May 18 2020, 8:16 AM
jdoerfert added a subscriber: jdoerfert.
jdoerfert added inline comments.
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3458

This is not correct for things that throw, things that synchronize, e.g., warp shuffles on GPUs, (and things that loop forever once we have a way to disable forward propagation guarantee).

Please add a readnone call test to expose the above. You can probably ask I->isSafeToRemove() && !I->mayReadFromMemory() to determine if it can be moved. Effectively, we might actually remove it from a path that does not end in the user instruction. If the path ends in the user instruction, the fact that we could remove it does mean the value is not interfering with something else. It also doesn't read memory, so we are good.

We still have to fix isSafeToRemove wrt. the syncs and endless loops I mentioned above but that is a separate issue.


Nit: Dominance relation broken?

This revision now requires changes to proceed.May 18 2020, 8:16 AM
mkazantsev marked an inline comment as done.May 18 2020, 9:01 PM
mkazantsev added inline comments.
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3458

This is not correct for things that throw, things that synchronize, e.g., warp shuffles on GPUs, (and things that loop forever once we have a way to disable forward propagation guarantee).

These things are checked inside TryToSinkInstruction. It only sinks non-side-effecting things.

mkazantsev updated this revision to Diff 264781.EditedMay 18 2020, 9:09 PM

Fixed dominance check message. As for side-effecting, non-finishing etc. instructions, all comment applies to existing sinking as well, and this check is being done inside of TryToSinkInstruction.

mkazantsev marked an inline comment as done.May 18 2020, 9:10 PM
mkazantsev added inline comments.
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3453

This can actually be reduced to mayReadMemory, but writes will be rejected in TryToSinkInstruction anyways.

mkazantsev marked an inline comment as done.May 18 2020, 9:13 PM
jdoerfert added inline comments.May 18 2020, 9:27 PM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3453

I see. I think we should adjust the comment.

Maybe we should not test the instruction at all and instead extend the TryToSink logic.
TryToSink does the test side-effect test already and it can also deal with instructions reading memory, though only when sink to the successor. If this logic is in there we know what tests to apply and which ones not. We also have a clearer path forward for extensions. Maybe I miss a reason to keep the logic here?

3458

Agreed on the side-effects (which include throwing). The shuffles are a separate mess that is not part of this patch. looping forever is not yet a well defined semantic but UB so we are also good on that front. After reading the comment I was not looking into the TryToSink.

Move memory read check to where it truly belongs - inside TryToSinkInstruction.

mkazantsev marked an inline comment as done.May 18 2020, 9:33 PM
mkazantsev added inline comments.
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3453

You are right, this check should be in TryToSinkInstruction. I just moved it there.

Fixed typo in block comparison.

mkazantsev marked an inline comment as done.May 18 2020, 9:38 PM
mkazantsev added inline comments.
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3458

Actually I'm not sure about "looping forever", it is UB in C++ but I don't remember if LLVM has any conclusive answer whether it is UB in it or not. Anyways, this problem with infinite looping exists in current code as well.

Thanks for the changes and explanations so far! I added another comment.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3309

I first thought:
This is more restrictive than you want it to be.
DestBlock != I->getParent()->getUniqueSuccessor is what u want.
(There is even a less strict restriction but that is more complicated.)
That allows DestBlock to have multiple predecessors (maybe add at test).

Then I realized this is also the test outside. Though, I still think the above is what we want here.

3458

The "de-facto" answer for now is we assume forward progress guarantees (in various places). The full story is more complicated but hopefully we'll clean it up with an attribute soon. Anyway, not relevant here ;)


You can also do: Terminator->getNumSuccessors() == 0;

mkazantsev marked an inline comment as done.May 18 2020, 10:30 PM
mkazantsev added inline comments.
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3309

This is not true. We don't want to sink from A to C in case:

A    B
 \  /
  C

because BC may be a hot path and A is "nearly never executed" block.

mkazantsev marked an inline comment as done.May 18 2020, 10:35 PM
mkazantsev added inline comments.
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3309

Maybe the example above will be more clear like this:

A
| 
C<-|
|  |
B--|

Though, we want to sink from A to either B or C in case

  A
 /  \
B    C
mkazantsev marked an inline comment as done.May 18 2020, 10:51 PM
mkazantsev added inline comments.
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3458

You can also do: Terminator->getNumSuccessors() == 0;

I think it's true, but I'd rather keep it as is to make it absolutely obvious what we are doing.

jdoerfert added inline comments.May 18 2020, 11:04 PM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3309

(I guess I should not look at code tired. Since I'm still doing it, please continue to be patient with me.)

The first version of this was not checking the relation, right?

Now, could we check the UserInst to be not a PHI to avoid the problematic cases and allow

 [I]
 / \
[] []
 \ /
 [ = .. I]
`

Do we want that?

mkazantsev marked an inline comment as done.May 18 2020, 11:16 PM
mkazantsev added inline comments.
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3309

UserInst being a Phi is a special case. General rule for InstCombine is that it does not increase number of instructions (unless it gives some clear benefits). If user in the block below is a phi, we cannot sink I without duplication. We are not doing this.

Do we want that?

For loads - we don't, and the only reason of this is that [] is a potentially big piece of code that needs a full-scale alias analysis. InstCombine is supposed to be lightweight and it's not doing that.

For not loads - my patch will handle this (supposed that user block ends with return).

mkazantsev marked an inline comment as done.May 18 2020, 11:17 PM
mkazantsev added inline comments.
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3309

Actually, if I is a Phi, and user block does not have other inputs except for those that use I - maybe we could sink, but it's going FAAAR beyond the scope of this patch. :)

If the two tests I mentioned are added and work as expected, I think this is OK. I'm a little worried we move in one case where we shouldn't without the PHI user logic I describe below.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3309

The idea was that if the unique user of I (in the return block) is *not* a PHI, I must dominate the user and therefore it is executed at least as often as the user. If that is true, and I does not read memory (among other things) it is beneficial to move it regardless of the unique predecessor thing. If you wanted to prevent that, wouldn't you need the predecessor check for all instructions, not only the ones that may read, in order to *prevent moving* what I describe above. Asked differently, is %a below moved? (It is not right now: https://godbolt.org/z/Y5BxGQ)

bb0:
  %a = add i32 %arg0, 1
  br i1 %c, label %bb1, label %bb2
bb1:
  br label %bb3
bb2:
  br label %bb3
bb3:
  %p = phi i32 [0, %bb1], [1, %bb2]
  %r = add i32 %p, %a
  ret i32 %r

I'd say we want the movement but we need a test.


We should also have this as a test: https://godbolt.org/z/U8KrtS
It shows the argument you made earlier, don't sink into more often executed block.

mkazantsev marked an inline comment as done.May 19 2020, 7:52 AM
mkazantsev added inline comments.
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3309

Ok, I will add these tests.

mkazantsev marked an inline comment as done.

Added tests.

FWIW, the added tests look good.

I don't really see how the case of not sinking from A when A and B blocks merge in C can happen, if this applies for non-memory instructions only. But if it can, having a test for this would be good too.

FWIW, the added tests look good.

I don't really see how the case of not sinking from A when A and B blocks merge in C can happen, if this applies for non-memory instructions only. But if it can, having a test for this would be good too.

The only case where this may theoretically happen is shown in test_05_neg. The Phi in return block bb3 initiates sinking from bb0 to bb2, but it gets rejected because bb2 has 2 preds. Another example of that (with non-phi instruction in ret block) is just not possible in SSA.

FWIW, the added tests look good.

I don't really see how the case of not sinking from A when A and B blocks merge in C can happen, if this applies for non-memory instructions only. But if it can, having a test for this would be good too.

The only case where this may theoretically happen is shown in test_05_neg. The Phi in return block bb3 initiates sinking from bb0 to bb2, but it gets rejected because bb2 has 2 preds. Another example of that (with non-phi instruction in ret block) is just not possible in SSA.

Right! I could not think of a case without a loop. Thank you for confirming!
This lgtm. Leaving the final review to @jdoerfert.

jdoerfert accepted this revision.May 21 2020, 7:16 PM

LGTM. Thanks!

This revision is now accepted and ready to land.May 21 2020, 7:16 PM
This revision was automatically updated to reflect the committed changes.

Hi @mkazantsev ,
Linaro benchmarking CI flagged this patch as increases code-size of SPEC2k6's 401.bzip2 by 3% on ARM (Thumb2 mode) and by 5% on AArch64. This happens at -Os -flto.

Would you please check if this triggers a corner-case in your patch or something else that we can easily fix?

Thanks!