Page MenuHomePhabricator

[InstCombine] Improve TryToSink for side-effecting calls that would be trivially dead
AbandonedPublic

Authored by anna on Sep 16 2021, 1:19 PM.

Details

Summary

This patch adds support to sink side-effecting calls that are legal to
sink to the successor use block. We can sink such calls as long as these
two conditions are satisfied:

  1. The instruction would be trivially dead (i.e. if there were no uses of the instruction, we can remove the instruction).
  2. There are no side-effecting instructions between the current one until the end of the block.

This helps with sinking allocations down to use blocks when safe to do
so (see added testcases).

Diff Detail

Event Timeline

anna created this revision.Sep 16 2021, 1:19 PM
anna requested review of this revision.Sep 16 2021, 1:19 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 16 2021, 1:19 PM
mkazantsev added inline comments.Sep 23 2021, 11:08 PM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3667

What if I has implicit control flow, e.g. may call system exit?

anna added inline comments.Sep 27 2021, 6:42 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3667

So, after this transform, we would have some non side-effecting instructions that are executed before this I that may call system exit. Why would this matter?

Note that if we were moving this instruction past some side-effecting (example not marked willreturn), there is a problem. But that isn't the case here.

Skimming the code for this and the prerequisite patch, I'm a bit uncomfortable. It really feels like you're trying to a) do several things in one change, and b) possibly abusing an interface for a purpose it wasn't intended.

I would suggest moving the no-alias-call bit to a separate patch as that seems unrelated to the core functionality.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3718

Hm, I am not sure here, but I think you should check for potential reads here as well.

My reasoning is as follows. While the instruction we're moving might be trivially dead if there are no users, it may *not* be trivially dead if there are users. There may be some externally arranged invariant that (say) the result is stored with a volatile variable *if-and-only-if* the call actually has a side effect.

As an example, consider the following:
a = malloc()
stat = malloc_library_counter()
if (c)

free(a)

Basically, is it legal to sink the malloc in this case? I legitimately don't know what the right answer is here.

Another example might be:
a = malloc()
stat = malloc_library_counter()
if (c)

(volatile i8*)g = a;
reames added inline comments.Sep 28 2021, 9:57 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3718

JFYI, my potential cases just described are definitely disallowed by the function comment in the header which describes trivial-deadness. Unfortunately, so is malloc. As such, that comment is definitely wrong, and needs updated. :)

anna added a comment.Sep 30 2021, 8:54 AM

Skimming the code for this and the prerequisite patch, I'm a bit uncomfortable. It really feels like you're trying to a) do several things in one change, and b) possibly abusing an interface for a purpose it wasn't intended.

For others following along: we plan to discuss offline. Will summarize result.

I would suggest moving the no-alias-call bit to a separate patch as that seems unrelated to the core functionality.

Unfortunately, the no-alias bit is required for the malloc, since the Scan instruction *starts* from the instruction we try to sink (i.e. the malloc call) and we fail at mayWriteToMemory. We can frame this as possibly a CT win and land it separately.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3718

I'm not sure if trivial-deadness is the property used in DSE for removing the malloc, but the case above is handled by DSE even if we have an externally arranged invariant through malloc_library_counter for example:

cat simple.ll

declare noalias i8* @malloc(i32) willreturn
declare i32 @malloc_library_counter()
define void @test20A() {
  %m = call i8* @malloc(i32 24) 
   %stat = call i32 @malloc_library_counter()
   store i8 0, i8* %m
   ret void
}

` opt -basic-aa -dse -S simple.ll` :

define void @test20A() {

%stat = call i32 @malloc_library_counter()
ret void

}

anna planned changes to this revision.Oct 1 2021, 7:05 AM

Had an offline discussion with Philip. Plan to separate out two APIs showing the difference between "isTriviallyDeadOnAllPaths" versus "isTriviallyDeadOnSomePaths".

anna updated this revision to Diff 390089.Nov 26 2021, 9:49 AM

separated out API

anna updated this revision to Diff 391088.Dec 1 2021, 11:10 AM

rebased

nikic requested changes to this revision.Dec 1 2021, 11:36 AM
nikic added inline comments.
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3709

Comparing with the previous code block, the DestBlock->getUniquePredecessor() != I->getParent() check is missing. With an unreachable/return terminator, there may be intervening blocks. Please add a test for this.

This revision now requires changes to proceed.Dec 1 2021, 11:36 AM
nikic added inline comments.Dec 1 2021, 11:55 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3714

I find the reasoning in terms of mayHaveSideEffects() in this patch confusing. A "side effect" in LLVM can either be a memory write, unwinding or divergence. I think the side effects you have in mind here are only of the "memory write" variety. I'd prefer if this code was more explicit about the actual correctness requirements.

llvm/test/Transforms/InstCombine/sink_instruction.ll
235

This test doesn't make sense to me, at least as a test for the new functionality. It would already fold beforehand because @log() is marked side-effect free. I think you were trying to test the isMathLibCallNoop() case, but I don't think it is relevant to your patch, because it requires constant arguments -- in which case the result will be constant folded anyway and the call removed.

nikic added a comment.Dec 1 2021, 11:57 AM

Do you have a use case for this apart from sinking allocations? If not, I would recommend restricting this to allocations only, because the generalized reasoning seems pretty subtle to me, and I'm not sure in which cases (apart from allocations) it would help.

anna added a comment.Dec 1 2021, 12:29 PM

Do you have a use case for this apart from sinking allocations? If not, I would recommend restricting this to allocations only, because the generalized reasoning seems pretty subtle to me, and I'm not sure in which cases (apart from allocations) it would help.

To be honest, I ended up writing a sinking allocations test support, just so that we have this code upstream :)
In our downstream use case, we have support for additional "trivially dead" instructions in java world, such as object.hashCode and identityHashCode. These modify state, so they are not readnone, but they can be sunk to uses (which is beneficial especially if use is on cold path).

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3714

I did mean instructions that are side-effecting, not just the subset of "memory write". For an allocation, it may be enough to worry about "memory writes" only, but we're looking at the entire set of Instructions that are trivially dead on unused paths (see main comment about downstream use cases for example).

bool Instruction::mayHaveSideEffects() const {
    return mayWriteToMemory() || mayThrow() || !willReturn();
  }

Consider this:

  %x = call @A() <-- mayThrow
  %y = call @B() <-- mayThrow`
...

BB:
  use (%x)

If the original behaviour of the program was such that call to @A can throw an exception and we can handle the exception and call @B also throws an exception, by sinking the call @A we have changed the order of exceptions.
Or even the case if call @B does not return.

I am not sure why we are making it more general here by relaxing the restrictions?

llvm/test/Transforms/InstCombine/sink_instruction.ll
235

Yes, you're right. It is not relevant. Makes sense to land separately.
I do not have a way to showcase a test since there's no test which would trigger in this patch upstream. At least pretty much everything we already have triggers without this patch (either they are side-effect free or the instruction itself has no uses such as true guards and assumes).

Unfortunately, the only testcase which works with this patch is the alloc case, but there was a request to separate out from this patch because it looks slightly orthogonal (D114648).

reames added a comment.Dec 1 2021, 1:56 PM

Thinking about it a bit, I think there are some interesting cases where we can sink something with an analyzeable side effect. An example I came up with is a large function with out-params which are unused in the caller.

Here's an attempt at a C example:

int foo();

extern int test(int *out) __attribute__((noinline));
int test(int *out) {
  *out = foo();
  return foo();
}

int wrapper() {

  int notdead;
  if (test(&notdead))
    return 0;

  int dead;
  int tmp = test(&dead);
  if (notdead)
    return tmp;
  return foo();
}

Key points in the example:

  • foo is used to represent a large *no-throw* block of code with hard to IPO results.
  • We have a parameter which is only written. (Today, we fail to infer this fact from IR oddly. We should fix that.)
  • We have an unused temporary created to pass to said out-param.
  • We have two or more callers, some of which *do* use the out-param.

In the IR, we'd look for something like:

  • An argmemonly nothrow willreturn function, with one or more write only params.
  • A callsite to said function where the memory passed is an alloca used only by lifetime markers and the call.

Such a call site can be sunk to the dataflow uses without changing the visibility of the side effect.

To be careful, this comment is meant to motivate the basic idea, not any particular implementation detail of this patch.

anna added a comment.Dec 3 2021, 6:23 AM

Thinking about it a bit, I think there are some interesting cases where we can sink something with an analyzeable side effect. An example I came up with is a large function with out-params which are unused in the caller.

Thanks Philip. This is a definitely interesting example worth handling. Let me update the code to consider this use case and precommit some of the tests. The use case we have downstream (as mentioned is object.hashCode), but this example makes a good usecase for upstream IMO.

nikic added inline comments.Dec 3 2021, 11:54 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3714

If wouldInstructionBeTriviallyDeadOnUnusedPaths() is true, then you are basically promising that call @A cannot throw, or at least we're allowed to pretend it can't. Otherwise it would clearly be illegal to sink the call past control flow in any case.

Now, do we care about whether call @B throws? If we sink call @A past a call that throws or diverges, then side-effects from call @A may not occur, but the whole premise of the optimization is that they do not matter. You are already sinking the call past explicit control flow into a different block, so I don't see why sinking it past implicit control flow would be relevant.

anna added inline comments.Dec 7 2021, 7:27 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3714

If wouldInstructionBeTriviallyDeadOnUnusedPaths() is true, then you are basically promising that call @A cannot throw, or at least we're allowed to pretend it can't. Otherwise it would clearly be illegal to sink the call past control flow in any case.

Yes, that's true. I missed that while writing the example.

If we sink call @A past a call that throws or diverges, then side-effects from call @A may not occur, but the whole premise of the optimization is that they do not matter.

I had added the more general "no side-effect instructions in between" constraint so that we do not change the observable behaviour of the program. By sinking it past a throwable call, we are changing the observable behaviour of the program (but your point below clarified the position for me).

You are already sinking the call past explicit control flow into a different block, so I don't see why sinking it past implicit control flow would be relevant.

Good point. I think with this logic, if we sunk some "allocation calls" to a block that wasn't executed at runtime, we are already changing the observable behaviour of the program and that is fine, since the allocation isn't used.

Thinking through couple of examples:

Example 1:

  %x = call @A(i32 %d) <-- side-effecting because it writes to memory
  %y = call @A(i32 %d2)
...
  bb:
   use (%x)
   use(%y)

In this case, the memory-write variant is enough to make sure that we do not reorder the side-effecting calls in the bb block. After the transform we will (iteratively) sink both calls to @A to the use block without reordering the instructions

Example 2:

  %x = call @A(i32 %d) <-- side-effecting because it will not return
  %y = call @A(i32 %d2)
...
  bb:
   use (%x)
   use(%y)

In Example2, we already fail at wouldInstructionBeTriviallyDeadOnUnusedPaths since instruction does not return. Nothing will be sunk to use.

The last one is the "sinking past a throwable call":

   %x = call @A(i32 %d) <-- side-effecting because it writes to memory
   call void @B() <-- can throw (i.e. not marked nounwind)
   ..
bb:
   use (%x)

This is same as the case with explicit control flow:

  %x = call @A(i32 %d) <-- side-effecting because it writes to memory
  br i1 %false, label %fail, label %pass

fail:
  use(%x)

pass:
 ...

In short, I think you've convinced me with the reasoning for just writes to memory constraint. I will add a detailed comment explaining the reasoning (especially since it wasn't clear to me at first).

anna updated this revision to Diff 393155.Dec 9 2021, 7:06 AM

rewrote for outparam usecase in simplest form

anna added a comment.Dec 9 2021, 7:08 AM

Note that this is a rewrite of original patch to handle sinking of function call with outparam(s). I have also added allocation test cases to show that these are not affected by the patch.

nikic added a comment.Dec 13 2021, 3:15 AM

I see that you've added the writeonly argument handling to this patch. I would prefer to split that off into a separate one, because it's not directly related. Even if it means that this lands without any upstream test changes at first.

I think something that your reasoning on writeonly arguments may be missing is a capture by the call, in which case there might be an indirect read of the alloca later that is not covered by your analysis. Might not matter for the sinking as implemented here, but would make it illegal to drop the call.

anna added a comment.Dec 13 2021, 10:41 AM

I see that you've added the writeonly argument handling to this patch. I would prefer to split that off into a separate one, because it's not directly related. Even if it means that this lands without any upstream test changes at first.

I don't think we generally do this. I will need to add either the allocations or the paramcase soon-ish after this. But I see your point (since the change as-is does not affect any upstream code). Will wait if @reames has any concerns.

I think something that your reasoning on writeonly arguments may be missing is a capture by the call, in which case there might be an indirect read of the alloca later that is not covered by your analysis. Might not matter for the sinking as implemented here, but would make it illegal to drop the call.

That's true. Note that the capture should be by another argument passed to the call since the called function is argmemonly.

I see that you've added the writeonly argument handling to this patch. I would prefer to split that off into a separate one, because it's not directly related. Even if it means that this lands without any upstream test changes at first.

I don't think we generally do this. I will need to add either the allocations or the paramcase soon-ish after this. But I see your point (since the change as-is does not affect any upstream code). Will wait if @reames has any concerns.

I think we need to find a way to untangle this into something both reviewable and testable.

It occurs to me that the handling for writeonly calls is separable. In addition to allowing sinking, we can also simply *delete* such calls. As such, we should be able to write tests which exercise that logic on it's own without the remainder of the patch.

I think something that your reasoning on writeonly arguments may be missing is a capture by the call, in which case there might be an indirect read of the alloca later that is not covered by your analysis. Might not matter for the sinking as implemented here, but would make it illegal to drop the call.

That's true. Note that the capture should be by another argument passed to the call since the called function is argmemonly.

I agree, the current code has this bug.

Returning to something discussed previously, I would feel more comfortable if the side effect reasoning to this patch was initially restricted to memory writes. We can come back and generalize (the downstream case can be supported), but doing that in a separate change in case we miss something would make me more comfortable.

llvm/lib/Transforms/Utils/Local.cpp
508 ↗(On Diff #393155)

Please pull this out as a static function. This is too long for a lambda.

513 ↗(On Diff #393155)

This is too strict. You can have other arguments provided they don't also write.

It's also incorrect as you allow a writeonly param to an unrelated global which we can't remove.

What you want is to identify the writeonly params, and then check the uses in the search below correspond to them. You must prove that a) the alloca only reaches no-capture uses on the call, and b) that all writeonly params correspond to uses of the alloca.

541 ↗(On Diff #393155)

For future exploration: It really feels like the property here of "this is never read from" is something we should be able to compute and cache much more broadly. For instance, DSE will naturally find such cases, but not be able to sink them. I don't really know what to do about that, but it sure feels like we should be able to do something.

I posted D115829 to separate out the writeonly argument logic. This isn't a direct split as the logic isn't in the same utility function, but it gets the key bit of logic in and exercised. I tried doing a straight split first, and that caused a few more test diffs than I expected as we have a number of places using wouldBeTriviallyDead as utilities. We'll probably sink this down afterwards, but I wanted to minimize the code impact in the first patch.

anna abandoned this revision.Jan 11 2022, 7:56 AM

At this point, everything necessary is landed upstream through the different approach. See linked patches from Philip.