This is an archive of the discontinued LLVM Phabricator instance.

Differential D109917

[InstCombine] Improve TryToSink for side-effecting calls that would be trivially dead
Abandoned · Public

Authored by anna on Sep 16 2021, 1:19 PM.

Details

Reviewers

reames
nikic
mkazantsev
apilipenko

Summary

This patch adds support for sinking side-effecting calls to the successor block that contains their use, when it is legal to do so. We can sink such calls as long as these two conditions are satisfied:

1. The instruction would be trivially dead (i.e. if there were no uses of the instruction, we could remove the instruction).
2. There are no side-effecting instructions between the current one and the end of the block.

This helps with sinking allocations down to use blocks when safe to do
so (see added testcases).
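The two conditions above can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not the patch's code: `ToyInst`, its flags, and `canSinkToUseBlock` are hypothetical, and the trivial-deadness query is reduced to a precomputed boolean rather than a real analysis call. As in the patch's new loop, the scan starts right after the candidate and stops before the block terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR instruction: only the properties the
// sinking check looks at are modelled, as precomputed flags.
struct ToyInst {
  std::string name;
  bool mayWriteToMemory = false;
  bool mayThrow = false;
  bool willReturn = true;
  // Assumed result of a trivial-deadness query: true if the instruction could
  // be deleted were its result unused (e.g. an allocation call).
  bool triviallyDeadIfUnused = false;

  bool mayHaveSideEffects() const {
    return mayWriteToMemory || mayThrow || !willReturn;
  }
};

// Decides whether the side-effecting instruction Block[Idx] may be sunk into
// the successor holding its only use. Block.back() plays the terminator.
bool canSinkToUseBlock(const std::vector<ToyInst> &Block, std::size_t Idx) {
  assert(Idx + 1 < Block.size() && "candidate must precede the terminator");
  const ToyInst &I = Block[Idx];
  assert(I.mayHaveSideEffects() && "model covers only the new case");

  // Condition 1: the instruction would be trivially dead without uses.
  if (!I.triviallyDeadIfUnused)
    return false;

  // Condition 2: no side-effecting instruction between I and the terminator.
  for (std::size_t J = Idx + 1; J + 1 < Block.size(); ++J)
    if (Block[J].mayHaveSideEffects())
      return false;
  return true;
}
```

For example, an allocation-like call followed only by an `add` and the branch is sinkable, while the same call followed by a store is not.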

Diff Detail

Event Timeline

anna created this revision. · Sep 16 2021, 1:19 PM

Herald added a subscriber: hiraditya. · Sep 16 2021, 1:19 PM

anna requested review of this revision. · Sep 16 2021, 1:19 PM

Herald added a project: Restricted Project. · Sep 16 2021, 1:19 PM

Herald added a subscriber: llvm-commits.

anna added a parent revision: D109916: Introduce API for stack state modifying instructions. · Sep 16 2021, 1:20 PM

Harbormaster completed remote builds in B124270: Diff 373051. · Sep 16 2021, 1:39 PM

mkazantsev added inline comments. · Sep 23 2021, 11:08 PM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
Line 3695: What if `I` has implicit control flow, e.g. may call system exit?

anna added inline comments. · Sep 27 2021, 6:42 AM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
Line 3695: So, after this transform, we would have some non-side-effecting instructions that are executed before this `I` that may call system exit. Why would this matter? Note that if we were moving this instruction past some side-effecting instruction (for example, one not marked `willreturn`), there would be a problem. But that isn't the case here.

Skimming the code for this and the prerequisite patch, I'm a bit uncomfortable. It really feels like you're trying to a) do several things in one change, and b) possibly abusing an interface for a purpose it wasn't intended.

I would suggest moving the no-alias-call bit to a separate patch as that seems unrelated to the core functionality.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
Line 3745: Hm, I am not sure here, but I think you should check for potential reads here as well. My reasoning is as follows. While the instruction we're moving might be trivially dead if there are no users, it may not be trivially dead if there are users. There may be some externally arranged invariant that (say) the result is stored to a volatile variable if and only if the call actually has a side effect. As an example, consider the following:

  a = malloc()
  stat = malloc_library_counter()
  if (c)
    free(a)

Basically, is it legal to sink the malloc in this case? I legitimately don't know what the right answer is here. Another example might be:

  a = malloc()
  stat = malloc_library_counter()
  if (c)
    (volatile i8*)g = a;

reames added inline comments. · Sep 28 2021, 9:57 AM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
Line 3745: JFYI, the potential cases I just described are definitely disallowed by the function comment in the header which describes trivial deadness. Unfortunately, so is malloc. As such, that comment is definitely wrong and needs to be updated. :)

In D109917#3027817, @reames wrote:

Skimming the code for this and the prerequisite patch, I'm a bit uncomfortable. It really feels like you're trying to a) do several things in one change, and b) possibly abusing an interface for a purpose it wasn't intended.

For others following along: we plan to discuss offline. Will summarize result.

I would suggest moving the no-alias-call bit to a separate patch as that seems unrelated to the core functionality.

Unfortunately, the no-alias bit is required for the malloc, since the Scan iterator *starts* from the instruction we try to sink (i.e. the malloc call), and so we fail at mayWriteToMemory. We can frame this as possibly a CT win and land it separately.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
Line 3745: I'm not sure if trivial deadness is the property used in DSE for removing the malloc, but the case above is handled by DSE even if we have an externally arranged invariant through `malloc_library_counter`, for example:

  $ cat simple.ll
  declare noalias i8* @malloc(i32) willreturn
  declare i32 @malloc_library_counter()

  define void @test20A() {
    %m = call i8* @malloc(i32 24)
    %stat = call i32 @malloc_library_counter()
    store i8 0, i8* %m
    ret void
  }

  $ opt -basic-aa -dse -S simple.ll
  define void @test20A() {
    %stat = call i32 @malloc_library_counter()
    ret void
  }

Had an offline discussion with Philip. Plan to separate out two APIs showing the difference between "isTriviallyDeadOnAllPaths" versus "isTriviallyDeadOnSomePaths".

anna mentioned this in rGf98a918d3495: [TrivialDeadness] Update function comment. · Oct 1 2021, 9:09 AM

anna mentioned this in D114647: [TrivialDeadness] Introduce API separating two different usages. · Nov 26 2021, 9:36 AM

separated out API

anna mentioned this in D114648: [InstCombine] Sink noaliasing calls that can have side-effects. · Nov 26 2021, 9:52 AM

anna added a child revision: D114648: [InstCombine] Sink noaliasing calls that can have side-effects. · Nov 26 2021, 9:53 AM

anna edited parent revisions, added: D114647: [TrivialDeadness] Introduce API separating two different usages; removed: D109916: Introduce API for stack state modifying instructions.

Harbormaster completed remote builds in B136259: Diff 390089. · Nov 26 2021, 10:04 AM

rebased

Harbormaster completed remote builds in B136970: Diff 391088. · Dec 1 2021, 11:11 AM

nikic requested changes to this revision. · Dec 1 2021, 11:36 AM

nikic added inline comments.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
Line 3736: Comparing with the previous code block, the `DestBlock->getUniquePredecessor() != I->getParent()` check is missing. With an unreachable/return terminator, there may be intervening blocks. Please add a test for this.
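A toy model of why the requested check matters: a scan over the tail of the source block says nothing about instructions in intermediate blocks, so it is only sufficient when the destination is entered solely from the source block. The CFG encoding and the names `PredMap`, `uniquePredecessor`, and `tailScanIsSufficient` are hypothetical; `uniquePredecessor` mimics the documented behaviour of `BasicBlock::getUniquePredecessor()`, which accepts several edges from the same block (e.g. multiple switch cases) but rejects distinct predecessor blocks.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical CFG encoding: block name -> predecessor edges. A block may be
// listed more than once, e.g. when several switch cases share a destination.
using PredMap = std::map<std::string, std::vector<std::string>>;

// Mimics BasicBlock::getUniquePredecessor(): returns the single distinct
// predecessor block, even if it reaches BB through several edges.
std::optional<std::string> uniquePredecessor(const PredMap &CFG,
                                             const std::string &BB) {
  auto It = CFG.find(BB);
  if (It == CFG.end() || It->second.empty())
    return std::nullopt;
  const std::string &First = It->second.front();
  for (const std::string &P : It->second)
    if (P != First)
      return std::nullopt;
  return First;
}

// Scanning only the tail of Src covers every instruction that executes
// between the candidate and Dest only if Dest is entered solely from Src.
bool tailScanIsSufficient(const PredMap &CFG, const std::string &Src,
                          const std::string &Dest) {
  assert(Src != Dest && "sinking within one block is not modelled");
  std::optional<std::string> P = uniquePredecessor(CFG, Dest);
  return P.has_value() && *P == Src;
}
```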

This revision now requires changes to proceed. · Dec 1 2021, 11:36 AM

nikic added inline comments. · Dec 1 2021, 11:55 AM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
Line 3741: I find the reasoning in terms of mayHaveSideEffects() in this patch confusing. A "side effect" in LLVM can be a memory write, unwinding, or divergence. I think the side effects you have in mind here are only of the "memory write" variety. I'd prefer if this code were more explicit about the actual correctness requirements.
llvm/test/Transforms/InstCombine/sink_instruction.ll
Line 235: This test doesn't make sense to me, at least as a test for the new functionality. It would already fold beforehand because `@log()` is marked side-effect free. I think you were trying to test the isMathLibCallNoop() case, but I don't think it is relevant to your patch, because it requires constant arguments, in which case the result will be constant folded anyway and the call removed.

Do you have a use case for this apart from sinking allocations? If not, I would recommend restricting this to allocations only, because the generalized reasoning seems pretty subtle to me, and I'm not sure in which cases (apart from allocations) it would help.

In D109917#3165100, @nikic wrote:

Do you have a use case for this apart from sinking allocations? If not, I would recommend restricting this to allocations only, because the generalized reasoning seems pretty subtle to me, and I'm not sure in which cases (apart from allocations) it would help.

To be honest, I ended up writing the allocation-sinking test support just so that we have this code upstream :)
In our downstream use case, we have support for additional "trivially dead" instructions in the Java world, such as object.hashCode and identityHashCode. These modify state, so they are not readnone, but they can be sunk to their uses (which is especially beneficial if the use is on a cold path).

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
Line 3741: I did mean instructions that are side-effecting, not just the subset of "memory write". For an allocation, it may be enough to worry about "memory writes" only, but we're looking at the entire set of instructions that are trivially dead on unused paths (see the main comment about downstream use cases, for example).

  bool Instruction::mayHaveSideEffects() const {
    return mayWriteToMemory() || mayThrow() || !willReturn();
  }

Consider this:

  %x = call @A()    <-- mayThrow
  %y = call @B()    <-- mayThrow
  ...
  BB:
    use(%x)

If the original behaviour of the program was such that the call to `@A` can throw an exception, we can handle that exception, and `call @B` also throws an exception, then by sinking `call @A` we have changed the order of the exceptions. The same applies if `call @B` does not return. I am not sure why we are making it more general here by relaxing the restrictions?
llvm/test/Transforms/InstCombine/sink_instruction.ll
Line 235: Yes, you're right. It is not relevant. Makes sense to land separately. I do not have a way to showcase a test, since there's no upstream test on which this patch would trigger. Pretty much everything we already have triggers without this patch (either the instructions are side-effect free, or the instruction itself has no uses, such as true guards and assumes). Unfortunately, the only test case that works with this patch is the allocation case, but there was a request to separate it out from this patch because it looks slightly orthogonal (D114648).

Thinking about it a bit, I think there are some interesting cases where we can sink something with an analyzable side effect. An example I came up with is a large function with out-params which are unused in the caller.

Here's an attempt at a C example:

int foo();

extern int test(int *out) __attribute__((noinline));
int test(int *out) {
  *out = foo();
  return foo();
}

int wrapper() {

  int notdead;
  if (test(&notdead))
    return 0;

  int dead;
  int tmp = test(&dead);
  if (notdead)
    return tmp;
  return foo();
}

Key points in the example:

- foo is used to represent a large *no-throw* block of code with hard-to-IPO results.
- We have a parameter which is only written. (Oddly, today we fail to infer this fact from the IR. We should fix that.)
- We have an unused temporary created to pass to said out-param.
- We have two or more callers, some of which *do* use the out-param.

In the IR, we'd look for something like:

- An argmemonly nothrow willreturn function, with one or more write-only params.
- A call site to said function where the memory passed is an alloca used only by lifetime markers and the call.

Such a call site can be sunk to the dataflow uses without changing the visibility of the side effect.

To be clear, this comment is meant to motivate the basic idea, not any particular implementation detail of this patch.
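The two IR-level conditions listed above can be encoded as a toy predicate. This is a sketch under stated assumptions: the callee's attributes and the alloca's users are summarized as plain data, and `CalleeAttrs`, `AllocaUse`, and `isSinkableOutParamCall` are hypothetical names (a real implementation would query the call's attributes and walk the alloca's users). `noUnwind` stands in for the "nothrow" requirement (the `nounwind` attribute in IR). It encodes only these two bullets and does not address the capture issue raised later in the review.

```cpp
#include <cassert>
#include <vector>

// Hypothetical summary of the callee's function attributes.
struct CalleeAttrs {
  bool argMemOnly = false;
  bool noUnwind = false;
  bool willReturn = false;
  bool hasWriteOnlyParam = false;
};

// Hypothetical classification of each user of the alloca passed to the call.
enum class AllocaUse { LifetimeMarker, TheCall, Other };

// Encodes the two IR-level conditions from the comment above.
bool isSinkableOutParamCall(const CalleeAttrs &F,
                            const std::vector<AllocaUse> &AllocaUsers) {
  assert(!AllocaUsers.empty() && "the call itself is a user of the alloca");
  // 1. argmemonly nothrow willreturn, with at least one write-only param.
  if (!(F.argMemOnly && F.noUnwind && F.willReturn && F.hasWriteOnlyParam))
    return false;
  // 2. The alloca is used only by lifetime markers and by the call.
  bool SawCall = false;
  for (AllocaUse U : AllocaUsers) {
    if (U == AllocaUse::Other)
      return false;
    if (U == AllocaUse::TheCall)
      SawCall = true;
  }
  return SawCall;
}
```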

In D109917#3165403, @reames wrote:

Thinking about it a bit, I think there are some interesting cases where we can sink something with an analyzable side effect. An example I came up with is a large function with out-params which are unused in the caller.

Thanks Philip. This is definitely an interesting example worth handling. Let me update the code to consider this use case and precommit some of the tests. The use case we have downstream is, as mentioned, object.hashCode, but this example makes a good use case for upstream IMO.

anna mentioned this in rG72750f00121e: [TrivialDeadness] Introduce API separating two different usages. · Dec 3 2021, 7:10 AM

nikic added inline comments. · Dec 3 2021, 11:54 AM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
Line 3741: If `wouldInstructionBeTriviallyDeadOnUnusedPaths()` is true, then you are basically promising that `call @A` cannot throw, or at least we're allowed to pretend it can't. Otherwise it would clearly be illegal to sink the call past control flow in any case. Now, do we care about whether `call @B` throws? If we sink `call @A` past a call that throws or diverges, then side-effects from `call @A` may not occur, but the whole premise of the optimization is that they do not matter. You are already sinking the call past explicit control flow into a different block, so I don't see why sinking it past implicit control flow would be relevant.

anna added inline comments. · Dec 7 2021, 7:27 AM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
Line 3741:

> If wouldInstructionBeTriviallyDeadOnUnusedPaths() is true, then you are basically promising that call @A cannot throw, or at least we're allowed to pretend it can't. Otherwise it would clearly be illegal to sink the call past control flow in any case.

Yes, that's true. I missed that while writing the example.

> If we sink call @A past a call that throws or diverges, then side-effects from call @A may not occur, but the whole premise of the optimization is that they do not matter.

I had added the more general "no side-effecting instructions in between" constraint so that we do not change the observable behaviour of the program. By sinking it past a call that may throw, we are changing the observable behaviour of the program (but your point below clarified the position for me).

> You are already sinking the call past explicit control flow into a different block, so I don't see why sinking it past implicit control flow would be relevant.

Good point. I think with this logic, if we sink some allocation calls to a block that isn't executed at runtime, we are already changing the observable behaviour of the program, and that is fine, since the allocation isn't used. Thinking through a couple of examples:

Example 1:

  %x = call @A(i32 %d)    <-- side-effecting because it writes to memory
  %y = call @A(i32 %d2)
  ...
  bb:
    use(%x)
    use(%y)

In this case, the memory-write variant is enough to make sure that we do not reorder the side-effecting calls in the bb block. After the transform we will (iteratively) sink both calls to `@A` to the use block without reordering the instructions.

Example 2:

  %x = call @A(i32 %d)    <-- side-effecting because it will not return
  %y = call @A(i32 %d2)
  ...
  bb:
    use(%x)
    use(%y)

In Example 2, we already fail at `wouldInstructionBeTriviallyDeadOnUnusedPaths` since the instruction does not return. Nothing will be sunk to the use.

The last one is "sinking past a call that may throw":

  %x = call @A(i32 %d)    <-- side-effecting because it writes to memory
  call void @B()          <-- can throw (i.e. not marked nounwind)
  ..
  bb:
    use(%x)

This is the same as the case with explicit control flow:

  %x = call @A(i32 %d)    <-- side-effecting because it writes to memory
  br i1 %false, label %fail, label %pass
  fail:
    use(%x)
  pass:
    ...

In short, I think you've convinced me with the reasoning for a writes-to-memory-only constraint. I will add a detailed comment explaining the reasoning (especially since it wasn't clear to me at first).
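The relaxation agreed on in this exchange (only memory writes between the candidate and the terminator need to block sinking) can be contrasted with the patch's original `mayHaveSideEffects()` scan in a toy model. The names are hypothetical, and the three flags stand in for `mayWriteToMemory()`, `mayThrow()`, and `willReturn()`, combined as in the `mayHaveSideEffects()` definition quoted earlier in the thread. In the "sinking past a call that may throw" example, `call void @B()` blocks the original scan but not the relaxed one, while an intervening memory write still blocks both.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flags standing in for mayWriteToMemory(), mayThrow() and
// willReturn() on each instruction of the source block.
struct Effects {
  bool writes = false;
  bool mayThrow = false;
  bool willReturn = true;
};

// Same combination as the mayHaveSideEffects() definition quoted in the review.
bool mayHaveSideEffects(const Effects &E) {
  return E.writes || E.mayThrow || !E.willReturn;
}

enum class ScanRule { AnySideEffect, MemoryWritesOnly };

// Returns true if some instruction strictly between the candidate (Block[Idx])
// and the terminator (Block.back()) prevents sinking under the given rule.
bool scanBlocksSinking(const std::vector<Effects> &Block, std::size_t Idx,
                       ScanRule Rule) {
  assert(Idx + 1 < Block.size() && "candidate must precede the terminator");
  for (std::size_t J = Idx + 1; J + 1 < Block.size(); ++J) {
    const Effects &E = Block[J];
    bool Blocks = Rule == ScanRule::AnySideEffect ? mayHaveSideEffects(E)
                                                  : E.writes;
    if (Blocks)
      return true;
  }
  return false;
}
```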

anna mentioned this in rG287fa2e16432: Update with additional tests for sinking calls to uses. · Dec 8 2021, 1:01 PM

rewrote for outparam usecase in simplest form

Note that this is a rewrite of the original patch to handle sinking of function calls with out-param(s). I have also added allocation test cases to show that these are not affected by the patch.

Harbormaster completed remote builds in B138443: Diff 393155. · Dec 9 2021, 8:10 AM

I see that you've added the writeonly argument handling to this patch. I would prefer to split that off into a separate one, because it's not directly related. Even if it means that this lands without any upstream test changes at first.

I think something that your reasoning on writeonly arguments may be missing is a capture by the call, in which case there might be an indirect read of the alloca later that is not covered by your analysis. Might not matter for the sinking as implemented here, but would make it illegal to drop the call.

In D109917#3188468, @nikic wrote:

I see that you've added the writeonly argument handling to this patch. I would prefer to split that off into a separate one, because it's not directly related. Even if it means that this lands without any upstream test changes at first.

I don't think we generally do this. I will need to add either the allocations or the paramcase soon-ish after this. But I see your point (since the change as-is does not affect any upstream code). Will wait if @reames has any concerns.

I think something that your reasoning on writeonly arguments may be missing is a capture by the call, in which case there might be an indirect read of the alloca later that is not covered by your analysis. Might not matter for the sinking as implemented here, but would make it illegal to drop the call.

That's true. Note that the capture should be by another argument passed to the call since the called function is argmemonly.

In D109917#3189682, @anna wrote:

In D109917#3188468, @nikic wrote:

I see that you've added the writeonly argument handling to this patch. I would prefer to split that off into a separate one, because it's not directly related. Even if it means that this lands without any upstream test changes at first.

I don't think we generally do this. I will need to add either the allocations or the paramcase soon-ish after this. But I see your point (since the change as-is does not affect any upstream code). Will wait if @reames has any concerns.

I think we need to find a way to untangle this into something both reviewable and testable.

It occurs to me that the handling for writeonly calls is separable. In addition to allowing sinking, we can also simply *delete* such calls. As such, we should be able to write tests which exercise that logic on its own without the remainder of the patch.

I think something that your reasoning on writeonly arguments may be missing is a capture by the call, in which case there might be an indirect read of the alloca later that is not covered by your analysis. Might not matter for the sinking as implemented here, but would make it illegal to drop the call.

That's true. Note that the capture should be by another argument passed to the call since the called function is argmemonly.

I agree, the current code has this bug.

Returning to something discussed previously, I would feel more comfortable if the side-effect reasoning in this patch were initially restricted to memory writes. We can come back and generalize (the downstream case can be supported), but doing that in a separate change, in case we miss something, would make me more comfortable.

llvm/lib/Transforms/Utils/Local.cpp
Line 508 (On Diff #393155): Please pull this out as a static function. This is too long for a lambda.
Line 513 (On Diff #393155): This is too strict. You can have other arguments provided they don't also write. It's also incorrect as you allow a writeonly param to an unrelated global which we can't remove. What you want is to identify the writeonly params, and then check the uses in the search below correspond to them. You must prove that a) the alloca only reaches no-capture uses on the call, and b) that all writeonly params correspond to uses of the alloca.
Line 541 (On Diff #393155): For future exploration: It really feels like the property here of "this is never read from" is something we should be able to compute and cache much more broadly. For instance, DSE will naturally find such cases, but not be able to sink them. I don't really know what to do about that, but it sure feels like we should be able to do something.
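One conservative reading of requirements (a) and (b) from the Line 513 comment, together with the note that other arguments are allowed only if they do not write, expressed as a per-argument check. Everything here is hypothetical: `ArgInfo` summarizes what a real implementation would derive from the call's parameter attributes (`writeonly`, `nocapture`, etc.) and from analysis of each operand. Under the function-level `argmemonly` assumption discussed above, this rules out both the capture problem nikic raised and a write-only parameter that points at an unrelated global.

```cpp
#include <cassert>
#include <vector>

// Hypothetical memory-access summary of one call parameter.
enum class Access { ReadNone, ReadOnly, WriteOnly, ReadWrite };

// Hypothetical per-argument summary of the call site.
struct ArgInfo {
  bool isTheAlloca; // operand is known to point to the candidate alloca
  Access access;    // derived from the parameter's attributes
  bool noCapture;   // parameter is marked nocapture
};

// Conservative reading of the review: the alloca may reach only nocapture,
// write-only parameters, and no other parameter may write memory.
bool argsAllowDroppingOrSinking(const std::vector<ArgInfo> &Args) {
  bool SawAlloca = false;
  for (const ArgInfo &A : Args) {
    if (A.isTheAlloca) {
      // (a) A captured pointer could be used to read the alloca later.
      if (!A.noCapture || A.access != Access::WriteOnly)
        return false;
      SawAlloca = true;
    } else if (A.access == Access::WriteOnly ||
               A.access == Access::ReadWrite) {
      // (b) A write through any other pointer, e.g. to an unrelated global,
      // would be an observable effect of the call.
      return false;
    }
  }
  return SawAlloca;
}
```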

I posted D115829 to separate out the writeonly argument logic. This isn't a direct split as the logic isn't in the same utility function, but it gets the key bit of logic in and exercised. I tried doing a straight split first, and that caused a few more test diffs than I expected as we have a number of places using wouldBeTriviallyDead as utilities. We'll probably sink this down afterwards, but I wanted to minimize the code impact in the first patch.

reames mentioned this in rGda41cfddcad6: Add test coverage for D109917 and variants. · Dec 22 2021, 7:38 PM

reames mentioned this in D116200: [instcombine] Allow sinking of calls with known writes to uses. · Dec 22 2021, 8:06 PM

At this point, everything necessary has landed upstream through a different approach. See the linked patches from Philip.

Revision Contents

Path                                                        Size
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp    27 lines
llvm/test/Transforms/InstCombine/sink_instruction.ll        170 lines

Diff 390089

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp

@@ 3,680 unchanged lines hidden @@ Instruction *InstCombinerImpl::visitFreeze(FreezeInst &I) {
   return nullptr;
 }
 
 /// Try to move the specified instruction from its current block into the
 /// beginning of DestBlock, which can only happen if it's safe to move the
 /// instruction past all of the instructions between it and the end of its
 /// block.
-static bool TryToSinkInstruction(Instruction *I, BasicBlock *DestBlock) {
+static bool TryToSinkInstruction(Instruction *I, BasicBlock *DestBlock,
+                                 TargetLibraryInfo &TLI) {
   assert(I->getUniqueUndroppableUser() && "Invariants didn't hold!");
   BasicBlock *SrcBlock = I->getParent();
 
-  // Cannot move control-flow-involving, volatile loads, vaarg, etc.
-  if (isa<PHINode>(I) || I->isEHPad() || I->mayHaveSideEffects() ||
-      I->isTerminator())
+  // Cannot move control-flow-involving instructions.
+  if (isa<PHINode>(I) || I->isEHPad() || I->isTerminator())
     return false;

   // Do not sink static or dynamic alloca instructions. Static allocas must
   // remain in the entry block, and dynamic allocas must not be sunk in between
   // a stacksave / stackrestore pair, which would incorrectly shorten its
   // lifetime.
   if (isa<AllocaInst>(I))
     return false;
@@ 17 unchanged lines hidden @@ if (DestBlock->getUniquePredecessor() != I->getParent())
       return false;
     for (BasicBlock::iterator Scan = I->getIterator(),
                               E = I->getParent()->end();
          Scan != E; ++Scan)
       if (Scan->mayWriteToMemory())
         return false;
   }

+  // Even if instructions have side-effects, if they would be trivially dead
+  // along all paths, DSE removes that instruction. We can apply the same logic
+  // here to eliminate the instruction along paths it is not used, by sinking
+  // the instruction to the use block (provided we do not sink them past
+  // side-effecting instructions).
+  if (I->mayHaveSideEffects()) {
+    if (!wouldInstructionBeTriviallyDeadOnPathsWithoutUse(I, &TLI))
+      return false;
+    for (BasicBlock::iterator Scan = std::next(I->getIterator()),
+                              E = std::prev(I->getParent()->end());
+         Scan != E; ++Scan) {
+      if (Scan->mayHaveSideEffects())
+        return false;
+    }
+  }
   I->dropDroppableUses([DestBlock](const Use *U) {
     if (auto *I = dyn_cast<Instruction>(U->getUser()))
       return I->getParent() != DestBlock;
     return true;
   });
   /// FIXME: We could remove droppable uses that are not dominated by
   /// the new position.
 
   BasicBlock::iterator InsertPos = DestBlock->getFirstInsertionPt();
   I->moveBefore(&*InsertPos);
(45 unchanged lines not shown; enclosing scope: `if (!DIIClones.empty()) {`)
// maintain the original order.		// maintain the original order.
for (auto &DIIClone : llvm::reverse(DIIClones)) {		for (auto &DIIClone : llvm::reverse(DIIClones)) {
DIIClone->insertBefore(&*InsertPos);		DIIClone->insertBefore(&*InsertPos);
LLVM_DEBUG(dbgs() << "SINK: " << *DIIClone << '\n');		LLVM_DEBUG(dbgs() << "SINK: " << *DIIClone << '\n');
}		}
}		}

return true;		return true;
}		}
		Lint: Pre-merge checks: clang-format: please reformat the code
		```
		- }
		+}
		```

bool InstCombinerImpl::run() {		bool InstCombinerImpl::run() {
while (!Worklist.isEmpty()) {		while (!Worklist.isEmpty()) {
// Walk deferred instructions in reverse order, and push them to the		// Walk deferred instructions in reverse order, and push them to the
// worklist, which means they'll end up popped from the worklist in-order.		// worklist, which means they'll end up popped from the worklist in-order.
while (Instruction *I = Worklist.popDeferred()) {		while (Instruction *I = Worklist.popDeferred()) {
// Check to see if we can DCE the instruction. We do this already here to		// Check to see if we can DCE the instruction. We do this already here to
// reduce the number of uses and thus allow other folds to trigger.		// reduce the number of uses and thus allow other folds to trigger.
(89 unchanged lines not shown; enclosing scope: `auto getOptionalSinkBlockForInst =`)
}		}
return None;		return None;
};		};

auto OptBB = getOptionalSinkBlockForInst(I);		auto OptBB = getOptionalSinkBlockForInst(I);
if (OptBB) {		if (OptBB) {
auto UserParent = OptBB;		auto UserParent = OptBB;
// Okay, the CFG is simple enough, try to sink this instruction.		// Okay, the CFG is simple enough, try to sink this instruction.
if (TryToSinkInstruction(I, UserParent)) {		if (TryToSinkInstruction(I, UserParent, TLI)) {
LLVM_DEBUG(dbgs() << "IC: Sink: " << *I << '\n');		LLVM_DEBUG(dbgs() << "IC: Sink: " << *I << '\n');
MadeIRChange = true;		MadeIRChange = true;
// We'll add uses of the sunk instruction below, but since		// We'll add uses of the sunk instruction below, but since
// sinking can expose opportunities for it's operands add		// sinking can expose opportunities for it's operands add
// them to the worklist		// them to the worklist
for (Use &U : I->operands())		for (Use &U : I->operands())
if (Instruction *OpI = dyn_cast<Instruction>(U.get()))		if (Instruction *OpI = dyn_cast<Instruction>(U.get()))
Worklist.push(OpI);		Worklist.push(OpI);
(432 unchanged lines not shown)
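As a rough illustration of the legality rule in the summary, here is a self-contained C++ sketch. It is not the patch's code and uses no LLVM APIs: `ToyInst`, its three flags, `canSinkSideEffectingCall`, and the `CheckReads` option are hypothetical names introduced only for this sketch. The function models just the new path for side-effecting calls: the call must be removable if it had no uses, and no later instruction in its block may have side effects. Setting `CheckReads` additionally rejects intervening instructions that may read memory, which is the stricter variant reames raises in the inline thread on InstructionCombining.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction model used only for this sketch;
// it is not LLVM's Instruction class.
struct ToyInst {
  bool MayHaveSideEffects;   // writes memory, may throw, or may not return
  bool MayReadFromMemory;    // e.g. a load or a readonly call
  bool WouldBeTriviallyDead; // removable if it had no uses (e.g. malloc)
};

// Models the new path for a side-effecting call at Block[Idx]: it may be
// sunk into a successor only if (1) it would be trivially dead without uses
// and (2) no later instruction in Block has side effects. If CheckReads is
// set, later instructions that may read memory also block the sink (the
// stricter rule suggested in review).
bool canSinkSideEffectingCall(const std::vector<ToyInst> &Block,
                              std::size_t Idx, bool CheckReads) {
  assert(Idx < Block.size() && "index out of range");
  if (!Block[Idx].WouldBeTriviallyDead)
    return false;
  for (std::size_t J = Idx + 1; J < Block.size(); ++J) {
    const ToyInst &Next = Block[J];
    if (Next.MayHaveSideEffects)
      return false;
    if (CheckReads && Next.MayReadFromMemory)
      return false;
  }
  return true;
}
```

In reames' first example, if `malloc_library_counter` is modeled as a readonly call, the two summary conditions allow the sink and only the `CheckReads` variant refuses it; if it is an unknown call (as declared in anna's DSE example), the second condition already refuses it.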

llvm/test/Transforms/InstCombine/sink_instruction.ll

	(35 unchanged lines not shown)
	; CHECK: bb:			; CHECK: bb:
	; CHECK-NEXT: [[X_ADDR_17:%.*]] = phi i32 [ [[X:%.*]], [[ENTRY:%.*]] ], [ [[X_ADDR_0:%.*]], [[BB2:%.*]] ]			; CHECK-NEXT: [[X_ADDR_17:%.*]] = phi i32 [ [[X:%.*]], [[ENTRY:%.*]] ], [ [[X_ADDR_0:%.*]], [[BB2:%.*]] ]
	; CHECK-NEXT: [[I_06:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP4:%.*]], [[BB2]] ]			; CHECK-NEXT: [[I_06:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP4:%.*]], [[BB2]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = icmp eq i32 [[X_ADDR_17]], 0			; CHECK-NEXT: [[TMP0:%.*]] = icmp eq i32 [[X_ADDR_17]], 0
	; CHECK-NEXT: br i1 [[TMP0]], label [[BB1:%.*]], label [[BB2]]			; CHECK-NEXT: br i1 [[TMP0]], label [[BB1:%.*]], label [[BB2]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[TMP1:%.*]] = add nsw i32 [[X_ADDR_17]], 1			; CHECK-NEXT: [[TMP1:%.*]] = add nsw i32 [[X_ADDR_17]], 1
	; CHECK-NEXT: [[TMP2:%.*]] = sdiv i32 [[TMP1]], [[X_ADDR_17]]			; CHECK-NEXT: [[TMP2:%.*]] = sdiv i32 [[TMP1]], [[X_ADDR_17]]
	; CHECK-NEXT: [[TMP3:%.*]] = tail call i32 @bar() #[[ATTR1:[0-9]+]]			; CHECK-NEXT: [[TMP3:%.*]] = tail call i32 @bar()
	; CHECK-NEXT: br label [[BB2]]			; CHECK-NEXT: br label [[BB2]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[X_ADDR_0]] = phi i32 [ [[TMP2]], [[BB1]] ], [ [[X_ADDR_17]], [[BB]] ]			; CHECK-NEXT: [[X_ADDR_0]] = phi i32 [ [[TMP2]], [[BB1]] ], [ [[X_ADDR_17]], [[BB]] ]
	; CHECK-NEXT: [[TMP4]] = add nuw nsw i32 [[I_06]], 1			; CHECK-NEXT: [[TMP4]] = add nuw nsw i32 [[I_06]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[TMP4]], 1000000			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[TMP4]], 1000000
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[BB4:%.*]], label [[BB]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[BB4:%.*]], label [[BB]]
	; CHECK: bb4:			; CHECK: bb4:
	; CHECK-NEXT: ret i32 [[X_ADDR_0]]			; CHECK-NEXT: ret i32 [[X_ADDR_0]]
	(158 unchanged lines not shown)
	sw.bb: ; preds = %entry, %entry			sw.bb: ; preds = %entry, %entry
	br label %sw.epilog			br label %sw.epilog

	sw.epilog: ; preds = %entry, %sw.bb			sw.epilog: ; preds = %entry, %sw.bb
	%sum.0 = phi i32 [ %add, %sw.bb ], [ %0, %dispatchBB ], [ %0, %dispatchBB ]			%sum.0 = phi i32 [ %add, %sw.bb ], [ %0, %dispatchBB ], [ %0, %dispatchBB ]
	ret i32 %sum.0			ret i32 %sum.0
	}			}

				declare void @check(i8*)
				declare noalias i8* @malloc(i32) willreturn nounwind
				declare void @check2(i8*, i8*)

				declare void @checkd(double)
				declare double @log(double) willreturn nounwind readnone
				define void @test7(i1 %cond, double %d) {
				; CHECK-LABEL: @test7(
				; CHECK-NEXT: br i1 [[COND:%.*]], label [[IF:%.*]], label [[ELSE:%.*]]
				; CHECK: if:
				; CHECK-NEXT: [[A:%.*]] = call double @log(double [[D:%.*]])
				; CHECK-NEXT: call void @checkd(double [[A]])
				; CHECK-NEXT: ret void
				; CHECK: else:
				; CHECK-NEXT: ret void
				;
				%A = call double @log(double %d)
				nikic: This test doesn't make sense to me, at least as a test for the new functionality. It would already fold beforehand because `@log()` is marked side-effect free. I think you were trying to test the isMathLibCallNoop() case, but I don't think it is relevant to your patch, because it requires constant arguments, in which case the result will be constant-folded anyway and the call removed.
				anna: Yes, you're right; it is not relevant, and it makes sense to land it separately. I do not have a way to showcase this patch with a test, since no existing upstream test triggers only with it: pretty much everything we already have triggers without this patch (either the instructions are side-effect free, or the instruction itself has no uses, such as true guards and assumes). Unfortunately, the only test case that works with this patch is the allocation case, but there was a request to split it out from this patch because it looks slightly orthogonal (D114648).
				br i1 %cond, label %if, label %else

				if:
				call void @checkd(double %A)
				ret void
				else:
				ret void
				}

				; TODO: We can sink down the allocation to its only use.
				define i8** @test_allocation_sink_bc(i1 %cond) {
				; CHECK-LABEL: @test_allocation_sink_bc(
				; CHECK-NEXT: [[A:%.*]] = call dereferenceable_or_null(16000) i8* @malloc(i32 16000)
				; CHECK-NEXT: br i1 [[COND:%.*]], label [[IF:%.*]], label [[ELSE:%.*]]
				; CHECK: if:
				; CHECK-NEXT: [[X:%.*]] = bitcast i8* [[A]] to i8**
				; CHECK-NEXT: ret i8** [[X]]
				; CHECK: else:
				; CHECK-NEXT: [[B:%.*]] = call dereferenceable_or_null(24000) i8* @malloc(i32 24000)
				; CHECK-NEXT: [[Y:%.*]] = bitcast i8* [[B]] to i8**
				; CHECK-NEXT: ret i8** [[Y]]
				;
				%A = call i8* @malloc(i32 16000)
				br i1 %cond, label %if, label %else

				if:
				%X = bitcast i8* %A to i8**
				ret i8** %X


				else:
				%B = call i8* @malloc(i32 24000)
				%Y = bitcast i8* %B to i8**
				ret i8** %Y
				}

				; TODO: We can sink down the allocation to its only use.
				define void @test_allocation_sink(i1 %cond) {
				; CHECK-LABEL: @test_allocation_sink(
				; CHECK-NEXT: [[A:%.*]] = call dereferenceable_or_null(16000) i8* @malloc(i32 16000)
				; CHECK-NEXT: br i1 [[COND:%.*]], label [[IF:%.*]], label [[ELSE:%.*]]
				; CHECK: if:
				; CHECK-NEXT: call void @check(i8* [[A]])
				; CHECK-NEXT: ret void
				; CHECK: else:
				; CHECK-NEXT: ret void
				;
				%A = call i8* @malloc(i32 16000)
				br i1 %cond, label %if, label %else

				if:
				call void @check(i8* %A)
				ret void

				else:
				ret void
				}

				; TODO: We can sink this malloc as well down to the use block.
				define void @test_allocation_sink_noalias_load(i1 %cond, i32 %i, i32* readonly %P) {
				; CHECK-LABEL: @test_allocation_sink_noalias_load(
				; CHECK-NEXT: [[A:%.*]] = call dereferenceable_or_null(16000) i8* @malloc(i32 16000)
				; CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[I:%.*]] to i64
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[P:%.*]], i64 [[IDXPROM]]
				; CHECK-NEXT: [[L:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
				; CHECK-NEXT: br i1 [[COND:%.*]], label [[IF:%.*]], label [[ELSE:%.*]]
				; CHECK: if:
				; CHECK-NEXT: call void @check(i8* [[A]])
				; CHECK-NEXT: ret void
				; CHECK: else:
				; CHECK-NEXT: [[X:%.*]] = call i32 @foo(i32 [[L]], i32 [[L]])
				; CHECK-NEXT: [[Y:%.*]] = call i32 @foo(i32 [[L]], i32 [[L]])
				; CHECK-NEXT: ret void
				;
				%A = call i8* @malloc(i32 16000)
				%idxprom = sext i32 %i to i64
				%arrayidx = getelementptr inbounds i32, i32* %P, i64 %idxprom
				%L = load i32, i32* %arrayidx, align 4
				br i1 %cond, label %if, label %else

				if:
				call void @check(i8* %A)
				ret void

				else:
				%X = call i32 @foo(i32 %L, i32 %L)
				%Y = call i32 @foo(i32 %L, i32 %L)
				ret void
				}

				; TODO: We can sink both allocations to their only use without them being
				; incorrectly reordered.
				define void @test_allocation_sink2(i1 %cond) {
				; CHECK-LABEL: @test_allocation_sink2(
				; CHECK-NEXT: [[A:%.*]] = call dereferenceable_or_null(16000) i8* @malloc(i32 16000)
				; CHECK-NEXT: [[B:%.*]] = call dereferenceable_or_null(16) i8* @malloc(i32 16)
				; CHECK-NEXT: br i1 [[COND:%.*]], label [[IF:%.*]], label [[ELSE:%.*]]
				; CHECK: if:
				; CHECK-NEXT: call void @check2(i8* [[A]], i8* [[B]])
				; CHECK-NEXT: ret void
				; CHECK: else:
				; CHECK-NEXT: ret void
				;
				%A = call i8* @malloc(i32 16000)
				%B = call i8* @malloc(i32 16)
				br i1 %cond, label %if, label %else

				if:
				call void @check2(i8* %A, i8* %B)
				ret void

				else:
				ret void
				}

				declare i8* @llvm.stacksave()
				declare void @llvm.stackrestore(i8*)
				; We cannot sink the stacksave to its only use (the stackrestore), since that would
				; incorrectly modify the stack state (there is a dynamic alloca in between).
				define void @test_neg_stacksave(i1 %cond, i32 %size) {
				; CHECK-LABEL: @test_neg_stacksave(
				; CHECK-NEXT: [[TMP:%.*]] = call i8* @llvm.stacksave()
				; CHECK-NEXT: [[TMP1:%.*]] = zext i32 [[SIZE:%.*]] to i64
				; CHECK-NEXT: [[A:%.*]] = alloca i8, i64 [[TMP1]], align 1
				; CHECK-NEXT: call void @check(i8* nonnull [[A]])
				; CHECK-NEXT: br label [[NEXT:%.*]]
				; CHECK: next:
				; CHECK-NEXT: call void @llvm.stackrestore(i8* [[TMP]])
				; CHECK-NEXT: br i1 [[COND:%.*]], label [[IF:%.*]], label [[ELSE:%.*]]
				; CHECK: if:
				; CHECK-NEXT: [[X:%.*]] = tail call i32 @bar()
				; CHECK-NEXT: ret void
				; CHECK: else:
				; CHECK-NEXT: ret void
				;
				%tmp = call i8* @llvm.stacksave( )
				%A = alloca i8, i32 %size
				call void @check(i8* %A)
				br label %next

				next:
				call void @llvm.stackrestore( i8* %tmp )
				br i1 %cond, label %if, label %else

				if:
				%X = tail call i32 @bar() nounwind
				ret void

				else:
				ret void
				}
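The negative stacksave test can be read through a similar toy model. The C++ sketch below is hypothetical and does not use or reproduce the stack-state API from the parent revision D109916; `ToyStackInst` and `stackStateAllowsSink` are illustrative names only. Sinking an instruction into a successor moves it past every later instruction in its block, so the sketch refuses the sink whenever the instruction and a later one both touch stack state and at least one of them modifies it, as with the `llvm.stacksave` in `@test_neg_stacksave` followed by a dynamic `alloca`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction model for illustration only; not LLVM's API.
struct ToyStackInst {
  bool ReadsStackState;    // e.g. llvm.stacksave
  bool ModifiesStackState; // e.g. a dynamic alloca or llvm.stackrestore
};

// Returns true if moving Block[Idx] past all later instructions in Block
// (i.e. into a successor) cannot reorder it with a conflicting stack-state
// access. Two accesses conflict when at least one of them modifies.
bool stackStateAllowsSink(const std::vector<ToyStackInst> &Block,
                          std::size_t Idx) {
  assert(Idx < Block.size() && "index out of range");
  const ToyStackInst &I = Block[Idx];
  for (std::size_t J = Idx + 1; J < Block.size(); ++J) {
    const ToyStackInst &Next = Block[J];
    bool NextTouches = Next.ReadsStackState || Next.ModifiesStackState;
    if ((I.ModifiesStackState && NextTouches) ||
        (I.ReadsStackState && Next.ModifiesStackState))
      return false;
  }
  return true;
}
```

Modeling the entry block of `@test_neg_stacksave` as stacksave, zext, dynamic alloca, call to `@check`, and branch, the stacksave at index 0 is rejected because the alloca at index 2 modifies stack state, while the zext at index 1 is unaffected.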