This is an archive of the discontinued LLVM Phabricator instance.

[IR] PR27065: Part2. Fix BasicBlock::removePredecessor to not break SSA form.
AbandonedPublic

Authored by dendibakh on Nov 5 2019, 11:28 AM.

Details

Summary

BasicBlock::removePredecessor replaces phi nodes with single entry even if it will generate self-referencing instructions and break SSA form. This patch fixes this behavior.

Logic regarding the change in CloneFunction.cpp:

  • We prevent replacing phi nodes in BasicBlock::removePredecessor only in the cases where it will break SSA form.
  • Since it was working before, i.e. verifier did not complain, it means that the code was dead. Then, it shouldn't matter if we skip blocks that still have phi nodes.

See example in the lit test in this patch.

Diff Detail

Event Timeline

dendibakh created this revision.Nov 5 2019, 11:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 5 2019, 11:28 AM

I doubt this is actually fixing anything. Unreachable code is generally allowed to be in non-SSA form. As noted in https://bugs.llvm.org/show_bug.cgi?id=27065, it is load-combine that is broken to look at unreachable code.

Unreachable code is generally allowed to be in non-SSA form.

Is it documented somewhere?
I think it adds certain limitations to the usage of some LLVM functionality. For example, if someone decide to link in detached blocks (after removePredecessor) again, (s)he will be surprised to find broken SSA code in there. It limits room for hacking certainly.

jdoerfert added a comment.EditedNov 6 2019, 3:09 PM

Unreachable code is generally allowed to be in non-SSA form.

Is it documented somewhere?

Probably is, I don't have a reference handy though.

I think it adds certain limitations to the usage of some LLVM functionality. For example, if someone decide to link in detached blocks (after removePredecessor) again, (s)he will be surprised to find broken SSA code in there. It limits room for hacking certainly.

While that is not untrue, there is a very simple solution: run trivial dead code removal first. What you are trying to do, prevent generation of degenerated unreachable IR, is an uphill battle on more fronts than you would imagine.

spatel added a comment.Nov 7 2019, 5:17 AM

Unreachable code is generally allowed to be in non-SSA form.

Is it documented somewhere?

Probably is, I don't have a reference handy though.

I think it adds certain limitations to the usage of some LLVM functionality. For example, if someone decide to link in detached blocks (after removePredecessor) again, (s)he will be surprised to find broken SSA code in there. It limits room for hacking certainly.

While that is not untrue, there is a very simple solution: run trivial dead code removal first. What you are trying to do, prevent generation of degenerated unreachable IR, is an uphill battle on more fronts than you would imagine.

A same/similar question about this code in BasicBlock was raised on llvm-dev here:
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089091.html

Unfortunately, I don't think the legality of the invalid SSA is documented anywhere. But as we can see, it passes the IR verifier.

As noted in D69823, I've made several patches to other passes to work-around the issue. It would be great if we didn't have to do that, but AFAIK, we're stuck.

A same/similar question about this code in BasicBlock was raised on llvm-dev here:
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089091.html

Unfortunately, I don't think the legality of the invalid SSA is documented anywhere. But as we can see, it passes the IR verifier.

As noted in D69823, I've made several patches to other passes to work-around the issue. It would be great if we didn't have to do that, but AFAIK, we're stuck.

Thanks for the link Sanjay,
However, it's still not clear to me why broken unreachable code is better than valid unreachable code? Is it a hidden way to test that passes do not look into unreachable code?

A same/similar question about this code in BasicBlock was raised on llvm-dev here:
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089091.html

Unfortunately, I don't think the legality of the invalid SSA is documented anywhere. But as we can see, it passes the IR verifier.

As noted in D69823, I've made several patches to other passes to work-around the issue. It would be great if we didn't have to do that, but AFAIK, we're stuck.

Thanks for the link Sanjay,

The most common way to make sure passes do not choke on unreachable IR is to replace the loop
for (BasicBlock &BB : Fn)
with
for (BasicBlock &BB: depth_first(Fn))
which will never visit the unreachable parts.

However, it's still not clear to me why broken unreachable code is better than valid unreachable code?

Because you have to add logic all over the place to ensure this condition.

Is it a hidden way to test that passes do not look into unreachable code?

Not that I know of. You can write a test case with really broken unreachable IR and run all passes on it. That would actually be a really cool thing to do.

The general problem here is, how do you define dominance for unreachable code? LLVM makes the choice that every basic block dominates an unreachable basic block. This leads to the slightly strange result that an instruction can use itself... but there isn't really any good alternative given we allow unreachable basic blocks.

In that context, trying to avoid generating instructions that use themselves isn't really productive; you can get equally confusing results in other ways.

We could consider just saying that unreachable blocks aren't valid. Last time I looked, that didn't really seem feasible because there were a bunch of transforms that could leave around unreachable code, without any easy way to check. But there's been a bunch of work to preserve the dominator tree in more places, so maybe it's easier now.

I'm not arguing that passes should not look into unreachable code.

But in this exact case, BasicBlock::removePredecessor is pretty much general API, not limited to particular pass. We are not 100% sure that the broken code will be unreachable.
Consider this:

define i32 @f(i1 %cmp, i8 *%call) {
entry:
  br i1 %cmp, label %phi.bb, label %ret.bb
phi.bb:
  %ptr.1 = phi i8* [ %call, %entry],[ %inc.ptr1, %use.bb ]
  br label %use.bb
use.bb:
  %inc.ptr1 = getelementptr inbounds i8, i8* %ptr.1, i64 1
  br i1 %cmp, label %ret.bb, label %phi.bb
ret.bb:
  ret i32 0
}

After doing:

PhiBB->removePredecessor(EntryBB);

We have:

define i32 @f(i1 %cmp, i8* %call) {
entry:
  br i1 %cmp, label %phi.bb, label %ret.bb				<== edge entry->phi.bb will be removed later
phi.bb:
  br label %use.bb
use.bb:
  %inc.ptr1 = getelementptr inbounds i8, i8* %inc.ptr1, i64 1 		<== broken
  br i1 %cmp, label %ret.bb, label %phi.bb
ret.bb:
  ret i32 0
}

User can later decide to link phi.bb into some other place (outlining, for example). Can that happen? If yes, it is not unreachable code anymore.

To summarize my point: even if we know the code is dead and noone will look into it, I don't see benefits of allowing it to be broken. In this particular case, (I hope) it will require no additional changes, since passes do not look into unreachable code and do not care if it is valid or not.

I'm not arguing that passes should not look into unreachable code.

But in this exact case, BasicBlock::removePredecessor is pretty much general API, not limited to particular pass. We are not 100% sure that the broken code will be unreachable.
Consider this:

define i32 @f(i1 %cmp, i8 *%call) {
entry:
  br i1 %cmp, label %phi.bb, label %ret.bb
phi.bb:
  %ptr.1 = phi i8* [ %call, %entry],[ %inc.ptr1, %use.bb ]
  br label %use.bb
use.bb:
  %inc.ptr1 = getelementptr inbounds i8, i8* %ptr.1, i64 1
  br i1 %cmp, label %ret.bb, label %phi.bb
ret.bb:
  ret i32 0
}

After doing:

PhiBB->removePredecessor(EntryBB);

We have:

define i32 @f(i1 %cmp, i8* %call) {
entry:
  br i1 %cmp, label %phi.bb, label %ret.bb				<== edge entry->phi.bb will be removed later
phi.bb:
  br label %use.bb
use.bb:
  %inc.ptr1 = getelementptr inbounds i8, i8* %inc.ptr1, i64 1 		<== broken
  br i1 %cmp, label %ret.bb, label %phi.bb
ret.bb:
  ret i32 0
}

While in the middle of a code rewrite, code can be broken. I don't see a way around that.

User can later decide to link phi.bb into some other place (outlining, for example). Can that happen? If yes, it is not unreachable code anymore.

Arguably, you can do that but you can do a lot of things that will result in problems. If you disconnect a block with a phi you are already at a point that does not make much sense. You argue that keeping the phi around as a "known" value that needs to be replaced once the block is reconnected, though it seems more like a workaround to me. Once disconnected, IR looses it's semantic, especially PHI nodes.

To summarize my point: even if we know the code is dead and noone will look into it, I don't see benefits of allowing it to be broken. In this particular case, (I hope) it will require no additional changes, since passes do not look into unreachable code and do not care if it is valid or not.

The benefit is we don't need special cases trying to not "break" it:

  • They will probably never be complete, thus do not remove the need to deal with unreachable code appropriately.
  • They will probably introduce conservative regressions and workarounds, as the one you added: https://reviews.llvm.org/D69865#C1676614NL685
lebedev.ri requested changes to this revision.Nov 10 2019, 9:33 AM

(as per disscussion in https://reviews.llvm.org/D69865)

This revision now requires changes to proceed.Nov 10 2019, 9:33 AM
lebedev.ri resigned from this revision.Jan 12 2023, 4:43 PM

This review seems to be stuck/dead, consider abandoning if no longer relevant.

This revision now requires review to proceed.Jan 12 2023, 4:43 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2023, 4:43 PM
Herald added a subscriber: StephenFan. · View Herald Transcript
dendibakh abandoned this revision.Jan 13 2023, 3:41 AM