This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Phi inputs that come from an unreachable block should be undef.
AbandonedPublic

Authored by vporpo on Jun 2 2022, 10:02 AM.

Details

Summary

Up until now when we generate a vector phi we create a poison value if the input
comes from a BB that is not reachable from the input.

For example:

bb1
  %foo = ...
  br bb2

unreachable_bb:
  br bb2

bb2:
  %phi0 = phi [%foo0, %bb1], [0, %unreachable_bb]
  %phi1 = phi [%foo1, %bb1], [1, %unreachable_bb]

When we vectorize the phi, we are generating:

%vec_phi = phi [%vec_foo, %bb1], [ poison, %unreachable_bb ]

I think that this is incorrect, the input should be undef and not poison.
The value coming from an unreachable block from entry is uninitialized, which
IIUC is modeled by undef.

Diff Detail

Event Timeline

vporpo created this revision.Jun 2 2022, 10:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 2 2022, 10:02 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
vporpo requested review of this revision.Jun 2 2022, 10:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 2 2022, 10:02 AM
This revision is now accepted and ready to land.Jun 2 2022, 10:35 AM

The value coming from an unreachable block from entry is uninitialized

There is no "value coming from an unreachable block", we cannot ever get to an unreachable block, that is what unreachable means. Could you give some more background on the bigger problem you are trying to solve here?

nikic requested changes to this revision.Jun 2 2022, 10:43 AM
nikic added a reviewer: nlopes.
nikic added a subscriber: nikic.

Using poison for values from unreachable blocks is legal and preferred.

This revision now requires changes to proceed.Jun 2 2022, 10:43 AM

Using poison for values from unreachable blocks is legal and preferred.

Agreed, the proposed change is not desirable.
If you are hitting some miscompilation, the bug is elsewhere not here.

MaskRay accepted this revision.EditedJun 2 2022, 10:58 AM
MaskRay added a subscriber: MaskRay.

Some SLPVectorizer changes near D114171 exposed a miscompile issue. Thanks for tracking it down.

This can be considered a workaround, but it will be better than the current miscompile.

Some SLPVectorizer changes near D114171 exposed a miscompile issue. Thanks for tracking it down.

There is D126877, which should fix it.

The issue is that the input to SLP contains phis with an undef input when the input is unreachable. Then SLP converts this to poison which I think is not legal. The end result is that enabling SLP breaks the benchmark.

A reduced example of the code that causes the issue looks like this:

bb1:
   br label %bb3
bb2:
  br label %bb3
bb3:
 %phi0 = phi i8 [ %a0, %bb1 ], [ 1, %bb2]
 %phi1 = phi i8 [ %a1, %bb1 ], [ undef, %bb2 ]
 %zext0 = zext i8 %phi0 to i32
 %zext1 = zext i8 %phi1 to i32
 %or = or i32 %zext0, %zext1

%phi1 has an undef input, so %zext1 contains an i32 value with the lower 8-bits being undef. So the %or has the 24 high order bits zero.

SLP converts the undef to poison, which if I understand correctly causes %zext1 and %or to be poison, so no guarantees about the 24 high order bits being zero.

OK so if poison is the right value from unreachable blocks, then the issue must be in the pass that generates the undefs in the first place?

nlopes requested changes to this revision.Jun 2 2022, 11:15 AM

The issue is that the input to SLP contains phis with an undef input when the input is unreachable. Then SLP converts this to poison which I think is not legal.

It is legal. A value from an unreachable branch is not observable, so you can pick whatever you want. We prefer poison as it's the nicest one (as it can be replaced by anything).

A reduced example of the code that causes the issue looks like this:

bb1:
   br label %bb3
bb2:
  br label %bb3
bb3:
 %phi0 = phi i8 [ %a0, %bb1 ], [ 1, %bb2]
 %phi1 = phi i8 [ %a1, %bb1 ], [ undef, %bb2 ]
 %zext0 = zext i8 %phi0 to i32
 %zext1 = zext i8 %phi1 to i32
 %or = or i32 %zext0, %zext1

%phi1 has an undef input, so %zext1 contains an i32 value with the lower 8-bits being undef. So the %or has the 24 high order bits zero.

SLP converts the undef to poison

That's fine. Moreover, it can then replace %phi1 with %a1.

which if I understand correctly causes %zext1 and %or to be poison

That shouldn't happen unless it can prove that %a1 is poison.

OK so if poison is the right value from unreachable blocks, then the issue must be in the pass that generates the undefs in the first place?

No, because the undef is the entry of an unreachable block, so any value is fine. Maybe you've reduced the test case too much. But the bug is surely elsewhere.
Can you share the original test case?

I think that the issue boils down to whether: %zext = zext i8 poison to i32 is an i32 poison.

Instcombine is folding zext into phis like so:

For a phi with an undef input:

bb1:
  br label %bb3
bb2:
  br label %bb3
bb3:
  %phi = phi <2 x i8> [ %arg, %bb1 ], [ <i8 1, i8 undef>, %bb2 ]
  %zext = zext <2 x i8> %phi to <2 x i32>

Instcombine will fold the zext into phi like this:

bb1:
  %phi.cast = zext <2 x i8> %arg to <2 x i32>
  br label %bb3
bb2:                                              ; No predecessors!
  br label %bb3
bb3:                                              ; preds = %bb2, %bb1
  %phi = phi <2 x i32> [ %phi.cast, %bb1 ], [ <i32 1, i32 0>, %bb2 ]

But for a phi with a poison input it will generate this:

bb1:
  %phi.cast = zext <2 x i8> %arg to <2 x i32>
  br label %bb3
bb2:                                              ; No predecessors!
  br label %bb3
bb3:                                              ; preds = %bb2, %bb1   
  %phi = phi <2 x i32> [ %phi.cast, %bb1 ], [ <i32 1, i32 poison>, %bb2 ]

The input from %bb2 is now an i32 poison, which does not seem to hold the same properties as a zero-extended i8 poison.

I think that the issue boils down to whether: %zext = zext i8 poison to i32 is an i32 poison.

It is: https://llvm.org/docs/LangRef.html#poison-values

But for a phi with a poison input it will generate this:

bb1:
  %phi.cast = zext <2 x i8> %arg to <2 x i32>
  br label %bb3
bb2:                                              ; No predecessors!
  br label %bb3
bb3:                                              ; preds = %bb2, %bb1   
  %phi = phi <2 x i32> [ %phi.cast, %bb1 ], [ <i32 1, i32 poison>, %bb2 ]

The input from %bb2 is now an i32 poison, which does not seem to hold the same properties as a zero-extended i8 poison.

That's a correct transformation. zext poison == poison.

I'll say this one last time: the patch you proposed although it's correct, it's not desirable: it's a regression in terms of performance. If you are seeing a miscompilation, the bug is not in the line you've changed.

I think that the issue boils down to whether: %zext = zext i8 poison to i32 is an i32 poison.

It is, yes.

Instcombine is folding zext into phis like so:

For a phi with an undef input:

bb1:
  br label %bb3
bb2:
  br label %bb3
bb3:
  %phi = phi <2 x i8> [ %arg, %bb1 ], [ <i8 1, i8 undef>, %bb2 ]
  %zext = zext <2 x i8> %phi to <2 x i32>

Instcombine will fold the zext into phi like this:

bb1:
  %phi.cast = zext <2 x i8> %arg to <2 x i32>
  br label %bb3
bb2:                                              ; No predecessors!
  br label %bb3
bb3:                                              ; preds = %bb2, %bb1
  %phi = phi <2 x i32> [ %phi.cast, %bb1 ], [ <i32 1, i32 0>, %bb2 ]

But for a phi with a poison input it will generate this:

bb1:
  %phi.cast = zext <2 x i8> %arg to <2 x i32>
  br label %bb3
bb2:                                              ; No predecessors!
  br label %bb3
bb3:                                              ; preds = %bb2, %bb1   
  %phi = phi <2 x i32> [ %phi.cast, %bb1 ], [ <i32 1, i32 poison>, %bb2 ]

The input from %bb2 is now an i32 poison, which does not seem to hold the same properties as a zero-extended i8 poison.

It holds the same properties in all cases where we reach the block through bb.2, all zero of them. That's why poison should be fine: we know it's only for a case that cannot ever happen.

I'll say this one last time: the patch you proposed although it's correct, it's not desirable: it's a regression in terms of performance. If you are seeing a miscompilation, the bug is not in the line you've changed.

I understand that poison is the preferred input to the phis. But given that it is not, then shouldn't SLP try to maintain the program semantics? Because as I showed in the examples above, by converting an undef to a poison we are actually changing the properties of the values: a zero-extended undef has its high-order bits zero, but a zero-extended poison does not hold this property.

So perhaps if the input is already undef, we should try to keep it and not replace it with poison?

vporpo updated this revision to Diff 433848.Jun 2 2022, 12:46 PM

If the input is undef keep it, else create a poison value.

nikic requested changes to this revision.Jun 2 2022, 1:02 PM

I understand that poison is the preferred input to the phis. But given that it is not, then shouldn't SLP try to maintain the program semantics?

SLP does maintain program semantics: The phi value for an unreachable block doesn't matter, it can be poison, undef, 42, or anything else.

As has been asked multiple times, please explain the problem you're trying to fix in a larger context.

This revision now requires changes to proceed.Jun 2 2022, 1:02 PM
vporpo added a comment.Jun 2 2022, 1:35 PM

SLP does maintain program semantics: The phi value for an unreachable block doesn't matter, it can be poison, undef, 42, or anything else.

Yeah I guess you are right, If the result of the phi does not become poison when one of its inputs is a poison, then it does not change the program behavior.
I guess these SLP poison values were a red herring.

vporpo abandoned this revision.Jun 2 2022, 1:43 PM