This is an archive of the discontinued LLVM Phabricator instance.

[LangRef] Clarify the semantics of branch on undef
ClosedPublic

Authored by aqjune on Mar 28 2020, 12:12 AM.

Details

Summary

This patch clarifies the semantics of branching on undef value.

Defining br undef as undefined behavior explains optimizations that use branch conditions, such as CVP (D76931) and GVN (propagateEquality).

For switch cond, it is defined to raise UB if cond is an expression containing undef && cond is not frozen &&
it may yield different values.
This allows that at the destination block the branch condition can be assumed to be frozen already (otherwise UB was already triggered).
This condition is slightly stricter than MemorySanitizer, which allows undef-y condition if it always leads to the same destination,
but it does not break MemorySanitizer because we are giving stricter constraint.

Diff Detail

Event Timeline

aqjune created this revision.Mar 28 2020, 12:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2020, 12:12 AM
aqjune added a subscriber: regehr.

I don't like the way you say "non-frozen undef" but maybe it is just me. For me, a frozen value is neither undef nor poison, full stop.
I also don't like "expression containing undef" which could be interpreted as and i1 false, undef, though that one is an OK branch condition.

llvm/docs/LangRef.rst
7068

Do we have this for switch also?

I also don't like "expression containing undef" which could be interpreted as and i1 false, undef, though that one is an OK branch condition.

There is a second condition which is that it should always yield the same value. I'll make this more explicit.

nikic added inline comments.Mar 28 2020, 2:18 AM
llvm/docs/LangRef.rst
3457

This definition seems unnecessary confusing, just saying "Branching on undef is undefined behavior" should be sufficient here. Undef is not frozen by definition. There's also no meaningful concept of "contains undef" in this context, as branch conditions are always scalar, so the value can only be undef or not be undef. Expressions reduce to undef or not undef in the usual ways.

7068

Yes, and also for indirectbr.

aqjune updated this revision to Diff 253353.Mar 28 2020, 9:32 AM

Make statements simpler, update switch/indirectbr

aqjune marked 4 inline comments as done.Mar 28 2020, 9:37 AM
aqjune added inline comments.
llvm/docs/LangRef.rst
3457

The tricky case is when a partially undef (e.g. undef | 1) is given to switch.
I rephrased the text to make it clear that switching on a value is UB if it is not frozen. This needed defining the notion of a frozen value above.

nlopes added inline comments.Mar 28 2020, 11:51 AM
llvm/docs/LangRef.rst
3457

Undef is a non-deterministic value that can evaluate to different values each time it is used, while "freeze undef" cannot.
That's the main difference. The way to detect this is to say that if the expression we branch on can evaluate to both true and false given the current state then it's UB.
Not sure we need to make this so explicit. Alive2 already has the details implemented.

BTW, @aqjune can you post the examples of optimizations that are incorrect if "br undef" is UB that I wrote the other day. Sorry, I don't have that email here at hand.

nlopes accepted this revision.Mar 28 2020, 11:51 AM
This revision is now accepted and ready to land.Mar 28 2020, 11:51 AM
fhahn added a comment.Mar 28 2020, 1:44 PM

We should probably also think about what to do with all the tests that contain branches on undef. Also, bugpoint should probably stop creating branches on undef with this spelled out as is?

aqjune marked an inline comment as done.Mar 29 2020, 7:53 AM

With this semantics, separating if (cond1 && cond2) into if (cond1) { if(cond2) { .. } } becomes incorrect. If cond1 is undef and cond2 is false, this introduces UB.
There are several places where this happens, such as test/Transforms/SimplifyCFG/switch_create.ll and test/SimpleLoopUnswitch/LIV-loop-condtion.ll. They can be fixed by introducing freeze.

We should probably also think about what to do with all the tests that contain branches on undef. Also, bugpoint should probably stop creating branches on undef with this spelled out as is?

We can make Alive2 automatically convert it to br (freeze undef), .. when running through unit tests, but making bugpoint directly print it will be helpful as well.

We should probably also think about what to do with all the tests that contain branches on undef. Also, bugpoint should probably stop creating branches on undef with this spelled out as is?

When I've updated other tests that would break with changed/improved undef semantics, I usually replace the 'undef' with a parameter like this:
rGfebcb24f1490
Maybe we can script that to fix most of the tests?

We should probably also think about what to do with all the tests that contain branches on undef. Also, bugpoint should probably stop creating branches on undef with this spelled out as is?

When I've updated other tests that would break with changed/improved undef semantics, I usually replace the 'undef' with a parameter like this:
rGfebcb24f1490
Maybe we can script that to fix most of the tests?

agreed!

This revision was automatically updated to reflect the committed changes.
fhahn added a comment.Apr 5 2020, 1:35 PM

I think the clarification allows to easily assume information from branch conditions without considering undef, but IIUC it makes introducing new branches harder? For example, the following would not be allowed, as it might introduce UB with the new definition, right? I mean that example below wouldn't work due to poison anyways, so I am not sure if it makes things actually worse in practice (there probably are other examples with values that possibly are undef rather than poison). But I think this kind of thing should definitely be highlighted somewhere as problematic (if my understanding is indeed correct). Introducing new branches out of 'thin' air is done in at least a couple of places (e.g. loop versioning, runtime checks and probably many more places).

define i32 @src(i32 %x) {
  ret i32 %x
}

define i32 @tgt(i32 %x) {
  %c = icmp eq i32 %x, 10
  br i1 %c, label %true, label %false

true:
  ret i32 10

false:
ret i32 %x
}

As Eli pointed out in the discussion for D76611, there are quite a few places that introduce new branches which are already disabled for functions with the SanitizeMemory attribute, as this causes problems in practice already. I guess those all would need to be fixed and until all code is using the same assumptions, there will be bugs as we have seen already. Is there already an effort in that direction? If not, it might be good to at least record & track the issues. I would be happy to get something started in bugzilla or something else.

nlopes added a comment.Apr 5 2020, 2:29 PM

Yes, introducing branches on a variable that may be undef/poison is not legal. However, you can use freeze to make it safe.
I think @aqjune fixed loop unswitching already. (don't recall if that was the reverted patch). It's true there a couple more places left to fix.

aqjune added a comment.Apr 5 2020, 5:37 PM

Yes, introducing branches is unsound if its condition can be undef or poison. LoopUnswitch was reverted due to performance issue sadly, but I'm reapplying similar changes that less affect the quality of generated code.
For example, https://reviews.llvm.org/D76179 , which freezes condition when transforming select to br , in CodeGenPrepare.
It will be great if such transformations are tracked & fixed.