This is an archive of the discontinued LLVM Phabricator instance.

[ValueLattice] Distinguish between constant ranges with/without undef.
ClosedPublic

Authored by fhahn on Mar 27 2020, 8:46 AM.

Details

Summary

This patch updates ValueLattice to distinguish between ranges that are
guaranteed to not include undef and ranges that may include undef.

A constant range guaranteed to not contain undef can be used to simplify
instructions to arbitrary values. A constant range that may contain
undef can only be used to simplify to a constant. If the value can be
undef, it might take a value outside the range. For example, consider
the snipped below

define i32 @f(i32 %a, i1 %c) {

br i1 %c, label %true, label %false

true:

%a.255 = and i32 %a, 255
br label %exit

false:

br label %exit

exit:

%p = phi i32 [ %a.255, %true ], [ undef, %false ]
%f.1 = icmp eq i32 %p, 300
call void @use(i1 %f.1)
%res = and i32 %p, 255
ret i32 %res

}

In the exit block, %p would be a constant range [0, 256) including undef as
%p could be undef. We can use the range information to replace %f.1 with
false because we remove the compare, effectively forcing the use of the
constant to be != 300. We cannot replace %res with %p however, because
if %a would be undef %cond may be true but the second use might not be
< 256.

Currently LazyValueInfo uses the new behavior just when simplifying AND
instructions and does not distinguish between constant ranges with and
without undef otherwise. I think we should address the remaining issues
in LVI incrementally.

Diff Detail

Event Timeline

fhahn created this revision.Mar 27 2020, 8:46 AM
Herald added a reviewer: sstefan1. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: hiraditya. · View Herald Transcript
nikic added a subscriber: nikic.Mar 27 2020, 10:18 AM

As we have somewhat recently defined that branch on poison is UB, I wonder whether it's possible to also make branch on undef UB. It so happens that currently all the things we define as UB for poison are also UB for undef, and branching is the only distinction. Is there a reason for that? It would be really great if we could stop thinking about undef when it comes to conditions implied by branches.

Hi,
The semantics of whether br undef is nondet. choice or UB really has been a topic between me and Nuno too.
As @nikic said, we can define it as UB if which branch to take depends on an undef value.
And I think this semantics explains many more optimizations in LLVM as well (including the CVP example that removes and 255 :) ).
It seems this is not explicitly stated at LangRef; it would be great if it is mentioned to avoid further confusion.

aqjune added a subscriber: nlopes.Mar 27 2020, 10:46 AM

Hi,
The semantics of whether br undef is nondet. choice or UB really has been a topic between me and Nuno too.
As @nikic said, we can define it as UB if which branch to take depends on an undef value.

Yes, the example in the description is just the tip of the iceberg. If branch on undef would not be UB, there are many more cases that CVP would miss I think, unless we teach it to insert freeze I guess

The natural definition of undef extends to branches without any special extensions. It might make sense to redefine it to have some poison-like characteristics if that would make our life easier; I don't think any frontends would care. But I think really, you're just trading one set of problems for another. For example, if you have a switch on "x & -2", right now we can eliminate the "&". If branching on poison is UB, that requires freezing x.

Ultimately, I'm think trying to redefine undef at this point would act as a distraction for the poison work. That seems worse than whatever minor improvement we might get from redefining it.

If branching on poison is UB, that requires freezing x.

Err, I meant "branching on undef".

The natural definition of undef extends to branches without any special extensions. It might make sense to redefine it to have some poison-like characteristics if that would make our life easier; I don't think any frontends would care. But I think really, you're just trading one set of problems for another. For example, if you have a switch on "x & -2", right now we can eliminate the "&". If branching on poison is UB, that requires freezing x.

I didn't really get your example, could you maybe spell it out in more detail?

I believe we already have a simple upper bound on which optimizations we're going to lose if we make this change: MSan already considers branching on undef to be UB, and we already block optimizations that have the potential to introduce branch on undef where it did not exist before if MSan is enabled. From a quick look, there currently seem to be only two places related to branching:

https://github.com/llvm/llvm-project/blob/8896d123154c4606d11f6b6f97c33d5a455cd9ea/llvm/lib/Transforms/Utils/SimplifyCFG.cpp#L3809-L3812
https://github.com/llvm/llvm-project/blob/ee7510dc86656b739881466fddd59253d008139d/llvm/lib/Transforms/Scalar/LoopUnswitch.cpp#L719-L727

So there seem to be just two transforms that are potentially affected (and iirc LoopUnswitch was one of the motivating cases for freezing, so probably that would go away once that is done?)

On the other hand we have any code that is inferring information from predicates. If branch on undef is not UB, I suspect that most of CVP is rendered unsound/useless. Probably parts of SCEV that reason about branch conditions as well. And probably parts of ValueTracking, e.g. we use dominating conditions for non-zero checks. Thankfully at least assumes are not affected.

Ultimately, I'm think trying to redefine undef at this point would act as a distraction for the poison work. That seems worse than whatever minor improvement we might get from redefining it.

I'm not sure why this would be a distraction from the poison work. Aligning the UB definitions for undef and poison seems like something that would make that work more straightforward, as we have less different semantics to deal with.

I didn't really get your example, could you maybe spell it out in more detail?

http://volta.cs.utah.edu:8080/z/i-K25Q

Aligning the UB definitions for undef and poison seems like something that would make that work more straightforward, as we have less different semantics to deal with.

If they really align, without causing other problems, it might be okay, I guess.

nikic added a comment.Mar 27 2020, 1:07 PM

I just realized that we already treat branch on undef as UB in isGuaranteedNotToBeUndefOrPoison(): https://github.com/llvm/llvm-project/blob/0ab5b5b8581d9f2951575f7245824e6e4fc57dec/llvm/lib/Analysis/ValueTracking.cpp#L4655-L4673

Ultimately, I'm think trying to redefine undef at this point would act as a distraction for the poison work. That seems worse than whatever minor improvement we might get from redefining it.

I agree, but removing undef value seems to be a pretty big work. Before that happens, making existing optimizations work consistently w.r.t undef might be an important job.

I made a LangRef change patch to make the semantics of branching on undef explicit here: https://reviews.llvm.org/D76973

fhahn updated this revision to Diff 253634.Mar 30 2020, 10:36 AM

I've adjusted the defaults after rG05f0e598ab26: per default, markConstantRange & co assume that the constant range does not contain undef (existing behavior) and isConstantRange/getConstantRange allow constant ranges including undef (existing behavior).

mergeIn automatically marks constant ranges as containing undef based on the merged values. Clients that manually construct ValueLatticeElements are responsible for marking them as containing undef accordingly and also update isConstantRange/getConstantRange calls to exclude ranges including undef where it is not safe. One example is processAnd in CorrelatedValuePropagation, but there will be future changes needed to update all places in LVI that construct lattice elements manually. Currently it uses a helper that returns an optional constant range in multiple places so there's no easy way to recover the information if it may be undef.

I'll also update the example in the description shortly to the code below. %p will be represented as constantrange_including_undef [0, 256) and it would not be allowed to replace %res with %p. Simplifying the condition %f.1 based on range information including undef is allowed, if we replace it with a constant (and remove the original instruction that contained the use of the potentially undef value).

define i32 @f(i32 %a, i1 %c) {
  br i1 %c, label %true, label %false
true:
  %a.255 = and i32 %a, 255
  br label %exit
false:
  br label %exit
exit:
  %p = phi i32 [ %a.255, %true ], [ undef, %false ]
  %f.1 = icmp eq i32 %p, 300
  call void @use(i1 %f.1)
  %res = and i32 %p, 255
  ret i32 %res
}

The changed test cases should give a good idea of the kind of improvements this gives over the singlecr approach.

fhahn edited the summary of this revision. (Show Details)Mar 30 2020, 10:39 AM
efriedma accepted this revision.Mar 30 2020, 3:13 PM

LGTM

(On a side note, we can probably also simplify ResolvedUndefsIn given the new rule.)

This revision is now accepted and ready to land.Mar 30 2020, 3:13 PM
This revision was automatically updated to reflect the committed changes.