This is an archive of the discontinued LLVM Phabricator instance.

[InstSimplify/NewGVN] Add option to control the use of undef.
ClosedPublic

Authored by fhahn on Jul 28 2020, 1:04 PM.

Details

Summary

Making use of undef is not safe if the simplification result is not used
to replace all uses of the result. This leads to problems in NewGVN,
which does not replace all uses in the IR directly. See PR33165 for more
details.

This patch adds an option to SimplifyQuery to disable the use of undef.

Note that I've only guarded uses if isa<UndefValue>/m_Undef where
SimplifyQuery is currently available. If we agree on the general
direction, I'll update the remaining uses.

Diff Detail

Event Timeline

fhahn created this revision.Jul 28 2020, 1:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 28 2020, 1:04 PM
fhahn requested review of this revision.Jul 28 2020, 1:04 PM

Is the goal here that it's legal to replace the output value of SimplifyInstruction with the input value, in addition to the normal replacement of the input value with the output value? Or do we specifically care about literal UndefValue constants somehow, as opposed to values which are undef? If we had a PoisonValue constant, would we also need to exclude it?

fhahn added a comment.Jul 28 2020, 2:18 PM

Is the goal here that it's legal to replace the output value of SimplifyInstruction with the input value, in addition to the normal replacement of the input value with the output value? Or do we specifically care about literal UndefValue constants somehow, as opposed to values which are undef? If we had a PoisonValue constant, would we also need to exclude it?

I think we probably want to be able to replace in both directions as you mentioned. I should have been more specific. The immediate concern is undef constants, as they are most likely to introduce problems (e.g. if the the undef is interpreted as different value for different simplify calls). But we probably also want to limit distribution, as in @aqjune 's example in D84655. This is probably going to be much harder to audit for though I am afraid.

Do we also need to restrict transforms like "X & 0 = 0"?

I'm afraid that adding Q.CanUseUndef to the places where isa<UndefValue> is used isn't enough; equality (==) can implicitly fold undef since UndefValue is singleton.

For example, given icmp eq X, Y, if X and Y are both UndefValue, comparison X == Y will yield true. This will trigger this transformation even if Q.CanUseUndef was false:

// icmp X, X -> true/false
// icmp X, undef -> true/false because undef could be X.
if (LHS == RHS || (Q.CanUseUndef && isa<UndefValue>(RHS)))
  return ConstantInt::get(ITy, CmpInst::isTrueWhenEqual(Pred));

However, in theory, this is also using one of the two undefs by constraining it to have a value that is equal to another undef.

I'm just brainstorming, what if there is a pass that removes all undefs from a function by folding them away? If there is no undef, bad undef foldings will never happen trivially. InstSimplify may create undefs on-the-fly again, but making InstSimplify never create undef will be easier.

nikic added a comment.Jul 29 2020, 8:55 AM

Is the goal here that it's legal to replace the output value of SimplifyInstruction with the input value, in addition to the normal replacement of the input value with the output value? Or do we specifically care about literal UndefValue constants somehow, as opposed to values which are undef? If we had a PoisonValue constant, would we also need to exclude it?

I think the only thing we need to prevent is that InstSimplify assumes a specific value for (or any kind of constraint on) an undef value. We can still allow other refinement, and I don't think poison values are problematic in this context either, because they don't have a consistency requirement (we don't have to pick one particular value for the poison -- and generally can't).

Do we also need to restrict transforms like "X & 0 = 0"?

Same here, this should not be a problem. Even if X is undef, this transform is valid for any choice of value for X.

For example, given icmp eq X, Y, if X and Y are both UndefValue, comparison X == Y will yield true.

Hm, this is a tricky case. I agree that doing this fold is not right, but it also seems hard to really enforce, because of how many value comparisons there are (and not all of them are problematic). I think it may be better to ignore this particular case until we have evidence that it can cause issues in practice (the cases where this could occur are a lot more narrow, because at the very least it involves two undef values from distinct sources).

Is the goal here that it's legal to replace the output value of SimplifyInstruction with the input value, in addition to the normal replacement of the input value with the output value? Or do we specifically care about literal UndefValue constants somehow, as opposed to values which are undef? If we had a PoisonValue constant, would we also need to exclude it?

I think the only thing we need to prevent is that InstSimplify assumes a specific value for (or any kind of constraint on) an undef value. We can still allow other refinement, and I don't think poison values are problematic in this context either, because they don't have a consistency requirement (we don't have to pick one particular value for the poison -- and generally can't).

Yeah as mentioned earlier, assuming different values for undef constants is the biggest issue in practice. Thinking about it again, requiring replacement in both directions would be too limiting due to undef (e.g. we cannot replace a constant with an instruction, unless we know none of the operands are poison).

Hm, this is a tricky case. I agree that doing this fold is not right, but it also seems hard to really enforce, because of how many value comparisons there are (and not all of them are problematic). I think it may be better to ignore this particular case until we have evidence that it can cause issues in practice (the cases where this could occur are a lot more narrow, because at the very least it involves two undef values from distinct sources).

Yes I think we would have to accept that CanUseUndef is 'best-effort' and incomplete. Not great, but I don't think there's any practical way to provide a complete implementation.

Okay. I agree that it may not be problematic in practice; if it matters, icmps with two arguments being undef can be detected in advance and folded.

fhahn added a comment.Aug 5 2020, 12:03 PM

Ping. Should we go with this approach to start with, to unblock some patches and iterate on the details as follow-ups? It's not ideal for CanUseUndef to not cover all cases initially, but I am not sure what the alternative would be.

nikic accepted this revision.Aug 5 2020, 12:31 PM

Ping. Should we go with this approach to start with, to unblock some patches and iterate on the details as follow-ups? It's not ideal for CanUseUndef to not cover all cases initially, but I am not sure what the alternative would be.

Yes, that SGTM.

llvm/include/llvm/Analysis/InstructionSimplify.h
103

Possibly be a bit more specific here and say that simplifications that constrain the range of possible values for an undef are not allowed.

This revision is now accepted and ready to land.Aug 5 2020, 12:31 PM
fhahn updated this revision to Diff 283920.Aug 7 2020, 9:01 AM

Adjust wording of option to mention that simplifications are not allowed to constrain the range of value for uses of undef. Example is that simplifications cannot assume a certain value for a use of undef.

This revision was landed with ongoing or failed builds.Aug 9 2020, 11:23 AM
This revision was automatically updated to reflect the committed changes.