This is an archive of the discontinued LLVM Phabricator instance.

[InstSimplify] fold min/max intrinsics using icmp simplification
AbandonedPublic

Authored by spatel on Jul 27 2020, 6:52 AM.

Details

Summary

As discussed in rG0481e1ae3c17, we can use icmp simplification to reduce min/max intrinsics, so we don't have to re-implement a long list of possible transforms.

But (if my undef reasoning is correct), this is not as easy as always returning the existing operand. In the case of undefs, we need to clamp to the limit constant. This is similar to what we do with the saturating math intrinsics.

Diff Detail

Event Timeline

spatel created this revision.Jul 27 2020, 6:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 27 2020, 6:52 AM
nikic added a subscriber: fhahn.Jul 27 2020, 12:03 PM

But (if my undef reasoning is correct), this is not as easy as always returning the existing operand. In the case of undefs, we need to clamp to the limit constant. This is similar to what we do with the saturating math intrinsics.

Gah, undef strikes again. I don't think what you did here is sufficient, because the SimplifyICmpInst() may depend on undef in a more roundabout manner than a direct undef element in one of the min/max operands. I think in the short term this needs to be guarded by guaranteed-not-undef, at least for the operand that we simplify to.

In the longer term, I think we really need to introduce an InstSimplify query flag that disables undef-based folds. I think at this point we have multiple places where we need this. This is one, it is also the imho more proper solution to D84250, and @fhahn mentioned that this may be needed for NewGVN correctness as well.

spatel updated this revision to Diff 281049.Jul 27 2020, 2:04 PM

Patch updated:
Bail out if the would-be constant result is not safe. Not sure if this goes far enough. We still manage to fold 1 of the partial undef tests because the icmp analysis allows it and we are returning a non-constant value in that case.

Hi,
Is there a case where depending on SimplifyICmpInst may be problematic? I'm not very familiar with the history of miscompilation related with InstSimplify (except the distribution of undef), so it is possible that I'm missing something.

v = smax(x, y)
use(v)
  => // isICmpTrue(sgt, x, y) is true
v = smax(x, y)
use(x) // Is this valid?

For the D84250, the fix had to be conservative because distribution is incorrect under presence of undef. If InstSimplify can be arbitrarily smart so e.g. it iterates over the basic block and find the equivalent expression (as GVN does), this can cause miscompliation:

tmp = x + x
v = x * (1 + 1) // InstSimplify reaches to the conclusion that this is equivalent to x + x, finds tmp, so it uses tmp instead.
->
tmp = x + x
v = tmp // This is wrong, because previously v is always an even, but not it can be anything if x is undef.

Hi,
Is there a case where depending on SimplifyICmpInst may be problematic? I'm not very familiar with the history of miscompilation related with InstSimplify (except the distribution of undef), so it is possible that I'm missing something.

v = smax(x, y)
use(v)
  => // isICmpTrue(sgt, x, y) is true
v = smax(x, y)
use(x) // Is this valid?

For the D84250, the fix had to be conservative because distribution is incorrect under presence of undef. If InstSimplify can be arbitrarily smart so e.g. it iterates over the basic block and find the equivalent expression (as GVN does), this can cause miscompliation:

tmp = x + x
v = x * (1 + 1) // InstSimplify reaches to the conclusion that this is equivalent to x + x, finds tmp, so it uses tmp instead.
->
tmp = x + x
v = tmp // This is wrong, because previously v is always an even, but not it can be anything if x is undef.

Yes, this reminds me of D77868, so I'm worried that we can't trust SimplifyICmpInst without a stronger undef check. Given that, I'd rather abandon this approach. I think we can put in a handful of direct min/max simplifications and be confident that we have good optimization and safety. Ie, back to the approach started with rG0481e1a.

Yes, this reminds me of D77868, so I'm worried that we can't trust SimplifyICmpInst without a stronger undef check. Given that, I'd rather abandon this approach. I think we can put in a handful of direct min/max simplifications and be confident that we have good optimization and safety. Ie, back to the approach started with rG0481e1a.

I think this case is a bit different from D77868: min/max can be expressed as a composition of icmp with select.

v = smax(a, b)
<=>
c = icmp sgt a, b
v = select c, a, b

It seems legal to use SimplifyICmpInst to get the simplified value of c.
If these two expressions are different (which can happen when e.g. smax/smin are defined to block poison), using SimplifyICmpInst may be illegal.
(BTW, this equivalence seems to be the most important thing to prove first once the min/max intrinsic functions are added into Alive2.)

In case of D77868, using SimplifyAndInst/SimplifyOrInst to get the value of select was illegal, because select is not equivalent to and/or.

fhahn added a comment.Jul 28 2020, 1:04 PM

But (if my undef reasoning is correct), this is not as easy as always returning the existing operand. In the case of undefs, we need to clamp to the limit constant. This is similar to what we do with the saturating math intrinsics.

In the longer term, I think we really need to introduce an InstSimplify query flag that disables undef-based folds. I think at this point we have multiple places where we need this. This is one, it is also the imho more proper solution to D84250, and @fhahn mentioned that this may be needed for NewGVN correctness as well.

For NewGVN we could also work around the problem by providing a way for InstSimplify to query the 'virtual' operands instead of the actual IR operands. I have some rough patches for that, but it gets quite hairy when it comes to threading that through to the various match() uses.

So for now, I put up a patch to control the use of undef through SimplifyQuery D84792. If it is useful in other cases too, it might make sense to get this in first.

nikic added a comment.Jul 28 2020, 1:15 PM

Yes, this reminds me of D77868, so I'm worried that we can't trust SimplifyICmpInst without a stronger undef check. Given that, I'd rather abandon this approach. I think we can put in a handful of direct min/max simplifications and be confident that we have good optimization and safety. Ie, back to the approach started with rG0481e1a.

I think this case is a bit different from D77868: min/max can be expressed as a composition of icmp with select.

v = smax(a, b)
<=>
c = icmp sgt a, b
v = select c, a, b

The problem is that this does not hold. It's okay to go from icmp+select to max, but not the other way around. The difference is exactly that max requires a unique value for a and b, while the icmp+select sequence allows a and b to be chosen independently for the icmp and the select, if they are undef. This is also the reason why many of our existing min/max transforms are not actually legal when undef is taken into consideration.

The problem is that this does not hold. It's okay to go from icmp+select to max, but not the other way around. The difference is exactly that max requires a unique value for a and b, while the icmp+select sequence allows a and b to be chosen independently for the icmp and the select, if they are undef. This is also the reason why many of our existing min/max transforms are not actually legal when undef is taken into consideration.

I see your point. You're right, they are not equivalent. max(undef, 3) -> undef is invalid, which is ok if it was represented with icmp + select.
It is sad that the decomposition of max to icmp + select is invalid. It could have made this kind of reasoning simpler.

I think either allowing handful transformations for max/min only or using InstSimplify after a flag for unusing undef is added is fine.

spatel abandoned this revision.Jul 29 2020, 2:15 PM

I added code to simplify these intrinsics without using SimplifyICmpInst, so we can make progress. If we find a way to avoid undef-related logic holes, we can revisit this approach.
rG3fb13b8484dc
rG9ee7d7122c06
rG3c20ede18b85
rG3e8534fbc62c
rGee9617e96b05
rG5cd695dd7fbd
rGfef513f5ccb7