This is an archive of the discontinued LLVM Phabricator instance.

[EarlyCSE] recognize commuted and swapped variants of min/max as equivalent (PR35642)
ClosedPublic

Authored by spatel on Dec 12 2017, 2:37 PM.

Diff Detail

Event Timeline

spatel created this revision.Dec 12 2017, 2:37 PM

Could you use matchSelectPattern (from ValueTracking) here instead of matching the individual min/max patterns? That will also give us absolute value and the floating-point min/max as well. I think that you can hash_combine the fields of SelectPatternResult, and the logic might be simpler overall too (hopefully).

fhahn added a subscriber: fhahn.Dec 12 2017, 8:25 PM

Could you use matchSelectPattern (from ValueTracking) here instead of matching the individual min/max patterns? That will also give us absolute value and the floating-point min/max as well. I think that you can hash_combine the fields of SelectPatternResult, and the logic might be simpler overall too (hopefully).

Yes, that's better. I was purposely postponing FP in this patch because it's not clear to me what we need to handle in those cases (intrinsics and/or IR patterns). But either way, using value tracking should simplify the code here.

spatel updated this revision to Diff 126784.Dec 13 2017, 10:44 AM

Patch updated:
Use ValueTracking's matchSelectPattern() rather than pattern matchers for min/max. This:

  1. Makes the patch smaller
  2. Catches more cases (extra test added)
  3. Allows follow-up patches to enable abs() and FP min/max more easily
hfinkel accepted this revision.Dec 13 2017, 12:52 PM
hfinkel added inline comments.
lib/Transforms/Scalar/EarlyCSE.cpp
150

Is there anything to do to address this TODO except for removing this check, the corresponding check below, and, to be good, adding a few test cases?

I don't want this artificially restricted, but I'm fine enabling the others in a follow-up commit (so that we can revert separately if necessary). This LGTM, but please do take care of the other cases in a follow-up commit, or add a comment explaining why that's non-trivial.

This revision is now accepted and ready to land.Dec 13 2017, 12:52 PM
spatel added inline comments.Dec 13 2017, 1:37 PM
lib/Transforms/Scalar/EarlyCSE.cpp
150

For abs, it should just be fixing the check and adding tests. I need to look closer at which FP variants are commutable. But yes, I'll do both of those after this patch.

This revision was automatically updated to reflect the committed changes.
hfinkel added inline comments.Dec 13 2017, 3:59 PM
lib/Transforms/Scalar/EarlyCSE.cpp
150

I need to look closer at which FP variants are commutable. But yes, I'll do both of those after this patch.

Ah, I should have mentioned. I looked through the code before I made the suggestion. I think that you only need exclude SPF_UNKNOWN. It looks like all of the others commute because matchSelectPattern specifically excludes matching floating-point forms that don't commute (it specifically excludes cases where signed-zeros could cause problems, NaNs might cause problems, etc.).

spatel added inline comments.Dec 13 2017, 4:03 PM
lib/Transforms/Scalar/EarlyCSE.cpp
150

Great - I was coming to that conclusion too now that I'm reading the code. Just need to come up with N test cases (and probably a pile of negative tests too). :)

I think we will need to handle the 'num' intrinsics separately (but similarly):

define double @maxnum_intrinsic(double %x, double %y) {
  %m1 = call double @llvm.maxnum.f64(double %x, double %y)
  %m2 = call double @llvm.maxnum.f64(double %y, double %x)
  %r = fadd double %m1, %m2
  ret double %r
}
hfinkel added inline comments.Dec 13 2017, 4:09 PM
lib/Transforms/Scalar/EarlyCSE.cpp
150

Cool. Regarding [max|min]num, I agree.

On closer inspection, I think matchSelectPattern is either not sufficient or broken because it doesn't handle -0.0 as we would require here:

declare void @self_destruct_if_neg_zero(double)  
define double @fmin_any_ordered_commute(double %x, double %y) {
  %cmp1 = fcmp nnan olt double %x, %y                            ; if x=+0.0 and y=-0.0, returns false
  %cmp2 = fcmp nnan olt double %y, %x                            ; if x=+0.0 and y=-0.0, returns false
  %neg_zero_if_false = select i1 %cmp1, double %x, double %y     ; returns -0.0
  %pos_zero_if_false = select i1 %cmp2, double %y, double %x     ; returns +0.0
  call void @self_destruct_if_neg_zero(double %pos_zero_if_false)
  ret double %neg_zero_if_false
}

If we use what is returned by matchSelectPattern ( {SPF_FMINNUM, SPNB_RETURNS_ANY, false} ), we get:

$ ./opt -early-cse fminmax.ll -S

define double @fmin_any_ordered_commute(double %x, double %y) {
  %cmp1 = fcmp nnan olt double %x, %y
  %cmp2 = fcmp nnan olt double %y, %x
  %neg_zero_if_false = select i1 %cmp1, double %x, double %y
  call void @self_destruct_if_neg_zero(double %neg_zero_if_false)  ; boom!
  ret double %neg_zero_if_false
}

On closer inspection, I think matchSelectPattern is either not sufficient or broken because it doesn't handle -0.0 as we would require here:

declare void @self_destruct_if_neg_zero(double)  
define double @fmin_any_ordered_commute(double %x, double %y) {
  %cmp1 = fcmp nnan olt double %x, %y                            ; if x=+0.0 and y=-0.0, returns false
  %cmp2 = fcmp nnan olt double %y, %x                            ; if x=+0.0 and y=-0.0, returns false
  %neg_zero_if_false = select i1 %cmp1, double %x, double %y     ; returns -0.0
  %pos_zero_if_false = select i1 %cmp2, double %y, double %x     ; returns +0.0
  call void @self_destruct_if_neg_zero(double %pos_zero_if_false)
  ret double %neg_zero_if_false
}

If we use what is returned by matchSelectPattern ( {SPF_FMINNUM, SPNB_RETURNS_ANY, false} ), we get:

$ ./opt -early-cse fminmax.ll -S

define double @fmin_any_ordered_commute(double %x, double %y) {
  %cmp1 = fcmp nnan olt double %x, %y
  %cmp2 = fcmp nnan olt double %y, %x
  %neg_zero_if_false = select i1 %cmp1, double %x, double %y
  call void @self_destruct_if_neg_zero(double %neg_zero_if_false)  ; boom!
  ret double %neg_zero_if_false
}

This seems inconsistent with the intent of matchSelectPattern, at least based on the comment:

// If the predicate is an "or-equal"  (FP) predicate, then signed zeroes may
// return inconsistent results between implementations.
//   (0.0 <= -0.0) ? 0.0 : -0.0 // Returns 0.0
//   minNum(0.0, -0.0)          // May return -0.0 or 0.0 (IEEE 754-2008 5.3.1)
// Therefore we behave conservatively and only proceed if at least one of the
// operands is known to not be zero, or if we don't care about signed zeroes.
switch (Pred) {
default: break;
case CmpInst::FCMP_OGE: case CmpInst::FCMP_OLE:
case CmpInst::FCMP_UGE: case CmpInst::FCMP_ULE:
  if (!FMF.noSignedZeros() && !isKnownNonZero(CmpLHS) &&
      !isKnownNonZero(CmpRHS))
    return {SPF_UNKNOWN, SPNB_NA, false};
}

It seems like there are cases missing here (e.g., this does not apply only to the GE/LE predicates)? Based on your example, it seems like we have the same problem with the GT/LT predicates.

// If the predicate is an "or-equal"  (FP) predicate, then signed zeroes may
// return inconsistent results between implementations.
//   (0.0 <= -0.0) ? 0.0 : -0.0 // Returns 0.0
//   minNum(0.0, -0.0)          // May return -0.0 or 0.0 (IEEE 754-2008 5.3.1)
// Therefore we behave conservatively and only proceed if at least one of the
// operands is known to not be zero, or if we don't care about signed zeroes.
switch (Pred) {
default: break;
case CmpInst::FCMP_OGE: case CmpInst::FCMP_OLE:
case CmpInst::FCMP_UGE: case CmpInst::FCMP_ULE:
  if (!FMF.noSignedZeros() && !isKnownNonZero(CmpLHS) &&
      !isKnownNonZero(CmpRHS))
    return {SPF_UNKNOWN, SPNB_NA, false};
}

It seems like there are cases missing here (e.g., this does not apply only to the GE/LE predicates)? Based on your example, it seems like we have the same problem with the GT/LT predicates.

Yep - that's the quick fix; add the other preds. A better solution might be to distinguish signed zero behavior in the same way that we're doing for NAN. But let me try the easy patch first and see if anything breaks. :)

It seems like there are cases missing here (e.g., this does not apply only to the GE/LE predicates)? Based on your example, it seems like we have the same problem with the GT/LT predicates.

Yep - that's the quick fix; add the other preds. A better solution might be to distinguish signed zero behavior in the same way that we're doing for NAN. But let me try the easy patch first and see if anything breaks. :)

Unfortunately, we would regress this test:

define i8 @t9(float %a) {
  %t1 = fcmp ult float %a, 0.0
  %t2 = fptosi float %a to i8
  %t3 = select i1 %t1, i8 %t2, i8 0
  ret i8 %t3
}

Currently, we recognize that as fmin with a cast and canonicalize to:

define i8 @t9(float %a) {
  %.inv = fcmp oge float %a, 0.0
  %t31 = select i1 %.inv, float 0.0, float %a
  %1 = fptosi float %t31 to i8
  ret i8 %1
}

We could set the 'nsz' bit that we're sending to the matcher to avoid the regression in the case of a min/max with cast.