As shown in:
https://bugs.llvm.org/show_bug.cgi?id=35642
...we can have different forms of min/max, so we should recognize those here in EarlyCSE similar to how we already handle binops and compares that can commute.
Details
- Reviewers
hfinkel efriedma gberry - Commits
- rG558a465473bd: [EarlyCSE] recognize swapped variants of abs/nabs as equivalent
rG3c7a35de7fbc: [EarlyCSE] recognize commuted and swapped variants of min/max as equivalent…
rL320653: [EarlyCSE] recognize swapped variants of abs/nabs as equivalent
rL320640: [EarlyCSE] recognize commuted and swapped variants of min/max as equivalent…
Diff Detail
- Repository
- rL LLVM
Event Timeline
Could you use matchSelectPattern (from ValueTracking) here instead of matching the individual min/max patterns? That will also give us absolute value and the floating-point min/max as well. I think that you can hash_combine the fields of SelectPatternResult, and the logic might be simpler overall too (hopefully).
Yes, that's better. I was purposely postponing FP in this patch because it's not clear to me what we need to handle in those cases (intrinsics and/or IR patterns). But either way, using value tracking should simplify the code here.
Patch updated:
Use ValueTracking's matchSelectPattern() rather than pattern matchers for min/max. This:
- Makes the patch smaller
- Catches more cases (extra test added)
- Allows follow-up patches to enable abs() and FP min/max more easily
lib/Transforms/Scalar/EarlyCSE.cpp | ||
---|---|---|
151 ↗ | (On Diff #126784) | Is there anything to do to address this TODO except for removing this check, the corresponding check below, and, to be good, adding a few test cases? I don't want this artificially restricted, but I'm fine enabling the others in a follow-up commit (so that we can revert separately if necessary). This LGTM, but please do take care of the other cases in a follow-up commit, or add a comment explaining why that's non-trivial. |
lib/Transforms/Scalar/EarlyCSE.cpp | ||
---|---|---|
151 ↗ | (On Diff #126784) | For abs, it should just be fixing the check and adding tests. I need to look closer at which FP variants are commutable. But yes, I'll do both of those after this patch. |
lib/Transforms/Scalar/EarlyCSE.cpp | ||
---|---|---|
151 ↗ | (On Diff #126784) |
Ah, I should have mentioned. I looked through the code before I made the suggestion. I think that you only need exclude SPF_UNKNOWN. It looks like all of the others commute because matchSelectPattern specifically excludes matching floating-point forms that don't commute (it specifically excludes cases where signed-zeros could cause problems, NaNs might cause problems, etc.). |
lib/Transforms/Scalar/EarlyCSE.cpp | ||
---|---|---|
151 ↗ | (On Diff #126784) | Great - I was coming to that conclusion too now that I'm reading the code. Just need to come up with N test cases (and probably a pile of negative tests too). :) I think we will need to handle the 'num' intrinsics separately (but similarly): define double @maxnum_intrinsic(double %x, double %y) { %m1 = call double @llvm.maxnum.f64(double %x, double %y) %m2 = call double @llvm.maxnum.f64(double %y, double %x) %r = fadd double %m1, %m2 ret double %r } |
lib/Transforms/Scalar/EarlyCSE.cpp | ||
---|---|---|
151 ↗ | (On Diff #126784) | Cool. Regarding [max|min]num, I agree. |
On closer inspection, I think matchSelectPattern is either not sufficient or broken because it doesn't handle -0.0 as we would require here:
declare void @self_destruct_if_neg_zero(double) define double @fmin_any_ordered_commute(double %x, double %y) { %cmp1 = fcmp nnan olt double %x, %y ; if x=+0.0 and y=-0.0, returns false %cmp2 = fcmp nnan olt double %y, %x ; if x=+0.0 and y=-0.0, returns false %neg_zero_if_false = select i1 %cmp1, double %x, double %y ; returns -0.0 %pos_zero_if_false = select i1 %cmp2, double %y, double %x ; returns +0.0 call void @self_destruct_if_neg_zero(double %pos_zero_if_false) ret double %neg_zero_if_false }
If we use what is returned by matchSelectPattern ( {SPF_FMINNUM, SPNB_RETURNS_ANY, false} ), we get:
$ ./opt -early-cse fminmax.ll -S
define double @fmin_any_ordered_commute(double %x, double %y) { %cmp1 = fcmp nnan olt double %x, %y %cmp2 = fcmp nnan olt double %y, %x %neg_zero_if_false = select i1 %cmp1, double %x, double %y call void @self_destruct_if_neg_zero(double %neg_zero_if_false) ; boom! ret double %neg_zero_if_false }
This seems inconsistent with the intent of matchSelectPattern, at least based on the comment:
// If the predicate is an "or-equal" (FP) predicate, then signed zeroes may // return inconsistent results between implementations. // (0.0 <= -0.0) ? 0.0 : -0.0 // Returns 0.0 // minNum(0.0, -0.0) // May return -0.0 or 0.0 (IEEE 754-2008 5.3.1) // Therefore we behave conservatively and only proceed if at least one of the // operands is known to not be zero, or if we don't care about signed zeroes. switch (Pred) { default: break; case CmpInst::FCMP_OGE: case CmpInst::FCMP_OLE: case CmpInst::FCMP_UGE: case CmpInst::FCMP_ULE: if (!FMF.noSignedZeros() && !isKnownNonZero(CmpLHS) && !isKnownNonZero(CmpRHS)) return {SPF_UNKNOWN, SPNB_NA, false}; }
It seems like there are cases missing here (e.g., this does not apply only to the GE/LE predicates)? Based on your example, it seems like we have the same problem with the GT/LT predicates.
Yep - that's the quick fix; add the other preds. A better solution might be to distinguish signed zero behavior in the same way that we're doing for NAN. But let me try the easy patch first and see if anything breaks. :)
Unfortunately, we would regress this test:
define i8 @t9(float %a) { %t1 = fcmp ult float %a, 0.0 %t2 = fptosi float %a to i8 %t3 = select i1 %t1, i8 %t2, i8 0 ret i8 %t3 }
Currently, we recognize that as fmin with a cast and canonicalize to:
define i8 @t9(float %a) { %.inv = fcmp oge float %a, 0.0 %t31 = select i1 %.inv, float 0.0, float %a %1 = fptosi float %t31 to i8 ret i8 %1 }
We could set the 'nsz' bit that we're sending to the matcher to avoid the regression in the case of a min/max with cast.