This is the opposite direction of D62158 (we have to choose 1 form or the other). Now that we have FMF on the select, this becomes more palatable. And the benefits of having a single IR instruction for this operation (less chances of missing folds based on extra uses, etc) overcome my previous comments about the potential advantage of larger pattern matching/analysis. I'll abandon the other patch if there's general agreement.
Details
Diff Detail
Event Timeline
llvm/test/Transforms/InstCombine/fast-math.ll | ||
---|---|---|
884 | Do we want NSZ here? I see that this was an artifact of the old fcmp+select lowering, but I'm not sure if it is correct. Definitely not IEEE-754 compliant... |
llvm/test/Transforms/InstCombine/fast-math.ll | ||
---|---|---|
884 | Now that I think about it, maybe it would be better to skip the fcmp+sel step and lower the libm call directly into the intrinsic? |
llvm/test/Transforms/InstCombine/fast-math.ll | ||
---|---|---|
884 | Can you explain/show example for the compliancy problem? Reference for our current docs: |
llvm/test/Transforms/InstCombine/fast-math.ll | ||
---|---|---|
884 | Ah, right. This is ok in IEEE-754 2008. My copy of IEEE-754 2018 (a draft):
So adding NSZ there wouldn't be correct for the min(-0,+0) and friends cases. |
llvm/test/Transforms/InstCombine/fast-math.ll | ||
---|---|---|
884 | I might still be missing some subtlety. My interpretation of the LangRef text (and so also IEEE-754 2008) is that 'nsz' ("Allow optimizations to treat the sign of a zero argument or result as insignificant") is implicit in the definition of minnum/maxnum. By explicitly propagating the 'nsz' in this case, we are future-proofing the behavior even for the new standard. The original fcmp+select said that sign of zero is a 'don't care', and we want the intrinsic call to have that same freedom. |
LGTM
llvm/test/Transforms/InstCombine/fast-math.ll | ||
---|---|---|
884 | Yeah, I agree your new code is fine. I was questioning whether we should be adding NSZ during the @fmin(...) lowering, since the NSZ isn't on the original fmin(...) call. But now I don't see anything that requires @fmin*(...) to honor the sign of a zero in the Standards or libm. So this is fine too. Sorry for the noise... |
llvm/test/Transforms/InstCombine/fast-math.ll | ||
---|---|---|
884 | Ah, I see now. Not noise at all - these are excellent and valid points. It goes back to your other comment - we could be transforming the fmin call directly to the intrinsic in these examples. So that confusion was just caused by me being lazy and not creating minimal tests for this patch. Let me do that, then I'll try to fix up that existing transform. |
Do we want NSZ here? I see that this was an artifact of the old fcmp+select lowering, but I'm not sure if it is correct. Definitely not IEEE-754 compliant...