minnum/maxnum are supposed to return a canonicalized value. We were folding cases where one of the operands is constant without inserting the appropriate canonicalize. Also update the LangRef to mention the canonicalize part of the IEEE definition.
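To illustrate the fix described above, a sketch of the fold in question (the exact NaN constant and value names are illustrative, not taken from the patch):

```llvm
; Before: a minnum with one constant (here NaN) operand was folded
; straight to the other operand:
;   %r = call float @llvm.minnum.f32(float %x, float 0x7FF8000000000000)
;     => %r = %x
; After: the fold must preserve the canonicalization that IEEE 754
; minNum performs, so the result goes through llvm.canonicalize:
%r = call float @llvm.canonicalize.f32(float %x)
```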
IEEE 754 rules are that everything canonicalizes except bitwise operations (copy, abs, negate, copysign) and decimal re-encoding operations (which you don't care about).
Does this mean that we need to make all other float optimizations in LLVM do the same?
For example, we cannot optimize fmul(x, 1) to x; we must optimize it to fcanonicalize(x), right? That would prevent pretty much all float optimizations, since presumably the fcanonicalize will act as a barrier.
Also related is D37999. Is it OK to universally constant fold some subset of canonicalizes?
Well, so far we're saved by the fact that there are no non-canonical floats or doubles in IEEE 754 (platforms that treat denorms as non-canonical zeros are already outside IEEE 754, so do whatever you like). Non-canonical encodings *do* exist for float80, but they canonicalize if you look at them funny, so it hasn't been a problem.
If we want to get this really formally right at some point, then yes, fmul(x, 1) would become fcanonicalize(x). That canonicalize would usually be consumed into whatever uses the result of the multiply, because everything except the aforementioned operations is described in terms of canonicalized inputs.
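A minimal sketch of what that fully formal version would look like in IR (purely illustrative):

```llvm
; The identity fold would no longer produce %x directly:
;   %r = fmul float %x, 1.000000e+00
;     => %r = %x                                 ; today's fold
%r = call float @llvm.canonicalize.f32(float %x) ; strictly correct fold
; The hope is that a consumer of %r (e.g. a later fadd) would then
; absorb the canonicalize, since it canonicalizes its inputs anyway.
```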
Regardless of semantics, this patch almost surely causes a major problem for us.
I originally added fcanonicalize for the purpose of our numerics folks, who wanted to be able to force canonicalizations in numerics tests and so forth. It was meant to be a "dumb intrinsic" that would just blindly emit a no-op float instruction to force canonicalization of the input.
If we start emitting fcanonicalize all over the place in instcombine, it's going to destroy our codegen quality and significantly increase instruction count.
Do we really need to do this for the sake of float80 or something?
We also don't currently have optimizations to eliminate canonicalizes when they are redundant, but we could add them.
Even if they're not redundant, it's something we still don't want.
For example: imagine a function that takes two input texture reads, takes the max, and stores that. On our arch, that wouldn't canonicalize, so we would be forced to insert an additional arithmetic instruction for every single "max", doubling the amount of math in the function.
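In IR terms, that scenario would look something like this (value names and the memory operations are invented for illustration):

```llvm
; Two texture reads, a max, and a store. On a target whose max
; instruction does not canonicalize, the formally correct fold forces
; an extra arithmetic instruction per max:
%a = load float, float* %tex0
%b = load float, float* %tex1
%m = call float @llvm.maxnum.f32(float %a, float %b)
%c = call float @llvm.canonicalize.f32(float %m)   ; extra instruction
store float %c, float* %out
```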