InstCombine: Insert missing canonicalizes
Needs Review · Public

Authored by arsenm on Sep 18 2017, 12:08 PM.

Details

Reviewers
scanon
escha
Summary

minnum/maxnum are supposed to return a canonicalized value, but we
were folding cases where one of the operands is constant without
inserting the appropriate canonicalize. Also fix the LangRef to
mention the canonicalize step in the IEEE definition.

Event Timeline

arsenm created this revision.Sep 18 2017, 12:08 PM
fhahn added a subscriber: fhahn.Nov 21 2017, 5:26 AM
scanon edited edge metadata.Nov 27 2017, 11:05 AM

IEEE 754 rules are that everything canonicalizes except bitwise operations (copy, abs, negate, copysign) and decimal re-encoding operations (which you don't care about).

escha edited edge metadata.Nov 27 2017, 11:18 AM

> IEEE 754 rules are that everything canonicalizes except bitwise operations (copy, abs, negate, copysign) and decimal re-encoding operations (which you don't care about).

Does this mean that we need to make all other float optimizations in LLVM do the same?

For example, we cannot optimize fmul(x, 1) to x; we must optimize it to fcanonicalize(x), right? That would prevent pretty much all float optimizations, since presumably the fcanonicalize will act as a barrier.

>> IEEE 754 rules are that everything canonicalizes except bitwise operations (copy, abs, negate, copysign) and decimal re-encoding operations (which you don't care about).
>
> Does this mean that we need to make all other float optimizations in LLVM do the same?
>
> For example, we cannot optimize fmul(x, 1) to x; we must optimize it to fcanonicalize(x), right? That would prevent pretty much all float optimizations, since presumably the fcanonicalize will act as a barrier.

Also related is D37999. Is it OK to universally constant fold some subset of canonicalizes?

>> IEEE 754 rules are that everything canonicalizes except bitwise operations (copy, abs, negate, copysign) and decimal re-encoding operations (which you don't care about).
>
> Does this mean that we need to make all other float optimizations in LLVM do the same?
>
> For example, we cannot optimize fmul(x, 1) to x; we must optimize it to fcanonicalize(x), right? That would prevent pretty much all float optimizations, since presumably the fcanonicalize will act as a barrier.

Well, so far we're saved by the fact that there are no non-canonical floats or doubles in IEEE 754 (platforms that treat denorms as non-canonical zeros are already outside IEEE 754, so do whatever you like). Non-canonical encodings *do* exist for float80, but they canonicalize if you look at them funny, so it hasn't been a problem.

If we want to get this really formally right at some point, then yes, fmul(x,1) would become fcanonicalize(x), which would usually be consumed into whatever consumes the result of the multiply, because everything except the aforementioned functions is described in terms of canonicalized inputs.

Regardless of semantics, this patch almost surely causes a major problem for us.

I originally added fcanonicalize for the purpose of our numerics folks, who wanted to be able to force canonicalizations in numerics tests and so forth. It was meant to be a "dumb intrinsic" that would just blindly emit a no-op float instruction to force canonicalization of the input.

If we start emitting fcanonicalize all over the place in instcombine, it's going to destroy our codegen quality and significantly increase instruction count.

Do we really need to do this for the sake of float80 or something?

We also don't currently have optimizations to eliminate canonicalizes when they are redundant, but we could add them.

Even if they're not redundant, it's something we still don't want.

For example: imagine a function that takes two input texture reads, takes the max, and stores that. On our arch, that wouldn't canonicalize, so we would be forced to insert an additional arithmetic instruction for every single "max", doubling the amount of math in the function.