Page MenuHomePhabricator

[IR] redefine 'reassoc' fast-math-flag and add 'trans' fast-math-flag
ClosedPublic

Authored by spatel on Oct 25 2017, 1:49 PM.

Details

Summary

As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html
and again more recently:
http://lists.llvm.org/pipermail/llvm-dev/2017-October/118118.html

...this is a step in cleaning up our fast-math-flags implementation in IR.

As proposed in the above threads, I've replaced the 'UnsafeAlgebra' bit (which had the 'umbrella' meaning that all flags are set) with a new bit that only applies to algebraic reassociation - 'AllowReassoc'.

I've also added a bit to allow relaxed precision for transcendental functions called 'AllowTrans' (this was initially proposed as 'libm' or similar).

...and we're out of bits. 7 bits ought to be enough for anyone, right? :) FWIW, I did look at getting this out of SubclassOptionalData via SubclassData (spacious 16-bits), but that's apparently already used for other purposes. Also, I don't think we can just add a field to FPMathOperator because Operator is not intended to be instantiated. We'll defer movement of FMF to another day.

I'm not sure, but I may be diverging from the last proposal by keeping the 'fast' keyword. I thought about removing that, but seeing IR like this:
%f.fast = fadd reassoc nnan ninf nsz arcp contract trans float %op1, %op2
...made me think we want to keep the shortcut synonym.

I've also gone ahead and renamed the getter/setters, and mechanically added 'TODO' comments where we need to review how the old setUnsafeAlgebra() was used. In some cases, it's obvious that should be translated to setAllowReassoc(), but others may need to be discussed.

Finally, this change is binary incompatible with existing IR as seen in the compatibility tests. I'm hoping this:
"Newer releases can ignore features from older releases, but they cannot miscompile them. For example, if nsw is ever replaced with something else, dropping it would be a valid way to upgrade the IR." ( http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility )
...provides the flexibility we want to make this change without requiring a new IR version. Ie, I don't think we're ever loosening the FP strictness of existing IR. At worst, we will fail to optimize some previously 'fast' code because it's no longer recognized as 'fast'. This should get fixed as we squash all of the TODO comments.

Diff Detail

Event Timeline

spatel created this revision.Oct 25 2017, 1:49 PM
spatel updated this revision to Diff 120322.Oct 25 2017, 2:55 PM

Patch updated:
I forgot to add assembler tests for the new keywords which revealed that I had forgotten to add 'reassoc' to the parser. Fixed the code and added tests.

efriedma added inline comments.Oct 25 2017, 3:07 PM
docs/LangRef.rst
2283

Could you mark up the LLVM intrinsics affected by the "trans" flag with the meaning with/without the flag?

Please clarify the language here to indicate this affects the semantics of both LLVM intrinsics and known library functions. And include a more complete description of what counts as a known library function. And explain what "relaxed" means, given that libm generally doesn't provide correctly rounded versions of transcendental functions.

Also, do we want to optimize sqrt() based on this flag? It technically isn't transcendental, but we currently generate an approximation in some cases based on fast-math flags.

spatel added inline comments.Oct 25 2017, 4:12 PM
docs/LangRef.rst
2283

I think we'd include sqrt() in the 'trans' bucket (so maybe 'libm' was the better name). But looking back through the dev thread, I don't see an actual definition of that term or what this flag would map to as a clang command-line param. @hfinkel / @wristow - suggestions?

D39319 (Set contract flag when setting unsafe algebra flag) proposes to fix a bug introduced with D31164 (Add AllowContract to FastMathFlags). That bug is also fixed here - but I should add that test.

This might suggest that we should split 'trans' into its own small follow-up patch. That would also allow the unsafe TODO fixes to proceed independently of 'trans' in case that takes longer to sort out. Let me know if splitting this patch up seems better.

hfinkel edited edge metadata.Oct 26 2017, 10:26 AM

I don't see a lot of value in having:

// TODO: This should use hasAllowReassoc()?

before every call to isFast. I can grep for isFast more easily than I can grep for the comment. Moreover, if something really needs 'isFast' because the transformation really needs "every possible liberty", then there should definitely be a comment explaining that.

Also, unless we make hasAllowReassoc() also imply hasNoSignedZeros() (as GCC does, FWIW), we'll almost certainly need that check as well in many places. Many need NoNans and NoInfs too.

docs/LangRef.rst
2283

You're correct, sqrt is not a transcendental function (it is an algebraic function because it is the root of a polynomial equation). The problem is that algebraic functions also include things that we don't want to include here (e.g., addition, division). I don't like 'libm' because it refers to a very specific set of functions, but maybe that's the best we can do. The best description I have of this set is: "All of the mathematical operations that generally produce irrational numbers and are not in the set of functions specified by the IEEE specification (e.g., +,-,*,/,%,sqrt)."

wristow added inline comments.Oct 26 2017, 11:42 AM
docs/LangRef.rst
2283

For context, the original suggestion of the flag-name 'libm' was changed to 'trans' based on an observation that OpenCL has an option '-cl-fast-relaxed-math' that includes the semantics "This option also relaxes the precision of commonly used math functions" (see http://man.opencl.org/clCompileProgram.html), and so 'libm' may incorrectly imply it only applies to a more limited set of functions (math operations) only implemented in "libm.a". That led to the suggestion of 'trans', for transcendental functions. I originally liked the 'trans' suggestion, moving away from the libm.a implication. But I do feel that sqrt() should also be optimized on this flag, so as people said, 'trans' isn't perfect either.

I don't have a precise formal definition of what things would be controlled by this, but loosely, I'd say it's "All mathematical operations that have runtime library support on many platforms." In practice in LLVM, I think this means many (most?) of the math operations handled by "SimplifyLibCalls.cpp". So for me, in addition to transcendental functions, it would include sqrt(), and even simpler things like fmin() and fmax().

So after giving it more thought, I'm now back to preferring 'libm' over 'trans'. If someone has a better suggestion, that would be great. But I haven't thought of one.

wristow edited edge metadata.Oct 26 2017, 12:01 PM

...
This might suggest that we should split 'trans' into its own small follow-up patch. That would also allow the unsafe TODO fixes to proceed independently of 'trans' in case that takes longer to sort out. Let me know if splitting this patch up seems better.

I'd like to see the new enumerations of the FastMathFlags bits all defined in one patch. So removing 'UnsafeAlgebra', and adding 'AllowReassoc' and 'AllowTrans' (along with all the changes required since 'UnsafeAlgebra' can no longer be referenced) in the first patch, is my preference. (As an aside, probably 'AllowTrans' will be changed to something like 'AllowMathLib', due to other discussions here.)

...
This might suggest that we should split 'trans' into its own small follow-up patch. That would also allow the unsafe TODO fixes to proceed independently of 'trans' in case that takes longer to sort out. Let me know if splitting this patch up seems better.

I'd like to see the new enumerations of the FastMathFlags bits all defined in one patch. So removing 'UnsafeAlgebra', and adding 'AllowReassoc' and 'AllowTrans' (along with all the changes required since 'UnsafeAlgebra' can no longer be referenced) in the first patch, is my preference. (As an aside, probably 'AllowTrans' will be changed to something like 'AllowMathLib', due to other discussions here.)

Makes sense to me as well.

wristow added inline comments.Oct 26 2017, 12:15 PM
docs/LangRef.rst
2283

So after giving it more thought, I'm now back to preferring 'libm' over 'trans'. If someone has a better suggestion, that would be great. But I haven't thought of one.

In the spirit of the flag 'arcp' for 'AllowReciprocal', and the possibility of 'AllowMathLib' for the internal enumeration name, how about 'amlib'?

With that, there's no direct implication of "this is only for libm.a operations", or "this is only for transcendental functions". That said, the 'am' part does give it an awkward "morning library" feeling. Maybe 'amathlib' instead?

docs/LangRef.rst
2283

It seems that we're allowing something kind of open-ended here. That is, we don't seem to have an exact set of functions that will be covered. If that's the case then we should probably document it as such -- something like "allow substitution of approximate calculations for functions whose meaning are recognized by the optimizer." And maybe the flag could be "approx".

2286

Since reciprocal is an algebraically-equivalent transformation, this documentation isn't quite correct. Does this enable anything other than reassociation?

hfinkel added inline comments.Oct 26 2017, 12:35 PM
docs/LangRef.rst
2283

I like this suggestion.

wristow added inline comments.Oct 26 2017, 10:20 PM
docs/LangRef.rst
2283

Yes, I think it's open-ended. And I very much like the description "allow substitution of approximate calculations for functions whose meaning are recognized by the optimizer.".

I'm less enthusiastic about the flag-name 'approx', although I'm not horribly opposed to it (especially since I cannot come up with anything I really like). On it's own, a flag named 'approx' sounds too wide of scope. To me, it sounds like it might be describing all of what is enabled by -ffast-math. In short, it doesn't explicitly convey the concept of it being approximate calculations for functions whose meanings are recognized. To describe my concern "from the other direction", the reciprocal transformation is also an approximation (as is virtually everything else enabled by fast-math), and we don't intend to control the reciprocal transformation via this flag.

2286

Good point. I don't think 'reassoc' intended to enable anything else. Maybe:

Allow algebraic reassociation transformations that may dramatically change results in floating point.

For that matter, I think 'algebraic' could also be removed, so just "Allow reassociation transformations that..." would work.

hfinkel added inline comments.Oct 27 2017, 8:11 AM
docs/LangRef.rst
2283

We could call it funcapprox or approxfunc or something like that. We could use something shorter too (apfn perhaps).

wristow added inline comments.Oct 27 2017, 10:36 AM
docs/LangRef.rst
2283

I like it. I'm happy with any of those three. Given the brevity of many of the others (eg, 'arcp', 'nnan'), I lean toward 'apfn'.

spatel updated this revision to Diff 120647.Oct 27 2017, 10:51 AM
spatel marked 4 inline comments as done.

Patch updated:

  1. Changed 'AllowTrans' to 'AllowMathLib' and 'trans' to 'aml'. The IR abbreviation is more alphabet soup, but less "awkward morning" than 'amlib' - still open to suggestions :)
  2. Updated LangRef with suggested fixes. Trying to toe the weasel-word line here with enough ambiguity to accomplish transforms but not be completely meaningless.
  3. Removed 'TODO' comments at uses of isFast() / setFast(). We can just grep those calls and fix them.
  4. Added a variant of the 'fast' means all flags test from D39319.

Sadly, I didn't get email notifications for the last couple of name suggestions before I posted the updated patch. I agree that 'ApproxFunc' and 'afn' are better than 'AllowMathLib' and 'aml', so I'll change that unless there are objections or better suggestions.

spatel updated this revision to Diff 120687.Oct 27 2017, 1:34 PM

Patch updated:
'AllowMathLib' --> 'ApproxFunc'
'aml' --> 'afn'

hfinkel added inline comments.Oct 27 2017, 7:08 PM
docs/LangRef.rst
2282

I think that we should essentially use Andrew's proposed definition here:

Allow substitution of approximate calculations for functions (e.g., sin, log, sqrt).
10475

This conformance language is only necessary for sqrt. For the other functions, there is no standard for their accuracy/precision. You might say that with 'afn' the result may not match what would have been returned by the system's libm implementation.

spatel updated this revision to Diff 120747.Oct 29 2017, 8:38 AM
spatel marked 2 inline comments as done.

Patch updated - edited LangRef phrasing.

hfinkel added inline comments.Oct 30 2017, 3:05 PM
docs/LangRef.rst
10830

I don't expect that afn would affect fma. I recommend removing the statement here. It might be true that fma is computed differently under -ffast-math because of denormal handling, but that applies to everything. The same is true of the conversion functions below (floor, rint, etc.). maxnum/minnum too.

spatel updated this revision to Diff 120909.Oct 30 2017, 3:59 PM

Patch updated:
We don't need the 'afn' warning label on every FP intrinsic. The note is now only applied to these 10:
sqrt, powi, sin, cos, pow, exp, exp2, log, log10, log2

efriedma added inline comments.Oct 30 2017, 5:19 PM
docs/LangRef.rst
10475

We probably want different language here than for the transcendental functions; libm sqrt() is precisely the IEEE754 squareRoot().

10514

Does afn actually do anything here? I think "unspecified sequence of rounding operations" implies it isn't exact anyway. (And __powisf2 isn't really part of libm.)

10548

This language might give the wrong idea. Even without 'afn', the result may not match the target's libm: we constant-fold using a different implementation. The part to call out is that we might substitute an implementation which is less accurate.

10830

We might want to transform fma(a,b,c) to a*b+c in fast-math mode on targets which don't have a native fma instruction? Not sure if that makes sense.

hfinkel added inline comments.Oct 30 2017, 5:48 PM
docs/LangRef.rst
10475

True. On the other hand, the system's libm sqrt should be IEEE compliant, so saying that this differs from the libm result covers that (and also covers other cases, such as PPC long double, which aren't IEEE).

10548

we constant-fold using a different implementation.

Which, indeed, might be less accurate -- just hopefully not by much ;)

I realize that having these notes on the intrinsics seems like it could be helpful, but I'm leaning toward recommending that we don't have them at all.

10830

Good point. This would be relatively easy to implement as well:

afn fma(a, b, c) -> [afn] fmuladd(a, b, c)

fmuladd lowers either to an fma, or not, depending on target preferences.

wristow added inline comments.Oct 30 2017, 6:37 PM
include/llvm/IR/Operator.h
220

One loose end that needs to be taken care of more or less simultaneously is a Clang change. Specifically, the constructor for CodeGenFunction (in "CodeGenFunction.cpp") invokes FastMathFlags::setUnsafeAlgebra(), so it will need to be changed to setFast().

spatel added inline comments.Oct 31 2017, 6:39 AM
include/llvm/IR/Operator.h
220

This is correct - I didn't post it, but I have that one line patch in place locally, so I was planning to submit it to the clang repo as close as possible after this patch and reference this commit (if there's a way to avoid the build breakage cleanly, please let me know).

spatel updated this revision to Diff 120986.Oct 31 2017, 8:47 AM
spatel marked 4 inline comments as done.

Patch updated - just changes to the LangRef text (this now attempts to subsume the similar changes in D28335) :

  1. Add an extra line to the 'sqrt' semantics because that has well-defined behavior...unless the type is not IEEE.
  2. Remove the 'afn' blurb for 'powi' because that's loose by definition.
  3. Add the 'afn' blurb back to 'fma' because that actually could be affected.
  4. Try to soften the existing libm language for all 10 'afn'-affected intrinsics while retaining some shred of accuracy. :)
wristow added inline comments.Oct 31 2017, 7:08 PM
include/llvm/IR/Operator.h
187

We'll need to add flags to SDNodeFlags that are analogous to AllowReassoc and ApproxFunc. Adding them in a separate patch seems fine, but in case the lack of that change in this patch was an oversight, I wanted to raise the point here.

220

I don't know of a clean way. Definitely fine with me to submit it right after this patch is submitted.

lib/Transforms/Scalar/Reassociate.cpp
2021

Very minor point/question: Since the test is no longer hasUnsafeALgebra(), are we OK with the comment still saying unsafe algebra? Or do we want to change the comment above to something like:

// Don't optimize floating point instructions that don't have fast-math.

I'm fine leaving it as-is, but I've found these sorts of things in a handful of places, so if we want to change them, I'll look through the patch more thoroughly, and identify each one I find.

test/Assembler/fast-math-flags.ll
97

Is testing the 'reassoc' flag on the 'call' instruction really what you intended here? I would have thought add/sub/mul/div, but 'call' surprised me.

I'm happy; please go ahead when the other reviewers are too.

spatel added inline comments.Nov 2 2017, 7:34 AM
include/llvm/IR/Operator.h
187

I want to fix the backend too, but that's separate patches. For example, we don't propagate the IR flags properly yet (see D37686).

Also IIRC, the backend does not treat the global "-enable-unsafe-fp-math" as an umbrella for other flags. So that setting does not imply "-enable-no-nans-fp-math" or other FP relaxations.

lib/Transforms/Scalar/Reassociate.cpp
2021

I'll fix the comment to match the current code. Since this is the reassociation pass, I would guess that 'reassoc' is all we need to enable transforms here, but we'll have to verify that that is correct.

test/Assembler/fast-math-flags.ll
97

It is surprising but intentional. I want to highlight the fact that we allow any FMF on any FPMathOperator. So things like this or 'fadd arcp ...' are legal but I'm not sure how that would be used for optimization. We may want to refine that someday?

wristow added inline comments.Nov 2 2017, 10:55 AM
include/llvm/IR/Operator.h
187

I see. Thanks for clarifying.

lib/Transforms/Scalar/Reassociate.cpp
2021

To be clear, I'm not suggesting that in this patch we change the code here to check for 'reassoc' (i.e., I'm not suggesting we change the isFast() call to hasAllowReassoc() at this time). I view that as a separate piece of work, where we go through and carefully audit existing FMF-related checks, and decide how to use the more precise flags. Possibly it's just 'reassoc' that is needed for this case, or possibly it's 'reassoc' and some other conditions.

All I was suggesting by my comment/question, is that code-comments referring to a no longer-existing "unsafe algebra" umbrella flag, are a bit misleading. So I wondered whether we wanted to change those comments to better match the new implementation. It's a pretty minor point, in my view.

From my POV, the main purpose of this patch is to fix the underlying implementation to allow us to go through and do that audit, and fix issues like this "use isFast() or use some finer check?" example, here.

test/Assembler/fast-math-flags.ll
97

OK, thanks for explaining.

spatel updated this revision to Diff 121546.Nov 3 2017, 2:01 PM

Patch updated:
Change code comments that reference 'unsafe' to the new 'fast' vocabulary. This is not condoning what may be a wrongful use of 'fast'; it's just trying to keep our code and comments in sync. I didn't go out of my way to look beyond a few lines in the diffs, so there may still be 'unsafe' refs and wrapper functions out there, but we'll squash those as we audit the usage of 'isFast'.

wristow accepted this revision.Nov 3 2017, 2:42 PM

Patch updated:
Change code comments that reference 'unsafe' to the new 'fast' vocabulary. ...

That covers my last questions.
LGTM.

This revision is now accepted and ready to land.Nov 3 2017, 2:42 PM
This revision was automatically updated to reflect the committed changes.
mcberg2017 added a subscriber: mcberg2017.EditedFeb 7 2018, 3:29 PM

As an llvm vendor for Apple targets we have actually run into the incompatibility issues with old fast and current fast, we elected to use version markers in the IR for in house code to tell us that the old fast is relevant in the bit code reader during auto upgrade for IR into new Fast format.

spatel added a comment.Feb 8 2018, 7:01 AM

As an llvm vendor for Apple targets we have actually run into the incompatibility issues with old fast and current fast, we elected to use version markers in the IR for in house code to tell us that the old fast is relevant in the bit code reader during auto upgrade for IR into new Fast format.

Thanks for letting me know. Did this manifest as a performance problem only or was there a visible functional difference too?

Hi Sanjay,

Did this manifest as a performance problem only or was there a visible functional difference too?

I guess it depends how you qualify visible functional difference :).
With the previous layout of the bits, reassociation was performed, but with the new layout, it is not.
The reason behind that reassociation checks isFast, which with the new layout is true only if all the 7 bits are set to true and that doesn't happen when the upgrade path is used (we get only 5 of them).

So this is a performance problem that is exposed by a functional difference in the compiler (an optimization was triggered and now it is not).

Given this new format was not released yet, do you see a way we could make the autoupgrade path to preserve the isFast semantic from the old format to the new format?

I haven't audited the code, but potential we are going to regress all the code that relied on isFast and given this is not publicly available yet (unless I am mistaken), I was wondering if there is a way to fix that.

Cheers,
-Quentin

spatel added a comment.Feb 8 2018, 4:48 PM

Hi Sanjay,

Did this manifest as a performance problem only or was there a visible functional difference too?

I guess it depends how you qualify visible functional difference :).
With the previous layout of the bits, reassociation was performed, but with the new layout, it is not.
The reason behind that reassociation checks isFast, which with the new layout is true only if all the 7 bits are set to true and that doesn't happen when the upgrade path is used (we get only 5 of them).

So this is a performance problem that is exposed by a functional difference in the compiler (an optimization was triggered and now it is not).

Right - we expected that could happen with existing IR compiled with -ffast-math. I don't think we could go wrong in that scenario given that 'fast' gives us license to do all kinds of transforms, but doesn't require it - although people have different expectations once they get accustomed to the optimizations. :)

Given this new format was not released yet, do you see a way we could make the autoupgrade path to preserve the isFast semantic from the old format to the new format?
I haven't audited the code, but potential we are going to regress all the code that relied on isFast and given this is not publicly available yet (unless I am mistaken), I was wondering if there is a way to fix that.

Michael has this patch already, right? I think we have to create a new version of the IR since the bits changed meaning (we can't just flip 'on' new bits).

I have no objection to that, but that use case wasn't important to me, so that's why I didn't bother in this patch. AFAIK, this is new for v6.0, so yes, it's still possible to do this before that window closes.

I think we have to create a new version of the IR since the bits changed meaning (we can't just flip 'on' new bits).

Yeah, exactly.

I have no objection to that, but that use case wasn't important to me, so that's why I didn't bother in this patch. AFAIK, this is new for v6.0, so yes, it's still possible to do this before that window closes.

That would be ideal, but on the other hand, like you said, the new code won't be wrong.
I don't know what it takes to bump the bitcode version nor what are the implications, so maybe that's the right call, but let us have this conversation on LLVM dev.

Let me start a thread on that.

BTW, while writing the RFC, I realized that we could potentially generated incorrect code if we were silently downgrade a post-r317488 bitcode with a pre-r317488 compiler. (I.e., running fast math optimizations whereas we only wanted reassoc)