This is an archive of the discontinued LLVM Phabricator instance.

[Target] move reciprocal estimate settings from TargetOptions to TargetLowering
ClosedPublic

Authored by spatel on Sep 21 2016, 3:22 PM.

Details

Summary

See D24815 for the clang side of this that turns command-line flags into a function attribute string.

The motivation for the change is that we can't have pseudo-global settings for codegen living in TargetOptions because that doesn't work with LTO.

Ideally, these reciprocal attributes will be moved to the instruction-level via FMF, metadata, or something else. But making them function attributes is at least an improvement over the current mess.

The ingredients of this patch are:

  1. Remove the reciprocal estimate command-line debug option.
  2. Add TargetRecip to TargetLowering.
  3. Remove TargetRecip from TargetOptions.
  4. Clean up the TargetRecip implementation to work with this new scheme.
  5. Set the default reciprocal settings in TargetLoweringBase (everything is off).
  6. Update the PowerPC defaults, users, and tests.
  7. Update the x86 defaults, users, and tests.

Diff Detail

Event Timeline

spatel updated this revision to Diff 72114.Sep 21 2016, 3:22 PM
spatel retitled this revision from to [Target] move reciprocal estimate settings from TargetOptions to TargetLowering.
spatel updated this object.
spatel added reviewers: echristo, evandro, hfinkel.
spatel added a subscriber: llvm-commits.

The question seems to me to be: is it a property of the sub target (can be switched function-per-function) or is it uniform for a target or for a module)?

If the latter, then it does not belong to a function attribute.

The question seems to me to be: is it a property of the sub target (can be switched function-per-function) or is it uniform for a target or for a module)?

If the latter, then it does not belong to a function attribute.

IMO, it needs even finer granularity than the function - I'd like to get this on instructions similar to other FMF attributes. We have users that want to selectively vary the codegen for sqrt/div within a function using source code pragmas.

echristo edited edge metadata.Sep 21 2016, 3:45 PM

The question seems to me to be: is it a property of the sub target (can be switched function-per-function) or is it uniform for a target or for a module)?

If the latter, then it does not belong to a function attribute.

So, the interesting thing is that people are using this on a per-subtarget basis for tuning. i.e. cpu X in the presence of this flag tunes one way, CPU Y tunes a different way.

IMO, it needs even finer granularity than the function - I'd like to get this on instructions similar to other FMF attributes. We have users that want to selectively vary the codegen for sqrt/div within a function using source code pragmas.

It seems this patch does not get you any closer from this goal?

IMO, it needs even finer granularity than the function - I'd like to get this on instructions similar to other FMF attributes. We have users that want to selectively vary the codegen for sqrt/div within a function using source code pragmas.

It seems this patch does not get you any closer from this goal?

There are independent goals, but I see this is an intermediate step:

  1. It allows the settings to be different per function.
  2. The settings survive in an LTO build. AFAIK, currently we drop all reciprocal settings with LTO.
  3. It allows rL268539 to be reimplemented. Not sure if there are other barriers for that though?

There are independent goals, but I see this is an intermediate step:

  1. It allows the settings to be different per function.

The technical solution to handle this "per instruction" would be totally different though. These attributes would become obsolete as soon as you have the per instruction solution.
Now if there is not plan to get something per-instruction in the near future, why not.

What happens during inlining with this patch?

  1. The settings survive in an LTO build. AFAIK, currently we drop all reciprocal settings with LTO.

This is incidental and shouldn't be a motivation, i.e. making something a property of the function when it is not intrinsically just to "fix" LTO does not seem right to me. The LTO implementation is responsible to properly initialize the backend/codegen,

  1. It allows rL268539 to be reimplemented. Not sure if there are other barriers for that though?

The post-commit thread is an interesting reading :)

Note: I'm not against this patch, I'm just raising questions, which you may have already considered :)

There are independent goals, but I see this is an intermediate step:

  1. It allows the settings to be different per function.

The technical solution to handle this "per instruction" would be totally different though. These attributes would become obsolete as soon as you have the per instruction solution.
Now if there is not plan to get something per-instruction in the near future, why not.

These are all excellent questions. I think I'm going to learn something today. :)

I agree that the 'per instruction' solution would obsolete this. I should have made that clearer in the patch summary, and I will make that clear in the commit message if this patch is committed.

So why don't we just get to the ideal solution right now? It's just not high enough on anyone's priority list (including mine) to get it done immediately. Partly, this is because of an issue that you properly raised and I mentioned in the post-commit thread for r268539:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160104/323154.html

So while I'd like to do 'per instruction', I think it's (much?) more time and work than this. It's worth mentioning that I wouldn't have made this patch a priority if I didn't think that all of reciprocal estimate codegen had been issued an existential threat - thanks, Eric! :)

It's possible that I misinterpreted the messages from r268539, but I took the possibility of a PPC revert as a chance that x86 would be the next domino to fall. After reviewing the diffs here, I still don't know why x86 is immune to the LTO-breakage problem if PPC is not. Neither appears to be directly predicating codegen on a CPU model, just CPU subtarget features.

FWIW, I think the AArch patch absolutely needs to be re-worked so that it's not checking 'isExynosM1()'. This goes back to Eric's last comment and I think is your primary concern: -mrecip was not intended to be a subtarget-based tuning parameter. It's a programmer-based optimization hint that says, "I want to make a speed/accuracy trade-off for these particular FP operations using these parameters to tune the codegen." At least, that's my understanding: it's a refinement of -ffast-math.

What happens during inlining with this patch?

[spends some quality time in the debugger because I've never looked at how the inliner works]
The caller's attributes are applied to the inlined code. My initial take is that this is legal, but not ideal (ie, all of -mrecip is supposed to be gated by fast/unsafe math). functionsHaveCompatibleAttributes() or something in the cost model might need updating to improve this?

What happens during inlining with this patch?

[spends some quality time in the debugger because I've never looked at how the inliner works]
The caller's attributes are applied to the inlined code. My initial take is that this is legal, but not ideal (ie, all of -mrecip is supposed to be gated by fast/unsafe math).

Is it legal?
You said that this is a tradeoff speed/precision that is made by the programmer. If the cursor is set on "speed" on the caller, inlining a callee where it is not the case would lead to less precision for the callee code.

What happens during inlining with this patch?

[spends some quality time in the debugger because I've never looked at how the inliner works]
The caller's attributes are applied to the inlined code. My initial take is that this is legal, but not ideal (ie, all of -mrecip is supposed to be gated by fast/unsafe math).

Is it legal?
You said that this is a tradeoff speed/precision that is made by the programmer. If the cursor is set on "speed" on the caller, inlining a callee where it is not the case would lead to less precision for the callee code.

Yes - but that's where the protective cover of -ffast-math comes into play. We were already issued a license to go crazy...just how crazy is up to artistic interpretation. Keep in mind that we already generate reciprocal estimates with -ffast-math (with some guidance from the target about the default codegen). The -mrecip attributes should give us more clues about what the programmer wants, but we make no guarantees about what they'll get.

Today we drop the fast-math attributes if we inline a non-fastmath function in a fast-math one I believe.

I'm not sure how the mrecip works: is it adding refinement only to prevent some specific "fast-math" (i.e. unsafe) transformations?
I.e. if the fast-math flag is dropped, what is the impact of the "mrecip" attribute?

Bonus points: where is it documented?

I'm not sure how the mrecip works: is it adding refinement only to prevent some specific "fast-math" (i.e. unsafe) transformations?

'Prevent' is not the right description. It can be used to inhibit or enhance the effects of fast-math, but it should have no effect without fast-math.

I.e. if the fast-math flag is dropped, what is the impact of the "mrecip" attribute?

If fast-math is off, mrecip should have no impact. We should be able to confirm this independently of this patch.

Here's an IR playpen we can use to walk through some scenarios:

target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.11.0"

define float @foo(float %x) #0 {
  %y = call fast float @bar(float %x)
  ret float %y
}

define float @bar(float %x) #1 {
  %y = call float @baz(float %x)
  ret float %y
}

define float @baz(float %x) #2 {
  %y = fdiv float 1.0, %x
  ret float %y
}

attributes #0 = { "unsafe-fp-math"="false" "mrecip"="divf:0" }
attributes #1 = { "unsafe-fp-math"="true" "mrecip"="divf:1" }
attributes #2 = { "unsafe-fp-math"="true" "mrecip"="divf:0" }

Notice that the backend still largely ignores the instruction-level FMF, so you can delete any 'fast' on the calls or fdiv, and there should be no difference in codegen.

I tried some experiments using something like this:
$ ./opt -inline fdiv.ll -S | ./llc -o -

I haven't seen anything go wrong yet, but I'm still trying. :)

Bonus points: where is it documented?

That's an easy one - we have no docs. The intent was to mimic and expand on the gcc flag with the same name:
https://gcc.gnu.org/onlinedocs/gcc-6.2.0/gcc/x86-Options.html#x86-Options
https://gcc.gnu.org/onlinedocs/gcc-6.2.0/gcc/RS_002f6000-and-PowerPC-Options.html#RS_002f6000-and-PowerPC-Options

spatel updated this revision to Diff 72291.Sep 23 2016, 8:54 AM
spatel edited edge metadata.

Patch updated:
Add getTargetRecipForFunc() to TargetLoweringBase to eliminate the repeated code in the callers. Now the lowering diffs for PPC and x86 are one-line changes.

I wondered how that prerequisite check for fast-math was currently handled:

if (Options.UnsafeFPMath) {

which gets set by:

// FIXME: This function needs to go away for a number of reasons:
// a) global state on the TargetMachine is terrible in general,
// b) there's no default state here to keep,
// c) these target options should be passed only on the function
//    and not on the TargetMachine (via TargetOptions) at all.
void TargetMachine::resetTargetOptions(const Function &F) const {

...which made me think of this bug:
https://llvm.org/bugs/show_bug.cgi?id=23172

The world is broken in a way much bigger than inlining too strict or too loose reciprocal estimate codegen settings, isn't it?

I wondered how that prerequisite check for fast-math was currently handled:

if (Options.UnsafeFPMath) {

which gets set by:

// FIXME: This function needs to go away for a number of reasons:
// a) global state on the TargetMachine is terrible in general,
// b) there's no default state here to keep,
// c) these target options should be passed only on the function
//    and not on the TargetMachine (via TargetOptions) at all.
void TargetMachine::resetTargetOptions(const Function &F) const {

...which made me think of this bug:
https://llvm.org/bugs/show_bug.cgi?id=23172

The world is broken in a way much bigger than inlining too strict or too loose reciprocal estimate codegen settings, isn't it?

Pretty much. I've been trying to get all of the ones that "matter" out of there as fast as possible.

That said, what -would- you like to do for inlining here? It seems like you're going to want a target routine there on matching. My guess for now is that you actually expect everything to be compiled with the same options and so not inlining based on difference would also be safe. @hfinkel thoughts here?

A couple of comments inline otherwise I'm fine with this implementation for now.

Thanks!

-eric

include/llvm/Target/TargetLowering.h
2178

Seems reasonable - otherwise possible to keep it local in the various routines that want to know about the estimates? Parse there rather than initialize at the beginning? That way the function above that builds up the struct can be the whole interface rather than caching. That said, not sure how often it's called.

lib/CodeGen/TargetLoweringBase.cpp
1488

Bikeshed: mrecip is pretty hard to understand (for me at least) naming wise, reciprocal-estimates while longer might be a bit more comprehensible?

hfinkel accepted this revision.Oct 2 2016, 7:20 PM
hfinkel edited edge metadata.

I think that the per-instruction metadata is a neat idea, and opens up a number of interesting and useful possibilities. However, we really should fix the current feature. This LGTM, although you should probably just address your TODO comment before committing unless there is some reason it is not trivial.

include/llvm/Target/TargetRecip.h
51

This TODO does not explain what this means. Do you mean using -1 RefinementSteps instead of having a separate boolean flag?

We probably should just do this, the current interface which has "true, Steps" everywhere is not aesthetically pleasing.

This revision is now accepted and ready to land.Oct 2 2016, 7:20 PM

I wondered how that prerequisite check for fast-math was currently handled:

if (Options.UnsafeFPMath) {

which gets set by:

// FIXME: This function needs to go away for a number of reasons:
// a) global state on the TargetMachine is terrible in general,
// b) there's no default state here to keep,
// c) these target options should be passed only on the function
//    and not on the TargetMachine (via TargetOptions) at all.
void TargetMachine::resetTargetOptions(const Function &F) const {

...which made me think of this bug:
https://llvm.org/bugs/show_bug.cgi?id=23172

The world is broken in a way much bigger than inlining too strict or too loose reciprocal estimate codegen settings, isn't it?

Pretty much. I've been trying to get all of the ones that "matter" out of there as fast as possible.

That said, what -would- you like to do for inlining here? It seems like you're going to want a target routine there on matching. My guess for now is that you actually expect everything to be compiled with the same options and so not inlining based on difference would also be safe. @hfinkel thoughts here?

I think we need to be careful here; users set this option, in some cases, so that they can use fast-math but still get the accuracy they require (by increasing the default number of Newton iterations). We should really not inline functions that require more iterations into functions that require fewer. Obviously using per-instruction metadata fixes this is a much nicer way. For better or worse, this is really only an issue for LTO builds.

A couple of comments inline otherwise I'm fine with this implementation for now.

Thanks!

-eric

echristo added inline comments.Oct 3 2016, 5:15 PM
lib/Target/TargetRecip.cpp
157

Do we still need ::set rather than just putting it as part of the constructor?

spatel marked an inline comment as done.Oct 4 2016, 12:36 PM
spatel added inline comments.
include/llvm/Target/TargetLowering.h
2178
include/llvm/Target/TargetRecip.h
51

So yes, I agree this looks ugly, and it seemed simple enough to just combine the fields into a single int value when I added the TODO comment, but on closer inspection, it's not trivial because the user is allowed to specify "default" for the enablement part and still change the number of refinement steps. I think this also mucks with Eric's suggestions to simplify the 'set' method and/or just pull it all into the local users.

There's probably some relatively simple way to do this that I'm just not seeing right now, but even if there is, I'd rather not jeopardize the important thing (getting this out of target options) by introducing a bug while refactoring.

I've also convinced myself that I just need to get back on the FMF wagon and finish the DAG work and the clang/IR changes, so we can finally be clear of this mess. If that succeeds, then the ugly bits are going to disappear anyway. :)

spatel updated this revision to Diff 73540.Oct 4 2016, 1:02 PM
spatel edited edge metadata.

Patch updated:

  1. Expanded the TODO comment in TargetRecip.h to mention the "default" case.
  2. Changed the attribute name from "mrecip" to "reciprocal-estimates" (matches the clang side of the patch).
This revision was automatically updated to reflect the committed changes.
echristo added inline comments.Oct 7 2016, 4:23 PM
llvm/trunk/include/llvm/Target/TargetLowering.h
2184 ↗(On Diff #73549)

I figured with getTargetRecipForFunc we could remove this? It returns the struct...

spatel added inline comments.Oct 9 2016, 11:30 AM
llvm/trunk/include/llvm/Target/TargetLowering.h
2184 ↗(On Diff #73549)

Yes, there's big refactor/rewrite potential here - the current API/implementation is just awful. Fixing it will affect D25291, but it should be easy to adapt if that goes in first.

Let me start working on the refactor since it's not clear how long we'll need this solution. I did look into what the ideal (instruction-level) solution might look like and have a good lead: MD_fpmath can be extended for reciprocal estimates. I'll send a proposal to llvm-dev to see if anyone sees problems with that.