This is an archive of the discontinued LLVM Phabricator instance.

make reciprocal estimate code generation more flexible by adding command-line options
ClosedPublic

Authored by spatel on Apr 10 2015, 1:27 PM.

Diff Detail

Event Timeline

spatel updated this revision to Diff 23616.Apr 10 2015, 1:27 PM
spatel retitled this revision from to [x86] make reciprocal estimate code generation more flexible.
spatel updated this object.
spatel edited the test plan for this revision. (Show Details)
spatel added reviewers: andreadb, alexr, craig.topper.
spatel added a subscriber: Unknown Object (MLST).
hfinkel requested changes to this revision.Apr 11 2015, 4:57 PM
hfinkel edited edge metadata.

Whether or not we're allowed to use estimates for operations that are normally required to be exact, div and sqrt, are not really target features. They're much closer, conceptually, to UnsafeFPMath and friends. Plus, we need these same options for other backends (PowerPC, for example). I'd like to see flags for these added to include/llvm/Target/TargetOptions.h (along with UnsafeFPMath) so that we can handle them in a uniform way.

This revision now requires changes to proceed.Apr 11 2015, 4:57 PM

Hi Hal -

Thanks for the quick feedback. I'm looking at the PPC variations that gcc supports. I'm thinking that we'll need an enum with bitmasks to cover all the potential variations. Just making sure that I'm not going into the weeds...

Also (if maintaining compatibility with gcc is a goal), we'll still need to accept and process different flags per arch.

Hi Hal -

Thanks for the quick feedback. I'm looking at the PPC variations that gcc supports. I'm thinking that we'll need an enum with bitmasks to cover all the potential variations. Just making sure that I'm not going into the weeds...

I think that sounds right.

Also (if maintaining compatibility with gcc is a goal), we'll still need to accept and process different flags per arch.

Not exactly in that sense. If we accept a superset of the options that gcc supports because we apply them to targets/variants that gcc does not, that's fine. This seems pretty target independent, and I'd rather handle it in a uniform way if possible.

spatel updated this revision to Diff 23803.Apr 15 2015, 3:01 PM
spatel edited edge metadata.

After thinking it over, I decided that a simple enum of reciprocal operation bools wasn't going to do the job. In addition to wanting to enable specific ops on specific data types, we have users asking to customize the number of N-R refinements calculated...and they want different counts based on the type of op.

The best way to achieve this level of flexibility (and further customization requests that are sure to follow) is with a 'TargetRecip' class. So that's what I'm proposing to add in this updated patch.

From the llc command-line, you can now do something like this:
$ llc -recip=vec-divf,sqrtd.2

That translates to: allow vector float division and scalar double square root codegen. Use the target default setting for N-R refinement steps for the div but override the refinement steps for the sqrt to be '2'.

To minimize change in the updated x86 backend, I've defaulted to disabling all recip codegen, but for enabled operations, we use 1 N-R step.

This means only users that were targeting AMD Jaguar will see a functional change from this patch (that chip had defaulted both sqrt and div codegen for scalars and vectors on). We can easily change the x86 target defaults in a follow-on patch. We'll also want to update the PowerPC and R600 backends to use this new command-line functionality.

I wanted to add more testing for the llc parameter parsing, but I'm not finding a way to do it. If there's a way to do that with a generic target, please let me know.

I have a dependent clang driver / front-end patch to pass similar command-line params through to the backend in progress.

spatel updated this revision to Diff 23815.Apr 15 2015, 5:28 PM
spatel edited edge metadata.

Patch updated: minor cleanup to spacing, includes, interface, comments.

hfinkel added inline comments.May 14 2015, 12:49 PM
include/llvm/Target/TargetOptions.h
240 ↗(On Diff #23815)

reciprocal estimate -> reciprocal-estimate

include/llvm/Target/TargetRecip.h
34 ↗(On Diff #23815)

To reduce confusion, I think it would be better to have a naming prefix on these. TailCallKind, for example, uses TCK_. Let's stick RO_ on these (including the INVALID one).

34 ↗(On Diff #23815)

If you're using INVALID for the "number of", please name it NUM_RECIP_OPS (or similar).

48 ↗(On Diff #23815)

Let's not use INVALID for "all". You can add an All to the enum (it can even have the same value as INVALID).

lib/Target/TargetRecip.cpp
58 ↗(On Diff #23815)

These strings are user-provided, we can't assert on invalid inputs.

148 ↗(On Diff #23815)

Hrmm, there could be duplicates. Just parse them in order (users may provide duplicates).

spatel added inline comments.May 14 2015, 3:58 PM
include/llvm/Target/TargetRecip.h
34 ↗(On Diff #23815)

With a named enum, I was expecting that users always have to use the prefix "RecipOps::" like I did in the x86 code.

But I see that's not done with TailCallKind...

spatel added inline comments.May 14 2015, 9:06 PM
include/llvm/Target/TargetOptions.h
240 ↗(On Diff #23815)

Fixed.

include/llvm/Target/TargetRecip.h
34 ↗(On Diff #23815)

Fixed.

34 ↗(On Diff #23815)

Fixed.

48 ↗(On Diff #23815)

Fixed - added an RO_All that is equal to RO_NUM_RECIP_OPS.

lib/Target/TargetRecip.cpp
58 ↗(On Diff #23815)

Fixed - replaced with 'report_fatal_error'.

148 ↗(On Diff #23815)

Fixed. Still not sure if there's a decent way to test option parsing for llc, so there are very likely malformed input parsing bugs here.

spatel updated this revision to Diff 25842.May 14 2015, 9:14 PM
spatel retitled this revision from [x86] make reciprocal estimate code generation more flexible to make reciprocal estimate code generation more flexible by adding command-line options.
spatel updated this object.

Patch updated based on feedback from Hal - thanks!

spatel updated this revision to Diff 26013.May 18 2015, 2:17 PM

Patch updated:

  1. Use '!' to indicate disabled (match change in D8989)
  2. Use array of strings as keys to a map instead of enum + loosely-coupled array of recip parameters.
hfinkel added inline comments.May 20 2015, 1:45 PM
include/llvm/Target/TargetRecip.h
66 ↗(On Diff #26013)

function names should start with a lower-case letter.

lib/Target/TargetRecip.cpp
49 ↗(On Diff #26013)

Function names start with a lower-case letter.

112 ↗(On Diff #26013)

DisabledPrefix (this is not a macro, don't name it like one)

132 ↗(On Diff #26013)

Extra space?

136 ↗(On Diff #26013)

Please also try with the 'd' suffix.

spatel added inline comments.May 20 2015, 3:20 PM
include/llvm/Target/TargetRecip.h
66 ↗(On Diff #26013)

Fixed.

lib/Target/TargetRecip.cpp
49 ↗(On Diff #26013)

Fixed.

112 ↗(On Diff #26013)

Fixed.

132 ↗(On Diff #26013)

Fixed. Nice catch. :)

136 ↗(On Diff #26013)

If we matched a 'd' entry but failed 'f', that would be an assertion failure given the logic below. Let me know if you were thinking of something else.

spatel updated this revision to Diff 26185.May 20 2015, 3:24 PM

Patch updated based on Hal's feedback:

  1. Fixed function names to start with lowercase
  2. Fixed variable names to not be all caps
  3. Removed extra space
  4. Added assert for 'f' and 'd' suffix matching
hfinkel added inline comments.May 20 2015, 3:38 PM
lib/Target/X86/X86TargetMachine.cpp
108 ↗(On Diff #26185)

Why false? Do you want a target feature here?

spatel added inline comments.May 20 2015, 3:56 PM
lib/Target/X86/X86TargetMachine.cpp
108 ↗(On Diff #26185)

We had target features to control these, but I think it would be better to behave like gcc unless we have reason to diverge.

That said, this does not match gcc behavior yet; that would be my next patch. For x86 at least, we would turn the following on by default when using -ffast-math:
sqrt
vec-sqrt
vec-div

I didn't set these defaults in this patch because it would change -ffast-math codegen for all CPUs other than btver2 (which had the recip codegen enabled for all eligible x86 recip types via target features).

hfinkel added inline comments.May 20 2015, 3:58 PM
lib/Target/X86/X86TargetMachine.cpp
108 ↗(On Diff #26185)

I thought that getRsqrtEstimate, etc. were only called when fast-math is on?

spatel added inline comments.May 20 2015, 4:11 PM
lib/Target/X86/X86TargetMachine.cpp
108 ↗(On Diff #26185)

That's correct; everything is gated by -ffast-math or an unsafe algebra equivalent. Recip-est codegen should only be active after that check.

With gcc, once you have -ffast-math, you also get -mrecip=sqrt,vec-sqrt,vec-div by default. AFAIK, that is independent of arch or CPU subtarget.

The lone exception is 'div' (scalar division). Estimating scalar div on x86 (no FMA until recently) breaks a lot of real world code, so I want to keep that off by default (and again match gcc default behavior).

hfinkel added inline comments.May 22 2015, 12:16 PM
lib/Target/X86/X86TargetMachine.cpp
108 ↗(On Diff #26185)

But you turn them all off here, right? So that does not match gcc. Please add a comment here explaining what is going on.

spatel added inline comments.May 22 2015, 1:07 PM
lib/Target/X86/X86TargetMachine.cpp
108 ↗(On Diff #26185)

Right - my intent was to separate the structural change from the functional change as much as possible.

We don't currently generate any reciprocal estimates for x86 with -ffast-math except when targeting btver2. So all x86 codegen should be identical after this patch except in the case where someone has specified -mcpu=btver2 -ffast-math.

I'll send the patch to change this default behavior to match GCC defaults as soon as possible, but I wanted to keep that separate in case anyone does not agree that we should change that default behavior. PPC and ARM recip patches would also be independent but similar (assuming they want to match GCC too).

hfinkel accepted this revision.May 22 2015, 1:12 PM
hfinkel edited edge metadata.

LGTM.

lib/Target/X86/X86TargetMachine.cpp
108 ↗(On Diff #26185)

Okay, but please add a comment explaining the current state of things, with a TODO, here (maybe the follow-up with happen soon, but in case there's trouble, the code will be clear in the mean time).

This revision is now accepted and ready to land.May 22 2015, 1:12 PM
spatel updated this revision to Diff 26346.May 22 2015, 1:57 PM
spatel edited edge metadata.

Patch updated:
Added TODO comment to explain why x86 reciprocal estimate is defaulted to 'off' and the reason to change that default soon.

This revision was automatically updated to reflect the committed changes.