This is an archive of the discontinued LLVM Phabricator instance.

[Target] move reciprocal estimate settings from TargetOptions to TargetLowering
ClosedPublic

Authored by spatel on Sep 21 2016, 3:22 PM.

Details

Reviewers

echristo
evandro
hfinkel

Commits

rGbfdbea6481a2: [Target] move reciprocal estimate settings from TargetOptions to TargetLowering
rL283252: [Target] move reciprocal estimate settings from TargetOptions to TargetLowering

Summary

See D24815 for the clang side of this that turns command-line flags into a function attribute string.

The motivation for the change is that we can't have pseudo-global settings for codegen living in TargetOptions because that doesn't work with LTO.

Ideally, these reciprocal attributes will be moved to the instruction-level via FMF, metadata, or something else. But making them function attributes is at least an improvement over the current mess.

The ingredients of this patch are:

Remove the reciprocal estimate command-line debug option.
Add TargetRecip to TargetLowering.
Remove TargetRecip from TargetOptions.
Clean up the TargetRecip implementation to work with this new scheme.
Set the default reciprocal settings in TargetLoweringBase (everything is off).
Update the PowerPC defaults, users, and tests.
Update the x86 defaults, users, and tests.

Event Timeline

spatel updated this revision to Diff 72114. Sep 21 2016, 3:22 PM

spatel retitled this revision to [Target] move reciprocal estimate settings from TargetOptions to TargetLowering.

spatel updated this object.

spatel added reviewers: echristo, evandro, hfinkel.

spatel added a subscriber: llvm-commits.

Herald added subscribers: nemanjai, mehdi_amini, mcrosier. Sep 21 2016, 3:22 PM

spatel added a parent revision: D24815: [clang] make reciprocal estimate codegen a function attribute. Sep 21 2016, 3:36 PM

The question seems to me to be: is it a property of the subtarget (can be switched function-per-function), or is it uniform for a target (or for a module)?

If the latter, then it does not belong to a function attribute.

In D24816#549237, @mehdi_amini wrote:

The question seems to me to be: is it a property of the subtarget (can be switched function-per-function), or is it uniform for a target (or for a module)?

If the latter, then it does not belong to a function attribute.

IMO, it needs even finer granularity than the function - I'd like to get this on instructions similar to other FMF attributes. We have users that want to selectively vary the codegen for sqrt/div within a function using source code pragmas.

In D24816#549237, @mehdi_amini wrote:

The question seems to me to be: is it a property of the subtarget (can be switched function-per-function), or is it uniform for a target (or for a module)?

If the latter, then it does not belong to a function attribute.

So, the interesting thing is that people are using this on a per-subtarget basis for tuning. i.e. cpu X in the presence of this flag tunes one way, CPU Y tunes a different way.

IMO, it needs even finer granularity than the function - I'd like to get this on instructions similar to other FMF attributes. We have users that want to selectively vary the codegen for sqrt/div within a function using source code pragmas.

It seems this patch does not get you any closer to this goal?

In D24816#549293, @mehdi_amini wrote:

IMO, it needs even finer granularity than the function - I'd like to get this on instructions similar to other FMF attributes. We have users that want to selectively vary the codegen for sqrt/div within a function using source code pragmas.

It seems this patch does not get you any closer to this goal?

There are independent goals, but I see this as an intermediate step:

It allows the settings to be different per function.
The settings survive in an LTO build. AFAIK, currently we drop all reciprocal settings with LTO.
It allows rL268539 to be reimplemented. Not sure if there are other barriers for that though?

There are independent goals, but I see this as an intermediate step:

It allows the settings to be different per function.

The technical solution to handle this "per instruction" would be totally different though. These attributes would become obsolete as soon as you have the per instruction solution.
Now if there is no plan to get something per-instruction in the near future, why not.

What happens during inlining with this patch?

The settings survive in an LTO build. AFAIK, currently we drop all reciprocal settings with LTO.

This is incidental and shouldn't be a motivation, i.e. making something a property of the function when it intrinsically is not, just to "fix" LTO, does not seem right to me. The LTO implementation is responsible for properly initializing the backend/codegen.

It allows rL268539 to be reimplemented. Not sure if there are other barriers for that though?

The post-commit thread is interesting reading :)

Note: I'm not against this patch, I'm just raising questions, which you may have already considered :)

In D24816#549338, @mehdi_amini wrote:

There are independent goals, but I see this as an intermediate step:

It allows the settings to be different per function.

The technical solution to handle this "per instruction" would be totally different though. These attributes would become obsolete as soon as you have the per instruction solution.
Now if there is no plan to get something per-instruction in the near future, why not.

These are all excellent questions. I think I'm going to learn something today. :)

I agree that the 'per instruction' solution would obsolete this. I should have made that clearer in the patch summary, and I will make that clear in the commit message if this patch is committed.

So why don't we just get to the ideal solution right now? It's just not high enough on anyone's priority list (including mine) to get it done immediately. Partly, this is because of an issue that you properly raised and I mentioned in the post-commit thread for r268539:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160104/323154.html

So while I'd like to do 'per instruction', I think it's (much?) more time and work than this. It's worth mentioning that I wouldn't have made this patch a priority if I didn't think that all of reciprocal estimate codegen had been issued an existential threat - thanks, Eric! :)

It's possible that I misinterpreted the messages from r268539, but I took the possibility of a PPC revert as a chance that x86 would be the next domino to fall. After reviewing the diffs here, I still don't know why x86 is immune to the LTO-breakage problem if PPC is not. Neither appears to be directly predicating codegen on a CPU model, just CPU subtarget features.

FWIW, I think the AArch64 patch absolutely needs to be re-worked so that it's not checking 'isExynosM1()'. This goes back to Eric's last comment and I think is your primary concern: -mrecip was not intended to be a subtarget-based tuning parameter. It's a programmer-based optimization hint that says, "I want to make a speed/accuracy trade-off for these particular FP operations using these parameters to tune the codegen." At least, that's my understanding: it's a refinement of -ffast-math.

What happens during inlining with this patch?

[spends some quality time in the debugger because I've never looked at how the inliner works]
The caller's attributes are applied to the inlined code. My initial take is that this is legal, but not ideal (i.e., all of -mrecip is supposed to be gated by fast/unsafe math). functionsHaveCompatibleAttributes() or something in the cost model might need updating to improve this?

In D24816#549812, @spatel wrote:

What happens during inlining with this patch?

[spends some quality time in the debugger because I've never looked at how the inliner works]
The caller's attributes are applied to the inlined code. My initial take is that this is legal, but not ideal (i.e., all of -mrecip is supposed to be gated by fast/unsafe math).

Is it legal?
You said that this is a tradeoff speed/precision that is made by the programmer. If the cursor is set on "speed" on the caller, inlining a callee where it is not the case would lead to less precision for the callee code.

In D24816#549822, @mehdi_amini wrote:

In D24816#549812, @spatel wrote:

What happens during inlining with this patch?

[spends some quality time in the debugger because I've never looked at how the inliner works]
The caller's attributes are applied to the inlined code. My initial take is that this is legal, but not ideal (i.e., all of -mrecip is supposed to be gated by fast/unsafe math).

Is it legal?
You said that this is a tradeoff speed/precision that is made by the programmer. If the cursor is set on "speed" on the caller, inlining a callee where it is not the case would lead to less precision for the callee code.

Yes - but that's where the protective cover of -ffast-math comes into play. We were already issued a license to go crazy...just how crazy is up to artistic interpretation. Keep in mind that we already generate reciprocal estimates with -ffast-math (with some guidance from the target about the default codegen). The -mrecip attributes should give us more clues about what the programmer wants, but we make no guarantees about what they'll get.
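For readers unfamiliar with the term: a reciprocal estimate replaces a division with a cheap approximation of 1/a that can optionally be sharpened by Newton-Raphson refinement steps. The sketch below only illustrates that arithmetic and is not LLVM's lowering code. The names `refineReciprocal` and `reciprocalError` are hypothetical, and passing the initial estimate as a parameter is an assumption made for the example (in real codegen it would come from a target instruction).

```cpp
#include <cassert>
#include <cmath>

// Refine an approximation of 1/A with Newton-Raphson steps.
// Each step computes X = X * (2 - A * X). If X = (1 - e) / A, the result
// is (1 - e * e) / A, so every step squares the relative error e.
// Hypothetical helper for illustration only.
double refineReciprocal(double A, double Estimate, unsigned Steps) {
  assert(A != 0.0 && "reciprocal of zero is undefined");
  double X = Estimate;
  for (unsigned I = 0; I < Steps; ++I)
    X = X * (2.0 - A * X);
  return X;
}

// Absolute error of an approximation of 1/A.
double reciprocalError(double A, double Approx) {
  return std::fabs(Approx - 1.0 / A);
}
```

Starting from an estimate of 0.3 for 1/3, one step gives 0.33 and two steps give 0.3333, so the number of refinement steps is the knob that trades speed for accuracy.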

Today we drop the fast-math attributes if we inline a non-fastmath function into a fast-math one, I believe.

I'm not sure how the mrecip works: is it adding refinement only to prevent some specific "fast-math" (i.e. unsafe) transformations?
I.e. if the fast-math flag is dropped, what is the impact of the "mrecip" attribute?

Bonus points: where is it documented?

In D24816#549962, @mehdi_amini wrote:

I'm not sure how the mrecip works: is it adding refinement only to prevent some specific "fast-math" (i.e. unsafe) transformations?

'Prevent' is not the right description. It can be used to inhibit or enhance the effects of fast-math, but it should have no effect without fast-math.

I.e. if the fast-math flag is dropped, what is the impact of the "mrecip" attribute?

If fast-math is off, mrecip should have no impact. We should be able to confirm this independently of this patch.

Here's an IR playpen we can use to walk through some scenarios:

target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.11.0"

define float @foo(float %x) #0 {
  %y = call fast float @bar(float %x)
  ret float %y
}

define float @bar(float %x) #1 {
  %y = call float @baz(float %x)
  ret float %y
}

define float @baz(float %x) #2 {
  %y = fdiv float 1.0, %x
  ret float %y
}

attributes #0 = { "unsafe-fp-math"="false" "mrecip"="divf:0" }
attributes #1 = { "unsafe-fp-math"="true" "mrecip"="divf:1" }
attributes #2 = { "unsafe-fp-math"="true" "mrecip"="divf:0" }

Notice that the backend still largely ignores the instruction-level FMF, so you can delete any 'fast' on the calls or fdiv, and there should be no difference in codegen.

I tried some experiments using something like this:
$ ./opt -inline fdiv.ll -S | ./llc -o -

I haven't seen anything go wrong yet, but I'm still trying. :)
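The scenarios in this playpen can also be modeled outside the compiler. What follows is a rough sketch under three assumptions: "divf:N" means "use an estimate for scalar float division with N refinement steps", the estimate is only used when "unsafe-fp-math" is "true", and inlined code simply takes on the caller's attributes as described earlier in this thread. The types `FnAttrs` and `DivLowering` and both functions are hypothetical and do not exist in LLVM.

```cpp
#include <cassert>

// Hypothetical mirror of the two function attributes used in the playpen.
struct FnAttrs {
  bool UnsafeFPMath; // "unsafe-fp-math"="true" or "false"
  int DivfSteps;     // N from "mrecip"="divf:N"; -1 if divf is not enabled
};

// What a backend would pick for a float fdiv in a given function.
struct DivLowering {
  bool UseEstimate;
  int Steps;
};

// The estimate is only considered when fast-math is on; otherwise the
// mrecip setting has no effect and a real division is emitted.
DivLowering lowerFDiv(const FnAttrs &Attrs) {
  assert(Attrs.DivfSteps >= -1 && "-1 is the only valid negative value");
  if (!Attrs.UnsafeFPMath || Attrs.DivfSteps < 0)
    return {false, 0};
  return {true, Attrs.DivfSteps};
}

// Assumption: after inlining, the callee's body is lowered with the
// caller's attributes, so the callee's own settings are ignored.
DivLowering lowerInlinedFDiv(const FnAttrs &Caller,
                             const FnAttrs & /*Callee*/) {
  return lowerFDiv(Caller);
}
```

Under these assumptions, @baz compiled on its own uses an unrefined estimate, the same fdiv inlined into @bar gets one refinement step, and inlined into @foo it becomes a plain division.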

Bonus points: where is it documented?

That's an easy one - we have no docs. The intent was to mimic and expand on the gcc flag with the same name:
https://gcc.gnu.org/onlinedocs/gcc-6.2.0/gcc/x86-Options.html#x86-Options
https://gcc.gnu.org/onlinedocs/gcc-6.2.0/gcc/RS_002f6000-and-PowerPC-Options.html#RS_002f6000-and-PowerPC-Options

Patch updated:
Add getTargetRecipForFunc() to TargetLoweringBase to eliminate the repeated code in the callers. Now the lowering diffs for PPC and x86 are one-line changes.
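A helper of this kind boils down to "start from the target's defaults, then apply whatever the function attribute says". The following is a minimal sketch of that idea using only the standard library; it is not the code from this patch. The names `RecipTable` and `applyRecipAttribute` and the -1 "disabled" convention are assumptions, and the special tokens all, none, and default are deliberately not handled.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical table: operation name -> refinement steps, -1 = disabled.
using RecipTable = std::map<std::string, int>;

// Apply a comma-separated attribute such as "divf:2,!vec-sqrtd" on top of
// the target defaults passed in Table:
//   "op"   enables op, keeping its step count (0 if it was disabled),
//   "op:N" enables op with N refinement steps,
//   "!op"  disables op.
RecipTable applyRecipAttribute(RecipTable Table, const std::string &Attr) {
  std::istringstream In(Attr);
  std::string Token;
  while (std::getline(In, Token, ',')) {
    if (Token.empty())
      continue;
    if (Token[0] == '!') {
      Table[Token.substr(1)] = -1;
      continue;
    }
    const std::string::size_type Colon = Token.find(':');
    if (Colon == std::string::npos) {
      int &Steps = Table[Token]; // a missing entry is created as 0
      if (Steps < 0)
        Steps = 0;
      continue;
    }
    // std::stoi throws on malformed numbers; a real parser would
    // report an error instead.
    const int Steps = std::stoi(Token.substr(Colon + 1));
    assert(Steps >= 0 && "use !op to disable an operation");
    Table[Token.substr(0, Colon)] = Steps;
  }
  return Table;
}
```

Taking the defaults by value keeps the target's table untouched, which matches the intent that the per-function result is computed on demand rather than stored as global state.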

I wondered how that prerequisite check for fast-math was currently handled:

if (Options.UnsafeFPMath) {

which gets set by:

// FIXME: This function needs to go away for a number of reasons:
// a) global state on the TargetMachine is terrible in general,
// b) there's no default state here to keep,
// c) these target options should be passed only on the function
//    and not on the TargetMachine (via TargetOptions) at all.
void TargetMachine::resetTargetOptions(const Function &F) const {

...which made me think of this bug:
https://llvm.org/bugs/show_bug.cgi?id=23172

The world is broken in a way much bigger than inlining too strict or too loose reciprocal estimate codegen settings, isn't it?

Ping.

In D24816#550786, @spatel wrote:
I wondered how that prerequisite check for fast-math was currently handled:
if (Options.UnsafeFPMath) {
which gets set by:
// FIXME: This function needs to go away for a number of reasons:
// a) global state on the TargetMachine is terrible in general,
// b) there's no default state here to keep,
// c) these target options should be passed only on the function
//    and not on the TargetMachine (via TargetOptions) at all.
void TargetMachine::resetTargetOptions(const Function &F) const {
...which made me think of this bug:
https://llvm.org/bugs/show_bug.cgi?id=23172

The world is broken in a way much bigger than inlining too strict or too loose reciprocal estimate codegen settings, isn't it?

Pretty much. I've been trying to get all of the ones that "matter" out of there as fast as possible.

That said, what -would- you like to do for inlining here? It seems like you're going to want a target routine there on matching. My guess for now is that you actually expect everything to be compiled with the same options and so not inlining based on difference would also be safe. @hfinkel thoughts here?

A couple of comments inline otherwise I'm fine with this implementation for now.

Thanks!

-eric

include/llvm/Target/TargetLowering.h
2178	Seems reasonable - otherwise possible to keep it local in the various routines that want to know about the estimates? Parse there rather than initialize at the beginning? That way the function above that builds up the struct can be the whole interface rather than caching. That said, not sure how often it's called.
lib/CodeGen/TargetLoweringBase.cpp
1488	Bikeshed: mrecip is pretty hard to understand (for me at least) naming wise, reciprocal-estimates while longer might be a bit more comprehensible?

I think that the per-instruction metadata is a neat idea, and opens up a number of interesting and useful possibilities. However, we really should fix the current feature. This LGTM, although you should probably just address your TODO comment before committing unless there is some reason it is not trivial.

include/llvm/Target/TargetRecip.h
51	This TODO does not explain what this means. Do you mean using -1 RefinementSteps instead of having a separate boolean flag? We probably should just do this, the current interface which has "true, Steps" everywhere is not aesthetically pleasing.

This revision is now accepted and ready to land. Oct 2 2016, 7:20 PM

In D24816#558657, @echristo wrote:
In D24816#550786, @spatel wrote:
I wondered how that prerequisite check for fast-math was currently handled:
if (Options.UnsafeFPMath) {
which gets set by:
// FIXME: This function needs to go away for a number of reasons:
// a) global state on the TargetMachine is terrible in general,
// b) there's no default state here to keep,
// c) these target options should be passed only on the function
//    and not on the TargetMachine (via TargetOptions) at all.
void TargetMachine::resetTargetOptions(const Function &F) const {
...which made me think of this bug:
https://llvm.org/bugs/show_bug.cgi?id=23172

The world is broken in a way much bigger than inlining too strict or too loose reciprocal estimate codegen settings, isn't it?
Pretty much. I've been trying to get all of the ones that "matter" out of there as fast as possible.

That said, what -would- you like to do for inlining here? It seems like you're going to want a target routine there on matching. My guess for now is that you actually expect everything to be compiled with the same options and so not inlining based on difference would also be safe. @hfinkel thoughts here?

I think we need to be careful here; users set this option, in some cases, so that they can use fast-math but still get the accuracy they require (by increasing the default number of Newton iterations). We should really not inline functions that require more iterations into functions that require fewer. Obviously using per-instruction metadata fixes this in a much nicer way. For better or worse, this is really only an issue for LTO builds.
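One way to express that rule is a conservative compatibility check, in the spirit of the functionsHaveCompatibleAttributes() hook mentioned earlier in the thread. This is a hedged sketch and not something the patch implements. `RecipSteps`, `atLeastAsAccurate`, `stepsFor`, and `calleeFitsInCaller` are hypothetical names, and -1 (or a missing entry) is assumed to mean "no estimate, use the full-precision operation".

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical table: operation name -> refinement steps, -1 = no estimate.
using RecipSteps = std::map<std::string, int>;

// Rank settings by accuracy: the exact operation (-1) beats any estimate,
// and among estimates more refinement steps is more accurate.
bool atLeastAsAccurate(int Caller, int Callee) {
  assert(Caller >= -1 && Callee >= -1 && "-1 is the only negative value");
  if (Caller < 0)
    return true; // caller uses the exact operation
  if (Callee < 0)
    return false; // callee wants exact, caller would estimate
  return Caller >= Callee;
}

// Look up an operation; operations that are not listed use no estimate.
int stepsFor(const RecipSteps &Map, const std::string &Op) {
  const RecipSteps::const_iterator It = Map.find(Op);
  return It == Map.end() ? -1 : It->second;
}

// Inlining is allowed only if the caller's settings would not reduce the
// accuracy the callee asked for. Only operations the caller lists need a
// check, because every other operation stays exact in the caller.
bool calleeFitsInCaller(const RecipSteps &Caller, const RecipSteps &Callee) {
  for (const auto &Entry : Caller)
    if (!atLeastAsAccurate(Entry.second, stepsFor(Callee, Entry.first)))
      return false;
  return true;
}
```

A check like this errs on the side of not inlining, which costs performance but never silently lowers the accuracy a function was compiled to expect.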

A couple of comments inline otherwise I'm fine with this implementation for now.

Thanks!

-eric

spatel mentioned this in D24815: [clang] make reciprocal estimate codegen a function attribute. Oct 3 2016, 3:49 PM

echristo added inline comments. Oct 3 2016, 5:15 PM

lib/Target/TargetRecip.cpp
157	Do we still need ::set rather than just putting it as part of the constructor?

spatel marked an inline comment as done. Oct 4 2016, 12:36 PM

spatel added inline comments.

include/llvm/Target/TargetLowering.h
2178
include/llvm/Target/TargetRecip.h
51	So yes, I agree this looks ugly, and it seemed simple enough to just combine the fields into a single int value when I added the TODO comment, but on closer inspection, it's not trivial because the user is allowed to specify "default" for the enablement part and still change the number of refinement steps. I think this also mucks with Eric's suggestions to simplify the 'set' method and/or just pull it all into the local users. There's probably some relatively simple way to do this that I'm just not seeing right now, but even if there is, I'd rather not jeopardize the important thing (getting this out of target options) by introducing a bug while refactoring. I've also convinced myself that I just need to get back on the FMF wagon and finish the DAG work and the clang/IR changes, so we can finally be clear of this mess. If that succeeds, then the ugly bits are going to disappear anyway. :)
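The difficulty described here can be seen in a small model. If enablement and step count were folded into one integer with -1 meaning "disabled", there would be no way to say "leave enablement at the target default but use two refinement steps". The sketch below keeps the two fields separate. `Enablement`, `UserRecip`, `ResolvedRecip`, and `resolveRecip` are hypothetical names and not the TargetRecip interface.

```cpp
#include <cassert>

// Three-way choice: the user may defer to the target, or force on/off.
enum class Enablement { Default, Off, On };

// What the user asked for, for one operation (hypothetical layout).
struct UserRecip {
  Enablement Enabled; // Default defers to the target
  int Steps;          // -1 defers to the target's step count
};

// Fully resolved setting, also used for the target's own defaults.
struct ResolvedRecip {
  bool Enabled;
  unsigned Steps;
};

// Each field is overridden independently, which is exactly what a single
// combined integer could not represent.
ResolvedRecip resolveRecip(const UserRecip &User, const ResolvedRecip &Target) {
  assert(User.Steps >= -1 && "-1 is the only valid negative step count");
  ResolvedRecip Result = Target;
  if (User.Enabled != Enablement::Default)
    Result.Enabled = (User.Enabled == Enablement::On);
  if (User.Steps >= 0)
    Result.Steps = static_cast<unsigned>(User.Steps);
  return Result;
}
```

With a target default of "enabled, 1 step", a request of "default enablement, 2 steps" resolves to "enabled, 2 steps", while "off, default steps" resolves to "disabled, 1 step".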

Patch updated:

Expanded the TODO comment in TargetRecip.h to mention the "default" case.
Changed the attribute name from "mrecip" to "reciprocal-estimates" (matches the clang side of the patch).

Closed by commit rL283252: [Target] move reciprocal estimate settings from TargetOptions to TargetLowering (authored by spatel). Oct 4 2016, 1:55 PM

This revision was automatically updated to reflect the committed changes.

echristo added inline comments. Oct 7 2016, 4:23 PM

llvm/trunk/include/llvm/Target/TargetLowering.h
2184 ↗	(On Diff #73549)	I figured with getTargetRecipForFunc we could remove this? It returns the struct...

spatel added inline comments. Oct 9 2016, 11:30 AM

llvm/trunk/include/llvm/Target/TargetLowering.h
2184 ↗	(On Diff #73549)	Yes, there's big refactor/rewrite potential here - the current API/implementation is just awful. Fixing it will affect D25291, but it should be easy to adapt if that goes in first. Let me start working on the refactor since it's not clear how long we'll need this solution. I did look into what the ideal (instruction-level) solution might look like and have a good lead: MD_fpmath can be extended for reciprocal estimates. I'll send a proposal to llvm-dev to see if anyone sees problems with that.

spatel mentioned this in D25440: [Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering. Oct 10 2016, 10:13 AM

spatel mentioned this in rL284495: [Target] remove TargetRecip class; move reciprocal estimate isel functionality…. Oct 18 2016, 10:14 AM

Revision Contents

Path	Size
include/llvm/CodeGen/CommandFlags.h	8 lines
include/llvm/Target/TargetLowering.h	8 lines
include/llvm/Target/TargetOptions.h	6 lines
include/llvm/Target/TargetRecip.h	33 lines
lib/CodeGen/TargetLoweringBase.cpp	16 lines
lib/Target/PowerPC/PPCISelLowering.cpp	21 lines
lib/Target/PowerPC/PPCTargetMachine.cpp	17 lines
lib/Target/TargetRecip.cpp	53 lines
lib/Target/X86/X86ISelLowering.cpp	14 lines
lib/Target/X86/X86TargetMachine.cpp	10 lines
test/CodeGen/PowerPC/recipest.ll	46 lines
test/CodeGen/X86/recip-fastmath.ll	211 lines
test/CodeGen/X86/sqrt-fastmath-mir.ll	4 lines
test/CodeGen/X86/sqrt-fastmath.ll	258 lines

Diff 72291

include/llvm/CodeGen/CommandFlags.h

(21 lines not shown)
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/MC/MCTargetOptionsCommandFlags.h"		#include "llvm/MC/MCTargetOptionsCommandFlags.h"
#include "llvm/MC/SubtargetFeature.h"		#include "llvm/MC/SubtargetFeature.h"
#include "llvm/Support/CodeGen.h"		#include "llvm/Support/CodeGen.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Host.h"		#include "llvm/Support/Host.h"
#include "llvm/Target/TargetMachine.h"		#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetOptions.h"		#include "llvm/Target/TargetOptions.h"
#include "llvm/Target/TargetRecip.h"
#include <string>		#include <string>
using namespace llvm;		using namespace llvm;

cl::opt<std::string>		cl::opt<std::string>
MArch("march", cl::desc("Architecture to generate code for (see --version)"));		MArch("march", cl::desc("Architecture to generate code for (see --version)"));

cl::opt<std::string>		cl::opt<std::string>
MCPU("mcpu",		MCPU("mcpu",
(157 lines not shown)	FuseFPOps("fp-contract",
clEnumValN(FPOpFusion::Fast, "fast",		clEnumValN(FPOpFusion::Fast, "fast",
"Fuse FP ops whenever profitable"),		"Fuse FP ops whenever profitable"),
clEnumValN(FPOpFusion::Standard, "on",		clEnumValN(FPOpFusion::Standard, "on",
"Only fuse 'blessed' FP ops."),		"Only fuse 'blessed' FP ops."),
clEnumValN(FPOpFusion::Strict, "off",		clEnumValN(FPOpFusion::Strict, "off",
"Only fuse FP ops when the result won't be affected."),		"Only fuse FP ops when the result won't be affected."),
clEnumValEnd));		clEnumValEnd));

cl::list<std::string>
ReciprocalOps("recip",
cl::CommaSeparated,
cl::desc("Choose reciprocal operation types and parameters."),
cl::value_desc("all,none,default,divf,!vec-sqrtd,vec-divd:0,sqrt:9..."));

cl::opt<bool>		cl::opt<bool>
DontPlaceZerosInBSS("nozero-initialized-in-bss",		DontPlaceZerosInBSS("nozero-initialized-in-bss",
cl::desc("Don't place zero-initialized symbols into bss section"),		cl::desc("Don't place zero-initialized symbols into bss section"),
cl::init(false));		cl::init(false));

cl::opt<bool>		cl::opt<bool>
EnableGuaranteedTailCallOpt("tailcallopt",		EnableGuaranteedTailCallOpt("tailcallopt",
cl::desc("Turn fastcc calls into tail calls by (potentially) changing ABI."),		cl::desc("Turn fastcc calls into tail calls by (potentially) changing ABI."),
(82 lines not shown)	DebuggerTuningOpt("debugger-tune",
clEnumValEnd));		clEnumValEnd));

// Common utility function tightly tied to the options listed here. Initializes		// Common utility function tightly tied to the options listed here. Initializes
// a TargetOptions object with CodeGen flags and returns it.		// a TargetOptions object with CodeGen flags and returns it.
static inline TargetOptions InitTargetOptionsFromCodeGenFlags() {		static inline TargetOptions InitTargetOptionsFromCodeGenFlags() {
TargetOptions Options;		TargetOptions Options;
Options.LessPreciseFPMADOption = EnableFPMAD;		Options.LessPreciseFPMADOption = EnableFPMAD;
Options.AllowFPOpFusion = FuseFPOps;		Options.AllowFPOpFusion = FuseFPOps;
Options.Reciprocals = TargetRecip(ReciprocalOps);
Options.UnsafeFPMath = EnableUnsafeFPMath;		Options.UnsafeFPMath = EnableUnsafeFPMath;
Options.NoInfsFPMath = EnableNoInfsFPMath;		Options.NoInfsFPMath = EnableNoInfsFPMath;
Options.NoNaNsFPMath = EnableNoNaNsFPMath;		Options.NoNaNsFPMath = EnableNoNaNsFPMath;
Options.NoTrappingFPMath = EnableNoTrappingFPMath;		Options.NoTrappingFPMath = EnableNoTrappingFPMath;
Options.FPDenormalType = DenormalType;		Options.FPDenormalType = DenormalType;
Options.HonorSignDependentRoundingFPMathOption =		Options.HonorSignDependentRoundingFPMathOption =
EnableHonorSignDependentRoundingFPMath;		EnableHonorSignDependentRoundingFPMath;
if (FloatABIForCalls != FloatABI::Default)		if (FloatABIForCalls != FloatABI::Default)
(100 lines not shown)

include/llvm/Target/TargetLowering.h

(55 lines not shown)	namespace llvm {
class MachineLoop;		class MachineLoop;
class MachineRegisterInfo;		class MachineRegisterInfo;
class Mangler;		class Mangler;
class MCContext;		class MCContext;
class MCExpr;		class MCExpr;
class MCSymbol;		class MCSymbol;
template<typename T> class SmallVectorImpl;		template<typename T> class SmallVectorImpl;
class DataLayout;		class DataLayout;
		struct TargetRecip;
class TargetRegisterClass;		class TargetRegisterClass;
class TargetLibraryInfo;		class TargetLibraryInfo;
class TargetLoweringObjectFile;		class TargetLoweringObjectFile;
class Value;		class Value;

namespace Sched {		namespace Sched {
enum Preference {		enum Preference {
None, // No preference		None, // No preference
(464 lines not shown)	while (true) {
VT = getTypeToTransformTo(Context, VT);		VT = getTypeToTransformTo(Context, VT);
break;		break;
default:		default:
llvm_unreachable("Type is not legal nor is it to be expanded!");		llvm_unreachable("Type is not legal nor is it to be expanded!");
}		}
}		}
}		}

		/// Return the reciprocal estimate code generation preferences for this target
		/// after potentially overriding settings using the function's attributes.
		/// FIXME: Like all unsafe-math target settings, this should really be an
		/// instruction-level attribute/metadata/FMF.
		TargetRecip getTargetRecipForFunc(MachineFunction &MF) const;

/// Vector types are broken down into some number of legal first class types.		/// Vector types are broken down into some number of legal first class types.
/// For example, EVT::v8f32 maps to 2 EVT::v4f32 with Altivec or SSE1, or 8		/// For example, EVT::v8f32 maps to 2 EVT::v4f32 with Altivec or SSE1, or 8
/// promoted EVT::f64 values with the X86 FP stack. Similarly, EVT::v2i64		/// promoted EVT::f64 values with the X86 FP stack. Similarly, EVT::v2i64
/// turns into 4 EVT::i32 values with both PPC and X86.		/// turns into 4 EVT::i32 values with both PPC and X86.
///		///
/// This method returns the number of registers needed, and the VT for each		/// This method returns the number of registers needed, and the VT for each
/// register. It also returns the VT and quantity of the intermediate values		/// register. It also returns the VT and quantity of the intermediate values
/// before they are promoted/expanded.		/// before they are promoted/expanded.
(1,611 lines not shown)	protected:
/// Return true if the value types that can be represented by the specified		/// Return true if the value types that can be represented by the specified
/// register class are all legal.		/// register class are all legal.
bool isLegalRC(const TargetRegisterClass *RC) const;		bool isLegalRC(const TargetRegisterClass *RC) const;

/// Replace/modify any TargetFrameIndex operands with a targte-dependent		/// Replace/modify any TargetFrameIndex operands with a targte-dependent
/// sequence of memory operands that is recognized by PrologEpilogInserter.		/// sequence of memory operands that is recognized by PrologEpilogInserter.
MachineBasicBlock *emitPatchPoint(MachineInstr &MI,		MachineBasicBlock *emitPatchPoint(MachineInstr &MI,
MachineBasicBlock *MBB) const;		MachineBasicBlock *MBB) const;
		TargetRecip ReciprocalEstimates;
		echristo: Seems reasonable - otherwise possible to keep it local in the various routines that want to know about the estimates? Parse there rather than initialize at the beginning? That way the function above that builds up the struct can be the whole interface rather than caching. That said, not sure how often it's called.
};		};

/// This class defines information used to lower LLVM code to legal SelectionDAG		/// This class defines information used to lower LLVM code to legal SelectionDAG
/// operators that the target instruction selector can accept natively.		/// operators that the target instruction selector can accept natively.
///		///
/// This class also defines callbacks that targets must implement to lower		/// This class also defines callbacks that targets must implement to lower
/// target-specific constructs to SelectionDAG operators.		/// target-specific constructs to SelectionDAG operators.
class TargetLowering : public TargetLoweringBase {		class TargetLowering : public TargetLoweringBase {
(910 lines not shown)

include/llvm/Target/TargetOptions.h

(105 lines not shown)	TargetOptions()
HonorSignDependentRoundingFPMathOption(false), NoZerosInBSS(false),		HonorSignDependentRoundingFPMathOption(false), NoZerosInBSS(false),
GuaranteedTailCallOpt(false), StackAlignmentOverride(0),		GuaranteedTailCallOpt(false), StackAlignmentOverride(0),
StackSymbolOrdering(true), EnableFastISel(false), UseInitArray(false),		StackSymbolOrdering(true), EnableFastISel(false), UseInitArray(false),
DisableIntegratedAS(false), CompressDebugSections(false),		DisableIntegratedAS(false), CompressDebugSections(false),
RelaxELFRelocations(false), FunctionSections(false),		RelaxELFRelocations(false), FunctionSections(false),
DataSections(false), UniqueSectionNames(true), TrapUnreachable(false),		DataSections(false), UniqueSectionNames(true), TrapUnreachable(false),
EmulatedTLS(false), EnableIPRA(false),		EmulatedTLS(false), EnableIPRA(false),
FloatABIType(FloatABI::Default),		FloatABIType(FloatABI::Default),
AllowFPOpFusion(FPOpFusion::Standard), Reciprocals(TargetRecip()),		AllowFPOpFusion(FPOpFusion::Standard),
JTType(JumpTable::Single), ThreadModel(ThreadModel::POSIX),		JTType(JumpTable::Single), ThreadModel(ThreadModel::POSIX),
EABIVersion(EABI::Default), DebuggerTuning(DebuggerKind::Default),		EABIVersion(EABI::Default), DebuggerTuning(DebuggerKind::Default),
FPDenormalType(FPDenormal::IEEE),		FPDenormalType(FPDenormal::IEEE),
ExceptionModel(ExceptionHandling::None) {}		ExceptionModel(ExceptionHandling::None) {}

/// PrintMachineCode - This flag is enabled when the -print-machineinstrs		/// PrintMachineCode - This flag is enabled when the -print-machineinstrs
/// option is specified on the command line, and should enable debugging		/// option is specified on the command line, and should enable debugging
/// output from the code generator.		/// output from the code generator.
(124 lines not shown)	public:
/// precision won't effect the result.		/// precision won't effect the result.
///		///
/// Note: This option only controls formation of fused ops by the		/// Note: This option only controls formation of fused ops by the
/// optimizers. Fused operations that are explicitly specified (e.g. FMA		/// optimizers. Fused operations that are explicitly specified (e.g. FMA
/// via the llvm.fma.* intrinsic) will always be honored, regardless of		/// via the llvm.fma.* intrinsic) will always be honored, regardless of
/// the value of this option.		/// the value of this option.
FPOpFusion::FPOpFusionMode AllowFPOpFusion;		FPOpFusion::FPOpFusionMode AllowFPOpFusion;

/// This class encapsulates options for reciprocal-estimate code generation.
TargetRecip Reciprocals;

/// JTType - This flag specifies the type of jump-instruction table to		/// JTType - This flag specifies the type of jump-instruction table to
/// create for functions that have the jumptable attribute.		/// create for functions that have the jumptable attribute.
JumpTable::JumpTableType JTType;		JumpTable::JumpTableType JTType;

/// ThreadModel - This flag specifies the type of threading model to assume		/// ThreadModel - This flag specifies the type of threading model to assume
/// for things like atomics		/// for things like atomics
ThreadModel::Model ThreadModel;		ThreadModel::Model ThreadModel;

[30 lines hidden]	return
ARE_EQUAL(GuaranteedTailCallOpt) &&		ARE_EQUAL(GuaranteedTailCallOpt) &&
ARE_EQUAL(StackAlignmentOverride) &&		ARE_EQUAL(StackAlignmentOverride) &&
ARE_EQUAL(EnableFastISel) &&		ARE_EQUAL(EnableFastISel) &&
ARE_EQUAL(UseInitArray) &&		ARE_EQUAL(UseInitArray) &&
ARE_EQUAL(TrapUnreachable) &&		ARE_EQUAL(TrapUnreachable) &&
ARE_EQUAL(EmulatedTLS) &&		ARE_EQUAL(EmulatedTLS) &&
ARE_EQUAL(FloatABIType) &&		ARE_EQUAL(FloatABIType) &&
ARE_EQUAL(AllowFPOpFusion) &&		ARE_EQUAL(AllowFPOpFusion) &&
ARE_EQUAL(Reciprocals) &&
ARE_EQUAL(JTType) &&		ARE_EQUAL(JTType) &&
ARE_EQUAL(ThreadModel) &&		ARE_EQUAL(ThreadModel) &&
ARE_EQUAL(EABIVersion) &&		ARE_EQUAL(EABIVersion) &&
ARE_EQUAL(DebuggerTuning) &&		ARE_EQUAL(DebuggerTuning) &&
ARE_EQUAL(FPDenormalType) &&		ARE_EQUAL(FPDenormalType) &&
ARE_EQUAL(ExceptionModel) &&		ARE_EQUAL(ExceptionModel) &&
ARE_EQUAL(MCOptions) &&		ARE_EQUAL(MCOptions) &&
ARE_EQUAL(EnableIPRA);		ARE_EQUAL(EnableIPRA);
[11 lines hidden]

include/llvm/Target/TargetRecip.h

	[11 lines hidden]
	// If a target does not support operations in this specification, then code			// If a target does not support operations in this specification, then code
	// generation will default to using supported operations.			// generation will default to using supported operations.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_TARGET_TARGETRECIP_H			#ifndef LLVM_TARGET_TARGETRECIP_H
	#define LLVM_TARGET_TARGETRECIP_H			#define LLVM_TARGET_TARGETRECIP_H

	#include "llvm/ADT/StringRef.h"
	#include <cstdint>			#include <cstdint>
	#include <map>			#include <map>
	#include <string>			#include <string>
	#include <vector>			#include <vector>

	namespace llvm {			namespace llvm {

				class StringRef;

	struct TargetRecip {			struct TargetRecip {
	public:			public:
	TargetRecip();			TargetRecip();

	/// Initialize all or part of the operations from command-line options or			/// Parse a comma-separated string of reciprocal settings to set values in
	/// a front end.			/// this struct.
	TargetRecip(const std::vector<std::string> &Args);			void set(StringRef &Args);

	/// Set whether a particular reciprocal operation is enabled and how many			/// Set enablement and refinement steps for a particular reciprocal operation.
	/// refinement steps are needed when using it. Use "all" to set enablement			/// Use "all" to give all operations the same values.
	/// and refinement steps for all operations.			void set(StringRef Key, bool Enable, unsigned RefSteps);
	void setDefaults(StringRef Key, bool Enable, unsigned RefSteps);
				/// Return true if the reciprocal operation has been enabled.
	/// Return true if the reciprocal operation has been enabled by default or
	/// from the command-line. Return false if the operation has been disabled
	/// by default or from the command-line.
	bool isEnabled(StringRef Key) const;			bool isEnabled(StringRef Key) const;

	/// Return the number of iterations necessary to refine the			/// Return the number of iterations necessary to refine the
	/// the result of a machine instruction for the given reciprocal operation.			/// the result of a machine instruction for the given reciprocal operation.
	unsigned getRefinementSteps(StringRef Key) const;			unsigned getRefinementSteps(StringRef Key) const;

	bool operator==(const TargetRecip &Other) const;			bool operator==(const TargetRecip &Other) const;

	private:			private:
	enum {			// TODO: Define 'NotEnabled' as -1 and simplify this?
				hfinkel: This TODO does not explain what this means. Do you mean using -1 RefinementSteps instead of having a separate boolean flag? We probably should just do this, the current interface which has "true, Steps" everywhere is not aesthetically pleasing.
				spatel: So yes, I agree this looks ugly, and it seemed simple enough to just combine the fields into a single int value when I added the TODO comment, but on closer inspection, it's not trivial because the user is allowed to specify "default" for the enablement part and still change the number of refinement steps. I think this also mucks with Eric's suggestions to simplify the 'set' method and/or just pull it all into the local users. There's probably some relatively simple way to do this that I'm just not seeing right now, but even if there is, I'd rather not jeopardize the important thing (getting this out of target options) by introducing a bug while refactoring. I've also convinced myself that I just need to get back on the FMF wagon and finish the DAG work and the clang/IR changes, so we can finally be clear of this mess. If that succeeds, then the ugly bits are going to disappear anyway. :)
	Uninitialized = -1
	};

	struct RecipParams {			struct RecipParams {
	int8_t Enabled;			bool Enabled;
	int8_t RefinementSteps;			int8_t RefinementSteps;

	RecipParams() : Enabled(Uninitialized), RefinementSteps(Uninitialized) {}			RecipParams() : Enabled(false), RefinementSteps(0) {}
	};			};

	std::map<StringRef, RecipParams> RecipMap;			std::map<StringRef, RecipParams> RecipMap;
	typedef std::map<StringRef, RecipParams>::iterator RecipIter;			typedef std::map<StringRef, RecipParams>::iterator RecipIter;
	typedef std::map<StringRef, RecipParams>::const_iterator ConstRecipIter;			typedef std::map<StringRef, RecipParams>::const_iterator ConstRecipIter;

	bool parseGlobalParams(const std::string &Arg);			bool parseGlobalParams(const std::string &Arg);
	void parseIndividualParams(const std::vector<std::string> &Args);			void parseIndividualParams(const std::vector<std::string> &Args);
	};			};

	} // end namespace llvm			} // end namespace llvm

	#endif // LLVM_TARGET_TARGETRECIP_H			#endif // LLVM_TARGET_TARGETRECIP_H

lib/CodeGen/TargetLoweringBase.cpp

[828 lines hidden]	TargetLoweringBase::TargetLoweringBase(const TargetMachine &tm) : TM(tm) {

MinCmpXchgSizeInBits = 0;		MinCmpXchgSizeInBits = 0;

std::fill(std::begin(LibcallRoutineNames), std::end(LibcallRoutineNames), nullptr);		std::fill(std::begin(LibcallRoutineNames), std::end(LibcallRoutineNames), nullptr);

InitLibcallNames(LibcallRoutineNames, TM.getTargetTriple());		InitLibcallNames(LibcallRoutineNames, TM.getTargetTriple());
InitCmpLibcallCCs(CmpLibcallCCs);		InitCmpLibcallCCs(CmpLibcallCCs);
InitLibcallCallingConvs(LibcallCallingConvs);		InitLibcallCallingConvs(LibcallCallingConvs);
		ReciprocalEstimates.set("all", false, 0);
}		}

void TargetLoweringBase::initActions() {		void TargetLoweringBase::initActions() {
// All operations default to being supported.		// All operations default to being supported.
memset(OpActions, 0, sizeof(OpActions));		memset(OpActions, 0, sizeof(OpActions));
memset(LoadExtActions, 0, sizeof(LoadExtActions));		memset(LoadExtActions, 0, sizeof(LoadExtActions));
memset(TruncStoreActions, 0, sizeof(TruncStoreActions));		memset(TruncStoreActions, 0, sizeof(TruncStoreActions));
memset(IndexedModeActions, 0, sizeof(IndexedModeActions));		memset(IndexedModeActions, 0, sizeof(IndexedModeActions));
[631 lines hidden]	EVT TargetLoweringBase::getSetCCResultType(const DataLayout &DL, LLVMContext &,
assert(!VT.isVector() && "No default SetCC type for vectors!");		assert(!VT.isVector() && "No default SetCC type for vectors!");
return getPointerTy(DL).SimpleTy;		return getPointerTy(DL).SimpleTy;
}		}

MVT::SimpleValueType TargetLoweringBase::getCmpLibcallReturnType() const {		MVT::SimpleValueType TargetLoweringBase::getCmpLibcallReturnType() const {
return MVT::i32; // return the default value		return MVT::i32; // return the default value
}		}

		TargetRecip
		TargetLoweringBase::getTargetRecipForFunc(MachineFunction &MF) const {
		const Function *F = MF.getFunction();
		if (!F->hasFnAttribute("mrecip"))
		echristo: Bikeshed: mrecip is pretty hard to understand (for me at least) naming wise, reciprocal-estimates while longer might be a bit more comprehensible?
		return ReciprocalEstimates;

		// Make a copy of the target's default reciprocal codegen settings.
		TargetRecip Recips = ReciprocalEstimates;

		// Override any settings that are customized for this function.
		StringRef RecipString = F->getFnAttribute("mrecip").getValueAsString();
		Recips.set(RecipString);
		return Recips;
		}

/// getVectorTypeBreakdown - Vector types are broken down into some number of		/// getVectorTypeBreakdown - Vector types are broken down into some number of
/// legal first class types. For example, MVT::v8f32 maps to 2 MVT::v4f32		/// legal first class types. For example, MVT::v8f32 maps to 2 MVT::v4f32
/// with Altivec or SSE1, or 8 promoted MVT::f64 values with the X86 FP stack.		/// with Altivec or SSE1, or 8 promoted MVT::f64 values with the X86 FP stack.
/// Similarly, MVT::v2i64 turns into 4 MVT::i32 values with both PPC and X86.		/// Similarly, MVT::v2i64 turns into 4 MVT::i32 values with both PPC and X86.
///		///
/// This method returns the number of registers needed, and the VT for each		/// This method returns the number of registers needed, and the VT for each
/// register. It also returns the VT and quantity of the intermediate values		/// register. It also returns the VT and quantity of the intermediate values
/// before they are promoted/expanded.		/// before they are promoted/expanded.
[342 lines hidden]
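
The copy-then-override flow in getTargetRecipForFunc above can be modeled outside LLVM roughly as follows. This is a minimal standalone sketch, not the patch's code: RecipSettings, applyOverrides, and settingsForFunction are hypothetical names, std::string stands in for StringRef, and the simplified "op[:steps]" / "!op" override syntax is an assumption (only the "op:steps" form appears in this review's test RUN lines).

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Simplified stand-in for TargetRecip::RecipParams (hypothetical model).
struct RecipParams {
  bool Enabled = false;
  unsigned RefinementSteps = 0;
};

using RecipSettings = std::map<std::string, RecipParams>;

// Apply comma-separated overrides of the assumed form "op[:steps]" or "!op".
// Unlike the real parser, unknown operation names are silently inserted
// instead of triggering a fatal error.
void applyOverrides(RecipSettings &Settings, const std::string &Attr) {
  std::stringstream Stream(Attr);
  std::string Item;
  while (std::getline(Stream, Item, ',')) {
    if (Item.empty())
      continue;
    bool Disable = Item[0] == '!';
    if (Disable)
      Item.erase(0, 1);
    std::string::size_type Colon = Item.find(':');
    std::string Op = Item.substr(0, Colon);
    assert(!Op.empty() && "operation name expected");
    RecipParams &RP = Settings[Op];
    RP.Enabled = !Disable;
    // Only touch the step count when one was given explicitly.
    if (Colon != std::string::npos)
      RP.RefinementSteps =
          static_cast<unsigned>(std::stoul(Item.substr(Colon + 1)));
  }
}

// Return the target defaults unchanged when the function carries no
// attribute (FnAttr == nullptr); otherwise copy the defaults and override
// them. The defaults themselves are never mutated.
RecipSettings settingsForFunction(const RecipSettings &TargetDefaults,
                                  const std::string *FnAttr) {
  if (!FnAttr)
    return TargetDefaults;
  RecipSettings Result = TargetDefaults;
  applyOverrides(Result, *FnAttr);
  return Result;
}
```

Because the result is a copy, two functions in the same module (for example after LTO merges translation units) can carry different attribute strings without affecting each other or the target's defaults, which is the property the summary says TargetOptions could not provide.
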

lib/Target/PowerPC/PPCISelLowering.cpp


[884 lines hidden]	PPCTargetLowering::PPCTargetLowering(const PPCTargetMachine &TM,
}		}

// Use reciprocal estimates.		// Use reciprocal estimates.
if (TM.Options.UnsafeFPMath) {		if (TM.Options.UnsafeFPMath) {
setTargetDAGCombine(ISD::FDIV);		setTargetDAGCombine(ISD::FDIV);
setTargetDAGCombine(ISD::FSQRT);		setTargetDAGCombine(ISD::FSQRT);
}		}

		// For the estimates, convergence is quadratic, so we essentially double the
		// number of digits correct after every iteration. For both FRE and FRSQRTE,
		// the minimum architected relative accuracy is 2^-5. When hasRecipPrec(),
		// this is 2^-14. IEEE float has 23 digits and double has 52 digits.
		unsigned RefinementSteps = Subtarget.hasRecipPrec() ? 1 : 3,
		RefinementSteps64 = RefinementSteps + 1;

		ReciprocalEstimates.set("sqrtf", true, RefinementSteps);
		ReciprocalEstimates.set("vec-sqrtf", true, RefinementSteps);
		ReciprocalEstimates.set("divf", true, RefinementSteps);
		ReciprocalEstimates.set("vec-divf", true, RefinementSteps);

		ReciprocalEstimates.set("sqrtd", true, RefinementSteps64);
		ReciprocalEstimates.set("vec-sqrtd", true, RefinementSteps64);
		ReciprocalEstimates.set("divd", true, RefinementSteps64);
		ReciprocalEstimates.set("vec-divd", true, RefinementSteps64);

// Darwin long double math library functions have $LDBL128 appended.		// Darwin long double math library functions have $LDBL128 appended.
if (Subtarget.isDarwin()) {		if (Subtarget.isDarwin()) {
setLibcallName(RTLIB::COS_PPCF128, "cosl$LDBL128");		setLibcallName(RTLIB::COS_PPCF128, "cosl$LDBL128");
setLibcallName(RTLIB::POW_PPCF128, "powl$LDBL128");		setLibcallName(RTLIB::POW_PPCF128, "powl$LDBL128");
setLibcallName(RTLIB::REM_PPCF128, "fmodl$LDBL128");		setLibcallName(RTLIB::REM_PPCF128, "fmodl$LDBL128");
setLibcallName(RTLIB::SIN_PPCF128, "sinl$LDBL128");		setLibcallName(RTLIB::SIN_PPCF128, "sinl$LDBL128");
setLibcallName(RTLIB::SQRT_PPCF128, "sqrtl$LDBL128");		setLibcallName(RTLIB::SQRT_PPCF128, "sqrtl$LDBL128");
setLibcallName(RTLIB::LOG_PPCF128, "logl$LDBL128");		setLibcallName(RTLIB::LOG_PPCF128, "logl$LDBL128");
[8,705 lines hidden]	SDValue PPCTargetLowering::getRsqrtEstimate(SDValue Operand,
bool &UseOneConstNR) const {		bool &UseOneConstNR) const {
EVT VT = Operand.getValueType();		EVT VT = Operand.getValueType();
if ((VT == MVT::f32 && Subtarget.hasFRSQRTES()) ||		if ((VT == MVT::f32 && Subtarget.hasFRSQRTES()) ||
(VT == MVT::f64 && Subtarget.hasFRSQRTE()) ||		(VT == MVT::f64 && Subtarget.hasFRSQRTE()) ||
(VT == MVT::v4f32 && Subtarget.hasAltivec()) ||		(VT == MVT::v4f32 && Subtarget.hasAltivec()) ||
(VT == MVT::v2f64 && Subtarget.hasVSX()) ||		(VT == MVT::v2f64 && Subtarget.hasVSX()) ||
(VT == MVT::v4f32 && Subtarget.hasQPX()) ||		(VT == MVT::v4f32 && Subtarget.hasQPX()) ||
(VT == MVT::v4f64 && Subtarget.hasQPX())) {		(VT == MVT::v4f64 && Subtarget.hasQPX())) {
TargetRecip Recips = DCI.DAG.getTarget().Options.Reciprocals;		TargetRecip Recips = getTargetRecipForFunc(DCI.DAG.getMachineFunction());
std::string RecipOp = getRecipOp("sqrt", VT);		std::string RecipOp = getRecipOp("sqrt", VT);
if (!Recips.isEnabled(RecipOp))		if (!Recips.isEnabled(RecipOp))
return SDValue();		return SDValue();

RefinementSteps = Recips.getRefinementSteps(RecipOp);		RefinementSteps = Recips.getRefinementSteps(RecipOp);
UseOneConstNR = true;		UseOneConstNR = true;
return DCI.DAG.getNode(PPCISD::FRSQRTE, SDLoc(Operand), VT, Operand);		return DCI.DAG.getNode(PPCISD::FRSQRTE, SDLoc(Operand), VT, Operand);
}		}
return SDValue();		return SDValue();
}		}

SDValue PPCTargetLowering::getRecipEstimate(SDValue Operand,		SDValue PPCTargetLowering::getRecipEstimate(SDValue Operand,
DAGCombinerInfo &DCI,		DAGCombinerInfo &DCI,
unsigned &RefinementSteps) const {		unsigned &RefinementSteps) const {
EVT VT = Operand.getValueType();		EVT VT = Operand.getValueType();
if ((VT == MVT::f32 && Subtarget.hasFRES()) ||		if ((VT == MVT::f32 && Subtarget.hasFRES()) ||
(VT == MVT::f64 && Subtarget.hasFRE()) ||		(VT == MVT::f64 && Subtarget.hasFRE()) ||
(VT == MVT::v4f32 && Subtarget.hasAltivec()) ||		(VT == MVT::v4f32 && Subtarget.hasAltivec()) ||
(VT == MVT::v2f64 && Subtarget.hasVSX()) ||		(VT == MVT::v2f64 && Subtarget.hasVSX()) ||
(VT == MVT::v4f32 && Subtarget.hasQPX()) ||		(VT == MVT::v4f32 && Subtarget.hasQPX()) ||
(VT == MVT::v4f64 && Subtarget.hasQPX())) {		(VT == MVT::v4f64 && Subtarget.hasQPX())) {
TargetRecip Recips = DCI.DAG.getTarget().Options.Reciprocals;		TargetRecip Recips = getTargetRecipForFunc(DCI.DAG.getMachineFunction());
std::string RecipOp = getRecipOp("div", VT);		std::string RecipOp = getRecipOp("div", VT);
if (!Recips.isEnabled(RecipOp))		if (!Recips.isEnabled(RecipOp))
return SDValue();		return SDValue();

RefinementSteps = Recips.getRefinementSteps(RecipOp);		RefinementSteps = Recips.getRefinementSteps(RecipOp);
return DCI.DAG.getNode(PPCISD::FRE, SDLoc(Operand), VT, Operand);		return DCI.DAG.getNode(PPCISD::FRE, SDLoc(Operand), VT, Operand);
}		}
return SDValue();		return SDValue();
[2,652 lines hidden]
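
The refinement-step counts chosen in the PowerPC constructor comment above ("convergence is quadratic") can be sanity-checked with a small calculation. The following is a hedged sketch: refinementStepsFor is a hypothetical helper that is not part of the patch, and it assumes an idealized model in which each Newton-Raphson iteration exactly doubles the number of correct bits.

```cpp
#include <cassert>

// Hypothetical helper (not in the patch): number of refinement iterations
// needed to go from an estimate accurate to `InitialBits` bits to at least
// `NeededBits` bits, assuming each iteration doubles the correct bits.
unsigned refinementStepsFor(unsigned InitialBits, unsigned NeededBits) {
  assert(InitialBits > 0 && "estimate must have at least one correct bit");
  unsigned Steps = 0;
  for (unsigned Bits = InitialBits; Bits < NeededBits; Bits *= 2)
    ++Steps;
  return Steps;
}
```

Under that model, a 2^-5 estimate needs 3 steps for float (5, 10, 20, 40 bits against 23) and 4 for double (80 bits against 52), while a 2^-14 estimate needs 1 step for float (28 bits) and 2 for double (56 bits). This agrees with `hasRecipPrec() ? 1 : 3` and `RefinementSteps64 = RefinementSteps + 1` in the diff.
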

lib/Target/PowerPC/PPCTargetMachine.cpp

[198 lines hidden]	PPCTargetMachine::PPCTargetMachine(const Target &T, const Triple &TT,
CodeModel::Model CM, CodeGenOpt::Level OL)		CodeModel::Model CM, CodeGenOpt::Level OL)
: LLVMTargetMachine(T, getDataLayoutString(TT), TT, CPU,		: LLVMTargetMachine(T, getDataLayoutString(TT), TT, CPU,
computeFSAdditions(FS, OL, TT), Options,		computeFSAdditions(FS, OL, TT), Options,
getEffectiveRelocModel(TT, RM), CM, OL),		getEffectiveRelocModel(TT, RM), CM, OL),
TLOF(createTLOF(getTargetTriple())),		TLOF(createTLOF(getTargetTriple())),
TargetABI(computeTargetABI(TT, Options)),		TargetABI(computeTargetABI(TT, Options)),
Subtarget(TargetTriple, CPU, computeFSAdditions(FS, OL, TT), *this) {		Subtarget(TargetTriple, CPU, computeFSAdditions(FS, OL, TT), *this) {

// For the estimates, convergence is quadratic, so we essentially double the
// number of digits correct after every iteration. For both FRE and FRSQRTE,
// the minimum architected relative accuracy is 2^-5. When hasRecipPrec(),
// this is 2^-14. IEEE float has 23 digits and double has 52 digits.
unsigned RefinementSteps = Subtarget.hasRecipPrec() ? 1 : 3,
RefinementSteps64 = RefinementSteps + 1;

this->Options.Reciprocals.setDefaults("sqrtf", true, RefinementSteps);
this->Options.Reciprocals.setDefaults("vec-sqrtf", true, RefinementSteps);
this->Options.Reciprocals.setDefaults("divf", true, RefinementSteps);
this->Options.Reciprocals.setDefaults("vec-divf", true, RefinementSteps);

this->Options.Reciprocals.setDefaults("sqrtd", true, RefinementSteps64);
this->Options.Reciprocals.setDefaults("vec-sqrtd", true, RefinementSteps64);
this->Options.Reciprocals.setDefaults("divd", true, RefinementSteps64);
this->Options.Reciprocals.setDefaults("vec-divd", true, RefinementSteps64);

initAsmInfo();		initAsmInfo();
}		}

PPCTargetMachine::~PPCTargetMachine() {}		PPCTargetMachine::~PPCTargetMachine() {}

void PPC32TargetMachine::anchor() { }		void PPC32TargetMachine::anchor() { }

PPC32TargetMachine::PPC32TargetMachine(const Target &T, const Triple &TT,		PPC32TargetMachine::PPC32TargetMachine(const Target &T, const Triple &TT,
[211 lines hidden]

lib/Target/TargetRecip.cpp

[10 lines hidden]
// generation in a target-independent way.		// generation in a target-independent way.
// If a target does not support operations in this specification, then code		// If a target does not support operations in this specification, then code
// generation will default to using supported operations.		// generation will default to using supported operations.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Target/TargetRecip.h"		#include "llvm/Target/TargetRecip.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
		#include "llvm/ADT/StringExtras.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
		#include "llvm/ADT/SmallVector.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"

using namespace llvm;		using namespace llvm;

// These are the names of the individual reciprocal operations. These are		// These are the names of the individual reciprocal operations. These are
// the key strings for queries and command-line inputs.		// the key strings for queries and command-line inputs.
// In addition, the command-line interface recognizes the global parameters		// In addition, the command-line interface recognizes the global parameters
// "all", "none", and "default".		// "all", "none", and "default".
static const char *const RecipOps[] = {		static const char *const RecipOps[] = {
"divd",		"divd",
"divf",		"divf",
"vec-divd",		"vec-divd",
"vec-divf",		"vec-divf",
"sqrtd",		"sqrtd",
"sqrtf",		"sqrtf",
"vec-sqrtd",		"vec-sqrtd",
"vec-sqrtf",		"vec-sqrtf",
};		};

// The uninitialized state is needed for the enabled settings and refinement		/// All operations are disabled by default and refinement steps are set to zero.
// steps because custom settings may arrive via the command-line before target
// defaults are set.
TargetRecip::TargetRecip() {		TargetRecip::TargetRecip() {
unsigned NumStrings = llvm::array_lengthof(RecipOps);		unsigned NumStrings = llvm::array_lengthof(RecipOps);
for (unsigned i = 0; i < NumStrings; ++i)		for (unsigned i = 0; i < NumStrings; ++i)
RecipMap.insert(std::make_pair(RecipOps[i], RecipParams()));		RecipMap.insert(std::make_pair(RecipOps[i], RecipParams()));
}		}

static bool parseRefinementStep(StringRef In, size_t &Position,		static bool parseRefinementStep(StringRef In, size_t &Position,
uint8_t &Value) {		uint8_t &Value) {
[82 lines hidden]	for (unsigned i = 0; i != NumArgs; ++i) {
if (Iter == RecipMap.end()) {		if (Iter == RecipMap.end()) {
// Try again specifying float suffix.		// Try again specifying float suffix.
Iter = RecipMap.find(Val.str() + 'f');		Iter = RecipMap.find(Val.str() + 'f');
if (Iter == RecipMap.end()) {		if (Iter == RecipMap.end()) {
Iter = RecipMap.find(Val.str() + 'd');		Iter = RecipMap.find(Val.str() + 'd');
assert(Iter == RecipMap.end() && "Float entry missing from map");		assert(Iter == RecipMap.end() && "Float entry missing from map");
report_fatal_error("Invalid option for -recip.");		report_fatal_error("Invalid option for -recip.");
}		}

// The option was specified without a float or double suffix.
if (RecipMap[Val.str() + 'd'].Enabled != Uninitialized) {
// Make sure that the double entry was not already specified.
// The float entry will be checked below.
report_fatal_error("Duplicate option for -recip.");
}
}		}

if (Iter->second.Enabled != Uninitialized)
report_fatal_error("Duplicate option for -recip.");

// Mark the matched option as found. Do not allow duplicate specifiers.		// Mark the matched option as found. Do not allow duplicate specifiers.
Iter->second.Enabled = !IsDisabled;		Iter->second.Enabled = !IsDisabled;
if (!RefStepString.empty())		if (!RefStepString.empty())
Iter->second.RefinementSteps = RefSteps;		Iter->second.RefinementSteps = RefSteps;

// If the precision was not specified, the double entry is also initialized.		// If the precision was not specified, the double entry is also initialized.
if (Val.back() != 'f' && Val.back() != 'd') {		if (Val.back() != 'f' && Val.back() != 'd') {
RecipParams &Params = RecipMap[Val.str() + 'd'];		RecipParams &Params = RecipMap[Val.str() + 'd'];
Params.Enabled = !IsDisabled;		Params.Enabled = !IsDisabled;
if (!RefStepString.empty())		if (!RefStepString.empty())
Params.RefinementSteps = RefSteps;		Params.RefinementSteps = RefSteps;
}		}
}		}
}		}

TargetRecip::TargetRecip(const std::vector<std::string> &Args) :		void TargetRecip::set(StringRef &RecipString) {
		echristo: Do we still need ::set rather than just putting it as part of the constructor?
TargetRecip() {		SmallVector<StringRef, 4> RecipStringVector;
unsigned NumArgs = Args.size();		SplitString(RecipString, RecipStringVector, ",");
		std::vector<std::string> RecipVector;
		for (unsigned i = 0; i < RecipStringVector.size(); ++i)
		RecipVector.push_back(RecipStringVector[i].str());

		unsigned NumArgs = RecipVector.size();

// Check if "all", "default", or "none" was specified.		// Check if "all", "default", or "none" was specified.
if (NumArgs == 1 && parseGlobalParams(Args[0]))		if (NumArgs == 1 && parseGlobalParams(RecipVector[0]))
return;		return;

parseIndividualParams(Args);		parseIndividualParams(RecipVector);
}		}

bool TargetRecip::isEnabled(StringRef Key) const {		bool TargetRecip::isEnabled(StringRef Key) const {
ConstRecipIter Iter = RecipMap.find(Key);		ConstRecipIter Iter = RecipMap.find(Key);
assert(Iter != RecipMap.end() && "Unknown name for reciprocal map");		assert(Iter != RecipMap.end() && "Unknown name for reciprocal map");
assert(Iter->second.Enabled != Uninitialized &&
"Enablement setting was not initialized");
return Iter->second.Enabled;		return Iter->second.Enabled;
}		}

unsigned TargetRecip::getRefinementSteps(StringRef Key) const {		unsigned TargetRecip::getRefinementSteps(StringRef Key) const {
ConstRecipIter Iter = RecipMap.find(Key);		ConstRecipIter Iter = RecipMap.find(Key);
assert(Iter != RecipMap.end() && "Unknown name for reciprocal map");		assert(Iter != RecipMap.end() && "Unknown name for reciprocal map");
assert(Iter->second.RefinementSteps != Uninitialized &&
"Refinement step setting was not initialized");
return Iter->second.RefinementSteps;		return Iter->second.RefinementSteps;
}		}

/// Custom settings (previously initialized values) override target defaults.		void TargetRecip::set(StringRef Key, bool Enable, unsigned RefSteps) {
void TargetRecip::setDefaults(StringRef Key, bool Enable,
unsigned RefSteps) {
if (Key == "all") {		if (Key == "all") {
for (auto &KV : RecipMap) {		for (auto &KV : RecipMap) {
RecipParams &RP = KV.second;		RecipParams &RP = KV.second;
if (RP.Enabled == Uninitialized)
RP.Enabled = Enable;		RP.Enabled = Enable;
if (RP.RefinementSteps == Uninitialized)
RP.RefinementSteps = RefSteps;		RP.RefinementSteps = RefSteps;
}		}
} else {		} else {
RecipParams &RP = RecipMap[Key];		RecipParams &RP = RecipMap[Key];
if (RP.Enabled == Uninitialized)
RP.Enabled = Enable;		RP.Enabled = Enable;
if (RP.RefinementSteps == Uninitialized)
RP.RefinementSteps = RefSteps;		RP.RefinementSteps = RefSteps;
}		}
}		}

bool TargetRecip::operator==(const TargetRecip &Other) const {		bool TargetRecip::operator==(const TargetRecip &Other) const {
for (const auto &KV : RecipMap) {		for (const auto &KV : RecipMap) {
StringRef Op = KV.first;		StringRef Op = KV.first;
const RecipParams &RP = KV.second;		const RecipParams &RP = KV.second;
const RecipParams &OtherRP = Other.RecipMap.find(Op)->second;		const RecipParams &OtherRP = Other.RecipMap.find(Op)->second;
if (RP.RefinementSteps != OtherRP.RefinementSteps)		if (RP.RefinementSteps != OtherRP.RefinementSteps)
return false;		return false;
if (RP.Enabled != OtherRP.Enabled)		if (RP.Enabled != OtherRP.Enabled)
return false;		return false;
}		}
return true;		return true;
}		}

lib/Target/X86/X86ISelLowering.cpp


[47 lines hidden]
#include "llvm/MC/MCContext.h"		#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCExpr.h"		#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCSymbol.h"		#include "llvm/MC/MCSymbol.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/MathExtras.h"		#include "llvm/Support/MathExtras.h"
#include "llvm/Target/TargetOptions.h"		#include "llvm/Target/TargetOptions.h"
		#include "llvm/Target/TargetRecip.h"
#include "X86IntrinsicsInfo.h"		#include "X86IntrinsicsInfo.h"
#include <bitset>		#include <bitset>
#include <numeric>		#include <numeric>
#include <cctype>		#include <cctype>
using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "x86-isel"		#define DEBUG_TYPE "x86-isel"

[15 lines hidden]	X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,

// Set up the TargetLowering object.		// Set up the TargetLowering object.

// X86 is weird. It always uses i8 for shift amounts and setcc results.		// X86 is weird. It always uses i8 for shift amounts and setcc results.
setBooleanContents(ZeroOrOneBooleanContent);		setBooleanContents(ZeroOrOneBooleanContent);
// X86-SSE is even stranger. It uses -1 or 0 for vector masks.		// X86-SSE is even stranger. It uses -1 or 0 for vector masks.
setBooleanVectorContents(ZeroOrNegativeOneBooleanContent);		setBooleanVectorContents(ZeroOrNegativeOneBooleanContent);

		// By default (and when -ffast-math is on), enable estimate codegen with 1
		// refinement step for floats (not doubles) except scalar division. Scalar
		// division estimates are disabled because they break too much real-world
		// code. These defaults are intended to match GCC behavior.
		ReciprocalEstimates.set("sqrtf", true, 1);
		ReciprocalEstimates.set("divf", false, 1);
		ReciprocalEstimates.set("vec-sqrtf", true, 1);
		ReciprocalEstimates.set("vec-divf", true, 1);

// For 64-bit, since we have so many registers, use the ILP scheduler.		// For 64-bit, since we have so many registers, use the ILP scheduler.
// For 32-bit, use the register pressure specific scheduling.		// For 32-bit, use the register pressure specific scheduling.
// For Atom, always use ILP scheduling.		// For Atom, always use ILP scheduling.
if (Subtarget.isAtom())		if (Subtarget.isAtom())
setSchedulingPreference(Sched::ILP);		setSchedulingPreference(Sched::ILP);
else if (Subtarget.is64Bit())		else if (Subtarget.is64Bit())
setSchedulingPreference(Sched::ILP);		setSchedulingPreference(Sched::ILP);
else		else
[15,109 lines hidden]	SDValue X86TargetLowering::getRsqrtEstimate(SDValue Op,
if (VT == MVT::f32 && Subtarget.hasSSE1())		if (VT == MVT::f32 && Subtarget.hasSSE1())
RecipOp = "sqrtf";		RecipOp = "sqrtf";
else if ((VT == MVT::v4f32 && Subtarget.hasSSE1()) ||		else if ((VT == MVT::v4f32 && Subtarget.hasSSE1()) ||
(VT == MVT::v8f32 && Subtarget.hasAVX()))		(VT == MVT::v8f32 && Subtarget.hasAVX()))
RecipOp = "vec-sqrtf";		RecipOp = "vec-sqrtf";
else		else
return SDValue();		return SDValue();

TargetRecip Recips = DCI.DAG.getTarget().Options.Reciprocals;		TargetRecip Recips = getTargetRecipForFunc(DCI.DAG.getMachineFunction());
if (!Recips.isEnabled(RecipOp))		if (!Recips.isEnabled(RecipOp))
return SDValue();		return SDValue();

RefinementSteps = Recips.getRefinementSteps(RecipOp);		RefinementSteps = Recips.getRefinementSteps(RecipOp);
UseOneConstNR = false;		UseOneConstNR = false;
return DCI.DAG.getNode(X86ISD::FRSQRT, SDLoc(Op), VT, Op);		return DCI.DAG.getNode(X86ISD::FRSQRT, SDLoc(Op), VT, Op);
}		}

[15 lines hidden]	SDValue X86TargetLowering::getRecipEstimate(SDValue Op,
if (VT == MVT::f32 && Subtarget.hasSSE1())		if (VT == MVT::f32 && Subtarget.hasSSE1())
RecipOp = "divf";		RecipOp = "divf";
else if ((VT == MVT::v4f32 && Subtarget.hasSSE1()) ||		else if ((VT == MVT::v4f32 && Subtarget.hasSSE1()) ||
(VT == MVT::v8f32 && Subtarget.hasAVX()))		(VT == MVT::v8f32 && Subtarget.hasAVX()))
RecipOp = "vec-divf";		RecipOp = "vec-divf";
else		else
return SDValue();		return SDValue();

TargetRecip Recips = DCI.DAG.getTarget().Options.Reciprocals;		TargetRecip Recips = getTargetRecipForFunc(DCI.DAG.getMachineFunction());
if (!Recips.isEnabled(RecipOp))		if (!Recips.isEnabled(RecipOp))
return SDValue();		return SDValue();

RefinementSteps = Recips.getRefinementSteps(RecipOp);		RefinementSteps = Recips.getRefinementSteps(RecipOp);
return DCI.DAG.getNode(X86ISD::FRCP, SDLoc(Op), VT, Op);		return DCI.DAG.getNode(X86ISD::FRCP, SDLoc(Op), VT, Op);
}		}

/// If we have at least two divisions that use the same divisor, convert to		/// If we have at least two divisions that use the same divisor, convert to
[17,346 lines hidden]

lib/Target/X86/X86TargetMachine.cpp

[158 lines hidden]	X86TargetMachine::X86TargetMachine(const Target &T, const Triple &TT,
// On PS4, the "return address" of a 'noreturn' call must still be within		// On PS4, the "return address" of a 'noreturn' call must still be within
// the calling function, and TrapUnreachable is an easy way to get that.		// the calling function, and TrapUnreachable is an easy way to get that.
// The check here for 64-bit windows is a bit icky, but as we're unlikely		// The check here for 64-bit windows is a bit icky, but as we're unlikely
// to ever want to mix 32 and 64-bit windows code in a single module		// to ever want to mix 32 and 64-bit windows code in a single module
// this should be fine.		// this should be fine.
if ((TT.isOSWindows() && TT.getArch() == Triple::x86_64) || TT.isPS4())		if ((TT.isOSWindows() && TT.getArch() == Triple::x86_64) || TT.isPS4())
this->Options.TrapUnreachable = true;		this->Options.TrapUnreachable = true;

// By default (and when -ffast-math is on), enable estimate codegen for
// everything except scalar division. By default, use 1 refinement step for
// all operations. Defaults may be overridden by using command-line options.
// Scalar division estimates are disabled because they break too much
// real-world code. These defaults match GCC behavior.
this->Options.Reciprocals.setDefaults("sqrtf", true, 1);
this->Options.Reciprocals.setDefaults("divf", false, 1);
this->Options.Reciprocals.setDefaults("vec-sqrtf", true, 1);
this->Options.Reciprocals.setDefaults("vec-divf", true, 1);

initAsmInfo();		initAsmInfo();
}		}

X86TargetMachine::~X86TargetMachine() {}		X86TargetMachine::~X86TargetMachine() {}

const X86Subtarget *		const X86Subtarget *
X86TargetMachine::getSubtargetImpl(const Function &F) const {		X86TargetMachine::getSubtargetImpl(const Function &F) const {
Attribute CPUAttr = F.getFnAttribute("target-cpu");		Attribute CPUAttr = F.getFnAttribute("target-cpu");
[... 152 lines not shown ...]
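The comment removed from the X86TargetMachine constructor documents the x86 defaults, and the updated tests express overrides as an "mrecip" function attribute string. As a rough illustration only, the standalone C++ sketch below models that behavior. `RecipSetting`, `RecipTable`, `x86Defaults`, and `applyOverrides` are hypothetical names and not the TargetRecip API. The parser handles only exact operation names with an optional `!` prefix and an optional `:N` suffix, which is the subset of the syntax visible in these tests.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical model of one reciprocal-estimate setting.
struct RecipSetting {
  bool Enabled;
  int RefinementSteps;
};

using RecipTable = std::map<std::string, RecipSetting>;

// Defaults as described by the removed X86TargetMachine comment:
// everything enabled except scalar division, one refinement step each.
inline RecipTable x86Defaults() {
  RecipTable Table;
  Table["sqrtf"] = {true, 1};
  Table["divf"] = {false, 1};
  Table["vec-sqrtf"] = {true, 1};
  Table["vec-divf"] = {true, 1};
  return Table;
}

// Apply a comma-separated override string such as "divf:2,!vec-divf".
// A leading '!' disables the operation; a ":N" suffix sets the step count.
inline void applyOverrides(RecipTable &Table, const std::string &Attr) {
  std::istringstream Stream(Attr);
  std::string Token;
  while (std::getline(Stream, Token, ',')) {
    if (Token.empty())
      continue;
    bool Enable = true;
    if (Token[0] == '!') {
      Enable = false;
      Token.erase(0, 1);
    }
    std::size_t Colon = Token.find(':');
    std::string Name = Token.substr(0, Colon);
    RecipSetting &Setting = Table[Name];
    Setting.Enabled = Enable;
    if (Colon != std::string::npos) {
      int Steps = std::stoi(Token.substr(Colon + 1));
      assert(Steps >= 0 && "refinement step count must be non-negative");
      Setting.RefinementSteps = Steps;
    }
  }
}
```

With this model, applying `"divf:2,!vec-divf"` to the defaults enables scalar division with two steps and disables vector division, while the sqrt entries keep their defaults.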

test/CodeGen/PowerPC/recipest.ll

	; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -enable-unsafe-fp-math -mattr=-vsx | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -enable-unsafe-fp-math -mattr=-vsx | FileCheck %s
	; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -enable-unsafe-fp-math -mattr=-vsx -recip=sqrtf:0,sqrtd:0 | FileCheck %s -check-prefix=CHECK-NONR
	; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -mattr=-vsx | FileCheck -check-prefix=CHECK-SAFE %s			; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -mattr=-vsx | FileCheck -check-prefix=CHECK-SAFE %s

	target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"			target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
	target triple = "powerpc64-unknown-linux-gnu"			target triple = "powerpc64-unknown-linux-gnu"

	declare double @llvm.sqrt.f64(double)			declare double @llvm.sqrt.f64(double)
	declare float @llvm.sqrt.f32(float)			declare float @llvm.sqrt.f32(float)
	declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)			declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)

	define double @foo(double %a, double %b) nounwind {			define double @foo(double %a, double %b) nounwind {
	%x = call double @llvm.sqrt.f64(double %b)			%x = call double @llvm.sqrt.f64(double %b)
	%r = fdiv double %a, %x			%r = fdiv double %a, %x
	ret double %r			ret double %r

	; CHECK: @foo			; CHECK: @foo
	; CHECK-DAG: frsqrte			; CHECK-DAG: frsqrte
	; CHECK-DAG: fnmsub			; CHECK-DAG: fnmsub
	; CHECK: fmul			; CHECK: fmul
	; CHECK-NEXT: fmadd			; CHECK-NEXT: fmadd
	; CHECK-NEXT: fmul			; CHECK-NEXT: fmul
	; CHECK-NEXT: fmul			; CHECK-NEXT: fmul
	; CHECK-NEXT: fmadd			; CHECK-NEXT: fmadd
	; CHECK-NEXT: fmul			; CHECK-NEXT: fmul
	; CHECK-NEXT: fmul			; CHECK-NEXT: fmul
	; CHECK: blr			; CHECK: blr

	; CHECK-NONR: @foo
	; CHECK-NONR: frsqrte
	; CHECK-NONR-NOT: fmadd
	; CHECK-NONR: fmul
	; CHECK-NONR-NOT: fmadd
	; CHECK-NONR: blr

	; CHECK-SAFE: @foo			; CHECK-SAFE: @foo
	; CHECK-SAFE: fsqrt			; CHECK-SAFE: fsqrt
	; CHECK-SAFE: fdiv			; CHECK-SAFE: fdiv
	; CHECK-SAFE: blr			; CHECK-SAFE: blr
	}			}

				define double @no_estimate_refinement_f64(double %a, double %b) #0 {
				%x = call double @llvm.sqrt.f64(double %b)
				%r = fdiv double %a, %x
				ret double %r

				; CHECK-LABEL: @no_estimate_refinement_f64
				; CHECK: frsqrte
				; CHECK-NOT: fmadd
				; CHECK: fmul
				; CHECK-NOT: fmadd
				; CHECK: blr
				}


	define double @foof(double %a, float %b) nounwind {			define double @foof(double %a, float %b) nounwind {
	%x = call float @llvm.sqrt.f32(float %b)			%x = call float @llvm.sqrt.f32(float %b)
	%y = fpext float %x to double			%y = fpext float %x to double
	%r = fdiv double %a, %y			%r = fdiv double %a, %y
	ret double %r			ret double %r

	; CHECK: @foof			; CHECK: @foof
	; CHECK-DAG: frsqrtes			; CHECK-DAG: frsqrtes
	[... 44 lines not shown ...]
	; CHECK-DAG: frsqrtes			; CHECK-DAG: frsqrtes
	; CHECK-DAG: fnmsubs			; CHECK-DAG: fnmsubs
	; CHECK: fmuls			; CHECK: fmuls
	; CHECK-NEXT: fmadds			; CHECK-NEXT: fmadds
	; CHECK-NEXT: fmuls			; CHECK-NEXT: fmuls
	; CHECK-NEXT: fmuls			; CHECK-NEXT: fmuls
	; CHECK-NEXT: blr			; CHECK-NEXT: blr

	; CHECK-NONR: @goo
	; CHECK-NONR: frsqrtes
	; CHECK-NONR-NOT: fmadds
	; CHECK-NONR: fmuls
	; CHECK-NONR-NOT: fmadds
	; CHECK-NONR: blr

	; CHECK-SAFE: @goo			; CHECK-SAFE: @goo
	; CHECK-SAFE: fsqrts			; CHECK-SAFE: fsqrts
	; CHECK-SAFE: fdivs			; CHECK-SAFE: fdivs
	; CHECK-SAFE: blr			; CHECK-SAFE: blr
	}			}


				define float @no_estimate_refinement_f32(float %a, float %b) #0 {
				%x = call float @llvm.sqrt.f32(float %b)
				%r = fdiv float %a, %x
				ret float %r

				; CHECK-LABEL: @no_estimate_refinement_f32
				; CHECK: frsqrtes
				; CHECK-NOT: fmadds
				; CHECK: fmuls
				; CHECK-NOT: fmadds
				; CHECK: blr
				}

	; Recognize that this is rsqrt(a) * rcp(b) * c,			; Recognize that this is rsqrt(a) * rcp(b) * c,
	; not 1 / ( 1 / sqrt(a)) * rcp(b) * c.			; not 1 / ( 1 / sqrt(a)) * rcp(b) * c.
	define float @rsqrt_fmul(float %a, float %b, float %c) {			define float @rsqrt_fmul(float %a, float %b, float %c) {
	%x = call float @llvm.sqrt.f32(float %a)			%x = call float @llvm.sqrt.f32(float %a)
	%y = fmul float %x, %b			%y = fmul float %x, %b
	%z = fdiv float %c, %y			%z = fdiv float %c, %y
	ret float %z			ret float %z

	[... 125 lines not shown ...]
	; CHECK: vrsqrtefp			; CHECK: vrsqrtefp
	; CHECK-DAG: vcmpeqfp			; CHECK-DAG: vcmpeqfp

	; CHECK-SAFE: @hoo3			; CHECK-SAFE: @hoo3
	; CHECK-SAFE-NOT: vrsqrtefp			; CHECK-SAFE-NOT: vrsqrtefp
	; CHECK-SAFE: blr			; CHECK-SAFE: blr
	}			}

				attributes #0 = { nounwind "mrecip"="sqrtf:0,sqrtd:0" }
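The new `no_estimate_refinement_*` tests use attribute `#0` to set `sqrtf:0,sqrtd:0`, and their checks expect the estimate instruction followed by a multiply with no `fmadd`. A minimal C++ sketch of what a refinement step count means for a reciprocal square root is given below. `crudeRsqrt` is a hypothetical stand-in for a hardware estimate such as `frsqrte` (here, the exact value with a deliberate 2% error), and `estimateRsqrt` and `divideBySqrt` are illustrative names, not LLVM functions.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a hardware estimate instruction:
// the exact reciprocal square root with a deliberate 2% relative error.
inline double crudeRsqrt(double a) { return (1.0 / std::sqrt(a)) * 1.02; }

// Refine an estimate of 1/sqrt(a) with `steps` Newton-Raphson iterations:
//   est' = est * (1.5 - 0.5 * a * est * est)
// With steps == 0 the raw estimate is returned unchanged.
inline double estimateRsqrt(double a, int steps) {
  assert(a > 0.0 && steps >= 0);
  double est = crudeRsqrt(a);
  for (int i = 0; i < steps; ++i)
    est = est * (1.5 - 0.5 * a * est * est);
  return est;
}

// a / sqrt(b) rewritten as a * rsqrt(b), the pattern the tests exercise.
inline double divideBySqrt(double a, double b, int steps) {
  return a * estimateRsqrt(b, steps);
}
```

Each iteration roughly squares the relative error, so zero steps trades accuracy for the shortest possible instruction sequence.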

test/CodeGen/X86/recip-fastmath.ll

	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=sse2 -recip=!divf,!vec-divf | FileCheck %s --check-prefix=NORECIP			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx -recip=divf,vec-divf | FileCheck %s --check-prefix=RECIP			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx | FileCheck %s --check-prefix=AVX
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx -recip=divf:2,vec-divf:2 | FileCheck %s --check-prefix=REFINE

	; If the target's divss/divps instructions are substantially			; If the target's divss/divps instructions are substantially
	; slower than rcpss/rcpps with a Newton-Raphson refinement,			; slower than rcpss/rcpps with a Newton-Raphson refinement,
	; we should generate the estimate sequence.			; we should generate the estimate sequence.

	; See PR21385 ( http://llvm.org/bugs/show_bug.cgi?id=21385 )			; See PR21385 ( http://llvm.org/bugs/show_bug.cgi?id=21385 )
	; for details about the accuracy, speed, and implementation			; for details about the accuracy, speed, and implementation
	; differences of x86 reciprocal estimates.			; differences of x86 reciprocal estimates.

	define float @reciprocal_estimate(float %x) #0 {			define float @f32_no_estimate(float %x) #0 {
				; AVX-LABEL: f32_no_estimate:
				; AVX: # BB#0:
				; AVX-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
				; AVX-NEXT: vdivss %xmm0, %xmm1, %xmm0
				; AVX-NEXT: retq
				;
	%div = fdiv fast float 1.0, %x			%div = fdiv fast float 1.0, %x
	ret float %div			ret float %div
				}

				define float @f32_one_step(float %x) #1 {
				; AVX-LABEL: f32_one_step:
				; AVX: # BB#0:
				; AVX-NEXT: vrcpss %xmm0, %xmm0, %xmm1
				; AVX-NEXT: vmulss %xmm1, %xmm0, %xmm0
				; AVX-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero
				; AVX-NEXT: vsubss %xmm0, %xmm2, %xmm0
				; AVX-NEXT: vmulss %xmm0, %xmm1, %xmm0
				; AVX-NEXT: vaddss %xmm0, %xmm1, %xmm0
				; AVX-NEXT: retq
				;
				%div = fdiv fast float 1.0, %x
				ret float %div
				}

				define float @f32_two_step(float %x) #2 {
				; AVX-LABEL: f32_two_step:
				; AVX: # BB#0:
				; AVX-NEXT: vrcpss %xmm0, %xmm0, %xmm1
				; AVX-NEXT: vmulss %xmm1, %xmm0, %xmm2
				; AVX-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero
				; AVX-NEXT: vsubss %xmm2, %xmm3, %xmm2
				; AVX-NEXT: vmulss %xmm2, %xmm1, %xmm2
				; AVX-NEXT: vaddss %xmm2, %xmm1, %xmm1
				; AVX-NEXT: vmulss %xmm1, %xmm0, %xmm0
				; AVX-NEXT: vsubss %xmm0, %xmm3, %xmm0
				; AVX-NEXT: vmulss %xmm0, %xmm1, %xmm0
				; AVX-NEXT: vaddss %xmm0, %xmm1, %xmm0
				; AVX-NEXT: retq
				;
				%div = fdiv fast float 1.0, %x
				ret float %div
				}

				define <4 x float> @v4f32_no_estimate(<4 x float> %x) #0 {
				; AVX-LABEL: v4f32_no_estimate:
				; AVX: # BB#0:
				; AVX-NEXT: vmovaps {{.*#+}} xmm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]
				; AVX-NEXT: vdivps %xmm0, %xmm1, %xmm0
				; AVX-NEXT: retq
				;
				%div = fdiv fast <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>, %x
				ret <4 x float> %div
				}

	; NORECIP-LABEL: reciprocal_estimate:			define <4 x float> @v4f32_one_step(<4 x float> %x) #1 {
	; NORECIP: movss			; AVX-LABEL: v4f32_one_step:
	; NORECIP-NEXT: divss			; AVX: # BB#0:
	; NORECIP-NEXT: movaps			; AVX-NEXT: vrcpps %xmm0, %xmm1
	; NORECIP-NEXT: retq			; AVX-NEXT: vmulps %xmm1, %xmm0, %xmm0
				; AVX-NEXT: vmovaps {{.*#+}} xmm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]
	; RECIP-LABEL: reciprocal_estimate:			; AVX-NEXT: vsubps %xmm0, %xmm2, %xmm0
	; RECIP: vrcpss			; AVX-NEXT: vmulps %xmm0, %xmm1, %xmm0
	; RECIP: vmulss			; AVX-NEXT: vaddps %xmm0, %xmm1, %xmm0
	; RECIP: vsubss			; AVX-NEXT: retq
	; RECIP: vmulss			;
	; RECIP: vaddss			%div = fdiv fast <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>, %x
	; RECIP-NEXT: retq			ret <4 x float> %div

	; REFINE-LABEL: reciprocal_estimate:
	; REFINE: vrcpss
	; REFINE: vmulss
	; REFINE: vsubss
	; REFINE: vmulss
	; REFINE: vaddss
	; REFINE: vmulss
	; REFINE: vsubss
	; REFINE: vmulss
	; REFINE: vaddss
	; REFINE-NEXT: retq
	}			}

	define <4 x float> @reciprocal_estimate_v4f32(<4 x float> %x) #0 {			define <4 x float> @v4f32_two_step(<4 x float> %x) #2 {
				; AVX-LABEL: v4f32_two_step:
				; AVX: # BB#0:
				; AVX-NEXT: vrcpps %xmm0, %xmm1
				; AVX-NEXT: vmulps %xmm1, %xmm0, %xmm2
				; AVX-NEXT: vmovaps {{.*#+}} xmm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]
				; AVX-NEXT: vsubps %xmm2, %xmm3, %xmm2
				; AVX-NEXT: vmulps %xmm2, %xmm1, %xmm2
				; AVX-NEXT: vaddps %xmm2, %xmm1, %xmm1
				; AVX-NEXT: vmulps %xmm1, %xmm0, %xmm0
				; AVX-NEXT: vsubps %xmm0, %xmm3, %xmm0
				; AVX-NEXT: vmulps %xmm0, %xmm1, %xmm0
				; AVX-NEXT: vaddps %xmm0, %xmm1, %xmm0
				; AVX-NEXT: retq
				;
	%div = fdiv fast <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>, %x			%div = fdiv fast <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>, %x
	ret <4 x float> %div			ret <4 x float> %div
				}

	; NORECIP-LABEL: reciprocal_estimate_v4f32:			define <8 x float> @v8f32_no_estimate(<8 x float> %x) #0 {
	; NORECIP: movaps			; AVX-LABEL: v8f32_no_estimate:
	; NORECIP-NEXT: divps			; AVX: # BB#0:
	; NORECIP-NEXT: movaps			; AVX-NEXT: vmovaps {{.*#+}} ymm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]
	; NORECIP-NEXT: retq			; AVX-NEXT: vdivps %ymm0, %ymm1, %ymm0
				; AVX-NEXT: retq
	; RECIP-LABEL: reciprocal_estimate_v4f32:			;
	; RECIP: vrcpps			%div = fdiv fast <8 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>, %x
	; RECIP: vmulps			ret <8 x float> %div
	; RECIP: vsubps
	; RECIP: vmulps
	; RECIP: vaddps
	; RECIP-NEXT: retq

	; REFINE-LABEL: reciprocal_estimate_v4f32:
	; REFINE: vrcpps
	; REFINE: vmulps
	; REFINE: vsubps
	; REFINE: vmulps
	; REFINE: vaddps
	; REFINE: vmulps
	; REFINE: vsubps
	; REFINE: vmulps
	; REFINE: vaddps
	; REFINE-NEXT: retq
	}			}

	define <8 x float> @reciprocal_estimate_v8f32(<8 x float> %x) #0 {			define <8 x float> @v8f32_one_step(<8 x float> %x) #1 {
				; AVX-LABEL: v8f32_one_step:
				; AVX: # BB#0:
				; AVX-NEXT: vrcpps %ymm0, %ymm1
				; AVX-NEXT: vmulps %ymm1, %ymm0, %ymm0
				; AVX-NEXT: vmovaps {{.*#+}} ymm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]
				; AVX-NEXT: vsubps %ymm0, %ymm2, %ymm0
				; AVX-NEXT: vmulps %ymm0, %ymm1, %ymm0
				; AVX-NEXT: vaddps %ymm0, %ymm1, %ymm0
				; AVX-NEXT: retq
				;
	%div = fdiv fast <8 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>, %x			%div = fdiv fast <8 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>, %x
	ret <8 x float> %div			ret <8 x float> %div
				}

	; NORECIP-LABEL: reciprocal_estimate_v8f32:			define <8 x float> @v8f32_two_step(<8 x float> %x) #2 {
	; NORECIP: movaps			; AVX-LABEL: v8f32_two_step:
	; NORECIP: movaps			; AVX: # BB#0:
	; NORECIP-NEXT: divps			; AVX-NEXT: vrcpps %ymm0, %ymm1
	; NORECIP-NEXT: divps			; AVX-NEXT: vmulps %ymm1, %ymm0, %ymm2
	; NORECIP-NEXT: movaps			; AVX-NEXT: vmovaps {{.*#+}} ymm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]
	; NORECIP-NEXT: movaps			; AVX-NEXT: vsubps %ymm2, %ymm3, %ymm2
	; NORECIP-NEXT: retq			; AVX-NEXT: vmulps %ymm2, %ymm1, %ymm2
				; AVX-NEXT: vaddps %ymm2, %ymm1, %ymm1
	; RECIP-LABEL: reciprocal_estimate_v8f32:			; AVX-NEXT: vmulps %ymm1, %ymm0, %ymm0
	; RECIP: vrcpps			; AVX-NEXT: vsubps %ymm0, %ymm3, %ymm0
	; RECIP: vmulps			; AVX-NEXT: vmulps %ymm0, %ymm1, %ymm0
	; RECIP: vsubps			; AVX-NEXT: vaddps %ymm0, %ymm1, %ymm0
	; RECIP: vmulps			; AVX-NEXT: retq
	; RECIP: vaddps			;
	; RECIP-NEXT: retq			%div = fdiv fast <8 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>, %x
				ret <8 x float> %div
	; REFINE-LABEL: reciprocal_estimate_v8f32:
	; REFINE: vrcpps
	; REFINE: vmulps
	; REFINE: vsubps
	; REFINE: vmulps
	; REFINE: vaddps
	; REFINE: vmulps
	; REFINE: vsubps
	; REFINE: vmulps
	; REFINE: vaddps
	; REFINE-NEXT: retq
	}			}

	attributes #0 = { "unsafe-fp-math"="true" }			attributes #0 = { "unsafe-fp-math"="true" "mrecip"="!divf,!vec-divf" }
				attributes #1 = { "unsafe-fp-math"="true" "mrecip"="divf,vec-divf" }
				attributes #2 = { "unsafe-fp-math"="true" "mrecip"="divf:2,vec-divf:2" }
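The one-step and two-step AVX check lines above follow the usual Newton-Raphson refinement of a reciprocal estimate. The C++ sketch below mirrors that instruction sequence under stated assumptions: `crudeReciprocal` is a hypothetical stand-in for `vrcpss`/`vrcpps` (the exact reciprocal with a deliberate 1% error), and `refinedReciprocal` is an illustrative name rather than anything in LLVM.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a hardware reciprocal estimate:
// the exact reciprocal with a deliberate 1% relative error.
inline float crudeReciprocal(float x) { return (1.0f / x) * 1.01f; }

// Refine an estimate of 1/x with `steps` Newton-Raphson iterations:
//   est' = est + est * (1 - x * est)
inline float refinedReciprocal(float x, int steps) {
  assert(x != 0.0f && steps >= 0);
  float est = crudeReciprocal(x);
  for (int i = 0; i < steps; ++i) {
    float err = 1.0f - x * est; // multiply, then subtract from 1.0
    est = est + est * err;      // multiply, then add to the estimate
  }
  return est;
}
```

Each loop iteration corresponds to one mul/sub/mul/add group in the checks, which is why the `divf:2` functions show that group twice.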

test/CodeGen/X86/sqrt-fastmath-mir.ll

	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx2,fma -recip=sqrt:2 -stop-after=expand-isel-pseudos 2>&1 | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx2,fma -stop-after=expand-isel-pseudos 2>&1 | FileCheck %s

	declare float @llvm.sqrt.f32(float) #0			declare float @llvm.sqrt.f32(float) #0

	define float @foo(float %f) #0 {			define float @foo(float %f) #0 {
	; CHECK: {{name: *foo}}			; CHECK: {{name: *foo}}
	; CHECK: body:			; CHECK: body:
	; CHECK: %0 = COPY %xmm0			; CHECK: %0 = COPY %xmm0
	; CHECK: %1 = VRSQRTSSr killed %2, %0			; CHECK: %1 = VRSQRTSSr killed %2, %0
	[... 33 lines not shown ...]
	; CHECK: %12 = VMULSSrr killed %11, killed %10			; CHECK: %12 = VMULSSrr killed %11, killed %10
	; CHECK: %xmm0 = COPY %12			; CHECK: %xmm0 = COPY %12
	; CHECK: RET 0, %xmm0			; CHECK: RET 0, %xmm0
	%sqrt = tail call float @llvm.sqrt.f32(float %f)			%sqrt = tail call float @llvm.sqrt.f32(float %f)
	%div = fdiv fast float 1.0, %sqrt			%div = fdiv fast float 1.0, %sqrt
	ret float %div			ret float %div
	}			}

	attributes #0 = { "unsafe-fp-math"="true" }			attributes #0 = { "unsafe-fp-math"="true" "mrecip"="sqrt:2" }
	attributes #1 = { nounwind readnone }			attributes #1 = { nounwind readnone }

test/CodeGen/X86/sqrt-fastmath.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=sse2 -recip=!sqrtf,!vec-sqrtf,!divf,!vec-divf | FileCheck %s --check-prefix=NORECIP			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx | FileCheck %s --check-prefix=AVX
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx -recip=sqrtf,vec-sqrtf | FileCheck %s --check-prefix=ESTIMATE
				declare double @__sqrt_finite(double)
	declare double @__sqrt_finite(double) #0			declare float @__sqrtf_finite(float)
	declare float @__sqrtf_finite(float) #0			declare x86_fp80 @__sqrtl_finite(x86_fp80)
	declare x86_fp80 @__sqrtl_finite(x86_fp80) #0			declare float @llvm.sqrt.f32(float)
	declare float @llvm.sqrt.f32(float) #0			declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)
	declare <4 x float> @llvm.sqrt.v4f32(<4 x float>) #0			declare <8 x float> @llvm.sqrt.v8f32(<8 x float>)
	declare <8 x float> @llvm.sqrt.v8f32(<8 x float>) #0

				define double @finite_f64_no_estimate(double %d) #0 {
	define double @fd(double %d) #0 {			; AVX-LABEL: finite_f64_no_estimate:
	; NORECIP-LABEL: fd:			; AVX: # BB#0:
	; NORECIP: # BB#0:			; AVX-NEXT: vsqrtsd %xmm0, %xmm0, %xmm0
	; NORECIP-NEXT: sqrtsd %xmm0, %xmm0			; AVX-NEXT: retq
	; NORECIP-NEXT: retq			;
	;			%call = tail call double @__sqrt_finite(double %d) #2
	; ESTIMATE-LABEL: fd:
	; ESTIMATE: # BB#0:
	; ESTIMATE-NEXT: vsqrtsd %xmm0, %xmm0, %xmm0
	; ESTIMATE-NEXT: retq
	%call = tail call double @__sqrt_finite(double %d) #1
	ret double %call			ret double %call
	}			}

				; No estimates for doubles.

	define float @ff(float %f) #0 {			define double @finite_f64_estimate(double %d) #1 {
	; NORECIP-LABEL: ff:			; AVX-LABEL: finite_f64_estimate:
	; NORECIP: # BB#0:			; AVX: # BB#0:
	; NORECIP-NEXT: sqrtss %xmm0, %xmm0			; AVX-NEXT: vsqrtsd %xmm0, %xmm0, %xmm0
	; NORECIP-NEXT: retq			; AVX-NEXT: retq
	;			;
	; ESTIMATE-LABEL: ff:			%call = tail call double @__sqrt_finite(double %d) #2
	; ESTIMATE: # BB#0:			ret double %call
	; ESTIMATE-NEXT: vrsqrtss %xmm0, %xmm0, %xmm1			}
	; ESTIMATE-NEXT: vmulss %xmm1, %xmm0, %xmm2
	; ESTIMATE-NEXT: vmulss %xmm1, %xmm2, %xmm1			define float @finite_f32_no_estimate(float %f) #0 {
	; ESTIMATE-NEXT: vaddss {{.*}}(%rip), %xmm1, %xmm1			; AVX-LABEL: finite_f32_no_estimate:
	; ESTIMATE-NEXT: vmulss {{.*}}(%rip), %xmm2, %xmm2			; AVX: # BB#0:
	; ESTIMATE-NEXT: vmulss %xmm1, %xmm2, %xmm1			; AVX-NEXT: vsqrtss %xmm0, %xmm0, %xmm0
	; ESTIMATE-NEXT: vxorps %xmm2, %xmm2, %xmm2			; AVX-NEXT: retq
	; ESTIMATE-NEXT: vcmpeqss %xmm2, %xmm0, %xmm0			;
	; ESTIMATE-NEXT: vandnps %xmm1, %xmm0, %xmm0			%call = tail call float @__sqrtf_finite(float %f) #2
	; ESTIMATE-NEXT: retq
	%call = tail call float @__sqrtf_finite(float %f) #1
	ret float %call			ret float %call
	}			}

				define float @finite_f32_estimate(float %f) #1 {
				; AVX-LABEL: finite_f32_estimate:
				; AVX: # BB#0:
				; AVX-NEXT: vrsqrtss %xmm0, %xmm0, %xmm1
				; AVX-NEXT: vmulss %xmm1, %xmm0, %xmm2
				; AVX-NEXT: vmulss %xmm1, %xmm2, %xmm1
				; AVX-NEXT: vaddss {{.*}}(%rip), %xmm1, %xmm1
				; AVX-NEXT: vmulss {{.*}}(%rip), %xmm2, %xmm2
				; AVX-NEXT: vmulss %xmm1, %xmm2, %xmm1
				; AVX-NEXT: vxorps %xmm2, %xmm2, %xmm2
				; AVX-NEXT: vcmpeqss %xmm2, %xmm0, %xmm0
				; AVX-NEXT: vandnps %xmm1, %xmm0, %xmm0
				; AVX-NEXT: retq
				;
				%call = tail call float @__sqrtf_finite(float %f) #2
				ret float %call
				}

	define x86_fp80 @fld(x86_fp80 %ld) #0 {			define x86_fp80 @finite_f80_no_estimate(x86_fp80 %ld) #0 {
	; NORECIP-LABEL: fld:			; AVX-LABEL: finite_f80_no_estimate:
	; NORECIP: # BB#0:			; AVX: # BB#0:
	; NORECIP-NEXT: fldt {{[0-9]+}}(%rsp)			; AVX-NEXT: fldt {{[0-9]+}}(%rsp)
	; NORECIP-NEXT: fsqrt			; AVX-NEXT: fsqrt
	; NORECIP-NEXT: retq			; AVX-NEXT: retq
	;			;
	; ESTIMATE-LABEL: fld:			%call = tail call x86_fp80 @__sqrtl_finite(x86_fp80 %ld) #2
	; ESTIMATE: # BB#0:
	; ESTIMATE-NEXT: fldt {{[0-9]+}}(%rsp)
	; ESTIMATE-NEXT: fsqrt
	; ESTIMATE-NEXT: retq
	%call = tail call x86_fp80 @__sqrtl_finite(x86_fp80 %ld) #1
	ret x86_fp80 %call			ret x86_fp80 %call
	}			}

				; Don't die on the impossible.

				define x86_fp80 @finite_f80_estimate_but_no(x86_fp80 %ld) #1 {
				; AVX-LABEL: finite_f80_estimate_but_no:
				; AVX: # BB#0:
				; AVX-NEXT: fldt {{[0-9]+}}(%rsp)
				; AVX-NEXT: fsqrt
				; AVX-NEXT: retq
				;
				%call = tail call x86_fp80 @__sqrtl_finite(x86_fp80 %ld) #2
				ret x86_fp80 %call
				}

	define float @reciprocal_square_root(float %x) #0 {			define float @f32_no_estimate(float %x) #0 {
	; NORECIP-LABEL: reciprocal_square_root:			; AVX-LABEL: f32_no_estimate:
	; NORECIP: # BB#0:			; AVX: # BB#0:
	; NORECIP-NEXT: sqrtss %xmm0, %xmm1			; AVX-NEXT: vsqrtss %xmm0, %xmm0, %xmm0
	; NORECIP-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero			; AVX-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
	; NORECIP-NEXT: divss %xmm1, %xmm0			; AVX-NEXT: vdivss %xmm0, %xmm1, %xmm0
	; NORECIP-NEXT: retq			; AVX-NEXT: retq
	;			;
	; ESTIMATE-LABEL: reciprocal_square_root:
	; ESTIMATE: # BB#0:
	; ESTIMATE-NEXT: vrsqrtss %xmm0, %xmm0, %xmm1
	; ESTIMATE-NEXT: vmulss %xmm1, %xmm1, %xmm2
	; ESTIMATE-NEXT: vmulss %xmm2, %xmm0, %xmm0
	; ESTIMATE-NEXT: vaddss {{.*}}(%rip), %xmm0, %xmm0
	; ESTIMATE-NEXT: vmulss {{.*}}(%rip), %xmm1, %xmm1
	; ESTIMATE-NEXT: vmulss %xmm0, %xmm1, %xmm0
	; ESTIMATE-NEXT: retq
	%sqrt = tail call float @llvm.sqrt.f32(float %x)			%sqrt = tail call float @llvm.sqrt.f32(float %x)
	%div = fdiv fast float 1.0, %sqrt			%div = fdiv fast float 1.0, %sqrt
	ret float %div			ret float %div
	}			}

	define <4 x float> @reciprocal_square_root_v4f32(<4 x float> %x) #0 {			define float @f32_estimate(float %x) #1 {
	; NORECIP-LABEL: reciprocal_square_root_v4f32:			; AVX-LABEL: f32_estimate:
	; NORECIP: # BB#0:			; AVX: # BB#0:
	; NORECIP-NEXT: sqrtps %xmm0, %xmm1			; AVX-NEXT: vrsqrtss %xmm0, %xmm0, %xmm1
	; NORECIP-NEXT: movaps {{.*#+}} xmm0 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; AVX-NEXT: vmulss %xmm1, %xmm1, %xmm2
	; NORECIP-NEXT: divps %xmm1, %xmm0			; AVX-NEXT: vmulss %xmm2, %xmm0, %xmm0
	; NORECIP-NEXT: retq			; AVX-NEXT: vaddss {{.*}}(%rip), %xmm0, %xmm0
	;			; AVX-NEXT: vmulss {{.*}}(%rip), %xmm1, %xmm1
	; ESTIMATE-LABEL: reciprocal_square_root_v4f32:			; AVX-NEXT: vmulss %xmm0, %xmm1, %xmm0
	; ESTIMATE: # BB#0:			; AVX-NEXT: retq
	; ESTIMATE-NEXT: vrsqrtps %xmm0, %xmm1			;
	; ESTIMATE-NEXT: vmulps %xmm1, %xmm1, %xmm2			%sqrt = tail call float @llvm.sqrt.f32(float %x)
	; ESTIMATE-NEXT: vmulps %xmm2, %xmm0, %xmm0			%div = fdiv fast float 1.0, %sqrt
	; ESTIMATE-NEXT: vaddps {{.*}}(%rip), %xmm0, %xmm0			ret float %div
	; ESTIMATE-NEXT: vmulps {{.*}}(%rip), %xmm1, %xmm1			}
	; ESTIMATE-NEXT: vmulps %xmm0, %xmm1, %xmm0
	; ESTIMATE-NEXT: retq			define <4 x float> @v4f32_no_estimate(<4 x float> %x) #0 {
				; AVX-LABEL: v4f32_no_estimate:
				; AVX: # BB#0:
				; AVX-NEXT: vsqrtps %xmm0, %xmm0
				; AVX-NEXT: vmovaps {{.*#+}} xmm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]
				; AVX-NEXT: vdivps %xmm0, %xmm1, %xmm0
				; AVX-NEXT: retq
				;
	%sqrt = tail call <4 x float> @llvm.sqrt.v4f32(<4 x float> %x)			%sqrt = tail call <4 x float> @llvm.sqrt.v4f32(<4 x float> %x)
	%div = fdiv fast <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>, %sqrt			%div = fdiv fast <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>, %sqrt
	ret <4 x float> %div			ret <4 x float> %div
	}			}

	define <8 x float> @reciprocal_square_root_v8f32(<8 x float> %x) #0 {			define <4 x float> @v4f32_estimate(<4 x float> %x) #1 {
	; NORECIP-LABEL: reciprocal_square_root_v8f32:			; AVX-LABEL: v4f32_estimate:
	; NORECIP: # BB#0:			; AVX: # BB#0:
	; NORECIP-NEXT: sqrtps %xmm1, %xmm2			; AVX-NEXT: vrsqrtps %xmm0, %xmm1
	; NORECIP-NEXT: sqrtps %xmm0, %xmm3			; AVX-NEXT: vmulps %xmm1, %xmm1, %xmm2
	; NORECIP-NEXT: movaps {{.*#+}} xmm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; AVX-NEXT: vmulps %xmm2, %xmm0, %xmm0
	; NORECIP-NEXT: movaps %xmm1, %xmm0			; AVX-NEXT: vaddps {{.*}}(%rip), %xmm0, %xmm0
	; NORECIP-NEXT: divps %xmm3, %xmm0			; AVX-NEXT: vmulps {{.*}}(%rip), %xmm1, %xmm1
	; NORECIP-NEXT: divps %xmm2, %xmm1			; AVX-NEXT: vmulps %xmm0, %xmm1, %xmm0
	; NORECIP-NEXT: retq			; AVX-NEXT: retq
	;			;
	; ESTIMATE-LABEL: reciprocal_square_root_v8f32:			%sqrt = tail call <4 x float> @llvm.sqrt.v4f32(<4 x float> %x)
	; ESTIMATE: # BB#0:			%div = fdiv fast <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>, %sqrt
	; ESTIMATE-NEXT: vrsqrtps %ymm0, %ymm1			ret <4 x float> %div
	; ESTIMATE-NEXT: vmulps %ymm1, %ymm1, %ymm2			}
	; ESTIMATE-NEXT: vmulps %ymm2, %ymm0, %ymm0
	; ESTIMATE-NEXT: vaddps {{.*}}(%rip), %ymm0, %ymm0			define <8 x float> @v8f32_no_estimate(<8 x float> %x) #0 {
	; ESTIMATE-NEXT: vmulps {{.*}}(%rip), %ymm1, %ymm1			; AVX-LABEL: v8f32_no_estimate:
	; ESTIMATE-NEXT: vmulps %ymm0, %ymm1, %ymm0			; AVX: # BB#0:
	; ESTIMATE-NEXT: retq			; AVX-NEXT: vsqrtps %ymm0, %ymm0
				; AVX-NEXT: vmovaps {{.*#+}} ymm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]
				; AVX-NEXT: vdivps %ymm0, %ymm1, %ymm0
				; AVX-NEXT: retq
				;
				%sqrt = tail call <8 x float> @llvm.sqrt.v8f32(<8 x float> %x)
				%div = fdiv fast <8 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>, %sqrt
				ret <8 x float> %div
				}

				define <8 x float> @v8f32_estimate(<8 x float> %x) #1 {
				; AVX-LABEL: v8f32_estimate:
				; AVX: # BB#0:
				; AVX-NEXT: vrsqrtps %ymm0, %ymm1
				; AVX-NEXT: vmulps %ymm1, %ymm1, %ymm2
				; AVX-NEXT: vmulps %ymm2, %ymm0, %ymm0
				; AVX-NEXT: vaddps {{.*}}(%rip), %ymm0, %ymm0
				; AVX-NEXT: vmulps {{.*}}(%rip), %ymm1, %ymm1
				; AVX-NEXT: vmulps %ymm0, %ymm1, %ymm0
				; AVX-NEXT: retq
				;
	%sqrt = tail call <8 x float> @llvm.sqrt.v8f32(<8 x float> %x)			%sqrt = tail call <8 x float> @llvm.sqrt.v8f32(<8 x float> %x)
	%div = fdiv fast <8 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>, %sqrt			%div = fdiv fast <8 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>, %sqrt
	ret <8 x float> %div			ret <8 x float> %div
	}			}


	attributes #0 = { "unsafe-fp-math"="true" }			attributes #0 = { "unsafe-fp-math"="true" "mrecip"="!sqrtf,!vec-sqrtf,!divf,!vec-divf" }
	attributes #1 = { nounwind readnone }			attributes #1 = { "unsafe-fp-math"="true" "mrecip"="sqrt,vec-sqrt" }
				attributes #2 = { nounwind readnone }
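The `finite_f32_estimate` checks above compute sqrt(x) from a reciprocal square root estimate and then mask the result when the input is zero. A hedged C++ sketch of that shape follows. The two constants are loaded from memory in the checks (`{{.*}}(%rip)`), so the values -3.0 and -0.5 used here are an assumption chosen to make the arithmetic a standard one-step refinement. `crudeRsqrtF` is a hypothetical stand-in for `vrsqrtss`, and the early return replaces the branchless compare-and-mask of the real sequence.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a hardware rsqrt estimate:
// the exact reciprocal square root with a deliberate 2% relative error.
inline float crudeRsqrtF(float x) { return (1.0f / std::sqrt(x)) * 1.02f; }

// sqrt(x) computed as x * rsqrt(x) with one refinement step folded in.
inline float estimateSqrt(float x) {
  // Zero guard: rsqrt(0) is infinite and 0 * inf is NaN, so force 0.
  // The checked sequence does this with a compare and an and-not mask.
  if (x == 0.0f)
    return 0.0f;
  float est = crudeRsqrtF(x);
  float xe = x * est;  // x * est
  float t = est * xe;  // x * est^2
  t = t + -3.0f;       // assumed constant
  xe = xe * -0.5f;     // assumed constant
  return t * xe;       // x * est * (1.5 - 0.5 * x * est^2)
}
```

For x = 4 the unrefined product x * est would be 2.04, while the refined result is within about 0.002 of 2.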

Revision Contents

Diff 72291

include/llvm/CodeGen/CommandFlags.h

include/llvm/Target/TargetLowering.h

include/llvm/Target/TargetOptions.h

include/llvm/Target/TargetRecip.h

lib/CodeGen/TargetLoweringBase.cpp

lib/Target/PowerPC/PPCISelLowering.cpp

lib/Target/PowerPC/PPCTargetMachine.cpp

lib/Target/TargetRecip.cpp

lib/Target/X86/X86ISelLowering.cpp

lib/Target/X86/X86TargetMachine.cpp

test/CodeGen/PowerPC/recipest.ll

test/CodeGen/X86/recip-fastmath.ll

test/CodeGen/X86/sqrt-fastmath-mir.ll

test/CodeGen/X86/sqrt-fastmath.ll
