This is an archive of the discontinued LLVM Phabricator instance.

make reciprocal estimate code generation more flexible by adding command-line options
Closed (Public)

Authored by spatel on Apr 10 2015, 1:27 PM.


Details

Reviewers

alexr
andreadb
echristo
craig.topper
hfinkel

Commits

rG667a7e2a0f24: make reciprocal estimate code generation more flexible by adding command-line…
rG6f031d848efb: make reciprocal estimate code generation more flexible by adding command-line…
rGba2ba8030218: make reciprocal estimate code generation more flexible by adding command-line…
rL239001: make reciprocal estimate code generation more flexible by adding command-line…
rL238842: make reciprocal estimate code generation more flexible by adding command-line…
rL238051: make reciprocal estimate code generation more flexible by adding command-line…

Summary

We need to separate scalar and vector reciprocal codegen to handle an -mrecip clang flag that provides functionality equivalent to gcc's:
https://gcc.gnu.org/onlinedocs/gcc-4.9.2/gcc/i386-and-x86-64-Options.html#index-mrecip_003dopt-1627

This patch adds a Target class for processing all of the recip codegen possibilities. The x86 backend is updated to use the new functionality.

Diff Detail

Event Timeline

spatel updated this revision to Diff 23616. Apr 10 2015, 1:27 PM

spatel retitled this revision to [x86] make reciprocal estimate code generation more flexible.

spatel updated this object.

spatel edited the test plan for this revision.

spatel added reviewers: andreadb, alexr, craig.topper.

spatel added a subscriber: Unknown Object (MLST).

spatel mentioned this in D8989: add the -mrecip driver flag and process its options. Apr 11 2015, 3:24 PM

spatel added a reviewer: hfinkel. Apr 11 2015, 3:29 PM

Whether we're allowed to use estimates for operations that are normally required to be exact (div and sqrt) is not really a target-feature question; these options are much closer, conceptually, to UnsafeFPMath and friends. Plus, we need the same options for other backends (PowerPC, for example). I'd like to see flags for these added to include/llvm/Target/TargetOptions.h (along with UnsafeFPMath) so that we can handle them in a uniform way.

This revision now requires changes to proceed. Apr 11 2015, 4:57 PM

Hi Hal -

Thanks for the quick feedback. I'm looking at the PPC variations that gcc supports. I'm thinking that we'll need an enum with bitmasks to cover all the potential variations. Just making sure that I'm not going into the weeds...

Also (if maintaining compatibility with gcc is a goal), we'll still need to accept and process different flags per arch.

In D8982#155207, @spatel wrote:

Hi Hal -

Thanks for the quick feedback. I'm looking at the PPC variations that gcc supports. I'm thinking that we'll need an enum with bitmasks to cover all the potential variations. Just making sure that I'm not going into the weeds...

I think that sounds right.

Also (if maintaining compatibility with gcc is a goal), we'll still need to accept and process different flags per arch.

Not exactly in that sense. If we accept a superset of the options that gcc supports because we apply them to targets/variants that gcc does not, that's fine. This seems pretty target independent, and I'd rather handle it in a uniform way if possible.

After thinking it over, I decided that a simple enum of reciprocal operation bools wasn't going to do the job. In addition to wanting to enable specific ops on specific data types, we have users asking to customize the number of Newton-Raphson (N-R) refinement steps, and they want different counts based on the type of op.

The best way to achieve this level of flexibility (and further customization requests that are sure to follow) is with a 'TargetRecip' class. So that's what I'm proposing to add in this updated patch.
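To make "refinement steps" concrete: an estimate instruction such as x86 RCPPS or RSQRTPS returns a low-precision approximation, and each Newton-Raphson step roughly doubles the number of correct bits. The sketch below shows the textbook scalar iterations in plain C++. refineRecip and refineRsqrt are illustrative helpers only; this is not the DAG code the x86 backend actually emits.

```cpp
#include <cassert>
#include <cmath>

// Refine an estimate of 1/d with Newton-Raphson: x' = x * (2 - d * x).
// Each step roughly squares the relative error of the current estimate.
double refineRecip(double d, double est, unsigned steps) {
  double x = est;
  for (unsigned i = 0; i != steps; ++i)
    x = x * (2.0 - d * x);
  return x;
}

// Refine an estimate of 1/sqrt(a): y' = y * (1.5 - 0.5 * a * y * y).
double refineRsqrt(double a, double est, unsigned steps) {
  assert(a > 0.0 && "rsqrt refinement needs a positive input");
  double y = est;
  for (unsigned i = 0; i != steps; ++i)
    y = y * (1.5 - 0.5 * a * y * y);
  return y;
}
```

With a 10% error in the starting estimate (0.3 for 1/3), one step already brings the error down to about 1%, and four steps reach double-precision accuracy; that trade-off between speed and accuracy is why users want to pick the step count per operation.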

From the llc command-line, you can now do something like this:
$ llc -recip=vec-divf,sqrtd.2

That translates to: allow vector float division and scalar double square root codegen. Use the target default setting for N-R refinement steps for the div but override the refinement steps for the sqrt to be '2'.
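A minimal standalone sketch of decoding a spec like the one above into per-operation settings. It assumes the ':' step separator and single-digit step count used by the TargetRecip.cpp parser in the diff below (the example command above uses '.'), and it applies entries in order so that a later duplicate wins, as requested later in the review. parseRecipSpec and RecipSetting are hypothetical names, not LLVM API, and '+'/'-' prefixes are not handled here.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <sstream>
#include <string>

// Setting parsed from one -recip entry. Steps stays empty when the entry
// did not override the target's default refinement count.
struct RecipSetting {
  bool Enabled = true;
  std::optional<unsigned> Steps;
};

// Parse a comma-separated spec such as "vec-divf,sqrtd:2".
// Returns false on malformed input rather than asserting, because the
// string comes from the user.
bool parseRecipSpec(const std::string &Spec,
                    std::map<std::string, RecipSetting> &Out) {
  static const char *const Known[] = {"divf",  "vec-divf",  "divd",  "vec-divd",
                                      "sqrtf", "vec-sqrtf", "sqrtd", "vec-sqrtd"};
  std::stringstream SS(Spec);
  std::string Entry;
  while (std::getline(SS, Entry, ',')) {
    if (Entry.empty())
      return false;
    RecipSetting S;
    std::string Name = Entry;
    size_t Colon = Entry.find(':');
    if (Colon != std::string::npos) {
      std::string Num = Entry.substr(Colon + 1);
      // Accept exactly one decimal digit, like the parser in the diff.
      if (Num.size() != 1 || Num[0] < '0' || Num[0] > '9')
        return false;
      S.Steps = static_cast<unsigned>(Num[0] - '0');
      Name = Entry.substr(0, Colon);
    }
    bool Found = false;
    for (const char *K : Known)
      if (Name == K)
        Found = true;
    if (!Found)
      return false;
    Out[Name] = S; // later entries overwrite earlier ones
  }
  return true;
}
```

For "vec-divf,sqrtd:2" this yields an enabled vec-divf with no step override (so the target default applies) and an enabled sqrtd with 2 steps.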

To minimize change in the updated x86 backend, I've defaulted to disabling all recip codegen, but for enabled operations, we use 1 N-R step.

This means only users targeting AMD Jaguar (btver2) will see a functional change from this patch; that chip previously enabled both sqrt and div estimate codegen, for scalars and vectors, by default. We can easily change the x86 target defaults in a follow-on patch. We'll also want to update the PowerPC and R600 backends to use this new command-line functionality.
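The defaulting described here (everything off, one N-R step for any enabled op) sits underneath user settings through the "uninitialized" sentinel that TargetRecip.cpp uses: command-line values are recorded first, and setDefaults only fills slots that are still uninitialized. The sketch below models that layering in plain C++; RecipTable, setFromUser, and setDefaultsForAll are hypothetical stand-ins for TargetRecip, not the actual class.

```cpp
#include <array>
#include <cassert>

enum RecipOp { DivF, VecDivF, DivD, VecDivD, SqrtF, VecSqrtF, SqrtD, VecSqrtD, NumOps };

// Hypothetical model: user settings are stored first, then target
// defaults fill in only the entries the user left untouched.
class RecipTable {
  static constexpr int Uninitialized = -1;
  std::array<int, NumOps> Enabled;
  std::array<int, NumOps> Steps;

public:
  RecipTable() {
    Enabled.fill(Uninitialized);
    Steps.fill(Uninitialized);
  }
  // A setting that came from the command line (e.g. -recip).
  void setFromUser(RecipOp Op, bool Enable, int RefSteps = Uninitialized) {
    Enabled[Op] = Enable;
    if (RefSteps != Uninitialized)
      Steps[Op] = RefSteps;
  }
  // Target defaults never overwrite an explicit user choice.
  void setDefaultsForAll(bool Enable, int RefSteps) {
    for (int I = 0; I != NumOps; ++I) {
      if (Enabled[I] == Uninitialized)
        Enabled[I] = Enable;
      if (Steps[I] == Uninitialized)
        Steps[I] = RefSteps;
    }
  }
  bool isEnabled(RecipOp Op) const {
    assert(Enabled[Op] != Uninitialized && "defaults were not applied");
    return Enabled[Op] != 0;
  }
  int getRefinementSteps(RecipOp Op) const {
    assert(Steps[Op] != Uninitialized && "defaults were not applied");
    return Steps[Op];
  }
};
```

Calling setDefaultsForAll(false, 1) after the user enabled vec-divf and sqrtd:2 leaves those two on (with 1 and 2 steps respectively) and everything else off, which matches the behavior described above.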

I wanted to add more testing for the llc parameter parsing, but I'm not finding a way to do it. If there's a way to do that with a generic target, please let me know.

I have a dependent clang driver/front-end patch in progress that passes similar command-line params through to the backend.

Patch updated: minor cleanup to spacing, includes, interface, comments.

Ping.

Ping * 2.

Ping * 3.

spatel added a reviewer: echristo. May 6 2015, 11:01 AM

Ping * 4.

hfinkel added inline comments. May 14 2015, 12:49 PM

include/llvm/Target/TargetOptions.h
216	reciprocal estimate -> reciprocal-estimate
include/llvm/Target/TargetRecip.h
35	To reduce confusion, I think it would be better to have a naming prefix on these. TailCallKind, for example, uses TCK_. Let's stick RO_ on these (including the INVALID one).
35	If you're using INVALID for the "number of", please name it NUM_RECIP_OPS (or similar).
49	Let's not use INVALID for "all". You can add an All to the enum (it can even have the same value as INVALID).
lib/Target/TargetRecip.cpp
59	These strings are user-provided, we can't assert on invalid inputs.
149	Hrmm, there could be duplicates. Just parse them in order (users may provide duplicates).

spatel added inline comments. May 14 2015, 3:58 PM

include/llvm/Target/TargetRecip.h
35	With a named enum, I was expecting that users always have to use the prefix "RecipOps::" like I did in the x86 code. But I see that's not done with TailCallKind...

spatel added inline comments. May 14 2015, 9:06 PM

include/llvm/Target/TargetOptions.h
216	Fixed.
include/llvm/Target/TargetRecip.h
35	Fixed.
35	Fixed.
49	Fixed - added an RO_All that is equal to RO_NUM_RECIP_OPS.
lib/Target/TargetRecip.cpp
59	Fixed - replaced with 'report_fatal_error'.
149	Fixed. Still not sure if there's a decent way to test option parsing for llc, so there are very likely malformed input parsing bugs here.

Patch updated based on feedback from Hal - thanks!

Patch updated:

Use '!' to indicate disabled (match change in D8989)
Use array of strings as keys to a map instead of enum + loosely-coupled array of recip parameters.
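A minimal sketch of the map-based design, assuming string keys that match the ArgStrings table in the diff below and the '!' prefix for disabling. RecipMap, RecipParams, and apply() are hypothetical names; where LLVM code would call report_fatal_error on bad user input, this standalone version throws std::invalid_argument instead.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <stdexcept>
#include <string>

// Per-op parameters live next to their key, so names and settings cannot
// drift out of sync the way an enum plus a separate name array can.
struct RecipParams {
  int Enabled = -1;         // -1 means "not set yet"
  int RefinementSteps = -1; // -1 means "not set yet"
};

class RecipMap {
  std::map<std::string, RecipParams> Map;

public:
  RecipMap() {
    for (const char *Key : {"divf", "vec-divf", "divd", "vec-divd",
                            "sqrtf", "vec-sqrtf", "sqrtd", "vec-sqrtd"})
      Map[Key] = RecipParams();
  }
  // Apply one entry; a leading '!' disables the operation.
  void apply(const std::string &Entry) {
    bool Enable = true;
    std::string Key = Entry;
    if (!Key.empty() && Key[0] == '!') {
      Enable = false;
      Key.erase(0, 1);
    }
    auto It = Map.find(Key);
    if (It == Map.end())
      throw std::invalid_argument("unknown -recip option: " + Entry);
    It->second.Enabled = Enable;
  }
  int enabled(const std::string &Key) const { return Map.at(Key).Enabled; }
};
```

Adding a new operation then only requires a new key, rather than a new enum value plus a matching entry in a parallel array.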

hfinkel added inline comments. May 20 2015, 1:45 PM

include/llvm/Target/TargetRecip.h
67	function names should start with a lower-case letter.
lib/Target/TargetRecip.cpp
50	Function names start with a lower-case letter.
113	DisabledPrefix (this is not a macro, don't name it like one)
133	Extra space?
137	Please also try with the 'd' suffix.

spatel added inline comments. May 20 2015, 3:20 PM

include/llvm/Target/TargetRecip.h
67	Fixed.
lib/Target/TargetRecip.cpp
50	Fixed.
113	Fixed.
133	Fixed. Nice catch. :)
137	If we matched a 'd' entry but failed 'f', that would be an assertion failure given the logic below. Let me know if you were thinking of something else.

Patch updated based on Hal's feedback:

Fixed function names to start with lowercase
Fixed variable names to not be all caps
Removed extra space
Added assert for 'f' and 'd' suffix matching
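Hal's request to "also try with the 'd' suffix", and the assert added in response, suggest a lookup in which a name without a type suffix (gcc-style div or vec-sqrt) is tried as both the 'f' and the 'd' key. The following is a hedged sketch of that idea only; expandRecipName is a hypothetical helper, and its assert mirrors the invariant spatel describes (every float entry has a double twin), not necessarily the committed code.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Expand a gcc-style name ("div", "vec-sqrt") into per-type keys
// ("divf"/"divd"). A name that already has a suffix selects one type.
// Returns an empty vector for unknown names.
std::vector<std::string> expandRecipName(const std::string &Name,
                                         const std::set<std::string> &Keys) {
  if (Keys.count(Name))
    return {Name}; // explicit 'f' or 'd' suffix
  std::string F = Name + "f", D = Name + "d";
  bool HasF = Keys.count(F) != 0, HasD = Keys.count(D) != 0;
  // Matching only one of the two would mean the key table is inconsistent.
  assert(HasF == HasD && "float/double keys out of sync");
  if (!HasF)
    return {};
  return {F, D};
}
```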

hfinkel added inline comments. May 20 2015, 3:38 PM

lib/Target/X86/X86TargetMachine.cpp
112	Why false? Do you want a target feature here?

spatel added inline comments. May 20 2015, 3:56 PM

lib/Target/X86/X86TargetMachine.cpp
112	We had target features to control these, but I think it would be better to behave like gcc unless we have reason to diverge. That said, this does not match gcc behavior yet; that would be my next patch. For x86 at least, we would turn the following on by default when using -ffast-math: sqrt, vec-sqrt, vec-div. I didn't set these defaults in this patch because it would change -ffast-math codegen for all CPUs other than btver2 (which had the recip codegen enabled for all eligible x86 recip types via target features).

hfinkel added inline comments. May 20 2015, 3:58 PM

lib/Target/X86/X86TargetMachine.cpp
112	I thought that getRsqrtEstimate, etc. were only called when fast-math is on?

spatel added inline comments. May 20 2015, 4:11 PM

lib/Target/X86/X86TargetMachine.cpp
112	That's correct; everything is gated by -ffast-math or an unsafe algebra equivalent. Recip-est codegen should only be active after that check. With gcc, once you have -ffast-math, you also get -mrecip=sqrt,vec-sqrt,vec-div by default. AFAIK, that is independent of arch or CPU subtarget. The lone exception is 'div' (scalar division). Estimating scalar div on x86 (no FMA until recently) breaks a lot of real world code, so I want to keep that off by default (and again match gcc default behavior).
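The default policy described in this thread can be summarized as a small decision function: nothing is estimated without fast-math, and with it the gcc-like defaults are sqrt, vec-sqrt, and vec-div, while scalar div stays exact. defaultUsesEstimate and its string keys are hypothetical; this models the stated policy only, not the x86 backend code or the follow-up patch.

```cpp
#include <cassert>
#include <string>

// Decide whether the gcc-like x86 defaults described above would use an
// estimate for Op ("div", "vec-div", "sqrt" or "vec-sqrt").
bool defaultUsesEstimate(const std::string &Op, bool FastMath) {
  // Everything is gated on -ffast-math (or an unsafe-algebra equivalent).
  if (!FastMath)
    return false;
  // Scalar division stays exact by default: per the review, estimating it
  // on x86 breaks a lot of real-world code.
  if (Op == "div")
    return false;
  return Op == "sqrt" || Op == "vec-sqrt" || Op == "vec-div";
}
```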

hfinkel added inline comments. May 22 2015, 12:16 PM

lib/Target/X86/X86TargetMachine.cpp
112	But you turn them all off here, right? So that does not match gcc. Please add a comment here explaining what is going on.

spatel added inline comments. May 22 2015, 1:07 PM

lib/Target/X86/X86TargetMachine.cpp
112	Right - my intent was to separate the structural change from the functional change as much as possible. We don't currently generate any reciprocal estimates for x86 with -ffast-math except when targeting btver2. So all x86 codegen should be identical after this patch except in the case where someone has specified -mcpu=btver2 -ffast-math. I'll send the patch to change this default behavior to match GCC defaults as soon as possible, but I wanted to keep that separate in case anyone does not agree that we should change that default behavior. PPC and ARM recip patches would also be independent but similar (assuming they want to match GCC too).

LGTM.

lib/Target/X86/X86TargetMachine.cpp
112	Okay, but please add a comment explaining the current state of things, with a TODO, here (maybe the follow-up will happen soon, but in case there's trouble, the code will be clear in the meantime).

This revision is now accepted and ready to land. May 22 2015, 1:12 PM

Patch updated:
Added TODO comment to explain why x86 reciprocal estimate is defaulted to 'off' and the reason to change that default soon.

Closed by commit rL238051: make reciprocal estimate code generation more flexible by adding command-line… (authored by spatel). May 22 2015, 2:13 PM

This revision was automatically updated to reflect the committed changes.

spatel mentioned this in rL238055: add the -mrecip driver flag and process its options. May 22 2015, 2:46 PM

spatel mentioned this in rL238851: add the -mrecip driver flag and process its options (2nd try). Jun 2 2015, 9:59 AM

spatel mentioned this in rL239536: add the -mrecip driver flag and process its options (3rd try). Jun 11 2015, 7:58 AM

spatel mentioned this in D10396: [x86] set default reciprocal (division and square root) codegen to match GCC. Jun 11 2015, 9:41 AM

spatel mentioned this in rL240310: [x86] set default reciprocal (division and square root) codegen to match GCC. Jun 22 2015, 11:34 AM

Revision Contents

Path (size of change)

include/llvm/CodeGen/CommandFlags.h (8 lines)
include/llvm/Target/TargetOptions.h (8 lines)
include/llvm/Target/TargetRecip.h (74 lines)
lib/Target/CMakeLists.txt (1 line)
lib/Target/TargetRecip.cpp (204 lines)
lib/Target/X86/ (five files, names not preserved: 6, 68, 12, 2, and 1 lines)
test/CodeGen/X86/recip-fastmath.ll (4 lines)
test/CodeGen/X86/sqrt-fastmath.ll (2 lines)

Diff 25842

include/llvm/CodeGen/CommandFlags.h

[18 unchanged lines not shown]
 #include "llvm/IR/Module.h"
 #include "llvm/MC/MCTargetOptionsCommandFlags.h"
 #include "llvm//MC/SubtargetFeature.h"
 #include "llvm/Support/CodeGen.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Host.h"
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/Target/TargetOptions.h"
+#include "llvm/Target/TargetRecip.h"
 #include <string>
 using namespace llvm;

 cl::opt<std::string>
 MArch("march", cl::desc("Architecture to generate code for (see --version)"));

 cl::opt<std::string>
 MCPU("mcpu",
[112 unchanged lines not shown] FuseFPOps("fp-contract",
 clEnumValN(FPOpFusion::Fast, "fast",
 "Fuse FP ops whenever profitable"),
 clEnumValN(FPOpFusion::Standard, "on",
 "Only fuse 'blessed' FP ops."),
 clEnumValN(FPOpFusion::Strict, "off",
 "Only fuse FP ops when the result won't be effected."),
 clEnumValEnd));

+cl::list<std::string>
+ReciprocalOps("recip",
+cl::CommaSeparated,
+cl::desc("Choose reciprocal operation types and parameters."),
+cl::value_desc("all,none,default,divf,vec-sqrtd,vec-divd:0,sqrtf:9..."));
+
 cl::opt<bool>
 DontPlaceZerosInBSS("nozero-initialized-in-bss",
 cl::desc("Don't place zero-initialized symbols into bss section"),
 cl::init(false));

 cl::opt<bool>
 EnableGuaranteedTailCallOpt("tailcallopt",
 cl::desc("Turn fastcc calls into tail calls by (potentially) changing ABI."),
[63 unchanged lines not shown]

 // Common utility function tightly tied to the options listed here. Initializes
 // a TargetOptions object with CodeGen flags and returns it.
 static inline TargetOptions InitTargetOptionsFromCodeGenFlags() {
 TargetOptions Options;
 Options.LessPreciseFPMADOption = EnableFPMAD;
 Options.NoFramePointerElim = DisableFPElim;
 Options.AllowFPOpFusion = FuseFPOps;
+Options.Reciprocals = ReciprocalOps;
 Options.UnsafeFPMath = EnableUnsafeFPMath;
 Options.NoInfsFPMath = EnableNoInfsFPMath;
 Options.NoNaNsFPMath = EnableNoNaNsFPMath;
 Options.HonorSignDependentRoundingFPMathOption =
 EnableHonorSignDependentRoundingFPMath;
 if (FloatABIForCalls != FloatABI::Default)
 Options.FloatABIType = FloatABIForCalls;
 Options.NoZerosInBSS = DontPlaceZerosInBSS;
[59 unchanged lines not shown]

include/llvm/Target/TargetOptions.h

[9 unchanged lines not shown]
 // This file defines command line option flags that are shared across various
 // targets.
 //
 //===----------------------------------------------------------------------===//

 #ifndef LLVM_TARGET_TARGETOPTIONS_H
 #define LLVM_TARGET_TARGETOPTIONS_H

+#include "llvm/Target/TargetRecip.h"
 #include "llvm/MC/MCTargetOptions.h"
 #include <string>

 namespace llvm {
 class MachineFunction;
 class StringRef;

 // Possible float ABI settings. Used with FloatABIType in TargetOptions.h.
[41 unchanged lines not shown] TargetOptions()
 NoZerosInBSS(false),
 GuaranteedTailCallOpt(false),
 DisableTailCalls(false), StackAlignmentOverride(0),
 EnableFastISel(false), PositionIndependentExecutable(false),
 UseInitArray(false), DisableIntegratedAS(false),
 CompressDebugSections(false), FunctionSections(false),
 DataSections(false), UniqueSectionNames(true), TrapUnreachable(false),
 TrapFuncName(), FloatABIType(FloatABI::Default),
-AllowFPOpFusion(FPOpFusion::Standard), JTType(JumpTable::Single),
+AllowFPOpFusion(FPOpFusion::Standard), Reciprocals(),
+JTType(JumpTable::Single),
 ThreadModel(ThreadModel::POSIX) {}

 /// PrintMachineCode - This flag is enabled when the -print-machineinstrs
 /// option is specified on the command line, and should enable debugging
 /// output from the code generator.
 unsigned PrintMachineCode : 1;

 /// NoFramePointerElim - This flag is enabled when the -disable-fp-elim is
[122 unchanged lines not shown] public:
 /// precision won't effect the result.
 ///
 /// Note: This option only controls formation of fused ops by the
 /// optimizers. Fused operations that are explicitly specified (e.g. FMA
 /// via the llvm.fma.* intrinsic) will always be honored, regardless of
 /// the value of this option.
 FPOpFusion::FPOpFusionMode AllowFPOpFusion;

+/// This class encapsulates options for reciprocal-estimate code generation.
+TargetRecip Reciprocals;
+
 /// JTType - This flag specifies the type of jump-instruction table to
 /// create for functions that have the jumptable attribute.
 JumpTable::JumpTableType JTType;

 /// ThreadModel - This flag specifies the type of threading model to assume
 /// for things like atomics
 ThreadModel::Model ThreadModel;

[18 unchanged lines not shown] return
 ARE_EQUAL(StackAlignmentOverride) &&
 ARE_EQUAL(EnableFastISel) &&
 ARE_EQUAL(PositionIndependentExecutable) &&
 ARE_EQUAL(UseInitArray) &&
 ARE_EQUAL(TrapUnreachable) &&
 ARE_EQUAL(TrapFuncName) &&
 ARE_EQUAL(FloatABIType) &&
 ARE_EQUAL(AllowFPOpFusion) &&
+ARE_EQUAL(Reciprocals) &&
 ARE_EQUAL(JTType) &&
 ARE_EQUAL(ThreadModel) &&
 ARE_EQUAL(MCOptions);
 #undef ARE_EQUAL
 }

 inline bool operator!=(const TargetOptions &LHS,
 const TargetOptions &RHS) {
 return !(LHS == RHS);
 }

 } // End llvm namespace

 #endif

include/llvm/Target/TargetRecip.h

				//===--------------------- llvm/Target/TargetRecip.h ------------- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This class is used to customize machine-specific reciprocal estimate code
				// generation in a target-independent way.
				// If a target does not support operations in this specification, then code
				// generation will default to using supported operations.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TARGET_TARGETRECIP_H
				#define LLVM_TARGET_TARGETRECIP_H

				#include <cstdint>
				#include <string>
				#include <vector>

				namespace llvm {

				enum RecipOps {
				RO_DivF = 0, // division, float, scalar
				RO_VecDivF, // division, float, vector
				RO_DivD, // division, double, scalar
				RO_VecDivD, // division, double, vector
				RO_SqrtF, // square root, float, scalar
				RO_VecSqrtF, // square root, float, vector
				RO_SqrtD, // square root, double, scalar
				RO_VecSqrtD, // square root, double, vector
				RO_All,
				RO_NUM_RECIP_OPS = RO_All
				};

				class TargetRecip {
				public:
				TargetRecip();

				/// Initialize all or part of the operations from command-line options or
				/// encoded strings.
				TargetRecip(const std::vector<std::string> &Args);

				virtual ~TargetRecip();

				/// Set whether a particular reciprocal operation is enabled and how many
				/// refinement steps are needed when using it. Use the 'RO_All' value
				/// to set enablement and refinement steps for all operations.
				void setDefaults(RecipOps Op, bool Enable, unsigned RefSteps);

				/// Return true if the reciprocal operation has been enabled by default or
				/// from the command-line. Return false if the operation has been disabled
				/// by default or from the command-line.
				bool isEnabled(RecipOps Op) const;

				/// Return the number of iterations necessary to refine the
				/// result of a machine instruction for the given reciprocal operation.
				unsigned getRefinementSteps(RecipOps Op) const;

				bool operator==(const TargetRecip &Other) const;

				private:
				int8_t Enabled[RO_NUM_RECIP_OPS];
				int8_t RefinementSteps[RO_NUM_RECIP_OPS];

				bool ParseGlobalParams(const std::string &Arg);
				void ParseIndividualParams(const std::string &Arg);
				};

				} // End llvm namespace

				#endif

lib/Target/CMakeLists.txt

 list(APPEND LLVM_COMMON_DEPENDS intrinsics_gen)

 add_llvm_library(LLVMTarget
 Target.cpp
 TargetIntrinsicInfo.cpp
 TargetLoweringObjectFile.cpp
 TargetMachine.cpp
 TargetMachineC.cpp
+TargetRecip.cpp
 TargetSubtargetInfo.cpp

 ADDITIONAL_HEADER_DIRS
 ${LLVM_MAIN_INCLUDE_DIR}/llvm/Target
 )

 foreach(t ${LLVM_TARGETS_TO_BUILD})
 message(STATUS "Targeting ${t}")
 add_subdirectory(${t})
 endforeach()

lib/Target/TargetRecip.cpp

				//===-------------------------- TargetRecip.cpp ---------------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This class is used to customize machine-specific reciprocal estimate code
				// generation in a target-independent way.
				// If a target does not support operations in this specification, then code
				// generation will default to using supported operations.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/Support/ErrorHandling.h"
				#include "llvm/Target/TargetRecip.h"
				#include <cassert>

				using namespace llvm;

				// These must be in the same order as the corresponding enum values.
				static const char *const ArgStrings[] = {
				"divf",
				"vec-divf",
				"divd",
				"vec-divd",
				"sqrtf",
				"vec-sqrtf",
				"sqrtd",
				"vec-sqrtd"
				};

				// The uninitialized state is needed for the enablement bits and refinement
				// steps because custom settings may arrive via the command-line before target
				// defaults are set.
				enum {
				Uninitialized = -1
				};

				TargetRecip::TargetRecip() {
				for (int i = 0; i != RO_NUM_RECIP_OPS; ++i) {
				Enabled[i] = Uninitialized;
				RefinementSteps[i] = Uninitialized;
				}
				}

				static bool ParseRefinementStep(const std::string &In, size_t &Position,
				uint8_t &Value) {
				const char REF_STEP_TOKEN = ':';
				Position = In.find(REF_STEP_TOKEN);
				if (Position == std::string::npos)
				return false;

				std::string RefStepString = In.substr(Position + 1);
				// Allow exactly one numeric character for the additional refinement
				// step parameter.
				if (RefStepString.length() == 1) {
				char RefStepChar = RefStepString[0];
				if (RefStepChar >= '0' && RefStepChar <= '9') {
				Value = RefStepChar - '0';
				return true;
				}
				}
				report_fatal_error("Invalid refinement step for -recip.");
				}

				bool TargetRecip::ParseGlobalParams(const std::string &Arg) {
				bool Enable;
				bool UseDefaults;
				if (Arg.find("all") == 0) {
				UseDefaults = false;
				Enable = true;
				} else if (Arg.find("none") == 0) {
				UseDefaults = false;
				Enable = false;
				} else if (Arg.find("default") == 0) {
				UseDefaults = true;
				} else {
				// Any other string is invalid or an individual setting.
				return false;
				}

				// All enable values will be initialized to target defaults if 'default' was
				// specified.
				if (!UseDefaults)
				for (int i = 0; i != RO_NUM_RECIP_OPS; ++i)
				Enabled[i] = Enable;

				size_t RefPos;
				uint8_t RefSteps;
				if (ParseRefinementStep(Arg, RefPos, RefSteps)) {
				// Custom refinement count was specified with all, none, or default.
				for (int i = 0; i != RO_NUM_RECIP_OPS; ++i)
				RefinementSteps[i] = RefSteps;
				}
				return true;
				}

				void TargetRecip::ParseIndividualParams(const std::string &Arg) {
				std::string ArgSub = Arg;

				// Each reciprocal type may be enabled ('+') or disabled ('-') individually.
				bool IsEnabled;
				if (Arg[0] == '+') {
				ArgSub = Arg.substr(1);
				IsEnabled = true;
				} else if (Arg[0] == '-') {
				ArgSub = Arg.substr(1);
				IsEnabled = false;
				} else {
				// If no plus or minus, default to plus.
				IsEnabled = true;
				}

				// Look for an optional setting of the number of refinement steps needed
				// for this type of reciprocal operation.
				size_t RefPos;
				uint8_t RefSteps;
				std::string RefStepString;
				if (ParseRefinementStep(ArgSub, RefPos, RefSteps)) {
				// Split the string for further processing.
				RefStepString = ArgSub.substr(RefPos + 1);
				ArgSub = ArgSub.substr(0, RefPos);
				}

				// Find the reciprocal operation corresponding to this string value.
				RecipOps Op = RO_NUM_RECIP_OPS;
				for (int i = 0; i != RO_NUM_RECIP_OPS; ++i) {
				if (ArgSub == ArgStrings[i]) {
				Op = (RecipOps)i;
				break;
				}
				}

				if (Op == RO_NUM_RECIP_OPS)
				report_fatal_error("Invalid option for -recip.");

  // Set whether this operation is being enabled or disabled, and optionally
  // set the number of refinement steps for this operation.
  Enabled[Op] = IsEnabled;
  if (!RefStepString.empty()) {
    RefinementSteps[Op] = RefSteps;
  }
}
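To make the accepted token grammar concrete, here is a minimal standalone sketch of how one -recip value such as "-vec-divf:2" breaks down into a sign, an operation name, and an optional refinement count. It is not the patch's ParseIndividualParams/ParseRefinementStep: RecipToken and parseRecipToken are hypothetical names, malformed input returns std::nullopt instead of calling report_fatal_error, and it assumes the step count is a single decimal digit.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical result of parsing one -recip value such as "-vec-divf:2".
struct RecipToken {
  std::string Name;                   // operation name, e.g. "vec-divf"
  bool Enabled = true;                // '-' disables; '+' or no prefix enables
  std::optional<std::uint8_t> Steps;  // present only if a ":N" suffix was given
};

// Parses [+|-]name[:N]. Assumes N is a single decimal digit and reports
// malformed input by returning std::nullopt instead of aborting.
std::optional<RecipToken> parseRecipToken(const std::string &Arg) {
  if (Arg.empty())
    return std::nullopt;

  RecipToken Tok;
  Tok.Name = Arg;
  if (Arg[0] == '+' || Arg[0] == '-') {
    Tok.Enabled = (Arg[0] == '+');
    Tok.Name = Arg.substr(1);
  }

  // Split off an optional ":N" refinement-step suffix.
  std::string::size_type Colon = Tok.Name.find(':');
  if (Colon != std::string::npos) {
    std::string Count = Tok.Name.substr(Colon + 1);
    if (Count.size() != 1 || Count[0] < '0' || Count[0] > '9')
      return std::nullopt;
    Tok.Steps = static_cast<std::uint8_t>(Count[0] - '0');
    Tok.Name = Tok.Name.substr(0, Colon);
  }

  if (Tok.Name.empty())
    return std::nullopt;
  return Tok;
}
```

Keeping the step count optional (rather than defaulting it to some number during parsing) mirrors the patch's choice to only write RefinementSteps[Op] when a count was actually given, so a target default can still fill it later.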

TargetRecip::TargetRecip(const std::vector<std::string> &Args) :
  TargetRecip::TargetRecip() {
hfinkel: Hrmm, there could be duplicates. Just parse them in order (users may provide duplicates).
spatel: Fixed. Still not sure if there's a decent way to test option parsing for llc, so there are very likely malformed input parsing bugs here.
  unsigned NumArgs = Args.size();

  // Check if "all", "default", or "none" was specified.
  if (NumArgs == 1 && ParseGlobalParams(Args[0]))
    return;

  for (unsigned i = 0; i != NumArgs; ++i) {
    std::string Value = Args[i];
    if (Value.empty())
      report_fatal_error("Empty option string for -recip.");
    ParseIndividualParams(Value);
  }
}
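The review settled on processing the values strictly in order, so a later duplicate simply overrides an earlier one, while a lone global keyword switches everything at once. A hypothetical sketch of those semantics: it assumes only the four operation names used by the RUN lines in this patch (divf, vec-divf, sqrtf, vec-sqrtf), leaves out "default" and the ":N" suffix, and throws std::invalid_argument where the patch calls report_fatal_error.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical per-operation state for this sketch only.
struct OpState {
  bool Enabled = false;
};

// Applies -recip values left to right: a later value for the same operation
// overrides an earlier one. A single "all" or "none" switches every operation.
std::map<std::string, OpState>
applyRecipArgs(const std::vector<std::string> &Args) {
  // Assumed operation names, taken from the RUN lines in this patch.
  std::map<std::string, OpState> Ops = {
      {"divf", {}}, {"vec-divf", {}}, {"sqrtf", {}}, {"vec-sqrtf", {}}};

  // Like the patch, global keywords are only recognized as the sole value.
  if (Args.size() == 1 && (Args[0] == "all" || Args[0] == "none")) {
    for (auto &Entry : Ops)
      Entry.second.Enabled = (Args[0] == "all");
    return Ops;
  }

  for (const std::string &Arg : Args) {
    if (Arg.empty())
      throw std::invalid_argument("Empty option string for -recip.");
    // A leading '-' disables; '+' or no prefix enables.
    bool Enable = Arg[0] != '-';
    std::string Name = (Arg[0] == '+' || Arg[0] == '-') ? Arg.substr(1) : Arg;
    auto It = Ops.find(Name);
    if (It == Ops.end())
      throw std::invalid_argument("Invalid option for -recip: " + Arg);
    It->second.Enabled = Enable;
  }
  return Ops;
}
```

For example, "divf,vec-sqrtf,-divf" leaves divf disabled because the last mention wins, which is the behavior hfinkel asked for instead of rejecting duplicates.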

bool TargetRecip::isEnabled(RecipOps Op) const {
  if (Op == RO_NUM_RECIP_OPS) return false;
  assert(Enabled[Op] != Uninitialized && "Enabled setting was not initialized");
  return Enabled[Op];
}

unsigned TargetRecip::getRefinementSteps(RecipOps Op) const {
  if (Op == RO_NUM_RECIP_OPS) return 0;
  assert(RefinementSteps[Op] != Uninitialized &&
         "Refinement step setting was not initialized");
  return RefinementSteps[Op];
}

void TargetRecip::setDefaults(RecipOps Op, bool Enable, unsigned RefSteps) {
  if (Op == RO_All) {
    for (int i = 0; i != RO_NUM_RECIP_OPS; ++i) {
      if (Enabled[i] == Uninitialized)
        Enabled[i] = Enable;
      if (RefinementSteps[i] == Uninitialized)
        RefinementSteps[i] = RefSteps;
    }
  } else {
    if (Enabled[Op] == Uninitialized)
      Enabled[Op] = Enable;
    if (RefinementSteps[Op] == Uninitialized)
      RefinementSteps[Op] = RefSteps;
  }
}
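setDefaults only fills slots still holding the Uninitialized sentinel, which is what lets an explicit -recip setting win over a target's defaults, such as the setDefaults(RO_All, false, 1) call in X86TargetMachine. A minimal model of that precedence rule; RecipSettings, the enum names, and the use of -1 as the sentinel are assumptions for illustration, not the patch's TargetRecip.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical operation indices for this sketch.
enum RecipOpIdx { DivF, VecDivF, SqrtF, VecSqrtF, NumRecipOpIdx };

// Assumed sentinel meaning "not set by the user yet".
constexpr std::int8_t Unset = -1;

struct RecipSettings {
  std::array<std::int8_t, NumRecipOpIdx> Enabled;
  std::array<std::int8_t, NumRecipOpIdx> Steps;

  RecipSettings() {
    Enabled.fill(Unset);
    Steps.fill(Unset);
  }

  // Target defaults only fill slots the user did not set explicitly.
  void setDefaults(bool Enable, std::int8_t RefSteps) {
    for (int I = 0; I != NumRecipOpIdx; ++I) {
      if (Enabled[I] == Unset)
        Enabled[I] = Enable ? 1 : 0;
      if (Steps[I] == Unset)
        Steps[I] = RefSteps;
    }
  }

  // Querying before defaults were applied is a programming error.
  bool isEnabled(RecipOpIdx Op) const {
    assert(Enabled[Op] != Unset && "defaults were never applied");
    return Enabled[Op] != 0;
  }
};
```

The order matters: command-line values are recorded first, then the target calls setDefaults, so the asserts in isEnabled/getRefinementSteps catch any path that skips the second step.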

bool TargetRecip::operator==(const TargetRecip &Other) const {
  for (int i = 0; i != RO_NUM_RECIP_OPS; ++i) {
    if (RefinementSteps[i] != Other.RefinementSteps[i])
      return false;
    if (Enabled[i] != Other.Enabled[i])
      return false;
  }
  return true;
}

TargetRecip::~TargetRecip() {
}

lib/Target/X86/X86.td

…
def FeatureCallRegIndirect : SubtargetFeature<"call-reg-indirect",
"CallRegIndirect", "true",		"CallRegIndirect", "true",
"Call register indirect">;		"Call register indirect">;
def FeatureLEAUsesAG : SubtargetFeature<"lea-uses-ag", "LEAUsesAG", "true",		def FeatureLEAUsesAG : SubtargetFeature<"lea-uses-ag", "LEAUsesAG", "true",
"LEA instruction needs inputs at AG stage">;		"LEA instruction needs inputs at AG stage">;
def FeatureSlowLEA : SubtargetFeature<"slow-lea", "SlowLEA", "true",		def FeatureSlowLEA : SubtargetFeature<"slow-lea", "SlowLEA", "true",
"LEA instruction with certain arguments is slow">;		"LEA instruction with certain arguments is slow">;
def FeatureSlowIncDec : SubtargetFeature<"slow-incdec", "SlowIncDec", "true",		def FeatureSlowIncDec : SubtargetFeature<"slow-incdec", "SlowIncDec", "true",
"INC and DEC instructions are slower than ADD and SUB">;		"INC and DEC instructions are slower than ADD and SUB">;
def FeatureUseSqrtEst : SubtargetFeature<"use-sqrt-est", "UseSqrtEst", "true",
"Use RSQRT* to optimize square root calculations">;
def FeatureUseRecipEst : SubtargetFeature<"use-recip-est", "UseReciprocalEst",
"true", "Use RCP* to optimize division calculations">;
def FeatureSoftFloat		def FeatureSoftFloat
: SubtargetFeature<"soft-float", "UseSoftFloat", "true",		: SubtargetFeature<"soft-float", "UseSoftFloat", "true",
"Use software floating point features.">;		"Use software floating point features.">;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// X86 processors supported.		// X86 processors supported.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

…
def : Proc<"btver1", [FeatureSSSE3, FeatureSSE4A, FeatureCMPXCHG16B,
FeatureSlowSHLD]>;		FeatureSlowSHLD]>;

// Jaguar		// Jaguar
def : ProcessorModel<"btver2", BtVer2Model,		def : ProcessorModel<"btver2", BtVer2Model,
[FeatureAVX, FeatureSSE4A, FeatureCMPXCHG16B,		[FeatureAVX, FeatureSSE4A, FeatureCMPXCHG16B,
FeaturePRFCHW, FeatureAES, FeaturePCLMUL,		FeaturePRFCHW, FeatureAES, FeaturePCLMUL,
FeatureBMI, FeatureF16C, FeatureMOVBE,		FeatureBMI, FeatureF16C, FeatureMOVBE,
FeatureLZCNT, FeaturePOPCNT, FeatureFastUAMem,		FeatureLZCNT, FeaturePOPCNT, FeatureFastUAMem,
FeatureSlowSHLD, FeatureUseSqrtEst, FeatureUseRecipEst]>;		FeatureSlowSHLD]>;

// TODO: We should probably add 'FeatureFastUAMem' to all of the AMD chips.		// TODO: We should probably add 'FeatureFastUAMem' to all of the AMD chips.

// Bulldozer		// Bulldozer
def : Proc<"bdver1", [FeatureXOP, FeatureFMA4, FeatureCMPXCHG16B,		def : Proc<"bdver1", [FeatureXOP, FeatureFMA4, FeatureCMPXCHG16B,
FeatureAES, FeaturePRFCHW, FeaturePCLMUL,		FeatureAES, FeaturePRFCHW, FeaturePCLMUL,
FeatureAVX, FeatureSSE4A, FeatureLZCNT,		FeatureAVX, FeatureSSE4A, FeatureLZCNT,
FeaturePOPCNT, FeatureSlowSHLD]>;		FeaturePOPCNT, FeatureSlowSHLD]>;
…

lib/Target/X86/X86ISelLowering.cpp

	…
	STATISTIC(NumTailCalls, "Number of tail calls");			STATISTIC(NumTailCalls, "Number of tail calls");

	static cl::opt<bool> ExperimentalVectorWideningLegalization(			static cl::opt<bool> ExperimentalVectorWideningLegalization(
	"x86-experimental-vector-widening-legalization", cl::init(false),			"x86-experimental-vector-widening-legalization", cl::init(false),
	cl::desc("Enable an experimental vector type legalization through widening "			cl::desc("Enable an experimental vector type legalization through widening "
	"rather than promotion."),			"rather than promotion."),
	cl::Hidden);			cl::Hidden);

	static cl::opt<int> ReciprocalEstimateRefinementSteps(
	"x86-recip-refinement-steps", cl::init(1),
	cl::desc("Specify the number of Newton-Raphson iterations applied to the "
	"result of the hardware reciprocal estimate instruction."),
	cl::NotHidden);

	// Forward declarations.			// Forward declarations.
	static SDValue getMOVL(SelectionDAG &DAG, SDLoc dl, EVT VT, SDValue V1,			static SDValue getMOVL(SelectionDAG &DAG, SDLoc dl, EVT VT, SDValue V1,
	SDValue V2);			SDValue V2);

	X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,			X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
	const X86Subtarget &STI)			const X86Subtarget &STI)
	: TargetLowering(TM), Subtarget(&STI) {			: TargetLowering(TM), Subtarget(&STI) {
	X86ScalarSSEf64 = Subtarget->hasSSE2();			X86ScalarSSEf64 = Subtarget->hasSSE2();
	…
	}			}

	/// The minimum architected relative accuracy is 2^-12. We need one			/// The minimum architected relative accuracy is 2^-12. We need one
	/// Newton-Raphson step to have a good float result (24 bits of precision).			/// Newton-Raphson step to have a good float result (24 bits of precision).
	SDValue X86TargetLowering::getRsqrtEstimate(SDValue Op,			SDValue X86TargetLowering::getRsqrtEstimate(SDValue Op,
	DAGCombinerInfo &DCI,			DAGCombinerInfo &DCI,
	unsigned &RefinementSteps,			unsigned &RefinementSteps,
	bool &UseOneConstNR) const {			bool &UseOneConstNR) const {
	// FIXME: We should use instruction latency models to calculate the cost of
	// each potential sequence, but this is very hard to do reliably because
	// at least Intel's Core* chips have variable timing based on the number of
	// significant digits in the divisor and/or sqrt operand.
	if (!Subtarget->useSqrtEst())
	return SDValue();

	EVT VT = Op.getValueType();			EVT VT = Op.getValueType();
				RecipOps RecipOp;

	// SSE1 has rsqrtss and rsqrtps.			// SSE1 has rsqrtss and rsqrtps. AVX adds a 256-bit variant for rsqrtps.
	// TODO: Add support for AVX512 (v16f32).			// TODO: Add support for AVX512 (v16f32).
	// It is likely not profitable to do this for f64 because a double-precision			// It is likely not profitable to do this for f64 because a double-precision
	// rsqrt estimate with refinement on x86 prior to FMA requires at least 16			// rsqrt estimate with refinement on x86 prior to FMA requires at least 16
	// instructions: convert to single, rsqrtss, convert back to double, refine			// instructions: convert to single, rsqrtss, convert back to double, refine
	// (3 steps = at least 13 insts). If an 'rsqrtsd' variant was added to the ISA			// (3 steps = at least 13 insts). If an 'rsqrtsd' variant was added to the ISA
	// along with FMA, this could be a throughput win.			// along with FMA, this could be a throughput win.
	if ((Subtarget->hasSSE1() && (VT == MVT::f32 || VT == MVT::v4f32)) ||			if (VT == MVT::f32 && Subtarget->hasSSE1())
	(Subtarget->hasAVX() && VT == MVT::v8f32)) {			RecipOp = RO_SqrtF;
	RefinementSteps = 1;			else if ((VT == MVT::v4f32 && Subtarget->hasSSE1()) ||
				(VT == MVT::v8f32 && Subtarget->hasAVX()))
				RecipOp = RO_VecSqrtF;
				else
				return SDValue();

				TargetRecip Recips = DCI.DAG.getTarget().Options.Reciprocals;
				if (!Recips.isEnabled(RecipOp))
				return SDValue();

				RefinementSteps = Recips.getRefinementSteps(RecipOp);
	UseOneConstNR = false;			UseOneConstNR = false;
	return DCI.DAG.getNode(X86ISD::FRSQRT, SDLoc(Op), VT, Op);			return DCI.DAG.getNode(X86ISD::FRSQRT, SDLoc(Op), VT, Op);
	}			}
	return SDValue();
	}
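The doc comment's arithmetic can be checked numerically. One Newton-Raphson step for 1/sqrt(a), y' = y * (1.5 - 0.5 * a * y * y), turns a relative error e into about 1.5 * e^2; starting from the 2^-12 figure quoted above that is roughly 2^-23.4, on the order of float rounding error, hence "one step". The sketch below simulates the hardware estimate by perturbing the exact value instead of calling an intrinsic, so it runs on any target; the helper names are hypothetical, and the DAG combiner's actual refinement sequence (the patch only hands it a step count) may be arranged differently.

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson step for Y ~= 1/sqrt(A): Y' = Y * (1.5 - 0.5 * A * Y * Y).
float refineRsqrt(float A, float Y) {
  return Y * (1.5f - 0.5f * A * Y * Y);
}

// Stand-in for RSQRTSS/RSQRTPS: the exact value scaled by (1 + 2^-12), the
// relative accuracy figure used in the doc comment above.
float simulatedRsqrtEstimate(float A) {
  assert(A > 0.0f && "rsqrt estimate sketch assumes a positive input");
  return (1.0f / std::sqrt(A)) * (1.0f + std::ldexp(1.0f, -12));
}

// Relative error of an approximation to 1/sqrt(A), measured in double.
double rsqrtRelError(float Approx, float A) {
  double Exact = 1.0 / std::sqrt(static_cast<double>(A));
  return std::fabs(static_cast<double>(Approx) - Exact) / Exact;
}
```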

	/// The minimum architected relative accuracy is 2^-12. We need one			/// The minimum architected relative accuracy is 2^-12. We need one
	/// Newton-Raphson step to have a good float result (24 bits of precision).			/// Newton-Raphson step to have a good float result (24 bits of precision).
	SDValue X86TargetLowering::getRecipEstimate(SDValue Op,			SDValue X86TargetLowering::getRecipEstimate(SDValue Op,
	DAGCombinerInfo &DCI,			DAGCombinerInfo &DCI,
	unsigned &RefinementSteps) const {			unsigned &RefinementSteps) const {
	// FIXME: We should use instruction latency models to calculate the cost of
	// each potential sequence, but this is very hard to do reliably because
	// at least Intel's Core* chips have variable timing based on the number of
	// significant digits in the divisor.
	if (!Subtarget->useReciprocalEst())
	return SDValue();

	EVT VT = Op.getValueType();			EVT VT = Op.getValueType();
				RecipOps RecipOp;

	// SSE1 has rcpss and rcpps. AVX adds a 256-bit variant for rcpps.			// SSE1 has rcpss and rcpps. AVX adds a 256-bit variant for rcpps.
	// TODO: Add support for AVX512 (v16f32).			// TODO: Add support for AVX512 (v16f32).
	// It is likely not profitable to do this for f64 because a double-precision			// It is likely not profitable to do this for f64 because a double-precision
	// reciprocal estimate with refinement on x86 prior to FMA requires			// reciprocal estimate with refinement on x86 prior to FMA requires
	// 15 instructions: convert to single, rcpss, convert back to double, refine			// 15 instructions: convert to single, rcpss, convert back to double, refine
	// (3 steps = 12 insts). If an 'rcpsd' variant was added to the ISA			// (3 steps = 12 insts). If an 'rcpsd' variant was added to the ISA
	// along with FMA, this could be a throughput win.			// along with FMA, this could be a throughput win.
	if ((Subtarget->hasSSE1() && (VT == MVT::f32 || VT == MVT::v4f32)) ||			if (VT == MVT::f32 && Subtarget->hasSSE1())
	(Subtarget->hasAVX() && VT == MVT::v8f32)) {			RecipOp = RO_DivF;
	RefinementSteps = ReciprocalEstimateRefinementSteps;			else if ((VT == MVT::v4f32 && Subtarget->hasSSE1()) ||
	return DCI.DAG.getNode(X86ISD::FRCP, SDLoc(Op), VT, Op);			(VT == MVT::v8f32 && Subtarget->hasAVX()))
	}			RecipOp = RO_VecDivF;
				else
	return SDValue();			return SDValue();

				TargetRecip Recips = DCI.DAG.getTarget().Options.Reciprocals;
				if (!Recips.isEnabled(RecipOp))
				return SDValue();

				RefinementSteps = Recips.getRefinementSteps(RecipOp);
				return DCI.DAG.getNode(X86ISD::FRCP, SDLoc(Op), VT, Op);
	}			}
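For division the textbook refinement is X' = X * (2 - D * X). If X = (1/D)(1 + e), then X' = (1/D)(1 - e^2), so each step squares the relative error. That is what a per-operation count such as -recip=divf:2 (used in recip-fastmath.ll) buys: starting from a 2^-12 estimate, one step gives about 2^-24 and two give about 2^-48. The sketch computes in double so the squaring stays visible beyond float precision; the function names are hypothetical, and the combiner may emit an equivalent but differently arranged sequence.

```cpp
#include <cassert>
#include <cmath>

// Applies Steps Newton-Raphson iterations X' = X * (2 - D * X) to an
// estimate X ~= 1/D; each iteration squares the relative error.
double refineRecip(double D, double X, unsigned Steps) {
  for (unsigned I = 0; I != Steps; ++I)
    X = X * (2.0 - D * X);
  return X;
}

// Relative error of X as an approximation to 1/D.
double recipRelError(double X, double D) {
  double Exact = 1.0 / D;
  return std::fabs(X - Exact) / std::fabs(Exact);
}
```

Usage: with X0 = (1/3)(1 + 2^-12), refineRecip(3.0, X0, 1) lands near a relative error of 2^-24 (about 6e-8), and refineRecip(3.0, X0, 2) near 2^-48.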

	/// If we have at least two divisions that use the same divisor, convert to			/// If we have at least two divisions that use the same divisor, convert to
	/// multiplication by a reciprocal. This may need to be adjusted for a given			/// multiplication by a reciprocal. This may need to be adjusted for a given
	/// CPU if a division's cost is not at least twice the cost of a multiplication.			/// CPU if a division's cost is not at least twice the cost of a multiplication.
	/// This is because we still need one division to calculate the reciprocal and			/// This is because we still need one division to calculate the reciprocal and
	/// then we need two multiplies by that reciprocal as replacements for the			/// then we need two multiplies by that reciprocal as replacements for the
	/// original divisions.			/// original divisions.
	…
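The repeated-divisor rewrite described in the doc comment above can be sketched in plain C++. With N >= 2 quotients sharing a divisor D, one division forms 1/D and the quotients become multiplies. For N = 2 the cost goes from 2 divisions to 1 division plus 2 multiplications, which is why the comment requires a division to cost at least twice a multiplication. The results can differ from true division in the last bit, so this is only valid under fast-math style flags. divideAllByCommonDivisor is a hypothetical name; the real combine operates on SelectionDAG nodes, not vectors.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Divides every element of Nums by D. With two or more quotients sharing D,
// it forms the reciprocal once and multiplies, trading N divisions for one
// division plus N multiplications (results may differ from N / D in the last bit).
std::vector<float> divideAllByCommonDivisor(const std::vector<float> &Nums,
                                            float D) {
  std::vector<float> Out;
  Out.reserve(Nums.size());
  if (Nums.size() < 2) {
    // A single division gains nothing from the rewrite.
    for (float N : Nums)
      Out.push_back(N / D);
    return Out;
  }
  const float Recip = 1.0f / D;  // the one remaining division
  for (float N : Nums)
    Out.push_back(N * Recip);
  return Out;
}
```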

lib/Target/X86/X86Subtarget.h

…
protected:
bool LEAUsesAG;		bool LEAUsesAG;

/// True if the LEA instruction with certain arguments is slow		/// True if the LEA instruction with certain arguments is slow
bool SlowLEA;		bool SlowLEA;

/// True if INC and DEC instructions are slow when writing to flags		/// True if INC and DEC instructions are slow when writing to flags
bool SlowIncDec;		bool SlowIncDec;

/// Use the RSQRT* instructions to optimize square root calculations.
/// For this to be profitable, the cost of FSQRT and FDIV must be
/// substantially higher than normal FP ops like FADD and FMUL.
bool UseSqrtEst;

/// Use the RCP* instructions to optimize FP division calculations.
/// For this to be profitable, the cost of FDIV must be
/// substantially higher than normal FP ops like FADD and FMUL.
bool UseReciprocalEst;

/// Processor has AVX-512 PreFetch Instructions		/// Processor has AVX-512 PreFetch Instructions
bool HasPFI;		bool HasPFI;

/// Processor has AVX-512 Exponential and Reciprocal Instructions		/// Processor has AVX-512 Exponential and Reciprocal Instructions
bool HasERI;		bool HasERI;

/// Processor has AVX-512 Conflict Detection Instructions		/// Processor has AVX-512 Conflict Detection Instructions
bool HasCDI;		bool HasCDI;
…
public:
bool useLeaForSP() const { return UseLeaForSP; }		bool useLeaForSP() const { return UseLeaForSP; }
bool hasSlowDivide32() const { return HasSlowDivide32; }		bool hasSlowDivide32() const { return HasSlowDivide32; }
bool hasSlowDivide64() const { return HasSlowDivide64; }		bool hasSlowDivide64() const { return HasSlowDivide64; }
bool padShortFunctions() const { return PadShortFunctions; }		bool padShortFunctions() const { return PadShortFunctions; }
bool callRegIndirect() const { return CallRegIndirect; }		bool callRegIndirect() const { return CallRegIndirect; }
bool LEAusesAG() const { return LEAUsesAG; }		bool LEAusesAG() const { return LEAUsesAG; }
bool slowLEA() const { return SlowLEA; }		bool slowLEA() const { return SlowLEA; }
bool slowIncDec() const { return SlowIncDec; }		bool slowIncDec() const { return SlowIncDec; }
bool useSqrtEst() const { return UseSqrtEst; }
bool useReciprocalEst() const { return UseReciprocalEst; }
bool hasCDI() const { return HasCDI; }		bool hasCDI() const { return HasCDI; }
bool hasPFI() const { return HasPFI; }		bool hasPFI() const { return HasPFI; }
bool hasERI() const { return HasERI; }		bool hasERI() const { return HasERI; }
bool hasDQI() const { return HasDQI; }		bool hasDQI() const { return HasDQI; }
bool hasBWI() const { return HasBWI; }		bool hasBWI() const { return HasBWI; }
bool hasVLX() const { return HasVLX; }		bool hasVLX() const { return HasVLX; }

bool isAtom() const { return X86ProcFamily == IntelAtom; }		bool isAtom() const { return X86ProcFamily == IntelAtom; }
…

lib/Target/X86/X86Subtarget.cpp

…
void X86Subtarget::initializeEnvironment() {
UseLeaForSP = false;		UseLeaForSP = false;
HasSlowDivide32 = false;		HasSlowDivide32 = false;
HasSlowDivide64 = false;		HasSlowDivide64 = false;
PadShortFunctions = false;		PadShortFunctions = false;
CallRegIndirect = false;		CallRegIndirect = false;
LEAUsesAG = false;		LEAUsesAG = false;
SlowLEA = false;		SlowLEA = false;
SlowIncDec = false;		SlowIncDec = false;
UseSqrtEst = false;
UseReciprocalEst = false;
stackAlignment = 4;		stackAlignment = 4;
// FIXME: this is a known good value for Yonah. How about others?		// FIXME: this is a known good value for Yonah. How about others?
MaxInlineSizeThreshold = 128;		MaxInlineSizeThreshold = 128;
UseSoftFloat = false;		UseSoftFloat = false;
}		}

X86Subtarget &X86Subtarget::initializeSubtargetDependencies(StringRef CPU,		X86Subtarget &X86Subtarget::initializeSubtargetDependencies(StringRef CPU,
StringRef FS) {		StringRef FS) {
…

lib/Target/X86/X86TargetMachine.cpp

…
X86TargetMachine::X86TargetMachine(const Target &T, StringRef TT, StringRef CPU,

// Windows stack unwinder gets confused when execution flow "falls through"		// Windows stack unwinder gets confused when execution flow "falls through"
// after a call to 'noreturn' function.		// after a call to 'noreturn' function.
// To prevent that, we emit a trap for 'unreachable' IR instructions.		// To prevent that, we emit a trap for 'unreachable' IR instructions.
// (which on X86, happens to be the 'ud2' instruction)		// (which on X86, happens to be the 'ud2' instruction)
if (Subtarget.isTargetWin64())		if (Subtarget.isTargetWin64())
this->Options.TrapUnreachable = true;		this->Options.TrapUnreachable = true;

		this->Options.Reciprocals.setDefaults(RO_All, false, 1);
hfinkel: Why false? Do you want a target feature here?
spatel: We had target features to control these, but I think it would be better to behave like gcc unless we have reason to diverge. That said, this does not match gcc behavior yet; that would be my next patch. For x86 at least, we would turn the following on by default when using -ffast-math: sqrt, vec-sqrt, vec-div. I didn't set these defaults in this patch because it would change -ffast-math codegen for all CPUs other than btver2 (which had the recip codegen enabled for all eligible x86 recip types via target features).
hfinkel: I thought that getRsqrtEstimate, etc. were only called when fast-math is on?
spatel: That's correct; everything is gated by -ffast-math or an unsafe algebra equivalent. Recip-est codegen should only be active after that check. With gcc, once you have -ffast-math, you also get -mrecip=sqrt,vec-sqrt,vec-div by default. AFAIK, that is independent of arch or CPU subtarget. The lone exception is 'div' (scalar division). Estimating scalar div on x86 (no FMA until recently) breaks a lot of real-world code, so I want to keep that off by default (and again match gcc default behavior).
hfinkel: But you turn them all off here, right? So that does not match gcc. Please add a comment here explaining what is going on.
spatel: Right - my intent was to separate the structural change from the functional change as much as possible. We don't currently generate any reciprocal estimates for x86 with -ffast-math except when targeting btver2. So all x86 codegen should be identical after this patch except in the case where someone has specified -mcpu=btver2 -ffast-math. I'll send the patch to change this default behavior to match GCC defaults as soon as possible, but I wanted to keep that separate in case anyone does not agree that we should change that default behavior. PPC and ARM recip patches would also be independent but similar (assuming they want to match GCC too).
hfinkel: Okay, but please add a comment explaining the current state of things, with a TODO, here (maybe the follow-up will happen soon, but in case there's trouble, the code will be clear in the meantime).
initAsmInfo();		initAsmInfo();
}		}

X86TargetMachine::~X86TargetMachine() {}		X86TargetMachine::~X86TargetMachine() {}

const X86Subtarget *		const X86Subtarget *
X86TargetMachine::getSubtargetImpl(const Function &F) const {		X86TargetMachine::getSubtargetImpl(const Function &F) const {
Attribute CPUAttr = F.getFnAttribute("target-cpu");		Attribute CPUAttr = F.getFnAttribute("target-cpu");
…

test/CodeGen/X86/recip-fastmath.ll

	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=sse2 | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=sse2 | FileCheck %s
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx,use-recip-est | FileCheck %s --check-prefix=RECIP			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx -recip=divf,vec-divf | FileCheck %s --check-prefix=RECIP
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx,use-recip-est -x86-recip-refinement-steps=2 | FileCheck %s --check-prefix=REFINE			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx -recip=divf:2,vec-divf:2 | FileCheck %s --check-prefix=REFINE

	; If the target's divss/divps instructions are substantially			; If the target's divss/divps instructions are substantially
	; slower than rcpss/rcpps with a Newton-Raphson refinement,			; slower than rcpss/rcpps with a Newton-Raphson refinement,
	; we should generate the estimate sequence.			; we should generate the estimate sequence.

	; See PR21385 ( http://llvm.org/bugs/show_bug.cgi?id=21385 )			; See PR21385 ( http://llvm.org/bugs/show_bug.cgi?id=21385 )
	; for details about the accuracy, speed, and implementation			; for details about the accuracy, speed, and implementation
	; differences of x86 reciprocal estimates.			; differences of x86 reciprocal estimates.
	…

test/CodeGen/X86/sqrt-fastmath.ll

	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=sse2 | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=sse2 | FileCheck %s
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx,use-sqrt-est | FileCheck %s --check-prefix=ESTIMATE			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx -recip=sqrtf,vec-sqrtf | FileCheck %s --check-prefix=ESTIMATE

	declare double @__sqrt_finite(double) #0			declare double @__sqrt_finite(double) #0
	declare float @__sqrtf_finite(float) #0			declare float @__sqrtf_finite(float) #0
	declare x86_fp80 @__sqrtl_finite(x86_fp80) #0			declare x86_fp80 @__sqrtl_finite(x86_fp80) #0
	declare float @llvm.sqrt.f32(float) #0			declare float @llvm.sqrt.f32(float) #0
	declare <4 x float> @llvm.sqrt.v4f32(<4 x float>) #0			declare <4 x float> @llvm.sqrt.v4f32(<4 x float>) #0
	declare <8 x float> @llvm.sqrt.v8f32(<8 x float>) #0			declare <8 x float> @llvm.sqrt.v8f32(<8 x float>) #0

	…