This is an archive of the discontinued LLVM Phabricator instance.

make reciprocal estimate code generation more flexible by adding command-line options
Closed · Public

Authored by spatel on Apr 10 2015, 1:27 PM.


Details

Reviewers

alexr
andreadb
echristo
craig.topper
hfinkel

Commits

rG667a7e2a0f24: make reciprocal estimate code generation more flexible by adding command-line…
rG6f031d848efb: make reciprocal estimate code generation more flexible by adding command-line…
rGba2ba8030218: make reciprocal estimate code generation more flexible by adding command-line…
rL239001: make reciprocal estimate code generation more flexible by adding command-line…
rL238842: make reciprocal estimate code generation more flexible by adding command-line…
rL238051: make reciprocal estimate code generation more flexible by adding command-line…

Summary

We need to separate scalar and vector reciprocal codegen in order to handle an -mrecip clang flag that provides functionality equivalent to gcc's:
https://gcc.gnu.org/onlinedocs/gcc-4.9.2/gcc/i386-and-x86-64-Options.html#index-mrecip_003dopt-1627

This patch adds a Target class for processing all of the recip codegen possibilities. The x86 backend is updated to use the new functionality.

Diff Detail

Repository: rL LLVM

Event Timeline

spatel updated this revision to Diff 23616. Apr 10 2015, 1:27 PM

spatel retitled this revision to [x86] make reciprocal estimate code generation more flexible.

spatel updated this object.

spatel edited the test plan for this revision.

spatel added reviewers: andreadb, alexr, craig.topper.

spatel added a subscriber: Unknown Object (MLST).

spatel mentioned this in D8989: add the -mrecip driver flag and process its options. Apr 11 2015, 3:24 PM

spatel added a reviewer: hfinkel. Apr 11 2015, 3:29 PM

Whether or not we're allowed to use estimates for operations that are normally required to be exact (div and sqrt) is not really a target feature. It's much closer, conceptually, to UnsafeFPMath and friends. Plus, we need these same options for other backends (PowerPC, for example). I'd like to see flags for these added to include/llvm/Target/TargetOptions.h (along with UnsafeFPMath) so that we can handle them in a uniform way.

This revision now requires changes to proceed. Apr 11 2015, 4:57 PM

Hi Hal -

Thanks for the quick feedback. I'm looking at the PPC variations that gcc supports. I'm thinking that we'll need an enum with bitmasks to cover all the potential variations. Just making sure that I'm not going into the weeds...

Also (if maintaining compatibility with gcc is a goal), we'll still need to accept and process different flags per arch.

In D8982#155207, @spatel wrote:

Hi Hal -

Thanks for the quick feedback. I'm looking at the PPC variations that gcc supports. I'm thinking that we'll need an enum with bitmasks to cover all the potential variations. Just making sure that I'm not going into the weeds...

I think that sounds right.

Also (if maintaining compatibility with gcc is a goal), we'll still need to accept and process different flags per arch.

Not exactly in that sense. If we accept a superset of the options that gcc supports because we apply them to targets/variants that gcc does not, that's fine. This seems pretty target independent, and I'd rather handle it in a uniform way if possible.

After thinking it over, I decided that a simple enum of reciprocal operation bools wasn't going to do the job. In addition to wanting to enable specific ops on specific data types, we have users asking to customize the number of N-R refinements calculated...and they want different counts based on the type of op.

The best way to achieve this level of flexibility (and further customization requests that are sure to follow) is with a 'TargetRecip' class. So that's what I'm proposing to add in this updated patch.

From the llc command-line, you can now do something like this:
$ llc -recip=vec-divf,sqrtd.2

That translates to: allow vector float division and scalar double square root codegen. Use the target default setting for N-R refinement steps for the div but override the refinement steps for the sqrt to be '2'.
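
For context on what a refinement step does: a hardware reciprocal or reciprocal-square-root estimate is only accurate to a limited number of bits, and each Newton-Raphson (N-R) step improves it. The following is a minimal sketch in plain C++, not the code LLVM emits; the function names and the crude starting estimates are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>

// Refine an estimate of 1/A. Each Newton-Raphson step computes
// x' = x * (2 - A * x). 'Est' stands in for a hardware estimate.
double refineRecip(double A, double Est, unsigned Steps) {
  assert(A != 0.0 && "reciprocal of zero is not refined here");
  for (unsigned I = 0; I != Steps; ++I)
    Est = Est * (2.0 - A * Est);
  return Est;
}

// Refine an estimate of 1/sqrt(A). Each Newton-Raphson step computes
// x' = x * (1.5 - 0.5 * A * x * x).
double refineRsqrt(double A, double Est, unsigned Steps) {
  assert(A > 0.0 && "reciprocal square root needs a positive input");
  for (unsigned I = 0; I != Steps; ++I)
    Est = Est * (1.5 - 0.5 * A * Est * Est);
  return Est;
}
```

Starting from 0.3 as an estimate of 1/3, the relative error is 0.1; one step gives 0.33 (error 0.01) and a second gives 0.3333 (error 0.0001). The error squares on every step, which is why the number of steps is worth tuning per operation and per type.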

To minimize change in the updated x86 backend, I've defaulted to disabling all recip codegen, but for enabled operations, we use 1 N-R step.

This means only users that were targeting AMD Jaguar will see a functional change from this patch (that chip had defaulted both sqrt and div codegen for scalars and vectors on). We can easily change the x86 target defaults in a follow-on patch. We'll also want to update the PowerPC and R600 backends to use this new command-line functionality.

I wanted to add more testing for the llc parameter parsing, but I'm not finding a way to do it. If there's a way to do that with a generic target, please let me know.

I have a dependent clang driver / front-end patch to pass similar command-line params through to the backend in progress.

Patch updated: minor cleanup to spacing, includes, interface, comments.

Ping.

Ping * 2.

Ping * 3.

spatel added a reviewer: echristo. May 6 2015, 11:01 AM

Ping * 4.

hfinkel added inline comments. May 14 2015, 12:49 PM

include/llvm/Target/TargetOptions.h
240 ↗	(On Diff #23815)	reciprocal estimate -> reciprocal-estimate
include/llvm/Target/TargetRecip.h
34 ↗	(On Diff #23815)	To reduce confusion, I think it would be better to have a naming prefix on these. TailCallKind, for example, uses TCK_. Let's stick RO_ on these (including the INVALID one).
34 ↗	(On Diff #23815)	If you're using INVALID for the "number of", please name it NUM_RECIP_OPS (or similar).
48 ↗	(On Diff #23815)	Let's not use INVALID for "all". You can add an All to the enum (it can even have the same value as INVALID).
lib/Target/TargetRecip.cpp
58 ↗	(On Diff #23815)	These strings are user-provided, we can't assert on invalid inputs.
148 ↗	(On Diff #23815)	Hrmm, there could be duplicates. Just parse them in order (users may provide duplicates).

spatel added inline comments. May 14 2015, 3:58 PM

include/llvm/Target/TargetRecip.h
34 ↗	(On Diff #23815)	With a named enum, I was expecting that users always have to use the prefix "RecipOps::" like I did in the x86 code. But I see that's not done with TailCallKind...

spatel added inline comments. May 14 2015, 9:06 PM

include/llvm/Target/TargetOptions.h
240 ↗	(On Diff #23815)	Fixed.
include/llvm/Target/TargetRecip.h
34 ↗	(On Diff #23815)	Fixed.
34 ↗	(On Diff #23815)	Fixed.
48 ↗	(On Diff #23815)	Fixed - added an RO_All that is equal to RO_NUM_RECIP_OPS.
lib/Target/TargetRecip.cpp
58 ↗	(On Diff #23815)	Fixed - replaced with 'report_fatal_error'.
148 ↗	(On Diff #23815)	Fixed. Still not sure if there's a decent way to test option parsing for llc, so there are very likely malformed input parsing bugs here.

Patch updated based on feedback from Hal - thanks!
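
The naming scheme agreed on in the inline comments above can be sketched as follows. Only RO_NUM_RECIP_OPS and RO_All are taken from the thread; the per-operation enumerator names, the RecipParams fields, and the setRecip helper are hypothetical, and a later revision of the patch replaced the enum with string keys.

```cpp
#include <cassert>

// Every enumerator carries the RO_ prefix, in the style of TailCallKind's TCK_.
enum RecipOp {
  RO_DivD, // the per-operation names are hypothetical
  RO_DivF,
  RO_VecDivD,
  RO_VecDivF,
  RO_SqrtD,
  RO_SqrtF,
  RO_VecSqrtD,
  RO_VecSqrtF,
  RO_NUM_RECIP_OPS,         // count of real operations
  RO_All = RO_NUM_RECIP_OPS // "apply to every operation"
};

struct RecipParams {
  bool Enabled = false;
  unsigned RefSteps = 0;
};

// Update one table entry, or every entry when RO_All is passed.
void setRecip(RecipParams (&Table)[RO_NUM_RECIP_OPS], RecipOp Op, bool Enable,
              unsigned RefSteps) {
  assert(Op <= RO_All && "not a valid reciprocal operation");
  unsigned First = (Op == RO_All) ? 0u : static_cast<unsigned>(Op);
  unsigned Last =
      (Op == RO_All) ? static_cast<unsigned>(RO_NUM_RECIP_OPS) : First + 1;
  for (unsigned I = First; I != Last; ++I) {
    Table[I].Enabled = Enable;
    Table[I].RefSteps = RefSteps;
  }
}
```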

Patch updated:

Use '!' to indicate disabled (match change in D8989)
Use array of strings as keys to a map instead of enum + loosely-coupled array of recip parameters.
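
A minimal sketch of how a single option token could be split into its parts, assuming the '!' disable prefix described above and the ':' refinement-step separator (with exactly one digit allowed) that appears in the TargetRecip.cpp listing of this revision. RecipToken and parseRecipToken are hypothetical names, and the sketch uses std::string rather than StringRef; it reports malformed input through its return value because the strings are user-provided.

```cpp
#include <cassert>
#include <string>

struct RecipToken {
  std::string Name; // operation name with prefix and step count removed
  bool Enabled;     // false when the token started with '!'
  int RefSteps;     // -1 when no ":N" suffix was given
};

// Returns false on malformed input instead of asserting.
bool parseRecipToken(std::string Tok, RecipToken &Out) {
  RecipToken Result{"", true, -1};

  if (!Tok.empty() && Tok[0] == '!') {
    Result.Enabled = false;
    Tok.erase(0, 1);
  }

  std::string::size_type Colon = Tok.find(':');
  if (Colon != std::string::npos) {
    // Accept exactly one decimal digit after ':'.
    if (Tok.size() != Colon + 2 || Tok[Colon + 1] < '0' ||
        Tok[Colon + 1] > '9')
      return false;
    Result.RefSteps = Tok[Colon + 1] - '0';
    Tok.erase(Colon);
  }

  if (Tok.empty())
    return false;
  Result.Name = Tok;
  assert(Result.RefSteps >= -1 && Result.RefSteps <= 9);
  Out = Result;
  return true;
}
```

With this sketch, "!vec-sqrtd" yields a disabled vec-sqrtd entry with no step override, and "sqrt:9" yields an enabled sqrt entry with nine steps. Mapping an unsuffixed name such as "sqrt" onto both the float and double entries is left to the caller.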

hfinkel added inline comments. May 20 2015, 1:45 PM

include/llvm/Target/TargetRecip.h
66 ↗	(On Diff #26013)	function names should start with a lower-case letter.
lib/Target/TargetRecip.cpp
49 ↗	(On Diff #26013)	Function names start with a lower-case letter.
112 ↗	(On Diff #26013)	DisabledPrefix (this is not a macro, don't name it like one)
132 ↗	(On Diff #26013)	Extra space?
136 ↗	(On Diff #26013)	Please also try with the 'd' suffix.

spatel added inline comments. May 20 2015, 3:20 PM

include/llvm/Target/TargetRecip.h
66 ↗	(On Diff #26013)	Fixed.
lib/Target/TargetRecip.cpp
49 ↗	(On Diff #26013)	Fixed.
112 ↗	(On Diff #26013)	Fixed.
132 ↗	(On Diff #26013)	Fixed. Nice catch. :)
136 ↗	(On Diff #26013)	If we matched a 'd' entry but failed 'f', that would be an assertion failure given the logic below. Let me know if you were thinking of something else.

Patch updated based on Hal's feedback:

Fixed function names to start with lowercase
Fixed variable names to not be all caps
Removed extra space
Added assert for 'f' and 'd' suffix matching

hfinkel added inline comments. May 20 2015, 3:38 PM

lib/Target/X86/X86TargetMachine.cpp
108 ↗	(On Diff #26185)	Why false? Do you want a target feature here?

spatel added inline comments. May 20 2015, 3:56 PM

lib/Target/X86/X86TargetMachine.cpp
108 ↗	(On Diff #26185)	We had target features to control these, but I think it would be better to behave like gcc unless we have reason to diverge. That said, this does not match gcc behavior yet; that would be my next patch. For x86 at least, we would turn the following on by default when using -ffast-math: sqrt, vec-sqrt, and vec-div. I didn't set these defaults in this patch because it would change -ffast-math codegen for all CPUs other than btver2 (which had the recip codegen enabled for all eligible x86 recip types via target features).

hfinkel added inline comments. May 20 2015, 3:58 PM

lib/Target/X86/X86TargetMachine.cpp
108 ↗	(On Diff #26185)	I thought that getRsqrtEstimate, etc. were only called when fast-math is on?

spatel added inline comments. May 20 2015, 4:11 PM

lib/Target/X86/X86TargetMachine.cpp
108 ↗	(On Diff #26185)	That's correct; everything is gated by -ffast-math or an unsafe algebra equivalent. Recip-est codegen should only be active after that check. With gcc, once you have -ffast-math, you also get -mrecip=sqrt,vec-sqrt,vec-div by default. AFAIK, that is independent of arch or CPU subtarget. The lone exception is 'div' (scalar division). Estimating scalar div on x86 (no FMA until recently) breaks a lot of real world code, so I want to keep that off by default (and again match gcc default behavior).

hfinkel added inline comments. May 22 2015, 12:16 PM

lib/Target/X86/X86TargetMachine.cpp
108 ↗	(On Diff #26185)	But you turn them all off here, right? So that does not match gcc. Please add a comment here explaining what is going on.

spatel added inline comments. May 22 2015, 1:07 PM

lib/Target/X86/X86TargetMachine.cpp
108 ↗	(On Diff #26185)	Right - my intent was to separate the structural change from the functional change as much as possible. We don't currently generate any reciprocal estimates for x86 with -ffast-math except when targeting btver2. So all x86 codegen should be identical after this patch except in the case where someone has specified -mcpu=btver2 -ffast-math. I'll send the patch to change this default behavior to match GCC defaults as soon as possible, but I wanted to keep that separate in case anyone does not agree that we should change that default behavior. PPC and ARM recip patches would also be independent but similar (assuming they want to match GCC too).

LGTM.

lib/Target/X86/X86TargetMachine.cpp
108 ↗	(On Diff #26185)	Okay, but please add a comment explaining the current state of things, with a TODO, here (maybe the follow-up will happen soon, but in case there's trouble, the code will be clear in the meantime).

This revision is now accepted and ready to land. May 22 2015, 1:12 PM

Patch updated:
Added TODO comment to explain why x86 reciprocal estimate is defaulted to 'off' and the reason to change that default soon.

Closed by commit rL238051: make reciprocal estimate code generation more flexible by adding command-line… (authored by spatel). May 22 2015, 2:13 PM

This revision was automatically updated to reflect the committed changes.

spatel mentioned this in rL238055: add the -mrecip driver flag and process its options. May 22 2015, 2:46 PM

spatel mentioned this in rL238851: add the -mrecip driver flag and process its options (2nd try). Jun 2 2015, 9:59 AM

spatel mentioned this in rL239536: add the -mrecip driver flag and process its options (3rd try). Jun 11 2015, 7:58 AM

spatel mentioned this in D10396: [x86] set default reciprocal (division and square root) codegen to match GCC. Jun 11 2015, 9:41 AM

spatel mentioned this in rL240310: [x86] set default reciprocal (division and square root) codegen to match GCC. Jun 22 2015, 11:34 AM

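Revision Contents

Path	Size
llvm/trunk/include/llvm/CodeGen/CommandFlags.h	8 lines
llvm/trunk/include/llvm/Target/TargetOptions.h	8 lines
llvm/trunk/include/llvm/Target/TargetRecip.h	72 lines
llvm/trunk/lib/Target/CMakeLists.txt	1 line
llvm/trunk/lib/Target/TargetRecip.cpp	225 lines
llvm/trunk/lib/Target/X86/ (file name not shown)	6 lines
llvm/trunk/lib/Target/X86/ (file name not shown)	68 lines
llvm/trunk/lib/Target/X86/ (file name not shown)	12 lines
llvm/trunk/lib/Target/X86/ (file name not shown)	2 lines
llvm/trunk/lib/Target/X86/ (file name not shown)	7 lines
llvm/trunk/test/CodeGen/X86/recip-fastmath.ll	4 lines
llvm/trunk/test/CodeGen/X86/sqrt-fastmath.ll	2 lines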

Diff 26349

llvm/trunk/include/llvm/CodeGen/CommandFlags.h

Show All 18 Lines
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/MC/MCTargetOptionsCommandFlags.h"		#include "llvm/MC/MCTargetOptionsCommandFlags.h"
#include "llvm//MC/SubtargetFeature.h"		#include "llvm//MC/SubtargetFeature.h"
#include "llvm/Support/CodeGen.h"		#include "llvm/Support/CodeGen.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Host.h"		#include "llvm/Support/Host.h"
#include "llvm/Target/TargetMachine.h"		#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetOptions.h"		#include "llvm/Target/TargetOptions.h"
		#include "llvm/Target/TargetRecip.h"
#include <string>		#include <string>
using namespace llvm;		using namespace llvm;

cl::opt<std::string>		cl::opt<std::string>
MArch("march", cl::desc("Architecture to generate code for (see --version)"));		MArch("march", cl::desc("Architecture to generate code for (see --version)"));

cl::opt<std::string>		cl::opt<std::string>
MCPU("mcpu",		MCPU("mcpu",
▲ Show 20 Lines • Show All 112 Lines • ▼ Show 20 Lines	FuseFPOps("fp-contract",
clEnumValN(FPOpFusion::Fast, "fast",		clEnumValN(FPOpFusion::Fast, "fast",
"Fuse FP ops whenever profitable"),		"Fuse FP ops whenever profitable"),
clEnumValN(FPOpFusion::Standard, "on",		clEnumValN(FPOpFusion::Standard, "on",
"Only fuse 'blessed' FP ops."),		"Only fuse 'blessed' FP ops."),
clEnumValN(FPOpFusion::Strict, "off",		clEnumValN(FPOpFusion::Strict, "off",
"Only fuse FP ops when the result won't be effected."),		"Only fuse FP ops when the result won't be effected."),
clEnumValEnd));		clEnumValEnd));

		cl::list<std::string>
		ReciprocalOps("recip",
		cl::CommaSeparated,
		cl::desc("Choose reciprocal operation types and parameters."),
		cl::value_desc("all,none,default,divf,!vec-sqrtd,vec-divd:0,sqrt:9..."));

cl::opt<bool>		cl::opt<bool>
DontPlaceZerosInBSS("nozero-initialized-in-bss",		DontPlaceZerosInBSS("nozero-initialized-in-bss",
cl::desc("Don't place zero-initialized symbols into bss section"),		cl::desc("Don't place zero-initialized symbols into bss section"),
cl::init(false));		cl::init(false));

cl::opt<bool>		cl::opt<bool>
EnableGuaranteedTailCallOpt("tailcallopt",		EnableGuaranteedTailCallOpt("tailcallopt",
cl::desc("Turn fastcc calls into tail calls by (potentially) changing ABI."),		cl::desc("Turn fastcc calls into tail calls by (potentially) changing ABI."),
▲ Show 20 Lines • Show All 63 Lines • ▼ Show 20 Lines

// Common utility function tightly tied to the options listed here. Initializes		// Common utility function tightly tied to the options listed here. Initializes
// a TargetOptions object with CodeGen flags and returns it.		// a TargetOptions object with CodeGen flags and returns it.
static inline TargetOptions InitTargetOptionsFromCodeGenFlags() {		static inline TargetOptions InitTargetOptionsFromCodeGenFlags() {
TargetOptions Options;		TargetOptions Options;
Options.LessPreciseFPMADOption = EnableFPMAD;		Options.LessPreciseFPMADOption = EnableFPMAD;
Options.NoFramePointerElim = DisableFPElim;		Options.NoFramePointerElim = DisableFPElim;
Options.AllowFPOpFusion = FuseFPOps;		Options.AllowFPOpFusion = FuseFPOps;
		Options.Reciprocals = ReciprocalOps;
Options.UnsafeFPMath = EnableUnsafeFPMath;		Options.UnsafeFPMath = EnableUnsafeFPMath;
Options.NoInfsFPMath = EnableNoInfsFPMath;		Options.NoInfsFPMath = EnableNoInfsFPMath;
Options.NoNaNsFPMath = EnableNoNaNsFPMath;		Options.NoNaNsFPMath = EnableNoNaNsFPMath;
Options.HonorSignDependentRoundingFPMathOption =		Options.HonorSignDependentRoundingFPMathOption =
EnableHonorSignDependentRoundingFPMath;		EnableHonorSignDependentRoundingFPMath;
if (FloatABIForCalls != FloatABI::Default)		if (FloatABIForCalls != FloatABI::Default)
Options.FloatABIType = FloatABIForCalls;		Options.FloatABIType = FloatABIForCalls;
Options.NoZerosInBSS = DontPlaceZerosInBSS;		Options.NoZerosInBSS = DontPlaceZerosInBSS;
▲ Show 20 Lines • Show All 60 Lines • Show Last 20 Lines

llvm/trunk/include/llvm/Target/TargetOptions.h

Show All 9 Lines
// This file defines command line option flags that are shared across various		// This file defines command line option flags that are shared across various
// targets.		// targets.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_TARGET_TARGETOPTIONS_H		#ifndef LLVM_TARGET_TARGETOPTIONS_H
#define LLVM_TARGET_TARGETOPTIONS_H		#define LLVM_TARGET_TARGETOPTIONS_H

		#include "llvm/Target/TargetRecip.h"
#include "llvm/MC/MCTargetOptions.h"		#include "llvm/MC/MCTargetOptions.h"
#include <string>		#include <string>

namespace llvm {		namespace llvm {
class MachineFunction;		class MachineFunction;
class StringRef;		class StringRef;

namespace FloatABI {		namespace FloatABI {
Show All 40 Lines	TargetOptions()
NoZerosInBSS(false),		NoZerosInBSS(false),
GuaranteedTailCallOpt(false),		GuaranteedTailCallOpt(false),
DisableTailCalls(false), StackAlignmentOverride(0),		DisableTailCalls(false), StackAlignmentOverride(0),
EnableFastISel(false), PositionIndependentExecutable(false),		EnableFastISel(false), PositionIndependentExecutable(false),
UseInitArray(false), DisableIntegratedAS(false),		UseInitArray(false), DisableIntegratedAS(false),
CompressDebugSections(false), FunctionSections(false),		CompressDebugSections(false), FunctionSections(false),
DataSections(false), UniqueSectionNames(true), TrapUnreachable(false),		DataSections(false), UniqueSectionNames(true), TrapUnreachable(false),
TrapFuncName(), FloatABIType(FloatABI::Default),		TrapFuncName(), FloatABIType(FloatABI::Default),
AllowFPOpFusion(FPOpFusion::Standard), JTType(JumpTable::Single),		AllowFPOpFusion(FPOpFusion::Standard), Reciprocals(),
		JTType(JumpTable::Single),
ThreadModel(ThreadModel::POSIX) {}		ThreadModel(ThreadModel::POSIX) {}

/// PrintMachineCode - This flag is enabled when the -print-machineinstrs		/// PrintMachineCode - This flag is enabled when the -print-machineinstrs
/// option is specified on the command line, and should enable debugging		/// option is specified on the command line, and should enable debugging
/// output from the code generator.		/// output from the code generator.
unsigned PrintMachineCode : 1;		unsigned PrintMachineCode : 1;

/// NoFramePointerElim - This flag is enabled when the -disable-fp-elim is		/// NoFramePointerElim - This flag is enabled when the -disable-fp-elim is
▲ Show 20 Lines • Show All 122 Lines • ▼ Show 20 Lines	public:
/// precision won't effect the result.		/// precision won't effect the result.
///		///
/// Note: This option only controls formation of fused ops by the		/// Note: This option only controls formation of fused ops by the
/// optimizers. Fused operations that are explicitly specified (e.g. FMA		/// optimizers. Fused operations that are explicitly specified (e.g. FMA
/// via the llvm.fma.* intrinsic) will always be honored, regardless of		/// via the llvm.fma.* intrinsic) will always be honored, regardless of
/// the value of this option.		/// the value of this option.
FPOpFusion::FPOpFusionMode AllowFPOpFusion;		FPOpFusion::FPOpFusionMode AllowFPOpFusion;

		/// This class encapsulates options for reciprocal-estimate code generation.
		TargetRecip Reciprocals;

/// JTType - This flag specifies the type of jump-instruction table to		/// JTType - This flag specifies the type of jump-instruction table to
/// create for functions that have the jumptable attribute.		/// create for functions that have the jumptable attribute.
JumpTable::JumpTableType JTType;		JumpTable::JumpTableType JTType;

/// ThreadModel - This flag specifies the type of threading model to assume		/// ThreadModel - This flag specifies the type of threading model to assume
/// for things like atomics		/// for things like atomics
ThreadModel::Model ThreadModel;		ThreadModel::Model ThreadModel;

Show All 18 Lines	return
ARE_EQUAL(StackAlignmentOverride) &&		ARE_EQUAL(StackAlignmentOverride) &&
ARE_EQUAL(EnableFastISel) &&		ARE_EQUAL(EnableFastISel) &&
ARE_EQUAL(PositionIndependentExecutable) &&		ARE_EQUAL(PositionIndependentExecutable) &&
ARE_EQUAL(UseInitArray) &&		ARE_EQUAL(UseInitArray) &&
ARE_EQUAL(TrapUnreachable) &&		ARE_EQUAL(TrapUnreachable) &&
ARE_EQUAL(TrapFuncName) &&		ARE_EQUAL(TrapFuncName) &&
ARE_EQUAL(FloatABIType) &&		ARE_EQUAL(FloatABIType) &&
ARE_EQUAL(AllowFPOpFusion) &&		ARE_EQUAL(AllowFPOpFusion) &&
		ARE_EQUAL(Reciprocals) &&
ARE_EQUAL(JTType) &&		ARE_EQUAL(JTType) &&
ARE_EQUAL(ThreadModel) &&		ARE_EQUAL(ThreadModel) &&
ARE_EQUAL(MCOptions);		ARE_EQUAL(MCOptions);
#undef ARE_EQUAL		#undef ARE_EQUAL
}		}

inline bool operator!=(const TargetOptions &LHS,		inline bool operator!=(const TargetOptions &LHS,
const TargetOptions &RHS) {		const TargetOptions &RHS) {
return !(LHS == RHS);		return !(LHS == RHS);
}		}

} // End llvm namespace		} // End llvm namespace

#endif		#endif

llvm/trunk/include/llvm/Target/TargetRecip.h

				//===--------------------- llvm/Target/TargetRecip.h ------------- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This class is used to customize machine-specific reciprocal estimate code
				// generation in a target-independent way.
				// If a target does not support operations in this specification, then code
				// generation will default to using supported operations.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TARGET_TARGETRECIP_H
				#define LLVM_TARGET_TARGETRECIP_H

				#include <vector>
				#include <string>
				#include <map>

				namespace llvm {

				struct TargetRecip {
				public:
				TargetRecip();

				/// Initialize all or part of the operations from command-line options or
				/// a front end.
				TargetRecip(const std::vector<std::string> &Args);

				/// Set whether a particular reciprocal operation is enabled and how many
				/// refinement steps are needed when using it. Use "all" to set enablement
				/// and refinement steps for all operations.
				void setDefaults(const StringRef &Key, bool Enable, unsigned RefSteps);

				/// Return true if the reciprocal operation has been enabled by default or
				/// from the command-line. Return false if the operation has been disabled
				/// by default or from the command-line.
				bool isEnabled(const StringRef &Key) const;

				/// Return the number of iterations necessary to refine the
				/// result of a machine instruction for the given reciprocal operation.
				unsigned getRefinementSteps(const StringRef &Key) const;

				bool operator==(const TargetRecip &Other) const;

				private:
				enum {
				Uninitialized = -1
				};

				struct RecipParams {
				int8_t Enabled;
				int8_t RefinementSteps;

				RecipParams() : Enabled(Uninitialized), RefinementSteps(Uninitialized) {}
				};

				std::map<StringRef, RecipParams> RecipMap;
				typedef std::map<StringRef, RecipParams>::iterator RecipIter;
				typedef std::map<StringRef, RecipParams>::const_iterator ConstRecipIter;

				bool parseGlobalParams(const std::string &Arg);
				void parseIndividualParams(const std::vector<std::string> &Args);
				};

				} // End llvm namespace

				#endif

llvm/trunk/lib/Target/CMakeLists.txt

	list(APPEND LLVM_COMMON_DEPENDS intrinsics_gen)			list(APPEND LLVM_COMMON_DEPENDS intrinsics_gen)

	add_llvm_library(LLVMTarget			add_llvm_library(LLVMTarget
	Target.cpp			Target.cpp
	TargetIntrinsicInfo.cpp			TargetIntrinsicInfo.cpp
	TargetLoweringObjectFile.cpp			TargetLoweringObjectFile.cpp
	TargetMachine.cpp			TargetMachine.cpp
	TargetMachineC.cpp			TargetMachineC.cpp
				TargetRecip.cpp
	TargetSubtargetInfo.cpp			TargetSubtargetInfo.cpp

	ADDITIONAL_HEADER_DIRS			ADDITIONAL_HEADER_DIRS
	${LLVM_MAIN_INCLUDE_DIR}/llvm/Target			${LLVM_MAIN_INCLUDE_DIR}/llvm/Target
	)			)

	foreach(t ${LLVM_TARGETS_TO_BUILD})			foreach(t ${LLVM_TARGETS_TO_BUILD})
	message(STATUS "Targeting ${t}")			message(STATUS "Targeting ${t}")
	add_subdirectory(${t})			add_subdirectory(${t})
	endforeach()			endforeach()

llvm/trunk/lib/Target/TargetRecip.cpp

				//===-------------------------- TargetRecip.cpp ---------------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This class is used to customize machine-specific reciprocal estimate code
				// generation in a target-independent way.
				// If a target does not support operations in this specification, then code
				// generation will default to using supported operations.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/ADT/StringRef.h"
				#include "llvm/ADT/STLExtras.h"
				#include "llvm/Support/ErrorHandling.h"
				#include "llvm/Target/TargetRecip.h"
				#include <map>

				using namespace llvm;

				// These are the names of the individual reciprocal operations. These are
				// the key strings for queries and command-line inputs.
				// In addition, the command-line interface recognizes the global parameters
				// "all", "none", and "default".
				static const char *RecipOps[] = {
				"divd",
				"divf",
				"vec-divd",
				"vec-divf",
				"sqrtd",
				"sqrtf",
				"vec-sqrtd",
				"vec-sqrtf",
				};

				// The uninitialized state is needed for the enabled settings and refinement
				// steps because custom settings may arrive via the command-line before target
				// defaults are set.
				TargetRecip::TargetRecip() {
				unsigned NumStrings = llvm::array_lengthof(RecipOps);
				for (unsigned i = 0; i < NumStrings; ++i)
				RecipMap.insert(std::make_pair(RecipOps[i], RecipParams()));
				}

				static bool parseRefinementStep(const StringRef &In, size_t &Position,
				uint8_t &Value) {
				const char RefStepToken = ':';
				Position = In.find(RefStepToken);
				if (Position == StringRef::npos)
				return false;

				StringRef RefStepString = In.substr(Position + 1);
				// Allow exactly one numeric character for the additional refinement
				// step parameter.
				if (RefStepString.size() == 1) {
				char RefStepChar = RefStepString[0];
				if (RefStepChar >= '0' && RefStepChar <= '9') {
				Value = RefStepChar - '0';
				return true;
				}
				}
				report_fatal_error("Invalid refinement step for -recip.");
				}

				bool TargetRecip::parseGlobalParams(const std::string &Arg) {
				StringRef ArgSub = Arg;

				// Look for an optional setting of the number of refinement steps needed
				// for this type of reciprocal operation.
				size_t RefPos;
				uint8_t RefSteps;
				StringRef RefStepString;
				if (parseRefinementStep(ArgSub, RefPos, RefSteps)) {
				// Split the string for further processing.
				RefStepString = ArgSub.substr(RefPos + 1);
				ArgSub = ArgSub.substr(0, RefPos);
				}
				bool Enable;
				bool UseDefaults;
				if (ArgSub == "all") {
				UseDefaults = false;
				Enable = true;
				} else if (ArgSub == "none") {
				UseDefaults = false;
				Enable = false;
				} else if (ArgSub == "default") {
				UseDefaults = true;
				} else {
				// Any other string is invalid or an individual setting.
				return false;
				}

				// All enable values will be initialized to target defaults if 'default' was
				// specified.
				if (!UseDefaults)
				for (auto &KV : RecipMap)
				KV.second.Enabled = Enable;

				// Custom refinement count was specified with all, none, or default.
				if (!RefStepString.empty())
				for (auto &KV : RecipMap)
				KV.second.RefinementSteps = RefSteps;

				return true;
				}

				void TargetRecip::parseIndividualParams(const std::vector<std::string> &Args) {
				static const char DisabledPrefix = '!';
				unsigned NumArgs = Args.size();

				for (unsigned i = 0; i != NumArgs; ++i) {
				StringRef Val = Args[i];

				bool IsDisabled = Val[0] == DisabledPrefix;
				// Ignore the disablement token for string matching.
				if (IsDisabled)
				Val = Val.substr(1);

				size_t RefPos;
				uint8_t RefSteps;
				StringRef RefStepString;
				if (parseRefinementStep(Val, RefPos, RefSteps)) {
				// Split the string for further processing.
				RefStepString = Val.substr(RefPos + 1);
				Val = Val.substr(0, RefPos);
				}

				RecipIter Iter = RecipMap.find(Val);
				if (Iter == RecipMap.end()) {
				// Try again specifying float suffix.
				Iter = RecipMap.find(Val.str() + 'f');
				if (Iter == RecipMap.end()) {
				Iter = RecipMap.find(Val.str() + 'd');
				assert(Iter == RecipMap.end() && "Float entry missing from map");
				report_fatal_error("Invalid option for -recip.");
				}

				// The option was specified without a float or double suffix.
				if (RecipMap[Val.str() + 'd'].Enabled != Uninitialized) {
				// Make sure that the double entry was not already specified.
				// The float entry will be checked below.
				report_fatal_error("Duplicate option for -recip.");
				}
				}

				if (Iter->second.Enabled != Uninitialized)
				report_fatal_error("Duplicate option for -recip.");

				// Mark the matched option as found. Do not allow duplicate specifiers.
				Iter->second.Enabled = !IsDisabled;
				if (!RefStepString.empty())
				Iter->second.RefinementSteps = RefSteps;

				// If the precision was not specified, the double entry is also initialized.
				if (Val.back() != 'f' && Val.back() != 'd') {
				RecipMap[Val.str() + 'd'].Enabled = !IsDisabled;
				if (!RefStepString.empty())
				RecipMap[Val.str() + 'd'].RefinementSteps = RefSteps;
				}
				}
				}

				TargetRecip::TargetRecip(const std::vector<std::string> &Args) :
				TargetRecip() {
				unsigned NumArgs = Args.size();

				// Check if "all", "default", or "none" was specified.
				if (NumArgs == 1 && parseGlobalParams(Args[0]))
				return;

				parseIndividualParams(Args);
				}

				bool TargetRecip::isEnabled(const StringRef &Key) const {
				ConstRecipIter Iter = RecipMap.find(Key);
				assert(Iter != RecipMap.end() && "Unknown name for reciprocal map");
				assert(Iter->second.Enabled != Uninitialized &&
				"Enablement setting was not initialized");
				return Iter->second.Enabled;
				}

				unsigned TargetRecip::getRefinementSteps(const StringRef &Key) const {
				ConstRecipIter Iter = RecipMap.find(Key);
				assert(Iter != RecipMap.end() && "Unknown name for reciprocal map");
				assert(Iter->second.RefinementSteps != Uninitialized &&
				"Refinement step setting was not initialized");
				return Iter->second.RefinementSteps;
				}

				/// Custom settings (previously initialized values) override target defaults.
				void TargetRecip::setDefaults(const StringRef &Key, bool Enable,
				unsigned RefSteps) {
				if (Key == "all") {
				for (auto &KV : RecipMap) {
				RecipParams &RP = KV.second;
				if (RP.Enabled == Uninitialized)
				RP.Enabled = Enable;
				if (RP.RefinementSteps == Uninitialized)
				RP.RefinementSteps = RefSteps;
				}
				} else {
				RecipParams &RP = RecipMap[Key];
				if (RP.Enabled == Uninitialized)
				RP.Enabled = Enable;
				if (RP.RefinementSteps == Uninitialized)
				RP.RefinementSteps = RefSteps;
				}
				}

				bool TargetRecip::operator==(const TargetRecip &Other) const {
				for (const auto &KV : RecipMap) {
				const StringRef &Op = KV.first;
				const RecipParams &RP = KV.second;
				const RecipParams &OtherRP = Other.RecipMap.find(Op)->second;
				if (RP.RefinementSteps != OtherRP.RefinementSteps)
				return false;
				if (RP.Enabled != OtherRP.Enabled)
				return false;
				}
				return true;
				}
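
The precedence rule in setDefaults() (values parsed from the command line win, target defaults only fill what is still Uninitialized) can be sketched on its own. This is a simplified illustration and not the patch's code: `Params`, `ParamMap`, `applyDefault`, and `demoOverrides` are hypothetical stand-ins built on `std::map` rather than the patch's RecipMap.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for the patch's RecipParams/RecipMap (assumption:
// "unset" is modeled as -1 here; the patch uses its own Uninitialized value).
constexpr int Uninitialized = -1;

struct Params {
  int Enabled = Uninitialized;
  int RefinementSteps = Uninitialized;
};

using ParamMap = std::map<std::string, Params>;

// Fill in only the fields that nobody has set yet.
void applyDefault(Params &P, bool Enable, unsigned Steps) {
  if (P.Enabled == Uninitialized)
    P.Enabled = Enable ? 1 : 0;
  if (P.RefinementSteps == Uninitialized)
    P.RefinementSteps = static_cast<int>(Steps);
}

// "all" applies the default to every known operation; any other key
// applies it to that single entry.
void setDefaults(ParamMap &M, const std::string &Key, bool Enable,
                 unsigned Steps) {
  if (Key == "all") {
    for (auto &KV : M)
      applyDefault(KV.second, Enable, Steps);
  } else {
    applyDefault(M[Key], Enable, Steps);
  }
}

// A user setting for "divf" followed by a target-wide default of "off, 1 step".
ParamMap demoOverrides() {
  ParamMap M;
  M["divf"];
  M["vec-divf"];
  M["sqrtf"];
  M["divf"].Enabled = 1;         // as if requested on the command line
  M["divf"].RefinementSteps = 2; // as if requested on the command line
  setDefaults(M, "all", false, 1);
  return M;
}
```

With this ordering, a target can call setDefaults("all", ...) unconditionally in its TargetMachine constructor without clobbering anything the user asked for.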

llvm/trunk/lib/Target/X86/X86.td

def FeatureCallRegIndirect : SubtargetFeature<"call-reg-indirect",
"CallRegIndirect", "true",		"CallRegIndirect", "true",
"Call register indirect">;		"Call register indirect">;
def FeatureLEAUsesAG : SubtargetFeature<"lea-uses-ag", "LEAUsesAG", "true",		def FeatureLEAUsesAG : SubtargetFeature<"lea-uses-ag", "LEAUsesAG", "true",
"LEA instruction needs inputs at AG stage">;		"LEA instruction needs inputs at AG stage">;
def FeatureSlowLEA : SubtargetFeature<"slow-lea", "SlowLEA", "true",		def FeatureSlowLEA : SubtargetFeature<"slow-lea", "SlowLEA", "true",
"LEA instruction with certain arguments is slow">;		"LEA instruction with certain arguments is slow">;
def FeatureSlowIncDec : SubtargetFeature<"slow-incdec", "SlowIncDec", "true",		def FeatureSlowIncDec : SubtargetFeature<"slow-incdec", "SlowIncDec", "true",
"INC and DEC instructions are slower than ADD and SUB">;		"INC and DEC instructions are slower than ADD and SUB">;
def FeatureUseSqrtEst : SubtargetFeature<"use-sqrt-est", "UseSqrtEst", "true",
"Use RSQRT* to optimize square root calculations">;
def FeatureUseRecipEst : SubtargetFeature<"use-recip-est", "UseReciprocalEst",
"true", "Use RCP* to optimize division calculations">;
def FeatureSoftFloat		def FeatureSoftFloat
: SubtargetFeature<"soft-float", "UseSoftFloat", "true",		: SubtargetFeature<"soft-float", "UseSoftFloat", "true",
"Use software floating point features.">;		"Use software floating point features.">;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// X86 processors supported.		// X86 processors supported.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

def : Proc<"btver1", [FeatureSSSE3, FeatureSSE4A, FeatureCMPXCHG16B,
FeatureSlowSHLD]>;		FeatureSlowSHLD]>;

// Jaguar		// Jaguar
def : ProcessorModel<"btver2", BtVer2Model,		def : ProcessorModel<"btver2", BtVer2Model,
[FeatureAVX, FeatureSSE4A, FeatureCMPXCHG16B,		[FeatureAVX, FeatureSSE4A, FeatureCMPXCHG16B,
FeaturePRFCHW, FeatureAES, FeaturePCLMUL,		FeaturePRFCHW, FeatureAES, FeaturePCLMUL,
FeatureBMI, FeatureF16C, FeatureMOVBE,		FeatureBMI, FeatureF16C, FeatureMOVBE,
FeatureLZCNT, FeaturePOPCNT, FeatureFastUAMem,		FeatureLZCNT, FeaturePOPCNT, FeatureFastUAMem,
FeatureSlowSHLD, FeatureUseSqrtEst, FeatureUseRecipEst]>;		FeatureSlowSHLD]>;

// TODO: We should probably add 'FeatureFastUAMem' to all of the AMD chips.		// TODO: We should probably add 'FeatureFastUAMem' to all of the AMD chips.

// Bulldozer		// Bulldozer
def : Proc<"bdver1", [FeatureXOP, FeatureFMA4, FeatureCMPXCHG16B,		def : Proc<"bdver1", [FeatureXOP, FeatureFMA4, FeatureCMPXCHG16B,
FeatureAES, FeaturePRFCHW, FeaturePCLMUL,		FeatureAES, FeaturePRFCHW, FeaturePCLMUL,
FeatureAVX, FeatureSSE4A, FeatureLZCNT,		FeatureAVX, FeatureSSE4A, FeatureLZCNT,
FeaturePOPCNT, FeatureSlowSHLD]>;		FeaturePOPCNT, FeatureSlowSHLD]>;

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

	STATISTIC(NumTailCalls, "Number of tail calls");			STATISTIC(NumTailCalls, "Number of tail calls");

	static cl::opt<bool> ExperimentalVectorWideningLegalization(			static cl::opt<bool> ExperimentalVectorWideningLegalization(
	"x86-experimental-vector-widening-legalization", cl::init(false),			"x86-experimental-vector-widening-legalization", cl::init(false),
	cl::desc("Enable an experimental vector type legalization through widening "			cl::desc("Enable an experimental vector type legalization through widening "
	"rather than promotion."),			"rather than promotion."),
	cl::Hidden);			cl::Hidden);

	static cl::opt<int> ReciprocalEstimateRefinementSteps(
	"x86-recip-refinement-steps", cl::init(1),
	cl::desc("Specify the number of Newton-Raphson iterations applied to the "
	"result of the hardware reciprocal estimate instruction."),
	cl::NotHidden);

	// Forward declarations.			// Forward declarations.
	static SDValue getMOVL(SelectionDAG &DAG, SDLoc dl, EVT VT, SDValue V1,			static SDValue getMOVL(SelectionDAG &DAG, SDLoc dl, EVT VT, SDValue V1,
	SDValue V2);			SDValue V2);

	X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,			X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
	const X86Subtarget &STI)			const X86Subtarget &STI)
	: TargetLowering(TM), Subtarget(&STI) {			: TargetLowering(TM), Subtarget(&STI) {
	X86ScalarSSEf64 = Subtarget->hasSSE2();			X86ScalarSSEf64 = Subtarget->hasSSE2();
	}			}

	/// The minimum architected relative accuracy is 2^-12. We need one			/// The minimum architected relative accuracy is 2^-12. We need one
	/// Newton-Raphson step to have a good float result (24 bits of precision).			/// Newton-Raphson step to have a good float result (24 bits of precision).
	SDValue X86TargetLowering::getRsqrtEstimate(SDValue Op,			SDValue X86TargetLowering::getRsqrtEstimate(SDValue Op,
	DAGCombinerInfo &DCI,			DAGCombinerInfo &DCI,
	unsigned &RefinementSteps,			unsigned &RefinementSteps,
	bool &UseOneConstNR) const {			bool &UseOneConstNR) const {
	// FIXME: We should use instruction latency models to calculate the cost of
	// each potential sequence, but this is very hard to do reliably because
	// at least Intel's Core* chips have variable timing based on the number of
	// significant digits in the divisor and/or sqrt operand.
	if (!Subtarget->useSqrtEst())
	return SDValue();

	EVT VT = Op.getValueType();			EVT VT = Op.getValueType();
				const char *RecipOp;

	// SSE1 has rsqrtss and rsqrtps.			// SSE1 has rsqrtss and rsqrtps. AVX adds a 256-bit variant for rsqrtps.
	// TODO: Add support for AVX512 (v16f32).			// TODO: Add support for AVX512 (v16f32).
	// It is likely not profitable to do this for f64 because a double-precision			// It is likely not profitable to do this for f64 because a double-precision
	// rsqrt estimate with refinement on x86 prior to FMA requires at least 16			// rsqrt estimate with refinement on x86 prior to FMA requires at least 16
	// instructions: convert to single, rsqrtss, convert back to double, refine			// instructions: convert to single, rsqrtss, convert back to double, refine
	// (3 steps = at least 13 insts). If an 'rsqrtsd' variant was added to the ISA			// (3 steps = at least 13 insts). If an 'rsqrtsd' variant was added to the ISA
	// along with FMA, this could be a throughput win.			// along with FMA, this could be a throughput win.
	if ((Subtarget->hasSSE1() && (VT == MVT::f32 || VT == MVT::v4f32)) ||			if (VT == MVT::f32 && Subtarget->hasSSE1())
	(Subtarget->hasAVX() && VT == MVT::v8f32)) {			RecipOp = "sqrtf";
	RefinementSteps = 1;			else if ((VT == MVT::v4f32 && Subtarget->hasSSE1()) ||
				(VT == MVT::v8f32 && Subtarget->hasAVX()))
				RecipOp = "vec-sqrtf";
				else
				return SDValue();

				TargetRecip Recips = DCI.DAG.getTarget().Options.Reciprocals;
				if (!Recips.isEnabled(RecipOp))
				return SDValue();

				RefinementSteps = Recips.getRefinementSteps(RecipOp);
	UseOneConstNR = false;			UseOneConstNR = false;
	return DCI.DAG.getNode(X86ISD::FRSQRT, SDLoc(Op), VT, Op);			return DCI.DAG.getNode(X86ISD::FRSQRT, SDLoc(Op), VT, Op);
	}			}
	return SDValue();
	}

	/// The minimum architected relative accuracy is 2^-12. We need one			/// The minimum architected relative accuracy is 2^-12. We need one
	/// Newton-Raphson step to have a good float result (24 bits of precision).			/// Newton-Raphson step to have a good float result (24 bits of precision).
	SDValue X86TargetLowering::getRecipEstimate(SDValue Op,			SDValue X86TargetLowering::getRecipEstimate(SDValue Op,
	DAGCombinerInfo &DCI,			DAGCombinerInfo &DCI,
	unsigned &RefinementSteps) const {			unsigned &RefinementSteps) const {
	// FIXME: We should use instruction latency models to calculate the cost of
	// each potential sequence, but this is very hard to do reliably because
	// at least Intel's Core* chips have variable timing based on the number of
	// significant digits in the divisor.
	if (!Subtarget->useReciprocalEst())
	return SDValue();

	EVT VT = Op.getValueType();			EVT VT = Op.getValueType();
				const char *RecipOp;

	// SSE1 has rcpss and rcpps. AVX adds a 256-bit variant for rcpps.			// SSE1 has rcpss and rcpps. AVX adds a 256-bit variant for rcpps.
	// TODO: Add support for AVX512 (v16f32).			// TODO: Add support for AVX512 (v16f32).
	// It is likely not profitable to do this for f64 because a double-precision			// It is likely not profitable to do this for f64 because a double-precision
	// reciprocal estimate with refinement on x86 prior to FMA requires			// reciprocal estimate with refinement on x86 prior to FMA requires
	// 15 instructions: convert to single, rcpss, convert back to double, refine			// 15 instructions: convert to single, rcpss, convert back to double, refine
	// (3 steps = 12 insts). If an 'rcpsd' variant was added to the ISA			// (3 steps = 12 insts). If an 'rcpsd' variant was added to the ISA
	// along with FMA, this could be a throughput win.			// along with FMA, this could be a throughput win.
	if ((Subtarget->hasSSE1() && (VT == MVT::f32 || VT == MVT::v4f32)) ||			if (VT == MVT::f32 && Subtarget->hasSSE1())
	(Subtarget->hasAVX() && VT == MVT::v8f32)) {			RecipOp = "divf";
	RefinementSteps = ReciprocalEstimateRefinementSteps;			else if ((VT == MVT::v4f32 && Subtarget->hasSSE1()) ||
	return DCI.DAG.getNode(X86ISD::FRCP, SDLoc(Op), VT, Op);			(VT == MVT::v8f32 && Subtarget->hasAVX()))
	}			RecipOp = "vec-divf";
				else
	return SDValue();			return SDValue();

				TargetRecip Recips = DCI.DAG.getTarget().Options.Reciprocals;
				if (!Recips.isEnabled(RecipOp))
				return SDValue();

				RefinementSteps = Recips.getRefinementSteps(RecipOp);
				return DCI.DAG.getNode(X86ISD::FRCP, SDLoc(Op), VT, Op);
	}			}

	/// If we have at least two divisions that use the same divisor, convert to			/// If we have at least two divisions that use the same divisor, convert to
	/// multiplication by a reciprocal. This may need to be adjusted for a given			/// multiplication by a reciprocal. This may need to be adjusted for a given
	/// CPU if a division's cost is not at least twice the cost of a multiplication.			/// CPU if a division's cost is not at least twice the cost of a multiplication.
	/// This is because we still need one division to calculate the reciprocal and			/// This is because we still need one division to calculate the reciprocal and
	/// then we need two multiplies by that reciprocal as replacements for the			/// then we need two multiplies by that reciprocal as replacements for the
	/// original divisions.			/// original divisions.

llvm/trunk/lib/Target/X86/X86Subtarget.h

protected:
bool LEAUsesAG;		bool LEAUsesAG;

/// True if the LEA instruction with certain arguments is slow		/// True if the LEA instruction with certain arguments is slow
bool SlowLEA;		bool SlowLEA;

/// True if INC and DEC instructions are slow when writing to flags		/// True if INC and DEC instructions are slow when writing to flags
bool SlowIncDec;		bool SlowIncDec;

/// Use the RSQRT* instructions to optimize square root calculations.
/// For this to be profitable, the cost of FSQRT and FDIV must be
/// substantially higher than normal FP ops like FADD and FMUL.
bool UseSqrtEst;

/// Use the RCP* instructions to optimize FP division calculations.
/// For this to be profitable, the cost of FDIV must be
/// substantially higher than normal FP ops like FADD and FMUL.
bool UseReciprocalEst;

/// Processor has AVX-512 PreFetch Instructions		/// Processor has AVX-512 PreFetch Instructions
bool HasPFI;		bool HasPFI;

/// Processor has AVX-512 Exponential and Reciprocal Instructions		/// Processor has AVX-512 Exponential and Reciprocal Instructions
bool HasERI;		bool HasERI;

/// Processor has AVX-512 Conflict Detection Instructions		/// Processor has AVX-512 Conflict Detection Instructions
bool HasCDI;		bool HasCDI;
public:
bool useLeaForSP() const { return UseLeaForSP; }		bool useLeaForSP() const { return UseLeaForSP; }
bool hasSlowDivide32() const { return HasSlowDivide32; }		bool hasSlowDivide32() const { return HasSlowDivide32; }
bool hasSlowDivide64() const { return HasSlowDivide64; }		bool hasSlowDivide64() const { return HasSlowDivide64; }
bool padShortFunctions() const { return PadShortFunctions; }		bool padShortFunctions() const { return PadShortFunctions; }
bool callRegIndirect() const { return CallRegIndirect; }		bool callRegIndirect() const { return CallRegIndirect; }
bool LEAusesAG() const { return LEAUsesAG; }		bool LEAusesAG() const { return LEAUsesAG; }
bool slowLEA() const { return SlowLEA; }		bool slowLEA() const { return SlowLEA; }
bool slowIncDec() const { return SlowIncDec; }		bool slowIncDec() const { return SlowIncDec; }
bool useSqrtEst() const { return UseSqrtEst; }
bool useReciprocalEst() const { return UseReciprocalEst; }
bool hasCDI() const { return HasCDI; }		bool hasCDI() const { return HasCDI; }
bool hasPFI() const { return HasPFI; }		bool hasPFI() const { return HasPFI; }
bool hasERI() const { return HasERI; }		bool hasERI() const { return HasERI; }
bool hasDQI() const { return HasDQI; }		bool hasDQI() const { return HasDQI; }
bool hasBWI() const { return HasBWI; }		bool hasBWI() const { return HasBWI; }
bool hasVLX() const { return HasVLX; }		bool hasVLX() const { return HasVLX; }

bool isAtom() const { return X86ProcFamily == IntelAtom; }		bool isAtom() const { return X86ProcFamily == IntelAtom; }

llvm/trunk/lib/Target/X86/X86Subtarget.cpp

void X86Subtarget::initializeEnvironment() {
UseLeaForSP = false;		UseLeaForSP = false;
HasSlowDivide32 = false;		HasSlowDivide32 = false;
HasSlowDivide64 = false;		HasSlowDivide64 = false;
PadShortFunctions = false;		PadShortFunctions = false;
CallRegIndirect = false;		CallRegIndirect = false;
LEAUsesAG = false;		LEAUsesAG = false;
SlowLEA = false;		SlowLEA = false;
SlowIncDec = false;		SlowIncDec = false;
UseSqrtEst = false;
UseReciprocalEst = false;
stackAlignment = 4;		stackAlignment = 4;
// FIXME: this is a known good value for Yonah. How about others?		// FIXME: this is a known good value for Yonah. How about others?
MaxInlineSizeThreshold = 128;		MaxInlineSizeThreshold = 128;
UseSoftFloat = false;		UseSoftFloat = false;
}		}

X86Subtarget &X86Subtarget::initializeSubtargetDependencies(StringRef CPU,		X86Subtarget &X86Subtarget::initializeSubtargetDependencies(StringRef CPU,
StringRef FS) {		StringRef FS) {

llvm/trunk/lib/Target/X86/X86TargetMachine.cpp

: LLVMTargetMachine(T, computeDataLayout(Triple(TT)), TT, CPU, FS, Options,
Subtarget(TT, CPU, FS, *this, Options.StackAlignmentOverride) {		Subtarget(TT, CPU, FS, *this, Options.StackAlignmentOverride) {
// Windows stack unwinder gets confused when execution flow "falls through"		// Windows stack unwinder gets confused when execution flow "falls through"
// after a call to 'noreturn' function.		// after a call to 'noreturn' function.
// To prevent that, we emit a trap for 'unreachable' IR instructions.		// To prevent that, we emit a trap for 'unreachable' IR instructions.
// (which on X86, happens to be the 'ud2' instruction)		// (which on X86, happens to be the 'ud2' instruction)
if (Subtarget.isTargetWin64())		if (Subtarget.isTargetWin64())
this->Options.TrapUnreachable = true;		this->Options.TrapUnreachable = true;

		// TODO: By default, all reciprocal estimate operations are off because
		// that matches the behavior before TargetRecip was added (except for btver2
		// which used subtarget features to enable this type of codegen).
		// We should change this to match GCC behavior where everything but
		// scalar division estimates are turned on by default with -ffast-math.
		this->Options.Reciprocals.setDefaults("all", false, 1);

initAsmInfo();		initAsmInfo();
}		}

X86TargetMachine::~X86TargetMachine() {}		X86TargetMachine::~X86TargetMachine() {}

const X86Subtarget *		const X86Subtarget *
X86TargetMachine::getSubtargetImpl(const Function &F) const {		X86TargetMachine::getSubtargetImpl(const Function &F) const {
Attribute CPUAttr = F.getFnAttribute("target-cpu");		Attribute CPUAttr = F.getFnAttribute("target-cpu");

llvm/trunk/test/CodeGen/X86/recip-fastmath.ll

	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=sse2 | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=sse2 | FileCheck %s
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx,use-recip-est | FileCheck %s --check-prefix=RECIP			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx -recip=divf,vec-divf | FileCheck %s --check-prefix=RECIP
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx,use-recip-est -x86-recip-refinement-steps=2 | FileCheck %s --check-prefix=REFINE			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx -recip=divf:2,vec-divf:2 | FileCheck %s --check-prefix=REFINE

	; If the target's divss/divps instructions are substantially			; If the target's divss/divps instructions are substantially
	; slower than rcpss/rcpps with a Newton-Raphson refinement,			; slower than rcpss/rcpps with a Newton-Raphson refinement,
	; we should generate the estimate sequence.			; we should generate the estimate sequence.

	; See PR21385 ( http://llvm.org/bugs/show_bug.cgi?id=21385 )			; See PR21385 ( http://llvm.org/bugs/show_bug.cgi?id=21385 )
	; for details about the accuracy, speed, and implementation			; for details about the accuracy, speed, and implementation
	; differences of x86 reciprocal estimates.			; differences of x86 reciprocal estimates.
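
For readers unfamiliar with the refinement these RUN lines control, the following is a minimal scalar sketch of Newton-Raphson refinement of a reciprocal estimate. It is an illustration only: `refineRecip`, `coarseRecip`, and `relativeError` are hypothetical helpers, and the hardware rcpss estimate is simulated by perturbing the exact reciprocal by a relative error of 2^-12 rather than by executing the instruction.

```cpp
#include <cmath>

// Each Newton-Raphson step for 1/A computes Est' = Est * (2 - A * Est).
// If Est = (1/A) * (1 + e), then Est' = (1/A) * (1 - e*e), so the relative
// error is roughly squared per step (before float rounding).
float refineRecip(float A, float Est, unsigned Steps) {
  for (unsigned I = 0; I < Steps; ++I)
    Est = Est * (2.0f - A * Est);
  return Est;
}

// Assumption: stand-in for a hardware estimate with relative error 2^-12.
float coarseRecip(float A) {
  return (1.0f / A) * (1.0f + std::ldexp(1.0f, -12));
}

// Relative error of Approx against 1/A, measured in double precision.
double relativeError(float A, float Approx) {
  double Exact = 1.0 / static_cast<double>(A);
  return std::fabs((static_cast<double>(Approx) - Exact) / Exact);
}
```

Starting from an error of 2^-12, one step brings the error to about 2^-24, which is why a single step is described as sufficient for a float result; the `:2` suffix in the REFINE run requests a second step.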

llvm/trunk/test/CodeGen/X86/sqrt-fastmath.ll

	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=sse2 | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=sse2 | FileCheck %s
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx,use-sqrt-est | FileCheck %s --check-prefix=ESTIMATE			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx -recip=sqrtf,vec-sqrtf | FileCheck %s --check-prefix=ESTIMATE

	declare double @__sqrt_finite(double) #0			declare double @__sqrt_finite(double) #0
	declare float @__sqrtf_finite(float) #0			declare float @__sqrtf_finite(float) #0
	declare x86_fp80 @__sqrtl_finite(x86_fp80) #0			declare x86_fp80 @__sqrtl_finite(x86_fp80) #0
	declare float @llvm.sqrt.f32(float) #0			declare float @llvm.sqrt.f32(float) #0
	declare <4 x float> @llvm.sqrt.v4f32(<4 x float>) #0			declare <4 x float> @llvm.sqrt.v4f32(<4 x float>) #0
	declare <8 x float> @llvm.sqrt.v8f32(<8 x float>) #0			declare <8 x float> @llvm.sqrt.v8f32(<8 x float>) #0
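
The `-recip=` values in these tests are comma-separated operation names, each optionally followed by `:N` for the number of refinement steps (for example `sqrtf,vec-sqrtf` or `divf:2,vec-divf:2`). A rough sketch of splitting such a string is shown below. `splitRecipList` is a hypothetical helper written with the standard library; it is not the patch's parser, and it does not model the disable prefix or the f/d precision-suffix handling that the patch implements.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: splits "divf:2,vec-divf" into (name, steps) pairs.
// Steps is -1 when no ":N" suffix is present, meaning "leave it to the
// target default". std::stoi throws if the suffix is not a number.
std::vector<std::pair<std::string, int>>
splitRecipList(const std::string &Arg) {
  std::vector<std::pair<std::string, int>> Out;
  std::stringstream SS(Arg);
  std::string Tok;
  while (std::getline(SS, Tok, ',')) {
    if (Tok.empty())
      continue; // skip empty entries such as a trailing comma
    std::size_t Colon = Tok.find(':');
    if (Colon == std::string::npos)
      Out.emplace_back(Tok, -1);
    else
      Out.emplace_back(Tok.substr(0, Colon),
                       std::stoi(Tok.substr(Colon + 1)));
  }
  return Out;
}
```

Keeping "no step count given" distinct from an explicit count matches the patch's approach of leaving unspecified settings uninitialized so that setDefaults() can fill them in later.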
