This is an archive of the discontinued LLVM Phabricator instance.

Differential D47188

Intel SVML calling conventions
Needs Review, Public

Authored by dvnagorny on May 22 2018, 1:55 AM.


Details

Reviewers

mmasten
craig.topper
hfinkel
spatel

Summary

This patch fixes incorrect calls to the SVML library, which uses non-standard calling conventions.
It therefore adds definitions of the SVML calling conventions and code that sets the calling convention on the vectorized calls.
Because SVML provides several implementations of the math functions, the patch also takes the fast attribute into account and selects the faster implementation in that case.
This work is based on Matt Masten's original work.

Diff Detail

Event Timeline

dvnagorny created this revision. May 22 2018, 1:55 AM

Herald added subscribers: llvm-commits, mgorny, mehdi_amini. May 22 2018, 1:55 AM

lebedev.ri added a subscriber: lebedev.ri. May 22 2018, 2:13 AM

lebedev.ri added inline comments.

include/llvm/IR/CallingConv.h
225	unneeded comment
lib/Analysis/TargetLibraryInfo.cpp
55	This memory allocation looks unfortunate :/
utils/TableGen/SVMLEmitter.cpp
64	This surely can be `StringRef`

dvnagorny added inline comments. May 22 2018, 2:23 AM

lib/Analysis/TargetLibraryInfo.cpp
55	Could you explain your comment in more detail, please? svmlMangle() returns a std::string, not a reference.

lebedev.ri added inline comments. May 22 2018, 2:29 AM

lib/Analysis/TargetLibraryInfo.cpp
55	Not much to explain, that already summarizes it quite well. It will return a `std::string`, not a reference, so unless the small string optimization within `std::string` kicks in, every call to `svmlMangle()` will cause an allocation. And this also caused `getVectorizedFunction()` to return a `std::string`, and so on.

dvnagorny added inline comments. May 22 2018, 2:36 AM

lib/Analysis/TargetLibraryInfo.cpp
55	I can't agree with you here. Besides RVO, we now have move semantics for std::string, so there shouldn't be any extra memory allocations. Does the LLVM coding standard say anything special about move semantics?

lebedev.ri added inline comments. May 22 2018, 2:40 AM

lib/Analysis/TargetLibraryInfo.cpp
1497	Sure, here, `move` will happen.
1499–1500	But I don't see how it can happen here.

dvnagorny added inline comments. May 22 2018, 3:21 AM

lib/Analysis/TargetLibraryInfo.cpp
1499–1500	Since the name is now constructed dynamically, it has to be stored somewhere. I think it's good enough if that happens no more than once.
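
The disagreement in this thread comes down to where the characters of the name live. A minimal standalone sketch, using `std::string_view` as a stand-in for `llvm::StringRef`; the names `kTableName`, `lookupView`, `lookupOwned` and `checkLookupSketch` are hypothetical and not part of the patch:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical table entry; stands in for a VecDesc's VectorFnName.
constexpr std::string_view kTableName = "__svml_sin4";

// Returns a view of the table's own storage: nothing is copied.
std::string_view lookupView() { return kTableName; }

// Returns an owning string. The characters are copied into a new
// std::string on every call (on the heap unless the small-string
// optimization applies). RVO or a move only avoids a second copy on
// return; it cannot avoid this first one, because no std::string
// existed beforehand to move from.
std::string lookupOwned(bool isFast) {
  std::string name(kTableName);
  if (!isFast)
    name += "_ha";
  return name;
}

// Small usage check for the two functions above.
inline void checkLookupSketch() {
  assert(lookupView().data() == kTableName.data());      // same storage
  assert(lookupOwned(true).data() != kTableName.data()); // fresh copy
  assert(lookupOwned(false) == "__svml_sin4_ha");
}
```

Under these assumptions the view-returning form costs nothing per call, while the owning form pays for one copy per call even when no suffix is appended.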

hfinkel added a subscriber: hfinkel. May 22 2018, 7:09 AM

hfinkel added inline comments.

include/llvm/Analysis/TargetLibraryInfo.h
159–160	I think just "Ignored" is fine.
include/llvm/IR/SVML.td
54	What does this mean? Why would a _ha variant of floor, for example, be needed for vectorization? Or, to put it another way, how would a _ha variant of floor differ from the _ep version?

dvnagorny added inline comments. May 22 2018, 7:31 AM

include/llvm/Analysis/TargetLibraryInfo.h
159–160	Will fix, thank you.
include/llvm/IR/SVML.td
54	I don't really expect any difference in behaviour. However, I think link-time weak aliases for these symbols are preferable here: they would not require any additional logic in the function name mangling code.

hfinkel added inline comments. May 22 2018, 8:40 AM

include/llvm/IR/SVML.td
54	Do these aliases exist currently? I'm concerned about the comment saying that we're disabling vectorization of floor, etc., and I'd prefer that we not disable vectorization unnecessarily.

craig.topper added a subscriber: craig.topper. May 22 2018, 8:43 AM

craig.topper added inline comments.

lib/Analysis/TargetLibraryInfo.cpp
1485	This looks like it passes 80 columns.
1503	This should probably be std::string() now.
utils/TableGen/SVMLEmitter.cpp
52	Mark these 'const'?
57	This looks to be a dead variable in release builds.
63	I think this should be Records.getAllDerivedDefinitions("SvmlVariant")

dvnagorny added inline comments. May 22 2018, 9:15 AM

include/llvm/IR/SVML.td
54	This code only takes effect when the SVML vector-library option is passed explicitly, so it doesn't disable any default vectorization. On the other hand, vectorization of these functions isn't enabled in LLVM today, so I am keeping their behaviour unchanged. The SVML library doesn't provide the aliases yet, but the library is not the only potential source of aliases: it's quite simple to provide them from application code.

hfinkel added inline comments. May 22 2018, 10:57 AM

include/llvm/IR/SVML.td
54	Okay, but providing them in application code doesn't help autovectorization, does it? Passing -fveclib=SVML should enable autovectorization of calls to floor, etc. You're saying that it doesn't now, and so I can certainly see that it might be best to change that in a separate patch, but saying that we'll wait for aliases to appear in the library doesn't sound like a good plan (and doesn't help users of existing versions of the library). The vectorizer can call some version which is already there. In short, would it make sense for the comment to read as follows? // TODO: SVML does not currently provide _ha variants of these functions. We should call the _ep variants of these functions in all cases instead.
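
One way to read the suggestion above is as a fallback rule in the name mangling. A minimal sketch, assuming a hypothetical per-function flag (`hasHaVariant`) that records whether a `_ha` symbol exists. Neither the flag nor `SvmlEntrySketch` is in the patch, the symbol names are purely illustrative, and the fallback target here is the unsuffixed name (as in the TODO in the updated SVML.td) rather than an `_ep` name:

```cpp
#include <cassert>
#include <string>

// Hypothetical table entry, not part of the patch.
struct SvmlEntrySketch {
  std::string vectorName; // illustrative, e.g. "__svml_floor4"
  bool hasHaVariant;      // assumed flag: does a "_ha" symbol exist?
};

// Same rule as the patch's svmlMangle (fast -> unsuffixed name,
// otherwise append "_ha"), plus a fallback to the unsuffixed name
// when no "_ha" symbol is assumed to exist.
std::string mangleWithFallback(const SvmlEntrySketch &entry, bool isFast) {
  if (isFast || !entry.hasHaVariant)
    return entry.vectorName;
  return entry.vectorName + "_ha";
}

// Small usage check for the function above.
inline void checkMangleSketch() {
  const SvmlEntrySketch sin4{"__svml_sin4", true};
  const SvmlEntrySketch floor4{"__svml_floor4", false};
  assert(mangleWithFallback(sin4, false) == "__svml_sin4_ha");
  assert(mangleWithFallback(floor4, false) == "__svml_floor4");
}
```

With a rule of this shape the vectorizer would always have some existing symbol to call, which is the property asked for in the comment above.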

vchuravy added a subscriber: vchuravy. May 22 2018, 10:58 AM

loladiro added a subscriber: loladiro. May 31 2018, 9:15 AM

dvnagorny updated this revision to Diff 150612. Jun 9 2018, 2:25 AM

dvnagorny edited the summary of this revision.

Herald added a subscriber: steven_wu. Jun 9 2018, 2:25 AM

dvnagorny added reviewers: mmasten, craig.topper, hfinkel, spatel. Jun 9 2018, 2:30 AM

rob.lougher mentioned this in D48193: [LoopVectorizer] Use an interleave count of 1 when using a vector library call. Jul 2 2018, 9:49 AM

hsaito added a subscriber: hsaito. Jul 2 2018, 5:35 PM

hsaito added inline comments.

lib/Analysis/TargetLibraryInfo.cpp
55	I have a problem with the use of the "_ha" interface in the non-fast case. Unless the compiler is in a reasonably relaxed mode, I'd like vector computation and scalar computation to be bitwise-identical. The "_ha" interface of SVML doesn't produce a result that is bitwise-identical to the scalar call. I'm sure there is room for using the "_ha" interface, but we need to carefully design how to enable it. By simply using SVML, I don't think the programmer gave the compiler a license to deviate from bitwise-identical results. I may be too paranoid about it, but a fair number of people ask for bitwise identity.

Herald added a subscriber: dexonsmith. Jul 2 2018, 5:35 PM

dvnagorny added inline comments. Jul 4 2018, 4:02 AM

lib/Analysis/TargetLibraryInfo.cpp
55	This is quite an interesting problem. AFAIK the "_ha" interface provides "high accuracy" versions of the functions (1 ulp), while the "default" SVML interface provides less precise functions (probably 4 ulp). So it means that the scalar implementation is less precise than the vectorized one. On the other hand, I'm not sure it is really possible for different libm implementations to give fully bitwise-identical results over the whole float or double domain within 4 ulp precision, since results that are not exact will differ by definition.
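
To make the ulp argument concrete: two results can each lie within a few ulp of the exact value and still differ in their bits. A minimal sketch, assuming IEEE 754 binary64 `double` and non-NaN inputs; `orderedBits` and `ulpDistance` are hypothetical helper names, and the value measured is the distance between two results, not the error of either one against the exact answer:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(double) == sizeof(std::int64_t),
              "sketch assumes a 64-bit double");

// Map a double's bit pattern to an integer that increases with the value.
std::int64_t orderedBits(double x) {
  std::int64_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  // Negative doubles order in reverse by raw bits; mirror them so that
  // -0.0 maps to 0 and more negative values map to smaller integers.
  return bits < 0 ? std::numeric_limits<std::int64_t>::min() - bits : bits;
}

// Number of representable-value steps from a to b (0 if they are the
// same value, treating +0.0 and -0.0 as equal).
std::uint64_t ulpDistance(double a, double b) {
  assert(!std::isnan(a) && !std::isnan(b));
  const std::int64_t sa = orderedBits(a);
  const std::int64_t sb = orderedBits(b);
  // Subtract in unsigned arithmetic so a large gap cannot overflow.
  const auto ua = static_cast<std::uint64_t>(sa);
  const auto ub = static_cast<std::uint64_t>(sb);
  return sa >= sb ? ua - ub : ub - ua;
}
```

Two implementations whose results are one step apart by this measure could both satisfy a 4 ulp accuracy bound while failing a bitwise-identity check, which is the tension the two comments above describe.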

venkataramanan.kumar.llvm added a subscriber: venkataramanan.kumar.llvm. Jul 12 2018, 10:00 PM

Hardcode84 mentioned this in D101253: Intel SVML calling conventions. Apr 25 2021, 6:53 AM

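Revision Contents

Path	Size
include/llvm/Analysis/TargetLibraryInfo.h	17 lines
include/llvm/IR/CMakeLists.txt	4 lines
include/llvm/IR/CallingConv.h	3 lines
include/llvm/IR/SVML.td	62 lines
lib/Analysis/CMakeLists.txt	1 line
lib/Analysis/TargetLibraryInfo.cpp	126 lines
lib/AsmParser/LLLexer.cpp	1 line
lib/AsmParser/LLParser.cpp	2 lines
lib/AsmParser/LLToken.h	1 line
lib/IR/AsmWriter.cpp	1 line
lib/IR/Verifier.cpp	1 line
lib/Target/X86/X86CallingConv.td	59 lines
lib/Target/X86/X86ISelLowering.cpp	3 lines
lib/Target/X86/X86RegisterInfo.cpp	34 lines
lib/Target/X86/X86Subtarget.h	1 line
lib/Transforms/Vectorize/LoopVectorize.cpp	6 lines
test/Transforms/LoopVectorize/X86/svml-calls.ll	80 lines
utils/TableGen/CMakeLists.txt	1 line
utils/TableGen/SVMLEmitter.cpp	110 lines
utils/TableGen/TableGen.cpp	8 lines
utils/TableGen/TableGenBackends.h	1 line
utils/vim/syntax/llvm.vim	1 line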

Diff 150612

include/llvm/Analysis/TargetLibraryInfo.h

[32 lines not shown]	struct VecDesc {

enum LibFunc {		enum LibFunc {
#define TLI_DEFINE_ENUM		#define TLI_DEFINE_ENUM
#include "llvm/Analysis/TargetLibraryInfo.def"		#include "llvm/Analysis/TargetLibraryInfo.def"

NumLibFuncs		NumLibFuncs
};		};

		enum SVMLAccuracy {
		SVML_DEFAULT,
		SVML_HA,
		SVML_EP
		};

/// Implementation of the target library information.		/// Implementation of the target library information.
///		///
/// This class constructs tables that hold the target library information and		/// This class constructs tables that hold the target library information and
/// make it available. However, it is somewhat expensive to compute and only		/// make it available. However, it is somewhat expensive to compute and only
/// depends on the triple. So users typically interact with the \c		/// depends on the triple. So users typically interact with the \c
/// TargetLibraryInfo wrapper below.		/// TargetLibraryInfo wrapper below.
class TargetLibraryInfoImpl {		class TargetLibraryInfoImpl {
friend class TargetLibraryInfo;		friend class TargetLibraryInfo;
[96 lines not shown]	public:

/// Calls addVectorizableFunctions with a known preset of functions for the		/// Calls addVectorizableFunctions with a known preset of functions for the
/// given vector library.		/// given vector library.
void addVectorizableFunctionsFromVecLib(enum VectorLibrary VecLib);		void addVectorizableFunctionsFromVecLib(enum VectorLibrary VecLib);

/// Return true if the function F has a vector equivalent with vectorization		/// Return true if the function F has a vector equivalent with vectorization
/// factor VF.		/// factor VF.
bool isFunctionVectorizable(StringRef F, unsigned VF) const {		bool isFunctionVectorizable(StringRef F, unsigned VF) const {
return !getVectorizedFunction(F, VF).empty();		bool Ignored;
		return !getVectorizedFunction(F, VF, Ignored, false).empty();
}		}

/// Return true if the function F has a vector equivalent with any		/// Return true if the function F has a vector equivalent with any
/// vectorization factor.		/// vectorization factor.
bool isFunctionVectorizable(StringRef F) const;		bool isFunctionVectorizable(StringRef F) const;

/// Return the name of the equivalent of F, vectorized with factor VF. If no		/// Return the name of the equivalent of F, vectorized with factor VF. If no
/// such mapping exists, return the empty string.		/// such mapping exists, return the empty string.
StringRef getVectorizedFunction(StringRef F, unsigned VF) const;		std::string getVectorizedFunction(StringRef F, unsigned VF, bool &FromSVML,
		bool IsFast) const;

/// Return true if the function F has a scalar equivalent, and set VF to be		/// Return true if the function F has a scalar equivalent, and set VF to be
/// the vectorization factor.		/// the vectorization factor.
bool isFunctionScalarizable(StringRef F, unsigned &VF) const {		bool isFunctionScalarizable(StringRef F, unsigned &VF) const {
return !getScalarizedFunction(F, VF).empty();		return !getScalarizedFunction(F, VF).empty();
}		}

/// Return the name of the equivalent of F, scalarized. If no such mapping		/// Return the name of the equivalent of F, scalarized. If no such mapping
[77 lines not shown]	bool has(LibFunc F) const {
return Impl->getState(F) != TargetLibraryInfoImpl::Unavailable;		return Impl->getState(F) != TargetLibraryInfoImpl::Unavailable;
}		}
bool isFunctionVectorizable(StringRef F, unsigned VF) const {		bool isFunctionVectorizable(StringRef F, unsigned VF) const {
return Impl->isFunctionVectorizable(F, VF);		return Impl->isFunctionVectorizable(F, VF);
}		}
bool isFunctionVectorizable(StringRef F) const {		bool isFunctionVectorizable(StringRef F) const {
return Impl->isFunctionVectorizable(F);		return Impl->isFunctionVectorizable(F);
}		}
StringRef getVectorizedFunction(StringRef F, unsigned VF) const {		std::string getVectorizedFunction(StringRef F, unsigned VF, bool &FromSVML,
return Impl->getVectorizedFunction(F, VF);		bool IsFast) const {
		return Impl->getVectorizedFunction(F, VF, FromSVML, IsFast);
}		}

/// Tests if the function is both available and a candidate for optimized code		/// Tests if the function is both available and a candidate for optimized code
/// generation.		/// generation.
bool hasOptimizedCodeGen(LibFunc F) const {		bool hasOptimizedCodeGen(LibFunc F) const {
if (Impl->getState(F) == TargetLibraryInfoImpl::Unavailable)		if (Impl->getState(F) == TargetLibraryInfoImpl::Unavailable)
return false;		return false;
switch (F) {		switch (F) {
[129 lines not shown]

include/llvm/IR/CMakeLists.txt

	set(LLVM_TARGET_DEFINITIONS Attributes.td)			set(LLVM_TARGET_DEFINITIONS Attributes.td)
	tablegen(LLVM Attributes.inc -gen-attrs)			tablegen(LLVM Attributes.inc -gen-attrs)

	set(LLVM_TARGET_DEFINITIONS Intrinsics.td)			set(LLVM_TARGET_DEFINITIONS Intrinsics.td)
	tablegen(LLVM Intrinsics.inc -gen-intrinsic)			tablegen(LLVM Intrinsics.inc -gen-intrinsic)
	add_public_tablegen_target(intrinsics_gen)			add_public_tablegen_target(intrinsics_gen)

				set(LLVM_TARGET_DEFINITIONS SVML.td)
				tablegen(LLVM SVML.inc -gen-svml)
				add_public_tablegen_target(svml_gen)

include/llvm/IR/CallingConv.h

[214 lines not shown]	enum {
/// use.		/// use.
AMDGPU_LS = 95,		AMDGPU_LS = 95,

/// Calling convention used for AMDPAL shader stage before geometry shader		/// Calling convention used for AMDPAL shader stage before geometry shader
/// if geometry is in use. So either the domain (= tessellation evaluation)		/// if geometry is in use. So either the domain (= tessellation evaluation)
/// shader if tessellation is in use, or otherwise the vertex shader.		/// shader if tessellation is in use, or otherwise the vertex shader.
AMDGPU_ES = 96,		AMDGPU_ES = 96,

		/// Intel_SVML - Calling conventions for Intel Short Math Vector Library
		Intel_SVML = 97,

/// The highest possible calling convention ID. Must be some 2^k - 1.		/// The highest possible calling convention ID. Must be some 2^k - 1.
MaxID = 1023		MaxID = 1023
};		};

} // end namespace CallingConv		} // end namespace CallingConv

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_IR_CALLINGCONV_H		#endif // LLVM_IR_CALLINGCONV_H

include/llvm/IR/SVML.td

This file was added.

				//===-- Intel_SVML.td - Defines SVML call variants ---------- tablegen --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This file is used by TableGen to define the different types of SVML function
				// variants used with -fveclib=SVML.
				//
				//===----------------------------------------------------------------------===//

				class SvmlVariant;

				def sin : SvmlVariant;
				def cos : SvmlVariant;
				def pow : SvmlVariant;
				def exp : SvmlVariant;
				def log : SvmlVariant;
				def acos : SvmlVariant;
				def acosh : SvmlVariant;
				def asin : SvmlVariant;
				def asinh : SvmlVariant;
				def atan2 : SvmlVariant;
				def atan : SvmlVariant;
				def atanh : SvmlVariant;
				def cbrt : SvmlVariant;
				def cdfnorm : SvmlVariant;
				def cdfnorminv : SvmlVariant;
				def cosd : SvmlVariant;
				def cosh : SvmlVariant;
				def erf : SvmlVariant;
				def erfc : SvmlVariant;
				def erfcinv : SvmlVariant;
				def erfinv : SvmlVariant;
				def exp10 : SvmlVariant;
				def exp2 : SvmlVariant;
				def expm1 : SvmlVariant;
				def hypot : SvmlVariant;
				def invsqrt : SvmlVariant;
				def log10 : SvmlVariant;
				def log1p : SvmlVariant;
				def log2 : SvmlVariant;
				def sind : SvmlVariant;
				def sinh : SvmlVariant;
				def sqrt : SvmlVariant;
				def tan : SvmlVariant;
				def tanh : SvmlVariant;

				// TODO: SVML does not currently provide _ha and _ep variants of these functions.
				// We should call the default variant of these functions in all cases instead.

				// def nearbyint : SvmlVariant;
				// def logb : SvmlVariant;
				// def floor : SvmlVariant;
				// def fmod : SvmlVariant;
				// def ceil : SvmlVariant;
				// def trunc : SvmlVariant;
				// def rint : SvmlVariant;
				// def round : SvmlVariant;

lib/Analysis/CMakeLists.txt

[86 lines not shown]	add_llvm_library(LLVMAnalysis
ValueTracking.cpp		ValueTracking.cpp
VectorUtils.cpp		VectorUtils.cpp

ADDITIONAL_HEADER_DIRS		ADDITIONAL_HEADER_DIRS
${LLVM_MAIN_INCLUDE_DIR}/llvm/Analysis		${LLVM_MAIN_INCLUDE_DIR}/llvm/Analysis

DEPENDS		DEPENDS
intrinsics_gen		intrinsics_gen
		svml_gen
)		)

lib/Analysis/TargetLibraryInfo.cpp

[44 lines not shown]	if (T.isMacOSX() && T.isMacOSXVersionLT(10, 9))
return false;		return false;

if (T.isiOS() && T.isOSVersionLT(7, 0))		if (T.isiOS() && T.isOSVersionLT(7, 0))
return false;		return false;

return true;		return true;
}		}

		std::string svmlMangle(StringRef FnName, const bool IsFast) {
		std::string FullName = FnName;
		return IsFast ? FullName : FullName + "_ha";
		}

/// Initialize the set of available library functions based on the specified		/// Initialize the set of available library functions based on the specified
/// target triple. This should be carefully written so that a missing target		/// target triple. This should be carefully written so that a missing target
/// triple gets a sane set of defaults.		/// triple gets a sane set of defaults.
static void initialize(TargetLibraryInfoImpl &TLI, const Triple &T,		static void initialize(TargetLibraryInfoImpl &TLI, const Triple &T,
ArrayRef<StringRef> StandardNames) {		ArrayRef<StringRef> StandardNames) {
// Verify that the StandardNames array is in alphabetical order.		// Verify that the StandardNames array is in alphabetical order.
assert(std::is_sorted(StandardNames.begin(), StandardNames.end(),		assert(std::is_sorted(StandardNames.begin(), StandardNames.end(),
[](StringRef LHS, StringRef RHS) {		[](StringRef LHS, StringRef RHS) {
[1,387 lines not shown]	const VecDesc VecFuncs[] = {
{"acoshf", "vacoshf", 4},		{"acoshf", "vacoshf", 4},
{"atanhf", "vatanhf", 4},		{"atanhf", "vatanhf", 4},
};		};
addVectorizableFunctions(VecFuncs);		addVectorizableFunctions(VecFuncs);
break;		break;
}		}
case SVML: {		case SVML: {
const VecDesc VecFuncs[] = {		const VecDesc VecFuncs[] = {
{"sin", "__svml_sin2", 2},		#define GET_SVML_VARIANTS
{"sin", "__svml_sin4", 4},		#include "llvm/IR/SVML.inc"
{"sin", "__svml_sin8", 8},		#undef GET_SVML_VARIANTS

{"sinf", "__svml_sinf4", 4},
{"sinf", "__svml_sinf8", 8},
{"sinf", "__svml_sinf16", 16},

{"llvm.sin.f64", "__svml_sin2", 2},
{"llvm.sin.f64", "__svml_sin4", 4},
{"llvm.sin.f64", "__svml_sin8", 8},

{"llvm.sin.f32", "__svml_sinf4", 4},
{"llvm.sin.f32", "__svml_sinf8", 8},
{"llvm.sin.f32", "__svml_sinf16", 16},

{"cos", "__svml_cos2", 2},
{"cos", "__svml_cos4", 4},
{"cos", "__svml_cos8", 8},

{"cosf", "__svml_cosf4", 4},
{"cosf", "__svml_cosf8", 8},
{"cosf", "__svml_cosf16", 16},

{"llvm.cos.f64", "__svml_cos2", 2},
{"llvm.cos.f64", "__svml_cos4", 4},
{"llvm.cos.f64", "__svml_cos8", 8},

{"llvm.cos.f32", "__svml_cosf4", 4},
{"llvm.cos.f32", "__svml_cosf8", 8},
{"llvm.cos.f32", "__svml_cosf16", 16},

{"pow", "__svml_pow2", 2},
{"pow", "__svml_pow4", 4},
{"pow", "__svml_pow8", 8},

{"powf", "__svml_powf4", 4},
{"powf", "__svml_powf8", 8},
{"powf", "__svml_powf16", 16},

{ "__pow_finite", "__svml_pow2", 2 },
{ "__pow_finite", "__svml_pow4", 4 },
{ "__pow_finite", "__svml_pow8", 8 },

{ "__powf_finite", "__svml_powf4", 4 },
{ "__powf_finite", "__svml_powf8", 8 },
{ "__powf_finite", "__svml_powf16", 16 },

{"llvm.pow.f64", "__svml_pow2", 2},
{"llvm.pow.f64", "__svml_pow4", 4},
{"llvm.pow.f64", "__svml_pow8", 8},

{"llvm.pow.f32", "__svml_powf4", 4},
{"llvm.pow.f32", "__svml_powf8", 8},
{"llvm.pow.f32", "__svml_powf16", 16},

{"exp", "__svml_exp2", 2},
{"exp", "__svml_exp4", 4},
{"exp", "__svml_exp8", 8},

{"expf", "__svml_expf4", 4},
{"expf", "__svml_expf8", 8},
{"expf", "__svml_expf16", 16},

{ "__exp_finite", "__svml_exp2", 2 },
{ "__exp_finite", "__svml_exp4", 4 },
{ "__exp_finite", "__svml_exp8", 8 },

{ "__expf_finite", "__svml_expf4", 4 },
{ "__expf_finite", "__svml_expf8", 8 },
{ "__expf_finite", "__svml_expf16", 16 },

{"llvm.exp.f64", "__svml_exp2", 2},
{"llvm.exp.f64", "__svml_exp4", 4},
{"llvm.exp.f64", "__svml_exp8", 8},

{"llvm.exp.f32", "__svml_expf4", 4},
{"llvm.exp.f32", "__svml_expf8", 8},
{"llvm.exp.f32", "__svml_expf16", 16},

{"log", "__svml_log2", 2},
{"log", "__svml_log4", 4},
{"log", "__svml_log8", 8},

{"logf", "__svml_logf4", 4},
{"logf", "__svml_logf8", 8},
{"logf", "__svml_logf16", 16},

{ "__log_finite", "__svml_log2", 2 },
{ "__log_finite", "__svml_log4", 4 },
{ "__log_finite", "__svml_log8", 8 },

{ "__logf_finite", "__svml_logf4", 4 },
{ "__logf_finite", "__svml_logf8", 8 },
{ "__logf_finite", "__svml_logf16", 16 },

{"llvm.log.f64", "__svml_log2", 2},
{"llvm.log.f64", "__svml_log4", 4},
{"llvm.log.f64", "__svml_log8", 8},

{"llvm.log.f32", "__svml_logf4", 4},
{"llvm.log.f32", "__svml_logf8", 8},
{"llvm.log.f32", "__svml_logf16", 16},
};		};
addVectorizableFunctions(VecFuncs);		addVectorizableFunctions(VecFuncs);
break;		break;
}		}
case NoLibrary:		case NoLibrary:
break;		break;
}		}
}		}

bool TargetLibraryInfoImpl::isFunctionVectorizable(StringRef funcName) const {		bool TargetLibraryInfoImpl::isFunctionVectorizable(StringRef funcName) const {
funcName = sanitizeFunctionName(funcName);		funcName = sanitizeFunctionName(funcName);
if (funcName.empty())		if (funcName.empty())
return false;		return false;

std::vector<VecDesc>::const_iterator I = std::lower_bound(		std::vector<VecDesc>::const_iterator I = std::lower_bound(
VectorDescs.begin(), VectorDescs.end(), funcName,		VectorDescs.begin(), VectorDescs.end(), funcName,
compareWithScalarFnName);		compareWithScalarFnName);
return I != VectorDescs.end() && StringRef(I->ScalarFnName) == funcName;		return I != VectorDescs.end() && StringRef(I->ScalarFnName) == funcName;
}		}

StringRef TargetLibraryInfoImpl::getVectorizedFunction(StringRef F,		std::string TargetLibraryInfoImpl::getVectorizedFunction(StringRef F,
unsigned VF) const {		unsigned VF,
		bool &FromSVML,
		bool IsFast) const {
		FromSVML = ClVectorLibrary == SVML;
F = sanitizeFunctionName(F);		F = sanitizeFunctionName(F);
if (F.empty())		if (F.empty())
return F;		return F;
std::vector<VecDesc>::const_iterator I = std::lower_bound(		std::vector<VecDesc>::const_iterator I = std::lower_bound(
VectorDescs.begin(), VectorDescs.end(), F, compareWithScalarFnName);		VectorDescs.begin(), VectorDescs.end(), F, compareWithScalarFnName);
while (I != VectorDescs.end() && StringRef(I->ScalarFnName) == F) {		while (I != VectorDescs.end() && StringRef(I->ScalarFnName) == F) {
if (I->VectorizationFactor == VF)		if (I->VectorizationFactor == VF) {
		if (FromSVML) {
		return svmlMangle(I->VectorFnName, IsFast);
		}
return I->VectorFnName;		return I->VectorFnName;
		}
++I;		++I;
}		}
return StringRef();		return std::string();
}		}

StringRef TargetLibraryInfoImpl::getScalarizedFunction(StringRef F,		StringRef TargetLibraryInfoImpl::getScalarizedFunction(StringRef F,
unsigned &VF) const {		unsigned &VF) const {
F = sanitizeFunctionName(F);		F = sanitizeFunctionName(F);
if (F.empty())		if (F.empty())
return F;		return F;

[65 lines not shown]
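
The lookup performed by the updated getVectorizedFunction() above (binary search to the first entry for a scalar name, a scan over its entries for the requested factor, then suffixing for SVML) can be sketched outside LLVM. This is a minimal sketch under stated assumptions: `VecDescSketch`, `findVectorized` and `sampleTable` are hypothetical names, `std::string_view` stands in for `StringRef`, and name sanitizing is omitted:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Stand-in for LLVM's VecDesc; field names are illustrative.
struct VecDescSketch {
  std::string_view scalarName;
  std::string_view vectorName;
  unsigned vf;
};

// Look up the vector counterpart of scalar function `f` at factor `vf`.
// `table` must be sorted by scalarName. Returns "" when nothing matches.
std::string findVectorized(const std::vector<VecDescSketch> &table,
                           std::string_view f, unsigned vf, bool fromSVML,
                           bool isFast) {
  assert(std::is_sorted(table.begin(), table.end(),
                        [](const VecDescSketch &a, const VecDescSketch &b) {
                          return a.scalarName < b.scalarName;
                        }));
  // Binary search to the first entry for `f`, then scan its entries.
  auto it = std::lower_bound(
      table.begin(), table.end(), f,
      [](const VecDescSketch &d, std::string_view name) {
        return d.scalarName < name;
      });
  for (; it != table.end() && it->scalarName == f; ++it) {
    if (it->vf != vf)
      continue;
    std::string name(it->vectorName);
    if (fromSVML && !isFast)
      name += "_ha"; // the patch's rule for the non-fast case
    return name;
  }
  return std::string();
}

// Tiny sample table, sorted by scalar name.
inline std::vector<VecDescSketch> sampleTable() {
  return {{"cos", "__svml_cos2", 2},
          {"sin", "__svml_sin2", 2},
          {"sin", "__svml_sin4", 4}};
}
```

Every successful lookup here builds a new `std::string`, including the non-SVML path, which is the cost discussed in the inline comments on this file.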

lib/AsmParser/LLLexer.cpp

[590 lines not shown]	#define KEYWORD(STR) \
KEYWORD(msp430_intrcc);		KEYWORD(msp430_intrcc);
KEYWORD(avr_intrcc);		KEYWORD(avr_intrcc);
KEYWORD(avr_signalcc);		KEYWORD(avr_signalcc);
KEYWORD(ptx_kernel);		KEYWORD(ptx_kernel);
KEYWORD(ptx_device);		KEYWORD(ptx_device);
KEYWORD(spir_kernel);		KEYWORD(spir_kernel);
KEYWORD(spir_func);		KEYWORD(spir_func);
KEYWORD(intel_ocl_bicc);		KEYWORD(intel_ocl_bicc);
		KEYWORD(intel_svmlcc);
KEYWORD(x86_64_sysvcc);		KEYWORD(x86_64_sysvcc);
KEYWORD(win64cc);		KEYWORD(win64cc);
KEYWORD(x86_regcallcc);		KEYWORD(x86_regcallcc);
KEYWORD(webkit_jscc);		KEYWORD(webkit_jscc);
KEYWORD(swiftcc);		KEYWORD(swiftcc);
KEYWORD(anyregcc);		KEYWORD(anyregcc);
KEYWORD(preserve_mostcc);		KEYWORD(preserve_mostcc);
KEYWORD(preserve_allcc);		KEYWORD(preserve_allcc);
[429 lines not shown]

lib/AsmParser/LLParser.cpp

[1,772 lines not shown]	void LLParser::ParseOptionalDLLStorageClass(unsigned &Res) {
Lex.Lex();		Lex.Lex();
}		}

/// ParseOptionalCallingConv		/// ParseOptionalCallingConv
/// ::= /empty/		/// ::= /empty/
/// ::= 'ccc'		/// ::= 'ccc'
/// ::= 'fastcc'		/// ::= 'fastcc'
/// ::= 'intel_ocl_bicc'		/// ::= 'intel_ocl_bicc'
		/// ::= 'intel_svmlcc'
/// ::= 'coldcc'		/// ::= 'coldcc'
/// ::= 'x86_stdcallcc'		/// ::= 'x86_stdcallcc'
/// ::= 'x86_fastcallcc'		/// ::= 'x86_fastcallcc'
/// ::= 'x86_thiscallcc'		/// ::= 'x86_thiscallcc'
/// ::= 'x86_vectorcallcc'		/// ::= 'x86_vectorcallcc'
/// ::= 'arm_apcscc'		/// ::= 'arm_apcscc'
/// ::= 'arm_aapcscc'		/// ::= 'arm_aapcscc'
/// ::= 'arm_aapcs_vfpcc'		/// ::= 'arm_aapcs_vfpcc'
bool LLParser::ParseOptionalCallingConv(unsigned &CC) {
case lltok::kw_msp430_intrcc: CC = CallingConv::MSP430_INTR; break;		case lltok::kw_msp430_intrcc: CC = CallingConv::MSP430_INTR; break;
case lltok::kw_avr_intrcc: CC = CallingConv::AVR_INTR; break;		case lltok::kw_avr_intrcc: CC = CallingConv::AVR_INTR; break;
case lltok::kw_avr_signalcc: CC = CallingConv::AVR_SIGNAL; break;		case lltok::kw_avr_signalcc: CC = CallingConv::AVR_SIGNAL; break;
case lltok::kw_ptx_kernel: CC = CallingConv::PTX_Kernel; break;		case lltok::kw_ptx_kernel: CC = CallingConv::PTX_Kernel; break;
case lltok::kw_ptx_device: CC = CallingConv::PTX_Device; break;		case lltok::kw_ptx_device: CC = CallingConv::PTX_Device; break;
case lltok::kw_spir_kernel: CC = CallingConv::SPIR_KERNEL; break;		case lltok::kw_spir_kernel: CC = CallingConv::SPIR_KERNEL; break;
case lltok::kw_spir_func: CC = CallingConv::SPIR_FUNC; break;		case lltok::kw_spir_func: CC = CallingConv::SPIR_FUNC; break;
case lltok::kw_intel_ocl_bicc: CC = CallingConv::Intel_OCL_BI; break;		case lltok::kw_intel_ocl_bicc: CC = CallingConv::Intel_OCL_BI; break;
		case lltok::kw_intel_svmlcc: CC = CallingConv::Intel_SVML; break;
case lltok::kw_x86_64_sysvcc: CC = CallingConv::X86_64_SysV; break;		case lltok::kw_x86_64_sysvcc: CC = CallingConv::X86_64_SysV; break;
case lltok::kw_win64cc: CC = CallingConv::Win64; break;		case lltok::kw_win64cc: CC = CallingConv::Win64; break;
case lltok::kw_webkit_jscc: CC = CallingConv::WebKit_JS; break;		case lltok::kw_webkit_jscc: CC = CallingConv::WebKit_JS; break;
case lltok::kw_anyregcc: CC = CallingConv::AnyReg; break;		case lltok::kw_anyregcc: CC = CallingConv::AnyReg; break;
case lltok::kw_preserve_mostcc:CC = CallingConv::PreserveMost; break;		case lltok::kw_preserve_mostcc:CC = CallingConv::PreserveMost; break;
case lltok::kw_preserve_allcc: CC = CallingConv::PreserveAll; break;		case lltok::kw_preserve_allcc: CC = CallingConv::PreserveAll; break;
case lltok::kw_ghccc: CC = CallingConv::GHC; break;		case lltok::kw_ghccc: CC = CallingConv::GHC; break;
case lltok::kw_swiftcc: CC = CallingConv::Swift; break;		case lltok::kw_swiftcc: CC = CallingConv::Swift; break;

lib/AsmParser/LLToken.h

enum Kind {
kw_prologue,		kw_prologue,
kw_c,		kw_c,

kw_cc,		kw_cc,
kw_ccc,		kw_ccc,
kw_fastcc,		kw_fastcc,
kw_coldcc,		kw_coldcc,
kw_intel_ocl_bicc,		kw_intel_ocl_bicc,
		kw_intel_svmlcc,
kw_x86_stdcallcc,		kw_x86_stdcallcc,
kw_x86_fastcallcc,		kw_x86_fastcallcc,
kw_x86_thiscallcc,		kw_x86_thiscallcc,
kw_x86_vectorcallcc,		kw_x86_vectorcallcc,
kw_x86_regcallcc,		kw_x86_regcallcc,
kw_arm_apcscc,		kw_arm_apcscc,
kw_arm_aapcscc,		kw_arm_aapcscc,
kw_arm_aapcs_vfpcc,		kw_arm_aapcs_vfpcc,

lib/IR/AsmWriter.cpp

static void PrintCallingConv(unsigned cc, raw_ostream &Out) {
case CallingConv::CXX_FAST_TLS: Out << "cxx_fast_tlscc"; break;		case CallingConv::CXX_FAST_TLS: Out << "cxx_fast_tlscc"; break;
case CallingConv::GHC: Out << "ghccc"; break;		case CallingConv::GHC: Out << "ghccc"; break;
case CallingConv::X86_StdCall: Out << "x86_stdcallcc"; break;		case CallingConv::X86_StdCall: Out << "x86_stdcallcc"; break;
case CallingConv::X86_FastCall: Out << "x86_fastcallcc"; break;		case CallingConv::X86_FastCall: Out << "x86_fastcallcc"; break;
case CallingConv::X86_ThisCall: Out << "x86_thiscallcc"; break;		case CallingConv::X86_ThisCall: Out << "x86_thiscallcc"; break;
case CallingConv::X86_RegCall: Out << "x86_regcallcc"; break;		case CallingConv::X86_RegCall: Out << "x86_regcallcc"; break;
case CallingConv::X86_VectorCall:Out << "x86_vectorcallcc"; break;		case CallingConv::X86_VectorCall:Out << "x86_vectorcallcc"; break;
case CallingConv::Intel_OCL_BI: Out << "intel_ocl_bicc"; break;		case CallingConv::Intel_OCL_BI: Out << "intel_ocl_bicc"; break;
		case CallingConv::Intel_SVML: Out << "intel_svmlcc"; break;
case CallingConv::ARM_APCS: Out << "arm_apcscc"; break;		case CallingConv::ARM_APCS: Out << "arm_apcscc"; break;
case CallingConv::ARM_AAPCS: Out << "arm_aapcscc"; break;		case CallingConv::ARM_AAPCS: Out << "arm_aapcscc"; break;
case CallingConv::ARM_AAPCS_VFP: Out << "arm_aapcs_vfpcc"; break;		case CallingConv::ARM_AAPCS_VFP: Out << "arm_aapcs_vfpcc"; break;
case CallingConv::MSP430_INTR: Out << "msp430_intrcc"; break;		case CallingConv::MSP430_INTR: Out << "msp430_intrcc"; break;
case CallingConv::AVR_INTR: Out << "avr_intrcc "; break;		case CallingConv::AVR_INTR: Out << "avr_intrcc "; break;
case CallingConv::AVR_SIGNAL: Out << "avr_signalcc "; break;		case CallingConv::AVR_SIGNAL: Out << "avr_signalcc "; break;
case CallingConv::PTX_Kernel: Out << "ptx_kernel"; break;		case CallingConv::PTX_Kernel: Out << "ptx_kernel"; break;
case CallingConv::PTX_Device: Out << "ptx_device"; break;		case CallingConv::PTX_Device: Out << "ptx_device"; break;
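The lexer, parser, and writer hunks above have to agree on one spelling, intel_svmlcc, so that IR printed by AsmWriter parses back to the same calling convention. The sketch below models that round trip in isolation. The enum CC and both function names are hypothetical and cover only the two Intel conventions; the real code works on lltok tokens and CallingConv::ID values.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical, reduced model of the calling conventions touched here.
enum class CC { Intel_OCL_BI, Intel_SVML };

// Keyword -> convention, mirroring what the lexer and parser do together.
std::optional<CC> parseCallingConv(std::string_view Keyword) {
  if (Keyword == "intel_ocl_bicc")
    return CC::Intel_OCL_BI;
  if (Keyword == "intel_svmlcc")
    return CC::Intel_SVML;
  return std::nullopt; // unknown keyword
}

// Convention -> keyword, mirroring PrintCallingConv in AsmWriter.cpp.
std::string_view printCallingConv(CC Conv) {
  switch (Conv) {
  case CC::Intel_OCL_BI:
    return "intel_ocl_bicc";
  case CC::Intel_SVML:
    return "intel_svmlcc";
  }
  assert(false && "unhandled calling convention");
  return {};
}
```

If either side used a different spelling, printing a module and parsing the result would no longer round-trip.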

lib/IR/Verifier.cpp

void Verifier::visitFunction(const Function &F) {
case CallingConv::AMDGPU_PS:		case CallingConv::AMDGPU_PS:
case CallingConv::AMDGPU_CS:		case CallingConv::AMDGPU_CS:
Assert(!F.hasStructRetAttr(),		Assert(!F.hasStructRetAttr(),
"Calling convention does not allow sret", &F);		"Calling convention does not allow sret", &F);
LLVM_FALLTHROUGH;		LLVM_FALLTHROUGH;
case CallingConv::Fast:		case CallingConv::Fast:
case CallingConv::Cold:		case CallingConv::Cold:
case CallingConv::Intel_OCL_BI:		case CallingConv::Intel_OCL_BI:
		case CallingConv::Intel_SVML:
case CallingConv::PTX_Kernel:		case CallingConv::PTX_Kernel:
case CallingConv::PTX_Device:		case CallingConv::PTX_Device:
Assert(!F.isVarArg(), "Calling convention does not support varargs or "		Assert(!F.isVarArg(), "Calling convention does not support varargs or "
"perfect forwarding!",		"perfect forwarding!",
&F);		&F);
break;		break;
}		}


lib/Target/X86/X86CallingConv.td

def RetCC_X86_64 : CallingConv<[

// Mingw64 and native Win64 use Win64 CC		// Mingw64 and native Win64 use Win64 CC
CCIfSubtarget<"isTargetWin64()", CCDelegateTo<RetCC_X86_Win64_C>>,		CCIfSubtarget<"isTargetWin64()", CCDelegateTo<RetCC_X86_Win64_C>>,

// Otherwise, drop to normal X86-64 CC		// Otherwise, drop to normal X86-64 CC
CCDelegateTo<RetCC_X86_64_C>		CCDelegateTo<RetCC_X86_64_C>
]>;		]>;

		// Intel_SVML return-value convention.
		def RetCC_Intel_SVML : CallingConv<[
		// Vector types are returned in XMM0,XMM1
		CCIfType<[v4f32, v2f64],
		CCAssignToReg<[XMM0,XMM1]>>,

		// 256-bit FP vectors
		CCIfType<[v8f32, v4f64],
		CCAssignToReg<[YMM0,YMM1]>>,

		// 512-bit FP vectors
		CCIfType<[v16f32, v8f64],
		CCAssignToReg<[ZMM0,ZMM1]>>
		]>;

// This is the return-value convention used for the entire X86 backend.		// This is the return-value convention used for the entire X86 backend.
def RetCC_X86 : CallingConv<[		def RetCC_X86 : CallingConv<[

// Check if this is the Intel OpenCL built-ins calling convention		// Check if this is the Intel OpenCL built-ins calling convention
CCIfCC<"CallingConv::Intel_OCL_BI", CCDelegateTo<RetCC_Intel_OCL_BI>>,		CCIfCC<"CallingConv::Intel_OCL_BI", CCDelegateTo<RetCC_Intel_OCL_BI>>,

		CCIfCC<"CallingConv::Intel_SVML", CCDelegateTo<RetCC_Intel_SVML>>,

CCIfSubtarget<"is64Bit()", CCDelegateTo<RetCC_X86_64>>,		CCIfSubtarget<"is64Bit()", CCDelegateTo<RetCC_X86_64>>,
CCDelegateTo<RetCC_X86_32>		CCDelegateTo<RetCC_X86_32>
]>;		]>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// X86-64 Argument Calling Conventions		// X86-64 Argument Calling Conventions
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

def CC_Intel_OCL_BI : CallingConv<[
// Pass masks in mask registers		// Pass masks in mask registers
CCIfType<[v16i1, v8i1], CCAssignToReg<[K1]>>,		CCIfType<[v16i1, v8i1], CCAssignToReg<[K1]>>,

CCIfSubtarget<"isTargetWin64()", CCDelegateTo<CC_X86_Win64_C>>,		CCIfSubtarget<"isTargetWin64()", CCDelegateTo<CC_X86_Win64_C>>,
CCIfSubtarget<"is64Bit()", CCDelegateTo<CC_X86_64_C>>,		CCIfSubtarget<"is64Bit()", CCDelegateTo<CC_X86_64_C>>,
CCDelegateTo<CC_X86_32_C>		CCDelegateTo<CC_X86_32_C>
]>;		]>;

		// X86-64 Intel Short Vector Math Library calling convention.
		def CC_Intel_SVML : CallingConv<[

		// The SSE vector arguments are passed in XMM registers.
		CCIfType<[v4f32, v2f64],
		CCAssignToReg<[XMM0, XMM1, XMM2]>>,

		// The 256-bit vector arguments are passed in YMM registers.
		CCIfType<[v8f32, v4f64],
		CCAssignToReg<[YMM0, YMM1, YMM2]>>,

		// The 512-bit vector arguments are passed in ZMM registers.
		CCIfType<[v16f32, v8f64],
		CCAssignToReg<[ZMM0, ZMM1, ZMM2]>>
		]>;

def CC_X86_32_Intr : CallingConv<[		def CC_X86_32_Intr : CallingConv<[
CCAssignToStack<4, 4>		CCAssignToStack<4, 4>
]>;		]>;

def CC_X86_64_Intr : CallingConv<[		def CC_X86_64_Intr : CallingConv<[
CCAssignToStack<8, 8>		CCAssignToStack<8, 8>
]>;		]>;

def CC_X86_64 : CallingConv<[

// Otherwise, drop to normal X86-64 CC		// Otherwise, drop to normal X86-64 CC
CCDelegateTo<CC_X86_64_C>		CCDelegateTo<CC_X86_64_C>
]>;		]>;

// This is the argument convention used for the entire X86 backend.		// This is the argument convention used for the entire X86 backend.
def CC_X86 : CallingConv<[		def CC_X86 : CallingConv<[
CCIfCC<"CallingConv::Intel_OCL_BI", CCDelegateTo<CC_Intel_OCL_BI>>,		CCIfCC<"CallingConv::Intel_OCL_BI", CCDelegateTo<CC_Intel_OCL_BI>>,
		CCIfCC<"CallingConv::Intel_SVML", CCDelegateTo<CC_Intel_SVML>>,
CCIfSubtarget<"is64Bit()", CCDelegateTo<CC_X86_64>>,		CCIfSubtarget<"is64Bit()", CCDelegateTo<CC_X86_64>>,
CCDelegateTo<CC_X86_32>		CCDelegateTo<CC_X86_32>
]>;		]>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Callee-saved Registers.		// Callee-saved Registers.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

def CSR_Win64_RegCall_NoSSE : CalleeSavedRegs<(add RBX, RBP, RSP,		def CSR_Win64_RegCall_NoSSE : CalleeSavedRegs<(add RBX, RBP, RSP,
(sequence "R%u", 10, 15))>;		(sequence "R%u", 10, 15))>;
def CSR_Win64_RegCall : CalleeSavedRegs<(add CSR_Win64_RegCall_NoSSE,		def CSR_Win64_RegCall : CalleeSavedRegs<(add CSR_Win64_RegCall_NoSSE,
(sequence "XMM%u", 8, 15))>;		(sequence "XMM%u", 8, 15))>;
def CSR_SysV64_RegCall_NoSSE : CalleeSavedRegs<(add RBX, RBP, RSP,		def CSR_SysV64_RegCall_NoSSE : CalleeSavedRegs<(add RBX, RBP, RSP,
(sequence "R%u", 12, 15))>;		(sequence "R%u", 12, 15))>;
def CSR_SysV64_RegCall : CalleeSavedRegs<(add CSR_SysV64_RegCall_NoSSE,		def CSR_SysV64_RegCall : CalleeSavedRegs<(add CSR_SysV64_RegCall_NoSSE,
(sequence "XMM%u", 8, 15))>;		(sequence "XMM%u", 8, 15))>;

		// SVML calling convention
		def CSR_32_Intel_SVML : CalleeSavedRegs<(add CSR_32_RegCall_NoSSE)>;
		def CSR_32_Intel_SVML_AVX512 : CalleeSavedRegs<(add CSR_32_Intel_SVML,
		K4, K5, K6, K7)>;

		def CSR_64_Intel_SVML_NoSSE : CalleeSavedRegs<(add RBX, RSI, RDI, RBP, RSP, R12, R13, R14, R15)>;

		def CSR_64_Intel_SVML : CalleeSavedRegs<(add CSR_64_Intel_SVML_NoSSE,
		(sequence "XMM%u", 8, 15))>;
		def CSR_Win64_Intel_SVML : CalleeSavedRegs<(add CSR_64_Intel_SVML_NoSSE,
		(sequence "XMM%u", 6, 15))>;

		def CSR_64_Intel_SVML_AVX : CalleeSavedRegs<(add CSR_64_Intel_SVML_NoSSE,
		(sequence "YMM%u", 8, 15))>;
		def CSR_Win64_Intel_SVML_AVX : CalleeSavedRegs<(add CSR_64_Intel_SVML_NoSSE,
		(sequence "YMM%u", 6, 15))>;

		def CSR_64_Intel_SVML_AVX512 : CalleeSavedRegs<(add CSR_64_Intel_SVML_NoSSE,
		(sequence "ZMM%u", 16, 31),
		K4, K5, K6, K7)>;
		def CSR_Win64_Intel_SVML_AVX512 : CalleeSavedRegs<(add CSR_64_Intel_SVML_NoSSE,
		(sequence "ZMM%u", 6, 21),
		K4, K5, K6, K7)>;
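As a reading aid for CC_Intel_SVML above, the sketch below models how a vector argument maps to a register: the register class follows the vector width, and only the first three registers of each class are listed. The function name svmlArgRegister is hypothetical, and indexing by argument position is a simplification of what the TableGen-generated allocator does. The definition above has no stack fallback, so the sketch reports "no register" in that case.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Simplified model of CC_Intel_SVML argument assignment (an assumption, not
// the generated code): the Nth vector argument goes to XMM<N>, YMM<N> or
// ZMM<N> depending on its width, and only registers 0..2 are available.
// Returns std::nullopt for unsupported widths or when registers run out.
std::optional<std::string> svmlArgRegister(unsigned WidthBits,
                                           unsigned ArgIndex) {
  const char *Prefix = nullptr;
  switch (WidthBits) {
  case 128: Prefix = "XMM"; break; // v4f32, v2f64
  case 256: Prefix = "YMM"; break; // v8f32, v4f64
  case 512: Prefix = "ZMM"; break; // v16f32, v8f64
  default:
    return std::nullopt;
  }
  assert(Prefix && "prefix is set for every supported width");
  if (ArgIndex >= 3)
    return std::nullopt; // the convention lists only three registers
  return std::string(Prefix) + std::to_string(ArgIndex);
}
```

For the two-operand pow call in the test file, this model would place the <4 x double> operands in YMM0 and YMM1.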

lib/Target/X86/X86ISelLowering.cpp


SDValue X86TargetLowering::LowerFormalArguments(
}		}

if (isVarArg && MFI.hasMustTailInVarArgFunc()) {		if (isVarArg && MFI.hasMustTailInVarArgFunc()) {
// Find the largest legal vector type.		// Find the largest legal vector type.
MVT VecVT = MVT::Other;		MVT VecVT = MVT::Other;
// FIXME: Only some x86_32 calling conventions support AVX512.		// FIXME: Only some x86_32 calling conventions support AVX512.
if (Subtarget.hasAVX512() &&		if (Subtarget.hasAVX512() &&
(Is64Bit || (CallConv == CallingConv::X86_VectorCall ||		(Is64Bit || (CallConv == CallingConv::X86_VectorCall ||
CallConv == CallingConv::Intel_OCL_BI)))		CallConv == CallingConv::Intel_OCL_BI ||
		CallConv == CallingConv::Intel_SVML)))
VecVT = MVT::v16f32;		VecVT = MVT::v16f32;
else if (Subtarget.hasAVX())		else if (Subtarget.hasAVX())
VecVT = MVT::v8f32;		VecVT = MVT::v8f32;
else if (Subtarget.hasSSE2())		else if (Subtarget.hasSSE2())
VecVT = MVT::v4f32;		VecVT = MVT::v4f32;

// We forward some GPRs and some vector types.		// We forward some GPRs and some vector types.
SmallVector<MVT, 2> RegParmTypes;		SmallVector<MVT, 2> RegParmTypes;

lib/Target/X86/X86RegisterInfo.cpp

case CallingConv::Intel_OCL_BI: {
if (HasAVX && IsWin64)		if (HasAVX && IsWin64)
return CSR_Win64_Intel_OCL_BI_AVX_SaveList;		return CSR_Win64_Intel_OCL_BI_AVX_SaveList;
if (HasAVX && Is64Bit)		if (HasAVX && Is64Bit)
return CSR_64_Intel_OCL_BI_AVX_SaveList;		return CSR_64_Intel_OCL_BI_AVX_SaveList;
if (!HasAVX && !IsWin64 && Is64Bit)		if (!HasAVX && !IsWin64 && Is64Bit)
return CSR_64_Intel_OCL_BI_SaveList;		return CSR_64_Intel_OCL_BI_SaveList;
break;		break;
}		}
		case CallingConv::Intel_SVML: {
		if (Is64Bit) {
		if (HasAVX512)
		return IsWin64 ? CSR_Win64_Intel_SVML_AVX512_SaveList :
		CSR_64_Intel_SVML_AVX512_SaveList;
		if (HasAVX)
		return IsWin64 ? CSR_Win64_Intel_SVML_AVX_SaveList :
		CSR_64_Intel_SVML_AVX_SaveList;

		return IsWin64 ? CSR_Win64_Intel_SVML_SaveList :
		CSR_64_Intel_SVML_SaveList;
		} else { // Is32Bit
		if (HasAVX512)
		return CSR_32_Intel_SVML_AVX512_SaveList;
		return CSR_32_Intel_SVML_SaveList;
		}
		}
case CallingConv::HHVM:		case CallingConv::HHVM:
return CSR_64_HHVM_SaveList;		return CSR_64_HHVM_SaveList;
case CallingConv::X86_RegCall:		case CallingConv::X86_RegCall:
if (Is64Bit) {		if (Is64Bit) {
if (IsWin64) {		if (IsWin64) {
return (HasSSE ? CSR_Win64_RegCall_SaveList :		return (HasSSE ? CSR_Win64_RegCall_SaveList :
CSR_Win64_RegCall_NoSSE_SaveList);		CSR_Win64_RegCall_NoSSE_SaveList);
} else {		} else {
case CallingConv::Intel_OCL_BI: {
if (HasAVX && IsWin64)		if (HasAVX && IsWin64)
return CSR_Win64_Intel_OCL_BI_AVX_RegMask;		return CSR_Win64_Intel_OCL_BI_AVX_RegMask;
if (HasAVX && Is64Bit)		if (HasAVX && Is64Bit)
return CSR_64_Intel_OCL_BI_AVX_RegMask;		return CSR_64_Intel_OCL_BI_AVX_RegMask;
if (!HasAVX && !IsWin64 && Is64Bit)		if (!HasAVX && !IsWin64 && Is64Bit)
return CSR_64_Intel_OCL_BI_RegMask;		return CSR_64_Intel_OCL_BI_RegMask;
break;		break;
}		}
		case CallingConv::Intel_SVML: {
		if (Is64Bit) {
		if (HasAVX512)
		return IsWin64 ? CSR_Win64_Intel_SVML_AVX512_RegMask :
		CSR_64_Intel_SVML_AVX512_RegMask;
		if (HasAVX)
		return IsWin64 ? CSR_Win64_Intel_SVML_AVX_RegMask :
		CSR_64_Intel_SVML_AVX_RegMask;

		return IsWin64 ? CSR_Win64_Intel_SVML_RegMask :
		CSR_64_Intel_SVML_RegMask;
		} else { // Is32Bit
		if (HasAVX512)
		return CSR_32_Intel_SVML_AVX512_RegMask;
		return CSR_32_Intel_SVML_RegMask;
		}
		}
case CallingConv::HHVM:		case CallingConv::HHVM:
return CSR_64_HHVM_RegMask;		return CSR_64_HHVM_RegMask;
case CallingConv::X86_RegCall:		case CallingConv::X86_RegCall:
if (Is64Bit) {		if (Is64Bit) {
if (IsWin64) {		if (IsWin64) {
return (HasSSE ? CSR_Win64_RegCall_RegMask :		return (HasSSE ? CSR_Win64_RegCall_RegMask :
CSR_Win64_RegCall_NoSSE_RegMask);		CSR_Win64_RegCall_NoSSE_RegMask);
} else {		} else {
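Both Intel_SVML cases above follow the same decision order: 64-bit versus 32-bit first, then the widest available vector extension, then Win64 versus other 64-bit targets. The sketch below restates that order as a standalone function that returns the name of the CalleeSavedRegs definition. SubtargetFlags and svmlCalleeSavedList are hypothetical names, and the real code returns the generated _SaveList or _RegMask arrays, not strings.

```cpp
#include <cassert>
#include <string>

// Hypothetical bundle of the subtarget queries used by the two switch cases.
struct SubtargetFlags {
  bool Is64Bit;
  bool IsWin64;
  bool HasAVX;
  bool HasAVX512;
};

// Returns the name of the CalleeSavedRegs definition that the Intel_SVML
// case would select, following the same order of checks as the patch.
std::string svmlCalleeSavedList(const SubtargetFlags &ST) {
  assert((!ST.IsWin64 || ST.Is64Bit) && "Win64 implies a 64-bit target");
  if (ST.Is64Bit) {
    if (ST.HasAVX512)
      return ST.IsWin64 ? "CSR_Win64_Intel_SVML_AVX512"
                        : "CSR_64_Intel_SVML_AVX512";
    if (ST.HasAVX)
      return ST.IsWin64 ? "CSR_Win64_Intel_SVML_AVX"
                        : "CSR_64_Intel_SVML_AVX";
    return ST.IsWin64 ? "CSR_Win64_Intel_SVML" : "CSR_64_Intel_SVML";
  }
  // 32-bit targets only distinguish AVX-512 from everything else.
  return ST.HasAVX512 ? "CSR_32_Intel_SVML_AVX512" : "CSR_32_Intel_SVML";
}
```

Every path returns, which is why the Intel_SVML case in the patch needs no break before the HHVM case.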

lib/Target/X86/X86Subtarget.h

bool isCallingConvWin64(CallingConv::ID CC) const {
case CallingConv::C:		case CallingConv::C:
case CallingConv::Fast:		case CallingConv::Fast:
case CallingConv::Swift:		case CallingConv::Swift:
case CallingConv::X86_FastCall:		case CallingConv::X86_FastCall:
case CallingConv::X86_StdCall:		case CallingConv::X86_StdCall:
case CallingConv::X86_ThisCall:		case CallingConv::X86_ThisCall:
case CallingConv::X86_VectorCall:		case CallingConv::X86_VectorCall:
case CallingConv::Intel_OCL_BI:		case CallingConv::Intel_OCL_BI:
		case CallingConv::Intel_SVML:
return isTargetWin64();		return isTargetWin64();
// This convention allows using the Win64 convention on other targets.		// This convention allows using the Win64 convention on other targets.
case CallingConv::Win64:		case CallingConv::Win64:
return true;		return true;
// This convention allows using the SysV convention on Windows targets.		// This convention allows using the SysV convention on Windows targets.
case CallingConv::X86_64_SysV:		case CallingConv::X86_64_SysV:
return false;		return false;
// Otherwise, who knows what this is.		// Otherwise, who knows what this is.

lib/Transforms/Vectorize/LoopVectorize.cpp


for (unsigned Part = 0; Part < UF; ++Part) {
// Some intrinsics have a scalar argument - don't replace it with a		// Some intrinsics have a scalar argument - don't replace it with a
// vector.		// vector.
if (!UseVectorIntrinsic || !hasVectorInstrinsicScalarOpd(ID, i))		if (!UseVectorIntrinsic || !hasVectorInstrinsicScalarOpd(ID, i))
Arg = getOrCreateVectorValue(CI->getArgOperand(i), Part);		Arg = getOrCreateVectorValue(CI->getArgOperand(i), Part);
Args.push_back(Arg);		Args.push_back(Arg);
}		}

Function *VectorF;		Function *VectorF;
		bool FromSVML = false;
if (UseVectorIntrinsic) {		if (UseVectorIntrinsic) {
// Use vector version of the intrinsic.		// Use vector version of the intrinsic.
Type *TysForDecl[] = {CI->getType()};		Type *TysForDecl[] = {CI->getType()};
if (VF > 1)		if (VF > 1)
TysForDecl[0] = VectorType::get(CI->getType()->getScalarType(), VF);		TysForDecl[0] = VectorType::get(CI->getType()->getScalarType(), VF);
VectorF = Intrinsic::getDeclaration(M, ID, TysForDecl);		VectorF = Intrinsic::getDeclaration(M, ID, TysForDecl);
} else {		} else {
// Use vector version of the library call.		// Use vector version of the library call.
StringRef VFnName = TLI->getVectorizedFunction(FnName, VF);		bool IsFast = CI->getFastMathFlags().isFast();
		std::string VFnName = TLI->getVectorizedFunction(FnName, VF, FromSVML, IsFast);
assert(!VFnName.empty() && "Vector function name is empty.");		assert(!VFnName.empty() && "Vector function name is empty.");
VectorF = M->getFunction(VFnName);		VectorF = M->getFunction(VFnName);
if (!VectorF) {		if (!VectorF) {
// Generate a declaration		// Generate a declaration
FunctionType *FTy = FunctionType::get(RetTy, Tys, false);		FunctionType *FTy = FunctionType::get(RetTy, Tys, false);
VectorF =		VectorF =
Function::Create(FTy, Function::ExternalLinkage, VFnName, M);		Function::Create(FTy, Function::ExternalLinkage, VFnName, M);
VectorF->copyAttributesFrom(F);		VectorF->copyAttributesFrom(F);
}		}
}		}
assert(VectorF && "Can't create vector function.");		assert(VectorF && "Can't create vector function.");

SmallVector<OperandBundleDef, 1> OpBundles;		SmallVector<OperandBundleDef, 1> OpBundles;
CI->getOperandBundlesAsDefs(OpBundles);		CI->getOperandBundlesAsDefs(OpBundles);
CallInst *V = Builder.CreateCall(VectorF, Args, OpBundles);		CallInst *V = Builder.CreateCall(VectorF, Args, OpBundles);

if (isa<FPMathOperator>(V))		if (isa<FPMathOperator>(V))
V->copyFastMathFlags(CI);		V->copyFastMathFlags(CI);
		if (FromSVML) V->setCallingConv(CallingConv::Intel_SVML);
VectorLoopValueMap.setVectorValue(&I, Part, V);		VectorLoopValueMap.setVectorValue(&I, Part, V);
addMetadata(V, &I);		addMetadata(V, &I);
}		}

break;		break;
}		}

default:		default:

test/Transforms/LoopVectorize/X86/svml-calls.ll

	declare double @log(double) #0			declare double @log(double) #0
	declare float @logf(float) #0			declare float @logf(float) #0
	declare double @llvm.log.f64(double) #0			declare double @llvm.log.f64(double) #0
	declare float @llvm.log.f32(float) #0			declare float @llvm.log.f32(float) #0


	define void @sin_f64(double* nocapture %varray) {			define void @sin_f64(double* nocapture %varray) {
	; CHECK-LABEL: @sin_f64(			; CHECK-LABEL: @sin_f64(
	; CHECK: [[TMP5:%.*]] = call <4 x double> @__svml_sin4(<4 x double> [[TMP4:%.*]])			; CHECK: [[TMP5:%.*]] = call intel_svmlcc <4 x double> @__svml_sin4_ha(<4 x double> [[TMP4:%.*]])
	; CHECK: ret void			; CHECK: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%tmp = trunc i64 %iv to i32			%tmp = trunc i64 %iv to i32
	%conv = sitofp i32 %tmp to double			%conv = sitofp i32 %tmp to double
	%call = tail call double @sin(double %conv)			%call = tail call double @sin(double %conv)
	%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv			%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv
	store double %call, double* %arrayidx, align 4			store double %call, double* %arrayidx, align 4
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond = icmp eq i64 %iv.next, 1000			%exitcond = icmp eq i64 %iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end:			for.end:
	ret void			ret void
	}			}

	define void @sin_f32(float* nocapture %varray) {			define void @sin_f32(float* nocapture %varray) {
	; CHECK-LABEL: @sin_f32(			; CHECK-LABEL: @sin_f32(
	; CHECK: [[TMP5:%.*]] = call <4 x float> @__svml_sinf4(<4 x float> [[TMP4:%.*]])			; CHECK: [[TMP5:%.*]] = call intel_svmlcc <4 x float> @__svml_sinf4_ha(<4 x float> [[TMP4:%.*]])
	; CHECK: ret void			; CHECK: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%tmp = trunc i64 %iv to i32			%tmp = trunc i64 %iv to i32
	%conv = sitofp i32 %tmp to float			%conv = sitofp i32 %tmp to float
	%call = tail call float @sinf(float %conv)			%call = tail call float @sinf(float %conv)
	%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv			%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv
	store float %call, float* %arrayidx, align 4			store float %call, float* %arrayidx, align 4
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond = icmp eq i64 %iv.next, 1000			%exitcond = icmp eq i64 %iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end:			for.end:
	ret void			ret void
	}			}

	define void @sin_f64_intrinsic(double* nocapture %varray) {			define void @sin_f64_intrinsic(double* nocapture %varray) {
	; CHECK-LABEL: @sin_f64_intrinsic(			; CHECK-LABEL: @sin_f64_intrinsic(
	; CHECK: [[TMP5:%.*]] = call <4 x double> @__svml_sin4(<4 x double> [[TMP4:%.*]])			; CHECK: [[TMP5:%.*]] = call intel_svmlcc <4 x double> @__svml_sin4_ha(<4 x double> [[TMP4:%.*]])
	; CHECK: ret void			; CHECK: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%tmp = trunc i64 %iv to i32			%tmp = trunc i64 %iv to i32
	%conv = sitofp i32 %tmp to double			%conv = sitofp i32 %tmp to double
	%call = tail call double @llvm.sin.f64(double %conv)			%call = tail call double @llvm.sin.f64(double %conv)
	%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv			%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv
	store double %call, double* %arrayidx, align 4			store double %call, double* %arrayidx, align 4
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond = icmp eq i64 %iv.next, 1000			%exitcond = icmp eq i64 %iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end:			for.end:
	ret void			ret void
	}			}

	define void @sin_f32_intrinsic(float* nocapture %varray) {			define void @sin_f32_intrinsic(float* nocapture %varray) {
	; CHECK-LABEL: @sin_f32_intrinsic(			; CHECK-LABEL: @sin_f32_intrinsic(
	; CHECK: [[TMP5:%.*]] = call <4 x float> @__svml_sinf4(<4 x float> [[TMP4:%.*]])			; CHECK: [[TMP5:%.*]] = call intel_svmlcc <4 x float> @__svml_sinf4_ha(<4 x float> [[TMP4:%.*]])
	; CHECK: ret void			; CHECK: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%tmp = trunc i64 %iv to i32			%tmp = trunc i64 %iv to i32
	%conv = sitofp i32 %tmp to float			%conv = sitofp i32 %tmp to float
	%call = tail call float @llvm.sin.f32(float %conv)			%call = tail call float @llvm.sin.f32(float %conv)
	%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv			%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv
	store float %call, float* %arrayidx, align 4			store float %call, float* %arrayidx, align 4
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond = icmp eq i64 %iv.next, 1000			%exitcond = icmp eq i64 %iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end:			for.end:
	ret void			ret void
	}			}

	define void @cos_f64(double* nocapture %varray) {			define void @cos_f64(double* nocapture %varray) {
	; CHECK-LABEL: @cos_f64(			; CHECK-LABEL: @cos_f64(
	; CHECK: [[TMP5:%.*]] = call <4 x double> @__svml_cos4(<4 x double> [[TMP4:%.*]])			; CHECK: [[TMP5:%.*]] = call intel_svmlcc <4 x double> @__svml_cos4_ha(<4 x double> [[TMP4:%.*]])
	; CHECK: ret void			; CHECK: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%tmp = trunc i64 %iv to i32			%tmp = trunc i64 %iv to i32
	%conv = sitofp i32 %tmp to double			%conv = sitofp i32 %tmp to double
	%call = tail call double @cos(double %conv)			%call = tail call double @cos(double %conv)
	%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv			%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv
	store double %call, double* %arrayidx, align 4			store double %call, double* %arrayidx, align 4
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond = icmp eq i64 %iv.next, 1000			%exitcond = icmp eq i64 %iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end:			for.end:
	ret void			ret void
	}			}

	define void @cos_f32(float* nocapture %varray) {			define void @cos_f32(float* nocapture %varray) {
	; CHECK-LABEL: @cos_f32(			; CHECK-LABEL: @cos_f32(
	; CHECK: [[TMP5:%.*]] = call <4 x float> @__svml_cosf4(<4 x float> [[TMP4:%.*]])			; CHECK: [[TMP5:%.*]] = call intel_svmlcc <4 x float> @__svml_cosf4_ha(<4 x float> [[TMP4:%.*]])
	; CHECK: ret void			; CHECK: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%tmp = trunc i64 %iv to i32			%tmp = trunc i64 %iv to i32
	%conv = sitofp i32 %tmp to float			%conv = sitofp i32 %tmp to float
	%call = tail call float @cosf(float %conv)			%call = tail call float @cosf(float %conv)
	%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv			%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv
	store float %call, float* %arrayidx, align 4			store float %call, float* %arrayidx, align 4
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond = icmp eq i64 %iv.next, 1000			%exitcond = icmp eq i64 %iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end:			for.end:
	ret void			ret void
	}			}

	define void @cos_f64_intrinsic(double* nocapture %varray) {			define void @cos_f64_intrinsic(double* nocapture %varray) {
	; CHECK-LABEL: @cos_f64_intrinsic(			; CHECK-LABEL: @cos_f64_intrinsic(
	; CHECK: [[TMP5:%.*]] = call <4 x double> @__svml_cos4(<4 x double> [[TMP4:%.*]])			; CHECK: [[TMP5:%.*]] = call intel_svmlcc <4 x double> @__svml_cos4_ha(<4 x double> [[TMP4:%.*]])
	; CHECK: ret void			; CHECK: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%tmp = trunc i64 %iv to i32			%tmp = trunc i64 %iv to i32
	%conv = sitofp i32 %tmp to double			%conv = sitofp i32 %tmp to double
	%call = tail call double @llvm.cos.f64(double %conv)			%call = tail call double @llvm.cos.f64(double %conv)
	%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv			%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv
	store double %call, double* %arrayidx, align 4			store double %call, double* %arrayidx, align 4
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond = icmp eq i64 %iv.next, 1000			%exitcond = icmp eq i64 %iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end:			for.end:
	ret void			ret void
	}			}

	define void @cos_f32_intrinsic(float* nocapture %varray) {			define void @cos_f32_intrinsic(float* nocapture %varray) {
	; CHECK-LABEL: @cos_f32_intrinsic(			; CHECK-LABEL: @cos_f32_intrinsic(
	; CHECK: [[TMP5:%.*]] = call <4 x float> @__svml_cosf4(<4 x float> [[TMP4:%.*]])			; CHECK: [[TMP5:%.*]] = call intel_svmlcc <4 x float> @__svml_cosf4_ha(<4 x float> [[TMP4:%.*]])
	; CHECK: ret void			; CHECK: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%tmp = trunc i64 %iv to i32			%tmp = trunc i64 %iv to i32
	%conv = sitofp i32 %tmp to float			%conv = sitofp i32 %tmp to float
	%call = tail call float @llvm.cos.f32(float %conv)			%call = tail call float @llvm.cos.f32(float %conv)
	%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv			%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv
	store float %call, float* %arrayidx, align 4			store float %call, float* %arrayidx, align 4
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond = icmp eq i64 %iv.next, 1000			%exitcond = icmp eq i64 %iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end:			for.end:
	ret void			ret void
	}			}

	define void @pow_f64(double* nocapture %varray, double* nocapture readonly %exp) {			define void @pow_f64(double* nocapture %varray, double* nocapture readonly %exp) {
	; CHECK-LABEL: @pow_f64(			; CHECK-LABEL: @pow_f64(
	; CHECK: [[TMP8:%.*]] = call <4 x double> @__svml_pow4(<4 x double> [[TMP4:%.*]], <4 x double> [[WIDE_LOAD:%.*]])			; CHECK: [[TMP8:%.*]] = call intel_svmlcc <4 x double> @__svml_pow4_ha(<4 x double> [[TMP4:%.*]], <4 x double> [[WIDE_LOAD:%.*]])
	; CHECK: ret void			; CHECK: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%tmp = trunc i64 %iv to i32			%tmp = trunc i64 %iv to i32
	%conv = sitofp i32 %tmp to double			%conv = sitofp i32 %tmp to double
	%arrayidx = getelementptr inbounds double, double* %exp, i64 %iv			%arrayidx = getelementptr inbounds double, double* %exp, i64 %iv
	%tmp1 = load double, double* %arrayidx, align 4			%tmp1 = load double, double* %arrayidx, align 4
	%tmp2 = tail call double @pow(double %conv, double %tmp1)			%tmp2 = tail call double @pow(double %conv, double %tmp1)
	%arrayidx2 = getelementptr inbounds double, double* %varray, i64 %iv			%arrayidx2 = getelementptr inbounds double, double* %varray, i64 %iv
	store double %tmp2, double* %arrayidx2, align 4			store double %tmp2, double* %arrayidx2, align 4
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond = icmp eq i64 %iv.next, 1000			%exitcond = icmp eq i64 %iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end:			for.end:
	ret void			ret void
	}			}

	define void @pow_f64_intrinsic(double* nocapture %varray, double* nocapture readonly %exp) {			define void @pow_f64_intrinsic(double* nocapture %varray, double* nocapture readonly %exp) {
	; CHECK-LABEL: @pow_f64_intrinsic(			; CHECK-LABEL: @pow_f64_intrinsic(
	; CHECK: [[TMP8:%.*]] = call <4 x double> @__svml_pow4(<4 x double> [[TMP4:%.*]], <4 x double> [[WIDE_LOAD:%.*]])			; CHECK: [[TMP8:%.*]] = call intel_svmlcc <4 x double> @__svml_pow4_ha(<4 x double> [[TMP4:%.*]], <4 x double> [[WIDE_LOAD:%.*]])
	; CHECK: ret void			; CHECK: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%tmp = trunc i64 %iv to i32			%tmp = trunc i64 %iv to i32
	%conv = sitofp i32 %tmp to double			%conv = sitofp i32 %tmp to double
	%arrayidx = getelementptr inbounds double, double* %exp, i64 %iv			%arrayidx = getelementptr inbounds double, double* %exp, i64 %iv
	%tmp1 = load double, double* %arrayidx, align 4			%tmp1 = load double, double* %arrayidx, align 4
	%tmp2 = tail call double @llvm.pow.f64(double %conv, double %tmp1)			%tmp2 = tail call double @llvm.pow.f64(double %conv, double %tmp1)
	%arrayidx2 = getelementptr inbounds double, double* %varray, i64 %iv			%arrayidx2 = getelementptr inbounds double, double* %varray, i64 %iv
	store double %tmp2, double* %arrayidx2, align 4			store double %tmp2, double* %arrayidx2, align 4
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond = icmp eq i64 %iv.next, 1000			%exitcond = icmp eq i64 %iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end:			for.end:
	ret void			ret void
	}			}

	define void @pow_f32(float* nocapture %varray, float* nocapture readonly %exp) {			define void @pow_f32(float* nocapture %varray, float* nocapture readonly %exp) {
	; CHECK-LABEL: @pow_f32(			; CHECK-LABEL: @pow_f32(
	; CHECK: [[TMP8:%.*]] = call <4 x float> @__svml_powf4(<4 x float> [[TMP4:%.*]], <4 x float> [[WIDE_LOAD:%.*]])			; CHECK: [[TMP8:%.*]] = call intel_svmlcc <4 x float> @__svml_powf4_ha(<4 x float> [[TMP4:%.*]], <4 x float> [[WIDE_LOAD:%.*]])
	; CHECK: ret void			; CHECK: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%tmp = trunc i64 %iv to i32			%tmp = trunc i64 %iv to i32
	%conv = sitofp i32 %tmp to float			%conv = sitofp i32 %tmp to float
	%arrayidx = getelementptr inbounds float, float* %exp, i64 %iv			%arrayidx = getelementptr inbounds float, float* %exp, i64 %iv
	%tmp1 = load float, float* %arrayidx, align 4			%tmp1 = load float, float* %arrayidx, align 4
	%tmp2 = tail call float @powf(float %conv, float %tmp1)			%tmp2 = tail call float @powf(float %conv, float %tmp1)
	%arrayidx2 = getelementptr inbounds float, float* %varray, i64 %iv			%arrayidx2 = getelementptr inbounds float, float* %varray, i64 %iv
	store float %tmp2, float* %arrayidx2, align 4			store float %tmp2, float* %arrayidx2, align 4
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond = icmp eq i64 %iv.next, 1000			%exitcond = icmp eq i64 %iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end:			for.end:
	ret void			ret void
	}			}

	define void @pow_f32_intrinsic(float* nocapture %varray, float* nocapture readonly %exp) {			define void @pow_f32_intrinsic(float* nocapture %varray, float* nocapture readonly %exp) {
	; CHECK-LABEL: @pow_f32_intrinsic(			; CHECK-LABEL: @pow_f32_intrinsic(
	; CHECK: [[TMP8:%.*]] = call <4 x float> @__svml_powf4(<4 x float> [[TMP4:%.*]], <4 x float> [[WIDE_LOAD:%.*]])			; CHECK: [[TMP8:%.*]] = call intel_svmlcc <4 x float> @__svml_powf4_ha(<4 x float> [[TMP4:%.*]], <4 x float> [[WIDE_LOAD:%.*]])
	; CHECK: ret void			; CHECK: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%tmp = trunc i64 %iv to i32			%tmp = trunc i64 %iv to i32
	%conv = sitofp i32 %tmp to float			%conv = sitofp i32 %tmp to float
	%arrayidx = getelementptr inbounds float, float* %exp, i64 %iv			%arrayidx = getelementptr inbounds float, float* %exp, i64 %iv
	%tmp1 = load float, float* %arrayidx, align 4			%tmp1 = load float, float* %arrayidx, align 4
	%tmp2 = tail call float @llvm.pow.f32(float %conv, float %tmp1)			%tmp2 = tail call float @llvm.pow.f32(float %conv, float %tmp1)
	%arrayidx2 = getelementptr inbounds float, float* %varray, i64 %iv			%arrayidx2 = getelementptr inbounds float, float* %varray, i64 %iv
	store float %tmp2, float* %arrayidx2, align 4			store float %tmp2, float* %arrayidx2, align 4
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond = icmp eq i64 %iv.next, 1000			%exitcond = icmp eq i64 %iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end:			for.end:
	ret void			ret void
	}			}

	define void @exp_f64(double* nocapture %varray) {			define void @exp_f64(double* nocapture %varray) {
	; CHECK-LABEL: @exp_f64(			; CHECK-LABEL: @exp_f64(
	; CHECK: [[TMP5:%.*]] = call <4 x double> @__svml_exp4(<4 x double> [[TMP4:%.*]])			; CHECK: [[TMP5:%.*]] = call intel_svmlcc <4 x double> @__svml_exp4_ha(<4 x double> [[TMP4:%.*]])
	; CHECK: ret void			; CHECK: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%tmp = trunc i64 %iv to i32			%tmp = trunc i64 %iv to i32
	%conv = sitofp i32 %tmp to double			%conv = sitofp i32 %tmp to double
	%call = tail call double @exp(double %conv)			%call = tail call double @exp(double %conv)
	%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv			%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv
	store double %call, double* %arrayidx, align 4			store double %call, double* %arrayidx, align 4
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond = icmp eq i64 %iv.next, 1000			%exitcond = icmp eq i64 %iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end:			for.end:
	ret void			ret void
	}			}

	define void @exp_f32(float* nocapture %varray) {			define void @exp_f32(float* nocapture %varray) {
	; CHECK-LABEL: @exp_f32(			; CHECK-LABEL: @exp_f32(
	; CHECK: [[TMP5:%.*]] = call <4 x float> @__svml_expf4(<4 x float> [[TMP4:%.*]])			; CHECK: [[TMP5:%.*]] = call intel_svmlcc <4 x float> @__svml_expf4_ha(<4 x float> [[TMP4:%.*]])
	; CHECK: ret void			; CHECK: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%tmp = trunc i64 %iv to i32			%tmp = trunc i64 %iv to i32
	%conv = sitofp i32 %tmp to float			%conv = sitofp i32 %tmp to float
	%call = tail call float @expf(float %conv)			%call = tail call float @expf(float %conv)
	%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv			%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv
	store float %call, float* %arrayidx, align 4			store float %call, float* %arrayidx, align 4
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond = icmp eq i64 %iv.next, 1000			%exitcond = icmp eq i64 %iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end:			for.end:
	ret void			ret void
	}			}

	define void @exp_f64_intrinsic(double* nocapture %varray) {			define void @exp_f64_intrinsic(double* nocapture %varray) {
	; CHECK-LABEL: @exp_f64_intrinsic(			; CHECK-LABEL: @exp_f64_intrinsic(
	; CHECK: [[TMP5:%.*]] = call <4 x double> @__svml_exp4(<4 x double> [[TMP4:%.*]])			; CHECK: [[TMP5:%.*]] = call intel_svmlcc <4 x double> @__svml_exp4_ha(<4 x double> [[TMP4:%.*]])
	; CHECK: ret void			; CHECK: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%tmp = trunc i64 %iv to i32			%tmp = trunc i64 %iv to i32
	%conv = sitofp i32 %tmp to double			%conv = sitofp i32 %tmp to double
	%call = tail call double @llvm.exp.f64(double %conv)			%call = tail call double @llvm.exp.f64(double %conv)
	%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv			%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv
	store double %call, double* %arrayidx, align 4			store double %call, double* %arrayidx, align 4
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond = icmp eq i64 %iv.next, 1000			%exitcond = icmp eq i64 %iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end:			for.end:
	ret void			ret void
	}			}

	define void @exp_f32_intrinsic(float* nocapture %varray) {			define void @exp_f32_intrinsic(float* nocapture %varray) {
	; CHECK-LABEL: @exp_f32_intrinsic(			; CHECK-LABEL: @exp_f32_intrinsic(
	; CHECK: [[TMP5:%.*]] = call <4 x float> @__svml_expf4(<4 x float> [[TMP4:%.*]])			; CHECK: [[TMP5:%.*]] = call intel_svmlcc <4 x float> @__svml_expf4_ha(<4 x float> [[TMP4:%.*]])
	; CHECK: ret void			; CHECK: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%tmp = trunc i64 %iv to i32			%tmp = trunc i64 %iv to i32
	%conv = sitofp i32 %tmp to float			%conv = sitofp i32 %tmp to float
	%call = tail call float @llvm.exp.f32(float %conv)			%call = tail call float @llvm.exp.f32(float %conv)
	%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv			%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv
	store float %call, float* %arrayidx, align 4			store float %call, float* %arrayidx, align 4
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond = icmp eq i64 %iv.next, 1000			%exitcond = icmp eq i64 %iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end:			for.end:
	ret void			ret void
	}			}

	define void @log_f64(double* nocapture %varray) {			define void @log_f64(double* nocapture %varray) {
	; CHECK-LABEL: @log_f64(			; CHECK-LABEL: @log_f64(
	; CHECK: [[TMP5:%.*]] = call <4 x double> @__svml_log4(<4 x double> [[TMP4:%.*]])			; CHECK: [[TMP5:%.*]] = call intel_svmlcc <4 x double> @__svml_log4_ha(<4 x double> [[TMP4:%.*]])
	; CHECK: ret void			; CHECK: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%tmp = trunc i64 %iv to i32			%tmp = trunc i64 %iv to i32
	%conv = sitofp i32 %tmp to double			%conv = sitofp i32 %tmp to double
	%call = tail call double @log(double %conv)			%call = tail call double @log(double %conv)
	%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv			%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv
	store double %call, double* %arrayidx, align 4			store double %call, double* %arrayidx, align 4
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond = icmp eq i64 %iv.next, 1000			%exitcond = icmp eq i64 %iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end:			for.end:
	ret void			ret void
	}			}

	define void @log_f32(float* nocapture %varray) {			define void @log_f32(float* nocapture %varray) {
	; CHECK-LABEL: @log_f32(			; CHECK-LABEL: @log_f32(
	; CHECK: [[TMP5:%.*]] = call <4 x float> @__svml_logf4(<4 x float> [[TMP4:%.*]])			; CHECK: [[TMP5:%.*]] = call intel_svmlcc <4 x float> @__svml_logf4_ha(<4 x float> [[TMP4:%.*]])
	; CHECK: ret void			; CHECK: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%tmp = trunc i64 %iv to i32			%tmp = trunc i64 %iv to i32
	%conv = sitofp i32 %tmp to float			%conv = sitofp i32 %tmp to float
	%call = tail call float @logf(float %conv)			%call = tail call float @logf(float %conv)
	%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv			%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv
	store float %call, float* %arrayidx, align 4			store float %call, float* %arrayidx, align 4
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond = icmp eq i64 %iv.next, 1000			%exitcond = icmp eq i64 %iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end:			for.end:
	ret void			ret void
	}			}

	define void @log_f64_intrinsic(double* nocapture %varray) {			define void @log_f64_intrinsic(double* nocapture %varray) {
	; CHECK-LABEL: @log_f64_intrinsic(			; CHECK-LABEL: @log_f64_intrinsic(
	; CHECK: [[TMP5:%.*]] = call <4 x double> @__svml_log4(<4 x double> [[TMP4:%.*]])			; CHECK: [[TMP5:%.*]] = call intel_svmlcc <4 x double> @__svml_log4_ha(<4 x double> [[TMP4:%.*]])
	; CHECK: ret void			; CHECK: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%tmp = trunc i64 %iv to i32			%tmp = trunc i64 %iv to i32
	%conv = sitofp i32 %tmp to double			%conv = sitofp i32 %tmp to double
	%call = tail call double @llvm.log.f64(double %conv)			%call = tail call double @llvm.log.f64(double %conv)
	%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv			%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv
	store double %call, double* %arrayidx, align 4			store double %call, double* %arrayidx, align 4
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond = icmp eq i64 %iv.next, 1000			%exitcond = icmp eq i64 %iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end:			for.end:
	ret void			ret void
	}			}

	define void @log_f32_intrinsic(float* nocapture %varray) {			define void @log_f32_intrinsic(float* nocapture %varray) {
	; CHECK-LABEL: @log_f32_intrinsic(			; CHECK-LABEL: @log_f32_intrinsic(
	; CHECK: [[TMP5:%.*]] = call <4 x float> @__svml_logf4(<4 x float> [[TMP4:%.*]])			; CHECK: [[TMP5:%.*]] = call intel_svmlcc <4 x float> @__svml_logf4_ha(<4 x float> [[TMP4:%.*]])
	; CHECK: ret void			; CHECK: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%tmp = trunc i64 %iv to i32			%tmp = trunc i64 %iv to i32
	%conv = sitofp i32 %tmp to float			%conv = sitofp i32 %tmp to float
	%call = tail call float @llvm.log.f32(float %conv)			%call = tail call float @llvm.log.f32(float %conv)
	%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv			%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv
	store float %call, float* %arrayidx, align 4			store float %call, float* %arrayidx, align 4
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond = icmp eq i64 %iv.next, 1000			%exitcond = icmp eq i64 %iv.next, 1000
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end:			for.end:
	ret void			ret void
	}			}

	attributes #0 = { nounwind readnone }			; CHECK-LABEL: @atan2_finite
				; CHECK: intel_svmlcc <8 x double> @__svml_atan28
				; CHECK: ret

				declare double @__atan2_finite(double, double) local_unnamed_addr #0

				define void @atan2_finite([100 x double]* nocapture %varray) local_unnamed_addr #0 {
				entry:
				br label %for.cond1.preheader

				for.cond1.preheader: ; preds = %for.inc7, %entry
				%indvars.iv19 = phi i64 [ 0, %entry ], [ %indvars.iv.next20, %for.inc7 ]
				%0 = trunc i64 %indvars.iv19 to i32
				%conv = sitofp i32 %0 to double
				br label %for.body3

				for.body3: ; preds = %for.body3, %for.cond1.preheader
				%indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next, %for.body3 ]
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%1 = trunc i64 %indvars.iv.next to i32
				%conv4 = sitofp i32 %1 to double
				%call = tail call fast double @__atan2_finite(double %conv, double %conv4)
				%arrayidx6 = getelementptr inbounds [100 x double], [100 x double]* %varray, i64 %indvars.iv19, i64 %indvars.iv
				store double %call, double* %arrayidx6, align 8
				%exitcond = icmp eq i64 %indvars.iv.next, 100
				br i1 %exitcond, label %for.inc7, label %for.body3, !llvm.loop !5

				for.inc7: ; preds = %for.body3
				%indvars.iv.next20 = add nuw nsw i64 %indvars.iv19, 1
				%exitcond21 = icmp eq i64 %indvars.iv.next20, 100
				br i1 %exitcond21, label %for.end9, label %for.cond1.preheader

				for.end9: ; preds = %for.inc7
				ret void
				}

				attributes #0 = { nounwind readnone }
				!5 = distinct !{!5, !6, !7}
				!6 = !{!"llvm.loop.vectorize.width", i32 8}
				!7 = !{!"llvm.loop.vectorize.enable", i1 true}
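
The updated CHECK lines above expect the high-accuracy `_ha` suffix on SVML calls made without fast-math flags, while the `fast` call in `@atan2_finite` is matched only by the prefix `@__svml_atan28`. A minimal sketch of that selection follows. It assumes, and the test does not confirm this, that fast calls use the unsuffixed name. `svmlMangleSketch` is a hypothetical helper for illustration, not the patch's `svmlMangle()`.

```cpp
#include <string>

// Hypothetical helper (not from the patch): choose the SVML symbol name for a
// vectorized call.
// Assumption: calls without the 'fast' flag get the high-accuracy "_ha"
// variant, as the updated CHECK lines expect; 'fast' calls keep the
// unsuffixed name.
std::string svmlMangleSketch(const std::string &VectorName, bool IsFast) {
  return IsFast ? VectorName : VectorName + "_ha";
}
```

Under these assumptions, `svmlMangleSketch("__svml_cos4", false)` yields the `__svml_cos4_ha` name checked in `@cos_f64_intrinsic`.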

utils/TableGen/CMakeLists.txt

...
add_tablegen(llvm-tblgen LLVM
PseudoLoweringEmitter.cpp		PseudoLoweringEmitter.cpp
RISCVCompressInstEmitter.cpp		RISCVCompressInstEmitter.cpp
RegisterBankEmitter.cpp		RegisterBankEmitter.cpp
RegisterInfoEmitter.cpp		RegisterInfoEmitter.cpp
SDNodeProperties.cpp		SDNodeProperties.cpp
SearchableTableEmitter.cpp		SearchableTableEmitter.cpp
SubtargetEmitter.cpp		SubtargetEmitter.cpp
SubtargetFeatureInfo.cpp		SubtargetFeatureInfo.cpp
		SVMLEmitter.cpp
TableGen.cpp		TableGen.cpp
Types.cpp		Types.cpp
X86DisassemblerTables.cpp		X86DisassemblerTables.cpp
X86EVEX2VEXTablesEmitter.cpp		X86EVEX2VEXTablesEmitter.cpp
X86FoldTablesEmitter.cpp		X86FoldTablesEmitter.cpp
X86ModRMFilters.cpp		X86ModRMFilters.cpp
X86RecognizableInstr.cpp		X86RecognizableInstr.cpp
WebAssemblyDisassemblerEmitter.cpp		WebAssemblyDisassemblerEmitter.cpp
CTagsEmitter.cpp		CTagsEmitter.cpp
)		)
set_target_properties(llvm-tblgen PROPERTIES FOLDER "Tablegenning")		set_target_properties(llvm-tblgen PROPERTIES FOLDER "Tablegenning")

utils/TableGen/SVMLEmitter.cpp

This file was added.

				//===------ SVMLEmitter.cpp - Generate SVML function variants -------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This tablegen backend emits the scalar to svml function map for TLI.
				//
				//===----------------------------------------------------------------------===//

				#include "CodeGenTarget.h"
				#include "llvm/Support/Format.h"
				#include "llvm/TableGen/Error.h"
				#include "llvm/TableGen/Record.h"
				#include "llvm/TableGen/TableGenBackend.h"
				#include <map>
				#include <vector>

				using namespace llvm;

				#define DEBUG_TYPE "SVMLVariants"
				#include "llvm/Support/Debug.h"

				namespace {

				class SVMLVariantsEmitter {

				RecordKeeper &Records;

				private:
				void emitSVMLVariants(raw_ostream &OS);

				public:
				SVMLVariantsEmitter(RecordKeeper &R) : Records(R) {}

				void run(raw_ostream &OS);
				};
				} // End anonymous namespace

				/// \brief Emit the set of SVML variant function names.
				// The default is to emit the high accuracy SVML variants until a mechanism is
				// introduced to allow a selection of different variants through precision
				// requirements specified by the user. This code generates mappings to svml
				// that are in the scalar form of llvm intrinsics, math library calls, or the
				// finite variants of math library calls.
				void SVMLVariantsEmitter::emitSVMLVariants(raw_ostream &OS) {

				const unsigned MinSinglePrecVL = 4;
				const unsigned MaxSinglePrecVL = 16;
				craig.topper: Mark these 'const'?
				const unsigned MinDoublePrecVL = 2;
				const unsigned MaxDoublePrecVL = 8;

				OS << "#ifdef GET_SVML_VARIANTS\n";

				craig.topper: This looks to be a dead variable in release builds.
				for (const auto &D : Records.getAllDerivedDefinitions("SvmlVariant")) {
				StringRef SvmlVariantNameStr = D->getName();
				// Single Precision SVML
				for (unsigned VL = MinSinglePrecVL; VL <= MaxSinglePrecVL; VL *= 2) {
				// Emit the scalar math library function to svml function entry.
				OS << "{\"" << SvmlVariantNameStr << "f" << "\", ";
				craig.topper: I think this should be Records.getAllDerivedDefinitions("SvmlVariant")
				OS << "\"" << "__svml_" << SvmlVariantNameStr << "f" << VL << "\", "
				lebedev.ri: This surely can be `StringRef`
				<< VL << "},\n";

				// Emit the scalar intrinsic to svml function entry.
				OS << "{\"" << "llvm." << SvmlVariantNameStr << ".f32" << "\", ";
				OS << "\"" << "__svml_" << SvmlVariantNameStr << "f" << VL << "\", "
				<< VL << "},\n";

				// Emit the finite math library function to svml function entry.
				OS << "{\"__" << SvmlVariantNameStr << "f_finite" << "\", ";
				OS << "\"" << "__svml_" << SvmlVariantNameStr << "f" << VL << "\", "
				<< VL << "},\n";
				}

				// Double Precision SVML
				for (unsigned VL = MinDoublePrecVL; VL <= MaxDoublePrecVL; VL *= 2) {
				// Emit the scalar math library function to svml function entry.
				OS << "{\"" << SvmlVariantNameStr << "\", ";
				OS << "\"" << "__svml_" << SvmlVariantNameStr << VL << "\", " << VL
				<< "},\n";

				// Emit the scalar intrinsic to svml function entry.
				OS << "{\"" << "llvm." << SvmlVariantNameStr << ".f64" << "\", ";
				OS << "\"" << "__svml_" << SvmlVariantNameStr << VL << "\", " << VL
				<< "},\n";

				// Emit the finite math library function to svml function entry.
				OS << "{\"__" << SvmlVariantNameStr << "_finite" << "\", ";
				OS << "\"" << "__svml_" << SvmlVariantNameStr << VL << "\", "
				<< VL << "},\n";
				}
				}

				OS << "#endif // GET_SVML_VARIANTS\n\n";
				}

				void SVMLVariantsEmitter::run(raw_ostream &OS) {
				emitSVMLVariants(OS);
				}

				namespace llvm {

				void EmitSVMLVariants(RecordKeeper &RK, raw_ostream &OS) {
				SVMLVariantsEmitter(RK).run(OS);
				}

				} // End llvm namespace
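
To make the emitter's output concrete, here is a minimal standalone sketch of the rows that the two loops in `emitSVMLVariants` would produce for a single variant name. It follows the loop bounds and string concatenation shown above. The `SvmlEntry` struct and `svmlEntriesFor` function are hypothetical names introduced for illustration, and the sketch builds an in-memory list rather than writing to a `raw_ostream`.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one emitted table row:
// {scalar function name, SVML vector function name, vector length}.
struct SvmlEntry {
  std::string ScalarName;
  std::string VectorName;
  unsigned VL;
};

// Mirrors the loop structure of emitSVMLVariants for one variant name,
// for example "cos". For each vector length it records three scalar
// spellings: the libm call, the LLVM intrinsic, and the finite variant.
std::vector<SvmlEntry> svmlEntriesFor(const std::string &Variant) {
  std::vector<SvmlEntry> Entries;

  // Single precision: VL = 4, 8, 16.
  for (unsigned VL = 4; VL <= 16; VL *= 2) {
    const std::string Vec = "__svml_" + Variant + "f" + std::to_string(VL);
    Entries.push_back({Variant + "f", Vec, VL});
    Entries.push_back({"llvm." + Variant + ".f32", Vec, VL});
    Entries.push_back({"__" + Variant + "f_finite", Vec, VL});
  }

  // Double precision: VL = 2, 4, 8.
  for (unsigned VL = 2; VL <= 8; VL *= 2) {
    const std::string Vec = "__svml_" + Variant + std::to_string(VL);
    Entries.push_back({Variant, Vec, VL});
    Entries.push_back({"llvm." + Variant + ".f64", Vec, VL});
    Entries.push_back({"__" + Variant + "_finite", Vec, VL});
  }
  return Entries;
}
```

For the variant `cos`, this yields 18 rows, starting with `{"cosf", "__svml_cosf4", 4}` and ending with `{"__cos_finite", "__svml_cos8", 8}`.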

utils/TableGen/TableGen.cpp

...
enum ActionType {
GenOptParserDefs,		GenOptParserDefs,
GenCTags,		GenCTags,
GenAttributes,		GenAttributes,
GenSearchableTables,		GenSearchableTables,
GenGlobalISel,		GenGlobalISel,
GenX86EVEX2VEXTables,		GenX86EVEX2VEXTables,
GenX86FoldTables,		GenX86FoldTables,
GenRegisterBank,		GenRegisterBank,
		GenSVMLVariants,
};		};

namespace {		namespace {
cl::opt<ActionType>		cl::opt<ActionType>
Action(cl::desc("Action to perform:"),		Action(cl::desc("Action to perform:"),
cl::values(clEnumValN(PrintRecords, "print-records",		cl::values(clEnumValN(PrintRecords, "print-records",
"Print all records to stdout (default)"),		"Print all records to stdout (default)"),
clEnumValN(GenEmitter, "gen-emitter",		clEnumValN(GenEmitter, "gen-emitter",
...
"Generate generic binary-searchable table"),		"Generate generic binary-searchable table"),
clEnumValN(GenGlobalISel, "gen-global-isel",		clEnumValN(GenGlobalISel, "gen-global-isel",
"Generate GlobalISel selector"),		"Generate GlobalISel selector"),
clEnumValN(GenX86EVEX2VEXTables, "gen-x86-EVEX2VEX-tables",		clEnumValN(GenX86EVEX2VEXTables, "gen-x86-EVEX2VEX-tables",
"Generate X86 EVEX to VEX compress tables"),		"Generate X86 EVEX to VEX compress tables"),
clEnumValN(GenX86FoldTables, "gen-x86-fold-tables",		clEnumValN(GenX86FoldTables, "gen-x86-fold-tables",
"Generate X86 fold tables"),		"Generate X86 fold tables"),
clEnumValN(GenRegisterBank, "gen-register-bank",		clEnumValN(GenRegisterBank, "gen-register-bank",
"Generate registers bank descriptions")));		"Generate registers bank descriptions"),
		clEnumValN(GenSVMLVariants, "gen-svml",
		"Generate SVML variant function names")));

cl::OptionCategory PrintEnumsCat("Options for -print-enums");		cl::OptionCategory PrintEnumsCat("Options for -print-enums");
cl::opt<std::string>		cl::opt<std::string>
Class("class", cl::desc("Print Enum list for this class"),		Class("class", cl::desc("Print Enum list for this class"),
cl::value_desc("class name"), cl::cat(PrintEnumsCat));		cl::value_desc("class name"), cl::cat(PrintEnumsCat));

bool LLVMTableGenMain(raw_ostream &OS, RecordKeeper &Records) {		bool LLVMTableGenMain(raw_ostream &OS, RecordKeeper &Records) {
switch (Action) {		switch (Action) {
...
case GenRegisterBank:
EmitRegisterBank(Records, OS);		EmitRegisterBank(Records, OS);
break;		break;
case GenX86EVEX2VEXTables:		case GenX86EVEX2VEXTables:
EmitX86EVEX2VEXTables(Records, OS);		EmitX86EVEX2VEXTables(Records, OS);
break;		break;
case GenX86FoldTables:		case GenX86FoldTables:
EmitX86FoldTables(Records, OS);		EmitX86FoldTables(Records, OS);
break;		break;
		case GenSVMLVariants:
		EmitSVMLVariants(Records, OS);
		break;
}		}

return false;		return false;
}		}
}		}

int main(int argc, char **argv) {		int main(int argc, char **argv) {
sys::PrintStackTraceOnErrorSignal(argv[0]);		sys::PrintStackTraceOnErrorSignal(argv[0]);
...

utils/TableGen/TableGenBackends.h

	...
	void EmitOptParser(RecordKeeper &RK, raw_ostream &OS);			void EmitOptParser(RecordKeeper &RK, raw_ostream &OS);
	void EmitCTags(RecordKeeper &RK, raw_ostream &OS);			void EmitCTags(RecordKeeper &RK, raw_ostream &OS);
	void EmitAttributes(RecordKeeper &RK, raw_ostream &OS);			void EmitAttributes(RecordKeeper &RK, raw_ostream &OS);
	void EmitSearchableTables(RecordKeeper &RK, raw_ostream &OS);			void EmitSearchableTables(RecordKeeper &RK, raw_ostream &OS);
	void EmitGlobalISel(RecordKeeper &RK, raw_ostream &OS);			void EmitGlobalISel(RecordKeeper &RK, raw_ostream &OS);
	void EmitX86EVEX2VEXTables(RecordKeeper &RK, raw_ostream &OS);			void EmitX86EVEX2VEXTables(RecordKeeper &RK, raw_ostream &OS);
	void EmitX86FoldTables(RecordKeeper &RK, raw_ostream &OS);			void EmitX86FoldTables(RecordKeeper &RK, raw_ostream &OS);
	void EmitRegisterBank(RecordKeeper &RK, raw_ostream &OS);			void EmitRegisterBank(RecordKeeper &RK, raw_ostream &OS);
				void EmitSVMLVariants(RecordKeeper &RK, raw_ostream &OS);

	} // End llvm namespace			} // End llvm namespace

	#endif			#endif

utils/vim/syntax/llvm.vim

...
syn keyword llvmKeyword
\ hhvmcc		\ hhvmcc
\ hhvm_ccc		\ hhvm_ccc
\ hidden		\ hidden
\ initialexec		\ initialexec
\ inlinehint		\ inlinehint
\ inreg		\ inreg
\ inteldialect		\ inteldialect
\ intel_ocl_bicc		\ intel_ocl_bicc
		\ intel_svmlcc
\ internal		\ internal
\ linkonce		\ linkonce
\ linkonce_odr		\ linkonce_odr
\ localdynamic		\ localdynamic
\ localexec		\ localexec
\ local_unnamed_addr		\ local_unnamed_addr
\ minsize		\ minsize
\ module		\ module
...

Revision Contents

Diff 150612
