This is an archive of the discontinued LLVM Phabricator instance.


Differential D88154

Initial support for vectorization using Libmvec (GLIBC vector math library).
ClosedPublic

Authored by venkataramanan.kumar.llvm on Sep 23 2020, 8:02 AM.


Details

Reviewers

Florian
abique
nemanjai
hfinkel
mmasten
mzolotukhin
rengolin
fpetrogalli
craig.topper
spatel

Commits

rG57cdc52c4df0: Initial support for vectorization using Libmvec (GLIBC vector math library)

Summary

Initial support for vectorization using Libmvec (GLIBC vector math library).


Event Timeline

venkataramanan.kumar.llvm created this revision. Sep 23 2020, 8:02 AM

Herald added projects: Restricted Project, Restricted Project. Sep 23 2020, 8:02 AM

Herald added subscribers: llvm-commits, cfe-commits, dang and 2 others.

venkataramanan.kumar.llvm requested review of this revision. Sep 23 2020, 8:02 AM

In this initial version I have added support for the following vector functions (VF 2 and 4):

sin
cos
exp
pow
log

Also added test cases similar to SVML under X86. I am not sure about other targets.
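For context, the kind of scalar loop this patch targets might look like the sketch below. The function name `apply_sin` is hypothetical. Whether the loop is actually rewritten into a libmvec call such as `_ZGVbN2v_sin` depends on the compiler version, the target, and the flags; something like `clang++ -O2 -ffast-math -fveclib=libmvec` on x86-64 Linux is an assumption here, not a guarantee.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical example loop: a scalar libm call applied element-wise.
// With a vector library selected, the loop vectorizer may replace the
// scalar sin() call with a vector variant; otherwise the call stays scalar.
void apply_sin(const double *in, double *out, std::size_t n) {
  assert(in != nullptr && out != nullptr);
  for (std::size_t i = 0; i < n; ++i)
    out[i] = std::sin(in[i]);
}
```

The results are the same either way up to the accuracy of the vector routines; only the generated calls differ.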

steleman added a subscriber: steleman. Sep 23 2020, 8:15 AM

Looks good to me.
Regarding the tests, it seems that you check whether auto-vectorization takes advantage of libmvec?
Would it be interesting to have a test which declares a vector and calls the builtin sin on it?

Thank you very much for the changes! :)

clang/include/clang/Driver/Options.td
1582	I think glibc always refers to the library as "libmvec" in lower case, should we do so here?

Harbormaster completed remote builds in B72659: Diff 293736. Sep 23 2020, 8:41 AM

jdoerfert added a subscriber: jdoerfert. Sep 23 2020, 8:56 AM

venkataramanan.kumar.llvm added reviewers: nemanjai, hfinkel, mmasten, mzolotukhin. Sep 24 2020, 5:50 AM

venkataramanan.kumar.llvm added a reviewer: rengolin. Sep 24 2020, 11:17 PM

rengolin added a reviewer: fpetrogalli. Sep 25 2020, 1:43 AM

rengolin added inline comments. Sep 25 2020, 1:47 AM

clang/include/clang/Driver/Options.td
1582	Yes, so clang can be a drop in replacement.

venkataramanan.kumar.llvm added a reviewer: craig.topper. Sep 29 2020, 6:21 AM

Selection of the Glibc vector math library is enabled via the option -fveclib=libmvec.

Herald added a subscriber: pengfei. Oct 4 2020, 3:01 AM

In D88154#2290205, @abique wrote:

Looks good to me.
Regarding the tests, it seems that you check whether auto-vectorization takes advantage of libmvec?
Would it be interesting to have a test which declares a vector and calls the builtin sin on it?

Thank you very much for the changes! :)

Do we have built-in support for sin that takes vector types?

I tried

__m128d compute_sin(__m128d x)
{
  return __builtin_sin(x);
}

error: passing '__m128d' (vector of 2 'double' values) to parameter of incompatible type 'double'

Pinging for review comments.

In D88154#2310653, @venkataramanan.kumar.llvm wrote:
Do we have built-in support for sin that takes vector types?

I tried

__m128d compute_sin(__m128d x)
{
  return __builtin_sin(x);
}

error: passing '__m128d' (vector of 2 'double' values) to parameter of incompatible type 'double'

We have LLVM intrinsics for sin/cos that may use vector types:
http://llvm.org/docs/LangRef.html#llvm-sin-intrinsic
...but I don't know of a way to produce those directly from C source.
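As a rough illustration of the point above, one way to get a "vector sin" from C or C++ source without a vector-typed builtin is to apply the scalar builtin lane by lane. This is only a sketch: the type alias `v2f64` and this version of `compute_sin` are hypothetical, and it relies on the GCC/Clang `vector_size` extension rather than `__m128d`. Whether the compiler later turns the two scalar calls into one vector library call is not guaranteed.

```cpp
#include <cassert>

// GCC/Clang vector extension: two doubles packed in a 16-byte vector.
typedef double v2f64 __attribute__((vector_size(16)));

// __builtin_sin only accepts a scalar double, so call it once per lane.
v2f64 compute_sin(v2f64 x) {
  v2f64 r = {0.0, 0.0};
  for (int i = 0; i < 2; ++i)
    r[i] = __builtin_sin(x[i]);
  return r;
}
```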

llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls.ll
2	Why does this test file use command-line options to specify the vector factor and the other uses metadata? If we can use metadata, then can you vary it to get better coverage (for example <2 x double> or <8 x float>)?
222–225	It would be better to consistently put the FileCheck lines after the 'define'. Can you auto-generate the CHECK lines using llvm/utils/update_test_checks.py ?

fpetrogalli added inline comments. Oct 8 2020, 1:47 AM

llvm/include/llvm/Analysis/TargetLibraryInfo.h
91	Can we call this LIBMVEC-X86? Libmvec itself is supposed to support other architectures, so I can see a list of mappings for each of the supported targets. Then, the logic of selecting the correct one in the clang frontend would depend on the value of `-fveclib=libmvec` plus the value of `-target`.

nemanjai added inline comments. Oct 8 2020, 3:54 AM

llvm/include/llvm/Analysis/TargetLibraryInfo.h
91	So if I follow correctly, we can choose the various vendor-specific libraries as well as `libmvec` which itself has target-specific ports. Would it make sense to just add an overload of `addVectorizableFunctions()` that would consider the `Triple` and remove any entries from `VectorDescs` that the target doesn't support? Or even more specifically, simply add the `Triple` argument to `addVectorizableFunctionsFromVecLib()` and call something like `removeLIBMVECEntriesForTarget(const Triple &T)` that would do the job. And of course, if the triple isn't provided and the user is targeting an architecture that doesn't provide some entry, that is just user error.

fpetrogalli added inline comments. Oct 8 2020, 7:16 AM

llvm/include/llvm/Analysis/TargetLibraryInfo.h
91	The overload of `addVectorizableFunctions()` might be feasible, but for the sake of simplicity I think that having `LIBMVEC_<TARGET>` in `enum VectorLibrary` for each <TARGET> to support would avoid having to deal with overloaded methods. Given that these lists are static, I'd rather see them explicitly instead of having them filled up by add/remove methods. All in all, I think it is easier to add the logic for the target triple in clang, as it is just a matter of modifying the changes in `BackendUtil.cpp` (warning, pseudocode ahead):
	case CodeGenOptions::LIBMVEC:
	  switch(Triple) {
	  case X:
	    TLII->addVectorizableFunctionsFromVecLib(TargetLibraryInfoImpl::LIBMVEC_X);
	    break;
	  case Y:
	    TLII->addVectorizableFunctionsFromVecLib(TargetLibraryInfoImpl::LIBMVEC_Y);
	    break;
	  case ...
	  }
	  break;
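The dispatch proposed in the pseudocode can be sketched as a small standalone function. Everything here is a stand-in: `Arch`, `FrontendVecLib`, `BackendVecLib`, and `selectVecLib` are hypothetical names that only mimic `llvm::Triple::ArchType`, `CodeGenOptions::VectorLibrary`, and `TargetLibraryInfoImpl::VectorLibrary`; this is not the code in `BackendUtil.cpp`.

```cpp
#include <cassert>

// Hypothetical stand-ins for the LLVM/clang types discussed in the review.
enum class Arch { x86_64, aarch64, other };
enum class FrontendVecLib { NoLibrary, Accelerate, LIBMVEC, MASSV, SVML };
enum class BackendVecLib { NoLibrary, Accelerate, LIBMVEC_X86, MASSV, SVML };

// Map the frontend choice plus the target architecture to one static
// per-target mapping table, as suggested in the comment above.
BackendVecLib selectVecLib(FrontendVecLib lib, Arch arch) {
  switch (lib) {
  case FrontendVecLib::Accelerate:
    return BackendVecLib::Accelerate;
  case FrontendVecLib::LIBMVEC:
    switch (arch) {
    case Arch::x86_64:
      return BackendVecLib::LIBMVEC_X86;
    default:
      // No libmvec table for this target in the sketch: use no library.
      return BackendVecLib::NoLibrary;
    }
  case FrontendVecLib::MASSV:
    return BackendVecLib::MASSV;
  case FrontendVecLib::SVML:
    return BackendVecLib::SVML;
  case FrontendVecLib::NoLibrary:
    break;
  }
  return BackendVecLib::NoLibrary;
}
```

Supporting another libmvec port would then mean adding one enumerator and one inner `case`, with the mapping list itself staying static.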

In D88154#2314352, @spatel wrote:
We have LLVM intrinsics for sin/cos that may use vector types:
http://llvm.org/docs/LangRef.html#llvm-sin-intrinsic
...but I don't know of a way to produce those directly from C source.

OK, I see an intrinsic for the fp128 type in LangRef.

---Snip---
declare fp128 @llvm.cos.f128(fp128)

define fp128 @test_cos(float %float, double %double, fp128 %fp128) {
  %cosfp128 = call fp128 @llvm.cos.f128(fp128 %fp128)
  ret fp128 %cosfp128
}
---Snip---

fp128 is treated as long double, and I see a call to cosl:

vmovaps %xmm2, %xmm0
jmp     cosl

So I am not sure how to generate vector calls via built-ins.

Changed the library name to LIBMVEC-X86 as per the comments; clang now selects it based on the target triple.
I am still working on auto-generating the FileCheck lines for the test cases.

As per review comments from Sanjay, updated the test case to use metadata. Also autogenerated the checks in the test cases using llvm/utils/update_test_checks.py.

In D88154#2325713, @venkataramanan.kumar.llvm wrote:

As per review comments from Sanjay, updated the test case to use metadata. Also autogenerated the checks in the test cases using llvm/utils/update_test_checks.py.

Thanks. But wouldn't it be better test coverage to vary the "llvm.loop.vectorize.width"? Why is it always "4"?
I don't have any experience with this lib, so no real feedback on the code itself. Hopefully another reviewer can approve if there are no other concerns.

Hi,

Thank you for modifying the implementation to facilitate the extension to other targets.

Please add libmvec-specific tests in clang/Driver/fveclib.c, and simplify the loop-vectorize tests.

Other than that, I only have minor comments.

Francesco

clang/lib/CodeGen/BackendUtil.cpp
377	Nit: is this misaligned?
llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-finite.ll
2	`-inject-tli-mappings` is not required here, as the pass itself is required by the loop vectorizer.
6	Nit: it is standard practice to put all declarations at the end of the file.
9	I think you are over-testing here. It is enough to check that inside the vector body there is a call to the vector function you have listed in the mapping. You are not checking for the whole auto-vectorization process here, just the vectorization of the function call. Same for all the tests for this patch in which you are doing something similar to this one test:
	; CHECK-LABEL: @exp_f32(
	; CHECK-LABEL: vector.body:
	; CHECK: call fast <4 x float> @_ZGVbN4v___expf_finite(<4 x float>

spatel added inline comments. Oct 13 2020, 6:14 AM

llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-finite.ll
9	I requested using "utils/update_test_checks.py" to auto-generate the assertions consistently. We have standardized on this practice for tests in several passes because it provides extra test coverage (at the risk of over-testing), and it makes updating tests in the future nearly automatic. The time cost of checking the extra lines is negligible vs. the benefit that we have gotten in finding/avoiding bugs. If the consensus is that it's not worth it on this particular file, I'm ok with that. But the general trend is definitely towards auto-generating full checks.
69	The script should have warned you about using variables named "tmp". Independent of whether we choose to use the scripted assertions or not, you should change this value name (even plain "t" for "trunc" is an improvement over "tmp").

fhahn added a subscriber: fhahn. Oct 13 2020, 6:35 AM

fhahn added inline comments.

llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-finite.ll
2	I guess it still doesn't hurt to be explicit. Also, can you add a line for the new pass manager?
9	FWIW in this case I would also slightly prefer to have more targeted test lines than auto-generating them (same for most loop-vectorize tests). The tests are large and LV generates a lot of code, and a lot of the code is completely unrelated/uninteresting to the code in the patch. IMO it would be enough to check the arguments of the vector function calls, together with the calls and maybe the induction variable. The auto-generated check lines make things much more brittle, and unrelated changes end up requiring us to update lots of tests. And I am not sure if it is feasible to audit all details of the generated check lines (in the current patch ~500-800 new CHECK lines). So to me it seems like auto-generating checks here gives a false sense of security and makes things harder down the line.
82	I don't think we need this. You can just pass `-force-vector-width=4` to the command line and avoid the extra metadata duplicated for each test

fhahn added inline comments. Oct 13 2020, 6:39 AM

llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls.ll
2	FWIW most LV tests use `-force-vector-width`. If we decide to just go for the 'essential' check lines, it should be easy to add multiple run lines with different VFs from the command line?

fhahn added inline comments. Oct 13 2020, 6:40 AM

llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-finite.ll
82	Apologies, I missed that this was suggested earlier to test different VFs. If we decide to go with the 'essential' check lines approach it might make sense to just invoke opt with different VFs using `force-vector-width`.

Updated the patch as per review comments received.

The test cases are updated with the checks based on the below comment from Francesco.
---Snip--
I think you are over-testing here. It is enough to check that inside the vector body there is a call to the vector function you have listed in the mapping.
---Snip--
Florian also suggested the same.

Another comment from Florian.
---Snip--
If we decide to go with the 'essential' check lines approach it might make sense to just invoke opt with different VFs using force-vector-width.
--Snip--

I still use the metadata suggested by Sanjay, because I am currently testing only VF=4.

We have both float and double type lib calls in the test case, and libmvec has vector call support for both at VF=4. For VF=8, for example, there is no vector call support for double types.

I can add a few more float type tests with metadata for VF=8. Please let me know your suggestions.
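The VF constraint described above can be illustrated with a tiny lookup over a subset of the mappings this patch adds to VecFuncs.def. The `Mapping` struct and `findVectorName` function are hypothetical helpers for illustration only; they are not the `TargetLibraryInfo` API.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, simplified record: scalar name, vector name, vectorization factor.
struct Mapping {
  const char *ScalarName;
  const char *VectorName;
  unsigned VF;
};

// Subset of the libmvec x86 entries listed in this revision's VecFuncs.def.
static const Mapping kLibmvecX86[] = {
    {"sin", "_ZGVbN2v_sin", 2},   {"sin", "_ZGVdN4v_sin", 4},
    {"sinf", "_ZGVbN4v_sinf", 4}, {"sinf", "_ZGVdN8v_sinf", 8},
};

// Return the vector function for (scalar, vf), or nullptr if none is listed.
const char *findVectorName(const char *scalar, unsigned vf) {
  assert(scalar != nullptr);
  for (const Mapping &m : kLibmvecX86)
    if (m.VF == vf && std::strcmp(m.ScalarName, scalar) == 0)
      return m.VectorName;
  return nullptr;
}
```

With this table, `findVectorName("sinf", 8)` finds an entry while `findVectorName("sin", 8)` does not, which is why a VF=8 test only makes sense for the float variants.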

LGTM from the perspective of making sure that this solution can be extended to any of the architectures that libmvec supports.

Thank you.

In D88154#2328312, @venkataramanan.kumar.llvm wrote:

I can add a few more float type tests with metadata for VF=8. Please let me know your suggestions.

I may be missing some subtlety of the vectorizer behavior. Can we vary the test types + metadata in 1 file, so that there is coverage for something like this v2f64 call : TLI_DEFINE_VECFUNC("llvm.sin.f64", "_ZGVbN2v_sin", 2)?
I'm just trying to make sure we don't fall into some blind-spot by only testing VF=4.
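For readers unfamiliar with names like `_ZGVbN2v_sin`, the sketch below rebuilds them from their parts. It assumes the x86 vector function ABI naming scheme (`_ZGV`, an ISA letter, `N` for unmasked, the vector length, one `v` per vector parameter, `_`, then the scalar name) and the ISA letters 'b' = SSE, 'c' = AVX, 'd' = AVX2, 'e' = AVX-512. The helper `vectorVariantName` is hypothetical and is not part of LLVM.

```cpp
#include <cassert>
#include <string>

// Build an unmasked vector-variant name under the assumed x86 vector ABI scheme.
// isa: 'b' (SSE), 'c' (AVX), 'd' (AVX2), 'e' (AVX-512).
std::string vectorVariantName(char isa, unsigned vf, unsigned numVectorParams,
                              const std::string &scalarName) {
  assert(vf > 0 && numVectorParams > 0);
  std::string name = "_ZGV";
  name += isa;
  name += 'N';                       // 'N' marks the unmasked variant
  name += std::to_string(vf);        // number of lanes
  name.append(numVectorParams, 'v'); // one 'v' per vector parameter
  name += '_';
  name += scalarName;
  return name;
}
```

Under these assumptions, the v2f64 mapping mentioned above corresponds to `vectorVariantName('b', 2, 1, "sin")`.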

llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-finite.ll
2	We need to be explicit about that pass with new-pass-manager as shown here: df5576a cc @aeubanks as I'm not sure if we want to update tests with NPM RUN lines or if we want to silently transition whenever the default gets changed.

Added a test case for testing vector library calls for VF=2 and VF=8.

Herald added a subscriber: dexonsmith. Oct 20 2020, 12:18 AM

Remove an incorrect file that got attached with my earlier patch.

LGTM.
I'm not sure why we need 3 different test files to test very similar patterns - just programmer preference?
Wait a day or so to commit in case anyone else has comments.

This revision is now accepted and ready to land. Oct 20 2020, 9:03 AM

Thanks @spatel, @Florian and @fpetrogalli for the review comments and approval. Can someone please commit it on my behalf?

Closed by commit rG57cdc52c4df0: Initial support for vectorization using Libmvec (GLIBC vector math library) (authored by venkataramanan.kumar.llvm, committed by spatel). Oct 22 2020, 1:10 PM

This revision was automatically updated to reflect the committed changes.

spatel added a commit: rG57cdc52c4df0: Initial support for vectorization using Libmvec (GLIBC vector math library).

tim.schmielau mentioned this in D116879: [llvm] Allow forced auto-vectorization of sincos() using libmvec. Jan 9 2022, 12:34 AM

Revision Contents

Path	Size
clang/include/clang/Basic/CodeGenOptions.h	2 lines
clang/include/clang/Basic/CodeGenOptions.def	2 lines
clang/include/clang/Driver/Options.td	2 lines
clang/lib/CodeGen/BackendUtil.cpp	10 lines
clang/lib/Frontend/CompilerInvocation.cpp	2 lines
clang/test/Driver/autocomplete.c	1 line
llvm/include/llvm/Analysis/TargetLibraryInfo.h	1 line
llvm/include/llvm/Analysis/VecFuncs.def	82 lines
llvm/lib/Analysis/TargetLibraryInfo.cpp	10 lines
llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-finite.ll	484 lines
llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls.ll	1010 lines
llvm/test/Transforms/Util/add-TLI-mappings.ll	12 lines

Diff 297654

clang/include/clang/Basic/CodeGenOptions.h

@@ enum InliningMethod {
     NormalInlining, // Use the standard function inlining pass.
     OnlyHintInlining, // Inline only (implicitly) hinted functions.
     OnlyAlwaysInlining // Only run the always inlining pass.
   };
 
   enum VectorLibrary {
     NoLibrary, // Don't use any vector library.
     Accelerate, // Use the Accelerate framework.
+    LIBMVEC, // GLIBC vector math library.
     MASSV, // IBM MASS vector library.
     SVML // Intel short vector math library.
   };
 
 
   enum ObjCDispatchMethodKind {
     Legacy = 0,
     NonLegacy = 1,
     Mixed = 2
   };
 
   enum TLSModel {
     GeneralDynamicTLSModel,

clang/include/clang/Basic/CodeGenOptions.def

 /// Whether to emit the .debug$H section containing hashes of CodeView types.
 CODEGENOPT(CodeViewGHash, 1, 0)
 
 /// The kind of inlining to perform.
 ENUM_CODEGENOPT(Inlining, InliningMethod, 2, NormalInlining)
 
 // Vector functions library to use.
-ENUM_CODEGENOPT(VecLib, VectorLibrary, 2, NoLibrary)
+ENUM_CODEGENOPT(VecLib, VectorLibrary, 3, NoLibrary)
 
 /// The default TLS model to use.
 ENUM_CODEGENOPT(DefaultTLSModel, TLSModel, 2, GeneralDynamicTLSModel)
 
 /// Bit size of immediate TLS offsets (0 == use the default).
 VALUE_CODEGENOPT(TLSSize, 8, 0)
 
 /// Number of path components to strip when emitting checks. (0 == full
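The width change from 2 to 3 in `ENUM_CODEGENOPT(VecLib, ...)` is presumably needed because the enum now has five enumerators (values 0 through 4), and a 2-bit field can only hold 0 through 3. The sketch below illustrates that arithmetic. It assumes the macro's third argument is a bit-field width; the structs and functions are hypothetical stand-ins, not clang's `CodeGenOptions`.

```cpp
#include <cassert>

// Enumerator order as in this revision's CodeGenOptions.h; SVML becomes 4.
enum VectorLibrary { NoLibrary, Accelerate, LIBMVEC, MASSV, SVML };

// Hypothetical stand-ins for 2-bit and 3-bit option storage.
struct TwoBitField { unsigned VecLib : 2; };
struct ThreeBitField { unsigned VecLib : 3; };

// Store the value keeping only the low 2 bits, which is all a 2-bit field holds.
unsigned storeInTwoBits(VectorLibrary lib) {
  assert(lib <= SVML);
  TwoBitField f{};
  f.VecLib = static_cast<unsigned>(lib) & 0x3u;
  return f.VecLib;
}

// Store the value in 3 bits; values 0..7 fit, so all five enumerators survive.
unsigned storeInThreeBits(VectorLibrary lib) {
  assert(lib <= SVML);
  ThreeBitField f{};
  f.VecLib = static_cast<unsigned>(lib) & 0x7u;
  return f.VecLib;
}
```

In this model, `SVML` stored in two bits reads back as 0, which is indistinguishable from `NoLibrary`, while three bits preserve it.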

clang/include/clang/Driver/Options.td

 def fno_global_isel : Flag<["-"], "fno-global-isel">, Group<f_clang_Group>,
   HelpText<"Disables the global instruction selector">;
 def fno_experimental_isel : Flag<["-"], "fno-experimental-isel">, Group<f_clang_Group>,
   Alias<fno_global_isel>;
 def fno_experimental_new_pass_manager : Flag<["-"], "fno-experimental-new-pass-manager">,
   Group<f_clang_Group>, Flags<[CC1Option]>,
   HelpText<"Disables an experimental new pass manager in LLVM.">;
 def fveclib : Joined<["-"], "fveclib=">, Group<f_Group>, Flags<[CC1Option]>,
-    HelpText<"Use the given vector functions library">, Values<"Accelerate,MASSV,SVML,none">;
+    HelpText<"Use the given vector functions library">, Values<"Accelerate,libmvec,MASSV,SVML,none">;
 def fno_lax_vector_conversions : Flag<["-"], "fno-lax-vector-conversions">, Group<f_Group>,
   Alias<flax_vector_conversions_EQ>, AliasArgs<["none"]>;
 def fno_merge_all_constants : Flag<["-"], "fno-merge-all-constants">, Group<f_Group>,
   HelpText<"Disallow merging of constants">;
 def fno_modules : Flag <["-"], "fno-modules">, Group<f_Group>,
   Flags<[DriverOption]>;
 def fno_implicit_module_maps : Flag <["-"], "fno-implicit-module-maps">, Group<f_Group>,
   Flags<[DriverOption]>;

clang/lib/CodeGen/BackendUtil.cpp

 static TargetLibraryInfoImpl *createTLII(llvm::Triple &TargetTriple,
                                          const CodeGenOptions &CodeGenOpts) {
   TargetLibraryInfoImpl *TLII = new TargetLibraryInfoImpl(TargetTriple);
 
   switch (CodeGenOpts.getVecLib()) {
   case CodeGenOptions::Accelerate:
     TLII->addVectorizableFunctionsFromVecLib(TargetLibraryInfoImpl::Accelerate);
     break;
+  case CodeGenOptions::LIBMVEC:
+    switch(TargetTriple.getArch()) {
+    default:
+      break;
+    case llvm::Triple::x86_64:
+      TLII->addVectorizableFunctionsFromVecLib(
+          TargetLibraryInfoImpl::LIBMVEC_X86);
+      break;
+    }
+    break;
   case CodeGenOptions::MASSV:
     TLII->addVectorizableFunctionsFromVecLib(TargetLibraryInfoImpl::MASSV);
     break;
   case CodeGenOptions::SVML:
     TLII->addVectorizableFunctionsFromVecLib(TargetLibraryInfoImpl::SVML);
     break;
   default:
     break;

clang/lib/Frontend/CompilerInvocation.cpp

@@ static bool ParseCodeGenArgs(CodeGenOptions &Opts, ArgList &Args, InputKind IK,
   Opts.DebugPassManager =
       Args.hasFlag(OPT_fdebug_pass_manager, OPT_fno_debug_pass_manager,
                    /* Default */ false);
 
   if (Arg *A = Args.getLastArg(OPT_fveclib)) {
     StringRef Name = A->getValue();
     if (Name == "Accelerate")
       Opts.setVecLib(CodeGenOptions::Accelerate);
+    else if (Name == "libmvec")
+      Opts.setVecLib(CodeGenOptions::LIBMVEC);
     else if (Name == "MASSV")
       Opts.setVecLib(CodeGenOptions::MASSV);
     else if (Name == "SVML")
       Opts.setVecLib(CodeGenOptions::SVML);
     else if (Name == "none")
       Opts.setVecLib(CodeGenOptions::NoLibrary);
     else
       Diags.Report(diag::err_drv_invalid_value) << A->getAsString(Args) << Name;

clang/test/Driver/autocomplete.c

 // FFPALL: fast
 // FFPALL-NEXT: off
 // FFPALL-NEXT: on
 // RUN: %clang --autocomplete=-flto= | FileCheck %s -check-prefix=FLTOALL
 // FLTOALL: full
 // FLTOALL-NEXT: thin
 // RUN: %clang --autocomplete=-fveclib= | FileCheck %s -check-prefix=FVECLIBALL
 // FVECLIBALL: Accelerate
+// FVECLIBALL-NEXT: libmvec
 // FVECLIBALL-NEXT: MASSV
 // FVECLIBALL-NEXT: none
 // FVECLIBALL-NEXT: SVML
 // RUN: %clang --autocomplete=-fshow-overloads= | FileCheck %s -check-prefix=FSOVERALL
 // FSOVERALL: all
 // FSOVERALL-NEXT: best
 // RUN: %clang --autocomplete=-fvisibility= | FileCheck %s -check-prefix=FVISIBILITYALL
 // FVISIBILITYALL: default

llvm/include/llvm/Analysis/TargetLibraryInfo.h

@@ public:
   /// The vector-functions library defines, which functions are vectorizable
   /// and with which factor. The library can be specified by either frontend,
   /// or a commandline option, and then used by
   /// addVectorizableFunctionsFromVecLib for filling up the tables of
   /// vectorizable functions.
   enum VectorLibrary {
     NoLibrary, // Don't use any vector library.
     Accelerate, // Use Accelerate framework.
+    LIBMVEC_X86,// GLIBC Vector Math library.
     MASSV, // IBM MASS vector library.
     SVML // Intel short vector math library.
   };
 
   TargetLibraryInfoImpl();
   explicit TargetLibraryInfoImpl(const Triple &T);
 
   // Provide value semantics.

llvm/include/llvm/Analysis/VecFuncs.def

	Show First 20 Lines • Show All 56 Lines • ▼ Show 20 Lines
	TLI_DEFINE_VECFUNC("sinhf", "vsinhf", 4)			TLI_DEFINE_VECFUNC("sinhf", "vsinhf", 4)
	TLI_DEFINE_VECFUNC("coshf", "vcoshf", 4)			TLI_DEFINE_VECFUNC("coshf", "vcoshf", 4)
	TLI_DEFINE_VECFUNC("tanhf", "vtanhf", 4)			TLI_DEFINE_VECFUNC("tanhf", "vtanhf", 4)
	TLI_DEFINE_VECFUNC("asinhf", "vasinhf", 4)			TLI_DEFINE_VECFUNC("asinhf", "vasinhf", 4)
	TLI_DEFINE_VECFUNC("acoshf", "vacoshf", 4)			TLI_DEFINE_VECFUNC("acoshf", "vacoshf", 4)
	TLI_DEFINE_VECFUNC("atanhf", "vatanhf", 4)			TLI_DEFINE_VECFUNC("atanhf", "vatanhf", 4)


				#elif defined(TLI_DEFINE_LIBMVEC_X86_VECFUNCS)
				// GLIBC Vector math Functions

				TLI_DEFINE_VECFUNC("sin", "_ZGVbN2v_sin", 2)
				TLI_DEFINE_VECFUNC("sin", "_ZGVdN4v_sin", 4)

				TLI_DEFINE_VECFUNC("sinf", "_ZGVbN4v_sinf", 4)
				TLI_DEFINE_VECFUNC("sinf", "_ZGVdN8v_sinf", 8)

				TLI_DEFINE_VECFUNC("llvm.sin.f64", "_ZGVbN2v_sin", 2)
				TLI_DEFINE_VECFUNC("llvm.sin.f64", "_ZGVdN4v_sin", 4)

				TLI_DEFINE_VECFUNC("llvm.sin.f32", "_ZGVbN4v_sinf", 4)
				TLI_DEFINE_VECFUNC("llvm.sin.f32", "_ZGVdN8v_sinf", 8)

				TLI_DEFINE_VECFUNC("cos", "_ZGVbN2v_cos", 2)
				TLI_DEFINE_VECFUNC("cos", "_ZGVdN4v_cos", 4)

				TLI_DEFINE_VECFUNC("cosf", "_ZGVbN4v_cosf", 4)
				TLI_DEFINE_VECFUNC("cosf", "_ZGVdN8v_cosf", 8)

				TLI_DEFINE_VECFUNC("llvm.cos.f64", "_ZGVbN2v_cos", 2)
				TLI_DEFINE_VECFUNC("llvm.cos.f64", "_ZGVdN4v_cos", 4)

				TLI_DEFINE_VECFUNC("llvm.cos.f32", "_ZGVbN4v_cosf", 4)
				TLI_DEFINE_VECFUNC("llvm.cos.f32", "_ZGVdN8v_cosf", 8)

				TLI_DEFINE_VECFUNC("pow", "_ZGVbN2vv_pow", 2)
				TLI_DEFINE_VECFUNC("pow", "_ZGVdN4vv_pow", 4)

				TLI_DEFINE_VECFUNC("powf", "_ZGVbN4vv_powf", 4)
				TLI_DEFINE_VECFUNC("powf", "_ZGVdN8vv_powf", 8)

				TLI_DEFINE_VECFUNC("__pow_finite", "_ZGVbN2vv___pow_finite", 2)
				TLI_DEFINE_VECFUNC("__pow_finite", "_ZGVdN4vv___pow_finite", 4)

				TLI_DEFINE_VECFUNC("__powf_finite", "_ZGVbN4vv___powf_finite", 4)
				TLI_DEFINE_VECFUNC("__powf_finite", "_ZGVdN8vv___powf_finite", 8)

				TLI_DEFINE_VECFUNC("llvm.pow.f64", "_ZGVbN2vv_pow", 2)
				TLI_DEFINE_VECFUNC("llvm.pow.f64", "_ZGVdN4vv_pow", 4)

				TLI_DEFINE_VECFUNC("llvm.pow.f32", "_ZGVbN4vv_powf", 4)
				TLI_DEFINE_VECFUNC("llvm.pow.f32", "_ZGVdN8vv_powf", 8)

				TLI_DEFINE_VECFUNC("exp", "_ZGVbN2v_exp", 2)
				TLI_DEFINE_VECFUNC("exp", "_ZGVdN4v_exp", 4)

				TLI_DEFINE_VECFUNC("expf", "_ZGVbN4v_expf", 4)
				TLI_DEFINE_VECFUNC("expf", "_ZGVdN8v_expf", 8)

				TLI_DEFINE_VECFUNC("__exp_finite", "_ZGVbN2v___exp_finite", 2)
				TLI_DEFINE_VECFUNC("__exp_finite", "_ZGVdN4v___exp_finite", 4)

				TLI_DEFINE_VECFUNC("__expf_finite", "_ZGVbN4v___expf_finite", 4)
				TLI_DEFINE_VECFUNC("__expf_finite", "_ZGVdN8v___expf_finite", 8)

				TLI_DEFINE_VECFUNC("llvm.exp.f64", "_ZGVbN2v_exp", 2)
				TLI_DEFINE_VECFUNC("llvm.exp.f64", "_ZGVdN4v_exp", 4)

				TLI_DEFINE_VECFUNC("llvm.exp.f32", "_ZGVbN4v_expf", 4)
				TLI_DEFINE_VECFUNC("llvm.exp.f32", "_ZGVdN8v_expf", 8)

				TLI_DEFINE_VECFUNC("log", "_ZGVbN2v_log", 2)
				TLI_DEFINE_VECFUNC("log", "_ZGVdN4v_log", 4)

				TLI_DEFINE_VECFUNC("logf", "_ZGVbN4v_logf", 4)
				TLI_DEFINE_VECFUNC("logf", "_ZGVdN8v_logf", 8)

				TLI_DEFINE_VECFUNC("__log_finite", "_ZGVbN2v___log_finite", 2)
				TLI_DEFINE_VECFUNC("__log_finite", "_ZGVdN4v___log_finite", 4)

				TLI_DEFINE_VECFUNC("__logf_finite", "_ZGVbN4v___logf_finite", 4)
				TLI_DEFINE_VECFUNC("__logf_finite", "_ZGVdN8v___logf_finite", 8)

				TLI_DEFINE_VECFUNC("llvm.log.f64", "_ZGVbN2v_log", 2)
				TLI_DEFINE_VECFUNC("llvm.log.f64", "_ZGVdN4v_log", 4)

				TLI_DEFINE_VECFUNC("llvm.log.f32", "_ZGVbN4v_logf", 4)
				TLI_DEFINE_VECFUNC("llvm.log.f32", "_ZGVdN8v_logf", 8)

				#elif defined(TLI_DEFINE_MASSV_VECFUNCS)
				// IBM MASS library's vector Functions

				// Floating-Point Arithmetic and Auxiliary Functions
				TLI_DEFINE_VECFUNC("cbrt", "__cbrtd2_massv", 2)
				TLI_DEFINE_VECFUNC("cbrtf", "__cbrtf4_massv", 4)
				TLI_DEFINE_VECFUNC("pow", "__powd2_massv", 2)
				TLI_DEFINE_VECFUNC("llvm.pow.f64", "__powd2_massv", 2)
				[261 lines not shown]
				TLI_DEFINE_VECFUNC("__exp2f_finite", "__svml_exp2f16", 16)

				#else
				#error "Must choose which vector library functions are to be defined."
				#endif

				#undef TLI_DEFINE_VECFUNC
				#undef TLI_DEFINE_ACCELERATE_VECFUNCS
				#undef TLI_DEFINE_LIBMVEC_X86_VECFUNCS
				#undef TLI_DEFINE_MASSV_VECFUNCS
				#undef TLI_DEFINE_SVML_VECFUNCS
				#undef TLI_DEFINE_MASSV_VECFUNCS_NAMES
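
The LIBMVEC-X86 entries in this file all follow one naming pattern, which matches the x86 vector function ABI used by glibc's libmvec: the prefix `_ZGV`, an ISA letter (`b` for SSE and `d` for AVX2 in the entries listed here), `N` for the unmasked variant, the vector length, one `v` per vector argument, then `_` and the scalar name. The sketch below only illustrates how such a name is composed. `mangleVectorName` is a hypothetical helper introduced for this note; it is not an LLVM or glibc API.

```cpp
#include <string>

// Hypothetical helper (not part of LLVM or glibc): composes an x86
// vector-ABI style name such as "_ZGVbN2v_sin" from its parts.
//   isa     - ISA letter; the table above uses 'b' (SSE) and 'd' (AVX2)
//   vf      - vectorization factor (number of lanes)
//   numArgs - number of vector arguments; each contributes one 'v'
//   scalar  - scalar function name, e.g. "sin" or "__pow_finite"
std::string mangleVectorName(char isa, unsigned vf, unsigned numArgs,
                             const std::string &scalar) {
  std::string name = "_ZGV";
  name += isa;
  name += 'N';                 // 'N' marks the unmasked variant
  name += std::to_string(vf);  // lane count
  name.append(numArgs, 'v');   // one 'v' per vector parameter
  name += '_';
  name += scalar;
  return name;
}
```

For instance, `mangleVectorName('d', 8, 2, "powf")` yields the same string as the `powf` entry with VF 8 in the table.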

llvm/lib/Analysis/TargetLibraryInfo.cpp

[18 lines not shown]

		static cl::opt<TargetLibraryInfoImpl::VectorLibrary> ClVectorLibrary(
		"vector-library", cl::Hidden, cl::desc("Vector functions library"),
		cl::init(TargetLibraryInfoImpl::NoLibrary),
		cl::values(clEnumValN(TargetLibraryInfoImpl::NoLibrary, "none",
		"No vector functions library"),
		clEnumValN(TargetLibraryInfoImpl::Accelerate, "Accelerate",
		"Accelerate framework"),
		clEnumValN(TargetLibraryInfoImpl::LIBMVEC_X86, "LIBMVEC-X86",
		"GLIBC Vector Math library"),
		clEnumValN(TargetLibraryInfoImpl::MASSV, "MASSV",
		"IBM MASS vector library"),
		clEnumValN(TargetLibraryInfoImpl::SVML, "SVML",
		"Intel SVML library")));

		StringLiteral const TargetLibraryInfoImpl::StandardNames[LibFunc::NumLibFuncs] =
		{
		#define TLI_DEFINE_STRING
		[1,519 lines not shown]
		void TargetLibraryInfoImpl::addVectorizableFunctionsFromVecLib(
		case Accelerate: {
		const VecDesc VecFuncs[] = {
		#define TLI_DEFINE_ACCELERATE_VECFUNCS
		#include "llvm/Analysis/VecFuncs.def"
		};
		addVectorizableFunctions(VecFuncs);
		break;
		}
		case LIBMVEC_X86: {
		const VecDesc VecFuncs[] = {
		#define TLI_DEFINE_LIBMVEC_X86_VECFUNCS
		#include "llvm/Analysis/VecFuncs.def"
		};
		addVectorizableFunctions(VecFuncs);
		break;
		}
		case MASSV: {
		const VecDesc VecFuncs[] = {
		#define TLI_DEFINE_MASSV_VECFUNCS
		#include "llvm/Analysis/VecFuncs.def"
		};
		addVectorizableFunctions(VecFuncs);
		break;
		}
[107 lines not shown]
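
Each `case` in this function defines one selector macro and then includes `VecFuncs.def`, so the same file expands to a different table depending on the library chosen. A minimal, self-contained sketch of that X-macro pattern follows. It makes several assumptions: `VecDescSketch`, `SKETCH_LIBMVEC_VECFUNCS` and `findVectorName` are hypothetical stand-ins for illustration, the entry list is inlined as a macro instead of living in a separate `.def` file, and the real `VecDesc` layout and lookup code may differ.

```cpp
#include <cstring>

// Simplified stand-in for LLVM's VecDesc; the real struct may differ.
struct VecDescSketch {
  const char *ScalarName;
  const char *VectorName;
  unsigned VF;
};

// Stand-in for the body of a .def file: a plain list of macro invocations.
#define SKETCH_LIBMVEC_VECFUNCS                                              \
  TLI_DEFINE_VECFUNC("sin", "_ZGVbN2v_sin", 2)                               \
  TLI_DEFINE_VECFUNC("sin", "_ZGVdN4v_sin", 4)                               \
  TLI_DEFINE_VECFUNC("sinf", "_ZGVbN4v_sinf", 4)

// Expand every entry into a braced initializer followed by a comma,
// then drop the macro so it cannot leak into later code.
#define TLI_DEFINE_VECFUNC(SCAL, VEC, VF) {SCAL, VEC, VF},
static const VecDescSketch SketchVecFuncs[] = {SKETCH_LIBMVEC_VECFUNCS};
#undef TLI_DEFINE_VECFUNC

// Linear lookup of the vector variant for a scalar name and vector factor.
// Returns nullptr when the table has no matching entry.
const char *findVectorName(const char *Scalar, unsigned VF) {
  for (const VecDescSketch &D : SketchVecFuncs)
    if (D.VF == VF && std::strcmp(D.ScalarName, Scalar) == 0)
      return D.VectorName;
  return nullptr;
}
```

Keeping the entries in one list and varying only the expansion macro is what lets a new library be added with a new `#elif` block in the `.def` file plus one new `case` here.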

llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-finite.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -vector-library=LIBMVEC-X86 -inject-tli-mappings -loop-vectorize -S < %s | FileCheck %s
				fpetrogalli: `-inject-tli-mappings` is not required here, as a pass itself is required by the loop vectorizer.
				fhahn: I guess it still doesn't hurt to be explicit. Also, can you add a line for the new pass manager?
				spatel: We need to be explicit about that pass with new-pass-manager as shown here: df5576a cc @aeubanks as I'm not sure if we want to update tests with NPM RUN lines or if we want to silently transition whenever the default gets changed.
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				declare float @__expf_finite(float) #0
				fpetrogalli: Nit: it is standard practice to put all declarations at the end of the file.

				define void @exp_f32(float* nocapture %varray) {
				; CHECK-LABEL: @exp_f32(
				fpetrogalli: I think you are over-testing here. It is enough to check that inside the vector body there is a call to the vector function you have listed in the mapping. You are not checking for the whole auto-vectorization process here, just the vectorization of the function call. Same for all the tests for this patch in which you are doing something similar to this one test.
				    ; CHECK-LABEL: @exp_f32(
				    ; CHECK-LABEL: vector.body:
				    ; CHECK: call fast <4 x float> @_ZGVbN4v___expf_finite(<4 x float>
				spatel: I requested using "utils/update_test_checks.py" to auto-generate the assertions consistently. We have standardized on this practice for tests in several passes because it provides extra test coverage (at the risk of over-testing), and it makes updating tests in the future nearly automatic. The time cost of checking the extra lines is negligible vs. the benefit that we have gotten in finding/avoiding bugs. If the consensus is that it's not worth it on this particular file, I'm ok with that. But the general trend is definitely towards auto-generating full checks.
				fhahn: FWIW in this case I would also slightly prefer to have more targeted test lines than auto-generating them (same for most loop-vectorize tests). The tests are large and LV generates a lot of code, and a lot of the code is completely unrelated/uninteresting to the code in the patch. IMO it would be enough to check the arguments of the vector function calls, together with the calls and maybe the induction variable. The auto-generated check lines make things much more brittle, and unrelated changes lead to us having to update lots of tests. And I am not sure if it is feasible to audit all details of the generated check lines (in the current patch ~500-800 new CHECK lines). So to me it seems like auto-generating checks here gives a false sense of security and makes things harder down the line.
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND2:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT5:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
				; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
				; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[INDEX]], 4
				; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[INDEX]], 5
				; CHECK-NEXT: [[TMP6:%.*]] = add i64 [[INDEX]], 6
				; CHECK-NEXT: [[TMP7:%.*]] = add i64 [[INDEX]], 7
				; CHECK-NEXT: [[STEP_ADD3:%.*]] = add <4 x i32> [[VEC_IND2]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP8:%.*]] = sitofp <4 x i32> [[VEC_IND2]] to <4 x float>
				; CHECK-NEXT: [[TMP9:%.*]] = sitofp <4 x i32> [[STEP_ADD3]] to <4 x float>
				; CHECK-NEXT: [[TMP10:%.*]] = call fast <4 x float> @_ZGVbN4v___expf_finite(<4 x float> [[TMP8]])
				; CHECK-NEXT: [[TMP11:%.*]] = call fast <4 x float> @_ZGVbN4v___expf_finite(<4 x float> [[TMP9]])
				; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[VARRAY:%.*]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds float, float* [[VARRAY]], i64 [[TMP4]]
				; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP12]], i32 0
				; CHECK-NEXT: [[TMP15:%.*]] = bitcast float* [[TMP14]] to <4 x float>*
				; CHECK-NEXT: store <4 x float> [[TMP10]], <4 x float>* [[TMP15]], align 4
				; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* [[TMP12]], i32 4
				; CHECK-NEXT: [[TMP17:%.*]] = bitcast float* [[TMP16]] to <4 x float>*
				; CHECK-NEXT: store <4 x float> [[TMP11]], <4 x float>* [[TMP17]], align 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[VEC_IND_NEXT5]] = add <4 x i32> [[STEP_ADD3]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
				; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP0:!llvm.loop !.*]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[TMP:%.*]] = trunc i64 [[INDVARS_IV]] to i32
				; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP]] to float
				; CHECK-NEXT: [[CALL:%.*]] = tail call fast float @__expf_finite(float [[CONV]]) [[ATTR0:#.*]]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[VARRAY]], i64 [[INDVARS_IV]]
				; CHECK-NEXT: store float [[CALL]], float* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1000
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP2:!llvm.loop !.*]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%tmp = trunc i64 %indvars.iv to i32
				spatel: The script should have warned you about using variables named "tmp". Independent of whether we choose to use the scripted assertions or not, you should change this value name (even plain "t" for "trunc" is an improvement over "tmp").
				%conv = sitofp i32 %tmp to float
				%call = tail call fast float @__expf_finite(float %conv)
				%arrayidx = getelementptr inbounds float, float* %varray, i64 %indvars.iv
				store float %call, float* %arrayidx, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1000
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !1

				for.end: ; preds = %for.body
				ret void
				}

				!1 = distinct !{!1, !2, !3}
				fhahn: I don't think we need this. You can just pass `-force-vector-width=4` to the command line and avoid the extra metadata duplicated for each test.
				fhahn: Apologies, I missed that this was suggested earlier to test different VFs. If we decide to go with the 'essential' check lines approach it might make sense to just invoke opt with different VFs using `force-vector-width`.
				!2 = !{!"llvm.loop.vectorize.width", i32 4}
				!3 = !{!"llvm.loop.vectorize.enable", i1 true}
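
For orientation, the scalar loop in `@exp_f32` has the shape of a simple source loop: 1000 iterations, the induction variable converted to `float`, one math call, one store. The sketch below is an assumption about what such a source could look like and is not taken from the patch; `exp_f32_sketch` is a hypothetical name. It calls the standard `float` overload of `std::exp`, whereas the test IR calls `__expf_finite` directly, a symbol that some glibc versions substitute under finite-math options.

```cpp
#include <cmath>

// Hypothetical source-level counterpart of the @exp_f32 test loop.
// The trip count of 1000 and the int-to-float conversion mirror the IR;
// the standard float overload of std::exp stands in for __expf_finite.
void exp_f32_sketch(float *varray) {
  for (int i = 0; i < 1000; ++i)
    varray[i] = std::exp(static_cast<float>(i));
}
```

With a vector width of 4 requested through the loop metadata, the CHECK lines expect each group of four such calls to become one call to `_ZGVbN4v___expf_finite`.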


				declare double @__exp_finite(double) #0

				define void @exp_f64(double* nocapture %varray) {
				; CHECK-LABEL: @exp_f64(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND1:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT2:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
				; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
				; CHECK-NEXT: [[TMP4:%.*]] = sitofp <4 x i32> [[VEC_IND1]] to <4 x double>
				; CHECK-NEXT: [[TMP5:%.*]] = call fast <4 x double> @_ZGVdN4v___exp_finite(<4 x double> [[TMP4]])
				; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds double, double* [[VARRAY:%.*]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds double, double* [[TMP6]], i32 0
				; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[TMP7]] to <4 x double>*
				; CHECK-NEXT: store <4 x double> [[TMP5]], <4 x double>* [[TMP8]], align 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[VEC_IND_NEXT2]] = add <4 x i32> [[VEC_IND1]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
				; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP4:!llvm.loop !.*]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[TMP:%.*]] = trunc i64 [[INDVARS_IV]] to i32
				; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP]] to double
				; CHECK-NEXT: [[CALL:%.*]] = tail call fast double @__exp_finite(double [[CONV]]) [[ATTR1:#.*]]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[VARRAY]], i64 [[INDVARS_IV]]
				; CHECK-NEXT: store double [[CALL]], double* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1000
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP5:!llvm.loop !.*]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%tmp = trunc i64 %indvars.iv to i32
				%conv = sitofp i32 %tmp to double
				%call = tail call fast double @__exp_finite(double %conv)
				%arrayidx = getelementptr inbounds double, double* %varray, i64 %indvars.iv
				store double %call, double* %arrayidx, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1000
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !11

				for.end: ; preds = %for.body
				ret void
				}

				!11 = distinct !{!11, !12, !13}
				!12 = !{!"llvm.loop.vectorize.width", i32 4}
				!13 = !{!"llvm.loop.vectorize.enable", i1 true}




				declare float @__logf_finite(float) #0

				define void @log_f32(float* nocapture %varray) {
				; CHECK-LABEL: @log_f32(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND2:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT5:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
				; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
				; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[INDEX]], 4
				; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[INDEX]], 5
				; CHECK-NEXT: [[TMP6:%.*]] = add i64 [[INDEX]], 6
				; CHECK-NEXT: [[TMP7:%.*]] = add i64 [[INDEX]], 7
				; CHECK-NEXT: [[STEP_ADD3:%.*]] = add <4 x i32> [[VEC_IND2]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP8:%.*]] = sitofp <4 x i32> [[VEC_IND2]] to <4 x float>
				; CHECK-NEXT: [[TMP9:%.*]] = sitofp <4 x i32> [[STEP_ADD3]] to <4 x float>
				; CHECK-NEXT: [[TMP10:%.*]] = call fast <4 x float> @_ZGVbN4v___logf_finite(<4 x float> [[TMP8]])
				; CHECK-NEXT: [[TMP11:%.*]] = call fast <4 x float> @_ZGVbN4v___logf_finite(<4 x float> [[TMP9]])
				; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[VARRAY:%.*]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds float, float* [[VARRAY]], i64 [[TMP4]]
				; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP12]], i32 0
				; CHECK-NEXT: [[TMP15:%.*]] = bitcast float* [[TMP14]] to <4 x float>*
				; CHECK-NEXT: store <4 x float> [[TMP10]], <4 x float>* [[TMP15]], align 4
				; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* [[TMP12]], i32 4
				; CHECK-NEXT: [[TMP17:%.*]] = bitcast float* [[TMP16]] to <4 x float>*
				; CHECK-NEXT: store <4 x float> [[TMP11]], <4 x float>* [[TMP17]], align 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[VEC_IND_NEXT5]] = add <4 x i32> [[STEP_ADD3]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
				; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP6:!llvm.loop !.*]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[TMP:%.*]] = trunc i64 [[INDVARS_IV]] to i32
				; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP]] to float
				; CHECK-NEXT: [[CALL:%.*]] = tail call fast float @__logf_finite(float [[CONV]]) [[ATTR2:#.*]]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[VARRAY]], i64 [[INDVARS_IV]]
				; CHECK-NEXT: store float [[CALL]], float* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1000
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP7:!llvm.loop !.*]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%tmp = trunc i64 %indvars.iv to i32
				%conv = sitofp i32 %tmp to float
				%call = tail call fast float @__logf_finite(float %conv)
				%arrayidx = getelementptr inbounds float, float* %varray, i64 %indvars.iv
				store float %call, float* %arrayidx, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1000
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !21

				for.end: ; preds = %for.body
				ret void
				}

				!21 = distinct !{!21, !22, !23}
				!22 = !{!"llvm.loop.vectorize.width", i32 4}
				!23 = !{!"llvm.loop.vectorize.enable", i1 true}


				declare double @__log_finite(double) #0

				define void @log_f64(double* nocapture %varray) {
				; CHECK-LABEL: @log_f64(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND1:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT2:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
				; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
				; CHECK-NEXT: [[TMP4:%.*]] = sitofp <4 x i32> [[VEC_IND1]] to <4 x double>
				; CHECK-NEXT: [[TMP5:%.*]] = call fast <4 x double> @_ZGVdN4v___log_finite(<4 x double> [[TMP4]])
				; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds double, double* [[VARRAY:%.*]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds double, double* [[TMP6]], i32 0
				; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[TMP7]] to <4 x double>*
				; CHECK-NEXT: store <4 x double> [[TMP5]], <4 x double>* [[TMP8]], align 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[VEC_IND_NEXT2]] = add <4 x i32> [[VEC_IND1]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
				; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP8:!llvm.loop !.*]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[TMP:%.*]] = trunc i64 [[INDVARS_IV]] to i32
				; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP]] to double
				; CHECK-NEXT: [[CALL:%.*]] = tail call fast double @__log_finite(double [[CONV]]) [[ATTR3:#.*]]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[VARRAY]], i64 [[INDVARS_IV]]
				; CHECK-NEXT: store double [[CALL]], double* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1000
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP9:!llvm.loop !.*]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%tmp = trunc i64 %indvars.iv to i32
				%conv = sitofp i32 %tmp to double
				%call = tail call fast double @__log_finite(double %conv)
				%arrayidx = getelementptr inbounds double, double* %varray, i64 %indvars.iv
				store double %call, double* %arrayidx, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1000
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !31

				for.end: ; preds = %for.body
				ret void
				}

				!31 = distinct !{!31, !32, !33}
				!32 = !{!"llvm.loop.vectorize.width", i32 4}
				!33 = !{!"llvm.loop.vectorize.enable", i1 true}


				declare float @__powf_finite(float, float) #0

				define void @pow_f32(float* nocapture %varray, float* nocapture readonly %exp) {
				; CHECK-LABEL: @pow_f32(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[VARRAY1:%.*]] = bitcast float* [[VARRAY:%.*]] to i8*
				; CHECK-NEXT: [[EXP3:%.*]] = bitcast float* [[EXP:%.*]] to i8*
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
				; CHECK: vector.memcheck:
				; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr float, float* [[VARRAY]], i64 1000
				; CHECK-NEXT: [[SCEVGEP2:%.*]] = bitcast float* [[SCEVGEP]] to i8*
				; CHECK-NEXT: [[SCEVGEP4:%.*]] = getelementptr float, float* [[EXP]], i64 1000
				; CHECK-NEXT: [[SCEVGEP45:%.*]] = bitcast float* [[SCEVGEP4]] to i8*
				; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[VARRAY1]], [[SCEVGEP45]]
				; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[EXP3]], [[SCEVGEP2]]
				; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
				; CHECK-NEXT: [[MEMCHECK_CONFLICT:%.*]] = and i1 [[FOUND_CONFLICT]], true
				; CHECK-NEXT: br i1 [[MEMCHECK_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND6:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT7:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
				; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
				; CHECK-NEXT: [[TMP4:%.*]] = sitofp <4 x i32> [[VEC_IND6]] to <4 x float>
				; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[EXP]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP5]], i32 0
				; CHECK-NEXT: [[TMP7:%.*]] = bitcast float* [[TMP6]] to <4 x float>*
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, <4 x float>* [[TMP7]], align 4, !alias.scope !10
				; CHECK-NEXT: [[TMP8:%.*]] = call fast <4 x float> @_ZGVbN4vv___powf_finite(<4 x float> [[TMP4]], <4 x float> [[WIDE_LOAD]])
				; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, float* [[VARRAY]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP9]], i32 0
				; CHECK-NEXT: [[TMP11:%.*]] = bitcast float* [[TMP10]] to <4 x float>*
				; CHECK-NEXT: store <4 x float> [[TMP8]], <4 x float>* [[TMP11]], align 4, !alias.scope !13, !noalias !10
				; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[VEC_IND_NEXT7]] = add <4 x i32> [[VEC_IND6]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
				; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP15:!llvm.loop !.*]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[TMP:%.*]] = trunc i64 [[INDVARS_IV]] to i32
				; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP]] to float
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[EXP]], i64 [[INDVARS_IV]]
				; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @__powf_finite(float [[CONV]], float [[TMP1]]) [[ATTR4:#.*]]
				; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[VARRAY]], i64 [[INDVARS_IV]]
				; CHECK-NEXT: store float [[TMP2]], float* [[ARRAYIDX2]], align 4
				; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1000
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP16:!llvm.loop !.*]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%tmp = trunc i64 %indvars.iv to i32
				%conv = sitofp i32 %tmp to float
				%arrayidx = getelementptr inbounds float, float* %exp, i64 %indvars.iv
				%tmp1 = load float, float* %arrayidx, align 4
				%tmp2 = tail call fast float @__powf_finite(float %conv, float %tmp1)
				%arrayidx2 = getelementptr inbounds float, float* %varray, i64 %indvars.iv
				store float %tmp2, float* %arrayidx2, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1000
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !41

				for.end: ; preds = %for.body
				ret void
				}

				!41 = distinct !{!41, !42, !43}
				!42 = !{!"llvm.loop.vectorize.width", i32 4}
				!43 = !{!"llvm.loop.vectorize.enable", i1 true}


				declare double @__pow_finite(double, double) #0

				define void @pow_f64(double* nocapture %varray, double* nocapture readonly %exp) {
				; CHECK-LABEL: @pow_f64(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[VARRAY1:%.*]] = bitcast double* [[VARRAY:%.*]] to i8*
				; CHECK-NEXT: [[EXP3:%.*]] = bitcast double* [[EXP:%.*]] to i8*
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
				; CHECK: vector.memcheck:
				; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr double, double* [[VARRAY]], i64 1000
				; CHECK-NEXT: [[SCEVGEP2:%.*]] = bitcast double* [[SCEVGEP]] to i8*
				; CHECK-NEXT: [[SCEVGEP4:%.*]] = getelementptr double, double* [[EXP]], i64 1000
				; CHECK-NEXT: [[SCEVGEP45:%.*]] = bitcast double* [[SCEVGEP4]] to i8*
				; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[VARRAY1]], [[SCEVGEP45]]
				; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[EXP3]], [[SCEVGEP2]]
				; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
				; CHECK-NEXT: [[MEMCHECK_CONFLICT:%.*]] = and i1 [[FOUND_CONFLICT]], true
				; CHECK-NEXT: br i1 [[MEMCHECK_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND6:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT7:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
				; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
				; CHECK-NEXT: [[TMP4:%.*]] = sitofp <4 x i32> [[VEC_IND6]] to <4 x double>
				; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds double, double* [[EXP]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds double, double* [[TMP5]], i32 0
				; CHECK-NEXT: [[TMP7:%.*]] = bitcast double* [[TMP6]] to <4 x double>*
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x double>, <4 x double>* [[TMP7]], align 4, !alias.scope !17
				; CHECK-NEXT: [[TMP8:%.*]] = call fast <4 x double> @_ZGVdN4vv___pow_finite(<4 x double> [[TMP4]], <4 x double> [[WIDE_LOAD]])
				; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds double, double* [[VARRAY]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds double, double* [[TMP9]], i32 0
				; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[TMP10]] to <4 x double>*
				; CHECK-NEXT: store <4 x double> [[TMP8]], <4 x double>* [[TMP11]], align 4, !alias.scope !20, !noalias !17
				; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[VEC_IND_NEXT7]] = add <4 x i32> [[VEC_IND6]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
				; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP22:!llvm.loop !.*]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[TMP:%.*]] = trunc i64 [[INDVARS_IV]] to i32
				; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP]] to double
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[EXP]], i64 [[INDVARS_IV]]
				; CHECK-NEXT: [[TMP1:%.*]] = load double, double* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast double @__pow_finite(double [[CONV]], double [[TMP1]]) [[ATTR5:#.*]]
				; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, double* [[VARRAY]], i64 [[INDVARS_IV]]
				; CHECK-NEXT: store double [[TMP2]], double* [[ARRAYIDX2]], align 4
				; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1000
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP23:!llvm.loop !.*]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%tmp = trunc i64 %indvars.iv to i32
				%conv = sitofp i32 %tmp to double
				%arrayidx = getelementptr inbounds double, double* %exp, i64 %indvars.iv
				%tmp1 = load double, double* %arrayidx, align 4
				%tmp2 = tail call fast double @__pow_finite(double %conv, double %tmp1)
				%arrayidx2 = getelementptr inbounds double, double* %varray, i64 %indvars.iv
				store double %tmp2, double* %arrayidx2, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1000
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !51

				for.end: ; preds = %for.body
				ret void
				}

				!51 = distinct !{!51, !52, !53}
				!52 = !{!"llvm.loop.vectorize.width", i32 4}
				!53 = !{!"llvm.loop.vectorize.enable", i1 true}
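
A note for readers decoding the callee names in the CHECK lines (for example `_ZGVdN4vv___pow_finite`): the names appear to follow the x86 vector function ABI mangling, that is `_ZGV`, an ISA letter, a mask letter, the vector length, one `v` per vector parameter, an underscore, and the scalar name. The sketch below is illustrative only. The helper `mangle_vector_name` and the `ISA_LETTERS` table are hypothetical names that are not part of this patch, and the helper assumes every parameter is passed as a vector.

```python
# Hypothetical helper, not part of the patch: rebuilds the `_ZGV...` names
# seen in the CHECK lines, assuming the x86 vector function ABI mangling.

# ISA letters assumed from the x86 vector function ABI.
ISA_LETTERS = {"sse": "b", "avx": "c", "avx2": "d", "avx512": "e"}


def mangle_vector_name(scalar_name, isa, vlen, num_params, masked=False):
    """Return the vector variant name for an all-vector-parameter function."""
    mask = "M" if masked else "N"  # N = unmasked, M = masked
    params = "v" * num_params      # one 'v' per vector parameter
    return f"_ZGV{ISA_LETTERS[isa]}{mask}{vlen}{params}_{scalar_name}"


if __name__ == "__main__":
    print(mangle_vector_name("sin", "avx2", 4, 1))
    print(mangle_vector_name("sinf", "sse", 4, 1))
    print(mangle_vector_name("__pow_finite", "avx2", 4, 2))
```

Under these assumptions the three calls reproduce `_ZGVdN4v_sin`, `_ZGVbN4v_sinf`, and `_ZGVdN4vv___pow_finite`, the callees checked in these test files.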

llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -vector-library=LIBMVEC-X86 -inject-tli-mappings -loop-vectorize -S < %s | FileCheck %s
				spatel: Why does this test file use command-line options to specify the vector factor while the other uses metadata? If we can use metadata, can you vary it to get better coverage (for example <2 x double> or <8 x float>)?
				fhahn: FWIW most LV tests use `-force-vector-width`. If we decide to just go for the 'essential' check lines, it should be easy to add multiple run lines with different VFs from the command line?

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				declare double @sin(double) #0
				declare float @sinf(float) #0
				declare double @llvm.sin.f64(double) #0
				declare float @llvm.sin.f32(float) #0

				declare double @cos(double) #0
				declare float @cosf(float) #0
				declare double @llvm.cos.f64(double) #0
				declare float @llvm.cos.f32(float) #0

				define void @sin_f64(double* nocapture %varray) {
				; CHECK-LABEL: @sin_f64(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND1:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT2:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
				; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
				; CHECK-NEXT: [[TMP4:%.*]] = sitofp <4 x i32> [[VEC_IND1]] to <4 x double>
				; CHECK-NEXT: [[TMP5:%.*]] = call <4 x double> @_ZGVdN4v_sin(<4 x double> [[TMP4]])
				; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds double, double* [[VARRAY:%.*]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds double, double* [[TMP6]], i32 0
				; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[TMP7]] to <4 x double>*
				; CHECK-NEXT: store <4 x double> [[TMP5]], <4 x double>* [[TMP8]], align 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[VEC_IND_NEXT2]] = add <4 x i32> [[VEC_IND1]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
				; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP0:!llvm.loop !.*]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[TMP:%.*]] = trunc i64 [[IV]] to i32
				; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP]] to double
				; CHECK-NEXT: [[CALL:%.*]] = tail call double @sin(double [[CONV]]) [[ATTR2:#.*]]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[VARRAY]], i64 [[IV]]
				; CHECK-NEXT: store double [[CALL]], double* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV_NEXT]], 1000
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP2:!llvm.loop !.*]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				entry:
				br label %for.body

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%tmp = trunc i64 %iv to i32
				%conv = sitofp i32 %tmp to double
				%call = tail call double @sin(double %conv)
				%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv
				store double %call, double* %arrayidx, align 4
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond = icmp eq i64 %iv.next, 1000
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !1

				for.end:
				ret void
				}

				!1 = distinct !{!1, !2, !3}
				!2 = !{!"llvm.loop.vectorize.width", i32 4}
				!3 = !{!"llvm.loop.vectorize.enable", i1 true}


				define void @sin_f32(float* nocapture %varray) {
				; CHECK-LABEL: @sin_f32(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND2:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT5:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
				; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
				; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[INDEX]], 4
				; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[INDEX]], 5
				; CHECK-NEXT: [[TMP6:%.*]] = add i64 [[INDEX]], 6
				; CHECK-NEXT: [[TMP7:%.*]] = add i64 [[INDEX]], 7
				; CHECK-NEXT: [[STEP_ADD3:%.*]] = add <4 x i32> [[VEC_IND2]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP8:%.*]] = sitofp <4 x i32> [[VEC_IND2]] to <4 x float>
				; CHECK-NEXT: [[TMP9:%.*]] = sitofp <4 x i32> [[STEP_ADD3]] to <4 x float>
				; CHECK-NEXT: [[TMP10:%.*]] = call <4 x float> @_ZGVbN4v_sinf(<4 x float> [[TMP8]])
				; CHECK-NEXT: [[TMP11:%.*]] = call <4 x float> @_ZGVbN4v_sinf(<4 x float> [[TMP9]])
				; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[VARRAY:%.*]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds float, float* [[VARRAY]], i64 [[TMP4]]
				; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP12]], i32 0
				; CHECK-NEXT: [[TMP15:%.*]] = bitcast float* [[TMP14]] to <4 x float>*
				; CHECK-NEXT: store <4 x float> [[TMP10]], <4 x float>* [[TMP15]], align 4
				; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* [[TMP12]], i32 4
				; CHECK-NEXT: [[TMP17:%.*]] = bitcast float* [[TMP16]] to <4 x float>*
				; CHECK-NEXT: store <4 x float> [[TMP11]], <4 x float>* [[TMP17]], align 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[VEC_IND_NEXT5]] = add <4 x i32> [[STEP_ADD3]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
				; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP4:!llvm.loop !.*]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[TMP:%.*]] = trunc i64 [[IV]] to i32
				; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP]] to float
				; CHECK-NEXT: [[CALL:%.*]] = tail call float @sinf(float [[CONV]]) [[ATTR3:#.*]]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[VARRAY]], i64 [[IV]]
				; CHECK-NEXT: store float [[CALL]], float* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV_NEXT]], 1000
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP5:!llvm.loop !.*]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				entry:
				br label %for.body

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%tmp = trunc i64 %iv to i32
				%conv = sitofp i32 %tmp to float
				%call = tail call float @sinf(float %conv)
				%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv
				store float %call, float* %arrayidx, align 4
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond = icmp eq i64 %iv.next, 1000
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !21

				for.end:
				ret void
				}

				!21 = distinct !{!21, !22, !23}
				!22 = !{!"llvm.loop.vectorize.width", i32 4}
				!23 = !{!"llvm.loop.vectorize.enable", i1 true}

				define void @sin_f64_intrinsic(double* nocapture %varray) {
				; CHECK-LABEL: @sin_f64_intrinsic(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND1:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT2:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
				; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
				; CHECK-NEXT: [[TMP4:%.*]] = sitofp <4 x i32> [[VEC_IND1]] to <4 x double>
				; CHECK-NEXT: [[TMP5:%.*]] = call <4 x double> @_ZGVdN4v_sin(<4 x double> [[TMP4]])
				; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds double, double* [[VARRAY:%.*]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds double, double* [[TMP6]], i32 0
				; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[TMP7]] to <4 x double>*
				; CHECK-NEXT: store <4 x double> [[TMP5]], <4 x double>* [[TMP8]], align 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[VEC_IND_NEXT2]] = add <4 x i32> [[VEC_IND1]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
				; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP6:!llvm.loop !.*]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[TMP:%.*]] = trunc i64 [[IV]] to i32
				; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP]] to double
				; CHECK-NEXT: [[CALL:%.*]] = tail call double @llvm.sin.f64(double [[CONV]]) [[ATTR4:#.*]]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[VARRAY]], i64 [[IV]]
				; CHECK-NEXT: store double [[CALL]], double* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV_NEXT]], 1000
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP7:!llvm.loop !.*]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				entry:
				br label %for.body

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%tmp = trunc i64 %iv to i32
				%conv = sitofp i32 %tmp to double
				%call = tail call double @llvm.sin.f64(double %conv)
				%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv
				store double %call, double* %arrayidx, align 4
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond = icmp eq i64 %iv.next, 1000
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !31

				for.end:
				ret void
				}

				!31 = distinct !{!31, !32, !33}
				!32 = !{!"llvm.loop.vectorize.width", i32 4}
				spatel: It would be better to consistently put the FileCheck lines after the 'define'. Can you auto-generate the CHECK lines using llvm/utils/update_test_checks.py?
				!33 = !{!"llvm.loop.vectorize.enable", i1 true}

				define void @sin_f32_intrinsic(float* nocapture %varray) {
				; CHECK-LABEL: @sin_f32_intrinsic(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND2:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT5:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
				; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
				; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[INDEX]], 4
				; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[INDEX]], 5
				; CHECK-NEXT: [[TMP6:%.*]] = add i64 [[INDEX]], 6
				; CHECK-NEXT: [[TMP7:%.*]] = add i64 [[INDEX]], 7
				; CHECK-NEXT: [[STEP_ADD3:%.*]] = add <4 x i32> [[VEC_IND2]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP8:%.*]] = sitofp <4 x i32> [[VEC_IND2]] to <4 x float>
				; CHECK-NEXT: [[TMP9:%.*]] = sitofp <4 x i32> [[STEP_ADD3]] to <4 x float>
				; CHECK-NEXT: [[TMP10:%.*]] = call <4 x float> @_ZGVbN4v_sinf(<4 x float> [[TMP8]])
				; CHECK-NEXT: [[TMP11:%.*]] = call <4 x float> @_ZGVbN4v_sinf(<4 x float> [[TMP9]])
				; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[VARRAY:%.*]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds float, float* [[VARRAY]], i64 [[TMP4]]
				; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP12]], i32 0
				; CHECK-NEXT: [[TMP15:%.*]] = bitcast float* [[TMP14]] to <4 x float>*
				; CHECK-NEXT: store <4 x float> [[TMP10]], <4 x float>* [[TMP15]], align 4
				; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* [[TMP12]], i32 4
				; CHECK-NEXT: [[TMP17:%.*]] = bitcast float* [[TMP16]] to <4 x float>*
				; CHECK-NEXT: store <4 x float> [[TMP11]], <4 x float>* [[TMP17]], align 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[VEC_IND_NEXT5]] = add <4 x i32> [[STEP_ADD3]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
				; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP8:!llvm.loop !.*]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[TMP:%.*]] = trunc i64 [[IV]] to i32
				; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP]] to float
				; CHECK-NEXT: [[CALL:%.*]] = tail call float @llvm.sin.f32(float [[CONV]]) [[ATTR5:#.*]]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[VARRAY]], i64 [[IV]]
				; CHECK-NEXT: store float [[CALL]], float* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV_NEXT]], 1000
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP9:!llvm.loop !.*]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				entry:
				br label %for.body

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%tmp = trunc i64 %iv to i32
				%conv = sitofp i32 %tmp to float
				%call = tail call float @llvm.sin.f32(float %conv)
				%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv
				store float %call, float* %arrayidx, align 4
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond = icmp eq i64 %iv.next, 1000
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !41

				for.end:
				ret void
				}

				!41 = distinct !{!41, !42, !43}
				!42 = !{!"llvm.loop.vectorize.width", i32 4}
				!43 = !{!"llvm.loop.vectorize.enable", i1 true}

				define void @cos_f64(double* nocapture %varray) {
				; CHECK-LABEL: @cos_f64(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND1:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT2:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
				; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
				; CHECK-NEXT: [[TMP4:%.*]] = sitofp <4 x i32> [[VEC_IND1]] to <4 x double>
				; CHECK-NEXT: [[TMP5:%.*]] = call <4 x double> @_ZGVdN4v_cos(<4 x double> [[TMP4]])
				; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds double, double* [[VARRAY:%.*]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds double, double* [[TMP6]], i32 0
				; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[TMP7]] to <4 x double>*
				; CHECK-NEXT: store <4 x double> [[TMP5]], <4 x double>* [[TMP8]], align 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[VEC_IND_NEXT2]] = add <4 x i32> [[VEC_IND1]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
				; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP10:!llvm.loop !.*]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[TMP:%.*]] = trunc i64 [[IV]] to i32
				; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP]] to double
				; CHECK-NEXT: [[CALL:%.*]] = tail call double @cos(double [[CONV]]) [[ATTR6:#.*]]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[VARRAY]], i64 [[IV]]
				; CHECK-NEXT: store double [[CALL]], double* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV_NEXT]], 1000
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP11:!llvm.loop !.*]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				entry:
				br label %for.body

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%tmp = trunc i64 %iv to i32
				%conv = sitofp i32 %tmp to double
				%call = tail call double @cos(double %conv)
				%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv
				store double %call, double* %arrayidx, align 4
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond = icmp eq i64 %iv.next, 1000
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !51

				for.end:
				ret void
				}

				!51 = distinct !{!51, !52, !53}
				!52 = !{!"llvm.loop.vectorize.width", i32 4}
				!53 = !{!"llvm.loop.vectorize.enable", i1 true}

				define void @cos_f32(float* nocapture %varray) {
				; CHECK-LABEL: @cos_f32(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND2:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT5:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
				; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
				; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[INDEX]], 4
				; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[INDEX]], 5
				; CHECK-NEXT: [[TMP6:%.*]] = add i64 [[INDEX]], 6
				; CHECK-NEXT: [[TMP7:%.*]] = add i64 [[INDEX]], 7
				; CHECK-NEXT: [[STEP_ADD3:%.*]] = add <4 x i32> [[VEC_IND2]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP8:%.*]] = sitofp <4 x i32> [[VEC_IND2]] to <4 x float>
				; CHECK-NEXT: [[TMP9:%.*]] = sitofp <4 x i32> [[STEP_ADD3]] to <4 x float>
				; CHECK-NEXT: [[TMP10:%.*]] = call <4 x float> @_ZGVbN4v_cosf(<4 x float> [[TMP8]])
				; CHECK-NEXT: [[TMP11:%.*]] = call <4 x float> @_ZGVbN4v_cosf(<4 x float> [[TMP9]])
				; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[VARRAY:%.*]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds float, float* [[VARRAY]], i64 [[TMP4]]
				; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP12]], i32 0
				; CHECK-NEXT: [[TMP15:%.*]] = bitcast float* [[TMP14]] to <4 x float>*
				; CHECK-NEXT: store <4 x float> [[TMP10]], <4 x float>* [[TMP15]], align 4
				; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* [[TMP12]], i32 4
				; CHECK-NEXT: [[TMP17:%.*]] = bitcast float* [[TMP16]] to <4 x float>*
				; CHECK-NEXT: store <4 x float> [[TMP11]], <4 x float>* [[TMP17]], align 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[VEC_IND_NEXT5]] = add <4 x i32> [[STEP_ADD3]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
				; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP12:!llvm.loop !.*]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[TMP:%.*]] = trunc i64 [[IV]] to i32
				; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP]] to float
				; CHECK-NEXT: [[CALL:%.*]] = tail call float @cosf(float [[CONV]]) [[ATTR7:#.*]]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[VARRAY]], i64 [[IV]]
				; CHECK-NEXT: store float [[CALL]], float* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV_NEXT]], 1000
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP13:!llvm.loop !.*]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				entry:
				br label %for.body

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%tmp = trunc i64 %iv to i32
				%conv = sitofp i32 %tmp to float
				%call = tail call float @cosf(float %conv)
				%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv
				store float %call, float* %arrayidx, align 4
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond = icmp eq i64 %iv.next, 1000
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !61

				for.end:
				ret void
				}

				!61 = distinct !{!61, !62, !63}
				!62 = !{!"llvm.loop.vectorize.width", i32 4}
				!63 = !{!"llvm.loop.vectorize.enable", i1 true}

				define void @cos_f64_intrinsic(double* nocapture %varray) {
				; CHECK-LABEL: @cos_f64_intrinsic(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND1:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT2:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
				; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
				; CHECK-NEXT: [[TMP4:%.*]] = sitofp <4 x i32> [[VEC_IND1]] to <4 x double>
				; CHECK-NEXT: [[TMP5:%.*]] = call <4 x double> @_ZGVdN4v_cos(<4 x double> [[TMP4]])
				; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds double, double* [[VARRAY:%.*]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds double, double* [[TMP6]], i32 0
				; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[TMP7]] to <4 x double>*
				; CHECK-NEXT: store <4 x double> [[TMP5]], <4 x double>* [[TMP8]], align 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[VEC_IND_NEXT2]] = add <4 x i32> [[VEC_IND1]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
				; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP14:!llvm.loop !.*]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[TMP:%.*]] = trunc i64 [[IV]] to i32
				; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP]] to double
				; CHECK-NEXT: [[CALL:%.*]] = tail call double @llvm.cos.f64(double [[CONV]]) [[ATTR8:#.*]]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[VARRAY]], i64 [[IV]]
				; CHECK-NEXT: store double [[CALL]], double* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV_NEXT]], 1000
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP15:!llvm.loop !.*]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				entry:
				br label %for.body

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%tmp = trunc i64 %iv to i32
				%conv = sitofp i32 %tmp to double
				%call = tail call double @llvm.cos.f64(double %conv)
				%arrayidx = getelementptr inbounds double, double* %varray, i64 %iv
				store double %call, double* %arrayidx, align 4
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond = icmp eq i64 %iv.next, 1000
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !71

				for.end:
				ret void
				}

				!71 = distinct !{!71, !72, !73}
				!72 = !{!"llvm.loop.vectorize.width", i32 4}
				!73 = !{!"llvm.loop.vectorize.enable", i1 true}

				define void @cos_f32_intrinsic(float* nocapture %varray) {
				; CHECK-LABEL: @cos_f32_intrinsic(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND2:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT5:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
				; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
				; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[INDEX]], 4
				; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[INDEX]], 5
				; CHECK-NEXT: [[TMP6:%.*]] = add i64 [[INDEX]], 6
				; CHECK-NEXT: [[TMP7:%.*]] = add i64 [[INDEX]], 7
				; CHECK-NEXT: [[STEP_ADD3:%.*]] = add <4 x i32> [[VEC_IND2]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP8:%.*]] = sitofp <4 x i32> [[VEC_IND2]] to <4 x float>
				; CHECK-NEXT: [[TMP9:%.*]] = sitofp <4 x i32> [[STEP_ADD3]] to <4 x float>
				; CHECK-NEXT: [[TMP10:%.*]] = call <4 x float> @_ZGVbN4v_cosf(<4 x float> [[TMP8]])
				; CHECK-NEXT: [[TMP11:%.*]] = call <4 x float> @_ZGVbN4v_cosf(<4 x float> [[TMP9]])
				; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[VARRAY:%.*]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds float, float* [[VARRAY]], i64 [[TMP4]]
				; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP12]], i32 0
				; CHECK-NEXT: [[TMP15:%.*]] = bitcast float* [[TMP14]] to <4 x float>*
				; CHECK-NEXT: store <4 x float> [[TMP10]], <4 x float>* [[TMP15]], align 4
				; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* [[TMP12]], i32 4
				; CHECK-NEXT: [[TMP17:%.*]] = bitcast float* [[TMP16]] to <4 x float>*
				; CHECK-NEXT: store <4 x float> [[TMP11]], <4 x float>* [[TMP17]], align 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[VEC_IND_NEXT5]] = add <4 x i32> [[STEP_ADD3]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
				; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP16:!llvm.loop !.*]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[TMP:%.*]] = trunc i64 [[IV]] to i32
				; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP]] to float
				; CHECK-NEXT: [[CALL:%.*]] = tail call float @llvm.cos.f32(float [[CONV]]) [[ATTR9:#.*]]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[VARRAY]], i64 [[IV]]
				; CHECK-NEXT: store float [[CALL]], float* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV_NEXT]], 1000
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP17:!llvm.loop !.*]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				entry:
				br label %for.body

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%tmp = trunc i64 %iv to i32
				%conv = sitofp i32 %tmp to float
				%call = tail call float @llvm.cos.f32(float %conv)
				%arrayidx = getelementptr inbounds float, float* %varray, i64 %iv
				store float %call, float* %arrayidx, align 4
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond = icmp eq i64 %iv.next, 1000
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !81

				for.end:
				ret void
				}

				!81 = distinct !{!81, !82, !83}
				!82 = !{!"llvm.loop.vectorize.width", i32 4}
				!83 = !{!"llvm.loop.vectorize.enable", i1 true}

				declare float @expf(float) #0

				define void @exp_f32(float* nocapture %varray) {
				; CHECK-LABEL: @exp_f32(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND2:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT5:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
				; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
				; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[INDEX]], 4
				; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[INDEX]], 5
				; CHECK-NEXT: [[TMP6:%.*]] = add i64 [[INDEX]], 6
				; CHECK-NEXT: [[TMP7:%.*]] = add i64 [[INDEX]], 7
				; CHECK-NEXT: [[STEP_ADD3:%.*]] = add <4 x i32> [[VEC_IND2]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP8:%.*]] = sitofp <4 x i32> [[VEC_IND2]] to <4 x float>
				; CHECK-NEXT: [[TMP9:%.*]] = sitofp <4 x i32> [[STEP_ADD3]] to <4 x float>
				; CHECK-NEXT: [[TMP10:%.*]] = call fast <4 x float> @_ZGVbN4v_expf(<4 x float> [[TMP8]])
				; CHECK-NEXT: [[TMP11:%.*]] = call fast <4 x float> @_ZGVbN4v_expf(<4 x float> [[TMP9]])
				; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[VARRAY:%.*]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds float, float* [[VARRAY]], i64 [[TMP4]]
				; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP12]], i32 0
				; CHECK-NEXT: [[TMP15:%.*]] = bitcast float* [[TMP14]] to <4 x float>*
				; CHECK-NEXT: store <4 x float> [[TMP10]], <4 x float>* [[TMP15]], align 4
				; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* [[TMP12]], i32 4
				; CHECK-NEXT: [[TMP17:%.*]] = bitcast float* [[TMP16]] to <4 x float>*
				; CHECK-NEXT: store <4 x float> [[TMP11]], <4 x float>* [[TMP17]], align 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[VEC_IND_NEXT5]] = add <4 x i32> [[STEP_ADD3]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
				; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP18:!llvm.loop !.*]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[TMP:%.*]] = trunc i64 [[INDVARS_IV]] to i32
				; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP]] to float
				; CHECK-NEXT: [[CALL:%.*]] = tail call fast float @expf(float [[CONV]]) [[ATTR10:#.*]]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[VARRAY]], i64 [[INDVARS_IV]]
				; CHECK-NEXT: store float [[CALL]], float* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1000
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP19:!llvm.loop !.*]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%tmp = trunc i64 %indvars.iv to i32
				%conv = sitofp i32 %tmp to float
				%call = tail call fast float @expf(float %conv)
				%arrayidx = getelementptr inbounds float, float* %varray, i64 %indvars.iv
				store float %call, float* %arrayidx, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1000
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !91

				for.end: ; preds = %for.body
				ret void
				}

				!91 = distinct !{!91, !92, !93}
				!92 = !{!"llvm.loop.vectorize.width", i32 4}
				!93 = !{!"llvm.loop.vectorize.enable", i1 true}

				declare float @llvm.exp.f32(float) #0

				define void @exp_f32_intrin(float* nocapture %varray) {
				; CHECK-LABEL: @exp_f32_intrin(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND2:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT5:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
				; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
				; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[INDEX]], 4
				; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[INDEX]], 5
				; CHECK-NEXT: [[TMP6:%.*]] = add i64 [[INDEX]], 6
				; CHECK-NEXT: [[TMP7:%.*]] = add i64 [[INDEX]], 7
				; CHECK-NEXT: [[STEP_ADD3:%.*]] = add <4 x i32> [[VEC_IND2]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP8:%.*]] = sitofp <4 x i32> [[VEC_IND2]] to <4 x float>
				; CHECK-NEXT: [[TMP9:%.*]] = sitofp <4 x i32> [[STEP_ADD3]] to <4 x float>
				; CHECK-NEXT: [[TMP10:%.*]] = call fast <4 x float> @_ZGVbN4v_expf(<4 x float> [[TMP8]])
				; CHECK-NEXT: [[TMP11:%.*]] = call fast <4 x float> @_ZGVbN4v_expf(<4 x float> [[TMP9]])
				; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[VARRAY:%.*]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds float, float* [[VARRAY]], i64 [[TMP4]]
				; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP12]], i32 0
				; CHECK-NEXT: [[TMP15:%.*]] = bitcast float* [[TMP14]] to <4 x float>*
				; CHECK-NEXT: store <4 x float> [[TMP10]], <4 x float>* [[TMP15]], align 4
				; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* [[TMP12]], i32 4
				; CHECK-NEXT: [[TMP17:%.*]] = bitcast float* [[TMP16]] to <4 x float>*
				; CHECK-NEXT: store <4 x float> [[TMP11]], <4 x float>* [[TMP17]], align 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[VEC_IND_NEXT5]] = add <4 x i32> [[STEP_ADD3]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
				; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP20:!llvm.loop !.*]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[TMP:%.*]] = trunc i64 [[INDVARS_IV]] to i32
				; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP]] to float
				; CHECK-NEXT: [[CALL:%.*]] = tail call fast float @llvm.exp.f32(float [[CONV]]) [[ATTR11:#.*]]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[VARRAY]], i64 [[INDVARS_IV]]
				; CHECK-NEXT: store float [[CALL]], float* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1000
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP21:!llvm.loop !.*]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%tmp = trunc i64 %indvars.iv to i32
				%conv = sitofp i32 %tmp to float
				%call = tail call fast float @llvm.exp.f32(float %conv)
				%arrayidx = getelementptr inbounds float, float* %varray, i64 %indvars.iv
				store float %call, float* %arrayidx, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1000
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !101

				for.end: ; preds = %for.body
				ret void
				}

				!101 = distinct !{!101, !102, !103}
				!102 = !{!"llvm.loop.vectorize.width", i32 4}
				!103 = !{!"llvm.loop.vectorize.enable", i1 true}

				declare float @logf(float) #0

				define void @log_f32(float* nocapture %varray) {
				; CHECK-LABEL: @log_f32(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND2:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT5:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
				; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
				; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[INDEX]], 4
				; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[INDEX]], 5
				; CHECK-NEXT: [[TMP6:%.*]] = add i64 [[INDEX]], 6
				; CHECK-NEXT: [[TMP7:%.*]] = add i64 [[INDEX]], 7
				; CHECK-NEXT: [[STEP_ADD3:%.*]] = add <4 x i32> [[VEC_IND2]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP8:%.*]] = sitofp <4 x i32> [[VEC_IND2]] to <4 x float>
				; CHECK-NEXT: [[TMP9:%.*]] = sitofp <4 x i32> [[STEP_ADD3]] to <4 x float>
				; CHECK-NEXT: [[TMP10:%.*]] = call fast <4 x float> @_ZGVbN4v_logf(<4 x float> [[TMP8]])
				; CHECK-NEXT: [[TMP11:%.*]] = call fast <4 x float> @_ZGVbN4v_logf(<4 x float> [[TMP9]])
				; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, float* [[VARRAY:%.*]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds float, float* [[VARRAY]], i64 [[TMP4]]
				; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, float* [[TMP12]], i32 0
				; CHECK-NEXT: [[TMP15:%.*]] = bitcast float* [[TMP14]] to <4 x float>*
				; CHECK-NEXT: store <4 x float> [[TMP10]], <4 x float>* [[TMP15]], align 4
				; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, float* [[TMP12]], i32 4
				; CHECK-NEXT: [[TMP17:%.*]] = bitcast float* [[TMP16]] to <4 x float>*
				; CHECK-NEXT: store <4 x float> [[TMP11]], <4 x float>* [[TMP17]], align 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[VEC_IND_NEXT5]] = add <4 x i32> [[STEP_ADD3]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
				; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP22:!llvm.loop !.*]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[TMP:%.*]] = trunc i64 [[INDVARS_IV]] to i32
				; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP]] to float
				; CHECK-NEXT: [[CALL:%.*]] = tail call fast float @logf(float [[CONV]]) [[ATTR12:#.*]]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[VARRAY]], i64 [[INDVARS_IV]]
				; CHECK-NEXT: store float [[CALL]], float* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1000
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP23:!llvm.loop !.*]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%tmp = trunc i64 %indvars.iv to i32
				%conv = sitofp i32 %tmp to float
				%call = tail call fast float @logf(float %conv)
				%arrayidx = getelementptr inbounds float, float* %varray, i64 %indvars.iv
				store float %call, float* %arrayidx, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1000
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !111

				for.end: ; preds = %for.body
				ret void
				}

				!111 = distinct !{!111, !112, !113}
				!112 = !{!"llvm.loop.vectorize.width", i32 4}
				!113 = !{!"llvm.loop.vectorize.enable", i1 true}

				declare float @powf(float, float) #0

				define void @pow_f32(float* nocapture %varray, float* nocapture readonly %exp) {
				; CHECK-LABEL: @pow_f32(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[VARRAY1:%.*]] = bitcast float* [[VARRAY:%.*]] to i8*
				; CHECK-NEXT: [[EXP3:%.*]] = bitcast float* [[EXP:%.*]] to i8*
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
				; CHECK: vector.memcheck:
				; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr float, float* [[VARRAY]], i64 1000
				; CHECK-NEXT: [[SCEVGEP2:%.*]] = bitcast float* [[SCEVGEP]] to i8*
				; CHECK-NEXT: [[SCEVGEP4:%.*]] = getelementptr float, float* [[EXP]], i64 1000
				; CHECK-NEXT: [[SCEVGEP45:%.*]] = bitcast float* [[SCEVGEP4]] to i8*
				; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[VARRAY1]], [[SCEVGEP45]]
				; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[EXP3]], [[SCEVGEP2]]
				; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
				; CHECK-NEXT: [[MEMCHECK_CONFLICT:%.*]] = and i1 [[FOUND_CONFLICT]], true
				; CHECK-NEXT: br i1 [[MEMCHECK_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND6:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT7:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
				; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
				; CHECK-NEXT: [[TMP4:%.*]] = sitofp <4 x i32> [[VEC_IND6]] to <4 x float>
				; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[EXP]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP5]], i32 0
				; CHECK-NEXT: [[TMP7:%.*]] = bitcast float* [[TMP6]] to <4 x float>*
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, <4 x float>* [[TMP7]], align 4, !alias.scope !24
				; CHECK-NEXT: [[TMP8:%.*]] = call fast <4 x float> @_ZGVbN4vv_powf(<4 x float> [[TMP4]], <4 x float> [[WIDE_LOAD]])
				; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, float* [[VARRAY]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP9]], i32 0
				; CHECK-NEXT: [[TMP11:%.*]] = bitcast float* [[TMP10]] to <4 x float>*
				; CHECK-NEXT: store <4 x float> [[TMP8]], <4 x float>* [[TMP11]], align 4, !alias.scope !27, !noalias !24
				; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[VEC_IND_NEXT7]] = add <4 x i32> [[VEC_IND6]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
				; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP29:!llvm.loop !.*]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[TMP:%.*]] = trunc i64 [[INDVARS_IV]] to i32
				; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP]] to float
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[EXP]], i64 [[INDVARS_IV]]
				; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @powf(float [[CONV]], float [[TMP1]]) [[ATTR13:#.*]]
				; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[VARRAY]], i64 [[INDVARS_IV]]
				; CHECK-NEXT: store float [[TMP2]], float* [[ARRAYIDX2]], align 4
				; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1000
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP30:!llvm.loop !.*]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%tmp = trunc i64 %indvars.iv to i32
				%conv = sitofp i32 %tmp to float
				%arrayidx = getelementptr inbounds float, float* %exp, i64 %indvars.iv
				%tmp1 = load float, float* %arrayidx, align 4
				%tmp2 = tail call fast float @powf(float %conv, float %tmp1)
				%arrayidx2 = getelementptr inbounds float, float* %varray, i64 %indvars.iv
				store float %tmp2, float* %arrayidx2, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1000
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !121

				for.end: ; preds = %for.body
				ret void
				}

				!121 = distinct !{!121, !122, !123}
				!122 = !{!"llvm.loop.vectorize.width", i32 4}
				!123 = !{!"llvm.loop.vectorize.enable", i1 true}

				declare float @llvm.pow.f32(float, float) #0

				define void @pow_f32_intrin(float* nocapture %varray, float* nocapture readonly %exp) {
				; CHECK-LABEL: @pow_f32_intrin(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[VARRAY1:%.*]] = bitcast float* [[VARRAY:%.*]] to i8*
				; CHECK-NEXT: [[EXP3:%.*]] = bitcast float* [[EXP:%.*]] to i8*
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
				; CHECK: vector.memcheck:
				; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr float, float* [[VARRAY]], i64 1000
				; CHECK-NEXT: [[SCEVGEP2:%.*]] = bitcast float* [[SCEVGEP]] to i8*
				; CHECK-NEXT: [[SCEVGEP4:%.*]] = getelementptr float, float* [[EXP]], i64 1000
				; CHECK-NEXT: [[SCEVGEP45:%.*]] = bitcast float* [[SCEVGEP4]] to i8*
				; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult i8* [[VARRAY1]], [[SCEVGEP45]]
				; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult i8* [[EXP3]], [[SCEVGEP2]]
				; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
				; CHECK-NEXT: [[MEMCHECK_CONFLICT:%.*]] = and i1 [[FOUND_CONFLICT]], true
				; CHECK-NEXT: br i1 [[MEMCHECK_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND6:%.*]] = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT7:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2
				; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3
				; CHECK-NEXT: [[TMP4:%.*]] = sitofp <4 x i32> [[VEC_IND6]] to <4 x float>
				; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds float, float* [[EXP]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, float* [[TMP5]], i32 0
				; CHECK-NEXT: [[TMP7:%.*]] = bitcast float* [[TMP6]] to <4 x float>*
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, <4 x float>* [[TMP7]], align 4, !alias.scope !31
				; CHECK-NEXT: [[TMP8:%.*]] = call fast <4 x float> @_ZGVbN4vv_powf(<4 x float> [[TMP4]], <4 x float> [[WIDE_LOAD]])
				; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, float* [[VARRAY]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP9]], i32 0
				; CHECK-NEXT: [[TMP11:%.*]] = bitcast float* [[TMP10]] to <4 x float>*
				; CHECK-NEXT: store <4 x float> [[TMP8]], <4 x float>* [[TMP11]], align 4, !alias.scope !34, !noalias !31
				; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
				; CHECK-NEXT: [[VEC_IND_NEXT7]] = add <4 x i32> [[VEC_IND6]], <i32 4, i32 4, i32 4, i32 4>
				; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
				; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP36:!llvm.loop !.*]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[TMP:%.*]] = trunc i64 [[INDVARS_IV]] to i32
				; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP]] to float
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[EXP]], i64 [[INDVARS_IV]]
				; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.pow.f32(float [[CONV]], float [[TMP1]]) [[ATTR14:#.*]]
				; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[VARRAY]], i64 [[INDVARS_IV]]
				; CHECK-NEXT: store float [[TMP2]], float* [[ARRAYIDX2]], align 4
				; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1000
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP37:!llvm.loop !.*]]
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%tmp = trunc i64 %indvars.iv to i32
				%conv = sitofp i32 %tmp to float
				%arrayidx = getelementptr inbounds float, float* %exp, i64 %indvars.iv
				%tmp1 = load float, float* %arrayidx, align 4
				%tmp2 = tail call fast float @llvm.pow.f32(float %conv, float %tmp1)
				%arrayidx2 = getelementptr inbounds float, float* %varray, i64 %indvars.iv
				store float %tmp2, float* %arrayidx2, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1000
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !131

				for.end: ; preds = %for.body
				ret void
				}

				!131 = distinct !{!131, !132, !133}
				!132 = !{!"llvm.loop.vectorize.width", i32 4}
				!133 = !{!"llvm.loop.vectorize.enable", i1 true}

				attributes #0 = { nounwind readnone }
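
The CHECK lines above pin down a loop with trip count 1000 that is vectorized at width 4 (`llvm.loop.vectorize.width`), so `middle.block` compares `1000, 1000` and the scalar `for.body` loop has nothing left to run unless the runtime alias check in `vector.memcheck` fails. A rough model of that split, as an illustrative sketch only (the helper `split_trip_count` is hypothetical and is not part of LLVM):

```python
from typing import Tuple


def split_trip_count(trip_count: int, vf: int) -> Tuple[int, int]:
    """Return (iterations handled by the vector loop, iterations left for the scalar loop).

    Assumption: no runtime check sends execution to the scalar loop early.
    """
    vector_iters = trip_count - trip_count % vf
    return vector_iters, trip_count - vector_iters


vec, rem = split_trip_count(1000, 4)
print(vec, rem)  # 1000 0
```

With a trip count that is not a multiple of 4, the second value would be the number of iterations the scalar epilogue still has to execute.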

llvm/test/Transforms/Util/add-TLI-mappings.ll

				; RUN: opt -vector-library=SVML -inject-tli-mappings -S < %s | FileCheck %s --check-prefixes=COMMON,SVML
				; RUN: opt -vector-library=SVML -passes=inject-tli-mappings -S < %s | FileCheck %s --check-prefixes=COMMON,SVML
				; RUN: opt -vector-library=MASSV -inject-tli-mappings -S < %s | FileCheck %s --check-prefixes=COMMON,MASSV
				; RUN: opt -vector-library=MASSV -passes=inject-tli-mappings -S < %s | FileCheck %s --check-prefixes=COMMON,MASSV
				; RUN: opt -vector-library=Accelerate -inject-tli-mappings -S < %s | FileCheck %s --check-prefixes=COMMON,ACCELERATE
				; RUN: opt -vector-library=LIBMVEC-X86 -inject-tli-mappings -S < %s | FileCheck %s --check-prefixes=COMMON,LIBMVEC-X86
				; RUN: opt -vector-library=LIBMVEC-X86 -passes=inject-tli-mappings -S < %s | FileCheck %s --check-prefixes=COMMON,LIBMVEC-X86
				; RUN: opt -vector-library=Accelerate -passes=inject-tli-mappings -S < %s | FileCheck %s --check-prefixes=COMMON,ACCELERATE

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; COMMON-LABEL: @llvm.compiler.used = appending global
				; SVML-SAME: [6 x i8*] [
				; SVML-SAME: i8* bitcast (<2 x double> (<2 x double>)* @__svml_sin2 to i8*),
				; SVML-SAME: i8* bitcast (<4 x double> (<4 x double>)* @__svml_sin4 to i8*),
				; SVML-SAME: i8* bitcast (<8 x double> (<8 x double>)* @__svml_sin8 to i8*),
				; SVML-SAME: i8* bitcast (<4 x float> (<4 x float>)* @__svml_log10f4 to i8*),
				; SVML-SAME: i8* bitcast (<8 x float> (<8 x float>)* @__svml_log10f8 to i8*),
				; SVML-SAME: i8* bitcast (<16 x float> (<16 x float>)* @__svml_log10f16 to i8*)
				; MASSV-SAME: [2 x i8*] [
				; MASSV-SAME: i8* bitcast (<2 x double> (<2 x double>)* @__sind2_massv to i8*),
				; MASSV-SAME: i8* bitcast (<4 x float> (<4 x float>)* @__log10f4_massv to i8*)
				; ACCELERATE-SAME: [1 x i8*] [
				; ACCELERATE-SAME: i8* bitcast (<4 x float> (<4 x float>)* @vlog10f to i8*)
				; LIBMVEC-X86-SAME: [2 x i8*] [
				; LIBMVEC-X86-SAME: i8* bitcast (<2 x double> (<2 x double>)* @_ZGVbN2v_sin to i8*),
				; LIBMVEC-X86-SAME: i8* bitcast (<4 x double> (<4 x double>)* @_ZGVdN4v_sin to i8*)
				; COMMON-SAME: ], section "llvm.metadata"

				define double @sin_f64(double %in) {
				; COMMON-LABEL: @sin_f64(
				; SVML: call double @sin(double %{{.*}}) #[[SIN:[0-9]+]]
				; MASSV: call double @sin(double %{{.*}}) #[[SIN:[0-9]+]]
				; ACCELERATE: call double @sin(double %{{.*}})
				; LIBMVEC-X86: call double @sin(double %{{.*}}) #[[SIN:[0-9]+]]
				; No mapping of "sin" to a vector function for Accelerate.
				; ACCELERATE-NOT: _ZGV_LLVM_{{.*}}_sin({{.*}})
				%call = tail call double @sin(double %in)
				ret double %call
				}

				declare double @sin(double) #0

				define float @call_llvm.log10.f32(float %in) {
				; COMMON-LABEL: @call_llvm.log10.f32(
				; SVML: call float @llvm.log10.f32(float %{{.*}})
				; LIBMVEC-X86: call float @llvm.log10.f32(float %{{.*}})
				; MASSV: call float @llvm.log10.f32(float %{{.*}}) #[[LOG10:[0-9]+]]
				; ACCELERATE: call float @llvm.log10.f32(float %{{.*}}) #[[LOG10:[0-9]+]]
				; No mapping of "llvm.log10.f32" to a vector function for SVML.
				; SVML-NOT: _ZGV_LLVM_{{.*}}_llvm.log10.f32({{.*}})
				; LIBMVEC-X86-NOT: _ZGV_LLVM_{{.*}}_llvm.log10.f32({{.*}})
				%call = tail call float @llvm.log10.f32(float %in)
				ret float %call
				}

				declare float @llvm.log10.f32(float) #0
				attributes #0 = { nounwind readnone }

				; SVML: attributes #[[SIN]] = { "vector-function-abi-variant"=
				; SVML-SAME: "_ZGV_LLVM_N2v_sin(__svml_sin2),
				; SVML-SAME: _ZGV_LLVM_N4v_sin(__svml_sin4),
				; SVML-SAME: _ZGV_LLVM_N8v_sin(__svml_sin8)" }

				; MASSV: attributes #[[SIN]] = { "vector-function-abi-variant"=
				; MASSV-SAME: "_ZGV_LLVM_N2v_sin(__sind2_massv)" }
				; MASSV: attributes #[[LOG10]] = { "vector-function-abi-variant"=
				; MASSV-SAME: "_ZGV_LLVM_N4v_llvm.log10.f32(__log10f4_massv)" }

				; ACCELERATE: attributes #[[LOG10]] = { "vector-function-abi-variant"=
				; ACCELERATE-SAME: "_ZGV_LLVM_N4v_llvm.log10.f32(vlog10f)" }

				; LIBMVEC-X86: attributes #[[SIN]] = { "vector-function-abi-variant"=
				; LIBMVEC-X86-SAME: "_ZGV_LLVM_N2v_sin(_ZGVbN2v_sin),
				; LIBMVEC-X86-SAME: _ZGV_LLVM_N4v_sin(_ZGVdN4v_sin)" }
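
The libmvec symbols checked above (`_ZGVbN2v_sin`, `_ZGVdN4v_sin`, and `_ZGVbN4vv_powf` in the LoopVectorize test) follow the x86 vector function ABI naming scheme: `_ZGV`, an ISA letter, a mask letter, the vector length, one letter per parameter, then `_` and the scalar name. The following is a minimal sketch of a decoder for that scheme, not code from this patch. The function `decode` and its result keys are hypothetical, it assumes only plain vector (`v`) parameters, and it does not handle the LLVM-internal `_ZGV_LLVM_` prefix that appears in the `vector-function-abi-variant` attribute.

```python
import re

# ISA letters defined by the x86 vector function ABI.
ISA = {"b": "SSE", "c": "AVX", "d": "AVX2", "e": "AVX-512"}

# Assumption: every parameter is a plain vector parameter ("v").
_NAME = re.compile(
    r"_ZGV(?P<isa>[bcde])(?P<mask>[NM])(?P<vlen>\d+)(?P<params>v+)_(?P<scalar>\w+)"
)


def decode(name):
    """Split a mangled vector-variant name into its components."""
    m = _NAME.fullmatch(name)
    if m is None:
        raise ValueError("not a recognized vector variant name: " + name)
    return {
        "isa": ISA[m.group("isa")],
        "masked": m.group("mask") == "M",  # "N" means unmasked
        "vlen": int(m.group("vlen")),
        "num_vector_params": len(m.group("params")),
        "scalar": m.group("scalar"),
    }


for symbol in ("_ZGVbN2v_sin", "_ZGVdN4v_sin", "_ZGVbN4vv_powf"):
    print(symbol, decode(symbol))
```

Read this way, the two `sin` mappings in the test are the unmasked 2-lane SSE variant and the unmasked 4-lane AVX2 variant of the same scalar function.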


Diff 297654
