This is an archive of the discontinued LLVM Phabricator instance.

Differential D22792

VecClone Pass
Needs Review · Public

Authored by mmasten on Jul 25 2016, 5:37 PM.

Details

Reviewers

mzolotukhin
Ayal
hfinkel
javed.absar

Summary

This work is part of an RFC sent by Xinmin back on 3/2/2016 regarding explicit function vectorization. The VecClone pass translates functions marked with "#pragma omp declare simd" into vector length trip count loops. Please see the RFC for details. It can be found at http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html.

Event Timeline

mmasten updated this revision to Diff 65447. Jul 25 2016, 5:37 PM

mmasten retitled this revision to VecClone Pass.

mmasten updated this object.

mmasten added reviewers: mzolotukhin, hfinkel, Ayal.

Herald added subscribers: mzolotukhin, mehdi_amini. Jul 25 2016, 5:37 PM

Thanks for the nicely commented code and the provided tests.
A few inlined comments after a quick overview.

include/llvm/Analysis/VectorUtils.h
146 ↗	(On Diff #65447)	Why do all these need to be public APIs?
include/llvm/Transforms/Utils/VecClone.h
12	Why does this need to be in a public header?
lib/Analysis/VectorUtils.cpp
564 ↗	(On Diff #65447)	This is all not very efficient: `getVectorVariantAttributes` creates a temporary vector and copies attributes into it, just to iterate over it a few lines later and convert them to strings (you can't keep StringRef because the attribute was copied, I guess). FuncVars is queried twice for no apparent good reason. It seems that the only use for this function is to populate a list of functions to "vectorize". Populating a "vector<Function *>" should be enough for that. A nice property would be to be able to figure out easily/efficiently whether a function needs to be "vectorized". Maybe rework the attribute to be more "structured" instead of separate strings.
lib/Transforms/Utils/VecClone.cpp
15	Some overview with an example of what the pass accomplishes would be nice. It is already captured in the RFC, but for someone reading the code it'd be helpful.
21	I think it should be spelled out more clearly that it is required for correctness to have all the variants mentioned in the attribute list codegen'd, even at O0, because even if the vectorizer does not run in this module, it may run in another module that would expect these variants to exist.
1432	So this will walk all the functions in the module and walk all the attributes and do a string comparison on all of these. I'm not sure if it is fine to pay this when this pass has nothing to do (i.e. the early exit should be fast).
1436	Coding style: no braces (other places as well).
1439	Spurious empty line (other places as well).
1487	What about a SmallVector and move it out-of-the loop (the call to `clear()` below is already handling the reset between iterations).

Thanks for the comments, Mehdi. I had some other things come up, but I'm making some corrections now.

mmasten added inline comments. Sep 1 2016, 10:44 AM

include/llvm/Analysis/VectorUtils.h
146 ↗	(On Diff #65447)	They don't need to be. I can move these inside the VecClone class. Some of these could eventually become more generalized utilities, but I'll change it for now.
include/llvm/Transforms/Utils/VecClone.h
12	Do you mean that the class definition should be moved to VecClone.cpp?
lib/Analysis/VectorUtils.cpp
564 ↗	(On Diff #65447)	I will change #2. We need a little more than vector<Function*> here because each simd function will have multiple vector variants. FuncVars is a one-to-many mapping of the original function to the string encodings corresponding to the variants that will be generated later. The string representation is essentially what is defined in https://www.cilkplus.org/sites/default/files/open_specifications/Intel-ABI-Vector-Function-2012-v0.9.5.pdf. We can easily tell which functions need to be vectorized by checking if any of these encodings exist as attributes on the original function. Can you elaborate on what you mean by "more structured"? Are you suggesting an alternative representation to string-based function attributes? Thanks
lib/Transforms/Utils/VecClone.cpp
1432	We could selectively run the VecClone pass from PassManagerBuilder based on whether the OpenMP switch has been used. Otherwise, I don't know of another way to figure out which functions will need to be generated. Do you have a suggestion?

mehdi_amini added inline comments. Sep 1 2016, 12:11 PM

lib/Analysis/VectorUtils.cpp
564 ↗	(On Diff #65447)	Let me clarify what I meant with a `vector<Function *>`. My take on `getFunctionsToVectorize` was that its sole purpose is to set a single entry in `FuncVars`, and this entry is a `std::vector<std::string>` containing the list of variants. This is used exactly once here: `for (auto& pair : FunctionsToVectorize) { Function& F = pair.first; DeclaredVariants &DeclaredVariants = pair.second;` If `FunctionsToVectorize` was a vector of `Function *`, you could get the list of `DeclaredVariants` on the fly while iterating on the attributes, without creating any temporary "string list".

mehdi_amini added inline comments. Sep 1 2016, 12:17 PM

lib/Analysis/VectorUtils.cpp
564 ↗	(On Diff #65447)	To come back to the "more structured", the question is "how to get the list of functions that need variants very quickly/cheaply". Having to look at all the attributes and do any kind of string processing when we need to generate a variant is not a problem; I'm more interested in doing as little work as possible when there is no variant. Maybe having an attribute on the function which is "hasVariants" and could be queried cheaply could be a solution. I'd have to look at how attributes work internally again. (Ideally I'd like a `key->value` storage where the key would be "variants" and the value would be a list of variants to generate, instead of the flat list where variant strings are mixed with the others and recognized by "magic" pattern matching.)
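
As a rough illustration of the keyed lookup suggested above, the sketch below models function attributes as a plain string map and finds the functions that need variants with one key lookup per function, so a module without variants exits quickly. The `Fn` struct, the map, and `functionsNeedingVariants` are hypothetical stand-ins for illustration, not LLVM's attribute API.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a function and its string attributes.
struct Fn {
  std::string Name;
  std::map<std::string, std::string> Attrs; // key -> value
};

// Returns the functions carrying the "variants" key. Each function costs one
// map lookup; no attribute value is scanned or pattern-matched.
std::vector<const Fn *> functionsNeedingVariants(const std::vector<Fn> &Module) {
  std::vector<const Fn *> Result;
  for (const Fn &F : Module)
    if (F.Attrs.count("variants") != 0)
      Result.push_back(&F);
  return Result;
}
```

The string processing of the variant list itself is deferred until a function is known to carry the key, which is the cost split the comment asks for.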

mmasten updated this revision to Diff 73523. Oct 4 2016, 11:53 AM

mmasten marked an inline comment as done.

Herald added subscribers: modocache, mgorny, beanz. Oct 4 2016, 11:53 AM

sodeh added a subscriber: sodeh. Nov 16 2016, 1:46 AM

simoll added a subscriber: simoll. Jan 23 2017, 1:22 AM

hfinkel added inline comments. Jan 27 2017, 4:14 AM

include/llvm/Analysis/VectorVariant.h
140	There's a lot of target information here in the target-independent code. Given that we're not going to vectorize without a target code model regardless, I'd like to see this information pushed into TargetTransformInfo. VecClone can then use TTI to convert the particular ISA tags into information about vector lengths, etc. Other architectures that are adopting this scheme can then extend this in a natural way.
lib/Analysis/VectorUtils.cpp
564 ↗	(On Diff #65447)	I'd like to come back to this. I agree with Mehdi, having unadorned mangled names as attributes isn't the right design - the "magic" pattern matching is unnecessary. It would seem much better to use something like: attributes #0 = { "vector-variants"="_ZGVbM4l_foo,_ZGVbN4l_foo,_ZGVcM8l_foo,_ZGVcN8l_foo,_ZGVdM8l_foo,_ZGVdN8l_foo,_ZGVeM16l_foo,_ZGVeN16l_foo" instead of: attributes #0 = { hasvectorvariants "_ZGVbM4l_foo" "_ZGVbN4l_foo" "_ZGVcM8l_foo" "_ZGVcN8l_foo" "_ZGVdM8l_foo" "_ZGVdN8l_foo" "_ZGVeM16l_foo" "_ZGVeN16l_foo" exactly like we do for "target-features".

New changes are:

Update function attributes to "vector-variants"="<variant list>" format.
Move target-specific code in the VectorVariant class to TTI.
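
To make the "vector-variants" format concrete, here is a small standalone sketch that splits the comma-separated list and decodes each mangled name of the form `_ZGV<isa><mask><vlen><params>_<name>`. It is an illustration only: the `Variant` struct and both functions are hypothetical and are not the patch's VectorVariant class, and the decoder assumes only the single-letter parameter kinds `v`, `l`, and `u`, with no strides, alignments, or linear modifiers.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one decoded variant name.
struct Variant {
  char Isa = '?';      // ISA class letter, e.g. 'b', 'c', 'd', 'e'
  bool Masked = false; // 'M' = masked, 'N' = unmasked
  unsigned VLen = 0;   // vector length
  std::string Params;  // one letter per parameter: 'v', 'l', or 'u'
  std::string Scalar;  // name of the scalar function
};

// Decodes one mangled name; returns false if it does not match the
// simplified grammar described in the lead-in.
bool decodeVariant(const std::string &Name, Variant &Out) {
  const std::string Prefix = "_ZGV";
  if (Name.compare(0, Prefix.size(), Prefix) != 0)
    return false;
  std::size_t Pos = Prefix.size();
  if (Name.size() < Pos + 3) // need ISA letter, mask letter, one digit
    return false;
  Variant V;
  V.Isa = Name[Pos++];
  const char Mask = Name[Pos++];
  if (Mask != 'M' && Mask != 'N')
    return false;
  V.Masked = (Mask == 'M');
  const std::size_t DigitsStart = Pos;
  while (Pos < Name.size() &&
         std::isdigit(static_cast<unsigned char>(Name[Pos])))
    V.VLen = V.VLen * 10 + static_cast<unsigned>(Name[Pos++] - '0');
  if (Pos == DigitsStart)
    return false; // no vector length
  while (Pos < Name.size() && Name[Pos] != '_') {
    const char Kind = Name[Pos++];
    if (Kind != 'v' && Kind != 'l' && Kind != 'u')
      return false; // parameter kind outside the simplified subset
    V.Params.push_back(Kind);
  }
  if (Pos >= Name.size())
    return false; // missing '_' before the scalar name
  V.Scalar = Name.substr(Pos + 1);
  Out = V;
  return true;
}

// Splits the attribute value on ',' and keeps the names that decode.
std::vector<Variant> decodeVariantList(const std::string &AttrValue) {
  std::vector<Variant> Result;
  std::istringstream Stream(AttrValue);
  std::string Item;
  while (std::getline(Stream, Item, ',')) {
    Variant V;
    if (decodeVariant(Item, V))
      Result.push_back(V);
  }
  return Result;
}
```

With a single keyed attribute, a consumer reads one value and decodes it, instead of testing every string attribute on the function against the `_ZGV` pattern.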

Thanks for the feedback, Hal. I made the changes you suggested.

Hahnfeld added a subscriber: Hahnfeld. Mar 21 2017, 6:16 AM

fpetrogalli added a subscriber: fpetrogalli. Sep 1 2017, 2:59 AM

fpetrogalli added inline comments. Sep 1 2017, 3:14 AM

lib/Transforms/Utils/VecClone.cpp
42	Hello Matt, thank you for working on this. <nitpicking>Should the comment be updated with the `"vector-variants"="_ZGVbM4ul_, ZGVbN4ul_"` syntax? </nitpicking> Cheers, Francesco

Hello again,

I think it would be good to have references to the document that describes the vector ABI. Is it the one at [1]?

If so, I think it would be good to cover with tests the "linear clause" cases of Table 1 in section 2.4, "Element Data Type to Vector Data Type Mapping".

Francesco

[1] https://software.intel.com/sites/default/files/managed/b4/c8/Intel-Vector-Function-ABI.pdf

fpetrogalli added inline comments. Sep 1 2017, 3:46 AM

lib/Target/X86/X86TargetTransformInfo.cpp
2850	Section 2.6 "Vector Function Name Mangling" of [1] maps the IsaClass to other values, 'x', 'y'. Are you basing this on [2] for gcc compatibility? Why do [1] and [2] differ? Francesco [1] https://software.intel.com/sites/default/files/managed/b4/c8/Intel-Vector-Function-ABI.pdf [2] https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt

hfinkel added inline comments. Sep 1 2017, 4:04 PM

lib/Transforms/Utils/VecClone.cpp
39	This function doesn't return anything, and I think that makes the example hard to understand.
100	I don't understand what's going on here. Is it not possible to write the transformation such that it's semantically correct regardless of whether we've run mem2reg? This is not always an either/or situation.

Also, does the fact that we now have the VPlan infrastructure in the vectorizer change, in any way, how we'd like to approach this problem?

Hello - gentle ping on this patch. I think it is an important addition to the compiler. Do we need to reactivate the discussion in the mailing list?

Francesco

egarcia added a subscriber: egarcia. Oct 23 2017, 4:38 PM

Hi Francesco,

I'm working on updating my patch based on the recent comments and other modifications that have been made since the last patch was submitted. I should have it ready in the next couple of days.

Thanks,

Matt

Hahnfeld added a subscriber: llvm-commits. Oct 27 2017, 7:32 AM

mmasten added inline comments. Oct 31 2017, 12:26 PM

lib/Target/X86/X86TargetTransformInfo.cpp
2850	Currently, the implementation is based on gcc compatibility. However, it would be nice to extend to support both gcc and icc. The IsaClass values are different because of the calling convention differences. Intel icc uses regcall calling convention and gcc uses standard calling convention. I've added references to both ABI documents.
lib/Transforms/Utils/VecClone.cpp
39	I agree, this was a bad bug with the example. This has been fixed and updated with an example that demonstrates the behavior of all three types of parameters (uniform, linear, and vector). Please let me know if the bitcasts shown in the example are confusing. The purpose of the bitcasts is so that the loop can appear in scalar form to the loop vectorizer, i.e., we can use a scalar gep and index to reference vectors in the loop.
42	Thanks Francesco. Fixed.
100	Perhaps the comments weren't written very clearly. The pass has been written to handle parameters regardless of whether or not mem2reg has run. Hopefully, the updated comments are better.
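
For readers unfamiliar with the gcc-compatible encoding mentioned above, the sketch below shows the ISA class letters used by the libmvec x86 vector ABI ('b' for SSE, 'c' for AVX, 'd' for AVX2, 'e' for AVX-512) together with the maximum vector register width of each class. The enum and function names are hypothetical and chosen for this example; they are not the TTI hooks added by the patch, and the icc letters are deliberately left out.

```cpp
// Hypothetical ISA classes for the gcc-compatible x86 vector ABI.
enum class ISAClass { SSE, AVX, AVX2, AVX512, Unknown };

// Letter used for the ISA class in a mangled vector variant name.
char isaLetter(ISAClass C) {
  switch (C) {
  case ISAClass::SSE:     return 'b';
  case ISAClass::AVX:     return 'c';
  case ISAClass::AVX2:    return 'd';
  case ISAClass::AVX512:  return 'e';
  case ISAClass::Unknown: break;
  }
  return '?'; // no letter is defined for an unknown class
}

// Inverse mapping; any other letter (including icc's) is reported as Unknown.
ISAClass isaFromLetter(char L) {
  switch (L) {
  case 'b': return ISAClass::SSE;
  case 'c': return ISAClass::AVX;
  case 'd': return ISAClass::AVX2;
  case 'e': return ISAClass::AVX512;
  default:  return ISAClass::Unknown;
  }
}

// Maximum vector register width in bits, or 0 for an unknown class.
unsigned maxRegisterBits(ISAClass C) {
  switch (C) {
  case ISAClass::SSE:     return 128;
  case ISAClass::AVX:     return 256;
  case ISAClass::AVX2:    return 256;
  case ISAClass::AVX512:  return 512;
  case ISAClass::Unknown: break;
  }
  return 0;
}
```

Supporting icc as well would presumably mean a second letter table selected by calling convention, which is one reason to keep the mapping behind a target hook.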

mmasten updated this revision to Diff 121035. Oct 31 2017, 12:29 PM

Herald added a subscriber: javed.absar. Oct 31 2017, 12:29 PM

Hello,

thank you for updating the patch.

I have added one more comment, and I also have a question. Could you please add (here or in a different patch) the plumbing needed in the LoopVectorizer to make the functions available for vectorization via this pass? Apologies if you have already done it and I am just missing it.

Francesco

include/llvm/Analysis/VectorVariant.h
35–38	We will have to support OpenMP 4.5 linear modifiers, which are rendered as 2-char tokens in the mangled names. Wouldn't it be better to avoid #defines of chars and instead use enums inside the VectorKind class?

fhahn added a subscriber: fhahn. Nov 6 2017, 6:11 AM

mmasten added inline comments. Nov 6 2017, 11:49 AM

include/llvm/Analysis/VectorVariant.h
35–38	Ok, I can change this to an enum inside the VectorKind class. I don't have a patch ready for the LoopVectorize part, but I will prepare one and put it up for review in the next few days. I also have a patch for clang that enables the "vector-variants" attribute support.

fpetrogalli added inline comments. Nov 6 2017, 2:25 PM

include/llvm/Analysis/VectorVariant.h
35–38	> Ok, I can change this to an enum inside the VectorKind class.
Thank you.
> I don't have a patch ready for the LoopVectorize part, but I will prepare one and put it up for review in the next few days.
That sounds great, thanks!
> I also have a patch for clang that enables the "vector-variants" attribute support.
Clang is already generating the list of mangled vector names as string attributes. I believe that you need to rearrange the code so that strings get produced in the vector-variants attribute.

tschuett added a subscriber: tschuett. Nov 7 2017, 3:20 AM

a.elovikov added a subscriber: a.elovikov. Nov 8 2017, 1:08 PM

mmasten added inline comments. Nov 27 2017, 4:38 PM

include/llvm/Analysis/VectorVariant.h
35–38	Hi Francesco, I have the LoopVectorize part of this done. Are there any objections to making it part of this review? Thanks, Matt

fpetrogalli added inline comments. Nov 28 2017, 12:41 AM

include/llvm/Analysis/VectorVariant.h
35–38	Hello Matt, please don't add code to this review. I'd prefer to see the changes related to the Vectorizer in a separate patch. Kind regards, Francesco

mmasten mentioned this in D40575: LoopVectorize support for simd functions. Nov 28 2017, 1:11 PM

Moved calcCharacteristicType() function to VectorUtils so that VecClone and LV can share.
Removed vector-variant function attributes in LV instead of VecClone because LV needs to see them.

mmasten added inline comments. Nov 28 2017, 1:17 PM

include/llvm/Analysis/VectorVariant.h
35–38	Thanks Francesco. I made a couple of updates to this patch and created a new patch for the LV side of things. The LV patch is revision D40575.

mmasten mentioned this in D40577: Clang support for simd functions. Nov 28 2017, 1:38 PM

In general, I think the VecClone pass is too complicated because it tries to handle the "optimized code" vs. "non-optimized code" cases separately. I don't think we should (or, in a theoretical sense, can) do that. We should have a uniform algorithm to handle all incoming IR. I think that we can do something like this:

1. Split the entry block at the top and move all allocas in the original entry block to the new entry block.
2. For each constant-sized alloca in the entry block, expand it by a factor of VL. For vectorizable types, you can do an alloca of the vector type. Otherwise, use an alloca to generate an array of VL items (i.e., use VL as the alloca's second parameter). Generate an alloca of <RetTy x VL> to hold the return value.
3. Generate a new loop around all of the rest of the function (for i = [0, VL-1]) and a new return block (which loads the value from the return-value alloca and returns it).
4. Replace all uses of each entry-block alloca, a, with &a[i]. Remove the old allocas.
5. All uses of function parameters are now inside the loop. vector parameters will get a vector alloca and store in the entry block, and uses in the loop of the parameter will be replaced by load param[i]. linear parameters get replaced by (p+i). uniform parameters are unchanged.
6. Replace the returns with a store to RetVal[i] and a branch to the loop-exit block.

Something like that should work for all functions, optimized or not.
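
The scheme above can be pictured at the source level with a hand-written analogue. The sketch below is not output of the pass and does not use LLVM; it assumes a hypothetical scalar function `foo` with one vector, one linear (unit stride), and one uniform parameter, and VL = 4. Locals and the return value are widened by VL, the body sits inside a loop over the lanes, the vector parameter is read as `param[i]`, the linear parameter as `p + i`, and the uniform parameter is used unchanged.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t VL = 4; // assumed vector length for this example

// Hypothetical scalar function, as a user might write it.
int foo(int V, int Lin, int Uni) {
  int Tmp = V * Uni; // a local stack value
  return Tmp + Lin;
}

// Hand-written analogue of the widened clone.
std::array<int, VL> fooVariant(const std::array<int, VL> &V, int Lin, int Uni) {
  std::array<int, VL> VParam = V; // vector parameter stored up front
  std::array<int, VL> Tmp{};      // local widened by a factor of VL
  std::array<int, VL> RetVal{};   // return value widened by a factor of VL
  for (std::size_t I = 0; I < VL; ++I) {
    Tmp[I] = VParam[I] * Uni;                          // load param[i]; uniform unchanged
    RetVal[I] = Tmp[I] + (Lin + static_cast<int>(I));  // linear becomes p + i
  }
  return RetVal; // the new return block hands back the whole vector
}
```

Each lane of `fooVariant` computes what `foo` would return for that lane's arguments, which is the property a loop vectorizer relies on when it later turns the lane loop into vector code.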

include/llvm/Analysis/TargetTransformInfo.h
633	This is X86 specific, and doesn't seem to belong here (also, it's not used anywhere).
648	This shouldn't be an enum like this. We don't need X86-specific things in the TTI interface. It looks like you've done a good job at making this opaque, so you can move this enum itself into the X86TTI implementation. The only thing we need here is the definition of UnknownISA (maybe define that to be an integer = -1).
653	UnknownISA
895	This shouldn't start with a capital letter. Either name it `isaClassMaxRegisterWidth` or `getISAClassMaxRegisterWidth`. I prefer the latter.
898	This shouldn't start with a capital letter. Either name it `isaClassToString` or `convertISAClassToString`. I prefer the latter.
include/llvm/Analysis/VectorVariant.h
35	These preprocessor defines should just be constexpr/static const ints.
include/llvm/Transforms/Utils/VecClone.h
27	Don't use all caps for these names. IT_Alloca or Alloca could work.
73	Please use consistent terminology: marked as SIMD -> requires a vector variant.
lib/Transforms/IPO/PassManagerBuilder.cpp
35	You also need to add the pass to the new pass manager: lib/Passes/PassBuilder.cpp
98	I wouldn't bother with this. It should be safe if no functions with the vector-variants attribute are present (and, if they are, then you need to run the pass or things might not link).
lib/Transforms/Utils/VecClone.cpp
87	What cleans up the extra loads/stores here (if we're optimizing)? Are you assuming that there's a run of SROA afterwards, or does InstCombine do a good job here, or something else?
163	1-many -> one-to-many
290	Move this near the top of the function and check, before you generate code, that the function doesn't already exist. If it does, bail out (e.g., return a nullptr and the caller can move on to the next variant/function).
346	I don't think you need the iterators; this can just be: for (auto &Arg : Clone->args()) and you use this pattern a lot (declaring two iterators and then using them in a for loop). In almost all of these cases, you should use a range-based for loop instead.
419	What happens if there's more than one return in the function? You might want, in that case, to create a new block with the return and convert all other returns to branches to that block.
484	In addition to branches, you need to handle SwitchInst and IndirectBrInst (and, for the latter, you need to find any place where the address of the return block is taken, and replace it with the address of the LoopExitBlock).
535	Use CreateAdd here so you can set both nuw and nsw on this increment.
639	What happens if there is more than one store user?

Thanks for the comments, Hal. Just to clarify your point #2, I think what you're saying is that we should start from a common parameter representation; i.e., parameters should be loaded/stored through memory. Please correct me if I'm wrong. I certainly think this would be a great way to reduce the complexity of the algorithm. The remainder of items in your list should already be covered, but some tweaking may be involved.

In D22792#962822, @mmasten wrote:

> Thanks for the comments, Hal.

No problem. Thanks for working on this!

> Just to clarify your point #2, I think what you're saying is that we should start from a common parameter representation; i.e., parameters should be loaded/stored through memory. Please correct me if I'm wrong. I certainly think this would be a great way to reduce the complexity of the algorithm. The remainder of items in your list should already be covered, but some tweaking may be involved.

For point #2, I'm saying that we should take all local stack allocations and make them wider by a factor of VL. Thinking about this as having VL simultaneously-running copies of the function, one per vector lane, each of those gets a separate "lane" of the local stack allocations. In point #5, I sketched how I'd handle parameters (I'm not exactly sure what you mean by common representation, as different kinds of parameters do require different handling (i.e., vector, uniform, scalar)). What is true is that, for unoptimized code, where the function arguments are generally stored in local stack allocations, all of those stores are now just inside the loop with everything else, so nothing special needs to happen. Does that make sense?

Ok, after doing some experimentation I believe I understand where you're heading with this. Once I have done some more refactoring I'll post a new version of VecClone for review to make sure we're on the same page.

kmitropo added a subscriber: kmitropo. Jan 24 2018, 4:16 PM

Herald added a subscriber: hintonda. Jan 24 2018, 4:16 PM

mmasten added inline comments. Jan 26 2018, 4:09 PM

lib/Transforms/Utils/VecClone.cpp
87	InstCombine will do it. I also know that EarlyCSE will work.
419	In all the test cases that I have used to this point, this type of re-wiring has already been done. Granted, I have mainly been testing some very simple multiple return functions, but if you have a test case where this happens it will be helpful. Thanks.

Extensive update to the VecClone algorithm based on Hal's feedback. VecClone pass is now supported through the new pass manager. Other minor code changes made.

The latest update includes some pretty extensive rework of the VecClone algorithm that Hal suggested. I also added support for VecClone in the new pass manager and since it is my first time doing this, I would welcome any specific comments on whether or not this was done correctly. Please note that I'm still working on fixing existing tests and will be adding new ones, but I wanted to make sure the overall algorithm is headed in the right direction. Thanks all.

Just a gentle reminder to have a look at the latest VecClone algorithm. Thanks, Matt.

Hi Matt - overall the patch looks good, I just have a couple of comments.

I think some of the testing you do on the vectorizer side of things should be removed from this patch and kept in the vectorizer patch. Here you should be testing only the cloning of the function.

Moreover, I would like to see more testing happening in the following cases:

when there is only one vector variant listed in the vector-variants attribute
when the vector variant listed in the attribute is not supported by the target and the cloning does not happen (say you are compiling for SSE but you only have AVX512 vector functions listed in the attribute).

Moreover, I have added one comment about using IR vector types instead of integers to map ISAs to vector register sizes. That would probably make extending this pass to SVE easier, because for vector-length-agnostic types we will need to do some symbolic computation on vector sizes.

Thank you,

Francesco

include/llvm/Analysis/TargetTransformInfo.h
898	Could you replace information about register size to use IR vector types? For example, you could use <16 x i8> for 128-bit wide registers, or <64 x i8> for 512-bit registers. You could then base all the computations of the VLEN of a vector function on that. I am asking because that will make it easier to handle the Vector Length Agnostic (VLA) vector functions that we will end up using for targets like the AArch64 Scalable Vector Extension (SVE).
test/Transforms/LoopVectorize/masked_simd_func.ll
1	Any test that checks the caller side functionality should be under the patch that enables the loop vectorizer to use the VecClone pass.
test/Transforms/LoopVectorize/simd_func.ll
83	I think we should unit test each variant, not a single test that picks up one vector function from a list of "vector-variants".
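
The suggestion to describe registers as vector types rather than bit widths can be sketched as follows. This is a plain C++ model, not LLVM's `VectorType`: `RegType`, `LaneCount`, and `lanesFor` are hypothetical names, and the `Scalable` flag is an assumed way to mark a vector-length-agnostic register whose element count is only a minimum.

```cpp
#include <cassert>

// Hypothetical model of a register described as <MinNumElts x iEltBits>.
struct RegType {
  unsigned MinNumElts; // element count (a minimum when Scalable is true)
  unsigned EltBits;    // width of each element in bits
  bool Scalable;       // true for vector-length-agnostic registers
};

// Number of lanes for a given element width; symbolic when Scalable is true.
struct LaneCount {
  unsigned Min;
  bool Scalable;
};

// VLEN follows from the register type: total bits divided by element bits.
LaneCount lanesFor(RegType Reg, unsigned ElemBits) {
  const unsigned RegBits = Reg.MinNumElts * Reg.EltBits;
  assert(ElemBits != 0 && RegBits % ElemBits == 0 &&
         "element width must evenly divide the register width");
  return LaneCount{RegBits / ElemBits, Reg.Scalable};
}
```

For example, a <16 x i8> register with 32-bit elements gives 4 lanes and a <64 x i8> register gives 16, while a scalable register keeps the result as a minimum that scales with the hardware vector length.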

mmasten added inline comments. Apr 26 2018, 11:04 AM

test/Transforms/LoopVectorize/masked_simd_func.ll
1	Thanks Francesco. Looks like I accidentally included this test in this patch. I looked to make sure it was already included in the LoopVectorize patch.

Herald added a reviewer: javed.absar. Apr 26 2018, 11:04 AM

joel_k_jones added a subscriber: joel_k_jones. Aug 20 2018, 4:40 PM

vchuravy added a subscriber: vchuravy. Oct 11 2018, 4:09 AM

ABataev added a subscriber: ABataev. Apr 12 2019, 11:27 AM

ABataev added inline comments.

include/llvm/Analysis/VectorVariant.h
38	Description of the class?
41	Explicitly set the base type to `char`
48	Use member initializers
69	`VectorKind(const VectorKind &Other) = default;`
75	No need to use `\brief` tag, you must remove it
77	const function
80	const function
83	const function
86	const function
91	const function
94	const function
97	const function
100	const function
103	const function
111	Why not `SmallString`?
112	Why not `llvm::raw_svector_ostream`?
132	`char`->`ParmKind`
144	Use `SmallVector` or `ArrayRef`
146	`std::string`->`StringRef`. Is this target-specific?
152	const function
155	const function
158	const function
161	const function
166	`SmallString`
166	const function
186	const function
193	const function
206	Target-specific?
212	Target-specific?
include/llvm/Transforms/Utils/VecClone.h
31	SmallVector
34	DenseMap
41	const function
51	const function

zsrkmyn added a subscriber: zsrkmyn. Mar 19 2020, 9:11 AM

Re ping

riccibruno added a subscriber: riccibruno. Jun 21 2020, 1:25 PM

Nuullll mentioned this in D141650: [VectorUtils] Enhance VFABI demangling API. Jan 18 2023, 7:08 PM

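Revision Contents

Path	Size
include/llvm/Analysis/TargetTransformInfo.h	59 lines
include/llvm/Analysis/TargetTransformInfoImpl.h	30 lines
include/llvm/Analysis/VectorVariant.h	228 lines
include/llvm/InitializePasses.h	1 line
include/llvm/LinkAllPasses.h	2 lines
include/llvm/Transforms/Utils/VecClone.h	122 lines
lib/Analysis/CMakeLists.txt	1 line
lib/Analysis/TargetTransformInfo.cpp	32 lines
lib/Analysis/VectorVariant.cpp	117 lines
lib/Passes/PassBuilder.cpp	5 lines
lib/Passes/PassRegistry.def	1 line
lib/Target/X86/X86TargetTransformInfo.h	18 lines
lib/Target/X86/X86TargetTransformInfo.cpp	131 lines
lib/Transforms/IPO/PassManagerBuilder.cpp	8 lines
lib/Transforms/Utils/CMakeLists.txt	1 line
lib/Transforms/Utils/VecClone.cpp	893 lines
test/Transforms/LoopVectorize/masked_simd_func.ll	107 lines
test/Transforms/LoopVectorize/simd_func.ll	99 lines
test/Transforms/LoopVectorize/simd_func_scalar.ll	111 lines
test/Transforms/VecClone/all_parm_types.ll	46 lines
test/Transforms/VecClone/broadcast.ll	19 lines
test/Transforms/VecClone/convert_linear.ll	32 lines
test/Transforms/VecClone/external_array.ll	35 lines
test/Transforms/VecClone/linear.ll	29 lines
test/Transforms/VecClone/linear_mem2reg.ll	22 lines
test/Transforms/VecClone/struct_linear_ptr.ll	40 lines
test/Transforms/VecClone/two_vec_sum.ll	59 lines
test/Transforms/VecClone/two_vec_sum_mask.ll	71 lines
test/Transforms/VecClone/two_vec_sum_mem2reg.ll	31 lines
test/Transforms/VecClone/uniform.ll	25 lines
test/Transforms/VecClone/vector_ptr.ll	25 lines
test/Transforms/VecClone/void_foo.ll	19 lines
tools/bugpoint/bugpoint.cpp	1 line
tools/opt/opt.cpp	1 line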

Diff 131670

include/llvm/Analysis/TargetTransformInfo.h

Show All 23 Lines

#include "llvm/ADT/Optional.h"		#include "llvm/ADT/Optional.h"
#include "llvm/IR/Operator.h"		#include "llvm/IR/Operator.h"
#include "llvm/IR/PassManager.h"		#include "llvm/IR/PassManager.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/AtomicOrdering.h"		#include "llvm/Support/AtomicOrdering.h"
#include "llvm/Support/DataTypes.h"		#include "llvm/Support/DataTypes.h"
#include <functional>		#include <functional>
		#include <string>

namespace llvm {		namespace llvm {

namespace Intrinsic {		namespace Intrinsic {
enum ID : unsigned;		enum ID : unsigned;
}		}

class Function;		class Function;
▲ Show 20 Lines • Show All 583 Lines • ▼ Show 20 Lines	enum OperandValueKind {
OK_UniformValue, // Operand is uniform (splat of a value).		OK_UniformValue, // Operand is uniform (splat of a value).
OK_UniformConstantValue, // Operand is uniform constant.		OK_UniformConstantValue, // Operand is uniform constant.
OK_NonUniformConstantValue // Operand is a non uniform constant value.		OK_NonUniformConstantValue // Operand is a non uniform constant value.
};		};

/// \brief Additional properties of an operand's values.		/// \brief Additional properties of an operand's values.
enum OperandValueProperties { OP_None = 0, OP_PowerOf2 = 1 };		enum OperandValueProperties { OP_None = 0, OP_PowerOf2 = 1 };

		/// \brief Default ISA for vector functions.
		static const int UnknownISA = -1;
		hfinkelUnsubmitted Not Done Reply Inline Actions This is X86 specific, and doesn't seem to belong here (also, it's not used anywhere). hfinkel: This is X86 specific, and doesn't seem to belong here (also, it's not used anywhere).

/// \return The number of scalar or vector registers that the target has.		/// \return The number of scalar or vector registers that the target has.
/// If 'Vectors' is true, it returns the number of vector registers. If it is		/// If 'Vectors' is true, it returns the number of vector registers. If it is
/// set to false, it returns the number of scalar registers.		/// set to false, it returns the number of scalar registers.
unsigned getNumberOfRegisters(bool Vector) const;		unsigned getNumberOfRegisters(bool Vector) const;

/// \return The width of the largest scalar or vector register type.		/// \return The width of the largest scalar or vector register type.
unsigned getRegisterBitWidth(bool Vector) const;		unsigned getRegisterBitWidth(bool Vector) const;

/// \return The width of the smallest vector register type.		/// \return The width of the smallest vector register type.
unsigned getMinVectorRegisterBitWidth() const;		unsigned getMinVectorRegisterBitWidth() const;

/// \return True if it should be considered for address type promotion.		/// \return True if it should be considered for address type promotion.
/// \p AllowPromotionWithoutCommonHeader Set true if promoting \p I is		/// \p AllowPromotionWithoutCommonHeader Set true if promoting \p I is
/// profitable without finding other extensions fed by the same input.		/// profitable without finding other extensions fed by the same input.
		hfinkelUnsubmitted Not Done Reply Inline Actions This shouldn't be an enum like this. We don't need X86-specific things in the TTI interface. It looks like you've done a good job at making this opaque, so you can move this enum itself into the X86TTI implementation. The only thing we need here is the definition of UnknownISA (maybe define that to be an integer = -1). hfinkel: This shouldn't be an enum like this. We don't need X86-specific things in the TTI interface. It…
bool shouldConsiderAddressTypePromotion(		bool shouldConsiderAddressTypePromotion(
const Instruction &I, bool &AllowPromotionWithoutCommonHeader) const;		const Instruction &I, bool &AllowPromotionWithoutCommonHeader) const;

/// \return The size of a cache line in bytes.		/// \return The size of a cache line in bytes.
unsigned getCacheLineSize() const;		unsigned getCacheLineSize() const;
		hfinkelUnsubmitted Not Done Reply Inline Actions UnknownISA hfinkel: UnknownISA

/// The possible cache levels		/// The possible cache levels
enum class CacheLevel {		enum class CacheLevel {
L1D, // The L1 data cache		L1D, // The L1 data cache
L2D, // The L2 data cache		L2D, // The L2 data cache

// We currently do not model L3 caches, as their sizes differ widely between		// We currently do not model L3 caches, as their sizes differ widely between
// microarchitectures. Also, we currently do not have a use for L3 cache		// microarchitectures. Also, we currently do not have a use for L3 cache
▲ Show 20 Lines • Show All 224 Lines • ▼ Show 20 Lines	unsigned getLoadVectorFactor(unsigned VF, unsigned LoadSize,
VectorType *VecTy) const;		VectorType *VecTy) const;

/// \returns The new vector factor value if the target doesn't support \p		/// \returns The new vector factor value if the target doesn't support \p
/// SizeInBytes stores or has a better vector factor.		/// SizeInBytes stores or has a better vector factor.
unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,		unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,
unsigned ChainSizeInBytes,		unsigned ChainSizeInBytes,
VectorType *VecTy) const;		VectorType *VecTy) const;

		/// \returns The maximum vector register width for \p ISAClass.
		unsigned getISAClassMaxRegisterWidth(int ISAClass) const;
		hfinkel: This shouldn't start with a capital letter. Either name it `isaClassMaxRegisterWidth` or `getISAClassMaxRegisterWidth`. I prefer the latter.

		/// \returns The ISA class as a string.
		std::string isaClassToString(int ISAClass) const;
		hfinkel: This shouldn't start with a capital letter. Either name it `isaClassToString` or `convertISAClassToString`. I prefer the latter.
		fpetrogalli: Could you replace information about register size to use IR vector types? For example, you could use <16 x i8> for 128-bit wide registers, or <64 x i8> for 512-bit registers. You could then base all the computations of the VLEN of a vector function on that. I am asking because that will make easier the handling of Vector Length Agnostic (VLA) vector functions that we will end up using for targets like the AArch64 Scalable Vector Extension (SVE).

		/// \returns The ISAClass based on the maximum vector register size supported
		/// by the target.
		int getISAClassForMaxVecRegSize() const;

		/// \returns The maximum vector register width based on ISAClass \p Class,
		/// as defined in the vector function ABI.
		unsigned maximumSizeofISAClassVectorRegister(int ISAClass, Type *Ty) const;

		/// \returns The encoded ISA class for the mangled vector variant name based
		/// on \p ISAClass.
		char encodeISAClass(int ISAClass) const;

		/// \returns The ISAClass from the character encoded \p ISAClass of the
		/// mangled vector variant function name.
		int decodeISAClass(char ISAClass) const;

		/// \returns The target legalized type of \p Ty based on ISAClass \p ISAClass.
		Type* promoteToSupportedType(Type *Ty, int ISAClass) const;

/// Flags describing the kind of vector reduction.		/// Flags describing the kind of vector reduction.
struct ReductionFlags {		struct ReductionFlags {
ReductionFlags() : IsMaxOp(false), IsSigned(false), NoNaN(false) {}		ReductionFlags() : IsMaxOp(false), IsSigned(false), NoNaN(false) {}
bool IsMaxOp; ///< If the op a min/max kind, true if it's a max operation.		bool IsMaxOp; ///< If the op a min/max kind, true if it's a max operation.
bool IsSigned; ///< Whether the operation is a signed int reduction.		bool IsSigned; ///< Whether the operation is a signed int reduction.
bool NoNaN; ///< If op is an fp min/max, whether NaNs may be present.		bool NoNaN; ///< If op is an fp min/max, whether NaNs may be present.
};		};

[185 lines not shown]	virtual bool isLegalToVectorizeStoreChain(unsigned ChainSizeInBytes,
unsigned Alignment,		unsigned Alignment,
unsigned AddrSpace) const = 0;		unsigned AddrSpace) const = 0;
virtual unsigned getLoadVectorFactor(unsigned VF, unsigned LoadSize,		virtual unsigned getLoadVectorFactor(unsigned VF, unsigned LoadSize,
unsigned ChainSizeInBytes,		unsigned ChainSizeInBytes,
VectorType *VecTy) const = 0;		VectorType *VecTy) const = 0;
virtual unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,		virtual unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,
unsigned ChainSizeInBytes,		unsigned ChainSizeInBytes,
VectorType *VecTy) const = 0;		VectorType *VecTy) const = 0;
		virtual unsigned getISAClassMaxRegisterWidth(int ISAClass) const = 0;
		virtual std::string isaClassToString(int ISAClass) const = 0;
		virtual int getISAClassForMaxVecRegSize() const = 0;
		virtual unsigned maximumSizeofISAClassVectorRegister(int ISAClass,
		Type *Ty) const = 0;
		virtual char encodeISAClass(int ISAClass) const = 0;
		virtual int decodeISAClass(char ISAClass) const = 0;
		virtual Type* promoteToSupportedType(Type *Ty, int ISAClass) const = 0;
virtual bool useReductionIntrinsic(unsigned Opcode, Type *Ty,		virtual bool useReductionIntrinsic(unsigned Opcode, Type *Ty,
ReductionFlags) const = 0;		ReductionFlags) const = 0;
virtual bool shouldExpandReduction(const IntrinsicInst *II) const = 0;		virtual bool shouldExpandReduction(const IntrinsicInst *II) const = 0;
virtual int getInstructionLatency(const Instruction *I) = 0;		virtual int getInstructionLatency(const Instruction *I) = 0;
};		};

template <typename T>		template <typename T>
class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {		class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {
[346 lines not shown]	unsigned getLoadVectorFactor(unsigned VF, unsigned LoadSize,
VectorType *VecTy) const override {		VectorType *VecTy) const override {
return Impl.getLoadVectorFactor(VF, LoadSize, ChainSizeInBytes, VecTy);		return Impl.getLoadVectorFactor(VF, LoadSize, ChainSizeInBytes, VecTy);
}		}
unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,		unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,
unsigned ChainSizeInBytes,		unsigned ChainSizeInBytes,
VectorType *VecTy) const override {		VectorType *VecTy) const override {
return Impl.getStoreVectorFactor(VF, StoreSize, ChainSizeInBytes, VecTy);		return Impl.getStoreVectorFactor(VF, StoreSize, ChainSizeInBytes, VecTy);
}		}
		unsigned getISAClassMaxRegisterWidth(int ISAClass) const override {
		return Impl.getISAClassMaxRegisterWidth(ISAClass);
		}
		std::string isaClassToString(int ISAClass) const override {
		return Impl.isaClassToString(ISAClass);
		}
		int getISAClassForMaxVecRegSize() const override {
		return Impl.getISAClassForMaxVecRegSize();
		}
		unsigned maximumSizeofISAClassVectorRegister(int ISAClass,
		Type *Ty) const override {
		return Impl.maximumSizeofISAClassVectorRegister(ISAClass, Ty);
		}
		char encodeISAClass(int ISAClass) const override {
		return Impl.encodeISAClass(ISAClass);
		}
		int decodeISAClass(char ISAClass) const override {
		return Impl.decodeISAClass(ISAClass);
		}
		Type* promoteToSupportedType(Type *Ty, int ISAClass) const override {
		return Impl.promoteToSupportedType(Ty, ISAClass);
		}
bool useReductionIntrinsic(unsigned Opcode, Type *Ty,		bool useReductionIntrinsic(unsigned Opcode, Type *Ty,
ReductionFlags Flags) const override {		ReductionFlags Flags) const override {
return Impl.useReductionIntrinsic(Opcode, Ty, Flags);		return Impl.useReductionIntrinsic(Opcode, Ty, Flags);
}		}
bool shouldExpandReduction(const IntrinsicInst *II) const override {		bool shouldExpandReduction(const IntrinsicInst *II) const override {
return Impl.shouldExpandReduction(II);		return Impl.shouldExpandReduction(II);
}		}
int getInstructionLatency(const Instruction *I) override {		int getInstructionLatency(const Instruction *I) override {
[104 lines not shown]
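
hfinkel's comments on this header ask that the TTI interface expose only an opaque integer ISA class plus an `UnknownISA` sentinel, with the concrete enum living in the X86 implementation. The following is a minimal standalone sketch of that layering, not code from the patch. The type names `GenericTTISketch` and `X86TTISketch`, the enumerator names and their order are all assumptions made for illustration; only the method names and the default return values are taken from the diff.

```cpp
#include <string>

// Hypothetical stand-in for the target-independent TTI interface.
// Only the UnknownISA sentinel is visible at this level.
struct GenericTTISketch {
  static constexpr int UnknownISA = -1;
  virtual ~GenericTTISketch() = default;

  // Defaults mirror the TargetTransformInfoImplBase stubs in the patch.
  virtual unsigned getISAClassMaxRegisterWidth(int /*ISAClass*/) const {
    return 0;
  }
  virtual std::string isaClassToString(int /*ISAClass*/) const {
    return "Unknown ISA";
  }
};

// Hypothetical X86 implementation that owns the concrete enum. The
// enumerator names and their order are assumptions for illustration.
struct X86TTISketch : GenericTTISketch {
  enum ISAClass : int { SSE = 0, AVX, AVX2, AVX512 };

  unsigned getISAClassMaxRegisterWidth(int Class) const override {
    switch (Class) {
    case SSE:
      return 128; // XMM registers
    case AVX:
    case AVX2:
      return 256; // YMM registers
    case AVX512:
      return 512; // ZMM registers
    default:
      return 0; // includes UnknownISA
    }
  }

  std::string isaClassToString(int Class) const override {
    switch (Class) {
    case SSE:
      return "SSE";
    case AVX:
      return "AVX";
    case AVX2:
      return "AVX2";
    case AVX512:
      return "AVX512";
    default:
      return GenericTTISketch::isaClassToString(Class);
    }
  }
};
```

Target-independent callers in this sketch only ever hold an `int`, so another target could define its own enum without touching the shared interface.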

include/llvm/Analysis/TargetTransformInfoImpl.h

[18 lines not shown]
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/VectorUtils.h"		#include "llvm/Analysis/VectorUtils.h"
#include "llvm/IR/CallSite.h"		#include "llvm/IR/CallSite.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/GetElementPtrTypeIterator.h"		#include "llvm/IR/GetElementPtrTypeIterator.h"
#include "llvm/IR/Operator.h"		#include "llvm/IR/Operator.h"
#include "llvm/IR/Type.h"		#include "llvm/IR/Type.h"
		#include <string>

namespace llvm {		namespace llvm {

/// \brief Base class for use as a mix-in that aids implementing		/// \brief Base class for use as a mix-in that aids implementing
/// a TargetTransformInfo-compatible class.		/// a TargetTransformInfo-compatible class.
class TargetTransformInfoImplBase {		class TargetTransformInfoImplBase {
protected:		protected:
typedef TargetTransformInfo TTI;		typedef TargetTransformInfo TTI;
[489 lines not shown]	public:
}		}

unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,		unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,
unsigned ChainSizeInBytes,		unsigned ChainSizeInBytes,
VectorType *VecTy) const {		VectorType *VecTy) const {
return VF;		return VF;
}		}

		unsigned getISAClassMaxRegisterWidth(int ISAClass) const {
		return 0;
		}

		std::string isaClassToString(int ISAClass) const {
		return "Unknown ISA";
		}

		int getISAClassForMaxVecRegSize() const {
		return TTI::UnknownISA;
		}

		// Used by VectorVariant to determine the VF of the simd function.
		unsigned maximumSizeofISAClassVectorRegister(int ISAClass, Type *Ty) const {
		return 0;
		}

		char encodeISAClass(int ISAClass) const {
		return '?';
		}

		int decodeISAClass(char ISAClass) const {
		return TTI::UnknownISA;
		}

		Type* promoteToSupportedType(Type *Ty, int ISAClass) const {
		return Ty;
		}

bool useReductionIntrinsic(unsigned Opcode, Type *Ty,		bool useReductionIntrinsic(unsigned Opcode, Type *Ty,
TTI::ReductionFlags Flags) const {		TTI::ReductionFlags Flags) const {
return false;		return false;
}		}

bool shouldExpandReduction(const IntrinsicInst *II) const {		bool shouldExpandReduction(const IntrinsicInst *II) const {
return true;		return true;
}		}
[283 lines not shown]
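
The default `maximumSizeofISAClassVectorRegister` above is described as being used by VectorVariant to determine the VF of the simd function, and fpetrogalli's inline comment suggests expressing register sizes as IR vector types such as <16 x i8> (128 bits) or <64 x i8> (512 bits). The arithmetic behind both can be sketched as a register width divided by an element width. This is only an illustration under that assumption; `computeVlenSketch` is a hypothetical helper and is not the patch's `calcVlen`.

```cpp
#include <cassert>

// Illustrative only: number of lanes that fit in a vector register.
// A register width of 0 (the default TTI answer in the patch) yields 0,
// which a caller could treat as "no vector variant available".
unsigned computeVlenSketch(unsigned RegisterWidthInBits,
                           unsigned ElementWidthInBits) {
  assert(ElementWidthInBits != 0 && "element width must be non-zero");
  return RegisterWidthInBits / ElementWidthInBits;
}
```

With 8-bit elements this gives 16 lanes for a 128-bit register and 64 lanes for a 512-bit register, matching the <16 x i8> and <64 x i8> types mentioned in the review.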

include/llvm/Analysis/VectorVariant.h

Property	Old Value	New Value
svn:eol-style	null	native
svn:keywords	null	Author Date Id Rev URL
svn:mime-type	null	text/plain

				//===---- llvm/Transforms/VectorVariant.h - Vector utilities -- C++ -----===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This header file defines the VectorVariant class and implements the encoding
				/// and decoding utilities for VectorVariant objects. Multiple VectorVariant
				/// objects can be created (masked, non-masked, etc.) and associated with the
				/// original scalar function. These objects are then used to clone new functions
				/// that can be vectorized. This class follows the standards defined in the
				/// vector function ABI.
				///
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TRANSFORMS_UTILS_INTEL_VECTORVARIANT_H
				#define LLVM_TRANSFORMS_UTILS_INTEL_VECTORVARIANT_H

				#include <vector>
				#include <sstream>
				#include <cctype>
				#include "llvm/Analysis/TargetTransformInfo.h"
				#include "llvm/IR/Type.h"
				#include "llvm/IR/DerivedTypes.h"
				#include "llvm/ADT/StringRef.h"
				#include "llvm/Support/CommandLine.h"

				static const int NOT_ALIGNED = 1;
				static const int POSITIVE = 1;
				static const int NEGATIVE = -1;

				hfinkel: These preprocessor macros should just be constexpr/static const ints.
				namespace llvm {

				class VectorKind {
				fpetrogalli: We will have to support OpenMP 4.5 linear modifiers, which are rendered as 2-char tokens in the mangled names. Wouldn't it be better to avoid #defines of chars and instead use enums inside the VectorKind class?
				mmasten (author): Ok, I can change this to an enum inside the VectorKind class. I don't have a patch ready for the LoopVectorize part, but I will prepare one and put it up for review in the next few days. I also have a patch for clang that enables the "vector-variants" attribute support.
				fpetrogalli:
				> Ok, I can change this to an enum inside the VectorKind class.
				Thank you.
				> I don't have a patch ready for the LoopVectorize part, but I will prepare one and put it up for review in the next few days.
				That sounds great, thanks!
				> I also have a patch for clang that enables the "vector-variants" attribute support.
				Clang is already generating the list of mangled vector names as string attributes. I believe that you need to rearrange the code so that strings get produced in the vector-variants attribute.
				mmasten (author): Hi Francesco, I have the LoopVectorize part of this done. Are there any objections to making it part of this review? Thanks, Matt
				fpetrogalli: Hello Matt, please don't add code to this review. I'd prefer to see the changes related to the Vectorizer in a separate patch. Kind regards, Francesco
				mmasten (author): Thanks Francesco. I made a couple of updates to this patch and created a new patch for the LV side of things. The LV patch is revision D40575.
				ABataev: Description of the class?

				public:
				enum ParmKind {
				ABataev: Explicitly set the base type to `char`
				StrideParmKind = 's',
				LinearParmKind = 'l',
				UniformParmKind = 'u',
				VectorParmKind = 'v'
				};

				VectorKind(char K, int S, int A = NOT_ALIGNED) {
				ABataev: Use member initializers

				assert((S == notAValue() || K == StrideParmKind || K == LinearParmKind) &&
				"only linear vectors have strides");

				assert((K != LinearParmKind || S != notAValue()) &&
				"linear vectors must have a stride");

				assert((K != StrideParmKind || S != notAValue()) &&
				"variable stride vectors must have a stride");

				assert((K != StrideParmKind || S >= 0) &&
				"variable stride position must be non-negative");

				assert(A > 0 && "alignment must be positive");

				Kind = K;
				Stride = S;
				Alignment = A;
				}

				VectorKind(const VectorKind &Other) {
				ABataev: `VectorKind(const VectorKind &Other) = default;`
				Kind = Other.Kind;
				Stride = Other.Stride;
				Alignment = Other.Alignment;
				}

				/// \brief Is the stride for a linear parameter a uniform variable? (i.e.,
				ABataev: No need to use `\brief` tag, you must remove it
				/// the stride is stored in a variable but is uniform)
				bool isVariableStride() { return Kind == StrideParmKind; }
				ABataev: const function

				/// \brief Is the stride for a linear variable non-unit stride?
				bool isNonUnitStride() { return Kind == LinearParmKind && Stride != 1; }
				ABataev: const function

				/// \brief Is the stride for a linear variable unit stride?
				bool isUnitStride() { return Kind == LinearParmKind && Stride == 1; }
				ABataev: const function

				/// \brief Is this a linear parameter?
				bool isLinear() {
				ABataev: const function
				return isVariableStride() || isNonUnitStride() || isUnitStride();
				}

				/// \brief Is this a uniform parameter?
				bool isUniform() { return Kind == UniformParmKind; }
				ABataev: const function

				/// \brief Is this a vector parameter?
				bool isVector() { return Kind == VectorParmKind; }
				ABataev: const function

				/// \brief Is the parameter aligned?
				bool isAligned() { return Alignment != NOT_ALIGNED; }
				ABataev: const function

				/// \brief Get the stride associated with a linear parameter.
				int getStride() { return Stride; }
				ABataev: const function

				/// \brief Get the alignment associated with a linear parameter.
				int getAlignment() { return Alignment; }
				ABataev: const function

				/// \brief Represents a don't care value for strides of parameters other
				/// than linear parameters.
				static int notAValue() { return -1; }

				/// \brief Encode the parameter information into a mangled string
				/// corresponding to the standards defined in the vector function ABI.
				std::string encode() {
				ABataev: Why not `SmallString`?
				std::stringstream SST;
				ABataev: Why not `llvm::raw_svector_ostream`?
				SST << Kind;

				if (isNonUnitStride()) {
				if (Stride >= 0)
				SST << Stride;
				else
				SST << "n" << -Stride;
				}

				if (isVariableStride())
				SST << Stride;

				if (isAligned())
				SST << 'a' << Alignment;

				return SST.str();
				}

				private:
				char Kind; // linear, uniform, vector
				ABataev: `char`->`ParmKind`
				int Stride;
				int Alignment;
				};

				class VectorVariant {

				private:
				const TargetTransformInfo *TTI;
				hfinkel: There's a lot of target information here in the target-independent code. Given that we're not going to vectorize without a target code model regardless, I'd like to see this information pushed into TargetTransformInfo. VecClone can then use TTI to convert the particular ISA tags into information about vector lengths, etc. Other architectures that are adapting this scheme can then extend this in a natural way.
				int ISAClass;
				bool Mask;
				unsigned int Vlen;
				std::vector<VectorKind> Parameters;
				ABataev: Use `SmallVector` or `ArrayRef`

				static std::string prefix() { return "_ZGV"; }
				ABataev: `std::string`->`StringRef`. Is this target-specific?

				public:
				VectorVariant(StringRef FuncName, const TargetTransformInfo *TTI);

				/// \brief Get the ISA corresponding to this vector variant.
				int getISA() { return ISAClass; }
				ABataev: const function

				/// \brief Is this a masked vector function variant?
				bool isMasked() { return Mask; }
				ABataev: const function

				/// \brief Get the vector length of the vector variant.
				unsigned int getVlen() { return Vlen; }
				ABataev: const function

				/// \brief Get the parameters of the vector variant.
				std::vector<VectorKind> &getParameters() { return Parameters; }
				ABataev: const function

				/// \brief Build the mangled name for the vector variant. This function
				/// builds a mangled name by including the encodings for the ISA class,
				/// mask information, and all parameters.
				std::string encode() {
				ABataev: `SmallString`
				ABataev: const function

				std::stringstream SST;
				SST << prefix() << TTI->encodeISAClass(ISAClass) << encodeMask(Mask) << Vlen;

				std::vector<VectorKind>::iterator It = Parameters.begin();
				std::vector<VectorKind>::iterator End = Parameters.end();

				if (isMasked())
				End--; // mask parameter is not encoded

				for (; It != End; ++It)
				SST << (*It).encode();

				SST << "_";

				return SST.str();
				}

				/// \brief Generate a function name corresponding to a vector variant.
				std::string generateFunctionName(StringRef ScalarFuncName) {
				ABataev: const function
				std::string Name = encode();
				return Name + ScalarFuncName.str();
				}

				/// \brief Some targets do not support particular types, so promote to a type
				/// that is supported.
				Type *promoteToSupportedType(Type *Ty) {
				ABataev: const function
				return TTI->promoteToSupportedType(Ty, getISA());
				}

				/// \brief Check to see if this is a vector variant based on the function
				/// name.
				static bool isVectorVariant(StringRef FuncName) {
				return FuncName.startswith(prefix());
				}

				/// \brief Encode the mask information for the mangled variant name.
				static char encodeMask(bool EncodeMask) {

				return EncodeMask ? 'M' : 'N';
				ABataev: Target-specific?
				}

				/// \brief Decode the mask information from the mangled variant name.
				static bool decodeMask(char MaskToDecode) {

				switch (MaskToDecode) {
				ABataev: Target-specific?
				case 'M':
				return true;
				case 'N':
				return false;
				}

				llvm_unreachable("unsupported mask");
				}

				/// \brief Calculate the vector length for the vector variant.
				unsigned calcVlen(int ISAClass, Type *Ty);
				};

				} // llvm namespace

				#endif // LLVM_TRANSFORMS_UTILS_INTEL_VECTORVARIANT_H
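
The encoding implemented by `VectorKind::encode()` and `VectorVariant::encode()` above can be sketched without any LLVM dependencies. The version below also applies several of the review suggestions (an enum with an explicit `char` base type, `const` member functions). It is a simplified illustration rather than the patch's code: `ParamSketch` and `mangleVariantSketch` are hypothetical names, the ISA character is passed in directly instead of coming from TTI, and the parameter list is assumed to exclude the trailing mask parameter, which the patch skips when encoding.

```cpp
#include <string>
#include <vector>

// Parameter kinds with an explicit char base type, as suggested in review.
enum class ParmKind : char {
  Stride = 's',
  Linear = 'l',
  Uniform = 'u',
  Vector = 'v'
};

// Hypothetical, simplified counterpart of VectorKind.
struct ParamSketch {
  ParmKind Kind;
  int Stride;    // only meaningful for Linear and Stride kinds
  int Alignment; // 1 means "not aligned", like NOT_ALIGNED in the patch

  std::string encode() const {
    std::string Out(1, static_cast<char>(Kind));
    // Unit-stride linear parameters carry no explicit stride; negative
    // strides are written as 'n' followed by the magnitude.
    if (Kind == ParmKind::Linear && Stride != 1)
      Out += Stride >= 0 ? std::to_string(Stride)
                         : "n" + std::to_string(-Stride);
    // For variable strides the value is the position of the stride argument.
    if (Kind == ParmKind::Stride)
      Out += std::to_string(Stride);
    if (Alignment != 1)
      Out += "a" + std::to_string(Alignment);
    return Out;
  }
};

// Builds "_ZGV" + ISA + mask + vlen + parameter encodings + "_" + name.
std::string mangleVariantSketch(char ISAChar, bool Masked, unsigned Vlen,
                                const std::vector<ParamSketch> &Params,
                                const std::string &ScalarName) {
  std::string Out = "_ZGV";
  Out += ISAChar;
  Out += Masked ? 'M' : 'N';
  Out += std::to_string(Vlen);
  for (const ParamSketch &P : Params)
    Out += P.encode();
  Out += "_";
  return Out + ScalarName;
}
```

For example, an unmasked variant of `foo` with ISA character `b`, vector length 4, and parameters (vector, unit-stride linear, uniform) would be named `_ZGVbN4vlu_foo` under this sketch.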

include/llvm/InitializePasses.h

	[371 lines not shown]
	void initializeVerifierLegacyPassPass(PassRegistry&);			void initializeVerifierLegacyPassPass(PassRegistry&);
	void initializeVirtRegMapPass(PassRegistry&);			void initializeVirtRegMapPass(PassRegistry&);
	void initializeVirtRegRewriterPass(PassRegistry&);			void initializeVirtRegRewriterPass(PassRegistry&);
	void initializeWholeProgramDevirtPass(PassRegistry&);			void initializeWholeProgramDevirtPass(PassRegistry&);
	void initializeWinEHPreparePass(PassRegistry&);			void initializeWinEHPreparePass(PassRegistry&);
	void initializeWriteBitcodePassPass(PassRegistry&);			void initializeWriteBitcodePassPass(PassRegistry&);
	void initializeWriteThinLTOBitcodePass(PassRegistry&);			void initializeWriteThinLTOBitcodePass(PassRegistry&);
	void initializeXRayInstrumentationPass(PassRegistry&);			void initializeXRayInstrumentationPass(PassRegistry&);
				void initializeVecClonePass(PassRegistry&);

	} // end namespace llvm			} // end namespace llvm

	#endif // LLVM_INITIALIZEPASSES_H			#endif // LLVM_INITIALIZEPASSES_H

include/llvm/LinkAllPasses.h

[42 lines not shown]
#include "llvm/Transforms/IPO/AlwaysInliner.h"		#include "llvm/Transforms/IPO/AlwaysInliner.h"
#include "llvm/Transforms/IPO/FunctionAttrs.h"		#include "llvm/Transforms/IPO/FunctionAttrs.h"
#include "llvm/Transforms/Instrumentation.h"		#include "llvm/Transforms/Instrumentation.h"
#include "llvm/Transforms/ObjCARC.h"		#include "llvm/Transforms/ObjCARC.h"
#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Scalar/GVN.h"		#include "llvm/Transforms/Scalar/GVN.h"
#include "llvm/Transforms/Utils/SymbolRewriter.h"		#include "llvm/Transforms/Utils/SymbolRewriter.h"
#include "llvm/Transforms/Utils/UnifyFunctionExitNodes.h"		#include "llvm/Transforms/Utils/UnifyFunctionExitNodes.h"
		#include "llvm/Transforms/Utils/VecClone.h"
#include "llvm/Transforms/Vectorize.h"		#include "llvm/Transforms/Vectorize.h"
#include <cstdlib>		#include <cstdlib>

namespace {		namespace {
struct ForcePassLinking {		struct ForcePassLinking {
ForcePassLinking() {		ForcePassLinking() {
// We must reference the passes in such a way that compilers will not		// We must reference the passes in such a way that compilers will not
// delete it all as dead code, even with whole program optimization,		// delete it all as dead code, even with whole program optimization,
[143 lines not shown]	ForcePassLinking() {
(void) llvm::createSpeculativeExecutionPass();		(void) llvm::createSpeculativeExecutionPass();
(void) llvm::createSpeculativeExecutionIfHasBranchDivergencePass();		(void) llvm::createSpeculativeExecutionIfHasBranchDivergencePass();
(void) llvm::createRewriteSymbolsPass();		(void) llvm::createRewriteSymbolsPass();
(void) llvm::createStraightLineStrengthReducePass();		(void) llvm::createStraightLineStrengthReducePass();
(void) llvm::createMemDerefPrinter();		(void) llvm::createMemDerefPrinter();
(void) llvm::createFloat2IntPass();		(void) llvm::createFloat2IntPass();
(void) llvm::createEliminateAvailableExternallyPass();		(void) llvm::createEliminateAvailableExternallyPass();
(void) llvm::createScalarizeMaskedMemIntrinPass();		(void) llvm::createScalarizeMaskedMemIntrinPass();
		(void) llvm::createVecClonePass();

(void)new llvm::IntervalPartition();		(void)new llvm::IntervalPartition();
(void)new llvm::ScalarEvolutionWrapperPass();		(void)new llvm::ScalarEvolutionWrapperPass();
llvm::Function::Create(nullptr, llvm::GlobalValue::ExternalLinkage)->viewCFGOnly();		llvm::Function::Create(nullptr, llvm::GlobalValue::ExternalLinkage)->viewCFGOnly();
llvm::RGPassManager RGM;		llvm::RGPassManager RGM;
llvm::TargetLibraryInfoImpl TLII;		llvm::TargetLibraryInfoImpl TLII;
llvm::TargetLibraryInfo TLI(TLII);		llvm::TargetLibraryInfo TLI(TLII);
llvm::AliasAnalysis AA(TLI);		llvm::AliasAnalysis AA(TLI);
[9 lines not shown]

include/llvm/Transforms/Utils/VecClone.h

Property	Old Value	New Value
svn:eol-style	null	native
svn:keywords	null	Author Date Id Rev URL
svn:mime-type	null	text/plain

				//===-------------- VecClone.h - Class definition -- C++ ----------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				// ===--------------------------------------------------------------------=== //
				///
				/// \file
				/// This file defines the VecClone pass class.
				///
				mehdi_amini: Why does this need to be in a public header?
				mmasten (author): Do you mean that the class definition should be moved to VecClone.cpp?
				// ===--------------------------------------------------------------------=== //

				#include "llvm/ADT/SmallSet.h"
				#include "llvm/Analysis/VectorVariant.h"
				#include "llvm/IR/Function.h"
				#include "llvm/IR/Instructions.h"
				#include "llvm/IR/Module.h"
				#include "llvm/IR/PassManager.h"
				#include "llvm/Pass.h"

				#ifndef LLVM_TRANSFORMS_VPO_VECCLONE_H
				#define LLVM_TRANSFORMS_VPO_VECCLONE_H

				namespace llvm {

				hfinkel: Don't use all caps for these names. IT_Alloca or Alloca could work.
				class ModulePass;

				/// \brief Contains the names of the declared vector function variants
				typedef std::vector<StringRef> DeclaredVariants;
				ABataev: SmallVector

				/// \brief Contains a mapping of a function to its vector function variants
				typedef std::map<Function*, DeclaredVariants> FunctionVariants;
				ABataev: DenseMap

				struct VecClonePass : public PassInfoMixin<VecClonePass> {

				public:
				/// \brief Get all functions marked for vectorization in module and their
				/// list of variants.
				void getFunctionsToVectorize(Module &M, FunctionVariants &FuncVars);
				ABataev: const function

				PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);

				// Glue for old PM
				bool runImpl(Module &M, Function &F, VectorVariant &Variant);

				private:
				/// \brief Returns a floating point or integer constant depending on Ty.
				template <typename T>
				Constant* getConstantValue(Type *Ty, LLVMContext &Context, T Val);
				ABataev: const function

				/// \brief Make a copy of the function if it requires a vector variant.
				Function* CloneFunction(Module &M, Function &F, VectorVariant &V);

				/// \brief Update the users of vector and linear parameters. Vector
				/// parameters must now be indexed to reference the appropriate
				/// element and for linear parameters the stride will be added.
				void updateParameterUsers(Function *Clone, VectorVariant &Variant,
				BasicBlock &EntryBlock, PHINode *Phi,
				const DataLayout &DL);

				/// \brief Performs a translation of a -> &a[i] for widened alloca
				/// instructions within the loop body of a simd function.
				void updateAllocaUsers(Function *Clone, PHINode *Phi,
				DenseMap<AllocaInst*, Instruction*> &AllocaMap);

				/// \brief Widen alloca instructions. Vector parameters will have a vector
				/// alloca of size VF and linear/uniform parameters will have an array
				/// alloca of size VF.
				void widenAllocaInstructions(
				Function *Clone,
				DenseMap<AllocaInst*, Instruction*> &AllocaMap,
				hfinkel: Please use consistent terminology: marked as SIMD -> requires a vector variant.
				BasicBlock &EntryBlock,
				VectorVariant &Variant,
				const DataLayout &DL);

				/// \brief Generate a loop around the function body.
				PHINode* generateLoopForFunctionBody(Function *Clone,
				BasicBlock *EntryBlock,
				BasicBlock *LoopBlock,
				BasicBlock *LoopExitBlock,
				BasicBlock *ReturnBlock,
				int VF);

				/// \brief Remove any incompatible parameter attributes as a result of
				/// widening vector parameters.
				void removeIncompatibleAttributes(Function *Clone);

				/// \brief Check to see if the function is simple enough that a loop does
				/// not need to be inserted into the function.
				bool isSimpleFunction(Function *Clone, BasicBlock &EntryBlock);

				/// \brief Inserts the if/else split and mask condition for masked SIMD
				/// functions.
				void insertSplitForMaskedVariant(Function *Clone, BasicBlock *LoopBlock,
				BasicBlock *LoopExitBlock,
				Instruction *Mask, PHINode *Phi);

				/// \brief Adds metadata to the conditional branch of the simd loop latch to
				/// prevent loop unrolling and to force vectorization at VF.
				void addLoopMetadata(BasicBlock *Latch, unsigned VF);
				};

				class VecClone : public ModulePass {

				bool runOnModule(Module &M) override;

				public:
				static char ID;
				VecClone();
				void print(raw_ostream &OS, const Module * = nullptr) const override;
				void getAnalysisUsage(AnalysisUsage &AU) const override;
				VecClonePass Impl;

				}; // end pass class

				ModulePass *createVecClonePass();

				} // end llvm namespace

				#endif // LLVM_TRANSFORMS_VPO_VECCLONE_H

lib/Analysis/CMakeLists.txt

Show First 20 Lines • Show All 78 Lines • ▼ Show 20 Lines	add_llvm_library(LLVMAnalysis
Trace.cpp		Trace.cpp
TypeBasedAliasAnalysis.cpp		TypeBasedAliasAnalysis.cpp
TypeMetadataUtils.cpp		TypeMetadataUtils.cpp
ScopedNoAliasAA.cpp		ScopedNoAliasAA.cpp
ValueLattice.cpp		ValueLattice.cpp
ValueLatticeUtils.cpp		ValueLatticeUtils.cpp
ValueTracking.cpp		ValueTracking.cpp
VectorUtils.cpp		VectorUtils.cpp
		VectorVariant.cpp

ADDITIONAL_HEADER_DIRS		ADDITIONAL_HEADER_DIRS
${LLVM_MAIN_INCLUDE_DIR}/llvm/Analysis		${LLVM_MAIN_INCLUDE_DIR}/llvm/Analysis

DEPENDS		DEPENDS
intrinsics_gen		intrinsics_gen
)		)

lib/Analysis/TargetTransformInfo.cpp

	Show First 20 Lines • Show All 578 Lines • ▼ Show 20 Lines

	unsigned TargetTransformInfo::getStoreVectorFactor(unsigned VF,			unsigned TargetTransformInfo::getStoreVectorFactor(unsigned VF,
	unsigned StoreSize,			unsigned StoreSize,
	unsigned ChainSizeInBytes,			unsigned ChainSizeInBytes,
	VectorType *VecTy) const {			VectorType *VecTy) const {
	return TTIImpl->getStoreVectorFactor(VF, StoreSize, ChainSizeInBytes, VecTy);			return TTIImpl->getStoreVectorFactor(VF, StoreSize, ChainSizeInBytes, VecTy);
	}			}

				unsigned
				TargetTransformInfo::getISAClassMaxRegisterWidth(int ISAClass) const {
				return TTIImpl->getISAClassMaxRegisterWidth(ISAClass);
				}

				std::string TargetTransformInfo::isaClassToString(int ISAClass) const {
				return TTIImpl->isaClassToString(ISAClass);
				}

				int TargetTransformInfo::getISAClassForMaxVecRegSize() const {
				return TTIImpl->getISAClassForMaxVecRegSize();
				}

				unsigned TargetTransformInfo::maximumSizeofISAClassVectorRegister(
				int ISAClass, Type *Ty) const {

				return TTIImpl->maximumSizeofISAClassVectorRegister(ISAClass, Ty);
				}

				char TargetTransformInfo::encodeISAClass(int ISAClass) const {
				return TTIImpl->encodeISAClass(ISAClass);
				}

				int TargetTransformInfo::decodeISAClass(char ISAClass) const {
				return TTIImpl->decodeISAClass(ISAClass);
				}

				Type* TargetTransformInfo::promoteToSupportedType(Type *Ty,
				int ISAClass) const {
				return TTIImpl->promoteToSupportedType(Ty, ISAClass);
				}

	bool TargetTransformInfo::useReductionIntrinsic(unsigned Opcode,			bool TargetTransformInfo::useReductionIntrinsic(unsigned Opcode,
	Type *Ty, ReductionFlags Flags) const {			Type *Ty, ReductionFlags Flags) const {
	return TTIImpl->useReductionIntrinsic(Opcode, Ty, Flags);			return TTIImpl->useReductionIntrinsic(Opcode, Ty, Flags);
	}			}

	bool TargetTransformInfo::shouldExpandReduction(const IntrinsicInst *II) const {			bool TargetTransformInfo::shouldExpandReduction(const IntrinsicInst *II) const {
	return TTIImpl->shouldExpandReduction(II);			return TTIImpl->shouldExpandReduction(II);
	}			}
	▲ Show 20 Lines • Show All 601 Lines • Show Last 20 Lines

lib/Analysis/VectorVariant.cpp

Property	Old Value	New Value
svn:eol-style	null	native
svn:keywords	null	Author Date Id Rev URL
svn:mime-type	null	text/plain

				//===---------- VectorVariant.cpp - Vector function ABI -- C++ ----------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This file implements the VectorVariant class and corresponding utilities.
				/// VectorVariant objects are associated with a scalar function and are used
				/// to generate new functions that can be vectorized. VectorVariants are
				/// determined by inspecting the function attributes associated with the scalar
				/// function. When a mangled function name is found in the attributes (indicated
				/// as "_ZGV"), a VectorVariant object is created. The class and utilities
				/// in this file follow the standards defined in the vector function ABI.
				///
				//===----------------------------------------------------------------------===//

				#include "llvm/Analysis/VectorVariant.h"
				#include "llvm/ADT/StringRef.h"
				#include "llvm/IR/Type.h"
				#include "llvm/Support/raw_ostream.h"

				using namespace llvm;

				/// \brief Generate a vector variant by decoding the mangled string for the
				/// variant contained in the original scalar function's attributes. For
				/// example: "_ZGVxN4". The name mangling is defined in the vector function
				/// ABI. Based on this string, the parameter kinds (uniform, linear, vector),
				/// vector length, parameter alignment, and masking are determined.
				VectorVariant::VectorVariant(StringRef FuncName, const TargetTransformInfo *TTI)
				: TTI(TTI) {

				assert(isVectorVariant(FuncName) && "invalid vector variant format");

				std::stringstream SST(FuncName.drop_front(prefix().size()));

				// mandatory annotations
				char EncodedISA;
				SST.get(EncodedISA);
				ISAClass = TTI->decodeISAClass(EncodedISA);

				char EncodedMask;
				SST.get(EncodedMask);
				Mask = decodeMask(EncodedMask);
				SST >> Vlen;

				// optional parameter annotations
				while (SST.peek() != '_') {

				char Kind;
				int Stride = VectorKind::notAValue();
				int StrideSign = POSITIVE;
				int Alignment = NOT_ALIGNED;

				// Get parameter kind
				SST.get(Kind);

				// Default stride for linear is 1. If the stride for a parameter is 1,
				// then the front-end will not encode it and we will not have set the
				// correct stride below.
				if (Kind == VectorKind::LinearParmKind)
				Stride = 1;

				// Handle optional stride
				if (SST.peek() == 'n') {
				// Stride is negative
				SST.ignore(1);
				StrideSign = NEGATIVE;
				}

				if (std::isdigit(SST.peek())) {
				// Extract constant stride
				SST >> Stride;
				assert((Kind != VectorKind::StrideParmKind || Stride >= 0) &&
				"variable stride argument index cannot be negative");
				}

				Stride *= StrideSign;
				// Handle optional alignment
				if (SST.peek() == 'a') {
				SST.ignore(1);
				SST >> Alignment;
				}

				VectorKind VecKind(Kind, Stride, Alignment);
				Parameters.push_back(VecKind);
				}

				if (Mask) {
				// Masked variants will have an additional mask parameter
				VectorKind VecKind(VectorKind::VectorParmKind, VectorKind::notAValue());
				Parameters.push_back(VecKind);
				}
				}
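
The decoding steps in the constructor above can be sketched outside of LLVM as follows. This is a minimal, standalone illustration and not the patch's code: `ParamInfo`, `ParsedVariant`, `readNumber`, and `parseVariantName` are hypothetical names, the kind letters ('u', 'l', 'v', 's') are assumed from the vector function ABI, and the sketch does not append the extra mask parameter that the patch adds for masked variants.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the patch's VectorKind.
struct ParamInfo {
  char Kind;     // assumed letters: 'u' uniform, 'l' linear, 'v' vector, 's' variable stride
  int Stride;    // 0 when no stride applies
  int Alignment; // 0 when no alignment was encoded
};

// Hypothetical, simplified stand-in for the patch's VectorVariant.
struct ParsedVariant {
  char ISA = 0;
  bool Masked = false;
  unsigned Vlen = 0;
  std::vector<ParamInfo> Params;
  std::string ScalarName;
};

static bool isDigitAt(const std::string &S, std::size_t Pos) {
  return Pos < S.size() && std::isdigit(static_cast<unsigned char>(S[Pos]));
}

// Reads a run of decimal digits starting at Pos and advances Pos past them.
static int readNumber(const std::string &S, std::size_t &Pos) {
  int Value = 0;
  while (isDigitAt(S, Pos))
    Value = Value * 10 + (S[Pos++] - '0');
  return Value;
}

// Returns false unless Name looks like _ZGV<isa><mask><vlen><params>_<name>.
bool parseVariantName(const std::string &Name, ParsedVariant &Out) {
  Out = ParsedVariant{};
  const std::string Prefix = "_ZGV";
  if (Name.compare(0, Prefix.size(), Prefix) != 0)
    return false;
  std::size_t Pos = Prefix.size();
  if (Pos + 2 > Name.size())
    return false;

  // Mandatory annotations: ISA letter, mask letter, vector length.
  Out.ISA = Name[Pos++];
  const char MaskChar = Name[Pos++];
  if (MaskChar != 'M' && MaskChar != 'N')
    return false;
  Out.Masked = (MaskChar == 'M');
  if (!isDigitAt(Name, Pos))
    return false;
  Out.Vlen = static_cast<unsigned>(readNumber(Name, Pos));

  // Optional per-parameter annotations, terminated by '_'.
  while (Pos < Name.size() && Name[Pos] != '_') {
    ParamInfo P{Name[Pos++], 0, 0};
    if (P.Kind == 'l')
      P.Stride = 1; // a linear stride of 1 is not encoded by the front end
    int Sign = 1;
    if (Pos < Name.size() && Name[Pos] == 'n') {
      Sign = -1;
      ++Pos;
    }
    if (isDigitAt(Name, Pos))
      P.Stride = readNumber(Name, Pos);
    P.Stride *= Sign;
    if (Pos < Name.size() && Name[Pos] == 'a') {
      ++Pos;
      P.Alignment = readNumber(Name, Pos);
    }
    Out.Params.push_back(P);
  }
  if (Pos >= Name.size())
    return false; // missing '_' terminator
  Out.ScalarName = Name.substr(Pos + 1);
  return true;
}
```

Unlike the `while (SST.peek() != '_')` loop in the patch, this sketch bounds the scan by the string length, so a name without the terminating '_' is rejected rather than read past its end.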

				/// \brief Determine the vector variant's vector length based on the
				/// characteristic data type defined in the vector function ABI and target
				/// vector register width.
				unsigned int VectorVariant::calcVlen(int ISAClass,
				Type *CharacteristicDataType) {
				assert(CharacteristicDataType &&
				CharacteristicDataType->getPrimitiveSizeInBits() != 0 &&
				"expected characteristic data type to have a primitive size in bits");

				unsigned int VectorRegisterSize =
				TTI->maximumSizeofISAClassVectorRegister(ISAClass,
				CharacteristicDataType);

				assert(VectorRegisterSize != 0 && "could not find vector register size for "
				"ISAClass - check to make sure it's "
				"supported in TTI");

				return VectorRegisterSize / CharacteristicDataType->getPrimitiveSizeInBits();
				}
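
As a worked instance of the division performed by `calcVlen`, the following sketch computes a vector length from a register width and an element width. `vectorLength` is a hypothetical helper, not part of the patch, and the requirement that the register width be a multiple of the element width is an assumption of the sketch.

```cpp
#include <cassert>

// Hypothetical helper mirroring the final division in calcVlen:
// vector length = vector register width / characteristic data type width.
unsigned vectorLength(unsigned RegisterBits, unsigned ElementBits) {
  assert(ElementBits != 0 && "element width must be non-zero");
  assert(RegisterBits % ElementBits == 0 &&
         "sketch assumes the register width is a multiple of the element width");
  return RegisterBits / ElementBits;
}
```

For example, a 128-bit register and a 32-bit `float` characteristic type give a vector length of 4, which matches the `4` in a name such as `_ZGVbN4`.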

lib/Passes/PassBuilder.cpp

Show First 20 Lines • Show All 136 Lines • ▼ Show 20 Lines
#include "llvm/Transforms/Utils/LibCallsShrinkWrap.h"		#include "llvm/Transforms/Utils/LibCallsShrinkWrap.h"
#include "llvm/Transforms/Utils/LoopSimplify.h"		#include "llvm/Transforms/Utils/LoopSimplify.h"
#include "llvm/Transforms/Utils/LowerInvoke.h"		#include "llvm/Transforms/Utils/LowerInvoke.h"
#include "llvm/Transforms/Utils/Mem2Reg.h"		#include "llvm/Transforms/Utils/Mem2Reg.h"
#include "llvm/Transforms/Utils/NameAnonGlobals.h"		#include "llvm/Transforms/Utils/NameAnonGlobals.h"
#include "llvm/Transforms/Utils/PredicateInfo.h"		#include "llvm/Transforms/Utils/PredicateInfo.h"
#include "llvm/Transforms/Utils/SimplifyInstructions.h"		#include "llvm/Transforms/Utils/SimplifyInstructions.h"
#include "llvm/Transforms/Utils/SymbolRewriter.h"		#include "llvm/Transforms/Utils/SymbolRewriter.h"
		#include "llvm/Transforms/Utils/VecClone.h"
#include "llvm/Transforms/Vectorize/LoopVectorize.h"		#include "llvm/Transforms/Vectorize/LoopVectorize.h"
#include "llvm/Transforms/Vectorize/SLPVectorizer.h"		#include "llvm/Transforms/Vectorize/SLPVectorizer.h"

#include <type_traits>		#include <type_traits>

using namespace llvm;		using namespace llvm;

static cl::opt<unsigned> MaxDevirtIterations("pm-max-devirt-iterations",		static cl::opt<unsigned> MaxDevirtIterations("pm-max-devirt-iterations",
▲ Show 20 Lines • Show All 570 Lines • ▼ Show 20 Lines	PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
OptimizePM.addPass(createFunctionToLoopPassAdaptor(LoopRotatePass()));		OptimizePM.addPass(createFunctionToLoopPassAdaptor(LoopRotatePass()));

// Distribute loops to allow partial vectorization. I.e. isolate dependences		// Distribute loops to allow partial vectorization. I.e. isolate dependences
// into separate loop that would otherwise inhibit vectorization. This is		// into separate loop that would otherwise inhibit vectorization. This is
// currently only performed for loops marked with the metadata		// currently only performed for loops marked with the metadata
// llvm.loop.distribute=true or when -enable-loop-distribute is specified.		// llvm.loop.distribute=true or when -enable-loop-distribute is specified.
OptimizePM.addPass(LoopDistributePass());		OptimizePM.addPass(LoopDistributePass());

		// Insert a loop with VF trip count around the body of functions that are
		// vector variants.
		MPM.addPass(VecClonePass());

// Now run the core loop vectorizer.		// Now run the core loop vectorizer.
OptimizePM.addPass(LoopVectorizePass());		OptimizePM.addPass(LoopVectorizePass());

// Eliminate loads by forwarding stores from the previous iteration to loads		// Eliminate loads by forwarding stores from the previous iteration to loads
// of the current iteration.		// of the current iteration.
OptimizePM.addPass(LoopLoadEliminationPass());		OptimizePM.addPass(LoopLoadEliminationPass());

// Cleanup after the loop optimization passes.		// Cleanup after the loop optimization passes.
▲ Show 20 Lines • Show All 1,052 Lines • Show Last 20 Lines

lib/Passes/PassRegistry.def

	Show First 20 Lines • Show All 66 Lines • ▼ Show 20 Lines
	MODULE_PASS("print", PrintModulePass(dbgs()))			MODULE_PASS("print", PrintModulePass(dbgs()))
	MODULE_PASS("print-lcg", LazyCallGraphPrinterPass(dbgs()))			MODULE_PASS("print-lcg", LazyCallGraphPrinterPass(dbgs()))
	MODULE_PASS("print-lcg-dot", LazyCallGraphDOTPrinterPass(dbgs()))			MODULE_PASS("print-lcg-dot", LazyCallGraphDOTPrinterPass(dbgs()))
	MODULE_PASS("rewrite-symbols", RewriteSymbolPass())			MODULE_PASS("rewrite-symbols", RewriteSymbolPass())
	MODULE_PASS("rpo-functionattrs", ReversePostOrderFunctionAttrsPass())			MODULE_PASS("rpo-functionattrs", ReversePostOrderFunctionAttrsPass())
	MODULE_PASS("sample-profile", SampleProfileLoaderPass())			MODULE_PASS("sample-profile", SampleProfileLoaderPass())
	MODULE_PASS("strip-dead-prototypes", StripDeadPrototypesPass())			MODULE_PASS("strip-dead-prototypes", StripDeadPrototypesPass())
	MODULE_PASS("wholeprogramdevirt", WholeProgramDevirtPass())			MODULE_PASS("wholeprogramdevirt", WholeProgramDevirtPass())
				MODULE_PASS("vec-clone", VecClonePass())
	MODULE_PASS("verify", VerifierPass())			MODULE_PASS("verify", VerifierPass())
	#undef MODULE_PASS			#undef MODULE_PASS

	#ifndef CGSCC_ANALYSIS			#ifndef CGSCC_ANALYSIS
	#define CGSCC_ANALYSIS(NAME, CREATE_PASS)			#define CGSCC_ANALYSIS(NAME, CREATE_PASS)
	#endif			#endif
	CGSCC_ANALYSIS("no-op-cgscc", NoOpCGSCCAnalysis())			CGSCC_ANALYSIS("no-op-cgscc", NoOpCGSCCAnalysis())
	CGSCC_ANALYSIS("fam-proxy", FunctionAnalysisManagerCGSCCProxy())			CGSCC_ANALYSIS("fam-proxy", FunctionAnalysisManagerCGSCCProxy())
	▲ Show 20 Lines • Show All 155 Lines • Show Last 20 Lines

lib/Target/X86/X86TargetTransformInfo.h

Show First 20 Lines • Show All 52 Lines • ▼ Show 20 Lines	llvm::Optional<unsigned> getCacheSize(
TargetTransformInfo::CacheLevel Level) const;		TargetTransformInfo::CacheLevel Level) const;
llvm::Optional<unsigned> getCacheAssociativity(		llvm::Optional<unsigned> getCacheAssociativity(
TargetTransformInfo::CacheLevel Level) const;		TargetTransformInfo::CacheLevel Level) const;
/// @}		/// @}

/// \name Vector TTI Implementations		/// \name Vector TTI Implementations
/// @{		/// @{

		/// ISA classes defined in the vector function ABI.
		enum ISAClass {
		SSE,
		AVX,
		AVX2,
		AVX512,
		ISAClassesNum
		};

unsigned getNumberOfRegisters(bool Vector);		unsigned getNumberOfRegisters(bool Vector);
unsigned getRegisterBitWidth(bool Vector) const;		unsigned getRegisterBitWidth(bool Vector) const;
unsigned getLoadStoreVecRegBitWidth(unsigned AS) const;		unsigned getLoadStoreVecRegBitWidth(unsigned AS) const;
unsigned getMaxInterleaveFactor(unsigned VF);		unsigned getMaxInterleaveFactor(unsigned VF);
int getArithmeticInstrCost(		int getArithmeticInstrCost(
unsigned Opcode, Type *Ty,		unsigned Opcode, Type *Ty,
TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,
▲ Show 20 Lines • Show All 55 Lines • ▼ Show 20 Lines	public:
bool isLegalMaskedStore(Type *DataType);		bool isLegalMaskedStore(Type *DataType);
bool isLegalMaskedGather(Type *DataType);		bool isLegalMaskedGather(Type *DataType);
bool isLegalMaskedScatter(Type *DataType);		bool isLegalMaskedScatter(Type *DataType);
bool hasDivRemOp(Type *DataType, bool IsSigned);		bool hasDivRemOp(Type *DataType, bool IsSigned);
bool areInlineCompatible(const Function *Caller,		bool areInlineCompatible(const Function *Caller,
const Function *Callee) const;		const Function *Callee) const;
bool enableMemCmpExpansion(unsigned &MaxLoadSize);		bool enableMemCmpExpansion(unsigned &MaxLoadSize);
bool enableInterleavedAccessVectorization();		bool enableInterleavedAccessVectorization();

		unsigned getISAClassMaxRegisterWidth(int ISAClass) const;
		std::string isaClassToString(int ISAClass) const;
		int getISAClassForMaxVecRegSize() const;
		unsigned maximumSizeofISAClassVectorRegister(int ISAClass,
		Type *Ty) const;
		char encodeISAClass(int ISAClass) const;
		int decodeISAClass(char ISAClass) const;
		Type* promoteToSupportedType(Type *Ty, int ISAClass) const;
private:		private:
int getGSScalarCost(unsigned Opcode, Type *DataTy, bool VariableMask,		int getGSScalarCost(unsigned Opcode, Type *DataTy, bool VariableMask,
unsigned Alignment, unsigned AddressSpace);		unsigned Alignment, unsigned AddressSpace);
int getGSVectorCost(unsigned Opcode, Type *DataTy, Value *Ptr,		int getGSVectorCost(unsigned Opcode, Type *DataTy, Value *Ptr,
unsigned Alignment, unsigned AddressSpace);		unsigned Alignment, unsigned AddressSpace);

/// @}		/// @}
};		};

} // end namespace llvm		} // end namespace llvm

#endif		#endif

lib/Target/X86/X86TargetTransformInfo.cpp

Show First 20 Lines • Show All 2,802 Lines • ▼ Show 20 Lines	return getInterleavedMemoryOpCostAVX512(Opcode, VecTy, Factor, Indices,
Alignment, AddressSpace);		Alignment, AddressSpace);
if (ST->hasAVX2())		if (ST->hasAVX2())
return getInterleavedMemoryOpCostAVX2(Opcode, VecTy, Factor, Indices,		return getInterleavedMemoryOpCostAVX2(Opcode, VecTy, Factor, Indices,
Alignment, AddressSpace);		Alignment, AddressSpace);

return BaseT::getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,		return BaseT::getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,
Alignment, AddressSpace);		Alignment, AddressSpace);
}		}

		unsigned X86TTIImpl::getISAClassMaxRegisterWidth(int ISAClass) const {
		switch (ISAClass) {
		case SSE:
		return 128;
		case AVX:
		case AVX2:
		return 256;
		case AVX512:
		return 512;
		default:
		llvm_unreachable("unsupported ISA class");
		}
		}

		std::string X86TTIImpl::isaClassToString(int ISAClass) const {
		switch (ISAClass) {
		case SSE:
		return "SSE";
		case AVX:
		return "AVX";
		case AVX2:
		return "AVX2";
		case AVX512:
		return "AVX512";
		default:
		llvm_unreachable("Unknown ISA class");
		}
		}

		int X86TTIImpl::getISAClassForMaxVecRegSize() const {
		ISAClass TargetIsaClass;
		unsigned TargetMaxRegWidth = getRegisterBitWidth(true);
		switch (TargetMaxRegWidth) {
		case 128:
		TargetIsaClass = SSE;
		break;
		case 256:
		if (ST->hasAVX2())
		TargetIsaClass = AVX2;
		fpetrogalli: Section 2.6 "Vector Function Name Mangling" of [1] maps the IsaClass to other values, 'x', 'y'. Are you basing this on [2] for gcc compatibility? Why do [1] and [2] differ? Francesco
		[1] https://software.intel.com/sites/default/files/managed/b4/c8/Intel-Vector-Function-ABI.pdf
		[2] https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt
		mmasten (author): Currently, the implementation is based on gcc compatibility. However, it would be nice to extend it to support both gcc and icc. The IsaClass values are different because of the calling convention differences. Intel icc uses the regcall calling convention and gcc uses the standard calling convention. I've added references to both ABI documents.
		else
		TargetIsaClass = AVX;
		break;
		case 512:
		TargetIsaClass = AVX512;
		break;
		default:
		llvm_unreachable("Invalid target vector register width");
		}
		return TargetIsaClass;
		}

		unsigned X86TTIImpl::maximumSizeofISAClassVectorRegister(int ISAClass,
		Type *Ty) const {

		assert((Ty->isIntegerTy() || Ty->isFloatTy() || Ty->isDoubleTy() ||
		Ty->isPointerTy()) &&
		"unsupported type");

		unsigned int VectorRegisterSize = 0;

		switch (ISAClass) {
		case SSE:
		VectorRegisterSize = 128;
		break;
		case AVX:
		if (Ty->isIntegerTy() || Ty->isPointerTy())
		VectorRegisterSize = 128;
		else
		VectorRegisterSize = 256;
		break;
		case AVX2:
		if (Ty->isIntegerTy(8))
		VectorRegisterSize = 128;
		else
		VectorRegisterSize = 256;
		break;
		case AVX512:
		VectorRegisterSize = 512;
		break;
		default:
		llvm_unreachable("unknown isa class");
		return 0;
		}

		assert(VectorRegisterSize != 0 && "unsupported ISA/type combination");
		return VectorRegisterSize;
		}

		char X86TTIImpl::encodeISAClass(int ISAClass) const {
		switch (ISAClass) {
		case SSE:
		return 'b';
		case AVX:
		return 'c';
		case AVX2:
		return 'd';
		case AVX512:
		return 'e';
		default:
		break;
		}

		assert(false && "unsupported ISA class");
		return '?';
		}

		int X86TTIImpl::decodeISAClass(char ISAClass) const {
		switch (ISAClass) {
		case 'b':
		return SSE;
		case 'c':
		return AVX;
		case 'd':
		return AVX2;
		case 'e':
		return AVX512;
		default:
		llvm_unreachable("unsupported ISA class");
		return SSE;
		}
		}

		Type* X86TTIImpl::promoteToSupportedType(Type *Ty, int ISAClass) const {
		// On ZMM promote char and short to int
		if (ISAClass == AVX512 && (Ty->isIntegerTy(8) ||
		Ty->isIntegerTy(16)))
		return Type::getInt32Ty(Ty->getContext());

		return Ty;
		}
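
The ISA-class tables above can be summarized in a small standalone sketch. The enums `Isa` and `Elem` and the three functions are hypothetical stand-ins (the patch works on `llvm::Type` and an enum inside `X86TTIImpl`); the widths and the letters 'b' through 'e' are copied from the switches in this diff, and returning `false` for an unknown letter is a choice made for this sketch rather than the patch's `llvm_unreachable`.

```cpp
#include <cassert>

// Hypothetical stand-ins for the patch's ISAClass enum and llvm::Type queries.
enum class Isa { SSE, AVX, AVX2, AVX512 };
enum class Elem { I8, I16, I32, I64, F32, F64, Ptr };

static bool isIntOrPtr(Elem E) { return E != Elem::F32 && E != Elem::F64; }

// Mirrors the switch in maximumSizeofISAClassVectorRegister.
unsigned maxVectorRegisterBits(Isa Class, Elem E) {
  switch (Class) {
  case Isa::SSE:
    return 128;
  case Isa::AVX:
    // Integer and pointer types are limited to 128 bits on AVX.
    return isIntOrPtr(E) ? 128 : 256;
  case Isa::AVX2:
    // Only 8-bit integers are limited to 128 bits on AVX2.
    return E == Elem::I8 ? 128 : 256;
  case Isa::AVX512:
    return 512;
  }
  return 0;
}

// Mirrors encodeISAClass: the gcc-compatible mangling letters used in the diff.
char encodeIsa(Isa Class) {
  switch (Class) {
  case Isa::SSE:
    return 'b';
  case Isa::AVX:
    return 'c';
  case Isa::AVX2:
    return 'd';
  case Isa::AVX512:
    return 'e';
  }
  return '?';
}

// Mirrors decodeISAClass, but reports unknown letters instead of aborting.
bool decodeIsa(char Letter, Isa &Out) {
  switch (Letter) {
  case 'b':
    Out = Isa::SSE;
    return true;
  case 'c':
    Out = Isa::AVX;
    return true;
  case 'd':
    Out = Isa::AVX2;
    return true;
  case 'e':
    Out = Isa::AVX512;
    return true;
  default:
    return false;
  }
}
```

Under these tables, an `int` characteristic type on AVX yields a 128-bit register, so the resulting vector length would be 4 rather than 8.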

lib/Transforms/IPO/PassManagerBuilder.cpp

Show All 26 Lines
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/LegacyPassManager.h"		#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/ModuleSummaryIndex.h"		#include "llvm/IR/ModuleSummaryIndex.h"
#include "llvm/IR/Verifier.h"		#include "llvm/IR/Verifier.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/ManagedStatic.h"		#include "llvm/Support/ManagedStatic.h"
#include "llvm/Target/TargetMachine.h"		#include "llvm/Target/TargetMachine.h"
#include "llvm/Transforms/IPO.h"		#include "llvm/Transforms/IPO.h"
#include "llvm/Transforms/IPO/ForceFunctionAttrs.h"		#include "llvm/Transforms/IPO/ForceFunctionAttrs.h"
		hfinkel: You also need to add the pass to the new pass manager: lib/Passes/PassBuilder.cpp
#include "llvm/Transforms/IPO/FunctionAttrs.h"		#include "llvm/Transforms/IPO/FunctionAttrs.h"
#include "llvm/Transforms/IPO/InferFunctionAttrs.h"		#include "llvm/Transforms/IPO/InferFunctionAttrs.h"
#include "llvm/Transforms/Instrumentation.h"		#include "llvm/Transforms/Instrumentation.h"
#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Scalar/GVN.h"		#include "llvm/Transforms/Scalar/GVN.h"
#include "llvm/Transforms/Scalar/SimpleLoopUnswitch.h"		#include "llvm/Transforms/Scalar/SimpleLoopUnswitch.h"
#include "llvm/Transforms/Vectorize.h"		#include "llvm/Transforms/Vectorize.h"
		#include "llvm/Transforms/Utils/VecClone.h"

using namespace llvm;		using namespace llvm;

static cl::opt<bool>		static cl::opt<bool>
RunPartialInlining("enable-partial-inlining", cl::init(false), cl::Hidden,		RunPartialInlining("enable-partial-inlining", cl::init(false), cl::Hidden,
cl::ZeroOrMore, cl::desc("Run Partial inlinining pass"));		cl::ZeroOrMore, cl::desc("Run Partial inlinining pass"));

static cl::opt<bool>		static cl::opt<bool>
Show All 38 Lines	UseCFLAA("use-cfl-aa", cl::init(CFLAAType::None), cl::Hidden,
"Enable inclusion-based CFL-AA"),		"Enable inclusion-based CFL-AA"),
clEnumValN(CFLAAType::Both, "both",		clEnumValN(CFLAAType::Both, "both",
"Enable both variants of CFL-AA")));		"Enable both variants of CFL-AA")));

static cl::opt<bool> EnableLoopInterchange(		static cl::opt<bool> EnableLoopInterchange(
"enable-loopinterchange", cl::init(false), cl::Hidden,		"enable-loopinterchange", cl::init(false), cl::Hidden,
cl::desc("Enable the new, experimental LoopInterchange Pass"));		cl::desc("Enable the new, experimental LoopInterchange Pass"));

static cl::opt<bool>		static cl::opt<bool>
		hfinkel: I wouldn't bother with this. It should be safe if no functions with the vector-variants attribute are present (and, if they are, then you need to run the pass or things might not link).
EnablePrepareForThinLTO("prepare-for-thinlto", cl::init(false), cl::Hidden,		EnablePrepareForThinLTO("prepare-for-thinlto", cl::init(false), cl::Hidden,
cl::desc("Enable preparation for ThinLTO."));		cl::desc("Enable preparation for ThinLTO."));

static cl::opt<bool> RunPGOInstrGen(		static cl::opt<bool> RunPGOInstrGen(
"profile-generate", cl::init(false), cl::Hidden,		"profile-generate", cl::init(false), cl::Hidden,
cl::desc("Enable PGO instrumentation."));		cl::desc("Enable PGO instrumentation."));

static cl::opt<std::string>		static cl::opt<std::string>
▲ Show 20 Lines • Show All 315 Lines • ▼ Show 20 Lines	if (OptLevel == 0) {
addExtensionsToPM(EP_EnabledOnOptLevel0, MPM);		addExtensionsToPM(EP_EnabledOnOptLevel0, MPM);

// Rename anon globals to be able to export them in the summary.		// Rename anon globals to be able to export them in the summary.
// This has to be done after we add the extensions to the pass manager		// This has to be done after we add the extensions to the pass manager
// as there could be passes (e.g. Adddress sanitizer) which introduce		// as there could be passes (e.g. Adddress sanitizer) which introduce
// new unnamed globals.		// new unnamed globals.
if (PrepareForThinLTO)		if (PrepareForThinLTO)
MPM.add(createNameAnonGlobalPass());		MPM.add(createNameAnonGlobalPass());

		MPM.add(createVecClonePass());

return;		return;
}		}

// Add LibraryInfo if we have some.		// Add LibraryInfo if we have some.
if (LibraryInfo)		if (LibraryInfo)
MPM.add(new TargetLibraryInfoWrapperPass(*LibraryInfo));		MPM.add(new TargetLibraryInfoWrapperPass(*LibraryInfo));

addInitialAliasAnalysisPasses(MPM);		addInitialAliasAnalysisPasses(MPM);
▲ Show 20 Lines • Show All 146 Lines • ▼ Show 20 Lines	void PassManagerBuilder::populateModulePassManager(
MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1));		MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1));

// Distribute loops to allow partial vectorization. I.e. isolate dependences		// Distribute loops to allow partial vectorization. I.e. isolate dependences
// into separate loop that would otherwise inhibit vectorization. This is		// into separate loop that would otherwise inhibit vectorization. This is
// currently only performed for loops marked with the metadata		// currently only performed for loops marked with the metadata
// llvm.loop.distribute=true or when -enable-loop-distribute is specified.		// llvm.loop.distribute=true or when -enable-loop-distribute is specified.
MPM.add(createLoopDistributePass());		MPM.add(createLoopDistributePass());

		// Insert a VF trip count loop around the body of functions that have vector
		// variants.
		MPM.add(createVecClonePass());

MPM.add(createLoopVectorizePass(DisableUnrollLoops, LoopVectorize));		MPM.add(createLoopVectorizePass(DisableUnrollLoops, LoopVectorize));

// Eliminate loads by forwarding stores from the previous iteration to loads		// Eliminate loads by forwarding stores from the previous iteration to loads
// of the current iteration.		// of the current iteration.
MPM.add(createLoopLoadEliminationPass());		MPM.add(createLoopLoadEliminationPass());

// FIXME: Because of #pragma vectorize enable, the passes below are always		// FIXME: Because of #pragma vectorize enable, the passes below are always
// inserted in the pipeline, even when the vectorizer doesn't run (ex. when		// inserted in the pipeline, even when the vectorizer doesn't run (ex. when
▲ Show 20 Lines • Show All 390 Lines • Show Last 20 Lines

lib/Transforms/Utils/CMakeLists.txt

Show First 20 Lines • Show All 46 Lines • ▼ Show 20 Lines	add_llvm_library(LLVMTransformUtils
SimplifyInstructions.cpp		SimplifyInstructions.cpp
SimplifyLibCalls.cpp		SimplifyLibCalls.cpp
SplitModule.cpp		SplitModule.cpp
StripNonLineTableDebugInfo.cpp		StripNonLineTableDebugInfo.cpp
SymbolRewriter.cpp		SymbolRewriter.cpp
UnifyFunctionExitNodes.cpp		UnifyFunctionExitNodes.cpp
Utils.cpp		Utils.cpp
ValueMapper.cpp		ValueMapper.cpp
		VecClone.cpp
VNCoercion.cpp		VNCoercion.cpp

ADDITIONAL_HEADER_DIRS		ADDITIONAL_HEADER_DIRS
${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms		${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms
${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms/Utils		${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms/Utils

DEPENDS		DEPENDS
intrinsics_gen		intrinsics_gen
)		)

lib/Transforms/Utils/VecClone.cpp

Property	Old Value	New Value
svn:eol-style	null	native
svn:keywords	null	Author Date Id Rev URL
svn:mime-type	null	text/plain

				//=------- VecClone.cpp - Vector function to loop transform -- C++ --------=//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				// ===--------------------------------------------------------------------=== //
				///
				/// \file
				/// This pass inserts the body of a vector function inside a vector length
				/// trip count scalar loop for functions that are declared SIMD. The pass
				/// currently follows the gcc vector ABI requirements for name mangling
				/// encodings, but will be extended in the future to also support the Intel
				/// vector ABI. References to both ABIs can be found here:
				mehdi_amini: An overview with an example of what the pass accomplishes would be nice. It is already captured in the RFC, but for someone reading the code it'd be helpful.
				///
				/// https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt
				/// https://software.intel.com/sites/default/files/managed/b4/c8/Intel-Vector-Function-ABI.pdf
				///
				/// Conceptually, this pass performs the following transformation:
				///
				mehdi_amini: I think it should be spelled out more clearly that it is required for correctness to have all the variants mentioned in the attribute list codegen'd, even at O0, because even if the vectorizer does not run in this module, it may run in another module that would expect these variants to exist.
				/// Before Translation:
				///
				/// main.cpp
				///
				/// #pragma omp declare simd uniform(a) linear(k)
				/// extern float dowork(float *a, float b, int k);
				///
				/// float a[4096];
				/// float b[4096];
				/// int main() {
				/// int k;
				/// for (k = 0; k < 4096; k++) {
				/// b[k] = k;
				/// }
				/// #pragma clang loop vectorize(enable)
				/// for (k = 0; k < 4096; k++) {
				/// a[k] = k * 0.5;
				/// a[k] = dowork(a, b[k], k);
				hfinkel: This function doesn't return anything, and I think that makes the example hard to understand.
				mmasten (author): I agree, this was a bad bug with the example. This has been fixed and updated with an example that demonstrates the behavior of all three types of parameters (uniform, linear, and vector). Please let me know if the bitcasts shown in the example are confusing. The purpose of the bitcasts is so that the loop can appear in scalar form to the loop vectorizer, i.e., we can use a scalar gep and index to reference vectors in the loop.
				/// }
				/// }
				///
				fpetrogalli: Hello Matt, thank you for working on this. <nitpicking>Should the comment be updated with the `"vector-variants"="_ZGVbM4ul_, ZGVbN4ul_"` syntax? </nitpicking> Cheers, Francesco
				mmasten (author): Thanks Francesco. Fixed.
				/// dowork.cpp
				///
				/// #pragma omp declare simd uniform(a) linear(k) #0
				/// float dowork(float *a, float b, int k) {
				/// return sinf(a[k]) + b;
				/// }
				///
				/// attributes #0 = { nounwind uwtable "vector-variants"="_ZGVbM4uvl_",
				/// "ZGVbN4uvl_", ... }
				///
				/// After Translation:
				///
				/// dowork.cpp
				///
				/// // Non-masked variant
				///
				/// <VL x float> _ZGVbN4uvl_dowork(float *a, <VL x float> b, int k) {
				/// alloc <VL x float> vec_ret;
				/// alloc <VL x float> vec_b;
				/// // casts from vector to scalar pointer allow the loop to be in a scalar form
				/// // that can be vectorized easily.
				/// ret_cast = bitcast <VL x float>* vec_ret to float*;
				/// vec_b_cast = bitcast <VL x float>* vec_b to float*;
				/// store <VL x float> b, <VL x float>* vec_b;
				/// for (int i = 0; i < VL; i++, k++) {
				/// ret_cast[i] = sinf(a[k]) + vec_b_cast[i];
				/// }
				/// return vec_ret;
				/// }
				///
				/// // Masked variant
				///
				/// <VL x float> _ZGVbM4uvl_dowork(float *a, <VL x float> b, int k, <VL x int>
				/// mask) {
				/// alloc <VL x float> vec_ret;
				/// alloc <VL x float> vec_b;
				/// ret_cast = bitcast <VL x float>* vec_ret to float*;
				/// vec_b_cast = bitcast <VL x float>* vec_b to float*;
				/// store <VL x float> b, <VL x float>* vec_b;
				/// for (int i = 0; i < VL; i++, k++) {
				/// if (mask[i] != 0)
				/// ret_cast[i] = sinf(a[k]) + vec_b_cast[i];
				/// }
				/// return vec_ret;
				/// }
				hfinkel: What cleans up the extra loads/stores here (if we're optimizing)? Are we assuming that there's a run of SROA afterwards, or does InstCombine do a good job here, or something else?
				mmasten (Author): InstCombine will do it. I also know that EarlyCSE will work.
				///
				// ===--------------------------------------------------------------------=== //

				// This pass is flexible enough to recognize whether or not parameters have been
				// registerized so that the users of the parameter can be properly updated. For
				// instance, we need to know where the users of linear parameters are so that
				// the stride can be added to them.
				//
				// In the following example, %i and %x are used directly by %add, so
				// in this case the pass can just look for users of %i and %x.
				//
				// define i32 @foo(i32 %i, i32 %x) #0 {
				// entry:
				hfinkel: I don't understand what's going on here. Is it not possible to write the transformation such that it's semantically correct regardless of whether we've run mem2reg? This is not always an either/or situation.
				mmasten (Author): Perhaps the comments were not written very clearly. The pass has been written to handle parameters regardless of whether or not mem2reg has run. Hopefully, the updated comments are better.
				// %add = add nsw i32 %x, %i
				// ret i32 %add
				// }
				//
				// When parameters have not been registerized, parameters are used indirectly
				// through a store/load of the parameter to/from memory that has been allocated
				// for them in the function. Thus, in this case, the pass looks for users of
				// %0 and %1.
				//
				// define i32 @foo(i32 %i, i32 %x) #0 {
				// entry:
				// %i.addr = alloca i32, align 4
				// %x.addr = alloca i32, align 4
				// store i32 %i, i32* %i.addr, align 4
				// store i32 %x, i32* %x.addr, align 4
				// %0 = load i32, i32* %x.addr, align 4
				// %1 = load i32, i32* %i.addr, align 4
				// %add = add nsw i32 %0, %1
				// ret i32 %add
				// }
				//
				// The pass must run at all optimization levels because it is possible that
				// a loop calling the vector function is vectorized, but the vector function
				// itself is not vectorized. For example, above main.cpp may be compiled at
				// -O2, but dowork.cpp may be compiled at -O0. Therefore, it is required that
				// the attribute list for the vector function specify all variants that must
				// be generated by this pass so as to avoid any linking problems. This pass
				// also serves to canonicalize the input IR to the loop vectorizer.

				#include "llvm/Transforms/Utils/VecClone.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/Analysis/Passes.h"
				#include "llvm/Analysis/VectorUtils.h"
				#include "llvm/Analysis/VectorVariant.h"
				#include "llvm/IR/BasicBlock.h"
				#include "llvm/IR/Constants.h"
				#include "llvm/IR/Function.h"
				#include "llvm/IR/Instructions.h"
				#include "llvm/IR/IntrinsicInst.h"
				#include "llvm/InitializePasses.h"
				#include "llvm/PassRegistry.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/Transforms/Utils/Cloning.h"
				#include <map>
				#include <set>

				#define SV_NAME "vec-clone"
				#define DEBUG_TYPE "VecClone"

				using namespace llvm;

				VecClone::VecClone() : ModulePass(ID) {}

				void VecClone::getAnalysisUsage(AnalysisUsage &AU) const {
				AU.addRequired<TargetTransformInfoWrapperPass>();
				}

				void VecClonePass::getFunctionsToVectorize(llvm::Module &M,
				FunctionVariants &FuncVars) {

				// FuncVars will contain a one-to-many mapping between the original scalar
				// function and the vector variant encoding strings (represented as
				hfinkel: 1-many -> one-to-many
				// attributes). The encodings correspond to functions that will be created by
				// the caller of this function as vector versions of the original function.
				// For example, if foo() is a function marked as a simd function, it will have
				// several vector variant encodings like: "_ZGVbM4_foo", "_ZGVbN4_foo",
				// "_ZGVcM8_foo", "_ZGVcN8_foo", "_ZGVdM8_foo", "_ZGVdN8_foo", "_ZGVeM16_foo",
				// "_ZGVeN16_foo". The caller of this function will then clone foo() and name
				// the clones using the above name manglings. The variant encodings correspond
				// to differences in masked/non-masked execution, vector length, and target
				// vector register size, etc. For more details, please refer to the vector
				// function ABI references listed at the top of this file.

				for (auto &F : M.functions()) {
				if (F.hasFnAttribute("vector-variants")) {
				Attribute Attr = F.getFnAttribute("vector-variants");
				StringRef VariantsStr = Attr.getValueAsString();
				SmallVector<StringRef, 8> Variants;
				VariantsStr.split(Variants, ',');
				for (auto V : Variants)
				FuncVars[&F].push_back(V);
				}
				}
				}
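
Editorial sketch (not part of the patch): the collection step above can be modelled outside LLVM as splitting a comma-separated attribute value into a one-to-many map. The names `VariantMap`, `trim`, and `collectVariants` are hypothetical, and the whitespace trimming is an assumption added here; the patch itself calls `StringRef::split` and stores the pieces unmodified.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the pass's FunctionVariants type:
// scalar function name -> list of vector variant encodings.
using VariantMap = std::map<std::string, std::vector<std::string>>;

// Strip leading and trailing spaces/tabs (assumed helper, not in the patch).
static std::string trim(const std::string &S) {
  const char *WS = " \t";
  std::size_t B = S.find_first_not_of(WS);
  if (B == std::string::npos)
    return "";
  std::size_t E = S.find_last_not_of(WS);
  return S.substr(B, E - B + 1);
}

// Split a comma-separated "vector-variants" value and record each non-empty
// encoding under the scalar function's name.
static void collectVariants(const std::string &FuncName,
                            const std::string &AttrValue, VariantMap &Out) {
  std::size_t Pos = 0;
  while (Pos <= AttrValue.size()) {
    std::size_t Comma = AttrValue.find(',', Pos);
    if (Comma == std::string::npos)
      Comma = AttrValue.size();
    std::string Enc = trim(AttrValue.substr(Pos, Comma - Pos));
    if (!Enc.empty())
      Out[FuncName].push_back(Enc);
    Pos = Comma + 1;
  }
}
```

Trimming matters only if the attribute value is written with a space after the comma; without it, the second encoding would begin with a space.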

				template Constant *
				VecClonePass::getConstantValue<int>(Type *Ty, LLVMContext &Context, int Val);
				template Constant *
				VecClonePass::getConstantValue<float>(Type *Ty, LLVMContext &Context, float Val);
				template Constant *
				VecClonePass::getConstantValue<double>(Type *Ty, LLVMContext &Context, double Val);
				template <typename T>
				Constant *VecClonePass::getConstantValue(Type *Ty, LLVMContext &Context, T Val) {
				Constant *ConstVal = nullptr;

				if (Ty->isIntegerTy()) {
				ConstVal = ConstantInt::get(Ty, Val);
				} else if (Ty->isFloatTy()) {
				ConstVal = ConstantFP::get(Ty, Val);
				}

				assert(ConstVal && "Could not generate constant for type");

				return ConstVal;
				}

				Function *VecClonePass::CloneFunction(Module &M, Function &F,
				VectorVariant &V) {

				std::string VariantName = V.generateFunctionName(F.getName());
				if (M.getFunction(VariantName))
				return nullptr;

				FunctionType *OrigFunctionType = F.getFunctionType();
				Type *ReturnType = F.getReturnType();
				Type *CharacteristicType = calcCharacteristicType(F, V);

				// Expand return type to vector.
				if (!ReturnType->isVoidTy())
				ReturnType = VectorType::get(ReturnType, V.getVlen());

				std::vector<VectorKind> ParmKinds = V.getParameters();
				SmallVector<Type *, 4> ParmTypes;
				std::vector<VectorKind>::iterator VKIt = ParmKinds.begin();
				for (auto *ParamTy : OrigFunctionType->params()) {
				if (VKIt->isVector())
				ParmTypes.push_back(
				VectorType::get(ParamTy->getScalarType(), V.getVlen()));
				else
				ParmTypes.push_back(ParamTy);
				++VKIt;
				}

				if (V.isMasked()) {
				Type *MaskVecTy = VectorType::get(CharacteristicType, V.getVlen());
				ParmTypes.push_back(MaskVecTy);
				}

				FunctionType *CloneFuncType = FunctionType::get(ReturnType, ParmTypes, false);
				Function *Clone = Function::Create(
				CloneFuncType, GlobalValue::ExternalLinkage, VariantName, F.getParent());

				ValueToValueMapTy Vmap;
				Function::arg_iterator NewArgIt = Clone->arg_begin();
				for (auto &Arg: F.args()) {
				NewArgIt->setName(Arg.getName());
				Vmap[&Arg] = &*NewArgIt;
				++NewArgIt;
				}

				if (V.isMasked()) {
				Argument &MaskArg = *NewArgIt;
				MaskArg.setName("mask");
				}

				SmallVector<ReturnInst *, 8> Returns;
				CloneFunctionInto(Clone, &F, Vmap, true, Returns);

				// Remove incompatible argument attributes (applied to the scalar argument,
				// does not apply to its vector counterpart). This must be done after cloning
				// the function because CloneFunctionInto() transfers parameter attributes
				// from the original parameters in the Vmap.
				AttrBuilder AB;
				uint64_t Idx = 0;
				for (auto &Arg : Clone->args()) {
				Type *ArgType = Arg.getType();
				AB = AttributeFuncs::typeIncompatible(ArgType);
				Clone->removeParamAttrs(Idx, AB);
				++Idx;
				}

				AB = AttributeFuncs::typeIncompatible(ReturnType);
				Clone->removeAttributes(AttributeList::ReturnIndex, AB);

				// Don't propagate vector variant attributes to the cloned function. These
				// attributes are kept for the original function, however, because they
				// are needed by the vectorizer.
				Clone->removeFnAttr("vector-variants");

				DEBUG(dbgs() << "After Cloning and Function Signature widening\n");
				DEBUG(Clone->dump());

				return Clone;
				}

				PHINode *VecClonePass::generateLoopForFunctionBody(
				Function *Clone, BasicBlock *EntryBlock, BasicBlock *LoopBlock,
				BasicBlock *LoopExitBlock, BasicBlock *ReturnBlock, int VectorLength) {

				hfinkel: Move this near the top of the function and check, before you generate code, that the function doesn't already exist. If it does, bail out (e.g., return a nullptr and the caller can move on to the next variant/function).
				// Create the phi node for the top of the loop block and add the back
				// edge to the loop from the loop exit.

				PHINode *Phi = PHINode::Create(Type::getInt32Ty(Clone->getContext()), 2,
				"index", &*LoopBlock->getFirstInsertionPt());

				Constant *Inc = ConstantInt::get(Type::getInt32Ty(Clone->getContext()), 1);
				Constant *IndInit =
				ConstantInt::get(Type::getInt32Ty(Clone->getContext()), 0);

				Instruction *Induction = BinaryOperator::CreateAdd(Phi, Inc, "indvar");
				Induction->insertBefore(LoopExitBlock->getTerminator());

				Constant *VL =
				ConstantInt::get(Type::getInt32Ty(Clone->getContext()), VectorLength);

				Instruction *VLCmp =
				new ICmpInst(CmpInst::ICMP_ULT, Induction, VL, "vl.cond");
				VLCmp->insertAfter(Induction);

				LoopExitBlock->getTerminator()->eraseFromParent();
				BranchInst::Create(LoopBlock, ReturnBlock, VLCmp, LoopExitBlock);

				Phi->addIncoming(IndInit, EntryBlock);
				Phi->addIncoming(Induction, LoopExitBlock);

				DEBUG(dbgs() << "After Loop Insertion\n");
				DEBUG(Clone->dump());

				return Phi;
				}

				bool VecClonePass::isSimpleFunction(Function *Clone, BasicBlock &EntryBlock) {
				// For really simple functions, there is no need to go through the process
				// of inserting a loop.

				// Example:
				//
				// void foo(void) {
				// return;
				// }
				//
				// No need to insert a loop for this case since it's basically a no-op. Just
				// clone the function and return. It's possible that we could have some code
				// inside of a vector function that modifies global memory. Let that case go
				// through.
				ReturnInst *RetInst = dyn_cast<ReturnInst>(EntryBlock.getTerminator());
				if (RetInst && Clone->getReturnType()->isVoidTy())
				return true;

				return false;
				}

				void VecClonePass::removeIncompatibleAttributes(Function *Clone) {
				for (auto &Arg : Clone->args()) {
				// For functions that only have a return instruction and are not void,
				hfinkel: I don't think you need the iterators; this can just be: for (auto &Arg : Clone->args()) and you use this pattern a lot (declaring two iterators and then using them in a for loop). In almost all of these cases, you should use a range-based for loop instead.
				// the return type is widened to vector. For this case, the returned
				// attribute becomes incompatible and must be removed.
				if (Clone->hasParamAttribute(Arg.getArgNo(), Attribute::Returned))
				Clone->removeParamAttr(Arg.getArgNo(), Attribute::Returned);
				}
				}

				void VecClonePass::insertSplitForMaskedVariant(
				Function *Clone,
				BasicBlock *LoopBlock,
				BasicBlock *LoopExitBlock,
				Instruction *Mask, PHINode *Phi) {

				BasicBlock *LoopThenBlock =
				LoopBlock->splitBasicBlock(LoopBlock->getFirstNonPHI(), "simd.loop.then");

				BasicBlock *LoopElseBlock = BasicBlock::Create(
				Clone->getContext(), "simd.loop.else", Clone, LoopExitBlock);

				BranchInst::Create(LoopExitBlock, LoopElseBlock);

				BitCastInst *BitCast = dyn_cast<BitCastInst>(Mask);
				PointerType *BitCastType = dyn_cast<PointerType>(BitCast->getType());
				Type *PointeeType = BitCastType->getElementType();

				GetElementPtrInst *MaskGep = GetElementPtrInst::Create(
				PointeeType, Mask, Phi, "mask.gep", LoopBlock->getTerminator());

				LoadInst *MaskLoad =
				new LoadInst(MaskGep, "mask.parm", LoopBlock->getTerminator());

				Type *CompareTy = MaskLoad->getType();
				Instruction *MaskCmp;
				Constant *Zero;

				// Generate the compare instruction to see if the mask bit is on. In ICC, we
				// use the movemask intrinsic which takes both float/int mask registers and
				// converts to an integer scalar value, one bit representing each element.
				// AVR construction will be complicated if this intrinsic is introduced here,
				// so the current solution is to just generate either an integer or floating
				// point compare instruction for now. This may change anyway if we decide to
				// go to a vector of i1 values for the mask. I suppose this would be one
				// positive reason to use vector of i1.
				if (CompareTy->isIntegerTy()) {
				Zero = getConstantValue(CompareTy, Clone->getContext(), 0);
				MaskCmp = new ICmpInst(LoopBlock->getTerminator(), CmpInst::ICMP_NE,
				MaskLoad, Zero, "mask.cond");
				} else if (CompareTy->isFloatingPointTy()) {
				Zero = getConstantValue(CompareTy, Clone->getContext(), 0.0);
				MaskCmp = new FCmpInst(LoopBlock->getTerminator(), CmpInst::FCMP_UNE,
				MaskLoad, Zero, "mask.cond");
				} else {
				assert(0 && "Unsupported mask compare");
				}

				TerminatorInst *Term = LoopBlock->getTerminator();
				Term->eraseFromParent();
				BranchInst::Create(LoopThenBlock, LoopElseBlock, MaskCmp, LoopBlock);

				DEBUG(dbgs() << "After Split Insertion For Masked Variant\n");
				DEBUG(Clone->dump());
				}

				void VecClonePass::addLoopMetadata(BasicBlock *Latch, unsigned VF) {
				// This function sets the loop metadata for the new loop inserted around
				// the simd function body. This metadata disables unrolling in case
				// unrolling would otherwise occur between this pass and the
				// vectorizer. Also, the loop vectorization metadata is set to try
				// and force vectorization at the specified VF of the simd function.
				//
				// Set disable unroll metadata on the conditional branch of the loop latch
				// for the simd loop. The following is an example of what the loop latch
				// and Metadata will look like. The !llvm.loop marks the beginning of the
				hfinkel: What happens if there's more than one return in the function? You might want, in that case, to create a new block with the return and convert all other returns to branches to that block.
				mmasten (Author): In all the test cases that I have used to this point, this type of re-wiring has already been done. Granted, I have mainly been testing some very simple multiple-return functions, but if you have a test case where this happens, it would be helpful. Thanks.
				// loop Metadata and is always placed on the terminator of the loop latch.
				// (i.e., simd.loop.exit in this case). According to LLVM documentation, to
				// properly set the loop Metadata, the 1st operand of !16 must be a self-
				// reference to avoid some type of Metadata merging conflicts that have
				// apparently arisen in the past. This is part of LLVM history that I do not
				// know. Also, according to LLVM documentation, any Metadata nodes referring
				// to themselves are marked as distinct. As such, all Metadata corresponding
				// to a loop belongs to that loop alone and no sharing of Metadata can be
				// done across different loops.
				//
				// simd.loop.exit: ; preds = %simd.loop, %if.else, %if.then
				// %indvar = add nuw i32 %index, 1
				// %vl.cond = icmp ult i32 %indvar, 2
				// br i1 %vl.cond, label %simd.loop, label %simd.end.region, !llvm.loop !16
				//
				// !16 = distinct !{!16, !17}
				// !17 = !{!"llvm.loop.unroll.disable"}

				SmallVector<Metadata *, 4> MDs;

				// Reserve first location for self reference to the LoopID metadata node.
				MDs.push_back(nullptr);

				// Add unroll(disable) metadata to disable future unrolling.
				LLVMContext &Context = Latch->getContext();
				SmallVector<Metadata *, 2> DisableOps;
				DisableOps.push_back(MDString::get(Context, "llvm.loop.unroll.disable"));
				DisableOps.push_back(MDString::get(Context, "llvm.loop.vectorize.enable"));
				Metadata *Vals[] = {
				MDString::get(Context, "llvm.loop.vectorize.width"),
				ConstantAsMetadata::get(ConstantInt::get(Type::getInt32Ty(Context), VF))};
				DisableOps.push_back(MDNode::get(Context, Vals));
				MDNode *DisableNode = MDNode::get(Context, DisableOps);
				MDs.push_back(DisableNode);

				MDNode *NewLoopID = MDNode::get(Context, MDs);
				// Set operand 0 to refer to the loop id itself.
				NewLoopID->replaceOperandWith(0, NewLoopID);
				Latch->getTerminator()->setMetadata("llvm.loop", NewLoopID);
				}

				void VecClonePass::widenAllocaInstructions(
				Function *Clone,
				DenseMap<AllocaInst *, Instruction *> &AllocaMap,
				BasicBlock &EntryBlock,
				VectorVariant &Variant,
				const DataLayout &DL) {

				DenseMap<AllocaInst *, Instruction *>::iterator AllocaMapIt;
				SmallVector<StoreInst*, 4> StoresToRemove;

				for (auto &Arg : Clone->args()) {
				SmallVector<User*, 4> ArgUsers;
				for (auto *U : Arg.users()) {
				// Only update parameter users in the loop.
				if (Instruction *Inst = dyn_cast<Instruction>(U))
				if (Inst->getParent() != &EntryBlock)
				ArgUsers.push_back(U);
				}

				Type *ArgTy = Arg.getType();
				VectorType *VecArgType = dyn_cast<VectorType>(ArgTy);
				StringRef ArgName = Arg.getName();
				for (auto *U : ArgUsers) {
				// For non-optimized parameters, i.e., for parameters that are loads and
				hfinkel: In addition to branches, you need to handle SwitchInst and IndirectBrInst (and, for the latter, you need to find any place where the address of the return block is taken, and replace it with the address of the LoopExitBlock).
				// stores through memory (allocas), we need to know which alloca belongs
				// to which parameter. This can be done by finding the store of the
				// parameter to an alloca. Set up a map that maintains this relationship
				// so that we can update the users of the original allocas with the new
				// widened ones. When widening the allocas, vector parameters will be
				// stored to a vector alloca, and linear/uniform parameters will be
				// stored to an array, using the loop index as the "lane". Nothing else
				// needs to be done for optimized parameters. Later, this map will be
				// used to update all alloca users.
				StoreInst *StoreUser = dyn_cast<StoreInst>(U);
				LoadInst *LoadUser = dyn_cast<LoadInst>(U);
				AllocaInst *Alloca = nullptr;

				if (LoadUser)
				Alloca = dyn_cast<AllocaInst>(LoadUser->getPointerOperand());

				if (StoreUser)
				Alloca = dyn_cast<AllocaInst>(StoreUser->getPointerOperand());

				if (StoreUser && Alloca) {
				AllocaMapIt = AllocaMap.find(Alloca);
				if (AllocaMapIt == AllocaMap.end()) {
				if (VecArgType) {
				AllocaInst *VecAlloca = new AllocaInst(
				VecArgType, DL.getAllocaAddrSpace(), "vec." + ArgName,
				EntryBlock.getTerminator());
				StoreInst *VecStore = new StoreInst(&Arg, VecAlloca);
				VecStore->insertAfter(VecAlloca);
				PointerType *ElemTypePtr =
				PointerType::get(VecArgType->getElementType(),
				VecAlloca->getType()->getAddressSpace());
				BitCastInst *VecAllocaCast = new BitCastInst(
				VecAlloca, ElemTypePtr, VecAlloca->getName() + ".cast");
				VecAllocaCast->insertAfter(VecStore);
				AllocaMap[Alloca] = VecAllocaCast;
				StoresToRemove.push_back(StoreUser);
				} else {
				ArrayType *ArrType = ArrayType::get(ArgTy, Variant.getVlen());
				AllocaInst *ArrAlloca = new AllocaInst(
				ArrType, DL.getAllocaAddrSpace(), "arr." + ArgName,
				EntryBlock.getTerminator());
				AllocaMap[Alloca] = ArrAlloca;
				}
				}
				}
				}
				}

				// Remove the store of the parameter to the original alloca. A new one
				// was just created for the new alloca.
				for (auto *Store : StoresToRemove)
				hfinkel: Use CreateAdd here so you can set both nuw and nsw on this increment.
				Store->eraseFromParent();
				}

				void VecClonePass::updateAllocaUsers(
				Function *Clone,
				PHINode *Phi,
				DenseMap<AllocaInst *, Instruction *> &AllocaMap) {

				SmallVector<User *, 10> AllocaUsers;
				for (auto Pair : AllocaMap) {
				AllocaInst *OldAlloca = Pair.first;
				for (auto *U : OldAlloca->users()) {
				if (isa<Instruction>(U))
				AllocaUsers.push_back(U);
				}
				}

				// Update all alloca users by doing an a -> &a[i] transformation. This
				// involves inserting a gep just before each use of the alloca. The only
				// exception is for vector stores to an alloca. These are moved to the
				// entry block of the function just after the widened alloca.
				//for (unsigned j = 0; j < AllocaUsers.size(); j++) {
				for (auto *U : AllocaUsers) {
				unsigned NumOps = U->getNumOperands();
				for (unsigned k = 0; k < NumOps; k++) {
				if (AllocaInst *OldAlloca = dyn_cast<AllocaInst>(U->getOperand(k))) {
				if (AllocaInst *NewAlloca =
				dyn_cast<AllocaInst>(AllocaMap[OldAlloca])) {
				// If this is an alloca for a linear/uniform parameter, then insert
				// a gep for the load/store and use the loop index to reference the
				// proper value for each "lane".
				SmallVector<Value *, 2> GepIndices;
				Constant *Idx0 =
				ConstantInt::get(Type::getInt32Ty(Clone->getContext()), 0);
				GepIndices.push_back(Idx0);
				GepIndices.push_back(Phi);
				SequentialType *SeqTy =
				cast<SequentialType>(NewAlloca->getAllocatedType());
				GetElementPtrInst *AllocaGep =
				GetElementPtrInst::Create(SeqTy, NewAlloca, GepIndices,
				NewAlloca->getName() + ".gep");
				AllocaGep->insertBefore(cast<Instruction>(U));
				U->setOperand(k, AllocaGep);
				} else if (BitCastInst *NewAllocaCast =
				dyn_cast<BitCastInst>(AllocaMap[OldAlloca])) {
				SmallVector<Value *, 2> GepIndices;
				GepIndices.push_back(Phi);
				GetElementPtrInst *AllocaCastGep =
				GetElementPtrInst::Create(OldAlloca->getAllocatedType(),
				NewAllocaCast, GepIndices,
				NewAllocaCast->getName() + ".gep");
				AllocaCastGep->insertBefore(cast<Instruction>(U));
				U->setOperand(k, AllocaCastGep);
				} else {
				llvm_unreachable(
				"Expected array alloca for linear/uniform parameters or a "
				"cast of vector alloca for vector parameters");
				}
				}
				}
				}
				}

				void VecClonePass::updateParameterUsers(Function *Clone, VectorVariant &Variant,
				BasicBlock &EntryBlock, PHINode *Phi,
				const DataLayout &DL) {

				// Update non-alloca parameter users based on type of parameter. Any users of
				// the parameters that are also users of an alloca will not be updated again
				// here since this has already been done.
				std::vector<VectorKind> ParmKinds = Variant.getParameters();
				DenseMap<Argument *, BitCastInst *> VecParmCasts;
				DenseMap<Argument *, BitCastInst *>::iterator VecParmCastsIt;

				for (auto &Arg : Clone->args()) {
				SmallVector<User*, 4> ArgUsers;
				for (auto *U : Arg.users()) {
				// Only update parameter users in the loop.
				if (Instruction *Inst = dyn_cast<Instruction>(U))
				if (Inst->getParent() != &EntryBlock)
				ArgUsers.push_back(U);
				}

				Type *ArgTy = Arg.getType();
				unsigned ArgNo = Arg.getArgNo();
				StringRef ArgName = Arg.getName();
				VectorType *VecArgType = dyn_cast<VectorType>(ArgTy);
				for (unsigned j = 0; j < ArgUsers.size(); j++) {
				User *U = ArgUsers[j];
				if (ParmKinds[ArgNo].isVector()) {
				VecParmCastsIt = VecParmCasts.find(&Arg);
				if (VecParmCastsIt == VecParmCasts.end()) {
				AllocaInst *VecAlloca =
				new AllocaInst(VecArgType, DL.getAllocaAddrSpace(),
				"vec." + ArgName, EntryBlock.getTerminator());
				StoreInst *VecStore = new StoreInst(&Arg, VecAlloca);
				VecStore->insertAfter(VecAlloca);
				PointerType *ElemTypePtr =
				PointerType::get(VecArgType->getElementType(),
				VecAlloca->getType()->getAddressSpace());
				BitCastInst *VecAllocaCast = new BitCastInst(
				VecAlloca, ElemTypePtr, VecAlloca->getName() + ".cast");
				VecAllocaCast->insertAfter(VecStore);
				VecParmCasts[&Arg] = VecAllocaCast;
				hfinkel: What happens if there is more than one store user?
				}
				GetElementPtrInst *VecAllocaCastGep = GetElementPtrInst::Create(
				VecArgType->getElementType(), VecParmCasts[&Arg], Phi,
				VecParmCasts[&Arg]->getName() + ".gep", cast<Instruction>(U));
				LoadInst *ArgElemLoad =
				new LoadInst(VecAllocaCastGep, "vec." + ArgName + ".elem");
				ArgElemLoad->insertAfter(VecAllocaCastGep);
				unsigned NumOps = U->getNumOperands();
				for (unsigned Op = 0; Op < NumOps; Op++) {
				if (U->getOperand(Op) == &Arg)
				U->setOperand(Op, ArgElemLoad);
				}
				} else if (ParmKinds[ArgNo].isLinear()) {
				int Stride = ParmKinds[ArgNo].getStride();
				Constant *StrideConst =
				ConstantInt::get(Type::getInt32Ty(Clone->getContext()), Stride);
				Instruction *Mul =
				BinaryOperator::CreateMul(StrideConst, Phi, "stride.mul");
				Mul->insertBefore(cast<Instruction>(U));
				Value *UserOp = nullptr;
				if (ArgTy->isPointerTy()) {
				PointerType *ParmPtrType = dyn_cast<PointerType>(ArgTy);
				GetElementPtrInst *LinearParmGep = GetElementPtrInst::Create(
				ParmPtrType->getElementType(), &Arg, Mul, ArgName + ".gep");
				LinearParmGep->insertAfter(Mul);
				UserOp = LinearParmGep;
				} else {
				if (Mul->getType() != ArgTy) {
				CastInst *MulCast = CastInst::CreateSExtOrBitCast(
				Mul, ArgTy, Mul->getName() + ".cast");
				MulCast->insertAfter(Mul);
				Mul = MulCast;
				}
				BinaryOperator *Add =
				BinaryOperator::CreateAdd(&Arg, Mul, "stride.add");
				Add->insertAfter(Mul);
				UserOp = Add;
				}

				unsigned NumOps = U->getNumOperands();
				for (unsigned Op = 0; Op < NumOps; Op++) {
				if (U->getOperand(Op) == &Arg)
				U->setOperand(Op, UserOp);
				}
				}
				}
				}
				}

				bool VecClonePass::runImpl(Module &M, Function &F, VectorVariant &Variant) {

				DEBUG(dbgs() << "Before SIMD Function Cloning\n");
				DEBUG(F.dump());
				DEBUG(dbgs() << "Generating variant '" <<
				Variant.generateFunctionName(F.getName()) << "'\n\n");

				// Clone the original function.
				Function *Clone = CloneFunction(M, F, Variant);
				if (!Clone)
				return false;

				BasicBlock &EntryBlock = Clone->getEntryBlock();
				if (isSimpleFunction(Clone, EntryBlock))
				return false;

				// Remove any incompatible attributes that happen as part of widening
				// function vector parameters.
				removeIncompatibleAttributes(Clone);

				const DataLayout &DL = Clone->getParent()->getDataLayout();
				DenseMap<AllocaInst *, Instruction *> AllocaMap;
				// Split the entry block at the beginning and create a block for the
				// loop entry.
				BasicBlock *LoopBlock = EntryBlock.splitBasicBlock(EntryBlock.begin(),
				"simd.loop");

				// On the split, the alloca instructions are moved into LoopBlock. Move
				// them back to the entry block.
				SmallVector<AllocaInst *, 4> Allocas;
				SmallVector<StoreInst *, 4> VecStores;
				BasicBlock::iterator BBIt = LoopBlock->begin();
				BasicBlock::iterator BBEnd = LoopBlock->end();
				for (; BBIt != BBEnd; ++BBIt) {
				if (AllocaInst *Alloca = dyn_cast<AllocaInst>(&*BBIt))
				Allocas.push_back(Alloca);
				}
				for (auto *Alloca : Allocas)
				Alloca->moveBefore(EntryBlock.getTerminator());

				widenAllocaInstructions(Clone, AllocaMap, EntryBlock, Variant, DL);

				// Create a vector alloca for the return. The return type of the clone
				// has already been widened, so the type can be used directly.
				AllocaInst *VecRetAlloca = nullptr;
				Type *VecRetTy = Clone->getReturnType();
				if (!VecRetTy->isVoidTy()) {
				VecRetAlloca = new AllocaInst(VecRetTy, DL.getAllocaAddrSpace(),
				"vec.ret", EntryBlock.getTerminator());
				}

				// Find the basic block containing the return. We need to know where
				// to replace the return instruction with a store to the return vector
				// and where to split off a loop exit block containing the loop exit
				// condition.
				Function::iterator FuncIt = Clone->begin();
				Function::iterator FuncEnd = Clone->end();
				BasicBlock *ReturnBlock = nullptr;
				Instruction *RetInst = nullptr;
				unsigned NumRets = 1;
				for (; FuncIt != FuncEnd; ++FuncIt) {
				if (isa<ReturnInst>(FuncIt->getTerminator())) {
				// TODO: Haven't yet found (or created) a test case where there are
				// multiple ret instructions. Assert for now.
				assert(NumRets == 1 &&
				"Unsupported function due to multiple return instructions");
				ReturnBlock = &*FuncIt;
				RetInst = FuncIt->getTerminator();
				NumRets++;
				}
				}

				// Create a basic block that will contain the loop exit condition.
				BasicBlock *LoopExitBlock =
				ReturnBlock->splitBasicBlock(RetInst, "simd.loop.exit");

				// Create a new return block that will contain the load of the return
				// vector and the new return instruction.
				BasicBlock *NewReturnBlock =
				LoopExitBlock->splitBasicBlock(LoopExitBlock->getTerminator(), "return");

				// Generate the phi for the loop index, the loop index increment, and
				// loop exit condition and put these instructions in LoopExitBlock.
				PHINode *Phi = generateLoopForFunctionBody(Clone, &EntryBlock, LoopBlock,
				LoopExitBlock, NewReturnBlock,
				Variant.getVlen());

				// Generate the load from the return vector and new return instruction
				// and put them in the new return basic block.
				LoadInst *VecReturn =
				new LoadInst(VecRetAlloca, "vec.ret", NewReturnBlock);
				ReturnInst::Create(Clone->getContext(), VecReturn, NewReturnBlock);

				// Change the return instruction to a store to the return vector.
				Value *StoreVal = RetInst->getOperand(0);
				Type *StoreValTy = StoreVal->getType();
				PointerType *ElemTypePtr =
				PointerType::get(StoreValTy, DL.getAllocaAddrSpace());
				BitCastInst *RetAllocaCast = new BitCastInst(
				VecRetAlloca, ElemTypePtr, VecRetAlloca->getName() + ".cast");
				RetAllocaCast->insertAfter(VecRetAlloca);
				GetElementPtrInst *RetAllocaCastGep = GetElementPtrInst::Create(
				StoreValTy, RetAllocaCast, Phi, RetAllocaCast->getName() + ".gep");
				RetAllocaCastGep->insertBefore(ReturnBlock->getTerminator());
				StoreInst *RetStore = new StoreInst(RetInst->getOperand(0), RetAllocaCastGep);
				RetStore->insertAfter(RetAllocaCastGep);
				RetInst->eraseFromParent();

				updateAllocaUsers(Clone, Phi, AllocaMap);

				updateParameterUsers(Clone, Variant, EntryBlock, Phi, DL);

				// For masked variants, create a vector mask parameter and insert the mask
				// bit checks.
				if (Variant.isMasked()) {
				// Create a vector alloca for the mask parameter.
				Function::arg_iterator MaskParam = Clone->arg_end();
				MaskParam--;
				AllocaInst *MaskAlloca = new AllocaInst(MaskParam->getType(),
				DL.getAllocaAddrSpace(),
				"vec." + MaskParam->getName(),
				EntryBlock.getTerminator());
				StoreInst *MaskStore = new StoreInst(MaskParam, MaskAlloca);
				MaskStore->insertAfter(MaskAlloca);
				VectorType *MaskTy = cast<VectorType>(MaskParam->getType());
				PointerType *ElemTypePtr =
				PointerType::get(MaskTy->getElementType(), DL.getAllocaAddrSpace());
				BitCastInst *MaskAllocaCast = new BitCastInst(MaskAlloca,
				ElemTypePtr,"mask.cast");
				MaskAllocaCast->insertAfter(MaskStore);

				insertSplitForMaskedVariant(Clone, LoopBlock, LoopExitBlock,
				MaskAllocaCast, Phi);
				}

				// Remove old allocas
				for (auto Pair : AllocaMap) {
				AllocaInst *OldAlloca = Pair.first;
				OldAlloca->eraseFromParent();
				}

				// Prevent unrolling from kicking in before loop vectorization and force
				// vectorization of the loop to the VF of the simd function.
				addLoopMetadata(LoopExitBlock, Variant.getVlen());

				DEBUG(dbgs() << "After SIMD Function Cloning\n");
				DEBUG(Clone->dump());

				return true; // LLVM IR has been modified
				}

				bool VecClone::runOnModule(Module &M) {
				bool Changed = false;
				FunctionVariants FunctionsToVectorize;
				Impl.getFunctionsToVectorize(M, FunctionsToVectorize);
				for (auto Pair : FunctionsToVectorize) {
				Function &F = *(Pair.first);
				std::vector<StringRef> Variants = Pair.second;
				TargetTransformInfo *TTI =
				&getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);
				for (auto V : Variants) {
				VectorVariant Variant(V, TTI);
				Changed |= Impl.runImpl(M, F, Variant);
				}
				}

				return Changed;
				}

				PreservedAnalyses VecClonePass::run(Module &M,
				ModuleAnalysisManager &AM) {
				bool Changed = false;
				auto &FAM = AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
				FunctionVariants FunctionsToVectorize;
				getFunctionsToVectorize(M, FunctionsToVectorize);
				for (auto Pair : FunctionsToVectorize) {
				Function &F = *(Pair.first);
				std::vector<StringRef> Variants = Pair.second;
				TargetTransformInfo *TTI = &FAM.getResult<TargetIRAnalysis>(F);
				for (auto V : Variants) {
				VectorVariant Variant(V, TTI);
				Changed |= runImpl(M, F, Variant);
				}
				}

				if (Changed)
				return PreservedAnalyses::none();
				return PreservedAnalyses::all();
				}

				void VecClone::print(raw_ostream &OS, const Module *M) const {
				// TODO
				}

				ModulePass *llvm::createVecClonePass() { return new llvm::VecClone(); }

				char VecClone::ID = 0;

				static const char lv_name[] = "VecClone";
				INITIALIZE_PASS_BEGIN(VecClone, SV_NAME, lv_name, false /* modifies CFG */,
				false /* transform pass */)
				INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
				INITIALIZE_PASS_END(VecClone, SV_NAME, lv_name, false /* modifies CFG */,
				false /* transform pass */)

				mehdi_amini: Coding style: no braces (other places as well).
				mehdi_amini: Spurious empty line (other places as well).
				mehdi_amini: So this will walk all the functions in the module and walk all the attributes and do a string comparison on all of these. I'm not sure if it is fine to pay this when this pass has nothing to do (i.e. the early exit should be fast).
				mmasten (Author): We could selectively run the VecClone pass from PassManagerBuilder based on whether the OpenMP switch has been used. Otherwise, I don't know of another way to figure out which functions will need to be generated. Do you have a suggestion?
				mehdi_amini: What about a SmallVector and move it out-of-the loop (the call to `clear()` below is already handling the reset between iterations).
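
The last suggestion above (declare the container once outside the loop and rely on `clear()` between iterations) could look roughly like the sketch below. It is an illustration only, under stated assumptions: `std::vector` stands in for `llvm::SmallVector` so the snippet builds without LLVM, and `countVariantFields` is a hypothetical helper that splits comma-separated strings shaped like the "vector-variants" attribute values. It is not code from the patch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: for each comma-separated line, count its fields.
// The scratch container `Fields` is declared once, outside the loop, and
// reset with clear() on every iteration instead of being re-created.
std::vector<std::size_t>
countVariantFields(const std::vector<std::string> &Lines) {
  std::vector<std::size_t> Counts;
  std::vector<std::string> Fields; // hoisted out of the loop
  for (const std::string &Line : Lines) {
    Fields.clear(); // reset between iterations
    std::size_t Start = 0;
    while (true) {
      const std::size_t Comma = Line.find(',', Start);
      if (Comma == std::string::npos) {
        Fields.push_back(Line.substr(Start));
        break;
      }
      Fields.push_back(Line.substr(Start, Comma - Start));
      Start = Comma + 1;
    }
    Counts.push_back(Fields.size());
  }
  return Counts;
}
```

With common standard library implementations, `clear()` keeps the storage already allocated, so later iterations can usually reuse it. That reuse is the point of the reviewer's suggestion.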

test/Transforms/LoopVectorize/masked_simd_func.ll

				; Note: Test the simd function caller side functionality. The function side vectorization is tested under VecClone.
				fpetrogalli: Any test that checks the caller side functionality should be under the patch that enables the loop vectorizer to use the VecClone pass.
				mmasten (Author): Thanks Francesco. Looks like I accidentally included this test in this patch. I looked to make sure it was already included in the LoopVectorize patch.

				; RUN: opt < %s -vec-clone -force-vector-interleave=1 -loop-vectorize -S | FileCheck %s

				; CHECK: call <8 x i32> @_ZGVdM8vlu_dowork

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1

				; Function Attrs: noinline nounwind uwtable
				define i32 @dowork(i32 %b, i32 %k, i32 %c) #0 {
				entry:
				%add = add nsw i32 %b, %k
				%add1 = add nsw i32 %add, %c
				ret i32 %add1
				}

				; Function Attrs: noinline nounwind uwtable
				define i32 @main() local_unnamed_addr #1 {
				entry:
				%a = alloca [4096 x i32], align 16
				%b = alloca [4096 x i32], align 16
				%0 = bitcast [4096 x i32]* %a to i8*
				call void @llvm.lifetime.start.p0i8(i64 16384, i8* nonnull %0) #5
				%1 = bitcast [4096 x i32]* %b to i8*
				call void @llvm.lifetime.start.p0i8(i64 16384, i8* nonnull %1) #5
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv39 = phi i64 [ 0, %entry ], [ %indvars.iv.next40, %for.body ]
				%arrayidx = getelementptr inbounds [4096 x i32], [4096 x i32]* %b, i64 0, i64 %indvars.iv39
				%2 = trunc i64 %indvars.iv39 to i32
				store i32 %2, i32* %arrayidx, align 4, !tbaa !2
				%indvars.iv.next40 = add nuw nsw i64 %indvars.iv39, 1
				%exitcond41 = icmp eq i64 %indvars.iv.next40, 4096
				br i1 %exitcond41, label %for.end, label %for.body

				for.end: ; preds = %for.body
				%arrayidx1 = getelementptr inbounds [4096 x i32], [4096 x i32]* %b, i64 0, i64 3
				%3 = load i32, i32* %arrayidx1, align 4, !tbaa !2
				br label %omp.inner.for.body

				omp.inner.for.body: ; preds = %omp.inner.for.inc, %for.end
				%indvars.iv36 = phi i64 [ 0, %for.end ], [ %indvars.iv.next37, %omp.inner.for.inc ]
				%4 = trunc i64 %indvars.iv36 to i32
				%rem = and i32 %4, 1
				%tobool = icmp eq i32 %rem, 0
				br i1 %tobool, label %omp.inner.for.inc, label %if.then

				if.then: ; preds = %omp.inner.for.body
				%arrayidx5 = getelementptr inbounds [4096 x i32], [4096 x i32]* %b, i64 0, i64 %indvars.iv36
				%5 = load i32, i32* %arrayidx5, align 4, !tbaa !2, !llvm.mem.parallel_loop_access !6
				%call = tail call i32 @dowork(i32 %5, i32 %4, i32 %3), !llvm.mem.parallel_loop_access !6
				%arrayidx7 = getelementptr inbounds [4096 x i32], [4096 x i32]* %a, i64 0, i64 %indvars.iv36
				store i32 %call, i32* %arrayidx7, align 4, !tbaa !2, !llvm.mem.parallel_loop_access !6
				br label %omp.inner.for.inc

				omp.inner.for.inc: ; preds = %omp.inner.for.body, %if.then
				%indvars.iv.next37 = add nuw nsw i64 %indvars.iv36, 1
				%exitcond38 = icmp eq i64 %indvars.iv.next37, 4096
				br i1 %exitcond38, label %omp.inner.for.end, label %omp.inner.for.body, !llvm.loop !6

				omp.inner.for.end: ; preds = %omp.inner.for.inc
				br label %for.body11

				for.body11: ; preds = %for.body11, %omp.inner.for.end
				%indvars.iv = phi i64 [ 0, %omp.inner.for.end ], [ %indvars.iv.next, %for.body11 ]
				%arrayidx13 = getelementptr inbounds [4096 x i32], [4096 x i32]* %a, i64 0, i64 %indvars.iv
				%6 = load i32, i32* %arrayidx13, align 4, !tbaa !2
				%call14 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i32 %6)
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 4096
				br i1 %exitcond, label %for.end17, label %for.body11

				for.end17: ; preds = %for.body11
				call void @llvm.lifetime.end.p0i8(i64 16384, i8* nonnull %1) #5
				call void @llvm.lifetime.end.p0i8(i64 16384, i8* nonnull %0) #5
				ret i32 0
				}

				; Function Attrs: argmemonly nounwind
				declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #2

				; Function Attrs: argmemonly nounwind
				declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #2

				declare i32 @printf(i8*, ...) #3

				attributes #0 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="core-avx2" "target-features"="+aes,+avx,+avx2,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+rdrnd,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" "unsafe-fp-math"="false" "use-soft-float"="false" "vector-variants"="_ZGVbN4vlu_dowork,_ZGVcN8vlu_dowork,_ZGVdN8vlu_dowork,_ZGVeN16vlu_dowork,_ZGVbM4vlu_dowork,_ZGVcM8vlu_dowork,_ZGVdM8vlu_dowork,_ZGVeM16vlu_dowork" }
				attributes #1 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="core-avx2" "target-features"="+aes,+avx,+avx2,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+rdrnd,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #2 = { argmemonly nounwind }
				attributes #3 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="core-avx2" "target-features"="+aes,+avx,+avx2,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+rdrnd,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #4 = { nounwind }

				!llvm.module.flags = !{!0}
				!llvm.ident = !{!1}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{!"clang version 6.0.0 (trunk 316400)"}
				!2 = !{!3, !3, i64 0}
				!3 = !{!"int", !4, i64 0}
				!4 = !{!"omnipotent char", !5, i64 0}
				!5 = !{!"Simple C/C++ TBAA"}
				!6 = distinct !{!6, !7}
				!7 = !{!"llvm.loop.vectorize.enable", i1 true}
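
The variant names in the "vector-variants" attribute above (for example `_ZGVdM8vlu_dowork`) follow the x86 vector function ABI mangling: the `_ZGV` prefix, an ISA class letter (`b` SSE, `c` AVX, `d` AVX2, `e` AVX-512), `M` or `N` for masked or unmasked, the vector length, one entry per parameter (`v` vector, `l` linear, `u` uniform), an underscore, and the scalar function name. A minimal decoder sketch is shown below. It assumes single-letter parameter entries with no stride or alignment modifiers. `DecodedVariant` and `decodeVariant` are hypothetical names and are not the `VectorVariant` class from this patch.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Fields of a mangled vector-variant name such as "_ZGVdM8vlu_dowork".
struct DecodedVariant {
  char Isa = 0;           // ISA class letter, e.g. 'b', 'c', 'd', 'e'
  bool Masked = false;    // 'M' = masked, 'N' = unmasked
  unsigned Vlen = 0;      // vector length
  std::string Params;     // one letter per parameter in this sketch
  std::string ScalarName; // name of the scalar function
};

// Hypothetical decoder. Returns false if Name lacks the expected shape;
// Out is only written on success.
bool decodeVariant(const std::string &Name, DecodedVariant &Out) {
  const std::string Prefix = "_ZGV";
  if (Name.compare(0, Prefix.size(), Prefix) != 0)
    return false;
  std::size_t Pos = Prefix.size();
  if (Name.size() < Pos + 2)
    return false;
  DecodedVariant D;
  D.Isa = Name[Pos++];
  const char Mask = Name[Pos++];
  if (Mask != 'M' && Mask != 'N')
    return false;
  D.Masked = (Mask == 'M');
  // Parse the decimal vector length.
  const std::size_t DigitsBegin = Pos;
  while (Pos < Name.size() &&
         std::isdigit(static_cast<unsigned char>(Name[Pos]))) {
    D.Vlen = D.Vlen * 10 + static_cast<unsigned>(Name[Pos] - '0');
    ++Pos;
  }
  if (Pos == DigitsBegin)
    return false;
  // Everything up to the next underscore is the parameter list.
  const std::size_t Underscore = Name.find('_', Pos);
  if (Underscore == std::string::npos)
    return false;
  D.Params = Name.substr(Pos, Underscore - Pos);
  D.ScalarName = Name.substr(Underscore + 1);
  Out = D;
  return true;
}
```

Under these assumptions, `_ZGVdM8vlu_dowork` decodes to ISA class `d`, masked, vector length 8, parameters `vlu`, scalar name `dowork`.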

test/Transforms/LoopVectorize/simd_func.ll

				; Note: Test the simd function caller side functionality. The function side vectorization is tested under VecClone.

				; RUN: opt < %s -vec-clone -force-vector-interleave=1 -loop-vectorize -S | FileCheck %s

				; CHECK: call <8 x i32> @_ZGVdN8vlu_dowork

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1

				; Function Attrs: noinline nounwind uwtable
				define i32 @dowork(i32 %b, i32 %k, i32 %c) #0 {
				entry:
				%add = add nsw i32 %b, %k
				%add1 = add nsw i32 %add, %c
				ret i32 %add1
				}

				; Function Attrs: noinline nounwind uwtable
				define i32 @main() local_unnamed_addr #1 {
				entry:
				%a = alloca [4096 x i32], align 16
				%b = alloca [4096 x i32], align 16
				%0 = bitcast [4096 x i32]* %a to i8*
				call void @llvm.lifetime.start.p0i8(i64 16384, i8* nonnull %0) #5
				%1 = bitcast [4096 x i32]* %b to i8*
				call void @llvm.lifetime.start.p0i8(i64 16384, i8* nonnull %1) #5
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv38 = phi i64 [ 0, %entry ], [ %indvars.iv.next39, %for.body ]
				%arrayidx = getelementptr inbounds [4096 x i32], [4096 x i32]* %b, i64 0, i64 %indvars.iv38
				%2 = trunc i64 %indvars.iv38 to i32
				store i32 %2, i32* %arrayidx, align 4, !tbaa !2
				%indvars.iv.next39 = add nuw nsw i64 %indvars.iv38, 1
				%exitcond40 = icmp eq i64 %indvars.iv.next39, 4096
				br i1 %exitcond40, label %for.end, label %for.body

				for.end: ; preds = %for.body
				%arrayidx1 = getelementptr inbounds [4096 x i32], [4096 x i32]* %b, i64 0, i64 3
				%3 = load i32, i32* %arrayidx1, align 4, !tbaa !2
				br label %omp.inner.for.body

				omp.inner.for.body: ; preds = %omp.inner.for.body, %for.end
				%indvars.iv35 = phi i64 [ 0, %for.end ], [ %indvars.iv.next36, %omp.inner.for.body ]
				%arrayidx5 = getelementptr inbounds [4096 x i32], [4096 x i32]* %b, i64 0, i64 %indvars.iv35
				%4 = load i32, i32* %arrayidx5, align 4, !tbaa !2, !llvm.mem.parallel_loop_access !6
				%5 = trunc i64 %indvars.iv35 to i32
				%call = tail call i32 @dowork(i32 %4, i32 %5, i32 %3), !llvm.mem.parallel_loop_access !6
				%arrayidx7 = getelementptr inbounds [4096 x i32], [4096 x i32]* %a, i64 0, i64 %indvars.iv35
				store i32 %call, i32* %arrayidx7, align 4, !tbaa !2, !llvm.mem.parallel_loop_access !6
				%indvars.iv.next36 = add nuw nsw i64 %indvars.iv35, 1
				%exitcond37 = icmp eq i64 %indvars.iv.next36, 4096
				br i1 %exitcond37, label %omp.inner.for.end, label %omp.inner.for.body, !llvm.loop !6

				omp.inner.for.end: ; preds = %omp.inner.for.body
				br label %for.body11

				for.body11: ; preds = %for.body11, %omp.inner.for.end
				%indvars.iv = phi i64 [ 0, %omp.inner.for.end ], [ %indvars.iv.next, %for.body11 ]
				%arrayidx13 = getelementptr inbounds [4096 x i32], [4096 x i32]* %a, i64 0, i64 %indvars.iv
				%6 = load i32, i32* %arrayidx13, align 4, !tbaa !2
				%call14 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i32 %6)
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 4096
				br i1 %exitcond, label %for.end17, label %for.body11

				for.end17: ; preds = %for.body11
				call void @llvm.lifetime.end.p0i8(i64 16384, i8* nonnull %1) #5
				call void @llvm.lifetime.end.p0i8(i64 16384, i8* nonnull %0) #5
				ret i32 0
				}

				; Function Attrs: argmemonly nounwind
				declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #2

				; Function Attrs: argmemonly nounwind
				declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #2

				declare i32 @printf(i8*, ...) #3

				attributes #0 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="core-avx2" "target-features"="+aes,+avx,+avx2,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+rdrnd,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" "unsafe-fp-math"="false" "use-soft-float"="false" "vector-variants"="_ZGVbN4vlu_dowork,_ZGVcN8vlu_dowork,_ZGVdN8vlu_dowork,_ZGVeN16vlu_dowork,_ZGVbM4vlu_dowork,_ZGVcM8vlu_dowork,_ZGVdM8vlu_dowork,_ZGVeM16vlu_dowork" }
				fpetrogalli: I think we should unit test each variant, not a single test that picks up one vector function from a list of "vector-variants".
				attributes #1 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="core-avx2" "target-features"="+aes,+avx,+avx2,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+rdrnd,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #2 = { argmemonly nounwind }
				attributes #3 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="core-avx2" "target-features"="+aes,+avx,+avx2,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+rdrnd,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #4 = { nounwind }

				!llvm.module.flags = !{!0}
				!llvm.ident = !{!1}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{!"clang version 6.0.0 (trunk 316400)"}
				!2 = !{!3, !3, i64 0}
				!3 = !{!"int", !4, i64 0}
				!4 = !{!"omnipotent char", !5, i64 0}
				!5 = !{!"Simple C/C++ TBAA"}
				!6 = distinct !{!6, !7}
				!7 = !{!"llvm.loop.vectorize.enable", i1 true}

test/Transforms/LoopVectorize/simd_func_scalar.ll

				; Note: Test the simd function caller side functionality. The function side vectorization is tested under VecClone.

				; RUN: opt < %s -vec-clone -force-vector-interleave=1 -loop-vectorize -S | FileCheck %s

				; CHECK: extractelement <4 x i32>
				; CHECK: extractelement <4 x i32>
				; CHECK: call i32 @dowork
				; CHECK: extractelement <4 x i32>
				; CHECK: extractelement <4 x i32>
				; CHECK: call i32 @dowork
				; CHECK: extractelement <4 x i32>
				; CHECK: extractelement <4 x i32>
				; CHECK: call i32 @dowork
				; CHECK: extractelement <4 x i32>
				; CHECK: extractelement <4 x i32>
				; CHECK: call i32 @dowork

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1

				; Function Attrs: noinline nounwind uwtable
				define i32 @dowork(i32 %b, i32 %k, i32 %c) #0 {
				entry:
				%add = add nsw i32 %b, %k
				%add1 = add nsw i32 %add, %c
				ret i32 %add1
				}

				; Function Attrs: noinline nounwind uwtable
				define i32 @main() local_unnamed_addr #1 {
				entry:
				%a = alloca [4096 x i32], align 16
				%b = alloca [4096 x i32], align 16
				%0 = bitcast [4096 x i32]* %a to i8*
				call void @llvm.lifetime.start.p0i8(i64 16384, i8* nonnull %0) #5
				%1 = bitcast [4096 x i32]* %b to i8*
				call void @llvm.lifetime.start.p0i8(i64 16384, i8* nonnull %1) #5
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv38 = phi i64 [ 0, %entry ], [ %indvars.iv.next39, %for.body ]
				%arrayidx = getelementptr inbounds [4096 x i32], [4096 x i32]* %b, i64 0, i64 %indvars.iv38
				%2 = trunc i64 %indvars.iv38 to i32
				store i32 %2, i32* %arrayidx, align 4, !tbaa !2
				%indvars.iv.next39 = add nuw nsw i64 %indvars.iv38, 1
				%exitcond40 = icmp eq i64 %indvars.iv.next39, 4096
				br i1 %exitcond40, label %for.end, label %for.body

				for.end: ; preds = %for.body
				%arrayidx1 = getelementptr inbounds [4096 x i32], [4096 x i32]* %b, i64 0, i64 3
				%3 = load i32, i32* %arrayidx1, align 4, !tbaa !2
				br label %omp.inner.for.body

				omp.inner.for.body: ; preds = %omp.inner.for.body, %for.end
				%indvars.iv35 = phi i64 [ 0, %for.end ], [ %indvars.iv.next36, %omp.inner.for.body ]
				%arrayidx5 = getelementptr inbounds [4096 x i32], [4096 x i32]* %b, i64 0, i64 %indvars.iv35
				%4 = load i32, i32* %arrayidx5, align 4, !tbaa !2, !llvm.mem.parallel_loop_access !6
				%5 = trunc i64 %indvars.iv35 to i32
				%call = tail call i32 @dowork(i32 %4, i32 %5, i32 %3), !llvm.mem.parallel_loop_access !6
				%arrayidx7 = getelementptr inbounds [4096 x i32], [4096 x i32]* %a, i64 0, i64 %indvars.iv35
				store i32 %call, i32* %arrayidx7, align 4, !tbaa !2, !llvm.mem.parallel_loop_access !6
				%indvars.iv.next36 = add nuw nsw i64 %indvars.iv35, 1
				%exitcond37 = icmp eq i64 %indvars.iv.next36, 4096
				br i1 %exitcond37, label %omp.inner.for.end, label %omp.inner.for.body, !llvm.loop !6

				omp.inner.for.end: ; preds = %omp.inner.for.body
				br label %for.body11

				for.body11: ; preds = %for.body11, %omp.inner.for.end
				%indvars.iv = phi i64 [ 0, %omp.inner.for.end ], [ %indvars.iv.next, %for.body11 ]
				%arrayidx13 = getelementptr inbounds [4096 x i32], [4096 x i32]* %a, i64 0, i64 %indvars.iv
				%6 = load i32, i32* %arrayidx13, align 4, !tbaa !2
				%call14 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i32 %6)
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 4096
				br i1 %exitcond, label %for.end17, label %for.body11

				for.end17: ; preds = %for.body11
				call void @llvm.lifetime.end.p0i8(i64 16384, i8* nonnull %1) #5
				call void @llvm.lifetime.end.p0i8(i64 16384, i8* nonnull %0) #5
				ret i32 0
				}

				; Function Attrs: argmemonly nounwind
				declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #2

				; Function Attrs: argmemonly nounwind
				declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #2

				declare i32 @printf(i8*, ...) #3

				attributes #0 = { noinline norecurse nounwind readnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="core-avx2" "target-features"="+aes,+avx,+avx2,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+rdrnd,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" "unsafe-fp-math"="false" "use-soft-float"="false" "vector-variants"="_ZGVbN8vlu_dowork,_ZGVcN8vlu_dowork,_ZGVdN8vlu_dowork,_ZGVeN8vlu_dowork,_ZGVbM8vlu_dowork,_ZGVcM8vlu_dowork,_ZGVdM8vlu_dowork,_ZGVeM8vlu_dowork" }
				attributes #1 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="core-avx2" "target-features"="+aes,+avx,+avx2,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+rdrnd,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #2 = { argmemonly nounwind }
				attributes #3 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="core-avx2" "target-features"="+aes,+avx,+avx2,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+rdrnd,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #4 = { nounwind }

				!llvm.module.flags = !{!0}
				!llvm.ident = !{!1}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{!"clang version 6.0.0 (trunk 316400)"}
				!2 = !{!3, !3, i64 0}
				!3 = !{!"int", !4, i64 0}
				!4 = !{!"omnipotent char", !5, i64 0}
				!5 = !{!"Simple C/C++ TBAA"}
				!6 = distinct !{!6, !7, !8}
				!7 = !{!"llvm.loop.vectorize.width", i32 4}
				!8 = !{!"llvm.loop.vectorize.enable", i1 true}
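
The expectations in this test (scalar calls to `@dowork` fed by `extractelement`) are consistent with the loop requesting a vectorization width of 4 while every listed variant has vector length 8. In simd_func.ll, by contrast, the AVX2 variant `_ZGVdN8vlu_dowork` is expected. The sketch below models that kind of lookup under loudly stated assumptions: that a variant is usable only when its ISA class, vector length, and mask kind all match the request. How the actual caller-side patch chooses the ISA and VF is not shown in this diff, and `VariantInfo` and `pickVariant` are hypothetical names.

```cpp
#include <string>
#include <vector>

// Hypothetical summary of one entry of the "vector-variants" attribute.
struct VariantInfo {
  std::string Name;
  char Isa;      // ISA class letter from the mangled name
  bool Masked;   // true for 'M' variants
  unsigned Vlen; // vector length from the mangled name
};

// Returns the first variant matching the requested ISA class, vector
// length, and mask kind, or nullptr when none matches. In the nullptr
// case a caller would have to keep (scalarized) calls to the scalar
// function.
const VariantInfo *pickVariant(const std::vector<VariantInfo> &Variants,
                               char Isa, unsigned VF, bool NeedMask) {
  for (const VariantInfo &V : Variants)
    if (V.Isa == Isa && V.Vlen == VF && V.Masked == NeedMask)
      return &V;
  return nullptr;
}
```

With the variant list of this test, a request for width 4 finds nothing and `pickVariant` returns `nullptr`, which corresponds to the scalar `call i32 @dowork` lines in the CHECKs.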

test/Transforms/VecClone/all_parm_types.ll

				; Test all different kinds of parameters (uniform, linear, vector), multiple uses of linear k, and that stride calculations can handle type conversions.

				; RUN: opt -vec-clone -S < %s | FileCheck %s

				; CHECK-LABEL: @_ZGVbN4uvl_dowork
				; CHECK: simd.loop:
				; CHECK: %stride.mul{{.*}} = mul i32 1, %index
				; CHECK: %stride.cast{{.*}} = sext i32 %stride.mul{{.*}}
				; CHECK: %stride.add{{.*}} = add i64 %k, %stride.cast{{.*}}
				; CHECK: %arrayidx = getelementptr inbounds float, float* %a, i64 %stride.add{{.*}}
				; CHECK: %stride.mul{{.*}} = mul i32 1, %index
				; CHECK: %stride.cast{{.*}} = bitcast i32 %stride.mul{{.*}} to float
				; CHECK: %stride.add{{.*}} = fadd float %conv, %stride.cast{{.*}}
				; CHECK: %add{{.*}} = fadd float %add, %stride.add{{.*}}

				; ModuleID = 'rfc.c'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: nounwind uwtable
				define float @dowork(float* %a, float %b, i64 %k) #0 {
				entry:
				%arrayidx = getelementptr inbounds float, float* %a, i64 %k
				%0 = load float, float* %arrayidx, align 4, !tbaa !2
				%call = call float @sinf(float %0) #5
				%add = fadd float %call, %b
				%conv = sitofp i64 %k to float
				%add1 = fadd float %add, %conv
				ret float %add1
				}

				; Function Attrs: nounwind
				declare float @sinf(float) #1

				attributes #0 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" "vector-variants"="_ZGVbN4uvl_dowork,_ZGVcN8uvl_dowork,_ZGVdN8uvl_dowork,_ZGVeN16uvl_dowork,_ZGVbM4uvl_dowork,_ZGVcM8uvl_dowork,_ZGVdM8uvl_dowork,_ZGVeM16uvl_dowork" }
				attributes #1 = { nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

				!llvm.module.flags = !{!0}
				!llvm.ident = !{!1}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{!"clang version 6.0.0 (trunk 316400)"}
				!2 = !{!3, !3, i64 0}
				!3 = !{!"float", !4, i64 0}
				!4 = !{!"omnipotent char", !5, i64 0}
				!5 = !{!"Simple C/C++ TBAA"}

test/Transforms/VecClone/broadcast.ll

				; Check broadcast of a constant. The store of the constant should be moved inside of the loop.

				; RUN: opt -vec-clone -S < %s | FileCheck %s

				; CHECK-LABEL: @_ZGVbN4_foo
				; CHECK: simd.loop:
				; CHECK: store i32 99, i32* %ret.cast.gep

				; ModuleID = 'foo.c'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: nounwind uwtable
				define i32 @foo() #0 {
				entry:
				ret i32 99
				}

				attributes #0 = { norecurse nounwind readnone uwtable "vector-variants"="_ZGVbM4_foo,_ZGVbN4_foo,_ZGVcM8_foo,_ZGVcN8_foo,_ZGVdM8_foo,_ZGVdN8_foo,_ZGVeM16_foo,_ZGVeN16_foo" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

test/Transforms/VecClone/convert_linear.ll

				; Check handling of upconverting a linear (variable %i) to ensure stride calculation
				; is inserted correctly and the old convert (sext) uses the stride instead of the old
				; reference to %i.

				; RUN: opt -vec-clone -S < %s | FileCheck %s

				; CHECK-LABEL: @_ZGVbN2vl_foo
				; CHECK: simd.loop:
				; CHECK: %0 = load i32, i32* %i.addr
				; CHECK-NEXT: %stride.mul = mul i32 1, %index
				; CHECK-NEXT: %stride.add = add i32 %0, %stride.mul
				; CHECK-NEXT: %conv = sext i32 %stride.add to i64

				; ModuleID = 'convert.c'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: nounwind uwtable
				define i64 @foo(i64 %x, i32 %i) #0 {
				entry:
				%x.addr = alloca i64, align 8
				%i.addr = alloca i32, align 4
				store i64 %x, i64* %x.addr, align 8
				store i32 %i, i32* %i.addr, align 4
				%0 = load i32, i32* %i.addr, align 4
				%conv = sext i32 %0 to i64
				%1 = load i64, i64* %x.addr, align 8
				%add = add nsw i64 %conv, %1
				ret i64 %add
				}

				attributes #0 = { norecurse nounwind readnone uwtable "vector-variants"="_ZGVbM2vl_foo,_ZGVbN2vl_foo,_ZGVcM4vl_foo,_ZGVcN4vl_foo,_ZGVdM4vl_foo,_ZGVdN4vl_foo,_ZGVeM8vl_foo,_ZGVeN8vl_foo" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

test/Transforms/VecClone/external_array.ll

				; Check to see that we are applying the correct updated linear index for an external array access gep.

				; RUN: opt -vec-clone -S < %s | FileCheck %s

				; CHECK-LABEL: @_ZGVbN4ul_foo
				; CHECK: simd.loop:
				; CHECK: %1 = load i32, i32* %i.addr
				; CHECK: %stride.mul = mul i32 1, %index
				; CHECK: %stride.add = add i32 %1, %stride.mul
				; CHECK: %idxprom = sext i32 %stride.add to i64
				; CHECK: %arrayidx = getelementptr inbounds [128 x i32], [128 x i32]* @ext_a, i64 0, i64 %idxprom
				; CHECK: store i32 %0, i32* %arrayidx

				; ModuleID = 'external_array_assign.c'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@ext_a = common global [128 x i32] zeroinitializer, align 16

				; Function Attrs: nounwind uwtable
				define void @foo(i32 %x, i32 %i) #0 {
				entry:
				%x.addr = alloca i32, align 4
				%i.addr = alloca i32, align 4
				store i32 %x, i32* %x.addr, align 4
				store i32 %i, i32* %i.addr, align 4
				%0 = load i32, i32* %x.addr, align 4
				%1 = load i32, i32* %i.addr, align 4
				%idxprom = sext i32 %1 to i64
				%arrayidx = getelementptr inbounds [128 x i32], [128 x i32]* @ext_a, i64 0, i64 %idxprom
				store i32 %0, i32* %arrayidx, align 4
				ret void
				}

				attributes #0 = { norecurse nounwind uwtable "vector-variants"="_ZGVbM4ul_foo,_ZGVbN4ul_foo,_ZGVcM8ul_foo,_ZGVcN8ul_foo,_ZGVdM8ul_foo,_ZGVdN8ul_foo,_ZGVeM16ul_foo,_ZGVeN16ul_foo" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

test/Transforms/VecClone/linear.ll

				; Check to see that the linear parameter i is updated with the correct stride, indicated by a mul/add instruction sequence after the load.

				; RUN: opt -vec-clone -S < %s | FileCheck %s

				; CHECK-LABEL: @_ZGVbN4lu_foo
				; CHECK: simd.loop:
				; CHECK: %1 = load i32, i32* %i.addr
				; CHECK: %stride.mul = mul i32 1, %index
				; CHECK: %stride.add = add i32 %1, %stride.mul
				; CHECK: %add = add nsw i32 %0, %stride.add

				; ModuleID = 'linear.c'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: nounwind uwtable
				define i32 @foo(i32 %i, i32 %x) #0 {
				entry:
				%i.addr = alloca i32, align 4
				%x.addr = alloca i32, align 4
				store i32 %i, i32* %i.addr, align 4
				store i32 %x, i32* %x.addr, align 4
				%0 = load i32, i32* %x.addr, align 4
				%1 = load i32, i32* %i.addr, align 4
				%add = add nsw i32 %0, %1
				ret i32 %add
				}

				attributes #0 = { norecurse nounwind readnone uwtable "vector-variants"="_ZGVbM4lu_foo,_ZGVbN4lu_foo,_ZGVcM8lu_foo,_ZGVcN8lu_foo,_ZGVdM8lu_foo,_ZGVdN8lu_foo,_ZGVeM16lu_foo,_ZGVeN16lu_foo" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
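
The mul/add sequence checked above can be read as "lane `index` sees `i + stride * index`". The following C++ sketch models that for the `lu` variant with a stride of 1. It is only a scalar model of the checked instructions, not code from the patch; `fooScalar` and `fooLinearLoop` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar body from linear.c: i is the linear parameter, x the uniform one.
int fooScalar(int i, int x) { return x + i; }

// Hypothetical loop form of an "lu" variant with stride 1 and `vl` lanes.
std::vector<int> fooLinearLoop(int i, int x, std::size_t vl) {
  assert(vl > 0 && "a vector variant has at least one lane");
  std::vector<int> ret(vl);
  for (std::size_t index = 0; index < vl; ++index) {
    const int strideMul = 1 * static_cast<int>(index); // %stride.mul = mul i32 1, %index
    const int strideAdd = i + strideMul;               // %stride.add = add i32 %1, %stride.mul
    ret[index] = x + strideAdd;                        // %add = add nsw i32 %0, %stride.add
  }
  return ret;
}
```

Each lane then matches a scalar call with the advanced argument: lane `k` of `fooLinearLoop(i, x, 4)` equals `fooScalar(i + k, x)`.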

test/Transforms/VecClone/linear_mem2reg.ll

				; Check to see that the linear parameter i is updated with the correct stride when Mem2Reg is on.

				; RUN: opt -vec-clone -S < %s | FileCheck %s

				; CHECK-LABEL: @_ZGVbN4lu_foo
				; CHECK: simd.loop:
				; CHECK: %stride.mul = mul i32 1, %index
				; CHECK-NEXT: %stride.add = add i32 %i, %stride.mul
				; CHECK-NEXT: %add = add nsw i32 %x, %stride.add

				; ModuleID = 'linear.c'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: nounwind uwtable
				define i32 @foo(i32 %i, i32 %x) #0 {
				entry:
				%add = add nsw i32 %x, %i
				ret i32 %add
				}

				attributes #0 = { norecurse nounwind readnone uwtable "vector-variants"="_ZGVbM4lu_foo,_ZGVbN4lu_foo,_ZGVcM8lu_foo,_ZGVcN8lu_foo,_ZGVdM8lu_foo,_ZGVdN8lu_foo,_ZGVeM16lu_foo,_ZGVeN16lu_foo" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

test/Transforms/VecClone/struct_linear_ptr.ll

				; Test that the stride is being applied correctly to struct field accesses.

				; RUN: opt -vec-clone -S < %s | FileCheck %s

				; CHECK-LABEL: @_ZGVbN4l_foo
				; CHECK: simd.loop:
				; CHECK: %0 = load %struct.my_struct*, %struct.my_struct** %s.addr, align 8
				; CHECK: %stride.mul{{.*}} = mul i32 1, %index
				; CHECK: %s.addr.gep{{.*}} = getelementptr %struct.my_struct, %struct.my_struct* %0, i32 %stride.mul{{.*}}
				; CHECK: %field1 = getelementptr inbounds %struct.my_struct, %struct.my_struct* %s.addr.gep{{.*}}, i32 0, i32 0
				; CHECK: %1 = load float, float* %field1, align 8
				; CHECK: %2 = load %struct.my_struct*, %struct.my_struct** %s.addr, align 8
				; CHECK: %stride.mul{{.*}} = mul i32 1, %index
				; CHECK: %s.addr.gep{{.*}} = getelementptr %struct.my_struct, %struct.my_struct* %2, i32 %stride.mul{{.*}}
				; CHECK: %field5 = getelementptr inbounds %struct.my_struct, %struct.my_struct* %s.addr.gep{{.*}}, i32 0, i32 4
				; CHECK: %3 = load float, float* %field5, align 8
				; CHECK: %add = fadd float %1, %3

				; ModuleID = 'struct_linear_ptr.c'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				%struct.my_struct = type { float, i8, i32, i16, float, i64 }

				; Function Attrs: nounwind uwtable
				define float @foo(%struct.my_struct* %s) #0 {
				entry:
				%s.addr = alloca %struct.my_struct*, align 8
				store %struct.my_struct* %s, %struct.my_struct** %s.addr, align 8
				%0 = load %struct.my_struct*, %struct.my_struct** %s.addr, align 8
				%field1 = getelementptr inbounds %struct.my_struct, %struct.my_struct* %0, i32 0, i32 0
				%1 = load float, float* %field1, align 8
				%2 = load %struct.my_struct*, %struct.my_struct** %s.addr, align 8
				%field5 = getelementptr inbounds %struct.my_struct, %struct.my_struct* %2, i32 0, i32 4
				%3 = load float, float* %field5, align 8
				%add = fadd float %1, %3
				ret float %add
				}

				attributes #0 = { norecurse nounwind readonly uwtable "vector-variants"="_ZGVbM4l_foo,_ZGVbN4l_foo,_ZGVcM8l_foo,_ZGVcN8l_foo,_ZGVdM8l_foo,_ZGVdN8l_foo,_ZGVeM16l_foo,_ZGVeN16l_foo" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
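
For a linear pointer parameter, the stride is applied as a pointer offset before each field access, so lane `index` reads from `s[index]`. The C++ sketch below models the checked instruction sequence under that reading. The struct mirrors `%struct.my_struct`; only `field1` and `field5` are named in the IR, so the remaining field names and the function name `fooLinearPtrLoop` are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors %struct.my_struct = type { float, i8, i32, i16, float, i64 }.
// field2, field3, field4 and field6 are assumed names.
struct my_struct {
  float field1;
  std::int8_t field2;
  std::int32_t field3;
  std::int16_t field4;
  float field5;
  std::int64_t field6;
};

constexpr std::size_t VL = 4; // vector length of the _ZGVbN4l_ variant

// Hypothetical loop form: s is linear with stride 1 (in units of my_struct).
std::array<float, VL> fooLinearPtrLoop(const my_struct *s) {
  assert(s != nullptr);
  std::array<float, VL> ret{};
  for (std::size_t index = 0; index < VL; ++index) {
    const my_struct *gep1 = s + 1 * index; // first %s.addr.gep
    const float a = gep1->field1;          // load through %field1
    const my_struct *gep2 = s + 1 * index; // second %s.addr.gep
    const float b = gep2->field5;          // load through %field5
    ret[index] = a + b;                    // %add = fadd float %1, %3
  }
  return ret;
}
```

The caller is assumed to pass a pointer to at least `VL` consecutive structs, which is what a stride-1 linear pointer implies.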

test/Transforms/VecClone/two_vec_sum.ll

				; Sanity-check the structure of the LLVM IR that VecClone produces for the non-masked variant.

				; RUN: opt -vec-clone -S < %s | FileCheck %s

				; Begin non-masked variant checking
				; NOTE: This test checks instruction order very strictly, and the expected order can change with the optimization level used.
				; FYI, the IR here was generated at -O0, in case an issue needs to be reproduced.

				; CHECK-LABEL: <4 x i32> @_ZGVbN4vv_vec_sum(<4 x i32> %i, <4 x i32> %j)
				; CHECK-NEXT: entry:
				; CHECK-NEXT: %vec.i = alloca <4 x i32>
				; CHECK-NEXT: %vec.j = alloca <4 x i32>
				; CHECK-NEXT: %vec.retval = alloca <4 x i32>
				; CHECK-NEXT: store <4 x i32> %i, <4 x i32>* %vec.i
				; CHECK-NEXT: store <4 x i32> %j, <4 x i32>* %vec.j
				; CHECK-NEXT: %vec.i.cast = bitcast <4 x i32>* %vec.i to i32*
				; CHECK-NEXT: %vec.j.cast = bitcast <4 x i32>* %vec.j to i32*
				; CHECK-NEXT: %ret.cast = bitcast <4 x i32>* %vec.retval to i32*
				; CHECK-NEXT: br label %simd.loop

				; CHECK: simd.loop:
				; CHECK-NEXT: %index = phi i32 [ 0, %entry ], [ %indvar, %simd.loop.exit ]
				; CHECK-NEXT: %vec.i.cast.gep = getelementptr i32, i32* %vec.i.cast, i32 %index
				; CHECK-NEXT: %0 = load i32, i32* %vec.i.cast.gep, align 4
				; CHECK-NEXT: %vec.j.cast.gep = getelementptr i32, i32* %vec.j.cast, i32 %index
				; CHECK-NEXT: %1 = load i32, i32* %vec.j.cast.gep, align 4
				; CHECK-NEXT: %add = add nsw i32 %0, %1
				; CHECK-NEXT: %ret.cast.gep = getelementptr i32, i32* %ret.cast, i32 %index
				; CHECK-NEXT: store i32 %add, i32* %ret.cast.gep
				; CHECK-NEXT: br label %simd.loop.exit

				; CHECK: simd.loop.exit:
				; CHECK-NEXT: %indvar = add nuw i32 %index, 1
				; CHECK-NEXT: %vl.cond = icmp ult i32 %indvar, 4
				; CHECK-NEXT: br i1 %vl.cond, label %simd.loop, label %return

				; CHECK: return:
				; CHECK-NEXT: %vec.ret.cast = bitcast i32* %ret.cast to <4 x i32>*
				; CHECK-NEXT: %vec.ret = load <4 x i32>, <4 x i32>* %vec.ret.cast
				; CHECK-NEXT: ret <4 x i32> %vec.ret

				; ModuleID = 'two_vec_sum.c'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: nounwind uwtable
				define i32 @vec_sum(i32 %i, i32 %j) #0 {
				entry:
				%i.addr = alloca i32, align 4
				%j.addr = alloca i32, align 4
				store i32 %i, i32* %i.addr, align 4
				store i32 %j, i32* %j.addr, align 4
				%0 = load i32, i32* %i.addr, align 4
				%1 = load i32, i32* %j.addr, align 4
				%add = add nsw i32 %0, %1
				ret i32 %add
				}

				attributes #0 = { nounwind uwtable "vector-variants"="_ZGVbM4vv_vec_sum,_ZGVbN4vv_vec_sum,_ZGVcM8vv_vec_sum,_ZGVcN8vv_vec_sum,_ZGVdM8vv_vec_sum,_ZGVdN8vv_vec_sum,_ZGVeM16vv_vec_sum,_ZGVeN16vv_vec_sum" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
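
The CHECK lines above describe four blocks: `entry` spills the vector arguments and views them as scalar arrays, `simd.loop` runs the original scalar body on one lane, `simd.loop.exit` advances the induction variable, and `return` reloads the result as a vector. A rough C++ model of that shape, with `std::array` standing in for `<4 x i32>` and the hypothetical name `vecSumN4`, might look like this:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t VL = 4; // vector length of the _ZGVbN4vv_ variant
using Vec4 = std::array<int, VL>;

Vec4 vecSumN4(Vec4 i, Vec4 j) {
  // entry: vector arguments live in memory and are viewed as int arrays.
  Vec4 vecRetval{};                // %vec.retval (zero-filled here; the IR alloca is uninitialized)
  const int *iCast = i.data();     // %vec.i.cast
  const int *jCast = j.data();     // %vec.j.cast
  int *retCast = vecRetval.data(); // %ret.cast

  std::size_t index = 0;           // %index
  do {
    // simd.loop: the original scalar body, applied to lane `index`.
    retCast[index] = iCast[index] + jCast[index];
    // simd.loop.exit: bump the induction variable, then test it against VL.
    ++index;                       // %indvar
  } while (index < VL);            // %vl.cond
  assert(index == VL);

  // return: hand the buffer back as a whole vector.
  return vecRetval;
}
```

The do-while form is deliberate: as in the checked IR, the trip-count test sits at the bottom of the loop, so the body runs once before the first comparison.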

test/Transforms/VecClone/two_vec_sum_mask.ll

				; Sanity-check the structure of the LLVM IR that VecClone produces for the masked variant.

				; RUN: opt -vec-clone -S < %s | FileCheck %s
				; NOTE: This test checks instruction order very strictly, and the expected order can change with the optimization level used.
				; FYI, the IR here was generated at -O0, in case an issue needs to be reproduced.

				; Begin masked variant checking

				; CHECK-LABEL: <4 x i32> @_ZGVbM4vv_vec_sum(<4 x i32> %i, <4 x i32> %j, <4 x i32> %mask)
				; CHECK-NEXT: entry:
				; CHECK-NEXT: %vec.i = alloca <4 x i32>
				; CHECK-NEXT: %vec.j = alloca <4 x i32>
				; CHECK-NEXT: %vec.mask = alloca <4 x i32>
				; CHECK-NEXT: %vec.retval = alloca <4 x i32>
				; CHECK-NEXT: store <4 x i32> %i, <4 x i32>* %vec.i, align 4
				; CHECK-NEXT: store <4 x i32> %j, <4 x i32>* %vec.j, align 4
				; CHECK-NEXT: store <4 x i32> %mask, <4 x i32>* %vec.mask
				; CHECK-NEXT: %vec.i.cast = bitcast <4 x i32>* %vec.i to i32*
				; CHECK-NEXT: %vec.j.cast = bitcast <4 x i32>* %vec.j to i32*
				; CHECK-NEXT: %ret.cast = bitcast <4 x i32>* %vec.retval to i32*
				; CHECK-NEXT: %mask.cast = bitcast <4 x i32>* %vec.mask to i32*
				; CHECK-NEXT: br label %simd.loop

				; CHECK: simd.loop:
				; CHECK-NEXT: %index = phi i32 [ 0, %entry ], [ %indvar, %simd.loop.exit ]
				; CHECK-NEXT: %mask.gep = getelementptr i32, i32* %mask.cast, i32 %index
				; CHECK-NEXT: %mask.parm = load i32, i32* %mask.gep
				; CHECK-NEXT: %mask.cond = icmp ne i32 %mask.parm, 0
				; CHECK-NEXT: br i1 %mask.cond, label %simd.loop.then, label %simd.loop.else

				; CHECK: simd.loop.then:
				; CHECK-NEXT: %vec.i.cast.gep = getelementptr i32, i32* %vec.i.cast, i32 %index
				; CHECK-NEXT: %0 = load i32, i32* %vec.i.cast.gep, align 4
				; CHECK-NEXT: %vec.j.cast.gep = getelementptr i32, i32* %vec.j.cast, i32 %index
				; CHECK-NEXT: %1 = load i32, i32* %vec.j.cast.gep, align 4
				; CHECK-NEXT: %add = add nsw i32 %0, %1
				; CHECK-NEXT: %ret.cast.gep = getelementptr i32, i32* %ret.cast, i32 %index
				; CHECK-NEXT: store i32 %add, i32* %ret.cast.gep
				; CHECK-NEXT: br label %simd.loop.exit

				; CHECK: simd.loop.else:
				; CHECK-NEXT: br label %simd.loop.exit

				; CHECK: simd.loop.exit:
				; CHECK-NEXT: %indvar = add nuw i32 %index, 1
				; CHECK-NEXT: %vl.cond = icmp ult i32 %indvar, 4
				; CHECK-NEXT: br i1 %vl.cond, label %simd.loop, label %return

				; CHECK: return:
				; CHECK-NEXT: %vec.ret.cast = bitcast i32* %ret.cast to <4 x i32>*
				; CHECK-NEXT: %vec.ret = load <4 x i32>, <4 x i32>* %vec.ret.cast
				; CHECK-NEXT: ret <4 x i32> %vec.ret

				; ModuleID = 'two_vec_sum.c'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: nounwind uwtable
				define i32 @vec_sum(i32 %i, i32 %j) #0 {
				entry:
				%i.addr = alloca i32, align 4
				%j.addr = alloca i32, align 4
				store i32 %i, i32* %i.addr, align 4
				store i32 %j, i32* %j.addr, align 4
				%0 = load i32, i32* %i.addr, align 4
				%1 = load i32, i32* %j.addr, align 4
				%add = add nsw i32 %0, %1
				ret i32 %add
				}

				attributes #0 = { nounwind uwtable "vector-variants"="_ZGVbM4vv_vec_sum,_ZGVbN4vv_vec_sum,_ZGVcM8vv_vec_sum,_ZGVcN8vv_vec_sum,_ZGVdM8vv_vec_sum,_ZGVdN8vv_vec_sum,_ZGVeM16vv_vec_sum,_ZGVeN16vv_vec_sum" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
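
Compared with the non-masked variant, the masked loop adds a per-lane test of the mask element and skips the scalar body for inactive lanes. The sketch below models only that control flow. It uses `std::vector<int>` instead of `<4 x i32>` and zero-fills the result, whereas the checked IR leaves inactive lanes of `%vec.retval` unwritten; `vecSumMasked` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

std::vector<int> vecSumMasked(const std::vector<int> &i,
                              const std::vector<int> &j,
                              const std::vector<int> &mask) {
  assert(i.size() == j.size() && j.size() == mask.size());
  std::vector<int> ret(i.size(), 0); // assumption: inactive lanes read as 0 here
  for (std::size_t index = 0; index < i.size(); ++index) {
    const bool maskCond = mask[index] != 0; // %mask.cond = icmp ne i32 %mask.parm, 0
    if (maskCond) {
      ret[index] = i[index] + j[index];     // simd.loop.then: scalar body for this lane
    }
    // simd.loop.else: nothing to do, fall through to simd.loop.exit
  }
  return ret;
}
```

Any nonzero mask element counts as active, which mirrors the `icmp ne ... 0` comparison in the CHECK lines.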

test/Transforms/VecClone/two_vec_sum_mem2reg.ll

				; Check that, when Mem2Reg is on, all instructions referring to the original parameter are
				; updated correctly. With Mem2Reg on, instructions refer to the parameters directly rather
				; than through a load, which is why this case is tested separately.

				; Note: the LLVM IR used as input to this test has already had Mem2Reg applied to it, so there is
				; no need to do that here. This happens at higher optimization levels such as -O2.

				; RUN: opt -vec-clone -S < %s | FileCheck %s

				; Begin non-masked variant checking

				; CHECK-LABEL: @_ZGVbN4vv_vec_sum
				; CHECK: simd.loop:
				; CHECK: %vec.i.cast.gep = getelementptr i32, i32* %vec.i.cast, i32 %index
				; CHECK: %vec.i.elem = load i32, i32* %vec.i.cast.gep
				; CHECK: %vec.j.cast.gep = getelementptr i32, i32* %vec.j.cast, i32 %index
				; CHECK: %vec.j.elem = load i32, i32* %vec.j.cast.gep
				; CHECK: %add = add nsw i32 %vec.i.elem, %vec.j.elem

				; ModuleID = 'two_vec_sum.c'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: nounwind uwtable
				define i32 @vec_sum(i32 %i, i32 %j) #0 {
				entry:
				%add = add nsw i32 %i, %j
				ret i32 %add
				}

				attributes #0 = { nounwind uwtable "vector-variants"="_ZGVbM4vv_vec_sum,_ZGVbN4vv_vec_sum,_ZGVcM8vv_vec_sum,_ZGVcN8vv_vec_sum,_ZGVdM8vv_vec_sum,_ZGVdN8vv_vec_sum,_ZGVeM16vv_vec_sum,_ZGVeN16vv_vec_sum" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

test/Transforms/VecClone/uniform.ll

				; Check to make sure the initial parameter store of the uniform parameter is sunk into the loop.

				; RUN: opt -vec-clone -S < %s | FileCheck %s

				; CHECK-LABEL: <4 x i32> @_ZGVbN4u_foo(i32 %b)
				; CHECK: simd.loop:
				; CHECK: store i32 %b

				; ModuleID = 'uniform.c'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: nounwind uwtable
				define i32 @foo(i32 %b) #0 {
				entry:
				%b.addr = alloca i32, align 4
				store i32 %b, i32* %b.addr, align 4
				%0 = load i32, i32* %b.addr, align 4
				%inc = add nsw i32 %0, 1
				store i32 %inc, i32* %b.addr, align 4
				%1 = load i32, i32* %b.addr, align 4
				ret i32 %1
				}

				attributes #0 = { nounwind uwtable "vector-variants"="_ZGVbM4u_foo,_ZGVbN4u_foo,_ZGVcM8u_foo,_ZGVcN8u_foo,_ZGVdM8u_foo,_ZGVdN8u_foo,_ZGVeM16u_foo,_ZGVeN16u_foo" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
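
One plausible reason the initial store has to be sunk into the loop: the scalar body in uniform.ll overwrites `%b.addr`, so if the store stayed in `entry`, each iteration would see the value left behind by the previous lane. The C++ sketch below contrasts the two placements. It is a model of that reasoning, not code from the patch, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Store sunk into the loop: every lane starts from the original b.
std::vector<int> fooUniformSunk(int b, std::size_t vl) {
  assert(vl > 0);
  std::vector<int> ret(vl);
  int bAddr = 0;                   // %b.addr
  for (std::size_t index = 0; index < vl; ++index) {
    bAddr = b;                     // store i32 %b, placed inside simd.loop
    bAddr = bAddr + 1;             // %inc = add nsw i32 %0, 1
    ret[index] = bAddr;            // value returned for this lane
  }
  return ret;
}

// Counter-example: store left in entry, so the increment accumulates.
std::vector<int> fooUniformHoisted(int b, std::size_t vl) {
  assert(vl > 0);
  std::vector<int> ret(vl);
  int bAddr = b;                   // store executed once, before the loop
  for (std::size_t index = 0; index < vl; ++index) {
    bAddr = bAddr + 1;
    ret[index] = bAddr;
  }
  return ret;
}
```

With `b = 5` and four lanes, the sunk form yields 6 in every lane, matching four independent scalar calls, while the hoisted form yields 6, 7, 8, 9.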

test/Transforms/VecClone/vector_ptr.ll

				; Test that vectors of pointers are handled correctly in the loop and that incompatible function return/arg attributes are removed.

				; RUN: opt -vec-clone -S < %s | FileCheck %s

				; CHECK-LABEL: @_ZGVbN2v_dowork
				; CHECK: simd.loop:
				; CHECK: %vec.p.cast.gep = getelementptr float*, float** %vec.p.cast, i32 %index
				; CHECK: %vec.p.elem = load float*, float** %vec.p.cast.gep
				; CHECK: %add.ptr = getelementptr inbounds float, float* %vec.p.elem, i64 1
				; CHECK: %ret.cast.gep = getelementptr float*, float** %ret.cast, i32 %index
				; CHECK: store float* %add.ptr, float** %ret.cast.gep

				source_filename = "vector_ptr.c"
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: norecurse nounwind readnone uwtable
				define nonnull float* @dowork(float* readnone %p) local_unnamed_addr #0 {
				entry:
				%add.ptr = getelementptr inbounds float, float* %p, i64 1
				ret float* %add.ptr
				}

				attributes #0 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" "vector-variants"="_ZGVbN2v_dowork,_ZGVcN4v_dowork,_ZGVdN4v_dowork,_ZGVeN8v_dowork,_ZGVbM2v_dowork,_ZGVcM4v_dowork,_ZGVdM4v_dowork,_ZGVeM8v_dowork" }

test/Transforms/VecClone/void_foo.ll

				; Check to make sure we can handle a void foo() function.

				; RUN: opt -vec-clone -S < %s | FileCheck %s

				; CHECK-LABEL: void @_ZGVbN4_foo()
				; CHECK: entry:
				; CHECK: ret void

				; ModuleID = 'foo.c'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: nounwind uwtable
				define void @foo() #0 {
				entry:
				ret void
				}

				attributes #0 = { nounwind uwtable "vector-variants"="_ZGVbM4_foo1,_ZGVbN4_foo1,_ZGVcM8_foo1,_ZGVcN8_foo1,_ZGVdM8_foo1,_ZGVdN8_foo1,_ZGVeM16_foo1,_ZGVeN16_foo1" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

tools/bugpoint/bugpoint.cpp

[128 lines not shown; context: #endif]
initializeObjCARCOpts(Registry);
initializeVectorization(Registry);
initializeIPO(Registry);
initializeAnalysis(Registry);
initializeTransformUtils(Registry);
initializeInstCombine(Registry);
initializeInstrumentation(Registry);
initializeTarget(Registry);
initializeVecClonePass(Registry);

#ifdef LINK_POLLY_INTO_TOOLS
polly::initializePollyPasses(Registry);
#endif

if (std::getenv("bar") == (char*) -1) {
InitializeAllTargets();
InitializeAllTargetMCs();
[76 lines not shown]

tools/opt/opt.cpp

[383 lines not shown; context: int main(int argc, char **argv) {]
initializeObjCARCOpts(Registry);
initializeVectorization(Registry);
initializeIPO(Registry);
initializeAnalysis(Registry);
initializeTransformUtils(Registry);
initializeInstCombine(Registry);
initializeInstrumentation(Registry);
initializeTarget(Registry);
initializeVecClonePass(Registry);
// For codegen passes, only passes that do IR to IR transformation are
// supported.
initializeScalarizeMaskedMemIntrinPass(Registry);
initializeCodeGenPreparePass(Registry);
initializeAtomicExpandPass(Registry);
initializeRewriteSymbolsLegacyPassPass(Registry);
initializeWinEHPreparePass(Registry);
initializeDwarfEHPreparePass(Registry);
[391 lines not shown]
