This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
include/llvm/
-
llvm/
-
Analysis/
6
TargetTransformInfo.h
-
TargetTransformInfoImpl.h
37
VectorVariant.h
-
InitializePasses.h
-
LinkAllPasses.h
-
Transforms/Utils/
-
Utils/
8
VecClone.h
-
lib/
-
Analysis/
-
CMakeLists.txt
-
TargetTransformInfo.cpp
-
VectorVariant.cpp
-
Target/X86/
-
X86/
-
X86TargetTransformInfo.h
2
X86TargetTransformInfo.cpp
-
Transforms/
-
IPO/
2
PassManagerBuilder.cpp
-
Utils/
-
CMakeLists.txt
23
VecClone.cpp
-
test/Transforms/VecClone/
-
Transforms/
-
VecClone/
-
all_parm_types.ll
-
broadcast.ll
-
convert_linear.ll
-
external_array.ll
-
linear.ll
-
linear_mem2reg.ll
-
struct_linear_ptr.ll
-
two_vec_sum.ll
-
two_vec_sum_mask.ll
-
two_vec_sum_mem2reg.ll
-
uniform.ll
-
vector_ptr.ll
-
void_foo.ll
-
tools/
-
bugpoint/
-
bugpoint.cpp
-
opt/
-
opt.cpp

Differential D22792

VecClone Pass
Needs ReviewPublic

Authored by mmasten on Jul 25 2016, 5:37 PM.

Download Raw Diff

Details

Reviewers

mzolotukhin
Ayal
hfinkel
javed.absar

Summary

This work is part of an RFC sent by Xinmin back on 3/2/2016 regarding explicit function vectorization. The VecClone pass translates functions marked with "#pragma omp declare simd" into vector length trip count loops. Please see the RFC for details. It can be found at http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html.

Diff Detail

Event Timeline

mmasten updated this revision to Diff 65447.Jul 25 2016, 5:37 PM

mmasten retitled this revision from to VecClone Pass.

mmasten updated this object.

mmasten added reviewers: mzolotukhin, hfinkel, Ayal.

Herald added subscribers: mzolotukhin, mehdi_amini. · View Herald TranscriptJul 25 2016, 5:37 PM

Thanks for the nicely commented code and the provided tests.
A few inlined comments after a quick overview.

include/llvm/Analysis/VectorUtils.h
146 ↗	(On Diff #65447)	Why do all these need to be public APIs?
include/llvm/Transforms/Utils/VecClone.h
12	Why does this need to be in a public header?
lib/Analysis/VectorUtils.cpp
564 ↗	(On Diff #65447)	This is all not very inefficient: `getVectorVariantAttributes` creates a temporary vector and copy attributes in it, just for iterating over a few lines later to convert these as strings (you can't keep StringRef because the attribute was copied I guess). FuncVars is queried two times for no apparent good reason. It seems that the only use for this function is to populate a list of function to "vectorize". Populating a "vector<Function *>" should be enough for that. A nice property would be to be able to figure easily/efficiently if a function needs to be "vectorized". Maybe reworking the attribute to be more "structured" instead of separate strings.
lib/Transforms/Utils/VecClone.cpp
15	Some overview with an example/overview of what the pass accomplish would be nice. It is already captured in the RFC, but for someone reading the code it'd be helpful.
21	I think it should be spelled-out more clearly that it is required for correctness to have all the variants mentioned in the attribute list to be codegen'd, even at O0, because even if the vectorizer does not run in this module, it may run in another module that would expect these variant to exist.
1432	So this will walk all the functions in the module and walk all the attributes and do a string comparison on all of these. I'm not sure if it is fine to pay this when this pass has nothing to do (i.e. the early exit should be fast).
1436	Coding style: no braces (other places as well).
1439	Spurious empty line (other places as well).
1487	What about a SmallVector and move it out-of-the loop (the call to `clear()` below is already handling the reset between iterations).

Thanks for the comments, Mehdi. I had some other things come up, but I'm making some corrections now.

mmasten added inline comments.Sep 1 2016, 10:44 AM

include/llvm/Analysis/VectorUtils.h
146 ↗	(On Diff #65447)	They don't need to be. I can move these inside the VecClone class. Some of these could eventually become more generalized utilities, but I'll change it for now.
include/llvm/Transforms/Utils/VecClone.h
12	Do you mean that the class definition should be moved to VecClone.cpp?
lib/Analysis/VectorUtils.cpp
564 ↗	(On Diff #65447)	I will change #2. We need a little more than vector<Function*> here because each simd function will have multiple vector variants. FuncVars is a 1-many mapping of the original function to the string encodings corresponding to the variants that will be generated later. The string representation is essentially what is defined in https://www.cilkplus.org/sites/default/files/open_specifications/Intel-ABI-Vector-Function-2012-v0.9.5.pdf. We can easily tell which functions need to be vectorized by checking if any of these encodings exist as attributes on the original function. Can you elaborate on what you mean by "more structured"? Are you suggested an alternative representation to string-based function attributes? Thanks
lib/Transforms/Utils/VecClone.cpp
1432	We could selectively run the VecClone pass from PassManagerBuilder based on whether the OpenMP switch has been used. Otherwise, I don't know of another way to figure out which functions will need to be generated. Do you have a suggestion?

mehdi_amini added inline comments.Sep 1 2016, 12:11 PM

lib/Analysis/VectorUtils.cpp
564 ↗	(On Diff #65447)	Let me clarify what I meant with a `vector<Function >`. I meant that my take on `getFunctionsToVectorize` was that it sole purpose was to set a single* entry in `FuncVars`, and this entry is a `std::vector<std::string>` containing the list of variants. This is used exactly once here: for (auto& pair : FunctionsToVectorize) { Function& F = pair.first; DeclaredVariants &DeclaredVariants = pair.second; If `FunctionsToVectorize` was a vector of `Function `, you could get the list of `DeclaredVariants` on the fly while iterating on the attributes, without creating any temporary "string list".

mehdi_amini added inline comments.Sep 1 2016, 12:17 PM

lib/Analysis/VectorUtils.cpp
564 ↗	(On Diff #65447)	To come back to the "more structured", the question is "how to get the list of function that need variants very quickly/cheaply". Having to look at all the attributes and do any kind of string processing when we need to generate a variant is not a problem, I'm more interested in doing as little work as possible when there is no variant. Maybe having an attribute on the function which is "hasVariants" and could be queried cheaply could be a solution. I'd have to look at how attributes works internally again. (Ideally I'd like a storage `key->value` where the key would be "variants" and the value would be a list of variants to generate, instead of the flat list where variants strings are mixed with the others and recognized by "magic" pattern matching)

mmasten updated this revision to Diff 73523.Oct 4 2016, 11:53 AM

mmasten marked an inline comment as done.

Herald added subscribers: modocache, mgorny, beanz. · View Herald TranscriptOct 4 2016, 11:53 AM

sodeh added a subscriber: sodeh.Nov 16 2016, 1:46 AM

simoll added a subscriber: simoll.Jan 23 2017, 1:22 AM

hfinkel added inline comments.Jan 27 2017, 4:14 AM

include/llvm/Analysis/VectorVariant.h
140	There's a lot of target information here in the target-independent code. Given that we're not going to vectorize without a target code model regardless, I'd like to see this information pushing into TargetTransformInfo. VecClone can then use TTI to convert the particular ISA tags into information about vector lengths, etc. Other architectures that are adapting this scheme can then extend this in a natural way.
lib/Analysis/VectorUtils.cpp
564 ↗	(On Diff #65447)	I'd like to come back to this. I agree with Mehdi, having unadorned mangled names as attributes isn't the right design - the "magic" pattern matching is unnecessary. It would seem much better to use something like: attributes #0 = { "vector-variants"="_ZGVbM4l_foo,_ZGVbN4l_foo,_ZGVcM8l_foo,_ZGVcN8l_foo,_ZGVdM8l_foo,_ZGVdN8l_foo,_ZGVeM16l_foo,_ZGVeN16l_foo" instead of: attributes #0 = { hasvectorvariants "_ZGVbM4l_foo" "_ZGVbN4l_foo" "_ZGVcM8l_foo" "_ZGVcN8l_foo" "_ZGVdM8l_foo" "_ZGVdN8l_foo" "_ZGVeM16l_foo" "_ZGVeN16l_foo" exactly like we do for "target-features".

New changes are:

Update function attributes to "vector-variants"="<variant list>" format.
Move target-specific code in the VectorVariant class to TTI.

Thanks for the feedback, Hal. I made the changes you suggested.

Hahnfeld added a subscriber: Hahnfeld.Mar 21 2017, 6:16 AM

fpetrogalli added a subscriber: fpetrogalli.Sep 1 2017, 2:59 AM

fpetrogalli added inline comments.Sep 1 2017, 3:14 AM

lib/Transforms/Utils/VecClone.cpp
42	Hello Matt, thank you for working on this. <nitpicking>Should the comment be updated with the `"vector-variants"="_ZGVbM4ul_, ZGVbN4ul_"` syntax? </nitpicking> Cheers, Francesco

Hello again,

I think it would be good to have references to the document that describes the vector ABI. Is it the one at [1]?

If so, I think it would be good to cover with tests the "linear clause" cases of Table 1 in section 2.4, "Element Data Type to Vector Data Type Mapping".

Francesco

[1] https://software.intel.com/sites/default/files/managed/b4/c8/Intel-Vector-Function-ABI.pdf

fpetrogalli added inline comments.Sep 1 2017, 3:46 AM

lib/Target/X86/X86TargetTransformInfo.cpp
2850	Section 2.6 "Vector Function Name Mangling" of [1] maps the IsaClass to other values, "x', 'y'. Are you basing this on [2] for gcc compatibility? Why do [1] and [2] differ? Francesco [1] https://software.intel.com/sites/default/files/managed/b4/c8/Intel-Vector-Function-ABI.pdf [2] https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt

hfinkel added inline comments.Sep 1 2017, 4:04 PM

lib/Transforms/Utils/VecClone.cpp
39	This function doesn't return anything, and I think that makes the example hard to understand.
100	I don't understand what's going on here. Is it not possible to write the transformation such that it's semantically correct regardless of whether we've run mem2reg? This is not always an either/or situation.

Also, does the fact that we now have the VPlan infrastructure in the vectorizer change, in any way, how we'd like to approach this problem?

Hello - gentle ping on this patch. I think it is an important addition to the compiler. Do we need to reactivate the discussion in the mailing list?

Francesco

egarcia added a subscriber: egarcia.Oct 23 2017, 4:38 PM

Hi Francesco,

I'm working on updating my patch based on the recent comments and other modifications that have been made since the last patch was submitted. I should have it ready in the next couple of days.

Thanks,

Matt

Hahnfeld added a subscriber: llvm-commits.Oct 27 2017, 7:32 AM

mmasten added inline comments.Oct 31 2017, 12:26 PM

lib/Target/X86/X86TargetTransformInfo.cpp
2850	Currently, the implementation is based on gcc compatibility. However, it would be nice to extend to support both gcc and icc. The IsaClass values are different because of the calling convention differences. Intel icc uses regcall calling convention and gcc uses standard calling convention. I've added references to both ABI documents.
lib/Transforms/Utils/VecClone.cpp
39	I agree, this was a bad bug with the example. This has been fixed and updated with an example that demonstrates the behavior of all three types of parameters (uniform, linear, and vector). Please let me know if the bitcasts shown in the example are confusing. The purpose of the bitcasts is so that the loop can appear in scalar form to the loop vectorizer. i.e., we can use a .scalar gep and index to reference vectors in the loop.
42	Thanks Francesco. Fixed.
100	Perhaps the comments were written very clearly. The pass has been written to handle parameters regardless of whether or not mem2reg has run. Hopefully, the updated comments are better.

mmasten updated this revision to Diff 121035.Oct 31 2017, 12:29 PM

Herald added a subscriber: javed.absar. · View Herald TranscriptOct 31 2017, 12:29 PM

Hello,

thank you for updating the patch.

I have added one more comment, and I also have a question. Could you please add (here or in a different patch) the plumbing needed in the LoopVectorizer to make the functions available for vectorizationvia this pass? Apologies if you have already done it and I am just missing it.

Francesco

include/llvm/Analysis/VectorVariant.h
34–37	We will have to support OpenMP 4.5 linear modifiers, which are rendered as 2-char tokens in the mangled names. Woudn't it better to avoid #defines of chars and instead use enums inside the VectorKind class?

fhahn added a subscriber: fhahn.Nov 6 2017, 6:11 AM

mmasten added inline comments.Nov 6 2017, 11:49 AM

include/llvm/Analysis/VectorVariant.h
34–37	Ok, I can change this to an enum inside the VectorKind class. I don't have a patch ready for the LoopVectorize part, but I will prepare one and put it up for review in the next few days. I also have a patch for clang that enables the "vector-variants" attribute support.

fpetrogalli added inline comments.Nov 6 2017, 2:25 PM

include/llvm/Analysis/VectorVariant.h
34–37	Ok, I can change this to an enum inside the VectorKind class. Thank you. I don't have a patch ready for the LoopVectorize part, but I will prepare one and put it up for review in the next few days. That sounds great, thanks! I also have a patch for clang that enables the "vector-variants" attribute support. Clang is already generating the list of mangled vector names as string attributes. I believe that you need to rearrange the code so that strings get produced in the vector-variants attribute.

tschuett added a subscriber: tschuett.Nov 7 2017, 3:20 AM

a.elovikov added a subscriber: a.elovikov.Nov 8 2017, 1:08 PM

mmasten added inline comments.Nov 27 2017, 4:38 PM

include/llvm/Analysis/VectorVariant.h
34–37	Hi Francesco, I have the LoopVectorize part of this done. Are there any objections to making it part of this review? Thanks, Matt

fpetrogalli added inline comments.Nov 28 2017, 12:41 AM

include/llvm/Analysis/VectorVariant.h
34–37	Hello Matt, please don't add code to this review. I'd prefer to see the changes related to the Vectorizer in a separate patch. Kind regards, Francesco

mmasten mentioned this in D40575: LoopVectorize support for simd functions.Nov 28 2017, 1:11 PM

Moved calcCharacteristicType() function to VectorUtils so that VecClone and LV can share.
Removed vector-variant function attributes in LV instead of VecClone because LV needs to see them.

mmasten added inline comments.Nov 28 2017, 1:17 PM

include/llvm/Analysis/VectorVariant.h
34–37	Thanks Francesco. I made a couple of updates to this patch and created a new patch for the LV side of things. The LV patch is revision D40575.

mmasten mentioned this in D40577: Clang support for simd functions.Nov 28 2017, 1:38 PM

In general, I think the VecClone pass is too complicated because it tries to handle the "optimized code" vs. "non-optimized code" cases separately. I don't think we should (or, in a theoretical sense, can) do that. We should have a uniform algorithm to handle all incoming IR. I think that we can do something like this:

Split the entry block at the top and move all allocas in the original entry block to the new entry block.
For each constant-sized alloca in the entry block, expand it by a factor of VL. For vectorizable types, you can do an alloca of the vector type. Otherwise, use an alloca to generate an array of VL items (i.e., use VL as the alloca's second parameter). Generate an alloca of <RetTy x VL> to hold the return value.
Generate a new loop around all of the rest of the function (for i = [0, VL-1]) and a new return block (which loads the value from the return-value alloca and returns it).
Replace all uses of each entry-block alloca, a, with &a[i]. Remove the old allocas.
All uses of function parameters are now inside the loop. vector parameters will get a vector alloca and store in the entry block, and uses in the loop of the parameter will be replaced by load param[i]. linear parameters get replaced by (p+i). uniform parameters are unchanged.
Replace the returns with a store to RetVal[i] and a branch to the loop-exit block.

Something like that should work for all functions, optimized or not.

include/llvm/Analysis/TargetTransformInfo.h
632	This is X86 specific, and doesn't seem to belong here (also, it's not used anywhere).
647	This shouldn't be an enum like this. We don't need X86-specific things in the TTI interface. It looks like you've done a good job at making this opaque, so you can move this enum itself into the X86TTI implementation. The only thing we need here is the definition of UnknownISA (maybe define that to be an integer = -1).
652	UnknownISA
915	This shouldn't start with a capital letter. Either name it `isaClassMaxRegisterWidth` or `getISAClassMaxRegisterWidth`. I prefer the latter.
918	This shouldn't start with a capital letter. Either name it `isaClassToString` or `convertISAClassToString`. I prefer the latter.
include/llvm/Analysis/VectorVariant.h
35	These preprocessor should just be constexpr/static const ints.
include/llvm/Transforms/Utils/VecClone.h
27	Don't use all caps for these names. IT_Alloca or Alloca could work.
73	Please use consistent terminology: marked as SIMD -> requires a vector variant.
lib/Transforms/IPO/PassManagerBuilder.cpp
35	You also need to add the pass to the new pass manager: lib/Passes/PassBuilder.cpp
98	I wouldn't bother with this. It should be safe if no functions with the vector-variants attribute are present (and, if they are, then you need to run the pass of things might not link).
lib/Transforms/Utils/VecClone.cpp
87	What cleans up the extra loads/stores here (if we're optimizing)? Are assuming that there's a run of SROA afterwards, or does InstCombine do a good job here, or something else?
163	1-many -> one-to-many
290	Move this near the top of the function and check, before you generate code, that he function doesn't already exist. If it does, bail out (e.g., return a nullptr and the caller can move on to the next variant/function).
346	I don't think you need the iterators; this can just be: for (auto &Arg : Clone->args()) and you use this pattern a lot (declaring two iterators and then using them in a for loop). In almost all of these cases, you should use a range-based for loop instead.
419	What happens if there's more than one return in the function? You might want, in that case, to create a new block with the return and convert all other returns to branches to that block.
484	In addition to branches, you need to handle SwitchInst and IndirectBrInst (and, for the latter, you need to find any place where the address of the return block is taken, and replace it with the address of the LoopExitBlock).
535	Use CreateAdd here so you can set both nuw and nsw on this increment.
639	What happens if there is more than one store user?

Thanks for the comments, Hal. Just to clarify your point #2, I think what you're saying is that we should start from a common parameter representation; i.e., parameters should be loaded/stored through memory. Please correct me if I'm wrong. I certainly think this would be a great way to reduce the complexity of the algorithm. The remainder of items in your list should already be covered, but some tweaking may be involved.

In D22792#962822, @mmasten wrote:

Thanks for the comments, Hal.

No problem. Thanks for working on this!

Just to clarify your point #2, I think what you're saying is that we should start from a common parameter representation; i.e., parameters should be loaded/stored through memory. Please correct me if I'm wrong. I certainly think this would be a great way to reduce the complexity of the algorithm. The remainder of items in your list should already be covered, but some tweaking may be involved.

For point #2, I'm saying that we should take all local stack allocations and make them wider by a factor of VL. Thinking about this as having VL simultaneously-running copies of the function, one per vector lane, each of those gets a separate "lane" of the local stack allocations. In point #5, I sketched how I'd handle parameters (I'm not exactly sure what you mean by common representation, as different kinds of parameters do require different handling (i.e., vector, uniform, scalar)). What is true is that, for unoptimized code, where the function arguments are generally stored in local stack allocations, all of those stores are now just inside the loop with everything else, so nothing special needs to happen. Does that make sense?

Ok, after doing some experimentation I believe I understand where you're heading with this. Once I have done some more refactoring I'll post a new version of VecClone for review to make sure we're on the same page.

kmitropo added a subscriber: kmitropo.Jan 24 2018, 4:16 PM

Herald added a subscriber: hintonda. · View Herald TranscriptJan 24 2018, 4:16 PM

mmasten added inline comments.Jan 26 2018, 4:09 PM

lib/Transforms/Utils/VecClone.cpp
87	InstCombine will do it. I also know that EarlyCSE will work.
419	In all the test cases that I have used to this point, this type of re-wiring has already been done. Granted, I have mainly been testing some very simple multiple return functions, but if you have a test case where this happens it will be helpful. Thanks.

Extensive update to the VecClone algorithm based on Hal's feedback. VecClone pass is now supported through the new pass manager. Other minor code changes made.

The latest update includes some pretty extensive rework of the VecClone algorithm that Hal suggested. I also added support for VecClone in the new pass manager and since it is my first time doing this, I would welcome any specific comments on whether or not this was done correctly. Please note that I'm still working on fixing existing tests and will be adding new ones, but I wanted to make sure the overall algorithm is headed in the right direction. Thanks all.

Just a gentle reminder to have a look at the latest VecClone algorithm. Thanks, Matt.

Hi Matt - overall the patch looks good, I just have a couple of comments.

I think you should remove some of the testing you do on the vectorizer side of things in the vectorizer patch, not here. Here you should be testing only the cloning of the function.

Moreover, I would like to see more testing happening in the following cases:

when there is only one vector variant listed in the vector-variants attribute
when the vector variant listed in the attribute is not supported by the target and the cloning does not happen (say you are compiling for SSE but you only have AVX512 vector functions listed in the attribute).

Moreover, I have added one comment about using IR vector types instead of integers to maps ISAs to vector register sizes. That would probably make extending this pass to SVE easier because for vector length agnostic types we will need to do some symbolic computation on vector sizes.

Thank you,

Francesco

include/llvm/Analysis/TargetTransformInfo.h
918	Could you replace information about register size to use IR vector types? For example, you could use <16 x i8> for 128-bit wide registers, or <64 x i8> for 512-bit registers. You could then base all the computations of the VLEN of a vector function on that. I am asking because that will make easier the handling of Vector Length Agnostic (VLA) vector functions that we will end up using for targets like the AArch64 Scalable Vector Extension (SVE).
test/Transforms/LoopVectorize/masked_simd_func.ll
1 ↗	(On Diff #131670)	Any test that checks the caller side functionality should be be under the patch that enables the loop vectorizer to use the VecClone pass.
test/Transforms/LoopVectorize/simd_func.ll
83 ↗	(On Diff #131670)	I think we should unit test each variant, not a single test that picks up one vector function from a list of "vector-variants".

mmasten added inline comments.Apr 26 2018, 11:04 AM

test/Transforms/LoopVectorize/masked_simd_func.ll
1 ↗	(On Diff #131670)	Thanks Francesco. Looks like I accidentally included this test in this patch. I looked to make sure it was already included in the LoopVectorize patch.

Herald added a reviewer: javed.absar. · View Herald TranscriptApr 26 2018, 11:04 AM

joel_k_jones added a subscriber: joel_k_jones.Aug 20 2018, 4:40 PM

vchuravy added a subscriber: vchuravy.Oct 11 2018, 4:09 AM

ABataev added a subscriber: ABataev.Apr 12 2019, 11:27 AM

ABataev added inline comments.

include/llvm/Analysis/VectorVariant.h
39	Description of the class?
42	Explicitly set the base type to `char`
49	Use member initializers
70	`VectorKind(const VectorKind &Other) = default;`
76	No need to use `\brief` tag, you must remove it
78	const function
81	const function
84	const function
87	const function
92	const function
95	const function
98	const function
101	const function
104	const function
112	Why not `SmallString`?
113	Why not `llvm::raw_svector_ostream`?
133	`char`->`ParmKind`
145	Use `SmallVector` or `ArrayRef`
147	`std::string`->`StringRef`. Is this target-specific?
153	const function
156	const function
159	const function
162	const function
167	`SmallString`
167	const function
187	const function
194	const function
207	Target-specific?
213	Target-specific?
include/llvm/Transforms/Utils/VecClone.h
32	SmallVector
35	DenseMap
42	const function
52	const function

zsrkmyn added a subscriber: zsrkmyn.Mar 19 2020, 9:11 AM

Re ping

riccibruno added a subscriber: riccibruno.Jun 21 2020, 1:25 PM

Nuullll mentioned this in D141650: [VectorUtils] Enhance VFABI demangling API.Jan 18 2023, 7:08 PM

Revision Contents

Path

Size

include/

llvm/

Analysis/

TargetTransformInfo.h

57 lines

TargetTransformInfoImpl.h

17 lines

VectorVariant.h

231 lines

InitializePasses.h

1 line

LinkAllPasses.h

2 lines

Transforms/

Utils/

VecClone.h

236 lines

lib/

Analysis/

CMakeLists.txt

1 line

TargetTransformInfo.cpp

20 lines

VectorVariant.cpp

112 lines

Target/

X86/

X86TargetTransformInfo.h

6 lines

X86TargetTransformInfo.cpp

80 lines

Transforms/

IPO/

PassManagerBuilder.cpp

11 lines

Utils/

CMakeLists.txt

1 line

VecClone.cpp

1727 lines

test/

Transforms/

VecClone/

46 lines

19 lines

32 lines

35 lines

29 lines

22 lines

40 lines

59 lines

71 lines

two_vec_sum_mem2reg.ll

31 lines

uniform.ll

25 lines

vector_ptr.ll

25 lines

void_foo.ll

19 lines

tools/

bugpoint/

bugpoint.cpp

1 line

opt/

opt.cpp

1 line

Diff 121035

include/llvm/Analysis/TargetTransformInfo.h

Show First 20 Lines • Show All 622 Lines • ▼ Show 20 Lines	enum OperandValueKind {
OK_UniformValue, // Operand is uniform (splat of a value).		OK_UniformValue, // Operand is uniform (splat of a value).
OK_UniformConstantValue, // Operand is uniform constant.		OK_UniformConstantValue, // Operand is uniform constant.
OK_NonUniformConstantValue // Operand is a non uniform constant value.		OK_NonUniformConstantValue // Operand is a non uniform constant value.
};		};

/// \brief Additional properties of an operand's values.		/// \brief Additional properties of an operand's values.
enum OperandValueProperties { OP_None = 0, OP_PowerOf2 = 1 };		enum OperandValueProperties { OP_None = 0, OP_PowerOf2 = 1 };

		/// \brief Targets defined in the vector function ABI.
		enum TargetProcessor {
		hfinkelUnsubmitted Not Done Reply Inline Actions This is X86 specific, and doesn't seem to belong here (also, it's not used anywhere). hfinkel: This is X86 specific, and doesn't seem to belong here (also, it's not used anywhere).
		Pentium4, // ISA extension = SSE2, ISA class = XMM
		Pentium4SSE3, // ISA extension = SSE3, ISA class = XMM
		Core2DuoSSSE3, // ISA extension = SSSE3, ISA class = XMM
		Core2DuoSSE41, // ISA extension = SSE4_1, ISA class = XMM
		CoreI7SSE42, // ISA extension = SSE4_2, ISA class = XMM
		Core2ndGenAVX, // ISA extension = AVX, ISA class = YMM1
		Core3rdGenAVX, // ISA extension = AVX, ISA class = YMM1
		Core4thGenAVX, // ISA extension = AVX2, ISA class = YMM2
		Mic, // ISA extension = Xeon Phi, ISA class = MIC(ZMM)
		FutureCpu22, // ISA extension = AVX512, ISA class = ZMM
		FutureCpu23, // ISA extension = AVX512, ISA class = ZMM
		};

		/// ISA classes defined in the vector function ABI.
		enum ISAClass {
		hfinkelUnsubmitted Not Done Reply Inline Actions This shouldn't be an enum like this. We don't need X86-specific things in the TTI interface. It looks like you've done a good job at making this opaque, so you can move this enum itself into the X86TTI implementation. The only thing we need here is the definition of UnknownISA (maybe define that to be an integer = -1). hfinkel: This shouldn't be an enum like this. We don't need X86-specific things in the TTI interface. It…
		XMM, // (SSE2)
		YMM1, // (AVX1)
		YMM2, // (AVX2)
		ZMM, // (MIC)
		ISAClassesNum
		hfinkelUnsubmitted Not Done Reply Inline Actions UnknownISA hfinkel: UnknownISA
		};

/// \return The number of scalar or vector registers that the target has.		/// \return The number of scalar or vector registers that the target has.
/// If 'Vectors' is true, it returns the number of vector registers. If it is		/// If 'Vectors' is true, it returns the number of vector registers. If it is
/// set to false, it returns the number of scalar registers.		/// set to false, it returns the number of scalar registers.
unsigned getNumberOfRegisters(bool Vector) const;		unsigned getNumberOfRegisters(bool Vector) const;

/// \return The width of the largest scalar or vector register type.		/// \return The width of the largest scalar or vector register type.
unsigned getRegisterBitWidth(bool Vector) const;		unsigned getRegisterBitWidth(bool Vector) const;

▲ Show 20 Lines • Show All 243 Lines • ▼ Show 20 Lines	unsigned getLoadVectorFactor(unsigned VF, unsigned LoadSize,
VectorType *VecTy) const;		VectorType *VecTy) const;

/// \returns The new vector factor value if the target doesn't support \p		/// \returns The new vector factor value if the target doesn't support \p
/// SizeInBytes stores or has a better vector factor.		/// SizeInBytes stores or has a better vector factor.
unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,		unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,
unsigned ChainSizeInBytes,		unsigned ChainSizeInBytes,
VectorType *VecTy) const;		VectorType *VecTy) const;

		/// \returns The maximum vector register width based on ISAClass \p Class,
		/// as defined in the vector function ABI.
		hfinkelUnsubmitted Not Done Reply Inline Actions This shouldn't start with a capital letter. Either name it `isaClassMaxRegisterWidth` or `getISAClassMaxRegisterWidth`. I prefer the latter. hfinkel: This shouldn't start with a capital letter. Either name it `isaClassMaxRegisterWidth` or…
		unsigned maximumSizeofISAClassVectorRegister(ISAClass Class, Type *Ty) const;

		/// \returns The encoded ISA class for the mangled vector variant name based
		hfinkelUnsubmitted Not Done Reply Inline Actions This shouldn't start with a capital letter. Either name it `isaClassToString` or `convertISAClassToString`. I prefer the latter. hfinkel: This shouldn't start with a capital letter. Either name it `isaClassToString` or…
		fpetrogalliUnsubmitted Not Done Reply Inline Actions Could you replace information about register size to use IR vector types? For example, you could use <16 x i8> for 128-bit wide registers, or <64 x i8> for 512-bit registers. You could then base all the computations of the VLEN of a vector function on that. I am asking because that will make easier the handling of Vector Length Agnostic (VLA) vector functions that we will end up using for targets like the AArch64 Scalable Vector Extension (SVE). fpetrogalli: Could you replace information about register size to use IR vector types? For example, you…
		/// on \p IsaClass.
		char encodeISAClass(ISAClass IsaClass) const;

		/// \returns The ISAClass from the character encoded \p IsaClass of the
		/// mangled vector variant function name.
		ISAClass decodeISAClass(char IsaClass) const;

		/// \returns The target legalized type of \P Ty based on ISAClass \p IsaClass.
		Type* promoteToSupportedType(Type *Ty, ISAClass IsaClass) const;

/// Flags describing the kind of vector reduction.		/// Flags describing the kind of vector reduction.
struct ReductionFlags {		struct ReductionFlags {
ReductionFlags() : IsMaxOp(false), IsSigned(false), NoNaN(false) {}		ReductionFlags() : IsMaxOp(false), IsSigned(false), NoNaN(false) {}
bool IsMaxOp; ///< If the op a min/max kind, true if it's a max operation.		bool IsMaxOp; ///< If the op a min/max kind, true if it's a max operation.
bool IsSigned; ///< Whether the operation is a signed int reduction.		bool IsSigned; ///< Whether the operation is a signed int reduction.
bool NoNaN; ///< If op is an fp min/max, whether NaNs may be present.		bool NoNaN; ///< If op is an fp min/max, whether NaNs may be present.
};		};

▲ Show 20 Lines • Show All 185 Lines • ▼ Show 20 Lines	virtual bool isLegalToVectorizeStoreChain(unsigned ChainSizeInBytes,
unsigned Alignment,		unsigned Alignment,
unsigned AddrSpace) const = 0;		unsigned AddrSpace) const = 0;
virtual unsigned getLoadVectorFactor(unsigned VF, unsigned LoadSize,		virtual unsigned getLoadVectorFactor(unsigned VF, unsigned LoadSize,
unsigned ChainSizeInBytes,		unsigned ChainSizeInBytes,
VectorType *VecTy) const = 0;		VectorType *VecTy) const = 0;
virtual unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,		virtual unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,
unsigned ChainSizeInBytes,		unsigned ChainSizeInBytes,
VectorType *VecTy) const = 0;		VectorType *VecTy) const = 0;
		virtual unsigned maximumSizeofISAClassVectorRegister(ISAClass Class,
		Type *Ty) const = 0;
		virtual char encodeISAClass(ISAClass IsaClass) const = 0;
		virtual ISAClass decodeISAClass(char IsaClass) const = 0;
		virtual Type* promoteToSupportedType(Type *Ty, ISAClass IsaClass) const = 0;
virtual bool useReductionIntrinsic(unsigned Opcode, Type *Ty,		virtual bool useReductionIntrinsic(unsigned Opcode, Type *Ty,
ReductionFlags) const = 0;		ReductionFlags) const = 0;
virtual bool shouldExpandReduction(const IntrinsicInst *II) const = 0;		virtual bool shouldExpandReduction(const IntrinsicInst *II) const = 0;
virtual int getInstructionLatency(const Instruction *I) = 0;		virtual int getInstructionLatency(const Instruction *I) = 0;
};		};

template <typename T>		template <typename T>
class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {		class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {
▲ Show 20 Lines • Show All 346 Lines • ▼ Show 20 Lines	unsigned getLoadVectorFactor(unsigned VF, unsigned LoadSize,
VectorType *VecTy) const override {		VectorType *VecTy) const override {
return Impl.getLoadVectorFactor(VF, LoadSize, ChainSizeInBytes, VecTy);		return Impl.getLoadVectorFactor(VF, LoadSize, ChainSizeInBytes, VecTy);
}		}
unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,		unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,
unsigned ChainSizeInBytes,		unsigned ChainSizeInBytes,
VectorType *VecTy) const override {		VectorType *VecTy) const override {
return Impl.getStoreVectorFactor(VF, StoreSize, ChainSizeInBytes, VecTy);		return Impl.getStoreVectorFactor(VF, StoreSize, ChainSizeInBytes, VecTy);
}		}
		unsigned maximumSizeofISAClassVectorRegister(ISAClass Class,
		Type *Ty) const override {
		return Impl.maximumSizeofISAClassVectorRegister(Class, Ty);
		}
		char encodeISAClass(ISAClass IsaClass) const override {
		return Impl.encodeISAClass(IsaClass);
		}
		ISAClass decodeISAClass(char IsaClass) const override {
		return Impl.decodeISAClass(IsaClass);
		}
		Type* promoteToSupportedType(Type *Ty, ISAClass IsaClass) const override {
		return Impl.promoteToSupportedType(Ty, IsaClass);
		}
bool useReductionIntrinsic(unsigned Opcode, Type *Ty,		bool useReductionIntrinsic(unsigned Opcode, Type *Ty,
ReductionFlags Flags) const override {		ReductionFlags Flags) const override {
return Impl.useReductionIntrinsic(Opcode, Ty, Flags);		return Impl.useReductionIntrinsic(Opcode, Ty, Flags);
}		}
bool shouldExpandReduction(const IntrinsicInst *II) const override {		bool shouldExpandReduction(const IntrinsicInst *II) const override {
return Impl.shouldExpandReduction(II);		return Impl.shouldExpandReduction(II);
}		}
int getInstructionLatency(const Instruction *I) override {		int getInstructionLatency(const Instruction *I) override {
▲ Show 20 Lines • Show All 104 Lines • Show Last 20 Lines

include/llvm/Analysis/TargetTransformInfoImpl.h

Show First 20 Lines • Show All 523 Lines • ▼ Show 20 Lines	public:
}		}

unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,		unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,
unsigned ChainSizeInBytes,		unsigned ChainSizeInBytes,
VectorType *VecTy) const {		VectorType *VecTy) const {
return VF;		return VF;
}		}

		unsigned maximumSizeofISAClassVectorRegister(
		TTI::ISAClass I, Type *Ty) const {
		return 0;
		}

		char encodeISAClass(TTI::ISAClass IsaClass) const {
		return '?';
		}

		TTI::ISAClass decodeISAClass(char IsaClass) const {
		return TTI::ISAClassesNum;
		}

		Type* promoteToSupportedType(Type *Ty, TTI::ISAClass IsaClass) const {
		return Ty;
		}

bool useReductionIntrinsic(unsigned Opcode, Type *Ty,		bool useReductionIntrinsic(unsigned Opcode, Type *Ty,
TTI::ReductionFlags Flags) const {		TTI::ReductionFlags Flags) const {
return false;		return false;
}		}

bool shouldExpandReduction(const IntrinsicInst *II) const {		bool shouldExpandReduction(const IntrinsicInst *II) const {
return true;		return true;
}		}
▲ Show 20 Lines • Show All 283 Lines • Show Last 20 Lines

include/llvm/Analysis/VectorVariant.h

Property	Old Value	New Value
svn:eol-style	null	native
svn:keywords	null	Author Date Id Rev URL
svn:mime-type	null	text/plain

				//===---- llvm/Transforms/VectorVariant.h - Vector utilities -- C++ -----===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This header file defines the VectorVariant class and implements the encoding
				/// and decoding utilities for VectorVariant objects. Multiple VectorVariant
				/// objects can be created (masked, non-masked, etc.) and associated with the
				/// original scalar function. These objects are then used to clone new functions
				/// that can be vectorized. This class follows the standards defined in the
				/// vector function ABI.
				///
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TRANSFORMS_UTILS_INTEL_VECTORVARIANT_H
				#define LLVM_TRANSFORMS_UTILS_INTEL_VECTORVARIANT_H

				#include <vector>
				#include <sstream>
				#include <cctype>
				#include "llvm/Analysis/TargetTransformInfo.h"
				#include "llvm/IR/Type.h"
				#include "llvm/IR/DerivedTypes.h"
				#include "llvm/ADT/StringRef.h"
				#include "llvm/Support/CommandLine.h"

				using namespace llvm;

				#define STRIDE_KIND 's'
				#define LINEAR_KIND 'l'
				hfinkelUnsubmitted Not Done Reply Inline Actions These preprocessor should just be constexpr/static const ints. hfinkel: These preprocessor should just be constexpr/static const ints.
				#define UNIFORM_KIND 'u'
				#define VECTOR_KIND 'v'
				fpetrogalliUnsubmitted Not Done Reply Inline Actions We will have to support OpenMP 4.5 linear modifiers, which are rendered as 2-char tokens in the mangled names. Woudn't it better to avoid #defines of chars and instead use enums inside the VectorKind class? fpetrogalli: We will have to support OpenMP 4.5 linear modifiers, which are rendered as 2-char tokens in the…
				mmastenAuthorUnsubmitted Not Done Reply Inline Actions Ok, I can change this to an enum inside the VectorKind class. I don't have a patch ready for the LoopVectorize part, but I will prepare one and put it up for review in the next few days. I also have a patch for clang that enables the "vector-variants" attribute support. mmasten: Ok, I can change this to an enum inside the VectorKind class. I don't have a patch ready for…
				fpetrogalliUnsubmitted Not Done Reply Inline Actions Ok, I can change this to an enum inside the VectorKind class. Thank you. I don't have a patch ready for the LoopVectorize part, but I will prepare one and put it up for review in the next few days. That sounds great, thanks! I also have a patch for clang that enables the "vector-variants" attribute support. Clang is already generating the list of mangled vector names as string attributes. I believe that you need to rearrange the code so that strings get produced in the vector-variants attribute. fpetrogalli: > Ok, I can change this to an enum inside the VectorKind class. Thank you. > I don't have a…
				mmastenAuthorUnsubmitted Not Done Reply Inline Actions Hi Francesco, I have the LoopVectorize part of this done. Are there any objections to making it part of this review? Thanks, Matt mmasten: Hi Francesco, I have the LoopVectorize part of this done. Are there any objections to making…
				fpetrogalliUnsubmitted Not Done Reply Inline Actions Hello Matt, please don't add code to this review. I'd prefer to see the changes related to the Vectorizer in a separate patch. Kind regards, Francesco fpetrogalli: Hello Matt, please don't add code to this review. I'd prefer to see the changes related to the…
				mmastenAuthorUnsubmitted Not Done Reply Inline Actions Thanks Francesco. I made a couple of updates to this patch and created a new patch for the LV side of things. The LV patch is revision D40575. mmasten: Thanks Francesco. I made a couple of updates to this patch and created a new patch for the LV…

				#define NOT_ALIGNED 1
				ABataevUnsubmitted Not Done Reply Inline Actions Description of the class? ABataev: Description of the class?

				#define POSITIVE 1
				#define NEGATIVE -1
				ABataevUnsubmitted Not Done Reply Inline Actions Explicitly set the base type to `char` ABataev: Explicitly set the base type to `char`

				class VectorKind {

				public:
				VectorKind(char K, int S, int A = NOT_ALIGNED) {

				assert((S == notAValue() \|\| K == STRIDE_KIND \|\| K == LINEAR_KIND) &&
				ABataevUnsubmitted Not Done Reply Inline Actions Use member initializers ABataev: Use member initializers
				"only linear vectors have strides");

				assert((K != LINEAR_KIND \|\| S != notAValue()) &&
				"linear vectors must have a stride");

				assert((K != STRIDE_KIND \|\| S != notAValue()) &&
				"variable stride vectors must have a stride");

				assert((K != STRIDE_KIND \|\| S >= 0) &&
				"variable stride position must be non-negative");

				assert(A > 0 && "alignment must be positive");

				Kind = K;
				Stride = S;
				Alignment = A;
				}

				VectorKind(const VectorKind &Other) {
				Kind = Other.Kind;
				Stride = Other.Stride;
				ABataevUnsubmitted Not Done Reply Inline Actions `VectorKind(const VectorKind &Other) = default;` ABataev: `VectorKind(const VectorKind &Other) = default;`
				Alignment = Other.Alignment;
				}

				/// \brief Is the stride for a linear parameter a uniform variable? (i.e.,
				/// the stride is stored in a variable but is uniform)
				bool isVariableStride() { return Kind == STRIDE_KIND; }
				ABataevUnsubmitted Not Done Reply Inline Actions No need to use `\brief` tag, you must remove it ABataev: No need to use `\brief` tag, you must remove it

				/// \brief Is the stride for a linear variable non-unit stride?
				ABataevUnsubmitted Not Done Reply Inline Actions const function ABataev: const function
				bool isNonUnitStride() { return Kind == LINEAR_KIND && Stride != 1; }

				/// \brief Is the stride for a linear variable unit stride?
				ABataevUnsubmitted Not Done Reply Inline Actions const function ABataev: const function
				bool isUnitStride() { return Kind == LINEAR_KIND && Stride == 1; }

				/// \brief Is this a linear parameter?
				ABataevUnsubmitted Not Done Reply Inline Actions const function ABataev: const function
				bool isLinear() {
				return isVariableStride() \|\| isNonUnitStride() \|\| isUnitStride();
				}
				ABataevUnsubmitted Not Done Reply Inline Actions const function ABataev: const function

				/// \brief Is this a uniform parameter?
				bool isUniform() { return Kind == UNIFORM_KIND; }

				/// \brief Is this a vector parameter?
				ABataevUnsubmitted Not Done Reply Inline Actions const function ABataev: const function
				bool isVector() { return Kind == VECTOR_KIND; }

				/// \brief Is the parameter aligned?
				ABataevUnsubmitted Not Done Reply Inline Actions const function ABataev: const function
				bool isAligned() { return Alignment != NOT_ALIGNED; }

				/// \brief Get the stride associated with a linear parameter.
				ABataevUnsubmitted Not Done Reply Inline Actions const function ABataev: const function
				int getStride() { return Stride; }

				/// \brief Get the alignment associated with a linear parameter.
				ABataevUnsubmitted Not Done Reply Inline Actions const function ABataev: const function
				int getAlignment() { return Alignment; }

				/// \brief Represents a don't care value for strides of parameters other
				ABataevUnsubmitted Not Done Reply Inline Actions const function ABataev: const function
				/// than linear parameters.
				static int notAValue() { return -1; }

				/// \brief Encode the parameter information into a mangled string
				/// corresponding to the standards defined in the vector function ABI.
				std::string encode() {
				std::stringstream SST;
				SST << Kind;
				ABataevUnsubmitted Not Done Reply Inline Actions Why not `SmallString`? ABataev: Why not `SmallString`?

				ABataevUnsubmitted Not Done Reply Inline Actions Why not `llvm::raw_svector_ostream`? ABataev: Why not `llvm::raw_svector_ostream`?
				if (isNonUnitStride()) {
				if (Stride >= 0)
				SST << Stride;
				else
				SST << "n" << -Stride;
				}

				if (isVariableStride())
				SST << Stride;

				if (isAligned())
				SST << 'a' << Alignment;

				return SST.str();
				}

				private:
				char Kind; // linear, uniform, vector
				int Stride;
				int Alignment;
				ABataevUnsubmitted Not Done Reply Inline Actions `char`->`ParmKind` ABataev: `char`->`ParmKind`
				};

				class VectorVariant {

				private:
				TargetTransformInfo *TTI;
				TargetTransformInfo::ISAClass Isa;
				hfinkelUnsubmitted Not Done Reply Inline Actions There's a lot of target information here in the target-independent code. Given that we're not going to vectorize without a target code model regardless, I'd like to see this information pushing into TargetTransformInfo. VecClone can then use TTI to convert the particular ISA tags into information about vector lengths, etc. Other architectures that are adapting this scheme can then extend this in a natural way. hfinkel: There's a lot of target information here in the target-independent code. Given that we're not…
				bool Mask;
				unsigned int Vlen;
				std::vector<VectorKind> Parameters;

				static std::string prefix() { return "_ZGV"; }
				ABataevUnsubmitted Not Done Reply Inline Actions Use `SmallVector` or `ArrayRef` ABataev: Use `SmallVector` or `ArrayRef`

				public:
				ABataevUnsubmitted Not Done Reply Inline Actions `std::string`->`StringRef`. Is this target-specific? ABataev: `std::string`->`StringRef`. Is this target-specific?
				VectorVariant(StringRef FuncName, TargetTransformInfo *TTI);

				/// \brief Get the ISA corresponding to this vector variant.
				TargetTransformInfo::ISAClass getISA() { return Isa; }

				/// \brief Is this a masked vector function variant?
				ABataevUnsubmitted Not Done Reply Inline Actions const function ABataev: const function
				bool isMasked() { return Mask; }

				/// \brief Get the vector length of the vector variant.
				ABataevUnsubmitted Not Done Reply Inline Actions const function ABataev: const function
				unsigned int getVlen() { return Vlen; }

				/// \brief Get the parameters of the vector variant.
				ABataevUnsubmitted Not Done Reply Inline Actions const function ABataev: const function
				std::vector<VectorKind> &getParameters() { return Parameters; }

				/// \brief Build the mangled name for the vector variant. This function
				ABataevUnsubmitted Not Done Reply Inline Actions const function ABataev: const function
				/// builds a mangled name by including the encodings for the ISA class,
				/// mask information, and all parameters.
				std::string encode() {

				std::stringstream SST;
				ABataevUnsubmitted Not Done Reply Inline Actions `SmallString` ABataev: `SmallString`
				ABataevUnsubmitted Not Done Reply Inline Actions const function ABataev: const function
				SST << prefix() << TTI->encodeISAClass(Isa) << encodeMask(Mask) << Vlen;

				std::vector<VectorKind>::iterator It = Parameters.begin();
				std::vector<VectorKind>::iterator End = Parameters.end();

				if (isMasked())
				End--; // mask parameter is not encoded

				for (; It != End; ++It)
				SST << (*It).encode();

				SST << "_";

				return SST.str();
				}

				/// \brief Generate a function name corresponding to a vector variant.
				std::string generateFunctionName(StringRef ScalarFuncName) {

				static StringRef ManglingPrefix("_Z");
				ABataevUnsubmitted Not Done Reply Inline Actions const function ABataev: const function
				std::string Name = encode();

				if (ScalarFuncName.startswith(ManglingPrefix))
				return Name + ScalarFuncName.drop_front(ManglingPrefix.size()).str();
				else
				return Name + ScalarFuncName.str();
				}
				ABataevUnsubmitted Not Done Reply Inline Actions const function ABataev: const function

				/// \brief Some targets do not support particular types, so promote to a type
				/// that is supported.
				Type promoteToSupportedType(Type Ty) {
				return TTI->promoteToSupportedType(Ty, getISA());
				}

				/// \brief Check to see if this is a vector variant based on the function
				/// name.
				static bool isVectorVariant(StringRef FuncName) {
				return FuncName.startswith(prefix());
				}

				ABataevUnsubmitted Not Done Reply Inline Actions Target-specific? ABataev: Target-specific?
				/// \brief Encode the mask information for the mangled variant name.
				static char encodeMask(bool EncodeMask) {

				return EncodeMask ? 'M' : 'N';
				}

				ABataevUnsubmitted Not Done Reply Inline Actions Target-specific? ABataev: Target-specific?
				/// \brief Decode the mask information from the mangled variant name.
				static bool decodeMask(char MaskToDecode) {

				switch (MaskToDecode) {
				case 'M':
				return true;
				case 'N':
				return false;
				}

				llvm_unreachable("unsupported mask");
				}

				/// \brief Calculate the vector length for the vector variant.
				unsigned calcVlen(TargetTransformInfo::ISAClass I, Type *Ty);
				};

				#endif // LLVM_TRANSFORMS_UTILS_INTEL_VECTORVARIANT_H

include/llvm/InitializePasses.h

	Show First 20 Lines • Show All 371 Lines • ▼ Show 20 Lines
	void initializeVerifierLegacyPassPass(PassRegistry&);			void initializeVerifierLegacyPassPass(PassRegistry&);
	void initializeVirtRegMapPass(PassRegistry&);			void initializeVirtRegMapPass(PassRegistry&);
	void initializeVirtRegRewriterPass(PassRegistry&);			void initializeVirtRegRewriterPass(PassRegistry&);
	void initializeWholeProgramDevirtPass(PassRegistry&);			void initializeWholeProgramDevirtPass(PassRegistry&);
	void initializeWinEHPreparePass(PassRegistry&);			void initializeWinEHPreparePass(PassRegistry&);
	void initializeWriteBitcodePassPass(PassRegistry&);			void initializeWriteBitcodePassPass(PassRegistry&);
	void initializeWriteThinLTOBitcodePass(PassRegistry&);			void initializeWriteThinLTOBitcodePass(PassRegistry&);
	void initializeXRayInstrumentationPass(PassRegistry&);			void initializeXRayInstrumentationPass(PassRegistry&);
				void initializeVecClonePass(PassRegistry&);

	} // end namespace llvm			} // end namespace llvm

	#endif // LLVM_INITIALIZEPASSES_H			#endif // LLVM_INITIALIZEPASSES_H

include/llvm/LinkAllPasses.h

Show First 20 Lines • Show All 42 Lines • ▼ Show 20 Lines
#include "llvm/Transforms/IPO/AlwaysInliner.h"		#include "llvm/Transforms/IPO/AlwaysInliner.h"
#include "llvm/Transforms/IPO/FunctionAttrs.h"		#include "llvm/Transforms/IPO/FunctionAttrs.h"
#include "llvm/Transforms/Instrumentation.h"		#include "llvm/Transforms/Instrumentation.h"
#include "llvm/Transforms/ObjCARC.h"		#include "llvm/Transforms/ObjCARC.h"
#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Scalar/GVN.h"		#include "llvm/Transforms/Scalar/GVN.h"
#include "llvm/Transforms/Utils/SymbolRewriter.h"		#include "llvm/Transforms/Utils/SymbolRewriter.h"
#include "llvm/Transforms/Utils/UnifyFunctionExitNodes.h"		#include "llvm/Transforms/Utils/UnifyFunctionExitNodes.h"
		#include "llvm/Transforms/Utils/VecClone.h"
#include "llvm/Transforms/Vectorize.h"		#include "llvm/Transforms/Vectorize.h"
#include <cstdlib>		#include <cstdlib>

namespace {		namespace {
struct ForcePassLinking {		struct ForcePassLinking {
ForcePassLinking() {		ForcePassLinking() {
// We must reference the passes in such a way that compilers will not		// We must reference the passes in such a way that compilers will not
// delete it all as dead code, even with whole program optimization,		// delete it all as dead code, even with whole program optimization,
▲ Show 20 Lines • Show All 143 Lines • ▼ Show 20 Lines	ForcePassLinking() {
(void) llvm::createSpeculativeExecutionPass();		(void) llvm::createSpeculativeExecutionPass();
(void) llvm::createSpeculativeExecutionIfHasBranchDivergencePass();		(void) llvm::createSpeculativeExecutionIfHasBranchDivergencePass();
(void) llvm::createRewriteSymbolsPass();		(void) llvm::createRewriteSymbolsPass();
(void) llvm::createStraightLineStrengthReducePass();		(void) llvm::createStraightLineStrengthReducePass();
(void) llvm::createMemDerefPrinter();		(void) llvm::createMemDerefPrinter();
(void) llvm::createFloat2IntPass();		(void) llvm::createFloat2IntPass();
(void) llvm::createEliminateAvailableExternallyPass();		(void) llvm::createEliminateAvailableExternallyPass();
(void) llvm::createScalarizeMaskedMemIntrinPass();		(void) llvm::createScalarizeMaskedMemIntrinPass();
		(void) llvm::createVecClonePass();

(void)new llvm::IntervalPartition();		(void)new llvm::IntervalPartition();
(void)new llvm::ScalarEvolutionWrapperPass();		(void)new llvm::ScalarEvolutionWrapperPass();
llvm::Function::Create(nullptr, llvm::GlobalValue::ExternalLinkage)->viewCFGOnly();		llvm::Function::Create(nullptr, llvm::GlobalValue::ExternalLinkage)->viewCFGOnly();
llvm::RGPassManager RGM;		llvm::RGPassManager RGM;
llvm::TargetLibraryInfoImpl TLII;		llvm::TargetLibraryInfoImpl TLII;
llvm::TargetLibraryInfo TLI(TLII);		llvm::TargetLibraryInfo TLI(TLII);
llvm::AliasAnalysis AA(TLI);		llvm::AliasAnalysis AA(TLI);
Show All 9 Lines

include/llvm/Transforms/Utils/VecClone.h

Property	Old Value	New Value
svn:eol-style	null	native
svn:keywords	null	Author Date Id Rev URL
svn:mime-type	null	text/plain

				//===-------------- VecClone.h - Class definition -- C++ ----------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				// ===--------------------------------------------------------------------=== //
				///
				/// \file
				/// This file defines the VecClone pass class.
				///
				mehdi_aminiUnsubmitted Not Done Reply Inline Actions Why does this need to be in a public header? mehdi_amini: Why does this need to be in a public header?
				mmastenAuthorUnsubmitted Not Done Reply Inline Actions Do you mean that the class definition should be moved to VecClone.cpp? mmasten: Do you mean that the class definition should be moved to VecClone.cpp?
				// ===--------------------------------------------------------------------=== //

				#include "llvm/ADT/SmallSet.h"
				#include "llvm/IR/Instructions.h"
				#include "llvm/Pass.h"
				#include "llvm/IR/Module.h"
				#include "llvm/IR/Function.h"
				#include "llvm/Analysis/VectorVariant.h"

				#ifndef LLVM_TRANSFORMS_VPO_VECCLONE_H
				#define LLVM_TRANSFORMS_VPO_VECCLONE_H

				enum InstType {
				ALLOCA = 0,
				STORE,
				hfinkelUnsubmitted Not Done Reply Inline Actions Don't use all caps for these names. IT_Alloca or Alloca could work. hfinkel: Don't use all caps for these names. IT_Alloca or Alloca could work.
				BITCAST
				};

				namespace llvm {

				ABataevUnsubmitted Not Done Reply Inline Actions SmallVector ABataev: SmallVector
				class ModulePass;

				/// \brief Represents the mapping of a vector parameter to its corresponding
				ABataevUnsubmitted Not Done Reply Inline Actions DenseMap ABataev: DenseMap
				/// vector to scalar type cast instruction. This done so that the scalar loop
				/// inserted by this pass contains instructions that are in scalar form so that
				/// the loop can later be vectorized.
				struct ParmRef {
				// Represents the parameter in one of two forms:
				// 1) A vector alloca instruction if the parameter has not been registerized.
				// 2) The parameter as the Value* passed in via the function call.
				ABataevUnsubmitted Not Done Reply Inline Actions const function ABataev: const function
				Value *VectorParm;

				// Represents the vector parameter cast from a vector type to scalar type.
				Instruction *VectorParmCast;
				};

				class VecClone : public ModulePass {

				private:

				ABataevUnsubmitted Not Done Reply Inline Actions const function ABataev: const function
				/// \brief Contains the names of the declared vector function variants
				typedef std::vector<std::string> DeclaredVariants;

				/// \brief Contains a mapping of a function to its vector function variants
				typedef std::map<Function*, DeclaredVariants> FunctionVariants;

				/// \brief Determine the characteristic type of the vector function as
				/// specified according to the vector function ABI.
				Type* calcCharacteristicType(Function& F, VectorVariant& Variant);

				/// \brief Get all functions marked for vectorization in module and their
				/// list of variants.
				void getFunctionsToVectorize(
				Module &M, std::map<Function*, std::vector<StringRef> > &FuncVars);

				/// \brief Returns a floating point or integer constant depending on Ty.
				template <typename T>
				Constant* getConstantValue(Type *Ty, LLVMContext &Context, T Val);

				/// \brief Return true if the function has a complex type for the return
				/// or parameters.
				hfinkelUnsubmitted Not Done Reply Inline Actions Please use consistent terminology: marked as SIMD -> requires a vector variant. hfinkel: Please use consistent terminology: marked as SIMD -> requires a vector variant.
				bool hasComplexType(Function *F);

				/// \brief Make a copy of the function if it is marked as SIMD.
				Function* CloneFunction(Function &F, VectorVariant &V);

				/// \brief Take the entry basic block for the function as split off a second
				/// basic block that will form the loop entry.
				BasicBlock* splitEntryIntoLoop(Function *Clone, VectorVariant &V,
				BasicBlock *EntryBlock);

				/// \brief Take the loop entry basic block and split off a second basic
				/// block into a new return basic block.
				BasicBlock* splitLoopIntoReturn(Function Clone, BasicBlock LoopBlock);

				/// \brief Generate a basic block to test the loop exit condition.
				BasicBlock* createLoopExit(Function Clone, BasicBlock ReturnBlock);

				/// \brief Update the predecessors of the return basic block.
				void updateReturnPredecessors(Function Clone, BasicBlock LoopExitBlock,
				BasicBlock *ReturnBlock);

				/// \brief Create the backedge from the loop exit basic block to the loop
				/// entry block.
				PHINode* createPhiAndBackedgeForLoop(Function *Clone,
				BasicBlock *EntryBlock,
				BasicBlock *LoopBlock,
				BasicBlock *LoopExitBlock,
				BasicBlock *ReturnBlock,
				int VL);

				/// \brief Generate vector alloca instructions for vector parameters and
				/// change the parameter types to vector types. Expand the return value of
				/// the function to a vector type. This function returns the instruction
				/// corresponding to the expanded return and the instruction corresponding
				/// to the mask.
				Instruction* expandVectorParametersAndReturn(
				Function *Clone,
				VectorVariant &V,
				Instruction **Mask,
				BasicBlock *EntryBlock,
				BasicBlock *LoopBlock,
				BasicBlock *ReturnBlock,
				std::vector<ParmRef*>& ParmMap);

				/// \brief Expand the function parameters to vector types. This function
				/// returns the instruction corresponding to the mask.
				Instruction* expandVectorParameters(
				Function *Clone,
				VectorVariant &V,
				BasicBlock *EntryBlock,
				std::vector<ParmRef*>& ParmMap);

				/// \brief Expand the function's return value to a vector type.
				Instruction* expandReturn(Function Clone, BasicBlock EntryBlock,
				BasicBlock LoopBlock, BasicBlock ReturnBlock,
				std::vector<ParmRef*>& ParmMap);

				/// \brief Update the old parameter references to with the new vector
				/// references.
				void updateScalarMemRefsWithVector(
				Function *Clone,
				Function &F,
				BasicBlock *EntryBlock,
				BasicBlock *ReturnBlock,
				PHINode *Phi,
				std::vector<ParmRef*>& ParmMap);

				/// \brief Update the values of linear parameters by adding the stride
				/// before the use.
				void updateLinearReferences(Function *Clone, Function &F,
				VectorVariant &V, PHINode *Phi);

				/// \brief Update the instructions in the return basic block to return a
				/// vector temp.
				void updateReturnBlockInstructions(Function Clone, BasicBlock ReturnBlock,
				Instruction *VecReturnAlloca);

				/// \brief Create a separate basic block to mark the begin and end of the
				/// SIMD loop formed from the vector function. Essentially, this function
				/// transfers the information from the SIMD function keywords and creates
				/// new loop pragmas so that parameter information can be transferred to
				/// the loop.
				void insertDirectiveIntrinsics(Module& M, Function *Clone, Function &F,
				VectorVariant &V,
				BasicBlock *EntryBlock,
				BasicBlock *LoopExitBlock,
				BasicBlock *ReturnBlock);

				/// \brief Create the basic block indicating the begin of the SIMD loop.
				void insertBeginRegion(Module& M, Function *Clone, Function &F,
				VectorVariant &V, BasicBlock *EntryBlock);

				/// \brief Create the basic block indicating the end of the SIMD loop.
				void insertEndRegion(Module& M, Function Clone, BasicBlock LoopExitBlock,
				BasicBlock *ReturnBlock);

				/// \brief Create a new vector alloca instruction for the return vector and
				/// bitcast to the appropriate element type.
				Instruction* createExpandedReturn(Function F, BasicBlock BB,
				VectorType *ReturnType);

				/// \brief Return the position of the parameter in the function's parameter
				/// list.
				int getParmIndexInFunction(Function F, Value Parm);

				/// \brief Check to see if the function is simple enough that a loop does
				/// not need to be inserted into the function.
				bool isSimpleFunction(Function *Clone, VectorVariant &V,
				ReturnInst *Return);

				/// \brief Inserts the if/else split and mask condition for masked SIMD
				/// functions.
				void insertSplitForMaskedVariant(Function Clone, BasicBlock LoopBlock,
				BasicBlock *LoopExitBlock,
				Instruction Mask, PHINode Phi);

				/// \brief Utility function to insert instructions with other instructions
				/// of the same kind.
				void insertInstruction(Instruction Inst, BasicBlock BB);

				/// \brief Utility function that generates instructions that calculate the
				/// stride for a linear parameter.
				Instruction* generateStrideForParameter(Function Clone, Argument Arg,
				Instruction *ParmUser, int Stride,
				PHINode *Phi);

				/// \brief Utility function that returns true if Inst is a store of a vector
				/// or linear parameter.
				bool isVectorOrLinearParamStore(Function *Clone,
				std::vector<VectorKind> &ParmKinds,
				Instruction *Inst);

				/// \brief Removes the original scalar alloca instructions that correspond
				/// to a vector parameter before widening.
				void removeScalarAllocasForVectorParams(
				std::vector<ParmRef*> &VectorParmMap);

				/// \brief Adds metadata to the conditional branch of the simd loop latch to
				/// prevent loop unrolling.
				void disableLoopUnrolling(BasicBlock *Latch);

				/// \brief Check to see that the type of the gep used for a load instruction
				/// is compatible with the type needed as the result of the load. Basically,
				/// check the validity of the LLVM IR to make sure that proper pointer
				/// dereferencing is done.
				bool typesAreCompatibleForLoad(Type GepType, Type LoadType);

				bool runOnModule(Module &M) override;

				public:

				static char ID;
				VecClone();
				void print(raw_ostream &OS, const Module * = nullptr) const override;
				void getAnalysisUsage(AnalysisUsage &AU) const override;

				}; // end pass class

				ModulePass *createVecClonePass();

				} // end llvm namespace

				#endif // LLVM_TRANSFORMS_VPO_VECCLONE_H

lib/Analysis/CMakeLists.txt

Show First 20 Lines • Show All 78 Lines • ▼ Show 20 Lines	add_llvm_library(LLVMAnalysis
Trace.cpp		Trace.cpp
TypeBasedAliasAnalysis.cpp		TypeBasedAliasAnalysis.cpp
TypeMetadataUtils.cpp		TypeMetadataUtils.cpp
ScopedNoAliasAA.cpp		ScopedNoAliasAA.cpp
ValueLattice.cpp		ValueLattice.cpp
ValueLatticeUtils.cpp		ValueLatticeUtils.cpp
ValueTracking.cpp		ValueTracking.cpp
VectorUtils.cpp		VectorUtils.cpp
		VectorVariant.cpp

ADDITIONAL_HEADER_DIRS		ADDITIONAL_HEADER_DIRS
${LLVM_MAIN_INCLUDE_DIR}/llvm/Analysis		${LLVM_MAIN_INCLUDE_DIR}/llvm/Analysis

DEPENDS		DEPENDS
intrinsics_gen		intrinsics_gen
)		)

lib/Analysis/TargetTransformInfo.cpp

	Show First 20 Lines • Show All 578 Lines • ▼ Show 20 Lines

	unsigned TargetTransformInfo::getStoreVectorFactor(unsigned VF,			unsigned TargetTransformInfo::getStoreVectorFactor(unsigned VF,
	unsigned StoreSize,			unsigned StoreSize,
	unsigned ChainSizeInBytes,			unsigned ChainSizeInBytes,
	VectorType *VecTy) const {			VectorType *VecTy) const {
	return TTIImpl->getStoreVectorFactor(VF, StoreSize, ChainSizeInBytes, VecTy);			return TTIImpl->getStoreVectorFactor(VF, StoreSize, ChainSizeInBytes, VecTy);
	}			}

				unsigned TargetTransformInfo::maximumSizeofISAClassVectorRegister(
				ISAClass I, Type *Ty) const {

				return TTIImpl->maximumSizeofISAClassVectorRegister(I, Ty);
				}

				char TargetTransformInfo::encodeISAClass(ISAClass IsaClass) const {
				return TTIImpl->encodeISAClass(IsaClass);
				}

				TargetTransformInfo::ISAClass TargetTransformInfo::decodeISAClass(
				char IsaClass) const {
				return TTIImpl->decodeISAClass(IsaClass);
				}

				Type* TargetTransformInfo::promoteToSupportedType(Type *Ty,
				ISAClass IsaClass) const {
				return TTIImpl->promoteToSupportedType(Ty, IsaClass);
				}

	bool TargetTransformInfo::useReductionIntrinsic(unsigned Opcode,			bool TargetTransformInfo::useReductionIntrinsic(unsigned Opcode,
	Type *Ty, ReductionFlags Flags) const {			Type *Ty, ReductionFlags Flags) const {
	return TTIImpl->useReductionIntrinsic(Opcode, Ty, Flags);			return TTIImpl->useReductionIntrinsic(Opcode, Ty, Flags);
	}			}

	bool TargetTransformInfo::shouldExpandReduction(const IntrinsicInst *II) const {			bool TargetTransformInfo::shouldExpandReduction(const IntrinsicInst *II) const {
	return TTIImpl->shouldExpandReduction(II);			return TTIImpl->shouldExpandReduction(II);
	}			}
	▲ Show 20 Lines • Show All 601 Lines • Show Last 20 Lines

lib/Analysis/VectorVariant.cpp

Property	Old Value	New Value
svn:eol-style	null	native
svn:keywords	null	Author Date Id Rev URL
svn:mime-type	null	text/plain

				//===---------- VectorVariant.cpp - Vector function ABI -- C++ ----------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This file implements the VectorVariant class and corresponding utilities.
				/// VectorVariant objects are associated with a scalar function and are used
				/// to generate new functions that can be vectorized. VectorVariants are
				/// determined by inspecting the function attributes associated with the scalar
				/// function. When a mangled function name is found in the attributes (indicated
				/// as "_ZGV"), a VectorVariant object is created. The class and utilities
				/// in this file follow the standards defined in the vector function ABI.
				///
				//===----------------------------------------------------------------------===//

				#include "llvm/Analysis/VectorVariant.h"
				#include "llvm/IR/Type.h"
				#include "llvm/ADT/StringRef.h"
				#include "llvm/Support/raw_ostream.h"

				using namespace llvm;

				/// \brief Generate a vector variant by decoding the mangled string for the
				/// variant contained in the original scalar function's attributes. For
				/// example: "_ZGVxN4". The name mangling is defined in the vector function
				/// ABI. Based on this string, the parameter kinds (uniform, linear, vector),
				/// vector length, parameter alignment, and masking are determined.
				VectorVariant::VectorVariant(StringRef FuncName,
				TargetTransformInfo *TTI) : TTI(TTI) {

				assert(isVectorVariant(FuncName) && "invalid vector variant format");

				std::stringstream SST(FuncName.drop_front(prefix().size()));

				// mandatory annotations
				char EncodedISA;
				SST.get(EncodedISA);
				Isa = TTI->decodeISAClass(EncodedISA);

				char EncodedMask;
				SST.get(EncodedMask);
				Mask = decodeMask(EncodedMask);
				SST >> Vlen;

				// optional parameter annotations
				while (SST.peek() != '_') {

				char Kind;
				int Stride = VectorKind::notAValue();
				int StrideSign = POSITIVE;
				int Alignment = NOT_ALIGNED;

				// Get parameter kind
				SST.get(Kind);

				// Default stride for linear is 1. If the stride for a parameter is 1,
				// then the front-end will not encode it and we will not have set the
				// correct stride below.
				if (Kind == LINEAR_KIND)
				Stride = 1;

				// Handle optional stride
				if (SST.peek() == 'n') {
				// Stride is negative
				SST.ignore(1);
				StrideSign = NEGATIVE;
				}

				if (std::isdigit(SST.peek())) {
				// Extract constant stride
				SST >> Stride;
				assert((Kind != STRIDE_KIND \|\| Stride >= 0) &&
				"variable stride argument index cannot be negative");
				}

				Stride *= StrideSign;
				// Handle optional alignment
				if (SST.peek() == 'a') {
				SST.ignore(1);
				SST >> Alignment;
				}

				VectorKind VecKind(Kind, Stride, Alignment);
				Parameters.push_back(VecKind);
				}

				if (Mask) {
				// Masked variants will have an additional mask parameter
				VectorKind VecKind(VECTOR_KIND, VectorKind::notAValue());
				Parameters.push_back(VecKind);
				}
				}

				/// \brief Determine the vector variant's vector length based on the
				/// characteristic data type defined in the vector function ABI and target
				/// vector register width.
				unsigned int VectorVariant::calcVlen(TargetTransformInfo::ISAClass I,
				Type* CharacteristicDataType) {
				assert(CharacteristicDataType &&
				CharacteristicDataType->getPrimitiveSizeInBits() != 0 &&
				"expected characteristic data type to have a primitive size in bits");

				unsigned int VectorRegisterSize =
				TTI->maximumSizeofISAClassVectorRegister(I, CharacteristicDataType);

				return VectorRegisterSize / CharacteristicDataType->getPrimitiveSizeInBits();
				}

lib/Target/X86/X86TargetTransformInfo.h

Show First 20 Lines • Show All 123 Lines • ▼ Show 20 Lines	public:
bool isLegalMaskedStore(Type *DataType);		bool isLegalMaskedStore(Type *DataType);
bool isLegalMaskedGather(Type *DataType);		bool isLegalMaskedGather(Type *DataType);
bool isLegalMaskedScatter(Type *DataType);		bool isLegalMaskedScatter(Type *DataType);
bool hasDivRemOp(Type *DataType, bool IsSigned);		bool hasDivRemOp(Type *DataType, bool IsSigned);
bool areInlineCompatible(const Function *Caller,		bool areInlineCompatible(const Function *Caller,
const Function *Callee) const;		const Function *Callee) const;
bool enableMemCmpExpansion(unsigned &MaxLoadSize);		bool enableMemCmpExpansion(unsigned &MaxLoadSize);
bool enableInterleavedAccessVectorization();		bool enableInterleavedAccessVectorization();

		unsigned maximumSizeofISAClassVectorRegister(TTI::ISAClass IsaClass,
		Type *Ty) const;
		char encodeISAClass(TTI::ISAClass IsaClass) const;
		TargetTransformInfo::ISAClass decodeISAClass(char IsaClass) const;
		Type* promoteToSupportedType(Type *Ty, TTI::ISAClass IsaClass) const;
private:		private:
int getGSScalarCost(unsigned Opcode, Type *DataTy, bool VariableMask,		int getGSScalarCost(unsigned Opcode, Type *DataTy, bool VariableMask,
unsigned Alignment, unsigned AddressSpace);		unsigned Alignment, unsigned AddressSpace);
int getGSVectorCost(unsigned Opcode, Type DataTy, Value Ptr,		int getGSVectorCost(unsigned Opcode, Type DataTy, Value Ptr,
unsigned Alignment, unsigned AddressSpace);		unsigned Alignment, unsigned AddressSpace);

/// @}		/// @}
};		};

} // end namespace llvm		} // end namespace llvm

#endif		#endif

lib/Target/X86/X86TargetTransformInfo.cpp

Show First 20 Lines • Show All 2,802 Lines • ▼ Show 20 Lines	return getInterleavedMemoryOpCostAVX512(Opcode, VecTy, Factor, Indices,
Alignment, AddressSpace);		Alignment, AddressSpace);
if (ST->hasAVX2())		if (ST->hasAVX2())
return getInterleavedMemoryOpCostAVX2(Opcode, VecTy, Factor, Indices,		return getInterleavedMemoryOpCostAVX2(Opcode, VecTy, Factor, Indices,
Alignment, AddressSpace);		Alignment, AddressSpace);

return BaseT::getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,		return BaseT::getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,
Alignment, AddressSpace);		Alignment, AddressSpace);
}		}

		unsigned X86TTIImpl::maximumSizeofISAClassVectorRegister(TTI::ISAClass I,
		Type *Ty) const {

		assert((Ty->isIntegerTy() \|\| Ty->isFloatTy() \|\| Ty->isDoubleTy() \|\|
		Ty->isPointerTy()) &&
		"unsupported type");

		unsigned int VectorRegisterSize = 0;

		switch (I) {
		case TTI::XMM:
		VectorRegisterSize = 128;
		break;
		case TTI::YMM1:
		if (Ty->isIntegerTy() \|\| Ty->isPointerTy())
		VectorRegisterSize = 128;
		else
		VectorRegisterSize = 256;
		break;
		case TTI::YMM2:
		if (Ty->isIntegerTy(8))
		VectorRegisterSize = 128;
		else
		VectorRegisterSize = 256;
		break;
		case TTI::ZMM:
		VectorRegisterSize = 512;
		break;
		default:
		llvm_unreachable("unknown isa class");
		return 0;
		}

		assert(VectorRegisterSize != 0 && "unsupported ISA/type combination");
		return VectorRegisterSize;
		}

		char X86TTIImpl::encodeISAClass(TTI::ISAClass IsaClass) const {
		switch (IsaClass) {
		fpetrogalliUnsubmitted Not Done Reply Inline Actions Section 2.6 "Vector Function Name Mangling" of [1] maps the IsaClass to other values, "x', 'y'. Are you basing this on [2] for gcc compatibility? Why do [1] and [2] differ? Francesco [1] https://software.intel.com/sites/default/files/managed/b4/c8/Intel-Vector-Function-ABI.pdf [2] https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt fpetrogalli: Section 2.6 "Vector Function Name Mangling" of [1] maps the IsaClass to other values, "x', 'y'.
		mmastenAuthorUnsubmitted Not Done Reply Inline Actions Currently, the implementation is based on gcc compatibility. However, it would be nice to extend to support both gcc and icc. The IsaClass values are different because of the calling convention differences. Intel icc uses regcall calling convention and gcc uses standard calling convention. I've added references to both ABI documents. mmasten: Currently, the implementation is based on gcc compatibility. However, it would be nice to…
		case TTI::XMM:
		return 'b';
		case TTI::YMM1:
		return 'c';
		case TTI::YMM2:
		return 'd';
		case TTI::ZMM:
		return 'e';
		default:
		break;
		}

		assert(false && "unsupported ISA class");
		return '?';
		}

		TargetTransformInfo::ISAClass X86TTIImpl::decodeISAClass(char IsaClass) const {
		switch (IsaClass) {
		case 'b':
		return TTI::XMM;
		case 'c':
		return TTI::YMM1;
		case 'd':
		return TTI::YMM2;
		case 'e':
		return TTI::ZMM;
		default:
		llvm_unreachable("unsupported ISA class");
		return TTI::XMM;
		}
		}

		Type* X86TTIImpl::promoteToSupportedType(Type *Ty, TTI::ISAClass I) const {
		// On ZMM promote char and short to int
		if (I == TargetTransformInfo::ISAClass::ZMM && (Ty->isIntegerTy(8) \|\|
		Ty->isIntegerTy(16)))
		return Type::getInt32Ty(Ty->getContext());

		return Ty;
		}

lib/Transforms/IPO/PassManagerBuilder.cpp

Show All 26 Lines
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/LegacyPassManager.h"		#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/ModuleSummaryIndex.h"		#include "llvm/IR/ModuleSummaryIndex.h"
#include "llvm/IR/Verifier.h"		#include "llvm/IR/Verifier.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/ManagedStatic.h"		#include "llvm/Support/ManagedStatic.h"
#include "llvm/Target/TargetMachine.h"		#include "llvm/Target/TargetMachine.h"
#include "llvm/Transforms/IPO.h"		#include "llvm/Transforms/IPO.h"
#include "llvm/Transforms/IPO/ForceFunctionAttrs.h"		#include "llvm/Transforms/IPO/ForceFunctionAttrs.h"
		hfinkelUnsubmitted Not Done Reply Inline Actions You also need to add the pass to the new pass manager: lib/Passes/PassBuilder.cpp hfinkel: You also need to add the pass to the new pass manager: lib/Passes/PassBuilder.cpp
#include "llvm/Transforms/IPO/FunctionAttrs.h"		#include "llvm/Transforms/IPO/FunctionAttrs.h"
#include "llvm/Transforms/IPO/InferFunctionAttrs.h"		#include "llvm/Transforms/IPO/InferFunctionAttrs.h"
#include "llvm/Transforms/Instrumentation.h"		#include "llvm/Transforms/Instrumentation.h"
#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Scalar/GVN.h"		#include "llvm/Transforms/Scalar/GVN.h"
#include "llvm/Transforms/Scalar/SimpleLoopUnswitch.h"		#include "llvm/Transforms/Scalar/SimpleLoopUnswitch.h"
#include "llvm/Transforms/Vectorize.h"		#include "llvm/Transforms/Vectorize.h"
		#include "llvm/Transforms/Utils/VecClone.h"

using namespace llvm;		using namespace llvm;

static cl::opt<bool>		static cl::opt<bool>
RunPartialInlining("enable-partial-inlining", cl::init(false), cl::Hidden,		RunPartialInlining("enable-partial-inlining", cl::init(false), cl::Hidden,
cl::ZeroOrMore, cl::desc("Run Partial inlinining pass"));		cl::ZeroOrMore, cl::desc("Run Partial inlinining pass"));

static cl::opt<bool>		static cl::opt<bool>
Show All 38 Lines	UseCFLAA("use-cfl-aa", cl::init(CFLAAType::None), cl::Hidden,
"Enable inclusion-based CFL-AA"),		"Enable inclusion-based CFL-AA"),
clEnumValN(CFLAAType::Both, "both",		clEnumValN(CFLAAType::Both, "both",
"Enable both variants of CFL-AA")));		"Enable both variants of CFL-AA")));

static cl::opt<bool> EnableLoopInterchange(		static cl::opt<bool> EnableLoopInterchange(
"enable-loopinterchange", cl::init(false), cl::Hidden,		"enable-loopinterchange", cl::init(false), cl::Hidden,
cl::desc("Enable the new, experimental LoopInterchange Pass"));		cl::desc("Enable the new, experimental LoopInterchange Pass"));

		static cl::opt<bool> RunVecClone("enable-vec-clone", cl::init(false), cl::Hidden,
		hfinkelUnsubmitted Not Done Reply Inline Actions I wouldn't bother with this. It should be safe if no functions with the vector-variants attribute are present (and, if they are, then you need to run the pass of things might not link). hfinkel: I wouldn't bother with this. It should be safe if no functions with the vector-variants…
		cl::desc("Run Vector Function Cloning"));

static cl::opt<bool>		static cl::opt<bool>
EnablePrepareForThinLTO("prepare-for-thinlto", cl::init(false), cl::Hidden,		EnablePrepareForThinLTO("prepare-for-thinlto", cl::init(false), cl::Hidden,
cl::desc("Enable preparation for ThinLTO."));		cl::desc("Enable preparation for ThinLTO."));

static cl::opt<bool> RunPGOInstrGen(		static cl::opt<bool> RunPGOInstrGen(
"profile-generate", cl::init(false), cl::Hidden,		"profile-generate", cl::init(false), cl::Hidden,
cl::desc("Enable PGO instrumentation."));		cl::desc("Enable PGO instrumentation."));

▲ Show 20 Lines • Show All 316 Lines • ▼ Show 20 Lines	if (OptLevel == 0) {
addExtensionsToPM(EP_EnabledOnOptLevel0, MPM);		addExtensionsToPM(EP_EnabledOnOptLevel0, MPM);

// Rename anon globals to be able to export them in the summary.		// Rename anon globals to be able to export them in the summary.
// This has to be done after we add the extensions to the pass manager		// This has to be done after we add the extensions to the pass manager
// as there could be passes (e.g. Adddress sanitizer) which introduce		// as there could be passes (e.g. Adddress sanitizer) which introduce
// new unnamed globals.		// new unnamed globals.
if (PrepareForThinLTO)		if (PrepareForThinLTO)
MPM.add(createNameAnonGlobalPass());		MPM.add(createNameAnonGlobalPass());

		if (RunVecClone)
		MPM.add(createVecClonePass());

return;		return;
}		}

// Add LibraryInfo if we have some.		// Add LibraryInfo if we have some.
if (LibraryInfo)		if (LibraryInfo)
MPM.add(new TargetLibraryInfoWrapperPass(*LibraryInfo));		MPM.add(new TargetLibraryInfoWrapperPass(*LibraryInfo));

addInitialAliasAnalysisPasses(MPM);		addInitialAliasAnalysisPasses(MPM);
▲ Show 20 Lines • Show All 146 Lines • ▼ Show 20 Lines	void PassManagerBuilder::populateModulePassManager(
MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1));		MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1));

// Distribute loops to allow partial vectorization. I.e. isolate dependences		// Distribute loops to allow partial vectorization. I.e. isolate dependences
// into separate loop that would otherwise inhibit vectorization. This is		// into separate loop that would otherwise inhibit vectorization. This is
// currently only performed for loops marked with the metadata		// currently only performed for loops marked with the metadata
// llvm.loop.distribute=true or when -enable-loop-distribute is specified.		// llvm.loop.distribute=true or when -enable-loop-distribute is specified.
MPM.add(createLoopDistributePass());		MPM.add(createLoopDistributePass());

		if (RunVecClone)
		MPM.add(createVecClonePass());

MPM.add(createLoopVectorizePass(DisableUnrollLoops, LoopVectorize));		MPM.add(createLoopVectorizePass(DisableUnrollLoops, LoopVectorize));

// Eliminate loads by forwarding stores from the previous iteration to loads		// Eliminate loads by forwarding stores from the previous iteration to loads
// of the current iteration.		// of the current iteration.
MPM.add(createLoopLoadEliminationPass());		MPM.add(createLoopLoadEliminationPass());

// FIXME: Because of #pragma vectorize enable, the passes below are always		// FIXME: Because of #pragma vectorize enable, the passes below are always
// inserted in the pipeline, even when the vectorizer doesn't run (ex. when		// inserted in the pipeline, even when the vectorizer doesn't run (ex. when
▲ Show 20 Lines • Show All 390 Lines • Show Last 20 Lines

lib/Transforms/Utils/CMakeLists.txt

Show First 20 Lines • Show All 46 Lines • ▼ Show 20 Lines	add_llvm_library(LLVMTransformUtils
SimplifyInstructions.cpp		SimplifyInstructions.cpp
SimplifyLibCalls.cpp		SimplifyLibCalls.cpp
SplitModule.cpp		SplitModule.cpp
StripNonLineTableDebugInfo.cpp		StripNonLineTableDebugInfo.cpp
SymbolRewriter.cpp		SymbolRewriter.cpp
UnifyFunctionExitNodes.cpp		UnifyFunctionExitNodes.cpp
Utils.cpp		Utils.cpp
ValueMapper.cpp		ValueMapper.cpp
		VecClone.cpp
VNCoercion.cpp		VNCoercion.cpp

ADDITIONAL_HEADER_DIRS		ADDITIONAL_HEADER_DIRS
${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms		${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms
${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms/Utils		${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms/Utils

DEPENDS		DEPENDS
intrinsics_gen		intrinsics_gen
)		)

lib/Transforms/Utils/VecClone.cpp

Property	Old Value	New Value
svn:eol-style	null	native
svn:keywords	null	Author Date Id Rev URL
svn:mime-type	null	text/plain

				//=------- VecClone.cpp - Vector function to loop transform -- C++ --------=//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				// ===--------------------------------------------------------------------=== //
				///
				/// \file
				/// This pass inserts the body of a vector function inside a vector length
				/// trip count scalar loop for functions that are declared SIMD. The pass
				/// currently follows the gcc vector ABI requirements for name mangling
				/// encodings, but will be extended in the future to also support the Intel
				/// vector ABI. References to both ABIs can be found here:
				mehdi_aminiUnsubmitted Not Done Reply Inline Actions Some overview with an example/overview of what the pass accomplish would be nice. It is already captured in the RFC, but for someone reading the code it'd be helpful. mehdi_amini: Some overview with an example/overview of what the pass accomplish would be nice. It is already…
				///
				/// https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt
				/// https://software.intel.com/sites/default/files/managed/b4/c8/Intel-Vector-Function-ABI.pdf
				///
				/// Conceptually, this pass performs the following transformation:
				///
				mehdi_aminiUnsubmitted Not Done Reply Inline Actions I think it should be spelled-out more clearly that it is required for correctness to have all the variants mentioned in the attribute list to be codegen'd, even at O0, because even if the vectorizer does not run in this module, it may run in another module that would expect these variant to exist. mehdi_amini: I think it should be spelled-out more clearly that it is required for correctness to have all…
				/// Before Translation:
				///
				/// main.cpp
				///
				/// #pragma omp declare simd uniform(a) linear(k)
				/// extern float dowork(float *a, float b, int k);
				///
				/// float a[4096];
				/// float b[4096];
				/// int main() {
				/// int k;
				/// for (k = 0; k < 4096; k++) {
				/// b[k] = k;
				/// }
				/// #pragma clang loop vectorize(enable)
				/// for (k = 0; k < 4096; k++) {
				/// a[k] = k * 0.5;
				/// a[k] = dowork(a, b[k], k);
				hfinkelUnsubmitted Not Done Reply Inline Actions This function doesn't return anything, and I think that makes the example hard to understand. hfinkel: This function doesn't return anything, and I think that makes the example hard to understand.
				mmastenAuthorUnsubmitted Not Done Reply Inline Actions I agree, this was a bad bug with the example. This has been fixed and updated with an example that demonstrates the behavior of all three types of parameters (uniform, linear, and vector). Please let me know if the bitcasts shown in the example are confusing. The purpose of the bitcasts is so that the loop can appear in scalar form to the loop vectorizer. i.e., we can use a .scalar gep and index to reference vectors in the loop. mmasten: I agree, this was a bad bug with the example. This has been fixed and updated with an example…
				/// }
				/// }
				///
				fpetrogalliUnsubmitted Not Done Reply Inline Actions Hello Matt, thank you for working on this. <nitpicking>Should the comment be updated with the `"vector-variants"="_ZGVbM4ul_, ZGVbN4ul_"` syntax? </nitpicking> Cheers, Francesco fpetrogalli: Hello Matt, thank you for working on this. <nitpicking>Should the comment be updated with the…
				mmastenAuthorUnsubmitted Not Done Reply Inline Actions Thanks Francesco. Fixed. mmasten: Thanks Francesco. Fixed.
				/// dowork.cpp
				///
				/// #pragma omp declare simd uniform(a) linear(k) #0
				/// float dowork(float *a, float b, int k) {
				/// return sinf(a[k]) + b;
				/// }
				///
				/// attributes #0 = { nounwind uwtable "vector-variants"="_ZGVbM4uvl_",
				/// "ZGVbN4uvl_", ... }
				///
				/// After Translation:
				///
				/// dowork.cpp
				///
				/// // Non-masked variant
				///
				/// <VL x float> "_ZGVbN4uvl_dowork(float *a, <VL x float> b, int k) {
				/// alloc <VL x float> vec_ret;
				/// alloc <VL x float> vec_b;
				/// // casts from vector to scalar pointer allows loop to be in a scalar form
				/// // that can be vectorized easily.
				/// ret_cast = bitcast <VL x float>* vec_ret to float*;
				/// vec_b_cast = bitcast <VL x float>* vec_b to float*;
				/// store <VL x float> b, <VL x float>* vec_b;
				/// for (int i = 0; i < VL; i++, k++) {
				/// ret_cast[i] = sinf(a[k]) + vec_b_cast[i];
				/// }
				/// return vec_ret;
				/// }
				///
				/// // Masked variant
				///
				/// <VL x float> "_ZGVbM4uvl_dowork(float *a, <VL x float> b, int k, <VL x int>
				/// mask) {
				/// alloc <VL x float> vec_ret;
				/// alloc <VL x float> vec_b;
				/// ret_cast = bitcast <VL x float>* vec_ret to float*;
				/// vec_b_cast = bitcast <VL x float>* vec_b to float*;
				/// store <VL x float> b, <VL x float>* vec_b;
				/// for (int i = 0; i < VL; i++, k++) {
				/// if (mask[i] != 0)
				/// ret_cast[i] = sinf(a[k]) + vec_b_cast[i];
				/// }
				/// return vec_ret;
				/// }
				hfinkelUnsubmitted Not Done Reply Inline Actions What cleans up the extra loads/stores here (if we're optimizing)? Are assuming that there's a run of SROA afterwards, or does InstCombine do a good job here, or something else? hfinkel: What cleans up the extra loads/stores here (if we're optimizing)? Are assuming that there's a…
				mmastenAuthorUnsubmitted Not Done Reply Inline Actions InstCombine will do it. I also know that EarlyCSE will work. mmasten: InstCombine will do it. I also know that EarlyCSE will work.
				///
				// ===--------------------------------------------------------------------=== //

				// This pass is flexible enough to recognize whether or not parameters have been
				// registerized so that the users of the parameter can be properly updated. For
				// instance, we need to know where the users of linear parameters are so that
				// the stride can be added to them.
				//
				// In the following example, %i and %x are used directly by %add directly, so
				// in this case the pass can just look for users of %i and %x.
				//
				// define i32 @foo(i32 %i, i32 %x) #0 {
				// entry:
				hfinkelUnsubmitted Not Done Reply Inline Actions I don't understand what's going on here. Is it not possible to write the transformation such that it's semantically correct regardless of whether we've run mem2reg? This is not always an either/or situation. hfinkel: I don't understand what's going on here. Is it not possible to write the transformation such…
				mmastenAuthorUnsubmitted Not Done Reply Inline Actions Perhaps the comments were written very clearly. The pass has been written to handle parameters regardless of whether or not mem2reg has run. Hopefully, the updated comments are better. mmasten: Perhaps the comments were written very clearly. The pass has been written to handle parameters…
				// %add = add nsw i32 %x, %i
				// ret i32 %add
				// }
				//
				// When parameters have not been registerized, parameters are used indirectly
				// through a store/load of the parameter to/from memory that has been allocated
				// for them in the function. Thus, in this case, the pass looks for users of
				// %0 and %1.
				//
				// define i32 @foo(i32 %i, i32 %x) #0 {
				// entry:
				// %i.addr = alloca i32, align 4
				// %x.addr = alloca i32, align 4
				// store i32 %i, i32* %i.addr, align 4
				// store i32 %x, i32* %x.addr, align 4
				// %0 = load i32, i32* %x.addr, align 4
				// %1 = load i32, i32* %i.addr, align 4
				// %add = add nsw i32 %0, %1
				// ret i32 %add
				// }
				//
				// The pass must run at all optimization levels because it is possible that
				// a loop calling the vector function is vectorized, but the vector function
				// itself is not vectorized. For example, above main.cpp may be compiled at
				// -O2, but dowork.cpp may be compiled at -O0. Therefore, it is required that
				// the attribute list for the vector function specify all variants that must
				// be generated by this pass so as to avoid any linking problems. This pass
				// also serves to canonicalize the input IR to the loop vectorizer.

				#include "llvm/Transforms/Utils/VecClone.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/Analysis/Passes.h"
				#include "llvm/Analysis/VectorUtils.h"
				#include "llvm/Analysis/VectorVariant.h"
				#include "llvm/IR/BasicBlock.h"
				#include "llvm/IR/Constants.h"
				#include "llvm/IR/Function.h"
				#include "llvm/IR/Instructions.h"
				#include "llvm/IR/IntrinsicInst.h"
				#include "llvm/InitializePasses.h"
				#include "llvm/PassRegistry.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/Transforms/Utils/Cloning.h"
				#include <map>
				#include <set>

				#define SV_NAME "vec-clone"
				#define DEBUG_TYPE "VecClone"

				using namespace llvm;

				VecClone::VecClone() : ModulePass(ID) {}

				void VecClone::getAnalysisUsage(AnalysisUsage &AU) const {
				AU.addRequired<TargetTransformInfoWrapperPass>();
				}

				Type *VecClone::calcCharacteristicType(Function &F, VectorVariant &Variant) {
				Type *ReturnType = F.getReturnType();
				Type *CharacteristicDataType = nullptr;

				if (!ReturnType->isVoidTy())
				hfinkelUnsubmitted Not Done Reply Inline Actions 1-many -> one-to-many hfinkel: 1-many -> one-to-many
				CharacteristicDataType = ReturnType;

				if (!CharacteristicDataType) {

				std::vector<VectorKind> &ParmKinds = Variant.getParameters();
				Function::const_arg_iterator ArgIt = F.arg_begin();
				Function::const_arg_iterator ArgEnd = F.arg_end();
				std::vector<VectorKind>::iterator VKIt = ParmKinds.begin();

				for (; ArgIt != ArgEnd; ++ArgIt, ++VKIt) {
				if (VKIt->isVector()) {
				CharacteristicDataType = (*ArgIt).getType();
				break;
				}
				}
				}

				// TODO except Clang's ComplexType
				if (!CharacteristicDataType \|\| CharacteristicDataType->isStructTy()) {
				CharacteristicDataType = Type::getInt32Ty(F.getContext());
				}

				// Legalize the characteristic type based on target requirements.
				CharacteristicDataType =
				Variant.promoteToSupportedType(CharacteristicDataType);

				if (CharacteristicDataType->isPointerTy()) {
				// For such cases as 'int* foo(int x)', where x is a non-vector type, the
				// characteristic type at this point will be i32*. If we use the DataLayout
				// to query the supported pointer size, then a promotion to i64* is
				// incorrect because the mask element type will mismatch the element type
				// of the characteristic type.
				PointerType *PointerTy = cast<PointerType>(CharacteristicDataType);
				CharacteristicDataType = PointerTy->getElementType();
				}

				return CharacteristicDataType;
				}

				void VecClone::getFunctionsToVectorize(
				llvm::Module &M, std::map<Function *, std::vector<StringRef>> &FuncVars) {

				// FuncVars will contain a 1-many mapping between the original scalar
				// function and the vector variant encoding strings (represented as
				// attributes). The encodings correspond to functions that will be created by
				// the caller of this function as vector versions of the original function.
				// For example, if foo() is a function marked as a simd function, it will have
				// several vector variant encodings like: "_ZGVbM4_foo", "_ZGVbN4_foo",
				// "_ZGVcM8_foo", "_ZGVcN8_foo", "_ZGVdM8_foo", "_ZGVdN8_foo", "_ZGVeM16_foo",
				// "_ZGVeN16_foo". The caller of this function will then clone foo() and name
				// the clones using the above name manglings. The variant encodings correspond
				// to differences in masked/non-masked execution, vector length, and target
				// vector register size, etc. For more details, please refer to the following
				// reference for details on the vector function encodings.
				// https://www.cilkplus.org/sites/default/files/open_specifications/
				// Intel-ABI-Vector-Function-2012-v0.9.5.pdf

				for (auto It = M.begin(), End = M.end(); It != End; ++It) {
				Function &F = *It;
				if (F.hasFnAttribute("vector-variants")) {
				Attribute Attr = F.getFnAttribute("vector-variants");
				StringRef VariantsStr = Attr.getValueAsString();
				SmallVector<StringRef, 8> Variants;
				VariantsStr.split(Variants, ',');
				for (unsigned i = 0; i < Variants.size(); i++) {
				FuncVars[&F].push_back(Variants[i]);
				}
				}
				}
				}

				template Constant *
				VecClone::getConstantValue<int>(Type *Ty, LLVMContext &Context, int Val);
				template Constant *
				VecClone::getConstantValue<float>(Type *Ty, LLVMContext &Context, float Val);
				template Constant *
				VecClone::getConstantValue<double>(Type *Ty, LLVMContext &Context, double Val);
				template <typename T>
				Constant VecClone::getConstantValue(Type Ty, LLVMContext &Context, T Val) {
				Constant *ConstVal = nullptr;

				if (Ty->isIntegerTy()) {
				ConstVal = ConstantInt::get(Ty, Val);
				} else if (Ty->isFloatTy()) {
				ConstVal = ConstantFP::get(Ty, Val);
				}

				assert(ConstVal && "Could not generate constant for type");

				return ConstVal;
				}

				void VecClone::insertInstruction(Instruction Inst, BasicBlock BB) {
				// This function inserts instructions in a way that groups like instructions
				// together for debuggability/readability purposes. This was designed to make
				// the entry basic block easier to read since this pass creates/modifies
				// alloca, store, and bitcast instructions for each vector parameter and
				// return. Thus, this function ensures all allocas are grouped together, all
				// stores are grouped together, and so on. If the type of instruction passed
				// in does not exist in the basic block, then it is added to the end of the
				// basic block, just before the terminator instruction.

				BasicBlock::reverse_iterator BBIt = BB->rbegin();
				BasicBlock::reverse_iterator BBEnd = BB->rend();
				BasicBlock::iterator AnchorInstIt = BB->end();
				AnchorInstIt--;
				Instruction Anchor = &AnchorInstIt;

				for (; BBIt != BBEnd; ++BBIt) {
				if (Inst->getOpcode() == (&*BBIt)->getOpcode()) {
				Anchor = &*BBIt;
				break;
				}
				}

				if (isa<BranchInst>(Anchor)) {
				Inst->insertBefore(Anchor);
				} else {
				Inst->insertAfter(Anchor);
				}
				}

				bool VecClone::hasComplexType(Function *F) {
				Function::arg_iterator ArgListIt = F->arg_begin();
				Function::arg_iterator ArgListEnd = F->arg_end();

				for (; ArgListIt != ArgListEnd; ++ArgListIt) {
				hfinkelUnsubmitted Not Done Reply Inline Actions Move this near the top of the function and check, before you generate code, that he function doesn't already exist. If it does, bail out (e.g., return a nullptr and the caller can move on to the next variant/function). hfinkel: Move this near the top of the function and check, before you generate code, that he function…
				// Complex types for parameters/return come in as vector.
				if (ArgListIt->getType()->isVectorTy()) {
				return true;
				}
				}

				return false;
				}

				Function *VecClone::CloneFunction(Function &F, VectorVariant &V) {

				DEBUG(dbgs() << "Cloning Function: " << F.getName() << "\n");
				DEBUG(F.dump());

				FunctionType *OrigFunctionType = F.getFunctionType();
				Type *ReturnType = F.getReturnType();
				Type *CharacteristicType = calcCharacteristicType(F, V);

				// Expand return type to vector.
				if (!ReturnType->isVoidTy())
				ReturnType = VectorType::get(ReturnType, V.getVlen());

				std::vector<VectorKind> ParmKinds = V.getParameters();
				SmallVector<Type *, 4> ParmTypes;
				FunctionType::param_iterator ParmIt = OrigFunctionType->param_begin();
				FunctionType::param_iterator ParmEnd = OrigFunctionType->param_end();
				std::vector<VectorKind>::iterator VKIt = ParmKinds.begin();
				for (; ParmIt != ParmEnd; ++ParmIt, ++VKIt) {
				if (VKIt->isVector())
				ParmTypes.push_back(
				VectorType::get((*ParmIt)->getScalarType(), V.getVlen()));
				else
				ParmTypes.push_back(*ParmIt);
				}

				if (V.isMasked()) {
				Type *MaskVecTy = VectorType::get(CharacteristicType, V.getVlen());
				ParmTypes.push_back(MaskVecTy);
				}

				FunctionType *CloneFuncType = FunctionType::get(ReturnType, ParmTypes, false);

				std::string VariantName = V.generateFunctionName(F.getName());
				Function *Clone = Function::Create(
				CloneFuncType, GlobalValue::ExternalLinkage, VariantName, F.getParent());

				// Remove vector variant attributes from the original function. They are
				// not needed for the cloned function and it prevents any attempts at
				// trying to clone the function again in case the pass is called more than
				// once.
				F.removeFnAttr("vector-variants");

				Function::arg_iterator ArgIt = F.arg_begin();
				Function::arg_iterator ArgEnd = F.arg_end();
				ValueToValueMapTy Vmap;
				Function::arg_iterator NewArgIt = Clone->arg_begin();
				hfinkelUnsubmitted Not Done Reply Inline Actions I don't think you need the iterators; this can just be: for (auto &Arg : Clone->args()) and you use this pattern a lot (declaring two iterators and then using them in a for loop). In almost all of these cases, you should use a range-based for loop instead. hfinkel: I don't think you need the iterators; this can just be: for (auto &Arg : Clone->args()) and…
				for (; ArgIt != ArgEnd; ++ArgIt, ++NewArgIt) {
				NewArgIt->setName(ArgIt->getName());
				Vmap[&ArgIt] = &NewArgIt;
				}

				if (V.isMasked()) {
				Argument &MaskArg = *NewArgIt;
				MaskArg.setName("mask");
				}

				SmallVector<ReturnInst *, 8> Returns;
				CloneFunctionInto(Clone, &F, Vmap, true, Returns);

				// Remove incompatible argument attributes (applied to the scalar argument,
				// does not apply to its vector counterpart). This must be done after cloning
				// the function because CloneFunctionInto() transfers parameter attributes
				// from the original parameters in the Vmap.
				ArgIt = Clone->arg_begin();
				ArgEnd = Clone->arg_end();
				AttrBuilder AB;
				for (uint64_t Idx = 0; ArgIt != ArgEnd; ++ArgIt, ++Idx) {
				Type ArgType = (ArgIt).getType();
				AB = AttributeFuncs::typeIncompatible(ArgType);
				Clone->removeParamAttrs(Idx, AB);
				}

				AB = AttributeFuncs::typeIncompatible(ReturnType);
				Clone->removeAttributes(AttributeList::ReturnIndex, AB);

				DEBUG(dbgs() << "After Cloning and Function Signature expansion\n");
				DEBUG(Clone->dump());

				return Clone;
				}

				bool VecClone::isVectorOrLinearParamStore(Function *Clone,
				std::vector<VectorKind> &ParmKinds,
				Instruction *Inst) {
				if (StoreInst *Store = dyn_cast<StoreInst>(Inst)) {
				Value *Op0 = Store->getOperand(0);
				Function::arg_iterator ArgListIt = Clone->arg_begin();
				Function::arg_iterator ArgListEnd = Clone->arg_end();

				for (; ArgListIt != ArgListEnd; ++ArgListIt) {
				unsigned ParmIdx = ArgListIt->getArgNo();
				if (&*ArgListIt == Op0 &&
				(ParmKinds[ParmIdx].isVector() \|\| ParmKinds[ParmIdx].isLinear())) {
				return true;
				}
				}
				}

				return false;
				}

				BasicBlock VecClone::splitEntryIntoLoop(Function Clone, VectorVariant &V,
				BasicBlock *EntryBlock) {

				// EntryInsts contains all instructions that need to stay in the entry basic
				// block. These instructions include allocas and stores involving vector and
				// linear parameters to alloca. Linear parameter stores to alloca are kept in
				// the entry block because there will be a load from this alloca in the loop
				// for which we will apply the stride. Instructions involving uniform
				// parameter stores to alloca should be sunk into the loop to maintain
				// uniform behavior. All instructions involving private variables are also
				// sunk into the loop.

				SmallVector<Instruction *, 4> EntryInsts;
				std::vector<VectorKind> ParmKinds = V.getParameters();
				BasicBlock::iterator BBIt = EntryBlock->begin();
				BasicBlock::iterator BBEnd = EntryBlock->end();

				for (; BBIt != BBEnd; ++BBIt) {
				hfinkelUnsubmitted Not Done Reply Inline Actions What happens if there's more than one return in the function? You might want, in that case, to create a new block with the return and convert all other returns to branches to that block. hfinkel: What happens if there's more than one return in the function? You might want, in that case, to…
				mmastenAuthorUnsubmitted Not Done Reply Inline Actions In all the test cases that I have used to this point, this type of re-wiring has already been done. Granted, I have mainly been testing some very simple multiple return functions, but if you have a test case where this happens it will be helpful. Thanks. mmasten: In all the test cases that I have used to this point, this type of re-wiring has already been…
				if (isa<AllocaInst>(BBIt) \|\|
				isVectorOrLinearParamStore(Clone, ParmKinds, &*BBIt)) {
				// If this is a store of a vector parameter, keep it in the entry block
				// because it will be modified with the vector alloca reference. Since the
				// parameter has already been expanded, this becomes a vector store (i.e.,
				// packing instruction) that we do not want to appear in the scalar loop.
				// It is correct to leave linear parameter stores in the entry or move
				// them to the scalar loop, but leaving them in the entry block prevents
				// an additional store inside the loop. Uniform parameter stores must be
				// moved to the loop body to behave as uniform. Consider the following:
				//
				// __declspec(vector(uniform(x)))
				// int foo(int a, int x) {
				// x++;
				// return (a + x);
				// }
				//
				// Assume x = 1 for the call to foo. This implies x = 2 for the vector
				// add. e.g., a[0:VL-1] + <2, 2, 2, 2>. If the initial store of x to the
				// stack is done in the entry block outside of the loop, then x will be
				// incremented by one each time within the loop because the increment of
				// x will reside in the loop. Therefore, if the store of x is sunk into
				// the loop, the initial value of 1 will always be stored to a temp
				// before the increment, resulting in the value of 2 always being computed
				// in the scalar loop.
				EntryInsts.push_back(&*BBIt);
				}
				}

				BasicBlock *LoopBlock =
				EntryBlock->splitBasicBlock(EntryBlock->begin(), "simd.loop");

				for (auto *Inst : EntryInsts) {
				Inst->removeFromParent();
				Inst->insertBefore(EntryBlock->getTerminator());
				}

				DEBUG(dbgs() << "After Entry Block Split\n");
				DEBUG(Clone->dump());

				return LoopBlock;
				}

				BasicBlock VecClone::splitLoopIntoReturn(Function Clone,
				BasicBlock *LoopBlock) {

				// Determine the basic block with the return. For simple cases, the 'ret'
				// instruction will be part of the entry block. In this case, separate the
				// 'ret' into a new basic block because we don't want this as part of the
				// loop. For more complex cases, the 'ret' and corresponding instructions
				// (i.e., load from auto variable) will already be in a separate basic block,
				// so no need to split here.

				Instruction *SplitPt = LoopBlock->getTerminator();

				if (ReturnInst *Return = dyn_cast<ReturnInst>(SplitPt)) {

				// If the return is from a preceeding load, make sure the load is also put
				// in the return block. This is the old scalar load that will end up getting
				// replaced with the vector return and will get cleaned up later.

				// Make sure this is not a void function before getting the return
				// operand.
				if (!Clone->getReturnType()->isVoidTy()) {
				Value *RetOp = Return->getOperand(0);
				hfinkelUnsubmitted Not Done Reply Inline Actions In addition to branches, you need to handle SwitchInst and IndirectBrInst (and, for the latter, you need to find any place where the address of the return block is taken, and replace it with the address of the LoopExitBlock). hfinkel: In addition to branches, you need to handle SwitchInst and IndirectBrInst (and, for the latter…
				Value::use_iterator UseIt = RetOp->use_begin();
				Value::use_iterator UseEnd = RetOp->use_end();

				for (; UseIt != UseEnd; ++UseIt) {
				LoadInst RetLoad = dyn_cast<LoadInst>(UseIt);
				if (RetLoad) {
				SplitPt = RetLoad;
				}
				}
				}
				}

				Function::iterator ReturnBlockIt = Clone->end();
				BasicBlock *ReturnBlock;
				if (dyn_cast<LoadInst>(SplitPt) \|\| dyn_cast<ReturnInst>(SplitPt)) {
				ReturnBlock = LoopBlock->splitBasicBlock(SplitPt, "return");
				} else {
				ReturnBlockIt = Clone->end();
				ReturnBlockIt--;
				ReturnBlock = &*ReturnBlockIt;
				}

				return ReturnBlock;
				}

				void VecClone::updateReturnPredecessors(Function *Clone,
				BasicBlock *LoopExitBlock,
				BasicBlock *ReturnBlock) {
				// Update the branches of the ReturnBlock predecessors to point back to
				// LoopBlock if the index is less than VL.

				// First, collect the basic blocks to be updated since we don't want to update
				// the CFG while iterating through it.
				SmallVector<BranchInst *, 4> BranchesToUpdate;
				Function::iterator FI = Clone->begin();
				Function::iterator FE = Clone->end();
				for (; FI != FE; ++FI) {

				BasicBlock::iterator BBI = FI->begin();
				BasicBlock::iterator BBE = FI->end();

				for (; BBI != BBE; ++BBI) {

				BranchInst *Branch = dyn_cast<BranchInst>(BBI);

				if (Branch) {
				unsigned NumSucc = Branch->getNumSuccessors();

				for (unsigned I = 0; I < NumSucc; ++I) {
				if (Branch->getSuccessor(I) == ReturnBlock) {
				BranchesToUpdate.push_back(Branch);
				hfinkelUnsubmitted Not Done Reply Inline Actions Use CreateAdd here so you can set both nuw and nsw on this increment. hfinkel: Use CreateAdd here so you can set both nuw and nsw on this increment.
				}
				}
				}
				}
				}

				// Now, do the actual update. The code below handles both conditional and
				// unconditional branches because we loop through all successors of the
				// branch to see if any of them point to the ReturnBlock.
				for (unsigned I = 0; I < BranchesToUpdate.size(); ++I) {
				unsigned int NumOps = BranchesToUpdate[I]->getNumSuccessors();
				for (unsigned Idx = 0; Idx < NumOps; ++Idx) {
				BasicBlock *Successor = BranchesToUpdate[I]->getSuccessor(Idx);
				if (Successor == ReturnBlock) {
				BranchesToUpdate[I]->setOperand(Idx, LoopExitBlock);
				}
				}
				}
				}

				BasicBlock VecClone::createLoopExit(Function Clone, BasicBlock *ReturnBlock) {
				BasicBlock *LoopExitBlock = BasicBlock::Create(
				Clone->getContext(), "simd.loop.exit", Clone, ReturnBlock);

				updateReturnPredecessors(Clone, LoopExitBlock, ReturnBlock);
				return LoopExitBlock;
				}

				PHINode *VecClone::createPhiAndBackedgeForLoop(
				Function Clone, BasicBlock EntryBlock, BasicBlock *LoopBlock,
				BasicBlock LoopExitBlock, BasicBlock ReturnBlock, int VectorLength) {

				// Create the phi node for the top of the loop block and add the back
				// edge to the loop from the loop exit.

				PHINode *Phi = PHINode::Create(Type::getInt32Ty(Clone->getContext()), 2,
				"index", &*LoopBlock->getFirstInsertionPt());

				Constant *Inc = ConstantInt::get(Type::getInt32Ty(Clone->getContext()), 1);
				Constant *IndInit =
				ConstantInt::get(Type::getInt32Ty(Clone->getContext()), 0);

				Instruction *Induction =
				BinaryOperator::CreateNUWAdd(Phi, Inc, "indvar", LoopExitBlock);

				Constant *VL =
				ConstantInt::get(Type::getInt32Ty(Clone->getContext()), VectorLength);

				Instruction *VLCmp =
				new ICmpInst(*LoopExitBlock, CmpInst::ICMP_ULT, Induction, VL, "vl.cond");

				BranchInst::Create(LoopBlock, ReturnBlock, VLCmp, LoopExitBlock);

				Phi->addIncoming(IndInit, EntryBlock);
				Phi->addIncoming(Induction, LoopExitBlock);

				DEBUG(dbgs() << "After Loop Insertion\n");
				DEBUG(Clone->dump());

				return Phi;
				}

				Instruction *
				VecClone::expandVectorParameters(Function *Clone, VectorVariant &V,
				BasicBlock *EntryBlock,
				std::vector<ParmRef *> &VectorParmMap) {
				// For vector parameters, expand the existing alloca (if there is one) to a
				// vector. If there isn't one, create a VL-sized alloca for it. Then, bitcast
				// the vector and store this instruction in a map. The map is later used to
				// insert the new instructions and to replace the old scalar memory
				// references. If there are no parameters, then the function simply does not
				// perform any expansion since we iterate over the function's arg list. We
				// must always have a vector alloca for vector parameters so that we can
				// bitcast them to a scalar pointer that can be loaded from using the loop
				// index.

				Instruction *Mask = nullptr;
				SmallVector<StoreInst *, 4> StoresToInsert;

				Function::arg_iterator ArgIt = Clone->arg_begin();
				Function::arg_iterator ArgEnd = Clone->arg_end();

				unsigned LastArg = Clone->arg_size() - 1;
				unsigned ArgIdx = 0;

				for (; ArgIt != ArgEnd; ++ArgIt) {

				User::user_iterator UserIt = ArgIt->user_begin();
				User::user_iterator UserEnd = ArgIt->user_end();

				VectorType *VecType = dyn_cast<VectorType>(ArgIt->getType());

				if (VecType) {

				// Some args other than the mask may not have users, but have not been
				// removed as dead. In those cases, just go on to the next argument.
				// There's no need to expand non-mask parameters with no users.
				bool MaskArg = V.isMasked() && ArgIdx == LastArg;

				if (!(!MaskArg && ArgIt->getNumUses() == 0)) {

				// Create a new vector alloca and bitcast to a pointer to the element
				// type. The following is an example of what the cast should look like:
				//
				hfinkelUnsubmitted Not Done Reply Inline Actions What happens if there is more than one store user? hfinkel: What happens if there is more than one store user?
				// %veccast = bitcast <2 x i32>* %vec_a.addr to i32*
				//
				// geps using the bitcast will appear in a scalar form instead of
				// casting to an array or using vector. For example,
				//
				// %vecgep1 = getelementptr i32, i32* %veccast, i32 %index
				//
				// instead of:
				//
				// getelementptr inbounds [4 x i32], [4 x i32]* %a, i32 0, i64 1
				//
				// We do this to put the geps in a more scalar form.

				const DataLayout &DL = Clone->getParent()->getDataLayout();
				AllocaInst *VecAlloca = new AllocaInst(VecType, DL.getAllocaAddrSpace(),
				"vec." + ArgIt->getName());
				insertInstruction(VecAlloca, EntryBlock);
				PointerType *ElemTypePtr = PointerType::get(
				VecType->getElementType(), VecAlloca->getType()->getAddressSpace());

				BitCastInst *VecParmCast = nullptr;
				if (MaskArg) {
				Mask = new BitCastInst(VecAlloca, ElemTypePtr, "mask.cast");
				} else {
				VecParmCast = new BitCastInst(VecAlloca, ElemTypePtr,
				"vec." + ArgIt->getName() + ".cast");
				insertInstruction(VecParmCast, EntryBlock);
				}

				StoreInst *StoreUser = nullptr;
				AllocaInst *Alloca = nullptr;
				ParmRef *PRef = new ParmRef();

				for (; UserIt != UserEnd; ++UserIt) {

				StoreUser = dyn_cast<StoreInst>(*UserIt);

				if (StoreUser) {
				// For non-mask parameters, find the initial store of the parameter
				// to an alloca instruction. Map this alloca to the vector bitcast
				// created above so that we can update the old scalar references.
				Alloca = dyn_cast<AllocaInst>(UserIt->getOperand(1));
				PRef->VectorParm = Alloca;
				break;
				}
				}

				if (!Alloca && !Mask) {
				// Since Mem2Reg has run, there is no existing scalar store for
				// the parameter, but we must still pack (store) the expanded vector
				// parameter to a new vector alloca. This store is created here and
				// put in a container for later insertion. We cannot insert it here
				// since this will be a new user of the parameter and we are still
				// iterating over the original users of the parameter. This will
				// invalidate the iterator. We also map the parameter directly to the
				// vector bitcast so that we can later update any users of the
				// parameter.
				Value *ArgValue = dyn_cast<Value>(ArgIt);
				StoreInst *Store = new StoreInst(ArgValue, VecAlloca);
				StoresToInsert.push_back(Store);
				PRef->VectorParm = ArgValue;
				}

				if (!Mask) {
				// Mapping not needed for the mask parameter because there will
				// be no users of it to replace. This parameter will only be used to
				// introduce if conditions on each mask bit.
				PRef->VectorParmCast = VecParmCast;
				VectorParmMap.push_back(PRef);
				}
				}
				}

				ArgIdx++;
				}

				// Insert any necessary vector parameter stores here. This is needed for when
				// there were no existing scalar stores that we can update to vector stores
				// for the parameter. This is needed when Mem2Reg has registerized parameters.
				// The stores are inserted after the allocas in the entry block.
				for (auto *Inst : StoresToInsert) {
				insertInstruction(Inst, EntryBlock);
				}

				return Mask;
				}

				Instruction VecClone::createExpandedReturn(Function Clone,
				BasicBlock *EntryBlock,
				VectorType *ReturnType) {
				// Expand the return temp to a vector.

				VectorType *AllocaType = dyn_cast<VectorType>(Clone->getReturnType());

				const DataLayout &DL = Clone->getParent()->getDataLayout();
				AllocaInst *VecAlloca =
				new AllocaInst(AllocaType, DL.getAllocaAddrSpace(), "vec.retval");
				insertInstruction(VecAlloca, EntryBlock);
				PointerType *ElemTypePtr = PointerType::get(
				ReturnType->getElementType(), VecAlloca->getType()->getAddressSpace());

				BitCastInst *VecCast = new BitCastInst(VecAlloca, ElemTypePtr, "ret.cast");
				insertInstruction(VecCast, EntryBlock);

				return VecCast;
				}

				Instruction VecClone::expandReturn(Function Clone, BasicBlock *EntryBlock,
				BasicBlock *LoopBlock,
				BasicBlock *ReturnBlock,
				std::vector<ParmRef *> &VectorParmMap) {
				// Determine how the return is currently handled, since this will determine
				// if a new vector alloca is required for it. For simple functions, an alloca
				// may not have been created for the return value. The function may just
				// simply return a value defined by some operation that now exists within the
				// loop. If an alloca was generated already, then the return block will load
				// from it and then return. Thus, we look for a return resulting from a load
				// in the return block. If found, we have already expanded all alloca
				// instructions to vector types and the old scalar references have already
				// been replaced with them. In this case, we only need to pack the results
				// from the vector alloca into a temp and return the temp. If a vector alloca
				// was not generated for the return, we need to add one for it because we have
				// a scalar reference in the loop that needs to be replaced. After creating
				// the new vector alloca, replace the reference to it in the loop and then
				// pack the results into a temp and return it.
				//
				// Example 1: // alloca not generated in entry block
				//
				// loop:
				// ... // some set of instructions
				// %add1 = add nsw i32 %1, %2
				// br label %loop.exit (loop exit contains br to return block)
				//
				// return:
				// ret i32 %add1
				//
				//
				// Example 2:
				//
				// loop:
				// ... // some set of instructions
				// %vecgep1 = getelementptr <2 x i32>* %vec_ret, i32 0, i32 %index
				// store i32 %add2, i32* %vecgep1
				// br label %loop.exit (loop exit contains br to return block)
				//
				// return:
				// %7 = load i32, i32*, %retval // the original scalar alloca
				// ret i32 %7
				//

				ReturnInst *FuncReturn = dyn_cast<ReturnInst>(ReturnBlock->getTerminator());
				assert(FuncReturn && "Expected ret instruction to terminate the return\
				basic block");

				LoadInst *LoadFromAlloca = dyn_cast<LoadInst>(FuncReturn->getOperand(0));

				// We need to generate a vector alloca for the return vector.
				// Two cases exist, here:
				//
				// 1) For simple functions, the return is a temp defined within the
				// loop body and the temp is not loaded from an alloca, or the return is
				// a constant. (obviously, also not loaded from an alloca)
				//
				// 2) The return temp traces back to an alloca.
				//
				// For both cases, generate a vector alloca so that we can later load from it
				// and return the vector temp from the function. The alloca is used to load
				// and store from so that the scalar loop contains load/store/gep
				// instructions. This enables AVR construction to remain straightforward.
				// E.g., we don't need to worry about figuring out how to represent
				// insert/extract when building AVR nodes. This keeps consistent with how ICC
				// is operating.
				//
				// Additionally, for case 1 we must generate a gep and store after the
				// instruction that defines the original return temp, so that we can store
				// the result into the proper index of the return vector. For case 2, we must
				// go into the loop and replace the old scalar alloca reference with the one
				// just created as vector.

				Instruction *VecReturn = NULL;
				VectorType *ReturnType = dyn_cast<VectorType>(Clone->getReturnType());

				if (!LoadFromAlloca) {

				// Case 1

				VecReturn = createExpandedReturn(Clone, EntryBlock, ReturnType);
				Value *RetVal = FuncReturn->getReturnValue();
				Instruction *RetFromTemp = dyn_cast<Instruction>(RetVal);

				Instruction *InsertPt;
				Value *ValToStore;
				Instruction Phi = &LoopBlock->begin();

				if (RetFromTemp) {
				// If we're returning from an SSA temp, set the insert point to the
				// definition of the temp.
				InsertPt = RetFromTemp;
				ValToStore = RetFromTemp;
				} else {
				// If we're returning a constant, then set the insert point to the loop
				// phi. From here, a store to the vector using the constant is inserted.
				InsertPt = Phi;
				ValToStore = RetVal;
				}

				// Generate a gep from the bitcast of the vector alloca used for the return
				// vector.
				GetElementPtrInst *VecGep =
				GetElementPtrInst::Create(ReturnType->getElementType(), VecReturn, Phi,
				VecReturn->getName() + ".gep");
				VecGep->insertAfter(InsertPt);

				// Store the constant or temp to the appropriate lane in the return vector.
				StoreInst *VecStore = new StoreInst(ValToStore, VecGep);
				VecStore->insertAfter(VecGep);

				} else {

				// Case 2

				AllocaInst *Alloca = dyn_cast<AllocaInst>(LoadFromAlloca->getOperand(0));
				bool AllocaFound = false;
				unsigned ParmIdx = 0;

				for (; ParmIdx < VectorParmMap.size(); ParmIdx++) {
				Value *ParmVal = VectorParmMap[ParmIdx]->VectorParm;
				if (ParmVal == Alloca)
				AllocaFound = true;
				}

				if (AllocaFound) {
				// There's already a vector alloca created for the return, which is the
				// same one used for the parameter. E.g., we're returning the updated
				// parameter.
				VecReturn = VectorParmMap[ParmIdx]->VectorParmCast;
				} else {
				// A new return vector is needed because we do not load the return value
				// from an alloca.
				VecReturn = createExpandedReturn(Clone, EntryBlock, ReturnType);
				ParmRef *PRef = new ParmRef();
				PRef->VectorParm = Alloca;
				PRef->VectorParmCast = VecReturn;
				VectorParmMap.push_back(PRef);
				}
				}

				return VecReturn;
				}

				Instruction *VecClone::expandVectorParametersAndReturn(
				Function Clone, VectorVariant &V, Instruction *Mask,
				BasicBlock EntryBlock, BasicBlock LoopBlock, BasicBlock *ReturnBlock,
				std::vector<ParmRef *> &VectorParmMap) {
				// If there are no parameters, then this function will do nothing and this
				// is the expected behavior.
				*Mask = expandVectorParameters(Clone, V, EntryBlock, VectorParmMap);

				// If the function returns void, then don't attempt to expand to vector.
				Instruction *ExpandedReturn = ReturnBlock->getTerminator();
				if (!Clone->getReturnType()->isVoidTy()) {
				ExpandedReturn =
				expandReturn(Clone, EntryBlock, LoopBlock, ReturnBlock, VectorParmMap);
				}

				// So, essentially what has been done to this point is the creation and
				// insertion of the vector alloca instructions. Now, we insert the bitcasts of
				// those instructions, which have been stored in the map. The insertion of the
				// vector bitcast to element type pointer is done at the end of the EntryBlock
				// to ensure that any initial stores of vector parameters have been done
				// before the cast.

				std::vector<ParmRef *>::iterator MapIt;
				for (auto MapIt : VectorParmMap) {
				Instruction *ExpandedCast = MapIt->VectorParmCast;
				if (!ExpandedCast->getParent()) {
				insertInstruction(ExpandedCast, EntryBlock);
				}
				}

				// Insert the mask parameter store to alloca and bitcast if this is a masked
				// variant.
				if (*Mask) {
				// Mask points to the bitcast of the alloca instruction to element type
				// pointer. Insert the bitcast after all of the other bitcasts for vector
				// parameters.
				insertInstruction(*Mask, EntryBlock);

				Value MaskVector = (Mask)->getOperand(0);

				// MaskParm points to the function's mask parameter.
				Function::arg_iterator MaskParm = Clone->arg_end();
				MaskParm--;

				// Find the last parameter store in the function entry block and insert the
				// the store of the mask parameter after it. We do this just to make the
				// LLVM IR easier to read. If there are no parameters, just insert the store
				// before the terminator. For safety, if we cannot find a store, then insert
				// this store after the last alloca. At this point, there will at least be
				// an alloca for either a parameter or return. This code just ensures that
				// the EntryBlock instructions are grouped by alloca, followed by store,
				// followed by bitcast for readability reasons.

				StoreInst MaskStore = new StoreInst(&MaskParm, MaskVector);
				insertInstruction(MaskStore, EntryBlock);
				}

				DEBUG(dbgs() << "After Parameter/Return Expansion\n");
				DEBUG(Clone->dump());

				return ExpandedReturn;
				}

				bool VecClone::typesAreCompatibleForLoad(Type GepType, Type LoadType) {
				// GepType will always be a pointer since this refers to an alloca for a
				// vector.
				PointerType *GepPtrTy = dyn_cast<PointerType>(GepType);
				Type *LoadFromTy = GepPtrTy->getElementType();
				Type *LoadToTy = LoadType;

				// Dereferencing pointers in LLVM IR means that we have to have a load for
				// each level of indirection. This means that we load from a gep and the
				// resulting load value type is reduced by one level of indirection. For
				// example, we load from a gep of i32* to a temp that has an i32 type. We
				// cannot do multiple levels of dereferencing in a single load. For example,
				// we cannot load from a gep of i32** to an i32. This requires two loads.
				//
				// Legal Case: GepType = i32*, LoadFromTy = i32,
				// LoadType = i32, LoadToTy = i32
				//
				// %vec.b.elem.2 = load i32, i32* %vec.b.cast.gep1
				//
				// In this case, since both are pointers, types will be considered equal by
				// LLVM, so we must continue getting the element types of each pointer type
				// until one is no longer a pointer type. Then do an equality check.
				//
				// Legal Case: GepType = i32*, LoadFromTy = i32,
				// LoadType = i32, LoadToTy = i32
				//
				// %vec.b.elem.2 = load i32, i32* %vec.b.cast.gep1
				//
				// Ready to compare as is
				//
				// Illegal Case: GepType = i32*, LoadFromTy = i32
				// LoadType = i32, LoadToTy = i32
				//
				// %vec.b.elem.2 = load i32, i32** %vec.b.cast.gep1
				//
				// This case arises due to differences in the LLVM IR at -O0 and >= -O1.
				// For >= -O1, Mem2Reg registerizes parameters and there are no alloca
				// instructions created for function parameters. At -O0, vector parameters
				// are expanded and we modify the existing alloca that was used for the scalar
				// parameter. When there is no alloca for vector parameters, we must create
				// one for them. Thus, we have introduced an additional level of indirection
				// for users of parameters at >= -O1. This can become a problem for load
				// instructions and results in this illegal case. This function helps to
				// check that we are not attempting to do an extra level of indirection
				// within the load instructions for elements of vector parameters in the
				// simd loop. If an illegal case is encountered, an additional load is
				// inserted to account for the extra level of indirection and any users are
				// updated accordingly.

				while (LoadFromTy->getTypeID() == Type::PointerTyID &&
				LoadToTy->getTypeID() == Type::PointerTyID) {

				PointerType *FromPtrTy = cast<PointerType>(LoadFromTy);
				PointerType *ToPtrTy = cast<PointerType>(LoadToTy);

				LoadFromTy = FromPtrTy->getElementType();
				LoadToTy = ToPtrTy->getElementType();
				}

				if (LoadFromTy->getTypeID() == LoadToTy->getTypeID()) {
				return true;
				}

				return false;
				}

				void VecClone::updateScalarMemRefsWithVector(
				Function Clone, Function &F, BasicBlock EntryBlock,
				BasicBlock ReturnBlock, PHINode Phi,
				std::vector<ParmRef *> &VectorParmMap) {
				// This function replaces the old scalar uses of a parameter with a reference
				// to the new vector one. A gep is inserted using the vector bitcast created
				// in the entry block and any uses of the parameter are replaced with this
				// gep. The only users that will not be updated are those in the entry block
				// that do the initial store to the vector alloca of the parameter.

				std::vector<ParmRef *>::iterator VectorParmMapIt;

				for (auto VectorParmMapIt : VectorParmMap) {

				SmallVector<Instruction *, 4> InstsToUpdate;
				Value *Parm = VectorParmMapIt->VectorParm;
				Instruction *Cast = VectorParmMapIt->VectorParmCast;

				for (User *U : Parm->users()) {
				InstsToUpdate.push_back(dyn_cast<Instruction>(U));
				}

				for (unsigned I = 0; I < InstsToUpdate.size(); ++I) {

				Instruction *User = InstsToUpdate[I];
				if (!(dyn_cast<StoreInst>(User) && User->getParent() == EntryBlock)) {

				BitCastInst *BitCast = dyn_cast<BitCastInst>(Cast);
				PointerType *BitCastType = dyn_cast<PointerType>(BitCast->getType());
				Type *PointeeType = BitCastType->getElementType();

				GetElementPtrInst *VecGep = GetElementPtrInst::Create(
				PointeeType, BitCast, Phi, BitCast->getName() + ".gep", User);

				unsigned NumOps = User->getNumOperands();
				for (unsigned I = 0; I < NumOps; ++I) {
				if (User->getOperand(I) == Parm) {

				bool TypesAreCompatible = false;

				if (isa<LoadInst>(User)) {
				TypesAreCompatible =
				typesAreCompatibleForLoad(VecGep->getType(), User->getType());
				}

				if ((isa<LoadInst>(User) && TypesAreCompatible) \|\|
				isa<StoreInst>(User)) {
				// If the user is a load/store and the dereferencing is legal,
				// then just modify the load/store operand to use the gep.
				User->setOperand(I, VecGep);
				} else {
				// Otherwise, we need to load the value from the gep first before
				// using it. This effectively loads the particular element from
				// the vector parameter.
				LoadInst *ParmElemLoad =
				new LoadInst(VecGep, "vec." + Parm->getName() + ".elem");
				ParmElemLoad->insertAfter(VecGep);
				User->setOperand(I, ParmElemLoad);
				}
				}
				}
				} else {
				// The user is the parameter store to alloca in the entry block. Replace
				// the old scalar alloca with the new vector one.
				AllocaInst *VecAlloca = dyn_cast<AllocaInst>(Cast->getOperand(0));
				User->setOperand(1, VecAlloca);
				}
				}
				}

				DEBUG(dbgs() << "After Alloca Replacement\n");
				DEBUG(Clone->dump());
				}

				Instruction VecClone::generateStrideForParameter(Function Clone,
				Argument *Arg,
				Instruction *ParmUser,
				int Stride, PHINode *Phi) {
				// For linear values, a mul/add sequence is needed to generate the correct
				// value. i.e., val = linear_var * stride + loop_index;
				//
				// StrideInst is returned as the last instruction needed to update the users
				// of the old parameter reference.
				Instruction *StrideInst = nullptr;

				// The phi for the loop index is generated by this pass as i32, which is
				// why the mul instruction is i32.
				Constant *StrideConst =
				ConstantInt::get(Type::getInt32Ty(Clone->getContext()), Stride);

				Instruction *Mul = BinaryOperator::CreateMul(StrideConst, Phi, "stride.mul");

				// Insert the stride related instructions after the user if the instruction
				// involves a redefinition of the parameter. For example, a load from the
				// parameter's associated alloca or a cast. For these situations, we want to
				// apply the stride to this Value. For other instructions, e.g., add, the
				// instruction computing the stride must be inserted before the usage of it.

				if (!isa<UnaryInstruction>(ParmUser)) {
				Mul->insertBefore(ParmUser);
				} else {
				Mul->insertAfter(ParmUser);
				}

				if (Arg->getType()->isPointerTy()) {

				// Linear updates to pointer parameters involves an address calculation, so
				// use gep. To properly update linear pointers we only need to multiply the
				// loop index and stride since gep is indexed starting at 0 from the base
				// address passed to the vector function.
				PointerType *ParmPtrType = dyn_cast<PointerType>(Arg->getType());

				// The base address used for linear gep computations.
				Value *BaseAddr = nullptr;
				StringRef RefName;

				if (LoadInst *ParmLoad = dyn_cast<LoadInst>(ParmUser)) {
				// We are loading from the alloca of the pointer parameter (no Mem2Reg)
				// i.e., loading a pointer to another Value.
				BaseAddr = ParmUser;
				RefName = ParmLoad->getOperand(0)->getName();
				} else {
				// The user is using the pointer parameter directly.
				BaseAddr = Arg;
				RefName = BaseAddr->getName();
				}

				// Mul is always generated as i32 since it is calculated using the i32 loop
				// phi that is inserted by this pass. No cast on Mul is necessary because
				// gep can use a base address of one type with an index of another type.
				GetElementPtrInst *LinearParmGep = GetElementPtrInst::Create(
				ParmPtrType->getElementType(), BaseAddr, Mul, RefName + ".gep");

				LinearParmGep->insertAfter(Mul);
				StrideInst = LinearParmGep;
				} else {
				// Note: the phi for the loop index is generated by this pass as i32.
				// Also, Mul above is generated as i32 because the phi type is always i32.
				// However, ParmUser may be another type, so
				//
				// Generate the instruction that computes the stride.
				//
				// Example of applying stride:
				//
				// define float @dowork(float* nocapture readonly %a, float %b, i64 %k) {
				// entry:
				// %arrayidx = getelementptr inbounds float, float* %a, i64 %k
				// %0 = load float, float* %arrayidx, align 4, !tbaa !2
				// %call = tail call float @sinf(float %0) #4
				// %add = fadd float %call, %b
				// %conv = sitofp i64 %k to float
				// %add1 = fadd float %add, %conv
				// ret float %add1
				// }
				//
				// Case 1: stride for %k must be applied to %conv after %conv so that
				// %add1 gets the updated stride value.
				//
				// Case 2: stride for %k in %arrayidx must be applied before the gep,
				// so that the correct index is used. This is the same for other
				// non-unary type instructions.
				//
				BinaryOperator *Add;
				Value *StrideVal;
				Type *StrideTy;
				StringRef TempName = "stride.add";
				if (isa<UnaryInstruction>(ParmUser)) {
				// Case 1
				StrideVal = ParmUser;
				StrideTy = ParmUser->getType();
				} else {
				// Case 2
				StrideVal = Arg;
				StrideTy = Arg->getType();
				}

				// Stride calculations may need typecasting since the index multiply
				// is i32, but the users of the linear value may not be i32.
				//
				// Example (stride applied to %conv):
				//
				// %conv = sitofp i64 %k to float
				// %stride.mul = mul i32 1, %index
				// %stride.cast = bitcast i32 %stride.mul to float
				// %stride.add = fadd float %conv, %stride.cast
				// %add1 = fadd float %add, %stride.add
				//
				if (StrideTy != Mul->getType()) {
				Instruction *MulConv =
				CastInst::CreateSExtOrBitCast(Mul, StrideTy, "stride.cast");
				MulConv->insertAfter(Mul);
				Mul = MulConv;
				}

				if (StrideTy->isFloatingPointTy()) {
				Add = BinaryOperator::CreateFAdd(StrideVal, Mul, TempName);
				} else if (StrideTy->isIntegerTy()) {
				Add = BinaryOperator::CreateAdd(StrideVal, Mul, TempName);
				} else {
				llvm_unreachable("Expected integer or floating point type");
				}

				Add->insertAfter(Mul);
				StrideInst = Add;
				}

				return StrideInst;
				}

				void VecClone::updateLinearReferences(Function *Clone, Function &F,
				VectorVariant &V, PHINode *Phi) {
				// Add stride to parameters marked as linear. This is done by finding all
				// users of the scalar alloca associated with the parameter. The user should
				// be a load from this alloca to a temp. The stride is then added to this temp
				// and its uses are replaced with the new temp. Or, if Mem2Reg eliminates the
				// alloca/load, the parameter is used directly and this use is updated with
				// the stride.

				Function::arg_iterator ArgListIt = Clone->arg_begin();
				Function::arg_iterator ArgListEnd = Clone->arg_end();
				std::vector<VectorKind> ParmKinds = V.getParameters();

				for (; ArgListIt != ArgListEnd; ++ArgListIt) {

				User::user_iterator ArgUserIt = ArgListIt->user_begin();
				User::user_iterator ArgUserEnd = ArgListIt->user_end();
				unsigned ParmIdx = ArgListIt->getArgNo();
				SmallVector<Instruction *, 4> LinearParmUsers;

				if (ParmKinds[ParmIdx].isLinear()) {

				int Stride = ParmKinds[ParmIdx].getStride();

				for (; ArgUserIt != ArgUserEnd; ++ArgUserIt) {

				// Collect all uses of the parameter so that they can later be used to
				// apply the stride.
				Instruction ParmUser = dyn_cast<Instruction>(ArgUserIt);
				if (StoreInst *ParmStore = dyn_cast<StoreInst>(ParmUser)) {

				// This code traces the store of the parameter to its associated
				// alloca. Then, we look for a load from that alloca to a temp. This
				// is the value we need to add the stride to. This is for when
				// Mem2Reg has not been run.
				AllocaInst *Alloca = dyn_cast<AllocaInst>(ArgUserIt->getOperand(1));

				if (Alloca) {
				for (auto *AU : Alloca->users()) {

				LoadInst *ParmLoad = dyn_cast<LoadInst>(AU);

				if (ParmLoad) {
				// The parameter is being loaded from an alloca to a new SSA
				// temp. We must replace the users of this load with an
				// instruction that adds the result of this load with the
				// stride.
				LinearParmUsers.push_back(ParmLoad);
				}
				}
				} else {
				// Mem2Reg has run, so the parameter is directly referenced in the
				// store instruction.
				LinearParmUsers.push_back(ParmStore);
				}
				} else {
				// Mem2Reg has registerized the parameters, so users of it will use
				// it directly, and not through a load of the parameter.
				LinearParmUsers.push_back(ParmUser);
				}
				}

				for (unsigned I = 0; I < LinearParmUsers.size(); I++) {
				// For each user of parameter:
				//
				// We must deal with two cases here, based on whether Mem2Reg has been
				// run.
				//
				// Example:
				//
				// __declspec(vector(linear(i:1),uniform(x),vectorlength(4)))
				// extern int foo(int i, int x) {
				// return (x + i);
				// }
				//
				// 1) We are loading the parameter from an alloca and the SSA temp as
				// as a result of the load is what we need to add the stride to.
				// Then, any users of that temp must be replaced. The only load
				// instructions put in the collection above are guaranteed to be
				// associated with the parameter's alloca. Thus, we only need to
				// check to see if a load is in the map to know what to do.
				//
				// Before Linear Update:
				//
				// simd.loop: ; preds = %simd.loop.exit, %entry
				// %index = phi i32 [ 0, %entry ], [ %indvar, %simd.loop.exit ]
				// store i32 %x, i32* %x.addr, align 4
				// %0 = load i32, i32* %x.addr, align 4
				// %1 = load i32, i32* %i.addr, align 4 <--- %i
				// %add = add nsw i32 %0, %1 <--- replace %1 with stride
				// %ret.cast.gep = getelementptr i32, i32* %ret.cast, i32 %index
				// store i32 %add, i32* %ret.cast.gep
				// br label %simd.loop.exit
				//
				// After Linear Update:
				//
				// simd.loop: ; preds = %simd.loop.exit, %entry
				// %index = phi i32 [ 0, %entry ], [ %indvar, %simd.loop.exit ]
				// store i32 %x, i32* %x.addr, align 4
				// %0 = load i32, i32* %x.addr, align 4
				// %1 = load i32, i32* %i.addr, align 4
				// %stride.mul = mul i32 1, %index
				// %stride.add = add i32 %1, %stride.mul <--- stride
				// %add = add nsw i32 %0, %stride.add <--- new %i with stride
				// %ret.cast.gep = getelementptr i32, i32* %ret.cast, i32 %index
				// store i32 %add, i32* %ret.cast.gep
				// br label %simd.loop.exit
				//
				// 2) The user uses the parameter directly, and so we must apply the
				// stride directly to the parameter. Any users of the parameter
				// must then be updated.
				//
				// Before Linear Update:
				//
				// simd.loop: ; preds = %simd.loop.exit, %entry
				// %index = phi i32 [ 0, %entry ], [ %indvar, %simd.loop.exit ]
				// %add = add nsw i32 %x, %i <-- direct usage of %i
				// %ret.cast.gep = getelementptr i32, i32* %ret.cast, i32 %index
				// store i32 %add, i32* %ret.cast.gep
				// br label %simd.loop.exit
				//
				// After Linear Update:
				//
				// simd.loop: ; preds = %simd.loop.exit, %entry
				// %index = phi i32 [ 0, %entry ], [ %indvar, %simd.loop.exit ]
				// %stride.mul = mul i32 1, %index
				// %stride.add = add i32 %i, %stride.mul <--- stride
				// %add = add nsw i32 %x, %stride.add <--- new %i with stride
				// %ret.cast.gep = getelementptr i32, i32* %ret.cast, i32 %index
				// store i32 %add, i32* %ret.cast.gep
				// br label %simd.loop.exit

				Instruction *StrideInst = generateStrideForParameter(
				Clone, &*ArgListIt, LinearParmUsers[I], Stride, Phi);

				SmallVector<Instruction *, 4> InstsToUpdate;
				Value *ParmUser;

				if (isa<UnaryInstruction>(LinearParmUsers[I])) {
				// Case 1
				ParmUser = LinearParmUsers[I];
				User::user_iterator StrideUserIt = LinearParmUsers[I]->user_begin();
				User::user_iterator StrideUserEnd = LinearParmUsers[I]->user_end();

				// Find the users of the redefinition of the parameter so that we
				// can apply the stride to those instructions.
				for (; StrideUserIt != StrideUserEnd; ++StrideUserIt) {
				Instruction StrideUser = dyn_cast<Instruction>(StrideUserIt);
				if (StrideUser != StrideInst) {
				// We've already inserted the stride which is now also a user of
				// the parameter, so don't update that instruction. Otherwise,
				// we'll create a self reference. Hence, why we don't use
				// replaceAllUsesWith().
				InstsToUpdate.push_back(StrideUser);
				}
				}
				} else {
				// Case 2
				ParmUser = &*ArgListIt;
				InstsToUpdate.push_back(LinearParmUsers[I]);
				}

				// Replace the old references to the parameter with the instruction
				// that applies the stride.
				for (unsigned J = 0; J < InstsToUpdate.size(); ++J) {
				unsigned NumOps = InstsToUpdate[J]->getNumOperands();
				for (unsigned K = 0; K < NumOps; ++K) {
				if (InstsToUpdate[J]->getOperand(K) == ParmUser) {
				InstsToUpdate[J]->setOperand(K, StrideInst);
				}
				}
				}
				}
				}
				}

				DEBUG(dbgs() << "After Linear Updates\n");
				DEBUG(Clone->dump());
				}

				void VecClone::updateReturnBlockInstructions(Function *Clone,
				BasicBlock *ReturnBlock,
				Instruction *ExpandedReturn) {
				// If the vector function returns void, then there is no need to do any
				// packing. The only instruction in the ReturnBlock is 'ret void', so
				// we can just leave this instruction and we're done.
				if (Clone->getReturnType()->isVoidTy())
				return;

				// Collect all instructions in the return basic block. They will be removed.
				SmallVector<Instruction *, 4> InstToRemove;
				BasicBlock::iterator InstIt = ReturnBlock->begin();
				BasicBlock::iterator InstEnd = ReturnBlock->end();

				for (; InstIt != InstEnd; ++InstIt) {
				InstToRemove.push_back(&*InstIt);
				}

				// Remove all instructions from the return block. These will be replaced
				// with the instructions necessary to return a vector temp. The verifier
				// will complain if we remove the definitions of users first, so remove
				// instructions from the bottom up.
				for (int I = InstToRemove.size() - 1; I >= 0; I--) {
				InstToRemove[I]->eraseFromParent();
				}
				mehdi_aminiUnsubmitted Not Done Reply Inline Actions So this will walk all the functions in the module and walk all the attributes and do a string comparison on all of these. I'm not sure if it is fine to pay this when this pass has nothing to do (i.e. the early exit should be fast). mehdi_amini: So this will walk all the functions in the module and walk all the attributes and do a string…
				mmastenAuthorUnsubmitted Not Done Reply Inline Actions We could selectively run the VecClone pass from PassManagerBuilder based on whether the OpenMP switch has been used. Otherwise, I don't know of another way to figure out which functions will need to be generated. Do you have a suggestion? mmasten: We could selectively run the VecClone pass from PassManagerBuilder based on whether the OpenMP…

				// Pack up the elements into a vector temp and return it. If the return
				// vector was bitcast to a pointer to the element type, we must bitcast to
				// vector before returning.
				mehdi_aminiUnsubmitted Not Done Reply Inline Actions Coding style: no braces (other places as well). mehdi_amini: Coding style: no braces (other places as well).
				Instruction *Return;
				if (dyn_cast<BitCastInst>(ExpandedReturn)) {
				// Operand 0 is the actual alloc reference in the bitcast.
				mehdi_aminiUnsubmitted Not Done Reply Inline Actions Spurious empty line (other places as well). mehdi_amini: Spurious empty line (other places as well).
				AllocaInst *Alloca = dyn_cast<AllocaInst>(ExpandedReturn->getOperand(0));
				PointerType *PtrVecType = PointerType::get(
				Clone->getReturnType(), Alloca->getType()->getAddressSpace());
				BitCastInst *BitCast =
				new BitCastInst(ExpandedReturn, PtrVecType,
				"vec." + ExpandedReturn->getName(), ReturnBlock);
				Return = BitCast;
				} else {
				Return = ExpandedReturn;
				}

				LoadInst *VecReturn = new LoadInst(Return, "vec.ret", ReturnBlock);
				ReturnInst::Create(Clone->getContext(), VecReturn, ReturnBlock);

				DEBUG(dbgs() << "After Return Block Update\n");
				DEBUG(Clone->dump());
				}

				int VecClone::getParmIndexInFunction(Function F, Value Parm) {
				Function::arg_iterator ArgIt = F->arg_begin();
				Function::arg_iterator ArgEnd = F->arg_end();
				for (unsigned Idx = 0; ArgIt != ArgEnd; ++ArgIt, ++Idx) {
				if (Parm == &*ArgIt)
				return Idx;
				}

				return -1;
				}

				bool VecClone::isSimpleFunction(Function *Clone, VectorVariant &V,
				ReturnInst *ReturnOnly) {
				// For really simple functions, there is no need to go through the process
				// of inserting a loop.

				// Example:
				//
				// void foo(void) {
				// return;
				// }
				//
				// No need to insert a loop for this case since it's basically a no-op. Just
				// clone the function and return. It's possible that we could have some code
				// inside of a vector function that modifies global memory. Let that case go
				// through.
				if (ReturnOnly && Clone->getReturnType()->isVoidTy()) {
				return true;
				}

				mehdi_aminiUnsubmitted Not Done Reply Inline Actions What about a SmallVector and move it out-of-the loop (the call to `clear()` below is already handling the reset between iterations). mehdi_amini: What about a SmallVector and move it out-of-the loop (the call to `clear()` below is already…
				return false;
				}

				void VecClone::insertSplitForMaskedVariant(Function *Clone,
				BasicBlock *LoopBlock,
				BasicBlock *LoopExitBlock,
				Instruction Mask, PHINode Phi) {
				BasicBlock *LoopThenBlock =
				LoopBlock->splitBasicBlock(LoopBlock->getFirstNonPHI(), "simd.loop.then");

				BasicBlock *LoopElseBlock = BasicBlock::Create(
				Clone->getContext(), "simd.loop.else", Clone, LoopExitBlock);

				BranchInst::Create(LoopExitBlock, LoopElseBlock);

				BitCastInst *BitCast = dyn_cast<BitCastInst>(Mask);
				PointerType *BitCastType = dyn_cast<PointerType>(BitCast->getType());
				Type *PointeeType = BitCastType->getElementType();

				GetElementPtrInst *MaskGep = GetElementPtrInst::Create(
				PointeeType, Mask, Phi, "mask.gep", LoopBlock->getTerminator());

				LoadInst *MaskLoad =
				new LoadInst(MaskGep, "mask.parm", LoopBlock->getTerminator());

				Type *CompareTy = MaskLoad->getType();
				Instruction *MaskCmp;
				Constant *Zero;

				// Generate the compare instruction to see if the mask bit is on. In ICC, we
				// use the movemask intrinsic which takes both float/int mask registers and
				// converts to an integer scalar value, one bit representing each element.
				// AVR construction will be complicated if this intrinsic is introduced here,
				// so the current solution is to just generate either an integer or floating
				// point compare instruction for now. This may change anyway if we decide to
				// go to a vector of i1 values for the mask. I suppose this would be one
				// positive reason to use vector of i1.
				if (CompareTy->isIntegerTy()) {
				Zero = getConstantValue(CompareTy, Clone->getContext(), 0);
				MaskCmp = new ICmpInst(LoopBlock->getTerminator(), CmpInst::ICMP_NE,
				MaskLoad, Zero, "mask.cond");
				} else if (CompareTy->isFloatingPointTy()) {
				Zero = getConstantValue(CompareTy, Clone->getContext(), 0.0);
				MaskCmp = new FCmpInst(LoopBlock->getTerminator(), CmpInst::FCMP_UNE,
				MaskLoad, Zero, "mask.cond");
				} else {
				assert(0 && "Unsupported mask compare");
				}

				TerminatorInst *Term = LoopBlock->getTerminator();
				Term->eraseFromParent();
				BranchInst::Create(LoopThenBlock, LoopElseBlock, MaskCmp, LoopBlock);

				DEBUG(dbgs() << "After Split Insertion For Masked Variant\n");
				DEBUG(Clone->dump());
				}

				void VecClone::removeScalarAllocasForVectorParams(
				std::vector<ParmRef *> &VectorParmMap) {
				std::vector<ParmRef *>::iterator VectorParmMapIt;

				for (auto VectorParmMapIt : VectorParmMap) {
				Value *Parm = VectorParmMapIt->VectorParm;
				if (AllocaInst *ScalarAlloca = dyn_cast<AllocaInst>(Parm)) {
				ScalarAlloca->eraseFromParent();
				}
				}
				}

				void VecClone::disableLoopUnrolling(BasicBlock *Latch) {
				// Set disable unroll metadata on the conditional branch of the loop latch
				// for the simd loop. The following is an example of what the loop latch
				// and Metadata will look like. The !llvm.loop marks the beginning of the
				// loop Metadata and is always placed on the terminator of the loop latch.
				// (i.e., simd.loop.exit in this case). According to LLVM documentation, to
				// properly set the loop Metadata, the 1st operand of !16 must be a self-
				// reference to avoid some type of Metadata merging conflicts that have
				// apparently arisen in the past. This is part of LLVM history that I do not
				// know. Also, according to LLVM documentation, any Metadata nodes referring
				// to themselves are marked as distinct. As such, all Metadata corresponding
				// to a loop belongs to that loop alone and no sharing of Metadata can be
				// done across different loops.
				//
				// simd.loop.exit: ; preds = %simd.loop, %if.else, %if.then
				// %indvar = add nuw i32 %index, 1
				// %vl.cond = icmp ult i32 %indvar, 2
				// br i1 %vl.cond, label %simd.loop, label %simd.end.region, !llvm.loop !16
				//
				// !16 = distinct !{!16, !17}
				// !17 = !{!"llvm.loop.unroll.disable"}

				SmallVector<Metadata *, 4> MDs;

				// Reserve first location for self reference to the LoopID metadata node.
				MDs.push_back(nullptr);

				// Add unroll(disable) metadata to disable future unrolling.
				LLVMContext &Context = Latch->getContext();
				SmallVector<Metadata *, 1> DisableOperands;
				DisableOperands.push_back(MDString::get(Context, "llvm.loop.unroll.disable"));
				MDNode *DisableNode = MDNode::get(Context, DisableOperands);
				MDs.push_back(DisableNode);

				MDNode *NewLoopID = MDNode::get(Context, MDs);
				// Set operand 0 to refer to the loop id itself.
				NewLoopID->replaceOperandWith(0, NewLoopID);
				Latch->getTerminator()->setMetadata("llvm.loop", NewLoopID);
				}

				bool VecClone::runOnModule(Module &M) {

				DEBUG(dbgs() << "\nExecuting SIMD Function Cloning ...\n\n");

				std::map<Function *, std::vector<StringRef>> FunctionsToVectorize;
				getFunctionsToVectorize(M, FunctionsToVectorize);

				// VectorParmMap contains the mapping of the parameter to the bitcast
				// instruction that casts the vector alloca for vector parameters to a scalar
				// pointer for use in the simd loop. When parameters are registerized, the
				// Value* in the map correponds directly to the function parameter. When
				// parameters are not registerized, then the Value* in the map is the original
				// scalar alloca before expansion. Later, users of the parameter, either
				// directly or through the alloca, are replaced with a gep using the bitcast
				// of the vector alloca for the parameter and the current loop induction
				// variable value.
				//
				// IMPORTANT NOTE: std::vector was used here because later we emit LLVM
				// instructions using the members of ParmRef, and these instructions should be
				// ordered consistently for easier testability.

				std::vector<ParmRef *> VectorParmMap;

				std::map<Function *, std::vector<StringRef>>::iterator VarIt;
				std::map<Function *, std::vector<StringRef>>::iterator VarEnd;
				for (VarIt = FunctionsToVectorize.begin(),
				VarEnd = FunctionsToVectorize.end();
				VarIt != VarEnd; ++VarIt) {

				Function &F = *(VarIt->first);
				std::vector<StringRef> Variants = VarIt->second;
				TargetTransformInfo *TTI =
				&getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);

				for (unsigned i = 0; i < Variants.size(); i++) {

				VectorVariant Variant(Variants[i], TTI);

				// Clone the original function.
				DEBUG(dbgs() << "Before SIMD Function Cloning\n");
				DEBUG(F.dump());
				Function *Clone = CloneFunction(F, Variant);
				Function::iterator EntryBlock = Clone->begin();
				BasicBlock::iterator FirstInst = EntryBlock->begin();
				ReturnInst *ReturnOnly = dyn_cast<ReturnInst>(FirstInst);

				if (isSimpleFunction(Clone, Variant, ReturnOnly)) {
				continue;
				}

				BasicBlock LoopBlock = splitEntryIntoLoop(Clone, Variant, &EntryBlock);
				BasicBlock *ReturnBlock = splitLoopIntoReturn(Clone, &Clone->back());
				BasicBlock *LoopExitBlock = createLoopExit(Clone, ReturnBlock);
				PHINode Phi = createPhiAndBackedgeForLoop(Clone, &EntryBlock, LoopBlock,
				LoopExitBlock, ReturnBlock,
				Variant.getVlen());

				// At this point, we've gathered some parameter information and have
				// restructured the function into an entry block, a set of blocks
				// forming the loop, a loop exit block, and a return block. Now,
				// we can go through and update instructions since we know what
				// is part of the loop.

				// Create a new vector alloca instruction for all vector parameters and
				// return. For parameters, replace the initial store to the old alloca
				// with the vector one. Users of the old alloca within the loop will be
				// replaced with a gep using this address along with the proper loop
				// index.

				Instruction *Mask = NULL;
				Instruction *ExpandedReturn = expandVectorParametersAndReturn(
				Clone, Variant, &Mask, &*EntryBlock, LoopBlock, ReturnBlock,
				VectorParmMap);
				updateScalarMemRefsWithVector(Clone, F, &*EntryBlock, ReturnBlock, Phi,
				VectorParmMap);

				// Update any linear variables with the appropriate stride. This function
				// will insert a mul/add sequence before the use of the parameter. For
				// linear pointer parameters, the stride calculation is just a mul
				// instruction using the loop induction var and the stride value on the
				// parameter. This mul instruction is then used as the index of the gep
				// that will be inserted before the next use of the parameter. The
				// function also updates the users of the parameter with the new
				// calculation involving the stride.
				updateLinearReferences(Clone, F, Variant, Phi);

				// Remove the old scalar instructions associated with the return and
				// replace with packing instructions.
				updateReturnBlockInstructions(Clone, ReturnBlock, ExpandedReturn);

				// Remove the old scalar allocas associated with vector parameters since
				// these have now been replaced with vector ones.
				removeScalarAllocasForVectorParams(VectorParmMap);

				for (auto *Parm : VectorParmMap) {
				delete Parm;
				}
				VectorParmMap.clear();

				// If this is the masked vector variant, insert the mask condition and
				// if/else blocks.
				if (Variant.isMasked()) {
				insertSplitForMaskedVariant(Clone, LoopBlock, LoopExitBlock, Mask, Phi);
				}

				DEBUG(dbgs() << "After SIMD Function Cloning\n");
				DEBUG(Clone->dump());

				// Disable unrolling from kicking in on the simd loop.
				disableLoopUnrolling(LoopExitBlock);

				} // End of function cloning for the variant
				} // End of function cloning for all variants

				return true; // LLVM IR has been modified
				}

				void VecClone::print(raw_ostream &OS, const Module *M) const {
				// TODO
				}

				ModulePass *llvm::createVecClonePass() { return new llvm::VecClone(); }

				char VecClone::ID = 0;

				static const char lv_name[] = "VecClone";
				INITIALIZE_PASS_BEGIN(VecClone, SV_NAME, lv_name, false /* modifies CFG */,
				false /* transform pass */)
				INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
				INITIALIZE_PASS_END(VecClone, SV_NAME, lv_name, false /* modififies CFG */,
				false /* transform pass */)

test/Transforms/VecClone/all_parm_types.ll

				; Test all different kinds of parameters (uniform, linear, vector), multiple uses of linear k, and that stride calculations can handle type conversions.

				; RUN: opt -vec-clone -S < %s \| FileCheck %s

				; CHECK-LABEL: @_ZGVbN4uvl_dowork
				; CHECK: simd.loop:
				; CHECK: %stride.mul{{.*}} = mul i32 1, %index
				; CHECK: %stride.cast{{.}} = sext i32 %stride.mul{{.}}
				; CHECK: %stride.add{{.}} = add i64 %k, %stride.cast{{.}}
				; CHECK: %arrayidx = getelementptr inbounds float, float* %a, i64 %stride.add{{.*}}
				; CHECK: %stride.mul{{.*}} = mul i32 1, %index
				; CHECK: %stride.cast{{.}} = bitcast i32 %stride.mul{{.}} to float
				; CHECK: %stride.add{{.}} = fadd float %conv, %stride.cast{{.}}
				; CHECK: %add{{.}} = fadd float %add, %stride.add{{.}}

				; ModuleID = 'rfc.c'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: nounwind uwtable
				define float @dowork(float* %a, float %b, i64 %k) #0 {
				entry:
				%arrayidx = getelementptr inbounds float, float* %a, i64 %k
				%0 = load float, float* %arrayidx, align 4, !tbaa !2
				%call = call float @sinf(float %0) #5
				%add = fadd float %call, %b
				%conv = sitofp i64 %k to float
				%add1 = fadd float %add, %conv
				ret float %add1
				}

				; Function Attrs: nounwind
				declare float @sinf(float) #1

				attributes #0 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" "vector-variants"="_ZGVbN4uvl_dowork,_ZGVcN8uvl_dowork,_ZGVdN8uvl_dowork,_ZGVeN16uvl_dowork,_ZGVbM4uvl_dowork,_ZGVcM8uvl_dowork,_ZGVdM8uvl_dowork,_ZGVeM16uvl_dowork" }
				attributes #1 = { nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

				!llvm.module.flags = !{!0}
				!llvm.ident = !{!1}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{!"clang version 6.0.0 (trunk 316400)"}
				!2 = !{!3, !3, i64 0}
				!3 = !{!"float", !4, i64 0}
				!4 = !{!"omnipotent char", !5, i64 0}
				!5 = !{!"Simple C/C++ TBAA"}

test/Transforms/VecClone/broadcast.ll

				; Check broadcast of a constant. The store of the constant should be moved inside of the loop.

				; RUN: opt -vec-clone -S < %s \| FileCheck %s

				; CHECK-LABEL: @_ZGVbN4_foo
				; CHECK: simd.loop:
				; CHECK: store i32 99, i32* %ret.cast.gep

				; ModuleID = 'foo.c'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: nounwind uwtable
				define i32 @foo() #0 {
				entry:
				ret i32 99
				}

				attributes #0 = { norecurse nounwind readnone uwtable "vector-variants"="_ZGVbM4_foo,_ZGVbN4_foo,_ZGVcM8_foo,_ZGVcN8_foo,_ZGVdM8_foo,_ZGVdN8_foo,_ZGVeM16_foo,_ZGVeN16_foo" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

test/Transforms/VecClone/convert_linear.ll

				; Check handling of upconverting a linear (variable %i) to ensure stride calculation
				; is inserted correctly and the old convert (sext) uses the stride instead of the old
				; reference to %i.

				; RUN: opt -vec-clone -S < %s \| FileCheck %s

				; CHECK-LABEL: @_ZGVbN2vl_foo
				; CHECK: simd.loop:
				; CHECK: %0 = load i32, i32* %i.addr
				; CHECK-NEXT: %stride.mul = mul i32 1, %index
				; CHECK-NEXT: %stride.add = add i32 %0, %stride.mul
				; CHECK-NEXT: %conv = sext i32 %stride.add to i64

				; ModuleID = 'convert.c'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: nounwind uwtable
				define i64 @foo(i64 %x, i32 %i) #0 {
				entry:
				%x.addr = alloca i64, align 8
				%i.addr = alloca i32, align 4
				store i64 %x, i64* %x.addr, align 8
				store i32 %i, i32* %i.addr, align 4
				%0 = load i32, i32* %i.addr, align 4
				%conv = sext i32 %0 to i64
				%1 = load i64, i64* %x.addr, align 8
				%add = add nsw i64 %conv, %1
				ret i64 %add
				}

				attributes #0 = { norecurse nounwind readnone uwtable "vector-variants"="_ZGVbM2vl_foo,_ZGVbN2vl_foo,_ZGVcM4vl_foo,_ZGVcN4vl_foo,_ZGVdM4vl_foo,_ZGVdN4vl_foo,_ZGVeM8vl_foo,_ZGVeN8vl_foo" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

test/Transforms/VecClone/external_array.ll

				; Check to see that we are applying the correct updated linear index for an external array access gep.

				; RUN: opt -vec-clone -S < %s \| FileCheck %s

				; CHECK-LABEL: @_ZGVbN4ul_foo
				; CHECK: simd.loop:
				; CHECK: %1 = load i32, i32* %i.addr
				; CHECK: %stride.mul = mul i32 1, %index
				; CHECK: %stride.add = add i32 %1, %stride.mul
				; CHECK: %idxprom = sext i32 %stride.add to i64
				; CHECK: %arrayidx = getelementptr inbounds [128 x i32], [128 x i32]* @ext_a, i64 0, i64 %idxprom
				; CHECK: store i32 %0, i32* %arrayidx

				; ModuleID = 'external_array_assign.c'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@ext_a = common global [128 x i32] zeroinitializer, align 16

				; Function Attrs: nounwind uwtable
				define void @foo(i32 %x, i32 %i) #0 {
				entry:
				%x.addr = alloca i32, align 4
				%i.addr = alloca i32, align 4
				store i32 %x, i32* %x.addr, align 4
				store i32 %i, i32* %i.addr, align 4
				%0 = load i32, i32* %x.addr, align 4
				%1 = load i32, i32* %i.addr, align 4
				%idxprom = sext i32 %1 to i64
				%arrayidx = getelementptr inbounds [128 x i32], [128 x i32]* @ext_a, i64 0, i64 %idxprom
				store i32 %0, i32* %arrayidx, align 4
				ret void
				}

				attributes #0 = { norecurse nounwind uwtable "vector-variants"="_ZGVbM4ul_foo,_ZGVbN4ul_foo,_ZGVcM8ul_foo,_ZGVcN8ul_foo,_ZGVdM8ul_foo,_ZGVdN8ul_foo,_ZGVeM16ul_foo,_ZGVeN16ul_foo" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

test/Transforms/VecClone/linear.ll

				; Check to see that the linear parameter i is updated with the correct stride, indicated by a mul/add instruction sequence after the load.

				; RUN: opt -vec-clone -S < %s \| FileCheck %s

				; CHECK-LABEL: @_ZGVbN4lu_foo
				; CHECK: simd.loop:
				; CHECK: %1 = load i32, i32* %i.addr
				; CHECK: %stride.mul = mul i32 1, %index
				; CHECK: %stride.add = add i32 %1, %stride.mul
				; CHECK: %add = add nsw i32 %0, %stride.add

				; ModuleID = 'linear.c'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: nounwind uwtable
				define i32 @foo(i32 %i, i32 %x) #0 {
				entry:
				%i.addr = alloca i32, align 4
				%x.addr = alloca i32, align 4
				store i32 %i, i32* %i.addr, align 4
				store i32 %x, i32* %x.addr, align 4
				%0 = load i32, i32* %x.addr, align 4
				%1 = load i32, i32* %i.addr, align 4
				%add = add nsw i32 %0, %1
				ret i32 %add
				}

				attributes #0 = { norecurse nounwind readnone uwtable "vector-variants"="_ZGVbM4lu_foo,_ZGVbN4lu_foo,_ZGVcM8lu_foo,_ZGVcN8lu_foo,_ZGVdM8lu_foo,_ZGVdN8lu_foo,_ZGVeM16lu_foo,_ZGVeN16lu_foo" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

test/Transforms/VecClone/linear_mem2reg.ll

				; Check to see that the linear parameter i is updated with the correct stride when Mem2Reg is on.

				; RUN: opt -vec-clone -S < %s \| FileCheck %s

				; CHECK-LABEL: @_ZGVbN4lu_foo
				; CHECK: simd.loop:
				; CHECK: %stride.mul = mul i32 1, %index
				; CHECK-NEXT: %stride.add = add i32 %i, %stride.mul
				; CHECK-NEXT: %add = add nsw i32 %x, %stride.add

				;ModuleID = 'linear.c'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: nounwind uwtable
				define i32 @foo(i32 %i, i32 %x) #0 {
				entry:
				%add = add nsw i32 %x, %i
				ret i32 %add
				}

				attributes #0 = { norecurse nounwind readnone uwtable "vector-variants"="_ZGVbM4lu_foo,_ZGVbN4lu_foo,_ZGVcM8lu_foo,_ZGVcN8lu_foo,_ZGVdM8lu_foo,_ZGVdN8lu_foo,_ZGVeM16lu_foo,_ZGVeN16lu_foo" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

test/Transforms/VecClone/struct_linear_ptr.ll

				; Test that the stride is being applied correctly to struct field accesses.

				; RUN: opt -vec-clone -S < %s \| FileCheck %s

				; CHECK-LABEL: @_ZGVbN4l_foo
				; CHECK: simd.loop:
				; CHECK: %0 = load %struct.my_struct, %struct.my_struct* %s.addr, align 8
				; CHECK: %stride.mul{{.*}} = mul i32 1, %index
				; CHECK: %s.addr.gep{{.}} = getelementptr %struct.my_struct, %struct.my_struct %0, i32 %stride.mul{{.*}}
				; CHECK: %field1 = getelementptr inbounds %struct.my_struct, %struct.my_struct* %s.addr.gep{{.*}}, i32 0, i32 0
				; CHECK: %1 = load float, float* %field1, align 8
				; CHECK: %2 = load %struct.my_struct, %struct.my_struct* %s.addr, align 8
				; CHECK: %stride.mul{{.*}} = mul i32 1, %index
				; CHECK: %s.addr.gep{{.}} = getelementptr %struct.my_struct, %struct.my_struct %2, i32 %stride.mul{{.*}}
				; CHECK: %field5 = getelementptr inbounds %struct.my_struct, %struct.my_struct* %s.addr.gep{{.*}}, i32 0, i32 4
				; CHECK: %3 = load float, float* %field5, align 8
				; CHECK: %add = fadd float %1, %3

				; ModuleID = 'struct_linear_ptr.c'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				%struct.my_struct = type { float, i8, i32, i16, float, i64 }

				; Function Attrs: nounwind uwtable
				define float @foo(%struct.my_struct* %s) #0 {
				entry:
				%s.addr = alloca %struct.my_struct*, align 8
				store %struct.my_struct* %s, %struct.my_struct** %s.addr, align 8
				%0 = load %struct.my_struct, %struct.my_struct* %s.addr, align 8
				%field1 = getelementptr inbounds %struct.my_struct, %struct.my_struct* %0, i32 0, i32 0
				%1 = load float, float* %field1, align 8
				%2 = load %struct.my_struct, %struct.my_struct* %s.addr, align 8
				%field5 = getelementptr inbounds %struct.my_struct, %struct.my_struct* %2, i32 0, i32 4
				%3 = load float, float* %field5, align 8
				%add = fadd float %1, %3
				ret float %add
				}

				attributes #0 = { norecurse nounwind readonly uwtable "vector-variants"="_ZGVbM4l_foo,_ZGVbN4l_foo,_ZGVcM8l_foo,_ZGVcN8l_foo,_ZGVdM8l_foo,_ZGVdN8l_foo,_ZGVeM16l_foo,_ZGVeN16l_foo" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

test/Transforms/VecClone/two_vec_sum.ll

				; Do a sanity check on the structure of the LLVM that VecClone produces for the non-masked variant.

				; RUN: opt -vec-clone -S < %s \| FileCheck %s

				; Begin non-masked variant checking
				; NOTE: This test checks order very strictly and can change depending on optimization level used.
				; FYI, the IR here was generated using -O0 in the event an issue needs to be reproduced.

				; CHECK-LABEL: <4 x i32> @_ZGVbN4vv_vec_sum(<4 x i32> %i, <4 x i32> %j)
				; CHECK-NEXT: entry:
				; CHECK-NEXT: %vec.i = alloca <4 x i32>
				; CHECK-NEXT: %vec.j = alloca <4 x i32>
				; CHECK-NEXT: %vec.retval = alloca <4 x i32>
				; CHECK-NEXT: store <4 x i32> %i, <4 x i32>* %vec.i
				; CHECK-NEXT: store <4 x i32> %j, <4 x i32>* %vec.j
				; CHECK-NEXT: %vec.i.cast = bitcast <4 x i32>* %vec.i to i32*
				; CHECK-NEXT: %vec.j.cast = bitcast <4 x i32>* %vec.j to i32*
				; CHECK-NEXT: %ret.cast = bitcast <4 x i32>* %vec.retval to i32*
				; CHECK-NEXT: br label %simd.loop

				; CHECK: simd.loop:
				; CHECK-NEXT: %index = phi i32 [ 0, %entry ], [ %indvar, %simd.loop.exit ]
				; CHECK-NEXT: %vec.i.cast.gep = getelementptr i32, i32* %vec.i.cast, i32 %index
				; CHECK-NEXT: %0 = load i32, i32* %vec.i.cast.gep, align 4
				; CHECK-NEXT: %vec.j.cast.gep = getelementptr i32, i32* %vec.j.cast, i32 %index
				; CHECK-NEXT: %1 = load i32, i32* %vec.j.cast.gep, align 4
				; CHECK-NEXT: %add = add nsw i32 %0, %1
				; CHECK-NEXT: %ret.cast.gep = getelementptr i32, i32* %ret.cast, i32 %index
				; CHECK-NEXT: store i32 %add, i32* %ret.cast.gep
				; CHECK-NEXT: br label %simd.loop.exit

				; CHECK: simd.loop.exit:
				; CHECK-NEXT: %indvar = add nuw i32 %index, 1
				; CHECK-NEXT: %vl.cond = icmp ult i32 %indvar, 4
				; CHECK-NEXT: br i1 %vl.cond, label %simd.loop, label %return

				; CHECK: return:
				; CHECK-NEXT: %vec.ret.cast = bitcast i32* %ret.cast to <4 x i32>*
				; CHECK-NEXT: %vec.ret = load <4 x i32>, <4 x i32>* %vec.ret.cast
				; CHECK-NEXT: ret <4 x i32> %vec.ret

				; ModuleID = 'two_vec_sum.c'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: nounwind uwtable
				define i32 @vec_sum(i32 %i, i32 %j) #0 {
				entry:
				%i.addr = alloca i32, align 4
				%j.addr = alloca i32, align 4
				store i32 %i, i32* %i.addr, align 4
				store i32 %j, i32* %j.addr, align 4
				%0 = load i32, i32* %i.addr, align 4
				%1 = load i32, i32* %j.addr, align 4
				%add = add nsw i32 %0, %1
				ret i32 %add
				}

				attributes #0 = { nounwind uwtable "vector-variants"="_ZGVbM4vv_vec_sum,_ZGVbN4vv_vec_sum,_ZGVcM8vv_vec_sum,_ZGVcN8vv_vec_sum,_ZGVdM8vv_vec_sum,_ZGVdN8vv_vec_sum,_ZGVeM16vv_vec_sum,_ZGVeN16vv_vec_sum" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

test/Transforms/VecClone/two_vec_sum_mask.ll

				; Do a sanity check on the structure of the LLVM that VecClone produces for the masked variant.

				; RUN: opt -vec-clone -S < %s \| FileCheck %s
				; NOTE: This test checks order very strictly and can change depending on optimization level used.
				; FYI, the IR here was generated using -O0 in the event an issue needs to be reproduced.

				; Begin non-masked variant checking

				; CHECK-LABEL: <4 x i32> @_ZGVbM4vv_vec_sum(<4 x i32> %i, <4 x i32> %j, <4 x i32> %mask)
				; CHECK-NEXT: entry:
				; CHECK-NEXT: %vec.i = alloca <4 x i32>
				; CHECK-NEXT: %vec.j = alloca <4 x i32>
				; CHECK-NEXT: %vec.mask = alloca <4 x i32>
				; CHECK-NEXT: %vec.retval = alloca <4 x i32>
				; CHECK-NEXT: store <4 x i32> %i, <4 x i32>* %vec.i, align 4
				; CHECK-NEXT: store <4 x i32> %j, <4 x i32>* %vec.j, align 4
				; CHECK-NEXT: store <4 x i32> %mask, <4 x i32>* %vec.mask
				; CHECK-NEXT: %vec.i.cast = bitcast <4 x i32>* %vec.i to i32*
				; CHECK-NEXT: %vec.j.cast = bitcast <4 x i32>* %vec.j to i32*
				; CHECK-NEXT: %ret.cast = bitcast <4 x i32>* %vec.retval to i32*
				; CHECK-NEXT: %mask.cast = bitcast <4 x i32>* %vec.mask to i32*
				; CHECK-NEXT: br label %simd.loop

				; CHECK: simd.loop:
				; CHECK-NEXT: %index = phi i32 [ 0, %entry ], [ %indvar, %simd.loop.exit ]
				; CHECK-NEXT: %mask.gep = getelementptr i32, i32* %mask.cast, i32 %index
				; CHECK-NEXT: %mask.parm = load i32, i32* %mask.gep
				; CHECK-NEXT: %mask.cond = icmp ne i32 %mask.parm, 0
				; CHECK-NEXT: br i1 %mask.cond, label %simd.loop.then, label %simd.loop.else

				; CHECK: simd.loop.then:
				; CHECK-NEXT: %vec.i.cast.gep = getelementptr i32, i32* %vec.i.cast, i32 %index
				; CHECK-NEXT: %0 = load i32, i32* %vec.i.cast.gep, align 4
				; CHECK-NEXT: %vec.j.cast.gep = getelementptr i32, i32* %vec.j.cast, i32 %index
				; CHECK-NEXT: %1 = load i32, i32* %vec.j.cast.gep, align 4
				; CHECK-NEXT: %add = add nsw i32 %0, %1
				; CHECK-NEXT: %ret.cast.gep = getelementptr i32, i32* %ret.cast, i32 %index
				; CHECK-NEXT: store i32 %add, i32* %ret.cast.gep
				; CHECK-NEXT: br label %simd.loop.exit

				; CHECK: simd.loop.else:
				; CHECK-NEXT: br label %simd.loop.exit

				; CHECK: simd.loop.exit:
				; CHECK-NEXT: %indvar = add nuw i32 %index, 1
				; CHECK-NEXT: %vl.cond = icmp ult i32 %indvar, 4
				; CHECK-NEXT: br i1 %vl.cond, label %simd.loop, label %return

				; CHECK: return:
				; CHECK-NEXT: %vec.ret.cast = bitcast i32* %ret.cast to <4 x i32>*
				; CHECK-NEXT: %vec.ret = load <4 x i32>, <4 x i32>* %vec.ret.cast
				; CHECK-NEXT: ret <4 x i32> %vec.ret

				; ModuleID = 'two_vec_sum.c'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: nounwind uwtable
				define i32 @vec_sum(i32 %i, i32 %j) #0 {
				entry:
				%i.addr = alloca i32, align 4
				%j.addr = alloca i32, align 4
				store i32 %i, i32* %i.addr, align 4
				store i32 %j, i32* %j.addr, align 4
				%0 = load i32, i32* %i.addr, align 4
				%1 = load i32, i32* %j.addr, align 4
				%add = add nsw i32 %0, %1
				ret i32 %add
				}

				attributes #0 = { nounwind uwtable "vector-variants"="_ZGVbM4vv_vec_sum,_ZGVbN4vv_vec_sum,_ZGVcM8vv_vec_sum,_ZGVcN8vv_vec_sum,_ZGVdM8vv_vec_sum,_ZGVdN8vv_vec_sum,_ZGVeM16vv_vec_sum,_ZGVeN16vv_vec_sum" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

test/Transforms/VecClone/two_vec_sum_mem2reg.ll

				; Check to be sure that when Mem2Reg is on that all updates to instructions referring to the original
				; parameter are updated correctly. When Mem2Reg is on, instructions will refer to the parameters
				; directly and not through a load, which is why this is tested separately.

				; Note: the LLVM IR used as input to this test has already had Mem2Reg applied to it, so no need to
				; do that here. This happens at higher optimization levels such as -O2.

				; RUN: opt -vec-clone -S < %s \| FileCheck %s

				; Begin non-masked variant checking

				; CHECK-LABEL: @_ZGVbN4vv_vec_sum
				; CHECK: simd.loop:
				; CHECK: %vec.i.cast.gep = getelementptr i32, i32* %vec.i.cast, i32 %index
				; CHECK: %vec.i.elem = load i32, i32* %vec.i.cast.gep
				; CHECK: %vec.j.cast.gep = getelementptr i32, i32* %vec.j.cast, i32 %index
				; CHECK: %vec.j.elem = load i32, i32* %vec.j.cast.gep
				; CHECK: %add = add nsw i32 %vec.i.elem, %vec.j.elem

				; ModuleID = 'two_vec_sum.c'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: nounwind uwtable
				define i32 @vec_sum(i32 %i, i32 %j) #0 {
				entry:
				%add = add nsw i32 %i, %j
				ret i32 %add
				}

				attributes #0 = { nounwind uwtable "vector-variants"="_ZGVbM4vv_vec_sum,_ZGVbN4vv_vec_sum,_ZGVcM8vv_vec_sum,_ZGVcN8vv_vec_sum,_ZGVdM8vv_vec_sum,_ZGVdN8vv_vec_sum,_ZGVeM16vv_vec_sum,_ZGVeN16vv_vec_sum" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

test/Transforms/VecClone/uniform.ll

				; Check to make sure the initial parameter store of the uniform parameter is sunk into the loop.

				; RUN: opt -vec-clone -S < %s \| FileCheck %s

				; CHECK-LABEL: <4 x i32> @_ZGVbN4u_foo(i32 %b)
				; CHECK: simd.loop:
				; CHECK: store i32 %b

				; ModuleID = 'uniform.c'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: nounwind uwtable
				define i32 @foo(i32 %b) #0 {
				entry:
				%b.addr = alloca i32, align 4
				store i32 %b, i32* %b.addr, align 4
				%0 = load i32, i32* %b.addr, align 4
				%inc = add nsw i32 %0, 1
				store i32 %inc, i32* %b.addr, align 4
				%1 = load i32, i32* %b.addr, align 4
				ret i32 %1
				}

				attributes #0 = { nounwind uwtable "vector-variants"="_ZGVbM4u_foo,_ZGVbN4u_foo,_ZGVcM8u_foo,_ZGVcN8u_foo,_ZGVdM8u_foo,_ZGVdN8u_foo,_ZGVeM16u_foo,_ZGVeN16u_foo" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

test/Transforms/VecClone/vector_ptr.ll

				; Test that vector of pointers are handled with correctly in loop and that incompatible function return/arg attributes are removed.

				; RUN: opt -vec-clone -S < %s \| FileCheck %s

				; CHECK-LABEL: @_ZGVbN2v_dowork
				; CHECK: simd.loop:
				; CHECK: %vec.p.cast.gep = getelementptr float, float* %vec.p.cast, i32 %index
				; CHECK: %vec.p.elem = load float, float* %vec.p.cast.gep
				; CHECK: %add.ptr = getelementptr inbounds float, float* %vec.p.elem, i64 1
				; CHECK: %ret.cast.gep = getelementptr float, float* %ret.cast, i32 %index
				; CHECK: store float* %add.ptr, float** %ret.cast.gep

				source_filename = "vector_ptr.c"
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: norecurse nounwind readnone uwtable
				define nonnull float* @dowork(float* readnone %p) local_unnamed_addr #0 {
				entry:
				%add.ptr = getelementptr inbounds float, float* %p, i64 1
				ret float* %add.ptr
				}

				attributes #0 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" "vector-variants"="_ZGVbN2v_dowork,_ZGVcN4v_dowork,_ZGVdN4v_dowork,_ZGVeN8v_
				dowork,_ZGVbM2v_dowork,_ZGVcM4v_dowork,_ZGVdM4v_dowork,_ZGVeM8v_dowork" }

test/Transforms/VecClone/void_foo.ll

				; Check to make sure we can handle void foo() function

				; RUN: opt -vec-clone -S < %s \| FileCheck %s

				; CHECK-LABEL: void @_ZGVbN4_foo()
				; CHECK: entry:
				; CHECK: ret void

				; ModuleID = 'foo.c'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: nounwind uwtable
				define void @foo() #0 {
				entry:
				ret void
				}

				attributes #0 = { nounwind uwtable "vector-variants"="_ZGVbM4_foo1,_ZGVbN4_foo1,_ZGVcM8_foo1,_ZGVcN8_foo1,_ZGVdM8_foo1,_ZGVdN8_foo1,_ZGVeM16_foo1,_ZGVeN16_foo1" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

tools/bugpoint/bugpoint.cpp

Show First 20 Lines • Show All 128 Lines • ▼ Show 20 Lines	#endif
initializeObjCARCOpts(Registry);		initializeObjCARCOpts(Registry);
initializeVectorization(Registry);		initializeVectorization(Registry);
initializeIPO(Registry);		initializeIPO(Registry);
initializeAnalysis(Registry);		initializeAnalysis(Registry);
initializeTransformUtils(Registry);		initializeTransformUtils(Registry);
initializeInstCombine(Registry);		initializeInstCombine(Registry);
initializeInstrumentation(Registry);		initializeInstrumentation(Registry);
initializeTarget(Registry);		initializeTarget(Registry);
		initializeVecClonePass(Registry);

#ifdef LINK_POLLY_INTO_TOOLS		#ifdef LINK_POLLY_INTO_TOOLS
polly::initializePollyPasses(Registry);		polly::initializePollyPasses(Registry);
#endif		#endif

if (std::getenv("bar") == (char*) -1) {		if (std::getenv("bar") == (char*) -1) {
InitializeAllTargets();		InitializeAllTargets();
InitializeAllTargetMCs();		InitializeAllTargetMCs();
▲ Show 20 Lines • Show All 76 Lines • Show Last 20 Lines

tools/opt/opt.cpp

Show First 20 Lines • Show All 383 Lines • ▼ Show 20 Lines	int main(int argc, char **argv) {
initializeObjCARCOpts(Registry);		initializeObjCARCOpts(Registry);
initializeVectorization(Registry);		initializeVectorization(Registry);
initializeIPO(Registry);		initializeIPO(Registry);
initializeAnalysis(Registry);		initializeAnalysis(Registry);
initializeTransformUtils(Registry);		initializeTransformUtils(Registry);
initializeInstCombine(Registry);		initializeInstCombine(Registry);
initializeInstrumentation(Registry);		initializeInstrumentation(Registry);
initializeTarget(Registry);		initializeTarget(Registry);
		initializeVecClonePass(Registry);
// For codegen passes, only passes that do IR to IR transformation are		// For codegen passes, only passes that do IR to IR transformation are
// supported.		// supported.
initializeScalarizeMaskedMemIntrinPass(Registry);		initializeScalarizeMaskedMemIntrinPass(Registry);
initializeCodeGenPreparePass(Registry);		initializeCodeGenPreparePass(Registry);
initializeAtomicExpandPass(Registry);		initializeAtomicExpandPass(Registry);
initializeRewriteSymbolsLegacyPassPass(Registry);		initializeRewriteSymbolsLegacyPassPass(Registry);
initializeWinEHPreparePass(Registry);		initializeWinEHPreparePass(Registry);
initializeDwarfEHPreparePass(Registry);		initializeDwarfEHPreparePass(Registry);
▲ Show 20 Lines • Show All 391 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

VecClone PassNeeds ReviewPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 121035

include/llvm/Analysis/TargetTransformInfo.h

include/llvm/Analysis/TargetTransformInfoImpl.h

include/llvm/Analysis/VectorVariant.h

include/llvm/InitializePasses.h

include/llvm/LinkAllPasses.h

include/llvm/Transforms/Utils/VecClone.h

lib/Analysis/CMakeLists.txt

lib/Analysis/TargetTransformInfo.cpp

lib/Analysis/VectorVariant.cpp

lib/Target/X86/X86TargetTransformInfo.h

lib/Target/X86/X86TargetTransformInfo.cpp

lib/Transforms/IPO/PassManagerBuilder.cpp

lib/Transforms/Utils/CMakeLists.txt

lib/Transforms/Utils/VecClone.cpp

test/Transforms/VecClone/all_parm_types.ll

test/Transforms/VecClone/broadcast.ll

test/Transforms/VecClone/convert_linear.ll

test/Transforms/VecClone/external_array.ll

test/Transforms/VecClone/linear.ll

test/Transforms/VecClone/linear_mem2reg.ll

test/Transforms/VecClone/struct_linear_ptr.ll

test/Transforms/VecClone/two_vec_sum.ll

test/Transforms/VecClone/two_vec_sum_mask.ll

test/Transforms/VecClone/two_vec_sum_mem2reg.ll

test/Transforms/VecClone/uniform.ll

test/Transforms/VecClone/vector_ptr.ll

test/Transforms/VecClone/void_foo.ll

tools/bugpoint/bugpoint.cpp

tools/opt/opt.cpp

VecClone Pass
Needs ReviewPublic