VecClone Pass
Needs ReviewPublic

Authored by mmasten on Jul 25 2016, 5:37 PM.



This work is part of an RFC sent by Xinmin back on 3/2/2016 regarding explicit function vectorization. The VecClone pass translates functions marked with "#pragma omp declare simd" into vector length trip count loops. Please see the RFC for details. It can be found at

Diff Detail

mmasten retitled this revision from to VecClone Pass.Jul 25 2016, 5:37 PM
mmasten updated this object.
mmasten added reviewers: mzolotukhin, hfinkel, Ayal.

Thanks for the nicely commented code and the provided tests.
A few inlined comments after a quick overview.

146 ↗(On Diff #65447)

Why do all these need to be public APIs?


Why does this need to be in a public header?

564 ↗(On Diff #65447)

This is all not very inefficient:

  1. getVectorVariantAttributes creates a temporary vector and copy attributes in it, just for iterating over a few lines later to convert these as strings (you can't keep StringRef because the attribute was copied I guess).
  2. FuncVars is queried two times for no apparent good reason.

It seems that the only use for this function is to populate a list of function to "vectorize". Populating a "vector<Function *>" should be enough for that. A nice property would be to be able to figure easily/efficiently if a function needs to be "vectorized".
Maybe reworking the attribute to be more "structured" instead of separate strings.


Some overview with an example/overview of what the pass accomplish would be nice. It is already captured in the RFC, but for someone reading the code it'd be helpful.


I think it should be spelled-out more clearly that it is required for correctness to have all the variants mentioned in the attribute list to be codegen'd, even at O0, because even if the vectorizer does not run in this module, it may run in another module that would expect these variant to exist.


So this will walk all the functions in the module and walk all the attributes and do a string comparison on all of these.
I'm not sure if it is fine to pay this when this pass has nothing to do (i.e. the early exit should be *fast*).


Coding style: no braces (other places as well).


Spurious empty line (other places as well).


What about a SmallVector and move it out-of-the loop (the call to clear() below is already handling the reset between iterations).

Thanks for the comments, Mehdi. I had some other things come up, but I'm making some corrections now.

mmasten added inline comments.Sep 1 2016, 10:44 AM
146 ↗(On Diff #65447)

They don't need to be. I can move these inside the VecClone class. Some of these could eventually become more generalized utilities, but I'll change it for now.


Do you mean that the class definition should be moved to VecClone.cpp?

564 ↗(On Diff #65447)

I will change #2. We need a little more than vector<Function*> here because each simd function will have multiple vector variants. FuncVars is a 1-many mapping of the original function to the string encodings corresponding to the variants that will be generated later. The string representation is essentially what is defined in We can easily tell which functions need to be vectorized by checking if any of these encodings exist as attributes on the original function. Can you elaborate on what you mean by "more structured"? Are you suggested an alternative representation to string-based function attributes? Thanks


We could selectively run the VecClone pass from PassManagerBuilder based on whether the OpenMP switch has been used. Otherwise, I don't know of another way to figure out which functions will need to be generated. Do you have a suggestion?

mehdi_amini added inline comments.Sep 1 2016, 12:11 PM
564 ↗(On Diff #65447)

Let me clarify what I meant with a vector<Function *>.
I meant that my take on getFunctionsToVectorize was that it sole purpose was to set a *single* entry in FuncVars, and this entry is a std::vector<std::string> containing the list of variants.

This is used exactly *once* here:

for (auto& pair : FunctionsToVectorize) {
    Function& F = *pair.first;
    DeclaredVariants &DeclaredVariants = pair.second;

If FunctionsToVectorize was a vector of Function *, you could get the list of DeclaredVariants on the fly while iterating on the attributes, without creating any temporary "string list".

mehdi_amini added inline comments.Sep 1 2016, 12:17 PM
564 ↗(On Diff #65447)

To come back to the "more structured", the question is "how to get the list of function that need variants very quickly/cheaply".
Having to look at all the attributes and do any kind of string processing *when we need to generate a variant* is not a problem, I'm more interested in doing as little work as possible when there is no variant.
Maybe having an attribute on the function which is "hasVariants" and could be queried cheaply could be a solution. I'd have to look at how attributes works internally again.

(Ideally I'd like a storage key->value where the key would be "variants" and the value would be a list of variants to generate, instead of the flat list where variants strings are mixed with the others and recognized by "magic" pattern matching)

mmasten updated this revision to Diff 73523.Oct 4 2016, 11:53 AM
mmasten marked an inline comment as done.
sodeh added a subscriber: sodeh.Nov 16 2016, 1:46 AM
simoll added a subscriber: simoll.Jan 23 2017, 1:22 AM
hfinkel added inline comments.Jan 27 2017, 4:14 AM

There's a lot of target information here in the target-independent code. Given that we're not going to vectorize without a target code model regardless, I'd like to see this information pushing into TargetTransformInfo. VecClone can then use TTI to convert the particular ISA tags into information about vector lengths, etc. Other architectures that are adapting this scheme can then extend this in a natural way.

564 ↗(On Diff #65447)

I'd like to come back to this. I agree with Mehdi, having unadorned mangled names as attributes isn't the right design - the "magic" pattern matching is unnecessary. It would seem much better to use something like:

attributes #0 = { "vector-variants"="_ZGVbM4l_foo,_ZGVbN4l_foo,_ZGVcM8l_foo,_ZGVcN8l_foo,_ZGVdM8l_foo,_ZGVdN8l_foo,_ZGVeM16l_foo,_ZGVeN16l_foo"

instead of:

attributes #0 = { hasvectorvariants "_ZGVbM4l_foo" "_ZGVbN4l_foo" "_ZGVcM8l_foo" "_ZGVcN8l_foo" "_ZGVdM8l_foo" "_ZGVdN8l_foo" "_ZGVeM16l_foo" "_ZGVeN16l_foo"

exactly like we do for "target-features".

mmasten updated this revision to Diff 87567.Feb 7 2017, 4:58 PM

New changes are:

  1. Update function attributes to "vector-variants"="<variant list>" format.
  2. Move target-specific code in the VectorVariant class to TTI.

Thanks for the feedback, Hal. I made the changes you suggested.

fpetrogalli added inline comments.Sep 1 2017, 3:14 AM

Hello Matt, thank you for working on this.

<nitpicking>Should the comment be updated with the "vector-variants"="_ZGVbM4ul_, ZGVbN4ul_" syntax? </nitpicking>



Hello again,

I think it would be good to have references to the document that describes the vector ABI. Is it the one at [1]?

If so, I think it would be good to cover with tests the "linear clause" cases of Table 1 in section 2.4, "Element Data Type to Vector Data Type Mapping".



fpetrogalli added inline comments.Sep 1 2017, 3:46 AM

Section 2.6 "Vector Function Name Mangling" of [1] maps the IsaClass to other values, "x', 'y'. Are you basing this on [2] for gcc compatibility?
Why do [1] and [2] differ?



hfinkel added inline comments.Sep 1 2017, 4:04 PM

This function doesn't return anything, and I think that makes the example hard to understand.


I don't understand what's going on here. Is it not possible to write the transformation such that it's semantically correct regardless of whether we've run mem2reg? This is not always an either/or situation.

Also, does the fact that we now have the VPlan infrastructure in the vectorizer change, in any way, how we'd like to approach this problem?