This is an archive of the discontinued LLVM Phabricator instance.

[VectorUtils] Introduce the Vector Function Database (VFDatabase).
ClosedPublic

Authored by fpetrogalli on Sep 13 2019, 2:40 PM.

Details

Summary
This patch introduced the VFDatabase, the framework proposed in
http://lists.llvm.org/pipermail/llvm-dev/2019-June/133484.html. [*]

In this patch the VFDatabase is used to bridge the TargetLibraryInfo
(TLI) calls that were previously used to query for the availability of
vector counterparts of scalar functions.

The VFISAKind field `ISA` of VFShape have been moved into into VFInfo,
under the assumption that different vector ISAs may provide the same
vector signature. At the moment, the vectorizer accepts any of the
available ISAs as long as the signature provided by the VFDatabase
matches the one expected in the vectorization process. For example,
when targeting AVX or AVX2, which both have 256-bit registers, the IR
signature of the two vector functions associated to the two ISAs is
the same. The `getVectorizedFunction` method at the moment returns the
first available match. We will need to add more heuristics to the
search system to decide which of the available version (TLI, AVX,
AVX2, ...)  the system should prefer, when multiple versions with the
same VFShape are present.

Some of the code in this patch is based on the work done by Sumedh
Arani in https://reviews.llvm.org/D66025.

[*] Notice that in the proposal the VFDatabase was called SVFS. The
name VFDatabase is more in line with LLVM recommendations for
naming classes and variables.

Differential Revision: https://reviews.llvm.org/D67572

Event Timeline

fpetrogalli created this revision.Sep 13 2019, 2:40 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 13 2019, 2:40 PM

This is a follow up patch to https://reviews.llvm.org/D66024

It is a WIP becase I want to seek feedback before doing a full implementation with tests.

The current tests/unittests pass without failures, so at least it seems I haven't broken anything :)

I am in particular interested to know what you think about the way I have deferred the question "isFunctionVectorizable" and related "get" method from the TargetLibraryInfo (TLI). I have marched such methods as private, so that with this patch the only way to ask such questions around vectorizable function is to go through the SearchVFSystem class.

Notice that this patch doesn't implement the interface explained in the RFC at http://lists.llvm.org/pipermail/llvm-dev/2019-June/133484.html (yet).

One more thing: in the RFC we decided to pass the CallInst to the methods of the class instead of the Function definition, because we wanted to make the attribute vector-function-abi-variant to be local to the call site, not the declaration. Unfortunately, I couldn't find a way to attach an attribute to a CallInst. Is this the case? Or have I missed somethign around Instructions and attributes?

Kind regards,

Francesco

I'm fine with the general idea. The non-WIP patch needs documentation and a test for the new functionality (non-TLI-based selection).

One more thing: in the RFC we decided to pass the CallInst to the methods of the class instead of the Function definition, because we wanted to make the attribute vector-function-abi-variant to be local to the call site, not the declaration. Unfortunately, I couldn't find a way to attach an attribute to a CallInst. Is this the case? Or have I missed somethign around Instructions and attributes?

Could you add it to the attribute list of the call site at AttributeList::FunctionIndex?

One more thing: in the RFC we decided to pass the CallInst to the methods of the class instead of the Function definition, because we wanted to make the attribute vector-function-abi-variant to be local to the call site, not the declaration. Unfortunately, I couldn't find a way to attach an attribute to a CallInst. Is this the case? Or have I missed somethign around Instructions and attributes?

Could you add it to the attribute list of the call site at AttributeList::FunctionIndex?

Thank you for the suggestion. I have looked into this but it seems that adding an attribute to a CallInst in the AttributeList::FunctionIndex has the effect of juts adding the attribute to the function being called.

One more thing: in the RFC we decided to pass the CallInst to the methods of the class instead of the Function definition, because we wanted to make the attribute vector-function-abi-variant to be local to the call site, not the declaration. Unfortunately, I couldn't find a way to attach an attribute to a CallInst. Is this the case? Or have I missed somethign around Instructions and attributes?

Could you add it to the attribute list of the call site at AttributeList::FunctionIndex?

Thank you for the suggestion. I have looked into this but it seems that adding an attribute to a CallInst in the AttributeList::FunctionIndex has the effect of juts adding the attribute to the function being called.

The attribute lists are different. You might have added it to the function one but you can add it to the CallBase one as well.

One more thing: in the RFC we decided to pass the CallInst to the methods of the class instead of the Function definition, because we wanted to make the attribute vector-function-abi-variant to be local to the call site, not the declaration. Unfortunately, I couldn't find a way to attach an attribute to a CallInst. Is this the case? Or have I missed somethign around Instructions and attributes?

Could you add it to the attribute list of the call site at AttributeList::FunctionIndex?

Thank you for the suggestion. I have looked into this but it seems that adding an attribute to a CallInst in the AttributeList::FunctionIndex has the effect of juts adding the attribute to the function being called.

The attribute lists are different. You might have added it to the function one but you can add it to the CallBase one as well.

Exactly

One more thing: in the RFC we decided to pass the CallInst to the methods of the class instead of the Function definition, because we wanted to make the attribute vector-function-abi-variant to be local to the call site, not the declaration. Unfortunately, I couldn't find a way to attach an attribute to a CallInst. Is this the case? Or have I missed somethign around Instructions and attributes?

Could you add it to the attribute list of the call site at AttributeList::FunctionIndex?

Thank you for the suggestion. I have looked into this but it seems that adding an attribute to a CallInst in the AttributeList::FunctionIndex has the effect of juts adding the attribute to the function being called.

The attribute lists are different. You might have added it to the function one but you can add it to the CallBase one as well.

Exactly

Yep! I have a working version that I cleaning up for submission.

fpetrogalli retitled this revision from [SVFS] SearchVFSystem interface (WIP). to [SVFS] The Search Vector Function System..
fpetrogalli edited the summary of this revision. (Show Details)
fpetrogalli added reviewers: hsaito, ABataev.
fpetrogalli marked an inline comment as done.Nov 4 2019, 1:22 PM
fpetrogalli added inline comments.
llvm/include/llvm/Analysis/VectorUtils.h
58

I have added this to make it work with the TLI. An alternative would be to add an extra field to the VecDesc instances stored in the TLI to hold also the ISA information. I am happy to do so (or to investigate any other alternative approach) is the solution proposed in this patch is not convincing.

fpetrogalli marked an inline comment as done.Nov 4 2019, 1:32 PM
fpetrogalli added inline comments.
llvm/include/llvm/Analysis/VectorUtils.h
231

The VFISAKind::LLVM_INTERNAL_TLI value, at the moment, is restricting the use of the SVFS to only those functions that are listed in the TLI. This makes me wonder whether it would be better to move the VFISAKind from the VFShape to the VFInfo. This could be justified under the assumption that a scalar function can be mapped to multiple vector functions with the same vectorization shape (same FunctionType), just with a different underlying ISA.

Any thoughts?

fpetrogalli marked an inline comment as done.Nov 5 2019, 11:05 AM
fpetrogalli added inline comments.
llvm/include/llvm/Analysis/VectorUtils.h
206–208

This method fixUpVFABINames could be invoked once directly in the LoopVectorizer (runOnFunction), so that:

  1. it avoids calling it every time the SVFS is instantiated, by doing it once for all the calls in the module (or function).
  2. it allows the SVFS to be independent on the TLI.

This seems a better solution to me - any opinions?

jdoerfert added inline comments.Nov 6 2019, 12:03 PM
llvm/include/llvm/Analysis/TargetLibraryInfo.h
201

Documentation missing, and please do not call it "fixup", maybe "adjust" or something but not "fixup".

263

Why do you privatize these redirects? I would expect them to be used, thus public, or unused, thus deleted.

llvm/include/llvm/Analysis/VectorUtils.h
58

I generally dislike non-scoped macros. Is there a reason a static constexpr char* doesn't work?
I would also opt for a shorter name, "_llvm_" or "_llvm_tli_" but that is debatable.

133

I doubt this change is necessary. Only if you need to own the underlying data you need to replace StringRef with std::string.

143

I'd prefer
void getVectorVariantNames(CallInst *CI, SmallSet<std::string, 8> &VariantMappings);
but for sure you want a reference here:

void setVectorVariantNames(CallInst *CI,
                           const SmallSet<std::string, 8> &VariantMappings);
fpetrogalli marked 5 inline comments as done.Nov 6 2019, 2:00 PM
fpetrogalli added inline comments.
llvm/include/llvm/Analysis/TargetLibraryInfo.h
263

I am privatizing these because isFunctionVectorizable and getVectorizedFunction to be asked to any component other than the SearchVFSystem introduced in this patch.

After the last rework, I think this patch deserves a bit of a summary.

The patch as it is should be split in at least the following components, that needs to be done one by one in sequence:

  1. Methods in the VFABI namespace to read and set the vector-function-abi-variant attribute. In the current patch, this is done with the methods [get|set]VectorVariantNames.
  2. An early pass to be run early in the opt pass sequence, at least before the LoopVectorize pass and any of the analysis passes needed for vectorization, that populates the module with the attribute that describes the TLI mappings. This is currently _hacked_ in this patch by invoking addMappingsFromTLI in LoopVectorizer.cpp.
  3. Once we have the TLI mappings in the IR, we can add the SearchVFSystem and use it's search mechanism to perform vectorization (an example of how this works is already in this patch)

Now, once we have this first item in places, we can extend the getVectorizedFunction interface to be more clever, and ask questions also areound properties of the vector function like "linear" and "uniform" parameters.

I would like to ask to the reviewers if this implementation sequence I propose makes sense. Please le t me know if you have any question.

Also, I have added a couple of people that I know have opinion on the vectorizers. I am sure there are more out there, but the list of contributors to all the places in LLVM that this patch touches is quite extended and it is difficult for me to make a choice.

I have implemented the first item of the breakdown described in https://reviews.llvm.org/D67572#1736230 in patch https://reviews.llvm.org/D69976 - read/write methods for the VFABI attribute.

Thanks all,

Francesco

This is an update of the code after extracting the read/write function of the VFABI attribute in https://reviews.llvm.org/D69976.

jdoerfert added inline comments.Nov 7 2019, 6:02 PM
llvm/include/llvm/Analysis/VectorUtils.h
203

Arguably, you could just call getVFMappings and cache the result. Whoever created a SearchVFSystem will need the values eventually and this way the isFucntionVectorizable call is free.

CI->getModule()

The explicit and delete seems a lot of overhead, make the member a CallInst &CI and you should not need any of it.

245

Typos in the above TODO.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
2433–2434

Leftover?

llvm/lib/Analysis/TargetLibraryInfo.cpp
22 ↗(On Diff #228337)

Leftover?

fpetrogalli marked 3 inline comments as done.

Changes:

  1. Update the patch to use the current version of the VFABI read/write methods at https://reviews.llvm.org/D69976
  2. Address review from @jdoerfert

Note: in the next step I will extract from this patch the changes that are needed for the internal ISA for the TLI functions.

fpetrogalli marked 2 inline comments as done.Nov 8 2019, 3:12 PM

Internal vector function mangling for TLI added at https://reviews.llvm.org/D70089

I have rebased the changes after the addition of the pass for injecting the TLI calls.

The vectorization process itself works, but I don't undertstand why a
bunch of tests are failing - I must have done somethign wrong in the
initialization of the pass.

Most (if not all) failures assert on the following:

Unable to schedule 'Scalar Evolution Analysis' required by 'Loop Vectorization'
Unable to schedule pass
UNREACHABLE executed at /home/frapet01/projects/upstream-clang/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1289!

Down to two failures.

Failing Tests (2):

LLVM :: Other/pass-pipelines.ll
LLVM :: Transforms/SimplifyCFG/HoistCode.ll

The latter probably caused by a recent rebase.

The former is a genuine failure of this patch.

In this patch:

  1. The SearchVFSystem has been renamed to VFDatabase.
  1. The query interface of the VFDatabase has been reduced to only two functions.
  1. The field VFIsaKind ISA of VFShape has been moved to VFInfo. This change is justified in the new commit message.
fpetrogalli retitled this revision from [SVFS] The Search Vector Function System. to [VectorUtils] Introduce the Vector Function Database (VFDatabase)..Nov 18 2019, 3:48 PM
fpetrogalli edited the summary of this revision. (Show Details)

Thanks @fpetrogalli for updating this patch!

Down to two failures.
Failing Tests (2):

Just checking, are these failures resolved in the latest revision?

llvm/include/llvm/Analysis/VectorUtils.h
94

nit: retrive -> retrieve

99

I don't really understand what you mean by a "flat" vectorization shape. Is this function supposed to return a 'widened' (vector version of a) function? If so, can we please rename this to getVectorShapeForCall ?

102

nit: unnecessary curly braces

105

is HasGlobalPred the same as isPredicated?

Also, is the predicate always known/expected to be the last parameter by the Vector ABI?

152

nit: in a class members are private by default.

153

nit: put comments above variable

162

StringRef ?

171

nit: These two if-statements can be merged with a &&

173

should CI.getModule()->getFunction(Shape.getValue().VectorName) be an assert? When would it ever happen that the vector function is not declared * in the IR?

*note that I'm specifically saying declared and not defined here.

200

Is it worth implementing a operator< for VFShape and sorting the result for getMappings()?
That way we can use binary search using llvm::lower_bound instead of looping through each shape in ScalarToVectorMappings.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
1848

getMappings is quite an expensive call, so you'll want to add a special function here that bails out earlier, and doesn't have to demangle string and populate (and possibly sort) an array.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3254

the call to isFunctionVectorizable is expensive, so please reorder this after CI->isNoBuiltin(), so that the function can bail out more cheaply.

fpetrogalli added inline comments.Nov 20 2019, 7:00 AM
llvm/include/llvm/Analysis/VectorUtils.h
99

Yeah, the name is misleading. This is supposed to return a widenened version of the function... but I don't like the name getVectorShapeForCall, because a Shape for a Vector Call can have Linear modifiers for example, while this function is returning only the shape that uses VFParamKind::Vector for all VFPArameters.

How about getAllVectorsShape?

Thanks @fpetrogalli for updating this patch!

Down to two failures.
Failing Tests (2):

Just checking, are these failures resolved in the latest revision?

Yes, all good:

[1440/1440] Running the LLVM regression tests

Testing Time: 269.21s
  Expected Passes    : 33842
  Expected Failures  : 149
  Unsupported Tests  : 451
fpetrogalli marked 15 inline comments as done.

Address review comments from @sdesmalen. Thank you!

llvm/include/llvm/Analysis/VectorUtils.h
99

I explained the meaning of the function in the comment.

99

I have renamed to VFShape::getAllVectorsParams.

105

is HasGlobalPred the same as isPredicated?

Th GlobalPredicate is listed in the VFParamKind enum class:

  GlobalPredicate,   // Global logical predicate that acts on all lanes
                     // of the input and output mask concurrently. For
		     // example, it is implied by the `M` token in the
		     // Vector Function ABI mangled name.

It is a special predicate because it differ from the parameter masks that can be individually attached to the parameters. These kind of masks are not handled yet by the VFParamKind, but I understand that they were needed by @simoll when discussing the RFC.

Also, is the predicate always known/expected to be the last parameter by the Vector ABI?

For all the Vector Function ABIs supported at the moment in LLVM, yes. If vendor X decides to produce an ABI were the mask is the first parameter, we will have to change this code. But for now, we will assume the global predicate is the last parameter.

173

should CI.getModule()->getFunction(Shape.getValue().VectorName) be an assert? When would it ever happen that the vector function is not declared * in the IR?

Yes, done. In fact, and IR where there is a VFABI attribute with a mapping to function X, but doesn't have X declared, shoudl be considered broken.

*note that I'm specifically saying declared and not defined here.

You are reaching enlightenment! :)

200

sort them by what field? VF?

Also, I don't expect an attribute to hold more than 8 functions , which seems to be the worst case scenario when all X86 vector extensions are being used... are you sure you want to add such optimization.

How about we leave it as it is (slow but simple) and leave optimizations for the future if it turns out we need to speed up the search?

I have optimized the getVFABIMappings method of VFDatabase with an
early exit if the VFABI attribute is empty.

fpetrogalli marked 2 inline comments as done.Nov 20 2019, 9:23 AM
fpetrogalli added inline comments.
llvm/lib/Analysis/LoopAccessAnalysis.cpp
1848

Done, check the implementation of getVFABIMappings.

fpetrogalli marked an inline comment as done.

Remove commented code...

fpetrogalli marked an inline comment as done.Nov 20 2019, 2:35 PM
fpetrogalli added inline comments.
llvm/include/llvm/Analysis/VectorUtils.h
99

I have extracted the VFShape API here: https://reviews.llvm.org/D70513 (it is a work in progress as of now, I might finish up things later today).

sdesmalen added inline comments.Nov 21 2019, 3:15 AM
llvm/include/llvm/Analysis/VectorUtils.h
99

I think this can take FunctionType instead of CallInst.

I have renamed to VFShape::getAllVectorsParams

nit: I know this is bikeshedding, but what do you think of VFShape::widenAllParams?

100

these parameters shouldn't use const.

100

Is there a reason to pass VF and IsScalable separately, instead of passing an ElementCount ?

105

It is a special predicate because it differ from the parameter masks that can be individually attached to the parameters.

I'd expect the operation to be predicated, not the individual operands. What would be the meaning of a predicated operand?

For all the Vector Function ABIs supported at the moment in LLVM, yes. If vendor X decides to produce an ABI were the mask is the first parameter, we will have to change this code.

Can we add an assert somewhere to enforce the assumption that the global predicate is passed as the last operand?

180

Please add a message to the assert.

188

this method doesn't do anything more than invoke getVFABIMappings so has no value (other than being public).

200

Alright, if it only has a few elements and is constructed to do a single lookup it is probably not worth the overhead of sorting.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
1848

The early exit in getMappings doesn't stop a SmallVector from having to be created/destroyed. It would be better to create a new method such as bool hasVectorVariants() that answers the question directly.

fpetrogalli marked 16 inline comments as done.Nov 21 2019, 1:22 PM
fpetrogalli added inline comments.
llvm/include/llvm/Analysis/VectorUtils.h
99

I think this can take FunctionType instead of CallInst

Yes, but... what's the point? We will have to introduce one or two extra calls to get the FunctionType... All we need is the number of arguments of the function, so in fact the function could just take an unsigned integer. But I don't like it. The fact that I use CallInst is cleaner and not worse than having an optimized interface. But this is my personal preference, so if you ask nicely :P I will change the interface.

nit: I know this is bike shedding, but what do you think of VFShape::widenAllParams?

I know where you come from, the Vectorizer :). I'd rather not, I really want to have a static get method attached to the VFShape. It seems to be in the style of LLVM to use get for static public member functions that build objects (see http://llvm.org/doxygen/classllvm_1_1VectorType.html). Since this is the only get method of VFShape, I have renamed it to get.

For the record, the changes have been applied to https://reviews.llvm.org/D70513

This function this disappear from this revision.

99

Please take any other further review of getAllVectorParams to https://reviews.llvm.org/D70513

100

these parameters shouldn't use const.

I disagree, especially for the CI argument. The method is not supposed to change the reference.

I have anyway removed the const from the other parameters.

(again, in https://reviews.llvm.org/D70513, not here, but I will update this patch)

105

I'd expect the operation to be predicated, not the individual operands. What would be the meaning of a predicated operand?

This was a requirement from @simoll, he needs to have masking associated to each of the operands.

Can we add an assert somewhere to enforce the assumption that the global predicate is passed as the last operand?

Is is done in https://reviews.llvm.org/D70513

188

this method doesn't do anything more than invoke getVFABIMappings so has no value (other than being public).

Yeah, but the value is int he comment inside of it, stating that other VFShapes can be build outside of a VFABI context. I'd prefer to keep it.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
1848

I understand your concerns in performance here, but I am not keen in doing this. The attribute might contain junk that don't demangle correctly to a VFInfo - in that case the function would return "yes, this call has vector variant", but that wouldn't make sense because there would be no variant. I'd rather play it safe.

fpetrogalli marked 5 inline comments as done.

I have addressed the comments (some of the changes relative to the VFShape API are reflected in https://reviews.llvm.org/D70513).

I have also rebased the code on top of https://reviews.llvm.org/D70513.

Gentle ping after the merge of https://reviews.llvm.org/D70513.

Thank you,

Francesco

ABataev added inline comments.Dec 4 2019, 1:18 PM
llvm/include/llvm/Analysis/VectorUtils.h
98–100

Use \\\ style of comment here

147

No need for \brief

173

const auto &

fpetrogalli marked 2 inline comments as done.

Address review comments from @ABataev.

Thank you,

Francesco

fpetrogalli marked an inline comment as done.Dec 4 2019, 1:57 PM

Thanks Francesco, looks ok to me, but I'll leave to @jdoerfert or @sdesmalen to approve.

This revision is now accepted and ready to land.Dec 10 2019, 5:39 AM
This revision was automatically updated to reflect the committed changes.

Hi @fpetrogalli ,

A question regarding this patch.
For my out-of-tree target vectorization of intrinsics added for my target seems to have stopped working with this patch.
Is there something/what do I have to do to make the vectorizer understand my intrinsics are vectorizable?

Looking at this code in LoopVectorizationLegality.cpp:

// We handle calls that:
//   * Are debug info intrinsics.
//   * Have a mapping to an IR intrinsic.
//   * Have a vector version available.
auto *CI = dyn_cast<CallInst>(&I);
if (CI && !getVectorIntrinsicIDForCall(CI, TLI) &&
    !isa<DbgInfoIntrinsic>(CI) &&
    !(CI->getCalledFunction() && TLI &&
      !VFDatabase::getMappings(*CI).empty())) {

VFDatabase::getMappings(*CI).empty() is indeed true for my intrisic, and if I dig further, I take this return in

static void getVFABIMappings(const CallInst &CI,
                             SmallVectorImpl<VFInfo> &Mappings) {
  const StringRef ScalarName = CI.getCalledFunction()->getName();
  const StringRef S =
      CI.getAttribute(AttributeList::FunctionIndex, VFABI::MappingsAttrName)
          .getValueAsString();
  if (S.empty())
    return;

Is there some existing commit where in-tree targets have been modified already to work with the new VFDatabase?

Thanks!

Hi @uabelho

Hi @fpetrogalli ,

A question regarding this patch.
For my out-of-tree target vectorization of intrinsics added for my target seems to have stopped working with this patch.

Ops... sorry!

Is there something/what do I have to do to make the vectorizer understand my intrinsics are vectorizable?

Yes, you need to add an attribute in the IR that maps the (scalar) attribute to its vector counterpart.

The attribute is called vector-function-abi-variant.

You can add it by using the following method in llvm/include/llvm/Transforms/Utils/ModuleUtils.h:

namespace VFABI {
/// Overwrite the Vector Function ABI variants attribute with the names provide
/// in \p VariantMappings.
void setVectorVariantNames(CallInst *CI,
                           const SmallVector<std::string, 8> &VariantMappings);
} // End VFABI namespace

The VariantMappins are strings that need to be generated according to some Vector Function ABI (VFABI). If your target doesn't have such ABI, you can use the LLVM internal mangling.

For example, say that your attribute is double @llvm.funky.intrinsic (double), and you need to map it to an unmasked vector function with a vectorization factor of two, say custom_vector_function, the string that you need to add in the attribute is __ZGV_LLVM_N2v_llvm.funky.intrinsic(custom_vector_function).

The name mangling rules are admittedly not well documented for the internal mangling, but other than for the ISA token (which is _LLVM_), they correspond to the ones of x86 and AArch64, which are the same (you can browse the latter here: https://github.com/ARM-software/software-standards/blob/master/abi/vfabia64/vfabia64.rst#vector-function-name-mangling)

I will definitely add some docs to explain more in detail the mangling rules, but for the moment you can look at the tests in llvm/unittests/Analysis/VectorFunctionABITest.cpp to get a sense of the meaning of the different tokens in the mangled name, especially the use of the <parameters> in _ZGV<isa><mask><vlen><parameters>_<scalarname>[(<redirection>)].

Looking at this code in LoopVectorizationLegality.cpp:

// We handle calls that:
//   * Are debug info intrinsics.
//   * Have a mapping to an IR intrinsic.
//   * Have a vector version available.
auto *CI = dyn_cast<CallInst>(&I);
if (CI && !getVectorIntrinsicIDForCall(CI, TLI) &&
    !isa<DbgInfoIntrinsic>(CI) &&
    !(CI->getCalledFunction() && TLI &&
      !VFDatabase::getMappings(*CI).empty())) {

VFDatabase::getMappings(*CI).empty() is indeed true for my intrisic, and if I dig further, I take this return in

static void getVFABIMappings(const CallInst &CI,
                             SmallVectorImpl<VFInfo> &Mappings) {
  const StringRef ScalarName = CI.getCalledFunction()->getName();
  const StringRef S =
      CI.getAttribute(AttributeList::FunctionIndex, VFABI::MappingsAttrName)
          .getValueAsString();
  if (S.empty())
    return;

Is there some existing commit where in-tree targets have been modified already to work with the new VFDatabase?

Unless I have been missing something, all targets in the in-tree version are using VFDatabase now. The patch in this revision is what introduced the change.

Thanks!

I hope you find this useful. Let me know if you need more help with this, I am generally available on IRC and discord too.

Francesco

Hi,

Thanks for the reply!

Ok, I think I understand what is happening now at least.

We have a bunch of target intrinsics that we say are vectorizable, but we don't provide a name of the vector version of the intrinsic.

This meant that before this patch

LoopVectorizationLegality::canVectorizeInstrs()

accepted to vectorize the loop since

TLI->isFunctionVectorizable(CI->getCalledFunction()->getName())

returned true for the intrinsic.

Then in LoopVectorizationCostModel::getVectorCallCost we decided that the call to the intrinsic should be scalarized, since

TLI->isFunctionVectorizable(FnName, VF)

returned false.

So the loop was vectorized, but we got VF calls to the scalar version of the intrinsic, just as we wanted.

However, with this patch, the check in

LoopVectorizationLegality::canVectorizeInstrs()

now says false, since we do

VFDatabase::getMappings(*CI).empty()

and we indeed get empty mappings since we don't provide any vector version.

So the presence of an intrinsic that we don't provide a vector version for prevents vectorization of the entire loop, even if it would we totally ok to do VF calls to the scalar version instead.

Is this change in behavior intended?

fhahn added a comment.Dec 12 2019, 2:23 AM

Hi,

Thanks for the reply!

Ok, I think I understand what is happening now at least.

We have a bunch of target intrinsics that we say are vectorizable, but we don't provide a name of the vector version of the intrinsic.

This meant that before this patch

LoopVectorizationLegality::canVectorizeInstrs()

accepted to vectorize the loop since

TLI->isFunctionVectorizable(CI->getCalledFunction()->getName())

returned true for the intrinsic.

Then in LoopVectorizationCostModel::getVectorCallCost we decided that the call to the intrinsic should be scalarized, since

TLI->isFunctionVectorizable(FnName, VF)

returned false.

So the loop was vectorized, but we got VF calls to the scalar version of the intrinsic, just as we wanted.

However, with this patch, the check in

LoopVectorizationLegality::canVectorizeInstrs()

now says false, since we do

VFDatabase::getMappings(*CI).empty()

and we indeed get empty mappings since we don't provide any vector version.

So the presence of an intrinsic that we don't provide a vector version for prevents vectorization of the entire loop, even if it would we totally ok to do VF calls to the scalar version instead.

Is this change in behavior intended?

This change in behavior sounds a bit worrying for down-stream targets.

I think we should have at least an assertion making sure the check in LoopVEctorizationLegality succeeds in all cases it did previously with isFunctionVectorizable, otherwise down-stream targets will silently miss out on vectorization. But I think ideally the patch would preserve the existing behavior in the cases @uabelho described.

[...]

Is this change in behavior intended?

This change in behavior sounds a bit worrying for down-stream targets.

I think we should have at least an assertion making sure the check in LoopVEctorizationLegality succeeds in all cases it did previously with isFunctionVectorizable, otherwise down-stream targets will silently miss out on vectorization.

Yes that's exactly what happened to us. Or well, "silently", since our benchmark numbers went crazy after the merge :)

But I think ideally the patch would preserve the existing behavior in the cases @uabelho described.

Sounds good!

...

This meant that before this patch

LoopVectorizationLegality::canVectorizeInstrs()

accepted to vectorize the loop since

TLI->isFunctionVectorizable(CI->getCalledFunction()->getName())

returned true for the intrinsic.

Then in LoopVectorizationCostModel::getVectorCallCost we decided that the call to the intrinsic should be scalarized, since

TLI->isFunctionVectorizable(FnName, VF)

returned false.

Ah, OK. I see what you mean. I think we could solve this by scalarizing after vectorization, instead of doing it in the vectorizer.

We could have a IR pass that runs after the vectorizer that looks for intrinsics calls .When it sees an intrinsics that operates on vector, it checks whether the target is able to lower it to some function or instruction. If not, it scalarizes it. With this we wouldn't have to introduce special behavior in the vectorizer for handling intrinsics: it could just vectorize any call to intrinsics for which vectorization make sense.

We have a bunch of target intrinsics that we say are vectorizable, but we don't provide a name of the vector version of the intrinsic.

You can still use the name mangling of the VFABI attribute (for internal LLVM mangling) to map a scalar attribute to its vector version:

_ZGV_LLVM_N2v_llvm.custom.attribute(llvm.custom.attribute)

With this, the VFDatabase::getMappings(*CI).empty() woudl return false, so that the intrinsic would be vectorized as a regular function. Then, you could use the post-vectorization pass I mentioned in the previous message to scalarize it.

The mappings in the IR could be added by the frontend, or in a pre-vectorization pass if you don't want to touch the frontend.

fhahn added a comment.Dec 12 2019, 9:30 AM

It sounds like resolving this will require some extra thought. It would probably be good to revert this patch until then.

It sounds like resolving this will require some extra thought. It would probably be good to revert this patch until then.

I am happy to do so. Shall I just revert it in git and push the change, or is there a formal way to do it via phabricator or arc?

@uabelho: would it be possible for you to provide me a minimal reproducer that I could use to craft the (wip) solution I have in mind?

Thank you,

Francesco

It sounds like resolving this will require some extra thought. It would probably be good to revert this patch until then.

I am happy to do so. Shall I just revert it in git and push the change, or is there a formal way to do it via phabricator or arc?

Just pushing the revert in git is fine.

@uabelho: would it be possible for you to provide me a minimal reproducer that I could use to craft the (wip) solution I have in mind?

I'm not sure what kind of reproducer you expect since my reproducer requires our out-of-tree target and intrinsics but I can at least try to show something.

What we have in our target is that in initialize() in TargetLibraryInfo.cpp we add our target intrinsics that we allow in vectorized loops.
So we have like:

const VecDesc VecIntrinsics[] = {
  {"llvm.phx.abs.i32", "", 4}
};

TLI.addVectorizableFunctions(VecIntrinsics);

where we say that it's ok to vectorize a loop containing a call to the intrinsic llvm.phx.abs.i32, but we don't provide a vector version that should be used when it's vectorized.

I think in-tree targets did like this before, I'm not sure if they do anymore or if that has changed now.

Then if I run -loop-vectorize on the following input

define i32 @f() {
  br label %bb1

bb1:                                              ; preds = %bb1, %0
  %sum = phi i32 [ 0, %0 ], [ %sum_next, %bb1 ]
  %i = phi i16 [ 0, %0 ], [ %i_inc, %bb1 ]
  %call = tail call i32 @llvm.phx.abs.i32(i32 0)
  %sum_next = add i32 %sum, %call
  %i_inc = add nuw nsw i16 %i, 1
  %exit = icmp eq i16 %i_inc, 100
  br i1 %exit, label %bb3, label %bb1

bb3:                                              ; preds = %bb1
  ret i32 %sum_next
}

declare i32 @llvm.phx.abs.i32(i32)

I used to get a vectorized loop like

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %vec.phi = phi <4 x i32> [ zeroinitializer, %vector.ph ], [ %10, %vector.body ]
  %offset.idx = trunc i32 %index to i16
  %broadcast.splatinsert = insertelement <4 x i16> undef, i16 %offset.idx, i32 0
  %broadcast.splat = shufflevector <4 x i16> %broadcast.splatinsert, <4 x i16> undef, <4 x i32> zeroinitializer
  %induction = add <4 x i16> %broadcast.splat, <i16 0, i16 1, i16 2, i16 3>
  %1 = add i16 %offset.idx, 0
  %2 = tail call i32 @llvm.phx.abs.i32(i32 0)
  %3 = tail call i32 @llvm.phx.abs.i32(i32 0)
  %4 = tail call i32 @llvm.phx.abs.i32(i32 0)
  %5 = tail call i32 @llvm.phx.abs.i32(i32 0)
  %6 = insertelement <4 x i32> undef, i32 %2, i32 0
  %7 = insertelement <4 x i32> %6, i32 %3, i32 1
  %8 = insertelement <4 x i32> %7, i32 %4, i32 2
  %9 = insertelement <4 x i32> %8, i32 %5, i32 3
  %10 = add <4 x i32> %vec.phi, %9
  %index.next = add i32 %index, 4
  %11 = icmp eq i32 %index.next, 100
  br i1 %11, label %middle.block, label %vector.body, !llvm.loop !0

but with this patch LoopVectorizationLegality bails out with

LV: Not vectorizing: Found a non-intrinsic callsite   %call = tail call i32 @llvm.phx.abs.i32(i32 0)

Right now I've done a hacky workaround in LoopVectorizationLegality to get the old behavior for our target so we still get vectorization for the above case:

@@ -704,7 +704,12 @@ bool LoopVectorizationLegality::canVectorizeInstrs() {
       if (CI && !getVectorIntrinsicIDForCall(CI, TLI) &&
           !isa<DbgInfoIntrinsic>(CI) &&
           !(CI->getCalledFunction() && TLI &&
-            !VFDatabase::getMappings(*CI).empty())) {
+            (!VFDatabase::getMappings(*CI).empty() ||
+             // Hack: Allow vectorization even if we didn't provide
+             // a vector version of the intrinsic.
+             (CI->getParent()->getModule()->isTargetPhoenix() &&
+              TLI->isFunctionVectorizable(CI->getCalledFunction()
+                                          ->getName()))))) {
uabelho added a subscriber: bjope.Dec 13 2019, 12:14 AM

Ah, OK. I see what you mean. I think we could solve this by scalarizing after vectorization, instead of doing it in the vectorizer.

We could have a IR pass that runs after the vectorizer that looks for intrinsics calls .When it sees an intrinsics that operates on vector, it checks whether the target is able to lower it to some function or instruction. If not, it scalarizes it. With this we wouldn't have to introduce special behavior in the vectorizer for handling intrinsics: it could just vectorize any call to intrinsics for which vectorization make sense.

That would mean we would have to introduce vector versions of of all intrinsics, that would just be used between the vectorizer and the scalarizer, right? Sounds a little bit cumbersome since those vector versions don't exist today, but perhaps it's the best way anyway, I don't know.

Btw, in case you didn't know, there is already an existing scalarizer pass, though I don't think it's widely used.

fpetrogalli added a comment.EditedDec 13 2019, 1:37 PM

@uabelho: would it be possible for you to provide me a minimal reproducer that I could use to craft the (wip) solution I have in mind?

I'm not sure what kind of reproducer you expect since my reproducer requires our out-of-tree target and intrinsics but I can at least try to show something.

What we have in our target is that in initialize() in TargetLibraryInfo.cpp we add our target intrinsics that we allow in vectorized loops.
So we have like:

const VecDesc VecIntrinsics[] = {
  {"llvm.phx.abs.i32", "", 4}
};

TLI.addVectorizableFunctions(VecIntrinsics);

where we say that it's ok to vectorize a loop containing a call to the intrinsic llvm.phx.abs.i32, but we don't provide a vector version that should be used when it's vectorized.

I think in-tree targets did like this before, I'm not sure if they do anymore or if that has changed now.

I cannot find anything like that in the current TargetLibraryInfo.cpp.

Then if I run -loop-vectorize on the following input

define i32 @f() {
  br label %bb1

bb1:                                              ; preds = %bb1, %0
  %sum = phi i32 [ 0, %0 ], [ %sum_next, %bb1 ]
  %i = phi i16 [ 0, %0 ], [ %i_inc, %bb1 ]
  %call = tail call i32 @llvm.phx.abs.i32(i32 0)
  %sum_next = add i32 %sum, %call
  %i_inc = add nuw nsw i16 %i, 1
  %exit = icmp eq i16 %i_inc, 100
  br i1 %exit, label %bb3, label %bb1

bb3:                                              ; preds = %bb1
  ret i32 %sum_next
}

declare i32 @llvm.phx.abs.i32(i32)

I used to get a vectorized loop like

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %vec.phi = phi <4 x i32> [ zeroinitializer, %vector.ph ], [ %10, %vector.body ]
  %offset.idx = trunc i32 %index to i16
  %broadcast.splatinsert = insertelement <4 x i16> undef, i16 %offset.idx, i32 0
  %broadcast.splat = shufflevector <4 x i16> %broadcast.splatinsert, <4 x i16> undef, <4 x i32> zeroinitializer
  %induction = add <4 x i16> %broadcast.splat, <i16 0, i16 1, i16 2, i16 3>
  %1 = add i16 %offset.idx, 0
  %2 = tail call i32 @llvm.phx.abs.i32(i32 0)
  %3 = tail call i32 @llvm.phx.abs.i32(i32 0)
  %4 = tail call i32 @llvm.phx.abs.i32(i32 0)
  %5 = tail call i32 @llvm.phx.abs.i32(i32 0)
  %6 = insertelement <4 x i32> undef, i32 %2, i32 0
  %7 = insertelement <4 x i32> %6, i32 %3, i32 1
  %8 = insertelement <4 x i32> %7, i32 %4, i32 2
  %9 = insertelement <4 x i32> %8, i32 %5, i32 3
  %10 = add <4 x i32> %vec.phi, %9
  %index.next = add i32 %index, 4
  %11 = icmp eq i32 %index.next, 100
  br i1 %11, label %middle.block, label %vector.body, !llvm.loop !0

but with this patch LoopVectorizationLegality bails out with

LV: Not vectorizing: Found a non-intrinsic callsite   %call = tail call i32 @llvm.phx.abs.i32(i32 0)

Right now I've done a hacky workaround in LoopVectorizationLegality to get the old behavior for our target so we still get vectorization for the above case:

@@ -704,7 +704,12 @@ bool LoopVectorizationLegality::canVectorizeInstrs() {
       if (CI && !getVectorIntrinsicIDForCall(CI, TLI) &&
           !isa<DbgInfoIntrinsic>(CI) &&
           !(CI->getCalledFunction() && TLI &&
-            !VFDatabase::getMappings(*CI).empty())) {
+            (!VFDatabase::getMappings(*CI).empty() ||
+             // Hack: Allow vectorization even if we didn't provide
+             // a vector version of the intrinsic.
+             (CI->getParent()->getModule()->isTargetPhoenix() &&
+              TLI->isFunctionVectorizable(CI->getCalledFunction()
+                                          ->getName()))))) {

Hi @uabelho and @fhahn ,

I have reverted the change to avoid disruption in your work.

@uabelho, the example you posted here is very useful, I will send you a modified version of the code for review, so that you can verify it works for you.

Kind regards,

Francesco

@uabelho ,

I am working on a solution for the problem you reported.

So far, I have decided to use a special name in the mappings scalar-to-vector that informs the compiler that the function that is being vectorized by the vectorizer should be scalarized after vectorization.
That would require modifying the mappings from

const VecDesc VecIntrinsics[] = {
  {"llvm.phx.abs.i32", "", 4}
};

to the following form:

const VecDesc VecIntrinsics[] = {
  {"llvm.phx.abs.i32", "__LLVM_scalarize__", 4}
};

The method bool VFDatabase::shouldFunctionScalarize(VFShape Shape) would be used to test such situation.

Would that work for you?

Kind regards,

Francesco

Would that work for you?

That sounds ok to me.

(Note that I don't really know this code, so I don't know if this is acceptable from a general design point of view. But for my target, changing the mappings in that way is ok.)

Thanks!