This patch introduced the VFDatabase, the framework proposed in http://lists.llvm.org/pipermail/llvm-dev/2019-June/133484.html. [*] In this patch the VFDatabase is used to bridge the TargetLibraryInfo (TLI) calls that were previously used to query for the availability of vector counterparts of scalar functions. The VFISAKind field `ISA` of VFShape have been moved into into VFInfo, under the assumption that different vector ISAs may provide the same vector signature. At the moment, the vectorizer accepts any of the available ISAs as long as the signature provided by the VFDatabase matches the one expected in the vectorization process. For example, when targeting AVX or AVX2, which both have 256-bit registers, the IR signature of the two vector functions associated to the two ISAs is the same. The `getVectorizedFunction` method at the moment returns the first available match. We will need to add more heuristics to the search system to decide which of the available version (TLI, AVX, AVX2, ...) the system should prefer, when multiple versions with the same VFShape are present. Some of the code in this patch is based on the work done by Sumedh Arani in https://reviews.llvm.org/D66025. [*] Notice that in the proposal the VFDatabase was called SVFS. The name VFDatabase is more in line with LLVM recommendations for naming classes and variables. Differential Revision: https://reviews.llvm.org/D67572
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This is a follow up patch to https://reviews.llvm.org/D66024
It is a WIP becase I want to seek feedback before doing a full implementation with tests.
The current tests/unittests pass without failures, so at least it seems I haven't broken anything :)
I am in particular interested to know what you think about the way I have deferred the question "isFunctionVectorizable" and related "get" method from the TargetLibraryInfo (TLI). I have marched such methods as private, so that with this patch the only way to ask such questions around vectorizable function is to go through the SearchVFSystem class.
Notice that this patch doesn't implement the interface explained in the RFC at http://lists.llvm.org/pipermail/llvm-dev/2019-June/133484.html (yet).
One more thing: in the RFC we decided to pass the CallInst to the methods of the class instead of the Function definition, because we wanted to make the attribute vector-function-abi-variant to be local to the call site, not the declaration. Unfortunately, I couldn't find a way to attach an attribute to a CallInst. Is this the case? Or have I missed somethign around Instructions and attributes?
Kind regards,
Francesco
I'm fine with the general idea. The non-WIP patch needs documentation and a test for the new functionality (non-TLI-based selection).
Could you add it to the attribute list of the call site at AttributeList::FunctionIndex?
Thank you for the suggestion. I have looked into this but it seems that adding an attribute to a CallInst in the AttributeList::FunctionIndex has the effect of juts adding the attribute to the function being called.
The attribute lists are different. You might have added it to the function one but you can add it to the CallBase one as well.
llvm/include/llvm/Analysis/VectorUtils.h | ||
---|---|---|
57 | I have added this to make it work with the TLI. An alternative would be to add an extra field to the VecDesc instances stored in the TLI to hold also the ISA information. I am happy to do so (or to investigate any other alternative approach) is the solution proposed in this patch is not convincing. |
llvm/include/llvm/Analysis/VectorUtils.h | ||
---|---|---|
246 | The VFISAKind::LLVM_INTERNAL_TLI value, at the moment, is restricting the use of the SVFS to only those functions that are listed in the TLI. This makes me wonder whether it would be better to move the VFISAKind from the VFShape to the VFInfo. This could be justified under the assumption that a scalar function can be mapped to multiple vector functions with the same vectorization shape (same FunctionType), just with a different underlying ISA. Any thoughts? |
llvm/include/llvm/Analysis/VectorUtils.h | ||
---|---|---|
221–223 | This method fixUpVFABINames could be invoked once directly in the LoopVectorizer (runOnFunction), so that:
This seems a better solution to me - any opinions? |
llvm/include/llvm/Analysis/TargetLibraryInfo.h | ||
---|---|---|
201 | Documentation missing, and please do not call it "fixup", maybe "adjust" or something but not "fixup". | |
262 | Why do you privatize these redirects? I would expect them to be used, thus public, or unused, thus deleted. | |
llvm/include/llvm/Analysis/VectorUtils.h | ||
55 | I generally dislike non-scoped macros. Is there a reason a static constexpr char* doesn't work? | |
132–156 | I doubt this change is necessary. Only if you need to own the underlying data you need to replace StringRef with std::string. | |
164 | I'd prefer void setVectorVariantNames(CallInst *CI, const SmallSet<std::string, 8> &VariantMappings); |
llvm/include/llvm/Analysis/TargetLibraryInfo.h | ||
---|---|---|
262 | I am privatizing these because isFunctionVectorizable and getVectorizedFunction to be asked to any component other than the SearchVFSystem introduced in this patch. |
After the last rework, I think this patch deserves a bit of a summary.
The patch as it is should be split in at least the following components, that needs to be done one by one in sequence:
- Methods in the VFABI namespace to read and set the vector-function-abi-variant attribute. In the current patch, this is done with the methods [get|set]VectorVariantNames.
- An early pass to be run early in the opt pass sequence, at least before the LoopVectorize pass and any of the analysis passes needed for vectorization, that populates the module with the attribute that describes the TLI mappings. This is currently _hacked_ in this patch by invoking addMappingsFromTLI in LoopVectorizer.cpp.
- Once we have the TLI mappings in the IR, we can add the SearchVFSystem and use it's search mechanism to perform vectorization (an example of how this works is already in this patch)
Now, once we have this first item in places, we can extend the getVectorizedFunction interface to be more clever, and ask questions also areound properties of the vector function like "linear" and "uniform" parameters.
I would like to ask to the reviewers if this implementation sequence I propose makes sense. Please le t me know if you have any question.
Also, I have added a couple of people that I know have opinion on the vectorizers. I am sure there are more out there, but the list of contributors to all the places in LLVM that this patch touches is quite extended and it is difficult for me to make a choice.
I have implemented the first item of the breakdown described in https://reviews.llvm.org/D67572#1736230 in patch https://reviews.llvm.org/D69976 - read/write methods for the VFABI attribute.
Thanks all,
Francesco
This is an update of the code after extracting the read/write function of the VFABI attribute in https://reviews.llvm.org/D69976.
llvm/include/llvm/Analysis/VectorUtils.h | ||
---|---|---|
218 | Arguably, you could just call getVFMappings and cache the result. Whoever created a SearchVFSystem will need the values eventually and this way the isFucntionVectorizable call is free. CI->getModule() The explicit and delete seems a lot of overhead, make the member a CallInst &CI and you should not need any of it. | |
260 | Typos in the above TODO. | |
llvm/lib/Analysis/LoopAccessAnalysis.cpp | ||
2428–2429 | Leftover? | |
llvm/lib/Analysis/TargetLibraryInfo.cpp | ||
22 ↗ | (On Diff #228337) | Leftover? |
Changes:
- Update the patch to use the current version of the VFABI read/write methods at https://reviews.llvm.org/D69976
- Address review from @jdoerfert
Note: in the next step I will extract from this patch the changes that are needed for the internal ISA for the TLI functions.
I have rebased the changes after the addition of the pass for injecting the TLI calls.
The vectorization process itself works, but I don't undertstand why a
bunch of tests are failing - I must have done somethign wrong in the
initialization of the pass.
Most (if not all) failures assert on the following:
Unable to schedule 'Scalar Evolution Analysis' required by 'Loop Vectorization' Unable to schedule pass UNREACHABLE executed at /home/frapet01/projects/upstream-clang/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1289!
Down to two failures.
Failing Tests (2):
LLVM :: Other/pass-pipelines.ll LLVM :: Transforms/SimplifyCFG/HoistCode.ll
The latter probably caused by a recent rebase.
The former is a genuine failure of this patch.
In this patch:
- The SearchVFSystem has been renamed to VFDatabase.
- The query interface of the VFDatabase has been reduced to only two functions.
- The field VFIsaKind ISA of VFShape has been moved to VFInfo. This change is justified in the new commit message.
Thanks @fpetrogalli for updating this patch!
Down to two failures.
Failing Tests (2):
Just checking, are these failures resolved in the latest revision?
llvm/include/llvm/Analysis/VectorUtils.h | ||
---|---|---|
119 | nit: retrive -> retrieve | |
124 | I don't really understand what you mean by a "flat" vectorization shape. Is this function supposed to return a 'widened' (vector version of a) function? If so, can we please rename this to getVectorShapeForCall ? | |
127 | nit: unnecessary curly braces | |
130 | is HasGlobalPred the same as isPredicated? Also, is the predicate always known/expected to be the last parameter by the Vector ABI? | |
167 | nit: in a class members are private by default. | |
168 | nit: put comments above variable | |
177 | StringRef ? | |
186 | nit: These two if-statements can be merged with a && | |
188 | should CI.getModule()->getFunction(Shape.getValue().VectorName) be an assert? When would it ever happen that the vector function is not declared * in the IR? *note that I'm specifically saying declared and not defined here. | |
215 | Is it worth implementing a operator< for VFShape and sorting the result for getMappings()? | |
llvm/lib/Analysis/LoopAccessAnalysis.cpp | ||
1847 | getMappings is quite an expensive call, so you'll want to add a special function here that bails out earlier, and doesn't have to demangle string and populate (and possibly sort) an array. | |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
3214 | the call to isFunctionVectorizable is expensive, so please reorder this after CI->isNoBuiltin(), so that the function can bail out more cheaply. |
llvm/include/llvm/Analysis/VectorUtils.h | ||
---|---|---|
124 | Yeah, the name is misleading. This is supposed to return a widenened version of the function... but I don't like the name getVectorShapeForCall, because a Shape for a Vector Call can have Linear modifiers for example, while this function is returning only the shape that uses VFParamKind::Vector for all VFPArameters. How about getAllVectorsShape? |
Yes, all good:
[1440/1440] Running the LLVM regression tests Testing Time: 269.21s Expected Passes : 33842 Expected Failures : 149 Unsupported Tests : 451
Address review comments from @sdesmalen. Thank you!
llvm/include/llvm/Analysis/VectorUtils.h | ||
---|---|---|
124 | I explained the meaning of the function in the comment. | |
124 | I have renamed to VFShape::getAllVectorsParams. | |
130 |
Th GlobalPredicate is listed in the VFParamKind enum class: GlobalPredicate, // Global logical predicate that acts on all lanes // of the input and output mask concurrently. For // example, it is implied by the `M` token in the // Vector Function ABI mangled name. It is a special predicate because it differ from the parameter masks that can be individually attached to the parameters. These kind of masks are not handled yet by the VFParamKind, but I understand that they were needed by @simoll when discussing the RFC.
For all the Vector Function ABIs supported at the moment in LLVM, yes. If vendor X decides to produce an ABI were the mask is the first parameter, we will have to change this code. But for now, we will assume the global predicate is the last parameter. | |
188 |
Yes, done. In fact, and IR where there is a VFABI attribute with a mapping to function X, but doesn't have X declared, shoudl be considered broken.
You are reaching enlightenment! :) | |
215 | sort them by what field? VF? Also, I don't expect an attribute to hold more than 8 functions , which seems to be the worst case scenario when all X86 vector extensions are being used... are you sure you want to add such optimization. How about we leave it as it is (slow but simple) and leave optimizations for the future if it turns out we need to speed up the search? |
I have optimized the getVFABIMappings method of VFDatabase with an
early exit if the VFABI attribute is empty.
llvm/lib/Analysis/LoopAccessAnalysis.cpp | ||
---|---|---|
1847 | Done, check the implementation of getVFABIMappings. |
llvm/include/llvm/Analysis/VectorUtils.h | ||
---|---|---|
124 | I have extracted the VFShape API here: https://reviews.llvm.org/D70513 (it is a work in progress as of now, I might finish up things later today). |
llvm/include/llvm/Analysis/VectorUtils.h | ||
---|---|---|
124 | I think this can take FunctionType instead of CallInst.
nit: I know this is bikeshedding, but what do you think of VFShape::widenAllParams? | |
125 | these parameters shouldn't use const. | |
125 | Is there a reason to pass VF and IsScalable separately, instead of passing an ElementCount ? | |
130 |
I'd expect the operation to be predicated, not the individual operands. What would be the meaning of a predicated operand?
Can we add an assert somewhere to enforce the assumption that the global predicate is passed as the last operand? | |
195 | Please add a message to the assert. | |
203 | this method doesn't do anything more than invoke getVFABIMappings so has no value (other than being public). | |
215 | Alright, if it only has a few elements and is constructed to do a single lookup it is probably not worth the overhead of sorting. | |
llvm/lib/Analysis/LoopAccessAnalysis.cpp | ||
1847 | The early exit in getMappings doesn't stop a SmallVector from having to be created/destroyed. It would be better to create a new method such as bool hasVectorVariants() that answers the question directly. |
llvm/include/llvm/Analysis/VectorUtils.h | ||
---|---|---|
124 |
Yes, but... what's the point? We will have to introduce one or two extra calls to get the FunctionType... All we need is the number of arguments of the function, so in fact the function could just take an unsigned integer. But I don't like it. The fact that I use CallInst is cleaner and not worse than having an optimized interface. But this is my personal preference, so if you ask nicely :P I will change the interface.
I know where you come from, the Vectorizer :). I'd rather not, I really want to have a static get method attached to the VFShape. It seems to be in the style of LLVM to use get for static public member functions that build objects (see http://llvm.org/doxygen/classllvm_1_1VectorType.html). Since this is the only get method of VFShape, I have renamed it to get. For the record, the changes have been applied to https://reviews.llvm.org/D70513 This function this disappear from this revision. | |
124 | Please take any other further review of getAllVectorParams to https://reviews.llvm.org/D70513 | |
125 |
I disagree, especially for the CI argument. The method is not supposed to change the reference. I have anyway removed the const from the other parameters. (again, in https://reviews.llvm.org/D70513, not here, but I will update this patch) | |
130 |
This was a requirement from @simoll, he needs to have masking associated to each of the operands.
Is is done in https://reviews.llvm.org/D70513 | |
203 |
Yeah, but the value is int he comment inside of it, stating that other VFShapes can be build outside of a VFABI context. I'd prefer to keep it. | |
llvm/lib/Analysis/LoopAccessAnalysis.cpp | ||
1847 | I understand your concerns in performance here, but I am not keen in doing this. The attribute might contain junk that don't demangle correctly to a VFInfo - in that case the function would return "yes, this call has vector variant", but that wouldn't make sense because there would be no variant. I'd rather play it safe. |
I have addressed the comments (some of the changes relative to the VFShape API are reflected in https://reviews.llvm.org/D70513).
I have also rebased the code on top of https://reviews.llvm.org/D70513.
Thanks Francesco, looks ok to me, but I'll leave to @jdoerfert or @sdesmalen to approve.
Hi @fpetrogalli ,
A question regarding this patch.
For my out-of-tree target vectorization of intrinsics added for my target seems to have stopped working with this patch.
Is there something/what do I have to do to make the vectorizer understand my intrinsics are vectorizable?
Looking at this code in LoopVectorizationLegality.cpp:
// We handle calls that: // * Are debug info intrinsics. // * Have a mapping to an IR intrinsic. // * Have a vector version available. auto *CI = dyn_cast<CallInst>(&I); if (CI && !getVectorIntrinsicIDForCall(CI, TLI) && !isa<DbgInfoIntrinsic>(CI) && !(CI->getCalledFunction() && TLI && !VFDatabase::getMappings(*CI).empty())) {
VFDatabase::getMappings(*CI).empty() is indeed true for my intrisic, and if I dig further, I take this return in
static void getVFABIMappings(const CallInst &CI, SmallVectorImpl<VFInfo> &Mappings) { const StringRef ScalarName = CI.getCalledFunction()->getName(); const StringRef S = CI.getAttribute(AttributeList::FunctionIndex, VFABI::MappingsAttrName) .getValueAsString(); if (S.empty()) return;
Is there some existing commit where in-tree targets have been modified already to work with the new VFDatabase?
Thanks!
Hi @uabelho
Ops... sorry!
Is there something/what do I have to do to make the vectorizer understand my intrinsics are vectorizable?
Yes, you need to add an attribute in the IR that maps the (scalar) attribute to its vector counterpart.
The attribute is called vector-function-abi-variant.
You can add it by using the following method in llvm/include/llvm/Transforms/Utils/ModuleUtils.h:
namespace VFABI { /// Overwrite the Vector Function ABI variants attribute with the names provide /// in \p VariantMappings. void setVectorVariantNames(CallInst *CI, const SmallVector<std::string, 8> &VariantMappings); } // End VFABI namespace
The VariantMappins are strings that need to be generated according to some Vector Function ABI (VFABI). If your target doesn't have such ABI, you can use the LLVM internal mangling.
For example, say that your attribute is double @llvm.funky.intrinsic (double), and you need to map it to an unmasked vector function with a vectorization factor of two, say custom_vector_function, the string that you need to add in the attribute is __ZGV_LLVM_N2v_llvm.funky.intrinsic(custom_vector_function).
The name mangling rules are admittedly not well documented for the internal mangling, but other than for the ISA token (which is _LLVM_), they correspond to the ones of x86 and AArch64, which are the same (you can browse the latter here: https://github.com/ARM-software/software-standards/blob/master/abi/vfabia64/vfabia64.rst#vector-function-name-mangling)
I will definitely add some docs to explain more in detail the mangling rules, but for the moment you can look at the tests in llvm/unittests/Analysis/VectorFunctionABITest.cpp to get a sense of the meaning of the different tokens in the mangled name, especially the use of the <parameters> in _ZGV<isa><mask><vlen><parameters>_<scalarname>[(<redirection>)].
Looking at this code in LoopVectorizationLegality.cpp:
// We handle calls that: // * Are debug info intrinsics. // * Have a mapping to an IR intrinsic. // * Have a vector version available. auto *CI = dyn_cast<CallInst>(&I); if (CI && !getVectorIntrinsicIDForCall(CI, TLI) && !isa<DbgInfoIntrinsic>(CI) && !(CI->getCalledFunction() && TLI && !VFDatabase::getMappings(*CI).empty())) {VFDatabase::getMappings(*CI).empty() is indeed true for my intrisic, and if I dig further, I take this return in
static void getVFABIMappings(const CallInst &CI, SmallVectorImpl<VFInfo> &Mappings) { const StringRef ScalarName = CI.getCalledFunction()->getName(); const StringRef S = CI.getAttribute(AttributeList::FunctionIndex, VFABI::MappingsAttrName) .getValueAsString(); if (S.empty()) return;Is there some existing commit where in-tree targets have been modified already to work with the new VFDatabase?
Unless I have been missing something, all targets in the in-tree version are using VFDatabase now. The patch in this revision is what introduced the change.
Thanks!
I hope you find this useful. Let me know if you need more help with this, I am generally available on IRC and discord too.
Francesco
Hi,
Thanks for the reply!
Ok, I think I understand what is happening now at least.
We have a bunch of target intrinsics that we say are vectorizable, but we don't provide a name of the vector version of the intrinsic.
This meant that before this patch
LoopVectorizationLegality::canVectorizeInstrs()
accepted to vectorize the loop since
TLI->isFunctionVectorizable(CI->getCalledFunction()->getName())
returned true for the intrinsic.
Then in LoopVectorizationCostModel::getVectorCallCost we decided that the call to the intrinsic should be scalarized, since
TLI->isFunctionVectorizable(FnName, VF)
returned false.
So the loop was vectorized, but we got VF calls to the scalar version of the intrinsic, just as we wanted.
However, with this patch, the check in
LoopVectorizationLegality::canVectorizeInstrs()
now says false, since we do
VFDatabase::getMappings(*CI).empty()
and we indeed get empty mappings since we don't provide any vector version.
So the presence of an intrinsic that we don't provide a vector version for prevents vectorization of the entire loop, even if it would we totally ok to do VF calls to the scalar version instead.
Is this change in behavior intended?
This change in behavior sounds a bit worrying for down-stream targets.
I think we should have at least an assertion making sure the check in LoopVEctorizationLegality succeeds in all cases it did previously with isFunctionVectorizable, otherwise down-stream targets will silently miss out on vectorization. But I think ideally the patch would preserve the existing behavior in the cases @uabelho described.
Yes that's exactly what happened to us. Or well, "silently", since our benchmark numbers went crazy after the merge :)
But I think ideally the patch would preserve the existing behavior in the cases @uabelho described.
Sounds good!
Ah, OK. I see what you mean. I think we could solve this by scalarizing after vectorization, instead of doing it in the vectorizer.
We could have a IR pass that runs after the vectorizer that looks for intrinsics calls .When it sees an intrinsics that operates on vector, it checks whether the target is able to lower it to some function or instruction. If not, it scalarizes it. With this we wouldn't have to introduce special behavior in the vectorizer for handling intrinsics: it could just vectorize any call to intrinsics for which vectorization make sense.
We have a bunch of target intrinsics that we say are vectorizable, but we don't provide a name of the vector version of the intrinsic.
You can still use the name mangling of the VFABI attribute (for internal LLVM mangling) to map a scalar attribute to its vector version:
_ZGV_LLVM_N2v_llvm.custom.attribute(llvm.custom.attribute)
With this, the VFDatabase::getMappings(*CI).empty() woudl return false, so that the intrinsic would be vectorized as a regular function. Then, you could use the post-vectorization pass I mentioned in the previous message to scalarize it.
The mappings in the IR could be added by the frontend, or in a pre-vectorization pass if you don't want to touch the frontend.
It sounds like resolving this will require some extra thought. It would probably be good to revert this patch until then.
I am happy to do so. Shall I just revert it in git and push the change, or is there a formal way to do it via phabricator or arc?
@uabelho: would it be possible for you to provide me a minimal reproducer that I could use to craft the (wip) solution I have in mind?
Thank you,
Francesco
@uabelho: would it be possible for you to provide me a minimal reproducer that I could use to craft the (wip) solution I have in mind?
I'm not sure what kind of reproducer you expect since my reproducer requires our out-of-tree target and intrinsics but I can at least try to show something.
What we have in our target is that in initialize() in TargetLibraryInfo.cpp we add our target intrinsics that we allow in vectorized loops.
So we have like:
const VecDesc VecIntrinsics[] = { {"llvm.phx.abs.i32", "", 4} }; TLI.addVectorizableFunctions(VecIntrinsics);
where we say that it's ok to vectorize a loop containing a call to the intrinsic llvm.phx.abs.i32, but we don't provide a vector version that should be used when it's vectorized.
I think in-tree targets did like this before, I'm not sure if they do anymore or if that has changed now.
Then if I run -loop-vectorize on the following input
define i32 @f() { br label %bb1 bb1: ; preds = %bb1, %0 %sum = phi i32 [ 0, %0 ], [ %sum_next, %bb1 ] %i = phi i16 [ 0, %0 ], [ %i_inc, %bb1 ] %call = tail call i32 @llvm.phx.abs.i32(i32 0) %sum_next = add i32 %sum, %call %i_inc = add nuw nsw i16 %i, 1 %exit = icmp eq i16 %i_inc, 100 br i1 %exit, label %bb3, label %bb1 bb3: ; preds = %bb1 ret i32 %sum_next } declare i32 @llvm.phx.abs.i32(i32)
I used to get a vectorized loop like
vector.body: ; preds = %vector.body, %vector.ph %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ] %vec.phi = phi <4 x i32> [ zeroinitializer, %vector.ph ], [ %10, %vector.body ] %offset.idx = trunc i32 %index to i16 %broadcast.splatinsert = insertelement <4 x i16> undef, i16 %offset.idx, i32 0 %broadcast.splat = shufflevector <4 x i16> %broadcast.splatinsert, <4 x i16> undef, <4 x i32> zeroinitializer %induction = add <4 x i16> %broadcast.splat, <i16 0, i16 1, i16 2, i16 3> %1 = add i16 %offset.idx, 0 %2 = tail call i32 @llvm.phx.abs.i32(i32 0) %3 = tail call i32 @llvm.phx.abs.i32(i32 0) %4 = tail call i32 @llvm.phx.abs.i32(i32 0) %5 = tail call i32 @llvm.phx.abs.i32(i32 0) %6 = insertelement <4 x i32> undef, i32 %2, i32 0 %7 = insertelement <4 x i32> %6, i32 %3, i32 1 %8 = insertelement <4 x i32> %7, i32 %4, i32 2 %9 = insertelement <4 x i32> %8, i32 %5, i32 3 %10 = add <4 x i32> %vec.phi, %9 %index.next = add i32 %index, 4 %11 = icmp eq i32 %index.next, 100 br i1 %11, label %middle.block, label %vector.body, !llvm.loop !0
but with this patch LoopVectorizationLegality bails out with
LV: Not vectorizing: Found a non-intrinsic callsite %call = tail call i32 @llvm.phx.abs.i32(i32 0)
Right now I've done a hacky workaround in LoopVectorizationLegality to get the old behavior for our target so we still get vectorization for the above case:
@@ -704,7 +704,12 @@ bool LoopVectorizationLegality::canVectorizeInstrs() { if (CI && !getVectorIntrinsicIDForCall(CI, TLI) && !isa<DbgInfoIntrinsic>(CI) && !(CI->getCalledFunction() && TLI && - !VFDatabase::getMappings(*CI).empty())) { + (!VFDatabase::getMappings(*CI).empty() || + // Hack: Allow vectorization even if we didn't provide + // a vector version of the intrinsic. + (CI->getParent()->getModule()->isTargetPhoenix() && + TLI->isFunctionVectorizable(CI->getCalledFunction() + ->getName()))))) {
That would mean we would have to introduce vector versions of of all intrinsics, that would just be used between the vectorizer and the scalarizer, right? Sounds a little bit cumbersome since those vector versions don't exist today, but perhaps it's the best way anyway, I don't know.
Btw, in case you didn't know, there is already an existing scalarizer pass, though I don't think it's widely used.
I cannot find anything like that in the current TargetLibraryInfo.cpp.
Then if I run -loop-vectorize on the following input
define i32 @f() { br label %bb1 bb1: ; preds = %bb1, %0 %sum = phi i32 [ 0, %0 ], [ %sum_next, %bb1 ] %i = phi i16 [ 0, %0 ], [ %i_inc, %bb1 ] %call = tail call i32 @llvm.phx.abs.i32(i32 0) %sum_next = add i32 %sum, %call %i_inc = add nuw nsw i16 %i, 1 %exit = icmp eq i16 %i_inc, 100 br i1 %exit, label %bb3, label %bb1 bb3: ; preds = %bb1 ret i32 %sum_next } declare i32 @llvm.phx.abs.i32(i32)I used to get a vectorized loop like
vector.body: ; preds = %vector.body, %vector.ph %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ] %vec.phi = phi <4 x i32> [ zeroinitializer, %vector.ph ], [ %10, %vector.body ] %offset.idx = trunc i32 %index to i16 %broadcast.splatinsert = insertelement <4 x i16> undef, i16 %offset.idx, i32 0 %broadcast.splat = shufflevector <4 x i16> %broadcast.splatinsert, <4 x i16> undef, <4 x i32> zeroinitializer %induction = add <4 x i16> %broadcast.splat, <i16 0, i16 1, i16 2, i16 3> %1 = add i16 %offset.idx, 0 %2 = tail call i32 @llvm.phx.abs.i32(i32 0) %3 = tail call i32 @llvm.phx.abs.i32(i32 0) %4 = tail call i32 @llvm.phx.abs.i32(i32 0) %5 = tail call i32 @llvm.phx.abs.i32(i32 0) %6 = insertelement <4 x i32> undef, i32 %2, i32 0 %7 = insertelement <4 x i32> %6, i32 %3, i32 1 %8 = insertelement <4 x i32> %7, i32 %4, i32 2 %9 = insertelement <4 x i32> %8, i32 %5, i32 3 %10 = add <4 x i32> %vec.phi, %9 %index.next = add i32 %index, 4 %11 = icmp eq i32 %index.next, 100 br i1 %11, label %middle.block, label %vector.body, !llvm.loop !0but with this patch LoopVectorizationLegality bails out with
LV: Not vectorizing: Found a non-intrinsic callsite %call = tail call i32 @llvm.phx.abs.i32(i32 0)Right now I've done a hacky workaround in LoopVectorizationLegality to get the old behavior for our target so we still get vectorization for the above case:
@@ -704,7 +704,12 @@ bool LoopVectorizationLegality::canVectorizeInstrs() { if (CI && !getVectorIntrinsicIDForCall(CI, TLI) && !isa<DbgInfoIntrinsic>(CI) && !(CI->getCalledFunction() && TLI && - !VFDatabase::getMappings(*CI).empty())) { + (!VFDatabase::getMappings(*CI).empty() || + // Hack: Allow vectorization even if we didn't provide + // a vector version of the intrinsic. + (CI->getParent()->getModule()->isTargetPhoenix() && + TLI->isFunctionVectorizable(CI->getCalledFunction() + ->getName()))))) {
I have reverted the change to avoid disruption in your work.
@uabelho, the example you posted here is very useful, I will send you a modified version of the code for review, so that you can verify it works for you.
Kind regards,
Francesco
@uabelho ,
I am working on a solution for the problem you reported.
So far, I have decided to use a special name in the mappings scalar-to-vector that informs the compiler that the function that is being vectorized by the vectorizer should be scalarized after vectorization.
That would require modifying the mappings from
const VecDesc VecIntrinsics[] = { {"llvm.phx.abs.i32", "", 4} };
to the following form:
const VecDesc VecIntrinsics[] = { {"llvm.phx.abs.i32", "__LLVM_scalarize__", 4} };
The method bool VFDatabase::shouldFunctionScalarize(VFShape Shape) would be used to test such situation.
Would that work for you?
Kind regards,
Francesco
That sounds ok to me.
(Note that I don't really know this code, so I don't know if this is acceptable from a general design point of view. But for my target, changing the mappings in that way is ok.)
Thanks!
Documentation missing, and please do not call it "fixup", maybe "adjust" or something but not "fixup".