[VectorUtils] Introduce the Vector Function Database (VFDatabase).
ClosedPublic

Authored by fpetrogalli on Sep 13 2019, 2:40 PM.

Details

Summary
This patch introduces the VFDatabase, the framework proposed in
http://lists.llvm.org/pipermail/llvm-dev/2019-June/133484.html. [*]

In this patch the VFDatabase is used to bridge the TargetLibraryInfo
(TLI) calls that were previously used to query for the availability of
vector counterparts of scalar functions.

The VFISAKind field `ISA` of VFShape has been moved into VFInfo,
under the assumption that different vector ISAs may provide the same
vector signature. At the moment, the vectorizer accepts any of the
available ISAs as long as the signature provided by the VFDatabase
matches the one expected in the vectorization process. For example,
when targeting AVX or AVX2, which both have 256-bit registers, the IR
signature of the two vector functions associated to the two ISAs is
the same. The `getVectorizedFunction` method at the moment returns the
first available match. We will need to add more heuristics to the
search system to decide which of the available versions (TLI, AVX,
AVX2, ...) the system should prefer when multiple versions with the
same VFShape are present.
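
A minimal sketch of the first-match lookup described above, using simplified stand-in types rather than the real VFShape and VFInfo. The names Shape, Info, ISAKind and firstMatch are hypothetical and exist only for this illustration. Because the ISA is stored outside the shape, entries for different ISAs can compare equal on shape, and the first one listed wins:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-ins for the types discussed in the summary; the real
// definitions carry the full parameter list and more.
enum class ISAKind { LLVM, AVX, AVX2 };

struct Shape {
  unsigned VF;       // vectorization factor
  bool IsPredicated; // stands in for the full parameter description
  bool operator==(const Shape &O) const {
    return VF == O.VF && IsPredicated == O.IsPredicated;
  }
};

struct Info {
  Shape S;
  std::string VectorName;
  ISAKind ISA; // kept outside Shape, so equal signatures compare equal
};

// Return the name of the first mapping whose shape matches, whatever its
// ISA. An empty string means "no match" in this sketch.
std::string firstMatch(const std::vector<Info> &Mappings, const Shape &Wanted) {
  for (const Info &I : Mappings)
    if (I.S == Wanted)
      return I.VectorName;
  return "";
}
```

With one AVX entry and one AVX2 entry that share a shape, firstMatch returns the AVX entry only because it is listed first, which is the gap the extra heuristics would need to close.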

Some of the code in this patch is based on the work done by Sumedh
Arani in https://reviews.llvm.org/D66025.

[*] Notice that in the proposal the VFDatabase was called SVFS. The
name VFDatabase is more in line with LLVM recommendations for
naming classes and variables.

Differential Revision: https://reviews.llvm.org/D67572

Event Timeline

Changes:

  1. Update the patch to use the current version of the VFABI read/write methods at https://reviews.llvm.org/D69976
  2. Address review from @jdoerfert

Note: in the next step I will extract from this patch the changes that are needed for the internal ISA for the TLI functions.

fpetrogalli marked 2 inline comments as done. Nov 8 2019, 3:12 PM

Internal vector function mangling for TLI added at https://reviews.llvm.org/D70089

I have rebased the changes after the addition of the pass for injecting the TLI calls.

The vectorization process itself works, but I don't understand why a
bunch of tests are failing - I must have done something wrong in the
initialization of the pass.

Most (if not all) failures assert on the following:

Unable to schedule 'Scalar Evolution Analysis' required by 'Loop Vectorization'
Unable to schedule pass
UNREACHABLE executed at /home/frapet01/projects/upstream-clang/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1289!

Down to two failures.

Failing Tests (2):

LLVM :: Other/pass-pipelines.ll
LLVM :: Transforms/SimplifyCFG/HoistCode.ll

The latter is probably caused by a recent rebase.

The former is a genuine failure of this patch.

In this patch:

  1. The SearchVFSystem has been renamed to VFDatabase.
  2. The query interface of the VFDatabase has been reduced to only two functions.
  3. The field VFISAKind ISA of VFShape has been moved to VFInfo. This change is justified in the new commit message.
fpetrogalli retitled this revision from "[SVFS] The Search Vector Function System." to "[VectorUtils] Introduce the Vector Function Database (VFDatabase)." Mon, Nov 18, 3:48 PM
fpetrogalli edited the summary of this revision. (Show Details)

Thanks @fpetrogalli for updating this patch!

Down to two failures.
Failing Tests (2):

Just checking, are these failures resolved in the latest revision?

llvm/include/llvm/Analysis/VectorUtils.h
92

nit: retrive -> retrieve

97

I don't really understand what you mean by a "flat" vectorization shape. Is this function supposed to return a 'widened' (vector version of a) function? If so, can we please rename this to getVectorShapeForCall ?

100

nit: unnecessary curly braces

103

is HasGlobalPred the same as isPredicated?

Also, is the predicate always known/expected to be the last parameter by the Vector ABI?

176

nit: in a class members are private by default.

177

nit: put comments above variable

186

StringRef ?

195

nit: These two if-statements can be merged with a &&

197

should CI.getModule()->getFunction(Shape.getValue().VectorName) be an assert? When would it ever happen that the vector function is not declared * in the IR?

*note that I'm specifically saying declared and not defined here.

224

Is it worth implementing an operator< for VFShape and sorting the result for getMappings()?
That way we can use binary search using llvm::lower_bound instead of looping through each shape in ScalarToVectorMappings.
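
A hedged sketch of what this suggestion could look like, written against simplified stand-in types. Shape, Mapping, sortMappings and findSorted are hypothetical names, the ordering on (VF, IsPredicated) is an assumption made only for illustration, and std::lower_bound is used here in place of the llvm::lower_bound wrapper so that the example is self-contained:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

struct Shape {
  unsigned VF;
  bool IsPredicated;
};

// Hypothetical strict weak ordering: by VF first, then by predication.
bool operator<(const Shape &A, const Shape &B) {
  return std::tie(A.VF, A.IsPredicated) < std::tie(B.VF, B.IsPredicated);
}

struct Mapping {
  Shape S;
  std::string VectorName;
};

// Sort the mappings once so that later lookups can use binary search.
void sortMappings(std::vector<Mapping> &M) {
  std::sort(M.begin(), M.end(),
            [](const Mapping &A, const Mapping &B) { return A.S < B.S; });
}

// Binary search over mappings already sorted by shape. Returns the vector
// function name, or an empty string when the shape is absent.
std::string findSorted(const std::vector<Mapping> &Sorted, const Shape &Wanted) {
  auto It = std::lower_bound(
      Sorted.begin(), Sorted.end(), Wanted,
      [](const Mapping &M, const Shape &S) { return M.S < S; });
  // lower_bound yields the first element not less than Wanted; it is a hit
  // only if Wanted is not less than that element either.
  if (It != Sorted.end() && !(Wanted < It->S))
    return It->VectorName;
  return "";
}
```

The sort costs O(n log n) per call site, so for a handful of entries and a single lookup a plain linear scan may well be cheaper.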

llvm/lib/Analysis/LoopAccessAnalysis.cpp
1848

getMappings is quite an expensive call, so you'll want to add a special function here that bails out earlier, and doesn't have to demangle string and populate (and possibly sort) an array.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3259

the call to isFunctionVectorizable is expensive, so please reorder this after CI->isNoBuiltin(), so that the function can bail out more cheaply.

fpetrogalli added inline comments. Wed, Nov 20, 7:00 AM
llvm/include/llvm/Analysis/VectorUtils.h
97

Yeah, the name is misleading. This is supposed to return a widened version of the function... but I don't like the name getVectorShapeForCall, because a Shape for a Vector Call can have Linear modifiers for example, while this function is returning only the shape that uses VFParamKind::Vector for all VFParameters.

How about getAllVectorsShape?

Thanks @fpetrogalli for updating this patch!

Down to two failures.
Failing Tests (2):

Just checking, are these failures resolved in the latest revision?

Yes, all good:

[1440/1440] Running the LLVM regression tests

Testing Time: 269.21s
  Expected Passes    : 33842
  Expected Failures  : 149
  Unsupported Tests  : 451
fpetrogalli marked 15 inline comments as done.

Address review comments from @sdesmalen. Thank you!

llvm/include/llvm/Analysis/VectorUtils.h
97

I explained the meaning of the function in the comment.

97

I have renamed to VFShape::getAllVectorsParams.

103

is HasGlobalPred the same as isPredicated?

The GlobalPredicate is listed in the VFParamKind enum class:

  GlobalPredicate,   // Global logical predicate that acts on all lanes
                     // of the input and output mask concurrently. For
                     // example, it is implied by the `M` token in the
                     // Vector Function ABI mangled name.

It is a special predicate because it differs from the parameter masks that can be individually attached to the parameters. These kinds of masks are not handled yet by the VFParamKind, but I understand that they were needed by @simoll when discussing the RFC.

Also, is the predicate always known/expected to be the last parameter by the Vector ABI?

For all the Vector Function ABIs supported at the moment in LLVM, yes. If vendor X decides to produce an ABI where the mask is the first parameter, we will have to change this code. But for now, we will assume the global predicate is the last parameter.
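
The assumption stated above can be written as a small check. This is only a sketch: ParamKind is a cut-down, hypothetical stand-in for VFParamKind, and hasValidGlobalPredicatePos is an invented helper name, not a function from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Cut-down stand-in for the parameter kinds; the real enum has more entries.
enum class ParamKind { Vector, Linear, Uniform, GlobalPredicate };

// True when the parameter list has no global predicate, or has one that sits
// in the last position. Any global predicate elsewhere violates the
// "mask is the last parameter" assumption.
bool hasValidGlobalPredicatePos(const std::vector<ParamKind> &Params) {
  for (std::size_t I = 0; I < Params.size(); ++I)
    if (Params[I] == ParamKind::GlobalPredicate && I + 1 != Params.size())
      return false;
  return true;
}
```

A check like this could back an assert where shapes are built, so that an ABI with the mask in a different position fails loudly instead of being miscompiled.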

197

should CI.getModule()->getFunction(Shape.getValue().VectorName) be an assert? When would it ever happen that the vector function is not declared * in the IR?

Yes, done. In fact, an IR where there is a VFABI attribute with a mapping to function X, but that doesn't have X declared, should be considered broken.

*note that I'm specifically saying declared and not defined here.

You are reaching enlightenment! :)

224

sort them by what field? VF?

Also, I don't expect an attribute to hold more than 8 functions, which seems to be the worst-case scenario when all X86 vector extensions are being used... are you sure you want to add such an optimization?

How about we leave it as it is (slow but simple) and leave optimizations for the future if it turns out we need to speed up the search?

I have optimized the getVFABIMappings method of VFDatabase with an
early exit if the VFABI attribute is empty.

fpetrogalli marked 2 inline comments as done. Wed, Nov 20, 9:23 AM
fpetrogalli added inline comments.
llvm/lib/Analysis/LoopAccessAnalysis.cpp
1848

Done, check the implementation of getVFABIMappings.

fpetrogalli marked an inline comment as done.

Remove commented code...

fpetrogalli marked an inline comment as done. Wed, Nov 20, 2:35 PM
fpetrogalli added inline comments.
llvm/include/llvm/Analysis/VectorUtils.h
97

I have extracted the VFShape API here: https://reviews.llvm.org/D70513 (it is a work in progress as of now, I might finish up things later today).

sdesmalen added inline comments. Thu, Nov 21, 3:15 AM
llvm/include/llvm/Analysis/VectorUtils.h
97

I think this can take FunctionType instead of CallInst.

I have renamed to VFShape::getAllVectorsParams

nit: I know this is bikeshedding, but what do you think of VFShape::widenAllParams?

98

these parameters shouldn't use const.

98

Is there a reason to pass VF and IsScalable separately, instead of passing an ElementCount ?

103

It is a special predicate because it differs from the parameter masks that can be individually attached to the parameters.

I'd expect the operation to be predicated, not the individual operands. What would be the meaning of a predicated operand?

For all the Vector Function ABIs supported at the moment in LLVM, yes. If vendor X decides to produce an ABI where the mask is the first parameter, we will have to change this code.

Can we add an assert somewhere to enforce the assumption that the global predicate is passed as the last operand?

204

Please add a message to the assert.

212

this method doesn't do anything more than invoke getVFABIMappings so has no value (other than being public).

224

Alright, if it only has a few elements and is constructed to do a single lookup it is probably not worth the overhead of sorting.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
1848

The early exit in getMappings doesn't stop a SmallVector from having to be created/destroyed. It would be better to create a new method such as bool hasVectorVariants() that answers the question directly.

fpetrogalli marked 16 inline comments as done. Thu, Nov 21, 1:22 PM
fpetrogalli added inline comments.
llvm/include/llvm/Analysis/VectorUtils.h
97

I think this can take FunctionType instead of CallInst

Yes, but... what's the point? We will have to introduce one or two extra calls to get the FunctionType... All we need is the number of arguments of the function, so in fact the function could just take an unsigned integer. But I don't like it. The fact that I use CallInst is cleaner and not worse than having an optimized interface. But this is my personal preference, so if you ask nicely :P I will change the interface.

nit: I know this is bike shedding, but what do you think of VFShape::widenAllParams?

I know where you come from, the Vectorizer :). I'd rather not, I really want to have a static get method attached to the VFShape. It seems to be in the style of LLVM to use get for static public member functions that build objects (see http://llvm.org/doxygen/classllvm_1_1VectorType.html). Since this is the only get method of VFShape, I have renamed it to get.

For the record, the changes have been applied to https://reviews.llvm.org/D70513

This function will disappear from this revision.

97

Please take any further review of getAllVectorParams to https://reviews.llvm.org/D70513

98

these parameters shouldn't use const.

I disagree, especially for the CI argument. The method is not supposed to change the reference.

I have anyway removed the const from the other parameters.

(again, in https://reviews.llvm.org/D70513, not here, but I will update this patch)

103

I'd expect the operation to be predicated, not the individual operands. What would be the meaning of a predicated operand?

This was a requirement from @simoll, he needs to have masking associated to each of the operands.

Can we add an assert somewhere to enforce the assumption that the global predicate is passed as the last operand?

It is done in https://reviews.llvm.org/D70513

212

this method doesn't do anything more than invoke getVFABIMappings so has no value (other than being public).

Yeah, but the value is in the comment inside of it, stating that other VFShapes can be built outside of a VFABI context. I'd prefer to keep it.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
1848

I understand your concerns about performance here, but I am not keen on doing this. The attribute might contain junk that doesn't demangle correctly to a VFInfo - in that case the function would return "yes, this call has a vector variant", but that wouldn't make sense because there would be no variant. I'd rather play it safe.
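
To make the trade-off concrete, here is a toy comparison of the two approaches. Everything in it is an assumption for illustration: demanglesOk is a deliberately crude validity test that only looks for the _ZGV prefix, the attribute is treated as a comma-separated list, and hasVariantsCheap and hasVariantsChecked are invented names.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Toy validity test: a real implementation would demangle the whole name.
bool demanglesOk(const std::string &Name) {
  return Name.size() > 4 && Name.compare(0, 4, "_ZGV") == 0;
}

// Cheap check: only asks whether the attribute string is non-empty.
bool hasVariantsCheap(const std::string &Attr) { return !Attr.empty(); }

// Conservative check: at least one comma-separated entry must demangle.
bool hasVariantsChecked(const std::string &Attr) {
  std::stringstream SS(Attr);
  std::string Entry;
  while (std::getline(SS, Entry, ','))
    if (demanglesOk(Entry))
      return true;
  return false;
}
```

For an attribute holding only junk, the cheap check answers yes while the conservative one answers no, which is the mismatch described above.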

fpetrogalli marked 5 inline comments as done.

I have addressed the comments (some of the changes relative to the VFShape API are reflected in https://reviews.llvm.org/D70513).

I have also rebased the code on top of https://reviews.llvm.org/D70513.

Gentle ping after the merge of https://reviews.llvm.org/D70513.

Thank you,

Francesco

ABataev added inline comments. Wed, Dec 4, 1:18 PM
llvm/include/llvm/Analysis/VectorUtils.h
120–124

Use /// style of comment here

171

No need for \brief

197

const auto &

fpetrogalli marked 2 inline comments as done.

Address review comments from @ABataev.

Thank you,

Francesco

fpetrogalli marked an inline comment as done. Wed, Dec 4, 1:57 PM

Thanks Francesco, looks ok to me, but I'll leave to @jdoerfert or @sdesmalen to approve.

This revision is now accepted and ready to land. Tue, Dec 10, 5:39 AM
This revision was automatically updated to reflect the committed changes.

Hi @fpetrogalli ,

A question regarding this patch.
For my out-of-tree target, vectorization of intrinsics added for my target seems to have stopped working with this patch.
Is there something I have to do to make the vectorizer understand that my intrinsics are vectorizable?

Looking at this code in LoopVectorizationLegality.cpp:

// We handle calls that:
//   * Are debug info intrinsics.
//   * Have a mapping to an IR intrinsic.
//   * Have a vector version available.
auto *CI = dyn_cast<CallInst>(&I);
if (CI && !getVectorIntrinsicIDForCall(CI, TLI) &&
    !isa<DbgInfoIntrinsic>(CI) &&
    !(CI->getCalledFunction() && TLI &&
      !VFDatabase::getMappings(*CI).empty())) {

VFDatabase::getMappings(*CI).empty() is indeed true for my intrinsic, and if I dig further, I take this return in

static void getVFABIMappings(const CallInst &CI,
                             SmallVectorImpl<VFInfo> &Mappings) {
  const StringRef ScalarName = CI.getCalledFunction()->getName();
  const StringRef S =
      CI.getAttribute(AttributeList::FunctionIndex, VFABI::MappingsAttrName)
          .getValueAsString();
  if (S.empty())
    return;

Is there some existing commit where in-tree targets have been modified already to work with the new VFDatabase?

Thanks!

Hi @uabelho

Hi @fpetrogalli ,

A question regarding this patch.
For my out-of-tree target vectorization of intrinsics added for my target seems to have stopped working with this patch.

Oops... sorry!

Is there something/what do I have to do to make the vectorizer understand my intrinsics are vectorizable?

Yes, you need to add an attribute in the IR that maps the (scalar) function to its vector counterpart.

The attribute is called vector-function-abi-variant.

You can add it by using the following method in llvm/include/llvm/Transforms/Utils/ModuleUtils.h:

namespace VFABI {
/// Overwrite the Vector Function ABI variants attribute with the names provided
/// in \p VariantMappings.
void setVectorVariantNames(CallInst *CI,
                           const SmallVector<std::string, 8> &VariantMappings);
} // End VFABI namespace

The VariantMappings are strings that need to be generated according to some Vector Function ABI (VFABI). If your target doesn't have such an ABI, you can use the LLVM internal mangling.

For example, say that your intrinsic is double @llvm.funky.intrinsic(double), and you need to map it to an unmasked vector function with a vectorization factor of two, say custom_vector_function. The string that you need to add to the attribute is _ZGV_LLVM_N2v_llvm.funky.intrinsic(custom_vector_function).

The name mangling rules are admittedly not well documented for the internal mangling, but other than for the ISA token (which is _LLVM_), they correspond to the ones of x86 and AArch64, which are the same (you can browse the latter here: https://github.com/ARM-software/software-standards/blob/master/abi/vfabia64/vfabia64.rst#vector-function-name-mangling)

I will definitely add some docs to explain more in detail the mangling rules, but for the moment you can look at the tests in llvm/unittests/Analysis/VectorFunctionABITest.cpp to get a sense of the meaning of the different tokens in the mangled name, especially the use of the <parameters> in _ZGV<isa><mask><vlen><parameters>_<scalarname>[(<redirection>)].
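
As a rough illustration of the layout _ZGV<isa><mask><vlen><parameters>_<scalarname>[(<redirection>)], the sketch below splits a mangled name into its tokens. It is not the LLVM demangler: Demangled and demangle are hypothetical names, only a numeric <vlen> is handled, the ISA is assumed to be either the _LLVM_ token or a single character, and <parameters> is kept as an uninterpreted string.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

struct Demangled {
  bool Valid = false;
  std::string ISA, Params, ScalarName, VectorName;
  bool Masked = false;
  unsigned VF = 0;
};

// Split a mangled name into tokens. Returns Valid == false on any mismatch.
Demangled demangle(const std::string &Mangled) {
  Demangled D;
  const std::string Prefix = "_ZGV", LLVMIsa = "_LLVM_";
  if (Mangled.compare(0, Prefix.size(), Prefix) != 0)
    return D;
  std::size_t Pos = Prefix.size();
  // <isa>: the internal _LLVM_ token, or (assumed) one character otherwise.
  if (Mangled.compare(Pos, LLVMIsa.size(), LLVMIsa) == 0) {
    D.ISA = LLVMIsa;
    Pos += LLVMIsa.size();
  } else if (Pos < Mangled.size()) {
    D.ISA = Mangled.substr(Pos, 1);
    ++Pos;
  } else {
    return D;
  }
  // <mask>: N for unmasked, M for masked.
  if (Pos >= Mangled.size() || (Mangled[Pos] != 'N' && Mangled[Pos] != 'M'))
    return D;
  D.Masked = Mangled[Pos++] == 'M';
  // <vlen>: decimal digits only in this sketch.
  const std::size_t DigitsStart = Pos;
  while (Pos < Mangled.size() &&
         std::isdigit(static_cast<unsigned char>(Mangled[Pos])))
    D.VF = D.VF * 10 + static_cast<unsigned>(Mangled[Pos++] - '0');
  if (Pos == DigitsStart)
    return D;
  // <parameters>: everything up to the next underscore.
  const std::size_t Under = Mangled.find('_', Pos);
  if (Under == std::string::npos || Under == Pos)
    return D;
  D.Params = Mangled.substr(Pos, Under - Pos);
  Pos = Under + 1;
  // <scalarname>, then the optional "(<redirection>)".
  const std::size_t Open = Mangled.find('(', Pos);
  if (Open == std::string::npos) {
    D.ScalarName = Mangled.substr(Pos);
    D.VectorName = Mangled; // no redirection: the mangled name is the callee
  } else {
    if (Mangled.back() != ')')
      return D;
    D.ScalarName = Mangled.substr(Pos, Open - Pos);
    D.VectorName = Mangled.substr(Open + 1, Mangled.size() - Open - 2);
  }
  D.Valid = !D.ScalarName.empty() && !D.VectorName.empty();
  return D;
}
```

For _ZGV_LLVM_N2v_llvm.funky.intrinsic(custom_vector_function), this yields ISA _LLVM_, unmasked, a vectorization factor of 2, a single v parameter, the scalar name llvm.funky.intrinsic and the redirection custom_vector_function.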

Is there some existing commit where in-tree targets have been modified already to work with the new VFDatabase?

Unless I have been missing something, all targets in the in-tree version are using VFDatabase now. The patch in this revision is what introduced the change.

Thanks!

I hope you find this useful. Let me know if you need more help with this, I am generally available on IRC and discord too.

Francesco

Hi,

Thanks for the reply!

Ok, I think I understand what is happening now at least.

We have a bunch of target intrinsics that we say are vectorizable, but we don't provide a name of the vector version of the intrinsic.

This meant that before this patch

LoopVectorizationLegality::canVectorizeInstrs()

accepted to vectorize the loop since

TLI->isFunctionVectorizable(CI->getCalledFunction()->getName())

returned true for the intrinsic.

Then in LoopVectorizationCostModel::getVectorCallCost we decided that the call to the intrinsic should be scalarized, since

TLI->isFunctionVectorizable(FnName, VF)

returned false.

So the loop was vectorized, but we got VF calls to the scalar version of the intrinsic, just as we wanted.

However, with this patch, the check in

LoopVectorizationLegality::canVectorizeInstrs()

now says false, since we do

VFDatabase::getMappings(*CI).empty()

and we indeed get empty mappings since we don't provide any vector version.

So the presence of an intrinsic that we don't provide a vector version for prevents vectorization of the entire loop, even if it would be totally OK to do VF calls to the scalar version instead.

Is this change in behavior intended?
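
The behavior change described above can be modelled in a few lines. This is a simplified model and not the LLVM code: VecDesc here is a plain struct mirroring the {scalar name, vector name, VF} triples, and isListed, hasVectorVersion, legalBefore and legalAfter are hypothetical names for the name-only query, the (name, VF) query, and the legality decisions before and after the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified model of a table entry: scalar name, vector name, VF.
struct VecDesc {
  std::string ScalarName, VectorName;
  unsigned VF;
};

// Models the name-only query: is the scalar function listed at all?
bool isListed(const std::vector<VecDesc> &Table, const std::string &Name) {
  for (const VecDesc &D : Table)
    if (D.ScalarName == Name)
      return true;
  return false;
}

// Models the (name, VF) query: is there a named vector version for this VF?
bool hasVectorVersion(const std::vector<VecDesc> &Table,
                      const std::string &Name, unsigned VF) {
  for (const VecDesc &D : Table)
    if (D.ScalarName == Name && D.VF == VF && !D.VectorName.empty())
      return true;
  return false;
}

// Before the patch (as described): being listed was enough for legality.
bool legalBefore(const std::vector<VecDesc> &Table, const std::string &Name) {
  return isListed(Table, Name);
}

// After the patch (as described): legality needs a non-empty set of
// mappings, modelled here as at least one entry with a vector name.
bool legalAfter(const std::vector<VecDesc> &Table, const std::string &Name) {
  for (const VecDesc &D : Table)
    if (D.ScalarName == Name && !D.VectorName.empty())
      return true;
  return false;
}
```

An entry with an empty vector name passes legalBefore, fails hasVectorVersion (so the call is scalarized), and fails legalAfter (so the whole loop is rejected).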

fhahn added a comment. Thu, Dec 12, 2:23 AM

Is this change in behavior intended?

This change in behavior sounds a bit worrying for down-stream targets.

I think we should have at least an assertion making sure the check in LoopVectorizationLegality succeeds in all cases it did previously with isFunctionVectorizable, otherwise down-stream targets will silently miss out on vectorization. But I think ideally the patch would preserve the existing behavior in the cases @uabelho described.

[...]

Is this change in behavior intended?

This change in behavior sounds a bit worrying for down-stream targets.

I think we should have at least an assertion making sure the check in LoopVectorizationLegality succeeds in all cases it did previously with isFunctionVectorizable, otherwise down-stream targets will silently miss out on vectorization.

Yes that's exactly what happened to us. Or well, "silently", since our benchmark numbers went crazy after the merge :)

But I think ideally the patch would preserve the existing behavior in the cases @uabelho described.

Sounds good!

...

This meant that before this patch

LoopVectorizationLegality::canVectorizeInstrs()

accepted to vectorize the loop since

TLI->isFunctionVectorizable(CI->getCalledFunction()->getName())

returned true for the intrinsic.

Then in LoopVectorizationCostModel::getVectorCallCost we decided that the call to the intrinsic should be scalarized, since

TLI->isFunctionVectorizable(FnName, VF)

returned false.

Ah, OK. I see what you mean. I think we could solve this by scalarizing after vectorization, instead of doing it in the vectorizer.

We could have an IR pass that runs after the vectorizer and looks for intrinsic calls. When it sees an intrinsic that operates on vectors, it checks whether the target is able to lower it to some function or instruction. If not, it scalarizes it. With this we wouldn't have to introduce special behavior in the vectorizer for handling intrinsics: it could just vectorize any call to intrinsics for which vectorization makes sense.
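
The core step of such a pass, expanding one vector call into one scalar call per lane, can be sketched outside of LLVM as follows. This is only a model of the idea: scalarAbs stands in for a target intrinsic, and scalarizeCall is a hypothetical helper that works on plain integers rather than on IR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar operation standing in for a target intrinsic.
int scalarAbs(int X) { return X < 0 ? -X : X; }

// Expand one "vector call" into one scalar call per lane and collect the
// results back into a vector (extract the lane, call, insert the result).
std::vector<int> scalarizeCall(int (*Scalar)(int),
                               const std::vector<int> &Lanes) {
  std::vector<int> Result(Lanes.size());
  for (std::size_t I = 0; I < Lanes.size(); ++I)
    Result[I] = Scalar(Lanes[I]);
  return Result;
}
```

With four lanes this produces four scalar calls followed by four insertions, which is the same pattern the vectorizer emits today when it scalarizes a call itself.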

We have a bunch of target intrinsics that we say are vectorizable, but we don't provide a name of the vector version of the intrinsic.

You can still use the name mangling of the VFABI attribute (for internal LLVM mangling) to map a scalar intrinsic to its vector version:

_ZGV_LLVM_N2v_llvm.custom.attribute(llvm.custom.attribute)

With this, VFDatabase::getMappings(*CI).empty() would return false, so that the intrinsic would be vectorized as a regular function. Then, you could use the post-vectorization pass I mentioned in the previous message to scalarize it.

The mappings in the IR could be added by the frontend, or in a pre-vectorization pass if you don't want to touch the frontend.

fhahn added a comment. Thu, Dec 12, 9:30 AM

It sounds like resolving this will require some extra thought. It would probably be good to revert this patch until then.

It sounds like resolving this will require some extra thought. It would probably be good to revert this patch until then.

I am happy to do so. Shall I just revert it in git and push the change, or is there a formal way to do it via phabricator or arc?

@uabelho: would it be possible for you to provide me a minimal reproducer that I could use to craft the (wip) solution I have in mind?

Thank you,

Francesco

It sounds like resolving this will require some extra thought. It would probably be good to revert this patch until then.

I am happy to do so. Shall I just revert it in git and push the change, or is there a formal way to do it via phabricator or arc?

Just pushing the revert in git is fine.

@uabelho: would it be possible for you to provide me a minimal reproducer that I could use to craft the (wip) solution I have in mind?

I'm not sure what kind of reproducer you expect since my reproducer requires our out-of-tree target and intrinsics but I can at least try to show something.

What we have in our target is that in initialize() in TargetLibraryInfo.cpp we add our target intrinsics that we allow in vectorized loops.
So we have like:

const VecDesc VecIntrinsics[] = {
  {"llvm.phx.abs.i32", "", 4}
};

TLI.addVectorizableFunctions(VecIntrinsics);

where we say that it's ok to vectorize a loop containing a call to the intrinsic llvm.phx.abs.i32, but we don't provide a vector version that should be used when it's vectorized.

I think in-tree targets did like this before, I'm not sure if they do anymore or if that has changed now.

Then if I run -loop-vectorize on the following input

define i32 @f() {
  br label %bb1

bb1:                                              ; preds = %bb1, %0
  %sum = phi i32 [ 0, %0 ], [ %sum_next, %bb1 ]
  %i = phi i16 [ 0, %0 ], [ %i_inc, %bb1 ]
  %call = tail call i32 @llvm.phx.abs.i32(i32 0)
  %sum_next = add i32 %sum, %call
  %i_inc = add nuw nsw i16 %i, 1
  %exit = icmp eq i16 %i_inc, 100
  br i1 %exit, label %bb3, label %bb1

bb3:                                              ; preds = %bb1
  ret i32 %sum_next
}

declare i32 @llvm.phx.abs.i32(i32)

I used to get a vectorized loop like

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %vec.phi = phi <4 x i32> [ zeroinitializer, %vector.ph ], [ %10, %vector.body ]
  %offset.idx = trunc i32 %index to i16
  %broadcast.splatinsert = insertelement <4 x i16> undef, i16 %offset.idx, i32 0
  %broadcast.splat = shufflevector <4 x i16> %broadcast.splatinsert, <4 x i16> undef, <4 x i32> zeroinitializer
  %induction = add <4 x i16> %broadcast.splat, <i16 0, i16 1, i16 2, i16 3>
  %1 = add i16 %offset.idx, 0
  %2 = tail call i32 @llvm.phx.abs.i32(i32 0)
  %3 = tail call i32 @llvm.phx.abs.i32(i32 0)
  %4 = tail call i32 @llvm.phx.abs.i32(i32 0)
  %5 = tail call i32 @llvm.phx.abs.i32(i32 0)
  %6 = insertelement <4 x i32> undef, i32 %2, i32 0
  %7 = insertelement <4 x i32> %6, i32 %3, i32 1
  %8 = insertelement <4 x i32> %7, i32 %4, i32 2
  %9 = insertelement <4 x i32> %8, i32 %5, i32 3
  %10 = add <4 x i32> %vec.phi, %9
  %index.next = add i32 %index, 4
  %11 = icmp eq i32 %index.next, 100
  br i1 %11, label %middle.block, label %vector.body, !llvm.loop !0

but with this patch LoopVectorizationLegality bails out with

LV: Not vectorizing: Found a non-intrinsic callsite   %call = tail call i32 @llvm.phx.abs.i32(i32 0)

Right now I've done a hacky workaround in LoopVectorizationLegality to get the old behavior for our target so we still get vectorization for the above case:

@@ -704,7 +704,12 @@ bool LoopVectorizationLegality::canVectorizeInstrs() {
       if (CI && !getVectorIntrinsicIDForCall(CI, TLI) &&
           !isa<DbgInfoIntrinsic>(CI) &&
           !(CI->getCalledFunction() && TLI &&
-            !VFDatabase::getMappings(*CI).empty())) {
+            (!VFDatabase::getMappings(*CI).empty() ||
+             // Hack: Allow vectorization even if we didn't provide
+             // a vector version of the intrinsic.
+             (CI->getParent()->getModule()->isTargetPhoenix() &&
+              TLI->isFunctionVectorizable(CI->getCalledFunction()
+                                          ->getName()))))) {
uabelho added a subscriber: bjope. Fri, Dec 13, 12:14 AM

Ah, OK. I see what you mean. I think we could solve this by scalarizing after vectorization, instead of doing it in the vectorizer.

We could have an IR pass that runs after the vectorizer and looks for intrinsic calls. When it sees an intrinsic that operates on vectors, it checks whether the target is able to lower it to some function or instruction. If not, it scalarizes it. With this we wouldn't have to introduce special behavior in the vectorizer for handling intrinsics: it could just vectorize any call to intrinsics for which vectorization makes sense.

That would mean we would have to introduce vector versions of all intrinsics that would just be used between the vectorizer and the scalarizer, right? Sounds a little bit cumbersome since those vector versions don't exist today, but perhaps it's the best way anyway, I don't know.

Btw, in case you didn't know, there is already an existing scalarizer pass, though I don't think it's widely used.

fpetrogalli added a comment (edited). Fri, Dec 13, 1:37 PM

@uabelho: would it be possible for you to provide me a minimal reproducer that I could use to craft the (wip) solution I have in mind?

I'm not sure what kind of reproducer you expect since my reproducer requires our out-of-tree target and intrinsics but I can at least try to show something.

What we have in our target is that in initialize() in TargetLibraryInfo.cpp we add our target intrinsics that we allow in vectorized loops.
So we have like:

const VecDesc VecIntrinsics[] = {
  {"llvm.phx.abs.i32", "", 4}
};
 
TLI.addVectorizableFunctions(VecIntrinsics);

where we say that it's ok to vectorize a loop containing a call to the intrinsic llvm.phx.abs.i32, but we don't provide a vector version that should be used when it's vectorized.

I think in-tree targets did like this before, I'm not sure if they do anymore or if that has changed now.

I cannot find anything like that in the current TargetLibraryInfo.cpp.

Hi @uabelho and @fhahn ,

I have reverted the change to avoid disruption in your work.

@uabelho, the example you posted here is very useful, I will send you a modified version of the code for review, so that you can verify it works for you.

Kind regards,

Francesco