This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Fix cost for calls to functions that have vector versions
ClosedPublic

Authored by nemanjai on Feb 20 2020, 7:25 PM.

Details

Summary

A recent commit (https://reviews.llvm.org/rG66c120f02560ef528a60924104ead66f330190f1) changed the cost for calls to functions that have a vector version for some vectorization factor. However, no check is performed for whether the vectorization factor matches the current one being cost modeled. This leads to attempts to widen call instructions to a vectorization factor for which such a function does not exist, which in turn leads to an assertion failure.

This patch adds the check for vectorization factor (i.e. not just that the called function has a vector version for some VF, but that it has a vector version for this VF).

Diff Detail

Event Timeline

nemanjai created this revision.Feb 20 2020, 7:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 20 2020, 7:25 PM
Herald added a subscriber: hiraditya. · View Herald Transcript

Hi @nemanjai , thank you for pointing this out! I didn't realize that my code was creating this problem.

Kind regards,

Francesco

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3280–3283

I think that what you have implemented here is doing something very similar to what is already implemented the API provided by the VFShape class and the VFDatabase.

You could first build the VFShape you expect to have for VF and CI by invoking the get method at https://github.com/llvm/llvm-project/blob/f37e899fd73d1a8958246d761eeb306a8846e81a/llvm/include/llvm/Analysis/VectorUtils.h#L103:

VFShape Shape = VFShape::get(*CI, {VF, false} /* EC*/, false /*HasGlobalPred */);

This will create the "shape" that the vector version of the vector call associated with CI will have when trying to vectorize with a vectorization factor of VF.

Then, you can check whether that specific shape for VF is available by querying the VFDatabase with it (see https://github.com/llvm/llvm-project/blob/f37e899fd73d1a8958246d761eeb306a8846e81a/llvm/include/llvm/Analysis/VectorUtils.h#L242):

VFDatabase(*CI).getVectorizedFunction(Shape);

This call will return a nullptr if there is no vector function that realizes vectorization with VF lanes.

3286

VectorMappings.empty() || NoMappingForVF): something is redundant here. Surely if VectorMappings.empty() there is NoMappingForVF. At the same time, if there is NoMappingForVF we should not attempt to vectorize? Or am I missing something?

llvm/test/Transforms/LoopVectorize/widened-massv-call.ll
3

The problem you have pointed out is not related to the use of any specific -vector-library, as the VFDatabase class is build around the vector-function-abi-variant attribute in IR (see https://llvm.org/docs/LangRef.html#call-site-attributes).

I am happy for you to keep this MASSV-specific test, but I think you should also add a test in which the problem is raised directly by a call that uses the vector-function-abi-variant attribute.

Also, I think that you should also add a test in which you use the attribute, and vectorization doesn't happen because there are some mappings in the attribute, but there is no mapping for the VF being checked.

nemanjai marked 2 inline comments as done.Feb 21 2020, 10:09 AM
nemanjai added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3280–3283

Ah, cool. I didn't really look at the interface very much. Thanks for the suggestion.

3286

Right, the cost we return in this block includes the scalarization cost, so if NoMappingForVF is true, we won't try to vectorize. I suppose we could return some extremely high cost in this case that is guaranteed not to allow vectorization.

I just implemented it this way because this just gets us back to the behaviour before the commit.

nemanjai updated this revision to Diff 245931.Feb 21 2020, 11:23 AM

Updated to use the info from VFDatabase. Thanks @fpetrogalli!

Hi @nemanjai ,

thank you for updating the code.

Sorry for being picky, I think you should add another test. Your code need to work also for the vector-function-avi-variant attribute and not just for the -vector-library= option.

Kind regards,

Francesco

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3281

You should not use auto when the type is not obvious: http://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable

Please s/auto/Function/.

3283

Nit: I just noticed that also the !TLI condition can probably be removed?

Hi @nemanjai ,

thank you for updating the code.

Sorry for being picky, I think you should add another test. Your code need to work also for the vector-function-avi-variant attribute and not just for the -vector-library= option.

Kind regards,

Francesco

Will do. Note also that this will be the only such test case as far as I can tell, so we might want to add some more coverage there as well independent of this patch.

nemanjai marked an inline comment as done.Feb 24 2020, 7:52 AM
nemanjai added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3283

Can you please justify? I think the idea is that if we didn't have a TLI object, we would end up with nullptr from getVectorizedFunction() but want to make sure.

nemanjai updated this revision to Diff 246217.Feb 24 2020, 7:56 AM

Removed egregious use of auto and added a vfabi atrribute test case.

Hi @nemanjai ,

thank you for updating the code.

Sorry for being picky, I think you should add another test. Your code need to work also for the vector-function-avi-variant attribute and not just for the -vector-library= option.

Kind regards,

Francesco

Will do. Note also that this will be the only such test case as far as I can tell, so we might want to add some more coverage there as well independent of this patch.

Yes, you are right. I'll prepare a patch that does vectorization without using the pre-built TLI infos, just the attribute in the IR.

Thank you for your work!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3283

Please correct me if I am wrong, but it seems to me that the situation !VecFunc when VecFunc == nullptr is not checked by the tests you have in this patch.

I believe that such test can be done by using the same IR as in llvm/test/Transforms/LoopVectorize/widened-massv-vfabi-attr.ll, with the attribute string being replaced by a 4 lane mangled name _ZGV_LLVM_N4v_llvm.sin.f64(whatever).

Could you please add such test?

Regarding the following:

Can you please justify? I think the idea is that if we didn't have a TLI object, we would end up with nullptr from getVectorizedFunction() but want to make sure.

The TLI is not needed anymore in the LoopVectorizer to check for vectorized function, because all infos needed are carried by the VFDatabase. The TLI is a requirement of the InjectTLIMappings pass, which populate the IR with the vector-function-abi-variant attribute that are constructed out of the ones stored in the TLI. By explaining this, I just realized that removing the !TLI from inside the vectorizer might require a separate patch, so I am happy for you to keep it in here. Sorry for the noise.

I am not sure I am actually following what it is you are requesting in terms of testing. Perhaps I need to describe the problems I am solving more thoroughly.

  1. The actual failure only happens with MASSV as far as I can tell. I've actually tried very hard to find a situation in which it can happen with other vector libraries, to no avail. I'll explain why below.
  2. I have added a test case that runs the entire pipeline with the -vector-library option as well as a test case that omits the pass that adds the attributes - i.e. a test for which the vectorizer uses the provided attributes.
  3. The problem is precisely a missing vector function for a vectorization factor of 4, so I am not sure how it would help if I provided an attribute for a fictional function that covers such a vectorization factor.

So why is this uniquely a PPC/MASSV problem?
For the SVML library, no attempt will ever be made to select a vectorization factor wider than 8 (since the largest vector register the X86 back end specifies is 256). Since all the functions for SVML have a VF=8 variant, the problem will never happen. We can artificially force the problem to happen with SVML if we use the same test case and --force-vector-width=16. But I don't think that is representative of real-world code.
For the Accelerate library, there are only entries for the f32 type so we will never attempt to vectorize to anything wider than a factor of 4. Of course, we can artificially force the assert with the same option.
I am not opposed to adding two test cases I just described, but I think they will be artificial and superfluous since the problem is the same regardless of the vector library used.

fpetrogalli accepted this revision.Feb 25 2020, 7:57 AM

Hello @nemanjai , thank you for the exhaustive explanation.

This LGTM now!

Francesco

This revision is now accepted and ready to land.Feb 25 2020, 7:57 AM
This revision was automatically updated to reflect the committed changes.