This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Support to vectorize inner loops with VPlan native path enabled
Needs ReviewPublic

Authored by Kazhuu on Feb 24 2021, 4:21 AM.

Details

Summary

When no explicit loop is marked to be vectorized and VPlan native path is
enabled. The innermost loop will be vectorized using the inner loop vectorizer.
Previously this setup caused errors like this to happen:
https://bugs.llvm.org/show_bug.cgi?id=42592.

This patch fixes that and add a test case to test inner loop vectorization
happens when VPlan native path is enabled.

Diff Detail

Event Timeline

Kazhuu created this revision.Feb 24 2021, 4:21 AM
Kazhuu requested review of this revision.Feb 24 2021, 4:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 24 2021, 4:21 AM
fhahn edited reviewers, added: fhahn; removed: Florian.Feb 24 2021, 7:22 AM
fhahn added a subscriber: fhahn.

Thanks for the patch!

llvm/test/Transforms/LoopVectorize/vplan-vectorize-inner-loop.ll
5 ↗(On Diff #326046)

Can you add a reference to the bug report (PR42592)?

62 ↗(On Diff #326046)

I think the outer loop could be simplified and only contain the bare minimum (outer induction variable & checks)

70 ↗(On Diff #326046)

Do we need a reduction here? Might be simpler to just have a simple store instead?

Kazhuu updated this revision to Diff 326324.EditedFeb 25 2021, 1:45 AM

Add bare minimum test case that I was able to use to reproduce the issues and it passes when the patch is applied. Also add one more assert fix that I noticed came up when testing this.

fhahn added inline comments.Feb 25 2021, 2:55 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4448–4450

I think we should probably have a UseVPlanNativePath variable in ILV, which is true if EnableVPlanNativePath && !OrigLoop->isInnerMost() and replace all checks of EnableVPlanNativePath with checking UseVPlanNativePath.

5189

Why is this needed? If we handle inner loops with the regular ILV even in the VPlanNativePath, the assert should not change I think. Is it possible that we need to update other parts to set the right decision for ILV in VPlanNativePath?

Kazhuu added inline comments.Feb 25 2021, 5:56 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5189

I needed to add this when inner loop contained load instruction that is uniform on all iterations . This load is also in the test case I added in the loop body:

%d = load float, float* %arrayidx5.i, align 4

I guess technically LICM should lift this load up from the loop but I experienced this error when using PoCL with CPU backend and polybench OpenCL testbech using normal -O3 pass pipeline. I guess I could post it as a separate issue as well if needed. The assert line reported to fail was this:

opt: /home/kazooie/extra/programming/llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5184: llvm::LoopVectorizationCostModel::collectLoopUniforms(llvm::ElementCount)::<lambda(llvm::Instruction*, llvm::ElementCount)>: Assertion `WideningDecision == CM_Scalarize' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.

And to my knowledge where scatter/gather widening decision is coming from is form the fact that VPlan doesn't support recognizing uniform memory strides, so it is using scatter/gather instead of loads.

Kazhuu updated this revision to Diff 326634.Feb 26 2021, 2:23 AM

Add UseVPlanNative path boolean to inform when to use VPlan native path, removed unnecessary assert modification from earlier and update the test case. Now vectorizing inner loop produces the same vectorized code when VPlan native path flag is enabled and not enabled.

Kazhuu marked 2 inline comments as done.Feb 26 2021, 2:27 AM
Kazhuu added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4448–4450

Added the boolean to LoopVectorizationCostModel class instead because the flags were used there as well and to avoid having the same flag in both ILV and LoopVectorizationCostModel classes.

5189

Yeah you were right this was unnecessary indeed when the VPlan flag usage is fixed.

a.elovikov added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4448–4450

For my education, why shouldn't we use VPlan native path for inner loop vectorization? Isn't it a simpler way to make the path stable enough (it's already covered by an options that is disabled by default and is served for development purposes only)? I'd expect inner loop vectorization should be a subset of what it should be able to do.

In other words, do we expect it to be a temporary workaround to enable more work on VPlan native path right now, or do I miss some design decision that would explain why this should be a long term fix?

To clarify - I'm not going to block the review/request changes to the patch because of this. It is solely for the purpose of me getting a better understanding of the ongoing VPlan development and future plans.

Kazhuu added a subscriber: Florian.Mar 3 2021, 8:34 PM
Kazhuu added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4448–4450

I wanted to fix this because of the crashes it causes which in my opinion are not the good thing have. To be honest I'm not aware of the long term plan that well either. Maybe @Florian or someone who knows the plan better can shed some light on this?

Kazhuu added inline comments.Mar 3 2021, 8:36 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4448–4450

Oh no sorry. I meant @fhahn :D

fhahn added inline comments.Mar 4 2021, 10:02 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1837

I'd not specify the conditions when this is set true. I'd say something like Controls whether the VPlan native path is used or not.

nit: use docygen comment ///

4448–4450

In other words, do we expect it to be a temporary workaround to enable more work on VPlan native path right now, or do I miss some design decision that would explain why this should be a long term fix?

Unfortunately the VPlan native path is extremely constrained at the moment, exclusively relying on user annotation for legality checks. Most recent work has been in the other direction, VPlanization of the inner loop vectorizer, so in a way the term 'VPlan native path' is becoming more inaccurate and in the medium term the 'plan native path' will be more like 'VPlan outer loop vectorization'.

In terms of user experience, I think the behavior of the patch is much more user-friendly: currently the VPlan native path also runs on inner loops without requiring any pragmas, causing crashes as outlined in the bug-report. With the current patch, inner loops should go through the existing inner loop vectorizer and only outer loops explicitly marked for vectorization will go through the VPlan native path. This should make it more use-able for people who want to experiment with it on larger code-bases, where they only want to use it for certain loops.

I don't think we are losing much, as we are *very* far off from the VPlan native path being ready and as I said work is underway VPlanizing the inner loop vectorizer towards unifying the paths. This also means the focus of the VPlan native path can remain brining up pieces specifically needed for outer loops.

Kazhuu updated this revision to Diff 328413.Mar 5 2021, 12:06 AM

Fix coment.