Page MenuHomePhabricator

[VPLAN] Determine Vector Width programmatically.
ClosedPublic

Authored by fpetrogalli on Feb 1 2019, 9:51 AM.

Details

Summary

With this change, the VPlan native path is triggered with the directive:

#pragma clang loop vectorize(enable)

There is no need to specify the vectorize_width(N) clause.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

While we are at this, let's talk about downstream dependency, if any, for allowing more than one candidate VF along the native path. At least we can write down a list of TODOs so that we'll be aware of the things we need to improve at a time in the future.

This may be a bit more than what you anticipated for this patch, but I think this will help everyone.

lib/Transforms/Vectorize/LoopVectorize.cpp
7098 ↗(On Diff #184775)

This is in some sense reinventing the wheel, and I kind of imagine this can quickly get out of control if different people start trying to add different constraints of their interest.

Let's start talking about how we can get to the point of being able to use unified computeMaxVF(). For this patch, however, I'd be happy if we get to the point of being able to use unified getSmallestAndWidestTypes().

7138 ↗(On Diff #184775)

Let's try not do this here. Please do this kind of thing inside planInVPlanNativePath().

fhahn added a subscriber: fhahn.

Thanks for working on this Francesco! +1 to Hideki's point about moving to LoopVectorizationPlanner.

lib/Transforms/Vectorize/LoopVectorize.cpp
1001 ↗(On Diff #184775)

Is this related to the patch? I suppose guessVectorizationFactor could set UserVF to 1, which could reach this code. It would be better to not vectorize with VF == 1.

3787 ↗(On Diff #184775)

Related to the patch?

test/Transforms/LoopVectorize/outer_loop_test1_no_explicit_vect_width.ll
90 ↗(On Diff #184775)

Looks like we are just using a single store width in this test. Maybe it would be worth adding loads/stores to a different type as well?

npanchen added inline comments.Feb 1 2019, 4:08 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
1001 ↗(On Diff #184775)

For VF = 1 CM is not called, thus this change looks as unrelated to this commit. If it does, what is a reason and why other places in CM are not modified ?

7098 ↗(On Diff #184775)

Agree with Hideki. Ideally, this function has to return a range [MinVF, MaxVF], which can be used in buildVPlans() to minimize number of built VPlans.

Please, remember that target can have different vector registers.

7103 ↗(On Diff #184775)

For a cases when load and store instructions were hoisted out of the loop, this function will return WidestVectorRegBits, which could be >= 128.

7106 ↗(On Diff #184775)

getMemInstValueType(I)

fpetrogalli marked 5 inline comments as done.Feb 4 2019, 8:35 AM

@hsaito , @fhahn , @npanchen ,

Thank you for looking into this.

I have responded to all comments about the design. I will work on a new patch an submit it according to your suggestions. For now, I have ignored those comments on trivial changes (for example, @fhahn comment on adding more test coverage). I will fix such comments and update them along the way with the re-implementation of the patch.

Thank you,

Francesco

lib/Transforms/Vectorize/LoopVectorize.cpp
1001 ↗(On Diff #184775)

Yes, this is related to this patch. By guessing the number of lanes with the algorithm i proposed, all the target independent invocations of opt that use VPLAN end up reporting vector registers of size 32-bit (TTI->getRegisterBitWidth(true /* Vector*/)), which means the vectorization performed in some of the tests are done with a vectorization factor of 1. I am not saying this is the right thing to do, I was actually hacking something that worked to initiate the discussion.

1001 ↗(On Diff #184775)

This assertion was firing when testing test/Transforms/LoopVectorize/explicit_outer_detection.ll. Because the test is run with no explicit vector with (this patch removes such need) and without specifying a target for opt, the vectorizer was generating a vector loop with one lane, for the same reason explained in my previsous comment (no target implies TTI->getRegisterBitWidth(true /* Vector*/) returning 32).

3787 ↗(On Diff #184775)

Yes. Again, this was needed because of an assertion firing when building a binary operator in the process of vectorizing case2 in test/Transforms/LoopVectorize/explicit_outer_detection.ll. The loop in case2 get's vectorized with this patch, with a vectorization factor of 1, but the code generation in the inner loop vectorizer is trying to build a binary operator between a scalar phi and a vector consisting of one lane, which is of course not possible. This change makes sure that the phi is generated not as a scalar but a one lane vector to prevent the failure when building the binary operator.

7098 ↗(On Diff #184775)

Responding to both comments from @hsaito and @npanchen here:

> This is in some sense reinventing the wheel

I fully agree :). I needed a starting point to be able to discuss this with you.

Let's start talking about how we can get to the point of being able to use unified computeMaxVF(). For this patch, however, I'd be happy if we get to the point of being able to use unified getSmallestAndWidestTypes().

OK, I will look into getting access to getSmallestAndWidestTypes here.

Please, remember that target can have different vector registers.

Yes, I understand that using the widest ones is not ideal, as you miss for example 2-lane vectorization on floating point data on machines that support both 64-bit and 128-bit vector registers (for example aarch64).

Just to make sure we are on the same page here. This could be solved by considering all possible power of two vector widths up to the maximum one. For example, if the loop is processing 32-bit data, and the target has 128, 256, and 512-bit wide registers, we should ask VPlan to consider 4, 8 and 16 lanes vectorizations. Is my interpretation correct here?

7103 ↗(On Diff #184775)

Sorry I don't understand this comment, could you please explain it with and example? Also, given that my understanding is that you want me to use getSmallestAndWidestTypes, is the comment still valid? Because I think that by using getSmallestAndWidestTypes I essentially will remove my custom code.

Thanks Francesco for helping us remove some of the constraints we have in VPlan native path!
I agree with the idea of using getSmallestAndWidestTypes() as a starting point whereas we don't have the proper cost model.

One more comment below.
Thanks,
Diego

dcaballe added inline comments.Feb 4 2019, 9:31 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
7141 ↗(On Diff #184775)

Regarding the issue with VF = 1, we are using VF = 1 to denote no vectorization and I think we should preserve that behavior. This basically means we shouldn't try to generate vector code (shouldn't invoke getWideningDecision et al.) with VF = 1.

I think the problem happens because we are setting UserVF here, instead of VF, and probably 1 is an unexpected value for UserVF (just guessing, I haven't checked it out). Maybe we should leave UserVF to values actually coming from the user and set VF instead? If you move this code to planInVPlanNativePath, as suggested, I think the VF = 1 problem would be fixed.

npanchen added inline comments.Feb 4 2019, 6:05 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
7103 ↗(On Diff #184775)

The example I was thinking about is pretty simple:

#pragma clang loop vectorize(enable)
for (i = 0; i < n; ++i) {
  red += i;
}

After some optimization the loop body can look like:

loop_body:
%1 = phi i64     %i_init,              %i
%2 = phi i32     %red_init,          %red
%red = %2 + %1
%i = %1 + 1
cmp %i, %n

In this case guessVPlanVF() will not find any LD/ST instruction, thus Max will be equal to 1 and WidestVectorRegBits will be returned.
For example, in case of SKX WidestVectorRegBits = 512.

fpetrogalli marked 16 inline comments as done.Mar 13 2019, 1:24 PM

Hi all,

I finally addressed your comments and update the patch. Let me know what you think.

Francesco

lib/Transforms/Vectorize/LoopVectorize.cpp
1001 ↗(On Diff #184775)

I restored this assertion.

3787 ↗(On Diff #184775)

I restored this code.

7098 ↗(On Diff #184775)

I have a version that does [MinVF, MaxVF] computation, but this triggers the assertion in this method:

void LoopVectorizationPlanner::setBestPlan(unsigned VF, unsigned UF) {
  LLVM_DEBUG(dbgs() << "Setting best plan to VF=" << VF << ", UF=" << UF
                    << '\n');
  BestVF = VF;
  BestUF = UF;

  erase_if(VPlans, [VF](const VPlanPtr &Plan) {
    return !Plan->hasVF(VF);
  });
  assert(VPlans.size() == 1 && "Best VF has not a single VPlan.");
}

Am I correct thinking that to be able to choose among different plans, we need to provide VPLAN with a cost model that is able to evaluate all the options? If that's the case, I think that generating Max and Min VF goes beyond the scope of this patch.

7103 ↗(On Diff #184775)

The new version doesn't use my custom code but getSmallestAndWidestTypes. I could add a test that does the hoisting on outer loops, but I am not sure how to construct this example. Can you provide a C example to start with? Or are you happy for me to skip this check?

7106 ↗(On Diff #184775)

This code is not used anymore in the last version.

7141 ↗(On Diff #184775)

The VF = 1 case is not generated anymore after the last change set, marking this comment as done.

test/Transforms/LoopVectorize/outer_loop_test1_no_explicit_vect_width.ll
90 ↗(On Diff #184775)

The patch doesn't rely anymore on the load/store types, but uses getSmallestAndWidestTypes. Do you still want me to add such test, or can we trust getSmallestAndWidestTypes of doing the right job here?

fpetrogalli marked 3 inline comments as done.

I mostly changes to code to use the infrastructure that LLVM already provides to determine the vectorization factor.

I mostly changes to code to use the infrastructure that LLVM already provides to determine the vectorization factor.

Haven't gone through the LIT test yet, but the code addressed my concerns.

lib/Transforms/Vectorize/LoopVectorize.cpp
6108 ↗(On Diff #190488)

If the return value is always 2 or greater, we should replace lines 6099-6103 with assert for VF >= 2.
Else, we should move lines 6099-6103 below line 6108 and adjust the code and the comment accordingly.

6112 ↗(On Diff #190488)

This is no longer user VF. Should be something along the lines of

"LV: Using " << UserVF ? "user VF " : "computed VF " << VF
fpetrogalli marked an inline comment as done.

Addressed second round of comments from @hsaito.

fpetrogalli marked an inline comment as done.Mar 13 2019, 8:30 PM
hsaito added inline comments.Mar 14 2019, 9:51 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
6105 ↗(On Diff #190562)

(VPlanBuildStressTest && VF < 2) ?

fpetrogalli marked an inline comment as done.Mar 14 2019, 9:56 AM
fpetrogalli added inline comments.
lib/Transforms/Vectorize/LoopVectorize.cpp
6105 ↗(On Diff #190562)

Sure, but why? If we are in VPlanBuildStressTest, why would you care about the value of VF?

Isn't it better to have full control on the stress tests and make sure that we always vectorize with VF = 4?

hsaito added inline comments.Mar 14 2019, 10:25 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
6105 ↗(On Diff #190562)

I then suggest going back to (VPlanBuildStressTest && !UserVF) --- override only if the programmer doesn't explicitly set it.

For the spirit of stress testing, I think it makes sense to have the ability to choose any legal VF (and in the future extend it to "scalable"). Maybe, we should take compiler option instead of hard-coded 4. So, something along like the following?

if (!UserVF)

if (VPlanBuildStressTest)
     VF = 4
else
     VF = determineVplanVF()
fpetrogalli marked 3 inline comments as done.
fpetrogalli added inline comments.
lib/Transforms/Vectorize/LoopVectorize.cpp
6105 ↗(On Diff #190562)

Thank you for explaining. I have re-worked the if statement as requested.

hsaito accepted this revision.Mar 14 2019, 12:19 PM

LGTM. Please wait for a few days to give others a chance to go over your updated patch.

This revision is now accepted and ready to land.Mar 14 2019, 12:19 PM
Meinersbur added inline comments.
lib/Transforms/Vectorize/LoopVectorizationPlanner.h
233–234 ↗(On Diff #190562)

[nit] It it somewhat unusual to pass by-value with const.

test/Transforms/LoopVectorize/outer_loop_test1_no_explicit_vect_width.ll
16–17 ↗(On Diff #190562)

The RUN lines are usually at the top of the file

fpetrogalli marked an inline comment as done.Mar 14 2019, 12:41 PM
fpetrogalli added inline comments.
lib/Transforms/Vectorize/LoopVectorizationPlanner.h
233–234 ↗(On Diff #190562)

I see, but I run into problems while developing this caused by the fact that I could modify UserVF inside the function. Would a const ref be better? I can try that, if it doesn't work and you don't like this const value, I will revert the interface to the original one.

Meinersbur added inline comments.Mar 14 2019, 1:38 PM
lib/Transforms/Vectorize/LoopVectorizationPlanner.h
233–234 ↗(On Diff #190562)

It should be a coding standard question. It may help against accidental assignment in the implementation to have a by-value const. Unfortunately, this function implementation detail leaks into the function signature.

Since I don't see this anywhere else in the LLVM code base, so I'd prefer to not do it. Otherwise, if we want to apply this consequently, we'd have to add const to by-value parameters to many functions (e.g. setBestPlan below).

Const ref would be worse.

fpetrogalli marked 4 inline comments as done.
fpetrogalli added inline comments.
lib/Transforms/Vectorize/LoopVectorizationPlanner.h
233–234 ↗(On Diff #190562)

Thank you for explaining. I have removed the const and restored the original interface..

fpetrogalli marked an inline comment as done and 2 inline comments as not done.Mar 14 2019, 1:48 PM
dcaballe accepted this revision.Mar 14 2019, 7:36 PM

Thanks, Francesco. LGTM!

@npanchen , @fhahn , gentle ping :)

Francesco

@npanchen , @fhahn , gentle ping :)

Francesco

I think you waited long enough already. I suggest proceeding to commit. Any further comments can be addressed post-commit.

I think you waited long enough already. I suggest proceeding to commit. Any further comments can be addressed post-commit.

OK.

@hsaito, I don't have commit access, could you commit this change for me?

Thank you!

@hsaito, I don't have commit access, could you commit this change for me?

Will do. I had a day off today. Will take care of it, hopefully tomorrow.

fhahn added inline comments.Wed, Mar 27, 3:11 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
6111 ↗(On Diff #190716)

Is the lambda necessary? You can just have << (UserVF ? "user VF" : "computed VF") << inline, I think

6114 ↗(On Diff #190716)

I think it would be clearer to say something along the lines of "LV: Using .. VF to build VPlans", to make it a bit clearer that VF is not necessarily what will be used for vectorization (e.g. VF == 1 means no vectorization).

7157 ↗(On Diff #190716)

There is VectorizationFactor::Disabled(), which is used in other places. Could you use this here instead of VF.Widht == 1?

test/Transforms/LoopVectorize/explicit_outer_detection.ll
71 ↗(On Diff #190716)

Comment needs updating, we now analyze loops without user VF too.

test/Transforms/LoopVectorize/outer_loop_test1_no_explicit_vect_width.ll
1 ↗(On Diff #190716)

This test will fail on targets not built with X86/AArch64 targets. The X86 version should go in test/Transforms/LoopVectorize/X86/ and the AArch64 one in test/Transforms/LoopVectorize/AArch64/

fhahn added inline comments.Wed, Mar 27, 3:14 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
6103 ↗(On Diff #190716)

I think as a follow up, we can drop VPlanBuildStressTest, now that we do not require a UserVF to build VPlans.

fpetrogalli marked 4 inline comments as done.

I have addressed last round of comments from @fhahn .

fpetrogalli added inline comments.Wed, Mar 27, 12:31 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
6103 ↗(On Diff #190716)

Should I address this in a separate patch or this one?

7157 ↗(On Diff #190716)

struct VectorizationFactor did not have a "==" operator. I have added it, and adapted this condition.

I forgot to update the comment in the test as requested from @fhahn . Now it is done.

fpetrogalli marked an inline comment as done.Wed, Mar 27, 12:35 PM
fhahn added a comment.Wed, Mar 27, 2:10 PM

Thanks Francesco! I'll commit the change tomorrow, unless @hsaito does it today :)

lib/Transforms/Vectorize/LoopVectorize.cpp
6103 ↗(On Diff #190716)

Yep that would be great!

Thanks Francesco! I'll commit the change tomorrow, unless @hsaito does it today :)

Thank you @fhahn !

fhahn added inline comments.Wed, Mar 27, 2:34 PM
lib/Transforms/Vectorize/LoopVectorizationPlanner.h
180 ↗(On Diff #192498)

No braces needed, I'll run clang-format on your patch before committing.

Thanks Francesco! I'll commit the change tomorrow, unless @hsaito does it today :)

@fhahn, I'm trying to figure out the appropriate proxy setting for external git.
You may be quicker. First commit attempt after SVN is gone. Some learning curve here.

This revision was automatically updated to reflect the committed changes.
fhahn added a comment.Thu, Mar 28, 3:40 AM

Thanks Francesco!

Thanks Francesco!

Thanks, Francesco and Florian.