Page MenuHomePhabricator

[LV] Move interleave count computation to LVP::plan().
Changes PlannedPublic

Authored by fhahn on Jan 29 2019, 6:58 AM.

Details

Summary

This patch moves the interleave count computation into LVP::plan(). It
makes sure VPlans are built in all cases we could generate code in the
vectorizer. If we know we do not generate any code (MaxVF == 1, no
interleaving), we can avoid building any VPlans.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6477

Event Timeline

fhahn created this revision.Jan 29 2019, 6:58 AM

Hey Florian! Thanks a lot for taking care of this! I'm leaving a comment below. Please, let me know what you think.

Diego

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6168

I wondered if we could make LoopVectorizationPlanner::plan more interleave group aware, since the code above also builds a vplan for VF=1 and and returns NoVectorization. Maybe we don't have to modify computeMaxVF but just pass InterleaveGroup flag to LoopVectorizationPlanner::plan and make sure a vplan is built when the flag is true. We are already passing UserVF so it would be more of the same. In that way we wouldn't need a independent handling for interleaving cases. I wouldn't worried about compile time since the code executed for InterleaveGroup would be mostly the code that you have in the new planForInterleaving, right?. What do you think?

fhahn marked an inline comment as done.Jan 31 2019, 11:10 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6168

The interleave count is currently determined by CM.selectInterleaveCount (or the user provided one, as in the test case) and depends on the vectorization factor. It might make sense to determine the interleave count in plan as well, and return a pair of VF and interleave count. What do you think?

(Please note that the interleave count I am talking about is not related to (Memory) InterleaveGroups)

dcaballe added inline comments.Jan 31 2019, 11:36 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6168

Sorry, I didn't mean interleave "group" but interleave "loop" in my previous message (including the flag, line 7212) :).
Yes, IMO what you suggest is the the right think to do. VPlan is meant to be used for vectorization and interleaving. We mostly focused on the first one but we should catch up on the second.

fhahn updated this revision to Diff 185415.Feb 5 2019, 2:48 PM
fhahn retitled this revision from [LV] Add planForInterleaving helper, to create VPlan if they are missing. to [LV] Move interleave count computation to LVP::plan()..
fhahn edited the summary of this revision. (Show Details)

Update the patch to move interleave count computation into LVP::plan(). The early exit if computeMaxVF == None made it a bit tricky.

As it is at the moment, we need to know the selected vectorization factor to compute the interleave count. And we need to know the interleave count to know if we need to generate VPlans. Maybe with this refactor it would also make sense to only generate a VPlan for the selected vectorization factor for now? There is no need to build VPlans for multiple vectorization factors in the legacy planning and we have the VPlan native path which builds the VPlans up front for planning.

fhahn marked an inline comment as done.Feb 5 2019, 2:49 PM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
845

Left over, I'll remove it before committing.

hsaito added a comment.Feb 5 2019, 3:48 PM

As it is at the moment, we need to know the selected vectorization factor to compute the interleave count. And we need to know the interleave count to know if we need to generate VPlans. Maybe with this refactor it would also make sense to only generate a VPlan for the selected vectorization factor for now? There is no need to build VPlans for multiple vectorization factors in the legacy planning and we have the VPlan native path which builds the VPlans up front for planning.

I don't disagree with you in principle, but let's wait until planInVPlanNativePath() actually starts to build VPlans for multiple VFs. I hate to see the rest of the things getting bit rotten before that happens.
This is partly because I haven't worked on the legacy side of CM running on VPlan as much as I should --- so, I can't complain. Once that version of CM is ready, once again, I'll have to build multiple VPlans.
Please be forewarned.

hsaito added inline comments.Feb 5 2019, 4:00 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6147

Need to take UserIC into account here as well?

dcaballe added inline comments.Feb 5 2019, 6:19 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6134

Merge with next line?

6154

Shouldn't VF now starts from 1 and 6160 be removed?

6170–6176

multiple? do you mean at least one?

fhahn updated this revision to Diff 185515.Feb 6 2019, 2:42 AM

Address comments, thanks!

Herald added a project: Restricted Project. · View Herald TranscriptFeb 6 2019, 2:42 AM
fhahn marked 5 inline comments as done.Feb 6 2019, 2:45 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6147

Unfortunately the logic to decide between UserIC and computed IC is currently tightly tied with debug messages/optimization remarks. I think it would make sense to move that to plan() as well, but I would prefer to do it in a follow up patch, if that is OK with.

Ayal added a comment.Feb 6 2019, 7:51 AM

The bug revealed by the test is rather subtle; does it match the original https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6477 ?

LV decided (rightfully) not the vectorize the loop in the test having trip-count == 1, but does decide to interleave it (wrongfully, see below). LVP::plan() returns {VF=1,Cost=0} representing the decision not to vectorize, but w/o building any VPlan, which breaks the interleaving decision taken afterwards.

LVP::plan() builds a VPlan for VF=1 only to serve interleaving. So one simple possible fix for this bug is to always build such a VPlan. This patch tries to be smarter, and have LVP::plan() build this single VPlan only if interleaving will actually need it. Another way to building this VPlan only if needed, is to build it on demand, if it's missing, once IC is determined (and before/at setBestPlan() which raised the assert). But note that LVP::plan() refrains from building a VPlan for VF=1 only if MaybeMaxVF is None, meaning "vectorization should be avoided up front". In such cases, interleaving should also be avoided up front! This "up frontness" can be propagated from CM.computeMaxVF() to CM.selectInterleaveCount() in one of several ways: raising a "CM.up-front" flag; have LVP::plan() return (1) an Optional<VectorizationFactor>; (2) a std::tie(VF, NoIC), similar to the current patch, but where NoIC is a boolean indicating IC should be set to 1; (3) a {VF=0,Cost=0} to denote that vectorization and interleaving are to be avoided up-front --- note that {VF=1,Cost=0} *also* represents UserVF=1, which *is* subject to interleaving.

It would be good to build VPlans more selectively to save compile-time, and possibly refactor how VF and IC decisions are taken, but better in a separate patch from one that fixes an erroneously missing VPlan?

fhahn marked an inline comment as done.Feb 6 2019, 8:34 AM

The bug revealed by the test is rather subtle; does it match the original https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6477 ?

LV decided (rightfully) not the vectorize the loop in the test having trip-count == 1, but does decide to interleave it (wrongfully, see below). LVP::plan() returns {VF=1,Cost=0} representing the decision not to vectorize, but w/o building any VPlan, which breaks the interleaving decision taken afterwards.

LVP::plan() builds a VPlan for VF=1 only to serve interleaving. So one simple possible fix for this bug is to always build such a VPlan. This patch tries to be smarter, and have LVP::plan() build this single VPlan only if interleaving will actually need it. Another way to building this VPlan only if needed, is to build it on demand, if it's missing, once IC is determined (and before/at setBestPlan() which raised the assert). But note that LVP::plan() refrains from building a VPlan for VF=1 only if MaybeMaxVF is None, meaning "vectorization should be avoided up front". In such cases, interleaving should also be avoided up front! This "up frontness" can be propagated from CM.computeMaxVF() to CM.selectInterleaveCount() in one of several ways: raising a "CM.up-front" flag; have LVP::plan() return (1) an Optional<VectorizationFactor>; (2) a std::tie(VF, NoIC), similar to the current patch, but where NoIC is a boolean indicating IC should be set to 1; (3) a {VF=0,Cost=0} to denote that vectorization and interleaving are to be avoided up-front --- note that {VF=1,Cost=0} *also* represents UserVF=1, which *is* subject to interleaving.

It would be good to build VPlans more selectively to save compile-time, and possibly refactor how VF and IC decisions are taken, but better in a separate patch from one that fixes an erroneously missing VPlan?

Thanks Ayal! Do I understand correctly that you propose to first fix the missing VPlan issue by make interleaving respect the fact that interleaving should be avoided when computeMaxVF returns None? If that is the intended behavior, then I think it makes sense to fix that first and I'll prepare a separate patch.

Ayal added a comment.Feb 6 2019, 8:40 AM

The bug revealed by the test is rather subtle; does it match the original https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6477 ?

LV decided (rightfully) not the vectorize the loop in the test having trip-count == 1, but does decide to interleave it (wrongfully, see below). LVP::plan() returns {VF=1,Cost=0} representing the decision not to vectorize, but w/o building any VPlan, which breaks the interleaving decision taken afterwards.

LVP::plan() builds a VPlan for VF=1 only to serve interleaving. So one simple possible fix for this bug is to always build such a VPlan. This patch tries to be smarter, and have LVP::plan() build this single VPlan only if interleaving will actually need it. Another way to building this VPlan only if needed, is to build it on demand, if it's missing, once IC is determined (and before/at setBestPlan() which raised the assert). But note that LVP::plan() refrains from building a VPlan for VF=1 only if MaybeMaxVF is None, meaning "vectorization should be avoided up front". In such cases, interleaving should also be avoided up front! This "up frontness" can be propagated from CM.computeMaxVF() to CM.selectInterleaveCount() in one of several ways: raising a "CM.up-front" flag; have LVP::plan() return (1) an Optional<VectorizationFactor>; (2) a std::tie(VF, NoIC), similar to the current patch, but where NoIC is a boolean indicating IC should be set to 1; (3) a {VF=0,Cost=0} to denote that vectorization and interleaving are to be avoided up-front --- note that {VF=1,Cost=0} *also* represents UserVF=1, which *is* subject to interleaving.

It would be good to build VPlans more selectively to save compile-time, and possibly refactor how VF and IC decisions are taken, but better in a separate patch from one that fixes an erroneously missing VPlan?

Thanks Ayal! Do I understand correctly that you propose to first fix the missing VPlan issue by make interleaving respect the fact that interleaving should be avoided when computeMaxVF returns None? If that is the intended behavior, then I think it makes sense to fix that first and I'll prepare a separate patch.

Yes, that's right. Just curious if that'll indeed fix the original bug.

fhahn added a comment.Feb 6 2019, 12:31 PM

Thanks Ayal! Do I understand correctly that you propose to first fix the missing VPlan issue by make interleaving respect the fact that interleaving should be avoided when computeMaxVF returns None? If that is the intended behavior, then I think it makes sense to fix that first and I'll prepare a separate patch.

Yes, that's right. Just curious if that'll indeed fix the original bug.

Yep it fixes the original bug. I've created D57837. Once that is resolved, I'll update this patch.

fhahn planned changes to this revision.Feb 7 2019, 12:55 PM

I'll come back to this and will try to update it with the improvements discussed here and at D57837. Thanks everyone for all the feedback!