This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Implement initial vector code generation support for simple outer loops.
ClosedPublic

Authored by sguggill on Aug 15 2018, 4:42 PM.

Details

Summary

[VPlan] Implement vector code generation support for simple outer loops.

Context: Patch Series #1 for outer loop vectorization support in LV using VPlan. (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).

This patch introduces vector code generation support for simple outer loops that are currently supported in the VPlanNativePath. Changes here essentially do the following:

  • force vector code generation using explicit vectorize_width
  • add conservative early returns in cost model and other places for VPlanNativePath
  • add code for setting up outer loop inductions
  • support for widening non-induction PHIs that can result from inner loops and uniform conditional branches
  • support for generating uniform inner branches

We plan to add a handful C outer loop executable tests once the initial code generation support is committed. This patch is expected to be NFC for the inner loop vectorizer path. Since we are moving in the direction of supporting outer loop vectorization in LV, it may also be time to rename classes such as InnerLoopVectorizer.

Diff Detail

Repository
rL LLVM

Event Timeline

sguggill created this revision.Aug 15 2018, 4:42 PM
sguggill edited the summary of this revision. (Show Details)Aug 15 2018, 4:54 PM
fhahn added a comment.Aug 16 2018, 7:28 AM

Great to see this. I've left a few small comments after a first pass and will take a closer look in the next few days!

We plan to add a handful C outer loop executable tests once the initial code generation support is committed. This patch is expected to be NFC for the inner loop vectorizer path. Since we are moving in the direction of supporting outer loop vectorization in LV, it may also be time to rename classes such as InnerLoopVectorizer.

Excellent, having some test cases in the test-suite to exercise the VPlan native path will be great.

lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
592 ↗(On Diff #160940)

can we just use Header->phis() here? Using something like llvm::any_of/llvm::all_of might also make the code slightly simpler.

lib/Transforms/Vectorize/LoopVectorize.cpp
7806 ↗(On Diff #160940)

Is it worth explicitly calling this out as FIXME/TODO? IIUC this could be done independently and I think it would be good to start preserving this sooner rather than later :)

lib/Transforms/Vectorize/VPlan.cpp
241 ↗(On Diff #160940)

nit: if (!EnableVPlanNativePath) continue;?

Great to see this. I've left a few small comments after a first pass and will take a closer look in the next few days!

Thanks!

sguggill added inline comments.Aug 16 2018, 10:33 PM
lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
592 ↗(On Diff #160940)

I will make the change.

lib/Transforms/Vectorize/LoopVectorize.cpp
7806 ↗(On Diff #160940)

Yes, I plan to do this independently soon afterwards. I will add a TODO.

lib/Transforms/Vectorize/VPlan.cpp
241 ↗(On Diff #160940)

I hope I am not missing something here. We want to call execute for all blocks for non VPlan-native path and skip calling execute for outer loop's PH/exit blocks.

sguggill updated this revision to Diff 161167.Aug 16 2018, 11:16 PM

Addressed initial comments from Florian.

sguggill updated this revision to Diff 161277.Aug 17 2018, 9:25 AM

Minor formatting changes.

fhahn added inline comments.
lib/Transforms/Vectorize/LoopVectorize.cpp
2800 ↗(On Diff #161277)

Should we ever hit this code path in the vplan native path at the moment? If not, I would turn it into an assert.

3857 ↗(On Diff #161277)

nit: ScalarBBPredecessors be initialized in the constructor so we don't need this loop (Same for VectorBBPredecessors)? There's a constructor that takes an iterator range.

7186 ↗(On Diff #161277)

Could we use unsigned VF = Range.Start; at the start of the for loop, so we do not need plan->addVF outside?

7400 ↗(On Diff #161277)

as it is currently, we should only get to this point for outer loops, right? Then this could be an assert.

lib/Transforms/Vectorize/VPlan.cpp
403 ↗(On Diff #161277)

This message needs to be updated I think. Something like 'Expected InnerLoop VPlan CFG to terminate with unreachable')

sguggill updated this revision to Diff 163446.Aug 30 2018, 4:33 PM

Addressing latest comments from Florian.

sguggill marked 3 inline comments as done.Aug 30 2018, 4:48 PM
sguggill added inline comments.
lib/Transforms/Vectorize/LoopVectorize.cpp
2800 ↗(On Diff #161277)

We do hit this code in vplan native path and later hit a seg fault as getLAI returns NULL.

sguggill updated this revision to Diff 163452.Aug 30 2018, 5:19 PM
sguggill marked an inline comment as done.Aug 30 2018, 5:21 PM
sguggill added inline comments.
lib/Transforms/Vectorize/LoopVectorize.cpp
7400 ↗(On Diff #161277)

Yes, that is right. I have removed this code as the call to this function is guarded by a check for non-empty loops.

fhahn added a comment.Sep 4 2018, 7:18 AM

I just tried to build your patch, and it triggers

lib/Transforms/Vectorize/LoopVectorize.cpp:1288: bool llvm::LoopVectorizationCostModel::isScalarAfterVectorization(llvm::Instruction*, unsigned int) const: Assertion `ScalarsPerVF != Scalars.end() && "Scalar values are not calculated for VF"' failed.

for the vplan tests.

lib/Transforms/Vectorize/LoopVectorize.cpp
2800 ↗(On Diff #161277)

Right! I think for now it is better to exit early here then,

fhahn added a comment.Sep 4 2018, 7:23 AM

I just tried to build your patch, and it triggers

lib/Transforms/Vectorize/LoopVectorize.cpp:1288: bool llvm::LoopVectorizationCostModel::isScalarAfterVectorization(llvm::Instruction*, unsigned int) const: Assertion `ScalarsPerVF != Scalars.end() && "Scalar values are not calculated for VF"' failed.

for the vplan tests.

Nevermind, it was because the patch did not apply cleanly. Could you rebase the patch on latest trunk?

sguggill updated this revision to Diff 163940.Sep 4 2018, 4:20 PM
sguggill marked an inline comment as done.

Rebased my changes.

fhahn accepted this revision.Sep 5 2018, 2:54 AM

LGTM with some more nits to the test cases. The EnableVPlanNativePath sprinkled around are not ideal, but I don't think there is a better alternative and we should be able to get rid of them as we go along. Please wait a few days with submitting this, to give other people a change to raise any additional concerns.

test/Transforms/LoopVectorize/outer_loop_test1.ll
17 ↗(On Diff #163940)

Using CHECK-LABEL to check for blocks gives better error messages in case of a mismatch.

18 ↗(On Diff #163940)

could you also add checks for the generated PHIs, increments for the induction vars, address calculation and exit condition check? Same for the second test.

25 ↗(On Diff #163940)

This line and the one below can be dropped.

28 ↗(On Diff #163940)

dso_local local_unnamed_addr can be dropped

32 ↗(On Diff #163940)

dso_local and local_unnamed_addr #0 can be dropped, same for the comment above.

test/Transforms/LoopVectorize/outer_loop_test2.ll
37 ↗(On Diff #163940)

IIUC the test does not rely on any X86 specifics, so this and the target triple can be dropped?

This revision is now accepted and ready to land.Sep 5 2018, 2:54 AM

LGTM with some more nits to the test cases. The EnableVPlanNativePath sprinkled around are not ideal, but I don't think there is a better alternative and we should be able to get rid of them as we go along. Please wait a few days with submitting this, to give other people a change to raise any additional concerns.

Thanks for the approval Florian! I will take care of the test case changes and wait a few days to submit the changes.

sguggill marked 6 inline comments as done.Sep 6 2018, 2:21 PM
sguggill updated this revision to Diff 164326.EditedSep 6 2018, 5:08 PM

Florian, I have made the suggested test case changes and uploaded the same. Please take a look.

I think I found a typo, but otherwise LGTM too!

lib/Transforms/Vectorize/VPlan.cpp
392 ↗(On Diff #164326)

Shouldn't this call getEntryBasicBlock()? If the successor is a region, the corresponding IR block should jump to the entry block of that region, not to its exit.

sguggill updated this revision to Diff 164990.Sep 11 2018, 3:43 PM

Addressing Robin's comment.

sguggill marked an inline comment as done.Sep 11 2018, 3:47 PM
sguggill added inline comments.
lib/Transforms/Vectorize/VPlan.cpp
392 ↗(On Diff #164326)

Thank you for pointing this out. This was an oversight. The code was working as all successors were VPBasicBlocks in VPlanNativePath at this point. I have fixed this.

sguggill marked an inline comment as done.Sep 11 2018, 3:49 PM

I have requested Hideki Saito to submit this change on my behalf.

fhahn added inline comments.Sep 12 2018, 7:57 AM
lib/Transforms/Vectorize/VPlan.cpp
392 ↗(On Diff #164326)

I am not sure if it is possible to construct a test case for that with the restrictions at the moment, but if it is, could you add a test case covering this?

I have requested Hideki Saito to submit this change on my behalf.

Status Update:
There are a few failures found in our internal testing. Waiting for @sguggill to look into it.

hsaito accepted this revision.Sep 13 2018, 5:34 PM

Accepting the last revision prior to the commit. Hope arc will automatically attribute the patch to @sguggill.

This revision was automatically updated to reflect the committed changes.
hsaito added a comment.EditedSep 14 2018, 8:06 AM

Fix for the buildbot failures, provided by @sguggill . There was a memory leak in the D50820 patch.

http://llvm.org/viewvc/llvm-project?view=revision&revision=342201