This is an archive of the discontinued LLVM Phabricator instance.

[LoopRotation] Allow loop header duplication if vectorization is forced
ClosedPublic

Authored by fhahn on Mar 26 2019, 11:47 AM.

Details

Summary

-Oz normally does not allow loop header duplication so this loop wouldn't be
vectorized. However the vectorization pragma should override this and allow
for loop rotation.

rdar://problem/49281061

Diff Detail

Event Timeline

anemet created this revision.Mar 26 2019, 11:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 26 2019, 11:47 AM
ormris removed a subscriber: ormris.Mar 26 2019, 11:49 AM

This makes sense to me, given that loops with bottom checks are a precondition for LV. I think it would be good to also update the section about llvm.loop.vectorize.enable in LangRef to say it might also enable transformations like LoopRotate, in preparation for LV.

In case LV fails to vectorize the loop, it might be a bit surprising the loop has been rotated. But the way things are structured at the moment, there's nothing we can do about that (we cannot tell in LoopRotate if LV will be able to vectorize). By documenting that behavior, we can push the responsibility for that to the user. Maybe a remark in LoopRotate would be helpful to indicate that we only rotated because of the metadata.

Also adding Michael, as he worked on loop related metadata recently.

lib/Transforms/Scalar/LoopRotation.cpp
119 ↗(On Diff #192309)

Might be worth a comment why isForcedVectorize overrides the behavior here.

test/Transforms/LoopVectorize/AArch64/Oz-and-forced-vectorize.ll
1 ↗(On Diff #192309)

Could you limit the test case to running loop-rotate? The tests for LV should already be sufficient to check the transformed case. (same for second test)

59 ↗(On Diff #192309)

All metadata except this one can be stripped I think. (same for second test)

anemet marked 2 inline comments as done.Mar 27 2019, 8:00 AM

This makes sense to me, given that loops with bottom checks are a precondition for LV. I think it would be good to also update the section about llvm.loop.vectorize.enable in LangRef to say it might also enable transformations like LoopRotate, in preparation for LV.

In case LV fails to vectorize the loop, it might be a bit surprising the loop has been rotated. But the way things are structured at the moment, there's nothing we can do about that (we cannot tell in LoopRotate if LV will be able to vectorize). By documenting that behavior, we can push the responsibility for that to the user. Maybe a remark in LoopRotate would be helpful to indicate that we only rotated because of the metadata.

I think the picture is even simpler in this case. The user has asked for an optimization. If we can't deliver we issue a warning (-Wpass-failed). So the user has sufficient information to see that something went wrong. But yes a loop rotation remark makes sense! especially mentioning that we did it in preparation for forced vectorization.

Also adding Michael, as he worked on loop related metadata recently.

test/Transforms/LoopVectorize/AArch64/Oz-and-forced-vectorize.ll
1 ↗(On Diff #192309)

I don't think so. There is no other way to get sizelevel=2 activated for opt. You either run the entire -Oz pipeline or you run specific tests with sizelevel=0.

59 ↗(On Diff #192309)

I kept it in to get the *warning* with proper line info for failed user-requested optimization. I can drop if you don't think that is useful. I liked it because it completely captured the user experience that this test came from.

I can check for the actual diagnostics to make this less subtle.

fhahn added inline comments.Mar 27 2019, 8:32 AM
test/Transforms/LoopVectorize/AArch64/Oz-and-forced-vectorize.ll
1 ↗(On Diff #192309)

Argh, that's unfortunate :( . I thought it just respects the minsize attribute.......

This might be a more general problem, other passes might expect a normalized form as well, such as UnrollAndJam and LoopDistribute.

For LoopVectorizer, I am somewhat surprised. It calls simplifyLoop itself (instead of relying on LoopSimplifyPass). Could the same be done for LoopRotation?

lib/Transforms/Scalar/LoopRotation.cpp
60 ↗(On Diff #192309)

Consider using hasVectorizeTransformation in LoopUtils.h. I added it to make determining whether a pragma is added consistent between passes.

test/Transforms/LoopVectorize/AArch64/Oz-and-unforced-vectorize.ll
3–4 ↗(On Diff #192309)

There is no llvm.loop.vectorize.enable metadata here, so without -vectorize-loops option, it would not be vectorized anyway.

27 ↗(On Diff #192309)

I don't think we need to regression-test that we do not vectorize. If LoopVectorize one day manages to vectorize unrotated loop, it's a good thing, not a regression.

Btw, since it's the same content, you could instead add another RUN: line to the other test.

fhahn added a comment.Mar 27 2019, 9:16 AM

This might be a more general problem, other passes might expect a normalized form as well, such as UnrollAndJam and LoopDistribute.

For LoopVectorizer, I am somewhat surprised. It calls simplifyLoop itself (instead of relying on LoopSimplifyPass). Could the same be done for LoopRotation?

I guess it could, but the underlying problem is still the same: as things are currently structured, we have to rotate before we know if vectorization is legal and cannot go back if it is not. Potentially, VPlan allows this sort of thing, by doing those transforms on a separate representation (VPlan IR)

anemet marked 2 inline comments as done.Mar 27 2019, 9:48 AM

This might be a more general problem, other passes might expect a normalized form as well, such as UnrollAndJam and LoopDistribute.

For LoopVectorizer, I am somewhat surprised. It calls simplifyLoop itself (instead of relying on LoopSimplifyPass). Could the same be done for LoopRotation?

It felt that the decision belonged to LoopRotate. We make a profitability decision there (for -Oz loop rotate is a bad trade-off) which is inaccurate.

If you feel strongly about this I can explore rotating the loop in the vectorizer when I have a bit more time.

lib/Transforms/Scalar/LoopRotation.cpp
60 ↗(On Diff #192309)

Nice, I didn't know about this. BTW, LoopDistribute was not updated to use hasDistributeTransformation.

test/Transforms/LoopVectorize/AArch64/Oz-and-unforced-vectorize.ll
27 ↗(On Diff #192309)

I don't think we need to regression-test that we do not vectorize. If LoopVectorize one day manages to vectorize unrotated loop, it's a good thing, not a regression.

I like to have the negative tests as well but if you want I'll remove it.

Btw, since it's the same content, you could instead add another RUN: line to the other test.

Well, it's not the same content since it's missing the the forced-vectorize metadata. But perhaps I am not understanding the point you're trying to make.

anemet marked an inline comment as done.Mar 27 2019, 10:05 AM
anemet added inline comments.
lib/Transforms/Scalar/LoopRotation.cpp
60 ↗(On Diff #192309)

@Meinersbur I am actually confused whether I should be checking for Enable or ForcedByUser. We only set Enable and not Forced if the vectorization width is requested via a pragma but I think that is an even stronger enforcement of vectorization. What is the logic here?

If you feel strongly about this I can explore rotating the loop in the vectorizer when I have a bit more time.

No, this would go beyond fixing a bug.

lib/Transforms/Scalar/LoopRotation.cpp
60 ↗(On Diff #192309)

I only changed UnrollAndJam to use it for now. How these decisions are made is surprisingly subtle and e.g. in case of LoopVectorizeHints, integrated within other decisions. I was not brave enough to put up reviews for this and risking changing behavior without a need. But I would be happy if new code uses it.

test/Transforms/LoopVectorize/AArch64/Oz-and-unforced-vectorize.ll
27 ↗(On Diff #192309)

I like to have the negative tests as well but if you want I'll remove it.

I don't like them; they are fragile, require maintenance and check-llvm takes longer.

Well, it's not the same content since it's missing the the forced-vectorize metadata.

I was assuming the missing llvm.loop.vectorize.enable metadata was a mistake. Without it, LoopVectorize won't do anything, even if it could vectorize the loop (There is a difference in how clang and opt initialize the PassManagerBuilder).

An excellent example of how fragile this kind of test is.

fhahn commandeered this revision.Oct 26 2020, 3:08 AM
fhahn edited reviewers, added: anemet; removed: fhahn.

I'm taking this over, after checking with @anemet offline.

fhahn updated this revision to Diff 300612.Oct 26 2020, 3:10 AM

Rebased, updated to use hasVectorizeTransformation as suggested, removed negative test (existing loop rotation tests should cover that already) and removed unnecessary dbg & tbaa metadata from test.

Ping :)

Looks reasonable to me, on nit. Someone else?

llvm/lib/Transforms/Scalar/LoopRotation.cpp
132

Nit: Can we make this a standalone variable, they are free ;)

Looks reasonable to me, on nit. Someone else?

Seems fine to me too.

Meinersbur accepted this revision.Oct 26 2020, 9:50 AM

LGTM.

There is an inline comment on the previous diff that I think was never submitted; It does not apply to the latest diff as it already uses TM_ForcedByUser. I don't delete is since is might still be interesting.

lib/Transforms/Scalar/LoopRotation.cpp
60 ↗(On Diff #192309)

ForcedByUser follows llvm.loop.vectorize.enable, i.e. whether #pragma clang loop vectorize(enable) is present. Other metadata such as llvm.loop.vectorize.width logically are only used in case the vectorizer decides to transform a loop, without explicitly enabling it. Other passes like LoopUnroll and LoopUnrollAndJam enable themselves even if only llvm.loop.unroll.count is present.

LoopVectorizationHints::getForce() only checks the llvm.loop.vectorize.enable setting to override heuristicts, so I suggest to use ForcedByUser.

This is what I mean by "subtle".

This revision is now accepted and ready to land.Oct 26 2020, 9:50 AM
This revision was landed with ongoing or failed builds.Oct 27 2020, 2:28 AM
This revision was automatically updated to reflect the committed changes.
fhahn added inline comments.Oct 27 2020, 2:34 AM
lib/Transforms/Scalar/LoopRotation.cpp
60 ↗(On Diff #192309)

That's an excellent point. I think after D66290 Clang also generates vectorize.enable when just the width pragma is specified, so for code generated by clang, this issue should not exist now, but may for other frontends.

119 ↗(On Diff #192309)

I added a variable similar to what we have in the new pm version in the committed patch,

Meinersbur added inline comments.Oct 27 2020, 8:30 AM
lib/Transforms/Scalar/LoopRotation.cpp
60 ↗(On Diff #192309)

Yes, some consistency would be nice. IMHO that vectorize_width implicitly forces vectorization is a front-end interpretation. vectorize.width without vectorize.enable has a valid interpretation "if you are vectorizing, use this SIMD width".