This is an archive of the discontinued LLVM Phabricator instance.

[VPLAN] Minor improvement to testing and debug messages.
ClosedPublic

Authored by fpetrogalli on Mar 28 2019, 11:50 AM.

Details

Summary
  1. Use computed VF for stress testing.
  1. If the computed VF does not produce vector code (VF smaller than 2), force VF to be 4.
  1. Test vectorization of i64 data on AArch64 to make sure we generate VF != 4 (on X86 that was already tested on AVX).

Diff Detail

Repository
rL LLVM

Event Timeline

fpetrogalli created this revision.Mar 28 2019, 11:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2019, 11:50 AM

Thanks Francesco. I think we can only remove the arbitrary picking of VF = 4 with VPlanStressTest. I think we still need the other parts for testing, to build VPlans for loops without the pragma, which are then not vectorized.

lib/Transforms/Vectorize/LoopVectorize.cpp
267 ↗(On Diff #192692)

Ok I think we still want to keep this one here, as it enables building VPlans for loops without the pragma. But we can remove the code to arbitrarily pick VF = 4.

6116 ↗(On Diff #192692)

We still need this for correctness

hsaito added inline comments.Mar 28 2019, 12:19 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
6106 ↗(On Diff #192692)

If we are removing stress testing, I'd like to make sure that, there is a mode where we always get VF > 1. Vectorizing for AVX2 almost fits the profile (double complex is 128bit, YMM register is 256bit), but at Intel, we don't want to stop at just thinking about POD types.

If that is not the case, there is a value in stress testing, at least until we become confident that VPlan vectorizer can handle all cases of OpenMP simd and declare simd. Helps our development.

hsaito added inline comments.Mar 28 2019, 12:25 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
267 ↗(On Diff #192692)

We need VF>1 to make this meaningful, and I don't think we'll get that from the current VF determination code. I'm fine setting VF=4 only when the computed VF is 1. Blindly setting VF to a fixed value gives more determinism, though.

sguggill added inline comments.Mar 28 2019, 12:51 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
1404 ↗(On Diff #192692)

The intent here is to stress test VPlan build for any outer loop that is reducible (even ones that are not marked with clang vectorize enable). We lose that with this change and I don't think that is the right thing. Given this I think, we will need the VPlanBuildStressTest flag.

Hi all,

by the comment so far, I think we can abandon this change-set, because:

  1. As all of you have pointed out, we still need to do stress testing.
  2. Forcing VF=4 for stress testing seems to be a better choice than forcing VF=4 only when the computed VF =1, as it seems more reliable (stress test output doesn't change if we update the bits that compute the VF.

If you agree, I will abandon this patch.

Francesco

Yep, I agree on that we should keep the stress testing mechanism. It will be very useful to make sure that the construction and predication (and maybe other transformation) are robust enough since we can run it on loop nests that are not necessarily vectorizable.
Something important, though, is that we shouldn't use this mechanism to bypass legality or pragma simd requirements to vectorize a loop, i.e., we shouldn't use it to generate actual vector code.
What are you trying to achieve, Francesco?

fhahn added a comment.Mar 28 2019, 1:40 PM

Yep, I agree on that we should keep the stress testing mechanism. It will be very useful to make sure that the construction and predication (and maybe other transformation) are robust enough since we can run it on loop nests that are not necessarily vectorizable.
Something important, though, is that we shouldn't use this mechanism to bypass legality or pragma simd requirements to vectorize a loop, i.e., we shouldn't use it to generate actual vector code.
What are you trying to achieve, Francesco?

I suggested removing setting VF = 4 for VPlanStressTest now that we programmatically determine the VF, to streamline things a bit. In a way, just have VPlanStressTest mean: build a VPlan for any loop you can, with the automatically chosen VF.

I thought the point was to just have a VF to build a VPlan with and as it currently stands we build the same VPlans with VF = 4 or any other VF I think. With the programmatically determined VF, we would be a little closer to reality in the stress testing. As Hideki mentioned, having a constant factor for stress testing might have other benefits later on though

Yep, I agree on that we should keep the stress testing mechanism. It will be very useful to make sure that the construction and predication (and maybe other transformation) are robust enough since we can run it on loop nests that are not necessarily vectorizable.
Something important, though, is that we shouldn't use this mechanism to bypass legality or pragma simd requirements to vectorize a loop, i.e., we shouldn't use it to generate actual vector code.
What are you trying to achieve, Francesco?

I suggested removing setting VF = 4 for VPlanStressTest now that we programmatically determine the VF, to streamline things a bit. In a way, just have VPlanStressTest mean: build a VPlan for any loop you can, with the automatically chosen VF.

yeah, this patch was a follow up on the comment from @fhahn : https://reviews.llvm.org/D57598#1444238

I might have misunderstood his request. I will abandon the patch.

Thank you,

Francesco

I thought the point was to just have a VF to build a VPlan with and as it currently stands we build the same VPlans with VF = 4 or any other VF I think. With the programmatically determined VF, we would be a little closer to reality in the stress testing. As Hideki mentioned, having a constant factor for stress testing might have other benefits later on though

Yes, that sounds good to me. In the worst case, we could always use stress testing + user VF to achieve that.

fpetrogalli retitled this revision from [VPLAN] Remove option for stress testing. to [VPLAN] Use computed VF for stress testing..
fpetrogalli edited the summary of this revision. (Show Details)
fpetrogalli marked 5 inline comments as done.Mar 28 2019, 2:07 PM

I have updated the patch to address all the concerns. Now VF = 4 for stress testing only if the computed VF does not produce vector code (VF < 2).

Thank you all.

hsaito accepted this revision.Mar 28 2019, 2:57 PM

LGTM

This revision is now accepted and ready to land.Mar 28 2019, 2:57 PM

Should we add a small lit test that covers the new expected behavior? Probably reusing one from D57598 with the stress testing flag would suffice.

Should we add a small lit test that covers the new expected behavior? Probably reusing one from D57598 with the stress testing flag would suffice.

Good idea. Something that invokes stress testing and sets VF = 4 when the computed VF is 1. Did I get it right?

Should we add a small lit test that covers the new expected behavior? Probably reusing one from D57598 with the stress testing flag would suffice.

Good idea. Something that invokes stress testing and sets VF = 4 when the computed VF is 1. Did I get it right?

For example. Or one where VF != 4 (computed by determineVPlanVF).

For example. Or one where VF != 4 (computed by determineVPlanVF).

I believe that the test in test/Transforms/LoopVectorize/X86/outer_loop_test1_no_explicit_vect_width.ll is already doing that, as the AVX output is producing code with VF = 8.

Or are you saying this needs to be done explicitly for stress testing?

fpetrogalli retitled this revision from [VPLAN] Use computed VF for stress testing. to [VPLAN] Minor improvement to testing and debug messages..
fpetrogalli edited the summary of this revision. (Show Details)

I have addressed all concern raised in this review.

In particular, I have added a test for computed VF != 4 on AArch64 (on X86 that was just the case for AVX).

Or are you saying this needs to be done explicitly for stress testing?

Yes, something that covers the new stress testing behavior that this patch is introducing, to make sure that we preserve it. It could be a test using the stress testing flag and VF!=4 coming from determinePlanVF and/or a test using stress testing flag and where the default VF=4 is taken.
We are not generating code in stress testing mode so maybe you can match some VF information from the debug print.
Please, let me know if you find any roadblock.

fhahn added a comment.Apr 9 2019, 9:29 AM

Thanks Francesco! I think the test looks good now. If you need someone to commit it, just let me know.

dcaballe accepted this revision.Apr 9 2019, 10:03 AM

LGTM! Thanks, Francesco!

Thank you all.

@fhahn , did you say you volunteered for committing the patch? Please proceed. :)

Francesco

This revision was automatically updated to reflect the committed changes.
dstenb added a subscriber: dstenb.Apr 10 2019, 1:44 AM

Hi! One of the tests was missing "REQUIRES: asserts", so it failed on a non-assert build. I took the liberty to add that to the test in r358057. I hope that was okay!

Hi! One of the tests was missing "REQUIRES: asserts", so it failed on a non-assert build. I took the liberty to add that to the test in r358057. I hope that was okay!

Hi - thank your for taking care of this, apologies for breaking your runs!

Do you need a review?

Francesco