This is an archive of the discontinued LLVM Phabricator instance.

[LV] Invalidate widening decisions after maximizing vector bandwidth
ClosedPublic

Authored by dmgreen on Feb 20 2022, 10:54 AM.

Details

Summary

When MaximizeVectorBandwidth is enabled, we can end up (via calls to collectUniformsAndScalars/setCostBasedWideningDecision through calculateRegisterUsage) making widening decisions before we have decided whether to fold the tail by masking. These decisions will be wrong if we later decided to fold the tail, for example when the trip count is very low. It will use incorrect costs for loads that should get masked, using standard memory operation costs instead.

This still now uses the EmulatedMaskMemRefHack costs (a bit unfortunately), but the old costs without this change were 1, leading to too optimistic vectorization.

This slightly changes the way that the MaximizeVectorBandwidth option works to make it easier to test, always honouring the option if it is set.

Diff Detail

Event Timeline

dmgreen created this revision.Feb 20 2022, 10:54 AM
dmgreen requested review of this revision.Feb 20 2022, 10:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 20 2022, 10:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 18 2022, 4:33 AM
fhahn added a comment.Mar 18 2022, 5:19 AM

This still now the EmulatedMaskMemRefHack costs (a bit unfortunately), ...

Looks like a word may be missing.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5225–5226

This slightly changes the way that the MaximizeVectorBandwidth option works to make it easier to test, always honouring the option if it is set.

IIUC this effectively ignores the default value of the option now, right? How does that make it easier to test?

5226

nit: check TTI.shouldMaximizeVectorBandwidth() first ,like in the original code?

llvm/test/Transforms/LoopVectorize/AArch64/maximize-bandwidth-invalidate.ll
14

Could you pre-commit the test and and update the diff to only show the difference?

95

The test is only checking the cost of %0, could the loop body be reduced to fewer loads/reductions?

dmgreen added inline comments.Mar 22 2022, 1:57 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5225–5226

I was mostly referring to it wanting to not check isScalarEpilogueAllowed any more. I think what we really want is for the option to always take precedence. I'll try and change it to that.

llvm/test/Transforms/LoopVectorize/AArch64/maximize-bandwidth-invalidate.ll
14

Unfortunately it requires the command line option in order to test. The costs of the last 2 are 1 without the change, as it treats them as continuous loads, not masked loads. And it choses a vector factor of 8.

I could commit the option handling change first if you prefer. It would leave just the invalidateCostModelingDecisions call and the changes to these scores/codegen differences.

95

It needs the whole thing to pick the wrong vector factor, which this is also testing. Without that it always chooses VF=1, even with the wrong scores.

dmgreen updated this revision to Diff 417211.Mar 22 2022, 1:58 AM
dmgreen edited the summary of this revision. (Show Details)

Update option logic.

fhahn accepted this revision.Mar 29 2022, 12:11 PM

LGTM, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5225–5226

Ok sounds good to me!

llvm/test/Transforms/LoopVectorize/AArch64/maximize-bandwidth-invalidate.ll
14

I could commit the option handling change first if you prefer. It would leave just the invalidateCostModelingDecisions call and the changes to these scores/codegen differences.

That seems more work for little benefit. Thanks for checking!

This revision is now accepted and ready to land.Mar 29 2022, 12:11 PM
This revision was landed with ongoing or failed builds.Mar 31 2022, 1:19 AM
This revision was automatically updated to reflect the committed changes.