This is an archive of the discontinued LLVM Phabricator instance.

[LV] Invalidate cost model decisions along with interleave groups.
ClosedPublic

Authored by fhahn on Apr 16 2020, 7:16 AM.

Details

Summary

Cost-modeling decisions are tied to the compute interleave groups
(widening decisions, scalar and uniform values). When invalidating the
interleave groups, those decisions also need to be invalidated.

Otherwise there is a mis-match during VPlan construction.
VPWidenMemoryRecipes created initially are left around w/o converting them
into VPInterleave recipes. Such a conversion indeed should not take place,
and these gather/scatter recipes may in fact be right. The crux is leaving around
obsolete CM_Interleave (and dependent) markings of instructions along with
their costs, instead of recalculating decisions, costs, and recipes.

Alternatively to forcing a complete recompute later on, we could try
to selectively invalidate the decisions connected to the interleave
groups. But we would likely need to run the uniform/scalar value
detection parts again anyways and the extra complexity is probably not
worth it.

Fixes PR45572.

Diff Detail

Event Timeline

fhahn created this revision.Apr 16 2020, 7:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 16 2020, 7:16 AM
Ayal added a comment.Apr 16 2020, 9:58 AM

Clarifying the summary a bit:

Cost-modeling decisions are tied to the compute interleave groups
(widening decisions, scalar and uniform values). We [When] invalidating the
interleave groups, those decisions also need to be invalidated.

Otherwise there is a mis-match during VPlan construction where no
VPInterleave recipes are created and VPWidenMemoryRecipes are left
around for instructions marked as interleaved. This happens if
a vectorization factor is chosen for which computeMaxVF already computed
the CM decisions.

VPWidenMemoryRecipes created initially are indeed left around w/o converting them into VPInterleave recipes, but such conversion indeed should not take place, and these gather/scatter recipes may in fact be right. The crux is leaving around obsolete CM_Interleave (and dependent) markings of instructions along with their costs, instead of recalculating decisions, costs, and recipes.

llvm/include/llvm/Analysis/VectorUtils.h
749

Apply a similar fix to invalidateGroupsRequiringScalarEpilogue() as well? I.e., have it return an indication if any groups were invalidated, and if so clear WideningDecisions, Uniforms and Scalars.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1323

If there are no groups there's no need to call reset() either. Seems logical to first do InterleaveInfo.reset(), as it triggers clearing of WideningDecisions (at-least the CM_Interleave ones), which in turn triggers clearing Uniforms and Scalars.

Best have InterleaveInfo.reset() return an indication if any groups were destroyed, and iff so clear the rest? (Can indeed be more selective by also checking if there were any CM_Interleave decisions, etc., but agree that the extra complexity is probably not worth it.)

fhahn updated this revision to Diff 258264.Apr 17 2020, 2:08 AM
fhahn marked 3 inline comments as done.

Addressed comments.

Clarifying the summary a bit:

Cost-modeling decisions are tied to the compute interleave groups
(widening decisions, scalar and uniform values). We [When] invalidating the
interleave groups, those decisions also need to be invalidated.

Otherwise there is a mis-match during VPlan construction where no
VPInterleave recipes are created and VPWidenMemoryRecipes are left
around for instructions marked as interleaved. This happens if
a vectorization factor is chosen for which computeMaxVF already computed
the CM decisions.

VPWidenMemoryRecipes created initially are indeed left around w/o converting them into VPInterleave recipes, but such conversion indeed should not take place, and these gather/scatter recipes may in fact be right. The crux is leaving around obsolete CM_Interleave (and dependent) markings of instructions along with their costs, instead of recalculating decisions, costs, and recipes.

Thanks Ayal, I'll update the description to

Cost-modeling decisions are tied to the compute interleave groups
(widening decisions, scalar and uniform values). When invalidating the
interleave groups, those decisions also need to be invalidated.

Otherwise there is a mis-match during VPlan construction.
VPWidenMemoryRecipes created initially are left around w/o converting them
into VPInterleave recipes. Such a conversion indeed should not take place,
and these gather/scatter recipes may in fact be right. The crux is leaving around
obsolete CM_Interleave (and dependent) markings of instructions along with
their costs, instead of recalculating decisions, costs, and recipes.

Does that sound good?

fhahn edited the summary of this revision. (Show Details)Apr 17 2020, 2:08 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1323

I've renamed the function to invalidateCostModelingDecision, changed InterleaveInfo::reset to return a bool indicating if any groups have been invalidated (and renamed it to invalidateGroups to bring it in line with invalidateGroupsRequiring...). invalidateCostModelingDecisions is only called if invalidateGroups* returns true.

Ayal added a comment.Apr 17 2020, 7:29 AM

Sure, the updated summary's fine :-).

llvm/include/llvm/Analysis/VectorUtils.h
716

Simpler to early-exit at the start by "if (InterleaveGroups.empty()) return false;" and at the end "return true;"?

Still need to set RequiresScalarEpilogue to false. One would hope RequiresScalarEpilogue would always be false if there are no groups, but that requires another fix: releaseGroup() does not recompute RequiresScalarEpilogue (if it's true and the released group requiresScalarEpilogue); another option would be to set ReleaseScalarEpilogue once at the end, if any surviving group requiresScalarEpilogue.

751

Thanks for taking care of this as well! Would be good to have a test, perhaps a version of invalidate-cm-after-invalidating-interleavegroups.ll where idx.mul is bumped by more than 7, trip count is a power of 2, under optsize, would do(?).

llvm/lib/Analysis/VectorUtils.cpp
1256 ↗(On Diff #258264)

(As noted earlier, cannot simply assert that DelSet is not empty, and return true.
But at-least now RequiresScalarEpilogue is accurate ...)

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4996

nit: could fuse into a single if. OTOH, perhaps an LLVM_DEBUG in between may be useful, as in the general invalidateGroups()?

Ayal added inline comments.Apr 17 2020, 8:30 AM
llvm/include/llvm/Analysis/VectorUtils.h
716

Actually releaseGroup() does not need to recompute RequiresScalarEpilogue, because the latter is indeed set according to surviving groups only, that are not released later.

fhahn updated this revision to Diff 258347.Apr 17 2020, 9:10 AM
fhahn marked 5 inline comments as done.

Address comments, add additional test case.

fhahn added inline comments.Apr 17 2020, 9:14 AM
llvm/include/llvm/Analysis/VectorUtils.h
716

I've added an assertion that empty groups implies RequiresScalarEpilogue == false.

751

This is actually not required, as the groups are invalidated before any cost-modeling decisions are taken. I've added a test case that should crash otherwise, as suggested.

llvm/lib/Analysis/VectorUtils.cpp
1256 ↗(On Diff #258264)

(from above)

Actually releaseGroup() does not need to recompute RequiresScalarEpilogue, because the latter is indeed set according to surviving groups only, that are not released later.

Right, it should always be up-to-date, right? in invalidateGroups all groups are removed and it is set to false. In invalidateGroupsRequiringScalarEpilogue, all groups requiring a scalar epilogue are removed and it is set to false as well. I've added an assertion.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4996

There's no need to invalidate anything CM related, as nothing's computed yet. I've added an assertion to that effect and a comment.

Ayal accepted this revision.Apr 17 2020, 11:46 AM

This looks good to me, thanks!

llvm/lib/Analysis/VectorUtils.cpp
1256 ↗(On Diff #258347)

Right, very good.
Better place the assert immediately after the loop that fills DelSet.

Independent fix: would be better to loop over InterleaveGroups rather than InterleaveGroupMap in order to fill DelSet; which could then be a SmallVector; w/o the need for the "Avoid releasing a Group twice" comment.

This revision is now accepted and ready to land.Apr 17 2020, 11:46 AM
This revision was automatically updated to reflect the committed changes.
fhahn marked an inline comment as done.Apr 18 2020, 2:44 AM
fhahn added inline comments.
llvm/lib/Analysis/VectorUtils.cpp
1256 ↗(On Diff #258347)

Independent fix: would be better to loop over InterleaveGroups rather than InterleaveGroupMap in order to fill DelSet; which could then be a SmallVector; w/o the need for the "Avoid releasing a Group twice" comment.

I don't think we actually need an extra set/vector. Instead we should be able to just use an early_inc_range, which allows removing the current element as well: D78420