This patch fixes PR34965. I have some concerns, though, about the current solution. I'm attaching the information I posted in Bugzilla. You can skip the Investigation section to have a quick understanding of what's going on, what the current patch does, my concerns and alternative solutions.
Thanks,
Diego
Symptom
Return value of Legal->isConsecutivePtr(%11) during LV code gen has changed from 1 to 0 as a side effect of VPlan patch (revision r311077). This value 0 (meaning non-consecutive pointer) during code gen is inconsistent with value 1 (meaning stride-1 pointer) returned during legal/cost model.
For that reason, store using %11 is considered a stride-1 vector store during legal/cost model but a scatter during code gen, resulting in an assert.
Investigation
createVectorizedLoopSkeleton in LV code gen generates the vector loop, turns original scalar loop into the scalar remainder, introduces the middle and scalar.ph basic blocks after the vector loop and generates new phi nodes that feed the scalar remainder (the IR after createVectorizedLoopSkeleton is attached (after_createVectorizedLoopSkeleton.ll). After this IR modification, Legal->isConsecutivePtr() no longer returns 1 for the problematic pointer, now in the scalar remainder.
Legal->isConsecutivePtr() indirectly uses ScalarEvolution (SE) to determine the pointer stride. For this particular test, Legal->isUniform() also triggers the invalidation of some SCEVs in SE cache, including the one for %11. This bug is exposed by VPlan patch because it changes the execution order of Legal->isUniform() and Legal->isConsecutivePtr() happening before LV code gen. This is the invocation order:
- Before VPlan patch: Legal->isUniform(…) Invalidates SCEV for %11. Legal->isConsecutivePtr(%11) Computes new SCEV for %11. It returns 1. createVectorizedLoopSkeleton() Modifies IR (middle, scalar.ph and new phis) Legal->isConsecutivePtr(%11) Uses cached SCEV for %11. It returns 1.
- After VPlan patch: Legal->isConsecutivePtr(%11) Uses cached SCEV for %11. It returns 1. Legal->isUniform(…) Invalidates SCEV for %11. createVectorizedLoopSkeleton() Modifies IR (middle, scalar.ph and new phis) Legal->isConsecutivePtr(%11) Computes new SCEV for %11 using modified IR. It returns 0!
For this test, recomputing Legal->isConsecutivePtr(%11) after createVectorizedLoopSkeleton yields a different output because SE is not able to detect a recurrence. These are the SCEVs computed for the problematic pointer before and after createVectorizedLoopSkeleton:
- Before (%11 in original IR): ({(8 + (4 * (zext i32 %.ph2 to i64)) + undef),+,4}<nsw><%6>). // SCEVAddRec
- After (%21 in modified IR): ((4 * (zext i32 (2 + %18) to i64))<nuw><nsw> + undef)<nsw> // No recurrence
In order to return a SCEVAddRec for the latter, SE would have to be able to represent %bc.resume.val and %bc.resume.val1 in terms of more basic components to conclude that %bc.resume.val1 is always %bc.resume.val + 1. Unfortunately, there is no way to represent a non-inductive phi operation in SCEV at this point.
Root Cause
Current LV code gen is based on the fragile assumption that analyzing the scalar remainder loop will yield the same information as analyzing the input scalar loop. Unfortunately, this test demonstrates that this is not always the case so we cannot blindly rely on that assumption.
Tentative Fix
Pointer strides are cached in Legal during the analysis of the input scalar loop (following the same approach as for uniform and scalar values). Legal->isConsecutivePtr now uses cached pointer strides instead of blindly recomputing the same information every time. Pointer strides are recomputed when runtime checks are added as a requirement for vectorization (PSE.addPredicate in LV).
Concern: LAI-> replaceSymbolicStrideSCEV may also add predicates to PSE. This routine is called when pointer strides are collected but is also called from other analysis-like routines in LAI (hasComputableBounds, isNoWrap, getPTrStride, etc.). If a new predicate that changes pointer strides is added after collecting them, those changes won’t be taken into account when using Legal->isConsecutivePtr. This scenario could be unlikely given that LAI-> replaceSymbolicStrideSCEV is also used when collecting pointer strides but we need to be sure. I would need some feedback from someone with experience in PSE/LAI.
Alternative/Complementary Solutions
- If Tentative Fix is not enough to deal with unexpected PSE->AddPredicate from LAI, we could revert changes in Legal->isConsecutivePtr() and collect Legal->isConsecutivePtr() output before going into LV code gen.
- Extend SE so that a SCEVAddRec can be generated for this test.