This patch teaches the loop vectorizer to vectorize phi nodes that have the
following characteristics:
- header phis that are not identified as inductions, reductions, or first-order
recurrences;
- fed by a phi in the latch block, where that phi can be if-converted;
- unused outside the loop.
Condition #3 will be lifted in a follow-on change.
The goal is to teach the vectorizer about general recurrences and
cross-iteration dependencies.
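As a rough illustration of the shape described above, here is a hypothetical C++ source loop. The function names and the IR mapping in the comments are assumptions, not taken from the patch's tests: `carry` plays the header phi, the merge after the `if`/`else` plays the latch-block phi, and the second function is the same loop after if-conversion, where that phi becomes a select. In this sketch the latch value does not read the header phi.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical loop. Approximate IR mapping: `carry` read at the top of the
// body is the header phi; `next` is the latch-block phi merging both arms.
std::vector<int> branchyLoop(const std::vector<int>& a, const std::vector<int>& b,
                             const std::vector<bool>& cond, int init) {
  assert(b.size() == a.size() && cond.size() == a.size());
  std::vector<int> out(a.size());
  int carry = init;               // header phi: [init, preheader], [next, latch]
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = carry + a[i];        // in-loop user of the header phi
    int next;
    if (cond[i])
      next = a[i];
    else
      next = b[i];
    carry = next;                 // the latch phi feeds the header phi
  }
  return out;                     // `carry` itself has no users outside the loop
}

// The same loop after if-conversion: the latch phi becomes a select.
std::vector<int> ifConvertedLoop(const std::vector<int>& a, const std::vector<int>& b,
                                 const std::vector<bool>& cond, int init) {
  assert(b.size() == a.size() && cond.size() == a.size());
  std::vector<int> out(a.size());
  int carry = init;
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = carry + a[i];
    carry = cond[i] ? a[i] : b[i];  // select; becomes a blend once vectorized
  }
  return out;
}
```

For example, with `a = {1,2,3,4,5}`, `b = {10,20,30,40,50}`, `cond = {true,false,true,false,false}` and `init = 7`, both functions return `{8, 3, 23, 7, 45}`.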
The key point is that if the header phi is fed by an if-convertible phi, then
the header phi can be vectorized. The 'resume' value extracted for this header
phi in the scalar post-loop is the last element of the vectorized phi in the
latch block.
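A lane-by-lane C++ simulation may help make the 'resume' value concrete. This is a sketch under stated assumptions: VF = 4, the if-converted latch value does not read the header phi, and plain arrays stand in for vector registers. Each vector iteration computes the blended latch vector, forms the header vector by splicing the previous iteration's last latch lane in front of the current latch lanes (roughly the shuffle the vectorizer emits for first-order recurrences), and the scalar remainder resumes from the last latch lane. The function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t VF = 4;  // assumed vectorization factor

// Scalar reference: the header value in iteration i is the latch value of i-1.
std::vector<int> scalarLoop(const std::vector<int>& a, const std::vector<int>& b,
                            const std::vector<bool>& cond, int init) {
  std::vector<int> out(a.size());
  int carry = init;
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = carry + a[i];
    carry = cond[i] ? a[i] : b[i];
  }
  return out;
}

// Simulated vector body followed by a scalar remainder loop.
std::vector<int> vectorizedLoop(const std::vector<int>& a, const std::vector<int>& b,
                                const std::vector<bool>& cond, int init) {
  const std::size_t n = a.size();
  assert(b.size() == n && cond.size() == n);
  std::vector<int> out(n);
  int carry = init;  // last lane of the previous latch vector
  std::size_t i = 0;
  for (; i + VF <= n; i += VF) {
    std::array<int, VF> latch{}, hdr{};
    // Blend: legal to compute first because it does not read `hdr`.
    for (std::size_t l = 0; l < VF; ++l)
      latch[l] = cond[i + l] ? a[i + l] : b[i + l];
    // Splice: hdr = {carry, latch[0], ..., latch[VF - 2]}.
    hdr[0] = carry;
    for (std::size_t l = 1; l < VF; ++l)
      hdr[l] = latch[l - 1];
    for (std::size_t l = 0; l < VF; ++l)
      out[i + l] = hdr[l] + a[i + l];
    carry = latch[VF - 1];  // extract the last lane
  }
  // Scalar post-loop resumes from the last element of the latch vector.
  for (; i < n; ++i) {
    out[i] = carry + a[i];
    carry = cond[i] ? a[i] : b[i];
  }
  return out;
}
```

The simulation should agree with the scalar loop for any trip count, including ones that are not a multiple of VF.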
Please see the added test cases. One current TODO: we do not want to vectorize
'dead' loops, i.e. read-only loops whose computed values have no users outside
the loop.
Please note: this is a proof-of-concept patch, meant to make sure I have not
missed some obvious legality constraint. I will add more test cases.
It looks like one question being asked here is: why bail out if isa<PHINode>(Previous)? @mssimpso answers that in r265983, in response to PR27246. There, Previous was a header PHINode (specifically, an Induction), which effectively makes the original header PHI a second-order (or higher-order) recurrence. It may be interesting to see whether that case could be vectorized correctly.
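To make the second-order case concrete, here is a hedged sketch with two chained header phis: `x` receives the header phi `y` from the latch, so `x` in iteration i equals the value of `v` from iteration i-2. This is a hypothetical example in which `y` is a plain first-order recurrence rather than the Induction from PR27246, and it only checks that the arithmetic works out if the splice shifts by two lanes and carries the last two lanes; it does not describe how the vectorizer would have to change.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t VF = 4;  // assumed vectorization factor
static_assert(VF >= 2, "a splice by two lanes needs at least two lanes");

// Scalar reference: x's incoming latch value is the header phi y itself,
// so at the top of iteration i, x == v[i - 2] (initX/initY stand in for i < 2).
std::vector<int> scalarSecondOrder(const std::vector<int>& v, int initX, int initY) {
  std::vector<int> out(v.size());
  int x = initX, y = initY;
  for (std::size_t i = 0; i < v.size(); ++i) {
    out[i] = x + v[i];  // in-loop user of the second-order recurrence
    x = y;              // x's "Previous" is a header phi
    y = v[i];           // y's "Previous" is an ordinary value
  }
  return out;
}

// Simulated vector body: shift by two lanes, carry the last two lanes.
std::vector<int> vectorSecondOrder(const std::vector<int>& v, int initX, int initY) {
  const std::size_t n = v.size();
  std::vector<int> out(n);
  std::array<int, 2> hist{initX, initY};  // values for i - 2 and i - 1
  std::size_t i = 0;
  for (; i + VF <= n; i += VF) {
    std::array<int, VF> xv{};
    for (std::size_t l = 0; l < VF; ++l)
      xv[l] = l < 2 ? hist[l] : v[i + l - 2];
    for (std::size_t l = 0; l < VF; ++l)
      out[i + l] = xv[l] + v[i + l];
    hist = {v[i + VF - 2], v[i + VF - 1]};
  }
  // Scalar post-loop resumes both header phis from the carried lanes.
  int x = hist[0], y = hist[1];
  for (; i < n; ++i) {
    out[i] = x + v[i];
    x = y;
    y = v[i];
  }
  return out;
}
```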
In any case, the motivation for this patch involves a Previous that isa PHINode, but not one in the header block. Such Previous values get vectorized into blends, so presumably relaxing the bail-out for them may be safe and helpful.
Note, however, that for a first-order recurrence to work, Previous (including when it is a PHI in a non-header block) must dominate all users of the original header PHI, or be made to dominate them by sinking those users after Previous. In particular, Previous cannot be data-dependent on the original header PHI, as that would close a dependence cycle.
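The data-dependence half of this condition can be phrased as a reachability query: starting from Previous and following operands within one iteration, can the header PHI be reached? A minimal sketch over a hypothetical toy def-use map (string names, same-iteration operands only, no back edges); the dominance and sinking part is not modeled, and none of this is LLVM's API.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Toy graph: each value maps to the same-iteration values it uses as operands.
using OperandMap = std::unordered_map<std::string, std::vector<std::string>>;

// Returns true if `from` transitively uses `target` (worklist DFS over operands).
bool dependsOn(const OperandMap& ops, const std::string& from,
               const std::string& target) {
  std::vector<std::string> worklist{from};
  std::unordered_set<std::string> visited;
  while (!worklist.empty()) {
    std::string v = worklist.back();
    worklist.pop_back();
    if (v == target)
      return true;
    if (!visited.insert(v).second)
      continue;  // already expanded
    auto it = ops.find(v);
    if (it == ops.end())
      continue;  // leaf: loop-invariant or defined outside the toy graph
    for (const auto& op : it->second)
      worklist.push_back(op);
  }
  return false;
}
```

Modeling only the path mentioned in this review, `%phiuseout2 -> %tmp50 -> %hdrphi1`, `dependsOn(ops, "%phiuseout2", "%hdrphi1")` returns true, so that Previous would be rejected as the Previous of a first-order recurrence.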
In the tests below, %phiuseout2, the candidate Previous of %hdrphi1, also depends on it (e.g., via %tmp50), which disqualifies it from being a first-order recurrence. Every such dependence cycle must be broken, if possible, or rather expanded into a cycle of distance VF, as is done for Inductions and Reductions, for vectorization to succeed. Efforts to handle more cyclic dependencies include D49168 and D22341. I'm not sure I follow how the tests below, with their cyclic dependencies, are expected to be vectorized, nor the "over vectorized" argument; perhaps it is related to D50480(?).
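For reference, a "cycle of distance VF" is what a vectorized reduction looks like: instead of one accumulator carried from iteration i-1 to i, each of the VF lanes keeps its own partial sum carried from original iteration i-VF, and the partial sums are combined after the loop. A minimal C++ sketch with an assumed VF = 4 and hypothetical function names; it relies on the reduction being reassociable, which holds for integer addition but for floating point requires reassociation to be permitted (e.g. fast-math).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t VF = 4;  // assumed vectorization factor

// Scalar reduction: a dependence cycle of distance 1 (s -> s).
long scalarSum(const std::vector<int>& a) {
  long s = 0;
  for (int x : a)
    s += x;
  return s;
}

// Vectorized shape: lane l accumulates iterations l, l + VF, l + 2*VF, ...,
// so each accumulator only depends on itself from VF original iterations ago.
long vectorSum(const std::vector<int>& a) {
  std::array<long, VF> acc{};  // zero-initialized partial sums
  std::size_t i = 0;
  for (; i + VF <= a.size(); i += VF)
    for (std::size_t l = 0; l < VF; ++l)
      acc[l] += a[i + l];
  long s = 0;
  for (long partial : acc)  // horizontal reduction after the vector loop
    s += partial;
  for (; i < a.size(); ++i)  // scalar remainder
    s += a[i];
  return s;
}
```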