Update isFirstOrderRecurrence to explore all uses of a recurrence phi
and check if we can sink them. If there are multiple users to sink, they
are all mapped to the previous instruction.
Fixes PR44286 (and another PR or two).
Differential D84951
[LV] Try to sink users recursively for first-order recurrences. fhahn on Jul 30 2020, 9:20 AM. Authored by
Details Update isFirstOrderRecurrence to explore all uses of a recurrence phi Fixes PR44286 (and another PR or two).
Diff Detail
Event TimelineComment Actions I think the unidentified PHI node in the example above is a pointer induction, not a first order recurrence? Looks like the actual problem is that LV only sees the inner loop fully unrolled and the SLP vectorizer does not vectorize the body of the unrolled loop. If built with -fno-unroll-loops, LV will vectorize the inner loop, but it would probably be better if the SLP vectorizer can just vectorize the unrolled body.
Comment Actions Rebased, updated comment, added additional tests. It does not really impact the number of vectorized loops in SPEC2000/SPEC2006/MultiSource. But I think it should have a strictly positive impact (module exposing some existing cost model limitations due to vectorizing more complex loops). Comment Actions Sorry, I meant to do a more thorough review before approving. I had a closer look today and found some nits and a couple of questions. Please see inline comments.
Comment Actions How about making SinkAfter a MapVector instead of DenseMap (to have its iteration order match insertion order) and insert interdependent sink candidates in the desired order? Comment Actions Rebased, update to also avoid sinking instructions that may read from memory, as we cannot move them over ones with side-effects, move sort to isFirstOrderRecurrence. I'm not sure if it is possible to maintain the order by dominance in the map as we go along, because we an instruction could be used by multiple others and we may visit them in reverse dominance order (relative to each other), so we would miss the fact we need to 'fix' the order of the first instruction. I think it is also difficult & expensive to change the order of multiple entries in the map vector. I added a few more interesting scenarios as tests in 68d52f0dbe2e and c62f984814c4. I *think* however it should be enough to sort only sort the instructions to sink per FOR phi, as we currently allow sinking for each instruction only once. That should certainly be better, as we have less to sort overall and everything is contained in IVDescriptor.cpp. The new code already maintains a vector of instructions to sink, so the sort fits in quite nicely. WDYT? Comment Actions The instructions to sink reside inside the header block in the desired order. It should be possible to traverse them, once, from the FOR phi to the terminator or Previous, w/o sorting nor changing order, by maintaining the set of phi's (direct and indirect) users. When encountering such a user during the traversal, check if it is sunkable, append it to be moved after the last sunken user (or after Previous if this is the first user being sunk), and add its users to the set - those that belong to the header block, others need to be dominated by (and distinct from) Previous. Comment Actions IIUC your suggestion would require to iterate over the instructions in the header block compared to just traversing the def-use chains from the PHI we are analyzing as we do at the moment? That would work, as we would visit the candidates in the correct order. But requiring to potentially iterate over all instructions in the header block might be worse in practice than sorting the (probably few) instructions that require sinking? It will be indeed required once we want to sink memory operations, but until then it seems to me that just traversing the def-use chains + sorting would probably be less work in general for the cases we currently support (also given the interface that requires us to analyze each FOR separately). Please let me know if I understood correctly. Happy to update the patch as suggested if you think it's beneficial on its own. Comment Actions Happy to go with your sorting InstrsToSink approach, added some comments to it below. Right, this order-preserving alternative requires iterating over the instructions in the header block from the FOR phi until its last header-block-user is encountered. Agree that this would probably take longer than sorting InstrsToSink, e.g., if there's a user close to the end of the header. Mainly pointed out as an alternative, towards supporting sinking memory instructions (supporting multiple FOR phi's would be possible in a single scan, but seems an overkill), and as a response to
;-)
Comment Actions Address latest comments: Use ordered set for instructions to sink, entries in SinkAfter are chained together now as well, instead of all sinking after the same 'previous' instruction. I hope I did not miss anything.
Comment Actions Address latest comments, thanks!
Comment Actions This looks good to me, thanks!
|
update comment