Page MenuHomePhabricator

[GVN] Support chain of phi nodes in exit blocks.
Changes PlannedPublic

Authored by fhahn on Jan 25 2022, 7:02 AM.

Details

Summary

This patch extends to logic added in D118143 to allow chains of phis
from the lcssa phi to the using load instruction. The intermediate phis
must be in blocks only containing the phi and an unconditional branch.
The blocks must be connected directly. This ensures that the load
instruction executes if the loop exit executes.

Such chains of phi nodes are common for rotated loops and is required to
be able to handle cases such as (https://clang.godbolt.org/z/6czGzzqbs)

#include <vector>
#include <algorithm>

int foo(const std::vector<int> &V) {
   return *std::min_element(V.begin(), V.end());
}

Depends on D118143.

Diff Detail

Event Timeline

fhahn created this revision.Jan 25 2022, 7:02 AM
fhahn requested review of this revision.Jan 25 2022, 7:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 25 2022, 7:02 AM
fhahn updated this revision to Diff 402918.Jan 25 2022, 8:11 AM

add missing ;

fhahn updated this revision to Diff 404161.Jan 28 2022, 2:11 PM

A major rewrite/update following @reames' suggestion in D118143 as extension to the available value analysis. This patch now tries to find an existing value-select that can be used instead of loading from a pointer-select. The first version is intentionally kept quite restrictive and can be extended through follow-ups.

Together with D118143, this enables vectorization for common code like std::min_element, yielding substantial speedups for code that uses it for non-tiny numbers of elements.

Really not a fan here.

I think what we need to do is recognize the address clobber being from a select, and then perform both sides of the memory dependency "as if" we had two loads at the point of the select. Doing that would let this work generically.

I'd mentioned this scheme in the prior patch, but I was sorta okay with the special casing there. I'm really not on board with the extension made here.

Note that you don't need to go all the way to the fully general scheme. You could fork the analysis only once and handle only the case where there's a single select on the address. You do need to be consistent for both within block and cross block, but that's a different issue.

As an aside, I think this whole problem gets much easier if we can use the MSSA interfaces. I know there's someone trying to switch legacy GVN to MSSA. It might be worth trying to drive that through, and then revisiting.

fhahn planned changes to this revision.Feb 1 2022, 1:47 PM

Really not a fan here.

I think what we need to do is recognize the address clobber being from a select, and then perform both sides of the memory dependency "as if" we had two loads at the point of the select. Doing that would let this work generically.

That's a fair point, this patch is indeed very specific! Let me take a look at what some kind of 'fork-analysis' would take.