The compiler may produce better results if it does not look for
constants, uses an extra analysis of phi nodes, looks through all tree
nodes without skipping the cases, where the very first set of nodes is
empty. Also, it tries to reshufle the nodes if it is profitable for
sure, i.e. at least 2 scalars are used for single node permutation and at
least 3 scalars are used for the permutation of 2 nodes.
Part of D110978
I find this "if" condition quite difficult to follow.
I tried to simplify it by applying logical NOT into expressions in parens
and got the following (which, if I'm not mistaken, is equivalent):
where I still don't quite follow the second part.
Besides, what makes me nervous is use of isAltShuffle() interface here which was not originally designed for gather nodes. We can gather for multiple reasons including inability to schedule a bundle. We just leave this information not cleared when generate a gather node for such bundle. But originally we only used it to emulate binary operations not present in a hardware to vectorize a set of scalars.
As I understand you are trying to pre-filter gather nodes probably trying to find reshuffles of maximum two distinct scalar instructions before doing further analysis and use isAltShuffle() as an indicator for two ops specifically. But shouldn't we clear instructions state for irrelevant cases when we build a tree
if we are not going to use these nodes for further re-analysis?
We probably could write a reason why scalars are being gathered (directly into a node or as a sidecar data). That would likely simplify many places where we deal with re-analysis to restore that information after we have already built a tree.